1. Introduction
AI infrastructure is usually discussed in terms of GPUs, power, and capital expenditure. That framing misses a quieter bottleneck inside the cluster: the network fabric linking accelerators together.
As model training scales, copper interconnects run into physical limits on reach, signal integrity, and power efficiency. Optical links are moving from the edge of the data center toward the center of the AI stack.
2. Copper vs Optics
Copper has historically dominated short-reach connections because it is familiar, cheap, and simple to deploy. The problem is that bandwidth density and distance requirements are now rising faster than copper can comfortably absorb. The table below summarizes the trade-offs, and the sketch after it puts rough numbers on the power gap.
| Attribute | Copper | Optics |
|---|---|---|
| Reach | Strong at very short distances | Scales better over longer runs |
| Bandwidth density | Increasingly constrained | Better fit for high-speed scaling |
| Power per bit | Rises as speeds increase | Can be more efficient at scale |
| Cluster implications | Good for legacy and short-reach | Critical for larger AI fabrics |
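To make the power-per-bit row concrete, here is a minimal sketch of the underlying arithmetic: energy per bit (picojoules) times line rate gives link power. The pJ/bit figures below are illustrative assumptions chosen for scale, not datasheet values.

```python
# Illustrative energy-per-bit figures. Real values vary widely by reach,
# process node, and vendor -- these are assumptions, not measurements.
LINKS = {
    "copper_100g": {"pj_per_bit": 5.0, "gbps": 100},   # short reach, low speed
    "copper_800g": {"pj_per_bit": 20.0, "gbps": 800},  # heavy equalization at speed
    "optics_800g": {"pj_per_bit": 15.0, "gbps": 800},  # amortizes better at scale
}

def link_power_watts(pj_per_bit: float, gbps: float) -> float:
    """Power = energy/bit * bits/second (1 pJ/bit at 1 Gb/s = 1 mW)."""
    return pj_per_bit * 1e-12 * gbps * 1e9

for name, spec in LINKS.items():
    watts = link_power_watts(spec["pj_per_bit"], spec["gbps"])
    print(f"{name}: {watts:.1f} W per link")
```

Multiply a per-link difference like this across tens of thousands of links and the per-bit gap becomes a megawatt-scale line item.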
3. The AI Data Center Network
An AI data center is not just a room full of GPUs. It is a tightly coupled system of servers, switches, transceivers, and software orchestration that has to move enormous volumes of data with very low latency.
"The network now determines whether expensive compute is fully utilized or left waiting on communication overhead."
4. The Optical Supply Chain
The optical stack spans lasers, DSPs, modulators, transceivers, fiber infrastructure, and switch integration. Each layer has different economics and different bottlenecks.
| Layer | Role | Why it matters |
|---|---|---|
| Laser / Photonic | Create and shape signals | Foundational performance layer |
| DSP / Processing | Convert and manage data | Critical for efficiency and reliability |
| Transceivers | Package optics into hardware | Main interface with switching gear |
5. Co-Packaged Optics
Co-packaged optics (CPO) aims to bring optical engines closer to the switch silicon itself. The goal is straightforward: reduce power, improve signal integrity, and avoid the growing penalties of driving electrical traces at extreme speeds.
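As a rough sketch of why shortening the electrical path matters, here is a per-port power decomposition for an 800G port. Every figure is an illustrative assumption, not vendor data; published CPO estimates vary, but the structure of the saving (dropping the retimer/DSP stage and easing SerDes drive) is the point.

```python
# Hypothetical per-port power split (watts) for an 800G switch port.
# Figures are assumptions for illustration, not vendor specifications.
PLUGGABLE = {"serdes_drive": 3.0, "retimer_dsp": 6.0, "optical_engine": 6.0}
CPO       = {"serdes_drive": 1.5, "retimer_dsp": 0.0, "optical_engine": 6.0}

def port_power(parts: dict[str, float]) -> float:
    return sum(parts.values())

saving = 1 - port_power(CPO) / port_power(PLUGGABLE)
print(f"pluggable: {port_power(PLUGGABLE):.1f} W/port, "
      f"CPO: {port_power(CPO):.1f} W/port (~{saving:.0%} lower)")
```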
6. AI Cluster Communication
Large training clusters behave like communication machines as much as compute machines. At frontier scale, communication becomes a first-order variable in both training time and total cost; the sketch after the table puts numbers on the synchronization traffic.
| Scaling dimension | Implication | Networking requirement |
|---|---|---|
| Accelerator Count | Higher east-west traffic | Denser interconnect fabric |
| Parameter Count | More synchronization | Lower-latency communication |
| Physical Reach | Copper runs become impractical | Greater optical penetration |
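To see why parameter count translates into synchronization pressure, consider the standard ring all-reduce traffic formula: each GPU sends roughly 2 × (N−1)/N times the gradient payload per step. The model size, precision, GPU count, and effective bandwidth below are hypothetical.

```python
def ring_allreduce_bytes_per_gpu(n_params: float, bytes_per_grad: int = 2,
                                 n_gpus: int = 1024) -> float:
    """Bytes each GPU sends in one ring all-reduce: 2 * (N-1)/N * payload."""
    payload = n_params * bytes_per_grad
    return 2 * (n_gpus - 1) / n_gpus * payload

# Hypothetical 70B-parameter model, fp16 gradients, pure data parallelism:
traffic = ring_allreduce_bytes_per_gpu(70e9)
print(f"~{traffic / 1e9:.0f} GB on the wire per GPU, per optimizer step")

# At an effective 50 GB/s (~400 Gb/s) per GPU, fully exposed:
print(f"~{traffic / 50e9:.1f} s of pure communication if nothing overlaps")
```

In practice tensor, pipeline, and sharded-optimizer parallelism change the traffic pattern, but the direction holds: more parameters mean more bytes crossing the fabric every step.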
7. Networking Bandwidth Evolution
The progression from lower-speed networking to 800G and beyond is not just a spec-sheet story. It changes rack design, power budgets, and system architecture; the back-of-envelope calculation after the table shows how optics alone starts to bite into the power budget.
| Era | Typical bandwidth step | Architectural pressure |
|---|---|---|
| Legacy Cloud | 25G to 100G | Incremental rack efficiency |
| Modern AI | 200G to 400G | Cluster-level throughput |
| Frontier AI | 800G and beyond | Power, reach, and redesign |
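A back-of-envelope example of the power pressure, using illustrative per-module wattages (assumptions in a plausible ballpark, not datasheet numbers):

```python
# Optics power per switch as port speeds rise. Per-module wattages are
# illustrative assumptions, not vendor specifications.
ERAS = [
    ("100G", 64, 4.5),    # (port speed, ports per switch, W per module)
    ("400G", 64, 10.0),
    ("800G", 64, 15.0),
]

for speed, ports, watts in ERAS:
    print(f"{speed}: ~{ports * watts:,.0f} W of transceivers per switch "
          f"({watts} W x {ports} ports)")
```

Scale that across hundreds of switches per cluster and the optics budget stops being a rounding error in rack-level power planning.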
8. Why This Matters
Optics is becoming one of the hidden denominators of AI infrastructure. If compute demand continues to rise, the network can no longer be treated as a commodity afterthought.
9. Follow Along
I’ll continue writing about the less obvious constraints in AI infrastructure, especially where hardware, energy, and capital markets intersect.