1. Introduction
AI infrastructure is usually discussed in terms of GPUs, power, and capital expenditure. That framing misses a quieter bottleneck inside the cluster: the network fabric linking accelerators together.
As model training scales, copper interconnects run into physical limits on reach, signal integrity, and power efficiency. Optical links are moving from the edge of the data center toward the center of the AI stack.
2. Copper vs Optics
Copper has historically dominated short-reach connections because it is familiar, cheap, and simple to deploy. The problem is that bandwidth density and distance requirements are now rising faster than copper can comfortably absorb. The table below summarizes the trade-offs, and the sketch after it puts rough numbers on the power gap.
| Attribute | Copper | Optics |
|---|---|---|
| Reach | Strong at very short distances | Scales better over longer runs |
| Bandwidth density | Increasingly constrained | Better fit for high-speed scaling |
| Power per bit | Rises as speeds increase | Can be more efficient at scale |
| Cluster implications | Good for legacy and short-reach | Critical for larger AI fabrics |
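To make the power-per-bit row concrete, here is a minimal sketch of the underlying arithmetic: energy per bit (picojoules) times line rate gives link power. The pJ/bit figures below are illustrative assumptions chosen for scale, not datasheet values.

```python
# Illustrative energy-per-bit figures. Real values vary widely by reach,
# process node, and vendor -- these are assumptions, not measurements.
LINKS = {
    "copper_100g": {"pj_per_bit": 5.0, "gbps": 100},   # short reach, low speed
    "copper_800g": {"pj_per_bit": 20.0, "gbps": 800},  # heavy equalization at speed
    "optics_800g": {"pj_per_bit": 15.0, "gbps": 800},  # amortizes better at scale
}

def link_power_watts(pj_per_bit: float, gbps: float) -> float:
    """Power = energy/bit * bits/second (1 pJ/bit at 1 Gb/s = 1 mW)."""
    return pj_per_bit * 1e-12 * gbps * 1e9

for name, spec in LINKS.items():
    watts = link_power_watts(spec["pj_per_bit"], spec["gbps"])
    print(f"{name}: {watts:.1f} W per link")
```

Multiply a per-link difference like this across tens of thousands of links and the per-bit gap becomes a megawatt-scale line item.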
3. The AI Data Center Network
An AI data center is not just a room full of GPUs. It is a tightly coupled system of servers, switches, transceivers, and software orchestration that has to move enormous volumes of data with very low latency.
"The network now determines whether expensive compute is fully utilized or left waiting on communication overhead."
4. The Optical Supply Chain
The optical stack spans lasers, DSPs, modulators, transceivers, fiber infrastructure, and switch integration. Each layer has different economics and different bottlenecks.
| Layer | Role | Why it matters |
|---|---|---|
| Laser / Photonic | Create and shape signals | Foundational performance layer |
| DSP / Processing | Convert and manage data | Critical for efficiency and reliability |
| Transceivers | Package optics into hardware | Main interface with switching gear |
5. Co-Packaged Optics
Co-packaged optics (CPO) aims to bring optical engines closer to the switch silicon itself. The goal is straightforward: reduce power, improve signal integrity, and avoid the growing penalties of driving electrical traces at extreme speeds.
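As a rough sketch of why shortening the electrical path matters, here is a per-port power decomposition for an 800G port. Every figure is an illustrative assumption, not vendor data; published CPO estimates vary, but the structure of the saving (dropping the retimer/DSP stage and easing SerDes drive) is the point.

```python
# Hypothetical per-port power split (watts) for an 800G switch port.
# Figures are assumptions for illustration, not vendor specifications.
PLUGGABLE = {"serdes_drive": 3.0, "retimer_dsp": 6.0, "optical_engine": 6.0}
CPO       = {"serdes_drive": 1.5, "retimer_dsp": 0.0, "optical_engine": 6.0}

def port_power(parts: dict[str, float]) -> float:
    return sum(parts.values())

saving = 1 - port_power(CPO) / port_power(PLUGGABLE)
print(f"pluggable: {port_power(PLUGGABLE):.1f} W/port, "
      f"CPO: {port_power(CPO):.1f} W/port (~{saving:.0%} lower)")
```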
6. AI Cluster Communication
Large training clusters behave like communication machines as much as compute machines. At frontier scale, communication becomes a first-order variable in both training time and total cost; the sketch after the table puts numbers on the synchronization traffic.
| Scaling dimension | Implication | Networking requirement |
|---|---|---|
| Accelerator Count | Higher east-west traffic | Denser interconnect fabric |
| Parameter Count | More synchronization | Lower-latency communication |
| Physical Reach | Copper runs become impractical | Greater optical penetration |
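To see why parameter count translates into synchronization pressure, consider the standard ring all-reduce traffic formula: each GPU sends roughly 2 × (N−1)/N times the gradient payload per step. The model size, precision, GPU count, and effective bandwidth below are hypothetical.

```python
def ring_allreduce_bytes_per_gpu(n_params: float, bytes_per_grad: int = 2,
                                 n_gpus: int = 1024) -> float:
    """Bytes each GPU sends in one ring all-reduce: 2 * (N-1)/N * payload."""
    payload = n_params * bytes_per_grad
    return 2 * (n_gpus - 1) / n_gpus * payload

# Hypothetical 70B-parameter model, fp16 gradients, pure data parallelism:
traffic = ring_allreduce_bytes_per_gpu(70e9)
print(f"~{traffic / 1e9:.0f} GB on the wire per GPU, per optimizer step")

# At an effective 50 GB/s (~400 Gb/s) per GPU, fully exposed:
print(f"~{traffic / 50e9:.1f} s of pure communication if nothing overlaps")
```

In practice tensor, pipeline, and sharded-optimizer parallelism change the traffic pattern, but the direction holds: more parameters mean more bytes crossing the fabric every step.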
7. Networking Bandwidth Evolution
The progression from lower-speed networking to 800G and beyond is not just a spec-sheet story. It changes rack design, power budgets, and system architecture; the back-of-envelope calculation after the table shows how optics alone starts to bite into the power budget.
| Era | Typical bandwidth step | Architectural pressure |
|---|---|---|
| Legacy Cloud | 25G to 100G | Incremental rack efficiency |
| Modern AI | 200G to 400G | Cluster-level throughput |
| Frontier AI | 800G and beyond | Power, reach, and redesign |
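A back-of-envelope example of the power pressure, using illustrative per-module wattages (assumptions in a plausible ballpark, not datasheet numbers):

```python
# Optics power per switch as port speeds rise. Per-module wattages are
# illustrative assumptions, not vendor specifications.
ERAS = [
    ("100G", 64, 4.5),    # (port speed, ports per switch, W per module)
    ("400G", 64, 10.0),
    ("800G", 64, 15.0),
]

for speed, ports, watts in ERAS:
    print(f"{speed}: ~{ports * watts:,.0f} W of transceivers per switch "
          f"({watts} W x {ports} ports)")
```

Scale that across hundreds of switches per cluster and the optics budget stops being a rounding error in rack-level power planning.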
8. Why This Matters
Optics is becoming one of the hidden denominators of AI infrastructure. If compute demand continues to rise, the network can no longer be treated as a commodity afterthought.
9. Follow Along
I’ll continue writing about the less obvious constraints in AI infrastructure, especially where hardware, energy, and capital markets intersect.