Lambda Analyzes Copackaged Optics in Next Generation NVIDIA Blackwell Clusters

Copackaged optics and the NVIDIA Quantum X Photonics Q3450 LD switch redefine power efficiency and reliability in massive GPU clusters

GPU cluster networks are no longer mere back-end support systems; they are now an integral part of the compute envelope itself. In 800G architectures built around systems such as the NVIDIA GB300 NVL72, an estimated 86% of the networking power in a standard three-layer cluster comes from the back-end fabric. As workloads have shifted toward agentic AI models, this power overhead has become a serious issue. To perform a single agentic task, the GPU issues a cascade of calls (model execution, tool invocations, further model calls, and recursive logic), which drastically increases east-west traffic within the cluster. If the interconnect breaks down or individual transceivers fail, token throughput plummets.

To help address these new computing demands, Lambda has begun evaluating copackaged optics by testing the NVIDIA Quantum X Photonics Q3450 LD switch. This hardware signals a shift toward systems that can drastically reduce power draw while also increasing infrastructure reliability. Placing the optical conversion inside the switch package frees significant electrical headroom, which the data center can redirect to accelerators instead of spending it on the network.

The primary advantage of copackaged optics is lower power draw. A standard switch requires approximately 7.0 kW, while an NVIDIA Quantum X Photonics copackaged optics switch operates at 3.95 kW, an immediate saving of 3.05 kW per switch. That is significant, because a single NVIDIA Blackwell Ultra GPU has a thermal design power of 1,400 W: the power recovered from each switch is enough to run two additional GPUs.

This benefit increases dramatically with cluster size:

  • A 576-GPU system using 12 switches saves about 37 kW, enough to power 26 additional GPUs.
  • A 4,608-GPU cluster using 100 network devices can save 305 kW, enough to run 217 additional GPUs.
  • A large 10,368-GPU installation using 216 network devices saves 658.8 kW, enough for 470 additional GPUs.
  • A massive 41,472-GPU cluster using 1,440 switches could save 4,392 kW, running an additional 3,137 GPUs within the same power budget.
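The figures in this list follow from simple arithmetic: multiply the per-switch saving by the switch count, then divide by the GPU's thermal design power and round down. A minimal Python sketch, assuming the per-device numbers quoted in the article are exact and that every recovered watt can go to a GPU (it ignores power distribution and cooling overheads, which the article does not quantify):

```python
# Per-device figures quoted in the article, in watts (integers keep the math exact)
STANDARD_SWITCH_W = 7_000   # switch with pluggable optical transceivers
CPO_SWITCH_W = 3_950        # copackaged optics switch (Quantum X Photonics)
GPU_TDP_W = 1_400           # thermal design power of one Blackwell Ultra GPU


def power_recovered(num_switches: int) -> tuple[int, int]:
    """Return (watts saved, whole extra GPUs that fit in those watts)."""
    saved_w = num_switches * (STANDARD_SWITCH_W - CPO_SWITCH_W)
    return saved_w, saved_w // GPU_TDP_W


if __name__ == "__main__":
    # (cluster GPU count, switch count) pairs from the article's examples
    for gpus, switches in [(576, 12), (4_608, 100), (10_368, 216), (41_472, 1_440)]:
        saved_w, extra = power_recovered(switches)
        print(f"{gpus:>6} GPUs, {switches:>5} switches: "
              f"{saved_w / 1000:.1f} kW saved -> {extra} extra GPUs")
```

The sketch reproduces the GPU counts in the list exactly; the smallest configuration comes out at 36.6 kW, which the list rounds to 37 kW.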

Reliability is the second major benefit. A 128,000-GPU facility using pluggable transceivers would contain 655,360 distinct optical components, any of which could fail. Copackaged optics removes the entire category of pluggable components. Network downtime during a training job simply means replaying from a checkpoint, but a failure during a live agentic inference task can halt the workflow, losing throughput and wasting GPU power.
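Why component count matters can be illustrated with a simple model: if each pluggable part fails independently at a constant annual rate, expected failures grow linearly with the number of parts. The Python sketch below assumes exactly that model; the 1% annual failure rate in the example is a hypothetical placeholder, not a figure from Lambda or NVIDIA.

```python
# Article figures for a 128,000-GPU facility built with pluggable transceivers
PLUGGABLE_OPTICAL_PARTS = 655_360
FACILITY_GPUS = 128_000


def parts_per_gpu(parts: int, gpus: int) -> float:
    """Average number of pluggable optical parts attributable to each GPU."""
    return parts / gpus


def expected_failures_per_day(parts: int, annual_failure_rate: float) -> float:
    """Expected failures per day, assuming independent failures at a constant
    annual rate (a deliberately simple, hypothetical reliability model)."""
    return parts * annual_failure_rate / 365


if __name__ == "__main__":
    HYPOTHETICAL_AFR = 0.01  # placeholder of 1% per year, NOT a measured value
    print(f"{parts_per_gpu(PLUGGABLE_OPTICAL_PARTS, FACILITY_GPUS):.2f} parts per GPU")
    print(f"{expected_failures_per_day(PLUGGABLE_OPTICAL_PARTS, HYPOTHETICAL_AFR):.1f} "
          "expected failures per day")
```

Under this model, removing the pluggable parts removes their share of expected failures entirely. The integrated optics and external laser modules of a copackaged switch have failure modes of their own, which this sketch does not model.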

The new 4U Q3450 LD switch departs from traditional networking hardware design. Instead of rows of OSFP cages, its front panel carries direct fiber array connectors that lead into the silicon photonics engines. Eighteen removable external light source modules supply the optical input, one module for every 8 ports, covering all 144 MPO connections. Keeping the laser sources separate makes them field-replaceable, while the rest of the optics are integrated directly into the hardware. The switch delivers a total of 115.2 Tb/s of non-blocking switching through its NVIDIA Quantum X800 ASIC.
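The module, port, and bandwidth figures are mutually consistent, as a quick check shows. This Python sketch assumes 800 Gb/s per port (the 800G class mentioned at the start of the article; this paragraph does not state the per-port rate). Its port-to-module mapping additionally assumes that ports are grouped contiguously, eight per laser module; that grouping is hypothetical, and the switch's actual numbering may differ.

```python
ELS_MODULES = 18        # removable external light source modules
PORTS_PER_ELS = 8       # ports supplied by each light source module
PORT_RATE_GBPS = 800    # assumed per-port rate (800G)

TOTAL_PORTS = ELS_MODULES * PORTS_PER_ELS              # 144 MPO connections
AGGREGATE_TBPS = TOTAL_PORTS * PORT_RATE_GBPS / 1000   # 115.2 Tb/s


def els_for_port(port: int) -> int:
    """Return the (hypothetical) 0-based light source module feeding `port`,
    assuming ports 0-7 use module 0, ports 8-15 use module 1, and so on."""
    if not 0 <= port < TOTAL_PORTS:
        raise ValueError(f"port must be in range 0..{TOTAL_PORTS - 1}")
    return port // PORTS_PER_ELS


if __name__ == "__main__":
    print(f"{TOTAL_PORTS} ports x {PORT_RATE_GBPS} Gb/s = {AGGREGATE_TBPS:.1f} Tb/s")
    print(f"port 100 is fed by module {els_for_port(100)}")
```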

The rear of the unit follows the design conventions of the NVIDIA GB300 NVL72 rack. Power comes in through a 48V DC busbar connector, and cooling is handled by 4 UDQ4 liquid cooling connections that tie into 2 separate internal cooling loops. Because it shares this infrastructure with the server nodes, the switch fits seamlessly into the data center's busbar and liquid cooling architecture.

Traditional network components route electrical signals from the ASIC across a printed circuit board and through the electronics of a pluggable transceiver. Inside the transceiver, a digital signal processor must correct for the degradation the signal suffered on the board before it can convert the electrical signal into light. Copackaged optics moves this conversion right next to the ASIC, shortening the electrical path from centimeters to tens of micrometers and reducing signal loss from about 20 dB to approximately 4 dB. Because a correction chip is no longer needed for every connection, its power draw and latency are eliminated on every link.
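Decibels are logarithmic, so the drop from about 20 dB to 4 dB is larger than the raw numbers suggest. A minimal Python sketch of the standard conversion, treating the quoted figures as power insertion loss (an assumption; the article does not say how the loss was measured):

```python
def power_fraction(loss_db: float) -> float:
    """Fraction of signal power that survives a loss of `loss_db` decibels."""
    return 10 ** (-loss_db / 10)


def amplitude_fraction(loss_db: float) -> float:
    """Fraction of signal amplitude (voltage) that survives the same loss."""
    return 10 ** (-loss_db / 20)


if __name__ == "__main__":
    for label, db in [("pluggable path", 20.0), ("copackaged path", 4.0)]:
        print(f"{label}: {db:.0f} dB -> {power_fraction(db):.1%} of power, "
              f"{amplitude_fraction(db):.1%} of amplitude")
    print(f"improvement: {power_fraction(4.0) / power_fraction(20.0):.1f}x more power")
```

At 20 dB only 1% of the signal power survives the electrical path; at 4 dB roughly 40% does, about 40 times more.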

This hardware change brings its own installation challenges. Because the optical pathway is now deeply embedded in the system, fiber routing, power delivery, and cooling must be planned thoroughly beforehand. Lambda and NVIDIA have collaborated to create standardized, reliable installation processes, validating rack design, power delivery, and cooling in real-world tests. As an NVIDIA Exemplar Cloud Partner, Lambda helped test and validate these methods.

Copackaged optics also aligns with the economics of token generation. Speaking at the GTC conference, Ashkan Seyedi, Director of Product Marketing for Networking at NVIDIA, described the technology's importance for future AI applications:

"Multi-agentic inference requires elastic and resilient data movement, so that GPUs are not waiting around for data, while keeping tokens per second high and time to first token low."

In the end, the power and cooling limits of today's massive GPU clusters mean that reducing the network's power draw is one of the few remaining ways to expand compute capacity. Cutting the energy consumed by the switching fabric and eliminating hundreds of thousands of potential points of failure lets data centers fit more productive, running compute hardware into their facilities than ever before.

About the author

mgtid
Owner of Technetbook | 10+ Years of Expertise in Technology | Seasoned Writer, Designer, and Programmer | Specialist in In-Depth Tech Reviews and Industry Insights | Passionate about Driving Innovation and Educating the Tech Community Technetbook