OpenAI Multipath Reliable Connection Protocol Released to Open Compute Project for AI Networking

OpenAI releases the Multipath Reliable Connection (MRC) specification through the Open Compute Project, an industry partnership aimed at standardizing networking architecture and improving failure resilience for frontier model training across global GPU clusters.

OpenAI has released its Multipath Reliable Connection (MRC) specification to the Open Compute Project. The release follows a two-year collaboration with AMD, Broadcom, Intel, Microsoft, and NVIDIA to address the networking problems that hinder frontier model training. According to OpenAI's technical announcement, the protocol is a fundamental part of the Stargate supercomputer's architecture, and the specification will be shared across the industry to establish common standards for moving data through clusters of 100,000-plus GPUs.

The MRC protocol enables a fundamental change in how supercomputer networks are built. Conventional designs treat a network interface as a single 800 Gbps link; MRC instead divides it into eight parallel planes of 100 Gbps each. The primary advantage is improved switch radix efficiency: a switch that previously supported 64 ports at 800 Gbps can now connect 512 ports at 100 Gbps. That lets engineers connect roughly 131,000 GPUs with only two switch tiers, where traditional Ethernet layouts require three or four. A flatter network reduces power consumption and leaves fewer hardware components that can fail during a training run.
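The radix arithmetic above can be checked with a back-of-the-envelope calculation. The sketch below assumes a standard non-blocking two-tier leaf/spine (folded Clos) topology, where each leaf splits its ports evenly between hosts and uplinks; the article does not describe Stargate's exact topology.

```python
def clos_endpoints(radix: int) -> int:
    """Endpoints reachable in a non-blocking two-tier folded Clos built
    from switches with `radix` ports each. Each leaf dedicates half its
    ports to hosts and half to spine uplinks, giving radix^2 / 2 hosts."""
    return radix ** 2 // 2

# Splitting an 800 Gbps interface into 8 x 100 Gbps lanes raises the
# usable switch radix eightfold (figures from the article):
radix_800g = 64    # ports per switch at 800 Gbps
radix_100g = 512   # same silicon operated at 100 Gbps per port

print(clos_endpoints(radix_800g))  # 2048 GPUs at two tiers
print(clos_endpoints(radix_100g))  # 131072 GPUs, matching the ~131,000 figure
```

The eightfold radix increase grows the two-tier fabric quadratically (64x more endpoints), which is why the plane-splitting trick avoids a third or fourth switch tier.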

To manage path diversity and congestion, the system uses adaptive packet spraying. Older RoCE deployments suffered link congestion because each data transfer was pinned to a single path; MRC spreads a single transfer across hundreds of routes. Each MRC packet carries its full destination memory address, so the receiver can reconstruct the data even when packets arrive out of order. Whenever a congested path is detected, the system shifts traffic to an idle one. The design reduces GPU idle time during synchronized training steps.
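The reassembly idea can be illustrated with a short sketch: each packet carries the absolute byte offset of its payload in the destination buffer, so the receiver places data correctly regardless of arrival order. The field names here are illustrative, not taken from the MRC specification.

```python
import random

def make_packets(message: bytes, mtu: int):
    """Split a transfer into self-describing packets that each carry
    the target offset of their payload within the destination buffer."""
    return [
        {"offset": i, "payload": message[i:i + mtu]}
        for i in range(0, len(message), mtu)
    ]

def receive(packets, total_len: int) -> bytes:
    """Reassemble with no sequencing state: every packet says where it
    belongs, so arrival order across many paths is irrelevant."""
    buf = bytearray(total_len)
    for pkt in packets:
        off = pkt["offset"]
        buf[off:off + len(pkt["payload"])] = pkt["payload"]
    return bytes(buf)

msg = b"all-reduce gradient shard"
pkts = make_packets(msg, mtu=4)
random.shuffle(pkts)  # simulate packets arriving over hundreds of paths
assert receive(pkts, len(msg)) == msg
```

Because no packet depends on its predecessor, the sender is free to spray packets over any available path without a reordering buffer at the receiver.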

For routing, OpenAI has eliminated dynamic protocols such as BGP in favor of IPv6 segment routing. With static source routing, the sender defines the complete route for each packet by embedding switch identifiers in the destination address. Because switches never need to recompute routes after a link fails, reliability is handled at the hardware level: MRC simply stops using a path the moment it detects packet loss on that route. As a result, training jobs survive link flaps and switch reboots that previously caused complete failures or forced restarts from saved checkpoints.
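The forwarding model can be sketched in a few lines. This is an illustrative simplification in the spirit of IPv6 segment routing (SRv6): the sender embeds an ordered list of switch identifiers in the packet, and each hop reads its next segment deterministically instead of consulting a dynamically computed routing table. The field layout is hypothetical.

```python
def build_route(switch_ids):
    """Sender encodes the full path up front; the network never
    recomputes it, so a link failure cannot trigger reconvergence."""
    return {"segments": list(switch_ids), "cursor": 0}

def forward(packet):
    """Executed at each hop: advance the cursor and return the segment
    (the egress) this switch must use, with no routing-table lookup."""
    seg = packet["segments"][packet["cursor"]]
    packet["cursor"] += 1
    return seg

pkt = build_route(["leaf-3", "spine-17", "leaf-9"])
path_taken = [forward(pkt) for _ in range(3)]
print(path_taken)  # ['leaf-3', 'spine-17', 'leaf-9']
```

If a path loses packets, the sender simply stops building routes through the offending switch; nothing in the network itself needs to change state.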

Packet trimming serves as a vital second line of defense against congestion. Under buffer pressure, a switch strips a packet's payload and forwards only its header to the destination. The header gives the receiver an explicit retransmission signal, avoiding the false positives of standard loss detection. The mechanism also keeps the cluster fully operational during maintenance and repairs: OpenAI reports that tier-1 switches have been rebooted during active frontier model training without measurable impact on job progress.
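The trimming behavior described above can be sketched as follows: when the queue is full, the switch drops the payload but still delivers the header, so the receiver learns immediately and exactly which packet to request again rather than waiting on a timeout. The buffer limit and field names are illustrative assumptions.

```python
BUFFER_LIMIT = 3  # packets the switch queue can hold (assumed value)

def switch_enqueue(queue, packet):
    """Enqueue normally, or trim to header-only when the buffer is full."""
    if len(queue) < BUFFER_LIMIT:
        queue.append(packet)
        return None
    # Buffer pressure: discard the payload but still deliver the header.
    return {"header": packet["header"], "trimmed": True}

def receiver(packet, retransmit_requests):
    """A trimmed header is an explicit, unambiguous retransmit signal."""
    if packet.get("trimmed"):
        retransmit_requests.append(packet["header"]["seq"])

queue, nacks = [], []
for seq in range(5):
    pkt = {"header": {"seq": seq}, "payload": b"x" * 4096}
    trimmed = switch_enqueue(queue, pkt)
    if trimmed:
        receiver(trimmed, nacks)

print(nacks)  # [3, 4]: the receiver knows exactly which packets to request
```

Contrast this with a silent drop, where the sender must infer loss from a timeout or duplicate ACKs and may retransmit packets that were never actually lost.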

The protocol runs on the NVIDIA GB200 supercomputers that power Oracle Cloud Infrastructure in Abilene, Texas, and on Microsoft's Fairwater systems. AMD's Pensando technology provided critical input during protocol validation: according to AMD engineering briefs, the Pensando Pollara 400 and the newer Vulcano 800G AI NICs supply the programmability required to implement MRC in hardware. The result is a specification that has moved from theoretical concept to a production-ready foundation capable of handling the full spectrum of AI workloads.

Broadcom supports the effort with its Thor Ultra 800 Gbps NIC and the Tomahawk 5 and Tomahawk 6 switches. Tomahawk 6 drives 512 ports at 200 Gbps, enabling vast two-tier networks that link 128,000 XPUs. Broadcom also developed high-performance packet-trimming logic that keeps switches responsive under 15-to-1 many-to-one traffic. Combined with MRC's sender-driven path selection, the network behaves like a single non-blocking switch at every scale of the supercomputer.

About the author

mgtid
Owner of Technetbook | 10+ Years of Expertise in Technology | Seasoned Writer, Designer, and Programmer | Specialist in In-Depth Tech Reviews and Industry Insights | Passionate about Driving Innovation and Educating the Tech Community Technetbook
