MLPerf Training v6.0 Benchmarks Highlight Shift to Sparse Computation and Mixture of Experts

MLCommons has announced new results for the MLPerf Training v6.0 benchmark suite. The updated industry-standard tests reflect a clear shift in how state-of-the-art AI models are trained: this round introduces two new benchmarks that measure sparse computation, reflecting the rapid adoption of the Mixture of Experts (MoE) architecture across enterprise and research. MLPerf Training is an open-source, peer-reviewed benchmark suite that measures the efficiency of ML hardware, software, and systems.

"We are seeing strong convergence on a set of best practices for training AI models, but at the same time there is increasing technical diversity in the underlying frameworks and systems that are being used to host and run them," said Shriya Rishab, co-chair of the MLPerf Training Working Group.

The two new benchmarks in version 6.0, DeepSeek-V3 and gpt-oss-20b, both measure sparse-computation performance. A Mixture of Experts model uses a learned routing mechanism that sends each token to a small number of specific expert sub-networks. Models built this way can have very large total parameter counts while remaining efficient, because only a subset of the network's parameters is active for each token during training or inference.
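To make the routing idea concrete, here is a minimal sketch of top-k expert routing. It assumes a softmax router and toy scalar "experts"; the function names are hypothetical, and real models (including the two new benchmarks) differ in details such as the scoring function, shared experts, and how tokens are batched across devices.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, k):
    """Return [(expert_index, gate_weight), ...] for the top-k experts.

    Gate weights are renormalized over the selected experts so they sum to 1.
    """
    probs = softmax(router_logits)
    top_k = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    selected_mass = sum(probs[i] for i in top_k)
    return [(i, probs[i] / selected_mass) for i in top_k]


def moe_forward(x, router_logits, experts, k):
    """Combine the outputs of only the k routed experts; the others are never run."""
    return sum(weight * experts[i](x) for i, weight in route_token(router_logits, k))


if __name__ == "__main__":
    experts = [lambda x, a=a: a * x for a in range(4)]  # toy stand-ins for expert sub-networks
    logits = [0.1, 2.0, 0.5, 3.0]                       # hypothetical router scores for one token
    print(route_token(logits, k=2))
    print(moe_forward(1.0, logits, experts, k=2))
```

With four experts and k=2, only half of the expert functions execute for this token; at the scale of the new benchmarks the same mechanism leaves the large majority of expert parameters idle for any given token.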

DeepSeek-V3 is the largest benchmark in the suite, with 671 billion total parameters but only 37 billion active per token, which lets businesses benchmark open-weights models at production scale. The benchmark exercises recent innovations such as Multi-head Latent Attention (MLA) and auxiliary-loss-free load balancing, among others. The smaller of the two new benchmarks, gpt-oss-20b, has 21 billion total parameters, of which 3.6 billion are active per token. It allows smaller organizations to study sparse computation and routing complexity on modest hardware, even a single-node system with eight GPUs.
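The parameter figures imply that roughly 5.5% (37/671) of DeepSeek-V3's weights, and roughly 17% (3.6/21) of the smaller model's, take part in processing any single token. That sparsity only pays off if tokens are spread evenly across experts, which is what load balancing addresses. The DeepSeek-V3 technical report describes its "auxiliary-loss-free" approach as adding a per-expert bias to the routing scores for expert selection only, then nudging each bias down when that expert is overloaded and up when it is underloaded. The sketch below is a toy illustration of that idea under stated assumptions (hypothetical function names, uniform random scores, a fixed step `gamma`); it is not the MLPerf reference implementation.

```python
import random


def select_experts(scores, bias, k):
    """Top-k selection on biased scores; the bias affects routing only."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i] + bias[i], reverse=True)
    return ranked[:k]


def gate_weights(scores, chosen):
    """Gate values use the original, unbiased scores of the chosen experts."""
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def update_bias(bias, loads, gamma):
    """Lower the bias of overloaded experts and raise it for underloaded ones."""
    mean = sum(loads) / len(loads)
    return [
        b - gamma if load > mean else b + gamma if load < mean else b
        for b, load in zip(bias, loads)
    ]


def simulate(steps=300, tokens=64, num_experts=4, k=1, gamma=0.01, skew=0.3, seed=0):
    """Toy training loop in which the router systematically favors expert 0."""
    rng = random.Random(seed)
    bias = [0.0] * num_experts
    history = []  # per-step token counts for each expert
    for _ in range(steps):
        loads = [0] * num_experts
        for _ in range(tokens):
            scores = [rng.random() + (skew if i == 0 else 0.0) for i in range(num_experts)]
            for i in select_experts(scores, bias, k):
                loads[i] += 1
        history.append(loads)
        bias = update_bias(bias, loads, gamma)  # no auxiliary loss term involved
    return history, bias


if __name__ == "__main__":
    history, bias = simulate()
    print("first step loads:", history[0])
    print("last step loads: ", history[-1])
    print("final biases:    ", [round(b, 2) for b in bias])
```

In this toy run expert 0 initially receives far more than its share of tokens; as its bias drifts negative the loads even out, while the gate weights themselves are still computed from the unbiased scores.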

MLPerf Training v6.0 drew the most diverse set of submitted systems to date: 95 unique configurations spanning 13 different accelerators and 19 different hosts. Sixty percent of submissions used multi-node systems. The largest increase by volume came from cloud-based ML training platforms, whose submissions more than doubled compared with the round six months earlier, as businesses embrace both on-premises and cloud training options.

"There are more ways of getting your AI training than ever before. Several companies now offer training systems in the cloud, complementing the on-premises systems that continue to be built out at a furious pace," explained Pavan Yalamanchili, another of the working group's co-chairs. "And we are excited to see so many competitive submissions from a variety of on-premises and cloud providers."

This round also included several different FP4 implementations, as hardware vendors explore lower-precision formats to speed up training. Because MLPerf requires every submission to reach the same target model quality, buyers can see where reduced-precision optimizations deliver real performance gains without sacrificing accuracy.
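To make "FP4" concrete: one common 4-bit floating-point layout, E2M1 (1 sign bit, 2 exponent bits, 1 mantissa bit, as used in the OCP Microscaling MXFP4 format), can represent only the magnitudes 0, 0.5, 1, 1.5, 2, 3, 4 and 6, so values are stored together with a shared scale for each small block. The sketch below is a simplified illustration under that assumption, with hypothetical function names; it is not any vendor's FP4 implementation. Real formats differ in block size, in how the scale itself is encoded (MXFP4, for instance, restricts it to a power of two), and in rounding mode.

```python
import math

# Magnitudes representable by the FP4 E2M1 encoding (the sign is a separate bit).
FP4_E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def quantize_block_fp4(block):
    """Quantize a block of floats to FP4 values plus one shared scale.

    The scale maps the block's largest magnitude onto 6.0, the largest FP4 value.
    Ties round toward the smaller magnitude here (hardware typically uses
    round-to-nearest-even), and the scale is kept as an unrestricted float.
    """
    amax = max(abs(v) for v in block)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        nearest = min(FP4_E2M1_MAGNITUDES, key=lambda q: abs(q - target))
        codes.append(math.copysign(nearest, v))
    return codes, scale


def dequantize_block_fp4(codes, scale):
    """Recover approximate real values from FP4 codes and the block scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.1, -0.7, 1.2, 3.0]
    codes, scale = quantize_block_fp4(weights)
    print(codes, scale)
    print(dequantize_block_fp4(codes, scale))
```

Even this tiny block loses information (0.1 becomes 0.0 and -0.7 becomes -0.75 after the round trip), which is why a fixed quality target matters: a reduced-precision run only counts if the model still trains to the required accuracy.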

Twenty-four organizations submitted verifiable performance data in this round, including major companies such as Dell, AMD, Google, NVIDIA, Supermicro, Azure, and Oracle, as well as first-time submitters such as TTA, Inventec, Netweb Technologies India Ltd., and Vultr.

For complete performance data, see the MLPerf Training results page on the MLCommons website.

About the author

mgtid
Owner of Technetbook | 10+ Years of Expertise in Technology | Seasoned Writer, Designer, and Programmer | Specialist in In-Depth Tech Reviews and Industry Insights | Passionate about Driving Innovation and Educating the Tech Community
