NVIDIA Sweeps MLPerf Training Benchmarks: Achieves Near-Perfect Scaling

NVIDIA, a powerhouse in AI, has once again demonstrated its dominance in the MLPerf arena. With remarkable efficiency and substantial gains from its Hopper-architecture H100 and H200 GPUs, the company solidifies its position as an AI leader.

The AI Imperative

AI computational demands are skyrocketing, especially since the advent of transformers. Within just two years, requirements have grown 256-fold. NVIDIA recognizes this explosive growth and continues to push boundaries.
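
As a back-of-envelope check, a 256-fold increase over two years implies a doubling roughly every three months (the 256x and two-year figures are from the article; the arithmetic below is illustrative):

```python
import math

# The article cites 256x growth in AI compute demand over two years.
# If growth is exponential, the implied doubling time is:
growth = 256   # total growth factor (from the article)
months = 24    # period in months

doublings = math.log2(growth)       # 256 = 2^8 -> 8 doublings
doubling_time = months / doublings  # months per doubling

print(f"{doublings:.0f} doublings -> one doubling every {doubling_time:.1f} months")
# -> 8 doublings -> one doubling every 3.0 months
```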

Performance Matters

Performance directly impacts ROI. NVIDIA focuses on three critical segments:

  1. Training: Faster, more intelligent models are essential.
  2. Inference: Instant responses for user experiences (think ChatGPT).
  3. Business Opportunities: LLM service providers can turn $1 into $7 over four years by running Llama 3 70B on NVIDIA HGX H200 servers.
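
A hedged sketch of what that claim implies as a compound annual return (the $7-over-four-years figure is the article's; the annualization below is illustrative, ignoring cash-flow timing):

```python
# The article's claim: $1 of server spend returns $7 over four years.
# Back-of-envelope: the implied compound annual rate of return.
revenue_multiple = 7.0  # from the article
years = 4

annual_rate = revenue_multiple ** (1 / years) - 1
print(f"Implied compound annual return: {annual_rate:.1%}")
# -> Implied compound annual return: 62.7%
```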

Hopper H200: Supercharging AI

  • The H200 Tensor Core GPU, building on the Hopper architecture, boasts 141GB of HBM3e memory and roughly 40% more bandwidth than the H100.
  • In MLPerf Training v4.0, the H200 delivered up to a 14% performance boost over the H100.
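
The ~40% bandwidth figure is consistent with public datasheet numbers (assumed here, not stated in the article): roughly 3.35 TB/s for the H100 SXM versus 4.8 TB/s for the H200:

```python
# Sanity check on the ~40% bandwidth claim, using public datasheet
# figures (assumptions, not from the article):
h100_bw = 3.35  # TB/s, H100 SXM (assumed)
h200_bw = 4.8   # TB/s, H200 (assumed)

uplift = h200_bw / h100_bw - 1
print(f"Bandwidth uplift: {uplift:.0%}")
# -> Bandwidth uplift: 43%
```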

Software Enhancements

  • NVIDIA’s software stack optimizations yielded a 27% speed increase on a 512 H100 GPU configuration compared with last year’s submission.
  • Near-linear scaling: as GPU count grew 3.2x, delivered performance grew nearly 3.2x.
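
Scaling efficiency here is simply measured speedup divided by the increase in GPU count. A quick sketch (the 3.2x figures are from the article; the sub-linear second example is hypothetical, for contrast):

```python
def scaling_efficiency(speedup: float, scale_factor: float) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfect)."""
    return speedup / scale_factor

# The article's figures: GPU count and performance both grew 3.2x.
print(f"{scaling_efficiency(3.2, 3.2):.0%}")  # -> 100%

# A hypothetical sub-linear run for comparison: 2.9x speedup at 3.2x scale.
print(f"{scaling_efficiency(2.9, 3.2):.0%}")  # -> 91%
```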

LLM Fine-Tuning Excellence

  • NVIDIA’s platform scaled effortlessly from eight to 1,024 GPUs, making it versatile for various business needs.
  • Text-to-image training also improved: Stable Diffusion v2 training performance accelerated by up to 80% at the same system scales.

Record-Breaking Benchmarks

NVIDIA shattered its own records in MLPerf Training v4.0:

  • Graph Neural Network R-GAT: 1.1 minutes (512 H100 GPUs)
  • LLM Fine-Tuning Llama 2 70B-LoRA: 1.5 minutes (1,024 H100 GPUs)
  • LLM GPT-3 175B: 3.4 minutes (11,616 H100 GPUs)
  • And more…

Scaling and Future Prospects

  • The EOS-DFW SuperPOD now features 11,616 H100 GPUs interconnected via NVIDIA Quantum-2 InfiniBand networking at 400Gb/s.
  • Hopper GPUs continue to improve through software alone, as the 27% year-over-year gain and sustained throughput show.

“The More You Buy, The More You Make”

NVIDIA’s AI factories, equipped with 100,000 to 300,000 GPUs, are set to revolutionize AI deployment.


NVIDIA’s hardware and software prowess continues to impress. Expect even greater performance from upcoming software-stack releases.