NVIDIA's 550B Nemotron Embarrassed Every US Open Model — and It Shouldn't Run This Fast

NVIDIA just shipped a 550B-parameter open model that scores 48 on the Artificial Analysis Intelligence Index. The next-best American open-weights model, Google’s Gemma 4, sits at 39. OpenAI’s gpt-oss-120b sits at 33. NVIDIA’s own previous flagship, Nemotron 3 Super, sits at 36.

That is not a close race. That is a 9-to-15-point lead over every other open model the United States has produced — and NVIDIA, a company most people still think of as “the GPU company,” is the one holding the trophy.
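The gap is easy to check from the four scores quoted above. Here is a minimal sketch of the arithmetic; the dictionary layout and variable names are illustrative only, and the key for the new model uses the Nemotron 3 Ultra name given later in the article.

```python
# Artificial Analysis Intelligence Index scores as quoted in this article.
scores = {
    "Nemotron 3 Ultra": 48,   # the new 550B model
    "Gemma 4": 39,
    "Nemotron 3 Super": 36,
    "gpt-oss-120b": 33,
}

leader = "Nemotron 3 Ultra"

# Point gap between the leader and each of the other quoted models.
gaps = {
    name: scores[leader] - score
    for name, score in scores.items()
    if name != leader
}

# Print the gaps from smallest to largest.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap} points behind")

print(f"Lead ranges from {min(gaps.values())} to {max(gaps.values())} points")
```

The smallest gap (to Gemma 4) and the largest (to gpt-oss-120b) are the two ends of the 9-to-15-point range; NVIDIA's own Nemotron 3 Super falls in between at 12.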

I spent the morning digging through the release notes, the Artificial Analysis evaluation, and the Hugging Face model cards. Two numbers refused to leave me alone. The first is 550 billion — the parameter count of a model NVIDIA is giving away under a commercial-friendly license. The second is 300 — the tokens per second this half-trillion-parameter model serves on a single pre-release endpoint, while comparable Chinese frontier models crawl along at 50 to 100. A model this big is not supposed to be this fast. This one is. Here is what actually happened, and the uncomfortable footnote NVIDIA buried in the keynote.

What NVIDIA actually announced at Computex

Jensen Huang unveiled Nemotron 3 Ultra during his Computex keynote in Taipei on June 1, 2026. It is the largest open language model NVIDIA has ever released — roughly 500 billion total parameters by NVIDIA’s count…