Nvidia launched Nemotron 3 Super, a 120B open-weight AI model optimized for autonomous agents and ultra-long-context tasks.
The hybrid Mamba-Transformer MoE architecture delivers faster reasoning and over 5× the throughput while running at 4-bit precision.
Nvidia's $26 billion investment in open-source AI aims to counter China's rise in the field.
Nvidia just shipped Nemotron 3 Super, a 120-billion-parameter open-weight model built to do one thing well: run autonomous AI agents without bleeding your compute budget dry.
That's not a small problem. Multi-agent systems generate far more tokens than a normal chat: every tool call, reasoning step, and slice of context gets re-sent from scratch. As a result, costs explode, models tend to drift, and the agents slowly forget what they were supposed to be doing in the first place… or at least drop in accuracy.
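For intuition, here's a back-of-the-envelope sketch in Python. The token counts are made-up assumptions, not measurements, but they show why the bill grows roughly quadratically when every step re-sends the full history and nothing is cached:

```python
# Toy cost model: each agent step re-sends the entire context so far.
# All numbers below are illustrative assumptions, not benchmarks.

SYSTEM_PROMPT = 2_000      # tokens of instructions/tool specs sent every step
TOKENS_PER_STEP = 500      # new tokens each step adds (tool output, reasoning)

def total_input_tokens(steps: int) -> int:
    """Sum of tokens re-sent across all steps when context is never cached."""
    total = 0
    context = SYSTEM_PROMPT
    for _ in range(steps):
        total += context            # the whole history goes back in
        context += TOKENS_PER_STEP  # and the history keeps growing
    return total

for steps in (10, 50, 200):
    print(f"{steps:>4} steps -> {total_input_tokens(steps):>12,} input tokens")
# 10 steps -> 42,500 tokens; 200 steps -> ~10.35M tokens. Costs explode.
```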
Nemotron 3 Super is Nvidia's answer to all of that. The model runs 12 billion active parameters out of 120 billion total, using a mixture-of-experts (MoE) design that keeps inference cheap while retaining the reasoning depth complex workflows need. It packs a 1-million-token context window, so agents can hold an entire codebase, or nearly 750,000 words, in memory before collapsing.
To build the model, Nvidia combined three components that rarely appear together in the same architecture: Mamba-2 state-space layers, a faster, memory-efficient alternative to attention for handling long token streams; Transformer attention layers for precise recall; and a new "Latent MoE" design that compresses token embeddings before routing them to experts. That lets the model activate four times as many experts at the same compute cost.
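Nvidia hasn't published the implementation here, but the routing idea can be sketched in a few lines of PyTorch. Everything below, from the dimensions to the top-k value, is an illustrative assumption rather than the actual Nemotron architecture:

```python
import torch
import torch.nn as nn

class ToyLatentMoE(nn.Module):
    """Sketch of the latent-MoE idea: compress tokens to a small latent space,
    route the latent to a few experts, then project back up. Made-up sizes."""

    def __init__(self, d_model=1024, d_latent=256, n_experts=64, top_k=4):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)   # compress before routing
        self.up = nn.Linear(d_latent, d_model)     # decompress after experts
        self.router = nn.Linear(d_latent, n_experts)
        # Experts operate on the cheap latent vectors, not full-width
        # embeddings, so more of them can fire per token at the same cost.
        self.experts = nn.ModuleList(
            nn.Linear(d_latent, d_latent) for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                           # x: (tokens, d_model)
        z = self.down(x)                            # (tokens, d_latent)
        weights, idx = self.router(z).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(z)
        for t in range(z.shape[0]):                 # naive per-token dispatch
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[int(e)](z[t])
        return self.up(out)                         # back to model width

x = torch.randn(8, 1024)
print(ToyLatentMoE()(x).shape)                      # torch.Size([8, 1024])
```

The point of the down-projection is that the routing and the expert math both happen in the cheaper latent space, which is what lets a fixed compute budget cover more active experts per token.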
Introducing NVIDIA Nemotron 3 Super 🎉
Open 120B-parameter (12B active) hybrid Mamba-Transformer MoE model
Native 1M-token context
Built for compute-efficient, high-accuracy multi-agent applications
Plus, fully open weights, datasets and recipes for easy customization and… pic.twitter.com/kMFI23noFc
— NVIDIA AI Developer (@NVIDIAAIDev) March 11, 2026
The model was also pretrained natively in NVFP4, Nvidia's 4-bit floating-point format. In practice, that means the system learned to operate accurately within 4-bit arithmetic from the very first gradient update, rather than being trained at high precision and compressed afterward, a step that often costs models accuracy.
For context, a model's precision is measured in bits. Full precision, known as FP32, is the gold standard, but it is also extremely expensive to run at scale. Developers often reduce precision to save compute while trying to preserve useful performance.
Think of it like shrinking a 4K image down to 1080p: the picture still looks the same at a glance, just with less detail. Normally, dropping from 32-bit precision all the way down to 4-bit would cripple a model's reasoning ability. Nemotron avoids that problem by learning to operate at low precision from the start, instead of being squeezed into it later.
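A rough NumPy sketch makes the trade-off concrete. This mimics generic 4-bit integer quantization for illustration only; NVFP4 itself is a 4-bit floating-point format with scaling factors, which this does not reproduce:

```python
import numpy as np

def quantize_4bit(w: np.ndarray) -> np.ndarray:
    """Naively round weights to one of 15 symmetric levels (~4 bits), then
    map them back. Simple integer quantization, not Nvidia's NVFP4 format."""
    scale = np.abs(w).max() / 7               # use the symmetric range -7..7
    q = np.clip(np.round(w / scale), -7, 7)   # snap each weight to the grid
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=10_000)          # toy "weight matrix"
w4 = quantize_4bit(w)
print(f"mean abs rounding error: {np.abs(w - w4).mean():.6f}")
# Post-hoc rounding injects error the model never trained against.
# Training natively in 4-bit lets gradients adapt to the grid from step one.
```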
Compared to its own predecessor, Nemotron 3 Super delivers more than five times the throughput. Against external rivals, it's 2.2x faster than OpenAI's GPT-OSS 120B on inference throughput, and 7.5x faster than Alibaba's Qwen3.5-122B.
We ran our own quick test. The reasoning held up well, including on prompts that were deliberately vague, badly worded, or based on wrong information. The model caught small errors in context without being asked to, handled math and logic problems cleanly, and didn't crumble when the question itself was slightly off.
The full training pipeline is public: weights on Hugging Face, 10 trillion curated pretraining tokens seen over 25 trillion total during training, 40 million post-training samples, and reinforcement learning recipes across 21 environment configurations. Perplexity, Palantir, Cadence, and Siemens are already integrating the model into their workflows.
The $26 billion bet
The model may be one piece of a larger strategy. A 2025 financial filing shows Nvidia plans to spend $26 billion over the next five years building open-weight AI models. Executives have confirmed it, too.
Bryan Catanzaro, VP of applied deep learning research, told Wired the company recently finished pretraining a 550-billion-parameter model. Nvidia released its first Nemotron model back in November 2023, but that filing makes clear this is no longer a side project.
The investment is strategic, considering Nvidia's chips are still the default infrastructure for training and running frontier models. Models tuned to its hardware give customers a built-in reason to stay on Nvidia despite rivals' efforts to push alternative hardware. But there's a more urgent pressure behind the move: America is losing the open-source AI race, and losing it fast.
Chinese open models went from barely 1.2% of global open-model usage in late 2024 to roughly 30% by the end of 2025, according to analysis by OpenRouter and Andreessen Horowitz. Alibaba's Qwen overtook Meta's Llama as the most-used self-hosted open-source model, according to Runpod. American companies including Airbnb have adopted it for customer service. Startups worldwide are building on top of it. Beyond market share, that kind of adoption creates infrastructure dependencies that are hard to reverse.
While U.S. giants like OpenAI, Anthropic, and Google keep their best models locked behind APIs, Chinese labs from DeepSeek to Alibaba have been flooding the open ecosystem. Meta was the only major American player competing in open source with Llama, but Zuckerberg recently signaled the company may not make future models fully open.
The gap between "best proprietary model" and "best open model" used to be huge, and in America's favor. That gap is now very small, and the open side of the ledger is increasingly Chinese.
Incredible graph. In just one year, China completely overtook the U.S. in free AI models.
Not a single U.S. model in the top 5 today, when last year the top 3 were all American. pic.twitter.com/34ErpBv8rg
— Arnaud Bertrand (@RnaudBertrand) October 14, 2025
There's also a hardware threat underneath all of this. A new DeepSeek model is widely expected to drop soon, and it's rumored to have been trained entirely on chips made by Huawei, a sanctioned Chinese company. If that's confirmed, it would give developers around the world, particularly in China, a concrete reason to start testing Huawei's hardware. China's Zhipu AI is already doing that.
That's the scenario Nvidia most needs to prevent: Chinese open models and Chinese chips building an ecosystem that doesn't need Nvidia at all.