NVIDIA Launches Nemotron 3 Super, an Open-Weights Model for Autonomous Agentic AI
NVIDIA has launched Nemotron 3 Super, an open-weights model designed specifically for autonomous agentic AI. The multi-agent systems expected to emerge by 2026 face two main obstacles: context explosion, where accumulated history exceeds the limits of conventional models, and the "thinking tax," where expensive reasoning steps stall long-running systems. Nemotron 3 Super tackles both with a 120-billion-parameter Mixture-of-Experts (MoE) architecture that activates only 12 billion parameters per token.
The model's hybrid backbone interleaves several neural-network families, delivering four times the computational efficiency of its predecessors:
- Mamba-2 layers: linear-time sequence processing makes a full 1-million-token context window practical.
- Transformer attention: interleaved attention layers provide exact associative recall, letting the model retrieve specific data points from extensive inputs.
- Latent MoE: token embeddings are projected into a low-rank latent space before routing, so the model can activate four specialized experts at roughly the processing cost of one.
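The latent-routing idea can be sketched in a few lines: project the token embedding down, score the experts in that cheap low-rank space, and gate the top k. Everything here (dimensions, weight names, the softmax gate) is an illustrative assumption, not NVIDIA's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

D_MODEL, D_LATENT = 64, 8      # hypothetical sizes; routing happens in the small space
N_EXPERTS, TOP_K = 32, 4       # activate 4 experts per token

# Illustrative parameters, not real model weights.
W_down = rng.standard_normal((D_MODEL, D_LATENT)) / np.sqrt(D_MODEL)
router = rng.standard_normal((D_LATENT, N_EXPERTS)) / np.sqrt(D_LATENT)

def latent_moe_route(x):
    """Project a token embedding into latent space, then pick the top-k experts."""
    z = x @ W_down                       # (D_LATENT,) low-rank projection
    logits = z @ router                  # (N_EXPERTS,) routing scores
    top = np.argsort(logits)[-TOP_K:]    # indices of the k highest-scoring experts
    w = np.exp(logits[top] - logits[top].max())
    return top, w / w.sum()              # expert ids + normalized gate weights

experts, gates = latent_moe_route(rng.standard_normal(D_MODEL))
print(len(experts), round(float(gates.sum()), 6))  # 4 1.0
```

The point of the low-rank detour is that the router's matmul cost scales with the latent width, not the full model width, so scoring 32 experts stays cheap.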
Agentic workflows in 2026 generate roughly 15 times more tokens than standard chat because they repeatedly reprocess prior conversation history. Nemotron 3 Super addresses this with a native 1M-token context window, preventing goal drift by keeping the original objective in context throughout long-running tasks. The model also uses Multi-Token Prediction (MTP), predicting several upcoming tokens per step to cut generation latency, which yields a roughly threefold speedup on code and structured-data generation.
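The MTP speedup comes from amortizing forward passes: propose k tokens at once and keep the longest prefix a verifier accepts. The toy below invents a predictor (and its error pattern) purely for illustration; it is not NVIDIA's decoding code:

```python
# Toy simulation of multi-token prediction (MTP): the model proposes
# k tokens per forward pass instead of one, and a verifier accepts the
# longest correct prefix. All names and behavior here are illustrative.

TARGET = list(range(100))  # stand-in for the "correct" token stream

def predict_block(pos, k=4):
    """Hypothetical MTP head: propose the next k tokens in one pass."""
    # A real model is sometimes wrong; here we mis-predict every 7th token.
    return [t if t % 7 else t + 1 for t in TARGET[pos:pos + k]]

def generate(k=4):
    out, passes = [], 0
    while len(out) < len(TARGET):
        passes += 1
        block = predict_block(len(out), k)
        for tok, want in zip(block, TARGET[len(out):]):
            if tok != want:          # verifier rejects; fall back to one token
                out.append(want)
                break
            out.append(tok)
    return out, passes

tokens, passes = generate()
print(passes)  # far fewer forward passes than 100 one-token steps
```

Each pass emits between one and k tokens, so throughput approaches k-fold when predictions are mostly right, which is why structured output like code benefits most.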
Unlike models that apply quantization after training, Nemotron 3 Super was pretrained natively in NVFP4 (4-bit floating point), a format tuned for NVIDIA's Blackwell architecture on the B200 GPU platform. Native pretraining preserves accuracy while shrinking the memory footprint and lifting inference throughput to four times the FP8 performance of older H100 systems.
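The numerics can be approximated with a small sketch: snap values to the signed FP4 (E2M1) grid under a per-block scale. The grid magnitudes are the standard E2M1 values; the block size and max-based scaling rule are simplifying assumptions rather than the exact NVFP4 recipe:

```python
import numpy as np

# Representable magnitudes of a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1 = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
GRID = np.concatenate([-E2M1[::-1], E2M1])   # full signed grid
BLOCK = 16                                   # assumed block size for the shared scale

def quantize_block(x):
    """Scale a block so its max maps to 6.0, then snap each value to the grid."""
    scale = np.abs(x).max() / 6.0
    if scale == 0:
        scale = 1.0
    q = GRID[np.abs(x[:, None] / scale - GRID).argmin(axis=1)]
    return q * scale                          # dequantized approximation

rng = np.random.default_rng(1)
x = rng.standard_normal(BLOCK).astype(np.float32)
xq = quantize_block(x)
print(np.abs(x - xq).max() < np.abs(x).max())  # True: coarse but bounded error
```

The per-block scale is what makes 4-bit storage usable: each block of 16 values shares one scale factor, so the grid adapts to local dynamic range instead of the whole tensor's.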
Nemotron 3 Super was pretrained on 25 trillion tokens, including specialized coding and complex-reasoning datasets, and its performance holds up on benchmarks built to 2026 standards:
- PinchBench: scored 85.6%, leading its class in autonomous-agent performance.
- DeepResearch Bench: top score, demonstrating sustained focus through extended research projects.
Post-training used reinforcement learning across 21 environments in NeMo Gym, with 1.2 million environment rollouts to verify tool calling and plan execution.
NVIDIA has released Nemotron 3 Super with open weights and complete training recipes. It is available through the following channels:
- NVIDIA NIM: packaged as a microservice for rapid deployment on premises or in the cloud.
- Open ecosystems: available on Hugging Face, Perplexity, and OpenRouter.
- Infrastructure partners: supported on Google Cloud Vertex AI, Oracle Cloud Infrastructure, and CoreWeave.
By combining Mamba's sequence efficiency with Transformer-level reasoning, Nemotron 3 Super delivers the advanced capabilities that software developers, cybersecurity triagers, and molecular-data analysts need for large-scale work.
