NVIDIA Open-Sources 120B-Parameter Agent Model Nemotron 3 Super: Only 10% of Parameters Activated, Throughput Up to 5 Times Higher Than the Previous Generation

According to CoinWorld, citing monitoring by 1M AI News, NVIDIA has released Nemotron 3 Super, an open-source large language model designed for multi-agent applications. The model has 120 billion total parameters, uses a hybrid Mamba-Transformer MoE architecture, and activates only 12 billion parameters (10%) per token during inference. Its core technique, "Latent MoE," compresses token embeddings into a low-rank latent space before routing them to expert networks, so that four experts can be activated at roughly the computational cost of a single full-width expert (a sketch of the idea follows below). NVIDIA reports up to a fivefold increase in inference throughput over the previous-generation Nemotron Super.

The model natively supports a 1-million-token context window, suited to autonomous agents that must maintain workflow state over long horizons. On the PinchBench benchmark for agent workloads, Nemotron 3 Super scored 85.6%, the highest among comparable open-source models.

NVIDIA also open-sourced more than 10 trillion tokens of training data, 15 reinforcement learning training environments, and evaluation suites, all under the NVIDIA Nemotron Open Model License. The model is available on platforms including Hugging Face, build.nvidia.com, Perplexity, and OpenRouter, and can be deployed via cloud services including Google Cloud, Oracle, AWS Bedrock, and Azure. Perplexity, CodeRabbit, Cadence, Dassault Systèmes, and Siemens have already adopted it.
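To make the "Latent MoE" idea concrete, here is a minimal PyTorch sketch of such a layer, based only on the description above. All class names, dimensions, and the routing scheme are illustrative assumptions, not NVIDIA's implementation; the point is the FLOP accounting: with a latent width of half the model width, four latent-space experts cost about as much as one full-width expert.

```python
# Illustrative "latent MoE" layer. Names, sizes, and routing are hypothetical
# assumptions drawn from the article's description, not NVIDIA's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMoE(nn.Module):
    def __init__(self, d_model=4096, d_latent=2048, n_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.down = nn.Linear(d_model, d_latent, bias=False)  # compress tokens into latent space
        self.up = nn.Linear(d_latent, d_model, bias=False)    # project expert output back
        self.router = nn.Linear(d_latent, n_experts, bias=False)
        # Experts act on the latent space. A two-layer MLP d -> 4d -> d costs
        # ~8*d^2 multiply-adds per token, so with d_latent = d_model / 2 the
        # four latent experts together (4 * 8 * (d_model/2)^2 = 8 * d_model^2)
        # cost about the same as a single full-width expert.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_latent, 4 * d_latent, bias=False),
                nn.SiLU(),
                nn.Linear(4 * d_latent, d_latent, bias=False),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: (batch, seq, d_model)
        z = self.down(x)                                   # (batch, seq, d_latent)
        gates = F.softmax(self.router(z), dim=-1)          # routing happens in latent space
        weights, idx = gates.topk(self.top_k, dim=-1)      # pick top_k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(z)
        for e, expert in enumerate(self.experts):          # per-expert dispatch, written as
            for k in range(self.top_k):                    # plain loops for readability
                mask = idx[..., k] == e                    # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(z[mask])
        return self.up(out)                                # back to (batch, seq, d_model)
```

A production kernel would batch this dispatch rather than loop over experts, but the sketch shows the claimed mechanism: both the router and the experts operate entirely in the compressed latent space, which is what lets several experts fire for the price of one.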
