No one understands "next-generation AI computing" better than NVIDIA.
Source: GeekPark
Written by: Xu Shan
“I am confident that this is a grand feast of technology. Every person here today represents NVIDIA’s ecosystem.” At the opening of GTC 2026, the man who always wears a black leather jacket—Jensen Huang—took the stage with his signature confidence.
Unlike previous years, this year marks the 20th anniversary of the CUDA ecosystem. Right from the start, Huang recounted how NVIDIA built its moat step by step, from GeForce to ray tracing and now to AI agents. “Today, the CUDA ecosystem has formed a data flywheel.” As he told it, the CUDA ecosystem he personally created has already grown into a business empire.
At this release event, Huang introduced Vera Rubin, designed specifically for AI agents, with a computing power of 3.6 EFLOPS. Coupled with the latest rack, the new system achieves a 35-fold increase in throughput per megawatt. Additionally, he launched a CPU called Vera CPU, optimized for extremely high single-thread performance.
More importantly, NVIDIA also unveiled NemoClaw, an enterprise-grade OpenClaw reference solution. You can download, use, and modify it directly, and connect it to SaaS companies worldwide. By comparison, robotics, autonomous driving, and quantum computing received noticeably less attention in this year's keynote.
After watching this speech, you get a sense that we are at the starting point of a new computing platform revolution. Just like the emergence of Google and Amazon during the last wave of transformation, this AI explosion is nurturing a new batch of “AI giants” that will change the world.
Huang’s confidence comes from the long list of partner companies he can name at every stage. He is not only defining computing but also using AI as a fulcrum to pull the world’s finance, healthcare, manufacturing, and retail industries into an “AI era” dominated by GPUs.
Once again, NVIDIA sees the future of AI. Now, it is pulling the whole world to jump into it together.
01 Vera Rubin: 7 Breakthrough Chips, 5 Rack-Scale Systems, 1 Supercomputer
“In just ten years, computing power has achieved a leap of 40 million times.” NVIDIA reviewed how, over the past decade, computing power has grown rapidly along the three scaling laws, from pretraining through fine-tuning to inference, driving the development of general intelligent systems, with demand still growing exponentially.
Entering the AI agent era, NVIDIA introduced Vera Rubin. It is designed for the entire lifecycle of AI agents, redefining the CPU, storage, networking, and security requirements of AI agents from the chip level up.
In terms of parameters, Vera Rubin is equipped with NVLink 6, reaching a total computing power of 3.6 EFLOPS, becoming a key engine driving the AI agent era. Moreover, compared to the previous generation, Vera Rubin systems are 100% liquid-cooled, eliminating all traditional cables.
In terms of overall configuration, the Vera CPU rack is designed for orchestration and general workloads. The STX rack, based on BlueField-4, provides AI-native storage. With Spectrum-6 integrated optical packaging technology, it achieves horizontal scalability, greatly improving energy efficiency and reliability.
The Groq 3 LPX rack is deeply interconnected with Vera Rubin, with the Groq LPU integrating 230MB of on-chip SRAM, further boosting overall computational speed.
Together, this entire system increases throughput per megawatt by 35 times. Undoubtedly, the Vera Rubin platform, built with seven chips and five major rack-level computers, creates a revolutionary AI supercomputer for general intelligence.
Turning to product design, Huang argued that large language models will keep getting larger, generating more tokens and generating them faster, which lets them think more quickly. But they also need frequent memory access, putting enormous pressure on memory systems, including KV caches, structured data (QDF), and unstructured data (QVS). This demands a complete rewrite of storage systems for the AI era.
In the agentic era, AI will use a variety of tools and needs high-speed web browsing and virtual PC tools, so these PCs and computing nodes must be faster. NVIDIA has developed a new CPU, the Vera CPU, optimized for extremely high single-thread performance, with high data throughput, strong data-processing efficiency, and good energy efficiency. It is the world’s only data center CPU using LPDDR5X, offering top single-thread and per-watt performance at a competitive cost.
Vera Rubin family|Source: NVIDIA
“We built this CPU to work in harmony with the entire rack to support AI agent processing tasks. This product is also in mass production. We never intended to sell CPUs separately, but now our standalone CPU sales are very significant. This will undoubtedly become a multi-billion dollar business for us. I am very proud of our CPU architecture team,” Huang said.
He also demonstrated Rubin Ultra on site. Unlike the standard Rubin, which slots in horizontally, Rubin Ultra is inserted vertically into a new Groq rack. “This Groq rack is very heavy—I definitely can’t lift it, so I won’t try.”
Groq rack|Source: NVIDIA
On the rear midboard, NVIDIA no longer uses traditional copper cables. Huang noted that copper cables have transmission-distance limits, so the 144 GPUs are instead connected through an NVLink system whose switch modules are also installed vertically, plugging into the rear midboard. The front handles computation, the back is the NVLink switch, and together they form one massive computer.
Returning to the core question: how much tangible benefit can the new chip architecture bring? Huang mentioned that chip design will ultimately influence the market positioning and pricing of future tokens.
“Tokens are a new commodity. Like all commodities, once they cross a tipping point and mature, they will be stratified and graded.” He outlined future token classifications:
“The larger the model, the smarter it is; the longer the input context, the more accurate and relevant the results; and the faster the speed, the more thorough the thinking and iteration, making the AI smarter. As models become more intelligent, each step up can command a higher price, say $45 per tier.” He believes future token consumption will change everything.
He envisions a researcher using 50 million tokens daily: at $150 per million tokens, that cost is entirely acceptable for a research team. “This is the future of AI.” From the customer’s perspective, he suggests allocating computing resources across tiers, for example 25% free, 25% mid-tier, 25% high-end, and 25% premium. Even if a data center draws only 1 gigawatt, customers can decide how to distribute these resources: free tiers attract more users, while the high-end serves the most valuable clients. These combinations ultimately determine revenue.
Based on this simple model, Huang states that using the Blackwell platform can achieve five times the revenue growth of Hopper, and Vera Rubin can bring five times the revenue of Blackwell.
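The arithmetic behind this pitch can be sketched in a few lines. In the model below, only the researcher figures (50 million tokens a day at $150 per million) come from the keynote; the per-tier prices and the total daily capacity are hypothetical numbers chosen purely for illustration.

```python
# Back-of-envelope model of the token-tier economics described above.
# Researcher numbers are from the talk; tier prices and total capacity
# are hypothetical illustrations.

def daily_token_cost(tokens_per_day: float, price_per_million: float) -> float:
    """Dollar cost of one user's daily token consumption."""
    return tokens_per_day / 1_000_000 * price_per_million

# Huang's example: 50 million tokens/day at $150 per million tokens.
researcher_cost = daily_token_cost(50_000_000, 150)  # 7500.0 dollars/day

def tier_revenue(total_tokens_per_day: float, allocation: dict) -> float:
    """Daily revenue when capacity is split across pricing tiers.

    allocation maps tier name -> (share of capacity, price per million tokens);
    the shares mirror the 25/25/25/25 split suggested in the keynote.
    """
    return sum(total_tokens_per_day * share / 1_000_000 * price
               for share, price in allocation.values())

# Hypothetical prices per million tokens for each tier.
allocation = {
    "free":    (0.25, 0.0),
    "mid":     (0.25, 15.0),
    "high":    (0.25, 60.0),
    "premium": (0.25, 150.0),
}
# Assume 1 trillion tokens/day of serving capacity.
revenue = tier_revenue(1e12, allocation)
```

The point of the model is that revenue scales with both token throughput and the price mix, which is why Huang ties each hardware generation's throughput gain directly to a revenue multiple.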
Vera Rubin|Source: NVIDIA
Groq’s computing system is a deterministic dataflow processor built on static compilation and compiler-driven scheduling. All timing, such as data transfer, computation execution, and data synchronization, is pre-planned by the compiler, with no dynamic scheduling. The architecture relies on large amounts of on-chip SRAM rather than external memory and is designed specifically for inference workloads.
Currently, Groq 3 LPU has entered mass production, expected to start shipping around Q3 this year, under the product name Groq LP.
As for Vera Rubin, although early samples of Grace Blackwell required complex debugging and simulation of 72 interconnects, the Vera Rubin samples have completed testing, and the first Vera Rubin rack is already online and running smoothly on Microsoft Azure.
Currently, NVIDIA is rapidly producing Vera Rubin racks and GB300 racks at full capacity, with a supply chain capable of producing thousands of systems weekly.
NVIDIA’s next chip platform architecture is called the Feynman architecture.
Moreover, both Groq and Vera Rubin will be core components of NVIDIA’s AI factory.
Groq’s chip has only 500MB of storage; a Rubin GPU chip will have 288GB of HBM4 memory.
Storing all the parameters of a trillion-parameter model on Groq chips would require a great many chips. But placing Groq next to Vera Rubin lets the massive KV caches that generative AI systems need live on Rubin instead.
Thus, NVIDIA has restructured AI inference resource allocation—matching the most suitable work to the most suitable chips.
In Huang’s vision, the compute-heavy parts of the model, such as attention over the input context, can be handled by Vera Rubin, while token generation during decoding runs on Groq chips.
Through a special Ethernet coupling mode, these two chipsets can reduce Alpamayo latency by nearly half. With NVIDIA’s Dynamo software for scheduling and integration, the Vera Rubin architecture combined with Groq LPU has boosted high-level inference performance by 35 times.
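The division of labor described above is essentially disaggregated inference: compute-bound prefill on one device pool, memory-bandwidth-bound decode on another, with the KV cache handed across the interconnect. The sketch below illustrates that idea only; the class names and stub pools are hypothetical, and NVIDIA's actual scheduler is the Dynamo software mentioned in the keynote, not this code.

```python
# Conceptual sketch of disaggregated inference: prefill on a compute-rich
# pool (e.g. Vera Rubin), per-token decode on a low-latency pool (e.g. Groq
# LPUs). All names here are illustrative stand-ins, not a real API.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: list      # input context to prefill
    max_new_tokens: int      # decode budget

class StubPrefillPool:
    def prefill(self, tokens):
        # Process the whole prompt at once (compute-bound) and return a
        # stand-in for the resulting KV cache.
        return {"ctx": list(tokens)}

class StubDecodePool:
    def step(self, kv_cache, generated):
        # Generate one token from the cache (memory-bandwidth-bound).
        # A dummy rule stands in for the model's sampling step.
        return len(kv_cache["ctx"]) + len(generated)

class DisaggregatedScheduler:
    def __init__(self, prefill_pool, decode_pool):
        self.prefill_pool = prefill_pool
        self.decode_pool = decode_pool

    def run(self, req: Request) -> list:
        # 1) Prefill once on the compute-rich pool; the KV cache is then
        #    shared with the decode pool over the interconnect.
        kv_cache = self.prefill_pool.prefill(req.prompt_tokens)
        # 2) Decode token by token on the low-latency pool.
        out = []
        for _ in range(req.max_new_tokens):
            out.append(self.decode_pool.step(kv_cache, out))
        return out
```

The design point is that each phase lands on the hardware whose memory hierarchy suits it, which is the routing job the article attributes to Dynamo.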
02 NemoClaw: A Commercial Reference for AIOS
“OpenClaw is the most popular open-source project in human history, achieving this in just a few weeks. Its development speed even surpasses that of Linux back in the day,” Huang said. OpenClaw can interact with any modality, understand commands, and send messages, texts, or emails. It has complete IO capabilities.
“OpenClaw has open-sourced the operating system for intelligent agents. Just like Windows enabled the creation of personal computers, OpenClaw now enables the creation of personal intelligent agents.”
Huang believes that for every company, every software firm, and every tech CEO, the most critical question today is: what is your OpenClaw strategy?
“Just as we all needed Linux strategies, HTTP, HTML strategies—these launched the internet era; and Kubernetes strategies—these created the mobile cloud era. Today, every company in the world must have an OpenClaw strategy, which is an intelligent agent system strategy. This is the next-generation computer.”
He predicts that future work methods, human employment, and even compensation models will change.
Before OpenClaw, enterprise IT was centered on the “data center”: large rooms storing data, files, and structured enterprise records. Data flowed through tools, systems of record, and workflows embedded in IT, ultimately serving humans and digital workers. In that old IT industry, software companies built the tools and stored the files, while IT consultants helped enterprises use and integrate them.
Future enterprise structure|Source: NVIDIA
But in the OpenClaw era, and the broader intelligent-agent era beyond it, every IT company, enterprise, and SaaS provider will become an “Agent-as-a-Service” (AaaS) company.
However, a key unresolved issue remains—agent systems within enterprise networks can access sensitive information, execute code, and communicate externally. They could potentially access employee data, supply chain info, financial secrets, and transmit these externally, posing security risks.
Subsequently, NVIDIA launched its own NVIDIA OpenClaw reference solution—Open NemoClaw.
NemoClaw|Source: NVIDIA
It includes a complete set of agentic AI tools, with one core technology, the OpenShell module, now fully integrated into OpenClaw. Users can download, use, and modify it directly, and connect it to the policy engines of SaaS companies worldwide.
At the same time, users can connect these policy engines to enforce security policies, set up network firewalls, and run privacy routing to protect the internal enterprise environment, allowing agents to operate securely and controllably. Open NemoClaw also lets users build custom agents with their own models.
NemoClaw equipped with NVIDIA models ranks on the leaderboard of OpenClaw-like products|Source: NVIDIA
He predicts that one of Silicon Valley’s future hiring methods will be: “How many tokens does this job come with?”
Eventually, employees’ base annual salary might be hundreds of thousands of dollars, with companies additionally paying half of their compensation in tokens, amplifying productivity tenfold.
In the future, every software company will be AI-driven. They will be token producers, token users, and providers of tokens to all customers.
03 Physical AI: BYD, Geely Join NVIDIA’s Robotaxi Circle, Disney’s Snow White Robot Debuts
After discussing application-level changes, Huang turned to physical AI, showcasing his physical AI family.
NVIDIA’s physical AI lineup|Source: NVIDIA
Currently, NVIDIA offers three types of computers: for training, for data synthesis and simulation, and the onboard computers inside vehicles and robots.
NVIDIA also announced a host of new partners. “Autonomous driving with ChatGPT capabilities is here. We are now confident that fully autonomous vehicles are achievable,” Huang said.
NVIDIA announced four new partners for its NVIDIA Robotaxi platform: BYD, Hyundai, Nissan, and Geely. These manufacturers produce a combined total of 18 million vehicles annually. Along with existing partners like Mercedes, Toyota, and GM, the number of vehicles supporting Robotaxi will be substantial. NVIDIA also plans to connect these vehicles into partner-operated networks in multiple cities.
In the future, traditional radio towers will transform into NVIDIA Aerial AI RIM smart base stations—“Robotaxi radio towers.” They will understand traffic conditions, intelligently adjust beamforming, maximize fidelity, and save energy.
He also mentioned that with NVIDIA Alpamayo, vehicles now have inference capabilities to drive safely and intelligently in various scenarios. They can explain their decision-making process and respond directly to voice commands.
For example, you can say to the car, “Hey Mercedes, can we go faster?” and the vehicle responds, “Of course, I’ll accelerate now.” By combining traditional and neural simulation, NVIDIA generates massive amounts of synthetic data and trains policy models at scale.
This time, NVIDIA also developed several open-source tools: Isaac Lab for training and evaluating robots in simulation; Newton, a scalable GPU-accelerated physics engine; Cosmos, a neural simulation world model; GR00T, an open-source robot foundation model for reasoning and action generation.
At the end of the keynote, Disney’s Olaf robot from Frozen took the stage. Currently, Disney’s robots are trained using NVIDIA simulation. “One of the robots I look forward to most is Disney’s robot,” Huang said.
Huang and Olaf wave goodbye at GTC|Source: NVIDIA
This year, Huang’s GTC keynote was no longer about directions and slogans; it delivered a complete set of practical tools for today’s AI entrepreneurs.
From AI chips and the OpenClaw agent system to physical AI, robotics, and large-scale autonomous-driving deployment, he laid out the path, the hardest problems, and the bottlenecks the AI industry must confront in the coming years. Every company and developer can find a place within this new framework.
Starting this year, AI is no longer just about stacking parameters, competing for computing power, or storytelling. It is moving into enterprises and practical applications. This may not be a victory for any single company but the true start of AI’s flywheel.