After an 83% price increase, tokens are still selling out fast. Zhipu's earnings report signals an industry turning point.
In the third week of February 2026, a historic data point appeared on OpenRouter, the world's largest AI model API aggregation platform: weekly Token calls to China's large models surged to 5.16 trillion, surpassing for the first time the 2.7 trillion weekly calls of U.S. models over the same period. Among the top five models by global call volume, China held four seats.
A year ago, the landscape on this platform was an entirely different story. Anthropic alone accounted for 42% of Token share, and China’s models were nearly absent from the table.
In the very week this reversal happened, Zhipu released GLM-5 and simultaneously announced an 83% API price increase. With a price war still the industry's dominant theme, this was the first price hike by a domestically developed Chinese large model, and even after the increase, the market was still willing to pay Zhipu.
On March 31, Zhipu (02513.HK) released its first annual performance report since listing. For full-year 2025, revenue was RMB 724 million, up 131.9% year over year, maintaining its position as the domestic independent large model company with the largest revenue scale. At the earnings briefing, Zhipu CEO Zhang Peng boiled the company’s growth logic down to one sentence: “When the model is strong enough, the API itself is the best business model.” He further judged: “The quality of intelligence creates pricing power; deep usage by enterprises and users creates growth through Scaling.”
The key takeaway of this earnings report isn't any single revenue number, but the fact that an Anthropic-style business model and growth pattern is taking shape at Zhipu. It gives China's large model industry a benchmark to measure itself against.
A Turning-Point Moment for Commercializing China’s Large Models
From the second half of 2024 to early 2025, China’s large model industry fought a brutal price war.
ByteDance's Doubao cut the price of inference input to 0.0008 yuan per thousand tokens, Alibaba's Tongyi Qianwen cut prices on its GPT-4-class flagship models by 97%, and Zhipu itself once announced a 90% price cut for GLM-4-Plus. At that stage, almost every player was doing the same thing: buying market share with subsidies and capturing call volume with low prices. With supply in excess, grabbing users was the top priority.
The price war did complete its historical mission. Once Tokens were cheap enough, the usage habits of individual developers and enterprises took hold, and a base of call volume was established.
But the endgame of a price war isn’t who’s cheaper—it’s who makes customers feel the price is truly worth it first.
On February 12, 2026, the turning point arrived. The day GLM-5 was released, Zhipu also announced a structural adjustment to the Coding Plan pricing system, with overall prices rising by at least 30%. In Q1 2026, Zhipu's API prices rose by as much as 83%. The market's reaction was not churn but a buying rush: sold out, purchase limits, apologies, one after another.
Why did raising prices lead to sell-outs rather than falling sales?
Coding isn't chat; it's a real productivity scenario. GLM-5 has stayed in first place among open-source models on core programming leaderboards such as SWE-bench Verified, and it can autonomously complete systems engineering tasks like backend refactoring and deep debugging with minimal human intervention. Paying for such an "engineer" is an entirely different decision from paying for a chatbot. The 149 yuan/month Pro plan isn't an expense for programmers; it's an investment, because the time saved translates directly into delivery efficiency.
At the earnings briefing, Zhang Peng put it plainly: "Developers are the group most sensitive to the upper bound of perceived intelligence." In 2025, Zhipu was the first company in China to launch a programming subscription, the GLM Coding Plan. Paid developers quickly surpassed 242k, and Token call volume grew 15-fold in six months.
From the revenue structure, this financial report shows a picture that is utterly different from the market’s old impressions: explosive growth in API call volume, while the proportion of private (enterprise) revenue shrank significantly. Today, recurring API revenue has become the main engine of Zhipu’s performance; growth no longer depends on signing contracts to drive it, but on usage volume rising on its own.
When a model is merely a chat companion, price is a cost variable; when a model can deliver a complete system, price becomes a productivity variable. The ceiling for the former is users' patience; the ceiling for the latter is the labor costs users save.
This shift directly rewrites how the market prices Zhipu. Project-based companies look at PE, while platform companies look at ARR—two completely different valuation logics. In the market’s view today, Zhipu no longer charges for projects; it charges for calls and “collects rent” by usage. The former is a labor-intensive business; the latter is a platform economy.
Zhang Peng summarized the logic of pricing power into a formula: “Commercial value in the AGI era = the intelligence upper bound × Token consumption scale.” “The intelligence upper bound determines pricing power, and the Token consumption scale determines the size of value.” He further judged: “When the model is strong enough, the API itself is the best business model. The quality of intelligence creates pricing power; deep usage by enterprises and users creates growth through Scaling.”
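Zhang Peng's formula can be sketched as a toy calculation. Everything below is an illustrative assumption (the function name, the "price per million tokens" reading of the intelligence upper bound, and all numbers are made up for the sketch, not Zhipu's actual figures):

```python
# Toy sketch of the quoted formula:
#   commercial value = intelligence upper bound x Token consumption scale
# All names and numbers are illustrative assumptions.

def commercial_value(intelligence_premium: float, tokens_consumed: float) -> float:
    """intelligence_premium: the price the market will bear per million tokens,
    set by the model's intelligence upper bound (pricing power).
    tokens_consumed: total tokens called, in millions (scale of value)."""
    return intelligence_premium * tokens_consumed

# A strong enough model can raise price AND deepen usage at the same time:
before = commercial_value(intelligence_premium=1.0, tokens_consumed=100.0)
after = commercial_value(intelligence_premium=1.83, tokens_consumed=150.0)
print(before, after)
```

The point of the multiplication is that the two factors are independent levers: in this toy setup, an 83% price increase plus growing call volume lifts total value even if neither lever alone would.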
After an 83% price increase, call volume didn't fall; it rose instead, the first real-world validation of this judgment. Anthropic across the ocean has walked the same path: by late 2025 its ARR reached $9 billion, and Claude Code alone hit $2.5 billion within nine months.
When programming upgrades from writing code snippets to completing system engineering, Token consumption and the unit price can rise in parallel. Zhipu is reproducing this path in China.
“China’s Anthropic” Enters a Breakout Period
To further understand the weight of this earnings report from Zhipu, you need to first look at a set of numbers from across the ocean.
Anthropic achieved unprecedented growth in 15 months: ARR rose from $1 billion at the end of 2024 to $19 billion by March 2026. Its user base is only 5% of ChatGPT, yet its revenue reaches over 40% of OpenAI’s, and its per-user monetization efficiency is 8 times that of the latter. About 80% of Anthropic’s revenue comes from enterprise-level API call services, and 70% of Fortune 100 companies are Claude customers.
Anthropic's lesson isn't about how big it is; it proves one thing: user count is merely a scale metric, while depth of usage is what converts into real money.
At the earnings briefing, Zhang Peng explicitly placed Zhipu within this coordinate system. He said directly that the company will “continue along China’s Anthropic commercial path, with model intelligence as the foundation and the API platform as the engine.”
The data is proving this judgment. In 2025, Zhipu’s full-year revenue was RMB 724 million, up 131.9%, exceeding the company’s targets set at the beginning of the year. Full-year consolidated gross margin was 41%, far above industry norms. Zhipu’s MaaS API platform ARR is about RMB 1.7 billion, a 60x increase over the past 12 months. The MaaS platform gross margin increased nearly 5x to 18.9%.
But more than the financial numbers, what best explains whether the flywheel is spinning is the “density of integrations.”
Among China's top 10 internet companies, 9 already make deep daily calls to GLM models. Within 24 hours of each new GLM model release, official integrations arrive from major platform products such as ByteDance's TRAE and Coze (扣子), Alibaba's Qoder, Tencent's CodeBuddy, Meituan's CatPaw, Kuaishou's 万擎, Baidu Intelligent Cloud, and WPS Office.
Looking further down the long tail, more than 4 million enterprise users and developers make continuous calls in real production environments, covering over 218 countries and regions worldwide. GLM has become the default model for international coding platforms such as Windsurf and OpenCode, and it ranks number one among paid models on OpenRouter.
The density of integration is the density of non-replaceability.
For the flywheel to spin, the starting point is model performance. The GLM series has continuously maintained global first place among open-source models and first place among China’s models; among all global models, it stays closely behind GPT, Claude, and Gemini, and consistently ranks within the global AI first tier.
GLM-5's launch was itself a statement of positioning. Under the anonymous identity "Pony Alpha," it took the top spot on OpenRouter's trending charts; Silicon Valley developers speculated it was Claude Sonnet 5 or DeepSeek-V4. After the reveal, it processed 4 billion Tokens and 206k requests on its first day.
Improvements in gross margin also validate the flywheel's efficiency gains. Through software-hardware co-design and, on the inference side, a dynamic sparse attention mechanism, deployment costs were cut to 50% of the original while performance was preserved. On the customer side, the price increase acted as a positive filter: customers willing to pay for results had higher retention and deeper call usage.
Zhang Peng described this virtuous cycle as: “Breakthrough in the intelligence upper bound drives an exponential increase in Token consumption— the stronger the model, the deeper the usage scenarios, and the greater the Token call volume.” “Commercial positive feedback supports us to invest more in compute and R&D, further lifting the intelligence upper bound. This flywheel has already started turning.”
The Next Breakout Point for Token Economics
On February 26, NVIDIA CEO Jensen Huang, during the earnings call, repeatedly emphasized a judgment to the market: “Compute is revenue, inference is revenue.” Without compute, you can’t generate Tokens; without Tokens, you can’t bring revenue growth.
Global data confirms this view. Over the past year, weekly Token call volume for OpenRouter's top 10 models jumped from 1.24 trillion to nearly 14 trillion, an increase of more than 10x. Nor is it only user growth: the depth of Token consumption per user is also rising. As Agents complete tasks, they require more steps and more tool calls, and Token consumption accumulates step by step.
In the internet era, free was the way to win, because the marginal cost of traffic is close to zero. AI is completely different: every inference burns compute, so Tokens naturally carry a price. This means AI companies sit on a usage-based, pay-as-you-go business model from day one.
Zhang Peng laid out his judgment framework: in 2025, Zhipu’s keyword is “the intelligence upper bound,” and in 2026, the keyword is “Token volume.” “Applications represented by OpenClaw spark a frenzy of Token consumption. We will continue to increase investment, push inference performance to the limit—not for short-term profitability, but to support that ever-rising, high-quality Token consumption exponential curve.”
Over the past year, Zhipu’s five generations of model iterations have been telling the story of how Token consumption volume keeps being amplified.
Zhang Peng broke down this paradigm path: in the AI coding stage, models learn to write code but are fundamentally helpers; in the Vibe coding stage, code is cheap and ideas are what's valuable; in the Agentic engineering stage, AI understands requirements, formulates plans, and independently writes, tests, and iterates on fixes like an engineer; in the long-horizon stage, AI must work continuously on longer time scales like a senior expert, delivering results over extended periods.
Every leap multiplies Token consumption per single task by several times over the previous stage. It is reported that the soon-to-be officially released GLM-5.1 will make systematic optimizations around long horizon tasks, pointing exactly to the next step.
OpenClaw’s explosion turned this trend from theory into reality. In March 2026, Zhipu launched Claw Plan. Within two days of going live, subscribed users exceeded 100k; within 20 days, it surpassed 400k. AI Agents run 24/7 autonomously—each instance is a “digital employee” that keeps burning Tokens.
Demand is exploding, and the supply side cannot drop the ball. GLM-5 has already completed deep inference adaptation with 7 domestic chip platforms, including Huawei Ascend, Moore Threads, and Cambricon. Zhang Peng said that on domestic chips, the GLM series has already achieved inference efficiency comparable to international top-tier chips. With compute that is independently controllable, Token capacity won’t be bottlenecked.
Zhipu condensed the entire logic into one concept: TAC (Token Architecture Capability). TAC = intelligent call volume × intelligence quality × economic conversion efficiency.
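The TAC definition quoted above is multiplicative, which has a concrete consequence worth making explicit: a weak factor drags down the whole product no matter how large the others are. A minimal sketch, with all factor values invented for illustration (none are measured data):

```python
# Toy sketch of the TAC concept quoted above:
#   TAC = intelligent call volume x intelligence quality x economic conversion efficiency
# All numbers below are illustrative assumptions.

def tac(call_volume: float, quality: float, conversion_efficiency: float) -> float:
    """Multiplicative: if any one factor is near zero, overall TAC collapses."""
    return call_volume * quality * conversion_efficiency

# Huge call volume with poor conversion of tokens into economic output
# can score lower than a tenth of the volume converted well:
low_conversion = tac(call_volume=1e9, quality=0.9, conversion_efficiency=0.01)
high_conversion = tac(call_volume=1e8, quality=0.9, conversion_efficiency=0.5)
print(low_conversion, high_conversion)
```

Under these made-up numbers, the smaller but better-converted workload yields the higher TAC, which matches the article's claim that value comes from converting Tokens into deliverable economic gains, not from raw volume alone.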
Zhang Peng believes that the future standard for measuring value will no longer be how much information is possessed, but the ability, as a Token architect, to drive large models and Agents to complete complex tasks. “Zhipu’s goal is to become the infrastructure that improves society’s overall TAC, so that every drop of Tokens can be converted into deliverable economic incremental gains.”
Using the same logic, Anthropic commands a $380 billion valuation on $19 billion of ARR. Where is the ceiling of China's MaaS model? This earnings report offers the first official reference point.
In the large-model industry, the phrase "supply can't keep up" is more convincing than any earnings number. Once a company starts apologizing for selling out, all debates about pricing power can end.
*The above content does not constitute investment advice. It does not represent the views of the publishing platform. The market involves risk; investments require caution. Please make independent judgments and decisions.