Luo Fuli: The "lobster" cost black hole is emerging, calling for more token-efficient Agent frameworks.

Anthropic’s move to block third-party tools from abusing subscription access is uncovering a long-ignored cost crisis in the AI Agent era.

Two days ago, Anthropic announced it was cutting off the channel that allowed third-party calling frameworks to ride on Claude subscriptions. Xiaomi MiMo large-model executive Luo Fuli immediately published an in-depth reading of the event, tying it to the Token plan MiMo had rolled out just three days earlier.

She believes Anthropic’s action is not simply a defensive move in business, but a necessary milestone in the ecosystem’s maturation, given that the global supply of compute can’t keep up with the accelerating demand of Agents.

Directly impacted are users of third-party calling frameworks such as OpenClaw and OpenCode that ran on Claude subscription access. These users face a sharp spike in costs, which in the short term could reach dozens of times what they were paying before.

But Luo believes this pressure is precisely a catalyst for better engineering: only by making the true costs of inefficiency visible can the ecosystem push developers to take context management and cache optimization seriously.

A cost black hole behind the subscription system

Luo points out that the subscription system for Claude Code is quite sophisticated in terms of compute allocation design, but she admits that this system likely won’t be profitable, and may even be in the red.

The root cause lies in how third-party frameworks make calls. Take OpenClaw as an example: its context management has obvious flaws—in handling a single user request, the system breaks it into multiple rounds of low-value tool calls, which are then issued one by one as separate API requests. Each request typically carries a context window of more than 100k tokens.

Even if there are cache hits, this pattern is extremely wasteful; in extreme cases, it can also drive up the cache miss rates of other requests.

Luo estimates that the number of actual requests produced per query by these frameworks is often several times that of the Claude Code native framework. When converted to API billing, the real cost could be dozens of times the subscription price. She describes this gap as “not a shortfall, but a bottomless pit.”
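Luo's "dozens of times" estimate is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below reuses the roughly 100k-token context figure from above; the number of tool-call rounds and the fresh tokens added per round are illustrative assumptions, not figures from the article.

```python
# Illustrative arithmetic only: ROUNDS_PER_QUERY and NEW_TOKENS_PER_ROUND
# are assumptions; only the ~100k context size comes from the article.

CONTEXT_TOKENS = 100_000      # context carried by every API request
ROUNDS_PER_QUERY = 8          # assumed tool-call rounds per user query
NEW_TOKENS_PER_ROUND = 2_000  # assumed genuinely new tokens per round

# Naive pattern: every round resends the full context as a separate request.
naive_input_tokens = ROUNDS_PER_QUERY * CONTEXT_TOKENS

# Cache-aware pattern: the shared prefix is processed once; later rounds
# only pay for the newly appended tokens.
cached_input_tokens = CONTEXT_TOKENS + (ROUNDS_PER_QUERY - 1) * NEW_TOKENS_PER_ROUND

multiple = naive_input_tokens / cached_input_tokens
print(f"naive: {naive_input_tokens:,} input tokens per query")
print(f"cache-aware: {cached_input_tokens:,} input tokens per query")
print(f"waste multiple: {multiple:.1f}x")
```

Even with these conservative assumptions the naive pattern burns roughly 7x the input tokens; with more rounds per query, or cache misses cascading onto other requests, the multiple climbs toward the "dozens of times" Luo describes.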

AI Workshop organizer @newlinedotco commented: The subscription “all you can eat” bundle has been a ticking time bomb from the start—third-party harnesses (such as OpenClaw) loop calls 24/7, with API costs that may reach as high as $5,000, while the subscription is only $200. Official tools (such as Claude Code) can remain sustainable only thanks to prompt cache optimization.

After the block: short-term pain, long-term discipline

Anthropic’s adjustment did not completely close the door to third-party access. Tools such as OpenClaw and OpenCode can still call Claude via APIs, but they have lost the channel to ride on the subscription plan.

This distinction is crucial. For users who are used to using these tools at subscription prices, the cost shock is immediate and significant.

But Luo believes this pain has a corrective effect: it will force framework developers to seriously improve their context management, maximize prompt cache hit rates so that already-processed context is reused, and cut wasted token consumption. She describes this process as "pain eventually crystallizing into engineering discipline."
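One concrete form that discipline takes is prefix-stable prompt assembly: provider-side prompt caches generally key on an exact prefix match, so a framework that only ever appends new turns, never rewriting or reordering earlier ones, gets maximal cache reuse. A minimal sketch of the idea, with hypothetical function names (real APIs, such as Anthropic's `cache_control` blocks, differ in detail):

```python
# Hypothetical sketch of prefix-stable context assembly. Assumes a
# provider whose prompt cache reuses the longest exactly-matching prefix.

def build_prompt(system: str, history: list[str], new_turn: str) -> list[str]:
    """Append-only assembly: earlier segments are never reordered or
    rewritten, so consecutive requests share the longest possible prefix."""
    return [system, *history, new_turn]

def cached_prefix_len(prev: list[str], cur: list[str]) -> int:
    """Number of leading segments a prefix-matching cache could reuse."""
    n = 0
    for a, b in zip(prev, cur):
        if a != b:
            break
        n += 1
    return n

history = ["user: fix the failing test", "assistant: running pytest..."]
prev = build_prompt("You are a coding agent.", history, "tool: 3 tests failed")
history.append("tool: 3 tests failed")
cur = build_prompt("You are a coding agent.", history, "assistant: patching foo.py")

# Because assembly is append-only, everything before the final turn is reusable.
print(cached_prefix_len(prev, cur))  # prints 4
```

A framework that instead rewrites its system prompt per call, or injects a timestamp near the top, invalidates the cached prefix on every request and pays full price for the entire 100k-token context each time.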

She also cautions large language model companies not to blindly jump into pricing wars before they understand the cost structure of their coding plans. Selling tokens cheaply opens the door to third-party frameworks and looks user-friendly on the surface, but is in fact a trap, and Anthropic has just stepped out of it.

She further points out that if users spend a lot of effort on low-quality frameworks, unstable inference services, and degraded models and still get nothing in return, it will cause real harm to user experience and retention.

In response, AI engineer @karpathy noted:

“Great software is often born under constraints. If tokens are free, nobody will write lean prompts or research context compression; when cost becomes a bottleneck, developers will truly think about how to build ‘smart-brained’ Agents.”

Different paths of the MiMo Token plan

While interpreting Anthropic’s move, Luo also explains the design logic behind the MiMo Token plan.

The plan supports third-party calling frameworks and uses token-allocated quota billing. Logically, it matches the overage usage packs that Claude recently introduced.

Luo emphasizes that MiMo’s goal is “to deliver high-quality models and services steadily over the long term, not to let users pay impulsively and then churn.”

This statement reflects a compute allocation philosophy that differs from the subscription model: constrain user and framework behavior through real usage costs, rather than managing abuse risk in a closed way.
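The contrast between the two billing philosophies can be sketched in a few lines. The class names, quota size, and per-token price below are invented for illustration; they are not MiMo's or Anthropic's actual numbers.

```python
# Hypothetical sketch contrasting flat subscription billing with
# token-allocated quota billing. All figures are invented.

class Subscription:
    """Flat fee: cost is fixed no matter how many tokens are consumed,
    so heavy usage is subsidized by the provider."""
    def __init__(self, fee: float):
        self.fee = fee

    def cost(self, tokens_used: int) -> float:
        return self.fee

class TokenQuota:
    """Prepaid token allotment: usage draws down the quota, and overage
    is billed per token (in the spirit of overage usage packs)."""
    def __init__(self, quota: int, price_per_1k: float):
        self.quota = quota
        self.price_per_1k = price_per_1k

    def cost(self, tokens_used: int) -> float:
        overage = max(0, tokens_used - self.quota)
        return overage / 1000 * self.price_per_1k

sub = Subscription(fee=200.0)
plan = TokenQuota(quota=10_000_000, price_per_1k=0.01)

heavy_usage = 50_000_000  # e.g. an agent looping around the clock
print(sub.cost(heavy_usage))   # prints 200.0 regardless of usage
print(plan.cost(heavy_usage))  # prints 400.0: heavy users pay for what they burn
```

Under the flat fee, the marginal cost of another wasteful request is zero; under quota billing it is not, which is exactly the behavioral constraint Luo describes.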

Competition in efficiency, not compute consumption

At the end of her article, Luo offers a more macro judgment: the global supply of compute can no longer keep up with the rate of token demand generated by Agents.

In her view, the way forward is not to further drive down token prices, but to evolve in synergy between “Agent frameworks with higher token efficiency” and “more powerful, higher-efficiency models.”

Whether by design or not, Anthropic’s move is pushing the entire ecosystem, open source and closed source alike, in that direction.

“The Agent era does not belong to those who burn the most compute, but to those who use it the smartest,” Luo wrote.
