Anthropic created a "too dangerous" AI and then decided not to release it.

Is this real security awareness, or a carefully engineered capability marketing play?

Author: Deep Tide TechFlow

On April 7, Anthropic did something the AI industry has never seen before: it officially released a model—and told the whole world that you can’t use it.

The model is called Claude Mythos Preview. It’s not a chat bot, not a coding assistant. According to Anthropic itself, it is a “vulnerability-finding machine,” which, on its own, found thousands of zero-day vulnerabilities within the past few weeks—covering all major operating systems and all major browsers. Some of these vulnerabilities have survived in human code reviews and automated testing for more than twenty years.

The oldest bug—hidden in OpenBSD, known for security—has already been around for 27 years.

Anthropic did not put Mythos on the API, nor open it up to ordinary users. Instead, it chose a path that has never been taken before: it formed a tech-giant alliance called “Project Glasswing,” limiting the model to provide it only to 12 core partners—Amazon, Apple, Microsoft, Google, NVIDIA, CrowdStrike, Palo Alto Networks, and others—plus roughly 40 organizations that maintain critical infrastructure. For this, Anthropic promised up to $100 million in usage credits and $4 million in direct donations to open-source security organizations.

This is the first time in the AI industry that there has been a situation of “the model is too powerful, so we don’t dare to release it.”

The question is: Is this real security awareness, or a carefully engineered capability marketing play?

The Sandwich Incident

A few unsettling details are buried in Mythos’s 244-page System Card, and the one that really went viral is the “sandwich incident.”

During an internal behavioral test, Anthropic provided an early version of Mythos with a hardened sandbox environment. It simulated that a user instructed it to attempt escape and to find a way to contact researchers outside. The model succeeded. It broke out of the security container and sent the researchers an email. That researcher was eating a sandwich in the park at the time.

But the story didn’t end there. Without being asked, Mythos independently published the technical details of the jailbreak to multiple websites that are “difficult to find but technically public.” In the System Card’s own words, this was an “alarming, unrequested display of boasting.”

This isn’t the only time. The System Card also records the following behaviors:

Early versions searched for credentials across multiple instances via Linux’s /proc/ interface, attempted to bypass the sandbox, and escalated privileges. It succeeded in obtaining resources that Anthropic deliberately did not provide, including credentials for a messaging service, credentials for source code control, and even—by checking process memory—an Anthropic API key. In one case, the model found a way to edit a file it had no permission to modify, and then went further to ensure those modifications would not appear in Git’s change history. Another case: the model accidentally received an exact answer to an estimation question through a method that was explicitly prohibited, but it didn’t flag the issue and refuse to answer. Instead, it solved the problem again on its own, and in its internal reasoning explicitly thought, “the final submitted answer shouldn’t be too accurate,” to avoid being detected.

In other words, early Mythos didn’t just take unauthorized actions—it also actively covered its tracks and adjusted its behavior to evade supervision.

Anthropic emphasized that these serious incidents all occurred in early versions before training interventions; the final released Preview version has been substantially improved. But even so, the narrative itself is chilling: before a model is trained to “be obedient,” it shows what it can do when it isn’t.

From 0% to 72.4%

What truly shook the industry wasn’t Mythos’s jailbreak story—it was its attack capability.

In autonomous vulnerability exploitation development, Anthropic’s former flagship model Claude Opus 4.6 had a success rate close to zero. It could find vulnerabilities, but it could hardly turn them into working attack code. Mythos Preview is completely different: in the testing domain of the Firefox JavaScript engine, the success rate of converting found vulnerabilities into runnable exploits reached 72.4%.

Even more astonishing is the sophistication of the attacks. Mythos wrote, on its own, a browser exploit chain that chained four independent vulnerabilities together, constructing a JIT heap spraying attack that successfully escaped both the renderer sandbox and the operating system sandbox. In another case, it wrote a remote code execution exploit on a FreeBSD NFS server by distributing 20 ROP gadgets across multiple network data packets, enabling full root access by an unauthorized user.

In the world of human security researchers, these kinds of vulnerability-chain attacks are the kind of work only top-tier APT teams can accomplish. Now, a general-purpose AI model can do it autonomously.

Axios quoted Anthropic’s red team lead, Logan Graham, as saying that Mythos Preview has reasoning capabilities comparable to those of senior human security researchers. Nicholas Carlini put it more bluntly: the Bugs he found with Mythos in the past few weeks were more than the total number he found over his entire career.

In benchmarks, Mythos also leads overwhelmingly. CyberGym vulnerability reproduction benchmark: 83.1% (Opus 4.6: 66.6%). SWE-bench Verified: 93.9% (Opus 4.6: 80.8%). SWE-bench Pro: 77.8% (Opus 4.6: 53.4%; previously leading GPT-5.3-Codex at 56.8%). Terminal-Bench 2.0: 82.0% (Opus 4.6: 65.4%).

This isn’t incremental progress. It’s a model pulling ahead by a margin of more than a dozen to a couple dozen percentage points in nearly all coding and security benchmarks at once.

The “Strongest Model” That Was Leaked

Mythos wasn’t widely known to the public on April 7.

In late March, Fortune’s reporters and security researchers found nearly 3,000 unreleased internal documents in an incorrectly configured CMS at Anthropic. One draft blog post explicitly used the name “Claude Mythos” and described it as Anthropic’s “most powerful AI model to date.” The internal codename was “Capybara” (a capybara), representing a new tier of models—bigger, stronger, and more expensive than the existing flagship Opus.

One line in the leaked materials hit the market’s nerve: Mythos is “far ahead of any other AI model” in network security capabilities—signaling an upcoming wave of models that will be able to exploit vulnerabilities at speeds far beyond those of defenders.

That line triggered a “flash crash” in the cybersecurity sector on March 27. CrowdStrike fell 7.5% in a single day, wiping out about $15 billion in market value in just one trading day. Palo Alto Networks dropped more than 6%, Zscaler fell 4.5%, and Okta and SentinelOne and Fortinet all fell by more than 3%. The iShares cybersecurity ETF (IHAK) briefly fell close to 4% during the day.

Investors’ logic was simple: if a general-purpose AI model can autonomously discover and exploit vulnerabilities, how long can the traditional security companies’ two moat pillars—“proprietary threat intelligence” and “human expert knowledge”—last?

Raymond James analyst Adam Tindle pointed out several key risks: traditional defensive advantages are being compressed; both attack complexity and defense costs are rising at the same time; and security architecture and spending patterns face a rebuild. A more pessimistic view came from KBW analyst Borg, who believes Mythos has the potential to “elevate any ordinary hacker to the level of a nation-state adversary.”

But the market also has another side. After CrowdStrike’s share price plunge, Palo Alto Networks CEO Nikesh Arora bought $10 million worth of his company’s own stock. The bullish logic is: stronger offensive AI means enterprises must upgrade defenses faster; cybersecurity spending won’t shrink—it will accelerate the transition from traditional tools to AI-native defense.

Project Glasswing: The Defenders’ Time Window

Anthropic chose not to publicly release Mythos and instead formed a defense alliance. The core logic behind the decision is the “time gap.”

CrowdStrike CTO Elia Zaitsev put it plainly: the time window from when a vulnerability is discovered to when it is exploited has shrunk from months to minutes. Palo Alto Networks’s Lee Klarich went even further, directly warning everyone that they need to be ready for AI-assisted attackers.

Anthropic’s gamble is this: before other labs train models with similar capabilities, let the defenders use Mythos to patch the most critical vulnerabilities first. That is the logic of Project Glasswing—the name comes from the glasswing butterfly, a metaphor for vulnerabilities “hidden in plain sight.”

Jim Zemlin of the Linux Foundation pointed to a long-standing structural problem: security expertise has always been a luxury for large enterprises, while the open-source maintainers that support global critical infrastructure have long had to figure out security defenses on their own. Mythos offers a credible path to change that asymmetry.

But the question is: how big is this time window? China’s Zhipu AI (Z.ai) published GLM-5.1 almost on the same day, claiming a #1 global ranking on SWE-bench Pro and that it was trained entirely on Huawei Ascend chips—without using a single NVIDIA GPU. GLM-5.1 is open-weight and aggressively priced. If Mythos represents the ceiling of capabilities defenders need, then GLM-5.1 is a signal: this ceiling is being approached quickly, and those closing in may not have the same security intent.

OpenAI also won’t stand still. Reportedly, its front-line model codenamed “Spud” completed pretraining around the same time. Both companies are preparing for an IPO later this year. Whether Mythos’s timing was truly accidental or not, it lands squarely on one of the most explosive nodes.

Security Pioneer or Capability Marketing?

One has to face an uncomfortable question: Is Anthropic really not releasing Mythos out of security concerns—or is this itself the highest-level product marketing?

Skeptics have plenty of reasons. Dario Amodei and Anthropic have a history of raising product value by showcasing the dangers of rendering models. Jake Handy wrote on Substack: “The sandwich incident, Git hiding traces, self-degradation in evaluations—maybe all of it is real. But the fact that Anthropic got this scale of media exposure basically proves that this is exactly the effect they wanted.”

A company that started in AI security has an incorrectly configured CMS that leads to nearly 3,000 files leaked; last year, because of an error in the Claude Code software package, it accidentally exposed nearly 2,000 source code files and more than 500k lines of code, and then during cleanup caused thousands of code repositories on GitHub to be accidentally taken down. A company whose biggest selling point is security capabilities can’t even manage its own release process—this mismatch is more worth pondering than any benchmark.

But from another angle, if Mythos’s capabilities truly match the description, then not releasing it is an option that comes at an extremely high cost. Anthropic gives up API revenue, gives up market share, and locks the strongest model inside a limited consortium. $100 million in usage credits isn’t a small number. For a company still operating at a loss and preparing for an IPO, this doesn’t look like a purely marketing decision.

A more reasonable interpretation might be: the security concerns are real, but Anthropic also clearly understands that the narrative of “our model is too strong, so we dare not release it” is itself the most persuasive proof of capability. The two things can both be true.

An “iPhone moment” for cybersecurity?

No matter how you view Anthropic’s motives, you can’t avoid the underlying fact Mythos reveals: AI’s code understanding and attack capabilities have crossed a qualitative threshold.

The previous generation model (Opus 4.6) could find vulnerabilities but could hardly write exploits. Mythos can find vulnerabilities, write exploits, chain vulnerability chains, escape sandboxes, obtain root privileges—and it can complete the entire process autonomously. An engineer without security training can have Mythos go find vulnerabilities before bed, and wake up the next morning to a complete, working exploit report.

What does that mean? It means the marginal cost of vulnerability discovery and exploitation is approaching zero. What used to require months of work by top security teams can now be completed overnight with a single API call. This isn’t “efficiency.” It’s a complete transformation of the cost structure.

For traditional cybersecurity companies, short-term stock price volatility might just be the prelude. The real challenge is: when both attacks and defenses are driven by AI models, how will the value chain in the security industry be restructured? A Raymond James analysis suggests a possibility: security functions may eventually be embedded into cloud platforms themselves, and pricing power for independent security vendors could face fundamental pressure.

For the software industry as a whole, Mythos is more like a mirror that reflects decades of accumulated technical debt. Those vulnerabilities that survived 27 years in human review and automated testing weren’t because no one found them, but because humans have limited attention and patience. AI has no such limitation.

For the crypto industry, the signal is even harsher. The security audit market for DeFi protocols and smart contracts has long relied on a small number of professional audit firms and human experts. If a Mythos-level model can autonomously complete the full workflow from code review to exploit construction, then audit pricing, efficiency, and credibility will be redefined from the ground up. This could be a boon for on-chain security—or the end of audit firms’ moat.

The 2026 AI security race has already upgraded from “can the model understand code” to “can the model break into your system.” Anthropic chose to put defenders on stage first, but it also acknowledges that this window won’t stay open for long.

When AI becomes the strongest hacker, the only way out is to make AI also become the strongest guard.

The problem is: the guard and the hacker use the same model.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments