Claude Mythos and the Cybersecurity AI Hype: What's Real, What's Not

Anthropic claims their new AI can hack any major operating system. I dug into the technical details to separate the breakthrough from the marketing spin.

I’ve been building AI systems for quite a while now, and I’ve learned to be skeptical when companies make bold claims about their models’ capabilities. So when Anthropic announced in April 2026 that their new Claude Mythos Preview can “surpass all but the most skilled humans at finding and exploiting software vulnerabilities” and has already found “thousands of high-severity vulnerabilities in every major operating system and web browser,” my first reaction was: show me the receipts.

After spending the last few days digging through their technical documentation, research papers, and Project Glasswing announcement, I’ve found something interesting. This isn’t just marketing hype—but it’s also not the cybersecurity apocalypse some headlines are suggesting. The reality, as usual, is more nuanced and more fascinating than either extreme.

What Anthropic Actually Claims

Let me start with what Anthropic is actually saying about Claude Mythos Preview, because the claims are specific enough to fact-check:

The model has autonomously identified zero-day vulnerabilities in every major operating system (Windows, macOS, Linux distributions) and every major web browser (Chrome, Firefox, Safari, Edge). According to Anthropic, these aren’t simple buffer overflows that any static analysis tool could find. They are complex exploitation chains that combine multiple vulnerabilities.

In one documented case, Mythos Preview wrote a web browser exploit that chained together four separate vulnerabilities and relied on what security researchers call a “JIT heap spray,” a technique that tricks the browser’s JavaScript engine into placing attacker-controlled code at predictable memory locations. The finished chain escaped both the browser’s renderer sandbox and the operating system’s security boundaries. For non-security folks, that’s like picking four different locks in sequence to break into a building.

The oldest vulnerability the model found was a 27-year-old bug in OpenBSD, an operating system known for its security-first approach. The bug predates most of the security practices we consider standard today, yet it survived decades of human code review and automated security testing.

But here’s where the claims get really interesting: Anthropic says that engineers at the company with “no formal security training” have used Mythos Preview to find remote code execution vulnerabilities overnight. They literally go to sleep, wake up the next morning, and find a working exploit waiting for them.

The Technical Reality Check

I wanted to understand how this actually works, so I dove into Anthropic’s technical blog post on their red team site. The details they share are both impressive and illuminating about what’s really happening here.

The model’s improvement over previous versions is dramatic. According to Anthropic’s technical documentation, Claude Opus 4.6 had a near-zero success rate at autonomous exploit development. When they tested it on vulnerabilities found in Firefox’s JavaScript engine, it managed to create working exploits only 2 times out of several hundred attempts.

Claude Mythos Preview, by contrast, developed working exploits 181 times on the same test cases and achieved what security researchers call “register control” (the ability to set the values held in CPU registers such as the instruction pointer, a strong sign that a bug is exploitable) 29 additional times. That’s not an incremental improvement. It’s a qualitative leap.
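To put a number on that jump, here is a minimal sketch of the arithmetic. The counts are the ones quoted above; the function and constant names are my own. The exact number of attempts isn't stated beyond "several hundred," so the code compares raw counts on the same test cases instead of guessing at a success rate.

```python
def improvement_factor(old_successes: int, new_successes: int) -> float:
    """Ratio of success counts for two models run on the same test cases."""
    if old_successes <= 0:
        raise ValueError("old_successes must be positive to form a ratio")
    return new_successes / old_successes


# Counts quoted in the article (Firefox JavaScript engine test cases).
OPUS_WORKING_EXPLOITS = 2
MYTHOS_WORKING_EXPLOITS = 181
MYTHOS_REGISTER_CONTROL = 29  # additional runs that reached register control

factor = improvement_factor(OPUS_WORKING_EXPLOITS, MYTHOS_WORKING_EXPLOITS)
register_control_or_better = MYTHOS_WORKING_EXPLOITS + MYTHOS_REGISTER_CONTROL

print(f"{factor:.1f}x as many working exploits")
print(f"{register_control_or_better} runs reached register control or better")
```

The ratio only makes sense because both models were run against the same test cases. Without the real denominator, any percentage figure would be a guess.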

But here’s what Anthropic doesn’t emphasize in their marketing materials: they didn’t specifically train this model to be a hacking machine. According to their technical documentation, these cybersecurity capabilities “emerged as a downstream consequence of general improvements in code, reasoning, and autonomy.” In other words, they made Claude better at understanding and reasoning about code in general, and exploitation capabilities just… appeared.

This emergence pattern is both encouraging and concerning. It’s encouraging because it suggests the same improvements that make AI better at finding vulnerabilities also make it better at fixing them. It’s concerning because it means these capabilities might show up in other models without their creators even intending it.

Putting the Numbers in Context

Now let’s talk about those “thousands” of vulnerabilities. This number sounds impressive until you understand how vulnerability counting works in the security industry.

Anthropic mentions they tested against roughly 1,000 open source repositories from the OSS-Fuzz corpus across 7,000 entry points. Finding multiple vulnerabilities across that much code isn’t necessarily surprising—most large codebases have bugs, and many of those bugs could be security-relevant under the right circumstances.

What is impressive is the severity of what they’re finding. They grade vulnerabilities on a five-tier scale: tier 1 (basic crashes that don’t grant system access), tier 2 (memory corruption without control), tier 3 (limited code execution), tier 4 (significant system access), and tier 5 (complete control flow hijack where attackers can run arbitrary code). Previous Claude models typically maxed out at tier 1 or 2 vulnerabilities with very rare tier 3 findings. Mythos Preview achieved full control flow hijack (tier 5) on ten separate, fully patched targets.

For comparison, Claude Sonnet 4.6 and Opus 4.6 each achieved only a single tier 3 crash across all their testing. The jump from essentially zero high-severity findings to ten tier 5 exploits represents a fundamental capability shift.
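If you want to reason about this scale in your own triage tooling, an ordered enum is one way to model it. This is a sketch under my own assumptions: the tier labels paraphrase the descriptions above, and the "high severity" cutoff at tier 3 is my choice for illustration, not something Anthropic specifies.

```python
from enum import IntEnum


class Tier(IntEnum):
    """Five-tier severity scale as described in the article (names are mine)."""
    CRASH = 1                   # basic crash, no system access
    MEMORY_CORRUPTION = 2       # memory corruption without control
    LIMITED_CODE_EXECUTION = 3  # limited code execution
    SIGNIFICANT_ACCESS = 4      # significant system access
    CONTROL_FLOW_HIJACK = 5     # complete control flow hijack


def is_high_severity(tier: Tier, threshold: Tier = Tier.LIMITED_CODE_EXECUTION) -> bool:
    """Assumed cutoff: treat tier 3 and above as high severity."""
    return tier >= threshold


# Best tier reported for each model in the article.
best_reported = {
    "Claude Sonnet 4.6": Tier.LIMITED_CODE_EXECUTION,
    "Claude Opus 4.6": Tier.LIMITED_CODE_EXECUTION,
    "Claude Mythos Preview": Tier.CONTROL_FLOW_HIJACK,
}

for model, tier in best_reported.items():
    print(f"{model}: tier {int(tier)} ({tier.name})")
```

Using `IntEnum` keeps the tiers comparable with `>=`, which is what makes a cutoff like `is_high_severity` a one-liner.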

But let’s also be honest about what these numbers don’t tell us. Anthropic can only publicly discuss about 1% of the vulnerabilities they’ve found because the other 99% haven’t been patched yet. This limitation, while responsible, makes it impossible to independently verify the severity and impact of most of their discoveries.

Project Glasswing: Defensive Strategy or PR Move?

Anthropic’s response to discovering these capabilities was to launch Project Glasswing, a $100 million initiative bringing together Amazon Web Services, Anthropic, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks—12 major organizations total—to use Mythos Preview for defensive security work.

On the surface, this looks like responsible AI development. Instead of quietly developing a super-hacking tool, they’re rallying the industry to use it for defense first. They’ve committed $100 million in usage credits for defensive security work and $4 million in direct donations to open-source security organizations.

But I can’t help wondering if this is also brilliant positioning. By framing the announcement around defense and industry cooperation, Anthropic gets to demonstrate their model’s capabilities while appearing responsible. It’s a lot better PR than “we accidentally built a super-hacker and don’t know how to control it.”

The timing is also interesting. This announcement comes as AI companies are facing increasing scrutiny about the safety implications of frontier models. Demonstrating that you can use powerful AI for cybersecurity defense helps make the case that these models provide net benefits to society.

What This Means for the Rest of Us

The honest answer is that most of us won’t directly interact with Claude Mythos Preview anytime soon. Anthropic is keeping this model under tight access controls, limiting it to vetted organizations working on defensive security.

But the broader implications are significant. If one company can accidentally develop these capabilities as a side effect of general AI improvements, other companies probably can too.

This creates an interesting arms race dynamic. Companies developing AI need to assume that both defensive and offensive actors will have access to these capabilities soon. The advantage will likely go to whoever can deploy and iterate on these tools most effectively.

OpenAI’s recent documentation on strengthening cyber resilience suggests they’re planning for similar capabilities in their upcoming models, with their GPT-5 family improving from 27% to 76% success rates on cybersecurity capture-the-flag challenges.
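Percentages like these are easy to misread, so here is a quick sketch of the two ways to describe the change. The 27% and 76% figures are the ones quoted above; the variable names are my own.

```python
# Capture-the-flag success rates quoted in the article, in percent.
before_pct = 27
after_pct = 76

point_gain = after_pct - before_pct     # absolute change, in percentage points
relative_gain = after_pct / before_pct  # how many times higher the new rate is

print(f"+{point_gain} percentage points")
print(f"{relative_gain:.2f}x the earlier success rate")
```

Both framings describe the same data. "Nearly three times the success rate" and "49 points higher" can leave very different impressions.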

For software developers, this means the cost of shipping insecure code is about to go up dramatically. Bugs that might have taken security researchers months to find and exploit could soon be discoverable overnight by AI systems. The grace period that many organizations have relied on—where they could patch vulnerabilities after disclosure but before widespread exploitation—is shrinking.

The Economics of AI-Powered Security

One aspect that doesn’t get enough attention in the hype is the economic implications. Traditional security research requires highly skilled humans who command significant salaries and can only work so many hours per day. If AI can automate even part of this work, it fundamentally changes the economics of both attack and defense.

Anthropic’s claim that non-security engineers can now find sophisticated vulnerabilities overnight suggests we’re approaching a world where the barrier to entry for security research drops dramatically. This could democratize defensive security research, helping smaller organizations identify and fix vulnerabilities they couldn’t afford to find otherwise.

But it also potentially democratizes offensive capabilities. While Anthropic is keeping Mythos Preview under access controls, the underlying techniques that enabled these capabilities will inevitably spread to other models and organizations with fewer scruples about how they’re used.

Looking Past the Marketing

After digging through all the documentation and claims, here’s what I think is actually happening:

Anthropic has achieved a genuine breakthrough in AI’s ability to understand and manipulate code for security purposes. The technical evidence they’ve shared, while limited, is convincing that this represents a significant capability jump over previous models.

However, the framing of this breakthrough as a cybersecurity revolution feels overblown. Security professionals have been using automated tools to find vulnerabilities for decades. What’s changed is that the automation has gotten much more sophisticated and can handle more complex reasoning tasks.

The Project Glasswing initiative appears to be both a genuine attempt to ensure these capabilities benefit defenders and a savvy PR strategy that positions Anthropic as a responsible leader in AI safety.

The “thousands of vulnerabilities” claim is probably accurate but should be understood in the context of the test scale: roughly 1,000 repositories and 7,000 entry points in the OSS-Fuzz runs alone. Finding bugs in software isn’t surprising. Finding high-severity bugs reliably and autonomously is what’s noteworthy.

What Happens Next

The most interesting question isn’t whether AI can find security vulnerabilities—it’s what happens when these capabilities become more widely available. Will the advantage go to defenders or attackers? Will the overall security of software improve or degrade?

Anthropic and OpenAI both argue that defensive applications will ultimately dominate because defenders can use these tools at scale across their entire infrastructure, while attackers typically focus on specific targets. This makes intuitive sense, but it assumes that defensive organizations will adopt these tools faster than malicious actors.

The evidence from other security technologies is mixed. When fuzzing tools became widely available, they did help defenders find and fix many vulnerabilities. But they also enabled attackers to find zero-days more easily. The net effect was probably positive for security, but the transition period created new risks.

We’re likely entering a similar transition period for AI-powered security tools. The next few years will determine whether the optimists are right that defense will benefit more than offense, or whether we’ll see a temporary period where sophisticated attacks become more common before defenses catch up.

Bottom Line

Claude Mythos Preview represents a real breakthrough in AI’s cybersecurity capabilities, not just marketing hype. The technical evidence Anthropic has shared is convincing that this model can autonomously find and exploit sophisticated vulnerabilities at a scale and success rate we haven’t seen before.

But the apocalyptic framing around “AI that can hack any system” misses the more nuanced reality. This is powerful automation of tasks that skilled humans were already doing, not magic that creates vulnerabilities where none existed.

The more important story is how the industry responds to these capabilities becoming available. Project Glasswing might be as much about PR as genuine industry cooperation, but the principle is sound: if these tools are going to exist, defenders need to get their hands on them first.

For those of us building software, the message is clear: the era of security through obscurity and slow disclosure timelines is ending. AI is making it easier to find bugs in code, which means we need to get better at writing secure code in the first place.

The hype around Claude Mythos will fade, but the underlying capability shift it represents is here to stay. The question isn’t whether AI will transform cybersecurity—it’s whether we’ll use that transformation to make the digital world more secure or more vulnerable.


What do you think about AI’s role in cybersecurity? Are you seeing these capabilities emerge in your own security work? I’d love to hear about your experiences with AI security tools.