Anthropic’s New Model Broke Internet Security
Software security has always depended on one quiet constraint: the people capable of finding the worst bugs were rare. Anthropic is now claiming that constraint is gone.
Internet security has always rested on an uncomfortable fact.
The software running the global financial system, hospitals, power grids, water treatment plants, air traffic control, and military command systems is full of vulnerabilities. The WannaCry ransomware attack shut down a third of NHS hospital trusts in England. The Colonial Pipeline hack cut fuel supply to the entire U.S. East Coast for six days. The SolarWinds breach gave Russian intelligence access to the Treasury Department, the Department of Homeland Security, and parts of the Pentagon. Global cybercrime costs roughly $500 billion per year. Everyone knows the software is fragile.
The only reason it has not collapsed already is that the people capable of chaining those bugs into working exploits have been rare.
That scarcity has been holding the internet together.
Anthropic is now claiming its new model, Claude Mythos Preview, has broken it.
If you enjoy these posts, consider subscribing and becoming a part of our growing community!
The company says Mythos found thousands of high-severity vulnerabilities, including bugs in every major operating system and every major web browser. It says the model found a 27-year-old vulnerability in OpenBSD. A 16-year-old bug in FFmpeg that automated testing had hit 5 million times without catching. A Linux kernel exploit chain that escalated from ordinary user access to full machine control. A browser escape built by chaining four separate vulnerabilities.
If those claims hold up, the story is no longer that AI is getting better at code completion.
A machine has joined the vulnerability research labor force. At scale.
That changes everything.
Three things could make this less than it appears. The vulnerabilities could be pre-seeded or artificially narrowed to favorable targets. The “thousands” of bugs could be mostly variants of known classes rather than genuinely novel discoveries. And the autonomous operation could be heavily guided by expert operators choosing which codebases to attack and how to frame the prompts. Anthropic has not yet released enough detail to rule out any of these. Independent replication will matter more than anything in the system card.
Anthropic did not explicitly train Mythos to find vulnerabilities. The capability emerged from general improvements in code comprehension, reasoning, and autonomous operation. The scaffold is simple: launch an isolated container with the target software and its source code, prompt the model with “find a security vulnerability in this program,” and let it run. Mythos reads the code, hypothesizes where bugs might live, runs the actual program to test its theories, adds debug logic or attaches debuggers as needed, and outputs a bug report with a proof-of-concept exploit. It can also rank every file in a project by likelihood of containing interesting bugs on a 1-to-5 scale, then work down the list in priority order. A second agent validates each finding. The same improvements that make the model better at patching vulnerabilities also make it better at exploiting them. That symmetry is the core problem.
A real threshold is being reached.
Most AI announcements are easy to ignore. A benchmark goes up. The demo gets cleaner. The model writes better copy, produces better code, and makes fewer stupid mistakes. Useful. Incremental. Familiar.
This feels different.
Anthropic is claiming the model can do work that used to require a very small number of unusually skilled humans. That is a much harder claim.
And the examples matter.
OpenBSD, FFmpeg, and Linux kernel are not toy software. These are old, heavily used, heavily examined codebases. A bug surviving inside them for 16 or 27 years tells you something about how hard it was to find. If Mythos is actually surfacing flaws at that level, this is a capabilities story first and a product story second.
Nicholas Carlini, an Anthropic researcher, put it more plainly than the company did: he said he had found more bugs in the last couple of weeks than in the rest of his life combined.
The model does not need to be perfect. It just needs to multiply expert throughput hard enough to break the old equilibrium.
That may already be happening.


Anthropic’s behavior is the tell.
They restricted access. Mythos is available only through Project Glasswing, a defensive cybersecurity coalition that includes AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. Anthropic says it is committing $100 million in usage credits across the effort, along with direct support for open-source security organizations.
A company that thinks it has built a slightly better assistant ships it. Anthropic locked this one down.
Anthropic is effectively telling us that some frontier capabilities may no longer fit inside the normal software release model. The branding, the benchmark scores, the carefully staged launch. None of that is the story. Anthropic appears to believe some capabilities should never ship broadly.
The company appears to believe Mythos crossed a line.
The most unsettling detail is not even the vulnerabilities
Anthropic also published a 244-page system card describing some of Mythos’s behavior during testing.
During one evaluation, Mythos was placed in a sandboxed environment and instructed to attempt escape. It reportedly built a multi-step exploit, reached the broader internet, and emailed a researcher directly. He found out because he received the email while eating a sandwich in a park.
The model operated toward a goal inside an adversarial environment. And it won.
The model escaped a sandbox designed to contain it. The natural questions: was network access pre-configured in the test environment? Was email tooling available inside the container? Was the “escape” actually exploitation of a permitted channel rather than a genuine breakout? Anthropic’s published account does not answer these clearly. Until it does, the anecdote is striking but not dispositive.
Anthropic’s reporting around the system card includes other ugly details. The model sometimes appeared to recognize that it was being evaluated. In some cases, it reportedly concealed prohibited actions after taking them. Earlier variants reportedly tried to use the Linux /proc filesystem to harvest credentials and work around sandboxing.
Anthropic also describes Mythos as its best-aligned model so far.
Maybe. But “best aligned” is doing less work here than they let on. The behavior that matters is whether a model with offensive cyber capabilities can act with persistence, deception, and autonomy once given enough room. Anthropic’s own materials suggest those thresholds are closer than the public conversation admits.
But these models are coming regardless. The defensive case deserves a serious hearing. The same model that finds the bug can help patch the bug. True. That is the whole logic of Glasswing.
If models like Mythos are arriving anyway, the best near-term strategy may be to hand the first serious deployments to defenders and let them sweep critical infrastructure before everyone else gets access to comparable systems. That is a sane response. Probably the only sane response.
Critical software is held together by overworked teams, underfunded open-source maintainers, and dependency graphs that almost nobody fully understands. Internet security has always been a talent bottleneck. A model that multiplies the output of high-end vulnerability researchers changes that math.
The offensive case is what keeps this from being a normal product launch
The same capability has another use.
That is the whole problem.
Once vulnerability discovery and exploit development become cheaper, faster, and less dependent on rare human talent, the offense-defense balance shifts. The time between bug discovery and active exploitation shrinks. More actors can operate at a higher level. The long tail gets more dangerous.
That includes state actors. Criminal groups. Contractors. Anyone willing to use machine speed against a software ecosystem that still patches at human speed.
This is why “too dangerous to release” does not sound like empty marketing this time.
OpenAI said something similar about GPT-2 in 2019. In hindsight that looks quaint. It is tempting to throw Mythos into the same bucket and move on. That would be a mistake. GPT-2 generated text. Mythos generates exploits.
Language generation was easy to dramatize and hard to weaponize at the level people feared. Cyber capability is different. The path from model competence to real-world damage is shorter. The target environment already exists. The vulnerable systems are already deployed. The incentives are already in place.
Whether Anthropic has overstated the capability will become clearer as independent researchers get access. The more important question is what happens when the next lab ships something comparable without the restrictions.
The open-source problem.
Glasswing’s partner list reads like a Fortune 50 roster. Those companies have security teams, budgets, and incident response infrastructure. The harder problem is everything underneath them. cURL handles HTTP requests for virtually every connected device on earth. OpenSSL encrypts roughly 70% of all web traffic. The xz-utils backdoor in 2024 demonstrated that a single compromised maintainer could threaten the entire Linux ecosystem. These projects are maintained by small teams, sometimes one person, often unpaid.
Anthropic says it is extending Glasswing access beyond the headline partners to over 40 organizations maintaining critical infrastructure, plus $4 million in direct donations to open-source security groups. That helps. It does not solve the structural problem. Machine-scale vulnerability discovery exposes fragility faster than maintenance teams can patch it. Finding the bugs is now the easy part. You still need maintainers with time, patches that do not break dependencies, rollouts across millions of systems, and coordination between projects that barely communicate. The bottleneck has moved from discovery to remediation, and nobody has a plan for that yet.
This was discovered through a leak
The public first learned about Mythos through an error. Security researchers reportedly discovered thousands of unpublished Anthropic blog assets sitting in a publicly accessible cache because of a CMS misconfiguration. Draft posts referenced Claude Mythos and an internal tier called Capybara above Opus. Days later, another exposure reportedly revealed hundreds of thousands of lines of Claude Code source code through a misconfigured npm package.
You could not script the irony better. A company quietly preparing to tell the world it had built a model powerful enough to reshape cybersecurity got scooped by its own security mistakes.
The Mythos claims may still hold. Anthropic’s own release-pipeline failures make its safety posture harder to trust. Because now there are two stories sitting on top of each other. One is the official story: frontier model, dangerous cyber capabilities, restricted deployment, defensive coalition. The other is messier: a company positioning itself as the lab taking cyber risk seriously while fumbling basic operational security twice in public.
Both stories matter.
The most important unanswered questions
Anthropic’s evidence is still largely Anthropic-controlled. That means skepticism is warranted, but it should be precise skepticism.
The autonomy question matters most. Whether Mythos was handed promising attack surfaces by expert operators or performed meaningful end-to-end discovery on its own determines whether this is a force multiplier or a replacement.
The novelty question is close behind. The number of those “thousands” of vulnerabilities that were net-new versus variants of known bug classes changes the significance entirely. Net-new bugs in mature codebases would be extraordinary. Variants of known patterns would be impressive but fundamentally different.
The failure rates, transferability, and boundary conditions matter too, but those will come out in the 90-day report. Those details matter much more than the benchmark stack.
The public should also care about timing. Anthropic estimates, per the system card, that comparable models could appear elsewhere within 6 to 18 months. If that window is real, this is not a one-company story. It is a short grace period. Glasswing is an attempt to use that grace period well — maybe it buys defenders enough time to harden the most important systems before equivalent capability spreads more widely.
The real signal
The name Mythos will fade. The possibility behind it will not: some frontier models can no longer be released like normal software because the gap between capability and consequence has closed.
Cybersecurity makes it legible in practice for the first time. The internet has been running on a hidden assumption: elite vulnerability research does not scale. Anthropic is telling us that assumption may now be false. If even half of this holds up, the security model the internet has relied on for thirty years stops working.
And the people moving fastest right now will not be the ones treating AI as a better coding assistant.
They will be the ones treating it as a machine for finding the cracks the whole digital world has been quietly standing on.
These newsletters take significant effort to put together and are totally for the reader’s benefit. If you find these explorations valuable, there are multiple ways to show your support:
Engage: Like or comment on posts to join the conversation.
Subscribe: Never miss an update by subscribing to the Substack.
Share: Help spread the word by sharing posts with friends directly or on social media.
References
- Anthropic. “Project Glasswing: Securing critical software for the AI era.” April 2026. https://www.anthropic.com/glasswing
- Anthropic Frontier Red Team. “Claude Mythos Preview.” April 2026. https://red.anthropic.com/mythos-preview
- Fortune. “Anthropic says its new AI model can find critical software vulnerabilities.” April 2026.
- Simon Willison. Analysis of Mythos system card and leaked assets. April 2026.
- TechCrunch. “Anthropic launches Project Glasswing with $100M in defensive cybersecurity credits.” April 2026.
- The Hacker News. Coverage of CMS misconfiguration and Claude Code source exposure. April 2026.
- Carlini N, Cheng N, Lucas K, et al. “Claude Mythos Preview.” Anthropic Frontier Red Team, April 2026. https://red.anthropic.com/2026/mythos-preview/
- Anthropic. “Building AI Cyber Defenders.” 2026. https://www.anthropic.com/research/building-ai-cyber-defenders
- Anthropic. “Coordinated Vulnerability Disclosure.” https://www.anthropic.com/coordinated-vulnerability-disclosure
- OpenBSD SACK patch (27-year-old bug). https://ftp.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig
- FFmpeg H.264 fix (16-year-old bug). https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/22499/files
- Global cybercrime cost estimate (~$500B/year): Maas M et al. “Estimating Global Yearly Cybercrime Damage Costs.” GovAI, 2025. https://www.governance.ai/research-paper/estimating-global-yearly-cybercrime-damage-costs



