Anthropic Found 1,000 Zero-Days Nobody Else Could. Then They Gave Them Away.
A signed integer overflow in OpenBSD's TCP stack went unnoticed for 27 years. It survived mass fuzzing campaigns, formal audits by one of the most security-obsessed development teams on the planet, and countless third-party reviews. Then an AI model found it in hours, built a proof of concept, and moved on to the next target.
That model is Claude Mythos Preview. The vulnerability is one of roughly a thousand critical and high-severity zero-days it has identified across every major operating system and web browser. And Anthropic's response was not to sell access or stockpile exploits. It was to lock the model down and launch Project Glasswing — a $100 million defensive security initiative designed to get these findings into the hands of the people who maintain the software before anyone else gets a chance to weaponize them.
This is the most consequential thing to happen in software security since the invention of fuzzing. And the implications go well beyond one company's research preview.
The Coalition
Project Glasswing is a partnership between Anthropic and 11 of the largest technology and security companies in the world: Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks. An additional 40-plus organizations that maintain critical open-source infrastructure have access to scan and secure their systems.
The financial commitment is real. Anthropic is providing $100 million in Mythos Preview usage credits, $2.5 million to Alpha-Omega and the Open Source Security Foundation through the Linux Foundation, and $1.5 million to the Apache Software Foundation. This is not a press release partnership. It is funded infrastructure for defensive work.
Every 90 days, the coalition will publish reports on what it has found and what has been fixed. That cadence matters. It creates accountability and gives the broader security community a signal about what classes of vulnerabilities AI is surfacing.
What Mythos Can Actually Do
The numbers are striking, but they undersell the story. Mythos Preview scores 83.1% on CyberGym's vulnerability reproduction benchmark, compared to 66.6% for Claude Opus 4.6. It hits 77.8% on SWE-bench Pro. Those are benchmarks. What matters more is what it does in practice.
Mythos operates through an agentic scaffold. It reads source code, ranks files by vulnerability likelihood on a 1-to-5 scale, then dispatches parallel agents to investigate the most promising targets. Each agent hypothesizes a vulnerability, spins up an isolated container, instruments the code, and attempts to confirm or reject the hypothesis. A final quality-filtering agent evaluates severity. The entire pipeline runs without human steering.
The results are difficult to overstate.
Memory Corruption and Exploit Chains
Mythos independently constructs complex return-oriented programming (ROP) gadget chains, JIT heap sprays, and sandbox escapes that chain multiple vulnerabilities across privilege boundaries. Against Firefox, it developed working exploits 181 times and achieved register control 29 more — where Opus 4.6 succeeded exactly twice.
The FreeBSD NFS Vulnerability
One of the most impressive finds is CVE-2026-4747: an unauthenticated remote code execution flaw in FreeBSD's NFS implementation that grants root access. It is a 128-byte stack overflow in RPCSEC_GSS authentication, and it has been sitting there for 17 years. The exploit requires a sophisticated ROP chain split across six sequential network packets and bypasses stack protection because the relevant binary was compiled without -fstack-protector-strong. Mythos found it, built the exploit, and documented the reproduction steps autonomously.
The FFmpeg H.264 Bug
A slice collision vulnerability in FFmpeg's H.264 decoder dates back to 2003 when the codec was first introduced. It enables an out-of-bounds heap write. This bug survived 16 years of continuous fuzzing — the kind of coverage that is supposed to catch exactly this class of issue. Mythos found it through code-level reasoning, not brute-force input generation.
Reverse Engineering and Logic Bugs
Mythos can reconstruct plausible source code from stripped binaries, identify vulnerabilities in closed-source software, and find firmware exploits. It understands the gap between intended and implemented behavior well enough to catch cryptography implementation weaknesses in TLS, AES-GCM, and SSH — the kind of subtle logic bugs that even specialized auditors miss.
Anthropic's own team put it bluntly: "Last month, we wrote that Opus 4.6 is currently far better at identifying and fixing vulnerabilities than at exploiting them. Our internal evaluations showed that Opus 4.6 generally had a near-0% success rate at autonomous exploit development." Mythos changed that equation overnight.
Why This Changes Everything
Here is the part that should keep you up at night if you run a security team.
Friction-based defenses are dead. Address space layout randomization (ASLR), stack canaries, sandboxing — these mitigations have always been speed bumps, not walls. They work because exploiting them takes time, skill, and patience. Mythos treats them as implementation details. When a model can chain a KASLR bypass with a heap spray and a cross-cache reclamation attack in a single autonomous session, the friction that made exploitation expensive evaporates.
N-day exploitation just became trivial. The gap between a CVE identifier being published and a working exploit existing has historically been days to weeks of skilled researcher time. Mythos collapses that to hours, cheaply, without human intervention. Every unpatched system with a public CVE is now a softer target than it was last month.
The patch cycle is broken. Over 99% of the vulnerabilities Glasswing has found have not yet been patched. That is not a criticism of maintainers — it is a reflection of the volume. When a single model run against OpenBSD costs under $20,000 and surfaces hundreds of findings, the bottleneck is no longer discovery. It is remediation. And most organizations are not built to remediate at that speed.
89% accuracy on severity assessment. When Anthropic's human triagers manually reviewed Mythos's reports, 89% of them matched the model's severity rating exactly. This is not a firehose of false positives. It is a high-signal feed of real vulnerabilities with real exploits attached.
The trajectory matters as much as the current capability. Anthropic sees no reason to think Mythos is where language model cybersecurity capabilities plateau. Every generation has been materially better than the last. If Opus 4.6 was the model that could find bugs but not exploit them, and Mythos is the model that finds and exploits them autonomously, the next generation will do both faster and against harder targets.
What Defenders Should Do Now
This is not a "watch and wait" situation. Here is what is actionable today.
Start using frontier models for vulnerability discovery now. You do not need Mythos to begin. Opus 4.6 is available today through the Claude API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry. Build agentic scaffolds that scan your codebase. Design the procedures, the triage pipelines, and the remediation workflows. When model capabilities improve — and they will — you want the infrastructure already in place.
Accelerate your patch cycle dramatically. If your organization treats CVE-fix dependency bumps as routine maintenance, that posture is now a liability. Every day between a patch being available and a patch being deployed is a day that an AI-assisted attacker can build a working exploit from the CVE description alone. Enable auto-updates wherever possible. Shorten your time-to-deploy for security updates to hours, not weeks.
Automate your incident response pipeline. The volume of legitimate vulnerability reports is about to increase by an order of magnitude. You cannot triage that with the same headcount and the same manual processes. Implement model-assisted triage, alert summarization, and automated severity classification. The models are good enough at this today.
Audit your legacy software exposure. The bugs Mythos found — a 27-year-old TCP flaw, a 17-year-old NFS RCE, a 16-year-old codec vulnerability — share a pattern. They live in old, stable, trusted code that nobody re-examines because it has been running without issues for years. That is exactly where AI-powered discovery excels. Identify your oldest, most critical dependencies and prioritize them for review.
Review your disclosure policies. Coordinated vulnerability disclosure at scale looks different than one-off researcher reports. Glasswing uses a 90-plus-45-day disclosure timeline with SHA-3 commitments that prove possession of unreleased findings without flooding maintainers. If you maintain open-source software, think about how you will handle a surge in high-quality, AI-generated vulnerability reports.
The Choice Anthropic Made
Mythos Preview will not be released for general availability. Anthropic is restricting it to the Glasswing coalition and a future Cyber Verification Program for legitimate security researchers. Future Opus models will incorporate new safeguards informed by what Glasswing discovers.
This is the right call, and it is worth noting why. The offensive potential of a model that autonomously develops kernel exploits from source code is obvious. Restricting access buys time — time to patch the thousand-plus vulnerabilities already found, time to build the defensive infrastructure the industry needs, and time to develop the safeguards that future models will require.
But restriction alone is not a strategy. Anthropic knows this. The Glasswing investment — the $100 million in credits, the open-source funding, the 90-day reporting cadence, the coalition of companies that collectively touch most of the world's software infrastructure — is a bet that defense can outrun offense if you give defenders a head start.
It is the same argument I have been making on this blog: the answer to powerful AI is not to lock it in a vault. It is to put it to work for the people who build and maintain the systems we all depend on. Glasswing is that argument made concrete, at scale, with real money behind it.
The window between "AI can find these bugs" and "everyone has a model that can find these bugs" is measured in months, not years. What the security community does with that window will determine whether this moment is remembered as the beginning of a new era of software safety or the start of something much worse.
The clock is running.