Frontier AI models are gaining cyber capabilities faster than anyone expected. The UK's AI Security Institute (AISI) has revised its estimates upward twice in just a few months.
In November 2025, the agency estimated that cyber capabilities were doubling every eight months. By February 2026, it had revised that figure to 4.7 months. Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 have now "substantially exceeded" even that accelerated timeline, according to AISI. Whether this represents a new trend or a one-time jump remains unclear.
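
To put the two doubling times in perspective, they can be converted into a growth factor per year. The following is a minimal sketch, assuming steady exponential growth over the whole year (which AISI itself says is not yet established); the function name `annual_growth_factor` is purely illustrative and not taken from AISI's report.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Capability multiplier over 12 months for a given doubling time.

    Assumes constant exponential growth, which is a simplification.
    """
    return 2 ** (12 / doubling_months)


# Doubling times quoted in the article (in months).
estimates = [
    ("November 2025 estimate", 8.0),
    ("February 2026 estimate", 4.7),
]

for label, months in estimates:
    factor = annual_growth_factor(months)
    print(f"{label}: doubling every {months} months -> about {factor:.1f}x per year")
```

Under that assumption, the eight-month figure works out to roughly 2.8x per year and the 4.7-month figure to roughly 5.9x per year, so the revision more than doubles the implied annual pace.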

Mythos Preview is the first model to clear both AISI cyber ranges
The biggest gains showed up in AISI's cyber ranges, complex attack simulations designed to test real-world hacking ability. One range simulates a 32-step attack on a corporate network that human experts would need about 20 hours to complete, according to AISI. The latest Mythos Preview checkpoint, which was also rolled out to partners, finished the full attack in 6 out of 10 attempts. The previously tested Mythos version managed it in only 3 out of 10.

The model also solved "Cooling Tower," a simulation of an industrial control system, in 3 out of 10 attempts. No other model had ever passed this simulation, including the earlier Mythos version.
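
With only ten attempts per model, these success rates carry wide error bars. The sketch below computes a 95 percent Wilson score interval for the two corporate-network results (6 of 10 and 3 of 10). It is an illustrative calculation, not part of AISI's methodology, and it assumes the attempts are independent; the helper name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return center - half_width, center + half_width


# Success counts reported in the article for the 32-step corporate network range.
results = {
    "latest Mythos Preview checkpoint": (6, 10),
    "previously tested Mythos version": (3, 10),
}

for name, (wins, attempts) in results.items():
    low, high = wilson_interval(wins, attempts)
    print(f"{name}: {wins}/{attempts}, 95% interval about {low:.2f} to {high:.2f}")
```

The two intervals overlap, so the jump from 3 to 6 successes is suggestive rather than conclusive on its own. That fits AISI's own caution about whether the results mark a trend or a one-time jump.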
"The direction of travel is clear: cyber capabilities are advancing rapidly, and recent models represent a meaningful step up from what came before," AISI wrote. The agency is already building harder evaluations with active defenses to keep pace with the technology.
XBOW confirms source code analysis strength but sees limits
Offensive security firm XBOW independently tested Mythos Preview with a team of ten experts. The company called the model "a major advance" and said it shows, "token-for-token," an "unprecedented precision" in vulnerability detection. Compared to Anthropic's Opus 4.6, Mythos Preview cut false negatives by 42 percent. With additional source code access, that reduction reached 55 percent.

Mythos Preview's biggest strength is source code analysis, according to XBOW. "This was the first instance of a theme that would surface again and again: Mythos Preview is impressive at writing code, but even more impressive at reading it," the report states. The model even found vulnerabilities in Chromium's V8 sandbox, an area where previous models had produced nothing but false positives.
Still, XBOW's evaluation also exposed the limits of that strength. Access to a running system is often more important than access to source code, since many vulnerabilities only emerge from configuration, dependencies, or the interaction between individually secure components.
Even on benchmarks where the vulnerability existed purely in code, removing live system access hurt performance more than removing source code access. Mythos Preview reads code exceptionally well but still depends on interacting with live systems to reach its full potential.
Capable but expensive: costs put the lead in perspective
XBOW raises a question that matters given the sharp rise in AI model pricing: Is the performance worth the cost? Anthropic has announced that Mythos Preview could cost five times as much as an Opus model.
When normalized by estimated operating costs, Mythos Preview "isn't terribly inefficient, at least if you desire high accuracy, but it’s not best-in-class on our benchmarks either," XBOW writes. The alternative would be giving a GPT-5.5-powered agent more time. Often, that delivers equivalent or better results at a lower cost.
"The better option depends on the use case; often, it’s the latter," XBOW writes. The company recommends deploying a "cadre of models" rather than betting on a single one.

Anthropic: "Within a year, Mythos will probably look quite dumb"
Logan Graham, who leads red-teaming around Project Glasswing at Anthropic, put the results in context: Glasswing partners used Mythos Preview to find "many thousands of (estimated) high + critical severity vulnerabilities" in just a few weeks, "sometimes double what they'd normally find in a year."
But Graham stressed this isn't about hyping a single model. "Within a year, Mythos will probably look quite dumb (relative to other new models)."
The real message, he said, is preparing for a world where models are "better, faster, cheaper, and more creative than some of the best human experts at dual use capabilities." Other providers could release openly available or unsecured models at Mythos-level performance.
Cybersecurity is becoming even more political
Anthropic introduced Claude Mythos in early April and restricted access to roughly 50 companies, officially for safety reasons. Some critics called the restrictions overblown or dismissed them as a PR move.
The truth is probably somewhere in between: Claude Mythos may not be an unprecedented outlier, but it is the first publicly announced model of its kind with significantly advanced cyber capabilities that go well beyond what was previously known.
That creates pressure to act across the software industry and in politics alike. The US government is closely examining Claude Mythos and already testing the model, while Anthropic is blocking access for China and apparently the EU as well. OpenAI at least reached out to the EU to discuss early access to GPT-5.5-Cyber. Either way, the situation shows how deeply the European Union depends on the goodwill of major US tech companies, largely because comparable European products don't exist.
