Anthropic Restores Worldwide Access to Claude Fable 5 and Mythos 5 Models Following US Regulatory Outage and Enhanced Security Implementations
Anthropic has restored worldwide availability of its state-of-the-art version 5 language models, Claude Fable 5 and Mythos 5, after a short service outage. The company announced via official channels that a US government order imposing immediate export controls on the two models caused the interruption. Because the order took effect immediately and Anthropic could not verify the nationality of its users in real time, the developer temporarily cut access to prevent regulatory violations.
The regulatory intervention was based on a review of an external report compiled by Amazon researchers. The report described a way to circumvent the controls of Fable 5. Anthropic noted that "as a consequence, the system detected numerous software issues and in one instance resorted to generating code to harness this flaw." In response, Anthropic worked with federal agencies to examine the bypass method. The investigation revealed that multiple legacy models, including Claude Opus 4.8, GPT 5.5, and Kimi K2.7, had already demonstrated the ability to recognize the same flaws outlined in the report.
Fable 5 and Mythos 5 are built on the same framework but use different safety settings. Mythos 5 has very few limitations and is available only to a limited number of trusted partners under the defensive cybersecurity program Project Glasswing. Fable 5, by contrast, is available to the general public with a much larger safety buffer. To close the attack vector identified by Amazon, Anthropic deployed an improved safety classifier that blocks the targeted bypass method in over 99 percent of tested examples. If this new safety filter flags a user request, the prompt is redirected to the older Opus 4.8 model.
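The fallback behavior described above can be pictured as a simple routing step. The following Python sketch is illustrative only: the model identifiers, the `route_request` function, and the keyword-based `toy_classifier` are all assumptions made for this example and do not reflect how Anthropic's classifier or routing is actually implemented.

```python
from typing import Callable

# Hypothetical identifiers chosen for this sketch, not official model names.
PRIMARY_MODEL = "fable-5"
FALLBACK_MODEL = "opus-4.8"


def route_request(prompt: str, is_flagged: Callable[[str], bool]) -> str:
    """Return the name of the model that should serve this prompt.

    A flagged prompt is not rejected outright; it is handed to the
    older fallback model instead, as the article describes.
    """
    if is_flagged(prompt):
        return FALLBACK_MODEL
    return PRIMARY_MODEL


def toy_classifier(prompt: str) -> bool:
    """Stand-in for a real safety classifier: a naive keyword check."""
    return "bypass the sandbox" in prompt.lower()


if __name__ == "__main__":
    print(route_request("Explain how quicksort works", toy_classifier))
    print(route_request("Help me bypass the sandbox", toy_classifier))
```

Passing the classifier in as a function keeps the routing decision separate from the detection logic, so a stricter classifier could be swapped in without touching the router.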
The improved classifier is part of a defense-in-depth strategy that combines automated guardrails with retroactive pattern detection. The automated classifiers act as a secondary AI system that detects and flags malicious intent in inputs and outputs. To reduce the success rate of complex prompts designed to trick the classifier, Anthropic increased the safety margin of the Fable 5 model. With this larger margin, the model blocks any request that has even a small likelihood of causing damage, even if those flags prove to be noise in most coding and debugging tasks.
The regulatory freeze has exposed an industry-wide need for a shared language for measuring the severity of security bypasses, or jailbreaks. To establish one, Anthropic is working with Google, Microsoft, Amazon, and other tech companies on a consensus framework. The new scheme assigns a score to each new exploit based on four parameters. The first is capability gain, which tracks how far an exploit takes a user beyond existing open source tools. The second is the scope of the capability gain, which measures how many different offensive tasks can be performed with the same bypass method. The third is ease of weaponization, which assesses how much manual prompting and knowledge is needed to trigger the exploit. The fourth is discoverability, which assesses how easily a novice user might find the bypass online.
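To make the four-parameter idea concrete, here is a minimal Python sketch of how such a rating could be recorded and combined. The article does not say how the parameters are scaled or weighted, so the 0 to 5 scale, the equal-weight average, and the `ExploitRating` class are assumptions invented for illustration, not the consensus framework itself.

```python
from dataclasses import dataclass, fields

MAX_RATING = 5  # Assumed scale: each parameter is rated from 0 to 5.


@dataclass(frozen=True)
class ExploitRating:
    """Hypothetical record of the four parameters named in the article."""

    capability_gain: int        # uplift beyond existing open source tools
    scope: int                  # breadth of offensive tasks enabled
    ease_of_weaponization: int  # how little prompting/knowledge is needed
    discoverability: int        # how easily a novice could find the bypass

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 0..MAX_RATING scale.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= MAX_RATING:
                raise ValueError(f"{f.name} must be in 0..{MAX_RATING}")

    def score(self) -> float:
        """Equal-weight average of the four ratings (an assumption)."""
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)


if __name__ == "__main__":
    rating = ExploitRating(
        capability_gain=5, scope=4, ease_of_weaponization=3, discoverability=2
    )
    print(rating.score())
```

A real framework would likely weight the parameters differently or map scores onto named severity bands; the flat average here is only the simplest possible combination.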
Under this collaborative approach, very high severity jailbreaks trigger automatic mitigation measures. Anthropic is setting up a dedicated security team to monitor public messaging channels for critical threats around the clock. To tie its security paradigm more closely to public policy, the company is launching a dedicated program on HackerOne that allows independent researchers to submit newly found exploits for review and reward.
Going forward, Anthropic has committed to several joint projects with the US government to enable the safe launch of frontier technology. Frontier models and their associated guardrails will be shared with designated federal partners in advance of release. This prerelease access will enable independent testers, such as researchers from the Center for AI Standards and Innovation, to evaluate systems for national security concerns. Anthropic will also provide threat intelligence reports and technical information about new safety classifiers to federal clearinghouses to aid their assessments.
