Cybersecurity researchers frustrated with Anthropic’s Fable AI guardrails

Anthropic released its latest AI model, Fable, on Tuesday, positioning it as a public version of Mythos, its powerful cybersecurity-focused model. But the launch has frustrated cybersecurity professionals, who say the safety restrictions are too aggressive and impractical for real-world security work.

The complaints highlight a growing tension in the AI industry between safety and practical utility, especially for specialized professional applications. As AI companies face pressure to prevent misuse of their models, they risk making those models less useful to the legitimate experts who need them most.

“[Fable] rejects any request that could be tangentially cyber related. Even innocuous tasks like reading a blog post,” said Valentina “Chompie” Palmiotti, a security researcher at IBM X-Force. When Fable’s safety measures are triggered, the AI pauses conversations and explains that its “safety measures flagged this message for cybersecurity or biology topics.”

The guardrails exist to prevent Fable from being used to develop malware or compromise software systems. The biology restrictions stem from similar concerns about the development of biological weapons. Together, these safety measures reflect Anthropic’s cautious approach to deploying powerful AI models that bad actors could misuse.

Anthropic initially released the full Mythos model in April through Project Glasswing, limiting access to select companies and organizations working on critical infrastructure security. Last week, the company expanded Mythos access to hundreds of organizations across 15 countries.

However, security experts say Fable’s restrictions are poorly calibrated. Matt Suiche, a cybersecurity veteran at the AI security startup Tolmo, explained that “if you ask it to write secure code, it assumes it is cybersecurity related work instead of software engineering best practices, and you get downgraded.” When Fable hits these guardrails, it automatically falls back to the less capable Claude Opus 4.8 model.

The filtering system appears to rely heavily on keywords rather than context. “It seems to be keyword based, so anything in the lexical field of ‘cybersecurity’ triggers the guardrails,” Suiche noted. Filtering of this kind creates problems for legitimate security professionals trying to perform routine tasks.

Other researchers have reported similar issues:

  • Basic code review requests trigger safety warnings
  • Reading cybersecurity blog posts is blocked
  • General software security questions are rejected

Despite the frustrations, some experts understand Anthropic’s cautious approach. “It’s better to catch more people than not enough when you do such a release and to relax the guardrails over time,” Suiche said. He expects the restrictions to evolve as AI companies work more closely with cybersecurity firms to refine their safety measures.

The situation reflects a broader challenge for AI companies as they try to balance safety with utility. Similar tensions have emerged across the industry as models become more capable and, in the wrong hands, potentially more dangerous.

Anthropic offers a potential workaround through its Cyber Verification Program, which allows approved cybersecurity professionals to use Claude with fewer restrictions. OpenAI operates a similar initiative called Trusted Access for Cyber. However, these programs add bureaucratic hurdles that some researchers find burdensome for routine work.

Anthropic did not respond to requests for comment about the guardrail complaints or potential adjustments to Fable’s restrictions.