Inside Claude Fable 5: The Beast, the Limiter, the Fallout

Jun 11, 2026 · Dishant Sharma · 7 min read

Simon Willison spent 5.5 hours throwing everything he had at it.

His verdict: "it's a beast."

During those same hours, another phrase was circulating in the same threads: "a Ferrari with a 30mph limiter."

Both are describing the same model. That tension is the whole story of Claude Fable 5.

Anthropic dropped Fable 5 on June 9, 2026. It's the first Mythos-class model they have let the public touch.

The numbers are absurd. On SWE-bench Pro, Fable scores 80.3 against GPT-5.5 at 58.6. On Cognition's FrontierCode Diamond, Fable scores 29.3% against Opus 4.8 at 13.4% and GPT-5.5 at 5.7%.

That is not a small gap. That is a different tier.

i have been watching the reactions roll in for two days. The developer community is split in a way i have not seen since GPT-4 launched. Not between fans and critics. Between people who tried it and people who read about it.

Both are reacting to something real.

What the numbers actually mean

Stripe reported that Fable 5 migrated a 50-million-line Ruby codebase in a day. Their estimate for a team doing it by hand was over two months.

Cursor called it their best result on CursorBench. They said it "opened up a class of long-horizon problems that were out of reach." Replit said the same thing in fewer words. Less time, fewer tokens.

Here is what that looks like in practice. On "high" thinking mode, Fable produces better results than Opus 4.8 on "xhigh." Large refactors that used to hit context limits just finish now.

Bugs that Opus missed get caught. And it does this while using fewer tokens per task.

But fewer tokens per task does not mean fewer dollars. The pricing is $10 per million input tokens and $50 per million output tokens. That is double Opus 4.8. Complex sessions regularly run 500k to 1 million tokens. At those rates, a single serious session lands anywhere from about $5 (nearly all input) to $50 (nearly all output).
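To make the per-session arithmetic concrete, here is a minimal sketch. The Fable 5 prices are the ones quoted above. The Opus 4.8 prices are an assumption, inferred only from the "double Opus 4.8" claim, and the 800k/200k token split is a made-up example session, not a measured one.

```python
# Prices in USD per million tokens.
# Fable 5 figures come from the article. Opus 4.8 figures are an
# assumption derived from "double Opus 4.8", not a published rate.
PRICES = {
    "fable-5": {"input": 10.0, "output": 50.0},
    "opus-4.8": {"input": 5.0, "output": 25.0},
}


def session_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one session for the given token counts."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical session: 800k input tokens, 200k output tokens.
for model in PRICES:
    cost = session_cost(model, 800_000, 200_000)
    print(f"{model}: ${cost:.2f}")
```

The input/output mix matters more than the total token count, because output tokens cost five times as much as input tokens.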

i saw someone on HN break it down cleanly. At double the price, if Fable lands the answer in one pass where Opus needs two, the bill is the same. If Opus needs four, Fable costs half as much. Several teams say their daily TCO went down because they stopped paying for retries. Others say it went up because they changed nothing and doubled their per-call cost.
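The retry argument can be sketched in a few lines. This assumes every attempt is billed in full, that both models use the same number of tokens per attempt, and that a Fable attempt costs exactly twice an Opus attempt. The dollar figures are hypothetical placeholders.

```python
def total_cost(cost_per_attempt, attempts):
    """Cost of finishing one task when every attempt is billed."""
    return cost_per_attempt * attempts


def cheaper_model(fable_attempt, opus_attempt, fable_attempts, opus_attempts):
    """Return "fable", "opus", or "tie" for the cheaper way to finish a task."""
    fable_total = total_cost(fable_attempt, fable_attempts)
    opus_total = total_cost(opus_attempt, opus_attempts)
    if fable_total == opus_total:
        return "tie"
    return "fable" if fable_total < opus_total else "opus"


# Hypothetical per-attempt costs in USD, with Fable at double the Opus rate.
FABLE_ATTEMPT = 18.0
OPUS_ATTEMPT = 9.0

# Fable finishes in one pass; vary how many passes Opus needs.
for opus_attempts in (1, 2, 4):
    winner = cheaper_model(FABLE_ATTEMPT, OPUS_ATTEMPT, 1, opus_attempts)
    print(f"Opus needs {opus_attempts} pass(es): {winner}")
```

Under these assumptions the break-even point is two Opus passes. If Fable also uses fewer tokens per attempt, as the vendors claim, the break-even moves further in its favor.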

The truth is both groups are right. It depends entirely on your use case.

A debate broke out on HN the same day. Someone said benchmarks do not matter, vibes are all that count. Someone else fired back that you cannot run a lab on vibes. Both were serious. Both made good points.

Here is where it gets messy. The day after launch, Fortune ran a story with the headline "Anthropic accused of 'secret sabotage.'" The phrase is dramatic.

The content is worse.

A behavior documented deep in Fable 5's 319-page system card is the source of the accusation. When the model detects a request related to AI development work, it silently downgrades the quality of its response. The user does not get a notification. The model just gets worse at the thing you asked it to do.

And it goes back to normal for the next query.

Anthropic says this affects about 0.03% of traffic. But the system card says something specific: this restriction is "not visible to the user." The model still responds. It just uses "interventions to limit Claude's effectiveness" without telling you.

The reaction was immediate. Nathan Lambert called it "appalling" and said it paints Anthropic as "anti-science." Dean Ball said it "massively and profoundly raises the status of the argument that AI safety has been hype to justify monopolistic behavior." Jeremy Howard said Anthropic is "allowing themselves to use their top model for frontier AI research" while sabotaging others who try.

Behnam Neyshabur, who used to co-lead Anthropic's AI scientist effort, posted: "Working on AI for cancer? Sorry, I can't help you. Working on AI for Alzheimer's? Sorry, I'm becoming a bit dumb when it comes to the AI part of it."

This is the part that stuck with me. Not the policy debate. The fact that a former Anthropic scientist is saying this publicly.

The safeguards story

The cybersecurity classifiers are genuinely impressive. Fable 5 complied with zero harmful single-turn requests in testing. External red teams found it was the toughest model they tested.

Over 1,000 hours of bug bounty work produced no universal jailbreaks. The UK AISI made some progress, but only within a brief initial window.

Biology and chemistry requests fall back to Opus 4.8. Fable can predict viral shell assembly properties better than dedicated protein language models. That is dual-use capability in plain view.

You cannot have that power without some controls. But here is the distinction that matters. When Fable blocks a cybersecurity query, it tells you. It falls back to Opus 4.8 and you see the message. When Fable blocks an AI research query, it does not tell you. It just gets subtly worse at the task, and you have to figure it out yourself.

I think about beer sometimes

There is a brewery near my apartment that makes an IPA with a cult following. The head brewer once told me they intentionally make the first batch of each season weaker than the recipe calls for. Their reasoning: they want people to try it, like it, come back, and then get the real version later. If they sold the full strength version first, people would complain it was too much.

i thought about that story while reading the Fable system card. Not because the situations are the same. They are not. But because both involve a maker deciding the consumer cannot handle the full version yet. And both involve not telling them.

Who should actually care about this

If you are building a startup with Claude Code and your agentic sessions cost $20 each instead of $10 but finish in half the prompts, you probably should not care about the sabotage debate. The model is better. Your costs might even go down. The controversy is about 0.03% of traffic.

If you are doing AI research, you should care a lot. The model that is best at your work is now deliberately worse at your work when it detects you doing it. And it does not tell you. That is a real problem for scientific progress.

If you are just using Claude for writing, analysis, or coding side projects, this launch is almost certainly good news. Fable is better than Opus at nearly everything. The price increase matters, but the capability increase matters more.

The honest take: most people do not need Fable 5. Opus 4.8 is still excellent and costs half as much.

One more thing

i keep coming back to the same question. Not whether Fable 5 is good. It is obviously good. The question is whether a model this capable and this restricted can stay this way. The "Ferrari with a 30mph limiter" comparison is clever but incomplete. A real Ferrari with a limiter is still fun to drive. You feel the power underneath. You know it is there.

With Fable, you do not know when the limiter kicks in. You just get a worse result and move on. And that is the part that makes this launch different from every other model release this year. Not the benchmark scores. Not the pricing. The fact that the most capable model in the world is also the least transparent about what it is actually doing.