Claude Fable 5 Is Back. Its Top Coding Score Depends on Who's Counting

Jeffrey Liu · 4 min read
Key Takeaways

  1. Anthropic restored Claude Fable 5 on July 1, 2026, nineteen days after a June 12 US export-control directive took it offline; the Commerce Department lifted the order June 30.
  2. Fable 5's 80.3% SWE-Bench Pro score is vendor-reported on Anthropic's own scaffolding; Scale AI's standardized leaderboard has no Fable 5 entry.
  3. Three numbers currently claim SWE-Bench Pro leadership: 59.1% (GPT-5.4 xHigh, Scale standardized), 69.2% (Opus 4.8, active vendor aggregate), 80% (Fable 5, all-time vendor aggregate).
  4. Independently confirmed Fable 5 figures: 95.0% SWE-Bench Verified (vals.ai) and 1,932 GDPval-AA (Artificial Analysis). Epoch AI's evaluation is still pending.
  5. At $50 per million output tokens, Fable 5 delivers about 1.6 benchmark points per output dollar; open-weights MiniMax M2.7 delivers about 78.

Anthropic restored Claude Fable 5 on July 1, 2026, nineteen days after a US export-control directive forced the company to switch off its flagship model for every customer in the world. The model returns to the same seat it left: the highest coding score on record, 80.3% on SWE-Bench Pro. Whether that seat is real depends on which leaderboard you read.

What happened to Claude Fable 5?

Claude Fable 5 was offline from June 12 to July 1, 2026 because of a US export-control order, not a technical failure. Anthropic launched the model on June 9 alongside Claude Mythos 5, its less-restricted sibling for vetted partners. Three days later, the government applied export controls after learning of a report in which Amazon researchers bypassed Fable 5's safeguards and got it to identify software vulnerabilities. Anthropic's own testing later found that every model it checked, including GPT-5.5 and its older Claude models, could reproduce the same demonstration. The government lifted the controls on June 30, and Anthropic restored Fable 5 across Claude.ai, the Claude Platform, Claude Code, and Cowork the next day, with a new safety classifier that blocks the reported technique in over 99% of cases and routes blocked requests to Opus 4.8. Mythos 5 remains limited to approved partners. (Anthropic's full account)

Nineteen days is a long time in this market. While Fable 5 sat dark, OpenAI announced GPT-5.6 as a gated preview and Claude Opus 4.8 held the top active score. The suspension turned "which model is best at coding" from a settled question into a contested one, and the restoration does not settle it back.

Does Fable 5 actually lead SWE-Bench Pro?

On Anthropic's own scaffolding, yes. On a neutral harness, nobody knows, because Fable 5 has never been scored on one. Three different numbers currently claim the top of SWE-Bench Pro, and all three are technically real:
    • 59.1%: GPT-5.4 (xHigh), the best score on Scale AI's standardized public leaderboard, where every model runs identical scaffolding.
    • 69.2%: Claude Opus 4.8, the best active model on the llm-stats vendor aggregate, where each lab reports its own tuned-harness results.
    • 80%: Claude Fable 5, the all-time high on that same vendor aggregate, produced with Anthropic's own tooling.
The spread between those numbers is not model quality. It is scaffolding. Vendor harnesses with tuned context retrieval and tool use routinely score 10 to 30 points above Scale's standardized runs of the same model. When a launch headline quotes a SWE-Bench Pro score far above the standardized leaderboard, that is the tell.

Which Fable 5 numbers are independently confirmed?

Two figures survive contact with third parties. The independent leaderboard vals.ai confirms Fable 5 at 95.0% on SWE-Bench Verified, a separate and easier benchmark. Artificial Analysis independently measures 1,932 on GDPval-AA. Those are the defensible numbers today.

The 80.3% SWE-Bench Pro figure comes from Anthropic's system card and has no independent confirmation. Independent evaluators explicitly flag it as vendor-reported, and Epoch AI's neutral evaluation of Fable 5 was still pending as of mid-June. Until it publishes, every Fable 5 SWE-Bench Pro comparison carries an asterisk.

What does the top score cost?

Fable 5 is the most expensive way to buy benchmark points on the market. At $50 per million output tokens, its 80-point vendor score works out to roughly 1.6 SWE-Bench Pro points per dollar of output price (80 ÷ 50). Open-weights MiniMax M2.7 scores 56.2% at $0.72 per million output tokens, roughly 78 points per dollar (56.2 ÷ 0.72). The highest absolute score on the leaderboard is also its worst value, and that is before accounting for a tokenizer that can emit up to 35% more tokens per request than pre-4.7 Claude models. (Morph's June 28 leaderboard analysis)
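The value figures above are simple division, and a short sketch makes the arithmetic easy to re-run with other prices. The function name `points_per_dollar` and the `token_inflation` parameter are illustrative assumptions, not anything from Anthropic or Morph. Treating the 35% tokenizer figure as a flat multiplier on output price is also an assumption, since the article gives it only as an upper bound.

```python
def points_per_dollar(score_pct: float,
                      price_per_million_output_usd: float,
                      token_inflation: float = 0.0) -> float:
    """Benchmark points per dollar of per-million-output-token price.

    token_inflation is a hypothetical adjustment: 0.35 means the model
    emits 35% more output tokens for the same request, which raises the
    effective price by the same factor.
    """
    effective_price = price_per_million_output_usd * (1.0 + token_inflation)
    return score_pct / effective_price


# Scores and prices as quoted in the article.
fable = points_per_dollar(80.0, 50.0)
minimax = points_per_dollar(56.2, 0.72)

# Worst case for Fable 5 if the full 35% token overhead applied (assumption).
fable_worst = points_per_dollar(80.0, 50.0, token_inflation=0.35)

print(f"Fable 5:              {fable:.2f} points per dollar")
print(f"MiniMax M2.7:         {minimax:.2f} points per dollar")
print(f"Fable 5 (+35% tokens): {fable_worst:.2f} points per dollar")
```

The metric divides a percentage score by a list price, so it says nothing about how many tokens a real task consumes. It is a ranking heuristic, not a cost forecast.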

What should teams actually do with this?

Read benchmark scores as two facts, not one: the number, and the harness that produced it. For production model decisions, Scale's standardized commercial-set scores are the closest proxy for private-codebase reality, the standardized public set is the clean cross-model comparison, and vendor-reported numbers are an upper bound under ideal tooling. Fable 5 is back and it may well be the strongest coding model available. The honest version of that sentence is that its strongest evidence is still graded on its own equipment.
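One way to make the "two facts" rule concrete is to store the harness next to every score and refuse to rank across harnesses. The sketch below is a hypothetical data model: `BenchmarkResult`, `comparable`, `leader`, and the harness labels `"scale-standardized"` and `"vendor-reported"` are all illustrative names, and the three records are the SWE-Bench Pro figures quoted in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BenchmarkResult:
    model: str
    benchmark: str
    score_pct: float
    harness: str  # hypothetical label for how the score was produced


def comparable(a: BenchmarkResult, b: BenchmarkResult) -> bool:
    """Two scores are only comparable on the same benchmark and harness."""
    return a.benchmark == b.benchmark and a.harness == b.harness


def leader(results: list[BenchmarkResult],
           benchmark: str,
           harness: str) -> Optional[BenchmarkResult]:
    """Top score within one benchmark/harness pair, or None if no entries."""
    pool = [r for r in results
            if r.benchmark == benchmark and r.harness == harness]
    return max(pool, key=lambda r: r.score_pct) if pool else None


results = [
    BenchmarkResult("GPT-5.4 (xHigh)", "SWE-Bench Pro", 59.1, "scale-standardized"),
    BenchmarkResult("Claude Opus 4.8", "SWE-Bench Pro", 69.2, "vendor-reported"),
    BenchmarkResult("Claude Fable 5", "SWE-Bench Pro", 80.0, "vendor-reported"),
]

for harness in ("scale-standardized", "vendor-reported"):
    top = leader(results, "SWE-Bench Pro", harness)
    print(harness, "->", top.model if top else "no entry")
```

With this shape, "who leads SWE-Bench Pro" has no single answer: the lookup has to name a harness, and Fable 5 simply has no record under the standardized one.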

FAQ

Fable 5 went offline because a June 12, 2026 US export-control directive forced Anthropic to disable Fable 5 and Mythos 5 globally. The Commerce Department lifted the order June 30 and Anthropic restored Fable 5 on July 1, 2026.

Fable 5's SWE-Bench Pro score is 80.3% per Anthropic's system card (80.0% on the llm-stats vendor aggregate), produced on Anthropic's own scaffolding. Scale AI's standardized leaderboard has no Fable 5 entry, so there is no neutral-harness score.

The independently confirmed Fable 5 results are SWE-Bench Verified at 95.0% (vals.ai) and GDPval-AA at 1,932 (Artificial Analysis). Epoch AI's independent evaluation is pending.

Fable 5 costs $50 per million output tokens, the highest output price of any model on the SWE-Bench Pro leaderboard.
