We cry about AI tools so you don't have to.

Pricing Watch

Anthropic Raised Your Claude API Bill Three Times in 90 Days. They Never Called It a Price Increase.

Between February and May 2026, Anthropic killed its cheapest model, shipped a new flagship whose tokenizer silently inflates your bill by up to 35%, and is about to break every app still pointing at the original Claude 4 release strings. The rate card says $5/MTok. Your actual bill may disagree.

Disclosure: Tool Crier may earn a commission if you buy through links in this article. We paid for these tools ourselves or tested via free trials. Commissions don't influence what we write.


Between February and May 2026, Anthropic’s official pricing page stayed locked at $5 per million input tokens for Opus. The rate card never budged. But we’ve been running the same workloads across those months, and our bills grew anyway — because Anthropic shipped three separate cost increases dressed up as model upgrades and routine maintenance.

The Three Increases — And When They Hit You

Increase #1 hit April 20, 2026: Anthropic retired Claude Haiku 3, its cheapest model at $0.25 per million input tokens. Anyone running cost-sensitive tasks on Haiku 3 had two choices: stay on a dead, unsupported model or migrate to Haiku 4.5 at $1 per MTok. If your inference pipeline depended on Haiku 3's $0.25 rate, the forced migration is a 4x per-token price increase whether or not you wanted it.

Increase #2 arrived in May 2026: Opus 4.7 shipped with a new tokenizer that Anthropic documented explicitly: “This new tokenizer may use up to 35% more tokens for the same fixed text.” Translation: identical prompts that consumed 1M tokens under Opus 4.6 now consume up to 1.35M tokens. The per-token rate is unchanged. The bill isn’t.

Increase #3 lands June 15, 2026: Anthropic is retiring both Claude Opus 4 and Claude Sonnet 4, forcing migration to Opus 4.7 (the tokenizer-inflated model) on a hard deadline. Any application still calling claude-opus-4-20250514 will error out that day. No opt-out, no grace extension.

The Math on Increase #2: Why Token Inflation Matters

We modeled a coding agent running on Opus 4.6, processing 1M input tokens plus 200K output tokens daily. That's $10/day ($5 x 1M input + $25 x 0.2M output), or $300 over a 30-day month. The same workload on Opus 4.7, applying the 35% tokenizer ceiling to both input and output counts, lands at $405/month. That's $105/month more on identical prompts, identical outputs, identical use case.
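The arithmetic above is simple enough to check yourself. Here's a minimal Python sketch of it; the rates and the 35% ceiling come from Anthropic's published numbers, while the function name and structure are ours:

```python
OPUS_INPUT_RATE = 5.00      # $ per million input tokens (rate card)
OPUS_OUTPUT_RATE = 25.00    # $ per million output tokens (rate card)
TOKENIZER_INFLATION = 1.35  # documented ceiling for the Opus 4.7 tokenizer

def monthly_cost(input_mtok_per_day, output_mtok_per_day,
                 inflation=1.0, days=30):
    """Monthly spend in dollars for a fixed daily workload."""
    daily = (input_mtok_per_day * OPUS_INPUT_RATE
             + output_mtok_per_day * OPUS_OUTPUT_RATE) * inflation
    return daily * days

baseline = monthly_cost(1.0, 0.2)                      # Opus 4.6: $300
ceiling = monthly_cost(1.0, 0.2, TOKENIZER_INFLATION)  # Opus 4.7 worst case: $405
print(baseline, ceiling)
```

Swap in your own daily token volumes to see what the ceiling does to your bill; the real inflation depends on your content, as we cover below.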

Anthropic’s official pricing table is explicit: the new tokenizer is baked into Opus 4.7’s architecture. There’s no opt-out path. If you’re running subscription-tier Claude and wondering where the API fits into your stack, our Claude Max 5x vs 20x breakdown covers the consumer-tier cost picture separately.

What Anthropic’s Announcements Actually Said

The three increases were never announced as price hikes, because per-token rates technically didn’t change. What changed was buried in product copy:

Haiku 3’s retirement notice read like a performance upgrade: “retiring legacy models to improve performance.” What it meant was that the cheapest tier in the catalog was gone. Opus 4.7’s tokenizer note was tucked inside a pricing footnote about improved task performance, the kind of sentence you skip when you’re excited about capability gains. And the Opus 4 / Sonnet 4 sunset framing (“all models actively maintained at their best versions”) made forced migration sound like routine ecosystem hygiene rather than a deadline that will break production apps.

The rate card staying clean is doing a lot of work here. We don’t think Anthropic is being deceptive (the tokenizer inflation is documented). But it takes real effort to connect the three events and realize your effective API cost went up 30%+ in 90 days without a single rate card change.

What Actually Cuts Cost: It’s Not Magic

We audited the two levers Anthropic offers:

Prompt caching reduces cache-read costs to 10% of standard input pricing. It works well if you’re reusing the same context across multiple calls: system prompts, document corpora, shared conversation history. A one-shot request gets nothing from it. According to Finout’s analysis, caching can recover up to 90% of the tokenizer overhead, but only for repetitive workloads where the same content appears in every request.

Batch API applies a flat 50% discount on both input and output tokens. It’s asynchronous: results come back in minutes or hours, not seconds. For a coding agent that needs real-time responses, batch mode is off the table. For bulk document processing, it saves $150/month on that same $300 baseline.

Stacking both together (batch API for non-urgent work, caching for repeated contexts) is the most effective cost defense we’ve found. But neither tool fully recaptures the 35% tokenizer tax if your workload doesn’t fit their design assumptions. For a wider view of where Anthropic’s API sits in the current developer cost landscape, our LLM API price war breakdown has the full cross-provider comparison.
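To make the stacking concrete, here's a hypothetical blended-cost estimate for input spend. The 10% cache-read rate and 50% batch discount are Anthropic's published numbers; the traffic split percentages are assumptions we picked purely to illustrate:

```python
INPUT_RATE = 5.00       # $/MTok, standard Opus input
CACHE_READ_RATE = 0.50  # cache reads bill at 10% of standard input
BATCH_DISCOUNT = 0.50   # batch API halves input and output pricing

def blended_input_cost(total_mtok, cached_frac=0.0, batched_frac=0.0):
    """Estimated input spend when part of the traffic hits the prompt
    cache and a disjoint part goes through the batch API."""
    cached = total_mtok * cached_frac * CACHE_READ_RATE
    batched = total_mtok * batched_frac * INPUT_RATE * BATCH_DISCOUNT
    standard = total_mtok * (1 - cached_frac - batched_frac) * INPUT_RATE
    return cached + batched + standard

# Assumed split: 30 MTok/month of input, 60% cache hits, 20% batchable
print(blended_input_cost(30, cached_frac=0.6, batched_frac=0.2))
# vs. 30 MTok at full freight:
print(blended_input_cost(30))
```

Under those assumptions the blended bill lands at $54 instead of $150, which is why the split between cacheable, batchable, and real-time traffic matters more than either discount in isolation.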

The June 15 Deadline: What Needs to Change

In 36 days, every application still calling Opus 4 or Sonnet 4 will throw errors. Here’s the audit we ran:

Model string audit. We grepped our codebases for claude-opus-4-20250514 and claude-sonnet-4-20250514, both in application code and in .env files and configuration management. Anthropic’s migration guide documents the full API breaking changes in Opus 4.7, including the new extended thinking syntax and the removal of sampling parameters. Worth reading before you flip the string.

Cost benchmark before full cutover. The 35% tokenizer inflation is the ceiling. Your workload might land at 20% or 10% depending on content type and domain. We ran comparisons on real traffic samples before committing. As Nerova explains, per-token rate comparisons are less useful than per-task cost modeling. Measure what your inference actually spends on your data, not what the rate card predicts.
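A minimal way to do that measurement: pull the token counts from the `usage` field of real API responses for the same prompts on both models, then compare totals. The counts below are placeholders we invented for illustration; only the 1.35 ceiling is from Anthropic's documentation:

```python
def effective_inflation(old_tokens, new_tokens):
    """Ratio of Opus 4.7 token counts to Opus 4.6 counts for the
    same prompts. 1.35 is the documented ceiling, not a given."""
    return sum(new_tokens) / sum(old_tokens)

# Same five prompts sampled from production traffic (illustrative counts)
opus_46 = [1200, 950, 3100, 760, 1480]
opus_47 = [1390, 1050, 3600, 820, 1660]

ratio = effective_inflation(opus_46, opus_47)
print(f"measured inflation: {ratio:.2f}x against the 1.35x ceiling")
```

On these made-up samples the measured inflation is about 1.14x, well under the ceiling, which is exactly why benchmarking your own traffic beats assuming the worst case.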

Batch or cache decision. If your application can tolerate 5-10 minute latency on some request types, batch processing cuts that cost in half. If you’re running the same system prompt or document context repeatedly, enable prompt caching. Many teams benefit from both running simultaneously on different request classes.

Provider comparison if you’re price-sensitive. Anthropic hasn’t raised per-token rates, but through model churn and tokenizer inflation, effective costs went up 30%+ in 90 days. If cost matters more than staying on Claude specifically, our Claude Code review covers how token burn actually stacks up in real agentic workloads, which is useful data for a fair cross-provider comparison.

We think Anthropic’s strategy is deliberate: keep the rate card clean while embedding cost increases in product velocity. Documented, not deceptive. But the bill is the only truth that matters, and it’s worth doing that math before June 15 hands you a 400 error and a forced decision.


Sources: Anthropic Claude API Pricing · Anthropic Model Deprecations · Finout: Claude Opus 4.7 Pricing Analysis · Nerova: Claude Opus 4.7 Pricing Explained · AI Codex Migration Guide

