Anthropic API Pricing Calculator: Claude Haiku, Sonnet 4.6 & Opus 4.8 (2026)
Calculate your real monthly Anthropic API bill with discount toggles for prompt caching (up to 90% off input) and batch API (50% off everything). Compare against GPT-4o at the same token volume.
Updated June 12, 2026 · By Jim Liu
Quick Answer
- • At 1M input + 200K output/month: Haiku 4.5 = $0.80, Sonnet 4.6 = $3.04, Opus 4.8 = $15.04 (no discounts applied).
- • Prompt caching can reduce your input bill by up to 90% — the biggest Anthropic pricing hack most developers miss entirely.
- • Batch API cuts everything in half, but requires async processing with a 24-hour turnaround SLA.
- • Claude Sonnet 4.6 with caching often beats GPT-4o on price for coding workloads with repeated system prompts — effective input cost drops to $0.30/M vs GPT-4o's $2.50/M.
Claude API Pricing Calculator
Best for coding and analysis · Context: 200K tokens
System prompt + user messages + context
Generated responses, code, completions
GPT-4o at $2.50/M input + $10.00/M output, no caching or batch applied.
Claude Model Lineup 2026: Haiku, Sonnet, Opus — What Each Tier Gets You
Claude Haiku 4.5
Best ValueBest for: High-volume simple tasks — summarization, classification, metadata extraction. At $0.80/M input, 10M daily requests costs $8/day.
Claude Sonnet 4.6
Most PopularBest for: Software engineering — code generation, review, and refactoring with test coverage. The default choice for most production Claude deployments.
Claude Opus 4.8
Highest QualityBest for: Agentic pipelines — multi-step reasoning, complex analysis, autonomous research. Use sparingly: 5x the Sonnet cost per token.
Prompt Caching: The 90% Input Discount No One Talks About
Mark parts of your prompt as cacheable — system prompt, tool definitions, large document context. The first call writes to cache at a slight premium (~125% of input price); every subsequent call pays the cache read price: ~10% of standard input price.
Real example: AI coding assistant with a 50K-token system prompt. Without caching: $150 per 1M requests. With caching after the first call: ~$16 per 1M requests — 90% reduction on those input tokens.
Enable: cache_control: { type: "ephemeral" } on the content blocks to cache. TTL is 5 minutes, refreshed on each call that includes the cached block.
| Model | Input price | Cache write | Cache read | Read vs input |
|---|---|---|---|---|
| Claude Haiku 4.5 | $0.80/M | $1.00/M | $0.08/M | 10% of input |
| Claude Sonnet 4.6 | $3.00/M | $3.75/M | $0.30/M | 10% of input |
| Claude Opus 4.8 | $15.00/M | $18.75/M | $1.50/M | 10% of input |
Anthropic Batch API vs Real-Time: When 50% Savings Is Worth the Wait
Anthropic's Message Batches API offers a flat 50% discount on all tokens in exchange for async processing with a 24-hour maximum SLA.
Good for: nightly reports, content pipelines, dataset annotation, bulk summarization. Not for: chatbots, real-time tools, anything user-facing.
API: anthropic.beta.messages.batches.create(requests) — each item needs custom_id + params. Retrieve with batches.results(batch_id).
| Model | Real-time | Batch API (50% off) | Savings |
|---|---|---|---|
| Claude Haiku 4.5 | $1.60 | $0.8000 | $0.8000 |
| Claude Sonnet 4.6 | $6.00 | $3.00 | $3.00 |
| Claude Opus 4.8 | $30.0 | $15.0 | $15.0 |
Related Tools
Frequently Asked Questions
- At an average of 500 input + 500 output tokens per call, 1,000 calls = 0.5M input + 0.5M output. That costs $1.50 + $7.50 = $9.00 at Sonnet 4.6 standard pricing. With batch API this drops to $4.50.