How much does Claude Sonnet 4.6 cost per 1,000 API calls?

At an average of 500 input + 500 output tokens per call, 1,000 calls = 0.5M input + 0.5M output. That costs $1.50 + $7.50 = $9.00 at Sonnet 4.6 standard pricing. With batch API this drops to $4.50.

What is the difference between cache_write and cache_read in Anthropic's API?

cache_write is charged at ~125% of the normal input price — you pay a premium to store content in the cache. cache_read is charged at ~10% of the normal input price for all subsequent calls that hit the same cached content. Break-even is roughly 2 cache reads per cache write; after that every call saves ~90% on those input tokens.

Does Anthropic's prompt caching work with all Claude models?

Yes, ephemeral prompt caching (5-minute TTL) is supported on Claude Haiku 4.5, Sonnet 4.6, and Opus 4.8. Pricing ratios are consistent across all models: cache_write at ~125% of input, cache_read at ~10% of input. Enable by adding cache_control: { type: "ephemeral" } to the content blocks you want cached.

When does Claude Opus 4.8 make economic sense vs Sonnet 4.6?

Opus 4.8 is 5x more expensive than Sonnet 4.6 per token. It makes sense for low-volume, high-stakes tasks: multi-step agent pipelines, autonomous research, legal document drafting, or tasks where quality directly translates to revenue. For most coding and analysis workloads, Sonnet 4.6 delivers 90%+ of Opus quality at 20% of the cost.

How do I enable batch processing in Anthropic's Python SDK?

Use anthropic.beta.messages.batches.create() with a list of MessageBatchRequestParam objects. Each needs a custom_id and params (model, messages, max_tokens). Batches resolve within 24 hours. Retrieve results with batches.results(batch_id). Ideal for nightly pipelines, dataset annotation, and report generation.

Can I combine prompt caching AND batch API for maximum savings?

Yes. Anthropic supports combining both discounts. Cache writes get the 50% batch discount and cache reads remain at the low cache_read rate with an additional 50% off. For a Sonnet 4.6 workload with a stable 50K-token system prompt at 80% cache hit rate, combined savings can exceed 90% of the standard input cost.

Back to Tools

Anthropic API Pricing Calculator: Claude Haiku, Sonnet 4.6 & Opus 4.8 (2026)

Calculate your real monthly Anthropic API bill with discount toggles for prompt caching (up to 90% off input) and batch API (50% off everything). Compare against GPT-4o at the same token volume.

Updated June 12, 2026 · By Jim Liu

Quick Answer

• At 1M input + 200K output/month: Haiku 4.5 = $0.80, Sonnet 4.6 = $3.04, Opus 4.8 = $15.04 (no discounts applied).
• Prompt caching can reduce your input bill by up to 90% — the biggest Anthropic pricing hack most developers miss entirely.
• Batch API cuts everything in half, but requires async processing with a 24-hour turnaround SLA.
• Claude Sonnet 4.6 with caching often beats GPT-4o on price for coding workloads with repeated system prompts — effective input cost drops to $0.30/M vs GPT-4o's $2.50/M.

Claude API Pricing Calculator

Select Claude Model

Best for coding and analysis · Context: 200K tokens

Input Tokens / Month (M)

System prompt + user messages + context

Output Tokens / Month (M)

Generated responses, code, completions

Enable Prompt CachingUp to 90% off input

Enable Batch API50% off everything

Monthly Cost

$6.00

1.00M input + 0.20M output

Input cost$3.00

Output cost$3.00

vs GPT-4o at same volume

$4.50GPT-4o saves $1.50

GPT-4o at $2.50/M input + $10.00/M output, no caching or batch applied.

Compare all providers → AI Model Cost Comparison

Claude Model Lineup 2026: Haiku, Sonnet, Opus — What Each Tier Gets You

Claude Haiku 4.5

Best Value

Fastest and most affordable

Input$0.80/M tokens

Output$4.00/M tokens

Cache write$1.00/M

Cache read$0.08/M

Best for: High-volume simple tasks — summarization, classification, metadata extraction. At $0.80/M input, 10M daily requests costs $8/day.

Claude Sonnet 4.6

Claude Opus 4.8

Highest Quality

Most capable, agentic tasks

Input$15.00/M tokens

Output$75.00/M tokens

Cache write$18.75/M

Cache read$1.50/M

Best for: Agentic pipelines — multi-step reasoning, complex analysis, autonomous research. Use sparingly: 5x the Sonnet cost per token.

Prompt Caching: The 90% Input Discount No One Talks About

Mark parts of your prompt as cacheable — system prompt, tool definitions, large document context. The first call writes to cache at a slight premium (~125% of input price); every subsequent call pays the cache read price: ~10% of standard input price.

Real example: AI coding assistant with a 50K-token system prompt. Without caching: $150 per 1M requests. With caching after the first call: ~$16 per 1M requests — 90% reduction on those input tokens.

Enable: cache_control: { type: "ephemeral" } on the content blocks to cache. TTL is 5 minutes, refreshed on each call that includes the cached block.

Model	Input price	Cache write	Cache read	Read vs input
Claude Haiku 4.5	$0.80/M	$1.00/M	$0.08/M	10% of input
Claude Sonnet 4.6	$3.00/M	$3.75/M	$0.30/M	10% of input
Claude Opus 4.8	$15.00/M	$18.75/M	$1.50/M	10% of input

Anthropic Batch API vs Real-Time: When 50% Savings Is Worth the Wait

Anthropic's Message Batches API offers a flat 50% discount on all tokens in exchange for async processing with a 24-hour maximum SLA.

Good for: nightly reports, content pipelines, dataset annotation, bulk summarization. Not for: chatbots, real-time tools, anything user-facing.

API: anthropic.beta.messages.batches.create(requests) — each item needs custom_id + params. Retrieve with batches.results(batch_id).

At 1M input + 200K output per month, no caching
Model	Real-time	Batch API (50% off)	Savings
Claude Haiku 4.5	$1.60	$0.8000	$0.8000
Claude Sonnet 4.6	$6.00	$3.00	$3.00
Claude Opus 4.8	$30.0	$15.0	$15.0

LLM API Budget Calculator

Start from a budget, find which model fits your monthly spend.

AI Model Cost Comparison

Compare Anthropic, OpenAI, Google, and DeepSeek across a selection matrix.

LLM API Token Cost Calculator

Multi-model comparison by actual input/output token ratios.

Frequently Asked Questions

: At an average of 500 input + 500 output tokens per call, 1,000 calls = 0.5M input + 0.5M output. That costs $1.50 + $7.50 = $9.00 at Sonnet 4.6 standard pricing. With batch API this drops to $4.50.