How Much Will Your Claude API Project Cost?

Enter your request volume and token counts to see exact monthly costs across all Claude models, plus a side-by-side comparison with GPT-4o and Gemini 2.5.

Key facts

  • Claude Haiku 4.5 is $1.00/$5.00 per 1M input/output tokens - the cheapest Claude model
  • Prompt caching cuts the cost of cached input reads by 90% on all Claude models ($0.10 per 1M on Haiku 4.5)
  • Output tokens cost 5x more than input tokens across all Claude tiers
  • GPT-4o mini ($0.15/$0.60 per 1M) undercuts every Claude model on raw price alone

Your usage

  • Requests per month: total API calls your app makes each month
  • Input tokens per request: includes system prompt + user message (average)
  • Output tokens per request: average length of the model response

Monthly totals: 6.00M input tokens + 2.00M output tokens

Cached input tokens cost 90% less than fresh input tokens. Typical for repeated system prompts.

Monthly cost - sorted cheapest first

Model                Provider     Tier            Monthly cost
GPT-4o mini          OpenAI       -               $2.10/mo
Gemini 2.5 Flash     Google       -               $6.80/mo
Claude Haiku 4.5     Anthropic    Fastest         $16.00/mo
Gemini 2.5 Pro       Google       -               $27.50/mo
GPT-4o               OpenAI       -               $35.00/mo
Claude Sonnet 4.6    Anthropic    Balanced        $48.00/mo
Claude Opus 4.8      Anthropic    Most Capable    $80.00/mo
Claude Fable 5       Anthropic    -               $160.00/mo

Prices as of June 2026. Sources: anthropic.com/api (Claude), openai.com/api/pricing (OpenAI), ai.google.dev/gemini-api/docs/pricing (Google). Verify before committing to a production budget.
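
The monthly figures in the table follow from simple arithmetic: total input tokens times the input price, plus total output tokens times the output price. Here is a minimal Python sketch of that calculation, using only the per-token prices quoted on this page. The function names `monthly_cost` and `ranked_costs` are illustrative, not part of any provider SDK, and the example workload (10,000 requests at 600 input and 200 output tokens each) is the one used in the FAQ below.

```python
# Prices in USD per 1M tokens (input, output), copied from this page.
# Verify against each provider's pricing page before budgeting.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "Gemini 2.5 Flash": (0.30, 2.50),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.8": (5.00, 25.00),
}


def monthly_cost(model, requests, input_tokens, output_tokens):
    """Return the monthly cost in USD for one model.

    input_tokens and output_tokens are per-request averages.
    """
    input_price, output_price = PRICES[model]
    total_input_millions = requests * input_tokens / 1_000_000
    total_output_millions = requests * output_tokens / 1_000_000
    return total_input_millions * input_price + total_output_millions * output_price


def ranked_costs(requests, input_tokens, output_tokens):
    """Return (model, cost) pairs sorted cheapest first, rounded to cents."""
    costs = [
        (model, round(monthly_cost(model, requests, input_tokens, output_tokens), 2))
        for model in PRICES
    ]
    return sorted(costs, key=lambda pair: pair[1])


if __name__ == "__main__":
    # 10,000 requests/month, 600 input and 200 output tokens per request.
    for model, cost in ranked_costs(10_000, 600, 200):
        print(f"{model:<20} ${cost:.2f}/mo")
```

With that workload the sketch reproduces the table rows for the six models whose per-token prices appear on this page, for example $16.00/mo for Claude Haiku 4.5 and $48.00/mo for Claude Sonnet 4.6.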

  • Cheapest overall: GPT-4o mini, $2.10/mo. Simple extraction, classification, low-budget pipelines.
  • Cheapest Claude model: Claude Haiku 4.5, $16.00/mo. High-volume chatbots, classification, simple Q&A at scale.
Not sure which model to pick? The AI tool picker walks through your use case and budget to recommend the right model.

Why input and output token pricing differ

Output tokens cost roughly 5x more than input tokens on every Claude model - a ratio that holds across Haiku, Sonnet, Opus, and Fable. The reason is compute: reading your prompt (input) is a single forward pass, while generating a reply (output) requires the model to run an autoregressive forward pass for each token it generates. That adds up fast on long responses.

Prompt caching: Claude API cost reducer

If your system prompt stays the same across requests - a common pattern for chatbots and coding assistants - Claude prompt caching lets you pay 90% less on those repeated tokens. The first request pays the cache write price (1.25x normal), and every subsequent request reads from cache at $0.10 per 1M tokens on Haiku 4.5. For a chatbot with a 500-token system prompt and 10,000 requests per month, that is roughly $4.50 in savings per month on Haiku alone.
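
The $4.50 figure can be checked with a short calculation. The sketch below assumes the numbers stated above (a 90% discount on cache reads and a 1.25x write price) and takes the simplest case, in which only one request pays the write price. The function `caching_savings` and its `cache_writes` parameter are hypothetical names for illustration. In practice cache entries can expire and be rewritten, so raising `cache_writes` gives a more conservative estimate.

```python
def caching_savings(cached_tokens, requests, input_price_per_m,
                    read_discount=0.90, write_multiplier=1.25, cache_writes=1):
    """Estimate monthly USD savings from caching a repeated prompt prefix.

    cached_tokens: size of the repeated prefix (e.g. the system prompt).
    input_price_per_m: fresh input price in USD per 1M tokens.
    cache_writes: how many requests pay the write price; the rest are reads.
    """
    price_per_token = input_price_per_m / 1_000_000
    # Cost of sending the prefix as fresh input on every request.
    uncached = requests * cached_tokens * price_per_token
    # Cost with caching: a few writes at a premium, the rest discounted reads.
    writes = cache_writes * cached_tokens * price_per_token * write_multiplier
    reads = ((requests - cache_writes) * cached_tokens
             * price_per_token * (1 - read_discount))
    return uncached - (writes + reads)


if __name__ == "__main__":
    # 500-token system prompt, 10,000 requests/month, Haiku 4.5 at $1.00 per 1M.
    print(f"${caching_savings(500, 10_000, 1.00):.2f} saved per month")
```

With one cache write the result rounds to $4.50, matching the estimate above. The more often the cache has to be rewritten, the smaller the saving.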

OpenAI API cost at the same request volume

GPT-4o at $2.50/$10.00 per 1M tokens lands between Claude Haiku 4.5 and Claude Sonnet 4.6 in price. GPT-4o mini at $0.15/$0.60 per 1M is substantially cheaper than any Claude model on raw token cost. Whether that price difference is worth it depends on your task: GPT-4o mini falls short on complex reasoning and multi-step code tasks where Claude Haiku 4.5 tends to perform better despite its higher per-token cost.

Gemini API cost at the same request volume

Gemini 2.5 Flash at $0.30/$2.50 per 1M sits between GPT-4o mini and Claude Haiku 4.5 on price and is worth benchmarking for multimodal or long-context tasks (Gemini 2.5 Pro supports a 1M token context window). For purely text-based pipelines where you are not using vision or audio, the Claude Haiku vs Gemini Flash comparison is mainly a quality question - benchmark both on your actual inputs before committing.

How to lower your LLM token cost without switching models

Three levers in order of impact: (1) enable prompt caching if your system prompt is longer than 100 tokens and repeats across requests; (2) route simple tasks like classification or short extraction to Haiku 4.5 and reserve Sonnet or Opus for tasks that actually need the extra capability; (3) audit your average output length - if you are generating 1,000 tokens per response but only need 200, add a max_tokens cap. That one change cuts output spend by 80%, which approaches an 80% cut in the total bill on output-heavy workloads.
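
To see how levers (2) and (3) play out, the sketch below compares costs before and after each change using the Haiku 4.5 and Sonnet 4.6 prices quoted on this page. The workload (10,000 requests, 600 input tokens each) and the 70/30 routing split are assumptions chosen for illustration, and the helper names `cost` and `percent_saved` are hypothetical.

```python
# USD per 1M tokens (input, output), as quoted on this page.
HAIKU = (1.00, 5.00)
SONNET = (3.00, 15.00)


def cost(prices, requests, input_tokens, output_tokens):
    """Monthly USD cost for per-request average token counts."""
    input_price, output_price = prices
    per_request = input_tokens * input_price + output_tokens * output_price
    return requests * per_request / 1_000_000


def percent_saved(before, after):
    """Percentage reduction going from `before` to `after`."""
    return 100 * (before - after) / before


# Lever 3: trim average output from 1,000 to 200 tokens (600 input tokens assumed).
uncapped = cost(HAIKU, 10_000, 600, 1_000)
capped = cost(HAIKU, 10_000, 600, 200)

# Lever 2: send an assumed 70% of traffic to Haiku instead of all of it to Sonnet.
all_sonnet = cost(SONNET, 10_000, 600, 200)
routed = cost(HAIKU, 7_000, 600, 200) + cost(SONNET, 3_000, 600, 200)

if __name__ == "__main__":
    print(f"Output cap: ${uncapped:.2f} -> ${capped:.2f} "
          f"({percent_saved(uncapped, capped):.1f}% saved)")
    print(f"Routing:    ${all_sonnet:.2f} -> ${routed:.2f} "
          f"({percent_saved(all_sonnet, routed):.1f}% saved)")
```

In this example output spend drops by 80%, but the total bill drops by about 71% because the input cost is unchanged. The closer a workload is to pure output, the closer the total saving gets to 80%. Note that max_tokens is a hard limit that truncates the response, so it works best alongside a prompt that asks for shorter answers.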

Frequently asked questions about Claude API pricing

How much does the Claude API cost per month?

It depends on your usage volume. Claude Haiku 4.5 starts at $1.00 per 1M input tokens and $5.00 per 1M output tokens. A chatbot handling 10,000 requests per month with 600 input tokens and 200 output tokens per call would cost roughly $16/month on Haiku 4.5, $48/month on Sonnet 4.6, and $80/month on Opus 4.8. Use the calculator at the top to get an exact number for your workload.

What is the difference between Claude Opus and Sonnet pricing?

Claude Sonnet 4.6 costs $3.00 per 1M input tokens and $15.00 per 1M output tokens. Claude Opus 4.8 costs $5.00 per 1M input tokens and $25.00 per 1M output tokens - about 1.7x more on each side. Sonnet handles most coding, analysis, and agent tasks well. Reserve Opus for tasks requiring deep multi-step reasoning where Sonnet consistently falls short.

Does Claude prompt caching reduce API costs?

Yes. Cached input tokens cost $0.10 per 1M on Haiku 4.5 (versus $1.00 for fresh input) - a 90% reduction. If your system prompt is 500 tokens and you send 10,000 requests per month, caching that system prompt saves roughly $4.50/month on Haiku alone. The savings scale linearly with prompt length and request volume.

How does Claude API pricing compare to GPT-4o?

GPT-4o costs $2.50/$10.00 per 1M input/output tokens. Claude Sonnet 4.6 at $3.00/$15.00 costs 20% more on input and 50% more on output. GPT-4o mini at $0.15/$0.60 per 1M is far cheaper than any Claude model on raw price, though the quality gap on complex tasks is real. The right call depends on your benchmark results on actual production inputs, not just token price.

What is the difference between input and output token pricing?

Output tokens cost 5x more than input tokens on all Claude models. Generating each output token requires a full autoregressive forward pass through the model, while reading input is a single pass. For most applications, output tokens are 15-30% of total tokens, but long-form writing or code generation tasks can push that ratio much higher.
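
Because of the 5x price ratio, output tokens account for a much larger share of the bill than of the token count. The sketch below makes that concrete. It assumes only the 5x ratio stated above, and `output_cost_share` is an illustrative helper name.

```python
def output_cost_share(output_token_fraction, price_ratio=5.0):
    """Fraction of total spend that goes to output tokens.

    output_token_fraction: output tokens / total tokens (between 0 and 1).
    price_ratio: output price divided by input price (5x on Claude models).
    """
    # Work in units of "one input token's price" so absolute prices cancel out.
    input_part = 1.0 - output_token_fraction
    output_part = output_token_fraction * price_ratio
    return output_part / (input_part + output_part)


if __name__ == "__main__":
    for fraction in (0.15, 0.25, 0.30):
        share = output_cost_share(fraction)
        print(f"{fraction:.0%} of tokens are output -> {share:.3f} of cost")
```

Across the 15-30% range, output tokens make up roughly 47% to 68% of total spend. At 25%, which matches the 6.00M input and 2.00M output example, output is 62.5% of the cost ($10 of the $16 on Haiku 4.5).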

Which Claude model should I start with for a chatbot?

Start with Claude Haiku 4.5. It handles conversational turns, short Q&A, and classification well, and it runs fast. Move up to Sonnet 4.6 only if Haiku consistently fails on your specific query types after actual testing. Reserve Opus 4.8 for high-stakes internal tools where quality matters more than operating cost.
