Which Claude model should I use for a chatbot?

For most customer-facing chatbots, Claude Haiku 4.5 ($1.00/$5.00 per 1M tokens) is the right starting point. It handles conversational turns, short Q&A, and simple classification well, and it is fast. Move up to Sonnet 4.6 if your chatbot needs to reason through complex questions, write substantial code, or handle multi-step workflows. Reserve Opus 4.8 for internal tools where quality matters more than cost.

Back to Tools

How Much Will Your Claude API Project Cost?

Enter your request volume and token counts to see exact monthly costs across all Claude models, plus a side-by-side comparison with GPT-4o and Gemini 2.5.

Key facts

Claude Haiku 4.5 is $1.00/$5.00 per 1M input/output tokens - the cheapest Claude model
Prompt caching cuts cached input cost to $0.10 per 1M reads (90% off) on all Claude models
Output tokens cost 5x more than input tokens across all Claude tiers
GPT-4o mini ($0.15/$0.60 per 1M) undercuts every Claude model on raw price alone

Common use cases - click to load

Your usage

Requests per month

Total API calls your app makes each month

Input tokens per request

Includes system prompt + user message (average)

Output tokens per request

Average length of the model response

Monthly totals:6.00M input tokens+2.00M output tokens

Enable prompt caching (Claude models only)

Cached input tokens cost 90% less than fresh input tokens. Typical for repeated system prompts.

Monthly cost - sorted cheapest first

Model	Monthly cost	Input $/1M	Output $/1M	Best for
GPT-4o miniOpenAI Not ideal for: Complex multi-turn reasoning or coding tasks	$2.10/mo	$0.15	$0.60	Simple extraction, classification, low-budget pipelines
Gemini 2.5 FlashGoogle Not ideal for: Complex coding or nuanced long-chain reasoning	$6.80/mo	$0.30	$2.50	Speed-sensitive apps, multimodal, low-latency consumer products
Claude Haiku 4.5AnthropicFastest Not ideal for: Complex reasoning, long-form code generation, or multi-step agents	$16.00/mo	$1.00	$5.00	High-volume chatbots, classification, simple Q&A at scale
Gemini 2.5 ProGoogle Not ideal for: Cost-sensitive high-volume workloads	$27.50/mo	$1.25	$10.00	Long-context tasks (1M token window), multimodal reasoning
GPT-4oOpenAI Not ideal for: Budget-sensitive high-volume pipelines	$35.00/mo	$2.50	$10.00	Multimodal tasks, structured output, broad tool use
Claude Sonnet 4.6AnthropicBalanced Not ideal for: Pure cost minimization on simple tasks	$48.00/mo	$3.00	$15.00	Coding, document analysis, agent workflows, production APIs
Claude Opus 4.8AnthropicMost Capable Not ideal for: Simple queries or high-volume production workloads	$80.00/mo	$5.00	$25.00	Complex reasoning, long-horizon research, senior engineering tasks
Claude Fable 5Anthropic Not ideal for: Anything a smaller Claude model handles well	$160.00/mo	$10.00	$50.00	Hardest reasoning tasks, frontier research, long-horizon autonomous agents

Prices as of June 2026. Sources: anthropic.com/api (Claude), openai.com/api/pricing (OpenAI), ai.google.dev/gemini-api/docs/pricing (Google). Verify before committing to a production budget.

Cheapest overall

GPT-4o mini

$2.10/mo

Simple extraction, classification, low-budget pipelines

Cheapest Claude model

Claude Haiku 4.5

$16.00/mo

High-volume chatbots, classification, simple Q&A at scale

Not sure which model to pick? The AI tool picker walks through your use case and budget to recommend the right model.

Why input and output token pricing differ

Output tokens cost roughly 5x more than input tokens on every Claude model - a ratio that holds across Haiku, Sonnet, Opus, and Fable. The reason is compute: reading your prompt (input) is a single forward pass, while generating a reply (output) requires the model to run an autoregressive forward pass for each token it generates. That adds up fast on long responses.

Prompt caching: Claude API cost reducer

If your system prompt stays the same across requests - a common pattern for chatbots and coding assistants - Claude prompt caching lets you pay 90% less on those repeated tokens. The first request pays the cache write price (1.25x normal), and every subsequent request reads from cache at $0.10 per 1M tokens on Haiku 4.5. For a chatbot with a 500-token system prompt and 10,000 requests per month, that is roughly $4.50 in savings per month on Haiku alone.

OpenAI API cost at the same request volume

GPT-4o at $2.50/$10.00 per 1M tokens lands between Claude Sonnet and Opus in price. GPT-4o mini at $0.15/$0.60 per 1M is substantially cheaper than any Claude model on raw token cost. Whether that price difference is worth it depends on your task: GPT-4o mini falls short on complex reasoning and multi-step code tasks where Claude Haiku 4.5 tends to perform better despite its higher per-token cost.

Gemini API cost at the same request volume

Gemini 2.5 Flash at $0.30/$2.50 per 1M sits between GPT-4o mini and Claude Haiku 4.5 on price and is worth benchmarking for multimodal or long-context tasks (Gemini 2.5 Pro supports a 1M token context window). For purely text-based pipelines where you are not using vision or audio, the Claude Haiku vs Gemini Flash comparison is mainly a quality question - benchmark both on your actual inputs before committing.

How to lower your LLM token cost without switching models

Three levers in order of impact: (1) enable prompt caching if your system prompt is longer than 100 tokens and repeats across requests; (2) route simple tasks like classification or short extraction to Haiku 4.5 and reserve Sonnet or Opus for tasks that actually need the extra capability; (3) audit your average output length - if you are generating 1,000 tokens per response but only need 200, add a max_tokens cap. That one change alone can cut costs by up to 80% on output-heavy workloads.

AI ROI Calculator

Is your API spend paying for itself?

Token Counter

Count tokens in a single prompt before you send it

LLM Latency Comparator

Compare real-world latency across models

AI Model Cost Calculator

Compare costs by monthly token volume across 8 LLMs

Frequently asked questions about Claude API pricing

How much does the Claude API cost per month?

It depends on your usage volume. Claude Haiku 4.5 starts at $1.00 per 1M input tokens and $5.00 per 1M output tokens. A chatbot handling 10,000 requests per month with 600 input tokens and 200 output tokens per call would cost roughly $10/month on Haiku 4.5, $30/month on Sonnet 4.6, and $50/month on Opus 4.8. Use the calculator at the top to get an exact number for your workload.

What is the difference between Claude Opus and Sonnet pricing?

Claude Sonnet 4.6 costs $3.00 per 1M input tokens and $15.00 per 1M output tokens. Claude Opus 4.8 costs $5.00 per 1M input tokens and $25.00 per 1M output tokens - about 1.7x more on each side. Sonnet handles most coding, analysis, and agent tasks well. Reserve Opus for tasks requiring deep multi-step reasoning where Sonnet consistently falls short.

Does Claude prompt caching reduce API costs?

Yes. Cached input tokens cost $0.10 per 1M on Haiku 4.5 (versus $1.00 for fresh input) - a 90% reduction. If your system prompt is 500 tokens and you send 10,000 requests per month, caching that system prompt saves roughly $4.50/month on Haiku alone. The savings scale linearly with prompt length and request volume.

How does Claude API pricing compare to GPT-4o?

GPT-4o costs $2.50/$10.00 per 1M input/output tokens. Claude Sonnet 4.6 at $3.00/$15.00 is slightly more expensive per token. GPT-4o mini at $0.15/$0.60 per 1M is far cheaper than any Claude model on raw price, though the quality gap on complex tasks is real. The right call depends on your benchmark results on actual production inputs, not just token price.

What is the difference between input and output token pricing?

Output tokens cost 5x more than input tokens on all Claude models. Generating each output token requires a full autoregressive forward pass through the model, while reading input is a single pass. For most applications, output tokens are 15-30% of total tokens, but long-form writing or code generation tasks can push that ratio much higher.

Which Claude model should I start with for a chatbot?

Start with Claude Haiku 4.5. It handles conversational turns, short Q&A, and classification well, and it runs fast. Move up to Sonnet 4.6 only if Haiku consistently fails on your specific query types after actual testing. Reserve Opus 4.8 for high-stakes internal tools where quality matters more than operating cost.