LLM API Token Cost Calculator
GPT-4o vs Claude Sonnet 4.6 vs DeepSeek V4 Pro vs Gemini 2.5 Pro

Compare monthly LLM API costs by actual input/output token ratio — not just total volume. Select your use-case profile, adjust token counts, and see exactly which model saves money for your workload. Prices sourced from official API pricing pages in May 2026.

Updated May 25, 2026 · By Jim Liu

TL;DR

  • Cheapest for most workloads: DeepSeek V4 Pro at $0.14/M input, about 18x cheaper than GPT-4o and about 21x cheaper than Claude Sonnet 4.6 on input cost alone.
  • Output costs dominate code review / generation: Output tokens cost 2-8x more than input tokens across these models (2x DeepSeek, 4x GPT-4o, 5x Claude, 8x Gemini). At the same total volume, a chatbot that generates 80% output costs roughly 1.4-2.1x more than a RAG app with 70% input.
  • Lowest input price among premium models: Gemini 2.5 Pro at $1.25/M input + $10/M output undercuts GPT-4o ($2.50/M input) and Claude ($3.00/M input), most noticeably on input-heavy workloads, and offers a 1M-token context window.
  • Batch API cuts costs 50%: OpenAI Batch API and Anthropic Message Batches both offer 50% off for async jobs with 24-hour turnaround.
  • Prompt caching not included: Anthropic's prompt caching can cut the cost of repeated system-prompt tokens by up to 90%; apply that discount manually for accurate production estimates.
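The chatbot-versus-RAG comparison in the bullets above can be checked with a short sketch. It uses the list prices from the pricing table on this page and compares an 80%-output chatbot with a 70%-input RAG app at the same total token volume. The `PRICES` dict and the `blended_rate` / `chatbot_vs_rag` helpers are illustrative names, not part of any provider SDK.

```python
# List prices in USD per 1M tokens (input, output), as quoted on this page (May 2026).
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4 Pro": (0.14, 0.28),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def blended_rate(input_price: float, output_price: float, input_share: float) -> float:
    """Cost in USD per 1M total tokens for a workload with the given input share (0..1)."""
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * input_price + (1.0 - input_share) * output_price


def chatbot_vs_rag(prices: dict) -> dict:
    """Ratio of an 80%-output chatbot's cost to a 70%-input RAG app's cost, per model."""
    ratios = {}
    for model, (p_in, p_out) in prices.items():
        chatbot = blended_rate(p_in, p_out, input_share=0.2)  # 20% input, 80% output
        rag = blended_rate(p_in, p_out, input_share=0.7)      # 70% input, 30% output
        ratios[model] = chatbot / rag
    return ratios


if __name__ == "__main__":
    for model, ratio in chatbot_vs_rag(PRICES).items():
        print(f"{model:<18} chatbot costs {ratio:.2f}x the RAG app")
```

With these prices the ratio lands between roughly 1.4x (DeepSeek V4 Pro, whose output premium is only 2x) and 2.1x (Gemini 2.5 Pro, whose output premium is 8x): the wider a model's output-to-input gap, the more an output-heavy mix hurts.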

Configure your workload

Use-case profile: balanced turns, moderate context, balanced cost

  • Input tokens (context, system prompt, retrieved chunks): 5.0M (50%)
  • Output tokens (generated text, code, answers, completions): 5.0M (50%)
  • Total tokens: 10.0M
Cheapest at this workload: DeepSeek V4 Pro (DeepSeek), $2.10/mo

Cheapest frontier-grade model at $0.14/M input. At this 50/50 workload it costs roughly 27-43x less than the other three models. Best ROI for bulk processing and RAG.

Side-by-side monthly cost

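| Model | Provider | Input cost | Output cost | Total / mo | vs cheapest |
|---|---|---|---|---|---|
| DeepSeek V4 Pro | DeepSeek | $0.70 | $1.40 | $2.10 | Cheapest |
| Gemini 2.5 Pro | Google | $6.25 | $50.00 | $56.25 | +2579% |
| GPT-4o | OpenAI | $12.50 | $50.00 | $62.50 | +2876% |
| Claude Sonnet 4.6 | Anthropic | $15.00 | $75.00 | $90.00 | +4186% |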

Based on public pricing pages reviewed May 2026. Input cost = input tokens (in millions) × input rate per 1M tokens; output cost = output tokens (in millions) × output rate per 1M tokens. Batch API (50% off) and prompt caching discounts are not applied. Real bills may differ by 10-25% due to rounding, minimum charges, and regional pricing.
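The cost formula is simple enough to script. The sketch below reproduces the side-by-side table for a 5M input / 5M output month using the list prices quoted on this page. The `Pricing` class and the `monthly_costs` / `vs_cheapest` helpers are hypothetical names for illustration, and, like the calculator, the sketch applies no batch or caching discounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    model: str
    provider: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Rates from the pricing table on this page (May 2026); update as prices change.
MODELS = [
    Pricing("GPT-4o", "OpenAI", 2.50, 10.00),
    Pricing("Claude Sonnet 4.6", "Anthropic", 3.00, 15.00),
    Pricing("DeepSeek V4 Pro", "DeepSeek", 0.14, 0.28),
    Pricing("Gemini 2.5 Pro", "Google", 1.25, 10.00),
]


def monthly_costs(input_m: float, output_m: float, models=MODELS):
    """Return (model, input_cost, output_cost, total) rows, cheapest first.

    input_m and output_m are monthly token volumes in millions.
    No batch or caching discounts are applied.
    """
    rows = []
    for p in models:
        in_cost = input_m * p.input_per_m
        out_cost = output_m * p.output_per_m
        rows.append((p.model, in_cost, out_cost, in_cost + out_cost))
    return sorted(rows, key=lambda row: row[3])


def vs_cheapest(rows):
    """Percentage premium of each row's total over the cheapest (first) row's total."""
    cheapest = rows[0][3]
    return {model: (total - cheapest) / cheapest * 100 for model, _, _, total in rows}


if __name__ == "__main__":
    rows = monthly_costs(input_m=5.0, output_m=5.0)
    premiums = vs_cheapest(rows)
    for model, in_cost, out_cost, total in rows:
        print(f"{model:<18} ${in_cost:>7.2f} ${out_cost:>7.2f} ${total:>7.2f} +{premiums[model]:.0f}%")
```

Changing `input_m` and `output_m` to your own monthly volumes gives the same comparison for other workload profiles.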

By the Numbers — Current API Pricing (May 2026)

| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Output / input ratio |
|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50 | $10.00 | 4.0x |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 5.0x |
| DeepSeek V4 Pro | DeepSeek | $0.14 | $0.28 | 2.0x |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | 8.0x |

DeepSeek V4 Pro redefined the cost floor in early 2026 with a 75% price cut, landing at $0.14/M input and $0.28/M output. At those rates, 100M tokens split evenly between input and output cost roughly $21, a bill that would be $625 with GPT-4o or $900 with Claude Sonnet 4.6. The primary trade-offs are API rate limits and higher latency from overseas routing.
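For an even split of 50M input and 50M output tokens, the 100M-token bill follows directly from the calculator's formula, with volumes $N$ in millions of tokens and prices $p$ in USD per 1M tokens:

```latex
\begin{aligned}
C &= N_{\text{in}}\, p_{\text{in}} + N_{\text{out}}\, p_{\text{out}} \\
C_{\text{DeepSeek V4 Pro}} &= 50(0.14) + 50(0.28) = 7 + 14 = \$21 \\
C_{\text{GPT-4o}} &= 50(2.50) + 50(10.00) = 125 + 500 = \$625 \\
C_{\text{Claude Sonnet 4.6}} &= 50(3.00) + 50(15.00) = 150 + 750 = \$900
\end{aligned}
```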

Gemini 2.5 Pro has the lowest input price among the premium models at $1.25/M input, paired with $10/M output, which is the widest output-to-input gap in this comparison (8x). Its 1M-token context window is particularly valuable for long-document RAG: you can feed an entire legal document or a mid-sized codebase in a single prompt without chunking, which can remove the need for a retrieval pipeline when the corpus fits.

GPT-4o at $2.50/M input and $10.00/M output remains the reliability benchmark. Its 99.9% uptime SLA, fine-tuning support, and consistent latency make it the default for production workloads where cost is secondary to stability. The 4x output multiplier means a balanced 50/50 chatbot spends 80% of its bill on output tokens.
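The 80% figure generalizes: when input and output volumes are equal, the output share of the bill reduces to p_out / (p_in + p_out). The sketch below applies that to all four list prices on this page; `PRICES` and `output_share` are illustrative names.

```python
# List prices in USD per 1M tokens (input, output), as quoted on this page (May 2026).
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4 Pro": (0.14, 0.28),
    "Gemini 2.5 Pro": (1.25, 10.00),
}


def output_share(input_price: float, output_price: float,
                 input_tokens: float = 1.0, output_tokens: float = 1.0) -> float:
    """Fraction of the bill spent on output tokens.

    With the default equal volumes this reduces to output_price / (input_price + output_price).
    """
    input_cost = input_tokens * input_price
    output_cost = output_tokens * output_price
    return output_cost / (input_cost + output_cost)


if __name__ == "__main__":
    for model, (p_in, p_out) in PRICES.items():
        print(f"{model:<18} {output_share(p_in, p_out):.1%} of a 50/50 bill is output")
```

Under these prices the output share of a 50/50 bill runs from about 67% (DeepSeek V4 Pro) to about 89% (Gemini 2.5 Pro), so even the cheapest model's bill is mostly output at an even split.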

Claude Sonnet 4.6 at $3.00/M input and $15.00/M output carries the highest output price in this comparison — 5x the input rate. It is the right choice for quality-critical generation tasks (structured documents, code, reasoning chains) where the marginal quality improvement justifies the premium. Anthropic's prompt caching can cut effective input costs by up to 90% for repeated system prompts, narrowing the gap with cheaper alternatives.
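To see how the "up to 90%" caching discount plays out on a whole bill, here is a rough sketch. It assumes Anthropic's documented multipliers of 1.25x the base input rate for (5-minute) cache writes and 0.1x for cache reads; both are parameters, so check the current pricing page. It also assumes the first request writes the cache and every later request reads it before it expires. The function names and the workload numbers are hypothetical.

```python
def input_cost_with_caching(
    requests: int,
    cached_prefix_tokens: int,
    uncached_tokens: int,
    input_price_per_m: float,
    write_multiplier: float = 1.25,  # assumed cache-write price as a multiple of base input
    read_multiplier: float = 0.10,   # assumed cache-read price as a multiple of base input
) -> float:
    """Estimate USD input cost when a shared prompt prefix is cached.

    Assumes the first request writes the cache and every later request reads it
    (i.e. the cache never expires between requests).
    """
    if requests <= 0:
        return 0.0
    per_token = input_price_per_m / 1_000_000
    write = cached_prefix_tokens * per_token * write_multiplier
    reads = (requests - 1) * cached_prefix_tokens * per_token * read_multiplier
    fresh = requests * uncached_tokens * per_token  # per-request input is never cached
    return write + reads + fresh


def input_cost_without_caching(requests: int, cached_prefix_tokens: int,
                               uncached_tokens: int, input_price_per_m: float) -> float:
    """Baseline USD input cost with every token billed at the full input rate."""
    per_token = input_price_per_m / 1_000_000
    return requests * (cached_prefix_tokens + uncached_tokens) * per_token


if __name__ == "__main__":
    # Hypothetical workload: 100,000 requests sharing a 2,000-token system prompt
    # plus 500 fresh tokens each, at Claude Sonnet 4.6's $3.00/M input rate.
    args = dict(requests=100_000, cached_prefix_tokens=2_000,
                uncached_tokens=500, input_price_per_m=3.00)
    base = input_cost_without_caching(**args)
    cached = input_cost_with_caching(**args)
    print(f"no caching: ${base:,.2f}  with caching: ${cached:,.2f}  saving: {1 - cached / base:.0%}")
```

Under those assumptions the example saves about 72% of input cost rather than 90%, because the 500 fresh tokens per request are still billed at the full rate. The 90% figure is a ceiling that applies only to the cached prefix.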

FAQ

Why do output tokens cost more than input tokens?

Generating tokens (output) is computationally far more expensive than reading tokens (input). During inference, the GPU runs a full forward pass for every output token but only a single parallel pass for the entire input prompt. That difference in compute shows up as a 2-8x price gap across the models compared here: GPT-4o charges $2.50/M input vs $10.00/M output, Claude Sonnet 4.6 charges $3.00/M vs $15.00/M, and Gemini 2.5 Pro charges $1.25/M vs $10.00/M. Getting this ratio wrong in your cost model leads to large underestimates on generation-heavy workloads like code review or creative writing.