LLM API Token Cost CalculatorGPT-4o vs Claude Sonnet 4.6 vs DeepSeek V4 Pro vs Gemini 2.5 Pro
Compare monthly LLM API costs by actual input/output token ratio — not just total volume. Select your use-case profile, adjust token counts, and see exactly which model saves money for your workload. Prices sourced from official API pricing pages in May 2026.
Updated May 25, 2026 · By Jim Liu
TL;DR
- • Cheapest for most workloads: DeepSeek V4 Pro at $0.14/M input — 18x cheaper than GPT-4o, 95x cheaper than Claude Sonnet 4.6 on input cost alone.
- • Output costs dominate code review / generation: Output tokens cost 4-5x more than input tokens on every model. A chatbot that generates 80% output will cost 2-3x more than a RAG app with 70% input.
- • Best input-to-output ratio among premium models: Gemini 2.5 Pro at $1.25/M input + $10/M output beats GPT-4o ($2.50) and Claude ($3.00) for input-heavy workloads and offers a 1M-token context window.
- • Batch API cuts costs 50%: OpenAI Batch API and Anthropic Message Batches both offer 50% off for async jobs with 24-hour turnaround.
- • Prompt caching not included: Repeated system prompts with Anthropic caching can cut input costs up to 90% — add that manually for accurate production estimates.
Configure your workload
Balanced turns, moderate context, balanced cost
Context, system prompt, retrieved chunks
Generated text, code, answers, completions
Cheapest frontier-grade model at $0.14/M input. 18-95x cheaper than competitors. Best ROI for bulk processing and RAG.
Side-by-side monthly cost
| Model | Provider | Input cost | Output cost | Total / mo | vs cheapest |
|---|---|---|---|---|---|
DeepSeek V4 Pro | DeepSeek | $0.70 | $1.40 | $2.10 | Cheapest |
Gemini 2.5 Pro | $6.25 | $50.0 | $56.3 | +2579% | |
GPT-4o | OpenAI | $12.5 | $50.0 | $62.5 | +2876% |
Claude Sonnet 4.6 | Anthropic | $15.0 | $75.0 | $90.0 | +4186% |
Based on public pricing pages reviewed May 2026. Input cost = inputM × input rate; output cost = outputM × output rate. Batch API (50% off) and prompt caching discounts not applied. Real bills may differ by 10-25% due to rounding, minimum charges, and regional pricing.
By the Numbers — Current API Pricing (May 2026)
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) | Output / Input ratio |
|---|---|---|---|---|
| GPT-4o | OpenAI | $2.50/M | $10.00/M | 4.0x |
| Claude Sonnet 4.6 | Anthropic | $3.00/M | $15.00/M | 5.0x |
| DeepSeek V4 Pro | DeepSeek | $0.14/M | $0.28/M | 2.0x |
| Gemini 2.5 Pro | $1.25/M | $10.00/M | 8.0x |
DeepSeek V4 Pro redefined the cost floor in early 2026 with a 75% price cut, landing at $0.14/M input and $0.28/M output. At those rates, 100M tokens of mixed workload costs roughly $21 — a bill that would be $385 with GPT-4o or $390 with Claude Sonnet 4.6. The primary trade-off is API rate limits and higher latency from overseas routing.
Gemini 2.5 Pro offers the best input-to-output price ratio among premium models at $1.25/M input and $10/M output (8x ratio). Its 1M-token context window is particularly valuable for long-document RAG: you can feed an entire codebase or legal document without chunking, eliminating retrieval pipeline complexity entirely.
GPT-4o at $2.50/M input and $10.00/M output remains the reliability benchmark. Its 99.9% uptime SLA, fine-tuning support, and consistent latency make it the default for production workloads where cost is secondary to stability. The 4x output multiplier means a balanced 50/50 chatbot spends 80% of its bill on output tokens.
Claude Sonnet 4.6 at $3.00/M input and $15.00/M output carries the highest output price in this comparison — 5x the input rate. It is the right choice for quality-critical generation tasks (structured documents, code, reasoning chains) where the marginal quality improvement justifies the premium. Anthropic's prompt caching can cut effective input costs by up to 90% for repeated system prompts, narrowing the gap with cheaper alternatives.
FAQ
- Generating tokens (output) is computationally far more expensive than reading tokens (input). During inference, the GPU runs a full forward pass for every output token but only a single parallel pass for the entire input prompt. That difference in compute manifests as a 2-5x price gap: GPT-4o charges $2.50/M input vs $10.00/M output, and Claude Sonnet 4.6 charges $3.00/M vs $15.00/M. Getting this ratio wrong in your cost model leads to large underestimates on generation-heavy workloads like code review or creative writing.