Count tokens across GPT-5, Claude 4, Gemini 2.5, and Llama 4, with live cost estimation
| Model | Provider | Context window (tokens) | Input tokens | Input $ | Output $ | Total $ | Context fit |
|---|---|---|---|---|---|---|---|
| GPT-5 | OpenAI | 272,000 | 96 | $0.000120 | $0.005000 | $0.005120 | ✓ fits (0.2%) |
| GPT-5 mini | OpenAI | 272,000 | 96 | $0.000024 | $0.001000 | $0.001024 | ✓ fits (0.2%) |
| GPT-5 nano | OpenAI | 272,000 | 96 | $0.000005 | $0.000200 | $0.000205 | ✓ fits (0.2%) |
| GPT-4.1 | OpenAI | 1,047,576 | 96 | $0.000192 | $0.004000 | $0.004192 | ✓ fits (0.1%) |
| o3 (reasoning) | OpenAI | 200,000 | 96 | $0.000192 | $0.004000 | $0.004192 | ✓ fits (0.3%) |
| Claude Opus 4.7 | Anthropic | 1,000,000 | 101 | $0.001515 | $0.037500 | $0.039015 | ✓ fits (0.1%) |
| Claude Sonnet 4.6 | Anthropic | 1,000,000 | 101 | $0.000303 | $0.007500 | $0.007803 | ✓ fits (0.1%) |
| Claude Haiku 4.5 | Anthropic | 200,000 | 101 | $0.000101 | $0.002500 | $0.002601 | ✓ fits (0.3%) |
| Gemini 2.5 Pro | Google | 2,097,152 | 107 | $0.000134 | $0.005000 | $0.005134 | ✓ fits (0.0%) |
| Gemini 2.5 Flash | Google | 1,048,576 | 107 | $0.000032 | $0.001250 | $0.001282 | ✓ fits (0.1%) |
| Gemini 2.5 Flash-Lite | Google | 1,000,000 | 107 | $0.000011 | $0.000200 | $0.000211 | ✓ fits (0.1%) |
| Llama 4 Maverick | Meta | 10,000,000 | 98 | $0.000026 | $0.000425 | $0.000451 | ✓ fits (0.0%) |
| Llama 4 Scout | Meta | 10,000,000 | 98 | $0.000011 | $0.000170 | $0.000181 | ✓ fits (0.0%) |
| DeepSeek V3.1 | DeepSeek | 128,000 | 98 | $0.000026 | $0.000550 | $0.000576 | ✓ fits (0.5%) |
| DeepSeek R1 (reasoning) | DeepSeek | 64,000 | 98 | $0.000054 | $0.001095 | $0.001149 | ✓ fits (0.9%) |
| Mistral Large 2 | Mistral | 128,000 | 120 | $0.000240 | $0.003000 | $0.003240 | ✓ fits (0.5%) |
| Mistral Medium 3 | Mistral | 128,000 | 120 | $0.000048 | $0.001000 | $0.001048 | ✓ fits (0.5%) |
| Grok 4 | xAI | 256,000 | 96 | $0.000288 | $0.007500 | $0.007788 | ✓ fits (0.2%) |
Every call to an LLM API bills per token, the atomic unit each model breaks your text into. Knowing the token count before you call is essential for cost forecasting, context-window budgeting, and avoiding mid-response cutoffs. This tool estimates token counts under each frontier model's tokenizer and multiplies by current API prices so you can compare apples to apples.
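As a rough sketch of the arithmetic behind the table, cost is just tokens multiplied by the per-million-token price. The prices below are back-solved from the GPT-5 row above ($1.25/M input, $10/M output) and are illustrative, not an official price sheet:

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost: tokens times the per-1M-token price."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# 96 input tokens at $1.25/M plus 500 output tokens at $10/M
# reproduces the GPT-5 row above: $0.005120.
print(f"${estimate_cost(96, 500, 1.25, 10.00):.6f}")
```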
Estimates are calibrated from publicly reported tokenizer characteristics (roughly 3.6–4.0 characters per token for English). For English text they typically land within ±3% of official tokenizer counts. For exact GPT counts use OpenAI's tokenizer (tiktoken); for Claude use Anthropic's token-count API.
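To see how the heuristic compares to a real tokenizer, here is a minimal sketch assuming the ~3.8 chars/token midpoint of the range above, checked against tiktoken's o200k_base encoding (used by recent OpenAI models; other providers' tokenizers will differ):

```python
import tiktoken  # OpenAI's open-source tokenizer library

def rough_token_estimate(text: str, chars_per_token: float = 3.8) -> int:
    """Heuristic count using the ~3.6-4.0 chars/token range for English."""
    return max(1, round(len(text) / chars_per_token))

def exact_gpt_tokens(text: str) -> int:
    """Exact count under o200k_base, the encoding used by recent GPT models."""
    return len(tiktoken.get_encoding("o200k_base").encode(text))

text = "Knowing the token count before you call is essential."
print(rough_token_estimate(text), exact_gpt_tokens(text))
```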
Each model family ships with its own tokenizer: GPT models use tiktoken, Claude uses Anthropic's own ~100k-token vocabulary, and Gemini uses SentencePiece. A more specialized tokenizer does not guarantee fewer tokens; what matters is how well the vocabulary matches your text.
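You can observe the vocabulary effect directly by running the same strings through two real tiktoken encodings. The sample strings here are arbitrary choices for illustration:

```python
import tiktoken

# Same strings, two real encodings: cl100k_base (~100k vocabulary) and
# o200k_base (~200k vocabulary). The larger vocabulary often, but not
# always, produces fewer tokens.
samples = {
    "english": "The quick brown fox jumps over the lazy dog.",
    "code": "for (let i = 0; i < arr.length; i++) { sum += arr[i]; }",
}
for name, text in samples.items():
    counts = {enc: len(tiktoken.get_encoding(enc).encode(text))
              for enc in ("cl100k_base", "o200k_base")}
    print(name, counts)
```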
To cut token counts: remove redundant whitespace, use concise phrasing, put long static context behind prompt caching, and prefer models with larger English-optimized vocabularies (Claude, GPT-4o) for long English prose.
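A minimal whitespace squeezer along those lines, assuming your prompt is prose (skip it for code snippets, where indentation carries meaning):

```python
import re

def squeeze_whitespace(text: str) -> str:
    """Collapse runs of spaces/tabs and strip line edges, keeping blank
    lines so paragraph structure survives. Not safe for code snippets,
    where indentation carries meaning."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines)

prompt = "Summarize   the  report:\n\n   - revenue up    12%   \n   - churn flat "
print(squeeze_whitespace(prompt))
```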
The cheapest model varies by workload; see our LLM Price Comparison tool. As of April 2026, Llama 4 Scout, Gemini 2.5 Flash-Lite, and GPT-5 nano are the budget leaders.
Yes, you can paste a whole conversation: include everything you'll send (system prompt, user messages, and any prior assistant turns) to get a realistic estimate.
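A sketch of a whole-conversation estimate, assuming tiktoken's o200k_base encoding and a hypothetical per-message framing overhead (chat APIs add a few tokens per message for role markers; the constant below is our assumption, not a documented value):

```python
import tiktoken

enc = tiktoken.get_encoding("o200k_base")

# Assumption: chat APIs add a few framing tokens per message (role
# markers, separators). 4 is a placeholder; the real overhead varies
# by model and provider.
TOKENS_PER_MESSAGE = 4

def estimate_conversation_tokens(messages: list[dict]) -> int:
    """Sum content tokens plus an assumed per-message framing overhead."""
    return sum(len(enc.encode(m["content"])) + TOKENS_PER_MESSAGE
               for m in messages)

conversation = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "How many tokens will this chat cost me?"},
    {"role": "assistant", "content": "Paste the full conversation to find out."},
]
print(estimate_conversation_tokens(conversation))
```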