Claude 4.5 Haiku token & cost calculator
Claude 4.5 Haiku is the budget tier of the Claude 4.5 family — built for high-volume, low-latency, structured tasks where you want a capable model but can't afford Sonnet's per-million price. It's the model you reach for when most of your token spend is on plumbing: classification turns inside an agent loop, extraction of fields from semi-structured input, summarization of pages, conversion between formats, light code edits.
The pricing math on Haiku is qualitatively different from Sonnet. At $1 input / $5 output per million, a workload that breaks even on Sonnet at 12,000 page views per month breaks even on Haiku at roughly a third of that — assuming the eval quality is acceptable, which is the question worth your time before you reach for this calculator.
Saved scenarios
Saved on this browser only — never uploaded. Up to 10 scenarios.
Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
Verify privacy
Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.
Pricing
Haiku is flat-priced. The thing to notice is the input/output spread — output is 5× input, the same multiple as Sonnet, so the same prompt-engineering instincts transfer.
| Tier | Input $/M | Output $/M |
|---|---|---|
| All input | $1 | $5 |

Context window: 200,000 tokens.
Verified against www.anthropic.com on 2026-05-09.
Worked examples
The scenarios below show how cheap Haiku gets at scale. A short chat turn costs a fraction of a cent; even a long-document Q&A comes to about six cents per request before you account for caching.
| Scenario | Input (tokens) | Output (tokens) | Cost |
|---|---|---|---|
| Short chat turn: a typical Q&A turn with a small system prompt. | 800 | 400 | <$0.01 |
| System prompt + tool spec: a larger context window with a tool schema, single response. | 5,000 | 500 | <$0.01 |
| Long document Q&A: a long-form input (e.g. transcript) with a structured response. | 50,000 | 1,500 | $0.058 |
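The figures in the table follow directly from the flat rates in the pricing section. A minimal sketch of that arithmetic, assuming the $1 input / $5 output rates quoted above; the function names and the display rule (show `<$0.01` below a cent, otherwise round to three decimals) are illustrative assumptions, not the calculator's actual code:

```python
from decimal import Decimal, ROUND_HALF_UP

# Rates from the pricing table above, in USD per million tokens.
INPUT_USD_PER_M = Decimal("1")
OUTPUT_USD_PER_M = Decimal("5")
TOKENS_PER_M = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the USD cost of one request at the flat rates above."""
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_M / TOKENS_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_M / TOKENS_PER_M
    return input_cost + output_cost


def display_cost(cost: Decimal) -> str:
    """Format a cost the way the table does (assumed display rule)."""
    if cost < Decimal("0.01"):
        return "<$0.01"
    rounded = cost.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)
    return f"${rounded}"


if __name__ == "__main__":
    scenarios = {
        "Short chat turn": (800, 400),
        "System prompt + tool spec": (5_000, 500),
        "Long document Q&A": (50_000, 1_500),
    }
    for name, (tokens_in, tokens_out) in scenarios.items():
        cost = request_cost(tokens_in, tokens_out)
        print(f"{name}: {cost} USD, shown as {display_cost(cost)}")
```

The unrounded values are $0.0028, $0.0075, and $0.0575 respectively, which is why the first two rows display as under a cent and the third as $0.058.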
The honest framing is this: at Haiku's prices, your prompt mistakes probably cost more than the model rate itself. A debug log accidentally pasted into a prompt, a system prompt that grew unmonitored to 8,000 tokens, an output schema that returns a paragraph instead of a JSON field: any of these will dominate your spend long before the model rate does. Haiku rewards good prompt hygiene more than it rewards rate negotiation.
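To put a number on the system-prompt example, here is a rough sketch that compares monthly spend for a lean and a bloated system prompt. The rates come from the pricing table; the request volume, the 300-token user message, the 400-token response, and the 500-token "lean" prompt are hypothetical values chosen only for illustration:

```python
# Rates from the pricing table, in USD per million tokens.
INPUT_USD_PER_M = 1.0
OUTPUT_USD_PER_M = 5.0


def monthly_spend(requests: int, system_tokens: int,
                  user_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (total_usd, system_prompt_usd) for a month of identical requests."""
    system_usd = requests * system_tokens * INPUT_USD_PER_M / 1_000_000
    user_usd = requests * user_tokens * INPUT_USD_PER_M / 1_000_000
    output_usd = requests * output_tokens * OUTPUT_USD_PER_M / 1_000_000
    return system_usd + user_usd + output_usd, system_usd


if __name__ == "__main__":
    # Hypothetical workload: 1M requests per month, 300 user tokens, 400 output tokens.
    for label, system_tokens in [("lean", 500), ("bloated", 8_000)]:
        total, system_part = monthly_spend(1_000_000, system_tokens, 300, 400)
        share = system_part / total
        print(f"{label}: total ${total:,.0f}, system prompt ${system_part:,.0f} ({share:.0%})")
```

Under these assumptions the bloated prompt takes the monthly bill from $2,800 to $10,300, with the system prompt alone accounting for about 78% of the spend.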
How is this counted?
We approximate Haiku's tokenizer with cl100k_base from gpt-tokenizer (MIT). For Haiku specifically, the approximation is well suited: most Haiku workloads are short and structured. cl100k tends to under-count slightly on heavily formatted inputs (XML, JSON), so for those inputs treat the estimate as a lower bound and leave some headroom in the budget. Inputs over 50,000 characters tokenize in a Web Worker.
FAQ
Where does Haiku stop being good enough?
Can I run it concurrently at high volume?
Does the token count match what the API charges?
How does Haiku compare to Gemini 2.5 Flash?
Should I cache long system prompts?
Compare against every other model
To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. Numbers are exact for OpenAI and within ±2–3% for Claude and Gemini.
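One way to use the stated tolerance when budgeting is to turn an estimated input count into a low/high cost range. A minimal sketch, assuming the ±2–3% figure applies to the input token count and using the $1 per million input rate from the pricing section; the function name and the 3% default are illustrative, not part of the calculator:

```python
INPUT_USD_PER_M = 1.0  # input rate from the pricing table, USD per million tokens


def input_cost_range(estimated_input_tokens: int,
                     tolerance: float = 0.03) -> tuple[float, float]:
    """Return (low_usd, high_usd) input cost for an approximate token count.

    `tolerance` is the assumed relative error of the estimate (0.03 = 3%).
    """
    if not 0 <= tolerance < 1:
        raise ValueError("tolerance must be in [0, 1)")
    low_tokens = estimated_input_tokens * (1 - tolerance)
    high_tokens = estimated_input_tokens * (1 + tolerance)
    return (low_tokens * INPUT_USD_PER_M / 1_000_000,
            high_tokens * INPUT_USD_PER_M / 1_000_000)


if __name__ == "__main__":
    low, high = input_cost_range(50_000)
    print(f"50,000 estimated input tokens: ${low:.4f} to ${high:.4f}")
```

For a 50,000-token estimate this gives roughly $0.0485 to $0.0515 of input cost per request; budgeting against the upper end is the safer choice.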
Related models
The most relevant comparison set: Sonnet (one tier up in the Claude 4.5 family) and the Gemini 2.5 Flash family across the vendor boundary.