Gemini 2.5 Flash token & cost calculator
Gemini 2.5 Flash is Google's high-throughput, low-cost frontier model — the tier you reach for when you want a capable model at vending-machine prices. At $0.30 input / $2.50 output per million tokens, it's one of the cheapest entries in any current frontier-model family, and unlike Pro it ships at this flat rate even at the very long end of its 1M-token context window.
The price profile makes Flash an unusually good default for the boring 80% of an AI feature: classification turns, extraction, summarization, structured rewrites, batch processing, agent loops where most of the work is plumbing rather than reasoning. It is not the model you want generating customer-facing prose at the high quality bar — that's Pro or Sonnet territory — but as the workhorse underneath, Flash compresses what you spend on tokens by an order of magnitude versus the premium tiers.
Saved scenariosnone yet
Saved on this browser only — never uploaded. Up to 10 scenarios.
Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
Verify privacysince this page loaded — updates live
Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.
Pricing
Flash is flat-priced — no tier surcharge above 200k tokens, unlike Pro. That makes it qualitatively different from its sibling for large-input workloads: the rate you see is what you pay regardless of context size.
| Tier | Input $/M | Output $/M |
|---|---|---|
| All input | $0.3 | $2.5 |
| Context window | 1,000,000 tokens | |
Verified against ai.google.dev on 2026-05-09.
Worked examples
The scenarios below illustrate the price floor. All three round to fractions of a cent, which is the point — Flash unlocks workloads where the model bill stops being the budget conversation entirely.
| Scenario | Input | Output | Cost |
|---|---|---|---|
Short chat turn A typical Q&A turn with a small system prompt. | 800 | 400 | <$0.01 |
System prompt + tool spec A larger context window with a tool schema, single response. | 5,000 | 500 | <$0.01 |
Long document Q&A A long-form input (e.g. transcript) with a structured response. | 50,000 | 1,500 | $0.019 |
The honest framing for Flash: at these prices your engineering time is the cost center, not the model. A poorly-written prompt that produces 2,000 tokens instead of 200 still costs you about half a cent — which means the lever to optimize is correctness and reliability, not cost-per-token. Spend your time on evals and routing logic; the per-million numbers will take care of themselves.
How is this counted?
We approximate Flash's tokenizer with o200k_base from js-tiktoken (MIT). Drift on Gemini 2.5 is typically ~3% on natural language. At Flash's price level the dollar impact of a 3% miss is negligible per request — it matters mostly for batch-job sizing where small percentages compound across millions of requests. Inputs over 50,000 characters tokenize in a Web Worker.
FAQ
When does Flash beat Claude Haiku?
Can I use the full 1M context window with Flash?
Is the token count exact?
What's the right way to pick a max_tokens cap for Flash?
Does my prompt leave the browser?
Compare against every other model
To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. At Flash's price point the per-million numbers are essentially noise — the comparison view tells you which model fits your context window and reliability requirements.
Related models
The two most relevant comparisons: Gemini 2.5 Pro (when Flash isn't quite good enough) and Claude 4.5 Haiku (the cross-vendor budget tier that competes head-to-head on price and capability).