tokenmath

Gemini 2.5 Flash token & cost calculator

Gemini 2.5 Flash is Google's high-throughput, low-cost frontier model — the tier you reach for when you want a capable model at vending-machine prices. At $0.30 input / $2.50 output per million tokens, it's one of the cheapest entries in any current frontier-model family, and unlike Pro it ships at this flat rate even at the very long end of its 1M-token context window.

The price profile makes Flash an unusually good default for the boring 80% of an AI feature: classification turns, extraction, summarization, structured rewrites, batch processing, agent loops where most of the work is plumbing rather than reasoning. It is not the model you want generating customer-facing prose at the high quality bar — that's Pro or Sonnet territory — but as the workhorse underneath, Flash compresses what you spend on tokens by an order of magnitude versus the premium tiers.

[Interactive calculator. Paste a prompt (up to 1,000,000 characters) or start with an example; tokenization runs client-side and your text is never uploaded. The estimate assumes a 1,024-token response by default and shows input cost, estimated output cost, and context used against the 1,000,000-token window, with roughly ±3% tokenizer drift. Pricing verified 2026-05-09.]
Saved scenarios (none yet)

Saved on this browser only — never uploaded. Up to 10 scenarios.

Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
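The saved-scenarios behavior described above can be sketched as a small list operation. This is an illustrative sketch only: the `Scenario` shape, function name, and drop-oldest policy are our assumptions — the page only states that scenarios live in this browser and cap at 10.

```typescript
// Sketch of browser-local scenario saving, capped at 10 entries.
// Shape and cap policy are illustrative assumptions, not the site's code.
interface Scenario {
  name: string;
  prompt: string;
  responseTokens: number;
}

function saveScenario(saved: Scenario[], s: Scenario): Scenario[] {
  const next = [...saved, s];
  // Keep at most 10 scenarios; drop the oldest entries beyond the cap.
  return next.slice(Math.max(0, next.length - 10));
}
```

In the real page this list would round-trip through localStorage (one of the keys listed in the privacy panel below); the cap keeps that storage bounded.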

Verify privacy (counts since this page loaded; updates live)

Prompt uploads: 0 (always 0, by design)
Outgoing requests: 0 (analytics + page assets only; no prompt content)
Cookies on this origin: 0 (Vercel Analytics + Clarity may set first-party cookies)
localStorage keys: 0 (theme preference + saved scenarios live here)
Server endpoints: 1 (/api/og only; accepts title + subtitle, never prompt text)
Inspect

Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.

Pricing

Flash is flat-priced — no tier surcharge above 200k tokens, unlike Pro. That makes it qualitatively different from its sibling for large-input workloads: the rate you see is what you pay regardless of context size.

Tier | Input $/M | Output $/M
All input | $0.30 | $2.50

Context window: 1,000,000 tokens

Verified against ai.google.dev on 2026-05-09.
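Because the rate is flat, the cost math reduces to one expression. The sketch below is illustrative (the function name and structure are ours, not this site's actual code); the rates are the published Flash prices from the table above.

```typescript
// Flat-rate Gemini 2.5 Flash cost math: $0.30/M input, $2.50/M output,
// with no tier surcharge at any context size.
const FLASH_INPUT_PER_M = 0.3;
const FLASH_OUTPUT_PER_M = 2.5;

function flashCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * FLASH_INPUT_PER_M + outputTokens * FLASH_OUTPUT_PER_M) /
    1_000_000
  );
}
```

For the long-document scenario in the worked examples below (50,000 input tokens, 1,500 output tokens) this gives $0.015 + $0.00375 = $0.01875, which rounds to the $0.019 shown there.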

Worked examples

The scenarios below illustrate the price floor. All three round to fractions of a cent, which is the point — Flash unlocks workloads where the model bill stops being the budget conversation entirely.

Scenario | Input tokens | Output tokens | Cost
Short chat turn (a typical Q&A turn with a small system prompt) | 800 | 400 | <$0.01
System prompt + tool spec (a larger context with a tool schema, single response) | 5,000 | 500 | <$0.01
Long document Q&A (a long-form input, e.g. a transcript, with a structured response) | 50,000 | 1,500 | $0.019

The honest framing for Flash: at these prices your engineering time is the cost center, not the model. A poorly-written prompt that produces 2,000 tokens instead of 200 still costs you about half a cent — which means the lever to optimize is correctness and reliability, not cost-per-token. Spend your time on evals and routing logic; the per-million numbers will take care of themselves.

How is this counted?

We approximate Flash's tokenizer with o200k_base from js-tiktoken (MIT). Drift on Gemini 2.5 is typically ~3% on natural language. At Flash's price level the dollar impact of a 3% miss is negligible per request — it matters mostly for batch-job sizing where small percentages compound across millions of requests. Inputs over 50,000 characters tokenize in a Web Worker.
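The dollar impact of that drift can be bounded directly. A minimal sketch, assuming the ~3% figure above holds symmetrically (the function name and shape are ours, for illustration):

```typescript
// Bound a cost estimate given ~3% drift between o200k_base token counts
// and Gemini's real tokenizer. Illustrative helper, not the site's code.
function costBoundsUSD(
  estTokens: number,
  ratePerM: number,
  drift = 0.03,
): { low: number; high: number } {
  const cost = (estTokens / 1_000_000) * ratePerM;
  return { low: cost * (1 - drift), high: cost * (1 + drift) };
}
```

A 50,000-token input at $0.30/M is $0.015, so a 3% miss moves a single request by about ±$0.00045 — negligible alone, but the same relative miss across a million batch requests is about ±$450, which is why the drift matters for batch-job sizing.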

FAQ

When does Flash beat Claude Haiku?
Flash is cheaper per million tokens — roughly $0.30 input vs Haiku's $1 — so on pure cost it wins when both meet your quality bar. The right tiebreaker is your eval set: run your real workload on both and pick the one that hits 95%+ of Sonnet/Opus quality at the lowest spend. Flash tends to be very good at extraction, classification, and structured rewriting; Haiku tends to be more reliable on instruction-following.
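The pure-cost side of that comparison is easy to make concrete. In the sketch below, Flash's rates and Haiku's $1/M input rate come from this page; Haiku's $5/M output rate is an ASSUMPTION not stated here — check current vendor pricing before relying on it.

```typescript
// Per-request cost comparison sketch. Flash rates and Haiku's $1/M input
// are from this page; Haiku's $5/M output is an assumed figure.
interface Rates {
  inPerM: number;
  outPerM: number;
}
const FLASH: Rates = { inPerM: 0.3, outPerM: 2.5 };
const HAIKU: Rates = { inPerM: 1.0, outPerM: 5.0 }; // output rate assumed

function requestCostUSD(r: Rates, inTok: number, outTok: number): number {
  return (inTok * r.inPerM + outTok * r.outPerM) / 1_000_000;
}
```

On the tool-spec scenario from the worked examples (5,000 in, 500 out), Flash comes to roughly a third of Haiku's cost under these rates — but as the answer above says, price is only the tiebreaker once both models clear your eval bar.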
Can I use the full 1M context window with Flash?
Yes. Flash supports the same 1M-token context as Pro, which is unusual at this price point. There is no published tier surcharge for Flash above 200k tokens — it stays at the published flat rate — making it a credible choice for long-context jobs that don't need Pro's reasoning capability.
Is the token count exact?
No. Google does not publish a client-side Gemini tokenizer, so we use o200k_base (via js-tiktoken) as the closest available encoding. Drift is typically ~3% on natural language. For Flash specifically, where per-request cost is fractions of a cent, that drift rarely changes which decision you would make on the basis of the estimate.
What's the right way to pick a max_tokens cap for Flash?
Lower than you would for Sonnet or Pro. Flash is fast and cheap, but it can produce verbose output if uncapped — and at 8× the per-token cost of input, output is what dominates your bill. For structured-output workloads, set max_tokens to roughly 1.5× your expected schema size and validate the response.
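The 1.5× rule of thumb above can be sketched as a pair of helpers. These names, and the treat-cap-hits-as-truncation check, are illustrative assumptions, not a documented API:

```typescript
// Sketch of the max_tokens rule of thumb: cap at ~1.5x the expected
// structured-output size, then validate. Names and margin are illustrative.
function maxTokensCap(expectedSchemaTokens: number, margin = 1.5): number {
  return Math.ceil(expectedSchemaTokens * margin);
}

// A response whose output length hit the cap was likely cut off
// mid-structure and should fail validation rather than be trusted.
function looksTruncated(outputTokens: number, cap: number): boolean {
  return outputTokens >= cap;
}
```

So a schema you expect to fill in ~200 tokens gets a 300-token cap; anything that lands on the cap is retried or rejected instead of parsed.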
Does my prompt leave the browser?
No. Tokenization runs entirely client-side, in a Web Worker for inputs over 50,000 characters. There is no server route that ever receives prompt content.

Compare against every other model

To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. At Flash's price point the per-million numbers are essentially noise — the comparison view tells you which model fits your context window and reliability requirements.

Related models

The two most relevant comparisons: Gemini 2.5 Pro (when Flash isn't quite good enough) and Claude 4.5 Haiku (the cross-vendor budget tier that competes head-to-head on price and capability).

Keyboard shortcuts

Press ? any time to reopen this list.

Show this overlay: ?
Toggle theme: t
Focus the prompt textarea: /
Go to home: g h
Go to models: g m
Go to pricing data: g p
Go to changelog: g c
Go to about: g a
Close overlays / dialogs: Esc