tokenmath

Gemini 2.5 Flash token & cost calculator

Gemini 2.5 Flash is Google's high-throughput, low-cost frontier model — the tier you reach for when you want a capable model at vending-machine prices. At $0.30 input / $2.50 output per million tokens, it's one of the cheapest entries in any current frontier-model family, and unlike Pro it ships at this flat rate even at the very long end of its 1M-token context window.

The price profile makes Flash an unusually good default for the boring 80% of an AI feature: classification turns, extraction, summarization, structured rewrites, batch processing, agent loops where most of the work is plumbing rather than reasoning. It is not the model you want generating customer-facing prose at the high quality bar — that's Pro or Sonnet territory — but as the workhorse underneath, Flash compresses what you spend on tokens by an order of magnitude versus the premium tiers.

[Interactive calculator. Paste a prompt (up to 1,000,000 characters) or start with an example; tokenization runs client-side and your text is never uploaded. The estimate assumes a 1,024-token response by default and shows input cost, estimated output cost, and context used against the 1,000,000-token window, with roughly ±3% tokenizer drift. Pricing verified 2026-05-09.]
Saved scenarios (none yet)

Saved on this browser only — never uploaded. Up to 10 scenarios.

Tip: save a scenario when you have a prompt + model + response length you might revisit. Useful for sizing features before committing to a vendor.
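The saved-scenarios behavior described above can be sketched as a small list operation. This is an illustrative sketch only: the `Scenario` shape, function name, and drop-oldest policy are our assumptions — the page only states that scenarios live in this browser and cap at 10.

```typescript
// Sketch of browser-local scenario saving, capped at 10 entries.
// Shape and cap policy are illustrative assumptions, not the site's code.
interface Scenario {
  name: string;
  prompt: string;
  responseTokens: number;
}

function saveScenario(saved: Scenario[], s: Scenario): Scenario[] {
  const next = [...saved, s];
  // Keep at most 10 scenarios; drop the oldest entries beyond the cap.
  return next.slice(Math.max(0, next.length - 10));
}
```

In the real page this list would round-trip through localStorage (one of the keys listed in the privacy panel below); the cap keeps that storage bounded.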

Verify privacy (counts since this page loaded; updates live)

Prompt uploads: 0 (always 0, by design)
Outgoing requests: 0 (analytics + page assets only; no prompt content)
Cookies on this origin: 0 (Vercel Analytics + Clarity may set first-party cookies)
localStorage keys: 0 (theme preference + saved scenarios live here)
Server endpoints: 1 (/api/og only; accepts title + subtitle, never prompt text)
Inspect

Open DevTools → Network. Type into the calculator. No request bodies should contain your prompt text.

Pricing

Flash is flat-priced — no tier surcharge above 200k tokens, unlike Pro. That makes it qualitatively different from its sibling for large-input workloads: the rate you see is what you pay regardless of context size.

Tier | Input $/M | Output $/M
All input | $0.30 | $2.50

Context window: 1,000,000 tokens

Verified against ai.google.dev on 2026-05-09.
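Because the rate is flat, the cost math reduces to one expression. The sketch below is illustrative (the function name and structure are ours, not this site's actual code); the rates are the published Flash prices from the table above.

```typescript
// Flat-rate Gemini 2.5 Flash cost math: $0.30/M input, $2.50/M output,
// with no tier surcharge at any context size.
const FLASH_INPUT_PER_M = 0.3;
const FLASH_OUTPUT_PER_M = 2.5;

function flashCostUSD(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * FLASH_INPUT_PER_M + outputTokens * FLASH_OUTPUT_PER_M) /
    1_000_000
  );
}
```

For the long-document scenario in the worked examples below (50,000 input tokens, 1,500 output tokens) this gives $0.015 + $0.00375 = $0.01875, which rounds to the $0.019 shown there.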

Worked examples

The scenarios below illustrate the price floor. All three round to fractions of a cent, which is the point — Flash unlocks workloads where the model bill stops being the budget conversation entirely.

Scenario | Input tokens | Output tokens | Cost
Short chat turn (a typical Q&A turn with a small system prompt) | 800 | 400 | <$0.01
System prompt + tool spec (a larger context with a tool schema, single response) | 5,000 | 500 | <$0.01
Long document Q&A (a long-form input, e.g. a transcript, with a structured response) | 50,000 | 1,500 | $0.019

The honest framing for Flash: at these prices your engineering time is the cost center, not the model. A poorly-written prompt that produces 2,000 tokens instead of 200 still costs you about half a cent — which means the lever to optimize is correctness and reliability, not cost-per-token. Spend your time on evals and routing logic; the per-million numbers will take care of themselves.

How is this counted?

We approximate Flash's tokenizer with o200k_base from js-tiktoken (MIT). Drift on Gemini 2.5 is typically ~3% on natural language. At Flash's price level the dollar impact of a 3% miss is negligible per request — it matters mostly for batch-job sizing where small percentages compound across millions of requests. Inputs over 50,000 characters tokenize in a Web Worker.
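The dollar impact of that drift can be bounded directly. A minimal sketch, assuming the ~3% figure above holds symmetrically (the function name and shape are ours, for illustration):

```typescript
// Bound a cost estimate given ~3% drift between o200k_base token counts
// and Gemini's real tokenizer. Illustrative helper, not the site's code.
function costBoundsUSD(
  estTokens: number,
  ratePerM: number,
  drift = 0.03,
): { low: number; high: number } {
  const cost = (estTokens / 1_000_000) * ratePerM;
  return { low: cost * (1 - drift), high: cost * (1 + drift) };
}
```

A 50,000-token input at $0.30/M is $0.015, so a 3% miss moves a single request by about ±$0.00045 — negligible alone, but the same relative miss across a million batch requests is about ±$450, which is why the drift matters for batch-job sizing.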

FAQ

When does Flash beat Claude Haiku?
Flash is cheaper per million tokens — roughly $0.30 input vs Haiku's $1 — so on pure cost it wins when both meet your quality bar. The right tiebreaker is your eval set: run your real workload on both and pick the one that hits 95%+ of Sonnet/Opus quality at the lowest spend. Flash tends to be very good at extraction, classification, and structured rewriting; Haiku tends to be more reliable on instruction-following.
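The pure-cost side of that comparison is easy to make concrete. In the sketch below, Flash's rates and Haiku's $1/M input rate come from this page; Haiku's $5/M output rate is an ASSUMPTION not stated here — check current vendor pricing before relying on it.

```typescript
// Per-request cost comparison sketch. Flash rates and Haiku's $1/M input
// are from this page; Haiku's $5/M output is an assumed figure.
interface Rates {
  inPerM: number;
  outPerM: number;
}
const FLASH: Rates = { inPerM: 0.3, outPerM: 2.5 };
const HAIKU: Rates = { inPerM: 1.0, outPerM: 5.0 }; // output rate assumed

function requestCostUSD(r: Rates, inTok: number, outTok: number): number {
  return (inTok * r.inPerM + outTok * r.outPerM) / 1_000_000;
}
```

On the tool-spec scenario from the worked examples (5,000 in, 500 out), Flash comes to roughly a third of Haiku's cost under these rates — but as the answer above says, price is only the tiebreaker once both models clear your eval bar.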
Can I use the full 1M context window with Flash?
Yes. Flash supports the same 1M-token context as Pro, which is unusual at this price point. There is no published tier surcharge for Flash above 200k tokens — it stays at the published flat rate — making it a credible choice for long-context jobs that don't need Pro's reasoning capability.
Is the token count exact?
No. Google does not publish a client-side Gemini tokenizer, so we use o200k_base (via js-tiktoken) as the closest available encoding. Drift is typically ~3% on natural language. For Flash specifically, where per-request cost is fractions of a cent, that drift rarely changes which decision you would make on the basis of the estimate.
What's the right way to pick a max_tokens cap for Flash?
Lower than you would for Sonnet or Pro. Flash is fast and cheap, but it can produce verbose output if uncapped — and at 8× the per-token cost of input, output is what dominates your bill. For structured-output workloads, set max_tokens to roughly 1.5× your expected schema size and validate the response.
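The 1.5× rule of thumb above can be sketched as a pair of helpers. These names, and the treat-cap-hits-as-truncation check, are illustrative assumptions, not a documented API:

```typescript
// Sketch of the max_tokens rule of thumb: cap at ~1.5x the expected
// structured-output size, then validate. Names and margin are illustrative.
function maxTokensCap(expectedSchemaTokens: number, margin = 1.5): number {
  return Math.ceil(expectedSchemaTokens * margin);
}

// A response whose output length hit the cap was likely cut off
// mid-structure and should fail validation rather than be trusted.
function looksTruncated(outputTokens: number, cap: number): boolean {
  return outputTokens >= cap;
}
```

So a schema you expect to fill in ~200 tokens gets a 300-token cap; anything that lands on the cap is retried or rejected instead of parsed.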
Does my prompt leave the browser?
No. Tokenization runs entirely client-side, in a Web Worker for inputs over 50,000 characters. There is no server route that ever receives prompt content.

Compare against every other model

To see this exact prompt scored against every supported model, sorted by total cost, paste it into the home calculator and toggle Compare across all models. At Flash's price point the per-million numbers are essentially noise — the comparison view tells you which model fits your context window and reliability requirements.

Related models

The two most relevant comparisons: Gemini 2.5 Pro (when Flash isn't quite good enough) and Claude 4.5 Haiku (the cross-vendor budget tier that competes head-to-head on price and capability).

Keyboard shortcuts

Press ? any time to reopen this list.

Show this overlay: ?
Toggle theme: t
Focus the prompt textarea: /
Go to home: g h
Go to models: g m
Go to pricing data: g p
Go to changelog: g c
Go to about: g a
Close overlays / dialogs: Esc