Token Counter for AI Models
How It Works, What It Costs
A practical, honest guide to token counting across OpenAI, Claude, Gemini, Mistral, and xAI. Understand how your text becomes tokens, why the same prompt costs a different amount on different models, and how to optimise your prompts to spend less.
AI Providers: 5 · Models Covered: 60+ · Counting Method: Local + API · Data Stored: None
What Exactly Is a Token?
Not a word, not a character — something in between
The Subword Unit
How LLMs read text
AI language models don't read text the way humans do. They convert everything into tokens — compact numeric IDs representing pieces of text. A token is usually a word fragment, a whole common word, or a punctuation symbol. The word "tokenization" might be split into "token" + "ization" — that's two tokens. The word "cat" is one token. The word "antidisestablishmentarianism" is probably six or seven tokens.
This subword approach is intentional. It gives models a fixed-size vocabulary that can still handle rare words, typos, and any language — because even an unknown word can be built from known subword pieces.
Live Examples: Text → Tokens
Hello, world! → 4 tokens
ChatGPT → 3 tokens
tokenization → 2 tokens
🚀 → 2 tokens
(4 spaces) → 1 token
* Examples approximate GPT-4o tokenization (o200k_base)
How to Use This Tool
Three steps to accurate token and cost estimates
Select Your Platform & Model
Choose the AI provider (OpenAI, Anthropic, Google, Mistral, or xAI) then pick the exact model you plan to use. Different models from the same provider can tokenize differently — always match the model precisely.
Paste Your Text
Paste your full prompt — system message, user input, few-shot examples, whatever you plan to send. For the most accurate cost estimate, include everything that will be in the actual API call.
Hit Calculate
You'll instantly see: total token count, estimated input cost in USD, and what percentage of the model's context window you're using. A warning appears if you're above 80% of the limit.
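The context-window check described above is a simple ratio with a threshold; a minimal sketch (the token count and window size below are illustrative):

```python
def context_usage(tokens: int, context_window: int, warn_at: float = 0.8):
    """Return the fraction of the context window used and whether to warn."""
    usage = tokens / context_window
    return usage, usage > warn_at

# e.g. a 110,000-token prompt against a 128K-token context window
usage, warn = context_usage(110_000, 128_000)
print(f"{usage:.0%} of context used, warning: {warn}")  # 86% of context used, warning: True
```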
Power User Tips
Use ⌘ + Enter (Mac) or Ctrl + Enter (Windows) to calculate without reaching for the mouse.
For a realistic cost estimate, paste system prompt + typical user message together — not just the user message alone.
The 'Input Cost' only covers your prompt. Output tokens (the model's response) are billed separately and often at a higher rate.
Watch the '~ approximation' badge. It appears for Claude, Mistral, and Grok counts, meaning there's a 3–7% margin. For critical budget planning, use the provider's official tokenizer.
Model-Specific Tokenizer Logic
Why each provider needs different counting logic
OpenAI
100% Exact · GPT-4o, GPT-4, GPT-3.5, o1, o3, Codex series
Method: tiktoken (local, free)
OpenAI open-sourced their tiktoken library — the actual tokenizer used in production. We run it locally in Node.js with zero network calls. GPT-4o and newer models use o200k_base (200K vocabulary), while older models use cl100k_base (100K vocabulary). The difference matters: o200k_base tends to produce 5–10% fewer tokens on the same text because its larger vocabulary covers more whole words.
Anthropic (Claude)
~93–97% Accurate · Claude 3, 3.5, Claude 4 series
Method: tiktoken cl100k_base (local approximation)
Anthropic trains their own BPE tokenizer and has not released it publicly. We use tiktoken's cl100k_base as the closest public approximation. In internal benchmarks on English text, cl100k_base over-counts by about 3–7% compared to Claude's actual tokenizer. For non-English text, code, or heavily formatted content, variance can reach 10–15%. Anthropic does offer a free /v1/messages/count_tokens API endpoint for exact counts — we chose the local approach to avoid requiring an API key from users.
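For exact Claude counts, a request to that endpoint looks roughly like this. This is a sketch: the endpoint path is the one named above, and the `x-api-key` / `anthropic-version` headers follow Anthropic's documented conventions, but verify the current API version before relying on it.

```python
import json

def build_count_tokens_request(model: str, text: str) -> dict:
    """Build the JSON body for POST https://api.anthropic.com/v1/messages/count_tokens."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
    }

payload = build_count_tokens_request("claude-3-5-sonnet-latest", "Hello, world!")
print(json.dumps(payload, indent=2))
# Send with your API key in the "x-api-key" header plus an "anthropic-version" header;
# the response body contains an "input_tokens" field with the exact count.
```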
Google (Gemini)
100% Exact · Gemini 1.0, 1.5, 2.0, 2.5 series
Method: Official countTokens API (free endpoint)
Google exposes a dedicated countTokens() method in their AI SDK that calls Google's servers without triggering inference billing. We use this for exact counts. The Gemini family uses SentencePiece tokenization, which behaves differently from OpenAI's BPE — particularly on multilingual text, where Gemini models often achieve lower token counts. Google's free API tier allows 15 requests/minute; our rate limiter keeps you safely under that.
Mistral
~93–96% Accurate · Mistral Small/Medium/Large, Mixtral, Codestral, Pixtral
Method: tiktoken cl100k_base (local approximation)
Mistral uses a custom BPE tokenizer closely related to the LLaMA family. Its vocabulary structure is similar enough to cl100k_base that approximations are quite accurate for English and common European languages. Code tokenization may diverge more. Mistral does not offer a public token counting endpoint separate from inference.
xAI (Grok)
~92–96% Accurate · Grok 2, Grok 3, Grok 4, Grok 4.20
Method: tiktoken cl100k_base (local approximation)
xAI's API is OpenAI-compatible in its request format but uses Grok's own internal tokenizer. The tokenizer details are not public. We use cl100k_base approximation, which works well for English but may show higher variance on technical content, code, and non-Latin scripts. xAI provides no standalone token counting endpoint.
Same Prompt, Different Token Counts
How identical text tokenizes across six leading AI models
Test Prompt
"Explain the theory of relativity in simple terms."
49 characters · 8 words
| Provider | Model | Tokenizer | Tokens | Method | vs GPT-4o |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | o200k_base | 9 | Merges punctuation aggressively | Baseline |
| OpenAI | GPT-3.5 Turbo | cl100k_base | 10 | Slightly more splits on quotes | +1 |
| Anthropic | Claude 3.5 Sonnet | Claude BPE (est.) | 11 | ~approx via cl100k_base | +2 |
| Google | Gemini 1.5 Pro | SentencePiece | 10 | Exact via API | +1 |
| Mistral | Mistral Large | BPE (est.) | 10 | ~approx via cl100k_base | +1 |
| xAI | Grok 3 | BPE (est.) | 10 | ~approx via cl100k_base | +1 |
* Claude, Mistral, and xAI counts are approximations via tiktoken. Google is exact via API. Actual counts may vary slightly.
What this means for you: On a short sentence the difference is 1–2 tokens and negligible. But on a 10,000-token document, a 10% tokenizer difference means 1,000 tokens — roughly $0.0025 on GPT-4o, or enough to meaningfully affect context window planning. Always use the model-specific tokenizer when working at scale.
The Math Behind Token Cost
Exactly how input cost is calculated — no black box
The Formula
Cost = (Tokens ÷ 1,000,000) × Price per 1M tokens

Worked examples for a 1,000-token prompt:
GPT-4o · $2.50/1M · 1,000 tokens → $0.0025
GPT-4o Mini · $0.15/1M · 1,000 tokens → $0.00015
Claude Sonnet 4.6 · $3.00/1M · 1,000 tokens → $0.003
Gemini 1.5 Flash · $0.075/1M · 1,000 tokens → $0.000075
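The formula is one line of arithmetic; a sketch using the example prices from this page:

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost in USD: (tokens / 1,000,000) * price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million

print(f"{input_cost(1_000, 2.50):.6f}")   # GPT-4o:           0.002500
print(f"{input_cost(1_000, 0.15):.6f}")   # GPT-4o Mini:      0.000150
print(f"{input_cost(1_000, 3.00):.6f}")   # Claude Sonnet:    0.003000
print(f"{input_cost(1_000, 0.075):.6f}")  # Gemini 1.5 Flash: 0.000075
```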
Input vs Output Pricing
Important: Our tool estimates input cost only. Output tokens — the model's response — are always priced separately and usually higher.
GPT-4o Example
Output is 4× more expensive on GPT-4o ($10.00/1M vs $2.50/1M input). For a 1,000-token prompt that generates a 500-token response, roughly two-thirds of the total cost ($0.0050 of $0.0075) comes from the smaller output. Always estimate both sides for accurate budgeting.
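A fuller estimate covers both sides; a sketch using GPT-4o's public rates ($2.50/1M input, $10.00/1M output):

```python
def total_cost(input_tokens: int, output_tokens: int,
               in_price: float, out_price: float):
    """Return (input cost, output cost, total) in USD, prices per 1M tokens."""
    cost_in = input_tokens / 1_000_000 * in_price
    cost_out = output_tokens / 1_000_000 * out_price
    return cost_in, cost_out, cost_in + cost_out

cin, cout, total = total_cost(1_000, 500, 2.50, 10.00)
print(f"input ${cin:.4f} + output ${cout:.4f} = ${total:.4f}")
print(f"output share: {cout / total:.0%}")  # roughly two-thirds of the total
```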
What Does It Cost at Scale?
| Model | 1 API Call (500 tokens) | 1K calls/day | 10K calls/day | 100K calls/day |
|---|---|---|---|---|
| GPT-4o | $0.00125 | $1.25 | $12.50 | $125 |
| GPT-4o Mini | $0.000075 | $0.075 | $0.75 | $7.50 |
| Claude Sonnet | $0.0015 | $1.50 | $15.00 | $150 |
| Gemini Flash | $0.0000375 | $0.038 | $0.375 | $3.75 |
| Grok 3 | $0.0015 | $1.50 | $15.00 | $150 |
* Input-only pricing. 500 token prompt assumed. Actual costs vary by prompt length and output tokens.
Code vs. Prose: The Density Gap
Why programming text costs more per character than natural language
Plain English Prose
~4.0 chars per token avg
Common English words are usually single tokens. Articles, prepositions, and common verbs are highly compressed. The tokenizer has seen these patterns billions of times.
Python Code
~2.5 chars per token avg
Colons, underscores, brackets, indentation, and camelCase splits all add tokens. The same character count produces roughly 1.6× as many tokens in code as in English prose (4.0 ÷ 2.5 chars per token).
Token Density by Content Type
Prose: articles, emails, documentation
Structured data: config files, API payloads
Code: functions, classes, logic
Markdown: headers, bold, bullets add overhead
CJK text: characters are 2–3 tokens each
RTL text: scripts tokenize less efficiently
Emoji: each emoji = 1–3 tokens typically
Higher chars/token = cheaper. Lower = more expensive per character of content.
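These densities support a quick pre-tokenizer estimate; a sketch where the chars-per-token figures are the rough averages quoted above, not exact values:

```python
# Rough average characters per token, from the content-type figures above
CHARS_PER_TOKEN = {
    "prose": 4.0,  # plain English
    "code": 2.5,   # e.g. Python
}

def rough_token_estimate(text: str, content_type: str = "prose") -> int:
    """Ballpark token estimate from character count; use a real tokenizer for billing."""
    return round(len(text) / CHARS_PER_TOKEN[content_type])

sample = "x" * 1_000
print(rough_token_estimate(sample, "prose"))  # ~250 tokens
print(rough_token_estimate(sample, "code"))   # ~400 tokens
```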
Lost in the Middle: Why Position Matters
Token count isn't the whole story — where information sits in your context affects model quality
Beginning
High Recall
Models recall information placed at the start of the context most reliably. Put your critical instructions, constraints, and key facts here.
Middle
Lower Recall
Research shows LLMs struggle most with information buried in the middle of long contexts. Avoid placing critical details here in large prompts.
End
Good Recall
The recency effect: information at the end of context is recalled well. User questions and immediate tasks should usually go last.
The Practical Implications for Token Budgeting
What this means when you're near the context limit
Don't just count — plan the distribution
Two prompts with identical token counts can produce radically different output quality depending on how the information is arranged. Knowing you're at 60% context usage is useful; knowing your critical instructions are in the middle of that 60% is essential.
Context window ≠ effective context window
Models advertise context windows of 128K–2M tokens, but empirical benchmarks show recall degrades significantly in the middle sections of very long contexts. For critical tasks, treat the effective context window as roughly 60–70% of the stated maximum.
Prompt caching changes the economics of position
Anthropic and OpenAI both cache from the beginning of your prompt. If your static system prompt is 20,000 tokens and lives at the start, it gets cached on the second call and costs 50–90% less. This makes the beginning of your context window doubly valuable.
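The cache economics reduce to a simple calculation; a sketch where the 0.5 and 0.9 discount rates reflect the 50–90% range mentioned above (actual rates vary by provider):

```python
def cache_savings(cached_tokens: int, price_per_million: float,
                  discount: float) -> float:
    """USD saved per call once the cached prefix is being reused."""
    return cached_tokens / 1_000_000 * price_per_million * discount

# 20,000-token static system prompt at GPT-4o input pricing ($2.50/1M)
print(f"{cache_savings(20_000, 2.50, 0.5):.4f}")  # 0.0250 saved/call at a 50% discount
print(f"{cache_savings(20_000, 2.50, 0.9):.4f}")  # 0.0450 saved/call at a 90% discount
```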
Our Accuracy Commitment
We tell you exactly where numbers are exact and where they're estimates
Exact Counts
All OpenAI models — tiktoken is their production tokenizer
Google Gemini — official countTokens API
Approximations
Claude models — cl100k_base approx, 93–97% accurate
Mistral models — cl100k_base approx, 93–96% accurate
xAI Grok models — cl100k_base approx, 92–96% accurate
Why We Don't Claim False Precision
The honest constraint every token counter faces
Some token counter tools show exact-looking numbers for all models without disclosing that they're approximations. We think that's misleading. Anthropic, Mistral, and xAI have not released their tokenizers publicly. Any tool claiming exact counts for these models without making a live API call to the provider is either using an approximation (like us) or simply wrong.
Our approach: use the best available public approximation, be transparent about which results are exact and which are estimates, and display an "~ approximation" indicator so you can factor the margin into your planning. For production budget forecasting on Claude or Mistral, we recommend cross-checking with the provider's own tooling.
8 Ways to Reduce Your Token Count
Practical, tested techniques to cut API costs without hurting output quality
Remove unnecessary politeness
"Please could you kindly help me by explaining..." costs 12 tokens more than "Explain:". LLMs don't need pleasantries — they respond identically.
Use bullet points over paragraphs in few-shot examples
Prose examples consume 30–50% more tokens than equivalent bullet-point examples for the same information density.
Abbreviate field names in JSON prompts
"user_first_name" → "fname". In high-volume structured output prompts, short field names can cut token usage by 15–25%.
Put your system prompt first, always
Prompt caching caches from the start. A 5,000-token static system prompt costs $0.0125 in input per call on GPT-4o; at OpenAI's roughly 50% cached-input rate that saves about $0.00625 per call, or $6.25 per 1,000 calls (Anthropic's cache-read discount is closer to 90%).
Avoid repeating context you already provided
In multi-turn conversations, don't resend the full conversation history unless necessary. Summarise older turns to compress context cost.
Use smaller models for simple tasks
GPT-4o Mini at $0.15/1M does classification, extraction, and summarisation as well as GPT-4o for most tasks at 16× lower cost.
Strip whitespace from code blocks
If you're sending code to the model, minified or stripped code (removing comments, blank lines) can cut token count by 20–40%.
Use model-native markdown sparingly
Markdown formatting (##, **, *, ---) adds tokens. Only include it if the model needs to produce formatted output — for JSON or data extraction tasks, skip it.
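The code-stripping tip above can be sketched as a tiny pre-processor. This naive version only drops blank lines and full-line # comments; a production version would need to handle strings containing '#', docstrings, and other languages:

```python
def strip_for_prompt(source: str) -> str:
    """Remove blank lines and comment-only lines before sending code to a model."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # drop blank lines and comment-only lines
        kept.append(line.rstrip())
    return "\n".join(kept)

code = """
# compute a total
def add(a, b):
    return a + b

"""
print(strip_for_prompt(code))
```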
Questions Worth Asking
The questions most tools don't answer honestly
Editorial Policy
How we maintain accuracy and independence
No Affiliate Bias
We are not affiliated with OpenAI, Anthropic, Google, Mistral, or xAI. Pricing data is sourced from official provider documentation and updated manually. We do not receive compensation for featuring any provider.
Methodology Transparency
Every counting method is disclosed in-tool (see the '~ approximation' label) and on this page. Where we approximate, we say so and quantify the accuracy margin. Where we use official APIs, we say so.
Pricing Accuracy
AI model pricing changes frequently. Our pricing table reflects publicly available rates as of the last update. For billing-critical decisions, always verify current pricing directly on the provider's pricing page.
Technical stack: Token counting for OpenAI models uses the open-source tiktoken npm package (a port of OpenAI's MIT-licensed tiktoken library). Google Gemini counts use the @google/generative-ai SDK's countTokens() method. Approximations for Claude, Mistral, and Grok use tiktoken with the cl100k_base encoding as the closest publicly available proxy tokenizer. All calculations run server-side in Next.js API routes. No user text is persisted.


