Token Counter for AI Models
How It Works, What It Costs
A practical, honest guide to token counting across OpenAI, Claude, Gemini, Mistral, and xAI. Understand how your text becomes tokens, why the same prompt costs a different amount on different models, and how to optimise your prompts to spend less.
AI Providers: 5 · Models Covered: 60+ · Counting Method: Local + API · Data Stored: None
What Exactly Is a Token?
Not a word, not a character — something in between
The Subword Unit
How LLMs read text
AI language models don't read text the way humans do. They convert everything into tokens — compact numeric IDs representing pieces of text. A token is usually a word fragment, a whole common word, or a punctuation symbol. The word "tokenization" might be split into "token" + "ization" — that's two tokens. The word "cat" is one token. The word "antidisestablishmentarianism" is probably six or seven tokens.
This subword approach is intentional. It gives models a fixed-size vocabulary that can still handle rare words, typos, and any language — because even an unknown word can be built from known subword pieces.
Live Examples: Text → Tokens
Hello, world! → 4 tokens
ChatGPT → 3 tokens
tokenization → 2 tokens
🚀 → 2 tokens
(4 spaces) → 1 token
* Examples approximate GPT-4o tokenization (o200k_base)
How to Use This Tool
Three steps to accurate token and cost estimates
Select Your Platform & Model
Choose the AI provider (OpenAI, Anthropic, Google, Mistral, or xAI) then pick the exact model you plan to use. Different models from the same provider can tokenize differently — always match the model precisely.
Paste Your Text
Paste your full prompt — system message, user input, few-shot examples, whatever you plan to send. For the most accurate cost estimate, include everything that will be in the actual API call.
Hit Calculate
You'll instantly see: total token count, estimated input cost in USD, and what percentage of the model's context window you're using. A warning appears if you're above 80% of the limit.
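The context-window check described above is a simple ratio with a threshold; a minimal sketch (the token count and window size below are illustrative):

```python
def context_usage(tokens: int, context_window: int, warn_at: float = 0.8):
    """Return the fraction of the context window used and whether to warn."""
    usage = tokens / context_window
    return usage, usage > warn_at

# e.g. a 110,000-token prompt against a 128K-token context window
usage, warn = context_usage(110_000, 128_000)
print(f"{usage:.0%} of context used, warning: {warn}")  # 86% of context used, warning: True
```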
Power User Tips
Use ⌘ + Enter (Mac) or Ctrl + Enter (Windows) to calculate without reaching for the mouse.
For a realistic cost estimate, paste system prompt + typical user message together — not just the user message alone.
The 'Input Cost' only covers your prompt. Output tokens (the model's response) are billed separately and often at a higher rate.
Watch the '~ approximation' badge. It appears for Claude, Mistral, and Grok counts, meaning there's a 3–7% margin. For critical budget planning, use the provider's official tokenizer.
Model-Specific Tokenizer Logic
Why each provider needs different counting logic
OpenAI
100% Exact · GPT-4o, GPT-4, GPT-3.5, o1, o3, Codex series
Method: tiktoken (local, free)
OpenAI open-sourced their tiktoken library — the actual tokenizer used in production. We run it locally in Node.js with zero network calls. GPT-4o and newer models use o200k_base (200K vocabulary), while older models use cl100k_base (100K vocabulary). The difference matters: o200k_base tends to produce 5–10% fewer tokens on the same text because its larger vocabulary covers more whole words.
Anthropic (Claude)
~93–97% Accurate · Claude 3, 3.5, Claude 4 series
Method: tiktoken cl100k_base (local approximation)
Anthropic trains their own BPE tokenizer and has not released it publicly. We use tiktoken's cl100k_base as the closest public approximation. In internal benchmarks on English text, cl100k_base over-counts by about 3–7% compared to Claude's actual tokenizer. For non-English text, code, or heavily formatted content, variance can reach 10–15%. Anthropic does offer a free /v1/messages/count_tokens API endpoint for exact counts — we chose the local approach to avoid requiring an API key from users.
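For exact Claude counts, a request to that endpoint looks roughly like this. This is a sketch: the endpoint path is the one named above, and the `x-api-key` / `anthropic-version` headers follow Anthropic's documented conventions, but verify the current API version before relying on it.

```python
import json

def build_count_tokens_request(model: str, text: str) -> dict:
    """Build the JSON body for POST https://api.anthropic.com/v1/messages/count_tokens."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
    }

payload = build_count_tokens_request("claude-3-5-sonnet-latest", "Hello, world!")
print(json.dumps(payload, indent=2))
# Send with your API key in the "x-api-key" header plus an "anthropic-version" header;
# the response body contains an "input_tokens" field with the exact count.
```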
Google (Gemini)
100% Exact · Gemini 1.0, 1.5, 2.0, 2.5 series
Method: Official countTokens API (free endpoint)
Google exposes a dedicated countTokens() method in their AI SDK that calls Google's servers without triggering inference billing. We use this for exact counts. The Gemini family uses SentencePiece tokenization, which behaves differently from OpenAI's BPE — particularly on multilingual text, where Gemini models often achieve lower token counts. Google's free API tier allows 15 requests/minute; our rate limiter keeps you safely under that.
Mistral
~93–96% Accurate · Mistral Small/Medium/Large, Mixtral, Codestral, Pixtral
Method: tiktoken cl100k_base (local approximation)
Mistral uses a custom BPE tokenizer closely related to the LLaMA family. Its vocabulary structure is similar enough to cl100k_base that approximations are quite accurate for English and common European languages. Code tokenization may diverge more. Mistral does not offer a public token counting endpoint separate from inference.
xAI (Grok)
~92–96% Accurate · Grok 2, Grok 3, Grok 4, Grok 4.20
Method: tiktoken cl100k_base (local approximation)
xAI's API is OpenAI-compatible in its request format but uses Grok's own internal tokenizer. The tokenizer details are not public. We use cl100k_base approximation, which works well for English but may show higher variance on technical content, code, and non-Latin scripts. xAI provides no standalone token counting endpoint.
Same Prompt, Different Token Counts
How identical text tokenizes across six leading AI models
Test Prompt
"Explain the theory of relativity in simple terms."
49 characters · 8 words
| Provider | Model | Tokenizer | Tokens | Method | vs GPT-4o |
|---|---|---|---|---|---|
| OpenAI | GPT-4o | o200k_base | 9 | Merges punctuation aggressively | Baseline |
| OpenAI | GPT-3.5 Turbo | cl100k_base | 10 | Slightly more splits on quotes | +1 |
| Anthropic | Claude 3.5 Sonnet | Claude BPE (est.) | 11 | ~approx via cl100k_base | +2 |
| Google | Gemini 1.5 Pro | SentencePiece | 10 | Exact via API | +1 |
| Mistral | Mistral Large | BPE (est.) | 10 | ~approx via cl100k_base | +1 |
| xAI | Grok 3 | BPE (est.) | 10 | ~approx via cl100k_base | +1 |
* Claude, Mistral, and xAI counts are approximations via tiktoken. Google is exact via API. Actual counts may vary slightly.
What this means for you: On a short sentence the difference is 1–2 tokens and negligible. But on a 10,000-token document, a 10% tokenizer difference means 1,000 tokens — roughly $0.0025 on GPT-4o, or enough to meaningfully affect context window planning. Always use the model-specific tokenizer when working at scale.
The Math Behind Token Cost
Exactly how input cost is calculated — no black box
The Formula
Cost = (Tokens ÷ 1,000,000) × Price per 1M tokens

Worked examples for a 1,000-token prompt:
GPT-4o · $2.50/1M · 1,000 tokens → $0.0025
GPT-4o Mini · $0.15/1M · 1,000 tokens → $0.00015
Claude Sonnet 4.6 · $3.00/1M · 1,000 tokens → $0.003
Gemini 1.5 Flash · $0.075/1M · 1,000 tokens → $0.000075
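The formula is one line of arithmetic; a sketch using the example prices from this page:

```python
def input_cost(tokens: int, price_per_million: float) -> float:
    """Input cost in USD: (tokens / 1,000,000) * price per 1M tokens."""
    return tokens / 1_000_000 * price_per_million

print(f"{input_cost(1_000, 2.50):.6f}")   # GPT-4o:           0.002500
print(f"{input_cost(1_000, 0.15):.6f}")   # GPT-4o Mini:      0.000150
print(f"{input_cost(1_000, 3.00):.6f}")   # Claude Sonnet:    0.003000
print(f"{input_cost(1_000, 0.075):.6f}")  # Gemini 1.5 Flash: 0.000075
```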
Input vs Output Pricing
Important: Our tool estimates input cost only. Output tokens — the model's response — are always priced separately and usually higher.
GPT-4o Example
Output is 4× more expensive on GPT-4o ($10.00/1M vs $2.50/1M input). For a 1,000-token prompt that generates a 500-token response, roughly two-thirds of the total cost ($0.0050 of $0.0075) comes from the smaller output. Always estimate both sides for accurate budgeting.
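A fuller estimate covers both sides; a sketch using GPT-4o's public rates ($2.50/1M input, $10.00/1M output):

```python
def total_cost(input_tokens: int, output_tokens: int,
               in_price: float, out_price: float):
    """Return (input cost, output cost, total) in USD, prices per 1M tokens."""
    cost_in = input_tokens / 1_000_000 * in_price
    cost_out = output_tokens / 1_000_000 * out_price
    return cost_in, cost_out, cost_in + cost_out

cin, cout, total = total_cost(1_000, 500, 2.50, 10.00)
print(f"input ${cin:.4f} + output ${cout:.4f} = ${total:.4f}")
print(f"output share: {cout / total:.0%}")  # roughly two-thirds of the total
```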
What Does It Cost at Scale?
| Model | 1 API Call (500 tokens) | 1K calls/day | 10K calls/day | 100K calls/day |
|---|---|---|---|---|
| GPT-4o | $0.00125 | $1.25 | $12.50 | $125 |
| GPT-4o Mini | $0.000075 | $0.075 | $0.75 | $7.50 |
| Claude Sonnet | $0.0015 | $1.50 | $15.00 | $150 |
| Gemini Flash | $0.0000375 | $0.038 | $0.375 | $3.75 |
| Grok 3 | $0.0015 | $1.50 | $15.00 | $150 |
* Input-only pricing. 500 token prompt assumed. Actual costs vary by prompt length and output tokens.
Code vs. Prose: The Density Gap
Why programming text costs more per character than natural language
Plain English Prose
~4.0 chars per token avg
Common English words are usually single tokens. Articles, prepositions, and common verbs are highly compressed. The tokenizer has seen these patterns billions of times.
Python Code
~2.5 chars per token avg
Colons, underscores, brackets, indentation, and camelCase splits all add tokens. The same character count produces roughly 1.6× as many tokens in code as in English prose (4.0 ÷ 2.5 chars per token).
Token Density by Content Type
Prose: articles, emails, documentation
Structured data: config files, API payloads
Code: functions, classes, logic
Markdown: headers, bold, bullets add overhead
CJK text: characters are 2–3 tokens each
RTL text: scripts tokenize less efficiently
Emoji: each emoji = 1–3 tokens typically
Higher chars/token = cheaper. Lower = more expensive per character of content.
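These densities support a quick pre-tokenizer estimate; a sketch where the chars-per-token figures are the rough averages quoted above, not exact values:

```python
# Rough average characters per token, from the content-type figures above
CHARS_PER_TOKEN = {
    "prose": 4.0,  # plain English
    "code": 2.5,   # e.g. Python
}

def rough_token_estimate(text: str, content_type: str = "prose") -> int:
    """Ballpark token estimate from character count; use a real tokenizer for billing."""
    return round(len(text) / CHARS_PER_TOKEN[content_type])

sample = "x" * 1_000
print(rough_token_estimate(sample, "prose"))  # ~250 tokens
print(rough_token_estimate(sample, "code"))   # ~400 tokens
```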
Lost in the Middle: Why Position Matters
Token count isn't the whole story — where information sits in your context affects model quality
Beginning
High Recall
Models recall information placed at the start of the context most reliably. Put your critical instructions, constraints, and key facts here.
Middle
Lower Recall
Research shows LLMs struggle most with information buried in the middle of long contexts. Avoid placing critical details here in large prompts.
End
Good Recall
The recency effect: information at the end of context is recalled well. User questions and immediate tasks should usually go last.
The Practical Implications for Token Budgeting
What this means when you're near the context limit
Don't just count — plan the distribution
Two prompts with identical token counts can produce radically different output quality depending on how the information is arranged. Knowing you're at 60% context usage is useful; knowing your critical instructions are in the middle of that 60% is essential.
Context window ≠ effective context window
Models advertise context windows of 128K–2M tokens, but empirical benchmarks show recall degrades significantly in the middle sections of very long contexts. For critical tasks, treat the effective context window as roughly 60–70% of the stated maximum.
Prompt caching changes the economics of position
Anthropic and OpenAI both cache from the beginning of your prompt. If your static system prompt is 20,000 tokens and lives at the start, it gets cached on the second call and costs 50–90% less. This makes the beginning of your context window doubly valuable.
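The cache economics reduce to a simple calculation; a sketch where the 0.5 and 0.9 discount rates reflect the 50–90% range mentioned above (actual rates vary by provider):

```python
def cache_savings(cached_tokens: int, price_per_million: float,
                  discount: float) -> float:
    """USD saved per call once the cached prefix is being reused."""
    return cached_tokens / 1_000_000 * price_per_million * discount

# 20,000-token static system prompt at GPT-4o input pricing ($2.50/1M)
print(f"{cache_savings(20_000, 2.50, 0.5):.4f}")  # 0.0250 saved/call at a 50% discount
print(f"{cache_savings(20_000, 2.50, 0.9):.4f}")  # 0.0450 saved/call at a 90% discount
```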
Our Accuracy Commitment
We tell you exactly where numbers are exact and where they're estimates
Exact Counts
All OpenAI models — tiktoken is their production tokenizer
Google Gemini — official countTokens API
Approximations
Claude models — cl100k_base approx, 93–97% accurate
Mistral models — cl100k_base approx, 93–96% accurate
xAI Grok models — cl100k_base approx, 92–96% accurate
Why We Don't Claim False Precision
The honest constraint every token counter faces
Some token counter tools show exact-looking numbers for all models without disclosing that they're approximations. We think that's misleading. Anthropic, Mistral, and xAI have not released their tokenizers publicly. Any tool claiming exact counts for these models without making a live API call to the provider is either using an approximation (like us) or simply wrong.
Our approach: use the best available public approximation, be transparent about which results are exact and which are estimates, and display an "~ approximation" indicator so you can factor the margin into your planning. For production budget forecasting on Claude or Mistral, we recommend cross-checking with the provider's own tooling.
8 Ways to Reduce Your Token Count
Practical, tested techniques to cut API costs without hurting output quality
Remove unnecessary politeness
"Please could you kindly help me by explaining..." costs 12 tokens more than "Explain:". LLMs don't need pleasantries — they respond identically.
Use bullet points over paragraphs in few-shot examples
Prose examples consume 30–50% more tokens than equivalent bullet-point examples for the same information density.
Abbreviate field names in JSON prompts
"user_first_name" → "fname". In high-volume structured output prompts, short field names can cut token usage by 15–25%.
Put your system prompt first, always
Prompt caching caches from the start. A 5,000-token static system prompt costs $0.0125 in input per call on GPT-4o; at OpenAI's roughly 50% cached-input rate that saves about $0.00625 per call, or $6.25 per 1,000 calls (Anthropic's cache-read discount is closer to 90%).
Avoid repeating context you already provided
In multi-turn conversations, don't resend the full conversation history unless necessary. Summarise older turns to compress context cost.
Use smaller models for simple tasks
GPT-4o Mini at $0.15/1M does classification, extraction, and summarisation as well as GPT-4o for most tasks at 16× lower cost.
Strip whitespace from code blocks
If you're sending code to the model, minified or stripped code (removing comments, blank lines) can cut token count by 20–40%.
Use model-native markdown sparingly
Markdown formatting (##, **, *, ---) adds tokens. Only include it if the model needs to produce formatted output — for JSON or data extraction tasks, skip it.
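The code-stripping tip above can be sketched as a tiny pre-processor. This naive version only drops blank lines and full-line # comments; a production version would need to handle strings containing '#', docstrings, and other languages:

```python
def strip_for_prompt(source: str) -> str:
    """Remove blank lines and comment-only lines before sending code to a model."""
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # drop blank lines and comment-only lines
        kept.append(line.rstrip())
    return "\n".join(kept)

code = """
# compute a total
def add(a, b):
    return a + b

"""
print(strip_for_prompt(code))
```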
Questions Worth Asking
The questions most tools don't answer honestly
Editorial Policy
How we maintain accuracy and independence
No Affiliate Bias
We are not affiliated with OpenAI, Anthropic, Google, Mistral, or xAI. Pricing data is sourced from official provider documentation and updated manually. We do not receive compensation for featuring any provider.
Methodology Transparency
Every counting method is disclosed in-tool (see the '~ approximation' label) and on this page. Where we approximate, we say so and quantify the accuracy margin. Where we use official APIs, we say so.
Pricing Accuracy
AI model pricing changes frequently. Our pricing table reflects publicly available rates as of the last update. For billing-critical decisions, always verify current pricing directly on the provider's pricing page.
Technical stack: Token counting for OpenAI models uses the open-source tiktoken npm package (a port of OpenAI's MIT-licensed tiktoken library). Google Gemini counts use the @google/generative-ai SDK's countTokens() method. Approximations for Claude, Mistral, and Grok use tiktoken with the cl100k_base encoding as the closest publicly available proxy tokenizer. All calculations run server-side in Next.js API routes. No user text is persisted.


