Claude Token Counter

Count tokens for Claude models and estimate API cost efficiently.

Claude 3.5 Sonnet, Opus & Haiku — Deep Dive

Count tokens for Anthropic's Claude models accurately. Understand how Claude's 200K context window works in practice, why XML-structured prompts outperform plain text, and where token waste silently accumulates in production Claude applications.

Max context: 200K tokens
Claude models covered: 8+
Cache savings: up to 90%
Count method: ≈ estimated

What Sets Claude Apart

Claude's Long Context Advantage Explained

200,000 tokens — and why it changes what's possible with AI

What 200K Tokens Actually Means

Practical context sizes

Full novel (e.g., War and Peace): ~180,000 tokens
Entire codebase (medium app): ~80,000–150,000 tokens
100-page PDF document: ~50,000–70,000 tokens
50 customer support tickets: ~25,000–40,000 tokens
Full legal contract: ~10,000–30,000 tokens
1-hour meeting transcript: ~8,000–12,000 tokens
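The figures above can be sanity-checked with a rough heuristic. Claude's tokenizer is not published, so the character-per-token ratio below (~3.8 for English prose) is an assumption, not Claude's real tokenizer; for exact counts, use Anthropic's token-counting API.

```python
def estimate_claude_tokens(text: str, chars_per_token: float = 3.8) -> int:
    """Rough token estimate for English text.

    Claude's tokenizer is not public; ~3.5-4 characters per token is a
    common heuristic for English prose. For exact counts, use the
    Anthropic API's token-counting endpoint instead.
    """
    return max(1, round(len(text) / chars_per_token))


# A ~100-page PDF extracted to ~250,000 characters lands in the
# 50,000-70,000 token range quoted above.
estimate_claude_tokens("x" * 250_000)
```

Heuristics like this drift badly on code, non-English text, and dense markup, so treat them as order-of-magnitude estimates only.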

The "Lost in the Middle" Problem

Research shows all LLMs — including Claude — have reduced attention to information buried in the middle of very long contexts. Information at the beginning (primacy effect) and end (recency effect) is recalled better than content in the middle 40–60% of the context.

Place the most critical instructions at the very beginning of your system prompt. (Impact: High)

Repeat key constraints at the end of the user message for long-context calls. (Impact: High)

For document Q&A, place the question both before and after the document. (Impact: Medium)

Use XML section markers to help Claude navigate structure, e.g. <section id="key_facts">. (Impact: Medium)
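The question-sandwich tip can be sketched as a small prompt builder. The helper name and exact wording are illustrative, not an Anthropic API; the point is the layout: question first, document in XML tags, question repeated last.

```python
def build_doc_qa_prompt(document: str, question: str) -> str:
    """Sandwich the question around the document.

    Long-context recall is strongest at the start and end of the prompt,
    so the question appears in both positions, and XML tags give Claude
    explicit section boundaries to navigate.
    """
    return (
        f"<question>{question}</question>\n"
        f"<document>\n{document}\n</document>\n"
        f"Answer the question above using only the document: {question}"
    )


prompt = build_doc_qa_prompt("...contract text...", "Who are the parties?")
```

The ~10 extra tokens from repeating the question are negligible next to a long document, and the repetition targets exactly the primacy/recency sweet spots described above.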
Claude-Specific Technique

XML Prompts Perform Better on Claude

Anthropic's own prompt-engineering documentation recommends XML tags to improve Claude's accuracy and instruction-following

Why XML Works for Claude

Claude's training data included large amounts of XML/HTML-structured content, and Anthropic fine-tuned Claude to treat XML-delimited sections as distinct semantic units. Clear delimiters help the model keep instructions, data, and examples separate, improving instruction-following, especially in complex multi-part prompts.

Anthropic's Recommended Pattern

<system>
  <role>Senior data analyst</role>
  <output_format>JSON only</output_format>
  <constraints>
    No hallucination. If unknown, say null.
  </constraints>
</system>

<document>
  {{DOCUMENT_CONTENT}}
</document>

<task>
  Extract all named entities.
</task>

Token Cost of XML vs Plain Text

Plain Text Prompt (~38 tokens)

You are a senior data analyst. Output only JSON. Never hallucinate — use null if unknown. Extract named entities from the document below.

XML Prompt (~46 tokens, +8 tokens)

<role>Senior data analyst</role> <output>JSON only</output> <rule>null if unknown</rule> <task>Extract named entities</task>

The Trade-off

XML adds ~8–20 tokens per prompt (~3–5% overhead). However, improved instruction-following reduces retry rates by an estimated 15–30% in production. One avoided retry saves far more than 20 tokens. XML structure is a net cost saving at scale.
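That claim reduces to simple expected-cost arithmetic. The numbers below are assumptions for illustration (a 2,000-token production prompt, a 20% baseline retry rate, and the article's estimated 30% retry reduction from XML structure):

```python
def expected_tokens(prompt_tokens: int, retry_rate: float) -> float:
    """Expected prompt tokens per successful call, assuming at most one retry."""
    return prompt_tokens * (1 + retry_rate)


plain = expected_tokens(2_000, 0.20)            # 2,400 expected tokens
xml = expected_tokens(2_000 + 16, 0.20 * 0.7)   # +16 tokens of tags, fewer retries
assert xml < plain
```

Note the break-even depends on prompt size: on a tiny 40-token prompt the fixed XML overhead can outweigh the retry savings, which is why the "net cost saving" claim holds at production scale, not for trivial prompts.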

Advanced XML Patterns for Claude

Few-shot examples

<examples>
  <example>
    <input>...</input>
    <output>...</output>
  </example>
</examples>

Chain-of-thought control

<thinking>
  Analyze step by step.
</thinking>
<answer>
  Final answer only.
</answer>

Document injection

<documents>
  <doc id="1" source="contract.pdf">
    {{CONTENT}}
  </doc>
</documents>
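The document-injection pattern above is easy to generate programmatically. This is a sketch with a hypothetical helper name; it assigns `id`s in order and XML-escapes content so stray angle brackets inside a document cannot break the tag structure.

```python
from xml.sax.saxutils import escape


def build_documents_block(docs: list[tuple[str, str]]) -> str:
    """Wrap (source, content) pairs in the <documents> pattern."""
    parts = ["<documents>"]
    for i, (source, content) in enumerate(docs, start=1):
        parts.append(f'  <doc id="{i}" source="{escape(source)}">')
        parts.append(f"    {escape(content)}")
        parts.append("  </doc>")
    parts.append("</documents>")
    return "\n".join(parts)
```

Escaping matters in practice: a scraped document containing `</documents>` would otherwise terminate your wrapper early and confuse the model about where the data ends.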
Efficiency Audit

Where Claude Wastes Your Tokens

Common token wastage patterns in production Claude applications and how to fix them

Verbose Reasoning by Default

High Impact

Claude often prepends its reasoning with phrases like 'Certainly! I'd be happy to help...' or 'Great question! Let me think through this step by step...' before reaching the actual answer. These preambles can add 20–50 tokens per response. Fix: add 'Skip pleasantries. Begin your response immediately with the answer.' to your system prompt.

Redundant Acknowledgments

Medium Impact

Claude restates the task before completing it ('You've asked me to summarize the following document...'). This costs 15–40 tokens per call. In an instruction-following context, Claude doesn't need to repeat what you asked. Add: 'Do not restate the task. Respond directly.'

Over-qualification of Uncertain Answers

Medium Impact

Claude adds extensive caveats and disclaimers — good for accuracy but costly in tokens. 'Please note that I may be mistaken and you should verify this information with a qualified professional...' can add 30–60 tokens. For internal tooling where users understand limitations, instruct Claude to be concise about uncertainty: 'When uncertain, write (verify) after the claim instead of a full caveat.'
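The three fixes above can be consolidated into one reusable system-prompt suffix. The exact wording is illustrative; the directives themselves are the ones quoted in this section.

```python
CONCISENESS_DIRECTIVES = (
    "Skip pleasantries. Begin your response immediately with the answer. "
    "Do not restate the task. Respond directly. "
    "When uncertain, write (verify) after the claim instead of a full caveat."
)


def with_conciseness(system_prompt: str) -> str:
    """Append the anti-verbosity directives to any system prompt."""
    return f"{system_prompt}\n\n{CONCISENESS_DIRECTIVES}"
```

At an assumed 50-100 tokens saved per response, these directives pay for their own ~40-token cost within the first call.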

Extended Thinking in Claude 3.7+

Budget Buster

Claude 3.7 Sonnet introduces extended thinking — the model can reason for thousands of tokens before answering. This is billed as output tokens ($15/1M). For simple tasks, thinking can cost 10–50× the answer itself. Disable extended thinking for classification, extraction, and short-answer tasks where it provides no benefit.

Claude vs GPT-4o — Token Efficiency by Use Case

Use Case | Claude Efficiency | GPT-4o Efficiency | Reason
Long document analysis | ✅ Better | ⚠️ Good | 200K context, stronger attention
Structured JSON output | ⚠️ Good | ✅ Better | GPT-4o's JSON mode is more consistent
Multi-step reasoning | ✅ Better | ✅ Equal (o1) | Extended thinking vs o1 series
Short Q&A / classification | ⚠️ Verbose | ✅ Better | Claude adds unnecessary preambles
Code generation | ✅ Better | ✅ Equal | Claude produces cleaner code comments
Safety-critical applications | ✅ Better | ⚠️ Good | Constitutional AI training
