AI Engineering
14 mins read

Is Your AI Habit Sustainable? A Week in the Life of a Conscious Prompt Engineer

I spent a week logging every single AI interaction — model, token count, task type — and then ran the numbers through a carbon calculator. The result surprised me. 718,000 tokens. $18.40. Roughly 685 grams of CO₂. That last number is what made me sit down and rethink how I prompt.

#AI carbon footprint#token usage#sustainable AI#prompt engineering#LLM cost optimization#AI energy consumption#conscious AI use

The moment I started treating my AI usage like a utility bill — something to meter, to understand, to optimize — everything changed. Not because I became less productive. Because I became more deliberate.

It started with a conversation I had with a colleague about data center energy in early 2025. She mentioned, almost in passing, that GPT-4 uses roughly 10 watt-hours per query. I didn't believe her. I ran maybe 80–100 AI queries a day at that point — code reviews, email drafts, documentation, research summaries. If she was right, my AI habit was consuming about a kilowatt-hour of electricity daily. That's roughly what a laptop uses running all day. That's before my team's usage, before the caching, before the reasoning chains I sometimes kick off and forget about.

I decided to track it for a week. Properly. Every prompt, every model, every token count. What I found was useful enough that I wrote this — not as a confession, not as a polemic, but as a practical account of what conscious AI use actually looks like when you measure it.

---

Why I Started Tracking My AI Usage Like a Utility Bill

Most developers I know have no idea how many tokens they consume in a week. They know their monthly API bill, maybe. They know when they hit a rate limit. But the token volume — the actual scale of text in and out — is invisible to them, the same way most people couldn't tell you how many kWh their home used last month without checking.

This invisibility matters because tokens are the unit of cost *and* the unit of energy in AI inference. Every token processed requires computation, and every computation requires electricity. The connection between "write me a product description" and a running turbine somewhere is real, not metaphorical. It's just distant and abstracted behind a clean API response.

The reason I use a token calculator as my first step is that it makes the invisible visible before any cost or carbon math runs. You paste your prompt, see the token count, and suddenly "this feels like a short prompt" becomes "this is 340 input tokens with an estimated 800 token completion." That number is concrete. It's auditable. It's the foundation for everything else.

Token Calculator — Count Tokens Before You Send
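
If you'd rather script that check than paste into a web tool, here is a minimal sketch using OpenAI's open-source tiktoken library. The counts are exact for OpenAI models and only a rough approximation for Claude and other providers, which use different tokenizers.

```python
import tiktoken

def count_tokens(text: str, model: str = "gpt-4o") -> int:
    """Return how many tokens `text` would consume as input to `model`."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        # Unknown model name: fall back to a common base encoding.
        encoding = tiktoken.get_encoding("cl100k_base")
    return len(encoding.encode(text))

prompt = "Rewrite this function to handle null inputs without changing the return type."
print(count_tokens(prompt))  # a concrete number instead of "this feels short"
```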

Once you have token counts that you actually trust, the carbon and cost math becomes straightforward. Without them, you are estimating on top of guesses.

---

How Many Tokens Does a Typical AI User Consume Weekly?

Before the case study, it's worth anchoring to baselines — because most people dramatically underestimate their token consumption.

A casual AI user — someone who chats with Claude or ChatGPT a few times a day for quick questions, email polish, or idea generation — sits in the 50,000 to 200,000 token per week range. That sounds large until you realize that a typical back-and-forth conversation with context included might run 2,000–4,000 tokens per exchange. Ten conversations a day gets you to 35,000 tokens before you've done anything heavy.

A working developer using AI for code review, documentation, test generation, and debugging operates in a different tier entirely: 500,000 to 2,000,000 tokens per week is common. Once you include context windows (sending 8,000 tokens of code file context with every review request), the numbers compound fast.

An AI power user running agents, multi-step pipelines, or automated workflows hits 10,000,000 tokens per week or more without blinking. Automated agents that loop on tasks can burn through tokens faster than any human reviewer could sanity-check.

| User Type | Weekly Token Range | Typical Use Case |
|---|---|---|
| Casual | 50K – 200K | Chat, quick questions, email drafts |
| Active knowledge worker | 200K – 800K | Writing, research, summarization |
| Developer | 500K – 2M | Code review, debugging, documentation |
| AI-augmented developer | 2M – 10M | Agents, code generation, pipelines |
| Enterprise / automated workflows | 10M – 1B | Multi-agent systems, batch processing |

I came into this exercise thinking I was an "active knowledge worker." I was wrong about that. The log told a different story.

---

The Week-Long Case Study: Day-by-Day Token Log

I'm a full-stack developer who uses AI daily for code, writing, and system design. I also run a tools product, which means I'm occasionally testing prompts as product development. Here's what one representative week looked like — all tokens counted through API response logs, not estimated.

Monday — Email Triage and Weekly Planning

I started the week using Claude Sonnet to work through a backlog of technical emails that needed thoughtful replies. I sent context-heavy messages — each one included thread history and relevant background — and asked for draft replies I could edit. 12 drafts, average 3,200 tokens each.

I also ran a planning session: I pasted last week's task log and asked for a prioritization framework. That conversation ran long — the context window carried 6,000 tokens by the end.

*Monday total: 52,400 tokens. Model: Claude Sonnet. Estimated cost: $1.20*

Tuesday — Feature Documentation and API Design

Documentation day. I was writing specs for a new feature and used GPT-4o to help draft the technical reference sections. This is where token consumption accelerates: documentation prompts include large code blocks as context, and the completions are themselves long and structured.

I also had three separate code review sessions — pasted functions, asked for analysis, iterated on suggestions. Each review cycle averaged 4,800 tokens.

*Tuesday total: 143,000 tokens. Model: GPT-4o. Estimated cost: $4.30*

Wednesday — Research and Competitive Analysis

I needed to understand how three competing tools handled a specific UX problem. I ran a structured research workflow — ask for analysis, get a response, ask follow-up questions with the previous response included as context. By the fifth exchange in each thread, the context alone was 9,000+ tokens per message.

This is where I first noticed the snowball effect: each conversation gets more expensive as it lengthens, not because your new prompt is longer, but because the included context grows with every turn.

*Wednesday total: 189,000 tokens. Model mix: GPT-4o + Claude Sonnet. Estimated cost: $5.10*

Thursday — Code Generation Sprint

I was building a new UI component and used AI heavily for the boilerplate. Scaffold the component, add the props interface, write the unit tests, write the storybook story. Four tasks, but each task included the growing codebase as context.

I also accidentally triggered a reasoning chain I hadn't needed — I asked a question that GPT-4o decided required step-by-step thinking, and the completion ran to 2,400 tokens for an answer I could have gotten in 200. I noticed this only when I checked the usage dashboard that evening.

*Thursday total: 218,000 tokens. Model: GPT-4o. Estimated cost: $6.50*

Friday — Lighter Day: Writing and Ideation

I used Claude Haiku for most of Friday — blog outline generation, headline variants, rephrasing paragraphs. Haiku is fast and cheap for this kind of task. My cost and energy dropped sharply relative to the midweek days, partly because the workload was lighter, but mostly because I matched the model to the task.

*Friday total: 87,000 tokens. Model: Claude Haiku (primary). Estimated cost: $0.52*

Weekend — Minimal, Personal Use

Some personal queries, a recipe conversion, a few casual questions. Light consumption.

*Weekend total: 28,600 tokens. Model: Claude Sonnet. Estimated cost: $0.78*

---

Week Total: 718,000 tokens. Total cost: $18.40.

The Tuesday–Thursday block was the surprise. I knew those were heavy AI days, but I'd have estimated 400,000 tokens max, not 550,000. The context-snowball effect — conversations getting heavier as they lengthen — is real and invisible if you're not actively monitoring.

---

What Is the Carbon Footprint of My AI Prompts?

This is the question I actually wanted to answer. Token counts and dollar costs are useful — but carbon emissions are what made this feel real to me in a different way.

The carbon calculation chains three variables together:

Tokens × Energy per Token × Carbon Intensity of the grid = gCO₂

The energy-per-token figure varies by model. A large model like GPT-4o or Claude Opus uses approximately 0.0035–0.005 kWh per 1,000 tokens (3.5–5 Wh). A medium model like Claude Sonnet or GPT-3.5 runs around 0.0012 kWh/1K tokens. A small model like Claude Haiku, Mistral 7B, or LLaMA 3 8B comes in around 0.0005 kWh/1K tokens. These figures come from published academic research on transformer inference efficiency — they're estimates with real uncertainty ranges, not precise measurements.

The grid carbon intensity is the other major variable. India's grid runs at approximately 700 gCO₂/kWh. The US average sits around 400 gCO₂/kWh. Western Europe averages 200–300 gCO₂/kWh. Cloud data centers — optimized infrastructure scenarios — typically run at 250 gCO₂/kWh or lower. Where the inference runs matters as much as how much inference runs.
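
As a minimal sketch of that chain, here is the calculation in Python, using the rough per-1,000-token figures and grid intensities quoted above (the article's estimates, not measured values). It reproduces the numbers in the table below.

```python
# Tokens -> energy -> CO2, using the approximate figures from this article.
ENERGY_KWH_PER_1K_TOKENS = {
    "large": 0.005,    # GPT-4o / Claude Opus class
    "medium": 0.0012,  # Claude Sonnet / GPT-3.5 class
    "small": 0.0005,   # Claude Haiku / 7B-8B open models
}
GRID_G_CO2_PER_KWH = {
    "india": 700,
    "us_average": 400,
    "western_europe": 250,
    "cloud_data_center": 250,
}

def co2_grams(tokens: int, tier: str, grid: str) -> float:
    energy_kwh = tokens / 1000 * ENERGY_KWH_PER_1K_TOKENS[tier]
    return energy_kwh * GRID_G_CO2_PER_KWH[grid]

# Tuesday from the log: 143,000 tokens on GPT-4o via cloud infrastructure.
print(round(co2_grams(143_000, "large", "cloud_data_center"), 1))  # ~178.8 g, matching the table
```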

For my week, running the numbers through the AI carbon calculator with a cloud data center scenario (250 gCO₂/kWh — the most realistic assumption for API calls to major providers):

| Day | Tokens | Primary Model | Energy (kWh) | CO₂ (grams) |
|---|---|---|---|---|
| Monday | 52,400 | Claude Sonnet | 0.063 | 15.7 |
| Tuesday | 143,000 | GPT-4o | 0.715 | 178.8 |
| Wednesday | 189,000 | GPT-4o / Sonnet | 0.794 | 198.5 |
| Thursday | 218,000 | GPT-4o | 1.090 | 272.5 |
| Friday | 87,000 | Claude Haiku | 0.044 | 10.9 |
| Weekend | 28,600 | Claude Sonnet | 0.034 | 8.6 |
| Total | 718,000 | — | 2.74 | 685 |

685 grams of CO₂ for one week of personal AI use. In relatable terms: roughly equivalent to driving 4.3 km in an average petrol car, or running a small 6 W LED bulb continuously for about 450 hours.

That sounds manageable — until you scale it. If every developer on a 20-person engineering team is using AI at similar intensity, that's 13.7 kg CO₂ per week from AI inference alone, before touching infrastructure, CI/CD, or anything else. Multiply by 52 weeks and you're at 712 kg CO₂ annually from a 20-person team's AI habit. That's getting into territory that belongs on a sustainability report.

Calculate Your AI Carbon Footprint by Model and Grid

The carbon calculator lets you enter token counts alongside your model choice, country, and whether you're running through cloud infrastructure. It outputs the CO₂ in grams alongside relatable equivalents — Google searches, emails, km driven — which makes the abstract number land differently than raw grams do.

---

Is AI More Polluting Than Web Searches?

Yes, by a significant margin — and the gap is larger than most people expect.

A single Google search consumes approximately 0.0003 kWh of electricity. A query to a large language model like GPT-4 consumes roughly 0.001–0.003 kWh depending on the length and complexity of the response. That's 3x to 10x more energy per query, at the individual level.

But the more relevant comparison is at scale. Google processes approximately 8.5 billion searches per day. OpenAI reportedly processes over 100 million queries per day across its products. The absolute energy differential per query multiplied by the volume difference means that AI inference is not yet close to search in total global energy footprint — but it's growing at a rate that search never did. The International Energy Agency's 2025 projections flagged AI data center growth as one of the fastest-growing electricity demand categories in the developed world.

The honest framing: one AI query uses more electricity than one web search by a factor of roughly 3x to 10x. For a light user who replaces ten daily searches with two AI queries, the net energy impact might actually be similar. For a developer running dozens of AI queries daily that go far beyond what search could accomplish, the comparison breaks down — the use cases aren't comparable.

---

Why Do Some Prompts Use 50x More Energy Than Others?

This was the most practically useful thing I learned during the week.

The energy cost of an AI query is almost entirely determined by the number of tokens in the completion — the output — not the input. Input tokens are processed in parallel and are relatively cheap. Output tokens are generated sequentially, one at a time, each requiring a full forward pass through the model. Longer outputs cost more energy proportionally.

Reasoning models make this worse in a specific way. When you send a query to a model configured to "think step by step" — or when you're using a model like o1, o3, or Claude's extended thinking mode — the model generates reasoning tokens internally before producing the final answer. These tokens are often invisible to you in the final response, but they're processed and therefore they cost energy. A reasoning chain might generate 600 internal tokens to produce a 150-token answer. You see a 150-token answer; the model processed 750 tokens, and most providers bill for all of them as output.

The 50x energy difference between a simple query and a heavy reasoning query comes from stacking multiple factors:

Model size — A 70B parameter model uses roughly 50x more energy per token than a 7B parameter model. The relationship isn't linear with parameter count, but it's substantial.

Output length — A 2,000-token completion costs 50x more than a 40-token completion. Asking for exhaustive analysis when you need a decision doubles your token bill on top of everything else.

Reasoning chains — Extended thinking modes can multiply token consumption 3x to 5x per query without a proportional improvement in output quality for most tasks.

Context size — Sending a 16,000-token context window with every message in a multi-turn conversation means each subsequent message processes all previous context again. A 10-message conversation with growing context can cost 10x more than the same information exchanged more efficiently.
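
Here is a quick illustrative sketch of that last effect. The per-exchange figure is made up; the shape of the growth is the point.

```python
def cumulative_input_tokens(turns: int, tokens_per_exchange: int = 1500) -> int:
    """Total input tokens processed over a conversation, assuming each turn adds
    ~tokens_per_exchange tokens and the full history is re-sent every time."""
    total, history = 0, 0
    for _ in range(turns):
        total += history + tokens_per_exchange  # prior context re-processed each turn
        history += tokens_per_exchange
    return total

print(cumulative_input_tokens(3))   # 9,000 tokens processed
print(cumulative_input_tokens(10))  # 82,500: about 9x the cost for about 3x the turns
```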

---

How Do I Track Token Usage Daily Without Losing My Mind?

The honest answer is: you don't track it manually. You build one lightweight habit and let the tooling do the rest.

For API users, every response from OpenAI, Anthropic, Google, and most other providers includes a `usage` object in the JSON response. It looks something like this:

```json
{
  "usage": {
    "prompt_tokens": 1847,
    "completion_tokens": 412,
    "total_tokens": 2259
  }
}
```

If you're building with the API directly, logging this field to a database or even a local CSV file takes about five lines of code. Over a week you accumulate a complete usage log with timestamps, model names, and token counts. That's the raw material for everything else — cost calculations, carbon estimates, trend spotting.
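
As a sketch of what those few lines might look like with the OpenAI Python SDK — the CSV path and the helper function are mine, not part of the SDK, and the same pattern works with Anthropic's client:

```python
import csv
from datetime import datetime, timezone

from openai import OpenAI

client = OpenAI()            # reads OPENAI_API_KEY from the environment
LOG_PATH = "token_log.csv"   # hypothetical local log file

def chat_and_log(model: str, messages: list[dict]) -> str:
    """Send a chat request and append its usage numbers to a CSV log."""
    response = client.chat.completions.create(model=model, messages=messages)
    usage = response.usage
    with open(LOG_PATH, "a", newline="") as f:
        csv.writer(f).writerow([
            datetime.now(timezone.utc).isoformat(),
            model,
            usage.prompt_tokens,
            usage.completion_tokens,
            usage.total_tokens,
        ])
    return response.choices[0].message.content

reply = chat_and_log("gpt-4o-mini", [{"role": "user", "content": "Summarize RFC 2119 in one line."}])
```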

For non-API users — people using ChatGPT, Claude.ai, or similar web products — the OpenAI dashboard shows token usage broken down by model and date. Anthropic's console does the same for API keys. These dashboards are sufficient for weekly reviews even if they're too coarse for real-time tracking.

For a quick sanity check before sending a long prompt — especially one with large code blocks or document context — pasting the prompt into a token calculator first takes ten seconds and can save meaningful cost if the count comes back higher than expected.

Count Tokens Before You Send — Free Token Calculator

---

How to Reduce AI Token Consumption Without Reducing Output Quality

This is the part of the week that changed my actual behavior, because I tested changes and measured the results. Here are the five that moved the needle most.

Match the model to the task. This is the single highest-leverage change available. Using Claude Haiku or GPT-3.5-level models for tasks that don't require deep reasoning — formatting, rephrasing, short summaries, simple classification — reduces per-token energy consumption by 70–85% compared to GPT-4o or Claude Opus. Friday's numbers in my case study illustrate this: a higher token count than Monday, yet less than half the estimated cost and lower energy, because Haiku is so much leaner per token.

Write shorter, more specific prompts. Vague prompts produce longer, more exploratory completions. A prompt that says "help me with this function" generates a response that covers explanation, alternatives, edge cases, and suggestions. A prompt that says "rewrite this function to handle null inputs without changing the return type" generates a focused 80-token response. The specificity shift cuts completion length by 40–60% for coding tasks.

Compress context before including it. Rather than pasting a full 400-line file as context, paste only the relevant function and a three-sentence description of the surrounding system. You lose nothing the model needs and cut your input token count by 60–80%.

Start new conversations instead of extending old ones. Once a conversation exceeds eight to ten turns, the context overhead often exceeds the value of the conversation history. Start a new thread with a compressed summary of the relevant context rather than continuing to pay the tax on an ever-growing window.

Avoid reasoning chains for tasks that don't require them. Extended thinking modes and chain-of-thought prompting are valuable when you genuinely need complex multi-step reasoning. They're expensive overhead when you need a quick answer, a reformatting task, or a well-defined code change. Reserve them deliberately.

| Optimization | Estimated Reduction | Best Applied To |
|---|---|---|
| Downsize model tier | 70–85% energy per token | Formatting, classification, simple edits |
| Shorter, specific prompts | 40–60% on completions | Code tasks, structured outputs |
| Compressed context | 60–80% on inputs | Code review, document analysis |
| Fresh conversation threads | 20–40% per late-session message | Research sessions, long projects |
| Skip unnecessary reasoning chains | 3–5x per triggered query | Simple lookups, factual questions |

---

Will AI Usage Double My Carbon Footprint?

For most individuals using AI at the casual-to-active level, no — not yet. Personal AI inference is still a small fraction of total energy use compared to home heating, transportation, and food. A week of 700,000 tokens at 685g CO₂ is real but not dominant.

For enterprises, the picture is different. A company running AI agents, automated customer service, code generation pipelines, and internal assistants at scale is operating in the billions of tokens per month range. The IEA's 2025 data center projections suggested AI could account for 3–4% of global electricity demand by 2028, up from under 0.5% in 2023. At those scales, "will it double my footprint" is the wrong frame — the question becomes whether the organization is measuring it, reporting it, and actively managing it the same way it manages other infrastructure costs.

The useful personal threshold I've started using: if your weekly token consumption exceeds 1,000,000 tokens on large models consistently, the carbon contribution is meaningful enough to belong in a personal sustainability accounting. Below that, the highest-leverage thing you can do is use the right model size for the task — which also happens to be the thing that saves the most money.

---

What I Changed — And What Actually Worked

After the week of tracking, I made three changes that I've maintained for the two months since.

I built a default model hierarchy. My first instinct for any task now goes to a small model. I escalate to a large model only when the small model's output is genuinely insufficient for two iterations. This alone cut my weekly token cost by about 35%.
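
The hierarchy is simple enough to encode as a small wrapper. This is a sketch: the model names are placeholders, the quality check is deliberately naive, and the point is the escalate-only-after-two-tries rule.

```python
def run_with_escalation(prompt, run_model, is_good_enough, max_small_attempts=2):
    """Try the small tier first; escalate to the large tier only after
    max_small_attempts insufficient answers."""
    for _ in range(max_small_attempts):
        answer = run_model("small-tier-model", prompt)   # placeholder model name
        if is_good_enough(answer):
            return answer
    # Two insufficient small-model attempts: pay for the large model deliberately.
    return run_model("large-tier-model", prompt)         # placeholder model name

# Example: escalate only when the small model's draft comes back too thin.
# answer = run_with_escalation(prompt, run_model=my_api_call,
#                              is_good_enough=lambda a: len(a) > 200)
```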

I started writing "tight" context blocks. Before including code or document context, I spend 60 seconds compressing it to the essential parts. This became a habit faster than I expected — the act of compressing forces me to think through what the model actually needs, which often clarifies what I should be asking.

I do a weekly carbon review. Ten minutes, Sunday evening. I pull the token counts from API logs, run them through the carbon calculator with the actual model split, and look at the number. The review itself doesn't reduce emissions — but watching the number trend down week over week is the feedback loop that keeps the habits alive.
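
The review itself is a few lines on top of the usage log. A sketch, reusing the hypothetical token_log.csv from the logging snippet earlier and the same rough per-tier energy estimates used throughout this piece:

```python
import csv
from collections import defaultdict

ENERGY_KWH_PER_1K_TOKENS = {"large": 0.005, "medium": 0.0012, "small": 0.0005}
MODEL_TIER = {"gpt-4o": "large", "gpt-4o-mini": "small"}  # extend for the models you use
CLOUD_DC_G_CO2_PER_KWH = 250

tokens_by_tier = defaultdict(int)
with open("token_log.csv") as f:                      # written by the earlier logging snippet
    for timestamp, model, prompt_t, completion_t, total_t in csv.reader(f):
        tier = MODEL_TIER.get(model, "large")         # unknown models: assume the worst case
        tokens_by_tier[tier] += int(total_t)

week_tokens = sum(tokens_by_tier.values())
week_co2_g = sum(
    t / 1000 * ENERGY_KWH_PER_1K_TOKENS[tier] * CLOUD_DC_G_CO2_PER_KWH
    for tier, t in tokens_by_tier.items()
)
print(f"{week_tokens:,} tokens this week, roughly {week_co2_g:.0f} g CO2")
```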

The tracked week: 718,000 tokens. Six weeks later: 441,000 tokens. Same output quality. Same project velocity. A 38.6% reduction.

The prompts didn't get worse. They got more deliberate.

---

The Tools I Use to Stay Accountable

Two tools sit at the center of this practice, and both are free.

Token Calculator — I use this before sending any prompt that feels large. Paste the prompt, see the count, decide whether to trim. It takes ten seconds and has paid for itself in API costs many times over. For developers building with the API, it's also the fastest way to verify that your context compression is actually working — paste the compressed version and compare the count to the original.

Count Your Tokens Before You Pay For Them

AI Carbon Calculator — This is the tool I use for the weekly review. Enter the token volume, select the model, pick the grid (I use cloud data center mode for API calls). The output gives me grams CO₂, energy in watt-hours, and the real-world comparisons that make the number land. The grid vs. cloud comparison is useful because it shows what the same token volume would cost on a coal-heavy grid versus optimized cloud infrastructure — a useful argument for provider selection when you have flexibility.

Calculate the Carbon Cost of Your AI Usage This Week

The goal isn't to feel guilty about using AI. The goal is to use it the same way a good engineer uses any resource: deliberately, with awareness of the cost, and with a bias toward efficiency that doesn't compromise the work.

Token awareness is the first step. Everything else follows from that.

About the Author


Devansh Gondaliya

Software Engineer | Content Creator

Devansh is a MERN stack developer and AI systems engineer who builds production AI pipelines, token-optimized prompt systems, and developer tools. He has been tracking API token consumption and inference costs across projects since 2023 and writes about the engineering decisions that determine what AI actually costs — in dollars and in carbon.


Frequently Asked Questions


How many tokens does a typical AI user consume per week?

It depends heavily on use case. A casual user (chat, quick questions, email drafts) typically consumes 50,000 to 200,000 tokens per week. An active developer using AI for code review, documentation, and debugging sits in the 500,000 to 2,000,000 token range. Power users running agents or automated pipelines can exceed 10,000,000 tokens per week. You can track your usage via API response logs (every response includes a usage object), provider dashboards like OpenAI's usage console, or a token calculator for prompt-level estimates before sending.

How is the carbon footprint of AI usage calculated?

The calculation chains three values: tokens processed × energy per token × carbon intensity of the electricity grid. For a large model like GPT-4o running on typical cloud infrastructure (250 gCO₂/kWh), approximately 1,000 tokens produces roughly 0.875 to 1.25 grams of CO₂. A week of 700,000 tokens across a mix of large and medium models comes to approximately 600–800 grams CO₂ — comparable to driving a petrol car about 4–5 km. The carbon intensity of the grid matters as much as the model size: the same token volume on a coal-heavy grid (700 gCO₂/kWh) produces nearly 3x the emissions of a renewable-heavy grid.

How do I track my token usage day to day?

The most reliable method for API users is logging the usage field from every API response — it contains prompt_tokens, completion_tokens, and total_tokens for each call. Aggregating this to a database or CSV gives you a complete daily log. For web product users, OpenAI's usage dashboard and Anthropic's console both show token consumption broken down by model and date. For pre-send estimates on large prompts, a token calculator lets you count tokens before the call, which is useful for identifying prompts that are heavier than expected before paying for them.

Why do some prompts use so much more energy than others?

Several factors stack to create large energy differences. Model size is the biggest: a 70B parameter model uses roughly 50x more energy per token than a 7B model. Output length amplifies this — output tokens are generated sequentially (unlike input tokens processed in parallel), so a 2,000-token completion costs 50x more energy than a 40-token one. Reasoning chains compound the problem: extended thinking modes generate hundreds or thousands of internal reasoning tokens before producing the visible answer, multiplying energy use 3x to 5x per query. Growing context windows in long conversations mean each subsequent message re-processes all prior context, making late-session messages disproportionately expensive.

How can I reduce token consumption without reducing output quality?

The highest-leverage change is model tier selection: using small models (Claude Haiku, Mistral 7B) for tasks that don't require deep reasoning cuts energy per token by 70–85% versus large models, with no quality loss on formatting, rephrasing, and simple classification tasks. Beyond model selection: write shorter, more specific prompts (this cuts completion length by 40–60% on most tasks), compress context to only the relevant sections before including it, start fresh conversations rather than extending long threads past 8–10 turns, and avoid triggering reasoning chains for straightforward queries.

Is AI more polluting than web searches?

Per query, yes — by a factor of roughly 3x to 10x. A Google search uses approximately 0.0003 kWh. A large-model AI query uses 0.001 to 0.003 kWh depending on response length and model. For a user who replaces many searches with fewer, longer AI interactions, the net energy difference may be modest. For developers and power users running dozens of complex AI queries daily, the comparison isn't really relevant — the use cases aren't interchangeable. What matters is the absolute trajectory: AI inference energy demand is growing faster than search ever did.

Which tools can calculate my AI carbon footprint?

The AI Carbon Footprint Calculator at UntangleTools lets you enter token volume, select the model (GPT-4, Claude Opus, Claude Haiku, Mistral, Gemini, and others), choose your country grid or a cloud infrastructure scenario, and optionally apply time-of-day grid variability. It outputs gCO₂, energy in watt-hours, a min–max uncertainty range, and real-world comparisons (Google searches, km driven, LED hours). For research contexts, ML CO2 Impact (mlco2.github.io) provides academic-grade estimates based on hardware profiles and training vs. inference breakdowns.

Will AI usage double my carbon footprint?

For most individuals at current usage levels, no. A week of 700,000 tokens at roughly 685 grams CO₂ is real but small relative to transportation, heating, and food — which each account for hundreds of kilograms annually. The practical threshold where AI starts to matter in a personal sustainability accounting is roughly 1,000,000 tokens per week consistently on large models. For enterprises running billions of tokens per month, AI inference is already a material part of the emissions profile and belongs in sustainability reporting. The most impactful personal action remains model tier selection: choosing the right-sized model for each task.

