AI Cost Save

How to Reduce LLM Token Usage

Reduce token usage with patterns that cut context, shorten answers, and prevent runaway prompts.

The problem

Token usage grows when your app sends more context than needed or when agent loops keep appending new text.

The hidden places tokens multiply

  • Re-sending full chat history every turn
  • Tool results that are too verbose
  • “Self-reflection” steps that repeat similar work
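The first item above, re-sending full history, is usually the cheapest to fix. A minimal sketch, assuming a generic `{"role": ..., "content": ...}` message list (not any specific provider's API): keep the system prompt and only the last few turns.

```python
# Sketch: keep only the system prompt plus the last N turns instead of
# re-sending the full chat history every request. The message shape is
# an assumption, not a specific provider's schema.

def trim_history(messages, keep_last_turns=4):
    """Return system messages (if any) plus the last N non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    return system + rest[-keep_last_turns:]

history = (
    [{"role": "system", "content": "You are a helpful assistant."}]
    + [{"role": "user", "content": f"question {i}"} for i in range(10)]
)

trimmed = trim_history(history, keep_last_turns=4)
print(len(trimmed))  # 5: the system message plus the last 4 turns
```

For longer conversations, the dropped turns can be replaced with a one-paragraph running summary instead of being discarded outright.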

Cost breakdown: how tokens become spend

Tokens are subword units, not words: a typical English word is roughly one to two tokens, and every prompt, tool call, and retry adds billed input and output tokens.
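A common rule of thumb for English text with GPT-style tokenizers is about four characters per token. The heuristic below is only a rough estimate for back-of-envelope budgeting; use your provider's tokenizer for exact counts.

```python
# Rough token estimate: ~4 characters per token is a common rule of
# thumb for English text. This is a heuristic, not a real tokenizer;
# exact counts require the provider's tokenizer library.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

prompt = "Summarize the following support ticket in two sentences."
print(estimate_tokens(prompt))  # rough estimate, not a billed count
```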

A real example scenario

A content generator includes 20 past paragraphs each time it drafts a new post. Switching to summaries and chunk selection reduces input tokens while keeping quality.
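Chunk selection for a scenario like this can be sketched with a naive word-overlap score; the example data and scoring are illustrative, and real systems typically rank chunks with embeddings instead.

```python
import re

# Sketch: instead of sending all past paragraphs, score each chunk
# against the new draft's topic and send only the top few. Word-overlap
# scoring is a stand-in for embedding similarity.

def select_chunks(chunks, query, top_k=3):
    q_words = set(re.findall(r"\w+", query.lower()))

    def score(chunk):
        return len(q_words & set(re.findall(r"\w+", chunk.lower())))

    return sorted(chunks, key=score, reverse=True)[:top_k]

past = [
    "Pricing tiers and billing cycles for the pro plan.",
    "Kubernetes deployment notes for the staging cluster.",
    "Billing FAQ: refunds, invoices, and proration.",
    "Office snack preferences survey results.",
]

context = select_chunks(past, "draft a post about billing and invoices", top_k=2)
```

Only the two billing-related chunks reach the prompt; the unrelated paragraphs never consume input tokens.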

Optimization ideas (from fastest to safest)

  • Compress context: summaries, retrieval, and selective quoting.
  • Control outputs: set max output tokens and use stop sequences.
  • Add guardrails for agents: cap depth, retries, and tool calls.
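The last item, guardrails, is worth enforcing in code rather than in prompts. A minimal sketch of a run budget that caps loop depth and tool calls; the class name and limits are illustrative:

```python
# Sketch of an agent guardrail: a budget object that caps loop steps
# and tool calls so a runaway loop fails fast instead of billing
# tokens indefinitely. Names and default limits are placeholders.

class RunBudget:
    def __init__(self, max_steps=8, max_tool_calls=5):
        self.max_steps = max_steps
        self.max_tool_calls = max_tool_calls
        self.steps = 0
        self.tool_calls = 0

    def charge_step(self):
        self.steps += 1
        if self.steps > self.max_steps:
            raise RuntimeError("step budget exhausted")

    def charge_tool_call(self):
        self.tool_calls += 1
        if self.tool_calls > self.max_tool_calls:
            raise RuntimeError("tool-call budget exhausted")

budget = RunBudget(max_steps=3)
for _ in range(3):
    budget.charge_step()  # a fourth charge_step() would raise
```

Output caps and stop sequences, the second bullet, are set directly as request parameters (`max_tokens` or the provider's equivalent), so they need no extra code.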

Quick checklist

  • Send less context per turn
  • Shorten tool outputs
  • Enforce limits in code and budgets in monitoring
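The last checklist item can be sketched as a per-run spend tracker; the prices and the limit below are placeholders, not any provider's real rates.

```python
# Sketch: accumulate billed tokens into a dollar figure and flag when a
# budget threshold is crossed. Prices and the limit are placeholders;
# substitute your provider's actual per-1K-token rates.

class TokenBudget:
    def __init__(self, limit_usd=100.0,
                 input_price_per_1k=0.0005, output_price_per_1k=0.0015):
        self.limit = limit_usd
        self.in_price = input_price_per_1k
        self.out_price = output_price_per_1k
        self.spend = 0.0

    def record(self, input_tokens, output_tokens):
        """Add a request's cost; return False once the budget is exceeded."""
        self.spend += (input_tokens / 1000) * self.in_price
        self.spend += (output_tokens / 1000) * self.out_price
        return self.spend <= self.limit

budget = TokenBudget(limit_usd=0.01)
ok_first = budget.record(input_tokens=8000, output_tokens=2000)   # under budget
ok_second = budget.record(input_tokens=8000, output_tokens=2000)  # over budget
```

Wiring the `False` return into an alert or a request throttle turns the monitoring budget into an actual limit.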