Prompt Compression Techniques
Learn how to cut your prompt length by 40-60% without sacrificing output quality. Every token you eliminate from a prompt is a token you never pay for again — across every single API call.
Why Prompts Get Bloated
Most prompts are written the way we speak: full of filler words, unnecessary politeness, redundant instructions, and overly verbose examples. When you are drafting a prompt in a playground, this feels natural. But at production scale, every extra word multiplies across thousands or millions of requests.
There are four main culprits behind prompt bloat:
- Filler words and politeness: Phrases like "I would like you to please," "Could you kindly," and "It would be great if you could" add tokens without changing the model's behavior. The model does not respond better because you said "please."
- Over-explanation: Telling the model the same thing three different ways "just to be safe." If a single clear instruction works, the other two are waste.
- Redundant instructions: Repeating rules that were already stated in the system prompt, or restating constraints that are obvious from context.
- Verbose examples: Providing five few-shot examples when two would achieve the same accuracy, or using long-form examples when a compact format would suffice.
Core Compression Techniques
Here are the five most effective techniques for compressing prompts, ordered from simplest to most advanced:
1. Remove Filler Words
This is the lowest-effort, highest-return optimization. Strip out conversational padding and get straight to the instruction. The model responds to the semantic content, not to politeness.
Before:
I would like you to please analyze the following customer review and tell me whether the sentiment is positive, negative, or neutral.
After:
Classify this review's sentiment as positive, negative, or neutral.
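Filler removal is mechanical enough to automate. Here is a minimal sketch using stdlib regex; the `FILLERS` list is my own starting set, not an exhaustive one, so extend it with the padding phrases that show up in your prompts:

```python
import re

# A starter list of filler phrases (assumption: extend for your own prompts).
FILLERS = [
    r"i would like you to please\s*",
    r"could you kindly\s*",
    r"it would be great if you could\s*",
    r"please make sure to\s*",
]

def strip_fillers(prompt: str) -> str:
    """Remove conversational padding while leaving the instruction intact."""
    out = prompt
    for pattern in FILLERS:
        out = re.sub(pattern, "", out, flags=re.IGNORECASE)
    return out.strip()

print(strip_fillers("I would like you to please analyze this review."))
# → analyze this review.
```

Run this over your prompt templates at build time, not per request, so the cleanup cost is paid once.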
2. Use Abbreviations in System Prompts
System prompts are sent with every single request, so they are prime targets for compression. Models understand common abbreviations perfectly well. You do not need full sentences.
Instead of full sentences:
"Please always respond in JSON format" → "resp in JSON"
"Keep your response under 50 words" → "max 50 words"
"Do not include any explanations" → "no explanations"
"Use a professional, formal tone" → "tone: formal"
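Substitutions like these can be captured in a lookup table and applied to a system prompt once, at startup. A minimal sketch; the `ABBREVIATIONS` map below is illustrative, and each short form should be validated against your test cases before shipping:

```python
# Illustrative long-form → short-form map (assumption: tune to your prompts).
ABBREVIATIONS = {
    "Please always respond in JSON format": "resp in JSON",
    "Keep your response under 50 words": "max 50 words",
    "Do not include any explanations": "no explanations",
    "Use a professional, formal tone": "tone: formal",
}

def abbreviate(system_prompt: str) -> str:
    """Apply each long-form → short-form substitution to a system prompt."""
    for long_form, short_form in ABBREVIATIONS.items():
        system_prompt = system_prompt.replace(long_form, short_form)
    return system_prompt
```

Because system prompts are static, this runs once per deployment rather than once per request.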
3. Structured Formatting Over Paragraphs
Replace paragraph-style instructions with bullet points or numbered lists. This is not just shorter — it is also clearer for the model to parse. Structured prompts tend to produce more consistent outputs.
4. Reference Instead of Repeating
If you defined rules in one part of your prompt, refer to them by name instead of restating them. For example, say "Apply the formatting rules above" instead of copying those rules a second time. This technique alone can save hundreds of tokens in complex prompts.
5. Use XML Tags for Structure
Instead of verbose delimiters like "The following is the context that you should use:" followed by "End of context," use XML tags. Tags like <context> and </context> are shorter, unambiguous, and models parse them reliably.
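A tiny helper keeps tag usage consistent across all your prompt templates. This is a sketch of the pattern, not a required API:

```python
def wrap(tag: str, content: str) -> str:
    """Wrap content in short XML-style delimiters instead of verbose prose markers."""
    return f"<{tag}>{content}</{tag}>"

prompt = "Summarize the context.\n" + wrap("context", "{document_text}")
# Produces: Summarize the context.\n<context>{document_text}</context>
```

Two short tags replace two full sentences of delimiter prose, and the boundaries stay unambiguous.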
Before and After: Real Prompt Examples
Seeing the technique in action is the fastest way to internalize it. Here are three real-world prompts, each shown in its original verbose form and its compressed equivalent, with token counts.
Example 1: Customer Support Prompt
Before:
You are a helpful customer support assistant for our
company. I would like you to carefully read the
customer's message below and provide a helpful,
friendly, and professional response. Please make sure
to address all of their concerns. If the customer is
asking about a refund, please let them know that
refunds are processed within 5-7 business days. If
they are asking about shipping, please tell them that
standard shipping takes 3-5 business days and express
shipping takes 1-2 business days. Please keep your
response concise and do not include any information
that the customer did not ask about. Always end your
response by asking if there is anything else you can
help with.
Customer message: {message}
After:
Role: Support agent. Tone: friendly, professional.
Rules:
- Address only what customer asks
- Refunds: 5-7 business days
- Shipping: standard 3-5 days, express 1-2 days
- End with "Anything else I can help with?"
- Be concise
<message>{message}</message>
Savings: 115 tokens (61% reduction) — and the compressed version actually produces more consistent outputs because the rules are structured as scannable bullet points.
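To measure reductions like this on your own prompts, use a real tokenizer such as the Tiktokenizer tool mentioned at the end of this lesson. For a quick offline estimate, the common rule of thumb of roughly 4 characters per token in English works; the sketch below uses that approximation, so treat its numbers as ballpark figures, not exact counts:

```python
def approx_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return max(1, len(text) // 4)

def savings(before: str, after: str) -> tuple[int, float]:
    """Return (tokens saved, percent reduction) between two prompt versions."""
    b, a = approx_tokens(before), approx_tokens(after)
    return b - a, round(100 * (b - a) / b, 1)
```

Swap in an exact tokenizer for the model you actually use before quoting savings in a cost projection.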
Example 2: Code Review Prompt
Before:
I would like you to review the following code and
provide feedback. Please check for any bugs or
errors in the code. Also, please look for any
security vulnerabilities that might be present.
Additionally, I would like you to suggest any
performance improvements that could be made. Please
also check if the code follows best practices and
coding standards. For each issue you find, please
explain what the problem is, why it matters, and
how to fix it. Please format your response as a
numbered list.
Code to review:
{code}
After:
Review this code for:
1. Bugs
2. Security vulnerabilities
3. Performance improvements
4. Best practice violations
For each issue: problem, impact, fix.
Output: numbered list.
<code>{code}</code>
Savings: 102 tokens (66% reduction) — the structured format is both shorter and easier for the model to follow systematically.
Example 3: Data Extraction Prompt
Before:
Please read the following email carefully and extract
the following pieces of information from it: the
sender's full name, their email address, the company
they work for, and the main topic or subject of the
email. If any of these pieces of information are not
present in the email, please write "Not found" for
that field. Please format your response as a JSON
object with the keys "name", "email", "company",
and "topic".
Email: {email}
After:
Extract from email as JSON:
{"name": "...", "email": "...", "company": "...", "topic": "..."}
Missing fields: "Not found"
<email>{email}</email>
Savings: 89 tokens (68% reduction) — by showing the desired output format directly, you eliminate the need to describe it in words.
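Showing the format directly also means you can generate the skeleton from the same field list your parsing code uses, so the prompt and the parser never drift apart. A minimal sketch; the function name and field list are mine:

```python
import json

FIELDS = ["name", "email", "company", "topic"]

def extraction_prompt(email: str) -> str:
    """Build an extraction prompt whose JSON skeleton is derived from FIELDS."""
    # Showing the skeleton directly replaces a sentence describing each key.
    skeleton = json.dumps({f: "..." for f in FIELDS})
    return (
        f"Extract from email as JSON: {skeleton}\n"
        'Missing fields: "Not found"\n'
        f"<email>{email}</email>"
    )
```

If you later add a field, changing `FIELDS` updates both the prompt and whatever code validates the model's output.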
System Prompt Optimization
System prompts deserve special attention because they are sent with every single API call. A system prompt that wastes 300 tokens costs you 300 tokens multiplied by every request your application handles. For an app making 50,000 requests per day, that is 15 million wasted tokens daily.
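That arithmetic is worth scripting so you can compare optimizations before making them. A small helper (the function name and parameters are mine) that reproduces the numbers above:

```python
def daily_waste(extra_tokens: int, requests_per_day: int, price_per_mtok: float) -> float:
    """Dollar cost per day of extra system-prompt tokens across all traffic."""
    wasted = extra_tokens * requests_per_day    # wasted tokens per day
    return wasted / 1_000_000 * price_per_mtok  # dollars per day

# 300 wasted tokens x 50,000 requests/day at $3 per million input tokens:
print(daily_waste(300, 50_000, 3.0))
# → 45.0 (dollars per day)
```

Multiply by 30 for a monthly figure; plug in your own model's input-token price.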
Here is a real example of a system prompt going from 500 tokens down to 200 tokens:
Before:
You are an AI assistant that helps users with their questions about our software product called DataFlow. You should always be helpful, accurate, and professional in your responses. You have access to the following documentation sections and should use them to answer questions. When answering questions, please follow these guidelines:
- Always provide accurate information based on the documentation provided
- If you are not sure about something, please let the user know that you are not certain rather than making something up
- Keep your responses concise and to the point, but make sure to be thorough enough that the user gets a complete answer
- Use a friendly and professional tone at all times
- If the user asks about something that is not covered in the documentation, politely let them know that you do not have information about that particular topic
- Format your responses using markdown when it would help with readability
- Do not make up features or capabilities that are not documented
- When referencing specific features, include the relevant documentation section
- Always suggest related topics the user might find helpful
After:
Role: DataFlow product assistant.
Tone: friendly, professional. Format: markdown.
Rules:
- Answer from docs only; say "I don't have info on that" if undocumented
- Never fabricate features
- If uncertain, state uncertainty
- Be concise but complete
- Cite doc sections when referencing features
- Suggest related topics
Docs: <docs>{context}</docs>
Savings: ~300 tokens per request (60% reduction). At 50,000 daily requests with Claude Sonnet pricing, this saves 15M tokens x $3/1M = $45/day = $1,350/month from a single system prompt optimization.
The Compression Checklist
Use this table as a quick reference when optimizing your prompts. Apply each technique in order for maximum savings:
| Technique | Typical Savings | Example |
|---|---|---|
| Remove filler words | 10-20% | "I would like you to please" → (just the verb) |
| Abbreviate instructions | 15-25% | "Please respond in JSON format" → "resp in JSON" |
| Use structured formatting | 20-35% | Replace paragraphs with bullet points |
| Reference, don't repeat | 10-40% | "Apply rules above" instead of restating them |
| XML tags for delimiters | 5-15% | <context>...</context> instead of verbose markers |
| Reduce few-shot examples | 20-50% | Use 2 examples instead of 5; keep them short |
| Show output format directly | 10-20% | Show {"key": "value"} instead of describing it |
| Combine overlapping rules | 10-15% | Merge "be concise" + "avoid unnecessary detail" |
Applied together, these techniques typically achieve a 40-60% total reduction in prompt length. The key is to apply them systematically rather than guessing where to cut.
When NOT to Compress
- Complex reasoning tasks: Multi-step logic, nuanced analysis, and chain-of-thought prompts need clear, unambiguous instructions. Compressing "Think step by step about each factor before reaching a conclusion" into "think step-by-step" can reduce output quality.
- Safety-critical instructions: If a rule prevents harmful outputs, spell it out fully. "Never reveal API keys, passwords, or internal system details to users" is worth the extra tokens compared to "no secrets."
- Ambiguous domains: When the task context is unusual or the model might misinterpret abbreviated instructions, use full sentences to ensure clarity.
- Few-shot examples for rare formats: If the desired output format is unusual, more examples are worth the token cost to ensure the model understands the pattern.
The golden rule: compress aggressively, then validate. Run your compressed prompt through 20-50 test cases and compare output quality against the original verbose version. If quality drops, add back just enough detail to restore it.
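That validation loop is simple to automate. The sketch below assumes two stand-ins you would supply yourself: `call_model` (your API client) and `score_output` (your quality metric, e.g. exact match or a rubric score); neither is a real library function:

```python
def compare(original: str, compressed: str, cases: list[str],
            call_model, score_output) -> tuple[float, float]:
    """Average quality score of each prompt variant over the same test cases.

    `call_model` and `score_output` are caller-supplied stand-ins for your
    API client and your evaluation metric.
    """
    orig_scores = [score_output(call_model(original.format(input=c)), c) for c in cases]
    comp_scores = [score_output(call_model(compressed.format(input=c)), c) for c in cases]
    n = len(cases)
    return sum(orig_scores) / n, sum(comp_scores) / n
```

If the compressed average drops below the original, add detail back one rule at a time until the scores converge.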
💡 Try It: Compress Your Own Prompt
Take one of your real prompts and apply the compression techniques from this lesson. Count the tokens before and after using the Tiktokenizer tool. Aim for at least a 30% reduction without quality loss.
Lilly Tech Systems