INDEX
- Introduction
- Requirements
- The problem
- Solution 1: clean noise before prompting
- Solution 2: operational summary
- Solution 3: compress memory with Caveman
- Solution 4: task-specific templates
- Solution 5: measure and decide
- Extras
- Full workflow
1. Introduction
Saving tokens is not only about shorter prompts. The biggest saving often happens before the prompt: not sending junk to the model.
In this lab we clean repeated text, create operational summaries, compress repeated project memory and send the expensive model only what it really needs.
2. Requirements
We need a long text, a clear target, a local cleanup step, a paid model for the final answer and a way to measure tokens before and after.
3. The problem
Long inputs mix useful information with noise, and every junk line is billed like a useful one. We will separate the two, build a smaller input, compress repeated memory when relevant and measure the result.
4. Solution 1: clean noise before prompting
Simple rules are often enough to strip repeated headers, signatures, duplicated lines and useless debug traces; no model call is needed for this step.
def clean_lines(text):
    """Drop blank lines, DEBUG traces and duplicate lines before prompting."""
    lines = []
    seen = set()
    for line in text.splitlines():
        normalized = line.strip()
        # Skip empty lines and debug noise.
        if not normalized or "DEBUG" in normalized:
            continue
        # Skip lines already kept (compared after stripping whitespace).
        if normalized in seen:
            continue
        seen.add(normalized)
        lines.append(line)
    return "\n".join(lines)
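A quick check of the effect, on a made-up log snippet (the lines are illustrative, not from a real system):

raw = "\n".join([
    "DEBUG: connection pool warmed",
    "Order 1043 failed with timeout",
    "Order 1043 failed with timeout",
    "",
    "Retry policy: 3 attempts, 2s backoff",
])
print(clean_lines(raw))
# Order 1043 failed with timeout
# Retry policy: 3 attempts, 2s backoff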
5. Solution 2: operational summary
Use a small, stable structure:
Goal:
Relevant data:
Constraints:
Recent changes:
Concrete question:
This keeps reasoning material and removes filler.
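A minimal sketch of filling that structure in code. The function name and the example values are assumptions for illustration; the extraction of each field is something you adapt per project:

def operational_summary(goal, data, constraints, changes, question):
    """Assemble the five-field summary; every argument is a short string."""
    return "\n".join([
        f"Goal: {goal}",
        f"Relevant data: {data}",
        f"Constraints: {constraints}",
        f"Recent changes: {changes}",
        f"Concrete question: {question}",
    ])

prompt = operational_summary(
    goal="Reduce checkout latency below 300 ms",
    data="p95 latency 540 ms, 80% spent in payment API",
    constraints="No schema changes this sprint",
    changes="Added retry with backoff on 2024-05-02",
    question="Should we cache the payment-method lookup?",
)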
6. Solution 3: compress memory with Caveman
Caveman includes caveman-compress, a tool for compressing memory files such as CLAUDE.md, todos or preferences so every session loads fewer tokens. Its README explains that it keeps a human-readable .original.md backup and preserves code blocks, URLs, paths, commands, technical terms, headings, tables, dates and numeric values.
This is useful when the same project memory is loaded repeatedly.
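For the real thing, use caveman-compress itself. Purely to illustrate the idea, here is a toy sketch (not Caveman's actual algorithm) that drops filler words from prose while leaving fenced code, URLs and lines containing numbers untouched:

import re

# Toy filler list; a real tool preserves far more and compresses far better.
FILLER = re.compile(r"\b(?:basically|really|very|just|simply|in order to)\b", re.IGNORECASE)

def toy_compress(markdown):
    out, in_code = [], False
    for line in markdown.splitlines():
        if line.strip().startswith("```"):
            in_code = not in_code  # never touch fenced code blocks
            out.append(line)
        elif in_code or "http" in line or any(ch.isdigit() for ch in line):
            out.append(line)       # preserve code, URLs, dates and numbers
        else:
            out.append(re.sub(r" {2,}", " ", FILLER.sub("", line)))
    return "\n".join(out)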
7. Solution 4: task-specific templates
Use different templates for tickets, meetings and code review. Each workflow has different fields that matter.
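One way to keep per-task templates is a small mapping; the field lists below are examples, not a fixed spec:

TEMPLATES = {
    "ticket": ["Goal", "Steps to reproduce", "Expected vs actual", "Concrete question"],
    "meeting": ["Goal", "Decisions", "Open items", "Concrete question"],
    "code_review": ["Goal", "Diff summary", "Constraints", "Concrete question"],
}

def build_prompt(task_type, fields):
    """Render only the fields that matter for this task type."""
    return "\n".join(f"{name}: {fields.get(name, '')}" for name in TEMPLATES[task_type])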
8. Solution 5: measure and decide
Track tokens before, tokens after, answer quality and human review time. Saving tokens is only useful if the review cost does not explode.
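A sketch of the before/after measurement using the tiktoken library, assuming the cl100k_base encoding is close enough to your target model's tokenizer:

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

def token_report(before, after):
    """Count tokens in both versions and report the saving."""
    b, a = len(enc.encode(before)), len(enc.encode(after))
    return {"before": b, "after": a, "saved_pct": round(100 * (1 - a / max(b, 1)), 1)}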
9. Extras
Use local AI as a filter, keep cleaned context, and avoid summarizing away dates, numbers or constraints.
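One way to enforce the last point: after summarizing, check that dates and numbers from the source survived. The regex is a rough sketch and would need tuning for your data:

import re

CRITICAL = re.compile(r"\d{4}-\d{2}-\d{2}|\d+(?:\.\d+)?")

def lost_facts(source, summary):
    """Return date/number tokens present in the source but missing from the summary."""
    keep = {m.group(0) for m in CRITICAL.finditer(source)}
    return sorted(t for t in keep if t not in summary)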
10. Full workflow
Clean, summarize, compress recurring memory, call the paid model only with relevant context, and measure.
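Putting the pieces together, assuming the helpers sketched above and a call_paid_model function you supply yourself; the placeholder values are deliberately left to fill per task:

def run(raw_text, question, call_paid_model):
    cleaned = clean_lines(raw_text)           # Solution 1: strip noise locally
    prompt = operational_summary(             # Solution 2: stable structure
        goal="<fill per task>",
        data=cleaned[:2000],
        constraints="<fill per task>",
        changes="<fill per task>",
        question=question,
    )
    report = token_report(raw_text, prompt)   # Solution 5: measure the saving
    answer = call_paid_model(prompt)          # paid model sees only relevant context
    return answer, report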