Managing Context

The problem

You write a skill. It works great on the first message. By the fifth message, the agent has forgotten half the instructions because the context window is full of conversation history, tool outputs, and other loaded skills. This is the most common frustration that skill authors report: the skill “gets forgotten after five minutes.” The solution isn’t to write more instructions; it’s to write fewer and load them at the right time.

Progressive disclosure in practice

The Agent Skills specification defines three tiers of loading:

Tier	What	When	Token cost
Catalog	Name + description	Session start	~50-100 per skill
Instructions	Full SKILL.md body	When skill activates	<5000 recommended
Resources	Scripts, references, assets	When referenced	Varies

The key insight: every token you load upfront is a token that competes with the user’s conversation later. Front-load only what the agent needs to route correctly. Defer everything else.
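To make the tiers concrete, here is a minimal sketch of how a runtime might stage skill content. It assumes each skill is a directory containing `SKILL.md`, with name and description already parsed from its frontmatter; the `Skill` class, its methods, and `build_catalog` are hypothetical illustrations, not an API from the spec or from any particular agent.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Skill:
    """A skill directory: SKILL.md plus optional scripts/, references/, assets/."""
    name: str
    description: str
    root: Path
    _body: Optional[str] = field(default=None, repr=False)

    def catalog_entry(self) -> str:
        # Tier 1 (catalog): only name + description, loaded for every skill up front.
        return f"{self.name}: {self.description}"

    def activate(self) -> str:
        # Tier 2 (instructions): the SKILL.md body, read once, only when the skill is chosen.
        if self._body is None:
            self._body = (self.root / "SKILL.md").read_text(encoding="utf-8")
        return self._body

    def resource(self, relative_path: str) -> str:
        # Tier 3 (resources): read on demand when the instructions point at a file.
        return (self.root / relative_path).read_text(encoding="utf-8")


def build_catalog(skills: list[Skill]) -> str:
    """Everything the agent sees about installed skills before any activation."""
    return "\n".join(skill.catalog_entry() for skill in skills)
```

Nothing beyond the one-line catalog entry touches the context until `activate` or `resource` is called, which is the whole point of deferring.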

Tier 1: Keep descriptions tight

Your description carries the entire burden of triggering, but it only needs a few sentences; the hard cap is 1024 characters.

```yaml
# Too much in the description (wastes catalog space):
description: >
  Build financial models using JSON-based cell structures with support
  for top-down and bottom-up approaches, DCF analysis, LBO modeling,
  comparable company analysis, and merger models. Supports Excel export,
  PDF generation, and chart rendering. Uses progressive disclosure to
  manage model complexity. Cells are defined as...

# Right amount:
description: >
  Build and analyze financial models — forecasts, DCFs, LBOs, comps,
  and merger models. USE WHEN financial model, forecast, valuation,
  spreadsheet, pro forma, assumptions, sensitivity analysis.
```
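A quick pre-publish check for the description can be scripted. This sketch enforces the 1024-character cap and adds a soft warning based on the 50-100 token catalog budget from the table above. `check_description` is a hypothetical helper, and the four-characters-per-token ratio is only a rough heuristic, not a real tokenizer.

```python
MAX_DESCRIPTION_CHARS = 1024  # hard cap on the description field
SOFT_TOKEN_BUDGET = 100       # upper end of the catalog budget suggested in this guide
CHARS_PER_TOKEN = 4           # rough heuristic for English text; real tokenizers vary


def check_description(description: str) -> list[str]:
    """Return problems with a skill description; an empty list means it passes."""
    # Collapse runs of whitespace, approximating how a folded (>) YAML scalar reads.
    text = " ".join(description.split())
    if not text:
        return ["description is empty"]
    problems = []
    if len(text) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"{len(text)} characters exceeds the {MAX_DESCRIPTION_CHARS}-character cap"
        )
    approx_tokens = len(text) // CHARS_PER_TOKEN
    if approx_tokens > SOFT_TOKEN_BUDGET:
        problems.append(
            f"~{approx_tokens} tokens; aim for {SOFT_TOKEN_BUDGET} or fewer in the catalog"
        )
    return problems
```

The hard cap is an error; the token estimate is only a nudge, since a slightly longer description that triggers reliably is worth more than a short one that does not.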

Tier 2: SKILL.md as a router, not a manual

If your SKILL.md is over 300 lines, you’re probably loading too much at activation. Use the router pattern to keep it lean:

```markdown
# The agent loads this (~150 lines) at activation
## Routing
| Intent | Workflow |
|--------|----------|
| Create | Read `workflows/create.md` |
| Edit   | Read `workflows/edit.md` |
```
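In the router pattern the agent itself reads the table and opens one file, but the effect is easy to model: exactly one workflow enters the context per request. A minimal sketch, assuming intents are matched by name; `ROUTES` and `load_workflow` are hypothetical.

```python
from pathlib import Path

# Hypothetical mirror of the "## Routing" table in SKILL.md: intent -> workflow file.
ROUTES = {
    "create": "workflows/create.md",
    "edit": "workflows/edit.md",
}


def load_workflow(skill_root: Path, intent: str) -> str:
    """Read only the workflow for the matched intent; every other workflow stays on disk."""
    key = intent.strip().lower()
    if key not in ROUTES:
        known = ", ".join(sorted(ROUTES))
        raise ValueError(f"unknown intent {intent!r}; expected one of: {known}")
    return (skill_root / ROUTES[key]).read_text(encoding="utf-8")
```

Adding a workflow means adding a file and a table row; the activation cost of SKILL.md barely changes.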

Tier 3: Load references only when needed

Don’t pre-load reference material. Tell the agent when to fetch it:

```markdown
## Step 3: Validate the schema

If validation fails, read `references/error-codes.md` for the
error code lookup table. Do not read this file unless validation fails.
```

The “do not read unless” instruction is important — agents tend to be eager about reading files they know about.
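When a script rather than the agent decides what to surface, the same "only on failure" rule can live in code. A minimal sketch, assuming a validator that returns a list of error strings; `run_validation_step` is a hypothetical helper, and `references/error-codes.md` is the file from the step above.

```python
from pathlib import Path
from typing import Callable, Optional


def run_validation_step(
    validate: Callable[[], list[str]], error_codes: Path
) -> Optional[str]:
    """Return None on success; on failure return the errors plus the lookup table."""
    errors = validate()
    if not errors:
        # Success path: the reference file is never opened, so it costs no tokens.
        return None
    lookup_table = error_codes.read_text(encoding="utf-8")
    return "\n".join(errors) + "\n\n" + lookup_table
```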

CLI feedback loops

The most effective context management technique is moving intelligence out of the skill and into deterministic code that gives the agent feedback in real time. Instead of documenting every error path in your skill:

```markdown
## Error handling (DON'T DO THIS)
If the build fails with error E001, run fix-schema.
If it fails with E002, check that all required fields exist.
If it fails with E003, verify the date format is ISO 8601.
[... 50 more lines of error handling ...]
```

Build a CLI that returns actionable error messages:

```console
$ my-tool build --input model.json
Error E002: Missing required fields: revenue, cogs
Fix: Add these fields to the assumptions section, then retry.
Hint: Run `my-tool scaffold --template dcf` for a complete template.
```

The agent reads the error, follows the instructions, and never needs those 50 lines of error handling in context. This approach has three advantages:

Smaller context footprint — error handling lives in code, not in the skill
More reliable — deterministic code doesn’t hallucinate error handling
Self-correcting — the agent gets immediate, specific feedback
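A CLI like the hypothetical `my-tool` above needs very little code to produce messages in that Error/Fix/Hint shape. This is a sketch under stated assumptions: the model is a JSON object with an `assumptions` section, `revenue` and `cogs` are the only required fields, and the `scaffold` hint is copied from the sample output rather than implemented.

```python
import argparse
import json
import sys
from pathlib import Path
from typing import Optional

# Hypothetical required assumption fields for a model file.
REQUIRED_FIELDS = ("revenue", "cogs")


def check_model(model: dict) -> list[str]:
    """Return Error/Fix/Hint lines for a model; an empty list means it is valid."""
    assumptions = model.get("assumptions", {})
    missing = [name for name in REQUIRED_FIELDS if name not in assumptions]
    if not missing:
        return []
    return [
        f"Error E002: Missing required fields: {', '.join(missing)}",
        "Fix: Add these fields to the assumptions section, then retry.",
        "Hint: Run `my-tool scaffold --template dcf` for a complete template.",
    ]


def main(argv: Optional[list[str]] = None) -> int:
    parser = argparse.ArgumentParser(prog="my-tool")
    commands = parser.add_subparsers(dest="command", required=True)
    build = commands.add_parser("build", help="validate and build a model")
    build.add_argument("--input", required=True, type=Path)
    args = parser.parse_args(argv)

    model = json.loads(args.input.read_text(encoding="utf-8"))
    errors = check_model(model)
    for line in errors:
        # stderr keeps diagnostics separate from normal output.
        print(line, file=sys.stderr)
    if errors:
        return 1  # non-zero exit tells the agent the step failed
    print(f"Built {args.input}")
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The design choice that matters is the message, not the parser: each error names what is wrong, what to do next, and where to get help, so the agent never has to consult the skill for recovery steps.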

What to keep in SKILL.md vs. code

In SKILL.md	In scripts/CLI
When to use the skill	How to validate inputs
High-level workflow steps	Data transformation logic
Judgment calls (which approach to use)	Error messages with fix instructions
User-facing output formatting	File parsing and generation

A good rule: if the agent does the same thing every time regardless of context, it should be in code.
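Checking the date format (the E003 case from the error-handling example) fits this rule: the check is identical every time. A sketch using the standard library's `date.fromisoformat`; `check_dates` and the message wording are hypothetical. Note that Python 3.11+ accepts more ISO 8601 forms (such as `20240331`) than earlier versions, which accept only `YYYY-MM-DD`.

```python
from datetime import date


def check_dates(values: dict[str, str]) -> list[str]:
    """Deterministic E003 check: each value must parse with date.fromisoformat."""
    errors = []
    for field_name, value in values.items():
        try:
            date.fromisoformat(value)
        except ValueError:
            errors.append(
                f"Error E003: {field_name}={value!r} is not an ISO 8601 date. "
                "Fix: use YYYY-MM-DD, for example 2024-03-31."
            )
    return errors
```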

Practical token budgets

For a skill that shares the context window with an ongoing conversation, aim for roughly these budgets:

Component	Budget	Notes
Description (always loaded)	50-100 tokens	Part of catalog
SKILL.md body	1000-3000 tokens	Loaded at activation
Active workflow	500-1500 tokens	One workflow at a time
Reference (if needed)	500-1000 tokens	Only when referenced
Total peak	~2000-5600 tokens	Sum of the rows above

Compare that to a monolithic 800-line SKILL.md that loads 8000+ tokens at activation. The difference matters when the agent is 20 messages into a conversation.
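The budget table can double as a lint. This sketch sums the per-component ranges to get the worst case (everything loaded at once) and flags a SKILL.md that exceeds the 300-line guideline or the 3000-token body budget. The four-characters-per-token ratio is a rough heuristic, and `peak_range` and `audit_skill_md` are hypothetical helpers.

```python
from pathlib import Path

# Per-component budgets from the table above, as (low, high) token ranges.
BUDGETS = {
    "description": (50, 100),
    "skill_body": (1000, 3000),
    "active_workflow": (500, 1500),
    "reference": (500, 1000),
}
CHARS_PER_TOKEN = 4  # rough heuristic for English prose; real tokenizers differ
MAX_LINES = 300      # past this, the router pattern is probably worth it


def peak_range(budgets: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Worst case: every component is in context at once, so the ranges add."""
    return (sum(lo for lo, _ in budgets.values()),
            sum(hi for _, hi in budgets.values()))


def estimate_tokens(text: str) -> int:
    return len(text) // CHARS_PER_TOKEN


def audit_skill_md(text: str) -> list[str]:
    """Flag a SKILL.md body that exceeds the line guideline or the body token budget."""
    warnings = []
    n_lines = len(text.splitlines())
    tokens = estimate_tokens(text)
    body_max = BUDGETS["skill_body"][1]
    if n_lines > MAX_LINES:
        warnings.append(f"{n_lines} lines (> {MAX_LINES}): consider the router pattern")
    if tokens > body_max:
        warnings.append(f"~{tokens} tokens (> {body_max}): move detail into workflows/ or references/")
    return warnings


if __name__ == "__main__":
    import sys
    print("peak budget:", peak_range(BUDGETS))  # (2050, 5600)
    for path in sys.argv[1:]:
        for warning in audit_skill_md(Path(path).read_text(encoding="utf-8")):
            print(f"{path}: {warning}")
```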

Selftune’s role

selftune helps you stay lean by detecting when skills have context problems:

Grading tier 2 (process) catches when the agent deviates from skill instructions — often a sign that instructions were pushed out of context
Evolution keeps descriptions optimally sized — not too broad, not too narrow
Session analysis shows you which parts of your skill the agent actually uses, so you can move unused sections to references

```bash
# See which skills are having execution problems
selftune status

# Check if a skill's instructions are being followed
selftune grade --skill my-skill
```

Next steps

Structuring Skills

Iterating with selftune
