The problem

You write a skill. It works great on the first message. By the fifth message, the agent has forgotten half the instructions because the context window is full of conversation history, tool outputs, and other loaded skills. This is the most common frustration skill authors report: the skill “gets forgotten after five minutes.” The solution isn’t writing more instructions. It’s writing fewer and loading them at the right time.

Progressive disclosure in practice

The agent skills spec defines three tiers of loading:

| Tier | What | When | Token cost |
|------|------|------|------------|
| Catalog | Name + description | Session start | ~50-100 per skill |
| Instructions | Full SKILL.md body | When skill activates | <5000 recommended |
| Resources | Scripts, references, assets | When referenced | Varies |

The key insight: every token you load upfront is a token that competes with the user’s conversation later. Front-load only what the agent needs to route correctly. Defer everything else.

Tier 1: Keep descriptions tight

Your description carries the entire burden of triggering. But it only needs to be a few sentences — the hard cap is 1024 characters.
# Too much in the description (wastes catalog space):
description: >
  Build financial models using JSON-based cell structures with support
  for top-down and bottom-up approaches, DCF analysis, LBO modeling,
  comparable company analysis, and merger models. Supports Excel export,
  PDF generation, and chart rendering. Uses progressive disclosure to
  manage model complexity. Cells are defined as...

# Right amount:
description: >
  Build and analyze financial models — forecasts, DCFs, LBOs, comps,
  and merger models. USE WHEN financial model, forecast, valuation,
  spreadsheet, pro forma, assumptions, sensitivity analysis.
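
A check like this can run before publishing a skill. The following is a minimal Python sketch, assuming the description has already been extracted from the frontmatter as a string. `check_description` is a hypothetical helper, and treating `USE WHEN` as an expected marker is an assumption based on the example above, not a rule from the spec.

```python
MAX_DESCRIPTION_CHARS = 1024  # hard cap stated in this section


def check_description(description: str) -> list[str]:
    """Return a list of warnings for a skill description (empty if none)."""
    # Collapse runs of whitespace, similar to how a YAML folded scalar (`>`) folds lines.
    text = " ".join(description.split())
    warnings = []
    if len(text) > MAX_DESCRIPTION_CHARS:
        warnings.append(
            f"description is {len(text)} chars; cap is {MAX_DESCRIPTION_CHARS}"
        )
    # Assumption: trigger keywords are introduced with "USE WHEN", as in the example.
    if "USE WHEN" not in text:
        warnings.append("no 'USE WHEN' trigger phrase found")
    return warnings
```

An empty list means the description fits under the cap and contains the trigger marker. The character cap is the only hard limit here; the marker check is a convention you can drop.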

Tier 2: SKILL.md as a router, not a manual

If your SKILL.md is over 300 lines, you’re probably loading too much at activation. Use the router pattern to keep it lean:
# The agent loads this (~150 lines) at activation
## Routing
| Intent | Workflow |
|--------|----------|
| Create | Read `workflows/create.md` |
| Edit   | Read `workflows/edit.md` |
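
One way to keep a router honest is a small lint step. The sketch below is a hypothetical helper (`lint_router` is not part of selftune or the agent skills spec) that assumes routing targets are written as backticked `workflows/*.md` paths, as in the example above. It flags a SKILL.md over the 300-line threshold and any routing target that does not exist on disk.

```python
import re
from pathlib import Path

MAX_SKILL_LINES = 300  # threshold suggested in this section


def lint_router(skill_dir: Path) -> list[str]:
    """Check SKILL.md for excess length and for routing targets that are missing."""
    body = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    problems = []
    line_count = len(body.splitlines())
    if line_count > MAX_SKILL_LINES:
        problems.append(f"SKILL.md has {line_count} lines (> {MAX_SKILL_LINES})")
    # Collect paths written as `workflows/<name>.md` inside backticks.
    targets = sorted(set(re.findall(r"`(workflows/[^`]+\.md)`", body)))
    for rel_path in targets:
        if not (skill_dir / rel_path).is_file():
            problems.append(f"routing target missing: {rel_path}")
    return problems
```

Running this in CI or a pre-commit hook catches a router that has grown back into a manual, or a workflow file that was renamed without updating the table.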

Tier 3: Load references only when needed

Don’t pre-load reference material. Tell the agent when to fetch it:
## Step 3: Validate the schema

If validation fails, read `references/error-codes.md` for the
error code lookup table. Do not read this file unless validation fails.

The “do not read unless” instruction is important because agents tend to be eager about reading files they know about.

CLI feedback loops

The most effective context management technique is moving intelligence out of the skill and into deterministic code that gives the agent feedback in real time. Instead of documenting every error path in your skill:
## Error handling (DON'T DO THIS)
If the build fails with error E001, run fix-schema.
If it fails with E002, check that all required fields exist.
If it fails with E003, verify the date format is ISO 8601.
[... 50 more lines of error handling ...]

Build a CLI that returns actionable error messages:
$ my-tool build --input model.json
Error E002: Missing required fields: revenue, cogs
Fix: Add these fields to the assumptions section, then retry.
Hint: Run `my-tool scaffold --template dcf` for a complete template.

The agent reads the error, follows the instructions, and never needs those 50 lines of error handling in context. This approach has three advantages:
  1. Smaller context footprint — error handling lives in code, not in the skill
  2. More reliable — deterministic code doesn’t hallucinate error handling
  3. Self-correcting — the agent gets immediate, specific feedback
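
As an illustration of the feedback-loop idea, here is a minimal Python sketch of the validation that could sit behind a hypothetical `my-tool build`. The field names (`assumptions`, `start_date`), the required-field list, and the exact message wording are assumptions made for the example; only the error codes E002 and E003 echo the text above.

```python
import json
import sys
from datetime import date

REQUIRED_FIELDS = ("revenue", "cogs")  # hypothetical required fields


def validate(model: dict) -> list[str]:
    """Return actionable error messages; an empty list means the model is valid."""
    errors = []
    assumptions = model.get("assumptions", {})
    missing = [name for name in REQUIRED_FIELDS if name not in assumptions]
    if missing:
        errors.append(
            f"Error E002: Missing required fields: {', '.join(missing)}\n"
            "Fix: Add these fields to the assumptions section, then retry."
        )
    start = model.get("start_date")  # hypothetical optional field
    if start is not None:
        try:
            date.fromisoformat(start)
        except (TypeError, ValueError):
            errors.append(
                f"Error E003: start_date {start!r} is not an ISO 8601 date\n"
                "Fix: Use the YYYY-MM-DD format, for example 2025-01-31."
            )
    return errors


def main(argv: list[str]) -> int:
    """Validate the JSON file named in argv[1] and return a shell exit code."""
    with open(argv[1], encoding="utf-8") as handle:
        model = json.load(handle)
    errors = validate(model)
    for message in errors:
        print(message, file=sys.stderr)
    return 1 if errors else 0
```

Each message pairs the failure with the next action, so the skill only needs one instruction: run the tool and follow what it prints. A real entry point would call `sys.exit(main(sys.argv))`.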

What to keep in SKILL.md vs. code

| In SKILL.md | In scripts/CLI |
|-------------|----------------|
| When to use the skill | How to validate inputs |
| High-level workflow steps | Data transformation logic |
| Judgment calls (which approach to use) | Error messages with fix instructions |
| User-facing output formatting | File parsing and generation |

A good rule: if the agent does the same thing every time regardless of context, it should be in code.

Practical token budgets

For a skill that coexists with conversation:

| Component | Budget | Notes |
|-----------|--------|-------|
| Description (always loaded) | 50-100 tokens | Part of catalog |
| SKILL.md body | 1000-3000 tokens | Loaded at activation |
| Active workflow | 500-1500 tokens | One workflow at a time |
| Reference (if needed) | 500-1000 tokens | Only when referenced |
| Total peak | ~2000-5000 tokens | |

Compare that to a monolithic 800-line SKILL.md that loads 8000+ tokens at activation. The difference matters when the agent is 20 messages into a conversation.
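
To check a file against these budgets without a tokenizer, a rough estimate is often enough. The sketch below assumes about four characters per token, a common rule of thumb for English text and not an exact count. `estimate_tokens` and `over_budget` are hypothetical helpers, and the limits are the upper bounds of the ranges listed above.

```python
CHARS_PER_TOKEN = 4  # rough heuristic (assumption); real tokenizers differ

# Upper bounds of the per-component budgets given in this section.
BUDGETS = {
    "SKILL.md body": 3000,
    "Active workflow": 1500,
    "Reference": 1000,
}


def estimate_tokens(text: str) -> int:
    """Approximate the token count from the character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def over_budget(component: str, text: str) -> bool:
    """Return True when the estimated size exceeds the component's budget."""
    return estimate_tokens(text) > BUDGETS[component]
```

For example, `over_budget("SKILL.md body", skill_text)` flags a body whose estimate passes 3000 tokens. Swap in a real tokenizer if you need exact counts for a specific model.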

Selftune’s role

selftune helps you stay lean by detecting when skills have context problems:
  • Grading tier 2 (process) catches when the agent deviates from skill instructions — often a sign that instructions were pushed out of context
  • Evolution keeps descriptions optimally sized — not too broad, not too narrow
  • Session analysis shows you which parts of your skill the agent actually uses, so you can move unused sections to references
# See which skills are having execution problems
selftune status

# Check if a skill's instructions are being followed
selftune grade --skill my-skill

Next steps

  • Structuring Skills: Organize skills with routers and workflows.
  • Iterating with selftune: Use real usage data to improve your skills.