The core problem
You can’t know how good a skill is until real people use it. And you can’t improve it without data on where it fails. This chicken-and-egg problem is why most skills ship once and never get better.
If you are still in the pre-ship stage, start with Create, Test, and Deploy a Skill. This guide assumes you already have a skill in the loop.
selftune breaks this cycle by continuously observing real sessions, detecting failures, and proposing improvements.
The three stages of a skill
Most skills evolve through a predictable progression:
Stage 1: Capture the workflow
You complete a task with an AI agent. Along the way, you make corrections, provide context, and steer the agent toward the right approach. The reusable pattern in that interaction is the seed of a skill.
Stage 2: Test and harden
Run the skill against varied inputs. Use selftune to generate eval sets from your real usage, then run evolution to improve the description.
Stage 3: Ship and observe
Once the skill passes your eval set reliably, ship it. Then let selftune observe how others use it.
The selftune feedback loop
1. Observe
selftune hooks capture every user query and whether each skill triggered. This happens automatically, with no manual logging required.
2. Detect
Grading identifies three types of problems:
- Missed triggers: the skill should have fired but didn’t
- Process failures: the skill fired but the agent didn’t follow instructions
- Quality issues: the skill produced a result, but it wasn’t good
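The three categories above can be read as a short decision procedure. The sketch below is a minimal illustration in Python, not selftune's implementation: `GradedSession`, its fields, the `Problem` enum, and `classify` are hypothetical names chosen for this example, and the check order (trigger first, then process, then quality) is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Problem(Enum):
    MISSED_TRIGGER = "missed trigger"
    PROCESS_FAILURE = "process failure"
    QUALITY_ISSUE = "quality issue"


@dataclass
class GradedSession:
    """Hypothetical record of one graded session; not selftune's real schema."""
    should_have_triggered: bool  # grader's judgment: was the skill relevant to the query?
    triggered: bool              # did the skill actually fire?
    followed_instructions: bool  # did the agent follow the skill's steps?
    result_acceptable: bool      # was the final output good enough?


def classify(session: GradedSession) -> Optional[Problem]:
    """Return the first problem category that applies, or None for a clean session."""
    if not session.triggered:
        # A skill that stayed silent is only a problem if it was relevant.
        return Problem.MISSED_TRIGGER if session.should_have_triggered else None
    if not session.followed_instructions:
        return Problem.PROCESS_FAILURE
    if not session.result_acceptable:
        return Problem.QUALITY_ISSUE
    return None
```

With these assumptions, `classify(GradedSession(True, False, False, False))` returns `Problem.MISSED_TRIGGER`, and a session where the skill was irrelevant and did not fire returns `None`.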
3. Evolve
Evolution proposes improved descriptions based on the detected failures. Multiple candidates are generated and validated against your eval set.
4. Watch
After deploying an evolution, selftune monitors for regressions. If the new description causes problems, it rolls back automatically.
When to iterate manually vs. automatically
| Situation | Approach |
|---|---|
| New skill, no usage data | Manual: write description, run synthetic evals |
| Skill works for you, shipping to others | Semi-auto: generate evals from your sessions, run evolution |
| Skill is live with users | Automatic: `selftune run` handles the full loop |
| Major workflow change | Manual: update SKILL.md body, re-baseline, then resume auto |
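The table's rows can overlap (a live skill can also undergo a major workflow change), so it may help to make the precedence explicit. The following is a rough sketch only: `choose_approach` and its parameters are hypothetical, and checking for a major workflow change first is an assumption, not something the table states.

```python
def choose_approach(
    has_usage_data: bool,
    live_with_users: bool,
    major_workflow_change: bool,
) -> str:
    """Map a skill's situation to the iteration approach listed in the table."""
    # Assumption: a major workflow change overrides everything else,
    # because existing evals and baselines no longer describe the skill.
    if major_workflow_change:
        return "manual: update SKILL.md body, re-baseline, then resume auto"
    if not has_usage_data:
        return "manual: write description, run synthetic evals"
    if live_with_users:
        return "automatic: selftune run handles the full loop"
    # Usage data exists (your own sessions) but the skill is not yet live.
    return "semi-auto: generate evals from your sessions, run evolution"
```

Under these assumptions, a skill with your own session data that has not shipped yet maps to the semi-automatic row, and any major workflow change sends you back to the manual path regardless of the other flags.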
Moving logic from skills to code
As you iterate, you’ll notice parts of your skill that the agent does the same way every time. These are candidates for extraction into scripts.
Real-world example
A skill author builds a “create presentation” skill. Initial description:
Next steps
- Evolution Reference: Full evolution pipeline documentation.
- Structuring Skills: Organize skills that scale.
- Managing Context: Keep skills lean as they grow.
- Testing Triggers: Verify skills fire correctly.