Overview
Evolution is selftune’s core capability: automatically improving skill descriptions based on real usage evidence. The pipeline generates candidate improvements, validates them against eval sets, and deploys the best one, or rejects all candidates if none improve.
The evolution pipeline
1. Pre-gates
Constitutional checks run before any LLM calls:
- Size guard: rejects descriptions that are too long
- XML rejection: blocks malformed output
- Unbounded broadening guard: prevents descriptions from becoming too generic
- Anchor preservation: ensures core trigger terms aren’t removed
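The four pre-gates can be pictured as cheap, deterministic checks applied to each candidate description. The sketch below is illustrative only: the function name, the thresholds, and the heuristics (tag-like markup for XML rejection, word-count growth as a proxy for broadening) are assumptions, not selftune’s actual implementation.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds, chosen only for illustration.
MAX_CHARS = 1024         # assumed size limit for a description
MAX_GROWTH_RATIO = 2.0   # assumed cap on word-count growth vs. the current text


@dataclass
class GateResult:
    passed: bool
    reason: str = ""


def run_pre_gates(current: str, candidate: str, anchors: list[str]) -> GateResult:
    """Run all pre-gates in order and stop at the first failure."""
    # Size guard: reject descriptions that are too long.
    if len(candidate) > MAX_CHARS:
        return GateResult(False, "size guard")

    # XML rejection: assumed here to mean "contains tag-like markup".
    if re.search(r"</?[A-Za-z][^>]*>", candidate):
        return GateResult(False, "xml rejection")

    # Unbounded broadening guard: a crude proxy that flags candidates
    # growing far beyond the current description's word count.
    if len(candidate.split()) > MAX_GROWTH_RATIO * max(len(current.split()), 1):
        return GateResult(False, "broadening guard")

    # Anchor preservation: every core trigger term must still appear.
    lowered = candidate.lower()
    missing = [a for a in anchors if a.lower() not in lowered]
    if missing:
        return GateResult(False, f"anchor preservation: missing {missing}")

    return GateResult(True)
```

Because these checks need no model call, a failing candidate can be discarded before any LLM cost is incurred.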
2. Pattern extraction
selftune analyzes real session data to find:
- Queries that should have triggered the skill but didn’t (false negatives)
- Language patterns users actually use vs. what the description covers
- Contextual variations that the current description misses
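A minimal sketch of this kind of analysis follows. It assumes a hypothetical `SessionRecord` shape (selftune’s real session format is not shown here) and uses plain token counting to surface vocabulary that appears in missed queries but not in the description.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class SessionRecord:
    """Hypothetical record shape; field names are assumptions."""
    query: str
    triggered: bool   # did the skill actually fire?
    relevant: bool    # should it have fired?


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def false_negatives(records: list[SessionRecord]) -> list[str]:
    """Queries that should have triggered the skill but did not."""
    return [r.query for r in records if r.relevant and not r.triggered]


def uncovered_terms(description: str, missed: list[str], top_n: int = 5):
    """Count, per missed query, the terms absent from the description."""
    covered = set(tokenize(description))
    counts = Counter(
        term
        for query in missed
        for term in set(tokenize(query))  # count each term once per query
        if term not in covered
    )
    return counts.most_common(top_n)
```

This sketch does no stop-word filtering or phrase detection, so a real pattern extractor would need more than token counts to capture contextual variations.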
3. Proposal generation
Multiple candidate descriptions are generated in parallel. Each candidate attempts to cover more real-world query patterns while preserving existing trigger accuracy.
4. Validation
Every candidate is validated against the skill’s eval set, a collection of test queries with expected outcomes. A candidate that improves implicit triggers but breaks explicit ones is rejected.
5. Pareto selection
When multiple candidates pass validation, selftune uses Pareto multi-dimensional selection across invocation types (explicit, implicit, contextual, negative) to pick the best one. No single dimension is sacrificed for another.
6. Deploy or reject
If a candidate passes all gates, it’s deployed to the SKILL.md file with a backup (.bak). If no candidate improves enough, all are rejected and the pipeline can retry with different parameters.
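Steps 4 to 6 can be sketched as a filter followed by a Pareto front. The code below assumes each candidate has already been scored as a pass rate per invocation type on the eval set. The `Candidate` type, the "no regression, at least one improvement" rule, and the tie-break among non-dominated candidates are all assumptions made for illustration.

```python
from dataclasses import dataclass

DIMENSIONS = ("explicit", "implicit", "contextual", "negative")


@dataclass
class Candidate:
    name: str
    scores: dict  # pass rate in [0, 1] per invocation type (hypothetical)


def dominates(a: dict, b: dict) -> bool:
    """a dominates b if it is no worse everywhere and better somewhere."""
    return all(a[d] >= b[d] for d in DIMENSIONS) and any(
        a[d] > b[d] for d in DIMENSIONS
    )


def passes_validation(baseline: dict, scores: dict) -> bool:
    """Assumed rule: no dimension regresses and at least one improves."""
    return dominates(scores, baseline)


def pareto_front(candidates: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(o.scores, c.scores) for o in candidates if o is not c)
    ]


def select(baseline: dict, candidates: list[Candidate]):
    """Return the chosen candidate, or None to reject all of them."""
    valid = [c for c in candidates if passes_validation(baseline, c.scores)]
    front = pareto_front(valid)
    if not front:
        return None
    # Assumed tie-break: best worst-case dimension, then highest total.
    return max(front, key=lambda c: (min(c.scores.values()), sum(c.scores.values())))
```

Under this rule, a candidate that raises implicit accuracy while lowering explicit accuracy fails validation outright, which matches the rejection behavior described in step 4.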
Running evolution
Basic evolution:
Body evolution
Evolve the full skill body (routing tables, workflow instructions), not just the description:
selftune evolve, including replay-backed validation when the target host supports runtime replay (Claude Code, Codex, or OpenCode):
Rollback
If an evolution causes a regression, roll it back:
Model configuration
Control which models are used at each stage:

| Flag | Stage | Default behavior |
|---|---|---|
| --proposal-model | Generating candidates | Agent’s default model |
| --validation-model | Evaluating candidates | Agent’s default model |
| --gate-model | Final approval gate | Agent’s default model |
| --gate-effort | Gate thoroughness | Standard |
| --adaptive-gate | Auto-escalate for high-stakes | Disabled |
Evolution guards
See the CLI reference for selftune evolve for the full command syntax.