Overview

Evolution is selftune’s core capability: automatically improving skill descriptions based on real usage evidence. The pipeline generates candidate improvements, validates them against eval sets, and deploys the best one. If no candidate improves on the current description, all candidates are rejected.

The evolution pipeline

Pre-Gates → Extract Patterns → Generate Proposals → Validate → Pareto Select → Deploy (or Reject)

1. Pre-gates

Constitutional checks run before any LLM calls:
  • Size guard — rejects descriptions that are too long
  • XML rejection — blocks malformed output
  • Unbounded broadening guard — prevents descriptions from becoming too generic
  • Anchor preservation — ensures core trigger terms aren’t removed
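
To make the four checks concrete, here is a minimal sketch of how such gates could be written as plain deterministic functions that need no model call. It is not selftune's implementation: the function name, the thresholds, and the length-based proxy for "broadening" are all assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative thresholds only; these are assumptions, not selftune defaults.
MAX_CHARS = 1024
MAX_GROWTH_RATIO = 2.0  # how much longer a candidate may be than the original


@dataclass
class GateResult:
    passed: bool
    reason: str = ""


def run_pre_gates(original: str, candidate: str, anchors: list[str]) -> GateResult:
    """Run cheap, deterministic checks on a candidate description."""
    # Size guard: reject descriptions that are too long.
    if len(candidate) > MAX_CHARS:
        return GateResult(False, "size guard")
    # XML rejection: treat any tag-like markup as malformed output.
    if re.search(r"</?[A-Za-z][^>]*>", candidate):
        return GateResult(False, "xml rejection")
    # Broadening guard (crude proxy): reject large growth over the original.
    if len(candidate) > MAX_GROWTH_RATIO * len(original):
        return GateResult(False, "unbounded broadening")
    # Anchor preservation: every core trigger term must still be present.
    lowered = candidate.lower()
    missing = [a for a in anchors if a.lower() not in lowered]
    if missing:
        return GateResult(False, f"missing anchors: {', '.join(missing)}")
    return GateResult(True)
```

Ordering the gates from cheapest to most specific means a failing candidate is rejected at the first violated rule, and the reason string says which rule that was.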

2. Pattern extraction

selftune analyzes real session data to find:
  • Queries that should have triggered the skill but didn’t (false negatives)
  • Language patterns users actually use vs. what the description covers
  • Contextual variations that the current description misses
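
The idea of a false negative and a coverage gap can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `SessionRecord` shape, the `should_trigger` label, and the word-count heuristic are hypothetical and do not reflect selftune's actual session format or analysis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SessionRecord:
    query: str
    should_trigger: bool  # hypothetical label: the skill was the right tool
    triggered: bool       # whether the skill actually fired


def false_negatives(records):
    """Queries that should have triggered the skill but did not."""
    return [r.query for r in records if r.should_trigger and not r.triggered]


def uncovered_terms(queries, description, min_count=2):
    """Words that recur across missed queries but never appear in the description."""
    covered = set(description.lower().split())
    # Count each word once per query so a single long query cannot dominate.
    counts = Counter(word for q in queries for word in set(q.lower().split()))
    return [w for w, n in counts.most_common() if n >= min_count and w not in covered]
```

With this sketch, if users keep saying "ship" while the description only says "deploy", `uncovered_terms` surfaces "ship" as language the description does not cover.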

3. Proposal generation

Multiple candidate descriptions are generated in parallel. Each candidate attempts to cover more real-world query patterns while preserving existing trigger accuracy.
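
As a rough picture of parallel generation, the sketch below fans out several proposal calls and collects the results in a stable order. The `propose` function is a hypothetical stand-in for a model call; how selftune actually prompts for and schedules candidates is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def propose(description: str, missed_terms: list[str], variant: int) -> str:
    """Hypothetical stand-in for a model call that drafts one candidate."""
    # Each variant folds in a different number of missed terms.
    extra = missed_terms[: variant + 1]
    return f"{description} Also applies to: {', '.join(extra)}."


def generate_candidates(description: str, missed_terms: list[str], n: int = 3) -> list[str]:
    """Draft n candidates concurrently and return them in submission order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        futures = [pool.submit(propose, description, missed_terms, i) for i in range(n)]
        # Reading results in submission order keeps the output reproducible.
        return [f.result() for f in futures]
```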

4. Validation

Every candidate is validated against the skill’s eval set — a collection of test queries with expected outcomes. A candidate that improves implicit triggers but breaks explicit ones is rejected.
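
The rejection rule above can be illustrated with per-type pass rates. This is a sketch, not selftune's validator: `EvalCase`, `pass_rates`, and the `predict` callback (a function that says whether the skill would trigger for a query) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    query: str
    invocation_type: str  # "explicit", "implicit", "contextual", or "negative"
    should_trigger: bool


def pass_rates(cases, predict):
    """Fraction of cases predicted correctly, grouped by invocation type."""
    totals, correct = {}, {}
    for case in cases:
        t = case.invocation_type
        totals[t] = totals.get(t, 0) + 1
        if predict(case.query) == case.should_trigger:
            correct[t] = correct.get(t, 0) + 1
    return {t: correct.get(t, 0) / totals[t] for t in totals}


def validate(baseline, candidate):
    """Accept only if no invocation type regresses and at least one improves."""
    no_regression = all(candidate[t] >= baseline[t] for t in baseline)
    improves = any(candidate[t] > baseline[t] for t in baseline)
    return no_regression and improves
```

Comparing rates per invocation type, rather than one overall score, is what lets a gain on implicit queries be rejected when it comes with a loss on explicit ones.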

5. Pareto selection

When multiple candidates pass validation, selftune uses Pareto multi-dimensional selection across invocation types (explicit, implicit, contextual, negative) to pick the best one. No single dimension is sacrificed for another.
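
For readers unfamiliar with Pareto selection, the following sketch shows the general technique over the four invocation types. The score dictionaries and the tie-break rule (prefer the candidate whose weakest dimension is strongest) are assumptions; the source does not say how selftune chooses among non-dominated candidates.

```python
DIMENSIONS = ("explicit", "implicit", "contextual", "negative")


def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(a[d] >= b[d] for d in DIMENSIONS) and any(a[d] > b[d] for d in DIMENSIONS)


def pareto_front(scores):
    """Names of candidates that no other candidate dominates."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(other, s) for o_name, other in scores.items() if o_name != name)
    ]


def select(scores):
    """Pick from the front by the best worst-case dimension (an assumed tie-break)."""
    front = pareto_front(scores)
    return max(front, key=lambda name: min(scores[name][d] for d in DIMENSIONS))
```

A dominated candidate is never chosen, because another candidate matches or beats it on every dimension.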

6. Deploy or reject

If a candidate passes all gates, it’s deployed to the SKILL.md file with a backup (.bak). If no candidate improves enough, all are rejected and the pipeline can retry with different parameters.
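
The backup-then-write step might look like the sketch below. It assumes the backup sits next to the original as `SKILL.md.bak` and that the whole file is rewritten; the source only states that a `.bak` backup is kept, so the exact file name and write strategy here are illustrative.

```python
import shutil
from pathlib import Path


def deploy(skill_path: Path, new_text: str) -> Path:
    """Copy the current file to a .bak sibling, then write the new content."""
    # Assumed naming: SKILL.md -> SKILL.md.bak in the same directory.
    backup = skill_path.with_name(skill_path.name + ".bak")
    shutil.copy2(skill_path, backup)  # copy2 also preserves file metadata
    skill_path.write_text(new_text, encoding="utf-8")
    return backup
```

Taking the copy before writing means the previous version is still on disk if the write is interrupted or the change later needs to be reverted.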

Running evolution

Basic evolution:

    selftune evolve --skill my-skill --skill-path path/to/SKILL.md

Dry run (see what would change without deploying):

    selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run

With Pareto multi-candidate selection:

    selftune evolve --skill my-skill --skill-path path/to/SKILL.md --pareto --candidates 5

Cost-optimized mode (Haiku for iteration, Sonnet for final gate):

    selftune evolve --skill my-skill --skill-path path/to/SKILL.md --cheap-loop
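
If you drive these commands from a script, a small helper that assembles the argument list can reduce quoting mistakes. The sketch below uses only the flags shown above. The helper name is hypothetical, and whether the flags can be combined in a single invocation is an assumption to check against the CLI reference.

```python
import shlex


def build_evolve_command(skill, skill_path, *, dry_run=False, candidates=None, cheap_loop=False):
    """Assemble a `selftune evolve` argument list from the documented flags."""
    cmd = ["selftune", "evolve", "--skill", skill, "--skill-path", skill_path]
    if dry_run:
        cmd.append("--dry-run")
    if candidates is not None:
        # --pareto and --candidates appear together in the documented example.
        cmd += ["--pareto", "--candidates", str(candidates)]
    if cheap_loop:
        cmd.append("--cheap-loop")
    return cmd


# Print a shell-safe version of the dry-run command for review.
print(shlex.join(build_evolve_command("my-skill", "path/to/SKILL.md", dry_run=True)))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues for paths that contain spaces.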

Body evolution

Evolve the full skill body (routing tables, workflow instructions), not just the description:

    selftune evolve body --skill my-skill --skill-path path/to/SKILL.md --target body

Body evolution uses a teacher-student model with 3-gate validation (structural, trigger, quality). It supports the same validation strategies as selftune evolve, including replay-backed validation when the target host supports runtime replay (Claude Code, Codex, or OpenCode):

    selftune evolve body --skill my-skill --skill-path path/to/SKILL.md --target routing --validation-mode replay

Rollback

If an evolution causes a regression, roll it back:

    selftune evolve rollback --skill my-skill --skill-path path/to/SKILL.md

Or roll back a specific proposal:

    selftune evolve rollback --skill my-skill --skill-path path/to/SKILL.md --proposal-id abc123

Model configuration

Control which models are used at each stage:
Flag                 Stage                          Default behavior
--proposal-model     Generating candidates          Agent’s default model
--validation-model   Evaluating candidates          Agent’s default model
--gate-model         Final approval gate            Agent’s default model
--gate-effort        Gate thoroughness              Standard
--adaptive-gate      Auto-escalate for high-stakes  Disabled

Evolution guards

selftune installs a hook that prevents unreviewed SKILL.md changes during active evolutions. This prevents conflicts between manual edits and the evolution pipeline.
See the CLI reference for selftune evolve for the full command syntax.