Overview
Evals (evaluation sets) are collections of test queries annotated with expected behavior. They’re the ground truth that evolution validates against — no candidate description is deployed unless it passes the eval set.
Generating evals
Generate evals from real usage logs:
selftune eval generate --skill my-skill
Generate synthetic evals from SKILL.md content (useful for new skills):
selftune eval generate --skill my-skill --synthetic --skill-path path/to/SKILL.md
Options
| Flag | Description |
|---|---|
| --max N | Maximum number of eval entries to generate |
| --seed N | Random seed for reproducible generation |
| --output PATH | Write eval set to a specific file |
| --list-skills | List all skills with available data |
| --stats | Show eval generation statistics |
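
How `--max` and `--seed` combine is not spelled out on this page. The following is a minimal Python sketch of capped, seeded sampling, assuming candidates are drawn uniformly at random; the function name `sample_evals` and the sampling strategy are hypothetical and not taken from selftune itself.

```python
import random

def sample_evals(candidates, max_entries, seed):
    """Pick at most max_entries candidates; the same seed gives the same picks.

    Hypothetical illustration only, not selftune's implementation.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    if len(candidates) <= max_entries:
        return list(candidates)
    return rng.sample(candidates, max_entries)

queries = [f"query {i}" for i in range(20)]
first = sample_evals(queries, max_entries=5, seed=42)
second = sample_evals(queries, max_entries=5, seed=42)
print(first == second)  # identical seed, identical selection
```

Fixing the seed matters when an eval set is regenerated later: a stable selection keeps validation results comparable between runs.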
Eval structure
Each eval entry contains:
- Query — the user input to test
- Expected skill — which skill should trigger (or none)
- Invocation type — explicit, implicit, contextual, or negative
- Expected outcome — pass or fail
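
The four fields above could be modeled roughly as follows. This is a sketch under assumptions: the class name `EvalEntry`, the field names, and the use of `None` for "no skill" are hypothetical, and the on-disk format selftune writes may differ.

```python
from dataclasses import dataclass
from typing import Optional

INVOCATION_TYPES = {"explicit", "implicit", "contextual", "negative"}
OUTCOMES = {"pass", "fail"}

@dataclass(frozen=True)
class EvalEntry:
    """One eval entry; field names are illustrative, not selftune's schema."""
    query: str                     # the user input to test
    expected_skill: Optional[str]  # None means no skill should trigger
    invocation_type: str           # one of INVOCATION_TYPES
    expected_outcome: str          # "pass" or "fail"

    def __post_init__(self):
        if self.invocation_type not in INVOCATION_TYPES:
            raise ValueError(f"unknown invocation type: {self.invocation_type}")
        if self.expected_outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {self.expected_outcome}")

# Illustrative values only.
positive = EvalEntry("use my-skill to format this file", "my-skill", "explicit", "pass")
negative = EvalEntry("what is the weather today?", None, "negative", "pass")
```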
Unit tests
Write deterministic unit tests for skill triggers:
selftune eval unit-test --skill my-skill --tests path/to/tests.json
Generate unit tests automatically:
selftune eval unit-test --skill my-skill --generate
Run unit tests with a live agent:
selftune eval unit-test --skill my-skill --tests path/to/tests.json --run-agent
Composability analysis
Check how a skill interacts with other skills in the same agent:
selftune eval composability --skill my-skill
This analyzes a sliding window of sessions to detect:
- Skills that compete for the same queries
- Skills that block or interfere with each other
- Multi-skill workflows that should be documented
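
As a rough illustration of the first check, the sketch below looks at the most recent `window` sessions and reports pairs of skills that triggered on the same query. The session shape (a list of `(query, skill)` tuples per session) and the function `competing_pairs` are assumptions made for this example; the actual analysis in selftune is likely richer.

```python
from collections import defaultdict
from itertools import combinations

def competing_pairs(sessions, window):
    """Return skill pairs that triggered on the same query in the last `window` sessions.

    `sessions` is a list of sessions, oldest first; each session is a list of
    (query, skill) tuples. Hypothetical data shape, for illustration only.
    """
    if window <= 0:
        return set()
    skills_by_query = defaultdict(set)
    for session in sessions[-window:]:  # only the most recent sessions
        for query, skill in session:
            skills_by_query[query].add(skill)
    pairs = set()
    for skills in skills_by_query.values():
        pairs.update(combinations(sorted(skills), 2))
    return pairs

sessions = [
    [("deploy the app", "deploy-skill")],
    [("deploy the app", "release-skill"), ("run tests", "test-skill")],
    [("run tests", "test-skill")],
]
print(competing_pairs(sessions, window=3))
```

A smaller window narrows the analysis to recent behavior, so conflicts that only appear in older sessions drop out of the result.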
Options
| Flag | Description |
|---|---|
| --window N | Number of recent sessions to analyze |
| --telemetry-log PATH | Use a specific telemetry log file |
Family overlap detection
For skill families that share a common prefix, detect overlap:
selftune eval family-overlap --prefix my-family-
Or specify skills explicitly:
selftune eval family-overlap --skills skill-a,skill-b,skill-c
Options
| Flag | Description |
|---|---|
| --parent-skill NAME | Specify a parent skill for hierarchy analysis |
| --min-overlap 0.3 | Minimum overlap threshold to report |
| --min-shared 2 | Minimum shared queries to report |
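
To show how the two thresholds might interact, here is a minimal sketch that scores each pair of skills by the Jaccard overlap of their query sets and reports only pairs that clear both limits. The choice of Jaccard, the requirement that both thresholds hold, and the function `family_overlap` are assumptions; this page does not define the overlap metric.

```python
from itertools import combinations

def family_overlap(queries_by_skill, min_overlap=0.3, min_shared=2):
    """Report skill pairs whose query sets overlap enough.

    Overlap is computed as Jaccard similarity (shared / union), which is an
    assumption for this sketch, not a documented selftune formula.
    """
    report = []
    for a, b in combinations(sorted(queries_by_skill), 2):
        shared = queries_by_skill[a] & queries_by_skill[b]
        union = queries_by_skill[a] | queries_by_skill[b]
        overlap = len(shared) / len(union) if union else 0.0
        if overlap >= min_overlap and len(shared) >= min_shared:
            report.append((a, b, len(shared), round(overlap, 2)))
    return report

queries_by_skill = {
    "my-family-a": {"q1", "q2", "q3"},
    "my-family-b": {"q2", "q3", "q4"},
    "my-family-c": {"q5"},
}
print(family_overlap(queries_by_skill))
```

Requiring a minimum number of shared queries in addition to a ratio keeps tiny query sets from producing noisy reports: two skills with one query each and one in common would otherwise score a perfect overlap.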
Importing evals
Import evaluation data from external sources:
selftune eval import --dir path/to/data --skill my-skill --output path/to/eval-set.json
| Flag | Description |
|---|---|
| --match-strategy exact\|fuzzy | How to match queries to skills |
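
The difference between the two strategies can be sketched as follows. This is an illustration under assumptions, not selftune's matching code: the function `match_skill`, the case-insensitive normalization, the 0.8 threshold, and the use of `difflib.SequenceMatcher` for fuzzy scoring are all hypothetical.

```python
from difflib import SequenceMatcher

def match_skill(query, known_queries, strategy="exact", threshold=0.8):
    """Map a query to a skill using exact or fuzzy matching.

    `known_queries` maps lowercase query text to a skill name.
    Hypothetical helper, for illustration only.
    """
    normalized = query.strip().lower()
    if strategy == "exact":
        return known_queries.get(normalized)
    if strategy == "fuzzy":
        best_skill, best_score = None, 0.0
        for known, skill in known_queries.items():
            score = SequenceMatcher(None, normalized, known).ratio()
            if score > best_score:
                best_skill, best_score = skill, score
        # Accept the closest match only if it is similar enough.
        return best_skill if best_score >= threshold else None
    raise ValueError(f"unknown strategy: {strategy}")

known = {"deploy the app": "skill-a"}
print(match_skill("deploy the apps", known, "exact"))
print(match_skill("deploy the apps", known, "fuzzy"))
```

Exact matching is stricter and avoids mislabeling, while fuzzy matching tolerates small wording differences at the cost of occasional false matches.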