Usage

selftune eval <subcommand> [options]

Subcommands

Start with the lifecycle entrypoints, not the low-level stage commands:
selftune status
selftune verify --skill-path path/to/SKILL.md
# If verify returns a next_command, run it, then rerun verify.
selftune publish --skill-path path/to/SKILL.md
The eval subcommands are the supporting steps that verify most often asks you to run when a draft still needs evidence. To drive the advanced draft-package loop manually, the stage-level sequence remains:
selftune eval generate --skill my-skill --skill-path path/to/SKILL.md
selftune eval unit-test --skill my-skill --generate --skill-path path/to/SKILL.md
selftune create replay --skill-path path/to/my-skill --mode package
selftune create baseline --skill-path path/to/my-skill --mode package
selftune verify --skill-path path/to/SKILL.md
selftune publish --skill-path path/to/SKILL.md
The dashboard, selftune status, and the per-skill report all read the artifacts from this flow. They use them to show what is still missing before you can trust a live deploy, and whether the skill has already moved into watch mode.
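The manual sequence above can be wrapped so that a failing stage stops the run. This is a minimal sketch, not part of selftune: run_draft_loop is a hypothetical helper name, and it assumes every selftune command exits non-zero on failure. It deliberately stops after verify, because verify may still return a next_command that has to be handled before publishing.

```shell
# Sketch: run the manual draft-package stages in order, stopping at the first failure.
# run_draft_loop is a hypothetical helper, not a selftune command.
# Assumption: each selftune command exits non-zero when it fails.
run_draft_loop() {
  local skill="$1"        # skill name, e.g. my-skill
  local skill_md="$2"     # path to SKILL.md
  local package_dir="$3"  # package directory used by the create commands

  selftune eval generate --skill "$skill" --skill-path "$skill_md" &&
  selftune eval unit-test --skill "$skill" --generate --skill-path "$skill_md" &&
  selftune create replay --skill-path "$package_dir" --mode package &&
  selftune create baseline --skill-path "$package_dir" --mode package &&
  selftune verify --skill-path "$skill_md"
}

# Example call (publish stays a separate, manual step):
#   run_draft_loop my-skill path/to/SKILL.md path/to/my-skill
```

Once verify reports nothing left to do, selftune publish --skill-path path/to/SKILL.md can be run by hand.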

generate

Generate eval sets from real usage or synthetically:
selftune eval generate --skill my-skill
selftune eval generate --skill my-skill --auto-synthetic --skill-path path/to/SKILL.md
selftune eval generate --skill my-skill --auto-synthetic --skill-path path/to/SKILL.md --agent opencode
selftune eval generate --skill my-skill --blend --skill-path path/to/SKILL.md
| Flag | Description |
| --- | --- |
| --skill NAME | Skill to generate evals for |
| --list-skills | List all skills with available data |
| --stats | Show eval generation statistics |
| --max N | Maximum entries per side to generate |
| --seed N | Random seed for reproducibility |
| --output PATH | Output file path |
| --no-negatives | Omit negative eval entries |
| --no-taxonomy | Skip invocation_type classification |
| --skill-log PATH | Override the skill usage log source |
| --agent NAME | Runtime agent for synthetic or blended eval generation (claude, codex, opencode, pi) |
| --query-log PATH | Override the query log source |
| --telemetry-log PATH | Override the telemetry log source |
| --synthetic | Generate from SKILL.md instead of real data |
| --auto-synthetic | Fall back to SKILL.md cold-start generation when trusted triggers do not exist |
| --blend | Merge log-based evals with synthetic gap-fillers |
| --skill-path PATH | Path to SKILL.md (required with --synthetic) |
| --model MODEL | Override the synthetic-generation model |
| --help | Show command help |
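One way to use the --agent flag is to try each listed runtime in turn until one succeeds. The sketch below is illustrative only: generate_with_fallback is a hypothetical helper, the agent order is arbitrary, and it assumes selftune exits non-zero when a runtime is unavailable or rate-limited, which this page does not confirm.

```shell
# Sketch: retry cold-start eval generation across the agents listed for --agent.
# generate_with_fallback is a hypothetical helper, not a selftune command.
# Assumption: a failed or rate-limited runtime makes selftune exit non-zero.
generate_with_fallback() {
  local skill="$1"     # skill name
  local skill_md="$2"  # path to SKILL.md
  local agent

  for agent in claude opencode codex pi; do
    if selftune eval generate --skill "$skill" --auto-synthetic \
         --skill-path "$skill_md" --agent "$agent"; then
      echo "generated with agent: $agent"
      return 0
    fi
    echo "agent $agent failed, trying next" >&2
  done
  return 1
}

# Example call:
#   generate_with_fallback my-skill path/to/SKILL.md
```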
selftune eval generate --help now prints the full flag list for the generate subcommand, including the cold-start and blended eval flags.
If Claude Code is rate-limited, or you want to force a different runtime, pass --agent opencode (or codex or pi) together with --synthetic, --auto-synthetic, or --blend.
Every successful generate run also mirrors a canonical copy to:
~/.selftune/eval-sets/<skill>.json
That canonical copy is what the local dashboard and selftune status use to decide whether a skill already has eval coverage. For new draft packages, the next steps after eval generate are usually rerunning verify or, if you are driving the advanced loop manually, continuing with create replay and create baseline.
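A script can look for the same canonical copy before deciding whether to generate evals. This is a rough sketch: eval_set_path and has_eval_set are hypothetical helper names, and the check only tests that the file exists and is non-empty, which may be looser than whatever the dashboard and selftune status apply.

```shell
# Sketch: check for the canonical eval set that generate mirrors per skill.
# eval_set_path and has_eval_set are hypothetical helpers, not selftune commands.
eval_set_path() {
  # Path documented above: ~/.selftune/eval-sets/<skill>.json
  printf '%s/.selftune/eval-sets/%s.json\n' "$HOME" "$1"
}

has_eval_set() {
  # True when the canonical copy exists and is not empty.
  [ -s "$(eval_set_path "$1")" ]
}

# Example: generate only when no canonical copy exists yet.
#   has_eval_set my-skill || selftune eval generate --skill my-skill
```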

unit-test

Run or generate deterministic unit tests:
selftune eval unit-test --skill my-skill --tests path/to/tests.json
selftune eval unit-test --skill my-skill --generate
selftune eval unit-test --skill my-skill --tests path/to/tests.json --run-agent
Generated test files live under:
~/.selftune/unit-tests/<skill>.json
After a run, selftune also stores the latest suite summary at:
~/.selftune/unit-tests/<skill>.last-run.json
That stored result feeds the draft-lifecycle readiness surfaces in the dashboard, skill report, and selftune status.
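The two paths above are enough to tell roughly where a skill stands without parsing either file. The following sketch assumes nothing about the JSON contents; unit_test_status and its three status labels are hypothetical and exist only for illustration.

```shell
# Sketch: classify a skill by which unit-test artifacts exist on disk.
# unit_test_status and its labels (has-run, generated, none) are hypothetical.
unit_test_status() {
  local skill="$1"
  local dir="$HOME/.selftune/unit-tests"

  if [ -f "$dir/$skill.last-run.json" ]; then
    echo "has-run"      # a suite summary was stored after a run
  elif [ -f "$dir/$skill.json" ]; then
    echo "generated"    # generated tests exist but no run is recorded
  else
    echo "none"         # neither artifact is present
  fi
}

# Example call:
#   unit_test_status my-skill
```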

composability

Analyze cross-skill interactions:
selftune eval composability --skill my-skill [--window N] [--telemetry-log PATH]

family-overlap

Detect overlap within skill families:
selftune eval family-overlap --prefix my-family-
selftune eval family-overlap --skills a,b,c [--parent-skill NAME] [--min-overlap 0.3] [--min-shared 2]

import

Import evaluation data from external sources:
selftune eval import --dir PATH --skill NAME --output PATH [--match-strategy exact|fuzzy]
See Evals concepts for more on how evaluation sets work.