Usage

selftune improve --skill <name> --skill-path <path> [options]
selftune evolve --skill <name> --skill-path <path> [options]
Runs the full evolution loop: generates a candidate improvement, validates it against an eval set, and deploys it if it meets the quality bar.

selftune improve is the front door for the simplified lifecycle surface. It maps --scope auto|description|routing|body|package onto evolve, evolve body --target ..., or bounded package search through search-run. auto still defaults to description-surface evolution unless you explicitly choose a broader scope.

--confidence no longer skips validation on its own. selftune always measures the proposal against replay or judge validation first, then uses the confidence value as review metadata for warnings and adaptive-gate risk escalation.
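The scope-to-command mapping can be pictured as a small case statement. This is a hypothetical sketch, not selftune's source: the function name scope_to_command is invented, routing and body are assumed to share the evolve body path, and the --target value is left out because this page does not spell it out.

```shell
# Hypothetical sketch: which underlying command family each --scope value of
# `selftune improve` resolves to. Not selftune's actual implementation.
scope_to_command() {
  case "$1" in
    auto|description)
      # auto still defaults to description-surface evolution.
      echo "evolve" ;;
    routing|body)
      # Assumption: both surfaces go through `evolve body --target ...`.
      # The --target value is not documented here, so it is left out.
      echo "evolve body" ;;
    package)
      # Bounded package search.
      echo "search-run" ;;
    *)
      echo "unknown scope: $1" >&2
      return 2 ;;
  esac
}

scope_to_command auto     # prints: evolve
scope_to_command package  # prints: search-run
```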

Options

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --skill | string | | Required. Skill name to evolve |
| --skill-path | string | | Required. Path to the skill’s SKILL.md |
| --scope | auto, description, routing, body, package | auto | Alias-only scope selector for selftune improve |
| --eval-set | string | Auto-generated | Use a pre-built eval set instead of building one from logs |
| --agent | claude, codex, opencode | Auto-detected | Agent runtime to target |
| --dry-run | boolean | false | Validate proposals without deploying |
| --confidence | number | 0.6 | Low-confidence review threshold |
| --validation-mode | auto, replay, judge | auto | Validation strategy to use |
| --max-iterations | number | 3 | Maximum evolution iterations |
| --pareto | boolean | true | Keep Pareto multi-candidate selection enabled |
| --candidates | number | 3 | Candidate count when Pareto mode is enabled |
| --token-efficiency | boolean | false | Score proposals with token-efficiency weighting |
| --with-baseline | boolean | false | Gate deploys on a no-skill baseline lift |
| --validation-model | string | haiku | Model for trigger-check validation calls |
| --cheap-loop | boolean | true | Use cheaper models in the inner loop and a stronger gate |
| --full-model | boolean | false | Use one model for all stages |
| --gate-model | string | sonnet | Model for the final validation gate |
| --gate-effort | string | | Override the final-gate thinking effort |
| --adaptive-gate | boolean | false | Escalate risky gate checks to opus with higher effort |
| --proposal-model | string | Agent default | Override the proposal-generation model |
| --sync-first | boolean | false | Sync source-truth telemetry before evolving |
| --sync-force | boolean | false | Force a full rescan during --sync-first |
| --verbose | boolean | false | Print detailed progress output |
| --help | boolean | false | Show command help |
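The --confidence option is easiest to read as a step that happens after validation, never instead of it. The sketch below is a hypothetical shell model, not selftune's implementation: review_proposal and its pass/fail input are invented names, and the 0.6 default simply mirrors the documented --confidence default.

```shell
# Hypothetical sketch: confidence is applied only after validation, as review
# metadata. It never lets a proposal skip validation.
# Usage: review_proposal VALIDATION_RESULT CONFIDENCE [THRESHOLD]
review_proposal() {
  local result="$1"            # "pass" or "fail" from replay/judge validation
  local confidence="$2"        # confidence attached to the proposal
  local threshold="${3:-0.6}"  # mirrors the documented --confidence default

  # A proposal that fails validation is rejected, however confident it is.
  if [ "$result" != "pass" ]; then
    echo "reject"
    return 1
  fi

  # Below the threshold, the proposal stays eligible but is flagged for review.
  if awk -v c="$confidence" -v t="$threshold" 'BEGIN { exit !(c + 0 < t + 0) }'; then
    echo "accept (warning: low confidence)"
  else
    echo "accept"
  fi
}

review_proposal pass 0.45  # prints: accept (warning: low confidence)
review_proposal pass 0.80  # prints: accept
```

In the real tool the low-confidence flag also feeds adaptive-gate risk escalation; this sketch only models the warning.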

Validation modes

selftune evolve validates every proposal before deploying it. The mode controls how validation runs:
| Mode | Behavior |
| --- | --- |
| auto | Uses replay validation if available; falls back to judge |
| replay | Requires replay validation; fails if unavailable |
| judge | Uses an LLM judge to score the proposal |
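One way to picture how the three modes resolve, including the fallback behavior described under automatic replay validation, is the hypothetical resolver below. The function name and the yes/no availability argument are assumptions for illustration; the real tool detects replay support from the host runtime.

```shell
# Hypothetical sketch of --validation-mode resolution.
# Usage: resolve_validation MODE REPLAY_AVAILABLE   (REPLAY_AVAILABLE: yes|no)
resolve_validation() {
  local mode="$1" replay_available="$2"
  case "$mode" in
    judge)
      echo "judge" ;;
    replay)
      if [ "$replay_available" = yes ]; then
        echo "replay"
      else
        # replay mode never silently downgrades; it fails instead.
        echo "REPLAY_UNAVAILABLE" >&2
        return 1
      fi ;;
    auto)
      if [ "$replay_available" = yes ]; then
        echo "replay"
      else
        # auto falls back to judge; selftune also records a
        # validation_fallback_reason in the audit/evidence trail.
        echo "judge"
      fi ;;
    *)
      echo "unknown validation mode: $mode" >&2
      return 2 ;;
  esac
}

resolve_validation auto no  # prints: judge
```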

Automatic replay validation

When --validation-mode is auto or replay and the target agent supports runtime replay, selftune automatically constructs a replay fixture from the skill’s SKILL.md. Agents that support runtime replay today are Claude Code, Codex, and OpenCode. No --replay-fixture flag is needed.

Replay validation stages the candidate skill content into a temporary local registry and observes the runtime’s actual routing decision for each eval query. Description evolution stages the proposed description; routing evolution stages the proposed ## Workflow Routing section; body evolution stages the full candidate body while preserving the original frontmatter and title.

If real host/runtime replay is unavailable, auto falls back to judge validation and records a validation_fallback_reason in the audit/evidence trail. In replay mode, selftune exits with REPLAY_UNAVAILABLE instead of silently downgrading to fixture simulation.
# auto mode uses replay automatically on supported hosts
selftune evolve --skill my-skill --skill-path path/to/SKILL.md

# Force judge-only (skip replay)
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --validation-mode judge
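To make the routing-surface staging step concrete, here is a hypothetical sketch of swapping a candidate section into a temporary copy of SKILL.md. The function name stage_routing, the flat temporary-directory layout, and the rule that the section ends at the next # or ## heading are all assumptions; selftune's actual registry format is not documented on this page.

```shell
# Hypothetical sketch: copy SKILL.md into a fresh temporary directory, replacing
# only the "## Workflow Routing" section with a candidate section.
# Usage: stage_routing ORIGINAL_SKILL_MD CANDIDATE_SECTION_FILE
# Prints the path of the staged copy.
stage_routing() {
  local skill_md="$1" candidate="$2" stage_dir
  stage_dir="$(mktemp -d)" || return 1

  awk -v cand="$candidate" '
    # At the routing heading, emit the candidate section instead.
    $0 == "## Workflow Routing" {
      while ((getline line < cand) > 0) print line
      close(cand)
      skipping = 1
      next
    }
    # The next level-1 or level-2 heading ends the replaced section.
    skipping && /^##? / { skipping = 0 }
    # Frontmatter, title, and all other sections are copied unchanged.
    !skipping { print }
  ' "$skill_md" > "$stage_dir/SKILL.md" || return 1

  echo "$stage_dir/SKILL.md"
}
```

Deeper headings such as ### inside the routing section are treated as part of it and replaced along with it.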

How it works

  1. Proposal generation — produces a candidate update to the skill description, routing, or body
  2. Eval construction — builds an eval set from session history (or synthetic data for cold-start skills)
  3. Validation — scores the proposal against the eval set using replay or judge
  4. Pareto check — accepts the proposal only if it improves pass rate without regressing other signals
  5. Deployment — writes the accepted proposal to the skill and records it in the audit log
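The acceptance rule in step 4 can be expressed as a small comparison. This is a hypothetical sketch: pareto_accept and the comma-separated score format are invented, every signal is assumed to be higher-is-better, and it models only the single-proposal rule stated in step 4, not multi-candidate selection with --candidates.

```shell
# Hypothetical sketch of the step-4 check. Each argument is a comma-separated
# list of scores: the first is the eval pass rate, the rest are other signals.
# Exit 0 = accept, 1 = reject, 2 = mismatched signal lists.
pareto_accept() {
  awk -v base="$1" -v cand="$2" 'BEGIN {
    n = split(base, b, ",")
    if (split(cand, c, ",") != n) exit 2      # both lists must cover the same signals
    if (!(c[1] + 0 > b[1] + 0)) exit 1        # pass rate must strictly improve
    for (i = 2; i <= n; i++)
      if (c[i] + 0 < b[i] + 0) exit 1         # no other signal may regress
    exit 0
  }'
}

pareto_accept "0.70,0.90" "0.78,0.90" && echo "accept"  # prints: accept
```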

Examples

# Simplified lifecycle alias
selftune improve --skill my-skill --skill-path path/to/SKILL.md --scope description --dry-run --validation-mode replay

# Bounded package search through the primary lifecycle alias
selftune improve --skill my-skill --skill-path path/to/SKILL.md --scope package --eval-set path/to/evals.json

# Keep search review-only while still using package scope
selftune improve --skill my-skill --skill-path path/to/SKILL.md --scope package --dry-run --eval-set path/to/evals.json

# Standard evolution run
selftune evolve --skill my-skill --skill-path path/to/SKILL.md

# Evolve with fresh data
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --sync-first

# Force judge validation
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --validation-mode judge

# Route to routing-surface evolution
selftune improve --skill my-skill --skill-path path/to/SKILL.md --scope routing --dry-run --validation-mode replay

# More iterations for stubborn skills
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --max-iterations 5 --verbose
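Because --dry-run validates without deploying, several scopes can be reviewed in one pass. The review_scopes helper below is a hypothetical convenience wrapper that only uses flags listed in the options table; the skill name and path in the usage comment are placeholders.

```shell
# Hypothetical helper: run `selftune improve --dry-run` once per scope so the
# proposals for each surface can be compared before anything is deployed.
# Usage: review_scopes SKILL SKILL_PATH SCOPE [SCOPE...]
review_scopes() {
  local skill="$1" skill_path="$2" scope
  shift 2
  for scope in "$@"; do
    echo "== scope: $scope =="
    # A failure for one scope is reported but does not stop the others.
    selftune improve --skill "$skill" --skill-path "$skill_path" \
      --scope "$scope" --dry-run --verbose || echo "scope $scope failed" >&2
  done
}

# Example:
# review_scopes my-skill path/to/SKILL.md description routing body
```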

Troubleshooting

Replay unavailable in replay mode:
Error: Replay validation requested but no replay fixture or runner is available.
Switch to --validation-mode auto to allow judge fallback, or verify that the skill has a valid SKILL.md at the expected path.

No eval data:
Run selftune sync to ingest recent session data before evolving, or use a cold-start eval set.
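The two fixes for missing replay support and missing eval data can be combined into a small wrapper. This is a hypothetical sketch: evolve_with_fallback is an invented name, and it assumes a replay failure exits non-zero and mentions either REPLAY_UNAVAILABLE or the error text shown above in its output.

```shell
# Hypothetical wrapper: sync session data, try strict replay validation, and
# retry with --validation-mode auto only when replay itself is unavailable.
# Usage: evolve_with_fallback SKILL SKILL_PATH
evolve_with_fallback() {
  local skill="$1" skill_path="$2" output

  # Ingest recent session data first so the eval set is not empty.
  selftune sync || return 1

  # Capture stdout and stderr so the failure reason can be inspected.
  if output=$(selftune evolve --skill "$skill" --skill-path "$skill_path" \
      --validation-mode replay 2>&1); then
    printf '%s\n' "$output"
    return 0
  fi
  printf '%s\n' "$output" >&2

  case "$output" in
    *REPLAY_UNAVAILABLE*|*"Replay validation requested"*)
      # Let selftune fall back to judge validation.
      selftune evolve --skill "$skill" --skill-path "$skill_path" --validation-mode auto ;;
    *)
      # Any other failure is surfaced unchanged.
      return 1 ;;
  esac
}
```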