Usage
`selftune improve` is the recommended entry point. It maps `--scope auto|description|routing|body|package` onto `evolve`, `evolve body --target ...`, or a bounded package search through `search-run`. `auto` still defaults to description evolution unless you explicitly choose a broader scope.
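The alias mapping works like a lookup table. This is a minimal sketch, not selftune's source. The `SCOPE_BACKENDS` dict and `resolve_scope` helper are hypothetical. Sending `routing` to `evolve body` is an assumption, based on routing evolution editing the `## Workflow Routing` section of the body. The exact `--target` values are deliberately left out.

```python
# Hypothetical sketch of how `selftune improve --scope` picks a backend.
SCOPE_BACKENDS = {
    "auto": "evolve",           # auto still means description evolution
    "description": "evolve",
    "routing": "evolve body",   # assumption: routing goes through `evolve body --target ...`
    "body": "evolve body",      # `evolve body --target ...`
    "package": "search-run",    # bounded package search
}


def resolve_scope(scope: str = "auto") -> str:
    """Return the underlying command family for a --scope value."""
    try:
        return SCOPE_BACKENDS[scope]
    except KeyError:
        raise ValueError(f"unknown --scope value: {scope!r}") from None


if __name__ == "__main__":
    for scope in SCOPE_BACKENDS:
        print(f"improve --scope {scope:<11} -> {resolve_scope(scope)}")
```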
Setting `--confidence` no longer skips validation on its own. selftune always validates the proposal with replay or judge validation first. It then treats the confidence value as review metadata, which drives low-confidence warnings and adaptive-gate risk escalation.
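Here is a minimal sketch of that ordering. It assumes each proposal carries a numeric confidence score and that a below-threshold score is what marks a gate check as risky. `Review` and `review_proposal` are hypothetical names. The `0.6`, `sonnet`, and `opus` values come from the options table, and effort escalation is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    proceed: bool                      # validation passed, so the proposal may reach the gate
    gate_model: str                    # model used for the final validation gate
    warnings: list[str] = field(default_factory=list)


def review_proposal(validation_passed: bool, confidence: float, *,
                    threshold: float = 0.6, gate_model: str = "sonnet",
                    adaptive_gate: bool = False) -> Review:
    """Sketch: confidence annotates a validated proposal; it never replaces validation."""
    if not validation_passed:
        # Validation ran first and failed; no confidence value can rescue the proposal.
        return Review(proceed=False, gate_model=gate_model, warnings=["validation failed"])

    review = Review(proceed=True, gate_model=gate_model)
    if confidence < threshold:
        review.warnings.append(f"low confidence: {confidence:.2f} < {threshold:.2f}")
        if adaptive_gate:
            # Assumption: low confidence is the risk signal that triggers escalation.
            review.gate_model = "opus"
    return review
```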
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| `--skill` | string | — | Required. Skill name to evolve |
| `--skill-path` | string | — | Required. Path to the skill’s `SKILL.md` |
| `--scope` | `auto`, `description`, `routing`, `body`, `package` | `auto` | Alias-only scope selector for `selftune improve` |
| `--eval-set` | string | Auto-generated | Use a pre-built eval set instead of building one from logs |
| `--agent` | `claude`, `codex`, `opencode` | Auto-detected | Agent runtime to target |
| `--dry-run` | boolean | `false` | Validate proposals without deploying |
| `--confidence` | number | `0.6` | Low-confidence review threshold |
| `--validation-mode` | `auto`, `replay`, `judge` | `auto` | Validation strategy to use |
| `--max-iterations` | number | `3` | Maximum evolution iterations |
| `--pareto` | boolean | `true` | Keep Pareto multi-candidate selection enabled |
| `--candidates` | number | `3` | Candidate count when Pareto mode is enabled |
| `--token-efficiency` | boolean | `false` | Score proposals with token-efficiency weighting |
| `--with-baseline` | boolean | `false` | Gate deploys on a no-skill baseline lift |
| `--validation-model` | string | `haiku` | Model for trigger-check validation calls |
| `--cheap-loop` | boolean | `true` | Use cheaper models in the inner loop and a stronger gate |
| `--full-model` | boolean | `false` | Use one model for all stages |
| `--gate-model` | string | `sonnet` | Model for the final validation gate |
| `--gate-effort` | string | — | Override the final-gate thinking effort |
| `--adaptive-gate` | boolean | `false` | Escalate risky gate checks to `opus` with higher effort |
| `--proposal-model` | string | Agent default | Override the proposal-generation model |
| `--sync-first` | boolean | `false` | Sync source-truth telemetry before evolving |
| `--sync-force` | boolean | `false` | Force a full rescan during `--sync-first` |
| `--verbose` | boolean | `false` | Print detailed progress output |
| `--help` | boolean | `false` | Show command help |
Validation modes
`selftune evolve` validates every proposal before deploying it. `--validation-mode` controls how validation runs:
| Mode | Behavior |
|---|---|
| `auto` | Uses replay validation if available; otherwise falls back to judge validation |
| `replay` | Requires replay validation; fails if it is unavailable |
| `judge` | Uses an LLM judge to score the proposal |
Automatic replay validation
When `--validation-mode` is `auto` or `replay` and the target agent supports runtime replay (currently Claude Code, Codex, and OpenCode), selftune automatically builds a replay fixture from the skill’s `SKILL.md`. No `--replay-fixture` flag is needed.
Replay validation stages the candidate skill content in a temporary local registry and observes the runtime’s actual routing decision for each eval query. What gets staged depends on the evolution target:
- Description evolution stages the proposed description.
- Routing evolution stages the proposed `## Workflow Routing` section.
- Body evolution stages the full candidate body while preserving the original frontmatter and title.
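The three staging rules can be sketched as plain string transforms over `SKILL.md`. The sketch makes several assumptions. It expects YAML frontmatter fenced by `---` lines with a single-line `description:` field, which is the layout Claude Code skills use. It also expects a `# ` title line and `## ` second-level headings. The helper names are hypothetical, and the sketch stops before the temporary registry and the actual replay run.

```python
import re

FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
TITLE = re.compile(r"\s*# [^\n]*\n")
ROUTING = re.compile(r"^## Workflow Routing\n.*?(?=^## |\Z)", re.DOTALL | re.MULTILINE)


def split_skill(text: str) -> tuple[str, str, str]:
    """Split SKILL.md into (frontmatter, title line, body)."""
    front_match = FRONTMATTER.match(text)
    front = front_match.group(0) if front_match else ""
    rest = text[len(front):]
    title_match = TITLE.match(rest)
    title = title_match.group(0) if title_match else ""
    return front, title, rest[len(title):]


def stage_description(text: str, description: str) -> str:
    """Swap the frontmatter `description:` line (assumed single-line YAML)."""
    front, title, body = split_skill(text)
    new_front = re.sub(r"(?m)^description:.*$",
                       lambda _: f"description: {description}", front, count=1)
    return new_front + title + body


def stage_routing(text: str, section: str) -> str:
    """Replace the `## Workflow Routing` section up to the next `## ` heading or EOF."""
    replacement = "## Workflow Routing\n" + section.rstrip("\n") + "\n\n"
    staged, count = ROUTING.subn(lambda _: replacement, text, count=1)
    if count == 0:
        raise ValueError("SKILL.md has no '## Workflow Routing' section")
    return staged


def stage_body(text: str, body: str) -> str:
    """Replace everything after the frontmatter and title, keeping both intact."""
    front, title, _ = split_skill(text)
    return front + title + body
```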
If real host/runtime replay is unavailable, `auto` falls back to judge validation and records a `validation_fallback_reason` in the audit/evidence trail. In `replay` mode, selftune exits with `REPLAY_UNAVAILABLE` instead of silently downgrading to fixture simulation.
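The mode rules reduce to a small decision function. This sketch assumes a single `replay_available` flag captures whether real host/runtime replay can run. `plan_validation`, `ValidationPlan`, and `ReplayUnavailable` are hypothetical names, and the fallback-reason text is illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


class ReplayUnavailable(RuntimeError):
    """Stands in for the REPLAY_UNAVAILABLE exit in `replay` mode."""


@dataclass(frozen=True)
class ValidationPlan:
    strategy: str                          # "replay" or "judge"
    fallback_reason: Optional[str] = None  # mirrors validation_fallback_reason


def plan_validation(mode: str, replay_available: bool) -> ValidationPlan:
    """Resolve --validation-mode into the strategy that actually runs."""
    if mode == "judge":
        return ValidationPlan("judge")
    if mode not in ("auto", "replay"):
        raise ValueError(f"unknown --validation-mode: {mode!r}")
    if replay_available:
        return ValidationPlan("replay")
    if mode == "replay":
        # Strict mode: no silent downgrade to fixture simulation.
        raise ReplayUnavailable("REPLAY_UNAVAILABLE")
    # auto: fall back to the judge and record why.
    return ValidationPlan("judge", fallback_reason="host/runtime replay unavailable")
```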
How it works
- Proposal generation — produces a candidate update to the skill description, routing, or body
- Eval construction — builds an eval set from session history (or synthetic data for cold-start skills)
- Validation — scores the proposal against the eval set using replay or judge
- Pareto check — accepts the proposal only if it improves pass rate without regressing other signals
- Deployment — writes the accepted proposal to the skill and records it in the audit log
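The Pareto check and multi-candidate selection (`--pareto`, `--candidates`) can be sketched as follows. The sketch assumes every signal is a number where higher is better, and that each candidate reports the same signals as the current skill. `Candidate`, `select`, and the tie-break on pass rate are illustrative choices, not selftune's documented scoring.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    pass_rate: float
    other_signals: tuple[float, ...]  # hypothetical secondary scores, higher is better

    def vector(self) -> tuple[float, ...]:
        return (self.pass_rate, *self.other_signals)


def dominates(a: Candidate, b: Candidate) -> bool:
    """a is at least as good as b on every signal and strictly better on one."""
    av, bv = a.vector(), b.vector()
    return all(x >= y for x, y in zip(av, bv)) and any(x > y for x, y in zip(av, bv))


def pareto_front(candidates: Sequence[Candidate]) -> list[Candidate]:
    """Keep only candidates no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def acceptable(current: Candidate, proposal: Candidate) -> bool:
    """Improves pass rate without regressing any other signal."""
    return proposal.pass_rate > current.pass_rate and all(
        new >= old for new, old in zip(proposal.other_signals, current.other_signals)
    )


def select(current: Candidate, candidates: Sequence[Candidate]) -> Optional[Candidate]:
    accepted = [c for c in pareto_front(candidates) if acceptable(current, c)]
    # Tie-break on pass rate (an assumption); None means keep the current skill.
    return max(accepted, key=lambda c: c.pass_rate, default=None)
```

For example, a candidate that raises the pass rate but lowers a secondary signal survives the Pareto front yet fails the acceptance check, so it is never deployed.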
Examples
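A dry run with strict replay validation is a cautious first invocation. The sketch below builds that command in Python using only flags from the Options table. It assumes boolean flags are bare switches and that each value is passed as a separate argument after its flag. The helper name, skill name, and path are hypothetical.

```python
import shlex
from typing import Optional


def build_evolve_command(skill: str, skill_path: str, *, dry_run: bool = True,
                         validation_mode: str = "auto", agent: Optional[str] = None,
                         max_iterations: Optional[int] = None) -> list[str]:
    """Assemble argv for `selftune evolve` from flags in the Options table."""
    cmd = ["selftune", "evolve", "--skill", skill, "--skill-path", skill_path,
           "--validation-mode", validation_mode]
    if agent is not None:
        cmd += ["--agent", agent]  # otherwise selftune auto-detects the agent
    if max_iterations is not None:
        cmd += ["--max-iterations", str(max_iterations)]
    if dry_run:
        cmd.append("--dry-run")  # validate proposals without deploying
    return cmd


if __name__ == "__main__":
    # Hypothetical skill; pass the list to subprocess.run(cmd, check=True) to execute it.
    cmd = build_evolve_command("my-skill", "skills/my-skill/SKILL.md", validation_mode="replay")
    print(shlex.join(cmd))
```

Keeping `--dry-run` on by default in the helper means a proposal is validated but never deployed until you opt out explicitly.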
Troubleshooting
Replay unavailable in `replay` mode: Use `--validation-mode auto` to allow judge fallback, or verify that the skill has a valid `SKILL.md` at the expected path.
No eval data: Run `selftune sync` to ingest recent session data before evolving, or use a cold-start eval set.