Monitoring - selftune

Overview

After an evolution deploys a new skill description, selftune monitors for regressions. If the new description performs worse than the old one, selftune can automatically roll back.

How monitoring works

selftune watch compares a sliding window of post-deploy sessions against the pre-deploy baseline, in four stages:

Baseline capture — records pass rates before the evolution deploys
Post-deploy tracking — monitors new sessions after deployment
Regression detection — compares post-deploy metrics against the baseline
Auto-rollback — if regression confidence is strong enough, reverts to the backup
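
The stages above can be sketched as a small sliding-window check. This is an illustrative model only, not selftune's implementation: the class name, window size, minimum session count, and tolerance are all assumptions, and a plain threshold on the pass-rate drop stands in for whatever confidence measure selftune actually uses.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegressionWatcher:
    """Compare a sliding window of post-deploy sessions to a fixed baseline."""

    baseline_pass_rate: float   # captured before the evolution deploys
    window_size: int = 20       # hypothetical: sessions kept in the window
    min_sessions: int = 5       # hypothetical: sessions required before judging
    tolerance: float = 0.10     # hypothetical: allowed drop below the baseline
    _window: deque = field(default_factory=deque, init=False, repr=False)

    def record(self, passed: bool) -> None:
        """Track one post-deploy session, evicting the oldest when full."""
        self._window.append(passed)
        if len(self._window) > self.window_size:
            self._window.popleft()

    def pass_rate(self) -> Optional[float]:
        """Pass rate over the current window, or None if it is empty."""
        if not self._window:
            return None
        return sum(self._window) / len(self._window)

    def regressed(self) -> bool:
        """True once enough sessions show a drop larger than the tolerance."""
        if len(self._window) < self.min_sessions:
            return False
        return self.pass_rate() < self.baseline_pass_rate - self.tolerance


watcher = RegressionWatcher(baseline_pass_rate=0.8)
for passed in [True, False, False, True, False, False]:
    watcher.record(passed)

if watcher.regressed():
    print("regression detected: revert to the backup")
```

Requiring a minimum number of sessions before judging keeps a single bad session right after deployment from triggering a rollback.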

selftune watch --skill my-skill --skill-path path/to/SKILL.md

With auto-rollback enabled:

selftune watch --skill my-skill --skill-path path/to/SKILL.md --auto-rollback

Activation rules

selftune includes built-in activation rules that trigger automatically:

Rule	Condition	Action
`post-session-diagnostic`	More than 2 unmatched queries in a session	Suggests `selftune last`
`grading-threshold-breach`	Session pass rate below 60%	Suggests `selftune evolve`
`stale-evolution`	No evolution in 7+ days with pending false negatives	Suggests `selftune evolve`
`regression-detected`	Monitoring detects regression	Suggests rollback

Rules fire at most once per session to avoid noise.
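
As a rough illustration of the table and the once-per-session limit, the sketch below evaluates the four conditions against a session snapshot. The thresholds come from the table; the `SessionState` type, its field names, and the `evaluate_rules` function are hypothetical and not part of selftune.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SessionState:
    # Field names are hypothetical; they mirror the conditions in the table.
    unmatched_queries: int = 0
    pass_rate: float = 1.0
    days_since_evolution: int = 0
    pending_false_negatives: int = 0
    regression_detected: bool = False
    fired: Set[str] = field(default_factory=set)  # rules already fired this session


def evaluate_rules(state: SessionState) -> List[str]:
    """Return suggestions for rules whose condition holds and have not fired yet."""
    rules = [
        ("post-session-diagnostic", state.unmatched_queries > 2, "selftune last"),
        ("grading-threshold-breach", state.pass_rate < 0.60, "selftune evolve"),
        (
            "stale-evolution",
            state.days_since_evolution >= 7 and state.pending_false_negatives > 0,
            "selftune evolve",
        ),
        ("regression-detected", state.regression_detected, "rollback"),
    ]
    suggestions = []
    for name, condition, action in rules:
        if condition and name not in state.fired:
            state.fired.add(name)  # each rule fires at most once per session
            suggestions.append(f"{name}: suggests {action}")
    return suggestions


state = SessionState(unmatched_queries=3, pass_rate=0.5)
print(evaluate_rules(state))  # two rules fire
print(evaluate_rules(state))  # nothing new: both already fired this session
```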

Orchestrate loop

For fully autonomous operation, selftune run executes the complete loop in order:

sync → grade → evolve → watch

selftune run

In continuous mode:

selftune run --loop --loop-interval 3600
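
The control flow of the loop can be modeled roughly as follows. This is a sketch under assumptions, not selftune's code: the step callables are placeholders, `max_iterations` exists only to make the example terminate, and the interval is assumed to be in seconds.

```python
import time
from typing import Callable, Dict, List, Optional

STEP_ORDER = ["sync", "grade", "evolve", "watch"]


def run_loop(
    steps: Dict[str, Callable[[], None]],
    loop: bool = False,
    loop_interval: float = 3600,            # assumed to be seconds
    max_iterations: Optional[int] = None,   # hypothetical: bounds the example
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run sync, grade, evolve, watch once, or repeatedly in loop mode."""
    executed: List[str] = []
    iteration = 0
    while True:
        for name in STEP_ORDER:
            steps[name]()
            executed.append(name)
        iteration += 1
        done = max_iterations is not None and iteration >= max_iterations
        if not loop or done:
            return executed
        sleep(loop_interval)  # wait before starting the next pass


# Placeholder steps that only announce themselves.
steps = {name: (lambda n=name: print(f"running {n}")) for name in STEP_ORDER}
run_loop(steps)  # a single pass, like running without --loop
```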

See the orchestrate command reference for all options.

Dashboard monitoring

The local dashboard shows skill health in real time, with live updates delivered over Server-Sent Events (SSE):

selftune dashboard

The dashboard displays:

Per-skill pass rates over time
Evolution history and outcomes
Missed queries and false negatives
Orchestrate run summaries
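
For readers unfamiliar with SSE, the sketch below parses the standard `text/event-stream` wire format that such live updates use. The event name and JSON payload in the sample are invented for illustration; the dashboard's actual endpoint and event schema are not described here.

```python
from typing import Dict, Iterable, Iterator, List


def parse_sse(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per event from an iterable of text/event-stream lines."""
    event: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event; skip events without data.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data.append(value)
        else:
            event[name] = value


# Hypothetical stream: the event name and payload are made up for this example.
sample = [
    ": keep-alive",
    "event: skill-health",
    'data: {"skill": "my-skill", "pass_rate": 0.82}',
    "",
]
for evt in parse_sse(sample):
    print(evt["event"], evt["data"])
```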
