Overview

selftune grades skill execution across three tiers, from basic trigger detection to output quality assessment.

The 3 tiers

Tier 1: Trigger

Did the skill fire at all? This is the most fundamental check. When a user query should have activated a skill, did it? Tier 1 catches false negatives: queries that match a skill’s intent but never activate it.

Tier 2: Process

Did the agent follow the right steps? Once a skill fires, did the agent execute the workflow correctly? This checks whether the agent followed the SKILL.md instructions, used the right tools, and completed the expected steps.

Tier 3: Quality

Was the output actually good? The final tier evaluates whether the end result met the user’s needs. A skill can trigger correctly and follow the right process but still produce a poor result.

Scoring

Each tier produces a score from 0.0 to 1.0. The overall grade is one of:
Grade     Meaning
pass      Skill triggered correctly and executed well
partial   Skill triggered but execution had issues
fail      Skill didn’t trigger or execution failed
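
As a rough illustration of how three tier scores could roll up into one of the grades above, here is a minimal Python sketch. It is not selftune's actual implementation: the `TierScores` type, the `overall_grade` function, and all three thresholds are hypothetical, chosen only to make the pass / partial / fail distinction concrete.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; not selftune's real values.
TRIGGER_MIN = 0.5   # below this, the skill is treated as not having triggered
PASS_MIN = 0.8      # execution at or above this counts as "executed well"
PARTIAL_MIN = 0.4   # execution at or above this counts as "had issues"


@dataclass(frozen=True)
class TierScores:
    trigger: float  # Tier 1: did the skill fire?
    process: float  # Tier 2: were the right steps followed?
    quality: float  # Tier 3: was the output good?


def overall_grade(scores: TierScores) -> str:
    """Map three tier scores in [0.0, 1.0] to "pass", "partial", or "fail"."""
    for value in (scores.trigger, scores.process, scores.quality):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score out of range: {value}")
    if scores.trigger < TRIGGER_MIN:
        return "fail"  # the skill did not trigger
    # Assumption: execution is only as good as its weakest tier.
    execution = min(scores.process, scores.quality)
    if execution >= PASS_MIN:
        return "pass"
    if execution >= PARTIAL_MIN:
        return "partial"
    return "fail"  # triggered, but execution failed
```

Under these assumed thresholds, `overall_grade(TierScores(1.0, 0.9, 0.5))` returns `"partial"`: the skill triggered and followed the process, but the weak quality score holds the grade back.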

Deterministic pre-gates

Before making any LLM calls, selftune runs deterministic pre-gate checks that resolve in under 20ms. These handle obvious cases without spending tokens:
  • Exact keyword matches
  • Known negative patterns
  • Previously graded identical queries
This makes grading fast and cost-effective — most queries resolve at the pre-gate level.
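
To make the idea concrete, the three pre-gate checks could be sketched in Python as below. This is an assumption-laden illustration, not selftune's code: the `pre_gate` function, its parameters, the verdict strings, and the order of the checks are all hypothetical.

```python
import re
from typing import Optional


def normalize(query: str) -> str:
    """Lowercase and collapse whitespace so identical queries compare equal."""
    return " ".join(query.lower().split())


def pre_gate(
    query: str,
    skill_keywords: set[str],
    negative_patterns: list[re.Pattern[str]],
    graded_cache: dict[str, str],
) -> Optional[str]:
    """Return a verdict without any LLM call, or None if one is still needed.

    All names and verdict strings here are hypothetical.
    """
    key = normalize(query)
    # 1. Previously graded identical queries: reuse the stored result.
    if key in graded_cache:
        return graded_cache[key]
    # 2. Known negative patterns: the skill should stay silent.
    if any(pattern.search(key) for pattern in negative_patterns):
        return "should_not_trigger"
    # 3. Exact keyword matches: the skill should clearly fire.
    words = set(re.findall(r"[a-z0-9-]+", key))
    if words & skill_keywords:
        return "should_trigger"
    # No deterministic answer; fall through to LLM-based grading.
    return None
```

Every check here is a dictionary lookup, a regex search, or a set intersection, which is the kind of work that fits a tight latency budget. Only a `None` result would need to spend tokens.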

Running grades

See selftune grade for the full command reference.

Grade a specific skill:
selftune grade --skill my-skill
Auto-grade with expectations:
selftune grade auto --skill my-skill --expectations "should handle implicit triggers"
Establish a baseline before evolution:
selftune grade baseline --skill my-skill --skill-path path/to/SKILL.md

Grade results

View grading history in the dashboard:
selftune dashboard
Or check the most recent session:
selftune last