Start with the symptom

Most skill failures fall into one of five buckets:
| Symptom | Usually means |
| --- | --- |
| The skill does not fire when it should | The description is too narrow or too technical |
| The skill fires for the wrong prompts | The description is too broad or overlaps with another skill |
| The skill fires but ignores the workflow | The instructions are too vague, too long, or not deterministic enough |
| The skill runs but output quality is poor | The workflow is missing constraints, examples, or validation steps |
| The agent hangs while running the skill | A bundled script is interactive or produces unusable output |
Use the sections below to narrow the problem quickly.

1. The skill undertriggers

This is the most common failure mode. The skill exists, but users ask in natural language and the trigger never matches.

What to check

  • Does the description use developer terms instead of user terms?
  • Does it say what the skill does but not when to use it?
  • Are you missing common synonyms, adjacent phrases, or contextual wording?

What to run

selftune status
selftune eval generate --skill my-skill
selftune verify --skill-path path/to/my-skill
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run

Typical fix

Rewrite the description around intent:
- Process CSV files.
+ Analyze CSV and tabular data files. USE WHEN spreadsheet, sales data,
+ csv, tabular data, clean rows, summarize columns, chart this file.
The key move is not “add more keywords.” It is “describe the task the way users ask for it.”
Read next: Writing Effective Descriptions

2. The skill overtriggers

This is the opposite failure. The skill catches prompts that belong to another skill or no skill at all.

What to check

  • Does the description include generic words like “analyze,” “build,” or “review” without domain boundaries?
  • Are two skills trying to own the same user intent?
  • Did a recent evolution increase recall by sacrificing precision?

What to run

selftune eval composability --skill my-skill
selftune grade auto --skill my-skill
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run
If the last evolution clearly widened the scope too far:
selftune evolve rollback --skill my-skill --skill-path path/to/SKILL.md

Typical fix

Narrow the trigger boundary:
- Analyze documents and summarize them.
+ Summarize customer support tickets and issue reports for engineering triage.
+ Use when the user wants bug reports, support threads, or issue trackers condensed.
You want specificity, not cleverness.
Read next: Testing Skill Triggers

3. The skill fires but the agent ignores instructions

This is usually a Tier 2 problem: the trigger is fine, but the workflow execution is unreliable.

What to check

  • Is SKILL.md too long to stay in context?
  • Are the steps ambiguous or full of prose instead of explicit actions?
  • Are repeated mechanical tasks still written as markdown instead of scripts?
  • Are there too many branches packed into one file?
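A quick way to put a number on "too long" is to measure the file. The sketch below is only an illustration: `check_skill_size` is a hypothetical helper, and the default budget of 200 lines is an arbitrary placeholder, not a limit defined by selftune.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the size of a skill file and flag it when it
# exceeds a line budget. The default budget (200) is an arbitrary placeholder.
check_skill_size() {
  local file="${1:?Usage: check_skill_size <path/to/SKILL.md> [max_lines]}"
  local max_lines="${2:-200}"
  local lines words
  lines=$(wc -l < "$file") || return 2
  words=$(wc -w < "$file") || return 2
  # Some wc implementations pad their output; arithmetic expansion strips it.
  lines=$((lines))
  words=$((words))
  # Parseable summary on stdout, warning on stderr.
  echo "lines=${lines} words=${words} max_lines=${max_lines}"
  if [ "$lines" -gt "$max_lines" ]; then
    echo "warning: ${file} exceeds ${max_lines} lines; consider splitting it" >&2
    return 1
  fi
}
```

Calling `check_skill_size path/to/SKILL.md 150` prints one summary line and returns 1 when the file is over budget, which makes it easy to track before and after a split.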

What to run

selftune grade auto --skill my-skill
selftune workflows --skill my-skill
selftune last

Typical fixes

  • Split one large SKILL.md into a router plus focused workflow files
  • Move deterministic logic into scripts
  • Add concrete command examples
  • Remove long explanatory text from the operational path
If the instructions are bloated, read Structuring Large Skills and Managing Context.

4. The workflow is followed, but the output is weak

This is a Tier 3 problem. The agent is activating the skill and roughly following instructions, but the result is still not good enough.

What to check

  • Does the workflow specify quality bars, output format, or acceptance criteria?
  • Are there examples of good output?
  • Is there a validation or review step before final output?

What to run

selftune grade auto --skill my-skill --expectations "high quality output"
selftune last

Typical fixes

  • Add a target structure for the final answer
  • Include one short example of a strong result
  • Add a deterministic validation step before completion
  • Separate “collect data” from “present result”
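A deterministic validation step can be as small as checking that the final output contains the sections the workflow promises. The following is a minimal sketch under that assumption; `validate_output` and the heading names passed to it are hypothetical, not part of selftune.

```shell
#!/usr/bin/env bash
# Hypothetical validation step: confirm a result file contains every required
# section heading before the workflow reports completion.
validate_output() {
  local file="${1:?Usage: validate_output <file> <heading>...}"
  shift
  local missing=0 heading
  if [ ! -r "$file" ]; then
    echo "error: cannot read ${file}" >&2
    return 2
  fi
  for heading in "$@"; do
    # -F fixed string, -x match the whole line, -q no output
    if ! grep -Fxq -- "$heading" "$file"; then
      echo "missing section: ${heading}" >&2
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "ok"
  fi
  return "$missing"
}
```

For example, `validate_output report.md '## Summary' '## Findings'` prints `ok` and returns 0 only when both headings appear as full lines; otherwise it names each missing section on stderr and returns 1.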
If the same output-quality fix is needed every time, it probably belongs in code instead of prose.
Read next: Using Scripts in Skills

5. Scripts hang or behave unpredictably

This is almost always a script design problem, not a selftune problem.

What to check

  • Does the script prompt for input interactively?
  • Does it print verbose logs to stdout instead of returning structured data?
  • Does it require local dependencies that are not declared?
  • Does it mutate files or state without explicit flags?
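The first check can be partly automated with a text search. This is a rough heuristic sketch, not a selftune feature: `find_interactive_prompts` is a hypothetical helper, it assumes a grep that supports `-r` and `--include` (GNU and BSD grep both do), and it will report false positives such as `while read` loops fed by a pipe or comments that contain the word "read".

```shell
#!/usr/bin/env bash
# Heuristic scan (hypothetical helper): list lines in *.sh files that use
# `read` or `select`, which may block waiting on a terminal.
# Returns 0 when candidates are found and 1 when none are.
find_interactive_prompts() {
  local dir="${1:?Usage: find_interactive_prompts <scripts-dir>}"
  grep -rnE --include='*.sh' \
    '(^|[;&|[:space:]])(read|select)([[:space:]]|$)' "$dir"
}
```

Each hit is printed as `path:line:content`, so every candidate can be reviewed by hand before deciding whether it really waits on user input.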

Typical bad pattern

read -p "Enter filename: " filename
That will hang an agent session, because nothing is there to answer the prompt.

Typical fix

filename="${1:?Usage: validate.sh <filename>}"
Better script properties:
  • All inputs via flags, env vars, or stdin
  • --help explains usage
  • stdout contains parseable output
  • stderr contains diagnostics
  • exit codes are meaningful
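The properties above can be sketched as one small function. Everything here is illustrative: the `validate` function, the `--file` flag, the `VALIDATE_FILE` variable, and the exit-code scheme (0 success, 1 check failed, 2 usage error) are assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical non-interactive skeleton for a bundled script.
# Exit codes (assumed convention): 0 ok, 1 check failed, 2 usage error.
usage() {
  echo "Usage: validate.sh --file <path> [--help]"
}

validate() {
  local file="${VALIDATE_FILE:-}"   # env var fallback (hypothetical name)
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --file)
        [ "$#" -ge 2 ] || { usage >&2; return 2; }
        file="$2"
        shift 2
        ;;
      --help)
        usage
        return 0
        ;;
      *)
        echo "error: unknown argument: $1" >&2
        usage >&2
        return 2
        ;;
    esac
  done
  if [ -z "$file" ]; then
    echo "error: --file is required" >&2
    usage >&2
    return 2
  fi
  if [ ! -f "$file" ]; then
    echo "error: not a file: ${file}" >&2   # diagnostics go to stderr
    return 1
  fi
  local lines
  lines=$(wc -l < "$file")
  # Parseable key=value output on stdout, nothing else.
  printf 'status=ok\nfile=%s\nlines=%d\n' "$file" "$((lines))"
}
```

To use it as a standalone script, end the file with `validate "$@"`. Because the function never prompts, an agent can call it repeatedly and branch on the exit code alone.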
Read next: Using Scripts in Skills

6. selftune does not seem to have enough evidence

Sometimes the skill itself is fine, but selftune cannot judge or evolve it well because the local evidence is thin.

What to check

  • Did you run selftune sync after recent sessions?
  • Is this a brand-new skill with no real usage history?
  • Did the local setup pass selftune doctor?

What to run

selftune doctor
selftune sync
selftune status
selftune verify --skill-path path/to/my-skill
selftune eval generate --skill my-skill --auto-synthetic --skill-path path/to/SKILL.md

Typical fix

  • Repair setup issues first
  • Use synthetic evals until real usage arrives
  • For draft packages, run selftune verify first, then fill in only the missing replay or baseline steps it asks for before publishing
  • Establish a baseline before you judge evolution results
If selftune itself looks unhealthy, start with Quickstart and How It Works.

A practical triage order

When you are unsure where to start, use this order:
  1. selftune doctor
  2. selftune sync
  3. selftune status
  4. selftune grade auto --skill my-skill
  5. selftune eval generate --skill my-skill
That sequence tells you whether the problem is setup, missing evidence, triggering, workflow compliance, or output quality.
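The order above can be wrapped in a small function that stops at the first failing step. This is a sketch under one loud assumption: that each selftune command exits non-zero when it finds a problem, which the guide does not state. The `triage` and `run_step` names are hypothetical; the selftune commands are the five listed above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the triage commands in order and stop at the
# first one that fails.
# Assumption: selftune exits non-zero when a step reports a problem.
run_step() {
  local n="$1"
  shift
  echo "step ${n}: selftune $*" >&2
  selftune "$@" || { echo "stopped at step ${n}" >&2; return 1; }
}

triage() {
  local skill="${1:?Usage: triage <skill-name>}"
  run_step 1 doctor &&
    run_step 2 sync &&
    run_step 3 status &&
    run_step 4 grade auto --skill "$skill" &&
    run_step 5 eval generate --skill "$skill"
}
```

Running `triage my-skill` writes progress to stderr and leaves the selftune output untouched on stdout, so the last step that ran is the one to read first.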

Escalate carefully

Do not jump straight to rewriting the whole skill. Usually the right fix is smaller:
  • Tighten the description
  • Split a workflow
  • Add a script
  • Roll back a bad evolution
  • Add stronger eval coverage
Large rewrites are expensive and often erase useful signal.

Next steps

  • Build Your First Skill: Follow the full authoring and optimization loop.
  • Testing Triggers: Diagnose undertriggering and overtriggering.
  • Managing Context: Fix workflows that get ignored mid-session.
  • Using Scripts: Make the mechanical parts deterministic.