Troubleshooting Skills

Start with the symptom

Most skill failures fall into one of five buckets:

Symptom	Usually means
The skill does not fire when it should	The description is too narrow or too technical
The skill fires for the wrong prompts	The description is too broad or overlaps with another skill
The skill fires but ignores the workflow	The instructions are too vague, too long, or not deterministic enough
The skill runs but output quality is poor	The workflow is missing constraints, examples, or validation steps
The agent hangs while running the skill	A bundled script is interactive or produces unusable output

Use the sections below to narrow the problem quickly.

1. The skill undertriggers

This is the most common failure mode. The skill exists, but users ask in natural language and the trigger never matches.

What to check

Does the description use developer terms instead of user terms?
Does it say what the skill does but not when to use it?
Are you missing common synonyms, adjacent phrases, or contextual wording?

What to run

selftune status
selftune eval generate --skill my-skill
selftune verify --skill-path path/to/my-skill
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run

Typical fix

Rewrite the description around intent:

- Process CSV files.
+ Analyze CSV and tabular data files. USE WHEN spreadsheet, sales data,
+ csv, tabular data, clean rows, summarize columns, chart this file.

The key move is not “add more keywords.” It is “describe the task the way users ask for it.” Read next: Writing Effective Descriptions

2. The skill overtriggers

This is the opposite failure. The skill catches prompts that belong to another skill or no skill at all.

What to check

Does the description include generic words like “analyze,” “build,” or “review” without domain boundaries?
Are two skills trying to own the same user intent?
Did a recent evolution increase recall by sacrificing precision?

What to run

selftune eval composability --skill my-skill
selftune grade auto --skill my-skill
selftune evolve --skill my-skill --skill-path path/to/SKILL.md --dry-run

If the last evolution clearly widened the scope too far:

selftune evolve rollback --skill my-skill --skill-path path/to/SKILL.md

Typical fix

Narrow the trigger boundary:

- Analyze documents and summarize them.
+ Summarize customer support tickets and issue reports for engineering triage.
+ Use when the user wants bug reports, support threads, or issue trackers condensed.

You want specificity, not cleverness. Read next: Testing Skill Triggers

3. The skill fires but the agent ignores instructions

This is usually a Tier 2 problem: the trigger is fine, but the workflow execution is unreliable.

What to check

Is SKILL.md too long to stay in context?
Are the steps ambiguous or full of prose instead of explicit actions?
Are repeated mechanical tasks still written as markdown instead of scripts?
Are there too many branches packed into one file?
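The first question can be checked mechanically. The following is a minimal sketch, not a selftune feature: skill_size_check is a hypothetical helper, and the 500-line default is a placeholder budget you would tune yourself.

```shell
# Hypothetical helper: report the line count of a skill file and fail when
# it exceeds a budget. The 500-line default is an assumption, not a
# selftune limit.
skill_size_check() {
  local file="$1"
  local max_lines="${2:-500}"
  local lines
  lines=$(wc -l < "$file" | tr -d ' ')
  echo "$lines"                      # stdout: the count only, easy to parse
  if [ "$lines" -gt "$max_lines" ]; then
    echo "skill_size_check: $file exceeds $max_lines lines, consider splitting" >&2
    return 1
  fi
}
```

For example, skill_size_check path/to/SKILL.md 300 prints the line count and returns non-zero when the file is longer than 300 lines.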

What to run

selftune grade auto --skill my-skill
selftune workflows --skill my-skill
selftune last

Typical fixes

Split one large SKILL.md into a router plus focused workflow files
Move deterministic logic into scripts
Add concrete command examples
Remove long explanatory text from the operational path

If the instructions are bloated, read Structuring Large Skills and Managing Context.

4. The workflow is followed, but the output is weak

This is a Tier 3 problem. The agent is activating the skill and roughly following instructions, but the result is still not good enough.

What to check

Does the workflow specify quality bars, output format, or acceptance criteria?
Are there examples of good output?
Is there a validation or review step before final output?

What to run

selftune grade auto --skill my-skill --expectations "high quality output"
selftune last

Typical fixes

Add a target structure for the final answer
Include one short example of a strong result
Add a deterministic validation step before completion
Separate “collect data” from “present result”
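A deterministic validation step can be as small as a script that checks the final output for required sections. The sketch below assumes the result is written to a Markdown file; check_report and the three heading names are hypothetical and only illustrate the shape of such a check.

```shell
# Hypothetical validation step: fail unless the report contains every
# required heading as a full line. The heading names are placeholders.
check_report() {
  local report="$1"
  local missing=0
  local heading
  for heading in "Summary" "Findings" "Next steps"; do
    if ! grep -qxF "## $heading" "$report"; then
      echo "check_report: missing section: $heading" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

The workflow would run this before presenting the result and fix any section named on stderr when the exit code is non-zero.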

If the same output-quality fix is needed every time, it probably belongs in code instead of prose. Read next: Using Scripts in Skills

5. Scripts hang or behave unpredictably

This is almost always a script design problem, not a selftune problem.

What to check

Does the script prompt for input interactively?
Does it print verbose logs to stdout instead of returning structured data?
Does it require local dependencies that are not declared?
Does it mutate files or state without explicit flags?

Typical bad pattern

read -p "Enter filename: " filename

That will hang an agent session, because nothing is there to answer the prompt.

Typical fix

filename="${1:?Usage: validate.sh <filename>}"

Better script properties:

All inputs via flags, env vars, or stdin
--help explains usage
stdout contains parseable output
stderr contains diagnostics
exit codes are meaningful
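Put together, those properties might look like the sketch below. It is an illustration only: the check (file exists and is non-empty), the JSON shape, and the exit-code meanings are assumptions, and the filename is assumed to need no JSON escaping.

```shell
# Hypothetical validate.sh body. Nothing is read interactively: the only
# input is the first argument.
usage() {
  echo "Usage: validate.sh [--help] <filename>"
}

validate() {
  local file="${1:-}"
  case "$file" in
    --help) usage; return 0 ;;
    "")     usage >&2; return 2 ;;                    # 2 = usage error
  esac
  if [ ! -s "$file" ]; then
    echo "validate: $file is missing or empty" >&2    # diagnostics on stderr
    printf '{"file":"%s","valid":false}\n' "$file"    # result on stdout
    return 1                                          # 1 = validation failed
  fi
  printf '{"file":"%s","valid":true}\n' "$file"
}

# When saved as a script, the last line would be: validate "$@"
```

Because stdout carries only the result line, an agent can parse it directly and read stderr only when the exit code is non-zero.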

6. selftune does not seem to have enough evidence

Sometimes the skill itself is fine, but selftune cannot judge or evolve it well because the local evidence is thin.

What to check

Did you run selftune sync after recent sessions?
Is this a brand-new skill with no real usage history?
Did the local setup pass selftune doctor?

What to run

selftune doctor
selftune sync
selftune status
selftune verify --skill-path path/to/my-skill
selftune eval generate --skill my-skill --auto-synthetic --skill-path path/to/SKILL.md

Typical fix

Repair setup issues first
Use synthetic evals until real usage arrives
For draft packages, run verify first, then complete only the missing replay or baseline steps it asks for before publishing
Establish a baseline before you judge evolution results

If selftune itself looks unhealthy, start with Quickstart and How It Works.

A practical triage order

When you are unsure where to start, use this order:

selftune doctor
selftune sync
selftune status
selftune grade auto --skill my-skill
selftune eval generate --skill my-skill

That sequence tells you whether the problem is setup, missing evidence, triggering, workflow compliance, or output quality.
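If you run this order often, a small wrapper can stop at the first failing step. This is a sketch under one loud assumption: that each selftune command exits non-zero when it finds a problem, which this page does not confirm. run_step and triage are hypothetical names.

```shell
# Hypothetical wrapper around the triage order above.
# Assumption: each selftune command exits non-zero on failure.
run_step() {
  echo "==> $*" >&2
  # stdin is closed off so no step can wait for interactive input.
  "$@" </dev/null || { echo "triage: stopped at: $*" >&2; return 1; }
}

triage() {
  local skill="$1"
  run_step selftune doctor &&
  run_step selftune sync &&
  run_step selftune status &&
  run_step selftune grade auto --skill "$skill" &&
  run_step selftune eval generate --skill "$skill"
}
```

Calling triage my-skill then reports on stderr which step stopped the run, which maps to the setup, evidence, triggering, compliance, or quality categories above.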

Escalate carefully

Do not jump straight to rewriting the whole skill. Usually the right fix is smaller:

Tighten the description
Split a workflow
Add a script
Roll back a bad evolution
Add stronger eval coverage

Large rewrites are expensive and often erase useful signal.

Next steps

Build Your First Skill

Follow the full authoring and optimization loop.

Testing Triggers

Diagnose undertriggering and overtriggering.

Managing Context

Fix workflows that get ignored mid-session.

Using Scripts

Make the mechanical parts deterministic.
