
What you’ll build

This guide walks through a complete first loop:
  1. Create a simple skill
  2. Try it with real prompts
  3. Generate evals
  4. Baseline and evolve the description
  5. Re-test and decide whether to ship
If you want the full shipping path, including package validation, baseline gating, deployment, and watch, follow the Create, Test, and Deploy a Skill guide instead.
We’ll use a small example skill called summarize-issues:
summarize-issues/
├── SKILL.md
├── workflows/default.md
├── references/overview.md
└── selftune.create.json
It helps an agent summarize bug reports, support threads, and issue trackers into a short engineering brief.

Step 1: Create the first draft package

Initialize it with selftune create init:
selftune create init \
  --name "Summarize Issues" \
  --description "Use when the user wants issue trackers, support threads, or bug reports summarized into an engineering brief."
Then edit summarize-issues/SKILL.md so the router stays focused:
---
name: summarize-issues
description: >
  Summarize bug reports, GitHub issues, and support threads into a concise
  engineering brief with key problems, reproduction clues, and likely next
  steps. USE WHEN summarize issue, bug report, support ticket, issue triage,
  customer complaint, incident summary.
---
That is enough to get started. Do not try to encode every edge case yet. The goal is to create a usable first draft package that selftune can measure.
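Because selftune measures the skill through this description, a quick sanity check on the front matter can save a wasted loop. The sketch below is a hypothetical helper, not a selftune command: `check_frontmatter` is an invented name, and it assumes the front matter is the block between the first two `---` lines, as in the example above.

```shell
# Hypothetical helper, not part of selftune: checks that a SKILL.md file
# has a front-matter block containing both `name:` and `description:` keys.
check_frontmatter() {
  file="$1"

  # Collect the lines between the first pair of `---` delimiters.
  frontmatter=$(awk '
    NR == 1 && $0 != "---" { exit }            # no front matter at all
    $0 == "---" { count++; if (count == 2) exit; next }
    count == 1 { print }
  ' "$file")

  for key in name description; do
    if ! printf '%s\n' "$frontmatter" | grep -q "^${key}:"; then
      echo "missing front-matter key: ${key}" >&2
      return 1
    fi
  done

  echo "front matter ok"
}
```

Calling `check_frontmatter summarize-issues/SKILL.md` prints `front matter ok` when both keys are present and returns a non-zero status otherwise. It only looks at the two keys and does not replace the structural checks in Step 4.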

Step 2: Test the raw trigger manually

Run a few prompts through your agent and check whether the skill activates. Examples that should trigger:
"Summarize these GitHub issues into the top 3 problems"
"Turn this support thread into an engineering brief"
"What are the main bugs customers are reporting here?"
Examples that should not trigger:
"Write a GitHub issue template"
"How should I prioritize my backlog?"
"Fix this bug in my React component"
You are looking for obvious misses and obvious false positives. If the skill is clearly too narrow or too broad, adjust the description before you start generating evals. For a deeper trigger-testing workflow, see Testing Skill Triggers.
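Because Step 7 re-runs these same prompts, it can help to keep them in files rather than retyping them. This is a minimal sketch: the `trigger-prompts/` directory, the file names, and the `list_prompts` helper are all illustrative choices, not a selftune convention, and selftune does not read these files.

```shell
# Illustrative layout only: selftune does not read these files.
PROMPT_DIR="${PROMPT_DIR:-trigger-prompts}"
mkdir -p "$PROMPT_DIR"

# Prompts that should activate the skill.
cat > "$PROMPT_DIR/should-trigger.txt" <<'EOF'
Summarize these GitHub issues into the top 3 problems
Turn this support thread into an engineering brief
What are the main bugs customers are reporting here?
EOF

# Adjacent prompts that should stay out of scope.
cat > "$PROMPT_DIR/should-not-trigger.txt" <<'EOF'
Write a GitHub issue template
How should I prioritize my backlog?
Fix this bug in my React component
EOF

# Print every prompt prefixed with the outcome you expect to see.
list_prompts() {
  while IFS= read -r prompt; do
    printf 'expect trigger:    %s\n' "$prompt"
  done < "$PROMPT_DIR/should-trigger.txt"
  while IFS= read -r prompt; do
    printf 'expect no trigger: %s\n' "$prompt"
  done < "$PROMPT_DIR/should-not-trigger.txt"
}
```

Running `list_prompts` gives you a checklist to work through by hand in your agent, both now and after the description has been evolved.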

Step 3: Let selftune observe real usage

If selftune is already installed, make sure session data is flowing:
selftune doctor
selftune sync
selftune status
If this is a brand-new environment, start with Quickstart. The point of this step is simple: selftune needs evidence before it can tell you how the skill is performing.

Step 4: Check the draft package

Before generating evals, confirm that the package is structurally valid:
selftune create status --skill-path summarize-issues
selftune verify --skill-path summarize-issues
If verify says the draft needs evals, move on to Step 5. If it flags package or spec issues, fix those before treating the skill as publishable.

Step 5: Generate an eval set

If you already used the skill in real sessions:
selftune eval generate --skill summarize-issues
If the skill is new and you do not have usage history yet:
selftune eval generate \
  --skill summarize-issues \
  --auto-synthetic \
  --skill-path summarize-issues/SKILL.md
This gives you a starting eval set grounded either in real usage or in the skill definition itself.

Step 6: Evolve the description

Once evals exist, run a replay-backed dry run:
selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md \
  --dry-run
If you want the package-aware validation path used for new draft skills, the full shipping guide switches here to:
selftune create replay --skill-path summarize-issues --mode package
Review the proposed changes first. If the dry run looks reasonable, run the real evolution:
selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md
Typical improvements include:
  • Adding user-language synonyms you did not think of
  • Narrowing overly broad trigger phrases
  • Rewriting the description around intent instead of implementation
If you need the theory behind this step, read Writing Effective Descriptions.
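The review-then-apply sequence in this step can be wrapped in a small function so the real run never happens without a look at the dry run first. This is only a sketch: `evolve_with_review` is a hypothetical wrapper, it uses just the `selftune evolve` flags shown above, and it assumes `selftune` exits non-zero when the dry run fails, which this guide does not confirm.

```shell
# Hypothetical wrapper: run the dry run, ask for confirmation, then apply.
# Assumption: `selftune evolve --dry-run` exits non-zero on failure.
evolve_with_review() {
  skill="$1"        # e.g. summarize-issues
  skill_file="$2"   # e.g. summarize-issues/SKILL.md

  # Show the proposed changes without applying them.
  selftune evolve --skill "$skill" --skill-path "$skill_file" --dry-run || return 1

  printf 'Apply these changes? [y/N] '
  read -r answer

  case "$answer" in
    y|Y) selftune evolve --skill "$skill" --skill-path "$skill_file" ;;
    *)   echo "skipped: no changes applied" ;;
  esac
}
```

You would call it as `evolve_with_review summarize-issues summarize-issues/SKILL.md`. Anything other than `y` or `Y` leaves the skill untouched, so the default answer is the safe one.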

Step 7: Re-test with the same prompts

Now re-run the manual prompts from Step 2 and compare behavior. Then inspect the skill’s health:
selftune status
selftune grade auto --skill summarize-issues
You want to confirm two things:
  • The skill catches more of the prompts that should trigger
  • It did not start firing on adjacent prompts that should stay out of scope
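To make those two checks concrete, you can record what you observed for each prompt before and after evolving, then count the misses and false positives. The sketch below assumes a hand-maintained, tab-separated results file with three columns (expected, observed, prompt). That format and the `score_run` name are hypothetical and not something selftune produces.

```shell
# Hypothetical results format, one line per prompt, tab-separated:
#   expected<TAB>observed<TAB>prompt
# where expected and observed are each "yes" or "no".
score_run() {
  awk -F '\t' '
    $1 == $2                   { ok++ }         # behaved as expected
    $1 == "yes" && $2 == "no"  { missed++ }     # should have triggered
    $1 == "no"  && $2 == "yes" { false_pos++ }  # should have stayed quiet
    END {
      printf "correct=%d missed=%d false_positives=%d\n", ok, missed, false_pos
    }
  ' "$1"
}
```

Comparing `score_run before.tsv` with `score_run after.tsv` shows whether `missed` went down without `false_positives` going up, which are the two outcomes this step asks you to confirm.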
If it regressed, roll back the change:
selftune evolve rollback \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md

Step 8: Publish when the loop is complete

For a draft package, the package-first ship path is:
selftune publish --skill-path summarize-issues
For an already-existing skill that is not being authored as a draft package, the older evolve path still works:
selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md \
  --with-baseline

Step 9: Repeat until the skill is boring

Good skills become boring:
  • The description reliably catches real phrasing
  • The eval set stays mostly green
  • The skill health stays stable over time
Once you reach that point, you can automate the loop:
selftune cron setup
Or run the full loop directly:
selftune run --skill summarize-issues

What to improve next

If the skill still struggles, use the symptom to choose the next guide:
| Symptom | Read next |
| --- | --- |
| You are not sure what is broken yet | Troubleshooting Skills |
| It misses real user phrasing | Writing Effective Descriptions |
| It triggers inconsistently | Testing Skill Triggers |
| The skill file is getting bloated | Structuring Large Skills |
| The agent forgets instructions mid-session | Managing Context |
| Repeated steps are mechanical | Using Scripts in Skills |

After your first skill

Once one skill is working, the next level is operating it as a system: