Build and Improve Your First Skill

What you’ll build

This guide walks through a complete first loop:

Create a simple skill
Try it with real prompts
Generate evals
Baseline and evolve the description
Re-test and decide whether to ship

If you want the full shipping path, including package validation, baseline gating, deployment, and watch, use Create, Test, and Deploy a Skill.

We’ll use a small example skill called summarize-issues:

summarize-issues/
├── SKILL.md
├── workflows/default.md
├── references/overview.md
└── selftune.create.json

The skill helps an agent summarize bug reports, support threads, and issue trackers into a short engineering brief.
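
If you want a quick way to confirm that a directory matches the layout above, a small shell helper can do it. This is only a sketch and not a selftune feature: the function name check_layout is hypothetical, and the file list is copied from the tree shown above.

```shell
# Hypothetical helper (not part of selftune): report any of the four files
# from the example layout that are missing under the given package directory.
check_layout() {
  pkg_dir="$1"
  missing=0
  for rel in SKILL.md workflows/default.md references/overview.md selftune.create.json; do
    if [ ! -f "$pkg_dir/$rel" ]; then
      echo "missing: $rel"
      missing=1
    fi
  done
  # Exit status 0 when every file is present, 1 otherwise.
  return "$missing"
}
```

Calling check_layout summarize-issues prints one line per missing file and returns a non-zero status if anything is absent.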

Step 1: Create the first draft package

Initialize it with selftune create init:

selftune create init \
  --name "Summarize Issues" \
  --description "Use when the user wants issue trackers, support threads, or bug reports summarized into an engineering brief."

Then edit summarize-issues/SKILL.md so the router stays focused:

---
name: summarize-issues
description: >
  Summarize bug reports, GitHub issues, and support threads into a concise
  engineering brief with key problems, reproduction clues, and likely next
  steps. USE WHEN summarize issue, bug report, support ticket, issue triage,
  customer complaint, incident summary.
---

That is enough to get started. Do not try to encode every edge case yet. The goal is to create a usable first draft package that selftune can measure.
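
While you iterate on the description, it can help to re-read just the frontmatter without the rest of the file. The helper below is a minimal sketch, not part of selftune: print_frontmatter is a hypothetical name, and it assumes the frontmatter is delimited by lines that contain exactly three dashes, as in the example above.

```shell
# Hypothetical helper (not part of selftune): print only the lines between
# the first and second '---' markers of a skill file.
print_frontmatter() {
  awk '
    $0 == "---" { fences++; next }   # count delimiter lines, do not print them
    fences == 1 { print }            # print only what sits between the first two
  ' "$1"
}
```

For example, print_frontmatter summarize-issues/SKILL.md would show the name and description fields and nothing else.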

Step 2: Test the raw trigger manually

Run a few prompts through your agent and check whether the skill activates.

Examples that should trigger:

"Summarize these GitHub issues into the top 3 problems"
"Turn this support thread into an engineering brief"
"What are the main bugs customers are reporting here?"

Examples that should not trigger:

"Write a GitHub issue template"
"How should I prioritize my backlog?"
"Fix this bug in my React component"

You are looking for obvious misses and obvious false positives. If the skill is clearly too narrow or too broad, adjust the description before you start generating evals. For a deeper trigger-testing workflow, see Testing Skill Triggers.
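
If you jot the manual results down as you go, counting misses and false positives takes one short function. This is a sketch under assumptions: the tab-separated scratch format and the name tally_triggers are hypothetical and are not a selftune file format.

```shell
# Hypothetical scratch log, one line per prompt, tab-separated:
#   expected<TAB>actual<TAB>prompt      (expected/actual are "yes" or "no")
# A miss is expected=yes but actual=no; a false positive is the reverse.
tally_triggers() {
  awk -F '\t' '
    $1 == "yes" && $2 == "no"  { misses++ }
    $1 == "no"  && $2 == "yes" { false_pos++ }
    END { printf "misses=%d false_positives=%d total=%d\n", misses, false_pos, NR }
  ' "$1"
}
```

Keeping the log file around also gives you a fixed set of prompts to replay after you change the description.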

Step 3: Let selftune observe real usage

If selftune is already installed, make sure session data is flowing:

selftune doctor
selftune sync
selftune status

If this is a brand-new environment, start with Quickstart. The point of this step is simple: selftune needs evidence before it can tell you how the skill is performing.
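
The three commands above can be chained so that a failure stops the sequence early. This is a minimal sketch: check_session_data is a hypothetical name, and it assumes each selftune subcommand exits with a non-zero status when something is wrong, which this guide does not confirm.

```shell
# Hypothetical wrapper around the three commands from this step.
# '&&' runs the next command only if the previous one succeeded.
# Assumption: selftune subcommands exit non-zero on failure.
check_session_data() {
  selftune doctor && selftune sync && selftune status
}
```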

Step 4: Check the draft package

Before generating evals, confirm that the package is structurally valid:

selftune create status --skill-path summarize-issues
selftune verify --skill-path summarize-issues

If verify says the draft needs evals, move on. If it flags package or spec issues, fix those before treating the skill as publishable.

Step 5: Generate an eval set

If you already used the skill in real sessions:

selftune eval generate --skill summarize-issues

If the skill is new and you do not have usage history yet:

selftune eval generate \
  --skill summarize-issues \
  --auto-synthetic \
  --skill-path summarize-issues/SKILL.md

This gives you a starting eval set grounded either in real usage or in the skill definition itself.
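
If you switch between the two cases often, the choice can be folded into one function. The sketch below only reuses the commands and flags shown in this step; the wrapper name generate_evals and its argument convention are hypothetical.

```shell
# Hypothetical wrapper around the two commands from this step.
#   generate_evals <skill-name>                  -> build evals from real sessions
#   generate_evals <skill-name> <path/SKILL.md>  -> synthesize evals from the skill file
generate_evals() {
  skill="$1"
  skill_file="${2:-}"
  if [ -z "$skill_file" ]; then
    selftune eval generate --skill "$skill"
  else
    selftune eval generate --skill "$skill" --auto-synthetic --skill-path "$skill_file"
  fi
}
```

For the example skill with no usage history, that would be generate_evals summarize-issues summarize-issues/SKILL.md.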

Step 6: Evolve the description

Once evals exist, start with a replay-backed dry run:

selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md \
  --dry-run

If you want the package-aware validation path used for new draft skills, the full shipping guide switches here to:

selftune create replay --skill-path summarize-issues --mode package

Review the proposed changes first. If the dry run looks reasonable, run the real evolution:

selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md
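
The dry-run-then-apply sequence above can be wrapped in a function that pauses for confirmation, so the real evolution never runs before you have read the proposal. This is a sketch, not a selftune feature: evolve_with_review and the y/N prompt are hypothetical, and only the evolve commands themselves come from this step.

```shell
# Hypothetical wrapper: run the dry run, ask for confirmation, then evolve.
# Usage: evolve_with_review <skill-name> <path/SKILL.md>
evolve_with_review() {
  skill="$1"
  skill_file="$2"
  # Stop here if the dry run itself fails.
  selftune evolve --skill "$skill" --skill-path "$skill_file" --dry-run || return 1
  printf 'Apply these changes? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) selftune evolve --skill "$skill" --skill-path "$skill_file" ;;
    *)   echo "Skipped the real run." ;;
  esac
}
```

Anything other than y or Y at the prompt leaves the skill untouched by the second command.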

Typical improvements include:

Adding user-language synonyms you did not think of
Narrowing overly broad trigger phrases
Rewriting the description around intent instead of implementation

If you need the theory behind this step, read Writing Effective Descriptions.

Step 7: Re-test with the same prompts

Now re-run the manual prompts from Step 2 and compare behavior. Then inspect the skill’s health:

selftune status
selftune grade auto --skill summarize-issues

You want to confirm two things:

The skill catches more of the prompts that should trigger
It has not started firing on adjacent prompts that should stay out of scope
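
If you counted misses and false positives by hand before and after the evolution, the two checks above reduce to a small comparison. The function below is one possible reading of them, not a selftune rule: compare_runs is a hypothetical name, and "regressed" here means either count went up.

```shell
# Hypothetical helper: compare hand-counted results from before and after.
# Usage: compare_runs <misses_before> <false_pos_before> <misses_after> <false_pos_after>
compare_runs() {
  if [ "$3" -gt "$1" ] || [ "$4" -gt "$2" ]; then
    echo "regressed"      # more misses or more false positives than before
  elif [ "$3" -lt "$1" ] || [ "$4" -lt "$2" ]; then
    echo "improved"       # at least one count dropped and neither rose
  else
    echo "unchanged"
  fi
}
```

For example, going from 2 misses and 0 false positives to 0 misses and 1 false positive counts as a regression under this reading, because the skill started firing out of scope.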

If it regressed, roll back the evolution:

selftune evolve rollback \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md

Step 8: Publish when the loop is complete

For a draft package, the package-first ship path is:

selftune publish --skill-path summarize-issues

For an already-existing skill that is not being authored as a draft package, the older evolve path still works:

selftune evolve \
  --skill summarize-issues \
  --skill-path summarize-issues/SKILL.md \
  --with-baseline

Step 9: Repeat until the skill is boring

Good skills become boring:

The description reliably catches real phrasing
The eval set stays mostly green
The skill health stays stable over time

Once you reach that point, you can automate the loop:

selftune cron setup

Or run the full loop directly:

selftune run --skill summarize-issues

What to improve next

If the skill still struggles, use the symptom to choose the next guide:

Symptom	Read next
You are not sure what is broken yet	Troubleshooting Skills
It misses real user phrasing	Writing Effective Descriptions
It triggers inconsistently	Testing Skill Triggers
The skill file is getting bloated	Structuring Large Skills
The agent forgets instructions mid-session	Managing Context
Repeated steps are mechanical	Using Scripts in Skills

After your first skill

Once one skill is working, the next level is operating it as a system:

Watch health over time with The Iteration Loop
Publish it with Publishing and Sharing Skills
Use selftune Cloud if you want team or contributor-signal feedback
