Create, Test, and Deploy a Skill

What this guide covers

Use this when you want one path from idea to shipped skill:

Create -> verify -> publish -> watch

It combines the Agent Skills authoring basics with selftune’s lifecycle-first creator flow, so you can answer four questions before you publish:

Is the skill scoped correctly?
Does it trigger on the right prompts?
Does it add value compared with no skill?
Can I deploy it without guessing?

This guide follows the current package-first flow:

Draft package -> verify -> fill missing evidence -> publish -> watch

If you only want the command reference, see selftune create. If you want the lighter introductory version first, start with Build and Improve Your First Skill.

Step 1: Pick a coherent skill boundary

Start with one unit of work the agent would otherwise get wrong or do inconsistently. Good candidates:

A reusable workflow with domain-specific context
A task with recurring trigger misses
A job that mixes judgment with a few deterministic steps

Bad candidates:

A vague bucket like “do engineering work”
A tiny one-line trick that the base agent already handles well
A grab-bag of unrelated tasks

Rule of thumb:

Put routing cues in the description
Put ordered execution in workflows/
Put durable context in references/
Put deterministic mechanics in scripts/ or tools

Step 2: Create the draft package

Start with selftune create init if you already know the skill you want to build:

selftune create init \
  --name "Summarize Issues" \
  --description "Use when the user wants issue trackers, support threads, or bug reports summarized into an engineering brief."

If you would rather bootstrap from a workflow that repeats in your telemetry, use:

selftune create scaffold --from-workflow 1 --write

Both commands produce the same package shape:

summarize-issues/
├── SKILL.md
├── workflows/default.md
├── references/overview.md
├── scripts/
├── assets/
└── selftune.create.json

SKILL.md is the router. workflows/default.md is the first execution path. references/overview.md is durable background context. selftune.create.json records the package metadata selftune uses for readiness and package replay. Start with a small router:

---
name: summarize-issues
description: >
  Use this skill when the user wants bug reports, support threads, or issue
  trackers summarized into an engineering brief with key failures, reproduction
  clues, and next steps. Do not use it for writing issue templates, backlog
  prioritization, or fixing the bug itself.
compatibility: Works with standard shell tools only.
---

Keep the description focused on when to use the skill, not how it works internally.
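You can sanity-check the frontmatter locally before running any selftune command. The sketch below is a minimal, stdlib-only check. It assumes the Agent Skills naming rules are as follows: the name uses only lowercase letters, digits, and single hyphens, is at most 64 characters, and matches the directory name; the description is non-empty and at most 1024 characters. Confirm these against the current specification. The helpers `parse_frontmatter` and `check_frontmatter` are hypothetical. The parser only understands flat keys and folded `>` blocks, and selftune's own validation remains the authority.

```python
from __future__ import annotations

import re

# Assumed constraints, paraphrased from the Agent Skills specification;
# confirm them against the current spec before relying on this check.
NAME_RE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")  # no leading, trailing, or doubled hyphens
MAX_NAME = 64
MAX_DESCRIPTION = 1024


def parse_frontmatter(text: str) -> dict[str, str]:
    """Minimal parser: flat `key: value` pairs and folded `key: >` blocks only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening ---")
    fields: dict[str, str] = {}
    key, folded = None, []
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if key is not None and (line.startswith((" ", "\t")) or not line.strip()):
            folded.append(line.strip())  # continuation line of a folded block
            continue
        if key is not None:  # a new top-level key ends the folded block
            fields[key] = " ".join(part for part in folded if part)
            key, folded = None, []
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        if value.strip() == ">":
            key = name.strip()
        else:
            fields[name.strip()] = value.strip()
    else:
        raise ValueError("missing closing ---")
    if key is not None:  # folded block ran right up to the closing ---
        fields[key] = " ".join(part for part in folded if part)
    return fields


def check_frontmatter(skill_md: str, dir_name: str) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    fields = parse_frontmatter(skill_md)
    name = fields.get("name", "")
    description = fields.get("description", "")
    problems = []
    if not NAME_RE.match(name) or len(name) > MAX_NAME:
        problems.append(f"invalid name {name!r}")
    if name != dir_name:
        problems.append(f"name {name!r} does not match directory {dir_name!r}")
    if not description:
        problems.append("empty description")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description is {len(description)} chars (max {MAX_DESCRIPTION})")
    return problems


SKILL_MD = """---
name: summarize-issues
description: >
  Use this skill when the user wants bug reports, support threads, or issue
  trackers summarized into an engineering brief.
compatibility: Works with standard shell tools only.
---
"""

print(check_frontmatter(SKILL_MD, "summarize-issues"))  # []
```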

Step 3: Put the right detail in the right file

Keep the always-loaded instructions lean. Use the rest of the directory intentionally:

workflows/ for the main path once the skill is selected
references/ for checklists, taxonomies, schemas, or examples the agent should load on demand
scripts/ for exact mechanics the agent should execute instead of reinventing
assets/ for templates, static examples, or config snippets

That split matters because agents load only the metadata (name and description) up front. They read the full SKILL.md when the skill activates and open support files only when they need them.
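The toy Python model below makes the three loading tiers concrete. `SkillContext` and its methods are hypothetical and do not correspond to any agent runtime's API. The model only shows the order in which content enters context.

```python
from __future__ import annotations

import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class SkillContext:
    """Toy model of progressive loading; hypothetical, not an agent runtime API."""

    root: Path
    loaded: list[str] = field(default_factory=list)

    def metadata(self) -> str:
        # Tier 1 (always): only the frontmatter between the first two '---' lines.
        frontmatter = (self.root / "SKILL.md").read_text().split("---", 2)[1]
        self.loaded.append("metadata")
        return frontmatter.strip()

    def activate(self) -> str:
        # Tier 2 (on activation): the full SKILL.md router.
        self.loaded.append("SKILL.md")
        return (self.root / "SKILL.md").read_text()

    def load_support(self, relative: str) -> str:
        # Tier 3 (on demand): files under workflows/, references/, scripts/, assets/.
        self.loaded.append(relative)
        return (self.root / relative).read_text()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "workflows").mkdir()
    (root / "SKILL.md").write_text("---\nname: summarize-issues\n---\nRouter body\n")
    (root / "workflows" / "default.md").write_text("1. Collect the reports\n")
    ctx = SkillContext(root)
    ctx.metadata()
    ctx.activate()
    ctx.load_support("workflows/default.md")

print(ctx.loaded)  # ['metadata', 'SKILL.md', 'workflows/default.md']
```

A skill that never activates costs only its frontmatter. That is why the description should carry the routing signal and the detail belongs in the support files.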

Step 4: Check the package before generating evals

Use the draft-aware status and verify commands first:

selftune create status --skill-path .agents/skills/summarize-issues
selftune verify --skill-path .agents/skills/summarize-issues

create status is the fast local view. verify applies the same readiness contract as create check and, once the draft is actually ready, also emits the measured package report. At this point you want to confirm:

the package structure is complete enough to validate
the entry workflow exists
the description is specific enough to route
the next missing artifact is clear
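A sketch like the following can run a quick local pre-check of the first two items before you call selftune. It assumes the file layout shown in Step 2, in particular that the entry workflow is workflows/default.md. `missing_entries` is a hypothetical helper. selftune create status and selftune verify remain the source of truth for readiness.

```python
from __future__ import annotations

import tempfile
from pathlib import Path

# Assumed layout, taken from the package tree shown in Step 2.
EXPECTED_FILES = (
    "SKILL.md",
    "workflows/default.md",  # entry workflow
    "references/overview.md",
    "selftune.create.json",
)


def missing_entries(package: Path) -> list[str]:
    """List expected files that are not present yet (hypothetical helper)."""
    return [rel for rel in EXPECTED_FILES if not (package / rel).is_file()]


with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp) / "summarize-issues"
    (package / "workflows").mkdir(parents=True)
    (package / "SKILL.md").write_text("---\nname: summarize-issues\n---\n")
    (package / "workflows" / "default.md").write_text("# Default workflow\n")
    report = missing_entries(package)

print(report)  # ['references/overview.md', 'selftune.create.json']
```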

Then do a quick manual trigger pass with three kinds of prompts:

should-trigger prompts
should-not-trigger near misses
realistic prompts with file paths, context, and messy phrasing
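To keep the manual pass honest, write down each prompt with its expected outcome, then record what actually happened. The sketch below assumes you fill in `OBSERVED` by hand after trying each prompt in your agent. `TriggerCase` and `review` are hypothetical names. The prompts are illustrative and drawn from the summarize-issues description.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerCase:
    prompt: str
    kind: str  # "should_trigger", "near_miss", or "realistic"
    expect_trigger: bool


def review(cases: list[TriggerCase], observed: dict[str, bool]) -> dict[str, list[str]]:
    """Compare expectations with what you saw when running each prompt by hand."""
    return {
        # Expected to fire but did not: the router may be too narrow.
        "missed": [c.prompt for c in cases if c.expect_trigger and not observed[c.prompt]],
        # Fired when it should not have: the router may be too broad.
        "false_fires": [c.prompt for c in cases if not c.expect_trigger and observed[c.prompt]],
    }


CASES = [
    TriggerCase("Summarize these bug reports into an engineering brief", "should_trigger", True),
    TriggerCase("Write an issue template for our repo", "near_miss", False),
    TriggerCase("Prioritize the backlog for next sprint", "near_miss", False),
    TriggerCase("skim support/tickets/*.md and tell eng what's actually broken", "realistic", True),
]

# Filled in by hand after trying each prompt in your agent.
OBSERVED = {
    CASES[0].prompt: True,
    CASES[1].prompt: True,
    CASES[2].prompt: False,
    CASES[3].prompt: False,
}

print(review(CASES, OBSERVED))
```

Here the issue-template near miss fired, which points to a router that is too broad. The messy realistic prompt did not fire, which points to one that is also too narrow for real phrasing.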

If the router is obviously too broad or too narrow, fix it now.

Step 5: Generate your first eval set

If the skill already has real usage data:

selftune eval generate --skill summarize-issues

If the skill is new or cold-start:

selftune eval generate \
  --skill summarize-issues \
  --auto-synthetic \
  --skill-path .agents/skills/summarize-issues/SKILL.md

This creates the routing eval set and saves the canonical copy under:

~/.selftune/eval-sets/summarize-issues.json

After generating evals, rerun:

selftune verify --skill-path .agents/skills/summarize-issues

The package should now move from needs_evals to needs_unit_tests.

Step 6: Add skill-level unit tests

Generate or run deterministic tests for the workflow itself:

selftune eval unit-test \
  --skill summarize-issues \
  --generate \
  --skill-path .agents/skills/summarize-issues/SKILL.md

This covers the “once it triggers, does it do the job correctly?” part of the loop. The latest test run summary is stored under:

~/.selftune/unit-tests/summarize-issues.last-run.json

Run selftune verify --skill-path ... again after the suite is generated or recorded.
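Generated suites are the intended path, but it helps to know what a deterministic check on the workflow looks like. For illustration only, this sketch assumes the summarize-issues workflow produces a Markdown brief with the three parts named in its description: key failures, reproduction clues, and next steps. `missing_sections` is hypothetical and does not reflect the format selftune eval unit-test generates.

```python
from __future__ import annotations

import re

# Assumed output contract, taken from the skill's description.
REQUIRED_SECTIONS = ("Key failures", "Reproduction clues", "Next steps")


def missing_sections(brief: str) -> list[str]:
    """Return required headings that the brief does not contain (case-insensitive)."""
    headings = {
        h.strip().lower() for h in re.findall(r"^#+\s*(.+)$", brief, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]


SAMPLE_BRIEF = """# Key failures
- Login returns 500 after a token refresh
# Reproduction clues
- Only seen with expired refresh tokens
# Next steps
- Add a regression test around token refresh
"""

print(missing_sections(SAMPLE_BRIEF))  # []
print(missing_sections("# Key failures\n- crash on save\n"))  # ['Reproduction clues', 'Next steps']
```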

Step 7: Prove the package with replay validation

For a new draft package, use the package-aware replay path instead of a generic evolve dry-run:

selftune create replay \
  --skill-path .agents/skills/summarize-issues \
  --mode package

This stages the whole package, not just the router text, so the replay run can read:

workflows/default.md
references/overview.md
other package-local files the skill needs during execution

Use --mode routing only if you intentionally want to isolate the routing layer.
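You can think of the difference between the two modes as a question of which files the replay run may read. This is a hypothetical sketch: `files_to_stage` is not a selftune API. It assumes routing mode exposes only SKILL.md while package mode exposes every file in the package directory.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def files_to_stage(package: Path, mode: str) -> list[str]:
    """Hypothetical model of what each replay mode lets the run read."""
    if mode == "routing":
        # Assumption: routing mode isolates the router, i.e. SKILL.md alone.
        return ["SKILL.md"]
    if mode == "package":
        # Package mode: every file in the package, including workflows/ and references/.
        return sorted(
            p.relative_to(package).as_posix() for p in package.rglob("*") if p.is_file()
        )
    raise ValueError(f"unknown mode: {mode!r}")


with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp)
    for rel in ("SKILL.md", "workflows/default.md", "references/overview.md"):
        (pkg / rel).parent.mkdir(parents=True, exist_ok=True)
        (pkg / rel).write_text("placeholder\n")
    routing = files_to_stage(pkg, "routing")
    package = files_to_stage(pkg, "package")

print(routing)  # ['SKILL.md']
print(package)  # ['SKILL.md', 'references/overview.md', 'workflows/default.md']
```

If a replay in routing mode looks fine but package mode fails, the problem most likely sits in the workflow or reference files rather than the description.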

Step 8: Measure the no-skill baseline

Record whether the skill actually adds value versus doing nothing:

selftune create baseline \
  --skill-path .agents/skills/summarize-issues \
  --mode package

That baseline is what lets selftune say “this skill helped” instead of only “this skill triggered.” At this point, selftune verify --skill-path ... should report publish as the next lifecycle action.
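One way to picture the comparison a baseline enables is as a paired pass-rate difference over the same cases, run once with the skill and once without. This is an illustrative metric, not selftune's documented computation. `compare`, the case ids, and the pass/fail values are made up.

```python
from __future__ import annotations


def compare(with_skill: dict[str, bool], without_skill: dict[str, bool]) -> tuple[float, list[str]]:
    """Return (pass-rate lift, regressed case ids) over cases run both ways."""
    shared = sorted(with_skill.keys() & without_skill.keys())
    if not shared:
        raise ValueError("no cases were run in both conditions")

    def rate(runs: dict[str, bool]) -> float:
        return sum(runs[c] for c in shared) / len(shared)

    # Cases the agent solved on its own but fails once the skill is loaded.
    regressions = [c for c in shared if without_skill[c] and not with_skill[c]]
    return rate(with_skill) - rate(without_skill), regressions


WITH_SKILL = {"a": True, "b": True, "c": True, "d": False, "e": True}
NO_SKILL = {"a": False, "b": True, "c": False, "d": True, "e": False}

lift, regressed = compare(WITH_SKILL, NO_SKILL)
print(f"lift={lift:.2f} regressions={regressed}")  # lift=0.40 regressions=['d']
```

A lift near zero means the skill triggers without helping. Any entry in the regression list is a case the skill made worse, so read those before you publish.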

Step 9: Publish the draft package

Once evals, unit tests, replay validation, and baseline are all in place, ship through the lifecycle surface:

selftune publish \
  --skill-path .agents/skills/summarize-issues

This is the recommended ship command for new draft packages because it:

blocks if the draft is not ready
reuses the same measured package evaluation contract you saw during verify
starts watch by default unless you pass --no-watch

If you want another review pass first, rerun create replay or create baseline, inspect the dashboard skill report, and publish only after the draft loop is green.

Step 10: Watch the deployed skill

If you passed --no-watch to publish, start monitoring explicitly:

selftune watch --skill summarize-issues --skill-path .agents/skills/summarize-issues/SKILL.md

Or let the broader loop manage it:

selftune run --skill summarize-issues

The local dashboard and selftune status now expose this flow directly:

missing package resources
spec validation not yet run
missing evals
missing unit tests
missing replay validation for the package
missing baseline
ready to publish
already deployed and under watch
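The order of that list works as a gate sequence: the first missing piece of evidence determines the next action. Here is a minimal sketch, assuming the checks run in the order listed. The labels needs_evals and needs_unit_tests come from Step 5. `Evidence`, `next_state`, and the other state labels are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical evidence flags; the real readiness data lives in selftune."""

    package_complete: bool = False
    spec_validated: bool = False
    evals: bool = False
    unit_tests: bool = False
    replay: bool = False
    baseline: bool = False
    published: bool = False


# Checked in the order listed above. Only needs_evals and needs_unit_tests
# appear in this guide; the other labels are invented for the sketch.
GATES = (
    ("package_complete", "needs_package_resources"),
    ("spec_validated", "needs_spec_validation"),
    ("evals", "needs_evals"),
    ("unit_tests", "needs_unit_tests"),
    ("replay", "needs_replay_validation"),
    ("baseline", "needs_baseline"),
)


def next_state(ev: Evidence) -> str:
    for flag, state in GATES:
        if not getattr(ev, flag):
            return state  # the first missing piece of evidence blocks the rest
    return "watching" if ev.published else "ready_to_publish"


after_evals = Evidence(package_complete=True, spec_validated=True, evals=True)
print(next_state(after_evals))  # needs_unit_tests
```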

The dashboard is especially useful for new packages. Draft skills appear there before they have any live telemetry, and the skill report shows package-local create readiness.

Step 11: Publish and share it

When the skill is stable, distribute it through the Agent Skills ecosystem:

npx skills add your-org/your-skill

If you want feedback as the creator after launch, bundle the creator-directed contribution config with the skill:

selftune creator-contributions enable --skill summarize-issues --creator-id <cloud-user-uuid>

That makes it possible to collect privacy-safe contributor signals after launch.

Deployment checklist

Ship only when all of these are true:

the description explains when to use the skill, not how to implement it
nearby negative examples do not trigger
the package passes selftune verify
workflows/, references/, and scripts/ each have a clear purpose
the skill validates against the Agent Skills package rules
evals exist
unit tests exist
replay evidence exists for the package
the no-skill baseline exists
the publish command has been reviewed

Read next

Goal	Guide
Learn the introductory version first	Build and Improve Your First Skill
Tune the router	Writing Effective Descriptions
Test trigger boundaries harder	Testing Skill Triggers
Package it for other users	Publishing and Sharing Skills
Operate the loop continuously	The Iteration Loop
