Changelog

This page tracks user-facing selftune changes in a format that is easier to scan than raw commit history. Subscribe to the RSS feed at docs.selftune.dev/changelog/rss.xml, or browse packaged artifacts and compare links in GitHub releases. Matching OSS release tags are enriched from the corresponding entry on this page.

Tags on this page use a fixed taxonomy so filters stay stable over time: Cloud, CLI, Platforms, OSS, Dashboard, Registry, Billing, Community, and Breaking change.

2026-07-15

OSS, CLI, Dashboard, Registry

OSS adopts one runtime owner and a one-container self-host

The same compiled selftune binary now owns daemon startup, its authenticated durable manifest, and launchd, systemd user, or Windows Task Scheduler registration. The desktop app is a thin host for that runtime: it monitors health, performs bounded recovery, exposes restart and diagnostics actions, preserves a backup before resetting runtime state, and keeps the recovery screen available when startup fails. Runtime manifests now distinguish the CLI or desktop owner from desktop-child, OS-service, or unsupervised lifecycle authority. The app preserves CLI foreground daemons and equal, newer, or unversioned services; it upgrades only a provably older registered service, while authenticated desktop children orphaned by an app crash are reclaimed.

Signed releases still update through GitHub Releases. Packaged runtimes are copied out of transient install media into a versioned stable path only after their signed-source manifest verifies. The packaged-app release proof launches the final Electron bundle, exercises the authenticated Settings dashboard and preload bridge, verifies the stable runtime copy, and confirms graceful child cleanup, including quit during an in-flight startup. Installed Unix runtimes must remain executable as well as byte-identical to the signed source. CLI and desktop versions advance together, and release tags must resolve to the exact source commit before any artifact is published. Bot-authored Version Packages pull requests explicitly dispatch their required checks when no separate release token is configured, so protected branches do not stall the release loop. Release intent now enters through a workspace-visible package before the public CLI and desktop versions are stamped together, and CI runs each Effect suite with its native test runner. Cross-platform desktop proofs also use native temporary paths and a Linux-only sandbox waiver confined to the unpacked smoke process. Windows verification now treats Unix permission bits as a POSIX-only contract and exercises service definitions with host-native paths. Remote Library URL normalization and email-like invocation detection use linear scans so untrusted input cannot trigger regex backtracking. Public desktop and self-host builds now pin Bun 1.3.14. The Windows runtime is cross-compiled on Linux, passed to the Windows packaging job as a same-run artifact, and then exercised on Windows, avoiding the upstream Windows-host compiler crash without weakening the platform smoke.

The public repository now uses explicit Executor-style package ownership: apps/cli composes commands, apps/local owns the daemon and HTTP host, harness integrations live in independent packages, and cross-harness sync lives in an orchestration package. Existing npm and hook paths remain as compatibility shims. Source sync is provided through a typed Effect service and live Layer, so reusable runtime science no longer reaches into platform adapters through hidden imports. Inside the local host, dashboard workflows now run through a scoped Effect service with typed failures and deterministic disposal. Authentication, live events, SPA transport, protocol validation, and SQLite-backed read routes have independent owners, leaving the Bun server as a small process composition root instead of a second application layer. The release packer flattens shared Effect and Zod runtimes into the root manifest and normalizes bundled workspace links before npm snapshots the tree, keeping the artifact within its enforced size and file-count budgets.

Electron supervision now follows the same ownership model internally. A scoped Effect runtime serializes restarts, service toggles, resets, recovery, update preparation, and shutdown instead of dropping a request while another transition is active. Health probes and child-exit events are tied to their connection generation, authenticated window replacement is staged before it becomes active, native tray payloads are schema-decoded, and disposable IPC, updater, tray, session, and monitor resources are finalized by their owners. Packaged builds now compile internal workspace TypeScript into the Electron main bundle instead of asking Node to execute source files from node_modules.

The local dashboard now uses a quieter application shell centered on Library, Projects, Insights, and Settings. A shared command palette makes skills and destinations searchable from the sidebar, inventory and project controls use denser native primitives, and the desktop bridge can open verified skill directories directly in Finder without exposing arbitrary file paths.

Dashboard pages now share one visual scaffold across Library, Projects, Insights, Settings, and Status. Library prioritizes essential columns on narrower windows, Insights groups secondary review actions, loading states mirror their finished layouts, and an available dashboard build appears as an Update action beside the sidebar version instead of a floating alert.

Remote Library account tokens now use the macOS Keychain, Linux Secret Service, or Windows Credential Manager when available. Existing plaintext configuration migrates automatically to a credential reference; headless environments keep an explicit owner-only file fallback.

OSS SelfTune now includes a production one-container self-host for immutable backups, multi-device access, and recipient-scoped private sharing. The non-root image runs the canonical dashboard and cloud-compatible Remote Library API with tenant-scoped SQLite and content-addressed objects in one /data volume. Snapshot heads use transactional compare-and-swap, account tokens are hashed at rest, raw transcripts never sync, and coordinated backup instructions cover both SQLite and object content. Release automation builds linux/amd64 and linux/arm64 images for GitHub Container Registry, smokes the immutable per-platform digests under a non-root read-only container, and moves version or latest references only after the exact candidate passes. The self-hosted web Library and Skill Sets views read the authenticated admin organization’s remote snapshot, with Skill Set mutations left to trusted synced devices. Rollback receipts also recognize when a filesystem reuses an old device and inode, so SelfTune preserves a replacement package instead of treating it as the materialization originally created by a Skill Set apply.
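The service-preservation rule in this entry (keep equal, newer, or unversioned services; upgrade only a provably older one) can be sketched as a small decision function. This is an illustration only: the `RegisteredService` shape, the function names, and the strict `major.minor.patch` parsing are assumptions, not the actual runtime manifest schema.

```typescript
// Hypothetical shape; the real runtime manifest is not shown in this changelog.
type RegisteredService = { version?: string }; // undefined means unversioned
type Decision = "preserve" | "upgrade";
type Triple = [number, number, number];

// Parse "major.minor.patch"; anything else is treated as not provable.
function parseVersion(v: string): Triple | null {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Negative when a < b, zero when equal, positive when a > b.
function compareVersions(a: Triple, b: Triple): number {
  const pairs: Array<[number, number]> = [
    [a[0], b[0]],
    [a[1], b[1]],
    [a[2], b[2]],
  ];
  for (const [x, y] of pairs) {
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

// Upgrade only when both versions parse and the registered one is strictly older.
function decideServiceAction(registered: RegisteredService, bundled: string): Decision {
  const bundledParsed = parseVersion(bundled);
  const registeredParsed = registered.version ? parseVersion(registered.version) : null;
  if (!bundledParsed || !registeredParsed) return "preserve";
  return compareVersions(registeredParsed, bundledParsed) < 0 ? "upgrade" : "preserve";
}
```

Defaulting to "preserve" whenever a version cannot be parsed is one way to read "provably older": the absence of proof never triggers a replacement.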

2026-07-15

OSS, Dashboard, Registry, Cloud

Remote Library backups become complete and privately shareable

Remote Library now stores one canonical immutable revision for a skill even when that revision appears in several locations or harnesses. Skill Set sync includes every pinned revision in the same snapshot, so a restored Set can no longer reference a package that was never uploaded. Evaluated release authority remains attached to the canonical package when available, and old released_skill snapshots remain restorable.

Desktop Settings can grant a skill or Skill Set revision privately to an existing SelfTune user. A Skill Set grant is expanded and validated on the server, recipients must explicitly accept it, and import copies every immutable object into the recipient organization before advancing its snapshot. Senders can revoke outstanding grants, expiration is enforced, and create, accept, import, and revoke actions enter the audit log. Device API keys never act as share links.

The supervised desktop service now performs Remote Library sync shortly after startup and every four hours while it remains configured. Raw transcripts still never enter the backup or sharing protocol.
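As a rough illustration of the first paragraph, the sketch below collapses several installed copies into one canonical revision per content hash and checks that every revision pinned by a Skill Set is present in the snapshot. The type and function names are hypothetical, and the content hash is assumed to be computed elsewhere.

```typescript
// Hypothetical shapes for illustration; not the actual Remote Library schema.
type InstalledCopy = { skill: string; contentHash: string; location: string };
type CanonicalRevision = { skill: string; contentHash: string; locations: string[] };

// Collapse copies that share a skill name and content hash into one revision.
function canonicalize(copies: InstalledCopy[]): CanonicalRevision[] {
  const byKey = new Map<string, CanonicalRevision>();
  for (const copy of copies) {
    const key = `${copy.skill}@${copy.contentHash}`;
    const existing = byKey.get(key);
    if (existing) {
      existing.locations.push(copy.location);
    } else {
      byKey.set(key, { skill: copy.skill, contentHash: copy.contentHash, locations: [copy.location] });
    }
  }
  return Array.from(byKey.values());
}

// Return pinned hashes that the snapshot does not contain; a non-empty result
// means a restored Set would reference a package that was never uploaded.
function missingPins(pinnedHashes: string[], snapshot: CanonicalRevision[]): string[] {
  const present = new Set(snapshot.map((r) => r.contentHash));
  return pinnedHashes.filter((hash) => !present.has(hash));
}
```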

2026-07-15

OSS, Dashboard

Desktop adds supervised background service and native auto-update

Installed macOS builds can now keep SelfTune’s authenticated local service running independently of the Electron window and menu-bar process. An owner-scoped LaunchAgent starts at login, restarts only after failures, keeps its bearer token out of the plist, and is repaired when an app update changes the bundled sidecar. SelfTune falls back to a managed child if supervision is unavailable, and the menu bar can enable or disable the service explicitly.

Signed desktop releases now use GitHub Release update manifests. The app checks on launch and every four hours, downloads updates in the background, shows progress in the menu bar, and asks for an explicit restart after the update is staged. The release workflow publishes blockmaps and merges macOS arm64 and x64 metadata so each Mac receives the correct build.

The Library harness column now uses compact logo-only indicators with accessible hover and focus labels. Pi receives a larger internal treatment so its tall mark has the same apparent weight as the other harness logos.

Library loading now uses cached source-update metadata for the first render and refreshes GitHub state in the background. Repeated harness links share a single package hash per scan, while portfolio combination analysis avoids processing unrelated query logs when it only needs positive examples. The native shell also uses a compact summary endpoint instead of downloading the full local analytics history on every page.
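For readers unfamiliar with launchd, the LaunchAgent behavior described above maps onto two standard keys: `RunAtLoad` starts the job when the agent is loaded at login, and `KeepAlive` with `SuccessfulExit` set to false restarts it only after an unsuccessful exit. The sketch below renders such a plist; the label, program path, and function names are placeholders, and how the service actually obtains its bearer token is not described in this changelog, so the sketch simply never writes one.

```typescript
// Placeholder inputs; the real label and sidecar path are not given in this entry.
type AgentSpec = { label: string; programPath: string };

// Escape the characters that are unsafe inside XML text nodes.
function escapeXml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Render a LaunchAgent plist that starts at load and restarts only on failure.
// No credential is ever written into the document.
function renderLaunchAgent(spec: AgentSpec): string {
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">',
    '<plist version="1.0">',
    "<dict>",
    `  <key>Label</key><string>${escapeXml(spec.label)}</string>`,
    "  <key>ProgramArguments</key>",
    `  <array><string>${escapeXml(spec.programPath)}</string></array>`,
    "  <key>RunAtLoad</key><true/>",
    "  <key>KeepAlive</key>",
    "  <dict><key>SuccessfulExit</key><false/></dict>",
    "</dict>",
    "</plist>",
  ].join("\n");
}
```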

2026-07-15

OSS, CLI, Dashboard, Registry, Platforms

SelfTune becomes a local-first skill control plane

The native app now organizes SelfTune around Library, Projects, Insights, and Settings. Library reconciles duplicate installations, immutable revisions, drafts, cached packages, and reversible archives across Claude Code, Codex, OpenCode, OpenClaw, and Pi. Project Skill Sets can target all five harnesses, preview every materialized path, block conflicts before mutation, prefer links to a verified cache, and roll back only receipt-owned paths.

Insights adds a local synthesis inbox for repeated successful coverage gaps and stable ordered skill combinations. Candidate strength includes independent-session support, project diversity, temporal recurrence, outcome quality, marginal co-usage lift, sequence consistency, confidence, and uncertainty. Supporting and held-out sessions are separated before draft creation. Accepted candidates create drafts with workflow, provenance, and positive, negative near-neighbor, boundary, and execution eval cases; they do not publish, install, merge, or archive anything automatically. Explicit evaluation binds package validation, replay, routing, no-skill and existing skill baselines, held-out lift, and regression checks to the candidate, evidence snapshot, and immutable draft hash. Release is blocked if that draft changes or the gate does not recommend it.

Library inventory rows now show the installed source, global or project location, configured harnesses, trusted last invocation, package modification time, and upstream update state. GitHub update badges are backed by install lock metadata and concurrency-bounded repository tree checks; local and unsupported sources are labeled as untracked or unavailable instead of being guessed.

Source checks now reuse a six-hour disk cache and authenticate with GITHUB_TOKEN, GH_TOKEN, or the active GitHub CLI session before consuming anonymous API quota. The Library detail panel exposes every concrete path, scope, harness, source, revision, last use, and modification time. GitHub updates are previewed against the recorded upstream tree, block on local changes by default, and require an explicit replace action that retains a complete backup and update receipt. Inventory discovery also vendors all 73 placement definitions from the upstream vercel-labs/skills registry. This expands where SelfTune can find skills without claiming telemetry support for agents that do not yet have a SelfTune adapter.

Remote Library is now an optional deployment-neutral backup and multi-device protocol for immutable packages, selected drafts, Skill Sets, metadata, and decision history. Raw transcripts remain local and are not a supported sync artifact. A sync preview lists exact artifacts and byte counts; selected draft provenance uses pseudonymous session identifiers and review reasons are redacted before the remote boundary. Versioned Skill Sets can also be derived from an existing project and exported as portable checked-in manifests with no device paths or credentials. The same authenticated API runs in SelfTune Cloud or in the optional OSS self-host. The shipped self-host uses tenant-scoped SQLite and immutable filesystem objects in one volume, with bootstrap accounts, health checks, integrity diagnostics, and coordinated backup and restore instructions.
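The source-check behavior above combines a freshness window with a credential lookup. A minimal sketch follows, assuming the credentials are tried in the order they are listed (the changelog does not state a precedence explicitly) and that the GitHub CLI session token is obtained by the caller; the function names are illustrative.

```typescript
// Six hours, matching the disk-cache window named in this entry.
const SOURCE_CHECK_TTL_MS = 6 * 60 * 60 * 1000;

type Env = Record<string, string | undefined>;

// Assumed order: GITHUB_TOKEN, then GH_TOKEN, then a token from the active
// GitHub CLI session (passed in by the caller). Undefined means anonymous.
function resolveGitHubToken(env: Env, ghCliToken?: string): string | undefined {
  return env["GITHUB_TOKEN"] || env["GH_TOKEN"] || ghCliToken || undefined;
}

// A cached check is reusable while it is younger than the TTL.
// A timestamp in the future is treated as stale rather than trusted.
function isCacheFresh(checkedAtMs: number, nowMs: number): boolean {
  const age = nowMs - checkedAtMs;
  return age >= 0 && age < SOURCE_CHECK_TTL_MS;
}
```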

2026-07-14

CLI, OSS

OSS adds evidence-aware skill portfolio cleanup

The OSS CLI now inventories installed skills even when they have no usage records, distinguishes missing evidence from measured inactivity, and recommends keeping, measuring, repairing, consolidating, or reviewing a package for quarantine. Approved quarantine operations move complete packages outside active registries, preserve a local receipt and package hash, and return an exact restore command. SelfTune and system/admin-managed skills are protected, and no audit result deletes a skill automatically.

The local dashboard now exposes that complete installed inventory directly, including packages with no SelfTune observations, with search, evidence-state filters, explicit quarantine, and receipt-based restore. A new Electron host packages the same dashboard and local API as a native desktop control plane: it supervises a compiled Bun sidecar on authenticated loopback, shares ~/.selftune with the CLI, and keeps the bearer credential out of renderer JavaScript.

Desktop Settings now reports whether Claude Code, Codex, OpenCode, OpenClaw, and Pi are detected and whether their SelfTune integration is actually connected. It also configures the three local automation jobs with guarded human-readable schedule presets, recommended defaults, and native launchd or systemd activation. The desktop bundle includes its own task CLI, so background jobs do not depend on a separate global SelfTune installation.

First-run desktop onboarding now lets a human choose historical import sources, select the detected harnesses where SelfTune should install live hooks, and choose between observability, daily health recommendations, and autonomous improvement. These are executable controls: source choices govern every sync path, feature choices reconcile the native background jobs, and hook choices preserve third-party entries while installing app-bundled native runners. Observability and recommendations are enabled by default; autonomous improvement is explicit opt-in.

On macOS, SelfTune now remains available from the menu bar after its dashboard window closes. The live menu summarizes system health, connected harnesses, and attention items; it can sync sessions immediately, toggle each native automation job without changing its configured frequency, open Settings, status, and logs, and enable Launch at Login in installed builds. Action failures surface as native notifications, while an explicit Quit action shuts down the supervised local sidecar cleanly.

Frequent telemetry sync now imports new harness sessions without rebuilding the complete historical repair overlay every 30 minutes. The daily health job retains the full repair checkpoint before scoring and recommendations.

SelfTune now stores reusable project Skill Sets in a content-addressed local Library. The CLI and native Projects screen can pin installed package revisions, preview Codex and Claude Code project links, block the entire operation on any destination conflict, apply the set idempotently, and roll back only paths owned by the apply receipt. The desktop navigation is reduced to Library, Projects, Insights, and Settings so inventory, distribution, evidence, and configuration each have one predictable home.
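The Skill Set apply semantics in the last paragraph (all-or-nothing on conflicts, idempotent re-apply, receipt-scoped rollback) can be sketched as a planning step that runs before anything is written. The types, function names, and the in-memory view of existing destinations are assumptions made for illustration.

```typescript
// Hypothetical shapes; destinations and targets are plain path strings here.
type Link = { destination: string; target: string };
type Plan =
  | { ok: true; toCreate: Link[]; alreadyApplied: Link[] }
  | { ok: false; conflicts: string[] };

// `existing` maps a destination path to what it currently points at.
// Any destination that points somewhere else blocks the whole operation.
function planApply(links: Link[], existing: Map<string, string>): Plan {
  const conflicts: string[] = [];
  const toCreate: Link[] = [];
  const alreadyApplied: Link[] = [];
  for (const link of links) {
    const current = existing.get(link.destination);
    if (current === undefined) toCreate.push(link);
    else if (current === link.target) alreadyApplied.push(link); // idempotent re-apply
    else conflicts.push(link.destination);
  }
  return conflicts.length > 0 ? { ok: false, conflicts } : { ok: true, toCreate, alreadyApplied };
}

// Rollback touches only paths that the apply receipt recorded as created.
function rollbackPaths(receiptCreated: string[], currentlyPresent: string[]): string[] {
  const owned = new Set(receiptCreated);
  return currentlyPresent.filter((p) => owned.has(p));
}
```

Planning first and mutating second is what makes the conflict check meaningful: a plan with any conflict never reaches the filesystem.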

2026-07-13

Cloud

Cloud qualification adds a sealed known-positive instrument canary

Positive-control qualification now accepts only a versioned operator canary whose current skill, corrected candidate, and 50-case deterministic suite match registered content hashes. The Worker and independent campaign assessor reject missing or drifted control identity before treating results as evidence, so an ordinary generated candidate cannot masquerade as proof that the measurement instrument detects real improvements.
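One way to picture the identity check is a function that compares a submitted canary against the registered one and reports every mismatch. The field names and problem strings below are hypothetical; only the three hashed components and the 50-case suite size come from this entry.

```typescript
// Hypothetical shapes for the control identity described above.
type CanaryHashes = { currentSkill: string; correctedCandidate: string; suite: string };
type Canary = { version: string; caseCount: number; hashes: CanaryHashes };

// Return a list of problems; an empty list means the canary may be used as
// a positive control. Missing identity is rejected outright.
function checkCanary(submitted: Canary | undefined, registered: Canary): string[] {
  if (!submitted) return ["missing control identity"];
  const problems: string[] = [];
  if (submitted.version !== registered.version) problems.push("version drift");
  if (submitted.caseCount !== 50) problems.push("suite is not 50 cases");
  for (const key of ["currentSkill", "correctedCandidate", "suite"] as const) {
    if (submitted.hashes[key] !== registered.hashes[key]) problems.push(`${key} hash drift`);
  }
  return problems;
}
```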

2026-07-13

Cloud

Cloud improve checks model-provider capacity before task-package runs

Cloud improve, discovery, qualification, and sandbox experiments now check the execution provider’s key allowance and account credits before reserving a source or dispatching a workflow. Unavailable or unverifiable capacity returns a retryable service error without dispatching provider work; if database cleanup is interrupted, a queued idempotency retry must repeat admission before it can dispatch. Late provider failures remain excluded from scored evidence. Qualification protocols also seal an operator-declared campaign reserve and both execution ceilings. Scheduling and Worker start require that reserve on a finite, non-resetting execution key; account-credit checks use a separate management key that is never passed to the task-agent sandbox. Sandbox experiments use their own finite execution key, while a global account-capacity lease still serializes all provider consumers because both keys draw from the same credits. Campaign reserves use exact cent arithmetic, reject an unexecutable final repeat, and require prior repeats to pass in order. Duplicate experiment delivery is atomically claimed, stale coordinator leases expire, and exhausted dispatch recovery atomically fences workflow start before releasing both leases. A durable cleanup state retries lease release after crashes or coordinator failures. The zero-work infrastructure control stays runnable during provider outages.
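The reserve arithmetic can be illustrated with integer cents. The sketch below is one reading of "reject an unexecutable final repeat": a campaign is admitted only if the reserve covers the per-repeat ceiling for every declared repeat, including the last. The function names and the decimal-string input format are assumptions.

```typescript
// Parse a decimal amount such as "12.34" or "12" into integer cents without
// going through floating point. Anything else is rejected.
function toCents(amount: string): number {
  const m = /^(\d+)(?:\.(\d{2}))?$/.exec(amount);
  if (!m) throw new Error(`not an exact cent amount: ${amount}`);
  return Number(m[1]) * 100 + Number(m[2] ?? "0");
}

// How many full repeats the reserve can fund at the per-repeat ceiling.
function executableRepeats(reserveCents: number, perRepeatCeilingCents: number): number {
  if (!Number.isInteger(reserveCents) || !Number.isInteger(perRepeatCeilingCents)) {
    throw new Error("amounts must be integer cents");
  }
  if (perRepeatCeilingCents <= 0) throw new Error("ceiling must be positive");
  return Math.floor(reserveCents / perRepeatCeilingCents);
}

// Admit only if every declared repeat, including the final one, is fundable.
function admitCampaign(reserve: string, perRepeatCeiling: string, repeats: number): boolean {
  return executableRepeats(toCents(reserve), toCents(perRepeatCeiling)) >= repeats;
}
```

For example, a reserve of "10.00" with a ceiling of "3.34" funds two full repeats (1000 / 334 rounds down to 2), so a three-repeat campaign would be rejected, while "10.02" funds exactly three.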

2026-07-12

Cloud

Cloud improve separates candidate discovery from qualification evidence

Large outcome suites now fail before entering the ordinary atomic improve path. Operators can instead run a bounded calibration-only discovery that persists candidate archives and diffs without declaring a winner or creating a proposal, then stage the candidate as a detached snapshot for sealed qualification. Qualification checkpoints every task attempt independently and reserves explicit runtime headroom so long repeated trials can resume without discarding earlier evidence.

2026-07-12

CLI, Cloud, Platforms

Skill telemetry now identifies the exact version and triggering prompt

Claude Code hooks now hash the complete installed skill package used by an invocation, persist that version and invocation time through local staging and cloud upload, and bind replayed tool events to the prompt active when they occurred. Historical events without preserved package bytes remain explicitly unversioned and are excluded from causal improvement claims.
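Binding a replayed tool event to "the prompt active when it occurred" amounts to finding the latest prompt at or before the event's timestamp. A minimal sketch, assuming numeric timestamps and hypothetical `Prompt` and `ToolEvent` shapes:

```typescript
// Hypothetical shapes; timestamps are milliseconds since some shared epoch.
type Prompt = { id: string; at: number };
type ToolEvent = { name: string; at: number };
type BoundEvent = ToolEvent & { promptId: string | null };

// Attach each event to the most recent prompt that started at or before it.
// Events that precede every prompt stay unbound (promptId is null).
function bindToPrompts(prompts: Prompt[], events: ToolEvent[]): BoundEvent[] {
  const ordered = prompts.slice().sort((a, b) => a.at - b.at);
  return events.map((event) => {
    let active: string | null = null;
    for (const prompt of ordered) {
      if (prompt.at <= event.at) active = prompt.id;
      else break;
    }
    return { ...event, promptId: active };
  });
}
```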

2026-07-12

Cloud

Cloud task agents can no longer inspect verifier or oracle assets

Task-package runs now remove verifier scripts, oracle data, declared hidden assets, prior answers, and stale rollout artifacts before the task agent starts. The sealed archive is restored only after the agent exits, and newly authored reviewed-trace packages no longer include the previous response.

2026-07-12

Cloud

Cloud improve adds a sealed qualification lane for controlled skill experiments

Cloud operators can preregister immutable current-skill, candidate-skill, and no-skill experiments with randomized arm order, repeated scored trials, and sealed holdouts. Each task agent now hands only its source subtree to a fresh, network-denied verifier sandbox, while the protocol seals the exact Worker version, runtime manifest, model order, locale, and repeat count. Qualification runs cannot create winners, proposals, readiness changes, or apply attempts. Their full evidence is content-addressed in R2, and campaign assessment verifies every stored byte against immutable Neon hashes before it can pass.

2026-07-12

Cloud

Cloud telemetry now uses verified R2 cold storage to control Neon growth

Processed raw telemetry pushes remain available in Neon for seven days, then move to R2 under deterministic keys with SHA-256 evidence. The cloud worker verifies each object before clearing the duplicated hot payload columns, and archived pushes remain replayable through the hosted API.

2026-06-03

Cloud, Dashboard

Cloud experiments now expose SelfTune-native agent trace observability in the dashboard

Completed cloud experiment runs now retain aggregate agent trace evidence, including turn counts, tool-call counts, token totals, timing fields, and final-response status for evaluation review.

The dashboard also has an experiment trace view that shows the sandbox boot snapshot, ordered lifecycle events, category summaries, and expandable raw event payloads without relying on third-party tracing UI.

2026-06-02

Cloud

Cloud sources can now draft and queue cited benchmark-factory arms from uploaded skill snapshots

The hosted API now exposes a source-scoped benchmark factory draft endpoint. It reads the current skill snapshot, extracts cited contract requirements from the skill body, proposes distinct scenario candidates, and returns deterministic verifier proposals with explicit draft-only promotion blockers.

The hosted API can also review completed sandbox arm evidence for the same source-scoped benchmark draft, validating verifier proposals against current-skill and no-skill outputs before emitting review-only promotion drafts.

Benchmark-factory drafts can now queue bounded current-skill and no-skill sandbox experiment arms. Each queued arm carries scenario, condition, and batch metadata so completed experiments can be reviewed without losing the baseline identity.

Completed queued arm experiments can now be reviewed by batch or experiment ID, converting persisted sandbox traces back into arm evidence before running verifier validation and discriminating-case filtering.

2026-05-17

Cloud, Dashboard

Cloud V2 adds the first Hono-backed Effect Schema contract endpoint for the Start removal refactor

The hosted API now exposes a Cloud V2 status contract endpoint backed by Hono and Effect Schema. This is the first tracer bullet in the Cloud V2 refactor away from TanStack Start server functions toward API-owned contracts and backend Effect modules.

The Cloud V2 overview page now has a Hono-backed readiness batch endpoint, so the dashboard can replace per-source Start server-function fanout with one API-owned request that returns null for individual missing readiness rows.

Cloud V2 skills inventory and skill detail reads also now have hosted API endpoints, giving the dashboard a Hono-owned read path before the Start BFF layer is removed.

Cloud V2 improvement list/detail reads, skill source-file reads, and skill improve-run queueing now use Hono endpoints backed by Effect services, moving more operator workflows off TanStack Start server functions.

Cloud V2 bundle list/detail reads and GitHub bundle creation now run through Hono endpoints as well, so bundle pages no longer need the Start BFF wrapper.

Cloud V2 GitHub import source creation and imported-skill auto-setup now call hosted Hono endpoints, removing another Start server-function hop from the skill import flow.

Cloud V2 overview loading and readiness advancement now use hosted Hono endpoints backed by API-owned inventory and Effect overview services.

Cloud V2 GitHub settings now load through a hosted Hono read model instead of a dashboard Start server function.

Cloud V2 uploaded skill imports now post directly to the Hono import endpoint, replacing the upload-specific Start route for skill source creation.

Cloud V2 GitHub installation binding now has a hosted Hono endpoint backed by an Effect service, preparing the GitHub App callback flow for the API-owned backend boundary.

Cloud V2 bootstrap now loads from a Hono endpoint, removing the app-local Start/session database bootstrap path from the dashboard shell.

Cloud V2 now builds as a Vite SPA instead of a TanStack Start app. The dashboard no longer ships Start server routes, app-local Neon database code, or BFF adapters; browser routes call the hosted Hono API boundary directly.

Cloud V2 Hono contracts now use concrete Effect Schema DTOs for migrated run, skill, eval, improvement, and readiness payloads, and the SPA decodes key read responses before handing them to route components. Improvement workflow failures now map from tagged Effect service errors instead of route string matching.

The Effect architecture guardrail now covers Cloud V2 Hono routes, service contracts, and browser boundaries, preventing broad Effect imports, runtime Effect usage in the SPA, and unknown response DTOs in migrated contracts.

Cloud V2 dashboard API calls now send Neon Auth JWTs to the Hono backend for local verification, avoiding per-request remote session validation. The overview readiness endpoint also reads readiness rows in one batch, cutting the 26-source readiness request from multi-second fanout to sub-second API latency.

Cloud V2 Hono and bundle paths now avoid redundant type assertions and ambiguous fallback defaults, tightening the Effect-backed API boundary without changing user-facing behavior.

Cloud V2 workflow endpoints now share one Effect result-to-HTTP response helper, reducing repeated route-level try/catch handling across eval, experiment promotion, GitHub import, and bundle creation actions.

Cloud V2 eval routes and run-detail DTOs now share typed request and JSON helpers, reducing repeated route validation while preserving the existing Hono response shapes.

Cloud V2 experiment promotion now records explicit calibration/holdout eval split metadata for reviewed task-package cases. Reviewed traces default to calibration, while the API can intentionally promote a case as holdout for blind regression measurement.
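The readiness batch behavior mentioned in this entry (one request for many sources, with null for sources that have no row) can be sketched as a pure mapping step. The `Readiness` shape and function name are hypothetical, and the real endpoint's response format is not shown here.

```typescript
// Hypothetical row shape; the real readiness DTO is not shown in this entry.
type Readiness = { sourceId: string; state: string };

// Answer every requested source in one pass. Sources without a readiness row
// map to null instead of failing the whole batch.
function batchReadiness(sourceIds: string[], rows: Readiness[]): Record<string, Readiness | null> {
  const bySource = new Map<string, Readiness>(
    rows.map((row): [string, Readiness] => [row.sourceId, row]),
  );
  const out: Record<string, Readiness | null> = {};
  for (const id of sourceIds) {
    out[id] = bySource.get(id) ?? null;
  }
  return out;
}
```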

2026-05-16

Cloud, Dashboard

SelfTune Cloud now includes a public skill validator with safe archive handling and autofix downloads

The cloud dashboard now exposes a public /validate page for uploading Agent Skills packages without signing in. The page returns spec validation, best-practice lint findings, package inspection details, and an autofix download when deterministic corrections are available.

Public validation and autofix uploads run through the shared skill validation library with bounded archive processing, generated temporary paths, typed upload errors, cleanup after each request, and basic per-IP throttling.

The validator can also export an uploaded skill as a Claude Code plugin zip, adding the required .claude-plugin/plugin.json manifest and wrapping the skill under skills/<skill-name>/ for plugin installs.
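The plugin export layout described in the last paragraph can be sketched as a path-mapping step. This shows only where entries land inside the zip; the manifest contents and the actual archive writer are out of scope, and the function name and the safety checks are assumptions.

```typescript
// Map a skill's relative file paths to their locations inside the plugin zip:
// the manifest at .claude-plugin/plugin.json and the skill under skills/<skill-name>/.
function pluginEntries(skillName: string, skillFiles: string[]): string[] {
  // Assumed safety check: the name must be a single path segment.
  if (skillName === "" || skillName === "." || skillName === ".." || /[\\/]/.test(skillName)) {
    throw new Error(`unsafe skill name: ${skillName}`);
  }
  const wrapped = skillFiles.map((file) => {
    // Assumed safety check: no absolute paths or parent traversal in entries.
    if (file.startsWith("/") || file.split("/").includes("..")) {
      throw new Error(`unsafe entry path: ${file}`);
    }
    return `skills/${skillName}/${file}`;
  });
  return [".claude-plugin/plugin.json", ...wrapped];
}
```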

2026-05-14

Cloud, Dashboard

Cloud V2 local smoke now supports dev auth and verifies the improve review loop without production login

Local Cloud V2 development can now enable dashboard dev auth with DEV_AUTH=1 and NEXT_PUBLIC_DEV_AUTH=1. The dev user is seeded with an active alpha enrollment and a pro Dev Org so local smoke tests can exercise the same upload, improve, proposal review, and apply surfaces that production gates expose.

The cloud dashboard smoke now has a local no-auth mode for dev-auth sessions, waits for hydrated Cloud V2 surfaces before asserting, and accepts current proposal handoff copy for applied improvements.

2026-05-12

Cloud, Dashboard, Registry, CLI, Community

Cloud private bundle creation now publishes from GitHub, registry updates are failure-safe, and telemetry projection can be replayed

The V2 bundle publish flow now starts from a connected GitHub installation, repository, and skill path, then runs the private-bundle sync before the bundle is shown as installable.

Initial sync is required for new private GitHub bundles. If GitHub sync, archive generation, manifest validation, or publishing fails, Cloud returns a visible error instead of creating a success-shaped empty registry entry.

Bundle lists and detail pages now distinguish installable releases from bundles that still need a published version, so operators can see whether a bundle has a current version before handing install commands to users.

The CLI registry installer and sync command now verify downloaded archive hashes before extraction, stage updates outside the live skill directory, and swap the full directory only after extraction succeeds. Files removed from a published version are removed locally on the next install or sync, while failed updates leave the previous installed copy intact.

Registry push and rollback persistence now compensates across DB and R2 boundaries: uploaded archives are deleted if DB writes fail, new versions are not marked current until the switch succeeds, and rollback failures restore the previous current-version pointer.

V2 telemetry ingestion now fails visibly when raw payload archival or canonical projection fails, leaving the push in a retryable failed state instead of returning success with empty or missing artifacts. Contributor signal, community bundle, and creator analytics reads are scoped to the authenticated organization so guessed creator IDs cannot cross org boundaries.

Hosted Cloud Improve Worker persistence now follows the same retry-safe shape as the API path for run candidates and search frontier state. Workflow retries can fill or update candidate rows after a terminal run write, retain useful specialists, shadow weaker winners, supersede weaker active frontier entries, and prune stale retained frontier records.

Proposal review routes now accept only approval or rejection. Applied state is reserved for the explicit Cloud Improve apply endpoint, which preserves the required human approval gate and records the actual draft or GitHub PR apply attempt. The hosted API now exposes the same explicit proposal apply endpoint and cloud-run proposal filter as the dashboard API, so production smoke tests can approve and apply Cloud Improve winners without direct database access.

Operator alpha routes now include a projection reconciliation action for V2 telemetry. It revalidates archived canonical push payloads, replays the dashboard projector with per-push source arrays, and marks failed projections as retryable instead of hiding rebuild failures.

The cloud-v2 improvement review page now uses the explicit Cloud Improve apply service after approval and shows recorded draft or GitHub pull request apply attempts, including failed attempts that need operator attention.

The cloud-v2 run list, run detail, and candidate diff reads now load from hosted Hono API endpoints with Effect Schema request contracts, removing more dashboard-only server function reads from the operator path.

Cloud-v2 improvement review, apply, and publish actions now call explicit hosted Hono endpoints backed by Effect action services and stable typed error responses, so the review page no longer depends on Start server functions for proposal mutations.

Cloud-v2 eval suite list/detail/save/smoke flows and experiment detail/promote flows now use hosted Hono endpoints with schema-backed contracts, removing the eval and experiment Start server-function wrappers from the dashboard app.

Cloud-v2 import and bundle creation flows now share the same hosted GitHub source-selection endpoints for installations, repositories, and skill path discovery, reducing duplicated selection logic between the two surfaces.

The Cloud-v2 browser API client now centralizes base URL handling, credentials, JSON request bodies, 404-to-null reads, and stable error message extraction for the migrated Hono endpoints.

Cloud Improve run detail now shows a live progress strip and refreshes active runs automatically, while terminal runs show the final state without polling. Persisted run errors are also shown inline on the run detail page so failed provider, storage, or projection states are visible to operators. Run candidates now link to their proposal review handoff and show the latest draft or GitHub apply attempt state when an apply has been tried. Applied draft proposals can now be published from the improvement review page as the next installable registry version, with a Cloud Improve source ref linking the release back to the reviewed proposal.

Bundle release history now shows installs per version, making adoption visible at the same point operators inspect source refs and release summaries. Bundle detail now separates current-version installs from stale installs so sync adoption is visible before operators optimize the next release.

Release checks now include smoke:cloud-product-lifecycle, a hermetic lifecycle smoke that chains Agent Skills package validation, SkillsBench-style no-skill eval contract checks, distribution, CLI install/sync staging, telemetry replay, Worker result persistence, and approved proposal apply guardrails into one JSON summary.

Bundle detail now exposes installable artifact provenance for the current version, including source type, source ref, repo, archive size, and content hash, with the same source/archive fields visible in release history. Release history also shows aggregate pass rate, eval count, and session count per version so adoption can be read alongside outcomes.
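The review-route restriction in this entry (approval or rejection only, with applied state reserved for the explicit apply endpoint) can be sketched as a narrow input parser. The literal strings "approved" and "rejected" and the function name are assumptions; the actual request contract is not shown in this changelog.

```typescript
// Assumed literals; the real route contract may use different names.
type ReviewDecision = "approved" | "rejected";

// Accept only the two review outcomes. Anything else, including "applied",
// is rejected so the apply step cannot be reached through the review route.
function parseReviewDecision(input: unknown): ReviewDecision {
  if (input === "approved" || input === "rejected") return input;
  throw new Error("review accepts only approval or rejection");
}
```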

2026-04-27

CLI, Cloud, Community

Creator contribution setup now bundles a portable feedback helper for no-CLI downstream signals

selftune creator-contributions enable now writes a generated selftune-feedback.mjs helper and selftune.feedback.json manifest next to the existing selftune.contribute.json config. Downstream agents can use that helper to submit one privacy-safe, creator-directed signal after first-run consent, without requiring the full selftune CLI or a contributor API key. Cloud accepts those helper submissions at POST /api/v1/public/signals and stores them under the creator’s organization. The V2 Cloud registry now applies the same helper automatically when a skill bundle is published or synced from GitHub, with a publish-time checkbox for teams that need to ship the raw skill archive without contributor feedback artifacts.

2026-04-27

Cloud, Breaking change

Cloud eval cases move out of the legacy casesJson blob into the cloud_eval_cases table; the eval-suite smoke endpoint now returns `cases` instead of `casesJson`

The cloud_eval_suites.casesJson blob has been retired. Eval cases now live in a dedicated cloud_eval_cases table that supports per-case identity, lineage, lifecycle, and indexable metadata. The reader cutover and dual-write that landed across slices 91-94 are now collapsed: writers go only to the new table, the EVAL_CASES_TABLE_AUTHORITATIVE feature flag is gone, and the column has been dropped. The wire-shape change to be aware of: the POST /v1/eval-suites/:id/smoke endpoint now returns suite.cases (the resolved case array) instead of the legacy suite.casesJson field. API clients that unpacked casesJson should switch to cases.
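For API clients, the switch from casesJson to cases might look like the sketch below. It assumes a hypothetical `readSuiteCases` helper and assumes the legacy casesJson value was either a JSON string or an already-parsed array; neither the helper nor the exact legacy encoding is confirmed by this entry. The legacy branch only matters for recorded or cached responses, since the endpoint no longer returns casesJson.

```typescript
// Hypothetical view of the `suite` object in the smoke response.
interface SmokeSuitePayload {
  cases?: unknown;     // current field: the resolved case array
  casesJson?: unknown; // legacy field: no longer returned by the endpoint
}

// Prefer `cases`; fall back to the legacy field only for old recorded payloads.
function readSuiteCases(suite: SmokeSuitePayload): unknown[] {
  if (Array.isArray(suite.cases)) return suite.cases;
  if (typeof suite.casesJson === "string") {
    // Assumption: the legacy blob was serialized JSON.
    const parsed: unknown = JSON.parse(suite.casesJson);
    return Array.isArray(parsed) ? parsed : [];
  }
  if (Array.isArray(suite.casesJson)) return suite.casesJson;
  return [];
}
```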

2026-04-26

Cloud, Dashboard

Cloud improve runs now persist explicit search policy controls for candidate budget and mutation surfaces

Cloud improve run creation now accepts and persists search-policy controls, including candidate budget and mutation surfaces, so the control plane and Cloudflare runner execute the same optimization contract the product asked for. This lets trigger-only runs stay focused on trigger-relevant surfaces while outcome-backed task-package suites can intentionally open up body and structure optimization.

2026-04-24

Cloud, Dashboard, Registry

Cloud moves private bundle publishing out of GitHub settings into a dedicated registry flow and trims settings back to connection management

The GitHub settings page no longer carries both installation management and the full private bundle publish composer in the same screen. GitHub settings is now scoped back to installation binding, org-wide write-back defaults, and connected repository sources, while the actual private bundle flow now lives at /registry/new. That new registry flow still uses a compact step-through wizard, but it now sits under the information architecture the user expects: registry publishing under Registry, GitHub access management under Settings.

2026-04-23

Cloud, Dashboard, Registry

Cloud GitHub installs now use a signed start/callback flow so workspace binding survives the GitHub round-trip

The GitHub settings page no longer sends operators straight to a raw GitHub install URL with no workspace context. SelfTune Cloud now starts the flow through a signed install-state redirect, receives the GitHub setup callback on a dedicated callback route, and sends the browser back to the originating workspace with the installation_id intact. That fixes the broken state where the GitHub App could already be installed on an account, but the workspace still had no bound installation row, which left repository selection permanently empty.

2026-04-23

Cloud, Dashboard, Billing

Cloud billing success now reconciles completed Stripe checkout sessions directly instead of waiting indefinitely for webhook-only sync

The billing success screen now uses the returned Stripe Checkout session_id to reconcile the completed subscription directly into the current workspace instead of relying only on a webhook-driven database update. That means local development and delayed webhook cases no longer leave the workspace stuck on the indefinite “Setting up your subscription…” spinner even when Stripe already shows the checkout as paid and trialing. The success page also now falls back to a bounded retry loop with an explicit retry state instead of spinning forever after the sync window is exhausted.
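The bounded retry behavior described above can be sketched as a small state transition. This is an illustration only: `SyncState`, `advanceSync`, and the `maxAttempts` parameter are hypothetical names, and the real attempt limit and timing are not stated in this entry.

```typescript
// Hypothetical states for the billing success screen.
type SyncState =
  | { status: "syncing"; attempt: number } // still reconciling the checkout session
  | { status: "active" }                   // subscription reconciled into the workspace
  | { status: "retry"; attempts: number }; // sync window exhausted; show an explicit retry action

// Advance the state after one reconcile attempt.
// `reconciled` is whether that attempt found the completed subscription.
function advanceSync(state: SyncState, reconciled: boolean, maxAttempts: number): SyncState {
  if (state.status !== "syncing") return state; // terminal states do not change
  if (reconciled) return { status: "active" };
  const attempt = state.attempt + 1;
  return attempt >= maxAttempts
    ? { status: "retry", attempts: attempt }
    : { status: "syncing", attempt };
}
```

Modeling the exhausted case as its own state is what lets the page render a retry action instead of an endless spinner.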

2026-04-23

Cloud, Dashboard, Registry, Billing

Cloud GitHub bundle setup now renders a real upgrade state for free workspaces and uses a clearer staged publish flow

The GitHub settings page now checks workspace billing first, so free workspaces see a direct upgrade state instead of a raw 402 API error when GitHub publishing is not available on the current plan. The private bundle setup flow was also tightened into a clearer staged experience that separates bundle destination, GitHub source selection, and release behavior, with a compact publish summary alongside the form. That makes the page usable as an operator workflow instead of exposing the plan gate as a transport error and burying the actual publish decisions in one large settings form.

2026-04-23

Cloud, Platforms

Cloud API background workers are now explicit so Neon compute can scale down when the product is idle

SelfTune Cloud no longer starts in-process API schedulers just because a DATABASE_URL is present. Cron workers now require an explicit ENABLE_BACKGROUND_WORKERS=1 worker deployment, the rollback-only legacy cloud-improve dispatcher stays off unless CLOUD_IMPROVE_RUNTIME_MODE=legacy is set intentionally, and the Cloudflare runtime recovery sweep now runs every 15 minutes instead of every 5 minutes. That reduces accidental always-on Neon compute usage while keeping the background repair paths available when they are explicitly needed.
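As a rough sketch of the gating described above, a startup check could read the two environment variables like this. The variable names and the 15-minute interval come from this entry; `resolveWorkerConfig`, `WorkerConfig`, and its field names are hypothetical, and whether the legacy dispatcher additionally depends on ENABLE_BACKGROUND_WORKERS is not specified here.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical shape for the resolved startup decisions.
interface WorkerConfig {
  startCronWorkers: boolean;    // only with ENABLE_BACKGROUND_WORKERS=1
  legacyDispatcher: boolean;    // only with CLOUD_IMPROVE_RUNTIME_MODE=legacy
  recoverySweepMinutes: number; // Cloudflare runtime recovery sweep interval
}

// Note: DATABASE_URL is deliberately not consulted; its presence alone
// no longer starts any scheduler.
function resolveWorkerConfig(env: Env): WorkerConfig {
  return {
    startCronWorkers: env.ENABLE_BACKGROUND_WORKERS === "1",
    legacyDispatcher: env.CLOUD_IMPROVE_RUNTIME_MODE === "legacy",
    recoverySweepMinutes: 15,
  };
}
```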

2026-04-23

Cloud, Dashboard, Registry

Cloud org workspaces now switch in-app and private registry bundles can be bootstrapped from the selected workspace

SelfTune Cloud now lets operators switch org workspaces directly in the app instead of implicitly binding the dashboard to the first org membership on the account. That selected workspace now carries through the dashboard shell, registry surfaces, and GitHub-connected private bundle bootstrap path, so partner-style operators can move between client workspaces without dropping into a separate browser session. The sidebar header was also tightened into a compact workspace selector with inline search and workspace creation, while private registry detail pages now double as install and onboarding surfaces for org-scoped bespoke bundles.

2026-04-22

Cloud, Dashboard

Cloud uploads now stay on the skill workspace after the first quick suite is prepared

Uploading a new cloud skill still prepares the first quick eval suite automatically, but SelfTune now keeps you on the canonical skill workspace instead of auto-starting the first run and pushing you into the run screen. That keeps the first improve run explicit, surfaces the setup and validation state in one place, and makes the live run page a secondary diagnostics surface instead of the default first impression.

2026-04-22

Cloud, Dashboard

Cloud run review now defaults to one compact summary and moves deeper evidence behind diagnostics

Completed cloud improve runs now lead with a single proposal-review pointer instead of stacking release-gate, trust, timeline, candidate, and artifact surfaces above the fold on the run page. The lower-level evidence still exists, but it now lives behind a Diagnostics disclosure so the completed run screen reads as a lighter handoff instead of an operator console.

2026-04-22

Cloud, Dashboard

Cloud source validation now has a dedicated change-review surface for prepared fixes

The cloud source page now shows prepared validation fixes in a dedicated review surface directly below the hero instead of burying the full SKILL.md diff inside the setup card. That review surface renders the exact unified SKILL.md diff plus the directory-change list in one place, so the apply decision happens next to the real package changes rather than next to generic warning copy.

2026-04-22

Cloud, Dashboard

Cloud validation fixes can now be applied directly from the prepared diff preview

The cloud source workflow now lets you apply the prepared deterministic validation autofix directly from the default validation preview card. That means SelfTune no longer stops at “here are the warnings and the diff” when the bounded next write is already known. It can create a refreshed source snapshot for you and immediately re-run validation on that package.

2026-04-22

Cloud, Dashboard

Cloud validation now prepares package fixes with a visible SKILL.md diff and directory-change preview

When validation fails on a cloud source, SelfTune now prepares the deterministic package autofix preview from the current snapshot archive and shows it directly in the source workflow. That means the default source path now surfaces the proposed SKILL.md diff and directory changes instead of only listing validation warnings and making the operator infer the next step manually.

2026-04-22

Cloud, Dashboard

Cloud runtime details now separate active-loop coordination from lower-level source internals

The source page’s runtime lane now splits into Run coordination versus Source internals, so active-loop guidance stops sharing one mixed panel with the lower-level source diagnostics. That keeps the source workflow closer to one guided hosted loop even after you open the deeper technical surfaces.

2026-04-22

Cloud, Dashboard

Cloud benchmark authoring now separates draft review from bundle workflow

The source page’s benchmark authoring lane now splits into Draft review versus Bundle workflow, so task-package remediation follows a clearer scaffold, materialize, and smoke-check sequence. That keeps the authoring lane closer to a guided workflow instead of one broad technical tool surface once a runnable draft is in progress.

2026-04-22

Cloud, Dashboard

Cloud source technical details now default to benchmark authoring only when a task draft is active

The source page’s technical details now default to Benchmark authoring when a persisted task-check draft exists, and to Runtime state otherwise. That keeps benchmark remediation work in front of you when SelfTune already has a bounded authoring draft in progress, without making the runtime and coordinator state fight for space in the same default technical view.

2026-04-22

Cloud, Dashboard

Cloud source advanced workflow details now split trust rationale from technical authoring state

The source page’s advanced workflow surface now splits into Why and trust versus Technical details, so benchmark pressure, trust rationale, and reviewed suggestion queues no longer share one undifferentiated panel with coordinator and authoring state. That keeps the source workflow closer to one guided hosted loop even after you open the deeper details.

2026-04-22

Cloud, Dashboard

Cloud source pages now keep run pressure and uncovered-case queues behind the advanced workflow details boundary

The source page now keeps latest-run pressure, uncovered-case suggestions, and reviewed suggestion history behind the existing advanced-details boundary instead of showing that whole queue in the default setup flow. That makes the main source experience read more like one guided hosted loop, while still keeping the deeper benchmark and trust evidence one click away when you actually need it.

2026-04-22

Cloud, Dashboard

Cloud source run-start gates now stay aligned with the selected eval suite

Cloud source detail now returns a suite-aware improve-run gate for the selected eval suite, and source-detail refreshes preserve that suite id across analysis reruns, validation polling, and latest-run refreshes. That removes another page-local trust inference path from the source screen and keeps the run-start gate aligned with the same backend contract the rest of the hosted loop now uses.

2026-04-22

Cloud, Dashboard

Cloud benchmark repair drafts now follow task-package workflows automatically when the source already uses them

When a source already relies on task-package suites, SelfTune now seeds the auto-prepared benchmark repair draft as a task-package scaffold instead of always starting from a generic structured check. That keeps bounded benchmark remediation aligned with the existing task-package workflow earlier in the loop, including the later materialize and smoke-check path.

2026-04-22

Cloud, Dashboard

Cloud hosted-loop state now distinguishes benchmark repair work in progress from review-ready drafts

When SelfTune has only auto-seeded the benchmark repair draft, the hosted loop now reads that as system remediation still in progress. It only flips to AI remediation once the bounded refinement pass is complete and the draft is truly ready for review. The source page follows the same split, so the run-start gate and next-step guidance now distinguish “SelfTune is still working” from “now it needs your review.”

2026-04-22

Cloud, Dashboard

Cloud benchmark repair remediation now runs in the background for recent active sources

The same bounded benchmark repair drafting and one-pass refinement flow used on source detail can now also run from the scheduler for recent active sources. That means SelfTune no longer needs a source page read before it can prepare the next review-first benchmark repair checkpoint for a source that is accumulating trust pressure.

2026-04-22

Cloud, Dashboard

Cloud sources now block new improve runs until auto-prepared benchmark repair drafts are reviewed

When SelfTune auto-prepares a bounded benchmark repair draft, the source UI and the run-creation control plane now both hold the next improve run until that draft is reviewed. This turns the benchmark repair checkpoint into a real hosted-loop gate instead of leaving it as a visible suggestion beside an otherwise still-live Start improve run action.

2026-04-22

Cloud, Dashboard

Cloud benchmark repair drafts now get one bounded refinement pass automatically

When SelfTune auto-prepares a bounded benchmark repair draft, it now also runs one refinement pass through the existing authoring path before handing the draft back for review. That means the first repair proposal is less skeletal, while the release gate still stays in review-first mode instead of silently saving new benchmark cases into the canonical suite.

2026-04-22

Cloud, Dashboard

Cloud source workflow headers now treat auto-prepared benchmark drafts as AI remediation checkpoints

When SelfTune auto-prepares a bounded benchmark repair draft, the canonical source hostedLoopState now reflects that as an AI remediation checkpoint instead of still reading like the source is simply ready for another run. That keeps the source workflow header, release gate, and benchmark repair card aligned with the same autopilot step rather than splitting the story across separate local UI hints.

2026-04-22

Cloud, Dashboard

Cloud source pages can now auto-seed one benchmark repair draft from trust pressure

When a cloud source already has a saved suite but recent run pressure or pending eval suggestions show one clear benchmark hole, SelfTune can now auto-prepare one bounded task/output repair draft instead of only warning about the trust problem. The draft is persisted through the existing eval authoring session model, recorded in the audit trail, and surfaced in the source workflow hero so the operator can review the suggested repair directly.

2026-04-22

Cloud, Dashboard

Cloud source remediation now runs in the background instead of waiting for a page view

The first safe cloud source self-heals no longer depend only on someone opening source detail. SelfTune now runs bounded background remediation for recent active sources, so ended Apply Observation windows, stale saved-suite smoke checks, default judge calibration drift, and stale source coordinator locks can be repaired before an operator even refreshes the page.

2026-04-22

Cloud, Dashboard

Cloud source pages now mirror the canonical hosted-loop state in the setup header

The cloud source setup hero now reads the backend hostedLoopState contract for its top-level loop summary and next-action copy instead of relying only on page-local wording. The selected-suite trust gate still appears as a separate setup blocker underneath that header, but the primary workflow frame now stays aligned with the same stage/status model that already drives run and proposal pages.

2026-04-22

Cloud, Dashboard

Cloud source pages now show when SelfTune already fixed deterministic blockers for you

Cloud source detail can now clear four deterministic blockers before the page even asks for operator input: it settles ended Apply Observation windows, reruns the latest eligible saved-suite smoke check against the current snapshot when it is missing or stale, reruns the default judge calibration benchmark when it is missing, stale, or context-drifted, and clears stale source coordinator locks after the persisted Cloud Improve Run has already ended. The setup hero now shows a SelfTune auto-remediation card so operators can see what the system already handled automatically instead of guessing why the trust state changed.

2026-04-22

Cloud, Dashboard

Cloud run and proposal pages now mirror the canonical hosted-loop state

Hosted improve run detail and proposal detail now read the backend hostedLoopState contract for their top-level loop summary and next action copy instead of relying only on page-local wording. That means the workflow headers now stay aligned with the same stage/status model the API emits, while older fixtures still fall back safely until every surface is fully migrated.

2026-04-21

Cloud, Dashboard

Cloud automation cards now explain why trusted automation is still deferred

The shared Automation and release gate card now shows the next ladder rung, Trusted automation, instead of stopping at the current release boundary. Operators can now see whether a source is simply too early, not ready because trust is holding the gate, or still on the assisted-default path because higher autonomy is a later policy tier.

2026-04-21

Cloud, Dashboard

Cloud source, run, and proposal pages now explain what SelfTune automates versus what still needs approval

Cloud source setup, improve-run detail, and proposal detail now share an Automation and release gate card instead of expecting operators to infer the product boundary from separate trust panels and button states. The card makes the current Assisted default mode explicit, says what SelfTune already handles automatically, and says what still needs human approval before anything ships.

2026-04-21

Cloud, Dashboard

Cloud setup heroes now say when setup is complete but trust is still blocking the next run

Cloud source pages no longer show a flat Ready to improve state when the setup checklist is complete but trust history or another release gate is still blocking the next run. The primary setup card now explains that setup is complete while the next run is still blocked, and surfaces the blocker inline instead of burying it under the disabled button.

2026-04-21

Cloud, Dashboard

Blocked cloud improve runs now tell you exactly where to click next

When a cloud improve run is blocked by trust, the setup hero now explains what to do next instead of only showing a disabled button. The primary card surfaces direct actions like Show advanced details and Review eval suite, so operators can jump straight to the place where the blocker can be resolved.

2026-04-21

Cloud, Dashboard

Cloud source pages now keep the default setup loop ahead of advanced trust and authoring controls

Cloud source detail now keeps the workflow hero and suite editor at the top of the page, and moves coordination, task-package authoring, structure recommendations, and full trust panels behind one explicit advanced toggle. The default path now reads more like review checks -> start improve instead of forcing operators to scan the whole optimization console before taking the first action.

2026-04-21

Cloud, Dashboard

Cloud improve pages now reinforce one hosted-loop workflow from setup through apply

Cloud source setup, improve-run detail, and proposal detail now share the same hosted-loop header before the deeper trust and evidence panels. The pages now reinforce the same Connect -> Prepare -> Improve -> Review -> Apply -> Observe model so operators can see where they are in the loop and what happens next without learning the underlying optimizer internals first.

2026-04-21

Cloud, Dashboard

Cloud improve runs now join winning search provenance and trust evidence in one review card

Improve run detail now shows one Optimization context card for the winning candidate instead of forcing operators to mentally combine the search notes and trust panels. The card combines winning search provenance, saved-suite smoke, eval-vs-observed outcomes, judge calibration, and trust-history context in one review surface.

2026-04-21

Cloud, Dashboard

Cloud proposals now join search provenance and trust evidence in one review card

Proposal detail now shows one Optimization context card that ties the chosen search strategy to the suite trust state that should govern whether the candidate is believable. The card combines search provenance, saved-suite smoke, eval-vs-observed outcomes, judge calibration, and trust-history context so operators can judge the optimizer and the measurement story together.

2026-04-21

Cloud, Dashboard

Cloud source detail now blocks new improve runs when trust context is on hold

Cloud source detail now treats a Hold trust state as a real run-start barrier, not just a summary badge. When the current saved suite or measurement trust is too degraded to trust another run, the quick-start hero disables Start improve run and explains which stale trust driver is blocking the action.

2026-04-21

Cloud, Dashboard

Cloud proposal apply now blocks writeback when trust context is on hold

Proposal detail now treats a Hold trust recommendation as an actual apply barrier instead of only a warning. When the surrounding saved-suite smoke, source trust, calibration, or outcome history says the proposal should not be trusted yet, the apply actions are disabled and the page explains that trust must improve first.

2026-04-21

Cloud, Dashboard

Cloud proposal apply now summarizes whether trust signals say ready, review, or hold

Proposal detail now turns the surrounding trust context into a simple apply recommendation instead of only listing caution bullets. Operators now see whether the page believes the proposal is Ready, Review carefully, or Hold, based on the suite smoke result, source trust, calibration health, trust-history advisory, and rollback-review signals.

2026-04-21

Cloud, Dashboard

Cloud proposal apply now surfaces trust cautions before writeback

Proposal detail now surfaces trust cautions directly in the apply panel when the surrounding measurement state deserves extra scrutiny. Before applying a proposal, operators now see warnings for watch or stale source trust, failed saved-suite smoke checks, weak or drifting judge-calibration results, and recent Apply Observation windows that are still pending.

2026-04-21

Cloud, Dashboard

Cloud run detail now carries the same trust context as proposal review

Improve-run detail now reuses the same source trust context that proposal review already shows. Hosted run pages now keep the saved-suite smoke result, judge-calibration history, and trust-history ledger visible next to the run evidence so operators can review search output and trust state in one place.

2026-04-21

Cloud, Dashboard

Cloud proposal review now joins search provenance with trust context

Cloud proposal detail now shows why a candidate was selected and why its measurement should be trusted in the same review flow. Proposal pages now combine bounded search provenance with the matching source trust context: eval-suite trust status, latest saved-suite smoke result, judge-calibration benchmark history, and the source trust-history ledger.

2026-04-21

Cloud, Dashboard

Cloud trust now keeps a longitudinal trust-history ledger

Cloud source trust panels now include a chronological trust-history ledger instead of relying only on summary buckets. The ledger combines saved judge-calibration runs, canonical saved-suite smoke checks, suite-backed Cloud Improve Run outcomes, and Apply Observation outcomes into one operator-readable timeline.

2026-04-21

Cloud, Dashboard

Cloud trust now shows judge calibration history and drift

Cloud source trust panels and observed-skill cloud controls now surface the latest saved llm_judge calibration benchmark, the change versus the previous comparable run, the weakest benchmark families, and recent benchmark history. That gives operators a concrete review workflow for prompt/model calibration drift instead of relying only on the raw calibration CLI output.

2026-04-20

Cloud, Dashboard

Cloud trust panels now include broader 90-day outcome history

Skill-level and source-level cloud trust panels now add coarse 30-day outcome buckets across the last 90 days on top of the short weekly history. That gives operators a broader trust read before they queue another hosted run, instead of relying only on the most recent few outcomes.
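To make the 30-day bucketing concrete, here is a minimal sketch that groups completed outcomes into three windows covering the last 90 days. `Observation`, `Bucket`, and `bucketOutcomes` are hypothetical names, and the helped/regressed/inconclusive labels are borrowed from the outcome wording used elsewhere on this page; the real read model may differ.

```typescript
type Outcome = "helped" | "regressed" | "inconclusive";

// Hypothetical record for one completed post-apply observation.
interface Observation {
  completedAt: Date;
  outcome: Outcome;
}

// One 30-day window, counted back from `now`.
interface Bucket {
  startDaysAgo: number;
  endDaysAgo: number;
  helped: number;
  regressed: number;
  inconclusive: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function bucketOutcomes(observations: Observation[], now: Date): Bucket[] {
  // Buckets: 0-30, 30-60, and 60-90 days ago.
  const buckets: Bucket[] = [0, 1, 2].map((i) => ({
    startDaysAgo: i * 30,
    endDaysAgo: (i + 1) * 30,
    helped: 0,
    regressed: 0,
    inconclusive: 0,
  }));
  for (const obs of observations) {
    const ageDays = (now.getTime() - obs.completedAt.getTime()) / DAY_MS;
    if (ageDays < 0 || ageDays >= 90) continue; // outside the 90-day window
    const bucket = buckets[Math.floor(ageDays / 30)];
    if (bucket) bucket[obs.outcome] += 1;
  }
  return buckets;
}
```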

2026-04-20

Cloud, Dashboard

Cloud trust now correlates benchmark health with observed proposal outcomes

Skill-level and source-level cloud trust panels now group recent suite-backed runs by each suite’s latest canonical saved check state. That makes it easier to see whether suites that currently look healthy are actually lining up with helped outcomes after apply, or whether regressions are clustering around failed or missing saved checks.

2026-04-20

Cloud, Dashboard

Cloud trust panels now include short weekly outcome history

Skill-level and source-level cloud trust panels now include a compact multi-window post-apply history derived from recent completed observation buckets. That makes it easier to see whether the last few windows were improving, regressing, mixed, or steady instead of relying on a single rolling badge.

2026-04-20

Cloud, Dashboard

Cloud trust panels now include a recent outcome timeline

Skill-level and source-level cloud trust panels now include a compact recent outcome timeline based on the same post-apply observation summary that powers the direction badge and latest outcome link. That keeps a short run of concrete helped, regressed, or inconclusive proposal outcomes visible without drilling into proposal history.

2026-04-20

Cloud, Dashboard

Cloud trust panels now summarize recent post-apply direction

Skill-level and source-level cloud trust panels now condense recent post-apply outcomes into a compact direction signal: Improving, Regressing, Mixed, Steady, or Needs more signal. That makes it easier to tell whether trust is getting better or worse without opening each proposal outcome one by one.
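One way such a direction signal could be derived is sketched below. The five labels come from this entry, but the thresholds and rules are assumptions made up for illustration (including the `minOutcomes` cutoff and treating all-inconclusive windows as Steady); the actual scoring logic is not described here.

```typescript
type Outcome = "helped" | "regressed" | "inconclusive";
type Direction = "Improving" | "Regressing" | "Mixed" | "Steady" | "Needs more signal";

// Hypothetical rules, for illustration only:
// - fewer than `minOutcomes` completed outcomes: not enough signal
// - both helped and regressed present: Mixed
// - only helped (plus inconclusive): Improving
// - only regressed (plus inconclusive): Regressing
// - nothing but inconclusive outcomes: Steady
function summarizeDirection(outcomes: Outcome[], minOutcomes = 3): Direction {
  if (outcomes.length < minOutcomes) return "Needs more signal";
  const helped = outcomes.filter((o) => o === "helped").length;
  const regressed = outcomes.filter((o) => o === "regressed").length;
  if (helped > 0 && regressed > 0) return "Mixed";
  if (helped > 0) return "Improving";
  if (regressed > 0) return "Regressing";
  return "Steady";
}
```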

2026-04-20

Cloud, Dashboard

Cloud trust warnings now link to the latest observed proposal outcome

The skill-level and source-level cloud trust panels now surface the latest completed post-apply outcome with a direct proposal link. That means stale or watch-mode trust warnings now point at concrete outcome evidence instead of leaving operators to hunt through proposal history by hand.

2026-04-20

Cloud, Dashboard

Cloud trust summaries now self-heal ended apply observation windows

Source trust summaries no longer wait for proposal-detail reads or the batch observation scorer to reflect ended apply windows. When an observation window has already ended, the cloud trust summary now scores it on read and promotes it into the completed outcome counts immediately.

2026-04-20

Cloud, Dashboard

Cloud trust summaries now show when recent applies are still being observed

Cloud trust summaries no longer flatten everything into completed outcomes. The selected source trust card and run preflight now show when recent applies are still inside the post-apply observation window, so operators can tell when the latest outcome counts are not final yet.

2026-04-20

Cloud, Dashboard

Observed-skill cloud controls now warn before queueing runs with stale or failing trust signals

The observed-skill Cloud Improve panel now raises explicit run preflight warnings when the selected source trust is already degraded or when the selected suite’s last canonical task-package smoke check failed. That keeps risky benchmark state visible at the actual queue point instead of only inside the deeper source detail screens.

2026-04-20

Cloud, Dashboard

Observed-skill cloud controls now show source trust before queueing runs

The Cloud Improve panel on observed skill pages now shows the selected cloud source’s trust summary, recent post-apply observation counts, and the latest canonical task-package smoke result for the selected suite. That means operators can check benchmark freshness and recent real-world outcomes before they queue another hosted improve run, instead of drilling into the cloud source page first.

2026-04-20

Cloud, Dashboard

Cloud source evidence can now jump straight into task-package draft authoring

Suggested trigger cases and recent run-pressure cards can now promote directly into a review-only task-package draft instead of always starting on the structured path first. That direct promotion path now follows the first persisted draft/refine step with initial environment and verifier asset generation, so operators start from source evidence plus concrete draft files instead of an empty task-package scaffold.

2026-04-20

Cloud, Dashboard

Cloud task-check drafts now start through the authoring agent flow

Suggested trigger cases and recent run-pressure cases no longer create task/output drafts purely in page state. Draft creation now goes through a review-first promotion route that builds seed enrichment on the server and invokes the same Think or fallback refinement path used by the persisted authoring session.

2026-04-20

Cloud, Dashboard

Cloud task-check drafts now keep visible authoring activity history

Cloud source pages now show the recent review-first authoring steps attached to a persisted task/output draft instead of only the latest draft snapshot. The authoring session now keeps a bounded activity log for draft saves and promotions, Think or fallback refinement, task-package asset generation, bundle materialization, runnable smoke checks, and draft clears.

2026-04-20

Cloud, Dashboard

Cloud advanced run settings now show canonical task-package smoke freshness in suite selection

The advanced run drawer no longer hides whether a saved task-package suite most recently passed or failed its canonical smoke check. This reuses the same summary-level saved-suite smoke signal in the advanced run suite chooser, so operators can compare suite trust while configuring a run, not just while editing the suite.

2026-04-20

Cloud, Dashboard

Cloud suite pickers now show canonical task-package smoke freshness

Operators can now compare saved task-package suites before opening one in the editor. This adds summary-level canonical smoke state to hosted eval-suite list responses and surfaces it directly in the cloud source page suite picker, so saved-check freshness is visible during suite selection as well as after a suite is opened.

2026-04-20

Cloud, Dashboard

Cloud source setup summaries now show the latest canonical task-package smoke result outside the editor

Operators no longer need to open the eval editor to see the last saved task_package smoke result. This now surfaces the latest canonical smoke state directly in the cloud source page setup summary, so the trust signal is visible in the broader read model as well as in the editor.

2026-04-20

Cloud, Dashboard

Cloud saved task-package suites now remember the last canonical smoke result

Running a saved canonical task_package suite smoke check no longer yields a result that disappears as soon as the request ends. This now writes the latest canonical smoke result back into the saved suite metadata and surfaces it in the cloud source page editor, so operators can see the last saved-suite check before deciding whether to rerun it.

2026-04-20

CloudDashboard

Cloud source pages can now run saved canonical task-package suites once against the current snapshot

Saved canonical task_package suites no longer require a separate improve run just to verify the saved case still works against the current source snapshot.

This adds a narrow saved-suite smoke action on cloud source pages and matching eval-suite routes, so operators can run a saved canonical task-package case once before they reuse it in hosted improve runs.

2026-04-20

CloudDashboard

Cloud task-package saves now retain draft provenance and latest smoke results in the canonical case payload

Runnable task_package saves no longer drop the context that produced the draft.

This now writes a typed task_package_metadata block into matching canonical cases, preserving the seed evidence, expected-outcome scaffold, optional notes, and latest smoke result that came from the authoring session.

2026-04-20

CloudDashboard

Cloud task-package saves now require a fresh smoke result before a runnable case can be written into a canonical suite

Runnable task_package drafts on cloud source pages can no longer be saved into a canonical hosted eval suite if the latest smoke result is missing or stale.

This now blocks the save both on the source page and in the eval-suite create/update routes, so runnable task-package cases must be freshly smoke-checked before they become part of the authoritative suite.

2026-04-20

CloudDashboard

Cloud task-package drafts now keep the last smoke result visible and mark it stale when the scaffold, assets, or bundle change

Runnable task-package drafts on cloud source pages no longer silently lose their last smoke-check result when the scaffold or generated bundle changes.

This now keeps the latest smoke result visible, marks it stale with an explicit reason, and tells the operator when the draft needs to be smoke-checked again before save.
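As a rough illustration of the stale-marking behavior described above, the sketch below compares a fingerprint of the draft against the one recorded with the last smoke result. The `TaskPackageDraft`, `SmokeResult`, and `Freshness` shapes, the function names, and the JSON-based fingerprint are all assumptions made for this example; the changelog does not publish the real schema.

```typescript
// Hypothetical shapes: the changelog does not document the real draft schema.
interface TaskPackageDraft {
  scaffold: string;
  assets: string[];
  bundleId: string | null;
}

interface SmokeResult {
  passed: boolean;
  fingerprint: string; // fingerprint of the draft when the check ran
}

type Freshness =
  | { state: "missing" }
  | { state: "fresh"; passed: boolean }
  | { state: "stale"; passed: boolean; reason: string };

// A plain JSON fingerprint keeps the sketch dependency-free; a real system might hash instead.
function fingerprint(draft: TaskPackageDraft): string {
  return JSON.stringify([draft.scaffold, draft.assets, draft.bundleId]);
}

// Keep the last result visible, but flag it as stale once the draft no longer matches.
function smokeFreshness(draft: TaskPackageDraft, last: SmokeResult | null): Freshness {
  if (last === null) return { state: "missing" };
  if (last.fingerprint === fingerprint(draft)) {
    return { state: "fresh", passed: last.passed };
  }
  return {
    state: "stale",
    passed: last.passed,
    reason: "scaffold, assets, or bundle changed since the last smoke check",
  };
}

// Saving a runnable case is allowed only when the latest result is neither missing nor stale.
function canSaveRunnableCase(draft: TaskPackageDraft, last: SmokeResult | null): boolean {
  return smokeFreshness(draft, last).state === "fresh";
}
```

The same check could back both the page-level warning and a route-level save gate, since it depends only on the draft and its last recorded result.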

2026-04-20

CloudDashboard

Cloud source pages can now smoke-check runnable task-package drafts before saving them into a canonical suite

Runnable task-package drafts on cloud source pages can now be executed once against the current snapshot before they are saved into a canonical suite.

This persists the latest pass/fail result back into the authoring session, so operators can verify the materialized bundle and current snapshot still work together before promoting the case.

2026-04-20

CloudDashboard

Cloud task-package drafts now switch into an explicit runnable state once their review-only bundle is materialized

Materialized task-package drafts on cloud source pages no longer stay labeled like scaffolds.

This now promotes them into an explicit runnable draft state, updates the promotion preview and bundle preview accordingly, and makes it clear that the existing save flow will write a real canonical task_package case.

2026-04-20

CloudDashboard

Cloud task-package drafts can now materialize review-only bundles into real R2-backed environment archives

Persisted task-package drafts on cloud source pages can now turn generated bundle files into a real review-only environment archive.

This uploads the bundle to R2, points the draft scaffold at the materialized archive, and keeps the archive descriptor in the same authoring session until the operator explicitly promotes the draft further.

2026-04-20

CloudDashboard

Cloud task-package drafts now roll generated asset files into a review-only bundle preview with explicit archive-materialization readiness

Persisted task-package drafts on cloud source pages no longer stop at raw environment/verifier asset text.

This now rolls the generated files into a review-only bundle preview, marks when the draft is ready for archive materialization, and keeps the whole flow inside the persisted authoring session until an operator explicitly promotes it further.

2026-04-20

CloudDashboard

Cloud task-package drafts can now generate review-only environment and verifier asset drafts through the authoring agent

Persisted task-package drafts on cloud source pages can now generate a review-only environment manifest draft and verifier script draft through the authoring agent.

This keeps the new task-package authoring flow behind the same persisted draft session, surfaces whether the generated assets came from Think or the fallback template path, and avoids writing anything canonical until the operator decides the draft is ready.

2026-04-20

CloudDashboard

Cloud source pages now support real task-package draft authoring, including editable scaffold fields and canonical task-package case saves

Persisted task/output drafts on cloud source pages can now move past a placeholder preview into real task-package scaffold authoring.

This adds editable instruction, environment, verifier, oracle, skill mount, and resource-hint fields to the review-only draft flow, lets operators save that scaffold back into the persisted authoring session, and makes the existing save path emit a real task_package case when that is the chosen promotion target.

2026-04-20

CloudDashboard

Cloud source pages now show canonical promotion previews for task/output drafts and let operators switch a persisted draft between a structured check and a review-only task-package scaffold

Persisted task/output drafts on cloud source pages now show what the current promotion target will become in the canonical eval pipeline before anything is saved.

This makes the next step explicit: a draft can stay on the structured deterministic path, or switch into a review-only task-package scaffold with the expected environment and verifier placeholders called out up front.

2026-04-20

CloudDashboard

Cloud source pages can now refine persisted task/output drafts through a review-only authoring agent, with an explicit fallback when Workers AI is unavailable

Persisted task/output drafts on cloud source pages can now run through a bounded authoring-agent refinement step before the operator saves anything canonical.

This keeps the same draft/session contract, surfaces whether the refinement came from a Think-backed path or a deterministic fallback, and lets the page stay review-first even when Workers AI is unavailable. The dashboard exposes that action through the new public refine route instead of a page-local-only mutation path.

2026-04-20

CloudDashboard

Cloud source pages now show review-only guidance for persisted task/output drafts and let operators apply that scaffold back into the draft before saving it

Persisted task/output drafts on cloud source pages now show API-derived review-only guidance from matching suggestions, recent run pressure, and saved eval-suite overlap.

This means operators can see what failure a draft protects against, what verifier shape is likely to fit best, whether they should extend an existing suite, and apply the suggested scaffold back into the deterministic draft before they save anything canonical.

2026-04-20

CloudDashboard

Cloud task/output drafts now keep seed evidence, promotion target, and expected-outcome scaffolding, so deterministic eval authoring survives refreshes with real provenance instead of thin form state

Cloud source pages now persist richer deterministic draft metadata for eval authoring, including the originating evidence, the intended promotion target, and a first expected-outcome scaffold.

This means review-only task/output drafts are no longer just local editor fields. Operators can refresh and resume the draft while still seeing what seeded it and what kind of deeper check it is trying to become.

2026-04-19

CloudDashboard

Cloud source pages now keep one resumable task/output draft per source, so operators can refresh and continue deterministic eval authoring without rebuilding the draft from scratch

Cloud improve now stores a review-only task/output draft per source in the runtime layer and exposes it back through source detail.

This means a deterministic draft started from trigger evidence is no longer purely local page state: operators can refresh, resume the draft, or clear it without overwriting the saved trigger-confidence suite.

2026-04-19

CloudDashboard

Improve-run detail now keeps the active timeline step and proposal links in sync after completion, so operators can see live progress and open the winning proposal without a manual refresh

Improve-run detail now seeds the current phase into the timeline immediately, keeps the active step visibly live, and briefly re-checks proposal links after a successful run until the winning proposal is available.

This closes two gaps on the same surface: active runs now look active where the operator is already reading, and completed runs no longer require a manual refresh just to open the winning proposal.

2026-04-19

CloudDashboard

Improve-run pages now keep refreshing proposal links briefly after a run completes, so new proposal links appear without a manual reload

Cloud improve-run pages now re-sync proposal links after terminal updates and keep polling briefly when a winning candidate exists but the proposal link has not settled yet.

This closes the gap where a run could finish successfully and create a proposal seconds later, while the improve page still looked like proposal creation had been skipped until a manual refresh.
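The "keep polling briefly" rule can be pictured as a small pure decision function. This is only a sketch under stated assumptions: the `RunView` fields, the function name, and the 15-second settle window are invented for illustration and are not taken from the product.

```typescript
// Hypothetical view of an improve run; field names are illustrative only.
interface RunView {
  terminal: boolean;            // the run reached a terminal status
  hasWinningCandidate: boolean; // a candidate won, so a proposal is expected
  proposalLinked: boolean;      // the proposal link has already settled
  msSinceTerminal: number;      // time elapsed since the terminal update
}

// Assumed upper bound for "briefly"; the real window is not documented.
const SETTLE_WINDOW_MS = 15_000;

function shouldKeepPolling(run: RunView, windowMs: number = SETTLE_WINDOW_MS): boolean {
  if (!run.terminal) return true;             // still running: keep refreshing
  if (!run.hasWinningCandidate) return false; // no proposal is expected
  if (run.proposalLinked) return false;       // link settled, nothing left to wait for
  return run.msSinceTerminal < windowMs;      // bounded wait for the late proposal link
}
```

Bounding the wait matters here: without the window, a run whose proposal never materializes would keep the page polling indefinitely.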

2026-04-19

CloudDashboard

Cloud source pages now show live source-coordination state, including the active reserved run and queued rerun/cancel intent

Cloud source detail now carries a compact coordinator read model from the runtime, and the source page surfaces that state directly.

This makes it visible when a source already has an active reserved run, whether cancel was requested, and whether a rerun is queued, without inferring it only from raw run rows.

2026-04-19

CloudDashboard

Cloud source pages can now promote trigger evidence into a detached task/output draft, so operators can scaffold deterministic checks without overwriting the saved trigger suite

Pending eval suggestions and recent run-pressure cards now offer a Draft task check action that opens the eval editor in deterministic mode with a new, unsaved task/output draft scaffold.

This keeps the saved trigger-confidence suite intact while giving operators a fast path to start deeper task/output coverage from real evidence.

2026-04-19

CloudDashboard

Improve-run timelines now keep the active phase on a colored dot, so the phase progression stays visible while a run is live

The live step on the cloud improve-run timeline now stays on the same phase-colored dot as the rest of the run history instead of switching to a generic spinner.

This keeps the setup, evaluation, drafting, and finalization phases visually distinct even while the run is still active.

2026-04-19

CloudDashboard

Cloud skill pages now separate trigger-confidence evals from task/output checks in the onboarding and eval-editor language

Cloud skill onboarding and eval-editor copy now makes the intended eval progression explicit: start with trigger confidence, then add task/output checks once the skill is activating in the right places.

This matches the hosted eval contract more closely and makes it clearer that discoverability/routing checks and deeper task correctness checks answer different questions.

2026-04-19

CloudDashboard

Cloud source pages now expose a public rerun-analysis action, so operators can reprocess the current snapshot without creating a new upload or GitHub sync

Cloud source pages now include a Rerun analysis action that triggers the existing snapshot analysis pipeline for the current snapshot through a public dashboard route.

This makes it possible to refresh validation, lint, capability, and structural reports after cloud-side analysis logic changes, without creating a new upload or GitHub sync just to force a re-run.

2026-04-19

CloudDashboard

Improve-run pages now preserve run scope on per-candidate proposal links, so review navigation stays inside the run-specific proposal queue

Candidate-level View proposal links on improve-run pages now keep the current ?run=... context instead of dropping back to the global proposal queue.

This keeps the operator inside the run-specific review flow whether they open the winning proposal from the summary header or from the candidate list.
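A minimal sketch of carrying the run scope onto a proposal link might look like the following. Only the `run` query parameter comes from the entry above; the `/proposals/<id>` path shape and the `proposalHref` helper name are assumptions for illustration.

```typescript
// Hypothetical helper: builds a proposal link that keeps the current run scope.
// The "/proposals/<id>" path is an assumed route shape, not a documented one.
function proposalHref(proposalId: string, currentSearch: string): string {
  const run = new URLSearchParams(currentSearch).get("run");
  const path = `/proposals/${encodeURIComponent(proposalId)}`;
  if (!run) return path; // no run scope on the current page: link to the plain proposal
  const params = new URLSearchParams({ run });
  return `${path}?${params.toString()}`;
}
```

Copying only the `run` parameter, rather than the whole query string, keeps unrelated page state from leaking into the review URL.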

2026-04-19

CloudDashboard

Proposal cards now render separate skill and review links instead of nesting anchors, fixing a hydration bug on the dashboard proposals page

Cloud proposal cards now show an explicit Review proposal link inside the card instead of wrapping the whole card in a detail link.

This removes invalid nested anchor markup when a proposal card also links to its skill page, which fixes the hydration error on the dashboard proposals route.

2026-04-19

CloudDashboard

Cloud source pages now read structural provenance directly from the proposal queue payload, removing an extra proposal-detail fetch from the latest-run review path

The run-scoped proposal queue now carries candidate provenance for cloud improve proposals, so cloud source pages can explain the newest structural proposal directly from the queue payload.

This removes the extra proposal-detail request the source page used to make just to show the latest proposal’s structural origin.

2026-04-19

CloudDashboard

Cloud skill source pages now show which structural recommendation produced the newest linked proposal, so authors can connect snapshot analysis to the actual reviewable candidate

The structure-candidates panel on each cloud skill source page now surfaces the newest linked proposal’s exact structural recommendation and deterministic strategy when that proposal came from a structure run.

This closes the gap between “top recommendations on the snapshot” and “the candidate you can review right now,” so authors can see why the latest proposal exists before opening the proposal detail page.

2026-04-19

CloudDashboard

Cloud proposal queues and detail routes now have explicit review-flow coverage, reducing the risk of silent regressions in run-scoped proposal review

Added route-level coverage for the run-scoped proposal queue and proposal detail lookup used by cloud-improve review flows.

This hardens the path from improve-run pages into proposal review by checking both the queue filter (cloud_run_id) and proposal-detail validation behavior.

2026-04-19

CloudDashboard

Cloud improve and skills surfaces now use softer, consistent status treatments instead of mixing heavy cyan badges, raw enum labels, and duplicate readiness states

Cloud skill cards, improve-run pages, and setup checklists now use a standardized status system with softer status chips, text-only status labels where appropriate, and clearer warning/error semantics.

This removes raw labels like cloud_ready, collapses redundant Ready states on the skills library cards, and brings advanced run settings into a drawer with more readable model selection options.

2026-04-19

CloudDashboard

Proposal detail now normalizes structural provenance consistently after refresh and keeps run-scoped back navigation on the same helper path as the proposal queue

Proposal detail now uses shared helpers to normalize structural provenance and run-scoped back navigation, so refreshed review pages keep the same candidate-rationale and queue-return path instead of relying on ad hoc inline logic.

This keeps the proposal review surface more predictable as cloud-improve candidates add richer provenance metadata and more scoped review flows.

2026-04-19

CloudDashboard

Proposal detail now shows which structural recommendation produced a cloud structure candidate, so operators can see why a package rewrite was drafted before applying it

Cloud proposal detail now surfaces the structural recommendation and deterministic strategy that produced a structure candidate, such as extract_references or harden_script_ergonomics.

That makes package-backed proposal review more defensible: operators can see which structural analysis signal led to the candidate before deciding whether to apply it to a draft or GitHub PR.

2026-04-19

CloudDashboard

Improve-run detail now links straight into the run-scoped proposal queue when a candidate frontier produced more than one reviewable proposal

Improve-run detail pages now preserve run context when you open the winning proposal, and they surface a direct Review run proposals action whenever a hosted run produced more than one reviewable proposal.

That keeps the run frontier, proposal queue, and individual proposal review pages tied together, so multi-candidate review flows no longer require manual navigation between /improve and /proposals.

2026-04-19

CloudDashboard

Run-scoped proposal review now preserves that scope when you open proposal detail, so it is easier to move through a single cloud-improve queue without losing context

When you open proposal detail from a run-scoped proposals list, the detail page now keeps that run context and offers a Back to run proposals path instead of dropping you back into the global proposal backlog.

This keeps multi-candidate cloud review flows tighter: operators can move between the run-specific proposal queue and individual proposal detail pages without reapplying filters or losing their place.

2026-04-19

CloudDashboard

Proposal review can now be scoped to a single improve run, so multi-candidate cloud runs no longer dump you back into the full proposal backlog

The proposals page now accepts a run-scoped view for cloud-improve runs, letting operators review only the proposals created by a single hosted run instead of filtering mentally through the full backlog.

Cloud skill source pages now use that view when the latest run produced multiple proposals, so the Structure candidates panel can link directly into proposal review even when there is more than one candidate to inspect.

2026-04-19

CloudDashboard

Cloud skill source pages now explain when structure proposals are viable and link directly into proposal review when the latest run produced one

Cloud skill source pages now synthesize the typed structural-analysis report into a dedicated Structure candidates panel instead of leaving operators to infer package-shape readiness from raw technical details.

The panel now shows whether the current snapshot is ready for structure proposals, still review-first because of execution limits, or simply not a structural change candidate right now. When the latest run already produced a reviewable proposal, the page links straight into the proposal review flow; otherwise it routes back to the latest run frontier.

2026-04-19

CloudDashboard

Cloud proposal review now renders the real candidate diff for package-structure changes and respects GitHub PR apply targets

Cloud proposal detail now renders the unified diff from the winning candidate package instead of only showing the placeholder text "see candidate archive". Structure-focused proposals can be reviewed as real package changes, and the page links directly to the candidate archive when you want the full package.

Proposal apply now also respects the run’s configured apply target. If a cloud-improve run was set to write back through GitHub, the proposal page now offers a GitHub PR action instead of always defaulting to draft promotion.

2026-04-19

CloudDashboard

Cloud proposal detail now keeps the originating run and apply history visible after refresh

Cloud proposal detail now links back to the originating improve run so you can move from a pending proposal into the full candidate frontier and evaluation evidence without hunting for the run separately.

The page also now shows recent apply attempts, including whether the attempt targeted a draft or GitHub PR, when it ran, and the PR URL or error message when one is available. That keeps proposal review useful even after the page is refreshed or revisited later.

2026-04-19

CloudDashboard

The proposals index now recognizes archive-backed cloud-improve proposals instead of showing them as fake single-field diffs

The proposals index now correctly tags cloud-improve proposals created by the hosted runner, even when proposed_by is the actual runner identity instead of the older cloud_improve string.

Archive-backed cloud proposals also no longer render (see candidate archive) as if it were a field-level diff. The card now tells you it is a package-backed change and shows whether a linked candidate package and source run are available before you open the full review page.

2026-04-18

CloudDashboard

Cloud skill source pages now surface structural analysis, script strategy, and frontmatter execution guidance instead of hiding them in raw validation blobs

Cloud source detail now renders the typed structural-analysis summary for the current snapshot, including SKILL.md line and token budgets, inferred script strategy, compatibility notes, allowed-tools, and the execution flags that explain whether cloud writeback is viable or still review-first.

Validation checklists and technical details also now surface structural recommendations as first-class findings instead of showing 0 issues for typed reports that were previously stored outside the generic lint array shape.

2026-04-18

CloudDashboard

Ended post-apply observation windows are now scored against baseline telemetry instead of staying pending forever

Applied cloud-improve proposals now compare a pre-apply telemetry window against the finished post-apply observation window and classify the result as helped, inconclusive, or regressed.

Proposal detail now shows the before/after live-signal breakdown for eval volume, pass rate, missed triggers, false negatives, and false positives, and source-level Eval suite trust now folds recent observed regressions and helps into the trust summary so coverage isn’t the only signal. Ops can batch-score ended windows with bun run score:cloud-improve-observations.
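To make the helped / inconclusive / regressed classification concrete, here is one way such a comparison could be sketched. The three outcome labels come from the entry above; the `WindowStats` shape, the minimum sample size, and the pass-rate delta threshold are assumptions chosen for illustration and are not the product's actual scoring rule.

```typescript
// Hypothetical window summary; the real telemetry shape is not documented here.
interface WindowStats {
  evalCount: number; // number of live evals observed in the window
  passRate: number;  // fraction between 0 and 1
}

type Outcome = "helped" | "inconclusive" | "regressed";

// Assumed thresholds, picked only to make the sketch concrete.
const MIN_EVALS = 20;
const MIN_DELTA = 0.05;

function classifyObservation(baseline: WindowStats, observed: WindowStats): Outcome {
  // Too little signal on either side: do not claim a result.
  if (baseline.evalCount < MIN_EVALS || observed.evalCount < MIN_EVALS) {
    return "inconclusive";
  }
  const delta = observed.passRate - baseline.passRate;
  if (delta >= MIN_DELTA) return "helped";
  if (delta <= -MIN_DELTA) return "regressed";
  return "inconclusive";
}
```

A fuller version would presumably weigh missed triggers, false negatives, and false positives as well, since the proposal detail page reports those signals alongside pass rate.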

2026-04-18

CloudDashboard

Applied cloud-improve proposals now enter an explicit observation window instead of looking fully done the moment apply succeeds

Applied cloud-improve proposals now show a post-apply observation state on the proposal detail page. Instead of treating applied as the end of the story, the page now distinguishes proposals that are still gathering live signal from ones that will eventually be evaluated against observed outcomes.

This is the first thin slice of the post-apply observation loop from the cloud-improve quality hardening plan. It does not score outcomes yet, but it does create a durable observation record the moment a draft promotion or GitHub apply succeeds.

2026-04-18

CloudDashboard

Cloud skill pages now show eval-suite trust signals, and the cloud-improve runner has a first judge-calibration benchmark command

Cloud skill source pages now include an Eval suite trust panel that shows whether the saved suite still covers recent linked telemetry and the newest hosted improve-run pressure. Instead of treating every eval win as equally trustworthy, the page now marks suites as Fresh, Watch, Stale, or No signal based on how much recent evidence is actually covered by saved trigger-query cases.

The cloud-improve runner also now ships with a first reusable judge-calibration command, so the llm_judge trigger evaluator can be checked against a labeled benchmark fixture instead of remaining an unmeasured instrument.
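The trust labels can be thought of as a bucketing of coverage. The sketch below uses the four labels from the entry above, but the coverage formula and the 0.8 / 0.5 cut-offs are assumptions for illustration only; the changelog does not state the real thresholds.

```typescript
type Trust = "Fresh" | "Watch" | "Stale" | "No signal";

// recentCases: recent evidence items (telemetry or run pressure) seen for the skill.
// coveredCases: how many of those are covered by saved trigger-query cases.
// The 0.8 and 0.5 cut-offs are hypothetical.
function suiteTrust(recentCases: number, coveredCases: number): Trust {
  if (recentCases === 0) return "No signal"; // nothing recent to compare against
  const coverage = coveredCases / recentCases;
  if (coverage >= 0.8) return "Fresh";
  if (coverage >= 0.5) return "Watch";
  return "Stale";
}
```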

2026-04-18

CloudDashboard

Cloud skill setup now auto-fixes common Agent Skills spec issues before the first snapshot is analyzed

Upload-backed and GitHub-backed cloud skill setup now canonicalize common package issues before the first snapshot is analyzed. The setup flow rewrites lowercase skill.md to SKILL.md, rebuilds missing or invalid frontmatter, normalizes the skill name to Agent Skills format, synthesizes a required description when it is missing, and moves unsupported top-level frontmatter fields into metadata so the package starts from a spec-compliant baseline.

The setup response now also reports which fixes were applied, and the cloud library success message calls that out before the first quick eval suite and hosted improve run are prepared.
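Two of the fixes listed above, the file rename and the name normalization, are easy to picture as small pure functions. This is a sketch, not the setup flow's actual code: the helper names are hypothetical, and the naming rules used here (lowercase letters, digits, single hyphens, at most 64 characters) are an assumption about what "Agent Skills format" requires.

```typescript
// Rewrite the final path segment to SKILL.md when it is a case variant of that name.
function canonicalSkillFileName(path: string): string {
  const parts = path.split("/");
  const last = parts[parts.length - 1] ?? "";
  if (last.toLowerCase() === "skill.md") parts[parts.length - 1] = "SKILL.md";
  return parts.join("/");
}

// Assumed naming rules: lowercase letters, digits, and hyphens, max 64 characters,
// with no leading or trailing hyphen.
function normalizeSkillName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse any other run of characters into one hyphen
    .slice(0, 64)
    .replace(/^-+|-+$/g, "");    // trim hyphens left at either end
}
```

For example, `normalizeSkillName("My Cool_Skill!")` yields `my-cool-skill`, and `canonicalSkillFileName("pkg/skill.md")` yields `pkg/SKILL.md`.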

2026-04-18

CloudDashboard

Cloud eval suites now learn from telemetry and hosted runs, write accepted suggestions directly into saved suites, and surface eval pressure from the latest run

Cloud skill pages now surface suggested trigger-query cases from the linked observed skill whenever recent real usage exposes misses or false positives that are not already covered by the saved suite.

Hosted improve runs also now persist query-level eval evidence, so recent run failures and regressions can feed the same review queue when the active suite no longer covers those cases.

These suggestions stay review-first: you can append them into the draft suite from the cloud page, inspect them in the editor, and then decide whether to save them before the next improve run. Accepted and dismissed suggestions are now also persisted, so the same pending cases do not keep resurfacing after you review them. The cloud source page now also keeps a reviewed history with restore and re-accept actions, so dismissed cases can be reopened and accepted cases can be added back into the draft without losing their provenance. Accepted suggestions can now also be written directly into the selected saved suite with their telemetry/run provenance preserved, instead of stopping at the draft-only state. The cloud source page now highlights the latest run’s eval pressure directly, and the improve run detail page surfaces the failed/regressed queries from that run instead of forcing operators to dig through raw artifacts to find them. Source detail reads also now degrade safely if the new eval-suggestion review tables are one migration behind, while review actions return a clear migration-needed error instead of a raw database exception.

2026-04-18

CloudDashboard

Cloud improve now generates true bounded surface candidates and run pages render actual reviewed diffs

Hosted improve generation now treats description, routing, and body as real bounded mutation surfaces instead of prompt-only hints. A routing candidate rewrites only routing, a description candidate rewrites only the description, and body candidates preserve routing while updating the non-routing sections they are allowed to touch.

Improve run detail pages now also normalize old prose-only diff summaries back into real unified diffs when the source and candidate archives exist, so review pages show the actual changed lines instead of a rationale paragraph.
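The idea of a bounded mutation surface can be sketched as an update that touches exactly one section and leaves the others untouched. The three surface names come from the entry above; the `SkillSections` shape and both helper names are hypothetical, and a real skill package is of course richer than three strings.

```typescript
type Surface = "description" | "routing" | "body";

// Hypothetical flattened view of a skill, one string per mutation surface.
interface SkillSections {
  description: string;
  routing: string;
  body: string;
}

// Apply a candidate to exactly one surface; every other section is carried over unchanged.
function applyCandidate(current: SkillSections, surface: Surface, text: string): SkillSections {
  return { ...current, [surface]: text };
}

// Report which surfaces differ, so a reviewer or test can confirm the bound held.
function changedSurfaces(before: SkillSections, after: SkillSections): Surface[] {
  const all: Surface[] = ["description", "routing", "body"];
  return all.filter((s) => before[s] !== after[s]);
}
```

A check like `changedSurfaces` is one way a pipeline could reject a candidate that strayed outside its declared surface.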

2026-04-18

CloudDashboard

Improve run detail pages now show the skill context, run outcome, and readable evidence instead of raw storage URLs

Hosted improve run pages now load the source skill and eval-suite context, summarize the winning result or failure in plain language, and show the best candidate’s score movement and diff preview directly on the page.

Evidence is still available for download, but artifact links are now grouped and labeled by what they represent instead of exposing a wall of raw R2 URLs.

2026-04-18

CloudDashboard

Fresh cloud skills now auto-start the first hosted improve run, and the legacy in-process dispatcher is rollback-only

Fresh cloud skill sources now move directly from upload or sync into the first hosted improve run once the quick eval suite is generated, instead of stopping on the setup page and requiring a separate manual queue action.

The API startup path also now treats the old in-process improve dispatcher as an explicit legacy rollback path rather than part of the normal runtime. Cloudflare remains the default hosted execution plane whenever the runtime URL is configured.

The API-key cloud-source surface also now matches the dashboard route for creating GitHub-backed sources, which makes the same hosted improve flow scriptable for smoke runs and automation.

2026-04-18

Cloud

Branded email system with 9 React Email templates and centralized Resend service

New @selftune/email package with 9 branded React Email templates: welcome, alert notification, evolution proposal, weekly digest, team invitation, plan upgrade, usage limit warning, getting started, and first insight. Alert emails now use HTML templates instead of plain text. Team invitations and billing checkout flows send branded emails automatically.

2026-04-18

Cloud

Cloud improve now auto-links uploaded and GitHub-backed sources to canonical skills, and imported task-package suites are first-class

Upload-backed and GitHub-backed cloud sources now automatically create or reuse the canonical skills row they belong to. That closes the proposal gap where a winning improve run could persist artifacts but skip proposal_created because the source had no linked skill_id.

The eval-suite control plane also now accepts source_kind = imported for deterministic task_package suites, which is the first explicit hosted lane for benchmark-style imports instead of treating every imported suite as a manual one. The docs now also include a first-class imported benchmark page and script path for turning package manifests into live cloud eval suites.

2026-04-17

Cloud

Cloud improve now supports deterministic task-package eval suites and benchmark-style runtime docs

Hosted eval suites now accept deterministic task_package cases, which lets you point an improve run at a benchmark-style environment archive and verifier script instead of relying only on trigger-query or exact-match checks.

The Cloudflare runtime executes these task packages inside Sandboxes so the verifier has a real filesystem and process boundary, and the public docs now cover improve run events, statuses, and eval-suite API usage in the same terminology the product uses.

2026-04-17

CloudDashboard

Improve run pages now show customer-facing live progress, delay states, and clearer timeline copy

Hosted improve pages now translate runtime activity into customer-facing progress language instead of exposing queue, worker, or transport details. Active runs surface clearer status cards, a friendlier timeline, and “taking longer than expected” messaging when a run stalls.

The improve overview also better distinguishes active versus completed work without making the page feel like an internal operations console.

2026-04-17

CloudDashboard

Improve run pages now refresh live while queued and running, with terminal refetch on completion

Hosted improve run detail pages now subscribe to the run event stream while a run is queued or running, updating phase and status live instead of waiting for a manual refresh. When a terminal event arrives, the page re-fetches full run detail so candidates, artifacts, and proposal state stay in sync.

The improve run list also now polls only while active runs are visible, which keeps the overview current without constantly refetching completed history.

2026-04-17

Cloud

Cloud source uploads and GitHub sync now accept lowercase skill.md packages and preserve folder paths on the API-key surface

Hosted cloud-source ingest now accepts both SKILL.md and lowercase skill.md when validating uploaded packages and GitHub-backed skill repos. That keeps upload and sync behavior aligned with the rest of the hosted analysis pipeline, which already supported both casings.

The API-key Hono upload route also now preserves multipart field keys as relative paths instead of flattening uploaded files to their basenames, so folder uploads keep nested references/ and other package structure intact.
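The difference between flattening to a basename and keeping the field key as a relative path can be sketched as follows. The helper name is hypothetical, and rejecting absolute paths and `..` segments is a defensive assumption added for this example; the entry above does not say how the route validates keys.

```typescript
// Turn a multipart field key into a package-relative path, keeping nested folders.
// Returns null for keys that should not be trusted (absolute paths or ".." traversal);
// that rejection rule is an assumption of this sketch.
function toPackagePath(fieldKey: string): string | null {
  const normalized = fieldKey.replace(/\\/g, "/"); // tolerate Windows-style separators
  if (normalized.startsWith("/")) return null;
  const segments = normalized.split("/").filter((s) => s !== "" && s !== ".");
  if (segments.length === 0) return null;
  if (segments.some((s) => s === "..")) return null;
  return segments.join("/");
}
```

With this shape, a key such as `references/api.md` stays `references/api.md` instead of collapsing to `api.md`, which is what keeps nested package structure intact.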

2026-04-17

Cloud

Cloud improve runtime: Cloudflare execution plane foundation and live SSE event stream

Added foundation for Cloudflare-backed improve run execution using Queues, Workflows, and Sandboxes. A new GET /api/v1/improve-runs/:id/events endpoint streams run lifecycle events via SSE, enabling live updates on the run detail page without manual refresh. The runtime mode is controlled by CLOUD_IMPROVE_RUNTIME_MODE and defaults to legacy with no behavior change until explicitly switched.
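For readers consuming the event stream outside a browser, a minimal parser for Server-Sent Events text might look like the sketch below. It follows the general SSE wire format (`event:` and `data:` fields, blank-line separators, `:` comment lines) and assumes the full text is already in hand, so it does not handle events split across network chunks. The `phase` event name and JSON payload in the usage note are invented for illustration; the changelog does not list the real event names.

```typescript
interface SseEvent {
  event: string; // defaults to "message" when no event field is present
  data: string;
}

// Parse a complete SSE text into events. Blocks are separated by blank lines.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith(":")) continue; // comment or keep-alive line
      const idx = line.indexOf(":");
      const field = idx === -1 ? line : line.slice(0, idx);
      let value = idx === -1 ? "" : line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // one optional leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
    }
    // Blocks without any data field (for example pure comments) produce no event.
    if (data.length > 0) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

Given a hypothetical stream such as `event: phase`, then `data: {"phase":"evaluation"}`, then a blank line, the parser returns a single event named `phase` whose `data` can be passed to `JSON.parse`. In a browser, the built-in `EventSource` API performs this parsing automatically.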

2026-04-17

CloudDashboard

Cloud skill validation now uses native spec checks and clearer report detail during setup

Cloud skill setup now persists one validation report per snapshot and shows those results inline in the guided setup hero, so structural validation, best-practice lint, and capability classification are easier to inspect without dropping into raw logs.

The hosted validation step also now runs on a native TypeScript implementation of the Agent Skills frontmatter rules instead of shelling out to the demonstration skills-ref toolchain. That keeps cloud validation deterministic in production while tightening allowed-tools parsing and preserving clearer per-rule issue messages.

Apply flows also now version and re-upload promoted skill archives more safely. Draft apply and GitHub PR apply both keep the cloud source pointed at the newly promoted snapshot, preserve archive manifests across lowercase skill.md packages, and avoid corrupting frontmatter when YAML values contain ---.
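One way the `---` corruption can be avoided is to treat only a line that is exactly `---` as a frontmatter delimiter, rather than searching for the substring anywhere in the file. The sketch below illustrates that idea; `splitFrontmatter` is a hypothetical name and this is not the hosted parser. It would still be confused by a YAML block scalar that contains a bare `---` line.

```typescript
// Sketch: line-based frontmatter split, so "---" inside a YAML value is safe.
interface SplitSkill {
  frontmatter: string;
  body: string;
}

function splitFrontmatter(source: string): SplitSkill | null {
  const lines = source.split(/\r?\n/);
  // The document must open with a delimiter line.
  if (lines[0] !== "---") return null;
  // Only a whole line equal to "---" closes the block.
  const close = lines.indexOf("---", 1);
  if (close === -1) return null;
  return {
    frontmatter: lines.slice(1, close).join("\n"),
    body: lines.slice(close + 1).join("\n"),
  };
}
```

A naive `source.split("---")` would cut a value such as `description: "uses --- as a divider"` in half; the line-based version leaves it intact.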

2026-04-17

CloudDashboard

Cloud source APIs now honor source-type and skill filters consistently across dashboard and API-key surfaces

The hosted cloud-source list API now applies the same type and skill_id filters on the API-key Hono surface that the dashboard session route already supported. That keeps browser, CLI, and smoke-test callers on one normalized contract when listing cloud skills.

This batch also tightens the hosted improve apply/runtime path so root-level GitHub applies do not infer repo-wide deletions, runner dependencies resolve snapshots through the correct org-scoped database client, and local Neon CLI binding metadata is no longer tracked in git.

2026-04-17

CloudDashboard

Cloud improve model selectors now load the live OpenRouter catalog and use a teacher-student default spread

The per-run model selectors on cloud and observed skill detail pages no longer use a hardcoded GPT-only shortlist. They now load the current OpenRouter model catalog from the server and expose the broader set of text-capable models available through the hosted cloud runtime.

Each selector also now carries an explicit recommended default for generate, judge, and summarize. Leaving a selector empty keeps the server-side default for that role, and the UI now spells out those defaults directly so you can test alternatives without losing track of the intended baseline. The selectors are now searchable comboboxes as well, so longer model lists stay usable without scrolling through a giant dropdown.

The recommended defaults now follow a clearer teacher-student split instead of a flat GPT-only stack: google/gemini-2.5-pro for proposal generation, google/gemini-2.5-flash for judging, and google/gemini-2.5-flash-lite for summarization. That keeps the strongest model on the expensive generative step while moving validation and helper work onto cheaper OpenRouter models.
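The "empty selector keeps the default" rule can be sketched as below. The model IDs are the recommended defaults named in this entry; the function and type names are hypothetical, and the sketch assumes the server-side defaults match those recommendations.

```typescript
// Sketch of per-role model resolution; names are illustrative only.
type Role = "generate" | "judge" | "summarize";

const RECOMMENDED_DEFAULTS: Record<Role, string> = {
  generate: "google/gemini-2.5-pro",
  judge: "google/gemini-2.5-flash",
  summarize: "google/gemini-2.5-flash-lite",
};

function resolveModels(
  overrides: Partial<Record<Role, string>>,
): Record<Role, string> {
  const resolved: Record<Role, string> = { ...RECOMMENDED_DEFAULTS };
  for (const role of Object.keys(resolved) as Role[]) {
    const choice = overrides[role]?.trim();
    // An empty or missing selector keeps the default for that role.
    if (choice) resolved[role] = choice;
  }
  return resolved;
}
```

For example, overriding only `summarize` leaves the generate and judge roles on their defaults, which is what makes single-role experiments practical.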

2026-04-17

CloudDashboard

Cloud skill onboarding now auto-creates a 50-case eval suite and funnels into one happy path

When you create or sync a cloud skill, the detail page now automatically drafts and saves a 50-case quick eval suite from the current snapshot instead of making you build one manually first. The skill page now leads with a single guided decision: edit the generated eval suite or start the hosted improve run.

The cloud skill detail UI has also been simplified around that progression. The primary suite is now treated as one editable artifact with save support, advanced run controls stay collapsed by default, and metadata/report panels are moved behind a technical-details disclosure so the page feels less like a control plane and more like a clear product flow.

The Eval summary card now surfaces the per-case origin (auto-generated versus hand-curated versus a mixed breakdown) so you can tell at a glance whether the suite is still the synthetic draft or has been edited. Clicking Edit eval suite also now smooth-scrolls the editor into view, and the hero action row has been reordered so advanced run options live next to the edit affordance rather than after Start improve run.

2026-04-16

CloudDashboard

Overview now presents the hosted cloud loop instead of legacy first-run telemetry onboarding

The cloud dashboard overview now introduces SelfTune as a hosted review loop instead of the older “run selftune and wait for skills to appear” onboarding. The empty-state banner now points people toward the real cloud path: create or import a cloud skill, shape a reviewable eval suite, run the hosted comparison, and review the resulting proposal before draft apply.

The overview also now keys that banner off cloud authoring state rather than observed telemetry alone, so the first-run guidance stays visible until you actually have cloud sources in the hosted product.

2026-04-16

CloudDashboard

Quick Eval Suite can now auto-seed synthetic trigger cases and edit them in a table

The cloud skill detail page now includes a real Quick Eval Suite editor instead of only raw textareas and JSON authoring. Trigger-query cases are now editable as table rows with expectation, invocation type, provenance, and row-level remove actions.

For llm_judge suites, the page can also draft a synthetic seed directly from the current snapshot’s SKILL.md. Those seeded cases are marked as synthetic so you can review, revise, or delete them before creating the hosted eval suite.

2026-04-16

CloudDashboard

Cloud folder uploads now preserve a real skill directory and show clearer selection state

Cloud skill uploads now package the selected folder as a real skill directory instead of flattening only its file contents into the snapshot archive. That means downstream validation sees the skill in a proper directory layout, which fixes false failures caused by generic wrapper names during skills-ref analysis.

The dashboard upload flow is also clearer: the picker now behaves like a folder intake card, shows the detected folder name and file count, confirms whether a root SKILL.md was found, and disables upload until the selection is actually valid.

2026-04-16

CloudDashboard

Cloud improve runs now support separate generate, judge, and summarize model overrides

Cloud improve run setup now exposes three separate model selectors instead of one shared override. You can independently pick the model used for candidate generation, LLM judging, and summarization from the skill detail page before queueing a run.

These overrides are stored with the run itself and passed through the hosted runner, so they no longer collapse into a single model choice. This makes it practical to test cheaper summarize settings while keeping a stronger judge, or to isolate generation changes without touching the server defaults.

2026-04-16

CloudDashboard

Cloud dashboard now uses skill-first wording instead of exposing raw source terminology

The cloud dashboard now uses skill-first wording across the library, detail pages, and observed-to-cloud bridge instead of exposing the backend source model directly in the UI. Cloud library cards, blocked-state messages, quick eval setup, and linked-skill surfaces now read as normal product concepts: Cloud skills, Open Skill, and Linked cloud skills.

This is a terminology cleanup only. The backend cloud_skill_sources model and API routes are unchanged, but the visible dashboard flow is less confusing because it no longer asks users to think in storage-layer terms.

2026-04-16

CloudDashboard

Import observed skills to cloud and manage eval suites from the dashboard

Observed skill cards now include an Import to Cloud button that navigates directly to the cloud library with the skill pre-filled for import. The cloud source detail page also gains an Eval Suites section where you can view existing suites scoped to a source and create new ones inline with a name, verifier kind, and JSON test cases, with no CLI or API calls required.

Proposal detail pages now show full eval comparison data, artifact kinds, confidence levels, and an Apply to Draft button that closes the loop from review to draft apply. The jobs page shows Cloud Improve Runs alongside pipeline jobs with status badges and candidate counts.

2026-04-16

CloudDashboard

Cloud Library now shows cloud authoring records separately from observed telemetry skills

The cloud dashboard’s main Skills surface now reflects the real cloud authoring model instead of the telemetry-backed skill table. It lists GitHub-backed sources, imported uploads, and cloud-managed records with their current snapshot and capability state.

The old telemetry-backed skills library is still available, but it now lives under Observed so local and cloud concepts do not get mixed together. Cloud sources without a linked telemetry skill now have their own detail page for snapshot metadata, validation reports, and hosted improve controls.

The cloud library also now includes first-run onboarding directly in the UI: you can upload a skill folder into a new cloud source, create and sync a GitHub-backed source from a bound installation, jump into cloud import from observed skills, and create a lightweight eval suite from the cloud detail page before queuing a hosted improve run.

Observed skill detail pages now also expose that bridge directly: if a skill already has linked cloud sources you can jump straight into them, and if it does not you can import it into cloud from the report itself instead of backing out to the library first.

2026-04-16

CloudDashboard

Cloud improve runs can now override the model per run, with cheaper default summarize policy

Cloud improve runs can now override the hosted model policy directly from the skill page before queueing a run. The selected override applies to the full run, not just candidate generation, so you can force a cheaper test model or a stronger one without changing server env defaults.

Hosted defaults are also now more cost-aware for testing: generation and judging stay on openai/gpt-4.1-mini, while summarize and low-risk helper work default to openai/gpt-4.1-nano.

2026-04-16

Cloud

Cloud skill improvement integration: runner package, eval backends, control-plane wiring, and hosted eval-suite parity

The cloud skill improvement pipeline is now fully wired end-to-end. The isolated runner package (@selftune/cloud-improve-runner) connects to the control-plane orchestrator via concrete dependency adapters. Eval backends (trigger-query LLM judge + deterministic) dispatch through a registry. Both draft and GitHub apply paths consume the same candidate archive contract.

Hosted eval-suite creation now validates runnable manual suites for both llm_judge and deterministic verifiers through the same control-plane contract the runner consumes, which keeps the dashboard and API-key paths on one canonical suite definition.

2026-04-15

OSSCLIDashboard

Non-TTY improve runs now show durable CLI progress

- selftune improve and selftune evolve now fall back to plain stderr progress lines when the terminal is not a TTY, instead of going completely silent while long proposal or validation steps are still running.
- Interactive terminals keep the spinner/TUI behavior, while test runs remain quiet by default.

2026-04-15

OSSDashboard

Dashboard action toasts now deep-link into the exact live run

- Local dashboard action toasts now include a Live run action that opens the exact /live-run entry for the streaming creator-loop event, including the event id, skill, and action selection state.
- The floating Live lifecycle actions feed now uses the same deep link, so clicking a running or finished lifecycle card jumps straight into the matching Live Run entry instead of leaving you to find it manually.

2026-04-15

OSSCLIPlatforms

eval generate can now force opencode or another agent runtime

- selftune eval generate now accepts --agent for --synthetic, --auto-synthetic, and --blend, so you can force opencode, codex, or pi instead of relying on auto-detection order.
- Cold-start synthetic eval generation now reuses the same cleaned query filtering as log-derived evals and summarizes oversized SKILL.md content before sending it to the runtime, which reduces prompt bloat for large skills like SelfTuneBlog.

2026-04-15

OSSCLI

Package search merge candidates no longer overwrite evaluated variants

- Bounded package search now writes merged routing/body candidates into a new temp package snapshot instead of overwriting the already-evaluated body variant on disk, so candidate artifacts remain consistent for later winner application and review.
- selftune create publish --watch --ignore-watch-alerts now also bypasses the watch gate when the watch subprocess crashes or fails to emit structured JSON, while still surfacing the warning and remediation command.

2026-04-15

OSSDashboard

OSS local dashboard CI now typechecks bounded search summaries again

The OSS local dashboard LiveRun test fixture now uses the real DashboardActionResultSummary shape for bounded package-search summaries, so export verification no longer fails when search_run is present on deploy candidate entries.

2026-04-15

OSSDashboardCLI

Dashboard lifecycle copy now matches the CLI lifecycle surface

The local dashboard now normalizes selftune create replay, selftune create baseline, selftune evolve, selftune evolve body, and selftune search-run into the same lifecycle-facing commands the CLI already shows, so Overview, Skill Report, and Live Run no longer leak stage-level command names for draft-package flows.

2026-04-15

OSSCLIDashboard

Package baseline now reuses fresh replay artifacts and emits phase progress

- selftune create baseline --mode package now reuses the last fresh with-skill replay from the canonical package-evaluation artifact when the draft fingerprint still matches, so measuring baseline no longer pays for two full replay passes after an unchanged verify, report, or search-run.
- Package baseline now emits explicit with_skill_replay and without_skill_replay step progress so the local dashboard live-run surface shows immediate movement instead of looking stuck while the underlying replay work is still running.

2026-04-15

OSSCLI

Reflective package search, merged winners, and lifecycle auto-selection

- selftune search-run now prefers reflective routing/body proposals from measured runtime failures before targeted or deterministic fallback.
- When routing and body both produce accepted improvements, package search now evaluates a merged candidate before final winner selection instead of forcing the frontier to choose between complementary single-surface edits.
- Plain selftune improve now auto-selects bounded package search for skills that already have package evidence or a draft package manifest, so agents do not need to force --scope package for the main package-shaped lifecycle.
- Added an end-to-end package lifecycle test covering verify auto-fix, bounded package search, winner promotion, and publish --watch.

2026-04-15

OSSCLI

Direct search-run now uses measured targeted variants

- selftune search-run now uses the same measured targeted-routing/body mutation path as orchestrate package search, falling back to deterministic variants only when targeted variants do not fill the requested minibatch.
- The public CLI docs and workflow docs now describe search-run as bounded local package search over draft variants instead of registry lookup, and the Evolve workflow now points package-scope users at the measured targeted search path instead of only the older deterministic description.
- Publish/package-search lifecycle docs now describe the real blocking publish-time watch gate instead of the older advisory wording.

2026-04-15

OSSCLI

Verify auto-fix, publish watch blocking, and targeted-mutation wiring fixes

- selftune verify now auto-runs the real missing-evidence commands with the required flags and skill context, including --auto-synthetic eval generation and generated unit tests.
- selftune create publish --watch now blocks publish if the watch subprocess fails or returns malformed output instead of treating missing watch JSON as a passing gate.
- Eval-informed targeted mutations now read grading_results.pass_rate, expectations_json, and failure_feedback_json from the real SQLite schema instead of a test-only summary_json shape.
- The shipped lifecycle docs now describe the actual concrete readiness states and the correct --ignore-watch-alerts flag.

2026-04-15

OSSCLI

Lifecycle vocabulary normalization and restructured CLI help

- normalizeLifecycleCommand now maps create replay, create baseline, evolve, evolve-body, and search-run to their lifecycle equivalents.
- selftune --help now shows Primary Lifecycle commands first, with Advanced / Stage Commands below.

2026-04-15

OSSCLI

Auto-evidence generation in verify

selftune verify auto-runs missing evidence steps (up to 4 iterations) when readiness checks fail. Use --no-auto-fix to skip.
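The bounded auto-fix behavior can be pictured as a capped retry loop. The sketch below is an assumption-laden illustration, not selftune's code: `verifyWithAutoFix`, `Readiness`, and the callback shapes are hypothetical, and only the cap of 4 iterations and the opt-out (modeled here as `autoFix: false`, standing in for --no-auto-fix) come from this entry.

```typescript
// Sketch of a bounded "check, fix what is missing, re-check" loop.
interface Readiness {
  ready: boolean;
  missing: string[]; // evidence steps that still need to run
}

function verifyWithAutoFix(
  check: () => Readiness,
  runEvidenceStep: (step: string) => void,
  options: { autoFix?: boolean; maxIterations?: number } = {},
): { readiness: Readiness; iterations: number } {
  const { autoFix = true, maxIterations = 4 } = options;
  let readiness = check();
  let iterations = 0;
  // Stop as soon as checks pass, auto-fix is disabled, or the cap is hit.
  while (!readiness.ready && autoFix && iterations < maxIterations) {
    for (const step of readiness.missing) runEvidenceStep(step);
    iterations += 1;
    readiness = check();
  }
  return { readiness, iterations };
}
```

The cap matters because an evidence step that never succeeds would otherwise loop forever; after the last iteration the caller still gets the final readiness state to report.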

2026-04-15

OSSCLI

Broader package search eligibility for draft packages with grading evidence

- collectPackageSearchEligibleSkills now includes a second eligibility tier: skills with a selftune.create.json draft package and at least 3 grading results in the DB are routed to package search during orchestrate.
- The existing frontier/artifact fast path is unchanged; the new tier is additive and fail-open (skips silently if the grading table is missing).
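The two tiers can be summarized in a small predicate. This is a sketch with hypothetical field and function names; only the selftune.create.json requirement, the threshold of 3 grading results, and the skip-when-table-missing behavior are taken from this entry.

```typescript
// Sketch of two-tier package-search eligibility; names are illustrative.
interface SkillEvidence {
  hasFrontierOrArtifact: boolean;    // existing fast path
  hasDraftPackage: boolean;          // selftune.create.json is present
  gradingResultCount: number | null; // null when the grading table is missing
}

const MIN_GRADING_RESULTS = 3;

function isPackageSearchEligible(evidence: SkillEvidence): boolean {
  // Tier 1: unchanged frontier/artifact fast path.
  if (evidence.hasFrontierOrArtifact) return true;
  // Tier 2 skips silently when grading data cannot be read.
  if (evidence.gradingResultCount === null) return false;
  return (
    evidence.hasDraftPackage &&
    evidence.gradingResultCount >= MIN_GRADING_RESULTS
  );
}
```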

2026-04-15

OSS

Docs: fix stale orchestrate claim in SearchRun.md and document watch frontier demotion

- SearchRun.md no longer claims orchestrate cannot auto-select package search; it documents the eligibility criteria and plan-phase routing.
- Watch.md adds a “How Watch Evidence Feeds Back to the Frontier” section explaining watch rank levels, SQLite row updates, and dashboard visibility.
- SKILL.md SearchRun routing keywords now include “optimize package”, “improve routing and body together”, and “bounded evolution”.

2026-04-15

OSSCLI

Publish watch gate now blocks and mutation weakness extraction populates failure patterns

- create publish --watch now blocks publishing when the watch gate detects active alerts (published: false, watch_gate_blocked: true), instead of unconditionally publishing. Use --ignore-watch-alerts to bypass.
- extractMutationWeaknesses now populates gradingFailurePatterns from the expectations array in grading summary JSON, enabling targeted body mutations to focus on specific failed expectations.
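The gate decision described above reduces to a small rule. The sketch below uses the two payload fields named in this entry (`published`, `watch_gate_blocked`); the function name and its inputs are hypothetical, and the real payload carries more fields than shown.

```typescript
// Sketch of the blocking publish gate; not the actual CLI code.
interface PublishGateResult {
  published: boolean;
  watch_gate_blocked: boolean;
}

function decidePublish(
  activeAlertCount: number,
  ignoreWatchAlerts: boolean, // models the --ignore-watch-alerts flag
): PublishGateResult {
  const blocked = activeAlertCount > 0 && !ignoreWatchAlerts;
  return { published: !blocked, watch_gate_blocked: blocked };
}
```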

2026-04-15

OSSCLIDashboard

Phase 2 follow-up fixes package search and publish gate wiring

- Orchestrate now marks skills package-search-eligible from the real accepted frontier and canonical package-evaluation artifacts, so the new package-search branch is reachable in normal runs instead of existing only in isolated tests.
- The orchestrate package-search phase now uses the current mutation and winner-application contracts, including targeted routing/body variants, current candidate path fields, and the current applySearchRunWinner response shape.
- create publish --watch now surfaces watch_gate_passed, watch_gate_warnings, and watch_trust_score directly in the publish payload, and --ignore-watch-alerts now intentionally bypasses that advisory gate when needed.
- Skill reports now populate watch_trust_score from the latest stored package-evaluation watch summary, so the dashboard watch trust indicator renders from real watch evidence instead of staying empty.
- Fixed the selftune orchestrate CLI docs page so Mintlify renders it as a normal document instead of a raw fenced code block.
- Dashboard skill report and live run now display routing and body weakness percentages from surface plan data, with a visual bar highlighting the weaker surface. The frontier panel also shows a parent-vs-winner comparison when both members are available.

2026-04-15

OSSCLI

Orchestrate gains automatic package search selection

- Added evidence-driven scope selection to orchestrate so it automatically chooses between description-level evolve and package-level bounded search based on accepted frontier state and canonical package evaluation evidence.
- Added watch trust scoring feedback so post-deploy regressions can demote accepted frontier candidates and influence future scope selection.
- Updated workflow and skill documentation to reflect the new package-search-in-orchestrate truth.

2026-04-15

CLIOSS

Bounded mutation strategies for package evolution

- Added deterministic routing mutations (synonym expansion, granularity split, coverage broadening) and body mutations (instruction emphasis, example enrichment, description expansion) for bounded package evolution.
- Added eval-informed targeted mutations that consume measured weaknesses from replay failures and grading results to focus routing and body changes on specific failure patterns.
- Added weakness extraction from the local SQLite database to surface replay failure samples, routing misses, body quality scores, and grading pass rate deltas for mutation targeting.

2026-04-15

OSSCLIDashboard

Watch trust scoring and publish gate

- Added computeWatchTrustScore to the watch module, producing a 0-1 trust score from trigger regression, grade regression, and rollback signals.
- Added an advisory publish watch gate that warns when active alerts or low trust scores are detected, with --ignore-watch-alerts bypass for experts.
- Extended the dashboard contract with watch_trust_score on skill reports and watch_gate_passed on action result summaries.
- Updated the live run screen to display watch gate pass/alert badges when watch or deploy actions complete.
- Added a watch trust indicator to the skill report creator loop section.
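To make the idea of a 0-1 trust score concrete, here is a toy scoring function over the three signal types named above. It is deliberately not called computeWatchTrustScore: the penalty weights, the boolean inputs, and the name `sketchWatchTrustScore` are all invented for illustration, and the real function's inputs and weighting are not described in this entry.

```typescript
// Toy trust score: start at full trust and subtract a penalty per signal.
// The weights below are made up for this sketch.
interface WatchSignals {
  triggerRegression: boolean;
  gradeRegression: boolean;
  rolledBack: boolean;
}

function sketchWatchTrustScore(signals: WatchSignals): number {
  let score = 1;
  if (signals.triggerRegression) score -= 0.4;
  if (signals.gradeRegression) score -= 0.3;
  if (signals.rolledBack) score -= 0.3;
  // Clamp so the result always stays in the documented 0-1 range.
  return Math.min(1, Math.max(0, score));
}
```

A consumer such as a publish gate could then compare the score against a threshold of its choosing; that threshold is likewise not specified here.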

2026-04-14

CloudRegistry

Cloud GitHub registry foundation

2026-04-15

CLIOSS

Package search phase in orchestrate loop

- Added a package-search candidate action to the orchestrate loop so skills with accepted package frontier candidates are routed through bounded package search instead of standard evolution.
- The new phase generates bounded mutations, fingerprints variants, runs package search evaluation, and applies winning candidates automatically.
- Package search modules are lazy-loaded and gracefully degrade when unavailable, so the existing orchestrate flow is unaffected until the full package search stack is present.

2026-04-15

OSSCLI

search-run no longer over-biases body mutations on missing quality scores

- search-run now treats body.quality_score: null as a neutral weakness signal when the body already passed validation, instead of coercing it to maximum weakness.
- This prevents --surface both from over-allocating routing/body search budget toward body mutations when quality assessment was unavailable but the current body was still valid.
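The null-handling change can be sketched as a weakness function in which a missing score on a valid body maps to a midpoint rather than the maximum. The 0.5 midpoint, the 0-1 score scale, the treatment of an invalid body with a missing score, and the function name are all assumptions made for this illustration.

```typescript
// Sketch: convert a body quality score into a weakness signal in [0, 1].
function bodyWeakness(qualityScore: number | null, bodyValid: boolean): number {
  if (qualityScore === null) {
    // Missing assessment: neutral when the body already passed validation.
    // (Assumed: an invalid body with no score still counts as fully weak.)
    return bodyValid ? 0.5 : 1;
  }
  const clamped = Math.min(1, Math.max(0, qualityScore));
  return 1 - clamped;
}
```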

2026-04-15

OSSCLI

bounded search now biases the minibatch toward weaker measured surfaces

- search-run --surface both now reads the accepted frontier first and falls back to the canonical package evaluation when needed, using that measured package state to bias routing/body candidate counts.
- This replaces the old fixed half-routing half-body split with a weakness planner that sends more of the minibatch budget toward the weaker measured surface while still keeping bounded deterministic search behavior.
- The chosen surface budget is now persisted into search provenance and shown in the live-run and skill-report frontier surfaces, so reviewers can see why a run spent more budget on routing or body.
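A weakness planner of this kind can be sketched as a proportional split of the minibatch. The function below is an illustration under stated assumptions: weakness values are non-negative numbers, each surface keeps at least one candidate when the minibatch allows it, and an even split is used when no weakness is measured. None of those details are confirmed by this entry, and `planSurfaceBudget` is a hypothetical name.

```typescript
// Sketch: split a minibatch between routing and body by relative weakness.
interface SurfaceBudget {
  routing: number;
  body: number;
}

function planSurfaceBudget(
  routingWeakness: number,
  bodyWeakness: number,
  minibatch: number,
): SurfaceBudget {
  const total = routingWeakness + bodyWeakness;
  if (minibatch < 2 || total <= 0) {
    // No usable signal (or too few slots): fall back to an even split.
    const routing = Math.ceil(minibatch / 2);
    return { routing, body: minibatch - routing };
  }
  const proportional = Math.round((routingWeakness / total) * minibatch);
  // Keep at least one candidate on each surface (assumed policy).
  const routing = Math.min(minibatch - 1, Math.max(1, proportional));
  return { routing, body: minibatch - routing };
}
```

Because the result is a pure function of the measured weaknesses, the chosen budget is easy to persist alongside the search provenance for later review.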

2026-04-15

OSSCLIDashboard

package search can now promote the winning draft candidate

- Added search-run --apply-winner, which copies the winning candidate back into the draft package and refreshes the canonical package-evaluation artifact from the accepted candidate cache instead of leaving search as read-only provenance.
- selftune improve --scope package now adds winner promotion by default and keeps --dry-run as the review-only escape hatch.
- Search-run dashboard summaries now carry the resulting next command and package-evaluation context when a winning candidate is applied, so live review stays grounded in measured package state instead of raw search provenance.

2026-04-15

OSSCLI

package search is now part of the main improve lifecycle

- Added selftune improve --scope package, which routes the primary improvement alias into selftune search-run instead of keeping bounded package search behind an expert-only command.
- Package scope now preserves --eval-set, strips redundant --dry-run, normalizes compatible replay validation flags, and maps --candidates onto search-run’s --max-candidates knob.
- Updated command help, workflow docs, SKILL routing guidance, and CLI docs so package search is taught as part of the main measured improvement loop.

2026-04-15

OSSCLIDashboard

bounded package search is now executable end to end

- Added selftune search-run as a real top-level CLI command that generates bounded routing/body package variants, evaluates them through the shared package evaluator, and persists the selected winner plus provenance.
- Wired search-run through dashboard actions, child-process event instrumentation, live-run summaries, and draft-package action buttons so bounded search is executable from the product surface instead of only existing as stored backend state.
- The skill report backend now returns real package frontier state and the latest search-run provenance, so the frontier panel is driven by measured candidate history rather than a dormant response field.
- Package search evaluations now normalize temp candidate variants back onto the canonical skill name, and winner selection now follows the accepted frontier over the full evaluator contract instead of replay-only gains.
- Updated command help, workflow docs, SKILL routing, and the CLI quick reference so the new search surface is documented consistently.

2026-04-15

CLIOSS

Package evaluation pipeline terminology

- Updated selftune status output to label the readiness section “Package pipeline” instead of “Creator loop”.
- Adapted package search runner to the mature evaluator API with frontier-based parent selection.
- Normalized SKILL.md description and body to reference the package evaluation pipeline (replay, baseline, grading, body, unit tests, and post-deploy watch) as the primary improvement mechanism.
- Updated Evolve, EvolveBody, Watch, and CreateTestDeploy workflow docs to use package evaluation pipeline terminology consistently.
- Normalized Baseline, Evals, UnitTest, SignalsDashboard workflow docs and creator-playbook reference to use package evaluation pipeline terminology.

2026-04-15

OSSDashboardCLI

Package frontier observability in dashboard

- Added package frontier panel to skill report showing accepted candidates ranked by measured evidence with watch-fed demotion indicators.
- Added search run panel to live run screen showing selected parent, candidates evaluated, winner determination, and provenance detail.
- Added search-run action result parsing to the dashboard action result contract so search runs surface structured summaries alongside existing replay dry-run results.

2026-04-15

CLIOSS

Bounded mutation primitives for package search

Added generateRoutingMutations() and generateBodyMutations() in the evolution pipeline to produce complete skill file variants that a package search runner can score. Three routing strategies (synonym expansion, granularity split, coverage broadening) and three body strategies (instruction emphasis, example enrichment, description expansion) create bounded variants written to temporary directories.

2026-04-15

CLIOSS

Bounded package search runner

- Added bounded package search runner that evaluates candidate skill variants against the accepted frontier parent with measured delta acceptance.
- Added package candidate state management with frontier reading, parent selection, and fingerprint-based deduplication.
- Added package search provenance persistence tracking frontier size, parent selection method, candidate fingerprints, and evaluation summaries.
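Fingerprint-based deduplication can be sketched as hashing each candidate's file contents in a stable order and keeping the first candidate per hash. The example below uses a 32-bit FNV-1a hash purely as a dependency-free stand-in; the actual fingerprint algorithm and candidate shape are not described in this entry, so `fingerprint`, `dedupeCandidates`, and `Candidate` are hypothetical.

```typescript
// Sketch: content fingerprinting plus first-wins deduplication.
interface Candidate {
  id: string;
  files: Record<string, string>; // relative path -> file content
}

function fingerprint(files: Record<string, string>): string {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  // Sort paths so key order does not change the fingerprint.
  for (const path of Object.keys(files).sort()) {
    const text = path + "\u0000" + files[path] + "\u0000";
    for (let i = 0; i < text.length; i++) {
      hash ^= text.charCodeAt(i);
      hash = Math.imul(hash, 0x01000193) >>> 0; // FNV prime, kept unsigned
    }
  }
  return hash.toString(16);
}

function dedupeCandidates(candidates: Candidate[]): Candidate[] {
  const seen = new Set<string>();
  const unique: Candidate[] = [];
  for (const candidate of candidates) {
    const print = fingerprint(candidate.files);
    if (seen.has(print)) continue; // identical variant already queued
    seen.add(print);
    unique.push(candidate);
  }
  return unique;
}
```

A production fingerprint would more likely use a cryptographic digest to make collisions negligible; the dedupe logic stays the same either way.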

2026-04-15

OSSCLIDashboard

watch now flags package efficiency regressions

- selftune watch now reads the current package-evaluation artifact when one exists and computes an efficiency regression signal from observed post-deploy sessions, instead of only looking for trigger-pass-rate regressions and optional grade regressions.
- Efficiency watch is grounded in measured package baselines already produced by create report and create publish, so post-deploy monitoring now compares observed input tokens, output tokens, and assistant turns against the same package-evaluator contract used before publish.
- Efficiency regressions now flow through the structured watch result and the nested package watch summary, so publish/watch consumers can surface the same measured signal without scraping alert text.
- The local dashboard watch parser now preserves those efficiency-regression fields in the package watch summary, keeping the watch contract forward-ready for richer live-run presentation as more post-deploy package signals land.

2026-04-15

OSSCLIDashboard

package candidate history now records measured acceptance

- Durable draft package candidates now carry a measured acceptance decision in local state, instead of only lineage metadata, so candidate history can distinguish accepted improvements from measured regressions.
- Acceptance is computed from package-evaluator evidence rather than model confidence, with explicit replay, routing, baseline-lift, body-quality, and unit-test deltas plus a human-readable rationale attached to the candidate summary.
- Re-evaluating the same draft fingerprint preserves the original parent relationship instead of inventing a new comparison target, so repeated review runs update the candidate record without corrupting lineage.
- Fresh candidates now compare their measured acceptance against the latest accepted frontier member instead of blindly inheriting the most recent rejected draft as the comparison baseline, while still keeping chronological lineage in the parent link.
- When the current draft matches an already accepted frontier member, package evaluation can now reuse that candidate-specific artifact by fingerprint even if the canonical latest package report points at some other draft, so re-checking an accepted draft no longer repays the full evaluator cost.
- Accepted-frontier selection is now ranked by measured package outcomes instead of timestamp alone, so newer accepted drafts with weaker grading or weaker observed health no longer automatically become the comparison parent for the next candidate.
- create publish --watch now writes structured watch results back into the matching package candidate artifact and registry row, so observed regressions can demote an accepted draft in later frontier selection without fabricating a brand-new evaluation event.
- Cached package-evaluation reuse now also requires acceptance metadata in the stored artifact, so older lineage-only artifacts automatically refresh once before they can participate in candidate-aware reuse.
- Benchmark reports, create publish summaries, and the local dashboard live-run screen now surface the candidate acceptance decision and rationale, so measured accept/reject state is visible without opening archived JSON.

2026-04-15

OSSCLI

package evaluation now registers candidate lineage

- Fresh draft package evaluations now register a durable package candidate per package fingerprint in local state, instead of only overwriting one latest package report per skill.
- New candidate records carry parent linkage to the previously evaluated draft for the same skill plus a candidate-specific archived evaluation artifact, so later bounded package search can reuse lineage and evaluator evidence instead of rebuilding history from ad hoc files.
- Cached package-evaluation reuse now requires candidate metadata in the saved artifact too, so older artifacts automatically force one fresh measured run before they can participate in candidate-aware reuse.
- Benchmark reports, publish summaries, and the local dashboard live-run view now surface candidate ID, parent linkage, and generation directly, so candidate lineage is inspectable without opening archived JSON artifacts.

2026-04-14

OSS, CLI

selftune skill now teaches a simpler lifecycle

- Repositioned the shipped selftune skill around a smaller lifecycle: Create, Verify, Publish, Improve, and Run, instead of leading with the older stage-heavy creator loop.
- Added new primary workflow docs for Verify, Publish, Improve, and Run, while keeping the existing lower-level eval, replay, baseline, watch, and body-evolution workflows available as advanced surfaces.
- Updated SKILL.md, routing keywords, and lifecycle-state guidance so “can I trust this skill?”, “ship this skill”, and “run the loop” now map to intention-level workflows that still use today’s commands accurately under the hood.
- Reframed Create as draft authoring only, marked the older CreateTestDeploy workflow as legacy compatibility guidance, and taught Orchestrate as the underlying runtime behind the simpler Run concept.
- The local dashboard action stream and dashboard-triggered publish/evolve paths now recognize and use the new verify, publish, improve, and run aliases where they preserve the same measured behavior, so the live-run UI stays aligned with the simplified lifecycle surface.
- The local dashboard overview, skill report, live action feed, and CLI docs now teach draft-package work as verify, publish, and live monitoring first, while still exposing the lower-level eval, replay, baseline, and create-check commands when an agent needs to drive the advanced loop manually.
- selftune status, dashboard recommended commands, live-run next-command cards, the shipped quick reference, README, and the main skill-authoring guides now normalize old surface aliases like create check, create publish, and orchestrate into verify, publish, and run when the underlying behavior is equivalent, so the product stops teaching mixed lifecycle vocabulary by default.
- Scheduled automation surfaces now teach selftune run as the default autonomous loop entrypoint: cron job messages, generated schedule snippets, alpha-enrollment guidance, orchestration reports, and the related docs and skill workflows all use run first while keeping orchestrate as the underlying advanced runtime name where needed.
- Fixed the selftune create CLI page after a broken MDX wrapper landed, and updated the main authoring, troubleshooting, sharing, trigger-testing, and creator-playbook docs so they teach verify / publish first while still documenting the lower-level create replay / create baseline package steps when a draft needs explicit measured proof.
- Normalized the secondary advanced workflow docs and README so eval, unit-test, baseline, evolve, evolve body, dashboard live-run, and legacy create-test-deploy guidance now distinguish draft-package lifecycle work from already-published skill iteration, instead of re-teaching the old creator-loop chain as the default.
- Cleaned up the remaining lifecycle wording in status, eval, and create CLI docs plus the shipped SKILL.md reference table, so “creator loop” now mainly survives as a compatibility/search term instead of the default label for the product surface.
- Corrected the package-search docs so search-run and improve --scope package are documented as explicit bounded-search surfaces, without claiming that run / orchestrate already auto-select package search before that automation is actually shipped.
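
The alias normalization mentioned above (create check, create publish, and orchestrate shown as verify, publish, and run) could look roughly like the following when a surface rewrites a recommended command for display. The mapping is taken from this entry; the function name, and the assumption that trailing arguments carry over unchanged, are illustrative only.

```python
from typing import Dict, List, Tuple

# Legacy command prefix -> simplified lifecycle alias, as listed in this entry.
LEGACY_TO_LIFECYCLE: Dict[Tuple[str, ...], str] = {
    ("create", "check"): "verify",
    ("create", "publish"): "publish",
    ("orchestrate",): "run",
}


def normalize_command(argv: List[str]) -> List[str]:
    """Rewrite a legacy command prefix to its lifecycle alias.

    Assumes the remaining arguments are equivalent under the alias, which
    the changelog only promises when the underlying behavior matches.
    Commands without a known alias are returned unchanged.
    """
    for legacy, alias in LEGACY_TO_LIFECYCLE.items():
        if tuple(argv[: len(legacy)]) == legacy:
            return [alias] + list(argv[len(legacy):])
    return list(argv)
```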

2026-04-14

OSS, CLI

package evaluation now reuses fresh measured artifacts

- Added a canonical full-evaluation artifact beside the stored package summary, so create report and publish-time package gates can reuse one measured replay/baseline/body-validation result instead of scraping or recomputing partial state.
- Package-evaluation reuse is guarded by the bounded package fingerprint and request shape, so edited drafts or changed evaluation requests still trigger a fresh measured run instead of trusting stale evidence.
- Cache hits only apply when the saved package artifact already includes the current routing/body validation dimensions, so older summaries automatically fall back to a fresh measured run instead of silently downgrading the review signal.
- Benchmark reports and publish output now label whether the package evaluation was freshly measured or reused from a matching artifact cache, so creators can audit reuse instead of inferring it from timing or logs.
- The local dashboard live-run summary now surfaces that same fresh-vs-cached evaluation source for package report/publish actions, so cache reuse stays visible in the main review UI too.
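
The reuse guard described above can be sketched as a single predicate. The artifact keys used here (fingerprint, request_key, routing_validation, body_validation) are hypothetical stand-ins, since the changelog does not publish the stored artifact schema; the point is only that a cache hit needs a matching fingerprint, a matching request shape, and the current validation dimensions all at once.

```python
import hashlib
import json
from typing import Any, Dict, Optional

# Hypothetical keys for the dimensions a reusable artifact must contain.
REQUIRED_DIMENSIONS = ("routing_validation", "body_validation")


def request_shape_key(request: Dict[str, Any]) -> str:
    """Stable digest of an evaluation request, independent of key order."""
    encoded = json.dumps(request, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def can_reuse(
    artifact: Optional[Dict[str, Any]],
    current_fingerprint: str,
    request: Dict[str, Any],
) -> bool:
    """Return True only when the stored artifact is safe to reuse."""
    if artifact is None:
        return False
    if artifact.get("fingerprint") != current_fingerprint:
        return False  # the draft was edited since the artifact was measured
    if artifact.get("request_key") != request_shape_key(request):
        return False  # the evaluation request changed
    # Older artifacts without the current dimensions force a fresh run.
    return all(dim in artifact for dim in REQUIRED_DIMENSIONS)


def evaluation_source(artifact, current_fingerprint, request) -> str:
    """Label a run the way reports distinguish cached from fresh results."""
    return "cached" if can_reuse(artifact, current_fingerprint, request) else "fresh"
```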

2026-04-14

OSS, CLI, Dashboard

package evaluation now includes routing and body validation

- Extended the shared draft package evaluator so create report and create publish now attach current routing replay validation and current body validation alongside replay, baseline, grading, unit-test, and watch evidence.
- Updated the benchmark-style package report format so routing replay and body validation show up in the same deterministic artifact as the rest of the measured package evidence.
- Updated the active bounded package-evolution plan to reflect that body/routing validation is now part of the unified evaluator contract, moving the remaining gap toward candidate state, evaluator reuse, and measured search rather than missing evaluator dimensions.

2026-04-14

OSS, CLI

draft package benchmark report helper

- Added selftune create report --skill-path <path> as a no-side-effect package-evaluation command that runs replay plus baseline and renders one benchmark-style report with failure analysis, measured lift, recommendation, and next-step guidance.
- Added the same report shape as a reusable helper in the shared draft package evaluator so future dashboard and PR-summary surfaces can reuse one deterministic evidence format instead of inventing ad hoc summaries.
- Updated the selftune skill workflow docs, quick reference, README, and CLI docs so package creators can explicitly request a measured publish-readiness report before running create publish.

2026-04-14

OSS, CLI

package-first create publish handoff

- Updated selftune create publish so draft-package publishing now re-runs create replay --mode package and create baseline --mode package as the final measured gate before watch.
- Removed the old direct handoff from create publish into description-only selftune evolve, keeping the creator loop grounded in package-level validation instead of a description mutation step.
- Added a shared package-evaluation summary that create publish can return directly, so draft deploy/watch actions have one measured result shape instead of stitching together replay and baseline outcomes ad hoc.
- Updated the local dashboard action parser so draft-package baseline and publish runs can surface replay mode, before/after pass rates, and lift on the live run screen.
- selftune watch now emits a machine-readable recommended_command, and create publish --watch now carries the nested watch_result payload through directly so draft publish/watch flows expose measured post-deploy pass rates, alerts, and rollback recommendations instead of only a coarse “watch started” status.
- Updated creator-loop readiness and selftune status guidance so draft packages now recommend create replay, create baseline, and create publish instead of falling back to the older evolve / grade commands for those milestones.
- Updated the overview, skill report, and selftune status creator-loop surfaces so draft packages stay blocked on create check or package-resource fixes until those checks actually pass, instead of skipping ahead to replay or publish because later creator-loop artifacts already exist.
- Added dashboard support for create check as a runnable draft-package action, so the live-run screen and draft package panel can stream and summarize spec-validation checks instead of showing that step as copy-only guidance.
- Added structured progress events for create check, so the live-run screen now shows draft-package load, Agent Skills validation, and selftune readiness computation as explicit steps instead of only the final JSON result.
- Made the overview creator-loop priorities runnable from the dashboard for actionable steps, so top-level draft-package cards can launch create check, eval generation, replay, baseline, and publish flows without drilling into the per-skill report first.
- Updated the CLI help, OSS workflow docs, and docs site reference so the publish contract matches the package-first creator loop.
- The live-run summary tiles now relabel watch actions as Baseline, Observed, Delta, and Signal, so post-deploy watch evidence no longer appears under the older dry-run Before / After / Validation vocabulary.
- The shared package-evaluation payload now carries runtime efficiency and representative evidence, so package replay / baseline / publish flows can return measured duration and token aggregates together with replay-failure and baseline-win samples instead of only pass-rate summaries.
- The live-run screen now surfaces those measured package-evaluation artifacts directly, including replay-failure samples, baseline-win/regression samples, with-skill versus without-skill efficiency totals, and recommended next commands when publish or watch actions expose them.
- Added report-package as a first-class dashboard action for draft skills, so the skill report and live-run feed can launch selftune create report directly and label the resulting benchmark artifact separately from baseline, publish, and watch runs.
- create publish --watch now attaches a structured watch summary to that same package-evaluation payload, and the live-run screen renders watch snapshot counts, invocation-type totals, rollback state, and grade-watch deltas from that shared measured contract.
- Clarified the public CLI docs and shipped Create workflow so agents can rely on both the raw nested watch_result payload and the normalized package_evaluation.watch block when they parse publish-with-watch results.
- selftune evolve and selftune evolve body no longer reject proposals before measured validation solely because model-reported confidence is low; --confidence now acts as a review threshold and adaptive-gate risk signal instead of a hard pre-validation stop.
- The shared package-evaluation payload now also includes grading baseline versus recent grading deltas when that data exists, so create report and create publish --json can show observed execution-quality movement next to replay, baseline, and watch evidence.
- The local dashboard now parses and renders that same package_evaluation.grading block in live-run summaries, so draft package report and publish flows expose measured grading movement without requiring raw JSON inspection.
- The latest package-evaluation summary is now stored canonically in SQLite and mirrored to ~/.selftune/package-evaluations/<skill>.json, so draft report/publish/watch flows can reuse one measured artifact instead of treating package evaluation as stdout-only output.
- Draft-package readiness and create check now honor the latest stored package-evaluation status, so a measured replay_failed or baseline_failed result keeps the skill blocked on the corresponding package gate instead of surfacing a false ready to publish state just because the older replay or baseline artifacts exist.
- The shared package-evaluation payload now also carries deterministic unit test results and representative failing tests when that evidence exists, so create report, create publish --json, and the live-run UI can review the latest measured test run alongside replay, baseline, grading, and watch evidence.
- Draft-package readiness and create check now also honor the latest failed deterministic unit-test run when one exists, so stored test failures keep the draft blocked on rerunning unit tests instead of treating test-file presence alone as publish-ready proof.
- Stored package-evaluation artifacts now include a bounded package fingerprint, and draft-package readiness only trusts those replay/baseline results when the fingerprint still matches the current package tree, so stale failed measurements stop blocking edited drafts just because they share the same skill name.
- Fixed dashboard child-process action context for report-package, so create report and verify now stream live progress and metrics events into the live-run screen instead of silently dropping them when the action context is read from environment variables.
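
For agents parsing publish-with-watch output, a tolerant reader might prefer the normalized package_evaluation.watch block and fall back to the raw watch_result payload. The two key names come from this entry; everything inside those blocks is left opaque here because the changelog does not document their inner fields, and the function name is hypothetical.

```python
import json
from typing import Any, Dict, Optional


def extract_watch(publish_json: str) -> Optional[Dict[str, Any]]:
    """Return the watch evidence from a publish-with-watch JSON result.

    Prefers the normalized package_evaluation.watch block, then the raw
    nested watch_result payload. Returns None when neither is present,
    for example when publish ran without watch.
    """
    payload = json.loads(publish_json)
    evaluation = payload.get("package_evaluation")
    if isinstance(evaluation, dict) and isinstance(evaluation.get("watch"), dict):
        return {"source": "package_evaluation.watch", "watch": evaluation["watch"]}
    raw = payload.get("watch_result")
    if isinstance(raw, dict):
        return {"source": "watch_result", "watch": raw}
    return None
```

Recording which block was used keeps downstream summaries honest about whether they read the normalized contract or the raw payload.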

2026-04-14

OSS, CLI

create skill packages and workflow scaffolds

- Added selftune create init as the clean-slate authoring path for new skills.
- Added selftune create scaffold --from-workflow ... as the workflow-derived authoring path, and upgraded selftune workflows scaffold to emit the same package shape for backward compatibility.
- Package drafts now include SKILL.md, workflows/default.md, references/overview.md, empty scripts/ and assets/ directories, plus a selftune.create.json manifest.
- Added selftune create check to run Agent Skills spec validation first and then compute selftune-specific package readiness for evals, unit tests, replay, and baseline.
- Added selftune create replay, selftune create baseline, selftune create status, and selftune create publish so the draft-package path now reaches all the way through replay validation, lift measurement, and handoff into the existing evolve/watch surfaces.
- Added package-mode replay staging so runtime replay can read workflow/reference files inside the staged skill package without treating them as unrelated paths.
- The local dashboard now surfaces draft packages before they have live telemetry, shows package-local create readiness on the skill report, and routes dashboard replay/baseline/publish actions through the draft-aware create commands automatically.
- selftune create check now recommends create replay, create baseline, and create publish for draft-package next steps instead of the older generic evolve/grade commands, keeping package-tree staging consistent from CLI output through the dashboard.
- Hardened the local dashboard draft-package views so the exported OSS app typechecks cleanly when create-readiness data is optional, preserving the draft-package panels in shipped builds.
- Fixed selftune workflows scaffold --write so fresh workflow-derived packages are written through the shared draft-package writer instead of pre-creating the directory and tripping the overwrite guard.
- Draft-package dashboard actions now start eval generation with --auto-synthetic, so cold-start skills can bootstrap eval sets from the dashboard instead of attempting empty log-based generation.
- Added agent workflow docs and public CLI docs so agents can route package authoring requests to the full command surface.
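
The draft package shape listed above lends itself to a quick layout check. The sketch below only tests that the named files and directories exist; it is not the validation that create check performs, and the function name is made up for illustration.

```python
from pathlib import Path
from typing import List

# Entries named in this changelog entry for a freshly scaffolded draft.
EXPECTED_FILES = (
    "SKILL.md",
    "workflows/default.md",
    "references/overview.md",
    "selftune.create.json",
)
EXPECTED_DIRS = ("scripts", "assets")


def missing_package_entries(root: Path) -> List[str]:
    """List expected files and directories that are absent under root."""
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    missing += [rel + "/" for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
    return missing
```

An empty result means the directory has the scaffolded shape; it says nothing about whether the contents pass spec validation or readiness checks.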

2026-04-14

Cloud, Registry

Cloud GitHub registry foundation

- Added GitHub App installation binding for cloud orgs so a team can associate a GitHub installation with its registry workspace.
- Added GitHub-backed registry connection APIs for listing accessible repos, connecting a repo to a registry entry, disconnecting it, and requesting manual sync.
- Added immediate manual sync publishing so a connected repo path is packaged from GitHub, archived, and pushed into the registry as a GitHub-sourced version without waiting on a background worker.
- Added webhook-driven auto-publish for default-branch pushes and matching Git tags so connected repos now flow into the registry without manual sync.
- Added a dashboard GitHub settings flow with installation binding, repo discovery, monorepo path selection, and connection management controls.
- Added Tier A GitHub write-back with org-level policy, per-connection opt-in, persisted publish attempts, and optional commit status/check-run updates for successful, skipped, and failed publishes.
- Added direct selftune registry install github:owner/repo[@ref][//path] support so skills can be installed straight from GitHub with monorepo path discovery when the cloud registry is not part of the flow.
- Fixed direct root installs from GitHub so a missing name: in root-level SKILL.md falls back to the actual repository name instead of the temporary clone directory name.
- Restored the expected indentation in selftune registry --help so the usage block matches the rest of the CLI help formatting.
- Polished the cloud GitHub settings experience with branded action buttons, clearer installation action states, a consolidated production setup runbook, and lowercase selftune branding on key cloud surfaces.
- Added signed GitHub webhook intake plus registry source metadata fields so GitHub-origin publishes can be tracked separately from CLI-pushed versions.
- Hardened GitHub webhook handling so tag patterns reject unsafe multi-wildcard shapes and webhook deliveries return immediately while publish processing continues asynchronously.
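
To make the github:owner/repo[@ref][//path] install syntax concrete, here is one way such a spec could be split into its parts. This is a sketch written from the syntax shown in this entry, not selftune's parser; in particular, treating the first // after the repository as the start of the monorepo path is an assumption.

```python
import re
from typing import NamedTuple, Optional


class GitHubSpec(NamedTuple):
    owner: str
    repo: str
    ref: Optional[str]   # branch, tag, or commit; None when omitted
    path: Optional[str]  # path inside the repo; None for a root install


_SPEC_RE = re.compile(
    r"^github:"
    r"(?P<owner>[^/@\s]+)/(?P<repo>[^/@\s]+)"
    r"(?:@(?P<ref>.+?))?"      # optional @ref, may itself contain single slashes
    r"(?://(?P<path>.+))?$"    # optional //path
)


def parse_github_spec(spec: str) -> Optional[GitHubSpec]:
    """Parse github:owner/repo[@ref][//path]; return None if it does not match."""
    match = _SPEC_RE.match(spec)
    if match is None:
        return None
    return GitHubSpec(
        owner=match.group("owner"),
        repo=match.group("repo"),
        ref=match.group("ref"),
        path=match.group("path"),
    )
```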

2026-04-14

OSS, Dashboard, CLI

sqlite creator loop artifacts

- Moved canonical eval sets, generated unit tests, and unit-test run results into SQLite as the primary local source of truth for creator-loop readiness.
- Kept mirroring those artifacts into the legacy ~/.selftune/eval-sets/ and ~/.selftune/unit-tests/ JSON files so existing file-based workflows and commands still work during the transition.
- Updated readiness/status surfaces to prefer SQLite-backed artifacts instead of depending on filesystem existence checks.
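
The pattern of a SQLite source of truth with a mirrored JSON file can be sketched in a few lines. The table name, columns, and function names below are hypothetical and do not reflect selftune's actual schema; the sketch shows only that readiness reads from the database, so a missing mirror file does not change the answer.

```python
import json
import sqlite3
from pathlib import Path
from typing import Any


def open_store(db_path: str) -> sqlite3.Connection:
    """Open the store and make sure the (hypothetical) table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS eval_sets ("
        "skill TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def save_eval_set(conn: sqlite3.Connection, mirror_dir: Path, skill: str, eval_set: Any) -> None:
    """Write the eval set to SQLite first, then mirror it to a JSON file."""
    body = json.dumps(eval_set, sort_keys=True)
    conn.execute(
        "INSERT OR REPLACE INTO eval_sets (skill, body) VALUES (?, ?)",
        (skill, body),
    )
    conn.commit()
    # The mirror exists for file-based workflows; it is not the source of truth.
    mirror_dir.mkdir(parents=True, exist_ok=True)
    (mirror_dir / f"{skill}.json").write_text(body, encoding="utf-8")


def has_eval_set(conn: sqlite3.Connection, skill: str) -> bool:
    """Readiness check that consults SQLite rather than the filesystem."""
    row = conn.execute("SELECT 1 FROM eval_sets WHERE skill = ?", (skill,)).fetchone()
    return row is not None
```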

2026-04-14

OSS, Dashboard, CLI

canonical dashboard artifact paths

- Updated dashboard-triggered generate-evals to pass the canonical ~/.selftune/eval-sets/<skill>.json output path explicitly instead of relying on a relative fallback filename.
- Updated dashboard-triggered generate-unit-tests to pass the canonical ~/.selftune/unit-tests/<skill>.json path explicitly as well, keeping readiness artifacts out of the repo working directory.
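
A helper that builds those canonical paths might look like the sketch below. The directory names come from this entry; the function itself, and the home override used to make it testable, are illustrative assumptions.

```python
from pathlib import Path
from typing import Optional, Union

ARTIFACT_KINDS = ("eval-sets", "unit-tests")


def canonical_artifact_path(
    kind: str, skill: str, home: Optional[Union[str, Path]] = None
) -> Path:
    """Build <home>/.selftune/<kind>/<skill>.json.

    Always anchored at the home directory, so the result never depends on
    the current working directory of the process that spawns the command.
    """
    if kind not in ARTIFACT_KINDS:
        raise ValueError(f"unknown artifact kind: {kind!r}")
    base = Path(home) if home is not None else Path.home()
    return base / ".selftune" / kind / f"{skill}.json"
```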

2026-04-14

OSS, Dashboard, CLI

dashboard rollback routing

- Fixed local dashboard rollback actions to spawn selftune evolve rollback with the expected proposal arguments, matching the actual CLI command surface.
- Added a dashboard regression test that asserts the rollback action uses the evolve rollback subcommand shape.

2026-04-14

OSS, Dashboard

evolution rail header background

Removed the forced background fill from the sticky Evolution heading in the shared skill report evidence rail so proposal views keep the intended transparent panel treatment while scrolling.

2026-04-14

OSS, Dashboard, CLI

structured creator-loop progress

- Added a shared dashboard action instrumentation layer so creator-loop commands can emit structured step progress, LLM call progress, and provider-normalized runtime metadata without hard-coding the dashboard to one provider.
- Wired selftune eval generate and selftune eval unit-test --generate into that shared observer path so the live-run screen can show load/build/write steps plus provider/model/duration updates instead of only terminal output.
- Generalized the live-run UI from replay-only wording to a broader action-progress surface while keeping replay as the richest source of token and cost detail.

2026-04-14

OSS, Dashboard, CLI

dashboard update badge

- Added cached update availability metadata to the local dashboard health surface so the dashboard can tell the difference between up-to-date, auto-update-capable installs and manual-refresh source-tree installs.
- Added a passive Update available status chip in the local dashboard footer plus a dedicated update panel on /status, keeping version visibility available without polluting live creator-loop transcripts.

2026-04-14

OSS, Dashboard

local dashboard proposal focus

- Fixed proposal selection so opening a proposal link no longer gets overwritten by an automatic fallback selection.
- Removed eager proposal auto-focus during initial load to keep deep links stable.
- Kept readiness-driven action prioritization aligned with the active proposal focus state so child action sections no longer shift unexpectedly.

2026-04-14

OSS, CLI, Dashboard, Platforms

quieter local creator loop runs

- Suppressed unsupported auto-update chatter during local source-tree runs so dashboard-triggered creator-loop actions no longer flood the live log with manual refresh instructions.
- Updated OpenCode ingest to support the current SQLite schema, including time_created timestamps and JSON-backed message rows, instead of assuming legacy created/content columns.

2026-04-13

OSS, Dashboard, CLI

dashboard streaming refresh

- Added a live action feed in the local dashboard so creator-loop runs show start, progress, and finish states instead of only appearing after the next data refresh.
- Added a dedicated live-run screen for creator-loop actions so replay dry-runs can stream output, show parsed lift summaries, and display model/platform/token context beside the terminal log.
- Added structured replay metrics to the live dashboard stream so Claude runtime replay now reports per-run platform, model, token, cost, and duration data in real time instead of only terminal text.
- Added per-eval replay progress streaming and SSE backfill so the live-run screen can show eval n/N, query snippets, and pass/fail evidence even when you open the page after the run has already started.
- Added dashboard action buttons for the main creator loop on skill reports: generate evals, generate unit tests, replay dry-run, baseline measurement, deploy, and watch.
- Added a shared local action stream so supported terminal-run selftune commands also appear in the dashboard without being launched from the UI.
- Fixed replay dry-runs so validated evolve --dry-run runs surface as success in the live dashboard feed even when the CLI exits non-zero to avoid accidental deployment.
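
The SSE backfill idea, where a page opened mid-run still sees earlier progress, can be illustrated with a small in-memory stream. This is a generic sketch of the technique rather than the dashboard's code: the class name, backlog size, and event shapes are assumptions, while the frame layout follows the standard Server-Sent Events text format.

```python
import json
from collections import deque
from typing import Any, Callable, Deque, Dict, List

Event = Dict[str, Any]


class RunStream:
    """Keeps a bounded backlog so late subscribers can catch up."""

    def __init__(self, max_backlog: int = 1000) -> None:
        self._backlog: Deque[Event] = deque(maxlen=max_backlog)
        self._subscribers: List[Callable[[Event], None]] = []

    def publish(self, event: Event) -> None:
        self._backlog.append(event)
        for deliver in list(self._subscribers):
            deliver(event)

    def subscribe(self, deliver: Callable[[Event], None]) -> None:
        # Backfill: replay everything already emitted, then go live.
        for event in self._backlog:
            deliver(event)
        self._subscribers.append(deliver)


def format_sse(event_id: int, event_name: str, data: Event) -> str:
    """Serialize one event as a Server-Sent Events frame."""
    return f"id: {event_id}\nevent: {event_name}\ndata: {json.dumps(data)}\n\n"
```

A subscriber that connects after the first eval has finished still receives that event before any new ones, which is what lets a progress counter render correctly on a late page load.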

2026-04-13

OSS, Dashboard, Cloud, Registry, Billing

v0.2.24 to v0.2.27

- Repaired the OSS publish pipeline so npm releases can still generate SBOMs, GitHub tags, and enriched release notes even when a publish partially succeeds.
- Blocked cloud dashboard indexing and added changelog coverage enforcement so shipped product changes are documented before they merge.
- Opened registry publishing and rollback to Pro plans so solo skill creators can publish and iterate without upgrading to Team first.
- Tightened the local dashboard skill report around proposal deep links, kept proposal-focused layouts stable while report data loads, prevented raw ENOENT errors during SPA reloads, and restored full-width creator loop layout on overview.
- Unified cloud and OSS skill report styling around the shared trust status language by restoring trust panel order, removing leftover success-green treatments, and switching trust badges to the app-wide dot-and-pill status treatment.

2026-04-08

OSS, Platforms, CLI

v0.2.20 to v0.2.23

- Added universal hook adapters for Codex, OpenCode, and Cline so selftune can capture real-time telemetry beyond Claude Code.
- Added cold-start suspicion and Claude runtime replay validation to make trigger diagnostics more trustworthy when a skill has little history.
- Hardened OpenCode installation so hook setup follows current plugin and config behavior instead of relying on rejected config keys.
- See the OSS releases for package artifacts and per-version compare links.

2026-04-01

OSS, Dashboard, Community

v0.2.14 to v0.2.19

- Overhauled dashboard, trust, and creator-facing contribution surfaces so health signals are easier to interpret during active iteration.
- Tightened the autonomous evolve and audit path to close reliability gaps in proposal rollout and monitoring.
- Added CLI auto-update, richer structured errors, description quality scoring, and unblock suggestions for faster operator recovery.

2026-03-08

OSS, CLI, Dashboard

v0.2.0

- Added full skill body evolution so selftune can refine routing tables and larger skill bodies instead of only short descriptions.
- Added synthetic eval generation to help new skills bootstrap without waiting for a large session history.
- Introduced cheaper validation loops, activation rules, specialized agents, and a live local dashboard server for faster iteration.
- Read more in the evolution concept guide and the dashboard command reference.

2026-03-01

OSS, CLI, Community

v0.1.4

- Added selftune status and selftune last so you can check skill health without opening the full dashboard.
- Added a local dashboard and Claude transcript backfill to make retroactive analysis practical on existing projects.
- Added opt-in community export so you can share anonymized signals back to the ecosystem.

2026-02-28

OSS, CLI, Platforms

v0.1.0

- Shipped the initial CLI with init, grade, eval, evolve, watch, doctor, and platform ingest commands.
- Added Claude Code hooks for prompt capture, skill evaluation, and end-of-session telemetry.
- Introduced the initial observe → detect → evolve → watch loop that the rest of the product builds on today.
