This page tracks user-facing selftune changes in a format that is easier to scan than raw commit history. Subscribe to the RSS feed at docs.selftune.dev/changelog/rss.xml, or browse packaged artifacts and compare links in GitHub releases. Matching OSS release tags are enriched from the corresponding entry on this page.
Tags on this page use a fixed taxonomy so filters stay stable over time: Cloud, CLI, Platforms, OSS, Dashboard, Registry, Billing, Community, and Breaking change.
2026-04-20
Cloud, Dashboard
Cloud trust panels now include broader 90-day outcome history
Skill-level and source-level cloud trust panels now add coarse 30-day outcome buckets across the last 90 days on top of the short weekly history.
That gives operators a broader trust read before they queue another hosted run, instead of relying only on the most recent few outcomes.
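As an illustration of the coarse bucketing, here is a minimal TypeScript sketch. It assumes each completed observation carries a timestamp and one of the helped, regressed, or inconclusive labels used elsewhere on this page; the type and function names are hypothetical, not selftune APIs.

```typescript
// Hypothetical outcome shape; the real selftune schema is not documented here.
type Outcome = "helped" | "regressed" | "inconclusive";

interface ObservedOutcome {
  observedAt: Date;
  outcome: Outcome;
}

interface OutcomeBucket {
  startDaysAgo: number; // bucket covers ages in [startDaysAgo, endDaysAgo)
  endDaysAgo: number;
  helped: number;
  regressed: number;
  inconclusive: number;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Groups outcomes from the last `horizonDays` into `bucketDays`-wide buckets,
// newest bucket first. Outcomes outside the horizon (or in the future) are ignored.
function bucketOutcomes(
  outcomes: ObservedOutcome[],
  now: Date,
  bucketDays: number = 30,
  horizonDays: number = 90,
): OutcomeBucket[] {
  const bucketCount = Math.ceil(horizonDays / bucketDays);
  const buckets: OutcomeBucket[] = [];
  for (let i = 0; i < bucketCount; i++) {
    buckets.push({
      startDaysAgo: i * bucketDays,
      endDaysAgo: Math.min((i + 1) * bucketDays, horizonDays),
      helped: 0,
      regressed: 0,
      inconclusive: 0,
    });
  }
  for (const o of outcomes) {
    const ageDays = (now.getTime() - o.observedAt.getTime()) / DAY_MS;
    if (ageDays < 0 || ageDays >= horizonDays) continue;
    const index = Math.floor(ageDays / bucketDays);
    buckets[index][o.outcome] += 1;
  }
  return buckets;
}
```

Buckets come back newest-first, so index 0 covers the most recent 30 days; outcomes older than the horizon are simply dropped.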
2026-04-20
Cloud, Dashboard
Cloud trust now correlates benchmark health with observed proposal outcomes
Skill-level and source-level cloud trust panels now group recent suite-backed runs by each suite’s latest canonical saved check state.
That makes it easier to see whether suites that currently look healthy are actually lining up with helped outcomes after apply, or whether regressions are clustering around failed or missing saved checks.
2026-04-20
Cloud, Dashboard
Cloud trust panels now include short weekly outcome history
Skill-level and source-level cloud trust panels now include a compact multi-window post-apply history derived from recent completed observation buckets.
That makes it easier to see whether the last few windows were improving, regressing, mixed, or steady instead of relying on a single rolling badge.
2026-04-20
Cloud, Dashboard
Cloud trust panels now include a recent outcome timeline
Skill-level and source-level cloud trust panels now include a compact recent outcome timeline based on the same post-apply observation summary that powers the direction badge and latest outcome link.
That keeps a short run of concrete helped, regressed, or inconclusive proposal outcomes visible without drilling into proposal history.
2026-04-20
Cloud, Dashboard
Cloud trust panels now summarize recent post-apply direction
Skill-level and source-level cloud trust panels now condense recent post-apply outcomes into a compact direction signal: Improving, Regressing, Mixed, Steady, or Needs more signal.
That makes it easier to tell whether trust is getting better or worse without opening each proposal outcome one by one.
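The changelog names the five direction labels but not the rule that picks one. A minimal sketch, assuming a simple count-based rule and a hypothetical minimum of three recent outcomes before any direction is shown:

```typescript
type Outcome = "helped" | "regressed" | "inconclusive";
type Direction = "Improving" | "Regressing" | "Mixed" | "Steady" | "Needs more signal";

// Hypothetical rule: the labels come from the changelog, the thresholds do not.
function summarizeDirection(recent: Outcome[], minOutcomes: number = 3): Direction {
  if (recent.length < minOutcomes) return "Needs more signal";
  const helped = recent.filter((o) => o === "helped").length;
  const regressed = recent.filter((o) => o === "regressed").length;
  if (helped > 0 && regressed > 0) return "Mixed";
  if (helped > 0) return "Improving";
  if (regressed > 0) return "Regressing";
  return "Steady"; // every recent outcome was inconclusive
}
```

A real implementation may weight recency or use ratios instead of any-helped/any-regressed checks; this only shows how the five labels can partition a set of recent outcomes.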
2026-04-20
Cloud, Dashboard
Cloud trust warnings now link to the latest observed proposal outcome
The skill-level and source-level cloud trust panels now surface the latest completed post-apply outcome with a direct proposal link.
That means stale or watch-mode trust warnings now point at concrete outcome evidence instead of leaving operators to hunt through proposal history by hand.
2026-04-20
Cloud, Dashboard
Cloud trust summaries now self-heal ended apply observation windows
Source trust summaries no longer wait for proposal-detail reads or the batch observation scorer to reflect ended apply windows.
When an observation window has already ended, the cloud trust summary now scores it on read and promotes it into the completed outcome counts immediately.
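A minimal sketch of scoring on read, assuming each observation window has an end time and an optional stored outcome. The score callback stands in for whatever baseline comparison the batch scorer uses, and all names here are hypothetical. Windows that are still open are counted as observing, which matches the still-observing state described in the next entry.

```typescript
type Outcome = "helped" | "regressed" | "inconclusive";

interface ObservationWindow {
  proposalId: string;
  endsAt: Date;
  outcome?: Outcome; // set once the window has been scored
}

interface TrustSummary {
  observing: number;
  helped: number;
  regressed: number;
  inconclusive: number;
  promoted: string[]; // proposals scored during this read, for the caller to persist
}

function summarizeOnRead(
  windows: ObservationWindow[],
  now: Date,
  score: (obs: ObservationWindow) => Outcome,
): TrustSummary {
  const summary: TrustSummary = { observing: 0, helped: 0, regressed: 0, inconclusive: 0, promoted: [] };
  for (const obs of windows) {
    let outcome = obs.outcome;
    if (outcome === undefined) {
      if (obs.endsAt.getTime() > now.getTime()) {
        summary.observing += 1; // still inside the post-apply window
        continue;
      }
      outcome = score(obs); // window already ended: score it now instead of waiting
      summary.promoted.push(obs.proposalId);
    }
    summary[outcome] += 1;
  }
  return summary;
}
```

Returning the promoted proposal IDs lets the caller persist the newly scored outcomes, so the next read does not score the same window again.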
2026-04-20
Cloud, Dashboard
Cloud trust summaries now show when recent applies are still being observed
Cloud trust summaries no longer flatten everything into completed outcomes. The selected source trust card and run preflight now show when recent applies are still inside the post-apply observation window, so operators can tell when the latest outcome counts are not final yet.
2026-04-20
Cloud, Dashboard
Observed-skill cloud controls now warn before queueing runs with stale or failing trust signals
The observed-skill Cloud Improve panel now raises explicit run preflight warnings when the selected source trust is already degraded or when the selected suite’s last canonical task-package smoke check failed.
That keeps risky benchmark state visible at the actual queue point instead of only inside the deeper source detail screens.
2026-04-20
Cloud, Dashboard
Observed-skill cloud controls now show source trust before queueing runs
The Cloud Improve panel on observed skill pages now shows the selected cloud source’s trust summary, recent post-apply observation counts, and the latest canonical task-package smoke result for the selected suite.
That means operators can check benchmark freshness and recent real-world outcomes before they queue another hosted improve run, instead of drilling into the cloud source page first.
2026-04-20
Cloud, Dashboard
Cloud source evidence can now jump straight into task-package draft authoring
Suggested trigger cases and recent run-pressure cards can now promote directly into a review-only task-package draft instead of always starting on the structured path first.
That direct promotion path now follows the first persisted draft/refine step with initial environment and verifier asset generation, so operators start from source evidence plus concrete draft files instead of an empty task-package scaffold.
2026-04-20
Cloud, Dashboard
Cloud task-check drafts now start through the authoring agent flow
Suggested trigger cases and recent run-pressure cases no longer create task/output drafts purely in page state.
Draft creation now goes through a review-first promotion route that builds seed enrichment on the server and invokes the same Think or fallback refinement path used by the persisted authoring session.
2026-04-20
Cloud, Dashboard
Cloud task-check drafts now keep visible authoring activity history
Cloud source pages now show the recent review-first authoring steps attached to a persisted task/output draft instead of only the latest draft snapshot.
The authoring session now keeps a bounded activity log for draft saves and promotions, Think or fallback refinement, task-package asset generation, bundle materialization, runnable smoke checks, and draft clears.
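The activity log is described as bounded. A minimal sketch of one way to bound it, assuming a hypothetical cap of 20 entries and event kinds named after the steps listed above; neither the cap nor the names come from selftune:

```typescript
// Hypothetical event kinds, one per authoring step named in the changelog entry.
type AuthoringEventKind =
  | "draft_saved"
  | "draft_promoted"
  | "refined"
  | "assets_generated"
  | "bundle_materialized"
  | "smoke_checked"
  | "draft_cleared";

interface AuthoringEvent {
  kind: AuthoringEventKind;
  at: string; // timestamp of the step
  source?: "think" | "fallback"; // which refinement path ran, when relevant
}

const MAX_ACTIVITY_ENTRIES = 20; // hypothetical cap

// Appends an event and keeps only the newest `limit` entries (oldest are dropped).
function appendActivity(
  log: readonly AuthoringEvent[],
  event: AuthoringEvent,
  limit: number = MAX_ACTIVITY_ENTRIES,
): AuthoringEvent[] {
  const next = log.concat([event]);
  return next.length > limit ? next.slice(next.length - limit) : next;
}
```

Returning a new array instead of mutating keeps the function easy to use from immutable session state.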
2026-04-20
Cloud, Dashboard
Cloud advanced run settings now show canonical task-package smoke freshness in suite selection
The advanced run drawer no longer hides whether a saved task-package suite most recently passed or failed its canonical smoke check.
This reuses the same summary-level saved-suite smoke signal in the advanced run suite chooser, so operators can compare suite trust while configuring a run, not just while editing the suite.
2026-04-20
Cloud, Dashboard
Cloud suite pickers now show canonical task-package smoke freshness
Operators can now compare saved task-package suites before opening one in the editor.
This adds summary-level canonical smoke state to hosted eval-suite list responses and surfaces it directly in the cloud source page suite picker, so saved-check freshness is visible during suite selection as well as after a suite is opened.
2026-04-20
Cloud, Dashboard
Cloud source setup summaries now show the latest canonical task-package smoke result outside the editor
Operators no longer need to open the eval editor to see the last saved task_package smoke result.
This now surfaces the latest canonical smoke state directly in the cloud source page setup summary, so the trust signal is visible in the broader read model as well as in the editor.
2026-04-20
Cloud, Dashboard
Cloud saved task-package suites now remember the last canonical smoke result
Running a saved canonical task_package suite smoke check no longer yields a result that disappears as soon as the request ends.
This now writes the latest canonical smoke result back into the saved suite metadata and surfaces it in the cloud source page editor, so operators can see the last saved-suite check before deciding whether to rerun it.
2026-04-20
Cloud, Dashboard
Cloud source pages can now run saved canonical task-package suites once against the current snapshot
Saved canonical task_package suites no longer require a separate improve run just to verify the saved case still works against the current source snapshot.
This adds a narrow saved-suite smoke action on cloud source pages and matching eval-suite routes, so operators can run a saved canonical task-package case once before they reuse it in hosted improve runs.
2026-04-20
Cloud, Dashboard
Cloud task-package saves now retain draft provenance and latest smoke results in the canonical case payload
Runnable task_package saves no longer drop the context that produced the draft.
This now writes a typed task_package_metadata block into matching canonical cases, preserving the seed evidence, expected-outcome scaffold, optional notes, and latest smoke result that came from the authoring session.
2026-04-20
Cloud, Dashboard
Cloud task-package saves now require a fresh smoke result before a runnable case can be written into a canonical suite
Runnable task_package drafts on cloud source pages can no longer be saved into a canonical hosted eval suite if the latest smoke result is missing or stale.
This now blocks the save both on the source page and in the eval-suite create/update routes, so runnable task-package cases must be freshly smoke-checked before they become part of the authoritative suite.
2026-04-20
Cloud, Dashboard
Cloud task-package drafts now keep the last smoke result visible and mark it stale when the scaffold, assets, or bundle change
Runnable task-package drafts on cloud source pages no longer silently lose their last smoke-check result when the scaffold or generated bundle changes.
This now keeps the latest smoke result visible, marks it stale with an explicit reason, and tells the operator when the draft needs to be smoke-checked again before save.
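The changelog does not say how staleness is detected. A minimal sketch, assuming each draft part (scaffold, assets, bundle) carries a revision counter that is bumped on every edit, and that the smoke result records the revisions it checked. It also expresses the save rule from the earlier entry: a runnable case can only be saved when the latest result is present and fresh. All names are hypothetical.

```typescript
type DraftPart = "scaffold" | "assets" | "bundle";

interface TaskPackageDraft {
  revisions: Record<DraftPart, number>; // assumed: bumped on every edit to that part
}

interface SmokeResult {
  passed: boolean;
  checkedRevisions: Record<DraftPart, number>; // revisions the smoke check ran against
}

type SmokeFreshness =
  | { state: "missing" }
  | { state: "fresh"; passed: boolean }
  | { state: "stale"; lastPassed: boolean; reason: string };

const DRAFT_PARTS: DraftPart[] = ["scaffold", "assets", "bundle"];

function smokeFreshness(draft: TaskPackageDraft, last?: SmokeResult): SmokeFreshness {
  if (!last) return { state: "missing" };
  const checked = last.checkedRevisions;
  const changed = DRAFT_PARTS.filter((part) => draft.revisions[part] !== checked[part]);
  if (changed.length > 0) {
    // Keep the old pass/fail visible, but explain why it no longer counts.
    return { state: "stale", lastPassed: last.passed, reason: "changed since last smoke check: " + changed.join(", ") };
  }
  return { state: "fresh", passed: last.passed };
}

// Save gate: a missing or stale result blocks writing the runnable case.
function canSaveRunnableCase(draft: TaskPackageDraft, last?: SmokeResult): boolean {
  return smokeFreshness(draft, last).state === "fresh";
}
```

The entries do not say whether a fresh but failing result also blocks the save, so this sketch only checks freshness.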
2026-04-20
Cloud, Dashboard
Cloud source pages can now smoke-check runnable task-package drafts before saving them into a canonical suite
Runnable task-package drafts on cloud source pages can now be executed once against the current snapshot before they are saved into a canonical suite.
This persists the latest pass/fail result back into the authoring session, so operators can verify the materialized bundle and current snapshot still work together before promoting the case.
2026-04-20
Cloud, Dashboard
Cloud task-package drafts now switch into an explicit runnable state once their review-only bundle is materialized
Materialized task-package drafts on cloud source pages no longer stay labeled like scaffolds.
This now promotes them into an explicit runnable draft state, updates the promotion preview and bundle preview accordingly, and makes it clear that the existing save flow will write a real canonical task_package case.
2026-04-20
Cloud, Dashboard
Cloud task-package drafts can now materialize review-only bundles into real R2-backed environment archives
Persisted task-package drafts on cloud source pages can now turn generated bundle files into a real review-only environment archive.
This uploads the bundle to R2, points the draft scaffold at the materialized archive, and keeps the archive descriptor in the same authoring session until the operator explicitly promotes the draft further.
2026-04-20
Cloud, Dashboard
Cloud task-package drafts now roll generated asset files into a review-only bundle preview with explicit archive-materialization readiness
Persisted task-package drafts on cloud source pages no longer stop at raw environment/verifier asset text.
This now rolls the generated files into a review-only bundle preview, marks when the draft is ready for archive materialization, and keeps the whole flow inside the persisted authoring session until an operator explicitly promotes it further.
2026-04-20
Cloud, Dashboard
Cloud task-package drafts can now generate review-only environment and verifier asset drafts through the authoring agent
Persisted task-package drafts on cloud source pages can now generate a review-only environment manifest draft and verifier script draft through the authoring agent.
This keeps the new task-package authoring flow behind the same persisted draft session, surfaces whether the generated assets came from Think or the fallback template path, and avoids writing anything canonical until the operator decides the draft is ready.
2026-04-20
Cloud, Dashboard
Cloud source pages now support real task-package draft authoring, including editable scaffold fields and canonical task-package case saves
Persisted task/output drafts on cloud source pages can now move past a placeholder preview into real task-package scaffold authoring.
This adds editable instruction, environment, verifier, oracle, skill mount, and resource-hint fields to the review-only draft flow, lets operators save that scaffold back into the persisted authoring session, and makes the existing save path emit a real task_package case when that is the chosen promotion target.
2026-04-20
Cloud, Dashboard
Cloud source pages now show canonical promotion previews for task/output drafts and let operators switch a persisted draft between a structured check and a review-only task-package scaffold
Persisted task/output drafts on cloud source pages now show what the current promotion target will become in the canonical eval pipeline before anything is saved.
This makes the next step explicit: a draft can stay on the structured deterministic path, or switch into a review-only task-package scaffold with the expected environment and verifier placeholders called out up front.
2026-04-20
Cloud, Dashboard
Cloud source pages can now refine persisted task/output drafts through a review-only authoring agent, with an explicit fallback when Workers AI is unavailable
Persisted task/output drafts on cloud source pages can now run through a bounded authoring-agent refinement step before the operator saves anything canonical.
This keeps the same draft/session contract, surfaces whether the refinement came from a Think-backed path or a deterministic fallback, and lets the page stay review-first even when Workers AI is unavailable. The dashboard exposes that action through the new public refine route instead of a page-local-only mutation path.
2026-04-20
Cloud, Dashboard
Cloud source pages now show review-only guidance for persisted task/output drafts and let operators apply that scaffold back into the draft before saving it
Persisted task/output drafts on cloud source pages now show API-derived review-only guidance from matching suggestions, recent run pressure, and saved eval-suite overlap.
This means operators can see what failure a draft protects against, what verifier shape is likely to fit best, and whether they should extend an existing suite, and can apply the suggested scaffold back into the deterministic draft before they save anything canonical.
2026-04-20
Cloud, Dashboard
Cloud task/output drafts now keep seed evidence, promotion target, and expected-outcome scaffolding, so deterministic eval authoring survives refreshes with real provenance instead of thin form state
Cloud source pages now persist richer deterministic draft metadata for eval authoring, including the originating evidence, the intended promotion target, and a first expected-outcome scaffold.
This means review-only task/output drafts are no longer just local editor fields. Operators can refresh and resume the draft while still seeing what seeded it and what kind of deeper check it is trying to become.
2026-04-19
Cloud, Dashboard
Cloud source pages now keep one resumable task/output draft per source, so operators can refresh and continue deterministic eval authoring without rebuilding the draft from scratch
Cloud improve now stores a review-only task/output draft per source in the runtime layer and exposes it back through source detail.
This means a deterministic draft started from trigger evidence is no longer purely local page state: operators can refresh, resume the draft, or clear it without overwriting the saved trigger-confidence suite.
2026-04-19
Cloud, Dashboard
Improve-run detail now keeps the active timeline step and proposal links in sync after completion, so operators can see live progress and open the winning proposal without a manual refresh
Improve-run detail now seeds the current phase into the timeline immediately, keeps the active step visibly live, and briefly re-checks proposal links after a successful run until the winning proposal is available.
This closes two gaps on the same surface: active runs now look active where the operator is already reading, and completed runs no longer require a manual refresh just to open the winning proposal.
2026-04-19
Cloud, Dashboard
Improve-run pages now keep refreshing proposal links briefly after a run completes, so new proposal links appear without a manual reload
Cloud improve-run pages now re-sync proposal links after terminal updates and keep polling briefly when a winning candidate exists but the proposal link has not settled yet.
This closes the gap where a run could finish successfully and create a proposal seconds later, while the improve page still looked like proposal creation had been skipped until a manual refresh.
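A minimal sketch of the stop condition for that brief post-completion polling, assuming a run snapshot that exposes its status, whether a winning candidate exists, and the proposal link if one is known. The status values, field names, and the cap of five extra polls are hypothetical:

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled";

interface RunSnapshot {
  status: RunStatus;
  hasWinningCandidate: boolean;
  proposalUrl: string | null;
}

// Hypothetical cap on extra polls after a run reaches a terminal state.
const MAX_SETTLE_POLLS = 5;

function shouldKeepPolling(
  run: RunSnapshot,
  settlePollsSoFar: number,
  maxSettlePolls: number = MAX_SETTLE_POLLS,
): boolean {
  // Live runs always keep polling.
  if (run.status === "queued" || run.status === "running") return true;
  // Failed or cancelled runs will never produce a proposal.
  if (run.status !== "succeeded") return false;
  // Nothing to wait for: no winner, or the proposal link already arrived.
  if (!run.hasWinningCandidate || run.proposalUrl !== null) return false;
  // A winner exists but its proposal link has not settled: poll a few more times.
  return settlePollsSoFar < maxSettlePolls;
}
```

A caller would fetch the run, check shouldKeepPolling, and count a settle poll only once the run is no longer live, so a failed run or a run without a winner stops immediately.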
2026-04-19
Cloud, Dashboard
Cloud source pages now show live source-coordination state, including the active reserved run and queued rerun/cancel intent
Cloud source detail now carries a compact coordinator read model from the runtime, and the source page surfaces that state directly.
This makes it visible when a source already has an active reserved run, whether cancel was requested, and whether a rerun is queued, without inferring it only from raw run rows.
2026-04-19
Cloud, Dashboard
Cloud source pages can now promote trigger evidence into a detached task/output draft, so operators can scaffold deterministic checks without overwriting the saved trigger suite
Pending eval suggestions and recent run-pressure cards now offer a Draft task check action that opens the eval editor in deterministic mode with a new, unsaved task/output draft scaffold.
This keeps the saved trigger-confidence suite intact while giving operators a fast path to start deeper task/output coverage from real evidence.
2026-04-19
Cloud, Dashboard
Improve-run timelines now keep the active phase on a colored dot, so the phase progression stays visible while a run is live
The live step on the cloud improve-run timeline now stays on the same phase-colored dot as the rest of the run history instead of switching to a generic spinner.
This keeps the setup, evaluation, drafting, and finalization phases visually distinct even while the run is still active.
2026-04-19
Cloud, Dashboard
Cloud skill pages now separate trigger-confidence evals from task/output checks in the onboarding and eval-editor language
Cloud skill onboarding and eval-editor copy now makes the intended eval progression explicit: start with trigger confidence, then add task/output checks once the skill is activating in the right places.
This matches the hosted eval contract more closely and makes it clearer that discoverability/routing checks and deeper task correctness checks answer different questions.
2026-04-19
Cloud, Dashboard
Cloud source pages now expose a public rerun-analysis action, so operators can reprocess the current snapshot without creating a new upload or GitHub sync
Cloud source pages now include a Rerun analysis action that triggers the existing snapshot analysis pipeline for the current snapshot through a public dashboard route.
This makes it possible to refresh validation, lint, capability, and structural reports after cloud-side analysis logic changes, without creating a new upload or GitHub sync just to force a re-run.
2026-04-19
Cloud, Dashboard
Improve-run pages now preserve run scope on per-candidate proposal links, so review navigation stays inside the run-specific proposal queue
Candidate-level View proposal links on improve-run pages now keep the current ?run=... context instead of dropping back to the global proposal queue.
This keeps the operator inside the run-specific review flow whether they open the winning proposal from the summary header or from the candidate list.
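A minimal sketch of scope-preserving links. The /proposals page and the ?run= parameter come from the entries on this page; the /proposals/<id> detail path and the helper names are assumptions for illustration:

```typescript
// Reads the run scope from a query string such as "?tab=diff&run=run_123".
function runIdFromSearch(search: string): string | null {
  const query = search.charAt(0) === "?" ? search.slice(1) : search;
  for (const pair of query.split("&")) {
    const eq = pair.indexOf("=");
    const key = decodeURIComponent(eq >= 0 ? pair.slice(0, eq) : pair);
    if (key === "run" && eq >= 0) {
      const value = decodeURIComponent(pair.slice(eq + 1).replace(/\+/g, " "));
      return value.length > 0 ? value : null;
    }
  }
  return null;
}

// Hypothetical detail path; carries the run scope forward when there is one.
function proposalDetailHref(proposalId: string, runId?: string | null): string {
  const base = "/proposals/" + encodeURIComponent(proposalId);
  return runId ? base + "?run=" + encodeURIComponent(runId) : base;
}

// Back link: the run-scoped queue when scoped, otherwise the global backlog.
function backToProposalsHref(runId?: string | null): string {
  return runId ? "/proposals?run=" + encodeURIComponent(runId) : "/proposals";
}
```

Routing every candidate link through the same helper is what keeps the summary header and the candidate list from drifting apart.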
2026-04-19
Cloud, Dashboard
Proposal cards now render separate skill and review links instead of nesting anchors, fixing a hydration bug on the dashboard proposals page
Cloud proposal cards now show an explicit Review proposal link inside the card instead of wrapping the whole card in a detail link.
This removes invalid nested anchor markup when a proposal card also links to its skill page, which fixes the hydration error on the dashboard proposals route.
2026-04-19
Cloud, Dashboard
Cloud source pages now read structural provenance directly from the proposal queue payload, removing an extra proposal-detail fetch from the latest-run review path
The run-scoped proposal queue now carries candidate provenance for cloud improve proposals, so cloud source pages can explain the newest structural proposal directly from the queue payload.
This removes the extra proposal-detail request the source page used to make just to show the latest proposal’s structural origin.
2026-04-19
Cloud, Dashboard
Cloud skill source pages now show which structural recommendation produced the newest linked proposal, so authors can connect snapshot analysis to the actual reviewable candidate
The structure-candidates panel on each cloud skill source page now surfaces the newest linked proposal’s exact structural recommendation and deterministic strategy when that proposal came from a structure run.
This closes the gap between “top recommendations on the snapshot” and “the candidate you can review right now,” so authors can see why the latest proposal exists before opening the proposal detail page.
2026-04-19
Cloud, Dashboard
Cloud proposal queues and detail routes now have explicit review-flow coverage, reducing the risk of silent regressions in run-scoped proposal review
Added route-level coverage for the run-scoped proposal queue and proposal detail lookup used by cloud-improve review flows.
This hardens the path from improve-run pages into proposal review by checking both the queue filter (cloud_run_id) and proposal-detail validation behavior.
2026-04-19
Cloud, Dashboard
Cloud improve and skills surfaces now use softer, consistent status treatments instead of mixing heavy cyan badges, raw enum labels, and duplicate readiness states
Cloud skill cards, improve-run pages, and setup checklists now use a standardized status system with softer status chips, text-only status labels where appropriate, and clearer warning/error semantics.
This removes raw labels like cloud_ready, collapses redundant Ready states on the skills library cards, and brings advanced run settings into a drawer with more readable model selection options.
2026-04-19
Cloud, Dashboard
Proposal detail now normalizes structural provenance consistently after refresh and keeps run-scoped back navigation on the same helper path as the proposal queue
Proposal detail now uses shared helpers to normalize structural provenance and run-scoped back navigation, so refreshed review pages keep the same candidate-rationale and queue-return path instead of relying on ad hoc inline logic.
This keeps the proposal review surface more predictable as cloud-improve candidates add richer provenance metadata and more scoped review flows.
2026-04-19
Cloud, Dashboard
Proposal detail now shows which structural recommendation produced a cloud structure candidate, so operators can see why a package rewrite was drafted before applying it
Cloud proposal detail now surfaces the structural recommendation and deterministic strategy that produced a structure candidate, such as extract_references or harden_script_ergonomics.
That makes package-backed proposal review more defensible: operators can see which structural analysis signal led to the candidate before deciding whether to apply it to a draft or GitHub PR.
2026-04-19
Cloud, Dashboard
Improve-run detail now links straight into the run-scoped proposal queue when a candidate frontier produced more than one reviewable proposal
Improve-run detail pages now preserve run context when you open the winning proposal, and they surface a direct Review run proposals action whenever a hosted run produced more than one reviewable proposal.
That keeps the run frontier, proposal queue, and individual proposal review pages tied together, so multi-candidate review flows no longer require manual navigation between /improve and /proposals.
2026-04-19
Cloud, Dashboard
Run-scoped proposal review now preserves that scope when you open proposal detail, so it is easier to move through a single cloud-improve queue without losing context
When you open proposal detail from a run-scoped proposals list, the detail page now keeps that run context and offers a Back to run proposals path instead of dropping you back into the global proposal backlog.
This keeps multi-candidate cloud review flows tighter: operators can move between the run-specific proposal queue and individual proposal detail pages without reapplying filters or losing their place.
2026-04-19
Cloud, Dashboard
Proposal review can now be scoped to a single improve run, so multi-candidate cloud runs no longer dump you back into the full proposal backlog
The proposals page now accepts a run-scoped view for cloud-improve runs, letting operators review only the proposals created by a single hosted run instead of filtering mentally through the full backlog.
Cloud skill source pages now use that view when the latest run produced multiple proposals, so the Structure candidates panel can link directly into proposal review even when there is more than one candidate to inspect.
2026-04-19
Cloud, Dashboard
Cloud skill source pages now explain when structure proposals are viable and link directly into proposal review when the latest run produced one
Cloud skill source pages now synthesize the typed structural-analysis report into a dedicated Structure candidates panel instead of leaving operators to infer package-shape readiness from raw technical details.
The panel now shows whether the current snapshot is ready for structure proposals, still review-first because of execution limits, or simply not a structural change candidate right now. When the latest run already produced a reviewable proposal, the page links straight into the proposal review flow; otherwise it routes back to the latest run frontier.
2026-04-19
Cloud, Dashboard
Cloud proposal review now renders the real candidate diff for package-structure changes and respects GitHub PR apply targets
Cloud proposal detail now renders the unified diff from the winning candidate package instead of only showing the placeholder text “see candidate archive”. Structure-focused proposals can be reviewed as real package changes, and the page links directly to the candidate archive when you want the full package.
Proposal apply now also respects the run’s configured apply target. If a cloud-improve run was set to write back through GitHub, the proposal page now offers a GitHub PR action instead of always defaulting to draft promotion.
2026-04-19
Cloud, Dashboard
Cloud proposal detail now keeps the originating run and apply history visible after refresh
Cloud proposal detail now links back to the originating improve run so you can move from a pending proposal into the full candidate frontier and evaluation evidence without hunting for the run separately.
The page also now shows recent apply attempts, including whether the attempt targeted a draft or GitHub PR, when it ran, and the PR URL or error message when one is available. That keeps proposal review useful even after the page is refreshed or revisited later.
2026-04-19
Cloud, Dashboard
The proposals index now recognizes archive-backed cloud-improve proposals instead of showing them as fake single-field diffs
The proposals index now correctly tags cloud-improve proposals created by the hosted runner, even when proposed_by is the actual runner identity instead of the older cloud_improve string.
Archive-backed cloud proposals also no longer render the “(see candidate archive)” placeholder as if it were a field-level diff. The card now tells you it is a package-backed change and shows whether a linked candidate package and source run are available before you open the full review page.
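One way to express that classification, as a sketch only: the proposed_by field and the legacy cloud_improve value come from this entry, while the cloud_run_id column (named after the run-scoped queue filter) and the candidate_archive_key field are assumptions.

```typescript
interface ProposalRow {
  proposed_by: string;
  cloud_run_id: string | null;          // assumed column, named after the queue filter
  candidate_archive_key: string | null; // hypothetical pointer to the candidate package
}

// Legacy rows used the literal "cloud_improve"; newer rows store the runner identity,
// so a linked cloud run is treated as the stronger signal.
function isCloudImproveProposal(p: ProposalRow): boolean {
  return p.proposed_by === "cloud_improve" || p.cloud_run_id !== null;
}

// Archive-backed cloud proposals render as package changes, not field-level diffs.
function proposalCardKind(p: ProposalRow): "package-change" | "field-diff" {
  return isCloudImproveProposal(p) && p.candidate_archive_key !== null ? "package-change" : "field-diff";
}
```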
2026-04-18
Cloud, Dashboard
Cloud skill source pages now surface structural analysis, script strategy, and frontmatter execution guidance instead of hiding them in raw validation blobs
Cloud source detail now renders the typed structural-analysis summary for the current snapshot, including SKILL.md line and token budgets, inferred script strategy, compatibility notes, allowed-tools, and the execution flags that explain whether cloud writeback is viable or still review-first.
Validation checklists and technical details also now surface structural recommendations as first-class findings instead of showing 0 issues for typed reports that were previously stored outside the generic lint array shape.
2026-04-18
Cloud, Dashboard
Ended post-apply observation windows are now scored against baseline telemetry instead of staying pending forever
Applied cloud-improve proposals now compare a pre-apply telemetry window against the finished post-apply observation window and classify the result as helped, inconclusive, or regressed.
Proposal detail now shows the before/after live-signal breakdown for eval volume, pass rate, missed triggers, false negatives, and false positives, and source-level Eval suite trust now folds recent observed regressions and helps into the trust summary so coverage isn’t the only signal. Ops can batch-score ended windows with bun run score:cloud-improve-observations.
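A minimal sketch of the before/after classification. It uses only eval volume and pass rate, although the entry also lists missed triggers, false negatives, and false positives, and its thresholds (at least 20 evals per window, a 5-point pass-rate move) are hypothetical. It is not the logic behind the score:cloud-improve-observations script.

```typescript
type Outcome = "helped" | "regressed" | "inconclusive";

interface SignalWindow {
  evals: number;  // eval volume observed in the window
  passes: number; // evals that passed
}

// Hypothetical thresholds; the changelog does not publish the real ones.
const MIN_EVALS_PER_WINDOW = 20;
const MIN_PASS_RATE_DELTA = 0.05;

function scoreObservation(
  before: SignalWindow,
  after: SignalWindow,
  minEvals: number = MIN_EVALS_PER_WINDOW,
  minDelta: number = MIN_PASS_RATE_DELTA,
): Outcome {
  // Too little traffic on either side of the apply to say anything.
  if (before.evals < minEvals || after.evals < minEvals) return "inconclusive";
  const delta = after.passes / after.evals - before.passes / before.evals;
  if (delta >= minDelta) return "helped";
  if (delta <= -minDelta) return "regressed";
  return "inconclusive";
}
```

The volume guard matters as much as the delta: without it, a handful of post-apply evals could flip a proposal to helped or regressed on noise alone.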
2026-04-18
Cloud, Dashboard
Applied cloud-improve proposals now enter an explicit observation window instead of looking fully done the moment apply succeeds
Applied cloud-improve proposals now show a post-apply observation state on the proposal detail page. Instead of treating applied as the end of the story, the page now distinguishes proposals that are still gathering live signal from ones that will eventually be evaluated against observed outcomes.
This is the first thin slice of the post-apply observation loop from the cloud-improve quality hardening plan. It does not score outcomes yet, but it does create a durable observation record the moment a draft promotion or GitHub apply succeeds.
2026-04-18
Cloud, Dashboard
Cloud skill pages now show eval-suite trust signals, and the cloud-improve runner has a first judge-calibration benchmark command
Cloud skill source pages now include an Eval suite trust panel that shows whether the saved suite still covers recent linked telemetry and the newest hosted improve-run pressure. Instead of treating every eval win as equally trustworthy, the page now marks suites as Fresh, Watch, Stale, or No signal based on how much recent evidence is actually covered by saved trigger-query cases.
The cloud-improve runner also now ships with a first reusable judge-calibration command, so the llm_judge trigger evaluator can be checked against a labeled benchmark fixture instead of remaining an unmeasured instrument.
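A minimal sketch of coverage-based suite trust, assuming coverage means an exact match on normalized query text and using hypothetical cut-offs of 80% for Fresh and 50% for Watch. The labels come from the entry; the matching rule and thresholds do not.

```typescript
type SuiteTrust = "Fresh" | "Watch" | "Stale" | "No signal";

// Case- and whitespace-insensitive comparison of trigger queries.
function normalizeQuery(q: string): string {
  return q.trim().toLowerCase().replace(/\s+/g, " ");
}

function suiteTrust(
  recentQueries: string[],
  savedCaseQueries: string[],
  freshAt: number = 0.8,
  watchAt: number = 0.5,
): SuiteTrust {
  if (recentQueries.length === 0) return "No signal";
  const saved: { [query: string]: boolean } = {};
  for (const q of savedCaseQueries) saved[normalizeQuery(q)] = true;
  let covered = 0;
  for (const q of recentQueries) {
    // hasOwnProperty avoids matching inherited keys such as "constructor".
    if (Object.prototype.hasOwnProperty.call(saved, normalizeQuery(q))) covered += 1;
  }
  const coverage = covered / recentQueries.length;
  if (coverage >= freshAt) return "Fresh";
  if (coverage >= watchAt) return "Watch";
  return "Stale";
}
```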
2026-04-18
Cloud, Dashboard
Cloud skill setup now auto-fixes common Agent Skills spec issues before the first snapshot is analyzed
Upload-backed and GitHub-backed cloud skill setup now canonicalizes common package issues before the first snapshot is analyzed. The setup flow rewrites lowercase skill.md to SKILL.md, rebuilds missing or invalid frontmatter, normalizes the skill name to Agent Skills format, synthesizes a required description when it is missing, and moves unsupported top-level frontmatter fields into metadata so the package starts from a spec-compliant baseline.
The setup response now also reports which fixes were applied, and the cloud library success message calls that out before the first quick eval suite and hosted improve run are prepared.
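A minimal sketch of three of those fixes: the skill.md rename, name normalization, and moving unsupported top-level keys into metadata. The allowed-key list and the 64-character name limit are assumptions based on the Agent Skills specification and should be checked against it; the function names are hypothetical.

```typescript
// Assumed allow-list of top-level frontmatter keys; verify against the Agent Skills spec.
const ALLOWED_TOP_LEVEL_KEYS = ["name", "description", "license", "compatibility", "allowed-tools", "metadata"];
const MAX_NAME_LENGTH = 64; // assumed spec limit

type Frontmatter = { [key: string]: unknown };

// Renames a lowercase root-level skill.md to the canonical SKILL.md.
function canonicalEntryPath(path: string): string {
  return path === "skill.md" ? "SKILL.md" : path;
}

// Lowercases, collapses any run of non-alphanumerics into one hyphen, and trims hyphens.
// Returns "" when nothing usable remains, so callers need their own fallback.
function normalizeSkillName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "")
    .slice(0, MAX_NAME_LENGTH)
    .replace(/-+$/, ""); // truncation can leave a trailing hyphen
}

// Keeps allowed keys at the top level and moves everything else under metadata.
// Assumes any existing metadata value is a plain object.
function moveUnsupportedKeys(fm: Frontmatter): { fixed: Frontmatter; moved: string[] } {
  const metadata: Frontmatter = { ...((fm["metadata"] as Frontmatter | undefined) ?? {}) };
  const fixed: Frontmatter = {};
  const moved: string[] = [];
  for (const key of Object.keys(fm)) {
    if (key === "metadata") continue;
    if (ALLOWED_TOP_LEVEL_KEYS.indexOf(key) >= 0) {
      fixed[key] = fm[key];
    } else {
      metadata[key] = fm[key];
      moved.push(key);
    }
  }
  if (Object.keys(metadata).length > 0) fixed["metadata"] = metadata;
  return { fixed, moved };
}
```

Returning the moved keys mirrors the setup response reporting which fixes were applied.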
2026-04-18
Cloud, Dashboard
Cloud eval suites now learn from telemetry and hosted runs, write accepted suggestions directly into saved suites, and surface eval pressure from the latest run
Cloud skill pages now surface suggested trigger-query cases from the linked observed skill whenever recent real usage exposes misses or false positives that are not already covered by the saved suite.
Hosted improve runs also now persist query-level eval evidence, so recent run failures and regressions can feed the same review queue when the active suite no longer covers those cases.
These suggestions stay review-first: you can append them into the draft suite from the cloud page, inspect them in the editor, and then decide whether to save them before the next improve run. Accepted and dismissed suggestions are now also persisted, so the same pending cases do not keep resurfacing after you review them. The cloud source page now also keeps a reviewed history with restore and re-accept actions, so dismissed cases can be reopened and accepted cases can be added back into the draft without losing their provenance. Accepted suggestions can now also be written directly into the selected saved suite with their telemetry/run provenance preserved, instead of stopping at the draft-only state. The cloud source page now highlights the latest run’s eval pressure directly, and the improve run detail page surfaces the failed/regressed queries from that run instead of forcing operators to dig through raw artifacts to find them. Source detail reads also now degrade safely if the new eval-suggestion review tables are one migration behind, while review actions return a clear migration-needed error instead of a raw database exception.
2026-04-18
Cloud, Dashboard
Cloud improve now generates true bounded surface candidates and run pages render actual reviewed diffs
Hosted improve generation now treats description, routing, and body as real bounded mutation surfaces instead of prompt-only hints. A routing candidate rewrites only routing, a description candidate rewrites only the description, and body candidates preserve routing while updating the non-routing sections they are allowed to touch.
Improve run detail pages now also normalize old prose-only diff summaries back into real unified diffs when the source and candidate archives exist, so review pages show the actual changed lines instead of a rationale paragraph.
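One way to picture a bounded surface candidate is a copy of the skill in which only the targeted surface may differ. The sketch below assumes a skill can be split into three string surfaces; SkillSurfaces, boundedCandidate, and isBounded are hypothetical names, and the real runtime works on skill archives rather than plain strings.

```typescript
// Hypothetical sketch: a skill split into the three surfaces named above.
type Surface = "description" | "routing" | "body";
type SkillSurfaces = Record<Surface, string>;

const SURFACES: readonly Surface[] = ["description", "routing", "body"];

// Build a candidate that rewrites exactly one surface and copies the rest.
function boundedCandidate(source: SkillSurfaces, surface: Surface, rewrite: string): SkillSurfaces {
  const candidate: SkillSurfaces = { ...source };
  candidate[surface] = rewrite;
  return candidate;
}

// Reject candidates that changed any surface other than the one they target.
function isBounded(source: SkillSurfaces, candidate: SkillSurfaces, target: Surface): boolean {
  return SURFACES.every((s) => s === target || source[s] === candidate[s]);
}
```

A check like isBounded is what turns "prompt-only hints" into an enforceable contract: a candidate that drifts outside its surface can be rejected before evaluation.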
2026-04-18
Cloud, Dashboard
Improve run detail pages now show the skill context, run outcome, and readable evidence instead of raw storage URLs
Hosted improve run pages now load the source skill and eval-suite context, summarize the winning result or failure in plain language, and show the best candidate’s score movement and diff preview directly on the page.
Evidence is still available for download, but artifact links are now grouped and labeled by what they represent instead of exposing a wall of raw R2 URLs.
2026-04-18
Cloud, Dashboard
Fresh cloud skills now auto-start the first hosted improve run, and the legacy in-process dispatcher is rollback-only
Fresh cloud skill sources now move directly from upload or sync into the first hosted improve run once the quick eval suite is generated, instead of stopping on the setup page and requiring a separate manual queue action.
The API startup path also now treats the old in-process improve dispatcher as an explicit legacy rollback path rather than part of the normal runtime. Cloudflare remains the default hosted execution plane whenever the runtime URL is configured.
The API-key cloud-source surface also now matches the dashboard route for creating GitHub-backed sources, which makes the same hosted improve flow scriptable for smoke runs and automation.
2026-04-18
Cloud
Branded email system with 9 React Email templates and centralized Resend service
New @selftune/email package with 9 branded React Email templates: welcome, alert notification, evolution proposal, weekly digest, team invitation, plan upgrade, usage limit warning, getting started, and first insight. Alert emails now use HTML templates instead of plain text. Team invitations and billing checkout flows send branded emails automatically.
2026-04-18
Cloud
Cloud improve now auto-links uploaded and GitHub-backed sources to canonical skills, and imported task-package suites are first-class
Upload-backed and GitHub-backed cloud sources now automatically create or reuse the canonical skills row they belong to. That closes the proposal gap where a winning improve run could persist artifacts but skip proposal_created because the source had no linked skill_id.
The eval-suite control plane also now accepts source_kind = imported for deterministic task_package suites, which is the first explicit hosted lane for benchmark-style imports instead of treating every imported suite as a manual one. The docs now also include a first-class imported benchmark page and script path for turning package manifests into live cloud eval suites.
2026-04-17
Cloud
Cloud improve now supports deterministic task-package eval suites and benchmark-style runtime docs
Hosted eval suites now accept deterministic task_package cases, which lets you point an improve run at a benchmark-style environment archive and verifier script instead of relying only on trigger-query or exact-match checks.
The Cloudflare runtime executes these task packages inside Sandboxes so the verifier has a real filesystem and process boundary, and the public docs now cover improve run events, statuses, and eval-suite API usage in the same terminology the product uses.
2026-04-17
Cloud, Dashboard
Improve run pages now show customer-facing live progress, delay states, and clearer timeline copy
Hosted improve pages now translate runtime activity into customer-facing progress language instead of exposing queue, worker, or transport details. Active runs surface clearer status cards, a friendlier timeline, and “taking longer than expected” messaging when a run stalls.
The improve overview also better distinguishes active versus completed work without making the page feel like an internal operations console.
2026-04-17
Cloud, Dashboard
Improve run pages now refresh live while queued and running, with terminal refetch on completion
Hosted improve run detail pages now subscribe to the run event stream while a run is queued or running, updating phase and status live instead of waiting for a manual refresh. When a terminal event arrives, the page re-fetches full run detail so candidates, artifacts, and proposal state stay in sync.
The improve run list also now polls only while active runs are visible, which keeps the overview current without constantly refetching completed history.
2026-04-17
Cloud
Cloud source uploads and GitHub sync now accept lowercase skill.md packages and preserve folder paths on the API-key surface
Hosted cloud-source ingest now accepts both SKILL.md and lowercase skill.md when validating uploaded packages and GitHub-backed skill repos. That keeps upload and sync behavior aligned with the rest of the hosted analysis pipeline, which already supported both casings.
The API-key Hono upload route also now preserves multipart field keys as relative paths instead of flattening uploaded files to their basenames, so folder uploads keep nested references/ and other package structure intact.
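The difference between keeping the field key as a relative path and flattening it to a basename can be sketched as follows. packagePathFromFieldKey and basenameOnly are hypothetical helpers, and rejecting ".." segments is an added safety assumption, not documented selftune behavior.

```typescript
// Hypothetical sketch: turn a multipart field key into a package-relative path
// instead of flattening it to its basename. Rejecting ".." segments is an
// added safety assumption, not documented selftune behavior.
function packagePathFromFieldKey(fieldKey: string): string | null {
  const segments = fieldKey
    .replace(/\\/g, "/") // treat Windows separators like POSIX ones
    .split("/")
    .filter((seg) => seg !== "" && seg !== ".");
  if (segments.length === 0 || segments.includes("..")) return null;
  return segments.join("/");
}

// The old behavior for comparison: only the final segment survives.
function basenameOnly(fieldKey: string): string {
  return fieldKey.replace(/\\/g, "/").split("/").pop() ?? "";
}
```

With basename flattening, two files such as references/api.md and examples/api.md would collide; keeping the relative path avoids that.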
2026-04-17
Cloud
Cloud improve runtime: Cloudflare execution plane foundation and live SSE event stream
Added foundation for Cloudflare-backed improve run execution using Queues, Workflows, and Sandboxes. A new GET /api/v1/improve-runs/:id/events endpoint streams run lifecycle events via SSE, enabling live updates on the run detail page without manual refresh. The runtime mode is controlled by CLOUD_IMPROVE_RUNTIME_MODE and defaults to legacy with no behavior change until explicitly switched.
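In a browser, the built-in EventSource API handles this stream. Scripts that read the response body themselves only need the standard Server-Sent Events framing. The following is a minimal parser for a fully buffered chunk. The terminal status names are hypothetical, since this entry does not list the actual lifecycle event names.

```typescript
// Minimal Server-Sent Events frame parser for a fully buffered chunk.
// It does not handle frames split across network reads; a real client would
// keep the trailing partial frame in a buffer.
interface SseEvent {
  event: string; // defaults to "message" when no event: field is sent
  data: string;
  id?: string;
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const frame of text.split(/\r?\n\r?\n/)) {
    let event = "message";
    let id: string | undefined;
    const data: string[] = [];
    for (const line of frame.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "event") event = value;
      else if (field === "data") data.push(value);
      else if (field === "id") id = value;
    }
    // Per the SSE spec, frames without data are not dispatched.
    if (data.length > 0) events.push({ event, data: data.join("\n"), id });
  }
  return events;
}

// Hypothetical terminal statuses; check the run-events docs for the real names.
const TERMINAL_STATUSES = new Set(["succeeded", "failed", "cancelled"]);
function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}
```

A caller would typically stop reading and re-fetch full run detail once isTerminal returns true, matching the refetch-on-completion behavior described for the run detail page.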
2026-04-17
Cloud, Dashboard
Cloud skill validation now uses native spec checks and clearer report detail during setup
Cloud skill setup now persists one validation report per snapshot and shows those results inline in the guided setup hero, so structural validation, best-practice lint, and capability classification are easier to inspect without dropping into raw logs.
The hosted validation step also now runs on a native TypeScript implementation of the Agent Skills frontmatter rules instead of shelling out to the demonstration skills-ref toolchain. That keeps cloud validation deterministic in production while tightening allowed-tools parsing and preserving clearer per-rule issue messages.
Apply flows also now version and re-upload promoted skill archives more safely. Draft apply and GitHub PR apply both keep the cloud source pointed at the newly promoted snapshot, preserve archive manifests across lowercase skill.md packages, and avoid corrupting frontmatter when YAML values contain ---.
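The frontmatter corruption case comes from splitting on every occurrence of ---. A safer approach treats only a line that is exactly --- as a delimiter. This is a minimal sketch of that idea, not selftune's parser. The splitFrontmatter helper is hypothetical, and it normalizes CRLF line endings to \n in its output.

```typescript
// Hypothetical sketch: split YAML frontmatter only on lines that are exactly "---",
// so a value such as description: "before --- after" is left intact.
function splitFrontmatter(markdown: string): { frontmatter: string | null; body: string } {
  const lines = markdown.split(/\r?\n/);
  if (lines[0] !== "---") return { frontmatter: null, body: markdown };
  const end = lines.indexOf("---", 1); // first closing delimiter line after the opener
  if (end === -1) return { frontmatter: null, body: markdown };
  return {
    frontmatter: lines.slice(1, end).join("\n"),
    body: lines.slice(end + 1).join("\n"),
  };
}
```

A naive markdown.split("---") would cut the description value in half in the example above, which is the kind of corruption this entry describes fixing.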
2026-04-17
Cloud, Dashboard
Cloud source APIs now honor source-type and skill filters consistently across dashboard and API-key surfaces
The hosted cloud-source list API now applies the same type and skill_id filters on the API-key Hono surface that the dashboard session route already supported. That keeps browser, CLI, and smoke-test callers on one normalized contract when listing cloud skills.
This batch also tightens the hosted improve apply/runtime path so root-level GitHub applies do not infer repo-wide deletions, runner dependencies resolve snapshots through the correct org-scoped database client, and local Neon CLI binding metadata is no longer tracked in git.
2026-04-17
Cloud, Dashboard
Cloud improve model selectors now load the live OpenRouter catalog and use a teacher-student default spread
The per-run model selectors on cloud and observed skill detail pages no longer use a hardcoded GPT-only shortlist. They now load the current OpenRouter model catalog from the server and expose the broader set of text-capable models available through the hosted cloud runtime.
Each selector also now carries an explicit recommended default for generate, judge, and summarize. Leaving a selector empty keeps the server-side default for that role, and the UI now spells out those defaults directly so you can test alternatives without losing track of the intended baseline. The selectors are now searchable comboboxes as well, so longer model lists stay usable without scrolling through a giant dropdown.
The recommended defaults now follow a clearer teacher-student split instead of a flat GPT-only stack: google/gemini-2.5-pro for proposal generation, google/gemini-2.5-flash for judging, and google/gemini-2.5-flash-lite for summarization. That keeps the strongest model on the expensive generative step while moving validation and helper work onto cheaper OpenRouter models.
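The "empty selector keeps the default" rule amounts to a per-role fallback. In this sketch, the model IDs come from this entry. The Role type and resolveModels helper are hypothetical and do not describe the hosted runner's actual code.

```typescript
// Hypothetical sketch of per-role model resolution; model IDs are the
// recommended defaults listed in this entry.
type Role = "generate" | "judge" | "summarize";

const RECOMMENDED_DEFAULTS: Record<Role, string> = {
  generate: "google/gemini-2.5-pro",
  judge: "google/gemini-2.5-flash",
  summarize: "google/gemini-2.5-flash-lite",
};

// An empty or whitespace-only override keeps the default for that role.
function resolveModels(overrides: Partial<Record<Role, string>>): Record<Role, string> {
  const resolved: Record<Role, string> = { ...RECOMMENDED_DEFAULTS };
  for (const role of Object.keys(resolved) as Role[]) {
    const value = overrides[role]?.trim();
    if (value) resolved[role] = value;
  }
  return resolved;
}
```

Resolving each role independently is what lets you swap, for example, only the summarize model while the generate and judge roles stay on their recommended baselines.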
2026-04-17
Cloud, Dashboard
Cloud skill onboarding now auto-creates a 50-case eval suite and funnels into one happy path
When you create or sync a cloud skill, the detail page now automatically drafts and saves a 50-case quick eval suite from the current snapshot instead of making you build one manually first. The skill page now leads with a single guided decision: edit the generated eval suite or start the hosted improve run.
The cloud skill detail UI has also been simplified around that progression. The primary suite is now treated as one editable artifact with save support, advanced run controls stay collapsed by default, and metadata/report panels are moved behind a technical-details disclosure so the page feels less like a control plane and more like a clear product flow.
The Eval summary card now surfaces the per-case origin (auto-generated, hand-curated, or a mixed breakdown) so you can tell at a glance whether the suite is still the synthetic draft or has been edited. Clicking Edit eval suite also now smooth-scrolls the editor into view, and the hero action row has been reordered so advanced run options live next to the edit affordance rather than after Start improve run.
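The origin badge on the Eval summary card reduces to counting case origins. This is a minimal sketch. The CaseOrigin values, the summarizeOrigin helper, and the "empty" label are all hypothetical.

```typescript
// Hypothetical sketch of the per-case origin summary; names are illustrative.
type CaseOrigin = "auto" | "curated";
type OriginLabel = "empty" | "auto-generated" | "hand-curated" | "mixed";

function summarizeOrigin(cases: { origin: CaseOrigin }[]): {
  label: OriginLabel;
  auto: number;
  curated: number;
} {
  const auto = cases.filter((c) => c.origin === "auto").length;
  const curated = cases.length - auto;
  let label: OriginLabel;
  if (cases.length === 0) label = "empty";
  else if (curated === 0) label = "auto-generated"; // still the synthetic draft
  else if (auto === 0) label = "hand-curated";
  else label = "mixed";
  return { label, auto, curated };
}
```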
2026-04-16
Cloud, Dashboard
Overview now presents the hosted cloud loop instead of legacy first-run telemetry onboarding
The cloud dashboard overview now introduces SelfTune as a hosted review loop instead of the older “run selftune and wait for skills to appear” onboarding. The empty-state banner now points people toward the real cloud path: create or import a cloud skill, shape a reviewable eval suite, run the hosted comparison, and review the resulting proposal before draft apply.
The overview also now keys that banner off cloud authoring state rather than observed telemetry alone, so the first-run guidance stays visible until you actually have cloud sources in the hosted product.
2026-04-16
Cloud, Dashboard
Quick Eval Suite can now auto-seed synthetic trigger cases and edit them in a table
The cloud skill detail page now includes a real Quick Eval Suite editor instead of only raw textareas and JSON authoring. Trigger-query cases are now editable as table rows with expectation, invocation type, provenance, and row-level remove actions.
For llm_judge suites, the page can also draft a synthetic seed directly from the current snapshot’s SKILL.md. Those seeded cases are marked as synthetic so you can review, revise, or delete them before creating the hosted eval suite.
2026-04-16
Cloud, Dashboard
Cloud folder uploads now preserve a real skill directory and show clearer selection state
Cloud skill uploads now package the selected folder as a real skill directory instead of flattening only its file contents into the snapshot archive. That means downstream validation sees the skill in a proper directory layout, which fixes false failures caused by generic wrapper names during skills-ref analysis.
The dashboard upload flow is also clearer: the picker now behaves like a folder intake card, shows the detected folder name and file count, confirms whether a root SKILL.md was found, and disables upload until the selection is actually valid.
2026-04-16
Cloud, Dashboard
Cloud improve runs now support separate generate, judge, and summarize model overrides
Cloud improve run setup now exposes three separate model selectors instead of one shared override. You can independently pick the model used for candidate generation, LLM judging, and summarization from the skill detail page before queueing a run.
These overrides are stored with the run itself and passed through the hosted runner, so they no longer collapse into a single model choice. This makes it practical to test cheaper summarize settings while keeping a stronger judge, or to isolate generation changes without touching the server defaults.
2026-04-16
Cloud, Dashboard
Cloud dashboard now uses skill-first wording instead of exposing raw source terminology
The cloud dashboard now uses skill-first wording across the library, detail pages, and observed-to-cloud bridge instead of exposing the backend source model directly in the UI. Cloud library cards, blocked-state messages, quick eval setup, and linked-skill surfaces now read as normal product concepts: Cloud skills, Open Skill, and Linked cloud skills.
This is a terminology cleanup only. The backend cloud_skill_sources model and API routes are unchanged, but the visible dashboard flow is less confusing because it no longer asks users to think in storage-layer terms.
2026-04-16
Cloud, Dashboard
Import observed skills to cloud and manage eval suites from the dashboard
Observed skill cards now include an Import to Cloud button that navigates directly to the cloud library with the skill pre-filled for import. The cloud source detail page also gains an Eval Suites section where you can view existing suites scoped to a source and create new ones inline with a name, verifier kind, and JSON test cases, with no CLI or API calls required.
Proposal detail pages now show full eval comparison data, artifact kinds, confidence levels, and an Apply to Draft button that closes the loop from review to draft apply. The jobs page shows Cloud Improve Runs alongside pipeline jobs with status badges and candidate counts.
2026-04-16
Cloud, Dashboard
Cloud Library now shows cloud authoring records separately from observed telemetry skills
The cloud dashboard’s main Skills surface now reflects the real cloud authoring model instead of the telemetry-backed skill table. It lists GitHub-backed sources, imported uploads, and cloud-managed records with their current snapshot and capability state.
The old telemetry-backed skills library is still available, but it now lives under Observed so local and cloud concepts do not get mixed together. Cloud sources without a linked telemetry skill now have their own detail page for snapshot metadata, validation reports, and hosted improve controls.
The cloud library also now includes first-run onboarding directly in the UI: you can upload a skill folder into a new cloud source, create and sync a GitHub-backed source from a bound installation, jump into cloud import from observed skills, and create a lightweight eval suite from the cloud detail page before queuing a hosted improve run.
Observed skill detail pages now also expose that bridge directly: if a skill already has linked cloud sources you can jump straight into them, and if it does not you can import it into cloud from the report itself instead of backing out to the library first.
2026-04-16
Cloud, Dashboard
Cloud improve runs can now override the model per run, with cheaper default summarize policy
Cloud improve runs can now override the hosted model policy directly from the skill page before queueing a run. The selected override applies to the full run, not just candidate generation, so you can force a cheaper test model or a stronger one without changing server env defaults.
Hosted defaults are also now more cost-aware for testing: generation and judging stay on openai/gpt-4.1-mini, while summarize and low-risk helper work default to openai/gpt-4.1-nano.
2026-04-16
Cloud
Cloud skill improvement integration: runner package, eval backends, control-plane wiring, and hosted eval-suite parity
The cloud skill improvement pipeline is now fully wired end-to-end. The isolated runner package (@selftune/cloud-improve-runner) connects to the control-plane orchestrator via concrete dependency adapters. Eval backends (trigger-query LLM judge + deterministic) dispatch through a registry. Both draft and GitHub apply paths consume the same candidate archive contract.
Hosted eval-suite creation now validates runnable manual suites for both llm_judge and deterministic verifiers through the same control-plane contract the runner consumes, which keeps the dashboard and API-key paths on one canonical suite definition.
2026-04-15
OSS, CLI, Dashboard
Non-TTY improve runs now show durable CLI progress
  • selftune improve and selftune evolve now fall back to plain stderr progress lines when the terminal is not a TTY, instead of going completely silent while long proposal or validation steps are still running.
  • Interactive terminals keep the spinner/TUI behavior, while test runs remain quiet by default.
2026-04-15
OSS, Dashboard
Dashboard action toasts now deep-link into the exact live run
  • Local dashboard action toasts now include a Live run action that opens the exact /live-run entry for the streaming creator-loop event, including the event id, skill, and action selection state.
  • The floating Live lifecycle actions feed now uses the same deep link, so clicking a running or finished lifecycle card jumps straight into the matching Live Run entry instead of leaving you to find it manually.
2026-04-15
OSS, CLI, Platforms
eval generate can now force opencode or another agent runtime
  • selftune eval generate now accepts --agent for --synthetic, --auto-synthetic, and --blend, so you can force opencode, codex, or pi instead of relying on auto-detection order.
  • Cold-start synthetic eval generation now reuses the same cleaned query filtering as log-derived evals and summarizes oversized SKILL.md content before sending it to the runtime, which reduces prompt bloat for large skills like SelfTuneBlog.
2026-04-15
OSS, CLI
Package search merge candidates no longer overwrite evaluated variants
  • Bounded package search now writes merged routing/body candidates into a new temp package snapshot instead of overwriting the already-evaluated body variant on disk, so candidate artifacts remain consistent for later winner application and review.
  • selftune create publish --watch --ignore-watch-alerts now also bypasses the watch gate when the watch subprocess crashes or fails to emit structured JSON, while still surfacing the warning and remediation command.
2026-04-15
OSS, Dashboard
OSS local dashboard CI now typechecks bounded search summaries again
  • The OSS local dashboard LiveRun test fixture now uses the real DashboardActionResultSummary shape for bounded package-search summaries, so export verification no longer fails when search_run is present on deploy candidate entries.
2026-04-15
OSS, Dashboard, CLI
Dashboard lifecycle copy now matches the CLI lifecycle surface
  • The local dashboard now normalizes selftune create replay, selftune create baseline, selftune evolve, selftune evolve body, and selftune search-run into the same lifecycle-facing commands the CLI already shows, so Overview, Skill Report, and Live Run no longer leak stage-level command names for draft-package flows.
2026-04-15
OSS, CLI, Dashboard
Package baseline now reuses fresh replay artifacts and emits phase progress
  • selftune create baseline --mode package now reuses the last fresh with-skill replay from the canonical package-evaluation artifact when the draft fingerprint still matches, so measuring baseline no longer pays for two full replay passes after an unchanged verify, report, or search-run.
  • Package baseline now emits explicit with_skill_replay and without_skill_replay step progress so the local dashboard live-run surface shows immediate movement instead of looking stuck while the underlying replay work is still running.
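Reusing a replay "when the draft fingerprint still matches" can be sketched as a content hash over the draft files compared against the cached artifact. This is a minimal sketch, assuming an order-independent fingerprint. The FNV-1a hash and the draftFingerprint and reusableWithSkillReplay helpers are illustrative choices, not selftune's actual fingerprinting.

```typescript
// Hypothetical sketch: fingerprint the draft package and reuse a cached
// with-skill replay only when the fingerprint still matches.

// 32-bit FNV-1a over UTF-16 code units; enough for change detection in a sketch,
// not a cryptographic hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Sort paths so the fingerprint does not depend on file enumeration order.
function draftFingerprint(files: Record<string, string>): string {
  const canonical = Object.keys(files)
    .sort()
    .map((path) => `${path}\u0000${files[path]}`)
    .join("\u0001");
  return fnv1a(canonical);
}

interface ReplayArtifact {
  fingerprint: string;
  passRate: number;
}

function reusableWithSkillReplay(
  cached: ReplayArtifact | undefined,
  draftFiles: Record<string, string>,
): ReplayArtifact | null {
  if (!cached) return null;
  return cached.fingerprint === draftFingerprint(draftFiles) ? cached : null;
}
```

When reusableWithSkillReplay returns null, baseline falls back to running both replay passes; otherwise only the without-skill replay needs to run.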
2026-04-15
OSS, CLI
Reflective package search, merged winners, and lifecycle auto-selection
  • selftune search-run now prefers reflective routing/body proposals from measured runtime failures before targeted or deterministic fallback.
  • When routing and body both produce accepted improvements, package search now evaluates a merged candidate before final winner selection instead of forcing the frontier to choose between complementary single-surface edits.
  • Plain selftune improve now auto-selects bounded package search for skills that already have package evidence or a draft package manifest, so agents do not need to force --scope package for the main package-shaped lifecycle.
  • Added an end-to-end package lifecycle test covering verify auto-fix, bounded package search, winner promotion, and publish --watch.
2026-04-15
OSS, CLI
Direct search-run now uses measured targeted variants
  • selftune search-run now uses the same measured targeted-routing/body mutation path as orchestrate package search, falling back to deterministic variants only when targeted variants do not fill the requested minibatch.
  • The public CLI docs and workflow docs now describe search-run as bounded local package search over draft variants instead of registry lookup, and the Evolve workflow now points package-scope users at the measured targeted search path instead of only the older deterministic description.
  • Publish/package-search lifecycle docs now describe the real blocking publish-time watch gate instead of the older advisory wording.
2026-04-15
OSS, CLI
Verify auto-fix, publish watch blocking, and targeted-mutation wiring fixes
  • selftune verify now auto-runs the real missing-evidence commands with the required flags and skill context, including --auto-synthetic eval generation and generated unit tests.
  • selftune create publish --watch now blocks publish if the watch subprocess fails or returns malformed output instead of treating missing watch JSON as a passing gate.
  • Eval-informed targeted mutations now read grading_results.pass_rate, expectations_json, and failure_feedback_json from the real SQLite schema instead of a test-only summary_json shape.
  • The shipped lifecycle docs now describe the actual concrete readiness states and the correct --ignore-watch-alerts flag.
2026-04-15
OSS, CLI
Lifecycle vocabulary normalization and restructured CLI help
  • normalizeLifecycleCommand now maps create replay, create baseline, evolve, evolve-body, and search-run to their lifecycle equivalents.
  • selftune --help now shows Primary Lifecycle commands first, with Advanced / Stage Commands below.
2026-04-15
OSS, CLI
Auto-evidence generation in verify
  • selftune verify auto-runs missing evidence steps (up to 4 iterations) when readiness checks fail. Use --no-auto-fix to skip.
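The auto-fix behavior in this entry works as a bounded retry loop: run the readiness checks, run the fix commands for the failing checks, and re-check, for up to 4 rounds. The sketch below illustrates this loop. ReadinessCheck, verifyWithAutoFix, and the fix-command strings are hypothetical, and the real verify command shells out to selftune subcommands instead of calling a callback.

```typescript
// Hypothetical sketch of verify's auto-fix loop; check names and commands are illustrative.
interface ReadinessCheck {
  name: string;
  ok: boolean;
  fixCommand?: string; // command that would generate the missing evidence
}

interface VerifyOutcome {
  ready: boolean;
  iterations: number;
  failing: string[];
}

function verifyWithAutoFix(
  runChecks: () => ReadinessCheck[],
  runFix: (command: string) => void,
  autoFix = true, // --no-auto-fix corresponds to passing false
  maxIterations = 4,
): VerifyOutcome {
  let results = runChecks();
  let iterations = 0;
  while (autoFix && iterations < maxIterations) {
    const fixable = results.filter((r) => !r.ok && r.fixCommand !== undefined);
    if (fixable.length === 0) break; // nothing left that a command can fix
    for (const r of fixable) if (r.fixCommand) runFix(r.fixCommand);
    iterations++;
    results = runChecks(); // re-check after each round of fixes
  }
  const failing = results.filter((r) => !r.ok).map((r) => r.name);
  return { ready: failing.length === 0, iterations, failing };
}
```

The iteration cap keeps verify from looping forever when a fix command runs but never produces the evidence its check expects.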
2026-04-15
OSS, CLI
Broader package search eligibility for draft packages with grading evidence
  • collectPackageSearchEligibleSkills now includes a second eligibility tier: skills with a selftune.create.json draft package and at least 3 grading results in the DB are routed to package search during orchestrate.
  • The existing frontier/artifact fast path is unchanged; the new tier is additive and fail-open (skips silently if the grading table is missing).
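The two eligibility tiers can be sketched as a single predicate. This is a minimal sketch. SkillEvidence and isPackageSearchEligible are hypothetical, and treating the existing fast path as "frontier or package-evaluation artifact" is an assumption, since the entry only calls it the frontier/artifact fast path.

```typescript
// Hypothetical sketch of the two eligibility tiers; the evidence fields are illustrative.
interface SkillEvidence {
  hasAcceptedFrontier: boolean;
  hasPackageEvaluation: boolean;
  hasDraftManifest: boolean; // selftune.create.json present
  countGradingResults: () => number; // may throw if the grading table is missing
}

const MIN_GRADING_RESULTS = 3;

function isPackageSearchEligible(skill: SkillEvidence): boolean {
  // Tier 1: existing frontier/artifact fast path (either signal suffices here, an assumption).
  if (skill.hasAcceptedFrontier || skill.hasPackageEvaluation) return true;
  // Tier 2: draft package plus enough grading evidence.
  if (!skill.hasDraftManifest) return false;
  try {
    return skill.countGradingResults() >= MIN_GRADING_RESULTS;
  } catch {
    // Fail open for orchestrate: a missing table just skips this skill silently.
    return false;
  }
}
```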
2026-04-15
OSS
Docs: fix stale orchestrate claim in SearchRun.md and document watch frontier demotion
  • SearchRun.md no longer claims orchestrate cannot auto-select package search; it now documents the eligibility criteria and plan-phase routing.
  • Watch.md adds a “How Watch Evidence Feeds Back to the Frontier” section explaining watch rank levels, SQLite row updates, and dashboard visibility.
  • SKILL.md SearchRun routing keywords now include “optimize package”, “improve routing and body together”, and “bounded evolution”.
2026-04-15
OSS, CLI
Publish watch gate now blocks and mutation weakness extraction populates failure patterns
  • create publish --watch now blocks publishing when the watch gate detects active alerts (published: false, watch_gate_blocked: true), instead of unconditionally publishing. Use --ignore-watch-alerts to bypass.
  • extractMutationWeaknesses now populates gradingFailurePatterns from the expectations array in grading summary JSON, enabling targeted body mutations to focus on specific failed expectations.
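The gate decision in this entry can be combined with the later follow-ups above, which cover crashed or malformed watch output and the --ignore-watch-alerts bypass, into a single function. This is a minimal sketch. The published and watch_gate_blocked field names come from the entry. WatchReport and publishWatchGate are hypothetical, and a null report stands for a crashed subprocess or missing watch JSON.

```typescript
// Hypothetical sketch combining the publish watch-gate rules described in these entries.
// watch === null stands for a crashed subprocess or malformed watch JSON.
interface WatchReport {
  activeAlerts: string[];
}

interface PublishDecision {
  published: boolean;
  watch_gate_blocked: boolean;
  warnings: string[];
}

function publishWatchGate(watch: WatchReport | null, ignoreWatchAlerts: boolean): PublishDecision {
  const warnings =
    watch === null ? ["watch did not return structured results"] : watch.activeAlerts;
  if (warnings.length === 0) {
    return { published: true, watch_gate_blocked: false, warnings };
  }
  // --ignore-watch-alerts publishes anyway but still surfaces the warnings.
  if (ignoreWatchAlerts) {
    return { published: true, watch_gate_blocked: false, warnings };
  }
  return { published: false, watch_gate_blocked: true, warnings };
}
```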
2026-04-15
OSS, CLI, Dashboard
Phase 2 follow-up fixes package search and publish gate wiring
  • Orchestrate now marks skills package-search-eligible from the real accepted frontier and canonical package-evaluation artifacts, so the new package-search branch is reachable in normal runs instead of existing only in isolated tests.
  • The orchestrate package-search phase now uses the current mutation and winner-application contracts, including targeted routing/body variants, current candidate path fields, and the current applySearchRunWinner response shape.
  • create publish --watch now surfaces watch_gate_passed, watch_gate_warnings, and watch_trust_score directly in the publish payload, and --ignore-watch-alerts now intentionally bypasses that advisory gate when needed.
  • Skill reports now populate watch_trust_score from the latest stored package-evaluation watch summary, so the dashboard watch trust indicator renders from real watch evidence instead of staying empty.
  • Fixed the selftune orchestrate CLI docs page so Mintlify renders it as a normal document instead of a raw fenced code block.
  • Dashboard skill report and live run now display routing and body weakness percentages from surface plan data, with a visual bar highlighting the weaker surface. The frontier panel also shows a parent-vs-winner comparison when both members are available.
2026-04-15
OSS, CLI
Orchestrate gains automatic package search selection
  • Added evidence-driven scope selection to orchestrate so it automatically chooses between description-level evolve and package-level bounded search based on accepted frontier state and canonical package evaluation evidence.
  • Added watch trust scoring feedback so post-deploy regressions can demote accepted frontier candidates and influence future scope selection.
  • Updated workflow and skill documentation to reflect the new package-search-in-orchestrate behavior.
2026-04-15
CLI, OSS
Bounded mutation strategies for package evolution
  • Added deterministic routing mutations (synonym expansion, granularity split, coverage broadening) and body mutations (instruction emphasis, example enrichment, description expansion) for bounded package evolution.
  • Added eval-informed targeted mutations that consume measured weaknesses from replay failures and grading results to focus routing and body changes on specific failure patterns.
  • Added weakness extraction from the local SQLite database to surface replay failure samples, routing misses, body quality scores, and grading pass rate deltas for mutation targeting.
2026-04-15
OSS, CLI, Dashboard
Watch trust scoring and publish gate
  • Added computeWatchTrustScore to the watch module, producing a 0-1 trust score from trigger regression, grade regression, and rollback signals.
  • Added an advisory publish watch gate that warns when active alerts or low trust scores are detected, with an --ignore-watch-alerts bypass for experts.
  • Extended the dashboard contract with watch_trust_score on skill reports and watch_gate_passed on action result summaries.
  • Updated the live run screen to display watch gate pass/alert badges when watch or deploy actions complete.
  • Added a watch trust indicator to the skill report creator loop section.
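This entry names computeWatchTrustScore and its three inputs but does not say how they are combined. The following sketch shows one plausible shape using fixed penalties clamped into [0, 1]. Both watchTrustScore and the penalty weights are hypothetical.

```typescript
// Hypothetical weights: the entry names the three signals but not how they combine.
interface WatchSignals {
  triggerRegression: boolean;
  gradeRegression: boolean;
  rolledBack: boolean;
}

const PENALTIES = { trigger: 0.4, grade: 0.3, rollback: 0.3 } as const;

function watchTrustScore(s: WatchSignals): number {
  let score = 1;
  if (s.triggerRegression) score -= PENALTIES.trigger;
  if (s.gradeRegression) score -= PENALTIES.grade;
  if (s.rolledBack) score -= PENALTIES.rollback;
  return Math.min(1, Math.max(0, score)); // clamp into [0, 1]
}
```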
2026-04-14
Cloud, Registry
Cloud GitHub registry foundation
2026-04-15
CLI, OSS
Package search phase in orchestrate loop
  • Added a package-search candidate action to the orchestrate loop so skills with accepted package frontier candidates are routed through bounded package search instead of standard evolution.
  • The new phase generates bounded mutations, fingerprints variants, runs package search evaluation, and applies winning candidates automatically.
  • Package search modules are lazy-loaded and gracefully degrade when unavailable, so the existing orchestrate flow is unaffected until the full package search stack is present.
2026-04-15
OSS, CLI
search-run no longer over-biases body mutations on missing quality scores
  • search-run now treats body.quality_score: null as a neutral weakness signal when the body already passed validation, instead of coercing it to maximum weakness.
  • This prevents --surface both from over-allocating routing/body search budget toward body mutations when quality assessment was unavailable but the current body was still valid.
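The null-handling rule can be sketched as a small mapping from quality score to weakness. This sketch assumes that quality_score lies in [0, 1], that weakness is 1 minus quality, and that "neutral" means 0.5. The bodyWeakness helper and the NEUTRAL_WEAKNESS constant are hypothetical.

```typescript
// Hypothetical sketch, assuming quality_score is in [0, 1] and weakness = 1 - quality.
const NEUTRAL_WEAKNESS = 0.5;

function bodyWeakness(qualityScore: number | null, bodyPassedValidation: boolean): number {
  if (qualityScore === null) {
    // Missing score: neutral if the body is valid, maximal only if it failed validation.
    return bodyPassedValidation ? NEUTRAL_WEAKNESS : 1;
  }
  return 1 - Math.min(1, Math.max(0, qualityScore));
}
```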
2026-04-15
OSS, CLI
Bounded search now biases the minibatch toward weaker measured surfaces
  • search-run --surface both now reads the accepted frontier first and falls back to the canonical package evaluation when needed, using that measured package state to bias routing/body candidate counts.
  • This replaces the old fixed half-routing, half-body split with a weakness planner that sends more of the minibatch budget toward the weaker measured surface while still keeping bounded deterministic search behavior.
  • The chosen surface budget is now persisted into search provenance and shown in the live-run and skill-report frontier surfaces, so reviewers can see why a run spent more budget on routing or body.
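This entry does not document the planner's exact formula. The following sketch shows one plausible version: a proportional split by measured weakness, an even split when neither surface has a measured weakness, and at least one candidate per surface whenever the minibatch allows it. The formula, planSurfaceBudget, and the one-candidate floor are all assumptions.

```typescript
// Hypothetical sketch of a weakness-weighted minibatch split; the proportional
// rule and the one-candidate floor are assumptions.
interface SurfaceBudget {
  routing: number;
  body: number;
}

function planSurfaceBudget(minibatch: number, routingWeakness: number, bodyWeakness: number): SurfaceBudget {
  if (minibatch <= 0) return { routing: 0, body: 0 };
  const total = routingWeakness + bodyWeakness;
  // With no measured weakness on either side, fall back to an even split.
  const routingShare = total > 0 ? routingWeakness / total : 0.5;
  let routing = Math.round(minibatch * routingShare);
  // Keep both surfaces in play: at least one candidate each when the budget allows.
  if (minibatch >= 2) routing = Math.min(minibatch - 1, Math.max(1, routing));
  return { routing, body: minibatch - routing };
}
```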
2026-04-15
OSS, CLI, Dashboard
Package search can now promote the winning draft candidate
  • Added search-run --apply-winner, which copies the winning candidate back into the draft package and refreshes the canonical package-evaluation artifact from the accepted candidate cache instead of leaving search as read-only provenance.
  • selftune improve --scope package now adds winner promotion by default and keeps --dry-run as the review-only escape hatch.
  • Search-run dashboard summaries now carry the resulting next command and package-evaluation context when a winning candidate is applied, so live review stays grounded in measured package state instead of raw search provenance.
2026-04-15
OSS, CLI
Package search is now part of the main improve lifecycle
  • Added selftune improve --scope package, which routes the primary improvement alias into selftune search-run instead of keeping bounded package search behind an expert-only command.
  • Package scope now preserves --eval-set, strips redundant --dry-run, normalizes compatible replay validation flags, and maps --candidates onto search-run’s --max-candidates knob.
  • Updated command help, workflow docs, SKILL routing guidance, and CLI docs so package search is taught as part of the main measured improvement loop.
2026-04-15
OSS, CLI, Dashboard
Bounded package search is now executable end to end
  • Added selftune search-run as a real top-level CLI command that generates bounded routing/body package variants, evaluates them through the shared package evaluator, and persists the selected winner plus provenance.
  • Wired search-run through dashboard actions, child-process event instrumentation, live-run summaries, and draft-package action buttons so bounded search is executable from the product surface instead of only existing as stored backend state.
  • The skill report backend now returns real package frontier state and the latest search-run provenance, so the frontier panel is driven by measured candidate history rather than a dormant response field.
  • Package search evaluations now normalize temp candidate variants back onto the canonical skill name, and winner selection now follows the accepted frontier over the full evaluator contract instead of replay-only gains.
  • Updated command help, workflow docs, SKILL routing, and the CLI quick reference so the new search surface is documented consistently.
2026-04-15
CLI, OSS
Package evaluation pipeline terminology
  • Updated selftune status output to label the readiness section “Package pipeline” instead of “Creator loop”.
  • Adapted the package search runner to the mature evaluator API with frontier-based parent selection.
  • Normalized SKILL.md description and body to reference the package evaluation pipeline (replay, baseline, grading, body, unit tests, and post-deploy watch) as the primary improvement mechanism.
  • Updated Evolve, EvolveBody, Watch, and CreateTestDeploy workflow docs to use package evaluation pipeline terminology consistently.
  • Normalized Baseline, Evals, UnitTest, SignalsDashboard workflow docs and the creator-playbook reference to use package evaluation pipeline terminology.
2026-04-15
OSSDashboardCLI
Package frontier observability in dashboard
  • Added a package frontier panel to the skill report that shows accepted candidates ranked by measured evidence, with watch-fed demotion indicators.
  • Added a search-run panel to the live-run screen that shows the selected parent, candidates evaluated, winner determination, and provenance detail.
  • Added search-run action result parsing to the dashboard action result contract so search runs surface structured summaries alongside existing replay dry-run results.
2026-04-15
CLIOSS
Bounded mutation primitives for package search
  • Added generateRoutingMutations() and generateBodyMutations() in the evolution pipeline to produce complete skill file variants that a package search runner can score. Three routing strategies (synonym expansion, granularity split, coverage broadening) and three body strategies (instruction emphasis, example enrichment, description expansion) create bounded variants written to temporary directories.
2026-04-15
CLIOSS
Bounded package search runner
  • Added a bounded package search runner that evaluates candidate skill variants against the accepted frontier parent using measured-delta acceptance.
  • Added package candidate state management with frontier reading, parent selection, and fingerprint-based deduplication.
  • Added package search provenance persistence that tracks frontier size, parent selection method, candidate fingerprints, and evaluation summaries.
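Fingerprint-based deduplication, as mentioned above, can be sketched as hashing a package's files in a stable order and skipping any candidate whose hash has already been seen. This is a minimal sketch, assuming a candidate is held in memory as a map of relative path to file contents; the PackageFiles type and both function names are hypothetical, and selftune's real fingerprint may cover different inputs.

```typescript
import { createHash } from "node:crypto";

// Hypothetical in-memory candidate: relative path -> file contents.
type PackageFiles = Record<string, string>;

// Stable fingerprint: paths are hashed in sorted order, so the order in which
// files were added to the object does not change the result. NUL bytes
// separate fields to avoid accidental path/content concatenation collisions.
function fingerprintPackage(files: PackageFiles): string {
  const hash = createHash("sha256");
  for (const path of Object.keys(files).sort()) {
    hash.update(path);
    hash.update("\0");
    hash.update(files[path]);
    hash.update("\0");
  }
  return hash.digest("hex");
}

// Keeps only candidates whose fingerprint has not been evaluated yet,
// recording new fingerprints in `seen` so repeats within a batch are dropped too.
function dedupeCandidates(candidates: PackageFiles[], seen: Set<string>): PackageFiles[] {
  const fresh: PackageFiles[] = [];
  for (const candidate of candidates) {
    const fp = fingerprintPackage(candidate);
    if (seen.has(fp)) continue;
    seen.add(fp);
    fresh.push(candidate);
  }
  return fresh;
}
```

Passing in a seen set loaded from previously stored fingerprints is what would let a search run skip variants it has already scored.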
2026-04-15
OSSCLIDashboard
watch now flags package efficiency regressions
  • selftune watch now reads the current package-evaluation artifact when one exists and computes an efficiency regression signal from observed post-deploy sessions, instead of only looking for trigger-pass-rate regressions and optional grade regressions.
  • Efficiency watch is grounded in measured package baselines already produced by create report and create publish, so post-deploy monitoring now compares observed input tokens, output tokens, and assistant turns against the same package-evaluator contract used before publish.
  • Efficiency regressions now flow through the structured watch result and the nested package watch summary, so publish/watch consumers can surface the same measured signal without scraping alert text.
  • The local dashboard watch parser now preserves those efficiency-regression fields in the package watch summary, keeping the watch contract forward-ready for richer live-run presentation as more post-deploy package signals land.
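The efficiency comparison described above can be illustrated by averaging input tokens, output tokens, and assistant turns over baseline and observed sessions and flagging metrics that grew past a tolerance. This is a minimal sketch, assuming per-session samples with the hypothetical field names shown and an illustrative 25% tolerance; selftune's actual thresholds and aggregation are not documented here.

```typescript
// Hypothetical per-session efficiency sample; field names are assumptions.
interface EfficiencySample {
  inputTokens: number;
  outputTokens: number;
  assistantTurns: number;
}

type Metric = keyof EfficiencySample;
const METRICS: Metric[] = ["inputTokens", "outputTokens", "assistantTurns"];

function mean(samples: EfficiencySample[], metric: Metric): number {
  return samples.reduce((sum, s) => sum + s[metric], 0) / samples.length;
}

// Returns each metric whose observed mean exceeds the measured baseline mean
// by more than `tolerance` (1.25 means "more than 25% worse"). The default
// tolerance is illustrative only.
function efficiencyRegressions(
  baseline: EfficiencySample[],
  observed: EfficiencySample[],
  tolerance = 1.25,
): Metric[] {
  if (baseline.length === 0 || observed.length === 0) return [];
  return METRICS.filter((m) => {
    const base = mean(baseline, m);
    return base > 0 && mean(observed, m) > base * tolerance;
  });
}
```

Returning the list of regressed metrics, rather than a single boolean, keeps the signal structured so a watch summary could report which dimension drifted.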
2026-04-15
OSSCLIDashboard
package candidate history now records measured acceptance
  • Durable draft package candidates now carry a measured acceptance decision in local state, instead of only lineage metadata, so candidate history can distinguish accepted improvements from measured regressions.
  • Acceptance is computed from package-evaluator evidence rather than model confidence, with explicit replay, routing, baseline-lift, body-quality, and unit-test deltas plus a human-readable rationale attached to the candidate summary.
  • Re-evaluating the same draft fingerprint preserves the original parent relationship instead of inventing a new comparison target, so repeated review runs update the candidate record without corrupting lineage.
  • Fresh candidates now compare their measured acceptance against the latest accepted frontier member instead of blindly inheriting the most recent rejected draft as the comparison baseline, while still keeping chronological lineage in the parent link.
  • When the current draft matches an already accepted frontier member, package evaluation can now reuse that candidate-specific artifact by fingerprint even if the canonical latest package report points at some other draft, so re-checking an accepted draft no longer pays the full evaluator cost again.
  • Accepted-frontier selection is now ranked by measured package outcomes instead of timestamp alone, so newer accepted drafts with weaker grading or weaker observed health no longer automatically become the comparison parent for the next candidate.
  • create publish --watch now writes structured watch results back into the matching package candidate artifact and registry row, so observed regressions can demote an accepted draft in later frontier selection without fabricating a brand-new evaluation event.
  • Cached package-evaluation reuse now also requires acceptance metadata in the stored artifact, so older lineage-only artifacts automatically refresh once before they can participate in candidate-aware reuse.
  • Benchmark reports, create publish summaries, and the local dashboard live-run screen now surface the candidate acceptance decision and rationale, so measured accept/reject state is visible without opening archived JSON.
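A measured acceptance decision built from per-dimension deltas, with a human-readable rationale, might look like the minimal sketch below. The MeasuredDeltas shape mirrors the dimensions named in this entry, but the rule itself (reject on any regression beyond a small epsilon, accept only with at least one gain) and the epsilon value are assumptions for illustration, not selftune's documented policy.

```typescript
// Hypothetical delta record: candidate score minus frontier-parent score,
// one field per evaluator dimension named in the changelog.
interface MeasuredDeltas {
  replay: number;
  routing: number;
  baselineLift: number;
  bodyQuality: number;
  unitTests: number;
}

interface AcceptanceDecision {
  accepted: boolean;
  rationale: string;
}

// Illustrative rule: reject if any dimension regresses by more than `epsilon`,
// accept only if at least one dimension improves by more than `epsilon`.
function decideAcceptance(deltas: MeasuredDeltas, epsilon = 0.01): AcceptanceDecision {
  const entries = Object.entries(deltas) as [keyof MeasuredDeltas, number][];
  const regressions = entries.filter(([, d]) => d < -epsilon).map(([k]) => k);
  const gains = entries.filter(([, d]) => d > epsilon).map(([k]) => k);
  if (regressions.length > 0) {
    return { accepted: false, rationale: `regressed: ${regressions.join(", ")}` };
  }
  if (gains.length === 0) {
    return { accepted: false, rationale: "no measured improvement" };
  }
  return { accepted: true, rationale: `improved: ${gains.join(", ")}` };
}
```

Because the decision depends only on measured deltas, model-reported confidence plays no part in it, which matches the intent stated in the entry.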
2026-04-15
OSSCLI
package evaluation now registers candidate lineage
  • Fresh draft package evaluations now register a durable package candidate per package fingerprint in local state, instead of only overwriting one latest package report per skill.
  • New candidate records carry parent linkage to the previously evaluated draft for the same skill plus a candidate-specific archived evaluation artifact, so later bounded package search can reuse lineage and evaluator evidence instead of rebuilding history from ad hoc files.
  • Cached package-evaluation reuse now requires candidate metadata in the saved artifact too, so older artifacts automatically force one fresh measured run before they can participate in candidate-aware reuse.
  • Benchmark reports, publish summaries, and the local dashboard live-run view now surface candidate ID, parent linkage, and generation directly, so candidate lineage is inspectable without opening archived JSON artifacts.
2026-04-14
OSSCLI
selftune skill now teaches a simpler lifecycle
  • Repositioned the shipped selftune skill around a smaller lifecycle (Create, Verify, Publish, Improve, and Run) instead of leading with the older stage-heavy creator loop.
  • Added new primary workflow docs for Verify, Publish, Improve, and Run, while keeping the existing lower-level eval, replay, baseline, watch, and body-evolution workflows available as advanced surfaces.
  • Updated SKILL.md, routing keywords, and lifecycle-state guidance so “can I trust this skill?”, “ship this skill”, and “run the loop” now map to intention-level workflows that still use today’s commands accurately under the hood.
  • Reframed Create as draft authoring only, marked the older CreateTestDeploy workflow as legacy compatibility guidance, and taught Orchestrate as the underlying runtime behind the simpler Run concept.
  • The local dashboard action stream and dashboard-triggered publish/evolve paths now recognize and use the new verify, publish, improve, and run aliases where they preserve the same measured behavior, so the live-run UI stays aligned with the simplified lifecycle surface.
  • The local dashboard overview, skill report, live action feed, and CLI docs now teach draft-package work as verify, publish, and live monitoring first, while still exposing the lower-level eval, replay, baseline, and create-check commands when an agent needs to drive the advanced loop manually.
  • selftune status, dashboard recommended commands, live-run next-command cards, the shipped quick reference, README, and the main skill-authoring guides now normalize old surface aliases like create check, create publish, and orchestrate into verify, publish, and run when the underlying behavior is equivalent, so the product stops teaching mixed lifecycle vocabulary by default.
  • Scheduled automation surfaces now teach selftune run as the default autonomous loop entrypoint: cron job messages, generated schedule snippets, alpha-enrollment guidance, orchestration reports, and the related docs and skill workflows all use run first while keeping orchestrate as the underlying advanced runtime name where needed.
  • Fixed the selftune create CLI page after a broken MDX wrapper landed, and updated the main authoring, troubleshooting, sharing, trigger-testing, and creator-playbook docs so they teach verify / publish first while still documenting the lower-level create replay / create baseline package steps when a draft needs explicit measured proof.
  • Normalized the secondary advanced workflow docs and README so eval, unit-test, baseline, evolve, evolve body, dashboard live-run, and legacy create-test-deploy guidance now distinguish draft-package lifecycle work from already-published skill iteration, instead of re-teaching the old creator-loop chain as the default.
  • Cleaned up the remaining lifecycle wording in status, eval, and create CLI docs plus the shipped SKILL.md reference table, so “creator loop” now mainly survives as a compatibility/search term instead of the default label for the product surface.
  • Corrected the package-search docs so search-run and improve --scope package are documented as explicit bounded-search surfaces, without claiming that run / orchestrate already auto-select package search before that automation is actually shipped.
2026-04-14
OSSCLI
package evaluation now reuses fresh measured artifacts
  • Added a canonical full-evaluation artifact beside the stored package summary, so create report and publish-time package gates can reuse one measured replay/baseline/body-validation result instead of scraping or recomputing partial state.
  • Package-evaluation reuse is guarded by the bounded package fingerprint and request shape, so edited drafts or changed evaluation requests still trigger a fresh measured run instead of trusting stale evidence.
  • Cache hits only apply when the saved package artifact already includes the current routing/body validation dimensions, so older summaries automatically fall back to a fresh measured run instead of silently downgrading the review signal.
  • Benchmark reports and publish output now label whether the package evaluation was freshly measured or reused from a matching artifact cache, so creators can audit reuse instead of inferring it from timing or logs.
  • The local dashboard live-run summary now surfaces that same fresh-vs-cached evaluation source for package report/publish actions, so cache reuse stays visible in the main review UI too.
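The three reuse guards described in this entry (matching package fingerprint, matching request shape, and presence of the current validation dimensions) combine into a single fresh-or-cached decision. This is a minimal sketch, assuming a stored artifact with the hypothetical fields shown and a serialized request key; the field names and dimension labels are illustrative, not selftune's schema.

```typescript
// Hypothetical stored artifact shape; field names are assumptions.
interface StoredEvaluation {
  packageFingerprint: string;
  requestKey: string; // serialized evaluation request (eval set, mode, etc.)
  dimensions: string[]; // evaluator dimensions present in the artifact
}

// Illustrative labels for the dimensions a cache hit must already contain.
const REQUIRED_DIMENSIONS = ["replay", "baseline", "routing", "body"];

// Returns "cached" only when every guard passes; otherwise the caller
// should run a fresh measured evaluation.
function evaluationSource(
  stored: StoredEvaluation | undefined,
  currentFingerprint: string,
  currentRequestKey: string,
): "cached" | "fresh" {
  if (!stored) return "fresh";
  if (stored.packageFingerprint !== currentFingerprint) return "fresh"; // draft was edited
  if (stored.requestKey !== currentRequestKey) return "fresh"; // request changed
  const hasAll = REQUIRED_DIMENSIONS.every((d) => stored.dimensions.includes(d));
  return hasAll ? "cached" : "fresh"; // older artifacts lack newer dimensions
}
```

Returning the source label rather than a boolean makes it easy to surface the same fresh-versus-cached value in reports and the live-run summary.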
2026-04-14
OSSCLIDashboard
package evaluation now includes routing and body validation
  • Extended the shared draft package evaluator so create report and create publish now attach current routing replay validation and current body validation alongside replay, baseline, grading, unit-test, and watch evidence.
  • Updated the benchmark-style package report format so routing replay and body validation show up in the same deterministic artifact as the rest of the measured package evidence.
  • Updated the active bounded package-evolution plan to reflect that body/routing validation is now part of the unified evaluator contract, moving the remaining gap toward candidate state, evaluator reuse, and measured search rather than missing evaluator dimensions.
2026-04-14
OSSCLI
draft package benchmark report helper
  • Added selftune create report --skill-path <path> as a no-side-effect package-evaluation command that runs replay plus baseline and renders one benchmark-style report with failure analysis, measured lift, recommendation, and next-step guidance.
  • Added the same report shape as a reusable helper in the shared draft package evaluator so future dashboard and PR-summary surfaces can reuse one deterministic evidence format instead of inventing ad hoc summaries.
  • Updated the selftune skill workflow docs, quick reference, README, and CLI docs so package creators can explicitly request a measured publish-readiness report before running create publish.
2026-04-14
OSSCLI
package-first create publish handoff
  • Updated selftune create publish so draft-package publishing now re-runs create replay --mode package and create baseline --mode package as the final measured gate before watch.
  • Removed the old direct handoff from create publish into description-only selftune evolve, keeping the creator loop grounded in package-level validation instead of a description mutation step.
  • Added a shared package-evaluation summary that create publish can return directly, so draft deploy/watch actions have one measured result shape instead of stitching together replay and baseline outcomes ad hoc.
  • Updated the local dashboard action parser so draft-package baseline and publish runs can surface replay mode, before/after pass rates, and lift on the live-run screen.
  • selftune watch now emits a machine-readable recommended_command, and create publish --watch now carries the nested watch_result payload through directly, so draft publish/watch flows expose measured post-deploy pass rates, alerts, and rollback recommendations instead of only a coarse “watch started” status.
  • Updated creator-loop readiness and selftune status guidance so draft packages now recommend create replay, create baseline, and create publish instead of falling back to the older evolve / grade commands for those milestones.
  • Updated the overview, skill report, and selftune status creator-loop surfaces so draft packages stay blocked on create check or package-resource fixes until those checks actually pass, instead of skipping ahead to replay or publish because later creator-loop artifacts already exist.
  • Added dashboard support for create check as a runnable draft-package action, so the live-run screen and draft package panel can stream and summarize spec-validation checks instead of showing that step as copy-only guidance.
  • Added structured progress events for create check, so the live-run screen now shows draft-package load, Agent Skills validation, and selftune readiness computation as explicit steps instead of only the final JSON result.
  • Made the overview creator-loop priorities runnable from the dashboard for actionable steps, so top-level draft-package cards can launch create check, eval generation, replay, baseline, and publish flows without drilling into the per-skill report first.
  • Updated the CLI help, OSS workflow docs, and docs site reference so the publish contract matches the package-first creator loop.
  • The live-run summary tiles now relabel watch actions as Baseline, Observed, Delta, and Signal, so post-deploy watch evidence no longer appears under the older dry-run Before / After / Validation vocabulary.
  • The shared package-evaluation payload now carries runtime efficiency and representative evidence, so package replay / baseline / publish flows can return measured duration and token aggregates together with replay-failure and baseline-win samples instead of only pass-rate summaries.
  • The live-run screen now surfaces those measured package-evaluation artifacts directly, including replay-failure samples, baseline-win/regression samples, with-skill versus without-skill efficiency totals, and recommended next commands when publish or watch actions expose them.
  • Added report-package as a first-class dashboard action for draft skills, so the skill report and live-run feed can launch selftune create report directly and label the resulting benchmark artifact separately from baseline, publish, and watch runs.
  • create publish --watch now attaches a structured watch summary to that same package-evaluation payload, and the live-run screen renders watch snapshot counts, invocation-type totals, rollback state, and grade-watch deltas from that shared measured contract.
  • Clarified the public CLI docs and shipped Create workflow so agents can rely on both the raw nested watch_result payload and the normalized package_evaluation.watch block when they parse publish-with-watch results.
  • selftune evolve and selftune evolve body no longer reject proposals before measured validation solely because model-reported confidence is low; --confidence now acts as a review threshold and adaptive-gate risk signal instead of a hard pre-validation stop.
  • The shared package-evaluation payload now also includes grading baseline versus recent grading deltas when that data exists, so create report and create publish --json can show observed execution-quality movement next to replay, baseline, and watch evidence.
  • The local dashboard now parses and renders that same package_evaluation.grading block in live-run summaries, so draft package report and publish flows expose measured grading movement without requiring raw JSON inspection.
  • The latest package-evaluation summary is now stored canonically in SQLite and mirrored to ~/.selftune/package-evaluations/<skill>.json, so draft report/publish/watch flows can reuse one measured artifact instead of treating package evaluation as stdout-only output.
  • Draft-package readiness and create check now honor the latest stored package-evaluation status, so a measured replay_failed or baseline_failed result keeps the skill blocked on the corresponding package gate instead of surfacing a false ready to publish state just because the older replay or baseline artifacts exist.
  • The shared package-evaluation payload now also carries deterministic unit test results and representative failing tests when that evidence exists, so create report, create publish --json, and the live-run UI can review the latest measured test run alongside replay, baseline, grading, and watch evidence.
  • Draft-package readiness and create check now also honor the latest failed deterministic unit-test run when one exists, so stored test failures keep the draft blocked on rerunning unit tests instead of treating test-file presence alone as publish-ready proof.
  • Stored package-evaluation artifacts now include a bounded package fingerprint, and draft-package readiness only trusts those replay/baseline results when the fingerprint still matches the current package tree, so stale failed measurements stop blocking edited drafts just because they share the same skill name.
  • Fixed dashboard child-process action context for report-package, so create report and verify now stream live progress and metrics events into the live-run screen instead of silently dropping them when the action context is read from environment variables.
2026-04-14
OSSCLI
create skill packages and workflow scaffolds
  • Added selftune create init as the clean-slate authoring path for new skills.
  • Added selftune create scaffold --from-workflow ... as the workflow-derived authoring path, and upgraded selftune workflows scaffold to emit the same package shape for backward compatibility.
  • Package drafts now include SKILL.md, workflows/default.md, references/overview.md, empty scripts/ and assets/ directories, plus a selftune.create.json manifest.
  • Added selftune create check to run Agent Skills spec validation first and then compute selftune-specific package readiness for evals, unit tests, replay, and baseline.
  • Added selftune create replay, selftune create baseline, selftune create status, and selftune create publish so the draft-package path now reaches all the way through replay validation, lift measurement, and handoff into the existing evolve/watch surfaces.
  • Added package-mode replay staging so runtime replay can read workflow/reference files inside the staged skill package without treating them as unrelated paths.
  • The local dashboard now surfaces draft packages before they have live telemetry, shows package-local create readiness on the skill report, and routes dashboard replay/baseline/publish actions through the draft-aware create commands automatically.
  • selftune create check now recommends create replay, create baseline, and create publish for draft-package next steps instead of the older generic evolve/grade commands, keeping package-tree staging consistent from CLI output through the dashboard.
  • Hardened the local dashboard draft-package views so the exported OSS app typechecks cleanly when create-readiness data is optional, preserving the draft-package panels in shipped builds.
  • Fixed selftune workflows scaffold --write so fresh workflow-derived packages are written through the shared draft-package writer instead of pre-creating the directory and tripping the overwrite guard.
  • Draft-package dashboard actions now start eval generation with --auto-synthetic, so cold-start skills can bootstrap eval sets from the dashboard instead of attempting empty log-based generation.
  • Added agent workflow docs and public CLI docs so agents can route package authoring requests to the full command surface.
2026-04-14
CloudRegistry
Cloud GitHub registry foundation
  • Added GitHub App installation binding for cloud orgs so a team can associate a GitHub installation with its registry workspace.
  • Added GitHub-backed registry connection APIs for listing accessible repos, connecting a repo to a registry entry, disconnecting it, and requesting manual sync.
  • Added immediate manual sync publishing so a connected repo path is packaged from GitHub, archived, and pushed into the registry as a GitHub-sourced version without waiting on a background worker.
  • Added webhook-driven auto-publish for default-branch pushes and matching Git tags so connected repos now flow into the registry without manual sync.
  • Added a dashboard GitHub settings flow with installation binding, repo discovery, monorepo path selection, and connection management controls.
  • Added Tier A GitHub write-back with org-level policy, per-connection opt-in, persisted publish attempts, and optional commit status/check-run updates for successful, skipped, and failed publishes.
  • Added direct selftune registry install github:owner/repo[@ref][//path] support so skills can be installed straight from GitHub, with monorepo path discovery, when the cloud registry is not part of the flow.
  • Fixed direct root installs from GitHub so a missing name: field in a root-level SKILL.md falls back to the actual repository name instead of the temporary clone directory name.
  • Restored the expected indentation in selftune registry --help so the usage block matches the rest of the CLI help formatting.
  • Polished the cloud GitHub settings experience with branded action buttons, clearer installation action states, a consolidated production setup runbook, and lowercase selftune branding on key cloud surfaces.
  • Added signed GitHub webhook intake plus registry source metadata fields so GitHub-origin publishes can be tracked separately from CLI-pushed versions.
  • Hardened GitHub webhook handling so tag patterns reject unsafe multi-wildcard shapes and webhook deliveries return immediately while publish processing continues asynchronously.
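The github:owner/repo[@ref][//path] install spec can be read as an owner/repo pair with an optional ref after @ and an optional monorepo path after //. The following is a minimal parsing sketch under that reading; the GitHubSpec type and parseGitHubSpec name are hypothetical, and selftune's own parser may validate names and refs more strictly.

```typescript
// Parsed form of a `github:owner/repo[@ref][//path]` install spec.
interface GitHubSpec {
  owner: string;
  repo: string;
  ref?: string;
  path?: string;
}

// Hypothetical parser sketch: split off the `//path` suffix first, then the
// `@ref` suffix, and require exactly `owner/repo` in what remains.
function parseGitHubSpec(spec: string): GitHubSpec {
  const prefix = "github:";
  if (!spec.startsWith(prefix)) throw new Error(`not a github spec: ${spec}`);
  let rest = spec.slice(prefix.length);

  let path: string | undefined;
  const pathIdx = rest.indexOf("//");
  if (pathIdx !== -1) {
    path = rest.slice(pathIdx + 2) || undefined; // empty path means repo root
    rest = rest.slice(0, pathIdx);
  }

  let ref: string | undefined;
  const refIdx = rest.indexOf("@");
  if (refIdx !== -1) {
    ref = rest.slice(refIdx + 1) || undefined;
    rest = rest.slice(0, refIdx);
  }

  const [owner, repo, ...extra] = rest.split("/");
  if (!owner || !repo || extra.length > 0) {
    throw new Error(`expected owner/repo in: ${spec}`);
  }
  return { owner, repo, ref, path };
}
```

Splitting on // before @ lets a ref such as a slash-containing branch name (for example feature/x) survive intact, as long as it does not itself contain //.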
2026-04-14
OSSDashboardCLI
sqlite creator loop artifacts
  • Moved canonical eval sets, generated unit tests, and unit-test run results into SQLite as the primary local source of truth for creator-loop readiness.
  • Kept mirroring those artifacts into the legacy ~/.selftune/eval-sets/ and ~/.selftune/unit-tests/ JSON files so existing file-based workflows and commands still work during the transition.
  • Updated readiness/status surfaces to prefer SQLite-backed artifacts instead of depending on filesystem existence checks.
2026-04-14
OSSDashboardCLI
canonical dashboard artifact paths
  • Updated dashboard-triggered generate-evals to pass the canonical ~/.selftune/eval-sets/<skill>.json output path explicitly instead of relying on a relative fallback filename.
  • Updated dashboard-triggered generate-unit-tests to pass the canonical ~/.selftune/unit-tests/<skill>.json path explicitly as well, keeping readiness artifacts out of the repo working directory.
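Resolving the canonical artifact path once, against the home directory, is what keeps these files out of the repo working directory. This is a minimal sketch, assuming a hypothetical canonicalArtifactPath helper that rejects skill names containing path separators; it only reproduces the ~/.selftune/<kind>/<skill>.json layout named above.

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

type ArtifactKind = "eval-sets" | "unit-tests";

// Hypothetical helper that builds the canonical ~/.selftune/<kind>/<skill>.json
// path, so callers always pass an absolute path instead of a relative filename.
function canonicalArtifactPath(kind: ArtifactKind, skill: string): string {
  // Reject empty names and separators so a skill name cannot escape the directory.
  if (skill === "" || /[\\/]/.test(skill)) {
    throw new Error(`invalid skill name: ${skill}`);
  }
  return join(homedir(), ".selftune", kind, `${skill}.json`);
}
```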
2026-04-14
OSSDashboardCLI
dashboard rollback routing
  • Fixed local dashboard rollback actions to spawn selftune evolve rollback with the expected proposal arguments, matching the actual CLI command surface.
  • Added a dashboard regression test that asserts the rollback action uses the evolve rollback subcommand shape.
2026-04-14
OSSDashboard
evolution rail header background
  • Removed the forced background fill from the sticky Evolution heading in the shared skill report evidence rail so proposal views keep the intended transparent panel treatment while scrolling.
2026-04-14
OSSDashboardCLI
structured creator-loop progress
  • Added a shared dashboard action instrumentation layer so creator-loop commands can emit structured step progress, LLM call progress, and provider-normalized runtime metadata without hard-coding the dashboard to one provider.
  • Wired selftune eval generate and selftune eval unit-test --generate into that shared observer path so the live-run screen can show load/build/write steps plus provider/model/duration updates instead of only terminal output.
  • Generalized the live-run UI from replay-only wording to a broader action-progress surface while keeping replay as the richest source of token and cost detail.
2026-04-14
OSSDashboardCLI
dashboard update badge
  • Added cached update availability metadata to the local dashboard health surface so the dashboard can tell the difference between up-to-date, auto-update-capable installs and manual-refresh source-tree installs.
  • Added a passive Update available status chip in the local dashboard footer plus a dedicated update panel on /status, keeping version visibility available without polluting live creator-loop transcripts.
2026-04-14
OSSDashboard
local dashboard proposal focus
  • Fixed proposal selection so opening a proposal link no longer gets overwritten by an automatic fallback selection.
  • Removed eager proposal auto-focus during initial load to keep deep links stable.
  • Kept readiness-driven action prioritization aligned with the active proposal focus state so child action sections no longer shift unexpectedly.
2026-04-14
OSSCLIDashboardPlatforms
quieter local creator loop runs
  • Suppressed unsupported auto-update chatter during local source-tree runs so dashboard-triggered creator-loop actions no longer flood the live log with manual refresh instructions.
  • Updated OpenCode ingest to support the current SQLite schema, including time_created timestamps and JSON-backed message rows, instead of assuming legacy created/content columns.
2026-04-13
OSSDashboardCLI
dashboard streaming refresh
  • Added a live action feed in the local dashboard so creator-loop runs show start, progress, and finish states instead of only appearing after the next data refresh.
  • Added a dedicated live-run screen for creator-loop actions so replay dry-runs can stream output, show parsed lift summaries, and display model/platform/token context beside the terminal log.
  • Added structured replay metrics to the live dashboard stream so Claude runtime replay now reports per-run platform, model, token, cost, and duration data in real time instead of only terminal text.
  • Added per-eval replay progress streaming and SSE backfill so the live-run screen can show eval n/N, query snippets, and pass/fail evidence even when you open the page after the run has already started.
  • Added dashboard action buttons for the main creator loop on skill reports: generate evals, generate unit tests, replay dry-run, baseline measurement, deploy, and watch.
  • Added a shared local action stream so supported terminal-run selftune commands also appear in the dashboard without being launched from the UI.
  • Fixed replay dry-runs so validated evolve --dry-run runs surface as success in the live dashboard feed even when the CLI exits non-zero to avoid accidental deployment.
2026-04-13
OSSDashboardCloudRegistryBilling
v0.2.24 to v0.2.27
  • Repaired the OSS publish pipeline so npm releases can still generate SBOMs, GitHub tags, and enriched release notes even when a publish partially succeeds.
  • Blocked cloud dashboard indexing and added changelog coverage enforcement so shipped product changes are documented before they merge.
  • Opened registry publishing and rollback to Pro plans so solo skill creators can publish and iterate without upgrading to Team first.
  • Tightened the local dashboard skill report around proposal deep links, kept proposal-focused layouts stable while report data loads, prevented raw ENOENT errors during SPA reloads, and restored full-width creator loop layout on overview.
  • Unified cloud and OSS skill report styling around the shared trust status language by restoring trust panel order, removing leftover success-green treatments, and switching trust badges to the app-wide dot-and-pill status treatment.
2026-04-08
OSSPlatformsCLI
v0.2.20 to v0.2.23
  • Added universal hook adapters for Codex, OpenCode, and Cline so selftune can capture real-time telemetry beyond Claude Code.
  • Added cold-start suspicion and Claude runtime replay validation to make trigger diagnostics more trustworthy when a skill has little history.
  • Hardened OpenCode installation so hook setup follows current plugin and config behavior instead of relying on rejected config keys.
  • See the OSS releases for package artifacts and per-version compare links.
2026-04-01
OSSDashboardCommunity
v0.2.14 to v0.2.19
  • Overhauled dashboard, trust, and creator-facing contribution surfaces so health signals are easier to interpret during active iteration.
  • Tightened the autonomous evolve and audit path to close reliability gaps in proposal rollout and monitoring.
  • Added CLI auto-update, richer structured errors, description quality scoring, and unblock suggestions for faster operator recovery.
2026-03-08
OSSCLIDashboard
v0.2.0
  • Added full skill body evolution so selftune can refine routing tables and larger skill bodies instead of only short descriptions.
  • Added synthetic eval generation to help new skills bootstrap without waiting for a large session history.
  • Introduced cheaper validation loops, activation rules, specialized agents, and a live local dashboard server for faster iteration.
  • Read more in the evolution concept guide and the dashboard command reference.
2026-03-01
OSSCLICommunity
v0.1.4
  • Added selftune status and selftune last so you can check skill health without opening the full dashboard.
  • Added a local dashboard and Claude transcript backfill to make retroactive analysis practical on existing projects.
  • Added opt-in community export so you can share anonymized signals back to the ecosystem.
2026-02-28
OSSCLIPlatforms
v0.1.0
  • Shipped the initial CLI with init, grade, eval, evolve, watch, doctor, and platform ingest commands.
  • Added Claude Code hooks for prompt capture, skill evaluation, and end-of-session telemetry.
  • Introduced the initial observe → detect → evolve → watch loop that the rest of the product builds on today.