Overview

Eval suites are sets of test cases that verify your skill triggers (or doesn’t trigger) on specific queries. SelfTune can suggest new cases automatically from telemetry data and recent runs. You can accept those suggestions into the draft editor or write them directly into the selected saved suite.

Suggestion review workflow

When SelfTune detects queries your skill missed, failed on, or regressed against, it surfaces them as eval suggestions on the skill’s detail page. Each suggestion shows:
  • The query text
  • Whether the skill should or should not trigger
  • The source (linked telemetry or hosted run)
  • Why it was raised (missed query, failed eval, run regression, or run failure)
  • Observed count and confidence
You can:
  • Accept into draft to open the suite editor with the case pre-filled
  • Save to suite to append the case directly into the selected saved suite
  • Dismiss to remove it from the pending queue
Saved-suite writes are still explicit operator actions. SelfTune does not silently rewrite the authoritative suite.
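The three actions above can be pictured with a small in-memory model. This is only a sketch: the class names, field names, and sample queries are hypothetical and are not SelfTune's schema or API. It illustrates that accepting into the draft and dismissing never touch the saved suite; only an explicit save does.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Suggestion:
    """Hypothetical shape of one suggestion; field names are illustrative."""
    query: str
    should_trigger: bool
    source: str          # e.g. "linked_telemetry" or "hosted_run"
    reason: str          # e.g. "missed_query"
    observed_count: int
    confidence: float


@dataclass
class ReviewBoard:
    """Toy model of the pending queue, the draft editor, and one saved suite."""
    pending: List[Suggestion] = field(default_factory=list)
    draft: List[Suggestion] = field(default_factory=list)
    saved_suite: List[Suggestion] = field(default_factory=list)

    def accept_into_draft(self, s: Suggestion) -> None:
        # Pre-fills the draft only; the saved suite is untouched.
        self.pending.remove(s)
        self.draft.append(s)

    def save_to_suite(self, s: Suggestion) -> None:
        # The only method in this model that writes to the saved suite.
        self.pending.remove(s)
        self.saved_suite.append(s)

    def dismiss(self, s: Suggestion) -> None:
        # Removes the suggestion from the pending queue and nothing else.
        self.pending.remove(s)


# Made-up sample data, for illustration only.
missed = Suggestion("convert this csv to parquet", True,
                    "linked_telemetry", "missed_query", 7, 0.9)
noise = Suggestion("what's the weather", False,
                   "hosted_run", "run_failure", 1, 0.4)

board = ReviewBoard(pending=[missed, noise])
board.accept_into_draft(missed)
board.dismiss(noise)
```

After these two calls the pending queue is empty, the draft holds one case, and the saved suite is still empty.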

Reviewed suggestion history

Once you review a suggestion, it moves into the Reviewed suggestion history section below the pending queue. This section shows your last 8 reviewed cases with their outcomes. Each reviewed suggestion card displays:
  • Whether it was Accepted or Dismissed
  • The provenance badge (Failed eval, Missed query, Run regression, Run failure)
  • The data source badge (Telemetry or Hosted run)
  • The query text and AI rationale
  • When it was last seen and when you reviewed it

Actions on reviewed suggestions

| Situation | Action | What happens |
| --- | --- | --- |
| You accepted a suggestion and want to add the case to the draft again | Add to draft again | Opens the suite editor with the case merged in; no need to re-review |
| You accepted a suggestion into the draft and now want it in the authoritative suite | Save to suite | Appends the case into the selected saved suite and records the acceptance target as saved_suite |
| You dismissed a suggestion but want to accept it now | Accept into draft | Records a new accepted review and opens the suite editor |
| You want to undo your review entirely | Restore to pending | Clears the review record and returns the suggestion to the pending queue |
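One way to read the table above is as a small state machine. The sketch below is an assumption-laden illustration, not SelfTune's implementation: the state and action identifiers are hypothetical (only the saved_suite acceptance target is named in the text), and it assumes Restore to pending is offered from every reviewed state.

```python
from typing import Dict, Tuple

# (current state, action) -> next state.
# State and action names are hypothetical labels for the rows of the table.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("accepted:draft", "add_to_draft_again"): "accepted:draft",
    ("accepted:draft", "save_to_suite"): "accepted:saved_suite",
    ("dismissed", "accept_into_draft"): "accepted:draft",
    # Assumption: any reviewed state can be restored.
    ("accepted:draft", "restore_to_pending"): "pending",
    ("accepted:saved_suite", "restore_to_pending"): "pending",
    ("dismissed", "restore_to_pending"): "pending",
}


def apply_action(state: str, action: str) -> str:
    """Return the next review state, or raise if the action is not offered."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"{action!r} is not available from state {state!r}"
        ) from None


# A dismissed suggestion is accepted later, then promoted to the saved suite.
state = "dismissed"
state = apply_action(state, "accept_into_draft")
state = apply_action(state, "save_to_suite")
```

Keeping the transitions in one table makes it easy to see which actions are missing for a given state, for example that this model offers no direct jump from dismissed to the saved suite.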

Restore to pending

Restoring a suggestion removes your review decision. The suggestion reappears in the pending queue so you (or a teammate) can review it fresh. This is useful if:
  • You dismissed a suggestion by mistake
  • A previously accepted case was removed from the suite and you want to reconsider it
  • A teammate should re-review the case with fresh context
If you have more than 8 reviewed suggestions, the history shows the 8 most recent. Older records are still stored and accessible via the Sources API.
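The history window and the restore action can be sketched locally as follows. The record layout (query, outcome, reviewed_at) and both function names are hypothetical; this does not call or describe the Sources API.

```python
from datetime import datetime, timedelta
from typing import Dict, List

HISTORY_LIMIT = 8  # the page shows the 8 most recent reviews


def visible_history(records: List[Dict], limit: int = HISTORY_LIMIT) -> List[Dict]:
    """Newest-first slice of the review records; older ones stay in `records`."""
    ordered = sorted(records, key=lambda r: r["reviewed_at"], reverse=True)
    return ordered[:limit]


def restore_to_pending(records: List[Dict], pending: List[str], query: str) -> None:
    """Drop the review record for `query` and put the query back in `pending`."""
    for i, record in enumerate(records):
        if record["query"] == query:
            del records[i]
            pending.append(query)
            return
    raise KeyError(query)


# Ten made-up review records, one minute apart.
base = datetime(2024, 1, 1, 12, 0)
records = [
    {"query": f"query-{i}", "outcome": "dismissed",
     "reviewed_at": base + timedelta(minutes=i)}
    for i in range(10)
]
pending: List[str] = []

shown = visible_history(records)             # 8 newest of the 10 records
restore_to_pending(records, pending, "query-9")
shown_after = visible_history(records)       # window slides to include query-1
```

In this model, restoring a record frees a slot in the visible window, so the next older record becomes visible.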

Latest run pressure

The cloud skill page now also shows a Latest run pressure section sourced from the newest improve run for that skill source. This is different from the pending suggestion queue:
  • It shows what the latest run actually failed or regressed on
  • It appears even before those cases have been reviewed in the pending queue
  • It links back to the improve run detail page so you can inspect the result in context
If a latest-run pressure case is still net-new, you can accept it into the draft or save it directly into the selected suite from the source page.
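The text does not define "net-new" precisely. A minimal sketch, assuming it means "not already present in the pending queue or the selected suite", and assuming queries are compared after case-folding and whitespace collapsing (both are assumptions, as are the function names):

```python
import re
from typing import Iterable, List


def normalize(query: str) -> str:
    """Case-fold and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", query).strip().casefold()


def net_new(run_queries: Iterable[str], known_queries: Iterable[str]) -> List[str]:
    """Return run queries that are not already known, keeping their order."""
    known = {normalize(q) for q in known_queries}
    fresh: List[str] = []
    for q in run_queries:
        key = normalize(q)
        if key not in known:
            known.add(key)  # also de-duplicates repeats within the run
            fresh.append(q)
    return fresh


# Made-up example: one query is already in the suite, one repeats within the run.
already_known = ["summarize this pdf"]
from_latest_run = [
    "Summarize  this PDF",
    "translate to French",
    "Translate to french",
]
candidates = net_new(from_latest_run, already_known)
```

Here only "translate to French" survives: the first query matches a known case after normalization, and the third is a repeat of the second.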

Creating a suite from suggestions

  1. Open the skill detail page for a cloud source.
  2. Review pending suggestions and pick the cases that represent real usage patterns your skill should handle.
  3. For each case you picked, either accept it into the draft editor or save it directly into the selected suite.
  4. If you opened the draft editor, add more rows, set the verifier, then save.
  5. Run the suite from the improve tab to measure coverage.

Preserved provenance

Cases accepted from suggestions retain their provenance in the saved suite:
  • source:
    • linked_telemetry
    • hosted_run
  • provenance:
    • missed_query
    • failed_evaluation
    • run_regression
    • run_failure
That keeps operator-authored suites auditable even as they learn from recent evidence. For programmatic workflows, use the Eval Suites API to create and manage suites directly.
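To illustrate why preserved provenance helps auditing, here is a sketch that validates the two fields against the values listed above and tallies a suite by provenance. The record layout and function names are hypothetical and do not reflect the Eval Suites API.

```python
from collections import Counter
from typing import Dict, Iterable

# Allowed values as listed in this section.
SOURCES = {"linked_telemetry", "hosted_run"}
PROVENANCES = {"missed_query", "failed_evaluation", "run_regression", "run_failure"}


def make_case(query: str, should_trigger: bool, source: str, provenance: str) -> Dict:
    """Build a case record (hypothetical layout), rejecting unknown labels."""
    if source not in SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    if provenance not in PROVENANCES:
        raise ValueError(f"unknown provenance: {provenance!r}")
    return {
        "query": query,
        "should_trigger": should_trigger,
        "source": source,
        "provenance": provenance,
    }


def provenance_counts(cases: Iterable[Dict]) -> Counter:
    """Tally how many cases came from each kind of evidence."""
    return Counter(case["provenance"] for case in cases)


# Made-up suite contents, for illustration only.
suite = [
    make_case("resize this image", True, "linked_telemetry", "missed_query"),
    make_case("crop the photo", True, "linked_telemetry", "missed_query"),
    make_case("book a flight", False, "hosted_run", "run_regression"),
]
counts = provenance_counts(suite)
```

A tally like this answers questions such as "how much of this suite came from missed queries versus run regressions" without inspecting each case by hand.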