Overview
Eval suites are sets of test cases that verify your skill triggers (or doesn’t trigger) on specific queries. SelfTune can suggest new cases automatically from telemetry data and recent runs. You can accept those suggestions into the draft editor or write them directly into the selected saved suite.
Suggestion review workflow
When SelfTune detects queries your skill missed, failed on, or regressed against, it surfaces them as eval suggestions on the skill’s detail page. Each suggestion shows:
- The query text
- Whether the skill should or should not trigger
- The source (linked telemetry or hosted run)
- Why it was raised (missed query, failed eval, run regression, or run failure)
- Observed count and confidence
For each pending suggestion, you can:
- Accept into draft to open the suite editor with the case pre-filled
- Save to suite to append the case directly into the selected saved suite
- Dismiss to remove it from the pending queue
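The fields listed above can be pictured as a small record. This is a minimal sketch and not SelfTune's actual schema: the class name, field names, the 0.0 to 1.0 confidence range, and the `to_draft_case` helper are all hypothetical, chosen only to mirror the bullets.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical shape of one pending eval suggestion."""
    query: str
    should_trigger: bool   # whether the skill should or should not trigger
    source: str            # where it came from: linked telemetry or a hosted run
    reason: str            # why it was raised, e.g. a missed query
    observed_count: int
    confidence: float      # assumed here to lie between 0.0 and 1.0


def to_draft_case(s: Suggestion) -> dict:
    """Build the pre-filled row that accepting into the draft would open."""
    return {
        "query": s.query,
        "should_trigger": s.should_trigger,
        "source": s.source,
        "provenance": s.reason,
    }


suggestion = Suggestion(
    query="convert this csv to json",
    should_trigger=True,
    source="linked_telemetry",
    reason="missed_query",
    observed_count=5,
    confidence=0.8,
)
draft_row = to_draft_case(suggestion)
```

Observed count and confidence are left out of the draft row here on the assumption that they help with review but are not part of the test case itself.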
Reviewed suggestion history
After you review a suggestion, it moves into the Reviewed suggestion history section below the pending queue. This section shows your last 8 reviewed cases with their outcomes. Each reviewed suggestion card displays:
- Whether it was Accepted or Dismissed
- The provenance badge (Failed eval, Missed query, Run regression, Run failure)
- The data source badge (Telemetry or Hosted run)
- The query text and AI rationale
- When it was last seen and when you reviewed it
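As a rough illustration of the "last 8 reviewed cases" window, the sketch below sorts review records by review time and keeps the newest eight. The record keys (`query`, `outcome`, `reviewed_at`) and the newest-first ordering are assumptions made for the example, not documented behavior.

```python
from datetime import datetime, timedelta

HISTORY_LIMIT = 8  # the history section shows the last 8 reviewed cases


def recent_reviews(reviews: list[dict], limit: int = HISTORY_LIMIT) -> list[dict]:
    """Return the most recently reviewed suggestions, newest first (assumed order)."""
    ordered = sorted(reviews, key=lambda r: r["reviewed_at"], reverse=True)
    return ordered[:limit]


# Ten hypothetical reviews, one hour apart.
start = datetime(2024, 1, 1)
all_reviews = [
    {"query": f"q{i}", "outcome": "accepted", "reviewed_at": start + timedelta(hours=i)}
    for i in range(10)
]
shown = recent_reviews(all_reviews)  # the two oldest reviews fall out of the window
```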
Actions on reviewed suggestions
| Situation | Action | What happens |
|---|---|---|
| You accepted a suggestion and want to add the case to the draft again | Add to draft again | Opens the suite editor with the case merged in, so there is no need to re-review |
| You accepted a suggestion into the draft and now want it in the authoritative suite | Save to suite | Appends the case into the selected saved suite and records the acceptance target as saved_suite |
| You dismissed a suggestion but want to accept it now | Accept into draft | Records a new accepted review and opens the suite editor |
| You want to undo your review entirely | Restore to pending | Clears the review record and returns the suggestion to the pending queue |
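The table can be read as a small state machine over a review record. The sketch below is one way to model it under stated assumptions: the `Review` class, the action keys, and the `draft` target name are hypothetical, while `saved_suite` is the acceptance target named in the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    status: str = "pending"                  # "pending", "accepted", or "dismissed"
    acceptance_target: Optional[str] = None  # "draft" (assumed name) or "saved_suite"


def apply_action(review: Review, action: str) -> Review:
    """Return the review record that results from one action in the table."""
    if action == "add_to_draft_again":
        # Only offered for accepted suggestions; the existing review is kept.
        if review.status != "accepted":
            raise ValueError("only an accepted suggestion can be added again")
        return review
    if action == "save_to_suite":
        return Review(status="accepted", acceptance_target="saved_suite")
    if action == "accept_into_draft":
        # Records a new accepted review, for example after a dismissal.
        return Review(status="accepted", acceptance_target="draft")
    if action == "restore_to_pending":
        # Clears the review record entirely.
        return Review()
    raise ValueError(f"unknown action: {action}")


dismissed = Review(status="dismissed")
accepted = apply_action(dismissed, "accept_into_draft")
saved = apply_action(accepted, "save_to_suite")
```

Returning a fresh `Review` for each transition keeps the earlier record untouched, which makes the "clears the review record" behavior of Restore to pending easy to see.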
Restore to pending
Restoring a suggestion removes your review decision. The suggestion reappears in the pending queue so you (or a teammate) can review it fresh. This is useful if:
- You dismissed a suggestion by mistake
- A previously accepted case was removed from the suite and you want to reconsider it
- A teammate should re-review the case with fresh context
Latest run pressure
The cloud skill page now also shows a Latest run pressure section sourced from the newest improve run for that skill source. This is different from the pending suggestion queue:
- It shows what the latest run actually failed or regressed on
- It appears even before you review those cases into the pending queue
- It links back to the improve run detail page so you can inspect the result in context
Creating a suite from suggestions
- Open the skill detail page for a cloud source.
- Review pending suggestions and accept the cases that represent real usage patterns your skill should handle.
- Either accept the case into the draft editor or save it directly into the selected suite.
- If you opened the draft editor, add more rows, set the verifier, then save.
- Run the suite from the improve tab to measure coverage.
Preserved provenance
Cases accepted from suggestions retain their provenance in the saved suite:
- source: linked_telemetry or hosted_run
- provenance: missed_query, failed_evaluation, run_regression, or run_failure
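Because the source and provenance values are preserved, a saved suite can be checked and summarized by where its cases came from. The following is a sketch only: it assumes each saved case is a dictionary with `source` and `provenance` keys, and the helper names are hypothetical rather than part of SelfTune.

```python
from collections import Counter

# Values taken from the lists above.
ALLOWED_SOURCES = {"linked_telemetry", "hosted_run"}
ALLOWED_PROVENANCE = {
    "missed_query",
    "failed_evaluation",
    "run_regression",
    "run_failure",
}


def validate_case(case: dict) -> None:
    """Raise ValueError if a case carries an unknown source or provenance."""
    if case["source"] not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {case['source']}")
    if case["provenance"] not in ALLOWED_PROVENANCE:
        raise ValueError(f"unknown provenance: {case['provenance']}")


def provenance_breakdown(cases: list[dict]) -> Counter:
    """Count suggestion-derived cases by the reason they were raised."""
    for case in cases:
        validate_case(case)
    return Counter(case["provenance"] for case in cases)


suite = [
    {"query": "summarize this pdf", "source": "linked_telemetry", "provenance": "missed_query"},
    {"query": "rename my branch", "source": "hosted_run", "provenance": "run_regression"},
    {"query": "merge two pdfs", "source": "linked_telemetry", "provenance": "missed_query"},
]
breakdown = provenance_breakdown(suite)
```

A breakdown like this gives a quick view of whether a suite is driven mostly by missed queries from telemetry or by regressions from hosted runs.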