promptdojo_

Write a function classify_failure(case) that takes a postmortem dict with three string fields:

  • user_saw: str (the complaint as the user reported it)
  • trace_snippet: str (the relevant trace excerpt)
  • system_state: str (one-line description of the system at failure)

It must return a dict with two keys, in this exact order: {"class": int, "fix": str}.
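A quick illustration of the key-order requirement, assuming Python 3.7 or later, where dicts keep insertion order: writing "class" before "fix" in the literal is enough. The placeholder values here are made up for the demo.

```python
# Dicts preserve insertion order (guaranteed since Python 3.7),
# so the order of keys in the literal is the order in the result.
result = {"class": 0, "fix": "placeholder"}  # placeholder values, not part of the spec

print(list(result))  # ['class', 'fix']
print(result)        # {'class': 0, 'fix': 'placeholder'}
```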

Apply these substring checks to trace_snippet in priority order. The first match wins:

  1. If "no retrieval" is in trace_snippet → class 3, fix "add retrieval-with-citations + out-of-domain refusal".
  2. Else if "schema mismatch" is in trace_snippet → class 4, fix "add Pydantic validation at the consumer boundary".
  3. Else if "stale chunk" is in trace_snippet → class 1, fix "add freshness metadata + superseded_by filter".
  4. Else if "ambiguous" is in trace_snippet → class 2, fix "tighten prompt with explicit constraint + negative example".
  5. Otherwise → class 0, fix "unclassified — read the full trace".
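One way to sketch the rules above is a table scanned top to bottom, so priority is just list order. This is a minimal sketch, not the reference solution: the RULES name is an illustrative choice, the match is assumed to be a case-sensitive substring test, and the sample input at the bottom is hypothetical.

```python
# Rules in priority order: (substring to look for, class, fix).
# RULES is an illustrative name, not something the exercise mandates.
RULES = [
    ("no retrieval", 3, "add retrieval-with-citations + out-of-domain refusal"),
    ("schema mismatch", 4, "add Pydantic validation at the consumer boundary"),
    ("stale chunk", 1, "add freshness metadata + superseded_by filter"),
    ("ambiguous", 2, "tighten prompt with explicit constraint + negative example"),
]


def classify_failure(case):
    """Return {"class": int, "fix": str} for a postmortem dict."""
    trace = case["trace_snippet"]
    # Scan in priority order and return on the first match.
    # Assumption: plain, case-sensitive substring matching.
    for needle, cls, fix in RULES:
        if needle in trace:
            return {"class": cls, "fix": fix}
    # No heuristic matched.
    return {"class": 0, "fix": "unclassified — read the full trace"}


# Hypothetical input: two heuristics match, and the higher-priority one wins.
example = {
    "user_saw": "bot quoted an old policy",
    "trace_snippet": "stale chunk served; answer was ambiguous",
    "system_state": "index last rebuilt a month ago",
}
print(classify_failure(example))
# {'class': 1, 'fix': 'add freshness metadata + superseded_by filter'}
```

Keeping the rules in a list rather than an if/elif chain makes the priority order visible in one place. Only trace_snippet is read; user_saw and system_state are carried along but not used by these heuristics.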

Two cases run for you. Expected output:

air_canada: {'class': 3, 'fix': 'add retrieval-with-citations + out-of-domain refusal'}
recruiter:  {'class': 4, 'fix': 'add Pydantic validation at the consumer boundary'}
