Read the trace, not the chat — find the broken turn before reading the user's complaint (step 9/9) · debugging broken ai output

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Build the full triage flow. Write triage(trace) that:

Computes n_tools (total tool calls across the trace) and n_validation (total validation errors).
Returns a dict {"class": <class>, "fix_layer": <layer>, "summary": <one-line summary>} where:
- If n_validation > 0: class = "retrieval", fix_layer = "retriever / tool inputs".
- Else if trace[-1]["stop_reason"] == "max_tokens": class = "prompt", fix_layer = "prompt (output too long; cap or split task)".
- Else if n_tools == 0: class = "true_hallucination", fix_layer = "constrain output (force tool use, require citations)".
- Else: class = "downstream_mangling", fix_layer = "post-processing code".
- summary is the same one-line string from step 8: "<N> turns, <M> tools, <V> validation_errors, final stop=<stop_reason>".

Three traces run for you. Expected output:

class=retrieval fix_layer=retriever / tool inputs summary=3 turns, 2 tools, 1 validation_errors, final stop=end_turn
class=true_hallucination fix_layer=constrain output (force tool use, require citations) summary=2 turns, 0 tools, 0 validation_errors, final stop=end_turn
class=prompt fix_layer=prompt (output too long; cap or split task) summary=2 turns, 1 tools, 0 validation_errors, final stop=max_tokens
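
One way the rules above could fit together, as a minimal sketch. The shape of a turn is an assumption here: each turn is taken to be a dict with optional "tool_calls" and "validation_errors" lists plus a "stop_reason" string. Only trace[-1]["stop_reason"] is given by the drill itself. The three traces (trace_a, trace_b, trace_c) are hypothetical stand-ins built to match the counts in the expected output, not the traces the lesson actually runs.

```python
from typing import Any

Trace = list[dict[str, Any]]


def triage(trace: Trace) -> dict[str, str]:
    # ASSUMPTION: per-turn keys "tool_calls" and "validation_errors" are lists
    # and may be missing; the lesson does not spell out the turn schema.
    n_tools = sum(len(turn.get("tool_calls", [])) for turn in trace)
    n_validation = sum(len(turn.get("validation_errors", [])) for turn in trace)
    stop_reason = trace[-1]["stop_reason"]

    # Order matters: validation errors win over max_tokens, which wins over
    # the zero-tools check. Everything else falls through to post-processing.
    if n_validation > 0:
        cls, layer = "retrieval", "retriever / tool inputs"
    elif stop_reason == "max_tokens":
        cls, layer = "prompt", "prompt (output too long; cap or split task)"
    elif n_tools == 0:
        cls, layer = ("true_hallucination",
                      "constrain output (force tool use, require citations)")
    else:
        cls, layer = "downstream_mangling", "post-processing code"

    summary = (f"{len(trace)} turns, {n_tools} tools, "
               f"{n_validation} validation_errors, final stop={stop_reason}")
    return {"class": cls, "fix_layer": layer, "summary": summary}


# Hypothetical traces, shaped only to reproduce the counts shown above.
trace_a = [
    {"tool_calls": [{"name": "search"}],
     "validation_errors": ["missing field: query"],
     "stop_reason": "tool_use"},
    {"tool_calls": [{"name": "search"}], "stop_reason": "tool_use"},
    {"stop_reason": "end_turn"},
]
trace_b = [
    {"stop_reason": "end_turn"},
    {"stop_reason": "end_turn"},
]
trace_c = [
    {"tool_calls": [{"name": "search"}], "stop_reason": "tool_use"},
    {"stop_reason": "max_tokens"},
]

for t in (trace_a, trace_b, trace_c):
    r = triage(t)
    print(f"class={r['class']} fix_layer={r['fix_layer']} summary={r['summary']}")
```

Because the branches are checked in order, a trace that has a validation error and also ends on max_tokens is still classed as retrieval. The else branch (downstream_mangling) is only reached when tools ran, nothing failed validation, and the final turn stopped normally.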
