promptdojo_

Add evals and traces — measure the agent, don't trust it — step 4 of 9

The agent's canned answers are above. Each case checks whether its expected substring appears in the agent's answer. The third case fails because the agent answers "Kyoto" instead of "Tokyo". What does the suite print?
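To make the mechanics concrete, here is a minimal sketch of a substring-based eval suite, assuming a fake agent backed by a dictionary of canned answers. The questions, answers, the `agent`/`run_suite` helpers and the PASS/FAIL print format are all hypothetical stand-ins, not the lesson's actual data or output. The point is only that each case is a case-sensitive `expected in answer` check, so an answer mentioning "Kyoto" never contains "Tokyo" and that case fails.

```python
# Hypothetical canned answers standing in for the agent (not the lesson's real data).
CANNED_ANSWERS = {
    "What is the capital of France?": "The capital of France is Paris.",
    "What is 2 + 2?": "2 + 2 equals 4.",
    "What is the capital of Japan?": "The capital of Japan is Kyoto.",  # wrong on purpose
}


def agent(question: str) -> str:
    """Fake agent: returns a canned answer instead of calling a model."""
    return CANNED_ANSWERS[question]


# Each case pairs a question with the substring its answer must contain.
CASES = [
    ("What is the capital of France?", "Paris"),
    ("What is 2 + 2?", "4"),
    ("What is the capital of Japan?", "Tokyo"),
]


def run_suite(cases):
    """Run every case, print a PASS/FAIL line per case plus a summary, return the results."""
    results = []
    for i, (question, expected) in enumerate(cases, start=1):
        answer = agent(question)
        ok = expected in answer  # plain, case-sensitive substring check
        results.append(ok)
        status = "PASS" if ok else "FAIL"
        print(f"case {i}: {status} (expected {expected!r} in {answer!r})")
    print(f"{sum(results)}/{len(cases)} passed")
    return results


if __name__ == "__main__":
    run_suite(CASES)
```

With these made-up cases the first two pass and the third fails, so this sketch ends with a `2/3 passed` summary line; the real suite's wording depends on how the lesson formats its output.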
