promptdojo_

Cursor wrote this eval: it calls the agent, gets the answer, then asserts the answer equals what the agent returned. The eval ALWAYS passes — it's checking the agent against itself. Tautological.

Fix the eval to assert against an INDEPENDENT ground truth string ("Paris") that came from a fact-check, not from the agent.

Expected output:

pass: True
The break is on line 9 — but read the whole snippet first.

this step needs the editor

on desktop today; in the app (coming soon). save your spot and we'll bring you back here when you're ready.

open this same url on a laptop to keep going today.