Add evals and traces — measure the agent, don't trust it — step 7 of 9
Cursor wrote this eval: it calls the agent, gets the answer, then asserts that the answer equals what the agent returned. The eval ALWAYS passes because it checks the agent against itself, so it is tautological and measures nothing.
Fix the eval to assert against an INDEPENDENT ground truth string ("Paris") that came from a fact-check, not from the agent.
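A minimal sketch of the broken eval next to the fixed one, in Python. The names `run_agent`, `tautological_eval`, and `grounded_eval` are hypothetical, as is the sample question. `run_agent` is a canned stand-in for the real agent call, which this step does not show.

```python
def run_agent(question: str) -> str:
    """Hypothetical stand-in for the real agent call; returns a canned answer."""
    return "Paris"


def tautological_eval(question: str) -> bool:
    # Broken: the "expected" value is the agent's own output,
    # so the comparison can never fail.
    answer = run_agent(question)
    expected = answer
    return answer == expected


def grounded_eval(question: str, ground_truth: str) -> bool:
    # Fixed: the expected value is supplied from outside the agent.
    answer = run_agent(question)
    return answer.strip() == ground_truth


if __name__ == "__main__":
    # The ground truth comes from a fact-check, not from the agent.
    GROUND_TRUTH = "Paris"
    # The question is an assumed example matching the "Paris" ground truth.
    print("pass:", grounded_eval("What is the capital of France?", GROUND_TRUTH))
```

The fixed version can fail: if the agent answered "Lyon", `grounded_eval` would return False, while `tautological_eval` would still return True.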
Expected output:
pass: True