LLM-as-judge — when the judge is another model (step 7/9) · eval-driven ai development


The pairwise judge is called only ONCE per pair, in the order (a, b). Because of position bias, whichever output sits in slot A gets ~35% more votes than it deserves on close pairs.

Fix the code to run BOTH orders and only declare a winner when the judge is consistent (same physical output wins regardless of position). On disagreement, return "tie".

Expected output:

result: tie

The break is on line 11 — but read the whole snippet first.
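
For reference, here is a minimal sketch of the both-orders check described above. It is not the exercise's own snippet: the function name `pairwise_consistent`, the judge signature (two texts in, "A" or "B" out), and the stand-in judge `slot_a_biased_judge` are all assumptions made for illustration. A real judge would call a model instead.

```python
from typing import Callable

# Hypothetical judge signature: receives (slot_a_text, slot_b_text)
# and returns "A" or "B" for whichever slot it prefers.
Judge = Callable[[str, str], str]


def pairwise_consistent(judge: Judge, a: str, b: str) -> str:
    """Run the judge in both orders; name a winner only if it is consistent."""
    first = judge(a, b)   # output a sits in slot A
    second = judge(b, a)  # output b sits in slot A

    # a wins only if it is picked in both positions.
    if first == "A" and second == "B":
        return "a"
    # b wins only if it is picked in both positions.
    if first == "B" and second == "A":
        return "b"
    # The judge followed the slot, not the content: call it a tie.
    return "tie"


def slot_a_biased_judge(slot_a: str, slot_b: str) -> str:
    """Stand-in judge (illustrative only) that always prefers slot A."""
    return "A"


print("result:", pairwise_consistent(slot_a_biased_judge, "output one", "output two"))
```

With a judge that always votes for slot A, the two runs disagree about which physical output won, so the sketch prints `result: tie`. A single-order call would have reported output a as the winner.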


