LLM-as-judge — when the judge is another model — step 7 of 9
The pairwise judge is called only ONCE per pair, in the order
(a, b). Position bias means whichever output sits in slot A
gets ~35% more votes than it deserves on close pairs.
Fix the code to run BOTH orders and only declare a winner when the
judge is consistent (same physical output wins regardless of
position). On disagreement, return "tie".
Expected output:
result: tie
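
One way the fix could look is sketched below in Python. The exercise's actual starter code is not shown here, so every name is an assumption: judge_pair, the Judge signature (takes the two slots, returns "A" or "B"), and the stub judges are hypothetical stand-ins rather than the lesson's real API. The always_slot_a stub imitates a judge that is fully position-biased, which is the case where the swap check should report a tie.

```python
from typing import Callable

# Hypothetical judge signature (an assumption, not the lesson's real API):
# takes the text in slot A and the text in slot B, returns "A" or "B".
Judge = Callable[[str, str], str]


def judge_pair(judge: Judge, a: str, b: str) -> str:
    """Judge both orders; return "a", "b", or "tie" on disagreement."""
    first = judge(a, b)   # output a sits in slot A
    second = judge(b, a)  # output b sits in slot A

    # Map each slot verdict back to the physical output it refers to.
    winner_first = "a" if first == "A" else "b"
    winner_second = "b" if second == "A" else "a"

    # Only declare a winner when the same output wins in both positions.
    return winner_first if winner_first == winner_second else "tie"


def always_slot_a(slot_a: str, slot_b: str) -> str:
    """Stub judge with total position bias: it always votes for slot A."""
    return "A"


def prefers_longer(slot_a: str, slot_b: str) -> str:
    """Stub judge that looks at content only: the longer text wins."""
    return "A" if len(slot_a) >= len(slot_b) else "B"


result = judge_pair(always_slot_a, "output one", "output two")
print(f"result: {result}")
```

With the position-biased stub, the two runs pick different physical outputs, so the function returns "tie". A judge that looks only at content, such as prefers_longer, gives the same winner in both orders whenever the lengths differ, and that winner is returned.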