LLM-as-judge — when the judge is another model — step 9 of 9
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Build a pairwise judge that detects its OWN position
bias and returns tie when it disagrees with itself across orders.
Write pairwise(question, output_a, output_b) that:
- Calls fake_judge(question, first, second) twice:
  - Once with first=output_a, second=output_b: read the result as "the output named output_a sits in slot A this round."
  - Once with first=output_b, second=output_a: output_b is now in slot A.
- Each call returns "A" or "B" (the slot that won).
- Translate slot wins back to which physical output won that round.
- If the SAME physical output won both rounds, return that output's label ("a" or "b").
- If the rounds disagreed, return "tie".
Then run a multi-case suite. Expected output:
case 1: a wins (consistent)
case 2: tie (position-biased — judge disagrees with itself)
case 3: b wins (consistent)
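One possible shape for the drill, as a minimal sketch rather than the reference solution. The fake_judge below is a hypothetical stand-in (the lesson's own fake_judge is not shown here): it prefers the longer answer and falls back to slot A when lengths are equal, which is exactly the kind of position bias the drill is meant to catch. The describe helper and the three test cases are also assumptions made for illustration.

```python
def fake_judge(question, first, second):
    """Hypothetical stand-in judge (assumption, not the lesson's implementation).

    Prefers the longer answer. When both answers have the same length it
    always picks slot A, which makes it position-biased on close calls.
    """
    if len(second) > len(first):
        return "B"
    return "A"


def pairwise(question, output_a, output_b):
    """Judge both orderings and return "a", "b", or "tie"."""
    # Round 1: output_a sits in slot A, output_b in slot B.
    slot_1 = fake_judge(question, output_a, output_b)
    winner_1 = "a" if slot_1 == "A" else "b"

    # Round 2: order swapped, so output_b sits in slot A.
    slot_2 = fake_judge(question, output_b, output_a)
    winner_2 = "b" if slot_2 == "A" else "a"

    # Only trust the verdict if the same physical output won both rounds.
    return winner_1 if winner_1 == winner_2 else "tie"


def describe(verdict):
    """Format a verdict the way the expected output shows it."""
    if verdict == "tie":
        return "tie (position-biased — judge disagrees with itself)"
    return f"{verdict} wins (consistent)"


# Illustrative cases (made up for this sketch): a longer, equal length, b longer.
CASES = [
    ("What is 2+2?", "4, because 2+2=4.", "4"),
    ("Capital of France?", "Paris", "Lyon."),
    ("Define HTTP.", "A protocol.", "A protocol for transferring hypertext."),
]

for i, (question, output_a, output_b) in enumerate(CASES, start=1):
    print(f"case {i}: {describe(pairwise(question, output_a, output_b))}")
```

The key move is translating slot letters back to physical outputs separately for each round: in the swapped round, slot "A" means output_b won. Comparing the raw slot letters across rounds would get the logic backwards, since a consistent judge returns different slots in the two rounds.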