promptdojo_

LLM-as-judge: when the judge is another model

Pairwise judging asks "Is A or B better?" and returns A, B, or tie. Rubric judging asks "Does this output pass?" and returns pass or fail. Which TWO scenarios above are the strongest fit for pairwise judging?
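To make the two output shapes concrete, here is a minimal Python sketch. It assumes the judge model has already been prompted to reply with a bare label, and it leaves out the model call itself. The enum names, the parse_pairwise/parse_rubric helpers, and the convention that a tie counts as half a win are illustrative assumptions. They do not come from this lesson or from any particular library.

```python
from __future__ import annotations

from enum import Enum


class PairwiseVerdict(Enum):
    """Relative judgment: which of two outputs is better."""
    A = "A"
    B = "B"
    TIE = "tie"


class RubricVerdict(Enum):
    """Absolute judgment: does a single output meet the bar."""
    PASS = "pass"
    FAIL = "fail"


def parse_pairwise(raw: str) -> PairwiseVerdict:
    # Hypothetical convention: the judge replies with exactly "A", "B", or "tie".
    label = raw.strip().lower()
    if label == "a":
        return PairwiseVerdict.A
    if label == "b":
        return PairwiseVerdict.B
    if label == "tie":
        return PairwiseVerdict.TIE
    raise ValueError(f"unexpected pairwise verdict: {raw!r}")


def parse_rubric(raw: str) -> RubricVerdict:
    # Hypothetical convention: the judge replies with exactly "pass" or "fail".
    label = raw.strip().lower()
    if label == "pass":
        return RubricVerdict.PASS
    if label == "fail":
        return RubricVerdict.FAIL
    raise ValueError(f"unexpected rubric verdict: {raw!r}")


def win_rate_a(verdicts: list[PairwiseVerdict]) -> float:
    # Pairwise results only say how A compares to B.
    # Assumption: a tie counts as half a win for A.
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    score = 0.0
    for v in verdicts:
        if v is PairwiseVerdict.A:
            score += 1.0
        elif v is PairwiseVerdict.TIE:
            score += 0.5
    return score / len(verdicts)


def pass_rate(verdicts: list[RubricVerdict]) -> float:
    # Rubric results stand alone: each output is checked against the bar on its own.
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    return sum(v is RubricVerdict.PASS for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    pairwise = [parse_pairwise(r) for r in ["A", "b", "tie", "A"]]
    print(win_rate_a(pairwise))  # → 0.625
    rubric = [parse_rubric(r) for r in ["pass", "FAIL", "pass"]]
    print(pass_rate(rubric))
```

A pairwise verdict ranks two outputs only relative to each other, so the natural aggregate is a win rate. A rubric verdict scores each output by itself, so the natural aggregate is a pass rate.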