LLM-as-judge — when the judge is another model — step 2 of 9
Pairwise asks "A or B better?" → returns A/B/tie. Rubric asks "does this output pass?" → returns pass/fail. Which TWO scenarios above are the strongest fit for pairwise?
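To make the two return shapes concrete, here is a minimal Python sketch of how an eval harness might phrase each judge prompt and parse the judge model's raw reply into a typed verdict. The enum names, prompt wording, and helper functions are hypothetical, not taken from any particular eval library, and the actual call to the judge model is left out.

```python
from enum import Enum


class PairwiseVerdict(Enum):
    # Pairwise judge: compares two outputs and picks one, or calls it even.
    A = "A"
    B = "B"
    TIE = "tie"


class RubricVerdict(Enum):
    # Rubric judge: checks a single output against a fixed criterion.
    PASS = "pass"
    FAIL = "fail"


def pairwise_prompt(task: str, output_a: str, output_b: str) -> str:
    # Hypothetical prompt wording; the judge sees BOTH candidates at once.
    return (
        f"Task:\n{task}\n\nOutput A:\n{output_a}\n\nOutput B:\n{output_b}\n\n"
        "Which output is better? Answer with exactly one of: A, B, tie."
    )


def rubric_prompt(task: str, output: str, criterion: str) -> str:
    # Hypothetical prompt wording; the judge sees ONE candidate plus the criterion.
    return (
        f"Task:\n{task}\n\nOutput:\n{output}\n\nCriterion: {criterion}\n\n"
        "Does the output pass? Answer with exactly one of: pass, fail."
    )


def parse_pairwise(reply: str) -> PairwiseVerdict:
    # Normalise whitespace and case, then map onto the three allowed labels.
    label = reply.strip().lower()
    for verdict in PairwiseVerdict:
        if label == verdict.value.lower():
            return verdict
    raise ValueError(f"unexpected pairwise verdict: {reply!r}")


def parse_rubric(reply: str) -> RubricVerdict:
    # Same normalisation, but only pass/fail are valid.
    label = reply.strip().lower()
    for verdict in RubricVerdict:
        if label == verdict.value:
            return verdict
    raise ValueError(f"unexpected rubric verdict: {reply!r}")
```

The structural difference shows up in the signatures: a pairwise judge needs two candidates and only says which is better relative to the other, while a rubric judge needs one candidate and a criterion and returns an absolute pass/fail.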