promptdojo_

Retrieval that finds the right thing — top-k, thresholds, and the rerank step everyone skips — step 7 of 9

The same paragraph was copy-pasted into two documents, so it was indexed twice. The retriever now returns the same chunk_id twice in the top-3, wasting a slot. The fix is to dedupe by chunk_id BEFORE taking the top-k, keeping the highest-scoring occurrence of each chunk_id.

Expected output:

['policy/p2', 'policy/p4', 'policy/p7']
The break is on line 11 — but read the whole snippet first.
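The order of operations is the whole fix, so here is a minimal Python sketch of dedupe-then-slice. It is not the lesson's snippet. It assumes each hit is a (chunk_id, score) pair where a higher score means more relevant. The `Hit` type, the `dedupe_then_top_k` helper, and the scores are all hypothetical.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One retrieval result (hypothetical shape: an id plus a similarity score)."""
    chunk_id: str
    score: float


def dedupe_then_top_k(hits: list[Hit], k: int) -> list[str]:
    """Keep the highest-scoring hit per chunk_id, then return the top-k ids."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.chunk_id)
        # Replace only if this occurrence scores higher than the one already kept.
        if current is None or hit.score > current.score:
            best[hit.chunk_id] = hit
    # Rank the unique hits by score, highest first, and slice AFTER deduping.
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return [h.chunk_id for h in ranked[:k]]


# Hypothetical scores: "faq/q3" appears twice because its paragraph was pasted into two docs.
hits = [
    Hit("faq/q3", 0.91),
    Hit("faq/q3", 0.88),
    Hit("faq/q1", 0.85),
    Hit("faq/q5", 0.80),
    Hit("faq/q2", 0.72),
]

print(dedupe_then_top_k(hits, k=3))  # ['faq/q3', 'faq/q1', 'faq/q5']
```

Slicing first and deduping afterwards would have kept `faq/q3`, `faq/q3`, `faq/q1`, which leaves only two distinct chunks in three slots. Using a dict keyed by chunk_id keeps the dedupe a single pass, and the explicit score comparison means the surviving copy is the best-scoring one regardless of the order the hits arrive in.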
