Retrieval that finds the right thing — top-k, thresholds, and the rerank step everyone skips — step 7 of 9
Two documents indexed the same paragraph (it was copy-pasted across docs), so the retriever returns the same chunk_id twice in the top-3, wasting a slot. The fix is to dedupe by chunk_id BEFORE taking the top-k, keeping the highest-scoring occurrence of each chunk.
Expected output:
['policy/p2', 'policy/p4', 'policy/p7']
The break is on line 11 — but read the whole snippet first.
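Separately from the exercise snippet, here is a minimal sketch of the fix described above: collapse hits to one per chunk_id, keeping the best score, and only then cut to k. The `Hit` type, the `dedupe_top_k` helper, the `doc_id` values, and every score are hypothetical, chosen only so the result matches the expected output shown above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    chunk_id: str
    doc_id: str   # hypothetical: which document the chunk was indexed from
    score: float  # hypothetical similarity score, higher is better


def dedupe_top_k(hits: list[Hit], k: int) -> list[Hit]:
    """Keep the highest-scoring hit per chunk_id, then return the top k."""
    best: dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.chunk_id)
        if current is None or hit.score > current.score:
            best[hit.chunk_id] = hit
    # Rank only after deduplication, so a duplicate cannot occupy a slot.
    ranked = sorted(best.values(), key=lambda h: h.score, reverse=True)
    return ranked[:k]


# Made-up retriever output: policy/p2 was indexed from two documents.
hits = [
    Hit("policy/p2", "handbook", 0.91),
    Hit("policy/p2", "onboarding", 0.88),
    Hit("policy/p4", "handbook", 0.84),
    Hit("policy/p7", "handbook", 0.79),
    Hit("policy/p9", "faq", 0.41),
]

# Cutting to k first leaves the duplicate in place.
naive = [h.chunk_id for h in sorted(hits, key=lambda h: h.score, reverse=True)[:3]]
deduped = [h.chunk_id for h in dedupe_top_k(hits, 3)]

print(naive)    # ['policy/p2', 'policy/p2', 'policy/p4']
print(deduped)  # ['policy/p2', 'policy/p4', 'policy/p7']
```

The order of operations is the whole point: deduplicating after the cut would remove the repeated policy/p2 but return only two results, because policy/p7 was already discarded.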