Two documents contain the same copy-pasted paragraph, and both indexed it. The retriever therefore returns the same chunk_id twice in the top-3, wasting a slot. The fix is to dedupe by chunk_id BEFORE taking the top-k, keeping the highest-scoring occurrence of each chunk_id. Deduping after the cut would instead return fewer than k results.
Expected output:
['policy/p2', 'policy/p4', 'policy/p7']
The break is on line 11, but read the whole snippet first.
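Separately from the snippet under discussion, the dedupe-then-truncate idea can be sketched roughly as follows. This is a minimal illustration, not the snippet's actual code: the `Hit` type, the `top_k_unique` function, and the chunk ids and scores are all hypothetical, and the line numbers here have nothing to do with the "line 11" mentioned above.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    # Hypothetical shape of a retriever result.
    chunk_id: str
    score: float


def top_k_unique(hits: list[Hit], k: int) -> list[str]:
    """Return the k best chunk_ids, keeping the highest-scoring hit per id."""
    best: dict[str, float] = {}
    for hit in hits:
        # Keep the highest score seen so far for each chunk_id.
        if hit.chunk_id not in best or hit.score > best[hit.chunk_id]:
            best[hit.chunk_id] = hit.score
    # Rank the deduplicated chunks first, and only then truncate to k.
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Made-up retriever output: chunk "c1" was indexed from two documents.
hits = [Hit("c1", 0.92), Hit("c1", 0.90), Hit("c2", 0.85), Hit("c3", 0.80)]

# Truncating first lets the duplicate occupy a slot.
naive = [h.chunk_id for h in sorted(hits, key=lambda h: h.score, reverse=True)[:3]]

# Deduping first fills all three slots with distinct chunks.
deduped = top_k_unique(hits, 3)
```

With this made-up input, `naive` contains `c1` twice, while `deduped` holds three distinct ids. The design choice that matters is the order of operations: collapse duplicates over the full candidate list, then sort and slice.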