Embedding that fits the budget — pick a model that matches your corpus — step 8 of 9
Write rank_by_similarity(query_vec, doc_vecs) that returns a
list of doc INDICES sorted by descending cosine similarity to the
query. The highest-scoring doc comes first.
- Input: a query vector and a list of doc vectors (all the same dim).
- Output: list of indices into doc_vecs, sorted best-first.
- Use cosine similarity (dot product / product of norms).
A query and four docs are run for you. The query points heavily in the first dimension. Doc 1 matches it closely, doc 3 is the next-closest, and docs 2 and 0 point elsewhere.
Expected output:
[1, 3, 2, 0]
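One possible way to approach it, as a minimal sketch rather than the lesson's reference solution: score every doc against the query with cosine similarity, then sort the indices by score. The helper cosine_similarity, the choice to score a zero-length vector as 0.0, and the sample vectors are all assumptions made for illustration. The lesson's actual inputs are not shown, so these vectors were only chosen to match the description above.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector has no direction, so score it 0.0
        # instead of dividing by zero.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return indices into doc_vecs, most similar to query_vec first."""
    scores = [cosine_similarity(query_vec, doc) for doc in doc_vecs]
    # Sort the indices, not the vectors, by descending score.
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


# Hypothetical sample data (not the lesson's actual vectors).
query = [1.0, 0.1, 0.0]
docs = [
    [0.0, 0.0, 1.0],  # doc 0: no overlap with the query
    [0.9, 0.1, 0.0],  # doc 1: nearly the same direction as the query
    [0.0, 1.0, 0.2],  # doc 2: mostly in the second dimension
    [0.6, 0.5, 0.1],  # doc 3: partly aligned with the query
]

print(rank_by_similarity(query, docs))  # [1, 3, 2, 0]
```

Sorting indices by a precomputed score list keeps the original doc_vecs untouched. Because Python's sort is stable, docs with equal scores keep their original relative order.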