Embedding that fits the budget — pick a model that matches your corpus — step 6 of 9
This is the bug that ships to production most often: the query gets embedded with one model, the documents with another. Cosine similarity is meaningless across two different vector spaces — you get garbage retrieval.
In the editor, two stub "models" (embed_a and embed_b) mimic the OpenAI vs Voyage situation: same input, completely different output spaces. The docs were embedded with embed_a; the query was accidentally embedded with embed_b.
Fix it so the query is embedded with the SAME model as the docs. The user asks "reset my password" — the matching doc should return a cosine of 1.0.
Expected output:
1.0000 reset my password
0.0000 cancel my subscription
0.0000 export my data
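Outside the editor, the same bug and fix can be reproduced with a self-contained sketch. The stubs below are assumptions, not the course's actual `embed_a`/`embed_b`: each one deterministically maps a text to a single one-hot basis vector, and the two models use different salts, so they land in unrelated vector spaces, just like two providers' models. The fix is the single changed line: embed the query with the same model that embedded the docs.

```python
import hashlib
import math

DIM = 65536  # large dimension keeps hash collisions between stub vectors unlikely

def _one_hot(salt: str, text: str) -> list[float]:
    # Hypothetical stub embedder: each (salt, text) pair maps
    # deterministically to one basis vector of the space.
    h = hashlib.sha256((salt + text).encode("utf-8")).digest()
    idx = int.from_bytes(h[:4], "big") % DIM
    vec = [0.0] * DIM
    vec[idx] = 1.0
    return vec

def embed_a(text: str) -> list[float]:
    return _one_hot("model-a:", text)

def embed_b(text: str) -> list[float]:
    # Same input, different salt: stands in for a different provider's
    # model, i.e. a completely different output space.
    return _one_hot("model-b:", text)

def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv)

docs = ["reset my password", "cancel my subscription", "export my data"]
doc_vecs = [embed_a(d) for d in docs]   # corpus embedded with embed_a

query = "reset my password"

# BUG: query_vec = embed_b(query)  -> wrong space, garbage similarities.
# FIX: embed the query with the SAME model as the docs.
query_vec = embed_a(query)

for doc, vec in zip(docs, doc_vecs):
    print(f"{cosine(query_vec, vec):.4f} {doc}")
```

With the fix applied, the matching doc scores 1.0000 and the others 0.0000, matching the expected output above; with the buggy `embed_b` line instead, the query lives in a different space and every score is meaningless.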