Embedding that fits the budget — pick a model that matches your corpus (step 9/9) · context and retrieval

lesson 2 of 4 · embedding that fits the budget — pick a model that matches your corpus · step 9 / 9

Checkpoint

One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.

Final drill. Wire it all together into a tiny FAQ search engine — the same shape as a real RAG retrieval step, just with stub embeddings instead of an API call.

Build top_k(query_vec, faqs, k):

faqs is a list of (text, vec) tuples — 5 of them in the starter.
query_vec is a pre-computed embedding for the user's question.
Compute cosine similarity between query_vec and each FAQ vec.
Return the k highest-scoring FAQs as (text, score) tuples, best first.
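
The steps above can be sketched roughly as follows. This is a minimal version in plain Python, not the lesson's reference solution. The `cosine` helper and the three-dimensional vectors are illustrative assumptions; they are not the starter's data, so the scores will not match the expected output shown in this step.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is treated as matching nothing
    return dot / (norm_a * norm_b)

def top_k(query_vec, faqs, k):
    """Return the k best (text, score) tuples, highest score first."""
    scored = [(text, cosine(query_vec, vec)) for text, vec in faqs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical stub embeddings, NOT the vectors from the starter.
faqs = [
    ("how do I reset my password?", [1.0, 0.0, 0.0]),
    ("how do I upgrade my plan?", [0.0, 1.0, 0.0]),
    ("how do I delete my account?", [0.0, 0.0, 1.0]),
]
query_vec = [0.9, 0.1, 0.0]

for text, score in top_k(query_vec, faqs, 2):
    print(f"  {score:.4f}  {text}")
```

Sorting the full list is fine for a handful of FAQs. With thousands of chunks, `heapq.nlargest` or a vector database would likely be a better fit.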

The starter calls your function with k=2 to return the top-2 closest FAQs to the question "I forgot my password."

Real-world framing: this is the retrieval step of RAG. In production you'd swap the hand-picked vectors for real OpenAI embeddings, swap the list of 5 for thousands of chunks in a vector database, and feed the top-K results into the prompt you send to the model.
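
As a rough illustration of that last hand-off, the retrieved (text, score) tuples might be folded into a prompt like this. The `build_prompt` name and the prompt wording are hypothetical, chosen only for this sketch; the two matches are copied from the expected output of this step.

```python
def build_prompt(question, matches):
    """Format retrieved (text, score) tuples into a single prompt string."""
    # Scores are dropped here; only the retrieved text goes into the context.
    context = "\n".join(f"- {text}" for text, _score in matches)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# The top-2 matches from this step's expected output.
matches = [
    ("how do I reset my password?", 0.9963),
    ("how do I upgrade my plan?", 0.2457),
]
prompt = build_prompt("I forgot my password", matches)
print(prompt)
```

In a real pipeline the context lines would usually be the FAQ answers or document chunks rather than the question titles, and a score threshold could filter out weak matches such as the second one here.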

Expected output:

query: 'I forgot my password'
top 2 matches:
  0.9963  how do I reset my password?
  0.2457  how do I upgrade my plan?
