Embedding that fits the budget — pick a model that matches your corpus (step 7/9) · context and retrieval

Someone wrote a "fast similarity" that's just the dot product — they skipped the normalization step. It compiles, it runs, and it RETRIEVES THE WRONG DOCUMENT.

Why: the dot product equals |a| * |b| * cos(angle), so without normalization it rewards big vectors. A doc that points in the wrong direction but has a larger magnitude can outscore a doc that points in the right direction with a smaller magnitude.
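To see the bias in numbers, here is a minimal sketch. The vectors are made-up values chosen for illustration (the ones in the editor may differ), and `dot` is a hypothetical helper standing in for the broken "fast similarity".

```python
# Hypothetical vectors, assumed for illustration only.
query = [1.0, 1.0, 1.0]
doc_match = [2.0, 2.0, 2.0]   # same direction as the query, small magnitude
doc_long = [10.0, 0.0, 0.0]   # different direction, large magnitude


def dot(a, b):
    """Raw dot product: no normalization, so magnitude inflates the score."""
    return sum(x * y for x, y in zip(a, b))


raw_scores = {
    "doc_match": dot(query, doc_match),
    "doc_long": dot(query, doc_long),
}
raw_winner = max(raw_scores, key=raw_scores.get)

print(raw_scores)
print("winner:", raw_winner)  # doc_long wins on magnitude alone
```

With these assumed values the raw scores are 6.0 for doc_match and 10.0 for doc_long, so the longer vector wins even though it points away from the query.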

In the editor, doc_match points in exactly the same direction as the query (cosine = 1.0). doc_long points in a different direction (cosine ≈ 0.5774) but has a larger magnitude, so the broken function scores it higher. Fix the function by dividing the dot product by the product of the two vector norms.

Expected output:

doc_match: 1.0000
doc_long:  0.5774
winner: doc_match

The break is on lines 4, 5 — but read the whole snippet first.
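One possible shape for the fix, as a sketch rather than the exercise's official solution: the function name `cosine_similarity`, the zero-vector guard, and the vectors are all assumptions. The vectors were picked so the scores line up with the expected output above; the editor's values may differ.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumed convention: a zero vector has no direction, so score it 0.
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vectors, assumed for illustration only.
query = [1.0, 1.0, 1.0]
docs = {
    "doc_match": [2.0, 2.0, 2.0],   # same direction as the query
    "doc_long": [10.0, 0.0, 0.0],   # different direction, larger magnitude
}

scores = {name: cosine_similarity(query, vec) for name, vec in docs.items()}
winner = max(scores, key=scores.get)

print(f"doc_match: {scores['doc_match']:.4f}")
print(f"doc_long:  {scores['doc_long']:.4f}")
print(f"winner: {winner}")
```

Dividing by the norms removes magnitude from the score, so only the angle between the vectors decides the ranking.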
