RAG vs long context vs fine-tune — the decision that's killed more AI startups than any model swap — step 9 of 9
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Invert the rubric. Given a product spec AND a fork someone picked, decide whether the fork fits the product.
Write evaluate_strategy(product, fork) that returns a tuple
(ok: bool, reason: str).
The product spec has the same fields as in step 08:
corpus_size_tokens, update_frequency_days, style_critical,
cost_per_call_max, latency_budget_ms.
The fork is one of: "rag", "long_context", "fine_tune",
"hybrid".
Apply these checks (first failure wins; if none fail, return ok=True):
- If fork is "long_context" AND corpus_size_tokens > 2_000_000: return (False, "corpus too large for long-context window")
- If fork is "fine_tune" AND update_frequency_days <= 7: return (False, "freshness too high for fine-tune")
- If fork is "rag" AND corpus_size_tokens <= 200_000 AND update_frequency_days >= 90: return (False, "RAG overkill for small stable corpus")
- If fork is "hybrid" AND NOT style_critical: return (False, "hybrid wasted without style demand")
- Otherwise return (True, "fork fits product")
Three product-fork pairs are run. Expected output:
Harvey + fine_tune: (False, 'freshness too high for fine-tune')
Glean + long_context: (False, 'corpus too large for long-context window')
Handbook bot + rag: (False, 'RAG overkill for small stable corpus')
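One way to pass this drill is a straight transcription of the rubric, checking each rule in order so the first failure wins. A minimal sketch follows; note that the field values in the three product dicts are invented assumptions chosen to trigger the expected failures, since the real step-08 specs for Harvey, Glean, and the handbook bot aren't shown here.

```python
def evaluate_strategy(product: dict, fork: str) -> tuple[bool, str]:
    """Decide whether a chosen fork fits a product spec. First failing check wins."""
    if fork == "long_context" and product["corpus_size_tokens"] > 2_000_000:
        return (False, "corpus too large for long-context window")
    if fork == "fine_tune" and product["update_frequency_days"] <= 7:
        return (False, "freshness too high for fine-tune")
    if (fork == "rag"
            and product["corpus_size_tokens"] <= 200_000
            and product["update_frequency_days"] >= 90):
        return (False, "RAG overkill for small stable corpus")
    if fork == "hybrid" and not product["style_critical"]:
        return (False, "hybrid wasted without style demand")
    return (True, "fork fits product")


# Hypothetical specs -- illustrative values only, not the real step-08 data.
harvey = {"corpus_size_tokens": 1_500_000, "update_frequency_days": 1,
          "style_critical": True, "cost_per_call_max": 0.50, "latency_budget_ms": 4000}
glean = {"corpus_size_tokens": 50_000_000, "update_frequency_days": 1,
         "style_critical": False, "cost_per_call_max": 0.05, "latency_budget_ms": 1500}
handbook_bot = {"corpus_size_tokens": 80_000, "update_frequency_days": 180,
                "style_critical": False, "cost_per_call_max": 0.01, "latency_budget_ms": 2000}

print(evaluate_strategy(harvey, "fine_tune"))    # (False, 'freshness too high for fine-tune')
print(evaluate_strategy(glean, "long_context"))  # (False, 'corpus too large for long-context window')
print(evaluate_strategy(handbook_bot, "rag"))    # (False, 'RAG overkill for small stable corpus')
```

The rule order matters: a huge corpus with a `"long_context"` fork fails on size before any other check runs, which is exactly the "first failure wins" contract the checker expects.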