RAG vs long context vs fine-tune — the decision that's killed more AI startups than any model swap — step 9 of 9
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Invert the rubric. Given a product spec AND a fork someone picked, decide whether the fork fits the product.
Write evaluate_strategy(product, fork) that returns a tuple
(ok: bool, reason: str).
The product spec has the same fields as in step 08:
corpus_size_tokens, update_frequency_days, style_critical,
cost_per_call_max, latency_budget_ms.
The fork is one of: "rag", "long_context", "fine_tune",
"hybrid".
Apply these checks (first failure wins; if none fail, return ok=True):
- If fork is "long_context" AND corpus_size_tokens > 2_000_000: return (False, "corpus too large for long-context window")
- If fork is "fine_tune" AND update_frequency_days <= 7: return (False, "freshness too high for fine-tune")
- If fork is "rag" AND corpus_size_tokens <= 200_000 AND update_frequency_days >= 90: return (False, "RAG overkill for small stable corpus")
- If fork is "hybrid" AND NOT style_critical: return (False, "hybrid wasted without style demand")
- Otherwise return (True, "fork fits product")
Three product-fork pairs are run. Expected output:
Harvey + fine_tune: (False, 'freshness too high for fine-tune')
Glean + long_context: (False, 'corpus too large for long-context window')
Handbook bot + rag: (False, 'RAG overkill for small stable corpus')
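One way to pass this drill is a straight transcription of the rubric, checking each rule in order so the first failure wins. A minimal sketch follows; note that the field values in the three product dicts are invented assumptions chosen to trigger the expected failures, since the real step-08 specs for Harvey, Glean, and the handbook bot aren't shown here.

```python
def evaluate_strategy(product: dict, fork: str) -> tuple[bool, str]:
    """Decide whether a chosen fork fits a product spec. First failing check wins."""
    if fork == "long_context" and product["corpus_size_tokens"] > 2_000_000:
        return (False, "corpus too large for long-context window")
    if fork == "fine_tune" and product["update_frequency_days"] <= 7:
        return (False, "freshness too high for fine-tune")
    if (fork == "rag"
            and product["corpus_size_tokens"] <= 200_000
            and product["update_frequency_days"] >= 90):
        return (False, "RAG overkill for small stable corpus")
    if fork == "hybrid" and not product["style_critical"]:
        return (False, "hybrid wasted without style demand")
    return (True, "fork fits product")


# Hypothetical specs -- illustrative values only, not the real step-08 data.
harvey = {"corpus_size_tokens": 1_500_000, "update_frequency_days": 1,
          "style_critical": True, "cost_per_call_max": 0.50, "latency_budget_ms": 4000}
glean = {"corpus_size_tokens": 50_000_000, "update_frequency_days": 1,
         "style_critical": False, "cost_per_call_max": 0.05, "latency_budget_ms": 1500}
handbook_bot = {"corpus_size_tokens": 80_000, "update_frequency_days": 180,
                "style_critical": False, "cost_per_call_max": 0.01, "latency_budget_ms": 2000}

print(evaluate_strategy(harvey, "fine_tune"))    # (False, 'freshness too high for fine-tune')
print(evaluate_strategy(glean, "long_context"))  # (False, 'corpus too large for long-context window')
print(evaluate_strategy(handbook_bot, "rag"))    # (False, 'RAG overkill for small stable corpus')
```

The rule order matters: a huge corpus with a `"long_context"` fork fails on size before any other check runs, which is exactly the "first failure wins" contract the checker expects.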