promptdojo_

Write pick_fork(spec) that takes a product spec (dict) and returns one of: "rag", "long_context", "fine_tune", "hybrid".

Spec fields you'll be given:

  • corpus_size_tokens: int (how big the corpus is in tokens)
  • update_frequency_days: int (how often the corpus changes; 9999 means "essentially never")
  • style_critical: bool (does the output need to read like a domain expert wrote it?)
  • cost_per_call_max: float (USD ceiling per query)
  • latency_budget_ms: int (max acceptable latency)

Apply the rubric in this order (first match wins):

  1. If corpus_size_tokens > 2_000_000: return "rag" (doesn't fit in any context window, RAG by physics).
  2. If style_critical AND update_frequency_days <= 7: return "hybrid" (style + freshness both matter).
  3. If style_critical: return "fine_tune" (style is the product, corpus is stable enough).
  4. If update_frequency_days <= 7: return "rag" (freshness pressure, no style demand).
  5. If corpus_size_tokens <= 2_000_000: return "long_context" (stable, fits in window).
  6. Fallback: return "rag".

Two products are demoed. Expected output:

Glean (enterprise search):       rag
Harvey (legal AI):               hybrid
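A minimal sketch of the rubric above, checking the rules in order so the first match wins. The spec values for Glean and Harvey are not given in the prompt; the dicts below are hypothetical values chosen to trigger rule 1 and rule 2 respectively, reproducing the expected output:

```python
def pick_fork(spec: dict) -> str:
    """Pick a serving strategy per the rubric; rules checked in order, first match wins."""
    if spec["corpus_size_tokens"] > 2_000_000:
        return "rag"           # rule 1: doesn't fit in any context window
    if spec["style_critical"] and spec["update_frequency_days"] <= 7:
        return "hybrid"        # rule 2: style + freshness both matter
    if spec["style_critical"]:
        return "fine_tune"     # rule 3: style is the product, corpus is stable
    if spec["update_frequency_days"] <= 7:
        return "rag"           # rule 4: freshness pressure, no style demand
    if spec["corpus_size_tokens"] <= 2_000_000:
        return "long_context"  # rule 5: stable, fits in window
    return "rag"               # rule 6: fallback (unreachable: rules 1 and 5 cover all sizes)

# Hypothetical specs (values assumed, not from the prompt):
glean = {"corpus_size_tokens": 50_000_000, "update_frequency_days": 1,
         "style_critical": False, "cost_per_call_max": 0.05, "latency_budget_ms": 800}
harvey = {"corpus_size_tokens": 1_500_000, "update_frequency_days": 3,
          "style_critical": True, "cost_per_call_max": 2.00, "latency_budget_ms": 5000}

print(pick_fork(glean))   # rag    (rule 1: 50M tokens exceeds 2M)
print(pick_fork(harvey))  # hybrid (rule 2: style-critical and updated weekly)
```

Note that rule 6 can never fire: every corpus size falls under either rule 1 or rule 5, so any implementation should treat it purely as a defensive default.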
