How evals went from research curiosity to the only thing that ships — a five-year history — step 7 of 8
Write eval_readiness(team) that takes a team profile (dict) and
returns a dict with two fields:
- score: integer 0-100, higher means MORE eval discipline (good)
- verdict: string, one of:
  - "eval-mature" if score >= 80
  - "eval-aware" if score >= 50
  - "eval-curious" if score >= 20
  - "vibes era" if score < 20
Score the team on these signals (each adds points to the readiness total):
- has_test_set is True: add 25 (Hamel: you do not have a product without one)
- has_judge_prompt is True: add 15 (rubric or LLM-as-judge defined somewhere)
- ci_runs_evals is True: add 25 (the regression gate)
- tracks_eval_history is True: add 15 (can compare runs over time)
- eval_count_per_feature >= 20: add 20 (Anthropic's 50-is-plenty floor)
Two example teams run through it. Expected output:
EvalMatureCo: {'score': 100, 'verdict': 'eval-mature'}
VibeCo: {'score': 0, 'verdict': 'vibes era'}
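A minimal sketch of one way to satisfy this spec. The function name, signal keys, points, and verdict thresholds come from the description above; the two example profiles (eval_mature_co, vibe_co) are illustrative shapes I've assumed, since the prompt doesn't show the input dicts.

```python
def eval_readiness(team):
    """Score a team's eval discipline from a profile dict."""
    score = 0
    # Boolean signals: (key, points awarded when True)
    boolean_signals = [
        ("has_test_set", 25),         # Hamel: no test set, no product
        ("has_judge_prompt", 15),     # rubric or LLM-as-judge defined
        ("ci_runs_evals", 25),        # the regression gate
        ("tracks_eval_history", 15),  # can compare runs over time
    ]
    for key, points in boolean_signals:
        if team.get(key) is True:
            score += points
    # Threshold signal: enough evals per feature
    if team.get("eval_count_per_feature", 0) >= 20:
        score += 20

    # Map the score onto the verdict bands from the spec
    if score >= 80:
        verdict = "eval-mature"
    elif score >= 50:
        verdict = "eval-aware"
    elif score >= 20:
        verdict = "eval-curious"
    else:
        verdict = "vibes era"
    return {"score": score, "verdict": verdict}


# Illustrative profiles (assumed, not given in the prompt):
eval_mature_co = {
    "has_test_set": True,
    "has_judge_prompt": True,
    "ci_runs_evals": True,
    "tracks_eval_history": True,
    "eval_count_per_feature": 60,
}
vibe_co = {
    "has_test_set": False,
    "has_judge_prompt": False,
    "ci_runs_evals": False,
    "tracks_eval_history": False,
    "eval_count_per_feature": 3,
}

print("EvalMatureCo:", eval_readiness(eval_mature_co))
# -> {'score': 100, 'verdict': 'eval-mature'}
print("VibeCo:", eval_readiness(vibe_co))
# -> {'score': 0, 'verdict': 'vibes era'}
```

Note the five point values sum to exactly 100, so a team with every signal maxes out the score; using team.get() keeps the sketch tolerant of profiles that omit a key.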