How evals went from research curiosity to the only thing that ships — a five-year history — step 8 of 8
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Final drill. Synthesize the chapter's history into one function:
triage_teams(teams) that takes a list of team profiles and returns
a dict with two fields:
- verdicts: a dict mapping each team's name to its verdict string (same four tiers as step 07)
- most_at_risk: the name of the team with the LOWEST readiness score (the team most likely to break on the next model launch)
Score each team using the same rules as step 07:
- has_test_set True: +25
- has_judge_prompt True: +15
- ci_runs_evals True: +25
- tracks_eval_history True: +15
- eval_count_per_feature >= 20: +20
Verdict tiers: >=80 "eval-mature", >=50 "eval-aware", >=20
"eval-curious", else "vibes era".
On ties for lowest score, return the FIRST tied team (Python's
min with key= returns the first minimal element, so insertion
order breaks ties).
Five teams run. Expected output:
verdicts: {'CursorClone': 'eval-mature', 'SeriesBAI': 'eval-mature', 'SeedAI': 'eval-curious', 'ConsultingShop': 'eval-curious', 'PromptShopCo': 'vibes era'}
most at risk: PromptShopCo
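One way the drill can be solved, as a sketch: score each team with step 07's rules, map scores to verdict tiers, and let min pick the first lowest-scoring team. The sample team profiles below are hypothetical, constructed only to reproduce the expected verdicts above.

```python
def score_team(team):
    """Readiness score using step 07's rules (max 100)."""
    score = 0
    if team.get("has_test_set"):
        score += 25
    if team.get("has_judge_prompt"):
        score += 15
    if team.get("ci_runs_evals"):
        score += 25
    if team.get("tracks_eval_history"):
        score += 15
    if team.get("eval_count_per_feature", 0) >= 20:
        score += 20
    return score


def verdict(score):
    """Map a score to one of the four verdict tiers."""
    if score >= 80:
        return "eval-mature"
    if score >= 50:
        return "eval-aware"
    if score >= 20:
        return "eval-curious"
    return "vibes era"


def triage_teams(teams):
    scores = {t["name"]: score_team(t) for t in teams}
    return {
        "verdicts": {name: verdict(s) for name, s in scores.items()},
        # min() returns the first minimal key in iteration order,
        # so the FIRST tied team wins on ties for lowest score.
        "most_at_risk": min(scores, key=scores.get),
    }


# Hypothetical profiles chosen to match the expected output:
teams = [
    {"name": "CursorClone", "has_test_set": True, "has_judge_prompt": True,
     "ci_runs_evals": True, "tracks_eval_history": True, "eval_count_per_feature": 40},
    {"name": "SeriesBAI", "has_test_set": True, "has_judge_prompt": True,
     "ci_runs_evals": True, "tracks_eval_history": True, "eval_count_per_feature": 10},
    {"name": "SeedAI", "has_test_set": True, "has_judge_prompt": False,
     "ci_runs_evals": False, "tracks_eval_history": False, "eval_count_per_feature": 5},
    {"name": "ConsultingShop", "has_test_set": False, "has_judge_prompt": True,
     "ci_runs_evals": False, "tracks_eval_history": True, "eval_count_per_feature": 0},
    {"name": "PromptShopCo", "has_test_set": False, "has_judge_prompt": False,
     "ci_runs_evals": False, "tracks_eval_history": False, "eval_count_per_feature": 0},
]

result = triage_teams(teams)
```

Note the helper names (score_team, verdict) and the exact field values in the sample teams are assumptions; only triage_teams and its return shape are specified by the drill.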