The harness's layer-2 code calls the model exactly once. If the call hits a transient 429 (rate limit) or 503 (service unavailable), it raises and the user sees a crash. Production harnesses retry transient failures with exponential backoff.
Fix call_with_retries(fn, max_attempts=3) to retry up to 3
times on transient errors (any exception with transient=True
attribute). Return the first successful result. Re-raise on
the final attempt failure.
Expected output:
succeeded on attempt 3
The break is on lines 13, 14 — but read the whole snippet first.