This message has the user's question (variable) BEFORE the few-shot examples (stable). The cache prefix tries to match starting from the user's question — which differs every call — so the cache never fires. Hit rate: 0%.
Fix the message so the stable few-shot examples come FIRST (with
cache_control), then the user's question comes last with no
cache_control. Now the prefix is stable and cache fires on every
call after the first.
Expected output:
first block: text? True, cached? True
second block: text? True, cached? False
The break is on lines 7, 8, 9, 10 — but read the whole snippet first.