This step subtracts the full, unscaled gradient and ignores the learning
rate, so every update is far too large, which is the classic recipe for
an exploding loss. Fix line 3 to scale the gradient by lr before subtracting.
Expected output:
4.6
The bug is on line 3, but read the whole snippet first.
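
For comparison, here is a minimal sketch of the two update rules side by side. The function name sgd_step and the starting values (w = 5.0, grad = 4.0, lr = 0.1) are assumptions made for illustration; they are not taken from the exercise snippet and may differ from it.

```python
def sgd_step(w, grad, lr):
    """Return the parameter after one gradient-descent update."""
    # Scale the gradient by the learning rate before subtracting it.
    return w - lr * grad


# Hypothetical values, chosen only for illustration.
w, grad, lr = 5.0, 4.0, 0.1

buggy = w - grad               # ignores lr: the step is the full gradient
fixed = sgd_step(w, grad, lr)  # the step is lr * grad

print(round(buggy, 2), round(fixed, 2))
```

With these assumed values the unscaled update jumps from 5.0 to 1.0, while the scaled update moves only to about 4.6. The step size differs by a factor of 1 / lr, which is why dropping lr tends to make the loss diverge.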