Gradient descent by hand (step 5/7) · training loops, backprop, optimizers, and schedulers


This step subtracts the full gradient and ignores the learning rate, so every update is far too large, which is the classic recipe for an exploding loss. Fix line 3 so that the gradient is scaled by lr before it is subtracted.

Expected output:

4.6

The bug is on line 3, but read the whole snippet first.
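For reference, here is a minimal sketch of a single gradient descent update with the learning rate applied. The exercise's own snippet is not reproduced here, so everything in this block is an assumption: the function name `sgd_step`, the loss `(w - 3) ** 2`, and the values `w = 5.0` and `lr = 0.1` are hypothetical, chosen only because they happen to give the expected output of 4.6.

```python
def sgd_step(w: float, grad: float, lr: float) -> float:
    """Return the weight after one gradient descent update.

    Hypothetical helper for illustration: the gradient is scaled
    by the learning rate before it is subtracted.
    """
    return w - lr * grad


# Assumed toy problem: loss(w) = (w - 3) ** 2, so d(loss)/dw = 2 * (w - 3).
w = 5.0
lr = 0.1
grad = 2 * (w - 3)  # 4.0 at w = 5.0

# Buggy form for comparison: w - grad would jump from 5.0 to 1.0,
# overshooting the minimum at w = 3.
w = sgd_step(w, grad, lr)
print(round(w, 2))  # 4.6
```

With the learning rate in place, the weight moves a small distance toward the minimum instead of jumping past it.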


