The training loop: forward, loss, backward, step
Every training script — from a 10-line demo to a frontier model — runs the same four-beat loop, over and over:
- Forward: run the model on the input: pred = w * x.
- Loss: measure how wrong the prediction is: (pred - y)².
- Backward: compute the gradient (which way to nudge each parameter to lower the loss). In PyTorch this is loss.backward().
- Step: update the parameters against the gradient: w = w - lr * grad. In PyTorch, optimizer.step().
Then repeat. Run the editor: the loss falls from 36 toward 0 as the loop repeats — that downward march is training.
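As a minimal by-hand sketch of the four beats, the loop below fits a single weight. The data point (x = 2, y = 6), the starting weight of 0, and the learning rate of 0.1 are assumptions chosen so that the first loss comes out to 36; they are not necessarily the values used in the lesson's editor.

```python
# Hypothetical one-parameter problem: learn w so that w * x matches y.
x, y = 2.0, 6.0   # assumed data point; the ideal w is 3.0
w = 0.0           # assumed starting weight, so the first loss is (0 - 6)^2 = 36
lr = 0.1          # assumed learning rate

losses = []
for step in range(5):
    pred = w * x                 # forward: run the model on the input
    loss = (pred - y) ** 2       # loss: squared error of the prediction
    grad = 2 * (pred - y) * x    # backward: d(loss)/dw via the chain rule
    w = w - lr * grad            # step: move w against the gradient
    losses.append(loss)
    print(f"step {step}: loss={loss:.4f}  w={w:.4f}")
```

Each pass prints a smaller loss than the one before, and w drifts toward 3.0, the value that makes w * x equal y.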
The order matters, and so does repeating
The four steps must run in that order, every iteration: you can't update
before you've computed the gradient, and you must recompute pred
each loop (a stale prediction is the classic "loss won't move" bug).
In real PyTorch you'll also see optimizer.zero_grad() at the top of the
loop — it clears last iteration's gradients so they don't accumulate. We
skip it here because our by-hand grad is freshly computed each time, but
recognize it when you read real code.
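For comparison, here is roughly how the same loop might look in PyTorch, with optimizer.zero_grad() at the top. This is a sketch under assumptions: the toy data, the single-weight model, and the choice of torch.optim.SGD with a learning rate of 0.1 are illustrative, not taken from the lesson.

```python
import torch

# Assumed toy data: one input and one target, so the ideal weight is 3.0.
x = torch.tensor([2.0])
y = torch.tensor([6.0])

w = torch.zeros(1, requires_grad=True)     # the single learnable parameter
optimizer = torch.optim.SGD([w], lr=0.1)   # plain gradient descent

losses = []
for step in range(5):
    optimizer.zero_grad()             # clear gradients left over from the last iteration
    pred = w * x                      # forward
    loss = ((pred - y) ** 2).mean()   # loss
    loss.backward()                   # backward: autograd fills in w.grad
    optimizer.step()                  # step: w = w - lr * w.grad
    losses.append(loss.item())
    print(f"step {step}: loss={loss.item():.4f}  w={w.item():.4f}")
```

The call to zero_grad() matters here because loss.backward() adds to whatever is already stored in w.grad; without the reset, each step would use the sum of all earlier gradients.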
Why a builder reads loops, not writes them
You won't hand-roll training loops at work, but you'll read them
constantly when an AI writes one, and the bugs are almost always loop
bugs: wrong order, a missing zero_grad, a prediction computed once
outside the loop, or a learning rate that's too big. Knowing the four
beats, and that they repeat, is enough to spot the first three; a learning
rate that's too big shows up as a loss that grows or swings instead of falling.
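Two of those bugs are easy to reproduce by hand. The sketch below reuses a by-hand loop on assumed toy data (x = 2, y = 6, starting at w = 0); the train helper and its stale_pred flag are hypothetical names introduced only for this illustration.

```python
def train(lr, steps=5, stale_pred=False):
    """Run the by-hand loop and return the loss recorded at each step."""
    x, y = 2.0, 6.0   # assumed toy data; the ideal w is 3.0
    w = 0.0
    pred = w * x      # computed once here; reused only when stale_pred is True
    losses = []
    for _ in range(steps):
        if not stale_pred:
            pred = w * x             # correct: recompute the prediction each iteration
        loss = (pred - y) ** 2
        grad = 2 * (pred - y) * x
        w = w - lr * grad
        losses.append(loss)
    return losses

healthy = train(lr=0.1)                   # loss shrinks every step
stale = train(lr=0.1, stale_pred=True)    # bug: prediction computed outside the loop
too_big = train(lr=0.3)                   # bug: learning rate too large for this problem

print("healthy:", healthy)
print("stale:  ", stale)
print("too_big:", too_big)
```

In the stale run the loss is identical on every step even though w keeps changing, which is the "loss won't move" symptom. In the too_big run each update overshoots the target by more than the previous error, so the loss grows instead of falling.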