Checkpoint
One last thing before we move on. Same surface as a write step — but the lesson doesn't complete until this passes.
Checkpoint. Prove training works by watching the loss fall. Write
train(w, x, y, lr) that records the loss before training, runs two
gradient-descent steps on pred = w * x, and returns:
loss 64.00 -> 8.29
(loss before -> loss after 2 steps, each to 2 decimals). Loss is
(w*x - y)**2; each step uses gradient 2*(w*x - y)*x and update
w = w - lr*grad.