Gradient descent, by hand

The "step" in a training loop has a name: gradient descent. The idea is physical — you're standing on a hilly loss surface and want the bottom. The gradient tells you the uphill direction and how steep it is; you take a step the opposite way. Do it again and again and you walk downhill to low loss.

w_next = w - learning_rate * gradient

Run the editor: w climbs toward 4 and the loss shrinks each step.
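If you don't have the editor in front of you, here is a minimal sketch of what such a run might look like. The loss `(w - 4) ** 2`, the starting point `w = 0.0`, and the learning rate `0.1` are assumptions picked so the bottom of the hill sits at 4; the editor's actual code may differ.

```python
# Assumed toy problem: loss(w) = (w - 4)^2, whose minimum is at w = 4.
def loss(w):
    return (w - 4.0) ** 2


def gradient(w):
    # Derivative of (w - 4)^2 with respect to w.
    return 2.0 * (w - 4.0)


def descend(w=0.0, learning_rate=0.1, steps=20):
    """Run plain gradient descent and return the (w, loss) history."""
    history = [(w, loss(w))]
    for _ in range(steps):
        w = w - learning_rate * gradient(w)  # w_next = w - learning_rate * gradient
        history.append((w, loss(w)))
    return history


if __name__ == "__main__":
    for step, (w, current_loss) in enumerate(descend()):
        print(f"step {step:2d}  w = {w:.4f}  loss = {current_loss:.6f}")
```

With these assumed values, each step closes 20% of the remaining distance to 4, so w rises and the loss falls on every step.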

The learning rate is the whole game

The learning rate (lr) scales each step:

Too small → tiny steps; training crawls and may never arrive in the steps you budgeted.
Just right → steady march down to low loss.
Too big → you leap past the bottom and land higher up the opposite slope. Each step overshoots worse than the last and the loss diverges (blows up to huge numbers, then inf or NaN). This is one of the most common causes of an exploding loss.
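The three cases can be sketched on one toy problem. The loss `(w - 4) ** 2` and the specific learning rates `0.001`, `0.1`, and `1.1` are assumptions chosen for illustration; which values count as too small or too big depends entirely on the loss surface.

```python
def run(learning_rate, steps=25, w=0.0):
    """Gradient descent on the assumed loss (w - 4)^2; returns the loss at each step."""
    losses = []
    for _ in range(steps):
        losses.append((w - 4.0) ** 2)
        grad = 2.0 * (w - 4.0)           # derivative of (w - 4)^2
        w = w - learning_rate * grad     # one gradient descent step
    losses.append((w - 4.0) ** 2)        # loss after the final step
    return losses


if __name__ == "__main__":
    # Illustrative learning rates only; not recommendations for real models.
    for label, lr in [("too small", 0.001), ("just right", 0.1), ("too big", 1.1)]:
        losses = run(lr)
        print(f"{label:10s} lr={lr:<6} first loss={losses[0]:.3f} last loss={losses[-1]:.3f}")
```

On this toy loss, each step multiplies the distance to the minimum by `1 - 2 * learning_rate`. At 0.001 that factor is 0.998, so the loss hardly changes; at 0.1 it is 0.8, so the loss shrinks steadily; at 1.1 it is -1.2, so w flips sides and lands farther away every step.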

Why a builder cares

You're not deriving gradients — you read them. What you tune (or watch an AI tune) is the learning rate, and the two failure modes are "loss barely moves" (lr too small) and "loss explodes" (lr too big). Recognizing those two shapes in a training log is the practical skill. The next steps let you feel both by changing one number.
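As a rough illustration of reading those two shapes out of a log, the sketch below labels a list of per-step losses. The function name `diagnose` and both thresholds are hypothetical, and it assumes positive loss values; treat it as a heuristic, since a flat or rising loss can have causes other than the learning rate.

```python
import math


def diagnose(losses, stall_ratio=0.99, explode_ratio=10.0):
    """Give a rough verdict on a list of positive per-step losses.

    stall_ratio and explode_ratio are hypothetical thresholds, not standards.
    """
    if any(math.isnan(x) or math.isinf(x) for x in losses):
        return "exploding"        # inf or NaN: the loss has already diverged
    first, last = losses[0], losses[-1]
    if last > explode_ratio * first:
        return "exploding"        # loss grew more than 10x: lr is likely too big
    if last > stall_ratio * first:
        return "barely moving"    # loss fell by under 1% (or rose a little): lr may be too small
    return "decreasing"


if __name__ == "__main__":
    print(diagnose([16.0, 15.99, 15.98]))       # tiny improvements
    print(diagnose([16.0, 23.0, 33.2, 400.0]))  # growing every step
    print(diagnose([16.0, 10.2, 6.6, 0.5]))     # healthy descent
```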