Optimizers and learning-rate schedulers — step 2 of 7
ch 43 · training loops, backprop, optimizers, and schedulers
phase 08 · ai/ml engineering buildout
lesson 3 of 5 · optimizers and learning-rate schedulers
Why do most training runs decay the learning rate over time (larger early, smaller later)?
1. Big steps early cover ground fast; small steps late settle into the minimum instead of bouncing around it.
2. A smaller learning rate later makes the model train faster overall.
3. Decaying the learning rate adds more training data.
4. It removes the need for an optimizer.
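
To weigh the options, it can help to watch step size in action. The sketch below is a toy illustration, not part of the lesson: it assumes a one-dimensional loss |x| with its minimum at 0, a starting point of 3.2, and an exponential decay factor of 0.9 per step. The names `run` and `lr_at` are hypothetical helpers chosen for this example.

```python
def sign(x):
    """Return -1, 0, or 1: the (sub)gradient of the toy loss |x|."""
    return (x > 0) - (x < 0)


def run(lr_at, x0=3.2, steps=60):
    """Gradient descent on |x|; lr_at(t) gives the learning rate at step t."""
    x = x0
    history = []
    for t in range(steps):
        x -= lr_at(t) * sign(x)  # move against the gradient
        history.append(x)
    return history


# Three schedules (all values are assumptions picked for illustration).
large_constant = run(lambda t: 0.5)       # big steps the whole way
small_constant = run(lambda t: 0.02)      # small steps the whole way
decayed = run(lambda t: 0.5 * 0.9 ** t)   # big early, smaller later

for name, hist in [
    ("large constant", large_constant),
    ("small constant", small_constant),
    ("decayed", decayed),
]:
    print(f"{name:>15}: final distance from minimum = {abs(hist[-1]):.4f}")
```

In this toy setup the large constant rate reaches the minimum quickly but then hops back and forth across it, the small constant rate is still far away after 60 steps, and the decayed rate ends very close to 0. Real losses are noisier and higher-dimensional, so treat this as intuition only.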