Why do most training runs decay the learning rate over time — larger early, smaller later?
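
To make the "larger early, smaller later" shape concrete, here is a minimal sketch of one possible decay schedule. The cosine shape, the function name `cosine_decay`, and the values `lr_max=0.1` and `lr_min=0.001` are assumptions chosen for illustration; the question itself does not name a particular schedule.

```python
import math


def cosine_decay(step: int, total_steps: int,
                 lr_max: float = 0.1, lr_min: float = 0.001) -> float:
    """Hypothetical schedule: anneal from lr_max to lr_min along a half cosine."""
    # Clamp progress to [0, 1] so steps past the end stay at lr_min.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Learning rate for each of 100 training steps (plus the final endpoint).
schedule = [cosine_decay(s, 100) for s in range(101)]

for s in (0, 25, 50, 75, 100):
    print(f"step {s:3d}: lr = {schedule[s]:.4f}")
```

The early steps take large updates that cover ground quickly, while the late steps take small updates that settle rather than bounce around. Linear, step-wise, or exponential decay would show the same overall trend.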