promptdojo_

Distributions, sampling, and variance

An average is useful, but it can hide the shape of the data. Two samples can have the same mean while one is tightly clustered and the other swings wildly.
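As a minimal illustration, the sketch below uses two made-up samples that share a mean of 50 and prints their standard deviation and range with Python's standard `statistics` module. The sample values are hypothetical and were chosen only to make the contrast obvious.

```python
import statistics

# Two hypothetical samples built to share the same mean (50).
clustered = [48, 49, 50, 51, 52]
volatile = [10, 30, 50, 70, 90]

for name, sample in [("clustered", clustered), ("volatile", volatile)]:
    mean = statistics.mean(sample)
    stdev = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    print(f"{name}: mean={mean:.1f}, stdev={stdev:.2f}, range={min(sample)}-{max(sample)}")

# clustered: mean=50.0, stdev=1.58, range=48-52
# volatile: mean=50.0, stdev=31.62, range=10-90
```

The mean alone cannot tell these two samples apart. The standard deviation and the range can.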

Before you trust a metric, look at the spread. In production ML work, spread problems show up as small sample sizes, outliers, large differences between segments, and eval results that shift from run to run.
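Here is a small sketch of three of those problems, using hypothetical data. A single slow request drags the mean latency far above the median. An overall accuracy of 0.80 hides a segment that scores much lower, and that segment's number rests on only two examples. The latencies, segment names, and records are all invented for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical request latencies in ms; one request hit a slow path.
latencies = [120, 118, 125, 122, 119, 121, 2400]
print(f"mean latency:   {statistics.mean(latencies):.1f} ms")  # 446.4 ms, pulled up by one outlier
print(f"median latency: {statistics.median(latencies)} ms")    # 121 ms, close to a typical request

# Hypothetical eval records: (segment, was_correct).
records = [("en", True)] * 7 + [("en", False)] + [("es", True), ("es", False)]

# Group outcomes by segment so each one can be scored separately.
by_segment = defaultdict(list)
for segment, correct in records:
    by_segment[segment].append(correct)

overall = sum(correct for _, correct in records) / len(records)
print(f"overall: {overall:.2f} (n={len(records)})")  # overall: 0.80 (n=10)
for segment, outcomes in sorted(by_segment.items()):
    accuracy = sum(outcomes) / len(outcomes)
    # en: 0.88 (n=8), es: 0.50 (n=2); the es figure rests on just two examples
    print(f"{segment}: {accuracy:.2f} (n={len(outcomes)})")
```

Reporting `n` next to every per-segment number makes it obvious which numbers are too thin to act on.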

The builder move is to ask, "What does the distribution look like, and how much could this number move if the sample changed?"
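One common way to estimate how much a number could move is the percentile bootstrap. The lesson does not prescribe a method; this is one standard technique, shown here as a sketch. You resample the data with replacement many times, recompute the metric each time, and read off the middle 95% of the results. The helper `bootstrap_interval` and the 0/1 eval scores below are hypothetical.

```python
import random
import statistics


def bootstrap_interval(sample, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap interval for `stat` over `sample`."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Recompute the statistic on many same-size resamples drawn with replacement.
    estimates = sorted(
        stat(rng.choices(sample, k=len(sample))) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled statistics.
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical per-example eval scores (1 = correct), both at 70% accuracy.
small = [1] * 14 + [0] * 6     # 20 examples
large = [1] * 140 + [0] * 60   # 200 examples

for name, scores in [("n=20", small), ("n=200", large)]:
    lo, hi = bootstrap_interval(scores)
    print(f"{name}: accuracy={statistics.fmean(scores):.2f}, ~95% interval [{lo:.2f}, {hi:.2f}]")
```

Both samples report the same accuracy, but the interval for 20 examples is far wider than the one for 200. This sketch assumes the examples are independent. If they are clustered (for example, many examples per user), resampling individual rows will understate how much the number could move.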