Distributions, sampling, and variance (step 1/7) · ml math and statistics that actually show up

Distributions, sampling, and variance

A distribution is the picture of where your numbers land: bunched near one value, or spread out? Two datasets can share the same average and still be completely different — one calm, one wild.

The mean tells you the center. Variance tells you the spread: it is the average squared distance from the mean. You square each distance so values above and below the mean do not cancel out, then average those squares. Bigger variance means more spread. (Take the square root and you get the standard deviation, back in the original units.)
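To make the recipe above concrete, here is a minimal sketch in Python. The two lists `calm` and `wild` are made-up illustration data, and `variance_by_hand` is a hypothetical helper name. It follows the text's definition and divides by the number of values (the population formula); note that `statistics.variance` in the standard library divides by n - 1 instead, so it would give slightly larger numbers.

```python
from statistics import fmean
import math

# Two made-up datasets with the same mean (50) but very different spread.
calm = [48, 49, 50, 51, 52]
wild = [10, 30, 50, 70, 90]

def variance_by_hand(values):
    """Average squared distance from the mean (divides by n)."""
    center = fmean(values)
    squared_distances = [(v - center) ** 2 for v in values]
    return fmean(squared_distances)

for name, data in [("calm", calm), ("wild", wild)]:
    var = variance_by_hand(data)
    std = math.sqrt(var)  # standard deviation, back in the original units
    print(f"{name}: mean={fmean(data)} variance={var} std={std:.2f}")
```

Both lists report the same mean, but the variance of `wild` is far larger, which is the "one calm, one wild" contrast in numbers.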

Variance is not the range. Range is just max - min, so it depends on only two points. Variance uses every value's distance from the center, so no single point defines the spread on its own. It is still sensitive to outliers, though: squaring makes far-away values count heavily.
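A small sketch of the difference, using two made-up lists that share the same minimum and maximum. The names `edges_only`, `spread_out`, and `value_range` are hypothetical, chosen only for this illustration; `pvariance` is the standard library's population variance (divide by n).

```python
from statistics import pvariance

def value_range(values):
    """Range looks at only two points: the largest and the smallest."""
    return max(values) - min(values)

# Same min (0) and max (100), but the middle values sit very differently.
edges_only = [0, 50, 50, 50, 50, 50, 100]
spread_out = [0, 10, 30, 50, 70, 90, 100]

print(value_range(edges_only), value_range(spread_out))  # both are 100
print(pvariance(edges_only), pvariance(spread_out))      # the variances differ
```

The range cannot tell these two lists apart, while the variance is larger for the list whose middle values sit farther from the center.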

You also rarely measure everyone. You take a sample (a subset) and hope it reflects the whole. A sample mean approximates the true mean, but small samples are noisy: the fewer points you draw, the more your estimate jumps around. For independent draws, the typical error of a sample mean shrinks roughly in proportion to 1/sqrt(n), so you need about four times the data to halve the noise. Remember this before trusting a number computed from a handful of rows.
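One way to see the noise in small samples is to simulate it. The sketch below is an illustration under stated assumptions: the population is synthetic (10,000 values drawn from a normal distribution with mean 100 and standard deviation 15), and `spread_of_sample_means` is a hypothetical helper. It repeatedly draws samples of a given size and measures how much the sample means vary from one draw to the next.

```python
import random
from statistics import fmean, pstdev

def spread_of_sample_means(population, sample_size, trials, rng):
    """Draw `trials` samples without replacement and return the
    standard deviation of their means (how much the estimate jumps)."""
    means = [fmean(rng.sample(population, sample_size)) for _ in range(trials)]
    return pstdev(means)

rng = random.Random(0)  # fixed seed so reruns give the same numbers
# Synthetic population: an assumption for illustration, not real data.
population = [rng.gauss(100, 15) for _ in range(10_000)]

spreads = {n: spread_of_sample_means(population, n, 1000, rng) for n in (5, 50, 500)}
for n, spread in spreads.items():
    print(f"sample size {n:>3}: sample means vary by about {spread:.2f}")
```

The spread of the sample means should shrink as the sample size grows: an estimate from 5 rows moves around far more than one from 500.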
