Broadcasting and vectorization

Two words show up everywhere in tensor code, and both are simpler than they sound.

Vectorization: one operation over the whole tensor

Instead of looping element-by-element, you express the operation once and it applies to the entire tensor. In PyTorch you'd write prices * 1.1; here we model it with a comprehension. The payoff is twofold: the code reads like the math (scores = features @ weights), and the real libraries run it far faster than a Python for loop because the work happens in optimized native code, not one element at a time.
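As a rough sketch of that idea, the plain-Python model below writes each operation once over the whole list. The names scale and matvec are hypothetical helpers made up for illustration, not PyTorch functions; in PyTorch the same two steps would be prices * 1.1 and features @ weights.

```python
def scale(values, factor):
    """Model of `values * factor`: one expression covers every element."""
    return [v * factor for v in values]


def matvec(matrix, vector):
    """Model of `matrix @ vector`: one dot product per row of the matrix."""
    return [sum(x * w for x, w in zip(row, vector)) for row in matrix]


prices = [10.0, 20.0, 30.0]
raised = scale(prices, 1.1)  # every price increased by about 10%

features = [[1, 2], [3, 4]]
weights = [10, 1]
scores = matvec(features, weights)
print(scores)  # [12, 34]
```

The comprehension still loops in Python under the hood, so this models only the readability half of the payoff; the speed comes from the real library running the same expression in native code.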

Broadcasting: stretching a smaller shape to fit

You often want to combine tensors of different shapes: add one bias number to every feature, or add a row of biases to every row of a batch. Broadcasting is the rule that, conceptually, stretches the smaller operand to match the bigger one:

vector + scalar → the scalar is applied to every element.
matrix + row_vector → the row is added to every row of the matrix (the row's length must equal the matrix's column count).

matrix = [[1, 2, 3],          bias = [10, 20, 30]
          [4, 5, 6]]
matrix + bias  ->  [[11, 22, 33],
                    [14, 25, 36]]   # bias added to each row
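
The two cases above can be modeled in plain Python along these lines. This is a minimal sketch: add_scalar and add_row are hypothetical helper names for illustration, and a real tensor library handles both cases with a plain + instead.

```python
def add_scalar(vector, scalar):
    """vector + scalar: the scalar is applied to every element."""
    return [v + scalar for v in vector]


def add_row(matrix, row):
    """matrix + row_vector: the row is added to every row of the matrix."""
    for matrix_row in matrix:
        # The row's length must equal the matrix's column count.
        if len(matrix_row) != len(row):
            raise ValueError(
                f"row of length {len(row)} does not fit "
                f"a matrix with {len(matrix_row)} columns"
            )
    return [[m + b for m, b in zip(matrix_row, row)] for matrix_row in matrix]


matrix = [[1, 2, 3],
          [4, 5, 6]]
bias = [10, 20, 30]

print(add_row(matrix, bias))      # [[11, 22, 33], [14, 25, 36]]
print(add_scalar([1, 2, 3], 10))  # [11, 12, 13]
```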

Why a builder cares

Broadcasting is the rule behind both the magic ("why did adding a length-3 vector to a 2×3 matrix work?") and the bugs ("why did adding a length-2 vector to it crash?"). The shapes have to be compatible: line them up from the last dimension backward, and each pair of sizes must either match or include a 1 (a missing leading dimension counts as 1). A length-3 vector lines up with the 3 columns of a 2×3 matrix; a length-2 vector does not. When AI-written tensor code throws a broadcasting error, you're not debugging calculus; you're checking that the shapes line up, exactly like the last lesson.
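
The shape check itself is small enough to sketch in plain Python. The function below is a hypothetical helper written for this lesson, assuming shapes are given as tuples of ints; it compares sizes from the last dimension backward and treats a missing leading dimension as 1.

```python
from itertools import zip_longest


def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Walk both shapes from the trailing dimension; pad the shorter one with 1s.
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for a, b in pairs:
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            raise ValueError(
                f"shapes {shape_a} and {shape_b} do not broadcast: {a} vs {b}"
            )
    return tuple(reversed(result))


print(broadcast_shape((2, 3), (3,)))    # (2, 3)
print(broadcast_shape((2, 3), (2, 1)))  # (2, 3)

try:
    broadcast_shape((2, 3), (2,))  # length-2 vector against 3 columns
except ValueError as err:
    print("incompatible:", err)
```

Running a check like this on the two shapes named in an error message is usually enough to see which dimension is out of line.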