CNNs and local patterns

A convolutional neural network (CNN) is built on one move: slide a small filter (a "kernel") across the input and compute a number at each position. The filter only sees a small local window at a time — that's the whole idea.

On an image, a 3×3 filter might fire on edges, corners, or textures — small local patterns — and deeper layers combine them into shapes.
The same filter weights are reused at every position ("weight sharing"), so a convolutional layer needs far fewer parameters than a fully connected layer over the same input, and it finds a pattern wherever it appears (move the cat in the photo and it is still detected).
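To make weight sharing concrete, here is a minimal sketch in plain Python. The helper name `slide` and the two example signals are made up for illustration; they are not from the lesson. It applies the same two weights at every position and shows that moving the step in the input only moves where the output fires.

```python
def slide(signal, kernel):
    """Apply the same kernel weights at every position where the window fits."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]

kernel = [1, -1]             # two shared weights, reused at every position
early = [0, 5, 5, 5, 5, 5]   # hypothetical signal: step between positions 0 and 1
late = [0, 0, 0, 0, 5, 5]    # same step, moved to positions 3 and 4

out_early = slide(early, kernel)
out_late = slide(late, kernel)
print(out_early)  # [-5, 0, 0, 0, 0]
print(out_late)   # [0, 0, 0, -5, 0]

# Parameter count for the same 6-in, 5-out mapping:
dense_weights = len(early) * len(out_early)  # one weight per input/output pair
shared_weights = len(kernel)                 # the kernel, reused everywhere
print(dense_weights, shared_weights)         # 30 2
```

The nonzero entry moves with the step, and the weights never change. That is the "detected anywhere" property, and the gap between 30 and 2 weights grows quickly on image-sized inputs.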

Run the editor: a length-2 edge-detector kernel [1, -1] slides over a 4-element signal and produces a nonzero value wherever neighboring values differ. The output [0, -5, 0] says "the only edge is between positions 1 and 2" (counting positions from 0).
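The editor's input signal is not shown here, so the sketch below assumes `[0, 0, 5, 5]`, one signal that yields the stated output. It prints each window as the kernel slides, which is probably close to what the editor runs, though the exact code is a guess.

```python
signal = [0, 0, 5, 5]   # assumed input; the lesson text does not show it
kernel = [1, -1]        # edge detector: left neighbor minus right neighbor

output = []
for start in range(len(signal) - len(kernel) + 1):
    window = signal[start:start + len(kernel)]
    # Multiply each weight by the value under it, then add up.
    value = sum(w * x for w, x in zip(kernel, window))
    end = start + len(kernel) - 1
    print(f"positions {start}-{end}: window={window} -> {value}")
    output.append(value)

print(output)  # [0, -5, 0]
```

Only the window covering positions 1 and 2 sees two different values, so only the middle output is nonzero. The sign is negative because the signal rises from left to right.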

The shape rule (you already know it)

For a signal of length n and a kernel of length k, the output has length n - k + 1, because you can only place the window where it fully fits (start positions 0 through n - k). That's the same shape arithmetic from the tensors chapter, and the #1 CNN bug an AI ships is an off-by-one on that range.
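A small sketch of the shape rule and the two ways the range typically goes wrong. The function name `valid_output_length` and the example values are illustrative assumptions, not part of the lesson.

```python
def valid_output_length(n, k):
    """Number of positions where a length-k window fits inside a length-n signal."""
    if k < 1 or k > n:
        raise ValueError("kernel length must satisfy 1 <= k <= n")
    return n - k + 1

n, k = 4, 2
starts = list(range(valid_output_length(n, k)))  # correct start positions
too_few = list(range(n - k))                     # off-by-one: drops the last window
too_many = list(range(n))                        # overruns the end of the signal

print(starts)    # [0, 1, 2]
print(too_few)   # [0, 1]
print(too_many)  # [0, 1, 2, 3]

# Why overrunning is dangerous in Python: slicing truncates silently.
signal = [0, 0, 5, 5]
last_window = signal[too_many[-1]:too_many[-1] + k]
print(last_window)  # [5]  (one element, not two, and no error raised)
```

Because the truncated slice raises no error, the overrun version returns a plausible-looking but wrong output of length n. Checking the output length against n - k + 1 is a cheap way to catch it.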

Why a builder cares

You won't hand-derive convolutions, but you'll read CNN code and need the intuition: small filter, local window, same weights everywhere. When a model is great at "is there a defect in this product photo?" but you don't know why it's a CNN, this is why — local patterns, detected anywhere. Real PyTorch spells it nn.Conv2d(...); the sliding-window idea is identical.
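To see that the 2D case is the same idea, here is a plain-Python sketch of a 3×3 filter sliding over a tiny image. The function `slide2d`, the image, and the kernel values are all hypothetical. As far as the core arithmetic goes, this is what a single-channel nn.Conv2d with stride 1 and no padding computes, minus the bias term and the learned weights.

```python
def slide2d(image, kernel):
    """Slide a 2D kernel over a 2D image, keeping only fully-fitting windows."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for r in range(h - kh + 1):          # same n - k + 1 rule, per axis
        row = []
        for c in range(w - kw + 1):
            total = 0
            for dr in range(kh):
                for dc in range(kw):
                    total += kernel[dr][dc] * image[r + dr][c + dc]
            row.append(total)
        out.append(row)
    return out

# Hypothetical 4x5 "image": dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 9, 9],
]
# Hand-written vertical-edge filter: left column minus right column.
kernel = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

result = slide2d(image, kernel)
print(result)  # [[0, -27, -27], [0, -27, -27]]
```

The output is 2×3 because each axis follows n - k + 1 (4 - 3 + 1 rows, 5 - 3 + 1 columns). It is nonzero only where the 3-wide window straddles the dark-to-bright boundary.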