Slice-based error analysis (step 1/7) · metrics, slices, and error analysis

Slice-based error analysis: where the average lies

A single accuracy number is an average, and averages hide the groups a model is failing. Take this ticket router: it is "67% accurate overall," meaning 4 of its 6 test tickets were routed correctly. That sounds mediocre but usable, until you split it.

Slice it by who it serves

A slice is a subgroup of your data: a language, a region, a customer tier, a device, a ticket type. Compute accuracy per slice and the hidden failure jumps out:

English tickets: 4 of 4 correct → 100%
Spanish tickets: 0 of 2 correct → 0%

The model isn't "67% good." It's perfect for English speakers and completely broken for Spanish speakers. Shipping the average means shipping a tool that silently fails an entire group of customers.
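To see how a 100% slice and a 0% slice blend into "67%," here is a minimal sketch of the arithmetic. The counts come from the example above; the variable names are illustrative only.

```python
# (correct, total) per slice, taken from the example above.
slices = {"English": (4, 4), "Spanish": (0, 2)}

total = sum(n for _, n in slices.values())
overall = sum(c for c, _ in slices.values()) / total

# The overall number is just the per-slice accuracies weighted by slice size,
# so a large healthy slice can mask a small broken one.
weighted = sum((n / total) * (c / n) for c, n in slices.values())

print(f"overall: {overall:.0%}")
for name, (c, n) in slices.items():
    print(f"{name}: {c} of {n} correct, weight {n / total:.0%}")
```

English carries two thirds of the weight, so its perfect score pulls the average up even though Spanish contributes nothing.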

Why a builder always slices

Aggregate metrics are how bad models pass review. The real questions are "who does it fail, and how badly?" Slicing turns "good enough on average" into "unusable for Spanish tickets — fix retrieval for that language before launch." You pick slices from how the business is actually segmented: the groups where a silent failure would cost you trust, money, or a lawsuit.

The move: group the cases by a slice key, compute accuracy within each group, and read the worst one.
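That move can be sketched in a few lines of Python. This is one possible shape, not the lesson's own implementation: the case fields (`language`, `expected`, `predicted`), the queue names, and the helper `accuracy_by_slice` are all hypothetical, chosen to reproduce the 4-of-4 and 0-of-2 example.

```python
from collections import defaultdict

def accuracy_by_slice(cases, slice_key):
    """Group cases by slice_key; return {slice: (correct, total, accuracy)}."""
    counts = defaultdict(lambda: [0, 0])  # slice value -> [correct, total]
    for case in cases:
        bucket = counts[case[slice_key]]
        bucket[0] += int(case["predicted"] == case["expected"])
        bucket[1] += 1
    return {k: (c, n, c / n) for k, (c, n) in counts.items()}

# Hypothetical evaluation cases: 4 English tickets routed correctly,
# 2 Spanish tickets routed to the wrong queue.
cases = [
    {"language": "en", "expected": "billing", "predicted": "billing"},
    {"language": "en", "expected": "shipping", "predicted": "shipping"},
    {"language": "en", "expected": "account", "predicted": "account"},
    {"language": "en", "expected": "billing", "predicted": "billing"},
    {"language": "es", "expected": "billing", "predicted": "account"},
    {"language": "es", "expected": "shipping", "predicted": "billing"},
]

report = accuracy_by_slice(cases, "language")

# Read the worst slice first: that is where the average is hiding a failure.
worst = min(report, key=lambda k: report[k][2])

for name, (correct, total, acc) in sorted(report.items(), key=lambda kv: kv[1][2]):
    print(f"{name}: {correct} of {total} correct ({acc:.0%})")
print("worst slice:", worst)
```

Passing a different `slice_key` (a region, tier, or device field, if your cases carry one) reuses the same function, which makes it cheap to check every segmentation the business cares about.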
