promptdojo_

Tokenizers and the context budget

A model never sees your raw string. A tokenizer first chops the text into tokens, and tokens are neither characters nor quite words. Real tokenizers (byte-pair encoding, or BPE, and similar schemes) use subword pieces: common words are usually one token, rare words split into several, and a rough rule of thumb for English is about 4 characters per token. We'll use a simple word split as a browser-safe stand-in for counting, but remember that the real unit is the subword, so a word count tends to undercount.
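To make the two estimates concrete, here is a minimal sketch of both: the word-split stand-in and the 4-characters-per-token rule of thumb. The function names are made up for illustration, and neither function is a real tokenizer; both only approximate what a subword tokenizer would report.

```python
import math

def estimate_tokens_by_words(text: str) -> int:
    """Crude stand-in: count one token per whitespace-separated word."""
    return len(text.split())

def estimate_tokens_by_chars(text: str, chars_per_token: float = 4.0) -> int:
    """Rule-of-thumb estimate: about 4 characters per token in English."""
    return math.ceil(len(text) / chars_per_token)

sample = "Tokenizers split uncommon words into several subword pieces."
print("by words:", estimate_tokens_by_words(sample))
print("by chars:", estimate_tokens_by_chars(sample))
```

The two numbers will usually disagree, which is a useful reminder that both are estimates and that the real count comes from the model's own tokenizer.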

The context budget

Every model has a context window: the maximum number of tokens it can handle at once, counting input + output together. That's a hard budget:

  • Go over it and the request either fails with an error or gets truncated, which usually means the oldest tokens are dropped silently.
  • Hosted models are typically billed per token, so the token count is also your bill.

This is why "count tokens, not characters" matters: a 10,000-character prompt might be ~2,500 tokens, and whether it fits (and what it costs) is measured in tokens.
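The arithmetic above can be sketched in a few lines. The window size and the price are hypothetical placeholders chosen only to make the numbers easy to follow; substitute the real values for whatever model you use.

```python
import math

CHARS_PER_TOKEN = 4          # rule of thumb for English, not exact
CONTEXT_WINDOW = 4096        # hypothetical window size, in tokens
PRICE_PER_1K_TOKENS = 0.50   # hypothetical price, in arbitrary currency units

def chars_to_tokens(num_chars: int) -> int:
    """Convert a character count into an estimated token count."""
    return math.ceil(num_chars / CHARS_PER_TOKEN)

def fits(input_tokens: int, max_output_tokens: int,
         window: int = CONTEXT_WINDOW) -> bool:
    """Input and output share one window, so both count against it."""
    return input_tokens + max_output_tokens <= window

def estimated_cost(total_tokens: int,
                   price_per_1k: float = PRICE_PER_1K_TOKENS) -> float:
    """Cost scales with tokens, not characters."""
    return total_tokens / 1000 * price_per_1k

prompt_tokens = chars_to_tokens(10_000)        # 2,500 estimated tokens
print(fits(prompt_tokens, 1_000))              # 3,500 tokens needed
print(fits(prompt_tokens, 2_000))              # 4,500 tokens needed
print(estimated_cost(prompt_tokens + 1_000))
```

Note that the same prompt fits or fails depending on how much room is reserved for the reply, which is why the output allowance belongs in the check.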

Why a builder cares

The two real failures are going over budget (your long prompt plus retrieved docs blow past the window and the model quietly loses the start) and surprise cost (you budgeted by characters but were billed by tokens). Estimating tokens and checking them against the window before you send is the habit that prevents both. Real code uses the model's actual tokenizer (e.g. tiktoken for OpenAI models); the skill is the same: count tokens, compare to the budget.
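One way to turn that habit into code is a pre-send check that decides which retrieved docs to include instead of letting the window decide for you. This is a sketch under stated assumptions: the function names, the keep-docs-in-order policy, and the default word-split counter are all illustrative choices, not a prescribed method.

```python
from typing import Callable, List

def word_count(text: str) -> int:
    """Stand-in counter: one token per whitespace-separated word."""
    return len(text.split())

def select_docs_within_budget(
    prompt: str,
    docs: List[str],
    context_window: int,
    max_output_tokens: int,
    count_tokens: Callable[[str], int] = word_count,
) -> List[str]:
    """Keep docs, in order, while prompt + docs + reserved output still fit."""
    remaining = context_window - max_output_tokens - count_tokens(prompt)
    if remaining < 0:
        raise ValueError("prompt plus reserved output exceeds the window")
    kept: List[str] = []
    for doc in docs:
        cost = count_tokens(doc)
        if cost > remaining:
            break  # stop explicitly rather than overflow silently
        kept.append(doc)
        remaining -= cost
    return kept

# Assumption: with tiktoken installed, a real counter could be swapped in.
# The encoding name depends on the model you target:
#   import tiktoken
#   enc = tiktoken.get_encoding("cl100k_base")
#   count_tokens = lambda text: len(enc.encode(text))

docs = ["first retrieved passage here", "second passage", "a third and longer passage"]
print(select_docs_within_budget("summarize these docs", docs,
                                context_window=12, max_output_tokens=4))
```

Passing the counter in as a parameter keeps the budget logic identical whether you count with the stand-in or with the model's tokenizer.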