Attention and Transformer blocks — step 2 of 7
ch 44 · cnns, transformers, and useful llm internals
phase 08 · ai/ml engineering buildout
lesson 3 of 5 · attention and transformer blocks
Stripped of the math, what does an attention layer actually compute for a token?
1. A relevance-weighted average of the other tokens' values (scores -> weights that sum to 1 -> weighted sum).
2. It picks exactly one other token and copies it.
3. It detects local edges with a sliding filter.
4. It shrinks the model by using fewer bits per number.
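
The pipeline named in option 1 (scores -> weights that sum to 1 -> weighted sum) can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: it assumes single-head scaled dot-product attention, and the names `softmax`, `attend`, `query`, `keys`, and `values` are hypothetical, as are the toy numbers. In typical self-attention the token's own value is also part of the average, which the sketch does not treat specially.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability, then normalize so the weights sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Step 1: relevance scores = scaled dot product of the query with each key.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Step 2: turn the scores into weights that sum to 1.
    weights = softmax(scores)
    # Step 3: weighted sum of the value vectors, one output component at a time.
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

# Toy data (illustrative only): three tokens with 2-dimensional keys and values.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]

weights, output = attend(query, keys, values)
print(weights)
print(output)
```

Because every weight is positive and the weights sum to 1, the output is a blend of all the value vectors rather than a copy of a single one, which is what separates option 1 from option 2.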