Stripped of the math, what does an attention layer actually compute for a token?