Part 8 — Transformers & Attention
Self-attention allows tokens to:
Attend only to their immediate neighbors
Attend to all tokens in the sequence
Ignore other tokens
Compute the ELBO directly