Masked self-attention prevents a token from attending to:
Answer options
A
All tokens
B
Future tokens during autoregressive generation
C
Past tokens only
D
Its own embedding
Correct answer: Future tokens during autoregressive generation
Explanation
The correct answer is: Future tokens during autoregressive generation.