Which of the following is an advantage of Transformers over RNNs?
Answer options
A
Better at long-range dependencies and parallel training
B
Require sequential computation only
C
Always have fewer parameters
D
No need for GPUs
Correct answer: Better at long-range dependencies and parallel training
Explanation
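Self-attention lets a Transformer process all tokens of a sequence in parallel during training, whereas an RNN must step through them one at a time. Because attention connects any two positions directly, instead of passing information through every intermediate hidden state, Transformers also capture long-range dependencies more easily. The other options do not hold: Transformers are not restricted to sequential computation, do not always have fewer parameters, and in practice still benefit from GPUs.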
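The contrast in the explanation can be illustrated with a minimal sketch. It is a toy under stated assumptions, not a real model: tokens are single floats, the RNN has two scalar weights (`w_x` and `w_h`, hypothetical names), and the attention layer uses identity query/key/value projections with no learned parameters. The point is only the data dependency: each RNN step needs the previous hidden state, while every attention row depends on the inputs alone.

```python
import math


def rnn_forward(xs, w_x, w_h):
    """Toy scalar RNN: step t cannot start until step t-1 has finished."""
    h = 0.0
    states = []
    for x in xs:  # inherently sequential loop over time steps
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(xs):
    """Toy single-head self-attention with identity projections (an assumption).

    Each row of `scores` depends only on the inputs, never on another row's
    result, so all rows could be computed at the same time.
    """
    scores = [[xi * xj for xj in xs] for xi in xs]  # dot products, dimension 1
    weights = [softmax(row) for row in scores]
    outputs = [sum(w * xj for w, xj in zip(row, xs)) for row in weights]
    return outputs, weights


if __name__ == "__main__":
    tokens = [0.5, -1.0, 2.0, 0.1]  # made-up example values
    print("RNN states:", rnn_forward(tokens, w_x=0.8, w_h=0.5))
    outputs, weights = self_attention(tokens)
    print("Attention outputs:", outputs)
    # weights[0][-1] is the direct link from the first token to the last one
    print("First-to-last attention weight:", weights[0][-1])
```

In the RNN, information from the first token reaches the last only by passing through every intermediate state. In the attention sketch, the first and last tokens are linked by a single weight, which is the intuition behind the long-range-dependency claim.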