Transformers are more parallelizable than RNNs because:
Answer options
A
They remove recurrence and process sequences simultaneously
B
They use smaller models
C
They use GPUs only
D
They use SVMs internally
Correct answer: They remove recurrence and process sequences simultaneously
Explanation
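Transformers replace recurrence with self-attention, so all tokens in a sequence can be processed in parallel during training. An RNN, by contrast, must compute each hidden state from the previous one, step by step. This makes transformers faster to train on parallel hardware, and because self-attention connects every pair of positions directly, they are also better at capturing long-range dependencies.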
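
The contrast can be illustrated with a minimal NumPy sketch. It is not a full transformer or a production RNN: it assumes a single attention head, a plain tanh recurrence, random weights, and no masking or positional encoding. The function names (rnn_forward, self_attention) and the sizes are hypothetical, chosen only for illustration. The point is that the RNN needs a loop over time steps, while self-attention is a handful of matrix products over the whole sequence.

```python
import numpy as np

def rnn_forward(x, W_x, W_h):
    """Plain tanh RNN: step t cannot start until step t-1 has finished."""
    h = np.zeros(W_h.shape[0])
    states = []
    for t in range(x.shape[0]):            # inherently sequential loop over time
        h = np.tanh(x[t] @ W_x + h @ W_h)  # depends on the previous hidden state
        states.append(h)
    return np.stack(states)

def self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention, no loop over positions."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v               # all positions at once
    scores = q @ k.T / np.sqrt(k.shape[-1])           # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ v, weights

# Illustrative sizes and random weights (assumptions, not from the quiz).
rng = np.random.default_rng(0)
seq_len, d_model = 6, 4
x = rng.normal(size=(seq_len, d_model))
W_x, W_h, W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(5))

rnn_states = rnn_forward(x, W_x, W_h)
attn_out, attn_weights = self_attention(x, W_q, W_k, W_v)
print(rnn_states.shape, attn_out.shape)
```

Both functions return one vector per token, but only the attention version expresses the whole computation as matrix operations that parallel hardware can execute across all positions at once.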