The Transformer architecture introduced the concept of self- attention to handle which primary challenge in sequence modeling?
Answer options
A
Reducing model size
B
Improving model robustness
C
Capturing dependencies regardless of their distance in the
D
input
E
Speeding up training
F
Handling larger input sizes
Correct answer: Capturing dependencies regardless of their distance in the, input
Explanation
The Transformer's self-attention mechanism was introduced primarily to address the challenge of capturing long-range dependencies in sequences regardless of the distance between dependent tokens in the input — a fundamental limitation of RNNs that process data sequentially. Options [2] and [3] form the complete answer: 'Capturing dependencies regardless of their distance in the input.'