Multi-head attention helps by:
Answer options
A. Computing the same attention repeatedly
B. Learning multiple types of relationships in parallel
C. Removing the need for encoders
D. Guaranteeing perfect generalization
Correct answer: Learning multiple types of relationships in parallel
Explanation
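The correct answer is B: Learning multiple types of relationships in parallel. Multi-head attention runs several attention heads side by side, each with its own learned query, key, and value projections. Because the projections differ, each head can attend to a different kind of relationship between tokens (for example, one head may focus on nearby words while another tracks longer-range dependencies). The head outputs are then concatenated and projected back into a single representation.
The other options do not hold: repeating an identical attention computation adds no new information, multi-head attention is used inside encoders rather than replacing them, and no architecture guarantees perfect generalization.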
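
The idea of parallel heads can be illustrated with a minimal sketch in plain Python. It assumes scaled dot-product attention and randomly initialized (untrained) projection matrices; the function names, the dimensions (4 tokens, model width 8, 2 heads), and the helper `random_matrix` are hypothetical choices for illustration, not part of the quiz or of any particular library.

```python
import math
import random

def matmul(a, b):
    # Plain nested-list matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(x, w_q, w_k, w_v):
    # One head: project the input, then apply scaled dot-product attention.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights

def multi_head_attention(x, heads, w_o):
    # Each head has its own (w_q, w_k, w_v), so each can weight tokens differently.
    outputs, all_weights = [], []
    for w_q, w_k, w_v in heads:
        out, weights = attention_head(x, w_q, w_k, w_v)
        outputs.append(out)
        all_weights.append(weights)
    # Concatenate the head outputs per token, then mix them with w_o.
    concat = [sum((out[i] for out in outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), all_weights

def random_matrix(rows, cols, rng):
    # Hypothetical helper: stands in for learned weights in this sketch.
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
seq_len, d_model, num_heads = 4, 8, 2
d_head = d_model // num_heads

x = random_matrix(seq_len, d_model, rng)
heads = [
    tuple(random_matrix(d_model, d_head, rng) for _ in range(3))
    for _ in range(num_heads)
]
w_o = random_matrix(d_model, d_model, rng)

output, attn = multi_head_attention(x, heads, w_o)
print(len(output), len(output[0]))  # one d_model-sized vector per token
print(attn[0] != attn[1])           # the two heads produce different attention patterns
```

In a trained model the projections are learned rather than random, which is what lets different heads specialize; the sketch only shows that separate projections yield separate attention patterns over the same input.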