Attention scores are computed from queries, keys and values using:
Answer options
A
Element-wise product followed by softmax
B
Dot-products and softmax (scaled)
C
Only LeakyReLU
D
Only KL divergence
Correct answer: Dot-products and softmax (scaled)
Explanation
Transformer attention computes similarity scores via scaled dot-products of queries (Q) and keys (K), normalizes with softmax, then produces a weighted sum of values (V).