What is the primary advantage of pretraining a Transformer on a large corpus before fine-tuning on a specific task?
Answer options
A
It reduces the risk of overfitting
B
It allows the model to leverage general language understanding
C
It makes the model smaller
D
It makes the model more robust to adversarial attacks
E
It speeds up the fine-tuning process
Correct answer: It allows the model to leverage general language understanding
Explanation
Pretraining on a large corpus gives the model broad general language understanding (grammar, semantics, and world knowledge), which fine-tuning then adapts efficiently to a specific task using a much smaller labeled dataset. Reduced overfitting and faster fine-tuning are secondary benefits that follow from this, and pretraining does not make the model smaller.
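The transfer idea in the explanation can be sketched with a deliberately tiny analogy. This is not a Transformer: the "pretraining" step only counts which words share a sentence in an unlabeled corpus, and the "task" step builds one prototype per label from frozen pretrained vectors, a simple stand-in for fine-tuning. The corpus, the labels, and the function names (`pretrain`, `fit_task`, `predict`) are all invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical unlabeled corpus, invented for this sketch.
UNLABELED = [
    "the movie was great and wonderful",
    "the film was great and superb",
    "the movie was awful and terrible",
    "the film was awful and dreadful",
    "a wonderful and superb film",
    "a terrible and dreadful movie",
]

# Hypothetical labeled task data: only one example per class.
LABELED = [("great", "pos"), ("awful", "neg")]


def pretrain(corpus):
    """'Pretraining': describe each word by the words it shares a sentence with."""
    vectors = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for word in words:
            for context in words:
                if context != word:
                    vectors[word][context] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def fit_task(vectors, labeled):
    """Task step: sum the pretrained vectors of the labeled words per class."""
    prototypes = defaultdict(Counter)
    for word, label in labeled:
        prototypes[label].update(vectors[word])
    return prototypes


def predict(vectors, prototypes, word):
    """Pick the label whose prototype is most similar to the word's vector."""
    return max(prototypes, key=lambda label: cosine(vectors[word], prototypes[label]))


vectors = pretrain(UNLABELED)
prototypes = fit_task(vectors, LABELED)

# None of these words appear in LABELED; they are classified only through
# what the unlabeled corpus revealed about their usage.
for word in ["superb", "wonderful", "dreadful", "terrible"]:
    print(word, "->", predict(vectors, prototypes, word))
```

With only two labeled words, the classifier can still label words it never saw a label for, because the unlabeled corpus already placed them near "great" or "awful". A pretrained Transformer benefits in the same way at far larger scale, with the difference that fine-tuning usually updates the pretrained weights rather than keeping them frozen.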