Question
In which paper was the Transformer architecture first proposed?
A. GPT-3: Language Models are Few-Shot Learners
B. Attention Is All You Need
C. Improving Language Understanding by Generative Pre-training
D. BERT: Pre-training of Deep Bidirectional Transformers
Solution
Answer
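B. Attention Is All You Need
This 2017 paper by Vaswani et al. introduced the Transformer; the GPT, BERT, and GPT-3 papers in the other options were published later and build on that architecture.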