Summary
BERT-style masked language models and autoregressive transformers are comparably effective on generative tasks, but each has different strengths and weaknesses related to how it was pretrained (1). This was observed in a comparison of DeBERTa with an open-source version of GPT-3.
1. Samuel D. BERTs are Generative In-Context Learners. In: Advances in Neural Information Processing Systems 37. Neural Information Processing Systems Foundation, Inc. (NeurIPS); 2024. p. 2558–89. Available from: https://doi.org/10.52202/079017-0084