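MSA Transformer is a protein language model that takes multiple sequence alignments (MSAs) as input rather than single sequences. It was developed by the same group as ESM-1v, a single-sequence model for variant effect prediction. It was developed in parallel with the Evoformer, the MSA-processing trunk of AlphaFold2, and shares many features with it. Most notably, both use axial attention that alternates between row-wise attention (across positions within each sequence) and column-wise attention (across sequences at each aligned position). It is the basis for ProteinNPT.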
Ref (1)
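The row/column (axial) attention described above can be sketched in NumPy. This is a simplified single-head illustration, not the reference implementation. It omits layer normalization, the feed-forward sublayer, multiple heads and positional embeddings, and the random projection matrices and toy sizes are hypothetical. It follows the tied row attention idea from (1): a single L × L attention map is shared by every sequence in the alignment, with the query-key logits summed over rows. The 1/sqrt(d·M) scaling used here is an assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def tied_row_attention(x, wq, wk, wv):
    """Single-head tied row attention over an MSA embedding x of shape (M, L, d).

    The query-key logits are summed over all M rows, so one (L, L) map is
    shared by every sequence. Scaling by 1/sqrt(d * M) is an assumption.
    """
    m, _, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv                           # each (M, L, d)
    logits = np.einsum("mid,mjd->ij", q, k) / np.sqrt(d * m)   # (L, L), summed over rows
    attn = softmax(logits, axis=-1)
    out = np.einsum("ij,mjd->mid", attn, v)                    # same map applied to every row
    return out, attn

def column_attention(x, wq, wk, wv):
    """Single-head column attention: within each column, rows attend to each other."""
    _, _, d = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    logits = np.einsum("icd,jcd->cij", q, k) / np.sqrt(d)      # (L, M, M): one map per column
    attn = softmax(logits, axis=-1)
    out = np.einsum("cij,jcd->icd", attn, v)                   # back to (M, L, d)
    return out, attn

rng = np.random.default_rng(0)
M, L, d = 4, 10, 8  # sequences, alignment columns, embedding size (toy values)
x = rng.normal(size=(M, L, d))
w = [rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3)]  # hypothetical shared projections

# One simplified axial block: row attention, then column attention, each with a residual.
h, row_attn = tied_row_attention(x, *w)
h = x + h
h2, col_attn = column_attention(h, *w)
h = h + h2
print(row_attn.shape, col_attn.shape, h.shape)  # (10, 10) (10, 4, 4) (4, 10, 8)
```

Tying the row attention lets the model pool evidence (for example, co-varying positions) across all sequences into one map. The per-column maps compare sequences with each other, and these are the matrices examined in the phylogeny note below.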
Notes
- The column-wise attention matrices of MSA Transformer recapitulate the phylogenetic relationships between the sequences in an MSA, and this signal is concentrated in the early layers of the network (2).
- When fine-tuned on the Tsuboyama et al. (3) dataset, MSA Transformer outperforms single-sequence PLMs and inverse-folding models at predicting stability changes caused by single mutations (4). This includes hybrid sequence-structure models such as ProstT5.
- MSA Transformer can extract intermolecular (inter-chain) contacts from correctly paired, concatenated MSAs of two interacting proteins, but not from incorrectly paired ones (5).
Ref (5)
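Lupo et al. (2) report that simple combinations of column attention maps correlate strongly with Hamming distances between the sequences of the MSA. The sketch below shows the kind of comparison involved. It computes normalized Hamming distances from an alignment and correlates them with a column attention map that has been averaged over columns and symmetrized. The function names are hypothetical. `col_attn` is assumed to be a single head's (L, M, M) column attention taken from a trained model, whereas the paper combines several heads.

```python
import numpy as np

def hamming_matrix(msa):
    """Pairwise normalized Hamming distances between aligned sequences (gaps count as symbols)."""
    arr = np.array([list(s) for s in msa])                      # (M, L) array of characters
    return (arr[:, None, :] != arr[None, :, :]).mean(axis=-1)   # (M, M)

def attention_distance_correlation(col_attn, dist):
    """Pearson correlation between a column-averaged, symmetrized attention map and distances.

    col_attn: (L, M, M) column attention for one head (hypothetical input from a trained model).
    dist:     (M, M) pairwise distance matrix between the M sequences.
    """
    a = col_attn.mean(axis=0)           # average over alignment columns -> (M, M)
    a = (a + a.T) / 2                   # symmetrize
    iu = np.triu_indices_from(a, k=1)   # compare each off-diagonal pair once
    return np.corrcoef(a[iu], dist[iu])[0, 1]

# Toy alignment (hypothetical sequences).
msa = ["MKTAYIAK", "MKTAYIAR", "MKSAYLAR", "-KSGYLQR"]
dist = hamming_matrix(msa)
print(dist[0, 1])  # 0.125 (one mismatch out of 8 columns)
# col_attn would come from a trained model (not shown):
# r = attention_distance_correlation(col_attn, dist)
```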
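In the pairing setting (5), the input is a concatenated MSA. Each sequence of protein A is joined to the sequence of protein B that is assumed to be its interaction partner, and inter-chain contacts are read from the off-diagonal A-by-B block of the predicted contact map. The minimal sketch below uses hypothetical helper names and toy sequences. The contact map is assumed to come from a model such as MSA Transformer. Top-k precision is used as one common way to score predicted contacts against known ones; it is an assumption here, not necessarily the metric used in (5).

```python
import numpy as np

def pair_msas(msa_a, msa_b, pairing):
    """Concatenate aligned sequences of two protein families into a paired MSA.

    pairing[i] is the index of the sequence in msa_b assumed to interact with msa_a[i].
    """
    return [msa_a[i] + msa_b[j] for i, j in enumerate(pairing)]

def interchain_block(contact_map, len_a):
    """Return the A-by-B block of an (La+Lb, La+Lb) contact map for a concatenated alignment."""
    return contact_map[:len_a, len_a:]

def top_k_precision(pred_block, true_block, k):
    """Fraction of the k highest-scoring predicted inter-chain pairs that are true contacts."""
    order = np.argsort(pred_block, axis=None)[::-1][:k]   # flat indices, highest scores first
    return true_block.ravel()[order].mean()

# Toy example: the same two families under a "correct" and a shuffled pairing.
msa_a, msa_b = ["MKV", "MRV"], ["GSH", "GTH"]
print(pair_msas(msa_a, msa_b, [0, 1]))  # ['MKVGSH', 'MRVGTH']
print(pair_msas(msa_a, msa_b, [1, 0]))  # ['MKVGTH', 'MRVGSH']
```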
1. Rao R, Liu J, Verkuil R, Meier J, Canny JF, Abbeel P, et al. MSA Transformer. bioRxiv; 2021. Available from: https://doi.org/10.1101/2021.02.12.430858
2. Lupo U, Sgarbossa D, Bitbol A-F. Protein language models trained on multiple sequence alignments learn phylogenetic relationships. Nature Communications. 2022;13(1). Available from: https://doi.org/10.1038/s41467-022-34032-y
3. Tsuboyama K, Dauparas J, Chen J, Laine E, Mohseni Behbahani Y, Weinstein JJ, et al. Mega-scale experimental analysis of protein folding stability in biology and design. Nature. 2023;620(7973):434–44. Available from: https://doi.org/10.1038/s41586-023-06328-6
4. Cuturello F, Celoria M, Ansuini A, Cazzaniga A. Enhancing predictions of protein stability changes induced by single mutations using MSA-based language models. Bioinformatics. 2024;40(7). Available from: https://doi.org/10.1093/bioinformatics/btae447
5. Lupo U, Sgarbossa D, Bitbol A-F. Pairing interacting protein sequences using masked language modeling. Proceedings of the National Academy of Sciences. 2024;121(27). Available from: https://doi.org/10.1073/pnas.2311887121