Multiple sequence alignments (MSAs) are sets of sequences aligned either to a reference sequence or to each other. They are widely used in bioinformatics to:
- Calculate summary statistics such as position-specific scoring matrices (PSSMs), and fit statistical models such as Potts models
- Infer protein properties, such as structure, via neural networks (MSA Transformer, AlphaFold2/RoseTTAFold)
- Predict evolutionary relationships (e.g., ancestral sequence reconstruction)
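
As a concrete illustration of the first use case, the sketch below derives a simple PSSM from a toy MSA. It is a minimal example under stated assumptions, not a reference implementation: the function name `pssm_from_msa`, the uniform background distribution, the additive pseudocount of 1, and the toy sequences are all hypothetical choices made for illustration. Gap characters (`-`) are simply ignored when counting.

```python
import math
from collections import Counter

# The 20 standard amino acids; gaps and other symbols are skipped when counting.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def pssm_from_msa(msa, pseudocount=1.0):
    """Return one dict per alignment column mapping residue -> log2-odds score.

    Assumptions (illustrative only): uniform background frequencies and
    a simple additive pseudocount for every residue in every column.
    """
    if not msa:
        raise ValueError("empty MSA")
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(ALPHABET)
    pssm = []
    for col in range(length):
        # Count residues observed in this column, ignoring gaps.
        counts = Counter(seq[col] for seq in msa if seq[col] in ALPHABET)
        total = sum(counts.values()) + pseudocount * len(ALPHABET)
        scores = {}
        for aa in ALPHABET:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


# Hypothetical toy alignment: four sequences, five columns, one gap.
toy_msa = ["MKV-L", "MKVAL", "MRVAL", "MKIAL"]
pssm = pssm_from_msa(toy_msa)
```

Residues conserved in a column (for example `M` in the first column of the toy alignment) receive positive scores, while residues never observed there receive negative ones. Real pipelines typically also apply sequence weighting and empirically derived background frequencies, which this sketch omits.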