Summary

Deep learning (DL) models for protein sequences, including protein language models (PLMs), learn representations that reflect the phylogeny of protein evolution. (1) showed this in the 2D latent space of protein family-specific variational autoencoders (figures below), whereas (2) showed that repurposing perplexity as velocities within a manifold of the ESM-1b sequence embedding space recapitulates the evolutionary trajectories of viral and eukaryotic proteins.
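To make the "velocities on an embedding manifold" idea concrete, here is a minimal toy sketch. It is not the scoring procedure from (2): it assumes each sequence already has an embedding vector and a single scalar language-model log-likelihood, builds a k-nearest-neighbour graph in embedding space, and assigns each directed edge a velocity equal to the difference in log-likelihood. The function names (`knn_indices`, `edge_velocities`) and the data are hypothetical and chosen only for illustration.

```python
import numpy as np

def knn_indices(embeddings, k):
    """Return, for each row, the indices of its k nearest neighbours (Euclidean)."""
    diffs = embeddings[:, None, :] - embeddings[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sequence is never its own neighbour
    return np.argsort(dists, axis=1)[:, :k]

def edge_velocities(embeddings, log_likelihoods, k=1):
    """Toy velocity for each kNN edge i -> j: log_likelihood[j] - log_likelihood[i].

    A positive value means the neighbour j is scored as more likely than i,
    which this sketch reads as the "forward" direction along the edge.
    """
    neighbours = knn_indices(embeddings, k)
    velocities = {}
    for i, row in enumerate(neighbours):
        for j in row:
            velocities[(i, int(j))] = float(log_likelihoods[j] - log_likelihoods[i])
    return velocities

# Hypothetical data: four sequences placed along a line in a 2D embedding space,
# with made-up log-likelihoods that increase along the line.
embeddings = np.array([[0.0, 0.0], [1.0, 0.0], [2.5, 0.0], [4.5, 0.0]])
log_likelihoods = np.array([-4.0, -3.0, -2.0, -1.0])

velocities = edge_velocities(embeddings, log_likelihoods, k=1)
for (i, j), v in sorted(velocities.items()):
    print(f"{i} -> {j}: velocity {v:+.1f}")
```

In this toy setup, edges pointing toward higher-likelihood neighbours get positive velocity and edges pointing back get negative velocity, so the signs alone suggest a direction of travel through the graph. Real embeddings and likelihoods would come from a model such as ESM-1b, which is not loaded here.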

Figures

Figures from (1)

Ref (2)

See also

1. Detlefsen NS, Hauberg S, Boomsma W. Learning meaningful representations of protein sequences. Nature Communications. 2022;13(1). Available from: https://doi.org/10.1038/s41467-022-29443-w
2. Hie BL, Yang KK, Kim PS. Evolutionary velocity with protein language models predicts evolutionary dynamics of diverse proteins. Cell Systems. 2022;13(4):274-285.e6. Available from: https://doi.org/10.1016/j.cels.2022.01.003