Summary
PLMSearch (1), a sequence search method based on a protein language model (PLM), outperformed virtually all other tested sequence-based methods when querying databases, while matching structure-based methods such as Foldseek. This held even for pairs with low sequence identity (<0.3) but high structural similarity (TM-score > 0.5). It also did so at far higher speed, using precomputed mean-pooled ESM-1b embeddings, which have a fixed size per protein regardless of sequence length.
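To make the "fixed size per protein" point concrete, here is a minimal sketch of mean pooling followed by a similarity ranking. It is an illustration under assumptions, not the PLMSearch pipeline: the per-residue vectors are toy 3-dimensional values (real ESM-1b vectors are far wider), cosine similarity is a stand-in scoring choice, and the names `mean_pool`, `rank_database` and `is_remote_homolog_pair` are hypothetical. Only the thresholds in the last helper come from the summary above.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors (L x D) into a single D-dimensional vector."""
    if not residue_embeddings:
        raise ValueError("protein has no residues")
    length = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / length for j in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_database(query_vec, db_vecs):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, cosine_similarity(query_vec, vec)) for name, vec in db_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def is_remote_homolog_pair(seq_identity, tm_score):
    """Pair category from the summary: sequence identity < 0.3 and TM-score > 0.5."""
    return seq_identity < 0.3 and tm_score > 0.5


# Toy per-residue embeddings (assumed values); proteins differ in length, not in D.
short_protein = [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]  # 2 residues
long_protein = [[1.0, 1.0, 0.0]] * 4                # 4 residues
unrelated = [[0.0, 0.0, 1.0], [0.0, 0.0, 3.0]]      # 2 residues

# Database vectors are pooled once, ahead of time, and reused for every query.
db = {"long_protein": mean_pool(long_protein), "unrelated": mean_pool(unrelated)}
query = mean_pool(short_protein)
ranking = rank_database(query, db)
```

Because every protein collapses to one vector of the same width, comparing a query against a database entry costs the same regardless of sequence length, and the database side can be computed once and stored.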
Figures
Ref (1)
1. Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nature Communications. 2024;15(1). Available from: https://doi.org/10.1038/s41467-024-46808-5