Summary
Remote homologs with sequence identity in the 20-25% range cannot be identified using methods that assemble multiple sequence alignments around hidden Markov models (HMMs) (1). Liu et al. (2) corroborated this. Wu et al. (3) found that protein language models (PLMs) were better than HMMs at homolog detection at all scales.
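To make the 20-25% figure concrete, here is a minimal sketch of how pairwise sequence identity can be computed from two already-aligned sequences. The function name `percent_identity` and the choice of denominator (columns where neither sequence has a gap) are assumptions made for illustration; the cited papers may use a different convention, such as dividing by the length of the shorter sequence.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Hypothetical helper for illustration only; the denominator convention
    is an assumption and differs between tools and papers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        # Skip columns where either sequence has a gap character.
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    # No comparable columns: report zero rather than dividing by zero.
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy alignment (not real data): 8 gap-free columns, 2 identical residues.
identity = percent_identity("MKV-LAGTW", "MRIQ-SGDY")
in_range = 20.0 <= identity <= 25.0
```

Under this convention, a pair that scores between 20 and 25 falls in the range the summary describes as out of reach for alignment-based HMM methods.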
1. Kilinc M, Jia K, Jernigan RL. Improved global protein homolog detection with major gains in function identification. Proceedings of the National Academy of Sciences. 2023;120(9). Available from: https://doi.org/10.1073/pnas.2211823120
2. Liu W, Wang Z, You R, Xie C, Wei H, Xiong Y, et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nature Communications. 2024;15(1). Available from: https://doi.org/10.1038/s41467-024-46808-5
3. Wu KE, Chang H, Zou J. ProteinCLIP: enhancing protein language models with natural language. openRxiv; 2024. Available from: https://doi.org/10.1101/2024.05.14.594226