Summary

Protein structure is more conserved than sequence: many proteins share similar structures despite having little detectable sequence similarity (1, 2). As a result, clustering proteins by structure is more effective than clustering them by sequence (5). Likewise, phylogenies inferred from structure are more accurate than those inferred from sequence; (5) did this using AlphaFold2 models together with Foldseek.

Details

A specific example comes from (6), which reports a qualitative, anecdotal observation: the adenosine deaminases were split into multiple groups by sequence-based clustering but stayed in a single group under structure-based clustering.
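The contrast described above can be illustrated with a minimal sketch. It assumes a simple single-linkage rule (any pair scoring at or above a threshold is placed in the same group), which is a stand-in and not the method used in (6). All protein names, scores, and thresholds below are invented for illustration and are not taken from any cited work.

```python
from itertools import combinations


def cluster(names, similarity, threshold):
    """Single-linkage clustering: join any pair scoring >= threshold."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(names, 2):
        # Scores are symmetric; missing pairs are treated as unrelated.
        score = similarity.get((a, b), similarity.get((b, a), 0.0))
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical proteins: three members of one family plus an unrelated one.
names = ["ada_1", "ada_2", "ada_3", "other_1"]

# Hypothetical pairwise sequence identities (fraction of identical residues).
sequence_identity = {
    ("ada_1", "ada_2"): 0.45, ("ada_1", "ada_3"): 0.15,
    ("ada_2", "ada_3"): 0.18, ("ada_1", "other_1"): 0.10,
    ("ada_2", "other_1"): 0.08, ("ada_3", "other_1"): 0.12,
}

# Hypothetical pairwise structural similarities (a TM-score-like 0..1 scale).
structure_similarity = {
    ("ada_1", "ada_2"): 0.85, ("ada_1", "ada_3"): 0.70,
    ("ada_2", "ada_3"): 0.72, ("ada_1", "other_1"): 0.30,
    ("ada_2", "other_1"): 0.28, ("ada_3", "other_1"): 0.31,
}

# Thresholds are illustrative assumptions, not recommended values.
seq_clusters = cluster(names, sequence_identity, threshold=0.30)
struct_clusters = cluster(names, structure_similarity, threshold=0.50)

print("sequence: ", seq_clusters)
print("structure:", struct_clusters)
```

With these made-up scores, the divergent family member falls below the sequence threshold and ends up in its own group, while the structural scores keep the whole family together. In practice the pairwise scores would come from an alignment tool rather than a hand-written table.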

Figures

Ref (3)

Ref (5)

See also

1. Kilinc M, Jia K, Jernigan RL. Improved global protein homolog detection with major gains in function identification. Proceedings of the National Academy of Sciences. 2023;120(9). Available from: https://doi.org/10.1073/pnas.2211823120
2. Illergård K, Ardell DH, Elofsson A. Structure is three to ten times more conserved than sequence—A study of structural response in protein cores. Proteins: Structure, Function, and Bioinformatics. 2009;77(3):499–508. Available from: https://doi.org/10.1002/prot.22458
3. Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. The EMBO Journal. 1986;5(4):823–6. Available from: https://doi.org/10.1002/j.1460-2075.1986.tb04288.x
4. del Alamo D, Meiler J, Mchaourab HS. Principles of Alternating Access in LeuT-fold Transporters: Commonalities and Divergences. Journal of Molecular Biology. 2022;434(19):167746. Available from: https://doi.org/10.1016/j.jmb.2022.167746
5. Moi D, Bernard C, Steinegger M, Nevers Y, Langleib M, Dessimoz C. Structural phylogenetics unravels the evolutionary diversification of communication systems in gram-positive bacteria and their viruses. Nature Structural & Molecular Biology. 2025;32(12):2492–502. Available from: https://doi.org/10.1038/s41594-025-01649-8
6. Huang J, Lin Q, Fei H, He Z, Xu H, Li Y, et al. Discovery of deaminase functions by structure-based protein clustering. Cell. 2023;186(15):3182-3195.e14. Available from: https://doi.org/10.1016/j.cell.2023.05.041