Sequence-only protein language models implicitly cluster proteins at fineness levels that increase with size

Summary

Sequence-only protein language models implicitly cluster protein sequences at a granularity that becomes finer as model size increases (1). For example, ESM-15B implicitly learns sequence constraints from very similar proteins, whereas the 650M model learns from a broader pool of sequences. This was determined by measuring how well these models learn homo-oligomeric contacts and comparing their performance with that of MSA-Pairformer.
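
This note does not say how contact recovery was scored. One common metric in contact prediction is top-k precision: the fraction of the k highest-scoring residue pairs that are true contacts. The sketch below is an illustration under that assumption, not the method used in (1). The function name `top_k_precision` and the toy matrices are hypothetical, and the scores are assumed to be symmetric, so each unordered residue pair is counted once (the diagonal is kept because, across two copies of a homo-oligomer, residue i can contact its own counterpart).

```python
def top_k_precision(scores, contacts, k):
    """Fraction of the k highest-scoring residue pairs that are true contacts.

    scores   : square list of lists of predicted contact scores (assumed symmetric)
    contacts : square list of lists of booleans marking true contacts
    k        : number of top-ranked pairs to evaluate
    """
    n = len(scores)
    if n == 0 or len(contacts) != n:
        raise ValueError("scores and contacts must be non-empty and the same size")
    if k <= 0:
        raise ValueError("k must be positive")

    # Collect each unordered pair (i <= j) once, with its score and label.
    pairs = [
        (scores[i][j], bool(contacts[i][j]))
        for i in range(n)
        for j in range(i, n)
    ]

    # Rank pairs from highest to lowest score and keep the top k.
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    top = pairs[:k]

    return sum(label for _, label in top) / len(top)


# Hypothetical 4-residue example: two true contacts, (0, 1) and (2, 3).
toy_scores = [
    [0.1, 0.9, 0.2, 0.0],
    [0.9, 0.3, 0.8, 0.1],
    [0.2, 0.8, 0.0, 0.7],
    [0.0, 0.1, 0.7, 0.4],
]
toy_contacts = [
    [False, True, False, False],
    [True, False, False, False],
    [False, False, False, True],
    [False, False, True, False],
]

print(top_k_precision(toy_scores, toy_contacts, 2))
```

Computing a score like this for each model on the same set of homo-oligomers would give one number per model to compare against an MSA-based baseline such as MSA-Pairformer.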

Figures

Ref (1)
