Summary

Compressing protein language model (PLM) representations of protein sequence and structure with vector-quantized variational autoencoders (VQ-VAEs) is prone to codebook collapse when the representations are insufficiently compressed (1). Alternative architectures can ameliorate this.
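
Codebook collapse means that only a small part of the codebook is ever selected. It is commonly checked from the code indices assigned to a batch of inputs. The sketch below is illustrative only and is not taken from (1): the function name `codebook_usage`, the codebook size of 8, and the two index lists are hypothetical. It reports the fraction of codes used and the perplexity (the exponential of the entropy) of the empirical code distribution.

```python
import math
from collections import Counter


def codebook_usage(indices, codebook_size):
    """Return (fraction of codes used, perplexity) for a list of code indices.

    Hypothetical helper for illustration; not the method used in (1).
    """
    counts = Counter(indices)
    total = len(indices)
    # Share of codebook entries that were selected at least once.
    used_fraction = len(counts) / codebook_size
    # Entropy (in nats) of the empirical distribution over selected codes.
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    # Perplexity is the effective number of codes in use.
    return used_fraction, math.exp(entropy)


# Made-up assignments for a codebook with 8 entries (assumed values).
healthy = [0, 1, 2, 3, 4, 5, 6, 7] * 4   # every code used equally often
collapsed = [2] * 30 + [5] * 2           # almost everything maps to one code

healthy_usage = codebook_usage(healthy, 8)
collapsed_usage = codebook_usage(collapsed, 8)
```

A perplexity close to the codebook size suggests the codes are used evenly, while a perplexity far below it, as in the `collapsed` example, is a typical sign of collapse.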

Figures

Ref (1)

See also

1. Lu AX, Yan W, Yang KK, Gligorijevic V, Cho K, Abbeel P, et al. Tokenized and continuous embedding compressions of protein sequence and structure. Patterns. 2025;6(6):101289. Available from: https://doi.org/10.1016/j.patter.2025.101289