Variational autoencoders (VAEs) are neural networks that learn a reduced representation of a set of inputs by compressing an input signal and trying to recover it in the presence of noise. Vector-quantized variational autoencoders (VQ-VAEs) discretize the inputs into one-hot representations, essentially creating an alphabet (or codebook) for possible input values. These can, however, be prone to “codebook collapse” (1).
1.
Lu AX, Yan W, Yang KK, Gligorijevic V, Cho K, Abbeel P, et al. Tokenized and continuous embedding compressions of protein sequence and structure. Patterns. 2025;6(6):101289. Available from: https://doi.org/10.1016/j.patter.2025.101289