Summary

Mean-pooled embeddings outperformed other zero-shot approaches in transfer learning with protein language models (1). The advantage was more noticeable on variable-length datasets (PISCES) than on fixed-length datasets (DMS). These results were obtained with the ESM2-150M model, using Lasso regression and cross-validation. However, mean pooling performed worse than optimal transport, especially when the embeddings came from small neural networks and only a small amount of training data was available.
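
As a rough illustration of what mean pooling does, the sketch below averages per-residue embeddings over the sequence axis, so that proteins of different lengths all map to vectors of the same size. It is not the code from (1): the embeddings are random stand-ins rather than real ESM2-150M outputs, the function name mean_pool and the sequence lengths are hypothetical, and the embedding width of 640 is an assumption about the model's hidden size.

```python
import numpy as np


def mean_pool(residue_embeddings):
    """Average a (length, dim) per-residue embedding matrix over the length axis."""
    residue_embeddings = np.asarray(residue_embeddings, dtype=float)
    if residue_embeddings.ndim != 2 or residue_embeddings.shape[0] == 0:
        raise ValueError("expected a non-empty (length, dim) array")
    return residue_embeddings.mean(axis=0)


# Random stand-ins for per-residue embeddings (NOT real model outputs).
# EMBED_DIM = 640 is an assumed hidden size for ESM2-150M.
rng = np.random.default_rng(0)
EMBED_DIM = 640
lengths = [57, 120, 301]  # hypothetical sequence lengths
per_residue = [rng.normal(size=(n, EMBED_DIM)) for n in lengths]

# One fixed-size row per protein, regardless of sequence length.
X = np.stack([mean_pool(e) for e in per_residue])
print(X.shape)
```

A matrix like X, paired with measured target values, could then be passed to a regularized linear model such as scikit-learn's LassoCV, which combines Lasso regression with cross-validated selection of the penalty strength.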

Figures

Ref (1)

1. Vieira LC, Handojo ML, Wilke CO. Medium-sized protein language models perform well at transfer learning on realistic datasets. Scientific Reports. 2025;15(1). Available from: https://doi.org/10.1038/s41598-025-05674-x