Dimensionality reduction refers to the process of compressing a set of input features into a smaller number of dimensions.
Notes
- t-SNE and UMAP embeddings can distort the structure of the input data, such as the distances between points (1). Other examples of this have been reported in the single-cell omics field.
Ref (1)
Approaches
Principal component analysis
Principal component analysis (PCA) is a standard dimensionality reduction tool widely used in data science. Per (2), its interpretation requires the following assumptions:
- The underlying patterns exist
- The patterns exhibited in the data are independent of one another
- The patterns combine linearly to form the data
- The data contain exclusively these patterns and additive, uncorrelated noise
- The observations are independent
Figure 1 from Shinn (2)
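As a minimal sketch of the linear model these assumptions describe, the example below builds synthetic data from two independent patterns that are mixed linearly with additive, uncorrelated noise, then recovers a two-dimensional representation with PCA computed through the singular value decomposition. The function name `pca_svd`, the data sizes, and the noise level are illustrative assumptions, not taken from (2).

```python
import numpy as np

def pca_svd(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # PCA is defined on mean-centered data, so center each feature first.
    X_centered = X - X.mean(axis=0)
    # Thin SVD: the rows of Vt are the principal axes, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    # Coordinates of each observation along the retained axes.
    scores = X_centered @ components.T
    # Fraction of the total variance captured by each component.
    explained_ratio = (S ** 2) / np.sum(S ** 2)
    return scores, components, explained_ratio[:n_components]

rng = np.random.default_rng(0)
# Two independent latent patterns with different variances (assumed values).
latent = rng.normal(size=(500, 2)) * np.array([3.0, 1.0])
# The patterns combine linearly into 5 observed features.
mixing = rng.normal(size=(2, 5))
# Additive, uncorrelated noise on every feature.
X = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

scores, components, ratio = pca_svd(X, n_components=2)
print("reduced shape:", scores.shape)
print("variance explained by 2 components:", ratio.sum())
```

Because the synthetic data satisfy the assumptions by construction, two components should capture nearly all of the variance. Real data that violate them (for example, observations that are not independent) can produce components that are harder to interpret.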
t-Stochastic Neighbor Embedding
- Anecdotally, t-SNE is better at preserving local clusters and patterns
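One way to probe the claim about local structure is to embed a dataset and measure how well neighborhoods survive. The sketch below assumes scikit-learn is available and uses its `TSNE` estimator together with the `trustworthiness` score; the synthetic blobs, the perplexity, and the neighbor count are illustrative choices only.

```python
import numpy as np
from sklearn.manifold import TSNE, trustworthiness

rng = np.random.default_rng(0)
# Three well-separated Gaussian blobs in 10 dimensions (synthetic data).
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])

# Perplexity loosely controls the effective neighborhood size t-SNE considers.
tsne = TSNE(n_components=2, perplexity=30.0, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

# Trustworthiness lies in [0, 1]; higher values mean that points that are
# neighbors in the embedding were also neighbors in the original space.
score = trustworthiness(X, embedding, n_neighbors=10)
print("embedding shape:", embedding.shape)
print("trustworthiness:", score)
```

A high score on one dataset does not show that distances between clusters are meaningful in the embedding; it only summarizes local neighborhoods.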
Uniform Manifold Approximation and Projection
- Anecdotally, UMAP is better at finding global clusters and patterns
Autoencoders
These are in practice not used very often.
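For orientation, an autoencoder pairs an encoder that compresses the input into a low-dimensional bottleneck with a decoder that reconstructs the input from it, and both are trained to minimize reconstruction error. The sketch below is a deliberately simplified linear autoencoder written in plain NumPy and trained by gradient descent; the data, layer sizes, learning rate, and step count are assumptions chosen for illustration. Practical autoencoders normally add nonlinear activations and are built with a deep learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 8 features generated from 2 latent factors plus small noise.
latent = rng.normal(size=(500, 2))
X = latent @ rng.normal(size=(2, 8)) + 0.05 * rng.normal(size=(500, 8))
X = X - X.mean(axis=0)

n_features, n_latent = X.shape[1], 2
# Encoder maps features to the bottleneck; decoder maps back to features.
W_enc = 0.1 * rng.normal(size=(n_features, n_latent))
W_dec = 0.1 * rng.normal(size=(n_latent, n_features))

def reconstruction_mse(X, W_enc, W_dec):
    """Mean squared error between X and its reconstruction."""
    return float(np.mean((X @ W_enc @ W_dec - X) ** 2))

initial_loss = reconstruction_mse(X, W_enc, W_dec)
learning_rate = 0.01
for _ in range(2000):
    Z = X @ W_enc            # encode: low-dimensional codes
    error = Z @ W_dec - X    # decode and compare with the input
    # Gradients of the loss 0.5 * mean over samples of the squared error norm.
    grad_dec = Z.T @ error / len(X)
    grad_enc = X.T @ (error @ W_dec.T) / len(X)
    W_enc -= learning_rate * grad_enc
    W_dec -= learning_rate * grad_dec

final_loss = reconstruction_mse(X, W_enc, W_dec)
codes = X @ W_enc  # the reduced representation
print("loss before:", initial_loss, "after:", final_loss)
print("codes shape:", codes.shape)
```

With linear layers and squared error, the learned bottleneck spans the same subspace as the leading principal components, so this toy version offers nothing beyond PCA; the nonlinear case is where the two approaches differ.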

Variational autoencoders
1. Chari T, Pachter L. The specious art of single-cell genomics. PLOS Computational Biology. 2023;19(8):e1011288. Available from: https://doi.org/10.1371/journal.pcbi.1011288
2. Shinn M. Phantom oscillations in principal component analysis. Proceedings of the National Academy of Sciences. 2023;120(48). Available from: https://doi.org/10.1073/pnas.2311420120