Dimensionality reduction

Dimensionality reduction refers to the process of compressing $N$ input features into $M$ dimensions, where $M < N$.

Notes

t-SNE and UMAP embeddings distort the relationships between input points, such as distances and neighborhoods (1). Reference (1) gives further examples of this from the single-cell omics field.
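One rough way to check how much an embedding distorts the data is to compare neighborhoods and pairwise distances before and after reduction. The sketch below is an illustration only: the helper names (`neighbor_overlap`, `pairwise_distance_pearson`) are hypothetical, the two metrics are assumptions chosen for simplicity rather than the measures used in (1), and the "embedding" is a crude stand-in that keeps the first two features instead of running t-SNE or UMAP.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def neighbor_overlap(X_high, X_low, k=10):
    """Mean fraction of each point's k nearest neighbors that survive the embedding."""
    d_high = pairwise_distances(X_high)
    d_low = pairwise_distances(X_low)
    # A point should not count as its own neighbor.
    np.fill_diagonal(d_high, np.inf)
    np.fill_diagonal(d_low, np.inf)
    nn_high = np.argsort(d_high, axis=1)[:, :k]
    nn_low = np.argsort(d_low, axis=1)[:, :k]
    overlaps = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(overlaps))


def pairwise_distance_pearson(X_high, X_low):
    """Pearson correlation between original and embedded pairwise distances."""
    upper = np.triu_indices(len(X_high), k=1)  # each pair counted once
    d_high = pairwise_distances(X_high)[upper]
    d_low = pairwise_distances(X_low)[upper]
    return float(np.corrcoef(d_high, d_low)[0, 1])


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))  # synthetic data: 100 points, N = 20 features
X_low = X[:, :2]                # stand-in "embedding" with M = 2 (NOT t-SNE/UMAP)

overlap = neighbor_overlap(X, X_low, k=10)
corr = pairwise_distance_pearson(X, X_low)
print(f"neighbor overlap: {overlap:.2f}, distance correlation: {corr:.2f}")
```

Both scores equal 1 for a perfectly faithful embedding, so values well below 1 indicate distortion. To evaluate a real t-SNE or UMAP embedding, `X_low` would be replaced by that method's output.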

Approaches

Principal component analysis

Principal component analysis (PCA) is a standard dimensionality reduction tool widely used in data science. Per (2), its interpretation requires the following assumptions:

The underlying patterns exist
The patterns exhibited in the data are independent of one another
The patterns combine linearly to form the data
The data contain exclusively these patterns and additive, uncorrelated noise
The observations are independent

Figure 1 from Shinn (2)
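As a minimal sketch of what PCA computes, the example below centers the data and projects it onto the top $M$ right singular vectors. The function name `pca_reduce` and the synthetic data are assumptions made for illustration; the data are deliberately built to match the linearity assumption above (two latent patterns mixed linearly, plus small additive noise).

```python
import numpy as np


def pca_reduce(X, M):
    """Project an (n_samples, N) matrix onto its top M principal components."""
    X = np.asarray(X, dtype=float)
    n_samples, N = X.shape
    if not 0 < M <= min(n_samples, N):
        raise ValueError("M must be between 1 and min(n_samples, N)")
    # Center each feature so the components describe variance, not the mean.
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal axes.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:M]                  # shape (M, N)
    scores = X_centered @ components.T   # shape (n_samples, M)
    explained_ratio = S[:M] ** 2 / np.sum(S ** 2)
    return scores, components, explained_ratio


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))    # two underlying patterns
mixing = rng.normal(size=(2, 5))      # linear mixing into N = 5 features
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))

scores, components, explained_ratio = pca_reduce(X, M=2)
print(scores.shape)            # reduced data: 200 samples, M = 2 dimensions
print(explained_ratio.sum())   # share of total variance kept by 2 components
```

Because the synthetic data satisfy the linear-mixing assumption, two components capture nearly all of the variance. When the assumptions do not hold, the components can still be computed, but per (2) their interpretation is no longer justified.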

t-distributed Stochastic Neighbor Embedding

Anecdotally, t-SNE is better suited to revealing local clusters and patterns.

Uniform Manifold Approximation and Projection

Anecdotally, UMAP is better suited to revealing global clusters and patterns.

Autoencoders

In practice, these are not used very often.

Variational autoencoders

See Variational autoencoders

1. Chari T, Pachter L. The specious art of single-cell genomics. PLOS Computational Biology. 2023;19(8):e1011288. Available from: https://doi.org/10.1371/journal.pcbi.1011288

2. Shinn M. Phantom oscillations in principal component analysis. Proceedings of the National Academy of Sciences. 2023;120(48). Available from: https://doi.org/10.1073/pnas.2311420120
