๐ŸƒUniform Manifold Approximation and Projection (UMAP)

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction method that aims to preserve both local and global structure. Here's a brief summary of how UMAP works:

  1. Construct a Neighborhood Graph: For each data point, identify its nearest neighbors using a distance metric (e.g., Euclidean distance). UMAP focuses on a local neighborhood approach, considering only a fixed number of nearest neighbors for each point.

  2. Fuzzy Simplicial Set Approximation: Convert the nearest-neighbor graph into a fuzzy simplicial set, which can be viewed as a weighted graph. Each edge weight represents the likelihood that the two data points are connected in the original high-dimensional space.

  3. Optimize Low-Dimensional Embedding: Optimize the low-dimensional representation of the data by minimizing the mismatch (a cross-entropy) between the fuzzy simplicial set built in the original data space and the one built in the low-dimensional space. This optimization is carried out with stochastic gradient descent.

  4. Preserve Global and Local Structure: Local structure is preserved by ensuring that nearby points in the original space remain close in the low-dimensional space. Global structure is preserved by maintaining the broader connectivity patterns in the data.
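Steps 1 and 2 above can be sketched in plain Python. This is a simplified illustration, not the umap-learn implementation: the function names (`knn`, `membership_sum`, `smooth_sigma`, `fuzzy_graph`) and the sample points are made up for this example, the neighbor search is brute force, and the optimization in step 3 is left out. It assumes the usual formulation in which each point's edge weights decay exponentially with the distance beyond its nearest neighbor, and the directed weights are merged with a fuzzy union.

```python
import math

def knn(points, k):
    """For each point, return its k nearest neighbours as (index, distance) pairs."""
    result = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([(j, d) for d, j in dists[:k]])
    return result

def membership_sum(dists, rho, sigma):
    """Sum of edge weights exp(-max(0, d - rho) / sigma) over one neighbourhood."""
    return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

def smooth_sigma(dists, rho, target, n_iter=64):
    """Binary-search the scale sigma so that the weights sum to `target`."""
    lo, hi = 0.0, 1.0
    while membership_sum(dists, rho, hi) < target:  # grow the upper bound first
        hi *= 2.0
    for _ in range(n_iter):
        mid = (lo + hi) / 2.0
        if membership_sum(dists, rho, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi

def fuzzy_graph(points, k):
    """Build symmetric edge weights: {(i, j): weight} with i < j."""
    directed = {}
    for i, nbrs in enumerate(knn(points, k)):
        dists = [d for _, d in nbrs]
        rho = dists[0]  # distance to the nearest neighbour (simplification)
        sigma = smooth_sigma(dists, rho, math.log2(k))
        for j, d in nbrs:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    graph = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        # Fuzzy union: an edge exists if it exists in either direction.
        graph[(min(i, j), max(i, j))] = a + b - a * b
    return graph

# Hypothetical sample data: two loose groups of 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (4.0, 4.0), (5.0, 4.5), (4.0, 7.0)]
graph = fuzzy_graph(points, k=3)
for (i, j), w in sorted(graph.items()):
    print(f"{i} - {j}: {w:.3f}")
```

Each point's nearest neighbor always gets a weight of 1, so every point stays connected to at least one other point, while more distant neighbors get smaller weights.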

To learn more about how UMAP works, check here.

UMAP also supports both unsupervised and supervised dimensionality reduction! Look at the code below; the only difference is in the call to fit_transform():

After the data has been reduced to 3 dimensions, the unsupervised output looks like this:

For the supervised output, let's look at the projections of both the training data and the testing data:

🌻 Check UMAP code here >>

After seeing all these methods, how do you plan to do dimensionality reduction in the future? You're more than welcome to share your ideas here!
