t-SNE

Similar to Isomap, t-SNE (t-Distributed Stochastic Neighbor Embedding) aims to project data into a lower-dimensional space while preserving the local neighborhood information. However, t-SNE takes a different approach to achieve this goal. Let's explore how t-SNE works:

  1. Similarity Calculation with Normal Distribution: For each data point, t-SNE centers a normal (Gaussian) distribution on that point and converts its distances to all other points into similarity scores, so nearby points score high and distant points score close to zero. Repeating this for every data point yields a matrix that records the similarity scores between all data pairs.

  2. Mapping to Lower Dimensions with t-Distribution: t-SNE then places the data points at random positions in a lower-dimensional space and recalculates the similarities using a t-distribution instead of a normal distribution. The t-distribution has a lower peak and heavier tails, so moderately distant points can sit further apart in the embedding instead of crowding together. This step produces a second similarity matrix in the lower-dimensional space.

  3. Minimizing KL Divergence: Finally, t-SNE uses gradient descent to minimize the Kullback-Leibler (KL) divergence between the two similarity matrices (the one from step 1 and the one from step 2). Through an iterative process, t-SNE adjusts the positions of the data points, moving them closer to their nearest neighbors while pushing them away from distant ones, as recorded in the first matrix. This optimization continues until it reaches the maximum number of iterations, resulting in a lower-dimensional representation that preserves the original local neighborhood structure.
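The three steps above can be sketched in plain Python. This is a simplified illustration, not a full t-SNE implementation: it assumes a single fixed Gaussian width `sigma` for every point (real t-SNE tunes the width per point from the perplexity), uses plain gradient descent without momentum or early exaggeration, and runs on a hypothetical toy dataset. All function names are made up for this example.

```python
import math
import random


def gaussian_similarities(points, sigma=1.0):
    """Step 1: pairwise similarities in the original space (Gaussian kernel)."""
    n = len(points)
    cond = []
    for i in range(n):
        # Unnormalized Gaussian weight of every other point j as seen from i
        w = [0.0 if i == j else
             math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([x / total for x in w])
    # Symmetrize so that the whole matrix sums to 1
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)]
            for i in range(n)]


def student_t_similarities(embedding):
    """Step 2: pairwise similarities in the low-dimensional space (t-distribution, 1 d.o.f.)."""
    n = len(embedding)
    w = [[0.0 if i == j else
          1.0 / (1.0 + math.dist(embedding[i], embedding[j]) ** 2)
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[x / total for x in row] for row in w]


def kl_divergence(P, Q, eps=1e-12):
    """Step 3 objective: KL(P || Q) summed over all pairs."""
    return sum(p * math.log((p + eps) / (q + eps))
               for row_p, row_q in zip(P, Q)
               for p, q in zip(row_p, row_q) if p > 0)


def gradient_step(P, Y, learning_rate=0.5):
    """One plain gradient-descent update of the embedding Y."""
    Q = student_t_similarities(Y)
    n, dim = len(Y), len(Y[0])
    new_Y = []
    for i in range(n):
        grad = [0.0] * dim
        for j in range(n):
            if i == j:
                continue
            inv = 1.0 / (1.0 + math.dist(Y[i], Y[j]) ** 2)
            coeff = 4.0 * (P[i][j] - Q[i][j]) * inv
            for d in range(dim):
                grad[d] += coeff * (Y[i][d] - Y[j][d])
        new_Y.append([Y[i][d] - learning_rate * grad[d] for d in range(dim)])
    return new_Y


random.seed(0)
# Hypothetical toy data: two small clusters in 3-D
points = [(0.0, 0.0, 0.0), (0.2, 0.1, 0.0), (0.1, 0.3, 0.1),
          (3.0, 3.0, 3.0), (3.1, 2.9, 3.2), (2.8, 3.1, 3.0)]
P = gaussian_similarities(points)
# Random starting positions in 2-D
Y = [[random.gauss(0, 0.1) for _ in range(2)] for _ in points]
kl_start = kl_divergence(P, student_t_similarities(Y))
for _ in range(300):  # fixed maximum number of iterations
    Y = gradient_step(P, Y)
kl_end = kl_divergence(P, student_t_similarities(Y))
print(f"KL before: {kl_start:.4f}, after: {kl_end:.4f}")
```

Points with a large entry in `P` but a small entry in `Q` get pulled together, and points with the opposite mismatch get pushed apart, which is the attraction and repulsion described in step 3.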

To understand more details of t-SNE, check here.

To reduce our campaign data into 3 dimensions using t-SNE, the code looks like:
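A minimal sketch with scikit-learn's `TSNE`, assuming the campaign data is already a numeric feature matrix. The random `campaign_features` array below is a hypothetical stand-in for that data, and the parameter values are illustrative rather than the ones used in this project:

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for the preprocessed campaign data:
# 150 rows, 10 numeric features
rng = np.random.default_rng(42)
campaign_features = rng.normal(size=(150, 10))

tsne = TSNE(
    n_components=3,   # target number of dimensions
    perplexity=30,    # must be smaller than the number of samples
    random_state=42,  # makes the stochastic embedding reproducible
)
embedding = tsne.fit_transform(campaign_features)
print(embedding.shape)
```

Note that t-SNE only offers `fit_transform`: the embedding is learned for the given rows and cannot be applied to new rows afterwards.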

An important t-SNE parameter is perplexity, which roughly sets how many neighbors each point takes into account and therefore how dense the neighborhoods are. Smaller values tend to produce a larger number of small groups, while larger values tend to produce fewer but more tightly packed groups. A common starting range is 5 to 50.
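To make the "number of neighbors" reading concrete: in t-SNE, the perplexity of a point's neighbor distribution is 2 raised to its Shannon entropy (in bits). The sketch below uses a hypothetical helper and made-up distributions to show that a point spreading its similarity evenly over k neighbors has perplexity k, while a point dominated by one neighbor has a much lower value:

```python
import math


def perplexity(probabilities):
    """Perplexity = 2 ** (Shannon entropy in bits)."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# Similarity spread evenly over 5 neighbors
uniform_over_5 = [0.2] * 5
# Similarity concentrated on a single neighbor
peaked = [0.9, 0.025, 0.025, 0.025, 0.025]

print(perplexity(uniform_over_5))
print(perplexity(peaked))
```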

The data plot looks like

🌻 Check t-SNE code here >>
