Simple But Surprisingly Good Clustering “Algorithm”

This is one of those jaw dropping papers.¬†Density Clustering, astonishingly simple and yet phonemically performance. No need to put it into math and hardly call it an “algorithm” (truly an algorithm). ¬†Here is how it works:

Given a distance/similarity matrix (pairwise distance/similarity for all data points) and a cutoff distance/similarity.

  1. For every point, count how many other points are within the cutoff distance. The count is the density of the current point.
  2. For every point, find all other points having a higher density. Among those with higher density, find the smallest distance, and use it as the current point’s distance.
  3. Plot density vs distance.
  4. All the outliers in the plot are cluster centers.
  5. Assign each point the cluster of its nearest neighbor.

No any fancy math, and it seems work. It has R library too. Well done.

Comments are closed.