When we can't linearly separate groups, such as concentrated circles, one way to solve a variety of such problems is with Spectral Clustering. We can build a similarity matrix which represents the distance between each pair of data points. Then we take the eigenvectors and eigenvalues, and we often find, that the data in the transformed space is linearly separable. Spectral Clustering is interesting in two says. First that it uses the similarity matrix(connection to graph theory), second that it looks at the eigenvectors of that.
Several Laplacians of the Similarity matrix can be made, including RW for Markovian/Random-Walk type Laplacian, and sym for symmetric.
The minimum eigenvalue for all 3 is 0 (up to numerical error), and they all have this eigenvalue. The multiplicity of eigenvalue 0 is related to the number of connected components in the similarity graph, and using the fully connected similarity graph the multiplicity is thus 1. The corresponding eigenvector is constant (for L_un and L_rw), and D^(1/2) for L_sym. The source says to use the first K eigenvectors (thus including the constant vector).
The similarity matrix is defined below. It's the exponential of the distance between points, with c factor taking the role of a user-defined inverse variance. Furthermore, we use the fully connected similarity matrix itself and thus set W = S.
For a fully connected similarity matrix, the sum for a vertex degree is over all points. We can write the entries of diagonal matrix D as
We can now fully calculate each Laplacian and test them. Once we represent the data using eigenvectors of these laplacians, we are hopefully able to cluster the data points linearly using Kmeans.
1) Concentric circles (K = 2)
This is a very simple and fun example where direct linear separation fails. However if we look at the similarity matrix and the eigenvectors of that, using the spectral clustering method, we can see if we succesfully manage to cluster the concentric circles.
RW Laplacian - Good result
Symmetric Laplacian - Good result
Unnormalized Laplacian - Good result
Clear separability using L_rw
Clear separability using L_sym
L_un separates succesfully
The eigenvalues corresponding to each Laplacian, for all N = 200 data points are seen below.
2) Noisy moons (K = 2)
As another example of data that is not linearly separable, we can look at noisy moons generated by Scikit-learn. These half-moons overlap and interlock.
L_rw shows succesful clustering
Clear separability using L_sym matrix
Clear separability using L_un matrix
L_rw shows succesful clustering
Clear separability using L_sym matrix
Clear separability using also L_un matrix
The eigenvalues corresponding to each Laplacian, for all N = 200 data points are seen below.
References
http://graphics.stanford.edu/courses/cs233-25-spring/ReferencedPapers/Luxburg07_spectral_clustering_tutorial_4488.pdf