Paper title: Kernel k-means, Spectral Clustering and Normalized Cuts
Summary
- The paper shows that the traditional kernel k-means method and spectral clustering, which at first glance appear unrelated, are in fact connected: through a series of derivations and theoretical arguments, the kernel k-means objective can also be expressed in the same trace-maximization form used by spectral clustering.
Problem Statement
- Linear k-means can only handle clusters that are linearly separable in the input space. Spectral clustering operates in the spectral (eigenvector) space and can therefore cluster data with non-linear structure, and k-means can likewise be lifted into a non-linear space through a non-linear kernel. Spectral clustering, however, requires eigenvector computations and is therefore computationally expensive.
Method
- Commonly used nonlinear kernel functions (the standard ones are listed after this section's bullets).
- The weighted kernel k-means objective (written out after this list).
- The weighted kernel k-means algorithm, which needs only the kernel matrix (see the code sketch after this list).
- The spectral clustering objective function.
- Establishing the connection between the two: write the cluster centroids in matrix form and substitute them back into the k-means objective.
- Minimizing the within-cluster distortion can then be rewritten as maximizing a trace, i.e., the same form as the spectral clustering objective (see the trace form after this list).
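For reference, these are the standard nonlinear kernels usually listed in this setting (textbook definitions in my own notation; the exact parameterization in the paper may differ):
$$
\begin{aligned}
\text{polynomial:}\quad & \kappa(\mathbf{a}_i,\mathbf{a}_j) = (\mathbf{a}_i\cdot\mathbf{a}_j + c)^d \\
\text{Gaussian:}\quad & \kappa(\mathbf{a}_i,\mathbf{a}_j) = \exp\!\big(-\|\mathbf{a}_i-\mathbf{a}_j\|^2 / 2\sigma^2\big) \\
\text{sigmoid:}\quad & \kappa(\mathbf{a}_i,\mathbf{a}_j) = \tanh\!\big(c\,(\mathbf{a}_i\cdot\mathbf{a}_j) + \theta\big)
\end{aligned}
$$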
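The weighted kernel k-means objective, reconstructed here in my own notation rather than copied from the paper: given a nonlinear map $\phi$ and nonnegative point weights $w_i$, find a partition $\pi_1,\dots,\pi_k$ minimizing
$$
D(\{\pi_c\}_{c=1}^{k}) \;=\; \sum_{c=1}^{k} \sum_{\mathbf{a}_i \in \pi_c} w_i \,\bigl\|\phi(\mathbf{a}_i) - \mathbf{m}_c\bigr\|^2,
\qquad
\mathbf{m}_c \;=\; \frac{\sum_{\mathbf{a}_j \in \pi_c} w_j \,\phi(\mathbf{a}_j)}{\sum_{\mathbf{a}_j \in \pi_c} w_j}.
$$
Expanding the squared distance shows that only inner products $\phi(\mathbf{a}_i)\cdot\phi(\mathbf{a}_j) = K_{ij}$ are needed, so $\phi$ never has to be formed explicitly.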
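A minimal runnable sketch of the weighted kernel k-means iteration, working directly on a precomputed kernel matrix; the function name, parameters, and the empty-cluster handling are my own illustrative choices, not the paper's reference implementation:

```python
import numpy as np


def weighted_kernel_kmeans(K, k, w=None, n_iter=100, seed=0):
    """Weighted kernel k-means on a precomputed kernel matrix K (n x n).

    Distances to the implicit centroids use only kernel values:
        ||phi(a_i) - m_c||^2 = K_ii
                               - 2 * sum_{j in c} w_j K_ij / s_c
                               + sum_{j,l in c} w_j w_l K_jl / s_c^2,
    where s_c is the total weight of cluster c.
    """
    n = K.shape[0]
    w = np.ones(n) if w is None else np.asarray(w, dtype=float)
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)            # random initial assignment
    diag = np.diag(K)

    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            mask = labels == c
            if not mask.any():                     # skip clusters that became empty
                continue
            wc = w[mask]
            s = wc.sum()
            cross = K[:, mask] @ wc / s            # sum_j w_j K_ij / s_c
            within = wc @ K[np.ix_(mask, mask)] @ wc / (s * s)
            dist[:, c] = diag - 2.0 * cross + within
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):     # assignments stable: converged
            break
        labels = new_labels
    return labels
```

With all weights equal to one and a linear kernel `K = X @ X.T`, this reduces to ordinary k-means; choosing the kernel and weights from a graph affinity matrix gives the normalized-cut behaviour discussed next.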
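Finally, a sketch of the trace form that makes the link to spectral clustering explicit (again in my notation, as the result is usually stated): let $K$ be the kernel matrix, $W = \mathrm{diag}(w_1,\dots,w_n)$, and let $Y$ be the weighted, orthonormal cluster-indicator matrix obtained from the centroid matrix. Substituting the centroids back into the objective gives
$$
\min_{\{\pi_c\}} D(\{\pi_c\}) \;\Longleftrightarrow\; \max_{Y^{\top}Y = I_k} \operatorname{trace}\!\bigl(Y^{\top} W^{1/2} K W^{1/2} Y\bigr),
$$
whose relaxed solution is given by the top $k$ eigenvectors of $W^{1/2} K W^{1/2}$. With a kernel derived from the graph affinity matrix and the node degrees as weights, this becomes the same trace maximization solved by normalized-cut spectral clustering.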
Evaluation
- The evaluation compares the methods mainly in terms of algorithmic complexity and running time, demonstrating that the algorithm is both effective and efficient.
Conclusion
- Kernel k-means and spectral clustering are related methods, and under certain conditions they can be shown to be equivalent.
Notes
- Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space.
- A major drawback to k-means is that it cannot separate clusters that are non-linearly separable in input space.
- Clustering has received a significant amount of attention in the last few years as one of the fundamental problems in data mining. k-means is one of the most popular clustering algorithms.