菜单

关于 🐙 GitHub
arXiv 提交日期: 2026-02-10
📄 Abstract - The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition - even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.

顶级标签: machine learning theory model evaluation
详细标签: clustering k-means high-dimensional data algorithm analysis fixed points 或 搜索:

k均值算法在高维空间的灾难性失效及哈蒂根算法如何避免它 / The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It


1️⃣ 一句话总结

这篇论文证明,在高维、高噪声数据中,经典的Lloyd k均值算法会完全失效,几乎任何初始划分都会成为最终结果,而Hartigan的k均值算法则能避免这个问题,从而解释了为何高维数据下k均值效果不佳。

源自 arXiv: 2602.09936