← 返回列表

arXiv 提交日期: 2026-02-10

📄 Abstract - The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high probability, essentially every partition of the data is a fixed point. Consequently, Lloyd's algorithm simply returns its initial partition - even when the underlying clusters are trivially recoverable by other methods. In contrast, we prove that Hartigan's k-means algorithm does not exhibit this pathology. Our results show the stark difference between these algorithms and offer a theoretical explanation for the empirical difficulties often observed with k-means in high dimensions.

顶级标签: machine learning theory model evaluation

k均值算法在高维空间的灾难性失效及哈蒂根算法如何避免它 / The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It

1️⃣ 一句话总结

这篇论文证明，在高维、高噪声数据中，经典的Lloyd k均值算法会完全失效，几乎任何初始划分都会成为最终结果，而Hartigan的k均值算法则能避免这个问题，从而解释了为何高维数据下k均值效果不佳。

👋 没兴趣 ☆ 感兴趣 📌 待读

打开原文 PDF

源自 arXiv: 2602.09936

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要