The elbow statistic: Multiscale clustering statistical significance

📄 Abstract - The elbow statistic: Multiscale clustering statistical significance

Selecting the number of clusters remains a fundamental challenge in unsupervised learning. Existing criteria typically target a single "optimal" partition, often overlooking statistically meaningful structure present at multiple resolutions. We introduce ElbowSig, a framework that formalizes the heuristic "elbow" method as a rigorous inferential problem. Our approach centers on a normalized discrete curvature statistic derived from the cluster heterogeneity sequence, which is evaluated against a null distribution of unstructured data. We derive the asymptotic properties of this null statistic in both large-sample and high-dimensional regimes, characterizing its baseline behavior and stochastic variability. As an algorithm-agnostic procedure, ElbowSig requires only the heterogeneity sequence and is compatible with a wide range of clustering methods, including hard, fuzzy, and model-based clustering. Extensive experiments on synthetic and empirical datasets demonstrate that the method maintains appropriate Type-I error control while providing the power to resolve multiscale organizational structures that are typically obscured by single-resolution selection criteria.
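To make the idea concrete, the following is a minimal sketch of a curvature-versus-null comparison. It is not the authors' implementation. The abstract does not give the normalization, so dividing the second difference by the first heterogeneity value is an assumption made here for illustration. The Monte Carlo p-value is likewise a stand-in for the asymptotic null theory the paper derives. The names `discrete_curvature` and `elbow_pvalues` are hypothetical.

```python
from typing import List, Sequence


def discrete_curvature(w: Sequence[float]) -> List[float]:
    """Second difference of a heterogeneity sequence w[0], w[1], ...

    w[i] is the heterogeneity (e.g. within-cluster sum of squares) for
    k = i + 1 clusters. Dividing by w[0] is an ASSUMED normalization,
    not necessarily the one used by ElbowSig.
    Returns one value per interior k (k = 2 .. len(w) - 1).
    """
    return [(w[i - 1] - 2 * w[i] + w[i + 1]) / w[0] for i in range(1, len(w) - 1)]


def elbow_pvalues(
    observed: Sequence[float],
    null_sequences: Sequence[Sequence[float]],
) -> List[float]:
    """Monte Carlo p-value per interior k.

    null_sequences holds heterogeneity sequences obtained by running the
    same clustering method on unstructured reference data (how that data
    is generated is left to the caller). A large curvature relative to
    the null suggests a significant elbow at that k.
    """
    obs = discrete_curvature(observed)
    null = [discrete_curvature(s) for s in null_sequences]
    pvals = []
    for j, c in enumerate(obs):
        # Count null draws at least as extreme as the observed curvature.
        exceed = sum(1 for n in null if n[j] >= c)
        # Add-one correction keeps the p-value strictly positive.
        pvals.append((1 + exceed) / (1 + len(null)))
    return pvals


if __name__ == "__main__":
    observed = [100, 40, 30, 25]               # sharp drop from k=1 to k=2
    null = [[100, 70, 50, 35], [100, 75, 55, 40]]
    print(discrete_curvature(observed))
    print(elbow_pvalues(observed, null))
```

Because the functions take only heterogeneity sequences as input, the sketch stays independent of the clustering algorithm, in line with the algorithm-agnostic design described above. Testing several values of k at once would in practice call for a multiple-comparison adjustment, which this sketch omits.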

1️⃣ One-sentence summary

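This paper proposes ElbowSig, a general framework that recasts the widely used "elbow method" heuristic as a rigorous statistical inference problem, so that statistically significant cluster structure can be assessed and discovered at multiple scales rather than searching only for a single "optimal" partition.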
