Abstract - Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning
In this article we develop a new method for summarizing a ranking distribution, \textit{i.e.} a probability distribution on the symmetric group $\mathfrak{S}_n$, beyond the classical theory of consensus and Kemeny medians. Based on the notion of \textit{local ranking median}, we introduce the concept of \textit{consensus ranking distribution} (CRD), a sparse mixture model of Dirac masses on $\mathfrak{S}_n$, in order to approximate a ranking distribution with small distortion from a mass transportation perspective. We prove that by choosing the popular Kendall $\tau$ distance as the cost function, the optimal distortion can be expressed as a function of pairwise probabilities, paving the way for the development of efficient learning methods that do not suffer from the lack of vector space structure on $\mathfrak{S}_n$. In particular, we propose a top-down tree-structured statistical algorithm that allows for the progressive refinement of a CRD based on ranking data, from the Dirac mass at a Kemeny median at the root of the tree to the empirical ranking data distribution itself at the end of the tree's exhaustive growth. In addition to the theoretical arguments developed, the relevance of the algorithm is empirically supported by various numerical experiments.
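To make the role of pairwise probabilities concrete, here is a minimal Python sketch. It is not the paper's tree-structured algorithm. It only illustrates the standard identity that the expected Kendall $\tau$ distance from a fixed ranking to a random ranking depends on the distribution solely through the pairwise probabilities $p_{i,j} = P(\Sigma(i) < \Sigma(j))$. All function names, the toy distribution, and the convention that `sigma[i]` is the rank given to item `i` are assumptions made for illustration. The brute-force median search is only practical for very small $n$.

```python
from itertools import combinations, permutations


def kendall_tau(sigma, pi):
    """Count the item pairs that the two rankings order differently.

    Convention (an assumption of this sketch): sigma[i] is the rank of item i.
    """
    return sum(
        (sigma[i] - sigma[j]) * (pi[i] - pi[j]) < 0
        for i, j in combinations(range(len(sigma)), 2)
    )


def pairwise_probabilities(dist):
    """Return p[(i, j)] = P(Sigma(i) < Sigma(j)) for all i < j.

    dist maps each ranking (a tuple of ranks) to its probability.
    """
    n = len(next(iter(dist)))
    return {
        (i, j): sum(w for s, w in dist.items() if s[i] < s[j])
        for i, j in combinations(range(n), 2)
    }


def risk_from_pairs(sigma, p):
    """Expected Kendall tau distance to sigma, using only pairwise probabilities.

    If sigma ranks i before j, a disagreement happens with probability 1 - p_ij;
    otherwise it happens with probability p_ij.
    """
    return sum(
        (1 - pij) if sigma[i] < sigma[j] else pij
        for (i, j), pij in p.items()
    )


def risk_direct(sigma, dist):
    """Same expectation, computed by averaging over the whole distribution."""
    return sum(w * kendall_tau(sigma, s) for s, w in dist.items())


def kemeny_median(dist):
    """Brute-force Kemeny median: the ranking with the smallest expected distance."""
    n = len(next(iter(dist)))
    p = pairwise_probabilities(dist)
    return min(permutations(range(n)), key=lambda s: risk_from_pairs(s, p))


# Hypothetical toy distribution on rankings of 3 items.
toy = {(0, 1, 2): 0.5, (0, 2, 1): 0.3, (1, 0, 2): 0.2}
median = kemeny_median(toy)
print(median, risk_from_pairs(median, pairwise_probabilities(toy)))
```

The Dirac mass at `median` corresponds to the coarsest summary mentioned in the abstract, the one at the root of the tree. Since `risk_from_pairs` never touches the full distribution, the same quantity can be estimated from data by plugging in empirical pairwise frequencies.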
Beyond Kemeny Medians: Consensus Ranking Distributions Definition, Properties and Statistical Learning
1️⃣ One-sentence summary
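This paper proposes a new method for summarizing a probability distribution over rankings: it approximates the original distribution with a sparse mixture model called a "consensus ranking distribution" and, based on the Kendall τ distance, designs an efficient tree-structured statistical learning algorithm, thereby overcoming the limitations of the traditional Kemeny median approach and learning more effectively from ranking data.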