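Magnitude Distance: A Geometric Measure of Dataset Similarity
1️⃣ One-sentence summary
This paper proposes a new method, called "magnitude distance", for measuring the similarity between datasets. A tunable parameter lets it capture either the global structure or the local detail of the data; it remains effective on high-dimensional data and can also be used to train generative models.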
Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance metric defined on finite datasets using the notion of the magnitude of a metric space. The proposed distance incorporates a tunable scaling parameter, $t$, that controls the sensitivity to global structure (small $t$) and finer details (large $t$). We prove several theoretical properties of magnitude distance, including its limiting behavior across scales and conditions under which it satisfies key metric properties. In contrast to classical distances, we show that magnitude distance remains discriminative in high-dimensional settings when the scale is appropriately tuned. We further demonstrate how magnitude distance can be used as a training objective for push-forward generative models. Our experimental results support our theoretical analysis and demonstrate that magnitude distance provides meaningful signals, comparable to established distance-based generative approaches.
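To make the role of the scale parameter $t$ concrete, here is a minimal sketch of the magnitude of a finite metric space, the quantity the distance is built on. It uses the standard definition: form the similarity matrix $Z_{ij} = e^{-t\,d(x_i, x_j)}$, solve $Zw = \mathbf{1}$ for the weighting $w$, and sum its entries. The abstract does not state the formula for magnitude distance itself, so this sketch does not attempt it. The function name `magnitude` and the choice of the Euclidean metric are assumptions made for illustration.

```python
import numpy as np


def magnitude(points, t):
    """Magnitude of a finite point set at scale t (hypothetical helper).

    Assumes the Euclidean metric and distinct points, for which the
    similarity matrix is positive definite and the solve is well posed.
    """
    X = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances d(x_i, x_j) via broadcasting.
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Similarity matrix Z_ij = exp(-t * d_ij).
    Z = np.exp(-t * dists)
    # Weighting w solves Z w = 1; the magnitude is the sum of its entries.
    w = np.linalg.solve(Z, np.ones(len(X)))
    return float(w.sum())


# Three collinear points on the real line.
pts = [[0.0], [1.0], [2.0]]
for t in (1e-3, 1.0, 1e3):
    print(t, magnitude(pts, t))
```

At small $t$ all points look nearly identical and the magnitude approaches 1, while at large $t$ the points look fully separated and the magnitude approaches the number of points, which matches the global-versus-fine-detail reading of $t$ in the abstract. For two points at distance $d$ the value has the closed form $2 / (1 + e^{-td})$.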
Source: arXiv: 2602.08859