arXiv submission date: 2026-05-04
📄 Abstract - Measuring Differences between Conditional Distributions using Kernel Embeddings

Comparing conditional distributions is a fundamental challenge in statistics and machine learning, with applications across a wide range of domains. While methods for measuring discrepancies using kernel embeddings of distributions in a reproducing kernel Hilbert space (RKHS) provide powerful non-parametric techniques, the existing literature remains fragmented and lacks a unified theoretical treatment. This paper addresses this gap by establishing a coherent framework for studying kernel-based methods that measure divergence between conditional distributions through what we refer to as the conditional maximum mean discrepancy (CMMD). The CMMD consists of a family of metrics, which we call levels, with three special cases, each using a different type of RKHS embedding: CMMD$_0$ (conditional mean operators), CMMD$_1$ (conditional mean embeddings), and CMMD$_2$ (joint mean embeddings). We additionally introduce a general level-$s$ CMMD, clarifying the required assumptions and establishing mathematical connections between the levels through the lens of operator-based smoothing. In addition to reviewing previously proposed estimators, we introduce a novel doubly robust estimator for the CMMD that remains consistent provided at least one of the underlying models is correctly specified. We provide numerical experiments demonstrating that the CMMD effectively captures complex conditional dependencies for statistical testing.
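
As a concrete illustration of the CMMD$_1$ idea, the sketch below estimates each sample's conditional mean embedding $\hat{\mu}_{Y\mid X=x}$ via kernel ridge regression and averages the squared RKHS distance between the two estimates over a grid of evaluation points. This is a minimal plug-in sketch under common conventions for conditional mean embeddings, not the paper's exact estimator; the RBF kernels, the ridge parameter `lam`, the evaluation grid, and all function names are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian RBF kernel matrix: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def cmmd1_plugin(X1, Y1, X2, Y2, X_eval, gamma_x=1.0, gamma_y=1.0, lam=1e-2):
    """Plug-in CMMD_1-style statistic (illustrative sketch): average squared
    RKHS distance between kernel-ridge estimates of the two conditional mean
    embeddings mu(Y | X = x), evaluated over the points in X_eval."""
    n1, n2 = len(X1), len(X2)
    # Ridge-regularized weights alpha(x) = (K_XX + n * lam * I)^{-1} k_X(x),
    # so that mu_hat(Y | X = x) = sum_i alpha_i(x) * l(Y_i, .)
    A = np.linalg.solve(rbf_kernel(X1, X1, gamma_x) + n1 * lam * np.eye(n1),
                        rbf_kernel(X1, X_eval, gamma_x))          # (n1, m)
    B = np.linalg.solve(rbf_kernel(X2, X2, gamma_x) + n2 * lam * np.eye(n2),
                        rbf_kernel(X2, X_eval, gamma_x))          # (n2, m)
    # Output-space kernel matrices over the Y samples
    L11 = rbf_kernel(Y1, Y1, gamma_y)
    L12 = rbf_kernel(Y1, Y2, gamma_y)
    L22 = rbf_kernel(Y2, Y2, gamma_y)
    # ||mu1(x) - mu2(x)||^2 = a'L11 a - 2 a'L12 b + b'L22 b at each x
    d2 = (np.einsum('im,ij,jm->m', A, L11, A)
          - 2.0 * np.einsum('im,ij,jm->m', A, L12, B)
          + np.einsum('im,ij,jm->m', B, L22, B))
    return d2.mean()  # average over the evaluation grid

# Toy usage: two samples drawn from the same conditional law, so the
# statistic should be near zero.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(200, 1)); Y1 = np.sin(X1) + 0.1 * rng.normal(size=X1.shape)
X2 = rng.normal(size=(200, 1)); Y2 = np.sin(X2) + 0.1 * rng.normal(size=X2.shape)
X_eval = np.linspace(-2.0, 2.0, 50)[:, None]
print(cmmd1_plugin(X1, Y1, X2, Y2, X_eval))
```

The choice of how to weight the evaluation points (here a uniform grid) matters in practice; integrating against the marginal distribution of $X$ is another common convention.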

Top-level tags: machine learning theory
Detailed tags: kernel embeddings, conditional distributions, maximum mean discrepancy, operator smoothing, distribution divergence

Measuring Differences between Conditional Distributions using Kernel Embeddings


1️⃣ One-Sentence Summary

This paper proposes a unified framework, the conditional maximum mean discrepancy (CMMD), for measuring the difference between two conditional distributions via kernel methods in a reproducing kernel Hilbert space, and introduces a doubly robust estimator that remains consistent even if some of the underlying models are misspecified, allowing statistical tests to effectively capture complex conditional dependencies.

Source: arXiv:2605.02260