Exponential families from a single KL identity
1️⃣ One-sentence summary
This paper isolates a concise KL-divergence identity and shows that, using only this identity and the non-negativity of KL divergence, without any heavier mathematical machinery, one can uniformly derive several core results about exponential families, including the Gibbs variational principle, the Pythagorean projection theorems, and the exponential tilting formula used in entropy-regularized reinforcement learning.
Exponential families encompass the distributions central to modern machine learning -- softmax, Gaussians, and Boltzmann distributions -- and underlie the theory of variational inference, entropy-regularized reinforcement learning, and RLHF. We isolate a simple identity for exponential families that expresses the KL difference $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1})$ in terms of the log-partition function $A(\lambda)$ and the moment $\mu_q$. Remarkably, this identity together with the single fact that $\mathrm{KL} \geq 0$ (with equality iff $p = q$) suffices, by direct substitution and rearrangement, to derive a cluster of results that are classically obtained by separate, heavier arguments: a generalized three-point identity for arbitrary reference distributions, Pythagorean theorems for I-projections and reverse I-projections, convexity of the log-partition function, identification of its Legendre dual in KL terms, the Gibbs variational principle, and the explicit optimizer in KL-regularized reward maximization, including the exponential tilting formula underlying entropy-regularized control and RLHF. Beyond these purely algebraic consequences, standard analytic arguments recover the gradient formula for the log-partition function, the Bregman representation of within-family KL divergence, and the surjectivity of the moment map. The note is self-contained.
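The central identity can be checked numerically. For an exponential family $p_\lambda(x) = h(x)\exp(\lambda \cdot T(x) - A(\lambda))$, expanding the definition of KL divergence gives $\mathrm{KL}(q \| p_{\lambda_2}) - \mathrm{KL}(q \| p_{\lambda_1}) = A(\lambda_2) - A(\lambda_1) - (\lambda_2 - \lambda_1)\,\mu_q$, since the $\mathbb{E}_q[\log q]$ and $\mathbb{E}_q[\log h]$ terms cancel. The sketch below verifies this for the Bernoulli family in natural (logit) parameterization, where $T(x) = x$, $A(\lambda) = \log(1 + e^\lambda)$, and $\mu_q = \mathbb{E}_q[T(x)] = q$; the specific parameter values are illustrative choices, not from the paper.

```python
import math

def A(lam):
    # Log-partition function of the Bernoulli family: A(lam) = log(1 + e^lam)
    return math.log1p(math.exp(lam))

def kl_bernoulli(q, lam):
    # KL(q || p_lam) computed directly, where p_lam = sigmoid(lam)
    p = 1.0 / (1.0 + math.exp(-lam))
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

q = 0.3                 # mean parameter of q; here mu_q = q since T(x) = x
lam1, lam2 = -0.5, 1.2  # two arbitrary natural parameters

# Left side: the KL difference computed from the definition
lhs = kl_bernoulli(q, lam2) - kl_bernoulli(q, lam1)
# Right side: the identity, using only A(lambda) and the moment mu_q
rhs = A(lam2) - A(lam1) - (lam2 - lam1) * q

assert abs(lhs - rhs) < 1e-12
```

Because the $q$-dependent entropy terms cancel in the difference, the right-hand side depends on $q$ only through its moment $\mu_q$, which is exactly what makes the identity so broadly applicable.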
Source: arXiv: 2604.28036