arXiv submission date: 2026-03-01
📄 Abstract - Probabilistic Learning and Generation in Deep Sequence Models

Despite the exceptional predictive performance of deep sequence models (DSMs), the main concern about deploying them is their lack of uncertainty awareness. In contrast, probabilistic models quantify the uncertainty associated with unobserved variables using the rules of probability. Notably, Bayesian methods leverage Bayes' rule to express beliefs about unobserved variables in a principled way. Since exact Bayesian inference is computationally infeasible at scale, approximate inference is required in practice. Two major bottlenecks of Bayesian methods, especially when applied to deep neural networks, are prior specification and approximation quality. In Chapters 3 and 4, we investigate how the architectures of DSMs themselves can inform the design of priors or approximations in probabilistic models. We first develop an approximate Bayesian inference method tailored to the Transformer, based on the similarity between attention and sparse Gaussian processes. Next, we exploit the long-range memory-preservation capability of HiPPOs (High-order Polynomial Projection Operators) to construct interdomain inducing points for Gaussian processes, which successfully memorize history in online learning. Beyond the progress of DSMs on predictive tasks, sequential generative models built from a sequence of latent variables have become popular in the domain of deep generative models. Inspired by the explicit self-supervised signals that diffusion models provide for these latent variables, in Chapter 5 we explore the possibility of improving other generative models with self-supervision for their sequential latent states, and investigate desirable probabilistic structures over them. Overall, this thesis leverages inductive biases in DSMs to design probabilistic inference methods and structures, bridging the gap between DSMs and probabilistic models and leading to mutually reinforcing improvements.
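The attention/sparse-GP similarity the abstract alludes to can be sketched numerically: cross-attention over m key-value pairs and a sparse GP predictive mean over m inducing points are both input-dependent linear combinations of m "values". The snippet below is a minimal illustration under assumed choices (RBF kernel, random toy data); it shows the shared structure, not the thesis's actual construction in Chapter 3.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: n query inputs, m << n inducing points, feature dim d.
n, m, d = 8, 3, 4
X = rng.normal(size=(n, d))   # query inputs
Z = rng.normal(size=(m, d))   # inducing inputs (play the role of keys)
u = rng.normal(size=(m,))     # inducing outputs (play the role of values)

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

# Attention view: each query forms normalized similarities to the m
# key points and outputs a weighted average of their values.
weights = softmax(X @ Z.T / np.sqrt(d))    # (n, m), rows sum to 1
attn_out = weights @ u                      # (n,)

def rbf(A, B, ell=1.0):
    # Squared-exponential kernel between row sets A and B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / ell**2)

# Sparse-GP view: the predictive mean at X given the inducing set is
# K_xz @ K_zz^{-1} @ u -- again an (n, m) weighting applied to u.
K_xz = rbf(X, Z)
K_zz = rbf(Z, Z) + 1e-6 * np.eye(m)        # jitter for stability
gp_mean = K_xz @ np.linalg.solve(K_zz, u)  # (n,)

# Both outputs combine the same m values with input-dependent weights;
# they differ only in how the weights are computed.
print(attn_out.shape, gp_mean.shape)
```

In both views the m-point bottleneck is what makes computation scale linearly in n, which is why the inducing-point machinery of sparse GPs maps so naturally onto attention.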

Top-level tags: machine learning, model training, theory
Detailed tags: probabilistic modeling, bayesian inference, deep sequence models, generative models, uncertainty quantification

Probabilistic Learning and Generation in Deep Sequence Models


1️⃣ One-sentence summary

By exploiting the intrinsic structural properties of deep sequence models (such as the Transformer) to design more effective priors and approximate inference methods for probabilistic models, this thesis bridges the gap between high-performing but uncertainty-unaware deep models and probabilistic models that can quantify uncertainty, allowing the two to reinforce each other.

Source: arXiv 2603.00888