arXiv submission date: 2026-04-09
📄 Abstract - Hidden Biases in Conditioning Autoregressive Models

Large language and music models are increasingly used for constrained generation: rhyming lines, fixed meter, inpainting or infilling, positional endings, and other global form requirements. These systems often perform strikingly well, but the induced procedures are usually not exact conditioning of the underlying autoregressive model. This creates a hidden inferential bias, distinct from the better-known notion of bias inherited from the training set: samples are distorted relative to the true constrained distribution, with no generic guarantee of complete coverage of the admissible solution space or of correct conditional probabilities over valid completions. We formalize several exact inference tasks for autoregressive models and prove corresponding hardness results. For succinctly represented autoregressive models whose next-token probabilities are computable in polynomial time, exact sentence-level maximum a posteriori (MAP) decoding is NP-hard. This hardness persists under unary and metrical constraints. On the sampling side, exact conditioned normalization is #P-hard even for regular constraints such as fixed-length terminal events. Unlike finite-state Markov models, general autoregressive models do not admit a bounded-state dynamic program for these tasks. These results formalize a standard claim in the neural decoding literature: local autoregressive sampling is easy, whereas exact decoding and exact conditioning under global form constraints are computationally intractable in general.
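The tractability gap named in the abstract can be illustrated with a toy sketch: for any model whose next-token probabilities are cheap to compute, drawing an unconstrained sample costs one model call per token, whereas computing the exact normalization constant of a global constraint by brute force must sum over exponentially many sequences. Everything below (the `next_token_probs` rule, the three-token vocabulary) is an illustrative assumption, not the paper's construction.

```python
import itertools
import random

VOCAB = ["a", "b", "c"]

def next_token_probs(prefix):
    # Toy autoregressive model (hypothetical): any polynomial-time
    # next-token rule works for the argument. Here the favored token
    # depends on the prefix length and its count of "a".
    bias = (len(prefix) + prefix.count("a")) % 3
    weights = [1 + (i == bias) for i in range(len(VOCAB))]
    total = sum(weights)
    return {t: w / total for t, w in zip(VOCAB, weights)}

def sample(length, rng=None):
    # Local autoregressive sampling: O(length) model calls.
    rng = rng or random.Random(0)
    prefix = []
    for _ in range(length):
        probs = next_token_probs(prefix)
        prefix.append(rng.choices(VOCAB, weights=[probs[t] for t in VOCAB])[0])
    return prefix

def exact_constrained_mass(length, constraint):
    # Exact normalization for a global constraint: the brute-force
    # sum ranges over |VOCAB| ** length sequences, i.e. exponential
    # in the sequence length.
    Z = 0.0
    for seq in itertools.product(VOCAB, repeat=length):
        p = 1.0
        for i in range(length):
            p *= next_token_probs(list(seq[:i]))[seq[i]]
        if constraint(seq):
            Z += p
    return Z

# A "regular" global constraint: the sequence must end with "c".
Z = exact_constrained_mass(6, lambda s: s[-1] == "c")
```

Dividing the mass of each admissible sequence by `Z` would give the true constrained distribution; the point of the hardness results is that no generic shortcut around this exponential sum exists for general autoregressive models.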

Top-level tags: llm theory model evaluation
Detailed tags: constrained generation inference hardness autoregressive models computational complexity conditional sampling

Hidden Biases in Conditioning Autoregressive Models


1️⃣ One-Sentence Summary

This paper shows that when large language and music models are used for generation tasks with global form requirements (such as rhyme or fixed structure), the approximate methods commonly employed introduce a hidden inferential bias, so that generated samples deviate from the true constrained distribution; it then proves theoretically that exact constrained sampling and exact decoding are computationally intractable in general.

Source: arXiv: 2604.07855