arXiv submission date: 2025-12-31
📄 Abstract - Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally predictable spans while under-allocating computation to semantically critical transitions. We propose $\textbf{Dynamic Large Concept Models (DLCM)}$, a hierarchical language modeling framework that learns semantic boundaries from latent representations and shifts computation from tokens to a compressed concept space where reasoning is more efficient. DLCM discovers variable-length concepts end-to-end without relying on predefined linguistic units. Hierarchical compression fundamentally changes scaling behavior. We introduce the first $\textbf{compression-aware scaling law}$, which disentangles token-level capacity, concept-level reasoning capacity, and compression ratio, enabling principled compute allocation under fixed FLOPs. To stably train this heterogeneous architecture, we further develop a $\textbf{decoupled $\mu$P parametrization}$ that supports zero-shot hyperparameter transfer across widths and compression regimes. At a practical setting ($R=4$, corresponding to an average of four tokens per concept), DLCM reallocates roughly one-third of inference compute into a higher-capacity reasoning backbone, achieving a $\textbf{+2.69$\%$ average improvement}$ across 12 zero-shot benchmarks under matched inference FLOPs.
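To make the compute-reallocation arithmetic concrete, below is a minimal sketch under a toy cost model that is not taken from the paper: assume total per-token inference FLOPs split into a token-level share `f_token` and a concept-level share `f_concept = 1 - f_token`, and that the concept backbone's per-token cost scales as 1/R once R tokens are merged into one concept. The value `f_concept = 0.45` in the usage example is hypothetical; under these assumptions, compressing at R = 4 frees roughly one-third of total inference FLOPs, consistent with the abstract's figure.

```python
# Hedged sketch of the FLOPs-reallocation arithmetic behind "R tokens per concept".
# Assumptions (not from the paper): per-token inference cost splits into a token-level
# share f_token and a concept-level share f_concept = 1 - f_token, and the concept
# backbone's per-token cost scales as 1/R once R tokens are merged into one concept.

def freed_compute_fraction(f_concept: float, R: float) -> float:
    """Fraction of total per-token FLOPs freed by running the concept backbone
    on N/R concepts instead of N tokens, under the toy cost model above."""
    return f_concept * (1.0 - 1.0 / R)

def backbone_scale_at_matched_flops(R: float) -> float:
    """Maximum growth factor (in per-concept FLOPs) for the reasoning backbone
    at unchanged total per-token FLOPs: reinvesting the entire freed budget
    lets the backbone grow by a factor of R in this toy accounting."""
    return R

if __name__ == "__main__":
    R = 4.0           # average of four tokens per concept, as in the abstract
    f_concept = 0.45  # hypothetical baseline share of compute in the backbone
    print(f"freed fraction of inference FLOPs: {freed_compute_fraction(f_concept, R):.2f}")
    print(f"max backbone growth at matched FLOPs: {backbone_scale_at_matched_flops(R):.1f}x")
```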

Top-level tags: llm model training theory
Detailed tags: scaling laws hierarchical compression semantic concepts efficient inference parameterization

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space


1️⃣ One-Sentence Summary

This paper proposes a new framework called Dynamic Large Concept Models, which automatically compresses text into higher-level "concepts" to reallocate compute, significantly improving large language models' reasoning performance across multiple tasks at the same inference cost.

Source: arXiv 2512.24617