arXiv submission date: 2026-02-10
📄 Abstract - A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models

How can children acquire native-level syntax from limited input? According to the Poverty of the Stimulus Hypothesis (PoSH), the linguistic input children receive is insufficient to explain certain generalizations that are robustly learned; innate linguistic constraints, many have argued, are thus necessary to explain language learning. Neural language models, which lack such language-specific constraints in their design, offer a computational test of this longstanding (but controversial) claim. We introduce PoSHBench, a training-and-evaluation suite targeting question formation, islands to movement, and other English phenomena at the center of the PoSH arguments. Training Transformer models on 10–50M words of developmentally plausible text, we find indications of generalization on all phenomena even without direct positive evidence, yet neural models remain less data-efficient and their generalizations are weaker than those of children. We further enhance our models with three recently proposed cognitively motivated inductive biases. We find these biases improve general syntactic competence but not PoSHBench performance. Our findings challenge the claim that innate syntax is the only possible route to generalization, while suggesting that human-like data efficiency requires inductive biases beyond those tested here.
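
The abstract does not show how the suite is scored, but targeted syntactic evaluations of this kind are typically graded with minimal pairs: a model passes an item if it assigns higher probability to the grammatical sentence than to its ungrammatical twin. Below is a minimal sketch of that scoring scheme using the HuggingFace `transformers` API; `gpt2` and the example pair are illustrative stand-ins, not the paper's actual models or stimuli.

```python
# Minimal-pair scoring sketch (illustrative; not the paper's released code).
# Assumes a HuggingFace causal LM; "gpt2" stands in for a Transformer
# trained on 10-50M words of developmentally plausible text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=ids, the model returns the mean cross-entropy
        # over the ids.size(1) - 1 next-token predictions.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

# Hierarchical question formation: the auxiliary of the *main* clause
# moves, not the linearly first auxiliary. A model that generalizes
# hierarchically should prefer the first sentence.
grammatical = "Is the boy who is smiling happy?"
ungrammatical = "Is the boy who smiling is happy?"
print(sentence_logprob(grammatical) > sentence_logprob(ungrammatical))
```

Accuracy on a phenomenon is then the fraction of pairs where the grammatical member wins, with chance at 50%.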

Top-level tags: llm, natural language processing, theory
Detailed tags: poverty of stimulus, language acquisition, syntactic generalization, inductive biases, transformer evaluation

A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models


1️⃣ One-sentence summary

By training Transformer models, this paper finds that even without built-in innate grammatical rules, models can learn some syntactic regularities from limited data, challenging the traditional view that language learning must rely on innate grammar, while also noting that the models' data efficiency and generalization ability still fall far short of human children's.

Source: arXiv 2602.09992