Words as Difference Makers: How Large Language Models Determine Causal Structure in Text

📄 Abstract

Because large language models (LLMs) are impressively successful in predicting text, it appears that they must have access to a 'world model' representing causal and definitional structure. However, the dominant formalisms of modern causal inference -- Judea Pearl's interventionist approach and the Neyman-Rubin potential outcomes framework -- struggle to illuminate how LLMs learn causal structure. I resolve this puzzle by arguing that LLMs employ a specific inductive approach based on a difference-making logic -- sometimes called variational induction. I demonstrate how central aspects of this logic are realized during training, where LLMs require enormous amounts of text data from a wide range of contexts to identify difference- and indifference-makers within word sequences. Furthermore, I analyze specific architectural features of LLMs -- such as token embeddings and self-attention -- to determine their roles in variational induction. The difference-making logic of LLMs fundamentally parallels the experimental method, where causal relations are derived by systematically varying individual circumstances to determine their influence on a phenomenon.
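The difference-making logic described in the abstract can be illustrated with a toy sketch. This is not the paper's method and not how an LLM is actually trained. It assumes each observation is a set of context words paired with an outcome word, and it compares only observations whose contexts differ in exactly one word. The function name `classify_words` and the example data are hypothetical.

```python
from itertools import combinations


def classify_words(observations):
    """Toy method-of-difference over (context_words, outcome) pairs.

    Assumption: contexts are unordered sets of words. Real LLMs work on
    token sequences and learn by gradient descent, not explicit comparison.
    """
    difference_makers = set()
    indifference_makers = set()
    for (ctx_a, out_a), (ctx_b, out_b) in combinations(observations, 2):
        varied = ctx_a ^ ctx_b  # words present in exactly one of the two contexts
        if len(varied) != 1:
            continue  # only compare contexts that vary in a single word
        (word,) = varied
        if out_a != out_b:
            # Varying this word changed the outcome, all else held fixed.
            difference_makers.add(word)
        else:
            # Varying this word left the outcome unchanged.
            indifference_makers.add(word)
    return difference_makers, indifference_makers


# Hypothetical observations: context words and the word that follows.
observations = [
    (frozenset({"the", "glass", "fell"}), "broke"),
    (frozenset({"the", "glass", "fell", "softly"}), "survived"),
    (frozenset({"the", "glass", "fell", "yesterday"}), "broke"),
]

makers, non_makers = classify_words(observations)
print("difference makers:", makers)
print("indifference makers:", non_makers)
```

In this sketch a word's status is relative to the background context in which it was varied, so the same word could appear in both sets once more varied observations are added. That dependence on background is one way to see why the abstract stresses text data from a wide range of contexts.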


1️⃣ One-Sentence Summary

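This paper explains why large language models (such as GPT) can grasp causal relations in text: by learning, across massive amounts of data from varied contexts, whether a given word makes a difference to what follows (that is, variational induction), they gradually identify which words are the key factors influencing an outcome. The principle resembles the way scientists discover causal regularities through controlled-variable experiments.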
