📄 Abstract - Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs
We show how causal interventions in Transformer language models can provide insights into English syntax, focusing on a long-standing challenge for syntactic theory: syntactic islands. Extraction from coordinated verb phrases is typically degraded, yet acceptability varies gradiently with lexical content (e.g., "I know what he hates art and loves" vs. "I know what he looked down and saw"). We first show that modern Transformer language models replicate human judgments across this gradient. Using causal interventions that isolate functionally relevant subspaces in Transformer blocks, attention modules, and MLPs, we then demonstrate that extraction from coordination islands engages the same filler-gap mechanisms as canonical wh-dependencies, but that these mechanisms are selectively blocked to varying degrees. By projecting a large corpus of unrelated text onto these causally identified subspaces, we derive a novel linguistic hypothesis: the conjunction "and" is represented differently in extractable versus non-extractable constructions, corresponding to uses encoding relational dependencies versus purely conjunctive uses. These results illustrate how mechanistic interpretability can inform syntactic theory, generating new hypotheses about linguistic representation and processing.
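As a rough illustration of the kind of subspace intervention the abstract describes, here is a minimal sketch in PyTorch: activations from a "source" run (e.g., a licit filler-gap sentence) are patched into a "base" run (e.g., an island-violating sentence) only along a low-rank subspace, leaving everything orthogonal to it untouched. All names, shapes, and the orthonormal-basis setup are illustrative assumptions, not the paper's actual code.

```python
# Hypothetical sketch of a subspace intervention (activation patching
# restricted to a low-rank subspace). Names and shapes are assumptions.
import torch

def patch_subspace(acts_base: torch.Tensor,
                   acts_source: torch.Tensor,
                   basis: torch.Tensor) -> torch.Tensor:
    """Swap the component of `acts_base` lying in the subspace spanned by
    the orthonormal columns of `basis` for the corresponding component of
    `acts_source`, keeping the orthogonal complement intact.

    acts_base, acts_source: (..., d_model) activations from two runs
    basis: (d_model, k) orthonormal basis of a causally identified subspace
    """
    proj = basis @ basis.T                # (d_model, d_model) projector
    base_in = acts_base @ proj            # base component inside subspace
    src_in = acts_source @ proj           # source component inside subspace
    return acts_base - base_in + src_in   # patch only the subspace part

# Toy usage: a 2-dimensional subspace of a 16-dim residual stream.
d_model, k = 16, 2
basis, _ = torch.linalg.qr(torch.randn(d_model, k))   # orthonormal columns
base = torch.randn(3, d_model)     # e.g. island-violating sentence
source = torch.randn(3, d_model)   # e.g. licit filler-gap sentence
patched = patch_subspace(base, source, basis)

# Projecting held-out corpus activations onto the same basis
# (acts @ basis) is one way to inspect what the subspace encodes.
corpus_coords = torch.randn(100, d_model) @ basis     # (100, k) coordinates
```

Under these assumptions, the final corpus-projection step sketches how one might probe what a causally identified subspace encodes; the paper's actual procedure may differ.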
Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs
1️⃣ One-Sentence Summary
By analyzing how Transformer language models judge English "syntactic island" constructions of varying difficulty in a human-like way, this paper reveals a selective "blocking" mechanism in the models' internal processing of such complex syntax, and on that basis proposes a new hypothesis that the conjunction "and" receives different linguistic representations in different constructions.