arXiv submission date: 2026-04-08
📄 Abstract - Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English

Bilingual speakers show cross-lingual activation during reading, especially for words with shared surface form. Cognates (friends) typically lead to facilitation, whereas interlingual homographs (false friends) cause interference or no effect. We examine whether cross-lingual activation in bilingual language models mirrors these patterns. We train Dutch-English causal Transformers under four vocabulary-sharing conditions that manipulate whether (false) friends receive shared or language-specific embeddings. Using psycholinguistic stimuli from bilingual reading studies, we evaluate the models through surprisal and embedding similarity analyses. The models largely maintain language separation, and cross-lingual effects arise primarily when embeddings are shared. In these cases, both friends and false friends show facilitation relative to controls. Regression analyses reveal that these effects are mainly driven by frequency rather than consistency in form-meaning mapping. Only when just friends share embeddings are the qualitative patterns of bilinguals reproduced. Overall, bilingual language models capture some cross-linguistic activation effects. However, their alignment with human processing seems to critically depend on how lexical overlap is encoded, possibly limiting their explanatory adequacy as models of bilingual reading.
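The two evaluation measures named in the abstract, surprisal and embedding similarity, can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the probabilities in the usage lines are made-up illustrative values, and summing subword surprisals to get a word-level value is an assumed (though common) convention that the abstract does not confirm.

```python
import math


def surprisal_bits(prob):
    """Surprisal (in bits) of a token whose conditional probability is `prob`."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def word_surprisal(token_probs):
    """Word-level surprisal as the sum of its subword-token surprisals.

    Assumption: the word may be split into several tokens, and their
    surprisals are added (equivalent to multiplying the probabilities).
    """
    return sum(surprisal_bits(p) for p in token_probs)


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Illustrative values only (not results from the paper).
target = word_surprisal([0.5, 0.25])   # hypothetical (false) friend
control = word_surprisal([0.25, 0.25])  # hypothetical matched control
facilitation = control - target         # positive means the target is easier
```

Under this reading, facilitation corresponds to a target word having lower surprisal than its matched control, and interference to higher surprisal.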

Top-level tags: natural language processing, llm, model evaluation
Detailed tags: cross-lingual transfer, bilingual models, cognates, psycholinguistics, embedding analysis

Is Cross-Lingual Transfer in Bilingual Models Human-Like? A Study with Overlapping Word Forms in Dutch and English


1️⃣ One-sentence summary

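This study finds that bilingual language models capture some of the cross-lingual activation effects seen in human bilingual reading, but how closely they match human processing depends heavily on how lexical overlap is encoded: only when cognates ("friends") alone share embeddings do the models reproduce the qualitative human pattern, in which friends are facilitated while false friends cause interference or no effect.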

Source: arXiv: 2604.07067