Benchmarking Source-Sensitive Reasoning in Turkish: Humans and LLMs under Evidential Trust Manipulation
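1️⃣ One-sentence summary
This study finds experimentally that native speakers of Turkish choose differently between two past-tense suffixes (-DI and -mIs) depending on how trustworthy the information source is, whereas large language models (LLMs) behave unstably in this kind of evidential-trust reasoning and show a clear gap from humans.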
This paper investigates whether source trustworthiness shapes Turkish evidential morphology and whether large language models (LLMs) track this sensitivity. We study the past-domain contrast between -DI and -mIs in controlled cloze contexts where the information source is overtly external, while only its perceived reliability is manipulated (High-Trust vs. Low-Trust). In a human production experiment, native speakers of Turkish show a robust trust effect: High-Trust contexts yield relatively more -DI, whereas Low-Trust contexts yield relatively more -mIs, with the pattern remaining stable across sensitivity analyses. We then evaluate 10 LLMs in three prompting paradigms (open gap-fill, explicit past-tense gap-fill, and forced-choice A/B selection). LLM behavior is highly model- and prompt-dependent: some models show weak or local trust-consistent shifts, but effects are generally unstable, often reversed, and frequently overshadowed by output-compliance problems and strong base-rate suffix preferences. The results provide new evidence for a trust-/commitment-based account of Turkish evidentiality and reveal a clear human-LLM gap in source-sensitive evidential reasoning.
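As a rough illustration of how a trust effect of this kind could be quantified, the sketch below computes the share of -DI among valid responses in each trust condition and takes the difference, alongside the rate of non-compliant outputs. It is a minimal descriptive sketch, not the paper's analysis: the function names, the `"high"`/`"low"` condition labels, the `"other"` label for non-compliant output, and the toy responses are all assumptions made for illustration and are not data from the study.

```python
from collections import Counter

# Assumed labels: the two suffixes under study; anything else counts as non-compliant.
VALID_SUFFIXES = {"DI", "mIs"}


def di_share(responses, condition):
    """Share of -DI among valid (-DI or -mIs) responses in one trust condition."""
    counts = Counter(
        suffix
        for cond, suffix in responses
        if cond == condition and suffix in VALID_SUFFIXES
    )
    total = counts["DI"] + counts["mIs"]
    if total == 0:
        raise ValueError(f"no valid responses for condition {condition!r}")
    return counts["DI"] / total


def trust_effect(responses):
    """-DI share in High-Trust minus -DI share in Low-Trust.

    A positive value is the trust-consistent direction described in the abstract.
    """
    return di_share(responses, "high") - di_share(responses, "low")


def noncompliance_rate(responses):
    """Fraction of all responses that use neither -DI nor -mIs."""
    if not responses:
        raise ValueError("no responses")
    invalid = sum(1 for _, suffix in responses if suffix not in VALID_SUFFIXES)
    return invalid / len(responses)


# Toy, made-up responses as (condition, suffix) pairs; purely illustrative.
toy = [
    ("high", "DI"), ("high", "DI"), ("high", "mIs"), ("high", "other"),
    ("low", "DI"), ("low", "mIs"), ("low", "mIs"), ("low", "mIs"),
]

if __name__ == "__main__":
    print(f"trust effect: {trust_effect(toy):.3f}")
    print(f"non-compliance rate: {noncompliance_rate(toy):.3f}")
```

Reporting the non-compliance rate separately from the trust effect keeps a model's formatting failures from being mistaken for a suffix preference, which matters given that the abstract notes output-compliance problems often overshadow the effect.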
Source: arXiv:2604.24665