arXiv submission date: 2026-03-03
📄 Abstract - SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models

Genuine spatial reasoning relies on the capacity to construct and manipulate coherent internal spatial representations, often conceptualized as mental models, rather than merely processing surface linguistic associations. While large language models exhibit advanced capabilities across various domains, existing benchmarks fail to isolate this intrinsic spatial cognition from statistical language heuristics. Furthermore, multimodal evaluations frequently conflate genuine spatial reasoning with visual perception. To systematically investigate whether models construct flexible spatial mental models, we introduce SpatialText, a theory-driven diagnostic framework. Rather than functioning simply as a dataset, SpatialText isolates text-based spatial reasoning through a dual-source methodology. It integrates human-annotated descriptions of real 3D indoor environments, which capture natural ambiguities, perspective shifts, and functional relations, with code-generated, logically precise scenes designed to probe formal spatial deduction and epistemic boundaries. Systematic evaluation across state-of-the-art models reveals fundamental representational limitations. Although models demonstrate proficiency in retrieving explicit spatial facts and operating within global, allocentric coordinate systems, they exhibit critical failures in egocentric perspective transformation and local reference frame reasoning. These systematic errors provide strong evidence that current models rely heavily on linguistic co-occurrence heuristics rather than constructing coherent, verifiable internal spatial representations. SpatialText thus serves as a rigorous instrument for diagnosing the cognitive boundaries of artificial spatial intelligence.

Top-level tags: llm model evaluation benchmark
Detailed tags: spatial reasoning cognitive benchmark mental models evaluation framework text-only evaluation

SpatialText: A Pure-Text Cognitive Benchmark for Spatial Understanding in Large Language Models


1️⃣ One-sentence summary

This paper introduces SpatialText, a pure-text benchmark that analyzes the systematic errors large language models make on spatial reasoning tasks. It finds that these models rely primarily on linguistic associations rather than constructing genuine internal spatial mental models, revealing a fundamental limitation in current models' spatial cognition.

Source: arXiv 2603.03002