arXiv submission date: 2025-12-23
📄 Abstract - SpatialTree: How Spatial Abilities Branch Out in MLLMs

Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood, as most studies focus on a narrow set of tasks. We introduce SpatialTree, a cognitive-science-inspired hierarchy that organizes spatial abilities into four levels: low-level perception (L1), mental mapping (L2), simulation (L3), and agentic competence (L4). Based on this taxonomy, we construct the first capability-centric hierarchical benchmark, thoroughly evaluating mainstream MLLMs across 27 sub-abilities. The evaluation results reveal a clear structure: L1 skills are largely orthogonal, whereas higher-level skills are strongly correlated, indicating increasing interdependency. Through targeted supervised fine-tuning, we uncover a surprising transfer dynamic: negative transfer within L1, but strong cross-level transfer from low- to high-level abilities with notable synergy. Finally, we explore how to improve the entire hierarchy. We find that naive RL that encourages extensive "thinking" is unreliable: it helps complex reasoning but hurts intuitive perception. We propose a simple auto-think strategy that suppresses unnecessary deliberation, enabling RL to consistently improve performance across all levels. By building SpatialTree, we provide a proof-of-concept framework for understanding and systematically scaling spatial abilities in MLLMs.
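
The abstract does not specify how the auto-think strategy is implemented. Below is a minimal Python sketch of one plausible reading, assuming a prompt-level gate that allows deliberation only for higher-level tasks and answers perception questions directly; `build_prompt`, `auto_think_gate`, and `toy_router` are hypothetical names for illustration, not the paper's method.

```python
# Hypothetical sketch of an "auto-think" style gate (not the paper's
# actual implementation). Idea: decide per question whether deliberate
# reasoning is needed, and suppress chain-of-thought on intuitive
# perception (L1) questions so RL does not reward verbose "thinking"
# where it hurts.

def build_prompt(question: str, needs_thinking: bool) -> str:
    """Assemble a prompt that either permits or suppresses deliberation."""
    if needs_thinking:
        return f"{question}\nThink step by step inside <think>...</think>, then give the final answer."
    return f"{question}\nAnswer directly, without intermediate reasoning."

def auto_think_gate(question: str, classify) -> bool:
    """Allow deliberation only for higher-level (L2-L4) tasks.
    `classify` is a stand-in for any lightweight router: a small model,
    a heuristic, or the policy itself."""
    return classify(question) in {"mental_mapping", "simulation", "agentic"}

# Toy usage with a keyword heuristic standing in for the router:
def toy_router(q: str) -> str:
    return "simulation" if ("rotate" in q or "plan" in q) else "perception"

for q in ["What color is the cube on the left?",
          "If you rotate the block 90 degrees, which face points up?"]:
    print(build_prompt(q, auto_think_gate(q, toy_router)))
```

Under this reading, the gate is what lets RL reward concise direct answers on L1 tasks while still permitting long reasoning traces where they help.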

Top-level tags: multi-modal model, evaluation, llm
Detailed tags: spatial reasoning, benchmark, cognitive hierarchy, fine-tuning, reinforcement learning

SpatialTree: How Spatial Abilities Branch Out in MLLMs


1️⃣ One-Sentence Summary

This paper proposes SpatialTree, a cognitive-science-inspired four-level framework of spatial abilities, for systematically evaluating and improving the spatial capabilities of multimodal large language models. It finds that low-level abilities are largely independent of one another while high-level abilities are closely interrelated, and it introduces an auto-think strategy that suppresses unnecessary deliberation to improve model performance across all levels.

Source: arXiv 2512.20617