arXiv submission date: 2025-12-08
📄 Abstract - START: Spatial and Textual Learning for Chart Understanding

Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images, charts pair a structured visual layout (spatial property) with an underlying data representation (textual property) -- grasping both is essential for precise, fine-grained chart reasoning. Motivated by this observation, we propose START, the Spatial and Textual learning for chART understanding. Specifically, we introduce (i) chart-element grounding and (ii) chart-to-code generation to strengthen an MLLM's understanding of both chart visual layout and data details. To facilitate spatial and textual learning, we propose the START-Dataset generated with a novel data-generation pipeline that first leverages an MLLM to translate real chart images into executable chart code, recovering the underlying data representation while preserving the visual distribution of real-world charts. We then evolve the code with a Large Language Model (LLM) to ascertain the positions of chart elements that capture the chart's visual structure, addressing challenges that existing methods cannot handle. To evaluate a model's ability to understand chart spatial structures, we propose the Chart Spatial understanding Benchmark (CS-Bench), filling a critical gap in comprehensive chart understanding evaluation. Leveraging spatial and textual learning, START delivers consistent gains across model sizes and benchmarks over the base models and surpasses prior state-of-the-art by a clear margin. Code, data and models will be publicly available.
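To make the chart-element grounding task more concrete, here is a minimal sketch of how a predicted element location could be scored against a reference box. The abstract does not say how START or CS-Bench score grounding, so everything here is an assumption for illustration: the `(x0, y0, x1, y1)` pixel box format, the intersection-over-union (IoU) criterion, the 0.5 threshold, and the hypothetical names `iou`, `grounding_accuracy`, `gold`, and `pred`.

```python
from typing import Dict, Tuple

# Assumed box format: (x0, y0, x1, y1) in pixels, with x0 < x1 and y0 < y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(
    pred: Dict[str, Box], gold: Dict[str, Box], threshold: float = 0.5
) -> float:
    """Fraction of reference elements whose predicted box reaches the IoU threshold.

    The threshold of 0.5 is an illustrative assumption, not a value from the paper.
    """
    if not gold:
        return 0.0
    hits = sum(
        1
        for name, ref_box in gold.items()
        if name in pred and iou(pred[name], ref_box) >= threshold
    )
    return hits / len(gold)


# Hypothetical reference boxes, e.g. recovered from executable chart code.
gold = {"title": (100, 10, 300, 40), "legend": (320, 60, 390, 120)}
# Hypothetical model predictions: the title is nearly right, the legend is missed.
pred = {"title": (110, 10, 300, 40), "legend": (200, 200, 260, 250)}

score = grounding_accuracy(pred, gold)
```

In this toy case the title prediction overlaps its reference box almost entirely, while the legend prediction does not overlap at all, so only one of the two elements counts as grounded.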

Top-level tags: multi-modal, model training, model evaluation
Detailed tags: chart understanding, multimodal llm, spatial reasoning, code generation, benchmark

START: Spatial and Textual Learning for Chart Understanding


1️⃣ One-sentence summary

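This paper proposes START, a new method that substantially improves how well multimodal large language models understand charts by jointly learning a chart's visual spatial layout and its underlying textual data, and that achieves leading performance on a newly constructed benchmark.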


Source: arXiv: 2512.07186