arXiv submission date: 2026-04-14
📄 Abstract - Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning

Spreadsheets are central to real-world applications such as enterprise reporting, auditing, and scientific data management. Despite their ubiquity, existing large language model based approaches typically treat tables as plain text, overlooking critical layout cues and visual semantics. Moreover, real-world spreadsheets are often massive in scale, exceeding the input length that LLMs can efficiently process. To address these challenges, we propose SpreadsheetAgent, a two-stage multi-agent framework for spreadsheet understanding that adopts a step-by-step reading and reasoning paradigm. Instead of loading the entire spreadsheet at once, SpreadsheetAgent incrementally interprets localized regions through multiple modalities, including code execution results, images, and LaTeX tables. The method first constructs a structural sketch and row/column summaries, and then performs task-driven reasoning over this intermediate representation in the Solving Stage. To further enhance reliability, we design a verification module that validates extracted structures via targeted inspections, reducing error propagation and ensuring trustworthy inputs for downstream reasoning. Extensive experiments on two spreadsheet datasets demonstrate the effectiveness of our approach. With GPT-OSS-120B, SpreadsheetAgent achieves 38.16% on Spreadsheet Bench, outperforming the ChatGPT Agent baseline (35.27%) by 2.89 absolute points. These results highlight the potential of SpreadsheetAgent to advance robust and scalable spreadsheet understanding in real-world applications. Code is available at this https URL.
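The "read localized regions, then summarize" idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`iter_regions`, `summarize_region`, `build_sketch`), the fixed-size tiling, and the per-region statistics are all assumptions chosen for illustration, and the sheet is assumed to be a rectangular list of rows already loaded into memory.

```python
from typing import Dict, Iterator, List, Tuple

Grid = List[List[object]]
Region = Tuple[int, int, int, int]  # (row_start, row_end, col_start, col_end), half-open


def iter_regions(n_rows: int, n_cols: int,
                 block_rows: int, block_cols: int) -> Iterator[Region]:
    """Yield fixed-size windows that tile the grid, row-major order."""
    for r in range(0, n_rows, block_rows):
        for c in range(0, n_cols, block_cols):
            yield r, min(r + block_rows, n_rows), c, min(c + block_cols, n_cols)


def summarize_region(grid: Grid, region: Region) -> Dict[str, object]:
    """Hypothetical per-region summary: counts of non-empty and numeric cells."""
    r0, r1, c0, c1 = region
    cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    non_empty = [v for v in cells if v is not None and v != ""]
    numeric = [v for v in non_empty
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return {
        "region": region,
        "non_empty": len(non_empty),
        "numeric": len(numeric),
        "text": len(non_empty) - len(numeric),
    }


def build_sketch(grid: Grid, block_rows: int = 2,
                 block_cols: int = 2) -> List[Dict[str, object]]:
    """Summarize the sheet one region at a time instead of reading it whole."""
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0  # assumes a rectangular grid
    return [summarize_region(grid, reg)
            for reg in iter_regions(n_rows, n_cols, block_rows, block_cols)]


# Toy sheet: a header row followed by two data rows.
sheet = [
    ["Region", "Q1", "Q2"],
    ["North", 10, 12],
    ["South", 7, None],
]
sketch = build_sketch(sheet, block_rows=2, block_cols=2)
for entry in sketch:
    print(entry)
```

In a design like the one the abstract describes, each region would presumably be rendered in several formats (code execution output, image, LaTeX table) and passed to a model; the cell counts here merely stand in for that step so the example stays self-contained.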

Top-level tags: llm, agents, systems
Detailed tags: spreadsheet understanding, multi-agent framework, multi-modal reasoning, document ai, long-context processing

Towards Robust Real-World Spreadsheet Understanding with Multi-Agent Multi-Format Reasoning


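1️⃣ One-sentence summary

This paper proposes SpreadsheetAgent, a two-stage multi-agent framework that reads and reasons over spreadsheets incrementally, step by step, through multiple modalities (such as code, images, and tables), addressing the input-length limits and missing visual semantics that large language models face on large, complexly laid-out real-world spreadsheets and thereby significantly improving the accuracy and reliability of table understanding.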

Source: arXiv: 2604.12282