📄 Abstract - TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data

Complex reasoning over tabular data is crucial in real-world data analysis, yet large language models (LLMs) often underperform due to complex queries, noisy data, and limited numerical capabilities. To address these issues, we propose TabDSR, a framework consisting of: (1) a query decomposer that breaks down complex questions, (2) a table sanitizer that cleans and filters noisy tables, and (3) a program-of-thoughts (PoT)-based reasoner that generates executable code to derive the final answer from the sanitized table. To ensure unbiased evaluation and mitigate data leakage, we introduce a new dataset, CalTab151, specifically designed for complex numerical reasoning over tables. Experimental results demonstrate that TabDSR consistently outperforms existing methods, achieving state-of-the-art (SOTA) performance with 8.79%, 6.08%, and 19.87% accuracy improvement on TAT-QA, TableBench, and CalTab151, respectively. Moreover, our framework integrates seamlessly with mainstream LLMs, providing a robust solution for complex tabular numerical reasoning. Data and code are available upon request.
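The three stages named in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: in TabDSR the decomposer, sanitizer, and reasoner are LLM-driven, whereas here simple rules and a hand-written program stand in for the model calls. The function names (`decompose`, `to_number`, `sanitize`, `run_program`), the sample table, and the splitting rule are all hypothetical and exist only for illustration.

```python
import re


def decompose(question):
    # Hypothetical stand-in for the query decomposer: split a compound
    # question on " and then ". An LLM would do this step in practice.
    return [part.strip() for part in question.split(" and then ") if part.strip()]


def to_number(cell):
    # Strip currency symbols, thousands separators, and other noise,
    # then try to parse what remains as a float. Returns None on failure.
    cleaned = re.sub(r"[^0-9.\-]", "", str(cell))
    try:
        return float(cleaned)
    except ValueError:
        return None


def sanitize(rows, numeric_columns):
    # Hypothetical stand-in for the table sanitizer: convert numeric
    # columns to floats and drop rows whose numeric cells cannot be parsed.
    clean = []
    for row in rows:
        parsed = {col: to_number(row[col]) for col in numeric_columns}
        if any(value is None for value in parsed.values()):
            continue
        clean.append({**row, **parsed})
    return clean


def run_program(program, table):
    # PoT-style step: execute a generated program against the sanitized
    # table and read the result from a variable named `answer`.
    # Assumption: the program is trusted; real systems should sandbox it.
    namespace = {"table": table}
    exec(program, namespace)
    return namespace["answer"]


raw_table = [
    {"year": "2021", "revenue": "$1,200"},
    {"year": "2022", "revenue": "$1,500"},
    {"year": "Total", "revenue": "N/A"},
]

sub_questions = decompose(
    "find the revenue for 2021 and 2022 and then compute the percentage change"
)
clean_table = sanitize(raw_table, numeric_columns=["revenue"])

# A program an LLM reasoner might emit for the sub-questions (hand-written here).
program = """
by_year = {row["year"]: row["revenue"] for row in table}
answer = round((by_year["2022"] - by_year["2021"]) / by_year["2021"] * 100, 2)
"""

answer = run_program(program, clean_table)
print(sub_questions)
print(answer)
```

Delegating the arithmetic to executed code rather than to free-form text generation is the point of the PoT step, since the abstract names limited numerical capability as one reason LLMs underperform on such tables.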

Top-level tags: llm, natural language processing, data
Detailed tags: tabular reasoning, numerical reasoning, query decomposition, table sanitization, program-of-thoughts

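📄 Paper Summary

TabDSR: Decompose, Sanitize, and Reason for Complex Numerical Reasoning in Tabular Data


1️⃣ One-Sentence Summary

This paper proposes TabDSR, a three-step framework that improves the accuracy of large language models on complex numerical reasoning over tabular data by decomposing complex questions, cleaning noisy tables, and generating executable code; its effectiveness is validated on a newly constructed dataset.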

