SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning

Abstract

English financial NLP has progressed rapidly through benchmarks for sentiment, document understanding, and financial question answering, while Arabic financial NLP remains comparatively under-explored despite strong practical demand for trustworthy finance and Islamic-finance assistants. We introduce SAHM, a document-grounded benchmark and instruction-tuning dataset for Arabic financial NLP and Shari'ah-compliant reasoning. SAHM contains 14,380 expert-verified instances spanning seven tasks: AAOIFI standards QA, fatwa-based QA/MCQ, accounting and business exams, financial sentiment analysis, extractive summarization, and event-cause reasoning, curated from authentic regulatory, juristic, and corporate sources. We evaluate 19 strong open and proprietary LLMs using task-specific metrics and rubric-based scoring for open-ended outputs, and find that Arabic fluency does not reliably translate to evidence-grounded financial reasoning: models are substantially stronger on recognition-style tasks than on generation and causal reasoning, with the largest gaps on event-cause reasoning. We release the benchmark, evaluation framework, and an instruction-tuned model to support future research on trustworthy Arabic financial NLP.

One-sentence summary

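The paper builds SAHM, the first multi-task benchmark dataset for Arabic financial and Shari'ah-compliant reasoning, containing roughly 14,000 expert-verified instances, and evaluates 19 large language models. It finds that models are good at recognizing the correct candidate but weak on generation and causal reasoning tasks, with the largest gap on event-cause reasoning.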
