arXiv submission date: 2026-01-08
📄 Abstract - Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection

Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range of human biases. Behavioral biases can lead to instability and uncertainty in decision-making, particularly when processing financial information. However, existing research on LLM bias has mainly focused on direct questioning or simplified, general-purpose settings, with limited consideration of complex real-world financial environments and high-risk, context-sensitive, multilingual financial misinformation detection tasks (MFMD). In this work, we propose MFMDScen, a comprehensive benchmark for evaluating behavioral biases of LLMs in MFMD across diverse economic scenarios. In collaboration with financial experts, we construct three types of complex financial scenarios: (i) role- and personality-based, (ii) role- and region-based, and (iii) role-based scenarios incorporating ethnicity and religious beliefs. We further develop a multilingual financial misinformation dataset covering English, Chinese, Greek, and Bengali. By integrating these scenarios with misinformation claims, MFMDScen enables a systematic evaluation of 22 mainstream LLMs. Our findings reveal that pronounced behavioral biases persist across both commercial and open-source models. This project will be available at this https URL.
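The evaluation idea described in the abstract — wrapping the same claim in different role/region/culture scenarios and comparing the model's verdicts — can be illustrated with a small sketch. The code below is only a minimal assumption-based illustration: the scenario texts, the `build_prompt`, `query_llm`, and `scenario_sensitivity` names, and the one-word-answer protocol are hypothetical and are not taken from the paper or its released code.

```python
# Hypothetical sketch of the scenario-induced bias check described in the
# abstract: pair one misinformation claim with several scenario prompts and
# measure how often the model's verdict changes. All names and prompts here
# are illustrative assumptions, not the authors' implementation.

from collections import Counter

SCENARIOS = {
    "role_personality": "You are a risk-averse retail investor.",
    "role_region": "You are a fund manager based in Southeast Asia.",
    "role_culture": "You are an analyst from a devout religious community.",
}


def build_prompt(scenario: str, claim: str) -> str:
    """Combine a scenario description with the financial claim to be verified."""
    return (
        f"{scenario}\n"
        f"Claim: {claim}\n"
        "Is this claim true or false? Answer with exactly one word: True or False."
    )


def query_llm(prompt: str) -> str:
    """Placeholder for an actual model call (e.g. a chat-completions API)."""
    raise NotImplementedError("Replace with a real LLM call.")


def scenario_sensitivity(claim: str) -> float:
    """Fraction of scenarios whose verdict differs from the majority verdict.

    0.0 means the same claim was judged identically in every scenario;
    anything above 0.0 indicates scenario-induced inconsistency.
    """
    verdicts = [query_llm(build_prompt(s, claim)) for s in SCENARIOS.values()]
    majority_count = Counter(verdicts).most_common(1)[0][1]
    return 1.0 - majority_count / len(verdicts)
```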

Top-level tags: llm benchmark financial
Detailed tags: behavioral bias multilingual misinformation scenario evaluation financial scenarios model bias

Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection


1️⃣ One-sentence summary

This paper builds a benchmark named MFMDScen for systematically evaluating whether mainstream large language models, when detecting multilingual misinformation in complex financial scenarios, exhibit judgment biases induced by differences in role, region, and cultural background.

Source: arXiv:2601.05403