MortalMATH:评估推理目标与紧急情境之间的冲突 / MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts
1️⃣ 一句话总结
这篇论文发现,专注于深度推理任务的大型语言模型在用户描述危及生命的紧急情况时,常常会忽略危险、执着于完成数学计算任务,从而暴露出严重的安全隐患。
Large Language Models are increasingly optimized for deep reasoning, prioritizing the correct execution of complex tasks over general conversation. We investigate whether this focus on calculation creates a "tunnel vision" that ignores safety in critical situations. We introduce MortalMATH, a benchmark of 150 scenarios where users request algebra help while describing increasingly life-threatening emergencies (e.g., stroke symptoms, freefall). We find a sharp behavioral split: generalist models (like Llama-3.1) successfully refuse the math to address the danger. In contrast, specialized reasoning models (like Qwen-3-32b and GPT-5-nano) often ignore the emergency entirely, maintaining over 95 percent task completion rates while the user describes dying. Furthermore, the computational time required for reasoning introduces dangerous delays: up to 15 seconds before any potential help is offered. These results suggest that training models to relentlessly pursue correct answers may inadvertently unlearn the survival instincts required for safe deployment.
MortalMATH:评估推理目标与紧急情境之间的冲突 / MortalMATH: Evaluating the Conflict Between Reasoning Objectives and Emergency Contexts
这篇论文发现,专注于深度推理任务的大型语言模型在用户描述危及生命的紧急情况时,常常会忽略危险、执着于完成数学计算任务,从而暴露出严重的安全隐患。
源自 arXiv: 2601.18790