评估大型语言模型的前瞻性风险意识 / Evaluating Proactive Risk Awareness of Large Language Models
1️⃣ 一句话总结
这篇论文提出了一个评估框架,发现当前主流大语言模型在回答可能引发潜在生态危害的日常问题时,普遍缺乏提前预警的风险意识,尤其是在回答简短、跨语言和多模态物种保护场景下存在明显盲区。
As large language models (LLMs) are increasingly embedded in everyday decision-making, their safety responsibilities extend beyond reacting to explicit harmful intent toward anticipating unintended but consequential risks. In this work, we introduce a proactive risk awareness evaluation framework that measures whether LLMs can anticipate potential harms and provide warnings before damage occurs. We construct the Butterfly dataset to instantiate this framework in the environmental and ecological domain. It contains 1,094 queries that simulate ordinary solution-seeking activities whose responses may induce latent ecological impact. Through experiments across five widely used LLMs, we analyze the effects of response length, languages, and modality. Experimental results reveal consistent, significant declines in proactive awareness under length-restricted responses, cross-lingual similarities, and persistent blind spots in (multimodal) species protection. These findings highlight a critical gap between current safety alignment and the requirements of real-world ecological responsibility, underscoring the need for proactive safeguards in LLM deployment.
评估大型语言模型的前瞻性风险意识 / Evaluating Proactive Risk Awareness of Large Language Models
这篇论文提出了一个评估框架,发现当前主流大语言模型在回答可能引发潜在生态危害的日常问题时,普遍缺乏提前预警的风险意识,尤其是在回答简短、跨语言和多模态物种保护场景下存在明显盲区。
源自 arXiv: 2602.20976