arXiv submission date: 2026-06-23
📄 Abstract - LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context

While the validity of LLMs' use in the legal context remains subject to ethical and legal debate, legal professionals are already experimenting with personal LLMs, if only for translation and reformulation. However, even such a seemingly innocuous use can introduce biases through case processing speed if LLM assistants selectively refuse assistance on certain topics. To better anticipate such biases, we investigate several modern small LLMs that are most likely to be used as on-device assistants, to assess the impact of overrefusal on legal prompts. Surprisingly, we find that authority-style prefixes ("you are acting as an assistant of the national supreme court", "[...] defense lawyer") systematically increase refusal rates by 2-20x over the no-prefix baseline, while a known role-play jailbreak prefix shows mixed effects, sharply increasing refusals in some models and barely shifting them in others. The finding suggests that small on-prem deployable LLMs are unstable under contextual framings that a real institutional user might naturally introduce, and further investigation is essential to minimize opportunities for bias.

Top-level tags: llm model evaluation behavior
Detailed tags: overrefusal legal bias role-play small language models safety

LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context


1️⃣ One-sentence summary

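This paper studies overrefusal by small, locally deployed LLMs in criminal-law settings and finds that adding an authority-style identity prefix such as "you are acting as an assistant of the national supreme court" raises refusal rates by 2-20x over the no-prefix baseline, whereas a role-play "jailbreak" prefix has effects that vary by model, which suggests that legal professionals using these models in routine work may unintentionally introduce bias.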
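To make the "2-20x over the no-prefix baseline" figure concrete, the sketch below shows one way such a multiplier could be computed: the refusal rate under a prefix divided by the refusal rate without it. This is only an illustration, not the paper's evaluation code. The function names `refusal_rate` and `refusal_multiplier` are hypothetical, and the labels are toy values invented for the example rather than results from the paper.

```python
def refusal_rate(labels):
    """Fraction of responses labeled as refusals (True means the model refused)."""
    if not labels:
        raise ValueError("need at least one labeled response")
    return sum(labels) / len(labels)


def refusal_multiplier(baseline_labels, prefixed_labels):
    """Ratio of the prefixed refusal rate to the no-prefix baseline rate."""
    base = refusal_rate(baseline_labels)
    prefixed = refusal_rate(prefixed_labels)
    if base == 0:
        # With no baseline refusals the ratio is undefined; report infinity
        # if the prefix caused any refusals, and 1.0 if nothing changed.
        return float("inf") if prefixed > 0 else 1.0
    return prefixed / base


# Toy labels (hypothetical, NOT data from the paper): 100 prompts per condition.
baseline = [True] * 2 + [False] * 98          # 2% refused without a prefix
authority_prefix = [True] * 10 + [False] * 90  # 10% refused with an authority prefix

multiplier = refusal_multiplier(baseline, authority_prefix)
print(f"refusal multiplier: {multiplier:.1f}x")
```

A ratio is sensitive to small baselines: when very few prompts are refused without a prefix, a handful of extra refusals produces a large multiplier, so absolute rates are worth reporting alongside it.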

Source: arXiv: 2606.24585