MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment
1️⃣ One-sentence summary
This paper introduces MM-SCALE, a large-scale dataset and training framework that uses human-annotated 5-point ratings and explicit modality grounding to train vision-language models, enabling them to produce continuous, fine-grained moral reasoning in multimodal scenarios that better matches human judgment, rather than simple binary good/bad verdicts.
Vision-Language Models (VLMs) continue to struggle to make morally salient judgments in multimodal and socially ambiguous contexts. Prior work typically relies on binary or pairwise supervision, which often fails to capture the continuous and pluralistic nature of human moral reasoning. We present MM-SCALE (Multimodal Moral Scale), a large-scale dataset for aligning VLMs with human moral preferences through 5-point scalar ratings and explicit modality grounding. Each image-scenario pair is annotated by humans with moral acceptability scores and grounded reasoning labels, using an interface we tailored for data collection, enabling listwise preference optimization over ranked scenario sets. By moving from discrete to scalar supervision, our framework provides richer alignment signals and finer calibration of multimodal moral reasoning. Experiments show that VLMs fine-tuned on MM-SCALE achieve higher ranking fidelity and more stable safety calibration than those trained with binary signals.
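The abstract does not spell out the listwise objective, but "listwise preference optimization over ranked scenario sets" is commonly instantiated as a Plackett-Luce negative log-likelihood over scenarios ordered by their human scalar ratings. A minimal sketch under that assumption (function names and the use of model logits as Plackett-Luce utilities are illustrative, not taken from the paper):

```python
import math

def plackett_luce_nll(scores, human_ratings):
    """Listwise loss sketch: Plackett-Luce negative log-likelihood.

    scores: model-assigned moral-acceptability logits, one per scenario.
    human_ratings: 5-point human ratings, used only to order the scenarios.
    """
    # Order scenarios from most to least acceptable according to humans.
    order = sorted(range(len(scores)), key=lambda i: -human_ratings[i])
    ranked = [scores[i] for i in order]

    nll = 0.0
    for k in range(len(ranked)):
        # Log-probability that the k-th ranked item is chosen first
        # among the items not yet placed (softmax over the remainder).
        denom = sum(math.exp(s) for s in ranked[k:])
        nll -= ranked[k] - math.log(denom)
    return nll
```

A model whose scores order the scenarios the same way the human ratings do incurs a lower loss than one whose ordering disagrees, which is the sense in which scalar ratings give a richer signal than a single binary label per item.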
Source: arXiv: 2602.03665