arXiv submission date: 2026-04-12
📄 Abstract - CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

Current large language models (LLMs), even those explicitly trained for reasoning, often struggle with ambiguous content moderation cases due to misleading "decision shortcuts" embedded in context. Inspired by cognitive psychology insights into expert moderation, we introduce CARO (Chain-of-Analogy Reasoning Optimization), a novel two-stage training framework to induce robust analogical reasoning in LLMs. First, CARO bootstraps analogical reasoning chains via retrieval-augmented generation (RAG) on moderation data and performs supervised fine-tuning (SFT). Second, we propose a customized direct preference optimization (DPO) approach to reinforce analogical reasoning behaviors explicitly. Unlike static retrieval methods, CARO dynamically generates tailored analogical references during inference, effectively mitigating harmful decision shortcuts. Extensive experiments demonstrate that CARO substantially outperforms state-of-the-art reasoning models (DeepSeek R1, QwQ), specialized moderation models (LLaMA Guard), and advanced fine-tuning and retrieval-augmented methods, achieving an average F1 score improvement of 24.9% on challenging ambiguous moderation benchmarks.
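The abstract describes a two-stage pipeline: (1) bootstrap chain-of-analogy reasoning data via RAG over labeled moderation cases and fine-tune with SFT, then (2) reinforce the behavior with a customized DPO objective. The sketch below illustrates what stage 1 might look like; it is a minimal illustration under stated assumptions, and every name in it (ModerationCase, retrieve_similar_cases, build_analogy_chain_prompt) is a hypothetical placeholder rather than the authors' actual code.

```python
# Minimal sketch of CARO's stage 1 as described in the abstract, under
# stated assumptions: all names here are hypothetical placeholders,
# not the authors' API.
from dataclasses import dataclass


@dataclass
class ModerationCase:
    content: str
    label: str  # e.g. "allow" or "remove"; "unknown" for the query case


def retrieve_similar_cases(query: ModerationCase,
                           corpus: list[ModerationCase],
                           k: int = 3) -> list[ModerationCase]:
    """Stage 1a (RAG): fetch the k labeled cases most similar to the query,
    to serve as analogical references. A trivial token-overlap score stands
    in for the dense retrieval a real system would likely use."""
    def overlap(a: str, b: str) -> int:
        return len(set(a.lower().split()) & set(b.lower().split()))
    return sorted(corpus,
                  key=lambda c: overlap(c.content, query.content),
                  reverse=True)[:k]


def build_analogy_chain_prompt(query: ModerationCase,
                               references: list[ModerationCase]) -> str:
    """Stage 1b: assemble a chain-of-analogy prompt. An LLM's completions
    of such prompts form the SFT data; stage 2 then builds preference pairs
    (chosen = faithful analogical chain, rejected = shortcut-driven answer)."""
    lines = [f"Case to moderate: {query.content}"]
    for i, ref in enumerate(references, 1):
        lines.append(f"Analogy {i}: '{ref.content}' was labeled '{ref.label}'.")
    lines.append("Reason by analogy to the references, then give a verdict.")
    return "\n".join(lines)


if __name__ == "__main__":
    corpus = [
        ModerationCase("post urging violence against a group", "remove"),
        ModerationCase("heated but civil political argument", "allow"),
    ]
    query = ModerationCase("post hinting at violence in a political argument",
                           "unknown")
    refs = retrieve_similar_cases(query, corpus, k=2)
    print(build_analogy_chain_prompt(query, refs))
```

Stage 2 would then optimize a DPO-style preference loss over the (chosen, rejected) pairs, -log sigmoid(beta * [log pi(y_w|x)/pi_ref(y_w|x) - log pi(y_l|x)/pi_ref(y_l|x)]); the abstract says the authors customize this objective to reinforce analogical reasoning explicitly, but does not give its exact form.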

Top tags: llm, natural language processing, model training
Detailed tags: content moderation, reasoning optimization, analogical reasoning, chain-of-analogy, preference optimization

CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation


1️⃣ One-sentence summary

This paper proposes a new training framework, CARO, that mimics the analogical reasoning of human expert moderators, helping large language models judge ambiguous content moderation cases more accurately and substantially improving their decisions.

Source: arXiv:2604.10504