arXiv submission date: 2025-12-23
📄 Abstract - FaithLens: Detecting and Explaining Faithfulness Hallucination

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.

Top-level tags: llm model evaluation natural language processing
Detailed tags: faithfulness hallucination detection and explanation synthetic data reinforcement learning benchmark evaluation

FaithLens: An Efficient Model for Detecting and Explaining Faithfulness Hallucination in Large Language Models / FaithLens: Detecting and Explaining Faithfulness Hallucination


1️⃣ One-Sentence Summary

This paper introduces FaithLens, an efficient, low-cost model that not only detects faithfulness hallucinations in LLM outputs but also provides corresponding explanations, outperforming advanced models such as GPT-4.1 and o3 across multiple tasks.


2️⃣ Key Contributions

1. The FaithLens framework for joint hallucination detection and explanation

2. A training strategy built on synthetic data and reinforcement learning

3. A targeted data filtering pipeline

4. A rule-based reinforcement learning training protocol

5. A composite reward design
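The composite reward in contribution 5 combines rewards for prediction correctness and explanation quality, per the abstract. The sketch below is an illustrative assumption of how such a rule-based signal could be wired together; the function names, weights, and the length-based explanation heuristic are placeholders, not the authors' implementation (the paper's explanation-quality signal is presumably model-based).

```python
# Hedged sketch: a rule-based composite reward for faithfulness-hallucination
# detection RL. All names, weights, and heuristics here are illustrative.

def correctness_reward(predicted_label: str, gold_label: str) -> float:
    """1.0 if the binary faithfulness prediction matches the gold label, else 0.0."""
    return 1.0 if predicted_label.strip().lower() == gold_label.strip().lower() else 0.0

def explanation_reward(explanation: str, min_len: int = 20) -> float:
    """Toy proxy for explanation quality: non-empty and reasonably detailed.
    (A placeholder rule; the paper's actual quality signal is not specified here.)"""
    text = explanation.strip()
    if not text:
        return 0.0
    return min(len(text) / float(min_len), 1.0)

def composite_reward(predicted_label: str, gold_label: str, explanation: str,
                     w_correct: float = 0.8, w_explain: float = 0.2) -> float:
    """Weighted sum of prediction correctness and explanation quality."""
    return (w_correct * correctness_reward(predicted_label, gold_label)
            + w_explain * explanation_reward(explanation))

# A correct prediction with a substantive explanation earns the full reward;
# a wrong prediction with no explanation earns nothing.
full = composite_reward("hallucinated", "hallucinated",
                        "The stated date contradicts the source passage.")
zero = composite_reward("faithful", "hallucinated", "")
```

Weighting correctness more heavily than explanation quality reflects that the binary label is the primary training target, with the explanation term discouraging degenerate empty rationales.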


3️⃣ Main Results and Value

Result highlights

On 12 diverse tasks, the 8B-parameter FaithLens outperforms advanced models such as GPT-4.1 and o3, while also producing high-quality explanations.

Practical value

Joint detection and explanation of faithfulness hallucinations improves trustworthiness in real-world applications such as retrieval-augmented generation and summarization, at a fraction of the cost of frontier models.


4️⃣ Glossary

- Faithfulness hallucination: model output that contradicts or is unsupported by the given source content.
- Cold start: initial supervised fine-tuning on curated data before reinforcement learning.
- Rule-based reinforcement learning: RL optimization driven by programmatic reward rules rather than a learned reward model.

Source: arXiv: 2512.20182