arXiv submission date: 2026-06-02
📄 Abstract - Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA

Retrieval-augmented generation systems for legal question answering typically retrieve passages based on semantic similarity and provide them to a language model, which then generates cited answers. Prior work assumes that highly ranked passages are most likely to be usefully cited by the model. Perturbation-based attribution methods, such as C-LIME, have been used exclusively for post-hoc explanation. However, on the AQuAECHR benchmark, semantic similarity does not correlate with passage attribution. Within a retriever's candidate pool, similarity-based ranking performs worse than random selection at surfacing gold citation paragraphs. To address this limitation, a lightweight cross-encoder is trained on continuous perturbation-based attribution scores to re-rank passages prior to generation. This approach is evaluated on the AQuAECHR benchmark, using two language models and five-fold cross-validation. The re-ranker substantially improves citation faithfulness and alignment with gold expert answers. Notably, two re-rankers trained independently on different models converge beyond their raw attribution agreement. This finding indicates that the cross-encoder reduces model-specific noise and produces a shared relevance signal that partially transfers across models, although same-model re-ranking remains more effective. These results demonstrate that perturbation-based attribution provides a practical, model-agnostic training signal for citation-aware retrieval.
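The abstract describes deriving continuous perturbation-based attribution scores per passage and using them as a training signal for a re-ranker. The sketch below illustrates only the first half of that idea, and it uses plain leave-one-out ablation rather than C-LIME, so it is not the paper's method. The names `leave_one_out_attribution`, `rerank`, and `toy_answer_score` are hypothetical, and `toy_answer_score` is a word-overlap stand-in for a real language model's answer likelihood.

```python
from typing import Callable, List, Sequence


def leave_one_out_attribution(
    passages: Sequence[str],
    answer_score: Callable[[Sequence[str]], float],
) -> List[float]:
    """Score each passage by how much removing it lowers answer_score.

    Assumption: answer_score returns a scalar measuring how well the
    given context supports a fixed answer (higher is better).
    """
    full = answer_score(passages)
    scores = []
    for i in range(len(passages)):
        # Perturb the context by dropping passage i only.
        ablated = [p for j, p in enumerate(passages) if j != i]
        scores.append(full - answer_score(ablated))
    return scores


def rerank(passages: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Order passages by descending score; ties keep their original order."""
    order = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
    return [passages[i] for i in order]


def toy_answer_score(context: Sequence[str]) -> float:
    """Hypothetical stand-in for a model: fraction of answer terms covered."""
    answer_terms = {"article", "6", "fair", "trial"}
    seen = set()
    for passage in context:
        seen.update(passage.lower().split())
    return len(answer_terms & seen) / len(answer_terms)


passages = [
    "The applicant complained about prison conditions",
    "Article 6 guarantees a fair trial",
    "The court examined the fair hearing requirement",
]
attribution = leave_one_out_attribution(passages, toy_answer_score)
reranked = rerank(passages, attribution)
```

In the setup the abstract describes, scores like `attribution` would serve as continuous training targets for a cross-encoder over (question, passage) pairs, so that no perturbation passes are needed at inference time. Sorting by the raw scores, as `rerank` does here, is only a way to inspect them.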

Top-level tags: llm, natural language processing
Detailed tags: retrieval-augmented generation, legal question answering, citation quality, re-ranking, attribution

Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA


1️⃣ One-sentence summary

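The paper finds that conventional retrieval based on semantic similarity does not reliably surface the passages that matter for citation in legal question answering, so it trains a lightweight model on passage attribution scores to re-rank the candidates, which substantially improves the citation accuracy of generated answers and their agreement with expert answers.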

Source: arXiv: 2606.03728