arXiv submission date: 2026-02-23
📄 Abstract - How Retrieved Context Shapes Internal Representations in RAG

Retrieval-augmented generation (RAG) enhances large language models (LLMs) by conditioning generation on retrieved external documents, but the effect of retrieved context is often non-trivial. In realistic retrieval settings, the retrieved document set often contains a mixture of documents that vary in relevance and usefulness. While prior work has largely examined these phenomena through output behavior, little is known about how retrieved context shapes the internal representations that mediate information integration in RAG. In this work, we study RAG through the lens of latent representations. We systematically analyze how different types of retrieved documents affect the hidden states of LLMs, and how these internal representation shifts relate to downstream generation behavior. Across four question-answering datasets and three LLMs, we analyze internal representations under controlled single- and multi-document settings. Our results reveal how context relevance and layer-wise processing influence internal representations, providing explanations for LLM output behavior and insights for RAG system design.
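The abstract does not specify the paper's exact measurement, but a common way to quantify such layer-wise representation shifts is to compare the hidden state of the final prompt token at each layer between a question-only run and a question-plus-retrieved-document run. The sketch below is a minimal illustration of that idea using synthetic arrays in place of real model hidden states; the function name and array shapes are illustrative assumptions, not the paper's method.

```python
import numpy as np

def layerwise_shift(h_base, h_ctx):
    """Per-layer cosine similarity between two runs' hidden states.

    h_base, h_ctx: arrays of shape (num_layers, hidden_dim), e.g. the
    final-token hidden state at every layer from a question-only run
    and from a question-plus-retrieved-document run. Lower similarity
    at a layer means the retrieved context shifted that layer's
    representation more.
    """
    num = np.sum(h_base * h_ctx, axis=1)
    den = np.linalg.norm(h_base, axis=1) * np.linalg.norm(h_ctx, axis=1)
    return num / den

# Synthetic stand-ins for real hidden states (4 layers, hidden_dim=8).
rng = np.random.default_rng(0)
h_q = rng.normal(size=(4, 8))                    # question-only run
h_q_doc = h_q + 0.5 * rng.normal(size=(4, 8))    # context perturbs each layer
sims = layerwise_shift(h_q, h_q_doc)
print(sims.shape)  # (4,) — one similarity per layer
```

In a real analysis the two hidden-state stacks would come from a forward pass of the same LLM on the two prompts (e.g. via a framework that exposes per-layer hidden states), and the per-layer curve could then be compared across relevant, irrelevant, and mixed retrieved documents.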

Top-level tags: llm, natural language processing, model evaluation
Detailed tags: retrieval-augmented generation, internal representations, context relevance, representation analysis, question answering

How Retrieved Context Shapes Internal Representations in RAG


1️⃣ One-sentence summary

By analyzing how the internal representations of large language models change when processing retrieved documents of varying relevance, this paper reveals how retrieval-augmented generation (RAG) systems integrate information internally, explaining their output behavior and offering new insights for system design.

Source: arXiv 2602.20091