arXiv submission date: 2025-10-14
📄 Abstract - RAG-Anything: All-in-One RAG Framework

Retrieval-Augmented Generation (RAG) has emerged as a fundamental paradigm for expanding Large Language Models beyond their static training limitations. However, a critical misalignment exists between current RAG capabilities and real-world information environments. Modern knowledge repositories are inherently multimodal, containing rich combinations of textual content, visual elements, structured tables, and mathematical expressions. Yet existing RAG frameworks are limited to textual content, creating fundamental gaps when processing multimodal documents. We present RAG-Anything, a unified framework that enables comprehensive knowledge retrieval across all modalities. Our approach reconceptualizes multimodal content as interconnected knowledge entities rather than isolated data types. The framework introduces dual-graph construction to capture both cross-modal relationships and textual semantics within a unified representation. We develop cross-modal hybrid retrieval that combines structural knowledge navigation with semantic matching. This enables effective reasoning over heterogeneous content where relevant evidence spans multiple modalities. RAG-Anything demonstrates superior performance on challenging multimodal benchmarks, achieving significant improvements over state-of-the-art methods. Performance gains become particularly pronounced on long documents where traditional approaches fail. Our framework establishes a new paradigm for multimodal knowledge access, eliminating the architectural fragmentation that constrains current systems. Our framework is open-sourced at: this https URL.
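
To make the dual-graph idea concrete, below is a minimal, hypothetical sketch of how multimodal elements might be unified into two complementary graphs. This is not the RAG-Anything API: the element schema, node IDs, relation labels, and the toy lexical-overlap similarity (standing in for embedding similarity) are all invented for illustration, and networkx is assumed for the graph layer.

```python
# Hypothetical illustration of dual-graph construction (not the RAG-Anything API).
# Assumes networkx for the graph layer; all IDs and relation labels are invented.
import networkx as nx

# Multimodal elements extracted from one document, tagged by modality.
elements = [
    {"id": "txt_1", "modality": "text",     "content": "Model accuracy is reported in Table 2."},
    {"id": "tab_2", "modality": "table",    "content": "Table 2: accuracy per benchmark"},
    {"id": "img_3", "modality": "image",    "content": "Figure 1: system architecture diagram"},
    {"id": "eq_4",  "modality": "equation", "content": "score = alpha * structural + (1 - alpha) * semantic"},
]

# Graph 1: cross-modal relation graph -- edges connect elements that
# reference each other across modalities (e.g., text citing a table).
cross_modal = nx.Graph()
for el in elements:
    cross_modal.add_node(el["id"], **el)
cross_modal.add_edge("txt_1", "tab_2", relation="references")
cross_modal.add_edge("txt_1", "img_3", relation="describes")

# Graph 2: textual-semantic graph -- edges connect elements whose (toy)
# lexical overlap exceeds a threshold; a real system would use embeddings.
def toy_similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

semantic = nx.Graph()
for el in elements:
    semantic.add_node(el["id"], **el)
for i, a in enumerate(elements):
    for b in elements[i + 1:]:
        sim = toy_similarity(a["content"], b["content"])
        if sim > 0.1:
            semantic.add_edge(a["id"], b["id"], weight=sim)

# Unified representation: merge both graphs so retrieval can traverse
# structural (cross-modal) and semantic edges in a single walk.
unified = nx.compose(cross_modal, semantic)
print(unified.number_of_nodes(), "nodes,", unified.number_of_edges(), "edges")
```

Merging the two graphs is what lets a single traversal follow either a structural edge (a paragraph citing a table) or a semantic edge (two passages about the same topic), rather than treating each modality as an isolated index.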

Top-level tags: multi-modal, natural language processing, systems
Detailed tags: retrieval-augmented generation, multimodal knowledge graph, cross-modal retrieval, long document QA, unified framework

RAG-Anything: A Unified Multimodal Retrieval-Augmented Generation Framework (All-in-One RAG Framework)


1️⃣ One-Sentence Summary

This paper proposes RAG-Anything, a framework that uses a unified knowledge representation and a cross-modal hybrid retrieval mechanism to overcome the fundamental limitations of text-only RAG systems when processing real-world heterogeneous multimodal knowledge bases containing images, tables, and mathematical expressions.


2️⃣ Key Innovations

1. Unified multimodal knowledge representation and dual-graph construction: multimodal content is reconceptualized as interconnected knowledge entities rather than isolated data types, with one graph capturing cross-modal relationships and another capturing textual semantics within a unified representation (see the sketch after the abstract above).

2. Cross-modal hybrid retrieval mechanism: structural knowledge navigation is combined with semantic matching, enabling reasoning over heterogeneous content where relevant evidence spans multiple modalities (a hedged sketch follows this list).

3. A systematic process from retrieval to answer synthesis.
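
The following is a minimal sketch of the hybrid retrieval idea from innovation 2, under the same assumptions as the earlier graph sketch: the node names, the `hybrid_retrieve` function, and the `alpha` fusion weight are hypothetical, and lexical overlap again stands in for embedding similarity.

```python
# Hypothetical sketch of cross-modal hybrid retrieval (invented names,
# not the RAG-Anything API): structural graph expansion fused with
# semantic scoring over a unified multimodal graph.
import networkx as nx

# A tiny unified graph: text, table, and image nodes with typed edges.
g = nx.Graph()
g.add_node("txt_1", modality="text",  content="accuracy results discussion")
g.add_node("tab_2", modality="table", content="accuracy per benchmark table")
g.add_node("img_3", modality="image", content="architecture diagram overview")
g.add_edge("txt_1", "tab_2", relation="references")
g.add_edge("txt_1", "img_3", relation="describes")

def semantic_score(query: str, content: str) -> float:
    """Toy lexical overlap standing in for embedding similarity."""
    q, c = set(query.lower().split()), set(content.lower().split())
    return len(q & c) / max(1, len(q | c))

def hybrid_retrieve(graph: nx.Graph, query: str, alpha: float = 0.5, k: int = 2):
    # 1) Semantic matching: score every node against the query.
    sem = {n: semantic_score(query, d["content"]) for n, d in graph.nodes(data=True)}
    # 2) Structural navigation: boost the best semantic hit and its graph
    #    neighbors, so linked evidence in other modalities surfaces too.
    seed = max(sem, key=sem.get)
    struct = {n: (1.0 if n == seed or n in graph[seed] else 0.0) for n in graph}
    # 3) Fuse both signals and return the top-k candidates.
    fused = {n: alpha * struct[n] + (1 - alpha) * sem[n] for n in graph}
    return sorted(fused.items(), key=lambda kv: -kv[1])[:k]

# Here the table is the best semantic match, and the text node linked to
# it is boosted structurally, so both modalities surface together.
print(hybrid_retrieve(g, "benchmark accuracy"))
```

The structural term is what distinguishes this from plain vector search: a node with little lexical overlap with the query can still rank highly because it is linked to the best-matching node in another modality.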


3️⃣ Main Results and Value

Result highlights: RAG-Anything achieves superior performance on challenging multimodal benchmarks, with significant improvements over state-of-the-art methods; gains are particularly pronounced on long documents, where traditional approaches fail.

Practical value: the framework eliminates the architectural fragmentation that constrains current RAG systems and establishes a unified paradigm for multimodal knowledge access; an open-source implementation is available.


4️⃣ Glossary

Retrieval-Augmented Generation (RAG): a paradigm that extends Large Language Models beyond their static training limitations by retrieving external knowledge at inference time.
Dual-graph construction: building two complementary graphs, one capturing cross-modal relationships and one capturing textual semantics, merged into a unified representation.
Cross-modal hybrid retrieval: retrieval that combines structural navigation over the knowledge graph with semantic matching.
Multimodal document: a document combining textual content, visual elements, structured tables, and mathematical expressions.

Source: arXiv:2510.12323