Abstract - Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering
Retrieval-Augmented Generation (RAG) systems for question answering typically retrieve evidence by semantic similarity between the query and document chunks. While effective for unstructured text, this approach is less reliable on semi-structured corpora, where answering may require exact filtering, aggregation, or exhaustive retrieval over structured attributes across multiple documents. Symbolic approaches support such operations, but they are often brittle on noisy natural-language corpora. We address this gap with DualGraph, a RAG framework that represents documents through two complementary views: a Textual Knowledge Graph for semantic retrieval and a Symbolic Knowledge Graph for symbolic querying over typed subject–predicate–object triples. Building on these two components, we provide multiple strategies for selecting or combining semantic and symbolic retrieval. We also introduce SpecsQA, a benchmark from a commercial shopping website with semi-structured product documents and manually curated questions spanning open-ended and specification-oriented retrieval. Experiments show that DualGraph consistently outperforms state-of-the-art dense-retrieval, GraphRAG, symbolic, and table-oriented baselines across question types. Code and data are available at this https URL.
Query Symbolically or Retrieve Semantically? A Dataset and Method for Semi-Structured Question Answering
1️⃣ One-Sentence Summary
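This paper proposes the DualGraph framework, which builds both a Textual Knowledge Graph for semantic retrieval and a Symbolic Knowledge Graph for exact querying, addressing question answering over semi-structured documents that requires both semantic understanding and structured operations. It also releases SpecsQA, a new benchmark dataset drawn from an e-commerce website, and experiments show that the method outperforms a range of existing baselines.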
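To make the contrast between the two views concrete, the following is a minimal Python sketch under stated assumptions. It is not the DualGraph implementation: the `Triple` class, the toy product data, and the functions `symbolic_filter` and `semantic_retrieve` are all hypothetical, and a bag-of-words cosine similarity stands in for the dense embeddings a real system would use. It shows why a specification-style question ("which laptops have at least 16 GB of RAM?") is better served by exact filtering over typed triples, while an open-ended question is better served by similarity search over text.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import Union


@dataclass(frozen=True)
class Triple:
    """A typed subject-predicate-object triple (hypothetical schema)."""
    subject: str
    predicate: str
    obj: Union[str, float]  # text or numeric value


# Hypothetical toy data, not taken from SpecsQA.
TRIPLES = [
    Triple("laptop_a", "ram_gb", 16.0),
    Triple("laptop_b", "ram_gb", 8.0),
    Triple("laptop_c", "ram_gb", 32.0),
    Triple("laptop_a", "brand", "Acme"),
]

CHUNKS = {
    "laptop_a": "lightweight laptop with a bright screen and long battery life",
    "laptop_b": "budget laptop suited to browsing and office work",
    "laptop_c": "workstation laptop for video editing and heavy workloads",
}


def symbolic_filter(triples, predicate, min_value):
    """Exhaustively return every subject whose numeric value is >= min_value."""
    return sorted(
        t.subject
        for t in triples
        if t.predicate == predicate
        and isinstance(t.obj, float)
        and t.obj >= min_value
    )


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_retrieve(chunks, query, k=1):
    """Return the ids of the k chunks most similar to the query (toy stand-in)."""
    q = Counter(query.lower().split())
    ranked = sorted(
        chunks,
        key=lambda doc_id: _cosine(q, Counter(chunks[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]


print(symbolic_filter(TRIPLES, "ram_gb", 16.0))               # ['laptop_a', 'laptop_c']
print(semantic_retrieve(CHUNKS, "laptop for video editing"))  # ['laptop_c']
```

The symbolic path returns every matching product, which a top-k similarity search cannot guarantee, while the semantic path handles free-form wording that has no corresponding predicate in the triples.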