arXiv submission date: 2026-04-20
📄 Abstract - DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion

Query auto-completion (QAC) has been widely studied in the context of web search, yet remains underexplored for in-document search, which we term DocQAC. DocQAC aims to enhance search productivity within long documents by helping users craft faster, more precise queries, even for complex or hard-to-spell terms. While global historical queries are available to both WebQAC and DocQAC, DocQAC uniquely accesses document-specific context, including the current document's content and its specific history of user query interactions. To address this setting, we propose a novel adaptive trie-guided decoding framework that uses user query prefixes to softly steer language models toward high-quality completions. Our approach introduces an adaptive penalty mechanism with tunable hyperparameters, enabling a principled trade-off between model confidence and trie-based guidance. To efficiently incorporate document context, we explore retrieval-augmented generation (RAG) and lightweight contextual document signals such as titles, keyphrases, and summaries. When applied to encoder-decoder models like T5 and BART, our trie-guided framework outperforms strong baselines and even surpasses much larger instruction-tuned models such as LLaMA-3 and Phi-3 on seen queries across both seen and unseen documents. This demonstrates its practicality for real-world DocQAC deployments, where efficiency and scalability are critical. We evaluate our method on a newly introduced DocQAC benchmark derived from ORCAS, enriched with query-document pairs. We make both the DocQAC dataset and code publicly available.
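
To make the idea of "soft" trie guidance with an adaptive penalty more concrete, here is a minimal Python sketch. It is not the paper's implementation: the abstract does not give the penalty formula, so the form `alpha * (1 - confidence) ** beta` is an assumption chosen only for illustration, as are the hyperparameter names `alpha` and `beta`, the helper names (`build_trie`, `descend`, `guided_step`, `best_token`), and the use of whitespace-separated words in place of real model tokens. The sketch penalizes next-token candidates that leave the trie of known queries, and shrinks that penalty when the model is already confident.

```python
import math
from typing import Dict, List, Optional


class TrieNode:
    """One node of a word-level trie built from historical queries."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_end = False


def build_trie(queries: List[str]) -> TrieNode:
    """Insert each query into the trie, one whitespace-separated word per edge."""
    root = TrieNode()
    for query in queries:
        node = root
        for word in query.split():
            node = node.children.setdefault(word, TrieNode())
        node.is_end = True
    return root


def descend(root: TrieNode, prefix: List[str]) -> Optional[TrieNode]:
    """Follow the prefix through the trie; return None if it leaves the trie."""
    node = root
    for word in prefix:
        node = node.children.get(word)
        if node is None:
            return None
    return node


def guided_step(
    log_probs: Dict[str, float],
    node: Optional[TrieNode],
    alpha: float = 2.0,  # hypothetical: maximum penalty strength
    beta: float = 1.0,   # hypothetical: how quickly confidence cancels the penalty
) -> Dict[str, float]:
    """Softly penalize candidates that are not continuations found in the trie.

    Assumed penalty: alpha * (1 - confidence) ** beta, where confidence is the
    model's top next-token probability. This is an illustrative choice only.
    """
    if node is None or not node.children:
        return dict(log_probs)  # no trie guidance available for this prefix
    confidence = math.exp(max(log_probs.values()))
    penalty = alpha * (1.0 - confidence) ** beta
    return {
        tok: lp if tok in node.children else lp - penalty
        for tok, lp in log_probs.items()
    }


def best_token(scores: Dict[str, float]) -> str:
    """Greedy choice: the candidate with the highest adjusted score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    root = build_trie(["trie guided decoding", "trie data structure"])
    node = descend(root, ["trie"])
    unsure = {"tree": math.log(0.5), "guided": math.log(0.3), "data": math.log(0.2)}
    confident = {"tree": math.log(0.9), "guided": math.log(0.06), "data": math.log(0.04)}
    print(best_token(guided_step(unsure, node)))     # trie guidance wins
    print(best_token(guided_step(confident, node)))  # model confidence wins
```

Because the penalty is subtracted from scores rather than used as a hard mask, an out-of-trie completion can still win when the model is confident enough, which is the trade-off the abstract describes. In a real encoder-decoder setup the same adjustment would be applied to subword logits at each decoding step rather than to a toy word-level distribution.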

Top-level tags: natural language processing, llm, systems
Detailed tags: query auto-completion, in-document search, trie-guided decoding, retrieval-augmented generation, adaptive decoding

DocQAC: Adaptive Trie-Guided Decoding for Effective In-Document Query Auto-Completion


1️⃣ One-Sentence Summary

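This paper proposes an adaptive trie-guided decoding method that helps users complete search queries faster and more accurately inside long documents by balancing the language model's own predictions against guidance from document-specific information, and on seen queries its T5 and BART variants outperform much larger instruction-tuned models such as LLaMA-3 and Phi-3 while remaining efficient.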

Source: arXiv: 2604.18257