arXiv submission date: 2026-03-16
📄 Abstract - 100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models

Several data warehouse and database providers have recently introduced extensions to SQL called AI Queries, enabling users to specify functions and conditions in SQL that are evaluated by LLMs, thereby significantly broadening the kinds of queries one can express over the combination of structured and unstructured data. LLMs offer remarkable semantic reasoning capabilities, making them an essential tool for complex and nuanced queries that blend structured and unstructured data. While extremely powerful, these AI queries can become prohibitively costly when invoked thousands of times. This paper provides an extensive evaluation of a recent AI query approximation approach that enables low-cost analytics and database applications to benefit from AI queries. The approach delivers >100x cost and latency reduction for the semantic filter operator and also important gains for semantic ranking. The cost and performance gains come from utilizing cheap and accurate proxy models over embedding vectors. We show that despite the massive gains in latency and cost, these proxy models preserve accuracy and occasionally improve accuracy across various benchmark datasets, including the extended Amazon reviews benchmark that has 10M rows. We present an OLAP-friendly architecture within Google BigQuery for this approach for purely online (ad hoc) queries, and a low-latency HTAP database-friendly architecture in AlloyDB that could further improve the latency by moving the proxy model training offline. We present techniques that accelerate the proxy model training.
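The idea of a proxy model over embedding vectors can be illustrated with a minimal sketch. The abstract does not say which model family the authors use, so logistic regression is an assumption here, and `llm_filter`, `train_proxy`, `proxy_filter`, and `run_demo` are hypothetical names. The embeddings are random vectors and the "LLM" is a synthetic rule, chosen only so the example runs offline. The sketch labels a small sample with the expensive call, fits the proxy on those labels, and filters every row with the proxy.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def llm_filter(embedding):
    """Hypothetical stand-in for an expensive LLM call on one row.

    Synthetic rule, used only so the demo runs offline: a row passes the
    filter when its first coordinate exceeds its second.
    """
    return 1 if embedding[0] > embedding[1] else 0


def train_proxy(samples, labels, epochs=300, lr=1.0):
    """Fit a logistic-regression proxy with batch gradient descent."""
    dim = len(samples[0])
    weights, bias = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def proxy_filter(model, embedding):
    """Cheap replacement for llm_filter: one dot product per row."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, embedding)) + bias
    return 1 if score > 0 else 0


def run_demo(num_rows=2000, sample_size=200, dim=4, seed=0):
    rng = random.Random(seed)
    rows = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]

    # Only the sampled rows are sent to the (simulated) LLM.
    sample = rows[:sample_size]
    sample_labels = [llm_filter(x) for x in sample]
    model = train_proxy(sample, sample_labels)

    # Every row is then filtered by the proxy alone.
    predictions = [proxy_filter(model, x) for x in rows]

    # Agreement with the reference labeler; calling it on all rows here is
    # for evaluation only and would not happen in the approximated query.
    matches = sum(p == llm_filter(x) for p, x in zip(predictions, rows))
    return matches / num_rows, sample_size


if __name__ == "__main__":
    agreement, llm_calls = run_demo()
    print(f"LLM calls: {llm_calls}, agreement with reference: {agreement:.3f}")
```

In this sketch the expensive call runs on 200 of 2,000 rows. That ratio is illustrative only and is not a figure from the paper.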

Top-level tags: systems, model evaluation, machine learning
Detailed tags: query approximation, proxy models, cost reduction, latency optimization, SQL extensions

100x Cost & Latency Reduction: Performance Analysis of AI Query Approximation using Lightweight Proxy Models


1️⃣ One-sentence summary

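This paper evaluates a recent approach that approximates expensive AI queries with lightweight proxy models, cutting the cost and latency of semantic filtering by more than 100x (with important gains for semantic ranking as well) while preserving and occasionally improving accuracy, so that more data analytics applications can use the semantic understanding of large language models economically.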

Source: arXiv: 2603.15970