Data Flow Control: Data Safety Policies for AI Agents
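1️⃣ One-sentence summary
This paper proposes Data Flow Control (DFC), a framework that enforces complex compliance rules (such as privacy and business constraints) directly inside the database query system, without manual checks or significant computational overhead, giving AI agents built-in safeguards when they process data.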
Agents increasingly generate SQL, orchestrate pipelines, and automate data analysis on behalf of users. While recent work improves query correctness, correctness is not safety. A query may be semantically valid yet violate regulatory, privacy, or business constraints that govern how data may be combined and released. We argue that enforcing such constraints is fundamentally a data infrastructure problem. This paper introduces Data Flow Control (DFC), a framework to declaratively specify and guarantee policy enforcement over tuple-level data flows within a DBMS query. A key challenge is defining a policy language that is optimizer-invariant yet efficient to enforce at scale. We formalize data safety as aggregate predicates over provenance monomials and present Passant, a portable query rewriting layer that enforces DFC policies without materializing provenance. Across five DBMS engines -- DuckDB, Umbra, PostgreSQL, DataFusion, and SQLServer -- Passant achieves ~0% overhead and outperforms alternatives by orders of magnitude. As a result, Data Flow Control is the first step towards moving data safety from prompts and post-hoc checks into the data infrastructure. Data Flow Control is available open source at this https URL.
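To make the idea of enforcing a policy inside the query itself more concrete, here is a minimal sketch in Python using the standard-library `sqlite3` module. It is not Passant's rewriting algorithm or the DFC policy language, neither of which the abstract specifies. The function `enforce_min_contributors`, the `visits` table, and the threshold of 3 are all hypothetical. The sketch only illustrates one simple case: a policy stated as an aggregate condition over the source rows that contribute to each output row, enforced by rewriting the agent's query rather than by inspecting results afterwards.

```python
import sqlite3


def enforce_min_contributors(query: str, key_column: str, k: int) -> str:
    """Toy rewrite (hypothetical, not Passant): require every output group
    to draw on at least k distinct values of key_column.

    Assumes `query` is a simple GROUP BY aggregate with no existing
    HAVING, ORDER BY, or LIMIT clause.
    """
    base = query.rstrip().rstrip(";")
    return f"{base} HAVING COUNT(DISTINCT {key_column}) >= {k}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (patient_id INTEGER, clinic TEXT, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [
        (1, "north", 100.0),
        (2, "north", 150.0),
        (3, "north", 50.0),
        (4, "south", 900.0),  # a single patient: an average here would expose them
    ],
)

# A semantically valid query an agent might generate.
agent_query = "SELECT clinic, AVG(cost) FROM visits GROUP BY clinic"

# The same query with the illustrative policy folded into it.
safe_query = enforce_min_contributors(agent_query, "patient_id", 3)

unsafe_rows = sorted(conn.execute(agent_query).fetchall())
safe_rows = sorted(conn.execute(safe_query).fetchall())

print(unsafe_rows)
print(safe_rows)
```

The original query returns a row for both clinics, while the rewritten query drops the `south` group because only one patient contributes to it. The check runs as part of the query plan, so no per-row provenance needs to be stored separately in this toy case. A real system would need a proper SQL parser and a far richer policy language than this string-level rewrite.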
Source: arXiv: 2606.05679