Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

📄 Abstract - Format-Constraint Coupling in Knowledge Graph Construction from Statistical Tables

An extraction schema should not reduce knowledge graph fidelity. On statistical CSV, however, it can. We study country-by-year time-series matrices, a common layout on open-data portals. In this setting, serialization format and schema constraints interact super-additively. Their joint effect exceeds the sum of their independent effects by up to +1.180 (2x2 factorial, 6 datasets). Bootstrap 95% CIs are strictly positive on 4/6 datasets, with the strongest evidence on wide Type-II matrices. More critically, a schema applied to a mismatched format can trigger catastrophic mismatch: fact coverage falls below the unconstrained baseline on 4/6 datasets, through either entity inflation or extraction refusal. We call this observed pattern format-constraint coupling. Probing and token ablation support a surface-form anchoring explanation centred on column-name references. Controlled variants across format-schema pairings, GraphRAG hosts, and LLM families show the same direction within the measured scope; one LLM family shows only partial activation. The observation also has a diagnostic consequence. Three standard retrieval modes largely mask construction quality (delta <= 1pp), whereas direct graph access exposes gaps up to +47.6pp (p < 0.0001). To support fidelity-aware evaluation, we release CSVFidelity-Bench. It contains 15 datasets (11 Type-II matrices and 4 Type-III tables) and 1,892 Gold Standard facts across 6 domains.
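The super-additivity claim can be read as a standard 2x2 interaction contrast: the joint effect of changing both format and schema, minus the sum of the two single-factor effects. The sketch below illustrates that arithmetic together with a percentile bootstrap interval. It is a minimal illustration, not the authors' procedure: the function names, the cell labels ("00" = baseline format without schema, "10" = format changed only, "01" = schema added only, "11" = both), and the per-run scores are all hypothetical placeholders, and the abstract does not state which fidelity score or resampling scheme was used.

```python
import random
from statistics import mean


def interaction(y00, y01, y10, y11):
    """2x2 interaction contrast.

    Joint effect (both factors changed) minus the sum of the two
    single-factor effects, all measured against the baseline cell y00.
    A positive value means the factors combine super-additively.
    """
    joint = y11 - y00
    format_only = y10 - y00
    schema_only = y01 - y00
    return joint - (format_only + schema_only)


def bootstrap_ci(cells, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the interaction contrast.

    `cells` maps "00", "01", "10", "11" to lists of per-run scores.
    Each cell is resampled independently with replacement.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        m = {k: mean(rng.choices(v, k=len(v))) for k, v in cells.items()}
        stats.append(interaction(m["00"], m["01"], m["10"], m["11"]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical placeholder scores, NOT values from the paper.
cells = {
    "00": [0.40, 0.42, 0.38],  # baseline format, no schema
    "01": [0.45, 0.43, 0.44],  # schema added only
    "10": [0.41, 0.39, 0.43],  # format changed only
    "11": [0.90, 0.88, 0.92],  # both changed
}
point = interaction(*(mean(cells[k]) for k in ("00", "01", "10", "11")))
lo, hi = bootstrap_ci(cells)
print(f"interaction = {point:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

Under this reading, a confidence interval whose lower bound stays above zero corresponds to the abstract's "strictly positive" criterion for a dataset.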


1️⃣ One-sentence summary

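This paper finds that, when knowledge graphs are built from statistical tables (such as time-series data in CSV), serialization format and extraction constraints interact super-additively: their joint effect on graph fidelity far exceeds the sum of their separate effects, and a mismatched pairing can push fact coverage below the unconstrained baseline, with direct graph access exposing gaps of up to 47.6 percentage points. The authors call this "format-constraint coupling" and release CSVFidelity-Bench, a benchmark built to evaluate it.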
