arXiv submission date: 2026-01-22
📄 Abstract - STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion

Table retrieval is the task of retrieving the most relevant tables from large-scale corpora given natural language queries. However, structural and semantic discrepancies between unstructured text and structured tables make embedding alignment particularly challenging. Recent methods such as QGpT attempt to enrich table semantics by generating synthetic queries, yet they still rely on coarse partial-table sampling and simple fusion strategies, which limit semantic diversity and hinder effective query-table alignment. We propose STAR (Semantic Table Representation), a lightweight framework that improves semantic table representation through semantic clustering and weighted fusion. STAR first applies header-aware K-means clustering to group semantically similar rows and selects representative centroid instances to construct a diverse partial table. It then generates cluster-specific synthetic queries to comprehensively cover the table's semantic space. Finally, STAR employs weighted fusion strategies to integrate table and query embeddings, enabling fine-grained semantic alignment. This design enables STAR to capture complementary information from structured and textual sources, improving the expressiveness of table representations. Experiments on five benchmarks show that STAR achieves consistently higher Recall than QGpT on all datasets, demonstrating the effectiveness of semantic clustering and adaptive weighted fusion for robust table representation. Our code is available at this https URL.
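The clustering step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. It assumes that "header-aware" means each row is serialized together with its column headers before embedding. The hashed bag-of-words `toy_embed` is a hypothetical stand-in for a real text encoder, and the helper names (`serialize_row`, `representative_rows`) and the farthest-point initialisation are choices made for this example. It also assumes `k` is no larger than the number of rows.

```python
import math
import zlib


def serialize_row(headers, row):
    """Header-aware text for one row: pair every cell with its column header."""
    return " | ".join(f"{h}: {v}" for h, v in zip(headers, row))


def toy_embed(text, dim=64):
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().replace("|", " ").replace(":", " ").split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(points, centroids)


def representative_rows(headers, rows, k):
    """Indices of the rows closest to each centroid, one per non-empty cluster."""
    points = [toy_embed(serialize_row(headers, r)) for r in rows]
    centroids, labels = kmeans(points, k)
    reps = []
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            reps.append(min(members, key=lambda i: sq_dist(points[i], centroids[j])))
    return sorted(reps)
```

The selected rows, together with the headers, would form the partial table. In a setup like the one the abstract describes, synthetic queries would then be generated per cluster, a step this sketch leaves out.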

Top-level tags: natural language processing data model training
Detailed tags: table retrieval semantic representation embedding alignment query generation weighted fusion

STAR: Semantic Table Representation with Header-Aware Clustering and Adaptive Weighted Fusion


1️⃣ One-sentence summary

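This paper proposes STAR, a lightweight framework that semantically clusters table rows, generates diverse queries from the clusters, and adaptively fuses table and query information, so that the tables most relevant to a natural language question can be retrieved more accurately from large-scale corpora.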
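To make the fusion of table and query information concrete, here is a small sketch of two ways the embeddings could be combined. The summary does not give the paper's exact weighting formulas, so both functions are assumptions for illustration only: `fixed_fusion` mixes the table embedding with the mean query embedding using a hypothetical weight `alpha`, and `similarity_weighted_fusion` is one possible adaptive variant that gives more weight to queries whose embeddings are closer to the table embedding (softmax over cosine similarity, with a hypothetical `temperature`).

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fixed_fusion(table_emb, query_embs, alpha=0.5):
    """alpha * table + (1 - alpha) * mean(queries), then re-normalise."""
    mean_q = [sum(col) / len(query_embs) for col in zip(*query_embs)]
    fused = [alpha * t + (1 - alpha) * q for t, q in zip(table_emb, mean_q)]
    return l2_normalize(fused)


def similarity_weighted_fusion(table_emb, query_embs, alpha=0.5, temperature=1.0):
    """Hypothetical adaptive variant: queries nearer the table get larger weights."""
    t = l2_normalize(table_emb)
    qs = [l2_normalize(q) for q in query_embs]
    scores = [dot(t, q) / temperature for q in qs]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    mixed_q = [sum(w * q[d] for w, q in zip(weights, qs)) for d in range(len(t))]
    return l2_normalize([alpha * a + (1 - alpha) * b for a, b in zip(t, mixed_q)])
```

With `alpha=1.0` both functions reduce to the normalised table embedding alone, which gives a convenient baseline when tuning the weight.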

Source: arXiv: 2601.15860