arXiv submission date: 2026-06-21
📄 Abstract - How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations

How does research evolve, and what substrate would let us forecast where it goes next? Scientific progress is not simply a uniform accumulation of facts: ideas extend prior methods, address known limitations, realize proposed future directions, and sometimes dispute earlier claims. Existing citation graphs usually collapse these roles into a single homogeneous edge type, limiting how we can analyze scientific progress. We address this gap by proposing the SciTraj corpus, the first claim-grounded typed citation graph in which each edge is linked to the specific claim sentence that motivates it. Claim-bearing sentences are extracted from paper sections; four claim-driven relations are verified by NLI entailment against in-paper context, while two similarity-only relations are gated by abstract cosine and year-gap rules. SciTraj contains 32,559 papers from NLP, ML, and Vision (2015--2024), connected by 573,126 directed edges across six relation types, with NLI-verified claim seeds. Using SciTraj, we identify disciplinary siloing in typed citation flow and topic emergence concentrated in Vision and LLM-related work. The corpus also contains 287M typed trajectories of length $\geq 3$, covering 72.8% of papers, and supports a temporally split typed link-prediction benchmark. A year-shuffle falsifiability test separates temporal structure from year-correlated content, and a 3-annotator pilot reports $\kappa = 0.74$ with 79.9% precision.
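The abstract says the two similarity-only relations are gated by abstract cosine and year-gap rules but gives no thresholds or similarity model. The sketch below illustrates one way such a gate could look. The bag-of-words cosine, the function names, the `min_cosine` and `max_year_gap` values, and the rule that the citing paper must not predate the cited one are all assumptions made for illustration, not details taken from SciTraj.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for whatever
    abstract representation the corpus actually uses)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty abstract is similar to nothing
    return dot / (norm_a * norm_b)


def passes_similarity_gate(citing: dict, cited: dict,
                           min_cosine: float = 0.5,
                           max_year_gap: int = 5) -> bool:
    """Keep a candidate edge only if both rules hold.

    Both thresholds are hypothetical placeholders.
    """
    gap = citing["year"] - cited["year"]
    # Assumed year-gap rule: the cited paper is not newer than the
    # citing paper, and not more than max_year_gap years older.
    if gap < 0 or gap > max_year_gap:
        return False
    return cosine(citing["abstract"], cited["abstract"]) >= min_cosine
```

A candidate edge between a 2020 paper and a 2018 paper with near-identical abstracts would pass this gate, while the same pair with the direction reversed would be rejected by the year-gap rule.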

Top-level tags: natural language processing, computer vision, machine learning
Detailed tags: citation graph, scientific evolution, benchmark, cross-domain, typed citations

How Does Research Evolve? Tracing Cross-Domain Trajectories in NLP, ML, and CV with Claim-Grounded Typed Citations


1️⃣ One-Sentence Summary

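This paper builds SciTraj, the first citation graph dataset that divides citations into categories by "claim relation" (such as extending, disputing, or realizing prior work). It covers more than 30,000 papers published between 2015 and 2024. Using this dataset, the authors reveal siloing between subfields and patterns in how new research topics emerge, offering a new tool for forecasting the direction of scientific progress.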

Source: arXiv: 2606.22342