arXiv submission date: 2026-05-12
📄 Abstract - Reconnecting Fragmented Citation Networks with Semantic Augmentation

Citation graphs are fundamental tools for modeling scientific structure, but are often fragmented due to missing citations of scientifically connected articles. To address this issue, we propose a computationally efficient hybrid framework integrating citation topology with large language model (LLM)-based text similarity. Using 662,369 Web of Science publications in Mathematics and Operations Research & Management Science, we augment the original graph by adding semantic edges from small, disconnected components and weighting existing citations according to textual similarity. Semantic augmentation substantially reduces fragmentation while preserving disciplinary homogeneity. Compared to embedding-only clustering, cluster detection on augmented graphs using the Leiden algorithm retains structural interpretability while offering multi-scale organization. The method scales efficiently to large datasets and offers a practical strategy for strengthening citation-based indicators without collapsing disciplinary boundaries.
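
The two augmentation steps named in the abstract (weighting existing citations by textual similarity, and adding semantic edges from small disconnected components) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names (`cosine`, `connected_components`, `augment`), the toy 2-D vectors standing in for LLM text embeddings, the size cutoff `max_small_size`, the threshold `min_similarity`, and the rule of linking each node in a small component to its single most similar outside node.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as lists of nodes."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def augment(embeddings, citations, max_small_size=2, min_similarity=0.8):
    """Hypothetical augmentation rule (an assumption, not the paper's exact method)."""
    nodes = list(embeddings)
    # Step 1: weight every existing citation by the similarity of its endpoints.
    weighted = {
        tuple(sorted((a, b))): cosine(embeddings[a], embeddings[b])
        for a, b in citations
    }
    # Step 2: link each node of a small component to its most similar outsider,
    # but only when that similarity clears the threshold.
    semantic = {}
    for component in connected_components(nodes, citations):
        if len(component) > max_small_size:
            continue
        members = set(component)
        outsiders = [n for n in nodes if n not in members]
        if not outsiders:
            continue
        for node in component:
            best = max(outsiders, key=lambda n: cosine(embeddings[node], embeddings[n]))
            similarity = cosine(embeddings[node], embeddings[best])
            if similarity >= min_similarity:
                semantic[tuple(sorted((node, best)))] = similarity
    return weighted, semantic


# Toy data: A-B-C form the main component; D and E are isolated papers.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.8, 0.2],
    "D": [0.0, 1.0],   # textually unrelated to everything else
    "E": [1.0, 0.02],  # textually close to A, but uncited
}
citations = [("A", "B"), ("B", "C")]

weighted, semantic = augment(embeddings, citations)
before = connected_components(list(embeddings), citations)
after = connected_components(list(embeddings), citations + list(semantic))
print("components before:", len(before), "after:", len(after))
print("semantic edges:", semantic)
```

In this toy setup the threshold is what keeps the unrelated paper `D` isolated while `E` is attached to the main component, which loosely mirrors the abstract's claim of reducing fragmentation without collapsing disciplinary boundaries. A real pipeline would use approximate nearest-neighbor search rather than the all-pairs scan shown here.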

Top-level tags: systems, llm, data
Detailed tags: citation networks, semantic augmentation, graph fragmentation, clustering, leiden algorithm

Reconnecting Fragmented Citation Networks with Semantic Augmentation


1️⃣ One-sentence summary

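This paper proposes a hybrid method that combines citation relations with large language model (LLM)-based text similarity. By adding semantic links for small, isolated groups of papers and reweighting existing citations, it repairs fragmentation in citation networks, improving the network's structural integrity and analytical usefulness while keeping disciplines distinct.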

Source: arXiv: 2605.12263