
arXiv submission date: 2026-02-10
📄 Abstract - Text summarization via global structure awareness

Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding. Existing research focuses mainly on model improvements and sentence-level pruning but often overlooks global structure, disrupting coherence and weakening downstream performance. Some studies employ large language models (LLMs), which achieve higher accuracy but incur substantial resource and time costs. To address these issues, we introduce GloSA-sum, the first summarization approach to achieve global structure awareness via topological data analysis (TDA). GloSA-sum summarizes text efficiently while preserving semantic cores and logical dependencies. Specifically, we construct a semantic-weighted graph from sentence embeddings, on which persistent homology identifies core semantics and logical structures; these are preserved in a "protection pool" that serves as the backbone for summarization. We design a topology-guided iterative strategy in which lightweight proxy metrics approximate sentence importance, avoiding repeated high-cost computations and thus preserving structural integrity while improving efficiency. To further enhance long-text processing, we propose a hierarchical strategy that integrates segment-level and global summarization. Experiments on multiple datasets demonstrate that GloSA-sum reduces redundancy while preserving semantic and logical integrity, strikes a balance between accuracy and efficiency, and benefits LLM downstream tasks by shortening contexts while retaining essential reasoning chains.
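The persistent-homology step in the abstract can be illustrated with a minimal sketch: grow a filtration over sentence pairs in order of semantic distance and track when connected components merge. Long-lived components correspond to well-separated semantic clusters. This is a toy 0-dimensional persistence computation on hand-made vectors, not the authors' implementation; the embeddings, function names, and thresholds below are illustrative assumptions.

```python
# Minimal sketch: 0-dimensional persistent homology over a sentence-
# similarity filtration, pure Python. GloSA-sum builds a semantic-weighted
# graph from real sentence embeddings; here tiny hand-made vectors stand
# in for them (assumption, not the paper's code).
from itertools import combinations
from math import sqrt

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def zero_dim_persistence(embeddings):
    """Add edges in order of increasing distance; each merge of two
    components 'kills' one of them. Returns (birth=0, death) bars for
    components that die, plus one infinite bar for the survivor."""
    n = len(embeddings)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(
        (cosine_distance(embeddings[i], embeddings[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj            # merge: one component dies at d
            bars.append((0.0, d))
    bars.append((0.0, float("inf")))   # the last component never dies
    return bars

# Toy "sentence embeddings": two semantic clusters plus an outlier.
emb = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9), (-1.0, 0.05)]
bars = zero_dim_persistence(emb)
# Bars with late death values signal well-separated clusters whose
# representative sentences a method like GloSA-sum would retain.
```

In the paper's pipeline the analogous long-lived structures feed the "protection pool"; a production version would use real sentence embeddings and a TDA library such as GUDHI or Ripser rather than this union-find sketch.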

Top-level tags: natural language processing, llm, model evaluation
Detailed tags: text summarization, topological data analysis, long-document processing, semantic graph, persistent homology

Text summarization via global structure awareness


1️⃣ One-sentence summary

This paper proposes a new method called GloSA-sum, which uses topological data analysis to capture the global semantic and logical structure of a text, allowing it to compress text efficiently while keeping core information and reasoning chains intact, striking a good balance between accuracy and efficiency.

Source: arXiv:2602.09821