arXiv submission date: 2026-03-09
📄 Abstract - Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization

Abstractive summarization aims to generate concise summaries by creating new sentences, allowing flexible rephrasing. However, this approach is vulnerable to inaccuracies, particularly "hallucinations," where the model introduces non-existent information. In this paper, we leverage multimodal and multilingual sentence embeddings derived from pretrained models such as LaBSE, SONAR, and BGE-M3, and feed them into a modified French BART-based model. We introduce a Named Entity Injection mechanism that appends tokenized named entities to the decoder input in order to improve the factual consistency of the generated summary. Our novel framework, SBARThez, is applicable to both text and speech inputs and supports cross-lingual summarization; it shows competitive performance relative to token-level baselines, especially for low-resource languages, while generating more concise and abstractive summaries.
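The Named Entity Injection idea described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `<ent>` separator token and the whitespace tokenizer standing in for a real subword tokenizer are assumptions for demonstration purposes.

```python
def inject_named_entities(decoder_input, entities, tokenize, sep_token="<ent>"):
    """Append tokenized named entities to the decoder input sequence,
    so the decoder is conditioned on the entities it should reproduce."""
    injected = list(decoder_input)
    for entity in entities:
        injected.append(sep_token)          # mark the start of an entity
        injected.extend(tokenize(entity))   # entity tokens follow
    return injected

# Toy whitespace tokenizer standing in for a real subword tokenizer.
tokenize = lambda text: text.lower().split()

decoder_input = ["<s>"]  # decoder starts from its BOS token
entities = ["Emmanuel Macron", "Paris"]
print(inject_named_entities(decoder_input, entities, tokenize))
# ['<s>', '<ent>', 'emmanuel', 'macron', '<ent>', 'paris']
```

In a real model, the resulting token sequence would be mapped to ids and fed as the decoder input, steering generation toward mentioning the injected entities and thereby reducing hallucinated names.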

Top tags: natural language processing, multi-modal, model training
Detailed tags: abstractive summarization, multilingual embeddings, factual consistency, cross-lingual summarization, low-resource languages

Using Multimodal and Language-Agnostic Sentence Embeddings for Abstractive Summarization


1️⃣ One-Sentence Summary

This paper proposes a new framework called SBARThez, which uses multimodal and multilingual sentence embeddings together with a Named Entity Injection mechanism to generate more accurate and concise summaries, supports cross-lingual summarization and speech input, and performs especially well on low-resource languages.

Source: arXiv 2603.08282