SciDef: Automating Definition Extraction from Academic Literature with Large Language Models
1️⃣ One-sentence summary
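This paper introduces SciDef, an automated pipeline that uses large language models to extract definitions of key terms from large volumes of academic literature. Its experiments show that multi-step and DSPy-optimized prompting improve extraction performance, while also finding that models tend to over-generate definitions, so future work should focus on filtering definitions for relevance.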
Definitions are the foundation for any scientific work, but with a significant increase in publication numbers, gathering definitions relevant to any keyword has become challenging. We therefore introduce SciDef, an LLM-based pipeline for automated definition extraction. We test SciDef on DefExtra & DefSim, novel datasets of human-extracted definitions and definition-pairs' similarity, respectively. Evaluating 16 language models across prompting strategies, we demonstrate that multi-step and DSPy-optimized prompting improve extraction performance. To evaluate extraction, we test various metrics and show that an NLI-based method yields the most reliable results. We show that LLMs are largely able to extract definitions from scientific literature (86.4% of definitions from our test-set); yet future work should focus not just on finding definitions, but on identifying relevant ones, as models tend to over-generate them. Code & datasets are available at this https URL.
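To make the evaluation idea concrete, here is a minimal sketch of how extracted definitions might be scored against human-extracted ones: each gold definition is matched to its most similar prediction, giving a recall figure plus a count of unmatched (over-generated) predictions. This is not the paper's implementation. The function names, the greedy matching, and the 0.5 threshold are all hypothetical, and `token_jaccard` is a simple token-overlap stand-in for the NLI-based scorer the abstract describes; a real NLI model could be passed in through the `similarity` parameter.

```python
from typing import Callable, List, Set, Tuple


def token_jaccard(a: str, b: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercase tokens (not an NLI model)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def score_extraction(
    gold: List[str],
    predicted: List[str],
    similarity: Callable[[str, str], float] = token_jaccard,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[float, int]:
    """Return (recall over gold definitions, number of unmatched predictions)."""
    used: Set[int] = set()
    matched = 0
    for g in gold:
        # Greedily pick the most similar prediction that is not yet matched.
        best_idx, best_sim = -1, 0.0
        for i, p in enumerate(predicted):
            if i in used:
                continue
            s = similarity(g, p)
            if s > best_sim:
                best_idx, best_sim = i, s
        if best_idx >= 0 and best_sim >= threshold:
            used.add(best_idx)
            matched += 1
    recall = matched / len(gold) if gold else 0.0
    # Predictions that matched no gold definition approximate over-generation.
    extra = len(predicted) - len(used)
    return recall, extra
```

Reporting the unmatched count alongside recall reflects the abstract's point that finding definitions is not enough: a model can reach high recall while still producing many definitions that no annotator considered relevant.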
Source: arXiv: 2602.05413