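LLM-based Triplet Extraction from Financial Reports
1️⃣ One-sentence summary
This paper proposes a new method for automatically extracting knowledge triplets from corporate financial reports. By using ontology-driven evaluation metrics and a hybrid verification strategy, it addresses both the lack of annotated data in this domain and the hallucinations produced by large language models, markedly improving the accuracy and reliability of the extracted triplets.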
Corporate financial reports are a valuable source of structured knowledge for Knowledge Graph construction, but the lack of annotated ground truth in this domain makes evaluation difficult. We present a semi-automated pipeline for Subject-Predicate-Object triplet extraction that uses ontology-driven proxy metrics, specifically Ontology Conformance and Faithfulness, instead of ground-truth-based evaluation. We compare a static, manually engineered ontology against a fully automated, document-specific ontology induction approach across different LLMs and two corporate annual reports. The automatically induced ontology achieves 100% schema conformance in all configurations, eliminating the ontology drift observed with the manual approach. We also propose a hybrid verification strategy that combines regex matching with an LLM-as-a-judge check, reducing apparent subject hallucination rates from 65.2% to 1.6% by filtering false positives caused by coreference resolution. Finally, we identify a systematic asymmetry between subject and object hallucinations, which we attribute to passive constructions and omitted agents in financial prose.
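To make the two proxy metrics and the hybrid check more concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Triplet` type, the function names, the alias table, and the example passage are all hypothetical, and `toy_judge` is a hand-written stand-in for the LLM-as-a-judge call, which the abstract does not specify. The sketch assumes that Ontology Conformance is the share of triplets whose predicate belongs to the ontology, and that a subject counts as grounded if it either appears in the source text (regex check) or is accepted by the judge.

```python
import re
from typing import Callable, Iterable, NamedTuple


class Triplet(NamedTuple):
    subject: str
    predicate: str
    obj: str


def ontology_conformance(triplets: list[Triplet], allowed: Iterable[str]) -> float:
    """Share of triplets whose predicate is part of the ontology (assumed definition)."""
    allowed_set = set(allowed)
    if not triplets:
        return 0.0
    return sum(t.predicate in allowed_set for t in triplets) / len(triplets)


def appears_in(text: str, source: str) -> bool:
    """Case-insensitive whole-phrase match of `text` inside `source`."""
    pattern = r"(?<!\w)" + re.escape(text) + r"(?!\w)"
    return re.search(pattern, source, flags=re.IGNORECASE) is not None


Judge = Callable[[Triplet, str], bool]


def subject_is_grounded(t: Triplet, source: str, judge: Judge) -> bool:
    """Hybrid check: cheap regex match first, judge only for the misses."""
    if appears_in(t.subject, source):
        return True
    return judge(t, source)


def subject_hallucination_rate(triplets: list[Triplet], source: str, judge: Judge) -> float:
    """Share of triplets whose subject is neither matched nor accepted by the judge."""
    if not triplets:
        return 0.0
    flagged = sum(not subject_is_grounded(t, source, judge) for t in triplets)
    return flagged / len(triplets)


# Hypothetical alias table; a real pipeline would ask an LLM instead.
ALIASES = {"Acme Corp": ["the company", "the group"]}


def toy_judge(t: Triplet, source: str) -> bool:
    """Stand-in for an LLM judge: accepts a subject if a known alias occurs in the passage."""
    return any(appears_in(alias, source) for alias in ALIASES.get(t.subject, []))


def regex_only(t: Triplet, source: str) -> bool:
    """Judge that never rescues a miss, i.e. plain regex verification."""
    return False


source = "The Company increased revenue by 12%. Costs were reduced."
triplets = [
    Triplet("Acme Corp", "increased", "revenue"),
    Triplet("Management", "reduced", "costs"),  # agent is omitted in the passive sentence
]

conformance = ontology_conformance(triplets, {"increased"})
rate_regex = subject_hallucination_rate(triplets, source, regex_only)
rate_hybrid = subject_hallucination_rate(triplets, source, toy_judge)
```

In this toy passage the regex-only check flags both subjects, while the hybrid check keeps "Acme Corp" because the passage refers to it as "The Company". That is the kind of coreference false positive the abstract describes. The second subject stays flagged because the passive sentence never names an agent.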
Source: arXiv: 2602.11886