arXiv submission date: 2026-04-07
📄 Abstract - THIVLVC: Retrieval Augmented Dependency Parsing for Latin

We describe THIVLVC, a two-stage system for the EvaLatin 2026 Dependency Parsing task. Given a Latin sentence, we retrieve structurally similar entries from the CIRCSE treebank using sentence length and POS n-gram similarity, then prompt a large language model to refine the baseline parse from UDPipe using the retrieved examples and UD annotation guidelines. We submit two configurations: one without retrieval and one with retrieval (RAG). On poetry (Seneca), THIVLVC improves CLAS by +17 points over the UDPipe baseline; on prose (Thomas Aquinas), the gain is +1.5 CLAS. A double-blind error analysis of 300 divergences between our system and the gold standard reveals that, among unanimous annotator decisions, 53.3% favour THIVLVC, showing annotation inconsistencies both within and across treebanks.
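The retrieval stage described in the abstract can be illustrated with a minimal sketch. The abstract only states that retrieval uses sentence length and POS n-gram similarity. The scoring below (multiset Jaccard overlap over POS n-grams, a length ratio, and a weighted sum) is an assumption made for illustration, as are all function names, the `length_weight` parameter, and the toy data. It is not the authors' implementation.

```python
from collections import Counter

def pos_ngrams(tags, n=2):
    """Return a multiset (Counter) of POS n-grams for one sentence."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

def ngram_similarity(a, b):
    """Multiset Jaccard overlap between two n-gram Counters, in [0, 1]."""
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 0.0

def length_similarity(len_a, len_b):
    """Ratio of the shorter to the longer sentence length, in [0, 1]."""
    longer = max(len_a, len_b)
    return min(len_a, len_b) / longer if longer else 1.0

def retrieve(query_tags, treebank, k=2, n=2, length_weight=0.3):
    """Return the ids of the k treebank sentences most similar to the query.

    treebank is a list of (sentence_id, pos_tags) pairs.
    The weighted sum is an assumed scoring rule, not taken from the paper.
    """
    query_grams = pos_ngrams(query_tags, n)
    scored = []
    for sent_id, tags in treebank:
        score = (
            length_weight * length_similarity(len(query_tags), len(tags))
            + (1 - length_weight) * ngram_similarity(query_grams, pos_ngrams(tags, n))
        )
        scored.append((score, sent_id))
    # Highest score first; ties broken by sentence id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [sent_id for _, sent_id in scored[:k]]

# Toy example with hypothetical sentence ids and UPOS tags.
toy_treebank = [
    ("s1", ["NOUN", "VERB", "NOUN"]),
    ("s2", ["ADJ", "NOUN", "VERB", "NOUN"]),
    ("s3", ["ADV", "ADV", "CCONJ", "ADV", "ADV", "ADV", "PUNCT"]),
]
nearest = retrieve(["NOUN", "VERB", "NOUN"], toy_treebank, k=2)
```

In a pipeline of the kind the abstract describes, the retrieved sentences and their gold parses would then be placed in the language model prompt alongside the UDPipe baseline parse and the UD annotation guidelines.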

Top-level tags: natural language processing llm data
Detailed tags: dependency parsing retrieval-augmented generation low-resource language treebank evaluation

THIVLVC: Retrieval Augmented Dependency Parsing for Latin


1️⃣ One-sentence summary

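This paper presents THIVLVC, a two-stage system that retrieves structurally similar sentences from a treebank to assist a large language model. The approach substantially improves dependency parsing accuracy on Latin poetry and exposes inconsistencies in existing annotated data.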

Source: arXiv: 2604.05564