arXiv submission date: 2026-02-10
📄 Abstract - Targum -- A Multilingual New Testament Translation Corpus

Many European languages possess rich biblical translation histories, yet existing corpora - in prioritizing linguistic breadth - often fail to capture this depth. To address this gap, we introduce a multilingual corpus of 657 New Testament translations, of which 352 are unique, with unprecedented depth in five languages: English (208 unique versions from 396 total), French (41 from 78), Italian (18 from 33), Polish (30 from 48), and Spanish (55 from 102). Aggregated from 12 online biblical libraries and one preexisting corpus, each translation is manually annotated with metadata that maps the text to a standardized identifier for the work, its specific edition, and its year of revision. This canonicalization empowers researchers to define "uniqueness" for their own needs: they can perform micro-level analyses on translation families, such as the KJV lineage, or conduct macro-level studies by deduplicating closely related texts. By providing the first resource designed for such flexible, multilevel analysis, our corpus establishes a new benchmark for the quantitative study of translation history.
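The idea of letting researchers define "uniqueness" for themselves can be illustrated with a small Python sketch. It is not the corpus's actual schema or tooling: the `Translation` record, its field names, the `deduplicate` helper, and the toy records are all assumptions made for illustration. Only the per-language counts are taken from the abstract, and the sketch checks that they add up to the stated totals of 352 unique and 657 total translations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Translation:
    # All field names are hypothetical; the abstract only says that each text
    # is mapped to a work identifier, a specific edition, and a revision year.
    language: str
    work_id: str
    edition: str
    revision_year: int
    source: str  # hypothetical label for the online library the text came from


def deduplicate(records, key):
    """Keep the first record seen for each distinct value of key(record)."""
    seen = {}
    for rec in records:
        seen.setdefault(key(rec), rec)
    return list(seen.values())


# Toy records with made-up editions, years, and sources (not corpus data).
records = [
    Translation("en", "KJV", "edition-a", 1900, "library-1"),
    Translation("en", "KJV", "edition-a", 1900, "library-2"),  # same text, other source
    Translation("en", "KJV", "edition-b", 1950, "library-1"),
    Translation("en", "WORK-B", "edition-a", 1980, "library-3"),
]

# Micro-level view: editions and revisions within a work stay distinct.
by_edition = deduplicate(
    records, key=lambda r: (r.language, r.work_id, r.edition, r.revision_year)
)

# Macro-level view: closely related texts collapse to one entry per work.
by_work = deduplicate(records, key=lambda r: (r.language, r.work_id))

# (unique, total) counts per language, as reported in the abstract.
counts = {
    "English": (208, 396),
    "French": (41, 78),
    "Italian": (18, 33),
    "Polish": (30, 48),
    "Spanish": (55, 102),
}
unique_sum = sum(unique for unique, _ in counts.values())
total_sum = sum(total for _, total in counts.values())

print(len(by_edition), len(by_work))  # 3 2
print(unique_sum, total_sum)          # 352 657
```

Changing only the `key` function switches between the two levels of analysis, which is the kind of flexibility the canonical identifiers are meant to enable.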

Top-level tags: natural language processing data benchmark
Detailed tags: multilingual corpus translation history text canonicalization biblical texts metadata annotation

Targum -- A Multilingual New Testament Translation Corpus


1️⃣ One-Sentence Summary

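This paper builds a multilingual corpus of 657 New Testament translations. Through fine-grained metadata annotation, it is the first resource to let researchers choose between micro-level analysis (for example, of translation families) and macro-level analysis (after deduplication), setting a new standard for the quantitative study of translation history.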

Source: arXiv: 2602.09724