arXiv submission date: 2026-03-11
📄 Abstract - An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?

Subject indexing is vital for discovery but hard to sustain at scale and across languages. We release a large bilingual (English/German) corpus of catalog records annotated with the Integrated Authority File (GND), plus a machine-actionable GND taxonomy. The resource enables ontology-aware multi-label classification, mapping text to authority terms, and agent-assisted cataloging with reproducible, authority-grounded evaluation. We provide a brief statistical profile and qualitative error analyses of three systems. We invite the community to assess not only accuracy but usefulness and transparency, toward authority-anchored AI co-pilots that amplify catalogers' work.

Top-level tags: natural language processing, data, multi-modal
Detailed tags: extreme multi-label classification, subject indexing, digital libraries, ontology-aware classification, authority file

An Extreme Multi-label Text Classification (XMTC) Library Dataset: What if we took "Use of Practical AI in Digital Libraries" seriously?


1️⃣ One-sentence summary

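This paper releases a dataset of a large number of bilingual (English/German) library catalog records annotated with authority terms. It is intended to support the development of AI tools that automatically assign subject labels to documents and so help librarians work more efficiently.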

Source: arXiv:2603.10876