arXiv submission date: 2026-04-08
📄 Abstract - Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation

Deciphering ancient Chinese Oracle Bone Script (OBS) is a challenging task that offers insights into the beliefs, systems, and culture of the ancient era. Existing approaches treat decipherment as a closed-set image recognition problem, which fails to bridge the "interpretation gap": while individual characters are often unique and rare, they are composed of a limited set of recurring, pictographic components that carry transferable semantic meanings. To leverage this structural logic, we propose an agent-driven Vision-Language Model (VLM) framework that integrates a VLM for precise visual grounding with an LLM-based agent to automate a reasoning chain of component identification, graph-based knowledge retrieval, and relationship inference for linguistically accurate interpretation. To support this, we also introduce OB-Radix, an expert-annotated dataset providing structural and semantic data absent from prior corpora, comprising 1,022 character images (934 unique characters) and 1,853 fine-grained component images across 478 distinct components with verified explanations. By evaluating our system across three benchmarks of different tasks, we demonstrate that our framework yields more detailed and precise decipherments compared to baseline methods.
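The reasoning chain the abstract describes (component identification, graph-based knowledge retrieval, relationship inference) can be sketched as a minimal pipeline. Everything below is illustrative: the component labels, the `COMPONENT_KG` dictionary, and all function names are hypothetical stand-ins, since the summary does not expose the paper's actual models, prompts, or knowledge graph.

```python
# Hypothetical component knowledge graph: component label -> verified
# explanation (stand-in for the OB-Radix component annotations).
COMPONENT_KG = {
    "water": "three strokes depicting flowing water; marks river/liquid senses",
    "hand": "a hand in profile; marks actions of holding or doing",
}

def identify_components(character_image):
    """Stand-in for the VLM visual-grounding step: return the pictographic
    components detected in the character image (fixed stub here)."""
    return ["water", "hand"]

def retrieve_knowledge(components):
    """Graph-based retrieval step: look up each identified component's
    verified explanation in the knowledge graph."""
    return {c: COMPONENT_KG[c] for c in components if c in COMPONENT_KG}

def infer_interpretation(knowledge):
    """Stand-in for the LLM-agent relationship-inference step: combine
    component semantics into a candidate reading."""
    parts = "; ".join(f"{c}: {expl}" for c, expl in knowledge.items())
    return f"Candidate reading from components ({parts})"

def decipher(character_image):
    """Run the full chain: ground components, retrieve their semantics,
    then infer a linguistically grounded interpretation."""
    components = identify_components(character_image)
    knowledge = retrieve_knowledge(components)
    return infer_interpretation(knowledge)
```

The point of the structure is that even for a character never seen before, the retrieval step still succeeds, because it keys on the limited, recurring component inventory rather than on whole characters.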

Top-level tags: llm, multi-modal, computer vision
Detailed tags: oracle bone script, vision-language model, knowledge augmentation, ancient script interpretation, multimodal reasoning

Specializing Large Models for Oracle Bone Script Interpretation via Component-Grounded Multimodal Knowledge Augmentation


1️⃣ One-Sentence Summary

This paper proposes a new method combining a vision-language model with an LLM-based agent that identifies the recurring pictographic components within oracle bone characters and reasons over their semantics, yielding more accurate and detailed interpretations of this ancient script; it also introduces a new dataset with fine-grained component annotations to support the task.

Source: arXiv:2604.06711