MURAD: A Large-Scale Multi-Domain Unified Reverse Arabic Dictionary Dataset
1️⃣ One-Sentence Summary
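This paper introduces MURAD, a large-scale, multi-domain reverse dictionary dataset for Arabic. It contains more than 96,000 entries paired with standardized definitions and is intended to support Arabic natural language processing research and application development.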
Arabic is a linguistically and culturally rich language with a vast vocabulary that spans scientific, religious, and literary domains. Yet, large-scale lexical datasets linking Arabic words to precise definitions remain limited. We present MURAD (Multi-domain Unified Reverse Arabic Dictionary), an open lexical dataset with 96,243 word-definition pairs. The data come from trusted reference works and educational sources. Extraction used a hybrid pipeline integrating direct text parsing, optical character recognition, and automated reconstruction. This ensures accuracy and clarity. Each record aligns a target word with its standardized Arabic definition and metadata that identifies the source domain. The dataset covers terms from linguistics, Islamic studies, mathematics, physics, psychology, and engineering. It supports computational linguistics and lexicographic research. Applications include reverse dictionary modeling, semantic retrieval, and educational tools. By releasing this resource, we aim to advance Arabic natural language processing and promote reproducible research on Arabic lexical semantics.
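To make the record layout and the reverse dictionary use case concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' schema or method: the field names (`word`, `definition`, `domain`), the `reverse_lookup` helper, and the three sample entries are all hypothetical and were written for this example rather than taken from MURAD. Ranking uses plain token-overlap (Jaccard) similarity as a stand-in for whatever model a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One word-definition pair. Field names are hypothetical, not MURAD's schema."""
    word: str        # target word
    definition: str  # standardized Arabic definition
    domain: str      # metadata identifying the source domain


# Illustrative entries written for this sketch; they are NOT drawn from the dataset.
SAMPLE = [
    Entry("مثلث", "شكل هندسي له ثلاثة أضلاع", "mathematics"),
    Entry("مربع", "شكل هندسي له أربعة أضلاع متساوية", "mathematics"),
    Entry("فعل", "كلمة تدل على حدث مقترن بزمن", "linguistics"),
]


def tokenize(text: str) -> set[str]:
    """Split on whitespace; a real system would normalize Arabic text first."""
    return set(text.split())


def reverse_lookup(query: str, entries: list[Entry], top_k: int = 3) -> list[Entry]:
    """Rank entries by Jaccard overlap between the query and each definition."""
    q = tokenize(query)
    scored = []
    for entry in entries:
        d = tokenize(entry.definition)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, entry))
    # Highest score first; sorted() is stable, so ties keep input order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in ranked[:top_k] if score > 0]


if __name__ == "__main__":
    # Describe a concept; get candidate words back.
    for hit in reverse_lookup("شكل له ثلاثة أضلاع", SAMPLE):
        print(hit.word, hit.domain)
```

A reverse dictionary maps a description to candidate words, which is the opposite direction of a standard lookup. Token overlap is chosen here only because it needs no dependencies; embedding-based semantic retrieval would follow the same interface.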
Source: arXiv:2601.21512