ORPHEAS: A Cross-Lingual Greek-English Embedding Model for Retrieval-Augmented Generation

📄 Abstract

Effective retrieval-augmented generation across bilingual Greek-English applications requires embedding models that capture both domain-specific semantic relationships and cross-lingual semantic alignment. Existing multilingual embedding models spread their representational capacity across many languages, which limits how well they are optimized for Greek and leaves them unable to encode the morphological complexity and domain-specific terminology of Greek text. In this work, we propose ORPHEAS, a specialized Greek-English embedding model for bilingual retrieval-augmented generation. ORPHEAS is trained on a high-quality dataset generated by a knowledge graph-based fine-tuning methodology applied to a diverse multi-domain corpus, which enables language-agnostic semantic representations. Experiments on monolingual and cross-lingual retrieval benchmarks show that ORPHEAS outperforms state-of-the-art multilingual embedding models, demonstrating that domain-specialized fine-tuning on morphologically complex languages does not compromise cross-lingual retrieval capability.


1️⃣ One-Sentence Summary

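This paper proposes ORPHEAS, a cross-lingual embedding model designed specifically for Greek and English. Trained with a knowledge graph-based method, it preserves cross-lingual retrieval capability while better handling the complex morphology and specialized terminology of Greek, and it thereby outperforms existing general-purpose multilingual models on bilingual retrieval tasks.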
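
To make the retrieval setting concrete, the following is a minimal sketch of embedding-based cross-lingual retrieval, not the ORPHEAS implementation. It assumes that Greek and English texts have already been mapped into one shared vector space; the three-dimensional vectors, the `cosine` and `retrieve` helpers, and the example sentences are all hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, corpus, k=2):
    """Return the k corpus texts whose vectors are most similar to the query."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy vectors standing in for the output of an embedding model (assumption:
# a real model would produce these, with far more dimensions).
corpus = [
    ("Η Αθήνα είναι η πρωτεύουσα της Ελλάδας.", [0.90, 0.10, 0.00]),
    ("Athens is the capital of Greece.", [0.88, 0.12, 0.02]),
    ("Interest rates rose last quarter.", [0.05, 0.10, 0.95]),
]

# Hypothetical embedding of an English query about the Greek capital.
query_vec = [0.92, 0.08, 0.01]

top = retrieve(query_vec, corpus, k=2)
```

In a language-agnostic space of the kind the abstract describes, a Greek passage and its English counterpart sit close together, so a query in either language can surface both; the unrelated passage ranks last.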
