arXiv submission date: 2026-05-11
📄 Abstract - Phoenix-VL 1.5 Medium Technical Report

We introduce Phoenix-VL 1.5 Medium, a 123B-parameter natively multimodal and multilingual foundation model adapted to regional languages and the Singapore context. Developed as a sovereign AI asset, it demonstrates that deep domain adaptation can be achieved with minimal degradation to broad-spectrum intelligence and alignment. Continued pretraining was performed on Mistral Medium 3.1 using a localized 1-trillion-token multimodal corpus, followed by a 250-billion-token long-context extension phase. Subsequent post-training incorporated a novel human-annotated Singapore multimodal dataset and a curated textual corpus on Singapore culture, knowledge, and legislation, totaling 22 billion tokens. An additional 5 billion tokens of model alignment was performed through Online Direct Preference Optimization. Phoenix-VL 1.5 Medium achieves state-of-the-art performance for its size on Singapore multimodal, legal, and government policy benchmarks while remaining globally competitive on general multimodal intelligence, multilingual, and STEM benchmarks. We also introduce a novel evaluation suite encompassing localized knowledge benchmarks and an institutionally aligned model behavior and safety framework. We report the data curation principles and training methodology, and highlight benchmark and inference performance.
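
To put the four token budgets quoted in the abstract in proportion, the following is a minimal sketch that tallies them and prints each phase's share of the total. The phase labels and the `budget_shares` helper are illustrative names, not taken from the report, and the sketch assumes the four figures are disjoint and can simply be added, which the abstract does not state explicitly.

```python
# Token budgets per training phase as quoted in the abstract, in billions of tokens.
# Assumption: the phases are disjoint, so the figures are additive.
PHASES = [
    ("continued pretraining", 1000),
    ("long-context extension", 250),
    ("post-training", 22),
    ("Online DPO alignment", 5),
]


def budget_shares(phases):
    """Return the total budget and each phase's fraction of that total."""
    total = sum(tokens for _, tokens in phases)
    shares = {name: tokens / total for name, tokens in phases}
    return total, shares


total, shares = budget_shares(PHASES)
print(f"total: {total}B tokens")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under that assumption, continued pretraining and long-context extension together make up nearly all of the token budget, with post-training and preference alignment accounting for roughly 2% combined.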

Top-level tags: multi-modal, llm, model training
Detailed tags: multimodal foundation model, domain adaptation, preference optimization, benchmark, multilingual

Phoenix-VL 1.5 Medium Technical Report


1️⃣ One-sentence summary

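This paper introduces Phoenix-VL 1.5 Medium, a 123-billion-parameter multimodal, multilingual foundation model with deep domain adaptation for Singapore and Southeast Asia. Through training on a large-scale localized corpus and new alignment techniques, it reaches industry-leading performance on local policy, legal, and multimodal tasks while maintaining globally competitive general intelligence.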

Source: arXiv: 2605.10391