arXiv submission date: 2026-01-13
📄 Abstract - Ministral 3

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique that alternates pruning with continued training under distillation. Each model comes with image understanding capabilities, all under the Apache 2.0 license.
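The abstract describes Cascade Distillation only at a high level: repeatedly prune a larger parent model, then continue training the pruned model against the parent's outputs. Below is a minimal PyTorch sketch of that general loop, assuming a toy MLP, a magnitude-based width-pruning heuristic, and a standard soft-label KL distillation loss; these specifics are illustrative assumptions, not the paper's actual recipe.

```python
# Minimal sketch of an iterative prune-then-distill ("cascade") loop.
# The model, pruning heuristic, data, and hyperparameters are all toy
# assumptions for illustration, not Ministral 3's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Toy two-layer network standing in for a transformer block."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


def prune_hidden(model: MLP, keep: int) -> MLP:
    """Keep the `keep` hidden units whose fc1 rows have the largest L2
    norm, slicing fc2's input columns to match (magnitude heuristic,
    assumed here purely for illustration)."""
    idx = model.fc1.weight.norm(dim=1).topk(keep).indices.sort().values
    pruned = MLP(model.fc1.in_features, keep, model.fc2.out_features)
    with torch.no_grad():
        pruned.fc1.weight.copy_(model.fc1.weight[idx])
        pruned.fc1.bias.copy_(model.fc1.bias[idx])
        pruned.fc2.weight.copy_(model.fc2.weight[:, idx])
        pruned.fc2.bias.copy_(model.fc2.bias)
    return pruned


def distill_step(student, teacher, x, optimizer, temperature=2.0):
    """One continued-training step: match the teacher's softened logits
    with a temperature-scaled KL divergence (standard soft-label loss)."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


teacher = MLP(64, 512, 10)      # stands in for the large parent model
student = teacher
for width in (256, 128, 64):    # each cascade stage: prune, then re-distill
    student = prune_hidden(student, width)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(200):
        x = torch.randn(32, 64)  # toy inputs; the real recipe trains on text
        distill_step(student, teacher, x, opt)
```

In this sketch each stage distills against the original teacher; whether Ministral 3 distills from the original parent or from the previous stage is not stated in the abstract.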

Top-level tags: llm model training natural language processing
Detailed tags: parameter-efficient models cascade distillation instruction tuning reasoning models multimodal language models

Ministral 3


1️⃣ One-sentence summary

This paper introduces Ministral 3, an efficient language model series available in three sizes and trained with a novel "Cascade Distillation" technique. The models support both text and image understanding, and each size ships in three variants (base, instruction-finetuned, and reasoning), aiming to provide a capable, open-source (Apache 2.0 licensed) AI solution for compute- and memory-constrained applications.

Source: arXiv:2601.08584