arXiv submission date: 2026-01-13
📄 Abstract - Ministral 3

We introduce the Ministral 3 series, a family of parameter-efficient dense language models designed for compute- and memory-constrained applications, available in three model sizes: 3B, 8B, and 14B parameters. For each model size, we release three variants: a pretrained base model for general-purpose use, an instruction-finetuned model, and a reasoning model for complex problem-solving. In addition, we present our recipe for deriving the Ministral 3 models through Cascade Distillation, an iterative technique that alternates pruning with continued training under distillation. Each model comes with image understanding capabilities, all under the Apache 2.0 license.
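The abstract describes Cascade Distillation only at a high level: repeatedly prune a larger parent model, then continue training the pruned model against the parent's outputs. Below is a minimal PyTorch sketch of that general loop, assuming a toy MLP, a magnitude-based width-pruning heuristic, and a standard soft-label KL distillation loss; these specifics are illustrative assumptions, not the paper's actual recipe.

```python
# Minimal sketch of an iterative prune-then-distill ("cascade") loop.
# The model, pruning heuristic, data, and hyperparameters are all toy
# assumptions for illustration, not Ministral 3's actual method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MLP(nn.Module):
    """Toy two-layer network standing in for a transformer block."""

    def __init__(self, d_in: int, d_hidden: int, d_out: int):
        super().__init__()
        self.fc1 = nn.Linear(d_in, d_hidden)
        self.fc2 = nn.Linear(d_hidden, d_out)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))


def prune_hidden(model: MLP, keep: int) -> MLP:
    """Keep the `keep` hidden units whose fc1 rows have the largest L2
    norm, slicing fc2's input columns to match (magnitude heuristic,
    assumed here purely for illustration)."""
    idx = model.fc1.weight.norm(dim=1).topk(keep).indices.sort().values
    pruned = MLP(model.fc1.in_features, keep, model.fc2.out_features)
    with torch.no_grad():
        pruned.fc1.weight.copy_(model.fc1.weight[idx])
        pruned.fc1.bias.copy_(model.fc1.bias[idx])
        pruned.fc2.weight.copy_(model.fc2.weight[:, idx])
        pruned.fc2.bias.copy_(model.fc2.bias)
    return pruned


def distill_step(student, teacher, x, optimizer, temperature=2.0):
    """One continued-training step: match the teacher's softened logits
    with a temperature-scaled KL divergence (standard soft-label loss)."""
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


teacher = MLP(64, 512, 10)      # stands in for the large parent model
student = teacher
for width in (256, 128, 64):    # each cascade stage: prune, then re-distill
    student = prune_hidden(student, width)
    opt = torch.optim.AdamW(student.parameters(), lr=1e-3)
    for _ in range(200):
        x = torch.randn(32, 64)  # toy inputs; the real recipe trains on text
        distill_step(student, teacher, x, opt)
```

In this sketch each stage distills against the original teacher; whether Ministral 3 distills from the original parent or from the previous stage is not stated in the abstract.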

Top-level tags: llm model training natural language processing
Detailed tags: parameter-efficient models cascade distillation instruction tuning reasoning models multimodal language models

Ministral 3


1️⃣ One-sentence summary

This paper introduces Ministral 3, an efficient language model series available in three sizes and trained with a novel "Cascade Distillation" technique. The models support both text and image understanding, and each size ships in three variants (base, instruction-finetuned, and reasoning), aiming to provide a capable, open-source (Apache 2.0 licensed) AI solution for compute- and memory-constrained applications.

Source: arXiv:2601.08584