Abstract - MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
Achieving strong optimization generalization across diverse optimization problems with limited training resources remains a challenging problem for optimization-oriented large language models (LLMs). Existing approaches typically rely on large-scale supervised datasets, costly reasoning annotations, and expensive verification of intermediate steps, resulting in substantial training overhead. To address these challenges, we propose MiniOpt, a reinforcement learning framework that learns to solve optimization problems through a "reasoning-to-model-and-solve" paradigm. MiniOpt decomposes optimization reasoning into structured optimization modeling and executable solver generation. Building upon this paradigm, we introduce OptReward, a reward function with a hierarchical score structure that jointly evaluates the formulation and the solution, enabling effective policy learning without expert demonstrations. We further develop an optimization-oriented policy optimization strategy that improves exploration efficiency and stabilizes reinforcement learning for compact models. Extensive experiments show that MiniOpt-3B exhibits strong optimization generalization across various optimization types, problem scenarios, and task domains. Among models with fewer than 10B parameters, the MiniOpt series achieves the highest average solving accuracy (SA). Against models with more than 10B parameters, MiniOpt remains competitive. These results suggest that optimization-oriented reward design and reinforcement learning provide an effective pathway for developing compact optimization-specialized language models with strong optimization generalization capabilities. The code is available at this https URL.
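The abstract does not spell out how OptReward scores a response. As a rough illustration only, the sketch below shows one way a hierarchical (tiered) reward over formulation and solution, plus a solving-accuracy (SA) metric, could be written. The `Rollout` fields, the tier values (0.0 / 0.2 / 0.5 / 1.0), and the 1e-4 relative tolerance are hypothetical assumptions, not the paper's OptReward or its evaluation protocol.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Rollout:
    """Hypothetical summary of one model response (fields are illustrative, not from the paper)."""
    has_formulation: bool       # a structured model (variables, objective, constraints) was parsed
    code_executed: bool         # the generated solver code ran without raising an error
    objective: Optional[float]  # optimal objective value reported by that code, if any


def _matches(value: Optional[float], reference: float, rel_tol: float) -> bool:
    # Relative-tolerance comparison; the scale floor of 1.0 avoids a zero tolerance when reference == 0.
    if value is None:
        return False
    return abs(value - reference) <= rel_tol * max(1.0, abs(reference))


def tiered_reward(r: Rollout, reference: float, rel_tol: float = 1e-4) -> float:
    """Hypothetical hierarchical reward: a tier is reachable only if the tier below it passes."""
    if not r.has_formulation:
        return 0.0  # no usable formulation
    if not r.code_executed or r.objective is None:
        return 0.2  # formulation present, but the solver code failed
    if _matches(r.objective, reference, rel_tol):
        return 1.0  # code ran and reproduced the reference optimum
    return 0.5      # code ran, but the objective value is wrong


def solving_accuracy(predicted: Sequence[Optional[float]], references: Sequence[float],
                     rel_tol: float = 1e-4) -> float:
    """Fraction of problems whose predicted optimum matches the reference within rel_tol."""
    if len(predicted) != len(references):
        raise ValueError("predicted and references must have the same length")
    if not references:
        return 0.0
    hits = sum(_matches(p, ref, rel_tol) for p, ref in zip(predicted, references))
    return hits / len(references)


if __name__ == "__main__":
    print(tiered_reward(Rollout(True, True, 42.0), reference=42.0))  # 1.0
    print(solving_accuracy([10.0, None, 3.0], [10.0, 5.0, 4.0]))
```

The strict ordering of tiers is the design choice being illustrated: a response cannot earn credit for a correct answer without first producing a formulation and runnable code, which may give a compact model some learning signal even before it solves problems outright.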
MiniOpt: Reasoning to Model and Solve General Optimization Problems with Limited Resources
1️⃣ One-Sentence Summary
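This paper proposes the MiniOpt framework, which uses reinforcement learning to teach small language models to first formulate an optimization problem in a standard form and then automatically generate solver code, and which scores both the formulation and the solution with a novel hierarchical reward function. As a result, without relying on large amounts of labeled data or expensive compute, a model with only 3B parameters achieves leading solving accuracy (the highest average among models with fewer than 10B parameters) across a variety of optimization tasks.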