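R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?
1️⃣ One-sentence summary
This paper proposes a new method called Collective Adversarial Data Synthesis (CADS), which automatically generates high-quality, diverse, and challenging multimodal training data and thereby improves the performance of multimodal large language models (MLLMs) on complex tasks.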
In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach to synthesize high-quality, diverse and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploring adversarial learning to synthesize challenging samples for effectively driving model improvement. Specifically, CADS operates with two cyclic phases, i.e., Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism to optimize the generation context to encourage challenging and high-value data generation. With CADS, we construct MMSynthetic-20K and train our model R1-SyntheticVL, which demonstrates superior performance on various benchmarks.
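The cyclic structure described in the abstract can be pictured with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how generators, judges, or the context update work, so every name here (`cads_round`, `run_cads`, `quality_threshold`, the `solver` callable) is hypothetical, samples are plain strings rather than image-text pairs, and the context update is a toy stand-in for Adversarial Context Optimization.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a real pipeline would handle multimodal samples.
Sample = str
Generator = Callable[[str], Sample]  # generation context -> candidate sample
Judge = Callable[[Sample], float]    # sample -> quality score in [0, 1]
Solver = Callable[[Sample], bool]    # True if the target model solves the sample


def cads_round(
    context: str,
    generators: List[Generator],
    judges: List[Judge],
    solver: Solver,
    quality_threshold: float = 0.5,
) -> Tuple[List[Sample], List[Sample], str]:
    """One generate/judge cycle (illustrative assumption, not the paper's algorithm)."""
    # Generate phase: several generators each propose a candidate.
    candidates = [generate(context) for generate in generators]

    # Judge phase: keep candidates whose mean judge score passes the threshold.
    accepted = [
        s for s in candidates
        if sum(judge(s) for judge in judges) / len(judges) >= quality_threshold
    ]

    # Toy adversarial signal: accepted samples the target model fails on.
    hard = [s for s in accepted if not solver(s)]

    # Toy context update: steer the next round toward the hardest recent sample.
    new_context = context
    if hard:
        new_context = f"{context} | harder than: {hard[-1]}"
    return accepted, hard, new_context


def run_cads(
    context: str,
    generators: List[Generator],
    judges: List[Judge],
    solver: Solver,
    rounds: int = 3,
    quality_threshold: float = 0.5,
) -> List[Sample]:
    """Repeat the cycle and collect accepted samples without duplicates."""
    dataset: List[Sample] = []
    for _ in range(rounds):
        accepted, _hard, context = cads_round(
            context, generators, judges, solver, quality_threshold
        )
        for sample in accepted:
            if sample not in dataset:
                dataset.append(sample)
    return dataset
```

In this sketch the judges act as a quality filter and the solver supplies the difficulty signal; how the paper combines these two roles is not stated in the abstract.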
Source: arXiv: 2602.03300