R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

📄 Abstract - R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach to synthesize high-quality, diverse and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploring adversarial learning to synthesize challenging samples for effectively driving model improvement. Specifically, CADS operates with two cyclic phases, i.e., Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism to optimize the generation context to encourage challenging and high-value data generation. With CADS, we construct MMSynthetic-20K and train our model R1-SyntheticVL, which demonstrates superior performance on various benchmarks.

R1-SyntheticVL：来自生成模型的合成数据是否已为多模态大语言模型做好准备？ / R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

1️⃣ 一句话总结

这篇论文提出了一种名为“集体对抗数据合成”的新方法，它能自动生成高质量、多样化且具有挑战性的多模态训练数据，从而有效提升多模态大语言模型在复杂任务上的性能。

← 返回列表

菜单

AI 帮我研读全文

1️⃣ 一句话总结

密码管理

设置密码

修改密码

移除密码

菜单

AI 帮我研读全文

1️⃣ 一句话总结

获取最新论文摘要