arXiv submission date: 2026-03-11
📄 Abstract - Graph-GRPO: Training Graph Flow Models with Reinforcement Learning

Graph generation is a fundamental task with broad applications, such as drug discovery. Recently, discrete flow matching-based graph generation, a.k.a. the graph flow model (GFM), has emerged due to its superior performance and flexible sampling. However, effectively aligning GFMs with complex human preferences or task-specific objectives remains a significant challenge. In this paper, we propose Graph-GRPO, an online reinforcement learning (RL) framework for training GFMs under verifiable rewards. Our method makes two key contributions: (1) We derive an analytical expression for the transition probability of GFMs, replacing Monte Carlo sampling and enabling fully differentiable rollouts for RL training; (2) We propose a refinement strategy that randomly perturbs specific nodes and edges in a graph and regenerates them, allowing for localized exploration and self-improvement of generation quality. Extensive experiments on both synthetic and real datasets demonstrate the effectiveness of Graph-GRPO. With only 50 denoising steps, our method achieves 95.0% and 97.5% Valid-Unique-Novelty scores on the planar and tree datasets, respectively. Moreover, Graph-GRPO achieves state-of-the-art performance on molecular optimization tasks, outperforming graph-based and fragment-based RL methods as well as classic genetic algorithms.
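The refinement strategy described in contribution (2) can be illustrated with a minimal sketch. The paper does not publish this exact routine; the function `perturb_graph`, the mask value `-1`, and the `node_frac` parameter below are illustrative assumptions. The idea is to reset a random subset of nodes (and their incident edges) to a masked state, which the flow model would then regenerate, enabling localized exploration:

```python
import numpy as np

def perturb_graph(adj, node_frac=0.2, rng=None):
    """Hypothetical sketch of localized graph perturbation:
    pick a random fraction of nodes and reset all their incident
    edges to a MASK state (-1), leaving the rest of the graph intact.
    `adj` is a symmetric 0/1 adjacency matrix."""
    rng = np.random.default_rng(rng)
    n = adj.shape[0]
    k = max(1, int(node_frac * n))               # number of nodes to perturb
    picked = rng.choice(n, size=k, replace=False)
    out = adj.copy()
    out[picked, :] = -1                          # mask edges touching picked nodes
    out[:, picked] = -1
    return out, picked

# Example: a 5-node cycle graph
adj = np.zeros((5, 5), dtype=int)
for i in range(5):
    adj[i, (i + 1) % 5] = adj[(i + 1) % 5, i] = 1

noisy, picked = perturb_graph(adj, node_frac=0.4, rng=0)
```

In the actual method, the masked entries would be re-denoised by the GFM, and the regenerated graph scored under the verifiable reward for RL training.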

Top-level tags: machine learning, model training, agents
Detailed tags: graph generation, reinforcement learning, flow matching, molecular optimization, online training

Graph-GRPO:使用强化学习训练图流模型 / Graph-GRPO: Training Graph Flow Models with Reinforcement Learning


1️⃣ One-sentence summary

This paper proposes a new method called Graph-GRPO, which uses reinforcement learning to train graph generation models so that they better satisfy complex human preferences or task-specific objectives, achieving state-of-the-art performance on molecular optimization tasks in areas such as drug discovery.

From arXiv: 2603.10395