Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
Abstract
Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clippin…
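The abstract mentions that methods such as Flow-GRPO and CPS apply PPO-style ratio clipping to the denoising MDP. The sketch below shows only the standard PPO clipped surrogate, applied per denoising step, as background for that idea. It is not the Flow-DPPO method and not the exact Flow-GRPO or CPS implementation. The function name `ppo_clip_loss`, the per-step log-probability inputs, and the default `clip_eps=0.2` are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed default, a common PPO choice
) -> float:
    """Negative PPO clipped surrogate, averaged over transitions.

    Hypothetical helper: each entry is one (trajectory, denoising step)
    pair treated as one MDP transition. logp_new / logp_old are the
    log-probabilities of the sampled next latent under the current and
    old (behaviour) policies; advantages are per-transition estimates.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have equal length")
    if not advantages:
        raise ValueError("inputs must be non-empty")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)  # negate so that lower loss is better
```

With a ratio of 2 and a positive advantage, the clipped term caps the contribution at `1 + clip_eps`; with a negative advantage, the unclipped term is kept, so the objective never rewards moving further from the old policy.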