Paper | Hugging Face | Published: 2026-06-08 | HF ↑29

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Authors: Bowen Ping, Xiangxin Zhou, Penghui Qi, Minnan Luo, Liefeng Bo, and 1 other

Abstract

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clippin…
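The abstract says that Flow-GRPO and CPS treat each denoising step as an MDP action and apply PPO-style ratio clipping. The snippet below is a minimal sketch of that generic clipped surrogate, not of Flow-DPPO's own divergence-based objective, which the truncated abstract does not describe. It assumes each denoising step is a scalar Gaussian transition. The function names and the default `clip_eps=0.2` are hypothetical choices made for illustration.

```python
import math
from typing import List


def gaussian_log_prob(x: float, mean: float, std: float) -> float:
    """Log-density of a scalar Gaussian N(mean, std^2) evaluated at x.

    Assumption: a stochastic denoising step is modeled as a Gaussian
    transition, so its log-probability is available in closed form.
    """
    var = std * std
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def ppo_clip_objective(
    logp_new: List[float],
    logp_old: List[float],
    advantages: List[float],
    clip_eps: float = 0.2,  # hypothetical default clipping range
) -> float:
    """Mean PPO clipped surrogate over denoising steps (to be maximized).

    For each step: ratio = pi_new / pi_old, and the per-step term is
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A).
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)  # likelihood ratio from log-probs
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return total / len(advantages)


if __name__ == "__main__":
    # Toy trajectory of two denoising steps with hypothetical numbers:
    # the same sampled latents scored under old and new policy means.
    samples = [0.3, -0.1]
    old_means = [0.0, 0.0]
    new_means = [0.2, -0.2]
    std = 0.5
    lp_old = [gaussian_log_prob(x, m, std) for x, m in zip(samples, old_means)]
    lp_new = [gaussian_log_prob(x, m, std) for x, m in zip(samples, new_means)]
    # A single trajectory-level advantage shared by every step, as in GRPO-style setups.
    adv = [1.0, 1.0]
    print(ppo_clip_objective(lp_new, lp_old, adv))
```

Taking the minimum of the clipped and unclipped terms means the objective never rewards pushing the ratio beyond `1 ± clip_eps` in the direction favored by the advantage; this per-step ratio clipping is the mechanism the abstract attributes to prior methods.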

#rl #alignment

More articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning Text-to-Video Generation with 3D Constraints via Reinforcement Learning