Paper, arXiv. Published: 2026-05-12

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

Authors: Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao

Abstract

In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilities without an additional cold-start stage. Our approach unlocks the model’s intrinsic potential to…

#multimodal #llm #diffusion #agent #alignment
