Paper (arXiv), published: 2026-05-12

OmniNFT: Modality-wise Omni Diffusion Reinforcement for Joint Audio-Video Generation

Authors: Guohui Zhang, XiaoXiao Ma, Jie Huang, Hang Xu, Hu Yu, and 7 others

Abstract

Recent advances in joint audio-video generation have been remarkable, yet real-world applications demand strong per-modality fidelity, cross-modal alignment, and fine-grained synchronization. Reinforcement Learning (RL) offers a promising paradigm, but its extension to multi-objective and multi-moda…

#alignment #diffusion #rl #multimodal #fine-tuning
