Paper · Hugging Face · Published: 2026-05-31 · HF ↑20

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Authors: Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao, Pengfei Wan, and 2 others

Abstract

The recent “Reasoning with Video” paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to log…

#multimodal #benchmark
