Paper | Hugging Face | Published: 2026-05-31 | HF ↑20

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

Authors: Junhao Cheng, Liang Hou, Tianxiong Zhong, Xin Tao, Pengfei Wan, and 2 others

Abstract

The recent “Reasoning with Video” paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to log…

#multimodal #benchmark
