Paper Hugging Face Published: 2026-06-09 HF ↑15

InternVideo3: Agentify Foundation Models with Multimodal Contextual Reasoning

Authors: Ziang Yan, Sheng Xia, Jiashuo Yu, Yue Wu, Tianxiang Jiang, and 8 others

Abstract

Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long-horizon multimodal tasks underexplored. This gap is evident in video tasks requiring sustained temp…

#multimodal #agent #rl #fine-tuning #benchmark
