Paper deep dive | arXiv | Published: 2026-05-11

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Authors: Shijue Huang, Hangyu Guo, Chenxin Li, Junting Lu, Xinyu Geng, and 5 others

Abstract

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as…

#agent #multimodal #rl #fine-tuning #benchmark
