Paper · Hugging Face · Published: 2026-06-08 · HF ↑2

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

Authors: Yutong Bian, Dongjie Cheng, Heming Xia, Yongqi Li, Wenjie Li

Abstract

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex…

#multimodal #llm #benchmark
