World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning
Abstract
World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How…