Paper (arXiv), published: 2026-05-25

DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking

Authors: Matt L. Wiemann, Lindsay M. Smith, Peter Melchior, Siddharth Mishra-Sharma, Andrew Gordon Wilson, and 2 others

Abstract

Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose ph…

#llm #agent #benchmark
