Paper | Hugging Face | Published: 2026-06-10 | HF ↑3

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Authors: Yunhan Wang, Jiaan Wang, Lianzhe Huang, Xianfeng Zeng, Fandong Meng

Abstract

Search agents, which are large language models augmented with search tools, have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, which leaves them vulnerable to test-set contamination and parametric memorization. Consequently, models …

#agent #benchmark #llm
