Paper Deep Dive | Hugging Face | Published: 2026-05-31 | HF ↑41

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Authors: Nahyun Lee, Dongkeun Yoon, Guijin Son, Geewook Kim, Dayoon Ko, and 10 others

Abstract

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400…

#agent #benchmark #llm
