Paper Hugging Face Published: 2026-06-08 HF ↑13

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Authors: Liya Zhu, Jingzhe Ding, Jian Zhang, Jianbo Xue, Shihao Liang, and 52 others

Abstract

Recent years have witnessed the rapid evolution of AI agents toward handling increasingly complex, real-world tasks. However, existing benchmarks rarely evaluate whether agents can operate graphical user interfaces to complete long-horizon, high-value professional workflows across diverse domains. C…

#agent #benchmark
