Paper Hugging Face Published: 2026-05-06 HF ↑7

A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Authors: Dingwei Chen, Zefang Zong, Zhipeng Ma, Leo Luo, Yang Li, and 3 others

Abstract

Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn interactions. Existing approaches to such process credit assignment either depend…

#agent #benchmark #llm #rl
