Paper · Hugging Face · Published: 2026-05-10 · HF ↑10

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Authors: Chengsong Huang, Haolin Liu, Tong Zheng, Runpeng Dai, Langlin Huang, and 5 others

Abstract

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innova…

#llm #agent

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning