Paper | Hugging Face | Published: 2026-05-05 | HF ↑1

When to Think, When to Speak: Learning Disclosure Policies for LLM Reasoning

Authors: Jiaqi Wei, Xuehang Guo, Pengfei Yu, Xiang Zhang, Wanli Ouyang, and 3 others

Abstract

In single-stream autoregressive interfaces, the same tokens both update the model state and constitute an irreversible public commitment. This coupling creates a silence tax: additional deliberation postpones the first task-relevant content, while naive early streaming risks premature commitments th…

#llm #rl #benchmark
