How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs
Abstract
Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal…