Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention
Abstract
Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. De…
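To illustrate the fixed-size recurrent state and why editing it is delicate, here is a minimal NumPy sketch. It compares a plain additive linear-attention update with a standard delta-rule update. This is not the Gated DeltaNet-2 method. The function names, the unit-norm key, and the scalar decay `alpha` are assumptions made for illustration; `alpha` is a simple stand-in for a forget gate.

```python
import numpy as np

def linear_attention_step(S, k, v, q):
    """Additive linear attention: S <- S + v k^T, then read out o = S q."""
    S = S + np.outer(v, k)
    return S, S @ q

def delta_rule_step(S, k, v, q, beta, alpha=1.0):
    """Delta-rule update with an illustrative scalar decay alpha (assumption).

    Computes S <- alpha*S - beta*(alpha*S k - v) k^T, which equals
    alpha*S (I - beta k k^T) + beta v k^T: erase along k, then write v.
    """
    S = alpha * S                          # optional global decay (forget gate stand-in)
    pred = S @ k                           # value currently stored under key k
    S = S - beta * np.outer(pred - v, k)   # replace pred with v along direction k
    return S, S @ q

d_k, d_v = 4, 3
rng = np.random.default_rng(0)
k = rng.standard_normal(d_k)
k /= np.linalg.norm(k)                     # unit-norm key, so beta=1 fully overwrites
v_old = rng.standard_normal(d_v)
v_new = rng.standard_normal(d_v)

S_lin = np.zeros((d_v, d_k))               # state size depends only on d_v, d_k
S_del = np.zeros((d_v, d_k))
for v in (v_old, v_new):                   # write two values under the same key
    S_lin, o_lin = linear_attention_step(S_lin, k, v, q=k)
    S_del, o_del = delta_rule_step(S_del, k, v, q=k, beta=1.0)

print(S_lin.shape, S_del.shape)            # fixed (d_v, d_k) regardless of sequence length
print(np.allclose(o_lin, v_old + v_new))   # additive memory blends both values
print(np.allclose(o_del, v_new))           # delta rule replaced the old association
```

After two writes under the same key, the additive state returns the sum of both values, while the delta-rule state returns only the newer one. In both cases the state stays `(d_v, d_k)` no matter how many tokens are processed, which is what gives constant-memory decoding.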