Paper, published on arXiv: 2026-05-18

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Authors: Yuxiang Huang, Nuno M. T. Gonçalves, Federico Alvetreti, Lei Li, Xu Han, and 3 others

Abstract

Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any …
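
As a rough illustration of the select-then-attend pattern described above, the following Python sketch handles a single query: it scores each KV block coarsely, keeps a fixed number of blocks, and runs ordinary softmax attention over the tokens in those blocks only. It is a generic top-k block-sparse sketch under stated assumptions, not DashAttention itself and not the actual NSA or InfLLMv2 implementation. The function name `topk_block_attention`, the mean-key block score, and the parameters `block_size` and `k` are all hypothetical choices made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_block_attention(q, keys, values, block_size, k):
    """Single-query sketch of top-k block selection followed by softmax attention.

    q:      query vector (list of floats)
    keys:   list of key vectors, one per token
    values: list of value vectors, one per token
    Returns (output vector, sorted indices of the selected blocks).
    """
    n = len(keys)
    scale = 1.0 / math.sqrt(len(q))

    # Coarse stage: score each block by q . mean(keys in block).
    # ASSUMPTION: mean pooling is only one possible block representation.
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]
    coarse = []
    for blk in blocks:
        mean_key = [sum(keys[i][d] for i in blk) / len(blk) for d in range(len(q))]
        coarse.append(dot(q, mean_key) * scale)

    # Fixed-size selection: the k best-scoring blocks are kept regardless of
    # how many blocks are actually relevant to this query.
    chosen = sorted(range(len(blocks)), key=lambda b: coarse[b], reverse=True)[:k]
    token_ids = sorted(i for b in chosen for i in blocks[b])

    # Fine stage: standard scaled dot-product softmax over the selected tokens.
    weights = softmax([dot(q, keys[i]) * scale for i in token_ids])
    dim_v = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, token_ids))
           for d in range(dim_v)]
    return out, sorted(chosen)
```

In this sketch `k` is a constant chosen ahead of time, so every query attends to the same number of blocks whether it needs fewer or more; setting `k` to the total number of blocks recovers dense softmax attention.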

#llm
