Paper, published on arXiv: 2026-06-04

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Authors: Yutao Sun, Yanqi Zhang, Li Dong, Jianyong Wang, Furu Wei

Abstract

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse m…

#coding #llm #benchmark

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning