Paper Deep dive Hugging Face 2026-06-07 HF ↑33
Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. ...
#llm #coding #benchmark
Paper Deep dive Hugging Face 2026-06-07 HF ↑11
Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions an...
#diffusion #coding
Paper Deep dive Hugging Face 2026-06-07 HF ↑34
Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interacti...
#agent #multimodal #benchmark #llm
Paper Hugging Face 2026-06-07 HF ↑6
Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, m...
#agent
Paper Hugging Face 2026-06-07 HF ↑27
We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera le...
#benchmark #diffusion
Paper Hugging Face 2026-06-07 HF ↑12
World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forc...
#robotics #diffusion #coding
Paper Hugging Face 2026-06-07 HF ↑37
Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel ...
#diffusion #coding
Paper Deep dive Hugging Face 2026-06-08 HF ↑7
Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given promp...
#llm #rl #coding #benchmark
Paper Hugging Face 2026-06-07 HF ↑15
Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent class...
#agent #multimodal #benchmark #llm
Paper Hugging Face 2026-06-08 HF ↑2
Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex...
#multimodal #llm #benchmark
Paper Hugging Face 2026-06-08 HF ↑1
Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are...
#agent #benchmark
Model DeepMind 2026-06-09
Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet....
#speech
Model DeepMind 2026-06-09
Introducing Gemma 4 12B: a unified, encoder-free multimodal model...
#multimodal
Model OpenAI 2026-06-09
How engineers at Nextdoor use Codex with GPT-5.5 to investigate hard-to-reproduce issues, build across platforms, and focus on product outcomes....
Company news OpenAI 2026-06-09
How Notion uses Codex to one-shot specs, build AI Voice Input for the web, and multiply engineering power across small teams....
#speech
Company news OpenAI 2026-06-08
OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action....
Company news OpenAI 2026-06-08
A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone....
#alignment
Model OpenAI 2026-06-08
OpenAI launches the Economic Research Exchange to study AI’s impact on jobs, productivity, and the economy. Applications are now open for selected research projects....