2026-06-09

18 items

Paper deep dive Hugging Face 2026-06-07 HF ↑33

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. ...

#llm #coding #benchmark

Paper deep dive Hugging Face 2026-06-07 HF ↑11

SwiftVR: Real-Time One-Step Generative Video Restoration

Real-time video restoration (VR) for live streams requires high-resolution outputs under strict per-frame latency constraints. Existing one-step diffusion-based VR models remain difficult to deploy on consumer-grade GPUs due to two main bottlenecks: quadratic spatial attention at high resolutions an...

#diffusion #coding

Paper deep dive Hugging Face 2026-06-07 HF ↑34

SpatialWorld: Benchmarking Interactive Spatial Reasoning of Multimodal Agents in Real-World Tasks

Spatial reasoning is a foundational capability for multimodal large language models (MLLMs) to perceive and operate within the physical world. However, existing benchmarks predominantly rely on passive evaluation (e.g., static VQA) or simulator-specific pipelines, failing to assess general interacti...

#agent #multimodal #benchmark #llm

Paper Hugging Face 2026-06-07 HF ↑6

End-to-End Context Compression at Scale

Long-context language model inference is bottlenecked by memory, as the KV cache grows with context length. Recent techniques to compress the KV cache fall short: they either degrade model quality substantially or require considerable time and compute to compress a single long prompt. Furthermore, m...

#agent

Paper Hugging Face 2026-06-07 HF ↑27

Echo-Memory: A Controlled Study of Memory in Action World Models

We present Echo-Memory, a controlled study of memory mechanisms in action-conditioned world models. These models generate multi-segment videos from a first frame, text prompt, and camera-action sequence, but their central failure is often memory rather than local image synthesis: after the camera le...

#benchmark #diffusion

Paper Hugging Face 2026-06-07 HF ↑12

AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing

World-action models have emerged as a promising paradigm for robot manipulation, jointly modeling visual scene dynamics and actions to inject physical priors into policy learning. However, existing world-action models couple world prediction and action execution at the same temporal resolution, forc...

#robotics #diffusion #coding

Paper Hugging Face 2026-06-07 HF ↑37

Latent Spatial Memory for Video World Models

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel ...

#diffusion #coding

Paper deep dive Hugging Face 2026-06-08 HF ↑7

Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short

Reinforcement learning with verifiable rewards (RLVR) has become a leading paradigm for improving the reasoning ability of large language models through outcome-based supervision. However, verifiable rewards frequently become uninformative at the group level: when all sampled traces of a given promp...

#llm #rl #coding #benchmark

Paper Hugging Face 2026-06-07 HF ↑15

OmniGameArena: A Unified UE5 Benchmark for VLM Game Agents with Improvement Dynamics

Vision-language model (VLM) agents are increasingly deployed in interactive game environments. Yet game benchmarks for VLM agents typically report a single first-attempt score per (agent, game) pair, focus on single-agent Solo play, and lack unified protocols for evaluating heterogeneous agent class...

#agent #multimodal #benchmark #llm

Paper Hugging Face 2026-06-08 HF ↑2

Optical Reasoning: Rethinking Images as an Expressive Reasoning Medium Beyond Text

Chain-of-Thought (CoT) improves the performance of Large Language Models (LLMs) and has been extended to Multimodal Large Language Models (MLLMs). More recent work further moves from text-based multimodal reasoning toward interleaved-modal reasoning, where intermediate steps can incorporate both tex...

#multimodal #llm #benchmark

Paper Hugging Face 2026-06-08 HF ↑1

Experience Makes Skillful: Enabling Generalizable Medical Agent Reasoning via Self-Evolving Skill Memory

Medical agent systems are increasingly expected to support interactive clinical decision making rather than only static question answering. In such settings, effective agents must reuse prior experience across evolving cases, yet existing memory mechanisms often retain raw historical traces that are...

#agent #benchmark

Model DeepMind 2026-06-09

Fluid, natural voice translation with Gemini 3.5 Live Translate

Gemini 3.5 Live Translate brings near real-time, natural speech translation to Google AI Studio, Google Translate and Google Meet....

#speech

Model DeepMind 2026-06-09

Introducing Gemma 4 12B: a unified, encoder-free multimodal model

#multimodal

Model OpenAI 2026-06-09

How engineers at Nextdoor use Codex to build without limits

How engineers at Nextdoor use Codex with GPT-5.5 to investigate hard-to-reproduce issues, build across platforms, and focus on product outcomes....

Company news OpenAI 2026-06-09

What Codex unlocks for Notion

How Notion uses Codex to one-shot specs, build AI Voice Input for the web, and multiply engineering power across small teams....

#speech

Company news OpenAI 2026-06-08

Confidential submission of draft S-1 to the SEC

OpenAI confirms a confidential S-1 submission to the SEC and has not yet determined timing for further action....

Company news OpenAI 2026-06-08

Built to benefit everyone: our plan

A vision for the future of AI, focusing on access, safety, and shared prosperity as OpenAI works to ensure AGI benefits everyone....

#alignment

Model OpenAI 2026-06-08

Introducing the OpenAI Economic Research Exchange

OpenAI launches the Economic Research Exchange to study AI’s impact on jobs, productivity, and the economy. Applications are now open for selected research projects....