Paper · Hugging Face · Published: 2026-06-07 · HF ↑37

Latent Spatial Memory for Video World Models

Authors: Weijie Wang, Haoyu Zhao, Yifan Yang, Feng Chen, Zeyu Zhang, and 5 others

Abstract

Video world models that maintain 3D spatial consistency across generated frames typically rely on explicit point cloud memory constructed in RGB space. This design is both computationally expensive, requiring repeated rendering and VAE encoding, and inherently lossy, as the round trip through pixel …

#diffusion #coding

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning