2026-06-03

17 items

Paper deep dive Hugging Face 2026-06-02 HF ↑21

World Models Meet Language Models: On the Complementarity of Concrete and Abstract Reasoning

World models and multimodal large language models (MLLMs) provide complementary capabilities for predicting future outcomes from static visual observations. World models can generate concrete visual rollouts of possible futures, while MLLMs can reason abstractly over questions, goals, and rules. How...

#llm #multimodal #benchmark

Paper deep dive Hugging Face 2026-06-01 HF ↑10

Small RL Controller, Large Language Model: RL-Guided Adaptive Sampling for Test-Time Scaling

Test-time scaling improves the reasoning performance of large language models but incurs substantial cost in both total computation and latency. Existing adaptive sampling methods partially mitigate this issue by dynamically deciding when to stop sampling, yet they typically rely on heuristic rules ...

#llm #rl

Paper deep dive Hugging Face 2026-06-01 HF ↑10

Language Models Need Sleep: Learning to Self-Modify and Consolidate Memories

The past few decades have witnessed significant advances in the design of machine learning algorithms, from early studies on task-specific shallow models to more general deep Large Language Models (LLMs). Despite showing promising results in tasks that require instant prediction or in-context learni...

#llm #rl

Paper deep dive Hugging Face 2026-06-01 HF ↑28

Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking

We introduce Humanoid-GPT, a GPT-style Transformer with causal attention trained on a billion-scale motion corpus for whole-body control. Unlike prior shallow MLP trackers constrained by scarce data and an agility-generalization trade-off, Humanoid-GPT is pre-trained on a 2B-frame retargeted corpus ...

Paper Hugging Face 2026-06-01 HF ↑6

NVIDIA OmniDreams: Real-Time Generative World Model for Closed-Loop Autonomous Vehicle Simulation

As autonomous vehicle capabilities advance, the safe evaluation of driving policies in long-tail scenarios remains a critical bottleneck. In closed-loop simulation, the driving policy model actively interacts with the environment, where its actions dynamically update the simulator state and directly...

#diffusion #agent #benchmark

Paper Hugging Face 2026-06-01 HF ↑4

Benchmarking Visual State Tracking in Multimodal Video Understanding

Understanding a video requires more than recognizing isolated moments, as humans continuously track entities, states, and events over time. This capacity for visual state tracking is fundamental to video understanding, yet remains underexplored in current evaluations of Multimodal Large Language Mod...

#llm #benchmark #agent #multimodal #coding

Company news OpenAI 2026-06-03

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society....

#alignment

Tool OpenAI 2026-06-03

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security....

#alignment

Paper Hugging Face 2026-06-02 HF ↑6

Value-Aware Stochastic KV Cache Eviction for Reasoning Models

Reasoning models improve accuracy through extended chains of thought, but their long outputs create a memory and compute bottleneck. KV cache eviction methods reduce this cost by evicting unimportant key-value pairs from the cache, yet they often yield worse accuracy than selection-based sparse atte...

Paper Hugging Face 2026-06-01 HF ↑5

PaddleOCR-VL-1.6: Expanding the Frontier of Document Parsing with Under-Optimized Region Refinement and Progressive Post-Training

We introduce PaddleOCR-VL-1.6, an upgraded compact document parsing model built upon PaddleOCR-VL-1.5. Although PaddleOCR-VL-1.5 establishes a strong 0.9B baseline, its remaining errors concentrate in under-optimized regions where model behavior is unstable, data coverage is sparse, or supervision i...

#rl #multimodal

Paper Hugging Face 2026-06-01 HF ↑2

Bootstrap Your Generator: Unpaired Visual Editing with Flow Matching

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is prohibitively expensive. We propose Bootstrap You...

#benchmark

Paper Hugging Face 2026-06-01 HF ↑1

AURA: Action-Gated Memory for Robot Policies at Constant VRAM

The KV-cache is the right memory for datacenters but the wrong memory for robots. Datacenter inference batches many short requests and resets them, amortizing an attention cache across a crowd. Embodied agents instead run one long, non-resetting episode on bandwidth-limited edge hardware, where high...

#robotics #agent #benchmark

Model OpenAI 2026-06-03

Introducing new capabilities to GPT-Rosalind

GPT-Rosalind advances life sciences research with enhanced biological reasoning, medicinal chemistry expertise, genomics analysis, and experimental workflow capabilities....

Model OpenAI 2026-06-03

How Wasmer used Codex to build a Node.js runtime for the edge

See how Wasmer used Codex with GPT-5.5 to build a Node.js runtime for the edge, accelerating development 10x to 20x and shipping in weeks instead of months....

Company news NVIDIA 2026-06-03

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI

At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers speed the development of autonomous vehicles, robots and vision AI systems. The core challenge in physical AI research isn’t simply developing stronger models. It’s building a full workflow around them — r...

#agent #robotics #benchmark

Company news OpenAI 2026-06-02

Travelers deploys AI-powered claims countrywide with OpenAI

Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand....

Company news OpenAI 2026-06-02

Advancing youth safety and opportunity through global leadership

OpenAI calls for global action on youth AI safety, proposing an international institute to strengthen safeguards, standards, and opportunities for young people....

#alignment