2026-05-19

20 items

Paper Deep Dive Hugging Face 2026-05-17 HF ↑43

AI for Auto-Research: Roadmap & User Guide

AI-assisted research is crossing a threshold: fully automated systems can now generate research papers for as little as $15, while long-horizon agents can execute experiments, draft manuscripts, and simulate critique with minimal human input. Yet this productivity frontier exposes a deeper integrity...

#agent #benchmark #llm #coding

Paper Deep Dive Hugging Face 2026-05-17 HF ↑42

SkillsVote: Lifecycle Governance of Agent Skills from Collection, Recommendation to Evolution

Long-horizon LLM agents leave traces that could become reusable experience, but raw trajectories are noisy and hard to govern. We treat Agent Skills as an experience schema that couples executable scripts with non-executable guidance on procedures. Yet open skill ecosystems contain redundant, uneve...

#agent #llm #benchmark

Paper Hugging Face 2026-05-17 HF ↑32

Code-as-Room: Generating 3D Rooms from Top-Down View Images via Agentic Code Synthesis

Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, te...

#agent #llm #benchmark #robotics

Paper Deep Dive Hugging Face 2026-05-17 HF ↑57

Lance: Unified Multimodal Modeling by Multi-Task Synergy

We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a practical paradigm for unified multimodal modeling via collabor...

#multimodal #alignment #coding

Paper Hugging Face 2026-05-17 HF ↑4

AtlasVA: Self-Evolving Visual Skill Memory for Teacher-Free VLM Agents

Vision-language model (VLM) agents increasingly rely on memory-augmented reinforcement learning to reuse experience across long-horizon tasks, yet most existing frameworks store memory as text and depend on proprietary teacher models to summarize or refine it. This design is poorly matched to spatia...

#multimodal #agent #rl #llm #robotics

Paper Hugging Face 2026-05-17 HF ↑11

StableVLA: Towards Robust Vision-Language-Action Models without Extra Data

It is infeasible to encompass all possible disturbances within the training dataset. This raises a critical question regarding the robustness of Vision-Language-Action (VLA) models when encountering unseen real-world visual disturbances, particularly under imperfect visual conditions. In this work, ...

Paper Hugging Face 2026-05-17 HF ↑83

LongLive-2.0: An NVFP4 Parallel Infrastructure for Long Video Generation

We present LongLive-2.0, an NVFP4-based parallel infrastructure throughout the full training and inference workflow of long video generation, addressing speed and memory bottlenecks. For training, we introduce sequence-parallel autoregressive (AR) training, instantiated as Balanced SP, which co-desi...

#diffusion #coding #benchmark

Company News OpenAI 2026-05-19

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media....

Paper Hugging Face 2026-05-17 HF ↑1

MementoGUI: Learning Agentic Multimodal Memory Control for Long-Horizon GUI Agents

Recent GUI agents have made substantial progress in visual grounding and action prediction, yet they remain brittle in long-horizon tasks that require maintaining task state across many interface transitions. Existing agents typically rely on raw history replay or text-only memory, which either over...

#agent #llm #benchmark #multimodal #fine-tuning

Paper Hugging Face 2026-05-17 HF ↑2

SafeDiffusion-R1: Online Reward Steering for Safe Diffusion Post-Training

Diffusion models have been widely studied for removing unsafe content learned during pre-training. Existing methods require expensive supervised data, either unsafe-text paired with safe-image groundtruth or negative/positive image pairs, making them impractical to scale. Furthermore, offline reinfo...

#diffusion #rl #fine-tuning #alignment #benchmark

Paper Hugging Face 2026-05-17 HF ↑1

Code as Agent Harness

Recent large language models (LLMs) have demonstrated strong capabilities in understanding and generating code, from competitive programming to repository-level software engineering. In emerging agentic systems, code is no longer only a target output. It increasingly serves as an operational substra...

#agent #llm #multimodal #alignment #coding

Company News OpenAI 2026-05-18

OpenAI and Dell partner to bring Codex to hybrid and on-premise enterprise environments

OpenAI and Dell partner to bring Codex to hybrid and on-premise environments, helping enterprises deploy AI coding agents securely across data and workflows....

#agent #coding

Company News NVIDIA 2026-05-18

Vera Arrives: NVIDIA’s First CPU Built for Agents Lands at Top AI Labs

The first NVIDIA Vera CPUs arrived at three of the world's leading AI labs on Friday — Anthropic in San Francisco, OpenAI in Mission Bay, SpaceXAI in Palo Alto — followed by a delivery to Oracle Cloud Infrastructure in Santa Clara on Monday. NVIDIA Vice President of Hyperscale and High-Performance C...

#agent

Paper Deep Dive arXiv 2026-05-18

ESI-Bench: Towards Embodied Spatial Intelligence that Closes the Perception-Action Loop

Spatial intelligence unfolds through a perception-action loop: agents act to acquire observations, and reason about how observations vary as a function of action. Rather than passively processing what is seen, they actively uncover what is unseen - occluded structure, dynamics, containment, and func...

#agent #robotics #llm #benchmark

Company News NVIDIA 2026-05-19

NVIDIA and Google Cloud Empower the Next Wave of AI Builders

At this year’s Google I/O conference, NVIDIA and Google Cloud are accelerating the work of more than 100,000 developers in the companies’ joint developer community, which provides curated learning paths, hands-on labs and events that help them build using the full-stack NVIDIA AI platform on Google ...

Company News Google Research 2026-05-19

Empirical Research Assistance (ERA): From Nature publication to catalyzing Computational Discovery

General Science...

Paper arXiv 2026-05-18

DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of relevant tokens for any ...

#llm

Paper arXiv 2026-05-18

Code as Agent Harness

#agent #llm #multimodal #alignment #coding

Paper arXiv 2026-05-18

Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation

Multimodal Large Language Models (MLLMs) still struggle with fine-grained visual understanding, where answers often depend on small but decisive evidence in the full image. We observe a regional-to-global perception gap: the same MLLM answers fine-grained questions more accurately when conditioned o...

#llm #multimodal #agent #benchmark

Paper arXiv 2026-05-18

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Evaluating embodied systems on real dexterous hardware requires more than isolated primitive skills: an agent must perceive a changing tabletop scene, choose a context-appropriate action, execute it with a dexterous hand, and leave the scene usable for later decisions. We introduce DexHoldem, a real...

#agent #robotics #benchmark