2026-06-02

18件

論文深掘り Hugging Face 2026-05-31 HF ↑9

OpenWebRL: Demystifying Online Multi-turn Reinforcement Learning for Visual Web Agents

Building capable visual web agents requires long-horizon reasoning, precise grounding, and robust interaction with dynamic real-world websites. Despite rapid progress, the strongest systems remain largely proprietary, while open agents still depend heavily on supervised post-training over large coll...

#agent#benchmark#rl#multimodal

論文深掘り Hugging Face 2026-05-31 HF ↑41

K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts

Frontier model evaluations are shifting from foundational capabilities (e.g., instruction following and reasoning) toward compositional, agentic ones, but Korean agentic benchmarks remain scarce. We introduce K-BrowseComp, a web-browsing agent benchmark grounded in Korean contexts, consisting of 400...

#agent#benchmark#llm

論文深掘り Hugging Face 2026-05-31 HF ↑12

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

The Model Context Protocol (MCP) has emerged as a transformative standard for connecting large language models (LLMs) with external data sources and tools, and has been rapidly adopted across personal applications and development platforms. However, existing benchmarks predominantly focus on generic...

#benchmark#agent#llm

論文 Hugging Face 2026-05-31 HF ↑20

X-Stream: Exploring MLLMs as Multiplexers for Multi-Stream Understanding

While video streaming understanding has made significant strides, real-world applications, such as live sports broadcasting, autonomous driving, and multi-screen collaboration, inherently demand continuous, multi-stream interactions. However, existing benchmarks are confined to single-stream paradig...

#llm#benchmark#agent

論文 Hugging Face 2026-05-31 HF ↑11

Joint Agent Memory and Exploration Learning via Novelty Signals

In open-ended environments, exploration is fundamental for autonomous agents, yet current language model agents struggle with this. Effective exploration requires memory, but retaining raw interaction histories is computationally expensive over long trajectories. While latent memory offers a solutio...

#agent#llm#benchmark

論文 Hugging Face 2026-05-31 HF ↑9

LongLive-RAG: A General Retrieval-Augmented Framework for Long Video Generation

Autoregressive (AR) video diffusion enables variable-length synthesis, but long-horizon generation often suffers from accumulated errors and identity drift. For efficiency, existing methods commonly adopt sliding-window attention during generation. This creates an irreversible generation trajectory:...

#benchmark#rag#diffusion

論文 Hugging Face 2026-05-31 HF ↑20

VLMs are Good Teachers for Video Reasoning via Adaptive Test-Time Optimization

The recent "Reasoning with Video" paradigm utilizes Video Generation Models (VGMs) to generate temporally coherent visual trajectories to complete reasoning tasks. Although state-of-the-art VGMs excel at visual quality, they often struggle to understand and follow task-specific rules, leading to log...

#multimodal#benchmark

論文 Hugging Face 2026-05-31 HF ↑53

On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters

Parameter-efficient fine-tuning (PEFT) is usually treated as a cheaper alternative to full fine-tuning. We study a broader role: small trainable adapters as persistent local state on top of strong shared foundation models. In this framing, the base model provides shared competence while adapters car...

#fine-tuning#benchmark

企業動向 OpenAI 2026-06-02

Travelers deploys AI-powered claims countrywide with OpenAI

Travelers built an AI-powered Claim Assistant with OpenAI to guide customers through filing claims, provide 24/7 support, and scale operations during peak demand....

企業動向 OpenAI 2026-06-02

Advancing youth safety and opportunity through global leadership

OpenAI calls for global action on youth AI safety, proposing an international institute to strengthen safeguards, standards, and opportunities for young people....

#alignment

企業動向 OpenAI 2026-06-02

Codex for every role, tool, and workflow

Discover new Codex plugins, sites, and annotations that help analysts, marketers, designers, investors, and other teams get more done with AI....

企業動向 OpenAI 2026-06-02

Codex is becoming a productivity tool for everyone

The Next Era of Knowledge Work report explores how Codex is transforming productivity through AI-powered research, data analysis, workflow automation, and content creation....

論文深掘り arXiv 2026-06-01

AdaCodec: A Predictive Visual Code for Video MLLMs

Video is temporally redundant: adjacent frames usually share most objects, background, and layout. Yet existing video multimodal large language models (video MLLMs) usually encode each sampled frame as an independent RGB image, causing visual tokens to repeat content already present in earlier frame...

#llm#benchmark#multimodal

論文深掘り arXiv 2026-06-01

PaSBench-Video: A Streaming Video Benchmark for Proactive Safety Warning

Between the first visible sign of danger and the moment an accident occurs, there is often a window where intervention remains possible. Video-capable multimodal large language models (MLLMs) could serve as always-on safety monitors that issue warnings during this window. Yet current benchmarks do n...

#benchmark#llm#alignment#multimodal

企業動向 NVIDIA 2026-06-02

NVIDIA Partners With Microsoft on Unified Stack for Agentic AI Deployment, From Windows Devices to Cloud to Local

The agentic AI moment has arrived, but delivering on its promise requires more than good models. It also takes fast hardware, secure runtimes, a responsive data layer and models tuned for long-running reasoning. NVIDIA and Microsoft are bringing that full stack to developers across Windows devices, ...

#agent

企業動向 NVIDIA 2026-06-02

NVIDIA Jetson Brings Agentic AI to the Physical World

Agentic AI is getting physical. At COMPUTEX on Tuesday, NVIDIA announced NVIDIA JetPack 7.2 and NVIDIA NemoClaw support on NVIDIA Jetson. JetPack 7.2 brings agentic AI skills, Yocto project support, NVIDIA CUDA 13 on NVIDIA Jetson Orin, a substantial performance gain on Jetson AGX Orin 32GB module a...

#agent

企業動向 OpenAI 2026-06-01

Building the infrastructure for the Intelligence Age in Michigan

OpenAI breaks ground on a 1GW data center project in Michigan as part of Stargate, building AI infrastructure to expand access, create jobs, and support communities....

企業動向 OpenAI 2026-06-01

OpenAI frontier models and Codex are now available on AWS

OpenAI frontier models and Codex are now generally available on AWS, giving enterprises a new path to build with OpenAI through the AWS environments, controls, and procurement workflows they already use. Customers can get started with OpenAI on AWS and move faster from evaluation to production....

#benchmark