2026-06-04

19 items

Paper deep dive Hugging Face 2026-06-02 HF ↑27

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac...

#rl #llm #agent

Paper deep dive Hugging Face 2026-06-02 HF ↑20

Streaming Communication in Multi-Agent Reasoning

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent ag...

#agent #llm #benchmark

Paper Hugging Face 2026-06-02 HF ↑10

MapAgent: An Industrial-Grade Agentic Framework for City-scale Lane-level Map Generation

Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directl...

#agent

Paper Hugging Face 2026-06-02 HF ↑20

Echo-Infinity: Learning Evolving Memory for Real-Time Infinite Video Generation

We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules,...

#diffusion

Paper Hugging Face 2026-06-02 HF ↑3

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all tokens, or use costly process reward models (PRMs) for step-level supervisi...

#rl #llm #alignment #benchmark

Paper deep dive Hugging Face 2026-06-02 HF ↑23

M^3Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret...

#benchmark

Paper Hugging Face 2026-06-02 HF ↑40

Audio Interaction Model

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-deci...

#speech #benchmark

Paper arXiv 2026-06-03

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry withou...

#agent #llm #benchmark

Paper Hugging Face 2026-06-02 HF ↑5

Stateful Visual Encoders for Vision-Language Models

Vision-language models (VLMs) are increasingly used in multi-image, multi-turn agentic settings where decisions depend on visual changes. However, in existing open-weight VLMs, visual comparisons happen only inside the language model, while the visual encoder itself remains stateless: each image is ...

#multimodal #agent #fine-tuning

Company news OpenAI 2026-06-04

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise....

#agent

Company news OpenAI 2026-06-04

Dreaming: Better memory for a more helpful ChatGPT

ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations....

Company news OpenAI 2026-06-04

Biodefense in the Intelligence Age

An action plan for AI-powered biological resilience...

Company news OpenAI 2026-06-03

OpenAI public policy agenda

OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society....

#alignment

Tools OpenAI 2026-06-03

A blueprint for democratic governance of frontier AI

OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security....

#alignment

Company news Hugging Face 2026-06-04

Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI

#alignment #multimodal

Company news Hugging Face 2026-06-04

EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios

Tools Hugging Face 2026-06-04

Designing the hf CLI as an agent-optimized way to work with the Hub

#agent

Company news NVIDIA 2026-06-04

Forecast: Fun Ahead — 18 Games Join in June to Stream on GeForce NOW

June’s forecast with GeForce NOW: 100% chance of gaming. GeForce NOW is lining up new adventures for the month, from big-name blockbusters to quirky indies ready for the spotlight. Members can dive into fresh worlds, squad up in new playlists and discover “just one more run” favorites — all streamin...

Company news Google Research 2026-06-04

Towards passive heart health monitoring via smartphone camera

Health & Bioscience...