Paper Deep Dive Hugging Face 2026-06-02 HF ↑27
Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hac...
#rl #llm #agent
Paper Deep Dive Hugging Face 2026-06-02 HF ↑20
Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent ag...
#agent #llm #benchmark
Paper Hugging Face 2026-06-02 HF ↑10
Lane-level maps are critical infrastructure for autonomous driving and lane-level navigation, yet constructing and maintaining standardized lane networks for hundreds of cities remains highly labor-intensive. Recent end-to-end vectorized mapping methods can predict lane geometry and topology directl...
#agent
Paper Hugging Face 2026-06-02 HF ↑20
We present Echo Infinity, an autoregressive (AR) framework towards real-time infinite video generation that employs a learnable evolving memory to dynamically filter, abstract, and compress any-length history at constant cost. Existing methods mainly curate memory with predefined KV-cache schedules,...
#diffusion
Paper Hugging Face 2026-06-02 HF ↑3
Reinforcement learning with verifiable rewards (e.g. GRPO) is now a common way to improve mathematical reasoning in Large Language Models (LLMs). However, current methods usually broadcast one sequence-level advantage to all tokens, or use costly process reward models (PRMs) for step-level supervisi...
#rl #llm #alignment #benchmark
Paper Deep Dive Hugging Face 2026-06-02 HF ↑23
As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models ret...
#benchmark
Paper Hugging Face 2026-06-02 HF ↑40
Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-deci...
#speech #benchmark
Paper arXiv 2026-06-03
When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry withou...
#agent #llm #benchmark
Paper Hugging Face 2026-06-02 HF ↑5
Vision-language models (VLMs) are increasingly used in multi-image, multi-turn agentic settings where decisions depend on visual changes. However, in existing open-weight VLMs, visual comparisons happen only inside the language model, while the visual encoder itself remains stateless: each image is ...
#multimodal #agent #fine-tuning
Company News OpenAI 2026-06-04
Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise....
#agent
Company News OpenAI 2026-06-04
ChatGPT introduces a new memory system to better remember preferences, keeping context fresh and relevant across conversations....
Company News OpenAI 2026-06-04
An action plan for AI-powered biological resilience...
Company News OpenAI 2026-06-03
OpenAI outlines its public policy agenda for AI, including safety, youth protection, workforce transition, and global standards to ensure AI benefits society....
#alignment
Tool OpenAI 2026-06-03
OpenAI outlines a blueprint for U.S. governance of frontier AI, proposing a federal framework for safety, resilience, and national security....
#alignment
Company News Hugging Face 2026-06-04
Nemotron 3.5 Content Safety: Customizable Multimodal Safety for Global Enterprise AI...
#alignment #multimodal
Company News Hugging Face 2026-06-04
EVA-Bench Data 2.0: 3 Domains, 121 Tools, 213 Scenarios...
Tool Hugging Face 2026-06-04
Designing the hf CLI as an agent-optimized way to work with the Hub...
#agent
Company News NVIDIA 2026-06-04
June’s forecast with GeForce NOW: 100% chance of gaming. GeForce NOW is lining up new adventures for the month, from big-name blockbusters to quirky indies ready for the spotlight. Members can dive into fresh worlds, squad up in new playlists and discover “just one more run” favorites — all streamin...
Company News Google Research 2026-06-04
Health & Bioscience...