2026-05-08

19 items

Paper Deep Dive Hugging Face 2026-05-06 HF ↑31

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice deals with multiple re...

#diffusion #fine-tuning #rl #coding #benchmark
Paper Deep Dive Hugging Face 2026-05-06 HF ↑18

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the "zero-advantage problem": when all sampled roll...

#llm #rl
Paper Deep Dive Hugging Face 2026-05-06 HF ↑35

MiA-Signature: Approximating Global Activation for Long-Context Understanding

A growing body of work in cognitive science suggests that reportable conscious access is associated with global ignition over distributed memory systems, while such activation is only partially accessible as individuals cannot directly access or enumerate all activated contents. This tension suggest...

#llm #agent #rag
Paper Hugging Face 2026-05-06 HF ↑10

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize th...

#agent #rl
Paper Hugging Face 2026-05-06 HF ↑7

A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn interactions. Existing approaches to such process credit assignment either depend...

#agent #benchmark #llm #rl
Paper Hugging Face 2026-05-06 HF ↑27

When to Trust Imagination: Adaptive Action Execution for World Action Models

World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to wh...

#robotics #benchmark
Paper Hugging Face 2026-05-06 HF ↑17

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency Distillation are two representative paradigms. While consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it towa...

#diffusion #alignment #vision
Paper Hugging Face 2026-05-06 HF ↑4

SkillOS: Learning Skill Curation for Self-Evolving Agents

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key...

#agent #llm #rl #benchmark
Company News OpenAI 2026-05-08

Running Codex safely at OpenAI

How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption....

#agent #coding
Company News OpenAI 2026-05-07

Parloa builds service agents customers want to talk to

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions....

#agent #speech
Company News OpenAI 2026-05-07

Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control....

Paper Deep Dive arXiv 2026-05-07

PianoCoRe: Combined and Refined Piano MIDI Dataset

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents P...

#alignment #benchmark