2026-05-08

19件

論文深掘り Hugging Face 2026-05-06 HF ↑31

MARBLE: Multi-Aspect Reward Balance for Diffusion RL

Reinforcement learning fine-tuning has become the dominant approach for aligning diffusion models with human preferences. However, assessing images is intrinsically a multi-dimensional task, and multiple evaluation criteria need to be optimized simultaneously. Existing practice deal with multiple re...

#diffusion#fine-tuning#rl#coding#benchmark

論文深掘り Hugging Face 2026-05-06 HF ↑18

Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration

Reinforcement learning with verifiable rewards, particularly Group Relative Policy Optimization (GRPO), has significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, in complex tasks, GRPO frequently suffers from the ``zero-advantage problem'': when all sampled roll...

#llm#rl

論文 Hugging Face 2026-05-06 HF ↑2

The Granularity Axis: A Micro-to-Macro Latent Direction for Social Roles in Language Models

Large language models (LLMs) are routinely prompted to take on social roles ranging from individuals to institutions, yet it remains unclear whether their internal representations encode the granularity of such roles, from micro-level individual experience to macro-level organizational, institutiona...

#llm

論文深掘り Hugging Face 2026-05-06 HF ↑35

MiA-Signature: Approximating Global Activation for Long-Context Understanding

A growing body of work in cognitive science suggests that reportable conscious access is associated with global ignition over distributed memory systems, while such activation is only partially accessible as individuals cannot directly access or enumerate all activated contents. This tension suggest...

#llm#agent#rag

論文 Hugging Face 2026-05-06 HF ↑10

Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning

A persistent skill library allows language model agents to reuse successful strategies across tasks. Maintaining such a library requires three coupled capabilities. The agent selects a relevant skill, utilizes it during execution, and distills new skills from experience. Existing methods optimize th...

#agent#rl

論文 Hugging Face 2026-05-06 HF ↑7

A^2TGPO: Agentic Turn-Group Policy Optimization with Adaptive Turn-level Clipping

Reinforcement learning for agentic large language models (LLMs) typically relies on a sparse, trajectory-level outcome reward, making it difficult to evaluate the contribution of individual tool-calls within multi-turn interactions. Existing approaches to such process credit assignment either depend...

#agent#benchmark#llm#rl

論文 Hugging Face 2026-05-06 HF ↑27

When to Trust Imagination: Adaptive Action Execution for World Action Models

World Action Models (WAMs) have recently emerged as a promising paradigm for robotic manipulation by jointly predicting future visual observations and future actions. However, current WAMs typically execute a fixed number of predicted actions after each model inference, leaving the robot blind to wh...

#robotics#benchmark

論文 Hugging Face 2026-05-06 HF ↑17

Continuous-Time Distribution Matching for Few-Step Diffusion Distillation

Step distillation has become a leading technique for accelerating diffusion models, among which Distribution Matching Distillation (DMD) and Consistency Distillation are two representative paradigms. While consistency methods enforce self-consistency along the full PF-ODE trajectory to steer it towa...

#diffusion#alignment#vision

論文 Hugging Face 2026-05-06 HF ↑4

SkillOS: Learning Skill Curation for Self-Evolving Agents

LLM-based agents are increasingly deployed to handle streaming tasks, yet they often remain one-off problem solvers that fail to learn from past interactions. Reusable skills distilled from experience provide a natural substrate for self-evolution, where high-quality skill curation serves as the key...

#agent#llm#rl#benchmark

企業動向 OpenAI 2026-05-08

Running Codex safely at OpenAI

How OpenAI runs Codex securely with sandboxing, approvals, network policies, and agent-native telemetry to support safe and compliant coding agent adoption....

#agent#coding

企業動向 OpenAI 2026-05-07

Parloa builds service agents customers want to talk to

Parloa leverages OpenAI models to power scalable, voice-driven AI customer service agents, enabling enterprises to design, simulate, and deploy reliable, real-time interactions....

#agent#speech

企業動向 Microsoft Research 2026-05-08

Building realistic electric transmission grid dataset at scale: a pipeline from open dataset

Microsoft Research is excited to release an open dataset of approximate transmission topology of the U.S. power grid derived from publicly available data. The ability to study transmission-level power grid behavior is essential for modern power systems research. Analyses of congestion, transmission ...

論文深掘り arXiv 2026-05-07

Beyond Negative Rollouts: Positive-Only Policy Optimization with Implicit Negative Gradients

Reinforcement learning with verifiable rewards (RLVR), due to the deterministic verification, becomes a dominant paradigm for enhancing the reasoning ability of large language models (LLMs). The community witnesses the rapid change from the Proximal Policy Optimization (PPO) to Group Relative Policy...

#llm#rl#benchmark

論文深掘り arXiv 2026-05-07

PairAlign: A Framework for Sequence Tokenization via Self-Alignment with Applications to Audio Tokenization

Many operations on sensory data -- comparison, memory, retrieval, and reasoning -- are naturally expressed over discrete symbolic structures. In language this interface is given by tokens; in audio, it must be learned. Existing audio tokenizers rely on quantization, clustering, or codec reconstructi...

#alignment#speech#benchmark

モデル OpenAI 2026-05-07

Scaling Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber

OpenAI expands Trusted Access for Cyber with GPT-5.5 and GPT-5.5-Cyber, helping verified defenders accelerate vulnerability research and protect critical infrastructure....

モデル OpenAI 2026-05-07

Advancing voice intelligence with new models in the API

Explore new realtime voice models in the OpenAI API that can reason, translate, and transcribe speech, enabling more natural and intelligent voice experiences....

#speech

企業動向 OpenAI 2026-05-07

Testing ads in ChatGPT

OpenAI begins testing ads in ChatGPT to support free access, with clear labeling, answer independence, strong privacy protections, and user control....

論文深掘り arXiv 2026-05-07

PianoCoRe: Combined and Refined Piano MIDI Dataset

Symbolic music datasets with matched scores and performances are essential for many music information retrieval (MIR) tasks. Yet, existing resources often cover a narrow range of composers, lack performance variety, omit note-level alignments, or use inconsistent naming formats. This work presents P...

#alignment#benchmark

論文 arXiv 2026-05-07

Optimizer-Model Consistency: Full Finetuning with the Same Optimizer as Pretraining Forgets Less

Optimizers play an important role in both pretraining and finetuning stages when training large language models (LLMs). In this paper, we present an observation that full finetuning with the same optimizer as in pretraining achieves a better learning-forgetting tradeoff, i.e., forgetting less while ...

#fine-tuning#llm