2026-05-20

18 items

Paper Deep Dive Hugging Face 2026-05-18 HF ↑44

GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment

We present GoLongRL, a fully open-source, capability-oriented post-training recipe for long-context reinforcement learning with verifiable rewards (RLVR). Existing long-context RL methods often treat data construction as a matter of designing increasingly complex retrieval paths, leading to homogene...

#rl #alignment #benchmark

Paper Deep Dive Hugging Face 2026-05-18 HF ↑31

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing video generation mod...

#multimodal #diffusion #rl #benchmark

Paper Hugging Face 2026-05-18 HF ↑12

CEPO: RLVR Self-Distillation using Contrastive Evidence Policy Optimization

When a model produces a correct solution under reinforcement learning with verifiable rewards (RLVR), every token receives the same reward signal regardless of whether it was a decisive reasoning step or a grammatical filler. A natural fix is to condition the model on the correct answer as a teacher...

#rl #multimodal #alignment #benchmark

Paper Hugging Face 2026-05-18 HF ↑44

OpenComputer: Verifiable Software Worlds for Computer-Use Agents

We present OpenComputer, a verifier-grounded framework for constructing verifiable software worlds for computer-use agents. OpenComputer integrates four components: (1) app-specific state verifiers that expose structured inspection endpoints over real applications, (2) a self-evolving verification l...

#agent #benchmark #llm

Paper Deep Dive Hugging Face 2026-05-18 HF ↑40

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this...

#agent #benchmark

Paper Hugging Face 2026-05-18

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

Large language model (LLM) agents increasingly operate over long and recurring external contexts, like document corpora and code repositories. Across invocations, existing approaches preserve either the agent's trajectory, passive access to raw material, or task-level strategies. None of them preser...

#agent #llm #coding

Paper Hugging Face 2026-05-18 HF ↑7

PixVerve: Advancing Native UHR Image Generation to 100MP with a Large-Scale High-Quality Dataset

Text-to-Image (T2I) models have recently seen notable progress around 1K and 2K resolution. With the extreme desire for better visual experience and the rapid development of imaging technology, the demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image ge...

#vision #benchmark #llm #multimodal #alignment

Paper Hugging Face 2026-05-18 HF ↑11

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

Video generation is rapidly evolving from single-shot synthesis to complex multi-shot audio-video (MSAV) narratives to meet real-world demands. However, evaluating such frontier models remains a fundamental challenge. Existing benchmarks are limited in scope and data diversity, and rely on rigid eva...

#benchmark #agent #alignment

Company News OpenAI 2026-05-20

The next phase of OpenAI’s Education for Countries

OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes....

Company News OpenAI 2026-05-20

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics....

Model OpenAI 2026-05-20

How Ramp engineers accelerate code review with Codex

How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours....

Paper Deep Dive arXiv 2026-05-19

PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents

#agent #llm #coding

Model OpenAI 2026-05-19

Introducing OpenAI for Singapore

OpenAI for Singapore launches a multi-year AI partnership to expand deployment, build local talent, and support businesses and public services with AI....

Company News OpenAI 2026-05-19

Advancing content provenance for a safer, more transparent AI ecosystem

OpenAI advances AI content provenance with Content Credentials, SynthID, and a verification tool to help people identify and trust AI-generated media....

Paper Deep Dive arXiv 2026-05-19

A Methodology for Selecting and Composing Runtime Architecture Patterns for Production LLM Agents

Production LLM agents combine stochastic model outputs with deterministic software systems, yet the boundary between the two is rarely treated as a first-class architectural object. This paper names that boundary the stochastic-deterministic boundary (SDB): a four-part contract among a proposer, ver...

#agent #llm

Paper arXiv 2026-05-19

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR

Reinforcement learning with verifiable rewards has made post-training highly effective when correctness can be checked automatically. However, many important model behaviors require satisfying several qualitative criteria at once. Rubric-based rewards address this setting by grading prompt-specific ...

#rl #multimodal #benchmark

Paper arXiv 2026-05-19

Less Back-and-Forth: A Comparative Study of Structured Prompting

Large language models (LLMs) are widely used for open-ended tasks, but underspecified prompts can lead to low-quality answers and additional interaction. This paper studies whether structured prompt design improves response quality while reducing user effort. We compare three prompt conditions: a ra...

#llm #coding #benchmark

Paper arXiv 2026-05-19

ThoughtTrace: Understanding User Thoughts in Real-World LLM Interactions

Conversational AI has now reached billions of users, yet existing datasets capture only what people say, not what they think. We introduce ThoughtTrace, the first large-scale dataset that pairs real-world multi-turn human-AI conversations with users' self-reported thoughts: their reasons for sendin...

#llm #alignment