2026-05-12

20 items

Paper deep dive Hugging Face 2026-05-10 HF ↑39

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rou...

#agent #llm #rl #benchmark

Paper deep dive Hugging Face 2026-05-10 HF ↑49

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution pho...

#multimodal #vision #diffusion #benchmark

Paper Hugging Face 2026-05-10 HF ↑22

Model Merging Scaling Laws in Large Language Models

We study empirical scaling laws for language model merging measured by cross-entropy. Despite its wide practical use, merging lacks a quantitative rule that predicts returns as we add experts or scale the model size. We identify a compact power law that links model size and expert number: the size-d...

#llm

Paper deep dive Hugging Face 2026-05-10 HF ↑11

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or internalized int...

#agent #rl #llm

Paper Hugging Face 2026-05-10 HF ↑10

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innova...

#llm #agent

Paper Hugging Face 2026-05-10 HF ↑26

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

A LaTeX manuscript that compiles without error is not necessarily publication-ready. The resulting PDFs frequently suffer from misplaced floats, overflowing equations, inconsistent table scaling, widow and orphan lines, and poor page balance, forcing authors into repetitive compile-inspect-edit cycl...

#llm #agent #benchmark

Paper Hugging Face 2026-05-10 HF ↑21

WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors

Commercial video generation systems such as Seedance2.0 and Veo3.1 have rapidly improved, strengthening the view that video generators may be evolving into "world simulators." Yet the community still lacks a benchmark that directly tests whether a model can reason about how an observed world should ...

#benchmark

Paper Hugging Face 2026-05-10 HF ↑6

Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage establish...

#benchmark

Paper Hugging Face 2026-05-10 HF ↑11

Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR

Self-distillation has emerged as a powerful framework for post-training LLMs, where a teacher conditioned on extra information guides a student without it, both from the same model. While this guidance is useful when the student has failed, on successful rollouts, the same mechanism instead overwrit...

#rl #llm

Paper Hugging Face 2026-05-10 HF ↑11

Pixal3D: Pixel-Aligned 3D Generation from Images

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We a...

Company news Microsoft Research 2026-05-12

Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and multi-task models

MatterSim is expanding what AI can do for materials science, from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone.

Company news Microsoft Research 2026-05-11

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

Using SocialReasoning-Bench, we observed a stable pattern across models: agents execute competently, but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest.

#agent

Model OpenAI 2026-05-12

How NVIDIA engineers and researchers build with Codex

Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments....

Company news OpenAI 2026-05-12

AutoScout24 scales engineering with AI-powered workflows

Learn how AutoScout24 Group uses Codex and ChatGPT to speed development cycles, improve code quality, and expand AI adoption....

Company news OpenAI 2026-05-12

What Parameter Golf taught us about AI-assisted research

Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints....

#agent #coding

Paper deep dive arXiv 2026-05-11

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as...

#agent #multimodal #rl #fine-tuning #benchmark

Company news OpenAI 2026-05-11