2026-05-12

20 items

Paper | Deep Dive | Hugging Face | 2026-05-10 | HF ↑39

TMAS: Scaling Test-Time Compute via Multi-Agent Synergy

Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rou...

#agent #llm #rl #benchmark
Paper | Deep Dive | Hugging Face | 2026-05-10 | HF ↑49

Qwen-Image-2.0 Technical Report

We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution pho...

#multimodal #vision #diffusion #benchmark
Paper | Hugging Face | 2026-05-10 | HF ↑22

Model Merging Scaling Laws in Large Language Models

We study empirical scaling laws for language model merging measured by cross-entropy. Despite its wide practical use, merging lacks a quantitative rule that predicts returns as we add experts or scale the model size. We identify a compact power law that links model size and expert number: the size-d...

#llm
Paper | Deep Dive | Hugging Face | 2026-05-10 | HF ↑11

Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or internalized int...

#agent #rl #llm
Paper | Hugging Face | 2026-05-10 | HF ↑10

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innova...

#llm #agent
Paper | Hugging Face | 2026-05-10 | HF ↑26

PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents

A LaTeX manuscript that compiles without error is not necessarily publication-ready. The resulting PDFs frequently suffer from misplaced floats, overflowing equations, inconsistent table scaling, widow and orphan lines, and poor page balance, forcing authors into repetitive compile-inspect-edit cycl...

#llm #agent #benchmark
Paper | Hugging Face | 2026-05-10 | HF ↑6

Mela: Test-Time Memory Consolidation based on Transformation Hypothesis

Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage establish...

#benchmark
Paper | Hugging Face | 2026-05-10 | HF ↑11

Pixal3D: Pixel-Aligned 3D Generation from Images

Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We a...

Industry News | Microsoft Research | 2026-05-11

SocialReasoning-Bench: Measuring whether AI agents act in users’ best interests

Using SocialReasoning-Bench, we observed a stable pattern across models: agents execute competently but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest.

#agent
Industry News | OpenAI | 2026-05-12

What Parameter Golf taught us about AI-assisted research

Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints....

#agent #coding
Paper | Deep Dive | arXiv | 2026-05-11

Towards On-Policy Data Evolution for Visual-Native Multimodal Deep Search Agents

Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as...

#agent #multimodal #rl #fine-tuning #benchmark
Industry News | OpenAI | 2026-05-11

How ChatGPT adoption broadened in early 2026

ChatGPT adoption surged in Q1 2026, with the fastest growth among users over 35 and more balanced usage across genders, signaling broader mainstream AI adoption....