Paper Deep dive Hugging Face 2026-05-10 HF ↑39
Test-time scaling has become an effective paradigm for improving the reasoning ability of large language models by allocating additional computation during inference. Recent structured approaches have further advanced this paradigm by organizing inference across multiple trajectories, refinement rou...
#agent #llm #rl #benchmark
Paper Deep dive Hugging Face 2026-05-10 HF ↑49
We present Qwen-Image-2.0, an omni-capable image generation foundation model that unifies high-fidelity generation and precise image editing within a single framework. Despite recent progress, existing models still struggle with ultra-long text rendering, multilingual typography, high-resolution pho...
#multimodal #vision #diffusion #benchmark
Paper Hugging Face 2026-05-10 HF ↑22
We study empirical scaling laws for language model merging measured by cross-entropy. Despite its wide practical use, merging lacks a quantitative rule that predicts returns as we add experts or scale the model size. We identify a compact power law that links model size and expert number: the size-d...
#llm
Paper Deep dive Hugging Face 2026-05-10 HF ↑11
Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume external skills either accumulate as persistent guidance or internalized int...
#agent #rl #llm
Paper Hugging Face 2026-05-10 HF ↑10
Self-evolving LLMs excel in verifiable domains but struggle in open-ended tasks, where reliance on proxy LLM judges introduces capability bottlenecks and reward hacking. To overcome this, we introduce G-Zero, a verifier-free, co-evolutionary framework for autonomous self-improvement. Our core innova...
#llm #agent
Paper Hugging Face 2026-05-10 HF ↑26
A LaTeX manuscript that compiles without error is not necessarily publication-ready. The resulting PDFs frequently suffer from misplaced floats, overflowing equations, inconsistent table scaling, widow and orphan lines, and poor page balance, forcing authors into repetitive compile-inspect-edit cycl...
#llm #agent #benchmark
Paper Hugging Face 2026-05-10 HF ↑21
Commercial video generation systems such as Seedance2.0 and Veo3.1 have rapidly improved, strengthening the view that video generators may be evolving into "world simulators." Yet the community still lacks a benchmark that directly tests whether a model can reason about how an observed world should ...
#benchmark
Paper Hugging Face 2026-05-10 HF ↑6
Memory consolidation, the process by which transient experiences are transformed into stable, structured representations, is a foundational organizing principle in the human brain, yet it remains largely unexplored as a design principle for modern sequence models. In this work, we leverage establish...
#benchmark
Paper Hugging Face 2026-05-10 HF ↑11
Self-distillation has emerged as a powerful framework for post-training LLMs, where a teacher conditioned on extra information guides a student without it, both from the same model. While this guidance is useful when the student has failed, on successful rollouts, the same mechanism instead overwrit...
#rl #llm
Paper Hugging Face 2026-05-10 HF ↑11
Recent advances in 3D generative models have rapidly improved image-to-3D synthesis quality, enabling higher-resolution geometry and more realistic appearance. Yet fidelity, which measures pixel-level faithfulness of the generated 3D asset to the input image, still remains a central bottleneck. We a...
Industry news Microsoft Research 2026-05-12
MatterSim is expanding what AI can do for materials science—from faster large-scale simulations to MatterSim-MT, a new multi-task model for simulating properties beyond potential energy surfaces alone. The post Advancing AI for materials with MatterSim: experimental synthesis, faster simulation, and...
Industry news Microsoft Research 2026-05-11
Using SocialReasoning-Bench, we observed a stable pattern across models: agents execute competently but fail to consistently improve the user’s position, even with explicit instructions to optimize for user interest. The post SocialReasoning-Bench: Measuring whether AI agents act in users’ best inte...
#agent
Model OpenAI 2026-05-12
Teams use Codex with GPT-5.5 to ship production systems and turn research ideas into runnable experiments....
Industry news OpenAI 2026-05-12
Learn how AutoScout24 Group uses Codex and ChatGPT to speed development cycles, improve code quality, and expand AI adoption....
Industry news OpenAI 2026-05-12
Parameter Golf brought together 1,000+ participants and 2,000+ submissions to explore AI-assisted machine learning research, coding agents, quantization, and novel model design under strict constraints....
#agent #coding
Paper Deep dive arXiv 2026-05-11
Multimodal deep search requires an agent to solve open-world problems by chaining search, tool use, and visual reasoning over evolving textual and visual context. Two bottlenecks limit current systems. First, existing tool-use harnesses treat images returned by search, browsing, or transformation as...
#agent #multimodal #rl #fine-tuning #benchmark
Industry news OpenAI 2026-05-11
Join the OpenAI Campus Network—connect student clubs worldwide, access AI tools, host events, and build an AI-powered campus community....
Industry news OpenAI 2026-05-11
OpenAI launches DeployCo, a new enterprise deployment company built to help organizations bring frontier AI into production and turn it into measurable business impact....
Industry news Hugging Face 2026-05-11
Building Blocks for Foundation Model Training and Inference on AWS...
Industry news OpenAI 2026-05-11
ChatGPT adoption surged in Q1 2026, with fastest growth among users over 35 and more balanced gender usage, signaling broader mainstream AI adoption....