2026-05-21

20件

論文深掘り Hugging Face 2026-05-19 HF ↑33

IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools

Multimodal large language models (MLLMs) have shown remarkable capability in bridging visual perception and textual reasoning, enabling zero-shot understanding across diverse industrial scenarios. However, their performance in open-vocabulary industrial anomaly detection (IAD) is often limited by do...

#agent#llm#rl#multimodal#fine-tuning

論文深掘り Hugging Face 2026-05-19 HF ↑27

You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories

Reinforcement learning with verifiable rewards (RLVR) has become a dominant paradigm for improving reasoning in large language models (LLMs), yet the underlying geometry of the resulting parameter trajectories remains underexplored. In this work, we demonstrate that RLVR weight trajectories are extr...

#llm#rl#benchmark

論文 Hugging Face 2026-05-19 HF ↑14

Generative Recursive Reasoning

How should future neural reasoning systems implement extended computation? Recursive Reasoning Models (RRMs) offer a promising alternative to autoregressive sequence extension by performing iterative latent-state refinement with shared transition functions. Yet existing RRMs are largely deterministi...

論文 Hugging Face 2026-05-19 HF ↑9

On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists

With the advancement of AI capabilities, AI reviewers are beginning to be deployed in scientific peer review, yet their capability and credibility remain in question: many scientists simply view them as probabilistic systems without the expertise to evaluate research, while other researchers are mor...

#agent#alignment#benchmark

論文 Hugging Face 2026-05-19 HF ↑3

Conditional Equivalence of DPO and RLHF: Implicit Assumption, Failure Modes, and Provable Alignment

Direct Preference Optimization (DPO) has emerged as a popular alternative to Reinforcement Learning from Human Feedback (RLHF), offering theoretical equivalence with simpler implementation. We prove this equivalence is conditional rather than universal, depending on an implicit assumption frequently...

#alignment#rl#benchmark

論文 Hugging Face 2026-05-19 HF ↑3

UniT: Unified Geometry Learning with Group Autoregressive Transformer

Recent feed-forward models have significantly advanced geometry perception for inferring dense 3D structure from sensor observations. However, its essential capabilities remain fragmented across multiple incompatible paradigms, including online perception, offline reconstruction, multi-modal integra...

#benchmark

論文深掘り Hugging Face 2026-05-19 HF ↑6

OcclusionFormer: Arranging Z-Order for Layout-Grounded Image Generation

Recent layout-to-image models have achieved remarkable progress in spatial controllability. However, they still struggle with inter-object occlusion. When bounding boxes overlap, most existing methods lack explicit occlusion information, which makes the generation in intersection regions inherently ...

#diffusion#alignment#vision

論文 Hugging Face 2026-05-19 HF ↑16

Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning

Currently, enhancing Unified Multimodal Models (UMMs) with image understanding, generation, and editing capabilities mainly relies on mixed multi-task training. Due to inherent task conflicts, such strategy requires complex multi-stage pipelines, massive data mixing, and balancing tricks, merely res...

#multimodal

企業動向 OpenAI 2026-05-21

AdventHealth advances whole-person care with OpenAI

AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care....

論文 Hugging Face 2026-05-19 HF ↑3

SpecBench: Measuring Reward Hacking in Long-Horizon Coding Agents

As long-horizon coding agents produce more code than any developer can review, oversight collapses onto a single surface: the automated test suite. Reward hacking naturally arises in this setup, as the agent optimizes for passing tests while deviating from the users true goal. We study this reward h...

#agent#coding#benchmark

論文 Hugging Face 2026-05-19 HF ↑2

Mem-π: Adaptive Memory through Learning When and What to Generate

We present Mem-π, a framework for adaptive memory in large language model (LLM) agents, where useful guidance is generated on demand rather than retrieved from external memory stores. Existing memory-augmented agents typically rely on similarity-based retrieval from episodic memory banks or skill li...

#agent#llm#rl#robotics#benchmark

ツール Microsoft Research 2026-05-21

MagenticLite, MagenticBrain, Fara1.5: An agentic experience optimized for small models

MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks. The post MagenticLite, MagenticBrain, Fara1.5: An agentic experien...

#agent

企業動向 Microsoft Research 2026-05-21

Vega: Zero-knowledge proofs for digital identity in the age of AI

Vega turns a full credential into a single proof, sharing only what is needed and nothing more, with performance that works in real apps. The post Vega: Zero-knowledge proofs for digital identity in the age of AI appeared first on Microsoft Research ....

企業動向 DeepMind 2026-05-21

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks

We’re launching the Google DeepMind Accelerator program in Asia Pacific to tackle environmental risks...

ツール NVIDIA 2026-05-21

NVIDIA GTC Taipei at COMPUTEX: Live Updates on What’s Next in AI

At NVIDIA GTC Taipei at COMPUTEX, the world’s developers, researchers and industry leaders are converging to dive into the latest breakthroughs shaping every industry, covering topics spanning AI factories and scaling infrastructure to agentic and physical AI and more....

#agent

企業動向 OpenAI 2026-05-20

The next phase of OpenAI’s Education for Countries

OpenAI advances Education for Countries, expanding AI adoption in schools with new partnerships, teacher training, and tools to improve global learning outcomes....

企業動向 OpenAI 2026-05-20

An OpenAI model has disproved a central conjecture in discrete geometry

An OpenAI model solved the 80-year-old unit distance problem, disproving a major conjecture in discrete geometry and marking a milestone in AI-driven mathematics....

論文深掘り arXiv 2026-05-20

Agent JIT Compilation for Latency-Optimizing Web Agent Planning and Scheduling

Computer-use agents (CUA) automate tasks specified with natural language such as "order the cheapest item from Taco Bell" by generating sequences of calls to tools such as click, type, and scroll on a browser. Current implementations follow a sequential fetch-screenshot-execute loop where each itera...

#agent#llm

論文 arXiv 2026-05-20

DeepWeb-Bench: A Deep Research Benchmark Demanding Massive Cross-Source Evidence and Long-Horizon Derivation

Deep research, in which an agent searches the open web, collects evidence, and derives an answer through extended reasoning, is a prominent use case for frontier language models. Frontier deep research products score high on existing benchmarks, making it difficult to distinguish their capabilities ...

#benchmark#agent

論文 arXiv 2026-05-20

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

Visual Question Answering (VQA) benchmarks have largely emphasized perception-based tasks that can be solved from visual content alone. In contrast, many real-world scenarios require external knowledge that is not directly observable in the image to answer correctly. We introduce WikiVQABench, a hum...

#benchmark#multimodal#llm