2026-05-22

20 items

Paper Hugging Face 2026-05-20 HF ↑48

ACC: Compiling Agent Trajectories for Long-Context Training

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and r...

#agent#llm#fine-tuning#benchmark

Paper deep dive Hugging Face 2026-05-20 HF ↑30

LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning

Joint audio-visual reasoning is essential for omnimodal understanding, yet current multimodal large language models (MLLMs) still struggle when reasoning requires fine-grained evidence from both modalities. A central limitation is that explicit text-based chain-of-thought (CoT) compresses continuous...

#llm#multimodal#benchmark

Paper deep dive Hugging Face 2026-05-20 HF ↑62

Perception or Prejudice: Can MLLMs Go Beyond First Impressions of Personality?

Multimodal Large Language Models (MLLMs) are increasingly deployed in human-facing roles where personality perception is critical, yet existing benchmarks evaluate this capability solely on numerical Big Five score prediction, leaving open whether models truly perceive personality through behavioral...

#llm#benchmark#multimodal#agent

Paper Hugging Face 2026-05-20 HF ↑4

Diversed Model Discovery via Structured Table Discovery

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exp...

#alignment#benchmark

Paper deep dive Hugging Face 2026-05-20 HF ↑10

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling existing associations. De...

#coding#benchmark
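The excerpt describes linear attention as a fixed-size recurrent state that must be edited without scrambling the associations already stored in it. As a minimal sketch, assuming the standard formulations from prior work (plain linear attention, and the gated delta rule of the original Gated DeltaNet, S_t = α_t S_{t-1}(I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ), the code below contrasts pure accumulation with an erase-then-write update. It is not the Gated DeltaNet-2 update, which the truncated abstract does not specify; note only that in this rule a single β_t scales both the erase term and the write term. All function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]  # state S with shape d_v x d_k (row-major)
Vector = List[float]


def zeros(d_v: int, d_k: int) -> Matrix:
    return [[0.0] * d_k for _ in range(d_v)]


def matvec(S: Matrix, x: Vector) -> Vector:
    """Read from the state: returns S @ x (used for queries and for S_{t-1} k_t)."""
    return [sum(s * xi for s, xi in zip(row, x)) for row in S]


def linear_attention_step(S: Matrix, k: Vector, v: Vector) -> Matrix:
    """Plain linear attention: S_t = S_{t-1} + v_t k_t^T (accumulate, never erase)."""
    return [[S[i][j] + v[i] * k[j] for j in range(len(k))] for i in range(len(v))]


def gated_delta_step(S: Matrix, k: Vector, v: Vector, alpha: float, beta: float) -> Matrix:
    """Gated delta rule: S_t = alpha * S_{t-1} (I - beta k k^T) + beta v k^T.

    Expanded: alpha * (S - beta * (S k) k^T) + beta * v k^T.
    The (S k) k^T term erases what the state currently returns for key k,
    and v k^T writes the new value; one beta scales both here.
    """
    Sk = matvec(S, k)
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j] for j in range(len(k))]
        for i in range(len(S))
    ]


# Write two different values under the same unit-norm key, then read it back.
k = [1.0, 0.0]
v1, v2 = [1.0, 2.0], [5.0, -1.0]

S_lin, S_del = zeros(2, 2), zeros(2, 2)
for v in (v1, v2):
    S_lin = linear_attention_step(S_lin, k, v)
    S_del = gated_delta_step(S_del, k, v, alpha=1.0, beta=1.0)

print(matvec(S_lin, k))  # [6.0, 1.0]: the two values are summed
print(matvec(S_del, k))  # [5.0, -1.0]: the old value is overwritten
```

With a unit-norm key and α = β = 1, the delta rule replaces the value stored under `k`, while plain accumulation returns the sum of both writes. In both cases the state stays d_v × d_k however many tokens are processed, which is where the constant-memory decoding comes from.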

Paper Hugging Face 2026-05-20 HF ↑16

Maestro: Reinforcement Learning to Orchestrate Hierarchical Model-Skill Ensembles

The proliferation of large language models (LLMs) and modular skills has endowed autonomous agents with increasingly powerful capabilities. Existing frameworks typically rely on monolithic LLMs and fixed logic to interface with these skills. This gives rise to a critical bottleneck: different LLMs o...

#llm#multimodal#rl#agent#benchmark

Paper Hugging Face 2026-05-20 HF ↑18

SpaceDG: Benchmarking Spatial Intelligence under Visual Degradation

Multimodal Large Language Models (MLLMs) have made rapid progress in spatial intelligence, yet existing spatial reasoning benchmarks largely assume pristine visual inputs and overlook the degradations that commonly occur in real-world deployment, such as motion blur, low light, adverse weather, lens...

#llm#benchmark#multimodal#fine-tuning

Paper Hugging Face 2026-05-20 HF ↑22

Spreadsheet-RL: Advancing Large Language Model Agents on Realistic Spreadsheet Tasks via Reinforcement Learning

Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a pr...

#agent#llm#rl#fine-tuning#benchmark

Paper Hugging Face 2026-05-20 HF ↑26

WorldKV: Efficient World Memory with World Retrieval and Compression

Autoregressive video diffusion models have enabled real-time, action-conditioned world generation. However, sustaining a persistent world, where revisiting a previously seen viewpoint yields consistent content, remains an open problem. Full KV-cache attention preserves this consistency but breaks re...

#benchmark#diffusion#fine-tuning#coding

Paper Hugging Face 2026-05-20 HF ↑16

Sensor2Sensor: Cross-Embodiment Sensor Conversion for Autonomous Driving

Robust training and validation of Autonomous Driving Systems (ADS) require massive, diverse datasets. Proprietary data collected by Autonomous Vehicle (AV) fleets, while high-fidelity, are limited in scale, diversity of sensor configurations, as well as geographic and long-tail-behavioral coverage. ...

#agent#diffusion#benchmark

Company news OpenAI 2026-05-22

OpenAI named a Leader in enterprise coding agents by Gartner

OpenAI is named a leader in the 2026 Gartner Magic Quadrant for Enterprise AI Coding Agents, with Codex recognized for innovation and enterprise-scale deployment....

#agent#coding

Company news OpenAI 2026-05-22

How Virgin Atlantic ships faster with Codex

How Virgin Atlantic used Codex to ship its revamped mobile app on a fixed holiday travel deadline, reaching near-total unit test coverage and zero P1 defects....

Paper deep dive arXiv 2026-05-21

SDPM: Survival Diffusion Probabilistic Model for Continuous-Time Survival Analysis

Survival analysis aims to estimate a time-to-event distribution from data with censored observations. Many existing methods either impose structural assumptions on the hazard function or discretize the time axis, which may limit flexibility and introduce approximation errors. We propose the Survival...

#diffusion#benchmark
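The core difficulty named in the abstract is estimating a time-to-event distribution when some observations are censored. For reference only, and not the SDPM method, the classic Kaplan-Meier product-limit estimator shows how right-censored subjects are handled: they remain in the at-risk set up to their censoring time but never count as events. The function name and data layout below are illustrative.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate of S(t) = P(T > t) from right-censored data.

    times[i]  : observed time for subject i (event time or censoring time)
    events[i] : True if the event was observed, False if censored at times[i]
    Returns (t, S(t)) pairs at each distinct observed event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # Group tied observations; a subject censored at t still counts as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events > 0:
            # Product-limit step: multiply by the conditional survival at t.
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        # Events and censored subjects both leave the risk set after t.
        n_at_risk -= n_leaving
    return curve


curve = kaplan_meier([2, 3, 3, 5, 7], [True, False, True, True, False])
print(curve)  # approximately [(2, 0.8), (3, 0.6), (5, 0.3)]
```

Here the subject censored at t = 3 still counts as at risk at t = 3, and the one censored at t = 7 never lowers the curve.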

Company news OpenAI 2026-05-21

AdventHealth advances whole-person care with OpenAI

AdventHealth is using ChatGPT for Healthcare to streamline workflows, reduce administrative burden, and return more time to patient care....

Company news Hugging Face 2026-05-22

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook

Paper deep dive arXiv 2026-05-21

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

#coding#benchmark

Paper deep dive arXiv 2026-05-21

Deep Reinforcement Learning for Flexible Job Shop Scheduling with Random Job Arrivals

The Flexible Job Shop Scheduling Problem (FJSP) is the optimal allocation of a set of jobs to machines. Two primary challenges persist in FJSP: the unpredictable arrival of future jobs and the combinatorial complexity of the problem, rendering it intractable for conventional mixed-integer linear pro...

#agent#coding#rl#benchmark

Paper arXiv 2026-05-21

SeqLoRA: Bilevel Orthogonal Adaptation for Continual Multi-Concept Generation

Parameter-efficient fine-tuning enables fast personalization of text-to-image diffusion models, but composing multiple custom concepts remains challenging due to representation interference. Existing modular methods either rely on expensive post-hoc fusion or freeze adaptation subspaces, which limit...

#diffusion#fine-tuning#vision

Paper arXiv 2026-05-21

Can AI Make Conflicts Worse? An Alignment Failure in LLM Deployment Across Conflict Contexts

AI models are already deployed in societies affected by armed conflict, and journalists, humanitarian workers, governments and ordinary citizens rely on them for information or for their work processes. No established practice exists for checking whether their outputs can make those conflicts worse....

#alignment#benchmark#llm

Paper arXiv 2026-05-21

AMEL: Accumulated Message Effects on LLM Judgments

Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items passing through one conversation. We ask whether the polarity of prior conversation history biases subsequent judgments, an effect we call the accumulated messa...

#llm#benchmark