2026-06-05

14 items

Paper arXiv 2026-06-04

MLEvolve: A Self-Evolving Framework for Automated Machine Learning Algorithm Discovery

Large language model (LLM) agents are increasingly applied to long-horizon tasks such as scientific discovery and machine learning engineering (MLE), where sustained self-evolution becomes a key capability. However, existing MLE agents suffer from inter-branch information isolation, memoryless searc...

#agent #llm #coding #benchmark

Paper arXiv 2026-06-04

You Only Index Once: Cross-Layer Sparse Attention with Shared Routing

Long-context inference in modern LLMs is increasingly constrained by decoding efficiency, especially in reasoning-heavy settings where models generate long intermediate chains of thought. Existing sparse attention methods often face a practical efficiency-quality trade-off. Structured block sparse m...

#coding #llm #benchmark

Paper arXiv 2026-06-04

Benchmark Everything Everywhere All at Once

Benchmarks are fundamental for evaluating and advancing LLMs and MLLMs by providing standardized and explicit measures of performance. However, their construction is labor-intensive and hard to reuse, raising concerns about sustainability and scalability. Moreover, existing benchmarks often quickly ...

#benchmark #agent #llm #multimodal

Paper arXiv 2026-06-04

Vortex: Efficient and Programmable Sparse Attention Serving for AI Agents

Sparse attention is becoming increasingly important for serving large language models (LLMs) as generation lengths continue to grow. However, deploying and evaluating new sparse attention algorithms at scale remains highly engineering-intensive, slowing both human researchers and AI agents in explor...

#agent #llm #benchmark

Paper deep dive arXiv 2026-06-04

Where Should Knowledge Enter? A Layered Framework for Knowledge Infusion in Multimodal Iterative Generative Mo

Multimodal generative models produce fluent outputs but remain unreliable when generation must respect structured, domain-specific, or safety-critical knowledge. Existing methods incorporate knowledge through mechanisms such as prompt augmentation, guidance, latent editing, or fine-tuning, yet they ...

#multimodal #diffusion #fine-tuning #alignment

Paper deep dive arXiv 2026-06-04

Improving Answer Extraction in Context-based Question Answering Systems Using LLMs

Question answering (QA) systems have achieved notable progress with the advent of large language models (LLMs). However, they still face challenges in accurately extracting and generating precise answers from given contexts, particularly when dealing with complex or ambiguous queries. Existing appro...

#llm #fine-tuning #benchmark

Model Google Research 2026-06-05

Unlocking dependable responses with Gemini Enterprise Agent Platform’s Agentic RAG

#agent #rag

Industry news Hugging Face 2026-06-05

Thousand Token Wood: shipping a multi-agent economy on a 3B model

#agent

Industry news OpenAI 2026-06-04

How Endava is redesigning software delivery around AI agents

Learn how Endava is using AI agents, ChatGPT Enterprise, and Codex to accelerate software delivery, automate workflows, and build an AI-native culture across the enterprise....

#agent

Paper deep dive arXiv 2026-06-04

Will the Agent Recuse Itself? Measuring LLM-Agent Compliance with In-Band Access-Deny Signals

As autonomous LLM agents increasingly hold real credentials and operate infrastructure without a human in the loop, operators have no standard way to tell an agent that a resource is off-limits. Access controls either let the agent in (it has valid credentials) or hard-fail it (indistinguishable fro...

#agent #llm #robotics

Paper arXiv 2026-06-04

USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding

Audio encoders are critical to modern audio applications as large language models (LLMs) increasingly rely on a single encoder for diverse inputs. While self-supervised learning (SSL) has yielded strong domain-specific encoders like speech or music experts, multi-domain approaches like USAD and SPEA...

#llm #benchmark #speech

Industry news NVIDIA 2026-06-05

Seoul Purpose: How NVIDIA and South Korea Are Building the Future of AI

Home to cutting-edge sovereign AI infrastructure and robotics innovators, as well as one of the world’s most passionate gaming communities, South Korea is one of the world’s centers of AI. NVIDIA founder and CEO Jensen Huang is in Seoul this week to meet the partners and builders behind that work. S...

#robotics

Paper arXiv 2026-06-04

RREDCoT: Segment-Level Reward Redistribution for Reasoning Models

Recent advancements in reasoning language models have been driven by Reinforcement Learning (RL) fine-tuning. Most often, these rely on the Group Relative Policy Optimization (GRPO) algorithm or modifications thereof to steer the models to produce Chain-of-Thought (CoT) traces. The final answer can ...

#rl #fine-tuning

Paper arXiv 2026-06-04

Self-Augmenting Retrieval for Diffusion Language Models

Discrete diffusion language models generate text by iteratively denoising an entire response in parallel. At each step, they predict tentative tokens for every masked position, committing the confident predictions to the output and discarding the unconfident ones. We show that the discarded tokens a...

#diffusion #rag #benchmark