Paper Deep-dive Hugging Face 2026-06-09 HF ↑55
General-purpose agents such as OpenClaw are increasingly used as autonomous tool users, but their coding ability is difficult to measure under SWE-bench: a generic agent does not by itself satisfy the clean Docker workspace, patch, and prediction contract required for scoring. We introduce Claw-SWE-...
#agent #benchmark #coding
Paper Deep-dive Hugging Face 2026-06-09 HF ↑55
Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper syst...
#agent #benchmark #llm
Paper Deep-dive Hugging Face 2026-06-10 HF ↑21
Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM...
#llm #benchmark
Paper Hugging Face 2026-06-09 HF ↑16
Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this...
#llm #alignment #coding #benchmark
Paper Hugging Face 2026-06-09 HF ↑61
Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduc...
#agent #benchmark
Paper Hugging Face 2026-06-09 HF ↑15
Recent progress in foundation models has shifted toward agentic behavior involving multi-step reasoning and tool use. However, open-source efforts largely focus on text-dominant settings, leaving long-horizon multimodal tasks underexplored. This gap is evident in video tasks requiring sustained temp...
#multimodal #agent #rl #fine-tuning #benchmark
Paper Deep-dive Hugging Face 2026-06-09 HF ↑15
Reinforcement learning (RL) has become a key component in modern large language models, yet the rollout stage remains the key bottleneck in RL training pipelines. Although Multi-Token Prediction (MTP) offers a natural solution to accelerate rollouts through speculative decoding, many studies have ob...
#llm #rl #agent #coding
Industry news Deep-dive OpenAI 2026-06-11
Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide....
Paper Deep-dive Hugging Face 2026-06-10 HF ↑3
There are two main Parameter-Efficient Fine-Tuning (PEFT) techniques for Large Language Models (LLMs). While Low-Rank Adaptation (LoRA) introduces additional weights between the LLM layers, Soft Prompting introduces additional fine-tuning-specific raw tokens to an LLM input. However, both require mo...
#llm #fine-tuning #benchmark #multimodal
Paper Hugging Face 2026-06-09 HF ↑26
Spatial reasoning from egocentric videos is inherently challenging because the observable evidence is constrained by the camera trajectory. Existing methods rely on single-turn inference, forcing models to resolve geometric ambiguity through semantic priors rather than verifiable evidence. We argue ...
#llm #benchmark
Paper Hugging Face 2026-06-09 HF ↑4
Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-lan...
#rl #multimodal #robotics #benchmark #diffusion
Paper Hugging Face 2026-06-09 HF ↑5
Reinforcement Learning (RL) with verifiable environments has emerged as a powerful approach for enhancing the reasoning capabilities of Large Language Models (LLMs). While prior research demonstrates that scaling environment quantity improves RL performance, existing manual or individual constructio...
#llm #rl #benchmark
Industry news OpenAI 2026-06-11
OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows....
#agent
Industry news OpenAI 2026-06-11
OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content....
Industry news OpenAI 2026-06-10
Access OpenAI models and Codex through Oracle Cloud, using existing commitments to build and deploy AI with enterprise security and governance....
Industry news OpenAI 2026-06-10
A new report from OpenAI details PRC-linked influence operations using AI to target U.S. tech debates, data center narratives, tariffs, and false claims about ChatGPT....
Industry news DeepMind 2026-06-10
Google DeepMind and partners announce a $10M funding call for multi-agent safety research....
#agent #alignment
Industry news OpenAI 2026-06-10
See how LSEG uses OpenAI to scale trusted AI across its global business, accelerating insights, shrinking release cycles, and empowering 4,000 employees....
Paper arXiv 2026-06-10
In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teach...
#agent #rl #benchmark