Paper Deep Dive Hugging Face 2026-06-10 HF ↑56
Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untena...
#multimodal#llm#agent#coding#benchmark
Paper Deep Dive Hugging Face 2026-06-10 HF ↑67
Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-image generation and editing. However, constrained by their architectures, they cannot achieve interleaved generation (text-image sequence), which has crucial applications in visual nar...
#agent#benchmark#rl#multimodal#robotics
Paper Deep Dive Hugging Face 2026-06-10 HF ↑92
Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluations assume static environments. In contrast, real-world deployment is inherently dynamic, requiring agents to continually align their knowledge, skills, and behavior with changing envir...
#agent#benchmark#llm
Paper Hugging Face 2026-06-10 HF ↑23
Holistic visual tokenizers are fundamental to unified multimodal models (UMMs) as they map diverse visual inputs into a unified representation space. In this paper, we present HYDRA-X, the first UMM that unifies image and video tokenization within a single Vision Transformer (ViT). Our design is dri...
#multimodal#llm#vision
Paper Hugging Face 2026-06-10 HF ↑61
We present MaxProof, a population-level test-time scaling framework for competition-level mathematical proof in the MiniMax-M3 series. M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generati...
#rl
Paper Hugging Face 2026-06-10 HF ↑71
Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fundamental challenge for vision-language models (VLMs). Tool-augmented agents attempt to address this by augmenting VLMs with specialist perception modules, yet their effectiveness is ...
#agent#multimodal#benchmark
Paper Hugging Face 2026-06-10 HF ↑22
Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM...
#llm#benchmark
Paper Hugging Face 2026-06-10 HF ↑3
Search Agents -- large language models augmented with search tools -- have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models ...
#agent#benchmark#llm
Paper Hugging Face 2026-06-10 HF ↑10
We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: these are diffused, d...
#diffusion#alignment
Paper Hugging Face 2026-06-10 HF ↑15
LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human-designed approaches. As model capabilities cont...
#agent#llm#benchmark
Company News OpenAI 2026-06-12
OpenAI introduces three Academy courses that help people build practical AI skills, create repeatable workflows, and apply agents in everyday work....
#agent
Company News OpenAI 2026-06-12
Preply uses OpenAI to launch AI-generated lesson summaries, providing personalised feedback and language learning exercises....
Company News OpenAI 2026-06-11
Learn how BBVA scaled ChatGPT Enterprise to 100,000 employees and partnered with OpenAI to accelerate AI-powered banking transformation worldwide....
Paper Deep Dive arXiv 2026-06-11
Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between the...
#agent#coding#benchmark
Paper Deep Dive arXiv 2026-06-11
Agent systems are advancing quickly across domains, but their evaluation remains fragmented. Most benchmarks rely on fixed, LLM-centric harnesses that require heavy integration, create test-production mismatch, and limit fair comparison across diverse agent designs. The root problem is the lack of a...
#agent#benchmark#coding#llm
Paper Deep Dive arXiv 2026-06-11
Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event gene...
#agent#llm
Company News OpenAI 2026-06-11
OpenAI plans to acquire Ona to expand Codex with secure, persistent cloud environments, enabling long-running AI agents across enterprise workflows....
#agent
Company News OpenAI 2026-06-11
OpenAI supports the EU Code of Practice on AI content transparency, advancing provenance standards and tools to help people understand AI-generated content....
Company News Google Research 2026-06-12
Health & Bioscience...
Tool Google Research 2026-06-12
Climate & Sustainability...