2026-05-06

17件

← アーカイブ一覧

論文 深掘り Hugging Face 2026-05-04 HF ↑4

PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination

Patent examination is a complex, multi-stage process requiring both technical expertise and legal reasoning, increasingly challenged by rising application volumes. Prior benchmarks predominantly view patent examination as discriminative classification or static extraction, failing to capture its inh...

#benchmark#llm
モデル OpenAI 2026-05-06

Introducing ChatGPT Futures: Class of 2026

Meet the ChatGPT Futures Class of 2026—26 student innovators using AI to build, research, and drive real-world impact. Discover how this generation is redefining learning, creativity, and opportunity with ChatGPT....

論文 Hugging Face 2026-05-04 HF ↑1

A Benchmark for Interactive World Models with a Unified Action Generation Framework

Achieving Artificial General Intelligence (AGI) requires agents that learn and interact adaptively, with interactive world models providing scalable environments for perception, reasoning, and action. Yet current research still lacks large-scale datasets and unified benchmarks to evaluate their phys...

#benchmark#agent
論文 Hugging Face 2026-05-04 HF ↑2

SymptomAI: Towards a Conversational AI Agent for Everyday Symptom Assessment

Language models excel at diagnostic assessments on currated medical case-studies and vignettes, performing on par with, or better than, clinical professionals. However, existing studies focus on complex scenarios with rich context making it difficult to draw conclusions about how these systems perfo...

#agent#llm#benchmark
論文 arXiv 2026-05-05

Safety and accuracy follow different scaling laws in clinical large language models

Clinical LLMs are often scaled by increasing model size, context length, retrieval complexity, or inference-time compute, with the implicit expectation that higher accuracy implies safer behavior. This assumption is incomplete in medicine, where a few confident, high-risk, or evidence-contradicting ...

#alignment#llm#rag#agent#benchmark
論文 深掘り arXiv 2026-05-05

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing safety alignment evaluates overt requests in isolation, leaving models blind to malicious end-states that emerge from se...

#agent#coding#benchmark#alignment#speech
論文 深掘り arXiv 2026-05-05

TabSurv: Adapting Modern Tabular Neural Networks to Survival Analysis

Survival analysis on tabular data is a well-studied problem. However, existing deep learning methods are often highly task-specific, which can limit the transfer of new approaches from other domains and introduce constraints that may affect performance. We propose TabSurv, an approach that adapts mo...

#benchmark
企業動向 OpenAI 2026-05-05

New ways to buy ChatGPT ads

OpenAI expands ChatGPT ads with a beta self-serve Ads Manager, CPC bidding, and enhanced measurement tools—built to protect privacy and keep conversations separate from ads....

企業動向 Microsoft Research 2026-05-05

Microsoft at NSDI 2026: Advances in large-scale networked systems

Microsoft researchers share advances in building and operating large-scale distributed systems, spanning datacenters, networking, and the growing intersection with AI during NSDI ’26. The post Microsoft at NSDI 2026: Advances in large-scale networked systems appeared first on Microsoft Research ....