Paper Deep Dive | arXiv | Published: 2026-05-05

MOSAIC-Bench: Measuring Compositional Vulnerability Induction in Coding Agents

Authors: Jonathan Steinberg, Oren Gal

Abstract

Coding agents often pass per-prompt safety review yet ship exploitable code when their tasks are decomposed into routine engineering tickets. The challenge is structural: existing safety alignment evaluates overt requests in isolation, leaving models blind to malicious end-states that emerge from se…

#agent #coding #benchmark #alignment #speech

Articles in the same category

Claw-SWE-Bench: A Benchmark for Evaluating OpenClaw-style Agent Harnesses on Coding Tasks

On-Policy Self-Evolution via Failure Trajectories for Agentic Safety Alignment

World-R1: Aligning 3D Constraints in Text-to-Video Generation via Reinforcement Learning