RewardHarness: Self-Evolving Agentic Post-Training Paper • 2605.08703 • Published 15 days ago • 9
ClawBench — Browser Agent Benchmark Suite Collection Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated 12 days ago • 1
ClawBench: Can AI Agents Complete Everyday Online Tasks? Paper • 2604.08523 • Published Apr 9 • 263
Watch Before You Answer: Learning from Visually Grounded Post-Training Paper • 2604.05117 • Published Apr 6 • 36