Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures Paper • 2510.24081 • Published Oct 28, 2025 • 18
Global PIQA Collection • A physical commonsense reasoning benchmark for 100+ languages, written in collaboration with 300+ researchers from 65 countries. • 2 items • Updated Nov 12, 2025 • 1
CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models Paper • 2512.10655 • Published 21 days ago • 8
AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published Nov 18, 2025 • 71
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published Oct 24, 2025 • 22