What Does It Take to Be a Good AI Research Agent? Studying the Role of Ideation Diversity Paper • 2511.15593 • Published Nov 19, 2025 • 57
CodeClash: Benchmarking Goal-Oriented Software Engineering Paper • 2511.00839 • Published Nov 2, 2025 • 9
One Life to Learn: Inferring Symbolic World Models for Stochastic Environments from Unguided Exploration Paper • 2510.12088 • Published Oct 14, 2025 • 4
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025 • 79
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29, 2025 • 140
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19, 2025 • 126
SpatialVID: A Large-Scale Video Dataset with Spatial Annotations Paper • 2509.09676 • Published Sep 11, 2025 • 33
DeepResearch Arena: The First Exam of LLMs' Research Abilities via Seminar-Grounded Tasks Paper • 2509.01396 • Published Sep 1, 2025 • 57
OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning Paper • 2509.01644 • Published Sep 1, 2025 • 33
Mobile-Agent-v3: Foundamental Agents for GUI Automation Paper • 2508.15144 • Published Aug 21, 2025 • 64