Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR Paper • 2509.02522 • Published Sep 2, 2025 • 25
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8, 2025 • 75
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published Jul 1, 2025 • 79
CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following Paper • 2506.12285 • Published Jun 14, 2025 • 54
SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner Paper • 2506.09003 • Published Jun 10, 2025 • 18
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values Paper • 2504.05535 • Published Apr 7, 2025 • 44