The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs • Paper 2509.09677 • Published Sep 11, 2025
MCP-AgentBench: Evaluating Real-World Language Agent Performance with MCP-Mediated Tools • Paper 2509.09734 • Published Sep 10, 2025
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training • Paper 2501.11425 • Published Jan 20, 2025