K-BrowseComp: A Web Browsing Agent Benchmark Grounded in Korean Contexts Paper • 2606.02404 • Published 3 days ago • 50
OmniRetrieval: Unified Retrieval across Heterogeneous Knowledge Sources Paper • 2605.29250 • Published 7 days ago • 74
AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation Paper • 2605.28655 • Published 8 days ago • 11
ESC-Skills: Discovering and Self-Evolving Skills for Emotional Support Conversations Paper • 2605.27908 • Published 8 days ago • 6
Learn from Weaknesses: Automated Domain Specialization for Small Computer-Use Agents Paper • 2605.28775 • Published 8 days ago • 37
ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence Paper • 2605.26340 • Published 10 days ago • 35
ResearchMath-14K: Scaling Research-Level Mathematics via Agents Paper • 2605.28003 • Published 8 days ago • 49
Gamma-World: Generative Multi-Agent World Modeling Beyond Two Players Paper • 2605.28816 • Published 8 days ago • 419
Agent Explorative Policy Optimization for Multimodal Agentic Reasoning Paper • 2605.28774 • Published 8 days ago • 86
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation Paper • 2605.28293 • Published 8 days ago • 87
CUA-Gym: Scaling Verifiable Training Environments and Tasks for Computer-Use Agents Paper • 2605.25624 • Published 10 days ago • 32
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 16 days ago • 104
Self-Improving CAD Generation Agents with Finite Element Analysis as Feedback Paper • 2605.17448 • Published 18 days ago • 19
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models Paper • 2605.23345 • Published 13 days ago • 17
AutoResearch AI: Towards AI-Powered Research Automation for Scientific Discovery Paper • 2605.23204 • Published 13 days ago • 29
Evolution Fine-Tuning Collection Internalizing Discovery Capability into LLM • 5 items • Updated 12 days ago