LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling Paper • 2511.20785 • Published Nov 25, 2025 • 181
Human-Agent Collaborative Paper-to-Page Crafting for Under $0.1 Paper • 2510.19600 • Published Oct 22, 2025 • 68
PEAR: Phase Entropy Aware Reward for Efficient Reasoning Paper • 2510.08026 • Published Oct 9, 2025 • 8
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23, 2025 • 56
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9, 2025 • 18