TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar • Paper • 2510.14972 • Published Oct 16, 2025 • 34
DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search • Paper • 2509.25454 • Published Sep 29, 2025 • 141
The Invisible Leash: Why RLVR May Not Escape Its Origin • Paper • 2507.14843 • Published Jul 20, 2025 • 85
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models • Paper • 2410.13370 • Published Oct 17, 2024 • 37