Jet-RL: Enabling On-Policy FP8 Reinforcement Learning with Unified Training and Rollout Precision Flow Paper • 2601.14243 • Published 7 days ago
LLM-in-Sandbox Elicits General Agentic Intelligence Paper • 2601.16206 • Published 5 days ago
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025