view article Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 Dec 9, 2022 • 389
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning Paper • 2509.25760 • Published Sep 30, 2025 • 55
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published Sep 24, 2025 • 120
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11, 2025 • 50
FastCuRL Collection The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29, 2025 • 3
Tool-Star Collection Tool-Star is a reinforcement learning-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasonin • 8 items • Updated Sep 2, 2025 • 5
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26, 2025 • 59
DeepCritic: Deliberate Critique with Large Language Models Paper • 2505.00662 • Published May 1, 2025 • 53
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce Paper • 2504.11343 • Published Apr 15, 2025 • 19
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency Jan 30, 2025 • 209