Peng Wang

stillarrow · https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a model 4 days ago

opencompass/CompassVerifier-3B

upvoted an article 5 days ago

Open Responses: What you need to know

upvoted a paper 5 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Organizations

None yet

upvoted an article 5 days ago

Open Responses: What you need to know

Article • 14 days ago • 99

upvoted a paper 5 days ago

SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines

Paper • 2502.14739 • Published Feb 20, 2025 • 107

upvoted a paper 7 days ago

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published 16 days ago • 145

upvoted a paper 8 days ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 37

upvoted a collection 12 days ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 181

upvoted an article about 1 month ago

From GRPO to DAPO and GSPO: What, Why, and How

Article • Aug 9, 2025 • 78

upvoted an article about 2 months ago

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Article • Dec 9, 2022 • 394

upvoted a paper 2 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 133

upvoted 2 papers 4 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

upvoted a collection 4 months ago

Qwen3-VL

37 items • Updated 29 days ago • 608

upvoted 2 papers 4 months ago

VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models

Paper • 2509.19803 • Published Sep 24, 2025 • 120

Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published Aug 11, 2025 • 50

upvoted a collection 6 months ago

FastCuRL

The collection for the Paper "Curriculum Reinforcement Learning with Stage-wise Context Scaling for Efficient Training R1-like Reasoning Models" • 6 items • Updated May 29, 2025 • 3

upvoted a collection 7 months ago

"Physics of Language Models" series

7 items • Updated Dec 22, 2025 • 53

upvoted a paper 7 months ago

Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9, 2025 • 263

upvoted a collection 7 months ago

Tool-Star

Tool-Star is a reinforcement learning-based framework designed to empower LLMs to autonomously invoke multiple external tools during stepwise reasoning… • 8 items • Updated Sep 2, 2025 • 5

upvoted a paper 8 months ago

Understanding R1-Zero-Like Training: A Critical Perspective

Paper • 2503.20783 • Published Mar 26, 2025 • 59

upvoted 2 papers 9 months ago

DeepCritic: Deliberate Critique with Large Language Models

Paper • 2505.00662 • Published May 1, 2025 • 54

A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce

Paper • 2504.11343 • Published Apr 15, 2025 • 19