Raywithyou (Ray)

upvoted 2 papers 5 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26, 2025 • 158

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24, 2025 • 316

upvoted 2 papers 6 months ago

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Paper • 2507.01006 • Published Jul 1, 2025 • 249

SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks

Paper • 2507.01001 • Published Jul 1, 2025 • 46

upvoted 7 papers 7 months ago

LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Paper • 2506.11928 • Published Jun 13, 2025 • 24

SWE-Flow: Synthesizing Software Engineering Data in a Test-Driven Manner

Paper • 2506.09003 • Published Jun 10, 2025 • 18

AceReason-Nemotron 1.1: Advancing Math and Code Reasoning through SFT and RL Synergy

Paper • 2506.13284 • Published Jun 16, 2025 • 26

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published Jun 2, 2025 • 187

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16, 2025 • 273

Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure

Paper • 2506.12278 • Published Jun 13, 2025 • 16

Table-R1: Inference-Time Scaling for Table Reasoning

Paper • 2505.23621 • Published May 29, 2025 • 93

upvoted an article 10 months ago

Open R1: Update #3

Article • Mar 11, 2025 • 296

upvoted a paper 12 months ago

MMVU: Measuring Expert-Level Multi-Discipline Video Understanding

Paper • 2501.12380 • Published Jan 21, 2025 • 84
