Augusteinia's Collections
RL thinking
J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper
•
2505.10320
•
Published
•
24
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Paper
•
2505.09343
•
Published
•
74
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models
Paper
•
2505.10554
•
Published
•
120
Scaling Reasoning can Improve Factuality in Large Language Models
Paper
•
2505.11140
•
Published
•
7
Chain-of-Model Learning for Language Model
Paper
•
2505.11820
•
Published
•
121
AdaptThink: Reasoning Models Can Learn When to Think
Paper
•
2505.13417
•
Published
•
83
Paper
•
2505.14674
•
Published
•
37
Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models
Paper
•
2505.14810
•
Published
•
62
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning
Paper
•
2505.15966
•
Published
•
53
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Paper
•
2505.16400
•
Published
•
35
GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning
Paper
•
2505.17022
•
Published
•
27
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper
•
2505.17667
•
Published
•
88
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper
•
2505.17612
•
Published
•
81
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
Paper
•
2505.17225
•
Published
•
64
Reinforcement Pre-Training
Paper
•
2506.08007
•
Published
•
263
ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Paper
•
2506.15211
•
Published
•
38