zengxiangji's Collections: reinforcement-learning
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning
Paper • arXiv:2506.04207
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models
Paper • arXiv:2504.11468
RLPR: Extrapolating RLVR to General Domains without Verifiers
Paper • arXiv:2506.18254
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Paper • arXiv:2507.05255
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning
Paper • arXiv:2507.14137
Scaling RL to Long Videos
Paper • arXiv:2507.07966
Group Sequence Policy Optimization
Paper • arXiv:2507.18071
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning
Paper • arXiv:2507.01006
Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
Paper • arXiv:2509.08721
A Survey of Reinforcement Learning for Large Reasoning Models
Paper • arXiv:2509.08827
Agent Learning via Early Experience
Paper • arXiv:2510.08558
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
Paper • arXiv:2509.25541