- zhouxiangxin/Variational-Reasoning-32B-Acc
  Text Generation • 33B • Updated • 2
- zhouxiangxin/Variational-Reasoning-32B-GML
  Text Generation • 33B • Updated • 2
- zhouxiangxin/Variational-Reasoning-8B-Acc
  Text Generation • 8B • Updated • 3
- zhouxiangxin/Variational-Reasoning-8B-GML
  Text Generation • 8B • Updated • 3
Xiangxin Zhou
zhouxiangxin
AI & ML interests
None yet
Recent Activity
authored a paper 1 day ago
Rethinking the Divergence Regularization in LLM RL
authored a paper 1 day ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
authored a paper 1 day ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning