Multi-Crit: Benchmarking Multimodal Judges on Pluralistic Criteria-Following Paper • 2511.21662 • Published Nov 26, 2025 • 11
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation Paper • 2511.01163 • Published Nov 3, 2025 • 31
SPICE: Self-Play In Corpus Environments Improves Reasoning Paper • 2510.24684 • Published Oct 28, 2025 • 17
BigCodeArena: Unveiling More Reliable Human Preferences in Code Generation via Execution Paper • 2510.08697 • Published Oct 9, 2025 • 36
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training Paper • 2509.23661 • Published Sep 28, 2025 • 47
The Era of Real-World Human Interaction: RL from User Conversations Paper • 2509.25137 • Published Sep 29, 2025 • 18
Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play Paper • 2509.25541 • Published Sep 29, 2025 • 140
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31, 2025 • 84
What makes Reasoning Models Different? Follow the Reasoning Leader for Efficient Decoding Paper • 2506.06998 • Published Jun 8, 2025 • 1
CaughtCheating: Is Your MLLM a Good Cheating Detective? Exploring the Boundary of Visual Perception and Reasoning Paper • 2507.00045 • Published Jun 23, 2025 • 1
LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model Paper • 2509.00676 • Published Aug 31, 2025 • 84
Self-Rewarding Vision-Language Model via Reasoning Decomposition Paper • 2508.19652 • Published Aug 27, 2025 • 84
The Trickle-down Impact of Reward (In-)consistency on RLHF Paper • 2309.16155 • Published Sep 28, 2023 • 1
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning Paper • 2407.00617 • Published Jun 30, 2024 • 7
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs Paper • 2412.21187 • Published Dec 30, 2024 • 40
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving Paper • 2412.20735 • Published Dec 30, 2024 • 11