Learning Adaptive Reasoning Paths for Efficient Visual Reasoning
Abstract
Adaptive visual reasoning framework reduces unnecessary computation by dynamically selecting optimal reasoning formats while maintaining accuracy.
Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any tasks. We attribute this issue to Reasoning Path Redundancy in visual reasoning: many visual questions do not require the full reasoning process. To address this, we propose AVR, an adaptive visual reasoning framework that decomposes visual reasoning into three cognitive functions: visual perception, logical reasoning, and answer application. It further enables models to dynamically choose among three response formats: Full Format, Perception-Only Format, and Direct Answer. AVR is trained with FS-GRPO, an adaptation of Group Relative Policy Optimization that encourages the model to select the most efficient reasoning format while preserving correctness. Experiments on multiple vision-language benchmarks show that AVR reduces token usage by 50--90\% while maintaining overall accuracy, especially in perception-intensive tasks. These results demonstrate that adaptive visual reasoning can effectively mitigate overthinking in VRMs. Code and data are available at: https://github.com/RunRiotComeOn/AVR.
Community
We identify reasoning path redundancy in visual reasoning models and propose AVR, which adaptively selects between full reasoning, perception-only, and direct answering. This reduces token usage by 50–90% while maintaining or improving accuracy.
Get this paper in your agent:
hf papers read 2604.14568 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper
Collections including this paper 0
No Collection including this paper