PA-BDM: Prefix-Adaptive Block Diffusion for Efficient Document Recognition

Efficient Document Recognition with Prefix-Adaptive Block Diffusion

Mingxu Chai, Ziyu Shen, Chenyu Liu, Kaidi Zhang, Jiazheng Zhang, Dingwei Zhu, Zhiheng Xi, Ruoyu Chen, Jun Long, Jihua Kang, Tao Gui, Qi Zhang


📰 News

  • [2026.05] 🎉 We release PA-BDM, a prefix-adaptive block diffusion framework for efficient document recognition.

📄 Introduction

Document recognition aims to convert document images containing text, formulas, tables, and complex layouts into structured machine-readable formats. While autoregressive vision-language models have achieved strong recognition quality, their sequential decoding process can be inefficient for long structured outputs. Block diffusion models provide a promising alternative by enabling semi-parallel generation and KV-cache reuse, but existing block diffusion approaches often rely on a fixed block granularity, which limits decoding flexibility and may introduce instability for structure-sensitive recognition tasks.

PA-BDM addresses these limitations with a prefix-adaptive block diffusion framework. Instead of treating the block size as a fixed generation unit, PA-BDM uses it as a maximum candidate generation range and dynamically commits reliable prefixes during decoding. This design enables adaptive generation lengths, timely KV-cache reuse, and more stable recognition of structured document outputs.
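The prefix-commit idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `reliable_prefix_length`, the per-token confidence scores, the fixed `threshold`, and the rule of always committing at least one token are all assumptions made for illustration.

```python
from typing import Sequence


def reliable_prefix_length(confidences: Sequence[float], threshold: float = 0.9) -> int:
    """Return how many leading tokens of a candidate block to commit.

    Hypothetical rule: commit the longest leading run whose confidences all
    meet `threshold`. For a non-empty block, commit at least one token so
    that decoding always makes progress.
    """
    if not confidences:
        return 0
    length = 0
    for conf in confidences:
        if conf < threshold:
            break  # the first unreliable position ends the committed prefix
        length += 1
    return max(length, 1)


# The block size (4 here) is only an upper bound on what gets committed.
block_confidences = [0.99, 0.95, 0.40, 0.97]
committed = reliable_prefix_length(block_confidences, threshold=0.9)
```

In this sketch the high-confidence token at the fourth position is not committed, because it follows an unreliable position; only a contiguous prefix can be appended to the cached context.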

✨ Highlights

  • Prefix-Adaptive Decoding: Dynamically commits reliable prefixes within each candidate block, allowing the effective decoding length to adapt to local prediction confidence.

  • Efficient KV-cache Reuse: Enables timely cache updates without waiting for an entire fixed block to be fully resolved.

  • Structure-sensitive Document Recognition: Designed for document recognition tasks involving text, formulas, tables, and structured outputs.

  • Improved Efficiency-Accuracy Trade-off: Achieves faster inference while maintaining strong recognition performance across document recognition benchmarks.
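How the first two highlights interact can be sketched as a toy decoding loop. Everything here is hypothetical and only stands in for the real model: `propose` plays the role of one denoising pass over a candidate block, the `committed` list stands in for tokens whose KV entries are already cached, and the toy confidences are fixed rather than re-estimated at each step as a real model would do.

```python
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]  # (token, confidence)


def prefix_adaptive_decode(
    propose: Callable[[List[str], int], List[Candidate]],
    max_block: int,
    threshold: float,
    max_steps: int = 100,
) -> Tuple[List[str], List[int]]:
    """Toy loop: propose up to `max_block` tokens, commit only a reliable prefix."""
    committed: List[str] = []     # stand-in for the cached (KV-reusable) context
    commit_sizes: List[int] = []  # number of tokens committed at each step
    for _ in range(max_steps):
        block = propose(committed, max_block)
        if not block:
            break  # nothing left to generate
        n = 0
        for _, conf in block:
            if conf < threshold:
                break
            n += 1
        n = max(n, 1)  # assumption: always commit at least one token
        committed.extend(tok for tok, _ in block[:n])
        commit_sizes.append(n)
    return committed, commit_sizes


def make_toy_proposer(target: List[str], confs: List[float]):
    """Hypothetical proposer that reads from a fixed target with fixed confidences."""
    def propose(prefix: List[str], n: int) -> List[Candidate]:
        start = len(prefix)
        return list(zip(target[start:start + n], confs[start:start + n]))
    return propose


tokens, sizes = prefix_adaptive_decode(
    make_toy_proposer(list("abcdef"), [0.99, 0.95, 0.5, 0.99, 0.99, 0.3]),
    max_block=4,
    threshold=0.9,
)
```

Under these toy confidences the committed lengths vary from step to step, and the cached context grows after every step instead of only after a full block of four tokens is resolved.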

🚀 Usage

Please refer to the repository for installation and inference instructions.

❀️ Acknowledgements

This project builds upon prior work and open-source resources including Qwen2.5-VL, DiffusionVL, BD3LMs, and related diffusion language modeling frameworks. We thank the authors for their valuable contributions to the community.

📝 Citation

If you find our work useful, please cite our paper:

@misc{chai2026prefixadaptiveblockdiffusionefficient,
      title={Prefix-Adaptive Block Diffusion for Efficient Document Recognition}, 
      author={Mingxu Chai and Ziyu Shen and Chenyu Liu and Kaidi Zhang and Jiazheng Zhang and Dingwei Zhu and Zhiheng Xi and Ruoyu Chen and Jun Long and Jihua Kang and Tao Gui and Qi Zhang},
      year={2026},
      eprint={2605.16861},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2605.16861}, 
}

Model details: 4B parameters, BF16 tensors, Safetensors format.