Document Image Binarization
Document image binarization converts grayscale or color document images into binary images that separate foreground text from the background. It is a crucial preprocessing step for optical character recognition (OCR) and other document analysis tasks. Recent research relies heavily on deep learning, particularly generative adversarial networks (GANs) and vision transformers (ViTs), often combined with techniques such as wavelet transforms and multi-scale feature extraction to handle diverse degradations and improve efficiency. These advances yield more accurate and robust binarization, which benefits historical document preservation and large-scale text digitization by enabling better automated processing of challenging document images.
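To make the task concrete, the sketch below shows what binarization produces, using Otsu's global thresholding. This is a classical baseline and not the method of any paper listed here; the deep learning approaches described above replace this single global threshold with a learned per-pixel prediction. The function names `otsu_threshold` and `binarize` are hypothetical, and the input is assumed to be an 8-bit grayscale NumPy array with dark text on a lighter background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the gray level in [0, 255] that maximizes between-class variance.

    Assumes `gray` is a uint8 array (hypothetical helper for illustration).
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()

    omega = np.cumsum(prob)                   # probability of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean up to level t
    mu_total = mu[-1]

    # Between-class variance for every candidate threshold t.
    numerator = (mu_total * omega - mu) ** 2
    denominator = omega * (1.0 - omega)
    sigma_b = np.divide(
        numerator,
        denominator,
        out=np.zeros_like(numerator),
        where=denominator > 0,                # skip thresholds with an empty class
    )
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray) -> np.ndarray:
    """Map pixels above the Otsu threshold to 255 (background) and the rest to 0 (text)."""
    t = otsu_threshold(gray)
    return np.where(gray > t, 255, 0).astype(np.uint8)


# Synthetic example: a light page (200) with a dark "stroke" (40) across the middle.
page = np.full((8, 8), 200, dtype=np.uint8)
page[3:5, :] = 40
binary = binarize(page)
```

A single global threshold like this tends to fail on stains, bleed-through, and uneven illumination, which is the kind of degradation the learned methods above are designed to handle.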
Papers
A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement
Risab Biswas, Swalpa Kumar Roy, Umapada Pal
DocBinFormer: A Two-Level Transformer Network for Effective Document Image Binarization
Risab Biswas, Swalpa Kumar Roy, Ning Wang, Umapada Pal, Guang-Bin Huang