Document Classification
Document classification aims to automatically assign text documents to predefined classes, making information retrieval and analysis more efficient. Current research focuses on improving both accuracy and efficiency. Key techniques include lightweight supervised learning for processing large datasets quickly, transformer-based models that use modified attention mechanisms and pruning strategies to handle long documents, and multimodal approaches that combine textual and visual information. These advances support applications such as digital forensics, medical record analysis, and countering misinformation campaigns, where they enable faster, more accurate, and privacy-preserving document processing.
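The idea of lightweight supervised learning can be illustrated with a minimal sketch. It is not the method of either paper listed below. Multinomial Naive Bayes with add-one (Laplace) smoothing is used here as a classic lightweight baseline: training reduces to counting words per class, and prediction reduces to summing log-probabilities. The helper names (tokenize, NaiveBayesClassifier) and the toy "medical"/"finance" training documents are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, deliberately simple tokenizer: lowercase, keep alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Illustrative multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)       # documents per class (for priors)
        self.word_counts = defaultdict(Counter)   # class -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total token count per class, used in the smoothed likelihood denominator.
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Work in log space so long documents do not underflow to zero.
            score = math.log(count / n_docs)
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # words never seen in training carry no evidence here
                numerator = self.word_counts[label][tok] + 1
                denominator = self.total_words[label] + v
                score += math.log(numerator / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy usage with hypothetical data.
train_docs = [
    "patient diagnosis blood pressure medication",
    "prescribed medication for chronic patient symptoms",
    "invoice total amount due payment",
    "payment received for invoice number",
]
train_labels = ["medical", "medical", "finance", "finance"]

clf = NaiveBayesClassifier().fit(train_docs, train_labels)
print(clf.predict("patient medication dosage"))  # prints: medical
print(clf.predict("invoice payment due"))        # prints: finance
```

Because fitting is a single counting pass over the corpus, this kind of model scales to large document collections cheaply. The trade-off is that it ignores word order and layout, which the transformer-based and multimodal approaches mentioned above are designed to capture.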
Papers
Enhancing Document Information Analysis with Multi-Task Pre-training: A Robust Approach for Information Extraction in Visually-Rich Documents
Tofik Ali, Partha Pratim Roy
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma, Siddharth Varia, Nishant Sankaran, Srikar Appalaraju, Bonan Min, Yogarshi Vyas