Visually Rich Document
Visually rich documents (VRDs) combine text, images, tables, and charts, which makes automated information extraction difficult. Current research focuses on multimodal models, often built on transformer architectures and graph neural networks, that integrate visual and textual information. These models target problems such as layout understanding and reading order prediction in order to improve extraction accuracy and efficiency. Progress in this field matters for document understanding in domains ranging from scientific literature analysis to business process automation.
Papers
CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
Michał Turski, Tomasz Stanisławek, Karol Kaczmarek, Paweł Dyda, Filip Graliński
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
Seif Laatiri, Pirashanth Ratnamogan, Joel Tang, Laurent Lam, William Vanhuffel, Fabien Caspani