Visually Rich Documents
Visually rich documents (VRDs) combine diverse elements such as text, images, tables, and charts, making automated information extraction a significant challenge. Current research focuses on robust multimodal models, often built on transformer architectures and graph neural networks, that integrate visual and textual information while addressing problems such as layout understanding and reading-order prediction, with the goal of improving extraction accuracy and efficiency. This field is crucial for advancing document understanding across domains, with applications ranging from scientific literature analysis to business process automation.
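To make the text-plus-layout-plus-vision fusion concrete, below is a minimal PyTorch sketch of a layout-aware transformer encoder in the general spirit of LayoutLM-style models. It is not the architecture of any specific paper listed here; all module names, dimensions, and the additive fusion scheme are illustrative assumptions.

```python
# Minimal, illustrative sketch of multimodal fusion for VRD understanding.
# Assumptions (not from any specific paper): additive fusion of token,
# 2D-layout, and visual embeddings; 512-dim per-token visual features.
import torch
import torch.nn as nn

class VRDEncoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4, n_layers=2,
                 coord_bins=1000, vis_dim=512, num_labels=5):
        super().__init__()
        # Token embeddings for the OCR'd words.
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        # 2D layout embeddings: each bounding-box coordinate (x0, y0, x1, y1),
        # discretized into [0, coord_bins), gets looked up in a shared table
        # per axis, so the model can reason about spatial position.
        self.x_emb = nn.Embedding(coord_bins, d_model)
        self.y_emb = nn.Embedding(coord_bins, d_model)
        # Projection for per-token visual features (e.g., pooled CNN features
        # of each word's image crop); vis_dim is an assumed input size.
        self.vis_proj = nn.Linear(vis_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        # Token-level head, e.g., for semantic entity recognition (BIO tags).
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, token_ids, boxes, vis_feats):
        # token_ids: (B, T) int64; boxes: (B, T, 4) int64 in [0, coord_bins);
        # vis_feats: (B, T, vis_dim) float.
        h = (self.tok_emb(token_ids)
             + self.x_emb(boxes[..., 0]) + self.y_emb(boxes[..., 1])
             + self.x_emb(boxes[..., 2]) + self.y_emb(boxes[..., 3])
             + self.vis_proj(vis_feats))
        h = self.encoder(h)        # joint text/layout/vision contextualization
        return self.classifier(h)  # (B, T, num_labels) entity logits

# Toy usage with random inputs.
B, T = 2, 8
model = VRDEncoder()
logits = model(
    torch.randint(0, 30522, (B, T)),
    torch.randint(0, 1000, (B, T, 4)),
    torch.randn(B, T, 512),
)
print(logits.shape)  # torch.Size([2, 8, 5])
```

Summing the embeddings keeps the sketch simple; published systems also explore cross-attention between modalities or graph neural networks over box adjacency, but the core idea of jointly contextualizing text and layout signals is the same.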
Papers
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang, Qingxuan Wang, Yue Li, Changqing Wang, Chenhui Chu, Rui Wang
Vision-Enhanced Semantic Entity Recognition in Document Images via Visually-Asymmetric Consistency Learning
Hao Wang, Xiahua Chen, Rui Wang, Chenhui Chu