Document Understanding

Document understanding aims to enable computers to comprehend the content and structure of documents, including their text, images, and layout, so that they can extract key information and answer questions about them. Current research focuses on improving the efficiency and accuracy of multimodal large language models (MLLMs) for this task, often using techniques such as knowledge distillation, synthetic data generation, and efficient visual processing to handle high-resolution and long-context documents. These advances matter because they improve information retrieval, automate document processing tasks, and address privacy concerns through techniques such as machine unlearning, with applications in fields ranging from healthcare to finance.

Papers

May 19, 2023

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Mingliang Zhai, Yulin Li, Xiameng Qin, Chen Yi, Qunyi Xie, Chengquan Zhang, Kun Yao, Yuwei Wu, Yunde Jia
Document Understanding, Token Merging

May 16, 2023

May 15, 2023

M⁶Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis
Hiuyi Cheng, Peirong Zhang, Sihang Wu, Jiaxin Zhang, Qiyuan Zhu, Zecheng Xie, Jing Li, Kai Ding, Lianwen Jin
Multi-Label, Document Understanding, Document Layout Analysis, Large-Scale, Multimodal, Multi-Type

April 28, 2023

CCpdf: Building a High Quality Corpus for Visually Rich Documents from Web Crawl Data
Michał Turski, Tomasz Stanisławek, Karol Kaczmarek, Paweł Dyda, Filip Graliński
Language Model, Large Corpus, Multilingual Language Model, Document Understanding, Visually Rich Document, PDF Document, Quality Corpus, Web-Crawled Data

April 17, 2023

What Makes a Good Dataset for Symbol Description Reading?
Karol Lynch, Joern Ploennigs, Bradley Eck
High Quality, Document Understanding, Mathematical Formula, Symbol Detection, Sufficient Representation

April 13, 2023

PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding, Siwen Luo, Hyunsuk Chung, Soyeon Caren Han
Visual Question Answering, Document Understanding, PDF Document, VQA Datasets, OK-VQA

March 1, 2023

StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Yuechen Yu, Yulin Li, Chengquan Zhang, Xiaoqiang Zhang, Zengyuan Guo, Xiameng Qin, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang
Document Understanding, Document Image, Mask-Guided, Text Feature

December 6, 2022

Multimodal Tree Decoder for Table of Contents Extraction in Document Images
Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Jun Du, Jiajia Wu
Document Understanding, Table Semantics, Document Image, Tree Decoder, Tree Edit Distance

November 27, 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding
Zilong Wang, Jiuxiang Gu, Chris Tensmeyer, Nikolaos Barmpalios, Ani Nenkova, Tong Sun, Jingbo Shang, Vlad I. Morariu
Document Understanding, Multi-Granularity, Granularity Attention, Document Intelligence, Multi-Granular Feature

November 14, 2022

QueryForm: A Simple Zero-shot Form Entity Query Framework
Zifeng Wang, Zizhao Zhang, Jacob Devlin, Chen-Yu Lee, Guolong Su, Hao Zhang, Jennifer Dy, Vincent Perot, Tomas Pfister
Query Information, Document Understanding, Entity Extraction, Form-Like Document, Pre-Training Task, Zero-Shot, Text-to-SQL

November 11, 2022

Unimodal and Multimodal Representation Training for Relation Extraction
Ciaran Cooney, Rachel Heyburn, Liam Madigan, Mairead O'Cuinn, Chloe Thompson, Joana Cavadas
Relation Extraction, Document Understanding, Unimodal Model, Joint Representation, Multimodal Chart, Multimodal Integration

November 7, 2022

On Web-based Visual Corpus Construction for Visual Document Understanding
Donghyun Kim, Teakgyu Hong, Moonbin Yim, Yoonsik Kim, Geewook Kim
Document Understanding, Image Corpus

October 6, 2022

XDoc: Unified Pre-training for Cross-Format Document Understanding
Jingye Chen, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
Pre-Trained Model, Document Understanding, Unified Pre-Training, Diverse Document

September 26, 2022

Improving Document Image Understanding with Reinforcement Finetuning
Bao-Sinh Nguyen, Dung Tien Le, Hieu M. Vu, Tuan Anh D. Nguyen, Minh-Tien Nguyen, Hung Le
Reinforcement Learning, Training Data, Fine-Tuning, Policy Gradient, Document Understanding, Open Information Extraction

September 18, 2022

ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang, Zhengjie Huang, Bin Luo, Qianglong Chen, Qiming Peng, Yinxu Pan, Weichong Yin, Shikun Feng, Yu Sun, Dianhai Yu, Yin Zhang
Document Understanding, Visually Rich Document, ERNIE-ViLG

September 12, 2022

One-Shot Doc Snippet Detection: Powering Search in Document Beyond Text
Abhinav Java, Shripad Deshmukh, Milan Aggarwal, Surgan Jandial, Mausoom Sarkar, Balaji Krishnamurthy
Text Modality, Document Understanding, Structured Document, Full-Length Document, Snippet Extraction, Document Summary Pair

August 23, 2022

Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks
Andrea Gemelli, Sanket Biswas, Enrico Civitelli, Josep Lladós, Simone Marinai
Graph Neural Network, Document Understanding, Document Analysis

August 17, 2022