Document Understanding

Document understanding aims to enable computers to comprehend the content and structure of documents, including their text, images, and layout, in order to extract key information and answer questions. Current research focuses on improving the efficiency and accuracy of multimodal large language models (MLLMs) for this task, often through knowledge distillation, synthetic data generation, and efficient visual processing for high-resolution and long-context documents. These advances improve information retrieval, automate document processing, and address privacy concerns through techniques such as machine unlearning, with applications in fields ranging from healthcare to finance.

Papers

June 14, 2024

Enhancing Question Answering on Charts Through Effective Pre-training Tasks
Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang, Shalin Shah
New Task Document Understanding QA Datasets Chart Related Visual Knowledge VQA System

June 12, 2024

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas
Knowledge Distillation Document Understanding Visually Rich Document Document Layout Analysis

May 28, 2024

Notes on Applicability of GPT-4 to Document Understanding
Łukasz Borchmann
GPT 4 Document Understanding Short Note Applicability Study OCR Engine GPT 4 Vision

May 23, 2024

Focus Anywhere for Fine-grained Multi-page Document Understanding
Chenglong Liu, Haoran Wei, Jinyue Chen, Lingyu Kong, Zheng Ge, Zining Zhu, Liang Zhao, Jianjian Sun, Chunrui Han, Xiangyu Zhang
Fine Grained Document Understanding Selective Focus

May 6, 2024

GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas, Carlos Boned, Josep Lladós, Sanket Biswas
Link Prediction Graph Attention Network Document Understanding Edge Learning Document Analysis Entity Recognition Performance

May 1, 2024

CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto, Youngmin Baek, Geewook Kim, Ryota Nakao, DongHyun Kim, Moon Bin Yim, Seunghyun Park, Bado Lee
Document Understanding Gallery Style OCR Sequence Generation Model OCR Annotation

April 29, 2024

Machine Unlearning for Document Classification
Lei Kang, Mohamed Ali Souibgui, Fei Yang, Lluis Gomez, Ernest Valveny, Dimosthenis Karatzas
Machine Unlearning Document Understanding Document Classification Document Image Analysis Text Classification Problem

April 16, 2024

A LayoutLMv3-Based Model for Enhanced Relation Extraction in Visually-Rich Documents
Wiam Adnan, Joel Tang, Yassine Bel Khayat Zouggari, Seif Edinne Laatiri, Laurent Lam, Fabien Caspani
Relation Extraction Document Understanding Visually Rich Document Key Information Extraction

April 10, 2024

HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao, Xinghua Jiang, Xin Li, Yinsong Liu, Deqiang Jiang, Xing Sun, Linli Xu
Document Understanding Visually Rich Document Visual Expert Long Text Modeling

April 8, 2024

LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo, Yufan Shen, Zhaoqing Zhu, Qi Zheng, Zhi Yu, Cong Yao
Instruction Tuning Document Understanding Document Understanding Task Layout Representation Learning

April 5, 2024

BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah
NLP Task Document Understanding Entity Extraction Information Extraction Task

March 25, 2024

Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao, Haoli Bai, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu, Kam-Fai Wong
Document Understanding Generative Pre Training Document Classification Document Intelligence Generative Layout

March 21, 2024

LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding
Masato Fujitake
Large Language Model Document Understanding Visually Rich Document Document Analysis Document Image Classification

March 19, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Anwen Hu, Haiyang Xu, Jiabo Ye, Ming Yan, Liang Zhang, Bo Zhang, Chen Li, Ji Zhang, Qin Jin, Fei Huang, Jingren Zhou
Document Understanding Structure Learning Visually Rich Document Unified Learning OCR Free

February 29, 2024

Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang, Zhihao Guo, Mingming Gong, Haoyu Cao, Yinsong Liu, Deqiang Jiang, Xing Sun
Contrastive Learning Fine Grained Visual Language Model Document Understanding

February 15, 2024

LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott, Yves-Noel Weweler, Adrian Ulges, Faisal Shafait, Dirk Krechel, Darko Obradovic
Document Understanding Document Relevance Multi Modal Transformer LLM Generated Text Document transFormer

February 5, 2024

Financial Report Chunking for Effective Retrieval Augmented Generation
Antonio Jimeno Yepes, Yao You, Jan Milczek, Sebastian Laverde, Renyu Li
Retrieval Augmented Generation Document Understanding Financial Report Paragraph Level

January 24, 2024

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka, Taichi Iki, Kyosuke Nishida, Kuniko Saito, Jun Suzuki
Zero Shot Data Set Multimodal Large Language Model Human Instruction Document Understanding MPT 7b Instruct Reading System