Document Transformer

Document Transformers are a class of deep learning models designed to understand and extract information from documents, addressing the limitations of traditional methods in handling complex layouts and multiple data modalities. Current research focuses on improving model architectures such as the Document Image Transformer (DiT), often incorporating self-supervised pre-training and techniques like layout-aware prompting to improve performance on tasks such as information extraction, classification, and question answering. These advances enable more efficient processing of large document collections in areas such as scientific literature analysis, historical research, and legal document processing. Robust, efficient document transformers could accelerate knowledge discovery and automate information extraction from diverse document sources.
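As a rough illustration of the layout-aware prompting idea mentioned above, the sketch below turns OCR words with positions into plain text that approximately preserves their spatial arrangement, then wraps that text in a question-answering prompt. It is a minimal sketch under stated assumptions, not the method of any paper listed here: the function names `layout_to_text` and `build_prompt`, the `(text, x, y)` input format with coordinates normalized to [0, 1], and the prompt wording are all hypothetical.

```python
from collections import defaultdict


def layout_to_text(words, width=60, row_height=0.05):
    """Render OCR words as text that roughly keeps their page positions.

    Assumed input: an iterable of (text, x, y) tuples, where x and y are the
    word's top-left corner normalized to [0, 1]. This format is hypothetical.
    """
    # Bucket words into text rows by their vertical position.
    rows = defaultdict(list)
    for text, x, y in words:
        rows[int(y / row_height)].append((x, text))

    lines = []
    for row in sorted(rows):
        line = ""
        # Place words left to right at a column proportional to x.
        for x, text in sorted(rows[row]):
            col = int(x * width)
            # If the target column is already occupied, fall back to a
            # single space after the previous word.
            if line and col <= len(line):
                col = len(line) + 1
            line = line.ljust(col) + text
        lines.append(line)
    return "\n".join(lines)


def build_prompt(words, question):
    """Wrap the layout-preserving text in a simple QA prompt (wording assumed)."""
    return f"Document:\n{layout_to_text(words)}\n\nQuestion: {question}\nAnswer:"
```

Calling `build_prompt(words, "What is the total?")` yields a string that could be passed to a text-only language model. Note that this sketch drops empty rows and ignores word widths and heights, so it only approximates the original layout.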

Papers

June 17, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia, Song Mao, Xiangchao Yan, Hongbin Zhou, Bo Zhang, Haoyang Peng, Jiahao Pi, Daocheng Fu, Wenjie Wu, Hancheng Ye, Shiyang Feng, Bin Wang, Chao Xu, Conghui He, Pinlong Cai, Min Dou, Botian Shi, Sheng Zhou, Yongwei Wang, Bin Wang, Junchi Yan, Fei Wu, Yu Qiao
Training Data Text Benchmark Document Transformer

May 23, 2024

Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification
Taylor Archibald, Tony Martinez
Semantic Segmentation Jina Embeddings Fine Grained Recognition Efficient Classification Unsupervised Classification Semantic Segmentation Mask Document Transformer

February 15, 2024

LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott, Yves-Noel Weweler, Adrian Ulges, Faisal Shafait, Dirk Krechel, Darko Obradovic
Document Understanding Document Relevance Multi Modal Transformer LLM Generated Text Document Transformer

July 16, 2023

DocTr: Document Transformer for Structured Information Extraction in Documents
Haofu Liao, Aruni RoyChowdhury, Weijian Li, Ankan Bansal, Yuting Zhang, Zhuowen Tu, Ravi Kumar Satzoda, R. Manmatha, Vijay Mahadevan
Document Relevance Explicit in Document Tagging Anchor Based Structured Information Extraction Entity Discovery Document Transformer

March 4, 2022

DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
Self Supervised Pretraining Self Supervised Transformer Document Image Document Intelligence Document Transformer

February 1, 2022

WebFormer: The Web-page Transformer for Structure Information Extraction
Qifan Wang, Yi Fang, Anirudh Ravula, Fuli Feng, Xiaojun Quan, Dongfang Liu
Unstructured Information Structure Information Extraction Document Transformer Web Extraction

December 8, 2021

Joint Global and Local Hierarchical Priors for Learned Image Compression
Jun-Hyuk Kim, Byeongho Heo, Jong-Seok Lee
Image Compression Learned Image Compression Joint Framework Entropy Model Rate Distortion Performance Document Transformer Hierarchical Prior

November 30, 2021

OCR-free Document Understanding Transformer
Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park
OCR Free OCR Model Document Transformer