Document Generation

Document generation research aims to create realistic and diverse documents, covering both textual content and visual layout, to compensate for the limited data available for tasks such as visual document understanding and question answering. Current approaches use autoregressive models and other deep neural networks, often with Bayesian methods for risk minimization and robustness to noisy data, either to generate synthetic documents or to enhance existing retrieval-augmented generation pipelines. By easing data scarcity, this work improves downstream tasks such as document parsing and question answering and supports more robust and accurate AI systems for processing and understanding documents.

Papers