Full Length Document

Research on full-length document understanding focuses on efficiently extracting information from, and answering questions about, complex and often visually rich documents. Current work develops multimodal models that integrate text, layout, and image information, drawing on large language models (LLMs), attention mechanisms (e.g., shifted window attention), and graph-based representations to capture entity relationships and temporal information. These advances aim to improve information retrieval, question answering, and document summarization, with applications in scientific literature analysis, business intelligence, and digital archiving.

Papers