Layout Representation Learning

Layout representation learning encodes the spatial arrangement of elements in data such as documents, images, and 3D scenes into representations that downstream models can use. Current research concentrates on integrating layout information with other modalities (e.g., text, images), using techniques such as large language models, graph neural networks, and autoencoders, often within a self-supervised learning framework. These methods are improving performance in applications including document understanding, image retrieval, 3D scene generation, and large-scale model training, where they make the processing of complex structured data more accurate and efficient.
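As a rough illustration of what "encoding the spatial arrangement of elements" can mean in practice, the sketch below turns a few document elements into two simple layout features: bounding boxes normalized to the page size, and pairwise center offsets of the kind that could serve as edge features in a graph-based model. It is a minimal sketch under stated assumptions, not a method from any particular paper; the names `Box`, `normalize_box`, and `pairwise_offsets` and the example coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Box:
    """A layout element with pixel coordinates (hypothetical structure)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def normalize_box(box: Box, page_w: float, page_h: float) -> Tuple[float, float, float, float]:
    """Scale pixel coordinates to the [0, 1] range so features are page-size independent."""
    return (box.x0 / page_w, box.y0 / page_h, box.x1 / page_w, box.y1 / page_h)


def pairwise_offsets(
    feats: List[Tuple[float, float, float, float]]
) -> Dict[Tuple[int, int], Tuple[float, float]]:
    """Return the center-to-center offset (dx, dy) for every ordered pair of elements."""
    centers = [((x0 + x1) / 2, (y0 + y1) / 2) for x0, y0, x1, y1 in feats]
    edges = {}
    for i, (cxi, cyi) in enumerate(centers):
        for j, (cxj, cyj) in enumerate(centers):
            if i != j:
                edges[(i, j)] = (cxj - cxi, cyj - cyi)
    return edges


# Illustrative page of 1000 x 1000 pixels with a title above a body block.
boxes = [
    Box("title", 100, 50, 500, 100),
    Box("body", 100, 200, 500, 800),
]
node_feats = [normalize_box(b, 1000, 1000) for b in boxes]
edge_feats = pairwise_offsets(node_feats)
```

Here `node_feats` plays the role of per-element layout features and `edge_feats` the role of relational features; a real system would typically combine these with text or image embeddings and learn the final representation rather than hand-craft it.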

Papers