Page Classification

Page classification automatically assigns a category to each page of a document, which makes large collections easier to search and analyze. Current research focuses on models that stay robust across diverse document types (e.g., electronic theses, legal briefs, web pages) and that combine multimodal information (text, images, HTML structure), using techniques such as multimodal deep learning, graph neural networks, and pre-trained language models. The task matters for managing large document collections and improving information access, and it supports downstream work such as information extraction and genealogical research.
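
To make the task concrete, here is a minimal sketch of a text-only page classifier. It is a simple multinomial naive Bayes baseline with add-one smoothing, not one of the multimodal or graph-based methods named above, and it ignores images and HTML structure entirely. The class name `PageClassifier`, the labels (`toc`, `abstract`, `references`), and the toy pages are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the page text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class PageClassifier:
    """Hypothetical baseline: multinomial naive Bayes over page text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of pages
        self.vocab = set()

    def fit(self, pages, labels):
        for text, label in zip(pages, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_pages = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_pages in self.label_counts.items():
            counts = self.word_counts[label]
            # Add-one (Laplace) smoothing over the training vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_pages / total_pages)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (invented for this sketch): one page per label.
train_pages = [
    "Table of Contents Chapter 1 Introduction 1 Chapter 2 Methods 9",
    "Abstract This thesis studies page classification methods",
    "References Smith J 2019 Journal of Documents",
]
train_labels = ["toc", "abstract", "references"]

clf = PageClassifier().fit(train_pages, train_labels)
toc_guess = clf.predict("Chapter 3 Results 21 Chapter 4 Discussion 30")
refs_guess = clf.predict("References Doe A 2021 Journal of Archives")
```

On this toy data, `toc_guess` is `"toc"` and `refs_guess` is `"references"`. A realistic system would need far more labeled pages and, as the overview notes, would usually add layout or image features alongside the text.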

Papers