Historical Newspaper

Historical newspapers are increasingly being used as a rich source of data for research, driven by digitization efforts and advancements in natural language processing (NLP). Current research focuses on developing methods for accurate text extraction (improving Optical Character Recognition, or OCR), analyzing the textual content for insights into historical events and societal biases (using quantitative discourse analysis and question-answering models), and overcoming challenges posed by the unique linguistic and structural characteristics of historical documents (e.g., through rule-based and machine learning approaches to layout analysis). These efforts are significantly expanding the accessibility and analytical potential of historical newspaper archives for both humanities and computational research.

Papers