Paper ID: 2408.12871

DeepDiveAI: Identifying AI Related Documents in Large Scale Literature Data

Zhou Xiaochen, Liang Xingzhou, Zou Hui, Lu Yi, Qu Jingjing

This paper presents DeepDiveAI, a comprehensive dataset of AI-related research papers identified from a large-scale academic literature database. The dataset was created using a Long Short-Term Memory (LSTM) model trained on a binary classification task to distinguish AI-related papers from non-AI-related ones. The model was trained and validated on a large dataset and achieved high accuracy, precision, recall, and F1-score. The resulting dataset comprises over 9.4 million AI-related papers published between the Dartmouth Conference in 1956 and 2024, providing a resource for analyzing trends, thematic developments, and the evolution of AI research across disciplines.

Submitted: Aug 23, 2024