Paper ID: 2203.00101

ApacheJIT: A Large Dataset for Just-In-Time Defect Prediction

Hossein Keshavarz, Meiyappan Nagappan

In this paper, we present ApacheJIT, a large dataset for Just-In-Time defect prediction. ApacheJIT consists of clean and bug-inducing software changes from popular Apache projects. It contains 106,674 commits in total (28,239 bug-inducing and 78,435 clean). This size makes ApacheJIT well suited to training machine learning models, especially deep learning models, which need large training sets to generalize the patterns in historical data to future data.
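The commit counts above imply a class imbalance that matters when training a classifier. The following is a minimal sketch of that arithmetic, using only the two counts stated in the abstract. The function name `class_balance` and the inverse-frequency weighting are illustrative assumptions, not part of the ApacheJIT release or the authors' method.

```python
# Counts taken from the abstract; everything else here is illustrative.
BUG_INDUCING = 28_239
CLEAN = 78_435


def class_balance(bug_inducing: int, clean: int) -> dict:
    """Summarize the class balance of a two-class commit dataset.

    The weights use the common inverse-frequency heuristic
    total / (n_classes * class_count); this is an assumed choice,
    not something the abstract prescribes.
    """
    total = bug_inducing + clean
    return {
        "total": total,
        "bug_fraction": bug_inducing / total,
        "clean_fraction": clean / total,
        "bug_weight": total / (2 * bug_inducing),
        "clean_weight": total / (2 * clean),
    }


if __name__ == "__main__":
    summary = class_balance(BUG_INDUCING, CLEAN)
    print(f"total commits:         {summary['total']}")
    print(f"bug-inducing fraction: {summary['bug_fraction']:.3f}")
    print(f"clean fraction:        {summary['clean_fraction']:.3f}")
    print(f"bug-inducing weight:   {summary['bug_weight']:.3f}")
    print(f"clean weight:          {summary['clean_weight']:.3f}")
```

Roughly one commit in four is bug-inducing, so a model that always predicts "clean" would already score well on accuracy. Weights like these, or a metric less sensitive to imbalance, are one way to account for that.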

Submitted: Feb 28, 2022