Paper ID: 2201.03225

Explainable AI Integrated Feature Selection for Landslide Susceptibility Mapping using TreeSHAP

Muhammad Sakib Khan Inan, Istiakur Rahman

Landslides have been a regular occurrence and an alarming threat to human life and property in the era of anthropogenic global warming. An early prediction of landslide susceptibility using a data-driven approach is a demand of time. In this study, we explored the eloquent features that best describe landslide susceptibility with state-of-the-art machine learning methods. In our study, we employed state-of-the-art machine learning algorithms including XgBoost, LR, KNN, SVM, and Adaboost for landslide susceptibility prediction. To find the best hyperparameters of each individual classifier for optimized performance, we have incorporated the Grid Search method, with 10 Fold Cross-Validation. In this context, the optimized version of XgBoost outperformed all other classifiers with a Cross-validation Weighted F1 score of 94.62 %. Followed by this empirical evidence, we explored the XgBoost classifier by incorporating TreeSHAP, a game-theory-based statistical algorithm used to explain Machine Learning models, to identify eloquent features such as SLOPE, ELEVATION, TWI that complement the performance of the XGBoost classifier mostly and features such as LANDUSE, NDVI, SPI which has less effect on models performance. According to the TreeSHAP explanation of features, we selected the 9 most significant landslide causal factors out of 15. Evidently, an optimized version of XgBoost along with feature reduction by 40 % has outperformed all other classifiers in terms of popular evaluation metrics with a Cross-Validation Weighted F1 score of 95.01 % on the training and AUC score of 97 %

Submitted: Jan 10, 2022