Paper ID: 2212.04419

Mining Explainable Predictive Features for Water Quality Management

Conor Muldoon, Levent Görgü, John J. O'Sullivan, Wim G. Meijer, Gregory M. P. O'Hare

In water quality management, identifying and interpreting relationships between features, such as location and weather variable tuples, and water quality variables, such as levels of bacteria, is key to gaining insights and identifying areas where interventions should be made. There is a need for a search process to identify the locations and types of phenomena that influence water quality, and a need to explain how the quality is affected and which factors are most relevant. This paper addresses both needs. A process is developed for collecting data for features that represent a variety of variables over a spatial region; these features are used for model training and inference. The performance of the features is then analysed using the trained models and Shapley values. Shapley values originated in cooperative game theory and can be used to aid in the interpretation of machine learning results. Evaluations are performed using several machine learning algorithms and water quality data from the Dublin Grand Canal basin.
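
To illustrate the game-theoretic idea behind Shapley values, the following is a minimal sketch, not the method used in the paper. It computes exact Shapley values for a small cooperative game by averaging each player's marginal contribution over all coalitions. The feature names (`rainfall`, `temperature`, `location`) and the payoff function `toy_value` are hypothetical stand-ins for the skill of a model trained on a feature subset; they are assumptions made for illustration and do not come from the paper's data.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of hashable player names (here, hypothetical features).
    value:   function mapping a frozenset of players to a real payoff.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: stands in for model skill on a feature subset."""
    score = 0.0
    if "rainfall" in coalition:
        score += 4.0
    if "temperature" in coalition:
        score += 1.0
    if "rainfall" in coalition and "location" in coalition:
        score += 2.0  # assumed interaction between the two features
    return score


features = ["rainfall", "temperature", "location"]
phi = shapley_values(features, toy_value)
for name in features:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy game the interaction payoff is split equally between `rainfall` and `location`, and the attributions sum to the payoff of the full feature set. Exact enumeration scales exponentially with the number of players, so it is only practical for small feature sets; larger problems typically rely on approximations.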

Submitted: Dec 8, 2022