Data-Driven Machine Learning Models for E. coli Concentration Prediction

Alaa Aldein M. S. Ibrahim; Mfanasibili Nkonyane; Mlondi Ngcobo; Tom Walingo; Jules-Raymond Tapamo

doi:10.3390/su18010179

¹ Discipline of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban 4041, South Africa

² Umngeni-Uthukela Water, Pietermaritzburg 3201, South Africa

* Authors to whom correspondence should be addressed.

Sustainability 2026, 18(1), 179; https://doi.org/10.3390/su18010179
(registering DOI)

This article belongs to the Section Sustainable Water Management

Abstract

Accurate assessment of water quality is crucial for protecting public health and promoting environmental sustainability. Conventional laboratory-based methods for evaluating microbial contaminants are often time-consuming, resource-intensive, and reactive in nature, limiting their effectiveness for real-time water quality monitoring and management. This study examines the application of data-driven machine learning models to predict E. coli concentrations in Midmar Dam, utilizing readily available physicochemical parameters. A comparative analysis was conducted using five classical standalone ML algorithms: Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost). These models were assessed based on their predictive performance using standard error metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Among the models evaluated, the kNN algorithm demonstrated superior performance, achieving the lowest MSE and RMSE values, thereby highlighting its effectiveness in capturing the complex relationships between physicochemical indicators and microbial contamination levels. The findings demonstrate the potential of ML-based approaches to serve as efficient, scalable, and proactive tools for sustainable water-quality monitoring and management in dams.

Keywords: environmental monitoring; Midmar Dam; physico-chemical parameters; sustainable management; water quality assessment
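
To make the evaluation described in the abstract concrete, the following is a minimal sketch of the three error metrics (MAE, MSE, RMSE) applied to the predictions of a simple kNN regressor. It is an illustration only, not the authors' implementation: the two-feature rows, the target values, the choice of k = 2, and the function names (`mae`, `mse`, `rmse`, `knn_predict`) are all hypothetical and are not drawn from the Midmar Dam data.

```python
import math

def mae(y_true, y_pred):
    """Mean Absolute Error: average of |observed - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean Squared Error: average of (observed - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))

def knn_predict(train_X, train_y, query, k=2):
    """Predict a target as the mean of the k nearest training targets
    (Euclidean distance). A bare-bones kNN regressor for illustration."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Hypothetical data: each row is two made-up physicochemical readings,
# each target a made-up E. coli concentration. Not from the study.
train_X = [(1.0, 7.0), (2.0, 7.2), (3.0, 6.8), (8.0, 7.5), (9.0, 7.4)]
train_y = [10.0, 12.0, 14.0, 40.0, 44.0]
test_X = [(2.0, 7.0), (8.5, 7.5)]
test_y = [13.0, 41.0]

preds = [knn_predict(train_X, train_y, x, k=2) for x in test_X]
print("MAE: ", mae(test_y, preds))
print("MSE: ", mse(test_y, preds))
print("RMSE:", rmse(test_y, preds))
```

Because MSE and RMSE square the residuals, they penalize large errors more heavily than MAE does, so the three metrics can rank models differently. In practice, features would typically be scaled before computing kNN distances, since the method is sensitive to the units of each input.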