Sustainability
  • This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
  • Article
  • Open Access

23 December 2025

Data-Driven Machine Learning Models for E. coli Concentration Prediction

1 Discipline of Electrical, Electronic and Computer Engineering, University of KwaZulu-Natal, Durban 4041, South Africa
2 Umngeni-Uthukela Water, Pietermaritzburg 3201, South Africa
* Authors to whom correspondence should be addressed.
Sustainability 2026, 18(1), 179; https://doi.org/10.3390/su18010179 (registering DOI)
This article belongs to the Section Sustainable Water Management

Abstract

Accurate assessment of water quality is crucial for protecting public health and promoting environmental sustainability. Conventional laboratory-based methods for evaluating microbial contaminants are often time-consuming, resource-intensive, and reactive, which limits their usefulness for real-time water quality monitoring and management. This study examines the application of data-driven machine learning (ML) models to predict E. coli concentrations in Midmar Dam from readily available physicochemical parameters. Five classical standalone ML algorithms were compared: Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (kNN), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost). The models were evaluated on predictive performance using standard error metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE). Among the models evaluated, kNN performed best, achieving the lowest MSE and RMSE values, which indicates that it captured the complex relationships between physicochemical indicators and microbial contamination levels more effectively than the other models tested. The findings demonstrate the potential of ML-based approaches to serve as efficient, scalable, and proactive tools for sustainable water quality monitoring and management in dams.
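The abstract ranks the models by MAE, MSE, and RMSE and reports kNN as the best performer. As a minimal sketch only (not the authors' implementation: their value of k, distance metric, feature set, and preprocessing are not stated in the abstract), the three metrics and a plain uniform-weight kNN regressor can be written in standard-library Python. The function names `mae`, `mse`, `rmse`, and `knn_predict` and the toy data are hypothetical illustrations.

```python
import math
from typing import Sequence, Tuple

# Hypothetical helper names for illustration; not taken from the study.


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate that both sequences are non-empty and of equal length."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: (1/n) * sum(|y_i - yhat_i|)."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: (1/n) * sum((y_i - yhat_i)^2)."""
    n = _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: sqrt(MSE), in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def knn_predict(
    train_X: Sequence[Tuple[float, ...]],
    train_y: Sequence[float],
    query: Tuple[float, ...],
    k: int = 3,
) -> float:
    """Uniform-weight kNN regression: average the targets of the k training
    points closest to `query` by Euclidean distance. Ties in distance are
    broken by target value, an arbitrary choice in this sketch."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    return sum(y for _, y in ranked[:k]) / k


if __name__ == "__main__":
    # Synthetic toy data (two features, one target); not data from the study.
    X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
    y = [10.0, 12.0, 14.0, 100.0, 110.0]
    print(knn_predict(X, y, (0.2, 0.2), k=3))  # 12.0
    print(mae([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))  # 1.0
    print(rmse([1.0, 2.0, 3.0], [2.0, 2.0, 5.0]))
```

Because Euclidean distance depends on feature units, physicochemical inputs measured on different scales would normally be standardized or min-max scaled before a kNN model is fitted. In practice a library estimator such as scikit-learn's `KNeighborsRegressor`, whose default is uniform weighting with Euclidean (Minkowski, p=2) distance, would typically replace this hand-written version. Since RMSE is the square root of MSE, a model with the lowest MSE necessarily also has the lowest RMSE, which is consistent with the abstract reporting both together.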
