AI-Driven Anomaly Detection in Smart Water Metering Systems Using Ensemble Learning
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Data Collection Methods
2.3. Proposed Framework
2.3.1. Step 1: Data Acquisition
2.3.2. Step 2: Data Preparation and Balancing
- Meter ID (unique identifier for each household water meter)
- Reading Date (timestamp of the monthly reading)
- Monthly Consumption (in cubic meters)
- Derived Features: interpolated daily consumption and anomaly labels
- Spikes: Single-day values above Q3 + 1.5 × IQR
- Sudden changes: Rapid deviations from previous baseline values
NB: Anomaly Class Definitions
- Class 0—Normal Consumption
- Class 1—Spike
- Class 2—Sudden Changes
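The spike rule above (values exceeding Q3 + 1.5 × IQR) can be sketched as follows. This is an illustrative reconstruction, not the authors' labeling code; in the paper the final class labels come from LSTM-AE reconstruction error.

```python
import numpy as np

def flag_spikes(daily_consumption, factor=1.5):
    """Flag single-day values above Q3 + factor * IQR as spikes (Class 1)."""
    daily = np.asarray(daily_consumption, dtype=float)
    q1, q3 = np.percentile(daily, [25, 75])
    iqr = q3 - q1
    return (daily > q3 + factor * iqr).astype(int)
```

A value far above the upper quartile fence, e.g. a one-day reading of 100 m³ in a series of single-digit readings, is flagged as Class 1.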
Data Imbalance Handling
Algorithm 1: Dataset Input

Input Dataset:
  X ← Feature matrix (interpolated daily water consumption, PCHIP)
  y ← Labels generated by LSTM-AE reconstruction error, where:
      Class 0 = Normal Consumption
      Class 1 = Spike Anomaly
      Class 2 = Sudden Change
Algorithm 2: RUS

Input:
  X ← Feature matrix of shape (n_samples, n_features)
  y ← Corresponding label vector (normal = 0, anomaly = 1 or 2)
Output:
  X_resampled, y_resampled ← Balanced dataset with equal class representation
Procedure:
  1. Identify majority and minority class indices in y.
  2. Extract:
       X_majority, y_majority ← samples where y == majority class
       X_minority, y_minority ← samples where y == minority class
  3. Randomly under-sample X_majority to match the size of X_minority.
  4. Concatenate:
       X_resampled ← [X_minority; X_sampled_majority]
       y_resampled ← [y_minority; y_sampled_majority]
  5. Shuffle X_resampled and y_resampled to ensure randomization.
  6. Return X_resampled, y_resampled
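Algorithm 2 can be sketched directly in NumPy; a minimal reconstruction of the steps (the paper likely used a library implementation such as imbalanced-learn's `RandomUnderSampler`, so treat this as illustrative):

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Randomly down-sample every class to the size of the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Draw n_min indices without replacement from each class, then shuffle.
    idx = np.concatenate([
        rng.choice(np.where(y == c)[0], size=n_min, replace=False)
        for c in classes
    ])
    rng.shuffle(idx)
    return X[idx], y[idx]
```

The trade-off, visible in the results tables below, is that discarding majority-class samples can starve the classifiers of normal-consumption patterns.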
Algorithm 3: SMOTE

Input:
  X ← Feature matrix of shape (n_samples, n_features)
  y ← Corresponding label vector (normal = 0, anomaly = 1 or 2)
  k ← Number of nearest neighbors for interpolation (default: 5)
Output:
  X_augmented, y_augmented ← Dataset including original and synthetic minority samples
Procedure:
  1. Identify minority class samples in X and y.
  2. For each minority instance x_i:
       a. Identify k nearest neighbors from within the minority class.
       b. Randomly select one or more neighbors.
       c. For each selected neighbor x_j:
            i. Generate synthetic sample x_s:
                 x_s = x_i + rand(0, 1) × (x_j − x_i)
            ii. Append x_s to X_synthetic and assign the corresponding label to y_synthetic.
  3. Concatenate:
       X_augmented ← [X; X_synthetic]
       y_augmented ← [y; y_synthetic]
  4. Return X_augmented, y_augmented
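The interpolation step x_s = x_i + rand(0,1) × (x_j − x_i) is the core of SMOTE. A minimal sketch of the synthetic-sample generator, assuming the minority-class rows have already been extracted (a library implementation such as imbalanced-learn's `SMOTE` would normally be used):

```python
import numpy as np

def smote_samples(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from the minority-class matrix X_min."""
    rng = np.random.default_rng(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # k nearest minority neighbours of x_i, excluding x_i itself.
        dist = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbours = np.argsort(dist)[1:k + 1]
        j = rng.choice(neighbours)
        # x_s = x_i + rand(0, 1) * (x_j - x_i)
        synthetic.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.vstack(synthetic)
```

Every synthetic point lies on a segment between two real minority samples, so no value falls outside the observed minority range.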
Algorithm 4: SMOTEENN

Input:
  X ← Feature matrix of shape (n_samples, n_features)
  y ← Corresponding label vector (normal = 0, anomaly = 1 or 2)
  k ← Number of neighbors for both SMOTE and ENN stages (5)
Output:
  X_cleaned, y_cleaned ← Balanced and noise-filtered dataset
Procedure:
  1. Apply SMOTE to X and y:
       a. Follow the SMOTE procedure to obtain X_smote and y_smote.
  2. Apply ENN to (X_smote, y_smote):
       For each instance x_i in X_smote:
         a. Identify k nearest neighbors.
         b. If the label of x_i differs from the majority label among its neighbors:
              Remove x_i from X_smote and y_smote.
  3. The remaining instances form:
       X_cleaned ← filtered feature matrix
       y_cleaned ← corresponding filtered labels
  4. Return X_cleaned, y_cleaned
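The ENN cleaning stage (step 2) is what distinguishes SMOTEENN from plain SMOTE: any sample, original or synthetic, whose label disagrees with the majority of its neighbours is discarded. A minimal sketch of that filter (illustrative only; imbalanced-learn's `SMOTEENN` combines both stages):

```python
import numpy as np

def enn_filter(X, y, k=3):
    """Drop samples whose label disagrees with the majority of their k neighbours."""
    keep = []
    for i in range(len(X)):
        dist = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(dist)[1:k + 1]          # exclude the sample itself
        labels, counts = np.unique(y[nn], return_counts=True)
        if labels[np.argmax(counts)] == y[i]:
            keep.append(i)
    keep = np.array(keep)
    return X[keep], y[keep]
```

This noise removal along the class boundary is a plausible reason SMOTEENN outperforms RUS and SMOTE in the results tables below.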
2.3.3. Step 3: ML Algorithms for Anomaly Detection
Algorithm 5: SVM

Input:
  X, y ← Feature matrix and LSTM-labeled target
  Resample method ← RUS, SMOTE, or SMOTEENN
  Kernel ← RBF
  C ← Regularization parameter
Output:
  ŷ ← Predicted class labels for X_test
  Anomaly_Index ← Distance to decision boundary
Procedure:
  1. Apply selected resampling method to (X, y) → (X_bal, y_bal)
  2. Standardize X_bal
  3. Train SVM using kernel and C on (X_bal, y_bal)
  4. Predict ŷ on X_test
  5. Compute Anomaly_Index = |decision_function(X_test)|
  6. Return ŷ, Anomaly_Index
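A minimal scikit-learn sketch of the SVM steps, assuming a resampled (X_bal, y_bal) is already available; `C=1.0` is an illustrative default, not the paper's tuned value:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def fit_svm(X_bal, y_bal, C=1.0):
    # Standardize the features, then train an RBF-kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=C))
    return model.fit(X_bal, y_bal)

def svm_predict(model, X_test):
    # Predicted labels plus distance-to-boundary anomaly index.
    y_hat = model.predict(X_test)
    anomaly_index = np.abs(model.decision_function(X_test))
    return y_hat, anomaly_index
```

A small anomaly index means the sample sits close to the decision boundary, i.e. the classification is least certain there.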
Algorithm 6: KNN

Input:
  X, y ← LSTM-labeled dataset
  Resample method ← RUS, SMOTE, or SMOTEENN
  k ← Number of neighbors
Output:
  ŷ ← Predicted class labels
  Anomaly_Index ← Mean distance to k neighbors
Procedure:
  1. Apply resampling to (X, y) → (X_bal, y_bal)
  2. Normalize X_bal and X_test
  3. For each sample x in X_test:
       a. Compute distance to all points in X_bal
       b. Identify the k-nearest neighbors
       c. Predict ŷ by majority voting among the neighbors
       d. Anomaly_Index ← Mean distance to neighbors
  4. Return ŷ, Anomaly_Index
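The k-NN loop can be sketched directly; this illustrative version omits the normalization step and assumes the resampled training set is already in hand:

```python
import numpy as np

def knn_detect(X_bal, y_bal, X_test, k=5):
    """Majority-vote prediction; anomaly index = mean distance to the k neighbours."""
    preds, anomaly_index = [], []
    for x in X_test:
        dist = np.linalg.norm(X_bal - x, axis=1)
        nn = np.argsort(dist)[:k]
        labels, counts = np.unique(y_bal[nn], return_counts=True)
        preds.append(labels[np.argmax(counts)])
        anomaly_index.append(dist[nn].mean())
    return np.array(preds), np.array(anomaly_index)
```

Samples lying far from every training point receive a large mean neighbour distance, which is exactly the behaviour wanted from an anomaly index.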
Algorithm 7: Random Forest

Input:
  X, y ← LSTM-labeled dataset
  Resample method ← RUS, SMOTE, or SMOTEENN
  n_trees ← Number of decision trees
Output:
  ŷ ← Class predictions
  Anomaly_Index ← Voting confidence or entropy
Procedure:
  1. Resample (X, y) → (X_bal, y_bal)
  2. Train RF on X_bal with n_trees, using feature sub-sampling
  3. For each x in X_test:
       a. Predict the class from each tree
       b. ŷ ← Majority vote
       c. Anomaly_Index ← 1 − max(vote_distribution)
  4. Return ŷ, Anomaly_Index
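In scikit-learn the per-tree vote distribution is exposed through `predict_proba`, so the 1 − max(vote_distribution) index falls out in one line. A hedged sketch (hyperparameters are illustrative defaults):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def rf_detect(X_bal, y_bal, X_test, n_trees=100):
    rf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    rf.fit(X_bal, y_bal)
    proba = rf.predict_proba(X_test)           # per-class vote distribution
    y_hat = rf.classes_[proba.argmax(axis=1)]  # majority vote
    anomaly_index = 1.0 - proba.max(axis=1)    # 1 - max(vote_distribution)
    return y_hat, anomaly_index
```

An index near 0 means the trees agree almost unanimously; an index near 1 − 1/n_classes means the vote was split.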
Algorithm 8: Decision Tree

Input:
  X, y ← LSTM-labeled dataset
  Resample method ← RUS, SMOTE, or SMOTEENN
  Criterion ← Entropy
Output:
  ŷ ← Class predictions
  Anomaly_Index ← Leaf node purity or depth
Procedure:
  1. Resample (X, y) → (X_bal, y_bal)
  2. Train the DT classifier using the specified criterion
  3. For each x in X_test:
       a. Traverse the tree to a leaf node
       b. ŷ ← Majority class in the leaf
       c. Anomaly_Index ← Depth of leaf/impurity measure
  4. Return ŷ, Anomaly_Index
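The leaf-depth variant of the anomaly index can be reconstructed from scikit-learn's fitted tree structure; this is an illustrative sketch, since the paper does not specify which of the two index options (depth or impurity) was used:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def dt_detect(X_bal, y_bal, X_test):
    dt = DecisionTreeClassifier(criterion="entropy", random_state=0)
    dt.fit(X_bal, y_bal)
    y_hat = dt.predict(X_test)

    # Depth of every node, computed by walking the fitted tree once.
    tree = dt.tree_
    depth = np.zeros(tree.node_count, dtype=int)
    stack = [(0, 0)]
    while stack:
        node, d = stack.pop()
        depth[node] = d
        if tree.children_left[node] != -1:      # internal node
            stack.append((tree.children_left[node], d + 1))
            stack.append((tree.children_right[node], d + 1))

    # Anomaly index: depth of the leaf each test sample falls into.
    anomaly_index = depth[dt.apply(X_test)]
    return y_hat, anomaly_index
```

Deep leaves correspond to regions the tree had to carve out with many splits, often small, unusual pockets of the feature space.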
Algorithm 9: Stacking Ensemble

Input:
  X_train ← Resampled dataset using SMOTEENN
  y_train ← Class labels
  base_models ← {SVM, k-NN, DT, RF}
  meta_model ← Logistic Regression
Output:
  stacking_model ← Trained ensemble
  ŷ ← Predicted class labels
Procedure:
  1. Train each base_model_i on (X_train, y_train).
  2. For each x ∈ X_train:
       a. Predict base outputs p_i = base_model_i(x)
  3. Form the meta-feature vector X_meta ← [p1, p2, p3, p4]
  4. Train meta_model on (X_meta, y_train)
  5. For each x ∈ X_test:
       a. Get base predictions [p1, …, p4]
       b. ŷ ← meta_model([p1, …, p4])
  6. Return stacking_model, ŷ
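Algorithm 9 maps onto scikit-learn's `StackingClassifier`, which builds the meta-features with internal cross-validation rather than the direct refit of steps 2-3; hyperparameters below are illustrative defaults, not the paper's tuned settings:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_stacking():
    base_models = [
        ("svm", SVC(kernel="rbf", probability=True, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(criterion="entropy", random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
    # Base-model predictions become meta-features for the logistic-regression meta-model.
    return StackingClassifier(estimators=base_models,
                              final_estimator=LogisticRegression(max_iter=1000))
```

Fitting with `build_stacking().fit(X_train, y_train)` then predicting on `X_test` reproduces steps 1-6 end to end.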
Algorithm 10: Voting Ensemble

Input:
  X_train ← Resampled dataset using SMOTEENN
  y_train ← Class labels
  base_models ← {SVM, k-NN, DT, RF}
  voting_type ← {'hard' or 'soft'}
Output:
  ŷ ← Final prediction
  anomaly_index ← Voting confidence
Procedure:
  1. Train each base_model_i on (X_train, y_train)
  2. For each x ∈ X_test:
       a. Collect predictions p_i from all base models
       b. If voting_type == 'hard':
            ŷ ← class with majority votes
            anomaly_index ← ratio of minority votes
       c. If voting_type == 'soft':
            ŷ ← class with highest average predicted probability
            anomaly_index ← standard deviation of probabilities
  3. Return ŷ and anomaly_index
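The voting ensemble maps onto scikit-learn's `VotingClassifier`. The confidence proxy below (spread of the averaged class probabilities) is one illustrative reading of the "standard deviation of probabilities" in step 2c, not necessarily the authors' exact definition:

```python
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

def build_voting(voting_type="soft"):
    base_models = [
        ("svm", SVC(kernel="rbf", probability=True, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=5)),
        ("dt", DecisionTreeClassifier(criterion="entropy", random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
    ]
    return VotingClassifier(estimators=base_models, voting=voting_type)

def soft_vote_confidence(model, X_test):
    # One possible confidence measure for soft voting: how spread out the
    # averaged class probabilities are (low spread = uncertain vote).
    proba = model.predict_proba(X_test)
    return proba.std(axis=1)
```

Hard voting only needs the class labels from each base model, which is why `probability=True` on the SVC matters only for the soft variant.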
2.3.4. Step 4: Key Performance Metrics
2.4. Experimental Setup
3. Results and Discussion
- (a) Effects of Class Imbalance in Classification
- (b) The Receiver Operating Characteristic (ROC) Curves
- (c) Confusion Matrix
- (d) Ensemble ML Classifiers
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Statistic | JAN_17 | FEB_17 | MAR_17 | APR_17 | MAY_17 | JUN_17 | JUL_17 | AUG_17 | SEP_17 | OCT_17 | NOV_17 | DEC_17
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 808 | 808 | 808 | 808 | 808 | 808 | 808 | 808 | 808 | 808 | 808 | 808
mean | 8.884901 | 6.440594 | 6.460396 | 6.69802 | 11.00124 | 7.90099 | 6.87995 | 11.63243 | 11.53713 | 12.75371 | 15.03094 | 23.64604
std | 16.2794 | 15.14732 | 10.27555 | 15.47818 | 17.16813 | 12.9908 | 13.5372 | 19.92893 | 18.20954 | 23.09042 | 22.40475 | 115.3365
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 4 | 5
50% | 4 | 3 | 4 | 4 | 7 | 6 | 5 | 8 | 8 | 9 | 11 | 12
75% | 13 | 9 | 10 | 10 | 16 | 11 | 9 | 15 | 15.25 | 17 | 19 | 21.25
max | 270 | 311 | 191 | 373 | 272 | 208 | 307 | 333 | 281 | 381 | 310 | 3110
Statistic | JAN_18 | FEB_18 | MAR_18 | APR_18 | MAY_18 | JUN_18 | JUL_18 | AUG_18 | SEP_18 | OCT_18 | NOV_18 | DEC_18
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028 | 1028
mean | 0.227626 | 20.65661 | 11.32977 | 10.93288 | 8.238327 | 11.1216 | 11.29767 | 8.965953 | 13.66926 | 11.54086 | 14.63327 | 10.3142
std | 2.266288 | 39.7629 | 18.86923 | 17.55782 | 11.17806 | 17.50793 | 16.64495 | 11.02629 | 19.78422 | 14.49955 | 17.91407 | 14.04124
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 0 | 0 | 0 | 0 | 0 | 1 | 2 | 2 | 3 | 3 | 5 | 3
50% | 0 | 12.5 | 7 | 7 | 6 | 9 | 9 | 7 | 11 | 9 | 11 | 7
75% | 0 | 27 | 15 | 14 | 12 | 15 | 15 | 13 | 18 | 15 | 20 | 14
max | 51 | 677 | 312 | 298 | 224 | 342 | 314 | 233 | 366 | 209 | 333 | 199
Statistic | JAN_19 | FEB_19 | MAR_19 | APR_19 | MAY_19 | JUN_19 | JUL_19 | AUG_19 | SEP_19 | OCT_19 | NOV_19 | DEC_19
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126 | 1126
mean | 0.045293 | 26.50622 | 11.02575 | 11.67584 | 10.18206 | 9.974245 | 14.05595 | 10.18739 | 13.2167 | 14.33481 | 10.73535 | 12.47336
std | 0.709736 | 33.67401 | 13.50688 | 25.35285 | 12.04723 | 10.70087 | 18.20631 | 12.76431 | 16.20351 | 18.03091 | 11.60181 | 18.74732
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 0 | 6 | 2 | 3 | 3 | 3 | 5 | 4 | 6 | 6 | 4 | 5
50% | 0 | 19 | 8 | 8 | 8 | 8 | 12 | 8 | 11 | 12 | 8 | 10
75% | 0 | 37 | 16 | 15 | 14 | 14 | 19 | 14 | 18 | 19 | 15 | 16
max | 15 | 633 | 261 | 712 | 247 | 220 | 336 | 220 | 346 | 353 | 214 | 424
Statistic | JAN_20 | FEB_20 | MAR_20 | APR_20 | MAY_20 | JUN_20 | JUL_20 | AUG_20 | SEP_20 | OCT_20 | NOV_20 | DEC_20
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224 | 1224
mean | 9.714052 | 11.19608 | 11.44853 | 0.896242 | 24.17484 | 9.834967 | 10.00408 | 10.78023 | 20.45425 | 12.8317 | 16.75163 | 11.61356
std | 12.59281 | 16.81064 | 17.42454 | 5.257695 | 41.42892 | 16.96552 | 11.42815 | 13.40226 | 22.42771 | 22.4398 | 19.72643 | 15.57206
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 3 | 3 | 4 | 0 | 9 | 3 | 4 | 4 | 10 | 5 | 6 | 4
50% | 8 | 9 | 9 | 0 | 19 | 8 | 8 | 9.5 | 18 | 10 | 14 | 9
75% | 13 | 15 | 14 | 0 | 31 | 13 | 14 | 14 | 28 | 17 | 23 | 15
max | 284 | 334 | 340 | 114 | 773 | 417 | 246 | 298 | 540 | 605 | 390 | 255
Statistic | JAN_21 | FEB_21 | MAR_21 | APR_21 | MAY_21 | JUN_21 | JUL_21 | AUG_21 | SEP_21 | OCT_21 | NOV_21 | DEC_21
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268 | 1268
mean | 12.85489 | 9.337539 | 13.85252 | 14.73344 | 13.72319 | 10.9653 | 13.21767 | 14.18297 | 15.74842 | 13.79968 | 20.31467 | 10.55205
std | 22.42158 | 13.37025 | 18.09838 | 20.60463 | 64.03196 | 15.23938 | 18.27562 | 21.99177 | 18.26471 | 16.85729 | 24.59939 | 14.96504
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 4 | 3 | 5 | 5 | 4 | 4 | 5 | 5 | 7 | 5 | 8.75 | 4
50% | 9 | 8 | 11 | 12 | 8 | 9 | 11 | 12 | 13 | 11 | 16 | 8
75% | 16 | 13 | 18 | 19 | 13 | 14 | 17 | 19 | 21 | 18 | 26 | 13
max | 532 | 311 | 339 | 426 | 1599 | 309 | 328 | 516 | 385 | 302 | 465 | 300
Statistic | JAN_22 | FEB_22 | MAR_22 | APR_22 | MAY_22 | JUN_22 | JUL_22 | AUG_22 | SEP_22 | OCT_22 | NOV_22 | DEC_22
---|---|---|---|---|---|---|---|---|---|---|---|---
count | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303 | 1303
mean | 41.32694 | 38.09286 | 23.51343 | 26.6769 | 28.39908 | 22.10207 | 26.86416 | 24.38219 | 33.0284 | 28.8089 | 32.45741 | 26.82348
std | 731.3927 | 734.125 | 416.9982 | 471.3119 | 502.6794 | 390.7141 | 473.3532 | 430.6239 | 584.7378 | 508.1026 | 573.1599 | 473.7025
min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
25% | 7 | 3 | 4 | 5 | 5 | 4 | 5 | 5 | 7 | 5 | 7 | 5
50% | 17 | 9 | 10 | 11 | 12 | 9 | 12 | 10 | 13 | 12 | 14 | 11
75% | 29 | 15 | 16 | 19 | 19 | 15 | 19 | 17 | 21 | 20 | 21 | 18
max | 26,406 | 24,545 | 15,053 | 17,013 | 18,144 | 14,092 | 17,089 | 15,544 | 21,110 | 18,342 | 20,693 | 17,100
| Predicted Anomalies | Predicted Normal
---|---|---
Actual anomalies | TP | FN
Actual normal | FP | TN
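The confusion-matrix entries feed the key performance metrics of Step 4. For reference, a sketch of the standard binary definitions (the multi-class scores in the tables below are the weighted per-class generalizations of these):

```python
def metrics_from_counts(tp, fn, fp, tn):
    """Standard binary metrics from the confusion-matrix entries above."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)               # true positive rate (TPR)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```

For example, 8 true positives, 2 false negatives, 1 false positive, and 9 true negatives yield an accuracy of 0.85 and a recall of 0.8.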
Method | ML Model | Accuracy | F1 Score | Recall (TPR) | AUC Score
---|---|---|---|---|---
RUS | Random forest | 0.596154 | 0.589492 | 0.596154 | 0.775768
RUS | Decision tree | 0.596154 | 0.589492 | 0.596154 | 0.696218
RUS | kNN | 0.634615 | 0.60812 | 0.634615 | 0.782975
RUS | SVM | 0.653846 | 0.518115 | 0.653846 | 0.852747
SMOTE | Random forest | 0.714756 | 0.715368 | 0.714756 | 0.888157
SMOTE | Decision tree | 0.715304 | 0.71586 | 0.715304 | 0.786466
SMOTE | kNN | 0.7548 | 0.751849 | 0.7548 | 0.890988
SMOTE | SVM | 0.687877 | 0.615475 | 0.687877 | 0.841265
SMOTEENN | Random forest | 0.995541 | 0.995539 | 0.995541 | 0.998008
SMOTEENN | Decision tree | 0.995541 | 0.995539 | 0.995541 | 0.996743
SMOTEENN | kNN | 0.971014 | 0.97079 | 0.971014 | 0.99726
SMOTEENN | SVM | 0.826087 | 0.783482 | 0.826087 | 0.924708
Model | Accuracy | Precision | Recall | F1 Score
---|---|---|---|---
Stacking | 0.995541 | 0.995540 | 0.995541 | 0.995539
Voting (hard) | 0.981048 | 0.981517 | 0.981048 | 0.980723
Voting (soft) | 0.992196 | 0.992182 | 0.992196 | 0.992172
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Kanyama, M.N.; Bhunu Shava, F.; Gamundani, A.M.; Hartmann, A. AI-Driven Anomaly Detection in Smart Water Metering Systems Using Ensemble Learning. Water 2025, 17, 1933. https://doi.org/10.3390/w17131933