Machine Learning-Based Water Quality Classification Assessment
Abstract
1. Introduction
- (1) Limitations of single-model optimization: Many existing studies focus on improving a single model by adding an optimization algorithm. However, single models have inherent limitations when handling complex water quality data, particularly in feature selection and in modeling nonlinear relationships, and optimization algorithms alone cannot fully address these challenges. Although some research has explored ensemble learning to improve performance, the combination of GBDT and MLP models specifically for water quality classification remains largely unexplored.
- (2) Lack of dynamic feature selection mechanisms: Most current water quality classification models rely on static feature weights, so they cannot adjust feature importance to different tasks or environmental conditions. This limits their adaptability in complex and changing water quality monitoring tasks, particularly when the correlation between water quality parameters and labels is weak.
- (3) Parameter selection and over-simplification: To improve computational efficiency, some studies reduce the number of water quality parameters and thereby simplify the model structure. This strategy may overlook key water quality features and reduce classification accuracy; while a simpler model lowers the computational burden, it can also weaken overall performance under complex water quality conditions, particularly when significant interactions exist between parameters, because critical water quality information is omitted.
2. Materials and Methods
2.1. Data Sources and Description
- pH: indicates the acidity or alkalinity of the water.
- Turbidity: measures the amount of suspended solids in water, serving as an indicator of waste discharge.
- Dissolved solids: reflect the water's capacity to dissolve a range of inorganic and organic minerals.
- Chloramine: represents the concentration of chloramine, a primary disinfectant.
- Conductivity: reflects the water’s electrical conductivity, which increases with ion concentration.
- Nitrates, sulfates, chlorides, fluorides: indicate the concentrations of these substances in water.
- Various metals: indicate the levels of metal elements in the water sample.
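As a minimal illustration only, a dataset with these parameters can be loaded and inspected with pandas as sketched below; the file name water_quality.csv and the column layout are placeholders, not the actual data source used in the study.

```python
import pandas as pd

# Hypothetical file name; the actual dataset source and column names may differ.
df = pd.read_csv("water_quality.csv")

# Basic structure: sample count, parameter types, and missing values per parameter.
print(df.shape)
print(df.dtypes)
print(df.isna().sum().sort_values(ascending=False))

# Summary statistics for each water quality parameter.
print(df.describe().T[["mean", "std", "min", "max"]])
```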
2.2. Machine Learning Algorithms
2.2.1. Single Models
1. Adaptive Boosting (ADA)
2. Logistic Regression (LR)
3. Gaussian Naive Bayes (GNB)
4. K-Nearest Neighbors (KNNs)
5. Support Vector Machine (SVM)
6. Gradient Boosting Decision Tree (GBDT)
7. Multi-Layer Perceptron (MLP)
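The seven baseline classifiers listed above can be instantiated with scikit-learn as in the minimal sketch below; default settings are shown for brevity, whereas the study searches the hyperparameter ranges reported in the table later in this paper.

```python
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# One instance per baseline model; these defaults are illustrative, not the tuned values.
single_models = {
    "ADA": AdaBoostClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "GNB": GaussianNB(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(probability=True),      # probability=True is needed for AUC
    "GBDT": GradientBoostingClassifier(),
    "MLP": MLPClassifier(max_iter=500),
}
```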
2.2.2. BS-FAMLP Hybrid Model Building
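The outline and the ablation study indicate that BS-FAMLP combines a Bagging layer of GBDT learners, a Stacking layer of GBDT learners, and an MLP head, with a feature-weighted attention step added on top of BS-MLP. The exact wiring between the layers is not reproduced here; the following is only a rough sketch of the BS-MLP backbone under the assumption that the bagging-layer class probabilities feed the stacking layer, whose outputs feed the MLP.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

def build_bs_mlp(X_train, y_train):
    """Rough sketch of a bagging + stacking + MLP pipeline (BS-MLP backbone)."""
    # Bagging layer: nine GBDTs fitted on bootstrap resamples of the training data.
    bagging_gbdts = []
    for _ in range(9):
        Xb, yb = resample(X_train, y_train)  # bootstrap sample with replacement
        bagging_gbdts.append(GradientBoostingClassifier().fit(Xb, yb))
    bag_features = np.column_stack(
        [m.predict_proba(X_train)[:, 1] for m in bagging_gbdts]
    )

    # Stacking layer: five GBDTs trained on the bagging-layer outputs.
    stacking_gbdts = [
        GradientBoostingClassifier().fit(bag_features, y_train) for _ in range(5)
    ]
    stack_features = np.column_stack(
        [m.predict_proba(bag_features)[:, 1] for m in stacking_gbdts]
    )

    # MLP head combining the stacking-layer outputs.
    mlp = MLPClassifier(hidden_layer_sizes=(64, 128), max_iter=500)
    mlp.fit(stack_features, y_train)
    return bagging_gbdts, stacking_gbdts, mlp
```

In a practical implementation, out-of-fold predictions (rather than in-sample probabilities) should feed each subsequent layer to avoid information leakage; the attention variants are sketched separately under the ablation study.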
2.3. Model Evaluation Criteria
1. Accuracy
2. Precision
3. Recall
4. F1 Score
5. AUC
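All five criteria are available in scikit-learn; the sketch below assumes y_true, y_pred, and y_score are the true labels, predicted labels, and predicted positive-class probabilities of a binary classifier.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Return the five evaluation criteria used in this study."""
    return {
        "Accuracy": accuracy_score(y_true, y_pred),
        "Precision": precision_score(y_true, y_pred),
        "Recall": recall_score(y_true, y_pred),
        "F1 Score": f1_score(y_true, y_pred),
        "AUC": roc_auc_score(y_true, y_score),  # uses probabilities, not hard labels
    }
```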
3. Experiments
3.1. Hardware and Software Configuration
3.2. Dataset Description and Preprocessing
3.2.1. Data Preprocessing
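The preprocessing routine itself is not reproduced in this outline. As one hedged illustration only, a median-imputation plus IQR-based outlier-clipping pass could look like the sketch below; the imputation and outlier rules actually used in the study may differ.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, feature_cols) -> pd.DataFrame:
    """Illustrative preprocessing: median imputation and Tukey-fence clipping."""
    out = df.copy()
    for col in feature_cols:
        # Fill missing values with the column median (one simple choice).
        out[col] = out[col].fillna(out[col].median())
        # Clip outliers to the 1.5 * IQR fences.
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out
```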
3.2.2. Evaluation of Data Preprocessing
3.2.3. Data Analysis
3.2.4. Data Distribution Analysis
3.2.5. Correlation Analysis
3.3. Data Normalization
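A minimal normalization sketch, assuming min-max scaling to [0, 1] fitted on the training split only; the exact normalization scheme used in the study may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Dummy feature matrix standing in for the water quality parameters.
X = np.random.rand(200, 15) * 100
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = MinMaxScaler()                       # maps each feature to [0, 1]
X_train_norm = scaler.fit_transform(X_train)  # fit on the training split only
X_test_norm = scaler.transform(X_test)        # apply the same scaling to the test split
```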
4. Results and Discussion
4.1. Single Model Classification Results
4.2. Classification Results of the Hybrid Model
4.2.1. Model Training
4.2.2. Ablation Study
- (1) Model B-MLP: a hybrid of two learners, the Bagging layer and the MLP.
- (2) Model BS-MLP: all three learners fused (Bagging layer, Stacking layer, and MLP).
- (3) Model BS-FAMLP: the BS-MLP model with a feature-weighted attention mechanism added.
- (4) Model BS-SAMLP: the BS-MLP model with a self-attention mechanism added.
- (5) Model BS-MAMLP: the BS-MLP model with a multi-head attention mechanism added (a Keras sketch of the three attention variants follows this list).
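The exact placement of the attention blocks inside BS-FAMLP, BS-SAMLP, and BS-MAMLP is not spelled out in this outline. The sketch below only illustrates, in Keras (the stated experimental platform), how the three attention variants compared in the ablation can be attached in front of an MLP head; the layer sizes, the single/four-head split, and the treatment of each feature as a token are assumptions.

```python
import keras
from keras import layers

def feature_weighted_attention(x):
    """FA variant: learn one weight per input feature and rescale the features."""
    n_features = x.shape[-1]
    weights = layers.Dense(n_features, activation="softmax")(x)  # per-feature weights
    return layers.Multiply()([x, weights])

def token_attention(x, num_heads):
    """SA/MA variants: treat each feature as a token and apply (multi-head) self-attention."""
    tokens = layers.Reshape((x.shape[-1], 1))(x)   # (batch, n_features, 1)
    attended = layers.MultiHeadAttention(num_heads=num_heads, key_dim=8)(tokens, tokens)
    return layers.Flatten()(attended)

def build_head(n_features, attention="fa"):
    """MLP head with an optional attention block in front (illustrative sizes only)."""
    inputs = keras.Input(shape=(n_features,))
    if attention == "fa":        # feature-weighted attention (BS-FAMLP)
        x = feature_weighted_attention(inputs)
    elif attention == "sa":      # self-attention (BS-SAMLP)
        x = token_attention(inputs, num_heads=1)
    elif attention == "ma":      # multi-head attention (BS-MAMLP)
        x = token_attention(inputs, num_heads=4)
    else:                        # no attention (B-MLP / BS-MLP)
        x = inputs
    x = layers.Dense(64, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return keras.Model(inputs, outputs)
```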
4.2.3. Test Set Performance Evaluation
5. Conclusions
- Parameter Optimization and Algorithm Selection: Explore more efficient optimization algorithms to prevent local optima and enhance model performance.
- Data Preprocessing: Adopt more advanced missing value imputation and outlier detection techniques to improve data integrity and reliability.
- Imbalanced Data Handling: Use ensemble learning or generative adversarial networks (GANs) to increase minority class samples and improve classification performance.
- Model Lightweighting: Simplify the model structure to increase operational efficiency, making it easier to deploy in resource-constrained environments for real-time classification.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Kumar, P. Simulation of Gomti River (Lucknow City, India) future water quality under different mitigation strategies. Heliyon 2018, 4, 1074. [Google Scholar] [CrossRef] [PubMed]
- Ahmed, J.; Wong, L.P.; Chua, Y.P.; Channa, N.; Mahar, R.B.; Yasmin, A.; VanDerslice, J.A.; Garn, J.V. Quantitative Microbial Risk Assessment of Drinking Water Quality to Predict the Risk of Waterborne Diseases in Primary-School Children. Int. J. Environ. Res. Public Health 2020, 17, 2774. [Google Scholar] [CrossRef]
- Tleuova, Z.; Snow, D.D.; Mukhamedzhanov, M.; Ermenbay, A. Relation of hydrogeology and contaminant sources to drinking water quality in southern Kazakhstan. Water 2023, 15, 4240. [Google Scholar] [CrossRef]
- Zhu, M.; Wang, J.; Yang, X.; Zhang, Y.; Zhang, L.; Ren, H.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116. [Google Scholar] [CrossRef] [PubMed]
- Mahgoub, H.A. Extraction techniques for determination of polycyclic aromatic hydrocarbons in water samples. Int. J. Sci. Res. 2013, 1, 268–272. [Google Scholar]
- Hu, C.; Dong, B.; Shao, H.; Zhang, J.; Wang, Y. Toward purifying defect feature for multilabel sewer defect classification. IEEE Trans. Instrum. Meas. 2023, 72, 5008611. [Google Scholar] [CrossRef]
- Kang, J.-K.; Lee, D.; Muambo, K.E.; Choi, J.-W.; Oh, J.-E. Development of an embedded molecular structure-based model for prediction of micropollutant treatability in a drinking water treatment plant by machine learning from three years monitoring data. Water Res. 2023, 239, 120037. [Google Scholar] [CrossRef]
- Uddin, M.G.; Nash, S.; Rahman, A.; Olbert, A.I. Performance analysis of the water quality index model for predicting water state using machine learning techniques. Process Saf. Environ. Prot. 2023, 169, 808–828. [Google Scholar] [CrossRef]
- Muharemi, F.; Logofătu, D.; Leon, F. Machine learning approaches for anomaly detection of water quality on a real-world data set. J. Inf. Telecommun. 2019, 3, 294–307. [Google Scholar] [CrossRef]
- Pedro-Monzonís, M.; Solera, A.; Ferrer, J.; Estrela, T.; Paredes-Arquiola, J. A review of water scarcity and drought indexes in water resources planning and management. J. Hydrol. 2015, 527, 482–493. [Google Scholar] [CrossRef]
- Memon, A.G.; Mustafa, A.; Raheem, A.; Ahmad, J.; Giwa, A.S. Impact of effluent discharge on recreational beach water quality: A case study of Karachi-Pakistan. J. Coast. Conserv. 2021, 25, 37. [Google Scholar] [CrossRef]
- Saghebian, S.M.; Sattari, M.T.; Mirabbasi, R.; Pal, M. Ground water quality classification by decision tree method in Ardebil region, Iran. Arab. J. Geosci. 2013, 7, 4767–4777. [Google Scholar] [CrossRef]
- Muhammad, S.Y.; Makhtar, M.; Rozaimee, A.; Aziz, A.A.; Jamal, A.A. Classification model for water quality using machine learning techniques. Int. J. Softw. Eng. Appl. 2015, 9, 45–52. [Google Scholar] [CrossRef]
- Rizeei, H.M.; Azeez, O.S.; Pradhan, B.; Khamees, H.H. Assessment of groundwater nitrate contamination hazard in a semi-arid region by using integrated parametric IPNOA and data-driven logistic regression models. Environ. Monit. Assess. 2018, 190, 633. [Google Scholar] [CrossRef] [PubMed]
- Nong, X.; Shao, D.; Zhong, H.; Liang, J. Evaluation of water quality in the South-to-North Water Diversion Project of China using the water quality index (WQI) method. Water Res. 2020, 178, 115781. [Google Scholar] [CrossRef]
- Nafouanti, M.B.; Li, J.; Mustapha, N.A.; Uwamungu, P.; AL-Alimi, D. Prediction on the fluoride contamination in groundwater at the Datong Basin, Northern China: Comparison of random forest, logistic regression and artificial neural network. Appl. Geochem. 2021, 132, 105054. [Google Scholar] [CrossRef]
- Huang, Y.; Ding, L.; Liu, W.; Niu, H.; Yang, M.; Lyu, G.; Lin, S.; Hu, Q. Groundwater contamination site identification based on machine learning: A case study of gas stations in China. Water 2023, 15, 1326. [Google Scholar] [CrossRef]
- Chatterjee, T.; Gogoi, U.R.; Samanta, A.; Chatterjee, A.; Singh, M.K.; Pasupuleti, S. Identifying the Most Discriminative Parameter for Water Quality Prediction Using Machine Learning Algorithms. Water 2024, 16, 481. [Google Scholar] [CrossRef]
- Singh, Y.; Walingo, T. Smart Water Quality Monitoring with IoT Wireless Sensor Networks. Sensors 2024, 24, 2871. [Google Scholar] [CrossRef]
- Hosmer, D.W.; Lemeshow, S.; Sturdivant, R.X. Applied Logistic Regression; John Wiley & Sons: Hoboken, NJ, USA, 2013. [Google Scholar]
- Guns, M.; Vanacker, V. Logistic regression applied to natural hazards: Rare event logistic regression with replications. Nat. Hazards Earth Syst. Sci. 2012, 12, 1937–1947. [Google Scholar] [CrossRef]
- Zhang, H. The optimality of naive Bayes. In The Florida AI Research Society, Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), Miami Beach, FL, USA, 12–14 May 2004; The AAAI Press: Menlo Park, CA, USA, 2004; ISBN 978-1-57735-201-3. [Google Scholar]
- Beyer, K.; Goldstein, J.; Ramakrishnan, R.; Shaft, U. When is “nearest neighbor” meaningful? In Proceedings of the International Conference on Database Theory, Jerusalem, Israel, 10–12 January 1999; pp. 217–235. [Google Scholar]
- Tong, S.; Koller, D. Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2001, 2, 45–66. [Google Scholar]
- Zhang, H.; Zou, Q.; Ju, Y.; Song, C.; Chen, D. Distance-based support vector machine to predict DNA N6-methyladenine modification. Curr. Bioinform. 2022, 17, 473–482. [Google Scholar] [CrossRef]
- Friedman, J.H. Stochastic gradient boosting. Comput. Stat. Data Anal. 2002, 38, 367–378. [Google Scholar] [CrossRef]
- Huang, P.; Wang, L.; Hou, D.; Lin, W.; Yu, J.; Zhang, G.; Zhang, H. A feature extraction method based on the entropy-minimal description length principle and GBDT for common surface water pollution identification. J. Hydroinform. 2021, 23, 1050–1065. [Google Scholar] [CrossRef]
- Liang, W.; Luo, S.; Zhao, G.; Wu, H. Predicting hard rock pillar stability using GBDT, XGBoost, and LightGBM algorithms. Mathematics 2020, 8, 765. [Google Scholar] [CrossRef]
- Lin, H.-Y.; Lee, S.-H.; Wang, J.-H.; Chang, M.-J. Utilizing Artificial Intelligence Techniques for a Long–Term Water Resource Assessment in the ShihMen Reservoir for Water Resource Allocation. Water 2024, 16, 2346. [Google Scholar] [CrossRef]
- Günther, F.; Fritsch, S. Neuralnet: Training of neural networks. R J. 2010, 2, 30–38. [Google Scholar] [CrossRef]
- Pinkus, A. Approximation theory of the MLP model in neural networks. Acta Numer. 1999, 8, 143–195. [Google Scholar] [CrossRef]
- Zhou, Z.H. Ensemble Methods: Foundations and Algorithms; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
- McLaughlin, D.B. Assessing the predictive performance of risk-based water quality criteria using decision error estimates from receiver operating characteristics (ROC) analysis. Integr. Environ. Assess. Manag. 2012, 8, 674–684. [Google Scholar] [CrossRef]
- Sokolova, M.; Japkowicz, N.; Szpakowicz, S. Beyond Accuracy, F-Score, and ROC: A Family of Discriminant Measures for Performance Evaluation. In AI 2006: Advances in Artificial Intelligence, Proceedings of the Australasian Joint Conference on Artificial Intelligence, Hobart, Australia, 4–6 December 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 1015–1021. [Google Scholar]
- Goutte, C.; Gaussier, E. A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation. In Advances in Information Retrieval, Proceedings of the 27th European Conference on IR Research, ECIR 2005, Santiago de Compostela, Spain, 21–23 March 2005; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2005; pp. 345–359. [Google Scholar]
- Gazzaz, N.M.; Yusoff, M.K.; Aris, A.Z.; Juahir, H.; Ramli, M.F. Artificial neural network modeling of the water quality index for Kinta River (Malaysia) using water quality variables as predictors. Mar. Pollut. Bull. 2012, 64, 2409–2420. [Google Scholar] [CrossRef]
- Liaw, A.; Wiener, M. Classification and regression by randomForest. R News 2002, 2, 18–22. [Google Scholar]
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Alnaqeb, R.; Alrashdi, F.; Alketbi, K.; Ismail, H. Machine learning-based water potability prediction. In Proceedings of the 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 5–8 December 2022; IEEE Computer Society: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
- Zhu, X.; Khosravi, M.; Vaferi, B.; Nait Amar, M.; Ghriga, M.A.; Mohammed, A.H. Application of machine learning methods for estimating and comparing the sulfur dioxide absorption capacity of a variety of deep eutectic solvents. J. Clean. Prod. 2022, 363, 132465. [Google Scholar] [CrossRef]
- Jayalakshmi, T.; Santhakumaran, A. Statistical normalization and back propagation for classification. Int. J. Comput. Theory Eng. 2011, 3, 1793–8201. [Google Scholar]
Research | Methodology | Algorithms | Main Findings |
---|---|---|---|
S. Mahdi Saghebian et al. (2013) [12] | Combined DT with the USSL diagram; combined various soft computing techniques; used statistical analysis to determine key chemical parameters. | DT, PCA | DT is highly accurate and efficient and outperforms PCA. |
Salisu Yusuf Muhammad et al. (2015) [13] | Used various machine learning models to classify water quality; handled unbalanced datasets. | DT, SVM, GBT, ADA | Machine learning models demonstrated good adaptability in handling complex, high-dimensional data. |
H. M. Rizeei et al. (2018) [14] | Satellite image classification; combined parametric and multivariate data-driven models with GIS to assess groundwater nitrate pollution. | IPNOA, LR, SVM | The integrated IPNOA-LR model accurately predicts and analyzes nitrate hazard concentrations. |
Xizhi Nong et al. (2020) [15] | Proposed a WQImin model based on stepwise MLR. | MLR | Provided a practical tool for large-scale water quality management. |
Mouigni Baraka Nafouanti et al. (2021) [16] | Compared three methods for evaluating groundwater fluoride contamination. | RF, ANN, LR | RF achieved the highest accuracy and identified key variables influencing fluoride concentration. |
Yanpeng Huang et al. (2023) [17] | Compared six classic machine learning algorithms for identifying TPH contamination; introduced BO to optimize the GBDT model. | LR, DT, GBDT, RF, MLP, SVM | The GBDT model performed best; BO-GBDT significantly reduced training time. |
Tapan Chatterjee et al. (2024) [18] | Investigated the most effective parameters for water quality classification. | ANN, LR, KNN | Using only the Cr parameter achieved 91.67% classification accuracy. |
Algorithm | Hyperparameter | Description | Range |
---|---|---|---|
ADA | n_estimators | The number of weak classifiers to be combined | Int [10, 500] |
learning_rate | Shrinks the contribution of each classifier | Float [0.01, 1] | |
LR | C | Regularization strength | Float [0.0001, 10] |
solver | The optimization algorithm used for fitting the model | [lbfgs, saga, liblinear] | |
GNB | var_smoothing | Fraction of the largest feature variance added to all variances for numerical stability | Float [1 × 10⁻⁹, 1 × 10⁻⁵] |
KNN | n_neighbors | Number of neighbors to use | Int [1, 30] |
distance metric | The distance metric used to compute nearest neighbors | [Euclidean, Manhattan, Minkowski] |
SVM | C | Regularization parameter | Float [0.01, 1000] |
kernel | Determines the method for mapping input data into a higher-dimensional space | [linear, poly, rbf, sigmoid] | |
GBDT | learning_rate | Controls the contribution of each tree. | Float [0.001, 0.5] |
max_depth | Maximum depth of the individual estimators | Int [2, 15] | |
min_samples_leaf | Minimum number of samples required to be at a leaf node | Int [1, 20] | |
min_samples_split | Minimum number of samples required to split an internal node | Int [2, 20] | |
n_estimators | Number of boosting stages to perform | Int [2, 500] | |
MLP | hidden_layer_sizes | The number of neurons in each hidden layer | (8, 16), (16, 32), (32, 64), (64, 128), (128, 256) |
learning_rate | Learning rate schedule for weight updates | [constant, adaptive] |
activation function | Function used to introduce nonlinearity in the network | [ReLU, Tanh, Sigmoid] | |
solver | The solver for weight optimization | [adam, sgd] |
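The table above lists the search ranges, but this outline does not name the search strategy. Purely as one possible realization, a randomized search over the GBDT space could be set up as follows; the search method, n_iter, and scoring metric are assumptions.

```python
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV

# Search space taken from the GBDT rows of the hyperparameter table above.
gbdt_space = {
    "learning_rate": uniform(0.001, 0.499),   # Float [0.001, 0.5]
    "max_depth": randint(2, 16),              # Int [2, 15]
    "min_samples_leaf": randint(1, 21),       # Int [1, 20]
    "min_samples_split": randint(2, 21),      # Int [2, 20]
    "n_estimators": randint(2, 501),          # Int [2, 500]
}

search = RandomizedSearchCV(
    GradientBoostingClassifier(),
    param_distributions=gbdt_space,
    n_iter=50,            # number of sampled configurations (illustrative)
    scoring="accuracy",
    cv=5,
    n_jobs=-1,
)
# search.fit(X_train, y_train); best settings in search.best_params_
```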
Item | Configuration |
---|---|
Operating System | Windows 11 (64 bit) |
CPU | Intel i9 12900HX |
GPU | Nvidia RTX 4060 (The device was designed by Nvidia Corporation in Santa Clara, CA, USA, and manufactured by TSMC in Taiwan, China) |
RAM | 16 GB |
Hard Disk | 1 TB |
Programming Language | Python 3.11.0 |
Experimental Platform | Keras 3.3.3 |
Parameter | Number of Missing Values |
---|---|
pH | 2752 |
Iron | 937 |
Nitrate | 2566 |
Chloride | 4122 |
Lead | 601 |
Zinc | 3748 |
Turbidity | 1169 |
Fluoride | 4449 |
Copper | 4762 |
Odor | 4235 |
Sulfate | 4658 |
Conductivity | 3848 |
Chlorine | 1326 |
Manganese | 2592 |
Total Dissolved Solids | 43 |
Dataset | RMSE | MAE | R² |
---|---|---|---|
Raw Dataset | 0.326 | 0.220 | 0.393 |
Preprocessed Dataset | 0.280 | 0.156 | 0.579 |
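The RMSE, MAE, and R² criteria used to compare the raw and preprocessed datasets can be computed as in the sketch below; what is predicted to obtain these scores is not specified in this outline, so the function is generic.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def regression_scores(y_true, y_pred):
    """RMSE, MAE, and R² between observed and predicted values."""
    return {
        "RMSE": float(np.sqrt(mean_squared_error(y_true, y_pred))),
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }
```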
Parameters | WHO Limits | Avg. ± S.D. | Max | Min |
---|---|---|---|---|
pH | 6.5–8.5 | 7.43 ± 0.97 | 12.89 | 2.06 |
Iron (mg/L) | 0.3 | 0.10 ± 0.41 | 15.75 | 9.70 × 10⁻⁴¹
Nitrate (mg/L) | 10 | 6.30 ± 3.29 | 54.97 | 4.89 × 10⁻¹
Chloride (mg/L) | 250 | 188.37 ± 69.75 | 981.05 | 3.50 × 10
Lead (mg/L) | 0.01 | 0.00 ± 0.03 | 3.50 | 0
Zinc (mg/L) | 5 | 1.57 ± 1.53 | 21.78 | 4.97 × 10⁻⁶
Turbidity (NTU) | 5 | 0.50 ± 0.83 | 18.59 | 2.47 × 10⁻¹³
Fluoride (mg/L) | 1.5 | 1.00 ± 0.83 | 9.60 | 2.54 × 10⁻⁵
Copper (mg/L) | 2 | 0.53 ± 0.57 | 11.18 | 2.10 × 10⁻⁹
Odor (TON) | 3 | 1.87 ± 1.10 | 4.14 | 1.10 × 10⁻²
Sulfate (mg/L) | 250 | 148.29 ± 69.67 | 1015.27 | 1.62 × 10
Conductivity (µS/cm) | | 424.68 ± 187.10 | 1809.40 | 1.31 × 10
Chlorine (mg/L) | 4 | 3.30 ± 0.76 | 9.78 | 1.04
Manganese (mg/L) | 0.05 | 0.08 ± 0.42 | 14.74 | 7.46 × 10⁻⁴¹
Total Dissolved Solids (mg/L) | 500 | 272.80 ± 158.95 | 579.80 | 1.06 × 10⁻²
Range | Correlation Type |
---|---|
−1 ≤ r < −0.8 | Very strong indirect relationship |
−0.8 ≤ r < −0.6 | Strong indirect relationship |
−0.6 ≤ r < −0.4 | Moderate indirect relationship |
−0.4 ≤ r < −0.2 | Weak indirect relationship |
−0.2 ≤ r < 0 | Very weak indirect relationship |
r = 0 | No relationship |
0 < r ≤ 0.2 | Very weak direct relationship |
0.2 < r ≤ 0.4 | Weak direct relationship |
0.4 < r ≤ 0.6 | Moderate direct relationship |
0.6 < r ≤ 0.8 | Strong direct relationship |
0.8 < r ≤ 1 | Very strong direct relationship
Parameters | Correlation |
---|---|
Chloride | 0.213 |
Copper | 0.197 |
Odor | 0.189 |
Fluoride | 0.176 |
Turbidity | 0.173 |
Nitrate | 0.166 |
Chlorine | 0.163 |
Sulfate | 0.129 |
Total Dissolved Solids | 0.112 |
Manganese | 0.088 |
Iron | 0.076 |
Zinc | 0.069 |
Lead | 0.018 |
pH | −0.036 |
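A correlation ranking like the one above can be produced with pandas; in the sketch below, df and the column name "Target" are placeholders for the actual dataframe and class label.

```python
import pandas as pd

def target_correlations(df: pd.DataFrame, target: str = "Target") -> pd.Series:
    """Pearson correlation of each parameter with the class label, strongest first."""
    corr = df.corr(method="pearson", numeric_only=True)[target].drop(target)
    return corr.sort_values(ascending=False)
```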
Model | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
ADA | 0.6425 | 0.6501 | 0.9049 | 0.7566 | 0.7246 |
LR | 0.7267 | 0.7152 | 0.6792 | 0.6967 | 0.7812 |
GNB | 0.7117 | 0.8099 | 0.4919 | 0.6121 | 0.8614 |
KNN | 0.8079 | 0.8149 | 0.7562 | 0.7845 | 0.8644 |
SVM | 0.8370 | 0.8279 | 0.8172 | 0.8225 | 0.8965 |
MLP | 0.8460 | 0.8234 | 0.8488 | 0.8359 | 0.8879 |
GBDT | 0.8719 | 0.8438 | 0.8871 | 0.8649 | 0.9277 |
Layer | Model | learning_rate | max_depth | min_samples_leaf | min_samples_split | n_estimators |
---|---|---|---|---|---|---|
Bagging Layer | GBDT1 | 0.01 | 13 | 15 | 7 | 310 |
GBDT2 | 0.20 | 13 | 5 | 2 | 340 | |
GBDT3 | 0.20 | 15 | 20 | 4 | 500 | |
GBDT4 | 0.01 | 15 | 20 | 20 | 500 | |
GBDT5 | 0.20 | 15 | 1 | 2 | 331 | |
GBDT6 | 0.01 | 15 | 20 | 18 | 500 | |
GBDT7 | 0.08 | 14 | 13 | 6 | 365 | |
GBDT8 | 0.16 | 13 | 1 | 13 | 500 | |
GBDT9 | 0.09 | 15 | 1 | 2 | 480 | |
Stacking Layer | GBDT1 | 0.08 | 12 | 18 | 4 | 51 |
GBDT2 | 0.01 | 11 | 20 | 4 | 492 | |
GBDT3 | 0.09 | 15 | 7 | 15 | 500 | |
GBDT4 | 0.01 | 10 | 12 | 2 | 500 | |
GBDT5 | 0.05 | 8 | 19 | 5 | 183 | |
MLP | activation: ReLU, alpha: 0.0001, learning_rate: 0.001, hidden_layer_sizes: (64, 128), max_iter: 500, solver: adam |
Layer | Model | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|---|
Bagging Layer | GBDT1 | 0.8933 | 0.8581 | 0.9436 | 0.8988 |
GBDT2 | 0.9042 | 0.8646 | 0.9563 | 0.9081 | |
GBDT3 | 0.8958 | 0.8549 | 0.9512 | 0.9005 | |
GBDT4 | 0.8983 | 0.8613 | 0.9511 | 0.9040 | |
GBDT5 | 0.8908 | 0.8438 | 0.9501 | 0.8938 | |
GBDT6 | 0.9133 | 0.8787 | 0.9602 | 0.9177 | |
GBDT7 | 0.9029 | 0.8919 | 0.9473 | 0.9187 | |
GBDT8 | 0.8988 | 0.8687 | 0.9424 | 0.9041 | |
GBDT9 | 0.8983 | 0.8570 | 0.9462 | 0.8994 | |
Stacking Layer | GBDT1 | 0.9363 | 0.9109 | 0.9541 | 0.9320 |
GBDT2 | 0.9367 | 0.9127 | 0.9587 | 0.9352 | |
GBDT3 | 0.9361 | 0.9103 | 0.9645 | 0.9367 | |
GBDT4 | 0.9365 | 0.9108 | 0.9671 | 0.9381 | |
GBDT5 | 0.9365 | 0.9109 | 0.9539 | 0.9319 | |
BS-MLP | 0.9435 | 0.9327 | 0.9513 | 0.9419 | |
BS-FAMLP | 0.9629 | 0.9563 | 0.9570 | 0.9567 |
Model | K Fold | Accuracy | Precision | Recall | F1 Score | Run Time (s) |
---|---|---|---|---|---|---|
B-MLP | 1-fold | 0.9362 | 0.9385 | 0.9338 | 0.9361 | 315.41 |
2-fold | 0.9370 | 0.9373 | 0.9363 | 0.9368 | 392.19 | |
3-fold | 0.9368 | 0.9372 | 0.9364 | 0.9368 | 602.99 | |
4-fold | 0.9376 | 0.9351 | 0.9403 | 0.9378 | 326.07 | |
5-fold | 0.9382 | 0.9376 | 0.9390 | 0.9383 | 386.35 | |
BS-MLP | 1-fold | 0.9446 | 0.9338 | 0.9514 | 0.9425 | 514.04 |
2-fold | 0.9412 | 0.9242 | 0.9513 | 0.9376 | 441.88 | |
3-fold | 0.9497 | 0.9331 | 0.9604 | 0.9466 | 542.89 | |
4-fold | 0.9407 | 0.9321 | 0.9436 | 0.9378 | 538.61 | |
5-fold | 0.9411 | 0.9403 | 0.9500 | 0.9451 | 488.52 | |
BS-FAMLP | 1-fold | 0.9591 | 0.9545 | 0.9546 | 0.9545 | 2240.13 |
2-fold | 0.9667 | 0.9521 | 0.9636 | 0.9578 | 2682.16 | |
3-fold | 0.9634 | 0.9615 | 0.9511 | 0.9563 | 2497.14 | |
4-fold | 0.9596 | 0.9539 | 0.9601 | 0.9570 | 2419.21 | |
5-fold | 0.9657 | 0.9597 | 0.9556 | 0.9576 | 2521.12 | |
BS-SAMLP | 1-fold | 0.9540 | 0.9486 | 0.9597 | 0.9541 | 1532.05 |
2-fold | 0.9549 | 0.9505 | 0.9591 | 0.9548 | 1935.67 | |
3-fold | 0.9532 | 0.9471 | 0.9596 | 0.9533 | 1839.56 | |
4-fold | 0.9537 | 0.9473 | 0.9606 | 0.9539 | 1601.11 | |
5-fold | 0.9572 | 0.9560 | 0.9585 | 0.9572 | 1565.66 | |
BS-MAMLP | 1-fold | 0.9598 | 0.9681 | 0.9702 | 0.9691 | 10,036.05 |
2-fold | 0.9645 | 0.9767 | 0.9701 | 0.9734 | 10,933.45 | |
3-fold | 0.9634 | 0.9770 | 0.9702 | 0.9736 | 11,871.04 | |
4-fold | 0.9664 | 0.9758 | 0.9718 | 0.9738 | 10,752.75 | |
5-fold | 0.9673 | 0.9749 | 0.9789 | 0.9769 | 11,567.36 |
Model | Accuracy | Precision | Recall | F1 Score | AUC |
---|---|---|---|---|---|
B-MLP | 0.9337 | 0.9293 | 0.9277 | 0.9285 | 0.9610 |
BS-MLP | 0.9423 | 0.9384 | 0.9368 | 0.9376 | 0.9715 |
BS-FAMLP | 0.9616 | 0.9524 | 0.9655 | 0.9589 | 0.9834 |
BS-SAMLP | 0.9535 | 0.9593 | 0.9597 | 0.9588 | 0.9822 |
BS-MAMLP | 0.9632 | 0.9507 | 0.9711 | 0.9608 | 0.9875 |