Multiple Linear Regression and Machine Learning for Predicting the Drinking Water Quality Index in Al-Seine Lake
Abstract
1. Introduction
2. Materials and Methods
2.1. Study Area
2.2. Sample Collection and Analysis
2.3. Water Quality Index (WQI)
2.4. Multiple Linear Regression (MLR)
2.5. Machine Learning Models (ML)
3. Results and Discussion
3.1. Evaluation of Water Quality Parameters
3.1.1. pH
3.1.2. Sulfate (SO₄²⁻)
3.1.3. Nitrate (NO₃⁻)
3.1.4. Nitrite (NO₂⁻)
- Agricultural pollution: Nitrates, ammonium, and urea are applied to the soil as nitrogen fertilizers for crops, and water carrying nitrates and nitrites can seep into the groundwater. Nitrates from agricultural fertilizers can also be converted into nitrites through bacterial processes.
- Industrial pollution: Chemicals contained in raw materials or used in industrial processes can leak into the soil and groundwater, and from there into surface water sources. Industries that use nitrates, such as the manufacture of fertilizers, insecticides, and other chemicals, can contribute to nitrite pollution in surface water and groundwater.
- Sewage: Sewage, together with animal waste from livestock and poultry facilities, can seep into groundwater and surface water sources, increasing the concentrations of nitrates and nitrites.
- Air pollution: Rain, snow, and airborne spray can carry industrial and agricultural pollutants into water sources, leading to increased concentrations of nitrates and nitrites in surface water.
- Pharmaceutical pollution: Some drugs can enter the soil and groundwater from various sources and may subsequently be detected in surface water and groundwater.
3.1.5. Ammonium (NH₄⁺)
3.1.6. Phosphate (PO₄³⁻)
3.1.7. Turbidity
3.1.8. Electrical Conductivity (EC)
3.2. Multiple Linear Regression (MLR)
3.3. Machine Learning Models (ML)
4. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
References
Descriptive Statistics
Parameter | N | Minimum | Maximum | Mean | Std. Deviation |
---|---|---|---|---|---|
WQI | 530 | 15.969 | 69.499 | 23.6427 | 4.24165 |
pH | 530 | 7.00 | 8.90 | 7.7704 | 0.17769 |
SO₄²⁻ (mg/L) | 530 | 6.00 | 18.00 | 11.0811 | 2.19447 |
NO₃⁻ (mg/L) | 530 | 0.66 | 12.00 | 3.3711 | 0.89827 |
NO₂⁻ (mg/L) | 530 | 0.00 | 3.20 | 0.0212 | 0.13915 |
NH₄⁺ (mg/L) | 530 | 0.00 | 0.05 | 0.0023 | 0.00592 |
PO₄³⁻ (mg/L) | 530 | 0.01 | 0.90 | 0.2053 | 0.09166 |
Turbidity (NTU) | 530 | 0.38 | 9.94 | 1.9497 | 1.00350 |
EC (µS/cm) | 530 | 445.00 | 503.00 | 477.7830 | 7.92975 |
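The summary columns in the table above are standard descriptive statistics. The sketch below is an illustration, not the authors' procedure: it computes the same five quantities for one parameter, assuming that "Std. Deviation" denotes the sample standard deviation (n − 1 denominator), as is usual in such tables. The five pH readings are hypothetical values chosen only for the example.

```python
import statistics


def describe(values):
    """Return N, minimum, maximum, mean and standard deviation of a list of readings."""
    return {
        "N": len(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Mean": statistics.mean(values),
        # Assumption: the table reports the sample standard deviation,
        # which statistics.stdev computes with an n - 1 denominator.
        "Std. Deviation": statistics.stdev(values),
    }


# Hypothetical pH readings, for illustration only (not the study data)
ph_readings = [7.6, 7.8, 7.7, 7.9, 7.8]
summary = describe(ph_readings)
for key, value in summary.items():
    print(f"{key}: {value:.4f}" if isinstance(value, float) else f"{key}: {value}")
```

If the population standard deviation were intended instead, `statistics.pstdev` would replace `statistics.stdev`; with 530 samples the two differ by less than 0.1%.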
Parameter | Water Quality Standard (Si) | Assigned Weight (wi) | Relative Weight (Wi) |
---|---|---|---|
pH | 6.5–8.5 | 4 | 0.1212 |
SO₄²⁻ (mg/L) | 250 | 4 | 0.1212 |
NO₃⁻ (mg/L) | 50 | 5 | 0.1515 |
NO₂⁻ (mg/L) | 1 | 5 | 0.1515 |
NH₄⁺ (mg/L) | 0.50 | 3 | 0.0910 |
PO₄³⁻ (mg/L) | 0.50 | 4 | 0.1212 |
Turbidity (NTU) | 5 | 4 | 0.1212 |
EC (µS/cm) | 1000 | 4 | 0.1212 |
Total | | 33 | 1.0000 |
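The relative weights in the weighting table follow Wi = wi / Σwi, with Σwi = 33. The sketch below combines them into a weighted arithmetic index, WQI = Σ Wi·qi with qi = 100·(Vi − Videal)/(Si − Videal). This formulation is an assumption made for the sketch, as are the ideal values: 7.0 for pH (with Si = 8.5, the upper limit of the 6.5–8.5 range) and 0 for every other parameter. The dictionary keys are hypothetical names. Under these assumptions, the mean concentrations from the descriptive statistics give a WQI of about 23.64, close to the reported mean of 23.6427. Because this index is linear in each parameter, the WQI of the mean sample equals the mean WQI.

```python
# Assigned weights (wi) and standards (Si) copied from the weighting table.
# Assumption: ideal values are 7.0 for pH and 0 for all other parameters,
# and pH uses the upper limit 8.5 of its 6.5-8.5 standard.
PARAMETERS = {
    # name: (assigned weight wi, standard Si, ideal value)
    "pH": (4, 8.5, 7.0),
    "SO4": (4, 250.0, 0.0),
    "NO3": (5, 50.0, 0.0),
    "NO2": (5, 1.0, 0.0),
    "NH4": (3, 0.50, 0.0),
    "PO4": (4, 0.50, 0.0),
    "Turbidity": (4, 5.0, 0.0),
    "EC": (4, 1000.0, 0.0),
}


def relative_weights(params):
    """Wi = wi / sum(wi); the assigned weights sum to 33."""
    total = sum(w for w, _, _ in params.values())
    return {name: w / total for name, (w, _, _) in params.items()}


def wqi(sample, params=PARAMETERS):
    """Assumed weighted arithmetic WQI: sum of Wi * qi,
    with qi = 100 * (Vi - Videal) / (Si - Videal)."""
    weights = relative_weights(params)
    index = 0.0
    for name, (_, standard, ideal) in params.items():
        q = 100.0 * (sample[name] - ideal) / (standard - ideal)
        index += weights[name] * q
    return index


# Mean values from the descriptive statistics table
mean_sample = {
    "pH": 7.7704, "SO4": 11.0811, "NO3": 3.3711, "NO2": 0.0212,
    "NH4": 0.0023, "PO4": 0.2053, "Turbidity": 1.9497, "EC": 477.7830,
}
print(round(wqi(mean_sample), 2))  # about 23.64
```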
Water Quality Index | 0–25 | 26–50 | 51–75 | 76–100 | Above 100 |
---|---|---|---|---|---|
Water Quality | Excellent | Good | Fair | Poor | Very Poor |
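The classification bands can be applied directly to computed index values. In this minimal sketch, the table's integer ranges (0–25, 26–50, ...) are interpreted as ≤ 25, ≤ 50, ≤ 75 and ≤ 100, so that non-integer values such as 25.5 fall into a defined class. This interpretation is an assumption; the table does not state it. Applied to the minimum, mean and maximum WQI from the descriptive statistics, the sketch assigns the classes Excellent, Excellent and Fair.

```python
def classify_wqi(value):
    """Map a WQI value to the class names of the classification table.

    Assumption: each upper bound is inclusive, so the gaps between the
    integer ranges (e.g. 25 < WQI < 26) belong to the next class.
    """
    for upper, label in ((25, "Excellent"), (50, "Good"), (75, "Fair"), (100, "Poor")):
        if value <= upper:
            return label
    return "Very Poor"


# Minimum, mean and maximum WQI from the descriptive statistics
for v in (15.969, 23.6427, 69.499):
    print(v, classify_wqi(v))
```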
MLR Model | B | t | Sig. | Correlation Coefficient R | Determination Coefficient R² | VIF (Variance Inflation Factor) | F | Sig. (F) |
---|---|---|---|---|---|---|---|---|
(Constant) | −56.751 | −124.833 | 0.00 | 0.999 | 0.999 | | 69,855.695 | 0.00 |
pH | 8.227 | 217.177 | 0.00 | | | 1.057 | | |
NO₃⁻ | 0.304 | 37.335 | 0.00 | | | 1.244 | | |
NO₂⁻ | 15.167 | 289.537 | 0.00 | | | 1.239 | | |
PO₄³⁻ | 24.370 | 337.218 | 0.00 | | | 1.024 | | |
Turbidity | 2.428 | 365.470 | 0.00 | | | 1.037 | | |
EC | 0.011 | 13.183 | 0.00 | | | 1.070 | | |
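The B column of the MLR table defines a linear prediction equation: WQI = −56.751 + 8.227·pH + 0.304·NO₃⁻ + 15.167·NO₂⁻ + 24.370·PO₄³⁻ + 2.428·Turbidity + 0.011·EC. The sketch below evaluates this equation for a single sample. The dictionary keys are hypothetical names chosen for the example. At the mean values from the descriptive statistics it returns about 23.515, close to the reported mean WQI of 23.6427. The small gap is plausibly due to rounding of the published coefficients; for example, the EC coefficient is given to only three decimal places and multiplies values near 478.

```python
# Coefficients (B) copied from the MLR table; SO4 and NH4 do not appear in the model.
# The dictionary keys are hypothetical names used only in this sketch.
COEFFICIENTS = {
    "pH": 8.227,
    "NO3": 0.304,
    "NO2": 15.167,
    "PO4": 24.370,
    "Turbidity": 2.428,
    "EC": 0.011,
}
INTERCEPT = -56.751


def predict_wqi(sample):
    """Evaluate WQI = B0 + sum(Bj * xj) for one sample
    (ions in mg/L, turbidity in NTU, EC in µS/cm)."""
    return INTERCEPT + sum(b * sample[name] for name, b in COEFFICIENTS.items())


# Mean values from the descriptive statistics table
mean_sample = {"pH": 7.7704, "NO3": 3.3711, "NO2": 0.0212,
               "PO4": 0.2053, "Turbidity": 1.9497, "EC": 477.7830}
print(round(predict_wqi(mean_sample), 3))  # about 23.515
```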
ID | Model | MAE | MSE | RMSE | R² | RMSLE | MAPE |
---|---|---|---|---|---|---|---|
lr | Linear Regression | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 |
lar | Least Angle Regression | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 |
br | Bayesian Ridge | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.0000 | 0.0000 |
ridge | Ridge Regression | 0.6266 | 5.7816 | 1.3336 | 0.8822 | 0.0406 | 0.0235 |
et | Extra Trees Regressor | 0.6691 | 7.7039 | 1.6521 | 0.8227 | 0.0514 | 0.0240 |
gbr | Gradient Boosting Regressor | 0.6713 | 7.7261 | 1.6444 | 0.8216 | 0.0505 | 0.0240 |
xgboost | Extreme Gradient Boosting | 0.6906 | 7.8954 | 1.7136 | 0.8092 | 0.0531 | 0.0249 |
rf | Random Forest Regressor | 0.8350 | 8.2094 | 1.8274 | 0.7900 | 0.0589 | 0.0312 |
huber | Huber Regressor | 1.2459 | 10.5125 | 2.1805 | 0.6997 | 0.0772 | 0.0492 |
dt | Decision Tree Regressor | 1.2773 | 9.9978 | 2.3195 | 0.6650 | 0.0782 | 0.0501 |
lightgbm | Light Gradient Boosting Machine | 1.0778 | 9.8756 | 2.3649 | 0.6560 | 0.0762 | 0.0403 |
ada | AdaBoost Regressor | 1.4226 | 10.1218 | 2.4847 | 0.6166 | 0.0871 | 0.0579 |
lasso | Lasso Regression | 2.2606 | 13.7407 | 3.3067 | 0.2884 | 0.1212 | 0.0941 |
llar | Lasso Least Angle Regression | 2.2606 | 13.7407 | 3.3067 | 0.2884 | 0.1212 | 0.0941 |
en | Elastic Net | 2.3091 | 13.9869 | 3.3621 | 0.2580 | 0.1231 | 0.0961 |
knn | K Neighbors Regressor | 2.4627 | 14.7339 | 3.5339 | 0.1645 | 0.1295 | 0.1011 |
omp | Orthogonal Matching Pursuit | 2.7662 | 17.7386 | 3.9547 | −0.0604 | 0.1456 | 0.1154 |
dummy | Dummy Regressor | 2.7546 | 17.9070 | 3.9646 | −0.0620 | 0.1462 | 0.1152 |
par | Passive Aggressive Regressor | 3.6783 | 28.3909 | 5.0279 | −0.9922 | 0.1892 | 0.1473 |
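The model comparison reports six error metrics for each regressor. The function below is a minimal sketch of the usual textbook definitions. It expresses MAPE as a fraction rather than a percentage, consistent with the magnitudes in the table, and computes RMSLE on log(1 + y). The exact variants used by the modelling library are an assumption here and may differ. For instance, if each metric is averaged separately over cross-validation folds, a tabulated RMSE need not equal the square root of the tabulated MSE. The observed and predicted values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, R2, RMSLE and MAPE (as a fraction).

    Assumption: textbook definitions over a single set of paired values;
    the modelling library may average per-fold scores instead.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    mse = ss_res / n
    mean_true = sum(y_true) / n
    # R2 is undefined (division by zero) when all observed values are equal
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    # RMSLE uses log(1 + y), so all values must be greater than -1
    rmsle = math.sqrt(sum((math.log1p(p) - math.log1p(t)) ** 2
                          for t, p in zip(y_true, y_pred)) / n)
    # MAPE as a fraction; undefined when an observed value is zero
    mape = sum(abs(e) / abs(t) for t, e in zip(y_true, errors)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse),
            "R2": r2, "RMSLE": rmsle, "MAPE": mape}


# Hypothetical observed and predicted WQI values, for illustration only
observed = [20.0, 22.5, 25.0, 30.0]
predicted = [21.0, 22.0, 24.0, 31.5]
for name, value in regression_metrics(observed, predicted).items():
    print(f"{name}: {value:.4f}")
```

A perfect fit gives MAE = MSE = RMSE = RMSLE = MAPE = 0 and R² = 1, which is the pattern reported for the linear models at the top of the table.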
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Jafar, R.; Awad, A.; Hatem, I.; Jafar, K.; Awad, E.; Shahrour, I. Multiple Linear Regression and Machine Learning for Predicting the Drinking Water Quality Index in Al-Seine Lake. Smart Cities 2023, 6, 2807-2827. https://doi.org/10.3390/smartcities6050126