# Multi-Expression Programming (MEP): Water Quality Assessment Using Water Quality Indices


## Abstract

^{2}), root-mean-square error (RMSE), mean-absolute error (MAE), root-mean-square-logarithmic error (RMSLE) and mean-absolute-percent error (MAPE). The results show that the R^{2} in the testing phase (on unseen data) for the EC-MEP and TDS-MEP models is above 0.90, i.e., 0.9674 and 0.9725, respectively, reflecting high accuracy and generalized performance. The error measures are also low. According to the MAPE statistics, both MEP models show “excellent” performance in all three stages. Compared with traditional non-linear regression models (NLRMs), the developed machine learning models have good generalization capabilities. The sensitivity analysis of the developed MEP models, which assesses the significance of each input for the forecasted water quality parameters, suggests that Cl and HCO_{3} have substantial impacts on the predictions of the MEP models (EC and TDS), with a sensitiveness index above 0.90, whereas the influence of Na is the least prominent. The results of this research suggest that the development of intelligence models for EC and TDS is cost effective and viable for the evaluation and monitoring of the quality of river water.

## 1. Introduction

^{+}), calcium (Ca^{2+}), magnesium (Mg^{2+}), nitrates (NO_{3}^{−}), chloride (Cl^{−}), and sulfate (SO_{4}^{2−}), as well as other dissolved organic particles. Higher levels of salts and organic matter indicate substandard water quality [8].

_{4}^{2−}), Dissolved Oxygen (DO), Chloride (Cl^{−}), Total Coliform (TC), Magnesium (Mg), and Biochemical Oxygen Demand (BOD). The use of only nine variables results in quicker calculations, reducing the computational time. Similarly, Zali et al. [28] used an ML technique, artificial neural networks (ANNs), to investigate the impacts of six key explanatory variables (i.e., Chemical Oxygen Demand (COD), Suspended Solids (SS), Nitrate (NO_{3}^{−}), BOD, DO, and pH) on the computation of WQI. A sensitivity analysis of the relative importance of each variable in WQI prediction showed that DO, SS, and NO_{3}^{−} are the essential input variables. Nigam and SM [29] compared the prediction performance of fuzzy-based models and conventional computation techniques for calculating the WQI of groundwater, and reported that the fuzzy (intelligent) model categorizes water quality with greater predictive power than the conventional calculation techniques. Srinivas and Singh [30] extended their study to an Interactive Fuzzy model (IFM) to establish a fuzzy decision-making technique for predicting WQI in rivers. Their results show a considerable improvement in WQI predictive accuracy over a conventional fuzzy approach. Yaseen et al. [31] investigated the estimation efficiency of adaptive neuro-fuzzy inference system (ANFIS)-based hybrid models combined with subtractive clustering (SC), fuzzy C-means data clustering (FCM), and grid partitioning (GP). They found that ANFIS-SC is the best and most consistent model. Radial basis function neural network (RBFNN) and back-propagation neural network (BPNN) algorithms were used to model the relation between WQI and many biological variables (such as COD, SS, BOD, DO, nitrate, and pH) in tropical and subtropical environments [32]. The RBFNN model produced comparatively good predictive outcomes.

Bozorg-Haddad et al. [10] tested the performance of genetic programming (GP) and least-squares support vector regression (LSSVR) for the estimation of K, Na, Mg, SO_{4}, EC, TDS, and pH in the Sefidrood River, located in Iran. For all the computed models, the R^{2} values are greater than 0.9, indicating good correlation. Al-Mukhtar and Al-Yaseen [33] used ANN, ANFIS and multiple linear regression (MLR) techniques to assess the water quality of the Abu-Ziriq River, located in Iraq. They forecasted EC and TDS with the most significant input variables (nitrate, chloride, calcium, magnesium, hardness and sulfate) and found that the ANFIS technique yielded the best outcomes. Sarkar and Pandey [34] used the ANN approach to analyze the amount of dissolved oxygen (DO) in river water at three distinct sites using four variables, i.e., pH, temperature, DO, and biochemical oxygen demand (BOD), and reported a correlation coefficient (R) above 0.90 between the forecasted and actual DO data. Zhang et al. [35] used a hybrid model combining ANN and the GP algorithm to forecast the production of drinking water from chemical treatment plants. The findings showed that the models performed well in forecasting the output capacity of the water treatment processing unit. Incorporating additional data into the algorithm during training enhanced the model performance significantly. Chen et al. [36] utilized a comprehensive database to examine the water quality predictive ability of ten distinct ML models (three ensemble and seven conventional). The study indicated that using a larger number of datapoints for water quality assessment can improve the predictive accuracy of the model.

Other prominent methodologies have been used effectively for different meteorological, environmental, and hydrological challenges (such as rainfall forecasting). These include tree-based algorithms, such as random forest (RF) and decision tree (DT), and support vector machine (SVM) models. These models are also recognized as remarkable ML approaches for both linear and complex non-linear engineering problems. Different researchers have used these algorithms with excellent predictive performance in a variety of scientific challenges [37]. Granata et al. [38] developed SVM and RF models to forecast the content of TDS, TSS, BOD and COD, finding that the SVM model provided superior predictions. However, its efficacy is reduced when subjected to unseen data. In brief, various mathematical models have been developed that contribute to the betterment of human life [39,40,41,42,43,44,45,46].

## 2. Materials and Methods

#### 2.1. Multi-Expression Programming

- Two individuals (parents) are selected using a simple binary tournament procedure and are recombined with a specified crossover probability.
- Recombining the two parents generates two offspring.
- The offspring are mutated, and the best of them replaces the weakest individual in the current population.
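The steps listed above can be sketched in C as a single steady-state update. This is only an illustrative sketch and not the authors' implementation: the integer chromosome, the placeholder `toy_fitness` function, the constants `POP_SIZE` and `CHROM_LEN`, the one-point crossover, and the rule that the best offspring replaces the worst individual only when it is strictly better are all assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>

#define POP_SIZE 8   /* hypothetical population size */
#define CHROM_LEN 6  /* hypothetical chromosome length */

typedef struct {
    int genes[CHROM_LEN];
    double fitness; /* error-like score: lower is better */
} Individual;

/* Placeholder fitness (sum of squared genes). A real MEP run would decode
   the chromosome into expressions and score them against the data. */
static double toy_fitness(const Individual *ind)
{
    double sum = 0.0;
    for (int i = 0; i < CHROM_LEN; ++i)
        sum += (double)ind->genes[i] * (double)ind->genes[i];
    return sum;
}

/* Uniform random number in [0, 1). */
static double rand01(void)
{
    return rand() / ((double)RAND_MAX + 1.0);
}

/* Binary tournament: draw two individuals at random, keep the fitter one. */
static int binary_tournament(const Individual *pop, int n)
{
    int a = rand() % n;
    int b = rand() % n;
    return pop[a].fitness <= pop[b].fitness ? a : b;
}

/* Index of the individual with the highest (worst) fitness value. */
static int worst_index(const Individual *pop, int n)
{
    int worst = 0;
    for (int i = 1; i < n; ++i)
        if (pop[i].fitness > pop[worst].fitness)
            worst = i;
    return worst;
}

/* Replace each gene by a random value in [-10, 10] with probability p_mut. */
static void mutate(Individual *ind, double p_mut)
{
    for (int i = 0; i < CHROM_LEN; ++i)
        if (rand01() < p_mut)
            ind->genes[i] = rand() % 21 - 10;
}

/* One steady-state step: select, recombine, mutate, replace the worst. */
void mep_step(Individual *pop, int n, double p_cross, double p_mut)
{
    assert(pop != NULL && n >= 2);

    /* Offspring start as copies of the two tournament winners. */
    Individual c1 = pop[binary_tournament(pop, n)];
    Individual c2 = pop[binary_tournament(pop, n)];

    if (rand01() < p_cross) { /* one-point crossover (assumed operator) */
        int cut = 1 + rand() % (CHROM_LEN - 1);
        for (int i = cut; i < CHROM_LEN; ++i) {
            int tmp = c1.genes[i];
            c1.genes[i] = c2.genes[i];
            c2.genes[i] = tmp;
        }
    }

    mutate(&c1, p_mut);
    mutate(&c2, p_mut);
    c1.fitness = toy_fitness(&c1);
    c2.fitness = toy_fitness(&c2);

    const Individual *best = c1.fitness <= c2.fitness ? &c1 : &c2;
    int w = worst_index(pop, n);
    if (best->fitness < pop[w].fitness) /* assumed acceptance rule */
        pop[w] = *best;
}
```

Calling `mep_step` repeatedly on an initialized population, for example with `p_cross = 0.9` and `p_mut = 0.1`, never increases the worst fitness in the population, because an individual is overwritten only by a strictly better offspring.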

#### 2.2. Non-Linear Regression Approach

#### 2.3. Study Area and Data Collection

^{2}, respectively. Ice deposits of 2174 km^{3} also exist at the same location. The elevation of the UIB ranges from 455 to 8611 m, with annual precipitation between 100 and 200 mm; the climatic conditions within the basin vary with elevation [51,63].

_{4}), chloride (Cl), pH, and bicarbonates (HCO_{3}), along with two well-known outputs, i.e., EC and TDS. Table 1 presents the key descriptive statistics of the collected data, including the mean, maximum and minimum values and dispersion statistics such as kurtosis, skewness and standard deviation. TDS levels in this study vary considerably, between 60 and 524 ppm, whereas EC readings range from 88 to 770 μS/cm. According to WHO recommendations, the acceptable limit of TDS in drinking water is 300 to 600 mg/L, whereas the permitted level in agricultural water is 450 to 2000 mg/L [7]. Table 1 shows that the concentrations of EC and TDS are both within the allowable range; nonetheless, it is essential that these significant water quality indices be measured precisely and without substantial effort. Also, the kurtosis and skewness lie in the permissible ranges of [−10, +10] and [−3, +3], respectively, showing acceptable dispersion and peakedness in the model parameters [64,65].

#### 2.4. Statistical Indicators for Response Evaluation of Models

^{2}) (also known as the R-squared value) and mean absolute error (MAE). R^{2} is recognized as the superior measure among these for examining the effectiveness of the models. An R^{2} score from 0.65 to 0.75 implies outstanding performance, whereas a score below 0.50 indicates poor performance [69]. The formula used to obtain the R^{2} value is given in Equation (4), where ‘P’ and ‘E’ denote the model-predicted and experimental values, respectively, and ‘m’ is the total number of readings.

#### 2.5. Tuning the Modeling Hyper-Parameters

^{2}) is reached. If the model results for each set of data are inaccurate, the procedure is repeated, steadily increasing the size and number of subpopulations. The optimal result is then chosen based on the lowest RMSE and highest R^{2}. It was noted that the accuracy of certain models in the training phase was better than in the testing phase, indicating overfitting, which must be prevented. It is worth mentioning that the evolution time and the number of generations influence the accuracy of the produced models. With these types of algorithms, a model would keep evolving endlessly owing to the inclusion of additional variables into the process. However, in the current work, the modeling was terminated after a thousand generations or when the improvement in fitness value fell below 0.1%. Furthermore, an optimum solution must fulfill several performance metrics, as discussed in Section 2.4.
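The termination rule described above (a thousand generations, or an improvement in fitness below 0.1%) can be expressed as a small predicate. The sketch below is an assumption-laden illustration: the text does not state over which interval the improvement is measured, so the example compares the best fitness of two consecutive checkpoints, treats fitness as an error to be minimized, and uses the hypothetical names `should_stop`, `prev_best` and `curr_best`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GENERATIONS 1000      /* "a thousand generations" */
#define MIN_REL_IMPROVEMENT 0.001 /* 0.1% expressed as a fraction */

/* Returns true when the run should terminate.
   prev_best and curr_best are the best (lowest) fitness values at two
   consecutive checkpoints; the checkpoint spacing is an assumption. */
bool should_stop(int generation, double prev_best, double curr_best)
{
    assert(generation >= 0);

    if (generation >= MAX_GENERATIONS)
        return true;

    /* A non-positive previous error leaves nothing to improve and would
       make the relative improvement undefined. */
    if (prev_best <= 0.0)
        return true;

    double rel_improvement = (prev_best - curr_best) / prev_best;
    return rel_improvement < MIN_REL_IMPROVEMENT;
}
```

In a steady-state algorithm the best fitness often stays unchanged for many consecutive generations, so in practice the checkpoints would likely be spaced several generations apart to avoid stopping prematurely.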

## 3. Results and Discussions

#### 3.1. Formulation of EC and TDS Using MEP Model

^{2} of both the EC and TDS MEP models is above 0.9 in each stage, i.e., training, validation and testing. As can be seen in Figure 2 and Figure 3, the datapoints are scattered close to the 45° (1:1) lines, reflecting the high prediction accuracy of the suggested EC-MEP and TDS-MEP models, respectively. The R^{2} values for the EC model are 0.9519 (training), 0.9348 (validation) and 0.9674 (testing), while for the TDS model they are 0.9185 (training), 0.9441 (validation) and 0.9725 (testing). Both developed MEP models show generalized performance, as the measures in each stage are close to one another [74].

#### 3.2. Overall Performance of Developed Models

^{2} score consistently above 0.8 indicates a good association between measured and model-predicted results [77]. Both EC and TDS are significantly associated with all the selected input variables. However, investigations have revealed that R^{2} identifies only the linear relationship between the outcome and the independent factors. Therefore, evaluating the presented models (EC and TDS) only on the slope of the trendline and the regression coefficient is inadequate [76]. Thus, multiple statistical measures are used to examine the robustness and reliability of the generated MEP models.

#### 3.2.1. EC-MEP Model

#### 3.2.2. TDS-MEP Model

^{2}, and the reliable predictive performance considering the error metrics (i.e., MAE, RMSE, RMSLE, and MAPE). As shown in Figure 6, the concentration of absolute errors near the x-axis demonstrates the strong performance of the developed TDS model. As can be seen in Table 5, the MAE is lower than the RMSE in all three stages, with the lowest values in the testing stage, i.e., 8.27 ppm and 11.36 ppm, respectively. The maximum absolute error is 66.25 ppm, while the minimum is 0.046 ppm, signifying the effectiveness of the developed TDS model [7]. In addition, the error histogram (see Figure 7) reflects zero error predictions above 45% of the absolute percent error, with 309 readings (85.83% of the data points) having an absolute percent error below 10%. Like the EC model, the MAPE of the TDS model in training, validation and testing (unseen data) also falls below 10%, i.e., 7.40%, 8.25%, and 5.54%, respectively, and can similarly be classified as “excellent” [7].
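The histogram statistic quoted above, the share of readings whose absolute percent error is below 10%, can be computed with a short helper. The C sketch below is illustrative only; the function name `share_below` and its arguments are hypothetical, and every experimental value is assumed to be non-zero. As a consistency check on the reported figures, 309 readings at 85.83% would correspond to roughly 360 readings in total, if the percentage is taken over the whole data set.

```c
#include <assert.h>
#include <stddef.h>

/* Percentage of readings whose absolute percent error is below limit_pct.
   E: experimental values (assumed non-zero), P: predictions, m: count. */
double share_below(const double *E, const double *P, size_t m,
                   double limit_pct)
{
    assert(E != NULL && P != NULL && m > 0);

    size_t count = 0;
    for (size_t i = 0; i < m; ++i) {
        double err = (E[i] - P[i]) / E[i] * 100.0;
        if (err < 0.0)
            err = -err; /* absolute percent error */
        if (err < limit_pct)
            ++count;
    }
    return 100.0 * (double)count / (double)m;
}
```

For example, with four readings of 100 and predictions of 105, 95, 120 and 100, three of the four absolute percent errors fall below 10%, so `share_below(E, P, 4, 10.0)` returns 75.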

#### 3.3. Comparison between the MEP Models and NLRMs

^{2}), 29.67% (MAE), 54.67% (RMSE), 85.62% (RMSLE), and 27.19% (MAPE) lower than the EC-MEP model. Similarly, TDS-NLRM gives 29.26% (R^{2}), 27.34% (MAE), 45.67% (RMSE), 78.58% (RMSLE), and 34.59% (MAPE), which is a less accurate prediction than that of the TDS-MEP model. In the testing phase, the R^{2} and MAPE are 0.7156 and 25.88%, respectively, for the EC-NLRM, and 0.7295 and 28.813% for the TDS-NLRM. Thus, the developed NLRMs can be categorized as “acceptable” models for prediction but are less accurate than the MEP-based models. It can be observed from Table 6 that the performance measures of the NLRMs worsen when subjected to unseen (testing) data, reflecting inconsistency and irregularities in their performance [35,36,78]. In essence, the traditional regression models (i.e., NLRMs) are not useful for the prediction of complex problems because of their inefficiency and lower generalization capability [35,78].

#### 3.4. Sensitivity Analysis of MEP Models

## 4. Recommendations and Suggestions

## 5. Conclusions

_{4}), chloride (Cl), pH, and bicarbonates (HCO_{3}). The accuracy, reliability and generalization of the established models were evaluated using various well-known statistical measures, i.e., slope and coefficient of determination (R^{2}), mean-absolute-percent error (MAPE), mean-absolute error (MAE), root-mean-square-logarithmic error (RMSLE), and root-mean-square error (RMSE). The performance of the models was compared with traditional multiple non-linear regression models (NLRMs). The regression results of EC-MEP and TDS-MEP showed excellent accuracy, with the coefficient of determination (R^{2}) and slope above 0.95 in the testing phase on unseen data. Also, the error statistics are low, showing generalized and reliable performance. The RMSE and MAE in EC prediction were (18.54 μS/cm and 12.36 μS/cm), (17.19 μS/cm and 12.14 μS/cm) and (16.43 μS/cm and 11.22 μS/cm) for the training, validation and testing sets, respectively, and for the TDS modeling they were (13.36 ppm and 9.75 ppm), (13.33 ppm and 9.80 ppm) and (11.36 ppm and 8.27 ppm), respectively. The RMSLE approaches 0, indicating excellent performance. According to MAPE, the performance of the established models was categorized as “excellent”, and thus they can be confidently used for future predictions. The predictions of the NLRMs deviate significantly from the targeted results, and the reduced performance statistics make the reliability of the NLRMs doubtful. However, the MAPE of the NLRMs also falls within acceptable limits, i.e., below 50%. In essence, the traditional regression models (i.e., NLRMs) are not useful for the prediction of complex problems because of their inefficiency and lower generalization capability. Furthermore, the sensitivity analysis of the developed MEP models revealed that all eight variables considered in the current research influence the prediction of the water quality parameters (EC and TDS), with distinct effects and a sensitiveness index above 0.5. Thus, the developed EA-based MEP models are not merely correlations but can be helpful for practitioners and decision-makers, eventually saving the time and money required for monitoring water quality parameters.

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Acknowledgments

## Conflicts of Interest

## Appendix A

    #include <math.h>
    #include <stdio.h>

    /* MEP-generated model: x holds the eight inputs,
       outputs[0] receives the prediction. */
    void mepx(double *x /* inputs */, double *outputs)
    {
        double prg[38];
        prg[0] = x[6];
        prg[1] = x[5];
        prg[2] = x[3];
        prg[3] = prg[1] * prg[1];
        prg[4] = log(prg[3]);
        prg[5] = prg[2] + prg[1];
        prg[6] = prg[5] + prg[5];
        prg[7] = prg[2] * prg[3];
        prg[8] = prg[5] + prg[6];
        prg[9] = prg[0] + prg[8];
        prg[10] = x[4];
        prg[11] = prg[10] - prg[0];
        prg[12] = prg[9] / prg[2];
        prg[13] = prg[4] * prg[12];
        prg[14] = prg[9] * prg[9];
        prg[15] = x[4];
        prg[16] = prg[0] - prg[13];
        prg[17] = prg[13] - prg[0];
        prg[18] = x[0];
        prg[19] = prg[14] + prg[5];
        prg[20] = x[0];
        prg[21] = log(prg[20]);
        prg[22] = x[7];
        prg[23] = prg[16] * prg[21];
        prg[24] = prg[13] - prg[18];
        prg[25] = prg[12] + prg[23];
        prg[26] = prg[19] + prg[25];
        prg[27] = prg[7] / prg[24];
        prg[28] = prg[16] - prg[17];
        prg[29] = prg[26] + prg[24];
        prg[30] = prg[28] * prg[15];
        prg[31] = prg[27] + prg[29];
        prg[32] = prg[30] + prg[31];
        prg[33] = prg[32] + prg[21];
        prg[34] = prg[33] - prg[7];
        prg[35] = prg[22] / prg[11];
        prg[36] = prg[30] + prg[34];
        prg[37] = prg[35] + prg[36];
        outputs[0] = prg[37];
    }

    int main(void)
    {
        /* example of utilization... */
        double x[8];
        x[0] = 1.680000;
        x[1] = 0.730000;
        x[2] = 0.320000;
        x[3] = 1.550000;
        x[4] = 0.690000;
        x[5] = 0.480000;
        x[6] = 7.900000;
        x[7] = 5.555556;
        double outputs[1];
        mepx(x, outputs);
        printf("%lf", outputs[0]);
        getchar(); /* keep the console window open */
        return 0;
    }

    #include <math.h>
    #include <stdio.h>

    /* MEP-generated model: x holds the eight inputs,
       outputs[0] receives the prediction. */
    void mepx(double *x /* inputs */, double *outputs)
    {
        double prg[44];
        prg[0] = x[6];
        prg[1] = prg[0] * prg[0];
        prg[2] = x[5];
        prg[3] = x[5];
        prg[4] = x[3];
        prg[5] = sqrt(prg[4]);
        prg[6] = x[3];
        prg[7] = prg[5] / prg[6];
        prg[8] = prg[5] - prg[2];
        prg[9] = prg[7] * prg[0];
        prg[10] = prg[6] * prg[8];
        prg[11] = prg[7] - prg[2];
        prg[12] = prg[7] + prg[7];
        prg[13] = prg[9] + prg[9];
        prg[14] = prg[13] + prg[9];
        prg[15] = prg[6] * prg[14];
        prg[16] = prg[11] + prg[12];
        prg[17] = prg[14] + prg[5];
        prg[18] = prg[17] + prg[9];
        prg[19] = prg[16] + prg[11];
        prg[20] = x[7];
        prg[21] = x[1];
        prg[22] = prg[3] + prg[4];
        prg[23] = prg[1] * prg[22];
        prg[24] = prg[19] * prg[19];
        prg[25] = prg[22] + prg[12];
        prg[26] = x[4];
        prg[27] = sqrt(prg[25]);
        prg[28] = prg[9] + prg[23];
        prg[29] = prg[26] * prg[15];
        prg[30] = prg[17] - prg[1];
        prg[31] = prg[20] / prg[6];
        prg[32] = prg[13] / prg[20];
        prg[33] = prg[28] + prg[29];
        prg[34] = prg[10] - prg[29];
        prg[35] = prg[31] * prg[27];
        prg[36] = prg[30] + prg[24];
        prg[37] = prg[35] / prg[34];
        prg[38] = prg[37] + prg[36];
        prg[39] = prg[32] - prg[26];
        prg[40] = prg[21] + prg[33];
        prg[41] = prg[38] + prg[40];
        prg[42] = prg[39] + prg[18];
        prg[43] = prg[42] + prg[41];
        outputs[0] = prg[43];
    }

    int main(void)
    {
        /* example of utilization... */
        double x[8];
        x[0] = 1.680000;
        x[1] = 0.730000;
        x[2] = 0.320000;
        x[3] = 1.550000;
        x[4] = 0.690000;
        x[5] = 0.480000;
        x[6] = 7.900000;
        x[7] = 5.600000;
        double outputs[1];
        mepx(x, outputs);
        printf("%lf", outputs[0]);
        getchar(); /* keep the console window open */
        return 0;
    }

## References

- Pandhiani, S.M.; Sihag, P.; Shabri, A.B.; Singh, B.; Pham, Q.B. Time-series prediction of streamflows of Malaysian rivers using data-driven techniques. J. Irrig. Drain. Eng. **2020**, 146, 04020013.
- Singh, A.P.; Dhadse, K.; Ahalawat, J. Managing water quality of a river using an integrated geographically weighted regression technique with fuzzy decision-making model. Environ. Monit. Assess. **2019**, 191, 378.
- Shahzad, G.; Rehan, R.; Fahim, M. Rapid performance evaluation of water supply services for strategic planning. Civ. Eng. J. **2019**, 5, 1197–1204.
- Solangi, G.S.; Siyal, A.A.; Siyal, P. Analysis of Indus Delta groundwater and surface water suitability for domestic and irrigation purposes. Civ. Eng. J. **2019**, 5, 1599–1608.
- Kim, H.; Jeong, H.; Jeon, J.; Bae, S. Effects of irrigation with saline water on crop growth and yield in greenhouse cultivation. Water **2016**, 8, 127.
- Velmurugan, A.; Swarnam, P.; Subramani, T.; Meena, B.; Kaledhonkar, M. Water demand and salinity. In Desalination-Challenges and Opportunities; IntechOpen: London, UK, 2020.
- Jamei, M.; Ahmadianfar, I.; Chu, X.; Yaseen, Z.M. Prediction of surface water total dissolved solids using hybridized wavelet-multigene genetic programming: New approach. J. Hydrol. **2020**, 589, 125335.
- Jagaba, A.; Kutty, S.; Hayder, G.; Baloo, L.; Abubakar, S.; Ghaleb, A.; Lawal, I.; Noor, A.; Umaru, I.; Almahbashi, N. Water quality hazard assessment for hand dug wells in Rafin Zurfi, Bauchi State, Nigeria. Ain Shams Eng. J. **2020**, 11, 983–999.
- Sattari, M.T.; Joudi, A.R.; Kusiak, A. Estimation of Water Quality Parameters with Data-Driven Model. J.-Am. Water Work. Assoc. **2016**, 108, E232–E239.
- Bozorg-Haddad, O.; Soleimani, S.; Loáiciga, H.A. Modeling water-quality parameters using genetic algorithm–least squares support vector regression and genetic programming. J. Environ. Eng. **2017**, 143, 04017021.
- Salami, E.; Salari, M.; Ehteshami, M.; Bidokhti, N.; Ghadimi, H. Application of artificial neural networks and mathematical modeling for the prediction of water quality variables (case study: Southwest of Iran). Desalination Water Treat. **2016**, 57, 27073–27084.
- El Osta, M.; Masoud, M.; Alqarawy, A.; Elsayed, S.; Gad, M. Groundwater Suitability for Drinking and Irrigation Using Water Quality Indices and Multivariate Modeling in Makkah Al-Mukarramah Province, Saudi Arabia. Water **2022**, 14, 483.
- Deng, W.; Wang, G.; Zhang, X. A novel hybrid water quality time series prediction method based on cloud model and fuzzy forecasting. Chemom. Intell. Lab. Syst. **2015**, 149, 39–49.
- Alexakis, D.E. Linking DPSIR Model and Water Quality Indices to Achieve Sustainable Development Goals in Groundwater Resources. Hydrology **2021**, 8, 90.
- Alexakis, D.E. Meta-evaluation of water quality indices. application into groundwater resources. Water **2020**, 12, 1890.
- Dehghani, M.; Saghafian, B.; Nasiri Saleh, F.; Farokhnia, A.; Noori, R. Uncertainty analysis of streamflow drought forecast using artificial neural networks and Monte-Carlo simulation. Int. J. Climatol. **2014**, 34, 1169–1180.
- Mandal, S.; Mahapatra, S.; Adhikari, S.; Patel, R. Modeling of arsenic (III) removal by evolutionary genetic programming and least square support vector machine models. Environ. Process. **2015**, 2, 145–172.
- Alizadeh, M.J.; Kavianpour, M.R.; Danesh, M.; Adolf, J.; Shamshirband, S.; Chau, K.-W. Effect of river flow on the quality of estuarine and coastal waters using machine learning models. Eng. Appl. Comput. Fluid Mech. **2018**, 12, 810–823.
- Kargar, K.; Samadianfard, S.; Parsa, J.; Nabipour, N.; Shamshirband, S.; Mosavi, A.; Chau, K.-W. Estimating longitudinal dispersion coefficient in natural streams using empirical models and machine learning algorithms. Eng. Appl. Comput. Fluid Mech. **2020**, 14, 311–322.
- Sihag, P.; Tiwari, N.; Ranjan, S. Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH J. Hydraul. Eng. **2019**, 25, 132–142.
- Sihag, P.; Tiwari, N.; Ranjan, S. Modelling of infiltration of sandy soil using gaussian process regression. Modeling Earth Syst. Environ. **2017**, 3, 1091–1100.
- Yaseen, Z.M.; Sulaiman, S.O.; Deo, R.C.; Chau, K.-W. An enhanced extreme learning machine model for river flow forecasting: State-of-the-art, practical applications in water resource engineering area and future research direction. J. Hydrol. **2019**, 569, 387–408.
- Najafzadeh, M.; Tafarojnoruz, A. Evaluation of neuro-fuzzy GMDH-based particle swarm optimization to predict longitudinal dispersion coefficient in rivers. Environ. Earth Sci. **2016**, 75, 157.
- Najafzadeh, M.; Tafarojnoruz, A.; Lim, S.Y. Prediction of local scour depth downstream of sluice gates using data-driven models. ISH J. Hydraul. Eng. **2017**, 23, 195–202.
- Najafzadeh, M.; Rezaie-Balf, M.; Tafarojnoruz, A. Prediction of riprap stone size under overtopping flow using data-driven models. Int. J. River Basin Manag. **2018**, 16, 505–512.
- Tung, T.M.; Yaseen, Z.M. A survey on river water quality modelling using artificial intelligence models: 2000–2020. J. Hydrol. **2020**, 585, 124670.
- Tripathi, M.; Singal, S.K. Use of principal component analysis for parameter selection for development of a novel water quality index: A case study of river Ganga India. Ecol. Indic. **2019**, 96, 430–436.
- Zali, M.A.; Retnam, A.; Juahir, H.; Zain, S.M.; Kasim, M.F.; Abdullah, B.; Saadudin, S.B. Sensitivity analysis for water quality index (WQI) prediction for Kinta River, Malaysia. World Appl. Sci. J. **2011**, 14, 60–65.
- Nigam, U.; SM, Y. Development of computational assessment model of fuzzy rule based evaluation of groundwater quality index: Comparison and analysis with conventional index. In Proceedings of the International Conference on Sustainable Computing in Science, Technology and Management (SUSCOM), Amity University Rajasthan, Jaipur, India, 26–28 February 2019.
- Srinivas, R.; Singh, A.P. Application of fuzzy multi-criteria approach to assess the water quality of river Ganges. In Soft Computing: Theories and Applications; Springer: Berlin/Heidelberg, Germany, 2018; pp. 513–522.
- Yaseen, Z.M.; Ramal, M.M.; Diop, L.; Jaafar, O.; Demir, V.; Kisi, O. Hybrid adaptive neuro-fuzzy models for water quality index estimation. Water Resour. Manag. **2018**, 32, 2227–2245.
- Hameed, M.; Sharqi, S.S.; Yaseen, Z.M.; Afan, H.A.; Hussain, A.; Elshafie, A. Application of artificial intelligence (AI) techniques in water quality index prediction: A case study in tropical region, Malaysia. Neural Comput. Appl. **2017**, 28, 893–905.
- Al-Mukhtar, M.; Al-Yaseen, F. Modeling water quality parameters using data-driven models, a case study Abu-Ziriq marsh in south of Iraq. Hydrology **2019**, 6, 24.
- Sarkar, A.; Pandey, P. River water quality modelling using artificial neural network technique. Aquat. Procedia **2015**, 4, 1070–1077.
- Zhang, Y.; Gao, X.; Smith, K.; Inial, G.; Liu, S.; Conil, L.B.; Pan, B. Integrating water quality and operation into prediction of water production in drinking water treatment plants by genetic algorithm enhanced artificial neural network. Water Res. **2019**, 164, 114888.
- Chen, K.; Chen, H.; Zhou, C.; Huang, Y.; Qi, X.; Shen, R.; Liu, F.; Zuo, M.; Zou, X.; Wang, J. Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res. **2020**, 171, 115454.
- Hazarika, B.B.; Gupta, D.; Ashu; Berlin, M. A Comparative Analysis of Artificial Neural Network and Support Vector Regression for River Suspended Sediment Load Prediction. In First International Conference on Sustainable Technologies for Computational Intelligence; Springer: Singapore, 2020; pp. 339–349.
- Granata, F.; Papirio, S.; Esposito, G.; Gargano, R.; De Marinis, G. Machine Learning Algorithms for the Forecasting of Wastewater Quality Indicators. Water **2017**, 9, 105.
- Zha, T.-H.; Castillo, O.; Jahanshahi, H.; Yusuf, A.; Alassafi, M.O.; Alsaadi, F.E.; Chu, Y.-M. A fuzzy-based strategy to suppress the novel coronavirus (2019-NCOV) massive outbreak. Appl. Comput. Math. **2021**, 20, 160–176.
- Nazeer, M.; Hussain, F.; Khan, M.I.; El-Zahar, E.R.; Chu, Y.-M.; Malik, M. Theoretical study of MHD electro-osmotically flow of third-grade fluid in micro channel. Appl. Math. Comput. **2022**, 420, 126868.
- Zhao, T.H.; Khan, M.I.; Chu, Y.M. Artificial neural networking (ANN) analysis for heat and entropy generation in flow of non-Newtonian fluid between two rotating disks. Math. Methods Appl. Sci. **2021**.
- Chu, H.-H.; Zhao, T.-H.; Chu, Y.-M. Sharp bounds for the Toader mean of order 3 in terms of arithmetic, quadratic and contraharmonic means. Math. Slovaca **2020**, 70, 1097–1112.
- Zhao, T.-H.; He, Z.-Y.; Chu, Y.-M. On some refinements for inequalities involving zero-balanced hypergeometric function. AIMS Math. **2020**, 5, 6479–6495.
- Zhao, T.-H.; Wang, M.-K.; Chu, Y.-M. A sharp double inequality involving generalized complete elliptic integral of the first kind. AIMS Math. **2020**, 5, 4512–4528.
- Zhao, T.-H.; Zhou, B.-C.; Wang, M.-K.; Chu, Y.-M. On approximating the quasi-arithmetic mean. J. Inequal. Appl. **2019**, 2019, 42.
- Zhao, T.-H.; Wang, M.-K.; Zhang, W.; Chu, Y.-M. Quadratic transformation inequalities for Gaussian hypergeometric function. J. Inequal. Appl. **2018**, 2018, 1–15.
- Azimi, S.; Azhdary Moghaddam, M.; Hashemi Monfared, S.A. Prediction of annual drinking water quality reduction based on Groundwater Resource Index using the artificial neural network and fuzzy clustering. J. Contam. Hydrol. **2019**, 220, 6–17.
- Ismael, M.; Mokhtar, A.; Farooq, M.; Lü, X. Assessing drinking water quality based on physical, chemical and microbial parameters in the Red Sea State, Sudan using a combination of water quality index and artificial neural network model. Groundw. Sustain. Dev. **2021**, 14, 100612.
- Kim, J.; Seo, D.; Jang, M.; Kim, J. Augmentation of limited input data using an artificial neural network method to improve the accuracy of water quality modeling in a large lake. J. Hydrol. **2021**, 602, 126817.
- Zhang, Q.; Li, Z.; Zhu, L.; Zhang, F.; Sekerinski, E.; Han, J.-C.; Zhou, Y. Real-time prediction of river chloride concentration using ensemble learning. Environ. Pollut. **2021**, 291, 118116.
- Shah, M.I.; Javed, M.F.; Abunama, T. Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques. Environ. Sci. Pollut. Res. Int. **2021**, 28, 13202–13220.
- Jiang, L.; Chui, T.F.M. A review of the application of constructed wetlands (CWs) and their hydraulic, water quality and biological responses to changing hydrological conditions. Ecol. Eng. **2022**, 174, 106459.
- Oltean, M.; Grosan, C. A comparison of several linear genetic programming techniques. Complex Syst. **2003**, 14, 285–314.
- Arabshahi, A.; Gharaei-Moghaddam, N.; Tavakkolizadeh, M. Development of applicable design models for concrete columns confined with aramid fiber reinforced polymer using Multi-Expression Programming. Structures **2020**, 23, 225–244.
- Goldberg, D.E. Genetic Algorithms; Pearson Education India: London, UK, 2006.
- Koza, J.R. Genetic Programming: On the Programming of Computers by Means of Natural Selection; MIT Press: Cambridge, MA, USA, 1992; Volume 1.
- Alavi, A.H.; Gandomi, A.H.; Nejad, H.C.; Mollahasani, A.; Rashed, A. Design equations for prediction of pressuremeter soil deformation moduli utilizing expression programming systems. Neural Comput. Appl. **2013**, 23, 1771–1786.
- Li, P.; Khan, M.A.; El-Zahar, E.R.; Awan, H.H.; Zafar, A.; Javed, M.F.; Khan, M.I.; Qayyum, S.; Malik, M.; Wang, F. Sustainable Use of Chemically modified Tyre Rubber in Concrete: Machine Learning based Novel Predictive Model. Chem. Phys. Lett. **2022**, 793, 139478.
- FM Zain, M.; M Abd, S. Multiple regression model for compressive strength prediction of high performance concrete. J. Appl. Sci. **2009**, 9, 155–160.
- Faradonbeh, R.S.; Hasanipanah, M.; Amnieh, H.B.; Armaghani, D.J.; Monjezi, M. Development of GP and GEP models to estimate an environmental issue induced by blasting operation. Environ. Monit. Assess. **2018**, 190, 351.
- Ilyas, I.; Zafar, A.; Javed, M.F.; Farooq, F.; Aslam, F.; Musarat, M.A.; Vatin, N.I. Forecasting Strength of CFRP Confined Concrete Using Multi Expression Programming. Materials **2021**, 14, 7134.
- Chu, H.-H.; Khan, M.A.; Javed, M.; Zafar, A.; Khan, M.I.; Alabduljabbar, H.; Qayyum, S. Sustainable use of fly-ash: Use of gene-expression programming (GEP) and multi-expression programming (MEP) for forecasting the compressive strength geopolymer concrete. Ain Shams Eng. J. **2021**, 12, 3603–3617.
- Tahir, A.A.; Chevallier, P.; Arnaud, Y.; Neppel, L.; Ahmad, B. Modeling snowmelt-runoff under climate scenarios in the Hunza River basin, Karakoram Range, Northern Pakistan. J. Hydrol. **2011**, 409, 104–117.
- Khan, M.A.; Farooq, F.; Javed, M.F.; Zafar, A.; Ostrowski, K.A.; Aslam, F.; Malazdrewicz, S.; Maślak, M. Simulation of Depth of Wear of Eco-Friendly Concrete Using Machine Learning Based Computational Approaches. Materials **2022**, 15, 58.
- Khan, S.; Ali Khan, M.; Zafar, A.; Javed, M.F.; Aslam, F.; Musarat, M.A.; Vatin, N.I. Predicting the Ultimate Axial Capacity of Uniaxially Loaded CFST Columns Using Multiphysics Artificial Intelligence. Materials
**2022**, 15, 39. [Google Scholar] [CrossRef] - Khan, M.A.; Shah, M.I.; Javed, M.F.; Khan, M.I.; Rasheed, S.; El-Shorbagy, M.; El-Zahar, E.R.; Malik, M. Application of random forest for modelling of surface water salinity. Ain Shams Eng. J.
**2021**, 13, 101635. [Google Scholar] [CrossRef] - Jalal, F.E.; Xu, Y.; Iqbal, M.; Javed, M.F.; Jamhiri, B. Predictive modeling of swell-strength of expansive soils using artificial intelligence approaches: ANN, ANFIS and GEP. J. Environ. Manag.
**2021**, 289, 112420. [Google Scholar] [CrossRef] [PubMed] - Azim, I.; Yang, J.; Iqbal, M.F.; Mahmood, Z.; Javed, M.F.; Wang, F.; Liu, Q.-F. Prediction of catenary action capacity of RC beam-column substructures under a missing column scenario using evolutionary algorithm. KSCE J. Civ. Eng.
**2021**, 25, 891–905. [Google Scholar] [CrossRef] - Nafees, A.; Amin, M.N.; Khan, K.; Nazir, K.; Ali, M.; Javed, M.F.; Aslam, F.; Musarat, M.A.; Vatin, N.I. Modeling of Mechanical Properties of Silica Fume-Based Green Concrete Using Machine Learning Techniques. Polymers
**2022**, 14, 30. [Google Scholar] [CrossRef] [PubMed] - Iqbal, M.F.; Javed, M.F.; Rauf, M.; Azim, I.; Ashraf, M.; Yang, J.; Liu, Q.-F. Sustainable utilization of foundry waste: Forecasting mechanical properties of foundry sand based concrete using multi-expression programming. Sci. Total Environ.
**2021**, 780, 146524. [Google Scholar] [CrossRef] - Mousavi, S.; Alavi, A.; Gandomi, A.; Arab Esmaeili, M.; Gandomi, M. A data mining approach to compressive strength of CFRP-confined concrete cylinders. Struct. Eng. Mech.
**2010**, 36, 759. [Google Scholar] [CrossRef] - Qiu, R.; Wang, Y.; Wang, D.; Qiu, W.; Wu, J.; Tao, Y. Water temperature forecasting based on modified artificial neural network methods: Two cases of the Yangtze River. Sci. Total Environ.
**2020**, 737, 139729. [Google Scholar] [CrossRef] - Jalal, F.E.; Xu, Y.; Iqbal, M.; Jamhiri, B.; Javed, M.F. Predicting the compaction characteristics of expansive soils using two genetic programming-based algorithms. Transp. Geotech.
**2021**, 30, 100608. [Google Scholar] [CrossRef] - Khan, M.A.; Zafar, A.; Farooq, F.; Javed, M.F.; Alyousef, R.; Alabduljabbar, H.; Khan, M.I. Geopolymer Concrete Compressive Strength via Artificial Neural Network, Adaptive Neuro Fuzzy Interface System, and Gene Expression Programming with K-Fold Cross Validation. Front. Mater.
**2021**, 8, 621163. [Google Scholar] [CrossRef] - Gholampour, A.; Gandomi, A.H.; Ozbakkaloglu, T. New formulations for mechanical properties of recycled aggregate concrete using gene expression programming. Constr. Build. Mater.
**2017**, 130, 122–145. [Google Scholar] [CrossRef] - Liu, Q.-F.; Iqbal, M.F.; Yang, J.; Lu, X.-Y.; Zhang, P.; Rauf, M. Prediction of chloride diffusivity in concrete using artificial neural network: Modelling and performance evaluation. Constr. Build. Mater.
**2021**, 268, 121082. [Google Scholar] [CrossRef] - Farooq, F.; Ahmed, W.; Akbar, A.; Aslam, F.; Alyousef, R. Predictive modeling for sustainable high-performance concrete from industrial wastes: A comparison and optimization of models using ensemble learners. J. Clean. Prod.
**2021**, 292, 126032. [Google Scholar] [CrossRef] - Iqbal, M.F.; Liu, Q.-F.; Azim, I.; Zhu, X.; Yang, J.; Javed, M.F.; Rauf, M. Prediction of mechanical properties of green concrete incorporating waste foundry sand based on gene expression programming. J. Hazard. Mater.
**2020**, 384, 121322. [Google Scholar] [CrossRef] [PubMed] - Ardakani, A.; Kordnaeij, A. Soil compaction parameters prediction using GMDH-type neural network and genetic algorithm. Eur. J. Environ. Civ. Eng.
**2019**, 23, 449–462. [Google Scholar] [CrossRef] - Wang, H.-L.; Yin, Z.-Y. High performance prediction of soil compaction parameters using multi expression programming. Eng. Geol.
**2020**, 276, 105758. [Google Scholar] [CrossRef]

Descriptive statistics of the independent and dependent variables:

Independent and Dependent Variables | Range | Minimum | Maximum | Mean | Std. Deviation | Skewness | Kurtosis |
---|---|---|---|---|---|---|---|
Ca | 103.39 | 0.61 | 104.00 | 1.82343 | 5.420 | 1.26 | 3.22 |
Mg | 2.61 | 0.03 | 2.64 | 0.6149 | 0.343 | 1.92 | 4.42 |
Na | 8.95 | 0.05 | 9.00 | 0.5427 | 0.672 | 1.86 | 4.44 |
HCO_{3} | 7.29 | 0.11 | 7.40 | 1.7404 | 0.697 | 1.24 | 2.66 |
Cl | 4.20 | 0.00 | 4.20 | 0.2968 | 0.312 | 1.25 | 5.80 |
SO_{4} | 3.10 | 0.10 | 3.20 | 0.5758 | 0.383 | 2.18 | 1.87 |
pH | 8.40 | 0.00 | 8.40 | 7.8413 | 0.621 | −1.86 | 4.56 |
WT (°C) | 20.11 | 1.00 | 21.11 | 12.1813 | 3.812 | −0.32 | −0.75 |
TDS (ppm) | 464 | 60 | 524 | 148.35 | 62.231 | 4.584 | 37.225 |
EC (μS/cm) | 690 | 88 | 770 | 250.42 | 98.691 | 3.801 | 31.808 |
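
As a rough illustration of how the columns of this table can be computed, the sketch below derives the same seven statistics for one variable. It assumes moment-based skewness and excess kurtosis and a sample (n − 1) standard deviation; the statistical package used for the table may apply bias-corrected formulas, so values need not match exactly. The function name `describe` and the sample values are hypothetical and not taken from the dataset.

```python
import math


def describe(values):
    """Return range, min, max, mean, sample std, skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments in population form (divided by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "range": max(values) - min(values),
        "minimum": min(values),
        "maximum": max(values),
        "mean": mean,
        # Sample standard deviation: rescale m2 to an (n - 1) denominator.
        "std": math.sqrt(m2 * n / (n - 1)),
        # Moment-based skewness and excess kurtosis (assumed definitions).
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,
    }


# Hypothetical measurements, for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
stats = describe(sample)
print(stats)
```

A symmetric sample such as the one above has zero skewness, whereas the strongly positive skewness reported for TDS and EC indicates a long right tail.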

Pearson’s Correlation Matrix | Ca | Mg | Na | HCO_{3} | Cl | SO_{4} | pH | WT (°C) | TDS (mg/L) |
---|---|---|---|---|---|---|---|---|---|
Ca | 1.000 | | | | | | | | |
Mg | 0.315 | 1.000 | | | | | | | |
Na | 0.398 | 0.332 | 1.000 | | | | | | |
HCO_{3} | 0.626 | 0.482 | 0.594 | 1.000 | | | | | |
Cl | 0.446 | 0.380 | 0.490 | 0.449 | 1.000 | | | | |
SO_{4} | 0.381 | 0.444 | 0.454 | 0.178 | 0.198 | 1.000 | | | |
pH | −0.133 | 0.116 | −0.005 | 0.023 | −0.051 | −0.018 | 1.000 | | |
WT (°C) | −0.464 | −0.390 | −0.249 | −0.296 | −0.239 | −0.316 | 0.021 | 1.000 | |
TDS (mg/L) | 0.704 | 0.619 | 0.759 | 0.769 | 0.587 | 0.546 | −0.630 | −0.664 | 1.000 |

Pearson’s Correlation Matrix | Ca | Mg | Na | HCO_{3} | Cl | SO_{4} | pH | WT (°C) | EC (μS/cm) |
---|---|---|---|---|---|---|---|---|---|
Ca | 1.000 | | | | | | | | |
Mg | 0.315 | 1.000 | | | | | | | |
Na | 0.398 | 0.332 | 1.000 | | | | | | |
HCO_{3} | 0.626 | 0.482 | 0.594 | 1.000 | | | | | |
Cl | 0.446 | 0.380 | 0.490 | 0.449 | 1.000 | | | | |
SO_{4} | 0.381 | 0.444 | 0.454 | 0.178 | 0.198 | 1.000 | | | |
pH | −0.133 | 0.116 | −0.005 | 0.023 | −0.051 | −0.018 | 1.000 | | |
WT (°C) | −0.464 | −0.390 | −0.249 | −0.296 | −0.239 | −0.316 | 0.021 | 1.000 | |
EC (μS/cm) | 0.715 | 0.601 | 0.728 | 0.767 | 0.555 | 0.508 | −0.616 | −0.649 | 1.000 |
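
Each entry of the two correlation matrices is a Pearson coefficient between a pair of variables. A minimal sketch of that calculation, using only the standard library, could look as follows. The function name `pearson_r` and the example values are hypothetical and serve only to show the formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-deviations and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired observations (not from the study's dataset).
ca = [1.0, 2.0, 3.0, 4.0]
tds = [110.0, 150.0, 170.0, 230.0]
print(round(pearson_r(ca, tds), 3))
```

Because the coefficient is symmetric, only the lower triangle of each matrix needs to be reported, which is why the upper-right cells are empty.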

Hyperparameter | Optimized Setting |
---|---|
Operators | Addition, subtraction, multiplication, division, Sqrt, Ln |
Num subpopulations | 50 |
Subpopulation size | 250 |
Code length | 50 |
Crossover probability | 0.9 |
Crossover type | uniform |
Mutation probability | 0.01 |
Tournament size | 2 |
Operators probability | 0.5 |
Variables probability | 0.5 |
Num generations | 1000 |
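
For readers who want to reproduce the setup in their own code, the settings above can be captured in a small configuration object. The sketch below is only one possible representation: the class `MEPSettings`, its field names and its helper methods are hypothetical and do not correspond to the API of any MEP software; only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MEPSettings:
    """Hypothetical container for the MEP settings listed in the table."""
    operators: tuple = ("+", "-", "*", "/", "sqrt", "ln")
    num_subpopulations: int = 50
    subpopulation_size: int = 250
    code_length: int = 50
    crossover_probability: float = 0.9
    crossover_type: str = "uniform"
    mutation_probability: float = 0.01
    tournament_size: int = 2
    operators_probability: float = 0.5
    variables_probability: float = 0.5
    num_generations: int = 1000

    def total_individuals(self) -> int:
        """Number of chromosomes evolved per generation across subpopulations."""
        return self.num_subpopulations * self.subpopulation_size

    def validate(self) -> None:
        """Check that every probability lies in [0, 1]."""
        for name in ("crossover_probability", "mutation_probability",
                     "operators_probability", "variables_probability"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")


settings = MEPSettings()
settings.validate()
print(settings.total_individuals())  # 50 subpopulations x 250 individuals
```

Keeping the settings in a frozen dataclass makes accidental changes between runs harder, which helps when comparing models trained under the same configuration.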

Statistical Measure | MEP-EC (Training) | MEP-EC (Validation) | MEP-EC (Testing) | MEP-TDS (Training) | MEP-TDS (Validation) | MEP-TDS (Testing) |
---|---|---|---|---|---|---|
R^{2} | 0.9519 | 0.9348 | 0.9674 | 0.9186 | 0.9442 | 0.9725 |
MAPE (%) | 5.317 | 6.145 | 6.119 | 7.40 | 8.25 | 5.54 |
MAE | 12.36 | 12.14 | 11.22 | 9.75 | 9.80 | 8.27 |
RMSE | 18.54 | 17.19 | 16.43 | 13.36 | 13.33 | 11.36 |
RMSLE | 0.000582 | 0.000469 | 0.000402 | 0.000868 | 0.00033 | 0.00031 |

Statistical Measure | NLRM-EC (Training) | NLRM-EC (Validation) | NLRM-EC (Testing) | NLRM-TDS (Training) | NLRM-TDS (Validation) | NLRM-TDS (Testing) |
---|---|---|---|---|---|---|
R^{2} | 0.9133 | 0.9343 | 0.7156 | 0.9204 | 0.9552 | 0.7295 |
MAPE (%) | 16.148 | 17.863 | 25.88 | 15.896 | 16.017 | 28.813 |
MAE | 13.320 | 15.738 | 22.912 | 8.135 | 8.258 | 12.95 |
RMSE | 25.599 | 27.742 | 43.004 | 14.544 | 14.234 | 33.15 |
RMSLE | 0.0013 | 0.00253 | 0.00351 | 0.00124 | 0.00753 | 0.00853 |
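
The five measures reported for the MEP and NLRM models can be computed from paired observed and predicted values. The sketch below uses common textbook definitions, which are an assumption here: R² as the coefficient of determination, MAPE in percent, and RMSLE based on log(1 + value). The paper's exact formulas may differ, so this is not a way to reproduce the tabulated numbers. All function names and the example data are hypothetical.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root-mean-square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percent error (%); observed values must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def rmsle(obs, pred):
    """Root-mean-square logarithmic error using log(1 + value) (assumed form)."""
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(o)) ** 2 for o, p in zip(obs, pred))
        / len(obs)
    )


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed form)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted values, for illustration only.
observed = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
for name, fn in [("R2", r2), ("MAPE", mape), ("MAE", mae),
                 ("RMSE", rmse), ("RMSLE", rmsle)]:
    print(name, round(fn(observed, predicted), 4))
```

Evaluating the same functions separately on the training, validation and testing subsets gives one column per subset, as in the two tables above.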

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Aldrees, A.; Khan, M.A.; Tariq, M.A.U.R.; Mustafa Mohamed, A.; Ng, A.W.M.; Bakheit Taha, A.T.
Multi-Expression Programming (MEP): Water Quality Assessment Using Water Quality Indices. *Water* **2022**, *14*, 947.
https://doi.org/10.3390/w14060947

**Chicago/Turabian Style**

Aldrees, Ali, Mohsin Ali Khan, Muhammad Atiq Ur Rehman Tariq, Abdeliazim Mustafa Mohamed, Ane Wai Man Ng, and Abubakr Taha Bakheit Taha.
2022. "Multi-Expression Programming (MEP): Water Quality Assessment Using Water Quality Indices" *Water* 14, no. 6: 947.
https://doi.org/10.3390/w14060947