Multi-Expression Programming (MEP): Water Quality Assessment Using Water Quality Indices

Water contamination is indeed a worldwide problem that threatens public health, environmental protection, and agricultural productivity. The distinctive attributes of machine learning (ML)-based modelling can provide in-depth understanding into increasing water quality challenges. This study presents the development of a multi-expression programming (MEP) based predictive model for water quality parameters, i.e., electrical conductivity (EC) and total dissolved solids (TDS) in the upper Indus River at two different outlet locations using 360 readings collected on a monthly basis. The optimized MEP models were assessed using different statistical measurements i.e., coefficient-of-determination (R2), root-mean-square error (RMSE), mean-absolute error (MAE), root-mean-square-logarithmic error (RMSLE) and mean-absolute-percent error (MAPE). The results show that the R2 in the testing phase (subjected to unseen data) for EC-MEP and TDS-MEP models is above 0.90, i.e., 0.9674 and 0.9725, respectively, reflecting the higher accuracy and generalized performance. Also, the error measures are quite lower. In accordance with MAPE statistics, both the MEP models shows an “excellent” performance in all three stages. In comparison with traditional non-linear regression models (NLRMs), the developed machine learning models have good generalization capabilities. The sensitivity analysis of the developed MEP models with regard to the significance of each input on the forecasted water quality parameters suggests that Cl and HCO3 have substantial impacts on the predictions of MEP models (EC and TDS), with a sensitiveness index above 0.90, although the influence of the Na is the less prominent. The results of this research suggest that the development of intelligence models for EC and TDS are cost effective and viable for the evaluation and monitoring of the quality of river water.


Introduction
Surface water connected with rivers and streams is a key component in controlling the hydrological cycle, ecology, social wellbeing, and economic growth [1]. Environmental events, including annual precipitation and erosion, and also social elements like agricultural, urban, and industrial production procedures, all have an impact on river water variables to just the nine best combinations of explanatory parameters, which includes Hydrogen Power (pH), EC, TDS, Sulfate (SO 4 2− ), Dissolved Oxygen (DO), Chlorine (Cl − ), Total Coliform (TC), Magnesium (Mg), and Biochemical Oxygen Demand (BOD). The use of only nine variables results in quicker calculations, consequently reducing the computational duration. Similarly, Zali et al. [28] used the ML technique i.e., artificial neural networks (ANNs) to investigate the impacts of six key explanatory variables (i.e., Chemical Oxygen Demand (COD), Suspended Solids (SS), Nitrate (NO 3 − ), BOD, DO, and pH) for the computation of WQI. Determining the comparative relevance of every variable in WQI prediction using a sensitivity analysis showed that DO, SS, and NO 3 − are indeed the essential input variables. Nigam and SM [29] compared the prediction performance of fuzzy based models and conventional computation techniques for the calculation of WQI of ground water, reporting comparatively the outburst predictive power of fuzzy (an intelligent model). Thus, this categorizes the water quality and surpasses the predictions of conventional calculation techniques. Srinivas and Singh [30] extended their study to an Interactive Fuzzy model (IFM) for establishment of a unique fuzzy decision-making technique for predicting WQI in rivers. The results of their research show a considerable enhancement in WQI predictive accuracy in comparison with a conventional fuzzy approach. Yaseen et al. [31] investigated the estimation efficiency of adaptive-neuro-fuzzy-inferencesystem (ANFIS)-based hybrid models combined with subtractive clustering (SC), Fuzzy C-mean data clustering (FCM), and grid partitioning (GP). They found that ANFIS-SC is the best and most consistent model. Radial-basis-function-neural-networks (RBFN) and back-propagation-neural-network (BPNN) algorithms were used to propose a model for the establishment of the relation between WQI and many biological variables (like COD, SS, BOD, DO, Nitrate, and pH) in tropical and subtropical environments [32]. The RBFNN model produced comparatively good predictive outcomes. Bozorg-Haddad et al. [10] tested the performance of genetic-programming (GP) and least-square-support-vector-regression (LSSVR) for the estimation of K, Na, Mg, EC, SO 4 , EC, TDS, and pH, in the Sefidrood River located in Iran. For all the computed models, the R 2 values is greater than 0.9, indicating good correlation. Al-Mukhtar and Al-Yaseen [33] used ANN, ANFIS and multiple-linearregression (MLR) techniques to assess the water quality of the Abu-Ziriq River, located in Iraq. They forecasted the EC and TDS with most significant input variables (nitrate, chloride, calcium, magnesium, hardness and sulfate) and found that the ANFIS technique yielded the best outcomes. Sarkar and Pandey [34] used the ANN approach to analyze the amount of dissolved oxygen (DO) in river water over three distinct sites using four different variables i.e., pH, temperature, DO, and biochemical oxygen demand (BOD) and reported a correlation coefficient (R) value above 0.90 between the forecasted and actual DO data. Zhang et al. [35] used the combined hybridized model of ANN and GP algorithm to forecast the production of drinking water from chemical treatment plants. The findings showed that these created models performed well in forecasting the output capacity of the water treatment processing unit. Incorporation of additional data to the algorithm during training enhanced the model performance significantly. Chen et al. [36] utilized a comprehensive database to examine the water quality predictive ability of ten distinct ML models (three ensemble and seven conventional). The study indicated that utilizing a larger number of datapoints for water quality assessment can improve the predictive accuracy of the model. Some other prominent methodologies have been used effectively for different meteorological, environmental, and hydrological challenges (like rainfall forecasting), and these include tree-based algorithmic procedures, like random forest (RF), decision tree (DT) and support vector machine (SVM) models. These models are also recognized as a remarkable ML approach for both linear and complex non-linear engineering problems. Different researchers have utilized these algorithms with excellent predictive performance in a variety of scientific challenges [37]. Granata et al. [38] generated the SVM and RF model to forecast the content of TDS, TSS, BOD and COD, finding that the SVM model provided superior predictions. However, the efficacy is reduced when subjected to unseen data. In brief, the various mathematical models are developed that contributed to the betterment of human life [39][40][41][42][43][44][45][46].
Although conventional AI algorithms rely on ANN and ANFIS, are extensively used for WQP modeling. Environmentalists have been investigating new resilient and robust intelligent algorithms [18,19]. It is worthy to mention that the neural networks work like a black box and do not consider any physical phenomenon of the issue being resolved. Most of the neural networks deliver a complex expression for the prediction of outcome on the basis of inputs [47]. In fact, ANN based models are considered as a correlation amongst the inputs and outputs, and the relation is either linear or relies on the pre-defined base functions [48,49]. Also, the tree-based algorithms are only good at capturing the linear relationship [50]. To overcome these issues, researchers used EA algorithms like GEP for simulation of water WQPs [51,52]. The EAs are advantageous in the condition where realistic and practical expression with higher generalization and prediction capabilities is required. However, the GEP is unable to incorporate the diverging data for the establishment of an ultimate model and must be excluded from the training and validation phase for the enhancement of the model's performance. Also, the GEP encode a single chromosome (expression) and present it as a program. Thus, they are only appropriate when there exists a simple relationship between inputs and outputs [53]. In contrast, the MEP is a comparatively new variant of genetic programming (GP) and has an ability to code a multiple number of chromosomes (expressions) in just one computer program [53]. It has the potential to predict the outcome accurately given the unknown complexity of the targeted parameter [54]. Unlike other ML algorithms, MEP does not need the identification of the final expression. In addition, the evolution process can effectively read and eliminate the complex mathematical errors from the resulting expression. The decoding procedure of MEP is fairly intuitive in comparison with other ML methods. Given the immense benefits of MEP algorithmic process over other evolutionary algorithms, it is scarcely adopted by environmentalists.
In the present research, water quality parameters like EC and TDS of the Upper Indus Basin (UIB) at the Bisham Qilla monitoring stations were modeled employing MEP algorithmic procedures and the traditional non-linear regression (NLR) approach based on the most affecting variables. A comprehensive dataset of 360 monthly readings taken from the Water and Power Development Authority (WAPDA) is being partitioned into three sets (training, validation, and testing) to verify the efficiency of the training process. To guarantee model efficacy, reliability and applicability, an in-depth statistical error test and sensitivity study is performed on the developed MEP models. The models that effectively predict TDS and EC concentrations by employing a small set of variables considerably minimize the trouble and expense involved in environmental surveillance.

Materials and Methods
This section is specifically explaining the methodology of multi-expression programming (machine learning approach) and a traditional non-linear regression (NLR) approach for the assessment of water quality indicators (i.e., EC and TDS) alongside the study area chosen in the current research and the performance evaluation indicators.

Multi-Expression Programming
Genetic algorithms (GA) are the metaheuristic and stochastic approach for searching for and optimizing complex solutions depending upon the evolutionary genetics and biological selection principles [55]. GA generates a sequence of strings (binary) to describe the outcome by employing the conventional optimization methods. Genetic programming (GP) was established as an improved version of GA to generate string expressions into computer algorithms like tree construction or workable computer code [56]. GP is a symbolized optimization approach that employs Darwin's theory of evolution to resolve an issue using complex algorithms. The fundamental purpose of GP is just to find a code that combines the given inputs and known outcome using the fitness metric. In general, there are three main forms of GP: linear-based GP, graph-based GP, and tree-based GP [57,58]. In comparison to the other two forms of GP, linear-GP is much more effective since it does not require sophisticated or fast analyzers. This results in a more relevant significance for such linear-GP, which improves the accurateness in actual timescales.
To forecast the water quality indices (i.e., EC and TDS), a relevant linear-GP technique known as multi-expression-programming (MEP) was used in this research, depending upon it's overall precision and reliability. The MEP records responses using linear chromosomes. A chromosome can contain many computer algorithms (productive solutions). The optimal recorded answer to reflect the chromosome is identified from a comparison of the fitness scores of the scripts (programs). The MEP algorithmic procedure begins with the creation of a randomized population of computer code. To design the best solution to the considered problem, the subsequent stages are performed until the exit condition is reached [53]: Two individuals (parents) are chosen employing the simple binary tournament process, subsequently reconstituted using a specified crossover probability.

2.
The two parents are then recombined to generate two offspring.

3.
After the mutation of the offspring, the weakest individual is substituted with the finest among them within the present population.
MEP is represented in the same manner as C and Pascal compilers translate mathematical equations to coding [57]. A sequence (String or series) of formulas represents the MEP genes. The set of genes define the code length (chromosomal length), which remains unchanged across the computation. The gene is constituted with a function sign (component in a function set i.e., F) and one or two terminals (component terminal set i.e., T). To produce the contextually valid code, the initial gene of a chromosome should correspond to a terminal selected at random within a terminal set. The function gene must hold a reference to the function parameters. The generated terminal scores of a particular gene have lesser values than the gene's chromosomal location. The following example illustrates a simple MEP chromosome: In the given example, the function set F = {Pow, +, /} and the terminal set T = {A 1 , A 2 , A 3 , A 4 } are utilized to encode and MEP program. The MEP genes can be converted into a computational program with the decoding of the chromosomes right from the top to the bottom. The respective program is presented in the form of gene trees in Figure 1, where the 0, 1, 3 and 5 number genes are used to code one unique terminal i.e., Y 0 = A 1 , Y 1 = A 2 , Y 3 = A 3 , and Y 5 = A 4 , respectively. At chromosomal positions 0 and 1, gene 2 denotes the 'Pow' function upon the said operands, thus encoding the expression Y 2 = A A 2 1 . Likewise, at positions 2 and 3, gene 4 represents the operator / upon the operands and encodes the expression as Y 4 = A A 2 1 /A 3 . Eventually, Y 4 = A A 2 1 /A 3 + A 4 can be stated as a result of gene 6. Consequently, the chromosomal combination can be visualized as a forest of gene trees (see Figure 1), having several expressions. The optimal expression tree is determined following regulation of the efficiency of various expressions on an MEP chromosome [53]. denotes the 'Pow' function upon the said operands, thus encoding the expression = . Likewise, at positions 2 and 3, gene 4 represents the operator ′/′ upon the operands and encodes the expression as = . Eventually, = + can be stated as a result of gene 6. Consequently, the chromosomal combination can be visualized as a forest of gene trees (see Figure 1), having several expressions. The optimal expression tree is determined following regulation of the efficiency of various expressions on an MEP chromosome [53].

Non-Linear Regression Approach
Linear regressions employ a linear function for fitting the recorded data and to identify the relation among two or more variables and a single output parameter. Each record of the predictor variables corresponds to a recorded value of the outcome parameter. The following is a generalized version of multiple-linear regression: However, in cases when several dependent variables are non-linear, the logarithm based exponential adjustment can be employed to perform regression (see Equation (2)), which is then reverted for the prediction of the outcome using an antilogarithmic function (see Equation (3)).
where , , , … , denotes the coefficient (constants) and , , , … , are the independent (input) variables. The above equation represents a multi-variable power expression which better captures the functional interdependency between numerous variables, and thus provides more realistic results [59].
The present research discusses the findings of the MEP (ML technique) and NLRM (traditional regression approach) established models to assess water quality indicators (i.e., EC and TDS). A comparison of outcomes from both methodologies was also

Non-Linear Regression Approach
Linear regressions employ a linear function for fitting the recorded data and to identify the relation among two or more variables and a single output parameter. Each record of the predictor variables corresponds to a recorded value of the outcome parameter. The following is a generalized version of multiple-linear regression: However, in cases when several dependent variables are non-linear, the logarithm based exponential adjustment can be employed to perform regression (see Equation (2)), which is then reverted for the prediction of the outcome using an antilogarithmic function (see Equation (3)).
where B 0 , B 1 , B 2 , . . . , B n denotes the coefficient (constants) and X 0 , X 1 , X 2 , . . . , X n are the independent (input) variables. The above equation represents a multi-variable power expression which better captures the functional interdependency between numerous variables, and thus provides more realistic results [59]. The present research discusses the findings of the MEP (ML technique) and NLRM (traditional regression approach) established models to assess water quality indicators (i.e., EC and TDS). A comparison of outcomes from both methodologies was also performed to determine and confirm the superiority of the ML-based MEP approach. It is a widely adopted approach having significant implications in a variety of technological sectors [60][61][62]. The NLRM implemented in this research was created with the help of statistical software (i.e., SPSS: statistical package for social sciences), while the MEPX V.1.0. software is used for MEP modeling.

Study Area and Data Collection
The Indus River runs for 2880 km, with a mountainous, snow fed and glacierized catchment of the Upper Indus River basin (UIB) [63]. The UIB is located at the top of the Tarbela reservoir, which is basically associated with the Indus basin system, having a length and drain area of 1150 km and 165,400 km 2 , respectively. The ice deposition of 2174 km 3 also exists at the same location. The height of UIB ranges between 455 to 8611 m, with the amount of annual precipitation between 100 and 200 mm, which changes the climatic condition within the basin proportional to the elevation [51,63].
The Pakistani Water and Power Development Authority (WAPDA) provided the dataset related to water quality which is utilized in this investigation. The finalized data set included 360 monthly instances recorded between 1975 and 2005 at the outlet location of Bisham Qilla and Doyian. The eight most prominent explanatory variables selected are the water temperature ( • C), magnesium (Mg), calcium (Ca), sodium (Na), sulphate (SO 4 ), chloride (Cl), pH, and bicarbonates (HCO 3 )" along with two well-known outputs i.e., EC and TDS. Table 1 presents the key descriptive statistical attributes of the collected data, which includes mean, maximum and minimum values and the dispersion statistics like kurtosis, skewness and standard deviation. TDS levels in the following research vary considerably between 60 and 524 ppm, whereas EC readings range from 88 to 770 µS/cm. According to WHO recommendations, the acceptable limit of TDS in drinkable water is 300 to 600 mg/L, whereas the authorized level in agricultural water is 450 to 2000 mg/L [7]. Table 1 presents that the concentrations of EC and TDS are both inside the allowable range; nonetheless, it is essential that significant water quality indices be measured precisely and without substantial work. Also, the kurtosis and skewness lie in the permissible range of [-10, 10] and [−3, +3], respectively, showing an acceptable dispersion and peakedness in the model parameters [64,65].  Tables 2 and 3 show the correlation coefficient among the model variables and the model outcomes (i.e., EC and TDS). As per the current research, incorporating so many input variables having a poor relationship with the model outcome degrades model efficiency and enhances complication and computation complexity [66]. The inputs considered in this research have a strong relation with the concerned output. Also, no multicollinearity problem is observed, as the relation amongst the inputs are less than 0.8 [67,68]. As is customary, the data were divided randomly into three distinct sets i.e., 70% (252 recordings) for training, 15% (54 readings) for validation and 15% (54 readings) for testing the developed models.

Statistical Indicators for Response Evaluation of Models
The functionality of the simulated models in the training or testing phase is monitored by estimating statistical measures like mean absolute percent error (MAPE), mean root mean squared logarithmic error (RMSLE), root mean square error (RMSE), coefficient of determination (R 2 ) (also known as root square value) and mean absolute error (MAE). R 2 is recognized to be the superior amongst these for examining the effectiveness of the models. The R 2 score from 0.65 to 0.75 implies outstanding performance, whereas below 0.50 indicates poor performance [69]. The formula used to get the R 2 value is given in Equation (4), where, the 'P' and 'E' denotes the model predicted and experimental findings, respectively. And 'm' is the total number of readings.
MAE represents the imbalances between estimated and observed results in terms of the mean of the absolute magnitude of the error (removing the negative sign) in the same unit as the output. It captures the magnitude of lower error values effectively [70] and is computed using the provided Equation (5).
RMSLE helps to determine the proportional difference among the forecasted and observed result by incorporating the enlarged logarithmic error. It is useful for right-skewed outcomes because the log conversion depicts the targeted distribution quite intuitively. The Equation (6) shows the expression used for the calculation of RMSLE. An RMSE is used for the effective assessment of the magnitude of larger error values and significant deviations, like outliers, which were weighted more heavily [70]. It measures the error by considering the root of squared error. The the value of RMSE, the better the effectiveness and performance of the simulated model will be. It is computed using Equation (7).

Tunning the Modeling Hyper-Parameters
To produce robust, reliable and adaptable models, certain MEP hyper-parameters are set up before starting the modeling. Using the existing suggestions in the literature and a hit-and-trial process, the correct parameters are set [71]. The population size is used to determine the total programs that will emerge in the population. A larger population size can take a long time until convergence is achieved, and it may be complex but will also be realistic and accurate. However, the rise in size above a particular threshold led to the overfitting problem in the developed model. Table 4 shows the optimized hyperparameters that were chosen for the two developed models (EC and TDS) produced in the current study. The modelling began with the assumption of a total of 10 sub-population. Initially, a simple arithmetic operator (i.e., +, −, ×, and ÷) was selected to obtain a simple final formulation that was later on extended to a logarithmic and square root function to achieve the convergence at a higher degree of accuracy and robustness. The iterations are also used to specify the accurateness that the algorithm must attain before being terminated. A simulation having several iterations might result in a model with the least inaccurate results. Consequently, the crossover and mutation speed are the measures to control the likelihood of the offspring currently undergoing these genetic activities. Considering these probabilities, the crossover rate varies between 50% and 95%. After testing multiple combinations of these parameters, the best possible options were chosen depending on the model performance, as presented in Table 4. Performance overfitting of the developed model is the frequently occurring challenge with machine learning modeling. A model appears to work successfully on the sample data provided, however its efficacy suffers dramatically when extended to new unseen data. To solve the discussed problem, the literature suggested that the developed (trained) model must be tested on the testing dataset (fresh unseen) [72]. Thus, the entire data collected in the database was arbitrarily partitioned among three distinct sets i.e., training (70%-252% reading), validation (15%-54% readings), and testing (15%-54% readings). It was verified that the data distribution was consistent across all sets. Both the training and validation data were utilized in the computation of the optimized model. The performance of the validated model was further evaluated on the unseen data i.e., the testing data (third set). The resulting models outperformed in all three stages. A publicly accessible computer tool (i.e., MEPX v1.0), was used to execute the MEP algorithmic process. The method initiates with the generation of a population containing acceptable solutions. The operations are repetitive, with every iteration bringing us closer to an optimal solution. Within the whole solution population, the efficiency of every iteration is assessed. The process will keep evolving unless the stagnation of pre-defined fitness function (i.e., RMSE or R 2 ) is reached. In case the model findings for each set of data are inaccurate, the procedure is continued again, steadily boosting the size and number of subpopulations. The optimal result is then chosen depending on the lowest RMSE and highest R 2 . It was noted that the accuracy of certain models in the training phase was better than in the testing phase, indicating the overfitting, which must be prevented. It is worthy of mention that the evolution time and generations have an influence on the accuracy of the produced models. A model would keep developing endlessly with these types of algorithms owing to the inclusion of additional variables into the process. However, in the current work, the modeling was terminated after a thousand generations or when the improvement in fitness value falls below 0.1%. Furthermore, an optimum solution must fulfill several performance metrics, as discussed in Section 2.4.

Formulation of EC and TDS Using MEP Model
The generated optimized MEP code for the future prediction of EC and TDS using nine variables is presented in Appendix A as A-1 and A-2, respectively. The experimental and modeling results of EC of all three sets is plotted in Figure 2 along with the slope. The ideal position of the regression line is 45 • , i.e., slope equaling 1, while for strong correlation it must be nearer to 1 [70]. As depicted in Figure 2, the slope for all three sets is 0.9885 (training), 0.9921 (validation), and 0.9918 (testing), indicating a substantial relationship among the experimental and resulted modeling values of EC. Furthermore, the results seem fairly comparable to one another and approach a perfect fit for the sets, implying the proper training of the model [65]. Thus, it possesses a strong generalization potential, i.e., it functions pretty similarly on new data (testing set).
A corresponding assessment was also undertaken for the TDS findings, which are presented in Figure 3. It is quite clear and obvious that the generated model for TDS has been trained well using the input data to accurately follow and capture the experimental TDS. The regression line slope for all three sets is 0.9924 (training), 1.0119 (validation), 0.9989 (testing). Like an EC model, the TDS model also functions exceptionally well in the testing phase [73]. Thus, this demonstrates that the problem of over-fitting was addressed effectively [70].
Water 2022, 14, x FOR PEER REVIEW 1 Inside the compiled database, the maximum points i.e., 360 were picked for both E TDS, thus, resulting in a higher accuracy with minimal errors [75].

Overall Performance of Developed Models
The reliability of a model is directly related to the number of instances/readings used in the establishment of the model. It is mentioned in the literature that the ins to independent variables ratio must exceed five [76]. For both the EC and TDS, thi is 31.5 and 6.75 for the training and validation/testing set, respectively. An R 2 -scor sistently above 0.8 indicates a good association amongst measured and model pre results [77]. Both EC and TDS are significantly associated with all the selected inpu ables. Conversely, the investigations revealed that R 2 identifies the linear relations Inside the compiled database, the maximum points i.e., 360 were picked for both E TDS, thus, resulting in a higher accuracy with minimal errors [75].

Overall Performance of Developed Models
The reliability of a model is directly related to the number of instances/readings used in the establishment of the model. It is mentioned in the literature that the inst to independent variables ratio must exceed five [76]. For both the EC and TDS, this is 31.5 and 6.75 for the training and validation/testing set, respectively. An R 2 -scor sistently above 0.8 indicates a good association amongst measured and model pred results [77]. Both EC and TDS are significantly associated with all the selected inpu ables. Conversely, the investigations revealed that R 2 identifies the linear relations   Furthermore, the R 2 of both EC and TDS MEP models is above 0.9 in each stage i.e., for training, validation and testing. It can be seen clearly from Figures 2 and 3, the scatter of datapoints is near the 45 • lines (1:1), resulting in the higher prediction accuracy of the suggested EC-MEP and TDS-MEP models, respectively. The R 2 -values for the EC model are 0.9519 (training), 0.9348 (validation) and 0.9674 (testing), while for the TDS model they are 0.9185 (training), 0.9441 (validation) and 0.9725 (testing). Both of the developed MEP models shows a generalized performance, as the measures in each stage are closer with a little difference [74].
Additionally, the number of data points employed for modeling has a substantial significance upon the efficiency as well as adaptability of these simulation equations [75].
Inside the compiled database, the maximum points i.e., 360 were picked for both EC and TDS, thus, resulting in a higher accuracy with minimal errors [75].

Overall Performance of Developed Models
The reliability of a model is directly related to the number of instances/readings being used in the establishment of the model. It is mentioned in the literature that the instances to independent variables ratio must exceed five [76]. For both the EC and TDS, this ratio is 31.5 and 6.75 for the training and validation/testing set, respectively. An R 2 -score consistently above 0.8 indicates a good association amongst measured and model predicted results [77]. Both EC and TDS are significantly associated with all the selected input variables. Conversely, the investigations revealed that R 2 identifies the linear relationship of outcome and independent factors. Therefore, evaluating the presented models (EC and TDS) just on the slope or inclination of the trendline and the regression coefficient is inadequate [76]. Thus, multiple statistical measures are used to examine the robustness and reliability of the generated MEP models.

EC-MEP MODEL
To evaluate the robustness of the developed MEP-EC model, the statistical error measures (i.e., MAE, RMSE, MAPE, and RMSLE) of all three sets is presented in Table 5. The MAE, RMSE and RMSLE are graphically interpreted via the mean absolute error plot (Figure 4), while the absolute percent error histogram ( Figure 5) is presented for the examination of MAPE. The MAE and RMSE have their own significance in evaluation of model performance. The RMSE gives higher weight to larger error values as the error is squared before taking the average. Conversely, MAE is significant in capturing the smaller error values effectively and is thus always smaller than RMSE. The MAE in training and validation are 12.36, and 12.14, respectively, while the RMSE values are 18.54, and 17.19, respectively, with slightly better performance in the testing stage having MAE and RMSE equaling 11.12 and 16.43, respectively, thus fulfilling the stated condition (MAE < RMSE) [37]. Considering the whole data set of 360 readings, the minimum and maximum values of the absolute error are 0.016 and 98.16, respectively. Additionally, the RMSLE also nearly equals to zero at each stage, replicating the outburst functioning as it penalized the larger measured/prediction values efficiently. Furthermore, the percent error histograms show that 282 readings (78.33% of data) have an absolute error below 20%, with no prediction having an error above 40%. The MAPE values fall below 10%, i.e., 5.317%, 6.145% and 6.119% for training, validation and the testing phase, respectively. In accordance with the criteria explained in Section 2.4, the developed EC-MEP model can be categorized as "excellent" [7,8]. The developed model can be effectively used for future prediction with higher accuracy and minimal error measures, thus assisting the practitioners in avoiding human and machine errors [26,35]. prediction having an error above 40%. The MAPE values fall below 10%, i.e., 5 6.145% and 6.119% for training, validation and the testing phase, respectively. In a ance with the criteria explained in Section 2.4, the developed EC-MEP model can b gorized as "excellent" [7,8]. The developed model can be effectively used for futur diction with higher accuracy and minimal error measures, thus assisting the practit in avoiding human and machine errors [26,35].   Similar to the EC-MEP model, the TDS-MEP model also shows higher accuracy on the slope of the regression line, R 2 , and the reliable predictive performance cons the error metrics (i.e., MAE, RMSE, RMSLE, and MAPE). As shown in Figure 6, the

TDS-MEP Model
Similar to the EC-MEP model, the TDS-MEP model also shows higher accuracy based on the slope of the regression line, R 2 , and the reliable predictive performance considering the error metrics (i.e., MAE, RMSE, RMSLE, and MAPE). As shown in Figure 6, the distribution of absolute error near the x-axis witnesses the outburst performance of the developed TDS model. As can be seen in Table 5, the MAE is lesser than RMSE in all three stages, with the least values in the testing stage i.e., 8.27 ppm and 11.36 ppm, respectively. The maximum absolute error is 66.25 ppm, while the minimum was found to be 0.046 ppm, signifying the effectiveness of the developed TDS model [7]. In addition, the error histogram (see Figure 7) reflects zero error predictions above 45% of the absolute percent error with 309 readings (85.83% data points) having absolute percent error below 10%. Like the EC mode, the MAPE for TDS model in training, validation and for unseen data (testing) also falls below 10% i.e., 7.40%, 8.25%, and 5.54%, respectively, and can be similarly classified as "excellent" [7].

Comparison between the MEP Models and NLRMs
The non-linear regression method was applied to develop a mathematical mod predicting the EC and TDS based on the inputs, using the same data sets [37,61]. Equ (9) and (10) denotes the developed regression equations for EC and TDS. Figure 8 lustrates the deviation of the experimental and regression model predicted results and TDS, respectively. The predicted results of both models (EC and TDS) largely d from their targeted values, which make the performance and reliability of develope ditional regression models doubtful. The statistical measure presented in Table 6 s that on average the performance of EC-NLRM is 28.36% (R 2 ), 29

Comparison between the MEP Models and NLRMs
The non-linear regression method was applied to develop a mathematical mod predicting the EC and TDS based on the inputs, using the same data sets [37,61]. Equa (9) and (10) denotes the developed regression equations for EC and TDS. Figure 8 lustrates the deviation of the experimental and regression model predicted results and TDS, respectively. The predicted results of both models (EC and TDS) largely de from their targeted values, which make the performance and reliability of develope ditional regression models doubtful. The statistical measure presented in Table 6 s that on average the performance of EC-NLRM is 28.36% (R 2 ), 29

Comparison between the MEP Models and NLRMs
The non-linear regression method was applied to develop a mathematical model for predicting the EC and TDS based on the inputs, using the same data sets [37,61]. Equations (9) and (10) denotes the developed regression equations for EC and TDS. Figure 8a,b illustrates the deviation of the experimental and regression model predicted results of EC and TDS, respectively. The predicted results of both models (EC and TDS) largely deviate from their targeted values, which make the performance and reliability of developed traditional regression models doubtful. The statistical measure presented in Table 6 shows that on average the performance of EC-NLRM is 28.36% (R 2 ), 29.67% (MAE), 54.67% (RMSE), 85.62% (RMSLE), and 27.19% (MAPE) lower than EC-MEP model. Consequently, TDS-NLRM gives 29.26% (R 2 ), 27.34% (MAE), 45.67% (RMSE), 78.58% (RMSLE), and 34.59% (MAPE), which is an inaccurate prediction as compared to TDS-MEP model. In the testing phase, the R 2 and MAPE of EC-NLRM are 0.7156 and 25.88% respectively, and 0.7295 and 28.813% for TDS-NLRM models. Thus, the developed NLRMs can be categorized as "acceptable" models for prediction but are less accurate than MEP-based models. It can be observed from Table 6 that the performance measurements of NLRM models get worse when subjected to unseen (testing data), replicating the inconsistency and irregularities in the performance [35,36,78]. In essence, the traditional regression models (i.e., NLRMs) are not useful for the prediction of complex problems because of their inefficiency and lesser generalization capability [35,78].  , which is an inaccurate prediction as compared to TDS-MEP model. In the testing phase, the R 2 and MAPE of EC-NLRM are 0.7156 and 25.88% respectively, and 0.7295 and 28.813% for TDS-NLRM models. Thus, the developed NLRMs can be categorized as "acceptable" models for prediction but are less accurate than MEP-based models. It can be observed from Table 6 that the performance measurements of NLRM models get worse when subjected to unseen (testing data), replicating the inconsistency and irregularities in the performance [35,36,78]. In essence, the traditional regression models (i.e., NLRMs) are not useful for the prediction of complex problems because of their inefficiency and lesser generalization capability [35,78].

Sensitivity Analysis of MEP Models
In machine learning modeling, it is imperative to do many assessments to verify that the developed models (EC and TDS) are reliable and function successfully on a variety of data configurations. The higher efficacy on an established set of data, which includes the training, validation, and testing phases, does not ensure a model's superiority. A sensitivity analysis was presented in numerous studies and is used in the current work to assess that the model works well and is not just a relationship of inputs and outputs. The total data is subjected to a sensitivity analysis to get a deeper interpretation of the relevance of the independent variables on the projected water quality indicators. The sensitivity " " for a given input variables " " are computed as Equation (11) [79,80].
In the above equation, " " are the forecasted outcome of the established predictive model (EC or TDS), and "m" represents the number of the instances/readings (m = 360). The score typically runs from zero to one, indicating the intensity of the relationship between a single input and the projected outcome (EC or TDS). When the score approaches one, the particular input has a greater effect upon a certain forecasted output. Figure 9 represents the sensitivity ( ) of the inputs for the prediction of EC and TDS. All the eight variables considered in the current research influence the prediction of the water quality parameters (EC and TDS), utilizing the suggested models with distinct effects (all above 0.5). The sensitivity of the chlorides, sulphates, and carbonates concentration remains high for both water quality parameters, suggesting that all three of these inputs are the most relevant attributes influencing the forecasting outcomes [35]. However, the least contributing input is the sodium with a sensitiveness equal to 0.53 and 0.51 for EC and TDS, respectively.

Sensitivity Analysis of MEP Models
In machine learning modeling, it is imperative to do many assessments to verify that the developed models (EC and TDS) are reliable and function successfully on a variety of data configurations. The higher efficacy on an established set of data, which includes the training, validation, and testing phases, does not ensure a model's superiority. A sensitivity analysis was presented in numerous studies and is used in the current work to assess that the model works well and is not just a relationship of inputs and outputs. The total data is subjected to a sensitivity analysis to get a deeper interpretation of the relevance of the independent variables on the projected water quality indicators. The sensitivity "R sen " for a given input variables "I j " are computed as Equation (11) [79,80].
In the above equation, "P j " are the forecasted outcome of the established predictive model (EC or TDS), and "m" represents the number of the instances/readings (m = 360). The R sen score typically runs from zero to one, indicating the intensity of the relationship between a single input and the projected outcome (EC or TDS). When the R sen score approaches one, the particular input has a greater effect upon a certain forecasted output. Figure 9 represents the sensitivity (R sen ) of the inputs for the prediction of EC and TDS. All the eight variables considered in the current research influence the prediction of the water quality parameters (EC and TDS), utilizing the suggested models with distinct effects (all above 0.5). The sensitivity of the chlorides, sulphates, and carbonates concentration remains high for both water quality parameters, suggesting that all three of these inputs are the most relevant attributes influencing the forecasting outcomes [35]. However, the least contributing input is the sodium with a sensitiveness equal to 0.53 and 0.51 for EC and TDS, respectively.

Recommendations and Suggestions
The current study concentrated on a case-based analysis of the upper Indus Ri basin. The results of this analysis provide a helpful and comprehensive knowledge of w ter-related challenges which can be applied for the assessment of the groundwater qua and performance evaluation of machine learning techniques. The results revealed t predictions based on evolutionary algorithms (i.e., MEP) are robust and can be utili confidently for water quality concerns, which will be useful for government, practition designers, and policy makers in order to save the available limited amount of water sources, thus conserving the environment. Furthermore, it is highly suggested that ad tional studies be undertaken employing other AI methodologies including ensemble s ulations to better develop their performance with least hyper-parameters tunning volved in the estimation of water quality metrics. Additional water attributes should a be included when evaluating the predictive power of the aforementioned methodolog The spatio-temporal assessment will provide a complete perspective and pinpoint mechanisms causing water quality degradation.

Conclusions
The contamination of water is a big issue that directly threatens both human hea and the environment, and also poses a serious issue for agricultural productivity. T study proposed evolutionary algorithm-(EA) based multi-expression programm (MEP) models for the prediction of specific conductivity (EC) and total dissolved sol (TDS). A bigger water quality dataset of 360 readings collected on a monthly basis w used in the modeling process. The eight most influential water quality input variab were selected, i.e., water temperature (°C), magnesium (Mg), calcium (Ca), sodium (N sulphate (SO4), chloride (Cl), pH, and bicarbonates (HCO3). The accuracy, reliability a generalization of the established models were evaluated using various well-known of tistical measures, i.e., slope and coefficient of determination (R 2 ), mean-absolute-pres error (MAPE), mean-absolute-error (MAE), root-mean-square-logarithmic er (RMSLE), and root-mean-square error (RMSE). The performance of the models was co pared with traditional multiple non-linear regression (NLRM) models. The regression sults of EC-MEP and TDS-MEP showed excellent accuracy with coefficient of regress (R 2 ) and slope above 0.95 in the testing phase on unseen data. Also, the error statistics

Input variables
Specific conductivity (EC) Total dissolved solids (TDS) Figure 9. Contribution of input variables for the prediction of EC and TDS.

Recommendations and Suggestions
The current study concentrated on a case-based analysis of the upper Indus River basin. The results of this analysis provide a helpful and comprehensive knowledge of water-related challenges which can be applied for the assessment of the groundwater quality and performance evaluation of machine learning techniques. The results revealed that predictions based on evolutionary algorithms (i.e., MEP) are robust and can be utilized confidently for water quality concerns, which will be useful for government, practitioners, designers, and policy makers in order to save the available limited amount of water resources, thus conserving the environment. Furthermore, it is highly suggested that additional studies be undertaken employing other AI methodologies including ensemble simulations to better develop their performance with least hyper-parameters tunning involved in the estimation of water quality metrics. Additional water attributes should also be included when evaluating the predictive power of the aforementioned methodologies. The spatio-temporal assessment will provide a complete perspective and pinpoint the mechanisms causing water quality degradation.

Conclusions
The contamination of water is a big issue that directly threatens both human health and the environment, and also poses a serious issue for agricultural productivity. This study proposed evolutionary algorithm-(EA) based multi-expression programming (MEP) models for the prediction of specific conductivity (EC) and total dissolved solids (TDS). A bigger water quality dataset of 360 readings collected on a monthly basis was used in the modeling process. The eight most influential water quality input variables were selected, i.e., water temperature ( • C), magnesium (Mg), calcium (Ca), sodium (Na), sulphate (SO 4 ), chloride (Cl), pH, and bicarbonates (HCO 3 ). The accuracy, reliability and generalization of the established models were evaluated using various well-known of statistical measures, i.e., slope and coefficient of determination (R 2 ), mean-absolute-present error (MAPE), meanabsolute-error (MAE), root-mean-square-logarithmic error (RMSLE), and root-mean-square error (RMSE). The performance of the models was compared with traditional multiple nonlinear regression (NLRM) models. The regression results of EC-MEP and TDS-MEP showed excellent accuracy with coefficient of regression (R 2 ) and slope above 0.95 in the testing phase on unseen data. Also, the error statistics are minimum, showing the generalized and reliable performance. The projected (RMSE and MAE) in EC prediction were (18.54 µS/cm and 12.36 µS/cm), (17.19 µS/cm and 12.14 µS/cm) and (16.43 uS/cm and 11.22 µS/cm) for training, validation and testing sets, respectively, and for the TDS modeling they were (13.36 ppm and 9.75 ppm), (13.33 ppm and 9.80 ppm) and (11.36 ppm and 8.27 ppm), respectively. The RMSLE approaches 0, indicating an outburst performance. According to MAPE, the performance of the established models was categorized as "excellent" and thus can be confidently used for future predictions. The predictions of NLRMs show a significant deviation from the targeted results, reflecting the reduced performance statistics that made the reliability of the NLRMs doubtful. However, the MAPE of the NLRMs also falls within acceptable limits i.e., below 50%. In essence, the traditional regression models (i.e., NLRMs) are not useful for the prediction of complex problems because of their inefficiency and least generalization capability. Furthermore, the sensitivity analysis of the developed MEP models revealed that all of the eight variables considered in current research influences the prediction of the water quality parameters (EC and TDS), with distinct effects having a sensitiveness index above 0.5. Thus, the developed EA-based MEP models are not merely the correlations but can be helpful for practitioners and decision-makers that will eventually save the time and money required for monitoring water quality parameters.