Assessment of Surface Water Quality Using Water Quality Index and Discriminant Analysis Method

Abstract: Given the complexity of water quality data sets, water resources pose a significant global public problem in terms of water quality protection and management. In this study, surface water quality for drinking and irrigation purposes was evaluated by calculating the Water Quality Index (WQI) and the Irrigation Water Quality Index (IWQI) based on nine hydrochemical parameters. The discriminant analysis (DA) method was used to identify the variables most responsible for spatial differentiation. According to the WQI values, the surface water is of poor to very poor quality for drinking; the IWQI values, however, indicate that the water is acceptable for irrigation, with restrictions for salinity-sensitive plants. The discriminant analysis identified pH, potassium, chloride, sulfate, and bicarbonate as the significant parameters that discriminate between the stations and contribute to the spatial variation of surface water quality. The findings of this study provide valuable information for decision-makers addressing the important problem of water quality management and protection.


Introduction
Water is essential for the survival of all living organisms and plays a vital role in the growth and development of life on Earth [1][2][3][4][5][6]. Therefore, it is imperative to take immediate action to preserve water resources through effective management and monitoring of water quality [7][8][9][10][11]. The usage of water for domestic purposes accounts for only 5%, with 20% being utilized by the industrial sector, leaving 75% for agricultural use [12,13]. It is crucial to implement a comprehensive program for water resource development that places emphasis on water quality management and monitoring.
It is crucial to evaluate the quality of water before it is used for human consumption [14][15][16]. The pollution of surface waters is a direct result of industrial development, as well as the excessive and uncontrolled use of fertilizers by man, in addition to natural processes such as the dissolution of rocks through water-rock interaction [17]. The assessment of water quality can be conducted through the use of the water quality index (WQI) model, which primarily depends on the physicochemical parameters of water, to monitor and control the quality of water intended for human consumption [18][19][20][21][22].
Studying the spatio-temporal variation of surface waters is complex, and the use of outdated methods can lead to suboptimal model performance and redundant results. In this study, factor analysis (FA), discriminant analysis (DA), cluster analysis (CA), and principal component analysis (PCA) were used to evaluate the quality of surface waters, to reduce the number of monitoring results, to identify measurement stations of comparable quality, and to separate out measurement stations where the water quality deviates from that of the other stations [23,24]. The objectives of this study are to understand the current state of water quality, to determine the suitability of surface waters for human consumption, to provide scientific recommendations for water resource managers and, finally, to identify the most influential parameters and those responsible for the spatial differentiation of water quality using the discriminant analysis (DA) method.

Study Area
The Koudiat Medouar watershed is located in northeastern Algeria, in the Aurès region, and is part of the vast watershed of the Constantine Highlands, which has a total area of 9500 km². It is surrounded by large basins on four sides: Kebir Rhumel and Seybouse to the north, Medjerda to the east, Chot Hodna and Melghir to the south, and Summam to the west. It has a land area of 590 km² and is situated at latitude 35°30′57″ North and longitude 6°30′48″ East (Figure 1). The Koudiat Medouar watershed contributes significantly to the surface water mobilization and transfer system in the high plains of Constantine and the Aurès. The dam is fed by the Oued Chemorah basin, as well as by sub-basins, wastewater, and industrial water from neighboring municipalities. Due to recent droughts in the region, authorities have also utilized water from the Beni Haroun dam, which has a capacity of 74 million cubic meters. The region's climate is semi-arid, with a cold and humid winter and a hot and dry summer. This semi-arid character results in a more or less marked drought, with a very high potential evaporation rate estimated at 1747.34 mm per year. The average annual rainfall and temperature are around 370 mm and 15 °C, respectively [25].

Sampling of Surface Water
The following sampling stations were selected according to the degree of pollution in the research area: Dam station, Oued Reboua station, Oued Timgad station, and Wastewater station. The surface water samples were collected over a two-year period (2019 to 2020), with two samples taken at the beginning and end of each month, in accordance with conventional sampling protocols [26] and with the guidelines on consumption and irrigation from the International Organization for Standardization (ISO) [27], the Bureau of International Standards (BIS) [28], and the Food and Agriculture Organization (FAO) [29]. The samples were placed in polyethylene bottles, which were then transported in coolers to the analysis laboratory. Nitric acid (HNO₃) was added to the samples intended for cation analysis to achieve a pH of 2. The ion balance was computed with errors of less than 10%. Table 1 summarizes the findings.

Water Quality Index (WQI)
In this study, two indices were used to evaluate the quality of surface water intended for consumption and irrigation. The first is the Water Quality Index (WQI), which is classified into five suitability classes (Table 2). The second is the Irrigation Water Quality Index (IWQI), which is classified into four suitability classes (Table 3). Both indices follow the parameters proposed by the World Health Organization (WHO).
In the present research, the WQI was calculated using seven hydrochemical parameters: Ca, Mg, Na, K, Cl, SO₄, and HCO₃. The calculation involved three steps.
Step one: each hydrochemical parameter was allocated a weight based on its relative importance to the overall quality of drinking and irrigation water.
The parameters Ca and HCO₃ have a significant influence on the quality of surface waters, and hence the greatest weight of 5 was allocated to them. In contrast, the parameter K, which is considered non-harmful, was assigned the lowest weight of 2 (Table 4).
Step two: the relative weight ($W_i$) of each hydrochemical parameter is calculated using Equation (1):

$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \qquad (1)$$

where $W_i$ is the relative weight, $w_i$ is the weight of each parameter, and $n$ is the number of parameters.
Step three: for each hydrochemical parameter, the quality rating ($q_i$) is computed by dividing its concentration in each water sample by its respective standard [29,30] and then multiplying the result by 100, in accordance with Equation (2):

$$q_i = \frac{C_i}{S_i} \times 100 \qquad (2)$$

where $q_i$ is the quality rating, $C_i$ is the concentration of each hydrochemical parameter in each water sample, and $S_i$ is the standard concentration of that parameter, either for drinking water or for irrigation purposes, as proposed by different organizations [29,30]. Finally, the Water Quality Index is calculated by the following equation:

$$\mathrm{WQI}\ (\text{or}\ \mathrm{IWQI}) = \sum_{i=1}^{n} W_i \, q_i$$

where WQI and IWQI are the quality indexes used to evaluate the quality of drinking water and irrigation water, respectively, $W_i$ denotes the relative weight of each hydrochemical parameter, and $q_i$ refers to its quality rating.
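The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: only the weights for Ca, HCO₃ (5) and K (2) come from the text, while the remaining weights, the standard values, and the sample concentrations are placeholder assumptions. The function names are hypothetical.

```python
# Weights w_i: Ca = 5, HCO3 = 5 and K = 2 follow the text; the other
# weights are placeholders chosen only for illustration.
WEIGHTS = {"Ca": 5, "Mg": 3, "Na": 3, "K": 2, "Cl": 4, "SO4": 4, "HCO3": 5}

# Standards S_i in mg/L: placeholder values, not taken from the paper.
STANDARDS = {"Ca": 75, "Mg": 50, "Na": 200, "K": 12,
             "Cl": 250, "SO4": 250, "HCO3": 120}


def relative_weights(weights):
    """Step two: W_i = w_i / sum(w_i)."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def quality_ratings(sample, standards):
    """Step three: q_i = C_i / S_i * 100 for each parameter in the sample."""
    return {name: sample[name] / standards[name] * 100 for name in sample}


def water_quality_index(sample, weights, standards):
    """Final index: sum of W_i * q_i over all parameters in the sample."""
    rel = relative_weights(weights)
    q = quality_ratings(sample, standards)
    return sum(rel[name] * q[name] for name in sample)


# Hypothetical sample concentrations in mg/L (not measured data).
sample = {"Ca": 110, "Mg": 40, "Na": 90, "K": 8,
          "Cl": 150, "SO4": 180, "HCO3": 300}
print(round(water_quality_index(sample, WEIGHTS, STANDARDS), 2))
```

A sample whose concentrations all equal their standards yields an index of 100, which is a convenient sanity check when the weights or standards are changed.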

National Sanitation Foundation Water Quality Index
Seven parameters, calcium (Ca²⁺), magnesium (Mg²⁺), potassium (K⁺), sodium (Na⁺), chloride (Cl⁻), sulfate (SO₄²⁻), and bicarbonate (HCO₃⁻), were used to calculate the NSFWQI. On the weighting curves, the water quality level varies from 0 (very poor) to 100 (excellent). The value of Q for each parameter is determined by comparing its measured value with the corresponding weighting curve, and Q is then multiplied by a weighting factor ($W_i$) that reflects the importance of the parameter in the calculation [31].

Discriminant Analysis (DA)
Discriminant analysis builds linear combinations of two or more independent metric variables in order to separate predefined groups. The discriminant function is calculated as follows:

$$f(G_i) = K_i + \sum_{j=1}^{n} W_j \, P_j$$

where $i$ indexes the groups or stations ($G$); in the current study, four stations (groups) were selected. $K_i$ is the constant inherent to each group, $n$ is the number of variables used to classify a set of data into a given group, $W_j$ is the weight coefficient assigned by DA to a selected parameter $P_j$, and $P_j$ is the analytical value of the selected variable. In this study, the independent variables were the hydrochemical parameters measured at the four stations. All statistical and multivariate data analyses, as well as visualization, were performed using R version 4.1.3 and SPSS version 28.0.1.
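To make the role of the constant and the weight coefficients concrete, the sketch below evaluates a discriminant function of this form for each station and assigns a sample to the station with the highest score. It assumes that each station has its own constant and coefficient set, as is usual for classification functions. All coefficients, constants, and sample values are invented for illustration and are not the values fitted in this study.

```python
def discriminant_score(constant, coefficients, sample):
    """Evaluate K_i + sum_j W_j * P_j for one group."""
    return constant + sum(coefficients[name] * sample[name]
                          for name in coefficients)


def classify(sample, functions):
    """Return the group with the highest score, plus all scores.

    `functions` maps a group name to a (constant, coefficients) pair.
    """
    scores = {group: discriminant_score(constant, coefficients, sample)
              for group, (constant, coefficients) in functions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical functions for two of the four stations (illustration only).
functions = {
    "Dam": (-10.0, {"pH": 1.0, "HCO3": 0.01}),
    "Wastewater": (-20.0, {"pH": 1.0, "HCO3": 0.03}),
}

# Hypothetical sample: pH (unitless) and HCO3 in mg/L.
station, scores = classify({"pH": 8.0, "HCO3": 750.0}, functions)
print(station, scores)
```

In practice the constants and coefficients would come from the fitted model (for example, the classification function table reported by the statistics package) rather than being written by hand.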

Descriptive Statistics
The descriptive statistics of surface water hydrochemical parameters offer an overview of the hydrochemical characteristics of a region. Table 5 and Figure 2 provide statistics for each station's surface water hydrochemical parameters. The pH of the surface water samples ranged from 6.8 to 8.6, suggesting that the surface water in the research region was mainly neutral or slightly alkaline [32,33]. Electrical conductivity (EC) values for the Dam, Oued Reboua, Oued Timgad, and Wastewater stations were 1349, 1010, 1326, and 1173 µS/cm, respectively. The coefficient of variation (in percent) of electrical conductivity was 22.42, 25.55, 8.81, and 15.82 for the Dam, Oued Reboua, Oued Timgad, and Wastewater stations, respectively, indicating that the temporal variability of electrical conductivity was moderate at the Dam, Oued Reboua, and Wastewater stations and low at the Oued Timgad station. As seen in Figure 2, the average cation concentrations at the four stations followed the order Ca > Na > Mg > K. At all stations, calcium was the predominant cation, followed by sodium. The Dam, Oued Reboua, Oued Timgad, and Wastewater stations had Ca concentrations of 94.04, 120.24, 116.93, and 107.84 mg/L, respectively. The Ca content in all surface water samples exceeded the WHO (2006) permissible limit [34]. The anion order was HCO₃ > SO₄ > Cl at the Oued Reboua, Oued Timgad, and Wastewater stations and HCO₃ > Cl > SO₄ at the Dam station. Bicarbonate was thus the most prevalent anion at all four stations, followed by sulfate at Oued Reboua, Oued Timgad, and Wastewater and by chloride at the Dam station. The mean concentration of HCO₃ was 173.21, 307.28, 500.51, and 758.38 mg/L at the four stations, respectively. The highest values of HCO₃ were observed at the Wastewater and Oued Timgad stations.
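The coefficient of variation used above to describe the temporal variability of EC is the standard deviation expressed as a percentage of the mean. A minimal sketch follows; it assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and uses made-up EC readings rather than the measured series.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the coefficient of variation in percent.

    Uses the sample standard deviation (n - 1 denominator); this choice
    is an assumption, since the paper does not state which form was used.
    """
    return stdev(values) / mean(values) * 100


# Hypothetical EC readings in µS/cm for one station (not measured data).
ec_readings = [1280, 1350, 1410, 1195, 1330, 1500]
print(round(coefficient_of_variation(ec_readings), 2))
```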

Evaluation of Surface Water Quality
This research used the water quality indices for drinking (WQI) and irrigation (IWQI) to assess surface water quality in terms of the analyzed parameters. Table 5 displays the summary statistics for each station's water quality index. Boxplots of WQI and IWQI for each station are shown in Figure 3.
According to Figure 3, all surface water samples collected at the Dam station were of poor quality. A total of 75% of the water samples collected at the Oued Reboua station were judged to be of poor quality. However, almost all water samples from the Oued Timgad and Wastewater stations were determined to be of very poor quality.
The Water Quality Indices for irrigation (IWQI) reveal the level of pollution based on the restriction standards established for each class. According to Table 4, zone one (I) is the unrestricted class, zone two (II) expresses the class requiring restriction, zone three (III) indicates the moderate restriction class, and zone four (IV) expresses the severe restriction class [35].
The NSFWQI values presented in Table 6 show that all the samples were classified as medium (NSFWQI = 50-70), except for the samples from the Dam station, which were classified as good (NSFWQI = 72.82); this increase is due to the decrease in potassium (K⁺) and chloride (Cl⁻) concentrations [31].

Correlation Analysis
Correlation analysis was used to identify the origins of the hydrochemical formation process and to establish the degree of association between elements [36], as well as to show the various correlations between the physico-chemical parameters evaluated in the study region.
The correlation matrix between the hydrochemical parameters, WQI, and IWQI for each station is presented in Figure 4. It was created using Pearson's correlation analysis and plotted with the corrplot package in R [37].
Figure 4 demonstrates that EC had a negative correlation with Mg, HCO₃, SO₄, and IWQI, indicating that ions are prevalent in the research region. HCO₃, Cl, and IWQI all correlate positively with WQI. IWQI demonstrated a statistically significant positive correlation with Mg, HCO₃, Cl, and SO₄. In addition, a substantial positive correlation was identified between Mg and HCO₃ and between SO₄ and K.

Oued Reboua Station
Ca had a substantial positive correlation with SO₄, suggesting that both parameters may have the same source. In addition, a substantial association was found between IWQI and K (0.99). Figure 4 shows that IWQI, Cl, SO₄, Mg, Na, and K exhibited a substantial positive correlation with WQI, suggesting that these hydrochemical parameters have a considerable impact on water quality. IWQI was positively correlated with Cl, SO₄, Mg, and Na. There was a substantial correlation between potassium (K) and pH, Cl, SO₄, and Na. Figure 4 reveals that bicarbonate had a strong correlation with WQI and IWQI, showing that HCO₃ plays a significant role in influencing the surface water quality in the research region. Sodium (Na) correlated positively with EC, Cl, Mg, and K, but negatively with pH and Ca. In addition, a high positive correlation between Ca and Mg suggests that these parameters may have the same source.

Wastewater Station
The significant correlations can be explained by the dissolution of the minerals that make up the rocks; by ion exchange, which generates precipitation; by human activity; and by the application of chemical fertilizers, which pollute surface waters via runoff and leaching from agricultural land [32,33].
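The Pearson coefficients discussed in this section were obtained in R; for readers who prefer to see the calculation spelled out, the following Python sketch computes the same statistic and assembles a small correlation matrix. The function names and the concentration series are illustrative assumptions, not the study's data or code.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of named series."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Hypothetical monthly concentrations in mg/L (not measured data).
data = {
    "Ca": [94, 101, 120, 115, 108, 99],
    "SO4": [150, 162, 190, 181, 170, 158],
    "K": [6.1, 5.8, 7.2, 6.9, 6.0, 6.4],
}
matrix = correlation_matrix(data)
print(round(matrix["Ca"]["SO4"], 3))
```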

Spatial Variation in Surface Water Quality
The spatial variation in surface water quality was analyzed using discriminant analysis (DA) conducted on the raw dataset consisting of 11 variables (pH, EC, Ca, Mg, Na, K, Cl, SO₄, HCO₃, WQI, and IWQI) grouped into four stations. The stepwise mode of DA was employed, in which, at each step, the one variable that minimized the overall Wilks' Lambda statistic was added or removed. The aim was to identify the most crucial variables contributing to the differences among the stations. The independent variables were the main hydrochemical parameters that were measured, and the dependent variable was the station, with four groups.
Since the p-values are less than the 0.01 level of significance, the Wilks' Lambda statistic for the equality of group means is significant. The F-test for Wilks' Lambda was used to identify the factors that contributed substantially to station differentiation. Table 7 displays the ANOVA findings. The F-test is significant for eight variables (pH, Ca, Mg, Cl, SO₄, HCO₃, WQI, and IWQI) out of eleven at p-level < 0.05. The three variables electrical conductivity (EC), sodium (Na), and potassium (K), for which the significance (Sig.) is higher than 0.01, should be eliminated from the model. Table 8 provides an overview of the canonical discriminant functions. Because four stations were analyzed, the discriminant analysis produced three discriminant functions (DFs) and therefore three eigenvalues.
The first discriminant function has the greatest eigenvalue, suggesting that it has the strongest discriminating power of the three. From Table 8, we can see that the first function explained 77.2% of the total variance between stations, whereas the other two functions, which also combine the studied parameters, each indicate a dispersion of less than 20%. The canonical correlation coefficient of the first function (0.971), which measures the association between the discriminant scores and the grouping variable, corresponds to 94.3%, or (0.971)², of the total variance.
Table 9 displays the Wilks' Lambda and Chi-square statistics for each discriminant function (DF), which ranged from 0.007 to 0.64 and from 13.859 to 156.66, respectively. The Wilks' Lambda statistic indicated that all three functions were statistically significant, since their p-values were less than 0.01, showing that the spatial discriminant analysis was effective and significant.
The discriminating variables considered are expressed in various units of measurement [38]. The coefficients of the discriminant function are used to determine the discriminant score for each case. Given that the first function has the largest discriminating power, it is natural to focus on its output. The findings of the first discriminant function demonstrate that bicarbonate (HCO₃) was an important hydrochemical parameter for discriminating the spatial variation in surface water quality across the four stations. Table 10 presents the normalized coefficients of the discriminant function. The structure matrix coefficient represents the relationship between each predictor variable and the discriminant function.
The structure coefficient of bicarbonate is high, showing a strong correlation with the first discriminant function; chloride and pH are strongly correlated with the second discriminant function, whereas sulfate and potassium are well correlated with the third. The values of the calculated structure coefficients are shown in Table 11. The first discriminant function clearly illustrates the spatial variation between the four sites, as seen in Table 12.
Tables 13 and 14 show the resulting classification matrices (CMs) and classification functions (CFs) from the discriminant analysis. From the findings, we can observe that the DA correctly classified 97.3% of the originally grouped cases. According to Tables 13 and 14, the DA correctly classified 94.6% of the initially grouped cases in the classification matrix. Based on the findings of the discriminant analysis, the most significant discriminant variables across the four geographical stations (Table 13) were pH, potassium, chloride, sulfate, and bicarbonate, which account for the majority of the expected surface water quality variations.
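The quantities reported for the canonical discriminant functions are linked by standard relations: the share of variance of a function is its eigenvalue divided by the sum of all eigenvalues, the canonical correlation is sqrt(λ / (1 + λ)), and the overall Wilks' Lambda is the product of 1 / (1 + λ) over the functions. The sketch below illustrates these relations. The only value taken from the text is the canonical correlation of 0.971; the eigenvalues in the example list are hypothetical and are not those of Table 8.

```python
import math


def variance_shares(eigenvalues):
    """Percentage of between-group variance explained by each function."""
    total = sum(eigenvalues)
    return [value / total * 100 for value in eigenvalues]


def canonical_correlation(eigenvalue):
    """Canonical correlation implied by an eigenvalue."""
    return math.sqrt(eigenvalue / (1 + eigenvalue))


def eigenvalue_from_correlation(correlation):
    """Inverse relation: eigenvalue implied by a canonical correlation."""
    return correlation ** 2 / (1 - correlation ** 2)


def wilks_lambda(eigenvalues):
    """Overall Wilks' Lambda for a set of discriminant functions."""
    result = 1.0
    for value in eigenvalues:
        result *= 1 / (1 + value)
    return result


# Canonical correlation of the first function, as reported in the text.
r1 = 0.971
print(round(r1 ** 2 * 100, 1))                  # squared correlation in percent
print(round(eigenvalue_from_correlation(r1), 3))

# Hypothetical eigenvalues for three functions (illustration only).
example = [12.0, 3.0, 0.5]
print([round(share, 1) for share in variance_shares(example)])
print(round(wilks_lambda(example), 4))
```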

Conclusions
In this work, physico-chemical parameters (pH, EC, Ca, Mg, Na, K, Cl, SO₄, and HCO₃) were used to calculate the surface water quality indexes for potability (WQI) and irrigation (IWQI). All samples were collected monthly at four sites throughout 2019 and 2020.
Discriminant analysis (DA) is a crucial technique for producing a thorough interpretation of the acquired data, allowing managers to use them for improved management of surface water pollution and water resources.
Based on the WQI, the results show that the water samples from the four stations are of poor and very poor quality for potability. According to the IWQI, the water of the research area is adequate for irrigation, with restrictions for salinity-sensitive plants. The NSFWQI reflects a water quality similar to that indicated by the IWQI: the samples are classified as fair water, except for those from the Dam station, which are classified as good-quality water.
The spatial variation in surface water quality across the four sites was examined using discriminant analysis (DA). The analysis identified pH, potassium, chloride, sulfate, and bicarbonate as the discriminating parameters. These parameters are the most relevant at the four sites and account for the majority of the predicted changes in surface water quality.