Combining Water Quality Indices and Multivariate Modeling to Assess Surface Water Quality in the Northern Nile Delta, Egypt

Assessing surface water quality for drinking use in developing countries is important since water quality is a fundamental aspect of surface water management. This study aims to improve surface water quality assessments and their controlling mechanisms using the drinking water quality index (DWQI) and four pollution indices (PIs), which are supported by multivariate statistical analyses, such as principal component analysis, partial least squares regression (PLSR), and stepwise multiple linear regression (SMLR). Twenty-two physicochemical parameters were analyzed using standard analytical methods for 55 surface water sites in the northern Nile Delta, Egypt. The DWQI results indicated that 33% of the tested samples represented good water, and 67% of samples indicated poor to unsuitable water for drinking use. The PI results revealed that surface water samples were strongly affected by Pb and Mn and were slightly affected by Fe and Cr. The SMLR models of the DWQI and PIs, which were based on all major ions and heavy metals, provided the best estimations with R² = 1 for the DWQI and PIs. In conclusion, integration between the DWQI and PIs is a valuable and applicable approach for the assessment of surface water quality, and the PLSR and SMLR models can be used through applications of chemometric techniques to evaluate the DWQI and PIs.


Introduction
Surface water is an essential natural resource and a sensitive issue for human life in developing countries. The pollution of freshwater resources by heavy metals has become one of the main environmental concerns in recent decades and is due to natural contamination processes and human practices, which have significantly degraded surface water quality and have led to serious health hazards for drinking use with local and regional implications [1][2][3][4]. The water quality in any region is under stress from both natural processes and anthropogenic activities, as well as from the transportation of nutrients and heavy metals to surface waters [5][6][7].
The present study area is one of the most important developed regions in the northern Nile Delta (Egypt) with many reclamation projects and industrial activities and a large population density; thus, the surface water network in the study area is considered an important water resource for drinking and many other purposes. The surface water network in the study area receives substantial quantities of pollutants from agricultural areas, sewage, and industrial activities [8]. For example, El-Bouraie et al. 2010 [9] studied the distribution of heavy metals in surface water, such as Al, Ba, Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb, and Zn and their impacts on the water quality. Significant local water contamination issues were found due to the increasing swept-out effluents along different drains into the River Nile and extensive use of water.
In addition, the concentrations of the trace elements in the study area showed Fe to be the most abundant element in all water points, followed by Mn, Cr, Pb, Al, Cu, Ba, Ni, Cd, Co, and Zn [10,11]. Agricultural and industrial activities are mainly responsible for the elevated levels of the measured elements in river water. Other sources of heavy metal pollution in freshwater include atmospheric deposition, natural geologic deposits, manufacturing processes related to metals, and discharges of municipal waste.
Assessing surface water quality for drinking use is determined through the drinking water quality index (DWQI), which presents a useful interpretation of water quality for drinking [12]. The DWQI is a powerful approach for creating simple and easily understandable monitoring tools that reveal the cumulative influences of different physicochemical parameters, is based on the weight and rate of each parameter, and expresses water quality. Physicochemical parameters provide a useful interpretation for evaluating trends, identifying specific environmental issues, and communicating information on water quality and water vulnerability to pollution [13][14][15].
A single water quality parameter by itself is not appropriate for evaluating water quality because it may be limiting and may produce insufficient performance; therefore, some documented pollution indices (PIs) have defined indices for B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn contents and include the heavy metal pollution index (HPI), heavy metal evaluation index (HEI), contamination index (CD), and pollution index (PI), which can be used to understand the current status of surface water hydrochemistry and evaluate water suitability for drinking purposes. Surface water sources are vulnerable to impacts from human activities that may contribute to potential ecosystem destruction, so water pollution indices, including the HPI, HEI, CD, and PI, are useful approaches for surface water quality assessment and reflect overall water quality by considering the cumulative effects of heavy metals [16][17][18]. Water pollution indices are deemed a cost-effective means of preserving safety by establishing a control scheme for assessing the development, growth, urban production, and direction of human activities to reduce detrimental impacts on water quality resources. Many studies on water pollution and water quality monitoring have used documented pollution indices for heavy metals and include works by AbouZakhem and Hafez 2015 [19], Balakrishna and Ramu 2016 [20], El Fehri et al. 2014 [21], Gad and El-Hattab 2019 [1], and Sobhanardakani et al. 2017 [22].
The HPI is an effective method for rating the combined effect of individual heavy metals on overall water quality and the perception of surface water suitability for human use [23]. In addition, the HEI also considers the possible additive impact of heavy metals, which enables rapid assessment of overall drinking water quality [24,25]. The CD independently tests the relative toxicity of specific metals and reflects the cumulative effects of all metals on water quality, and the PI measures the levels of contamination effects on water quality with respect to individual heavy metals [26]. Therefore, water quality is assessed by measuring the degree of heavy metal exposure as an integration of the individual contamination parameters through the cumulative effects of heavy metals that are deemed hazardous for human consumption [18,22,27]. Although the DWQI and PIs are useful for water quality evaluation, multivariate statistical analyses are also broadly used to assess water quality, such as principal component analysis (PCA), which is widely used in hydrochemical and hydrogeological studies [28][29][30][31]. Therefore, PCA is a multivariate technique to identify significant heavy metals and the interrelationships between those metals and to understand the main factors influencing the distribution of those metals in surface water resources [3,32].
Integration of the DWQI and PIs through machine learning models is a valuable and applicable approach for assessing surface water quality. Such models are essential for describing the status of surface water quality and its controlling mechanisms to policymakers, which helps in selecting appropriate treatment techniques to address issues of concern [33,34]. Calculating the DWQI and PIs involves many calculation steps that require significant time and effort to transform large amounts of water characterization data into a single value reflecting the overall water quality level [17,26,35,36]. Partial least squares regression (PLSR) and stepwise multiple linear regression (SMLR) could be used to overcome this problem since they are typical methods that specify a linear relationship between a set of independent and response variables [37][38][39][40][41][42][43]. To the best of our knowledge, there is very little information available on comparative assessments of the performance of PLSR and SMLR models for predicting the DWQI and PIs.
Therefore, the objectives of this work were to (i) investigate water facies, heavy metals, and geochemical processes by using physicochemical parameters; (ii) evaluate the suitability of the surface water for drinking purposes using the DWQI; (iii) evaluate surface water vulnerability to contamination using PIs, such as the HPI, HEI, CD, and PI; (iv) evaluate the performance of PLSR models as rapid methods based on major ions and heavy metals to predict the DWQI and PIs; and (v) evaluate stepwise multiple linear regression analyses based on the most influential major ions and heavy metals to estimate the DWQI and predict PIs.

Study Area
The study area is located in the northern Nile Delta (Egypt), between 30°59′38″ and 31°36′00″ N latitude and between 30°21′40″ and 31°18′40″ E longitude (Figure 1). As the downstream end of the delta, it is considered the end point for the disposal of all pollutants (agricultural and industrial). This area is bounded by the Mediterranean Sea to the north, with a coastline of approximately 100 km, by the Rosetta Nile Branch to the west, by the Damietta Nile Branch to the east, and by the Al Gharbiya Governorate to the south. According to the Central Agency for Public Mobilization and Statistics (CAPMAS) 2012 [44], the study region has a total population of approximately 2.9 million people, which represents approximately 3.6% of the total population of Egypt.

Sampling and Analyses
Fifty-five water samples were collected from the surface water network in the studied area during summer 2019, and the geographical location of each sampling site was recorded in Universal Transverse Mercator (UTM) coordinates using a handheld MAGELLAN GPS 315 (Figure 1). Two sets of surface water samples were collected from each sampling location in 500 mL polyethylene bottles and were filtered through 0.45 µm Whatman filter paper. For trace element analysis, the first set was acidified using nitric acid to pH < 2. The other set was used to measure the rest of the physicochemical parameters. All the water samples were stored in a refrigerator at 4 °C. In this study, 22 physicochemical parameters, including water temperature, pH, total dissolved solids (TDS), electrical conductivity (EC), K⁺, Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, NO₃⁻, alkalinity, B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn, were analyzed using standard analytical techniques [45]; the results are presented in Table 1 and Table S1. Alkalinity was attributed to HCO₃⁻ since CO₃²⁻ was below the detection limit. Water temperature, TDS, pH, and EC were measured in situ using a portable calibrated salinity multi-parameter instrument (Hanna HI 9811-5, Hanna Instruments Italia Srl, 35030 Sarmeola di Rubano-PD, Italy). Alkalinity and Cl⁻ were analyzed by volumetric titration, while K⁺, Na⁺, Ca²⁺, Mg²⁺, Cd, Cr, Cu, Fe, Mn, Ni, Pb, and Zn were analyzed using an atomic absorption spectrometer (FAAS-Zeeman AASZ-5000, Hitachi, Japan). In addition, a UV/visible spectrophotometer was used to analyze SO₄²⁻, NO₃⁻, B, and F. These procedures are described in American Public Health Association (APHA) 2012 [45].

Indexing Approach
Water quality indices (WQIs), such as the DWQI, HPI, HEI, CD, and PI, were estimated with respect to heavy metal concentrations. The DWQI is defined by mathematical methods and is considered the most useful index for measuring the overall quality of surface water for drinking use. The DWQI is calculated using the arithmetic weight method according to Equation (1):
$$\mathrm{DWQI} = \sum_{i=1}^{n} Q_i W_i \qquad (1)$$
where $Q_i$ is the sub-quality index of each parameter, $W_i$ is the weight unit of each parameter, and 22 physicochemical parameters, expressed in mg/L (n = 22), were used. The computed value of $Q_i$ depends on the surface water concentration ($C_i$) and standard ($S_i$) for the drinking water value of each surface water parameter according to the World Health Organization (WHO) 2011 [46], as shown in Equation (2):
$$Q_i = \frac{C_i}{S_i} \times 100 \qquad (2)$$
The relative weight $W_i$ is obtained from the assigned weights $w_i$ according to Equation (3):
$$W_i = \frac{w_i}{\sum_{i=1}^{n} w_i} \qquad (3)$$
and $w_i$ for each parameter was calculated according to the recommended standards [46] by Equation (4):
$$w_i = \frac{K}{S_i} \qquad (4)$$
where K is the proportionality constant.
To compute the DWQI, assigning a weight to each surface water parameter ($w_i$) and calculating the relative weight ($W_i$) and quality rating range ($Q_i$) are required. Therefore, $W_i$ values were assigned for pH, EC, TDS, TH, K⁺, Na⁺, Ca²⁺, Mg²⁺, alkalinity, Cl⁻, SO₄²⁻, NO₃⁻, B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn, while $w_i$ was calculated using Equation (4). Weighted values were assigned according to the relative significance of the surface water parameters for drinking water quality and ranged from 1 to 5 [35]. The computed values of the standards, weights ($w_i$), and relative weights ($W_i$) for the surface water parameters are presented in Table 2.
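The weighted arithmetic calculation described above can be sketched in a few lines. This is a minimal illustration, assuming the common forms Q_i = 100·C_i/S_i, W_i = w_i/Σw_i, and DWQI = ΣW_i·Q_i; the function name `dwqi` and all sample values, standards, and weights below are hypothetical placeholders and are not taken from Table 2.

```python
def dwqi(concentrations, standards, weights):
    """Weighted arithmetic drinking water quality index for one sample.

    All three arguments are dicts keyed by parameter name.
    """
    total_weight = sum(weights[p] for p in concentrations)
    index = 0.0
    for param, conc in concentrations.items():
        relative_weight = weights[param] / total_weight   # W_i
        quality_rating = 100.0 * conc / standards[param]  # Q_i
        index += relative_weight * quality_rating
    return index


# Hypothetical three-parameter sample (illustrative numbers only).
sample = {"TDS": 500.0, "Cl": 125.0, "Pb": 0.01}
standards = {"TDS": 1000.0, "Cl": 250.0, "Pb": 0.01}
weights = {"TDS": 5, "Cl": 3, "Pb": 2}
score = dwqi(sample, standards, weights)
```

With these placeholder inputs the two parameters at half their standard contribute a rating of 50 each and the parameter at its standard contributes 100, so the weighted sum is 60.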

Pollution Indices (PIs)
The pollution indices, including the HPI, HEI, CD, and PI, were estimated for the concentrations of heavy metals, such as B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn, according to the equations presented in Table 3.
Table 3. Arithmetic rating method for calculation of heavy metal pollution index (HPI), heavy metal evaluation index (HEI), contamination index (CD), and pollution index (PI). W is weight (1/MAC), S is standard permissible level in ppm, I is highest permissible level in ppm, and MAC is maximum admissible concentration.

Heavy Metal Pollution Index (HPI)
The overall water quality was represented by a toxicological index (HPI) based on rating the arithmetic weights of heavy metals. The HPI values reflect the combined influence of the metals on total water quality [17,22] with respect to the recommended standard guidelines ($S_i$) for each metal, namely, B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn. The HPI values were estimated according to Equation (5):
$$\mathrm{HPI} = \frac{\sum_{i=1}^{n} W_i Q_i}{\sum_{i=1}^{n} W_i} \qquad (5)$$
where $W_i$ and $Q_i$ are the unit weights and the sub-indices for B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn, respectively, and n = 10, which represents the number of heavy metals monitored. The HPI values were classified into three categories, which consisted of low heavy metal pollution (HPI < 100), heavy metal pollution with threshold risk (HPI = 100), and high heavy metal pollution (HPI > 100) [16,47].
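A short sketch of the HPI calculation and its three-class rating follows. It assumes the weighted mean form HPI = ΣW_i·Q_i / ΣW_i with W_i = 1/MAC (as stated in the Table 3 caption) and a sub-index Q_i = 100·|M_i − I_i|/(S_i − I_i); that sub-index form is an assumption, since the table body is not reproduced here. The function names and all metal values are hypothetical.

```python
def hpi(measured, standard, ideal, mac):
    """Heavy metal pollution index as a weighted mean of sub-indices.

    measured, standard, ideal, mac: dicts keyed by metal symbol (same units).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for metal, value in measured.items():
        weight = 1.0 / mac[metal]  # W_i = 1/MAC
        # Assumed sub-index form: deviation from the ideal level, in percent.
        sub_index = 100.0 * abs(value - ideal[metal]) / (standard[metal] - ideal[metal])
        weighted_sum += weight * sub_index
        weight_total += weight
    return weighted_sum / weight_total


def classify_hpi(value, critical=100.0):
    """Three classes from the text, split at the critical value of 100."""
    if value < critical:
        return "low heavy metal pollution"
    if value > critical:
        return "high heavy metal pollution"
    return "heavy metal pollution with threshold risk"


# Hypothetical two-metal sample (illustrative numbers only).
measured = {"Fe": 0.3, "Mn": 0.2}
standard = {"Fe": 0.3, "Mn": 0.1}
ideal = {"Fe": 0.0, "Mn": 0.0}
mac = {"Fe": 0.5, "Mn": 0.25}
hpi_value = hpi(measured, standard, ideal, mac)
hpi_class = classify_hpi(hpi_value)
```

Because the weights are reciprocal concentrations, metals with low admissible limits dominate the index, which is why a single exceedance of a toxic metal can push a sample over the critical value.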

Heavy Metal Evaluation Index (HEI)
Water quality conditions under the stress of heavy metals were represented by the HEI according to Equation (6):
$$\mathrm{HEI} = \sum_{i=1}^{n} \frac{H_{c}}{H_{\max}} \qquad (6)$$
where $H_c$ is heavy metal concentration, $H_{\max}$ is the maximum allowed concentration for each metal, and the subscript i is the i-th sample [36].
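As a minimal sketch, the HEI of a single sample can be computed as the sum of concentration-to-limit ratios. The sketch assumes the sum runs over the monitored metals of one sample; the function name `hei` and the example concentrations and limits are hypothetical.

```python
def hei(concentrations, max_allowed):
    """Heavy metal evaluation index: sum of H_c / H_max over the metals."""
    return sum(concentrations[m] / max_allowed[m] for m in concentrations)


# Hypothetical sample in which both metals are at twice their allowed level.
hei_value = hei({"Pb": 0.02, "Mn": 0.8}, {"Pb": 0.01, "Mn": 0.4})
```

Each metal at exactly its allowed concentration adds 1 to the index, so the result here is 4.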

Contamination Index (CD)
The surface water contamination levels were measured using the contamination factors of individual heavy metals that exceeded permissible limits, which are expressed by CD values [16,26] according to Equations (7) and (8):
$$\mathrm{CD} = \sum_{i=1}^{n} C_{fi} \qquad (7)$$
$$C_{fi} = \frac{C_{Ai}}{C_{Ni}} - 1 \qquad (8)$$
where $C_{fi}$ is the contamination factor for an individual heavy metal, $C_{Ai}$ is the analytical value for each metal, $C_{Ni}$ is the permissible concentration of each metal, and $C_{Ni}$ is taken as MAC.
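The contamination index can be illustrated with a short sketch, assuming the usual form in which each contamination factor is C_Ai/C_Ni − 1 and CD is their sum. The function names and example values are hypothetical.

```python
def contamination_factors(analytical, permissible):
    """Per-metal contamination factor C_fi = C_Ai / C_Ni - 1."""
    return {m: analytical[m] / permissible[m] - 1.0 for m in analytical}


def contamination_index(analytical, permissible):
    """Contamination index CD as the sum of the contamination factors."""
    return sum(contamination_factors(analytical, permissible).values())


# Hypothetical sample: Pb at twice its limit, Zn at half its limit.
analytical = {"Pb": 0.02, "Zn": 1.5}
permissible = {"Pb": 0.01, "Zn": 3.0}
factors = contamination_factors(analytical, permissible)
cd_value = contamination_index(analytical, permissible)
```

A metal below its permissible concentration yields a negative factor, which is how a sample can end up with a negative CD overall.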

Pollution Index (PI)
The effect of pollution on surface water was measured for heavy metals using PI values. These reflect the individual contamination effect of each heavy metal on surface water quality and were categorized into five classes (Table 4) according to Equation (9), where $C_i$ is the concentration of each metal and $S_i$ is the metal level according to the concentration of each metal in water [2,26].
The Piper and Gibbs diagrams [49] were applied using Geochemist's Workbench Student Edition 12.0 software to identify surface water facies, geochemical processes, and the dominant surface water chemistry control factors.
Multivariate statistical analyses are widely used for water quality assessments to improve the identification of effective pollutant factors in surface water by reducing the chemical analysis data into common patterns [28][29][30][31]. PCA was applied to recognize the sources or factors that were responsible for changes in water quality by transforming the original variables into a new set of variables that reflected the influence of major ions and heavy metals on surface water quality. The analytical chemical results of the physicochemical concentrations were processed for PCA using PAST software version 3.25 (Øyvind Hammer, University of Oslo, Oslo, Norway).
Chemometric methods, such as PLSR, are important modeling techniques that can effectively analyze data with many strongly multicollinear and noisy variables. PLSR models were built using Unscrambler X software version 10.2 (CAMO Software AS, Oslo, Norway). PLSR was used to construct predictive models of the DWQI based on the major ions and heavy metals as input parameters and of the PIs with respect to heavy metals. For example, PLSR is a standard calibration method for relating a single dependent variable (e.g., HPI) to multiple independent variables (e.g., B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn). The PLSR tool can construct accurate models even when the number of independent variables significantly exceeds the number of measured traits (e.g., dependent variables) [43,50,51]. The calibration and validation models were constructed through cross-validation of PLSR to minimize overfitting.
An important step in PLSR analysis is to select the optimum number of latent factors (PCs) to represent the calibration data without overfitting. To improve the robustness of the results, 12-fold cross-validation was performed on the data, and the maximum number of latent factors was selected for the DWQI and PIs, as suggested by the software. The accuracies of the calibration (Cal.) and validation (Val.) models were indicated by the adjusted coefficients of determination (R²), root mean square errors (RMSE), and slopes of the linear relationships between the observed and predicted values of the DWQI and PIs. The best model for both Cal. and Val. was chosen based on the minimum RMSE value and the maximum R² and slope values.
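The validation bookkeeping described above can be sketched without any modeling library. The fold assignment below is a simple interleaved split chosen for illustration; it is an assumption and not necessarily how the software used in the study partitions the samples. The metric functions use the standard definitions of RMSE and of R² between observed and predicted values (unadjusted), and the observed and predicted numbers are hypothetical.

```python
import math


def kfold_indices(n_samples, n_folds):
    """Assign sample indices to folds in an interleaved (round-robin) way."""
    return [list(range(start, n_samples, n_folds)) for start in range(n_folds)]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# 55 sites split into 12 folds gives folds of 4 or 5 sites each.
folds = kfold_indices(55, 12)

# Hypothetical observed and predicted index values for one held-out fold.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
fold_rmse = rmse(observed, predicted)
fold_r2 = r_squared(observed, predicted)
```

In a full workflow each fold would be held out once, a model fitted on the remaining folds, and the metrics pooled over all held-out predictions for each candidate number of latent factors.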
SPSS software version 22 was used to construct the stepwise multiple linear regression (SMLR) models. SMLR is a statistical regression method that is used to analyze the relationship between a single response (dependent) variable and two or more independent variables.
The major ions and heavy metals were further analyzed using the SMLR method to identify the most influential parameters that explained the greatest variability in the DWQI and PIs. This approach incorporated forward selection and backward elimination and selected the input variables (e.g., major ions or heavy metals) for the step-by-step regression equation depending on their significance. At each step, only those parameters whose F-statistic remained significant at p ≤ 0.05 were retained in the models. In addition, parameters were discarded at a probability level of 0.01 during backward elimination [52]. The equation for stepwise multiple linear regression models can be represented as:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m$$
where Y = response variable, such as the DWQI and PIs, $\beta_0$ = constant (intercept), $\beta_1$ to $\beta_m$ = coefficients of the control variables of major elements or heavy metals, and $X_1$ to $X_m$ = control variables of major elements or heavy metals.
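The forward-selection half of a stepwise regression can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it enters variables using a fixed partial F threshold (`f_to_enter`, a hypothetical value) instead of the p-value rules used in the statistical package, it omits backward elimination, and the predictor names and data are invented.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_rss(columns, y):
    """Least-squares fit with intercept; returns (coefficients, residual SS)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    return beta, rss


def forward_select(predictors, y, f_to_enter=4.0):
    """Add, one at a time, the predictor with the largest partial F-statistic."""
    selected = []
    _, rss = fit_rss([], y)
    while len(selected) < len(predictors):
        best = None
        for name in predictors:
            if name in selected:
                continue
            cols = [predictors[s] for s in selected + [name]]
            dof = len(y) - len(cols) - 1
            if dof <= 0:
                continue
            _, new_rss = fit_rss(cols, y)
            gain = rss - new_rss
            if gain <= 1e-12:
                f_stat = 0.0            # no measurable improvement
            elif new_rss <= 1e-12:
                f_stat = float("inf")   # perfect fit
            else:
                f_stat = gain / (new_rss / dof)
            if best is None or f_stat > best[1]:
                best = (name, f_stat, new_rss)
        if best is None or best[1] < f_to_enter:
            break
        selected.append(best[0])
        rss = best[2]
    return selected


# Hypothetical data: the response depends only on "Pb" (y = 2 + 3 * Pb).
predictors = {"Pb": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
              "Zn": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0]}
y = [5.0, 8.0, 11.0, 14.0, 17.0, 20.0]
chosen = forward_select(predictors, y)
coefficients, _ = fit_rss([predictors[name] for name in chosen], y)
```

The returned coefficients correspond to the intercept followed by one slope per selected variable, in the order in which the variables entered the model.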

Physicochemical Data
Physicochemical parameters play a decisive role in water quality assessments and are considered a significant reference for understanding the nature of water chemistry and relevant control mechanisms. Statistical descriptions of the physicochemical parameters in the collected surface water network are presented in Table 1. For example, the TDS values of the surface water samples ranged from 260 mg/L to 506 mg/L, with a mean value of 359.7 mg/L, and were associated with EC values that varied from 406.3 to 790.6 µS/cm. The cation and anion concentrations of K⁺, Mg²⁺, Na⁺, Ca²⁺, Cl⁻, SO₄

Geochemical Facies and Controlling Mechanisms
Piper's trilinear diagram was applied with respect to the dominant cations and anions in the surface water samples (Figure 2). According to the chemical composition of the analyzed surface water samples, two water types are present: Ca-Mg-alkalinity and Ca-Mg-Cl-SO₄. In addition, the main processes controlling the surface water geochemistry were identified using the Gibbs diagram by plotting TDS vs. the ratios (Na + K)/(Na + K + Ca) and Cl/(Cl + alkalinity). According to the plot of geochemical data on the Gibbs diagram, the surface water points were scattered in the weathering and rock dominance fields (Figure 3).
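The two Gibbs ratios plotted against TDS can be computed as in the sketch below. It assumes that all ion concentrations of a sample are expressed in the same unit; the function name `gibbs_ratios` and the example concentrations are hypothetical.

```python
def gibbs_ratios(na, k, ca, cl, alkalinity):
    """Return the cation and anion ratios used on the Gibbs diagram."""
    cation_ratio = (na + k) / (na + k + ca)  # (Na + K)/(Na + K + Ca)
    anion_ratio = cl / (cl + alkalinity)     # Cl/(Cl + alkalinity)
    return cation_ratio, anion_ratio


# Hypothetical sample with all concentrations in the same unit.
cation_ratio, anion_ratio = gibbs_ratios(na=30.0, k=10.0, ca=40.0,
                                         cl=50.0, alkalinity=150.0)
```

Both ratios fall between 0 and 1, and each is plotted on the horizontal axis against TDS on a logarithmic vertical axis to locate a sample relative to the dominance fields.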

Water Quality Indices
Statistical descriptions of water quality indices, including the DWQI, HPI, HEI, and CD, are presented in Table 5. The DWQI values ranged from 36.72 to 136.73, with a mean value of 66.99, and the results obtained indicated that 33% of surface water samples fell in the good water category, while 58% of samples were in the poor water category, and 9% of samples were in the very poor to unsuitable water categories for drinking use. The spatial distribution map of DWQIs of the surface water network in the studied area indicated that most of the surface water quality degradation was observed near Burulus Lake at the end of the Rosetta Branch along the trend of the northwestern portion of the Nile Delta (Figure 4a). The HPI values ranged from 26.28 to 222.51, with a mean value of 68.24, which revealed that 87% of samples were lower than the critical HPI value (100), representing low polluted water values, while 13% of samples were above the critical HPI value, representing water highly polluted by heavy metals (Figure 4b). The HEI values of the surface water samples ranged from 1.98 to 17.23, with a mean value of 6.59, and the HEI results indicated that all surface water samples were gradually affected by heavy metals, where 4% of samples were moderately affected, 49% of samples were strongly affected, and 47% of samples were seriously affected by heavy metals. According to the spatial variation map of HEI results, surface water samples were more affected by heavy metals in the central and northwestern parts in the direction of surface water flow from the south to northwest parts of the Nile Delta (Figure 4c). The computed values for CD for the studied surface water samples revealed that the CD values ranged from −8.02 to 7.23, with a mean value of 3.33.
The CD values revealed that the majority of surface water samples (91%) had negative values (CD < 1), indicating better quality with respect to heavy metals, while the remaining samples (9%) had positive values (CD > 1), indicating medium to highly contaminated surface water (Figure 4d). The PI results revealed two classes of heavy metal effects based on the classification of PI levels (Table 6). The PI values obtained revealed that the surface water samples were slightly affected by Cr (PI = 1.0) and Fe (PI = 1.17). In addition, the surface water network was strongly affected by Mn (PI = 4.50) and Pb (PI = 4.50), while there were no effects exerted by B, Cd, Cu, F, Ni, and Zn (PI < 1.0).

Relationships between the Drinking Water Quality Index and Pollution Indices
The relationships between the DWQI and PIs (as dependent variables) were calculated via simple regressions, as shown in Figure 5. The plots of the DWQI vs. the PIs showed a strong positive relationship with HEI and CD (R² = 0.91), and the relationship with HPI was also strongly positive (R² = 0.79).
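A simple regression of this kind, together with its R², can be sketched with the standard least-squares formulas. The function name `linear_fit` and the paired values below are hypothetical and do not reproduce the study data.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


# Hypothetical paired index values lying exactly on a line.
slope, intercept, r2 = linear_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```

For a regression with a single predictor, R² equals the squared correlation coefficient, so the sign of the relationship has to be read from the slope.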

Principal Component Analysis
PCA was performed for all major ions and heavy metals for the surface water sites, and the results are presented in Figure 6. PCA explained 63.43% and 12.80% of the total variation between major ions (Figure 6a) as well as 24.58% and 18.83% of the total variation between heavy metals (Figure 6b). All major ions were grouped in a positive direction except for NO₃⁻, and all major ions and heavy metals were grouped in a positive direction except for Zn. The PCA results revealed ten fundamental principal components that reflected the effect of heavy metals on surface water quality in the study area; PC1 showed maximum loadings of Fe, Cr, and F, while PC2 showed maximum loadings of Pb, Mn, Ni, B, Cd, and Cu (Figure 6b).
Figure 7 shows a 1:1 scatter plot of the observed and predicted values of the DWQI and PIs for the PLSR analysis of the water sites. The PLSR models provided accurate predictions of the DWQI, HPI, HEI, and CD for both the Cal. and Val. datasets, with R² values ranging from 0.98 to 1.00 in the Cal. dataset and from 0.88 to 0.99 in the Val. dataset. The optimum number of latent factors (PCs) needed to represent the calibration data without overfitting in the PLSR models is also indicated (Figure 7).

Figure 5. The relationships between the drinking water quality index (DWQI) and HPI, HEI, and CD, with respect to heavy metals. ***: p < 0.001.
Figure 7. Observed and predicted relationships for the calibration and validation datasets of the DWQI and HPI, HEI, and CD with respect to heavy metals using the partial least squares regression model. ***: p < 0.001.
Stepwise multiple linear regression was used to identify the most influential parameters that best explained the variation in the DWQI and PIs. For example, the SMLR model using all the major ions and heavy metals as input data performed best for estimating the DWQI of the water samples (Table 7), with R² = 1 and a standard error of 0.394. Since SMLR model (21) in Table 7 had the lowest standard error and highest R², this model was selected to estimate the DWQI.
Table 7. Extraction of the most influential major elements and heavy metals using stepwise multiple linear regression for the drinking water quality index and pollution indices.
SMLR model (10), which used all heavy metals as input data, performed best for estimating the three pollution indices (HPI, HEI, and CD) as output variables (Table 7). The equations of the three models are:
In general, the four equations above provided the best estimations of the DWQI, HPI, HEI, and CD, respectively, with the lowest standard errors and highest R² values.

Physicochemical Parameters
The physicochemical parameters obtained show that the pH values were slightly acidic to alkaline and fell in the range of acceptable drinking water according to the guidelines of the WHO 2011 [46]. The pH values revealed the presence of Ca²⁺, Mg²⁺, and CO₃²⁻ in the water samples and a reduction in heavy metal toxicity [53,54]. The TDS levels for the collected samples indicated that the surface water was of the freshwater type (i.e., less than 1000 mg/L) because of the effects of very little solute dissolution and rapid ion exchange between soil and water through continuous recharging from the Nile Delta branches. The EC values were lower than the permissible limit for drinking water according to the guidelines of the WHO 2011 [46] (1500 µS/cm), which indicated suitability for drinking purposes. The cation and anion concentrations indicated that calcium was the dominant cation, and sodium was the second most prevalent cation. In addition, alkalinity was the dominant anion, and sulfate was the second most prevalent anion. Based on the results of cation and anion concentrations, the surface water of the study area had values below the WHO 2011 [46] guidelines except for alkalinity in some samples from the southern part of the study area. The high alkalinity concentrations indicated that the surface water in the study area was in the first stage of water quality evolution.
On the other hand, the heavy metal concentrations for B, Cd, Cr, Cu, F, Fe, Mn, Ni, Pb, and Zn varied significantly between samples, which indicated that the surface water in the study area was contaminated with chromium, iron, manganese, and lead at levels that exceeded the proposed permissible limits according to the WHO 2011 [46]; these heavy metal results were also reported by Masoud et al. 2007 [8]. The obtained physicochemical results for the studied area are comparable to those reported by many studies in this region [8][9][10][11].
The physicochemical properties of water are considered natural and can be used to comprehensively understand the factors and pathways affecting the quality of surface water. Surface water chemistry focused on hydrochemical criteria provides preliminary information on water types and various geochemical processes [14][55][56][57].
Piper's classification revealed that alkaline earths (Ca, Mg) and weak acids (alkalinity) predominated over alkalis (Na, K) and strong acids (SO₄, Cl) in the majority of the selected water points in the study area, thereby indicating a Ca-Mg-alkalinity water type. The prevailing water types in the study area, as shown by the Piper diagram (Figure 2), were Ca-Mg-alkalinity and Ca-Mg-Cl-SO₄ facies. These facies indicated that the surface water was affected by rock-water interaction and weathering processes [58].
According to the plot of geochemical data on the Gibbs diagram, the surface water points were scattered in the weathering and rock dominance fields (Figure 3). Therefore, these processes are considered the main mechanisms controlling surface water geochemistry in the study area. The impact of these geochemical processes on the ambient water quality of the Nile River has not been significant due to the high self-assimilation capacity of the river water.

Assessment of Water Quality Indices
WQIs are viewed as a significant tool to detect the suitability of water for drinking use with respect to heavy metals [59]. The WQI results obtained showed that the majority of surface water sites in the study area were not recommended for drinking use, especially in the northwestern parts of the study area near Burulus Lake in the downstream portion of the Rosetta Branch (Figure 4). This may be attributed to the poor drainage network and runoff from extensive tracts of farmed areas [60]. Accordingly, surface water in the study area should be treated before it can be used for drinking purposes. The pollution indices, including the HPI, HEI, and CD, revealed that surface water in the study area was contaminated by heavy metals and that heavy metal pollution increased gradually from the south to northwestern parts in the direction of the surface water network flow (Figure 1). In the present study, there was no significant correlation between the distribution of the contamination and the flow directions, and this could be due to the proportions of variables in the heavy metal measurement schemes. Therefore, the contamination of surface water by heavy metals reached high levels in the areas that were dissected by drainage networks, with aggregation of heavy metal content during surface water low-flow states, high temperatures, and evaporation in the study area, as reported by Abdel-Satar 2001 [61] (Figure 4). In addition, the PI results revealed that the surface water sites were slightly affected by Cr and Fe and were strongly affected by Mn and Pb (Table 6). The high loadings of Fe and Mn may be attributed to soil-water interactions, while the high loading of Cr reflected industrial activities. In addition, the high contribution of Pb could be due to traffic activities [62,63] and poor sanitation infrastructure.
A comparison of the spatial distribution maps of the DWQI and HEI results (Figure 4a,c) showed degradation of surface water quality for drinking use near the downstream part of the Rosetta Branch. Based on the relationship between the DWQI and PIs (Figure 5), this deterioration was influenced by heavy metals. The surface water network was highly polluted according to the HPI and strongly to seriously affected by heavy metals according to the HEI, whereas the CD indicated low pollution levels; these differences are due to variations in the evaluation schemes for heavy metal concentrations. Surface water quality in the study area was subject to degradation due to the increasing levels of effluents swept out along different drains into the surface water network. Therefore, the integration of the DWQI and PIs is a valuable and applicable approach for assessing surface water quality for drinking purposes based on physicochemical parameters with respect to heavy metals.
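The way the three pollution indices aggregate the same concentrations can be illustrated with a minimal Python sketch. It assumes the commonly used forms HEI = Σ C_i/MAC_i, CD = Σ (C_i/MAC_i − 1), and HPI = Σ W_i Q_i / Σ W_i with W_i = 1/S_i and Q_i = 100·C_i/S_i (ideal value taken as zero); the exact formulations and limits applied in this study may differ. The concentrations and permissible limits are hypothetical placeholders, not measured data.

```python
# Hypothetical concentrations and permissible limits (mg/L); not study data.
sample = {"Pb": 0.03, "Mn": 0.6, "Fe": 0.45, "Cr": 0.04}
limits = {"Pb": 0.01, "Mn": 0.4, "Fe": 0.3, "Cr": 0.05}


def hei(conc, mac):
    """Heavy metal evaluation index: sum of concentration-to-limit ratios."""
    return sum(conc[m] / mac[m] for m in conc)


def cd(conc, mac):
    """Contamination degree: sum of (ratio - 1), i.e. HEI minus the metal count."""
    return sum(conc[m] / mac[m] - 1.0 for m in conc)


def hpi(conc, std):
    """Heavy metal pollution index, assuming W_i = 1/S_i and an ideal value of 0."""
    weights = {m: 1.0 / std[m] for m in conc}
    sub_indices = {m: 100.0 * conc[m] / std[m] for m in conc}
    weighted_sum = sum(weights[m] * sub_indices[m] for m in conc)
    return weighted_sum / sum(weights.values())


print("HEI:", round(hei(sample, limits), 2))
print("CD: ", round(cd(sample, limits), 2))
print("HPI:", round(hpi(sample, limits), 1))
```

Under these assumed forms, CD is always lower than HEI by the number of metals considered, and HPI is dominated by the metals with the strictest limits. Together with differing class thresholds, this is one way the same sample can be rated differently by the three schemes.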

Principal Component Analysis
PCA of the surface water samples for the major ions revealed lithogenic sources resulting from soil-water interactions, represented by the loadings of TDS, K⁺, Na⁺, Ca²⁺, and alkalinity on PC1 and the loadings of TH, Mg²⁺, Cl⁻, and SO₄²⁻ on PC2 (Figure 6a). For the metals, PCA revealed loadings of Fe, Cr, and F on PC1 and loadings of Mn, Cr, Cu, Ni, B, and Cd on PC2 (Figure 6b). These results may be attributed to soil-water interaction, industrial practices, and other anthropogenic practices [3,64,65] that lead to contamination of the surface water network by individual heavy metals, especially given the high loadings of Fe, Cr, Mn, Ni, and Pb. A strong agreement between PCA and HEI was observed, which indicated that the majority of surface water locations in the study area had poor water due to heavy metal contamination. The PI results were also consistent with the PCA results. Taken together, the heavy metal contributions in PCA and the PIs point to lithogenic sources, industrial activities, and agricultural management practices that have developed close to the Rosetta Branch in recent years. Integrating PCA and the PIs is therefore a valuable and applicable approach for assessing surface water quality with respect to heavy metals.
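How a loading on the first principal component arises can be sketched in plain Python. The example standardizes each variable, builds the correlation matrix, and extracts the leading eigenvector by power iteration; loadings are taken as the eigenvector scaled by the square root of its eigenvalue. The three concentration series are hypothetical, and the software and rotation settings used in the study are not assumed here.

```python
import math


def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]


def correlation_matrix(columns):
    """Correlation matrix of the given variables (one list per variable)."""
    z = [standardize(c) for c in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def first_component(corr, iterations=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    v = [1.0] * len(corr)
    for _ in range(iterations):
        w = [sum(r * x for r, x in zip(row, v)) for row in corr]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(r * x for r, x in zip(row, v)) for row in corr]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return eigenvalue, v


# Hypothetical concentrations (mg/L) at six sites; not study data.
fe = [0.2, 0.5, 0.9, 1.4, 1.8, 2.1]
mn = [0.1, 0.3, 0.5, 0.6, 0.9, 1.0]
cu = [0.05, 0.02, 0.06, 0.03, 0.04, 0.05]

eigenvalue, vector = first_component(correlation_matrix([fe, mn, cu]))
loadings = [math.sqrt(eigenvalue) * x for x in vector]
for name, loading in zip(["Fe", "Mn", "Cu"], loadings):
    print(name, round(loading, 2))
```

In this toy dataset Fe and Mn vary together and therefore load strongly on the same component, while Cu, which varies independently, loads weakly. This is the pattern that is read as a shared source when interpreting PCA results.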

Partial Least Squares Regression Models and Stepwise Multiple Linear Regressions to Predict the Drinking Water Quality Index and Pollution Indices
Accurate estimates of the DWQI, HPI, HEI, and CD of water sites can be obtained by mathematical methods [17,26,35,36]. However, these methods are complicated because they require several equations to transform large amounts of water characterization data into a single value that reflects the overall water quality level. For this reason, the PLSR model was tested in this study to predict the DWQI from multiple predictor variables (major ions and heavy metals) and the PIs from multiple predictor variables (heavy metals). PLSR is used in many fields. A common application is modeling relationships involving spectral measurements (e.g., near-infrared (NIR), infrared (IR), and ultraviolet (UV)), which comprise several variables that are often correlated with each other [64,66]. The models gave robust and accurate estimations for the different indices, with the highest R² values, slope values close to 1.00, and the lowest RMSE values for the calibration and validation datasets. For example, for the PLSR model of HEI, the R² values were 1.00 and 0.99, the slope values were 1.00 and 0.99, and the RMSE values were as low as 0.09 and 0.11 for the calibration and validation datasets, respectively.
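The three criteria used to judge the models (R², slope, and RMSE) can be computed from paired measured and predicted index values as in the following sketch. It assumes that R² is the squared Pearson correlation and that the slope is that of the regression of predicted on measured values; the study may define them differently. The values shown are hypothetical.

```python
import math


def evaluate(measured, predicted):
    """Return R², slope (predicted vs. measured), and RMSE for paired values."""
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    sxx = sum((m - mean_m) ** 2 for m in measured)
    syy = sum((p - mean_p) ** 2 for p in predicted)
    sxy = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    rmse = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return {"r2": sxy ** 2 / (sxx * syy), "slope": sxy / sxx, "rmse": rmse}


# Hypothetical measured and predicted index values for a validation set.
measured = [10.2, 14.8, 21.5, 30.1, 42.7]
predicted = [10.4, 14.5, 21.9, 29.8, 42.9]

metrics = evaluate(measured, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

A model is read as accurate when R² and the slope are both close to 1 and the RMSE is small relative to the range of the index. Calling `evaluate` separately on the calibration and validation sets gives the pair of values reported for each criterion.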
In addition, stepwise multiple linear regression models were evaluated to estimate the DWQI and PIs. The SMLR models produced good estimations for all indices using some or all of the major ions and heavy metals, with R² = 1 and a very small standard error for the DWQI. These results agree with those of Mustapha and Aris (2012) [38], who found that a multiple linear regression model for water quality in Jakara-Getsi (Nigeria) could predict high heavy metal concentrations with an R² value of 0.97 and a significance level of 0.001. The SMLR models showed that the standard error of the DWQI and PI estimates decreased as the number of input variables (major elements and heavy metals) increased.
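The decrease in error as variables enter the model can be illustrated with a simplified forward-selection sketch in Python. It is not the exact stepwise procedure used in the study (there are no entry or removal significance tests), and the data are hypothetical. The target is an index assumed to be a sum of concentration-to-limit ratios, so the full model reproduces it almost exactly; this is one plausible reason why a linear model using all inputs can reach R² = 1.

```python
def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def sse(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_selection(predictors, y):
    """Add, at each step, the predictor that lowers the residual error most."""
    chosen, path = [], []
    remaining = list(predictors)
    while remaining:
        best = min(remaining,
                   key=lambda name: sse([predictors[c] for c in chosen + [name]], y))
        chosen.append(best)
        remaining.remove(best)
        path.append((best, sse([predictors[c] for c in chosen], y)))
    return path


# Hypothetical concentrations (mg/L) and limits; not study data.
predictors = {
    "Pb": [0.01, 0.03, 0.02, 0.05, 0.04, 0.06],
    "Mn": [0.2, 0.6, 0.9, 0.3, 0.8, 0.5],
    "Fe": [0.3, 0.1, 0.4, 0.6, 0.2, 0.5],
}
limits = {"Pb": 0.01, "Mn": 0.4, "Fe": 0.3}
index = [sum(predictors[m][i] / limits[m] for m in predictors) for i in range(6)]

path = forward_selection(predictors, index)
for name, error in path:
    print(f"added {name}: residual sum of squares = {error:.6f}")
```

Each added predictor can only lower the residual sum of squares, and once every term of the index is in the model the residual is zero up to rounding. A near-perfect fit obtained this way shows that the regression has recovered the index formula; it does not by itself show that the model can replace the underlying measurements.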

Conclusions
This study presented integrated approaches for characterizing the suitability of surface water for drinking use with respect to physicochemical characteristics in the northern Nile Delta, Egypt, supported by multivariate statistical analyses such as PCA, PLSR, and SMLR. According to the analytical results, surface water in the investigated area belongs to the Ca-Mg-alkalinity and Ca-Mg-Cl-SO₄ water types. The surface water network in the study area was strongly affected by Pb and Mn and slightly affected by Fe and Cr. The deterioration of surface water quality can be attributed to large applications of agrochemical pesticides, industrial activities, and poor drainage networks. Therefore, applying efficient treatment techniques to irrigation wastewater before its disposal into the fresh surface water network will help remediate the deterioration of surface water quality in the study area. The use of physicochemical parameters and water quality indices, including the DWQI, HPI, HEI, CD, and PI, with the support of multivariate statistical analysis, is an effective and applicable approach for providing a clear picture of surface water quality and its controlling mechanisms. The PLSR models are easy, fast, and reliable methods for calculating the measured indices. They gave robust and accurate estimations for the different indices, with the highest R² values, slope values close to 1.00, and the lowest RMSE values for the calibration and validation datasets. The SMLR models gave the best estimates of the DWQI and PIs when all major ions and heavy metals were used as input data, with R² = 1. Future studies should test the PLSR and SMLR models under different environmental conditions for surface water.