Identiﬁcation of the Hydrogeochemical Processes and Assessment of Groundwater Quality, Using Multivariate Statistical Approaches and Water Quality Index in a Wastewater Irrigated Region

: Groundwater quality and availability are essential for human consumption and social and economic activities in arid and semiarid regions. Many developing countries use wastewater for irrigation, which has in most cases led to groundwater pollution. The Mezquital Valley, a semiarid region in central Mexico, is the largest agricultural irrigation region in the world, and it has relied on wastewater from Mexico City for over 100 years. Limited research has been conducted on the impact of irrigation practices on groundwater quality on the Mezquital Valley. In this study, 31 drinking water wells were sampled. Groundwater quality was determined using the water quality index (WQI) for drinking purposes. The hydrogeochemical process and the spatial variability of groundwater quality were analyzed using principal component analysis (PCA) and K-means clustering multivariate geostatistical tools. This study highlights the value of combining various approaches, such as multivariate geostatistical methods and WQI, for the identiﬁcation of hydrogeochemical processes in the evolution of groundwater in a wastewater irrigated region. The PCA results revealed that salinization and pollution (wastewater irrigation and fertilizers) followed by geogenic sources (dissolution of carbonates) have a signiﬁcant e ﬀ ect on groundwater quality. Groundwater quality evolution was grouped into cluster 1 and cluster 2, which were classiﬁed as unsuitable (low quality) and suitable (acceptable quality) for drinking purposes, respectively. Cluster 1 is located in wastewater irrigated zones, urban areas, and the surroundings of the Tula River. Cluster 2 locations are found in recharge zones, rural settlements, and seasonal agricultural ﬁelds. The results of this study strongly suggest that water management strategies that include a groundwater monitoring plan, as well as research-based wastewater irrigation regulations, in the Mezquital Valley are warranted.


Introduction
Water scarcity is a growing challenge in the arid and semiarid regions of the world [1][2][3][4]. In these regions, the use of wastewater for irrigation and other purposes has become a common and often unregulated practice, and groundwater quality assessments are seldom carried out [5,6]. In Latin America, it is estimated that 70% of the raw wastewater used in irrigation reaches an underlying aquifer [7,8]. The pollutants impacting the aquifer and the vadose zone with this practice include excess salts and heavy metals [1]. In its movement from the unsaturated zone to the saturated zone, groundwater quality changes due to various processes, including carbonate dissolution, ion exchange, and silicate weathering [9]. Unfortunately, few studies have focused on groundwater quality changes in both quality and spatial extent related to the irrigation use of wastewater [3,4,10].
The Mezquital Valley is a semiarid region in Central Mexico that utilizes about 1500 million m 3 per year of wastewater effluent from Mexico City for irrigation purposes [11]. The study region is composed of 5647.88 ha of the 90,000 ha that compose the entire Mezquital Valley [12]. The wastewater recharge rate in the study area is approximately 25 m 3 /s, and corresponds to 13.3 times the natural recharge rate in the Mezquital Valley [12]. The accelerated recharge rate facilitates the transport of persistent pollutants to the unconfined aquifer, and results in a deleterious impact on groundwater quality [13]. Previous studies have reported the pollution of groundwater related to high levels of NO 3 − [14], as well as TDS (total dissolved solids) and Na + above the maximum allowable limits [15].
In the study region, the available studies have focused on the impact of wastewater on heavy metal contamination on soil and crops, and the effect on soil properties and plant growth. However, groundwater classification and spatial quality characterization, with the purpose of identifying the most vulnerable regions to assist groundwater policy makers make informed decisions, has not been addressed.
However, it is a challenge to evaluate the complex, highly dimensional groundwater hydrogeochemical datasets. Multivariate statistical approaches, such as principal component analysis (PCA) and K-means clustering are robust tools for groundwater resources management [17][18][19][20][21]. They have been successfully used to define and understand the hydrogeochemical processes that dominate groundwater quality and identify pollution sources [22,23].
PCA explains the variance of large datasets of correlated variables, and converts them into a reduced set of uncorrelated variables known as their principal component (PC) [24,25]. K-means clustering is a powerful technique for discovery internal relationships and different characteristics of objects [26]. The K-means clustering algorithm partitions a set of objects into groups, where similar objects belong to unique clusters [17,22,24]. A water quality index (WQI) is also an efficient tool for identifying water quality and its suitability for a particular use [17,24,27]. Moreover, Piper and Stiff diagrams and ion relation scatter diagrams are standard methods to study geochemical data and analyze water-rock interactions, respectively [28,29].
In this study, K-means clustering, and PCA were used to identify the relevant hydrogeochemical processes that define groundwater quality. WQI was used to assess groundwater quality for drinking purposes and its spatial variations in a wastewater irrigation district in the Mezquital Valley, México.

Study Area
The study area is on the northern end of the Mezquital Valley ( Figure 1). It has an area of approximately 2200 km 2 . According to [13], there are three dominant soil types in the region: (1) Phaeozems, (2) Leptosols, and (3) Vertisols. The climate is semiarid, with a mean annual temperature of 18 • C [12,13,30]. Rainfall amounts range from 300 to 500 mm/year, while potential evapotranspiration (around of 2100 mm/year) exceeds the mean annual precipitation [12,13,30,31].

Geologic Setting
The geology of the Mezquital region includes volcanic rocks and a minor amount of sedimentary rocks. The El Doctor, Mexcala, and Soyatal formations belong to the Upper Cretaceous, and are mostly composed of limestone, sandstone, and shale respectively ( Figure 1). In addition, a sequence of interbedded shale, sand, and siltstone predominate in the Mexcala Formation [31,32].
Tertiary deposits are comprised primarily of detrital continental and volcanic rocks corresponding to the Medium Pliocene [32]. The Pachuca group composition varies from basaltic to rhyolitic rock. The thickest sequence of rocks is known as Pachuca mountain range. The Don Guinyó Formation encloses volcanic tuff and breaches, as well as compacted ignimbrites made up of dacitic and rhyolitic rock. The Zumate Formation includes a sequence of andesitic and dacitic rocks, with interbedded lava and lahar deposits. The Taximay Formation is comprised of lacustrine deposits composed of clay, with a thickness greater than 50 m. It is located across a hydrographic ridge between the Mezquital and Zumpango Valley. The San Juan group is composed of a succession of lava spills, with a few interbedded layers of tuff, breach, and volcanic agglomerates with thicknesses

Geologic Setting
The geology of the Mezquital region includes volcanic rocks and a minor amount of sedimentary rocks. The El Doctor, Mexcala, and Soyatal formations belong to the Upper Cretaceous, and are mostly composed of limestone, sandstone, and shale respectively ( Figure 1). In addition, a sequence of interbedded shale, sand, and siltstone predominate in the Mexcala Formation [31,32].
Tertiary deposits are comprised primarily of detrital continental and volcanic rocks corresponding to the Medium Pliocene [32]. The Pachuca group composition varies from basaltic to rhyolitic rock. The thickest sequence of rocks is known as Pachuca mountain range. The Don Guinyó Formation encloses volcanic tuff and breaches, as well as compacted ignimbrites made up of dacitic and rhyolitic rock. The Zumate Formation includes a sequence of andesitic and dacitic rocks, with interbedded lava and lahar deposits. The Taximay Formation is comprised of lacustrine deposits composed of clay, with a thickness greater than 50 m. It is located across a hydrographic ridge between the Mezquital and Zumpango Valley. The San Juan group is composed of a succession of lava spills, with a few interbedded layers of tuff, breach, and volcanic agglomerates with thicknesses less than 400 m. The Tarango Formation is made up of interbedded layers of gravel, sand, silt, and clay.
The quaternary deposits are enclosed by lava spills and cineritic cones, in addition to alluvial and fluvial sediments. They are composed of sands, clays, and gravels covering the valley's surface [31,32].

Hydrogeologic Setting
The aquifer underlies the Tula River watershed (Figure 1), and its recharge sources are irrigation and precipitation. It has a groundwater flow direction going from south to north [30,32]. Previous studies describe the aquifer as unconfined to semi-confined, heterogeneous, and anisotropic, and it is composed of an upper and a lower portion. The upper aquifer consists of porous formations of fluvial origin composed of interbedded granular alluvial material, volcanic rocks, and pyroclastic sediments with thicknesses of up to 400 m [31]. The lower (semi-confined) aquifer is made up of volcanic rocks, with interbedded lava of the Tarango formation [32].

Sample Collection
A total of 31 groundwater samples were collected from wells at urban settlements within the study area during the rainy season, in a sampling period from 15 July to 20 July 2014 ( Figure 1). Two samples per location were collected into 500 mL polyethylene bottles. The first sample at each location was used for cation analysis, and preserved with 2 mL of HNO 3 . The second sample was used for anion analysis, with no acid added [33].

Analytical Methods
All samples were analyzed according to the standards set by the American Public Health Association [33]. The pH, electric conductivity (EC), and total dissolved solids (TDS) were measured in situ by using a Hanna (HI9828, Hanna, USA) multi-parameter. Concentrations of major cations, such as calcium (Ca 2+ ), magnesium (Mg 2+ ), sodium (Na + ), and potassium (K + ), were analyzed using a Flame Photometer. Concentrations of bicarbonate (HCO 3 − ), carbonate (CO 3 2− ), chloride Cl − , and total hardness (TH) as CaCO 3 were determined by the acid titration method, while anions nitrate NO 3 − and sulfate SO 4 2− were analyzed using an ultraviolet spectrophotometer. The sodium adsorption ratio (SAR) was defined by Richards (1954) in the United States Department of Agriculture (USDA) Handbook 60. The SAR was calculated by Equation (1): where SAR is a non-dimensional number, and concentrations of Na + (Na), Ca 2+ (Ca), and Mg 2+ (Mg) are expressed in mEq/L. Results of the chemical analyses were tested by the percent charge balance error (%CBE). According to [34], this is defined as where z is the absolute value of the ionic valence, and m c and m a are the molality of the cationic and anionic species, respectively. In this study, the charge balance error of all samples was between −2.81% and 3%. These results are in agreement with previous studies for the statistical treatment of datasets [35,36].

Water Quality Index (WQI)
Water quality indexes are used to identify the suitability of groundwater quality for drinking purposes [37]. WQI is the most useful method to measure water quality [38]. The WQI was developed by Bascaron [39], as shown in Equation (3). It is defined as an algorithm that determines the qualitative state of the water [40]. The WQI represents a single value that numerically summarizes multiple water parameters [24]. The WQI can be calculated either by deductive or inductive methods [41]. In this study, the WQI was computed to assess the suitability of groundwater for drinking standards, according to the Mexican Official Norm [42] and the World Health Organization [43]. The WQI is expressed by Equation (3): where n is the total number of parameters, C i is the percentage value assigned to the parameter (i) according to the concentrations, P i is the relative weight assigned to each parameter, and k is a subjective constant taken from the values in Table 1. Table 1. Weight classification according to organoleptic characteristics of water.

Weight (k)
Characteristic of Water

1.0
Water without apparent pollution-clear or with natural suspended solids 0. 75 Water with slight color, scums, and apparent non-natural turbidity 0. 50 Water with polluted aspect and strong odor 0. 25 Highly polluted water with blackish color, hard odor, and visible fermentation The WQI is calculated in four steps. In the first step, each of the 13 physicochemical parameters considered (temperature, Ca 2+ , Mg 2+ , Na + , K + , TH (CaCO 3 ), HCO 3 − , pH, EC, TDS, SO 4 2− , Cl − , and NO 3 − ) is assigned a relative weight (P i ) on a scale of 1 to 4. The highest weight of 4 is designated to parameters with critical health effects that have concentrations above the permissible limits established by the WHO [43]. The second step is to calculate the relative weight using Equation (4): where P i is the relative weight and p i is the weight of each parameter. In the third step, a percentage value (C i ) is assigned to each parameter after normalization. Table 2 shows the different parameters used in the calculation of the WQI, as well as their relative weights and the normalization factors. These values were adopted from previous studies [44][45][46][47][48], which are based on the World Health Organization [43] and the Mexican Official Norm [42].
In the fourth-step, the WQI is calculated (Equation (3)). For each sample considered, the sum of the weighted parameters is computed and multiplied by the constant k, which characterizes water aesthetics and odor ( Table 1). The WQI values ranges between 0 and 100, where 0 indicates "extremely polluted" water quality and 100 indicates "excellent" water quality. Different scales have been used by different authors [41,45,46,[48][49][50][51][52][53]. In this study, the calculated WQI values were defined according to the quality scale proposed by [49], shown in Table 3. The WQI was mapped to evaluate spatial variability in the study zone.

Multivariate Statistical Analysis
Multivariate analysis was carried out by K-means clustering analysis and principal component analysis (PCA). Statistical analysis, including descriptive statistics, a correlations matrix, PCA, and cluster analysis were performed using RStudio software V. 1.0.153 (Copyright RStudio Inc., Boston, MA, USA).
<10,000 ≤20,000 >20,000 pH 0.0909 7 7-8 7-8.5 7-9 6.5-7 6-9.5 5-10 4-11 Units: ion concentrations (mg/L), EC (µS/cm), pH (Standard Units). The optimal results of statistical multivariate methods such as PCA require an univariate and multivariate normal distribution [54][55][56][57][58][59][60], as well as homogeneity of variances (homoscedasticity) [61,62]. The univariate and multivariate normality condition was verified by Shapiro-Wilk's tests [63] and Royston's test [64], respectively. The empirical distributions of most variables deviated from a normal distribution, except for pH. A non-normal distribution was observed on the dataset based on Royston's test. A logarithmic transformation (natural logarithm) was applied to the original set of variables to achieve a normal-like distribution. In this regard, it is recommended to start with this type of univariate transformation before applying the multivariate transformation methods [65]. Feature scaling on the database was made through standardization (or Z-score normalization) to approach the optimal conditions of the multivariate analysis. Standardization reduces the difference in variances in variables, and prevents dissimilarity measures like the Euclidian distance obtained from being severely affected. Each variable was standardized to their corresponding Z scores [66], which are calculated by Equation (5): where Z i is the standardized Z score, X i is the value of each variable, and mean and S are the mean value and the standard deviation of each variable, respectively. The adjustment of the transformed variables to the normal log distribution was evaluated favorably using the Kolmogorov-Smirnov (K-S) test [22,67]. A matrix of 14 hydrogeochemical variables and 31 observations was analyzed. The choice of Spearman's rank correlation was due to the non-normal distribution of the water quality parameters [22,68]. This test is more robust for variables moving away from the normal distribution, and deviations are minimized for correlations between the variables [69].
To test the data accuracy and suitability for PCA, Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity statistics were carried out. KMO is a measure of sampling validity, and indicates the proportion of the variance, which might be sourced by unknown factors [22,58]. KMO values larger than 0.9 indicate high validity for applying PCA, between 0.5 and 0.9 indicates sufficient validity, and KMO values less than 0.5 are considered not valid [70]. In this study, a KMO (valid) value of 0.78 was obtained. Bartlett's test of sphericity validates that the analyzed variables are correlated adequately. In this study, the significance level was less than 0.05, showing significant relationships among variables.

K-Means Clustering Analysis
K-means clustering uses criteria to segment the data based on intrinsic characteristics, proximity, or degree of similarity [71][72][73][74]. Similarly, the purpose of K-means clustering is to achieve both high (internal) homogeneity within a cluster and (external) heterogeneity among different clusters [72].
The principal objective of the K-means clustering algorithm is to partition an unlabeled dataset into K clusters (groups or categories), represented by centroids [72]. According to [75], in each repetition, instances allocated to the closest clusters founded on the Euclidean distance between instances and centroids (Equation (6)): where d is the distance, Z p is the point in the space representing a given object, Z q is cluster q, Z pj is known as the jth attribute of the pth instance, Z qj as the jth attribute of the qth cluster, and D is known as the total number of attributes. Each cluster centroid is updated through averaging all of the instances belonging to that group. For optimization purposes, the K-means algorithm attempts to minimize the sum of squared error by using Equation (7): where K is the number of clusters ( . . , Z pd and is a d dimensional vector, and m k is known as the kth cluster centroid.
In K-means clustering, an important parameter that needs to be defined is the number of clusters. There are more than 30 indices that have been published for identifying the optimal number of clusters.
To determine the optimal number of clusters in a dataset, the R package NbClust [76] was used. These provided 24 indexes for identifying the optimal number of clusters. Eight of these indexes suggested that wells are grouped in two clusters (See Supplementary Materials, Table S1). The K-means algorithm was run using R's K-means function in stats [77]. On this function, the number of clusters (k = 2) is specified, and the algorithm randomly selects each k-center. The resulting outputs are highly sensitive to the initial random assignments of the centers; therefore, multiple random initializations (25,50,75, and 100) were tested with the argument "nstart". The "set.seed" function is used to have some reproducibility of the results of the initial k-center assignments. This process allows us to evaluate the results of the K-means algorithm for each set of initial random results, by means of a practical visual evaluation and through geographic information system (GIS).

Principal Component Analysis (PCA)
The primary purpose of PCA is to explain the variance within the dataset while reducing the dimensionality of the data structure. PCA was carried out to transform the original correlated variables into a smaller set of uncorrelated variables called the principal components (PCs) [22,67,73,78]. PCs are the uncorrelated variables calculated by multiplying correlated variables with the eigenvector, and are expressed as loadings. The loadings show the contribution of a given variable to each of the extracted PCs [73].
According to [79], the principal component (PC) is expressed by the following equation: where Z is known as the component score, a is the component loading, x is the estimated value of the variable, i is the component number, j is the sample number, and r is the total number of variables.

Ionic Dominance
Maximum, minimum, and mean values of the chemical composition of groundwater are shown in the Supplementary Material (Supplementary Materials Table S2). Major cations were analyzed, and the abundance of ions was in the order of Na + > Ca 2+ > Mg 2+ > K + . Na + was in exceedance (440.95 mg/L) of the MPL (200 mg/L) [42,43]. High Na + levels can be due to the use of untreated wastewater for irrigation. In previous studies, researchers reported high Na + levels in wastewater, and were attributed to long-term irrigation practices with untreated wastewater. Calcium concentrations ranged from 21.64 to 188.78 mg/L. Magnesium concentrations ranged from 2.79 to 39.37 mg/L. K + concentrations ranged from 0.12 to 24.48 mg/L. Anions showed dominance in the following order: The high concentration of HCO 3 − can be explained by carbonate dissolution. Elevated concentrations of SO 4 2− , which are higher than the MPL (250 mg/L), can be due to mineral dissolution and the use of wastewater for irrigation. The Cl − concentration in sewage is high, so its use as irrigation water results in a significant increase in the concentrations of Cl − in the groundwater (Table S2). Nitrate showed a range of concentration between 5.7 and 83.29 mg/L. The highest NO 3 − levels can be attributed to the use of fertilizers and irrigation with untreated wastewater.

Hydrochemical Facies
Hydrochemical facies are employed for delineating the chemical composition of groundwater. In order to identify hydrochemical types and delineate the spatial variation of dominant ions, the Piper diagram [80] and Stiff diagram [81] were elaborated ( Figure 2). The trilinear diagram ( Figure 2) indicates that the dominant cations (Na + and Ca 2+ ) and anions (Cl − and HCO 3 − ) play a decisive role in defining water type. Three distinct hydrogeochemical facies were identified in the study zone. The Na + -Cl − type of water predominates among three facies with 37.9%, and the Ca 2+ -HCO 3 − type is the second dominant water type with 27.6%. The mixed Ca 2+ -Na + -Mg 2+ -HCO 3 − -Cl − (34.5%) water type is also present. Stiff diagram analysis was used to assess the spatial incidence of the major ions and water type distribution. It observed that Na + + K + -Cl − type water is located within the central and southern parts of urban and agricultural zones, as well as close to the Tula River, other natural flow streams, and irrigation canals (Figure 2). This suggests that Na + , K + , and Cl − predominance has increased by continuous irrigation with wastewater. The Ca 2+ -HCO 3 − dominated water is present on the northern portion of the study area, in the surroundings mountains (Figure 2), indicating that fresh water recharge is taking place. Mixed Ca 2+ -Na + -Mg 2+ -HCO 3 − -Cl − water type is found in the central and southern areas, in settlements, urban and agricultural zones, natural flow streams, and irrigation canals ( Figure 2). This suggests that the mixed nature of water is affected by multiple factors, such as wastewater irrigation, carbonate dissolution, and cation exchange processes.

Application of the Water Quality Index and the Quality of Water for Drinking Purposes
Water quality indexes (WQIs) were computed for the samples using the concentrations of Ca 2+ , Mg 2+ , Na + , K + , TH (CaCO 3 ), HCO 3 − , pH, EC, TDS, SO 4 2− , Cl − , and NO 3 − . WQI values range from 52 to 87, as shown in Figure 3. According to the water quality index classification (Table 3), 41.4% of the samples were polluted, 20.7% were slightly polluted, and 37.9% were acceptable for drinking purposes. The percentages indicate that groundwater treatment is warranted before use for drinking water purposes ( Table 3). The spatial distribution of the WQI indicates that groundwater unsuitable for human consumption is found in urban areas, wastewater irrigated zones, and the surroundings of the Tula River. On the other hand, high-quality groundwater is found in recharge zones, rural settlements, and seasonal agricultural fields. Low-range WQIs for groundwater (WQI > 70) are attributed to exceedances of EC, TDS, Cl − , Na + , K + , NO 3 − , pH, and TH (CaCO 3 ) (Supplementary Materials, Tables S2 and S3). This can be linked primarily to infiltration and recharge of untreated industrial-domestic wastewater and fertilizers, and also hydro-geochemical interactions. Similar studies have reported that lower quality groundwater is due primarily to high concentrations of Na + , TH (CaCO 3 ), TDS, and EC sourced from the infiltration of untreated wastewater [1,4]. Stiff diagram analysis was used to assess the spatial incidence of the major ions and water type distribution. It observed that Na + + K + -Cl − type water is located within the central and southern parts of urban and agricultural zones, as well as close to the Tula River, other natural flow streams, and irrigation canals (Figure 2). This suggests that Na + , K + , and Cl − predominance has increased by continuous irrigation with wastewater. The Ca 2+ -HCO3 − dominated water is present on the northern portion of the study area, in the surroundings mountains (Figure 2), indicating that fresh water recharge is taking place. Mixed Ca 2+ -Na + -Mg 2+ -HCO3 − -Cl − water type is found in the central and southern areas, in settlements, urban and agricultural zones, natural flow streams, and irrigation canals ( Figure 2). This suggests that the mixed nature of water is affected by multiple factors, such as wastewater irrigation, carbonate dissolution, and cation exchange processes.

Correlation Among Variables
The Spearman's rank correlation (r) was computed to measure and establish the interrelationship between two variables. The correlation matrix of 16 selected parameters is presented in Table 4. In this study, an r value greater than 0.7 indicates a high correlation, and an r value in the range of r > 0.5 or ≤0.7 indicates a moderate correlation.
Ca 2+ and Na + have the highest correlation coefficient (0.966). This suggests that cation exchange processes are occurring, where a Ca 2+ ion is taken as the exchanger by a release of Na + ions [9,10,35]. Moreover, Ca 2+ showed a positive correlation (Figure 4a samples were polluted, 20.7% were slightly polluted, and 37.9% were acceptable for drinking purposes. The percentages indicate that groundwater treatment is warranted before use for drinking water purposes ( Table 3). The spatial distribution of the WQI indicates that groundwater unsuitable for human consumption is found in urban areas, wastewater irrigated zones, and the surroundings of the Tula River. On the other hand, high-quality groundwater is found in recharge zones, rural settlements, and seasonal agricultural fields.  Tables S2 and S3). This can be linked primarily to infiltration and recharge of untreated industrial-domestic wastewater and fertilizers, and also hydro-geochemical interactions. Similar studies have reported that lower quality groundwater is due primarily to high concentrations of Na + , TH (CaCO3), TDS, and EC sourced from the infiltration of untreated wastewater [1,4].

Correlation Among Variables
The Spearman's rank correlation (r) was computed to measure and establish the interrelationship between two variables. The correlation matrix of 16 selected parameters is presented in Table 4. In this study, an r value greater than 0.7 indicates a high correlation, and an r value in the range of r > 0.5 or ≤0.7 indicates a moderate correlation. A high positive correlation between Ca 2+ and NO 3 − (0.87) suggests that the source of nitrate and calcium is the same (Figure 4c), and could be attributed to nitrate from agricultural fertilizers leaching to the aquifer [86]. Additional processes, such as nitrification, may increase the dissolution of carbonate minerals [87,88]. Mg 2+ and SAR showed a high positive correlation (0.913), indicating a cation exchange process. When there are high concentrations of Na + in irrigation water, Na + ions are absorbed by clay particles, exchanging Mg 2+ ions [1,89] (Supplementary Materials, Figure S1). The high positive association between Na + and total hardness (0.708) reveals the strong influence of Na + ion on the increase in TH (CaCO 3 ), through the dissolution of bicarbonates [26] and cation exchange [23,69] (Figure 4b).
High positive correlations of 0.860 and 0.783 were obtained by K + with SO 4 2− and TH, respectively.
This indicates that potassium ions may originate naturally from the dissolution and weathering of K-feldspars and carbonate rocks, as well as from two anthropogenic sources: (1) agricultural fertilizers and (2) wastewater infiltration and recharge [1,24,36,90]. High correlations between pH and K + and SO 4 2− (0.952 and 0.818, respectively) were obtained.
This could be due to the recharge of potassium and sulfate salts from the weathering process of rocks and the continuous usage of wastewater for irrigation [1,36,74]. The pH showed a positive correlation with TDS (0.622) and EC (0.627). This suggests that pH is defined primarily by the leaching of salts sourced from long-term usage of sewage for irrigation, which circulates the water within the aquifer. Wastewater is mostly alkaline, with significant variations of EC and TDS [1,35,88]. High salinity levels are caused by dissolution of minerals and soluble salts, such as Cl − , Ca 2+ , Mg 2+ , and Na + [1,10]. These results establish the significant contribution of the ions to the mineralization and salinization processes (Figure 4d-f). A moderate positive correlation (0.514) between pH and temperature could indicate more intensive water-rock interaction and the infiltration of warm wastewater [91,92]. A high positive correlation (0.939) between pH and WQI suggests that pH is a good indicator of groundwater quality in regions with significant anthropogenic influence. Ca 2+ and Na + have the highest correlation coefficient (0.966). This suggests that cation exchange processes are occurring, where a Ca 2+ ion is taken as the exchanger by a release of Na + ions [9,10,35]. Moreover, Ca 2+ showed a positive correlation (Figure 4a,b) with CO3 2− (0.785) and HCO3 − (0.772), indicating the dissolution and precipitation of calcite [9,74,[82][83][84][85]. A high positive correlation between Ca 2+ and NO3 − (0.87) suggests that the source of nitrate and calcium is the same (Figure 4c), and could be attributed to nitrate from agricultural fertilizers leaching to the aquifer [86]. Additional processes, such as nitrification, may increase the dissolution of carbonate minerals [87,88].
Mg 2+ and SAR showed a high positive correlation (0.913), indicating a cation exchange process. When there are high concentrations of Na + in irrigation water, Na + ions are absorbed by clay particles, exchanging Mg 2+ ions [1,89] (Supplementary Materials, Figure S1). The high positive association between Na + and total hardness (0.708) reveals the strong influence of Na + ion on the increase in TH (CaCO3), through the dissolution of bicarbonates [26] and cation exchange [23,69] (Figure 4b).
High positive correlations of 0.860 and 0.783 were obtained by K + with SO4 2− and TH, respectively. This indicates that potassium ions may originate naturally from the dissolution and

Cluster Analysis
Cluster analysis by the partition K-means method was used to identify similarity groups among the sampling sites. The results of spatial cluster analysis for the 31 water samples is shown in Figure 3. Two distinct groups were identified with distinct water quality characteristics. Cluster 1 includes 38% of the total samples, and 62% of the water samples were classified in cluster 2. Cluster 1 shows characteristics of high salinity by wastewater recharge and agricultural pollution. Groundwaters of cluster 1 were defined by elevated concentrations of EC, TDS, Na + , SO 4 2− , Cl − , K + , and NO 3 − (Table 5). Cluster 2 is characterized by mineralization processes from rock-water interaction, such as the dissolution of carbonates (calcite and dolomite). Cluster 2 groundwater is characterized by high concentrations of TH (CaCO 3 ), Ca 2+ , K + , and pH, as well as high temperature (Table 5)  EC and TDS in cluster 1 were found to have higher concentrations (mean = 1552.55 and mean = 1003.49, respectively) than in cluster 2 (mean = 733.11 and mean = 513.92, respectively) as shown in Table 5. High EC and TDS can be attributed to infiltration and recharge with wastewater. There is evidence that long-term irrigation practices with wastewater increase EC and TDS in groundwater [1,10]. The concentrations of Na + , Cl − , SO 4 2− , K + , and NO 3 − in cluster 1 are higher on average than those in cluster 2, and may be associated with wastewaters. Na + and Cl − concentrations in wastewater are very high, and irrigation, infiltration, and recharge increase their presence in groundwater [88,93]. Studies reported by the WHO have demonstrated that agriculture is the primary source of NO 3 − released to the groundwater [43]. A significant increase in the concentrations of SO 4 2− and K + are linked with agricultural fertilizers and sewage effluents [27,36,90]. The water wells classified in cluster 1 are located in urban areas, wastewater irrigated zones, and the nearby Tula River. Meanwhile, water samples from wells and streams classified in cluster 2 are located in the vicinity of urban settlements and temporary agricultural fields (Figure 3).

Principal Component Analysis (PCA)
PCA was performed on a groundwater dataset consisting of 31 observations and 14 physicochemical parameters. Three main PCs were extracted with an eigenvalue greater than 1, accounting for 85.1% of the total variance in the groundwater quality dataset (Table 6 and Figure 5). The first component (PC1) explains 56.31% of the total variance. PC1 has a robust negative loading of −0.87 to −0.988 on TDS, EC, temperature, Cl − , and Na + , and a robust positive loading of 0.96 on WQI. PC1 indicates that the occurrence of salinization processes in the aquifer is associated with anthropogenic activity. Inverse correlation between WQI and TDS, EC, Cl − , and Na + indicates that the water quality is characterized by changes in salinity. An increase in the WQI results in a decrease in the concentrations of Cl − , Na + , EC, TDS, and pH, as shown in Figure 6c,f and Figure 7a-c. High salt (TDS, EC, Cl − , and Na + ) concentrations are caused by irrigation with wastewater practices occurring for over a century [12,15]. Similar studies in other countries have reported the impacts of wastewater irrigation on groundwater quality [1,4,10]. The second component (PC2) describes 18.53% of the variability of the data. PC2 has a robust positive loading of 0.981 and 0.963 for Ca 2+ and TH (CaCO3), respectively. PC2 has negative correlations of −0.831 and −0.76 for pH and SAR, respectively. PC2 describes changes in water quality by the increase in hardness. An inverse correlation between TH (CaCO3), Ca 2+ , and pH reveals that the significant contribution of hardness is linked to the dissolution of salts coming from infiltrated wastewater, in addition to carbonate dissolution and evaporation. Similar studies of the wastewater irrigation show similar findings [1,4]. The spatial distribution showed an increase in hardness in wastewater irrigated lands and the surroundings of the Tula River, as shown in Figures 1 and 7f.
The third component (PC3) explains 10.26% of the total variance. PC3 has strong positive loadings of 0.951, 0.926, and 0.705 for K + , Mg 2+ , and HCO3 − , respectively. PC3 has a negative loading of −0.876 for SO4 2− . PC3 explains the groundwater pollution by sulfate, potassium, and magnesium concentrations sourced from rock-water interactions, as well as wastewater infiltration and recharge. The inverse correlation between SO4 2− and K + , Mg 2+ , and HCO3 − indicates that the principal source of sulfate is associated with agricultural fertilizers [36,90] and industrial and domestic wastewater, as well as their effluents [24,27]. The concentrations of SO4 2− were significantly higher in agricultural zones (Figures 1 and 6e). In other areas, sulfate concentrations decreased while concentrations of K + and Mg 2+ increased, as shown in Figure 6b,d,h. This reveals that the significant contribution of magnesium and potassium is due to the interactions with the underlying basaltic rock [17,94], and by the dissolution and weathering of silicate minerals, such as potassium, feldspar minerals, and clays [9,24,95]. The second component (PC2) describes 18.53% of the variability of the data. PC2 has a robust positive loading of 0.981 and 0.963 for Ca 2+ and TH (CaCO 3 ), respectively. PC2 has negative correlations of −0.831 and −0.76 for pH and SAR, respectively. PC2 describes changes in water quality by the increase in hardness. An inverse correlation between TH (CaCO 3 ), Ca 2+ , and pH reveals that the significant contribution of hardness is linked to the dissolution of salts coming from infiltrated wastewater, in addition to carbonate dissolution and evaporation. Similar studies of the wastewater irrigation show similar findings [1,4]. The spatial distribution showed an increase in hardness in wastewater irrigated lands and the surroundings of the Tula River, as shown in Figures 1 and 7f.
The third component (PC3) explains 10.26% of the total variance. PC3 has strong positive loadings of 0.951, 0.926, and 0.705 for K + , Mg 2+ , and HCO 3 − , respectively. PC3 has a negative loading of −0.876 for SO 4 2− . PC3 explains the groundwater pollution by sulfate, potassium, and magnesium concentrations sourced from rock-water interactions, as well as wastewater infiltration and recharge. The inverse correlation between SO 4 2− and K + , Mg 2+ , and HCO 3 − indicates that the principal source of sulfate is associated with agricultural fertilizers [36,90] and industrial and domestic wastewater, as well as their effluents [24,27]. The concentrations of SO 4 2− were significantly higher in agricultural zones (Figures 1 and 6e). In other areas, sulfate concentrations decreased while concentrations of K + and Mg 2+ increased, as shown in Figure 6b,d,h. This reveals that the significant contribution of magnesium and potassium is due to the interactions with the underlying basaltic rock [17,94], and by the dissolution and weathering of silicate minerals, such as potassium, feldspar minerals, and clays [9,24,95].

Conclusions
The results obtained from statistical analysis and Piper and Stiff diagrams indicate that cation predominance in groundwater is in the following order: Na + > Ca 2+ > Mg 2+ > K + . In addition, the predominance of anions is HCO3 − > SO4 2− > Cl − > CO3 2− . The prevailing hydrochemical types of groundwater disclosed by the Piper and Stiff diagrams was found to be Na + -Cl − (37.9%), Ca 2+ -HCO3 − (27.6%), and Ca 2+ -Na + -Mg 2+ -HCO3 − -Cl − (34.5%). The Na + -Cl − groundwater type can be attributed to wastewater irrigation, local hydrogeological conditions, and shallow groundwater evaporation; this groundwater type is found on the central and southern portions of the study area, characterized by urban and agricultural zones near the Tula River, as well as other natural flow streams and irrigation canals. The Ca 2+ -Na + -Mg 2+ -HCO3 − -Cl − groundwater type suggests the mixing of native groundwater and infiltrated wastewater; this groundwater type is found on the central and southern portions of the study area, characterized by rural and urban settlements, as well as agricultural land with natural flow streams and irrigation canals. The Ca 2+ -HCO3 − type indicates that natural recharge is occurring on the northern, mountainous part of the study area. in the study area.

Conclusions
The results obtained from statistical analysis and Piper and Stiff diagrams indicate that cation predominance in groundwater is in the following order: Na + > Ca 2+ > Mg 2+ > K + . In addition, the predominance of anions is HCO 3 − > SO 4 2− > Cl − > CO 3 2− . The prevailing hydrochemical types of groundwater disclosed by the Piper and Stiff diagrams was found to be Na + -Cl − (37.9%), Ca 2+ -HCO 3 − (27.6%), and Ca 2+ -Na + -Mg 2+ -HCO 3 − -Cl − (34.5%). The Na + -Cl − groundwater type can be attributed to wastewater irrigation, local hydrogeological conditions, and shallow groundwater evaporation; this groundwater type is found on the central and southern portions of the study area, characterized by urban and agricultural zones near the Tula River, as well as other natural flow streams and irrigation canals. The Ca 2+ -Na + -Mg 2+ -HCO 3 − -Cl − groundwater type suggests the mixing of native groundwater and infiltrated wastewater; this groundwater type is found on the central and southern portions of the study area, characterized by rural and urban settlements, as well as agricultural land with natural flow streams and irrigation canals. The Ca 2+ -HCO 3 − type indicates that natural recharge is occurring on the northern, mountainous part of the study area. K-means clustering and PCA were successfully applied to assess the groundwater quality and identify hydrogeochemical processes. Three components defined in the PCA are responsible for 85.1% of the total variance in the hydrochemical dataset, and were found to be dominated by three processes: salinization, mineralization, and groundwater contamination. Salinization processes are controlled by high Cl − and Na + concentrations and high EC and TDS, attributed to wastewater irrigation and mineralization processes. Mineralization process are dominated by Ca 2+ , TH (CaCO 3 ), pH, and SAR, and are related to the dissolution of salts in infiltrated wastewater, as well as carbonate dissolution and evaporation. Groundwater contamination is defined by high levels of sulfate, potassium, and magnesium concentrations sourced from rock-water interaction, as well as wastewater irrigation, infiltration, and recharge.
The K-means algorithm identified two distinct groundwater classes. The first class (cluster 1), characterized by low quality (WQI < 70), warrants significant water treatment prior to human consumption. This class of water was found in urban areas, wastewater irrigated zones, and the surroundings of the Tula River. This type of water is characterized by high concentrations of TDS, Cl − , and Na + , as well as high EC from long-term irrigation practices with wastewater and mineralization. Significant SO 4 2− , K + , and NO 3 − concentrations were linked to agricultural pollution by wastewater irrigation and agricultural fertilizers. The second class (cluster 2) is characterized by higher quality (WQI between 70 and 89), and is suitable for drinking purposes after some water treatment. This type of water was found in groundwater recharge zones, rural settlements, and seasonal agricultural fields. Cluster 2 groundwater is associated with mineralization processes sourced by rock-water interaction, such as the dissolution of carbonates. Waters of cluster 2 were associated with high concentrations of TH (CaCO 3 ), Ca 2+ , HCO 3 − , and K + , as well as high pH and temperature. Na + plus the K + -Cl − type of water was related to cluster 1, while the Ca 2+ -HCO 3 − type of water was associated with cluster 2. The results indicate good agreement between the results from the Piper and Stiff diagrams and K-means clustering. A suite of multivariate statistics coupled with geo-spatial analysis proved useful for the identification of hydrogeochemical processes in the evolution of groundwater. The methodology used in this study will be useful for local water resource managers for developing strategies to mitigate and prevent groundwater contamination.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/11/8/1702/s1, Figure S1: Spatial distribution map of the sodium adsorption ratio (SAR) in the study zone, Table S1: Number of clusters suggested by each index, Table S2: Basic statistics of analyzed hydrochemical parameters in the study area, Table S3: Correlation of groundwater quality with World Health Organization (WHO) and Mexican Official Standards for drinking purposes.