1. Introduction
Economic development, industrialization and urbanization, along with population growth, lead to an accelerated water consumption, which has generated concern of fresh water as a scarce resource [
1,
2]. Water quality is an important factor that affects human health and ecological systems [
3]. In rural locations, groundwater is the support of agricultural activities and it is essential is an important standard for crop production and food security [
4]. Due to, pollutants present in irrigation water can get accumulated in crops, causing serious clinical and physiological problems to humans when consumed it in large amounts in food [
5,
6]. In general, water quality for various applications is determined by its physical characteristics, chemical composition, biological parameters and uses [
1,
7]. These parameters reflect the inputs from natural sources, including atmosphere, soil and particular geological characteristics of each region, as well, as anthropogenic influence of various activities [
8,
9,
10].
The evaluation of water quality in most countries has become a critical issue in recent years [
2]. Water quality is subject to constant changes due to seasonal and climatic factors [
8]. Likewise, spatial variations emphasize the need of water monitoring that provides a representative and reliable estimate [
11]. Recently, several approaches have been used for water quality determination. Among them, it can be found methods based on modeling, monitoring or statistic techniques [
12]. Modeling tools such as Soil and Water Assessment Tool (SWAT) or Agricultural Nonpoint have been employed to evaluate water quality at watershed scale. The commonly used statistic techniques for the monitoring of water quality include: Ordinary Least Square (OLS), Geographic Weighted Regression (GWR), among others. The monitoring techniques provide knowledge information for decision-making, regarding water quality [
12]. However, in comparison to these approaches, multivariate techniques such as Principal Component Analysis (PCA) and Cluster Analysis (CA) could be used to analyze big water quality databases without losing important information [
13,
14,
15].
Multivariate techniques and exploratory data analyses are appropriate for the data synthesis and its interpretation [
16]. Classification, modeling and interpretation of the monitored data are the most important steps in the evaluation of water quality [
17,
18,
19]. The application of multivariate statistical techniques, such as principal component analysis (PCA) or Cluster Analysis (CA), has significantly increased in recent years, especially for the analysis of environmental data and extracting significant information [
20,
21,
22]. Additionally, these analyses have been reported as effective methods for the characterization and evaluation of water quality parameters [
9]. PCA and CA are the most common multivariate statistical methods used in environmental studies [
23].
The PCA is a mathematical technique used to reduce the dimensions of multivariate data and explain the correlation between a large number of variables observed by extracting a smaller number of new variables (i.e., the principal components or PC) [
24,
25,
26]. The CA helps grouping objects (cases) based on homogeneity and heterogeneity between groups. The clusters characteristics are not known in advance but may be determined in the analysis. Such analysis benefits the interpretation of the data by pointing out associations among the studied variables [
27,
28]. The application of different multivariate statistical techniques aids in the interpretation of complex data matrices to better understand the water quality and ecological status of the studied systems [
11,
29]. It also allows the identification of possible factors or sources that influence water systems and offers a valuable tool for the reliable management of water resources, both in quantity and quality [
21]. Previous research has shown that the use of multivariate analysis allows defining new variables that provide information on the variability of environmental data, as well as on the influence of each variable [
30,
31].
Furthermore, interpolation methods have been employed to map the spatial distribution of soil properties [
32,
33], heavy metals [
34,
35], population characteristics [
36], precipitation [
37,
38], radioactive elements [
39,
40], among others. Data interpolation offers the advantage of projecting maps or continuous surfaces from discrete data [
41]. Therefore, spatial interpolation techniques are essential to create a continuous (or predictable) surface from values of sampled points [
37]. Interpolation is an efficient method to study the spatial allocation of elements, their inconsistency, reduce the error variance and execution costs [
42]. The interpolation methods are useful to identifying contamination sources, assessing pollution trends and risks [
43,
44]. A growing number of studies have shown the need to determine the spatial distribution of pollutants. Spatial data helps to define areas where risks are higher and contribute in making decision to identify the locations where remediation efforts should be concentrated [
45]. However, one of the characteristics of the spatial distribution of pollutants lies in their frequent spatial heterogeneity [
46].
Few studies that combine multivariate techniques and interpolation methods have been completed [
47,
48] and many times, the studies are analyzed univariately. Therefore, the selection of PCA and CA methods was made to understand the multivariate relationship between parameters in this research, these techniques were used to compare the grouping analysis and the interpolations, to understood the spatial distribution of the parameters and even more the spatial distribution of the PCs. The objective of the present study was to analyze eight physicochemical parameters (PhP) in water samples from wells of the Irrigation District 005 (ID005) in Chihuahua, Mexico; perform a data analysis using multivariate techniques to evaluate the PhP contribution in water quality and apply interpolation methods to analyze the spatial variation of the PhP.
3. Results
3.1. Analysis of Physicochemical Parameters (PhP)
Table 2 shows the results obtained from the As and the PhP analysis of water samples, the maximum permissible levels (MPL) established according to Mexican regulations for each parameter, and the percentage of samples exceeding the limits.
In the two seasons, TDS was the parameter with the highest percentages of samples exceeding the MPL of the Mexican regulation (35% in S1, 39% in S2). Turbidity exceeded the MPL in S1 (29%) in more samples than S2 (12%). As concentrations were above the MPL of water for agricultural irrigation in 9% (Summer) and 13% (Fall) of the wells.
3.2. Multivariate Analysis
The correlation analysis reported the existence of significant positive and negative correlations (
p > 0.05 and
p < 0.0001) among the values of PhP from the first sampling (
Table 3). As was positive moderately correlated with Turbidity and pH, and negative moderately correlated with hardness. EC was positive moderately correlated with TDS and Hardness; and negative moderately correlated with ORP. Regarding TDS was moderately correlated with Turbidity (negative) and Hardness (positive). Furthermore, pH and ORP were correlated positive strongly and negative moderately, with Temperature, respectively. The poor correlation between the other pairs of PhP indicates the presence of other variation sources.
In S2 (
Table 4) it was observed that As was negative moderately correlated with Hardness while it was positive strongly correlated with pH. The EC was positive moderately correlated with Hardness and strongly correlated with TDS. Likewise, TDS was moderately negative correlated with Turbidity and ORP; and positive correlated with hardness.
3.3. Principal Components Analysis (PCA)
The assumption that the parameters are linearly related was verified, then the PCA was realized to explore the relationships among the eight PhP. The first four PCs in S1 explained 87% of the variance (
Table 5). In S1, PC1 contributed with 34% of the variance, PC2 with 30%, while PC3 and PC4 contributed with 12% and 9%, respectively. The dominant PhP in PC1 were As, pH and Temperature. Considering
Table 3, there is a significant correlation between As and pH (r = 0.44,
p < 0.05). In PC2, the coefficients that contributed the most were EC, TDS and Hardness. The parameters correlated were: EC and TDS (r = 0.625,
p < 0.0001), EC and Hardness (r = 0.493,
p < 0.0001) and TDS with Hardness (r = 0.586,
p < 0.0001). The PC3 was influenced by Turbidity and PC4 by As and ORP.
Regarding S2, 86% of the variance was explained by considering four PCs (
Table 5). The components contributed with 35%, 24%, 16% and 9% to PC1, PC2, PC3 and PC4, respectively. The PC1 was influenced by EC, TDS and Hardness with weak coefficients. These parameters strongly and moderately correlated as follows: EC and TDS (r = 0.89,
p < 0.0001), EC and Hardness (r = 0.54,
p < 0.0001), Hardness and TDS (r = 0.45,
p < 0.05). The PC2 was influenced by As and pH, with moderate and highly correlated coefficients (r = 0.77,
p < 0.0001). The PC3 explained the Turbidity and Temperature variability, with moderate to strong coefficients and PC4 was influenced by ORP. In
Table 4 it was observed that there is a highly significant correlation of EC with TDS and Hardness, which indicates that these three components explain a large amount of variation in the study area.
The grouping of the sites is shown in the displacement plane of the first two PCs (
Figure 3). The PhP for S1 and S2 were organized into 4 groups. Group 1: As, pH and Temperature; Group 2: EC, TDS and Hardness; Group 3: Turbidity; and Group 4: As and ORP. In regards to S2, Group 1 was composed of: Hardness, EC, TDS; Group 2: As and pH; Group 3: Turbidity and ORP; and Group 4: Temperature. Gebreyohannes et al. [
68] determined in their area of study that TDS, Hardness and EC were positively associated and these were negatively associated with pH and Turbidity.
The comparison plots of PC1 vs. PC2 in each sample indicate the displacement through components 1 and 2. These components, together explain more than 60% of the total variation. As, pH and Temperature in S1 move to the right side, indicating its dominance in PC1. While in S2, only As and pH remain dominant but in PC2. Furthermore, in S1 the variables with the greatest influence in PC2 were EC, TDS and Hardness and in S2 were also EC, TDS and but in PC1. The change in the station strongly influences the way in which the parameters are expressed in the well water, which explains the displacement of the parameters between stations.
3.4. Cluster Analysis (CA)
The definition of the number of groups was made considering the value of R
2 and the criterion of pseudo T
2. The value of R
2 indicated that, with four groups, up to 76% of the variability in S1 was explained, while in S2 with four groups 85% was explained. A line was drawn in both dendrograms to confirm the value of R
2. This line is imaginary and is traced in the dendrogram to support the definition of the groups [
69]. The pseudo T
2 was useful to reaffirm the decision of the four groups, showing a value of 77.3 and 9.2, respectively [
62]. The groups were significantly different based on the MANOVA test (
F = 25.65, λ of Wilk’s = 0.002,
p < 0.0001).
In S1, Group 1 was made up of 9 wells; Group 2 was the largest with 38 wells; Group 3, the smallest with 7 wells and Group 4 included 9 wells (
Figure 4). Each group was characterized with the average of the variables per group, presented in
Table 6. In S1, Group 1 consisted of high values of As (0.098 mg/L), Turbidity (687.6 NTU), pH (8.2) and EC (1117.7 μS/cm), and low values of TDS and Hardness (93.8 and 144.3 mg/L, respectively). Group 2 had moderate values with respect to almost all PhP, only with a low Turbidity value (3.9 NTU). Group 3 also presented moderate values in most of the PhP, with the exception of EC at low concentrations (15.3 μS/cm), and Turbidity at high concentrations (295.6 NTU). Lastly, Group 4 showed high values of TDS (883.9 mg/L), Hardness (497.4 mg/L), EC (1773.4 μS/cm) and Temperature (25.1 °C), while the lowest values corresponded to Turbidity (38.0 NTU). The remaining PhP had moderate values.
In the S2, Group 1 was the largest with 18 wells; Group 2 was the smallest with 6 wells; Group 3 was comprised of 17 wells; and Group 4 of 9 wells (
Figure 4). Group 1 was formed with high Turbidity values (196.1 NTU). Group 2 showed the highest values of Turbidity (164.5 NTU) and the lowest values of As (0.005 mg/L) and TDS (78.9 mg/L) among all groups. Group 3 showed high values of As (0.106 mg/L), EC (1.102.5 μS/cm), TDS (563.4 mg/L), Turbidity (157.6 NTU) and pH (8.02). Finally, Group 4 had the highest values of EC (1779.1 μS/cm), TDS (885.3 mg/L), Hardness (384.7 mg/L) and the lowest Turbidity (5.2 NTU), pH (7.3) and ORP (195.2 mV) values when compared to the other groups (
Table 6).
The spatial distribution of S1 and S2 was observed by linking the database derived from the CA with the vector file of wells, using ArcMap 10.3©. In S1, the distribution of Group 1 (high values of As, Turbidity and pH) was homogeneous in the northern part of the ID005, located in modules 7 and 8. Group 2, with the highest number of wells (moderate values of all PhP), included modules 4, 7, 8, 9 and 3. Group 3 (high Turbidity), was located in module 6, showing a homogeneous spatial grouping. Group 4 (high TDS, Hardness and EC) was defined in modules 4, 8, 7 and 3, being the group with the greatest geographical dispersion.
In S2, Group 1 (high Turbidity) was homogeneously distributed between module 9 and 6. Group 2 (high Turbidity) was placed in module 6, only with one observation in module 8. Group 3, with high magnitudes of As, EC, TDS, Turbidity and pH, was presented in module 7, with some observations in modules 8 and 4. Group 4, with high magnitudes of EC, TDS and Turbidity, was distributed in the boundary between module 3 and 4 in the southwest portion of the ID005 (
Figure 5).
3.5. Spatial Variability of Physicochemical Parameters
The maps of the PhP are shown in
Figure 6 and
Figure 7. The areas with low concentrations are colored in yellow, the blue colored areas represent moderate concentrations while the red colored areas represent high concentrations.
In S1, the PhP As, pH and Temperature showed a similar distribution where the highest and moderate concentrations are found in modules 6 and 7. The pattern of As (high concentration) may be due to a geological mineralization process [
70], which seems to be present in these modules. Likewise, Hardness and TDS show a similar distribution. The highest concentrations are predominantly distributed in modules 3, 4 and 5. The EC shows a distribution pattern similar to Hardness and TDS but with some variations. The highest concentrations prevailed in modules 3, 4, 7 and 8.
In S2, As and pH showed a similar pattern with high concentrations in the northern part of module 7. The values of EC, TDS and Hardness showed a very similar spatial distribution in modules 8, 7, 4 and 3, at high concentrations. The Temperature and Turbidity PhP presented a similar pattern of high concentrations in module 6. Finally, ORP was the only variable that did not show a spatial behavior similar to the rest of the PhP.
The similarities in the spatial distribution among the PhP confirm the results of the multivariate analysis, where As-Turbidity, pH-Temperature, Hardness-TDS-EC and As-pH were grouped in S2, while TDS-EC-Turbidity were grouped in S1. In a study conducted by Li and Feng [
71], similarities in the spatial distribution of the elements were found. In both, the S1 and S2 samplings, the spatial behavior for As, EC, Hardness, ORP, pH and TDS was similar. The PhP that varied were temperature and Turbidity. Temperature showed a greater variability in S1 compared to S2, which may be associated with the variation of the rest of the PhP.
Likewise, the coefficients of each PC of the PhP were used together with the geographical coordinates of the wells to generate the interpolation of the PCs (
Figure 8). These interpolations spatially indicate the multivariate relationships among the PhP.
In S1, the interpolation of the PC1 coefficients (33% of the variability explained) indicated that in the yellow areas (negative coefficients) low concentrations of As existed (0.000005 mg/L). These areas registered values of pH between 7.5–7.8 and Temperatures between 22–23 °C. Conversely, the areas in red are those with high concentrations of As (0.33 mg/L), pH (9.5) and Temperatures of 30 °C. PC2 (30%), influenced by EC, TDS and Hardness, presented negative coefficients (areas in yellow), indicating the presence of low values of EC (13–16 μS/cm), TDS (20–100 mg/L) and Hardness (100–200 mg/L). PC3 (12%), influenced by Turbidity, depicted areas with negative coefficients indicating the presence of low values for this parameter (1.0 NTU). Meanwhile, positive coefficients corresponded to areas with high concentrations (900 NTU). Although Turbidity has the highest coefficient in the matrix of eigenvectors, Temperature also shows a similar behavior in the database (not shown). In this database, the negative coefficients correspond to zones with temperatures of 30 °C and positive coefficients to areas with temperatures of 24 °C. Finally, PC4 (9%) is influenced by As and ORP. The areas with negative coefficients correspond to low concentrations of As (0.000005 mg/L) and ORP (85–113 mV). The areas in red correspond to high concentrations of As (0.27–0.34 mg/L) and ORP (250 mV).
In the S2, PC1 (35%) represents the variability of EC, TDS and Hardness. In this component, the red areas represent EC values (1863 μS/cm), TDS (932.33 mg/L) and Hardness (594 mg/L). The concentrations in yellow are for EC (1078 μS/cm), TDS (573 mg/L) and Hardness (0 mg/L), which are distributed in module 6 and the northern part of 7. PC2 (26%) explains the variability of As and pH, where the high concentrations are distributed in module 7. In this module, the coefficients with positive value indicate As concentrations of 0.575 mg/L and pH of 8.9, while in modules 6 and 5, the concentrations are the lowest (As = 0 mg/L, pH = 7.4). The distribution of PC3 (15%) explains the variation of Turbidity and Temperature. In these zones (red color), the PC explains Turbidity concentrations of 519 NTU and Temperature 23.8 °C. Turbidity values of 0.43 NTU and Temperature of 27.5 °C are reported in the yellow zones. PC4 (9%) represents the variability of ORP. The zones in red tone indicate ORP concentrations of 306 mV and in yellow values of 106 mV.
4. Discussion
The multivariate techniques and the interpolation were useful to interpret the relationship between the PhP. The PCA has been previously used to examine and interpret the behavior of groundwater quality parameters [
72,
73,
74,
75]. The relationship between the PhP provided significant information on the possible sources of these parameters. In this study, four components were needed to explain the original data set. The four components showed that the behavior of the PhP in the wells was governed by more than one process or phenomenon.
According to Yidana et al. [
76], in the analysis of dimensionality reduction of variables PC1 usually represents the most important mixture of processes in the study area. The PCA results suggest that most variations in water quality were found in S1 (summer) for As, pH and Temperature followed by EC, TDS and Hardness; and in S2 (Fall) for EC, TDS and Hardness, followed by As and pH. According to Bonte et al. [
77], the increase in temperatures was associated with the As increase, which was shown in S1 where the main variables that explained the total variability were shown. The above was also demonstrated in the CA and the spatial interpolation of the individual parameters and the main components where high temperature zones show a spatial distribution similar to As.
The variables that were grouped in PC1 and PC2 of each sampling season had similar coefficients, which imply the existence of some similarities in the way they influence the groundwater concentration. It was observed that As, EC, TDS, Hardness and pH were shown in components 1 and 2 in both samples. This was consistent with the results of the interpolation, showing that the distribution of these parameters had a similar dispersion in the ID005. The interpolations of the PC coefficients are similar to the maps of the main components derived from the water sampling from the wells. Previous studies have shown similar results to improve the interpretation of PC [
78,
79]. These variables together accounted for more than 60% of the variability of the original data set. Also, it was observed that there was an exchange of these variables in PC1 and PC2. This behavior may have been caused the result of the different sampling seasons.
These relationships agree with the natural dynamics of water PhP. The pH is the main factor that controls the concentrations of soluble metals [
80]. As well as, the arid climate leads to evaporation which can interfere in the concentration of As [
9] and cause seasonal variations. It was observed that As concentrations higher than the MPL (0.1 μg/L) of water for agricultural irrigation established in the Mexican regulation [
67], were presented in the northern area of the territory in both S1 and S2. The EC showed a significant correlation with parameters such as Hardness, TDS [
10], which can be related to water salinity [
9].
In wells that contain high amounts of As, the pH is also high. This was reaffirmed by the CA method and interpolation, where Group 1 showed the highest As and pH values for S1 and S2. The first two main components (PC1 and PC2) in both seasons (S1 and S2) showed similar variations. This same behavior was observed in wells where high concentrations of EC, TDS and Hardness were obtained, which was also observed by the CA and spatial interpolation.
For this study, the wells near the city showed the highest concentrations of EC, TDS and Hardness. The EC and TDS measurements for the S1 and S2 samples showed that the salinity is classified according to the Food and Agriculture Organization of the United Nations (FAO) as moderate (EC 700–3000 μS/cm, TDS 450–2000 mg/L), especially in the southern area of ID005 [
81]. EC and high TDS limit the absorption of water by crops because of the salt that stays in the roots. Due to the difficult access to water, the growth rate of plants is reduced, which limits agricultural production [
61]. The EC and the TDS in groundwater samples are significantly correlated with cations and anions (Ca
2+, K
+, Na
+, Cl
+, NO
3− and SO
42−), which can be the result of ionic changes in the aquifer [
9].
In the case of Hardness, it was within the MPL established by the Mexican regulation (500 mg/L) [
52]. However, according to Gebreyohannes et al. [
68], water with Hardness greater than 151 mg/L is classified as hard water. Considering these criteria, 75 and 77% of the samples (S1 and S2, respectively) were classified as hard water. This classification may indicate that there are deposits with high Mg
2+ and Ca
2+ contents [
68]. Likewise, it is considered that hard water is not suitable for industrial and agricultural purposes [
10].
The interpretation of the spatial behavior of the water quality in the studied area was possible when the scores derived from the PC were mapped. Previous studies have used geostatistical methods to map the scores resulting from PCA and used the resultant maps to predict the factors that may be impacting groundwater quality [
82,
83,
84]. Based on the results of PCA and CA it was possible to understand the multivariate relationship of the set of parameters. In turn, with the application of the IDW interpolation technique on the scores of the PCA, it was possible to analyze the spatial variability. The combination of both methods was useful to examine patterns in common groups of parameters allowing to summarize the multiple relationship of variables on geographic regions to use in water quality analysis.
The PCA, the CA, the correlation coefficients and the interpolation were consistent with these interpretations. Although the results of the present study provided important conclusions regarding the origin of each PhP, more studies are needed to obtain a better understanding of the sources of the PhP and their concentrations.
5. Conclusions
The As is an element present in the ID005, and it is at levels that can cause a risk to agricultural production, mainly in the northern region. In addition, it is important to continue this investigation to determine the As traceability in the medium and to identify the risk of introducing this metalloid to the food chain by diet intake.
A slight issue was observed with indicators that affect the salinity of water. If such high levels persist, it can be detrimental to the optimal development of crops. Therefore, it will be necessary to look for alternatives to ameliorate this situation. Perhaps, starting with a continuous monitoring of wells in the ID005.
Multivariate statistical methods and spatial interpolation can be useful to identify locations of priority concern and potential sources of PhP, and to evaluate water quality from wells in an agricultural area. The multivariate geographic information system (GIS) approach showed the spatial relationships between the PhP (As, pH, EC, TDS and Hardness), proving to be convenient for the confirmation and refinement of PhP interpretations through the statistical results.