Spatio-Temporal Analysis of Natural and Anthropogenic Arsenic Sources in Groundwater Flow Systems

The presence of arsenic in groundwater constitutes a hazard for the environment and human health, and the determination of its source has become a global challenge, which can be approached by defining the natural background levels (NBL) in conjunction with the indicator kriging method, with the aim of delineating anthropogenically contaminated areas. However, having a unique value of NBL for large areas can generate interpretation errors. This research integrates the determination of the flow systems present in the Calera Aquifer, and the definition of the natural background levels in each flow system by making estimation maps in ArcGIS using two databases, 10 years apart, to evaluate the spatio-temporal variation of arsenic in groundwater. The results indicate a notable increase in the probability of exceeding the arsenic NBL, mainly in the intermediate flow, which may be due to movement resulting from mining activities as well as a mixture of regional and intermediate flows caused by the extraction of water for agriculture and drinking water supplies. The presented values exceed the maximum limits allowed for human consumption, as stated by the World Health Organization.


Introduction
The natural and anthropogenic occurrence of arsenic in groundwater has become a concern in human health studies [1]. It is estimated that millions of people are at risk of the adverse effects of arsenic exposure through drinking water [2], which results in chronic poisoning that leads to changes in skin pigmentation and thickening, as well as various types of cancers of the skin, lungs, bladder, and kidney. Thus, in 1980, the international agency for cancer research classified arsenic in group one, as a carcinogen for humans beings [3].
Arsenic in groundwater has been identified in 105 countries in the world, and the population exposed to a concentration greater than the reference value indicated by the World Health Organization (WHO) is estimated to be to >200 million worldwide [4]. The natural occurrence of high values of arsenic in groundwater has been reported in different geologically young aquifers around the world [5][6][7]. Some research reports that the pollution of groundwater by arsenic may be due to agricultural activities in connection with the application of fertilizers and pesticides [8,9], as well as by mining activities [6,10,11].
Variation in the concentrations of arsenic in groundwater may arise in response to natural or anthropogenic factors. Natural variation in arsenic concentrations might be expected to occur in response to climatic and seasonal changes, such as wet and dry periods, although these factors have rarely been directly linked to variation in arsenic concentrations. Anthropogenic factors that may affect arsenic variability include land development, the addition of solutes to groundwater systems, or human-induced flow-system changes, including well development, groundwater pumping, and aquifer storage and recovery [1]. It is reported that the concentration of Arsenic (As) in groundwater may vary laterally, from unsafe to safe, within a range of 10-100 m, which often constrains the identification of safe regions and exact As mobilization mechanisms, and the spatial and temporal variation in groundwater As concentrations can have severe consequences with respect to the potential As exposure of people drinking presumably safe water [5]. In the treatment of environmental data, multivariate techniques have been used for a long time in the assessment of the degradation and/or spatial and temporal contamination of groundwater caused by natural or anthropogenic factors. Cluster analysis (CA) and principal component analysis (PCA) are important tools for the multivariate techniques, and they are applied to determine the dominant interrelationships among the variables to understand the processes responsible for controlling the chemistry of groundwater [12]. Multivariate statistics have been used extensively to reduce the complexity of large-scale data sets [13], as well as to classify and interpret the hydrogeochemical processes that occur in groundwater [6]. Several works have demonstrated the utility of a combination of principal component analysis and hierarchical cluster analysis in the determination of flow systems [14].
The assessment of the qualitative status of groundwater bodies is related to the definition of natural background levels (NBL) and threshold values (TV). The former are mainly related to the hydrogeochemical environments of the aquifer, while the latter are associated with the public health problem. Having a single NBL value for an element within a study area does not allow for the representation of the local variation in the geochemical and environmental conditions. In fact, it could generate a considerable uncertainty in the definition of the natural presence of a contaminant in an anthropogenic source [15]. The implementation of the NBL concept for groundwater quality management would benefit from the use of an integrated methodology that would allow for a clear identification of the distribution of the concentrations of a substance within a body of groundwater. The indicator kriging (IK) method, a non-parametric interpolation method based on transformed data for producing a set of binary value data, allows us to estimate the probability that the concentration of an element in a given place exceeds a predefined threshold value [16].
The main aim of this study is to propose a methodology that integrates the identification of flow systems and their relationship with natural background levels, allowing the spatio-temporal behavior of arsenic to be understood. It has the following specific objectives: (1) To define the flow systems in the historical series, using the main components; (2) to determine the natural background levels in each defined flow system to identify the anthropogenic or natural origin of arsenic; and (3) to evaluate the variation of arsenic's temporal space, considering the natural background levels and using the indicator kriging method. The novelty of this work is its methodology, which integrates flow systems and the definition of natural background levels. It is considered that, by incorporating the temporal component, it will be possible to explain the evolution of arsenic in groundwater, with greater certainty in the definition of the sources of natural degradation and anthropogenic contamination. This will allow decision makers to assess the current status of the aquifer and determine if its use is feasible. The methodology will be applied to the case of the Calera aquifer, in the state of Zacatecas, Mexico, due to its antecedents of high levels of arsenic [17].

Study Zone
The Calera aquifer is situated in the central zone of the state of Zacatecas, and has an area of 2087.6 km 2 , with an average rainfall of 450 mm/year, which is due to its rainy season occurring between June and September, and a potential evapotranspiration of 1900 mm/year. This aquifer is the main source of fresh water for agricultural and industrial activities, with an average annual extraction of 125 mm 3 [18]. It is located in the volcanic terrain of the Sierra Madre Occidental, in the south of a regional graben structure that gives rise to the Calera endorheic basin, characterized by ephemeral streams that are dry for most of the year. The Zacatecas Formation (maximum elevation 2700 m.a.s.l.) and the Chilitos Formation are linked to a flat area (2010 m.a.s.l.) in the south-central part and another flat area (2100 m.a.s.l.) in the north, and their floors are between 10 cm and 50 cm in depth. In general, they are considered arid, and the average annual temperature is between 18 and 20 • C, with a precipitation that varies from 400-450 mm. There is an important surface run-off, with small intermittent currents, which is towards the center of the study area and continues towards the north, with some water surface reservoirs of a reduced capacity, but of great importance for the area. This zone covers two important cities in the state of Zacatecas, where the main economic and social activities are established [17].
The recharge of the aquifer comes mainly from the pluvial precipitation, which occurs over the mountain ranges and hills of the basin, and from transmission losses from the flash floods in the ephemeral streams. In smaller proportions, the water precipitated in the valley recharges vertically in relation to the aquifer. Finally, another volume comes from the returns of irrigation by pumping. The artificial discharge is conducted by pumping wells; naturally, a small volume is drained by the underground flow to the Santa Ana and Sedano lagoons. The preferential direction of the underground flow is from south to north, with an entry in the east and west portions, within the limits of the mountains and hills [19]. The animal husbandry is a complementary occupation in the metropolitan area of Zacatecas-Guadalupe. On the other hand, mining activities have gained strength and become a major influence on the quality of groundwater over the past 30 years. Due to the industrial activity present in the area, a large amount of groundwater is extracted, and 50% of the total potable water, supplied to Calera, Morelos, and the metropolitan area of Zacatecas-Guadalupe, comes from the central and western part of the Calera aquifer [17].

Geology
In the Calera aquifer, the alluvium extends between tectonic pillars, which consist mainly of metamorphic rocks from the Zacatecas and Chilitos formations ( Figure 1). To the west are the massive sulphide deposits, which evolve towards the Las Pilas Complex, that Francisco I. Madero developed into an arch of islands with intense mining activities. Geophysical surveys and direct drilling show that the aquifer is only semi-confined locally by interspersed clay layers [17].
The lithology encompasses pyroclastic volcanic rocks, subvolcanic rocks, basaltic-andesitic, and metasedimentary terrigenous lavas. The distribution of volcanic rocks, located in the western zone from south to north, is the most extensive in the aquifer, and it includes an irregular block of approximately 62 km 2 , located to the northwest of the town, Francisco I. Madero. The sub-volcanic rocks are restricted to small extrusions located in the vicinity of the City of Zacatecas in the Cerro de la Bufa to the south of Cerro del Padre and to the north in the Cerros La Sierpe, Magistral, Calavera, and Calicanto [20].

Databases
The evolution of groundwater quality was analyzed for a period of 10 years through two databases for different periods of time and in different wells; both databases are presented in the Supplementary File (Table S1). The first was in 2005, with a total of 99 samples [21], and the second was conducted in 2015, with 173 samples [17], of which 35 are used for human consumption. The depth of the monitored wells was measured with a water level probe, which varied from 5 to 300 m; the average water depth was 85 m, and the electrical conductivity, pH, temperature, dissolved oxygen, and alkalinity were sampled in situ. In 2005, it was also analyzed, Redox potential (Eh), by means of a combination electrode, P100C-BNC, with values ranging from −67 to 415 mV. In the case of As, major ions, and tracer elements, the samples were filtered (0.45 µm membrane filters) and acidified (1% v/v HCO 3− ) in the field. The F − samples were not treated. The analytical determinations were carried out in the Environmental Engineering Laboratory of the Autonomous University of Zacatecas and the Laboratory of the Autonomous University of San Luis Potosí, and the As was analyzed by atomic absorption spectrophotometry (Thermo Scientific ICE AA 3300, Waltham, MA, USA), with hydride generation as well as the major ions, Ca 2+ , Na + , K + , and Mg 2+ . Calibration and validation for atomic absorption spectrophotometry were performed using a certificated standard. The ion balance ranged between ±7%, the parameters were determined under the guidelines described in the Standard Methods for the Examination of Water and Wastewater (APHA-SMWW 2006).

Determination of Flow Systems
The flow of groundwater is essentially controlled by geological and hydrogeological factors, which are represented in terms of the horizontal distance of movement and depth [22]. These hydraulic characteristics, and the idea that, in an aquifer, there are zones with a specific water quality called hydrochemical facies [23], form the basis for using a methodology to differentiate them. It is therefore feasible to establish a conceptual model for a multivariate analysis of: (a) The variables that control the regional and intermediate flows; and (b) the differentiation of flow systems through the analysis of the principal components (PCA) and cluster analysis [14].
The determination of the data flow systems was carried out through a multivariate analysis, using IBM SPSS Statistics, Version 20.0 (IBM Corp., Armonk, NY, USA) to determine: (a) The analysis of the principal components (PCA); and (b) the cluster analysis based on the results of the PCA. The PCA is a multivariate statistical procedure designed to classify correlated variables (in this case, dissolve chemical elements in groundwater) and reduce them to a few factors that can be interpreted more easily [24]. With the analysis of the main components, different associations between variables and samples were determined. When obtaining the cluster number, a k-means grouping was used to classify the samples with greater distinction.

NBL Estimation
Nowadays, the assessment of the qualitative status of a groundwater body is intrinsically related to the definition of natural background levels (NBLs) and threshold values (TVs), which are usually groundwater quality standards, defined according to the groundwater use [16]. The natural background levels (NBL) are defined as the level of concentration in water, controlled by natural geological, biological, and atmospheric processes and not influenced by human activities [18]. The selection method consists in selecting only those samples that are not influenced by anthropogenic activities, such as mining and agriculture, since natural groundwater concentrations are characterized by a concentration range, which is defined by certain percentiles of the natural component distribution [25]. The sample sets and the residual data set are eliminated, and the value is chosen, which is generally represented as the 90th, 95th, or 97th percentile. The value is chosen according to the degree of knowledge of the conceptual model and the hydrogeochemical system. All concentrations exceeding this level must be attributed to anthropogenic sources [16]. In this case, the 90th percentile was used because there are less than 60 samples in the data set [26] presented in each flow system.
The evolution of arsenic in groundwater can be evaluated by determining the natural background levels, in which samples that may be contaminated by anthropogenic processes are discarded by a preselection, as proposed by Hinsby et al. [27], using the nitrate modifications proposed by Preziosi et al. [28], with the following criteria: (1) Chloride concentration as a salinity indicator: Samples with [chloride] N > 200 mg/L were discarded; (2) concentration of nitrates and organic pollutants, as an indicator of the human impact in oxidized groundwater (DO > 2 mg/L o Eh < 100 mV): Samples with [NO 3 ] > 50 mg/L, or with total organic contaminants > 0.05 µg/L, were discarded; (3) oxidation capacity (OXC) [29], as an indicator of the human impact in the reduction of groundwater (DO < 2 mg/L, Eh < 100 mV o Fe (II) > 0.2 mg/L, [22]. OXC (meq/L) was calculated as 7 [SO4] + 5 [NO 3 ], with the concentration of the species in [mmol/L], and samples with OXC > 2 meq/L were discarded; and (4) ammonium: Samples discarded with NH4 > 0.5 mg/L under reducing conditions.
The NBL was established as the 90th percentile of the samples that are not contaminated.

Method of Indicator Kriging
The link between arsenic and complex hydrogeochemical processes is not yet understood, and geo-statistics, such as the measurement of inverse distances and natural neighbors through the analysis of variograms, is a useful tool for addressing this problem [15].
The conventional variograms and the geostatistical analysis are limited by the properties of the algorithms, the main problems being (i) normality (although the techniques are quite robust) and (ii) the independence of the standard value of ordinary kriging (OK) in the data values. The tool that allows these limitations to be overcome is the indicator kriging [16], which allows for the designation of several thresholds that allow the data to be transformed into a set of binary variables.
The calculation of the variogram is made according to a set of pairs of points, separated by an increased distance, h, by means of the following expression: where Z(X i ) and Z(X i+h ) are the values of the variable, Z, measured at the points, X i y X i+h , respectively, and n(h). denotes the number of pairs of points, separated by a lag, h.
The most widely used theoretical model for adjusting the functions of the experimental variogram is the spherical model, which is expressed by the following equations: where is the threshold (in most cases, it is equal to the variance of the sample) and a is the range, which is a distance beyond which the variable is not correlated. Another frequently used model is the Gaussian, which, in contrast to the spherical model, only asymptotically reaches the threshold value for which it does not show a defined range. It assumes a parabolic behavior near the origin and is characteristic of the data that show a high degree of spatial continuity. The formula of the Gaussian variogram is: However, one can define an effective range (a = 3a), which corresponds to 95% of the threshold (y (a ) = 0.95C).
The transformation of the indicator, I (X), at the location. X, for the data of value, z(x), estimated by the limits, k, of cut, zk, is defined by the following expression [30]: The results of the indicator kriging represent the possibility that the threshold value is exceeded, and these can be values between 0 and 1 [16].

Flow Systems
Lithology and climate generally determine the type of groundwater. Even if the lithology is similar, the degree of water-rock interaction will be determined by distance and travel time [6], however, diverse geochemical processes and the mixture modify its hydrochemical composition. An exploratory statistical analysis can provide valuable information and assemble samples with common features, which can yield statistical relationships that are expected to predict the hydrogeological processes that occur in the system [13]. To identify the present flows, a statistical analysis was performed for each set of data for the years, 2005 and 2015.
A matrix of correlations was generated to represent the statistical relationship between two or more variables by calculating the Pearson correlation coefficient in the year, 2005 ( Table 1). The variables, redox potential (EC) and total dissolved solids (TDS), are closely related to the dependence on ionic solutes, and with Na + , a major constituent present in the groundwater TDS. In addition, a close relationship was found with HCO 3− due to the high solubility of limestone or dolomite carbonate minerals. The hardness is closely related to Ca 2+ and Mg 2+ cations [31]. The results for the year, 2015, are shown in Table 2. In general, similar results to those of 2005 were found; there is a relation between TDS, Na + , Mg 2+ , and Cl − .

Principal Component Analysis
The PCA results are shown in Table 3 for the years, 2005 and 2015. For the year, 2005, the parameters identified in C1 were CE, TDS, HCO 3 − , and Na + , and those identified in C2 were the hardness, Ca 2+ and Mg 2+ . In C3, the temperature, pH, and Eh were identified, as well as in C4 K + . In 2015, C1 shows a positive relationship between TDS, Ca 2+ , Mg 2+ , and Cl − , and in C2, only the Na + was identified. In C3, Li and HCO 3 − were identified. With the results of the PCA, a cluster analysis was performed to group the samples into groups that allow the flow systems for the years, 2005 and 2015, to be determined, and the results are shown in Tables 4 and 5, respectively.  The average content of arsenic and fluorine is higher in all flows for 2015, representing a greater risk of the contamination of these ions due to the movement in the flows, which is reflected in the increase in temperature in all flow systems for this year.
In the year, 2015, all the values of F-were higher than in 2005, which may be due to an intense agricultural activity, where the use of phosphate is a common practice. Likewise, water with sodium characteristics causes high values of F, which is reflected in the local flow in the year, 2015.
A conceptual model of pressures influencing the groundwater quality of flow systems is shown in Figure 2. It can be seen that the local flow is influenced by the agriculture because there is return irrigation causing high values of As; also, parameters, like NO 3 , K, and SO 4 , can have a negative impact on the quality of groundwater. The intermediate flow can be affected by mining activities, since the mine's own processes can alter the geological environment, causing an appropriate environment for the movement of arsenic. Agriculture can affect the quality of groundwater by using fertilizers and pesticides, elements, such as F-, and phosphate, which can alter the quality of water. Due to the pumping of water for potable water use and agricultural and mining activities, intermediate and regional flows are mixed, causing high concentrations of arsenic, which constitutes a risk for the water supply of the population. The groundwater is influenced by natural aspects, such as salinity (Cl, SO 4 , Na), redox conditions (Fe, Mn), age (F, B), and geology (As, F).  Figure 3 shows the corresponding wells for each flow system. The lines correspond to the direction of each of the flows determined by a multivariate analysis; the results of the determination of the flow systems in this investigation show a direction of flow different from that shown by Navarro et al. [17]. The system flow theory used in the current study is described by Tóth [32], which considered the groundwater flow distances and the geochemistry of water. Arsenic occurs in the environment in several oxidation states. In the water, it is mostly found as inorganic forms, arsenite (+3) and arsenate (+5); under oxid conditions at thermodynamic equilibrium,  . Redox potential(Eh)−pH diagram of aqueous As species in a system containing As [33].

Natural Background Levels
The natural arsenic background levels for 2005 and 2015 were set as the 90th percentile of the non-contaminated samples, and the values are shown in Table 6. The reference value that is taken is the maximum arsenic limit allowed by WHO, which has a value of 10 µg/L. Table 6. Calculation of the Natural background levels (NBLs).

Flows
As µg/L The NBLs show a higher value than the reference values for almost all of the flows for the two years, except for the local flow in the year, 2005, which has a value of 4.92 µg/L. This means that, whatever the origin of the arsenic, exceeding the reference value represents a risk for the population, and in the same way, it is important to note that, in the year, 2015, higher values were obtained in all flow systems.
In 2005, the greatest value was presented in the regional flow, unlike in 2015, in which a higher value was obtained in the intermediate flow.
With the values of the reference levels, an interpolation was made of the distribution of As, fitted by a spherical model, as shown in Table 7.

Estimation Maps of As
The The reference maps distinguish the areas where the TVs (determined by the permissible limits of water for human consumption) are violated due to natural processes of a geological origin from those that result from contamination.

Regional flow
Since the distribution of volcanic rocks is the most extensive in the aquifer, they are located in the western zone, from south to north, and contain an irregular block of approximately 62 km 2 located to the northwest of the town of Francisco I. Madero [20], where arsenic is found naturally in the aquifer Calera.
In the regional flow of 2005, a high probability of exceeding the natural background levels in the northwestern and southwest parts is observed (Figure 5), and in 2015, the probability is increased in the central part. In 2015, the probability of exceeding the reference values increased in the southern part, unlike in 2005.

Intermediate Flow
The groundwater flows carry dissolved arsenic, as well as DOC, oxygen, sulfate, and competitive absorbents, and these factors influence the concentration of arsenic [35]. In this case, in the aquifer, there is an intermediate flow in the southern part, with an upward direction in the alluvial quaternary basin, and a notable increase is observed for the year, 2015. This may be related to the presence of mining activities, which influence this flow, thus facilitating arsenic mobility. The reference values surpass a large part of this flow ( Figure 6).

Mixture
The extraction of the groundwater in wells for agricultural activity and for the supply of drinking water generates a mixture of water from the regional and intermediate flows. In 2005, the probability of exceeding the NBL is concentrated in the northwestern part of the aquifer, and the reference values similarly cover a larger area. For 2015, a greater participation of the intermediate flow is observed, causing the reference values to be exceeded in the southern and northern parts (Figure 7). For the purposes of this investigation, estimation maps of the local flow in both years were omitted due to the small amount of data presented, which did not show relevant results.
Arsenic is a natural constituent of the Earth's crust, and ranks 20th in abundance in relation to the other elements. Arsenic can be presented in terrestrial and aquatic environments via natural geologic processes and anthropogenic activities. Atmospheric arsenic arising from coal burning has been cited as a major cause of lung cancer in parts of China and India. Values between 17 to 291 µg/day have been reported in food in different countries. Seafood accounts for 60-96% of the total dietary intake of arsenic, mostly in the form of arsenobetaine and arsenosugars, relatively non-toxic forms of arsenic. Other food sources are vegetables, mushrooms, grains, milk, chicken, and beef, which account for inorganic arsenic consumption. In non-arsenic endemic regions, the principal sources of inorganic arsenic in the diet are rice and chicken, which results in the accumulation (55-97 ng/g) of methylated arsenic compounds [36]. Drinking water has been reported to be the main route of arsenic exposure around the globe, and in some places of Zacatecas, it was estimated that 80% of the inhabitants could be exposed to arsenic levels higher than those recommended by EPA and the WHO [37].

Conclusions
A spatio-temporal analysis of the behavior of arsenic in the Calera aquifer was carried out through the interpretation of two databases of samples, taken 10 years apart, determining the flow systems for each period, and in each system, the natural levels were defined. In the background with the 90th percentile, prediction maps were finally made with the results, using the indicator kriging method in ArcGis to delineate contaminated areas.
Groundwater is not stable and concentrations of As change spatially and temporally. The identification of the flow systems allowed the contamination of the arsenic produced by mining activities in the intermediate flow for 2015, as well as the mixture of the regional and intermediate flows, caused by the extraction of water from wells that also have high arsenic values, to be identified.
In the Calera aquifer, most wells have values that exceed the limits allowed by the WHO, which makes it unfit for human consumption. This will cause serious human health effects given the fact that the consumption of drinking water is only one of the major exposure routes. Future research should be aimed at determining the impact of arsenic on the population and the factors that can facilitate the movement of arsenic in the aquifer. Since the wells destined for drinking water belong to the intermediate and mixed flows, a monitoring network must be carried out to control the extraction of water and thus supply the population in a safe way.