Evaluating Spatiotemporal Variations of Groundwater Quality in Northeast Beijing by Self-Organizing Map

As one of the globally largest cities suffering from severe water shortage, Beijing is highly dependent on groundwater supply. Located northeast of Beijing, the Pinggu district is an important emergency-groundwater-supply source. This area developed rapidly under the strategy of the integrated development of the Beijing–Tianjin–Hebei region in recent years. It is now important to evaluate the spatiotemporal variations in groundwater quality. This study analyzed groundwater-chemical-monitoring data from the periods 2014 and 2017. Hydrogeochemical analysis showed that groundwater is affected by calcite, dolomite, and silicate weathering. Self-organizing map (SOM) was used to cluster sample sites and identify possible sources of groundwater contamination. Sample sites were grouped into four clusters that explained the different pollution sources: sources of industrial and agricultural activities (Cluster I), landfill sources (Cluster II), domestic-sewage-discharge sources (Cluster III), and groundwater in Cluster IV was less affected by anthropogenic activities. Compared to 2014, concentrations of pollution indicators such as Cl−, SO42−, NO3−, and NH4+ increased, and the area of groundwater affected by domestic sewage discharge increased in 2017. Therefore, action should be taken in order to prevent the continuous deterioration of groundwater quality.


Introduction
With climate change and the rapid development of cities, surface water can no longer meet the demands for human survival, and groundwater resources play an increasingly important role [1,2]. Increasing attention is paid to the study of groundwater quality [3][4][5]. The evolution of groundwater is complex and affected by both natural (such as lithology, hydrogeological conditions, and water-rock interactions) and anthropogenic (such as agriculture and industry) factors [6][7][8].
In previous studies, clustering was often used to analyze groundwater data, and identify geogenic and anthropogenic factors. Commonly employed methods are hierarchical clustering analysis (HCA) [9], fuzzy c-means clustering technique [10], and Q-R mode factor cluster analysis [11]. However, due to the complexity and nonlinearity of groundwater data, these methods usually have two shortcomings: (i) results may have large deviations that cannot be scientifically interpreted [12,13]; and (ii) not be useful for the visual assessment of the results and the presentation of maps showing hydrogeochemical facies [14].
Self-organizing maps (SOM), as a recently proposed unsupervised machine-learning method, seem likely to become an excellent clustering method that have gradually shown advantages in complex classification issues [15]. SOM has strong statistical and self-association capabilities. It can achieve the

Study Area
Pinggu district is northeast of Beijing, and borders Beijing, Tianjin, and Hebei; total area is 950.13 km 2 . The study area is the plain of Pinggu (116 • 58 37"-117 • 18 42" N, 40 • 2 19"-40 • 11 48" E) ( Figure 1), with an area of about 402 km 2 , accounting for 42.3% of the total area. The study area has a temperate continental monsoon climate with distinct seasons: summers are rainy, and winters are dry. Annual average temperature is 11.7 • C. January is the coldest month, with an average temperature of 5.4 • C, and July is the hottest, with an average temperature of 26.1 • C; annual average precipitation is 636 mm. In 2014 and 2017, when two sampling campaigns were conducted, precipitation was 524 and 627 mm, respectively.
The lithology of the study area is mainly dolomite and sandstone. Bedrock is buried deep, with quaternary sediments overlaid on the top. The thickness of quaternary sediment varies greatly in different places due to the influence of topography and bedrock. There are 32 rivers in the study area, including three main rivers: Jv, Cuo, and Jinji. Jv is the longest river in Pinggu, with a total length of 54.15 km and a catchment area of 1692 km 2 . The runoff flow direction of groundwater is basically consistent with the flow direction of the main rivers, and the underground runoff of the mountain area moves along the valley of the mountain area to the basin.
The study area is dominated by a loose layer of pore water that is widely distributed along the alluvial fan in the basin. The karst fissure water in Pinggu is mainly distributed in the mountainous areas around the basin and the underlying bedrock. In the plain area, the underlying bedrock at the top of the alluvial-diluvial fan is in direct contact with the quaternary gravelly pebbles, thus providing a good runoff channel for groundwater [19]. The yield of a single well in the strong water-rich area of the study area can reach 3000-5000 m 3 /d, and 1500-3000 m 3 /d in the medium water-rich area, while it is below 1500 m 3 /d in the weak water-rich area. Groundwater-recharge sources in the study area are mainly composed of atmospheric-precipitation infiltration recharge, surface-water infiltration recharge such as river channels, and lateral replenishment of the piedmont zone. Moreover, leakage from Haizi reservoir, flood infiltration, and the return infiltration of canals are also important supply sources. In addition, there are two emergency-water-supply sources in the study area that provide domestic water for Beijing. also important supply sources. In addition, there are two emergency-water-supply sources in the study area that provide domestic water for Beijing.

Sample Collection and Analysis
Two sampling campaigns were conducted in 39 groundwater-sampling sites ( Figure 1) in August 2014 and 2017, and a total of 78 groundwater samples were collected. Before sampling, the in situ measurements of pH, temperature, TDS, and CO2 were performed by portable meters. Sample bottles (clean polyethylene bottles) were rinsed by the same groundwater 2-3 times, and all samples were filtered with 0.45 m filter membranes and collected in the sample bottles. Nitric acid was also added to the sample used to test cations for pH < 2. After that, all samples were stored in ice boxes at 4 °C until laboratory analysis. An inductively coupled plasma spectrometer (ICP-9100) was used to analyze ion concentrations (K + , Na + , Ca 2+ , Mg 2+ , Mn). Ion chromatography (ICS-2500) was used to determine the concentrations of Cl − , SO4 2− , NO3 − , F − , and NH4 + . An iron-ion-concentration detector (JC503-ET7400, BW Electronic Technology Ltd., China) was used to determine the concentrations of Fe 2+ . Concentrations of HCO3 − were determined by acid-base titration.
Laboratory quality-assurance and quality-control methods were used to ensure the quality of the analytical data, including standard calibration, the use of standard operating procedures, and reagent blank analysis. All analyses were performed in triplicate. Reagent blanks were monitored throughout analysis and used to correct analysis results.

Sample Collection and Analysis
Two sampling campaigns were conducted in 39 groundwater-sampling sites ( Figure 1) in August 2014 and 2017, and a total of 78 groundwater samples were collected. Before sampling, the in situ measurements of pH, temperature, TDS, and CO 2 were performed by portable meters. Sample bottles (clean polyethylene bottles) were rinsed by the same groundwater 2-3 times, and all samples were filtered with 0.45 m filter membranes and collected in the sample bottles. Nitric acid was also added to the sample used to test cations for pH < 2. After that, all samples were stored in ice boxes at 4 • C until laboratory analysis. An inductively coupled plasma spectrometer (ICP-9100) was used to analyze ion concentrations (K + , Na + , Ca 2+ , Mg 2+ , Mn). Ion chromatography (ICS-2500) was used to determine the concentrations of Cl − , SO 4 2− , NO 3 − , F − , and NH 4 + . An iron-ion-concentration detector (JC503-ET7400, BW Electronic Technology Ltd., China) was used to determine the concentrations of Fe 2+ . Concentrations of HCO 3 − were determined by acid-base titration.
Laboratory quality-assurance and quality-control methods were used to ensure the quality of the analytical data, including standard calibration, the use of standard operating procedures, and reagent blank analysis. All analyses were performed in triplicate. Reagent blanks were monitored throughout analysis and used to correct analysis results.

Self-Organizing Maps (SOM)
SOM is an unsupervised artificial neural network model. Kohonen (1982) [20] proposed the SOM algorithm and introduced its specific implementation process. This method simulates the biological Water 2020, 12, 1382 4 of 15 brain learning process that can make a cluster area form near each best-matching neuron, so that the input vectors of similar features are classified into a cluster. At the same time, SOM provides a visualization method of high-dimensional data projection in low-dimensional space, such as two-dimensional, while retaining the topological structure and metric relationships of the original data [21], which is suitable for high-dimensional data reduction, classification, and visualization. Heuristic rules can be used to select the appropriate number of neurons, such as 5 √ n suggested by Vesanto et al. (2000) [22]. where n is the number of samples. The Davies-Bouldin index can be used to determine the optimal number of clusters [23]. The SOM algorithm can be run with MATLAB software, and it usually consists of the following steps: (i) setting variables and parameters, including input vector X(n), weight vector W i (n), and number of iterations N; (ii) initializing weight vector W i (n), setting initial learning rate η 0 , and normalizing the weight vector, initial value W i (0), and input vector X(n); (iii) selecting training sample X from the input layer; (iv) calculating the Euclidean distance between W i and X ; (v) finding the best-matching unit (BMU) of the input sample according to Euclidean minimal principle and selecting winning neuron c through competitive learning; (vi) updating the BMU and its neighbor neurons towards the input sample, and weight vector W i at each time step t + 1 is calculated as follows: (vii) iterating learning rate η and topology fields, and renormalizing the weights after learning; (viii) checking whether number of iterations n exceeds N, and if it does returning to Step (iii), otherwise, the iteration process ends. Table 1 showed the statistical results of the hydrochemical parameters in the groundwater of the study area.

Correlation and Saturation Index Analysis
According to the Gibbs diagram (Figure 3), water chemistry in the study area was controlled by rock weathering. The Spearman correlation coefficient ( Table 2) showed that the positive correlation coefficients between Ca 2+ , Mg 2+ , and HCO3 − were higher, and TDS showed a high positive correlation with Na + (0.71), Ca 2+ (0.87), Mg 2+ (0.83), and HCO3 − (0.82), and the correlation between Na + and HCO3 − was also high (0.73), which indicated that Na + , Ca 2+ , Mg 2+ , and HCO3 − may have common sources. There was no obvious correlation between Na + and Cl − , and Ca 2+ and SO4 2− , indicating that evaporite dissolution may not be the dominant process. There was a clear positive correlation between Cl − , SO4 2-, and NO3 − , and combined with their high C.V. in Table 1, it can be inferred that Cl − , SO4 2− , and NO3 − may have originated from the same anthropogenic activities.
In order to further study the effect of water-rock interactions on groundwater chemical evolution, PHREEQC software was used to calculate the saturation index (SI) of calcite (CaCO3),

Correlation and Saturation Index Analysis
According to the Gibbs diagram (Figure 3), water chemistry in the study area was controlled by rock weathering. The Spearman correlation coefficient ( In order to further study the effect of water-rock interactions on groundwater chemical evolution, PHREEQC software was used to calculate the saturation index (SI) of calcite (CaCO 3 ), dolomite (CaMg(CO 3 ) 2 ), gypsum (CaSO 4 ·H 2 O), halite (NaCl), fluorite (CaF 2 ), and CO 2 (g) (Figure 4). Both calcite and dolomite were oversaturated, but gypsum, halite, and fluorite were undersaturated. The SI of CO 2 (g) were −3.18 to 1.8, showing an upward trend with the increase in TDS. Compared to 2014, the SI of CO 2 (g) was higher in 2017, but the SI of calcite and dolomite was lower. It seems that some external factor caused this opposite result, which we further discuss below.
Water 2020, 12, x FOR PEER REVIEW 6 of 15 dolomite (CaMg(CO3)2), gypsum (CaSO4·H2O), halite (NaCl), fluorite (CaF2), and CO2 (g) (Figure 4). Both calcite and dolomite were oversaturated, but gypsum, halite, and fluorite were undersaturated. The SI of CO2 (g) were −3.18 to 1.8, showing an upward trend with the increase in TDS. Compared to 2014, the SI of CO2 (g) was higher in 2017, but the SI of calcite and dolomite was lower. It seems that some external factor caused this opposite result, which we further discuss below.    (Figure 4). Both calcite and dolomite were oversaturated, but gypsum, halite, and fluorite were undersaturated. The SI of CO2 (g) were −3.18 to 1.8, showing an upward trend with the increase in TDS. Compared to 2014, the SI of CO2 (g) was higher in 2017, but the SI of calcite and dolomite was lower. It seems that some external factor caused this opposite result, which we further discuss below.

SOM Results
The SOM was used to analyze groundwater data in 2014 and 2017 (n = 39). Thirty-two (8 × 4) output neurons were selected (Figures 5a and 6a), and sampling points were grouped into four clusters (Figures 5b and 6b). Neurons use color codes to display the weight-vector value of each neuron. Blue and yellow correspond to low and high concentrations of parameters, respectively. Therefore, the interdependence of variables can be identified through the color comparison of neurons. Figure 5a showed that HCO 3 − , Na + , and K + had similar color gradients, indicating that they may be controlled by the same hydrochemical processes. Cl − , SO 4 2− , and NO 3 − showed a negative correlation with HCO 3 − , Na + , and K + by the inverse color gradient, which suggested that Cl − , SO 4 2− , and NO 3 − were controlled by different sources from HCO 3 − , Na + , and K + . Fe 2+ , Mn, and NH 4 + had obvious similarities in color gradient, indicating that they relied on common water chemical processes, such as redox reactions. Compared with the 2014 results, the 2017 results (Figure 6a) showed greater similarities in the color gradients of Cl − , SO 4 2− , and NO 3 − ; in combination with the high C.V. of these parameters, shown in Table 1, it could be inferred that anthropogenic factors controlling their concentrations were strengthened in 2017, such as sewage discharge and pesticide use.
Water 2020, 12, x FOR PEER REVIEW 8 of 15

SOM Results
The SOM was used to analyze groundwater data in 2014 and 2017 (n = 39). Thirty-two (8 × 4) output neurons were selected (Figure 5a and Figure 6a), and sampling points were grouped into four clusters (Figure 5b and Figure 6b). Neurons use color codes to display the weight-vector value of each neuron. Blue and yellow correspond to low and high concentrations of parameters, respectively. Therefore, the interdependence of variables can be identified through the color comparison of neurons. Figure 5a showed that HCO3 − , Na + , and K + had similar color gradients, indicating that they may be controlled by the same hydrochemical processes. Cl − , SO4 2− , and NO3 − showed a negative correlation with HCO3 − , Na + , and K + by the inverse color gradient, which suggested that Cl − , SO4 2− , and NO3 − were controlled by different sources from HCO3 − , Na + , and K + . Fe 2+ , Mn, and NH4 + had obvious similarities in color gradient, indicating that they relied on common water chemical processes, such as redox reactions. Compared with the 2014 results, the 2017 results (Figure 6a) showed greater similarities in the color gradients of Cl − , SO4 2− , and NO3 − ; in combination with the high C.V. of these parameters, shown in Table 1, it could be inferred that anthropogenic factors controlling their concentrations were strengthened in 2017, such as sewage discharge and pesticide use.    To study differences between clusters, a Mann-Whitney nonparametric test was performed (Table 4). In 2014 (Table 4a), anthropogenic parameters such as Cl − , NO 3 − , and SO 4 2− had significant differences with several clusters (Clusters I and II, Clusters I and IV, Clusters II and III, and Clusters III and IV). In 2017 (Table 4b), these three parameters had significant differences in all clusters, which indicated that anthropogenic activities affecting groundwater composition increased in these three years.  To study differences between clusters, a Mann-Whitney nonparametric test was performed (Table 4). In 2014 (Table 4a), anthropogenic parameters such as Cl − , NO3 − , and SO4 2− had significant differences with several clusters (Clusters I and II, Clusters I and IV, Clusters II and III, and Clusters III and IV). In 2017 (Table 4b), these three parameters had significant differences in all clusters, which indicated that anthropogenic activities affecting groundwater composition increased in these three years. Concentrations of NO3 − , SO4 2− , NH4 + , Fe 2+ and Mn were significantly different between Cluster II and the other clusters, indicating that parameters that are sensitive to the redox state of groundwater play an important role in the classification of Cluster II and the other clusters.

Formation Mechanisms of Groundwater Chemistry
The groundwater type of the study area is HCO 3 -Ca·Mg (Figure 2), and the main processes that control it are water-rock interactions ( Figure 3). According to correlation analysis and the SI of minerals, it could be inferred that the groundwater in this area may have undergone carbonate and silicate weathering (Equations (3)-(5)), and the results of Figure 7 also supported this inference. The negative correlation between SI (CO 2 ) and TDS might be due to the groundwater being dominated by dissolved carbonate having low salinity and CO 2 as the reactant of carbonate dissolution, the SI of which would show a downward trend with the decrease in TDS. Compared to 2014, the SI of CO 2 (g) was higher in 2017. This is probably because the higher precipitation in 2017 (627 mm in 2017 and 524 mm in 2014) brought more CO 2 and diluted the groundwater at the same time, which lowered the SI of calcite and dolomite in 2017. In this study, the concentration of F − did not exceed the 1 mg/L stipulated for Class III water in the Groundwater Quality Standard (GB/T 14848-2017), and no obvious aggregation occurred in space distribution. Therefore, in combination with the small C.V. in Table 1, it was speculated that the concentration of F − was not mainly affected by anthropogenic activities. Choi et al. (2013) [24] found high F − concentrations in alkaline HCO 3 -Na type groundwater. Although the groundwater chemical type in this study area was HCO 3 -Ca·Mg, silicate weathering still existed that could release Na + in the groundwater. Thus, the coexistence of HCO 3 − and Na + in groundwater was still beneficial for the enrichment of F − . The pH was alkaline (7.4-8.5), which is also conducive to enriching F − . Choi et al. (2013) [24] also found that Ca 2+ plays an important role in controlling F − concentration in groundwater. In this study, groundwater was supersaturated for both calcite and dolomite, but unsaturated for fluorite. This indicated that a high concentration of Ca 2+ is enough to maintain supersaturation with calcite and dolomite. However, groundwater cannot reach saturation with fluorite because fluorite saturation is dependent on the availability of F − in groundwater.

Spatiotemporal Variations
Groundwater is not only controlled by the natural processes analyzed above, but also affected by anthropogenic activities, which could be derived from the SOM results.
According to the spatial distribution of the sample sites (Figure 1), the Cluster I sample sites were located northwest of the study area. Concentrations of Cl − , SO 4 2− , NO 3 − , Ca 2+ , and Mg 2+ were the highest of the four clusters, and concentrations in 2017 were even higher. Cl − , SO 4 2− , and NO 3 − in groundwater typically come from domestic wastewater, agricultural chemicals, industrial chemicals, and surface water [25]. According to the field investigation, Yukou Industrial Park and large areas of farmland are located here; therefore, there are two sources of pollution around Cluster I, industrial and agricultural sources. Yukou Industrial Park focuses on the development of three major industrial chains, electromechanical electronics, clothing, and food brewing. Sewage from clothing-manufacturing and food-brewing processes could increase Cl − , SO 4 2− , and NO 3 − concentrations in groundwater.
Discharged wastewater from the electronic-device factory would increase the concentrations of Ca 2+ , Mg 2+ , and other metal ions. Pinggu district is an important agricultural product base of Beijing, and NH 4 Cl and (NH 4 ) 2 SO 4 contained in agricultural fertilizers could enter groundwater through precipitation leaching. A large area of orchards is also located here, and orchards must be sprayed with pesticides (such as Bordeaux mixture). As for Cluster III, sample sites were concentrated in the town of Pinggu, which is densely populated and the most prosperous place of the study area. Hence, the discharge of domestic sewage is the main source of groundwater pollution. Concentrations of Cl − , SO 4 2− , and NO 3 − were the second highest after Cluster I. The sample sites increased substantially in 2017, and those increased sample sites were mainly distributed in two areas, around the towns of (i) Pinggu and (ii) Mafang, in the southwest of the study area (P04, P05, P07). There are dense villages around Pinggu, drainage facilities in these villages are rudimentary, and villagers often spill domestic sewage directly on the bare ground. It is easy for sewage to contaminate groundwater on account of less asphalt pavement in the villages. Sample sites near Mafang were classified as Cluster III in 2017 for two reasons. First, the study area is surrounded by mountains on three sides. Mafang is the only "exit" of the plain area, so traffic there is heavy, and its population is growing rapidly. However, drainage and sewage-treatment facilities have not improved, which tends to make the groundwater quality of Mafang deteriorate; second, the elevation of Mafang is the lowest, making this a sink of groundwater runoff, and the groundwater containing Cl − , SO 4 2− , and NO 3 − converges here, which affects Mafang's groundwater quality to some extent. The sample sites of Cluster IV were distributed at the edge of the study area and sparsely populated areas. The concentrations of pollution indicators in this cluster were relatively low, and few changes occurred in 2014 and 2017. It can be inferred that Cluster IV represents groundwater that is less affected by anthropogenic activities. Nevertheless, the area covered by this cluster in 2017 was diminished compared to 2014, which indicated that groundwater quality in the study area deteriorated in 2017.

Conclusions
The type of groundwater in the Pinggu plain area is HCO 3 -Ca·Mg, with a TDS of 280-1090 mg/L. Groundwater quality is affected by both natural and anthropogenic factors. Natural factors are mainly dominated by carbonate and silicate weathering, and anthropogenic factors mainly come from industrial activities, agricultural activities, landfills, and domestic-sewage discharge.
Large-scale farming and intense industrial activities in the northwest of the study area have increased the concentrations of Cl − , SO 4 2− , NO 3 − , Ca 2+ and Mg 2+ in groundwater by using pesticides and discharging industrial sewage. In the southwest of the study area, NH 4 + , Fe 2+ , and Mn concentrations are mainly caused by refuse landfill activities. Areas affected by domestic-sewage discharge were around densely populated towns (Pinggu and Mafang), where Cl − , SO 4 2− , and NO 3 − concentrations were high. Compared to 2014, areas with good groundwater quality that was not affected by human activities decreased in 2017. Thus, measures must be taken to prevent the further deterioration of groundwater quality. Existing problems and related suggestions are as follows: (i) the discharge of industrial and domestic sewage should be strictly monitored in accordance with the relevant quality standards, pesticide-usage modes and dosages should be standardized, and effective treatment measures should be taken in heavily polluted agricultural areas and landfills; (ii) the area of groundwater affected by domestic-sewage discharge has increased, so it is necessary to improve drainage and sewage-treatment facilities in villages, and strengthen the education of villagers; (iii) the groundwater quality of Mafang in the southwest deteriorated due to rapid urban development, so the speed of upgrading drainage and sewage-treatment facilities should keep up with the speed of urban development, and relevant laws and regulations should be improved for areas with rapid development, which can prevent groundwater pollution from hindering urban development. The above suggestions provide ideas for the management of groundwater quality in other rapidly developing urban areas.