Assessment of Groundwater Quality Using APCS-MLR Model: A Case Study in the Pilot Promoter Region of Yangtze River Delta Integration Demonstration Zone, China

Abstract: Groundwater contaminant source identification is a challenging task in highly developed areas that have been impacted by diverse natural processes and anthropogenic activities. In this study, groundwater samples from 84 wells in the pilot promoter region of the Yangtze River Delta integration demonstration zone in eastern China were collected and analyzed for 17 groundwater quality parameters. Principal component analysis (PCA) was used to recognize the natural and anthropogenic factors affecting groundwater quality; furthermore, the absolute principal component score-multiple linear regression (APCS-MLR) model was employed to quantify the contribution of potential sources to each groundwater quality parameter. The results demonstrated that natural hydro-chemical evolution, agricultural activities, domestic sewage, textile industrial effluent and other industrial activities were responsible for the status of groundwater quality in the study area. The contributions of these five sources obtained by the APCS-MLR model were ranked as natural hydro-chemical evolution (18.89%) > textile industrial effluent (18.18%) > non-point source pollution from agricultural activities (17.08%) > other industrial activities (15.09%) > domestic sewage (4.19%). This contaminant source apportionment could provide a reliable basis for local authorities in groundwater pollution management.


Introduction
Groundwater quality degradation by natural processes or anthropogenic activities is widely recognized and has drawn the attention of researchers for decades [1][2][3]. Natural processes, mainly water-rock interactions, may cause specific constituents such as arsenic, magnesium and iodine to accumulate in groundwater [4][5][6]. Meanwhile, anthropogenic activities can deteriorate groundwater quality and lead to a series of geo-environmental problems. For instance, agricultural practices including fertilization and livestock breeding are implicated in excess nitrogen, phosphorus and potassium [7][8][9]. Industrial effluent and leakage can increase the concentrations of sulfate and some heavy metal ions [10][11][12], and domestic sewage can lead to high levels of ammonia in groundwater [13]. Where natural processes and anthropogenic activities interact, groundwater contamination management is a vital task. Attributing contaminants to their original sources is the primary step toward a sound assessment of a contaminated aquifer [14,15]. It not only benefits the status survey, but also provides a basis for future groundwater protection and pollution management.
Various methods have been developed for identifying groundwater contaminant sources, such as in situ surveys, stable isotope methods, model-based numerical inversion and multivariate statistical approaches [16][17][18][19][20]. In situ surveys usually attempt to find potential sources through labor-intensive work, but often fail to give a quantified source apportionment. Stable isotope methods can provide precise apportionment of sources, but only for isotope-related contaminants. Model-based numerical inversion methods are believed to be more suitable for local-scale research [21][22][23][24]. Meanwhile, multivariate statistical approaches have gained popularity among researchers due to their convenience and efficiency for regional-scale problems [25][26][27]. This family of methods includes cluster analysis (CA), principal component analysis (PCA), positive matrix factorization (PMF), absolute principal component score-multiple linear regression (APCS-MLR) and so on. Compared with other approaches, the APCS-MLR method is an especially effective and practical method for identifying pollution sources. It was first developed by Thurston and Spengler [28] and then applied to pollution source apportionment problems in air, surface water and sediments [29][30][31][32][33]. More recently, the APCS-MLR method has started to be employed in groundwater quality research and has proven to be a powerful tool for identifying the impact of natural processes and anthropogenic activities on groundwater quality. For instance, Zhang et al. [34] employed the PCA and APCS-MLR methods to identify groundwater pollution sources and their apportionment in the Hutuo River alluvial-pluvial fan region of northern China. Meng et al. [35] used the APCS-MLR receptor model to assess the potential pollution sources of groundwater from 2006 to 2016 in the Limin Groundwater Source Area in Harbin. Yu et al. [36] applied the APCS-MLR method to the apportionment of nitrate pollution sources and compared its result with a Bayesian isotope mixing model. Sheng et al. [37] utilized the APCS-MLR receptor model to estimate the source apportionment of heavy metal pollution in an arid oasis region in Northwest China.
Therefore, in this research, we employ the APCS-MLR model to assess the potential sources of groundwater contamination in the pilot promoter region of the Yangtze River Delta integration demonstration zone. The study area is located in the Taihu watershed in eastern China, where groundwater has been influenced by intense anthropogenic activities. Previous work has demonstrated the complexity of groundwater pollution in this area. Various contaminants have been found, including heavy-metal ions, nitrogen and phosphorus compounds [38][39][40][41]. Despite numerous individual investigations of specific contaminants, the overall apportionment of groundwater contaminant sources remains unresolved.
The objectives of this work are to (1) recognize the natural and anthropogenic factors that affect groundwater quality; and (2) quantify the contribution of potential sources to each groundwater quality parameter via the APCS-MLR model. Insights from this paper could provide reliable advice for further pollution remediation plans in the pilot promoter region of the Yangtze River Delta integration demonstration zone.

Study Area
The pilot promoter region of the Yangtze River Delta integration demonstration zone (30°54′14″–31°09′25″ N and 120°39′47″–121°07′30″ E) is located southeast of Taihu Lake and has a total area of 653.9 km² (Figure 1). It has a subtropical monsoon climate, with an annual temperature of 16.2 °C and annual rainfall of 1127.1 mm [42]. The groundwater in this region is mainly distributed in the sand and silt layers of the Quaternary sediments [43,44]. The thickness of the Quaternary sediments exceeds 150 m according to borehole data, and the phreatic aquifer, with a thickness ranging from 3 to 7 m, is the layer most closely related to human activities (Figure 2). The aquifer is recharged by precipitation, river infiltration and irrigation; its discharge mainly occurs through runoff to rivers and evaporation, because restrictions on groundwater exploitation have been imposed by the government since 1997 [45]. Lying at the junction of Jiangsu and Zhejiang Provinces and the Shanghai metropolis, this area is one of the most developed regions in China. Based on a remote sensing survey with field validation, the major part of the land in this area is cultivated land and residential area. Industries, especially textile factories, are mainly scattered in the south of the study area. Moreover, a dense surface water network, including the Taipu River and Dianshan Lake, is located in this area, which makes the surface water-groundwater interaction extremely complex. Hence, intensive agricultural activities and industrial sewage pose substantial threats to groundwater security. The application of nitrogen fertilizers and the use of industrial chemicals constitute potential point and non-point sources of groundwater contamination.

Data Preparation
Groundwater samples were collected from 84 wells in June 2020 and June 2021, and the sampling sites were chosen with consideration of land-use types (cultivated land, residential land, etc.) and spatial distribution (approximately 1 km² per point) (Figure 1). In each well, the water temperature (WT), pH, dissolved oxygen (DO) and total dissolved solids (TDS) were measured in situ using an SX-620 pH Testor, an SX-630 ORP Testor and a Hanna DiST. Figure 3 shows the landscape surroundings of several wells and the in situ test procedures. Groundwater samples were stored in polyethylene containers with a capacity of 1.5 L and then brought back to the laboratory for analysis of total Mn, I and Sb. Various instruments were used for the analyses (Table 1).

Multivariate Statistical Analysis
For an in-depth analysis of the groundwater chemistry data, APCS-MLR was employed. APCS-MLR is a receptor model based on the results of principal component analysis (PCA), together with a multiple linear regression on the measured contaminant concentrations [46][47][48]. First, the Kaiser-Meyer-Olkin (KMO) criterion and Bartlett's test of sphericity were applied to assess the adequacy of the dataset for PCA [49,50]. For the PCA to be considered reliable, the KMO value needed to be larger than 0.5 and the significance level of Bartlett's test of sphericity smaller than 0.05. Through the PCA procedure, the principal components of the related groundwater quality parameters were obtained as follows:
$$(A_z)_{ij} = a_{i1} C_{1j} + a_{i2} C_{2j} + \cdots + a_{im} C_{mj} \quad (1)$$
for $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, n$, where $A_z$ represents the component score; $a$ stands for the component loading; $C$ is the measured concentration of each groundwater quality parameter; $p$ is the number of components; $n$ is the number of samples and $m$ is the number of groundwater quality parameters. The principal components with eigenvalues greater than 1.0 are believed to provide qualitative information about the potential contamination sources [51,52]. For a clearer interpretation, the original loadings normally need to be rotated so that the loadings of the PCs are redistributed and polarized. This procedure is commonly referred to as varimax rotation, and the resulting new variables are called varifactors (VFs). For each VF, the component loadings reflect the relative importance of the groundwater quality parameters, with absolute loading values >0.75, 0.75-0.5 and 0.5-0.3 defined as strong, medium and weak, respectively [53,54].
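The component extraction and rotation described above can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the authors' workflow: the function names `pca_loadings` and `varimax`, the two-factor test data and the convergence settings are all assumptions made for the example.

```python
import numpy as np

def pca_loadings(data):
    """Eigenvalues and loadings of the correlation matrix, largest first."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    eigval, eigvec = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    loadings = eigvec * np.sqrt(eigval)          # column i holds the loadings of PC i
    return eigval, loadings

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation of a (parameters x components) matrix."""
    L = np.asarray(loadings, dtype=float)
    m, p = L.shape
    R = np.eye(p)                                # rotation matrix, starts as identity
    criterion = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        grad = L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / m)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < criterion * (1 + tol):      # no further improvement
            break
        criterion = s.sum()
    return L @ R, R

# Synthetic example (hypothetical): six parameters driven by two latent sources.
rng = np.random.default_rng(0)
f = rng.normal(size=(200, 2))
data = np.column_stack([f[:, [0]] + 0.3 * rng.normal(size=(200, 3)),
                        f[:, [1]] + 0.3 * rng.normal(size=(200, 3))])

eigval, loadings = pca_loadings(data)
n_keep = int((eigval > 1.0).sum())               # eigenvalue > 1.0 criterion
rotated, R = varimax(loadings[:, :n_keep])
print(n_keep)
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, the communality of each parameter (the row sum of squared loadings) is unchanged; only the distribution of loading across varifactors is polarized.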
Then, the component scores from the PCA are normalized to perform APCS-MLR for groundwater contaminant source apportionment. A detailed description of this method can be found in Thurston and Spengler [28] and Rahman et al. [30]. In brief, the APCS-MLR model assumes that the contaminant sources contribute linearly to the pollutant concentration at each sampling site. Hence, the concentration of each contaminant at each sampling site ($C_{kj}$) can be calculated by a multiple linear regression on the contributions of the contaminant sources through Equation (2):
$$C_{kj} = r_{k0} + \sum_{i=1}^{p} r_{ki} \times APCS_{ij} \quad (2)$$
where $r_{k0}$ represents the constant term of the multiple linear regression for pollutant $k$; $r_{ki}$ stands for the regression coefficient of contaminant source $i$ for pollutant $k$; and $APCS_{ij}$ is the absolute principal component score, which can be obtained through Equations (3)-(5):
$$(Z_0)_k = \frac{0 - \bar{C}_k}{\sigma_k} \quad (3)$$
$$(A_0)_i = \sum_{k=1}^{m} S_{ki} \times (Z_0)_k \quad (4)$$
$$APCS_{ij} = (A_z)_{ij} - (A_0)_i \quad (5)$$
where $(Z_0)_k$ stands for the normalized concentration of contaminant $k$ at a non-pollution site; $\bar{C}_k$ represents the mean concentration and $\sigma_k$ the standard deviation of contaminant $k$; $(A_0)_i$ is the principal component score at the non-pollution site; $S_{ki}$ represents the score coefficient of component $i$ for pollutant $k$; and $(A_z)_{ij}$ stands for the principal component score of sample $j$ in principal component $i$.
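The APCS construction and the regression step can be sketched as follows. This is a simplified illustration under stated assumptions: it uses unrotated principal components rather than varifactors, synthetic concentrations generated from two hypothetical sources, and the sample standard deviation for normalization. The function name `apcs_mlr` is invented for the example, and arrays are stored as (samples x components), i.e. transposed relative to the index order in the text.

```python
import numpy as np

def apcs_mlr(conc, n_components):
    """Return APCS, intercepts r_k0, coefficients r_ki and R-square per parameter."""
    C = np.asarray(conc, dtype=float)                 # shape (n samples, m parameters)
    mean, std = C.mean(axis=0), C.std(axis=0, ddof=1)
    Z = (C - mean) / std                              # normalized concentrations
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigval)[::-1][:n_components]
    S = eigvec[:, order] / np.sqrt(eigval[order])     # score coefficients, shape (m, p)
    A_z = Z @ S                                       # component scores, shape (n, p)
    Z0 = (0.0 - mean) / std                           # zero-concentration site
    A0 = Z0 @ S                                       # its component score
    apcs = A_z - A0                                   # absolute scores
    X = np.column_stack([np.ones(len(C)), apcs])      # intercept column + APCS
    coef, *_ = np.linalg.lstsq(X, C, rcond=None)      # one regression per parameter
    r0, r = coef[0], coef[1:]                         # r[i, k] corresponds to r_ki
    pred = X @ coef
    r2 = 1.0 - ((C - pred) ** 2).sum(axis=0) / ((C - mean) ** 2).sum(axis=0)
    return apcs, r0, r, r2

# Synthetic example (hypothetical): 84 samples, 6 parameters, 2 sources.
rng = np.random.default_rng(1)
sources = rng.uniform(0.0, 1.0, size=(84, 2))
profile = rng.uniform(0.5, 2.0, size=(2, 6))
conc = 1.0 + sources @ profile + 0.05 * rng.normal(size=(84, 6))

apcs, r0, r, r2 = apcs_mlr(conc, n_components=2)
print(np.round(r2, 3))
```

The per-parameter R-square returned here plays the same role as the goodness-of-fit values used to judge whether the regression reproduces the observed concentrations.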
Here, $r_{ki} \times APCS_{ij}$ represents the contribution of contaminant source $i$ to pollutant $k$ in sample $j$. The average over all samples, $r_{ki} \times \overline{APCS_i}$, is taken as the contribution of contaminant source $i$ to pollutant $k$. Notably, negative values may arise during the calculation, which could lead to a total contribution exceeding 100%. Hence, Haji Gholizadeh et al. [46] proposed an absolute value method to calculate the contribution of contaminant sources to water quality parameters, as shown in Equations (6) and (7):
$$PC_{ki} = \frac{\left| r_{ki} \times \overline{APCS_i} \right|}{\left| r_{k0} \right| + \sum_{l=1}^{p} \left| r_{kl} \times \overline{APCS_l} \right|} \quad (6)$$
$$PC_{k} = \frac{\left| r_{k0} \right|}{\left| r_{k0} \right| + \sum_{l=1}^{p} \left| r_{kl} \times \overline{APCS_l} \right|} \quad (7)$$
where $PC_{ki}$ stands for the relative contribution rate of contaminant source $i$ to pollutant $k$; $PC_k$ indicates the relative contribution rate of unrecognized sources to pollutant $k$; and $\overline{APCS_i}$ is the average value of the absolute principal component scores of all samples.
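As a small worked example of the absolute value method, the sketch below computes the relative contributions for one pollutant. The function name `relative_contributions` and the input numbers are hypothetical; the regression constant is treated as the unidentified-source term, following the definition of the unrecognized-source contribution above.

```python
def relative_contributions(r0, r, apcs_mean):
    """Absolute-value contribution shares for a single pollutant k.

    r0        : regression constant r_k0
    r         : coefficients r_ki, one per identified source
    apcs_mean : mean APCS of each source over all samples
    """
    terms = [abs(coef * score) for coef, score in zip(r, apcs_mean)]
    total = abs(r0) + sum(terms)
    identified = [t / total for t in terms]      # share of each identified source
    unidentified = abs(r0) / total               # share of unrecognized sources
    return identified, unidentified

# Hypothetical numbers: the second source has a negative coefficient.
identified, unidentified = relative_contributions(1.0, [4.0, -1.0], [1.0, 1.0])
print([round(100 * x, 1) for x in identified], round(100 * unidentified, 1))
```

With signed terms the same inputs would give shares of 100%, -25% and 25%; taking absolute values keeps every share between 0 and 100% and makes them sum to 100%.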

Characteristics of the Groundwater Pollution
The descriptive statistics of the physicochemical parameters for all groundwater samples are summarized in Table 2. For several groundwater quality parameters, such as NH₄⁺, NO₂⁻, NO₃⁻, Mn, Ca, I and Sb, the maximum concentration in the groundwater samples exceeded level III of the Chinese Groundwater Quality Standard (GB/T 14848-2017). Moreover, TP, which had a maximum concentration of 2510 µg/L, also deteriorated the groundwater quality in this region. The variation coefficient of a groundwater quality parameter is an index of the overall variability of the samples, and such variability is believed to be caused by various anthropogenic activities [51,55,56]. Here, a geographic information system with the inverse distance weighted interpolation method was used to generate the spatial distribution of the groundwater quality parameters with a variation coefficient larger than 80% (Figure 4). Comparison of the maps showed that the contaminants could be classified into several groups with similar spatial distributions. For instance, Mn, NH₄⁺ and NO₂⁻ were relatively enriched in the western corner of the study area, while higher concentrations of TP and K⁺ were detected in the centre and northwest of the region. Sb was distributed across the whole region, but the highest concentrations were found primarily in the southern area, which was consistent with the location of the textile industries. To uncover the linear correlations between the physicochemical water quality parameters, Pearson's correlation coefficients with statistical significance (p < 0.05) were computed. As shown in Figure 5, strong correlations (r > 0.5) were observed among Na⁺, Ca²⁺, Mg²⁺ and Cl⁻, indicating that they may share the same origin from hydrogeochemical processes. Their high positive correlations with TDS (r > 0.6) implied that the chemical composition of groundwater in this region was mainly controlled by these ions. A moderate positive correlation (r = 0.55) was also observed between NH₄⁺ and NO₂⁻, suggesting that nitrification is occurring in the groundwater. K⁺ was largely uncorrelated with the other physicochemical parameters, except for TP. With a positive correlation of 0.63, K⁺ and TP were likely to be an indication of agricultural fertilization.
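The two screening steps mentioned in this section, the variation coefficient and inverse distance weighted (IDW) interpolation, can be sketched as below. This is an illustration only: the function names, the use of the sample standard deviation and the distance exponent of 2 are assumptions, since the text does not state the GIS settings that were used.

```python
import numpy as np

def variation_coefficient(values):
    """Coefficient of variation in percent (sample standard deviation / mean)."""
    v = np.asarray(values, dtype=float)
    return 100.0 * v.std(ddof=1) / v.mean()

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse distance weighted estimate at each query point."""
    xy_known = np.asarray(xy_known, dtype=float)
    z_known = np.asarray(z_known, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances, shape (number of queries, number of wells)
    dist = np.linalg.norm(xy_query[:, None, :] - xy_known[None, :, :], axis=2)
    out = np.empty(len(xy_query))
    for q, d in enumerate(dist):
        hit = d == 0
        if hit.any():                      # query coincides with a well
            out[q] = z_known[hit][0]
        else:
            w = 1.0 / d**power             # closer wells get larger weights
            out[q] = (w * z_known).sum() / w.sum()
    return out

# Hypothetical wells (coordinates in km) and concentrations.
wells = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
conc = [0.0, 10.0, 20.0]
cv = variation_coefficient([1.0, 2.0, 3.0])
est = idw(wells, conc, [(1.0, 0.0), (2.0, 0.0)])
print(round(cv, 1), np.round(est, 2))
```

A parameter would be mapped only if its variation coefficient exceeds the 80% threshold used in the text.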
Based on the above analysis, we believe the nitrogen and phosphorus compounds could be attributed to fertilizer application or livestock waste. Other contaminants, such as I and Sb, can be ascribed to natural minerals and industrial effluent, respectively.

Pollution Sources Identified Based on the PCA Analysis
To provide a more detailed pollution-source analysis, the PCA method was then applied. The Kaiser-Meyer-Olkin (KMO) test and Bartlett's test of sphericity were performed first to check the adequacy of the original dataset. The KMO value was 0.732, while the significance level of Bartlett's test of sphericity was very close to zero (p < 0.001), indicating that PCA was applicable. Following the eigenvalue criterion, the principal components with eigenvalues exceeding 1.0 were chosen as the potential contaminant sources. As indicated in Table 3, five principal components were selected in this case, accounting for 71.23% of the total variance. The rotated factor loadings of the varifactors are shown in Figure 6.
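For readers who want to reproduce this kind of adequacy check, a minimal sketch of the overall KMO measure and Bartlett's test of sphericity is given below. It uses the standard textbook formulas on synthetic data; the function names and the data are hypothetical, and it relies on `scipy.stats.chi2.sf` for the p-value. It is not the software used in this study.

```python
import numpy as np
from scipy.stats import chi2

def kmo(data):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    inv = np.linalg.inv(corr)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                       # partial correlations
    off = ~np.eye(corr.shape[0], dtype=bool)     # off-diagonal mask
    r2 = (corr[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

def bartlett_sphericity(data):
    """Chi-square statistic, degrees of freedom and p-value."""
    X = np.asarray(data, dtype=float)
    n, m = X.shape
    corr = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(corr)          # log of det(corr)
    stat = -(n - 1 - (2 * m + 5) / 6.0) * logdet
    dof = m * (m - 1) // 2
    return stat, dof, chi2.sf(stat, dof)

# Synthetic example (hypothetical): 84 samples, 6 correlated parameters.
rng = np.random.default_rng(2)
f = rng.normal(size=(84, 2))
data = np.column_stack([f[:, [0]] + 0.3 * rng.normal(size=(84, 3)),
                        f[:, [1]] + 0.3 * rng.normal(size=(84, 3))])

kmo_value = kmo(data)
stat, dof, p_value = bartlett_sphericity(data)
adequate = kmo_value > 0.5 and p_value < 0.05    # criteria stated in the methods
print(round(kmo_value, 3), dof, adequate)
```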
The first varifactor, VF1, accounted for 24.72% of the total variance. It was heavily weighted by TDS (0.83), Cl⁻ (0.91), Na⁺ (0.94) and Mg²⁺ (0.79), and moderately weighted by Ca²⁺ (0.61) and I (0.70). Generally speaking, the contribution of these ions to VF1 was the result of water-rock interactions [47,57]. In addition, the marine and lagoon-facies sediments of the Quaternary aquifer in this area are rich in iodide and serve as a source of dissolved I in groundwater [58,59]. Hence, VF1 can be interpreted as the contribution of natural hydro-chemical evolution. The second varifactor (VF2) accounted for 15.02% of the total variance and was mainly characterized by K⁺ (0.82) and TP (0.78), with moderate positive loadings of NO₂⁻ (0.52) and NO₃⁻ (0.50). These constituents in groundwater are normally associated with the application of chemical fertilizer [60][61][62]. During our field survey, many aquaculture sites and farmlands were observed in the study area, especially near LiLi town. Hence, VF2 is assumed to represent non-point source pollution from agricultural activities. VF3 (12.09% of the total variance) had a strong positive loading on NH₄⁺ (0.85), with moderate positive loadings of NO₂⁻ (0.54) and Mn (0.63). The most common sources of NH₄⁺ and NO₂⁻ in groundwater are domestic sewage, livestock wastes and the application of nitrogen fertilizer [48,63]. Dissolved Mn may come naturally from the weathering of manganese oxide minerals, but its concentrations can be increased by contamination from industrial effluent and domestic sewage [64,65]. In the pilot promoter region of the Yangtze River Delta integration demonstration zone, as in some rural areas, the sewage collection system is imperfect and municipal sewage may leak and contaminate the groundwater. In the west corner of the study area in particular, several dry toilets and garbage heaps were observed during our field surveys. Therefore, VF3 is considered to represent domestic sewage. VF4 explained 10.02% of the total variance and had the highest loadings on SO₄²⁻ (0.78) and NO₃⁻ (0.55). Meanwhile, weak negative loadings on WT (−0.45) and pH (−0.41) were also observed. Since SO₄²⁻ and low pH values (acidic conditions) are normally related to industrial effluent [47,66,67], this factor can be identified as the influence of industrial activities. The last component (VF5) accounted for 9.38% of the total variance and was mainly affected by DO (0.78) and Sb (0.64). High levels of Sb compounds in groundwater normally come from mining or the textile industry [68,69]. Since the textile industries are widely distributed in the southern part of our study area, VF5 can be regarded as the impact of textile industrial effluent.
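The strong, medium and weak labels used above follow the loading thresholds given in the methods section. A small sketch of that classification, applied to the VF1 loadings reported in this section, is shown below; the function name `loading_strength` and the "negligible" label for loadings below 0.3 are assumptions added for the example.

```python
def loading_strength(loading):
    """Classify a rotated loading by its absolute value."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.5:
        return "medium"
    if a >= 0.3:
        return "weak"
    return "negligible"      # label assumed here; not defined in the text

# VF1 loadings as reported in this section.
vf1 = {"TDS": 0.83, "Cl-": 0.91, "Na+": 0.94, "Mg2+": 0.79, "Ca2+": 0.61, "I": 0.70}
vf1_classes = {name: loading_strength(value) for name, value in vf1.items()}
print(vf1_classes)
```

The sign of a loading does not affect its class, so the negative loadings of WT (−0.45) and pH (−0.41) on VF4 are classified as weak.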

Source Apportionment Using APCS-MLR
Based on the PCA results, the APCS-MLR model was then employed to quantify the contribution of each potential contaminant source to all 17 groundwater quality parameters using Equations (6) and (7). Scatter plots of the predicted versus observed concentrations of the main groundwater pollutant parameters obtained with the APCS-MLR model are shown in Figure 7. Except for Mn (0.59) and Sb (0.57), the linear regressions of the groundwater quality parameters all had R-square values larger than 0.6, indicating that the contaminant source apportionment was reliable. The outcome of the source apportionment via the APCS-MLR model is shown in Figure 8. The relative contribution of each pollution source to each contaminant was calculated according to Equations (6) and (7), and the average relative contribution of each pollution source was also obtained for an overall evaluation. As shown in Figure 8B, the hydrochemical characteristics of groundwater in the Yangtze River Delta integration demonstration zone were greatly affected by natural hydro-chemical evolution (VF1), which accounted for 18.89% of the total sources and showed high contribution ratios for TDS (46%), Cl⁻ (46%), Na⁺ (50%) and I (42%) (Figure 8A). The most threatening anthropogenic factor for groundwater quality was textile industrial effluent (VF5), which accounted for 18.18% of the total sources and contributed strongly to Sb (44%). Furthermore, non-point source pollution from agricultural activities (VF2) accounted for 17.08% of the total sources and contributed relatively strongly to K⁺ (49%), TP (36%) and NO₂⁻ (38%). Other industrial activities (VF4) were responsible for 15.09% of the total sources, mainly through SO₄²⁻ (45%) and NO₃⁻ (32%), while domestic sewage (VF3, 4.19%) showed a relatively high contribution ratio for NH₄⁺ (24%). In addition, the contribution of unidentified sources to each groundwater quality parameter, which is mainly attributable to complex pollutant evolution processes, ranged from 2% to 59% with an average of 26.58%. Therefore, the contributions of the identified sources in the Yangtze River Delta integration demonstration zone were determined to be in the following descending order: natural hydro-chemical evolution > textile industrial effluent > non-point source pollution from agricultural activities > other industrial activities > domestic sewage.
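As a quick arithmetic check on the percentages reported above, the short sketch below ranks the five identified sources and confirms that, together with the average unidentified share, they account for essentially 100% (the 0.01 excess is attributable to rounding of the published figures). The dictionary keys are shorthand labels introduced for the example.

```python
# Average relative contributions (%) as reported in this section.
contributions = {
    "natural hydro-chemical evolution (VF1)": 18.89,
    "textile industrial effluent (VF5)": 18.18,
    "agricultural non-point source pollution (VF2)": 17.08,
    "other industrial activities (VF4)": 15.09,
    "domestic sewage (VF3)": 4.19,
}
unidentified = 26.58

ranking = sorted(contributions, key=contributions.get, reverse=True)
identified_total = round(sum(contributions.values()), 2)   # 73.43
grand_total = round(identified_total + unidentified, 2)    # 100.01

for name in ranking:
    print(f"{contributions[name]:6.2f}%  {name}")
print(identified_total, grand_total)
```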

Conclusions
In the present study, principal component analysis (PCA) was employed to identify the potential natural and anthropogenic factors affecting groundwater quality, and the absolute principal component score-multiple linear regression (APCS-MLR) model was then used to quantify the contribution of potential sources to 17 groundwater quality parameters in the pilot promoter region of the Yangtze River Delta integration demonstration zone, China. Based on the PCA results and the possible sources of each ion, five major sources affecting groundwater quality were identified, namely natural hydro-chemical evolution, agricultural activities, domestic sewage, textile industrial effluent and other industrial activities. With most linear regression R-square values larger than 0.6, the APCS-MLR model successfully quantified the contribution of potential sources in this study area. The contributions of the five potential sources were ranked as natural hydro-chemical evolution (18.89%) > textile industrial effluent (18.18%) > non-point source pollution from agricultural activities (17.08%) > other industrial activities (15.09%) > domestic sewage (4.19%). The results indicated that groundwater in the pilot promoter region of the Yangtze River Delta integration demonstration zone was primarily contaminated by textile industrial effluent and agricultural activities, while domestic sewage accounted for only a small share. These insights could provide reliable advice for groundwater pollution management in highly developed areas. On the other hand, this study was limited by a shortage of groundwater quality data, especially on the temporal scale. Hence, further research with more measurements needs to be conducted, and analysis of temporal variation could be a possible way to recognize the unidentified sources.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.