Application of Multivariate Statistical Analysis to Identify Water Sources in A Coastal Gold Mine, Shandong, China

Submarine mine water inrush has become a problem that must be urgently solved in coastal gold mining operations in Shandong, China. This study of water in the mine roadway system introduced classifications for the types of mine groundwater and then established functions to identify each type of water sample. We analyzed 31 water samples from the −375 m sublevel using multivariate statistical analysis methods. Cluster analysis combined with principal component analysis and factor analysis divided the water samples into two types, one of which lies near the F3 fault. Principal component analysis identified four principal components accounting for 91.79% of the total variance. These four principal components represented almost all the information in the water samples and were then used as clustering variables. A Bayes model created by discriminant analysis showed that the water samples could also be divided into two types, consistent with the cluster analysis result. The type of a water sample can be determined by substituting its Na⁺ and HCO₃⁻ concentrations into the Bayes functions. The results indicate that F3, a regional fault that runs across the whole Xishan gold mine, may be a potential channel for water inrush. This provides valuable information for predicting the possibility of water inrush and thus reducing the costs of the mining operation.


Introduction
With continuous technological development and progress, the Earth's mineral resources, especially those that are easily exploited, with good geological conditions and shallow burial depths, have been mined and used in large quantities. Acquiring resources from deep within the Earth, the ocean, and polar regions has become an important direction of future development, and the exploitation of seabed mineral resources is particularly promising. Because seabed mining is exposed to the risk of water inrush, identifying the sources of mine roadway water plays an important role in water-burst warning and water inrush prevention.
Scholars have conducted extensive studies on identifying mine water sources. Conventional coefficients and a scenario analysis methodology combined with a Bayesian network have been used to discriminate water sources and estimate probabilities [1,2]. Physical experiments, numerical simulations, and field measurements have been employed to study mine water inrush [3][4][5]. To precisely predict the probability of water inrush and provide scientific evidence to prevent water inrush in the [...] altitude generally ranges from 1 to 6 m and is covered by Quaternary sediment. The south side of the mining area has an elevation of 2-3.5 m, and its highest point lies above the highest tide level of the Bohai Sea, whose surface is less than 1 m in elevation. Most of the mining area lies below the intertidal zone and the sea, and the ore beds have no natural drainage.
The mining area has the continental semi-humid climate of the temperate monsoon region, with four distinct seasons. Because of its proximity to the ocean, the area has both oceanic and inland climate characteristics. The annual average temperature is 12.5 °C, and the annual precipitation is 612.1 mm. According to differences in lithology, the Quaternary sediment can be divided into four units from top to bottom: (1) the first aquifer, which consists of medium and coarse sand. Its thickness ranges from 3.5 to 17.29 m, with an average of 9.93 m. It contains groundwater in unconsolidated sediments, and its hydraulic conductivity ranges from 5.35 to 15.27 m/day. It is mainly recharged by atmospheric precipitation and seawater. Its total dissolved solids content varies considerably, from a minimum of 0.21 g/L to a maximum of 25.95 g/L; (2) the first aquiclude, located below the first aquifer. Its burial depth ranges from 5.5 to 9.0 m, and its thickness is fairly uniform, generally about 7-8 m; (3) the second aquifer, which gradually thickens from north to south. Its average thickness is 3-4 m, with a maximum of 11.9 m. It is a confined aquifer composed of medium sand, coarse sand, and gravel, and it can receive recharge from the first aquifer and the seawater; (4) the second aquiclude, located above the weathered crust. It is 3-5 m thick and mainly comprises red clay, which makes it a good water barrier. Three main water-controlling fractures, named F1, F2, and F3, occur in the mining area (Figure 1). The maximum principal stress is horizontal, oriented NW-SE, and perpendicular to the strike of F1. The F1 fault is the ore-controlling structure of the mining area. The ore body is distributed in the footwall of the fault, which dips to the southeast at 40°. A 50-100-mm-thick black gouge makes F1 an aquiclude [27].
The F2 fault is located west of F1 and is smaller than F1. F2 starts at the F3 fault and ends at the Bohai Sea; its strike is 280° and its dip is 85° [28]. F2 has the characteristics of a torsion fault: its hanging wall moves northward, and its footwall moves in the opposite direction. The fractures developed on the two sides of F2 differ significantly. On the west side, NE-trending fractures are well developed and NW-trending fractures are rare, whereas on the east side, NW-trending fractures dominate. Geophysical exploration data show that F2 has distinctly low resistivity, indicating that it is highly permeable.
The F3 fault, the Sanyuan-Chenjia fracture, is a regional fault that traverses the entire mining area. Its overall strike is 300-310°, and it dips to the northwest at nearly 90°. It extends northwest to the sea and southeast onto the continent. The main fracture plane of F3 is located on its south side and is filled with gouge and fault breccia. The rock mass on the south side is intact and mainly develops NW-SE joints. Current mining exploration shows that F3 extends deeper than 850 m and that its fracture zone is approximately 15-35 m wide. The broken belt is narrow at the ground surface and widens with depth. F3 is an uncemented tensile fracture; it is a main groundwater pathway with high hydraulic conductivity.

Water Sampling and Analysis
In the −375 m sublevel (Figure 2) of the Xishan gold mine, the water outlet sites were numbered. Two 600 mL bottles of water were collected from each water outlet site from 2009 to 2015 (Figure 3). To preserve sample quality before analysis, the samples were kept in sealed 600 mL plastic bottles and stored in a refrigerator at 4 °C. Ten variables, namely K⁺, Na⁺, Ca²⁺, Mg²⁺, Cl⁻, SO₄²⁻, HCO₃⁻, pH, electrical conductivity (EC), and total dissolved solids (TDS), were measured at the Institute of Geology, China Earthquake Administration (Table 1). All water samples were analyzed with a DIONEX-500 ion chromatograph, and their HCO₃⁻ content was measured with a Metrohm instrument. The detection limits of cations and anions were 0.05 and 0.1 mg/L, respectively. The analytical accuracies of cations, anions, pH, EC, and TDS were 0.01 mg/L, 0.1 mg/L, 0.01, 0.1 µS/cm, and 0.01 mg/L, respectively. The test standards for cations, anions, and alkalinity were DZ/T0064.28-1993, DZ/T0064.51-1993, and DZ/T0064.[...], respectively [29]. Ion and TDS concentrations are given in mg/L, EC in µS/cm, and pH in standard units. The second 600 mL bottle of each sample was used for environmental isotope analysis (δ¹⁸O and δ²H), which was conducted in the Laboratory for Stable Isotope Geochemistry, Institute of Geology and Geophysics (Beijing, China). An MT-253 mass spectrometer was used, with an analytical precision (two-sigma uncertainty) of 0.2‰ for δ¹⁸O and 2‰ for δ²H. All values are expressed in ‰ relative to Standard Mean Ocean Water (SMOW; Appendix A, Table A1). Most chemistry data were obtained from sites 375-1 to 375-9. Because seasonal rainfall decreased and some fissure water outlets dried up, some sites were sampled only a few times.
For example, site 375-2 was sampled only once, in August 2009, and dried up afterward. As mining of the −375 m sublevel progressed, the roadway advanced gradually; site 375-9 was added in 2012, so only two water samples were taken from this site.

Factor Analysis
Factor analysis is a generalization of principal component analysis (PCA). It applies the dimension-reduction idea of PCA to the internal structure of the correlation matrix of the original variables, reducing correlated variables to a few comprehensive factors. Based on their correlations, factor analysis divides the variables into a few distinct groups, with strong correlations within each group and weak correlations between groups. The variables of each group are represented by a common structure (a common factor). The seven main ions accounted for 99% of the TDS, and EC is directly related to the amount of anions and cations, so all the variables are expected to be correlated to some degree (Table 2). A statistical description of the concentrations of the seven main ions is given in Table 3. Combining the variance with the Pearson correlation can explain the dynamic evolution of groundwater solutes. Sorting the variances in Table 3 from largest to smallest shows that Cl⁻ had the largest variance. Because Cl⁻ is considered a conservative ion, the variance caused by water-rock reactions during groundwater mixing is negligible. Cl⁻ had a good positive correlation with Na⁺ and Ca²⁺, with correlation coefficients of 0.956 and 0.842, respectively. This may be due to the inhomogeneous distribution of Cl⁻-bearing minerals and to differing recharge from seawater and brine. Ca²⁺ had a mean concentration similar to that of Mg²⁺ but exhibited more variance. Ca²⁺ had a positive correlation with Cl⁻ and a negative correlation with Mg²⁺ and HCO₃⁻. In the mining area, beresite and lamprophyre, which contain large amounts of SO₄²⁻, are widely distributed, and calcite and dolomite are randomly distributed in fractures. These factors may have resulted in the larger variance of Ca²⁺.
The increased groundwater flow rate resulting from mining operations and the CO₂ fugacity in fractures could also increase this variance. The concentration of Mg²⁺ was negatively correlated with Cl⁻ and Na⁺, with correlation coefficients of −0.555 and −0.496, respectively. This may be because the rate and amount of recharge from Mg²⁺-bearing Quaternary water are much greater than those of seawater. In most geological studies, the relative magnitudes of variables are critical, so we first examined the raw data. Where the data had to be standardized, we noted that standardization itself can significantly affect the analysis: it tends to inflate the influence of variables with small variance and reduce the influence of variables with large variance [30]. The hydrochemical variables of the 31 water samples have different units, which cause large differences in their dispersion, so the total variance of the raw data is dominated by the variables with large variance. To eliminate the effect of the different units, we normalized the data before the component analysis; in SPSS 25 software (IBM, Armonk, NY, USA), the data used for factor analysis and PCA were standardized. The Kaiser-Meyer-Olkin (KMO) value was 0.585, and the significance (Sig.) of Bartlett's test of sphericity was 0.00, which is less than 0.05, indicating that the data were suitable for PCA and factor analysis. Clustering the untreated water chemistry data would use correlated variables repeatedly, so the geological information they share would be weighted several times, distorting the cluster results.
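The standardization and suitability checks described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the SPSS procedure used in the study: `zscore`, `kmo`, and `bartlett_sphericity` are hypothetical helper names, the KMO measure is computed from partial correlations derived from the inverse correlation matrix, and Bartlett's statistic uses the usual chi-square approximation. If SciPy is available, a p-value could be obtained with `scipy.stats.chi2.sf(chi2, df)`.

```python
import numpy as np

def zscore(X):
    """Column-wise z-score standardization (mean 0, sample standard deviation 1)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    inv = np.linalg.inv(R)
    # Partial correlations (all other variables controlled) from the inverse matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    P = -inv / scale
    off = ~np.eye(R.shape[0], dtype=bool)       # mask selecting off-diagonal entries
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(P[off] ** 2)
    return r2 / (r2 + p2)

def bartlett_sphericity(X):
    """Bartlett's test of sphericity (H0: the correlation matrix is the identity).

    Returns the chi-square statistic and its degrees of freedom."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)            # log|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    df = p * (p - 1) // 2
    return chi2, df

# Synthetic data (31 samples, 4 correlated variables); not the study's measurements
rng = np.random.default_rng(0)
base = rng.normal(size=(31, 1))
X = np.hstack([base + 0.3 * rng.normal(size=(31, 1)) for _ in range(4)])
Z = zscore(X)
print(round(kmo(X), 3), bartlett_sphericity(X))
```

A KMO of about 0.5 is commonly treated as the minimum acceptable level for factor analysis, which is consistent with reading 0.585 as adequate.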
Factor analysis was used to process the data, transforming the correlated variables into orthogonally rotated factors that could then serve as new variables for the cluster analysis. The initial factors, their eigenvalues, and the total variance explained for the 10 variables, computed with SPSS 25, are listed in Table 4. A cumulative variance contribution rate of 90% is considered a good explanation of the hydrochemical information in the data [31][32][33], so extracting four initial factors was reasonable: together they accounted for 91.79% of the total variance. The purpose of the factor analysis model was not only to identify the main factors but also to give them a clear meaning, enabling a deeper analysis of the variables. Factor rotation makes the distinction between common factors more noticeable while leaving the contribution of the common factors to each variable unchanged (Table 5).
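The rule used to retain four factors can be illustrated with a short NumPy sketch that eigen-decomposes the correlation matrix and keeps the fewest components whose cumulative contribution reaches 90%. The function name `select_components` and the synthetic data are illustrative assumptions; SPSS's extraction options may differ in detail.

```python
import numpy as np

def select_components(X, threshold=0.90):
    """Keep the fewest principal components whose cumulative share of variance
    (from the correlation matrix) reaches `threshold`."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]            # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cumulative = np.cumsum(eigvals) / np.sum(eigvals)
    k = int(np.argmax(cumulative >= threshold)) + 1
    # Unrotated loadings: eigenvectors scaled by the square root of their eigenvalues
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return k, cumulative, loadings

# Synthetic example: two independent signals, each measured twice with small noise
rng = np.random.default_rng(1)
a, b = rng.normal(size=31), rng.normal(size=31)
X = np.column_stack([a, a + 0.05 * rng.normal(size=31), b, b + 0.05 * rng.normal(size=31)])
k, cumulative, loadings = select_components(X)
```

Because the data are standardized (correlation matrix), each variable contributes one unit of variance, which is why the eigenvalues divided by their sum give the contribution rates.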

Clustering Analysis
Clustering analysis is a main branch of multivariate statistical methods. Its primary purpose is to group water samples according to the properties of the variables. Based on the Euclidean distance, clustering analysis divides the water samples into a few groups, within which the samples have the most similar characteristics; samples in the same group share similar properties, whereas samples in different groups differ. The most commonly used clustering method is hierarchical clustering analysis, which reveals the relationships among all the water samples and displays them visually as a dendrogram. In this paper, the normalized matrix of the original data was multiplied by the component score coefficient matrix to obtain the factor score matrix (computed in MATLAB; Table 6), and the factor scores were then used as clustering variables. Clusters were linked by Ward's method, and the similarity of the factor scores was measured by Euclidean distance. Euclidean distance is often used to describe the similarity of water samples because it expresses the difference between the samples' values. It is calculated as follows [34]:
$$D_{ik} = \sqrt{\sum_{j} \left(v_{ij} - v_{kj}\right)^{2}}$$
where i and k are two different water samples, j indexes the variables, and v_{ij} is the value of variable j for sample i. The smaller the Euclidean distance D is, the more similar the water samples are, so D quantifies the degree of affinity between water samples.
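The factor-score step and Ward clustering can be sketched as follows. This is a naive implementation written for clarity, not the MATLAB/SPSS code used in the study; `factor_scores`, `euclidean`, and `ward_linkage` are hypothetical names. At each step, Ward's method merges the pair of clusters whose union least increases the within-cluster sum of squares; here the merge height is reported as sqrt(2 × increase), which equals the Euclidean distance when two single samples are merged.

```python
import numpy as np

def factor_scores(Z, W):
    """Factor score matrix: standardized data (samples x variables) times the
    component score coefficient matrix (variables x factors)."""
    return np.asarray(Z, dtype=float) @ np.asarray(W, dtype=float)

def euclidean(u, v):
    """D_ik = sqrt(sum_j (v_ij - v_kj)^2) for two samples u and v."""
    diff = np.asarray(u, dtype=float) - np.asarray(v, dtype=float)
    return float(np.sqrt(diff @ diff))

def ward_linkage(F):
    """Naive agglomerative clustering with Ward's criterion.

    Returns merges as (members_a, members_b, height) tuples in merge order."""
    F = np.asarray(F, dtype=float)
    clusters = [[i] for i in range(len(F))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                diff = F[clusters[a]].mean(axis=0) - F[clusters[b]].mean(axis=0)
                # Increase in the within-cluster sum of squares if a and b merge
                delta = na * nb / (na + nb) * float(diff @ diff)
                if best is None or delta < best[0]:
                    best = (delta, a, b)
        delta, a, b = best
        merges.append((clusters[a], clusters[b], float(np.sqrt(2.0 * delta))))
        merged = clusters[a] + clusters[b]
        clusters = [c for i, c in enumerate(clusters) if i not in (a, b)] + [merged]
    return merges

# Toy factor scores for four samples (hypothetical): two tight, well-separated pairs
F = factor_scores(np.eye(4), [[0, 0], [0, 1], [10, 10], [10, 11]])
merges = ward_linkage(F)
```

The quadratic pair search is fine for a few dozen samples such as the 31 analyzed here; a library routine would be preferable for larger data sets.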

Discriminant Analysis
Using specific classification rules (for example, a dichotomy), discriminant analysis divides cases into a few groups, and the water samples in the same group share common properties. Most importantly, discriminant analysis establishes discriminant functions from the original data of two or more variables, so that the group to which a water sample belongs can be determined. In discriminant analysis, more variables are not necessarily better; the main discriminating variables should be selected, because each variable plays a different role: some play a major role and others a minor one. Including minor variables in the discriminant function not only increases the amount of calculation but also introduces interference that degrades the discrimination. The stepwise discriminant analysis method selects the best variables according to the importance of each variable in the discriminant function [35]. At each step, it introduces the "most important" remaining variable into the discriminant function and retests the variables already included; any variable whose discriminating ability becomes insignificant after a new variable enters is removed. The procedure stops when no new variable can enter and no included variable is removed. We used the remaining water samples to establish the Bayes linear discriminant function. Among the selected variables, pH, EC, and TDS were the most direct expressions of the seven common ions in the water samples, and these variables were highly correlated.
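A simplified version of this variable selection can be sketched with Wilks' lambda. Note the assumptions: this is forward-only selection (the removal step described above is omitted for brevity), the entry threshold `f_enter=3.84` is an assumed value, and the partial F-to-enter uses the standard formula F = [(n − g − p)/(g − 1)] × (Λ_p/Λ_{p+1} − 1) for n samples, g groups, and p variables already selected. `wilks_lambda` and `forward_select` are hypothetical names.

```python
import numpy as np

def wilks_lambda(X, y):
    """Wilks' lambda = det(W) / det(T) for the columns of X and group labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)
    T = Xc.T @ Xc                                  # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(y):
        Xg = X[y == g] - X[y == g].mean(axis=0)
        W += Xg.T @ Xg                             # pooled within-group SSCP matrix
    return np.linalg.det(W) / np.linalg.det(T)

def forward_select(X, y, f_enter=3.84):
    """Forward stepwise selection by partial F-to-enter based on Wilks' lambda."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    g = len(np.unique(y))
    selected, lam_prev = [], 1.0                   # lambda of the empty model is 1
    while len(selected) < m:
        p = len(selected)
        best = None
        for j in range(m):
            if j in selected:
                continue
            lam = wilks_lambda(X[:, selected + [j]], y)
            f = (n - g - p) / (g - 1) * (lam_prev / lam - 1.0)
            if best is None or f > best[0]:
                best = (f, j, lam)
        if best[0] < f_enter:
            break                                  # no remaining candidate is significant
        selected.append(best[1])
        lam_prev = best[2]
    return selected

# Synthetic data: column 0 separates the two groups, columns 1-2 are pure noise
rng = np.random.default_rng(2)
y = np.array([0] * 15 + [1] * 15)
X = np.column_stack([5.0 * y + rng.normal(size=30), rng.normal(size=30), rng.normal(size=30)])
print(forward_select(X, y))
```

A smaller Wilks' lambda means better group separation, so the variable that lowers lambda the most relative to the current model yields the largest F-to-enter.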
The model was established with SPSS 25 software: stepwise discriminant analysis was performed on the data from the selected water samples, and the Bayes linear discriminant functions were obtained as follows: [...] where M1 is the mode of the water samples from 375-5 and 375-6, M2 is the mode of the other water samples, and V(Na⁺) and V(HCO₃⁻) are the corresponding ion mass concentrations.
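The general form of Bayes linear discriminant functions can be sketched as follows, assuming (as is standard for linear discriminant analysis) normally distributed groups with a common, pooled within-group covariance matrix S. For group g with mean vector μ_g and prior probability π_g, the function is f_g(x) = xᵀS⁻¹μ_g − ½μ_gᵀS⁻¹μ_g + ln π_g, and a sample is assigned to the group whose function value is largest. The two-column synthetic data below only stand in for Na⁺ and HCO₃⁻ concentrations; they are not the study's data, and the coefficients produced are not those obtained in SPSS. `fit_bayes_lda` and `classify` are hypothetical names.

```python
import numpy as np

def fit_bayes_lda(X, y, priors=None):
    """Return {group: (coefficients, constant)} for Bayes linear discriminant functions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    n, m = X.shape
    # Pooled within-group covariance matrix
    S = np.zeros((m, m))
    for g in groups:
        Xg = X[y == g] - X[y == g].mean(axis=0)
        S += Xg.T @ Xg
    S /= n - len(groups)
    S_inv = np.linalg.inv(S)
    funcs = {}
    for g in groups:
        mu = X[y == g].mean(axis=0)
        # Prior: user-supplied, otherwise the group's share of the training samples
        prior = priors[g] if priors is not None else np.mean(y == g)
        funcs[g] = (S_inv @ mu, -0.5 * mu @ S_inv @ mu + np.log(prior))
    return funcs

def classify(funcs, x):
    """Evaluate every function at x; the largest value determines the group."""
    x = np.asarray(x, dtype=float)
    scores = {g: float(coef @ x + const) for g, (coef, const) in funcs.items()}
    return max(scores, key=scores.get), scores

# Hypothetical stand-ins for V(Na+) and V(HCO3-) in mg/L
rng = np.random.default_rng(3)
m1 = rng.normal([100.0, 300.0], 20.0, size=(12, 2))
m2 = rng.normal([400.0, 150.0], 20.0, size=(12, 2))
X = np.vstack([m1, m2])
y = np.array(["M1"] * 12 + ["M2"] * 12)
funcs = fit_bayes_lda(X, y)
label, scores = classify(funcs, [110.0, 290.0])
```

Each fitted function is linear in the concentrations (a coefficient per ion plus a constant), which is why a sample's type can be found by substituting its ion concentrations and comparing the function values.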

Results of Multivariate Statistical Analysis
The variables in the roadway water are not independent of each other: the stronger the correlation between ions, the more geological information they share. The correlations among the 10 variables, determined with SPSS 25 software, showed that Na⁺ had a strong positive correlation with Cl⁻; Ca²⁺ had a strong negative correlation with Mg²⁺ and HCO₃⁻ but a positive correlation with TDS; Mg²⁺ had a strong positive correlation with HCO₃⁻; and Cl⁻ had a strong positive correlation with Ca²⁺, EC, and TDS (Table 2). The factor loadings were rotated with the varimax (maximum variance) method, and the solution converged after 13 iterations. After rotation, the loadings of the variables differed more clearly, making the contribution of each variable to a particular factor easier to identify (Table 5). Fa1 had large loadings on Na⁺, Ca²⁺, Mg²⁺, Cl⁻, HCO₃⁻, and TDS; Fa2 on Na⁺, pH, and EC; Fa3 on K⁺; and Fa4 on SO₄²⁻. The factor loading matrix is the basis for calculating the component score coefficient matrix, which is shown in Table 7. According to the results of the clustering analysis (Table 8), three water samples (375-6-3, 375-7-1, and 375-8-1) were identified as misclassified and removed, and 375-1-2 and 375-4-4 were selected as test water samples (Table 9). The 31 water samples were clustered with SPSS 25, and the results are presented in Figure 4. The dendrogram of the hierarchical clustering analysis shows that, between the 1740 and 2740 exploration lines at the −375 m sublevel, the water sample sites were clearly divided into two statistically significant clusters at (D_link/D_max) × 100 < 60 [32].
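The varimax (maximum variance) rotation can be sketched with the widely used SVD-based iteration, here with Kaiser row normalization, which we assume corresponds to the usual SPSS default. The helper name `varimax`, the tolerance, and the iteration limit are our own assumptions, so the number of iterations this sketch needs will generally not equal the 13 reported above. Because the rotation is orthogonal, each variable's communality (the row sum of squared loadings) is unchanged.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=500, tol=1e-8, normalize=True):
    """Orthogonal varimax rotation of a loading matrix L (variables x factors).

    Returns the rotated loadings, the rotation matrix, and the iteration count."""
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    h = np.sqrt(np.sum(L ** 2, axis=1)) if normalize else np.ones(p)
    A = L / h[:, None]                         # Kaiser normalization of the rows
    R = np.eye(k)
    crit = 0.0
    for it in range(1, max_iter + 1):
        B = A @ R
        # Gradient of the varimax criterion, mapped back to an orthogonal matrix via SVD
        G = A.T @ (B ** 3 - (gamma / p) * B @ np.diag(np.sum(B ** 2, axis=0)))
        u, s, vt = np.linalg.svd(G)
        R = u @ vt
        new_crit = np.sum(s)
        if crit != 0 and new_crit < crit * (1 + tol):
            break                              # criterion no longer improving
        crit = new_crit
    return (A @ R) * h[:, None], R, it

# Usage with an arbitrary 6 x 2 loading matrix (synthetic)
rng = np.random.default_rng(4)
L = rng.normal(size=(6, 2))
rotated, R, n_iter = varimax(L)
```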
The 375-1, 375-2, 375-3, 375-4, 375-7, 375-8, and 375-9 samples belonged to the first mode (M1), and the 375-5 and 375-6 samples belonged to the other mode (M2). Sites 375-5 and 375-6 are very close to each other (Figure 2), which supports their similarity. Initially, tunnel excavation caused a large amount of artificial drainage and accelerated groundwater flow, which removed much of the brine. Seawater and Quaternary water, driven by a high hydraulic gradient, infiltrated the −375 m sublevel through bedrock fractures. Simultaneously, the F3 fault and the NW-oriented fractures opened due to mining-induced rock movement and stress redistribution [7,36]. M2 water samples, which were affected by the F3 fault, mainly received vertical recharge of Quaternary water and partially received lateral recharge of seawater. In contrast, M1 water samples, which were affected by NW-oriented fractures and were obtained far from the F3 fault, mainly received lateral recharge of seawater and partially received vertical recharge of Quaternary water. Therefore, according to the hydrological conditions and the cluster analysis results, the water samples in the −375 m sublevel could be divided into two typical modes (M1 and M2). According to the Bayes posterior probability principle, the corresponding ion mass concentrations were substituted into the Bayes linear discriminant functions, and the function with the larger value determined the mode. The corresponding ion mass concentrations of 26 water samples and two test water samples were substituted into the two Bayes linear discriminant functions, and the resulting values are given in Table 8. A cross-comparison between the hierarchical clustering results (Figure 4) and the stepwise discriminant analysis results (Table 8) showed that the Bayes linear functions established by stepwise discriminant analysis identified the water sources with 100% accuracy.
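The dendrogram cut at (D_link/D_max) × 100 < 60 and the cross-comparison with the discriminant results can be sketched in plain Python. Assumptions: merges are given as (members_a, members_b, height) tuples in merge order with non-decreasing heights (as Ward's method produces), the helper names `cut_tree` and `agreement` are hypothetical, and agreement is computed as the best-matching share of samples, because cluster numbers are arbitrary and must be matched to the mode labels.

```python
from itertools import permutations

def cut_tree(merges, n, ratio=60.0):
    """Label n samples by applying only merges with (D_link / D_max) * 100 < ratio.

    `merges` holds (members_a, members_b, height) tuples in merge order."""
    d_max = max(h for _, _, h in merges)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b, h in merges:
        if h / d_max * 100.0 < ratio:
            parent[find(a[0])] = find(b[0])
    roots = [find(i) for i in range(n)]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}  # relabel as 0, 1, ...
    return [ids[r] for r in roots]

def agreement(cluster_labels, class_labels):
    """Best-matching share of samples on which two labelings agree."""
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(class_labels))
    best = 0.0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == g for c, g in zip(cluster_labels, class_labels))
        best = max(best, hits / len(class_labels))
    return best

# Hypothetical four-sample dendrogram: two tight pairs joined at a large height
merges = [([0], [1], 1.0), ([2], [3], 1.0), ([0, 1], [2, 3], 20.0)]
labels = cut_tree(merges, 4)
print(labels, agreement(labels, ["M1", "M1", "M2", "M2"]))
```

An agreement of 1.0 corresponds to the 100% consistency reported between the clustering and the discriminant results.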

Discussion
Groundwater dynamics are complex, and this complexity is enhanced by the heterogeneity of the rock mass and by mining operations. The apertures of faults and fractures can open or close due to mining activities. The release of underground stress frequently reactivates faults or widens fractures, which can then increase the groundwater flow rate. For coastal mining operations, research on the roadway fracture network is therefore critical.
The combined multivariate statistical methods successfully identified groundwater sources and helped assess groundwater dynamics. The factor scores from the factor analysis are plotted in Figure 5, which shows that the groundwater was divided into two types. Based on the locations of the water sites (Figure 2), M1 was close to the F3 fault, and M2 had a relatively large Euclidean distance compared to M1. In the study area, groundwater is mainly recharged by freshwater, Quaternary water, and seawater. Ion exchange and other hydrochemical reactions occur during groundwater mixing; however, stable isotopes (δ¹⁸O and δ²H) are considered conservative tracers and are therefore reliable for identifying groundwater sources. The δ¹⁸O and δ²H values (Appendix A, Table A1) are plotted in Figure 6, together with the global meteoric water line (GMWL) and the local meteoric water line (LMWL). The δ¹⁸O and δ²H values of the −375 m sublevel water samples ranged from −3.328‰ to −1.215‰ and from −24.546‰ to −6.478‰, respectively. The δ¹⁸O and δ²H values of freshwater were −7.543‰ and −53.44‰, respectively, and those of Quaternary water were −3.585‰ and −32.03‰, whereas seawater had the highest values (0.036‰ and −5.405‰, respectively). All values except that of sample 375-6-4 plotted within the polygon defined by freshwater, Quaternary water, seawater, and the LMWL, which demonstrates that the water samples were mixtures of these three types of water. M1 samples plotted near Quaternary water, indicating that they were mainly composed of Quaternary water, whereas M2 samples plotted between seawater and Quaternary water, indicating a mixture of the two. The F3 fault and the NW-NE fractures therefore had a significant effect on groundwater sources.
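Whether a sample plots inside the end-member mixing region can be checked numerically. The sketch below simplifies the polygon described above to the triangle spanned by the three end-members, using the end-member δ¹⁸O and δ²H values reported in the text, and ignores the LMWL boundary; samples plotting above the freshwater-seawater edge may therefore fall outside this triangle while still lying inside the study's polygon. It solves for barycentric weights, and a point lies inside the triangle when all weights are non-negative. Under an assumed conservative three-end-member mixing model the weights could be read as mixing fractions, although hydrochemical reactions can make such proportions uncertain. The names `END_MEMBERS`, `barycentric`, and `inside_triangle` are our own.

```python
import numpy as np

# End-member (delta18O, delta2H) values in permil, as reported in the text
END_MEMBERS = {
    "freshwater": (-7.543, -53.44),
    "quaternary": (-3.585, -32.03),
    "seawater": (0.036, -5.405),
}

def barycentric(point, a, b, c):
    """Solve point = wa*a + wb*b + wc*c subject to wa + wb + wc = 1."""
    A = np.array([[a[0], b[0], c[0]],
                  [a[1], b[1], c[1]],
                  [1.0, 1.0, 1.0]])
    rhs = np.array([point[0], point[1], 1.0])
    return np.linalg.solve(A, rhs)

def inside_triangle(point, tol=1e-9):
    """Return (inside, weights): inside is True if all weights are >= 0."""
    w = barycentric(point, *END_MEMBERS.values())
    return bool(np.all(w >= -tol)), w

# Usage: the centroid of the three end-members lies inside by construction
centroid = np.mean(list(END_MEMBERS.values()), axis=0)
inside, weights = inside_triangle(centroid)
```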
A field investigation revealed that F3 was a deep fault with a fracture zone approximately 15–35 m wide. In the −375 m sublevel, 471 fractures were mapped, with three dominant attitudes: 38°∠89°, 16°∠15°, and 120°∠83°. Above the −600 m sublevel, Quaternary water and seawater were the dominant sources, and F3 and the NW-oriented fractures were the main channels conveying these waters to deeper parts of the tunnels [7,36]. M1 water samples were recharged by Quaternary water moving vertically through the F3 fault, whereas M2 water samples were recharged partly by Quaternary water moving vertically through the F3 fault and mostly by seawater moving laterally through NW-NE fractures.
Combining multivariate statistical analysis methods made it possible to derive a Bayes linear function that quantitatively assigns a water sample to a source type. This approach differs from those used in most other studies: we focused on the type of each water sample and determined rapidly and economically which type each sample belonged to. In contrast, several studies [14–16,20,37] have been dedicated to estimating the proportions of end-members in mixed water samples. Because hydrochemical reactions occur during mixing, the end-members in such studies are uncertain, and even when the proportions of end-members are known, it is not possible to determine which type of water each sample belongs to. Other studies [38][39][40] used PCA and cluster analysis to determine water sources. These multivariate statistical analyses can also identify water source types, but they lack a self-test feature and do not produce simple, direct functions.
From the above discussion, we found that groundwater in the −375 m sublevel consisted mainly of seawater and Quaternary water. The F3 fault was the main vertical recharge channel for Quaternary water, and NW-oriented fractures were the main lateral recharge channel for seawater. The ratio of these two sources differs considerably between water sites, depending on the apertures of fractures produced by mining-induced stress redistribution. We speculate that lateral seawater recharge is the main source of mine water inrush and that the F3 fault is a potential channel for water inrush. Therefore, monitoring the water chemistry and flow at the M1 water sites is the focus of our attention.
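The Bayes linear function and the self-test feature discussed above can be illustrated with a minimal two-variable sketch. It fits textbook Bayes linear discriminant (classification) functions from a pooled within-group covariance matrix and then performs a resubstitution check, reclassifying the training samples; treating that check as the "self-test" is our reading, not a statement of the authors' exact procedure. The choice of Na+ and HCO3− as variables follows the abstract, but the data are hypothetical toy values, not the Xishan measurements, and priors proportional to group size are an assumption of this sketch.

```python
import math
from typing import Dict, List, Tuple

# (Na+, HCO3-) concentration pair for one water sample; units assumed mg/L
Sample = Tuple[float, float]
Matrix2 = List[List[float]]


def group_mean(points: List[Sample]) -> Sample:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def pooled_covariance(groups: Dict[str, List[Sample]]) -> Matrix2:
    """Pooled within-group covariance: summed within-group scatter / (n - g)."""
    sxx = sxy = syy = 0.0
    n_total = 0
    for points in groups.values():
        mx, my = group_mean(points)
        for x, y in points:
            dx, dy = x - mx, y - my
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
        n_total += len(points)
    dof = n_total - len(groups)
    return [[sxx / dof, sxy / dof], [sxy / dof, syy / dof]]


def inverse_2x2(m: Matrix2) -> Matrix2:
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fit_bayes_linear(groups: Dict[str, List[Sample]]) -> Dict[str, Tuple[float, float, float]]:
    """Return {type: (c0, c1, c2)} so that f(x) = c0 + c1*x1 + c2*x2.

    (c1, c2) = S^-1 mu and c0 = -0.5 * mu' S^-1 mu + ln(prior), with priors
    taken proportional to group sizes (an assumption of this sketch).
    """
    inv = inverse_2x2(pooled_covariance(groups))
    n_total = sum(len(p) for p in groups.values())
    coeffs = {}
    for label, points in groups.items():
        mu = group_mean(points)
        c1 = inv[0][0] * mu[0] + inv[0][1] * mu[1]
        c2 = inv[1][0] * mu[0] + inv[1][1] * mu[1]
        c0 = -0.5 * (c1 * mu[0] + c2 * mu[1]) + math.log(len(points) / n_total)
        coeffs[label] = (c0, c1, c2)
    return coeffs


def classify(coeffs: Dict[str, Tuple[float, float, float]], sample: Sample) -> str:
    """Assign the sample to the type whose function value is largest."""
    return max(
        coeffs,
        key=lambda k: coeffs[k][0] + coeffs[k][1] * sample[0] + coeffs[k][2] * sample[1],
    )


def resubstitution_rate(coeffs, groups: Dict[str, List[Sample]]) -> float:
    """Self-check: fraction of training samples reassigned to their own type."""
    total = sum(len(p) for p in groups.values())
    correct = sum(classify(coeffs, x) == label for label, pts in groups.items() for x in pts)
    return correct / total


# Hypothetical toy data (NOT the Xishan measurements): (Na+, HCO3-) in mg/L
TOY_GROUPS: Dict[str, List[Sample]] = {
    "M1": [(300.0, 250.0), (320.0, 270.0), (280.0, 240.0), (310.0, 260.0)],
    "M2": [(2100.0, 180.0), (2300.0, 200.0), (2000.0, 170.0), (2200.0, 190.0)],
}

if __name__ == "__main__":
    fitted = fit_bayes_linear(TOY_GROUPS)
    print(classify(fitted, (290.0, 255.0)))
    print(resubstitution_rate(fitted, TOY_GROUPS))
```

Once the coefficients are fitted, classifying a new sample needs only its two ion concentrations, which is the property the text emphasizes over PCA and cluster analysis alone.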

Conclusions
In the study area, mining operations disrupted the Quaternary aquifer and accelerated groundwater flow. As a result, seawater and freshwater recharge became the dominant water sources, with only small amounts of vertical recharge occurring occasionally. At the time of the study, recharge had already reached a depth of −510 m, and the proportion of freshwater at most sites above −375 m exceeded 40% [36]. Identifying the water source type makes it possible to study the likelihood of intrusion. The multivariate statistical analysis described here is useful for identifying the source types of mine groundwater.
(1) The hydrochemical data of the water samples from the Xishan gold mine were preprocessed, and the water samples were divided into types M1 and M2 by factor analysis combined with principal component analysis. Of the 31 water samples, three were misclassified, giving a correct discrimination rate of 90.3% (28/31).
(2) Stepwise discriminant analysis and factor analysis were combined to process the data for seven conventional ions. The Bayes linear discriminant functions and their values for the water sites between exploration lines 1740 and 2740 in the −375 m sublevel were obtained. The results of the Bayes linear discriminant functions were completely consistent with those of the factor analysis, and the two water samples selected for discrimination were also classified consistently. This consistency shows that the factor analysis and the stepwise discriminant analysis verified each other.
(3) Multivariate statistical methods were combined to obtain quantitative Bayes linear discriminant functions, which were applied to identify water source types in the mining area. Only the concentrations of the ions used as variables are needed: substituting these values into the functions determines the type of a water sample. This method is accurate, fast, and economical.
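The form of the Bayes linear discriminant functions in items (2) and (3), and the arithmetic behind the 90.3% rate in item (1), can be written as follows. This is a generic textbook form assuming two variables, a pooled within-group covariance matrix S, group mean vectors μ_k, and prior probabilities q_k. The abstract names Na+ and HCO3− as the variables, but the fitted coefficients c_k0, c_k1, and c_k2 for the Xishan mine are not given in this section and are left symbolic.

```latex
f_k(\mathbf{x}) = \boldsymbol{\mu}_k^{\top} S^{-1}\mathbf{x}
  - \tfrac{1}{2}\,\boldsymbol{\mu}_k^{\top} S^{-1}\boldsymbol{\mu}_k + \ln q_k
  = c_{k0} + c_{k1}\,x_{\mathrm{Na^{+}}} + c_{k2}\,x_{\mathrm{HCO_3^{-}}},
  \qquad k \in \{\mathrm{M1}, \mathrm{M2}\}

\hat{k} = \arg\max_{k} f_k(\mathbf{x}), \qquad
\text{correct rate} = \frac{31 - 3}{31} = \frac{28}{31} \approx 90.3\%
```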
The method introduced in this paper cannot distinguish ambiguous water samples that are close in distance or extremely similar in water chemistry, and it does not provide specific measures for preventing mine water inrush. Nevertheless, the method, which is fast and economical, accurately identified the type of water samples in the study area. More importantly, this method has rarely been studied and is potentially applicable to almost all water-bearing mines. When it is applied to other water-bearing roadways to identify groundwater sources, Bayes functions can be derived according to the local hydrogeological conditions, which may serve as a scientific basis for solving mine water inrush problems.