Subsampling of Regional-Scale Database for improving Multivariate Analysis Interpretation of Groundwater Chemical Evolution and Ion Sources

Multivariate statistics are widely and routinely used in the field of hydrogeochemistry. Trace elements, for which numerous samples show concentrations below the detection limit (censored data from a truncated dataset), are removed from the dataset in the multivariate treatment. This study now proposes an approach that consists of avoiding the truncation of the dataset of some critical elements, such as those recognized as sensitive elements regarding human health (fluoride, iron, and manganese). The method aims to reduce the dataset to increase the statistical representativeness of critical elements. This method allows a robust statistical comparison between a regional comprehensive dataset and a subset of this regional database. The results from hierarchical Cluster analysis (HCA) and principal component analysis (PCA) were generated and compared with results from the whole dataset. The proposed approach allowed for improvement in the understanding of the chemical evolution pathways of groundwater. Samples from the subset belong to the same flow line from a statistical point of view, and other samples from the database can then be compared with the samples of the subset and discussed according to their stage of evolution. The results obtained after the introduction of fluoride in the multivariate treatment suggest that dissolved fluoride can be gained either from the interaction of groundwater with marine clays or from the interaction of groundwater with Precambrian bedrock aquifers. The results partly explain why the groundwater chemical background of the region is relatively high in fluoride contents, resulting in frequent excess in regards to drinking water standards.


Introduction
The goal of multivariate statistical approaches is to discriminate relationships that exist within a dataset of n × m dimensions (i.e., n samples and m chemical elements).For the interpretation of the geochemical origins of groundwater, the purpose is to group samples that share geochemical similarities or to group chemical elements that characterize a group of samples together.Such multivariate techniques are likely to define the intrinsic characteristics of a dataset, as they essentially generate information that helps to formulate hypotheses [1].However, there are some limitations the user must address to make better use of the multivariate statistical tools.
Truncated distributions are one of the limitations where some data are missing in a dataset (censored data).In geochemistry, such truncations are caused by the detection limit (DL) of the analytical methods.The DL is a function of (1) the analytical apparatus used for the determination of the concentrations of the chemical elements and (2) the purity of the aqueous matrix.Therefore, the DL is not the same, depending on the chemical element analyzed (apparatus), and varies from sample to sample (aqueous matrix).
Censored data (i.e., under the DL) are usually replaced by a single value that is commonly DL/2 [2].When the amount of data under the DL is high, the DL/2 values significantly impact the correlation factor between two chemical elements.Most multivariate analysis techniques are based on the correlation matrix built from several chemical elements [1,3].Consequently, the use of the single value DL/2 for the concentration of chemical elements under the DL would induce a bias of results from multivariate analyses.[2] performed data experiments and showed that the performance of the statistical multivariate processing was weakened when the number of substitutions of DL by DL/2 was applied for approximately 30% of the dataset.This phenomenon is referred to as the noise of the analysis [2].
Critical elements that are recognized as sensitive for human health, such as fluoride, barium, manganese, iron, and aluminum [4], commonly fall into this group.These elements may occur in high concentration locally, and are thus of interest for tracking their sources and processes for accounting for their anomalous high concentration.
The objective of this study is to propose a new approach for investigating regional-scale databases of groundwater quality by reducing the impact of censored data for multivariate treatments (i.e., amount of data below DL).The proposed method involves representative subsampling of the regional dataset.Statistical treatments, such as hierarchical Cluster analysis (HCA) and the principal component analysis, were performed on the subset and successfully compared to previous results obtained on the regional-scale database [5].The subsampling approach generates more accurate treatments with less noise and consequently allows for optimized factorial analysis (FA) for certain chemical parameters of interest.With the subsampling approach, it was possible to better ascertain the evolution of the various types of groundwater and to address the sources of problematic elements, in particular fluorides.The dataset reduction method could be a new method for the study of hydrogeochemical tracers of water-rock interaction mechanisms where the detection limits are still too high to ensure the optimal use of statistical analysis techniques.

Hydrogeological Context
The basement of the Saguenay-Lac-Saint-Jean (SLSJ) region is composed of plutonic rocks belonging to the Precambrian Canadian Shield [6].Around Lake Saint-Jean and in the lowlands, there are several remnants of an Ordovician platform composed of a series of stratified sedimentary rocks, including siliciclastic strata, micritic limestones, and highly fossiliferous alternating limestones and shales [7] (Figure 1).
The regional topography is controlled by the Phanerozoic Saguenay Graben (180 Ma), which is approximately 30 km wide (Figure 2).The northern and southern walls of the Saguenay Graben are bounded by west-northwest fault systems that mark the limits between the highlands (up to 1000 m asl; rugged terrain dominated by thin glacial drift deposits and outcropping areas) and the lowlands (0 to 200 m asl).The regional physiography has controlled the emplacement and the formation of large accumulations of Quaternary deposits (sand, gravel, and clay-silt) to a thickness of up to 180 m in the central lowlands [8,9], where the most populated areas are located.

Figure 1.
Location of the study area and the geographical distribution of all the samples (regionalscale database, 321 samples) collected within the limit of the study area.Samples from the subset (Section 4.2) selected from the regional-scale database are presented using red squares and red circles.When the glaciers retreated approximately 10,000 years ago, the lowlands were invaded by the Laflamme Sea [10].This marine incursion deposited a semicontinuous extensive layer of deep-water sediments consisting of laminated clayey silt and gray silty clay [11], under which are locally-found confined layers of till, moraines, and eskers [12].Various facies of deltaic, littoral, and prelittoral sediments were then deposited during the phase of isostatic rebound that forced the retreat of the Laflamme Sea [11,13], and thus, these facies are stratigraphically superimposed on marine deposits.Location of the study area and the geographical distribution of all the samples (regional-scale database, 321 samples) collected within the limit of the study area.Samples from the subset (Section 4.2) selected from the regional-scale database are presented using red squares and red circles.Location of the study area and the geographical distribution of all the samples (regionalscale database, 321 samples) collected within the limit of the study area.Samples from the subset (Section 4.2) selected from the regional-scale database are presented using red squares and red circles.When the glaciers retreated approximately 10,000 years ago, the lowlands were invaded by the Laflamme Sea [10].This marine incursion deposited a semicontinuous extensive layer of deep-water sediments consisting of laminated clayey silt and gray silty clay [11], under which are locally-found confined layers of till, moraines, and eskers [12].Various facies of deltaic, littoral, and prelittoral sediments were then deposited during the phase of isostatic rebound that forced the retreat of the Laflamme Sea [11,13], and thus, these facies are stratigraphically superimposed on marine deposits.When the glaciers retreated approximately 10,000 years ago, the lowlands were invaded by the Laflamme Sea [10].This marine incursion deposited a semicontinuous extensive layer of deep-water sediments consisting of laminated clayey silt and gray silty clay [11], under which are locally-found confined layers of till, moraines, and eskers [12].Various facies of deltaic, littoral, and prelittoral sediments were then deposited during the phase of isostatic rebound that forced the retreat of the Laflamme Sea [11,13], and thus, these facies are stratigraphically superimposed on marine deposits.
A conceptual cross section presenting the assemblage for the stratigraphic units of the study area, including the highlands/lowlands demarcation, is shown in Figure 3.
According to [5], groundwater mineralization follows two pathways in the SLSJ region (Figure 3).Each of the two salinization paths exerts a major and different influence on the chemical signature of groundwater.First, groundwater in the crystalline bedrock naturally evolves from a recharge-type groundwater to a type of brackish groundwater due to water/rock interactions (plagioclase weathering and mixing with deep basement fluids).Second, the term "water/clay interactions" was introduced to account for a combination of processes: ion exchange, leaching of salt water trapped in the regional aquitard, or both.Mixing with fossil seawater might also increase the groundwater salinity.
Although the groundwater in the SLSJ area is largely of good quality, water with excessive trace element concentrations relative to Canadian drinking water quality guidelines [14], such as fluoride (>1.5 mg/L), barium (>1 mg/L), manganese (>0.05 mg/L), iron (>0.3 mg/L), and aluminum (>0.1 mg/L), have been identified [8].Fluor are among the chemical elements that have been excluded from the multivariate statistical analysis performed by [5].Approximately 20% of the samples from the regional-scale dataset have groundwater fluoride levels exceeding the World Health Organization [4] standards [9].Since Fluoride can have serious health consequences for humans [15], government and municipal authorities are interested in characterizing its source [8].
Geosciences 2019, 9, x FOR PEER REVIEW 4 of 33 A conceptual cross section presenting the assemblage for the stratigraphic units of the study area, including the highlands/lowlands demarcation, is shown in Figure 3.
According to [5], groundwater mineralization follows two pathways in the SLSJ region (Figure 3).Each of the two salinization paths exerts a major and different influence on the chemical signature of groundwater.First, groundwater in the crystalline bedrock naturally evolves from a recharge-type groundwater to a type of brackish groundwater due to water/rock interactions (plagioclase weathering and mixing with deep basement fluids).Second, the term "water/clay interactions" was introduced to account for a combination of processes: ion exchange, leaching of salt water trapped in the regional aquitard, or both.Mixing with fossil seawater might also increase the groundwater salinity.
Although the groundwater in the SLSJ area is largely of good quality, water with excessive trace element concentrations relative to Canadian drinking water quality guidelines [14], such as fluoride (>1.5 mg/L), barium (>1 mg/L), manganese (>0.05 mg/L), iron (>0.3 mg/L), and aluminum (>0.1 mg/L), have been identified [8].Fluor are among the chemical elements that have been excluded from the multivariate statistical analysis performed by [5].Approximately 20% of the samples from the regional-scale dataset have groundwater fluoride levels exceeding the World Health Organization [4] standards [9].Since Fluoride can have serious health consequences for humans [15], government and municipal authorities are interested in characterizing its source [8].

The Statistical Treatment of the Regional Database
Samples were analyzed for a total of 37 elements-major (6), minor (5), and trace elements (26)from a 321 sample dataset [5]. Figure 1 shows the location of the sampling sites.With conventional statistics, the trace elements, for which at least 30% of the samples have concentrations below the DL, should be removed from the dataset for interpretation (truncated dataset), as suggested [2].The detection limits for trace elements are such that only 10 chemical elements (K + , HCO3 − , Mg 2+ , SiO2, Na + , Ca 2+ , Ba 2+ , Sr 2+ , SO4 2− , and Mn 2+ ) had less than 25% censored data and were considered in the multivariate treatment [5].The other 23 chemical elements of the regional-scale database were excluded from the multivariate statistical analysis because they had >25% of data below the DL.

The Statistical Treatment of the Regional Database
Samples were analyzed for a total of 37 elements-major (6), minor (5), and trace elements (26)-from a 321 sample dataset [5]. Figure 1 shows the location of the sampling sites.With conventional statistics, the trace elements, for which at least 30% of the samples have concentrations below the DL, should be removed from the dataset for interpretation (truncated dataset), as suggested [2].The detection limits for trace elements are such that only 10 chemical elements (K + , HCO 3 − , Mg 2+ , SiO 2 , Na + , Ca 2+ , Ba 2+ , Sr 2+ , SO 4 2− , and Mn 2+ ) had less than 25% censored data and were considered in the multivariate treatment [5].The other 23 chemical elements of the regional-scale database were excluded from the multivariate statistical analysis because they had >25% of data below the DL.Therefore, critical elements, such as fluorides, barium, manganese, iron and aluminum, were useless for addressing their source.

Methodology
The proposed method aims to reduce the initial dataset to a representative unbiased subgroup to improve the detection ratio of some key chemical elements.The detection ratio is expressed as a percentage of chemical elements above the DL: % NDetected = [NDetected/NDatabase] × 100.The regional-scale database reduction method is based on the current knowledge of the natural evolution of groundwater (i.e., salinization process) of the study area [5], as detailed below.

Sample Grouping as Hydrogeochemical Poles (RHPs)
In the SLSJ region, the chemistry of groundwater evolves specifically as it flows through the various aquifer types.The regional-scale database was divided into two groups (Figure 1): samples collected in fractured rock aquifers, and samples taken from granular aquifers.Chemical interaction and flow are known to occur simultaneously at all scales of space and time from the surface to the deepest of the porous parts of the continental crust [16,17].The groundwater chemical evolution can be characterized by considering the water types that are typically found in different zones of groundwater flow systems [18][19][20].Regardless of the aquifer type, the evolution of the groundwater causes the modification of the bicarbonate-type hydrochemical facies (HCO 3 − ) into a chloride-type hydrochemical facies (Cl − ).Consequently, the regional-scale database was divided into four groups (i.e., regional hydrogeochemical poles (RHPs)) according to permutation of the water types combined with the two aquifer types.In each of the four groups, samples are selected to form the subset of data by considering the detection ratio for the key chemical elements in each pole.This means that if the detection ratio of a chemical parameter is higher in a subgroup than in the other subgroups, then this chemical parameter characterizes the sub-group.

Detection Ratio of the Key Chemical Elements
Each pole should contain a number of samples that better characterizes them.These samples contain the largest number of key chemical elements with less than 25% of the results below the DL.For each pole, the detection ratios of all the analyzed chemical elements (37 in total) are plotted on a multielement graph and are compared.A greater detection ratio for chemical elements specific to a pole is considered a criterion of representativeness for a given RHP.When the detection ratio of a trace chemical element stands out in an RHP, it is considered a key chemical element.Numerous key chemical elements can characterize an RHP.With key chemical elements identified for a pole, the samples containing the largest number of key chemical elements are selected.The aim is to increase the detection ratio of the key chemical elements until it exceeds the threshold (>25%).

Multivariate Analysis
The software Statistica version 6.1 [21] was used to perform multivariate statistical analyses.The Box-Cox power transformation [22] was applied to the subset to approximate a normal distribution.The chemical elements were standardized by subtracting the mean concentration of a given element from each measured concentration in the samples of the selected subset and by dividing them by the standard deviation of the distribution [23].While [2] suggested that the number of analytical values below DL should not exceed 30%, the threshold was lowered to 25% in this study.Consequently, 25% or less of the data below DL were replaced by DL/2.
Hierarchical Cluster analysis produces different Cluster samples, each being characterized by some specific chemical content.With principal component analysis (PCA), the emphasis is on explaining the total variance of the Clusters as linear functions of the chemical elements (i.e., principal components).Factorial analysis (FA) attempts to explain covariances [1] expressed as linear combinations of a small number of highly correlated variables (i.e., the chemical elements).The combinations of variables are called factors.

Identification of the Regional Hydrogeochemical Poles (RHPs)
By combining the two groundwater types (HCO  1. In Table 1, RHPs of the chloride-type groundwater (advanced stage of hydrochemical evolution; RHP 2 and RHP 4) show a total of dissolved solid (TDS) higher than the TDS of RHPs of the initial hydrochemical evolution stages (bicarbonate type; RHP 1 and RHP 3).Major chemical elements (sodium, magnesium, chlorides, sulfates, and boron) and some of the trace elements (bromides and ammonium) also show higher concentrations in chloride-type groundwater.The highest median for TDS is recorded in the RHPs of the advanced chemical stage of groundwater evolution in fractured rock aquifers (RHP 4).The lowest median for TDS is recorded in the RHPs of the initial chemical evolution stage of groundwater in granular media.Median concentrations for nitrate and phosphorus are higher in the RHP 1 and RHP 3 of the early stage of evolution (HCO 3 − ).In the case of nitrates, the highest median concentration is recorded in groundwater collected from granular aquifers.In the case of phosphorus, the highest median concentration is recorded in groundwater collected in fractured rock aquifers.Overall, Table 1 shows that there are significant differences in the chemical content of RHPs.
The detection ratios of the chemical elements for each of the four RHPs are plotted on a multielement graph (Figure 4).Along the x-axis, ionic species are sorted in descending order of abundance for the most diluted groundwater pole RHP 1 (HCO 3 _GranAq); thus, the three other RHPs are "normalized" by the most diluted groundwater endmember.A number of the 10 chemical parameters common to the four RHPs have less than 25% valid data (K + , HCO 3 − , Mg 2+ , SiO 2 , Na + , Ca 2+ , Ba 2+ , Sr 2+ , SO 4 2− , and Mn 2+ ).These 10 chemical elements can be directly used for further multivariate statistical processing, as was done by [5]. Figure 4 also illustrates that the chemical elements (x-axis) have a different number of detection ratios depending on the RHP where they belong.As observed in Figure 4, bromide (Br − ) and lithium (Li + ) present a significant detection ratio in brackish groundwater (RHPs 2 and 4).As presented in Section 4.2, the trace elements with high detection ratios for a given RHP define the key chemical elements; thus, lithium and bromide are considered key chemical elements in RHPs 2 (Cl_Gran) and RHP 4 (Cl_Rock).Nitrates are more frequently detected in groundwater of the bicarbonate type collected in granular aquifers (Figure 4), as well as copper and lead, and are then taken as the key chemical elements for RHP 1 (HCO 3 _Gran).Ammonium shows a detection ratio in favor of RHPs 3 and 4 (fractured rock groundwater) and is thus taken as a key chemical element for RHPs 3 and 4. Boron (B 3+ ) and molybdenum (Mo 6+ ) are more frequently detected in brackish groundwater (chloride type; RHPs 2 and 4) and are thus considered key chemical elements in RHPs 2 (Cl_Gran) and RHP 4 (Cl_Rock).The correlation matrices constructed for the four RHPs (Supplementary Materials S1) consistently reveal a strong positive correlation between molybdenum and fluoride.The redox conditions will influence the content of metals with several valences, including iron and manganese.The approach used aims to find the causes of high values in iron and manganese in the groundwater of the region [8].From Figure 4, manganese already has a detection ratio greater than 75%.Molybdenum, fluoride, and iron were taken as key chemical elements for all RHPs, considering their health importance for the study area.The key chemical elements for each RHP are summarized in Table 2.  multielement graph (Figure 4).Along the x-axis, ionic species are sorted in descending order of abundance for the most diluted groundwater pole RHP 1 (HCO3_GranAq); thus, the three other RHPs are "normalized" by the most diluted groundwater endmember.A number of the 10 chemical parameters common to the four RHPs have less than 25% valid data (K + , HCO3 − , Mg 2+ , SiO2, Na + , Ca 2+ , Ba 2+ , Sr 2+ , SO4 2− , and Mn 2+ ).These 10 chemical elements can be directly used for further multivariate statistical processing, as was done by [5]. Figure 4 also illustrates that the chemical elements (x-axis) have a different number of detection ratios depending on the RHP where they belong.As observed in Figure 4, bromide (Br − ) and lithium (Li + ) present a significant detection ratio in brackish groundwater (RHPs 2 and 4).As presented in Section 4.2, the trace elements with high detection ratios for a given RHP define the key chemical elements; thus, lithium and bromide are considered key chemical elements in RHPs 2 (Cl_Gran) and RHP 4 (Cl_Rock).Nitrates are more frequently detected in groundwater of the bicarbonate type collected in granular aquifers (Figure 4), as well as copper and lead, and are then taken as the key chemical elements for RHP 1 (HCO3_Gran).Ammonium shows a detection ratio in favor of RHPs 3 Samples containing the greatest number of key chemical elements are selected according to the proposed approach presented in Section 4.More specifically, the array of analytical results for each sample in each RHP is transformed into a binary matrix, in which the concentrations of chemical elements below the detection limit are replaced by 0, and the analytical results above the detection limit are replaced by 1.
The key chemical elements (Table 2) are weighted by a factor of 10 in the binary matrix.Numbers (replacement values: 0, 1, or 10) are summed for each sample in each RHP.The samples are selected among the best scores in each RHP and are gathered to form the subset.
The final grouping of the samples selected to form the subset has the effect of reducing the detection ratio of some key chemical elements, especially those that are specific to one or two RHPs (Table 2).In the newly formed subset, concentrations above the DL for bromide, lithium, nitrate, copper, and lead could not be increased above the threshold (<25% of data below DL).
As a result, 16 samples were selected for RHP 1, 9 for RHP 2, 16 for RHP 3, and 10 for RHP 4 to form a subset of 51 samples.Table 3 presents the descriptive statistics (values above the detection limit-N; the median, the first-25; and third-75 quartiles, the maximum-Max; and the minimum-Min) for the 51 samples of the subset.The chemical elements that are used for multivariate statistical processing are shown in bold in Table 3 and marked with a cross in the "Multivariate" column.The physical and chemical data for the subset are presented in Table 4.The 51 samples have a good distribution over the study area (Figure 1).The ionic concentrations of the subset are compared to the one of the regional-scale databases in Section 5.2.

Chemical Characteristics of the Subset versus the Regional-Scale Database
Figure 5 shows the comparison between the detection ratios for the regional-scale database (N = 321 samples) and the subset (N = 51 samples).On the x-axis, the ionic species are sorted in descending order of detection ratio for the regional-scale database.For the subset, four new chemical elements have now risen above 75% of data > DL: fluoride, ammonium, iron, and molybdenum.A total of 18 chemicals could be justifiably included in the statistical multivariate treatment (valid data exceeding 75%).Only the detection ratio for aluminum and chromium has decreased slightly.

Chemical Characteristics of the Subset versus the Regional-Scale Database
Figure 5 shows the comparison between the detection ratios for the regional-scale database (N = 321 samples) and the subset (N = 51 samples).On the x-axis, the ionic species are sorted in descending order of detection ratio for the regional-scale database.For the subset, four new chemical elements have now risen above 75% of data > DL: fluoride, ammonium, iron, and molybdenum.A total of 18 chemicals could be justifiably included in the statistical multivariate treatment (valid data exceeding 75%).Only the detection ratio for aluminum and chromium has decreased slightly.To evaluate the effect of the dataset reduction, the ionic content difference of the regional-scale database is compared to the subset of samples.Figure 6 shows the comparison between the median concentration in ppm for each chemical element of the database and the subset on a log scale (y-axis).Chemical elements (x-axis) are sorted in descending order of differences between median concentrations (∆ median concentration -m.c.-expressed in ppm) for the 321 sample database and the 51 sample subset.Variations in the medians of the two datasets calculated in percentages are presented in the supplementary material (Supplementary Materials S2).To evaluate the effect of the dataset reduction, the ionic content difference of the regional-scale database is compared to the subset of samples.Figure 6 shows the comparison between the median concentration in ppm for each chemical element of the database and the subset on a log scale (y-axis).Chemical elements (x-axis) are sorted in descending order of differences between median concentrations (∆ median concentration -m.c.-expressed in ppm) for the 321 sample database and the 51 sample subset.Variations in the medians of the two datasets calculated in percentages are presented in the supplementary material (Supplementary Materials S2).
By reducing the regional-scale database, the ionic content of the 51 sample subset increased, although the median concentration (m.c.) of some chemical elements remained unchanged or decreased.Chemical elements with decreasing content are vanadium (88% of the m.c. of the 321 sample database), uranium (86% of the m.c. of the 321 sample database), phosphate (71% of the m.c. of the 321 sample database), and copper (63% of the m.c. of the 321 sample database).Selenium, beryllium, titanium, and bismuth are all <DL in the 51 sample subset.For silver, nitrate, and nickel, the m.c. for the 51 sample subset is equal to the m.c. for the 321 sample database.The most significant increase is for chloride (370% of the m.c. of the 321 sample database) and ammonium (288% of the m.c. of the 321 sample database).The correlation matrices for the subset and the database (Supplementary Materials S1 and S3) showed that ammonium was highly correlated with the major elements (K + , HCO 3 − , Mg 2+ , Na + , Ca 2+ , Cl − ,and SO 4 2− ), as well as some minor elements (Ba 2+ , Sr 2+ , and B 3+ ) and some trace elements (Li 2+ , Br − and Pb 2+ ).The possible origin of dissolved ammonium is discussed based on factorial analysis results in Section 6.By reducing the regional-scale database, the ionic content of the 51 sample subset increased, although the median concentration (m.c.) of some chemical elements remained unchanged or decreased.Chemical elements with decreasing content are vanadium (88% of the m.c. of the 321 sample database), uranium (86% of the m.c. of the 321 sample database), phosphate (71% of the m.c. of the 321 sample database), and copper (63% of the m.c. of the 321 sample database).Selenium, beryllium, titanium, and bismuth are all <DL in the 51 sample subset.For silver, nitrate, and nickel, the m.c. for the 51 sample subset is equal to the m.c. for the 321 sample database.The most significant increase is for chloride (370% of the m.c. of the 321 sample database) and ammonium (288% of the m.c. of the 321 sample database).The correlation matrices for the subset and the database (Supplementary Materials S1 and S3) showed that ammonium was highly correlated with the major elements (K + , HCO3 − , Mg 2+ , Na + , Ca 2+ , Cl − ,and SO4 2− ), as well as some minor elements (Ba 2+ , Sr 2+ , and B 3+ ) and some trace elements (Li 2+ , Br − and Pb 2+ ).The possible origin of dissolved ammonium is discussed based on factorial analysis results in Section 6.
Compared to the original 321 sample dataset, the 51 sample subset contains concentrated samples for major (cation and anion), minor, and some trace metallic elements and is depleted for some ultra-trace elements (selenium, beryllium, titanium, and bismuth).The overall enrichment of the subset suggests that groundwater chemistry of the subset is more mature in terms of water/rock interactions.Thus, the subset is suitable for studying the link between groundwater mineralization processes (i.e., salinization) and the presence of specific dissolved ions.

Hierarchical Cluster Analysis Results
In hierarchical Cluster analysis (HCA), the number of Clusters is determined by the position of the phenon line that cuts across the resulting dendrogram (Figure 7A).In our HCA, the distance linkage (Dlink) of the samples is expressed as a percentage of the maximum distance (Dmax) between the most dissimilar samples ([Dlink/Dmax] × 100).At this scale, 100% linkage (Dlink = Dmax) would Compared to the original 321 sample dataset, the 51 sample subset contains concentrated samples for major (cation and anion), minor, and some trace metallic elements and is depleted for some ultra-trace elements (selenium, beryllium, titanium, and bismuth).The overall enrichment of the subset suggests that groundwater chemistry of the subset is more mature in terms of water/rock interactions.Thus, the subset is suitable for studying the link between groundwater mineralization processes (i.e., salinization) and the presence of specific dissolved ions.

Results
Comparison between Subset and Regional-Scale Database

Hierarchical Cluster Analysis Results
In hierarchical Cluster analysis (HCA), the number of Clusters is determined by the position of the phenon line that cuts across the resulting dendrogram (Figure 7A).In our HCA, the distance linkage (Dlink) of the samples is expressed as a percentage of the maximum distance (Dmax) between the most dissimilar samples ([Dlink/Dmax] × 100).At this scale, 100% linkage (Dlink = Dmax) would indicate that all samples are grouped into a single Cluster that corresponds to the 51-sample subset.For this study, four Clusters (Dlink < 30% of Dmax) were selected (Figure 7A).Major cation and anion concentrations for the four Clusters were projected on a log-scale spider diagram in ppm (Figure 7B).Table 5 shows the median concentrations for the physical and analytical parameters, as well as the water type, aquifer type, and hydrogeological context of the Clusters.
indicate that all samples are grouped into a single Cluster that corresponds to the 51-sample subset.For this study, four Clusters (Dlink < 30% of Dmax) were selected (Figure 7A).Major cation and anion concentrations for the four Clusters were projected on a log-scale spider diagram in ppm (Figure 7B).Table 5 shows the median concentrations for the physical and analytical parameters, as well as the water type, aquifer type, and hydrogeological context of the Clusters.The results obtained using HCA are like those described in [5] for the regional-scale database (321 samples).Both studies allow for the identification of a Cluster dominated by fractured rock groundwater samples, two Clusters of saline groundwater samples, and at least a Cluster dominated by groundwater samples of the bicarbonate type.Thus, the reduction of the original dataset into a subset does not modify the Clustering obtained with HCA processing.Some distinctions between the two studies are, however, of interest.By positioning the phenon line to obtain four Clusters, the Clusters obtained with the subset of data have a lower percentage of resemblance (27% against 52% in [5]) to the regional-scale database.This observation suggests that the differences between the groups are better defined using the data reduction approach of this study, having the effect of reducing the general geochemical similarities between the samples.This result can be understood as the reduction of background noise, or the identification within the regional-scale database of more representative samples of hydrogeochemical endmembers (early stage or advanced stage of evolution) of the regional hydrogeochemical poles (RHPs).The background noise would then correspond to all types of mixing phenomena that lead to a certain homogenization of the general hydrogeochemistry.Another difference between the results of the two studies is in the Clustering of bicarbonate-type groundwater samples.Whereas [5] identify only one Cluster of bicarbonated groundwater (the most populated, 274 samples out of 321), the HCA applied to the subset of data allows for the definition of two Clusters (Cluster 1 and Cluster 2).Reducing the dataset, therefore, brought out two chemical signatures specific to bicarbonate groundwaters in the study area, Cluster 1 dominated by calcium and Cluster 2 dominated by sodium.The transformation of calcium hydrofacies into sodium hydrofacies is interpreted as one of the first stages in the evolution of groundwater, particularly in crystalline fractured aquifers [24].Clusters 1 and 2 (bicarbonate groundwater) both have a similar proportion of groundwater samples from crystalline rock and granular deposits (Table 5).In the SLSJ region, the evolution of calcium bicarbonate water to sodium bicarbonate water is also effective in the groundwater contained in confined granular deposits, where marine clay acts as a cation exchanger by releasing sodium in exchange for calcium [5,25].Clusters 1 and 2, respectively, represent two successive stages in the evolution of recharge water: on the one hand, in fractured rock aquifers, and on the other hand, in granular aquifers confined by marine clays.
The use of HCA with the reduced dataset also allowed for the grouping of samples of brackish water taken from confined granular aquifers (Cluster 4; Table 5).In [5], no distinction was possible between brackish groundwater from fractured rock aquifers and from confined granular aquifers.

Principal Component Analysis Results
Samples, labeled with their Cluster number as defined in Figure 7A, are plotted on the PCA correspondence circle (Figure 8).The horizontal axis of the correspondence circle corresponds to the first principal component (Component 1; 39.8% of the total variance of the dataset), and the vertical axis corresponds to the second principal component (Component 2; 13.5% of the total variance of the dataset).In Figure 8A The results obtained with the PCA also show a good correspondence with the study of [5] performed on the regional dataset.In both studies, the chemical elements calcium, barium, strontium, manganese and sulfate are associated with Cluster 3 (water-rock interactions), and magnesium, sodium, boron, silica, potassium, and bicarbonate are associated with Cluster 4 (water-clay interaction type).Both studies suggest that an increase of the salinity enhances the fingerprint of the hydrogeological context (nature of the aquifer and hydraulic conditions) on the major ions of groundwater chemistry.However, the regional-scale database reduction approach described in this paper allows for new insights into the presence of certain undesirable ions, such as fluorides.Fluorides are associated with Cluster 4, suggesting that some of the fluorides come from environments dominated by water-clay interactions, as suggested by [5].The water-clay interactions are characterized by a combination of The results obtained with the PCA also show a good correspondence with the study of [5] performed on the regional dataset.In both studies, the chemical elements calcium, barium, strontium, manganese and sulfate are associated with Cluster 3 (water-rock interactions), and magnesium, sodium, boron, silica, potassium, and bicarbonate are associated with Cluster 4 (water-clay interaction type).Both studies suggest that an increase of the salinity enhances the fingerprint of the hydrogeological context (nature of the aquifer and hydraulic conditions) on the major ions of groundwater chemistry.
However, the regional-scale database reduction approach described in this paper allows for new insights into the presence of certain undesirable ions, such as fluorides.Fluorides are associated with Cluster 4, suggesting that some of the fluorides come from environments dominated by water-clay interactions, as suggested by [5].The water-clay interactions are characterized by a combination of processes: ion exchange, salt leaching of the regional aquitard made of marine clays, or both.It would be relevant to investigate how these mechanisms could lead to an increase of fluorides in groundwater.Clearly, the marine clay found in the SLSJ region is suspected of releasing fluoride into the groundwater, where confining conditions of the aquifers prevail.This assumption is also valuable for the clays originating from the last marine invasion (approximately 10,000 years ago) elsewhere in Quebec Province.
The PCA results also give molybdenum with Cluster 3 (water-rock interactions) and ammonium in association with Cluster 4 (water-clay interactions).Again, these results raise questions about the possible links between these elements and their hydrogeochemical context.

New Graphical Fields and Interpretations Improvement
Walter et.al.[5] discussed the evolution of groundwater using binary graphs and the Piper diagram.Using the same graphs, the interpretation of the hydrogeochemical evolution remains the same but appears in a much more revealing way.A good example is given in Figure 9, the log-log [Ca 2+ vs. Na + ] plot.This figure is similar to the figure presented in [5], with the difference that the Clusters (1 to 4) in Figure 9 correspond to the Clusters identified with the reduced dataset method developed in this study.
The alignment of the Cluster samples observable in Figure 9 indicates a gradual progression of the bicarbonate-type endmembers (Clusters 1 and 2) toward the brackish endmembers (Clusters 3 and 4).Pathways for bicarbonate-type groundwater evolving to brackish groundwater can be proposed by linking Clusters 1 (recharge waters) to Clusters 3 and 4 (brackish groundwater) going through Cluster 2 samples (recharge to intermediate groundwater).Calcium and sodium content evolution pathways presented in Figure 9 define new fields that can be compared to the dilution line of seawater but in terms of water/rock interactions (Pathway 2; Figure 9) and water/clay interactions (Pathway 2; Figure 9).
The mineralization of groundwater usually corresponds to a process of salinization, since the increase of the residence time of the water in the aquifer enhances the dissolution of salt minerals contained in the solid fraction of the porous medium.This causes the groundwater to evolve from a dilute calcium bicarbonate type in recharge areas toward a more concentrated sodium chloride or calcium chloride type along the flow path.The method proposed in this article makes it possible to statistically identify, in a regional database, a series of samples representing the successive stages of the natural evolution of groundwater.These samples were taken from different locations in the study area, but their chemistry corresponds to the possible evolution of groundwater along a regional flow line according to [19] (salinization path 1), or from interaction with marine clay or mixing with seawater (salinization path 2).
Other samples of the regional-scale database previously discarded to form the subset also follow the pathways (Figure 9).As a first approximation, based on their calcium and sodium content, these samples can then be designated as belonging to one of the three main hydrogeochemical contexts: recharge-type groundwater (blue shape in background), groundwater of the water-rock interactions type (red shape in background), and groundwater of the water-clay interactions type (green shape in background).This result gives a first approximation of the hydrogeochemical context of groundwater using the calcium and sodium content of groundwater.Other samples of the regional-scale database previously discarded to form the subset also follow the pathways (Figure 9).As a first approximation, based on their calcium and sodium content, these samples can then be designated as belonging to one of the three main hydrogeochemical contexts: recharge-type groundwater (blue shape in background), groundwater of the water-rock interactions type (red shape in background), and groundwater of the water-clay interactions type (green shape in background).This result gives a first approximation of the hydrogeochemical context of groundwater using the calcium and sodium content of groundwater.
Groundwater chemical evolution can also be discussed by considering major cations and major anions of groundwater using a Piper plot.In Figure 10, the samples are marked with symbols representing the first approximation of the hydrogeochemical context (recharge-type groundwater, water-rock-type interactions, and water-clay-type interactions) interpreted using a Ca 2+ vs. Na + binary plot (Figure 9).Seawater is also shown on the Piper diagram (Figure 10).Clusters presented in Figure 9 are those identified using the reduced set of data.
The proposed method leads to the improvement of the interpretations of the Piper diagram.Based on the reduced dataset, new graphical fields of the Piper diagram are defined for the study area (Figure 10B).Groundwaters that evolved in the context of water-rock interactions are found in the upper part of the lozenge surface, and in the lower part, it is the samples in the context of waterclay interactions that dominate.The samples from Cluster 2 mark a very clear intermediate position between the samples from Cluster 3 and Cluster 4. At the origin of this demarcation, it is the Groundwater chemical evolution can also be discussed by considering major cations and major anions of groundwater using a Piper plot.In Figure 10, the samples are marked with symbols representing the first approximation of the hydrogeochemical context (recharge-type groundwater, water-rock-type interactions, and water-clay-type interactions) interpreted using a Ca 2+ vs. Na + binary plot (Figure 9).Seawater is also shown on the Piper diagram (Figure 10).Clusters presented in Figure 9 are those identified using the reduced set of data.
The proposed method leads to the improvement of the interpretations of the Piper diagram.Based on the reduced dataset, new graphical fields of the Piper diagram are defined for the study area (Figure 10B).Groundwaters that evolved in the context of water-rock interactions are found in the upper part of the lozenge surface, and in the lower part, it is the samples in the context of water-clay interactions that dominate.The samples from Cluster 2 mark a very clear intermediate position between the samples from Cluster 3 and Cluster 4. At the origin of this demarcation, it is the transformation of the calcium facies toward a sodium facies.This intermediate position makes the transition between the signature of recharge-type waters and the brackish groundwater.Based on the interpretations of the hydrogeochemical context from Figure 9, the losangic surface can be split into two parts, by joining the recharge-type pole and the seawater pole (Figure 10B).transformation of the calcium facies toward a sodium facies.This intermediate position makes the transition between the signature of recharge-type waters and the brackish groundwater.Based on the interpretations of the hydrogeochemical context from Figure 9, the losangic surface can be split into two parts, by joining the recharge-type pole and the seawater pole (Figure 10B).A mixing zone is interpreted at the boundary between the two parts of the lozenge surface.The hydrogeochemical context has little impact on the magnesium and sulfate content of groundwater, although magnesium has a slight affinity for water-rock interactions and sulfates preferentially characterize recharge-type waters and water-rock interactions locally.

Factor Analysis
Normalized Varimax rotation was applied to the factors [1,3,23].The Kaiser criterion [23][24][25][26] helps to determine the number of factors to be retained; only those components with Kaiser criterion greater than 1 were retained.Factors were calculated on the symmetrical covariance matrix computed for 16 variables (K + , HCO 3 − , Mg 2+ , SiO 2 , NH 4 + , F − , B 3+ , Mo 6+ , Na + , Ca 2+ , Ba 2+ , Sr 2+ , SO 4 2− , Fe 2+ , and Mn 2+ ).Chloride has been consciously discarded from the factorial analysis after observation of the correlation matrix (Supplementary Materials S1 and S3), as Cl − was highly correlated with all the other major elements and some trace elements and would induce a redundancy.The correlation matrix for the 51-sample subset also revealed that aluminum and zinc had no significant correlation with other chemical elements.To prevent noise introduction in the multivariate statistical treatment, aluminum and zinc have been discarded as well.
The first four factors present loadings (i.e., explained variance; Table 6) greater than 1 and account for 78.8% of the total variance of the subset.The first factor (25.1% of the total variance) is characterized by high loadings for K + , HCO

Discussion
The statistical treatments generated four clusters of water compositions in link with aquifer types and water evolution.However, numerous processes and sources can account in part for critical health elements.In particular, (1) the evolution of the recharge water, (2) the fingerprinting of the Laflamme water sea, (3) the Precambrian basement contribution, and (4) the specific sources of fluorides are addressed below linked with statistical grouping (cluster and factor).
6.1.Evolution of Recharge Groundwater (Clusters 1 and 2; Factor 4) Clusters 1 and 2 belong to a recharge-type groundwater.The fourth factor explains only 10.8% of the total variance of the data set, and thus characterizes only a few samples among the 51 sample subset.In Figure 11c,e,f, sample scores for the fourth factor are projected onto the y-axis.This factor presents high loadings for Fe 2+ and Mn 2+ (Table 5).These are redox-sensitive species [27,28], and thus, their concentrations in groundwater are primarily influenced by the amount of available dissolved oxygen [29].In the presence of dissolved oxygen, principally near surface environments, iron and manganese will be solubilized by oxidizing and acidic conditions that tend to increase the Fe 2+ and Mn 2+ content in groundwater.This finding may explain why Factor 4 predominantly influences (CaNa)-HCO 3 groundwater (Clusters 1 and 2; Figure 11c,e,f). .Two series of vectors represent the evolution of groundwater: one set of vectors represents the evolution of the recharge groundwater (fresh water influx), and the second set of vectors corresponds to the evolution of groundwater from the mixing and dilution zones toward the brackish groundwater (maturation gradient).

Fingerprinting of the Precambrian Shield Brines (Cluster 3; Factor 3)
Chemical data for brines found at depth in the Canadian Shield were compiled from the existing literature to compare the geochemical trend of the samples from this study.A total of 137 chemical analyses from five studies were included: [33], 33 samples; [34], 24 samples; [35], 36 samples; [36], 35 samples; and [24], 17 samples.The compiled data are presented as Supplementary Materials S5.
In Figure 11b,d,f, samples from Cluster 3 are predominantly influenced by the third factor with loadings dominated by Ca 2+ , Ba 2+ , Sr 2+ , and SO4 2− (Table 5).Log-log plots of Ca 2+ , Ba 2+ , Sr 2+ , and SO4  In Figure 11b,d,f, samples from Cluster 3 are predominantly influenced by the third factor with loadings dominated by Ca 2+ , Ba 2+ , Sr 2+ , and SO 4 2− (Table 5).Log-log plots of Ca 2+ , Ba 2+ , Sr 2+ , and SO 4 2− versus Cl − concentrations (Figure 12A-D) for the subset of samples in this study were coupled with the data from other Canadian Shield brines.Frape et al. [36] determined that the geochemistry of Ca 2+ plays a major role in controlling the chemistry of mineralized groundwater in the crystalline bedrock of the Canadian Shield.Frape et al. [36] described very old stagnant groundwater that may have undergone prolonged chemical alteration since its original emplacement.The similar Ca 2+ /Cl − trend for the Cluster 3 and the Precambrian Shield brines (PSB) (Figure 12A) suggests a common origin for Ca 2+ and Cl − .
In Figure 12A,D, Ca 2+ and Sr 2+ concentrations for Clusters 3 and 4 show similar trends, suggesting a common origin for Sr 2+ and Ca 2+ in Cluster 3.For Sr 2+ , as well as Ba 2+ , it is commonly found incorporated as a trace element in the crystalline structure of feldspar plagioclase minerals [37,38].In Figure 12B, the trend for Ba 2+ is different from the trend for Sr 2+ and Ca 2+ .Edmunds and Smedley [39] showed that barite solubility controls Ba 2+ concentration and that Ba 2+ quickly reaches saturation or supersaturation in recharge groundwater.In Figure 12B, Ba 2+ concentrations for Clusters 1 and 2 reach levels similar to the concentrations for Clusters 3 and 4, suggesting that maximal Ba 2+ concentrations are attained in recharge-type groundwater.
Unlike Ba 2+ , Sr 2+ is not limited in terms of solubility [39].Some of the Sr 2+ may be added by the dissolution of anhydrite or gypsum.However, SO 4 2− vs. Cl − (Figure 12C) shows a trend different from Sr 2+ vs. Cl − (Figure 8d), refuting this possibility.According to [24], most of the added Cl − and some SO 4 2− in groundwater collected in selected granitic, gabbroic, and gneissic plutons in the Canadian Shield are derived from the rock matrix.Ions are present either as soluble salts in fluid inclusions and along grain surfaces [40] or as structurally bound elements in micas and amphiboles [41].Sulfate may then be more derived from the oxidation of sulfide minerals in the rock matrix rather than due to the presence of gypsum-infilling minerals in the fractures.For Cluster 3 samples, the strong relationship of Ca 2+ and Sr 2+ versus Cl − concentrations, the Ba 2+ content for recharge groundwater, and the similar trends with the Precambrian Shield brines (PSB), all suggest that Cluster 3 corresponds to some diluted endmembers with a composition such as the PSB compiled in this study.Thus, Cluster 3 belongs to a PSB-type of groundwater.A ratio Ca 2+ /Sr 2+ ≈ 40 is proposed here to identify the PSB groundwater type.  .Seawater (SW) dilution lines were defined using seawater ratios from Goldberg [42].The strong linear correlation between calcium, strontium, and chloride in PSB groundwater suggests a mass ratio Ca 2+ /Sr 2+ ≈ 40 for salinization related to basement fluids.

Sources of Dissolved Fluoride in Groundwater (Factor 2)
The second factor requires investigation as it may provide clues to the origin of dissolved fluoride in groundwater.Factor 2 is characterized by high loadings of F − , B 3+ , Mo 6+ , and Na + (Table 6).
In the study area, F − concentrations of 20% of the sampled wells [9] exceed both the Canadian drinking water quality guidelines [14] and the WHO standards [4].Sample scores for Factor 2 are projected on the x-axis in Figure 11d,e and on the y-axis in Figure 11a.Factor 2 influences groundwater from Clusters 2, 3, and 4. The source of F − is ambiguous.Since more than one group is influenced by the second factor.The covariations of the chemical elements that compose Factor 2 suggest multiple processes of interaction between groundwater and the solid matrix of aquifers or aquitards for explaining the origin of fluoride in groundwater.Numerous sources and processes have been proposed and are detailed below.
Boron (B 3+ ) and Mo 6+ are found incorporated into the crystalline structure of tourmaline and molybdenite, two very common minerals in pegmatite of Grenville Province (Carignan and Gariépy, 1992) that constitute some of the Precambrian rock of the study area.Despite the low solubility of these minerals, the long-term interaction between groundwater and granitic rocks could account for their source, hence explaining the covariations of B 3+ , Mo 6+ , and F − .Recent studies showed that micas in granitic rocks are the main source of dissolved fluoride in groundwater [43][44][45].Experiments with regional rocks should be performed to test if this hypothesis is possible for the study area.
Boron has been described in the SLSJ region where shales are present [46].Mo 6+ is also frequently found in fossil fuels [38].Occurrences of gas and oil in shales and limestones have been identified in the study area [8].The combination of B 3+ and Mo 6+ could indicate that F − originates from the groundwater interaction with organic matter-rich sedimentary rocks present in the region.Investigations on a local scale would help to test this hypothesis.
Walter [45] explained that the geochemical behavior and concentration of fluorine in groundwater are controlled by the existence of Ca-bearing minerals and by the fluorite precipitation as a function of the absolute concentration of Ca 2+ ions in water.In the study region, Walter et al. [5] suggested that the Ca 2+ water -Na + mineral ion exchange processes may lead to a decrease in the groundwater Ca 2+ content.As fluorite (CaF 2 ) equilibrium controls F − and Ca 2+ concentrations [47], it is expected that the removal of Ca 2+ from the water in exchange for Na + will increase the fluoride (F − ) concentration in presence of fluorite.A slight increase in F − is observed in Table 5 between Clusters 1 and 2 (Recharge-like), and would suggest the presence of fluorite in the system.
The Ca 2+ water -Na + mineral ion exchange processes also support the linear correlation of Na + and F − in Factor 2. Ion exchange is believed to occur early in the evolution of groundwater in bedrock aquifers [24] but also particularly characterizes groundwater from confined aquifers [5].
Overall, dissolved F − can be generated either from the interaction of groundwater with marine clay or from the interaction of groundwater with bedrock aquifers in the early stage of evolution of recharge type of groundwater, also suggesting that a great environment for enhancing groundwater fluoride content would be the crystalline bedrock aquifers confined by the marine clay aquitard.

Conclusions
This study proposed a new approach for diminishing the negative impact of censored values in large databases for multivariable treatments to better constrain chemical evolution and sources of ions for groundwater.The approach involves first data grouping accounting for regional hydrogeochemical poles (RHP).Second, key chemical elements are identified in each of the groups by considering the detection ratio, which is obtained by comparing to the analytical detection limit as % NDetected = [NDetected/NDatabase] × 100.Selected key elements have high detection ratios in a specific RHP.
The groups of the subset were treated by a combination of hierarchical Cluster analysis (HCA), principal component analysis (PCA), binary plot investigations, and Piper diagrams to decipher geochemical trends and processes.The results obtained with the subset are consistent with those of [5] obtained with a regional-scale database.Furthermore, the subset allowed for factor analysis (FA), bringing new insights into the chemical evolution of groundwater through the salinization process.Four factors accounted for 78.8% of the total variance.
Factor 1, with high loadings for K + , HCO 3 − , Mg 2+ , SiO 2 , NH 4 + , and SO 4 2− , characterizes groundwater in contact with the regional clay aquitard.The chemical transformation of microcline into illite and CO 2 production control the chemistry of groundwater in confined aquifers.Factor 2 has loadings for F − , B 3+ , Mo 6+ , and Na + .Dissolved F − can be generated either from the interaction of groundwater with marine clay or from bedrock aquifers (crystalline or sedimentary aquifers).Part of the discussion of this article focuses on hypothetical regional sources of fluoride by studying the FA results obtained using the subset of data.The results partly explains why the groundwater chemical background of the region has relatively high levels in fluoride, resulting in frequent excess of drinking water standards.Subsequent studies may now attempt to test the proposed processes experimentally (batch experiments) and numerically (speciation and geochemical modeling).
The third factor is dominated by Ca 2+ , Ba 2+ , Sr 2+ , and to a lesser extent, SO 4 2− .The third factor characterizes samples corresponding to a diluted endmember of the Precambrian Shield brines (PSBs).
The diluted endmember of the PSBs in this study have been compared with PSBs compiled from the literature on log-log plots of Ca 2+ , Ba 2+ , SO 4 2− , and Sr 2+ versus Cl − concentrations.A mass ratio Ca 2+ /Sr 2+ ≈ 40 is proposed to identify groundwater that experienced geochemical processes leading to a composition, such as the Canadian Shield brines.Factor 4 is strongly influenced by Fe 2+ and Mn 2+ and characterizes the recharge-like type of groundwater (Cluster 1 and 2), where in the first stage of groundwater evolution, iron precipitate while manganese remains dissolved.In this study, samples from the subset belong to the same flow path from a statistical point of view, not from a geographical point of view.Others samples from the database can then be compared with the samples of the subset and discussed according to their stage of evolution.Our results demonstrated that the dataset reduction could be a better approach for the study of water-rock interaction mechanisms based on truncated distribution.Other selection methods must be tested.Rather than detection limits, quartiles or water quality standards could be used as criteria for deciphering patterns in large-scale databases.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2076-3263/9/3/139/s1,S1: Correlation matrices constructed for the 4 regional hydrogeochemical poles.S2: Differences between median concentrations for the 321-sample database and the 51-sample subset (∆ m.c) expressed in percentages.S3: Correlation matrices for the subset and the database.S4: Combined Fe 2+ and Mn 2+ Eh-pH diagrams for Clusters 1 to 4. S5: Chemical data for brines found at depth in the Canadian Shield compiled from the existing literature.

Figure 2 .
Figure 2. Digital elevation model of the study area (administrative limits) showing the highlands and the lowlands delimited by the Saguenay Horst-Graben faults.The northwestern part of the study area is characterized by a half-graben structure.

Figure 1 .
Figure1.Location of the study area and the geographical distribution of all the samples (regional-scale database, 321 samples) collected within the limit of the study area.Samples from the subset (Section 4.2) selected from the regional-scale database are presented using red squares and red circles.

Geosciences 2019, 9 , 33 Figure 1 .
Figure1.Location of the study area and the geographical distribution of all the samples (regionalscale database, 321 samples) collected within the limit of the study area.Samples from the subset (Section 4.2) selected from the regional-scale database are presented using red squares and red circles.

Figure 2 .
Figure 2. Digital elevation model of the study area (administrative limits) showing the highlands and the lowlands delimited by the Saguenay Horst-Graben faults.The northwestern part of the study area is characterized by a half-graben structure.

Figure 2 .
Figure 2. Digital elevation model of the study area (administrative limits) showing the highlands and the lowlands delimited by the Saguenay Horst-Graben faults.The northwestern part of the study area is characterized by a half-graben structure.

Figure 3 .
Figure 3. Conceptual cross-section showing the two paths followed by groundwater mineralization towards a brackish endmember in the Saguenay-Lac-Saint-Jean region (modified from Walter et al. [5]).

Figure 3 .
Figure 3. Conceptual cross-section showing the two paths followed by groundwater mineralization towards a brackish endmember in the Saguenay-Lac-Saint-Jean region (modified from Walter et al. [5]).

Figure 5 .
Figure 5.Comparison between the frequencies of detection of the chemical elements for the 321 sample dataset and the 51 sample subset.

Figure 5 .
Figure 5.Comparison between the frequencies of detection of the chemical elements for the 321 sample dataset and the 51 sample subset.

Figure 6 .
Figure 6.Comparison between the median concentration in ppm for each chemical element of the database and the subset on a log scale (y-axis).Chemical elements (x-axis) are sorted in descending order of differences between median concentrations for the 321 sample database and the 51 sample subset.Compared to the 321 sample dataset, the ionic content of the 51 sample subset increased, although the m.c. of some chemical elements remained unchanged or decreased.Variations for the medians of the 2 datasets calculated in percentage are presented in the Supplementary Materials S2).

Figure 6 .
Figure 6.Comparison between the median concentration in ppm for each chemical element of the database and the subset on a log scale (y-axis).Chemical elements (x-axis) are sorted in descending order of differences between median concentrations for the 321 sample database and the 51 sample subset.Compared to the 321 sample dataset, the ionic content of the 51 sample subset increased, although the m.c. of some chemical elements remained unchanged or decreased.Variations for the medians of the 2 datasets calculated in percentage are presented in the Supplementary Materials S2).

Figure 7 .
Figure 7. (A) Dendrogram of the hierarchical Cluster analysis: when distance linkage (Dlink) is <30% of Dmax (i.e., position of the phenon line), the subset is divided into 4 Clusters (C1 to C4). (B) Logscale spider diagram of major cation and anion concentrations for the 4 Clusters.Clusters 1 and 2 correspond to more dilute groundwater where bicarbonate ion dominates, Cluster 4 is enriched in major elements when compared to Clusters 1, 2, and 3, and calcium dominates Cluster 3.

Figure 7 .
Figure 7. (A) Dendrogram of the hierarchical Cluster analysis: when distance linkage (Dlink) is <30% of Dmax (i.e., position of the phenon line), the subset is divided into 4 Clusters (C1 to C4). (B) Log-scale spider diagram of major cation and anion concentrations for the 4 Clusters.Clusters 1 and 2 correspond to more dilute groundwater where bicarbonate ion dominates, Cluster 4 is enriched in major elements when compared to Clusters 1, 2, and 3, and calcium dominates Cluster 3. Clusters 1 and 2 are composed largely of bicarbonate waters (32/35 samples), while Clusters 3 and 4 are composed of groundwater dominated by chloride anion (16/16, Table 5).All samples in Cluster 1 are composed of groundwater of the Ca-HCO 3 type, while the samples in Cluster 2 are composed mainly of groundwater of the Ca-HCO 3 (11/20) and Na-HCO 3 type (6/20), with a limited number of groundwater samples of the Na-Cl type (2/20).Clusters 3 and 4 are dominated by chloride, sulfate, and sodium.Samples from Cluster 3 were collected mostly from fractured rock aquifers (7/8 samples).The very large proportion of Cluster 4 corresponds to groundwater samples collected in granular aquifers (7/8 samples), mostly confined (6/8 samples).The samples from Cluster 1 come mainly from unconfined aquifers, and Cluster 2 samples were collected in equal proportions in confined aquifers or in unconfined aquifers.Cluster 3 is dominated by Ca-Cl (5/8) samples and contains some Na-Cl (3/8) samples.Cluster 4 consists exclusively of groundwater of the Na-Cl type.The results obtained using HCA are like those described in[5]  for the regional-scale database (321 samples).Both studies allow for the identification of a Cluster dominated by fractured rock groundwater samples, two Clusters of saline groundwater samples, and at least a Cluster dominated by groundwater samples of the bicarbonate type.Thus, the reduction of the original dataset into a subset does not modify the Clustering obtained with HCA processing.Some distinctions between the two studies are, however, of interest.By positioning the phenon line to obtain four Clusters, the Clusters obtained with the subset of data have a lower percentage of resemblance (27% against 52% in[5]) to the regional-scale database.This observation suggests that the differences between the groups are better defined using the data reduction approach of this study, having the effect of reducing the general geochemical similarities between the samples.This result can be understood as the reduction of background noise, or the identification within the regional-scale database of more representative samples of hydrogeochemical endmembers (early stage or advanced stage of evolution) of the regional hydrogeochemical poles (RHPs).The background noise would then correspond to all types of mixing phenomena that lead to a certain homogenization of the general hydrogeochemistry.
, TDS is indicated for each sample.Increasing TDS from left to right along component 1 indicates that component 1 corresponds to a salinity gradient.Chemical element loadings for components 1 and 2 are plotted in the correspondence circle in Figure 8B.Component 2 scores are positive for Cluster 3 samples.Chemical elements with positive scores for component 2 are Ca 2+ , Ba 2+ , Sr 2+ , Mn 2+ , Mo 6+ , and SO 4 2 .Component 2 scores are negative for Cluster 4 samples.Chemical elements with negative scores for component 2 are Mg 2+ , F − , Na + , B 3+ , NH 4 + , SiO 2 + , K + , and HCO 3 − .Geosciences 2019, 9, x FOR PEER REVIEW 2 of 33 component 1 indicates that component 1 corresponds to a salinity gradient.Chemical element loadings for components 1 and 2 are plotted in the correspondence circle in Figure 8B.Component 2 scores are positive for Cluster 3 samples.Chemical elements with positive scores for component 2 are Ca 2+ , Ba 2+ , Sr 2+ , Mn 2+ , Mo 6+ , and SO4 2 .Component 2 scores are negative for Cluster 4 samples.Chemical elements with negative scores for component 2 are Mg 2+ , F − , Na + , B 3+ , NH4 + , SiO2 + , K + , and HCO3 − .

Figure 8 .
Figure 8. Graphical representation of PCA results.The horizontal axis of the correspondence circle corresponds to the first principal component (Component 1: 39.8% of the total variance of the dataset), and the vertical axis corresponds to the second principal component (Component 2: 13.5% of the total variance of the dataset).In (A), sample loading for Components 1 and 2 are plotted according to the Cluster to which they belonged.Chemical element loadings for Components 1 and 2 are plotted in the correspondence circle in (B).

Figure 8 .
Figure 8. Graphical representation of PCA results.The horizontal axis of the correspondence circle corresponds to the first principal component (Component 1: 39.8% of the total variance of the dataset), and the vertical axis corresponds to the second principal component (Component 2: 13.5% of the total variance of the dataset).In (A), sample loading for Components 1 and 2 are plotted according to the Cluster to which they belonged.Chemical element loadings for Components 1 and 2 are plotted in the correspondence circle in (B).

Geosciences 2019, 9 , 33 Figure 9 .
Figure 9. Log-log plot of calcium versus sodium.Samples that were not included in the subset of data are marked with the symbol (+).

Figure 9 .
Figure 9. Log-log plot of calcium versus sodium.Samples that were not included in the subset of data are marked with the symbol (+).

Figure 10 .
Figure 10.(A) Piper diagram showing the geochemical evolution of groundwater according to the first approximation of the hydrogeochemical context (recharge-type groundwater, water-rock-type interactions, and water-clay-type interactions) interpreted using a Ca 2+ vs. Na + binary plot (Figure 9).(B) New graphical fields and improved interpretations of the Piper diagram for groundwater evolving in Precambrian terrain invaded previously by seawater.

Figure 10 .
Figure 10.(A) Piper diagram showing the geochemical evolution of groundwater according to the first approximation of the hydrogeochemical context (recharge-type groundwater, water-rock-type interactions, and water-clay-type interactions) interpreted using a Ca 2+ vs. Na + binary plot (Figure 9).(B) New graphical fields and improved interpretations of the Piper diagram for groundwater evolving in Precambrian terrain invaded previously by seawater.

Geosciences 2019, 9 , 33 Figure 11 .
Figure 11.Graphical representation of the sample scores calculated for each of the factors obtained during the factorial analysis procedure.The different combinations of factors led to the construction of six binary diagrams: (a) Factor 1 vs. Factor 2; (b) Factor 1 vs. Factor 3; (c) Factor 1 vs. Factor 4; (d) Factor 2 vs. Factor 3; (e) Factor 2 vs. Factor 4; and (f) Factor 3 vs.Factor 4. Samples are representedwith symbols corresponding to the cluster number of the samples for the subset.Mixing and dilution zones defined in the center of the binary diagrams contain samples that are weakly influenced by the factors (scores < 1).Two series of vectors represent the evolution of groundwater: one set of vectors represents the evolution of the recharge groundwater (fresh water influx), and the second set of vectors corresponds to the evolution of groundwater from the mixing and dilution zones toward the brackish groundwater (maturation gradient).

Figure 11 .
Figure 11.Graphical representation of the sample scores calculated for each of the factors obtained during the factorial analysis procedure.The different combinations of factors led to the construction of six binary diagrams: (a) Factor 1 vs. Factor 2; (b) Factor 1 vs. Factor 3; (c) Factor 1 vs. Factor 4; (d) Factor 2 vs. Factor 3; (e) Factor 2 vs. Factor 4; and (f) Factor 3 vs.Factor 4. Samples are representedwith symbols corresponding to the cluster number of the samples for the subset.Mixing and dilution zones defined in the center of the binary diagrams contain samples that are weakly influenced by the factors (scores < 1).Two series of vectors represent the evolution of groundwater: one set of vectors represents the evolution of the recharge groundwater (fresh water influx), and the second set of vectors corresponds to the evolution of groundwater from the mixing and dilution zones toward the brackish groundwater (maturation gradient).

Funding:
This project was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), the Fonds de recherche du Québec-Nature et technologies (FRQNT), the Fondation de l'Université du Québec à Chicoutimi and Rio Tinto Alcan (RTA), and the Programme d'acquisition de connaissances sur les eaux souterraines of Quebec (PACES), with contributions from the Quebec Ministère du Développement durable, de l'Environnement, de la Faune et des Parcs (MDDEFP), and the five county municipalities of the SLSJ region (Domaine-du-Roy, Du-Fjord-Du-Saguenay, Lac-Saint-Jean-Est, Maria-Chapdelaine, and the City of Saguenay).

Table 1 .
Descriptive statistics for the physicochemical parameters of the four regional hydrogeochemical poles defined in this study.

Table 2 .
Key chemical elements for each regional hydrogeochemical pole (RHP).

Table 3 .
Statistical data for the 51 sample subset.
Analytical Methods: * multiparameter probe (in situ); ** Inductively Coupled Plasma Mass Spectrometry; *** Ion chromatography; **** Specific probe; ***** Titration; 25 and 75 header = the first and the third quantiles; TDS is calculated using the software Aquachem v.5.0;For major, minor and trace elements, data are given in mg/L; For the physical parameters, N = number of measured data; For the laboratory analyzed parameters, N = Number of detected values; If N = 1 or 2: data are underlined; If N = 1: unique measured value is presented; If N = 2: mean value is presented; Multivariate header = parameters used in the multivariate analysis.

Table 4 .
Physicochemical data for the Cluster (51 samples of the subset).

Table 6 .
Factor analysis loadings and explained variance after applying a Varimax rotation.