Next Article in Journal
Relationships of Sexual Dysfunction with Depression and Acceptance of Illness in Women and Men with Type 2 Diabetes Mellitus
Previous Article in Journal
Perception of Radiation Risk as a Predictor of Mid-Term Mental Health after a Nuclear Disaster: The Fukushima Health Management Survey
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan

1
Western College of Veterinary Medicine, University of Saskatchewan, 52 Campus Drive, Saskatoon, SK S7N 5B4, Canada
2
School of Public Health, University of Saskatchewan, 104 Clinic Place Saskatoon, SK S7N 2Z4, Canada
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2017, 14(9), 1065; https://doi.org/10.3390/ijerph14091065
Submission received: 31 August 2017 / Revised: 11 September 2017 / Accepted: 12 September 2017 / Published: 15 September 2017
(This article belongs to the Section Environmental Health)

Abstract

:
Groundwater drinking water supply surveillance data were accessed to summarize water quality delivered as public and private water supplies in southern Saskatchewan as part of an exposure assessment for epidemiologic analyses of associations between water quality and type 2 diabetes or cardiovascular disease. Arsenic in drinking water has been linked to a variety of chronic diseases and previous studies have identified multiple wells with arsenic above the drinking water standard of 0.01 mg/L; therefore, arsenic concentrations were of specific interest. Principal components analysis was applied to obtain principal component (PC) scores to summarize mixtures of correlated parameters identified as health standards and those identified as aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objective. Ordinary, universal, and empirical Bayesian kriging were used to interpolate arsenic concentrations and PC scores in southern Saskatchewan, and the results were compared. Empirical Bayesian kriging performed best across all analyses, based on having the greatest number of variables for which the root mean square error was lowest. While all of the kriging methods appeared to underestimate high values of arsenic and PC scores, empirical Bayesian kriging was chosen to summarize large scale geographic trends in groundwater-sourced drinking water quality and assess exposure to mixtures of trace metals and ions.

Graphical Abstract

1. Introduction

Several studies have highlighted associations between exposure to arsenic from drinking water and a variety of chronic diseases including type 2 diabetes, hypertension, cardiovascular disease and cancer [1,2,3,4,5,6,7]. Poor water quality, especially aesthetic issues that impact consumer acceptance of tap water, might also have indirect impacts on the development of chronic diseases such as diabetes and cardiovascular disease by motivating the consumption of sugar-sweetened beverages as an alternative to drinking water [8]. Additionally, mixtures of contaminants may act synergistically and have impacts that are not appreciated when examining the isolated effects of individual contaminants [9,10,11].
In Canada, drinking water oversight is a provincial mandate. Guidelines for acceptable levels of contaminants and regulations pertaining to water quality testing are established by each province, but typically follow recommendations set forth in the federal Guidelines for Canadian Drinking Water Quality [12,13]. In the province of Saskatchewan (SK), the Drinking Water Quality Standards and Objectives [14] apply to regulated water supplies. Standards are legally enforceable and are based on parameters that are potentially harmful to human health. Objectives are guidelines based on parameters that are not considered harmful to human health in the context of expected concentrations in drinking water, but which can negatively impact the aesthetic qualities of the water.
In Saskatchewan, three tiers of drinking water systems are defined, of which two are subject to regulation [15]. Public drinking water systems are monitored and regulated by the Saskatchewan Water Security Agency and semi-public supplies are monitored by the Ministry of Health. Private water supplies serve a single household or farm; these supplies are not regulated and have no monitoring requirements. Residents who use private water supplies may lack the knowledge and resources to routinely and adequately test their household water source, and therefore could be at greater risk of exposure to drinking water of poor quality or that has elevated concentrations of contaminants such as arsenic. Similarly, smaller water distribution systems may lack the resources to avoid placing wells in aquifers of lower water quality, and the requirements for testing are based on the size of the population served with less stringent requirements for smaller supplies [16]. Therefore, residents in rural areas, whether using a smaller public system or a private water supply, may be vulnerable to health effects related to the quality of water available at their household tap. Access to affordable tap water alternatives such as bottled water may also be limited in rural areas as compared to those available in urban locations, exacerbating the potential impacts of poor tap water quality for those located outside urban centers.
Ground water quality has been previously identified as a public health concern in SK, particularly for residents using private wells for their drinking water [17,18,19,20,21]. Approximately 28% of SK residents, primarily in rural areas, use groundwater-based public drinking water supplies and another 15% obtain drinking water from private water supplies [22]. A small study of arsenic levels in 61 private and rural municipality-owned wells in SK found that 23% of the wells had arsenic concentrations above the current SK standard of 0.01 mg/L [17]. Another study found that 99.6% of 535 wells sampled exceeded at least one aesthetic objective or health standard and, of those, 35% exceeded a health-related standard [23]. In addition, other minerals and ions such as iron and manganese, though not considered substantial health risks, are frequently found in groundwater at concentrations that exceed recommended levels and negatively affect the aesthetic quality of drinking water [19,23]. In a recent survey of residents in rural SK, 25% of respondents reported having an aesthetic complaint about their household drinking water [24].
Our goal was to summarize the quality of ground-sourced drinking water as part of the exposure assessment for the investigation of associations between tap water quality and rates of type 2 diabetes and cardiovascular disease in rural SK. Estimating exposure to constituents of drinking water is a major challenge in investigating such associations. Previous work has suggested that geostatistics can be used to map estimated exposure to arsenic through groundwater as a continuous surface across a region based on a limited set of point measures [25,26,27,28,29]. Additionally, principal components analysis (PCA) has been used as a variable reduction method to investigate groupings of groundwater parameters [30,31,32,33] and has been combined with geostatistical methods to map underlying latent processes contributing to overall water quality across a region [34,35,36,37]. However, we are not aware of previous studies exploring the use of PCA and geostatistics to summarize groundwater quality in SK. We are also not aware of studies exploring the use of PCA to summarize mixtures of contaminants for exposure assessment in epidemiologic studies.
Previously collected water surveillance records represent an existing source of data by which water quality can be summarized as a means of exposure assessment for epidemiological models of associations between household water quality and health. Our objective was to evaluate the use of existing water quality monitoring data from public systems and private water supplies to describe ground-sourced drinking water quality, and use PCA to summarize metals and ions included in the monitoring of health standards and aesthetic objectives. Our secondary objectives were to compare the performance of ordinary kriging, universal kriging, and empirical Bayesian kriging for predicting arsenic concentrations and principal component scores derived from principal components analysis, and to evaluate the potential use of these tools in human health exposure analysis for large rural areas of western Canada. The results of this analysis were later used to estimate groundwater-sourced drinking water arsenic concentrations and principal components summarizing water health standard and aesthetic parameters for geographic regions across rural Saskatchewan. The estimates were needed to inform an exposure analysis for epidemiological investigation of associations between water quality and diabetes, hypertension, and cardiovascular disease in rural SK.

2. Materials and Methods

2.1. Study Area

Saskatchewan is a province located in central Canada with borders approximately located along 49° N and 60° N, and 102° W and 110° W. The geostatistical analysis was limited to the southern part of the province, because the water monitoring data were very sparse in the sparsely-populated northern part of the province (Figure 1). The study area for the geostatistical analysis was bounded by the solid line in Figure 1 and encompassed an area just over 327,900 km2 and was home to approximately 1 million people in 2010.

2.2. Data Sources

All data were obtained from the Saskatchewan Water Security Agency. The public supply data originated from 492 groundwater sourced systems that fall under regulation of the Water Security Agency and were collected during 1985–2012, and consisted primarily of treated water samples. Wells considered to be under the influence of surface water were excluded, but otherwise well depths were not readily available. The private water supply data were collected from 4093 wells as part of the Water Security Agency’s Rural Water Quality Advisory Program (RWQAP), a unique program designed to provide advice and subsidized water testing services to private well owners during 1996–2011. Well depth was reported by landowners, and subject to considerable uncertainty. Furthermore, well depth was not available for over 25% of samples; therefore, well depth was not considered in the analysis. All chemical analyses were performed using methods accredited by the Canadian Association for Laboratory Accreditation Inc. [38] at the Saskatchewan Disease Control Laboratory (Regina, SK, Canada) or the Saskatchewan Research Council Environmental Analytical Laboratories (Saskatoon, SK, Canada). The densities of sampling sites for public water supplies and private wells are shown in Figure 1.
All analyses were carried out separately for public water supply data and private well data. Concentrations for all parameters, for both private and public supply databases, were converted to mg/L. Only samples from groundwater sources were retained, and samples with missing location information were discarded. Because analytic methods and detection limits changed over time, data were compared to the detections limits corresponding to the sampling date; for samples where the parameter concentration was reported as below the detection limit, half the detection limit was imputed for the missing value [26,39]. The distributions for all parameters were right skewed and natural logarithm transformations were applied prior to analysis. Descriptive statistics for parameters of interest, along with the proportion of samples exceeding Saskatchewan’s standards or objectives, and the proportion of samples that were below detection limits were summarized for the public water supply data and the private well data.
Cyanide, mercury, cadmium, chromium, and fluoride were excluded from the analysis for both public supplies and private wells to maintain consistency in the analysis of the different types of supplies. Cyanide and mercury were sampled less frequently than the other metals and ions of interest in public systems. Cyanide was not sampled in the private wells and mercury was sampled infrequently. Additionally, 98% of cadmium and 72% of chromium samples were below detection limit for the public water supply data, and 92% of cadmium and 91% of chromium samples were below detection limit in the private well data. Fluoride was infrequently reported for private wells, but was added to some municipal supplies [40] and was also excluded.
The water quality data for the public water supplies consisted of repeated measures taken at irregular intervals during 1985–2012. Because kriging methods implemented in ArcGIS require a single measure at each point location, a generalized linear mixed model (GLMM) with no fixed effects was used to estimate a single predicted value for each water parameter for each water supply location, using the Mixed procedure with maximum likelihood estimation in SAS (SAS Institute Inc., Cary, NC, USA). The GLMM included a random intercept μ~N(0, σ2) to account for between supply variation, as well as a structured error term with a spatial power covariance structure based on the time between sampling to account for the non-equidistant repeated sampling within water supplies [41,42].
The private well data consisted of measures at a single point in time for each location. However, due to privacy considerations, private well locations were generalized to the centroid of the section of land on which they were located before confidential data access was granted for this analysis. A section is approximately 1.6 km by 1.6 km and corresponds to a parcel of land described by the grid-based land description system used in western Canada [43]. Wells with duplicated generalized locations were manually jittered by alternately increasing or decreasing the latitude and longitude by increments of 10 m for each subsequent duplicate well location until no duplicate locations remained.

2.3. Principal Components Analysis

Principal components analysis (PCA) is a multivariate statistical technique that can be used to reduce the dimensionality of a dataset with numerous correlated variables and categorize variables into groups based on their covariance. The resultant independent principal components account for the variance in the observed data, except for that explained by components that are not retained. These independent principal components can then be used as predictor variables in subsequent analyses. PCA was performed using SAS (SAS Institute Inc., Cary, NC, USA) and applied separately to the public supply and private supply data, separated into groups of water parameters identified as health standards and as aesthetic objectives according to the Saskatchewan Drinking Water Quality Standards and Objectives. For the public water supply data, the single predicted value for each supply was used in PCA, while values for each individual well were used in the PCA of private well data.
Log transformed concentrations (mg/L) were converted to standardized z-scores prior to performing PCA to mitigate the effect of measurement scales and to ensure that highly variable parameters did not dominate the analysis [44,45]. Kaiser’s measure of sampling adequacy (KMO) [46] and Bartlett’s test of sphericity [47] were performed to assess the sampling adequacy and correlation of these data and therefore their suitability for PCA. In each analysis, principal components (PC) with an eigenvalue of greater than one were retained [48], and subject to varimax rotation to maximize the variation explained by each of the retained principal components and to obtain the final principal component loadings and coefficients. Principal component (PC) scores for each of the retained principal components were calculated for the public supplies and private wells for use in the geostatistical analysis.

2.4. Geostatistical Analysis

Kriging is a geostatistical method that incorporates spatial autocorrelation in continuous variables to interpolate values at locations at which they have not been measured [49]. The basis of kriging is the semivariogram model, which uses the semivariance between point measures to summarize the spatial relationships in variables. Ordinary kriging and universal kriging are methods for which estimation is based on weighted least squares and the assumption that the calculated semivariogram is the true model for the data. Ordinary kriging differs from universal kriging in that ordinary kriging assumes a constant unknown mean across a given area, while universal kriging assumes a constant trend in the data [49].
Empirical Bayesian kriging first became available in ArcGIS version 10.1, and is based on restricted maximum likelihood estimation [50]. Additionally, it allows for uncertainty in the semivariogram model by a process of data subsetting and simulation to estimate a range of semivariogram models. As a result, empirical Bayesian kriging requires fewer assumptions about the semivariogram model form, is considered accurate for moderately non-stationary data, and provides more accurate estimation of prediction standard errors than other kriging methods [50].
Principal component scores and log transformed arsenic concentrations were mapped in ArcMAP 10.3 (ArcGIS, ESRI, Redlands, CA, USA) to compare the performance of ordinary, universal and Empirical Bayesian Kriging for the prediction of arsenic concentrations, PC scores for health standards (PChealth), and PC scores for aesthetic objectives (PCaesthetic). The analyses for each of these three types of data were performed separately for public water supply data and private well data.
Variogram models for each variable were developed using SAS 9.4 software (SAS Institute Inc., Cary, NC, USA) [51]. First or second order large scale trends were identified in the data for each variable using regression analysis. Where trends were detected, the residuals from the regression models were used to develop variogram models. The average nearest neighbor distance for the sampling locations for each variable were calculated in ArcGIS 10.3 and set as the lag distance for the variograms, and half the study area extent divided by the lag distance was used to calculate the maximum number of lags, which limits the semivariogram lag to half the extent of the sampling area [51]. The fits of spherical, exponential, and Gaussian models were compared using the Akaike Information Criterion (AIC) for each variable in each of the public and private water supply datasets. The presence of anisotropy was evaluated visually by dividing the variogram into eight directions (22.5° angles); however, for the final analysis, an omnidirectional variogram was modeled.
ArcMAP 10.3 (ArcGIS, ESRI, Redlands, CA, USA) was used to perform ordinary, universal and empirical Bayesian kriging for each variable. The order of trend removal and lag distance were set based on the variogram models developed in SAS. For the public water supply data, the number of lags was set to the maximum number of lags calculated for the SAS variogram models, but for the private well data 100 lags were used for each kriging model because the calculated maximum number of lags for each parameter exceeded the maximum of 100 allowed in ArcMAP. Because the spherical form of the variogram model had the lowest AIC for all but two of the variogram models (PC4aesthetic for public water supplies and PC3health for private wells), the model form was specified as spherical for each variable for ordinary and universal kriging. Because empirical Bayesian kriging is less interactive and spherical models are not available, the settings for Bayesian Kriging were left at default values, such that subsets of 100 points and 100 simulations were modeled for each variable using a power-based model.
The root mean square prediction error (RMSE) in the kriging cross validation analysis performed in ArcMAP was used to compare the ability of ordinary, universal, and empirical Bayesian kriging to predict values for each parameter [52]. In cross validation, each point is sequentially omitted from the dataset and the remaining points used to predict a value for that point; the difference between the predicted and measured value is the prediction error. The method that resulted in the greatest number of parameters with the lowest RMSE was identified as the optimal kriging method for the water data.

3. Results

For both public water supply (Table 1) and private well data (Table 2), the medians of most parameters, with the exception of manganese, were below the values specified in Saskatchewan’s Drinking Water Quality Standards and Objectives. However, the 95th percentile value exceeded the standard or objective for several parameters including arsenic, uranium, iron, manganese, sodium, sulfate, alkalinity, hardness and total dissolved solids. When the values for public systems after repeated measures modeling (Table 3) were compared to the summary statistics for the raw data (Table 1), the median values were similar, but the maximums were smaller.

3.1. Principal Components Analysis

Bartlett’s test of sphericity was satisfied for each dataset indicating that there was sufficient correlation among the variables for PCA to be useful (p < 0.001 for each dataset). The KMO measure of sampling adequacy was above the recommended minimum value of 0.6 [46] for the data for aesthetic objectives for public water supplies (KMO = 0.63), for health standards in private wells (KMO = 0.62) and aesthetic objectives for private wells (KMO = 0.68). However, the KMO was 0.49 for the health standards in the public water supply data. Removing the parameter with the lowest individual measure of sampling adequacy (arsenic) did not substantially improve the KMO. Despite the low KMO, PCA was completed for this dataset to facilitate planned comparisons.
For health standards, analysis of the public supply data yielded three PCs accounting for 63.7% of the variance, while the private supply analysis yielded three PCs that accounted for 67.6% of the variance (Table 4). The first health standards principal component (PC1health) was associated with nitrate and selenium for both types of supplies. However, PC1health was more strongly associated with uranium in private wells than in public supplies, while PC1health was more strongly associated with lead in public supplies than in private wells. The second PChealth exhibited opposite pattern, demonstrating a positive association with boron and a negative association with barium in public water supplies, and a negative association with boron and positive association with barium in private wells. The third PChealth was associated with arsenic for the public supplies, but only weakly associated with arsenic in private supplies. For public supplies, PC3health was also weakly associated with uranium, but, in the private supplies, PC3health was associated with lead.
For the aesthetic objectives, PCA on the public supply data yielded four PCs explaining 77.8% of the variance, while the private supply data yielded only three PCs accounting for 70.3% of variance (Table 4). For both types of supplies, the first aesthetic objectives PC (PC1aesthetic) was associated most strongly with alkalinity, chloride, sodium, sulfate, and total dissolved solids. In addition, for both types of supplies, the second PCaesthetic was characterized by strong associations with hardness and magnesium, and weak associations with sulfate and manganese. In addition, private supplies had a weak association of zinc with PC2aesthetic. The third PCaesthetic for public and private supplies were associated primarily with iron and manganese, but, in private systems, PC3aesthetic also exhibited a negative association with copper. Conversely, the PC4aesthetic retained only from the public supply was associated with copper and zinc.

3.2. Geostatistical Analysis

Of the 492 groundwater-sourced public water supplies sampled, 480 fell within the study area outlined in Figure 1 and were used in the geostatistical analysis. Arsenic concentrations were available from all 480 public water supply locations within the study area. Health standards PC scores were available for 459 locations and aesthetic objectives for 435 locations.
Of the 4093 private wells sampled, 4084 fell within the study area and were used in the geostatistical analysis. Arsenic concentrations were available for 4073 private wells within the study area; health standards PC scores were available for 3970 private wells; and aesthetic objectives for 3999 private wells in the study area.
The number of data points, lag distance, and number of lags used as inputs in ordinary and universal kriging are summarized in Table 5. Visual inspection of plots for anisotropy suggested the few observed differences were minor, thus isotropic models were assumed for all variables.
The RMSE values from each of the kriging models for each of the variables demonstrate that, overall, the Bayesian kriging resulted in the lowest RMSE for nine variables, whereas ordinary kriging resulted in the lowest RMSE for six variables (Table 6).
For each type of kriging, cross-validation plots of predicted vs. measured values (Appendix A, Figure A1, Figure A2, Figure A3, Figure A4, Figure A5, Figure A6, Figure A7 and Figure A8) revealed that, for low measured values, predicted values tended to be overestimated, and that, for higher measured values, predicted values tended to be underestimated.
Results from empirical Bayesian kriging of log transformed arsenic concentrations from the public and private water supply data within the study area were back transformed to obtain surfaces of predicted arsenic concentrations (Figure 2).
Prediction surfaces based on the natural logarithm transformed arsenic concentrations are shown in Figure 3a, along with their corresponding prediction standard error surfaces, which illustrate the amount of uncertainty associated with the predicted values (Figure 3b). Surfaces for each of the principal components are available in Appendix B (Figure A9, Figure A10, Figure A11, Figure A12, Figure A13, Figure A14 and Figure A15).

4. Discussion

In Saskatchewan, groundwater sources are primarily utilized by residents of rural and remote areas [53]. A considerable number of water samples exceeded drinking water standards and objectives in the surveillance data, highlighting the need to promote adequate testing of drinking water in rural areas. Generally, it appears that contaminants listed as aesthetic objectives exceed guideline values at a higher frequency than the health-related standards. As expected, the raw groundwater sampled from private wells exceeded standards and objectives more frequently than the water from the regulated, treated public water supplies. However, a considerable number of samples from public supplies still exceeded guidelines, especially for aesthetic objectives.
In a previous study, a sample of 283 wells in Saskatchewan [19] found that approximately 45% of the wells exceeded the Saskatchewan drinking water objective for sulfate, 47% exceeded the objective for iron, 61% exceeded the objective for hardness, and 79% exceeded the objective for manganese. Our study reflected a similar pattern, although the rates of exceedances were slightly lower; approximately 39% for sulfate, 40% for iron, 31% for hardness, and 68% for manganese in the private wells, and 32% for sulfate, 19% for iron, 21% for hardness and 53% for manganese in public supplies. A previous study from Saskatchewan reported that having aesthetic complaints about tap water was associated with the perception that tap water was unsafe [24]. The aesthetic quality of tap water could act as a determinant of health by increasing consumption of water alternatives which may include sugar-sweetened beverages [8].
Previous studies have also investigated concentrations of arsenic [17] and nitrate [18] in Saskatchewan wells. Thompson et al. [17] sampled 61 wells (private wells and wells maintained by rural municipalities) for arsenic, and found that 23% exceeded the current Saskatchewan drinking water standard applied to regulated public water supplies. In our study, just over 13% of private wells exceeded the standard, while approximately 7% of public supply samples exceeded the standard. Thompson [18] found that 14% of wells tested exceeded the standard for nitrate for regulated waterworks, while 12% of the private wells included in our study exceeded the standard. However, only 4% of public supplies exceeded the nitrate standard.

4.1. Principal Components Analysis

PCA has been used in previous studies to examine and interpret patterns of groundwater quality parameters [30,31,32,33,34,54]. These types of studies typically identify common factor patterns and interpret them with respect to presumed natural and anthropogenic processes that impact groundwater quality, and are often focused on major ions (e.g., sodium, chloride, magnesium, sulfate) that would fall under aesthetic objectives in the Saskatchewan Drinking Water Quality Standards and Objectives. PCA analysis of groundwater has often included nitrate, which falls under Saskatchewan health standards, as a marker for anthropogenic influences on groundwater (e.g., [30,31,32,34]). However, the full range of parameters included in such studies has not been consistent, particularly with respect to the inclusion of trace metals, making it somewhat difficult to compare results. Comparison to our study was further hampered because we analyzed health standards and aesthetic objectives separately to align our analysis with Saskatchewan Drinking Water Quality Standards and Objectives.
We limited our analyses to include parameters that were routinely sampled from both the public and private supply data to facilitate comparison between the differing supplies. We expected the results to differ between the types of systems because the public supply data represent treated water supplies and the private well data represent raw water samples. While there were some differences in the principal components extracted from the public and private data, there were some striking similarities, especially in the results for the aesthetic objectives, even though four PCaesthetic were retained for the public supply data and three for the private well data. The first PCaesthetic was associated with the same group of variables in both datasets: sodium, chloride, sulfate, alkalinity and total dissolved solids. Additionally, hardness and magnesium were strongly associated with the second PC, and iron and manganese with the third PC in both public and private water supplies. The consistent patterns of these parameters between the datasets suggest relatively strong associations between these parameters in Saskatchewan groundwater.
The PCA for health standards also exhibited some consistencies: nitrate and selenium were strongly associated with the first PC1health for both public and private supplies. Arsenic was associated with the third PChealth in both datasets, but strongly associated with that PC in the public supply data. In contrast, uranium was associated with PC3health in public supplies and with PC1health in private wells. In addition, lead was associated with PC1health in public supplies, and with PC3health in private wells. Because lead contamination of water can be associated with leaching from distribution systems, differences in the covariance of lead with other parameters between public supplies and private wells is not unexpected. However, caution is warranted in the interpretation of the PCA for health standards from the public supply data considering the low Kaiser’s measure of sampling adequacy for these data.

4.2. Geostatistical Analysis

Kriging has previously been validated as a method to summarize arsenic concentrations in groundwater quality and in one study was found to be superior to using an area average or nearest well as a proxy to predict well concentrations [26]. While some studies have investigated the use of indicator kriging to model the probability of higher arsenic concentrations using geological and hydrological covariates [25,27], some recent studies have compared various kriging methods that are accessible in GIS software to investigate prediction of arsenic concentrations in groundwater [28,29]. James et al. [29] evaluated the performance of various kriging methods (ordinary, universal, simple kriging with varying means, kriging with external drift, cokriging with ordinary kriging and cokriging with universal kriging) over a relatively small area in Colorado and found that ordinary kriging performed best. Gong et al. [28] compared inverse distance weighted interpolation with kriging using Gaussian and spherical models as well as cokriging in predicting arsenic concentrations over various regions in Texas, and found regional differences in the performance of kriging, and concluded that kriging over smaller areas was more accurate than over large geographic regions.
In the present study, Bayesian kriging had the lowest RMSE for the greatest number of variables and was considered the optimal method for our data. However, values of RMSE for ordinary kriging were very similar to those for Bayesian kriging, so there does not appear to be much difference between these methods in the accuracy of the predicted values on cross validation.
We elected not to use covariate information such as well depths or geological data in our models due to difficulty in obtaining accurate covariate data over our study area. Furthermore, a previous study in SK demonstrated a lack of correlation between well depths and concentrations of the water parameters studied [55]. While depth might be expected to improve modeling of arsenic concentration, conflicting results from other studies suggest that the contribution of depth may be dependent on the study area. For example, a negative correlation between increasing well depth and arsenic concentrations has been reported for wells in Bangladesh [56,57], while a positive association between well depth and arsenic was reported in North Carolina [58]. Yang et al. [27] did not detect any association between arsenic concentration and well depth in Maine. In one study, including well depth in cokriging models did not improve the ability of kriging to predict arsenic levels [29]. Gong et al. [28] found that incorporating well depth in cokriging did not necessarily improve the correlation between predicted and actual values, but did improve the performance of regression models used to predict arsenic levels. Furthermore, Yu et al. [57] investigated factors affecting arsenic at different geographic scales and concluded that much of the variability in arsenic concentrations at a scale of less than 3 km could be explained by well depth, while geology was the most important factor at scales of greater than 10 km. This suggests that given the large scale of our study area relative to other reported studies, it is unlikely that adding well depth as a covariate would have improved our models. While incorporation of geological data might have improved our predictions, this information was not available for the large study area.
Others have reported a tremendous amount of heterogeneity in groundwater concentrations of arsenic over small scales that is poorly understood [27,57]. In Bangladesh, wells within a radius of less than 1 km were found to vary by up to 1000 μg/L [57]. In another study of a relatively small region of Bangladesh, wells in close proximity exhibited extremely variable arsenic concentrations, especially wells less than 30 m in depth [59]. This issue was also highlighted in the geostatistical analysis of arsenic in wells in Michigan; residuals for predicted arsenic values were mapped and no spatial pattern in the residuals was detected [26]. The close proximity of wells with negative and positive residuals of greater than 10 μg/L reflected high variability in arsenic concentrations over short distances [26]. Additionally, a study in Texas compared geostatistical methods among regions, and found the performance of the different methods varied less within a given area than across the different regions [28]. This suggests that variability in the distribution of groundwater arsenic across regions is a limiting factor in identifying a single method that would perform uniformly well in different geographic areas. Given the apparent differences in processes influencing spatial variability of arsenic at different scales, it is possible that developing kriging models over smaller targeted areas with a high density of samples could improve the performance of predictions for some local regions. However, this analysis was intended to estimate the mean arsenic concentrations along with principal components representing drinking water quality over a total area of approximately 327,900 km2.
Interpretation of mapped results of PCA is less straightforward because the values are a representation of a combination of parameters that contribute to the PCA components. For example, areas with high values for the first PC for aesthetic objectives represent higher predicted concentrations of one or more of the contributors to this component, including sodium, chloride, sulfate, alkalinity and total dissolved solids. Nevertheless, this method is useful for examining patterns in common grouping of parameters and allowed extraction of factor scores to summarize mixtures of variables over geographic regions for use in epidemiological analyses.
Previous studies have used geostatistical methods to map the scores resulting from PCA or FA and used the resultant maps to predict the factors that may be impacting groundwater quality, such as pollution or salt water intrusion [34,35,36,37]. It does not appear that the use of kriging with PCA or factor analysis has been well-validated for prediction of groundwater quality. We are not aware of other studies that have assessed the ability of kriging to accurately predict PCA scores at unmeasured locations so we have not compared our results to others.
We would expect to see a reduction in predictive ability by combining a variable reduction method such as PCA with kriging. PCA reduces the dimensionality of a dataset while capturing as much of the information in the original variables as possible. In our data, the percentage of variance in the original measures described by the retained PCs ranged 63.7–77.8%. While spatial patterns of arsenic have been studied extensively, spatial patterns of the other variables, and especially mixtures of variables, have not. Therefore, it is possible that the PCs we extracted are subject to variability at scales not captured by our analysis. The use of PCA combined with kriging of factor scores should not be discounted as a means of summarizing water quality but should be investigated further, ideally with higher density sampling over smaller geographical regions.
Although the data from public water supplies consisted of repeated measures over time for most sites, the decision was made to model a mean value for each parameter at each site rather than specifically estimating concentrations and component scores at particular points in time. The capacity to estimate time-specific values was limited as not all sites had the same intensity of sampling and sufficient samples to provide a precise estimate for every year in the dataset. Finally, and most importantly, the primary objective of this study was to estimate exposures to metals and ions in drinking water for an epidemiologic study of associations between water quality and chronic diseases with uncertain induction periods. The relevant exposure period over which environmental exposures contribute to chronic diseases is uncertain and represents a potential source of misclassification [60,61]. Because the precise time period of interest for estimating exposure was unknown, an estimate of average past exposure was deemed more appropriate for the planned epidemiological analysis than estimating exposures for specific time points, for example, using space-time kriging models.

4.3. Limitations

It is estimated that there are over 66,000 wells in Saskatchewan [17] and our sample of 4093 private wells is a non-random sample of less than 10% of privately owned wells in the province. Because the database consists of samples taken through participation in a voluntary water quality program, it could disproportionately represent residents with concerns about their well water quality. The results from private wells and public water supplies were similar, suggesting this was not a substantial issue.
Although the public supply data represent data from all available public water supplies across Saskatchewan, there were relatively few locations represented in the public supply data relative to the size of the study area, resulting in a low sampling density that may have particularly impacted the ability of kriging to capture the variability of arsenic at small spatial scales.
Our PCA may have been hampered by not being able to make use of a full suite of parameters and high proportions of concentrations below detection limits for some variables, especially with respect to the health-related standards. We also made the decision to separately analyze aesthetic and health parameters because they are segregated into drinking water standards and objectives. It is possible that considering all available parameters together could have improved the performance of the PCA, although it seems likely that the high number of samples below detection limits would continue to limit the usefulness of some of the variables measured as health standards.
Kriging methods rely on an estimation of the spatial structure of data. While semivariogram models provide a means of investigating spatial relationships, kriging typically requires the assumption that the chosen semivariogram model represents the true spatial structure. This assumption is relaxed with empirical Bayesian kriging allowing for uncertainty in the semivariogram parameters which likely contributes to the superior predictive performance of this method in our study. However, the performance of empirical Bayesian kriging was only marginally better than that of ordinary kriging. Other researchers have investigated Bayesian statistical methods to predict arsenic groundwater concentrations which incorporate spatial relationships using alternatives to semivariograms [58,62]. Methods such as these could potentially be used to improve prediction of arsenic concentrations and overcome some of the limitations of kriging especially when spatial variability arises from processes at different scales, limiting the effectiveness of variogram modeling even after allowing for uncertainty in the semivariogram.

5. Conclusions

In this study, we investigated the use of PCA and kriging to predict groundwater water quality across southern Saskatchewan as part of the exposure assessment for epidemiological investigation of links between water quality and chronic disease.
A considerable number of samples from rural public water supplies and private wells exceeded the standards and objectives set by the province of SK for regulated supplies, highlighting the need for study of potential health impacts associated with consumption of drinking water of poor quality in rural populations. Arsenic concentrations exceeded the health standard of 0.01 mg/L in 7% of public water supplies and 13% of private wells. Guidelines for aesthetic objectives were exceeded with greater frequency than health standards. These results suggest that the investigation of health impacts of unpalatable water and mixtures of contaminants may be particularly important.
We compared ordinary, universal and Bayesian kriging for predicting log arsenic concentrations and PC scores across the study area for public and private water supplies. Across the variables investigated, empirical Bayesian kriging resulted in the greatest accuracy of the most predicted values.
As a method that is accessible in commercial software packages, empirical Bayesian kriging represents a flexible and viable statistical technique that could be used to summarize water quality as a method of exposure assessment. While geostatistical analysis may be more informative if applied over smaller regions and higher sampling densities, the objective of this study was to estimate exposure to drinking water parameters over a very large geographic area in rural Saskatchewan.

Acknowledgments

The study was funded by a grant from the Saskatchewan Health Research Foundation. Water surveillance data were provided by the Saskatchewan Water Security Agency. The authors gratefully acknowledge the assistance and advice provided by Lorelei Ford and Sam Ferris of the Saskatchewan Water Security Agency.

Author Contributions

Cheryl L. Waldner, Lalita Bharadwaj and Tasha Epp conceived the study objectives; Lianne McLeod analyzed the data in consultation with Cheryl L. Waldner and Tasha Epp; Lianne McLeod wrote the paper; Cheryl L. Waldner edited the paper; and Tasha Epp and Lalita Bharadwaj reviewed the manuscript and provided comments.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Cross Validation Plots for Arsenic and Principal Component Scores

Figure A1. Predicted vs. measured values for the natural logarithm of arsenic concentration obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A1. Predicted vs. measured values for the natural logarithm of arsenic concentration obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g004
Figure A2. Predicted vs. measured values for PC1health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A2. Predicted vs. measured values for PC1health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g005
Figure A3. Predicted vs. measured values for PC2health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A3. Predicted vs. measured values for PC2health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g006
Figure A4. Predicted vs. measured values for PC3health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A4. Predicted vs. measured values for PC3health scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g007
Figure A5. Predicted vs. measured values for PC1aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A5. Predicted vs. measured values for PC1aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g008
Figure A6. Predicted vs. measured values for PC2aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A6. Predicted vs. measured values for PC2aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g009
Figure A7. Predicted vs. measured values for PC3aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Figure A7. Predicted vs. measured values for PC3aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for: public water supplies (top row); and private wells (bottom row). Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g010
Figure A8. Predicted vs. measured values for PC4aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for public water supplies. Line of perfect concordance indicated by dashed line.
Figure A8. Predicted vs. measured values for PC4aesthetic scores obtained from the cross-validation results for ordinary, universal, and empirical Bayesian kriging for public water supplies. Line of perfect concordance indicated by dashed line.
Ijerph 14 01065 g011

Appendix B. Prediction Maps for Principal Component Scores

Figure A9. (a) Predicted PC1health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A9. (a) Predicted PC1health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g012
Figure A10. (a) Predicted PC2health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A10. (a) Predicted PC2health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g013
Figure A11. (a) Predicted PC3health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A11. (a) Predicted PC3health scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g014
Figure A12. (a) Predicted PC1aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A12. (a) Predicted PC1aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g015
Figure A13. (a) Predicted PC2aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A13. (a) Predicted PC2aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g016
Figure A14. (a) Predicted PC3aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Figure A14. (a) Predicted PC3aesthetic scores for public water supplies and private wells in study area; and (b) the corresponding prediction standard errors.
Ijerph 14 01065 g017
Figure A15. (a) Predicted PC4aesthetic scores for public water supplies in study area; and (b) the corresponding prediction standard errors. The fourth principal component was extracted for public water supply data only.
Figure A15. (a) Predicted PC4aesthetic scores for public water supplies in study area; and (b) the corresponding prediction standard errors. The fourth principal component was extracted for public water supply data only.
Ijerph 14 01065 g018

References

  1. Navas-Acien, A.; Sharrett, A.R.; Silbergeld, E.K.; Schwartz, B.S.; Nachman, K.E.; Burke, T.A.; Guallar, E. Arsenic exposure and cardiovascular disease: A systematic review of the epidemiologic evidence. Am. J. Epidemiol. 2005, 162, 1037–1049. [Google Scholar] [CrossRef] [PubMed]
  2. Navas-Acien, A.; Silbergeld, E.K.; Streeter, R.A.; Clark, J.M.; Burke, T.A.; Guallar, E. Arsenic exposure and type 2 diabetes: A systematic review of the experimental and epidemiologic evidence. Environ. Health Perspect. 2006, 114, 641–648. [Google Scholar] [CrossRef] [PubMed]
  3. Chen, C.-J.; Wang, S.-L.; Chiou, J.-M.; Tseng, C.-H.; Chiou, H.-Y.; Hsueh, Y.-M.; Chen, S.-Y.; Wu, M.-M.; Lai, M.-S. Arsenic and diabetes and hypertension in human populations: A review. Toxicol. Appl. Pharmacol. 2007, 222, 298–304. [Google Scholar] [CrossRef] [PubMed]
  4. Celik, I.; Gallicchio, L.; Boyd, K.; Lam, T.K.; Matanoski, G.; Tao, X.; Shiels, M.; Hammond, E.; Chen, L.; Robinson, K.A.; et al. Arsenic in drinking water and lung cancer: A systematic review. Environ. Res. 2008, 108, 48–55. [Google Scholar] [CrossRef] [PubMed]
  5. Lisabeth, L.D.; Ahn, H.J.; Chen, J.J.; Sealy-Jefferson, S.; Burke, J.F.; Meliker, J.R. Arsenic in drinking water and stroke hospitalizations in Michigan. Stroke 2010, 41, 2499–2504. [Google Scholar] [CrossRef] [PubMed]
  6. Chen, Y.; Graziano, J.H.; Parvez, F.; Liu, M.; Slavkovich, V.; Kalra, T.; Argos, M.; Islam, T.; Ahmed, A.; Rakibuz-Zaman, M.; et al. Arsenic exposure from drinking water and mortality from cardiovascular disease in Bangladesh: Prospective cohort study. BMJ 2011, 342, d2431. [Google Scholar] [CrossRef] [PubMed]
  7. Moon, K.; Guallar, E.; Navas-Acien, A. Arsenic exposure and cardiovascular disease: An updated systematic review. Curr. Atheroscler. Rep. 2012, 14, 542–555. [Google Scholar] [CrossRef] [PubMed]
  8. Onufrak, S.J.; Park, S.; Sharkey, J.R.; Sherry, B. The relationship of perceptions of tap water safety with intake of sugar-sweetened beverages and plain water among US adults. Public Health Nutr. 2014, 17, 179–185. [Google Scholar] [CrossRef] [PubMed]
  9. Monosson, E. Chemical Mixtures: Considering the evolution of toxicology and chemical assessment. Environ. Health Perspect. 2005, 113, 383–390. [Google Scholar] [CrossRef] [PubMed]
  10. Murphy, E.A.; Post, G.B.; Buckley, B.T.; Lippincott, R.L.; Robson, M.G. Future challenges to protecting public health from drinking-water contaminants. Annu. Rev. Public Health 2012, 33, 209–224. [Google Scholar] [CrossRef] [PubMed]
  11. Villanueva, C.M.; Kogevinas, M.; Cordier, S.; Templeton, M.R.; Vermeulen, R.; Nuckols, J.R.; Nieuwenhuijsen, M.J.; Levallois, P. Assessing exposure and health consequences of chemicals in drinking water: Current state of knowledge and research needs. Environ. Health Perspect. 2014, 122, 213. [Google Scholar] [CrossRef] [PubMed]
  12. Dunn, G.; Bakker, K.; Harris, L. Drinking water quality guidelines across canadian provinces and territories: Jurisdictional variation in the context of decentralized water governance. Int. J. Environ. Res. Public Health 2014, 11, 4634–4651. [Google Scholar] [CrossRef] [PubMed]
  13. Guidelines for Canadian Drinking Water Quality—Summary Table. Available online: https://www.canada.ca/en/health-canada/services/environmental-workplace-health/reports-publications/water-quality/guidelines-canadian-drinking-water-quality-summary-table-health-canada-2012.html (accessed on 24 October 2014).
  14. Water Security Agency Saskatchewan’s Drinking Water Quality Standards and Objectives (Summarized). Available online: http://www.saskh2o.ca/pdf/epb507.pdf (accessed on 14 May 2016).
  15. Drinking Water—Source to Tap Solutions. Available online: http://www.saskh2o.ca/PDF/Source_to_Tap.pdf (accessed on 15 November 2013).
  16. Municipal Drinking Water Quality Monitoring Guidelines 2016. Available online: http://www.saskh2o.ca/pdf/epb202.pdf (accessed on 5 November 2016).
  17. Thompson, T.S.; Le, M.D.; Kasick, A.R.; Macaulay, T.J. Arsenic in well water supplies in Saskatchewan. Bull. Environ. Contam. Toxicol. 1999, 63, 478–483. [Google Scholar] [CrossRef] [PubMed]
  18. Thompson, T.S. Nitrate concentrations in private rural drinking water supplies in Saskatchewan, Canada. Bull. Environ. Contam. Toxicol. 2001, 66, 64–70. [Google Scholar] [CrossRef] [PubMed]
  19. Thompson, T.S. General chemical water quality of private groundwater supplies in Saskatchewan, Canada. Bull. Environ. Contam. Toxicol. 2003, 70, 0447–0454. [Google Scholar] [CrossRef] [PubMed]
  20. Corkal, D.; Schutzman, W.C.; Hilliard, C.R. Rural water safety from the source to the on-farm tap. J. Toxicol. Environ. Health A 2004, 67, 1619–1642. [Google Scholar] [CrossRef] [PubMed]
  21. Charrois, J.W.A. Private drinking water supplies: Challenges for public health. Can. Med. Assoc. J. 2010, 182, 1061–1064. [Google Scholar] [CrossRef] [PubMed]
  22. Government of Saskatchewan Sask H2O—Water Information Quick Facts. Available online: http://www.saskh20.ca/WaterInformation_QuickFacts.asp (accessed on 7 November 2014).
  23. Sketchell, J.; Shaheen, N. Ground water quality in rural Saskatchewan—Emerging issues for drinking water. In Proceedings of the Maintaining Drinking Water Quality—Lessons from the Prairies and Beyond, 9th National Conference on Drinking Water, Regina, SK, Canada, 16–18 May 2000. [Google Scholar]
  24. McLeod, L.; Bharadwaj, L.; Waldner, C. Risk factors associated with perceptions of drinking water quality in rural Saskatchewan. Can. Water Resour. J. Rev. Can. Ressour. Hydr. 2015, 40, 36–46. [Google Scholar] [CrossRef]
  25. Goovaerts, P.; AvRuskin, G.; Meliker, J.; Slotnick, M.; Jacquez, G.; Nriagu, J. Geostatistical modeling of the spatial variability of arsenic in groundwater of southeast Michigan. Water Resour. Res. 2005, 41, W07013. [Google Scholar] [CrossRef]
  26. Meliker, J.R.; AvRuskin, G.A.; Slotnick, M.J.; Goovaerts, P.; Schottenfeld, D.; Jacquez, G.M.; Nriagu, J.O. Validity of spatial models of arsenic concentrations in private well water. Environ. Res. 2008, 106, 42–50. [Google Scholar] [CrossRef] [PubMed]
  27. Yang, Q.; Jung, H.B.; Culbertson, C.W.; Marvinney, R.G.; Loiselle, M.C.; Locke, D.B.; Cheek, H.; Thibodeau, H.; Zheng, Y. Spatial pattern of groundwater arsenic occurrence and association with bedrock geology in Greater Augusta, Maine, USA. Environ. Sci. Technol. 2009, 43, 2714–2719. [Google Scholar] [CrossRef] [PubMed]
  28. Gong, G.; Mattevada, S.; O’Bryant, S.E. Comparison of the accuracy of kriging and IDW interpolations in estimating groundwater arsenic concentrations in Texas. Environ. Res. 2014, 130, 59–69. [Google Scholar] [CrossRef] [PubMed]
  29. James, K.A.; Meliker, J.R.; Buttenfield, B.E.; Byers, T.; Zerbe, G.O.; Hokanson, J.E.; Marshall, J.A. Predicting arsenic concentrations in groundwater of San Luis Valley, Colorado: Implications for individual-level lifetime exposure assessment. Environ. Geochem. Health 2014, 36, 773–782. [Google Scholar] [CrossRef] [PubMed]
  30. Helena, B.; Pardo, R.; Vega, M.; Barrado, E.; Fernandez, J.M.; Fernandez, L. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga River, Spain) by principal component analysis. Water Res. 2000, 34, 807–816. [Google Scholar] [CrossRef]
  31. Liu, C.-W.; Lin, K.-H.; Kuo, Y.-M. Application of factor analysis in the assessment of groundwater quality in a blackfoot disease area in Taiwan. Sci. Total Environ. 2003, 313, 77–89. [Google Scholar] [CrossRef]
  32. Chapagain, S.K.; Pandey, V.P.; Shrestha, S.; Nakamura, T.; Kazama, F. Assessment of deep groundwater quality in Kathmandu Valley using multivariate statistical techniques. Water. Air. Soil Pollut. 2010, 210, 277–288. [Google Scholar] [CrossRef]
  33. Belkhiri, L.; Boudoukha, A.; Mouni, L. A multivariate statistical analysis of groundwater chemistry data. Int. J. Environ. Res. 2011, 5, 537–544. [Google Scholar]
  34. Sánchez-Martos, F.; Jiménez-Espinosa, R.; Pulido-Bosch, A. Mapping groundwater quality variables using PCA and geostatistics: A case study of Bajo Andarax, southeastern Spain. Hydrol. Sci. J. 2001, 46, 227–242. [Google Scholar] [CrossRef]
  35. Satyaji Rao, Y.R.; Keshari, A.K.; Gosain, A.K. Evaluation of regional groundwater quality using PCA and geostatistics in the urban coastal aquifer, East Coast of India. Int. J. Environ. Waste Manag. 2009, 5, 163–180. [Google Scholar]
  36. Shyu, G.-S.; Cheng, B.-Y.; Chiang, C.-T.; Yao, P.-H.; Chang, T.-K. Applying factor analysis combined with kriging and information entropy theory for mapping and evaluating the stability of groundwater quality variation in Taiwan. Int. J. Environ. Res. Public Health 2011, 8, 1084–1109. [Google Scholar] [CrossRef] [PubMed]
  37. Nazzal, Y.; Zaidi, F.K.; Ahmed, I.; Ghrefat, H.; Naeem, M.; Al-Arifi, N.S.N.; Al-Shaltoni, S.A.; Al-Kahtany, K.M. The combination of principal component analysis and geostatistics as a technique in assessment of groundwater hydrochemistry in arid environment. Curr. Sci. (00113891) 2015, 108, 1138–1145. [Google Scholar]
  38. Canadian Association for Laboratory Accreditation Inc. CALA Directory of Laboratories. Available online: http://www.caladirectory.ca/index.php (accessed on 22 October 2016).
  39. Farnham, I.M.; Singh, A.K.; Stetzenbach, K.J.; Johannesson, K.H. Treatment of nondetects in multivariate analysis of groundwater geochemistry data. Chemom. Intell. Lab. Syst. 2002, 60, 265–281. [Google Scholar] [CrossRef]
  40. Saskatchewan Community Fluoride Data 2010. Available online: https://cypresshealth.ca/wp-content/uploads/2015/02/SaskCommunityFluorideData2010doc_3_pdf (accessed on 16 May 2016).
  41. Dohoo, I.R.; Martin, S.W.; Stryhn, H.; Dohoo, I.R. Methods in Epidemiologic Research; VER Inc.: Charlottetown, PE, Canada, 2012; ISBN 978-0-919013-73-5. [Google Scholar]
  42. Singh, B.P.; Hatton, B.J.; Singh, B.; Cowie, A.L.; Kathuria, A. Influence of Biochars on Nitrous Oxide Emission and Nitrogen Leaching from Two Contrasting Soils. J. Environ. Qual. 2010, 39, 1224. [Google Scholar] [CrossRef] [PubMed]
  43. The Township System. Available online: https://www.isc.ca/About/History/LandSurveys/MeasuringLandSask/Pages/TownshipSystem.aspx (accessed on 27 November 2014).
  44. Cloutier, V.; Lefebvre, R.; Therrien, R.; Savard, M.M. Multivariate statistical analysis of geochemical data as indicative of the hydrogeochemical evolution of groundwater in a sedimentary rock aquifer system. J. Hydrol. 2008, 353, 294–313. [Google Scholar] [CrossRef]
  45. Güler, C.; Thyne, G.D.; McCray, J.E.; Turner, K.A. Evaluation of graphical and multivariate statistical methods for classification of water chemistry data. Hydrogeol. J. 2002, 10, 455–474. [Google Scholar] [CrossRef]
  46. Kaiser, H.F.; Rice, J. Little Jiffy, mark IV. Educ. Psychol. Meas. 1974, 34, 111–117. [Google Scholar] [CrossRef]
  47. Bartlett, M.S. Tests of significance in factor analysis. Br. J. Math. Stat. Psychol. 1950, 3, 77–85. [Google Scholar] [CrossRef]
  48. Kaiser, H.F. The application of electronic computers to factor analysis. Educ. Psychol. Meas. 1960, 20, 141–151. [Google Scholar] [CrossRef]
  49. Chang, K. Introduction to Geographic Information Systems with Data Files and CD-ROM, 5th ed.; McGraw-Hill Science/Engineering/Math: Dubuque, IA, USA, 2009; ISBN 978-0-07-729436-6. [Google Scholar]
  50. Empirical Bayesian Kriging. Available online: http://www.esri.com/news/arcuser/1012/empirical-byesian-kriging.html (accessed on 15 May 2016).
  51. Olea, R.A. A six-step practical approach to semivariogram modeling. Stoch. Environ. Res. Risk Assess. 2006, 20, 307–318. [Google Scholar] [CrossRef]
  52. Robinson, T.P.; Metternicht, G. Testing the performance of spatial interpolation techniques for mapping soil properties. Comput. Electron. Agric. 2006, 50, 97–108. [Google Scholar] [CrossRef]
  53. Environment and Climate Change Canada Environment and Climate Change Canada—Water—Groundwater. Available online: https://www.ec.gc.ca/eau-water/default.asp?lang=En&n=300688DC-1#sub5 (accessed on 13 August 2016).
  54. Singh, C.K.; Shashtri, S.; Mukherjee, S. Integrating multivariate statistical analysis with GIS for geochemical assessment of groundwater quality in Shiwaliks of Punjab, India. Environ. Earth Sci. 2010, 62, 1387–1405. [Google Scholar] [CrossRef]
  55. Maathuis, H. The Quality of Natural Groundwaters in Saskatchewan; Saskatchewan Research Council: Saskatoon, SK, Canada, 2008. [Google Scholar]
  56. Hassan, M.M.; Atkins, P.J. Application of geostatistics with indicator kriging for analyzing spatial variability of groundwater arsenic concentrations in Southwest Bangladesh. J. Environ. Sci. Health Part A 2011, 46, 1185–1196. [Google Scholar] [CrossRef] [PubMed]
  57. Yu, W.H.; Harvey, C.M.; Harvey, C.F. Arsenic in groundwater in Bangladesh: A geostatistical and epidemiological framework for evaluating health effects and potential remedies. Water Resour. Res. 2003, 39, 1146. [Google Scholar] [CrossRef]
  58. Kim, D.; Miranda, M.L.; Tootoo, J.; Bradley, P.; Gelfand, A.E. Spatial modeling for groundwater arsenic levels in North Carolina. Environ. Sci. Technol. 2011, 45. [Google Scholar] [CrossRef] [PubMed]
  59. Van Geen, A.; Zheng, Y.; Versteeg, R.; Stute, M.; Horneman, A.; Dhar, R.; Steckler, M.; Gelman, A.; Small, C.; Ahsan, H.; et al. Spatial variability of arsenic in 6000 tube wells in a 25 km2 area of Bangladesh. Water Resour. Res. 2003, 39, 1140. [Google Scholar] [CrossRef]
  60. Rothman, K.J. Induction and latent periods. Am. J. Epidemiol. 1981, 114, 253–259. [Google Scholar] [CrossRef] [PubMed]
  61. Armstrong, B.K.; White, E.; Saracci, R. Principles of Exposure Measurement in Epidemiology; Oxford University Press: Oxford, UK, 1992; ISBN 0-19-262020-7. [Google Scholar]
  62. Sanders, A.P.; Messier, K.P.; Shehee, M.; Rudo, K.; Serre, M.L.; Fry, R.C. Arsenic in North Carolina: Public health implications. Environ. Int. 2012, 38, 10–16. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The density of sampling sites for: public water supplies (left); and private wells (right) across the province. The solid line indicates the boundary for the geostatistical analysis. Inset map shows location of the province of Saskatchewan within Canada.
Figure 1. The density of sampling sites for: public water supplies (left); and private wells (right) across the province. The solid line indicates the boundary for the geostatistical analysis. Inset map shows location of the province of Saskatchewan within Canada.
Ijerph 14 01065 g001
Figure 2. Arsenic concentrations (μg/L) predicted by empirical Bayesian kriging for: public water supply data (left); and private well data (right) in study area in southern Saskatchewan.
Figure 2. Arsenic concentrations (μg/L) predicted by empirical Bayesian kriging for: public water supply data (left); and private well data (right) in study area in southern Saskatchewan.
Ijerph 14 01065 g002
Figure 3. Empirical Bayesian kriging prediction maps for natural logarithm transformed arsenic concentrations (a); and the prediction standard error maps (b) for public supplies (left) and private supplies (right) in study area in southern Saskatchewan.
Figure 3. Empirical Bayesian kriging prediction maps for natural logarithm transformed arsenic concentrations (a); and the prediction standard error maps (b) for public supplies (left) and private supplies (right) in study area in southern Saskatchewan.
Ijerph 14 01065 g003
Table 1. Descriptive statistics for individual samples in public water supplies in Saskatchewan during 1985–2012, along with the Saskatchewan standards and objectives, percent of samples exceeding the respective standard or objective, and percent of samples below detection limit for the water parameters included in the analysis.
Table 1. Descriptive statistics for individual samples in public water supplies in Saskatchewan during 1985–2012, along with the Saskatchewan standards and objectives, percent of samples exceeding the respective standard or objective, and percent of samples below detection limit for the water parameters included in the analysis.
SamplesMeanSDMedianP95MaxSK StandardExceedancesBelow DL
(n)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)Percent of SamplesPercent of Samples
Health standards
Arsenic47320.0030.0070.0010.0140.0980.016.922.9
Barium44850.050.100.020.182.4010.042.8
Boron41160.360.500.241.206.0050.25.5
Lead45690.00190.00780.00050.00700.41000.012.567.3
Nitrate956211.620.53.042.0933.0454.131.4
Selenium45270.0010.0060.0010.0040.1400.011.972.2
Uranium46170.0060.0110.0030.0230.1800.027.316.9
Aesthetic objectives SK Objective
Alkalinity5404408154408674245150022.00
Chloride543548.586.918.0233.41803.02504.23.8
Copper44970.0840.2900.0180.3106.20011.26.8
Hardness41625363414891107780080020.70.2
Iron45870.301.330.061.0846.000.318.85.9
Magnesium312055.537.449.0125.0449.02000.0024.3
Manganese46140.261.580.070.98101.000.0553.57.2
Sodium435316219080585186830018.80.1
Sulfate42844033673261045900050032.52.5
TDS 42901283661119924536687150034.60
Zinc44810.030.240.010.0811.0050.0425.4
SD = standard deviation, P95 = 95th percentile, max = maximum, DL = detection limit, TDS = Total Dissolved Solids.
Table 2. Descriptive statistics for individual samples in private water supplies in Saskatchewan during 1996–2011, along with the Saskatchewan standards and objectives, percent of samples exceeding the respective standard or objective, and percent of samples below detection limit for the water parameters included in the analysis.
Table 2. Descriptive statistics for individual samples in private water supplies in Saskatchewan during 1996–2011, along with the Saskatchewan standards and objectives, percent of samples exceeding the respective standard or objective, and percent of samples below detection limit for the water parameters included in the analysis.
SamplesMeanSDMedianP95MaxSK StandardExceedancesBelow DL
(n)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)Percent of SamplesPercent of Samples
Health Standards
Arsenic40820.0050.0120.0010.0230.2100.0113.521.3
Barium40820.080.140.030.262.1910.40.3
Boron40820.330.540.151.407.1050.21.8
Lead40820.00070.00430.00050.00140.21000.010.772.9
Nitrate399624.573.51.20126.01300.04512.227.4
Selenium40760.0080.0360.0010.0330.8400.0111.241.1
Uranium40760.0120.0210.0050.0440.4000.0217.811.9
Aesthetic Objectives SK Objective
Alkalinity4019416148399671162050021.8n/a
Chloride401969.8178.021.0257.04090.02505.21.7
Copper40800.0110.0370.0030.0441.10010.0224.1
Hardness40196955695571760681080030.70.3
Iron40911.242.810.126.0040.000.340.51.2
Magnesium401981.084.460.0220.01450.02006.10.3
Manganese40910.440.680.181.7011.000.0568.22.5
Sodium401918123784653271030020.80
Sulfate40195466183541680769050039.10.15
TDS 4019156010301330345011300150042.7n/a
Zinc40810.191.000.020.7631.0050.415.8
SD = standard deviation, P95 = 95th percentile, max = maximum, DL = detection limit, n/a = no detection limit available, TDS = Total Dissolved Solids.
Table 3. Descriptive statistics for predicted concentrations after repeated measures modeling for each parameter by site for public water supplies in Saskatchewan, along with Saskatchewan standards and objectives, and the percent of sites with a predicted concentration exceeding the standard or objective. A summary of the number of samples per site incorporated in repeated measures models for each parameter is also shown.
Table 3. Descriptive statistics for predicted concentrations after repeated measures modeling for each parameter by site for public water supplies in Saskatchewan, along with Saskatchewan standards and objectives, and the percent of sites with a predicted concentration exceeding the standard or objective. A summary of the number of samples per site incorporated in repeated measures models for each parameter is also shown.
Mean Predicted Concentration Number of Samples per Site
SitesMeanSDMedianP95MaxSK StandardExceedMinMedianMax
(n)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)(mg/L)Percent of Sites(n)(n)(n)
Health Standards
Arsenic4920.0020.0030.0010.0070.0390.012.01959
Barium4910.040.070.020.130.74101928
Boron4770.350.410.231.133.03501926
Lead4910.00060.000040.00060.00070.00070.0101928
Nitrate4972.676.241.0310.4295.18450.2111366
Selenium4920.0010.0010.00030.0020.0120.010.21928
Uranium4910.0050.0060.0030.0160.0760.022.61933
Aesthetic Objectives SK Objective
Alkalinity50340012740061290050018.111036
Chloride49942.964.219.8173.7489.82502.4110257
Copper4920.0240.0250.0150.0750.170101928
Hardness501492296457104971482780015.21832
Iron4820.100.120.0600.291.180.34.11934
Magnesium48352.334.046.4116.1191.920001626
Manganese4830.120.190.060.451.840.0552.61934
Sodium4881601768756188330018.01834
Sulfate480383318304995193050030.01834
TDS 4871277590118923553467150033.11834
Zinc4910.010.010.010.020.10501928
SD = standard deviation, Min = minimum, P95 = 95th percentile, Max = maximum, Exceed = predicted concentration exceeds standard or objective, TDS = Total Dissolved Solids.
Table 4. Principal components analysis on public water supplies and private well water supplies: varimax rotated principal components patterns, eigenvalues and percent of variance explained for each retained component. The loadings in bold font indicate the principal component with the maximum loading for each measured variable.
Table 4. Principal components analysis on public water supplies and private well water supplies: varimax rotated principal components patterns, eigenvalues and percent of variance explained for each retained component. The loadings in bold font indicate the principal component with the maximum loading for each measured variable.
Public Water SuppliesPrivate Wells
Health Standards
PC1PC2PC3 PC1PC2PC3
Arsenic−0.1210.1420.808−0.341−0.0910.474
Barium0.0470.818−0.141−0.0410.8930.100
Boron−0.0620.903−0.123−0.1950.8180.168
Lead0.4720.0920.1710.1560.0260.893
Nitrate0.768−0.071−0.1640.7700.275−0.110
Selenium0.867−0.2200.0190.853−0.0070.074
Uranium0.387−0.2900.5760.772−0.013−0.049
Eigenvalue2.1271.2751.0592.2901.3811.057
Cumulative variance (%)30.448.663.732.752.567.6
Aesthetic Objectives
PC1PC2PC3PC4PC1PC2PC3
Alkalinity0.7550.1110.164−0.1980.687−0.0230.217
Chloride0.753−0.1930.0020.2260.7790.043−0.195
Copper0.1270.012−0.2000.7140.0300.2230.757
Hardness0.0090.9730.0660.0420.0670.9600.038
Iron0.138−0.0890.9010.0530.1170.1210.784
Magnesium−0.0140.9610.0550.0380.1030.9510.020
Manganese0.1880.4520.711−0.0650.0620.4680.663
Sodium0.914−0.1990.1360.0260.922−0.1160.118
Sulfate0.6630.5170.0180.1160.6090.5550.076
Total Dissolved Solids0.9200.2880.121−0.0160.9070.3250.082
Zinc−0.0890.0780.2370.763−0.0910.396−0.375
Eigenvalue3.7462.3621.2641.1813.7752.1841.779
Cumulative variance (%)34.155.567.077.834.354.270.4
PC = principal component.
Table 5. Summary of parameters used as input for variogram models for ordinary and universal kriging.
Table 5. Summary of parameters used as input for variogram models for ordinary and universal kriging.
VariableNumber of SitesLag Distance (km)Number of LagsLarge Scale Trend
Public Supplies
Arsenic48013.03312nd order
PC1health45913.44302nd order
PC2health45913.44302nd order
PC3health45913.44302nd order
PC1aesthetic43513.84292nd order
PC2aesthetic 43513.84292nd order
PC3aesthetic 43513.84292nd order
PC4aesthetic 43513.84291st order
Private Supplies
Arsenic40732.521001st order
PC1health39702.551002nd order
PC2health39702.551002nd order
PC3health39702.551002nd order
PC1aesthetic39992.541002nd order
PC2aesthetic39992.541002nd order
PC3aesthetic39992.541001st order
PC = principal component.
Table 6. Root mean square error values for each parameter obtained from cross validation analysis of ordinary, universal and empirical Bayesian kriging models for public water supply and private well data. The lowest RMSE for each variable is bolded.
Table 6. Root mean square error values for each parameter obtained from cross validation analysis of ordinary, universal and empirical Bayesian kriging models for public water supply and private well data. The lowest RMSE for each variable is bolded.
OrdinaryUniversalBayesian
Municipal Systems
Arsenic1.01151.26341.0180
PC1health0.97301.03000.9734
PC2health0.95103.53630.9586
PC3health0.89871.08650.8925
PC1aesthetic0.92761.02930.9268
PC2aesthetic0.88990.95670.9013
PC3aesthetic0.99936.29330.9822
PC4aesthetic1.02041.02391.0232
Private Wells
Arsenic1.59491.60811.5593
PC1health0.9130212.780.9090
PC2health0.83880.91110.8200
PC3health0.95361.02990.9626
PC1aesthetic0.8234211.460.8091
PC2aesthetic0.86180.89180.8020
PC3aesthetic0.96300.97240.9606

Share and Cite

MDPI and ACS Style

McLeod, L.; Bharadwaj, L.; Epp, T.; Waldner, C.L. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan. Int. J. Environ. Res. Public Health 2017, 14, 1065. https://doi.org/10.3390/ijerph14091065

AMA Style

McLeod L, Bharadwaj L, Epp T, Waldner CL. Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan. International Journal of Environmental Research and Public Health. 2017; 14(9):1065. https://doi.org/10.3390/ijerph14091065

Chicago/Turabian Style

McLeod, Lianne, Lalita Bharadwaj, Tasha Epp, and Cheryl L. Waldner. 2017. "Use of Principal Components Analysis and Kriging to Predict Groundwater-Sourced Rural Drinking Water Quality in Saskatchewan" International Journal of Environmental Research and Public Health 14, no. 9: 1065. https://doi.org/10.3390/ijerph14091065

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop