Article

Soil Sample Assay Uncertainty and the Geographic Distribution of Contaminants: Error Impacts on Syracuse Trace Metal Soil Loading Analysis Results

School of Economic, Political and Policy Sciences, The University of Texas at Dallas, 800 West Campbell Road, Richardson, TX 75080, USA
*
Authors to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2021, 18(10), 5164; https://doi.org/10.3390/ijerph18105164
Submission received: 6 April 2021 / Revised: 4 May 2021 / Accepted: 9 May 2021 / Published: 13 May 2021
(This article belongs to the Special Issue Spatial Data Uncertainty in Public Health Research)

Abstract

A research team collected 3609 useful soil samples across the city of Syracuse, NY; this data collection fieldwork occurred during the two consecutive summers (mid-May to mid-August) of 2003 and 2004. Each soil sample had fifteen heavy metals (As, Cr, Cu, Co, Fe, Hg, Mo, Mn, Ni, Pb, Rb, Se, Sr, Zn, and Zr), measured during its assaying; errors for these measurements are analyzed in this paper, with an objective of contributing to the geography of error literature. Geochemistry measurements are in milligrams of heavy metal per kilogram of soil, or ppm, together with accompanying analytical measurement errors. The purpose of this paper is to summarize and portray the geographic distribution of these selected heavy metals measurement errors across the city of Syracuse. Doing so both illustrates the value of the SAAR software’s uncertainty mapping module and uncovers heavy metal characteristics in the geographic distribution of Syracuse’s soil. In addition to uncertainty visualization portraying and indicating reliability information about heavy metal levels and their geographic patterns, SAAR also provides optimized map classifications of heavy metal levels based upon their uncertainty (utilizing the Sun-Wong separability criterion) as well as an optimality criterion that simultaneously accounts for heavy metal levels and their affiliated uncertainty. One major outcome is a summary and portrayal of the geographic distribution of As, Cr, Cu, Co, Fe, Hg, Mo, Mn, Ni, Pb, Rb, Se, Sr, Zn, and Zr measurement error across the city of Syracuse.

1. Introduction

The geography of error (arising from, e.g., calculation, sampling, measurement, specification, and stochastic sources) dates back many decades, with Goodchild and Gopal’s [1] book, followed by the first International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences convened in Williamsburg, VA, in 1994 (http://www.spatial-accuracy.org/History, accessed on 8 February 2021), signifying more concerted geospatial research efforts to address this theme. The history of this topic parallels that of popular applied statistics in general, which evolved from reporting only central tendency values to also reporting their associated uncertainties (e.g., margins of error), which became a common practice only after 1928. In parallel, maps began including error statements, mostly after 1941, although they were global map-wide ones. However, reporting a global error measure on an increasing number of maps furnishes little knowledge and understanding about the geography of error in general (even though a meta-analysis might provide some insights). Meanwhile, as geostatistical analysis became more prevalent (the software packages GEO-EAS [2] and GSLIB [3] appeared at the beginning of the 1990s), prediction error maps began accompanying kriged surface (i.e., mean response) maps, a practice that continues today. The principal disclosure of this geography of error exception is that its map pattern relates to the underlying sampling network, which almost always is visible in a prediction error map. Hexagonal-tessellation stratified random spatial sampling (e.g., [4]) error constitutes another exception, with a primary concern of its geography being the relationship between spatial landscape coverage and sampling error.
Current georeferenced data releases, such as those from the United States (USA) American Community Survey (ACS), include sampling errors, furnishing an increasing number of databases to study the geography of sampling error in a more comprehensive way.
The objective of this paper is to contribute to this geography of error literature, focusing on combinations of measurement (i.e., soil geochemistry content analysis accuracy/precision), sampling (i.e., the selection of only a tiny fraction of soil in a geographic landscape), and specification (i.e., the correctness of assumptions and/or functional forms of equations for an analysis) errors in the context of spatial autocorrelation, a fundamental feature of geospatial data. Its empirical analyses and simulation experiments exploit a unique database containing analytical measurement errors (i.e., uncertainty introduced by chemical assay procedures) for trace metals contaminating soil in the city of Syracuse, NY, USA.

2. Background

Errors in georeferenced data can occur in both of their components: location and attribute. The geography and GIScience literature contains extensive investigations about location errors in various contexts, including geocoding (e.g., [5,6]) and raster modeling (e.g., [7]). This literature also includes studies about georeferenced data attribute errors. Griffith et al. [8] discuss four major sources of georeferenced data errors, namely sampling error, measurement error, specification error, and analytical assaying error. The literature addresses these first three sources for attribute errors. For example, Griffith et al. [9] argue that sampling error of estimates is heterogeneous across spatial units, specifically census tracts, and tends to correlate with both the size of tracts and their socio-economic characteristics. Wang et al. [10] discuss sampling as a major source of error, and present a sampling approach to reduce error variances. In addition, Leung et al. [11] discuss a general error analysis framework concerning georeferenced data measurements, whereas many literature entries (e.g., [12,13,14]) recognize model specification error. In contrast, assaying errors have not been a popular topic in spatial analysis and/or spatial statistics, although they generally occur in field surveys of, for example, soil [15,16]. Kriging is one spatial analysis topic that does not avoid referencing assaying errors; it links these errors to the nugget effect [17]. Otherwise, assaying errors are rarely investigated in spatial analysis.
Griffith et al. [8] discuss ways to enhance research about uncertainty. They focus on the following four research themes: visualizing error; spatial patterns and spatial modelling acknowledging error; spatial data aggregation; and data quality. Visualizing error, the first theme, can improve the current understanding about spatial patterns of information uncertainty. Research dealing with mapping uncertainty presents various approaches focusing on mapping observations together with their data quality information [18,19]. Recently, Koo et al. [20] discussed a framework for uncertainty mapping. Nevertheless, the common protocol has two side-by-side maps, displaying a geographic distribution of values juxtaposed with a geographic distribution of individual value uncertainties, because concurrently exhibiting the two sets of information in a single map tends to overwhelm a map reader with a large amount of data portrayed in a limited two-dimensional space.
Spatial patterns and spatial data modelling constitute a second research theme that can benefit from incorporating uncertainty into spatial analysis. Sun et al. [21] discuss a map classification technique that seeks to avoid adverse impacts by the presence of uncertainty, one of which is the possibility of a different map pattern outcome. They present a new map classification approach incorporating uncertainty, demonstrating it with American Community Survey data, which is accompanied by sampling error measures. Meanwhile, Koo et al. [22] and Mu and Tong [23] present alternative approaches to incorporating uncertainty into map classifications. New approaches to spatial modelling recognizing uncertainty, going beyond widely recognized model specification issues such as omitted variables [24], also need development. Hu et al. [14] furnish one promising effort that models the existence of a mixture of positive and negative spatial autocorrelation in spatial data, showing how successfully accounting for such a mixture pattern can enhance a spatial analysis.
A third research theme, spatial data aggregation, can also raise uncertainty issues. The modifiable areal unit problem (MAUP) describes a complication of spatial analysis that arises when observations are based on polygon areal units [25]. Lee et al. [26] present simulation experimental results demonstrating how spatial autocorrelation levels may be affected by areal unit aggregation (both geographic resolution and zonation). Despite these spatial data aggregation issues being recognized in the literature for decades now, a universally acceptable solution to the MAUP remains elusive for geographic analysis. Finally, Griffith et al. [8] discuss the importance of spatial metadata for data quality so that users understand the accuracy level of data and, when necessary, more cautiously assess spatial analysis results.
Given this preceding context, this paper investigates assaying measurement errors of heavy metal amounts in soil sample observations. It focuses mainly on the geographic patterns that materialize from mapping individual soil samples with point locations and then further aggregates these data by census tracts. In addition, this paper explores both individual and a mixture of different error types with simulation experiments. By doing so, its principal knowledge contribution is to the geography of error literature.

3. About the Data

A total of 3628 useful surface soil samples were collected across the city of Syracuse, New York, USA, mostly during the summers of 2003 and 2004, under the auspices of the National Science Foundation (BCS-0552588), with Institutional Review Board (IRB) oversight by that organization for Syracuse University, then the University of Miami, and finally the University of Texas at Dallas. Of these, 3324 unique location samples (multiple samples for the same locations were averaged by location) had useful (i.e., positive) analytical assay error values (Figure 1); two of these soil samples had locations outside of Syracuse, resulting in their being removed from census tract resolution spatial analyses. This dataset contained a rich set of observations, with extensive sample coverage across the city of Syracuse, including measurements for 15 heavy metals and corresponding assay errors with accurate point location tags. Accordingly, it furnished a rare opportunity to investigate geographic patterns of assay error at the individual point level as well as to compare these individual results with ones at an aggregated geographic resolution, specifically the census tract level. Currently, this type of dataset is scarce.
A NITON XL-700-series X-ray fluorescence (XRF) instrument (NITON Corporation, 900 Middlesex Turnpike, Billerica, MA 01821) was used in a chemistry laboratory to measure 15 trace metals (As, Cr, Co, Cu, Fe, Hg, Mn, Mo, Ni, Pb, Rb, Se, Sr, Zn, and Zr; see Table 1) in these soil samples, based on 120 s testing time and NIST 2711 standard reference materials (SRM). Assaying also supplied standard deviations for each of these quantities (i.e., analytical assay errors) by soil sample. Calculated quantities are in milligrams of trace metal per kilogram of soil, or parts per million (ppm). Table 1 summarizes detection and natural background thresholds for these metals as well as the number of useful samples (i.e., with at least a non-negative assay trace metal measurement quantity) exceeding either a published maximum permissible level (MPL; [27]) or the worldwide average contamination/risk threshold [28]; all soil samples have an analytical error calculation, but not all have a valid (i.e., non-negative) trace metal quantity calculation. A number of already-published studies [29,30,31,32,33,34] summarize targeted analyses of the geographic distribution of this sample of trace metal measurement quantities and/or their location error. In contrast, the assaying-furnished error standard deviations for all soil samples constitute a dataset that had yet to be analyzed, until now.
A useful expectation is that a log-normal distribution describes the ordered distribution of soil contaminant errors (after [39]; also see [40]), especially for a cross-sectional study. Diagnostic results appearing in Table 2 imply that a log-normal approximation furnishes a suitable, albeit not perfect, description of the sample trace metal errors studied here. With so little residual variance (Table 2), any residual soil sample geographic resolution level spatial autocorrelation that is present has little impact upon variance estimates; this topic constitutes an appealing future research theme. The classical form of the log-normal random variable implies the following theoretical standard deviation (std), acknowledging that its minimum is zero (i.e., variance is non-negative):
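$$\mathrm{LN}(y_o + \hat{\delta}) = \hat{\mu}_o = \hat{\alpha} + \hat{\beta}\left\{\Phi^{-1}\left[\frac{r_o - 3/8}{n + 1/4}\right]\right\} + \hat{\varepsilon}_o, \quad \text{if } y_o \text{ is from the sample being studied},$$
$$\mathrm{LN}(y_o + \hat{\delta}) = \hat{\mu}_o = \hat{\alpha} + \hat{\beta}\left\{\Phi^{-1}\left[\frac{r_o - 3/8}{n + 1 + 1/4}\right]\right\} + \hat{\varepsilon}_o, \quad \text{if } y_o \text{ is from outside of the sample being studied},$$
$$\hat{\sigma}^2_{\mathrm{LN}(y_o + \hat{\delta})} = \hat{\sigma}^2_{\varepsilon} = \mathrm{MSE},$$
$$\hat{y}_o = -\hat{\delta} + e^{\hat{\mu}_o + \hat{\sigma}^2_{\mathrm{LN}(y_o + \hat{\delta})}/2}, \quad \text{and}$$
$$\hat{\sigma}_{y_o} = \sqrt{\left[e^{2\hat{\mu}_o + \hat{\sigma}^2_{\mathrm{LN}(y_o + \hat{\delta})}}\right]\left[e^{\hat{\sigma}^2_{\mathrm{LN}(y_o + \hat{\delta})}} - 1\right]},$$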
where $r_o$ is the rank of the existing/new value, $y_o$; MSE denotes the appropriate regression mean squared error (see Appendix B); and Φ denotes the standard normal random variable’s cumulative distribution function (the mean and variance of these n or n + 1 z-scores respectively are 0 and 1). This data feature suggests that theoretical standard errors may be posited for each of the 15 trace metal variances studied in this paper (based on Appendix B, Table A1). These quantities support a resampling-based uncertainty measure for each of the quantified analytical assay standard deviations.
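The preceding calculations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the translation parameter δ is treated as a known input (defaulting to zero), the fitted value α + β·z stands in for μ, and the MSE uses n − 2 residual degrees of freedom, which is an assumption because Appendix B is not reproduced here.

```python
import math
from statistics import NormalDist


def blom_scores(n, extra=0):
    """Normal quantiles Phi^{-1}[(r - 3/8) / (n + extra + 1/4)] for ranks r = 1..n.

    Use extra=1 for the out-of-sample case, where the denominator is n + 1 + 1/4.
    """
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.375) / (n + extra + 0.25)) for r in range(1, n + 1)]


def fit_lognormal_by_rank(errors, delta=0.0):
    """Regress LN(y + delta) on rank-based normal scores by ordinary least squares.

    Returns (alpha, beta, mse); mse uses n - 2 degrees of freedom (an assumption).
    """
    y = sorted(errors)
    n = len(y)
    z = blom_scores(n)
    log_y = [math.log(v + delta) for v in y]
    z_bar = sum(z) / n
    l_bar = sum(log_y) / n
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    sxy = sum((zi - z_bar) * (li - l_bar) for zi, li in zip(z, log_y))
    beta = sxy / sxx
    alpha = l_bar - beta * z_bar
    sse = sum((li - alpha - beta * zi) ** 2 for zi, li in zip(z, log_y))
    return alpha, beta, sse / (n - 2)


def back_transform(mu, mse, delta=0.0):
    """Mean and standard deviation on the original scale implied by a log-normal."""
    mean = -delta + math.exp(mu + mse / 2.0)
    std = math.sqrt(math.exp(2.0 * mu + mse) * (math.exp(mse) - 1.0))
    return mean, std
```

Given the fitted α and β, the theoretical standard deviation for the soil sample of rank r would follow from `back_transform(alpha + beta * z_r, mse, delta)`, where `z_r` is that sample's normal score.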
Table 3 summarizes selected descriptive spatial statistics about the trace metal analytical errors, recognizing that individual soil contaminants, because of their positive skewness (e.g., the minimum is zero), often are approximately log-normally distributed (e.g., [40]). In general, their latent spatial autocorrelation is positive and weak, which is sensible given that analytical error basically should be locationally independent across samples. Similarities common to nearby sample soil compositions could introduce some spatial autocorrelation; however, the assaying sequence should not.

4. Dimensions of Analytical Assay Error

One relevant research question asks whether or not the geographic distribution of trace metal soil sample analytical assay measurement error has distinct dimensions. These errors for the 15 heavy metals were analyzed with factor analysis. Table 4 reports factor loadings for the four prominent data dimensions uncovered, after a varimax rotation to achieve simple structure. Both the point and census tract averaged data render the same four dimensions, which account for more than 90% of the total generalized variance for the 15 trace metal analytical assay errors, with all of these errors positively correlated with their respective data dimensions. In accordance with their toxicity to living organisms, a subset of these heavy metals can be ordered as follows: Hg > Cu > Zn > Ni > Pb > Cr > Fe > Mn [41]. These four dimensions are as follows: (1) Cr, Cu, Hg, Ni, Se, and Sr; (2) Co, Fe, Mn, and Rb; (3) As, Pb, and Zn; and, (4) Mo and Zr. This first dimension most likely embodies the most toxicity. Factor-1 essentially reflects variability patterns arising from biosolids and manure usage coupled with airborne vehicle pollution. Factor-2 signals variability patterns arising from the use of modern yard landscape organic and mineral fertilizers. Factor-3 signifies variability patterns typical of urban soil contamination arising from the use of outdoor wood and metal material treatments coupled with pesticide chemicals for landscape plant protection. Factor-4 may mirror variability patterns attributable to a geogenic dimension, given that lower amounts of Zr tend to be found in glacial drift soils (e.g., drumlins), whereas Mo contamination of soils is widespread in historically industrial places, particularly those that produced steel alloys (the 100+ years of production continues in Syracuse to this day in the form of specialty steel) (see Appendix C).
Figure 2 portrays the geographic distributions of the four trace metal error dimensions. The red census tracts in Figure 2a reflect the interstate expressways transecting the city. The red census tracts in Figure 2b focus on the locations of the pre-city settlements of the village of Syracuse and the town of Salina during the early 1800s. The green census tracts in Figure 2c highlight much of the low density housing and other low population density land use sections of the city. Finally, the red census tracts in Figure 2d basically align with major drumlins across the city. Overall, these four map patterns are consistent with the preceding trace metal dimension interpretations. Furthermore, the spatial autocorrelation exhibited by these four dimensions signals the swarthiness tendencies rather than any coherent geographic clusters depicted by their map patterns.
Figure 3 presents the geographic distributions of the analytic assay measurement errors for the 15 trace metals using a quantile classification scheme. These maps generally confirm the results of the dimension analysis. The maps for the metals in Factor-1 (i.e., Cr, Cu, Hg, Ni, Se, and Sr) have a similar pattern, with high values in the northwest and southeast areas and low values in the northeast area. The four metals in Factor-2 (i.e., Co, Fe, Mn, and Rb) have a similar pattern, with the first three having high values in the northwest area and low values in the south area; Rb has a slightly different geographic pattern from the others in Factor-2, with high value clusters in the east as well as the west side and low values through the central north-south axis of the city. The three metals in Factor-3 (i.e., As, Pb, and Zn) have a similar geographic pattern. In particular, the maps for As and Ni are almost identical with the quantile taxonomy, depicting high values occupying the center and the northwest areas and low values occurring in the northwest and south areas. The maps for Mo and Zr generally show high values in the northern part and low values in the southern part of the city.

5. Mapping Analytical Assay Measurement Error and Sampling Error

Although the preceding choropleth maps present the spatial patterns of the assay errors for the heavy metals, they do not consider the uncertainty of these assay errors. Studies indicate that a map classification result may not be robust when the uncertainty of observations is not considered [42,43]. This section presents choropleth maps for the heavy metals utilizing the approach presented in Koo et al. [22] and the separability index [21], both of which consider observed values and their uncertainties simultaneously in map classification. The choropleth mapping methods are implemented in the SAAR software package, which utilizes R functions through R.NET [44]. It is available from https://thesaar.github.io/ (accessed on 30 April 2021).
Figure 4 presents visual representations of classification results for the assay errors and their theoretical analytical measurement errors for three selected heavy metals (As, Cu, and Fe). These three heavy metals are highly associated with Factors 3, 1, and 2, respectively, reported in the dimensional analysis section (Table 4). Note that plots for the other heavy metals are presented in Appendix D (Figure A2). In these plots, blue dots represent assay errors for the census tracts, and the corresponding bars represent the 95% confidence intervals calculated with analytical measurement errors. The analytical measurement error is relatively consistent across the census tracts. This consistent pattern is conspicuous for Cu and Fe. In contrast, the analytical measurement error for As tends to increase as assay error increases. In other words, the length of the bars gets longer as the As value increases. Nevertheless, the analytical errors for two consecutive observations in the graph are similar. Because of this error consistency, at least between consecutive observations, the assay error values (blue dots) profoundly dominate, whereas the analytical error has little or no impact on the classification results. The classification break points (the vertical bars) appear where the assay error value separations are sizeable. The plots for the other heavy metals have this same pattern.
Figure 5 presents the classification results based upon the assay error and resampling error. The resampling error varies across census tracts more than the analytical error. For example, the resampling error of Cu varies, while its analytical error is relatively constant. All heavy metals have a similarly larger variation in their resampling errors (Figure 5 and Figure A3). This resampling error variation has an impact on classification results. For Cu, although most observations fall into the first class in Figure 4b, most observations fall into the fifth class in Figure 5b. For As, most observations fall into the fifth class in Figure 5a, whereas observations are relatively equal in number across the fourth and fifth classes in Figure 4a.
Figure 6 presents the geographic distribution of the census tract aggregated simulated log-normal analytical measurement error for each metal. The magnitudes of the errors are represented with proportional circle sizes (the symbol sizes in the legends for their corresponding labels), whereas the choropleth maps represent mean values for the heavy metals. Some metals, such as Co (Figure 6c), display relatively sizeable error dispersions, whereas others, such as Hg (Figure 6h), display relatively small error dispersions, and yet others, such as As (Figure 6a), display more of a mixture of dispersion magnitudes. Regardless, all are well described by a log-normal distribution (Table 2).
Figure 7 reveals that the resampling error dispersion tends to be substantially larger than its corresponding analytical measurement error. Sampling error across the Figure 7 maps partially reflects variation in the post-tabulated sample sizes, which is far from constant. Accordingly, many maps (e.g., Figure 7c,h, respectively portraying Co and Hg) have relatively wide ranges of dispersions, which are quite conspicuous. Some (e.g., Figure 7b,k) have relatively narrow ranges of small dispersions, whereas none have relatively constant and more substantial resampling errors (similar to Figure 6b).
The geographic error variability displays some noticeable features and evokes several implications. First, analytic measurement error generally is stable across census tracts. Most of the heavy metals have essentially the same analytical error for all of the census tracts, although some heavy metals (As, Pb, and Zn) show an increasing pattern that covaries with the assay error. Second, the resampling error varies considerably, unlike the analytical measurement error. The geographic distributions of the resampling error show pattern similarity to the factors uncovered as data analytic dimensions of the assay error. The heavy metals that are mainly associated with the same factor in Table 4 have a map pattern similar to that of the resampling error. For example, the geographic patterns for Co, Fe, Mn, and Hg, which mainly contribute to Factor-2, are alike. This outcome may imply that resampling error results can be affected by samples in an areal unit, consequently affecting its geographic pattern.

6. Error Mixtures in Data: Some Simulation Experiments

This section summarizes output from exploratory simulation experiments addressing the sensitivity of map patterns to analytical assay measurement error (via sampling from appropriate log-normal random variables), sampling error (via bootstrap resampling), and their combination. Table 4 reports census tract resolution summary results for the observed data. Concluding comments focus on specification error with regard to estimating the nature and degree of spatial autocorrelation in geographic dimensions of data.

6.1. Geographic Dimension Sensitivity to Analytical Assay Measurement Error

Generation of the assaying sensitivity analysis simulation experiment results summarized in Table 5 utilized random samples from appropriate heavy metal log-normal distributions for each of the 3322 individual soil samples located in the city of Syracuse, with the simulated values then averaged by census tract. These census tract averages then were the georeferenced data input to a varimax-rotated factor analysis. Results for the data analytic factor structure of the 15 Syracuse, NY, trace metal measurement errors imply that changes in assay measurement error based on theoretical frequency distributions corrupt findings in subtle ways (e.g., average Factor-2 and Factor-3 loadings decrease, but not significantly), although they preserve simple structure as well as the dimensional clustering of heavy metals. Nevertheless, the percentage of variance accounted for by each dimension remains essentially the same as that for the observed data.
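The draw-then-aggregate step of this experiment can be sketched in Python as follows. This is an illustrative outline only: the function names, the `(tract_id, mu, sigma)` record layout, and the seed handling are hypothetical, and the log-scale parameters would in practice come from the fitted log-normal distributions for each metal.

```python
import random
from collections import defaultdict


def simulate_tract_means(samples, rng):
    """Draw one log-normal value per soil sample and average the draws by census tract.

    samples: iterable of (tract_id, mu, sigma), with mu and sigma on the log scale.
    rng: a random.Random instance, so that runs are reproducible.
    """
    draws = defaultdict(list)
    for tract, mu, sigma in samples:
        draws[tract].append(rng.lognormvariate(mu, sigma))
    return {tract: sum(vals) / len(vals) for tract, vals in draws.items()}


def replicate_tract_means(samples, n_reps=1000, seed=0):
    """Repeat the simulation n_reps times; each replicate is a {tract_id: mean} dict."""
    rng = random.Random(seed)
    samples = list(samples)
    return [simulate_tract_means(samples, rng) for _ in range(n_reps)]
```

Each replicate would then supply one set of census tract averages for a factor analysis, with the loadings averaged across replicates.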

6.2. Geographic Dimension Sensitivity to Sampling Error

This resampling experiment addressing sensitivity to soil sampling error employed a bootstrap census tract tessellation stratified random sampling (with replacement) design. Table 6 summarizes sampling error resampling findings for the data analytic factor structure of the 15 Syracuse trace metal measurement errors. Sampling error corrupts results in various ways. Four heavy metals exhibit a propensity to load onto two factors, compromising simple structure. Now Zn loads onto the first dimension (which arguably occurs for the collected data), and Rb fails to load onto any of the first four dimensions. Prominent loadings for the second and third dimensions markedly decrease (although not significantly). Meanwhile, as with the assaying simulation findings, the percentage of variance accounted for by each dimension remains essentially the same as that for the observed data.
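A census tract stratified bootstrap of this kind can be sketched in Python as follows. The sketch assumes that each tract's resample keeps that tract's original sample size and that the resampling error is summarized as the standard deviation of the bootstrap tract means; both choices, like the function names, are assumptions rather than details confirmed by the text.

```python
import random
import statistics
from collections import defaultdict


def stratified_bootstrap(records, rng):
    """Resample with replacement within each census tract (the strata).

    records: iterable of (tract_id, value).
    Returns {tract_id: resampled values}, each list the same size as the original.
    """
    by_tract = defaultdict(list)
    for tract, value in records:
        by_tract[tract].append(value)
    return {t: rng.choices(vals, k=len(vals)) for t, vals in by_tract.items()}


def bootstrap_standard_errors(records, n_reps=1000, seed=0):
    """Standard deviation of the bootstrap tract means, by tract (needs n_reps >= 2)."""
    rng = random.Random(seed)
    records = list(records)
    means = defaultdict(list)
    for _ in range(n_reps):
        for tract, vals in stratified_bootstrap(records, rng).items():
            means[tract].append(sum(vals) / len(vals))
    return {t: statistics.stdev(m) for t, m in means.items()}
```

A tract containing a single soil sample necessarily yields a resampling error of zero under this design, which illustrates why variation in tract sample sizes matters.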
A third experiment, addressing sensitivity to specification error, employed the gamma distribution (with $\mathrm{LN}\left[\frac{r_o - 3/8}{n - r_o + 5/8} + 0.25\right]$ as its covariate; representing minimal specification error) because it furnishes a competitive empirical distribution to the log-normal one. It also employed the beta distribution (with $\mathrm{LN}\left[\frac{r_o - 3/8}{n - r_o + 5/8}\right]$ as its covariate; representing noticeable specification error), which provides a conceptually appealing alternative, given that the trace metal quantities are proportions in ppm. It additionally employed the uniform distribution (with $p_i = \frac{r_i - 3/8}{n + 1/4}$ as its covariate; representing severe specification error), whose probabilities are much smaller for the concentration portion and much larger for the right-hand tail of the positively skewed empirical frequency distribution. Based upon order statistics combined with six-sigma theory (e.g., covariate values have a variable transformed beta distribution with variance $\frac{r_i (n - r_i + 1)}{(n + 2)(n + 1/4)^2}$ for the ith rank), sampling draws were from the uniform distribution over the interval
$$\left[\, p_i - 3\sqrt{\frac{p_i(1 - p_i) + \frac{33}{64\,(n + 1/4)^2}}{n + 2}},\;\; p_i + 3\sqrt{\frac{p_i(1 - p_i) + \frac{33}{64\,(n + 1/4)^2}}{n + 2}} \,\right],$$
where $p_i$ (which also is the covariate) denotes the empirical cumulative distribution function probability for soil sample i. In each case, 10,000 analytical assay measurement error samples were drawn. Table 7, Table 8 and Table 9 summarize averaged varimax-rotated factor analysis results from these experiments. Meanwhile, Figure 8 portrays selected extreme gamma and beta distributions embraced by these simulations.
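The rank-based covariates and the uniform sampling interval can be sketched in Python as follows. The sketch rests on one reading of the expressions above: the half-width is taken as three standard deviations (hence the square root), and the correction term is read as 33/[64(n + 1/4)²]. Both readings, and all function names, are assumptions. No clipping to [0, 1] is applied because the text does not mention any.

```python
import math
import random


def plotting_position(rank, n):
    """Empirical cumulative probability p = (r - 3/8) / (n + 1/4)."""
    return (rank - 0.375) / (n + 0.25)


def beta_covariate(rank, n):
    """LN[(r - 3/8) / (n - r + 5/8)], which equals the logit of the plotting position."""
    return math.log((rank - 0.375) / (n - rank + 0.625))


def gamma_covariate(rank, n):
    """LN[(r - 3/8) / (n - r + 5/8) + 0.25]."""
    return math.log((rank - 0.375) / (n - rank + 0.625) + 0.25)


def six_sigma_interval(rank, n):
    """Interval p +/- 3 standard deviations, under the reading described above."""
    p = plotting_position(rank, n)
    variance = (p * (1.0 - p) + 33.0 / (64.0 * (n + 0.25) ** 2)) / (n + 2)
    half_width = 3.0 * math.sqrt(variance)
    return p - half_width, p + half_width


def draw_uniform(rank, n, rng):
    """One uniform draw from the six-sigma interval for the given rank."""
    low, high = six_sigma_interval(rank, n)
    return rng.uniform(low, high)
```

Because 1 − p = (n − r + 5/8)/(n + 1/4), the beta covariate is simply log[p/(1 − p)], so the beta and uniform specifications use the same plotting position on different scales.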
Approximating the analytical assay measurement error that is a proportion restricted to the interval [0, 1] with a log-normal distribution whose support is the interval [0, ∞) appears to introduce little specification error, most likely because the skewness of heavy metal error tends to concentrate its values very close to zero. However, specification error introduced by replacing this log-normal with its gamma distribution competitor [45] corrupts a number of results (see Table 7): Mn loads onto the first rather than the second dimension; Pb and Zn fail to load onto any of the first four dimensions; all of the prominent loadings are lower, some substantially so, verging on statistical significance; and the percentage of variance accounted for decreases noticeably for the first three dimensions. Figure 8 portrays the scatterplot association between the log-normal and gamma individual heavy metal goodness-of-fit regression (pseudo-)R² values (also see Appendix E, Table A3); overall, the gamma assumption appears to be inferior to the log-normal assumption.
Meanwhile, although the beta distribution and analytical assay measurement error share the same support interval, the beta distribution’s performance is poorer than that produced by the gamma assumption (Table 5, Table 7 and Table 8). Here the first dimension gains Co and Mn and loses Hg and Sr. The second dimension resembles the original third dimension, whereas the third dimension lacks substance. Zn fails to load onto the fourth dimension; more specifically, Fe, Hg, Rb, Se, and Zr fail to load onto any of the first four dimensions. Again, all of the prominent loadings are lower, some substantially so, verging on statistical significance. Figure 9 also portrays the scatterplot association between the log-normal and beta individual heavy metal goodness-of-fit regression (pseudo-)R² values (see Appendix E, Table A3); overall, the beta assumption appears inferior to the gamma assumption, and markedly inferior to the log-normal assumption.
Figure 9 endorses the contention that the uniform distribution represents the most extreme specification error assumption studied here. Table 9 reveals that the simulation error is negligible because selection is from relatively narrow intervals (regardless of the use of the six-sigma principle). This assumption essentially preserves the multivariate statistical structure of Table 5 but with an overfitting cost (i.e., analyses over-emphasize assumption-specific information). The second and third dimensions switch (which is not surprising, given that the respective percentages of accounted variance are almost the same), and many of the loadings decrease somewhat. Nevertheless, a number of the bivariate regression R² values are only moderate in degree and hence relatively low (see Appendix E, Table A4), with corresponding coefficients that deviate considerably from an intercept of 0 and a slope of 1.

6.3. Selected Error Propagation Illustrations

This section reviews three instances of error propagation. The first, motivated by Koo et al. [46], assesses analytical measurement error, resampling error, and specification error on indices of spatial autocorrelation. The second, inspired by a widespread recognition of the existence of interacting error sources, evaluates a mixture of sampling and specification error latent in georeferenced data. The third illustrates variance inflation, a common geospatial data complication.

6.3.1. Error Propagation to Spatial Autocorrelation Indices

Table 10 summarizes Moran Coefficient (MC) and Geary Ratio (GR) spatial autocorrelation index results by data factor dimensions. With regard to spatial autocorrelation measures, based upon a root mean squared error criterion, the presumably correct statistical distributional assumption results most closely align with the original data results. Sampling error noticeably impacts these measures, with a tendency to decrease them. Specification error markedly impacts these measures, with the beta and uniform distribution assumptions corrupting them far more than the gamma assumption. The MC indicates that the beta assumption introduces more specification error, whereas the GR indicates that the uniform assumption does.
Meanwhile, the standard errors display considerably more of an impact than the spatial autocorrelation indices themselves. Interestingly, the resampling error does not tend to exhibit the closest alignment with the theoretical standard errors reported for the original data. Specification error introduced by the uniform distribution assumption deviates the most, which is rather obvious from a visual inspection of the table entries. The asymptotic standard error (i.e., 0.85) for the MC is consistent with these results; that for the GR (i.e., 0.214) is not.
The general implication here is that error propagation impacts spatial autocorrelation index values and their standard error estimates. Specification error may or may not be a more serious source of this corruption. This topic merits considerably more future research.
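For readers less familiar with the two indices, the following minimal sketch computes the MC and GR from their standard definitions for values on a small regular lattice. The rook-adjacency lattice, the binary weights, and the two test surfaces are hypothetical illustrations; they are not the Syracuse census tract geography or the factor scores of Table 10.

```python
def rook_weights(rows, cols):
    """Binary rook-adjacency weights matrix for a rows-by-cols lattice."""
    n = rows * cols
    w = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if r + 1 < rows:  # neighbor below
                j = (r + 1) * cols + c
                w[i][j] = w[j][i] = 1
            if c + 1 < cols:  # neighbor to the right
                j = i + 1
                w[i][j] = w[j][i] = 1
    return w


def moran_geary(values, w):
    """Return (MC, GR) for a list of values and a square weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in w)   # sum of all weights
    ss = sum(d * d for d in dev)      # sum of squared deviations
    cross = sum(w[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    sqdiff = sum(w[i][j] * (values[i] - values[j]) ** 2
                 for i in range(n) for j in range(n))
    mc = (n / s0) * cross / ss
    gr = ((n - 1) / (2 * s0)) * sqdiff / ss
    return mc, gr


w = rook_weights(4, 4)
# A smooth gradient: positive spatial autocorrelation (MC > 0, GR < 1).
gradient = [r + c for r in range(4) for c in range(4)]
mc_gradient, gr_gradient = moran_geary(gradient, w)
# A checkerboard: negative spatial autocorrelation (MC < 0, GR > 1).
checker = [(r + c) % 2 for r in range(4) for c in range(4)]
mc_checker, gr_checker = moran_geary(checker, w)
print(mc_gradient, gr_gradient, mc_checker, gr_checker)
```

Perturbing the input values with simulated measurement or sampling error and recomputing both indices is one simple way to trace the kind of propagation discussed above.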

6.3.2. Error Propagation from a Mixture of Error Sources

Another experiment, inspired by Gustavsson et al. [47], involved the following two-step procedure: draw a bootstrap census tract tessellation stratified random sample (with replacement) of 3322 log-normal fitted values, and then use these measurements to generate simulated (log-normal distribution assumption) assaying analytical errors. This sequence was executed 10,000 times. Table 11 summarizes output for this combined error source sensitivity analysis. As expected, deviations from the original data results increase, with sampling error markedly dominating analytical measurement error (also see Table 12).
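The two-step procedure can be sketched as follows, under stated assumptions: the strata are a handful of made-up tracts with made-up fitted values, the analytical error is simulated as multiplicative log-normal noise with a hypothetical log-scale standard deviation `sigma`, and the summary is simply the tract mean. None of these choices is taken from the paper beyond the ordering of the two steps.

```python
import random
from statistics import fmean


def stratified_bootstrap(strata, rng):
    """Step 1: resample with replacement within each stratum (tract),
    preserving every stratum's original sample size."""
    return {tract: [rng.choice(vals) for _ in vals]
            for tract, vals in strata.items()}


def add_lognormal_error(sample, sigma, rng):
    """Step 2: perturb each resampled value with multiplicative
    log-normal noise (an assumed error mechanism)."""
    return {tract: [v * rng.lognormvariate(0.0, sigma) for v in vals]
            for tract, vals in sample.items()}


def simulate(strata, sigma, replications, seed=0):
    """Repeat the two-step sequence and collect tract means per replicate."""
    rng = random.Random(seed)
    results = []
    for _ in range(replications):
        resampled = stratified_bootstrap(strata, rng)
        perturbed = add_lognormal_error(resampled, sigma, rng)
        results.append({t: fmean(v) for t, v in perturbed.items()})
    return results


# Hypothetical fitted values (ppm) for three hypothetical tracts.
strata = {
    "tract_A": [12.1, 15.4, 9.8, 20.3, 11.7],
    "tract_B": [30.2, 28.9, 41.5, 35.0],
    "tract_C": [5.5, 6.1, 4.9, 7.3, 5.8, 6.6],
}
results = simulate(strata, sigma=0.05, replications=100, seed=42)
print(results[0])
```

Running step 1 alone, step 2 alone, and both together on the same strata gives three sets of replicate summaries whose spreads can be compared, which is the logic of the sensitivity analysis reported in Table 11.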

6.3.3. Error and Spatial Autocorrelation Induced Variance Inflation

The final exploratory analysis concerns an additional evaluation of error, partially reminiscent of the arguments of Koo et al. [46]. A principal impact of positive spatial autocorrelation is variance inflation. Therefore, a conventional variance formula includes specification error, whereas adjusting for spatial autocorrelation renders an uninflated variance estimate; adjustment here is with the popular pure spatial simultaneous autoregressive (SAR) model specification. Table 12 shows the relevant results. The various pairs of data results demonstrate how variance inflation increases with increasing positive spatial autocorrelation; this inflation is rather modest for this example because the spatial autocorrelation parameter ρ is not very large (e.g., approximately 0.4), generating roughly a 20% increase in variance. Random sampling error impacts variance estimates more than the other reported statistical quantities; Table 12 suggests that it also might slightly dampen variance inflation vis-à-vis the other error sources. In contrast, random measurement error has similar but less pronounced impacts on the reported statistical quantities; it appears to have a rather neutral impact on variance inflation. When mixed, random sampling error impacts tend to dominate analytical measurement error impacts. To quantify this situation based on the Table 12 information, for Factor-1:
Total variance − analytical measurement error − sampling error: 1 − 0.002 − 0.122 = 0.876 (with rounding error vis-à-vis Table 12).
The corresponding mixture result confirms this outcome. Results for the other three factors are more ambiguous.
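The dependence of variance inflation on ρ can be illustrated with a minimal sketch for a pure SAR specification, whose covariance matrix is proportional to the inverse of (I − ρW)ᵀ(I − ρW). The sketch assumes a hypothetical ring of n areal units, each with two neighbors and row-standardized weights of 1/2; for this idealized geography the eigenvalues of W are cos(2πk/n), and the common diagonal entry of the inverse matrix equals the average of 1/(1 − ρλ)² over those eigenvalues. The ring is not the Syracuse census tract configuration, so its inflation factor need not match the roughly 20% figure quoted above.

```python
import math


def sar_variance_inflation_ring(rho, n=200):
    """Variance inflation factor for a pure SAR model on a ring of n units.

    W is row-standardized with two neighbors per unit (weights 1/2), so its
    eigenvalues are cos(2*pi*k/n) for k = 0, ..., n - 1. Because W is
    symmetric and circulant, every diagonal entry of
    [(I - rho W)^T (I - rho W)]^{-1} equals the mean of
    1 / (1 - rho * lambda_k)^2 over those eigenvalues.
    """
    total = 0.0
    for k in range(n):
        lam = math.cos(2.0 * math.pi * k / n)
        total += 1.0 / (1.0 - rho * lam) ** 2
    return total / n


for rho in (0.0, 0.2, 0.4, 0.6, 0.8):
    print(f"rho = {rho:.1f}: inflation factor = "
          f"{sar_variance_inflation_ring(rho):.3f}")
```

On this ring the factor equals 1 when ρ = 0 and grows monotonically with positive ρ, reaching about 1.30 at ρ = 0.4; the actual magnitude for any real tessellation depends on its own weights matrix.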
Because the gamma distribution often is an assumption competitor for the log-normal distribution, it furnishes the basis for a minimum specification error assessment here. Specification error impacts are detectable in Factor-1 and considerable in the other three factors. These impacts are at least as substantial as those for sampling error, being far more extreme for Factor-4. Accordingly, in terms of trace metal analytical error, the evidence presented in this paper supports the following rank ordering:
measurement < sampling < specification
This ordering merits further research.

7. Discussion

Table 13 furnishes a basic summary revealing prominent similarities and differences displayed by the presence of measurement, sampling, and specification error. The random nature of these error sources partially functions as noise, deteriorating latent patterns. Regardless, Factor-4 is a robust dimension, and Factor-1 is a reasonably robust dimension, with error-created confusion primarily materializing in the second and third factors. Sampling error tends to dominate analytical measurement error, with specification error introducing serious corruption. Meanwhile Rb results appear to be the most sensitive, of the 15 heavy metals studied here, to sources of error.

8. Conclusions and Implications

This paper investigates assaying errors in surface soil samples that were collected across the city of Syracuse, NY, mostly during the summers of 2003 and 2004. These samples are composed of a total of 3628 observations, with measurements for 15 heavy metals. The geographic distributions of the 15 heavy metals were examined at the individual point level and at the aggregate census tract levels. In addition, the data analytic dimensions of these heavy metals were examined using varimax-rotated factor analysis. The results show that the assaying errors do not present a high level of spatial autocorrelation. This result may indicate that the assay errors generally are independent across geographic locations. The factor analysis results imply that the geographic distributions of the 15 heavy metals display four prominent data analytic dimensions and that their patterns are associated with objects in the physical environment (e.g., expressways), historical settlement patterns in the city, and low housing/population densities. In addition, these assay error patterns are consistent with those of the corresponding observed values of the 15 heavy metals. Empirically, this outcome shows that the size of assay errors tends to be positively associated with the sizes of observed values.
This paper also investigates impacts of sampling error, measurement error, and their mixture on assaying error for the 15 heavy metals using explanatory simulation experiments. The results show that although both sampling error and measurement error have an impact on assaying errors, sampling error has a greater impact than measurement error. Furthermore, geographically, the measurement error is stable across the census tracts, whereas the sampling error varies considerably. These findings may provide useful insights for constructing a more rigorous sampling design as well as a measurement plan. Also, the results suggest that specification error would have a severe impact; however, this conjecture needs further investigation to better understand this impact in a general context. Finally, the results of the simulation experiments confirm a geographic resolution impact.
Because this paper investigates a specific dataset, the findings may not be directly applicable to other geographic landscapes. Nevertheless, this paper shows impacts of the different error sources on assaying error and posits a potential importance ranking of different error sources.

Author Contributions

The co-authors D.A.G. and Y.C. meet the authorship criteria and are in agreement about the submission of the manuscript. They collaboratively conceived the research idea, designed the methodology, and processed the data, and then executed different parts of the statistical analyses. Both contributed to the writing of this paper. Both authors have read and agreed to the published version of the manuscript.

Funding

Only the data collection for this research was supported by the United States National Science Foundation, grant BCS-0221949. No external funding supported the data analyses or manuscript preparation; any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the following three Institutional Review Boards (IRBs): The Syracuse University IRB (protocol number: 02030; initial approval date of 6 June 2002), the University of Miami IRB (protocol number: 03/703; initial approval date of 15 October 2003), and the UTD IRB (protocol number 05-35; initial approval date of 8 August 2005).

Informed Consent Statement

Informed consent forms were secured from the residential occupants under oversight by the Syracuse University and University of Miami IRBs.

Data Availability Statement

Restrictions apply to the availability of the soil sample data. These data, now housed at UTD, are available by request in census tract aggregated form, as per presiding agreements established by IRBs in coordination with the Onondaga County Health Department.

Acknowledgments

D.A.G. is an Ashbel Smith Professor of Geospatial Information Sciences. An abridged version of this paper was presented at the annual NARSC/Regional Science Association International meeting, held virtually on 10 November 2020.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Sample Analytical Assay Measurement Error Boxplots

Figure A1. Sample analytical assay measurement error boxplots.

Appendix B. Tabulation of Important Back-Transformation Statistics

Griffith [48] discusses back-transformations, specifically commenting about transformation-introduced specification error that can be indexed by the mean of the ratio $\hat{y}_i / y_i$ appearing in Table A1. All of the Syracuse heavy metals contain less than 1% specification error based upon this criterion. Furthermore, 11 of the heavy metals have a bivariate regression slope of approximately 1, its expectation for a perfect match between the two sets of values. In particular, Zn exhibits what appears to be excessive bivariate regression parameter values, implying that its log-normal characterization may contain excessive specification error.
Table A1. Selected descriptive statistics for calculating and assessing the theoretical error of the 15 trace metals measured via Syracuse soil sample assaying.
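Model: $LN(y_i + \delta) = \alpha + \beta\,(r_i - 3/8)/(n + 1/4) + \varepsilon_i,\quad i = 1, 2, \ldots, n$

| Trace Metal | $\hat{\alpha}$ | $\hat{\beta}$ | MSE | $\hat{y}_i / y_i$: Mean | $\hat{y}_i / y_i$: Standard Deviation |
|---|---|---|---|---|---|
| Arsenic (As) | −5.817 | 1.406 | 0.01274 | 1.006 | 0.065 |
| Chromium (Cr) | −2.294 | 1.022 | 0.00048 | 1.000 | 0.016 |
| Cobalt (Co) | −0.176 | 1.001 | 0.00017 | 1.000 | 0.020 |
| Copper (Cu) | −3.321 | 1.060 | 0.00153 | 1.000 | 0.020 |
| Iron (Fe) | −0.147 | 1.000 | 0.00013 | 1.000 | 0.021 |
| Lead (Pb) | −6.794 | 1.386 | 0.01214 | 1.006 | 0.063 |
| Manganese (Mn) | −0.936 | 1.007 | 0.00023 | 1.000 | 0.013 |
| Mercury (Hg) | −0.464 | 1.045 | 0.00109 | 1.000 | 0.017 |
| Molybdenum (Mo) | 0.001 | 1.000 | 0.00000 | 1.000 | 0.016 |
| Nickel (Ni) | −2.028 | 1.028 | 0.00060 | 1.000 | 0.016 |
| Rubidium (Rb) | −0.002 | 1.000 | 0.00002 | 1.000 | 0.012 |
| Selenium (Se) | −0.742 | 1.099 | 0.00285 | 1.001 | 0.024 |
| Strontium (Sr) | −1.547 | 1.189 | 0.00722 | 1.002 | 0.039 |
| Zinc (Zn) | −13.797 | 1.347 | 0.01302 | 1.002 | 0.052 |
| Zirconium (Zr) | 0.502 | 0.968 | 0.00046 | 1.000 | 0.024 |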
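The diagnostics tabulated in Table A1 can be mimicked with a minimal sketch, under several assumptions that are not taken from the paper: the quantile covariate is treated as the standard normal quantile evaluated at the plotting position (r_i − 3/8)/(n + 1/4), the translation parameter δ defaults to zero, the reported intercept and slope are read as those of a bivariate regression of the observed values on their back-transformed predictions, and the data are synthetic log-normal draws. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def ols(x, y):
    """Intercept and slope of the simple linear regression of y on x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def back_transform_check(y, delta=0.0):
    """Return (mean of y_hat / y, intercept, slope of y on y_hat)."""
    ys = sorted(y)
    n = len(ys)
    # Normal quantiles at the plotting positions (r - 3/8) / (n + 1/4).
    q = [NormalDist().inv_cdf((r - 0.375) / (n + 0.25))
         for r in range(1, n + 1)]
    # Regress the log-transformed ordered values on the quantiles.
    z = [math.log(v + delta) for v in ys]
    a, b = ols(q, z)
    # Back-transform the fitted values to the original measurement scale.
    y_hat = [math.exp(a + b * qi) - delta for qi in q]
    ratios = [yh / yo for yh, yo in zip(y_hat, ys)]
    intercept, slope = ols(y_hat, ys)
    return fmean(ratios), intercept, slope


# Synthetic stand-in for one metal's positive-valued measurements.
rng = random.Random(1)
sample = [rng.lognormvariate(3.0, 0.5) for _ in range(2000)]
mean_ratio, alpha, beta = back_transform_check(sample)
print(f"mean ratio = {mean_ratio:.3f}, "
      f"intercept = {alpha:.3f}, slope = {beta:.3f}")
```

When the log-normal characterization is adequate, the mean ratio should sit close to 1 and the slope close to 1; sizable departures, such as those noted above for Zn, flag potential specification error.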

Appendix C. A Cross-Tabulation of Heavy Metals and Their Potential Sources

This cross-tabulation reflects findings by Wuana et al. [49], Aschale et al. [50], and Smiljanić et al. [51].
Table A2. Cross-tabulation of selected trace metals and their potential yard landscape sources.
Heavy Metal | Fertilizers | Pesticides | Preservatives | Biosolids/Manures | Storm Water | Vehicles
Arsenic (As)XXXXX
Chromium (Cr) XXX X
Cobalt (Co)X X
Copper (Cu)XXX X
Iron (Fe) X
Lead (Pb)XXXXXX
Manganese (Mn)XX
Mercury (Hg)XX XX
Molybdenum (Mo) X X
Nickel (Ni)X X X
Rubidium (Rb)
Selenium (Se)X X
Strontium (Sr)
Zinc (Zn)XX X X
Zirconium (Zr) X

Appendix D. Classification Results of Assay Errors Using Theoretical Analytical Measurement Error and Resampling Error

Figure A2. Classification results of assay errors and the corresponding theoretical analytical measurement error for twelve selected heavy metals: (a) Cr, (b) Co, (c) Pb, (d) Mn, (e) Hg, (f) Mo, (g) Ni, (h) Rb, (i) Se, (j) Sr, (k) Zn, and (l) Zr.
Figure A3. Classification results of assay errors and the corresponding resampling error for twelve selected heavy metals: (a) Cr, (b) Co, (c) Pb, (d) Mn, (e) Hg, (f) Mo, (g) Ni, (h) Rb, (i) Se, (j) Sr, (k) Zn, and (l) Zr.

Appendix E. Random Variable Distributions Selected to Explore Specification Error

Table A3. Selected descriptive statistics for the theoretical analytical error of the 15 trace metals measured via Syracuse soil sample assaying (n = 3324 soil sample points).
Trace MetalGamma DistributionBeta Distribution (ppm)Uniform Distribution
α ^ (Shape) β ^
(Scale)
δ ^ Pseudo-R2 α ^ β ^ Pseudo-R2yminymaxR2
minmaxminmax
Arsenic (As)0.868.04.752−50.930.435.2282,8830.6651560.45
Chromium (Cr)37.8106.81.317−460.9585.0187.41,194,2070.98621750.82
Cobalt (Co)127.3230.11.318470.9139.9114.1492,6170.99682750.85
Copper (Cu)20.177.60.927−310.9788.1195.92,394,0690.93351020.76
Iron (Fe)171.9287.23.0912100.9126.276.2129,6650.991797320.85
Lead (Pb)0.773.66.155−60.910.334.3212,6880.6661970.45
Manganese (Mn)44.0116.51.792−410.9358.6144.9689,5160.99802290.86
Mercury (Hg)22.282.40.155−60.979.720.71,361,5550.956190.78
Molybdenum (Mo)393.2555.60.01520.850.81.5248,5680.98270.87
Nickel (Ni)38.0106.40.815−350.96113.4231.42,239,6310.97441130.81
Rubidium (Rb)336.2495.10.02750.9248.5137.617,260,2641.00290.91
Selenium (Se)18.066.10.151−40.958.118.51,635,1450.864180.66
Strontium (Sr)5.049.30.533−40.9914.073.23,905,6560.844230.66
Zinc (Zn)2.056.63.846−250.968.658.6568,8910.73251510.52
Zirconium (Zr)340.0490.10.090170.8851.2141.45,433,1300.996280.91
Table A4. Bivariate regression coefficients for response variable $y$ and covariate $\hat{y}$ (predicted with the quantile covariate) from Table A3.
Trace Metal | Gamma Distribution: a, b | Beta Distribution (ppm): a × 10^−8, b | Uniform Distribution (0, 1): a, b
Arsenic (As)3.081140.79557−10.71504.283.088130.15557
Chromium (Cr)3.531920.96686−3.01668.7176.777840.25547
Cobalt (Co)3.597910.97439−9.72909.0498.633000.24652
Copper (Cu)0.774000.98592−1.7881.5440.065200.22243
Iron (Fe)8.338630.97776−26.77855.31262.775820.24818
Lead (Pb)4.724820.74484−13.91930.213.247900.15801
Manganese (Mn)6.742990.95033−6.32429.5192.139160.28651
Mercury (Hg)0.296540.97152−0.2155.937.650600.21986
Molybdenum (Mo)0.108140.974800.057.403.140110.27041
Nickel (Ni)1.922460.97352−1.11026.1152.521460.25786
Rubidium (Rb)0.110930.97669−0.397.373.184730.29627
Selenium (Se)−0.290951.03881−0.3124.855.720630.15852
Strontium (Sr)0.163690.98023−1.4277.494.796230.25584
Zinc (Zn)1.905020.95328−9.31637.7422.632880.20818
Zirconium (Zr)0.456640.97102−1.0312.8211.152780.27411
NOTE: because its results are based on ppm, the beta distribution regression slope should be compared with 1000, whereas the other two slopes should be compared with 1; a and b in the column headings denote parameter coefficients for the intercept and covariate in the bivariate regression.

References

  1. Goodchild, M.; Gopal, S. (Eds.) Accuracy of Spatial Databases; CRC Press: Boca Raton, FL, USA, 1989.
  2. US EPA. GEO-EAS 1.2.1, Geostatistical Environmental Assessment Software User’s Guide; Publication No. EPA/600/4-88/033; Environmental Monitoring and Systems Laboratory: Las Vegas, NV, USA, 1991.
  3. Deutsch, C.; Journel, A. GSLIB: Geostatistical Software Library and User’s Guide; Oxford University Press: New York, NY, USA, 1992.
  4. Stehman, S.; Overton, S. Chapter 9: Environmental sampling and monitoring. In Handbook of Statistics, 1st ed.; Patil, G., Rao, C., Eds.; Elsevier: Amsterdam, The Netherlands, 1994; Volume 12, pp. 263–306.
  5. Strickland, M.; Siffel, C.; Gardner, B.; Berzen, A.; Correa, A. Quantifying geocode location error using GIS methods. Environ. Health 2007, 6, 10.
  6. Koo, H.; Chun, Y.; Griffith, D. Modeling positional uncertainty acquired through street geocoding. Int. J. Appl. Geospat. Res. 2018, 9, 1–22.
  7. Arbia, G.; Griffith, D.; Haining, R. Error propagation modelling in raster GIS: Overlay operations. Int. J. Geogr. Inf. Sci. 1998, 12, 145–167.
  8. Griffith, D.; Wong, D.; Chun, Y. Uncertainty related research issues in spatial analysis. In Uncertainty Modelling and Quality Control for Spatial Data; Shi, J., Wu, B., Stein, A., Eds.; Taylor & Francis Group/CRC Press: London, UK, 2015; pp. 3–11.
  9. Griffith, D.; Haining, R.; Arbia, G. Heterogeneity of attribute sampling error in spatial data sets. Geogr. Anal. 1994, 26, 300–320.
  10. Wang, J.; Haining, R.; Cao, Z. Sample surveying to estimate the mean of a heterogeneous surface: Reducing the error variance through zoning. Int. J. Geogr. Inf. Sci. 2010, 24, 523–543.
  11. Leung, Y.; Ma, J.; Goodchild, M. A general framework for error analysis in measurement-based GIS Part 1: The basic measurement-error model and related concepts. J. Geogr. Syst. 2004, 6, 325–354.
  12. Anselin, L.; Griffith, D.A. Do spatial effects really matter in regression analysis? Pap. Reg. Sci. 1988, 65, 11–34.
  13. Fingleton, B.; Le Gallo, J. Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties. Pap. Reg. Sci. 2008, 87, 319–339.
  14. Hu, L.; Chun, Y.; Griffith, D.A. Uncovering a positive and negative spatial autocorrelation mixture pattern: A spatial analysis of breast cancer incidences in Broward County, Florida, 2000–2010. J. Geogr. Syst. 2020, 22, 1–18.
  15. Luo, W.; Wang, T.; Lu, Y.; Giesy, J.; Shi, Y.; Zheng, Y.; Xing, Y.; Wu, G. Landscape ecology of the Guanting Reservoir, Beijing, China: Multivariate and geostatistical analyses of metals in soils. Environ. Pollut. 2007, 146, 567–576.
  16. Delbari, M.; Afrasiab, P.; Loiskandl, W. Geostatistical analysis of soil texture fractions on the field scale. Soil Water Res. 2011, 6, 173–189.
  17. Carrasco, P.C. Nugget effect, artificial or natural? J. S. Afr. Inst. Min. Metall. 2010, 110, 299–305.
  18. Leitner, M.; Buttenfield, B. Guidelines for the display of attribute certainty. Cartogr. Geogr. Inf. Sci. 2000, 27, 3–14.
  19. MacEachren, A.M.; Robinson, A.; Hopper, S.; Gardner, S.; Murray, R.; Gahegan, M.; Hetzler, E. Visualizing geospatial information uncertainty: What we know and what we need to know. Cartogr. Geogr. Inf. Sci. 2005, 32, 139–160.
  20. Koo, H.; Chun, Y.; Griffith, D. Geovisualizing attribute uncertainty of interval and ratio variables: A framework and an implementation for vector data. J. Vis. Lang. Comput. 2018, 44, 89–96.
  21. Sun, M.; Wong, D.; Kronenfeld, B. A classification method for choropleth maps incorporating data reliability information. Prof. Geogr. 2015, 67, 72–83.
  22. Koo, H.; Chun, Y.; Griffith, D. Optimal map classification incorporating uncertainty information. Ann. Am. Assoc. Geogr. 2017, 107, 575–590.
  23. Mu, W.; Tong, D. Mapping uncertain geographical attributes: Incorporating robustness into choropleth classification design. Int. J. Geogr. Inf. Sci. 2020, 34, 2204–2224.
  24. Griffith, D.A.; Chun, Y. Evaluating eigenvector spatial filter corrections for omitted georeferenced variables. Econometrics 2016, 4, 29.
  25. Wong, D. The modifiable areal unit problem (MAUP). In The SAGE Handbook of Spatial Analysis; Fotheringham, A.S., Rogerson, P.A., Eds.; Sage: London, UK, 2009; pp. 105–123.
  26. Lee, S.-I.; Lee, M.; Chun, Y.; Griffith, D.A. Uncertainty in the effects of the modifiable areal unit problem under different levels of spatial autocorrelation: A simulation study. Int. J. Geogr. Inf. Sci. 2019, 33, 1135–1154.
  27. Vodyanitskii, Y.N. Standards for the contents of heavy metals in soils of some states. Ann. Agrar. Sci. 2016, 14, 257–263.
  28. Shacklette, H.; Boerngen, J. Element Concentrations in Soils and Other Surficial Materials of the Conterminous United States; U.S. Geological Survey: Alexandria, VA, USA, 1984; p. 1270.
  29. Griffith, D.; Millones, M.; Vincent, M.; Johnson, D.; Hunt, A. Impacts of positional error on spatial regression analysis: A case study of address locations in Syracuse, NY. Trans. GIS 2007, 11, 655–679.
  30. Griffith, D.; Johnson, D.; Hunt, A. The geographic distribution of metals in urban soils: The case of Syracuse, NY. GeoJournal 2009, 74, 275–291.
  31. Griffith, D.; Chun, Y.; Lee, M. Locational error impacts on local spatial autocorrelation indices: A Syracuse soil sample Pb-level data case study. In Proceedings of Spatial Accuracy 2016; UMR 7300 ESPACE; Bailly, J.-S., Griffith, D., Josselin, D., Eds.; UMR: Avignon, France, 2016; pp. 136–143.
  32. Hunt, A.; Johnson, D.; Griffith, D.; Zitoon, S. Citywide distribution of lead and other elements in soils and indoor dusts in Syracuse, NY. Appl. Geochem. 2012, 27, 985–994.
  33. Lee, M.; Chun, Y.; Griffith, D. Uncertainties of spatial data analysis introduced by selected sources of error. In Advances in Geocomputation: Geocomputation 2015—The 13th International Conference; Griffith, D., Chun, Y., Dean, D., Eds.; Springer: Berlin, Germany, 2017; pp. 303–313.
  34. Lee, M.; Chun, Y.; Griffith, D. Error propagation in spatial modeling of public health data: A simulation approach using pediatric blood lead level data for Syracuse, New York. Environ. Geochem. Health 2018, 40, 667–681.
  35. McComb, J.; Rogers, C.; Han, F.; Tchounwou, P. Rapid screening of heavy metals and trace elements in environmental samples using portable X-ray fluorescence spectrometer, A comparative study. Water Air Soil Pollut. 2014, 225, 2169.
  36. Smith, D.; Cannon, W.; Woodruff, L.; Solano, F.; Kilburn, J.; Fey, D. Geochemical and Mineralogical Data for Soils of the Conterminous United States; Data Series 801; U.S. Geological Survey: Reston, VA, USA, 2013.
  37. Kabata-Pendias, A. Trace Elements in Soils and Plants; CRC Press: Boca Raton, FL, USA, 2011.
  38. Makishima, A.; Nakamura, E.; Nakano, T. Determination of zirconium, niobium, hafnium and tantalum at ng g-1 levels in geological materials by direct nebulization of sample HF solution into FI-ICP-MS. Geostand. Geoanal. Res. 2007, 23, 7–20.
  39. Crooks, J.L.; Whitsel, E.A.; Quibrera, P.M.; Catellier, D.J.; Laio, D.; Smith, R.L. The effect of ignoring interpolation error on the inferred relationship between ambient particulate matter exposure and median RR interval in post-menopausal women. Epidemiology 2008, 19, S127.
  40. Ott, W. A Physical explanation of the lognormality of pollutant concentrations. J. Air Waste Manag. Assoc. 1990, 40, 1378–1383.
  41. Zwolak, A.; Sarzyńska, M.; Szpyrka, E.; Stawarczyk, K. Sources of soil pollution by heavy metals and their accumulation in vegetables: A review. Water Air Soil Pollut. 2019, 230, 164.
  42. Xiao, N.; Calder, C.A.; Armstrong, M.P. Assessing the effect of attribute uncertainty on the robustness of choropleth map classification. Int. J. Geogr. Inf. Sci. 2007, 21, 121–144.
  43. Sun, M.; Wong, D.W. Incorporating data quality information in mapping American Community Survey data. Cartogr. Geogr. Inf. Sci. 2010, 37, 285–299.
  44. Koo, H.; Chun, Y.; Griffith, D. Integrating spatial data analysis functionalities in a GIS environment: Spatial Analysis using ArcGIS Engine and R (SAAR). Trans. GIS 2018, 22, 721–736.
  45. Kundu, D.; Manglick, A. Discriminating between the log-normal and gamma distributions. J. Appl. Stat. Sci. 2005, 14, 175–187.
  46. Koo, H.; Wong, D.; Chun, Y. Measuring Global Spatial Autocorrelation with Data Reliability Information. Prof. Geogr. 2019, 71, 551–565.
  47. Gustavsson, B.; Luthbom, K.; Lagerkvist, A. Comparison of analytical error and sampling error for contaminated soil. J. Hazard. Mater. 2006, B138, 252–260.
  48. Griffith, D. Better articulating normal curve theory for introductory mathematical statistics students: Power transformations and their back-transformations. Am. Stat. 2013, 67, 157–169.
  49. Wuana, R.; Okieimen, F. Heavy metals in contaminated soils: A review of sources, chemistry, risks and best available strategies for remediation. Int. Sch. Res. Not. 2011, 402647.
  50. Aschale, M.; Sileshi, Y.; Kelly-Quinn, M. Health risk assessment of potentially toxic elements via consumption of vegetables irrigated with polluted river water in Addis Ababa, Ethiopia. Environ. Syst. Res. 2019, 8, 29.
  51. Smiljanić, S.; Tomić, N.; Perušić, M.; Vasiljević, S.; Pelemiš, S. The main sources of heavy metals in the soil. In Proceedings of the VI International Congress on Engineering, Environment Materials in Processing Industry, Jahorina, Bosnia and Herzegovina, 11–13 March 2019; UDK 502.17:546.3. pp. 453–465.
Figure 1. Syracuse, NY. (a) The geographic distribution of trace metal soil samples overlaid with a polygon map of the 2000 United States census tract boundaries. (b) The distribution of number of soil samples with analytical assay error measurements. (c) The study area in the state of New York.
Figure 2. Tertile choropleth maps for the four aggregated trace metal error dimensions across Syracuse, NY; red denotes relatively high factor scores, yellow denotes intermediate factor scores, and green denotes relatively low factor scores. (a) Factor-1 (MC = 0.185, GR = 1.038); (b) Factor-2 (MC = 0.242, GR = 0.800); (c) Factor-3 (MC = 0.162, GR = 0.725); (d) Factor-4 (MC = 0.300, GR = 0.618).
Figure 3. Tertile choropleth maps portraying the 15 trace metal analytic measurement error geographic distributions (census tract resolution): (a) As, (b) Cr, (c) Co, (d) Cu, (e) Fe, (f) Pb, (g) Mn, (h) Hg, (i) Mo, (j) Ni, (k) Rb, (l) Se, (m) Sr, (n) Zn, and (o) Zr; red denotes relatively high errors, yellow denotes moderate errors, and green denotes relatively low errors.
Figure 4. Illustrative heavy metals classification results for assay error and their corresponding theoretical analytical measurement error: (a) As, (b) Cu, and (c) Fe.
Figure 5. Illustrative heavy metals classification results for assay error and their corresponding resampling error: (a) As, (b) Cu, and (c) Fe.
Figure 6. Choropleth maps with superimposed theoretical analytical measurement error magnitude proportional circles portraying each of the 15 trace metals (census tract resolution): (a) As, (b) Cr, (c) Co, (d) Cu, (e) Fe, (f) Pb, (g) Mn, (h) Hg, (i) Mo, (j) Ni, (k) Rb, (l) Se, (m) Sr, (n) Zn, and (o) Zr.
Figure 7. Choropleth maps with superimposed resampling error magnitude proportional circles portraying each of the 15 trace metals (census tract resolution): (a) As, (b) Cr, (c) Co, (d) Cu, (e) Fe, (f) Pb, (g) Mn, (h) Hg, (i) Mo, (j) Ni, (k) Rb, (l) Se, (m) Sr, (n) Zn, and (o) Zr.
Figure 8. Specimen gamma (left) and beta (right) distributions. (a) α = 0.065, 1/β = 62; (b) α = 0.489, 1/β = 1337; (c) α = 4.11, β = 212,688; (d) α = 163.19, β = 2,239,631.
Figure 9. Goodness-of-fit (pseudo-)R² scatterplots labelled by heavy metal (each label pertains to a horizontal row of three scatterplot points). Solid black circles denote gamma results, solid gray circles denote beta results, and the Hebrew letter tet pictograms (⨂) denote uniform results.
Table 1. Selected relevant Syracuse soil sample attributes for the 15 trace metals.
Trace MetalLimit of Detection;
McComb et al. [35]
US Natural Median Background Level;
Smith et al. [36]
World-Wide Average Contamination Level;
Kabata-Pendias [37]
Maximum Permissible Level (MPL); Vodyanitskii [27]The Number of Samples Exceeding the MPL (n = 3324)
Arsenic (As)0.615.26.834.53324
Chromium (Cr)14.063059.53.83324
Cobalt (Co)9.017.711.3243324
Copper (Cu)6.5214.438.93.53324
Iron (Fe)12.419,50022,979 ← δ0
Lead (Pb)0.718.1275568
Manganese (Mn)19.4492488200 (pH < 5.2) 8
Mercury (Hg)2.390.020.071.93324
Molybdenum (Mo)1.20.781.12530
Nickel (Ni)14.413.5292.63324
Rubidium (Rb)0.3565.268← δ0
Selenium (Se)0.730.20.440.113324
Strontium (Sr)0.33121175← δ0
Zinc (Zn)2.415870163324
Zirconium (Zr)0.96165 *267← δ0
NOTE: * Makishima et al. [38]; https://www.uaex.edu/publications/PDF/FSA-2118.pdf (accessed on 11 May 2021); imputed value; ← δ this arrow symbol points to the worldwide average contamination level to be used.
Table 2. Selected descriptive statistics for the theoretical analytical error of the 15 trace metals measured via Syracuse soil sample assaying (n = 3324 soil sample points).
The columns from δ̂ through pseudo-R² pertain to the three-parameter log-normal random variable.

| Trace Metal | Mean | Std. Dev. | ymin | ymax | δ̂ | % Outliers | K-S | A-D § | Pseudo-R² (Transformed) | Pseudo-R² (Back-Transformed) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 15.65 | 10.02 | 5.96 | 155.54 | −5.8 | 6.59 | 0.040 | 17.019 | 0.970 | 0.907 |
| Chromium (Cr) | 106.98 | 9.13 | 62.07 | 174.37 | −29.4 | 5.72 | 0.054 | 20.428 | 0.963 | 0.955 |
| Cobalt (Co) | 140.88 | 15.83 | 68.66 | 274.11 | 75.6 | 4.78 | 0.043 | 16.180 * | 0.969 | 0.962 |
| Copper (Cu) | 55.30 | 4.94 | 35.06 | 101.92 | −26.9 | 5.63 | 0.075 | 39.525 | 0.941 | 0.922 |
| Iron (Fe) | 375.82 | 42.74 | 179.71 | 731.27 | 283.6 | 4.84 | 0.043 | 16.449 * | 0.968 | 0.962 |
| Lead (Pb) | 19.31 | 12.81 | 6.93 | 196.38 | −6.8 | 8.09 | 0.036 | 14.383 | 0.973 | 0.915 |
| Manganese (Mn) | 136.31 | 13.19 | 80.00 | 228.34 | −16.7 | 5.17 | 0.037 | 9.527 | 0.980 | 0.973 |
| Mercury (Hg) | 10.46 | 0.86 | 6.75 | 18.81 | −5.0 | 4.96 | 0.070 | 32.511 | 0.952 | 0.940 |
| Molybdenum (Mo) | 4.29 | 0.31 | 2.40 | 6.14 | 138.2 | 5.35 | 0.059 | 18.985 | 0.969 | 0.969 |
| Nickel (Ni) | 72.86 | 5.64 | 44.80 | 112.93 | −24.7 | 5.78 | 0.072 | 31.913 | 0.953 | 0.949 |
| Rubidium (Rb) | 4.77 | 0.53 | 2.40 | 8.28 | 8.1 | 6.44 | 0.025 * | 3.318 | 0.990 | 0.990 |
| Selenium (Se) | 7.52 | 0.73 | 4.91 | 17.79 | −4.3 | 5.81 | 0.074 | 40.564 | 0.926 | 0.854 |
| Strontium (Sr) | 8.32 | 1.64 | 4.77 | 22.81 | −4.5 | 2.14 | 0.089 | 51.501 | 0.941 | 0.924 |
| Zinc (Zn) | 40.90 | 10.39 | 25.46 | 150.04 | −25.0 | 3.88 | 0.079 | 47.100 | 0.937 | 0.870 |
| Zirconium (Zr) | 15.78 | 1.70 | 6.66 | 27.13 | 0.0 | 6.14 | 0.053 * | 19.916 * | 0.963 | 0.975 |

NOTE: Appendix A displays sample analytical measurement error boxplots; K-S denotes Kolmogorov-Smirnov; § A-D denotes Anderson-Darling; * overall, a fitted log-normal corresponded more closely than a fitted gamma distribution (an asterisk denotes the exceptions to this finding); outlier identification resulted from robust estimation using an M-estimator coupled with Huber’s weight and scale.
Table 3. Spatial autocorrelation index values for log-measurement errors, LN(y + δ̂).

The effective range and cross-validation standardized RMSE columns pertain to the individual point data (ordinary kriging and K Bessel function); the MC and GR columns pertain to the census tract aggregated data.

| Trace Metal | Effective Range (Meters; Maximum = 14,800) | Cross-Validation Standardized RMSE | MC (MCmax = 1.018) | GR (GRmin = 0.051) |
| --- | --- | --- | --- | --- |
| Arsenic (As) | 7035 | 0.92 | 0.216 | 0.691 |
| Chromium (Cr) | 7066 | 1.011 | 0.11 | 1.076 |
| Cobalt (Co) | 7880 | 1.048 | 0.22 | 0.922 |
| Copper (Cu) | 6382 | 1.004 | 0.219 | 0.967 |
| Iron (Fe) | 7794 | 1.051 | 0.219 | 0.913 |
| Lead (Pb) | 38 | 0.893 | 0.215 | 0.692 |
| Manganese (Mn) | 28 | 0.938 | 0.086 | 1.067 |
| Mercury (Hg) | 5140 | 1.005 | 0.183 | 0.977 |
| Molybdenum (Mo) | 1713 | 1.033 | 0.218 | 0.752 |
| Nickel (Ni) | 6977 | 1.029 | 0.195 | 0.963 |
| Rubidium (Rb) | 809 | 1.015 | 0.423 | 0.607 |
| Selenium (Se) | 33 | 0.869 | 0.162 | 0.91 |
| Strontium (Sr) | 6465 | 1.116 | 0.207 | 0.851 |
| Zinc (Zn) | 38 | 0.818 | 0.244 | 0.745 |
| Zirconium (Zr) | 1430 | 1.056 | 0.349 | 0.558 |

NOTE: log-transformed values more closely conform to a bell-shaped curve but fail to markedly change the spatial autocorrelation index values; MC denotes Moran’s I (i.e., the Moran coefficient), and GR denotes Geary’s c (i.e., the Geary Ratio), which are the two most popular spatial autocorrelation indices.
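For readers unfamiliar with the two indices, the sketch below computes MC and GR from their standard textbook definitions, given attribute values and a spatial weights matrix. It is an illustration only: the function name `moran_geary`, the four-unit chain, and the binary adjacency weights are assumptions, and the weights specification behind the tabulated values is not stated in this section.

```python
def moran_geary(x, w):
    """Return (MC, GR) for values x and an n-by-n spatial weights matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    ss = sum(d * d for d in dev)            # sum of squared deviations
    s0 = sum(sum(row) for row in w)         # sum of all weights
    # Moran numerator: weighted cross-products of deviations from the mean.
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    # Geary numerator: weighted squared differences between paired values.
    sqdiff = sum(w[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n))
    mc = (n / s0) * cross / ss
    gr = ((n - 1) / (2.0 * s0)) * sqdiff / ss
    return mc, gr


# Hypothetical example: four areal units along a line, binary adjacency weights.
values = [1.0, 2.0, 3.0, 4.0]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
mc, gr = moran_geary(values, weights)
print(f"MC = {mc:.3f}, GR = {gr:.3f}")
```

Positive spatial autocorrelation pushes MC above its expected value and GR below 1, which matches the direction of most entries in the table.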
Table 4. Syracuse trace metal analytical error varimax-rotated factor dimensions.
| Trace Metal | Point Data: Factor-1 | Point Data: Factor-2 | Point Data: Factor-3 | Point Data: Factor-4 | Census Tract: Factor-1 | Census Tract: Factor-2 | Census Tract: Factor-3 | Census Tract: Factor-4 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.181 | 0.193 | 0.95 | 0.067 | 0.193 | 0.203 | 0.957 | 0.026 |
| Chromium (Cr) | 0.721 | 0.542 | 0.299 | 0.112 | 0.893 | 0.367 | 0.207 | 0.077 |
| Cobalt (Co) | 0.313 | 0.901 | 0.171 | 0.109 | 0.416 | 0.835 | 0.273 | 0.071 |
| Copper (Cu) | 0.859 | 0.313 | 0.266 | 0.035 | 0.96 | 0.157 | 0.178 | −0.027 |
| Iron (Fe) | 0.3 | 0.909 | 0.149 | 0.105 | 0.392 | 0.847 | 0.266 | 0.058 |
| Lead (Pb) | 0.179 | 0.19 | 0.951 | 0.069 | 0.188 | 0.199 | 0.959 | 0.028 |
| Manganese (Mn) | 0.346 | 0.801 | 0.232 | 0.15 | 0.52 | 0.694 | 0.304 | 0.125 |
| Mercury (Hg) | 0.822 | 0.329 | 0.379 | 0.098 | 0.944 | 0.192 | 0.205 | −0.010 |
| Molybdenum (Mo) | 0.137 | 0.259 | 0.079 | 0.938 | 0.191 | 0.119 | 0.085 | 0.958 |
| Nickel (Ni) | 0.828 | 0.416 | 0.264 | 0.085 | 0.942 | 0.244 | 0.165 | −0.013 |
| Rubidium (Rb) | −0.091 | 0.827 | 0.209 | 0.183 | −0.419 | 0.779 | 0.042 | −0.073 |
| Selenium (Se) | 0.674 | 0.307 | 0.602 | 0.114 | 0.852 | 0.202 | 0.441 | 0.009 |
| Strontium (Sr) | 0.875 | −0.144 | 0.043 | −0.110 | 0.892 | −0.286 | 0.1 | −0.073 |
| Zinc (Zn) | 0.386 | 0.211 | 0.811 | −0.015 | 0.64 | 0.212 | 0.702 | −0.053 |
| Zirconium (Zr) | −0.069 | 0.096 | 0.03 | 0.977 | −0.242 | −0.047 | −0.056 | 0.952 |
| % variance | 29.3 | 26.3 | 22.4 | 13.2 | 42.4 | 20.2 | 19.5 | 12.5 |
Table 5. Log-normal distribution simulated Syracuse trace metal analytical assay measurement error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.192 | 0.01 | 0.291 | 0.243 | 0.866 | 0.243 | 0.026 | 0.01 |
| Chromium (Cr) | 0.89 | 0.008 | 0.35 | 0.053 | 0.222 | 0.048 | 0.077 | 0.013 |
| Cobalt (Co) | 0.414 | 0.017 | 0.77 | 0.178 | 0.334 | 0.177 | 0.071 | 0.014 |
| Copper (Cu) | 0.956 | 0.005 | 0.16 | 0.017 | 0.173 | 0.017 | −0.027 | 0.014 |
| Iron (Fe) | 0.39 | 0.017 | 0.78 | 0.184 | 0.33 | 0.184 | 0.057 | 0.015 |
| Lead (Pb) | 0.187 | 0.01 | 0.288 | 0.244 | 0.867 | 0.245 | 0.028 | 0.009 |
| Manganese (Mn) | 0.518 | 0.017 | 0.65 | 0.123 | 0.345 | 0.121 | 0.124 | 0.015 |
| Mercury (Hg) | 0.94 | 0.005 | 0.194 | 0.016 | 0.201 | 0.016 | −0.01 | 0.014 |
| Molybdenum (Mo) | 0.19 | 0.015 | 0.115 | 0.017 | 0.088 | 0.016 | 0.955 | 0.004 |
| Nickel (Ni) | 0.938 | 0.005 | 0.236 | 0.03 | 0.172 | 0.025 | −0.013 | 0.014 |
| Rubidium (Rb) | −0.418 | 0.016 | 0.689 | 0.239 | 0.13 | 0.24 | −0.073 | 0.011 |
| Selenium (Se) | 0.846 | 0.013 | 0.23 | 0.081 | 0.409 | 0.082 | 0.009 | 0.018 |
| Strontium (Sr) | 0.888 | 0.011 | −0.238 | 0.126 | 0.052 | 0.13 | −0.072 | 0.016 |
| Zinc (Zn) | 0.635 | 0.018 | 0.271 | 0.163 | 0.639 | 0.162 | −0.053 | 0.017 |
| Zirconium (Zr) | −0.24 | 0.014 | −0.049 | 0.014 | −0.054 | 0.014 | 0.949 | 0.004 |
| % variance | 42.1 | | 20.1 | | 19.4 | | 12.4 | |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a simulated assaying data correlation coefficient.
Table 6. Resampled Syracuse trace metal analytical measurement error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.206 | 0.036 | 0.522 | 0.371 | 0.625 | 0.381 | 0.026 | 0.037 |
| Chromium (Cr) | 0.887 | 0.03 | 0.322 | 0.089 | 0.248 | 0.072 | 0.071 | 0.036 |
| Cobalt (Co) | 0.436 | 0.108 | 0.612 | 0.286 | 0.458 | 0.264 | 0.075 | 0.064 |
| Copper (Cu) | 0.953 | 0.026 | 0.182 | 0.056 | 0.149 | 0.071 | −0.036 | 0.043 |
| Iron (Fe) | 0.414 | 0.113 | 0.615 | 0.296 | 0.457 | 0.275 | 0.062 | 0.068 |
| Lead (Pb) | 0.201 | 0.036 | 0.521 | 0.374 | 0.625 | 0.384 | 0.029 | 0.036 |
| Manganese (Mn) | 0.521 | 0.086 | 0.55 | 0.204 | 0.415 | 0.183 | 0.123 | 0.069 |
| Mercury (Hg) | 0.934 | 0.026 | 0.217 | 0.053 | 0.186 | 0.07 | −0.008 | 0.041 |
| Molybdenum (Mo) | 0.181 | 0.062 | 0.119 | 0.059 | 0.091 | 0.096 | 0.944 | 0.09 |
| Nickel (Ni) | 0.934 | 0.025 | 0.229 | 0.061 | 0.18 | 0.058 | −0.008 | 0.036 |
| Rubidium (Rb) | −0.346 | 0.108 | 0.461 | 0.358 | 0.376 | 0.383 | −0.013 | 0.105 |
| Selenium (Se) | 0.838 | 0.033 | 0.318 | 0.133 | 0.324 | 0.159 | 0.018 | 0.034 |
| Strontium (Sr) | 0.867 | 0.064 | −0.117 | 0.198 | −0.073 | 0.219 | −0.096 | 0.069 |
| Zinc (Zn) | 0.618 | 0.048 | 0.429 | 0.253 | 0.473 | 0.277 | −0.058 | 0.043 |
| Zirconium (Zr) | −0.224 | 0.049 | −0.040 | 0.042 | −0.030 | 0.101 | 0.944 | 0.087 |
| % variance | 41.7 | | 20.8 | | 18.8 | | 12.6 | |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a resample correlation coefficient.
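The r̄ and s_r columns summarize a correlation coefficient across replications. A minimal sketch of that kind of summary, using a simple case-resampling bootstrap of a Pearson correlation, follows. The resampling design here (drawing observation pairs with replacement) is an assumption made for illustration and is not confirmed by this section; the names `pearson_r` and `bootstrap_r` and the data are hypothetical.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation; returns None when either sample has zero variance."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def bootstrap_r(x, y, replications=1000, seed=0):
    """Return (mean, standard deviation) of r over bootstrap resamples of the pairs."""
    rng = random.Random(seed)
    n = len(x)
    rs = []
    while len(rs) < replications:
        # Draw n observation indices with replacement, keeping (x, y) pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson_r([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with no variation
            rs.append(r)
    return statistics.fmean(rs), statistics.stdev(rs)


# Hypothetical data: a linear trend with an alternating perturbation.
x = [float(i) for i in range(20)]
y = [xi + (2.0 if i % 2 == 0 else -2.0) for i, xi in enumerate(x)]
r_bar, s_r = bootstrap_r(x, y)
print(f"r_bar = {r_bar:.3f}, s_r = {s_r:.3f}")
```

A small s_r relative to r̄ indicates a loading that is stable across replications, which is how the larger s_r values for Factor-2 and Factor-3 in the table can be read.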
Table 7. Syracuse trace metal induced gamma distributed measurement error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.079 | 0.095 | 0.38 | 0.319 | 0.505 | 0.366 | 0.108 | 0.197 |
| Chromium (Cr) | 0.846 | 0.066 | 0.273 | 0.151 | 0.128 | 0.098 | 0.069 | 0.08 |
| Cobalt (Co) | 0.5 | 0.171 | 0.569 | 0.282 | 0.24 | 0.198 | 0.104 | 0.148 |
| Copper (Cu) | 0.884 | 0.072 | 0.162 | 0.163 | 0.056 | 0.114 | 0 | 0.087 |
| Iron (Fe) | 0.483 | 0.175 | 0.576 | 0.289 | 0.233 | 0.209 | 0.089 | 0.154 |
| Lead (Pb) | 0.271 | 0.075 | 0.396 | 0.292 | 0.478 | 0.396 | 0.033 | 0.231 |
| Manganese (Mn) | 0.518 | 0.15 | 0.499 | 0.207 | 0.257 | 0.145 | 0.162 | 0.131 |
| Mercury (Hg) | 0.859 | 0.07 | 0.187 | 0.152 | 0.1 | 0.133 | 0.005 | 0.097 |
| Molybdenum (Mo) | 0.177 | 0.081 | 0.105 | 0.137 | 0.215 | 0.351 | 0.686 | 0.36 |
| Nickel (Ni) | 0.867 | 0.066 | 0.196 | 0.157 | 0.064 | 0.107 | −0.012 | 0.084 |
| Rubidium (Rb) | −0.270 | 0.186 | 0.418 | 0.416 | 0.151 | 0.351 | −0.047 | 0.294 |
| Selenium (Se) | 0.77 | 0.081 | 0.256 | 0.154 | 0.218 | 0.209 | 0.017 | 0.132 |
| Strontium (Sr) | 0.685 | 0.161 | −0.154 | 0.25 | −0.173 | 0.168 | −0.180 | 0.13 |
| Zinc (Zn) | 0.043 | 0.043 | 0.146 | 0.245 | 0.296 | 0.296 | 0.084 | 0.294 |
| Zirconium (Zr) | −0.229 | 0.075 | −0.080 | 0.139 | 0.13 | 0.378 | 0.654 | 0.37 |
| % variance | 34.9 | | 16.9 | | 12.9 | | 11.1 | |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a combined simulated assaying and resampling error correlation coefficient.
Table 8. Syracuse trace metal induced beta distributed measurement error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.211 | 0.149 | 0.636 | 0.355 | 0.353 | 0.381 | 0.007 | 0.125 |
| Chromium (Cr) | 0.75 | 0.117 | 0.33 | 0.161 | 0.212 | 0.176 | 0.014 | 0.18 |
| Cobalt (Co) | 0.498 | 0.242 | 0.405 | 0.257 | 0.373 | 0.304 | 0.04 | 0.188 |
| Copper (Cu) | 0.752 | 0.149 | 0.325 | 0.168 | 0.161 | 0.241 | −0.046 | 0.209 |
| Iron (Fe) | 0.474 | 0.253 | 0.388 | 0.267 | 0.366 | 0.312 | 0.029 | 0.199 |
| Lead (Pb) | 0.206 | 0.151 | 0.634 | 0.359 | 0.352 | 0.383 | 0.009 | 0.125 |
| Manganese (Mn) | 0.557 | 0.202 | 0.372 | 0.225 | 0.303 | 0.234 | 0.058 | 0.188 |
| Mercury (Hg) | 0.489 | 0.238 | 0.17 | 0.228 | 0.114 | 0.234 | 0.03 | 0.364 |
| Molybdenum (Mo) | 0.031 | 0.189 | 0.021 | 0.214 | 0.016 | 0.271 | 0.494 | 0.457 |
| Nickel (Ni) | 0.755 | 0.137 | 0.306 | 0.17 | 0.174 | 0.205 | −0.028 | 0.203 |
| Rubidium (Rb) | −0.099 | 0.305 | 0.09 | 0.374 | 0.292 | 0.55 | 0.025 | 0.292 |
| Selenium (Se) | 0.433 | 0.225 | 0.258 | 0.221 | 0.141 | 0.238 | 0.038 | 0.368 |
| Strontium (Sr) | 0.601 | 0.273 | 0.223 | 0.278 | −0.009 | 0.44 | −0.039 | 0.235 |
| Zinc (Zn) | 0.492 | 0.117 | 0.57 | 0.221 | 0.29 | 0.305 | −0.046 | 0.187 |
| Zirconium (Zr) | −0.164 | 0.168 | −0.092 | 0.173 | −0.045 | 0.217 | 0.359 | 0.485 |
| % variance | 28.3 | | 20.2 | | 16.1 | | 10.2 | |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a combined simulated assaying and resampling error correlation coefficient.
Table 9. Syracuse trace metal induced uniform distributed measurement error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.274 | 6.0 × 10^−5 | 0.949 | 2.0 × 10^−5 | 0.135 | 7.0 × 10^−5 | −0.032 | 7.0 × 10^−5 |
| Chromium (Cr) | 0.888 | 8.0 × 10^−5 | 0.197 | 1.4 × 10^−4 | 0.368 | 1.8 × 10^−4 | 0.006 | 1.5 × 10^−4 |
| Cobalt (Co) | 0.433 | 1.5 × 10^−4 | 0.302 | 1.1 × 10^−4 | 0.82 | 1.0 × 10^−4 | 0.049 | 1.3 × 10^−4 |
| Copper (Cu) | 0.924 | 5.0 × 10^−5 | 0.283 | 1.2 × 10^−4 | 0.149 | 1.4 × 10^−4 | −0.148 | 1.3 × 10^−4 |
| Iron (Fe) | 0.412 | 1.5 × 10^−4 | 0.293 | 1.2 × 10^−4 | 0.83 | 9.0 × 10^−5 | 0.035 | 1.3 × 10^−4 |
| Lead (Pb) | 0.271 | 6.0 × 10^−5 | 0.95 | 2.0 × 10^−5 | 0.13 | 7.0 × 10^−5 | −0.026 | 7.0 × 10^−5 |
| Manganese (Mn) | 0.614 | 1.9 × 10^−4 | 0.185 | 1.7 × 10^−4 | 0.65 | 1.8 × 10^−4 | 0.124 | 1.9 × 10^−4 |
| Mercury (Hg) | 0.927 | 5.0 × 10^−5 | 0.259 | 1.3 × 10^−4 | 0.144 | 1.4 × 10^−4 | −0.119 | 1.4 × 10^−4 |
| Molybdenum (Mo) | 0.066 | 1.2 × 10^−4 | −0.01 | 1.1 × 10^−4 | 0.051 | 1.2 × 10^−4 | 0.978 | 3.0 × 10^−5 |
| Nickel (Ni) | 0.926 | 6.0 × 10^−5 | 0.19 | 1.4 × 10^−4 | 0.252 | 1.6 × 10^−4 | −0.106 | 1.5 × 10^−4 |
| Rubidium (Rb) | −0.268 | 1.3 × 10^−4 | −0.105 | 1.0 × 10^−4 | 0.863 | 7.0 × 10^−5 | −0.067 | 1.2 × 10^−4 |
| Selenium (Se) | 0.87 | 7.0 × 10^−5 | 0.422 | 1.2 × 10^−4 | 0.138 | 1.3 × 10^−4 | −0.102 | 1.2 × 10^−4 |
| Strontium (Sr) | 0.874 | 7.0 × 10^−5 | 0.272 | 1.1 × 10^−4 | −0.304 | 1.7 × 10^−4 | −0.029 | 1.2 × 10^−4 |
| Zinc (Zn) | 0.611 | 1.0 × 10^−4 | 0.744 | 8.0 × 10^−5 | 0.116 | 1.1 × 10^−4 | −0.166 | 1.0 × 10^−4 |
| Zirconium (Zr) | −0.335 | 1.1 × 10^−4 | −0.11 | 1.0 × 10^−4 | −0.023 | 1.0 × 10^−4 | 0.916 | 5.0 × 10^−5 |
| % variance | 42.2 | | 20.4 | | 19.6 | | 12.7 | |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a combined simulated assaying and resampling error correlation coefficient.
Table 10. Factor score spatial autocorrelation index values for the various datasets (Table 5, Table 6, Table 7, Table 8 and Table 9).
| Error Source | Factor-1 MC | Factor-1 GR | Factor-2 MC | Factor-2 GR | Factor-3 MC | Factor-3 GR | Factor-4 MC | Factor-4 GR |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original data | 0.184 (0.078) | 1.039 (0.129) | 0.242 (0.081) | 0.8 (0.098) | 0.162 (0.081) | 0.724 (0.097) | 0.3 (0.081) | 0.617 (0.098) |
| Analytical (log-normal assumption) | 0.184 (0.01) | 1.038 (0.013) | 0.231 (0.026) | 0.793 (0.028) | 0.172 (0.028) | 0.733 (0.024) | 0.297 (0.011) | 0.62 (0.012) |
| Resampling | 0.153 (0.06) | 1.048 (0.086) | 0.176 (0.058) | 0.807 (0.072) | 0.18 (0.07) | 0.78 (0.064) | 0.235 (0.059) | 0.696 (0.064) |
| Gamma assumption | 0.195 (0.05) | 1.04 (0.074) | 0.205 (0.074) | 0.808 (0.073) | 0.165 (0.078) | 0.779 (0.063) | 0.202 (0.077) | 0.766 (0.064) |
| Beta assumption | 0.132 (0.085) | 0.989 (0.122) | 0.189 (0.062) | 0.74 (0.131) | 0.218 (0.089) | 0.746 (0.107) | 0.088 (0.12) | 0.882 (0.135) |
| Uniform assumption | 0.178 (<0.001) | 0.899 (<0.001) | 0.265 (<0.001) | 0.491 (<0.001) | 0.317 (<0.001) | 0.718 (<0.001) | 0.289 (<0.001) | 0.648 (<0.001) |
| Mixture | 0.152 (0.062) | 1.048 (0.088) | 0.174 (0.058) | 0.806 (0.074) | 0.182 (0.062) | 0.78 (0.064) | 0.235 (0.059) | 0.698 (0.063) |

NOTE: standard errors appear in parentheses; the original data standard errors assume randomization (normality respectively renders standard errors of 0.081 and 0.099).
Table 11. Combined log-normal simulation and resampled Syracuse trace metal analytical error varimax-rotated factor dimensions: census tract averaged soil sample data (10,000 replications).
| Trace Metal | Factor-1 r̄ | Factor-1 s_r | Factor-2 r̄ | Factor-2 s_r | Factor-3 r̄ | Factor-3 s_r | Factor-4 r̄ | Factor-4 s_r |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Arsenic (As) | 0.205 | 0.038 | 0.539 | 0.373 | 0.606 | 0.384 | 0.026 | 0.038 |
| Chromium (Cr) | 0.884 | 0.033 | 0.319 | 0.093 | 0.248 | 0.077 | 0.071 | 0.038 |
| Cobalt (Co) | 0.439 | 0.115 | 0.596 | 0.289 | 0.466 | 0.267 | 0.076 | 0.068 |
| Copper (Cu) | 0.948 | 0.03 | 0.185 | 0.062 | 0.145 | 0.077 | −0.036 | 0.045 |
| Iron (Fe) | 0.417 | 0.12 | 0.598 | 0.299 | 0.465 | 0.279 | 0.062 | 0.073 |
| Lead (Pb) | 0.201 | 0.038 | 0.537 | 0.376 | 0.605 | 0.387 | 0.029 | 0.037 |
| Manganese (Mn) | 0.523 | 0.091 | 0.539 | 0.207 | 0.419 | 0.186 | 0.122 | 0.072 |
| Mercury (Hg) | 0.93 | 0.03 | 0.22 | 0.059 | 0.183 | 0.077 | −0.007 | 0.043 |
| Molybdenum (Mo) | 0.181 | 0.063 | 0.117 | 0.059 | 0.093 | 0.103 | 0.941 | 0.097 |
| Nickel (Ni) | 0.93 | 0.029 | 0.229 | 0.066 | 0.179 | 0.065 | −0.009 | 0.038 |

NOTE: bold font denotes a prominent factor loading (see Table 4); s_r denotes the standard deviation of a combined simulated assaying and resampling error correlation coefficient.
Table 12. Factor variance estimates based upon a spatial simultaneous autoregressive model specification and census tract aggregated heavy metal results.
| Dataset | Parameter | Factor-1 | Factor-2 | Factor-3 | Factor-4 |
| --- | --- | --- | --- | --- | --- |
| Original (Table 4) | σ̂² | 1 | 1 | 1 | 1 |
| | ρ̂ | 0.264 | 0.435 | 0.527 | 0.484 |
| | σ̂² (adjusted) | 0.928 | 0.832 | 0.787 | 0.785 |
| | Pr(S-W) | <0.0001 | 0.575 | 0.961 | 0.444 |
| Analytical measurement error (Table 5) | σ̂² | 0.998 | 0.786 | 0.783 | 0.993 |
| | ρ̂ | 0.264 | 0.442 | 0.532 | 0.484 |
| | σ̂² (adjusted) | 0.926 | 0.649 | 0.611 | 0.781 |
| | Pr(S-W) | <0.0001 | 0.607 | 0.812 | 0.443 |
| Sampling error (Table 6) | σ̂² | 0.878 | 0.439 | 0.415 | 0.777 |
| | ρ̂ | 0.247 | 0.482 | 0.532 | 0.484 |
| | σ̂² (adjusted) | 0.821 | 0.351 | 0.319 | 0.611 |
| | Pr(S-W) | <0.0001 | 0.893 | 0.604 | 0.455 |
| Mixture (Table 11) | σ̂² | 0.875 | 0.435 | 0.413 | 0.774 |
| | ρ̂ | 0.246 | 0.486 | 0.529 | 0.487 |
| | σ̂² (adjusted) | 0.818 | 0.356 | 0.319 | 0.606 |
| | Pr(S-W) | <0.0001 | 0.896 | 0.594 | 0.451 |
| Specification (Table 7) | σ̂² | 0.889 | 0.456 | 0.343 | 0.488 |
| | ρ̂ | 0.282 | 0.460 | 0.426 | 0.435 |
| | σ̂² (adjusted) | 0.816 | 0.365 | 0.291 | 0.403 |
| | Pr(S-W) | <0.0001 | 0.856 | 0.133 | 0.985 |
Table 13. Syracuse trace metal analytical error varimax-rotated factor dimension comparisons (Table 4, Table 5, Table 6, Table 7, Table 8 and Table 9 and Table 11).
Trace MetalOriginal DataCensus Tract Averaged Simulated Data
PointsCensus TractsSampling ErrorMeasurement ErrorSpecification ErrorMixture
GammaBetaUniform
Arsenic (As)33333223
Chromium (Cr)11111111
Cobalt (Co)22222132
Copper (Cu)11111111
Iron (Fe)2222232
Lead (Pb)3333223
Manganese (Mn)2222113
Mercury (Hg)1111111
Molybdenum (Mo)44444444
Nickel (Ni)11111111
Rubidium (Rb)2223
Selenium (Se)1111111
Strontium (Sr)11111111
Zinc (Zn)3331221
Zirconium (Zr)4444444
Threshold | r ¯ | 0.6740.6940.6390.6120.5050.4940.7440.596
% variance91.293.69493.975.874.894.993.3
NOTE: — denotes no factor loading (i.e., correlation) greater than the stipulated threshold value.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Griffith, D.A.; Chun, Y. Soil Sample Assay Uncertainty and the Geographic Distribution of Contaminants: Error Impacts on Syracuse Trace Metal Soil Loading Analysis Results. Int. J. Environ. Res. Public Health 2021, 18, 5164. https://doi.org/10.3390/ijerph18105164
