Next Article in Journal
Effect of the Forest-Mine Boundary Form on Woody Colonization and Forest Expansion in Degraded Ecosystems
Previous Article in Journal
Effects of Several Chemicals on the Migration Behavior of Bursaphelenchus xylophilus (Steiner & Buhrer) Nickle
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Comparison of Variance Estimators for Systematic Environmental Sample Surveys: Considerations for Post-Stratified Estimation

1
Department of Forest Engineering, Resources and Management, Oregon State University, Corvallis, OR 97331, USA
2
United States Department of Agriculture Forest Service, Pacific Northwest Research Station, Corvallis, OR 97331, USA
*
Author to whom correspondence should be addressed.
Forests 2021, 12(6), 772; https://doi.org/10.3390/f12060772
Submission received: 17 March 2021 / Revised: 1 June 2021 / Accepted: 8 June 2021 / Published: 11 June 2021
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

:
The estimation of the sampling variance of point estimators under two-dimensional systematic sampling designs remains a challenge, and several alternative variance estimators have been proposed in the past few decades. In this work, we compared six alternative variance estimators under Horvitz-Thompson (HT) and post-stratification (PS) point estimation regimes. We subsampled a multitude of species-specific forest attributes from a large, spatially balanced national forest inventory to compare the variance estimators. A variance estimator that assumes a simple random sampling design exhibited positive relative bias under both HT and PS point estimation regimes ranging between 1.23 to 1.88 and 1.11 to 1.78 for HT and PS, respectively. Alternative estimators reduced this positive bias with relative biases ranging between 1.01 to 1.66 and 0.90 to 1.64 for HT and PS, respectively. The alternative estimators generally obtained improved efficiencies under both HT and PS, with relative efficiency values ranging between 0.68 to 1.28 and 0.68 to 1.39, respectively. We identified two estimators as promising alternatives that provide clear improvements over the simple random sampling estimator for a wide variety of attributes and under HT and PS estimation regimes.

1. Introduction

Frequently, environmental sample surveys utilize systematic or quasi-systematic sampling designs to estimate parameters of interest. The network of permanent sample plots that is part of the United States Forest Inventory and Analysis (FIA) program is one such case [1]. The FIA program implements a complex survey that provides estimates of multiple forest attributes across a wide range of spatial scales. These estimates are used in a variety of applications, including forest health monitoring, reporting on the current status and change of forested environments, and landscape-level forest management planning. Additionally, the FIA program serves a broad base of end-users, including members of the public and federal and state forest management programs, among many others.
The FIA program implements a spatially balanced, quasi-systematic sampling design [1]. While these sampling designs have been shown to provide more precise estimates than simple random sampling designs in many contexts, estimating the precision of these estimators remains a challenge. Design-based estimators of the sampling variance are either intractable, as in the case of strictly systematic designs ([2], p. 83), or very unstable due to their very small pairwise inclusion probabilities, as in the case of quasi-systematic sampling designs [3]. Typically, variance estimators that assume the data were collected with a simple random sampling design are employed ([4], p. 54), obscuring important efficiency gains resulting from these designs. This practice is often justified because it is a conservative estimator, or because it would be appropriate if the population units were in random order ([1], pp. 25–26). However, many studies have demonstrated that such estimators may significantly overestimate the true variance [5,6]. Biased estimators of sampling variance can result in a misallocation of resources, as a larger sample size than is necessary to meet some minimum desired precision may be suggested using an inflated variance estimator. Confidence intervals produced using these estimators will have coverage probabilities that likely exceed the nominal rate.
Many studies have examined variance estimation for systematic sampling in the context of environmental sample surveys. These studies typically address the problem of variance estimation under the Horvitz-Thompson (HT) point estimator. Generally, these studies conclude that a variance estimator that assumes simple random sampling, i.e., the “standard” variance estimator, will tend to over-estimate the sampling variance, while proposing or comparing alternative estimators that, to some degree, have less positive bias than the standard estimator [5,6,7,8]. As a result of these studies, several alternative estimators have arisen as promising reliable replacements for the standard estimator.
However, many environmental sample surveys rely on a post-stratification (PS) point estimator instead of the HT estimator. PS estimators can provide valuable increases in efficiency with respect to HT estimation [9], but they may also affect the relative performance of the standard estimator of the variance. The assumption that the population units are in random order, which is often cited to justify the standard estimator of the variance ([1], pp. 25–26; [10], p. 212), does not hold if there is a global trend in the population. In this situation, the performance of the standard estimator of the variance degrades rapidly, with the overestimation of the design variance increasing as the strength of the trend increases [6]. In fact, the sampling variability in a systematic design arises from the variability in local neighborhoods, an observation that has led to the development of several alternative estimators [3,11,12,13]. Post-stratification, which allows for different means in each stratum, may reduce the effect of a global trend in the standard estimator of the variance and improve its performance. However, to our knowledge, very little attention has been paid to the assessment of systematic variance estimators under PS estimation.
The task of variance estimation expands in complexity when analysts are faced with a “generic inference” survey. Generic inference surveys, such as the FIA, usually rely on a single post-stratification to estimate many attributes. The post-stratification is constructed to be most effective for a small set of variables, typically total volume or biomass. Thus, it may not be optimal for most variables and, as it may not remove the trend very effectively, it may not reduce the design bias of the standard estimator of the variance. Therefore, alternative estimators of the variance should be stable for a wide array of conditions, i.e., in situations where substantial trend, spatial correlation in the residuals, or both, exist. In support of this objective, we used a spatially balanced sample from the FIA program that provides a large swath of potential spatial domains and attributes with varying trends and spatial correlation structures upon which we compare and recommend potential alternative estimators.
The objective of this study was to examine six systematic variance estimators proposed in the sampling literature using a spatially balanced forest survey sample from the state of Oregon, USA. In this paper, we consider variance estimation for the PS and HT point estimators for a wide variety of forest attributes attained from a spatially balanced sample from the FIA program. We consider two specific cases. First, the estimators will be examined in a large domain setting, considering the mean basal areas, densities and volumes of 18 different tree species as attributes of interest. Second, the estimators will be examined across a number of smaller spatial domains within this large domain for the same set of forest attributes. This will provide an assessment of the proposed variance estimators in a generic inference setting, and recommendations for alternative estimators will follow from these results.

2. Materials and Methods

2.1. Study Area

The study area is the state of Oregon, located in the northwestern United States. Oregon contains a multitude of land cover types, including densely populated urban areas, agricultural land, forest land, deserts, and alpine areas. A recent study indicated that 48.4% of the Oregon land base is forested, of which 79.8% is timberland, i.e., capable of producing at least 1.4 m3 per hectare per year of industrial wood, and not withdrawn from utilization for timber production by regulation [14]. The remaining share are reserved forests, such as wilderness areas, or land incapable of producing least 1.4 m3 per hectare per year. Oregon has a diverse set of tree species, with the areas west of the Cascade mountain range dominated by Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco), western hemlock (Tsuga heterophylla (Raf.) Sarg), and Sitka spruce (Picea stichensis (Bong.) Carrière). Areas east of the Cascade Range dominated by ponderosa pine (Pinus ponderosa Lawson & C. Lawson), lodgepole pine (Pinus contorta Douglas ex Loudon), and western juniper (Juniperus occidentalis Hook. var. occidentalis). While coniferous species such as those stated are the most common, a number of deciduous species are present as well, including red alder (Alnus rubra (Bong.)), bigleaf maple (Acer macrophyllum Pursh), and quaking aspen (Populus tremuloides Michx.), among many others.
We used a set of 10,537 inventory plots collected by the FIA program between 2009 and 2018. The FIA in Oregon uses a 10-year panel rotation, and this set represents all ten panels, depicted in Figure 1. Each plot is a cluster of four fixed-radius, nested subplots. At each subplot, trees between 12.70 cm and a “breakpoint” diameter are measured on a circular plot of radius 7.32 m and trees larger than the breakpoint diameter are measured on a circular plot of radius 17.95 m. The breakpoint diameter is either 60.96 cm or 76.20 cm depending on whether the plot is east or west of the Cascade Range crest, respectively. For each tree, diameter at breast height (DBH), total height, and species are recorded, among other attributes. Cubic volumes for each tree, including top and stump, are predicted using the National Volume Estimator Library (NVEL) [15].

2.2. Sampling Design, Forest Attributes and Domains

The network of field plot locations for the FIA in Oregon used in this study arise from a random tessellated sampling design [1]. Plot centers were located using a compact hexagonal tessellation with hexagons that are approximately 2184 hectares in size, such that one field plot exists in each hexagon. The plot centers were located using a random azimuth and distance from the hexagon center. In some cases, plots from previous inventory systems were already present, and the locations of those plots were assigned to the hexagon rather than allocating a new field plot. For each field plot we calculated the stem volume ( V O L ,   m 3   ha 1 ), basal area ( B A ,   m 2   ha 1 ), and stem density ( D E N ,   stems   ha 1 ) for all species combined and for the species that were in the 40th percentile of the total volume in Oregon or greater.
Given this set of 10,537 field plots and their corresponding attributes, we created a population upon which variance estimators could be assessed by assigning each field plot to a position in a discrete two-dimensional space. This discrete space consists of a set of N = 10,537 positions upon which systematic sampling can be conducted. Population means of each species-specific attribute considered are given in Table 1, along with a proportion representing the number of times the species appeared in a plot divided by N .
To subsample this set of N positions, we leveraged the hexagonal tessellation by assigning each field plot to its coincident hexagon in the original sample. The centroids of this set of N hexagons, with forest attributes calculated from the field plots, represents the population upon which systematic sampling is conducted. To retain a hexagonal structure of any systematic sample produced from this population, every a th population unit along the horizontal, diagonal, and vertical axes of the hexagonal grid were included in the sample. Let t h i = ( t h i 1 , t h i 2 ) represent the row vector of discrete spatial coordinates of the i th plot in the h th post-stratum. Figure 2 displays one possible systematic sample from a hypothetical population in the discrete space with the discrete vectors t h i displayed for one systematic sample. Repeating this procedure for every possible systematic sample provides K = a 2 samples. We used a sampling interval of a = 5 for the following assessments. For the remainder of the analysis, the discrete positions t h i were treated as the true positions of the field plots.
The performance of the standard estimator of the variance is affected by the degree of spatial correlation between the observations in the sample as well as large-scale trend in a given attribute. By spreading the sample by a distance corresponding to the interval a , the spatial correlation will likely decrease, and the observations may become independent. Therefore, the advantage of alternative estimators may not be apparent. However, the density of the sample does not affect trends in the population, and even a less dense sample would allow evaluation of the effect of trends and post-stratification on the performance of the estimators.
The FIA program provides standard reports on many forest attributes for a wide range of domains. Domains are defined as subpopulations of trees that meet certain criteria that serve to define parameters of interest. Examples of domains include spatial domains such as counties, and species domains, such as the set of trees of a particular species. To assess the performance of the estimators when subpopulations are of interest, we defined several domains over which we conducted the analysis: a set of two spatial domains and a set of eighteen species domains. The first spatial domain is the area within the Oregon state boundary, which we will refer to as the “large domain”. In addition to this we create domains from the set of Oregon counties which we refer to as “small domains”. Two adjacent counties (Sherman and Gilliam, east of small domain 65, Figure 1) were removed from consideration as a small domain because no trees were present on any plot. Note that the plots in these counties were still included in the large domain.
The FIA program uses post-strata to enhance the precision of point estimates. FIA uses a large number of post-strata that contain information about forest structure, property ownership, land use and other variables. We used a coarser set of post-strata that mimic the forest structure covariates used operationally by the FIA in Oregon for the purposes of this study. First, we used a forest-non-forest mask produced for 2016 provided by the United States Forest Service Geospatial Technology and Applications Center using the methodology described in [16]. Second, we used three canopy cover classes, corresponding to 0–33%, 33–66% and 66–100% canopy cover derived from a canopy cover map published by the Multi-Resolution Land Characteristics Consortium (http://www.mrlc.gov/ (accessed on 17 August 2020)). Finally, we used an east-west delineation of Oregon derived from the level three ecoregions used by the Environmental Protection Agency (EPA) [17]. The combinations of the forest-non-forest classes, canopy cover classes, and east-west classes produce the twelve post-strata used in this study and are depicted in Figure 1.

2.3. Point Estimators

Observations of plot-level species-specific attributes were calculated by computing a weighted sum of trees that meet the criteria for a given domain. These weights arise from the different sizes used in the nested plot design ([4], p. 53). Let z ( t h i ) represent the observation of the i th plot-level forest attribute in the h th post-strata at the two-dimensional position vector t h i . Let S k refer to the set of index pairs h and i that correspond to the h th stratum and i th field plot included in the k th sample, i.e., each element of S k is an index pair. We obtain an unbiased estimate of the population mean via the post-stratification estimator under an equal inclusion probability design:
μ ^ P S , k = h = 1 H W h z ¯ h k + 1 n k { h , i }     S k e ^ ( t h i ) ,
where z ¯ h k is the sample mean of the observations in the h th post-stratum, W h = N h N is the weight of the h th post-stratum where N h is the number of population elements in the h th post-stratum and n k is the number of elements in the k th sample. This differs from actual practices in the FIA, where W h is determined from wall-to-wall covariate rasters of fine resolution, rather than field plots [4]. The second summation is over all stratum and plot index pairs in the sample. The error term e ^ ( t h i ) = z ( t h i ) z ¯ h k explains the deviation of z ( t h i ) from the estimate of the corresponding post-stratum-level mean. Using the above estimator, we can obtain the Horvitz-Thompson estimator for equal inclusion probability designs by setting H = 1 as a special case. We obtain:
μ ^ H T , k = z ¯ k + 1 n k { h , i } S k e ^ ( t h i )
where z ¯ k is the sample mean of the n k in the sample. Note that the second term is equal to zero and we obtain the sample mean as the HT estimator under equal inclusion probabilities. For brevity, we reserve the notation μ ^ k to refer to either the HT or PS estimator.

2.4. Variance Estimators

The sampling variance of the PS point estimator under simple random sampling designs is typically approximated using a first order Taylor expansion ([2], p. 235). We obtain:
A V ( μ ^ P S , k ) = f p c k 1 n k ( N 1 ) { h , i }   U e ( t h i ) 2
where e ( t h i ) = z ( t h i ) μ h is the error between z ( t h i ) and its corresponding stratum mean μ h and f p c k = N n k N is the finite population correction factor for the k th sample. Note that, for the HT estimator we obtain e ( t h i ) = z ( t h i ) μ . Many variance estimators for PS or, more generally, model-assisted estimators consider only this residual term (e.g., [18]). The standard estimator of the systematic variance that assumes the sample was collected with simple random sampling is one such estimator. We obtain the estimator:
V ^ S R S ( μ ^ k ) = f p c k 1 n k ( n k 1 )   { h , i } S k e ^ ( t h i ) 2 .
In the case of systematic sampling designs, pairwise inclusion probabilities for many population units will be zero, precluding the possibility of unbiased estimation of A V ( μ ^ P S ) . Under HT estimation, many alternatives have been proposed which typically involve squared differences between a local mean of neighboring points and the observation. To make these estimators consistent with PS estimation, we plug in the residual e ^ ( t h i ) to alternative estimators presented for HT variance estimation, which is an approach taken in several other studies [19,20].
Stevens and Olsen [3] proposed a neighborhood-based estimator for the use in the generalized random tessellated stratified (GRTS) sampling design that can be used in systematic designs. The estimator computes a weighted sum of squared errors over the neighborhood D S O , h ( t h i ) . The neighborhood is developed iteratively, first by including t h i itself plus the u nearest neighbors according to Euclidean distance from the element at t h i . Then any other points, t h j , that have t h i as a neighbor themselves are added to the neighborhood. Noting that the first-order inclusion probabilities for all units are a 2 , where a is the systematic sampling interval, we obtain the estimator:
V ^ S O , u ( μ ^ k ) = f p c k N 2 a 4 { h ,   i }   S k { h , i }     D S O , u ( t h i ) w h i , h i ( e ^ ( t h i ) e ¯ ^ D S O , u ( t h i ) ) 2
where w h i , h i is a plot-specific weight that decreases as the distance between t h i and t h i increases and e ¯ ^ D S O , u ( t h i ) = w h i , h i e ^ ( t h i   ) . This distance weighting effect is similar to kriging, where residual pairs that are close together will be weighted more than pairs of residuals further away when computing the neighborhood-level variance. These weights are optimized by ensuring the sum of e ¯ ^ D S O , u ( t h i ) over all plots adds up to the total error (refer to [3] for further information on this optimization procedure). We consider the special case where u = 12 . Implementations for V ^ S O were provided by the R package spsurvey [21].
Matérn ([11], pp. 120–122) describes an estimator for two-dimensional systematic sampling designs that relies on computing contrasts of units within non-overlapping neighborhoods as local variance estimates and aggregating these variance estimates over all neighborhoods to produce a final estimate. We considered two neighborhood configurations. First, we consider:
D M A T , h e x ( t h i ) = { ( t h i 1 1 ,   t h i 2 1 ) ,   ( t h i 1 ,   t h i 2 2 ) , ( t h i 1 1 ,   t h i 2 + 1 ) , ( t h i 1 1 ,   t h i 2 + 1 ) , ( t h i 1 , t h i 2 + 2 ) , ( t h i 1 + 1 ,   t h i 2 + 1 ) }
which represents the nearest neighbors with respect to t h i and:
D M A T , p a r ( t h i ) = { ( t h i 1 , t h i 2 ) ,   ( t h i 1 1 , t h i 2 + 1 ) , ( t h i 1 + 1 , t h i 2 + 1 ) ,   ( t h i 1 , t h i 2 + 2 ) }
which represents a parallelogram constructed from the neighbors of t h i and itself. This second configuration closely mimics the original Matérn estimator, which was developed for the case of rectangular grids. Both neighborhood specifications are displayed in Figure 2.
For each neighborhood, a contrast of the mean-centered residuals of the elements in the neighborhood is computed. Let e ^ c t , l = ( e ^ h i ( t h i ) e ¯ ^ D M A T , c ( t h i ) ) T ,   { h i } D M A T , c ( t h i ) represent the vector of mean-centered residuals for the l th neighborhood using configuration c of the Matérn estimator (Table 2). Points in the neighborhood that fall outside of the study space are given a value of zero. For each neighborhood calculate the value:
T l = ( e ^ c t , l T c l ) 2 n l
where n l indicates the number of sampled population units in the neighborhood and c l is a vector containing elements that are either −1 or 1. We use the vectors ( 1 ,   1 ,   1 , 1 , 1 , 1 ) T and ( 1 , 1 , 1 1 ) T for the hexagonal and parallelogram neighborhood structures, respectively. The positive and negative elements of c l create two groups which each make a prediction for the mean residual in the neighborhood, and Equation (8) calculates a squared difference between these two predictions. This squared difference creates the basis for a neighborhood-level variance estimate. Using these values, we obtain the Matérn variance estimator:
V ^ M A T , c ( μ ^ k ) = f p c k q c n 2 l = 1 L T l
where q p a r = 4 and q h e x = 7 for the parallelogram and hexagonal neighborhood structures, respectively.
Adjustment-based estimators rely on applying an adjustment factor to V ^ S R S that approximates the inter-cluster correlation coefficient. This approach is motivated by the fact that variance estimates using V ^ S R S in the case of systematic sampling will tend to have a larger positive bias in situations where the inter-cluster correlation, i.e., the correlation between pairs of possible systematic samples, is strongly positive or negative. One such estimator was proposed by D’Orazio [12] which relies on the Geary’s C index. We consider two neighborhood structures. The first structure considers the six closest points to an element at t h i to be neighbors. This is equivalent to the D M A T , h e x neighborhood. We also consider second order neighbors, i.e., those elements that are first order neighbors, as well as their first order neighbors. Noting this we obtain the expression for Geary’s C:
C k = ( n k 1 ) h i h i δ h i , h i ( e ^ ( t h i ) e ^ ( t h i ) ) 2 2 ( h i h i δ h i , h i ) ( h i e ^ ( t h i ) 2 )
where δ h i , h i is an indicator variable that takes the value one if the element at t h i is considered a neighbor of the element at t h i . We obtain the D’Orazio’s Geary’s C estimator:
V ^ D C ( μ ^ k ) = { V ^ S R S ( μ ^ k ) C k ,   h i e ^ ( t h i ) 2 0 V ^ S R S ( μ ^ k ) , o . w . .
The first condition of (11) ensures that Geary’s C can be calculated. If not, V ^ S R S is used without adjustment. For the purposes of reporting results, we calculate Geary’s C using Equation (10) with all N field plots using the residuals obtained from the HT estimator. We refer to this quantity as C for the remainder of the text.

2.5. Performance Measures

Define a systematic sampling interval a such that every a th element along each axis is sampled. Sampling in this fashion for a hexagonal tessellation implies that there are a 2 = K possible subsamples. Furthermore, all samples produced will retain a hexagonal structure (Figure 2). Using all subsamples, we obtain the actual variance of the estimator, which we will refer to as the “true variance” for brevity:
V S Y S ( μ ^ ( p ) ) = 1 K k = 1 K ( μ ( p ) μ ^ k ( p ) ) 2
where the superscript ( p ) is used to refer to a domain-attribute combination indexed by p . Although secondary to the objective of this study, comparing the performances of the point estimators HT and PS can be done using the relative efficiency of the true variances:
R E ( μ ^ H T ( p ) , μ ^ P S ( p ) ) = V S Y S ( μ ^ H T ( p ) ) V S Y S ( μ ^ P S ( p ) ) .
We relied on two general categories of performance measures to assess the behavior of the proposed variance estimators. The first category assesses the performance of an estimator for a particular attribute in a particular domain. Let V ^ g k ( p ) refer to the k th variance estimate for the p th domain-attribute combination for estimator g { I R S ,   S O ,   ,   D I , 2 } . For brevity, let V S Y S ( μ ^ ( p ) ) = V S Y S ( p ) . We obtain the mean error of the ratio of the variance estimate to the systematic variance:
R a t i o ¯ ( V ^ k ( p ) ) = 1 K k = 1 K V ^ g k ( p ) V S Y S ( p )
which quantifies the mean deviation of the ratio of the variance estimate from one. R a t i o ¯ ( V ^ k ( p ) ) is equivalent to the design expectation of the quantity V ^ g k ( p ) V S Y S ( p ) . Equation (14) can be considered a relative measure of bias for a variance estimate for a particular domain-attribute combination. In addition to (14) we also produce the relative efficiencies of the variance estimators themselves with respect to the simple random sampling estimator, V ^ S R S . In many environmental sampling surveys V ^ S R S is considered the standard choice for a variance estimator under this design and thus assessing the alternatives with respect to V ^ S R S can provide valuable information on potential efficiency gains. We obtain:
R E ( V ^ g * ( p ) ,   V ^ S R S ( p ) ) = S S E ( V ^ g * ( p ) ) S S E ( V ^ S R S ( p ) )
where g * represents any estimator that is not the simple random sampling estimator and S S E ( V ^ g ( p ) ) = k = 1 K ( V ^ g ( p ) V S Y S ( p ) ) 2 is the sum of squared errors for a given estimator over all K subsamples.
The second category of performance measures consider the performance of an estimator across several domain-attribute combinations. We consider the mean of the mean ratio across all p domain-attribute combinations:
R a t i o ¯ ¯ ( V ^ g ) = 1 P p = 1 P R a t i o ¯ ( V ^ g ( p ) ) .
while R a t i o ¯ is a measure of relative bias, R a t i o ¯ ¯ is simply an aggregation of these measures across several attributes and is not strictly a measure of bias itself. Instead, R a t i o ¯ ¯ can be interpreted as a performance index over many estimates of the true variance for a particular estimator. R a t i o ¯ ¯ values close to one indicate a tendency for an estimator to over-estimate the true variance across many domain-attribute combinations, and vice versa. In a similar manner, we consider the mean of the relative efficiencies across all P domain-attribute combinations and obtain:
R E ¯ ( V ^ g ) = 1 P   p = 1 P R E ( V ^ g ( p ) )
which is the mean relative efficiency for a given estimator over all P domain-attribute combinations.

3. Results

3.1. Comparison of Point Estimators

We estimated three attributes (VOL, BA, and DEN) for each species considered in this analysis using the PS estimator and HT estimator. The relative efficiencies of these point estimators for each species-specific attribute are shown in Figure 3. For most species-specific attributes, the use of the post-stratification estimator provides a lower variance (i.e., a relative efficiency less than one) than the Horvitz-Thompson estimator, with relative efficiencies as low as 0.75 for the VOL attribute of Pseudotsuga menziesii. In some cases, the post-stratified estimator was not as efficient, e.g., for Abies grandis and Pinus contorta, among others, with relative efficiencies as high as 1.4. This suggests that the post-strata correlate well with species-specific attributes for some species, but not others. Note that species with larger relative efficiencies tended to have a much lower prevalence (Table 1). Additionally, we used a much coarser set of post-strata than is used in official reporting by the FIA, and relative efficiencies using this larger set of post-strata may be lower than what we obtained.
The HT and PS point estimators imply different correlation structures among the residuals attained in Equations (1) and (2) for the same species-specific attributes, as well as different models for the trend. We summarize changes in this structure using the Geary’s C correlation coefficient, C , in Figure 4. For all species-specific attributes we considered, C increased when going from the HT to the PS estimator. The effect is most extreme for Pseudotsuga menziesii and Pinus ponderosa attributes. A large change in C implies that the post-stratification variables explain either the linear trend or the spatial correlation structure of the residuals more than without (i.e., under HT estimation). Even under the PS estimator, however, C remains large, implying that some spatial dependence is present in the residual, or that the post-stratification variables do not completely accommodate the trend, or both. Large increases in C are present for Pseudotsuga menziesii, Pinus ponderosa, and Alnus rubra for all three attributes. Other species exhibited only moderate increases.

3.2. Comparison of Variance Estimators

3.2.1. Large Spatial Domain Assessment

For the large domain assessment, we produced variance estimators and their performance measures considering species-specific attributes. Figure 5 displays the R a t i o ¯ for species-specific attributes for all variance estimators under HT and PS point estimation. For V ^ S R S under the HT point estimator the degree of positive bias for attributes with small C values (i.e., stronger spatial dependence) is large. As this spatial dependence weakens, the degree of bias becomes less apparent. Under the PS point estimator, the range of C values decreases (Section 3.1) and the positive bias of V ^ S R S is reduced to some degree. For the alternative estimators under the HT estimator the positive bias observed for V ^ S R S for small C attributes is clearly reduced, producing ratio estimates very close to one. For attributes with C values exceeding 0.7 this reduction is not as clear.
Figure 6 displays the relative efficiency for each of the P domain-attribute combinations under the HT and PS point estimators. Generally, the alternative estimators improve over V ^ S R S , as indicated by the points that fall below the solid line. Under HT estimation this improvement is most clear for domain-attribute combinations with small C values. Under the PS point estimator, the improvement is not as clear, with several notable points falling above the solid line. The V ^ S O , V ^ D C , 1 , and V ^ D C , 2 estimators appear to have the most reliable improvements under both HT and PS point estimators, with only relatively few points falling above the solid line.
Table 3 summarizes the estimators across the P domain-attribute combinations. Under HT estimation the V ^ S R S tended to demonstrate positive bias, with a R a t i o ¯ ¯ value of 1.23. The positive bias for the alternative estimators under HT estimation is less, with R a t i o ¯ ¯ values ranging between 0.96 to 1.12. The relative efficiencies for the alternative estimators were less than one except for V ^ M A T , p a r , which suggests that these alternatives provide gains in efficiency under the HT estimator. Under the PS point estimator, the V ^ S R S estimator demonstrates positive bias with a R a t i o ¯ ¯ value of 1.11. The alternative estimators had smaller R a t i o ¯ ¯ values, ranging between 0.90 to 1.08. The mean relative efficiencies under the PS point estimators were less than one with the exception of V ^ M A T , p a r and V ^ M A T , h e x , although these relative efficiencies are slightly higher than the analogous estimators under the HT point estimator.

3.2.2. Small Spatial Domain Assessment

In addition to the large spatial domain, we conducted an assessment for the species-specific attributes for the small domains in Oregon that correspond, in most cases, to existing county boundaries (see Section 2.2). Such an assessment provides a range of different, albeit smaller, populations, each with P domain-attribute combinations exhibiting a range of spatial structures. If a species-specific attribute had an estimate of zero for any of the K subsamples within a domain, then the species-specific attribute was discarded for that domain. This was considered appropriate because a desirable property for a variance estimator is to produce a variance of zero if there is no variance between sample observations. However, this property guarantees a given estimator will be negatively biased in the case where the point estimate is zero. For this reason, species-specific attributes within a domain are removed from consideration if any point estimate for that domain was zero among the K subsamples. In total, there are 30 small domains that contain between six to 30 species-specific attributes. For each small domain, the sample size within that domain is a random variable. Let n c represent the mean sample size across the K subsamples within a small domain. We assign each domain to one of three classes, based on its sample size: n c 10 , 10 < n c 20 and n c > 20 . These thresholds represent the 33rd and 66th percentile of the domain-specific sample sizes across study areas.
We produced variance estimates for each of these species-specific attributes in each small domain under the HT and PS estimators (Figure 7). Generally, all estimators exhibited a strong positive bias across the range of C values and across all sample size classes, with a diminishing positive bias as the sample size increased. For most estimators, variance estimation under the post-stratified estimator reduced the bias with notable exceptions for some of the attributes in the smallest sample size classes. In the small sample size classes, two specific attributes are apparent as positive outliers for all estimators that correspond to the VOL and BA attribute in spatial domain 47 (see Figure 1). However, these outliers have R a t i o ¯ values that are more extreme than the V ^ S R S estimator for the V ^ D C , 1 , V ^ D C , 2 and V ^ M A T , p a r estimators.
All three of these alternative estimators rely to some extent on the definition of neighborhoods surrounding given sample positions (Equations (5), (6) and (9)). For a small spatial domain, these estimators will only have access to relatively few neighborhoods to compute the variance, which may lead to the extreme values we observe. In addition, spatial domain 47 appears to have a large perimeter-to-area ratio and is very irregular in shape. In contrast, V ^ S O constructs neighborhoods for each point in the sample space, which provides a larger number of neighborhoods used to estimate the variance, which may resolve some of these issues.
Generally, the estimators attain lower relative efficiencies as sample size increases (Figure 8) with notable exceptions occurring for the V ^ M A T , p a r estimator. The V ^ S O estimator is more efficient the V ^ S R S across all of the possible attributes, and V ^ D C , 1 is more efficient for many, but not all, attributes. Only small differences exist between variance estimates attained under HT or PS point estimators. Generally, variance estimates attained under PS point estimation have slightly higher relative efficiencies, suggesting that the PS point estimator removes some of the benefit of using an alternative estimator.
The SRS variance estimator has the largest R a t i o ¯ ¯ value among all estimators considered across all three sample size classes with the exception of V ^ D C , 2 under the post-stratified estimator (Table 4). The V ^ S R S estimator under post-stratification has a lower R a t i o ¯ ¯ value than under the HT estimator, which is consistent with the large domain assessment in Section 3.2.1. The V ^ M A T , h e x estimator attained the R a t i o ¯ ¯ value closest to zero under both the HT and PS point estimators. With the exception of the V ^ M A T , p a r estimator, all alternative estimators attained a lower mean relative efficiency, with the V ^ S O estimator attaining the lowest relative efficiency across all three sample size classes ranging between 0.67 and 0.68 relative efficiency.

4. Discussion

4.1. Variance Estimation under Horvitz-Thompson and Post-Stratification Estimators

We examined performance measures for six variance estimators under HT and PS point estimation for a variety of populations and spatial domains. Considering both HT and PS estimation provides insight into how well the variance estimators can handle populations with varying spatial structure in the residual. A robust variance estimator should be able to handle these diverse situations under both point estimation regimes.
Several studies have examined variance estimation for systematic and similar designs under a Horvitz-Thompson point estimator. Magnussen et al. [6] investigated alternative variance estimators under simulated populations and found improvements for estimators similar to V ^ D C , 1 and V ^ M A T , p a r with R a t i o ¯ ¯ values ranging between 1.03 to 1.37 and 0.98 to 1.19, respectively while attaining V ^ S R S values ranging between 1.50 to 2.88. Similarly, Stevens and Olsen [3] present results using V ^ S O under a generalized random tessellated stratified design and attained R a t i o ¯ close to one and confidence interval coverage probabilities very close to the nominal rate using a simulated data set. Our results also indicate improvements using the alternative estimators. For the large spatial domain, we noted a trend of positive bias for the V ^ S R S estimator with decreasing Geary’s C (Figure 5 and Figure 7). Specifically, we attained R a t i o ¯ ¯ values of 1.23 for the large domain and values between 1.54 to 1.88 for the small spatial domains. Several alternative estimators provide clear improvements over V ^ S R S under an HT point estimator. V ^ S O and V ^ D C , 1 attained R a t i o ¯ ¯ values closer to one and smaller relative efficiencies in both the large spatial domain and small spatial domains. Notably, V ^ S O was the only estimator that attained smaller relative efficiencies for every species-specific attribute we considered across all spatial domains under the HT point estimator (Figure 7 and Figure 8).
When moving from HT to PS point estimation, two things may happen; a reduction in the spatial dependence in the residual and removal of at least some of the large-scale trend. Both of these can reduce the positive bias of the V ^ S R S [6,22]. We found a reduction in C for the residuals (Figure 4), and an increase in relative efficiency of the point estimates for many attributes (Figure 3) when moving from HT to PS, suggesting that the removal of some spatial correlation and some trend is expected for these attributes. However, for many species-specific attributes this change was small. This implies that the positive bias typically observed under the V ^ S R S estimator may remain even under PS point estimation for the attributes and post-strata we considered (Table 3 and Table 4), motivating the adoption, or at least consideration, of the alternative estimators we present. While it is possible to reduce this bias with higher quality covariates (e.g., [23]), covariates that correlate well with species-specific attributes may not be available. Note that the FIA uses a more detailed stratification, exceeding 100 different strata for the state of Oregon, and may provide lower relative efficiencies under PS estimation than we observed. For the large spatial domain we attained R a t i o ¯ ¯ close to one under the PS point estimator with values of 0.96 and 1.05 for the V ^ S O and V ^ D C , 1 estimators respectively (Table 3). All alternative estimators showed improvements for the small spatial domains across all sample size classes, i.e., they attained R E ¯ ¯ lower than one and R a t i o ¯ ¯ closer to one than the V ^ S R S estimator (Table 4).
Some of the estimators appear to have a large number of outlier points, e.g., the V ^ M A T , h e x and V ^ M A T , p a r estimators (Figure 7). Closer inspection of the spatial domains where these outliers manifested revealed very sparse populations, i.e., with many observations of attributes that are zero. The V ^ M A T estimators rely on a much smaller number of neighborhoods to compute a variance estimate relative to V ^ S O and in populations that have widely varying neighborhood-level estimates of variance, this may cause the observed erratic behavior. In addition, moving from the p a r to h e x neighborhood structure for the V ^ M A T estimator, which implies a reduction in the total number of neighborhoods, appears to exacerbate the issue. V ^ S O , although neighborhood-based, does not rely on disjoint groups of elements to form neighborhoods. Rather, every sample point in the population has its own neighborhood such that different neighborhoods possibly share members, resulting in a larger number of neighborhoods available to estimate the variance.

4.2. Implications for Environmental Sample Surveys

For generic inference surveys, post-stratification covariates are typically applied uniformly to all attributes of interest, since they are often optimized using only one or a few response variables in mind (e.g., VOL). Following this, an ideal variance estimator under PS will demonstrate improvements in cases where spatial dependence is strong or weak, or whether the post-stratification succeeds in removing the trend. These improvements should manifest in reductions of bias, captured by our R a t i o ¯ ¯ measure, and reductions in mean squared error relative to an alternative, captured by our R E ¯ measure. Our results indicated that V ^ S O , V ^ D C , 1 , and V ^ M A T , p a r demonstrated reliable reductions with respect to these measures under both HT and PS point estimation regimes and are leading candidates for robust alternative variance estimators. In particular, V ^ S O attained lower relative efficiencies for every species-specific attribute we considered, except for five attributes under PS point estimation in the large domain (Figure 6).
Treating field plots as a population likely reduces the strength of spatial correlation between population units relative to the actual population considered in the FIA and other forest inventories. As well, systematic subsamples collected from this discontinuous population contain even less spatial correlation, as field plots selected in the sample are separated by a large distance (see Figure A1 (Appendix A)). On the other hand, subsampling of field plots does not remove potential large-scale trends in the attributes, which alone can affect the performance of variance estimators [6,22]. This begs the question; how will the variance estimators behave in actual applications? We demonstrate that several alternative variance estimators exhibit improvements of the standard estimator for a wide array of attributes and domains in this spatially discontinuous population. To the extent that population attributes are similar in trend and spatial correlation, similar reductions in bias and relative efficiency can be expected using these alternatives, especially when considering our study among the broader literature on the topic of systematic variance estimation that utilizes a simulation approach [3,6]. Selecting an alternative variance estimator for real applications requires a degree of risk, since obtaining census populations required for a formal assessment of bias and relative efficiency is an impossible task. However, one can mitigate this risk by considering alternatives that demonstrate utility under a broad of conditions and studies.
Our assessment was conducted using a strictly systematic sampling design from a finite population. However, many environmental sampling surveys employ other designs, such as quasi-systematic sampling designs, utilize estimators derived for continuous populations (e.g., [24]), or both. For quasi-systematic sampling designs, similar positive bias issues with respect to variance estimation are apparent [3]. While V ^ S O provides clear routes forward for these designs, all other estimators we considered except for V ^ S R S rely on regular geometries that exist between sample positions from strictly systematic designs. Furthermore, we considered only equi-probable systematic sampling, which need not be the case for systematic designs (e.g., [25], pp. 332–335). Further research could examine modifications to the proposed estimators to consider quasi-systematic, varying probability designs or designs that combine both concepts (e.g., [26]) while considering more complex point estimation regimes such as generalized regression estimation and post-stratification.
Finally, an alternative variance estimator must also be computationally feasible to implement for many potential attributes. While the V ^ S O estimator provides reliable improvements to V ^ S R S we found that there is a higher computational demand using this estimator relative to the alternatives. The estimator, as implemented in spsurvey, requires a singular value decomposition of a matrix of dimension n x n to optimize the weights in (4) which has a computational complexity of O ( n 3 ) in this case [27]. For example, producing variance estimates for samples with large sizes may be infeasible in some cases. However, the reliable performance of this estimator, and its flexibility for other types of sampling designs, including unequal probability designs, suggests that the optimization or modification for very large sample sizes is a research priority. All other alternative estimators required a much lower computational burden, and we expect their operational implementations for large sample sizes to be more straightforward.

5. Conclusions

We assessed six variance estimators for systematic sampling designs under HT and PS point estimation using a large spatially balanced environmental survey as a population. In general, all alternative estimators demonstrated improvements over an estimator that assumes simple random sampling, i.e., the “standard” estimator. An estimator presented by Stevens and Olsen [3] provided the clearest improvements, with mean relative bias for a large spatial domain of 1.01 and 0.96 for HT and PS, respectively. Relative efficiencies were smaller for this estimator in the large spatial domain, with values ranging between 0.65 and 0.82 for HT and PS, respectively. For small spatial domains, this estimator was the only alternative to attain relative efficiencies less than one for all attributes we considered. We found that the computational burden of this estimator is a potential hurdle for samples of large size, and have identified an estimator presented by D’Orazio [12] as an additional promising candidate that requires a lower computational effort.

Author Contributions

Writing—original draft preparation, conceptualization, formal analysis, B.F.; methodology, B.F. and V.J.M.; writing—review and editing V.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by an appointment to the United States Forest Service (USFS) Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the U.S. Department of Agriculture (USDA). ORISE is managed by ORAU under DOE contract number DE-SC0014664. All opinions expressed in this paper are the author’s and do not necessarily reflect the policies and views of USDA, DOE, or ORAU/ORISE.

Data Availability Statement

The exact coordinates of the data presented are not publicly available due to the confidentiality of FIA plot locations used in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Figure A1. Changes between Geary’s C calculated with all field plots and the mean Geary’s C calculated using all subsamples collected from a systematic sample of interval a = 5 . Note that in some cases, the Geary’s C between attributes and HT and PS residuals can be very similar, resulting in overlapping lines in the figure.
Figure A1. Changes between Geary’s C calculated with all field plots and the mean Geary’s C calculated using all subsamples collected from a systematic sample of interval a = 5 . Note that in some cases, the Geary’s C between attributes and HT and PS residuals can be very similar, resulting in overlapping lines in the figure.
Forests 12 00772 g0a1

References

  1. Reams, G.A.; Smith, W.D.; Hansen, M.H.; Bechtold, W.A.; Roesch, F.A.; Moisen, G.G. The Forest Inventory and Analysis Sampling Frame; Gen. Tech. Rep. SRS-80; US Department of Agriculture, Forest Service, Southern Research Station 21–36: Asheville, NC, USA, 2005.
  2. Särndal, C.-E.; Swensson, B.; Wretman, J. Model Assisted Survey Sampling; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003. [Google Scholar]
  3. Stevens, D.L.; Olsen, A.R. Variance Estimation for Spatially Balanced Samples of Environmental Resources. Environmetrics 2003, 14, 593–610. [Google Scholar] [CrossRef]
  4. Scott, C.T.; Bechtold, W.A.; Reams, G.A.; Smith, W.D.; Westfall, J.A.; Hansen, M.H.; Moisen, G.G. Sample-Based Estimators Used by the Forest Inventory and Analysis National Information Management System; Gen. Tech. Rep. SRS-80; US Department of Agriculture, Forest Service, Southern Research Station: Asheville, NC, USA, 2005; pp. 53–77.
  5. Aune-Lundberg, L.; Strand, G.-H. Comparison of Variance Estimation Methods for Use with Two-Dimensional Systematic Sampling of Land Use/Land Cover Data. Environ. Model. Softw. 2014, 61, 87–97. [Google Scholar] [CrossRef] [Green Version]
  6. Magnussen, S.; McRoberts, R.E.; Breidenbach, J.; Nord-Larsen, T.; Ståhl, G.; Fehrmann, L.; Schnell, S. Comparison of Estimators of Variance for Forest Inventories with Systematic Sampling-Results from Artificial Populations. For. Ecosyst. 2020, 7, 1–19. [Google Scholar] [CrossRef] [Green Version]
  7. Dunn, R.; Harrison, A.R. Two-Dimensional Systematic Sampling of Land Use. J. R. Stat. Soc. Ser. C 1993, 42, 585–601. [Google Scholar] [CrossRef]
  8. Strand, G.-H. A Study of Variance Estimation Methods for Systematic Spatial Sampling. Spat. Stat. 2017, 21, 226–240. [Google Scholar] [CrossRef]
  9. McConville, K.S.; Moisen, G.G.; Frescino, T.S. A Tutorial on Model-Assisted Estimation with Application to Forest Inventory. Forests 2020, 11, 244. [Google Scholar] [CrossRef] [Green Version]
  10. Cochran, W.G. Sampling Techniques; Wiley: Hoboken, NJ, USA, 1977. [Google Scholar]
  11. Matérn, B. Spatial Variation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 1986; Volume 36. [Google Scholar]
  12. D’Orazio, M. Estimating the Variance of the Sample Mean in Two-Dimensional Systematic Sampling. J. Agric. Biol. Environ. Stat. 2003, 8, 280. [Google Scholar] [CrossRef]
  13. Opsomer, J.D.; Francisco-Fernández, M.; Li, X. Model-Based Non-Parametric Variance Estimation for Systematic Sampling. Scand. J. Stat. 2012, 39, 528–542. [Google Scholar] [CrossRef]
  14. Palmer, M.; Kuegler, O.; Christensen, G. Oregon’s Forest Resources, 2006–2015: Ten-Year Forest Inventory and Analysis Report; Gen. Tech. Rep. Pnw-Gtr-971; US Department of Agriculture, Forest Service, Pacific Northwest Research Station: Portland, OR, USA, 2018; Volume 971, 54p.
  15. Wang, Y. Volume Estimator Library Equations 2019. Available online: https://www.fs.fed.us/forestmanagement/products/measurement/volume/nvel/index.php (accessed on 15 April 2020).
  16. Blackard, J.A.; Finco, M.V.; Helmer, E.H.; Holden, G.R.; Hoppus, M.L.; Jacobs, D.M.; Lister, A.J.; Moisen, G.G.; Nelson, M.D.; Riemann, R.; et al. Mapping U.S. Forest Biomass Using Nationwide Forest Inventory Data and Moderate Resolution Information. Remote Sens. Environ. 2008, 112, 1658–1677. [Google Scholar] [CrossRef]
  17. Omernik, J.M.; Griffith, G.E. Ecoregions of the Conterminous United States: Evolution of a Hierarchical Spatial Framework. Environ. Manag. 2014, 54, 1249–1266. [Google Scholar] [CrossRef]
  18. Särndal, C.E. Implications of Survey Design for Generalized Regression Estimation of Linear Functions. J. Stat. Plan. Inference 1982, 7, 155–170. [Google Scholar] [CrossRef]
  19. Montanari, G.E. Mixed Model Regression Estimation of a Spatial Total in the Continuous Plane Paradigm. J. Stat. Plan. Inference 2017, 185, 1–14. [Google Scholar] [CrossRef]
  20. Räty, M.; Kuronen, M.; Myllymäki, M.; Kangas, A.; Mäkisara, K.; Heikkinen, J. Comparison of the Local Pivotal Method and Systematic Sampling for National Forest Inventories. For. Ecosyst. 2020, 7, 1–17. [Google Scholar] [CrossRef]
  21. Kincaid, T.; Olsen, T.; Stevens, D.; Platt, C.; White, D.; Remington, R.; Kincaid, M.T. Package ‘Spsurvey’; 2015; Available online: https://cran.r-project.org/web/packages/spsurvey/index.html (accessed on 15 April 2020).
  22. Wolter, K.M. An Investigation of Some Estimators of Variance for Systematic Sampling. J. Am. Stat. Assoc. 1984, 79, 781–790. [Google Scholar] [CrossRef]
  23. Babcock, C.; Finley, A.O.; Gregoire, T.G.; Andersen, H.-E. Remote Sensing to Reduce the Effects of Spatial Autocorrelation on Design-Based Inference for Forest Inventory Using Systematic Samples. arXiv 2018, arXiv:1810.08588. [Google Scholar]
  24. Cordy, C.B. An Extension of the Horvitz—Thompson Theorem to Point Sampling from a Continuous Universe. Stat. Probab. Lett. 1993, 18, 353–362. [Google Scholar] [CrossRef]
  25. Wolter, K. Introduction to Variance Estimation; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2003; ISBN 978-0-387-40622-0. [Google Scholar]
  26. Stevens, D.L. Variable Density Grid-Based Sampling Designs for Continuous Spatial Populations. Env. Off. J. Int. Env. Soc. 1997, 8, 167–195. [Google Scholar] [CrossRef]
  27. Vasudevan, V.; Ramakrishna, M. A Hierarchical Singular Value Decomposition Algorithm for Low Rank Matrices. arXiv 2017, arXiv:1710.02812. [Google Scholar]
Figure 1. Map of the FIA field plots and subdomain boundaries for the state of Oregon, USA. Note that the publicly available FIA plot positions are used for display and to obtain the post-strata variables for this figure. Points are colored by their forested/non-forested status and canopy cover class. The east-west delineation used as an additional post-stratum is indicated by the red line.
Figure 1. Map of the FIA field plots and subdomain boundaries for the state of Oregon, USA. Note that the publicly available FIA plot positions are used for display and to obtain the post-strata variables for this figure. Points are colored by their forested/non-forested status and canopy cover class. The east-west delineation used as an additional post-stratum is indicated by the red line.
Forests 12 00772 g001
Figure 2. Representation of the field plots positioned in the discrete space. One systematic sample using a sampling interval of a = 5 is indicated by the black circles and unsampled plots are indicated by the gray circles. The standardized discrete positions vectors, t h i , are indicated in the parentheticals, and the neighborhood structures (Section 2.3) for D M A T , h e x ( ( 4 , 2 ) ) and D M A T , p a r ( ( 4 , 2 ) ) are given by the dotted black line and dashed red line, respectively. Note that D M A T , h e x includes the center point in the set as well as the outer points.
Figure 2. Representation of the field plots positioned in the discrete space. One systematic sample using a sampling interval of a = 5 is indicated by the black circles and unsampled plots are indicated by the gray circles. The standardized discrete positions vectors, t h i , are indicated in the parentheticals, and the neighborhood structures (Section 2.3) for D M A T , h e x ( ( 4 , 2 ) ) and D M A T , p a r ( ( 4 , 2 ) ) are given by the dotted black line and dashed red line, respectively. Note that D M A T , h e x includes the center point in the set as well as the outer points.
Forests 12 00772 g002
Figure 3. Relative efficiency of the post-stratified point estimator to the Horvitz-Thompson estimator for each species-specific attribute. Species are ordered in ascending order from left to right by their relative efficiencies for the VOL attribute.
Figure 3. Relative efficiency of the post-stratified point estimator to the Horvitz-Thompson estimator for each species-specific attribute. Species are ordered in ascending order from left to right by their relative efficiencies for the VOL attribute.
Forests 12 00772 g003
Figure 4. Geary’s C of the residuals attained from the post-stratified estimator and the Horvitz-Thompson estimator for each species-specific attribute. Geary’s C is computed using all available field plots. Species are ordered by their relative efficiencies for the VOL attribute.
Figure 4. Geary’s C of the residuals attained from the post-stratified estimator and the Horvitz-Thompson estimator for each species-specific attribute. Geary’s C is computed using all available field plots. Species are ordered by their relative efficiencies for the VOL attribute.
Forests 12 00772 g004
Figure 5. R a t i o ¯ for all species-specific attributes considered for the large domain under HT and PS point estimators for all variance estimators. Each point represents R a t i o ¯ for one species-specific attribute under a specific point estimator and variance estimator.
Figure 5. R a t i o ¯ for all species-specific attributes considered for the large domain under HT and PS point estimators for all variance estimators. Each point represents R a t i o ¯ for one species-specific attribute under a specific point estimator and variance estimator.
Forests 12 00772 g005
Figure 6. Relative efficiencies of the variance estimators with respect to the SRS estimator under HT and PS point estimation. Each point represents R E attained for one species-specific attribute under a specific point estimator and variance estimator.
Figure 6. Relative efficiencies of the variance estimators with respect to the SRS estimator under HT and PS point estimation. Each point represents R E attained for one species-specific attribute under a specific point estimator and variance estimator.
Forests 12 00772 g006
Figure 7. R a t i o ¯ for all estimators disaggregated by the small domain size sample class under the HT and PS point estimators.
Figure 7. R a t i o ¯ for all estimators disaggregated by the small domain size sample class under the HT and PS point estimators.
Forests 12 00772 g007
Figure 8. Relative efficiencies of alternative estimators with respect to V ^ S R S for each species-specific attribute within each small domain disaggregated by the small domain sample size class under the HT and PS point estimators.
Figure 8. Relative efficiencies of alternative estimators with respect to V ^ S R S for each species-specific attribute within each small domain disaggregated by the small domain sample size class under the HT and PS point estimators.
Forests 12 00772 g008
Table 1. Population means of the 18 species-specific attributes considered in this study. The proportion of times the species appeared in the N = 10,537 field plots is given in the p 0 column.
Table 1. Population means of the 18 species-specific attributes considered in this study. The proportion of times the species appeared in the N = 10,537 field plots is given in the p 0 column.
Name μ V O L μ B A μ D E N p 0
Abies amabilis2.050.220.560.02
Abies concolor3.850.420.820.04
Abies grandis5.050.531.140.07
Abies procera1.380.120.180.01
Abies shastensis1.440.150.200.01
Acer macrophyllum1.770.170.440.04
Alnus rubra3.130.340.900.05
Arbutus menziesii1.130.140.480.03
Calocedrus decurrens1.140.160.290.03
Larix occidentalis0.940.100.170.03
Picea engelmannii0.960.090.160.02
Picea sitchensis1.400.120.130.01
Pinus contorta2.950.452.080.07
Pinus ponderosa10.841.442.840.15
Pseudotsuga menziesii64.595.748.460.28
Thuja plicata2.430.260.330.04
Tsuga heterophylla9.510.841.580.08
Tsuga mertensiana3.360.400.800.02
Table 2. Notation, name, and configurations of the variance estimators used in this study. A canonical study that uses the estimator is included if available.
Table 2. Notation, name, and configurations of the variance estimators used in this study. A canonical study that uses the estimator is included if available.
NotationNameConfigurationsReference
V ^ S R S Simple random sampling estimator----
V ^ S O , h Stevens and Olsen estimator u { 12 } —number of nearest neighbors[3]
V ^ M A T , c Matern estimator c { h e x ,   p a r } —neighborhood contrast structure[11]
V ^ D C , o D’Orazio (2003)–Geary’s C o { 1 , 2 } —neighborhood order[12]
Table 3. Performance summaries of all variance estimators for the large spatial domain under HT and PS estimation.
Table 3. Performance summaries of all variance estimators for the large spatial domain under HT and PS estimation.
Variance EstimatorPoint Estimator R a t i o ¯ ¯ R E ¯
V ^ S R S HT1.23--
PS1.11--
V ^ S O HT1.010.65
PS0.960.82
V ^ M A T , h e x HT0.961.16
PS0.901.51
V ^ M A T , p a r HT1.100.92
PS1.061.12
V ^ D C , 1 HT1.080.77
PS1.050.93
V ^ D C , 2 HT1.120.84
PS1.070.99
Table 4. R a t i o ¯ ¯ and R E ¯ disaggregated by the small domain sample size class for all estimators under the HT and PS estimators.
Table 4. R a t i o ¯ ¯ and R E ¯ disaggregated by the small domain sample size class for all estimators under the HT and PS estimators.
Variance EstimatorPoint Estimator R a t i o ¯ ¯ R E ¯
n c 10 10 < n c 20 n c > 20 n c 10 10 < n c 20 n c > 20
V ^ S R S HT1.881.631.54------
PS1.781.481.47------
V ^ S O HT1.621.361.310.680.670.68
PS1.571.271.270.730.720.72
V ^ M A T , h e x HT1.341.071.240.790.711.28
PS1.301.001.210.850.781.39
V ^ M A T , p a r HT1.671.411.420.970.971.03
PS1.641.321.401.061.061.07
V ^ D C , 1 HT1.641.391.420.710.740.85
PS1.601.301.390.770.810.90
V ^ D C , 2 HT1.661.451.520.740.831.01
PS1.621.361.480.790.891.05
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Frank, B.; Monleon, V.J. Comparison of Variance Estimators for Systematic Environmental Sample Surveys: Considerations for Post-Stratified Estimation. Forests 2021, 12, 772. https://doi.org/10.3390/f12060772

AMA Style

Frank B, Monleon VJ. Comparison of Variance Estimators for Systematic Environmental Sample Surveys: Considerations for Post-Stratified Estimation. Forests. 2021; 12(6):772. https://doi.org/10.3390/f12060772

Chicago/Turabian Style

Frank, Bryce, and Vicente J. Monleon. 2021. "Comparison of Variance Estimators for Systematic Environmental Sample Surveys: Considerations for Post-Stratified Estimation" Forests 12, no. 6: 772. https://doi.org/10.3390/f12060772

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop