Comparison of Variance Estimators for Systematic Environmental Sample Surveys: Considerations for Post-Stratiﬁed Estimation

: The estimation of the sampling variance of point estimators under two-dimensional systematic sampling designs remains a challenge, and several alternative variance estimators have been proposed in the past few decades. In this work, we compared six alternative variance estimators under Horvitz-Thompson (HT) and post-stratiﬁcation (PS) point estimation regimes. We subsampled a multitude of species-speciﬁc forest attributes from a large, spatially balanced national forest inventory to compare the variance estimators. A variance estimator that assumes a simple random sampling design exhibited positive relative bias under both HT and PS point estimation regimes ranging between 1.23 to 1.88 and 1.11 to 1.78 for HT and PS, respectively. Alternative estimators reduced this positive bias with relative biases ranging between 1.01 to 1.66 and 0.90 to 1.64 for HT and PS, respectively. The alternative estimators generally obtained improved efﬁciencies under both HT and PS, with relative efﬁciency values ranging between 0.68 to 1.28 and 0.68 to 1.39, respectively. We identiﬁed two estimators as promising alternatives that provide clear improvements over the simple random sampling estimator for a wide variety of attributes and under HT and PS estimation regimes.


Introduction
Frequently, environmental sample surveys utilize systematic or quasi-systematic sampling designs to estimate parameters of interest. The network of permanent sample plots that is part of the United States Forest Inventory and Analysis (FIA) program is one such case [1]. The FIA program implements a complex survey that provides estimates of multiple forest attributes across a wide range of spatial scales. These estimates are used in a variety of applications, including forest health monitoring, reporting on the current status and change of forested environments, and landscape-level forest management planning. Additionally, the FIA program serves a broad base of end-users, including members of the public and federal and state forest management programs, among many others.
The FIA program implements a spatially balanced, quasi-systematic sampling design [1]. While these sampling designs have been shown to provide more precise estimates than simple random sampling designs in many contexts, estimating the precision of these estimators remains a challenge. Design-based estimators of the sampling variance are either intractable, as in the case of strictly systematic designs ( [2], p. 83), or very unstable due to their very small pairwise inclusion probabilities, as in the case of quasi-systematic sampling designs [3]. Typically, variance estimators that assume the data were collected with a simple random sampling design are employed ( [4], p. 54), obscuring important efficiency gains resulting from these designs. This practice is often justified because it is a conservative estimator, or because it would be appropriate if the population units were in random order ( [1], pp. [25][26]. However, many studies have demonstrated that such estimators may significantly overestimate the true variance [5,6]. Biased estimators of sampling variance can result in a misallocation of resources, as a larger sample size than is necessary to meet some minimum desired precision may be suggested using an inflated variance estimator. Confidence intervals produced using these estimators will have coverage probabilities that likely exceed the nominal rate. Many studies have examined variance estimation for systematic sampling in the context of environmental sample surveys. These studies typically address the problem of variance estimation under the Horvitz-Thompson (HT) point estimator. Generally, these studies conclude that a variance estimator that assumes simple random sampling, i.e., the "standard" variance estimator, will tend to over-estimate the sampling variance, while proposing or comparing alternative estimators that, to some degree, have less positive bias than the standard estimator [5][6][7][8]. As a result of these studies, several alternative estimators have arisen as promising reliable replacements for the standard estimator.
However, many environmental sample surveys rely on a post-stratification (PS) point estimator instead of the HT estimator. PS estimators can provide valuable increases in efficiency with respect to HT estimation [9], but they may also affect the relative performance of the standard estimator of the variance. The assumption that the population units are in random order, which is often cited to justify the standard estimator of the variance ( [1], pp. 25-26; [10], p. 212), does not hold if there is a global trend in the population. In this situation, the performance of the standard estimator of the variance degrades rapidly, with the overestimation of the design variance increasing as the strength of the trend increases [6]. In fact, the sampling variability in a systematic design arises from the variability in local neighborhoods, an observation that has led to the development of several alternative estimators [3,[11][12][13]. Post-stratification, which allows for different means in each stratum, may reduce the effect of a global trend in the standard estimator of the variance and improve its performance. However, to our knowledge, very little attention has been paid to the assessment of systematic variance estimators under PS estimation.
The task of variance estimation expands in complexity when analysts are faced with a "generic inference" survey. Generic inference surveys, such as the FIA, usually rely on a single post-stratification to estimate many attributes. The post-stratification is constructed to be most effective for a small set of variables, typically total volume or biomass. Thus, it may not be optimal for most variables and, as it may not remove the trend very effectively, it may not reduce the design bias of the standard estimator of the variance. Therefore, alternative estimators of the variance should be stable for a wide array of conditions, i.e., in situations where substantial trend, spatial correlation in the residuals, or both, exist. In support of this objective, we used a spatially balanced sample from the FIA program that provides a large swath of potential spatial domains and attributes with varying trends and spatial correlation structures upon which we compare and recommend potential alternative estimators.
The objective of this study was to examine six systematic variance estimators proposed in the sampling literature using a spatially balanced forest survey sample from the state of Oregon, USA. In this paper, we consider variance estimation for the PS and HT point estimators for a wide variety of forest attributes attained from a spatially balanced sample from the FIA program. We consider two specific cases. First, the estimators will be examined in a large domain setting, considering the mean basal areas, densities and volumes of 18 different tree species as attributes of interest. Second, the estimators will be examined across a number of smaller spatial domains within this large domain for the same set of forest attributes. This will provide an assessment of the proposed variance estimators in a generic inference setting, and recommendations for alternative estimators will follow from these results.

Study Area
The study area is the state of Oregon, located in the northwestern United States. Oregon contains a multitude of land cover types, including densely populated urban areas, agricultural land, forest land, deserts, and alpine areas. A recent study indicated that 48.4% of the Oregon land base is forested, of which 79.8% is timberland, i.e., capable of producing at least 1.4 m 3 per hectare per year of industrial wood, and not withdrawn from utilization for timber production by regulation [14]. The remaining share are reserved forests, such as wilderness areas, or land incapable of producing least 1.4 m 3 per hectare per year. Oregon has a diverse set of tree species, with the areas west of the Cascade mountain range dominated by Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco), western hemlock (Tsuga heterophylla (Raf.) Sarg), and Sitka spruce (Picea stichensis (Bong.) Carrière). Areas east of the Cascade Range dominated by ponderosa pine (Pinus ponderosa Lawson & C. Lawson), lodgepole pine (Pinus contorta Douglas ex Loudon), and western juniper (Juniperus occidentalis Hook. var. occidentalis). While coniferous species such as those stated are the most common, a number of deciduous species are present as well, including red alder (Alnus rubra (Bong.)), bigleaf maple (Acer macrophyllum Pursh), and quaking aspen (Populus tremuloides Michx.), among many others.
We used a set of 10,537 inventory plots collected by the FIA program between 2009 and 2018. The FIA in Oregon uses a 10-year panel rotation, and this set represents all ten panels, depicted in Figure 1. Each plot is a cluster of four fixed-radius, nested subplots. At each subplot, trees between 12.70 cm and a "breakpoint" diameter are measured on a circular plot of radius 7.32 m and trees larger than the breakpoint diameter are measured on a circular plot of radius 17.95 m. The breakpoint diameter is either 60.96 cm or 76.20 cm depending on whether the plot is east or west of the Cascade Range crest, respectively. For each tree, diameter at breast height (DBH), total height, and species are recorded, among other attributes. Cubic volumes for each tree, including top and stump, are predicted using the National Volume Estimator Library (NVEL) [15]. estimators in a generic inference setting, and recommendations for alternative estimators will follow from these results.

Study Area
The study area is the state of Oregon, located in the northwestern United States. Oregon contains a multitude of land cover types, including densely populated urban areas, agricultural land, forest land, deserts, and alpine areas. A recent study indicated that 48.4% of the Oregon land base is forested, of which 79.8% is timberland, i.e., capable of producing at least 1.4 m 3 per hectare per year of industrial wood, and not withdrawn from utilization for timber production by regulation [14]. The remaining share are reserved forests, such as wilderness areas, or land incapable of producing least 1.4 m 3 per hectare per year. Oregon has a diverse set of tree species, with the areas west of the Cascade mountain range dominated by Douglas-fir (Pseudotsuga menziesii (Mirb.) Franco), western hemlock (Tsuga heterophylla (Raf.) Sarg), and Sitka spruce (Picea stichensis (Bong.) Carrière). Areas east of the Cascade Range dominated by ponderosa pine (Pinus ponderosa Lawson & C. Lawson), lodgepole pine (Pinus contorta Douglas ex Loudon), and western juniper (Juniperus occidentalis Hook. var. occidentalis). While coniferous species such as those stated are the most common, a number of deciduous species are present as well, including red alder (Alnus rubra (Bong.)), bigleaf maple (Acer macrophyllum Pursh), and quaking aspen (Populus tremuloides Michx.), among many others.
We used a set of 10,537 inventory plots collected by the FIA program between 2009 and 2018. The FIA in Oregon uses a 10-year panel rotation, and this set represents all ten panels, depicted in Figure 1. Each plot is a cluster of four fixed-radius, nested subplots. At each subplot, trees between 12.70 cm and a "breakpoint" diameter are measured on a circular plot of radius 7.32 m and trees larger than the breakpoint diameter are measured on a circular plot of radius 17.95 m. The breakpoint diameter is either 60.96 cm or 76.20 cm depending on whether the plot is east or west of the Cascade Range crest, respectively. For each tree, diameter at breast height (DBH), total height, and species are recorded, among other attributes. Cubic volumes for each tree, including top and stump, are predicted using the National Volume Estimator Library (NVEL) [15].

Sampling Design, Forest Attributes and Domains
The network of field plot locations for the FIA in Oregon used in this study arise from a random tessellated sampling design [1]. Plot centers were located using a compact hexagonal tessellation with hexagons that are approximately 2184 hectares in size, such that one field plot exists in each hexagon. The plot centers were located using a random azimuth and distance from the hexagon center. In some cases, plots from previous inventory systems were already present, and the locations of those plots were assigned to the hexagon rather than allocating a new field plot. For each field plot we calculated the stem volume (VOL, m 3 ha −1 ), basal area (BA, m 2 ha −1 ), and stem density (DEN, stems ha −1 ) for all species combined and for the species that were in the 40th percentile of the total volume in Oregon or greater.
Given this set of 10,537 field plots and their corresponding attributes, we created a population upon which variance estimators could be assessed by assigning each field plot to a position in a discrete two-dimensional space. This discrete space consists of a set of N = 10, 537 positions upon which systematic sampling can be conducted. Population means of each species-specific attribute considered are given in Table 1, along with a proportion representing the number of times the species appeared in a plot divided by N. Table 1. Population means of the 18 species-specific attributes considered in this study. The proportion of times the species appeared in the N = 10, 537 field plots is given in the p 0 column. To subsample this set of N positions, we leveraged the hexagonal tessellation by assigning each field plot to its coincident hexagon in the original sample. The centroids of this set of N hexagons, with forest attributes calculated from the field plots, represents the population upon which systematic sampling is conducted. To retain a hexagonal structure of any systematic sample produced from this population, every ath population unit along the horizontal, diagonal, and vertical axes of the hexagonal grid were included in the sample. Let t hi = (t hi1 , t hi2 ) represent the row vector of discrete spatial coordinates of the ith plot in the hth post-stratum. Figure 2 displays one possible systematic sample from a hypothetical population in the discrete space with the discrete vectors t hi displayed for one systematic sample. Repeating this procedure for every possible systematic sample provides K = a 2 samples. We used a sampling interval of a = 5 for the following assessments. For the remainder of the analysis, the discrete positions t hi were treated as the true positions of the field plots.
of spatial correlation between the observations in the sample as well as large-scale trend in a given attribute. By spreading the sample by a distance corresponding to the interval , the spatial correlation will likely decrease, and the observations may become independent. Therefore, the advantage of alternative estimators may not be apparent. However, the density of the sample does not affect trends in the population, and even a less dense sample would allow evaluation of the effect of trends and post-stratification on the performance of the estimators. The FIA program provides standard reports on many forest attributes for a wide range of domains. Domains are defined as subpopulations of trees that meet certain criteria that serve to define parameters of interest. Examples of domains include spatial domains such as counties, and species domains, such as the set of trees of a particular species. To assess the performance of the estimators when subpopulations are of interest, we defined several domains over which we conducted the analysis: a set of two spatial domains and a set of eighteen species domains. The first spatial domain is the area within the Oregon state boundary, which we will refer to as the "large domain". In addition to this we create domains from the set of Oregon counties which we refer to as "small domains". The performance of the standard estimator of the variance is affected by the degree of spatial correlation between the observations in the sample as well as large-scale trend in a given attribute. By spreading the sample by a distance corresponding to the interval a, the spatial correlation will likely decrease, and the observations may become independent. Therefore, the advantage of alternative estimators may not be apparent. However, the density of the sample does not affect trends in the population, and even a less dense sample would allow evaluation of the effect of trends and post-stratification on the performance of the estimators.
The FIA program provides standard reports on many forest attributes for a wide range of domains. Domains are defined as subpopulations of trees that meet certain criteria that serve to define parameters of interest. Examples of domains include spatial domains such as counties, and species domains, such as the set of trees of a particular species. To assess the performance of the estimators when subpopulations are of interest, we defined several domains over which we conducted the analysis: a set of two spatial domains and a set of eighteen species domains. The first spatial domain is the area within the Oregon state boundary, which we will refer to as the "large domain". In addition to this we create domains from the set of Oregon counties which we refer to as "small domains". Two adjacent counties (Sherman and Gilliam, east of small domain 65, Figure 1) were removed from consideration as a small domain because no trees were present on any plot. Note that the plots in these counties were still included in the large domain.
The FIA program uses post-strata to enhance the precision of point estimates. FIA uses a large number of post-strata that contain information about forest structure, property ownership, land use and other variables. We used a coarser set of post-strata that mimic the forest structure covariates used operationally by the FIA in Oregon for the purposes of this study. First, we used a forest-non-forest mask produced for 2016 provided by the United States Forest Service Geospatial Technology and Applications Center using the methodology described in [16]. Second, we used three canopy cover classes, corresponding to 0-33%, 33-66% and 66-100% canopy cover derived from a canopy cover map published by the Multi-Resolution Land Characteristics Consortium (http://www.mrlc.gov/ (accessed on 17 August 2020)). Finally, we used an east-west delineation of Oregon derived from the level three ecoregions used by the Environmental Protection Agency (EPA) [17]. The combinations of the forest-non-forest classes, canopy cover classes, and east-west classes produce the twelve post-strata used in this study and are depicted in Figure 1.

Point Estimators
Observations of plot-level species-specific attributes were calculated by computing a weighted sum of trees that meet the criteria for a given domain. These weights arise from the different sizes used in the nested plot design ( [4], p. 53). Let z(t hi ) represent the observation of the ith plot-level forest attribute in the hth post-strata at the two-dimensional position vector t hi . Let S k refer to the set of index pairs h and i that correspond to the hth stratum and ith field plot included in the kth sample, i.e., each element of S k is an index pair. We obtain an unbiased estimate of the population mean via the post-stratification estimator under an equal inclusion probability design: where z hk is the sample mean of the observations in the hth post-stratum, W h = N h N is the weight of the hth post-stratum where N h is the number of population elements in the hth post-stratum and n k is the number of elements in the kth sample. This differs from actual practices in the FIA, where W h is determined from wall-to-wall covariate rasters of fine resolution, rather than field plots [4]. The second summation is over all stratum and plot index pairs in the sample. The error termê(t hi ) = z(t hi ) − z hk explains the deviation of z(t hi ) from the estimate of the corresponding post-stratum-level mean. Using the above estimator, we can obtain the Horvitz-Thompson estimator for equal inclusion probability designs by setting H = 1 as a special case. We obtain: where z k is the sample mean of the n k in the sample. Note that the second term is equal to zero and we obtain the sample mean as the HT estimator under equal inclusion probabilities. For brevity, we reserve the notationμ k to refer to either the HT or PS estimator.

Variance Estimators
The sampling variance of the PS point estimator under simple random sampling designs is typically approximated using a first order Taylor expansion ( [2], p. 235). We obtain: where e(t hi ) = z(t hi ) − µ h is the error between z(t hi ) and its corresponding stratum mean µ h and f pc k = N−n k N is the finite population correction factor for the kth sample. Note that, for the HT estimator we obtain e(t hi ) = z(t hi ) − µ. Many variance estimators for PS or, more generally, model-assisted estimators consider only this residual term (e.g., [18]). The standard estimator of the systematic variance that assumes the sample was collected with simple random sampling is one such estimator. We obtain the estimator: In the case of systematic sampling designs, pairwise inclusion probabilities for many population units will be zero, precluding the possibility of unbiased estimation of AV(μ PS ). Under HT estimation, many alternatives have been proposed which typically involve squared differences between a local mean of neighboring points and the observation. To make these estimators consistent with PS estimation, we plug in the residualê(t hi ) to alternative estimators presented for HT variance estimation, which is an approach taken in several other studies [19,20].
Stevens and Olsen [3] proposed a neighborhood-based estimator for the use in the generalized random tessellated stratified (GRTS) sampling design that can be used in systematic designs. The estimator computes a weighted sum of squared errors over the neighborhood D SO,h (t hi ). The neighborhood is developed iteratively, first by including t hi itself plus the u nearest neighbors according to Euclidean distance from the element at t hi . Then any other points, t hj , that have t hi as a neighbor themselves are added to the neighborhood. Noting that the first-order inclusion probabilities for all units are a −2 , where a is the systematic sampling interval, we obtain the estimator: where w hi,h i is a plot-specific weight that decreases as the distance between t hi and t h i increases andê D SO,u (t hi ) = ∑ w hi,h i ê(t h i ). This distance weighting effect is similar to kriging, where residual pairs that are close together will be weighted more than pairs of residuals further away when computing the neighborhood-level variance. These weights are optimized by ensuring the sum ofê D SO,u (t hi ) over all plots adds up to the total error (refer to [3] for further information on this optimization procedure). We consider the special case where u = 12. Implementations forV SO were provided by the R package spsurvey [21]. Matérn ([11], pp. 120-122) describes an estimator for two-dimensional systematic sampling designs that relies on computing contrasts of units within non-overlapping neighborhoods as local variance estimates and aggregating these variance estimates over all neighborhoods to produce a final estimate. We considered two neighborhood configurations. First, we consider: which represents the nearest neighbors with respect to t hi and: which represents a parallelogram constructed from the neighbors of t hi and itself. This second configuration closely mimics the original Matérn estimator, which was developed for the case of rectangular grids. Both neighborhood specifications are displayed in Figure 2.
For each neighborhood, a contrast of the mean-centered residuals of the elements in the neighborhood is computed.
T , {h i } ∈ D MAT,c (t hi ) represent the vector of mean-centered residuals for the lth neighborhood using configuration c of the Matérn estimator (Table 2). Points in the neighborhood that fall outside of the study space are given a value of zero. For each neighborhood calculate the value: where n l indicates the number of sampled population units in the neighborhood and c l is a vector containing elements that are either −1 or 1. We use the vectors (1, 1, 1, −1, −1, −1) T and (1, 1, −1 − 1) T for the hexagonal and parallelogram neighborhood structures, respectively. The positive and negative elements of c l create two groups which each make a prediction for the mean residual in the neighborhood, and Equation (8) calculates a squared difference between these two predictions. This squared difference creates the basis for a neighborhood-level variance estimate. Using these values, we obtain the Matérn variance estimator:V where q par = 4 and q hex = 7 for the parallelogram and hexagonal neighborhood structures, respectively.
, 2}-neighborhood order [12] Adjustment-based estimators rely on applying an adjustment factor toV SRS that approximates the inter-cluster correlation coefficient. This approach is motivated by the fact that variance estimates usingV SRS in the case of systematic sampling will tend to have a larger positive bias in situations where the inter-cluster correlation, i.e., the correlation between pairs of possible systematic samples, is strongly positive or negative. One such estimator was proposed by D'Orazio [12] which relies on the Geary's C index. We consider two neighborhood structures. The first structure considers the six closest points to an element at t hi to be neighbors. This is equivalent to the D MAT,hex neighborhood. We also consider second order neighbors, i.e., those elements that are first order neighbors, as well as their first order neighbors. Noting this we obtain the expression for Geary's C: where δ hi,h i is an indicator variable that takes the value one if the element at t hi is considered a neighbor of the element at t h i . We obtain the D'Orazio's Geary's C estimator: The first condition of (11) ensures that Geary's C can be calculated. If not,V SRS is used without adjustment. For the purposes of reporting results, we calculate Geary's C using Equation (10) with all N field plots using the residuals obtained from the HT estimator. We refer to this quantity as C for the remainder of the text.

Performance Measures
Define a systematic sampling interval a such that every ath element along each axis is sampled. Sampling in this fashion for a hexagonal tessellation implies that there are a 2 = K Forests 2021, 12, 772 9 of 20 possible subsamples. Furthermore, all samples produced will retain a hexagonal structure ( Figure 2). Using all subsamples, we obtain the actual variance of the estimator, which we will refer to as the "true variance" for brevity: where the superscript (p) is used to refer to a domain-attribute combination indexed by p.
Although secondary to the objective of this study, comparing the performances of the point estimators HT and PS can be done using the relative efficiency of the true variances: We relied on two general categories of performance measures to assess the behavior of the proposed variance estimators. The first category assesses the performance of an estimator for a particular attribute in a particular domain. LetV SYS . We obtain the mean error of the ratio of the variance estimate to the systematic variance: which quantifies the mean deviation of the ratio of the variance estimate from one. Ratio V (p) k is equivalent to the design expectation of the quantityV . Equation (14) can be considered a relative measure of bias for a variance estimate for a particular domain-attribute combination. In addition to (14) we also produce the relative efficiencies of the variance estimators themselves with respect to the simple random sampling estimator,V SRS . In many environmental sampling surveysV SRS is considered the standard choice for a variance estimator under this design and thus assessing the alternatives with respect toV SRS can provide valuable information on potential efficiency gains. We obtain: where g * represents any estimator that is not the simple random sampling estimator and is the sum of squared errors for a given estimator over all K subsamples. The second category of performance measures consider the performance of an estimator across several domain-attribute combinations. We consider the mean of the mean ratio across all p domain-attribute combinations: while Ratio is a measure of relative bias, Ratio is simply an aggregation of these measures across several attributes and is not strictly a measure of bias itself. Instead, Ratio can be interpreted as a performance index over many estimates of the true variance for a particular estimator. Ratio values close to one indicate a tendency for an estimator to over-estimate the true variance across many domain-attribute combinations, and vice versa. In a similar manner, we consider the mean of the relative efficiencies across all P domain-attribute combinations and obtain: which is the mean relative efficiency for a given estimator over all P domain-attribute combinations.

Comparison of Point Estimators
We estimated three attributes (VOL, BA, and DEN) for each species considered in this analysis using the PS estimator and HT estimator. The relative efficiencies of these point estimators for each species-specific attribute are shown in Figure 3. For most speciesspecific attributes, the use of the post-stratification estimator provides a lower variance (i.e., a relative efficiency less than one) than the Horvitz-Thompson estimator, with relative efficiencies as low as 0.75 for the VOL attribute of Pseudotsuga menziesii. In some cases, the post-stratified estimator was not as efficient, e.g., for Abies grandis and Pinus contorta, among others, with relative efficiencies as high as 1.4. This suggests that the post-strata correlate well with species-specific attributes for some species, but not others. Note that species with larger relative efficiencies tended to have a much lower prevalence (Table 1). Additionally, we used a much coarser set of post-strata than is used in official reporting by the FIA, and relative efficiencies using this larger set of post-strata may be lower than what we obtained.    (1) and (2) for the same species-specific attributes, as well as different models for the trend. We summarize changes in this structure using the Geary's C correlation coefficient, C, in Figure 4. For all species-specific attributes we considered, C increased when going from the HT to the PS estimator. The effect is most extreme for Pseudotsuga menziesii and Pinus ponderosa attributes. A large change in C implies that the post-stratification variables explain either the linear trend or the spatial correlation structure of the residuals more than without (i.e., under HT estimation). Even under the PS estimator, however, C remains large, implying that some spatial dependence is present in the residual, or that the post-stratification variables do not completely accommodate the trend, or both. Large increases in C are present for Pseudotsuga menziesii, Pinus ponderosa, and Alnus rubra for all three attributes. Other species exhibited only moderate increases.   Geary's C of the residuals attained from the post-stratified estimator and the Horvitz-Thompson estimator for each species-specific attribute. Geary's C is computed using all available field plots. Species are ordered by their relative efficiencies for the VOL attribute.

Large Spatial Domain Assessment
For the large domain assessment, we produced variance estimators and their performance measures considering species-specific attributes. Figure 5 displays the Ratio for species-specific attributes for all variance estimators under HT and PS point estimation. For V SRS under the HT point estimator the degree of positive bias for attributes with small C values (i.e., stronger spatial dependence) is large. As this spatial dependence weakens, the degree of bias becomes less apparent. Under the PS point estimator, the range of C values decreases (Section 3.1) and the positive bias ofV SRS is reduced to some degree. For the alternative estimators under the HT estimator the positive bias observed forV SRS for small C attributes is clearly reduced, producing ratio estimates very close to one. For attributes with C values exceeding 0.7 this reduction is not as clear.
For the large domain assessment, we produced variance estimators and their performance measures considering species-specific attributes. Figure 5 displays the ̅̅̅̅̅̅̅̅ for species-specific attributes for all variance estimators under HT and PS point estimation. For ̂ under the HT point estimator the degree of positive bias for attributes with small values (i.e., stronger spatial dependence) is large. As this spatial dependence weakens, the degree of bias becomes less apparent. Under the PS point estimator, the range of values decreases (Section 3.1) and the positive bias of ̂ is reduced to some degree. For the alternative estimators under the HT estimator the positive bias observed for ̂ for small attributes is clearly reduced, producing ratio estimates very close to one. For attributes with values exceeding 0.7 this reduction is not as clear.     Table 3 summarizes the estimators across the P domain-attribute combinations. Under HT estimation theV SRS tended to demonstrate positive bias, with a Ratio value of 1.23. The positive bias for the alternative estimators under HT estimation is less, with Ratio values ranging between 0.96 to 1.12. The relative efficiencies for the alternative estimators were less than one except forV MAT,par , which suggests that these alternatives provide gains in efficiency under the HT estimator. Under the PS point estimator, theV SRS estimator demonstrates positive bias with a Ratio value of 1.11. The alternative estimators had smaller Ratio values, ranging between 0.90 to 1.08. The mean relative efficiencies under the PS point estimators were less than one with the exception ofV MAT,par andV MAT,hex , although these relative efficiencies are slightly higher than the analogous estimators under the HT point estimator.  Table 3 summarizes the estimators across the domain-attribute combinations. Under HT estimation the ̂ tended to demonstrate positive bias, with a ̿̿̿̿̿̿̿̿ value of 1.23. The positive bias for the alternative estimators under HT estimation is less, with ̿̿̿̿̿̿̿̿ values ranging between 0.96 to 1.12. The relative efficiencies for the alternative estimators were less than one except for ̂, , which suggests that these alternatives provide gains in efficiency under the HT estimator. Under the PS point estimator, the ̂ estimator demonstrates positive bias with a ̿̿̿̿̿̿̿̿ value of 1.11. The alternative estimators had smaller ̿̿̿̿̿̿̿̿ values, ranging between 0.90 to 1.08. The mean relative efficiencies under the PS point estimators were less than one with the exception of ̂, and ̂, ℎ , although these relative efficiencies are slightly higher than the analogous estimators under the HT point estimator.

Small Spatial Domain Assessment
In addition to the large spatial domain, we conducted an assessment for the speciesspecific attributes for the small domains in Oregon that correspond, in most cases, to existing county boundaries (see Section 2.2). Such an assessment provides a range of different, albeit smaller, populations, each with P domain-attribute combinations exhibiting a range of spatial structures. If a species-specific attribute had an estimate of zero for any of the K subsamples within a domain, then the species-specific attribute was discarded for that domain. This was considered appropriate because a desirable property for a variance estimator is to produce a variance of zero if there is no variance between sample observations. However, this property guarantees a given estimator will be negatively biased in the case where the point estimate is zero. For this reason, species-specific attributes within a domain are removed from consideration if any point estimate for that domain was zero among the K subsamples. In total, there are 30 small domains that contain between six to 30 species-specific attributes. For each small domain, the sample size within that domain is a random variable. Let n c represent the mean sample size across the K subsamples within a small domain. We assign each domain to one of three classes, based on its sample size: n c ≤ 10, 10 < n c ≤ 20 and n c > 20. These thresholds represent the 33rd and 66th percentile of the domain-specific sample sizes across study areas.
We produced variance estimates for each of these species-specific attributes in each small domain under the HT and PS estimators (Figure 7). Generally, all estimators exhibited a strong positive bias across the range of C values and across all sample size classes, with a diminishing positive bias as the sample size increased. For most estimators, variance estimation under the post-stratified estimator reduced the bias with notable exceptions for some of the attributes in the smallest sample size classes. In the small sample size classes, two specific attributes are apparent as positive outliers for all estimators that correspond to the VOL and BA attribute in spatial domain 47 (see Figure 1). However, these outliers have Ratio values that are more extreme than theV SRS estimator for theV DC,1 ,V DC,2 and V MAT,par estimators. Generally, the estimators attain lower relative efficiencies as sample size increases ( Figure 8) with notable exceptions occurring for the ̂, estimator. The ̂ estimator is more efficient the ̂ across all of the possible attributes, and ̂, 1 is more efficient for many, but not all, attributes. Only small differences exist between variance estimates attained under HT or PS point estimators. Generally, variance estimates attained under PS point estimation have slightly higher relative efficiencies, suggesting that the PS All three of these alternative estimators rely to some extent on the definition of neighborhoods surrounding given sample positions (Equations (5), (6) and (9)). For a small spatial domain, these estimators will only have access to relatively few neighborhoods to compute the variance, which may lead to the extreme values we observe. In addition, spatial domain 47 appears to have a large perimeter-to-area ratio and is very irregular in shape. In contrast,V SO constructs neighborhoods for each point in the sample space, which provides a larger number of neighborhoods used to estimate the variance, which may resolve some of these issues.
Generally, the estimators attain lower relative efficiencies as sample size increases ( Figure 8) with notable exceptions occurring for theV MAT,par estimator. TheV SO estimator is more efficient theV SRS across all of the possible attributes, andV DC,1 is more efficient for many, but not all, attributes. Only small differences exist between variance estimates attained under HT or PS point estimators. Generally, variance estimates attained under PS point estimation have slightly higher relative efficiencies, suggesting that the PS point estimator removes some of the benefit of using an alternative estimator. The SRS variance estimator has the largest ̿̿̿̿̿̿̿̿ value among all estimators considered across all three sample size classes with the exception of ̂, 2 under the post-stratified estimator ( Table 4). The ̂ estimator under post-stratification has a lower ̿̿̿̿̿̿̿̿ value than under the HT estimator, which is consistent with the large domain assessment in Section 3.2.1. The ̂, ℎ estimator attained the ̿̿̿̿̿̿̿̿ value closest to zero under both the HT and PS point estimators. With the exception of the ̂, estimator, all alternative estimators attained a lower mean relative efficiency, with the ̂ estimator attaining the lowest relative efficiency across all three sample size classes ranging between 0.67 and 0.68 relative efficiency.  The SRS variance estimator has the largest Ratio value among all estimators considered across all three sample size classes with the exception ofV DC,2 under the post-stratified estimator ( Table 4). TheV SRS estimator under post-stratification has a lower Ratio value than under the HT estimator, which is consistent with the large domain assessment in Section 3.2.1. TheV MAT,hex estimator attained the Ratio value closest to zero under both the HT and PS point estimators. With the exception of theV MAT,par estimator, all alternative estimators attained a lower mean relative efficiency, with theV SO estimator attaining the lowest relative efficiency across all three sample size classes ranging between 0.67 and 0.68 relative efficiency.

Variance Estimation under Horvitz-Thompson and Post-Stratification Estimators
We examined performance measures for six variance estimators under HT and PS point estimation for a variety of populations and spatial domains. Considering both HT and PS estimation provides insight into how well the variance estimators can handle populations with varying spatial structure in the residual. A robust variance estimator should be able to handle these diverse situations under both point estimation regimes.
Several studies have examined variance estimation for systematic and similar designs under a Horvitz-Thompson point estimator. Magnussen et al. [6] investigated alternative variance estimators under simulated populations and found improvements for estimators similar toV DC,1 andV MAT,par with Ratio values ranging between 1.03 to 1.37 and 0.98 to 1.19, respectively while attainingV SRS values ranging between 1.50 to 2.88. Similarly, Stevens and Olsen [3] present results usingV SO under a generalized random tessellated stratified design and attained Ratio close to one and confidence interval coverage probabilities very close to the nominal rate using a simulated data set. Our results also indicate improvements using the alternative estimators. For the large spatial domain, we noted a trend of positive bias for theV SRS estimator with decreasing Geary's C (Figures 5 and 7). Specifically, we attained Ratio values of 1.23 for the large domain and values between 1.54 to 1.88 for the small spatial domains. Several alternative estimators provide clear improvements over V SRS under an HT point estimator.V SO andV DC,1 attained Ratio values closer to one and smaller relative efficiencies in both the large spatial domain and small spatial domains. Notably,V SO was the only estimator that attained smaller relative efficiencies for every species-specific attribute we considered across all spatial domains under the HT point estimator (Figures 7 and 8).
When moving from HT to PS point estimation, two things may happen; a reduction in the spatial dependence in the residual and removal of at least some of the large-scale trend. Both of these can reduce the positive bias of theV SRS [6,22]. We found a reduction in C for the residuals (Figure 4), and an increase in relative efficiency of the point estimates for many attributes (Figure 3) when moving from HT to PS, suggesting that the removal of some spatial correlation and some trend is expected for these attributes. However, for many species-specific attributes this change was small. This implies that the positive bias typically observed under theV SRS estimator may remain even under PS point estimation for the attributes and post-strata we considered (Tables 3 and 4), motivating the adoption, or at least consideration, of the alternative estimators we present. While it is possible to reduce this bias with higher quality covariates (e.g., [23]), covariates that correlate well with species-specific attributes may not be available. Note that the FIA uses a more detailed stratification, exceeding 100 different strata for the state of Oregon, and may provide lower relative efficiencies under PS estimation than we observed. For the large spatial domain we attained Ratio close to one under the PS point estimator with values of 0.96 and 1.05 for theV SO andV DC,1 estimators respectively (Table 3). All alternative estimators showed improvements for the small spatial domains across all sample size classes, i.e., they attained RE lower than one and Ratio closer to one than theV SRS estimator ( Table 4).
Some of the estimators appear to have a large number of outlier points, e.g., thê V MAT,hex andV MAT,par estimators ( Figure 7). Closer inspection of the spatial domains where these outliers manifested revealed very sparse populations, i.e., with many observations of attributes that are zero. TheV MAT estimators rely on a much smaller number of neighborhoods to compute a variance estimate relative toV SO and in populations that have widely varying neighborhood-level estimates of variance, this may cause the observed erratic behavior. In addition, moving from the par to hex neighborhood structure for thê V MAT estimator, which implies a reduction in the total number of neighborhoods, appears to exacerbate the issue.V SO , although neighborhood-based, does not rely on disjoint groups of elements to form neighborhoods. Rather, every sample point in the population has its own neighborhood such that different neighborhoods possibly share members, resulting in a larger number of neighborhoods available to estimate the variance.

Implications for Environmental Sample Surveys
For generic inference surveys, post-stratification covariates are typically applied uniformly to all attributes of interest, since they are often optimized using only one or a few response variables in mind (e.g., VOL). Following this, an ideal variance estimator under PS will demonstrate improvements in cases where spatial dependence is strong or weak, or whether the post-stratification succeeds in removing the trend. These improvements should manifest in reductions of bias, captured by our Ratio measure, and reductions in mean squared error relative to an alternative, captured by our RE measure. Our results indicated thatV SO ,V DC,1 , andV MAT,par demonstrated reliable reductions with respect to these measures under both HT and PS point estimation regimes and are leading candidates for robust alternative variance estimators. In particular,V SO attained lower relative efficiencies for every species-specific attribute we considered, except for five attributes under PS point estimation in the large domain ( Figure 6).
Treating field plots as a population likely reduces the strength of spatial correlation between population units relative to the actual population considered in the FIA and other forest inventories. As well, systematic subsamples collected from this discontinuous population contain even less spatial correlation, as field plots selected in the sample are separated by a large distance (see Figure A1 (Appendix A)). On the other hand, subsampling of field plots does not remove potential large-scale trends in the attributes, which alone can affect the performance of variance estimators [6,22]. This begs the question; how will the variance estimators behave in actual applications? We demonstrate that several alternative variance estimators exhibit improvements of the standard estimator for a wide array of attributes and domains in this spatially discontinuous population. To the extent that population attributes are similar in trend and spatial correlation, similar reductions in bias and relative efficiency can be expected using these alternatives, especially when considering our study among the broader literature on the topic of systematic variance estimation that utilizes a simulation approach [3,6]. Selecting an alternative variance estimator for real applications requires a degree of risk, since obtaining census populations required for a formal assessment of bias and relative efficiency is an impossible task. However, one can mitigate this risk by considering alternatives that demonstrate utility under a broad of conditions and studies.
Our assessment was conducted using a strictly systematic sampling design from a finite population. However, many environmental sampling surveys employ other designs, such as quasi-systematic sampling designs, utilize estimators derived for continuous populations (e.g., [24]), or both. For quasi-systematic sampling designs, similar positive bias issues with respect to variance estimation are apparent [3]. WhileV SO provides clear routes forward for these designs, all other estimators we considered except forV SRS rely on regular geometries that exist between sample positions from strictly systematic designs. Furthermore, we considered only equi-probable systematic sampling, which need not be the case for systematic designs (e.g., [25], pp. 332-335). Further research could examine modifications to the proposed estimators to consider quasi-systematic, varying probability designs or designs that combine both concepts (e.g., [26]) while considering more complex point estimation regimes such as generalized regression estimation and post-stratification.
Finally, an alternative variance estimator must also be computationally feasible to implement for many potential attributes. While theV SO estimator provides reliable improvements toV SRS we found that there is a higher computational demand using this estimator relative to the alternatives. The estimator, as implemented in spsurvey, requires a singular value decomposition of a matrix of dimension nxn to optimize the weights in (4) which has a computational complexity of O n 3 in this case [27]. For example, producing variance estimates for samples with large sizes may be infeasible in some cases. However, the reliable performance of this estimator, and its flexibility for other types of sampling designs, including unequal probability designs, suggests that the optimization or modification for very large sample sizes is a research priority. All other alternative estimators required a much lower computational burden, and we expect their operational implementations for large sample sizes to be more straightforward.

Conclusions
We assessed six variance estimators for systematic sampling designs under HT and PS point estimation using a large spatially balanced environmental survey as a population. In general, all alternative estimators demonstrated improvements over an estimator that assumes simple random sampling, i.e., the "standard" estimator. An estimator presented by Stevens and Olsen [3] provided the clearest improvements, with mean relative bias for a large spatial domain of 1.01 and 0.96 for HT and PS, respectively. Relative efficiencies were smaller for this estimator in the large spatial domain, with values ranging between 0.65 and 0.82 for HT and PS, respectively. For small spatial domains, this estimator was the only alternative to attain relative efficiencies less than one for all attributes we considered. We found that the computational burden of this estimator is a potential hurdle for samples of large size, and have identified an estimator presented by D'Orazio [12] as an additional promising candidate that requires a lower computational effort.  Appendix A Figure A1. Changes between Geary's C calculated with all field plots and the mean Geary's C calculated using all subsamples collected from a systematic sample of interval = 5. Note that in some cases, the Geary's C between attributes and HT and PS residuals can be very similar, resulting in overlapping lines in the figure. Figure A1. Changes between Geary's C calculated with all field plots and the mean Geary's C calculated using all subsamples collected from a systematic sample of interval a = 5. Note that in some cases, the Geary's C between attributes and HT and PS residuals can be very similar, resulting in overlapping lines in the figure.