A Regularized Raking Estimator for Small Area Mapping from Forest Inventory Surveys

Abstract: We propose a new estimator for creating expansion factors for survey plots in the USDA Forest Inventory and Analysis program. This estimator was previously used in the GIS literature, where it was called Penalized Maximum Entropy Dasymetric Modeling. We show here that the method is a regularized version of the raking estimator widely used in sample surveys. The regularized raking method differs from other predictive modeling methods for integrating survey and ancillary data in that it produces a single set of expansion factors that can have general purpose use to produce small area estimates and wall-to-wall maps of any plot characteristic. This method also differs from other more widely used survey techniques, such as GREG estimation, in that it is guaranteed to produce positive expansion factors. We extend the previous method here to include cross-validation, and provide a comparison of expansion factors between regularized raking and ridge GREG survey calibration.

reasonable results when applied to other variables; thus greater specificity may come at a loss of generality. Second, these predictive models often produce estimates that are inconsistent across spatial scales or with "official" published estimates. For example, an empirical model that produces a wall-to-wall map of land use or basal area may not add up to state-level official estimates of land use or basal area. Such inconsistencies can create obstacles for the use of such estimates in official business.

In the Eastern United States, the USDA FIA survey uses a permanent network of plots that are visited on a rotating five-year cycle. These plots have been selected through a spatially stratified random design with a density of approximately 1 plot per 2400 ha (6000 acres). This sampling density certainly does not permit detailed resource mapping, and for most of the United States, does not even permit reliable direct estimates of county averages. For official reporting, the FIA has created custom survey units, which are agglomerations of counties that typically contain hundreds of forested sample plots.

To illustrate the approximate scale of these sample densities and survey units, Figure 1 shows a map of the FIA plot locations in South Carolina, as well as county and survey unit boundaries (the locations are adjusted by FIA to preserve the confidentiality of landowners). The entire state of South Carolina

Most survey estimates of population totals can be reduced to linear estimators: $\hat{\tau}_k = \sum_{i \in s} w_i y_{ik}$, where $w_i$ are survey weights or expansion factors. Expansion factors earn their name from the fact that they can be understood as the number of elements in the population that are represented by a sample unit. For example, expansion factors in the FIA data represent the number of acres each plot represents in the entire sample population.
For small area estimation, we will require expansion factors $w_{it}$ that allocate sample $i$ to target patch $t$.
If a survey is conducted with known sampling probabilities, then a simple, design-unbiased estimator for a population total is the "Horvitz-Thompson" (HT) estimator [19,20]:

$$\hat{\tau}^{HT}_j = \sum_{i \in s} d_i y_{ij} = \mathbf{d}^\top \mathbf{y}_j,$$

where the weights $d_i$ are called design weights because they are related to the sample design through the inverse sampling probability, i.e. $d_i = 1/\pi_i$. In a simple random sample with equal probability, the expansion factors are simply $N/n$.
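As a concrete illustration, the HT estimator is just a weighted sum. The numbers below are made up for the example and are not FIA data; the 1-plot-per-2400-ha density from above is used only to form equal design weights.

```python
import numpy as np

# Hypothetical sample: a plot measurement y_i and inclusion probability pi_i
# for each of four plots (illustrative values, not FIA records).
y = np.array([120.0, 85.0, 0.0, 43.0])            # e.g., volume per plot
pi = np.full(4, 1.0 / 2400.0)                     # equal-probability design

d = 1.0 / pi                                      # design weights d_i = 1 / pi_i
tau_ht = np.sum(d * y)                            # HT total: sum_i d_i * y_i
print(tau_ht)                                     # 2400 * (120 + 85 + 0 + 43)
```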

At the next level of complexity, survey weights are often modified to account for known ancillary data. For example, the FIA program adjusts the HT weights to account for the land area that is forest, non-forest, and forest-fringe, as derived from the National Land Cover Database [21-23]. Two widely used methods of accounting for ancillary data are raking (also called "Iterative Proportional Fitting", and "poststratification" when there is only one ancillary variable) and Generalized Regression (GREG) estimation. (Here, the ancillary variables $x$ are distinct from the response variables $y$ we would like to estimate or map.) In particular, both methods create weights such that $\sum_{i \in s} w_i x_i = \tau$ for each ancillary total $\tau$. Thus, both raking and GREG assume that ancillary totals are precisely known.

However, this is rarely the case, since most ancillary estimates are statistical predictions or imputations developed from satellite reflectance values and other imperfect spatial data sets. Since most ancillary data have uncertainty, they do not strictly satisfy the condition that ancillary data be known.

Raking is an iterative process that uses the HT estimates to initialize the expansion factors for each survey plot. Then an estimated ancillary total $\hat{\tau} = \sum_{i} w_i x_i$ is calculated, and the expansion factors are all adjusted by the multiplicative factor $\tau/\hat{\tau}$. This guarantees that the known ancillary total $\tau$ is exactly reproduced; the adjustment is cycled over the ancillary totals until the weights converge.

If the HT estimator over- or under-predicts some of the ancillary totals, then we might want to adjust our weights accordingly. The GREG estimator is given by the equation:

$$\hat{\tau}^{GREG}_j = \hat{\tau}^{HT}_j + \left(\tau - \hat{\tau}^{HT}\right)^\top \hat{\beta}_j,$$

where the $\hat{\beta}_j$ are the weighted least squares regression coefficients obtained by regressing the response variable $y_j$ on the ancillary variables, using the design weights $d_i$ on each observation.
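The iterative raking adjustment can be sketched for the simple case where each ancillary variable is a 0/1 indicator (a poststratum membership). The function and data below are illustrative assumptions, not the FIA implementation.

```python
import numpy as np

def rake(d, X, tau, max_iter=100, tol=1e-10):
    """Iterative proportional fitting of survey weights.

    d   : (n,) initial (design) weights
    X   : (n, J) 0/1 membership of each sample unit in margin j
    tau : (J,) known ancillary totals for each margin
    """
    w = d.astype(float).copy()
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(X.shape[1]):
            mask = X[:, j] > 0
            est = w[mask].sum()                # current estimate of total j
            if est > 0:
                factor = tau[j] / est
                w[mask] *= factor              # multiplicative adjustment
                max_change = max(max_change, abs(factor - 1.0))
        if max_change < tol:                   # all margins reproduced
            break
    return w

# Two overlapping margins over four units, uniform starting weights.
d0 = np.full(4, 10.0)
X0 = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
w0 = rake(d0, X0, np.array([30.0, 25.0]))
# both margins are now reproduced: X0.T @ w0 == [30, 25]
```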

It may appear that the GREG estimator depends on the response variable $y_j$, and thus that a different estimator would be needed for each response variable, but it is possible to rewrite the GREG estimator in terms of an expansion factor as:

$$w^{GREG}_i = d_i \left(1 + x_i^\top \lambda\right),$$

where the coefficients $\lambda$ do not depend on the response $y$. Thus, the expansion factors $w^{GREG}$ can be calculated once and stored for general purpose use.
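A minimal sketch of this response-free form, assuming the standard GREG solution in which $\lambda$ solves $(X^\top D X)\lambda = \tau - \hat{\tau}^{HT}$ with $D = \mathrm{diag}(d)$. All data below are invented for illustration.

```python
import numpy as np

def greg_weights(d, X, tau):
    """GREG calibration weights w_i = d_i * (1 + x_i' lambda).

    lambda depends only on d, X, and tau -- not on any response y --
    so the weights can be stored and reused for every response.
    """
    tau_ht = X.T @ d                           # HT estimates of ancillary totals
    A = X.T @ (d[:, None] * X)                 # weighted cross-product X'DX
    lam = np.linalg.solve(A, tau - tau_ht)
    return d * (1.0 + X @ lam)

d = np.array([2.0, 2.0, 2.0, 2.0])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
tau = np.array([10.0, 6.0])
w = greg_weights(d, X, tau)
# the weights exactly reproduce the ancillary totals: X.T @ w == tau
```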

The GREG estimator is not without problems in survey practice. In particular, the GREG estimator has a tendency to produce some negative expansion factors. This is especially common when many ancillary data are used. In these situations, the raking estimator can produce erratic expansion factors, whereas the GREG estimator tends to produce expansion factors that are slightly less erratic, but sometimes negative. Negative survey weights are clearly problematic, as they defy the interpretation of the expansion factor as a count of population elements represented by a sample unit.

Ridge and Lasso are both types of "regularization" estimators. Regularization has proven to be a powerful heuristic tool for prediction problems involving many explanatory variables because it effectively addresses common practical problems caused by the multicollinearity and overfitting that emerge in such situations. Regularization estimators typically trade off a little bias in a prediction in exchange for greatly reduced mean square error.

Not requiring the survey to exactly reproduce the ancillary data is an attractive property, both conceptually and statistically. Conceptually, we ought to recognize that ancillary data do not usually represent a "gold standard" of truth. For example, the FIA weighting process uses estimates of forest, non-forest and forest edge that are derived from satellite imagery. Satellites do not observe forests; they observe spectral reflectances, which we then feed through models that create predictions of tree canopy and land cover. Thus, while we may have confidence that the model is, on the whole, reliable, the output by no means represents the absolute truth. By forcing our survey estimates to match these modeled outputs, we are forcing our survey estimates to reproduce the errors found in these imperfect spatial data layers.

We believe that many of the practical limitations of calibration estimators are amplified by overfitting: only so many auxiliary variables can be used, and if too many noisy auxiliary variables are used then calibration will overfit these data and the final weights will be erratic. In regression modeling, the effects of overfitting are similar to the effects of multicollinearity.

To extend these ideas to raking, we first point to the fact that GREG solves the optimization problem [27]:

$$\min_{w} \sum_{i \in s} \frac{(w_i - d_i)^2}{d_i} \quad \text{subject to} \quad \sum_{i \in s} w_i x_{ij} = \tau_j \text{ for all } j.$$

Thus, GREG can be interpreted as "find new expansion factors $w_i$ that are close to the HT factors but that also satisfy the ancillary constraints." Guggemos et al. [25] show that the ridge GREG estimator can be similarly written as:

$$\min_{w} \sum_{i \in s} \frac{(w_i - d_i)^2}{d_i} + \frac{1}{\gamma} \sum_j \frac{\left(\tau_j - \sum_{i \in s} w_i x_{ij}\right)^2}{\sigma_j^2},$$

where $\sigma_j^2$ reflects the uncertainty of ancillary total $j$ and $\gamma$ controls the strength of the regularization. In contrast to GREG, which exactly reproduces the ancillary totals, the ridge GREG estimator reproduces them only approximately, with the degree of fidelity governed by $\gamma$ and the ancillary variances.

To develop our regularized raking estimator, we first point to results showing that the raking and GREG estimators are similar; Deville and Särndal [28] proposed a class of estimators that calibrate the Horvitz-Thompson design weights to ancillary data. These estimators are:

$$\min_{w} \sum_{i \in s} G(w_i, d_i) \quad \text{subject to} \quad \sum_{i \in s} w_i x_{ij} = \tau_j,$$

where $G$ is a distance function; the choice $G(w, d) = (w - d)^2 / d$ yields GREG, while $G(w, d) = w \log(w/d) - w + d$ yields raking.
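Under a quadratic penalized formulation of ridge GREG, the weights retain a closed form. This sketch assumes $\lambda$ solves $(X^\top D X + \gamma S)\lambda = \tau - \hat{\tau}^{HT}$ with $S = \mathrm{diag}(\sigma^2)$, which follows from the quadratic objective stated here; the function name and test data are ours, not the paper's.

```python
import numpy as np

def ridge_greg_weights(d, X, tau, sigma2, gamma):
    """Ridge GREG: calibration softened by ancillary variances sigma2.

    gamma = 0 recovers exact GREG calibration; large gamma shrinks the
    weights back toward the design weights d.
    """
    tau_ht = X.T @ d
    A = X.T @ (d[:, None] * X) + gamma * np.diag(sigma2)
    lam = np.linalg.solve(A, tau - tau_ht)
    return d * (1.0 + X @ lam)
```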

By analogy to the regularized GREG estimator, we propose a regularized raking (or regularized entropy) estimator. Let $\tau_{jk}$ be an ancillary total for spatial region $J_j$ and characteristic $k$. The estimator we use is

$$\min_{w > 0} \sum_{i,t} \left[ w_{it} \log\frac{w_{it}}{d_{it}} - w_{it} + d_{it} \right] + \frac{1}{2\gamma} \sum_{j,k} \frac{\left(\tau_{jk} - \sum_{i \in s} \sum_{t \in J_j} w_{it} x_{ik}\right)^2}{\sigma_{jk}^2}.$$

The output of this estimator is the set of strictly positive expansion factors $w_{it}$. In our case, the variances $\sigma_{jk}^2$ are derived from the standard errors of the ancillary predictions, which allows the estimator to distribute the accuracy in proportion to the quality of the ancillary data. The estimator will try to calibrate the weights to the ancillary data exactly, but if there are conflicts between the ancillary data, then the weights will lean toward more closely reproducing the more precise ancillary data. The prior weights $d_{it}$ spread each plot's design weight uniformly across patches, because, in the absence of any ancillary information, the best prior estimate is that each patch is identical.

Other prior weights may be possible, for example by weighting nearby samples more, but this is not explored in this article.
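A small numerical sketch of this optimization, using a generic bound-constrained solver (SciPy's L-BFGS-B) rather than the authors' actual routines; the function, the flattening of $w_{it}$ into one vector, and the tolerances are our illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def regularized_rake(d, X, tau, sigma2, gamma):
    """Minimize sum_i [w_i log(w_i/d_i) - w_i + d_i]
             + (1/(2*gamma)) * sum_j (tau_j - sum_i w_i x_ij)^2 / sigma2_j
    over strictly positive weights w (the regularized raking objective,
    with the w_it array flattened into a single vector w)."""
    d = d.astype(float)

    def objective(w):
        e = tau - X.T @ w                                  # calibration errors
        val = (np.sum(w * np.log(w / d) - w + d)
               + np.sum(e ** 2 / sigma2) / (2.0 * gamma))
        grad = np.log(w / d) - (X @ (e / sigma2)) / gamma  # exact gradient
        return val, grad

    res = minimize(objective, x0=d, jac=True, method="L-BFGS-B",
                   bounds=[(1e-9, None)] * len(d))         # keep w > 0
    return res.x
```

Small `gamma` forces near-exact calibration; large `gamma` lets the entropy term dominate, so the weights stay at the prior.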

The regularization factor $\gamma$ was not considered by Nagle et al. We include it here to allow a data-driven selection of the amount of regularization. Large values of $\gamma$ shrink the final weights toward the HT design weights, while small values force the weights to reproduce the ancillary data nearly exactly.

More generally, it may be possible to use raking with a LASSO regularization rather than a ridge regularization. LASSO is often justified over ridge regression on the grounds that it is "sparse" and automatically selects some ancillary variables to use and others not to use. Sparsity is a useful feature when the regression coefficients $\beta$ must be stored for later use. We find the justification less compelling in the current use case because the model only needs to be fit once, and it is not a set of regression coefficients that must be stored, but expansion factors $w$. LASSO has no effect on the sparsity of the expansion factors $w$, and there is no guarantee that a sparse set of regression coefficients leads to more efficient or robust expansion factors. A sparse set of expansion factors may be desirable, but neither ridge nor LASSO methods currently provide that.

Another related approach is the previously mentioned one by Riley et al. [18]. Using ancillary data on vegetation and biophysical attributes, they build a random forest model to identify the single FIA plot that best predicts the ancillary data at each 30 m pixel across the Western US. This is in contrast to our expansion factor approach, which identifies the probability that each site is represented by each sample plot.

When a plot is sampled, the site is first viewed in the office using aerial imagery. If the office staff determine that the site is unlikely to be in use as a forest, then its land use is recorded and the sampling process ends for that site. If the plot may contain forest use, then the plot is visited by a field crew. Field crews divide each plot into separate "condition classes" that are homogeneous with respect to attributes such as land use and forest type. FIA field crews also record the dominant species type of forest in each condition class (FORTYP).

The Forest Type is a three-digit classification which we collapsed into a two-digit classification corresponding to "Pine and other Softwood", Oak/Hickory, Oak/Pine, Oak/Gum/Cypress, Elm/Cottonwood, and Other. Owing to small sample sizes, we have further grouped the Oak/Gum/Cypress and Elm/Cottonwood into one group, and we have included the "Other" category with the Oak/Hickory. This produces a classification of 4 different species groups.

We have then combined the Land Use and Forest Type variables into one nested variable, so that Forest Use is divided by Forest Type, and Non-Forest Use is divided by Use class. The definitions of these 9 classes are shown in Table 1.

We also estimate the Forest Volume for each plot, which is defined as the (estimated) net volume of wood in timber species greater than 5 inches in diameter at breast height.

In addition to land use, forest type, and forest volume, we also used the estimate of Basal Area of Live trees (BALIVE) for each plot for prediction. We do not use basal area as an ancillary variable.

To apply the estimator, we group pixels into patches that are homogeneous with respect to the ancillary predictions and standard errors. Since our ancillary maps are derived from NLCD land cover and tree canopy cover layers, and we also desire county-level estimates, we group pixels into patches based on county, NLCD class, and tree canopy cover (grouped into classes). The mapped outputs from these models are then used as auxiliary variables in the regularized raking estimator. In all cases, the ancillary data model produces estimates as well as standard errors of prediction, which are used to weight the ancillary variables.
Figure 2 shows an example model fit for the auxiliary variables: the probability of forest use and the probability of the Eastern Softwood forest type, as well as their standard errors, for one of the survey units. The standard error of the "Forest" land use class is clearly smaller than the standard error for the more specific "Eastern Softwood" class. This is especially so at the higher canopy cover levels that are most likely to be forested.

The complete set of auxiliary variables used across the state include:

There is less uncertainty in their aggregation to the 2-class forest/non-forest proportions (e.g. see Figure 2). The regularization incorporates ancillary data variance, but not covariance. It is possible that the 9-class map would be sufficient if all cross-covariances were accounted for, but this would dramatically complicate the computational routines. Additionally, while it is somewhat reasonable to obtain estimates of prediction error variance from ancillary geospatial data products, it is much less reasonable to expect estimates of the covariances between different ancillary data products.

Using the ancillary maps created by our predictive model, we fit both the regularized raking and ridge GREG models. The expansion factors were then used to estimate forest volume for each of the patches containing a withheld plot; we calculated the square error between the withheld plot measurement and the estimate, and then averaged across simulations and withheld plots.
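The withheld-plot evaluation can be sketched as a leave-one-plot-out loop over a grid of regularization values. Everything below is a simplification of the procedure described above, not the paper's exact pipeline: a closed-form ridge calibration stands in for the estimator, the "patch" of a withheld plot is approximated by its ancillary class, and the data are simulated.

```python
import numpy as np

def ridge_weights(d, X, tau, sigma2, gamma):
    # Ridge-style calibration: (X'DX + gamma*S) lam = tau - X'd
    A = X.T @ (d[:, None] * X) + gamma * np.diag(sigma2)
    lam = np.linalg.solve(A, tau - X.T @ d)
    return d * (1.0 + X @ lam)

def cv_error(gamma, d, X, tau, sigma2, y, classes):
    """Leave one plot out; predict its y by the calibrated weighted mean
    of the remaining plots in the same ancillary class."""
    errs = []
    for h in range(len(y)):
        keep = np.ones(len(y), dtype=bool)
        keep[h] = False
        w = ridge_weights(d[keep], X[keep], tau, sigma2, gamma)
        same = classes[keep] == classes[h]
        if same.any():
            pred = np.sum(w[same] * y[keep][same]) / np.sum(w[same])
            errs.append((pred - y[h]) ** 2)
    return float(np.mean(errs))

# Simulated survey: two ancillary classes with different mean volumes.
rng = np.random.default_rng(0)
n = 40
classes = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), classes.astype(float)])
y = 10.0 * classes + rng.normal(0.0, 1.0, n)
d = np.full(n, 5.0)
tau = np.array([n * 5.0, classes.sum() * 5.0])    # population totals held fixed
sigma2 = np.ones(2)

grid = [0.01, 0.1, 1.0, 10.0]
best = min(grid, key=lambda g: cv_error(g, d, X, tau, sigma2, y, classes))
```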

Before continuing, we remark that the ancillary data were fit using the survey estimates.

Figure 3 shows the cross-validation error as a function of the regularization parameter γ. The cross-validation measure is the average square error of estimating forest volume across the withheld plots. Regularized raking consistently performs better than ridge GREG when predicting forest volume out of sample. This is especially interesting since it is GREG that minimizes (within sample) square error loss, not raking. The improvement seems to lie in the fact that GREG can produce errors beyond the range of the data, whereas raking cannot. We also note that regularized raking prefers slightly more regularization than does ridge GREG. This is slightly unfortunate: GREG is less computationally demanding, and cross-validation is a computationally intensive process, so we had hoped that the optimal regularization parameter for GREG would also be a good regularization parameter for raking. One possible explanation for why raking may prefer larger regularization parameters stems from the more extreme maximum weight of the raking procedure relative to GREG.

It should not be inferred from Figure 3 that regularized raking is always better than ridge GREG.

At some point to the left of the range in these plots (γ < 0.01), the regularized raking error explodes.

It is worth pointing out that any errors in the ancillary data will be carried through to the raking estimates.

For example, the model that creates Forest Volume from NLCD tree canopy cover and NLCD land use saturates with respect to tree canopy cover; while the model is able to distinguish between low

combining these smoothed outputs with regularized raking will allow the construction of temporally