3.1. Comparative Evaluation of Classifiers and Tile Resolutions
To evaluate the influence of tile size on model performance, we compare four square tile sizes using the rangeland dataset from South Dakota and the pastureland dataset from Arkansas. The optimal tile size is the configuration that yields the highest area under the receiver operating characteristic curve (AUC). We also implemented a 10-fold cross-validation procedure to assess the predictive performance of four modeling approaches: LASSOmin, Light Gradient Boosting Machine (LightGBM), Support Vector Machine (SVM), and Random Forest. Within each outer fold, model tuning used only the training data to avoid information leakage from the held-out fold.
For LASSO, LASSOmin denotes the value of the penalty parameter that minimized the cross-validated loss. For LightGBM, hyperparameter tuning was conducted over the number of leaves, learning rate, number of boosting iterations, class-weight adjustment, and the minimum number of observations in a child node. For SVM, the tuning grid included the penalty parameter, kernel type, and kernel coefficient option. For Random Forest, the tuning grid included the number of trees, maximum tree depth, the minimum number of samples required to split an internal node, and the minimum number of samples required at a leaf node. For each tuned model, the optimal hyperparameter combination was selected using only the training data, with cross-validated AUC as the tuning criterion.
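The tuning-inside-training arrangement described above can be sketched as nested cross-validation. This is a minimal illustration under stated assumptions, not the study's code: the data are synthetic, the grid is a small hypothetical subset of the Random Forest parameters named in the text, and the 3-fold inner split is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in for tile-level predictors and the binary land-use response.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=0)

# Hypothetical grid; the values actually searched are not given in the text.
param_grid = {"max_depth": [3, None], "min_samples_leaf": [1, 5]}

inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)   # tuning
outer_cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)  # evaluation

# GridSearchCV selects hyperparameters by cross-validated AUC on whatever
# data it is fitted to; here that is only the outer training portion.
tuned_rf = GridSearchCV(
    RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=inner_cv,
)

# cross_val_score refits the whole search on each outer training set, so the
# held-out fold never influences the selected hyperparameters.
outer_auc = cross_val_score(tuned_rf, X, y, scoring="roc_auc", cv=outer_cv)
print(f"mean outer AUC: {outer_auc.mean():.3f}")
```

Wrapping the search object in the outer loop, rather than tuning once on all data and then cross-validating, is what keeps the held-out fold out of model selection.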
Table 4 summarizes the mean AUC across 16 scenarios, defined by combining four square tile sizes with four modeling approaches. The AUC serves as a key metric for evaluating prediction accuracy, with higher values indicating better model performance.
For the rangeland dataset, the Random Forest model achieved an AUC exceeding 0.95, demonstrating excellent predictive accuracy. In particular, the combination of Random Forest with the 7 × 7 tile configuration provided the most robust framework for estimating the acreage of rangelands. In the case of the pastureland dataset, the results exhibited a consistent pattern: larger square tiles corresponded to higher AUC values, suggesting that increased spatial context enhances predictive capability. This observation is consistent with the findings of the rangeland analysis. Consequently, to estimate pastureland acreage, data derived from the largest tile size were utilized. Among all models assessed, Random Forest consistently achieved AUC values greater than 0.94, underscoring its strong classification performance and validating its selection as the preferred modeling approach for this analysis.
While the AUC results in Table 4 demonstrate strong predictive performance across different models and tile sizes, the evaluation is based on randomly partitioned cross-validation folds. Because land cover types often exhibit strong spatial continuity, nearby observations may be spatially correlated. When training and test samples are randomly split without accounting for spatial structure, evaluation metrics may be overly optimistic due to information leakage between neighboring locations.
To address this concern, we conducted an additional spatial cross-validation analysis. Specifically, we adopted a leave-county-out validation scheme in which a subset of counties was held out as the test set in each fold, while the remaining counties were used for model training. Because observations within the same county tend to be spatially clustered, this approach helps reduce the influence of spatial autocorrelation between training and testing samples and provides a more realistic assessment of predictive performance.
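A leave-county-out split of this kind can be sketched with grouped cross-validation. The example below is illustrative only: the data are simulated, the `county` labels are hypothetical, and five folds (about one fifth of counties held out per fold) is an assumption, since the fraction used in the study is not restated here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n = 600
county = rng.integers(0, 30, size=n)   # hypothetical county id per sample point
X = rng.normal(size=(n, 5))            # simulated predictors
y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)

cv = GroupKFold(n_splits=5)            # whole counties are held out together
aucs = []
for train_idx, test_idx in cv.split(X, y, groups=county):
    # No county contributes points to both the training and the test side.
    assert set(county[train_idx]).isdisjoint(county[test_idx])
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    prob = model.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], prob))

print(f"mean leave-county-out AUC: {np.mean(aucs):.3f}")
```

Grouping by county is a coarse proxy for spatial blocking: points near a county border can still be close to training points in the neighboring county.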
Table 5 reports the AUC values obtained under this spatial validation scheme. The results remain consistent with the main findings of this study: larger spatial tiles generally improve predictive performance, and the Random Forest model with a 7 × 7 tile achieves the highest AUC for both rangeland and pastureland classification. These results suggest that the proposed framework remains robust even under spatially structured validation.
Based on the AUC evaluation results presented above, the 7 × 7 Random Forest model was selected as the best-performing classifier for both the rangeland and pastureland analyses. To further interpret the fitted models, we computed feature importance for the final rangeland model (South Dakota) and the final pastureland model (Arkansas). Feature importance was assessed using permutation importance, which measures the decrease in predictive performance after randomly permuting the values of a given predictor while leaving the remaining predictors unchanged. Predictors that cause a larger reduction in model performance when permuted are considered more influential in the fitted model.
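As a rough illustration of permutation importance as defined above, the following sketch fits a Random Forest to simulated data and measures the drop in AUC when each column is shuffled. The column names are hypothetical stand-ins for CDL class shares, and scoring on a held-out split is an assumption; the text does not state which data the importances were computed on.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
names = ["grass_pasture", "corn", "soybeans", "noise"]  # hypothetical names
X = rng.uniform(size=(n, 4))
# The simulated response depends mainly on the first column.
y = (X[:, 0] + 0.2 * X[:, 1] + 0.1 * rng.normal(size=n) > 0.6).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle one column at a time, leaving the others intact, and record the
# mean decrease in AUC over repeated shuffles.
result = permutation_importance(
    rf, X_te, y_te, scoring="roc_auc", n_repeats=10, random_state=0
)
ranking = sorted(zip(names, result.importances_mean), key=lambda t: -t[1])
for name, drop in ranking:
    print(f"{name:>14s}  mean AUC drop = {drop:.3f}")
```

Because the model is not refitted after each shuffle, the measure describes reliance of the fitted model on a predictor, not the predictor's marginal association with the response.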
Figure 8 summarizes the top 10 most important CDL-derived predictors for the selected 7 × 7 Random Forest models. In both panels, the CDL Grass/Pasture class is the dominant predictor, indicating that local grass-related land-cover composition plays the strongest role in distinguishing grazing land. At the same time, the remaining important predictors differ between the two models, suggesting that the surrounding land-cover context associated with rangeland classification is not identical to that associated with pastureland classification. For example, the rangeland model assigns relatively greater importance to cropland-related classes such as Soybeans and Corn, whereas the pastureland model gives relatively greater weight to forest- and wetland-related classes such as Evergreen Forest and Woody Wetlands. Nevertheless, the CDL Grass/Pasture class is not identical to the NRI pastureland definition used as the response variable in this study, so its strong importance does not imply that pastureland can be classified with uniformly high accuracy.
Using state boundaries as the domain of analysis, we evaluate model performance across 17 U.S. states for rangeland (Figure 9a) and 48 U.S. states for pastureland (Figure 9b). Model accuracy is assessed using mean AUC values computed for each state, with a 10-fold cross-validation procedure employed to ensure robust and reliable evaluation.
For the rangeland dataset, all models demonstrate relatively high AUC values, with Random Forest and LASSO exhibiting slightly higher median AUCs, reflecting strong classification capabilities across the analyzed states. The observed consistency and distribution of AUC values indicate that, although all models perform well, Random Forest stands out as particularly robust in predictive accuracy. In the case of pastureland, the Random Forest, LightGBM, and LASSO models display relatively consistent performance, characterized by higher median AUC values. In contrast, the SVM model shows greater variability and a lower median AUC, suggesting reduced reliability. These findings indicate that Random Forest, LightGBM, and LASSO offer more stable and accurate predictions for pastureland estimation across the 48 states, whereas SVM exhibits a broader performance range and lower overall effectiveness.
To further examine how important CDL-derived predictors vary across regions, we aggregated the feature importance values from the state-specific Random Forest models and summarized the top 10 predictors for rangeland and pastureland in Figure 10. Although this aggregation is intended as a descriptive summary rather than a formal inferential comparison, it provides a useful overview of the predictors that repeatedly contribute to classification across states. For pastureland, the CDL Grass/Pasture class is the dominant predictor by a substantial margin, followed by Other Hay/Non Alfalfa, suggesting that pastureland classification is most strongly associated with grass- and forage-related land-cover composition. For rangeland, Shrubland and Grass/Pasture emerge as the two most influential predictors, with additional contributions from classes such as Alfalfa, Evergreen Forest, and other cropland-related categories. These patterns indicate that, while the CDL Grass/Pasture class is important for both pastureland and rangeland, the broader land-cover context differs between the two, with pastureland showing stronger associations with forage and wetland/forest-adjacent classes and rangeland showing stronger associations with shrub-dominated and mixed surrounding landscapes. Overall, these results support the view that regional variation in land-cover composition contributes to differences in classification patterns across grazing-land types.
While AUC is widely used to assess classification performance, it evaluates only the relative ranking of predicted probabilities and is insensitive to probability calibration. Because our acreage estimator directly aggregates predicted probabilities, accurate probability estimation is critical. We therefore complement AUC with cross-entropy loss, which explicitly penalizes deviations between predicted probabilities and observed outcomes.
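A small numerical sketch makes the distinction concrete. The labels and probabilities below are invented for illustration: two sets of predictions share the same ranking, and therefore the same AUC, but differ in cross-entropy and in the total obtained by summing probabilities.

```python
import math

def auc(y, p):
    """Share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

def cross_entropy(y, p):
    """Mean negative log-likelihood of the observed 0/1 outcomes."""
    return -sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    ) / len(y)

y = [1, 1, 1, 0, 0, 0, 0, 0]                      # three true positives
p_sharp = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1, 0.1, 0.1]
# Monotone shrinkage toward 0.5: the ranking is unchanged, calibration is not.
p_flat = [0.5 + 0.1 * (pi - 0.5) for pi in p_sharp]

for label, p in [("sharp", p_sharp), ("flat", p_flat)]:
    print(label, "AUC =", auc(y, p),
          "cross-entropy =", round(cross_entropy(y, p), 3),
          "sum of probabilities =", round(sum(p), 2))
```

Both sets of predictions separate the classes perfectly, yet the shrunken probabilities sum to a value further from the true count of three, which is the kind of error that would carry over to an estimator built by aggregating probabilities.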
Table 6 reports the mean cross-entropy values for rangeland and pastureland classification across four square tile sizes and four modeling approaches. Cross-entropy directly evaluates the quality of predicted probabilities, with lower values indicating better calibrated and more accurate probabilistic predictions.
For the rangeland dataset, all models exhibit decreasing cross-entropy as tile size increases, indicating that incorporating broader spatial context improves probabilistic performance. Among the methods considered, the Random Forest model consistently achieves the lowest cross-entropy across all tile sizes, followed closely by LASSO, while LightGBM and SVM exhibit comparatively higher loss values. A similar pattern is observed for the pastureland dataset, where larger tile sizes are associated with lower cross-entropy values across all models. Random Forest again demonstrates superior performance, yielding the lowest cross-entropy for each tile configuration. LASSO performs competitively, whereas SVM and LightGBM show higher loss, particularly for smaller tile sizes. Overall, these results suggest that Random Forest provides the most reliable probability estimates for both rangeland and pastureland classification, especially when larger spatial neighborhoods are used.
Figure 11 presents boxplots of cross-entropy values across states for four classification models. Lower cross-entropy values indicate better probabilistic calibration and predictive accuracy. In panel (a), corresponding to rangeland classification, the Random Forest model exhibits the lowest median cross-entropy and the narrowest interquartile range among the four methods, indicating both strong predictive performance and relatively stable behavior across states. LASSO and SVM yield moderately higher median losses with comparable dispersion, while LightGBM displays substantially higher median loss and greater variability, suggesting less consistent probability calibration in this setting. Panel (b) shows a similar pattern for pastureland classification across 48 states. Random Forest again achieves the lowest median cross-entropy, with limited dispersion, highlighting its robustness and reliability across a larger and more heterogeneous spatial domain. LASSO and SVM demonstrate intermediate performance, with slightly higher median losses and moderate variability. In contrast, LightGBM exhibits the highest median cross-entropy and the widest spread, indicating greater sensitivity to state-level heterogeneity and reduced stability in probabilistic predictions.
Figure 12a compares NRI design-based estimates with model-based estimates of rangeland acreage across 17 U.S. states, ordered in ascending magnitude based on the NRI design-based estimates. The Random Forest, LASSO, and SVM models exhibit strong concordance with the design-based values, underscoring their accuracy and effectiveness in estimating rangeland acreage. In contrast, the LightGBM model displays a systematic tendency to overestimate rangeland extent. These patterns are consistent with the cross-entropy results reported in Figure 11, where Random Forest achieves the lowest median loss and the most stable distribution across states, followed by LASSO and SVM, while LightGBM exhibits substantially higher and more variable cross-entropy values. These findings suggest that Random Forest, LASSO, and SVM provide particularly reliable estimates at the state level.
Figure 12b presents a similar comparison for pastureland acreage across 48 U.S. states. Again, the Random Forest, LASSO, and SVM models align closely with the design-based estimates, indicating robust predictive performance. Conversely, the LightGBM model consistently overestimates pastureland acreage. This comparative analysis reinforces the suitability of Random Forest, LASSO, and SVM for accurate state-level pastureland estimation.
3.2. Level III Ecoregion Estimation of Total Rangeland and Pastureland Area
Building on the preceding state-level analyses, we extend our modeling framework to the EPA Level III ecoregion scale, which constitutes the primary objective of this study: generating wall-to-wall spatial predictions of rangeland and pastureland acreage. For this purpose, we employ Random Forest models (identified as the most accurate in prior evaluations) using CDL data aggregated within grid sections.
A key challenge at this resolution is the limited number of NRI sample points classified as rangeland in several Level III ecoregions, which hinders the reliable estimation of region-specific models. To mitigate this limitation, ecoregions with insufficient sample sizes were combined with adjacent ecoregions that yielded state-level estimates most closely aligned with NRI design-based benchmarks (Appendix B).
To evaluate the accuracy of the Level III ecoregion models, we adopted an indirect validation strategy, as NRI design-based estimates are not available at the ecoregion level. Specifically, we fitted separate Random Forest models for each Level III ecoregion, allowing for spatial heterogeneity across ecoregions, and used the resulting continuous probability map to generate county-level predictions. These model-based estimates were compared with available NRI design-based county-level estimates, providing a proxy assessment of model performance.
Figure 13a presents this comparison for rangeland acreage across counties in 17 U.S. states, including 95% confidence intervals for the design-based estimates, with variances computed using the jackknife method. The results indicate strong alignment between model-based and design-based estimates, with estimates based on Random Forest models mostly falling within the 95% confidence intervals for the design-based estimates, demonstrating high predictive accuracy.
For pastureland estimation, we similarly transitioned from state-level to Level III ecoregion modeling, again utilizing the Random Forest approach and grid-aggregated CDL data. One exception was Level III Ecoregion 79 (Madrean Archipelago [40]), where the scarcity of NRI observations classified as pastureland impeded reliable model fitting. To address this issue, Ecoregion 79 was merged with the adjacent Ecoregion 81 (Sonoran Basin and Range [40]), selected based on consistency with state-level design-based estimates.
Figure 13b presents the county-level comparison for pastureland acreage across 48 U.S. states. As with the rangeland analysis, model-based predictions were generated using ecoregion-specific Random Forest models and benchmarked against NRI county-level estimates. Confidence intervals for the design-based estimates were again calculated using the jackknife method.
The model-based estimates demonstrate strong consistency with design-based estimates in counties with a large pastureland acreage. Divergences are largely restricted to counties where the number of NRI sample points is small. With such limited sample sizes, the design-based estimates themselves become unstable and less reliable as a benchmark. Consequently, the overall high level of agreement supports the Random Forest framework as a robust and effective approach for generating pastureland acreage estimates, particularly where design-based data are sparse.
Figure 14a presents the map of predicted rangeland probability across the 17 states included in this analysis, overlaid with Level III ecoregion boundaries. Darker green shades represent areas with a higher probability of being classified as rangeland, while lighter green areas indicate lower probabilities. This probabilistic representation provides a spatial overview of rangeland distribution, facilitating the identification and extraction of specific regions of interest from which rangeland acreage can be accurately quantified.
Figure 14b displays the corresponding map for pastureland, illustrating predicted probabilities across 48 states, with Level III ecoregion boundaries similarly delineated. This map offers a valuable spatial framework for assessing pastureland distribution and supports the targeted extraction of areas for accurate acreage estimation.
Furthermore, the visual contrast between Figure 14a,b highlights the substantially smaller extent of pastureland relative to rangeland, underscoring the added complexity and limitations associated with modeling and estimating pastureland acreage.
Table 7 summarizes both the NRI sample composition and the corresponding model-based rangeland acreage estimates across Level III ecoregions. After generating the map of predicted rangeland probabilities, we applied the model-based estimation procedure described in Section 2.3.1 to aggregate pixel-level predictions and obtain the acreage estimate for each ecoregion. These estimates represent total non-federal rangeland acreage within each Level III ecoregion and are reported in thousand acres.
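The aggregation step can be illustrated with a toy raster. The sketch below assumes, hypothetically, that the estimator sums predicted probabilities multiplied by pixel area within each ecoregion and that pixels are 30 m × 30 m; the exact estimator is the one defined in Section 2.3.1, and details such as the exclusion of federal land are not modeled here. The function name and the arrays are invented for the example.

```python
import numpy as np

# Assumption: 30 m x 30 m pixels; 1 acre = 4046.8564224 square metres.
ACRES_PER_PIXEL = 900.0 / 4046.8564224

def acreage_by_region(prob, region_id):
    """Sum predicted probability times pixel area within each region label."""
    totals = {}
    for r in np.unique(region_id):
        totals[int(r)] = float(prob[region_id == r].sum() * ACRES_PER_PIXEL)
    return totals

# Toy 2 x 3 probability map and matching ecoregion labels (hypothetical).
prob = np.array([[0.9, 0.8, 0.1],
                 [0.7, 0.2, 0.0]])
region_id = np.array([[1, 1, 2],
                      [1, 2, 2]])

est = acreage_by_region(prob, region_id)
for region, acres in est.items():
    print(f"ecoregion {region}: {acres:.4f} acres")
```

Summing probabilities, rather than thresholding them into a hard classification first, is what makes probability calibration matter for the acreage totals.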
The table also reports NRI Sample, the total number of NRI sample points within each ecoregion, and Range Points, the number of those points classified as rangeland. These quantities are informative not only descriptively, but also statistically, because they reveal substantial variation in class balance across ecoregions. For example, some ecoregions, such as 2 and 3, have very few rangeland-classified points relative to the total number of NRI samples, while others contain a much larger share of rangeland observations. Thus, Table 7 provides useful context for interpreting regional differences in acreage estimates and for understanding the varying difficulty of the classification task across ecological settings.
Table 8 presents Level III ecoregion-specific summaries of pastureland sample composition together with the corresponding model-based acreage estimates. After generating predicted pastureland probabilities over the spatial domain, we overlaid the prediction surface with Level III ecoregion boundaries and applied the model-based estimator described in Section 2.3.1 to obtain the acreage estimate for each ecoregion. The resulting values represent estimated non-federal pastureland acreage and are reported in thousand acres.
To help interpret these estimates, Table 8 also includes two sample-based quantities: NRI Sample, the total number of NRI sample points within an ecoregion, and Pasture Points, the subset of those points classified as pastureland. These counts provide additional statistical context by showing that the proportion of pastureland observations varies substantially across ecoregions. In some regions, such as 79, pastureland points account for only a small fraction of the available NRI samples, whereas in others they make up a larger share. This heterogeneity is useful for understanding regional differences in estimated pastureland extent and for assessing variation in classification difficulty across ecological settings.
Figure 15a presents model-based rangeland acreage estimates for a selected set of Level III ecoregions, together with their corresponding confidence intervals. After obtaining the acreage estimate for each ecoregion, we applied the variance estimation procedure described in Section 2.3.3 to compute standard errors and construct confidence intervals. The figure shows substantial regional variation not only in estimated rangeland acreage, but also in estimation uncertainty. For example, some ecoregions, such as 1 and 15, exhibit relatively wide confidence intervals, indicating greater variance in the acreage estimates, whereas others, such as 35, 37, and 48, have comparatively narrow intervals and thus more stable estimates. Several high-acreage ecoregions, including 9 and 75, also show noticeable uncertainty, although their confidence intervals remain informative relative to the magnitude of the estimates.
Figure 15b similarly displays pastureland acreage estimates and confidence intervals for a subset of Level III ecoregions. As in panel (a), both the estimated acreage and the width of the intervals vary across regions. Some ecoregions, such as 23 and 30, show relatively large confidence intervals, suggesting higher estimation variance, while others, including 2, 3, and 19, have narrower intervals that reflect more stable pastureland estimates. These differences indicate that uncertainty is not determined solely by the magnitude of the acreage estimate, but also by regional variation in sample support and spatial heterogeneity.
These figures are intended as illustrative examples rather than a complete presentation of all Level III ecoregions. The selected ecoregions were drawn from different geographic parts of the United States and include regions with comparable estimated acreage levels, making it easier to compare uncertainty patterns across space and between rangeland and pastureland. In both panels, narrower confidence intervals indicate more precise estimates, whereas wider intervals reflect greater uncertainty. Variance estimates shown in Figure 15 were obtained using the jackknife resampling procedure.
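For readers unfamiliar with the jackknife, a generic delete-one version is sketched below. It is not the procedure of Section 2.3.3, which may delete groups of observations and reflect the survey design; the data values, the helper name `jackknife_se`, and the 95% normal-approximation interval are assumptions made for illustration.

```python
import math

def jackknife_se(values, estimator):
    """Delete-one jackknife standard error of estimator(values)."""
    n = len(values)
    # Recompute the estimate with each observation left out in turn.
    reps = [estimator(values[:i] + values[i + 1:]) for i in range(n)]
    center = sum(reps) / n
    var = (n - 1) / n * sum((r - center) ** 2 for r in reps)
    return math.sqrt(var)

def mean(v):
    return sum(v) / len(v)

x = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]   # invented observations
se = jackknife_se(x, mean)

# Normal-approximation interval; the 95% level is an assumption here.
ci = (mean(x) - 1.96 * se, mean(x) + 1.96 * se)
print(f"estimate = {mean(x):.3f}, SE = {se:.3f}, CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

For the sample mean, the delete-one jackknife standard error coincides with the usual formula, the square root of the sample variance divided by n, which gives a convenient sanity check before applying the same routine to a more complex estimator.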