New biomass estimates for chaparral-dominated southern California landscapes

: Chaparral shrublands are the dominant wildland vegetation type in southern California and the most extensive ecosystem in the state. Disturbance by wildfire and climate change have created a dynamic landscape in which biomass mapping is key in tracking the ability of chaparral shrublands to sequester carbon. Despite this importance, most national and regional scale estimates do not account for shrubland biomass. Employing plot data from several sources, we built a random forest model to predict above ground live biomass in southern California using remote sensing data (Landsat NDVI) and a suite of geophysical variables. By substituting the NDVI and precipitation predictors for any given year we were able to apply the model to each year from 2000-2019. Using a total of 980 field plots, our model had a k-fold cross validation R 2 of 0.51 and a RMSE of 3.9. Validation by vegetation type ranged from R 2 = 0.17 (RMSE=9.7) for Sierran mixed conifer to R 2 = 0.91 (RMSE = 2.3) for sagebrush. Our estimates showed an improvement in accuracy over a two other biomass estimates that included shrublands, with an R 2 = 0.82 (RMSE = 4.7) compared to R 2 = 0.068 (RMSE = 6.7) for a global biomass estimate and R 2 = 0.29 (RMSE = 5.9) for a regional biomass estimate. Given the importance of accurate biomass estimates for resource managers we calculated the mean year 2010 shrubland biomass for the four national forests which ranged from 3.5 kg/m 2 (Los Padres) to 2.3 kg/m 2 (Angeles and Cleveland). Finally, we compared our estimates to field-measured biomass from the literature summarized by shrubland vegetation type and age class. Our model provides a transparent and repeatable method to generate biomass measurements in any year, thereby providing data to track biomass recovery after management actions or disturbances such as fire.


Introduction
Evergreen sclerophyllous shrubland covers 9% of the state of California, half of which is located in the Mediterranean-type climate ecosystems of southern California, notably in the chaparral shrublands in the Transverse and Peninsular range foothills [1]. Large topographic and precipitation gradients have resulted in tremendous regional species richness. While the rich plant diversity of these chaparral shrublands contribute to the California's Floristic Province status as a biodiversity hotspot, they also provide habitat for nearly 400 species of vertebrate fauna and provide multiple ecosystem services [2,3]. One of these services, albeit not widely recognized compared to forest landscapes, is the contribution shrublands make to climate mitigation through carbon storage and sustained carbon sequestration. Studies show old-growth chaparral shrublands to be a significant sink for carbon and consequently an important component of the global carbon budget [4,5].
The productivity of shrubland vegetation, and associated biomass and carbon storage, is related to environmental factors including distance to coast, elevation, aspect, and climatic water deficit. These environmental factors are reflected in plant communities with different species compositions. Dry, south-facing slopes (or rocky areas with shallow soils) are dominated by chamise chaparral (Adenostoma fasciculatum); less xeric, north-facing slopes (or areas with deeper soils) host mixed chaparral with a variety of species including ceanothus (Ceanothus spp.), manzanita (Arctostaphylos spp.), and scrub oak (Quercus berberidifolia); and coastal areas are dominated by a single, often locally endemic species of Ceanothus or Arctostaphylos [1,6]. Understanding patterns of shrubland productivity is important for resource managers given the relationship between productivity (and available fuels) and susceptibility of shrublands to large fires [7].
Disturbances such as wildfire are a major ecological factor in southern California, with major fire years burning hundreds of thousands of hectares, causing extensive damage to homes and property, the loss of human lives, and millions of dollars spent in fire suppression costs [8][9][10], This has been particularly poignant in recent decades, as Southern California has experienced several multi-year droughts, including the 2012-2017 drought which resulted in significant mortality in mature chaparral stands [11], consequently increasing future fire risk. During a wildfire, the combustion of living vegetation and dead fuel result in an initial release of large amounts of carbon [12,13]. Where the fire return interval of shrublands approximates the historic fire return interval (average 55 years for low elevation shrubland) [8,14,15], shrub cover recovers after 10-14 years, while biomass keeps accumulating [16,17]. However, in many areas the fire return interval has decreased, often in conjunction with an increase in non-native plant species, drought, and nitrogen deposition [18][19][20]. Under these conditions, post-fire biomass recovery can be impeded and in some cases may result in type conversion from native shrubland to non-native grassland [20].
Despite recognition of the importance of chaparral shrublands in ecosystem carbon budgets, there are few published methods for assessing the biomass of this vegetation type at the landscape scale and even fewer that can consistently be applied before and after disturbances such as wildfire. Regional and national biomass estimates often focus on forested lands only [21,22] where the amount of biomass in shrublands is considered negligible compared to forests at these spatial scales. However, given that chaparral shrublands cover nearly 3 million hectares (ha) in the state of California, their role in carbon sequestration should not be underestimated [23]. Mature chaparral shrublands capture and store substantial amounts of carbon aboveground while many resprouting species of chaparral are characterized by deep roots with large belowground carbon stores [24,25].
At the national scale, the foremost program for mapping biomass and combustible material (fuel) on wildlands is LANDFIRE, a cooperative program between the US Department of the Interior and the US Department of Agriculture Forest Service (hereafter USFS). However, often these fuel mapping products are difficult to crosswalk into meaningful biomass estimates as they focus on dead rather than live vegetation to inform fire behavior models.
Alternatively, at a finer scale, a few studies that have assessed pre-and post-fire biomass in shrublands (e.g., [26]), although most have focused on a spatially and temporally limited number of plots and results are often not applicable beyond the immediate extent of the study. Studies at fine spatial scales can be dominated by site variability that obscure detection of temporal trends or have been undertaken at temporal scales unable to account for the influence of previous fire history and frequency.
This data gap is particularly important for land management agencies like the USFS that manage large areas of shrubland: for example, the four national forests in southern California contain an average of 66% shrubland. Estimating the amount of biomass (and associated carbon storage) on federal lands and the impacts of wildfire and management activities (e.g., fuel reduction) are critical components of resource management. Moreover, in instances when native shrubland fails to reestablish after fire it can alter the future risk and severity of fires [27], underscoring the importance of a readily available method for assessing biomass in planning activities. Timely estimates using a transparent, repeatable methodology of the impacts of wildfire on carbon storage can contribute to environmental damage assessments, while understanding how biomass recovers to pre-fire levels is important for managers given the connection between restoring vegetation cover and providing critical ecosystem services such as sediment erosion regulation [28].
Remote sensing has been shown to provide spatially explicit estimates of biomass in Mediterranean-type climate regions, over a variety of spatial and temporal scales [29]. The Landsat program (4, 5, 7, and 8)(www.landsat.usgs.gov/) in particular, offers a deep temporal stack (~20 years) of images collected on average every 16 days, and at 30 m spatial resolution is well-suited for monitoring heterogeneous shrublands at the landscape scale [30,31]. Spectral vegetation indices have been shown to be sensitive for detecting changes in shrubland vegetation and biomass. The Normalized Difference Vegetation Index (NDVI) was successfully used to estimate coastal shrubland biomass [32], dwarf shrub biomass [33], and to predict biomass using models which combine NDVI and precipitation data [34] in the Mediterranean Basin. Storey et al. (2016) found that NDVI and the Normalized Burn Ratio (NBR) provided statistically significant indications of postfire recovery two to three decades postfire in southern California. In other studies, the Enhanced Vegetation Index (EVI) has successfully assessed vegetation recovery postfire on Mount Carmel, Israel [35] and in southern California [26].
In this study, we developed a method to map biomass annually by using vegetation indices derived from Landsat TM/ETM+/OLI instrument data (on-board Landsat 5, 7, and 8 respectively) and precipitation data specific to the year, resulting in a stack of biomass raster layers for the years 2000-2019. Specifically, we: 1) created a random forest model using vegetation indices, environmental variables, and field data to estimate above ground live biomass (AGLBM); 2) compared these data with existing global and regional (statewide) spatial biomass estimates for four national forests; and 3), compared our biomass data to values compiled in the literature from field measurements by shrub type and age class.

Study area
Our study area totaled 3,515,805 ha (8,687,731 acres), defined by all HUC12 (USGS Hydrological Unit Code) watersheds that intersected with USFS National Forests in southern California (Angeles, Cleveland, Los Padres, and San Bernardino, hereafter the ANF, CNF, LPNF, and SBNF, respectively) ( Figure 1). The Mediterranean-type climate in the study area is characterized by warm dry summer months and mild wet winters with most of the annual precipitation occurring in the winter and spring. Elevation in the study area ranged from 133 m to 2231 m. Shrublands accounted for over half (54%) of the natural vegetation in the study area, followed by conifer and hardwood forest (<20%) in higher elevation areas. Wildfires are a common disturbance [9,14] with an estimated 5580 ha per year of the study area burned (estimated from [36]). Although we focused on the AGLBM of chaparral shrublands, we estimated biomass for all wildland vegetation types in the study area.

Environmental data layers
We assembled environmental data layers considered useful for predicting biomass based on input from USFS resource managers, ecologists and studies in the literature (e.g., Stephenson, 1998). Given our objective to develop a model that can be applied to different years based on that years' Landsat NDVI reflectance and precipitation, we divided our predictive variables into two categories: time-dependent and static. Static predictors included geophysical features of the landscape: aspect, slope, elevation, flow accumulation, PAM (Partitioning Around Medoids, [39], solar radiation, geomorphons, and soils (all at 30 m resolution, see Table 1). We also included the 30-year average of several climatic variables: annual precipitation, climatic water deficit, groundwater recharge, water runoff, and actual evapotranspiration as calculated by the Basin Characterization Model [40].
Time-dependent predictors are specific to the target biomass year being analyzed and included the annual (previous year) and biennial (previous 2 years) total precipitation and NDVI data from Landsat imagery (2000-2019, see next section). For analysis purposes, we established the year as September 1 to August 31 to coincide with the end of the primary growing season and before the fire season in the fall.
Annual and biennial precipitation data were obtained from ClimateNA v5.10 software, which compiles precipitation data from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) [41]. Using the ClimateNA software, we downscaled the monthly PRISM precipitation data from 4 km to 500 m; ClimateNA employs a bilinear interpolation and local elevation adjustment to reduce the scale of gridded climate data [42]. The monthly precipitation raster files were summed for the annual and biennial periods for each year 2000-2019. We prepared all environmental data layers so the pixel size (30 m) and projection matched using R version 3.6.0 with the raster package version 3.0.12 and ArcGIS Pro version 2.5 software.

Landsat TM data and vegetation indices
We obtained Landsat 5, 7 and 8 surface reflectance (http://landsat.usgs.gov/CDR_LSR.php) maximum value composite images for July and August for each year 2000-2019 from the Google Earth Engine data catalog (https://earthengine.google.com/datasets/) using the JavaScript Application Programming Interface (API) [43]. This code extracted the maximum value for each of the bands and calculated the Normalized Difference Vegetation Index (NDVI) using the following formula: Where nir = near-infrared band (OLI band 5, TM/ETM+ band 4), red = red band (OLI band 4, TM/ETM+ band 3), blue = blue band (OLI band 2, TM/ETM+ band 1). For the period 2000-2011 we used Landsat 5 TM and Landsat 7 ETM+ data. Landsat 5 TM data has pixel drops/errors so to reduce the effect of these we masked and replaced them with Landsat 7 pixels from the same year. In July/Aug 2012, Landsat 8 was not yet operational, and Landsat 5 had stopped collecting data so for this year we used Landsat 7. Landsat 7 ETM+ data has artifacts related to the scan line correction failure which occurred in 2008 so we limited the use of these data to 2012. For 2013-2019 we used Landsat 8 OLI data. Revisit period for both Landsat 5, 7 and 8 is 16 days, so at most there would be two scenes available for any given month. The maximum NDVI value pixel was extracted from the collection of all NDVI pixels from all available scenes from July and August of each year. This raster product is called the maximum value composite (MVC).
Shrublands in southern California are influenced by precipitation-driven flushes of herbaceous/grass vegetation, particularly in the winter and spring months [44]. Since our objective was to estimate woody shrub biomass, we minimized the effects of these ephemeral pulses of herbaceous vegetation using July-August MVCs when the herbaceous layer is largely senesced. Assuming that we had at a minimum of 3 scenes for each compositing period we were reasonably assured of cloud free pixels throughout the study area for any given year. A July-August NDVI MVC was created for each year 2002-2019.

USFS Forest Inventory and Analysis data
We used 668 plots from the USFS Forest Inventory and Analysis (FIA) program, which is a network of inventory and monitoring plots across the conterminous US. These plots represent a sample of forested lands across all ownerships that provide consistent, scientifically reliable forest inventory data. Visit dates for our plots were distributed between the years 2001, 2002, 2004, 2008, 2010 and 2012. All FIA plots used in this study were located on USFS administered lands.
An FIA plot consists of four circular subplots (radius 7.32 m) with three of the subplots arranged at angles of 360°, 120°, and 240° and 36.6 m from the center subplot. In total, the four subplots sample 673 m 2 and can intersect with four to eight 30 m Landsat pixels (depending on the plot center location relative to the pixel grid [45].
In the FIA data collection procedure, crown diameter and height of all individual shrubs were recorded on the four subplots within each plot. Using these measurements, we applied species-specific allometric equations [46,47] to calculate the AGLBM of shrub species where possible, otherwise we used a generalized shrub-herb biomass equation [48]: Where B is biomass (kg/m 2 ), H is height (m), C is percent cover/100, and BD is the bulk density (kg/m 3 ) constant of 0.8 for grass and herbaceous plants and 1.8 for shrubs. Biomass was calculated for shrub and herbaceous species with ≥ 3% cover in the subplot.
To estimate the biomass of trees, we first defined classes using diameter at breast height (1.37 m above ground level): 'trees' ≥12.7 cm dbh; 'saplings' 2.54-12.7 cm DBH; and 'seedlings' < 2.54 cm dbh. We applied regional models that estimate live tree AGLBM using the component ratio method; inputs included dbh, height and species. The AGLBM biomass estimates do not include tree foliage such as needles, leaves. We calculated sapling AGLBM using a modified biomass equation from [49] with an adjustment factor based on sapling dbh [50]. For sapling woodland tree species with multiple stems close to the ground (e.g., Quercus gambelii and Cercocarpus ledifolius) diameter is measured at the root collar [51]. Seedling (< 2.54 cm dbh) biomass was not recorded and shrub data were not collected on plots visited after 2010 in the FIA plots. After calculating the biomass for each individual shrub and tree (≥12.7 cm) species, we compiled the total AGLBM density (kg/m 2 ) for the plot in each of the three categories: tree, shrub, and herbaceous.

LANDFIRE Reference Database
LANDFIRE plot data (LANDFIRE Reference Database [LFRDB]) are collected from a variety of sources to generate and validate the LANDFIRE products including vegetation, fuel, and fire frequency that can be used as inputs to fire behavior models (https://www.landfire.gov/). We downloaded approximately 6,000 LFRDB records for the southern California study area collected between 2000-2005. Of these 6,000 plots, 276 contained the vegetation data (cover and height) required to calculate biomass and were located in our study area. These data were collected in 2005, using the FIREMON protocol (https://www.fs.fed.us/rm/pubs/rmrs_gtr164.pdf) which consists of a series of variable radius and shape subplots all with the same macroplot center. Individual shrub and plant data were not recorded, instead the average cover and height of live vegetation in two height classes (<1.83 m and >1.83 m) was estimated. We estimated AGLBM using the general equation [48] for both height classes. Herbaceous cover and height were also present in LFRDB, so we calculated these as a separate category, again using the generalized biomass equation. The LFRDB plots in the study area had no data on trees so we assumed that they were not present. To avoid underestimating AGLBM in plots with trees, we confirmed their absence by examining plot locations with the high-resolution imagery contained in Google Earth desktop software (GE) and eliminated any that appeared to have trees. AGLBM was summed in the three categories (< 1.83 m, > 1.83 m, and herbaceous) to provide total AGLBM in each plot (kg/m 2 ).

Additional field plot data
We obtained data from two research projects to integrate with the FIA and LFRDB plots [52,53]. Vourlitis (2012) sampled biomass on the same plots quarterly (2004-2016) using non-destructive dimensional analysis as part of a study on the effects of added nitrogen on chaparral productivity. The experimental design used four 10 m x 10 m plots as the untreated control; we used the biomass sampled on these control plots during the summer quarter (July-August-September) for the years 2004-2016. Uyeda et al.'s (2016) study tested growth ring analysis as a proxy for post-fire biomass recovery in chaparral. This study developed species-specific regression equations from a sample of harvested shrubs; these equations related stem basal area to biomass. These equations were then used to estimate biomass on twenty-four 16 m 2 plots on the USFS San Dimas Experimental Forest. All stem measurements were conducted in the fall of 2013. In cases where the plots fell in the same Landsat pixel these plots were averaged, resulting in the elimination of three plots.
Finally, we sampled five plots in June 2019 within the Powerhouse fire which burned in 2013 using a 30 m x 0.5 m belt-transect to collect vegetation height and cover for shrub species. Grass and forb cover was estimated using the point-line-intercept method from points sampled every 30 cm along the transect. We used the same allometric equations described above to estimate AGLBM; for species with no allometric equation we applied the general equation [48].
Our gross number of plots before filtering was 980: 668 FIA, 276 LFRDB, 21 from Uyeda et al. (2016), 10 from Vourlitis (2012), and 5 from the Powerhouse fire data collection campaign. Initial filtering of the plot data reduced this number to 821. Filtering criteria included removing any FIA plots with zero biomass (e.g., where plots contain only herbaceous cover, or several species that individually do not comprise the 3% cover threshold). FIA plot visits after 2010 did not collect any data on shrubs, forbs, or grasses so these plots were reviewed in GE with historical high-resolution images. If plots were determined to be tree-dominant by ocular estimation they were retained so we could calculate biomass, otherwise they were removed from the plot database. Our logic here was that if the plots were tree dominant, then the lack of any biomass data for shrubs and forbs would be negligible. We filtered the data for outliers using the 1.5 quartile rule where plots with AGLBM values that were lower than Q1 -1.5 (Q3 -Q1) and higher than Q3 -1.5 (Q3 -Q1) [Q1 = the first quartile of the data distribution and Q3 = the third quartile of the data distribution] are removed from the plot data (1.5 x IQR rule, [54]). Further screening was undertaken of suspect plots identified in scatterplots of AGLBM versus the predictor variables and reviewed in Google Earth, which reduced the final number of plots used in the model to 795.

Assigning predictor values to plots
The predictor spatial data are 30 m in resolution but because of uncertainty in the plot location, the continuous predictor variables (aet, aspN, aspE, cwd, dem, facc, ppt_1yr, ppt_2yr, ppt_avg, rch, run, solrad, slope, NDVI [ Table 1]) were extracted using bilinear interpolation while the remaining categorical variables (geomorph, PAM, soils) were extracted with no interpolation. Predictor values were assigned to each plot with a python script that used the ArcGIS Python API tool 'Multi-values to points' to extract the values from the predictor raster layers. We assigned NDVI and annual/biennial precipitation values to the plots based on the calendar year of the field visit. To confirm there had been no change in plot cover since they were measured, we reviewed wildfire perimeter polygons from the CalFire FRAP website (https://frap.fire.ca.gov/frap-projects/fire-perimeters/) for each plot. Plots that experienced a disturbance after the field visit were analyzed in Google Earth to assess if substantial vegetation changes had occurred before the NDVI data were acquired, if yes, then the plots were omitted. Finally, we randomly split our dataset into training (627 plots) and validation (168 plots), excluding the validation plots in building the biomass model. Reserving an independent set of validation data enabled us to analyze errors by vegetation class and to compare our model results to AGLBM estimates from other sources using the same dataset.

Estimating AGLBM with random forest
To build a predictive regression model to estimate AGLBM from the FIA, LFRDB and other field data, we used random forest (RF, randomForest package version 4.6.14 in R software), an ensemble machine learning algorithm. RF builds a 'forest' of decision trees; each tree is built using a random selection of predictors and samples from the training data. RF has two parameters that are used to optimize the computed regression model -mTry, the number of predictors used in each regression tree and Ntree, the number of regression trees grown. During the training stage, RF withholds a random sample of 1/3 of the plots (out-of-bag samples or OOB) for cross validation while the remaining two-thirds are used to build the regression tree [55]. We repetitively ran RF with different combinations of mTry and Ntree to maximize accuracy (high percent variance explained, low OOB RMSE).
By randomly shuffling the values of a predictor variable while preserving the values of all other variables during OOB validation, the relative importance of the variable in the model is calculated. The importance is expressed as the percent increase in mean standard error if the variable is excluded from the model. This is called variable importance.
After creating the optimized model, we wrote an R script [56] that ingested the predictor layers (Table 1), including the time dependent NDVI and precipitation layers matched to the target year, to build a continuous AGLBM surface raster for each target year 2000-2019. Before running the script, we masked out urban, water, and agriculture land cover from the predictor layers using the California Wildlife Habitat Relations (CWHR) 'WHR Type' classification [38]. Hereafter we refer to our AGLBM estimates as the WWETAC-UCD estimates. Following inspection of the WWETAC-UCD raster layers, we determined that predicted biomass of perennial grassland and desert scrub CWHR classes [38] was higher than field based estimates [57,58] by approximately 2 orders of magnitude, so we reduced AGLBM values for these classes by 99% and 90% respectively. All estimates of biomass are provided in kg/m 2 .
We assessed the accuracy of the final RF model in several ways. First, each of the OOB samples (one-third of the plots) was run through the RF tree to report the variance and associated mean of squared residuals and Mean Standard Error (MSE). This is done as a part of the RF process. Second, using the 20% validation data (168 plots) we reserved prior to running the RF model, we conducted a k-fold cross validation of the model predictions. This method splits validation data into k subsets, reserves one subset and runs the model with the remaining k-1 subsets, and calculates the average prediction error rate. Next, we generated linear regression models of actual versus predicted AGLBM by crossreferencing the validation data with the CWHR vegetation class to create linear models for each vegetation class that constituted 1% or more of the USFS lands in the study area. To overcome insufficient number of plots in some classes, we also compared modeled and actual estimates for aggregated groups (all shrub, all needle-leaf, and all hardwood) and finally all CWHR classes. Finally, we created linear regressions with the Uyeda et al. (2016) data -which used field-based biomass measurements rather than allometric equationsto test for bias in the methods used to calculate AGLBM from the FIA and LFRDB data.

Comparison of WWETAC-UCD estimates with other global and statewide biomass datasets
To view our AGLBM estimates in a broader context and to provide useful information for resource managers, we compared our 2010 WWETAC-UCD data to two contemporary spatial biomass estimates for the four southern USFS national forests: the Global Aboveground and Belowground Biomass Carbon Density Data for the year 2010 (NASA-ORNL at 300 m resolution) [57], and the statewide California Air Resources Board (ARB at 30 m resolution) carbon stocks for 2010 [58,59]. We converted the raster biomass data from these projects to kg/m 2 of biomass (rather than carbon) to compare with our WWETAC-UCD AGLBM estimates. In addition, the NASA-ORNL data were resampled from the original 300 m pixel size to 30 m to match the resolution of the WWETAC-UCD data.
We compared our estimates to the NASA-ORNL and ARB biomass estimates in two ways: 1). We isolated pixels by creating three different groups and created corresponding raster masks from the CWHR vegetation data to permit comparison: shrubland (mixed chaparral, chamise-redshank chaparral, coastal scrub, montane chaparral, and sagebrush); non-shrubland (montane hardwood, pinyon-juniper, Sierran mixed conifer, coastal oak woodland, montane chaparral Jeffery pine, annual grassland, and eastside pine); and finally, all vegetation pixels (both shrub and non-shrub types listed above). Next we calculated mean biomass in these three categories for each of the three estimates for each national forest in the study area. We tested for differences between the AGLBM means for shrub types, non-shrub types, and all CWHR vegetation types that were 1% or more of the USFS lands in the study area for each national forest using the two-sample ttest. 2). We extracted the 2010 biomass values from each of the three sources for all validation plots visited in 2010 and built linear models to examine the relationship between the field and modeled biomass estimates. In addition to comparisons made with the remote sensing-based datasets described above , we also wanted to compare the WWETAC-UCD AGLBM estimates to field estimates published in the literature. To do this we cross-referenced our estimates with biomass estimates compiled by Bohlman et al. (2018) [17], hereafter referred to as Bohlman, based on a literature review of 37 research studies spanning 72 years in California shrublands. The review summarizes field studies by three CWHR shrubland classes -mixed chaparral, chamise-redshank chaparral, and coastal sage scrub -and age class (time since fire disturbance). Using stand age and CWHR vegetation type, Bohlman calculated the average biomass increment (kg/m 2 /year) by age class. The age class breaks varied by vegetation type (reflecting the available data): mixed chaparral, 1-10 yrs, 11-20 yrs, 21-30 yrs; chamise chaparral, 1-10 yrs, 11-30 yrs, > 30 yrs, and coastal scrub, 1-10 yrs, >10 yrs. To compare, we generated data for the three shrubland types and two age classes (1-10 years, 11-20 years) from within the perimeters of the Station fire (2009) located on the Angeles National Forest and the Pechanga fire (2000) on the Cleveland National Forest. Note, no comparison can be made with Bohlman's 21-30+ age class because the oldest fire analyzed (Pechanga) is only 20 years post-fire.

Comparison of WWETAC-UCD estimates with field estimated biomass
We focused on biomass data from the Station and Pechanga fires as they had at least 10 years of post-fire recovery; contained adequate amounts of all three vegetation classes; were not substantially impacted by multiple fires during 1980-2019; and both burned at the same time of year (late July-August). Fire history data were obtained from the Fire Return Interval Database (FRID) compiled annually by the USDA Forest Service [15] and fire perimeter data from the CalFire FRAP website (https://frap.fire.ca.gov/frap-projects/fire-perimeters/). A series of spatial raster masks were developed for each CWHR type, first to mask pixels within these fires that only burned once in 1980-2020 and second, to identify pixels that burned at high intensity (using data from the USDA Forest Service

Rapid Assessment of Vegetation Condition after Wildfire [RAVG] program [60]). Masks were applied to each of the annual AGLBM raster layers and statistics calculated, producing a table of 2000-2019 AGLBM for both fires.
We also focused on biomass estimates from 'normal' precipitation years by excluding abnormally dry and wet years to avoid any bias related to precipitation from our recovery estimates. Abnormally wet and dry years were determined using the Palmer Drought Severity Index (PDSI) [61]. PDSI is determined on a 2.5 degree by 2.5 degree grid and compiled monthly for the National Climate Data Center's U.S. Climate Divisions (https://www.ncdc.noaa.gov/monitoring-references/maps/us-climate-divisions.php). The range of these data is 10 (very wet) to -10 (very dry). We used PDSI data from the South Coast Drainage division boundary which corresponded closely with our study area boundary. Monthly South Coast Division PDSI values were assembled in a data table for the years 1995-2020. We defined a drought year as containing 4 or more consecutive months with a PDSI of less than -3 (severe or extreme) in the growing season (November to May) and a wet year as 4 or more consecutive months with a PDSI of 2 or greater (moderately moist to very moist) in the same period (Appendix A).

Plot data
Using the biomass calculations from the species-specific and general equations, the 627 plots used in the RF model (hereafter referred to as the RF plots) ranged in total AGLBM from 0.02 to over 46 kg/m 2 . All RF plots over 16 kg/m 2 AGLBM were in forested CWHR types including pinyon-juniper, montane hardwood, and coastal oak woodland. The allocation of RF plots by CWHR types (accounting for > 5% of the study area) included: 270 plots in mixed chaparral (32% of the study area), 62 in pinyon-juniper (3% of the study area), 48 in chamise-redshank chaparral (6% of the study area), 45 in montane hardwood (3% of study area), 32 in coastal scrub (8% of the study area) and 35 in coastal oak woodland (5% of the study area).

RF model
Our final RF model (mTry = 8, Ntree = 1000) explained 49% of the variance in the OOB (out of bag) samples, with an accompanying RMSE of 4.0, The k-fold cross validation R 2 was 0.51 and a RMSE of 3.9. NDVI is definitively the most important variable in our model (Figure 2) --over twice that of the next ranking variable. A second tier of variables included cwd and DEM, followed by solar radiation and annual and biennial precipitation. Together they would account for an increase in MSE of 46% if removed from the model.  Table 1 for descriptions of the predictor variables along the y axis. X axis units are percent increase in mean standard error (MSE) if the variable is excluded from the model.

Validation by CWHR vegetation class
The statistics for the linear regression models of the validation plots for each CWHR class ranged from R 2 = 0.91 (sagebrush, RMSE = 2.3) to 0.17 (Sierran mixed conifer, RMSE = 9.7) ( Table 2). Due in part to low numbers of validation plots in many of the CWHR classes, only 5 of 11 of the classes had statistically significant (p < 0.05) linear models ( Table  2). Aggregating the CWHR types into shrub, needle-leaf, and hardwood super-classes revealed that shrub types had a weak to moderate significant relationship with the field validation plots (R 2 = 0.30, RMSE = 2.3 [138% of mean]), while the needle-leaf and hardwood types were moderate to strong (R2 = 0.53, RMSE = 3.9 [83% of mean] and R 2 = 0.49, RMSE = 7 [89 % of mean], respectively). The model that included all CWHR types was also moderate to strong (R 2 = 0.54, RMSE = 3.8 [112% of mean]), like the k-fold cross validation values from the RF model.

Comparison of WWETAC-UCD estimates with other global and statewide biomass datasets
To evaluate our WWETAC-UCD data with global and statewide biomass estimates and provide useful information for resource managers, we compared the ORNL-NASA (global) and ARB (California) biomass estimates for the four southern CA national forests for 2010. These National Forests (NFs) have distinct geographies, vegetation types and fire histories that are evident in their respective biomass estimates ( Table 3). The LPNF is the most northerly of the national forests and contains coastal forests, resulting in more productive vegetation types. Consequently, it has the highest mean WWETAC-UCD AGLBM in all three vegetation categories (shrub only, non-shrub only, and all vegetation types. The drier, most southerly forest CNF had the lowest mean WWETAC-UCD AGLBM for all three classes. On each NF (by shrub, non-shrub, and all vegetation classes), the ORNL-NASA AGLBM (with one exception) and ARB estimates (except for the LPNF) were consistently higher than the WWETAC-UCD estimates.
Mean delta values between the WWETAC-UCD estimates and the ARB estimates for all four NFs were comparable -0.25 kg/m 2 for shrub vegetation and 0.45 kg/m 2 for nonshrub vegetation. In contrast the ORNL-NASA -WWETAC-UCD delta values were much higher (2.3 kg/m 2 ) for shrub vegetation than non-shrub (0.78 kg/m 2 ).  (Figure 3), followed by the ARB model (R 2 = 0.29, RMSE = 5.9 kg/m 2 ), Mean biomass values for these plots were not significantly different for all three estimates at the 1% level. The model of the ORNL-NASA and WWETAC-UCD data was not significant at the 5% level (p-value = 0.33). RMSE values for all three models were high (6.7 -4.7 kg/m 2 ) compared to the mean biomass values presented in Table 3. Table 3. Mean total above ground live biomass (AGLBM) estimates for the year 2010 for the four national forests in the study area using the methods developed in this paper (WWETAC-UCD), the California Air Resources Board (ARB) [58] and the Oak Ridge National Laboratory -NASA (ORNL-NASA) [57]. Mean biomass values (kg/m 2 ) for each forest were calculated from the 30 m resolution raster layers developed using each of the three methods. The corresponding standard deviation values are in parentheses. Two sample t-tests for differences in means were conducted between the ARB and the ORNL-NASA AGLBM means and the WWETAC-UCD means. Mean biomass was calculated for three CWHR vegetation classes -Shrub only (Mixed chaparral, Chamise-Redshank chaparral, and coastal scrub), Non-Shrub (All types, excluding Mixed chaparral, Chamise-Redshank chaparral, and Coastal scrub), and All CWHR vegetation types.   [58] and (c) the WWETAC-UCD estimates from this study (30 m resolution). The 2012 false color infrared image from the USDA National Aerial Imagery Program is provided for reference (d).

Comparison of WWETAC-UCD estimates with field measured biomass
Compared to Bohlman, our estimates were consistently higher for all shrub types and age classes (Table 4). Bohlman's average annual biomass increment estimates were higher for all shrub vegetation age classes, with the exception of the 11-20 year chamise chaparral class where no data from the literature review available. Early post-fire recovery (1-10 years) is higher for the WWETAC-UCD estimates, but for the 11+ year age classes AGLBM values for the two estimates are much closer. Table 4. Above ground live biomass (AGLBM) and mean annual biomass increment (growth) values estimated by the WWETAC-UCD model for the three CWHR vegetation types and two age classes presented in Bohlman [17]. WWETAC-UCD values were extracted from the Station (2009) and Pechanga (2000) fires. Bohlman [17] summarized AGLBM and increment data from a literature search of 37 studies spanning 72 years.

Discussion
In this study, we present a repeatable and transparent biomass model that is a substantial improvement in accuracy over two other estimates that include shrubands at both statewide and global scales (Figure 3). In addition, by adjusting the time-dependent predictors in our model it can be applied consistently across different years, yielding a temporal stack of biomass estimates that can be used to assess recovery after disturbance. Our estimates by age class and vegetation type are generally consistent with field estimates from the literature [17], (Table 4), particularly among the older (> 10 years) age classes.
The foundation of this work has been the inclusion of shrub measurements into FIA plot re-visits for 2001-2010 on the four southern California NFs, which provided essential data. However, there are two potential issues with the plot data that we used. First, accurate plot locations are essential so that the correct raster values can be associated with the plot measurements. However, little information was available on the methodology or accuracy of the FIA or LFRDB plot geolocations, which raises questions as to whether the value of the pixel at plot center correctly represents the plot. Second, given that making in-situ biomass measurements is time-consuming and costly, typically involving destructive sampling of vegetation, separation of components (stems, leaves, roots), followed by drying and weighing of these separate materials [63], we used allometric equations to estimate biomass from the FIA and LFRDB plots. Both the allometric equations and especially the general equation undoubtedly are sources of error in estimating the true quantity of biomass at the plot level. We used species-specific allometric equations for 14 shrub species and two genera from two studies [46,47]; for the remaining shrub species with no published allometric equation we used a general equation (described in Methods). This equation uses a generic bulk density constant for all shrubs and is a general approximation that is probably relatively low for many species [64][65][66]. Biomass estimates using the general equation have a moderate to strong relationship with allometric equation estimates (Appendix B), but further work is necessary to examine the accuracy of these estimates to in-situ data on chaparral shrub species. In addition, the bulk density constant used for forbs and grasses has also been shown to yield underestimates of grass biomass [67].
To maintain consistency, we used the same suite of allometric equations on the Powerhouse plots as the FIA data; the LFRDB data used the general equation; and Uyeda et al. (2016) used AGLBM calculation methods that did not use these equations. Many of the equations used were developed outside of southern California. Further work is needed to develop and refine allometric models and bulk density constants for chaparral species. Finally, the FIA methods do not specify recording tree seedlings or any shrub or forb with < 3% cover; presumably if seedlings were present in significant number or if there were several species present with less than 3% cover this could also result in an underestimation of plot biomass.
NDVI is a proven spectral index in assessing vegetation in Mediterranean-type shrublands [68,32,69,44]; in our RF model it ranks highest in variable importance ( Figure  2). This is an important consideration as NDVI, along with annual and biennial precipitation are the time-dependent predictors -the factors that are defining the year-to-year changes in biomass. Chaparral landscapes in southern California are subject to highly variable precipitation, and AGLBM varies accordingly. This is also reflected in the variable importance of our model; annual and biennial precipitation (ppt_1yr and ppt_2yr, respectively) are the 5 th and 6 th most important variables in the model. The importance of climatic water deficit in our model confirms the findings from other studies in California's shrublands [70] and shows that the availability of soil moisture and not just precipitation is a critical factor in production in drought-prone chaparral communities. Short term fluctuations in precipitation probably influence AGLBM thru corresponding fluctuations in leaf area and herbaceous/grass production, while soil moisture is an indication of a site's ability to support perennial woody vegetation.
There was consistency in the validation statistics for the model (k-fold cross validation R 2 = 0.51, OOB RMSE = 4.0) and from the validation plot metrics (R 2 = 0.54, RMSE = 3.8 for all CWHR types), suggesting that our model is capable of making good predictions for observations not used to build the model.
Although our focus was on improving AGLBM estimates for chaparral shrublands, we did include all wildland vegetation types in this study. Breaking down our AGLBM estimates by CWHR class allowed us to explore errors or inaccuracy that might be associated with vegetation type, although for many of the classes there were few validation plots ( Table 2). The high RMSE values we report for the shrubland CWHR classes are an indication of the spatial heterogeneity in shrubland ecosystems and the difficulty inherent in building a model that captures this diversity accurately at the landscape scale [71]. Furthermore, within a single CWHR vegetation class there may be a wide range of species assemblages and structure types. Ideally we would have enough field plots for modeling and validation of vegetation classes at a much finer spatial scale. For the majority of CWHR vegetation types in the study area (Table 2) our model underestimated AGLBM of the validation plots (slope values < 1), with the exception of sagebrush and This underestimation is probably increased if we consider the application of the general allometric equation used to calculate AGLBM for most shrub species in the FIA plots and in all the LFRDB plots, which are likely underestimating quantitative field measurements of biomass.

Comparison of WWETAC-UCD estimates with other global and statewide biomass datasets
Since the majority of AGLBM for the continental US is contained in forests and woodlands, it is not surprising that most national and global scale biomass estimates do not address shrublands (see [21,22]). In these datasets, the total carbon present as well as sequestration capability of chaparral shrublands is low compared to forests. However, this is changing as regions and countries that contain substantial areas of Mediterranean-type shrublands and woodlands are assessing carbon stocks to fulfill agreements under the Kyoto Protocol and the REDD+ (Reducing Emissions from Deforestation and Forest Degradation) project.
We compared our estimates to two previous efforts that also mapped AGLBM for shrublands in southern California. The goal of the ARB project was to consistently map carbon stocks across California. They assigned AGLBM to LANDFIRE vegetation existing vegetation types (EVT), cover (EVC) and when available existing height classes [59]. This approach, known as "stratify and multiply", tends to obscure the wide range of AGLBM variability within a vegetation class (which can be observed in Figure 4) and also is subject to inaccuracies in the vegetation classification [59,72]. In our case, the ARB aggregation of AGLBM by LANDFIRE classes resulted in their estimates by forest to be higher than ours (Table 3). Second, the ORNL-NASA project goal was to develop a global biomass dataset that captured the uncertainty surrounding both the above and below ground biomass estimates [57]. In developing their datasets, they leveraged the GlobBiomass project [73] for forested areas. Estimates for shrublands and savannahs were developed for Africa using synthetic aperture radar (SAR) [74]. These shrubland/savannah estimates were subsequently applied globally.
We made the comparison between the ARB, ORNL-NASA and WWETAC-UCD to show that scale matters -compiling statewide and global estimates at the local scale of National Forests obscures the spatio-temporal heterogeneity that is intrinsic to chaparral shrublands ( Figure 4). Additionally, it highlights the inaccuracy of shrubland biomass data in these broader scale efforts. However, it is interesting that the means of predicted AGLBM for the 22 validation plots are not significantly different for all three estimates (ARB, ORNL-NASA and WWETAC-UCD), suggesting that these estimation methods produce similar values when aggregated, but this does not hold true at the National Forest level. Note however, that the ARB and ORNL-NASA studies place cautionary notes in regard to their shrubland estimates; stating that carbon in shrublands is poorly quantified [58] and that the estimates for shrublands in Spawn and Gibbs (2020) were developed based on Bouvet et al. (2018) and should not be used outside of the African continent.
Further considerations when reviewing estimates of AGLBM at the NF level is the fire history of each NF which should be reflected in the 2010 AGBLM estimates (presented in Table 3). The LPNF had the highest area burned (41%) between 2005-2010. Based solely on this fire history, we would assume that AGLBM would be lower on the LPNF because a greater percentage of the area was in an earlier post-fire successional stage, but that is not the case, meaning that other factors such as greater productivity in more northerly latitudes, location in the coast range, and shrub species with different life histories (e.g., resprouting versus seeding) are involved. However, the effect of fire is evident between the ANF and SBNF; WWETAC-UCD AGLBM is lower for the ANF (31% burned) than the SBNF (8% burned).

Comparison of WWETAC-UCD estimates with field estimates
Although most of the AGLBM age class/vegetation type means reported by Bohlman are represented by 2 or fewer studies, our results are nevertheless comparable ( Table 4). The 1-10 year age classes exhibit the biggest differences, one explanation for this is that most post-fire recovery in chaparral communities takes place during this period. Our estimates are the mean of all pixels 1-10 years post fire (from the Station and Pechanga fires) so there is probably a higher frequency of high AGLBM values than in the Bohlman studies, which are based on temporally and spatially limited field plots. For all vegetation classes except coastal sage scrub, our AGLBM values for years 11-20 were much closer (0.1 -0.4 kg/m 2 difference) to the values presented in Bohlman. Established vegetation characteristic of these older age classes is more resistant to short term climatic fluctuations, i.e. AGLBM tends to be more stable compared to the 1-10 year age class where the AGLBM is more sensitive to climatic variation. Future research into post-fire recovery by community type using the WWETAC-UCD AGLBM data stack could examine spatial and temporal variation of AGLBM by age class; this may provide further evidence of this trend. Another possible explanation is that our estimates did not include wet or drought years. Over half (60%) of the studies reviewed in Bohlman were conducted in the 1970s and 1980s and no information is provided on which studies occurred in drought or wet years. Assuming that droughts were more prevalent than wet years, we can surmise that the Bohlman estimates are relatively low because some were sampled in drought years. Prolonged ecological drought can result in substantial mortality in chaparral shrub species, suppressing AGLBM [11].
Despite the sources of error described above, we have confidence in our AGLBM estimates because of the strong relationship (R 2 = 0.69, RMSE = 1.02) they had with field measurements of AGLBM conducted by Uyeda et al 2016. This suggests our estimates are better at characterizing AGLBM than the validation statistics ( Table 2) indicate and that there is error in the allometric and general equations used for most of our plot data. Uyeda et al 2016 used stem growth rings to estimate basal area and biomass [52]. Although limited in geographic scope, the Uyeda et al 2016 plots are in the mixed chaparral vegetation type which comprise over 50% of our study area. Generating additional species specific allometric equations and refining the bulk density constants used in the general equations would be a valuable contribution for future biomass mapping efforts.
The advantages of adding active remote sensing data (e.g. lidar, radar) to optical data sources is well documented [75][76][77][78]; these data provide vegetation structure that is difficult to extract from optical data alone. Future shrubland mapping efforts should investigate synthetic aperture radar (SAR) and spaceborne lidar (ICESat-2 ATLAS) data in combination with optical data [75,79,80].

Conclusions
Mediterranean-type shrublands and woodlands, already noted for their globally important biodiversity [81], are increasingly being recognized for their importance regionally and globally in carbon sequestration [4,29]; net primary productivity [29,82,83]; and water quantity and quality [84]. Carbon sequestration rates may be comparable to those estimated for temperate old-growth forests [4]. Vegetation biomass is an essential part of the landscape assessment process for these services and the ability of chaparral shrublands to deliver these services is not consistent across spatio-temporal gradients due to environmental differences (e.g., slope, aspect, elevation) and disturbance from fire. The USFS mandate to monitor ecosystem services and the impacts on those services due to fire has led to the development of tools such as SoCal EcoServe, a web-based application that delivers spatial and tabular data on pre-and-post fire ecosystem services in southern California [85]. The AGLBM data developed in this study is incorporated in the SoCal EcoServe tool.
In this study we created a transparent and repeatable model that can be used to estimate AGLBM on shrubland dominated landscapes that can be consistently applied annually across the Landsat period of record. This is timely given that the USDA Forest Service is now required to monitor and report on carbon stocks; every National Forest System unit is required to report annually on their efforts to incorporate carbon information in land management plans, projects, decisions, and communications [86]. A principal goal of the WWETAC-UCD methodology was to develop an approach that could be applied to different years to assess future inquiries into post-fire recovery of biomass and vegetation type conversion. We accomplished this by using the NDVI and precipitation data specific to the year of the AGLBM assessment. Consequently, it overcomes one of the major limitations of most landscape scale biomass estimates: that they are limited to a single point in time. This is especially important considering the disturbance frequency of chaparral communities.
Given that chaparral shrublands are difficult environments for field work -mature stands are nearly impenetrable, and in southern California access is difficult and costly, the AGLBM data we present here are a substantial improvement over the existing statewide and global estimates for shrubland dominated landscapes. As such, they present a valuable step forward in accurately accounting for the role these under-studied communities play in carbon budgets and ecosystem services.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. thank the following people who provided valuable input throughout the study including Hugh Safford, Nicole Molinari and Allan Hollander. We also acknowledge valuable data provided by John Battles, Kellie Uyeda, George Vourlitis, and John Chase. We also thank Kama Kennedy for her help in FIA data interpretation and processing.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Palmer Drought Severity Index (PDSI) monthly values for 1995-2020 for the South Coast Drainage U.S. Climate Division in southern California. Anomalous years (wet or drought) are indicated by the horizontal lines in the upper and lower areas of the graph. Drought years were defined as containing 4 or more consecutive months with a PDSI of less than -3 (severe or extreme) in the growing season (November to May) and a wet year as having 4 or more consecutive months with a PDSI of 2 or greater (moderately moist to very moist) in the same period. Linear regression model statistics for biomass estimates using species-specific allometric equations and estimates using the general biomass equation [48]. Individual shrub measurements used in these estimates were collected during a field campaign on the Los Padres National Forest in 2020.