1. Introduction
Forests are paramount in regulating the global environment, mainly through sequestering carbon [1]. They are particularly important today in combating climate change, which affects many aspects of people's lives. Because forest resources matter in so many ways, information about the resource base, its spatial distribution and its spatio-temporal changes has become a global concern. This information is the basis for decisions when planning and assessing the impacts of climate change mitigation and adaptation measures [2,3,4]. Following a series of international dialogues, the Conference of the Parties to the United Nations Framework Convention on Climate Change (UNFCCC) has passed several decisions to combat the impacts of climate change through sequestering carbon in living biomass, which mainly includes forests. Incentivizing REDD+ (Reducing Emissions from Deforestation and Forest Degradation, Sustainable Forest Management and Conservation) programs was one of the main issues in the 2015 Paris Agreement [4]. All these programs, initiatives and treaties require information about resource stocks and their changes over time.
Many forest types, including the dry Afromontane forest, contribute to REDD+ programs by storing carbon and thereby mitigating the impacts of climate change. In Ethiopia, dry Afromontane forests occur in areas with altitudes of 1500 to 3400 m above sea level, mean annual temperatures of 14–25 °C and mean annual precipitation of 400–1700 mm [5]. These forests are of great ecological and economic importance [6,7]. They contribute to national and international initiatives towards biodiversity conservation, soil erosion control and the mitigation of global climate change [6]. Although these forests are important forest types in Ethiopia and provide various benefits [8], they are under pressure from local communities for the expansion of agriculture and settlement and for fuelwood collection [7,9].
Among forest variables, aboveground biomass (AGB) is of great importance because of its multiple uses. The AGB of trees is the weight of all living tree material above the soil surface, including the stem, stump, branches, bark, seeds and leaves. Changes in AGB stock can be used to monitor forest dynamics. AGB estimates, which can be converted to carbon stock estimates, are required in forest management, particularly in implementing the REDD+ programs that are underway in Ethiopia. Despite growing requirements for precise estimation and timely reporting, the measurement, monitoring and change estimation of forest resources in Ethiopia currently relies mainly on field-based sample surveys (FBSSs). Used alone, these surveys are poorly suited to biomass monitoring over large areas because they are constrained by high costs, logistical challenges and limited field access [10]. As a result, many national forest inventory programs in developing countries, including Ethiopia, depend on field inventories with relatively small sample sizes and thus produce estimates with high uncertainty [11]. Studies of the uncertainty of emission reductions in Ethiopia indicated that estimates based on FBSSs with small sample sizes are not sufficiently precise to support decision-making [12]. When remotely sensed (RS) data are used for biomass estimation, uncertainty can arise from tree measurements, allometric models and RS-based model predictions. It is therefore important to look for alternative approaches that can reduce costs and improve the precision of estimates relative to pure FBSSs.
In recent years, RS data and associated estimation techniques have become viable options for cost-effectively quantifying resource stocks in areas inaccessible to FBSSs [13,14,15]. Previous research has shown that RS data can help reduce FBSS efforts without loss of precision [16]. Following improvements in RS data and technologies, many sources of satellite RS data are now useful for estimating forest variables, including AGB. Landsat and Sentinel are examples of satellite programs that provide freely available data [17,18]. Landsat-8 (L8) and Sentinel-2 (S2) images have proven useful for AGB estimation in various forest ecosystems [19,20,21,22,23]. However, data with higher spatial resolution are often considered better [24,25]. PlanetScope (PS) images are among the potentially applicable commercial satellite RS data; they have a 3 m spatial resolution and are acquired daily. These characteristics make PS data suitable for REDD+ MRV (measurement, reporting and verification) systems [25]. Compared with L8 and S2 images, fewer studies have addressed biomass estimation using PS images [26].
Various studies have used spectral band (SB) reflectance, spectral indices (SIs) or texture variables, alone or in combination, for AGB modelling. For example, a study by [27] on AGB estimation using Landsat TM data in the Brazilian Amazon indicated that combining SB and texture variables improved AGB estimation. The study showed the importance of texture information, particularly in primary forests, which have complex canopy structures. The SBs most commonly found to correlate strongly with AGB, particularly in forests with simple stand structures, are the visible, near-infrared and shortwave infrared bands (e.g., [27,28]).
Existing studies have found that some types of SIs contribute greatly to AGB estimation in different forest types. A study of AGB estimation using Landsat images in northwestern Turkey revealed that SIs estimated AGB better than SB reflectance in that forest type [29]. However, the sensitivity of SIs to biomass varies between environments and forest types [30,31,32,33]. In India, [30] observed significant correlations between AGB and the simple ratio (SR), difference vegetation index (DVI), normalized difference vegetation index (NDVI), soil adjusted vegetation index (SAVI) and modified soil adjusted vegetation index (MSAVI). Gizachew et al. [19] found that NDVI, the enhanced vegetation index (EVI), SAVI, MSAVI and the normalized difference moisture index (NDMI) had significant correlations with total AGB in the Miombo woodlands of Tanzania. Furthermore, the atmospherically resistant vegetation index (ARVI) of L8 imagery was used for AGB estimation in Mount Tai, China [22]. A similar study in southern Portugal indicated that SIs are useful predictors of AGB [34]. In Pakistan, Imran et al. [35] found that the red-edge normalized difference vegetation index (RENDVI) correlated more strongly with AGB than the individual SBs did. Together with other SIs mentioned above, the red-edge simple ratio (SRRE) index was used for estimating the AGB of mangrove forests in the Philippines [26]. Motohka et al. [36] found the normalized difference green index (NDGI) to be a good phenological indicator for various ecosystems in Japan. In a study by [37] that used data collected with unmanned aerial vehicles to monitor the post-fire recovery of pine forests in Mediterranean areas, the excess green index (ExGI) was a useful variable for estimating diameter at breast height (DBH), the default predictor in AGB allometry. In another study, ExGI was used for discriminating vegetation types in the USA and Canada [38]. Finally, SIs that indicate leaf greenness and are used in applications such as crop monitoring and discriminating vegetation types, like the green leaf index (GLI) and vegetation index (VI), were included in the current list of potential predictor variables to test whether they relate to AGB. See Table 2 for detailed descriptions of the SIs explored in this study.
The other group of potentially useful variables for AGB estimation is texture data derived from the high-resolution PS images. Texture variables describe the spatial variation of image values among neighbouring pixels, and how much of this variation an image can capture depends on its pixel resolution. Texture information from the L8 and S2 images was not used because these images have coarser resolution than the PS images. Several studies indicated that image texture variables could improve AGB estimation, especially in dense tropical forests [22,27,39]. The most common method of calculating image texture variables is the grey level co-occurrence matrix (GLCM). Table 3 shows how the GLCM variables were calculated.
Some studies (e.g., [16,19,40]) evaluated the use of RS data for biomass estimation in small study areas in East Africa. However, to the best of our knowledge, apart from some efforts using Landsat images for land cover classification and mapping, data from the satellite missions analysed in the current study have never been used to assess the AGB of dry Afromontane forests in Ethiopia.
Because there is little experience with which types of variables extracted from these satellite systems would be useful for AGB modelling in this forest type, the first objective of this study was to explore which variables extracted from the different satellite programs might be useful for AGB modelling in the dry Afromontane forest. The second objective was to evaluate to what extent such RS data could improve the precision of AGB estimates beyond that of a pure FBSS in these forests.
2. Materials and Methods
2.1. Description of the Study Area
The study was conducted in the Degaga-Gambo forest in south-central Ethiopia. The forest belongs to a state-owned enterprise, Oromia Forest and Wildlife. The study area is located on the eastern escarpment of the central Rift Valley of Ethiopia, in the Horn of Africa (Figure 1). It extends geographically from 38°45′ to 38°56′ E longitude and from 7°13′ to 7°33′ N latitude. The forest has an area of 14,176 ha. The altitude of the study area ranges from 2100 to 2730 m above sea level. The study area has a bimodal rainfall distribution: the main rainy season is from July to September and the short rainy season is from March to May [41]. The mean annual precipitation and temperature in the area are 1245 mm and 14.9 °C, respectively.
The forest area contains both natural and plantation forest types. The major species of the plantation forest compartments, which are mostly found at lower elevations, are Cupressus lusitanica, Pinus patula, Grevillea robusta and various Eucalyptus species. The natural forest has high tree species diversity. The dominant tree species observed in the natural forest include Syzygium guineense, Afrocarpus falcatus, Juniperus procera, Pittosporum viridiflorum, Maesa lanceolata, Millettia ferruginea, Croton macrostachyus and Maytenus arbutifolia. The objectives of the enterprise are to produce lumber and poles from the plantations and to conserve the natural forests. The natural forests are home to a wide range of wildlife species and are sources of water for downstream areas. Nevertheless, the forests are under severe pressure; illegal tree cutting and land-use change for settlement and farmland expansion are common problems in the area.
The forest has complex vertical and horizontal structures. Besides the species diversity, there is large variability in tree height and wood basic density in the study forest. The mean (and range) of observed tree height was 13.90 m (4.90–40.10 m), while the mean (and range) of wood basic density for tree species in the forest was 0.59 g cm⁻³ (0.43–0.98 g cm⁻³) [42].
2.2. Field Data Collection
The sampling frame was defined to include the Degaga-Gambo forest territory, which contains both natural and plantation forest types. Circular sample plots (SPs) with a radius of 17.85 m, laid out on a systematic grid with a spacing of 1.18 km, were used for field data collection (Figure 1). One hundred and eleven plots (in natural forests, plantation forests and other categories such as clear-cuts, cropland, settlements and grassland) were sampled from February 2018 to January 2019. A handheld global positioning system (GPS) receiver was used to navigate to the predefined SP locations. The precise coordinates of the plot centers were then determined using differential GPS and global navigation satellite system (GLONASS) measurements. Two Topcon legacy-E + 40 dual-frequency receivers were used for this purpose [43], one serving as a base station and the other as a rover field unit. The receivers recorded the pseudo-range and carrier phase of GPS and GLONASS signals.
The base station was set up on the campus of Wondo Genet College of Forestry and Natural Resources. The Euclidean distance between the base station and the plot centers ranged from 21.70 to 57.20 km, with an average of 41.80 km. To determine the position of the base station using precise point positioning, GPS and GLONASS data were recorded continuously for 24 h [44]. At the plot centers, the rover was mounted on a 2.98 m carbon rod and recorded for 41.50 min on average at a one-second logging rate. The recordings were post-processed using the Magnet Tools software [45]. The standard error of the post-processed planimetric plot coordinates ranged from 0.02 to 1.11 m, with a mean of 0.23 m.
In each SP, we recorded species names and measured the DBH, i.e., the diameter of trees at 1.3 m above the ground, of all trees with DBH ≥ 5 cm. A caliper or diameter tape was used for DBH measurement, depending on tree size. Tree height was measured for 10 trees selected systematically in each plot using a Haglöf Vertex Laser 5 instrument [46]. The heights of the remaining trees were predicted using height-diameter models developed from the sample trees [16,19,47].
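The form of the height-diameter models is not given here. As a minimal sketch, assuming a hypothetical log-linear model ln(H − 1.3) = a + b ln(DBH) fitted to the measured sample trees, imputing the missing heights could look like this. The model form and the function names are illustrative assumptions, not the models of [16,19,47].

```python
import numpy as np

BREAST_HEIGHT_M = 1.3  # heights are modelled as the excess over breast height

def fit_height_diameter(dbh_cm, height_m):
    """Fit the hypothetical model ln(H - 1.3) = a + b * ln(DBH) by least squares.

    All sample trees must be taller than 1.3 m. Returns (a, b).
    """
    dbh_cm = np.asarray(dbh_cm, dtype=float)
    height_m = np.asarray(height_m, dtype=float)
    # np.polyfit returns the highest-degree coefficient first: [b, a]
    b, a = np.polyfit(np.log(dbh_cm), np.log(height_m - BREAST_HEIGHT_M), deg=1)
    return a, b

def predict_height(dbh_cm, a, b):
    """Predict heights (m) for trees whose height was not measured.

    This sketch applies no log-retransformation bias correction.
    """
    dbh_cm = np.asarray(dbh_cm, dtype=float)
    return BREAST_HEIGHT_M + np.exp(a + b * np.log(dbh_cm))
```

In practice the model would be fitted to the ten measured trees per plot (or pooled by forest type) and then applied to every tree without a height measurement.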
2.3. Plot-Level AGB Estimation
Plot-level AGB was estimated by aggregating the predicted AGB of the individual trees in each plot. For predicting tree AGB in the natural forests, the allometric model constructed by [42] was used. This model has DBH, height and wood basic density as predictor variables. Wood basic density values were obtained from [48]. For plantation forests, tree AGB was estimated using species-specific allometric models. Accordingly, for Cupressus lusitanica, we used the model by [49], with DBH and height as predictor variables. For Eucalyptus species and Grevillea robusta, the models by [50] and [51], respectively, were used, both with DBH and height as predictor variables. Plot-level AGB, computed in kg m⁻² from the large plots (about 1000 m²), was converted to Mg ha⁻¹ (megagrams per hectare). The plot-level AGB values ranged from 0 to 845.70 Mg ha⁻¹, with a mean of 184.35 Mg ha⁻¹ and a standard deviation of 155.10 Mg ha⁻¹.
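The unit handling above can be checked with a short sketch. Because the allometric model of [42] is not reproduced here, the sketch uses the pantropical model of Chave et al. (2014), AGB = 0.0673 (ρ D² H)^0.976 (AGB in kg, ρ in g cm⁻³, D in cm, H in m), purely as a stand-in with the same three predictors; the function names are hypothetical. A 17.85 m radius gives a plot area of about 1001 m², and 1 kg m⁻² equals 10 Mg ha⁻¹.

```python
import math

PLOT_RADIUS_M = 17.85
PLOT_AREA_M2 = math.pi * PLOT_RADIUS_M ** 2  # about 1001 m^2
KG_PER_M2_TO_MG_PER_HA = 10.0  # 1 kg m^-2 = 10,000 kg ha^-1 = 10 Mg ha^-1

def tree_agb_kg(dbh_cm, height_m, wood_density_g_cm3):
    """Stand-in allometry (Chave et al. 2014 pantropical model), NOT the model of [42]."""
    return 0.0673 * (wood_density_g_cm3 * dbh_cm ** 2 * height_m) ** 0.976

def plot_agb_mg_ha(trees, plot_area_m2=PLOT_AREA_M2):
    """Sum tree AGB in one plot and express it in Mg ha^-1.

    trees: iterable of (dbh_cm, height_m, wood_density_g_cm3) tuples;
    a plot without trees (e.g. cropland or grassland) yields 0.
    """
    total_kg = sum(tree_agb_kg(d, h, rho) for d, h, rho in trees)
    return total_kg / plot_area_m2 * KG_PER_M2_TO_MG_PER_HA
```

The same footprint of about 1000 m² underlies the 31.64 × 31.64 m grid cells used later for population-level estimation (31.64² ≈ 1001 m²).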
2.4. Satellite Image Acquisition
Satellite images acquired in January 2019 were used because this is the dry season, when most of the undergrowth vegetation dries up and is easier to distinguish from the trees. This time window was also within the field inventory period. Additionally, only images with cloud cover < 5% were selected. A detailed description of the images used in this study is given in Table 1.
Single tiles of each of the L8 and S2 products were downloaded from the USGS Earth Explorer website [52]. The S2 image was a Level-1C product and the L8 image was an L1TP (Level-1 Precision and Terrain corrected) product; both product types are radiometrically calibrated and corrected for geometric and terrain-induced distortions before delivery. The SBs used in this study, i.e., blue (B), green (G), red (R), near-infrared (NIR) and shortwave infrared-1 (SWIR1) for both L8 and S2, and red-edge (RE) for S2 only, have spatial resolutions of 30 m for L8 and 10 or 20 m for S2 (see Table 1 for the resolutions of individual bands).
We downloaded the PS Ortho Scene Product (Level-3B) from the Planet Explorer website [53]. Six scenes of orthorectified, scaled top-of-atmosphere radiance (at-sensor) images were downloaded to cover the study area. These images contain the B, G, R and NIR SBs.
2.5. Image Processing and Independent Variable Definition
In the current study, we first evaluated a large number of candidate variables that could be useful for AGB modelling. A series of image processing steps was applied to the satellite images to derive the independent variables. First, atmospheric correction was performed using QGIS version 3.1.0 [54] and Python scripts. For the L8 and S2 images, the Semi-Automatic Classification Plugin (SCP) of QGIS was used to run the dark object subtraction (DOS1) algorithm, which corrects for atmospheric scattering by assuming that the darkest objects in a scene have a known, very low reflectance, so that any excess signal recorded for them can be attributed to atmospheric path radiance and removed from the whole image. The satellite images were converted from spectral radiance to top-of-atmosphere reflectance using the conversion factors in the metadata file delivered with the image files. The PS images, in contrast, were converted from radiance to reflectance using the empirical line correction given in Equation (1):
The radiances of the input images were converted to reflectance and atmospherically corrected because variables from multiple images were compared. Besides the differences between sensors, the three sets of images were acquired on different dates, although within at most 13 days of one another. Furthermore, six PS scenes were needed to cover the area of interest. After atmospheric correction, all images were Level-2A products, with pixels holding surface reflectance values suitable for calculating the SIs and texture variables used in this study. The atmospherically corrected SBs, which were used to create the SIs and texture variables shown in Table 2 and Table 3, respectively, were selected for this study.
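The exact form of Equation (1) is not recoverable from this text, so the sketch below shows two generic steps under stated assumptions: the standard Landsat 8 conversion of Level-1 quantized values to top-of-atmosphere reflectance using the MTL rescaling factors (REFLECTANCE_MULT_BAND_x, REFLECTANCE_ADD_BAND_x and SUN_ELEVATION), and a generic empirical line correction that fits reflectance as a linear function of radiance from calibration targets of known reflectance. The function names and the calibration-target inputs are hypothetical, and the DOS1 correction run through SCP is not reproduced.

```python
import math
import numpy as np

def landsat8_toa_reflectance(dn, refl_mult, refl_add, sun_elevation_deg):
    """Landsat 8 Level-1 DN -> TOA reflectance corrected for sun elevation.

    refl_mult, refl_add: per-band REFLECTANCE_MULT_BAND_x / REFLECTANCE_ADD_BAND_x
    values from the MTL metadata file; sun_elevation_deg: SUN_ELEVATION.
    """
    rho_prime = refl_mult * np.asarray(dn, dtype=float) + refl_add
    return rho_prime / math.sin(math.radians(sun_elevation_deg))

def fit_empirical_line(target_radiance, target_reflectance):
    """Fit reflectance = gain * radiance + offset from calibration targets."""
    gain, offset = np.polyfit(np.asarray(target_radiance, dtype=float),
                              np.asarray(target_reflectance, dtype=float), deg=1)
    return gain, offset

def apply_empirical_line(radiance, gain, offset):
    """Apply a fitted empirical line to a radiance band (one band at a time)."""
    return gain * np.asarray(radiance, dtype=float) + offset
```

The empirical line is fitted separately for each band; it only needs at least two targets spanning dark and bright reflectances.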
Table 2 shows the expressions used to derive spectral index values from each satellite image type used in this study, together with references to scientific evidence on the use of the indices in general and for biomass estimation in particular.
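Table 2 is not reproduced in this text. As a minimal sketch, assuming the standard textbook forms of several indices named in the Introduction (SR, DVI, NDVI, SAVI with a soil factor of 0.5, EVI, NDMI, GLI and an excess green index computed directly on reflectances), the indices could be computed from surface reflectance bands as follows. The expressions in Table 2 may differ in detail; for example, the excess green index is sometimes computed on normalized chromatic coordinates.

```python
import numpy as np

def spectral_indices(blue, green, red, nir, swir1=None, soil_l=0.5):
    """Compute common SIs from surface reflectance arrays (values in 0..1)."""
    b, g, r, n = (np.asarray(x, dtype=float) for x in (blue, green, red, nir))
    indices = {
        "SR": n / r,
        "DVI": n - r,
        "NDVI": (n - r) / (n + r),
        "SAVI": (1.0 + soil_l) * (n - r) / (n + r + soil_l),
        "EVI": 2.5 * (n - r) / (n + 6.0 * r - 7.5 * b + 1.0),
        "GLI": (2.0 * g - r - b) / (2.0 * g + r + b),
        "ExGI": 2.0 * g - r - b,
    }
    if swir1 is not None:  # SWIR1 exists for L8 and S2, but not for PS
        s = np.asarray(swir1, dtype=float)
        indices["NDMI"] = (n - s) / (n + s)
    return indices
```

Passing whole band arrays returns index rasters; passing plot-level values returns plot-level indices.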
Descriptions of the GLCM image texture variables derived from the PS images are presented in Table 3. Texture information from the L8 and S2 images was not used because of their coarse spatial resolutions. The Sentinel Application Platform (SNAP) software version 7.0.0 [69] was used to calculate the texture variables, with a processing window of 11 × 11 pixels, all directions, and probabilistic quantization with 128 grey levels. The window size was chosen so that the window covers an area equivalent to that of a field SP (11 × 3 m = 33 m per side, i.e., about 1089 m² compared with about 1000 m² for a plot).
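The GLCM computation for one processing window can be sketched in a few lines. This is a simplified illustration, not SNAP's implementation: it uses a linear (equal-distance) quantizer instead of SNAP's probabilistic quantizer, a pixel distance of 1, and pools symmetric co-occurrence counts over the four standard directions (0°, 45°, 90°, 135°) as a stand-in for "all directions". The feature subset is a common one and may not match Table 3 exactly.

```python
import numpy as np

# (row, col) offsets for 0, 45, 90 and 135 degrees at a pixel distance of 1
OFFSETS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]

def quantize(window, levels):
    """Linearly rescale a window to integer grey levels 0..levels-1 (assumed scheme)."""
    w = np.asarray(window, dtype=float)
    lo, hi = w.min(), w.max()
    if hi == lo:
        return np.zeros(w.shape, dtype=int)
    return np.minimum(((w - lo) / (hi - lo) * levels).astype(int), levels - 1)

def glcm(q, levels, offsets=OFFSETS):
    """Symmetric GLCM pooled over the given offsets, normalized to sum to 1."""
    p = np.zeros((levels, levels))
    rows, cols = q.shape
    for dr, dc in offsets:
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    p[q[r, c], q[r2, c2]] += 1
                    p[q[r2, c2], q[r, c]] += 1  # count both orders (symmetric)
    return p / p.sum()

def glcm_features(p):
    """A few common GLCM texture measures."""
    i, j = np.indices(p.shape)
    mean_i = (i * p).sum()
    nz = p[p > 0]
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "homogeneity": (p / (1.0 + (i - j) ** 2)).sum(),
        "entropy": -(nz * np.log(nz)).sum(),
        "ASM": (p ** 2).sum(),
        "mean": mean_i,
        "variance": ((i - mean_i) ** 2 * p).sum(),
    }

# Example: features for one 11 x 11 PS window quantized to 128 grey levels
# features = glcm_features(glcm(quantize(window, 128), 128))
```

Sliding this over every pixel of a band yields texture rasters; a production tool vectorizes the counting instead of looping in Python.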
Area-weighted means and standard deviations (hereafter referred to simply as means and standard deviations) of all variables were extracted for each SP using QGIS. These were used as independent variables in the models constructed from each RS data type, as explained in the following sections.
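A minimal sketch of the area-weighted statistics for one plot, assuming the pixel values and their overlap areas with the plot polygon have already been obtained (for example, from a vector-raster intersection). The weighted standard deviation uses area shares as weights and no small-sample correction, which may differ from the exact definition applied in QGIS; the function name is hypothetical.

```python
import numpy as np

def area_weighted_stats(values, overlap_areas):
    """Area-weighted mean and standard deviation of the pixels overlapping a plot.

    values: pixel values (e.g. an SI or texture measure);
    overlap_areas: area (m^2) of each pixel that falls inside the plot.
    """
    v = np.asarray(values, dtype=float)
    w = np.asarray(overlap_areas, dtype=float)
    mean = np.average(v, weights=w)
    sd = np.sqrt(np.average((v - mean) ** 2, weights=w))
    return mean, sd
```

Partially covered edge pixels therefore contribute in proportion to the share of their area inside the plot.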
2.6. Variable Selection and Model Fitting
The purpose of the AGB regression modelling was to construct models with RS variables as predictors that could be used to enhance the precision of the overall AGB estimates for the study area. For the AGB estimation, we used a model-assisted approach to inference (see details in Section 2.8) because it allows a direct comparison between the uncertainty of the RS-assisted AGB estimate and that of the pure field-based estimate. In model-assisted estimation, the model form and predictors should be determined independently of the sample at hand [70]. In model-assisted inference, no claim of a true model is necessary. A poor choice of model form and predictors reduces efficiency [71] (p. 238) but does not invalidate the unbiasedness of the estimator. If, however, the model form and predictors were chosen based on the sample, e.g., by selecting predictors that optimize the predictive power of the model for the sample at hand, there would be a risk of overfitting and of underreporting uncertainty [72].
Against this background, we faced a dilemma in this study. On the one hand, we had no prior information about which variables derived from the given RS data would be useful for AGB modelling in the forest types under study, nor any experience with suitable model forms for the study area. On the other hand, if model and variable selection were optimized for the given sample, overfitting would be a likely consequence.
To balance these conflicting requirements, we first screened the variables mentioned above to gain first-hand experience with the three types of satellite data for the current forest types. We then chose a model form a priori and allowed only a small number of predictors in the model. In the modelling phase, we paid special attention to any sign of overfitting.
Thus, in the first phase of the analysis, Pearson's correlation coefficient was used to explore the relationship of each independent variable with AGB. Variables significantly correlated with AGB were retained as candidates for AGB model fitting. Furthermore, correlations between each pair of independent variables within each satellite data source were computed to evaluate their intercorrelation. Most of the variables were strongly intercorrelated (Figure 2). Hence, variable screening was used to remove the redundant information carried by strongly intercorrelated variables. Initial analyses with more complex models showed overfitting, which manifested as a difference in precision between the training and validation results of each model. Such severe overfitting was observed for models with more than two variables. Because of this risk, we restricted each model to at most two independent variables. The results for models with more than two variables are not documented further.
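The two-step screening described above can be sketched as follows. Significance is expressed through the critical correlation r_crit = t / √(n − 2 + t²) of a two-sided t-test with n − 2 degrees of freedom. The intercorrelation cut-off (|r| > 0.9 here) is a hypothetical value because the text does not state one, and ranking candidates by their correlation with AGB before dropping redundant ones is also an assumption.

```python
import numpy as np

def critical_r(n, t_crit):
    """Smallest |r| that is significant in a two-sided t-test with n - 2 df."""
    return t_crit / np.sqrt(n - 2 + t_crit ** 2)

def screen_variables(X, y, names, t_crit, max_inter_r=0.9):
    """Keep variables significantly correlated with y, then drop any variable whose
    |r| with an already kept (more strongly correlated) variable exceeds max_inter_r."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    r_crit = critical_r(len(y), t_crit)
    r_y = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(-np.abs(r_y)):  # strongest correlation with AGB first
        if abs(r_y[j]) < r_crit:
            continue  # not significantly correlated with AGB
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_inter_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept], dict(zip(names, r_y))
```

For n = 111 plots and t ≈ 1.98, r_crit ≈ 0.19, so even moderately correlated variables pass the first step; the intercorrelation step then does most of the pruning.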
The relevant variables of each satellite data source were related to plot-level AGB using a generalized linear model (GLM) with a logarithmic link function of the form:

$$E(AGB) = \exp\left(\beta_0 + \sum_{i} \beta_i X_i\right) \quad (2)$$

where $AGB$ is the ground reference AGB (Mg ha⁻¹), $\beta_0$ is the intercept, $\beta_i$ is the coefficient of the independent variable $X_i$, and $i$ is the index of an individual independent variable.
This model form was chosen because AGB is a non-negative continuous variable that includes true zeros: the log link keeps predictions positive, and, unlike a regression on log-transformed AGB, it does not require zero observations to be removed or shifted. A study of AGB prediction using topographic variables in human-impacted tropical dry forest landscapes of Mexico indicated that GLM estimation improved predictions [73]. The means of the SBs and SIs of the L8 image were candidate independent variables for the L8 model. The means and standard deviations of the SBs and SIs of the S2 image were candidate independent variables for the S2 model. The means and standard deviations of the SBs, SIs and texture features of the PS bands were candidate independent variables for the PS model.
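The text specifies the log link but not the error distribution. As a sketch assuming a Gaussian family (constant variance), a GLM of the form E(AGB) = exp(β0 + Σ βi Xi) can be fitted by iteratively reweighted least squares (IRLS) with NumPy alone; statistical packages provide equivalent and more robust GLM routines. Plots with zero AGB need no special handling because only the predicted mean, not the response, is log-transformed. The function names are hypothetical.

```python
import numpy as np

def fit_glm_log_link(X, y, max_iter=100, tol=1e-10):
    """Fit E[y] = exp(b0 + X @ b) by IRLS, assuming a Gaussian family."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])  # design matrix with intercept
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        eta = A @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response (d eta / d mu = 1 / mu)
        w = mu ** 2              # working weights: (d mu / d eta)^2 / Var, Var constant
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(A * sw[:, None], z * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

def predict_glm_log_link(beta, X):
    """Predicted mean AGB; always positive because of the exp inverse link."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    return np.exp(beta[0] + X @ beta[1:])
```

With at most two predictors, as imposed above, X has one or two columns per satellite data source.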
2.7. Model Validation
We evaluated the performance of the models using leave-one-out cross-validation, which was also used to assess overfitting. Each model was evaluated in terms of the coefficient of determination (R²), root mean squared error (RMSE, %), mean deviation (MD, %) and Akaike information criterion (AIC), as determined by Equations (3)–(8). The AIC combines the maximized likelihood of the model with a penalty for the number of estimated parameters; it thus rewards goodness of fit while discouraging overfitting. When comparing models, the model with the smaller AIC is preferred.
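$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \quad (3)$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \quad (4)$$

$$RMSE\% = \frac{RMSE}{\bar{y}} \times 100 \quad (5)$$

$$MD = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)}{n} \quad (6)$$

$$MD\% = \frac{MD}{\bar{y}} \times 100 \quad (7)$$

$$AIC = -2 \ln L(\hat{\beta}) + 2k \quad (8)$$

where $y_i$ and $\hat{y}_i$ are the ground reference and predicted AGB (Mg ha⁻¹) in the ith SP; $\bar{y}$ is the mean ground reference AGB (Mg ha⁻¹) of all SPs; n is the sample size; $L(\hat{\beta})$ is the likelihood function of the observations evaluated at $\hat{\beta}$, the maximum likelihood estimate of the parameters β; and k is the number of parameters in the model.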
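The leave-one-out procedure and the validation metrics can be sketched as below. The sketch assumes residuals defined as observed minus predicted AGB and, for the AIC, a Gaussian likelihood evaluated at the maximum likelihood variance RSS/n, with k counting all estimated parameters including that variance. `fit` and `predict` are hypothetical callables standing in for whichever model is being validated.

```python
import numpy as np

def loocv_predictions(X, y, fit, predict):
    """Leave-one-out predictions: refit without plot i, then predict plot i."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    preds = np.empty_like(y)
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        params = fit(X[keep], y[keep])
        preds[i] = predict(params, X[i:i + 1])[0]
    return preds

def validation_metrics(y, y_hat):
    """R^2, RMSE% and MD% relative to the mean observed AGB; residuals are y - y_hat."""
    y = np.asarray(y, dtype=float)
    resid = y - np.asarray(y_hat, dtype=float)
    y_bar = y.mean()
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y_bar) ** 2)
    rmse = np.sqrt(np.mean(resid ** 2))
    md = np.mean(resid)
    return {"R2": r2, "RMSE%": 100.0 * rmse / y_bar, "MD%": 100.0 * md / y_bar}

def gaussian_aic(resid, k):
    """AIC = -2 ln L + 2k for a Gaussian model at the ML variance RSS / n."""
    resid = np.asarray(resid, dtype=float)
    n = resid.size
    rss = np.sum(resid ** 2)
    return n * (np.log(2.0 * np.pi * rss / n) + 1.0) + 2.0 * k
```

Comparing `validation_metrics` on the fitted values with the same metrics on the leave-one-out predictions is one way to expose the training/validation gap that signals overfitting.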
In addition to the validation metrics above, we carried out a qualitative evaluation by visually comparing the predictions of the selected model for each satellite data source with a false-color composite of the S2 image (NIR, R and G bands displayed in the R, G and B channels).
2.8. Population-Level Estimation and Efficiency Assessment
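Based on the SP inventory data, i.e., a sample of 111 plots of about 1000 m² each, the mean AGB of the population and the variance of this estimator were estimated by Equations (9) and (10), respectively [71]:

$$\hat{\mu}_{F} = \frac{1}{n}\sum_{i=1}^{n} y_i \quad (9)$$

$$\widehat{Var}(\hat{\mu}_{F}) = \frac{\sum_{i=1}^{n} (y_i - \hat{\mu}_{F})^2}{n(n-1)} \quad (10)$$

where $y_i$ is the AGB (Mg ha⁻¹) of the ith SP in the sample and n is the sample size.

The 95% confidence interval (CI) of $\hat{\mu}_{F}$ was calculated using Equation (11):

$$CI = \hat{\mu}_{F} \pm t \cdot SE(\hat{\mu}_{F}) \quad (11)$$

where $SE(\hat{\mu}_{F}) = \sqrt{\widehat{Var}(\hat{\mu}_{F})}$ is the standard error (SE) of $\hat{\mu}_{F}$ and t is Student's t at a significance level of 0.05.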
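The field-based estimator of the mean, its variance, SE and CI translate directly into code. This sketch treats the systematic plot grid as a simple random sample, applies no finite population correction (111 plots are a tiny fraction of the 141,604 population units), and takes the Student's t quantile as an argument (about 1.98 for 110 degrees of freedom) to stay within the Python standard library. The function name is hypothetical.

```python
import math

def field_based_estimate(y, t_crit):
    """Mean AGB, estimated variance of the mean, SE and CI from the plot sample."""
    n = len(y)
    mean = sum(y) / n
    var_mean = sum((yi - mean) ** 2 for yi in y) / (n * (n - 1))
    se = math.sqrt(var_mean)
    return {"mean": mean, "var": var_mean, "se": se,
            "ci": (mean - t_crit * se, mean + t_crit * se)}
```

Because the variance of the mean equals the sample variance divided by n, the reported standard deviation of 155.10 Mg ha⁻¹ with n = 111 implies an SE of about 155.10 / √111 ≈ 14.7 Mg ha⁻¹.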
Similarly, we estimated the mean AGB for the entire study area using the selected regression model for each satellite data source. For this purpose, the study area was tessellated into grid cells of 31.64 × 31.64 m, giving a total of N = 141,604 population units. The grid cell size was chosen to match the area of the SPs. Area-weighted means and standard deviations of the variables used in the regression models were extracted for each grid cell using QGIS. AGB was predicted for each population unit i of the tessellated map using the selected regression model for each satellite data source and is denoted $\hat{y}_i$. Because the prediction relied on field data collected by probability sampling within the population of interest, we adopted the generalized regression (GREG) model-assisted estimator. The mean and its variance were estimated using Equations (12) and (13), respectively [71] (p. 231):

$$\hat{\mu}_{RS} = \frac{1}{N}\sum_{i=1}^{N} \hat{y}_i + \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i) \quad (12)$$

where $\hat{\mu}_{RS}$ is the remote sensing-assisted estimate of mean AGB (from L8, S2 or PS). The first term of this estimator, $\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i$, is the mean of the model predictions $\hat{y}_i$ over all population units, and the second term, $\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)$, is an estimate of the mean model error computed over the sample units, which compensates for systematic model prediction errors.

$$\widehat{Var}(\hat{\mu}_{RS}) = \frac{\sum_{i=1}^{n} (\hat{\varepsilon}_i - \bar{\varepsilon})^2}{n(n-1)} \quad (13)$$

where $\hat{\varepsilon}_i = y_i - \hat{y}_i$ and $\bar{\varepsilon}$ are the estimated error for sample unit i and the average error, respectively.
The SEs of the field-based and remote sensing-assisted mean AGB estimators were calculated as the square roots of the respective variance estimators.
The study assessed the gain in precision of AGB estimation from using each of the three types of RS data. This gain over the pure field-based estimate was expressed as the relative efficiency (REf), computed by Equation (14) as the ratio of the estimated variance of the field-based estimate of mean AGB to that of the remote sensing-assisted estimate:

$$REf = \frac{\widehat{Var}(\hat{\mu}_{F})}{\widehat{Var}(\hat{\mu}_{RS})} \quad (14)$$

where $\hat{\mu}_{F}$ and $\hat{\mu}_{RS}$ denote the field-based and remote sensing-assisted estimators of mean AGB, respectively. An REf greater than one indicates that the RS data improved precision. Because the variance of the field-based estimator is inversely proportional to the sample size, REf can also be read as the factor by which the field sample would have to be enlarged to reach the same precision without RS data; e.g., REf = 2 corresponds to roughly doubling the number of plots.
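The model-assisted estimator of the mean, its variance and the relative efficiency can be sketched as below. The sketch assumes the field plots are treated as a simple random sample and that residuals come from the model's predictions for the sample plots; the function names are hypothetical.

```python
import numpy as np

def greg_estimate(pred_population, y_sample, pred_sample):
    """Model-assisted (GREG-type) estimate of mean AGB and its estimated variance.

    pred_population: model predictions for all N grid cells;
    y_sample, pred_sample: field AGB and model predictions for the n plots.
    """
    pred_population = np.asarray(pred_population, dtype=float)
    resid = np.asarray(y_sample, dtype=float) - np.asarray(pred_sample, dtype=float)
    n = resid.size
    mean = pred_population.mean() + resid.mean()  # mean prediction + bias correction
    var_mean = np.sum((resid - resid.mean()) ** 2) / (n * (n - 1))
    return mean, var_mean

def relative_efficiency(var_field, var_rs):
    """REf > 1 means the RS-assisted estimator is more precise than the field-only one."""
    return var_field / var_rs
```

In this study's setting, `pred_population` would hold the 141,604 grid-cell predictions and the two sample arrays the 111 plots.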
5. Conclusions
Optical RS images from the L8, S2 and PS satellites were studied to identify relevant RS predictor variables that could be used to enhance AGB estimation in a dry Afromontane forest. Most of the SBs, and some SIs and texture variables (listed in Table 4), were found to be promising predictors of AGB. Although some of them were not selected in the models used for assisting AGB estimation, we identified variables, including the mean of GLI, ExGI and NDGI, that have seldom been used for AGB modelling but are highly correlated with AGB. We recommend a detailed investigation of the importance of these variables for AGB assessment in various forest conditions.
The simple models selected for each satellite data source enhanced AGB estimation. Among the model variables, the SWIR1 SB, which the PS data lack, was useful in the L8 and S2 models for AGB estimation in this forest type, despite the large differences in pixel resolution among the image types. This suggests that the additional spectral information of the L8 and S2 images was more important for AGB estimation than the small pixel size of the PS images.
The use of RS data improved the precision of the AGB estimates. Thus, the remote sensing-assisted estimation techniques used in this study can complement FBSS estimates of AGB by improving precision; equivalently, model-assisted estimation can reduce the sample size required to achieve the same precision as a pure field survey. However, the models used for AGB estimation in this study showed saturation problems. Future studies should therefore address these limitations by combining different data sources to improve the efficiency of AGB estimation beyond that achieved in the current study.
The methods used in this study could be applied in similar forests where RS data have seen limited use. The potential predictor variables derived from optical satellite images for biomass estimation were identified from studies reporting experience worldwide. Exploratory data analysis was used to identify relevant predictor variables for biomass estimation at the current study site. Choosing an appropriate model form for biomass required understanding the characteristics of the data. The models selected for each image type predicted biomass with estimation efficiencies comparable to those obtained in other forest types. Together, these methods form a distinctive combination of techniques for using satellite images to estimate biomass in a data-scarce forest type.