1. Introduction
The Sudano-Sahelian woodlands occupy vast areas between the Saharan desert and the moist forests of the Guinean zone [1,2]. Woodland tree cover is an essential element of local livelihoods, in particular through agroforestry practices [3], fuelwood and timber extraction, and the provision of non-wood products (such as food, fodder, and medicine). The large areal extent of these woodlands also makes them an important component of the global climate system, as they sequester and store substantial amounts of carbon in woody biomass and soils [4,5,6]. At present, these woodlands are subject to increasing pressure from intensified land use [7] and climate change [8]. Local case studies based on field assessments and high resolution remote sensing data have shown that these factors have resulted in decreased tree density, carbon stocks, and floristic diversity [9,10,11]. Yet other local case studies show that tree cover conditions have improved substantially since the severe droughts that hit the area in the 1970s and 1980s [12]. Such improvements are generally attributed to increased rainfall or farmer-managed natural regeneration, with notable cases found in northern Burkina Faso [13] and southern Niger [14,15]. Given these divergent research findings and the importance of trees for local livelihoods, timely information on the extent and condition of woodlands, including agroforestry landscapes, is of great interest to a number of local actors, such as researchers, natural resource managers, and forestry industries [2].
In this paper, we evaluate the potential of Landsat 8 imagery to map two attributes commonly used to characterize tree cover structure and condition, namely tree canopy cover (TCC) and aboveground biomass (AGB). Quantifying TCC and AGB through field surveys at spatial scales relevant for natural resource monitoring (e.g., the landscape scale) is time-consuming and costly. Furthermore, the application of robust sampling strategies in woodlands is complicated by the heterogeneous landscape composition and the variable tree cover structure [1]. A large body of research has explored the potential of various satellite systems to provide remote sensing based observations of TCC and AGB at a range of spatial scales [16,17,18,19]. Optical satellite data of medium spatial resolution, such as Landsat imagery, are favorable when the objective is to map and monitor large areas over decadal time scales while retaining a relatively high degree of spatial detail and minimizing data acquisition costs.
Pixel size is of particular importance when remote sensing data are used in fragmented landscapes where tree cover may alternate from open to closed canopy within short distances [20]. Several factors contribute to the reflected radiance recorded by the sensor, and these pose challenges for mapping tree cover in landscapes with an open canopy, such as the Sudano-Sahelian woodlands [21]. Important factors include the heterogeneous spectral characteristics of soil and bedrock [22], the spectral similarity between different vegetation types [23,24], and the high seasonal and annual variability in vegetation development, which is species dependent and related to water availability [25]. Assessments have repeatedly shown that global tree cover products, such as the Vegetation Continuous Fields [26] derived from the Moderate Resolution Imaging Spectroradiometer (MODIS), have significant limitations in characterizing areas with an open tree canopy [21,27,28,29].
Some environmental characteristics of woodlands may represent opportunities for using optical satellite data to map TCC and AGB. For example, saturation of spectral vegetation indices, such as the Normalized Difference Vegetation Index (NDVI), is less of a problem when relating spectral data to TCC and AGB in open tree cover conditions than in closed forests [16]. The relatively open canopy of woodlands does not obscure low-growing trees to the same extent as in dense forests, where such trees can represent 30%–50% of the AGB [30]. Previous research also suggests that the correlation between tree cover attributes, in particular AGB, and image texture is higher in open than in closed canopies [31,32,33,34]. Eckert [33] hypothesized that texture is highly correlated with AGB in open canopy conditions because it captures the shadow structures cast by large trees, which may contain up to 80% of the AGB in woodland landscapes [35]. Lastly, trees in the seasonal tropics have phenological traits that contrast with those of other vegetation and that may be identified using multi-temporal satellite data [36,37].
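The NDVI referred to above is the normalized difference of near-infrared (NIR) and red reflectance; for the Landsat 8 OLI these are bands 5 and 4. The minimal sketch below computes the index and, using hypothetical reflectance values, illustrates the diminishing response that underlies saturation: equal increases in NIR reflectance produce progressively smaller NDVI gains. It is an illustration only, not the processing chain used in this study.

```python
def ndvi(red: float, nir: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    total = nir + red
    if total == 0:
        raise ValueError("red + nir must be non-zero")
    return (nir - red) / total


# Hypothetical TOA reflectances: red held constant while NIR rises in equal steps,
# as it might with increasing green canopy.
red = 0.05
nir_steps = [0.20, 0.30, 0.40]
values = [ndvi(red, nir) for nir in nir_steps]

# Successive NDVI gains shrink although each NIR step is the same size,
# i.e. the index becomes less sensitive as it approaches its upper bound of 1.
gains = [b - a for a, b in zip(values, values[1:])]
```

In open woodlands, where canopy cover rarely pushes NDVI toward that upper bound, this loss of sensitivity matters less than in closed forests.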
The Operational Land Imager (OLI) onboard Landsat 8 has several improvements over its predecessors, the Thematic Mapper (TM; Landsat 4 and 5) and the Enhanced Thematic Mapper Plus (ETM+; Landsat 7). The main changes include an increased number of spectral bands, a higher radiometric resolution (12 bits), and an improved signal-to-noise ratio resulting from the use of a push-broom sensor [38]. These improvements may enable higher accuracy in the mapping of tree cover attributes, including AGB [39]. The continuity and open data policy of the Landsat program also enable the use of image time series, which have shown great promise for large area mapping of tree cover attributes in boreal forests [40,41]. Thus, Landsat 8 represents an interesting data source for remote sensing based tree cover mapping, but its use has not yet been evaluated in the Sudano-Sahelian woodlands.
The estimation of tree cover attributes from remote sensing data involves modeling the relation between the response variable Y (e.g., local reference measurements of TCC or AGB) [42] and the predictor variables X_n (e.g., remotely sensed reflectance). Parametric ordinary least squares (OLS) regression has been the most common choice for fitting the relation between X and Y [43], and the fitted equation enables prediction of the tree cover attribute over the extent of the satellite imagery. An alternative to statistical regression is provided by non-parametric machine learning techniques, or algorithmic modeling [44]. During the last decade, machine learning techniques such as support vector machines [45], decision trees [46], and Random Forest [47] have been increasingly used for both classification and relationship modeling with remote sensing data. These techniques tend to outperform commonly used statistical regression models (e.g., OLS regression) in terms of the accuracy with which TCC and AGB are predicted from remote sensing data [19,48,49].
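To make the contrast concrete, the sketch below fits the simplest OLS case: a single predictor (for instance a vegetation index) regressed against a reference attribute such as TCC, using the closed-form least squares estimates. It is a generic illustration under that single-predictor assumption, not the modeling used in this study, which relied on RF.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def ols_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*x by ordinary least squares; return (intercept a, slope b)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_predict(intercept: float, slope: float, x: Sequence[float]) -> List[float]:
    """Apply the fitted line to new predictor values (e.g. every image pixel)."""
    return [intercept + slope * xi for xi in x]
```

The fitted model is a single global line, which is what makes it parametric; tree-based methods such as RF instead partition the predictor space and can represent non-linear and interacting responses without a prescribed functional form.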
The aim of this study was to assess the utility of Landsat 8 imagery for mapping TCC and AGB in a Sudano-Sahelian woodland landscape. The Random Forest (RF) algorithm [47] was used to identify important predictor variables and for predictive modeling. Our methodology comprised three main steps: (a) assembly of a reference dataset from field data and WorldView-2 imagery; (b) identification of important predictor variables from Landsat 8 data; and (c) RF modeling of TCC and AGB as a function of the predictor variables. Three types of predictor variables were assessed for their effectiveness in capturing woodland tree cover characteristics: spectral, texture, and phenology variables. The spectral variables included top of atmosphere (TOA) reflectance values of the Landsat 8 bands (bands 2 to 8), tasseled cap components [50,51,52], and a set of vegetation indices. Texture variables were calculated using the gray level co-occurrence matrix (GLCM) approach [53]. Phenology variables were derived from a dry season NDVI time series [28]. The potential benefit of including phenology variables was assessed by comparing RF models based on multi-temporal and single-date imagery.
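GLCM texture measures summarize how often pairs of gray levels co-occur at a given pixel offset within a window. As a minimal sketch, the code below builds a normalized, symmetric GLCM for one offset and computes homogeneity, Σ P(i, j) / (1 + (i − j)²), the form used by, for example, scikit-image. The window size, number of gray levels, and offsets used in this study are not stated in this section, so the defaults here (a single horizontal offset of one pixel on an already quantized window) are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def glcm(window: List[List[int]], dx: int = 1, dy: int = 0,
         symmetric: bool = True) -> Dict[Tuple[int, int], float]:
    """Normalized co-occurrence probabilities P(i, j) of gray levels at offset (dx, dy)."""
    counts: Counter = Counter()
    rows, cols = len(window), len(window[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[(i, j)] += 1
                if symmetric:  # count the pair in both directions
                    counts[(j, i)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("window too small for the requested offset")
    return {pair: n / total for pair, n in counts.items()}


def homogeneity(p: Dict[Tuple[int, int], float]) -> float:
    """GLCM homogeneity: close to 1 when co-occurring gray levels are similar."""
    return sum(prob / (1 + (i - j) ** 2) for (i, j), prob in p.items())
```

In practice the measure would be computed in a moving window over the quantized band, so each Landsat pixel receives the homogeneity of its neighborhood; lower values indicate more contrasting local patterns, such as scattered crowns and shadows against bright soil might produce.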
4. Discussion
Several assessments have shown that global tree cover products based on satellite data have clear limitations for characterizing areas where the tree canopy is open [21,27,28,29]. Improved approaches are therefore needed to enable the collection of accurate spatial information on key tree cover attributes, including TCC and AGB, in areas such as the Sudano-Sahelian woodlands. To our knowledge, this is the first study to map TCC and AGB in this region using the Landsat 8 sensor and multi-temporal imagery. We showed that spatially detailed and reasonably accurate maps of TCC and AGB can be derived from freely available Landsat 8 imagery. The coefficient of determination (R²) between Landsat 8 based predictions and the reference data reached 0.77 for TCC (RMSE = 8.9%) and 0.57 for AGB (RMSE = 17.6 tons∙ha⁻¹). The relative RMSE was high for AGB (66%) and lower for TCC (40.6%); however, the mean values of AGB and TCC are low, and the values span a wide range within the study area. The accuracy of the maps was assessed at plot level using 10-fold cross-validation. If TCC and AGB estimates from the models were aggregated over larger areas, the errors would likely be lower, because random plot-level errors partly cancel out when averaged.
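The accuracy figures above can be computed from paired reference and predicted values. The sketch below computes RMSE, relative RMSE, and R², and generates 10-fold cross-validated predictions for any model passed in as a function. Two assumptions are flagged: relative RMSE is taken as RMSE divided by the mean reference value, and R² as 1 − SSE/SST (some studies instead report the squared Pearson correlation between predictions and reference data). Under the first assumption, the reported values imply mean reference values of roughly 22% TCC (8.9/0.406) and 27 tons∙ha⁻¹ AGB (17.6/0.66), consistent with the low means noted above. The `fit_predict` argument is a hypothetical callable standing in for, e.g., training and applying an RF model.

```python
import math
import random
from statistics import fmean
from typing import Callable, List, Sequence


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between reference and predicted values."""
    if len(obs) != len(pred) or not obs:
        raise ValueError("obs and pred must be non-empty and equally long")
    return math.sqrt(fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))


def relative_rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """RMSE as a percentage of the mean reference value (assumed definition)."""
    return 100.0 * rmse(obs, pred) / fmean(obs)


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SSE/SST."""
    mean_obs = fmean(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_predictions(X: Sequence, y: Sequence[float], fit_predict: Callable,
                   k: int = 10, seed: int = 0) -> List[float]:
    """Out-of-fold predictions: every plot is predicted by a model that never saw it."""
    pred: List[float] = [math.nan] * len(y)
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        fitted = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        for i, p in zip(test, fitted):
            pred[i] = p
    return pred
```

The metrics are then applied to the reference values and the out-of-fold predictions, so that every plot contributes exactly one independent test prediction.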
The observed prediction errors highlight the uncertainties and limitations associated with mapping tree cover attributes using optical remote sensing. A main problem of using optical imagery in areas with an open tree canopy is that the understory vegetation and soil contribute to the spectral signal and therefore make the relationship between tree cover and the remote sensing data less predictable [22]. In particular, bright soil types, such as those found in the study area, have been shown to negatively affect the prediction of tree cover attributes from optical remote sensing data [21]. An additional complicating factor is that woodland tree cover in general, and in the study area in particular, is composed of a relatively large number of tree species [2] whose leaves and canopies partly differ in their spectral properties [28].
We aimed to account for the contribution from understory vegetation by using imagery from periods when the phenological differences between trees and grasses/crops are largest [28,37,89], including the early wet season and the dry season. However, the understory vegetation in the study area also includes a considerable component of shrubs and tree coppice, which contribute to the spectral signal. The reference dataset therefore does not fully represent this woody component, because (i) only trees with DBH ≥ 5 cm were surveyed in the field and (ii) tree crown delineation in the WorldView-2 imagery is more likely to omit small trees [57]. A complete sampling of all woody vegetation in the field plots would require substantially more time and resources, which were not available in this study. A compromise between limited resources and field data completeness could be to use a nested inventory design where different types of woody vegetation are surveyed in small sub-plots [90]. A further potential limitation of the reference dataset that may cause prediction errors is the use of allometric equations to obtain plot-level AGB from individual tree attributes (i.e., height, DBH, and crown area). Wherever possible, we used species-specific equations developed in areas with environmental conditions similar to those of the study area. However, the availability of species-specific equations is limited in Africa [91], and the pan-tropical allometric equation by Chave et al. [68] was therefore used for 28 of the tree species (42% of the field data). Furthermore, our approach to estimating AGB from tree crowns delineated in WorldView-2 imagery involves two uncertainties. Firstly, the crown delineation in the WorldView-2 image includes errors, especially for small trees [57]. Secondly, the allometric equation used for estimating AGB from crown area was developed for Vitellaria paradoxa and may therefore not be optimal for other tree species. The relationship between crown area and AGB is further complicated by the pollarding of trees, which is a common practice in the region [92,93].
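Plot-level AGB is obtained by applying an allometric equation to each surveyed tree and scaling the sum to a per-hectare value. The sketch below shows that aggregation with the DBH ≥ 5 cm threshold used in the field survey. For illustration it assumes the pan-tropical model published by Chave et al. in 2014, AGB = 0.0673 × (ρD²H)^0.976 (AGB in kg, wood density ρ in g∙cm⁻³, D in cm, H in m); whether this is the exact form of reference [68] is not confirmed here, and neither the species-specific equations nor the crown-area equation for Vitellaria paradoxa are reproduced. Plot size and tree values are hypothetical; the `allometry` parameter is where a species-specific equation would be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Tree:
    dbh_cm: float        # diameter at breast height (cm)
    height_m: float      # total tree height (m)
    wood_density: float  # oven-dry wood density (g/cm^3)


def chave2014_agb_kg(tree: Tree) -> float:
    """Pan-tropical allometry assumed for illustration: 0.0673 * (rho * D^2 * H)^0.976."""
    return 0.0673 * (tree.wood_density * tree.dbh_cm ** 2 * tree.height_m) ** 0.976


def plot_agb_t_per_ha(trees: Iterable[Tree], plot_area_m2: float,
                      min_dbh_cm: float = 5.0,
                      allometry: Callable[[Tree], float] = chave2014_agb_kg) -> float:
    """Sum per-tree AGB (kg) for trees at or above the DBH threshold and convert to t/ha."""
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    total_kg = sum(allometry(t) for t in trees if t.dbh_cm >= min_dbh_cm)
    return (total_kg / 1000.0) / (plot_area_m2 / 10_000.0)
```

Any error in the chosen equation propagates directly into the plot-level reference values, which is why the reliance on a generic equation for 42% of the field data is listed as a limitation.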
To reduce the effect of potential spatial misregistration between remote sensing data and reference data, one suggestion is to average the remote sensing data within a window (e.g., 3 × 3) of pixels [16,31,39]. However, the spatial variation in tree cover properties is extremely high in woodlands and parklands, and such an approach was therefore not suitable for this study. Instead, we extracted the remote sensing predictor variables from individual Landsat pixels. This approach is heavily dependent on the spatial correspondence between the remote sensing data and the reference dataset. We estimate that the geolocation error of the Landsat 8 imagery is less than half a panchromatic pixel (i.e., 7.5 m), which gives confidence in the approach used in this study. Similarly accurate spatial registration of Landsat 8 was also reported by Zandler et al. [94].
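The difference between the two extraction strategies can be sketched as follows: a plot's map coordinates are converted to the row and column of the containing pixel in a north-up grid, and the predictor value is read either from that single pixel (the approach used here) or as the mean of a 3 × 3 window. The grid origin, pixel size, and raster values are hypothetical; a real workflow would take them from the image's georeferencing metadata.

```python
import math
from statistics import fmean
from typing import List, Tuple


def world_to_pixel(x: float, y: float, x_origin: float, y_origin: float,
                   pixel_size: float) -> Tuple[int, int]:
    """Row/column of the pixel containing map point (x, y); origin is the upper-left corner."""
    col = math.floor((x - x_origin) / pixel_size)
    row = math.floor((y_origin - y) / pixel_size)  # rows increase southwards
    return row, col


def extract(raster: List[List[float]], row: int, col: int, window: int = 1) -> float:
    """Value of one pixel (window=1) or the mean of a window x window neighborhood."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    if not (half <= row < len(raster) - half and half <= col < len(raster[0]) - half):
        raise IndexError("window extends beyond the raster")
    return fmean(raster[r][c]
                 for r in range(row - half, row + half + 1)
                 for c in range(col - half, col + half + 1))
```

With a single tree-dense pixel surrounded by open land, the 3 × 3 mean dilutes that pixel's value ninefold, which illustrates why window averaging suits homogeneous forests better than heterogeneous parklands. A plot centered in a 30 m pixel stays inside it under a 7.5 m shift, although plots near pixel edges can still be displaced into a neighbor.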
4.1. Relationships between Predictor Variables and Tree Cover Attributes
The panchromatic band proved to be the most important variable for predicting both TCC and AGB, ranking above all of the vegetation indices. The observed strong inverse relationship between the panchromatic band and the tree cover attributes suggests two things. Firstly, the image acquisition date in early June (i.e., early wet season) provided good contrast between tree cover and background components owing to the low growth activity of grasses, crops, and shrubs [95]. Specifically, the foliage of Sudano-Sahelian tree species is known to develop before the regrowth of the herbaceous vegetation [96,97]. Secondly, the 15 m panchromatic pixels seemed better suited to capture the reflectance contributions from trees, which may be mixed with other cover (e.g., trees and grass) in the larger 30 m Landsat multispectral pixels when the tree canopy is open [28]. This interpretation is reinforced by the finding that image texture derived from the panchromatic band also proved useful for predicting both TCC and AGB. The relatively strong relationship between image texture, in particular the gray level co-occurrence matrix (GLCM) homogeneity, and tree cover attributes found in this study agrees with previous research suggesting that image texture is particularly useful in areas where the tree canopy is open [31,32,33,34,98]. In addition to the panchromatic band, tasseled cap components adapted to Landsat 8 [52] proved to be important for predicting tree cover attributes; greenness and wetness were strongly related to TCC and AGB, respectively. Greenness measures the amount of green vegetation by quantifying the contrast between the NIR band and the visible bands, which results from the spectral properties of leaf cellular structure and plant pigments. The better performance of greenness in predicting TCC compared with the other vegetation indices can be explained by its inclusion of a mechanism to account for soil reflectance [50,51]. Soil reflectance can be highly variable in the Sudano-Sahelian zone and has been shown to complicate relationships between vegetation indices and vegetation properties [99]. Individual SWIR bands have been shown to be sensitive to vegetation water content [100,101]. Wetness contrasts the SWIR bands against the visible and NIR bands in order to isolate the reflectance contribution from water content in leaves and soil [51]. Previous research has found wetness and SWIR bands to be among the most important variables for predicting forest structure, including AGB, in various types of environments [28,102,103,104]. The same pattern is seen in the present study, where the importance of wetness and SWIR bands is more pronounced when predicting AGB than when predicting TCC.
The inclusion of phenology variables generally improved the predictions of TCC and AGB; the product of dry season NDVI was included in the best TCC model (Figure 6), while the median of dry season NDVI was included in the best AGB models (Figure 7). The decreases in RMSE for both TCC (−9.1%) and AGB (−9%) suggest that the dry season NDVI time series contains additional information, related to phenology and to seasonal differences in soil moisture, that facilitates the separation between tree cover and background components. These results are promising, but further research will be required to investigate the mechanisms underlying this observation and to optimize the procedure for the Sudano-Sahelian woodland landscape. For example, the dry season time series could be compared with climate data, field observations of phenological events, and temporal profiles from MODIS in order to better understand the Landsat 8 phenology variables. We used NDVI to characterize vegetation during the dry season, but other remote sensing variables could be used. The results from this study suggest that tasseled cap greenness and wetness are potential candidates for phenology variables because of their stronger relationships with TCC and AGB. The temporal definition of the dry season and the frequency of image acquisition during this period may also merit further research.
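Per-pixel phenology variables such as the median and the product of dry season NDVI reduce an image time series to single values. A minimal sketch, assuming that cloud-masked or missing acquisitions are supplied as None or NaN and simply skipped (the handling of gaps in this study is not described in this section):

```python
import math
from statistics import median
from typing import Dict, Optional, Sequence


def dry_season_metrics(ndvi_series: Sequence[Optional[float]]) -> Dict[str, float]:
    """Summarize one pixel's dry season NDVI time series."""
    valid = [v for v in ndvi_series if v is not None and not math.isnan(v)]
    if not valid:
        raise ValueError("no valid observations in the time series")
    return {
        # Robust central value of the dry season.
        "median": median(valid),
        # For NDVI in (0, 1), high only if NDVI stays high on every valid date.
        "product": math.prod(valid),
    }
```

Because the product depends on the number of valid dates, pixels with different numbers of cloud-free acquisitions may not be directly comparable, which is one reason the temporal definition of the dry season and the acquisition frequency matter.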
4.2. Random Forest Regression and Variable Selection
We used the error rate calculated from the out-of-bag (OOB) data to perform variable selection with RF and assessed its effect on the predictive performance of the resulting models. Previous research has shown this to be a statistically sound and efficient approach, because the OOB data provide reliable internal estimates of error rate compared with results derived from 10-fold cross-validation [84,105,106]. The results show that variable selection improved predictions of both TCC and AGB. This finding is in line with previous related remote sensing research [84,86,106] and suggests that the effect of variable selection should be evaluated when RF is used for predicting tree cover attributes from remote sensing data. A plausible explanation for the better performance of the reduced models is that the mechanisms of RF only partly block the influence of noisy predictor variables [106].
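One common way to implement OOB-based variable selection is backward elimination: fit the model, drop the least important variable, refit, and keep the subset with the lowest OOB error. The sketch below captures only that loop; `oob_error` and `importance` are hypothetical callables that would wrap an RF fit (for example, returning the OOB mean squared error and a permutation importance per variable), and the exact elimination schedule used in this study may differ, e.g. by removing a fraction of the variables at each step.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_elimination(
    variables: Sequence[str],
    oob_error: Callable[[List[str]], float],
    importance: Callable[[List[str]], Dict[str, float]],
) -> Tuple[List[str], float]:
    """Iteratively drop the least important variable; return the subset with the lowest OOB error."""
    current = list(variables)
    best_subset, best_err = list(current), oob_error(current)
    while len(current) > 1:
        scores = importance(current)  # importance recomputed on the current variable set
        weakest = min(current, key=lambda v: scores[v])
        current.remove(weakest)
        err = oob_error(current)
        if err < best_err:
            best_subset, best_err = list(current), err
    return best_subset, best_err
```

Since the same OOB error is used both to choose the subset and to judge it, the selected model's OOB error is somewhat optimistic; an outer cross-validation around the whole selection loop gives a less biased estimate.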
RF regression has several advantages for modeling remote sensing data [47], but also limitations. In this study, RF consistently appeared to overestimate low values and underestimate high values, which partly explains the absence of overall bias in the TCC and AGB predictions. This effect was most pronounced for AGB predictions and is due both to properties of the algorithm and to characteristics of the reference data. The final prediction from an RF model is the average of the predictions of individual trees, each grown from a bootstrap sample [47]. If the reference dataset contains too few extreme values, they might be consistently underrepresented in the tree construction, and RF predictions may therefore be biased towards the mean value. This property of the RF algorithm needs specific attention when reference data are collected. Specifically, the reference data need to cover the full range and represent the variability of the variable of interest in the specific study area. A stratified sampling design is therefore recommended for reference data collection.
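A stratified design of the kind recommended above can be sketched as follows: candidate plots are grouped into strata by a preliminary estimate of the target variable (here a hypothetical TCC value per plot, which might come from high resolution imagery), and the same number of plots is drawn from each stratum so that the low and high extremes are represented. Equal allocation and the stratum edges are assumptions for illustration; proportional or optimal allocation are common alternatives.

```python
import random
from bisect import bisect_right
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, TypeVar

T = TypeVar("T")


def stratum_of(value: float, edges: Sequence[float]) -> Optional[int]:
    """Index k of the stratum [edges[k], edges[k+1]) holding value; the top edge is inclusive."""
    if value < edges[0] or value > edges[-1]:
        return None
    return min(bisect_right(edges, value) - 1, len(edges) - 2)


def stratified_sample(units: Sequence[T], value: Callable[[T], float],
                      edges: Sequence[float], n_per_stratum: int,
                      seed: int = 0) -> List[T]:
    """Draw up to n_per_stratum units at random from every stratum (equal allocation)."""
    strata: Dict[int, List[T]] = defaultdict(list)
    for unit in units:
        k = stratum_of(value(unit), edges)
        if k is not None:
            strata[k].append(unit)
    rng = random.Random(seed)
    sample: List[T] = []
    for k in sorted(strata):
        members = strata[k]
        sample.extend(rng.sample(members, min(n_per_stratum, len(members))))
    return sample
```

Compared with simple random sampling, which in a landscape dominated by low cover would mostly return low-TCC plots, equal allocation guarantees that the sparsely occurring high-cover and high-biomass plots reach the training data.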
The results from this study are promising, especially for the mapping of TCC. However, the approach should be tested in a larger area, preferably a site that covers a wider tree cover gradient. We used WorldView-2 imagery in addition to field data to derive the reference dataset. Availability of such imagery may be restricted by high costs, especially for large areas. However, Wu et al. [75] showed that Google Earth is an interesting alternative source of high resolution imagery by using it to manually derive a reference dataset of TCC for most of Sudan.