Mapping Above-Ground Biomass in a Tropical Forest in Cambodia Using Canopy Textures Derived from Google Earth

This study develops a modelling framework for utilizing very high-resolution (VHR) aerial imagery for monitoring stocks of above-ground biomass (AGB) in a tropical forest in Southeast Asia. Three different texture-based methods (grey level co-occurrence metric (GLCM), Gabor wavelets and Fourier-based textural ordination (FOTO)) were used in conjunction with two different machine learning (ML)-based regression techniques (support vector regression (SVR) and random forest (RF) regression). These methods were implemented on both 50-cm resolution Digital Globe data extracted from Google EarthTM (GE) and 8-cm commercially obtained VHR imagery. This study further examines the role of forest biophysical parameters, such as ground-measured canopy cover and vertical canopy height, in explaining AGB distribution. Three models were developed using: (i) horizontal canopy variables (i.e., canopy cover and texture variables) plus vertical canopy height; (ii) horizontal variables only; and (iii) texture variables only. AGB was variable across the site, ranging from 51.02 Mg/ha to 356.34 Mg/ha. GE-based AGB estimates were comparable to OPEN ACCESS Remote Sens. 2015, 7 5058 those derived from commercial aerial imagery. The findings demonstrate that novel use of this array of texture-based techniques with GE imagery can help promote the wider use of freely available imagery for low-cost, fine-resolution monitoring of forests parameters at the landscape scale.

Horizontal forest structure and AGB can be determined by texture analysis of high-resolution remote sensing data [10,11]. Texture is a qualitative characterization of a given object in terms of its spatial and structural attributes. Statistical texture measures, notably grey level co-occurrence metrics (GLCMs), are widely used in remote sensing studies of tropical forest AGB and structure [11] and have been applied to moderate-resolution remote sensing data to derive landscape-scale AGB models in Indonesia [12] and the Brazilian Amazon [13]. However, Singh et al. [9] indicate that while GLCM textures are strong predictors for AGB values of forests that have undergone varying rounds of logging, they tend to underestimate field-measured AGB values due to data saturation at higher AGB values. A less common canopy texture technique is Fourier-based textural ordination (FOTO). This method uses a combination of Fourier transforms and ordination approaches (such as principal component analysis (PCA)) to quantify canopy structure and AGB heterogeneity and how this influences AGB variations [10]. This method has been used for quantifying AGB stock variability in varied tropical forested ecosystems, such as coastal mangrove forests [14] and natural and plantation forests [10,15]. Wavelet-derived texture methods are also increasingly used to characterize different forest types, particularly their areal delineation. For example, discrete wavelet transforms (DWT) were used to differentiate five tropical forests with similar canopy heights [16]. Gabor wavelets have been used to delineate different temperate forest types in the Swiss Alps [17]. However, wavelet-based texture analysis holds great promise for the quantification of AGB stocks and forest structure variations. In addition to texture-based methods, machine learning-based modelling techniques are increasingly utilized in AGB modelling. ML is a datacentric analysis technique that provides a robust means of modelling forestry data for both classification and predictive analysis-related tasks, such as estimation of AGB values [18,19].

Goal and Objectives
This study examined the utility of three-band remote sensing imagery obtained from two different sources-50-cm resolution Digital Globe imagery (extracted from Google Earth™) and 8-cm VHR aerial imagery-in examining variation in AGB stocks. Different texture-based and machine learning (ML) techniques were used to: (i) compare the utility of GLCMs, Fourier and Gabor wavelet-based texture measures for predicting AGB from both 50-cm and 8-cm aerial data; (ii) identify which of the two ML algorithms, random forest (RF) or support vector regression (SVR), produces better AGB estimation models; and (iii) examine whether the inclusion of height-related variables and canopy cover data improves AGB estimates. These results can facilitate the development of a remote sensing-based AGB monitoring framework that utilises a combination of low-cost/free high-resolution aerial imagery, field data and novel modelling techniques.

Study Area
Angkor Thom in northwest Cambodia is a 3 km × 3 km walled city (originally built in the 12th to 13th century AD) located within the Angkor Archaeological Park, much of which is forested (Figure 1). The surrounding forests are semi-evergreen forests dominated by dipterocarp tree species, ranging in height from 25 m to 40 m [20]. While Cambodia has distinct dry and wet seasons (like much of continental SE Asia), these forests are mostly dominated by evergreen tree species. Given its deep cultural significance, historical evolution and current land use changes, Angkor Thom and its associated forests can be considered a "living landscape". Although these forests enjoy a high level of protection owing to Angkor Thom's UNESCO World Heritage designation, the issue of tree removal and degradation persists within these forests [21].

Field Data Collection
Field data collection was conducted from December, 2013, to January, 2014. Twenty five randomly-stratified 1-ha plots were established in the forested portion of Angkor Thom (Figure 1), as per the RAINFOR protocol [22]. One-hectare plots were chosen, as smaller plot sizes may increase the dominance of larger trees, causing an increase in the root-mean-square error (RMSE) [23,24]. Stratification was conducted on the basis of the four forest quadrants that lead to the north, south, east and west gateways of Angkor Thom. All of the forested areas leading inwards have faced different levels of anthropogenic disturbance and have been occupied by different disturbance niches. Plots were randomly located within each of the quadrants. Diameter at breast height (DBH) was recorded for all trees where DBH ≥10 cm. The tree height (H) of 150 trees was measured using a clinometer. A DBH-to-tree height relationship was derived using field-measured tree heights and DBH, which, in turn, was used to estimate the heights of trees for which only DBH was measured. For more detail, see Tables S1-S3. AGB stocks were estimated from tree height and DBH measurements using the following allometric equation recommended by Chave et al. [25]: Here, ρ represents specific wood density, where a mean value of 0.57 Mg/m 3 was used [26]. Aside from the required variables in Equation (1), canopy cover was estimated in each 1-ha plot from 20 to 25 canopy cover readings using a densitometer. More specifically, 20 to 25 canopy cover readings were recorded from random locations within each of the 1-ha plots (Table S2).

Analysis of Aerial Imagery Data
Fifty-centimetre resolution Digital Globe data (images dated 26 December 2012) were extracted over Angkor Thom, north-western Cambodia directly from Google Earth™ using Google Satellite Map Downloader ® (Version 7.20) [27]. These data have three spectral bands (red, green and blue (RGB)) and covered the entire Angkor Thom area ( Figure 1). Eight-centimetre resolution aerial multispectral data were acquired in April 2013, using a 40-megapixel Leica RCD105 medium-format camera, collected by a helicopter at a flying height of 800 m above ground level. Aerial multispectral data also consisted of three RGB spectral bands. The wavelengths of RGB bands for high-resolution aerial data, including those from Google Earth™, are 0.45 to 0.52 nm, 0.52 to 0.60 nm and 9.62 to 0.69 nm, respectively. Radiometric, atmospheric and geometric corrections were conducted by the data provider. Figure 2 shows the 50-cm GE and 8-cm VHR image over a portion of Angkor Thom. Airborne LiDAR data were collected in conjunction with the aerial multispectral RGB data. The LiDAR data consisted of four returns and a point density of 12 points per square metre. Figure 3 shows the LiDAR point cloud-based 3D structure of the forest canopy. The blue regions represent the lowest elevations, while orange to red regions stand for the high tree elevations in the plot.
LiDAR data were used to develop a canopy height model (CHM) as a three-dimensional (3D) representation of the vertical canopy structure. CHMs were generated using FUSION software and are the product of subtracting the digital surface model (DSM) from the assumed digital terrain model (DTM); first and last return, respectively. The LiDAR point cloud-derived products (DTM, DSM and CHM) all had a resolution of 0.5 m. FUSION/LDV is freely available software for processing LiDAR point clouds [28]. In turn, CHM was used for deriving tree heights at the plot scale. See Figure S1 for a detailed workflow.

Grey Level Co-Occurrence Matrix
GLCMs quantify the tonal variability of image pixels. There are two main kinds of GLCM texture measures used in the image analyses: (i) first-order or occurrence metrics; and (ii) second-order or co-occurrence metrics. First-order occurrence metrics quantify pixel values at a given location, ignoring the relationship between adjacent pixels (e.g., common statistical measures, such as mean and standard deviation). Second-order co-occurrence metrics account for the relationship between pixels, as well as the probability of pixel occurrence in a given direction. This includes measures, such as entropy and the angular sum of moment [29]. GLCM values were extracted from the three different bands of the VHR and GE aerial imagery using a window size of 3 × 3 in the ENVI image processing software. Details of the GLCM metrics extracted using ENVI can be found in Singh et al. [9].

Fourier-Based Textural Ordination
FOTO aims to identify the gradients of textural variation by coupling two-dimensional (2D) Fourier spectra describing the spatial frequency distribution of image variance in canopy scenes. FOTO algorithms decompose the spatial information contained in an image into its frequency components by applying a 2D fast Fourier transform (FFT), where canopy structure variability can be encompassed within a few spatial frequencies [30]. The spectral and spatial information in the image is decomposed into the frequency domain characterized by F(p, q), which travel in the X and Y directions. This, in turn, is used to obtain Fourier coefficients. In order to best represent the extracted features/associated greyscale variance, a power spectrum was computed by squaring the amplitude of the FFT transform. Averaging was carried out across all orientations to obtain the r-spectrum, which contains the mean values of each frequency [16]. The rvalue quantifies the number of times that a pattern is repeated, which, in turn, represents the spatial structure of the canopy within a given window using a select few spatial frequencies obtained by the 2D FFT [10]. The r-spectrum also captures the coarseness gradient-related textural properties, which, in turn, may be related to underlying forest structure properties. Principal component analysis (PCA) was applied to the r-spectrum. The first three principal axes were taken to be the texture indices, which, subsequently, can be used to correlate the stand parameters and biomass dynamics of the different land-use types [14]. In our study, the first PC axis explained around 98% of the variance, while the second PC axis explained 1.2% of the variance. The third PC axis explained 0.2% of the variance. The PC axis beyond this did not explain any variance and, hence, was excluded. The various steps involved in this algorithm are detailed in the flowchart shown in Figure S2.

Gabor Wavelet-Based Texture Measures
Unlike FOTO-based methods, wavelet-based methods belong to the family of spatial frequency-based analysis. Spatial frequency decomposition breaks down imagery data into multiple scales and orientations. The variations in scale and orientation of a given image help capture the textural variation of its pixels, as provided in Equation (2): where σ = 2π and θ and ν represent the orientation and scale, respectively, of the Gabor wavelet in a given spatial domain. Gabor wavelet analysis produces both real and imaginary parts. The real part encapsulates the texture variation of the canopy and repetitiveness and was retained for analysis [17].

Support Vector Regression
SVR accounts for the nonlinearity, non-normality and complexity of ecological data by using kernel functions to map the original input space into a higher-dimension feature space. In Equation (3), y is the variable whose values we seek to predict, and x represents predictor variables that are mapped into the bespoken higher-dimensional feature space using a nonlinear function ϕ(x). That is, where ω is the weight and b is the bias of the regression function. C is a constant used to account for penalized loss when training errors occur [31], as shown in Equation (4): SVR then seeks to minimize the square of the weight function and the sum of the permitted errors, ε. The value of ϕ(x) is not known, and kernel functions are used for mapping the predictor variables into higher dimensions [32]. Radial basis function (RBF) is a kernel function that is commonly used to account for such data and which produces robust results compared to other kernels [33,34]. Linear and polynomial kernels were tested statistically, as per Kuhn et al. [35], using correlation coefficients and RMSE to evaluate model performance. The correlation values between field AGB and predicted AGB obtained from different kernels was comparable; however, the lowest RMSE values were obtained from the RBF kernel; thus, linear and polynomial kernels were not used. The RBF kernel function is used for fitting hyper-planes, which allows increasing the separability of the input variables. The purpose of a basic support vector algorithm is to discover a hyper-plane that maximizes the distance between the input variables. The points from each class closest to the hyper-plane are support vectors. By projecting data into a higher dimension, nonlinear data can be visualized to be functioning as linear data. This facilitates the establishment of mathematical relationships between variables [36].

Random Forests Models
RF models fit multiple decision trees to input data using a random subset of the input variables for each tree constructed. For a dataset with n features, square root of n may be used in each subset. A multitude of trees are constructed, and their average is used to construct a new predictive model [37]. RF offers the advantages of being non-parametric, handling a large number of input variables and skewed data and preventing over-fitting [38]. Both of the above modelling techniques were implemented using the caret package in the R programming language. For biomass modelling purposes, 14 of the 25 plots were randomly used for training/calibration purposes, and the remaining 11 plots were used for testing/validation purposes. Plots were randomly assigned to the training or testing set via the "random" function available in R. Prior to relating texture measures to field AGB, the strongly correlated predictor texture variables were removed using the built-in functionalities provided within the caret package. Including strongly correlated predictor variables in the training model leads to the problem of multicollinearity and over-fitting. The resulting model has limited generalizability and does not strongly predict the unseen validation data. However, by removing the strongly correlated predictors, we can build a parsimonious model containing a small subset of uncorrelated texture variables, which, in turn, helps build robust predictive models. Training our initial model development used the 14 training plots with 25 iterations for bootstrap-based resampling (these are built-in within the "train" function used to build a model). The final predicted model was used with the test data (in this case, 11 plots) to generate predictions for the target variable. Two ML algorithms were used for generating predictions of field AGB using vertical canopy height, canopy cover and texture measures derived from different methods. The performance of the three models-(i) included vertical canopy height, horizontal canopy structure and texture variables as predictors; (ii) with canopy cover and texture variables; and (iii) with texture variables only-was compared using two different aerial datasets and ML algorithms.

GLCM-Derived AGB Estimates
The GLCM method achieved strong to very strong positive correlations between SVR-predicted AGB and field-measured AGB for both the 50-cm resolution GE imagery and the 8-cm resolution aerial multispectral imagery, with r-values (Pearson's product-moment correlation coefficient) ranging from 0.630 to 0.888. Pearson's correlation was chosen in order to identify any relationship between the field-measured and SVR predicted AGB [39]. As evident in Figures 1 and 2, comparatively high r-values and low RMS errors are obtained from GE imagery compared to those from VHR imagery. For both datasets, the same trend can be observed; correlation coefficient (r) was highest and RMSE was lowest when both horizontal canopy variables (HCV) and vertical canopy height (VCH) were included in the analysis.
AGB estimates from VHR imagery yielded r-values of 0.757 and 0.639 and RMS errors of 55.76 and 70.90 Mg/ha for the first ( Figure 4A) and second ( Figure 4B) GLCM-derived models, respectively. No significant strength of association was detected for the third GLCM model extracted from GE imagery, which only used texture variables for AGB prediction. Conversely, the AGB estimates from GE imagery yielded: (1) r = 0.892 and RMSE = 33.73 Mg/ha for the first GLCM-derived model, which combined HCV and VCH as predictors ( Figure 5A  Visual comparison of Figures 4 and 5 revealed that the two GLCM models derived from VHR imagery exhibit more pronounced deviations from their respective 1:1 correspondence lines than did the three GE models. All five GLCM models, however, underestimated AGB at lower field values (AGB < 200 Mg/ha) and overestimation at higher values (AGB > 250 Mg/ha).
Furthermore, when the GLCM method was employed in conjunction with random forest regression (RFR), the resulting RFR-predicted AGB also displayed strong to very strong positive associations with field-measured AGB, although the RFR-based r-values were slightly lower than their SVR counterparts.

FOTO-Derived AGB Estimates
The results of the FOTO analyses were comparable to those of the GLCM analyses, establishing strong to very strong positive associations between FOTO-derived AGB estimates and field-measured AGB, e.g., In the case of FOTO-derived AGB estimates from GE imagery, the second model (which excludes the vertical canopy height) exhibited the strongest positive association between SVR-predicted AGB and fieldmeasured AGB, as well as between RFR-derived AGB and field-measured AGB. For FOTO models extracted from VHR imagery, no significant association was detected for the third model, which only used texture variables for AGB prediction.

Mapping AGB
Field-measured AGB values for Cambodia showed considerable variation (range: 44.55 to 323 Mg/ha, mean 181.1 ± 16.6 Mg/ha). This study developed different GLCM-derived and FOTO-derived models in conjunction with SVR and RFR, many of which capture AGB variations to a high level of accuracy. Table  1 summarizes the various method and model combinations and their corresponding r and RMSE values for both Google Earth™ and commercial VHR imagery. Of all the remote sensing models presented in this study, the AGB estimates derived via a combination of vertical canopy structure, ground canopy cover variables and GLCM texture variables within an SVR modelling framework showed the strongest association with field-measured AGB (r = 0.892). Figure 10 shows the resulting spatial distribution of AGB in Angkor Thom, as derived using all of the aforementioned parameters.
AGB exhibits a distinct pattern across Angkor Thom, with the highest values recorded at the northeast corner and the lowest in the southwest corner ( Figure 3). Figure 10. Spatial distribution of above-ground biomass in Angkor Thom, based on a best-fit SVR + vertical canopy + horizontal canopy + GLCM model.

Spatial Patterns of AGB in Angkor Thom
AGB stocks differed across the Angkor Thom landscape, with some areas holding high AGB stocks (up to 300 Mg/ha), while other areas held much lower stocks. A distinct pattern of AGB stock distribution can be seen across the site, with the southern portion (especially the southwest) having a much lower AGB stock value. Since the southern portion of Angkor Thom serves as the main access to most of the historical buildings in the area, it receives a significant flow of tourists. This massive inflow of people is likely to cause anthropogenic disturbance, thereby influencing AGB in the area. However, as one moves toward the archaeological sites, there is an increased level of security (restricted tourist access, presence of security guards), dramatically reducing the chances of illegal forest loss.
The variation in AGB distribution may be seen as a problem of "leakage" that is usually associated with protected areas. A similar situation has been observed in the buffer zones around tropical protected areas [31]. Closer proximity to the core of the protected area affords a much higher level of protection as compared to buffer zones; core areas are thus subject to lower rates of forest loss and resource extraction. Another major cause of AGB loss that is apparent not only in Angkor Thom, but also in other parts of tropical Asia [32] is the proximity of an area to various sources of anthropogenic disturbance, such as roads.

The Importance of Structural Parameters
The present study employed several model combinations to estimate AGB stocks and their spatial distribution. Previous studies have utilized solely multispectral sources to estimate AGB [10,15], although in the majority of the texture models-GLCM, FOTO and Gabor wavelets-the inclusion of vertical height data provided marginally improves AGB estimates. However, ground-measured canopy cover percentage was strongly associated with field AGB values (r = 0.698). Previous studies in the tropics using different datasets, and modelling approaches have established the role of vertical canopy height in producing accurate estimates of AGB [23]. However, the corresponding role of the percentage of canopy cover was not previously examined in the literature. While this study establishes the role of canopy cover in explaining AGB variation, there are no universally accepted protocols for measuring canopy cover, either in the field or using remote sensing data. This makes the comparison of our canopy cover-based results with other research difficult.

Performance of Different Modelling Approaches
In all cases, the SVR-based models outperformed the RFR-based models. SVR also produced more robust estimates of AGB variation than did multiple regression models [40]. ML methods, such as SVR, are capable of producing robust and generalizable models of forest parameters. Furthermore, they can characterize forest structural complexity at finer scales, which is poorly captured by traditional linear and non-linear regression models. This enables detailed AGB models that can facilitate monitoring and conservation activities. In addition to the efficacy of ML models, it was found that GE aerial imagery (in conjunction with different texture measures) can produce robust AGB estimates.

Performance of Google Earth™ Imagery vs. VHR Imagery for AGB Estimation
GLCM-, Gabor-and Fourier-based texture variables derived from GE and VHR imagery (along with vertical canopy height and canopy cover) showed a strong association with field-measured AGB. In all cases, SVR-predicted AGB values were more similar to field AGB with lower RMSE than were RF-predicted AGB values. SVR is known to produce accurate AGB estimates [40] and to outperform other ML techniques, such as RF regression [41]. On the basis of the results obtained using both ML techniques, textural variables, both statistical and Fourier-based, are powerful predictors of AGB stock variation. The strength of association did not decline considerably between models that used both horizontal and vertical variables and those that used horizontal variables (canopy cover and texture metrics) only. Thus, in opencanopied forest ecosystems, such as the one in our study area, consideration of vertical canopy structure does not significantly improve AGB estimates, while inclusion of percentage canopy cover estimates is more important. However, models that used a combination of percentage canopy cover and texture variables (derived from either of the methods) tended to underestimate AGB values, especially for lower AGB values, whereas the inclusion of vertical canopy height improved the fit between predicted AGB and the 1:1 line. Texture variables themselves had limited utility for AGB prediction. AGB predicted using GLCM variables derived from GE imagery, VHR-based Fourier texture and Gabor wavelet variables had a moderate strength of association with field AGB values and deviated considerably from the 1:1 line.
Ground canopy cover is often ignored in most tropical forest inventories, whereas this study makes a strong case for its inclusion in AGB monitoring. Vertical canopy height is an important variable in its own right, but its collection either through LiDAR surveys or field measurements can be expensive and/or cumbersome.

Implications of Freely Available Remote Sensing Imagery for Conservation
This study has shown the utility of freely available remote sensing sources, such as GE and other Digital Globes, for robust, quantitative mapping of forest parameters, such as AGB, although this is rarely conducted. However, see Ploton et al. [42] for an example of AGB estimation using GE for forests in the Western Ghats, India. Such data sources could allow regular and cost-effective monitoring of tropical forested ecosystems.
While high-resolution GE imagery is not available for the entire Asian tropics (especially some inaccessible forested areas), its coverage is increasing. Other endeavours, such as the Digital Globe Foundation, offer free/low-cost VHR imagery for researchers. Structural parameters, such as canopy height, improve the accuracy of AGB estimations, but are relatively difficult and expensive to obtain. Thus, multispectral-only models derived from VHR Digital Globe imagery could play an important role in the repeated monitoring of tropical landscape-scale AGB stocks. Repeated monitoring is vital for accurate reporting, monitoring and verification of forest carbon stocks for conservation instruments, such as REDD+, to support enforcement and enable reliable accounting of ecosystem service provision [43], especially at sites and in ecosystems that are dynamic and affected by surrounding land use decisions and external stressors [44]. However, there is a need to test these models in other ecosystems (notably, closedcanopy forests, mangroves) at different spatial scales, different plot sizes and with other VHR data to improve their general applicability. Finally, not much is known about the data quality of the different image sources present within the GE database, acquisition errors and pre-processing algorithms applied to them. Detailed information relating to acquisition conditions and pre-processing protocols are needed for aerial data (including GE acquired data) to improve the interpretability of results.

Conclusions
This study highlights the potential to harness the power of freely-available Digital Globe imagery, such as Google Earth™, in conjunction with novel methods, such as grey level co-occurrence metrics, Gabor wavelets and Fourier-based textural ordination, to quantify biophysical forest parameters of importance to conservation. This study was unique in the range of texture methods utilized, in addition to the comparison of freely available and commercial imagery. Combinations of these techniques were able to quantify finescale changes and gradients in forest parameters, such as AGB across a disturbed, open canopy site that is experiencing several threats to forest permanence.
The testing and comparison of GE-derived results against traditional remotely-sensed data sources highlights its potential utility as part of a framework for robust, low-cost, fine-resolution, quantitative mapping of forest parameters, such as above-ground biomass, at the landscape scale in open-canopied forests. The use of freely available data sources, such as GE, could allow regular and cost-effective monitoring of imperilled tropical forested ecosystems in the future. Future studies should focus on the use of the presented framework for repeated monitoring, comparing the different modelling methods to see which best represent changes in forest parameters through time.