Using Google Earth Surface Metrics to Predict Plant Species Richness in a Complex Landscape

Google Earth provides a freely available, global mosaic of high-resolution imagery from different sensors that has become popular in environmental and ecological studies. However, such imagery lacks the near-infrared band often used in studying vegetation, thus its potential for estimating vegetation properties remains unclear. In this study, we assess the potential of Google Earth imagery to describe and predict vegetation attributes. Further, we compare it to the potential of SPOT imagery, which has additional spectral information. We measured basal area, vegetation height, crown cover, density of individuals, and species richness in 60 plots in the oak forests of a complex volcanic landscape in central Mexico. We modelled each vegetation attribute as a function of surface metrics derived from Google Earth and SPOT images, and selected the best-supported linear models from each source. Total species richness was the best-described and predicted variable: the best Google Earth-based model explained nearly as much variation in species richness as its SPOT counterpart (R2 = 0.44 and 0.51, respectively). However, Google Earth metrics emerged as poor predictors of all remaining vegetation attributes, whilst SPOT metrics showed potential for predicting vegetation height. We conclude that Google Earth imagery can be used to estimate species richness in complex landscapes. As it is freely available, Google Earth can broaden the use of remote sensing by researchers and managers in low-income tropical countries where most biodiversity hotspots are found.


Introduction
Remote sensing is a useful tool in environmental and ecological research. This technology allows ecologists to gather information about the natural world more rapidly and cost effectively than traditional field methods, and aid in the analyses and integration of processes across spatial and temporal scales [1]. Remote sensing is useful when studying vegetation to assess ecosystem functioning and health [2], regulate human use of vegetation [3,4], and prioritise areas for conservation and restoration [5,6].
Although Landsat imagery is still the most frequently used source in remote sensing studies [7], some vegetation patterns cannot be captured at its resolution (30 m pixel) [8]. Other sensors such as SPOT (Satellite Pour l'Observation de la Terre), Quickbird, GeoEye-1 and RapidEye produce

Study Site
We conducted the study in El Tepozteco National Park, located on the Chichinautzin volcanic range, ca. 50 km south of Mexico City, Mexico (19 • 01.10 N, 99 • 05.80 W; Figure 1). It encompasses a complex territory including the Tepozteco Mountains, which are the result of eroded lahars, as well as a mosaic of lava fields with ages ranging from 1835 to >20,000 years old [18,19]. We studied oak forests in six geomorphological units: Tepozteco Mountains (TM), and four lava fields, namely Chichinautzin (CH), Suchiooc (SU), Otates, which is divided into lower (LO) and upper (UO) subunits, and Oclayuca (OC). Overall mean annual temperature is 15.6 • C, and mean total annual precipitation is 2100 mm, with a rainy season between May and October [20]. Climatic conditions vary along the park's elevational gradient, which ranges from 1350 to 3350 m a.s.l. Oak forests extend from 1800 to 2800 m a.s.l., and shift gradually into conifer forests above, and tropical dry forests below. Additionally, patches of xerophytic vegetation in rocky outcrops and recent lava fields are found amid the oak forests.

Field Data
We randomly located ten 10 × 10 m plots in each geomorphological unit ( Figure 1) and sampled them in October and November 2010 and 2011. We recorded all plant species present in each plot, and for plants with a diameter at breast height (DBH) ≥2.5 cm (hereafter referred to as 'canopy'), we measured DBH, height and two orthogonal crown diameters (to estimate crown cover as the area of an ellipse). We used these variables to calculate density of individuals, basal area, total crown cover, and vegetation height (mean height of the three tallest trees) in each plot. Also, we calculated species richness and Shannon's Index [21] for the whole community and for the canopy in each plot.

Image Processing
We calculated spectral and textural metrics for GE and SPOT 5 images, which were then used to model vegetation attributes. The GE image used in this study was a pansharpened Quickbird image, constructed by mosaicing and georeferencing screenshots captured with a fixed eye altitude of 3 km in Google Earth, while the SPOT image was subjected to geometric correction. We preferred the term GE image instead of its commercial name (Quickbird) to highlight the use of a free image server, and also because we did not have access to the raw Quickbird information. The main differences between these two image sources were: (1) the available bands (RGB in GE; RG and NIR in SPOT), (2) their spatial resolution (GE = 0.75 m; SPOT = 2.5 m), and (3) their acquisition date (January 2005 for GE and March 2010 for SPOT). Although the GE image differed by five years from our field measurements, we decided to use it because it was the only high-resolution image covering the entire study area.
We used only the red band and a vegetation index for each image to calculate the Visible Vegetation Index (VVI) for GE, and the NDVI for SPOT, as follows:

Image Processing
We calculated spectral and textural metrics for GE and SPOT 5 images, which were then used to model vegetation attributes. The GE image used in this study was a pansharpened Quickbird image, constructed by mosaicing and georeferencing screenshots captured with a fixed eye altitude of 3 km in Google Earth, while the SPOT image was subjected to geometric correction. We preferred the term GE image instead of its commercial name (Quickbird) to highlight the use of a free image server, and also because we did not have access to the raw Quickbird information. The main differences between these two image sources were: (1) the available bands (RGB in GE; RG and NIR in SPOT), (2) their spatial resolution (GE = 0.75 m; SPOT = 2.5 m), and (3) their acquisition date (January 2005 for GE and March 2010 for SPOT). Although the GE image differed by five years from our field measurements, we decided to use it because it was the only high-resolution image covering the entire study area.
We used only the red band and a vegetation index for each image to calculate the Visible Vegetation Index (VVI) for GE, and the NDVI for SPOT, as follows: where NIR, R, G, and B are the near-infrared, the red, the green, and the blue band, respectively, R 0 = 40, G 0 = 60, B 0 = 10, C = 10, and w = 1 [22,23]. Equation 1 is a modification of the original formula following the recommendations of its proponents. These indices were calculated manually in ENVI 5 (Exelis Visual Information Solutions, Boulder, CO, USA). We calculated 13 surface metrics for each band or vegetation index. Eight were co-occurrence or texture metrics calculated following the grey level co-occurrence method (GLCM; Supplementary Material 1) [24], using an offset of one pixel and a transformation of pixel values to 64 grey levels to minimise computation time. To obtain a rotation-invariant texture measure, all the co-occurrence metrics were calculated in four directions (0 • , 45 • , 90 • , 135 • ) and then averaged. The remaining five occurrence metrics, also known as spectral metrics, were calculated using raw pixel values (see Supplementary Material 1). We calculated all metrics using the moving window approach [24]. To ensure a good match between window size and sampling plot size, we used a square window of 15 pixels for GE and 5 pixels for SPOT.

Statistical Analysis
We constructed linear and quadratic models of surface metrics as predictors of vegetation attributes. Higher-order models were not explored because of limited degrees of freedom (n = 60). Thus, we fitted eight types of models: where y is a log-transformed vegetation variable, and x 1 , x 2 and x 3 are surface metrics. The normal distribution of the error was verified in all models. All possible combinations of metrics were evaluated. We calculated the coefficient of determination (R 2 ) as a goodness-of-fit measure for each model. R 2 has the advantage of having a fixed range (0 to 1, with 1 being a perfect fit), making it easy to interpret. However, R 2 is not useful for model comparison, as it does not penalise for model complexity. Therefore, our model selection process was based on the Akaike Information Criterion; due to our small sample sizes we used the corrected version (AICc). Two models were considered to be equally supported when ∆AICc < 2 [25].
Given the large number of fitted models and the small sample size of our database, we expected some large R 2 -values to result purely by chance. To explore this possibility, we randomly sorted the data of each variable, fitted the models, and calculated their associated R 2 . Iteration of this procedure 1000 times allowed us to estimate the distribution of the R 2 -values under a random scenario, and thus calculate P-values associated with the observed R 2 values.
We used a leave-two-out cross-validation to evaluate the ability of the different combination of surface metrics to predict vegetation attributes in the models. In this procedure, we split the dataset into a calibration subset (with 58 data points) and a validation subset (two data points), fitted the model to the former, and predicted the vegetation attribute for the latter. Following Gallardo-Cruz et al. [17], we used the leave-d-out cross-validation R 2 : where n is the number of plots, c is the number of possible splits of the data set, y obs are the observed vegetation-attribute data points, y pred are the values predicted by the model using the d surface data points, and y avg is the average over the n vegetation-attribute data points. R 2 CV ranges from −∞ to 1, with 1 being a perfect fit, and negative values representing over-fitted models that make worse predictions than chance.

Results
Total species richness was the vegetation attribute best described and predicted by the image surface data. It was the only attribute for which both GE and SPOT models had higher R 2 values than expected by chance ( Table 1). The best models explained about half of the variation in community species richness for both GE and SPOT images (R 2 = 0.44 and 0.51, respectively; Table 2). In contrast, canopy species richness and vegetation structure were described poorly, and R 2 values were lower than expected by chance. Exceptions to this were mean vegetation height, which was successfully described by SPOT-based models, and density, which was adequately described by one GE-based model. All the best models selected by AICc included either one or two surface metrics as predictors ( Table 2). Including a third predictor never resulted in better models. While the best models to predict total species richness based on SPOT images included only red-band surface metrics, models based on GE images used both red and VVI metrics. Table 2. Best models constructed for each vegetation attribute. Only those metrics for which at least one model had a P-value < 0.05 associated to its random R 2 distribution are shown. Of those, models with a ∆AICc < 2 are shown. In Model type, y is the log-transformed vegetation variable and x 1 and x 2 are the surface metrics described in their respective columns. The + and − signs indicate positive and negative β coefficients, respectively. See Supplementary Material 1 for the mathematical formulation of surface metrics.

Vegetation Attribute
Image      The accuracy of model estimates of total species richness varied among geomorphological units ( Figure 2). The worst predictions were observed for the Oclayuca plots, where the best GE estimated values were close to the average richness of plots in that unit, but failed to distinguish the variation among them. Species richness in the Chichinautzin lava field tended to be underestimated, while it was overestimated for the Lower Otates unit.  The accuracy of model estimates of total species richness varied among geomorphological units ( Figure 2). The worst predictions were observed for the Oclayuca plots, where the best GE estimated values were close to the average richness of plots in that unit, but failed to distinguish the variation among them. Species richness in the Chichinautzin lava field tended to be underestimated, while it was overestimated for the Lower Otates unit.

Discussion
In our rapidly changing world, it is increasingly important to assess and monitor ecosystem properties accurately across landscapes, particularly biodiversity. Satellite-based modelling is a valuable tool to achieve this, but cost still limits the acquisition of high-resolution images [11]. Our results show that GE imagery can be used to describe and predict species richness in heterogeneous environments. Although the best GE models only explained about one half of the variation of the recorded species richness, this result is valuable given the high floristic heterogeneity of the studied forests [26]. Moreover, the performance of GE-based models in describing richness was similar to that of SPOT-based models, a widely used but costly counterpart. It is worth noting that an apparent saturation in the capability of modelling low species richness (under 10 species) from texture metrics is observed in our models ( Figure 2). However, these low richness sites correspond to a single geomorphological unit (OC, Oclayuca); therefore we cannot be certain whether the models have limited predictive power in low richness sites, or if it is simply a local effect associated with a given geomorphological unit.
To our surprise, all models failed to describe canopy richness adequately (Table 1), which may be due to the considerably poorer and more homogeneous canopy richness in these oak forests across the landscape, relative to the high richness of the accompanying (i.e., non-canopy) species. One possible explanation is that, due to the relatively small size of our sampling plots (10 × 10 m) compared to the crown of some of the largest trees, in some cases the heterogeneity of pixels within a plot could be more closely associated with the variability within a single tree crown than with the variability among crowns of different tree species.
The most common textural metrics in the best species richness models were cc.mean and cc.asm, both of which are related to the local spectral heterogeneity of pixels in an image. Previously, cc.mean was identified as one of the most capable surface metrics to detect changes in vegetation attributes [17]. The potential of GE images to predict species richness is supported by the fact that the best SPOT models for species richness did not include NIR or NDVI metrics, which suggests that the NIR band is less useful in image-based estimations of species richness, at least in this forest type.
Admittedly, other aspects regarding image acquisition can affect the calculation of GLCM metrics such as pixel size, topography and illumination. Pixel size will determine the scale at which the GLCM texture will summarise patterns of heterogeneity [27]. In this study, the effect of different pixel sizes on the capabilities of GLCM to model forest attributes was not tested. Testing the effect of scale or pixel resolution on the capabilities of texture metrics remains an important task for future texture studies. The effect of topography and acquisition configuration may not affect our results greatly because most of our plots are located in well-illuminated areas and beyond steep areas [28].
All vegetation structure attributes, except vegetation height, were poorly modelled. Interestingly, for estimations of vegetation height, we obtained the sharpest difference in performance between GE and SPOT: while models based on GE images performed poorly, models for SPOT images performed well. The inability of the models based on GE imagery to predict vegetation height adequately is disappointing given the importance of this attribute for biomass estimation [29]. Other important structural attributes such as basal area and vegetation cover have been correctly estimated from surface metrics in other satellite imagery [11,17]. The underlying mechanistic relationships between vegetation attributes and remotely-sensed surface metrics are still blurry, which partially explains the inconsistent results obtained through different methodological approaches and remote sensors [16,30]. However, Gallardo-Cruz et al. [17] showed that for a vegetation attribute to be correctly modelled, it should be highly heterogeneous. In this study, mean vegetation height and species richness differed significantly among geomorphological units; however, while basal area was equally variable among these units, its variation was not significant [31].

Conclusions
In a time when global change and biodiversity loss are major threats to ecosystem structure and function, estimating species richness from remote sensing information is of the utmost importance. Among the enormous variety of available high-resolution images, freely available ones have not been tested as widely as their commercial counterparts. In this study, we showed that the limited spectral resolution and lack of metadata of free GE images, in comparison with a commercial one (SPOT) are of minor consequence for its potential to predict plant species richness in heterogeneous landscapes from surface metrics. In addition, GE images can be more user-friendly than commercial counterparts and, in most cases, require no special skills to view or download. Harnessing the predictive potential of GE imagery emerges as an important tool for detecting and managing biodiversity hotspots, especially in tropical regions where researchers and managers have less access to expensive high-resolution imagery.
Supplementary Materials: The following are available online at www.mdpi.com/2072-4292/8/10/865/s1, S1, Image metrics calculated for each red band or vegetation index of the SPOT and Google Earth images. S2, R code describing the modelling procedure followed in this study. S3, Data set used in the study.