1. Introduction
Estimates of tropical tree species diversity from remotely sensed data are continually sought to aid conservation decisions and evaluate forest conditions over space and time [
1,
2]. Rapid rates of seasonally dry tropical forest (SDTF) loss, fragmentation, and degradation underscore the need to map forest diversity, both prospectively and retrospectively [
3,
4]. Latin American SDTF have declined in cover and are poorly represented within protected areas, yet they maintain particularly high levels of species endemism and diversity [
5]. Ecuador, Colombia, and Peru may now retain 10% or less of their former SDTF cover, which often persists in early successional or highly fragmented patches [
6,
7,
8]. Banda et al. [
5] define five distinct floristic groups of SDTF for northern South America, three of which occur in Ecuador: the Piedmont Central Andes Coast, Central inter-Andean Valleys, and Tarapoto-Quillabamba floristic groups. Groups differ in woody plant composition, but all contain deciduous vegetation that sheds its leaves for 3 to 5 months a year during periods with <100 mm of rainfall per month. SDTF diversity is notable in southern Ecuador and northern Peru because of high turnover in species composition over relatively short geographic distances and sub-regional endemic flora [
9]. Dry forest formations in this region are considered vulnerable to land conversion through a variety of anthropogenic and climate-driven changes [
10,
11,
12].
Global data from passive and active satellite sensors have rapidly increased in terms of availability and the spectral, spatial, and temporal range covered [
13]. Plant phenology, biophysical parameters and leaf traits related to tree diversity levels can be quantified from an ever-wider variety of multispectral satellite image platforms [
14]. The Global Ecosystem Dynamics Investigation (GEDI) satellite mission and waveform light detection and ranging (LiDAR) provide complementary measurements of vertical forest canopy structure linked to tree diversity for all tropical environments [
15,
16]. These advances increase opportunities to evaluate spatial and temporal differences in tree diversity from satellite remote sensing systems, and by extension, help to determine diversity relationships among other taxonomic groups [
17,
18]. In this regard, knowledge of hierarchical, or surrogate, measures of biological diversity can benefit from robust applications to map tree species diversity [
19,
20,
21].
For this study, we sought to investigate relationships between SDTF tree diversity and remotely sensed data in southwestern Ecuador using mixed sensor types. Bustamante et al. [
3] noted that data assimilation and synthesis methods are greatly needed to improve monitoring for biodiversity and other natural resource values. Forest census plots with spatially referenced trees and species identification are particularly beneficial for evaluating mixed sensor approaches that can suggest how future sampling and remote sensing applications can be developed in tandem [
4,
22].
Experimentation, data integration, and models are needed to help reveal details essential to estimating diversity at differing levels of organization and across scales [
14]. Intensive, rather than extensive, investigation of tree diversity metrics and sensor types with global coverage can help to further identify data requirements for broader-scale efforts. For our purposes, we took an experimental simulation approach within a 9 ha intensively measured permanent forest inventory plot to evaluate relationships between
α-diversity (i.e., within community) and satellite imagery from different sources. Given the limited extent of our study area, we focused on multiple indices that help describe within-community tree diversity in a late successional SDTF. Daly et al. [
23] indicated that diversity summarized by solitary metrics can be problematic because of the multidimensional nature of species diversity. For instance, sampled areas with an equal number of species can differ in how evenly individuals are distributed among those species [
24]. Common diversity measures can also be strongly correlated with species richness, which is dependent on sample size [
23].
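The point about solitary metrics can be illustrated with a minimal sketch that compares two hypothetical plots with equal richness but different evenness. The stem counts are invented for illustration, and the formulas are the standard textbook definitions of species richness (S), Shannon's H′, inverse Simpson's, and Piélou's J; they are not necessarily the exact estimators applied in this study.

```python
import math

def alpha_diversity(counts):
    """Return common alpha-diversity indices for per-species stem counts."""
    counts = [c for c in counts if c > 0]        # ignore absent species
    n = sum(counts)                              # total individuals
    s = len(counts)                              # species richness (S)
    p = [c / n for c in counts]                  # relative abundances
    shannon = -sum(pi * math.log(pi) for pi in p)        # Shannon's H'
    inv_simpson = 1.0 / sum(pi ** 2 for pi in p)         # inverse Simpson's
    evenness = shannon / math.log(s) if s > 1 else 0.0   # Pielou's J = H'/ln(S)
    return {"S": s, "H": shannon, "inv_simpson": inv_simpson, "J": evenness}

# Hypothetical plots: both hold 4 species and 100 stems
even_plot = [25, 25, 25, 25]
uneven_plot = [85, 5, 5, 5]

for name, plot in (("even", even_plot), ("uneven", uneven_plot)):
    print(name, {k: round(v, 3) for k, v in alpha_diversity(plot).items()})
```

Both plots return the same richness, while H′, inverse Simpson's, and J are all lower for the plot dominated by a single species, which is why richness alone cannot separate them.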
Conservation decision-making will likely require various indicators to describe the successional status, diversity, and spatial structure of forest communities [
24,
25,
26]. Measuring interaction between species diversity and varied site factors likely requires examining diversity metrics that can appear interchangeable but describe dissimilar aspects of the community [
27]. We examined methods to map separate measures of
α-diversity that are sensitive to common or rare species, as well as measures that attempt to differentiate levels of abundance and reduce sample bias, as outlined in the Methods. We assumed that
α-diversity indices modeled from spectral, structural, and biophysical indicators will afford an enhanced means to characterize SDTF tree diversity.
The spectral variability hypothesis posits that reflectance values collected from optical remote sensing platforms are expected to vary with changes in species composition [
28,
29]. Yet, multispectral data are challenged in tropical settings because of the high tree species numbers (i.e., richness) and the limited spectral range covered by global satellite systems [
30]. Prior studies have determined that vertical height and canopy metrics from airborne LiDAR, combined with spectral values and indices from the RapidEye satellite sensor, increased the variation explained by models of
α- and
β-diversity (i.e., between community) measurements in SDTF [
31,
32]. Because of inadequate data quality and the small extent of our study area, we could not use GEDI waveform LiDAR, and airborne data were not available. However, we found that total station tree coordinates, height, and elevation measurements could be used to develop high-resolution tree canopy height and topography models, as explained in the Methods.
We explored methods to select and evaluate explanatory variables from mixed sensor types that were used to fit machine learning (ML) models for predicting and mapping SDTF diversity. In place of selecting a single modeling approach (e.g., regression trees, support vector machines, or artificial neural networks) we applied an ensemble of moderately tuned model types for mapping tree diversity predictions weighted by model performance [
33,
34]. Our objective was to use multiple image dates from both commercial (e.g., RapidEye) and publicly available sources (e.g., Sentinel-2) to capture seasonal variability in tree canopy reflectance. In this setting we deployed a set of canopy biophysical measures from Sentinel-2 imagery to estimate canopy gap fraction, chlorophyll content, wetness, and leaf-area linked to tropical forest heterogeneity and diversity [
35]. Each data source (e.g., interpolated canopy height, topography, and multispectral imagery) was expected to contribute complementary information for predicting SDTF tree diversity.
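As a rough illustration of predictions weighted by model performance, the sketch below averages base-learner predictions using weights proportional to the inverse of each model's cross-validation RMSE. The inverse-RMSE weighting, the model names, and all numbers are assumptions made for the example; the weighting scheme actually used in this study may differ.

```python
def performance_weights(rmse_by_model):
    """Convert cross-validation RMSE values into weights that sum to 1.

    Assumption: weights are proportional to 1/RMSE, so lower-error models
    contribute more. This is one common choice, not the study's stated method.
    """
    inverse = {name: 1.0 / rmse for name, rmse in rmse_by_model.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def ensemble_predict(predictions_by_model, weights):
    """Weighted average of per-model predictions, computed plot by plot."""
    n_plots = len(next(iter(predictions_by_model.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions_by_model.items())
        for i in range(n_plots)
    ]

# Hypothetical cross-validation errors and species richness predictions for three plots
cv_rmse = {"random_forest": 2.0, "svm": 4.0, "neural_net": 4.0}
predictions = {
    "random_forest": [16.0, 12.0, 20.0],
    "svm": [14.0, 10.0, 22.0],
    "neural_net": [18.0, 14.0, 18.0],
}
weights = performance_weights(cv_rmse)
print(weights)
print(ensemble_predict(predictions, weights))
```

With these inputs the model with half the error of the others receives half of the total weight, so its predictions dominate the ensemble output without silencing the other learners.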
3. Results
During field surveys in 2019, we encountered only two recent tree falls creating minor canopy gaps within the study area since 2014 that may have contributed to some unexplained error in
α-diversity models and predictions. The mean values for field LAI measures (2.77 ± 0.81) were generally comparable with Sentinel-2 image LAI values (2.37 ± 0.10) for the 9 ha study area, although maximum LAI was higher from field measures than from 10 m Sentinel-2 pixels (4.32 vs. 2.65), likely because ceptometer measurements in the field were made at a finer spatial scale. The distribution of
α-diversity measures from 0.10 ha plots showed good correspondence between all three simulated datasets, which we considered important for making valid model comparisons (
Figure 4). From simulated training and random validation plots (
n = 115), tree density and basal area averaged 1114 trees per ha and 15.6 m² per ha, respectively, with an average tree height of 7.4 m for trees ≥5 cm DBH comprising 38 species. Species richness averaged 16 tree species per 0.10 ha plot and ranged from 8 to 25 species. Plot elevation ranged between 32 m and 38 m. Further plot summary information for each
α-diversity and forest structure measurement is found in
Supplementary 3. Correlation among
α-diversity measurements was strongly positive, except between species richness, unbiased Simpson’s, and Piélou’s J, which showed positive but lower correlation (
Supplementary 3).
Prior to developing base learners, RFE typically reduced the number of variables entering models except for a single case where all Sentinel-2 variables were selected with Fisher’s alpha (A) models (
Table 4). In nearly all cases, ensemble models outperformed base learners, except for Piélou’s J, which showed increased cross-validation error (
Supplementary 4). An exploration of model training data revealed no spatial correlation among tree species on plots and that areas with higher
α-diversity were often sites with greater tree densities (
Supplementary 2). Sentinel-2 spectral values showed significant positive spatial correlation among plots <100 m apart, whereas RapidEye spectral data showed little or significantly negative spatial correlation among plots ≤75 m apart. These positive and negative spectral correlations among training samples were likely related to the spatial resolution of Sentinel-2 (10 m to 20 m pixels) and RapidEye (5 m pixels), respectively.
Ensemble model performance from 10-fold cross-validation showed that, in most cases, combined sensor models had higher R² values and similar or lower RMSE for
α-diversity indices as compared to single-sensor models (
Table 4). In some instances, such as with Inverse Simpson’s (D2), separate Sentinel-2 and RapidEye models showed slightly greater R² values, but equal or increased RMSE and MAE. Random validation data compared well with observed values for combined sensor models that showed better goodness of fit than single sensor ensembles in all cases, including D2 and D3 models (
Table 5). Regularly spaced validation samples, which were farther apart, showed a lower model fit overall, primarily for Sentinel-2 and RapidEye models (
Table 5). Goodness of fit with regularly spaced validation samples also decreased for combined sensor models but showed consistently better goodness of fit than single-sensor models for H, D2, D3, and S. Fisher’s alpha (A) and Piélou’s J showed slightly better goodness of fit with regularly spaced plots only for RapidEye and Sentinel-2, respectively. Some spatial correlation among randomly placed validation plots in the 9 ha study area likely contributed to greater overall model performance (i.e., adj. R²) observed from these comparisons.
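For readers less familiar with the fit statistics reported above, the following minimal sketch computes RMSE, MAE, R², and adjusted R² from paired observed and predicted values. The values are hypothetical, R² is computed here as the coefficient of determination (1 − SS_res/SS_tot), and the `n_predictors` argument used for the adjustment is an assumption; the study may have derived these statistics differently (for example, from a regression of observed on predicted values).

```python
import math

def fit_metrics(observed, predicted, n_predictors=1):
    """Return RMSE, MAE, R^2 and adjusted R^2 for paired observations."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r ** 2 for r in residuals)               # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)   # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalizes R^2 for the number of predictors used
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}

# Hypothetical observed vs. predicted species richness on five validation plots
observed = [10, 14, 16, 20, 25]
predicted = [12, 13, 17, 19, 22]
print({k: round(v, 3) for k, v in fit_metrics(observed, predicted).items()})
```

Because RMSE squares residuals before averaging, it is more sensitive than MAE to the few plots with large errors, which is one reason the two statistics are reported together.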
In general,
α-diversity measurements that were more strongly correlated with species richness exhibited better model performance, such as Shannon’s H′ (r = 0.85), which is sensitive to rare tree species on a site. Scatterplots from combined sensor model predictions compared with random validation data showed a relatively strong positive relationship with other
α-diversity measurements except for Piélou’s J (
Figure 6a–f). Scatterplots from combined sensor models and regularly spaced validation plots showed similar results, but with fewer test samples to draw from (
Figure 7a–f). Piélou’s J showed the lowest model performance at each level of validation except with the Sentinel-2 model and regularly spaced validation samples.
Spatial predictions from the different sensor models showed higher
α-diversity values in the upper (northern) half of the 9 ha study area (
Figure 8a–c). The combined, Sentinel-2, and RapidEye models were in general agreement in terms of locations showing high and low
α-diversity. RapidEye imagery with higher spatial resolution (5 m pixels) showed more discrete differences between areas of high and low diversity. The larger pixel size from Sentinel-2 imagery (10–20 m) rendered a more generalized map of high and low tree diversity that was relatively consistent with diversity measures mapped using RapidEye. Combined sensor model predictions varied in appearance, which was likely dependent on variables selected from individual sensors that contributed most to each prediction. Variable importance measures from combined models indicated that, overall, elevation values were most important to
α-diversity predictions (
Figure 9a–f). Somewhat unexpectedly, canopy height was not among the top 10 predictor variables for most diversity measures, likely in part because average tree height was positively correlated with elevation (r = 0.54) in the study area. In contrast, mean, maximum and minimum canopy height predictors were among the top 10 variables selected with RFE for species richness, in addition to elevation mean and standard deviation (data not shown). Variable importance measures further indicated that a mixture of bands and VI from the two sensors and seasonal dates were helpful for making accurate predictions (
Figure 9a–f). This was clear from combined models where the Sentinel-2 chlorophyll absorption band 2 in the blue light spectrum from the leaf-on period was among the most important variables for Shannon’s H′, Inverse Simpson’s, Unbiased Simpson’s, and species richness predictions. Mean and standard deviation values from red, red-edge and NIR bands from both sensors were alternately important for diversity measures, with the exception of Fisher’s Alpha, for which elevation was strongly important (
Figure 9d). We did not interpret predictors for evenness because of relatively low model performance.
We found that observed
α-diversity measurements from simulated plots were, in many cases, under-represented by predicted values at the tails of the value distributions (
Figure 10a–f). For example, predicted species richness and related indices did not contain minimum and maximum values observed from plots. Only the unbiased Simpson’s index showed consistently good model performance and predictions spanning the full range of observed values from the combined model. Limitations are likely related to characterizing diversity from all tree species ≥5 cm DBH, some of which reside in the understory of larger trees. We found that validation plots with an increasingly dense understory generally showed higher species richness (
Figure 11a,b), which was positively correlated with most diversity indices. Average tree height showed a strong negative correlation with tree density (r = −0.74), which was positively correlated with all
α-diversity measures apart from Piélou’s evenness (
Supplementary 3).
We relied on matrix comparisons using training sample data (
Figure 2a) to better understand the contribution of spectral reflectance measures from each satellite sensor for assessing
α-diversity. We established that
α-diversity was not significantly related to the geographic distance between plots but was strongly related to tree species composition that showed a significantly positive relationship with forest structure (i.e., tree height and density) on plots (
Table 6). These complexities were important for interpreting relationships between
α-diversity and spectral data. Mantel and partial Mantel tests revealed that in all cases Sentinel-2 spectral data showed a significant relationship with
α-diversity measures (
Table 6). Sentinel-2 spectral distance remained significantly related to
α-diversity when controlling for geographic and forest structure distance among training plots. RapidEye spectral data were not significantly related to
α-diversity once forest structure variables were controlled for. In contrast, RapidEye showed a significantly positive relationship with forest structure, while Sentinel-2 did not. Spectral data and VI from either of the two multispectral sensors showed no significant relationship with species composition when forest structure was controlled for using partial Mantel tests (
Table 6).
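The logic of the matrix comparisons above can be sketched with a simple (not partial) Mantel test: the Pearson correlation between the upper triangles of two distance matrices, with significance assessed by permuting plot labels in one matrix. The plot values, the one-sided test, the absolute-difference distance, and the function names are all assumptions for illustration; the study's own tests used Bray–Curtis and other distances and also included partial Mantel tests, which are not implemented here.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def upper_triangle(matrix, order=None):
    """Flatten the upper triangle, optionally after reordering plots."""
    n = len(matrix)
    order = list(range(n)) if order is None else order
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided Mantel test: returns (Mantel r, permutation p-value)."""
    rng = random.Random(seed)
    a = upper_triangle(dist_a)
    observed = pearson(a, upper_triangle(dist_b))
    n = len(dist_a)
    hits = 0
    for _ in range(permutations):
        order = list(range(n))
        rng.shuffle(order)  # permute plot labels of the second matrix
        if pearson(a, upper_triangle(dist_b, order)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)

def distance_matrix(values):
    """Pairwise absolute differences between per-plot values (assumed metric)."""
    return [[abs(a - b) for b in values] for a in values]

# Hypothetical per-plot spectral values and Shannon's H' for six plots
spectral = [0.10, 0.15, 0.22, 0.30, 0.41, 0.55]
diversity = [1.1, 1.3, 1.6, 1.9, 2.4, 2.8]
r, p = mantel(distance_matrix(spectral), distance_matrix(diversity))
print(round(r, 3), p)
```

Permuting whole plots, rather than individual distances, preserves the dependence among pairwise distances that share a plot, which is why the Mantel test is preferred over an ordinary correlation test for distance matrices.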
4. Discussion
We found that mixed sensor types provided complementary information for estimating levels of
α-diversity in SDTF. With a few exceptions, models using combined sensor data produced consistently lower model error and better goodness of fit for
α-diversity measures when compared with single sensor models. Model ensembles typically outperformed models developed from a single ML approach, showing lower cross-validation error and improved fit (
Supplementary 4). Satellite imagery with mixed spectral and spatial resolutions contributed distinct information to tree diversity predictions. Sentinel-2 data with a greater number of spectral bands showed a significant relationship with
α-diversity measures from Mantel tests that aligned more closely with the spectral variability hypothesis [
2,
28]. RapidEye bands and indices, in general, showed no direct statistical relationship to
α-diversity measures but were significantly related to forest structure (i.e., Bray–Curtis distance) that was indirectly linked to tree diversity indices. Fricker et al. [
22] indicate that spectral reflectance patterns from high spatial resolution imagery capture shadow and light gaps that are correlated with vertical forest structure and tree diversity. We found that RapidEye bands and indices are likely impacted by canopy surface roughness and light volume scattering that related significantly to tree height and density. Neither Sentinel-2 nor RapidEye data showed a direct significant relationship with tree species composition on model training plots which, as expected, was statistically related to
α-diversity measures (
Table 6).
Our results suggest that, in most cases, multispectral satellite imagery was indirectly associated with diversity measures but could adequately capture vegetation and biophysical variation in ways linked to tree diversity [
77]. Forest structure differences were strongly correlated with fine-scale topography in our study area. Importantly, the 9 ha study area on REA is in a transitional environment between deciduous dry scrub and tree dominated vegetation [
38]. These conditions are conducive to multiple canopy strata and mixed composition SDTF that showed no spatial correlation among tree species on plots near one another (
Supplementary 2). Lower elevation sites showing higher
α-diversity were an assortment of short-stature and overstory trees with higher tree density relative to upland sites. Correlation comparisons confirmed that elevation was negatively correlated with all
α-diversity measures in addition to tree height (
Supplementary 3). Mapped predictions showed consistently higher
α-diversity levels in lower topography as opposed to upland sites composed of taller trees with comparatively open sub-canopy structure (
Figure 8a–c). We observed that actual
α-diversity estimates from validation plots were visually consistent with areas showing high and low
α-diversity from spatially explicit predictions (
Figure 12a–f). Higher tree diversity values are also generally aligned with lower elevation sites in the study area.
Our findings generally correspond with studies showing that canopy height variables explain a significant proportion of
α- and
β-diversity for SDTF in areas with differing levels of disturbance [
31,
54]. Hernández-Stefanoni et al. [
54] found that SDTF height variability assessed from multi-return LiDAR explained differences in tree species richness. Interpolated canopy height variables in our study were not among the most important
α-diversity predictors, likely because of a moderately positive correlation with elevation. Fine-scale elevation data improved models because of their negative relationship with tree density (r = −0.45), which had a strong positive relationship with tree species richness (r = 0.71). Other topographic variables were less important in our models and have also shown only minor gains for predicting tree diversity values in temperate forest systems [
48]. Nevertheless, local terrain variability, slope curvature and hillslope position, which affect hydrology and solar radiation, are potentially important at larger spatial scales [
22]. We found that indicators of environmental heterogeneity such as mean and standard deviation in elevation, eastness and northness alternately appeared as important variables for species richness, Shannon’s H′, inverse Simpson’s, and evenness indices (
Figure 9a–f), consistent with SDTF field studies on this site [
78].
To better understand the relationship between elevation, forest height structure, and
α-diversity, we experimentally removed elevation from tree species richness models. We found that minimum and mean tree heights became highly important with little or no model performance loss (
Supplementary 5). These results suggest that forest structure assessed from LiDAR can be a strong indicator of tree diversity in SDTF areas even when disturbance is low [
79]. Marselis et al. [
16] also found that vertical canopy structure assessed from GEDI waveform LiDAR was a reasonable proxy for tree species richness in wet tropical forest. Prior studies using simulated GEDI metrics describing vertical forest structure have shown a similarly significant relationship with Shannon’s H′ and tree species richness [
80]. In our study area, forest height structure was likely important to tree diversity differences, although interpolated canopy height data lacked other potentially important and complementary information on vertical canopy structure.
Combined model outcomes and variable contributions were not easily interpreted because of the ensemble approach used, and differences between important predictors for each
α-diversity measure. However, our results were comparable to studies in SDTF and other systems that showed improved tree diversity estimates could be obtained using multi-season imagery and mixed sensor types [
31,
48,
80]. Vegetation indices and spectral bands from leaf-off and leaf-on periods and from both sensor types were interchangeably important to
α-diversity measurements in our study area. Vegetation indices incorporating the red-edge band were often among the top predictors or were the most important variable in the case of inverse Simpson’s index. Ochoa-Franco et al. [
31] also found that the RapidEye red-edge band was the most important and statistically significant model covariate related to
β-diversity, explaining greater variation in tree diversity than tree canopy height.
Conversely, Sun et al. [
81] observed only minor gains from incorporating a leaf-on red-edge band into remote sensing plant diversity index values for mixed broadleaf and conifer forest types in parts of China. In our study area, we observed that locations in low topography retained some photosynthetically active plant material during the peak dry season period (
Figure 1). These areas contrasted with upland sites showing very little dry season photosynthetic activity and had lower tree diversity. Low topography and biophysical conditions appeared to mediate differences in seasonal phenology related to tree diversity that were better captured by red-edge spectral indices. In addition,
Malpighia emarginata, an evergreen shrub or small tree found only in low topography, comprised 8% of the stems counted in the 9 ha area and intermixes with the other common evergreen species
Cynophalla mollis and
Colicodendron scabridum. In a post hoc assessment, only the RapidEye leaf-off red-edge VI showed a statistically significant relationship with
α-diversity measures (Bray–Curtis distance) in partial Mantel tests controlling for geographic (Mantel r = 0.14,
p = 0.018) and forest structure (Mantel r = 0.12,
p = 0.035) distance. These outcomes suggest that plant phenology and leaf functional traits (e.g., deciduousness, chlorophyll, or nutrient content) captured by seasonal red-edge indices were helpful for distinguishing
α-diversity levels [
30].
In many cases, spatial predictions were similar between models, diversity indices and validation plots (
Figure 8 and
Figure 12). Of the six indices examined, Piélou’s J (evenness) was less correlated with other
α-diversity indices and showed low predictive capacity from ensemble models. Tree species evenness (relative abundance) is a component of species richness relative to the minimum and maximum number of species observed [
24] that is likely constrained by sensor type, number of spectral bands and band widths. In contrast, Redowan [
82] found that evenness categories could be accurately predicted for temperate forest types using an artificial neural network classifier with Landsat TM bands and terrain variables. Further work is likely needed to determine how data assimilation methods can better distinguish levels of tree species abundance and differences between sites for tropical areas where a larger number of species are present.
Although we did not specifically examine sampling differences and impacts on
α-diversity models with this study, initial comparisons indicated that tree measurements (e.g., plot size and minimum tree diameter used) can strongly influence diversity index values and prediction outcomes. Fricker et al. [
22] showed that sub-setting smaller diameter trees and shrubs (<10 cm DBH) improved species richness model predictions from remotely sensed data. We also found that shrubs or short stature trees were important in our models and for distinguishing
α-diversity differences on simulated plots. Plot size relative to satellite sensor specifications has also proven important for producing robust model predictions [
80]. In our case, sensor types with larger pixel sizes produced some spatial correlation between closely spaced plots and spectral data. This may be less of a factor for landscape-scale studies with data from widely spaced forest plots. Nevertheless, incorporating varied sensor specifications into elements of field sampling design developed to capture plant diversity will likely improve model performance [
16,
22].
Our findings confirm the usefulness of mixed remote sensing platforms for distinguishing some elements of SDTF tree diversity, and not others. Diversity indices correlated with species richness were better predicted by proxy metrics than indices related to evenness that rely on precise estimates of species distribution and abundance. Assimilated spectral bands, seasonal VI, tree canopy height and topography were highly important predictors. Higher tree diversity was positively correlated with differences in forest height structure that occurred within specific topographic environments in our study area. Efforts to harmonize remotely sensed data sources, forest inventories and other field sampling efforts could likely advance broad-scale estimates of tree species diversity [
15,
83].
5. Conclusions
We established that georeferenced tree census data can provide unique opportunities for examining tropical tree species diversity with information obtained from global remote sensing platforms. From our study, we found that spatial and spectral resolution differences between RapidEye and Sentinel-2 imagery contributed unique information that was related to SDTF tree diversity. Higher spatial resolution RapidEye bands and indices were significantly correlated with tree density and canopy height that were indirectly related to α-diversity measures. Sentinel-2 provided higher spectral resolution data that was more directly correlated with the tree diversity indices examined. Seasonal imagery and vegetation indices from each sensor, useful for distinguishing phenology differences among species present, were frequently important in α-diversity models. Notwithstanding, we found that high resolution digital elevation data related to tree height and composition differences present within distinct topographic environments was vitally important to tree diversity predictions. Each data source routinely provided complementary information to α-diversity models.
Correspondingly important to our study were machine learning applications for variable selection, assimilation, and model development. With 156 possible predictor variables in combined sensor models, recursive feature elimination coupled with ensemble machine learning was efficient for data reduction, model integration and making spatially explicit α-diversity predictions. Optimized variables from multiple sources often resulted in superior model performance. Predicted species richness and tree diversity indices from Shannon’s H′, inverse Simpson’s, and unbiased Simpson’s unambiguously exhibited a stronger relationship with field validation data in comparison with single sensor models. Mapped α-diversity values largely agreed with areas of high and low diversity observed in the study area. The most robust predictions at each validation stage were from combined models for species richness and Shannon’s H′ index, which showed a strong positive correlation with one another (r = 0.85). Indices less affected by sample size, common or rare species (e.g., Fisher’s alpha) showed mixed results, alternately demonstrating better performance with combined and single sensor models when assessed with separate validation data. Model performance was relatively low in all cases for Piélou’s J, a measure of evenness that was not strongly correlated with tree species richness (r = 0.21). Further work and data assimilation methods are likely needed for assessing these and other alternate diversity measures.
Our findings suggest that forest structure and elevation data from global satellites such as GEDI waveform LiDAR and multispectral imagery collected at near-daily intervals, such as PlanetScope 8-band imagery, could enhance methods developed with this study. Greater alignment between tropical forest inventories and information obtained from global remote sensing platforms can likely yield significant gains for assessing SDTF tree diversity at landscape to regional scales. As national and international conservation programs seek to improve tropical forest information needed for attaining biodiversity goals, data integration methods examined here can help fill essential information gaps.