Predicting Vascular Plant Diversity in Anthropogenic Peatlands: Comparison of Modeling Methods with Free Satellite Data

Peatlands are ecosystems of great relevance, because they have an important number of ecological functions that provide many services to mankind. However, studies focusing on plant diversity, addressed from the remote sensing perspective, are still scarce in these environments. In the present study, predictions of vascular plant richness and diversity were performed in three anthropogenic peatlands on Chiloé Island, Chile, using free satellite data from the sensors OLI, ASTER, and MSI. Also, we compared the suitability of these sensors using two modeling methods: random forest (RF) and the generalized linear model (GLM). As predictors for the empirical models, we used the spectral bands, vegetation indices and textural metrics. Variable importance was estimated using recursive feature elimination (RFE). Fourteen out of the 17 predictors chosen by RFE were textural metrics, demonstrating the importance of the spatial context to predict species richness and diversity. Non-significant differences were found between the algorithms; however, the GLM models often showed slightly better results than the RF. Predictions obtained by the different satellite sensors did not show significant differences; nevertheless, the best models were obtained with ASTER (richness: R2 = 0.62 and %RMSE = 17.2, diversity: R2 = 0.71 and %RMSE = 20.2, obtained with RF and GLM respectively), followed by OLI and MSI. Diversity obtained higher accuracies than richness; nonetheless, accurate predictions were achieved for both, demonstrating the potential of free satellite data for the prediction of relevant community characteristics in anthropogenic peatland ecosystems.


Introduction
Peatlands are wetland ecosystems of great relevance, because they store large amounts of carbon [1], regulate the storage and purification of water [2], and are a habitat for particular species [3]. Anthropogenic peatlands emerge when native forests are cut in poorly drained sites, which triggers vegetation succession processes with high moss abundance (from the genus Sphagnum [4]). This moss is currently harvested due to its value as a substrate for horticultural purposes. In this sense, the study of vegetation structure is key to preserving the functions and services that these peatland ecosystems provide to humans [5].
The climate is temperate with oceanic influence, with a dry period in summer. Annual precipitation is about 2100 mm, with mean annual temperatures of ~10 °C.
The three study peatlands are located at the Senda Darwin Biological Station (P2) and its surroundings (Figure 1), whose extensions are 10.5, 5.6, and 6.2 ha for P1, P2, and P3, respectively. Currently, the three peatlands do not present significant human pressure, allowing the successional development of vascular plants over the Sphagnum layer.
The three peatlands have the same dominant species, which are Baccharis patagonica, Gaultheria mucronata, Myrteola nummularia, and Sticherus cryptocarpus (a full description of species composition and abundance is presented in Table S1). In addition, there is an important area dominated by reed species (Juncus procerus, Juncus planifolius, and Juncus stipulatus) and several species of ferns, dominated by the genus Blechnum (B. chilensis and B. penna-marina, among others). In spite of the species shared between peatlands, these can be differentiated by their horizontal structure, since their spatial arrangement and mean cover vary from one another. Peatland P1 has a structure dominated by open scrubs, allowing the establishment of numerous herbaceous species (most of them Gramineae) over Sphagnum, and reed species in flooded areas. Peatland P2 is dominated by dense shrubs, with a smaller number of herb species. Reeds are still present in flooded areas where Sphagnum remains exposed. Peatland P3 has a matrix dominated by dense scrubs, with the same species mentioned for P2 but with the addition of Philesia magellanica and of tree species, most notably the endangered Pilgerodendron uviferum.
Remote Sens. 2017, 9, 681

Ground Data
A systematic sampling of vascular plant richness (expressed as the number of species in a given area or a given sample) was conducted to adequately represent the variability of the ecosystem, due to the high spatial heterogeneity of peatlands [35]. For this task, sampling points were located on a regular 60 m grid in each peatland, to match the sampling points with the spatial resolution of Landsat 8 (30 m). This implied that 15 sampling points were located in each peatland, giving a total of 45 observations. Each plot was a 2 m × 2 m square, where species occurrence and cover were recorded [36]. The vegetation assessments were conducted in two field campaigns: during January 2014 for P2, and during January 2016 for P1 and P3 (Figure 1).
Finally, we obtained for each plot the species richness (species count) and diversity, which we estimated using the Shannon index [22]. This index is one of the most popular plant diversity indices used in ecological studies [25], and can be calculated as $H = -\sum_{i} P_i \log(P_i)$, where $P_i$ is the relative proportion of species $i$. Foody and Cutler [31] point out the superiority of this index over richness for reflecting the structural variability of a landscape (since it allows a better representation of the dominant composition, and hence the dominant structure, of a plant community).
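As an illustration of the two response variables, the following minimal Python sketch computes richness and the Shannon index for a single plot. It assumes that the proportions P_i are derived from the recorded cover values and that the natural logarithm is used; neither choice is stated explicitly in the text, and the cover values are hypothetical.

```python
import math

def richness(cover):
    """Number of species with non-zero cover in a plot."""
    return sum(1 for c in cover if c > 0)

def shannon_index(cover):
    """Shannon index H = -sum(p_i * ln(p_i)), with p_i taken from cover values.

    Assumption: proportions come from cover and the logarithm is natural.
    """
    total = sum(cover)
    if total <= 0:
        return 0.0
    proportions = [c / total for c in cover if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical cover (%) of four species recorded in one 2 m x 2 m plot.
plot_cover = [40.0, 30.0, 20.0, 10.0]
print(richness(plot_cover), round(shannon_index(plot_cover), 2))
```

A plot with a single species yields H = 0, and H increases both with the number of species and with the evenness of their cover.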

Remote Sensing Predictors
Models were built with freely available satellite data from three sensors: Operational Land Imager (OLI), Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER), and MultiSpectral Instrument (MSI). Acquisition dates of the scenes were defined according to the dates of the field campaigns. For OLI, the dates were 24 December 2013 and 6 January 2016. Surface reflectance at 30 m spatial resolution was obtained from the Climate Data Record (CDR). ASTER scenes were obtained on 23 September 2013 and 31 January 2016. Surface reflectance at 15 m spatial resolution was obtained from the ASTER 7XT products. MSI data were obtained only for 5 January 2016, because the sensor has only been operational since mid-2015. Top of Atmosphere reflectance at 10 m spatial resolution was obtained for the visible spectrum and one near infrared (NIR) band, and at 20 m resolution for four NIR bands and two shortwave infrared (SWIR) bands. Data were acquired from the Level 1C product, and the Dark Object Subtraction (DOS) atmospheric correction was applied to derive surface reflectance [37].
Model predictors were obtained from all available bands of the three sensors. From these reflectance ranges, common vegetation indices (VI) used in wetland studies were computed (see Table 1). Furthermore, texture predictors were obtained from spectral bands and VI, which were divided into first and second order predictors, following Mairota et al. [38]. The first order predictors were the median and the standard deviation of pixels obtained from a 3 × 3 moving window, computed using focal statistics tools from ArcGIS 10.3 (ESRI, Redlands, CA, USA). The second order predictors were obtained using the gray-level co-occurrence matrix (GLCM) [39], namely the mean, variance, homogeneity, contrast, correlation, second moment, entropy, and dissimilarity. A 3 × 3 moving window was considered, using ENVI 5.1 (Exelis Visual Information Solutions, Boulder, CO, USA).
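To make the second order predictors more concrete, the sketch below builds a GLCM for one 3 × 3 window and derives five of the eight metrics listed above. It is an illustrative Python example, not the ENVI implementation: the quantization to four gray levels, the single horizontal offset, the symmetric counting, and the exact metric formulas are assumptions based on commonly used definitions, and the window values are hypothetical.

```python
import math

def glcm(window, levels, offset=(0, 1)):
    """Symmetric, normalized gray-level co-occurrence matrix of a 2-D window.

    window holds integer gray levels in the range [0, levels).
    """
    dr, dc = offset
    rows, cols = len(window), len(window[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = window[r][c], window[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def texture_metrics(p):
    """Texture metrics from a normalized GLCM p (commonly used formulas)."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "second_moment": sum(v * v for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for i, j, v in cells if v > 0),
    }

# Hypothetical 3 x 3 window of a band quantized to four gray levels.
window = [[0, 0, 1],
          [0, 1, 2],
          [1, 2, 3]]
metrics = texture_metrics(glcm(window, levels=4))
print(metrics)
```

A spectrally uniform window gives zero contrast and entropy and a homogeneity of one, whereas heterogeneous windows move all metrics away from these values, which is the property exploited when texture is used as a proxy for environmental heterogeneity.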
Finally, three sets of predictors were obtained, with a total of 148, 360, and 724 predictors for the OLI, ASTER, and MSI models, respectively. Reflectance values at the plot locations were obtained by bilinear interpolation [19].
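The bilinear extraction of predictor values at the plot locations can be sketched as follows. This Python example is only illustrative: it assumes that the plot position has already been converted to fractional column and row coordinates measured between pixel centers, and the function name and grid values are hypothetical.

```python
def bilinear(grid, x, y):
    """Bilinear interpolation of a raster at fractional column x and row y.

    grid is a list of rows holding the values at pixel centers, with at
    least 2 rows and 2 columns.
    """
    rows, cols = len(grid), len(grid[0])
    # Clamp the position so that the 2 x 2 neighbourhood stays inside the grid.
    x = min(max(x, 0.0), cols - 1.0)
    y = min(max(y, 0.0), rows - 1.0)
    c0 = min(int(x), cols - 2)
    r0 = min(int(y), rows - 2)
    fx, fy = x - c0, y - r0
    top = grid[r0][c0] * (1 - fx) + grid[r0][c0 + 1] * fx
    bottom = grid[r0 + 1][c0] * (1 - fx) + grid[r0 + 1][c0 + 1] * fx
    return top * (1 - fy) + bottom * fy

# Hypothetical 2 x 2 block of predictor values around a plot.
block = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear(block, 0.5, 0.5))
```

The interpolated value is a distance-weighted average of the four surrounding pixels, so a plot that falls between pixel centers is not assigned the value of a single, possibly unrepresentative, pixel.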
Table 1 (vegetation indices used as predictors; only partially recoverable from the source). Recoverable entries: Tasseled Cap Transformation (TCT = Greenness, Brightness, and Wetness [40]); Modified Soil Adjusted Vegetation Index. Table footnote 1: This index was calculated with selected NIR and Red bands to avoid overload with too many predictors.

Statistical Models
Species richness and diversity were modeled using two approaches: random forest (RF) and generalized linear models (GLM).
According to Lopatin et al. [33], the processing of data using GLM can be subdivided into the following steps:

1. Identify the proper model family to deal with the statistical properties of the observed variables. To predict species richness, we compared the normalized quantile-plots of the residuals of several GLMs, using the model families that are generally recommended for count data: Poisson, Quasi-Poisson, and negative binomial. All models tested were set with log-link functions. To predict diversity, we tested model families recommended for continuous positive data, such as Gamma with inverse, identity, and log-link functions. Likewise, we tested the less adequate Gaussian error distribution with a log-link function. In all cases we tested the models using only the best predictor, as ranked by recursive feature elimination (RFE; see below). Based on the shape of the normalized quantile-plots of the residuals, we selected the Poisson error distribution with a log-link function for the species richness models, while the Gaussian error distribution with a log-link function was selected for the prediction of diversity.

2. Select the subset of predictors for each model. As GLMs cannot cope with multi-collinearity among independent predictors, a selection and ranking of the most important predictors was performed with the recursive feature elimination (RFE) algorithm [41]. This algorithm is based on an iterative procedure, in which one predictor at a time is eliminated and ranked by its importance. We used random forest as the kernel. RFE was implemented using a leave-one-out cross validation procedure to obtain the RMSE and the number and importance of predictors. Models were selected based on a tradeoff between a low RMSE and a low number of predictors. In addition, we followed the recommendations of Hair et al. [42], who state that the number of observations should not be less than 5 per predictor, and ideally should be between 15 and 20.

3. With the first subset obtained, the correlation between these predictors was assessed using the Pearson correlation coefficient (r), selecting the pairs of predictors with |r| ≥ 0.6 and eliminating, one by one, the least important predictors according to the RFE ranking. This process was carried out independently for each sensor model, so the models could be compared.

4. Select the final model. The final number of predictors was determined by calculating several models and adding a single predictor in each run (starting with the most important predictor according to RFE). The best number of predictors was assessed by comparing the Akaike information criterion (AIC) and deviance [43]. We selected the model using three predictors.
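The collinearity filtering in the steps above can be sketched in Python as follows. This is one possible reading of the procedure, not the original R code: while any pair of remaining predictors has |r| ≥ 0.6, the lowest-ranked predictor taking part in such a pair is removed. The predictor names and values are hypothetical, and constant predictors (zero variance) are assumed to be absent.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def drop_collinear(ranked_names, columns, threshold=0.6):
    """Remove correlated predictors, least important first.

    ranked_names lists predictors from most to least important (RFE rank);
    columns maps each name to its values at the sampling plots.
    """
    kept = list(ranked_names)
    while True:
        flagged = set()
        for i, a in enumerate(kept):
            for b in kept[i + 1:]:
                if abs(pearson(columns[a], columns[b])) >= threshold:
                    flagged.update((a, b))
        if not flagged:
            return kept
        # Drop the lowest-ranked predictor that is part of a correlated pair.
        kept.remove(max(flagged, key=kept.index))

# Hypothetical predictors, already ordered by RFE importance.
columns = {
    "nir_entropy": [1.0, 2.0, 3.0, 4.0, 5.0],
    "nir_contrast": [2.0, 4.0, 6.0, 8.0, 11.0],   # nearly collinear with the first
    "swir_sd": [5.0, 1.0, 4.0, 2.0, 3.0],
}
selected = drop_collinear(["nir_entropy", "nir_contrast", "swir_sd"], columns)
print(selected)
```

Because removal always targets the least important member of a correlated pair, the most important predictor according to RFE is never discarded by this filter.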
The ensemble regression tree method RF [44] has been reported to be an efficient predictive approach, especially when the number of observations is low relative to the number of predictors [45]. The algorithm requires that two parameters be set: (1) mtry, the number of predictors performing the data partitioning at each node; and (2) ntree, the total number of trees to be grown in the model run. The ntree parameter was set to 500, following the recommendations in the literature [46], while mtry was tuned for each model. All statistical analyses were carried out in R (packages randomForest [47] and caret [48]). Here, we also used the RFE-ranked predictors to find the best set of predictors. As in the GLM models, we added a single predictor at a time to find the best model per sensor. The best models were selected by the percentage of explained variance.
To check the statistical assumptions of the models, we applied a Shapiro-Wilk test [49] to the residuals and the Moran test to assess spatial autocorrelation [50], discarding those models that did not pass both tests.
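The statistic behind the spatial autocorrelation check can be illustrated with a short Python sketch of Moran's I applied to model residuals. The spatial weights are an assumption, since the text does not state them: here two plots are neighbours (weight 1) when they lie within a chosen distance, set to the 60 m grid spacing. The significance test is not included, and the coordinates and residuals are hypothetical.

```python
import math

def morans_i(coords, values, max_dist):
    """Moran's I with binary weights: w_ij = 1 if 0 < distance(i, j) <= max_dist."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                weight_sum += 1.0
    return (n / weight_sum) * cross / sum(d * d for d in dev)

# Hypothetical plots along one grid line (60 m spacing) and their residuals.
coords = [(0.0, 0.0), (60.0, 0.0), (120.0, 0.0), (180.0, 0.0)]
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbouring residuals of opposite sign
clustered = [1.0, 1.0, -1.0, -1.0]     # neighbouring residuals of similar sign
print(morans_i(coords, alternating, 60.0), morans_i(coords, clustered, 60.0))
```

Values near zero indicate spatially independent residuals, positive values indicate clustering of similar residuals, and negative values indicate alternation between neighbours.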

Model Validation
For validation purposes, we followed the recommendations of Lopatin et al. [33], where the final RF and GLM models were embedded in a bootstrap procedure with 500 iterations. In each bootstrap iteration we drew 45 samples, with replacement, from the 45 available samples. In this procedure, on average, 36.8% of the samples (~17 of the 45) were not drawn, and these were used for the independent validation [46]; the model was thus fitted on the remaining ~28 distinct samples. The model performances of RF and GLM were compared based on differences in the squared Pearson correlation coefficient (R2), the normalized root mean square error (%RMSE), and the bias (measured as one minus the slope of a no-intercept regression of the predicted versus observed values), and a final model was selected. We tested for significant differences (α = 0.05) between algorithms and sensors by applying a one-sided bootstrap test [33]. First, we obtained the median values of the bootstrap accuracies (R2, %RMSE, and bias) described above. Using these values as a reference, we selected the algorithm that performed more accurately for each sensor. Second, we embedded the models in another bootstrap, where in each iteration the R2 of the 'worse' model was subtracted from the R2 of the 'better' model, and the other way around for %RMSE and bias (assuming that the 'better' model has a higher R2 and lower errors and bias). From these distributions, a one-sided test was performed to test whether the differences between GLM and RF were larger than zero (based on 500 bootstrap samples). The same procedure was applied to test for significant differences between sensors (using the best algorithm for each sensor).
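The bootstrap validation can be sketched in Python as follows. Several elements are assumptions made only for illustration: a one-predictor least-squares line stands in for the RF and GLM models, %RMSE is normalized by the mean of the observed values (the normalization is not stated in the text), the bias slope is taken from a no-intercept regression of predicted on observed values, and the data are synthetic.

```python
import math
import random
import statistics

def fit_line(x, y):
    """Least-squares line y = a + b * x (a stand-in for the RF or GLM model)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b

def accuracy(obs, pred):
    """Return (R2, %RMSE, bias) for observed and predicted values."""
    mo, mp = statistics.fmean(obs), statistics.fmean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    r2 = cov ** 2 / (var_o * var_p)                      # squared Pearson r
    rmse = math.sqrt(statistics.fmean([(o - p) ** 2 for o, p in zip(obs, pred)]))
    slope = sum(o * p for o, p in zip(obs, pred)) / sum(o * o for o in obs)
    return r2, 100.0 * rmse / mo, 1.0 - slope

def bootstrap_validate(x, y, iterations=500, seed=1):
    rng = random.Random(seed)
    n = len(y)
    results, left_out_share = [], []
    for _ in range(iterations):
        drawn = [rng.randrange(n) for _ in range(n)]     # n draws with replacement
        in_bag = set(drawn)
        left_out = [i for i in range(n) if i not in in_bag]
        if len(left_out) < 3 or len(in_bag) < 2:
            continue                                     # too few samples to evaluate
        a, b = fit_line([x[i] for i in drawn], [y[i] for i in drawn])
        pred = [a + b * x[i] for i in left_out]
        results.append(accuracy([y[i] for i in left_out], pred))
        left_out_share.append(len(left_out) / n)
    return results, statistics.fmean(left_out_share)

# Synthetic data: 45 plots, one predictor, richness-like response.
data_rng = random.Random(0)
x = [float(i) for i in range(45)]
y = [6.0 + 0.15 * xi + data_rng.gauss(0.0, 0.5) for xi in x]
results, share = bootstrap_validate(x, y)
median_r2 = statistics.median(r[0] for r in results)
print(round(share, 3), round(median_r2, 2))
```

The reported share of plots left out per iteration is a little more than one third, which is why each iteration provides an independent validation set without a separate hold-out sample.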

Predictive Species Map
The final step of the procedure consisted of obtaining the richness and diversity maps, which was done using the best models obtained in each case. The maps were computed based on 500 iterations from the bootstrap, using the best selected models in each iteration. The predictors used to generate the models were spatially explicit, allowing the prediction of areas that were not covered by the pixels used to build the models (those corresponding to the field samples). With this procedure 500 predictive maps were obtained, and we report maps of the median values and of the coefficient of variation (CV, given in %) for species richness and diversity, to account for the stability of the predictions within each pixel. The predictions were made solely within the limits of the study peatlands, to avoid predictions beyond the range of the model.
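The per-pixel summary of the bootstrap maps can be sketched in Python as follows. The example assumes that the CV is the sample standard deviation divided by the mean of the bootstrap predictions, expressed in percent; the exact estimator is not stated in the text, and the small stack of predictions is hypothetical.

```python
import statistics

def summarize_predictions(stack):
    """Per-pixel median and coefficient of variation (%) across bootstrap maps.

    stack[k][p] is the prediction of bootstrap model k for pixel p.
    """
    median_map, cv_map = [], []
    for p in range(len(stack[0])):
        values = [run[p] for run in stack]
        median_map.append(statistics.median(values))
        cv_map.append(100.0 * statistics.stdev(values) / statistics.fmean(values))
    return median_map, cv_map

# Hypothetical predictions of three bootstrap models for two pixels.
stack = [[8.0, 10.0],
         [10.0, 10.0],
         [12.0, 10.0]]
median_map, cv_map = summarize_predictions(stack)
print(median_map, cv_map)
```

A pixel for which all bootstrap models agree receives a CV of zero, while pixels with unstable predictions stand out with high CV values.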

Model Performance
By applying RFE, we obtained the importance and the optimal number of predictors for each model. According to Hair et al. [42], and considering the number of observations in our study (n = 45), the maximum number of predictors for each model was 3. Table 2 shows the predictors included in the final models for predicting plant richness and diversity.
Table 2. Predictors selected in the final models. The predictors used in each model are marked with an x and ordered by importance according to recursive feature elimination (RFE). Abbreviations of predictors are presented in Table 1. The numbers in the subscript correspond to the central wavelength of the sensor band (see Table S2).
The validation metrics R2, %RMSE, and bias, obtained from the 500 bootstrap iterations, showed no significant differences between GLM and RF for species richness and diversity (Figure 2); however, in most cases, GLM models showed higher accuracies using fewer predictors, except for the species richness prediction obtained with MSI (Table 2). No significant differences were found between algorithms or sensors; nevertheless, the highest accuracies were obtained with ASTER, for species richness with RF (R2 = 0.62 and %RMSE = 17.2% for the median bootstrap value) and for diversity with GLM (R2 = 0.71 and %RMSE = 20.2% for the median bootstrap value) (Table 3). OLI obtained a similar performance for predicting species richness with GLM (R2 = 0.59 and %RMSE = 18.3% for the median bootstrap value), but a slightly lower performance for predicting diversity with GLM (R2 = 0.63 and %RMSE = 22.8% for the median bootstrap value). MSI obtained a similar prediction of species richness with GLM (R2 = 0.60 and %RMSE = 18.3% for the median bootstrap value), but an inferior prediction of diversity using RF. Predictions of diversity obtained higher accuracy than species richness in terms of R2, but not in terms of %RMSE and bias (Table 3), indicating a systematic error in the diversity predictions.

Prediction Map
Predictive maps of richness and the Shannon index, and their corresponding coefficients of variation, are shown in Figures 3 and 4, respectively. No spatial autocorrelation was found for the residuals of the models (Moran: I = −0.053, P = 0.68 for species richness and I = −0.051, P = 0.66 for diversity), supporting the robustness of the spatial predictions. The predictive map of species richness shows values ranging between 6 and 13 species, distributed evenly across the three peatlands, with most of the values between 6 and 10 species and maximum values located at the borders of the peatlands. The map of CV represents the stability of the predictions, with maximum values around 14% in a few pixels, and most of them <9% (Figure 3).
The predictive maps of diversity show values ranging between 0.64 and 2.81, with the highest values in peatland P1, followed by P2 and P3 (Figure 4). The high values correspond to areas where shrubs are more sparsely distributed and where Sphagnum remains freely exposed, allowing the establishment of numerous herbaceous species. The map of CV shows values <15%, with higher variability in areas with a high level of the Shannon index (Figure 4).

Selected Predictors and Their Ecological Implications
Most of the predictors selected for modeling plant richness in peatland ecosystems were derived from the infrared region of the electromagnetic spectrum, a wavelength region often regarded as the most relevant for studying differences in vegetation structure [51,52]. In the NIR region, high levels of reflectance are present, due to the cellular structure of leaves, which reflects a great part of the incident energy, and to canopy structural traits, such as leaf area and angle [53], that can differentiate between species compositions or plant strategies [54] related to diversity. Meanwhile, in the SWIR region there is low reflectance in specific regions of the spectrum, where water molecules within cells absorb most of the incident energy [55]. The visible spectrum, generally dominated by the absorption of pigments and by canopy structure, usually has less variability among plant species than the infrared spectrum [55,56].
Regarding the selected predictors, most of them (14 out of 17) were textural predictors derived from the GLCM or the standard deviation (Table 2). Spectral heterogeneity has been defined as the most important factor explaining plant richness [57] and plant diversity [25]. This statement is based on the spectral variation hypothesis, which states that a higher spectral heterogeneity is positively correlated with species richness [25,58]. Theoretical and empirical studies suggest that the diversity of a particular site is strongly influenced by, and positively correlated with, its environmental heterogeneity [32]. More complex environments can host a greater number of ecological niches, which in turn can be colonized and inhabited by a larger number of species. In our study, developed at a local scale, this phenomenon was well captured by the spectral heterogeneity contained in the textural metrics (derived from spectral information). Nevertheless, recent efforts have shown that this hypothesis may not hold across landscapes at coarser resolutions [59].

Satellite Sensor Comparison
Similar predictions for species richness and diversity were obtained for the three sensors, with greater discrepancy in the diversity predictions (Figure 2). The non-significant differences were likely related to the similar spatial resolutions (ranging from 10 to 30 m), and to the spectral capabilities of the sensors to provide NIR information, which has been shown to be relevant in α-diversity studies [52,60].
Considering this, MSI would be expected to provide the best predictions of richness and diversity because of its more detailed spectral representation of the NIR region, which includes five bands. We found otherwise, which may have been caused by the following factors. First, a January 2016 image was used to represent the field samples of January 2014 in peatland P2, due to the lack of MSI scenes before 2015. This mismatch between field samples and satellite observations could be the main reason for the poorer performance of MSI. This is supported by Bradley [61], who mentioned that, ideally, the acquisition time should coincide with the field sampling. Second, the method of atmospheric correction applied to the MSI data may have contributed, since DOS has been shown to perform poorly in several applications (e.g., [62]). In this sense, the ASTER and OLI products implement complex and validated atmospheric correction algorithms, which can provide a more reliable estimation of surface reflectance. Third, the spatial resolution of the NIR bands could have played a role as well, especially in peatlands, where the community gradient and soil chemical composition may change drastically over distances of a few meters [63]. Here we obtained better predictions with the ASTER sensor, the one with finer spatial resolution (15 m) as compared to MSI (20 m) and OLI (30 m), allowing a better representation of ecosystem variability in terms of composition and dominance of species [26]. This is consistent with the results of other studies, where the NIR spectral region was selected as the most relevant predictor using a hyperspectral sensor to separate species in a mangrove ecosystem, where water and vegetation strongly interact [52]. The use of the ASTER sensor in α-diversity predictions has already been reported by Feilhauer and Schmidtlein [29], who obtained similar results in a walnut-fruit forest for species richness (R2 = 0.51) and diversity (R2 = 0.61).
Regarding the textural metrics, Laurin et al. [24], working in the Gola Rainforest National Park, obtained good predictions (R2 = 0.84) using this type of metric to estimate diversity based on hyperspectral data. Species richness predictions using textural predictors were reported by Cabezas et al. [19] in an anthropogenic peatland divided into two zones with different types of management: productive (with Sphagnum extraction) and conservation (corresponding to peatland P2 in this study). In summary, the spatial resolution of the scenes seems to be important for the analysis, especially in peatlands. Rocchini et al. [26] point out that ideally the size of the pixel should be at least the same size as the sampling unit, especially when spectral heterogeneity is computed to estimate the local diversity of species, as in this study. Nevertheless, when the pixel size is very small (1-5 m), shadows can create a high level of spatial heterogeneity that adds noise rather than relevant information [64]. In addition, Rocchini et al. [26] mentioned that coarse spatial resolutions can affect the representation of actual heterogeneity, due to a smoothing process that makes it difficult to detect patterns that exist at a finer scale. Hence, this trade-off between the noise caused by high resolution and the information loss caused by low resolution must be taken into account.
Nonetheless, in this study the moving window of 3 × 3 pixels was considered to compute the textural predictors, regardless of the spatial resolution of each sensor. Therefore, the smoothing effect differs among sensors, and thus could possibly affect the representation of actual heterogeneity of the ecosystems, consequently underestimating the environmental gradient at low resolutions. However, on the three-pixel scale considered, similar accuracies were obtained by the different modeling approaches, which may imply that the spatial difference (10-30 m) does not result in a significant difference in the acquisition of textural information in such highly heterogeneous environments (where medium-resolution satellites may only account for general patterns).

Model Comparisons
The slightly better performance (though not significant) of GLM in species richness modeling could be due to its ability to handle count data, where the error has a non-symmetrical distribution [33]. In our particular case, the Poisson distribution was the most appropriate to represent the error distribution, agreeing with other studies [65].
Machine-learning methods such as RF are popular among modelers because they are described as non-parametric, that is, it is usually assumed that there are no requirements concerning the error distribution. Nevertheless, this is not true for RF, which either fits standard linear (Gaussian) regressions for tree nodes or is based on measures of node impurity, such as the sum of squared deviations from the mean [66]. It is true, however, that in several cases these methods have proven to be more accurate than parametric approaches, with easier tuning procedures. Despite these advantages, their predictions are often difficult to follow, compared to parametric approaches such as GLM [67]. Another issue with RF is that the application of sub-sampling in the algorithm can result in an increase of the variance, especially when the number of samples is small [21]. This can turn into a major problem in the validation process, when the bootstrap method is implemented and sub-samples are chosen, reducing the input observations used in the model. In spite of these limitations, studies using RF to predict species richness obtained accurate predictions with a number of field samples similar to the one used in this study [19,24], and showed RF to be an important tool for modeling α-diversity. This study shows that both GLM and RF are valid methods for predicting α-diversity with similar performance; nonetheless, in agreement with Latifi et al. [21] and Lopatin et al. [33], GLM proved to be a more parsimonious approach, since it was easier to interpret and required less computation time and fewer predictors.

Conclusions
Accurate predictions of α-diversity were obtained in anthropogenic peatlands from data of three medium-resolution sensors (OLI, ASTER, and MSI), using the random forest (RF) and generalized linear model (GLM) approaches. Predictions of vascular plant richness were obtained with an acceptable level of accuracy, and similar results were obtained for plant diversity, demonstrating the capability of sensors to estimate α-diversity.
In spite of the non-significant differences between algorithms and sensors, ASTER was able to provide the most accurate models for both species richness and diversity, probably due to its spatial resolution in the NIR band. The comparison between RF and GLM modeling approaches showed no significant differences either, but slightly better results were often obtained with GLM, which is easier to understand and provides a more parsimonious prediction with fewer predictors. Most of the selected predictors were derived from the computation of textural variables. The inclusion of the spatial context represented by the textural metrics was an important feature in modeling α-diversity.
Free satellite data have proven to be helpful for modeling α-diversity in anthropogenic peatlands, which are ecosystems of great relevance (both locally and globally).

Supplementary Materials:
The following are available online at www.mdpi.com/2072-4292/9/7/681/s1, Table S1: Species cover by peatland, Table S2: Spectral resolution of sensors. The sensor bands are divided by spectral range, where NIR is near infrared and SWIR is short wave infrared.