Plant Species Discrimination in a Tropical Wetland Using In Situ Hyperspectral Data

We investigated the use of full-range (400–2,500 nm) hyperspectral data obtained by sampling foliar reflectances to discriminate 46 plant species in a tropical wetland in Jamaica. A total of 47 spectral variables, including derivative spectra, spectral vegetation indices, spectral position variables, normalized spectra and spectral absorption features, were used for classifying the 46 species. The Mann–Whitney U-test, paired one-way ANOVA, principal component analysis (PCA), random forest (RF) and a wrapper approach with a support vector machine were used as feature selection methods. Linear discriminant analysis (LDA), an artificial neural network (ANN) and a generalized linear model fitted with elastic net penalties (GLMnet) were then used for species separation. For comparison, the RF classifier (denoted as RFa) was also used to separate the species by using all reflectance spectra and all spectral indices, respectively, without applying any feature selection. The RFa classifier achieved 91.8% and 84.8% accuracy with importance-ranked spectral indices and reflectance spectra, respectively. The GLMnet classifier produced the lowest overall accuracies for feature-selected reflectance spectra data (52–77%) when compared with the LDA and ANN methods. However, when feature-selected spectral indices were used, the GLMnet produced overall accuracies ranging from 79 to 88%, which were the highest among the three classifiers that used feature-selected data. A total of 12 species recorded a 100% producer accuracy, but only with spectral indices, and an additional 8 species had perfect producer accuracies regardless of the input features. The results of this study suggest that the GLMnet classifier can be used, particularly on feature-selected spectral indices, to discern vegetation in wetlands. However, it might be more efficient to use RFa without feature-selected variables, especially for spectral indices.

Remote Sens. 2014, 6, 8495 (open access)


Introduction
Over the last decade, leaf spectral reflectance has been used successfully to discriminate plant species found in various habitat types/ecosystems [1][2][3][4]. In particular, in situ hyperspectral measurements greatly assist with the discrimination process by allowing contiguous spectral data to be analyzed statistically. The manipulation of small, but often significant differences in the spectral curves using methods, such as continuum removal, permits the identification of different vegetation types [1,5].
Species differentiation has also been achieved with univariate and multivariate approaches, which include parametric and non-parametric analysis of variance [6][7][8], discriminant analysis [9] and classification and regression tree-based techniques [10]. These methods can be used individually or several methods can be combined to achieve hyperspectral feature reduction. Mather and Koch [11] described the two main categories of feature reduction: feature extraction and feature selection. Feature selection seeks to reduce the original data set to a subset of features that retain information required to better separate the classes, while excluding highly correlated and redundant features from the classification analysis [12]. Contrastingly, feature extraction applies a transformation that allows the definition of a small set of new features that contain the majority of information contained in the original data set [13].
Band selection techniques have been used to select informative bands in hyperspectral data from spectra collected in different bio-types and to discriminate vegetation types or species [13][14][15][16]. However, intra-species variations due to age differences [17], micro-climate [18], edaphic conditions, topography [19], phenology [20], illumination [20], precipitation and other environmental factors [20] can influence the biophysical and biochemical constituents of a leaf [1,20], and have limited the success of these band-based techniques.
Consequently, hyperspectral variables and indices that are known to be related to foliar pigments have been used to discriminate between canopy species across different landscapes. Spectral indices are mathematical transformations of spectral reflectance that can be used to improve the accuracy of vegetation signals [21]. These indices can also be used to differentiate plant species that differ in canopy structure and/or biochemical composition (e.g., [2,22]). Therefore, a suite of hyperspectral metrics that indicate vegetation chemical and structural properties is often used for species discrimination [22]. Such species discrimination may prove advantageous to natural resource managers tasked with monitoring vegetation invasions, for example. Hyperspectral data may prove invaluable in landscapes characterized by a high degree of heterogeneity, fragmentation and high biodiversity, such as those present in the Black River Lower Morass (BRLM) in Jamaica. The sustainable management of different ecosystem types within this wetland requires a detailed understanding of vegetation species distribution and the ability to identify plants accurately and efficiently at the landscape level [23]. Moreover, wetlands inventory and the monitoring of vegetation species quality and distribution are important for management, but can be impeded by the marsh and seasonally-to-permanently flooded conditions. To this end, field spectrometry has been used to characterize the reflectance of vegetation types in situ and for the scaling-up of measurements from the leaf to crown scales [9] and within laboratory settings [24].
We therefore attempted to build hyperspectral libraries from the most indicative vegetation species found in remnant swamp forest patches and morass areas in the largest wetland in Jamaica, giving priority to capturing reflectance from endemic vegetation wherever possible. Both reflectance spectra and spectral indices were used in an attempt to discriminate between sampled species. A generalized linear model fitted with an elastic net regularization (GLMnet) is an untested technique for tree discrimination using hyperspectral data. The performance of this supervised learning classifier was compared to linear discriminant analysis (LDA) and artificial neural networks (ANN), two algorithms commonly employed for such tasks. Therefore, the objectives of this investigation were to: (1) compare the performance of LDA, ANN and GLMnet techniques in identifying wetland vegetation species; (2) compare the effectiveness of spectral features and indices derived from five feature selection procedures; and (3) examine the analysis capability of hyperspectral data for identifying vegetation found in different niches in the BRLM. We also assessed the ability of the random forest tree-based classifier (RF) to discern species using non-feature-selected reflectance spectra and spectral indices.

Study Site
Located in the parish of St. Elizabeth, Jamaica, the Black River Lower Morass (coordinates 18.189553°N and 77.683307°W) has an approximate area of 6,075 ha and is the largest wetland on the island. The BRLM was declared a wetland of international importance in 1998 by the Convention on Wetlands of International Importance, RAMSAR [25]. The boundary of the RAMSAR site was used to define the limits of the study area (Figure 1), and within the study site, there are patches of swamp and mangrove forests, morass, varied types of herbaceous wetland habitats and several raised limestone islands. Five human-residential communities are found on different limestone islands within the confines of the RAMSAR boundaries and at least 6 communities border the area. The 2011 population census estimated that approximately 20,000 persons live in those communities [26].

Sample Acquisition
To investigate the ability of reflectance patterns of leaves to effectively discriminate plant species, we attempted to collect 20 healthy, sun-lit leaves from at least 15 individual plants per species. This was not always achieved, due to availability, access, leaf position in the canopy (shading), fungal growth and poor leaf health. However, we tried to ensure that the number of sampled plants allowed for spectral variation and a wide spatial distribution between individuals. The list of species sampled and the number of training and test spectra used for the subsequent data analyses are given in Table 1.  Leaves at different positions in the canopy may have unique spectral attributes, due to differences in photosynthetic properties. Therefore, we stratified our sampling of leaves according to tree height. Specifically, we sampled leaves from the well-lit top, middle and low foliage branches from the crowns of individual plants. The remoteness of our site, coupled with the prevalent soft peat substrate in the morass prohibited the use of ladders, cranes or any other mechanism to access the canopy tops safely. Leaves were collected for immediate processing by either climbing trees, or by using a telescopic pole with a clipper attached to one end, or a combination of the two methods. Consequently, with the exception of Rhizophora mangle (RHMA) (which was easy to climb), samples were usually obtained from trees ≤10 m in height.
In the field, foliar reflectance was measured immediately after the leaf was cut. However, on days when inclement weather prevented immediate reflectance measurement, clipped samples were transported to a field shelter within 45 minutes, where their reflectance was measured immediately. The leaves were handled in a manner consistent with Liu et al. [16], thereby ensuring leaf freshness. For each species, we used a one-way ANOVA with canopy height as the independent variable (three levels) to assess differences in the reflectance pattern of leaves, and found that they were not statistically different (F(2,38) = 35.4; p < 0.19). Additionally, the spectra of leaves that were clipped and transported were not significantly different when compared with the spectra of leaves analyzed in situ (F(2,35) = 30.1; p < 0.18). We therefore pooled the spectra of leaves from the different canopy strata and those obtained both in situ and ex situ to calculate the mean reflectance curve for each tree.

Collection of Reflectance Spectra
During October, 2012, reflectance spectra for 46 plant species (Table 1) were collected with a full-range analytical spectral device (ASD) (Fieldspec®4, Analytical Spectral Devices, Boulder, CO, USA). The ASD instrument consists of three separate spectrometers and covered a spectral range of 350 nm to 2,500 nm. These spectrometers ranged from 350 to 975 nm, 976 to 1,770 nm and 1,771 to 2,500 nm, with spectral resolutions of 3 nm at 700 nm and 10 nm at 1,400 and 2,100 nm, respectively. A leaf-clip assembly was interfaced with the ASD Plant Probe, with an internal Halogen light source, and a 1.5-m fiber optic cable was used with an effective 25° field of view. The leaf-clip holds the target sample in place without removing the leaf from a tree and excludes ambient light, ensuring a constant geometry for the light source and foreoptics.
Fitted to the leaf-clip was a two-sided, rotating head, with a background panel embedded in each side. The black panel face (less than 5% reflectance) was used for reflectance measurements, and the white panel face was fitted with a 99% reflectance panel (Spectralon, Labsphere, North Sutton, NH, USA). A dark current correction was performed every ten minutes to eliminate instrument noise from spectral measurements, and a white reference measurement was taken to convert leaf radiance to percent reflectance. The spectrometer was programmed to average 10 scans per recorded spectrum, to obtain reliable mean and variance estimates.
Each leaf, depending on its surface area, was measured in 3 to 5 (3 for most species) spots along an axis perpendicular to the main leaf vein [27]. For smaller leaves, for example Adenanthera pavonina (ADPA) and Avicennia germinans (AVGE), the field of view occupied over 50% of the surface area of the leaf. In such cases, single-leaf repetitions were not possible; therefore, spectra from other leaflets on the same bi-pinnate stalk were treated as the replicate per leaf. For larger leaves (e.g., Alpinia allughas (ALAL)), a minimum of 3 spots per leaf was used. Palm leaves (for example, Roystonea princeps (ROPR) and Calyptronoma occidentalis (CAOC)) were measured on a central axis parallel with the length of the leaves, as per Benoit et al. [27].

Preprocessing
The preprocessing procedure employed by Pu [2] was used during this study. Reflectance values below 400 nm and above 2,400 nm were truncated and curve smoothing was applied to the remaining bands by using a simple average over blocks of five neighboring bands. Spectral curves were then normalized by dividing the curve by its mean reflectance, which reduces intraspecies spectral variability by suppressing illumination differences [28]. Thus, each value of the spectral reflectance curve, $\rho_i$, was replaced with $\rho_i / \left(\frac{1}{k}\sum_{j=1}^{k}\rho_j\right)$, where $k$ represents the total number of bands of the spectral reflectance curve. The normalized data were then used in all subsequent analyses. All statistical analyses were conducted using the R statistical package Version 3.0.2 [29].
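The smoothing and normalization steps can be illustrated with a minimal Python sketch. It assumes that the five-band averaging was done over non-overlapping blocks, which the text does not state explicitly; the function names and the 10-band example spectrum are hypothetical and are not taken from the study, whose analyses were run in R.

```python
from statistics import mean


def block_average(reflectance, block=5):
    """Average consecutive, non-overlapping blocks of `block` bands (assumed scheme)."""
    usable = len(reflectance) - len(reflectance) % block  # drop an incomplete tail block
    return [mean(reflectance[i:i + block]) for i in range(0, usable, block)]


def mean_normalize(reflectance):
    """Divide every band by the mean reflectance of the whole curve."""
    curve_mean = mean(reflectance)
    return [value / curve_mean for value in reflectance]


# Hypothetical 10-band spectrum (fractional reflectance), for illustration only.
spectrum = [0.10, 0.12, 0.11, 0.13, 0.14, 0.40, 0.42, 0.41, 0.43, 0.44]
smoothed = block_average(spectrum)     # two 5-band block means
normalized = mean_normalize(smoothed)  # the mean of the result is 1 by construction
```

After normalization every curve has a mean of 1, so curves that differ only by a multiplicative brightness factor become identical.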

Extraction of Spectral Variables
The 47 spectral variables used for classifying the sampled species are given in Table 2. These include derivative spectra, spectral vegetation indices, spectral position variables, normalized spectra and spectral absorption features extracted from the in situ hyperspectral measurements. Continuum removed spectra features were not included in this study, because they have been reported as ineffective for plant species discrimination (e.g., [1,2]).

Table 2 (partial; only these entries are recoverable). Simple Ratio, SR: (ρ774)/(ρ677), indicator of prolonged vegetation stress due to changes in canopy structure. Chlorophyll Index, SGA: chlorophyll content, Sims and Gamon [47]. Chlorophyll Index, SGB: (ρ750 − ρ445)/(ρ705 − ρ445), chlorophyll content, Sims and Gamon [47]. DattA: (ρ780 − ρ710)/(ρ780 − ρ680), chlorophyll content, Datt [46]. Water Index at 1,180 nm, WI.1180: Pu et al. [49]. Sources also cited in the table: Lichtenthaler et al. [33], Peñuelas and Filella [36], Datt [46].

An outline of the steps used in this study is shown in Figure 2. All relevant reflectance spectra and spectral indices were randomly split into training (approximately 70% of the data) and test data (approximately 30%) sets. Five different feature selection methods were used to isolate spectra and identify indices that were most effective at discriminating among species. The training and test data sets were subsequently modified to include only those indices and spectra identified during the respective feature selection procedures. Each feature-selected training data set was then used for training the 3 different classifier models. The test data sets were then used to assess the classifier's performance by comparing their overall accuracy and kappa statistic. All of the training spectra and indices were respectively classified with a random forest classifier, without applying any prior feature selection.
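A random split into approximately 70% training and 30% test data can be sketched as follows. Splitting within each species (stratification) is an assumption made here so that every species appears in both sets; the text only states that the split was random. The function name, the labels and the reuse of seed 100 are illustrative.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=0.7, seed=100):
    """Return (train_idx, test_idx), sampling each species separately (assumed)."""
    rng = random.Random(seed)
    by_species = defaultdict(list)
    for index, species in enumerate(labels):
        by_species[species].append(index)

    train_idx, test_idx = [], []
    for species in sorted(by_species):
        indices = by_species[species][:]
        rng.shuffle(indices)
        n_train = round(train_fraction * len(indices))
        train_idx.extend(indices[:n_train])
        test_idx.extend(indices[n_train:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 10 spectra for each of three species codes.
labels = ["RHMA"] * 10 + ["AVGE"] * 10 + ["CEPE"] * 10
train_idx, test_idx = stratified_split(labels)
```

With 10 spectra per species this yields 7 training and 3 test spectra for each species.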

Figure 2.
Flowchart showing a summary of the process followed in this study.

Feature Selection and Discrimination Procedure
Feature selection by means of recursive feature elimination uses a feature-ranking criterion to produce a list of features arranged by their discriminatory ability and therefore provides a means by which the most parsimonious model can be selected. For this study, a statistically-based criterion that estimated the importance of the waveband features was used to separate features. Spectra and indices were first standardized to a mean of 0 and a standard deviation of 1 before feature selection. A support vector machine algorithm with a 10-fold cross-validation was implemented using the R package e1071 (Version 1.6-1) [50], coupled with the caret package [51], to select the 'optimal' number of variables, i.e., the number that produced the highest accuracy, for the reflectance spectra and spectral indices data sets. A feature set of 15 indices and 20 spectra yielded the most accurate results; therefore, the 15 top-ranked spectral indices and the 20 top-ranked reflectance wavebands per feature selection procedure were used for subsequent species separations. To allow for comparability, the same random number seed (100) was set prior to modeling.
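The standardization step (mean 0, standard deviation 1) amounts to a z-score transformation of each feature. The sketch below is illustrative only; it uses the population standard deviation, which is an assumption, and the example values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale one feature (one waveband or index) to mean 0 and SD 1."""
    mu = mean(column)
    sigma = pstdev(column)  # population SD; the sample SD would be an equally valid choice
    if sigma == 0:
        return [0.0 for _ in column]  # a constant feature carries no information
    return [(value - mu) / sigma for value in column]


# Hypothetical values of a single spectral index across eight samples.
index_values = [2, 4, 4, 4, 5, 5, 7, 9]
z_scores = standardize(index_values)
```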
We then used recursive feature elimination with resampling from the caret package [51], using a random forest (RF) algorithm in the R package randomForest [52], and a wrapper feature selection method using a support vector machine (wSVM) implemented in the R package FSelector [53]. The RF approach is an embedded method of feature selection that uses recursive partitioning, producing an ensemble of classification trees that are calculated on random subsets of the data [54]. For each resampling iteration, the algorithm partitions the data into training and hold-back sets. The model is then trained on the training set using all predictors and subsequently used to predict the held-back samples. Variable importance or rankings were then calculated and, for each subset size $S_i$ ($i = 1 \dots S$), the $S_i$ most important variables were kept. The training data set was used to train the model using the $S_i$ predictors, and the held-back samples were used to make predictions. The performance profile over the $S_i$ using the held-back samples was then calculated, and the appropriate number of predictors was determined.
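The elimination loop described above can be sketched generically in Python. This is not the caret implementation: the ranking and scoring functions are passed in as arguments, the sketch re-ranks the surviving features at every subset size, and the toy importance values and scoring rule are hypothetical stand-ins for a fitted model and its held-back accuracy.

```python
def recursive_feature_elimination(n_features, rank_features, score_subset, sizes):
    """Return (best_subset, profile), where profile maps subset size to score."""
    remaining = list(range(n_features))
    profile = {}
    best_subset, best_score = None, float("-inf")
    for size in sorted(sizes, reverse=True):  # largest subset first
        if size > len(remaining):
            continue
        remaining = rank_features(remaining)[:size]  # keep the `size` top-ranked features
        score = score_subset(remaining)              # e.g., held-back accuracy
        profile[size] = score
        if score >= best_score:                      # ties favour the smaller subset
            best_subset, best_score = list(remaining), score
    return best_subset, profile


# Hypothetical importances: features 0, 2 and 4 are the informative ones.
IMPORTANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]
INFORMATIVE = {0, 2, 4}


def rank_by_importance(features):
    return sorted(features, key=lambda f: IMPORTANCE[f], reverse=True)


def toy_score(features):
    """Stand-in for accuracy: rewards informative features, penalizes subset size."""
    found = len(INFORMATIVE.intersection(features))
    return found / len(INFORMATIVE) - 0.01 * len(features)


best_subset, profile = recursive_feature_elimination(
    6, rank_by_importance, toy_score, sizes=[1, 2, 3, 4, 5]
)
```

In this toy case the performance profile peaks at three features, so the three informative features are returned.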

U-Test
A Mann-Whitney U-test was used to determine if the variance of reflectance between tree species was greater than within tree species [1]. A non-parametric test was chosen because hyperspectral data are not independent and are not normally distributed [55]. Unequal sample sizes did not affect the statistical test, because the number of samples was sufficiently large [56]. The U-test was used to test the hypothesis that there was no significant difference between the median reflectance of each individual waveband between pairs of tree species [1]. Schmidt and Skidmore [1] described the null hypothesis for N vegetation types and J spectral bands per reflectance measurement as:

$$H_0: \eta_n(i) = \eta_m(i) \quad \text{versus} \quad H_1: \eta_n(i) \neq \eta_m(i),$$

where $i = 1, 2, 3, \dots, J$ is the spectral band and $\eta_n(i)$ is the median reflectance of vegetation type $n$ in band $i$, for every pair of vegetation types $n = 1, 2, 3, \dots, N-1$ and $m = n+1, \dots, N$. For this study, the maximum frequency was 1,035 ($\binom{46}{2} = (46 \times 45)/2$). The hypothesis was therefore tested 1,035 times for all possible combinations of the 46 species at the significance levels of α' = 0.05 and α'' = 0.01, with Bonferroni adjustments in both cases.
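The pairwise testing scheme can be sketched for a single waveband as follows. The sketch uses the large-sample normal approximation of the U statistic without tie or continuity corrections, and it divides α by the number of species pairs for the Bonferroni adjustment; both choices are assumptions, since the exact settings used in the study are not given. The reflectance values are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def mann_whitney_p(x, y):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs(z) / math.sqrt(2))


def count_separable_pairs(band_by_species, alpha=0.05):
    """Number of species pairs that differ in one waveband, and pairs tested."""
    pairs = list(combinations(sorted(band_by_species), 2))
    threshold = alpha / len(pairs)  # Bonferroni adjustment (assumed divisor)
    hits = sum(
        mann_whitney_p(band_by_species[a], band_by_species[b]) < threshold
        for a, b in pairs
    )
    return hits, len(pairs)


# Hypothetical reflectance values of one waveband for three species.
band = {
    "A": [float(v) for v in range(20)],
    "B": [float(v) for v in range(20)],
    "C": [float(v) for v in range(100, 120)],
}
hits, n_pairs = count_separable_pairs(band)
```

Repeating the count for every waveband gives the frequency of separable species pairs per band; for 46 species, `math.comb(46, 2)` gives the 1,035 pairs tested.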

Analysis of Variance, ANOVA
Adjacent hyperspectral wavebands show high correlations, and it is therefore not efficient or reliable to include all measured bands for analyses [57]. Most classification algorithms underperform when reflectance values for highly correlated data are used during model training. To minimize such effects, we performed a paired one-way ANOVA for all possible combinations of species in the training data set across the 47 spectral indices. The resultant probability value obtained from the ANOVA analysis provided an index of importance for the tested index that was used to discern between tree species [57].
The ANOVA was used to eliminate redundant spectral bands from the analysis. However, because adjacent bands are not independent, an ANOVA could not be used to test within-band differences. We recorded the frequencies of each spectral variable (reflectance spectra and spectral indices) that the ANOVA identified as having interspecific variation between the paired species at the 95 and 99% probability levels. The frequencies obtained from the one-way ANOVA were then used to select a subset of variables for running the classifier algorithms.

Principal Component Analysis, PCA
PCA is a multivariate statistical technique that can be used to extract information from spectral data and transform the data into a set of orthogonal variables called principal components (PC). Because neighboring bands of hyperspectral data are highly correlated, a PCA was used to transform the original data into its PCs. This reduces irrelevant information from the original inter-correlated variables. The uncorrelated linear combinations (eigenvector weights) of variables in n-dimensional space are then chosen to successively extract linear combinations that have corresponding smaller variances. The first PC accounts for the maximum proportion of the variance, and subsequent components account for the next highest proportion of the remaining variance [58]. PCA wavebands were computed using factor loadings (or eigenvectors) for each of the bands and by multiplying the factor loadings by their respective waveband's reflectivity [15].
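For reference, the standard relations behind this transformation can be written compactly. The notation is introduced here for illustration and is not taken from the study: $X$ is the $n \times p$ matrix of column-centered reflectance values ($n$ spectra, $p$ wavebands), $C$ its covariance matrix, and $w_k$ and $\lambda_k$ the $k$-th eigenvector and eigenvalue.

```latex
\begin{aligned}
C &= \frac{1}{n-1}\,X^{\top}X, \qquad C\,w_k = \lambda_k w_k, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p,\\
z_k &= X\,w_k, \qquad
\text{variance explained by the first } m \text{ PCs}
= \frac{\sum_{k=1}^{m}\lambda_k}{\sum_{k=1}^{p}\lambda_k}.
\end{aligned}
```

The scores $z_k$ are mutually uncorrelated, and the cumulative ratio is the quantity compared against targets such as 95% or 99% of the variance.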

Random Forest Selector
Random forest comprises a collection of decision tree classifiers [59], in which each tree in the forest has been trained using a bootstrap sample of training data, and a random subset of features is sampled independently from the input features [13]. By omitting a subset of the training data set (the out-of-bag data) from the training of each tree, random forests are better able to examine the contribution and behavior of each predictor (spectral band or index) [60]. The out-of-bag data were used for feature selection by determining the importance of different spectral wavelengths or indices during the classification process, based on a Z score, which was used to assign a significance level.

The Wrapper Approach with a Support Vector Machine: Wrapper-SVM
Wrapper feature selection uses the induction algorithm as a black box method [61] and combines the strength of a traditional search algorithm with the capability of a classifier [7]. Because the selected features depend on the data used for the search, cross-validation or bootstrapping should account for the variability caused by feature selection when assessing performance. In this study, performance estimates that included feature selection variation were generated using the rfe function (caret package), which performs recursive feature elimination with resampling. The candidate feature-subset sizes passed to the rfe function for both spectral indices and reflectance spectra were 1, 2, 3, 4, 5, 10, 15, 20 and 30, using a support vector machine with a 10-fold cross-validation and 10 repeats of the bootstrap for optimization. The predictors (spectral indices and spectral bands, respectively) were ranked; the less important predictors were sequentially eliminated prior to modeling.

Classifier Training, Prediction and Accuracy Assessment
Three classifiers, namely a linear discriminant analysis (LDA), an artificial neural network (ANN) and a generalized linear model fitted with elastic net penalties (GLMnet), were used to distinguish the plant species. RF was used as a fourth classifier (denoted as RFa to distinguish it from the same algorithm used for feature selection); however, instead of using the feature-selected variables as inputs, the full complement of spectral bands and indices was used. The RFa classifier uses predictor variable importance (measured as the decrease in overall classification accuracy when the variable is permuted in the out-of-bag samples) as the criterion to build the best model for discerning species.
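The permutation-based importance measure can be illustrated with a short sketch. It is not the randomForest implementation: a hypothetical threshold rule stands in for the fitted forest, and a small hand-made sample stands in for the out-of-bag data.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, feature, seed=100):
    """Accuracy lost when one feature column is shuffled across samples."""
    rng = random.Random(seed)
    column = [row[feature] for row in rows]
    rng.shuffle(column)
    permuted = [
        row[:feature] + [value] + row[feature + 1:]
        for row, value in zip(rows, column)
    ]
    return accuracy(predict, rows, labels) - accuracy(predict, permuted, labels)


# Hypothetical stand-in for a fitted model: it only looks at feature 0.
def toy_predict(row):
    return "A" if row[0] < 0.5 else "B"


rows = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.7, 0.2], [0.8, 0.7], [0.9, 0.3]]
labels = ["A", "A", "A", "B", "B", "B"]
importance_used = permutation_importance(toy_predict, rows, labels, feature=0)
importance_unused = permutation_importance(toy_predict, rows, labels, feature=1)
```

A feature that the model never uses has an importance of exactly zero, whereas shuffling a feature the model relies on can only keep or lower the accuracy.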
Spectral wavebands and spectral indices with the highest discriminating powers determined from the feature selection procedures were used to create respective training sets. For each training set, a resampling iteration was performed in which specific samples were 'held-out', and the model was fitted on the remaining samples. Predictions were subsequently extracted from the 'held-out' samples and the average performance on the held-out predictions determined. The predetermined optimal parameter set was used to generate a final model that included all of the training data. For every combination of classification method and training set, a 10-fold cross-validation with 10 repeats of the bootstrap was performed in the caret package. To ensure that the same resampling sets were used, the random seed was set to 100 prior to each model run. Estimations of model performance were conducted on the training set, while the withheld test set was used to evaluate the classifier's performance. The kappa index and overall accuracy (OAA) were calculated for the algorithms used to discern between the species (LDA, ANN, GLMnet and RFa), by using each classifier's respective confusion matrix.
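Overall accuracy, the kappa index and per-class producer accuracy all follow directly from a confusion matrix. The sketch below assumes rows hold the reference (true) classes and columns the predicted classes; the two-class matrix is hypothetical.

```python
def overall_accuracy_and_kappa(confusion):
    """confusion[i][j] = number of class-i test samples predicted as class j."""
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


def producer_accuracy(confusion):
    """Per class: correctly classified samples / reference samples of that class."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical two-class confusion matrix (rows: reference, columns: predicted).
confusion = [[20, 5],
             [10, 15]]
oaa, kappa = overall_accuracy_and_kappa(confusion)
per_class = producer_accuracy(confusion)
```

Here 35 of 50 samples are correct (overall accuracy 0.7), chance agreement is 0.5, and kappa is therefore 0.4.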

Linear Discriminant Analysis (LDA)
LDA is a parametric classifier that has been used to discern vegetation types or to identify tree species (e.g., [2,9,57,62]). LDA uses a pooled within-class covariance matrix and spectral predictor variables from training samples to build the discrimination functions for each class. Hence, the original redundant data are projected to a new orthogonal space oriented along the axis that can maximize the ratio of between-class to within-class variance among training samples [57]. The LDA function within the caret package was used to implement the LDA algorithm.
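The variance-ratio criterion mentioned above is commonly written as Fisher's criterion. The notation below is introduced for illustration only: $\mu_c$ and $n_c$ are the mean vector and sample count of class $c$, $\mu$ is the overall mean, and $w$ is a projection direction.

```latex
\begin{aligned}
S_W &= \sum_{c}\sum_{x \in c} (x - \mu_c)(x - \mu_c)^{\top}, \qquad
S_B = \sum_{c} n_c\,(\mu_c - \mu)(\mu_c - \mu)^{\top},\\
J(w) &= \frac{w^{\top} S_B\, w}{w^{\top} S_W\, w}.
\end{aligned}
```

The discriminant axes are the directions $w$ that maximize $J(w)$, i.e., the leading eigenvectors of $S_W^{-1} S_B$.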

Artificial Neural Networks (ANN)
Artificial neural networks (ANNs) are non-parametric statistical data modeling tools that have been used successfully to discriminate remotely-sensed vegetation (e.g., [2,63]). We used a multilayered perceptron neural model with a fully-connected feed-forward, supervised learning network, trained by the back-propagation algorithm to minimize a quadratic error criterion. In a layered structure, the input to each node is the sum of the weighted outputs of the nodes in the prior layer, which are connected to the input spectral features (the wavelengths and indices selected from the feature selection procedure). For the respective spectra and indices for each feature selection method, we used an output layer containing as many neurons as classes into which the samples were differentiated.
Wavebands and indices that were identified by the feature-selection method as being most important for discriminating species were used as the input nodes for the multilayered perceptron implemented in the mlp function from the RSNNS package, Version 0.4-3 [64]. RSNNS implements an R interface to the Stuttgart Neural Network Simulator, SNNS [65]. Five hidden layers with a maximum of 100 iterations were used for learning, with randomized weights as the initialization function. The learning function was standard back-propagation, with a topological order update function.
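The feed-forward computation of a multilayered perceptron can be sketched as follows. This is not the RSNNS implementation: the logistic activation is an assumption, training by back-propagation is omitted, and the weights are hypothetical fixed values chosen only to show how inputs propagate to one output neuron per class.

```python
import math


def layer_forward(inputs, weights, biases):
    """Each node: logistic(weighted sum of previous-layer outputs + bias)."""
    outputs = []
    for node_weights, bias in zip(weights, biases):
        net = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        outputs.append(1.0 / (1.0 + math.exp(-net)))
    return outputs


def predict_class(features, layers):
    """Propagate the features through all layers; return the most active output."""
    activations = features
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return max(range(len(activations)), key=activations.__getitem__)


# Hypothetical network: 2 inputs -> 2 hidden nodes -> 2 output neurons (classes).
layers = [
    ([[4.0, 0.0], [0.0, 4.0]], [-2.0, -2.0]),  # hidden layer weights and biases
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),    # output layer weights and biases
]
class_a = predict_class([1.0, 0.0], layers)
class_b = predict_class([0.0, 1.0], layers)
```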

Generalized Linear Models with an Elastic Net Regularization (GLMnet)
Generalized linear models (GLMs) are mathematical extensions of linear models that do not force data into unnatural scales and thereby allow for non-linearity and non-constant variance structures in data [66]. However, this popular machine learning technique does not penalize the size of estimated coefficients, which limits its performance. By introducing a penalty term, the elastic net [67] penalizes the size of estimated coefficients by using a combination of two regularization techniques [68], the ℓ2 regularization (used in ridge regression) and the ℓ1 regularization (used in lasso) [69]. The penalty term, $P_\alpha(\beta)$, is defined as:

$$P_\alpha(\beta) = (1-\alpha)\,\tfrac{1}{2}\,\lVert\beta\rVert_{\ell_2}^{2} + \alpha\,\lVert\beta\rVert_{\ell_1},$$

where $P_\alpha$ is a compromise between the ridge-regression penalty (α = 0) and the lasso penalty (α = 1). In applying the ℓ1 penalty, lasso attempts to achieve a parsimonious solution. This idea has been broadly applied, for example to generalized linear models [69]. Lasso attempts to ensure that most of the variable coefficients will be shrunk to 0, so the least significant variables are removed from the model. Contrastingly, the ridge penalty shrinks all coefficients, but not to 0. For comparisons with the popular LDA and ANN classifiers used for hyperspectral species discrimination, we used an extension of Friedman et al. [68] to classify the hyperspectral data in our training data set (both spectral wavebands and indices). A GLM was fitted with an elastic-net regularization via the R package glmnet (Version 1.9-5) [68]. Alpha (α) values of 0, 0.5 and 1 and lambda (λ) values ranging from 0 to 0.05 with increments of 0.01 were used as the tuning grid to select the optimal model. The response type was chosen as Gaussian, and a Newton logistic type was used.
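The elastic net penalty and the tuning grid can be made concrete with a short sketch. It assumes the standard glmnet form of the penalty, $(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1$; the coefficient vector is hypothetical and no model is fitted here.

```python
from itertools import product


def elastic_net_penalty(beta, alpha):
    """(1 - alpha) * 0.5 * sum(beta^2) + alpha * sum(|beta|)."""
    l2_squared = sum(b * b for b in beta)
    l1 = sum(abs(b) for b in beta)
    return (1 - alpha) * 0.5 * l2_squared + alpha * l1


# Tuning grid described in the text: alpha in {0, 0.5, 1}, lambda from 0 to 0.05.
alphas = [0.0, 0.5, 1.0]
lambdas = [round(0.01 * step, 2) for step in range(6)]
tuning_grid = list(product(alphas, lambdas))  # 3 x 6 = 18 candidate settings

# Hypothetical coefficient vector, for illustration only.
beta = [3.0, -4.0]
ridge_only = elastic_net_penalty(beta, alpha=0.0)  # 0.5 * (9 + 16)
lasso_only = elastic_net_penalty(beta, alpha=1.0)  # 3 + 4
mixed = elastic_net_penalty(beta, alpha=0.5)
```

During fitting, the penalty is multiplied by λ and added to the loss, so each (α, λ) pair in the grid defines one candidate model.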

Feature Selection
A support vector machine using recursive feature elimination, a 10-fold cross-validation and 10 repeats of the bootstrap selected the 'optimal' number of variables to be taken from the different feature selection methods. Ultimately, a set of 15 spectral indices produced the lowest root mean squared error, RMSE (5.81 ± 0.119 standard deviation, SD), while a group of 20 spectral wavelengths produced the lowest RMSE of 6.02 ± 0.792 SD.

ANOVA
A subset of the most frequent spectral indices was selected from all of the spectral metrics using a one-way ANOVA. Figure 3 illustrates the frequency distribution of the ANOVA for all of these indices between every paired species, arranged in descending order of frequency. Because the number of features had been fixed in advance, the first 15 indices (H.1D to LIC.1 from left to right) were retained. Five vegetation indices (VI), namely the Moisture Stress Index (MSI), Cellulose Absorption Index (CAI), NDWI, Disease Water Stress Index (DWSI) and NDVI, were within that subset. The spectral variables H.1D, F.1D, Simple Ratio (SR), R550 and C.1D formed a sub-group of variables that were related to leaf pigment status, notably the chlorophyll content of leaves among the different species. Figure 4a summarizes the ANOVA results for all possible species pairs, showing the number of species pairs that were statistically different per waveband. Of the 401 wavebands assessed, 296 were able to discriminate over 800 pairs of species and 392 wavebands showed a discrimination frequency of over 700. The highest frequency was observed in the shortwave infrared range of the spectrum, and relatively low frequencies were noted in the visible range of the spectrum.

Mann-Whitney U-Test
A Mann-Whitney U-test was used to test the null hypothesis 1,035 times at Bonferroni-adjusted significance levels of α = 0.05 and α = 0.01 for the spectral variables used in this study. The lowest performing variables were the spectral positions (A-WP to J-WP), corresponding to the maximum first derivative spectra (A-1D to J-1D) (Figure 5). Of the first 15 spectral variables (H.1D to SR from left to right) that were selected, 11 (73.3%) were also among the ANOVA-based selected features; these included H.1D, CAI, RATIO1200, WI1180, MSI, F.1D, NDWI, DWSI, R550, RATIO975 and SR.
The results of all possible species combinations and the frequency of species pairs with a statistically significant difference per waveband are shown in Figure 4b, with the mean normalized reflectance of Terminalia latifolia (TELA) plotted as a reference for the position of the main features of a typical leaf reflectance curve. At the 99% significance level (Bonferroni adjusted), the wavebands at 1,385, 1,390 and 1,480 nm scored the highest frequencies overall, with the waveband at 1,385 nm recording the highest frequency (900). In contrast, the lowest frequencies (683, 693 and 730) were obtained for the wavebands at 400, 405 and 410 nm, respectively. On average, the U-test was able to discriminate 12 additional pairs of species combinations at α = 0.05, compared to the α = 0.01 significance level. Relatively low frequencies were noted for the infrared plateau compared with the relatively higher frequencies occurring in the shortwave infrared (SWIR) regions.

Figure 5.
Frequency distribution of the Mann-Whitney U-test for the 47 spectral variables for paired species across the 46 species studied. Each bar represents the number of paired species for which a spectral variable difference is significant.

Principal Component Analysis, PCA
The first five PCs explained at least 95% of the variation for all samples within the training spectra data set, and the first eight PCs explained at least 99%. However, for some species, the target variabilities (95% and 99%) were reached with fewer PCs. For example, the first five PCs were able to explain at least 99% of the variability for Cecropia peltata (CEPE) and Trichocentrum luridum (TRLU).
For reflectance values, the first PC only contained wavebands from the SWIR and near-infrared (NIR) (700 to 1,100 nm) regions. Wavebands from the SWIR region dominated the first PC with a 68% frequency occurrence, and the near-infrared (NIR) (700 to 1,100 nm) accounted for the remaining 32% frequency of occurrence (14 out of 44 species). Conversely, when all of the PCs that accounted for at least 99% of the variation were pooled, bands from the visible region had a high frequency of occurrence. A summary of these pooled results for the PCA for band selection revealed that three main clusters were distinguishable; one in the visible (400 to 560 nm), another in the red edge slope (680 to 750 nm) and the last in the SWIR (1,340 to 1,545 nm).
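
The rule used above (retain the leading PCs until a target share of the variance is explained) can be sketched as follows. This is a minimal illustration, not the authors' procedure; the function name `n_components_for` and the component variances are hypothetical values chosen so that five and eight components reach the 95% and 99% targets.

```python
def n_components_for(variances, target):
    """Smallest number of leading components whose cumulative variance share reaches target."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return k
    return len(ordered)


# Hypothetical component variances (eigenvalues), not taken from the study.
component_variances = [60.0, 20.0, 10.0, 4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]

k95 = n_components_for(component_variances, 0.95)
k99 = n_components_for(component_variances, 0.99)
print(k95, k99)
```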

Species Separation
Overall, spectra data produced higher accuracies compared to spectral indices, irrespective of the algorithm used (Table 3). Generally, GLMnet produced higher accuracies on the spectral indices when compared with the spectral waveband data. With feature-selected reflectance wavebands, the GLMnet, ANN and LDA classifiers produced overall accuracies that ranged from 52-77%, 74-87% and 83-87%, respectively (Table 3). A one-way ANOVA test confirmed significant differences between the performance of the three classifiers (F(2,12) = 9.68; p = 0.003) for spectral reflectance using feature-selected wavebands. Post hoc comparisons using the Tukey HSD test indicated that the GLMnet (M = 65.79, SD = 11.2) produced significantly lower accuracies at the 95% CI when compared with LDA (M = 85.19, SD = 2.07) and ANN (M = 81.67, SD = 5.9). One-way ANOVA also confirmed significant differences between the three classifiers when feature-selected spectral variables were used.

The RFa classifier was able to discern species with 91.8% accuracy when 24 spectral indices were used as input variables (Table 3). Without prior feature selection, the RFa chose 201 wavebands in its optimal model and achieved an overall accuracy of 84.8% in discerning the 46 plant species. However, in some instances with feature selection, the more parsimonious models yielded higher accuracies (up to 2.5% higher) compared with the RFa. The LDA classifier, using features selected by RF, also produced relatively high accuracies and kappa statistics for both spectra and spectral variables. The ANOVA and U-test were the only two feature selection methods that selected bands exclusively from the mid-infrared part of the spectrum (1,300-2,500 nm).
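
The degrees of freedom reported for the one-way ANOVA (2 and 12) are consistent with three classifiers each evaluated on five feature-selected data sets. The sketch below shows how such an F statistic is computed; it is an illustration under that assumption, the function name `one_way_anova_f` is hypothetical, and the accuracy values are placeholders rather than the values in Table 3.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder accuracies (%): one list per classifier, one value per
# feature selection method. These are NOT the study's results.
placeholder = [
    [55.0, 60.0, 65.0, 70.0, 75.0],
    [76.0, 79.0, 82.0, 85.0, 86.0],
    [83.0, 84.0, 85.0, 86.0, 87.0],
]
f_stat, df_b, df_w = one_way_anova_f(placeholder)
print(round(f_stat, 2), df_b, df_w)
```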

Producer's and User's Accuracies
Of the 46 species, 15 were easily separated using RF feature selection on spectra reflectance and the LDA classifier based on their producer's accuracy. A total of 12 species recorded a 100% producer accuracy using the same combination of feature selection and classification methods, but with spectra-derived variables, and an additional eight species (Figure 6) had perfect producer accuracies regardless of input features (spectral reflectance or spectral derivatives): AVGE (Avicennia germinans), EIDI (Eichhornia diversifolia), EUBR (Eugenia brownie), GYSA (Gynerium sagittatum), SAMA (Sabal maritima), SYJA (Syzygium jambos), TELA (Terminalia latifolia) and TRLU (Trichocentrum luridum). EIDI, EUBR, TELA and TRLU were among the most easily distinguished species, that is, species that recorded 100% accuracy at least 50% of the time, irrespective of the combination of classifier, feature selection and input features used. In contrast, there was greater spectral confusion among the epiphytes, PHLA (Philodendron lacerum) and FIPE (Ficus pertusa), the invasive tree, MEQU (Melaleuca quinquenervia), and the native shrub, CAGU (Casearia guianensis).

Figure 6. (Figure not reproduced; the surviving panel label is "Indices", and the axis lists the species codes from ADPA to TRLU.)

The wavebands selected using the five spectral discrimination methods (ANOVA, U-test, RF, PCA and w-SVM) were merged to determine their frequency of occurrence, and the distribution of these bands along the spectral axis formed several distinct clusters (Figure 7). In the visible spectral range (400-610 nm), one cluster comprised wavebands selected by at least three feature selection methods. In the red edge region, wavebands formed a cluster at both ends of the slope (680-775 nm). Other clusters were located in the far near-infrared (FNIR) and short-wave infrared (SWIR) segments of the spectrum, with wavebands located at 1,380, 1,385 and 1,390 nm being selected by the PCA, ANOVA and U-test feature selection procedures, respectively.
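
For readers less familiar with the accuracy measures reported here, the following sketch shows how overall accuracy, producer's accuracy, user's accuracy and the kappa statistic are conventionally derived from a confusion matrix. It assumes rows hold the reference (true) class and columns hold the predicted class; the three-class matrix and the function name `accuracy_report` are hypothetical and are not taken from the study.

```python
def accuracy_report(matrix):
    """Accuracy measures from a square confusion matrix (rows = reference, columns = predicted)."""
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n_classes)) for j in range(n_classes)]
    diagonal = [matrix[i][i] for i in range(n_classes)]

    overall = sum(diagonal) / total
    # Producer's accuracy: share of reference samples of a class labelled correctly.
    producers = [d / r for d, r in zip(diagonal, row_sums)]
    # User's accuracy: share of samples assigned to a class that truly belong to it.
    users = [d / c for d, c in zip(diagonal, col_sums)]
    # Kappa: agreement beyond what the marginal totals would give by chance.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa


# Hypothetical confusion matrix for three species.
confusion = [
    [10, 0, 0],
    [1, 8, 1],
    [0, 2, 8],
]
overall, producers, users, kappa = accuracy_report(confusion)
print(overall, producers, users, kappa)
```

In this toy matrix the first class has a perfect producer's accuracy but a user's accuracy below 100%, because one sample from another class was assigned to it.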

Discussion
The different feature selection methods used in this study demonstrated varied measures of effectiveness when used with the different classifiers. For the feature-selected spectral wavebands, LDA proved to be the most effective classifier, whereas the regularized GLM correctly discerned more plant species based on their associated spectral indices. Instead of using feature-selected inputs, RFa was able to produce the highest accuracy (92%) with a less parsimonious set of 24 spectral indices. A comparison of results from other studies that used different or similar feature selection methods and the results of this study are found in Tables 4 and 5 for spectral reflectance and spectral indices variables, respectively. The comparisons show that our results were able to identify wavelengths in the three regions of the spectra commonly used for discerning plants based on their foliar spectral reflectance (the visible, NIR and mid-NIR bands).

The results from our study suggest that classification performance is improved, at least with the ANN, when bands from different parts of the spectrum are chosen. Lower OAA and kappa values were obtained for the ANN and GLMnet classification when the ANOVA and U-test feature-selected sets were used (Table 3). The ANOVA and U-test were the only selection methods that isolated bands from a single part of the spectrum (the mid-infrared region). Our results agree with previous studies (e.g., [1,2]) that have illustrated the relative importance of using different parts of the spectrum for species discrimination, particularly the wavebands of the NIR plateau.

Performance of the Different Classifiers
In this study, the LDA generally outperformed the ANN and GLMnet classifiers, especially with feature-selected reflectance wavebands. The nonlinear ANN can usually handle both parametric and nonparametric data sets, while the LDA is theoretically limited to parametric data sets. In situ spectral measurements collected from individual species are assumed to follow a normal distribution [2]; therefore, one would expect no significant difference in species recognition accuracy between the LDA and the ANN. In fact, accuracies for the two classifiers were not significantly different for feature-selected spectral waveband data. However, the ANN classifier failed to accurately distinguish between plant species when spectra-derived indices were used. This finding is consistent with other studies that have found ANNs to produce relatively lower classification results with remote sensing data when compared with traditional methods (e.g., [73]).
Although the RFa is a well-known classifier, it failed to top the accuracies attained by the LDA algorithm, even on a less parsimonious set of input features. The LDA simultaneously uses all of the predictor variables to estimate predictor covariance, allowing it to distinguish between classes. Conversely, the RFa distinguishes classes by individually building decision spaces for each explanatory variable at each node level; therefore, there is interdependence between the nodes, and as such, the final classification is ultimately dependent on the decision spaces at higher nodes [23]. This translates to higher misclassification rates, especially for reflectance data, which can be highly variable for different samples of foliar reflectance from one plant species, leading to many different possible splits for the decision tree. However, the RFa algorithm should be robust even in the presence of spectral variability, as the classifier minimizes errors from a single decision tree by selecting random samples, generating hundreds of decision trees and using a majority vote to make the final decision. The relatively large sample size of spectra for some species compared to others (Table 1) may have increased the spectral variability in plants with more foliar spectra measured. This may have caused an overlap in the foliar spectra of some species, making it more difficult for the RFa classifier to distinguish species yielding spectra with a high degree of overlap.
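
The two ensemble steps mentioned above (drawing random samples for each tree and combining the trees by majority vote) can be sketched in a few lines. This is a simplified illustration of the general random forest idea, not the RFa implementation used in the study; the helper names and the list of per-tree votes are hypothetical, and no decision trees are actually trained here.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement, as done for each tree in a forest."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(predictions):
    """Return the class label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
spectra_ids = list(range(10))  # stand-ins for ten foliar spectra
sample = bootstrap_sample(spectra_ids, rng)

# Hypothetical votes from five trees for a single leaf spectrum.
tree_votes = ["TELA", "TELA", "MEQU", "TELA", "CAGU"]
label = majority_vote(tree_votes)
print(sample, label)
```

A single tree that splits poorly on a variable spectrum is outvoted as long as most trees agree, which is why the ensemble is expected to tolerate some spectral variability.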
Even when used for feature selection, the RF algorithm was able to produce the highest accuracy (87.6%) when combined with the GLMnet classifier, compared with the LDA and ANN. The GLMnet uses a generalized linear model (GLM) along with an elastic net regularization. The GLM allows for response variables that have error distribution models other than a normal distribution, and the elastic net regularization applies constraints to the lasso and to the ridge parameters [70]. Therefore, we would expect the distribution of the wavebands and spectral indices to have minimal effect on the performance of the GLMnet classifier. Furthermore, the lasso constraints control the selection or removal of variables in the model, while the ridge handles collinear variables. By controlling the relative weighting of these two constraints, the elastic net regularization is able to handle highly correlated data. These are desirable qualities of a classifier when dealing with hyperspectral data, thus we would expect the GLMnet classifier to give consistently high accuracies and its performance to be unaffected by data from different feature-selection methods. However, the findings of this study do not support these assertions. Although the GLMnet gave significantly higher accuracies with spectral indices when compared with spectral wavebands, PCA feature-selected data significantly lowered (in the case of spectral indices) or increased (for reflectance wavebands) the accuracies.
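
The balance between the lasso and ridge constraints described above is commonly written as a single penalty, λ[α‖β‖₁ + (1 − α)/2 · ‖β‖₂²], which is the form used in the glmnet formulation. The sketch below evaluates that penalty for a coefficient vector; the function name `elastic_net_penalty` and the coefficient values are hypothetical, and the α and λ settings used in the study are not stated in this section.

```python
def elastic_net_penalty(coefficients, lam, alpha):
    """Elastic net penalty: lam * (alpha * L1 + (1 - alpha) / 2 * squared L2)."""
    l1 = sum(abs(b) for b in coefficients)   # lasso term: drives coefficients to zero
    l2_squared = sum(b * b for b in coefficients)  # ridge term: shrinks correlated coefficients together
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_squared)


# Hypothetical coefficients for three wavebands.
beta = [1.0, -2.0, 0.0]

lasso_only = elastic_net_penalty(beta, lam=1.0, alpha=1.0)
ridge_only = elastic_net_penalty(beta, lam=1.0, alpha=0.0)
mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.5)
print(lasso_only, ridge_only, mixed)
```

Setting α = 1 recovers the lasso, α = 0 recovers ridge regression, and intermediate values mix variable selection with the handling of collinear wavebands.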

Reflectance Spectra vs. Spectral Indices
In this study, the spectral metrics out-performed the spectral reflectance data sets only when the GLMnet classifier was applied; the opposite was true for the LDA and ANN classifiers. When data from all of the feature-selected spectra wavebands were pooled, they were found to aggregate in several clusters along the wavelength axis (Figure 7). Therefore, irrespective of the feature-selection method used, different spectral regions were able to differentiate the tested species. Furthermore, in most cases, there was no overlap among the bands selected using the different feature selection methods; instead, neighboring bands had comparable discriminating power when selected by other methods. This is consistent with the results of other studies that used foliar reflectance properties for species discrimination.

Indices derived from spectral reflectance and reflectance wavebands were shown in this study to produce different accuracies when used by the same classifier. Spectral differences among species can be affected by illumination, and these differences are better captured with reflectance spectra [22]. However, it can be assumed that illumination did not significantly affect our results, because we used an artificial light source and excluded ambient light. Moreover, spectral data in this study were collected from an illuminated source under controlled conditions, thus eliminating the need to apply filtering or corrections to the raw data. Indices normally minimize brightness variation from band ratios and derivative analyses [22], but shading effects were negligible in this study. Accordingly, chemical absorptions and leaf structure were responsible for species separations.

Inter-and Intra-Plant Spectral Variability
Spectral variability among individual species can be attributed to differences in internal leaf structure and to leaf biochemical composition [9], most notably water and chlorophyll content [20], epiphyll cover and leaf morphology [74]. However, in this study, the poor performance of the spectral indices may be indicative of pigment absorption in the visible region and water, cellulose, starch and lignin spectral absorption in the near-infrared (NIR) and short wave infrared (SWIR) regions of the spectrum (spectral positions A-WP to J-WP, corresponding to the maximum first derivative spectra A-1D to J-1D). The spectral signatures of plant species are also affected by factors that include, but are not limited to, age, vitality and physiological characteristics [75]. Inter-species leaf variability can also be attributed to the measurement of bidirectional reflectance, instead of hemispherical reflectance [9]. In this study, most of the spectra metrics selected by the feature selection methods were directly related to leaf chemistry. Therefore, spectral responses due to the concentration of leaf pigments and other bio-chemicals may produce greater spectral variation than differences due solely to leaf morphology.

Implications for Natural Resource Management
We were able to achieve vegetation species discrimination from leaf spectral reflectance using data obtained from controlled illumination at the leaf level. This was an initial step towards the ultimate goal of discriminating and mapping wetland vegetation species and communities using hyperspectral sensors based on an airborne platform at the landscape level. Clark et al. [9], Kalacska et al. [72], Cho et al. [71] and, more recently, Clark and Roberts [22] successfully demonstrated that leaf-level methods can be scaled up to the canopy level to facilitate spectral discrimination of plant/tree species from different types of tropical forest (rain forest, dry forest and mangroves) at the landscape level.
However, discerning species from heterogeneous habitats in wetlands has been accomplished with varying levels of success. Schmidt and Skidmore [1] used hyperspectral remote sensing to map 27 salt-marsh grass and herbaceous plant species by assessing the canopy-level reflectance spectra of several vegetation associations occurring in a Dutch salt-marsh. They were able to demonstrate that separability can be achieved for most plant species from the marshland. However, this success has not been widely replicated, especially for discriminating among tropical wetland species. This can be attributed to the high spectral and spatial variabilities associated with herbaceous wetland vegetation and associated steep environmental gradients, which produce short ecotones and sharp demarcations between vegetation units [10,76].
Furthermore, the reflectance spectra of wetland vegetation canopies are often very similar and can be confused with the reflectance spectra of the underlying soil, hydrologic regime and atmospheric vapor [77]. As a result, for wetlands such as the BRLM, the spatial or spectral optical classification normally employed in remote sensing may result in low classification accuracies. Moreover, the ability of hyperspectral data to effectively distinguish individual species within flooded wetland environments is reduced, because the performance of near to mid-infrared bands is attenuated by the presence of underlying water and wet soil [76,78]. However, several authors (e.g., [1,7,78]) have used the narrow spectral channels offered by hyperspectral data to detect and map the spatial heterogeneity of wetland vegetation.
Hyperspectral sensors have been used for the early detection [79], mapping and monitoring [80] of the introduction and spread of invasive plant species in wetland environments. Wetlands are highly susceptible to plant invasions, which threaten the biodiversity and ecological integrity of such systems [81]. Fourteen of the 46 species used in this study are non-native, and at least two, MEQU and ALAL, are highly invasive to the BRLM. Management of this RAMSAR site would entail landscape mapping to monitor the introduction, presence and spread of such invasive species and to identify the location of single nuclear trees for eradication exercises.

Limitations
Despite attaining spectral separation of the vegetation tested in this study, there were several limitations to our analyses. First, we did not conduct leaf tissue chemical assays. Therefore, we were unable to relate leaf chemical properties to spectral indices. Ideally, chemical and hyperspectral data should be collected at the same time so that the chemical constituents of the leaves can be correlated to the hyperspectral data. This would minimize pseudo-replication from sites within the wetland that show micro-variations (e.g., vegetation in brackish and saline vs. freshwater locations, limestone vs. peat substrate). Second, sampling was conducted during one month of the rainy season (October). To account for possible changes in physico-chemical parameters, for example water stress indicators, the collection of spectra, ideally on the same individuals sampled, should also have been conducted during a representative month in the dry season (March or April).
Furthermore, several studies have used phenological changes in invasive plants to better discriminate them from non-introduced or native plants (e.g., [80,82,83]). However, the phenologies of the species used in this study were not considered during the sampling and collection of spectra. Clark and Roberts [22] used seven tropical, canopy-emergent species to demonstrate the effectiveness of using spectral metrics derived from leaves, bark and a combination of leaves and bark during phenological changes in the plants, to discriminate between species. In this investigation, we only assessed the foliar reflectance of the vegetation, and ignored branch and bark spectral reflectance.
Although we found feature selection to be effective and RFa efficient (prior feature selection not required) in distinguishing species based on their foliar reflectance, it should be noted that accuracies might decay markedly at coarser spatial scales [9]. In this study, the RFa's accuracy of approximately 92% might be satisfactory for in situ species differentiation. However, for remotely-sensed images, atmospheric effects should be corrected or compressed prior to conducting species recognition analyses, especially in wetland environments [9]. Wetland environments require special analytical techniques, because saturation and atmospheric vapor affect the near-infrared region [84]. Atmospheric correction should enhance the spectral separability between species with hyperspectral remote sensing data, but water absorption bands in the mid-infrared region should be considered. Mapping the wetland landscape requires remotely-acquired hyperspectral data collected by sensors placed on airborne or satellite-based platforms. Although we have demonstrated the applicability of using spectral reflectance and reflectance indices with feature selection, and the efficiency of the RFa algorithm, our analyses were limited to leaf-level, in situ conditions. If remotely-sensed hyperspectral data collected by sensors placed on airborne or satellite-based platforms are to be used to map and monitor vegetation changes in highly fragile and heterogeneous tropical wetland ecosystems such as the BRLM, the next step is to determine whether these leaf-level methods of spectral discrimination can be scaled up to the canopy and the landscape levels, and whether these methods can be successfully replicated in different tropical wetlands. 
However, to address the effects of atmospheric conditions and structural and vegetative complexities on remotely-acquired reflectance spectra, additional spectral indices may need to be explored or the effective wavebands used for distinguishing canopies of differing species composition may need to be revised.

Conclusions
We presented an application of leaf-level hyperspectral data for species discrimination using five feature selection methods (ANOVA, U-tests, PCA, RF and wSVM) and four classifiers (LDA, ANN, GLMnet and RFa) to discriminate among 46 plant species under wetland conditions. Both spectral reflectance and spectral indices were used, and feature selection proved helpful in obtaining parsimonious models that were able to discern between the leaves of different species with accuracies of approximately 88% for waveband and spectral index variables. The highest accuracy (92%), however, was achieved using the RFa classifier with spectral indices, at the expense of a less parsimonious model. One can also question the efficiency of the feature selection step, since this step itself might be computationally expensive compared to including more features. Nevertheless, the spectral discrimination of invasive plants, grasses, a floating macrophyte, endemic shrubs and trees, as well as both native and non-native climbers and epiphytes, was achieved at the individual foliar level. This result was obtained under controlled data-collection conditions. It is therefore the first step towards the ultimate goal of using hyperspectral remote sensing to discriminate and map the canopies of different vegetation types in a wetland environment, using the BRLM as a focal ecosystem. This will be used to support current initiatives aimed at managing and monitoring invasive flora and monitoring fragile/threatened habitats, such as the remnant fragments of swamp forests in the BRLM.
(UNEP), the National Environmental Planning Agency of Jamaica (NEPA), Commonwealth Agricultural Bureaux International (CABI), the Government of Jamaica "Mitigating the threats of alien invasive species in the insular Caribbean" project and the MacArthur Foundation.

Author Contributions
K. Prospere carried out field work, data analysis and is the main author of all sections of the manuscript. K. McLaren secured supporting funding, supervised research and helped with formulating the methodology. B. Wilson provided editorial advice, is the co-principal investigator and helped with formulating the methodology.