Evaluation of Continuous VNIR-SWIR Spectra versus Narrowband Hyperspectral Indices to Discriminate the Invasive Acacia longifolia within a Mediterranean Dune Ecosystem

Hyperspectral remote sensing is an effective tool to discriminate plant species, providing vast potential to trace plant invasions for ecological assessments. However, necessary baseline information for the use of remote sensing data is missing for many high-impact invaders. Furthermore, the identification of suitable classification algorithms and spectral regions for successfully classifying species remains an open field of research. Here, we tested the separability of the invasive tree Acacia longifolia from adjacent exotic and native vegetation in a Natura 2000 protected Mediterranean dune ecosystem. We used continuous visible, near-infrared and shortwave infrared (VNIR-SWIR) data as well as vegetation indices at the leaf and canopy level for classification, comparing five different classification algorithms. We were able to successfully distinguish A. longifolia from surrounding vegetation based on vegetation indices. At the leaf level, radial-basis function kernel Support Vector Machine (SVM) and Random Forest (RF) achieved both a high Sensitivity (SVM: 0.83, RF: 0.78) and a high Positive Predictive Value (PPV) (0.86, 0.83). At the canopy level, RF was the classifier with an optimal balance of Sensitivity (0.75) and PPV (0.75). The most relevant vegetation indices were linked to the biochemical parameters chlorophyll, water, nitrogen, and cellulose as well as vegetation cover, which is in line with biochemical and ecophysiological properties reported for A. longifolia. Our results highlight the potential to use remote sensing as a tool for early detection of A. longifolia in Mediterranean coastal ecosystems.


Introduction
Invasive plant species are a major threat to many ecosystems worldwide [1], causing high ecological impacts and high economic costs [2]. Thus, early detection and monitoring of the distribution and abundance of invasive species are desirable as a basis for impact assessment, prioritization of the most harmful species, and targeting of invasive populations. As on-site data collection is difficult over large areas, remote sensing provides a promising tool to detect and monitor invasive species at landscape scales [3][4][5]. To this end, it would be beneficial to identify band regions that separate invasive and native species by means of field spectra, and to assess their importance at different scales. However, few studies have compared the separability of plant species based on hyperspectral data at both leaf and canopy level [6][7][8][9][10][11], while studies on invasive plants have exclusively used either leaf [12,13] or canopy [14,15] field spectral data.
Further, there are several approaches to classify species using either the continuous spectrum from the visible (VIS) to the shortwave infrared (SWIR) [10,16,17], particular spectral curve shape-based features [8,18], or vegetation indices [8,19,20]. Linking spectral with biochemical and physiological ancillary data may increase classification accuracy. At the same time, it allows the spectral separability of species to be related to their characteristic ecophysiological traits [19,21]. In this regard, vegetation indices and spectral features can be considered semi-quantitative biochemical parameters of the reflectance spectrum [22] that may help to assess the physiological status of the vegetation [20]. In a recent study on the separability of native Mediterranean dune species, relevant absorption features were related to pigments, water, lignin and cellulose [23]. However, the wavelengths or bands that are important for distinguishing exotic from native species in a Mediterranean dune ecosystem still have to be identified.
Important predictors can be extracted by using wrapper functions such as recursive feature elimination and by choosing classifiers with embedded feature selection methods [16,19,24,25]. Powerful classifiers for high-dimensional data include Partial Least Squares Discriminant Analysis (PLSDA) [26] and sparse PLSDA [16], Support Vector Machines (SVM) [16,27], Linear Discriminant Analysis (LDA) [7,19] and Random Forest (RF) [8,19]. The relatively new High-Dimensional Discriminant Analysis (HDDA) has been tested on simulated and real high-dimensional datasets [28], but not on vegetation data compared with common algorithms such as SVM and RF. However, previous studies show no agreement on the best classifier, the best data reduction technique, and the most important bands for species discrimination. Therefore, in the present study, we compared results from different types of classification algorithms (LDA, HDDA, sPLSDA, RF, SVM) and different methods for reducing data dimensionality.
A comprehensive spectral library for a diverse Mediterranean dune ecosystem has been published recently [23], though it does not include exotic invasive species. Some highly invasive shrub and tree species worldwide belong to the genus Acacia [29]. In European Mediterranean dune ecosystems, Acacia longifolia (Andrews) Willd. has been described as one of the most problematic invaders. Its occurrence has been reported for most of the western Atlantic coast of the Iberian Peninsula [30][31][32][33][34][35][36] and in coastal ecosystems worldwide [37][38][39][40]. Several studies have assessed its invasion strategy and impact in Mediterranean dune ecosystems in Portugal. It is considered a highly competitive plant as it has a high growth rate [41,42] and efficient nutrient acquisition [41,42], and it benefits from fire events [43]. The impacts on ecosystem functioning include alterations of the nitrogen cycle [30,44], water balance [45], carbon assimilation [45], vegetation and plant community structure [33], plant diversity [33], litter density [31], soil N content and C/N ratio [31], and the seed bank [46]. In spite of the numerous studies on its strategy, its impacts, as well as restoration and control, there is a lack of assessments of its invasion pattern at landscape level, i.e., by means of remote sensing. Furthermore, to our knowledge, assessing leaf and canopy spectral properties of invasive Acacia spp. as baseline information for detection at an early invasion state has not yet been achieved.
In this study, we tested the separability of A. longifolia from other exotic and native shrubs and trees in a Mediterranean dune ecosystem. In particular, we investigated: (i) if narrowband hyperspectral vegetation indices perform better than the full spectrum in distinguishing A. longifolia at leaf and canopy level from all other species; (ii) which type of classification algorithm provides the best classifier for A. longifolia; and (iii) which are the most important biophysical, biochemical, and ecophysiological parameters to distinguish A. longifolia from adjacent vegetation.

Study Area
The study was conducted at 7 sites along a coastal strip of 1 km × 35 km in the Natura 2000 site "Comporta/Galé" [47] in Southwest Portugal (see Figure 1). The study area is characterized by a mosaic of several natural and semi-natural shrub-dominated dune habitats comprising "Atlantic decalcified fixed dunes" (2150*), "coastal dunes with Juniperus spp." (2250*), "dune sclerophyllous scrubs" (2260) as well as "wooded dunes with Pinus pinea and/or Pinus pinaster" (2270*), including priority habitat types based on the Natura 2000 directive (indicated with "*") [48]. For a further description of the Natura 2000 site and the nature reserve in particular see [47,49], respectively. One of the management objectives for the aforementioned habitat types is the early detection and control of high-impact invasive species of the genus Acacia [47] such as A. longifolia [30,33,44,45]. The population structure of A. longifolia in the study area includes isolated adult trees without juvenile plants, locally regenerating populations, populations regenerating after disturbance and stable, impenetrable thickets. However, to date, comprehensive abundance and distribution data are not available.

Collection of Field Spectra
Leaf and canopy spectra were collected between 5 and 9 April 2011 and between 27 April and 12 May 2014. In 2011, we sampled spectra with an ASD FieldSpec 3 (see [50,51] for specifications). In 2014, we collected spectral data with an ASD FieldSpec 4 Hi-Res spectroradiometer (ASD Inc., Boulder, CO, USA) [52]. Spectral information was recorded from fresh samples in the field to avoid changes in spectral properties during storage and transportation [53]. Leaf spectra were recorded using a plant probe (ASD Inc., Boulder, CO, USA). Several leaves were used in case the area of a single leaf was too small to cover the area of the contact probe. The plant probe's white reference disk was used as in [50]. In 2014, we additionally used black, foamed rubber as dark background (see [26]). The canopy spectra were taken with a field of view of 25° [52]. Canopy spectra were sampled during days without cloud cover and with homogeneous light conditions within a period of ca. 2 h before and after solar noon. The spectra were referenced against a calibrated Zenith Lite™ Diffuse Reflectance Target-95% R (SphereOptics®, Herrsching, Germany). Spectrum averaging was set to 25 for plant targets and 50 for the white reference. Spectral collection rate was ten·s⁻¹ for the ASD FieldSpec 3 and five·s⁻¹ for the FieldSpec 4. Although two different spectrometers were used, we do not expect a significant difference of measured target reflectance as the effect has been shown to be minimal for calibrated devices when the same protocol is applied [54]. Here, both first-hand spectrometers were recently calibrated, and the sampling protocol as well as the operators and the white reference material were identical.
The measured target species were the most abundant shrub and tree species of the habitat types, such as Corema album (L.) D. Don, Juniperus phoenicea subsp. turbinata (Guss.) Nyman and Pistacia lentiscus L., as well as exotic invasive species such as A. longifolia and Acacia saligna (Labill.) H.L. Wendl. Corema album is possibly the most frequent small shrub in the open dunes. Pistacia lentiscus is also very frequent, and it is the most similar species to A. longifolia regarding growth form, leaf type and habitat. Juniperus phoenicea has a different leaf type compared to A. longifolia, but growth form and habitat are similar. We also sampled chamaephytes including the invasive Carpobrotus edulis (L.) N.E.Br. and those of high conservation value (e.g., Thymus sp.). Apart from different growth forms, leaf types and strategies, species such as C. album and A. longifolia show a high degree of plasticity in response to microhabitat differences in environmental stress, such as fine-scale differences in water accessibility in this heterogeneous system [55], which can affect the plant species' spectral response [56,57]. Altogether, we used 607 leaf spectra (74 A. longifolia and 524 other plant species) and 293 canopy spectra (18 A. longifolia and 275 other plant species). As the main objective of this study was to discriminate the target species A. longifolia from any other species or combinations thereof, the samples were separated into two groups, "Acacia longifolia" and "other plant species", to be classified in a "one-versus-all" (OVA) approach. Although some studies showed constraints using a binary classifier for a potentially multiclass problem (e.g., [58]), the OVA approach has been applied successfully in various studies [59] including classification of remote sensing data using SVM [60,61], and for invasive plant species detection [62]. A full species list including growth form and the number of leaf and canopy samples can be found in Table 1.

Pre-Processing of Field Spectra
The spectra were corrected for the spectral discontinuities between the three sensors of the spectroradiometer using an additive correction as in [64] with the SWIR1 sensor as reference. The data were smoothed with the Savitzky-Golay filter using a window size of fifteen and a second order polynomial using the "hsdar" package version 0.3.1 [65] of R statistical software [66]. For the canopy spectra analysis, noisy bands with water absorption features (1350-1460 nm and 1790-1960 nm) were removed.
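The smoothing and band-removal steps can be illustrated with a minimal sketch. It is written in plain Python rather than with the "hsdar" package used in the study, so the function names (`savgol_weights`, `smooth`, `drop_water_bands`) are hypothetical. Two further assumptions are made: the first and last seven bands are left unsmoothed, and the limits of the water absorption regions are treated as inclusive. The splice correction between sensors is not shown.

```python
# Water absorption regions removed from the canopy spectra (nm).
WATER_BANDS = [(1350, 1460), (1790, 1960)]


def savgol_weights(half_width):
    """Closed-form Savitzky-Golay smoothing weights for a quadratic fit."""
    m = half_width
    denom = (2 * m - 1) * (2 * m + 1) * (2 * m + 3)
    return [(3 * (3 * m * m + 3 * m - 1) - 15 * i * i) / denom
            for i in range(-m, m + 1)]


def smooth(reflectance, window=15):
    """Smooth a spectrum with a second order Savitzky-Golay filter.

    Assumption: bands closer than window // 2 to either end are returned
    unchanged instead of being extrapolated.
    """
    m = window // 2
    weights = savgol_weights(m)
    out = list(reflectance)
    for k in range(m, len(reflectance) - m):
        out[k] = sum(w * reflectance[k + i - m] for i, w in enumerate(weights))
    return out


def drop_water_bands(wavelengths, reflectance):
    """Remove bands that fall inside the water absorption regions."""
    kept = [(wl, r) for wl, r in zip(wavelengths, reflectance)
            if not any(lo <= wl <= hi for lo, hi in WATER_BANDS)]
    return [wl for wl, _ in kept], [r for _, r in kept]
```

A second order filter reproduces any locally quadratic signal exactly, so the filter mainly suppresses band-to-band noise while leaving broad absorption features in place.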

Calculation of Vegetation Indices and Red Edge Parameters
Hyperspectral narrowband vegetation indices and red edge parameters were calculated using the "hsdar" package, version 0.3.1 [65]. Variables with missing or infinite values or near-zero variance were removed and indices for soil parameters (SWIR.FI, SWIR.SI, and SWIR.LI) were excluded. Pairwise correlations were calculated for the vegetation indices using Spearman rank correlation, and highly correlated predictors with a correlation coefficient higher than 0.6 were removed from the dataset, thus setting a slightly more conservative threshold than recommended [67].
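The correlation filter can be sketched as follows. This is a simplified greedy variant and not necessarily the procedure applied in the study: an index is kept only if its absolute Spearman correlation with every index kept so far does not exceed the threshold. The function names and the toy index names are hypothetical, and constant predictors are assumed to have been removed beforehand, as described above.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(indices, threshold=0.6):
    """Greedy filter over a dict {index name: list of values per spectrum}."""
    kept = []
    for name in indices:
        if all(abs(spearman(indices[name], indices[k])) <= threshold
               for k in kept):
            kept.append(name)
    return kept


# Toy values for three hypothetical indices measured on six spectra.
toy = {
    "index_a": [1, 2, 3, 4, 5, 6],
    "index_b": [2, 4, 5, 9, 10, 20],   # monotonic in index_a, so it is dropped
    "index_c": [3, 1, 6, 2, 5, 4],
}
print(drop_correlated(toy))
```

Because the greedy filter depends on the order of the predictors, a production workflow would more likely rank predictors by their mean absolute correlation before removing them.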

Classification
Two approaches were compared to classify the spectral data: (1) the full spectrum together with classifiers specifically developed for high-dimensional data analysis; and (2) a reduced dataset consisting of vegetation indices. The number of vegetation indices was further reduced using recursive feature elimination (RFE) based on the receiver operating characteristic (ROC) curve and considering variable importance [68].
To analyze the full spectrum, we chose sparse Partial Least Squares Discriminant Analysis (sPLSDA) [69] and High-Dimensional Discriminant Analysis (HDDA) [28]. HDDA is a relatively new classifier [28,70] designed for high-dimensional datasets. The vegetation index datasets were classified using sPLSDA and HDDA as well as Linear Discriminant Analysis (LDA) [71], Random Forest (RF) [72], and Support Vector Machine (SVM) [73]. LDA was chosen as it is a simple, fast and efficient classifier [9,74]. RF and SVM were included because they have been proven useful in classifying various kinds of datasets [75] and are among the most commonly used classifiers designed for hyperspectral data analysis [76].
We optimized the model parameters, i.e., the number of components, eta and kappa in sPLSDA, "mtry" in RF as well as sigma and the cost parameter for SVM with a radial basis function kernel. The predictors were centered and scaled to account for the different ranges of the vegetation indices and the red-edge parameters. During model training and recursive feature elimination, the models were validated using stratified tenfold cross validation with five repeats. We used the receiver operating characteristic (ROC) curve to maximize model performance and balance Sensitivity and Specificity. The specific packages for each classifier were as follows: sPLSDA ("spls", version 2.2-1) [77], HDDA ("HDClassif", version 1.3) [70], LDA ("MASS", version 7.3-44) [71], RF ("randomForest", version 4.6-12) [78], and SVM ("kernlab", version 0.9-22) [79].
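The resampling scheme used during training, stratified tenfold cross validation with five repeats, can be sketched as below. This is an illustration in plain Python, not the "caret" implementation; the function name, the seed and the round-robin assignment of samples to folds are assumptions.

```python
import random


def repeated_stratified_folds(labels, k=10, repeats=5, seed=1):
    """Yield (train_indices, test_indices) for repeated stratified k-fold CV.

    Within each repeat, the samples of every class are shuffled and dealt
    round-robin into k folds, so each fold keeps roughly the class ratio.
    """
    rng = random.Random(seed)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in sorted(set(labels)):
            members = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(members)
            for pos, i in enumerate(members):
                folds[pos % k].append(i)
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(labels)) if i not in held]
            yield train, sorted(held_out)


# Toy label vector: 20 target and 80 non-target samples.
labels = ["acacia"] * 20 + ["other"] * 80
splits = list(repeated_stratified_folds(labels))
print(len(splits))  # 10 folds x 5 repeats
```

Centering and scaling would then be estimated on each training part only and applied to the corresponding held-out fold, so that no information leaks from the validation samples.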
Each dataset was split into a training set (75%) for model fitting and a test set (25%) for model evaluation, preserving class distribution. "A. longifolia" was up-sampled during training (Table 2) to avoid a high impact of the majority class on the classifier, which could lead to poor identification of the minority class [68]. The test set was not up-sampled in order to produce a reliable evaluation of the training model.
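A minimal sketch of this split-then-up-sample logic is given below, using the canopy class sizes stated earlier (18 and 275 spectra). The function names, the rounding rule and the seed are assumptions, and the resulting counts are not meant to reproduce Table 2.

```python
import random


def stratified_split(labels, train_fraction=0.75, seed=1):
    """Split indices into training and test sets separately for each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        cut = int(train_fraction * len(members) + 0.5)  # round half up
        train += members[:cut]
        test += members[cut:]
    return train, test


def upsample_minority(train, labels, seed=1):
    """Resample smaller classes with replacement up to the largest class.

    Only training indices are duplicated; the test set is never touched.
    """
    rng = random.Random(seed)
    by_class = {}
    for i in train:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        balanced += members + extra
    return balanced


labels = ["acacia"] * 18 + ["other"] * 275
train, test = stratified_split(labels)
balanced_train = upsample_minority(train, labels)
```

Up-sampling after the split matters: if duplicates were created first, copies of the same spectrum could end up in both the training and the test set and inflate the accuracy estimate.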
Model performance can be negatively influenced by both between-class and within-class imbalances. As a result, minority classes or small data subclusters of one class might be misclassified [80]. Therefore, data splitting and model fitting were iterated a hundred times to address the within-class distribution. Model performance was assessed by boxplots of Sensitivity, Specificity, ROC area under curve (AUC) (R package "pROC", version 1.8) [81] and Positive Predictive Value (PPV). Furthermore, we calculated the variable importance (VIP) and the frequency of occurrence for the predictors within each of the 100 iterations. A Mann-Whitney U-Test was performed to test for significant differences of important vegetation indices between the target species A. longifolia and the other species. All analyses were conducted using the R package "caret", version 6.0-52 [82], which supports a large variety of classification algorithms.
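The four performance measures can be written down compactly. The sketch below is illustrative only (the study used "caret" and "pROC"); the function names and the toy labels are hypothetical. AUC is computed through its rank interpretation, i.e., the share of target/non-target pairs in which the target spectrum receives the higher score, with ties counted as one half.

```python
def binary_metrics(truth, predicted, positive="Acacia longifolia"):
    """Sensitivity, Specificity and PPV for a one-versus-all classification."""
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # PPV is undefined when nothing is predicted as the target class.
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
    }


def auc(target_scores, other_scores):
    """Area under the ROC curve from class scores (rank interpretation)."""
    wins = sum((s > o) + 0.5 * (s == o)
               for s in target_scores for o in other_scores)
    return wins / (len(target_scores) * len(other_scores))


# Toy example: 4 target and 8 non-target test spectra.
truth = ["Acacia longifolia"] * 4 + ["other"] * 8
predicted = (["Acacia longifolia"] * 3 + ["other"]
             + ["Acacia longifolia"] + ["other"] * 7)
print(binary_metrics(truth, predicted))
```

In this toy case one missed target and one false alarm give a Sensitivity of 0.75, a Specificity of 0.875 and a PPV of 0.75, which shows how PPV responds to single false positives when the target class is small.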

Separating Acacia longifolia from Adjacent Mediterranean Dune Vegetation at the Leaf and Canopy Level
After removing correlated variables, sixteen vegetation indices were used for the classification at the leaf scale and twelve at the canopy scale (Table 3). Regarding the classification accuracy at leaf level, the median Area Under Curve (AUC) was between 0.91 (HDDA FullSpec) and 0.98 (RF VegInds, SVM VegInds). At the canopy scale, all classifiers apart from HDDA FullSpec reached a median AUC of 0.98 or 0.99 (Figure 2). Therefore, A. longifolia could be successfully identified at both leaf and canopy level. However, performance parameters directly related to the identification of A. longifolia, PPV (0.4-0.86) and Sensitivity (0.67-1), varied more strongly than AUC (0.84-0.99) and Specificity (0.89-0.99).

Table 3. Hyperspectral narrowband vegetation indices including related vegetation characteristics, the sensor level at which the index was applied (Leaf and Canopy) and spectral bands used for calculation. Significant differences (p < 0.05, U-Test) in median index values between Acacia longifolia and adjacent dune vegetation are marked with an "*" for leaf and canopy scale, respectively.

Comparison of Model Performance Using Full Spectrum Data and Hyperspectral Narrowband Vegetation Indices
Models based on the full spectrum reached a similar accuracy regarding AUC (leaf: 0.97/canopy: 0.99), Sensitivity (0.94/1.00), and Specificity (0.94/1.00) compared to the highest median values using vegetation index data: AUC (0.98/0.99), Sensitivity (0.94/1.00) and Specificity (0.98/0.99) (Figure 2). Nevertheless, models based on vegetation indices revealed at least one classifier that reached or outperformed those based on the full spectrum with respect to AUC, Specificity, and PPV. In particular, the median values for PPV were lower for full spectrum data (0.60/0.40) compared with the best classifiers using vegetation indices (0.86/0.75). Thus, generally, accuracy was as high or even higher for the classifiers using the reduced dataset, especially regarding PPV.

Identification of the Optimal Algorithm to Distinguish Acacia longifolia from Adjacent Vegetation
An optimal algorithm can be characterized by a high true positive rate (Sensitivity) and a high reliability (PPV). At the leaf level, sPLSDA FullSpec and sPLSDA VegInds reached the highest median (0.94) with a relatively small interquartile range (0.06 and 0.11, respectively) regarding Sensitivity (Figure 2). HDDA FullSpec also produced relatively high Sensitivity values (0.89) and a small interquartile range compared to sPLSDA FullSpec, though a low PPV (0.52) indicated a high false positive rate. SVM VegInds and RF VegInds reached a moderate Sensitivity of 0.83 and 0.78 with a high PPV value of 0.86 and 0.83, respectively. In summary, sPLSDA FullSpec and sPLSDA VegInds performed best concerning Sensitivity, but overestimated the number of identified A. longifolia individuals. SVM VegInds and RF VegInds provided alternatives in terms of a balanced relation between Sensitivity and PPV.
At the canopy level, the values for Sensitivity had a low resolution and alternated between 0.75 and 1 due to the sample size. The best results regarding Sensitivity were achieved by sPLSDA FullSpec, sPLSDA VegInds and HDDA VegInds. The latter reached higher AUC and Specificity, and therefore a higher PPV, but the PPV was relatively low compared to all other classifiers using index data. Thus, despite the high Sensitivity, the number of identified individuals of the target species, A. longifolia, was overestimated. In contrast, LDA VegInds, SVM VegInds and RF VegInds had moderate Sensitivity, but a higher PPV. RF VegInds had the highest PPV of all classifiers, though it showed some extreme outliers.
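The coarse Sensitivity resolution and the strong response of PPV to a few false positives follow from simple arithmetic. The sketch below assumes, for illustration only, that a 25% test split leaves four A. longifolia and 69 other canopy spectra per iteration; these counts and the number of false positives are hypothetical and not taken from the results.

```python
from fractions import Fraction


def possible_sensitivities(n_target):
    """All Sensitivity values reachable with n_target target test spectra."""
    return [Fraction(tp, n_target) for tp in range(n_target + 1)]


def ppv(true_pos, false_pos):
    """Positive Predictive Value from counts."""
    return true_pos / (true_pos + false_pos)


def specificity(false_pos, n_other):
    """Specificity from the number of false positives among n_other spectra."""
    return (n_other - false_pos) / n_other


N_TARGET, N_OTHER = 4, 69  # hypothetical test-set sizes (assumption)

print([str(s) for s in possible_sensitivities(N_TARGET)])
print(ppv(4, 6), round(specificity(6, N_OTHER), 2))
```

With four target spectra, Sensitivity can only move in steps of 0.25, so medians of 0.75 and 1 are the only high values available. Under the same assumption, six false positives would reduce PPV to 0.4 even though Specificity stays above 0.9, which is why PPV separates the classifiers more clearly than Specificity or AUC.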

Identification of the Most Important Variables
Figure 3 shows mean spectra of the target species, A. longifolia, and three native species as well as variable importance (VIP) of classification using the full spectrum (sPLSDA FullSpec) at leaf and canopy level, while the variable importance of vegetation indices for two selected classifiers (sPLSDA VegInds, RF VegInds) is displayed in Figure 4. The mean spectra of A. longifolia and the three selected small and tall shrub species, C. album, P. lentiscus and J. phoenicea, differed. However, the standard deviations indicated overlap among the species.
The number of variables that were important to distinguish A. longifolia from adjacent vegetation varied depending on sampling method and model (Figures 3 and 4). RF VegInds and sPLSDA VegInds used 4-12 and 2-12 variables, respectively. The VIP and variable frequencies of the predictors differed between sPLSDA VegInds and RF VegInds (Figure 4). For example, in sPLSDA VegInds at canopy level five predictors (TCARI2, EGFR, Datt7, DPI, l0) occurred in more than 75% of the models, and five predictors (TCARI2, EGFR, Datt7, DPI, LWVI1) had a VIP higher than 0.5. RF VegInds had more predictors with a frequency higher than 0.75 (TCARI2, EGFR, Datt7, DPI, LWVI1, SWIR.VI, CAI, DWSI3, l0), but only two predictors (TCARI2, EGFR) had a median VIP higher than 0.5 (Figure 4). The classifiers showed a similar ranking of the VIPs, but absolute values and selection frequencies differed.
Figure 4 (caption fragment). Numbers above boxes show the frequency with which the respective index was selected in the final model (max = 100). Boxplots show medians, interquartile ranges and extreme values within 1.5 × interquartile range.
The vegetation indices used as predictors in the classification models both at the leaf and canopy scale were related to a range of biochemical and ecophysiological parameters such as leaf water (LWVI1) and water stress (DWSI2), biomass/LAI (EVI) and vegetation cover (SWIR.VI), cellulose (CAI), and greenness/chlorophyll (TCARI2, DPI).Vegetation indices that were only relevant at the leaf level were mainly related to chlorophyll and pigments.Vegetation indices that were important at the canopy level with significant differences between A. longifolia and the other species were related to greenness (TCARI2, DPI), cellulose (CAI), vegetation cover (SWIR.VI), leaf water content (LWVI1, Datt7) and water or nitrogen stress (EGFR) (Figure A1).

Distinguishing Acacia longifolia from Adjacent Vegetation at the Leaf and Canopy Level
Identifying a high-impact invasive species at an early stage of invasion is an important task in conservation ecology. Hyperspectral remote sensing enables invasive species monitoring and predicting invasions at landscape scale. In the present study, we were able to successfully distinguish the Australian tree A. longifolia, an invader with high impact on ecosystem services and functioning in Mediterranean ecosystems [30][31][32][33]44,45,101], at leaf and at canopy level using field spectral data. At the leaf level, the non-linear classifiers SVM and RF achieved the best results in terms of Sensitivity and Positive Predictive Value using a reduced dataset based on vegetation indices. Similar to the leaf level, selecting vegetation indices for species separation based on canopy spectra resulted in higher PPV than using the full spectrum dataset. However, RF VegInds also produced a few extreme outliers, possibly due to misclassifications in the highest nodes [19]. Even though RF VegInds underestimated the number of A. longifolia individuals due to its lower Sensitivity, it revealed the highest reliability as indicated by the highest PPV. As adjacent bands are highly correlated, removing unnecessary bands while maintaining the predictive power of a hyperspectral dataset is an important processing step [102]. The classification accuracy can be increased if only band regions are retained that are linked to important biochemical parameters. For example, it has been shown that, at leaf level, bands related to leaf tannin content were also suitable to distinguish A. longifolia from native and other Acacia species [51]. Moreover, a higher number of bands as predictors requires more observations to achieve the same classification accuracy ("Hughes phenomenon", e.g., [103]). RF and SVM can achieve higher accuracies when redundant predictors are removed from a high-dimensional dataset [104,105]. In addition, radial basis-function kernel SVMs and RF are less sensitive to the Hughes effect than traditional discriminant classifiers such as LDA [102,106]. While LDA, for example, requires regularization to reach high accuracies [107], SVMs and RF are state-of-the-art classifiers for hyperspectral data when their parameters are optimized properly [106,107]. Therefore, at both scales, applying vegetation indices using a non-linear classifier that can deal with high-dimensional data provides an efficient method to extract meaningful information and to achieve an optimized and balanced ratio of Sensitivity and PPV.

Identifying the Optimal Classifier
The definition of the best classifier in a comparative approach depends on the selected accuracy measure.For invasive species detection and management, a compromise between Sensitivity and PPV has to be made (e.g., [108]).In our case, at the canopy scale, for example, the highest Sensitivity and, thus, the highest detection rate of A. longifolia was reached by sPLSDA FullSpec , VegInds and HDDA VegInds, but at the expense of a relatively high amount of false positives.Hence, in cases where the reliability of results is an important selection criterion and false detection is not acceptable, RF VegInds would deliver the best classification accuracy.This may be the case, e.g., for reporting cover values of invasive species in monitoring networks, where reliable distribution data are required to decide on management priorities and strategies.In contrast, other approaches such as screening studies of early warning systems aiming at detection of early stages of invasion require the identification of every single invasive individual.In this case, false detection may be acceptable and HDDA VegInds would provide the better alternative.Therefore, the choice of classifier and accuracy measure depends on the management objective.
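One way to make this trade-off explicit is a weighted harmonic mean of PPV and Sensitivity, the F-beta score. This measure was not used in the study and is introduced here purely as an illustration. The sketch combines the median leaf-level values reported above for SVM VegInds (Sensitivity 0.83, PPV 0.86) and HDDA FullSpec (Sensitivity 0.89, PPV 0.52); combining medians from separate distributions is only indicative, and the beta values are arbitrary assumptions.

```python
def f_beta(ppv, sensitivity, beta):
    """Weighted harmonic mean of PPV and Sensitivity.

    beta > 1 weights Sensitivity more strongly (early detection),
    beta < 1 weights PPV more strongly (reliable distribution data).
    """
    b2 = beta * beta
    return (1 + b2) * ppv * sensitivity / (b2 * ppv + sensitivity)


# Median leaf-level values quoted in the text: (PPV, Sensitivity).
classifiers = {
    "SVM VegInds": (0.86, 0.83),
    "HDDA FullSpec": (0.52, 0.89),
}

for beta in (1, 4):  # arbitrary weights chosen for illustration
    scores = {name: f_beta(p, s, beta) for name, (p, s) in classifiers.items()}
    best = max(scores, key=scores.get)
    print(beta, best, {k: round(v, 3) for k, v in scores.items()})
```

With equal weights the balanced classifier scores higher, whereas a strong emphasis on Sensitivity favours the classifier that misses fewer individuals despite its many false positives, mirroring the management-dependent choice described above.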
The performance of SVM (leaf scale) and RF (both scales) agrees well with recent studies of plant species discrimination [16,19] including a recent meta-study comparing a large set of classification algorithms [75]. However, accuracy can also depend on the species combination rather than the algorithm [108], and classifier performance can vary between functional groups [23]. There are clear differences between classifiers regarding accuracy while using the same species combination. Therefore, for diverse ecosystems such as Mediterranean-type systems covering species with different leaf types presented here and elsewhere [23], an extensive spectral library is required to find the best possible model for a reliable classification result.

Interpretation of the Relevant Vegetation Indices for Identifying Acacia longifolia at the Canopy Level
The spectral information at the canopy level is most relevant for further comparison with airborne or satellite remote sensing data. Several species-specific characteristics such as leaf chemistry, canopy structure as well as litter and soil parameters are combined in the canopy spectrum (e.g., [6,23]). Here, vegetation indices related to chlorophyll, cellulose, water, vegetation cover, and nitrogen stress differed significantly between A. longifolia and other species, and provided important predictors to distinguish the invader. These indices were the Transformed Chlorophyll Absorption Ratio Index (TCARI2), the Cellulose Absorption Index (CAI), the Double Peak Index (DPI), the Leaf Water Vegetation Index (LWVI1), water content (Datt7), the Shortwave-Infrared Vegetation Index (SWIR.VI) and the Edge-Green First Derivative Ratio (EGFR). This agrees partly with Oldeland et al. [109] who mapped Acacia encroachment (Acacia mellifera (Vahl) Benth., Acacia reficiens Wawra and Acacia hebeclada DC.) in a semi-arid ecosystem based on hyperspectral HyMap imagery using vegetation indices related to chlorophyll (CARI), greenness (DGVI), water (LWVI1), lignin (NDLI), nitrogen (NDNI), and cellulose (CAI).
In both studies, indices related to chlorophyll, leaf water and cellulose were important to distinguish Acacia. However, additional indices related to water (Datt7, EGFR), nitrogen limitation (EGFR), physiology (DPI) and vegetation cover (SWIR.VI) turned out to be important in our study, whereas Oldeland et al. [109] chose indices in the SWIR region related to lignin and nitrogen content (NDNI, NDLI), and parameters related to the red edge region were absent. Thus, a similar though slightly different set of indices enabled the successful identification of several Acacia species.
Groundwater availability and adaptation to drought are possibly important factors that influence spectral shapes and affect spectral separability of species in this ecosystem [23]. The fact that the majority of important indices was related to physiology and water content could be due to the sampling season and potentially indicates a different adaptation strategy of A. longifolia to environmental conditions [45,55]. Acacia longifolia is known to be a water-spending species, and it increases the stand transpiration while decreasing the water availability for co-occurring species [45]. In contrast to native dune species, A. longifolia shows only little stomatal control of water use [55] and it maintains a high use of resources under drought conditions, which is a novel trait in the studied dune ecosystems [42,45]. The question of whether and how invasive species handle water stress is highly relevant under climate change scenarios in Mediterranean ecosystems [110]. Thus, selecting vegetation indices can both increase classification accuracy and reveal the invader's adaptation strategies to drought.
Regarding the TCARI2 and DPI indices at the canopy level, A. longifolia was distinguishable by its higher chlorophyll content. Similarly, greenness was found to be an important factor discriminating the invasive Arundo donax L. from adjacent riparian vegetation in Portugal, as it produced new fresh green leaves when water was available during the vegetative period [14]. Acacia longifolia, too, has a high growth rate and extended growth period [42,55], efficient nutrient acquisition [41] and high transpiration rates [45] while having a different leaf type (large phyllodes) compared to native species in this dune ecosystem [23,111]. This was reflected in the slightly but significantly lower CAI value for A. longifolia, which can be related to less dry, non-photosynthetically active plant material in the canopy [83,112]. In contrast to directly selecting one specific biochemical parameter, e.g., leaf tannin content [51], our data mining approach identified suitable vegetation indices from a large set of easily produced predictors. It enabled a robust and quick classification of the invasive plant A. longifolia, and gave insights into biochemical, biophysical and ecophysiological traits.
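To make concrete how such a narrowband index is derived from a field spectrum, the following Python sketch computes the CAI in its commonly cited form, CAI = 0.5 (R2000 + R2200) - R2100, where R denotes reflectance at the given wavelength in nm. The band positions and the absence of a scaling factor are assumptions of this sketch, and the exact definition applied in this study may differ. The function names and the synthetic spectra are hypothetical and serve only as an illustration.

```python
import numpy as np


def reflectance_at(wavelengths_nm, reflectance, target_nm):
    """Linearly interpolate reflectance at one wavelength (nm).

    Assumes `wavelengths_nm` is sorted in increasing order.
    """
    return float(np.interp(target_nm, wavelengths_nm, reflectance))


def cellulose_absorption_index(wavelengths_nm, reflectance):
    """CAI in its commonly cited form: 0.5 * (R2000 + R2200) - R2100."""
    r2000 = reflectance_at(wavelengths_nm, reflectance, 2000.0)
    r2100 = reflectance_at(wavelengths_nm, reflectance, 2100.0)
    r2200 = reflectance_at(wavelengths_nm, reflectance, 2200.0)
    return 0.5 * (r2000 + r2200) - r2100


# Synthetic spectra for illustration only (not measurements from the study).
wavelengths = np.arange(350.0, 2501.0, 1.0)
flat = np.full_like(wavelengths, 0.30)  # no absorption feature

# Spectrum with an absorption dip centred at 2100 nm, as expected for
# dry, non-photosynthetic material rich in cellulose.
dip = 0.05 * np.exp(-((wavelengths - 2100.0) ** 2) / (2.0 * 40.0**2))
dry_material = flat - dip

cai_flat = cellulose_absorption_index(wavelengths, flat)
cai_dry = cellulose_absorption_index(wavelengths, dry_material)
print(f"CAI flat spectrum: {cai_flat:.4f}")
print(f"CAI with 2100 nm absorption: {cai_dry:.4f}")
```

A deeper absorption feature around 2100 nm yields a higher CAI, which is consistent with reading a lower CAI as less dry plant material in the canopy.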

Conclusions
We showed that the high-impact invasive species A. longifolia can be distinguished at leaf and canopy level using vegetation indices derived from field spectral data in the studied Natura 2000 protected Mediterranean dune ecosystem. The best results in terms of a high detection and a high reliability were achieved by using non-linear classifiers that can deal with the Hughes effect based on vegetation indices. Thus, there is high potential for mapping the invader at airborne and satellite level using multispectral and hyperspectral sensors. Apart from increasing classification accuracy, choosing vegetation indices gave insights into the adaptation strategy of the water-spending invader in this semi-arid ecosystem. Multi-seasonal studies could further explore the best time frame for mapping as seasonal variation of biochemical, biophysical and ecophysiological parameters may affect the spectral separability. The spectral library delivers baseline information that could be extended further for multiclass approaches to give insights into the separability of other important species such as other invaders, endemic species, or those protected by national and international legislation. Moreover, regional scale, high-resolution mapping of the invader may enable quantification of invasive status of pristine ecosystems, prediction of future invasions and identification of areas of high-conservation value.
EUFAR (DeInVader, EUFAR11-06). This study contributes to the project "INSPECTED.NET" funded by the European Union's Seventh Framework Programme FP7-PEOPLE-2010-IRSES (Proposal No. 269206). We also would like to thank the ICNF Portugal and the Estabelecimento Prisional de Pinheiro da Cruz (EPPC), Portugal, for granting site permission, Cristina Máguas for logistic support, Manuel João Pinto for helping in plant species identification, Péter Burai and Csaba Lénárt for assisting in the field and for providing the spectrometer in 2011, Marius Appel for IT support as well as Jeannine Böhmichen, Jan Lehmann, Jörg Lüling and Tercia Vargas for assisting during field work. We also thank the four anonymous reviewers whose comments and suggestions helped improve and clarify this manuscript.

Figure 1. Study area and study sites in the stabilized coastal dunes of the Natura 2000 site "Comporta/Galé" in SW Portugal.


Figure 3. Field spectra and medians of Variable Importance (VIP) (100 iterations) of classification models of Acacia longifolia: (A) mean leaf spectra of A. longifolia and co-occurring native species; (B) VIP of sparse Partial Least Squares Discriminant Analysis (sPLSDA) using the full spectrum at leaf level; (C) mean canopy spectra of A. longifolia and co-occurring native species; and (D) VIP of sPLSDA using the full spectrum at canopy level. VIP values > 0.8 (dashed lines) indicate highly important predictors.


Figure 4. Distribution of Variable Importance (VIP) values of vegetation indices from 100 iterations of classification models of Acacia longifolia using field spectra of leaves and canopies. VIP values higher than 0.8 (dashed lines) indicate highly important predictors. For explanations of vegetation indices, see Table 3. Numbers above boxes show the frequency with which the respective index was selected in the final model (max = 100). Boxplots show medians, interquartile ranges and extreme values within 1.5 × interquartile range.


Table 2. Overview of the number of spectral samples for Acacia longifolia and other species used for training and testing at leaf and canopy level. Training samples of the minority class A. longifolia were up-sampled during resampling to match the majority class size and account for class imbalance.
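The up-sampling mentioned in the caption can be sketched in Python as follows. This is a minimal illustration that assumes a two-class setting and random sampling with replacement; the resampling routine actually used in the study is not specified here, and the function name, the feature matrix and the labels are hypothetical.

```python
import numpy as np


def upsample_minority(X, y, minority_label, seed=0):
    """Randomly duplicate minority-class rows (sampling with replacement)
    until the minority class is as large as the rest of the training set."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    majority_idx = np.flatnonzero(y != minority_label)
    n_extra = len(majority_idx) - len(minority_idx)
    if n_extra <= 0 or len(minority_idx) == 0:
        return X, y  # already balanced, or nothing to duplicate
    extra_idx = rng.choice(minority_idx, size=n_extra, replace=True)
    keep = np.concatenate([majority_idx, minority_idx, extra_idx])
    return X[keep], y[keep]


# Hypothetical training set: 3 invader spectra and 9 spectra of other
# species, each described by 5 vegetation-index values.
rng = np.random.default_rng(42)
X_train = rng.normal(size=(12, 5))
y_train = np.array(["acacia"] * 3 + ["other"] * 9)

X_bal, y_bal = upsample_minority(X_train, y_train, minority_label="acacia")
print({label: int((y_bal == label).sum()) for label in np.unique(y_bal)})
```

Up-sampling of this kind would be applied to the training partition only, so that duplicated samples cannot leak into the test data.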
