Assessing the Fractional Abundance of Highly Mixed Salt-Marsh Vegetation Using Random Forest Soft Classification

Abstract: Coastal salt marshes are valuable and critical components of tidal landscapes, currently threatened by increasing rates of sea level rise, wave-induced lateral erosion, decreasing sediment supply, and human pressure. Halophytic vegetation plays an important role in salt-marsh erosional and depositional patterns and marsh survival. Mapping salt-marsh halophytic vegetation species and their fractional abundance within plant associations can provide important information on marsh vulnerability and coastal management. Remote sensing has often provided valuable methods for salt-marsh vegetation mapping; however, it has seldom been used to assess the fractional abundance of halophytes. In this study, we developed and tested a novel approach to estimate fractional abundance of halophytic species and bare soil that is based on Random Forest (RF) soft classification. This approach can fully use the information contained in the frequency of decision tree “votes” to estimate fractional abundance of each species. The method was applied to WorldView-2 (WV-2) data acquired for the Venice lagoon (Italy), where marshes are characterized by a high diversity of vegetation species. The proposed method was successfully tested against field observations derived from ancillary field surveys. Our results show that the new approach allows one to obtain high accuracy (6.7% < root-mean-square error (RMSE) < 18.7% and 0.65 < R² < 0.96) in estimating the sub-pixel fractional abundance of marsh-vegetation species. Compared with the traditional RF regression method for fractional abundance estimation, the novel RF soft-classification approach shows a superior performance.
The distribution of the dominant species obtained from the RF soft classification was compared to the one obtained from an RF hard classification, showing that numerous mixed areas are wrongly labeled as populated by specific species by the hard classifier. As for the effectiveness of using WV-2 for salt-marsh vegetation mapping, feature importance analyses suggest that the Yellow (584–632 nm), NIR 1 (near-infrared 1, 765–901 nm) and NIR 2 (near-infrared 2, 856–1043 nm) bands are critical in RF soft classification. Our results bear important consequences for mapping and monitoring vegetation-species fractional abundance within plant associations and their dynamics, which are key aspects in biogeomorphic analyses of salt-marsh landscapes.

Fractional abundance, i.e., the fraction of the area (projected on the horizontal plane) occupied by plants of a given species, is an important indicator of vegetation distribution [45], with strong links to biomass and salt-marsh surface geomorphology [18] and its time evolution [3,36,40,48,63–66]. Fractional abundance of bare soil is also an important property of the marsh landscape that has been connected to the marsh sediment budget and marsh vulnerability [67,68]. Hence, accurate vegetation and bare soil mapping is of central interest to understand marsh dynamics and to support coastal management strategies.
Due to the profound influence of halophytic vegetation on ecological and geomorphological processes and their spatial and temporal dynamics [36,38,45,69–72], analyses of the fractional abundance of salt-marsh vegetation species are required over a large range of spatial scales, from the local (plant) scale to the whole-marsh scale (up to several km²). Remote sensing is an ideal tool to obtain this type of quantitative information and there is an ever-growing amount of research work focusing on the application of remote sensing methods to map fractional abundance of halophytic vegetation in space and time [44,70,73–76].
Classification methods applied to salt marshes have been developed for multi- and hyperspectral remote sensing data in a diverse set of biomes worldwide [76–82]. The large majority of previous approaches to halophytic vegetation mapping determined vegetation abundance by identifying the dominant species in each pixel, using traditional supervised and unsupervised classification algorithms [2,44,76,83–86]. Nonparametric mapping methods, such as Random Forest (RF) algorithms, have also been applied to halophytic vegetation mapping in the form of pixel-based [84,87] and object-based methods [76,86,88]. However, halophytic vegetation species are highly mixed at the scale of typical satellite sensor resolutions (order of 0.5–1 m) such that the use of hard classification approaches, which attempt to associate a single dominant species to each pixel, is hardly justified. Yet, the number of studies focusing on retrieving the fractional abundance of halophytic vegetation species and bare soil at the sub-pixel scale, i.e., the problem of unmixing, is still limited [65,83]. This is a clear gap that hinders the usefulness of remote sensing retrievals of vegetation distribution and change in salt-marsh studies. Here, we contribute to filling this gap by developing and applying a novel RF-based soft classification method to infer relative species abundance at the sub-pixel scale.
Wang et al. [74] used artificial neural network models to map the fractional abundance of species within associations in salt-marsh landscapes. Artificial neural networks, however, require a relatively time-consuming training phase and the definition and identification of their parameters can be a difficult task [89]. Additionally, artificial neural network performance heavily depends on their structure and design [90], i.e., the number of layers and neurons can significantly influence the accuracy of the method, such that it is difficult to provide a general neural network architecture that can be easily applied in different environments populated by different species.
RF algorithms [91] have been applied to detect land-use fractional cover [92,93]. However, to our knowledge, the RF approach has never been applied to estimating the fractional abundance of salt-marsh halophytic species at the sub-pixel scale. As marsh vegetation species are particularly highly mixed, we ask here whether RF methods can provide reliable unmixing results. Furthermore, typical applications of the RF unmixing method to other environments separately estimate single species abundances through regression and subsequently normalize them to sum to 100% [93]. This leads to increased estimation errors, which may be avoided if the RF formulation were better leveraged. To address the latter issue, in this work, we propose a new approach which uses the frequency with which individual "trees" in the RF assign a pixel to each species as reflective of its relative abundance at the sub-pixel scale. This new approach substantially differs from previous analyses based on the RF regression algorithm to estimate fractional abundance at the sub-pixel level, because those analyses do not take advantage of the information contained in the individual tree "votes" and rely on empirical regressions based on field observations. Towards the goal of improving current capability to accurately map fractional abundance of halophytic vegetation in space and time in salt-marsh landscapes, we first explored the possibility of applying the new algorithm based on RF soft classification and then compared the performance of the newly proposed approach to that of existing RF regression methods.

Study Site: the San Felice Salt Marsh (Venice Lagoon, Italy)
The Venice lagoon (top panels in Figure 1) is located in northeastern Italy and is connected to the Adriatic Sea by three inlets: Lido, Malamocco, and Chioggia. The main rivers that used to debouch into the lagoon were diverted directly to the sea in the XVI–XIX centuries [94], and only a few small rivers now remain, carrying modest amounts of freshwater and sediments into the lagoon. The Venice lagoon has an area of about 550 km² and is characterized by a semidiurnal tide with an average tidal range of about 1.0 m and a maximum spring tidal range of approximately 1.5 m [95,96]. The present study focuses on the San Felice salt marsh (bottom panel in Figure 1), one of the most naturally preserved areas within the northern part of the lagoon, close to the Lido inlet. The San Felice marsh is characterized by relatively healthy vegetation conditions [18,97] and is colonized by halophytic vegetation associations dominated by the following species: Salicornia veneta (hereafter "Salicornia"), Spartina maritima (hereafter "Spartina"), Limonium narbonense (hereafter "Limonium"), Sarcocornia fruticosa (hereafter "Sarcocornia") and Juncus maritimus (hereafter "Juncus") [44,45,74,98]. Silvestri et al. [45] reported that each species occupies a preferential range of possible elevations, thus leading to a typical species sequence with increasing elevation. Moreover, due to the strong link between marsh elevation and distance to channels [40], the distribution of halophytic species also varies from the channel edges to the inner portions of the marsh. Specifically, Salicornia and Spartina are preferably found in the lowest areas (inner portions of the marsh), Limonium tends to occupy intermediate marsh elevations, and Sarcocornia is more likely to colonize higher marsh areas, close to marsh edges. Juncus tends to develop where creeks bring litter and organic matter accumulates over time. In general, the density of halophytic vegetation decreases with distance to marsh edges [18].
As mentioned above, halophytic vegetation distribution is strongly linked to marsh morphology through a landscape-forming bio-morphologic process, and species are associated with (possibly overlapping) characteristic elevation ranges [45,48], the result of species adaptation to edaphic factors [43,50,99] and of interspecific competition [60,100]. Productivity and bio-diversity of halophytic vegetation are also linked to elevation [36,71,101]. The presence of these links between morphological and ecological patterns suggests that robust fractional-abundance mapping algorithms are of critical importance to monitor and analyze halophytic vegetation distribution patterns, their temporal dynamics, and salt-marsh bio-morphodynamics at large spatial scales (indicatively 10 m to 10,000 m).
Finally, halophytic vegetation distribution has been observed to change over time scales of a few years [44,62,65,70,102,103], especially in the current accelerating sea-level rise scenario [104–108]. Hence, a proper quantitative description of marsh vegetation space-time dynamics would greatly benefit from robust and highly repeatable quantitative mapping.

WorldView-2 Data
We developed and tested RF unmixing methods with application to WorldView-2 (WV-2) data. The workflow of this study is shown in Figure 2.
The WV-2 sensors included a panchromatic spectral band with a high spatial resolution (0.5 m) and 8 multispectral bands (Table 1) with a lower spatial resolution (2 m), spanning 4 standard bands (red, green, blue, and near-infrared 1) and 4 other application-oriented bands (Coastal, Yellow, Red Edge, and near-infrared 2). The sensor acquired data from an altitude of about 770 km. The data analyzed in this study were acquired at 10:23:00 on Nov 7, 2019. At the time of acquisition, the tidal level, measured at the Saline tide gauge station, close (about 3 km) to the San Felice marsh, was about 0.76 m above the Punta della Salute datum. As the current MSL is 31 cm higher than the Punta della Salute datum, the water level was about 0.45 m above the MSL at the time of acquisition, which corresponds to water depths ranging between 0 and 30 cm over the marsh platform.

Field Observations
Field vegetation mapping was performed on Jan 9, 2020. Twenty-four regions of interest (ROIs, i.e., field areas used for training and validating the classification model) were selected, with areas ranging between 18.0 m² and 106.5 m² (Table 2 and bottom panel in Figure 1). ROIs were randomly selected across the marsh to include all typical homogeneous associations of species that encroach the San Felice salt marsh. For each ROI, the percentage cover of vegetation species and bare soil was estimated using the standard Braun-Blanquet visual method, which records the cover of each species in 10 intervals between 0% and 100% [44]. The boundaries of the ROIs were accurately delimited through differential GPS (Leica CS15 in RTK mode, minimum accuracy of ±1 cm) (see Table 2 for ROI properties). ROIs were then overlaid on the WV-2 georeferenced image (using ArcGIS 10.5) and only pixels falling entirely within an ROI were used to build the classification dataset, which was then divided into two independent training and validation subsets, as explained in Section 2.3 [74].

WorldView-2 Data Preprocessing
Even though the atmospheric correction may not influence the result and accuracy of classifications [109], we applied such a correction to obtain more accurate spectral information, to favor the interpretation of the results, and to allow possible comparisons with past or future acquisitions. The Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) algorithm [110,111] was employed to perform the atmospheric correction. In FLAASH, the "Mid-Latitude Winter" Atmospheric Model and the "Maritime" Aerosol Model were used. Due to the lack of aerosol optical thickness data at the nearest AERONET station on the acquisition date (https://aeronet.gsfc.nasa.gov/) and given the good weather conditions, the visibility was set to "Clear", corresponding to 40 km, according to the instructions [112]. After atmospheric correction, the multispectral bands were pan-sharpened using the panchromatic band, which has a spatial resolution of 0.5 m, through the Gram-Schmidt Pan Sharpening algorithm [113,114]. Both atmospheric correction and pan-sharpening were performed in ENVI 5.4. Water bodies were masked based on negative values of the NDVI (normalized difference vegetation index) [115] derived from the "NIR1" and "Red" channels (Table 1).
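As an illustration of the water-masking step, the following minimal sketch computes NDVI = (NIR1 − Red)/(NIR1 + Red) and flags pixels with negative values as water. The function names and the toy reflectance values are assumptions made for this example; the study itself performed preprocessing in ENVI, and this is not a reproduction of that workflow.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.zeros_like(denom)
    # Leave NDVI at 0 where both bands are 0 to avoid dividing by zero.
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def water_mask(nir, red):
    """True where NDVI is negative, i.e., where the pixel is treated as water."""
    return ndvi(nir, red) < 0

# Hypothetical reflectances: vegetated pixel, water pixel, no-data pixel.
nir1 = np.array([0.40, 0.02, 0.00])
red = np.array([0.10, 0.05, 0.00])
mask = water_mask(nir1, red)
print(mask)
```

Only the second toy pixel, where the red reflectance exceeds the near-infrared one, is masked.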

Algorithm Description
In this work, the RF algorithm is applied using the Scikit-learn package [116], a freely available machine learning library for the Python programming language.
RF is a machine-learning algorithm based on the Decision Tree method [91], which is being increasingly and successfully used in remote sensing analyses of vegetation species and habitats [76,77,84,93,117,118]. RF classification is a supervised nonparametric classification method that makes predictions through a set of decision trees [91] which form a so-called "forest". Each decision tree is composed of a set of internal nodes and terminal nodes. Given a set of pixels known to belong to different information classes (i.e., the different vegetation species in the present application), training is performed by feeding each tree with the input spectral reflectance and the associated class for each pixel. Training pixels (samples) are then split into two groups ("left" and "right") at each node, based on so-called "best split" binary rules [91,119] and decreasing the Gini impurity index (G(I)), which is defined as:

$$G(I) = 1 - \sum_{i=1}^{n} p_i^{2} \qquad (1)$$

where $p_i$ is the frequency of occurrence of the ith class among the n total classes. G(I) represents the impurity level of information in the current node. Specifically, the highest value of G(I), G(I) = 1 − 1/n, shows each class is equally distributed in this node, while the minimum value of G(I) shows all pixels in this node belong to one class. The best split is chosen by maximizing the impurity decrease (ID), which can be expressed as:

$$ID = \frac{N_t}{N}\left[G(I) - \frac{N_{tr}}{N_t}\,G(I)_{right} - \frac{N_{tl}}{N_t}\,G(I)_{left}\right] \qquad (2)$$

where N is the total number of samples in the training set, $N_t$ is the number of samples at the current node, $N_{tl}$ is the number of samples in the left child node, $N_{tr}$ is the number of samples in the right child node, and $G(I)_{right}$ and $G(I)_{left}$ are G(I) in the right and left child node, respectively [116]. In a decision tree, the nodes will be split if this split induces a decrease in the impurity larger than or equal to a determined value, named "min impurity decrease" (hereafter mid). The value of mid is pre-determined as 0, which is the default value in the Scikit-learn package.
If the growth of a tree is not bounded, the tree would keep growing until there is a terminal node for every single pixel.
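The two quantities that drive node splitting can be checked by hand. The sketch below is a plain-Python illustration (not the Scikit-learn internals) of the Gini impurity, one minus the sum of squared class frequencies, and of the impurity decrease of a candidate split, weighted by the share of training samples reaching the node. The function names and the toy labels are assumptions made for this example.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a node: 1 - sum_i p_i**2 over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def impurity_decrease(parent, left, right, n_total):
    """Impurity decrease of splitting `parent` into `left` and `right`.

    n_total is the number of samples in the whole training set (N);
    len(parent) plays the role of N_t, len(left) of N_tl, len(right) of N_tr.
    """
    n_t = len(parent)
    return (n_t / n_total) * (
        gini(parent)
        - (len(right) / n_t) * gini(right)
        - (len(left) / n_t) * gini(left)
    )

# Toy node with two equally frequent classes, split perfectly in two.
parent = ["Limonium", "Limonium", "Spartina", "Spartina"]
left, right = parent[:2], parent[2:]
print(gini(parent))
print(impurity_decrease(parent, left, right, n_total=len(parent)))
```

With two equally frequent classes the parent impurity is 1 − 1/2 = 0.5, and a perfect split removes all of it, so the decrease is also 0.5.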
In the RF classification, a user-defined large number of decision trees (ntree) is chosen, but each tree has limitations in growth given by specific rules. In particular, each tree learns from a subset of pixels randomly selected from the training dataset. Two-thirds of the pixels present in the training dataset are drawn with replacement (i.e., bootstrapping) to construct each decision tree; thus decision trees are trained on different subsets of the data [91]. The pixels that are left out of the subset fed to a given tree (out of bag, OOB) are used as a validation dataset to test the predictive ability of that individual tree. The proportion of times that OOB samples are incorrectly predicted is recorded and averaged over all cases to produce an OOB error estimate. The OOB error estimation has been proven to be unbiased [120].
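A minimal sketch of how an OOB estimate can be obtained with Scikit-learn is given below. The data are synthetic (generated with `make_classification`) and the number of trees is arbitrary, so the values are illustrative only and unrelated to the study dataset. Note that Scikit-learn reports `oob_score_` as an accuracy, so the OOB error is one minus that value, and that its default bootstrap draws as many samples as the training set, with replacement. The last line prints the normalized Gini importances, which sum to 1.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for 8-band pixel spectra with 3 classes (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=3, random_state=0
)

rf = RandomForestClassifier(
    n_estimators=100,   # arbitrary value for this sketch
    bootstrap=True,     # each tree sees a bootstrap sample of the data
    oob_score=True,     # evaluate each tree on the samples it did not see
    random_state=0,
)
rf.fit(X, y)

oob_error = 1.0 - rf.oob_score_        # oob_score_ is an accuracy
importances = rf.feature_importances_  # normalized Gini importances
print(oob_error)
print(importances)
```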
The training process also makes it possible to evaluate the importance of each spectral band in reducing the classification error. Specifically, at each split, the decrease in G(I) is recorded for each band (Xi) that was used to form the split. The average of all decreases in G(I) in the forest where band (Xi) is involved yields the Gini variable importance value (IV) [121–123]. Scikit-learn normalizes the IVs of each band to a value between 0 and 1 by dividing by the sum of all importance values [116].
As for other supervised classification methods, the training dataset may come from ROIs selected in the image. If the ROIs contain mixed vegetation and bare soil areas, as often happens at marsh sites, the RF classifier allows the inclusion of "sample weights", which, in this study, correspond to the fractional abundance of different vegetation species and bare soil estimated for each ROI during field surveys. Once "sample weights" are defined, the split is determined by the weight of each class at the current node, instead of the number of samples. N, Nt, Ntr, and Ntl in Equation (2) will be the sum of the weight of all species at their corresponding nodes. For example, consider a node including n pixels, in which the fractional abundance belonging to the ith species is $F_i$; N at this node can be obtained as $N = \sum_{i} F_i$. When this node is split as a mother node, Nt, Ntr, and Ntl are calculated in the same way. Moreover, we use another parameter, i.e., the "min weight fraction", to control the split process. "Min weight fraction" is defined by Pedregosa et al. [116] as the minimum weighted fraction of the total sum of the weights (of all the input samples) at each child node. Here, we prefer to interpret it as a threshold value to determine whether the child node should be created or not. Specifically, the weight fraction (wf) of each child node is defined as follows:

$$wf = \frac{\text{weight}_{\text{child}}}{\text{weight}_{\text{mother}}} \qquad (3)$$

where $\text{weight}_{\text{child}}$ and $\text{weight}_{\text{mother}}$ are the weight sum of classes in the child and mother node, respectively. A child node whose wf is lower than "min weight fraction" should not be created. In this study, the "min weight fraction" is set to 0.
The RF is trained based on the provided classes and sample weights (fractional abundances), generating the pruned decision trees. After the training, when an unknown pixel value is input into the model, each decision tree assigns it to a specific class independently. This is usually explained by saying that each tree "votes" for a class, thus suggesting the possibility to consider it as a voting process. The RF classifier records the number of votes associated with the classified pixel for all the classes and the pixel is usually assigned to the class with the highest number of votes. Instead of considering the final association of each pixel to the most-voted class, in this work, we consider the number of votes as the probability value that the pixel belongs to one specific class [91], and then we interpret such probability value as the sub-pixel relative fractional abundance.
An important advantage of the RF classification is that it can manage a large number of input bands [84,88,124] and minimize data dimensionality issues, such as the Hughes phenomenon [117], that make the large amount of information contained in multispectral data difficult to exploit fully. Moreover, unlike some supervised parametric classifiers, such as the Maximum Likelihood method which assumes data are normally distributed [125], RF is also capable of handling multi-modal datasets, whose variables display more than one maximum in their probability distribution [117].
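The voting interpretation can be made concrete by querying each tree of a fitted forest and counting how often each class is chosen. The sketch below does this on synthetic data; the helper `vote_fractions` is a hypothetical name introduced for the example. One caveat worth stating: Scikit-learn's `predict_proba` averages the class probabilities found in the leaves of each tree rather than counting hard votes, so the two coincide only when the leaves are pure.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for pixel spectra; labels are the integers 0, 1, 2.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def vote_fractions(forest, samples):
    """Share of trees voting for each class, one row per sample."""
    n_samples = samples.shape[0]
    votes = np.zeros((n_samples, len(forest.classes_)))
    for tree in forest.estimators_:
        # Each tree returns one class index per sample (its "vote").
        idx = tree.predict(samples).astype(int)
        votes[np.arange(n_samples), idx] += 1
    return votes / len(forest.estimators_)

fractions_votes = vote_fractions(rf, X[:5])
print(fractions_votes)             # each row sums to 1
print(rf.predict_proba(X[:5]))     # averaged leaf probabilities, for comparison
```

Because every tree casts exactly one vote per pixel, the vote shares of a pixel always sum to 1, which is the property the soft classification relies on.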
In this work, in order to test the accuracy of our results, we have randomly divided the original dataset into two independent groups, i.e., 75% of the pixels (2804 pixels) from the ROIs were used for model training and 25% of the pixels (935 pixels) were used for testing. Sample weights of training data were passed to the model according to fractional coverages of vegetation species recorded in the field. At the end of the process, we assumed that the predicted probability of each vegetation class (i.e., species) equals its fractional abundance. Results were validated using the fractional abundance of the validation dataset and the error for each vegetation species was calculated.
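A compact way to experiment with this procedure is sketched below. It relies on several assumptions that are not taken from the study: the spectra are simulated as linear mixtures of hypothetical endmembers, every pixel carries its own known fractions, and the fractional abundances are passed to the classifier by repeating each training pixel once per class with the corresponding fraction as `sample_weight`. The split mirrors the 75%/25% division described above, and the tree count is arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_pixels, n_bands, n_classes = 200, 8, 3

# Hypothetical pure spectra and per-pixel cover fractions (rows sum to 1).
endmembers = rng.uniform(0.05, 0.6, size=(n_classes, n_bands))
fractions = rng.dirichlet(np.ones(n_classes), size=n_pixels)
spectra = fractions @ endmembers + rng.normal(0.0, 0.01, (n_pixels, n_bands))

train_idx, test_idx = train_test_split(
    np.arange(n_pixels), test_size=0.25, random_state=0
)

# Assumed encoding: one row per (pixel, class), weighted by the class fraction.
X_train = np.repeat(spectra[train_idx], n_classes, axis=0)
y_train = np.tile(np.arange(n_classes), train_idx.size)
w_train = fractions[train_idx].ravel()

rf = RandomForestClassifier(
    n_estimators=100,
    min_impurity_decrease=0.0,      # "mid" in the text
    min_weight_fraction_leaf=0.0,   # "min weight fraction" in the text
    random_state=0,
)
rf.fit(X_train, y_train, sample_weight=w_train)

# Class probabilities, read here as sub-pixel fractional abundances.
pred = rf.predict_proba(spectra[test_idx])
rmse_per_class = 100 * np.sqrt(np.mean((pred - fractions[test_idx]) ** 2, axis=0))
print(rmse_per_class)  # in percent cover, one value per class
```

The predicted rows sum to 1 by construction, so no rescaling step is needed before comparing them with the reference fractions.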
With the purpose of verifying the effectiveness of this new approach based on RF soft classification for sub-pixel fractional abundance assessment, we used the same dataset to train and test a traditional RF regression method.
Similarly to the RF classification, the RF regression is an ensemble of decision trees, and it is based on the assumption that the relationship between input variables (spectral reflectance) and sub-pixel fractional abundance can be described through a non-linear correspondence [91]. Following the above description of RF classification, it is easy to understand the processes of RF regression, which shares many of the advantages of the RF classifier. RF regression is generally characterized by a relatively low risk of overfitting compared with other regression methods, especially single Decision Tree regression. Similar to its classification counterpart, RF regression can provide a relatively unbiased evaluation of the model (through OOB information). The same training and validation datasets used for the RF soft classification were used to construct and test the RF regression model.
The main steps of the RF regression are: 1) the RF generates a regression model for each vegetation species based on the training dataset; 2) for each unknown pixel, the RF regression model is used to predict the fractional abundance of vegetation species and soil, and the prediction error is calculated using the validation dataset; 3) the results obtained for each pixel are then rescaled to sum to 100% because the method predicts vegetation fractional abundance separately for each vegetation species; 4) the accuracy of the predicted percentage for each class (obtained in Step 3) is again quantified using the validation dataset.
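Step 3 above is a simple proportional rescaling, illustrated in the sketch below. The function name and the per-class predictions are hypothetical values chosen for the example, not results from the study.

```python
def rescale_to_100(abundances):
    """Rescale independently predicted percentages so that they sum to 100%."""
    total = sum(abundances.values())
    if total == 0:
        # Nothing to rescale when every class is predicted as absent.
        return {name: 0.0 for name in abundances}
    return {name: 100.0 * value / total for name, value in abundances.items()}

# Hypothetical per-class regression outputs for one pixel; they sum to 150%.
raw = {"Limonium": 60.0, "Sarcocornia": 50.0, "Soil": 40.0}
rescaled = rescale_to_100(raw)
print(rescaled)
```

Each class is divided by the same total, so the ranking of the classes is preserved while every individual value changes, which is one way the rescaling can move predictions away from the field estimates.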
Considering that traditional approaches to mapping halophytic vegetation usually assign pixels to the dominant species [2,44,84,97] and that some species associations are dominated by one species, we used the fractional abundance maps obtained with the RF soft classification method to produce a map of the most abundant halophytic species across the study site. Specifically, the pixels with percentage cover larger than 60% were considered as colonized by the dominant species [44]. The results were then compared to a map obtained using an RF hard classification trained using only ROIs characterized by relatively homogeneous vegetation communities (or bare soil) with the dominant species (or soil) covering more than 60% of the area. A dataset of 2829 pixels was used, but to allow for error assessment, it was randomly divided into two groups: 2121 pixels (about 75% of the dataset) were used in model training and 708 pixels (about 25% of the dataset) were used in model validation. We notice that the number of data used for the hard classification is smaller than that used for soft classification and regression because hard classification just includes pixels with percentage cover greater than 60% [44] while all pixels are used in soft classification and regression processes.
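The 60% rule used to derive the dominant-species map from the soft-classification output can be written in a few lines. In the sketch below the function name, the "mixed" label, and the example fractions are assumptions made for illustration.

```python
def dominant_class(fractions, threshold=60.0):
    """Return the class whose cover exceeds `threshold` percent, else 'mixed'."""
    name, value = max(fractions.items(), key=lambda item: item[1])
    return name if value > threshold else "mixed"

# Hypothetical fractional abundances (percent cover) for two pixels.
pixel_a = {"Spartina": 72.0, "Salicornia": 18.0, "Soil": 10.0}
pixel_b = {"Limonium": 45.0, "Sarcocornia": 40.0, "Soil": 15.0}
print(dominant_class(pixel_a))
print(dominant_class(pixel_b))
```

The first pixel is labeled with its dominant species, while the second, where no class exceeds 60%, is flagged as mixed.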

Estimation of Accuracy
The Confusion Matrix was used to evaluate the performance of the hard classification. It provides the Overall Accuracy (A), i.e., the ratio between the number of correctly classified validation points and the total number of validation points irrespective of the class [126]. We also used the Kappa coefficient, K, which is defined by the proportion of correctly classified validation sites after random agreements are removed [127]. The root-mean-square error (RMSE) and the coefficient of determination (R²) between predicted fractional abundance and test data were calculated for each class to estimate model performance:

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}} \qquad (4)$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^{2}}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}} \qquad (5)$$

where $y_i$ is the ground reference value, $\hat{y}_i$ represents the predicted value, $\bar{y}$ is the average of the observed values, and n is the number of test points.
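The accuracy measures of this section can be reproduced with a few lines of plain Python. The sketch below implements the RMSE, the coefficient of determination (in its usual form, one minus the ratio of residual to total sum of squares), the Overall Accuracy, and the Kappa coefficient. The function names and the toy numbers are assumptions for illustration and are not results from the study.

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def overall_accuracy(cm):
    """Correctly classified points (diagonal) over all validation points."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's Kappa: agreement after chance agreement is removed."""
    total = sum(sum(row) for row in cm)
    p_observed = overall_accuracy(cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(col) for col in zip(*cm)]
    p_chance = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (p_observed - p_chance) / (1.0 - p_chance)

# Toy percent-cover values and a toy 2-class confusion matrix.
observed = [10.0, 20.0, 30.0]
predicted = [12.0, 18.0, 33.0]
confusion = [[45, 5], [10, 40]]
print(rmse(observed, predicted), r_squared(observed, predicted))
print(overall_accuracy(confusion), kappa(confusion))
```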

Selection of Ntree
In this work, we selected ntree based on the accuracy of the hard classification, testing how the Overall Accuracy (A) varied when the number of trees ranged from 10 to 1000 (Figure 3). We notice that the training accuracy A rapidly increases and stays stable once ntree is larger than 150. The steady increase in A with ntree can be attributed to the reduced risk of overfitting. Indeed, the peak value of A is approached when ntree ranges from 460 to 490. As the RF is a computationally efficient algorithm and a larger ensemble of trees can reduce the risk of overfitting, ntree should be set as large as possible [128]. Considering that several applications of the RF to remote sensing image classification used ntree = 500 [93,129–131], we decided to use ntree close to 500 to compare the results obtained with our method to those from previous analyses. We thus selected ntree equal to 490, which was close to the value used in previous studies and also provided the highest value of A in this study. Even though ntree was selected based on RF hard classification, to consistently compare the results, we have maintained ntree = 490 also for RF soft classification and RF regression.
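A scan of this kind can be sketched as a loop over candidate values of ntree. The example below uses synthetic data, a short list of candidate values, and accuracy on a held-out 25% subset; all of these are assumptions made to keep the example fast, and they do not reproduce the 10 to 1000 range or the dataset of the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hard-classification dataset (assumption).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

accuracy_by_ntree = {}
for ntree in (10, 50, 100, 200):
    rf = RandomForestClassifier(n_estimators=ntree, random_state=0)
    rf.fit(X_train, y_train)
    # score() returns the overall accuracy on the held-out pixels.
    accuracy_by_ntree[ntree] = rf.score(X_test, y_test)

best_ntree = max(accuracy_by_ntree, key=accuracy_by_ntree.get)
print(accuracy_by_ntree, best_ntree)
```

On small synthetic data the accuracy curve tends to flatten quickly, so the selected value mainly reflects random fluctuations; the same loop applied to a larger candidate range would be needed to reproduce a curve like the one in Figure 3.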

Fractional Abundance Based on RF Soft Classification Method
As discussed in the Methods section, we use the number of votes resulting from the RF soft classification to determine the probability of each vegetation species. The main advantage of this new approach is that, for each pixel, the sum of the predicted probability of each class is equal to 100%. Indeed, by assuming that the probability of each species represents its fractional abundance, there is no need to rescale the abundance of the different species. This is consistent with the way the ancillary data were collected: the method used in the field for determining the fractional abundance (i.e., the Braun-Blanquet visual method, commonly used in ecology) is also essentially related to the occurrence probability of each class.
Maps of fractional abundance of Juncus, Limonium, Salicornia, Sarcocornia, Soil, and Spartina generated using the RF soft classification method are shown in Figure 4, while Table 3 shows R² and RMSE for fractional abundance. We notice that R² and RMSE values for the RF soft classification range from 0.652 to 0.956 and from 6.753% to 18.667%, respectively. This suggests that the RF soft classification method can successfully predict the fractional abundance of each species and bare soil.

Subpixel Classification through RF Regression Method
It has been shown that RF regression can perform well for vegetation species mapping when ntree is large (for example, 500 decision trees) [93]. To compare the performance of the RF regression method with the soft method, 490 decision trees (i.e., ntree = 490) were used to predict the abundance of the six classes individually. Table 4 provides the accuracy retrieved for each class, showing that R² and RMSE range from 0.74 to 0.98 and from 4.5 to 15.0, respectively. These results confirm that the RF regression is an accurate predictor of percentage for each class when we consider one class at a time (Figure 5). However, once the predicted abundance of each class is simply rescaled to 100% (i.e., the percentage values of the classes are rescaled to sum to 100% for each pixel), the accuracy decreases (Table 4), suggesting that the RF regression method may not be suitable to provide quantitative information on the fractional abundance for highly mixed vegetation species. Due to their low accuracy, the rescaled fractional abundance maps are not shown in this paper.
Figure 6a shows the results of an RF hard classification with ntree = 490, trained with the same dataset used for the RF soft classification. The OOB score, A, and Kappa coefficient of the RF hard classification are 0.96, 0.97, and 0.96, respectively. Figure 6b shows a majority map created using fractional abundance predicted by the RF soft classification, i.e., a map that shows the spatial distribution of species with fractional abundance higher than 60%. Black pixels in the map indicate the highly mixed locations, where the percentage cover of all classes is lower than 60%. The Confusion Matrix for the RF hard classifier is displayed in Table 5. Our results show that the RF hard classifier can efficiently distinguish different vegetation associations based on the dominant species and bare soil.

Discussion
We developed and tested a new method that uses the frequency with which individual "trees" in an RF algorithm assign a pixel to each species as reflecting the fractional abundance of the corresponding species. A comparison of results from the new algorithm to those from existing RF regression methods [132–134] shows a superior performance of the proposed method (Table 3), which thus constitutes a powerful tool for the analysis of vegetation patterns and their dynamics in salt-marsh landscapes.

Halophytic Vegetation Distribution Patterns on the San Felice Marsh
The application of the new method of vegetation abundance mapping to marshes in the Venice lagoon allows a quantitative description of some characteristic patterns exhibited by halophytic vegetation. Figure 4a shows that Juncus is more likely to populate marsh edges, while Limonium and Sarcocornia (Figure 4b,d) tend to compete for the same areas, located at a moderate distance from the tidal channels. Figure 4c,f shows that Salicornia and Spartina tend to cover the inner portions of the marsh. Such patterns agree well with those documented through field observations [44,45]. Indeed, as discussed in Section 2, halophytic vegetation distribution is associated with salt-marsh surface morphology. Silvestri et al. [45] showed that, in the study marsh, Spartina colonizes the inner and lower part of the marsh, Limonium and Sarcocornia are more likely to be observed at intermediate surface elevations, and Juncus tends to occupy higher-elevation marsh areas. Accordingly, the fractional abundance of each species has been considered as an indicator of marsh morphology [53] and of distance to channels [135]. Consistent with observational evidence [2,45,53,74], the maps of fractional abundance of each species provided by the "soft" RF algorithm (Figure 4) emphasize the clear link between vegetation distribution and marsh surface morphology, which is strongly related to the distance from the main channels that are the source of the sediments delivered to the platform [136]. In particular, inner marsh portions, which are mainly occupied by Salicornia and Spartina (Figure 4c,f), display lower elevations; areas at a moderate distance from the channels, which are encroached by Limonium and Sarcocornia (Figure 4b,d), are characterized by intermediate surface elevations; and marsh edges, which are mainly occupied by Juncus (Figure 4a), are characterized by higher elevations.
The link between plant distribution and marsh morphology shown in Figure 4 is consistent with observational evidence [18,53] and therefore further confirms the robustness of the RF soft classification. The repeated application of the novel soft RF algorithm to a time series of remote sensing data from the same marsh can thus allow quantitative and repeatable monitoring of marsh eco-morphodynamic processes.

The RF Soft Method Performance Compared to Existing Regression Models
It is worth recalling that the RF regression method [132-134,137,138], after simply rescaling the abundance of each class so that the abundances sum to 100%, was observed to perform well when applied, e.g., to map the fractional abundance of tree species in Bavaria (Germany) (0.72 < R² < 0.82) [93] and of plant types in the East Asia steppe (China, Mongolia, and Russia) (0.47 < R² < 0.78) [133]. However, in our case the accuracy of the rescaled RF regression model was not satisfactory (0.14 < R² < 0.58). This worse performance can probably be attributed to the high small-scale heterogeneity that characterizes marsh vegetation. In particular, the number of classes in Bavaria (two tree species and one class labeled as "other" in [93]) and in the East Asia steppe (two plant types: woody and herbaceous) was lower than that in the Venice lagoon (five vegetation species and one class representing the bare soil on the marsh). Furthermore, the renormalization of the RF regression results so that they sum to 100%, which is necessary to obtain fractional abundances, is likely to increase the estimation error (Table 4 and Figures 7-9). Indeed, when Immitzer et al. [93] estimated the fractional abundance of tree species in Bavaria via RF regression, the highest value of the sum of the relative abundances of the three considered classes in a pixel was about 102%, whereas this value exceeded 200% in our case. It should be noted that the renormalization in the application of the RF regression is performed in this study by assuming that the RF regression models for the single species all contribute equally, i.e., by imposing that the predicted abundances sum to 100% [139]. In order to improve the accuracy of the RF regression, a weighted contribution of each model on the basis of the documented vegetation distribution patterns could be considered.
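The effect of the equal-weight renormalization can be made concrete with a short sketch. The per-class predictions below are hypothetical values chosen so that they sum to 210%, comparable to the sums above 200% mentioned in the text; the function name `rescale_to_100` is likewise an assumption introduced for illustration.

```python
def rescale_to_100(predicted):
    """Rescale per-class abundance predictions (in %) so that they sum to 100%.

    Every class is divided by the same total, which corresponds to assuming
    that all single-class regression models contribute equally.
    """
    total = sum(predicted.values())
    if total <= 0:
        raise ValueError("the predicted abundances must have a positive sum")
    return {name: 100.0 * value / total for name, value in predicted.items()}


# Hypothetical outputs of six independent regression models for one pixel.
raw = {
    "Juncus": 10.0,
    "Limonium": 60.0,
    "Salicornia": 20.0,
    "Sarcocornia": 50.0,
    "Spartina": 30.0,
    "Soil": 40.0,
}
rescaled = rescale_to_100(raw)

print(f"sum before rescaling: {sum(raw.values()):.0f}%")
for name, value in rescaled.items():
    print(f"{name:12s} {raw[name]:5.1f}% -> {value:5.1f}%")
```

In this example every class loses more than half of its predicted value, including classes whose individual prediction may have been accurate, which is one way in which the rescaling step can inflate the estimation error.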
Figures 7-9 and Table 4 suggest that, although the RF regression model predicts the relative distribution of each class reasonably well when each class is taken separately, the method that is usually adopted [93,132,133,137] can hardly be applied to accurately estimate the fractional abundance of each species in the case of highly mixed species. The values of R² and RMSE for the fractional abundances derived from the RF soft classification and from the RF regression (Tables 3 and 4, respectively) suggest that the RF soft classification performs slightly worse than the RF regression for single classes, but considerably better than the rescaled RF regression method. Figures 7-9 show the outcome of the test performed to compare field observations with the results obtained with the three different methods (RF soft classification, RF regression, and rescaled RF regression) for Juncus and Limonium (Figure 7), Salicornia and Sarcocornia (Figure 8), and Soil and Spartina (Figure 9), and highlight that the RF soft classification method performs much better than the rescaled RF regression method. The superior performance of the RF soft classification can be attributed (i) to the full use of the information provided by each decision tree, and (ii) to the simultaneous consideration of all classes, which avoids the need to perform ad hoc renormalizations.
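For reference, the two accuracy measures used in this comparison can be computed as in the sketch below. It assumes that R² is the coefficient of determination, 1 - SS_res/SS_tot; if R² is instead taken as the squared correlation coefficient, the values would differ. The observed and predicted abundances are hypothetical numbers, not data from this study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted abundances (%)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical field-observed and estimated abundances (%) of one class.
observed = [0.0, 20.0, 40.0, 60.0, 80.0]
predicted = [5.0, 15.0, 45.0, 55.0, 85.0]

print(f"RMSE = {rmse(observed, predicted):.2f}%")
print(f"R2   = {r_squared(observed, predicted):.5f}")
```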

Drawbacks of Dominant Species Maps
Due to the high biodiversity of halophytic vegetation species on marshes, we argue that traditional hard classification methods (i.e., those in which the classifier assigns each pixel to an individual species or to bare soil) [44,70] cannot provide accurate information on vegetation distribution. Indeed, pixels in remote sensing images (whose sizes are on the order of 0.5-1 m) are often composed of highly mixed vegetation associations [53,74], particularly over salt marshes. The results obtained with the hard classification (Figure 6a) allow us to perform a further analysis of the results obtained with the RF soft classification (Figure 6b). Specifically, we notice that the position of the patches occupied by dominant classes agrees quite well with that obtained with an RF hard classification, thus suggesting the robustness of the RF soft method. Furthermore, we notice that some large mixed areas, composed of more than one species (or bare soil), cannot be detected by the hard classification method. These areas are mostly located in the inner portions of the marsh, where topographic elevations are relatively low [18,45] and the inner species (Salicornia and Spartina) are always mixed with bare soil. Finally, we also notice that mixed areas are observed within areas labeled as Limonium-dominated in the hard classification results. This can be attributed to the fact that Limonium and Sarcocornia tend to colonize the same areas.
We further compared our results to those obtained by Belluco et al. [44] using ML (Maximum Likelihood) and SAM (Spectral Angle Mapper) hard classifiers applied to a 2001 IKONOS dataset over the same study site. Based on a comparison of the A and Kappa coefficients, the map obtained with the RF hard classification and the majority map obtained from the RF soft classification are both very similar to the ML map and slightly better than its SAM counterpart, indicating that RF is a reliable classifier for halophytic species.
We therefore conclude that, in highly mixed vegetation environments like salt marshes, traditional hard classification methods do not provide sufficient information on species distribution, since they must necessarily label mixed areas with the dominant species. In contrast, soft classification methods, when properly applied, provide essential information about species presence, including within mixed pixels. Majority maps obtained from RF soft methods are consistent with those produced with hard classification methods, lending further support to the method introduced here.
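The difference between a hard label and a dominant-species label derived from soft fractions can be sketched as follows. The 60% cover threshold is the value used in this study for the dominant-species map; the function names, the "Mixed" label, and the example pixels are hypothetical and introduced only for illustration.

```python
def hard_label(fractions):
    """Label a pixel with the class of largest fraction, however mixed it is."""
    return max(fractions.items(), key=lambda item: item[1])[0]


def dominant_label(fractions, threshold=0.60, mixed_label="Mixed"):
    """Label a pixel with its dominant class only if its cover exceeds the threshold.

    fractions: mapping from class name to fractional abundance (0 to 1).
    Pixels in which no class exceeds the threshold are flagged as mixed.
    """
    name, share = max(fractions.items(), key=lambda item: item[1])
    return name if share > threshold else mixed_label


# Two hypothetical pixels described by their fractional abundances.
edge_pixel = {"Limonium": 0.70, "Sarcocornia": 0.20, "Soil": 0.10}
inner_pixel = {"Salicornia": 0.45, "Soil": 0.40, "Spartina": 0.15}

for pixel in (edge_pixel, inner_pixel):
    print(hard_label(pixel), "|", dominant_label(pixel))
```

The second pixel shows the limitation discussed above: the hard label assigns it to a single species even though no class covers more than 60% of it.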

Feature Importance Analyses
An advantage of the RF algorithm is that it allows the quantification of the importance of each spectral band (i.e., feature) used in the classification. In our case, we analyzed the IV values of each WV-2 band for the detection of salt-marsh vegetation and bare soil. As for the hard classification method, Figure 10a shows that the Yellow band (wavelength: 584-632 nm) is the most important band among those provided by WV-2. One possible explanation is that the Yellow band facilitates the detection of bare soil, which has a higher reflectance than vegetation at this wavelength (Figure 10c). As for the sub-pixel classification based on the RF soft method, the Yellow, NIR 1, and NIR 2 bands are those that provide the majority of the information (Figure 10b), possibly because the reflectance of the different vegetation species differs in the NIR 1 and NIR 2 bands (Figure 10c). Table 6 suggests that the Yellow band is of critical importance in the regression of the Soil and Limonium percentages. Moreover, Table 6 also shows that the NIR 1 and NIR 2 bands are critical in the regression for the other classes. These analyses suggest that the NIR 1 and NIR 2 bands provided by WV-2 can improve the accuracy of halophytic vegetation classification. This is because, as shown in Figure 10c, the reflectance values of the vegetation species are similar in the visible range, while their variability increases in the NIR 1 and NIR 2 bands.
An important source of uncertainty in this study may be related to the interference of water with species reflectance. The tidal elevation at the time of acquisition was about 0.45 m above MSL, suggesting that large portions of the marsh, whose elevations range from 0.15 m to 0.60 m above MSL, were flooded with water depths of up to 30 cm. Kearney et al. [140] documented that tidal inundation can result in a significant reduction in NIR 2 (856-1043 nm) reflectance and greatly affects the Red Edge band (699-749 nm). High water levels thus increase the noise in spectral reflectance information, in particular for the NIR bands, and significantly affect the outcome of hard and soft classifiers.
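The water depths quoted above follow from subtracting the marsh surface elevation from the tidal level, as in the sketch below. It assumes a horizontal water surface at the tidal elevation over the whole marsh, which is a simplification; the function name `water_depth` is hypothetical.

```python
def water_depth(tide_level_m, surface_elevation_m):
    """Water depth (m) over a marsh point, assuming a horizontal water surface.

    Both arguments are elevations above MSL in meters. Points lying above
    the tidal level are not flooded and get a depth of zero.
    """
    return max(0.0, tide_level_m - surface_elevation_m)


TIDE_LEVEL_M = 0.45  # tidal elevation at the time of acquisition, above MSL

# Lowest, intermediate (illustrative), and highest marsh elevations above MSL.
for elevation in (0.15, 0.30, 0.60):
    depth_cm = 100 * water_depth(TIDE_LEVEL_M, elevation)
    print(f"elevation {elevation:.2f} m -> water depth {depth_cm:.0f} cm")
```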

Conclusions
Halophytic vegetation, an important component of salt marshes, is typically organized in patches of species associations. In this study, we developed a new approach based on RF soft classification for estimating the fractional abundance of each species within vegetation associations and applied it to a WV-2 multispectral image. In particular, we make full use of the information contained in the distribution of "votes" from individual decision trees and interpret their distribution across classes as the corresponding fractional abundance. This approach yields high accuracies (6.7% < RMSE < 18.7% and 0.65 < R² < 0.96). We found that, while the RF regression can predict the percentage of each class accurately when each class is considered separately, the overall accuracy decreases significantly when the relative abundances are rescaled to sum to 100%. Comparisons of the RF soft classification results with the rescaled RF regression results (Figures 7-9) suggest that the former is more suitable for accurately mapping fractional abundance in highly mixed halophytic associations. Our results show that the distribution patterns predicted by the RF soft classifier are in very good agreement with the halophytic vegetation patterns documented by previous analyses [18,44,45,83], thus confirming the usefulness of the method.
We show that the results obtained with the RF soft classification can be used to produce a map of the dominant species within the plant association (i.e., with percentage cover higher than 60% in our case). This map nicely agrees with an RF hard classification map (Kappa = 0.962, A = 0.970) produced for the same study site, thus emphasizing the RF soft-classifier robustness. Our comparison also highlights that the traditional hard classifiers force the pixels to be assigned to a specific class, which is unrealistic when dealing with mixed vegetation associations as in the case of salt marshes, thus neglecting the heterogeneous contribution to the spectral signal associated with the mixture.
In conclusion, we developed a robust RF soft classification approach to assess the fractional abundance of halophytic vegetation and bare soil. This approach uses the frequency of "votes" for each class to represent the corresponding fractional abundance. We applied this method to estimate the fractional abundance of halophytic vegetation species within our study site, which is characterized by a high biodiversity of salt-marsh vegetation and where halophytic species are organized in mixed vegetation associations at the scale of the satellite sensor resolution (0.5 m). The proposed method allowed us to obtain high accuracy in the current application, suggesting that it can be a valuable tool to analyze the distribution patterns of the fractional abundance of salt-marsh vegetation species. The comparison between the results obtained with the RF soft classifier and those obtained with its regression counterpart shows the superior robustness of the former. We suggest that the RF soft classification allows one to monitor the temporal evolution of halophytic vegetation, such as dieback and replacement. We therefore suggest that the RF soft classification method should be considered for analyzing salt-marsh response to sea-level changes and for the development and testing of biogeomorphic models.