Assessing the Potential of Sentinel-2 and Pléiades Data for the Detection of Prosopis and Vachellia spp. in Kenya

Abstract: Prosopis was introduced to Baringo, Kenya in the early 1980s through the Fuelwood Afforestation Extension Project (FAEP) to provide fuelwood and to control desertification. Since then, Prosopis has hybridized and spread throughout the region. Prosopis has negative ecological impacts on biodiversity and negative socio-economic effects on livelihoods. Vachellia tortilis, on the other hand, is the dominant indigenous tree species in Baringo and an important natural resource, mostly preferred for wood, fodder and charcoal production. High utilization due to anthropogenic pressure is affecting the Vachellia populations, whereas the well adapted Prosopis, competing for nutrients and water, has the potential to replace the native Vachellia vegetation. It is vital that both species are mapped in detail to inform stakeholders and to design management strategies for controlling the Prosopis invasion. For the Baringo area, few remote sensing studies have been carried out. We propose a detailed and robust object-based Random Forest (RF) classification on high spatial resolution Sentinel-2 (ten meter) and Pléiades (two meter) data to detect Prosopis and Vachellia spp. for Marigat sub-county, Baringo, Kenya. In situ reference data were collected to train an RF classifier. Classification results were validated against independent reference data from test sites of the "Woody Weeds" project and with the Out-Of-Bag (OOB) confusion matrix generated in RF. Our results indicate that both datasets are suitable for object-based Prosopis and Vachellia classification. Higher accuracies were obtained using the higher spatial resolution Pléiades data (OOB accuracy 0.83 and independent reference accuracy 0.87-0.91) than using the Sentinel-2 data (OOB accuracy 0.79 and independent reference accuracy 0.80-0.96). We conclude that it is possible to separate Prosopis and Vachellia with good accuracy using the Random Forest classifier.
Given the cost of Pléiades, the free-of-charge Sentinel-2 data provide a viable alternative, as the increased spectral resolution compensates for the lack of spatial resolution. With global revisit times of five days once Sentinel-2B becomes operational, Sentinel-2 based classifications can probably be further improved by using temporal information in addition to the spectral signatures.


Introduction
Invasive species cause ecological, economic and social impacts and are key drivers of global change [1]. Prosopis spp. (mesquite), which are native to arid and semi-arid zones in the Americas, are among the world's most damaging invasive species. Globally, they are regarded as noxious invaders with substantial impacts on biodiversity, ecosystem services, and local and regional economies, in their native and even more so in their invasive ranges [2]. According to Shackleton et al. [2], the factors that make many Prosopis species successful invaders are: (a) the production of a large number of seeds that remain viable for decades; (b) rapid growth rates; (c) the ability to coppice after damage [3][4][5]; (d) a root system that taps deep into the groundwater table [6,7]; (e) a high tolerance to climate extremes; (f) a high tolerance to various soil types; and (g) negative allelopathic effects on competing plants [8].
Prosopis spp. were introduced to Kenya by the Food and Agriculture Organization (FAO) in the early 1980s [9,10]. The aims were to prevent desertification, to provide an alternative source of fuelwood given the high demand for Vachellia, and to reduce human pressure on the indigenous flora [1,9]. According to Andersson [10], a large number of Prosopis plantations were created around Lake Baringo, propagating Prosopis juliflora (Sw.) DC. and/or the closely related Prosopis pallida (Willd.) Kunth (16 sites) as well as Prosopis chilensis (Molina) Stuntz (17 sites). Additionally, small-scale plantations were established for ornamental purposes. A summary of the Prosopis plantations in Kenya, derived from the work of Andersson, is provided in Table 1 [10].
Table 1. Summary of initial Prosopis plantations in Baringo county, extent of the site and species, based on the work of Andersson [10].

Over time, these species and their hybrid offspring, described by Pasiecznik et al. [11] as the Prosopis juliflora-Prosopis pallida complex, earned a place on the World Conservation Union's list of the "100 of the world's worst invasive alien species" [12]. In our study, we refer to all Prosopis spp. and their hybrids as Prosopis.
The genus Acacia sensu lato is a large genus consisting of over 900 different species. Since 2005, the African species have been recognized as Vachellia [13]; previously, the nomenclature identified the African varieties as Acacia spp. Vachellia tortilis (Forssk.) Hayne Galasso and Banfi is one of the six endemic Vachellia species present in Baringo [14] and the dominant and most prolific species in the study area. This deciduous tree species is used for numerous livelihood activities in semi-arid parts of Kenya [14], such as: (a) production of charcoal; (b) supply of timber wood; (c) provision of fodder for livestock [9]; and (d) production of honey [14]. Its intensive use substantially disturbs the species and decreases its competitiveness, which makes vast areas susceptible to Prosopis [15]. In this paper, we refer to all Vachellia species and subspecies present at our study site, including those affected by the nomenclature re-classification, simply as Vachellia.
Prosopis and Vachellia species have, in principle, similar characteristics (Figure 1), in that both are drought tolerant with extensive rooting systems. With regard to roots, Prosopis rapidly develops a complex and vigorous rooting system as a coping mechanism to survive its first dry season [16]. Compared to Vachellia, Prosopis roots reach greater depths and tap deeper aquifers [17]. Both species produce palatable pods [10], which are eaten by many domesticated and wild animals. Passage through the digestive system enhances seed germination [3] and further propagates the species [18]. A fully developed Prosopis tree yields between 10 and 50 kg of pods annually [19], with approximately 20 to 35 thousand seeds per kg [20]. For comparison, a mature Vachellia tree yields roughly 10 to 12 kg of pods annually, containing 12 to 25 thousand seeds per kg [21]. Furthermore, Vachellia trees are subjected to extensive browsing pressure by goats, which stunts the development of young trees [14], while the leaves of Prosopis are non-palatable. Finally, Vachellia charcoal is more valuable than Prosopis charcoal, which makes Vachellia the preferred target for charcoal production. All these factors create favorable conditions for Prosopis proliferation.
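To make the seed-output gap concrete, the pod yields and seed densities quoted above can be multiplied out per tree. This is a back-of-the-envelope sketch. It assumes every kilogram of pods carries seeds at the stated density, and it ignores seed viability and germination rates. The helper name is illustrative.

```python
# Annual pod yield (kg per tree) and seed density (seeds per kg), as quoted in the text.
PROSOPIS = {"pods_kg": (10, 50), "seeds_per_kg": (20_000, 35_000)}
VACHELLIA = {"pods_kg": (10, 12), "seeds_per_kg": (12_000, 25_000)}


def seed_range(species):
    """Return the (minimum, maximum) number of seeds produced per tree and year."""
    low = species["pods_kg"][0] * species["seeds_per_kg"][0]
    high = species["pods_kg"][1] * species["seeds_per_kg"][1]
    return low, high


for name, data in (("Prosopis", PROSOPIS), ("Vachellia", VACHELLIA)):
    low, high = seed_range(data)
    print(f"{name}: {low:,} to {high:,} seeds per tree and year")
```

At the upper bounds, a mature Prosopis tree produces about 1.75 million seeds per year against about 0.3 million for Vachellia, nearly six times as many.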
Remote Sens. 2017, 9, 74
Figure 1. Characteristics of Prosopis and Vachellia, based on [11,16,22,23], summarizing the competitive advantages of Prosopis in seed production, root system, leaf and pod palatability.
Remote sensing provides cost-efficient means to assess the distribution of invasive alien plant species and to monitor their spread [24,25]. Moreover, it allows the assessment of areas that are difficult to access. In the past, various attempts have been made to map Prosopis using remote sensing approaches [26][27][28]. Several mapping studies have been carried out, motivated by the increasing problems caused by the Prosopis invasion and by reports that the plant has a distinct spectral response compared to the surrounding native vegetation. In Ethiopia, Wakie et al. [29] used Moderate Resolution Imaging Spectroradiometer (MODIS) vegetation indices and topo-climatic predictors to map the current distribution of Prosopis with the maximum entropy modeling software (Maxent). Model performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC). The results indicate that the invasion extends over approximately 3605 km² in the Afar region (AUC = 0.94), while the potential habitat for future infestations covers 5024 km² (AUC = 0.95). Ayanu et al. [30] combined Landsat and ASTER data with a supervised (maximum likelihood) classification to assess the spread of Prosopis over the last 30 years. In South Africa, Van den Berg et al. [31] combined Landsat satellite and topographic data and developed a decision tree and threshold based approach to map Prosopis. Both methods focused solely on Prosopis vegetation. Their approaches assume that during the dry season Prosopis is the most photosynthetically active plant species among the vegetation types present in their respective study areas. Supervised classifications aimed at differentiating Prosopis from other classes were conducted by Robinson et al. [32], who classified Prosopis, Eucalyptus and different soil types in Australia using WorldView-2 very high resolution (VHR) satellite data. In Somaliland, Meroni et al. [27] attempted to differentiate several Prosopis sub-classes, native vegetation and mixed vegetation classes in Landsat 8 satellite data by applying a Random Forest (RF) classifier. Ng et al. [28] assessed both object- and pixel-based approaches for detecting Prosopis in Somaliland using Landsat 8 data. Their results showed higher accuracies for a pixel-based RF classifier, because the object-based approach was less accurate given Landsat's spatial resolution and the extent of the invasion. Besides that, a modest number of published works describe moderate success using maximum likelihood classifiers and Landsat data for Prosopis detection within Kenya [33,34].
With respect to Baringo County, few attempts have been made to map the Prosopis invasion with remote sensing data [35]. Amboka and Ngigi [35] used a Landsat time series to detect Prosopis cover in Baringo between 1985 and 2010 at five-year intervals. Their study applied a maximum likelihood classification and was validated with 250 randomly generated and visually interpreted reference points. The study's outcomes indicate an irregular distribution over time, with a Prosopis cover of 56% of the total land cover in 2005 and an overall accuracy of 89.5%. These results should be treated with caution and indicate that an improved remote sensing study is warranted. Andersson [10] attempted to monitor the Prosopis invasion in Baringo by mapping the initial plantations and quantifying the abundance of indigenous plant species in and around these sites. These two studies utilized local knowledge of the study area and plantation sites, without focusing on remote sensing techniques; plant inventories were made along transects. Other research in Baringo focused on the impacts of the Prosopis invasion on livelihoods [17], the costs of controlling the invasion and local residents' perceptions of it [36], and the management, control and utilization of Prosopis [19,20]. However, detailed mapping of Prosopis using higher spatial resolution data and better performing classifiers has not yet been achieved for Baringo.
Imagery with a ground sampling distance below five meters is labeled high resolution, while imagery below two meters is described as very high resolution (VHR) [37]. For such datasets, two trends can be observed. First, the number of commercial very high resolution sensors is increasing steadily, thereby increasing data availability and affordability. Second, Europe's ambitious Copernicus program currently provides free-of-charge Sentinel-2 imagery at high spatial resolution (ten meters). Compared to Landsat, the increased spatial resolution of Sentinel-2, as well as of commercial VHR satellites, is expected to provide more accurate results [26] and to offer the capacity to distinguish between different species in mixed stands. This discrimination between species in mixed stands is an essential component of detecting the Prosopis invasion and of separating Vachellia from Prosopis.
Whereas a large number of studies have been published using VHR satellites for vegetation applications, this is not yet the case for Sentinel-2. Due to the sensor's novelty (the first of two identical satellites was launched in June 2015), its capabilities have not yet been fully explored. Only a limited number of published studies deal with Object-Based Image Analysis (OBIA) and remote sensing of vegetation using this sensor [38,39]. First vegetation-focused assessments are described by Immitzer et al. [40] and Radoux et al. [41]. Both studies report good classification results for the detection of different tree species, crop types and sub-pixel landscape features, such as grass strips and small woody patches, at European and US test sites. To our knowledge, no study has yet employed Sentinel-2 for mapping invasive species.
Our research aims to address several knowledge gaps and needs: (a) identify a robust and reliable method for differentiating Prosopis from native vegetation types (Vachellia spp.), including mixed classes; (b) produce reliable maps with good accuracies, validated with independent reference samples; (c) assess the novel Sentinel-2 sensor for tree species classification and its application in arid and semi-arid environments; and (d) assess the value of free-of-charge Sentinel-2 data compared to commercial Pléiades data.

Materials and Methods
Maps were created using two different, relatively novel, high resolution satellite sensor datasets: Sentinel-2 and Pléiades. The mapping was done using an object-based classification approach and the Random Forest (RF) classifier. Object-based approaches have distinct advantages when (very) high resolution optical satellite data are used.

Study Area
The study area is located in the lowland flats around Lake Baringo, between latitudes 00°21'N and 00°36'N and longitudes 35°55'E and 36°05'E (Figure 2). Marigat is the largest town within the region of interest (ROI), located southwest of Lake Baringo alongside the Perkerra irrigation scheme. The semi-arid [42] study area is characterized by a unique combination of altitude, precipitation, soil and vegetation. The lowland flats are slightly undulating and dominated by rangeland, with an average altitude of 900 m above sea level (a.s.l.), and are surrounded by the Tugen hills, ridges and plateaus of the Lake Baringo catchment, with peaks of over 2300 m a.s.l. [43]. Total annual rainfall ranges from 300 to 700 mm and follows a bimodal distribution with two peaks, in April and November [44].
Presently, the vegetation is bushy, dominated by Prosopis and scattered indigenous species (shrubs and trees) of Vachellia spp., Acalypha fruticosa and Balanites aegyptiaca, among others. The soils are moderately to poorly drained, very deep, strongly calcareous, and saline and sodic in some areas [3]. The texture ranges from fine sandy loam to clay. The population density is 50 persons per km² according to the 2009 census [45]. Economic activities include livestock keeping, beekeeping, farming and fishing in Lake Baringo. Most land is held under a communal tenure regime, in which land is managed through a common property arrangement [36]. Under this regime, animals, primarily cows, goats and sheep, are able to move and graze freely. This point is important, as a number of studies have indicated that livestock play a major role in the spread of Prosopis [3,18].

Satellite Data
For this research, we tested satellite datasets from two different sensors, Sentinel-2 and Pléiades. Sentinel-2 provides free-of-charge high spatial resolution data at ten meters, whereas Pléiades is a commercial sensor that captures data at very high spatial resolution at two meters (Table 2). Although the two images were acquired one year apart, we believe that the two datasets are well comparable, as their acquisition dates fall within the same week of the year, in the same season. As a result, climatic and phenological conditions are probably very similar. Both scenes were selected because they are free of cloud cover.
Pléiades-1A is a commercial sensor operated since December 2011 by the Centre National d'Etudes Spatiales (CNES). It provides a very high spatial resolution of 2 m for four multi-spectral (MS) bands (red, green, blue and near infrared) and one panchromatic (PAN) band at 0.5 m (Figure 3). The sun-synchronous sensor has a revisit time of 26 days and exceptional roll, pitch and yaw agility, enabling the system to maximize the number of acquisitions over a given area. For the analysis, the four multi-spectral bands were used at their original 2 m spatial resolution.
Figure 2. Pan-sharpened true color Pléiades image (using subtractive resolution merge in Erdas Imagine [46]) of the study area centered around Marigat (left), including the Prosopis plantations [10] and the in situ collected reference dataset used for training the classifier; map of Kenya (upper right) with the location of the ROI; and Digital Elevation Model (lower right) generated from Pléiades stereo imagery, along with roads and waterways.

Sentinel-2 is a state-of-the-art sensor; its first satellite was launched on 23 June 2015 by the European Space Agency (ESA). The Sentinel-2 mission is a land monitoring constellation of two identical satellites (Sentinel-2A and Sentinel-2B) that deliver high resolution optical imagery. The system provides global coverage of the Earth's land surface and is characterized by a short revisit time: ten days with one satellite and five days once Sentinel-2B becomes operational (its launch is scheduled for 2017). The system collects data at 10 m (blue, green, red and near-infrared-1) and at 20 m (red edge 1 to 3, near-infrared-2, short wave infrared 1 and 2). Three additional bands for atmospheric correction are collected at 60 m (for the retrieval of aerosols, water vapor and cirrus), resulting in a total of 13 spectral bands (Figure 3). For our study, we excluded the three 60 m bands and resampled the remaining ten bands to 10 m spatial resolution, effectively splitting each 20 m × 20 m pixel into four 10 m × 10 m pixels.
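The text does not state which resampling method was used. The sketch below assumes nearest-neighbour resampling, which matches the description of splitting each 20 m pixel into four 10 m pixels that all carry the original value. `upsample_nearest` is a hypothetical helper and is not part of Sen2Cor or any toolbox mentioned here.

```python
def upsample_nearest(band, factor=2):
    """Replicate each pixel into a factor x factor block (nearest-neighbour resampling).

    With factor=2, one 20 m x 20 m pixel becomes four 10 m x 10 m pixels that
    all carry the original value, so no new radiometric information is created.
    """
    out = []
    for row in band:
        # Repeat every value along the row, then repeat the widened row.
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


# A hypothetical 2 x 2 patch of a 20 m band (reflectance values).
band_20m = [[0.12, 0.18],
            [0.25, 0.31]]
for row in upsample_nearest(band_20m):
    print(row)
```

The duplicated pixels are identical, so the 20 m bands still contribute only 20 m detail. The step mainly aligns all bands on one 10 m grid so they can be stacked for segmentation and feature extraction.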

Table 2. Characteristics of the satellite data used: sensor, spatial resolution, acquisition date, cloud cover (%) and bands.
The Sentinel-2 data were corrected for atmospheric effects using the Sen2Cor software, version 2.2.1 [47]. The atmospheric correction transforms the Top-Of-Atmosphere (TOA) Level-1C product into an atmospherically and terrain corrected Bottom-Of-Atmosphere (BOA) Level-2A product; its usability is described by Vuolo et al. [48]. The Pléiades data were atmospherically corrected to Top-Of-Canopy (TOC) reflectance using the optical calibration tool implemented in the Orfeo Toolbox (OTB) [49], which reads the spectral sensitivity data required for calibration from the image metadata.
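Both processing chains rely on top-of-atmosphere (TOA) reflectance before atmospheric effects are removed; the Sentinel-2 Level-1C product is already delivered as TOA reflectance. For a sensor delivered in digital numbers, the first step is the standard conversion ρ_TOA = π · L · d² / (E_sun · cos θ_s). The sketch below is illustrative only. It assumes radiance is recovered as L = DN / gain + bias; the actual gain, bias, solar irradiance and combination convention must be taken from the product metadata. The function name and example values are hypothetical. This is neither the Sen2Cor nor the OTB implementation, and it omits the subsequent TOC/BOA correction.

```python
import math


def toa_reflectance(dn, gain, bias, esun, sun_elevation_deg, earth_sun_distance_au=1.0):
    """Convert a digital number to top-of-atmosphere reflectance (illustrative sketch).

    Assumes radiance L = dn / gain + bias and a band-averaged solar irradiance
    esun in consistent units. The solar zenith angle is 90 degrees minus the
    sun elevation, so cos(zenith) equals sin(elevation).
    """
    radiance = dn / gain + bias
    cos_zenith = math.sin(math.radians(sun_elevation_deg))
    return math.pi * radiance * earth_sun_distance_au ** 2 / (esun * cos_zenith)


# Hypothetical values for one near-infrared pixel, for illustration only.
rho = toa_reflectance(dn=450, gain=10.0, bias=0.0, esun=1050.0,
                      sun_elevation_deg=60.0, earth_sun_distance_au=1.01)
print(round(rho, 4))
```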

Reference Data
We used two reference datasets in our study: one for training and another for validation. Field observations for the training data (Figure 2) were collected in May 2016 with a handheld Trimble Juno 3, with a positional accuracy of three to five meters. The training data consist of approximately 100 geo-referenced GPS field points for Prosopis (n = 43), Vachellia spp. (n = 24) and the mixed classes (n = 31), distributed evenly over the study area. For each field point, we recorded species composition (focusing on Prosopis and Vachellia), plot radius (10-60 m), fractional vegetation cover, soil type and elevation. All field points were complemented with ample field photographs, taken of the target and towards the cardinal directions. Later, we used the field photographs and Google Earth imagery [50] to complete the training dataset for classes not surveyed in the field, such as water and soils. The classification scheme employed for this study is shown in Table 3. Besides a "full" classification with 12 different classes, we also regrouped the original classes into four broad classes (Table 3).
For the second (validation) dataset, approximately 200 GPS points were collected in a parallel study using a handheld Garmin GPS. This dataset is completely independent of the training dataset and was not used for RF training. The independent reference samples were acquired as part of the "Woody Weeds" project [51], which studies invasive woody alien species in several East African countries; the present study contributes to its ongoing research in Baringo. The reference sites for the Prosopis and Mixed classes, some of which were ongoing Prosopis test plots, were visited in 2015. Each GPS point was located at the center of a predefined 15 m × 15 m plot. These points contained information on fractional vegetation cover, elevation, soils, land use and distance from water. This information was interpreted and grouped into four classes (Table 3: Prosopis 1-2, Vachellia 5-6, Mixed 3-4 and Other 7-12), with each class represented by about 50 reference points. We matched the GPS points with the corresponding segments generated with LSMS. The remaining classes, Vachellia and Other, were photo-interpreted. The validation dataset provides an excellent means to validate our classification outputs and permits the generation of confusion matrices that are independent of the training samples.
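Regrouping the full classification into the four broad classes amounts to a fixed lookup from detailed class numbers to broad classes. The sketch below assumes the Table 3 numbering given above (Prosopis 1-2, Mixed 3-4, Vachellia 5-6, Other 7-12). The detailed class names are not reproduced, and the helper name and example predictions are hypothetical.

```python
from collections import Counter

# Detailed class numbers (Table 3) grouped into the four broad validation classes.
BROAD_CLASSES = {
    "Prosopis": range(1, 3),    # classes 1-2
    "Mixed": range(3, 5),       # classes 3-4
    "Vachellia": range(5, 7),   # classes 5-6
    "Other": range(7, 13),      # classes 7-12
}


def to_broad_class(detailed_class):
    """Map a detailed class number (1-12) to its broad class name."""
    for name, members in BROAD_CLASSES.items():
        if detailed_class in members:
            return name
    raise ValueError(f"unknown detailed class: {detailed_class}")


# Hypothetical per-segment predictions from the 12-class map.
predicted = [1, 2, 2, 4, 5, 6, 9, 12, 3]
print(Counter(to_broad_class(c) for c in predicted))
```

Applying the same lookup to both the map and the reference labels lets the four-class confusion matrix be computed from the 12-class output without retraining the classifier.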

Segmentation
Research has shown [52] that object-based image analysis (OBIA) often provides benefits over pixel-based approaches in studies that use high to very high resolution datasets, where pixels are significantly smaller than the analyzed landscape features. By extracting information at the object level, OBIA reduces salt-and-pepper effects and enables the extraction of spectral and textural properties not available to pixel-based approaches [53]. The mean shift algorithm, introduced by Fukunaga and Hostetler [54], is a non-parametric feature-space analysis technique for locating the maxima (modes) of a density function. Its extension to the spatial domain was proposed by Comaniciu and Meer [55]. For the present study, we applied the Large Scale Mean Shift (LSMS) segmentation by Michel et al. [56], implemented in the open source software OTB version 5.4.0, as it provides stable segmentation results [57] and does not require a priori knowledge. The algorithm requires three parameters: (a) Spatial Radius (spatial distance); (b) Range Radius (spectral difference); and (c) Minimum Size (merging criterion).
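The mean-shift idea behind LSMS can be illustrated on a toy single-band image. The sketch below is a plain, unoptimized version of joint spatial-range mean-shift filtering in the spirit of Comaniciu and Meer [55], using flat kernels. `spatial_radius` and `range_radius` play the roles of the Spatial Radius and Range Radius parameters. It is not the OTB LSMS implementation, which additionally processes the image in tiles, groups the filtered pixels into segments and applies the Minimum Size merging step. The function name is hypothetical.

```python
import math


def mean_shift_filter(image, spatial_radius, range_radius, max_iter=20, eps=1e-3):
    """Joint spatial-range mean-shift filtering of a single-band image.

    Each pixel starts at (row, col, value) and is repeatedly moved to the mean of
    all pixels within `spatial_radius` (pixels) and `range_radius` (value units)
    of its current position. The output holds the value of the mode it reached.
    """
    rows, cols = len(image), len(image[0])
    filtered = [[0.0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            r, c, v = float(r0), float(c0), float(image[r0][c0])
            for _ in range(max_iter):
                sum_r = sum_c = sum_v = 0.0
                count = 0
                # Scan only the bounding box of the circular spatial window.
                for rr in range(max(0, math.floor(r - spatial_radius)),
                                min(rows, math.ceil(r + spatial_radius) + 1)):
                    for cc in range(max(0, math.floor(c - spatial_radius)),
                                    min(cols, math.ceil(c + spatial_radius) + 1)):
                        if (rr - r) ** 2 + (cc - c) ** 2 > spatial_radius ** 2:
                            continue  # outside the spatial radius
                        value = image[rr][cc]
                        if abs(value - v) > range_radius:
                            continue  # spectrally too different
                        sum_r, sum_c, sum_v = sum_r + rr, sum_c + cc, sum_v + value
                        count += 1
                if count == 0:
                    break
                new = (sum_r / count, sum_c / count, sum_v / count)
                shift = math.dist((r, c, v), new)
                r, c, v = new
                if shift < eps:
                    break  # converged to a mode
            filtered[r0][c0] = v
    return filtered


# Two homogeneous patches (left around 0, right around 10) with small noise.
image = [[0, 1, 10, 11],
         [1, 0, 11, 10],
         [0, 1, 10, 11]]
for row in mean_shift_filter(image, spatial_radius=1.5, range_radius=3.0):
    print([round(x, 2) for x in row])
```

Pixels whose values differ by more than `range_radius` never enter each other's window. The filter therefore smooths noise within each patch while keeping the boundary between patches sharp. Grouping pixels that converge to the same mode would then yield the segments.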
The first step in the segmentation process consists of scaling the raster values through standard-score normalization, using the mean and standard deviation, to correct for outliers. To assess the segmentation outputs and fine-tune the parameterization, we digitized a set of reference objects (Figure 4) representing either tree crowns or single-species stands for a one-to-one spatial correspondence (matching) test [58]. The best-fitting parameter set was determined empirically by comparing the overlap of the segmented objects with the reference polygons. Table 4 provides an overview of the selected parameters for each sensor. The minimum size was set to ten pixels for Pléiades and four pixels for Sentinel-2 to: (a) reduce the number of polygons; and (b) exclude small segments that do not provide any useful statistical information.
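The two preparatory calculations in this paragraph can be sketched as follows: standard-score normalization of a band, and an overlap score between a segment and a digitized reference object. The text does not specify which overlap measure was used, so the intersection-over-union ratio below is an assumption chosen for illustration. The use of the population standard deviation is likewise assumed, and both helper names are hypothetical.

```python
import statistics


def standard_score(band):
    """Scale a flat list of pixel values to zero mean and unit (population) std."""
    mean = statistics.fmean(band)
    std = statistics.pstdev(band)
    if std == 0:
        return [0.0 for _ in band]  # constant band: nothing to scale
    return [(value - mean) / std for value in band]


def overlap_ratio(segment, reference):
    """Intersection over union of two objects given as sets of (row, col) pixels."""
    union = segment | reference
    return len(segment & reference) / len(union) if union else 0.0


print(standard_score([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))

reference = {(r, c) for r in range(4) for c in range(4)}     # digitized crown, 16 px
segment = {(r, c) for r in range(1, 5) for c in range(4)}    # candidate segment, 16 px
print(overlap_ratio(segment, reference))  # 12 shared pixels / 20 pixels in the union
```

When tuning the Spatial Radius, Range Radius and Minimum Size, one would compute such a score for every reference object and prefer the parameter set with the best overall agreement.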

Spectral and Textural Features
To obtain optimal classification results, we used a large set of spectral and textural features for both datasets. A number of well-documented indices (Table 5) were generated, depending on the spectral bands available in each satellite image. The selected Vegetation Indices (VI) assess specific characteristics of the vegetation (e.g., vegetation cover, water content and leaf chlorophyll content). Eighteen VIs were calculated for Sentinel-2 and five for Pléiades.
A second series of features consists of textural information. When performing OBIA, texture can add valuable information and thus increase the accuracy [76][77][78][79]. For our study, such features were generated based on the coiflet wavelet transformation [80]. For every spectral band, we used four transformation levels and computed the mean of the horizontal (H), vertical (V) and diagonal (D) detail coefficients using the Wavelet Toolbox in MATLAB 7.13.0 [81]. In addition, we calculated summary statistics per object (mean, standard deviation and percentiles), also referred to as features, for each of the spectral bands, indices and texture layers.
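Per-object features of this kind can be sketched in a few lines: an index layer is computed per pixel and then summarized for every segment. NDVI is used here only as a familiar example; the full list of indices is given in Table 5. The percentile levels (10th, 50th, 90th) and the function names are assumptions, since the text does not list which percentiles were used.

```python
import statistics
from collections import defaultdict


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def object_features(labels, values, percentiles=(10, 50, 90)):
    """Summarize pixel values per segment: mean, standard deviation and percentiles.

    `labels` and `values` are flat, equally long lists (segment id and layer value
    for every pixel). Returns {segment_id: {feature_name: value}}.
    """
    groups = defaultdict(list)
    for label, value in zip(labels, values):
        groups[label].append(value)
    features = {}
    for label, group in groups.items():
        if len(group) > 1:
            # quantiles(n=100) returns the 1st to 99th percentiles.
            cuts = statistics.quantiles(group, n=100, method="inclusive")
        else:
            cuts = group * 99  # single-pixel segment: every percentile is that value
        features[label] = {
            "mean": statistics.fmean(group),
            "std": statistics.pstdev(group),
            **{f"p{p}": cuts[p - 1] for p in percentiles},
        }
    return features


# Hypothetical reflectances for five pixels belonging to two segments.
red = [0.05, 0.06, 0.05, 0.20, 0.22]
nir = [0.40, 0.42, 0.38, 0.25, 0.24]
segment_ids = [1, 1, 1, 2, 2]
ndvi_layer = [ndvi(n, r) for n, r in zip(nir, red)]
for seg, feats in object_features(segment_ids, ndvi_layer).items():
    print(seg, {k: round(v, 3) for k, v in feats.items()})
```

The same summary function can be applied unchanged to spectral bands and texture layers, which is how the per-object feature table grows quickly with the number of input layers.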

Random Forest and Feature Selection
We applied a non-parametric Random Forest (RF) classifier [82]. RF is a high-performance, state-of-the-art machine learning algorithm based on an ensemble of decision trees. It has many benefits compared to traditional classifiers [82][83][84], such as: (a) being relatively insensitive to the number and multi-collinearity of input data; (b) making no assumptions about data distributions; (c) providing information about the importance of input variables; and (d) achieving reliable and stable results. Li et al. [85] described how RF combined with feature selection provides improved accuracy and reduced computing times compared to Support Vector Machines (SVM) and pixel-based analysis for detecting landslides with LiDAR data. Li et al. [86] demonstrated that RF achieves the greatest overall accuracy compared to SVM and artificial neural networks for classifying surface-mined and agricultural landscapes.
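Item (c), variable importance, is measured in this study as the mean decrease in accuracy (MDA). The idea can be sketched with permutation importance on a single, already trained model: shuffle one feature at a time and record the drop in accuracy. This is a simplification of what RF does internally, where the drop is measured per tree on that tree's out-of-bag samples and then averaged. The toy model, data and function names below are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of samples the model labels correctly."""
    return sum(model(x) == y for x, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """Decrease in accuracy when each feature's column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [x[j] for x in rows]
        rng.shuffle(column)
        # Rebuild every sample with feature j replaced by the shuffled value.
        permuted = [x[:j] + (v,) + x[j + 1:] for x, v in zip(rows, column)]
        importances.append(baseline - accuracy(model, permuted, labels))
    return importances


# Toy data: feature 0 determines the class, feature 1 is pure noise.
rng = random.Random(42)
rows = [(rng.random(), rng.random()) for _ in range(200)]
labels = [int(x[0] > 0.5) for x in rows]


def model(x):
    """Stand-in for a trained classifier that has learned the threshold on feature 0."""
    return int(x[0] > 0.5)


print(permutation_importance(model, rows, labels, n_features=2))
```

Shuffling the informative feature destroys its relationship with the labels and costs roughly half of the accuracy, while shuffling the noise feature costs nothing. Features with near-zero importance are natural candidates for removal during feature selection.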


To perform our object-based classification, we used the R package "randomForest" developed by Liaw and Wiener [87]. The processing chain (Figure 5) was automated using a script developed in the open-source statistical software R Version 3.2.3 [88].
RF is known to benefit from a reduced number of features [82]. For feature selection, the feature importance was calculated as Mean Decrease Accuracy (MDA). RF generates the MDA automatically: for each feature, its values in the Out-Of-Bag (OOB) samples are randomly permuted and the resulting decrease in OOB classification accuracy is recorded; the larger the decrease, the more important the feature. The MDA values were then used for feature ranking and selection following approaches described in [76,78,89]. Feature selection optimized the model and increased the accuracy. In addition, reducing the number of features improved the processing performance.
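The permutation idea behind MDA can be sketched independently of any RF library. In RF the accuracy drop is measured on each tree's OOB samples and averaged over the trees (the R randomForest package reports it as `MeanDecreaseAccuracy` when the model is fitted with `importance = TRUE`); the sketch below assumes a generic `predict` function and a single evaluation set instead, and the toy classifier, data and the top-k selection rule are hypothetical.

```python
"""Permutation importance, the idea behind RF's Mean Decrease Accuracy (sketch)."""
import random
from typing import Callable, Sequence

Row = list[float]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[Row], int], X: list[Row], y: list[int],
                           n_repeats: int = 10, seed: int = 0) -> list[float]:
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


def select_top_features(importances: list[float], k: int) -> list[int]:
    """Indices of the k most important features, most important first."""
    ranking = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    return ranking[:k]


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.5], [0.4, 0.7],
     [0.6, 0.2], [0.7, 0.8], [0.8, 0.3], [0.9, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
importances = permutation_importance(lambda row: int(row[0] > 0.5), X, y)
print(importances, select_top_features(importances, k=1))
```

Shuffling the noise feature leaves the predictions unchanged, so its importance is zero, whereas shuffling the informative feature costs accuracy; ranking by this drop is what drives the feature selection.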

Accuracy Assessment
In addition to using the OOB accuracy as a measure for assessing the results, we performed an independent validation using the class information of predefined GPS points (Section 2.2.2). The reference points were matched to the corresponding segments in both classifications. To ensure an appropriate sample size, we calculated a multinomial distribution (Equation (1)) as described by Congalton and Green [90]. To achieve a confidence level of 85%, we supplemented the GPS points with additional polygons interpreted from orthophotos; a higher number of reference data points, and thus a higher confidence level, was unattainable for the ROI. The number of samples (n) is determined by the value B taken from the chi-square table (the upper (α/k) × 100th percentile of the chi-square distribution with one degree of freedom, where k is the number of classes), the proportion of occurrence V_i of class i in the test area and the required precision b_i of the land cover map:

$$n = \frac{B \, V_i \, (1 - V_i)}{b_i^{2}} \tag{1}$$
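Equation (1) can be evaluated without a printed chi-square table, because the upper quantile of a chi-square distribution with one degree of freedom equals the square of the corresponding two-sided standard normal quantile. The sketch below relies on that identity; the number of classes, the worst-case proportion V_i = 0.5 and the precision b_i = 0.05 are hypothetical inputs, not the study's values.

```python
"""Multinomial sample size after Congalton and Green, Equation (1) (sketch)."""
import math
from statistics import NormalDist


def chi2_1df_upper(tail: float) -> float:
    """Upper `tail` quantile of chi-square with 1 df, i.e. z_(1 - tail/2) squared."""
    z = NormalDist().inv_cdf(1 - tail / 2)
    return z * z


def multinomial_sample_size(confidence: float, n_classes: int,
                            proportion: float = 0.5, precision: float = 0.05) -> int:
    """n = B * V_i * (1 - V_i) / b_i**2, with B taken at alpha / k."""
    alpha = 1 - confidence
    b_value = chi2_1df_upper(alpha / n_classes)
    return math.ceil(b_value * proportion * (1 - proportion) / precision ** 2)


# Hypothetical inputs: four map classes, worst-case proportion 0.5, 5% precision.
for level in (0.85, 0.95):
    print(level, multinomial_sample_size(level, n_classes=4))
```

Using V_i = 0.5 maximizes V_i(1 - V_i) and therefore gives the most conservative (largest) n; raising the confidence level increases B and thus the required number of samples.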
First, a confusion matrix was constructed based on the sample counts, scoring the classifier on a simple true or false match between the validation dataset and the classification results. To produce a more robust validation, the size of the validation polygons needs to be considered [91][92][93]. Therefore, we constructed a second confusion matrix that takes into account the area of all classified reference polygons. Finally, we applied a post-stratified estimator (Equation (2)) to derive area proportions, as proposed by Olofsson et al. [91]. The sample-based estimator p_ij is determined by the proportion of the area mapped as class i (W_i), the counts normalized by area (n_ij) and the total sample size normalized by area per map class (n_i·):

$$\hat{p}_{ij} = W_i \, \frac{n_{ij}}{n_{i\cdot}} \tag{2}$$
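The three assessment steps can be sketched in a few lines, assuming the confusion matrix has map classes in rows and reference classes in columns and that every map class has at least one reference sample. A record weight of 1 produces the sample-count matrix and the polygon area produces the area-weighted matrix; `post_stratified` then applies Equation (2). Class names, counts and W_i values are hypothetical.

```python
"""Sample-count / area-weighted confusion matrices and Equation (2) (sketch)."""
from typing import Iterable


def confusion_matrix(records: Iterable[tuple[str, str, float]],
                     classes: list[str]) -> list[list[float]]:
    """Rows = map class, columns = reference class; each record adds its weight."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0.0] * len(classes) for _ in classes]
    for mapped, reference, weight in records:
        matrix[index[mapped]][index[reference]] += weight
    return matrix


def post_stratified(n: list[list[float]], W: list[float]):
    """p_ij = W_i * n_ij / n_i. with overall, user's and producer's accuracies."""
    k = len(n)
    p = [[W[i] * n[i][j] / sum(n[i]) for j in range(k)] for i in range(k)]
    overall = sum(p[i][i] for i in range(k))
    users = [p[i][i] / sum(p[i]) for i in range(k)]          # U_i = p_ii / p_i.
    column_totals = [sum(p[i][j] for i in range(k)) for j in range(k)]
    producers = [p[j][j] / column_totals[j] for j in range(k)]  # P_j = p_jj / p_.j
    return p, overall, users, producers


# Hypothetical two-class example with 100 reference samples.
counts = [[45, 5], [10, 40]]
W = [0.3, 0.7]  # mapped area proportions
_, overall, users, producers = post_stratified(counts, W)
print(round(overall, 3), [round(u, 3) for u in users], [round(q, 3) for q in producers])
```

Because the rows are rescaled to the mapped area proportions W_i, the estimated accuracies describe the map as a whole rather than the (possibly unbalanced) reference sample.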

Prosopis and Vachellia Maps
The 12-class land cover maps (Figure 6) reflect well the situation encountered in the study area and described in the literature [11,94]. Prosopis was found colonizing urban centers (Figure 6C,D) and areas near waterbodies (Figure 6E,F). The maps also show that the area to the east, which is lower in elevation, highly populated and dominated by agriculture, is heavily invaded by Prosopis. The largest agricultural area (Perkerra irrigation scheme) is classified as agriculture (cropped fields), bare soils (fallow fields) and other vegetation. In both classifications, Prosopis follows a pattern of growing along waterbodies and irrigated agricultural areas.
We generalized our results into four broader classes (Section 2.2.2; Table 3) by re-classifying the original twelve-class land cover classification (Figure 7 and Table 6). The generalized maps display only the targets of our study, namely Prosopis and Vachellia cover, together with the areas containing both species (Mixed), while all non-Prosopis-Vachellia classes are grouped together (Other). The Prosopis, Mixed and Vachellia classes are more abundant in the Sentinel-2 classification (Figure 7B) than in the Pléiades classification, where the non-Prosopis-Vachellia classes in particular are more frequent (Figure 7A). The spatial distribution of the Mixed class is comparable in both results; however, the Sentinel-2 classification indicates a higher abundance (Table 6). There appears to be a mismatch between class Other in the Pléiades results and class Vachellia in the Sentinel-2 results. The difference is more apparent in the lower (east) than in the higher (west) elevated areas (Figure 7C,D). Finer details are more evident in the Pléiades classification than in the Sentinel-2 classification, which aggregates small vegetated patches (Figure 7E,F). Notably, Prosopis is apparent along nearly all the rivers in both classification results (Figure 7A,B).

The margin map (Figure 8) was generated by differencing the probabilities of the first (majority vote) and second most often assigned class [95,96]. Values range from 0 to 1, indicating low to high confidence in the class assignment. For both datasets, the larger Vachellia forest (west), water (Lake Baringo) and the bare agricultural areas receive very high margins. In contrast, some of the sparsely vegetated areas have the lowest margins.
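The margin of a single segment can be computed from the per-class tree votes, since the RF class probability is the share of trees voting for that class. A minimal sketch, assuming votes are available as a class-to-count mapping; the vote counts below are hypothetical.

```python
"""Margin between the most and second-most voted class of one segment (sketch)."""


def margin(votes: dict[str, int]) -> float:
    """Vote share of the winning class minus the vote share of the runner-up."""
    total = sum(votes.values())
    ranked = sorted(votes.values(), reverse=True) + [0]  # [0] covers unanimous segments
    return (ranked[0] - ranked[1]) / total


# Hypothetical votes of 500 trees for three segments.
print(margin({"Prosopis": 300, "Mixed": 150, "Other": 50}))  # contested segment
print(margin({"Water": 500}))                                # unanimous segment
print(margin({"Prosopis": 250, "Vachellia": 250}))           # tie between two classes
```

A value near 1 means almost all trees agree, while a value near 0 flags segments, such as sparsely vegetated ones, where two classes compete.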


Feature Importance
The MDA (Figure 9) highlights the feature importance. For each sensor, Figure 9 displays the 15 most important features and differentiates between spectral bands, indices and texture. In both datasets, only seven bands contribute to the 15 most important features. For Pléiades (Figure 9A), the most influential feature is the 80th percentile of the Level 1 wavelets of the blue band. Interestingly, this band appears four more times in the list of highest-ranked features. Other significant textural bands for the Pléiades classification are Level 2 NIR and NDVI. The NIR band is also important, as it occurs five times among the top-ranking features. Of the indices, only the DNWI and the WDVI were amongst the top 15. For Sentinel-2 (Figure 9B), the Green Ratio index (GR) contributes most to the overall accuracy and appears four times in the list. The most frequently selected spectral bands are blue, green and NIR1. The Level 1 NDVI, blue and red coiflets are also recognized as important textural features.

Table 7 provides a comprehensive overview of the performed accuracy assessments. The table is divided into Pléiades and Sentinel-2 results and displays the number of independent reference polygons used for the sample count and area proportioned confusion matrices. The area (W_i) depicts the predicted map coverage of each class, while the average size shows the area (ha) of the average reference polygon per class. The Out-Of-Bag (OOB) confusion matrix is provided by the RF classification. To match the OOB matrix to the independent validation, we re-classified the 12 classes (Table A1), providing the overall, User's (U_i) and Producer's (P_i) accuracies for the four re-classified classes. The sample count shows the overall, U_i and P_i accuracies of the independent validation confusion matrix before area correction. Subsequently, the area proportion presents the area-corrected results in terms of overall, U_i and P_i accuracies. All confusion matrices can be found in Table A2.
When comparing the two sensors, Pléiades provides the best OOB and sample count accuracies for the re-classified map (OOB: 0.91, sample count: 0.88, area proportion: 0.95). Sentinel-2 achieves similar results with a higher variability (OOB: 0.85, sample count: 0.79, area proportion: 0.96). The most notable deviations are the Pléiades area proportionate accuracies for the classes Prosopis (U_i: 0.65) and Mix (P_i: 0.58) and the Sentinel-2 sample count accuracy for Other (P_i: 0.67).
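Matching the 12-class OOB matrix to the four validation classes amounts to summing the rows and columns of classes that are merged. The sketch below assumes the merge is given as a list mapping each fine class index to a group index; the three-class matrix and the mapping are hypothetical stand-ins for the 12-to-4 scheme of Table A1.

```python
"""Collapse a fine confusion matrix into grouped classes (sketch, hypothetical mapping)."""


def collapse(matrix: list[list[int]], mapping: list[int], n_groups: int) -> list[list[int]]:
    """Sum cell (i, j) into cell (mapping[i], mapping[j]) of the grouped matrix."""
    grouped = [[0] * n_groups for _ in range(n_groups)]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            grouped[mapping[i]][mapping[j]] += value
    return grouped


def overall_accuracy(matrix: list[list[int]]) -> float:
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


# Hypothetical: fine classes 0 and 1 (e.g. two Prosopis density levels) merge into group 0.
fine = [[5, 1, 0], [2, 6, 1], [0, 1, 8]]
coarse = collapse(fine, mapping=[0, 0, 1], n_groups=2)
print(coarse, overall_accuracy(fine), overall_accuracy(coarse))
```

Merging classes turns confusion between the merged classes into agreement, so the re-classified overall accuracy is at least as high as the fine-class value; this should be kept in mind when comparing OOB accuracies across class schemes.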

Accuracy Assessment
Comparing the validation methods, no substantial differences are observed between the OOB, sample count and area proportionate accuracies. The area proportionate accuracies are the highest and closely match the OOB accuracies (second highest), while the sample count yields the lowest overall accuracy. Finally, the information on validation polygon size and area indicates that the Prosopis and Mixed classes mainly consist of small polygons, while the Vachellia and Other classes consist of larger polygons that dominate the classification results in terms of coverage. The correlation between polygon size and classes also persists in the map product.

Map Product and Spatial Distribution
The mapped spatial distribution of Prosopis and Vachellia (Figure 7) matches the literature [10,26,31] and in situ observations. Prosopis is mainly found in the lowest and wettest areas of the ROI, which collect most of the runoff water and have the highest population density. Runoff water and population density correlate with the number of cattle, a major disperser and propagator of Prosopis seeds [11]. This is illustrated, for example, by the abundance of Prosopis: (a) in and near the town centre of Marigat; (b) alongside the roads; (c) around the seasonally fallow and grazed fields of the Perkerra irrigation scheme; and (d) on free and fertile land made accessible by the retreating water of Lake Baringo.
Vachellia is more abundant on the less populated and steeper slopes of the ROI, which are less utilized and grazed. Owing to the lower population pressure and disturbance, the established native plant communities prevent Prosopis from invading these areas [15]. As mentioned above, Prosopis is mainly concentrated along the waterways, as described in the literature [26,27,31].
Some higher elevated areas along the river beds (west of the ROI) surveyed during the fieldwork have no Prosopis. Although the maps indicate some small pockets of Prosopis there, these result from confusion with vegetation that remains evergreen due to its proximity to the river. In addition, the subdominant evergreen tree species Balanites aegyptiaca, which appears similar to Prosopis, is also found along these riparian zones and in the lowlands. Furthermore, we found two sites on the western slopes where Prosopis was introduced in a plant breeding effort.

Map Comparison
Comparing the two detailed (12-class) land cover classifications (Figure 6), we achieved the best mapping results for pure Prosopis and Vachellia with the Pléiades data, and this sensor has an improved capacity to detect smaller features such as built-up areas. Sentinel-2 provided the best results for the Mixed classes. The overall accuracies (Pléiades: 0.88-0.95; Sentinel-2: 0.79-0.96) are comparable to other studies [26][27][28], and the class-specific results match or exceed the accuracies (Landsat 8: OOB: 0.71) reported by Ng et al. [28]. The two maps differ markedly in terms of land cover. Visual interpretation of both products suggests that the Pléiades map is more accurate. Several factors contribute to these findings.

Impact of Spatial Resolution
Some differences between the two maps can be attributed to the influence of spatial resolution on the object-based results, as described by Immitzer et al. [40,97] and Mirik and Ansley [98]. The difference in resolution, two meters versus ten meters, resulted in some LSMS polygons consisting of a mix of classes. This affected the Sentinel-2 dataset to a greater extent, resulting in segments containing a mix of soils and sparse vegetation. Because RF assigns a single class to each segment, such a polygon is homogenized and the other land cover type(s) present within it are dismissed.
For Baringo, this had a negative impact on the soil classes and the sparsely vegetated Prosopis and Vachellia classes (Pr and Va < 50%), which were classified as either vegetated or bare soils. This outcome originated from the LSMS segmentation, as the parameter selection favors crown detection in dense and mixed stands, while soils, containing a wide range of (spectral) information (sands, rocks, wood and shrubs), are segmented into large polygons. This effect increased with decreasing spatial resolution, negatively influencing the Sentinel-2 classification results. In the ten-meter resolution dataset, the shrubs and most of the single trees are too small to be detected and fall below the minimum object size of the LSMS segmentation, set at four pixels (400 m²) (Figure 4). In contrast, the two-meter spatial resolution of the Pléiades data permits the segmentation to detect and delineate single tree crowns and makes it possible to distinguish small vegetated patches from soils. The mean polygon sizes (Table 7) illustrate this, showing that the Pléiades segmentation consists of smaller polygons. Very large polygons are also difficult to use as reference, as they most likely contain many classes [99], and such large objects strongly contribute to the variance of the object size distribution [100].
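The effect of the four-pixel minimum object size can be made concrete with simple arithmetic. The sketch assumes, for illustration only, that the same four-pixel minimum applies to Pléiades (the text states it only for Sentinel-2) and uses a hypothetical tree crown with a 6 m diameter.

```python
"""Minimum object area per sensor versus a single tree crown (illustrative arithmetic)."""
import math


def min_object_area_m2(min_pixels: int, pixel_size_m: float) -> float:
    """Smallest area a segment can cover, given a minimum pixel count."""
    return min_pixels * pixel_size_m ** 2


def crown_area_m2(crown_diameter_m: float) -> float:
    """Area of a circular crown."""
    return math.pi * (crown_diameter_m / 2) ** 2


crown = crown_area_m2(6.0)  # hypothetical 6 m crown, about 28 m2
for sensor, pixel in (("Sentinel-2", 10.0), ("Pléiades", 2.0)):
    limit = min_object_area_m2(4, pixel)  # four-pixel minimum assumed for both sensors
    print(sensor, limit, "crown resolvable:", crown >= limit)
```

At ten meters the minimum object (400 m²) is more than ten times larger than such a crown, whereas at two meters (16 m², under the stated assumption) a single crown can form its own segment.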

Impact of Spectral Resolution
The spectral resolution had a noticeable influence on the classification results. For the Sentinel-2 data we could use ten spectral bands, while for Pléiades only four bands were at our disposal. This resulted in feature sets of very different sizes when generating the indices and texture parameters. The higher spectral resolution of Sentinel-2 partly compensated for the pixel size effects and sometimes yielded better accuracies for some classes, as described by Immitzer et al. [40] and Duro et al. [101]. This was particularly the case in the area proportionate assessment for the U_i of Mix, Vachellia and Other and the P_i of Prosopis, Vachellia and Other.

Impact of Acquisition Dates
The Pléiades data were acquired one year before the fieldwork and the Sentinel-2 data. This difference has the potential to affect the results [102], as some changes might have occurred in classes such as agriculture and bare soils due to crop rotation and yearly precipitation fluctuations:
• The Pléiades results show, for example, a higher proportion of the class Other, originating from the soil classes, than the Sentinel-2 results. There could be a relation between the larger exposure of bare soils and the utilization of both Prosopis and Vachellia for charcoal. In the years 2013-2015, preliminary testing of the Cummins biogas power plant was initiated. The plant was established in Marigat to generate electricity from Prosopis; however, activities were discontinued the following year [103,104]. Hence, the increase in the sparsely vegetated classes in the Sentinel-2 data (2016) could reflect regrowth in the cleared areas.
• Some of the Prosopis-invaded areas visible in the 2015 (Pléiades) data were observed as cleared during the fieldwork. At the same time, some previously non-invaded areas (mainly bare soils) had a Prosopis presence when visited in the field. During the fieldwork, we were able to determine from the height of the Prosopis seedlings that they were half a year old and thus detectable in only one of the datasets (Sentinel-2).
• The sparsely vegetated Vachellia class and the soil classes are often confused, because Vachellia exists as shrubs in some areas due to intense grazing. These misclassifications can be explained by the spectral similarity of Vachellia and the soil classes during the dry season in which the datasets were acquired. As a deciduous tree species, Vachellia sheds its leaves during the dry period and thus resembles soil.
• The differences between the class Other and the vegetated classes (or between soils and sparse vegetation) in 2016 may additionally be attributed to the heavy rainfall coinciding with the 2016 El Niño-Southern Oscillation (ENSO), which is known to facilitate fast proliferation of Prosopis and Vachellia [105,106].
• Observed differences in the Water class are mainly caused by the slowly retreating water level, a consequence of the sudden increase of the water level of Lake Baringo in 2010 [107].
A possible solution for these issues is to either use contemporaneous data or use multi-annual time series [27,108,109].

Validation and Reference Data
The area proportionate validation, based on the independent reference dataset, indicated very high overall accuracies. In contrast to the other accuracy assessment schemes, the results showed higher accuracies for the Sentinel-2 maps. We determined that the low U_i values of the Pléiades classification are due to the misclassification of large polygons. The effect occurred predominantly in the Prosopis and Mix classes. For example, in the Prosopis class 49 polygons (86%) were correctly classified and 8 polygons (14%) were misclassified. After normalizing the results for map proportion and area, the eight misclassified polygons accounted for 45% and the 49 correctly classified polygons for 55%.
We understand that a more rigorous random sampling scheme would be needed to assess the map accuracy truly independently. Due to logistical and time constraints, we utilized a set of established sites for validation. The limited number of validation points, providing a confidence level of 85%, was geo-tagged by GPS in the center of predefined field plots. Some of the reference samples were described in the inventory as having low vegetation cover but appeared unvegetated in the satellite data. We observed that some areas had recently been cleared, as discussed above. We know from the literature that Prosopis seedlings may grow up to one meter and become a shrub within a single year [110,111]. Consequently, the situation on the ground during field data collection could differ from the status captured by the satellites at the acquisition date; in such cases we omitted the point or substituted it with an adjacent point through orthophoto interpretation.
Another difficulty was the possibility of multiple classes occurring at a given reference sample, as one sample does not necessarily align with a single segment created through LSMS. For example, within the 15 m by 15 m Prosopis field plot, and keeping in mind the (in)accuracy of the GPS device, the classes Prosopis, Mixed and Other could all occur, leading to an inconclusive differentiation in the high spatial resolution Pléiades classification. Due to the lower spatial resolution of Sentinel-2, one pixel would cover a large part of the plot. To mitigate this issue, we either excluded the point or moved it to a nearby area fitting its description. Where necessary, this step was performed based on pan-sharpened Pléiades data (0.5 m spatial resolution).
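Why a few large polygons can dominate the area proportionate U_i follows from Equation (2): U_i = p_ii / p_i· = n_ii / n_i·, so the map weight W_i cancels and only the area-weighted counts within the row matter. The sketch reuses the 49/8 polygon split from the text, but the polygon areas (0.1 ha correct, 0.5 ha misclassified) are hypothetical, chosen only to show the mechanism.

```python
"""Count-based versus area-weighted user's accuracy for one map class (sketch)."""


def user_accuracy(polygons: list[tuple[bool, float]], by_area: bool) -> float:
    """Share of a map class that is correct, by polygon count or by polygon area."""
    weights = [area if by_area else 1.0 for _, area in polygons]
    correct = sum(w for (is_correct, _), w in zip(polygons, weights) if is_correct)
    return correct / sum(weights)


# Hypothetical areas: 49 correct polygons of 0.1 ha, 8 misclassified polygons of 0.5 ha.
polygons = [(True, 0.1)] * 49 + [(False, 0.5)] * 8
print(round(user_accuracy(polygons, by_area=False), 2))  # 0.86
print(round(user_accuracy(polygons, by_area=True), 2))   # 0.55
```

With misclassified polygons five times larger than the correct ones, 14% of the polygons already account for about 45% of the class area.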

Random Forest
The robust random forest (RF) classifier provided fast and accurate results, as established by many studies [28,40,61,78,96]. The processing chain can easily be repeated and applied to other studies with almost no fine-tuning. Feature selection improved our results (not shown) and made it possible to add a large number of features, which were screened for their respective discriminative power. The MDA feature importance (Figure 9) indicated that, besides the traditional spectral bands, indices and texture added significant value to our results. For example, the 15 best features of the Pléiades dataset comprised six spectral band features, six wavelet features and three indices. We attribute the lower number of indices to the limited number of spectral bands of the sensor. The Sentinel-2 feature importance was more influenced by the spectral data, selecting eight spectral bands, four indices and three wavelets. Interestingly, the two most important features were generated from the GR index. The importance and advantages of indices are well described in the literature [59,61,63,78], while the significance of texture matches the observations of Toscani et al. [76] and Koger et al. [112].

Conclusions
In this study, we aimed at producing a highly accurate vegetation type map that differentiates Prosopis from native vegetation types and mixed vegetation classes in a region in Baringo County that is affected by Prosopis invasion. We did this by testing and comparing classification results of two different satellite sensors, Sentinel-2 and Pléiades. The following conclusions can be drawn.
For the Pléiades classification, features based on the blue and near-infrared bands contributed most to the high accuracy, besides features based on vegetation indices. In particular, the level one coiflet of the blue band is considered important. The significance of the blue band has been reported by a number of studies [102,113,114]. For the classification based on Sentinel-2, the features based on the blue and near-infrared bands were again important, while the green band also contributed significantly. In addition, the Green Ratio and, to a lesser extent, the NDVI were important. Overall, recurrent features in both datasets are the blue band, the near-infrared band and the level 1 coiflet of the blue band. This indicates that the blue band deserves more attention in such vegetation-related studies, particularly now that atmospheric correction is well handled.
On the other hand, the higher spectral resolution of Sentinel-2 has a clearly positive influence on the accuracies, yielding equal or better OOB results for the vegetation classes, as the additional spectral information improves the detection and differentiation of vegetation cover and types.
When comparing the classification results and the achieved class accuracies of both satellite datasets, we observe that the higher spatial resolution of Pléiades contributes substantially to the level of detail and thus to the high accuracy of the derived segmentation and classification results. In particular, higher OOB and sample count accuracies were obtained for classes that are sensitive to spatial resolution (such as bare soils, sparsely vegetated areas and built-up areas).
Some larger differences in Vachellia cover can be attributed to the segmentation process and the differences in spatial resolution. The dissimilarities are not between the Prosopis and Vachellia classes but between soils and sparse Vachellia vegetation in the Sentinel-2 data. We therefore conclude that Pléiades data provide the highest class-specific accuracy, while Sentinel-2 presents the highest map accuracy (area proportionate assessment). This high map accuracy coincides with the larger polygons found in the Sentinel-2 segmentation and may reflect a bias of the assessment towards large polygons. More research is warranted to investigate the influence of large polygons on classification results.
We also demonstrate that using free-of-charge Sentinel-2 data and open-source software is a viable alternative, achieving promising results (overall accuracy: 0.79-0.96). Further improvements to the parameterization of the Large Scale Mean Shift algorithm could improve the segmentation and classification results. The authors suggest assessing the feasibility of multi-temporal data, which could reveal change dynamics and phenological information, mitigating the misclassification between bare soils and sparse Vachellia caused by leaf senescence. Sentinel-2 seems perfectly suited for such studies, in particular once the twin satellite (Sentinel-2B) is launched and data are collected at five-day intervals. There is also some potential in incorporating hyperspectral data to better delineate the different tree species.
More research is warranted regarding the invasion pattern of Prosopis since its introduction in the 1980s. A suitable research design is necessary to combine different data sources (e.g., Landsat 5 to 8 with Sentinel-2) and to avoid pitfalls when doing post-classification change detection.