Mapping and Monitoring of the Invasive Species Dichrostachys cinerea (Marab ú ) in Central Cuba Using Landsat Imagery and Machine Learning (1994–2022)

: Invasive plants are a serious problem in island ecosystems and are the main cause of the extinction of endemic species. Cuba is located within one of the hotspots of global biodiversity, which, coupled with high endemism and the impacts caused by various disturbances, makes it a region particularly sensitive to potential damage by invasive plants like Dichrostachys cinerea (L.) Wight & Arn. (marab ú ). However, there is a lack of timely information for monitoring this species, as well as about the land use and land cover (LULC) classes most significantly impacted by this invasion in the last few decades and their spatial distribution. The main objective of this study, carried out in Central Cuba, was to detect and monitor the spread of marab ú over a 28-year period. The land covers for the years 1994 and 2022 were classified using Landsat 5 TM and 8 OLI images with three different classification algorithms: maximum likelihood (ML), support vector machine (SVM), and random forest (RF). The results obtained showed that RF outperformed the other classifiers, achieving AUC values of 0.92 for 1994 and 0.97 for 2022. It was confirmed that the area covered by marab ú increased by 29,555 ha, from 61,977.59 ha in 1994 to 91,533.47 ha in 2022 (by around 48%), affecting key land covers like woodlands, mangroves, and rainfed croplands. These changes in the area covered by marab ú were associated, principally, with changes in land uses and tenure and not with other factors, such as rainfall or relief in the province. The use of other free multispectral imagery, such as Sentinel 2 data, with higher temporal and spatial resolution, could further refine the model’s accuracy.

Invasive exotic plants are non-native species that establish and disperse in areas outside their region of origin, generating a negative impact on their ecosystem, economy, and social well-being [1].According to [2], these introduced species have been named in various ways: non-indigenous, alien, non-native, foreign, exotic, transplanted, and nonnative species.These species are considered to be the second greatest threat to biodiversity and the cause of the extinction of numerous species around the world [3,4].
Human migration is considered one of the main causes of the introduction of species outside their regions of origin [5].Globalization has accelerated the dispersal of species thanks to animal trade and exports of agricultural products [6].Most of these species have been scattered to many parts of the world, including small islands (such as Cuba).The greatest manifestation of endemism is shown on islands, which, being surrounded by oceans, act as a barrier to the dispersal of continental plants [7].According to [8], invasions on islands are linked to the threat suffered by endemic species; however, a notable information gap has been observed on most islands and archipelagos.The Bahamas, the Greater Antilles, and the Lesser Antilles, which make up the largest group of Caribbean islands, represent the most important island system in the New World and are considered to be a global priority for conservation [9].
Cuba is located within one of the hotspots of global biodiversity, which, coupled with high endemism and the impacts caused by various disturbances, makes it a region particularly sensitive to potential damage by invasive plants [10].Marabú is a species native to Africa and Asia which, in the mid-19th century, was introduced to Cuba [10].It is a shrub about five meters high, with a solid trunk, made of very hard wood.This is an invasive plant that spreads and grows rapidly, even in irregular or unfertile terrain, and therefore ends up occupying large areas [11].
Presently, Cuba is greatly affected by the invasion of three species belonging to the Fabaceae family: Dichrostachys cinerea (L.) Wight & Arn., Mimosa pigra (L.), and Vachellia farnesiana (L.) [12].The first, colloquially known as marabú, constitutes a unique example of the devastating consequences caused by invasive species [13][14][15].According to [16], marabú has posed a threat to agricultural production in Cuba since 1911.The marabú scrubland is currently one of the greatest concerns in the country since in 1996 it already occupied approximately 1.5 million hectares of land in Cuba [10], including 18% of the agricultural areas and 56% of the livestock areas.The same author pointed out that there was a significant increase in the infected areas of marabú in Cuba, which grew from around 268,000 ha in 1946 to 402,000 ha in 1958.However, despite the fact that for 30 years ) different methods were used to eradicate this plant, the area covered by marabú in Cuba remained between 528,000 and 660,000 hectares.[10].Starting in 1990, the economic crisis on the island worsened and therefore the means and techniques used for its control, which were generally expensive, were no longer implemented, so the speed of expansion of marabú multiplied throughout the country [10].Some authors [17][18][19] have reported that the development and expansion of marabú in Cuba are related to some environmental factors (precipitation distribution, relief, soil type, etc.) and human factors (deforestation, changes in land use, etc.), but there were no clear conclusions.Furthermore, estimates of the area occupied by marabú have been very inaccurate, mainly using direct observational methods in representative plots [17].
Early identification and cartographic representation of the invasive species are of paramount importance in the formulation of efficient management strategies and the mitigation of further expansion into non-invaded areas [19].In addition, using remote sensing enables the quantification of invasion spread rates and patterns and facilitates the evaluation of the effectiveness of various management approaches [20,21].Some studies of marabú have been conducted by various researchers using remote sensing in parts of Cuba, such as in the Sancti Spíritus province [18,22], in the Havana province [17], and the Camagüey and Granma provinces [23].In the Ciego de Ávila province, [24] a remote sensing methodology to study areas covered by marabú was proposed; however, the authors of that study did not show any results related to the spatial distribution of the species nor temporal changes.Thus, there is a lack of timely information for monitoring marabú, as well as about the LULC classes most significantly impacted by this invasion in the last few decades and their spatial distribution.

Remote Sensing for Invasive Species Monitoring
The objective of remote sensing and image photointerpretation is to identify and evaluate those elements found on the Earth's surface [25].The mapping of invasive species using remote sensing techniques was not common until the 1990s.Moderate spatial resolution images (those with pixel size greater than 10 m) have been the most commonly used [3], although it should be noted that these are only effective for large areas where these plants exist, and when large-scale management is desired [26,27], as is the case in this case study.Refs.[5,[28][29][30] suggest that the key to the success of this type of approach is to take into account the unique characteristics of the plant (e.g., flowering and fruiting seasons), using a single image or a multitemporal approach.In general, multispectral free imagery used for mapping and monitoring of alien invasive plant species can be employed for applications on local to regional scales and provide accuracies ranging from very low to moderate [31][32][33].
At certain times, multispectral products have shown difficulties in detecting invasive species that exhibit similar characteristics to their environment, so the field of invasive species detection is shifting towards hyperspectral products or the use of hyperspectral products in combination with multispectral products [8].In particular, data from both multispectral and hyperspectral sources from satellites and unmanned aerial vehicles (UAVs) coupled with machine learning algorithms have been used to discriminate invasive species from other species.Areas covered by the invasive species Hakea saricea were mapped using high spatial resolution imagery from multispectral UAVs and WorldView-2, and accuracies were achieved which allowed its eradication at a local scale to be monitored [3].Meanwhile, Ref. [29] successfully mapped and identified the aggressive invasive species Acacia salicina and Acacia saligna using WorldView-2 imagery as well as the random forest algorithm.The authors of [30] analyzed the possibility of detecting and monitoring the spread of Asclepias syriaca in Hungary with hyperspectral images from UAVs.While hyperspectral data yield more accurate results (generally above 80% accuracy), they are mainly limited to airborne products, such as HyMap and the airborne visible/infrared imaging spectrometer (AVIRIS) [28].However, these aerial products have a major disadvantage in large-scale mapping since the acquisition cost is high, making the economic advantage of remote sensing less obvious.
To overcome these limitations, researchers have been applying traditional machine learning techniques to land use land cover (LULC) mapping using remote sensing, such as spectral angle mapper (SAM), fuzzy adaptive resonance theory supervised predictive mapping (Fuzzy ARTMAP), or other more advanced ones, which in recent years have gained wide acceptance, such as artificial neural networks (ANNs), support vector machine (SVM), or random forest (RF) [34,35].The three latter techniques generally provide better accuracy [36] than other traditional classification techniques, such as distance measurement [37], clustering [38], or logistic regression [39].These advanced models often exhibit significantly higher processing speeds compared to the original physically based models, and, additionally, when given precise and representative training datasets, these models can surpass the accuracy of conventional efficient parameterizations [40].In comparison with deep learning techniques, machine learning algorithms also achieve high accuracy with limited samples [41].Furthermore, machine learning techniques possess the ability to incorporate variables excluded from, or unsuited for, physically based models, including nonlinear processes [40].
The development of robust and advanced non-parametric image classification algorithms represents a significant advancement in the field of mapping invasive species [31].As satellite sensor technology continues to evolve, it is essential to explore the utilization of these advanced classifiers in conjunction with data from the latest generation of multispectral sensors, which offer improved spatial and spectral resolutions [42].This approach is crucial for overcoming the challenges associated with invasive species classification using remote sensing.One of the many difficulties is the spectral similarity between invasive and native species, making it hard to differentiate them accurately [42].Additionally, the heterogeneous nature of the landscape and the varying growth stages of different species further complicate the classification process [31].Furthermore, limited spatial resolution and spectral range of remote sensing data can hinder the identification of invasive species, especially in complex environments [43].Another challenge is the need for extensive ground truth data for training and validating classification models, which can be resourceintensive and time-consuming [44].Moreover, the dynamic nature of invasive species and their interactions with the environment requires continuous monitoring, which may be limited by the revisit time of remote sensing platforms [43].

Objective and Aims of This Work
The works described above regarding the mapping of invasive species using remote sensing indicate that, until now, there is no precise method for mapping most of these species [3,[45][46][47].Moreover, the increasing reliance on high spatial and hyperspectral datasets for alien invasive species detection and mapping poses challenges due to the prohibitive acquisition costs, particularly for repeated and large-scale estimation and monitoring in resource-limited regions like Cuba.To optimize the detection and mapping of invasive species in these regions, it is necessary to explore the capabilities of freely available improved spatial and spectral resolution multispectral datasets, such as Landsat 8-9, in conjunction with robust and advanced machine learning algorithms [31].Hence, three machine learning classifiers were used in this study in combination with Landsat imagery to map the invasive species.The three classifiers used were maximum likelihood (ML), support vector machine (SVM), and random forest (RF); all of them are supervised classifiers, the first being parametric and the latter two non-parametric.The ML algorithm explains each of the categories using a Gaussian function, assuming that the data follow a normal distribution; this makes it a very complex algorithm [48], but it has been widely used in the past and therefore is considered a benchmark [46].The non-parametric SVM classifier was introduced by Vapnik [49] as a machine learning model, which is based on kernel functions to perform regression and classification tasks [50] and has been also used in previous works in the field, along with RF [45,51].RF is a non-parametric machine learning algorithm that constitutes an ensemble of decision trees grown with a randomization process, which makes it robust against overfitting and less sensitive to noise in the data and outliers [52].
Therefore, in this study, our main objective was to find an accurate method based on remotely sensed imagery to map and monitor the marabú (Dichrostachys cinerea) invasion within the Ciego de Ávila province (Central Cuba) between 1994 and 2022.Also, another aim was to determine the LULC classes most significantly impacted by this invasion and analyze the spatial distribution and dynamics of marabú in that period of time.To achieve these goals, we used Landsat data from two different dates in combination with other auxiliary data and tested three machine learning classifiers.

Study Area
The Ciego de Ávila province is located in the central part of the island of Cuba (WGS84 UTM 2417694 731183 17Q), bordered to the west by the province of Sancti Spíritus, to the north by the Canal Viejo de the Bahamas, to the east by the province of Camagüey, and to the south by the Gulf of Ana María (Figure 1).It is the seventh largest province by area (6946.9km²), representing 6.3% of the total surface area of Cuba [53].Its political-administrative division consists of ten municipalities: Chambas, Morón, Bolivia, Ciro Redondo, Florencia, Majagua, Primero de Enero, Ciego de Ávila, Venezuela, and Baraguá [54].
The province has a predominantly flat relief, and it is one of the provinces where agroindustry and livestock are the main pillars of the economy, representing 50% of the economy of this region.The production of sugar and its derivatives is the main economic industry in the province [53].The scarcity of studies on this province related to the impact of marabú on these sectors of the economy, as well as those aimed at its spatiotemporal expansion since the economic crisis of the 1990s, led us to carry out a study of this type of invasive species using remote sensing techniques.The province has a predominantly flat relief, and it is one of the provinces where agroindustry and livestock are the main pillars of the economy, representing 50% of the economy of this region.The production of sugar and its derivatives is the main economic industry in the province [53].The scarcity of studies on this province related to the impact of marabú on these sectors of the economy, as well as those aimed at its spatiotemporal expansion since the economic crisis of the 1990s, led us to carry out a study of this type of invasive species using remote sensing techniques.
This research work was conducted following the approach summarized in the flowchart shown in Figure 2. The main elements of the flowchart are described in the sections below, as follows: imagery (input) data (Section 2.2), reference data for calibration and validation of the classification (Section 2.3), classification process (separability, algorithm description) (Section 2.4), independent validation of the classifications and classifier choice (Sections 2.5 and 2.6), and land use and land cover change analysis using the LULC maps obtained in the previous sections (Section 2.7).This research work was conducted following the approach summarized in the flowchart shown in Figure 2. The main elements of the flowchart are described in the sections below, as follows: imagery (input) data (Section 2.2), reference data for calibration and validation of the classification (Section 2.3), classification process (separability, algorithm description) (Section 2.4), independent validation of the classifications and classifier choice (Sections 2.5 and 2.6), and land use and land cover change analysis using the LULC maps obtained in the previous sections (Section 2.7).
Figure 2. Flowchart of the approach followed in this study.

Satellite Data
The LULC changes between 1994 and 2022 were analyzed, so one scene from 29 January 1994 from the Landsat 5 TM sensor and one scene from 26 January 2022 from Landsat 8 OLI were selected, both with path 013 and row 045.The images were obtained for January since this corresponds to the phenological flowering period of the species (Figure 3), when it is more likely to be differentiated from other land covers [18] and therefore the

Satellite Data
The LULC changes between 1994 and 2022 were analyzed, so one scene from 29 January 1994 from the Landsat 5 TM sensor and one scene from 26 January 2022 from Landsat 8 OLI were selected, both with path 013 and row 045.The images were obtained for January since this corresponds to the phenological flowering period of the species (Figure 3), when it is more likely to be differentiated from other land covers [18] and therefore the spectral signature of marabú would be easier to identify.The images chosen were the ones with the least cloud cover in the flowering period for those years.There were no clouds or cloud shadows in the study area in any of the images, and no quality issues were reported in the imagery metadata.Both the Landsat 5 TM and Landsat 8 OLI images are at-surface reflectance products (Collection 2 Level 2), geometrically corrected to WGS 84/UTM zone 17N, so it was not necessary to make any radiometric or geometric corrections.The bands used in both cases were blue (B), green (G), red (R), near-infrared (NIR), and short-wave infrared (SWIR1 and SWIR2), which in Landsat 5 TM are 1 to 5 and 7, and in Landsat 8 OLI, from 2 to 7, with a spatial resolution of 30 m and radiometric resolution of 16 bits.The images were downloaded from the United States Geological Survey (USGS) website (https://earthexplorer.usgs.gov/,accessed on 1 January 2020) in TIF format.
For each image, the visible and infrared bands were grouped together so that we could work with a single composition of bands.To avoid working with the entire image, a square cutout that included the study area was made.

Field Reference Data
In this study, one of the aims was to identify the 10 land/water cover classes shown in Table 1, with a special focus on the marabú class.These classes were chosen in order to determine if the marabú had spread over areas of environmental importance (wetlands, grasslands) and/or economic relevance (i.e., crops).Both the Landsat 5 TM and Landsat 8 OLI images are at-surface reflectance products (Collection 2 Level 2), geometrically corrected to WGS 84/UTM zone 17N, so it was not necessary to make any radiometric or geometric corrections.The bands used in both cases were blue (B), green (G), red (R), near-infrared (NIR), and short-wave infrared (SWIR1 and SWIR2), which in Landsat 5 TM are 1 to 5 and 7, and in Landsat 8 OLI, from 2 to 7, with a spatial resolution of 30 m and radiometric resolution of 16 bits.The images were downloaded from the United States Geological Survey (USGS) website (https://earthexplorer.usgs.gov/,accessed on 1 January 2020) in TIF format.
For each image, the visible and infrared bands were grouped together so that we could work with a single composition of bands.To avoid working with the entire image, a square cutout that included the study area was made.

Field Reference Data
In this study, one of the aims was to identify the 10 land/water cover classes shown in Table 1, with a special focus on the marabú class.These classes were chosen in order to determine if the marabú had spread over areas of environmental importance (wetlands, grasslands) and/or economic relevance (i.e., crops).Field reference data (LULC) were obtained corresponding to the 1994 and 2022 reference years (Figure 2).For the 1994 data, the land cover information involved the visual interpretation of Landsat satellite data coupled with on-site verification conducted by local long-term residents who possessed intimate knowledge of the study area and its historical LULC.To distinguish the marabú from the tree species present in the study area, the NIR SWIR1 RED bands were mainly used; given that, with this combination, the marabú could be differentiated from the native species because the marabú showed up on the image as intense red and the native species in the area as light red.The reference points corresponding to the 2022 image were collected across the study area using VHR satellite imagery available via Google Satellite images (https://www.google.com/maps/@22.0518467,-78.3543797,228031m/data=!3m1!1e3?entry=ttu, accessed on 1 January 2023) and verified in the field.The selection of reference data was guided by the unique characteristics of species occurrence, including elevation-induced vegetation distinctions, contiguity, homogeneity, and proximity to settlements.After the visual interpretation, small polygons were digitized for each LULC class to be used as training data in the classification process (Table 1).The approach followed to obtain the reference data for the independent validation of the classification is explained in Section 2.5.

Classification of Satellite Imagery
As a first step in the classification process (Figure 2), the spectral separability among classes was calculated to observe whether the classes defined above presented significantly different spectral signatures in the selected feature space.Separability analysis calculates the statistical distance between spectral classes [55].In this study, the Jeffries-Matusita distance was used to evaluate the separability, giving a value of less than 1.5 when the classes are spectrally similar and a maximum value of 2 when they are very different [55].In case of low values, the training samples should be reviewed and/or the definition of the classes, since the spectral classes would not match the classes defined in the legend.
For the classification of each image, we used three algorithms: maximum likelihood (ML), support vector machine (SVM), and random forest (RF).ML and SVM were run on ENVI 5.3.1 software, while RF was applied using the function provided in the randomForest package in the R statistical software version 4.2.3 [56].
ML works on the principle of calculating the probability distribution of each pixel.If the probability of a pixel belonging to class i is greater than the probability of it belonging to class j, it is classified as belonging to class i [55].
The non-parametric SVM algorithm identifies a single boundary between two classes, assuming that the multidimensional data are linearly separable in the input space.Specifically, it determines an optimal hyperplane to separate the data set into a discrete number of predefined classes, using the training data.To maximize separation, the algorithm uses a portion of the training sample that is closest in the feature space to the optimal decision boundary, acting as support vectors [57].The following settings were used in this study: a kernel of the radial basis function (RBF) type and a Gamma value of 0.165; all other parameters were kept at their default values.
RF is a non-parametric machine learning algorithm employing an ensemble of randomly grown trees, where individual tree predictions are subsequently aggregated [31].
During training, each tree is exposed to a different subset of the data, and the remaining data are used for testing.In this case, 70% of the sample described in Section 2.3 was dedicated to the development of the random forest model, and the remaining 30% was allocated for the prediction estimation, yielding the 'out-of-bag' (OOB) error estimate [58].
The importance of each one of the input variables in the model was also calculated.Each RF model consisted of 500 trees.The rest of the parameters of the randomForest were kept as default.

Validation
In order to compare the results of the classifications using the different algorithms, an independent validation was performed using the same data for the three classifiers (Figure 2).The sample size was calculated using Equation (1), which considers the binomial probability theory [59].An expected accuracy of 85% was established for all classes, as well as an allowable error of 10%.So a sample size of 51 points was obtained for each class, to which 4 more were added for a total of 55.Therefore, 550 points in total were used to calculate the accuracy of each classification.
where p: expected accuracy of the validation sample for that class (p) (p = 0.85); q = (100-p); E = allowable error in the classification of that class (E = 0.10); Z = 2, approximation of the normal standard deviation of 1.96 for the 95% confidence interval (two-tailed).
The method to locate the validation points was a stratified sampling [60,61].With this procedure, the points were generated randomly on the classified images, and the actual LULC class was assigned using the same methods as to assign the LULC classes for the training samples (Section 2.3).It was verified that none of the points of the validation sample overlapped the areas used for training.The spatial distribution of the validation points is shown in Figure 4.
Finally, to compare the results of each classification, a confusion matrix was obtained for each year and classification, and the overall accuracy, as well as the user and producer accuracies [59,62].For each statistic, the 95% confidence interval was calculated using the adjusted Wald method [63], which is the most widely used when the sample size is smaller than 100 for each class [63].The F-score (see Equation in [64]) for the target class (marabú) was also calculated, as a harmonic mean of the user and producer accuracy.In addition, the ROC curve (receiver operating characteristic curve) for marabú was calculated, to show visually the performance of each classification model for that class [65], as well as the AUC (area under the ROC curve) [66].The latter quantifies the overall performance of the classifier.A higher AUC value indicates better classification accuracy, with a value of 1 indicating a perfect classification.According to [66] AUC should be used instead of overall accuracy for the evaluation of machine learning algorithms (i.e., ML, RF).
The method to locate the validation points was a stratified sampling [60,61].With this procedure, the points were generated randomly on the classified images, and the actual LULC class was assigned using the same methods as to assign the LULC classes for the training samples (Section 2.3).It was verified that none of the points of the validation sample overlapped the areas used for training.The spatial distribution of the validation points is shown in Figure 4. Finally, to compare the results of each classification, a confusion matrix was obtained for each year and classification, and the overall accuracy, as well as the user and producer To further evaluate the results estimated from the classifiers and compare the overall performance of the classifiers, the McNemar nonparametric statistical test [67,68] was two-tailed and computed at a 95% confidence level.This test is based on the calculation of the χ 2 distribution and is commonly used to compare the classification errors between two classifiers, and test values > 3.84 show a statistical difference at a 95% confidence level [69].

Classifier Choice
The criteria to choose the most suitable algorithm for the mapping of marabú using Landsat imagery were as follows (in this order of priority): (i) the highest AUC for the marabú, (ii) the highest producer accuracy for the marabú class (lowest omission error), and (iii) the highest user accuracy for the marabú class (lowest commission error).The classifications will be used to locate the invasive species with the aim of controlling it; therefore, it is more important to minimize the omission error than the commission error.
If the results of two or more algorithms were not significantly different considering the first criterion, the second was tested, and, if needed, the following ones.

Land Use and Land Cover Change Analysis
Once the most accurate algorithm for each image was chosen, the calculation of the area occupied by each land cover was carried out in QGIS for each class of the vectorized and cropped file with the classification information for each year, 1994 and 2022 (Figure 2).Firstly, we determined the marabú spatial coverage during the period of time of the study.Then, we assessed the LULC change by constructing a cross-tabulation matrix for the time interval 1994-2022.This analysis involved the computation of gains, losses, net changes, and rates of change (Figure 2).Visual representations of LULC gains and losses were generated through the utilization of tables.In order to determine the influence of the marabú invasion on each LULC class, we subtracted the contributions of each class to marabú (losses to Marabú) from their respective gains from marabú (gains from Marabú).
To gain insights into the pattern of the invasion, we carried out an analysis of change statistics and conducted a visual assessment of the LULC maps.Our interpretation of the change output and distribution patterns was further enhanced by a comprehensive understanding of seasonal socioeconomic activities, including irrigation farming, charcoal production, and flood events.This knowledge was acquired through on-site field observations, facilitated focus group discussions, and interactions with the local community.

Spectral Characterization of D. cinerea
The spectral separability results obtained using the Jeffries-Matusita distance were equal to or more than 1.96 for the 1994 image and 1.98 for the 2022 image for all of the classes compared to D. cinerea.For the 1994 data, the lowest separability value was obtained with rainfed crops (1.96).For the 2022 data, the lowest separability obtained was also with rainfed crops (1.98), followed by woodlands (1.99).These values showed the suitability of the selected feature space (VIS, NIR, SWIR) for differentiating marabú from the other LULC classes.The spectral signature of marabú is clearly distinct from the spectral signatures of rainfed crops and woodlands, as shown in Figure 5, for the Landsat images from January 1994 and 2022.In both cases, the NIR band showed the largest differences in reflectance.

Dichrostachys Cinerea Detection with Landsat 5 TM and Landsat 8 OLI images
The combination of Landsat imagery obtained during the flowering season and RF algorithm was proven to be very efficient in detecting D. cinerea in the study area, w highly accurate results (AUC > 0.90 for both dates, producer accuracy (PA) > 83%, u accuracy (UA) > 93%) (Table 2).The PA and UA values for each one of the other n LULCs also showed that RF performed better compared to ML and SVM (Tables S1 a S2 in the Supplementary Materials).The confusion matrix for the 1994 RF showed that main sources of confusion were the woodland and grassland classes, while for 2022 th were the woodland and mangrove classes, in that order (Tables S3 and S6 in the Supp mentary Materials).

Dichrostachys cinerea Detection with Landsat 5 TM and Landsat 8 OLI images
The combination of Landsat imagery obtained during the flowering season and the RF algorithm was proven to be very efficient in detecting D. cinerea in the study area, with highly accurate results (AUC > 0.90 for both dates, producer accuracy (PA) > 83%, user accuracy (UA) > 93%) (Table 2).The PA and UA values for each one of the other nine LULCs also showed that RF performed better compared to ML and SVM (Tables S1 and S2 in the Supplementary Materials).The confusion matrix for the 1994 RF showed that the main sources of confusion were the woodland and grassland classes, while for 2022 they were the woodland and mangrove classes, in that order (Tables S3 and S6 in the Supplementary Materials).
The results of McNemar's test (Table 3) confirmed that the differences in the classification performance were statistically significant for all algorithms in pairwise comparison except for ML and SVM classifiers for 1994, where χ 2 0.05= 1.39 (lower than the test value 3.84; values lower than this critical value indicate that there is no statistical difference) [67,69].This confirms that the overall accuracy of RF (Table 2) was significantly higher than the accuracies obtained with SVM and ML, for both years.
Taking into account the results shown above (Tables 2 and 3), the RF model was chosen for mapping and estimating the area covered by Dichrostachys cinerea in 1994 and 2022.

Dichrostachys cinerea Spread from 1994 to 2022
In 1994, the total area covered by D. cinerea was 61,977.59ha, while in 2022 it reached approximately 91,533.47 ha (Table 4).In 1994, the marabú was more widespread to the east (Primero de Enero municipality), northeast (municipality of Bolivia), center (Ciro Redondo municipality), and northwest (Chambas municipality) of the province.The areas that had less coverage of D. cinerea were in the south (municipalities of Venezuela and Baraguá) of the province and in the Morón municipality (in the north of the province) (Table 5, Figure 6a).The results from 2022 showed an increase in the area covered by marabú in all the municipalities except for two (Table 4) and a change in the spatial distribution of the species, being most prevalent in the northeast and south of the province (municipalities of Bolivia, Primero de Enero, and Venezuela) (Figure 5b).The largest densities of the D. cinerea species were found in the municipality of Bolivia, Primero de Enero (northeast of the province), and in the south of Venezuela.It should be noted that during this period of time, D. cinerea was not present in both coastal areas of the province (south and north), principally in the areas occupied by mangrove forests, wetlands, and saltmarshes (Figure 6).

Marabú-Induced Changes in other LULCs
During the 28-year period, three of the ten LULC classes (D.Cinerea, infrastructure, and bare soil) significantly increased their spatial coverage, by 48%, 173%, and 366%, respectively, while the rest decreased or had almost no change (<5%) (Table 6).Overall, the highest losses were found in the woodland class, which was reduced by 30%, followed by rainfed crops by 29%, mangrove by 26%, grassland by 23%, and irrigated crops by 11%.Our results show that woodlands, mangroves, and rainfed crops were mostly lost to marabú during the studied period (Table 7).Within almost three decades, the marabú invasion has resulted in LULC losses of woodlands by 16,790 ha (38%), rainfed cropland by 6671 ha (4%), grasslands by 4186 ha (2,5%), flood-prone areas by 2085 ha (2%), mangroves by 1079 ha (4%), and irrigated crops by 192 ha (1%).The area that maintained a land cover of D. cinerea from 1994 to 2022 comprised 17,445.62ha.The results obtained in mapping and detecting changes in D. cinerea cover in Central Cuba using Landsat 5 TM and 8 OLI satellite images (moderate spatial resolution with a pixel size greater than 10 m) were considered satisfactory, achieving AUC > 0.92 and overall accuracies higher than 90%, which are higher than the ones achieved in other works monitoring invasive species [31,45,70,71].These results are supported as well by [3], the authors of which suggested that imagery with spatial resolutions of more than 10 m can generally be used to detect and map invasive species in large areas, as is the case in this study, where the invasion of D. cinerea was studied on a provincial scale.The results of the mapping of this species using different classification algorithms (ML, RF, and SVM) demonstrated the superior performance of the random forest (RF) algorithm (Table 2).The highest PA values for the Dichrostachys cinerea class (lowest omission errors) were obtained when using the RF algorithm, for both 1994 and 2022 (83.64% and 98.18%, respectively) (Table 2).RF was significantly more accurate than ML and SVM regarding PA (p < 0.05), according to the confidence intervals.Moreover, the highest UA values (lowest commission error) obtained were also with RF, for both years (93.88% for 1994 and 95.09% for 2022) (Table 2).These values were significantly higher (p < 0.05) than the ones achieved using ML and SVM, considering the confidence intervals.As expected, considering the PA and UA values, the AUC values were also higher for the classifications using RF (0.92 for 1994 and 0.97 for 2022).Regarding the performance of the classifiers for all 10 classes, the OA values achieved by RF were also the highest (90.91 and 95.09, respectively, for 1994 and 2022).Taking into account the confidence intervals, these values were significantly higher (p < 0.05) than the ones achieved using ML and SVM.These findings are in line with the broader literature on the use of different classification algorithms for mapping and monitoring invasive plant species.In [31], the authors emphasized the importance of algorithm selection in remote sensing applications for invasive species, which is corroborated by the observed performance of the RF algorithm in this study.The superior performance of RF over SVM has been also reported by [45,51,[71][72][73] for land cover and invasive species mapping, while [70] also found the use of RF instead of ML more effective in mapping invasive plant species, emphasizing the relevance of these methods in remote sensing applications for invasive species management.The authors of [73] conducted a comparison between ML and SVM classifiers for tree cover mapping, also finding higher accuracy when using SVM, as shown in our work.On the one hand, one of the limitations of this work could be the possible difficulty in finding free cloud cover images for the optimal phenological status (flowering period); however, the availability of Landsat 8, Landsat 9 and, if needed, Sentinel 2 imagery simultaneously since February 2022, increases the temporal resolution and the likelihood of accessing cloud-free data [74].On the other hand, the minimum mapping area is limited by the spatial resolution of the Landsat 5 and Landsat 8 imagery used in this study, which limits the detection of very small areas covered by marabú.This issue could be overcome by the use of Sentinel 2 imagery, which has a higher spatial resolution (10 m for VIS and NIR and 20 m for SWIR) since the critical spectral bands identified to detect marabú (NIR) are also present in Sentinel 2 imagery (Figure 5).

4.2.
Dichrostachys cinerea Spread from 1994 to 2022 and Land Cover Changes D. cinerea is considered a species of great concern; it has transformed the land cover of Cuba and has been defined as an invasive species that is not from Cuba [13].Furthermore, it constitutes an obstacle to landscape recovery for ecological restoration and agricultural redevelopment [75].
Our results showed that there was an increase of almost 30,000 hectares covered by marabú between 1994 and 2022 (Table 6).However, except in some specific areas, such as the highest areas (more than 130 m a.s.l.(above sea level)) between the municipalities of Florencia, Chambas, and Ciro Redondo, and around Loma de Cunagua (Figures 1 and 6), the areas covered by marabú changed.For example, in 2022, an increase in the density of the plant was noted towards the northwest (Bolivia and Primero de Enero municipalities) and south (Venezuela municipality), and a decrease in the density of areas covered by marabú towards the center of the province (Ciro Redondo municipality) (Table 5).
These temporal changes in the areas covered by marabú could be related, in the case of the increase, to the definitive closure of sugar mills at the beginning of 2002 (more than 20 years before the 2022 image was captured) in the municipalities of Bolivia and Venezuela and, therefore, to the decrease in areas dedicated to planting sugar cane [11].Furthermore, both municipalities have suffered, in the last decade, a significant decrease in their populations [54].In fact, Bolivia and Venezuela are the municipalities with the lowest population densities [54], in the province (17.2 inhabitants/km 2 and 32.5 inhabitants/km 2 , respectively).Coupled with the fact that they are purely agricultural municipalities, this population decrease in Bolivia has also impacted the availability of a labor force in the agricultural sector, and therefore less of the land area is used for agriculture, which provides an opportunity for the establishment and expansion of marabú [17].Table 5 shows a large decrease in irrigated and rainfed crop areas, which reinforces our previous statements.
In the area studied, only the municipalities of Ciro Redondo and Ciego de Ávila showed a decrease in the areas covered by marabú, which could be due to the fact that since 2019, a bioelectric plant has been in operation (in the Ciro Redondo municipality) that uses marabú biomass as an alternative energy.According to the feasibility studies, the yields of the areas covered by marabú in a 50 km radius around the bioelectric plant were expected to have a yield of 70 tons/ha [11].However, reality has shown that yields do not exceed 30 tons/ha, which could be due to the estimation method, which was based on the compilation of reports from landowners, whether private or state, who contributed the values according to their own assessments [24].For this reason, our study can offer important information for the management of this species as biomass in the production of electrical energy and for the management of marabú as an emerging popular charcoal source for wood-fired ovens and grills in Cuba [75].In recent years, marabú charcoal has become one of the major agricultural exports from Cuba to Europe, and since 2017 it has been the largest agricultural export to the United States in more than 50 years [75].
Although some relate the prevalence of marabú to factors such as altitude, rainfall, and availability of sun, in the province where the study was conducted, these were not the determining factors in the spatial distribution of the species, since the plant populations were maintained in the highest areas of the province above sea level between 1994 and 2022 [10].This could be due to the lack of mechanized and specialized technology for its eradication, in addition to the fact that these areas are not used as much as other areas for agricultural and livestock exploitation.According to [11], marabú eradication is so laborious and expensive that very often the invaded lands are abandoned by agricultural producers.In the case of the spatial distribution of rainfall in the province [76], a direct relationship was not determined, since marabú aggressively invaded both the areas with the lowest rainfall (<1150 mm/ year) in the province (northeast) and also the areas with significantly higher rainfall (>1340 mm/year) (west).
One of the control methods used to curb the expansion of marabú is the flooding of flat lands, since the species is not tolerant to soil inundation [11].For this reason, the spatial distribution of the species, both in 1994 and in 2022, did not extend to the areas where marshes, wetlands, and mangroves were located in the southern and northern edges of the province; for example, in the Morón municipality (located within the Great Northern Wetland of Ciego de Ávila), the area covered by marabú remained almost the same over the time period (Tables 6 and 7).The above also constitutes one more reason to protect these ecosystems, their freshwater sources, and the populations of species such as the mangrove forest.In fact, Table 6 shows a decrease (of around 6600 ha) in the areas covered by mangroves between 1994 and 2022, which must be monitored in order to conserve this important environmental resource.
In summary, the spatial distribution and temporal changes in D. cinerea in the Ciego de Ávila province showed heterogeneous dynamics (with an increase of around 48% of the area occupied by marabú), which was associated more with changes in land use and tenure than to other factors, such as height above sea level and rainfall.
This methodology for mapping marabú can be applied to other provinces in Cuba that have the same problem as Ciego de Ávila, in relation to the expansion of marabú and its effects on the other sectors of the Cuban economy.

Conclusions
This work provides a remote sensing-based tool for Dichrostachys cinerea (marabú) detection and mapping in central Cuba, offering insights into its spatial distribution, temporal changes, and impact on land cover.The findings contribute valuable knowledge for regional management and lay the groundwork for future research addressing complex challenges associated with invasive species.
This study successfully employed Landsat imagery and machine learning classifiers, achieving satisfactory results (AUC > 0.82) in detecting and mapping marabú in central Cuba.The RF algorithm demonstrated superior performance in comparison to ML and SVM.The research emphasized the critical role of algorithm and image selection (flowering time) in remote sensing applications for invasive species mapping.The independent validation of the developed model revealed a strong ability to accurately identify 10 main land covers in the area (overall accuracy >90%).
Regarding the temporal dynamics, the analysis of marabú spread from 1994 to 2022 revealed a notable increase of almost 30,000 hectares (48%) in the area occupied by this species.The analysis of the spatial distribution and dynamics of marabú during that period of time showed fluctuations in density across different areas of the Ciego de Ávila province (the highest increases in the Bolivia and Venezuela municipalities, and a decrease in the center of the province), indicating a dynamic interaction with land use and economic factors.The LULC classes most significantly impacted by this invasion have been woodlands, mangroves, and rainfed crops, with implications for agriculture and ecosystem health.Changes in land use, including the closure of sugar mills and population decline in specific municipalities, were identified as influencing factors.
The developed methodology not only provides valuable insights into marabú dynamics in Ciego de Ávila but also offers a transferable tool for other Cuban provinces facing similar challenges.The methodology and results of this paper can also be used as a base to develop detection and monitoring models for economically constrained areas, by using free multispectral imagery and the RF algorithm.This could be applicable to similar invasive species that have a flowering period different from the native species and are spectrally different from other land covers in that area, and it would contribute to advancing the understanding of the model's robustness and applicability in other areas, potentially improving its performance and expanding its utility for management objectives.
The challenges identified in this study include the need for continuous, accurate monitoring, and the exploration of sustainable solutions for marabú management.The spread of marabú must be addressed and managed in a holistic way, which would involve the eradication of this plant in some areas; also, it could be used both as a source of energy and for the production of charcoal in other areas, which would result in an increase in income for families and municipalities linked to this activity.Future research should focus on refining mapping techniques, considering the evolving socioeconomic landscape, and developing targeted interventions to address the persistent threat posed by marabú.Exploring the integration of additional data sources, such as free higher temporal-and spatial-resolution imagery like Sentinel 2 data, could further refine the model's usability.
Future studies could delve deeper into the socioeconomic drivers influencing marabú dynamics.Understanding the intricate relationship between economic activities, population changes, and land use decisions will enhance predictive models and management strategies.Given marabú's impact on agriculture and ecosystems, continuous monitoring and intervention strategies should be explored.Integrating remote sensing with on-the-ground efforts can contribute to dynamic, adaptive management practices to curb marabú expansion.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs16050798/s1,Table S1: Producer's Accuracy (PA) and User's Accuracy (UA) results for 2014 classifications for all LULC classes using RF, NL and SVM algorithms; Table S2: Producer's Accuracy (PA) and User's Accuracy (UA) results for 2022 classifications for all LULC classes using RF, NL and SVM algorithms; Table S3: Confusion matrix obtained from the validation of the classification carried out with the RF algorithm in 1994 in the province of Ciego de Ávila; Table S4: Confusion matrix obtained from the validation of the classification carried out with the ML algorithm in 1994 in the province of Ciego de Ávila; Table S5: Confusion matrix obtained from the validation of the classification carried out with the SVM algorithm in 1994 in the province of Ciego de Ávila; Table S6: Confusion matrix obtained from the validation of the classification carried out with the RF algorithm in 2022 in the province of Ciego de Ávila; Table S7: Confusion matrix obtained from the validation of the classification carried out with the ML algorithm in 2022 in the province of Ciego de Ávila; Table S8: Confusion matrix obtained from the validation of the classification carried out with the SVM algorithm in 2022 in the province of Ciego de Ávila.

1 .
Invasive Plant Species in Cuba: The Case of Dichrostachys cinerea (L.)

Figure 2 .
Figure2.Flowchart of the approach followed in this study.

21 Figure 3 .
Figure 3.In Cuba, the D. cienerea is an invasive species that has spread rapidly throughout the country.(A) = young plants of D. cinerea, (B) = internal structure of a D. cinerea forest, (C) = expansion of a D. cinerea forest in a savannah in the province of Ciego de Ávila, and (D) = the D. cinerea plant in bloom.

Figure 3 .
Figure 3.In Cuba, the D. cienerea is an invasive species that has spread rapidly throughout the country.(A) = young plants of D. cinerea, (B) = internal structure of a D. cinerea forest, (C) = expansion of a D. cinerea forest in a savannah in the province of Ciego de Ávila, and (D) = the D. cinerea plant in bloom.

Figure 4 .
Figure 4. Spatial distribution of the validation points for 1994 (a) and 2022 (b).

Figure 4 .
Figure 4. Spatial distribution of the validation points for 1994 (a) and 2022 (b).
Remote Sens. 2024,16,  x FOR PEERREVIEW  11 o    spectral signatures of rainfed crops and woodlands, as shown in Figure5, for the Land images from January 1994 and 2022.In both cases, the NIR band showed the largest d ferences in reflectance.

Figure 6 .
Figure 6.Spatial distribution of D. cinerea (marabú) areas in 1994 (a) and in 2022 (b) in the Ciego de Ávila province using the RF classifier.

Figure 6 .
Figure 6.Spatial distribution of D. cinerea (marabú) areas in 1994 (a) and in 2022 (b) in the Ciego de Ávila province using the RF classifier.

Table 1 .
Land use and land cover classes located in the study area and the number of training areas for each class.

Table 1 .
Land use and land cover classes located in the study area and the number of training areas for each class.

Table 2 .
General accuracy results for the 1994 and 2022 classifications.Producer accuracy (PA), u accuracy (UA) for Dichrostachys cinerea, and overall accuracy (OA) values are expressed in %.T Wald-adjusted confidence intervals (p < 0.05) are shown for OA, PA, and UA.The area under ROC curve (AUC) and F-score is shown for the target class Dichrostachys cinerea.

Table 2 .
General accuracy results for the 1994 and 2022 classifications.Producer accuracy (PA), user accuracy (UA) for Dichrostachys cinerea, and overall accuracy (OA) values are expressed in %.The Wald-adjusted confidence intervals (p < 0.05) are shown for OA, PA, and UA.The area under the ROC curve (AUC) and F-score is shown for the target class Dichrostachys cinerea.

Table 3 .
McNemar's comparison of classifier performances.Note: test values > 3.84 show that there is a statistical difference at a 95% confidence level.Bold values show calculated statistics smaller than the critical value (χ 2 0.05) = 3.84).

Table 4 .
LULC proportions for each class in hectares (ha) and percentage of the total area for 1994 and 2022.

Table 5 .
Areas covered by D. cinerea (by municipality and total) in 1994 and 2022 in the Ciego de Ávila province.

Table 6 .
The overall net changes in LULCs in hectares (ha): the percentage of the total study area by class and the percentage of change for each class between 1994 and 2022 with the base area being 1994.These changes were calculated for the 1994-2022 time period.