Affordable Use of Satellite Imagery in Agriculture and Development Projects: Assessing the Spatial Distribution of Invasive Weeds in the UNESCO-Protected Areas of Cuba

The effective and regular remote monitoring of agricultural activity is not always possible in developing countries because access to cloud-based geospatial analysis platforms or expensive high-resolution satellite images is not always available. Herein, using a paid high-resolution satellite image first and then free medium-resolution satellite images, we aimed to develop an affordable method, cost-free after the initial image purchase, for regularly mapping the spatial distribution of sicklebush (Dichrostachys cinerea), an archetypal allochthonous invasive plant in Cuba that is becoming impossible to control owing to its rapid growth in areas planted with sugar cane in the Trinidad-Valle de los Ingenios area (Cuba), a UNESCO World Heritage Site. Two types of images were used (WorldView-2 and Landsat-8); these were subjected to supervised classification, with overall accuracy values of 88.7% and 93.7%, respectively. Vegetation cover was first derived from the purchased WorldView-2 image, and this information was then used as the training field to obtain spectral signatures from the free Landsat-8 image so that Landsat images can be used regularly to monitor D. cinerea infestations. The sicklebush distribution map derived from the Landsat-8 image had an overall accuracy of 93.7% and a Kappa coefficient of 91.9%. These values indicate high confidence in the results, which suggest that sicklebush has invaded 52.7% of the total 46,807.26-ha area of the Trinidad-Valle de los Ingenios. This process proved highly effective and demonstrated the benefits of using high-resolution images from which information can be transferred to free satellite images with a larger pixel size.


Introduction
The sicklebush (Dichrostachys cinerea L. Wight & Arn.), also known as marabou and bell mimosa, is a semi-deciduous to deciduous tree with an open crown that can reach a height of 7 m. It has green young branches, greyish dark-brown older branches, stems with longitudinal fissures, and strong, smooth, slightly recurved thorns up to 8 cm long, which grow at nearly right angles from the branches and can bear leaves at the base (ECOCROP, accessed on 4 November 2020). This species belongs to the Fabaceae family (subfamily Mimosoideae) and is originally from South Africa. However, in Cuba, it has become an allochthonous invasive plant that is extremely difficult to control. Its tolerance to stress, abundant thorns, resistance to cutting and burning, hardness of its stems, dispersal of seeds by livestock, and high capacity to re-sprout when

Data
A multispectral 8-band WorldView-2 image with 1.84 m resolution, captured on 20 January 2014 in cloud-free conditions (processing level: Ortho-ready standard), and a Landsat-8 image with 30 m resolution, captured on 1 April 2014 with a cloud cover of 3.76% and Level-1T processing (standard terrain correction), were obtained for the study region. Both images were obtained from the United States Geological Survey Global Visualisation Viewer (available online: http://glovis.usgs.gov/, accessed on 4 December 2020).

Supervised Classification of WorldView-2 Image
There are two general approaches to image classification: supervised and unsupervised. They differ in how the classification is performed. In the case of supervised classification, the software system delineates specific landcover types based on statistical characterization data drawn from known examples in the image (known as training sites). With unsupervised classification, however, clustering software is used to uncover the commonly occurring landcover types, with the analyst providing interpretations of those cover types at a later stage.
Compared with medium- and low-resolution satellite images, high-resolution satellite images carry richer spatial but less spectral information. Previous studies have shown that conventional pixel-based statistical methods do not give satisfactory results on such images [18]. Therefore, we used an object-oriented classification for the high-resolution satellite image. The WorldView-2 image was first segmented using eCognition software, which subdivides the image into separate regions. We used a multi-threshold segmentation procedure based on homogeneity and heterogeneity criteria, with colour and shape parameters of 0.8 and 0.2, respectively, and smoothness and compactness parameters of 0.5 each (Appendix A, Figure A1). Several attempts were needed to find the scale parameter (20 m) that best delineated the limits of the ground-cover types.
The image was processed using ERDAS IMAGINE software, within which a three-stage supervised classification was conducted. In the first stage, the training stage [19], sample points and areas obtained on-site, together with visual interpretation in Google Earth, were used to select regions that represented the different vegetation and land-use types and to build a working classification, as shown in Table 1. Although the image was cloud-free, dark-tone regions were identified across the whole image; these matched tree shadows and likely reflected the time at which the image was taken. Therefore, a spectral signature was added for this specific category. The training sites were digitised using the segmented image as a baseline, and spectral signatures were obtained for use in the following stage. In total, there were 389 training fields, from which 19 signatures were acquired for different vegetation types. After the spectral signatures were obtained, they were assessed using the transformed divergence separability method [20], in which index values over 1900 indicate classes that can be confidently distinguished, values between 1700 and 1900 indicate fair separation, and values below 1700 indicate poor separation. Signatures with values below 1800 were eliminated, and new spectral signatures were produced at different training sites [21].
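The separability check described above can be illustrated with a minimal sketch. It assumes the commonly cited definition of transformed divergence, TD = 2000 [1 - exp(-D/8)], where D is the divergence computed from the mean vectors and covariance matrices of two class signatures; the function names and the two-band example values are hypothetical and are not taken from the study data or from ERDAS IMAGINE.

```python
import math

def mat_inv(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def transformed_divergence(mean_i, cov_i, mean_j, cov_j):
    """Transformed divergence (0 to 2000) between two Gaussian class signatures."""
    n = len(mean_i)
    inv_i, inv_j = mat_inv(cov_i), mat_inv(cov_j)
    cov_diff = [[cov_i[r][c] - cov_j[r][c] for c in range(n)] for r in range(n)]
    inv_diff = [[inv_j[r][c] - inv_i[r][c] for c in range(n)] for r in range(n)]
    inv_sum = [[inv_i[r][c] + inv_j[r][c] for c in range(n)] for r in range(n)]
    d = [a - b for a, b in zip(mean_i, mean_j)]
    outer = [[d[r] * d[c] for c in range(n)] for r in range(n)]
    divergence = (0.5 * trace(mat_mul(cov_diff, inv_diff))
                  + 0.5 * trace(mat_mul(inv_sum, outer)))
    return 2000.0 * (1.0 - math.exp(-divergence / 8.0))

def separability_label(td):
    """Thresholds quoted in the text: >1900 good, 1700-1900 fair, <1700 poor."""
    if td > 1900:
        return "good"
    if td >= 1700:
        return "fair"
    return "poor"

# Hypothetical two-band signatures (illustrative values only).
identity = [[1.0, 0.0], [0.0, 1.0]]
td_close = transformed_divergence([0.0, 0.0], identity, [2.0, 0.0], identity)
td_far = transformed_divergence([0.0, 0.0], identity, [10.0, 10.0], identity)
print(round(td_close, 2), separability_label(td_close))
print(round(td_far, 2), separability_label(td_far))
```

With real data, the mean vector and covariance matrix of each signature would be estimated from the pixels of its training fields, and every pair of signatures would be compared.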
In the second stage, the assignment stage, each pixel was analysed independently once the land-cover types had been defined and their signatures derived. Pixels were classified using the maximum likelihood algorithm as the parametric rule and the parallelepiped algorithm as the non-parametric rule [22].
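One plausible way to combine the two rules is sketched below; it is not a description of the software's internals. The sketch assumes that a pixel falling inside exactly one parallelepiped is assigned to that class, and that pixels falling inside several boxes or none are resolved by Gaussian maximum likelihood. For brevity it also assumes uncorrelated bands (diagonal covariance), which is a simplification of the full maximum likelihood rule. All signature values and helper names are hypothetical.

```python
import math

def make_signature(mean, var, k=2.0):
    """Build a signature with parallelepiped bounds at mean +/- k standard deviations."""
    std = [math.sqrt(v) for v in var]
    return {
        "mean": mean,
        "var": var,
        "low": [m - k * s for m, s in zip(mean, std)],
        "high": [m + k * s for m, s in zip(mean, std)],
    }

def in_parallelepiped(pixel, sig):
    """Non-parametric rule: is the pixel inside the class box in every band?"""
    return all(lo <= v <= hi for v, lo, hi in zip(pixel, sig["low"], sig["high"]))

def log_likelihood(pixel, sig):
    """Gaussian log-likelihood, assuming uncorrelated bands (diagonal covariance)."""
    return -0.5 * sum(math.log(2.0 * math.pi * s) + (v - m) ** 2 / s
                      for v, m, s in zip(pixel, sig["mean"], sig["var"]))

def classify_pixel(pixel, signatures):
    """Parallelepiped first; maximum likelihood for overlaps and unclassified pixels."""
    hits = [name for name, sig in signatures.items() if in_parallelepiped(pixel, sig)]
    if len(hits) == 1:
        return hits[0]
    # Overlap: decide among the overlapping classes. No hit: decide among all classes.
    candidates = hits if hits else list(signatures)
    return max(candidates, key=lambda name: log_likelihood(pixel, signatures[name]))

# Hypothetical two-band (red, NIR) signatures; values are illustrative only.
signatures = {
    "MA": make_signature([300.0, 580.0], [400.0, 900.0]),
    "PS": make_signature([450.0, 800.0], [400.0, 900.0]),
}

inside_ma = classify_pixel([310.0, 590.0], signatures)   # inside the MA box only
inside_ps = classify_pixel([440.0, 790.0], signatures)   # inside the PS box only
fallback = classify_pixel([360.0, 650.0], signatures)    # outside both boxes
print(inside_ma, inside_ps, fallback)
```

Because the fallback always returns a class, no pixel is left unassigned, which matches the behaviour reported for the combined rules.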
In the third and last stage, the accuracy assessment stage, the supervised classification was evaluated using the built-in "Accuracy Assessment" tool of ERDAS IMAGINE, which compares selected pixels in the classified image with baseline data for which the real classes were known from field sampling. This tool provides an error matrix, accuracy percentage values, and Kappa coefficients [23]. The samples, which were subsets of the field sampling, comprised 257 pixels in the WorldView-2 image and 240 pixels in the Landsat-8 image; they were distributed randomly, as established by Congalton [23], and sized using the simplified formula for population estimation of Spiegel and Stephens [24].
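The quantities reported by such an accuracy assessment can be reproduced from an error matrix with a few lines of code. The sketch below assumes that rows hold the classified labels and columns hold the reference (field) labels; the 2 x 2 matrix is a made-up example, not data from Table 4 or Table 6.

```python
def accuracy_report(matrix):
    """Overall accuracy, Cohen's kappa, and per-class user's/producer's accuracy.

    matrix[i][j] = number of sample pixels classified as class i
    whose reference class is j (assumed convention).
    """
    n = sum(sum(row) for row in matrix)
    k = len(matrix)
    row_tot = [sum(row) for row in matrix]
    col_tot = [sum(col) for col in zip(*matrix)]
    diag = [matrix[i][i] for i in range(k)]

    observed = sum(diag) / n                                   # overall accuracy
    expected = sum(r * c for r, c in zip(row_tot, col_tot)) / n ** 2  # chance agreement
    kappa = (observed - expected) / (1.0 - expected)

    users = [d / r if r else float("nan") for d, r in zip(diag, row_tot)]
    producers = [d / c if c else float("nan") for d, c in zip(diag, col_tot)]
    return observed, kappa, users, producers

# Hypothetical error matrix for two classes and 100 sample pixels.
example = [[45, 5],
           [10, 40]]
overall, kappa, users, producers = accuracy_report(example)
print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
print("user's accuracy:", [round(u, 3) for u in users])
print("producer's accuracy:", [round(p, 3) for p in producers])
```

Kappa is lower than the overall accuracy here because it discounts the agreement expected by chance given the row and column totals.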

Supervised Classification of Landsat-8 Image
The land-cover categories obtained from the WorldView-2 image were exported as polygons, and those with an area of less than 900 m² were eliminated to match the 30-m pixel size of the Landsat-8 image. Before processing the Landsat-8 image, several corrections were made to eliminate radiometric anomalies using ENVI 5.1 software; radiometric and atmospheric corrections were applied, involving the transformation of digital levels to radiance and reflectance. The atmospheric correction used in this study was based on the Fast Line of Sight Atmospheric Analysis of Hypercubes (FLAASH) method, which corrects wavelengths from the visible through the near-infrared (NIR) and short-wave infrared regions.
Once the corrected image was obtained, it was cropped to the extent of the WorldView-2 image and then to an area covering the Trinidad-Valle de los Ingenios. The classification methodology was repeated for the first cropped image (excluding the training stage), from which spectral signatures were acquired using the WorldView-2 classifications as training fields. The number of classes was reduced to those that could be differentiated with Landsat-8, since its ability to discriminate features is lower than that of WorldView-2 (spatial resolution of 30 m versus 1.84 m). The spectral signatures for the land-use and vegetation categories shown in Table 2 were thereby derived.
After the signatures of the different training fields were acquired, they were assessed, classified, and verified using the accuracy assessment tool to determine the reliability of the resulting sicklebush distribution in the Landsat-8 image. Once the classification had been produced for the first cropped image, the entire study area was processed using the spectral signatures of the first cropped image, which had previously been validated. Therefore, the final sicklebush distribution information was obtained with a high degree of confidence.
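As an illustration of the radiometric step, the conversion of Landsat-8 digital numbers to top-of-atmosphere (TOA) reflectance can be sketched as follows. This uses the standard USGS rescaling formula with the band-specific multiplicative and additive factors and the sun elevation from the scene metadata; the numeric values below are typical placeholders rather than the metadata of the scene used in the study. The sketch covers TOA reflectance only and does not reproduce the FLAASH atmospheric correction, which yields surface reflectance.

```python
import math

def toa_reflectance(dn, mult, add, sun_elevation_deg):
    """Convert a Landsat-8 OLI digital number to sun-angle-corrected TOA reflectance.

    mult and add correspond to REFLECTANCE_MULT_BAND_x and REFLECTANCE_ADD_BAND_x
    in the scene metadata; sun_elevation_deg corresponds to SUN_ELEVATION.
    """
    rho_uncorrected = mult * dn + add
    return rho_uncorrected / math.sin(math.radians(sun_elevation_deg))

# Placeholder inputs (assumed, not taken from the study scene).
example_dn = 20000
example_reflectance = toa_reflectance(example_dn, mult=2.0e-5, add=-0.1,
                                      sun_elevation_deg=60.0)
print(round(example_reflectance, 4))
```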

Table 2 (excerpt). Land-use and vegetation categories used for the Landsat-8 classification.
MYA: Dichrostachys cinerea + Vachellia farnesiana
PAY: Roystonea regia, Bursera simaruba (L.) Sarg., Cecropia schreberiana Miq.
PA: Acacia mangium Willd. plantation
PS: Grass
UR: Urban
VR: Riparian vegetation

WorldView-2 Image
To identify all vegetation cover types in the WorldView-2 image, both the segmentation and diversity of plant covers were considered. In total, 389 training fields were used for the 19 different spectral signatures to include the spectral variability of each class. The results for the digital levels in the different bands obtained from the training fields for each spectral signature are displayed in Figure 2. Greater separation of each signature can be seen in band 7 (NIR), which registers plant activity parameters and is sensitive to moisture. This explains why this part of the spectrum generates high reflectivity [25]. The sicklebush and sicklebush with aroma (Vachellia farnesiana) categories obtained an average digital level in the NIR band of 583.64 and 514.71, respectively. These values are below those for the other classes of vegetation and reflect differences in the properties of the leaves, canopy structure, and orientation. For example, the portion of radiation reflected in different parts of the spectrum (the reflectivity pattern) depends on a leaf's pigmentation, density, and composition (cellular structure) as well as the quantity of water in its tissues [26]. As such, our adopted spectral classification approach was able to confidently distinguish sicklebush from other species.
After recording the training pixels for each class, spectral signatures were assessed using the transformed divergence separability method (Table 3). With the sicklebush class as the baseline, the classes with the smallest spectral distances were Acacia plantation (PA); mamoncillo, mango, and other (MMO); and sicklebush with aroma (MYA), with values of 1723, 1805, and 1855, respectively. Based on these values, the PA class had poor separability, whereas the MYA and MMO classes had medium separability. The poor separation between the Acacia, sicklebush, and aroma classes can be explained by the similarity in the alternate pinnate leaves of these species, both paripinnate and imparipinnate, which is a common characteristic of the Fabaceae family [27]. This implies that the reflectivity of these leaves will share similarly high values, making it impossible to obtain a greater level of separation between these classes.
Following the assessment of the signatures, an assignment was performed to select the best classifier. The parallelepiped method left a large number of pixels unclassified for sicklebush, producing a highly fragmented thematic image with fairly extensive regions of pixels with no specific assignment. The minimum distance method also misclassified pixels, including those belonging to other categories, and as a result, classes could not be determined for a large number of isolated pixels. To address this issue, two classifiers were combined: the maximum likelihood classifier was applied as the parametric rule, and the parallelepiped classifier was applied as the non-parametric rule. Together, these separated the vegetation and land-use categories and assigned every pixel to an established category so that none were left unclassified. This follows the results of Ayala and Menenti [22], which suggest that these classifiers perform the best, as measured using the performance index.
The spatial distribution of sicklebush is shown in red in Figure 3, showing a homogeneous distribution in combination with other types of vegetation. In areas where sicklebush was found, the canopy was almost entirely classified as sicklebush. This indicates a correct supervised classification. While juvenile sicklebush growth occurs in some areas, the corresponding spectral values could not be acquired because the training fields only included adult growth, which was more easily identified from the satellite images.
To validate the accuracy of the resulting classification, 257 pixels were randomly selected to calculate the reliability and degree of accuracy of each class using a confusion matrix (Table 4). The data in Table 4 represent the degree of association between the classified and baseline classes, yielding an overall accuracy of 88.7%. This exceeds the 85% value recommended by Anderson et al. [28]. For sicklebush, an accuracy of 94.7% was obtained: 94.7% of the pixels in this category coincided with pixels classified by the applied algorithms, while the remaining pixels were classified as the aroma class. As previously stated, such misclassification likely resulted from the similar physiological characteristics of these two species of Fabaceae. The Kappa coefficient was 87.4%, which falls in the highest ("almost perfect") agreement category in the classification of coefficient values discussed by Landis and Koch [29].
Visual inspection of the classification revealed salt-and-pepper noise for some cover types despite image segmentation and pixel grouping. This noise was unavoidable and is similar to the observations of Estoque [30], who used different classifiers. Salt-and-pepper noise is more apparent in the classifiers used in this study than in those based on objects from high-resolution spectral images. We therefore attribute the fact that the overall reliability was not higher to this noise: some of the sample pixels were likely affected by it even when, based on field observations, their surroundings were correctly classified.
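For reference, the agreement categories of Landis and Koch used to interpret the Kappa coefficient above can be written as a small lookup. The function name is hypothetical, and the coefficient is expressed as a proportion (0.874 rather than 87.4%).

```python
def landis_koch_label(kappa):
    """Map a kappa value (as a proportion) to the Landis and Koch agreement category."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"

# Kappa reported for the WorldView-2 classification in the text.
worldview_label = landis_koch_label(0.874)
print(worldview_label)
```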

Landsat-8 Image
The Landsat-8 image was cropped to the size of the WorldView-2 image so that the previously acquired classifications could be used as training fields to obtain spectral signatures for the different vegetation types. A working legend (Table 5) was established for the vegetation types that the Landsat-8 image could discriminate, since the first classified image had a pixel size of 0.5 m whereas the Landsat-8 image had a spatial resolution of 30 m (Appendix A, Figure A2). Therefore, the number of classes or vegetation types was reduced to those that could be differentiated. Thus, 70 training fields were obtained for the 10 different spectral signatures to account for the spectral variability of each class. The spectral signatures obtained from the training areas (Figure 4) show that sicklebush (MA) and the combination of sicklebush and aroma (MYA) were distinguishable from the other classes in band 5 (NIR) and gave the lowest reflectivity, 0.245 and 0.228, respectively. The same result was obtained using the WorldView-2 image, in which the lowest reflectivity values were obtained for the same classes.
These reflectivity patterns are typical for vegetation and reveal low reflectivity in the visible bands, especially in the red portion, but high reflectivity in the NIR band [31]. The low reflectivity in the visible portion of the spectrum results from absorption by leaf pigments, primarily chlorophyll [32]. High reflectivity in the NIR region may be due to the internal structure of leaves (e.g., the spongy mesophyll layer), in which internal air cavities affect the diffusion and dispersion of radiation [33].
To confirm that the signatures obtained for the different vegetation and land-use types were separated spectrally from the sicklebush category, the transformed divergence separability method was applied. Good separability between the different vegetation types was apparent (Table 5); the majority of values exceeded 1900, the threshold established in [21], apart from the sicklebush and aroma class, which exhibited a value of 1086, placing it in the poor separability category. This likely resulted from the combined signatures of sicklebush and aroma, which grew in close association in the study area. Another explanation may be the choice of the training fields and the pixel size, given that the problem did not appear in the WorldView-2 image. This implies that the sites selected as training sites in the first classification had a greater cover of aroma than sicklebush, with the opposite apparent in the Landsat-8 image. In terms of pixel size, it is possible that the range of the sensor did not allow sicklebush and aroma to be differentiated, given their similar morphological characteristics, including bipinnate compound leaves.
The assignment stage was undertaken to obtain the supervised classification using the maximum likelihood method as the parametric rule and the parallelepiped method as the non-parametric rule. This approach successfully separated the vegetation and land-use categories.
While the classification was conducted for all vegetation types used in the spectral signatures, only the distribution of sicklebush was mapped in this study (Figure 5). Sicklebush was distributed across the analysed area, mostly localised to the north, where the valley is superseded by the Escambray Sierra. The results obtained from the original Landsat-8 image demonstrated that sicklebush was colonising areas with low vegetation density or where vegetation is highly fragmented. This finding is in agreement with those of a study by Godinez et al. [34], who noted that as an invasive species, sicklebush colonises open ground by forming a structure of inaccessible branches and thorns and highlighted the prevalence of this species in areas of secondary scrubland.
Finally, the classification was checked using a confusion matrix produced with 240 random pixels verified in the field. The validation of the maximum likelihood classification of the Landsat-8 image is shown in Table 6; it yielded an overall accuracy of 93.7%, which is higher than the threshold of Anderson et al. [28], who suggested that an accuracy of at least 85% is required (both overall and for each category) if results are to be used for future land-use management and planning. A Kappa coefficient of 91.9% was obtained, which, based on Landis and Koch [29], indicates very high reliability; six of the ten categories exceeded 90%. This suggests that the visual interpretation and criteria used to separate the different categories were correct, reducing the likelihood of the misclassification of pixels.
After acquiring the spectral signature and spatial distribution of sicklebush from the cropped Landsat-8 image, supervised classification of the entire study area was performed using the maximum likelihood method (Figure 6). Sicklebush was present in most of the municipality (shown in red in Figure 6), covering 52.7% of the total area analysed, and was mainly present in the northeast and west (Escambray Sierra) and south (seaside marshes) of the Valle de los Ingenios area, which is clearly delimited (Figure 6).
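The reported coverage can be turned into an area figure with simple arithmetic. The sketch below uses only the values stated in the text (52.7% of 46,807.26 ha and a 30-m Landsat-8 pixel); the helper names are hypothetical, and the pixel-count conversion is shown only to illustrate how a classified raster would be summarised.

```python
def pixels_to_hectares(n_pixels, pixel_size_m=30.0):
    """Area covered by n square pixels, in hectares (1 ha = 10,000 m^2)."""
    return n_pixels * pixel_size_m ** 2 / 10_000.0

def invaded_area_ha(total_area_ha, invaded_fraction):
    """Invaded area given the total area and the classified fraction."""
    return total_area_ha * invaded_fraction

total_ha = 46_807.26          # total area analysed, from the text
invaded_ha = invaded_area_ha(total_ha, 0.527)
print(f"invaded area: {invaded_ha:.2f} ha")
print(f"one Landsat-8 pixel: {pixels_to_hectares(1):.2f} ha")
```

Under these figures, the invaded area is roughly 24,667 ha, with each Landsat-8 pixel representing 0.09 ha.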
Given the scope of the project, the distribution of the infestation could not be verified in the field outside the cropped area. It was therefore impossible to generate a confusion matrix to determine the overall reliability of our model over the full study area or to quantify the effect of transferring the information from a small to a larger area. Non-accessible zones, such as the sierra and marsh areas, may contain other types of vegetation that were not distinguished spectrally in the model, so alternative signatures need to be generated to avoid erroneous classification in the study area. However, our model was able to distinguish between local crops and marabou covers, which was the main goal of this study.

Conclusions
Training fields, generated using segmentation of WorldView-2 images, successfully yielded the spectral signatures of the sicklebush (MA) and sicklebush with aroma (MYA) land-cover classes, enabling them to be distinguished from other types of vegetation identified in the study area. The resultant map of sicklebush distribution based on the WorldView-2 image had an overall accuracy of 88.7% and a Kappa coefficient of 87.4%.
Once the training fields from the WorldView-2 image were obtained, we acquired spectral signatures from Landsat-8 images. Using this information, the spatial distribution map for sicklebush in the Landsat-8 image had an overall accuracy of 93.7% and a Kappa coefficient of 91.9%. These values indicate high confidence in the results, which suggest that sicklebush has invaded 52.7% of the total 46,807.26 ha of Trinidad-Valle de los Ingenios.
Access to expensive high-resolution satellite images and/or cloud-based geospatial analysis platforms is one of the main problems when implementing geospatial analysis in developing countries. The cost of such data ($23/km² for WorldView-2) usually jeopardises the development of extensive and/or intensive studies that, in our case, would provide useful information about spatial and temporal patterns in sicklebush infestations. Using the vegetation covers retrieved from the WorldView-2 image as training fields for the Landsat-8 spectral signatures proved highly effective: in the separability assessment, most transformed divergence values relative to the sicklebush signature ranged between 1900 and 2000. This demonstrates the effectiveness of using high-resolution images to transfer information to free images with larger pixel sizes. Our results provide local authorities with an affordable and accurate tool to check and measure a problem described in Cuba some time ago: the progressive marabou infestation of natural and agricultural environments with great ecological value, which, in this case, compromises more than half of the municipality of Trinidad. Now that the analytical methodology has been developed, this technique will allow, in the long term, further analysis of time series of Landsat-8 images of other environmentally valuable areas affected by this weed. Short-term future lines of research include delimiting the boundaries of the Valle de los Ingenios (as the valley itself has no fixed borders), followed by an exhaustive multi-temporal study to estimate the change and degree of infestation over time and space. Such a study would give hints about the nature of this weed infestation and its vectors and means of dissemination, allowing rapid implementation of integrated weed control techniques for sicklebush.