Forest Conservation with Deep Learning: A Deeper Understanding of Human Geography around the Betampona Nature Reserve, Madagascar

Documenting the impacts of climate change and human activities on tropical rainforests is imperative for protecting tropical biodiversity and for better implementation of REDD+ and UN Sustainable Development Goals. Recent advances in very high-resolution satellite sensor systems (i.e., WorldView-3), computing power, and machine learning (ML) have provided improved mapping of fine-scale changes in the tropics. However, approaches so far have focused on feature extraction or the extensive tuning of ML parameters, hindering the potential of ML in forest conservation mapping by not using textural information, which has been found to be powerful for many applications. Additionally, the contribution of shortwave infrared (SWIR) bands to forest cover mapping is unknown. The objectives were to develop end-to-end mapping of the tropical forest using fully convolutional neural networks (FCNNs) with WorldView-3 (WV-3) imagery and to evaluate human impact on the environment using the Betampona Nature Reserve (BNR) in Madagascar as the test site. An FCNN (U-Net) using spatial/textural information was implemented and compared with feature-fed pixel-based methods including Support Vector Machine (SVM), Random Forest (RF), and Deep Neural Network (DNN). Results show that the FCNN model outperformed the other models with an accuracy of 90.9%, while SVM, RF, and DNN provided accuracies of 88.6%, 84.8%, and 86.6%, respectively. When SWIR bands were excluded from the input data, FCNN provided superior performance over the other methods with a 1.87% decrease in accuracy, while the accuracies of the other models (SVM, RF, and DNN) decreased by 5.42%, 3.18%, and 8.55%, respectively. Spatial–temporal analysis showed a 0.7% increase in Evergreen Forest within the BNR and a 32% increase in tree cover within residential areas, likely due to forest regeneration and conservation efforts. Other effects of conservation efforts are also discussed.


Introduction
The REDD+ (Reducing Emissions from Deforestation and Forest Degradation) program identifies halting and reversing forest loss and degradation as essential for mitigating the effects of climate change [1,2]. To implement REDD+ objectives at the national level, it is imperative to develop methodologies that use satellite remote sensing to accurately estimate forest types, forest cover area, forest degradation, and change, as well as forest restoration resulting from conservation efforts supported by the REDD+ program. Documenting the impact of REDD+ payments on forest recovery and carbon sequestration in a fully automated manner is of special interest, as ever-increasing stocks of very high-resolution satellite imagery with global coverage present unprecedented challenges for big data analytics.
The 13 km × 13 km study area (Figure 1) is located approximately 40 km northwest of the coastal city of Toamasina and is centered over the BNR. The area is characterized by steep slopes varying from 0° to 55°, isolated forest patches, and extensive agriculture. There are 21 streams that flow through the BNR, and it represents an important watershed for the region that includes the headwaters for two major river systems. The region is characterized by a hot and humid climate with an annual rainfall of over 2000 mm and an average humidity ranging between 80% and 90%. The annual average temperature is 24 °C, with a low of 16 °C in the months of June through August and highs of 32 °C in the months of December through February [10].

Ground Truth Data Collection
Ground truth data were surveyed in 2018 by a local field team facilitated by the Madagascar Fauna and Flora Group (MFG). For each surveyed location, the type of land cover, the approximate surface area covered, and an associated image were documented. Survey plots were selected such that the target land cover or land use class covered a large and homogeneous area. GPS data were then recorded at at least one location within or on the border of each plot, depending on accessibility. The size of the plots varied with location. These data were then categorized into 11 classes (Figure 2; Table 1). The ground truth points were used as a reference to create samples for model training and testing.
The choice of classification scheme has implications for the map application, and different classification schemes are of value to different management groups. In this research, classes were defined according to land cover (4 classes) and forest cover type (7 classes). The description of each class is given in Table 1.

WorldView-3 imagery over the study area, acquired on February 19th, 2019 (Figure 1) with minimal cloud cover, was obtained. The data were atmospherically and radiometrically corrected [35][36][37] and orthorectified by the vendor, Maxar Technologies [38]. The WorldView-3 data contain 8 bands in the visible and near-infrared (400-1040 nm, with a spatial resolution of 1.2 m) and 8 bands in the shortwave infrared (1210-2365 nm, with a spatial resolution of 3.7 m). The imagery was then stacked and resampled to a 1.2 m spatial resolution using the nearest neighbor resampling method in the ENVI 5.4.1 software package (ENVI® image processing and analysis software, from Exelis Visual Information Solutions). Resampling of the SWIR data from 3.7 m to 1.2 m was necessary to analyze the data at a uniform spatial resolution.
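As an illustration of the resampling step, the following is a minimal NumPy sketch of nearest neighbor resampling from 3.7 m to 1.2 m pixels. It is not the ENVI implementation used in the study; the function name `resample_nearest` and the toy 3 × 4 band are assumptions made for the example, and georeferencing is ignored.

```python
import numpy as np

def resample_nearest(band, src_res, dst_res):
    """Resample a 2-D band from src_res to dst_res (metres per pixel)
    by nearest neighbour lookup. Hypothetical helper, not the ENVI routine."""
    rows, cols = band.shape
    out_rows = int(round(rows * src_res / dst_res))
    out_cols = int(round(cols * src_res / dst_res))
    # Centre of each output pixel expressed in source-pixel coordinates.
    src_r = ((np.arange(out_rows) + 0.5) * dst_res / src_res).astype(int)
    src_c = ((np.arange(out_cols) + 0.5) * dst_res / src_res).astype(int)
    # Guard against indexing past the last source row or column.
    src_r = np.clip(src_r, 0, rows - 1)
    src_c = np.clip(src_c, 0, cols - 1)
    return band[np.ix_(src_r, src_c)]

# Toy SWIR band at 3.7 m resolution (values are arbitrary).
swir = np.arange(12, dtype=np.float32).reshape(3, 4)
swir_1p2 = resample_nearest(swir, src_res=3.7, dst_res=1.2)
```

Nearest neighbor resampling only repeats existing pixel values, so no new spectral values are introduced; the resampled SWIR bands could then be stacked with the 1.2 m bands along a band axis, for example with `np.stack`.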

Training Samples
Sample polygons were generated such that only pure pixels were selected, using the ground truth data as a reference. Polygons were created with the survey points projected onto 0.31 m GSD WorldView-3 pan-sharpened visual imagery, ensuring that only homogeneous polygons were digitized using the GPS recordings. Additional training samples were created through photointerpretation, giving a total of 360 polygons spread over the entire study area. To compare results from multiple models, the same training and testing data were used for all classification models. Thus, once the sample polygons were created, they were split into training (70%) and testing (30%) sets. The ground truth data, imagery, model training data, and model testing data were georeferenced to the Universal Transverse Mercator (UTM) coordinate system, zone 39 South, with the World Geodetic System Datum of 1984 (WGS 84).
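A 70/30 split that is reused by every model could be sketched as follows. The text does not state whether the split was stratified by class or made at the polygon or pixel level, so this sketch assumes a simple, unstratified polygon-level split; `split_polygons` and the seed value are hypothetical.

```python
import random

def split_polygons(polygon_ids, train_fraction=0.7, seed=42):
    """Shuffle polygon IDs reproducibly and split them into train and test lists."""
    ids = list(polygon_ids)
    rng = random.Random(seed)          # fixed seed so every model sees the same split
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]

# 360 sample polygons, as reported in the text.
train_ids, test_ids = split_polygons(range(360))
```

Fixing the seed and storing the resulting ID lists is one way to guarantee that SVM, RF, DNN, and U-Net are all evaluated on identical testing data.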

U-Net
A typical Convolutional Neural Network (CNN) contains convolutional layers and pooling layers. In a convolutional layer, a filter of size N × N pixels slides over the input image and, at each position, performs element-wise multiplication followed by summation to produce a single value. The filter slides from left to right and from top to bottom, repeating this operation, which results in a feature map whose height and width are each reduced by N − 1 pixels when no padding is used. The pooling layers condense the useful information extracted by the convolutional layers into much smaller dimensions. CNNs have been used for imagery-based applications because of their ability to extract spectral and textural information from images [20]. Textural information derived from the convolving kernels in the neural network enhances the existing spectral information. In a Fully Convolutional Neural Network (FCNN), the final fully connected dense layer of a CNN architecture is replaced with an up-sampling convolutional network. The architecture of the U-Net model (Figure 3), first implemented by [27] for biomedical image segmentation, has been used in forest type and tree species mapping [17] as well as in the delineation of human-induced deforestation [16].
Remote Sens. 2021, 13, x FOR PEER REVIEW

Figure 3. U-Net architecture with the encoding (left) and decoding (right) sections that produce the characteristic U-shape of the architecture, producing pixel-wise classification of the input imagery. ResNet layers form the encoder.
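The sliding-filter and pooling operations described above can be illustrated with a small NumPy sketch. It assumes a single-band input, a square filter, stride 1, and no padding; the helper names `conv2d_valid` and `max_pool2x2` are hypothetical and this is not the U-Net code used in the study.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Unpadded, stride-1 2-D convolution as used in CNNs (cross-correlation)."""
    n = kernel.shape[0]                      # square N x N filter assumed
    out_h = image.shape[0] - n + 1
    out_w = image.shape[1] - n + 1
    out = np.empty((out_h, out_w), dtype=float)
    for i in range(out_h):                   # top to bottom
        for j in range(out_w):               # left to right
            # Element-wise multiplication, then summation to a single value.
            out[i, j] = np.sum(image[i:i + n, j:j + n] * kernel)
    return out

def max_pool2x2(fmap):
    """2 x 2 max pooling with stride 2 (odd trailing rows/columns are dropped)."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

patch = np.arange(64 * 64, dtype=float).reshape(64, 64)   # toy 64 x 64 patch
kernel = np.ones((3, 3)) / 9.0                            # 3 x 3 mean filter
features = conv2d_valid(patch, kernel)                    # 62 x 62 feature map
pooled = max_pool2x2(features)                            # 31 x 31 after pooling
```

With a 64 × 64 patch and a 3 × 3 filter, the unpadded convolution yields a 62 × 62 feature map, and 2 × 2 max pooling halves it to 31 × 31.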

The U-Net algorithm [27] is a type of FCNN in which the encoding path follows the standard ResNet CNN structure: repeated 3 × 3 unpadded convolutions with ReLU activation functions and standard 2 × 2 max pooling operations, with different numbers of kernels. The decoding path, however, replaces the max pooling operations with transpose convolutions, which double the resolution of each feature map. In addition, each upsampled feature map is concatenated with cropped feature maps from the "same level" of the encoding path. This enables precise localization and compensates for the loss of information in the pooling layers. For this reason, the U-Net architecture was chosen over other FCNNs. The final layer of U-Net is a 1 × 1 convolution with softmax activation, which produces a per-pixel segmentation map for the image.

The quantity of training data derived from ground data collection in 2018 was limited. A U-Net implemented with Keras produced low training and testing accuracies, with segmentation results on the testing data differing between repetitions. Implementing a U-Net model within the arcgis.learn module of the ArcGIS API for Python [39] removed the requirement for a very large dataset. Additionally, the U-Net implemented through arcgis.learn is pretrained on ImageNet, which further improves classification accuracy and reduces the time and resources spent on ground truth data collection. Hyperparameter tuning was done to select the best U-Net model [31,34,40]. Based on hyperparameter optimization, the best model uses an input patch size of 64 × 64 pixels and a ResNet-50 backbone [31,34,40].

Support Vector Machine (SVM)
Support Vector Machine (SVM) is a non-parametric method that attempts to classify non-linearly separable data through the kernel trick in which input data are mapped to a higher-dimensional feature space where a linear decision plane can be easily computed to separate the classes [41]. This hyperplane is drawn such that the distance between the nearest data vectors and the hyperplane is maximized. The SVM model was created using the sklearn module (scikit-learn 0.24).
The accuracy of the SVM model depends on the choice of kernel, and the radial basis function (RBF) kernel has been shown to outperform other kernels for remote sensing applications [42]. The RBF kernel requires tuning of two parameters, C and γ. The choice of C involves a trade-off between correct classification of the training samples and maximization of the margin: a smaller C value results in a wider margin and thus a lower training accuracy. The γ value controls the radius of influence of individual training samples: a larger γ results in a model that overfits the training data and generalizes poorly to the testing data. The values of C and γ were optimized using GridSearchCV (scikit-learn 0.24), and the best values were found to be 1000 and 1, respectively.
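A grid search over C and γ for an RBF-kernel SVM can be sketched with scikit-learn as below. The synthetic data, the candidate grid values, and the 3-fold cross-validation are assumptions made for illustration; the text reports only the selected values (C = 1000, γ = 1), not the grid that was searched.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for per-pixel samples: 16 features mimic the 16 WV-3 bands.
X, y = make_classification(n_samples=300, n_features=16, n_informative=8,
                           n_classes=3, random_state=0)

# Hypothetical candidate values; the grid used in the study is not reported.
param_grid = {"C": [1, 10, 100, 1000], "gamma": [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
best_params = search.best_params_   # dict with the selected "C" and "gamma"
```

`search.best_estimator_` is refit on the full training data with the selected parameters and can then be applied to the held-out testing pixels.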

Random Forest (RF)
The RF algorithm constructs multiple decision trees (DTs), or classifiers, that each predict a class [43]. Each tree within the RF is built from a different subset of the training data, drawn by random sampling with replacement from the original training dataset. This sampling creates a 'bagged' dataset for each decision tree within the random forest. The samples left out of a tree's bag, the so-called out-of-bag (OOB) samples, are used for validation of the RF model. The final prediction of the RF is based on the majority vote from all trees. The RF model was created using the sklearn module (scikit-learn 0.24). The number of trees and the maximum number of features considered for the best split were tuned via GridSearchCV (scikit-learn 0.24), with the following results: best number of trees = 130; maximum features considered for a split = auto.
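The bagging and voting steps described above can be sketched in plain Python. This illustrates only the sampling and voting logic, not the scikit-learn implementation; `bootstrap_indices` and `majority_vote` are hypothetical helper names.

```python
import random
from collections import Counter

def bootstrap_indices(n_samples, rng):
    """Draw n_samples indices with replacement; return (in-bag, out-of-bag)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

rng = random.Random(0)
in_bag, oob = bootstrap_indices(1000, rng)
oob_fraction = len(oob) / 1000          # roughly one third of samples are left out

# Three hypothetical tree predictions for one pixel.
label = majority_vote(["Guava", "Shrubland", "Guava"])
```

Because each bag is drawn with replacement, about 1/e (roughly 37%) of the samples are expected to fall out of the bag for any given tree, which is what makes OOB validation possible without a separate hold-out set.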

Deep Neural Network
A Deep Neural Network (DNN) consists of multiple hidden layers, each made up of 'n' neurons. These neurons are connected to the neurons of the preceding and following layers through a weight (m) and a bias (c), such that

$$y = m x + c \quad (1)$$

where x is the output of a neuron in the preceding layer and y is the value passed on to the connected neuron in the next layer.

The network attempts to learn the values of the various weights and biases, the parameters of the model, by minimizing the cost function. The choice of the cost function is important, as this function guides the model toward the weights and biases that yield accurate predictions. The cross-entropy loss function is used for the DNN model.

The number of neurons and the number of hidden layers both affect the predictive power of the network. These parameters were therefore adjusted, resulting in the architecture seen in Figure 4. The presence of a dropout layer, with a value of 0.23, after hidden layer 5 and a batch normalization layer after hidden layer 2 increased the testing accuracy of the model. These additional layers (dropout and batch normalization) reduce over-fitting of the model on the training data and increase generalization on the testing data. The hyperparameters of this network (the learning rate, number of epochs, and batch size) were further tuned such that the testing accuracy and the kappa were the best among all models tested. The optimized values for learning rate and batch size were 0.007 and 48, respectively.
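The weighted connections between layers and the cross-entropy cost can be illustrated with a short NumPy sketch. The layer sizes (16 inputs for the 16 bands, 11 outputs for the 11 classes), the random weights, and the helper names are assumptions for the example; it shows a single affine layer followed by softmax, not the tuned architecture of Figure 4.

```python
import numpy as np

def dense(x, m, c):
    """Affine transform of one layer: weights m and bias c applied to inputs x."""
    return x @ m + c

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Mean negative log-probability assigned to the true class of each sample."""
    return float(-np.mean(np.log(probs[np.arange(len(labels)), labels])))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))            # 4 pixels x 16 bands (toy values)
m = rng.normal(size=(16, 11)) * 0.1     # weights: 16 inputs -> 11 classes
c = np.zeros(11)                        # biases
labels = np.array([0, 3, 7, 10])        # hypothetical true class indices

probs = softmax(dense(x, m, c))
loss = cross_entropy(probs, labels)

# Sanity check: uniform predictions over 11 classes give a loss of ln(11).
uniform_loss = cross_entropy(softmax(np.zeros((2, 11))), np.array([0, 1]))
```

Training consists of adjusting `m` and `c` to lower `loss`; a network that cannot distinguish the 11 classes at all sits at ln(11) ≈ 2.40, which gives a useful reference point when monitoring the loss.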

Accuracy Assessment
To compare the accuracy of the various models, independent testing data, identical for all models, were used for model evaluation. The confusion (error) matrix and the metrics derived from it were calculated on a pixel-level basis. The confusion matrix shows the model prediction of each class (rows) compared to the original class (columns) as defined by the testing data. The quantitative evaluation was conducted using overall accuracy (OA), kappa coefficient, user's accuracy, producer's accuracy, and F1 scores, all of which were derived from the error matrix.

The OA is defined as the ratio of the number of correctly classified samples to the total number of test samples (Equation (2)). While OA measures simple percentage agreement, the kappa coefficient measures the degree of agreement while accounting for the samples that may be correctly classified by chance; it has a maximum value of 1, which indicates perfect agreement (Equation (3)).

$$\mathrm{OA} = \frac{\text{number of correctly classified samples}}{\text{total number of test samples}} \quad (2)$$

$$\kappa = \frac{P_o - P_e}{1 - P_e} \quad (3)$$

where $P_o$ is the observed accuracy and $P_e$ is the accuracy obtained by chance. Producer's accuracy measures how often pixels of a given reference class are correctly predicted by the model, while user's accuracy measures how often pixels predicted as a given class reflect the true ground cover (Equations (4) and (5)). The values of TP, FP, and FN used for the producer's and user's accuracies are based on an error matrix that is weighted according to the proportion of area in each cell of the matrix [44]. The F1 score is defined in Equation (6).

$$\text{Producer's accuracy} = \frac{TP}{TP + FN} \quad (4)$$

$$\text{User's accuracy} = \frac{TP}{TP + FP} \quad (5)$$

$$F1 = \frac{2 \times \text{Producer's accuracy} \times \text{User's accuracy}}{\text{Producer's accuracy} + \text{User's accuracy}} \quad (6)$$

True Positive (TP) is the number of pixels of a class correctly predicted by the model. False Negative (FN) is the number of pixels of a certain class wrongly predicted as another class, while False Positive (FP) is the number of pixels predicted as a certain class that actually belong to another class.
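The metrics described above can be computed from a confusion matrix as in the following NumPy sketch. It follows the row/column convention stated in the text (rows are predictions, columns are reference classes) and uses raw pixel counts; the area-weighting of the error matrix [44] is not applied here, and the 2 × 2 matrix is a made-up example.

```python
import numpy as np

def accuracy_metrics(cm):
    """Return OA, kappa, producer's accuracy, user's accuracy, and F1 per class.
    cm[i, j] = number of pixels predicted as class i whose reference class is j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=1) - tp      # predicted as the class, but belong elsewhere
    fn = cm.sum(axis=0) - tp      # belong to the class, but predicted elsewhere
    oa = tp.sum() / total
    # Chance agreement from the row and column marginals.
    pe = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    producers = tp / (tp + fn)
    users = tp / (tp + fp)
    f1 = 2 * producers * users / (producers + users)
    return oa, kappa, producers, users, f1

# Hypothetical two-class confusion matrix (pixel counts).
cm = [[50, 5],
      [10, 35]]
oa, kappa, producers, users, f1 = accuracy_metrics(cm)
```

For this toy matrix the overall accuracy is 0.85 while kappa is about 0.69, which shows how kappa discounts agreement expected by chance. Classes with no predicted or no reference pixels would cause a division by zero and need separate handling.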

Land Cover and Land Use Change
Conservation efforts were quantified by investigating the land cover and land use change over time. Since the land cover classes studied here were the same as in the 2010 classification [10], the map created here was compared with the one created by [10]. Since the 2019 imagery covered a much larger extent than the 2010 classification product, the change metrics were based on the 2010 imagery extent. The zone of protection (ZOP) is defined as a 100 m wide ecotone at the perimeter of the BNR, which extends 100 m outside the boundary of the BNR [14]. The activities within the ZOP have an impact on flora and fauna within the BNR due to the forest edge effect, since human and ecological activities overlap in the ZOP. Therefore, change metrics for the ZOP were computed for an area that extends 100 m inward and outward from the boundary. In other words, the change statistics for the ZOP cover the 100 m strip inside the BNR plus the ZOP itself, which extends 100 m outside the reserve boundary. The overall percent change for each land cover and land use class was calculated by dividing the difference between the 2019 and 2010 total areas by the total area in 2010 for that class; see Equation (7).

$$\text{Percent change} = \frac{A_{2019} - A_{2010}}{A_{2010}} \times 100 \quad (7)$$

where $A_{2010}$ and $A_{2019}$ are the total areas of a given land cover or land use class in 2010 and 2019, respectively.
To visualize the trajectory of change spatially, we used a grid cell approach as in [45]. A grid cell size of 10 × 10 m was used to aggregate pixel-level land cover and land use types. We chose this grid size because it is approximately the size of mature trees, so that changes can be tracked at the scale of individual trees. The percent change was computed from the aggregated percent land cover and land use type within each grid cell using a combination of ArcGIS processing tools and Python code. First, a 10 m fishnet grid was created using ArcGIS Pro and intersected with the 2010 and 2019 classification shapefiles. Then, polygons of the same class that fall within a fishnet cell were aggregated to produce the total area of each unique land cover within the cell (using the ArcGIS Pro 'Summarize Attributes' tool). This resulted in a table with multiple rows per cell, with the various land cover types listed in a single column. To place each land cover percentage in a separate column (which is needed to calculate spatio-temporal change), the table was pivoted (using the ArcGIS Pro 'Pivot' tool) such that each grid cell has one row, with the total percentage of each land cover in its own column. Finally, the percent area of a specific land cover per grid cell was calculated, and the percent change was computed as the difference between the 2010 and 2019 classification products.
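The aggregation, pivot, and differencing steps can be sketched in plain Python as follows. This is not the ArcGIS Pro workflow itself: the record layout `(cell_id, land_cover, area_m2)`, the helper names, and the toy values are assumptions, percentages are taken relative to the mapped area of each cell, and the change is taken as 2019 minus 2010.

```python
from collections import defaultdict

def percent_cover(records):
    """records: iterable of (cell_id, land_cover, area_m2) after the intersect step.
    Returns {cell_id: {land_cover: percent of the cell's mapped area}},
    i.e. the pivoted table with one entry per grid cell."""
    area = defaultdict(lambda: defaultdict(float))
    for cell_id, land_cover, area_m2 in records:
        area[cell_id][land_cover] += area_m2          # aggregate polygons per class
    table = {}
    for cell_id, by_class in area.items():
        cell_total = sum(by_class.values())
        table[cell_id] = {lc: 100.0 * a / cell_total for lc, a in by_class.items()}
    return table

def cover_change(table_2010, table_2019, land_cover):
    """Per-cell change in percent cover of one class (2019 minus 2010)."""
    cells = set(table_2010) | set(table_2019)
    return {cell: table_2019.get(cell, {}).get(land_cover, 0.0)
                  - table_2010.get(cell, {}).get(land_cover, 0.0)
            for cell in cells}

def overall_percent_change(area_2010, area_2019):
    """Overall percent change of a class relative to its 2010 area."""
    return 100.0 * (area_2019 - area_2010) / area_2010

# Toy records for two 10 m x 10 m cells (areas in square metres).
records_2010 = [(1, "Fallow", 60.0), (1, "Shrubland", 40.0), (2, "Evergreen Forest", 100.0)]
records_2019 = [(1, "Fallow", 20.0), (1, "Shrubland", 80.0), (2, "Evergreen Forest", 100.0)]
change = cover_change(percent_cover(records_2010), percent_cover(records_2019), "Shrubland")
```

In this toy case, cell 1 gains 40 percentage points of Shrubland while cell 2 is unchanged; mapping such per-cell values is what produces the spatial change trajectories.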

Classification Results
The U-Net model produced an accuracy of 90.9%, outperforming the pixel-based models (Table 2). The SVM, RF, and DNN models produced overall accuracies of 88.6%, 84.8%, and 86.6%, respectively. Among the SVM, RF, and DNN models, SVM was best able to distinguish between the classes.

Table 2. Results: overall accuracy (Equation (2)), kappa coefficient (Equation (3)), producer's accuracy (Equation (4)), and user's accuracy (Equation (5)) for the Support Vector Machine (SVM), Random Forest (RF), Deep Neural Network (DNN), and U-Net models, based on independent testing data.

The U-Net model provided the highest producer's and user's accuracy for Mixed Forest and Evergreen Forest. The producer's and user's accuracy for Molucca Raspberry was greater than 95% for all methods, with the DNN model providing the highest accuracies and U-Net the lowest. Although the U-Net model produces the highest producer's accuracy for Madagascar Cardamom, 81.6%, only 84.5% of the pixels mapped as Madagascar Cardamom are actually Madagascar Cardamom. An increase in both the producer's and user's accuracy was seen for the U-Net model for Guava. The producer's accuracy for Shrubland increased for the U-Net model, while the user's accuracy decreased. The U-Net model resulted in the highest user's accuracy and the lowest producer's accuracy for Grassland compared to other methods, while the SVM model produced the best producer's and user's accuracy for Grassland. Open Water, due to its unique spectra compared to other classes, results in 100% user's and producer's accuracy across all models. The U-Net model resulted in 100% producer's and user's accuracy for the Residential and Fallow classes. As shown in Figure 5, the F1 score from U-Net is the highest for all classes considered, indicating that U-Net is a better model overall for tropical forest habitat mapping.

Figure 5. F1 score (Equation (6)) for all classes and models, showing the superiority of the FCNN U-Net model.

SWIR Bands
The removal of SWIR bands reduced the accuracy by 5.42%, 3.18%, 8.55%, and 1.87% for the SVM, RF, DNN, and U-Net models, respectively (Table 3). The confusion matrices for the DNN and U-Net models, from which the accuracy metrics are derived, are shown in Figure 6. The DNN model is chosen for comparison because it shows the greatest percent change due to the removal of SWIR bands, producing the worst accuracy when SWIR bands were removed. The U-Net model is chosen because it shows the lowest reduction in accuracy when the SWIR bands were removed. The removal of the SWIR bands increases the number of misclassified pixels between classes: for example, Guava pixels being classified as Madagascar Cardamom (Figure 6). A decrease in the producer's and user's accuracy is seen for Madagascar Cardamom and Guava, which are invasive plant species (Figure 7).

Table 3. Overall accuracy (Equation (2)) and percent change for all classification models including (16 bands) and excluding (8 bands) the shortwave infrared (SWIR) bands.

Spatial Distribution of LCLU
An LCLU map was created through post-classification editing of the results of the best model (U-Net), removing spurious pixels and applying majority analyses, to further reduce errors in an effort to create a "gold standard" classified map (Figure 8). The map was manually scanned for misclassified pixels using photointerpretation of the 0.31 m GSD pan-sharpened image, and such pixels were reclassified using the 'Pixel Editor' tool in ArcGIS Pro. This map accurately describes the class of each 1.2 m × 1.2 m pixel within the map. Based on Figure 8, around 50% of the study area is covered by shrubland, 33% is covered by mixed and evergreen forests, and 4.7% is covered by invasive plant species. Another 8% of the study area is used for cultivation (Row Crops and Fallow). The detailed distribution in percent of the total study area is shown in Table 4.
Table 4. Percentage and area in hectares of each class type mapped in 2019 within the study area, based on the classification map seen in Figure 8.

The quantitative change analysis was conducted for the overlapping extent (Figure 9) between the two classification maps produced for 2010 and 2019 (Table 5). An increase is seen for the following classes: Evergreen Forest (1.5%), Mixed Forest [...]

[...] (Figure 9), along with the amount of percent increase (or decrease) (Equation (7)) over time for each class.

The 44% increase in Open Water is attributed to the extensive mapping of the streams in the study area as well as the finer panchromatic resolution of 0.3 m, which enabled the discernment of very narrow streams; this was not possible with the 4 m spatial resolution of the IKONOS data used in the 2010 classification. Since the area represents an important watershed for the region, mapping all the streams within the area is important to guide decision-makers and conservationists.

Classification
With the increasing population, an increase in Residential land use is seen within the study area. This is seen both in the increased spatial extent of residential areas and in the emergence of new hamlets (Figure 10). Furthermore, the increased spatial resolution of the 2019 imagery (from 4 m to 1.2 m) has resulted in the identification of isolated huts believed to be seasonal agricultural homes called lasy. Their isolation and their proximity to agricultural fields support this hypothesis.

It should be noted that trees within the boundaries of residential areas, originally classified as Mixed Forests, were re-classified as Residential in this study to quantify the change in residential tree cover for agroforestry analysis. Based on the re-classified pixels, tree cover within residential areas was estimated for 2010 and 2019 (Figure 11). A 32% increase in the tree cover in residential areas was observed in 2019, which may be associated with tree growth and increased fruit tree plantation within residential areas encouraged by conservation groups and local authorities to combat food insecurity.

The reduced area for Row Crops and Fallow is seen primarily within the ZOP. It can potentially be attributed to a seasonal shift in agriculture: the imagery from which the 2010 classification map was derived was collected in May, while the current imagery was collected in February. Additionally, the replanting of native flora in the ZOP to prevent the degradation of evergreen forest at the forest edges is a major factor in the reduction of agriculture in the ZOP.
The classes that were converted to Shrubland in 2019 are shown in Figure 12. The conversion of 2010 Evergreen Forest to Shrubland in 2019 is considered land degradation and is seen in forest fragments within the study area, east of the BNR (Figure 13). Around 70% of the conversion was from Fallow and Grasslands to Shrubland. The vegetation in these areas could represent a late inter-cropping period, when agricultural fields have been left fallow for several years, or it could be the early stages of a Mixed Forest or regenerating Evergreen Forest, in which the small height of the trees has resulted in a Shrubland classification.

Figure 12. Percentage of classes in 2010 that were converted to Shrubland in 2019, within the study area based on the classification map extent seen in Figure 9. This information is useful for awareness-raising and conservation efforts.

Change within the BNR
The land cover changes within the BNR are shown in Table 6. Increases in Mixed Forest and Evergreen Forest are observed, while decreases in invasive plant species are observed within the BNR (Figure 9; Table 6). The classes that were converted to Mixed Forest in 2019 are shown in Figure 14. When comparing land cover in 2010 and 2019, 3080 ha, 3101 ha, and 459 ha of invasive Molucca Raspberry, Madagascar Cardamom, and Guava, respectively, were converted to Mixed Forest in 2019, probably due to tree growth. However, 876 ha of Evergreen Forest were also converted to Mixed Forest in 2019.
Table 6. Percentage of forest cover types in 2010 and 2019 observed within the boundary of the BNR (Figure 1).

Change within the Zone of Protection (ZOP)
The ZOP, also called the buffer zone, acts as a transition between protected areas and the surrounding unmanaged landscape and reduces the negative impacts on protected areas. In this zone, human and ecological activities clash. In the ZOP, while the extent of invasive Molucca Raspberry has decreased over time, an increase in the extent of Madagascar Cardamom and Guava is observed (Figure 15; Table 7). Within the ZOP, anthropogenic land use (i.e., residential) decreased by 17.9%, and Row Crops and Fallow areas decreased by 76% and 4%, respectively. The decrease in anthropogenic land use is complemented by an increase in the extent of Mixed and Evergreen Forests. Detailed mapping of streams, which can be seen in Figure 15, accounts for the 430% increase in Open Water. It must be noted that this significant increase in the Open Water class was the result of the higher spatial resolution of the 2019 WV-3 imagery compared to the IKONOS imagery used in 2010, which allowed better discrimination of smaller streams or streams that are covered by trees.

Land Cover and Land Use Classification
Only spectral information is used as input features for classification using SVM, RF, and DNN. Although classification accuracies are greater than 80% for all models, a salt-and-pepper effect is seen in the classification maps (Figure 16), which is common for pixel-based classification with very high-resolution satellite data. Note that the SVM classification map is shown here because of its higher accuracy compared to the RF and DNN models. With very high-resolution images such as WV-3, the increased spatial resolution results in an increase in the within-class variation and a decrease in the between-class variation, decreasing the separability between classes [46]. This can be observed in the confusion matrix with higher percentages of misclassified pixels (Figure 6). Additionally, as seen in Figure 17, all forest covers have similar-looking spectral curves. To reduce these misclassified pixels, researchers have used vegetation indices and textural features such as the Gray Level Co-occurrence Matrix (GLCM) [46]. However, the creation of such handcrafted features is time consuming, requires tuning, and needs domain expertise, highlighting the advantage of end-to-end CNN-based classification that utilizes texture information through an FCNN architecture.
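To make the notion of a handcrafted texture feature concrete, the following is a minimal sketch of a GLCM and its contrast statistic for a single pixel offset, written in plain Python. The quantization to a few gray levels, the one-pixel horizontal offset, and the toy patches are assumptions chosen for illustration; they are not settings taken from this study or from [46].

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized gray level co-occurrence matrix for one offset.

    `image` is a 2D list of integer gray levels in the range [0, levels).
    """
    matrix = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                matrix[i][j] += 1.0
                matrix[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return matrix
    return [[v / total for v in row] for row in matrix]


def glcm_contrast(p):
    """Contrast: sum over (i, j) of P[i][j] * (i - j)^2."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))


# Two toy 4x4 patches with 4 gray levels: one uniform, one checkerboard.
smooth = [[1, 1, 1, 1] for _ in range(4)]
rough = [[0, 3, 0, 3], [3, 0, 3, 0], [0, 3, 0, 3], [3, 0, 3, 0]]

contrast_smooth = glcm_contrast(glcm(smooth, levels=4))
contrast_rough = glcm_contrast(glcm(rough, levels=4))
print(contrast_smooth, contrast_rough)
```

Each choice here (offset, number of gray levels, window size, statistic) is a parameter that would need tuning per data set, which is the burden the end-to-end FCNN approach avoids.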

The FCNN U-Net model produced an overall accuracy of 90.9%, outperforming the pixel-based models studied here. Similarly, U-Net has outperformed pixel-based models in other studies [40]. The improvement in accuracy can be attributed to the architecture, which is common to all FCNN encoder-decoder type models. Firstly, the 2D kernels within convolutional layers consider spatial information. Secondly, the convolutional layers extract hierarchical features, and therefore, raw WV-3 imagery input alone is able to produce >90% accuracies [17]. Finally, the fully convolutional nature can segment and classify each pixel in the input image, resulting in an end-to-end classification map [26,30,40]. Thus, semantic segmentation at the pixel level, along with the extraction of features through convolutional layers and the inclusion of spatial information, results in the higher accuracy of the FCNN-based U-Net over the pixel-based models.
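The role of 2D kernels in capturing spatial information can be illustrated with a minimal sketch: two patches whose center pixels have the same value, and which a purely pixel-based classifier could therefore not distinguish from that value alone, produce different responses under a 3 x 3 kernel. The kernel weights and patch values below are hypothetical; in a trained network, the weights are learned rather than fixed.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding.

    This is cross-correlation, which is what CNN convolutional layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 3x3 kernel that responds to left-to-right change in brightness.
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

# Both patches have the same center value (5) but different neighborhoods.
flat_patch = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]
edge_patch = [[1, 5, 9], [1, 5, 9], [1, 5, 9]]

flat_response = conv2d_valid(flat_patch, edge_kernel)
edge_response = conv2d_valid(edge_patch, edge_kernel)
print(flat_response, edge_response)
```

Stacking many such kernels over several layers is what allows an encoder-decoder network to build the hierarchical features described above.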
However, one disadvantage of the U-Net model is that every pixel in the training data must be labeled. Training deep neural networks requires abundant data, typically at least 1000 image samples. In such cases, a pretrained network can reduce the amount of training data needed [47]. The U-Net implemented within arcgis.learn had a ResNet-50 backbone that was pretrained on ImageNet.
The following classes provided a higher user's accuracy and producer's accuracy across all pixel-based models: Open Water, Molucca Raspberry, and Row Crops. These classes have unique spectral signatures that enable the separation of each class from other classes (Figure 17). The SVM, RF, and DNN models in this study suffer from misclassified Fallow (bare land) and Residential (built-up) pixels because of similar reflectance curves (Figure 17). The misclassification can be reduced by using an FCNN-based U-Net model that includes textural features (Table 2, Figure 5) [30]. Many misclassified pixels are seen between forest covers, namely Mixed Forest, Evergreen Forest, and Guava [48]. Training data for Mixed Forest were created based on limited ground reference data points. Misclassified pixels among these three classes could stem from inconsistent training data. The training data might not have been able to capture the spectral variability within each class, which could have reduced the separability between the classes. The spectral signature of Guava is difficult to recognize even when using other sources of data [10]. Furthermore, Guava does not need direct sunlight and therefore grows beneath the forest canopy, which makes its identification through satellite imagery challenging. The producer's and user's accuracy decreased for Madagascar Cardamom and Molucca Raspberry using the U-Net model. This decreased accuracy may be due to the limited training samples for the U-Net model implemented. Deep architectures usually require thousands of training samples so that the model is able to generalize to the testing data. The samples were limited, and images were rotated to increase the training and testing data.
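Rotation-based augmentation of the kind mentioned above can be sketched as follows. The text states only that images were rotated; the use of 90-degree steps and the function names are assumptions made for this illustration. The key point is that the label mask must be rotated together with the image so that every pixel keeps its class.

```python
def rotate90(patch):
    """Rotate a 2D patch (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]


def rotations(patch):
    """Return copies of the patch rotated by 0, 90, 180, and 270 degrees."""
    out = [[list(row) for row in patch]]
    for _ in range(3):
        out.append(rotate90(out[-1]))
    return out


def augment(image_patch, label_patch):
    """Pair each rotated image with the identically rotated label mask."""
    return list(zip(rotations(image_patch), rotations(label_patch)))


# Toy 2x2 single-band image and its per-pixel labels (illustrative only).
image = [[1, 2], [3, 4]]
labels = [["A", "B"], ["C", "D"]]

pairs = augment(image, labels)
for img, lbl in pairs:
    print(img, lbl)
```

Each labeled patch yields four training samples under this scheme, although the rotated copies are not independent observations and do not add new spectral variability.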

Contribution of SWIR Bands
WV-3 SWIR bands can detect non-vegetative pigments [18], and thus, the removal of these bands reduced the overall accuracy by 5.42%, 3.18%, 8.55%, and 1.87% for SVM, RF, DNN, and U-Net models, respectively (Table 3). Therefore, the inclusion of WV-3 SWIR bands improves the separability between vegetative classes and improves classification accuracy.
Since the RF, SVM, and DNN models use only the spectral features for classification, the removal of SWIR bands has a greater impact on the overall accuracy. Based on the percent decrease in accuracy, the SWIR bands were most useful in separating classes in the DNN model, followed by the SVM and RF models. Within U-Net, the inclusion of spatial/textural features results in only a 1.87% decrease in accuracy when SWIR bands were excluded, confirming that CNN-based approaches can provide robust and accurate results just with VNIR data, excluding the need for expansive SWIR data collection. However, it is worth noting that we used a 2D CNN architecture in this paper. A 3D CNN architecture that fully utilizes spectral differences among classes along with texture and patterns may show the benefit of SWIR bands differently.
WV-3 VNIR and SWIR bands have been used to discriminate between tree species [18,19] and crop types [49]. These studies showed that the contribution of each SWIR band is class-specific and varies for different land cover and land use types, which was further confirmed by this study. A decrease is observed in the F1 score of Mixed Forest, Fallow, Open Water, and Guava when SWIR bands were removed across all classification methods employed. For the remaining classes, an increase in the F1 score is seen for at least one classification method.
The F1 scores for all classification methods and selected land cover classes are shown in Figure 18. Guava is highlighted in Figure 18a because it showed the largest decrease in F1 score for the SVM, RF, and DNN models compared to all other classes. For the U-Net model, the removal of SWIR bands reduced the F1 score by almost 50% for Madagascar Cardamom (Figure 18b). In Figure 18c, Evergreen Forest is highlighted for 16 (VNIR + SWIR) bands and eight (VNIR) bands, as the F1 score increased for the RF and U-Net models upon removing SWIR bands. Figure 18d shows the decrease in F1 score for Mixed Forest without SWIR bands; similarly, in Figure 18e, decreases are evident for Molucca Raspberry. For Row Crops and Residential, the F1 score slightly increased when SWIR bands were removed for RF and DNN, respectively. This increase could be attributed to the fact that both Row Crops and Residential land use are best differentiated by VNIR bands, and the relatively lower resolution of SWIR may have affected the metric. For Fallow, as shown in Figure 18h, a decrease is evident when SWIR bands are removed, indicating the importance of SWIR bands for identifying Fallow, as SWIR regions are most effective for mapping crop residue in Fallow. Finally, for Shrubland (Figure 18i) and Grassland (Figure 18j), the SVM, RF, and DNN models were unable to identify the class without the SWIR bands, producing a decrease in F1 scores, while the U-Net model produced better F1 scores when the SWIR bands were removed.
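For readers who wish to reproduce per-class comparisons of this kind, the F1 score can be derived from a confusion matrix as the harmonic mean of precision (user's accuracy) and recall (producer's accuracy). The sketch below assumes rows hold reference classes and columns hold predicted classes; this orientation and the toy counts are assumptions, not values from this study.

```python
def per_class_f1(cm):
    """Per-class F1 from a square confusion matrix.

    Assumes cm[i][j] counts pixels of reference class i predicted as class j.
    """
    n = len(cm)
    scores = []
    for k in range(n):
        tp = cm[k][k]
        predicted = sum(cm[r][k] for r in range(n))  # column sum
        actual = sum(cm[k])  # row sum
        precision = tp / predicted if predicted else 0.0  # user's accuracy
        recall = tp / actual if actual else 0.0  # producer's accuracy
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# Toy two-class matrices, with and without a hypothetical extra band set.
cm_all_bands = [[8, 2], [1, 9]]
cm_fewer_bands = [[6, 4], [3, 7]]

f1_all = per_class_f1(cm_all_bands)
f1_fewer = per_class_f1(cm_fewer_bands)
change = [b - a for a, b in zip(f1_all, f1_fewer)]
print(f1_all, f1_fewer, change)
```

Comparing the two score lists class by class shows which classes gain or lose from a change in input bands, which is the comparison Figure 18 presents.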

Accuracy of Classification Maps
Land cover and land use maps are highly advantageous to conservationists and land managers for the establishment of new conservation programs, for quantitative evaluation of various existing conservation programs, and for the estimation of forest fragmentation rates [4,16]. Ideally, these land cover maps would be 100% accurate.
Accuracy estimation for each pixel in classification maps is not possible, since ground truth data are not available for each pixel. Therefore, the accuracy of classification maps is calculated based on pixels that are representative of the image [50]. These pixels are taken at random and are not used to train the classification model. If the sample pixels selected are not true representations of the study area, the resulting accuracy estimate will be unreliable [50], and these accuracy values should be treated with caution.
To increase the accuracy of land cover maps created through classification, the maps can be inspected and subsequently edited. Although such a process is labor-intensive and time consuming, it ensures a near 100% accuracy of the land cover map. Therefore, the final classified map was manually inspected, fine-tuned, and edited to reflect true land covers. For example, trees within the Mixed Forest class that obstruct rivers and streams were reclassified as Open Water. Other edits included the re-classification of misclassified Fallow pixels to Residential and vice versa. The land cover map was also edited to correctly reflect the distribution of Madagascar Cardamom and Molucca Raspberry. The outcome of this post-editing, as mentioned in previous sections, is a gold standard map of the entire study area. Confusion matrices were computed by comparing the gold standard map and the LCLU maps produced by the machine learning algorithms, including the U-Net and SVM models (Table 8).
Table 8. Accuracy metrics, overall accuracy (Equation (2)) and kappa coefficient (Equation (3)), based on the predictions of the best performing models, SVM and U-Net, compared with the edited map seen in Figure 8.
The accuracy of the models is much lower compared to the accuracy derived from independent testing data. Although the overall accuracy of the U-Net model based on testing data of limited random pixels is over 90%, the accuracy based on the gold standard map is only 77%, which reflects the accuracy over the entire study area, including all pixels in the imagery, as opposed to the testing accuracy based on the limited testing pixels. Similarly, a much lower accuracy is seen with the SVM model. Several factors contribute to the lower accuracy of the U-Net model, some of which can be attributed to the training of the models themselves. With additional training data, the spectral variation within each class (Figure 17) could be learned to better predict the land cover.
Another source of error for the lower accuracy (Table 8) is Open Water. Although the overhead imagery shows tree cover above a stream, these tree pixels over water features are reclassified to Open Water in the gold standard map. Since the area represents an important watershed for the region, accurate mapping of these streams is important. Similarly, landslides expose bare rock, and their spectral signature is very similar to that of the Fallow class. Thus, both models classify those exposed rocks as Fallow. Although this classification is 'spectrally' accurate, it is not reflective of the true LCLU. Additionally, the low accuracy of 66% for the SVM model can be attributed to the salt-and-pepper effects of pixel-based classification.
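The two metrics reported in Table 8 can be computed from a confusion matrix as sketched below. Equations (2) and (3) are not reproduced in this section, so the sketch assumes the standard definitions of overall accuracy and Cohen's kappa; the toy matrix is illustrative and does not contain values from this study.

```python
def overall_accuracy(cm):
    """Fraction of all pixels that fall on the confusion matrix diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what is expected by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    # Chance agreement from the products of row and column marginals.
    expected = sum(
        sum(cm[k]) * sum(cm[r][k] for r in range(n)) for k in range(n)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


# Toy matrix: rows are gold standard classes, columns are model predictions.
toy_cm = [[8, 2], [1, 9]]

oa = overall_accuracy(toy_cm)
kappa = cohen_kappa(toy_cm)
print(f"overall accuracy = {oa:.2f}, kappa = {kappa:.2f}")
```

When the matrix is built from every pixel of the gold standard map rather than from a small random test sample, the same functions give the map-wide figures discussed above.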

Conservation Efforts in the BNR
Ref. [14] found, based on imagery acquired in 2010, that 81% of the BNR consists of undisturbed and/or degraded evergreen forest, and 10% of the BNR consists of regenerating or degraded mixed forest. Results from the 2019 imagery show a 12.5% increase in regenerating Mixed Forest and a 0.7% increase in Evergreen Forest (Table 6). The regeneration of Evergreen Forest is a lengthy process, and therefore, the success of the restoration work within BNR is only slightly apparent with a 0.7% increase in Evergreen Forest. However, given that the trend in 2010 was toward increasing deforestation and forest fragmentation [14], this is already a worthwhile result, especially when coupled with a 13% and 28% increase in Mixed Forest in the BNR and the ZOP, respectively.
The Madagascar Flora and Fauna Group (MFG), an association of zoos and botanical gardens, manages the BNR in collaboration with the Madagascar National Parks (MNP) [12]. Since 2007, the MFG has run a community-based native forest restoration project in the BNR's ZOP, a 100 m buffer extending out from the BNR, in partnership with the Madagascar National Parks. Another initiative is the control of invasive Guava in the BNR as well as the control of local invasive plants Madagascar Cardamom and invasive Molucca Raspberry, both of which have contributed to the 0.7% increase in Evergreen Forest in the BNR.
Knowing where and which LCLU types have been converted to Shrubland (Figure 12) within the study area is very valuable to the MFG and their partners at the MNP. This knowledge helps target awareness-raising and conservation efforts in areas of particular concern, for example, the deforestation of formerly evergreen forests that are mainly remnant "Classified Forests". These "Classified Forests" have a legally protected status but little practical protection. Similarly, knowledge of which classes were converted to Mixed Forest in 2019 in the BNR (Figure 14) is also beneficial. For instance, although the conversion of Evergreen Forest to Mixed Forest amounted to 876 ha, most of the area converted to Mixed Forest came from Molucca Raspberry (3080 ha), Fallow (2648 ha), Madagascar Cardamom (3102 ha), and Grassland (2697 ha).
Land use in the vicinity of protected areas is seen to negatively affect conservation efforts [51]. Since human and ecological activities overlap in these areas, the land cover changes are studied in the ZOP. The size of this transition zone is further dependent on the ecological, socio-economic, and hydrological interactions with the surrounding landscape [51], and therefore, it differs between protected areas. For the BNR, the size of this zone is taken as 100 m extending out from the BNR boundary. In the ZOP and 100 m within the BNR boundary, a decrease in anthropogenic land use is observed (residential areas, agricultural fields, and fallow land), which is complemented by an increase in mixed and evergreen forests (Table 7; Figure 19). These results are most likely a direct consequence of the MFG's native forest restoration program and increased awareness-raising and lobbying by the MFG and Madagascar National Parks to discourage slash-and-burn agriculture in the ZOP. More recently, the MFG has initiated two efforts to reduce forest loss and promote more sustainable agriculture: the promotion of agroforestry and the distribution of fuel-efficient stoves.
Agroforestry around villages has been promoted by the MFG and their agroecology specialists since 2005. A 32% increase is seen in tree cover within residential areas (Figure 11), and residential home gardens were found to contain an average of 6.5 species [12]. The increase in tree cover is indicative of native trees maturing over time and is also indicative of successful agroforestry efforts. The local population has already been practicing agroforestry for many generations, so it is not possible to quantify the impact due to the MFG's specific efforts. However, the provision of training, trees, and equipment since 2005 seems to be making a positive difference to tree cover in residential areas despite the overall growth of many of the residential areas. Ref. [12] also found that local farmers around the BNR are willing to increase tree crop production.
Ref. [5] found that 96% of the local population relies on firewood for cooking. To reduce the demand for firewood, 2100 fuel-efficient stoves were distributed in 12 villages around the BNR. Distributions were made during three time periods: September 2018, May 2019, and September 2020. Prior to the collection of the imagery used for this analysis, 700 stoves were distributed. The period between this first distribution and image acquisition (September 2018 to February 2019) is too short, compared to the time during which these stoves were not in use, to assess their impact. In addition, trees are not generally felled completely for firewood but rather coppiced, or deadwood is collected below the forest canopy. Therefore, it would be almost impossible to quantify the impacts of fuel-efficient stoves through remote sensing techniques.

Human Geography Implications
Since agriculture is the main source of income for the locals [5,12], the mapping of agricultural land becomes important. In 2019, agricultural cover accounts for 5147 ha (3.08%) of the study area (Table 4) and is associated with human settlements, which account for 650 ha (0.38%) of the study area. The presence of these agricultural fields in the ZOP (Figure 15; Table 7) leads to an increase in human interaction in the buffer zone, which often has negative impacts on the biodiversity within the BNR. These agricultural fields are spread out over the study area, but a higher concentration is seen in the southern part, close to residential areas. Figure 19 shows the comparison between percent changes in Evergreen Forest (Figure 19a) and percent changes in Agriculture (Figure 19b), highlighting the human impact (or lack thereof) on the BNR. Increases are seen in the Evergreen Forest within the BNR and its surroundings, and even in the ZOP (outer edge of the gray polygon). Comparatively, decreases are seen in agricultural fields in the ZOP and other areas surrounding the BNR. These changes can be attributed to the fallow species succession and fallow/cropping regimes over 9 years. The fallow species succession can follow either a forest restorative path or a forest degradative path. Land can be restored to forest through the regeneration of a shrub fallow. However, if the shrub fallow is cultivated, it develops into a herbaceous fallow. The forest can still be restored at this stage; however, if the land is further cultivated, it develops into grassland, which has low productivity if cultivated. For a more detailed explanation of fallow species succession and 'restoration' or 'degradation' pathways, the reader is advised to read [52].

Locals around the BNR face food insecurity because of the growing population, as indicated by the increased residential land cover (Table 4), and decreasing agricultural yields [12] in the study area. This food insecurity is further driving the conversion of forest cover to agricultural fields. Food insecurity leads to the consumption of livestock (chickens, ducks, etc.) and to the consumption of wildlife, particularly lemurs [5,12]. Children around the BNR are reported to become weak, fall ill, and even perform poorly in school because of hunger [13]. Health care services are poor, costly, and inaccessible to locals around the BNR, and it was common for locals to use medicinal plants as a cheaper option [13]. Therefore, although the conservation of species is important, it is also necessary to consider the needs of the local population. Future conservation efforts by the MFG include improving human health and recognizing the importance of people-oriented strategies in order to provide cash crops and a sustainable food source for the locals [13].

Conclusions
This paper demonstrated that FCNN models that utilize texture information in a fully automated, end-to-end mapping outperformed pixel-based conventional machine learning approaches in mapping tropical rainforest habitats. Our results showed that the FCNN model (U-Net) produced an overall accuracy of 90.9% compared to accuracies of 88.6%, 84.8%, and 86.6% for the SVM, RF, and DNN models, respectively. Moreover, the WV-3 SWIR bands increased the accuracy by 1.87% for the U-Net model. The decrease in accuracy without SWIR bands was much greater for the DNN (8.55%), SVM (5.42%), and RF (3.18%) models that use only spectral features for classification, demonstrating the potential of the FCNN approach without using expansive SWIR data collections. Comparing LCLU maps generated in 2010 and 2019, changes in classes were quantified to understand human interactions with the environment and quantify the impact of conservation efforts. A 44% increase in Residential is seen in the study area, along with a decrease in Row Crops in the ZOP. Within the BNR, invasive plant species are seen to decrease, which is complemented by increases in Mixed (12%) and Evergreen Forest (0.7%). This increase in forest cover indicates a reversal of the trends described by Ghulam et al. (2014). In addition, a 32% increase is seen in the tree cover within residential areas, due either to native trees maturing or to successful agroforestry efforts by the MFG. Conservation efforts worldwide should consider the needs of the local human population. As such, future conservation efforts by the MFG include improving human health and recognizing the importance of people-oriented strategies in order to provide cash crops and a sustainable food source for the locals.
Machine and deep learning are well advanced in some application domains; however, challenges remain for remote sensing applications, especially tropical forest monitoring, due in part to the lack of sufficient training data representing various biomes and forest types globally, which must be manually created, and in part to the challenges of managing and analyzing big satellite data at the global scale. We presented a fully automated deep learning model in this paper which can be applied to any part of the world. Taking advantage of cloud computing, the approach can be duplicated for national-scale forest cover mapping and change detection, which has significant impact on our understanding of tropical forests and biodiversity for better implementation of REDD+.
Data Availability Statement: WorldView-3 imagery can be obtained from MAXAR technologies directly. Ground truth data and codes will be made available at https://github.com/remotesensinglab.