Change Detection of Selective Logging in the Brazilian Amazon Using X-Band SAR Data and Pre-Trained Convolutional Neural Networks

It is estimated that, in the Brazilian Amazon, forest degradation contributes three times more than deforestation to the loss of gross above-ground biomass. Degradation, in particular that caused by selective logging, results in features whose detection is a challenge for remote sensing, due to their size, spatial configuration, and geographical distribution. Among the available remote sensing technologies, SAR data allow monitoring even during adverse atmospheric conditions. The aim of this study was to test different pre-trained models of Convolutional Neural Networks (CNNs) for change detection associated with forest degradation in bitemporal products obtained from a pair of SAR COSMO-SkyMed images acquired before and after logging in the Jamari National Forest. This area contains sites of both legal and illegal logging. We also tested the influence of the speckle effect on the classification result by applying the classification methodology to previously filtered and unfiltered images and comparing the results. A method for clustering the detections was also presented, based on density-based spatial clustering of applications with noise (DBSCAN), which would make it possible, for example, to guide inspection actions and to calculate the intensity of exploitation (IEX). Although the differences between the tested models were smaller than 5%, the tests on the RGB composition (where R = coefficient of variation; G = minimum values; and B = gradient) presented a slightly better performance than the others in terms of the number of correct classifications for selective logging, in particular using the Painters model (accuracy = 92%), even in the generalization tests, which presented an overall accuracy of 87%, and in the test on the RGB composition from the unfiltered image pair (accuracy of 90%). These results indicate that multitemporal X-band SAR data have the potential for monitoring selective logging in tropical forests, especially in combination with CNN techniques.


Introduction
Land use, land-use change, and forestry have historically been the sectors that contribute most to greenhouse gas emissions in Brazil, according to the Greenhouse Gas Emissions and Removal Estimates System (SEEG) [1]. Therefore, the necessary containment of the increase in emissions is closely related to the control and combat of deforestation and forest degradation. Brazil has a robust system for monitoring and quantifying annual deforestation, called PRODES, carried out by the National Institute for Space Research [2]. The coefficient of variation (CV) is an approach presented as advantageous for detecting changes due to its simple formulation and notable statistical properties [31]. The coefficient of variation, also called relative standard deviation, is mathematically defined, in statistics, as the ratio of the standard deviation of the signal to its mean; it is therefore a normalized measure of the dispersion of a probability distribution.
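In per-pixel terms, the CV of a temporal stack is simply the standard deviation over the mean at each pixel. The following is a minimal, hypothetical NumPy sketch of this computation (the function name and toy values are assumptions, not the study's processing chain); note that the CV is computed on linear backscatter values, not dB:

```python
import numpy as np

def coefficient_of_variation(stack):
    """Per-pixel coefficient of variation across a temporal stack.

    stack: array of shape (t, h, w) with backscatter in linear power units.
    """
    mean = stack.mean(axis=0)
    std = stack.std(axis=0)
    # Guard against division by zero over invalid/masked pixels.
    return np.where(mean > 0, std / mean, 0.0)

# Toy bitemporal pair: one stable pixel and one whose backscatter dropped.
pair = np.array([[[1.0, 1.0]],
                 [[1.0, 0.2]]])
cv = coefficient_of_variation(pair)
```

A stable pixel yields a CV near zero, while a strong drop between dates yields a high CV, which is what makes the measure useful for change detection.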
The classification of changes consists of separating the detections into two or more classes and can be carried out with supervision, where the classifier receives labeled samples as input for training, or without supervision, that is, without external inputs of training data. Several classification techniques have been proposed in the literature. They can be divided into traditional classifiers and those based on Artificial Intelligence (AI). In recent years, AI technology has become the focus of research in the development of new methods for detecting and classifying changes [32]. AI uses external information obtained through different data sources as input to identify underlying rules and patterns, relying on Machine Learning approaches, which generally describe methods that help computers learn without being explicitly programmed [33]. Thus, Machine Learning is an essential part of AI, as its algorithms are capable of modeling complex class signatures, can accept a variety of input predictor data, and make no assumptions about the data distribution (that is, they are non-parametric). A wide range of studies demonstrate that these methods tend to produce greater accuracy than traditional parametric classifiers, especially for complex data with a high-dimensional feature space, i.e., many predictor variables [34]. Among the machine learning methods for speckle suppression and feature extraction in SAR images are the so-called Autoencoders (AE) [35][36][37][38][39] and Convolutional Neural Networks (CNNs) [40][41][42][43].
The objective of this work was to explore the potential of bitemporal X-band SAR data and pre-trained Convolutional Neural Networks for selective logging mapping in a tropical forest region. The use of pre-trained CNNs is known as transfer learning, which has been tested for diverse applications and performs better when the CNN models are trained on large image datasets such as ImageNet [44]. For this purpose, we obtained a pair of bitemporal COSMO-SkyMed images, acquired in STRIPMAP mode and HH polarization, from the Jamari National Forest. Classifications were tested on three types of bitemporal subproducts: (1) an RGB composite image (R = coefficient of variation, G = minimum values, B = gradient); (2) a single-layer image of the coefficient of variation; and (3) a single-layer ratio image. We also tested the ability of these networks to classify the same changes detected on images without speckle filtering, evaluating the need for this pre-processing step in the classification of selective logging. Subsequently, using the density-based spatial clustering of applications with noise (DBSCAN) method, groupings of clearings were produced as a proposed approach that would allow, for example, guiding inspection actions and calculating the exploitation intensity (IEX).

Study Site and Data
The study area is located in the Jamari National Forest (NF), an area covered by native tropical forest and protected by the Brazilian State. One of the activities allowed in the NFs is sustainable forest management by the concessionaire company that acquired the right to explore the area, which includes selective logging. Jamari NF, which has approximately 220,000 hectares, is subdivided into three Forest Management Units (I, II, and III), which in turn are subdivided into Annual Production Units (UPAs). The UPA explored in 2018 (UPA 11) (Figure 1) was selected for this study because, in addition to the SAR images acquired before and after the exploration period, it has a forest inventory identifying the exploited trees and LiDAR point clouds also acquired before and after exploration. SAR images were acquired on 5 June and 8 October 2018 (before and after the selective logging period) by sensors aboard the COSMO-SkyMed3 and COSMO-SkyMed4 satellites of the COSMO-SkyMed constellation. The acquisition parameters were: wavelength = X band; acquisition mode = STRIPMAP; polarization = HH; incidence angle ≈ 55°. The images were processed with 1 look in range and azimuth, resulting in a 3 m grid cell, co-registered for correction of translational and rotational deviations between images, filtered with the GammaMAP filter [45,46] with a 3 × 3 window and, finally, geocoded using the digital elevation model produced from the Phased Array type L-band Synthetic Aperture Radar (PALSAR) sensor, with conversion to backscatter coefficients (σ0, in dB).
The ground truth was generated from airborne LiDAR point clouds acquired in 2018 and 2019 and the forest inventory made available by the Brazilian Forest Service (SFB), the government agency responsible for managing the NFs. The inventory contains the list of tree species that occur in the area, the geographical coordinate of each tree exploited, the date of exploitation, and parameters such as diameter at breast height (DBH), circumference at breast height (CBH), and estimated volume. The LiDAR survey was performed by the LiDAR Optech ALTM Gemini airborne sensor with approximately 21 pulses per square meter of terrain. The data comes from a service contracted by the SFB. Using the LAStools plugin for QGIS [47], these point clouds were converted into a digital surface model (DSM) with 3 × 3 m cells from the first return points, which correspond to the pulses with the shortest time between emission and return, representing those that focused on the outermost surface of the canopy. Given that X-band SAR data interacts only superficially with the canopy and, therefore, understory changes cannot be identified at this wavelength, the purpose of the processing adopted for the LiDAR data aimed to simulate this behavior, which justified the adoption of the first return as a signal to compose the digital model. To identify the selective logging that occurred between the two acquisitions, the ratio between the DSMs was obtained, in which high values represent the changes. The changes were confirmed by overlaying the SAR ratio image with the forest inventory.
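The first-return gridding and DSM ratio steps can be illustrated with a hypothetical NumPy sketch (the study itself used the LAStools plugin for QGIS; the function names, grid size, and toy coordinates below are assumptions):

```python
import numpy as np

def first_return_dsm(x, y, z, return_num, cell=3.0, shape=(2, 2)):
    """Grid the highest first-return elevation into a DSM with `cell`-metre
    cells, mimicking the outermost canopy surface seen by X-band SAR."""
    dsm = np.zeros(shape)
    first = return_num == 1
    cols = (x[first] / cell).astype(int)
    rows = (y[first] / cell).astype(int)
    # Keep the maximum first-return height falling into each cell.
    np.maximum.at(dsm, (rows, cols), z[first])
    return dsm

def dsm_ratio(before, after, eps=1e-6):
    """Ratio between the two DSMs; high values indicate canopy loss."""
    return before / np.maximum(after, eps)

# Four toy first-return points over a 2 x 2 grid of 3 m cells.
x = np.array([1.0, 4.0, 1.0, 4.0])
y = np.array([1.0, 1.0, 4.0, 4.0])
z = np.array([30.0, 25.0, 28.0, 2.0])
rn = np.array([1, 1, 1, 1])
dsm = first_return_dsm(x, y, z, rn)
```

A cell where the canopy dropped from 30 m to 3 m between acquisitions would produce a ratio of 10, standing out clearly against unchanged forest.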


Convolutional Neural Network Architectures
Convolutional Neural Networks (CNNs) are a type of multilayer network with learning capability, composed of convolutional layers, pooling layers, and fully connected layers (Figure 2). The input of convolutional layers X ∈ R^(n×w×h) consists of n 2D feature/attribute maps of size w × h. The output H ∈ R^(m×w′×h′) of the convolutional layers consists of m 2D feature/attribute maps of size w′ × h′, obtained via the convolution matrix W. W ∈ R^(m×l×l×n) holds the m trainable filters of size l × l × n (usually l = 1, 3, or 5). The convolution process is described as H = f(W ∗ X + b), where ∗ denotes the 2-D convolution operation and b the bias. In general, a nonlinear activation function f is applied after the convolution operation. As the convolutional structure deepens, convolutional layers can capture different features/attributes (e.g., edges, lines, corners, structures, and shapes) from the input feature/attribute maps [48].
Pooling layers perform a maximum or average operation over a small area of each input feature map. They can be defined as H_l = pool(H_{l−1}), where pool represents the pooling function (which summarizes the information in the pooling area into a single average, maximum, or stochastic pooling value), and H_{l−1} and H_l are the input and output of the pooling layer, respectively. Typically, pooling layers are applied between two successive convolutional layers. The pooling operation creates invariance to small shifts and distortions. For object detection and image classification, the invariance provided by pooling layers is very important [48].
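The convolution and pooling operations described above can be made concrete with a minimal NumPy sketch (single channel, ReLU as the nonlinearity f, non-overlapping max pooling); this is a didactic stand-in, not the implementation of the networks tested in this study:

```python
import numpy as np

def conv2d(x, w, b=0.0):
    """Valid 2-D convolution (cross-correlation, as is conventional in CNN
    implementations) of a single-channel map x with filter w, followed by
    a ReLU activation playing the role of f in H = f(W * X + b)."""
    fh, fw = w.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + fh, j:j + fw] * w) + b
    return np.maximum(out, 0.0)  # ReLU

def max_pool(x, k=2):
    """Non-overlapping k x k max pooling, H_l = pool(H_{l-1})."""
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
feat = conv2d(x, np.ones((3, 3)))   # 4x4 input, 3x3 filter -> 2x2 map
pooled = max_pool(x)                # 4x4 input -> 2x2 map of block maxima
```

Shifting the input by one pixel changes `feat` only slightly and often leaves `pooled` unchanged, which is the shift invariance the text refers to.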
Fully connected layers usually appear in the upper layers of CNNs, where they summarize the features/attributes extracted by the lower layers. A fully connected layer processes its input X with a linear transformation by weight W and bias b, then maps the output of the linear transformation through a nonlinear activation function f, according to the equation y = f(W·X + b). In the classification task, to generate the probability of each class, a softmax classifier is usually connected to the last fully connected layer. The softmax classifier normalizes the output of the fully connected layer y ∈ R^c (where c is the number of classes) to values between 0 and 1, which can be described as P(y_i) = e^(y_i) / Σ_{j=1}^{c} e^(y_j), where e is the exponential function. The output of the softmax classifier denotes the probability that a given input image belongs to each class. The dropout method [49] operates on fully connected layers to avoid overfitting, as a fully connected layer usually contains a large number of parameters [48].
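A corresponding sketch of the fully connected layer y = f(W·X + b) and the softmax normalization, again purely illustrative:

```python
import numpy as np

def dense(x, W, b, f=lambda z: np.maximum(z, 0.0)):
    """Fully connected layer: y = f(W @ x + b), with ReLU as default f."""
    return f(W @ x + b)

def softmax(y):
    """Normalize the fully connected output y (length c) to class probabilities."""
    e = np.exp(y - y.max())  # subtract the max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.0]))
```

The resulting `probs` sum to 1 and preserve the ordering of the raw scores, so the largest logit maps to the most probable class.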
The extraction of attributes from the images occurs through so-called embedders, which read the images and use deep learning models to compute a feature vector for each image, returning a data table with additional columns containing the image descriptors. The deep learning models tested in this work were:
1. InceptionV3 [50]: Google's deep neural network for image recognition, consisting of 48 layers. It is trained on the ImageNet dataset [51], on which it has achieved accuracy greater than 78.1%. The model is composed of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropouts, and fully connected layers. Batch normalization is used extensively throughout the model and applied to activation inputs. The loss is computed using the softmax function.
2. VGG16 [52]: a convolutional neural network model consisting of 16 weight layers, trained on the ImageNet dataset, where it achieved 92.6% classification accuracy. Instead of a large number of hyperparameters, the network uses 3 × 3 convolution filters with stride 1 and the same padding throughout, and 2 × 2 max-pooling layers with stride 2, following this arrangement of convolution and max-pooling layers consistently across the architecture. At the end, there are two fully connected layers, followed by a softmax for the output.
3. VGG19: a variant of the VGG model that contains 19 weight layers, which achieved 92.7% accuracy on the ImageNet classification.
4. SqueezeNet [53]: a 26-layer deep convolutional neural network that achieves AlexNet-level accuracy [54] on ImageNet with 50× fewer parameters. SqueezeNet employs architectural strategies that reduce the number of parameters, notably "fire" modules that squeeze the parameters using 1 × 1 convolutions.
5. Painters: a model trained on the dataset of the Painter by Numbers Kaggle competition [55], consisting of 79,433 images of paintings by 1584 different painters, whose objective was to examine pairs of paintings and determine whether they are by the same artist. The network comprises a total of 24 layers.
6. DeepLoc [56]: a convolutional network trained on 21,882 individual cell images that were manually assigned to one of 15 subcellular compartments. It is a prediction algorithm that uses deep neural networks to predict the subcellular location of proteins from sequence information alone. At its core, the prediction model uses a recurrent neural network that processes the entire protein sequence and an attention mechanism that identifies regions of the protein important for subcellular localization. The network consists of 11 layers.
The six models were tested to leverage transfer learning from their original applications and thus, using the contextual features extracted from their extensive training datasets, apply them to forest degradation detection.

Data Selection
The first stage of the classification tests consisted of selecting all features that were candidates for change between the SAR images of June 2018 and October 2018. For this, the coefficient of variation (CV) between the two images was generated, since the CV has been pointed out as an advantageous alternative for detecting changes in SAR images [31]. Following [57], the value of 0.4 was defined as the boundary between non-logging and selective logging; thus, values greater than 0.4 were included as candidates for the selective logging class. These sets of pixels were then vectorized, forming candidate polygons for the selective logging class.
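Assuming a SciPy-based stand-in for the slicing and vectorization step (the study's actual GIS tooling is not detailed here), candidate regions can be obtained by thresholding the CV image at 0.4 and labeling connected pixel groups:

```python
import numpy as np
from scipy import ndimage

def candidate_patches(cv_image, threshold=0.4):
    """Label connected groups of pixels whose CV exceeds the threshold;
    each labeled component corresponds to one candidate polygon."""
    mask = cv_image > threshold
    labels, _ = ndimage.label(mask)
    # Bounding slices per component, usable to crop image patches.
    return labels, ndimage.find_objects(labels)

# Toy CV image: two high-CV pixels touching (one candidate) plus one
# isolated high-CV pixel (a second candidate).
cv = np.array([[0.1, 0.5, 0.6],
               [0.1, 0.1, 0.1],
               [0.7, 0.1, 0.1]])
labels, boxes = candidate_patches(cv)
```

Each bounding slice in `boxes` can then be used to crop the corresponding patch from the original images, mirroring the masking step described below.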
To create a ground truth dataset, the logging class candidates were labeled based on airborne LiDAR and forest inventory data. Thus, of the 186,886 polygons generated by slicing the CV, 4324 received a class label, namely selective logging or non-logging. The polygons were then used as a mask to crop the original images listed below into 186,886 patches of varying sizes (depending on feature size; 4324 labeled and 182,562 unlabeled, the latter applied in the network generalization test and later validated against the ground truth) to be used in the CNNs. These models use image patches instead of pixels for training and prediction in order to capture the underlying contextual/textural information.
• RGB composite whose R channel contains the coefficient of variation image, the G channel the minimum values image, and the B channel the gradient (covmingrad image) between the June and October filtered COSMO-SkyMed SAR images;
• RGB composite whose R channel contains the coefficient of variation image, the G channel the minimum values image, and the B channel the gradient (covmingrad image) between the June and October unfiltered COSMO-SkyMed SAR images;
• single band of the coefficient of variation (CV image);
• single band of the ratio (COSMO-SkyMed October / COSMO-SkyMed June; ratio image).
The image clipping procedure generated 4324 labeled sub-images for each of the four images, of which 3026 were used for training and 1298 for testing.

Classification Tests
The subimages clipped by the labeled polygons and classified by ground truth were used as training (70%) and test (30%) sets, and the unlabeled ones were used to analyze the generalization capability of the CNNs. Figure 3 shows a sample of the covmingrad images cut by these polygons. To perform the tests, the Orange data mining software [58] was adopted, a platform for data analysis and visualization based on visual programming that can also be scripted through its Python 3 library. The general flowchart of the tests is shown in Figure 4. The tested models were InceptionV3, VGG16, VGG19, SqueezeNet, Painters, and DeepLoc (for details, see Section 2.2). The last fully connected layer was defined according to the tests performed by [57], who presented, as the best result for classification of selective logging, that obtained by an Artificial Neural Network of the Multi-Layer Perceptron (ANN-MLP) type, with one hidden layer of 50 neurons, ReLU activation function, SGD weight optimizer, α = 0.00002, and 1000 iterations (Figure 4, step 5).
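The final ANN-MLP stage can be approximated outside Orange, for example with scikit-learn's MLPClassifier configured with the hyperparameters reported above; the feature vectors below are synthetic stand-ins for the embedder output, so this is a sketch of the classifier head only:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Stand-in for embedder features: two well-separated synthetic classes.
X = np.vstack([rng.normal(0.0, 1.0, (100, 8)),
               rng.normal(3.0, 1.0, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

# Hyperparameters follow the ANN-MLP configuration described in the text:
# one hidden layer of 50 neurons, ReLU, SGD, alpha = 0.00002, 1000 iterations.
clf = MLPClassifier(hidden_layer_sizes=(50,), activation="relu",
                    solver="sgd", alpha=0.00002, max_iter=1000,
                    random_state=0)
clf.fit(X, y)
```

In the actual workflow, `X` would be the descriptor table produced by the embedder and `y` the selective logging / non-logging labels derived from the ground truth.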
The performance evaluation with the test dataset was carried out by calculating the following parameters:
• Area under the receiver operating characteristic curve (AUC): an AUC of 0.5 suggests no discrimination between classes; 0.7 to 0.8 is considered acceptable; 0.8 to 0.9, excellent; and more than 0.9, exceptional.
• Accuracy: proportion of correctly classified samples.
• Training time (s).
The validation strategy adopted was cross-validation [59] with 5 folds. The generalization capacity was evaluated by counting, within the area covered by forest inventory data identifying the exploited trees, the correct classifications and the commission and omission errors, and by calculating the overall accuracy [60], which considers the diagonal of the confusion matrix (true agreement).
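The overall accuracy and the commission/omission errors can all be derived from the confusion matrix, as in the following sketch; the counts are illustrative (loosely mirroring the generalization-test percentages), not the study's actual tallies:

```python
import numpy as np

def accuracy_metrics(cm):
    """cm[i, j]: samples of ground-truth class i assigned predicted class j.
    Returns overall accuracy plus per-class commission and omission errors."""
    cm = np.asarray(cm, dtype=float)
    overall = np.trace(cm) / cm.sum()                # diagonal = true agreement
    commission = 1.0 - np.diag(cm) / cm.sum(axis=0)  # errors per predicted class
    omission = 1.0 - np.diag(cm) / cm.sum(axis=1)    # errors per true class
    return overall, commission, omission

cm = [[875, 127],   # ground truth: selective logging
      [125, 873]]   # ground truth: non-logging
overall, commission, omission = accuracy_metrics(cm)
```

Here `overall` is the proportion of all samples on the diagonal, while commission and omission are read down the columns and across the rows, respectively.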

Grouping of Selective Logging Features
As a final step, a method was proposed to estimate the exploitation intensity (IEX) of the study area, calculated as the ratio between the selective logging area (which comprises the area of clearings) and the total area that delimits the region under exploitation (minimum bounding box). To delimit the minimum bounding box, the DBSCAN (density-based spatial clustering of applications with noise) method [61] was applied, which requires two parameters: the minimum group size (number of clearings) and the maximum distance between grouped clearings, defined from the analysis of the distribution of distances and groupings in the available extraction data. The clustering method was performed on the polygons classified as selective logging by the CNN.
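A minimal sketch of this grouping with scikit-learn's DBSCAN, using illustrative parameter values consistent with those adopted in the study (minimum group size of 5 clearings, 60 m maximum distance) and hypothetical clearing centroids; the IEX helper is likewise an illustrative proxy, using the cluster's bounding box as the delimiting area:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical clearing centroids in metres: an exploited block of eight
# clearings spaced 30 m apart, plus one isolated natural tree fall.
xs, ys = np.arange(0, 120, 30), np.arange(0, 60, 30)
grid = np.array([[x, y] for x in xs for y in ys], dtype=float)
isolated = np.array([[2000.0, 2000.0]])
points = np.vstack([grid, isolated])

# min_samples = minimum group size (5 clearings); eps = maximum distance
# between grouped clearings (60 m, the value adopted for legal logging).
labels = DBSCAN(eps=60.0, min_samples=5).fit_predict(points)

def iex(cluster_points, clearing_area):
    """Illustrative IEX proxy: clearing area over the cluster bounding box."""
    (xmin, ymin), (xmax, ymax) = cluster_points.min(0), cluster_points.max(0)
    return clearing_area / ((xmax - xmin) * (ymax - ymin))
```

The eight grouped clearings form one cluster, while the isolated point is flagged as noise (label −1), matching the proposal of neglecting geographically isolated clearings.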

Results
The classification tests by pre-trained CNNs on the covmingrad image are presented in Table 1A, on the cov image in Table 1B, and on the ratio image in Table 1C, with all images derived from the filtered COSMO-SkyMed image pair. The data presented in Table 1A-C show accuracy values above 85% for all images, which are considered excellent results. Considering the confidence intervals calculated between the cross-validation n-folds for a significance level of 5%, there is no significant difference between the accuracies obtained by the models applied to the covmingrad (Table 1A) and cov (Table 1B) images. The models applied to the ratio image, on the other hand, showed lower performance (Table 1C). The best training and testing times were obtained with the DeepLoc embedder for all tests performed. The confusion matrices of the highest mean accuracies obtained through the CNNs tested for each type of input image are presented in Table 2. Table 2 shows that the highest percentage of correct selective logging classifications was obtained with the covmingrad image and the Painters model, and therefore its generalization capacity was tested by applying it to the unlabeled images (186,886 images).
The confusion matrix containing the results (percentage in relation to the prediction) of the generalization test, obtained by crossing unlabeled images with ground truth, is shown in Table 3. The Global Accuracy obtained was 87%. Figure 5 shows the bounding boxes of unlabeled subimages before and after classification.

Table 3. Confusion matrix of the generalization test, covmingrad image and Painters model (percentages relative to the prediction):

                                 Predicted
Ground truth                selective logging    non-logging
selective logging                87.5%              12.7%
non-logging                      12.5%              87.3%

The highest classification accuracies obtained for the unfiltered SAR data (covmingrad images) were reached by the Painters model (89.9%), followed by InceptionV3 (89.4%) and SqueezeNet (89.1%) (Table 4). However, at a significance level of 5%, in this case it is also not possible to state that there is a significant difference between the results obtained by the different models. The Painters method produced 18.8% commission errors and 8.6% omission errors (Table 5). The Brazilian government, through the Brazilian Forest Service, has developed a system called DETEX [62] for mapping selective logging. Figure 6 shows the polygons of areas affected by illegal logging in 2018. In the same figure, it is possible to observe that the detections through CNNs (in magenta) have the advantage of delimiting the scar in the canopy left by each tree or set of removed trees, while DETEX (in yellow) presents only a polygon delimiting the affected area.
This precise delimitation of the scar allows, given the correlation between the area of clearings resulting from forest exploitation and the IEX (Exploitation Intensity, m3/ha) presented by [63], the estimation of the IEX. In general, the areas of legal exploitation, that is, those under concession, have well-defined limits, facilitating the task of estimating the IEX. For illegal exploitation areas, an alternative we propose is the grouping of polygons classified as logging by the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) method [61], whose required parameters were the minimum group size (5 gaps) and the maximum distance between grouped gaps, defined as 60 m for legal logging and 100 m for illegal logging from the analysis of distances between gaps presented in Figure 7, calculating the mean values.
Figure 8 shows the grouped gap polygons. The clearings shown in white in Figure 8 are those that are geographically isolated. For the detection of illegal logging areas, for example, or even for monitoring forest concessions, these clearings could be neglected, as they represent, in many cases, natural tree falls or intense leaf loss. Since logging requires an infrastructure for timber transport, it presents a concentrated pattern of scars rather than a dispersed one. This approach can be especially useful for on-site enforcement efforts, directing operations to areas at an early stage of exploitation and reducing the occurrence of false positives.
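The grouping step described above can be sketched with scikit-learn's DBSCAN implementation. This is a minimal illustration, assuming gap centroids are given in a projected (metric) coordinate system; the coordinates below are hypothetical:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_gaps(centroids_m, eps_m, min_gaps=5):
    """Group logging-gap centroids with DBSCAN.

    centroids_m : (N, 2) array of x/y coordinates in metres
                  (a projected CRS such as UTM is assumed).
    eps_m       : maximum distance between grouped gaps
                  (60 m for legal, 100 m for illegal logging in this study).
    min_gaps    : minimum group size (5 gaps in this study).
    Returns cluster labels; -1 marks geographically isolated clearings,
    which can be neglected as likely natural tree falls.
    """
    return DBSCAN(eps=eps_m, min_samples=min_gaps).fit_predict(centroids_m)

# hypothetical centroids: two tight groups of five gaps plus one isolated gap
pts = np.array([[0, 0], [20, 10], [40, 0], [10, 30], [30, 30],
                [500, 500], [520, 510], [540, 500], [510, 530], [530, 530],
                [2000, 2000]])
labels = cluster_gaps(pts, eps_m=60, min_gaps=5)
```

Here the first two groups of gaps receive cluster labels, while the lone distant gap is labeled -1 (noise) and would be discarded before estimating the IEX per cluster.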
Figure 8. Polygons classified as logging (grouped; each color represents a group). In red, the Jamari NF limit.


Discussion
Given their low canopy penetrability and relatively low data availability, X-band data are less frequently considered in forestry studies. However, changes in canopy structure caused by vegetation removal can be perceived by sensors operating in high-frequency bands, as these contain more textural information [64], depending on factors such as biomass, forest structure, and terrain conditions, which reduce the intensity of backscattered energy, as evidenced by Bouvet et al. [65]. When trees in a forest are extracted, shadows appear or disappear at their edges, depending on the direction of the orbit, the position of the fragment in relation to the satellite, and the ground cover around the fragment [65]. The appearance of the shading effect is characterized by a sudden drop in backscatter in a multitemporal series of images acquired according to the same parameters (angle of view, sensor height, orbit, and image acquisition mode). An opposite phenomenon can also be observed on the opposite side of the deforested area: the appearance of an increase in backscattering, which occurs due to the double reflectance (double-bounce) effect exerted by the trunks of the remaining trees that are positioned in the direction of propagation of the radar signal [66].
This effect made it possible, as evidenced by Bouvet et al. [65], to detect selective logging occurring in an area of tropical forest in the Brazilian Amazon where selective logging is authorized by the government. Tests applying pre-trained CNNs to products of the bitemporal pair of COSMO-SkyMed images showed the X-band to be suitable for carrying out this type of detection, reaching an accuracy greater than 90% with the use of the Painters embedder on covmingrad images. It was able to correctly classify 85.4% of the subimages from the selective logging class and 93% from the non-logging class. Although they did not perform as well, the classifications of the cov and ratio images presented accuracy above 85%, having correctly classified selective logging subimages in 83% and 84.4% of cases, respectively.
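For illustration, the bitemporal products combined in the covmingrad composition (coefficient of variation, minimum values, and a gradient) can be computed per pixel from a coregistered image pair. The exact formulations of the study's processing chain are not reproduced here, so the following is an assumption-laden sketch:

```python
import numpy as np

def bitemporal_products(before, after, eps=1e-6):
    # `before` and `after` are coregistered amplitude images as float
    # arrays; `eps` avoids division by zero. These formulations are
    # illustrative guesses, not the paper's exact processing chain.
    stack = np.stack([before, after]).astype(float)
    mean = stack.mean(axis=0)
    cov = stack.std(axis=0) / (mean + eps)   # coefficient of variation (R)
    minimum = stack.min(axis=0)              # minimum values (G)
    gradient = after - before                # temporal gradient (B)
    ratio = after / (before + eps)           # ratio product
    return cov, minimum, gradient, ratio

before = np.full((2, 2), 2.0)
after = np.array([[2.0, 2.0],
                  [1.0, 4.0]])  # one darkened and one brightened pixel
cov, minimum, gradient, ratio = bitemporal_products(before, after)
```

Pixels that darken between dates (canopy gaps) raise the coefficient of variation and lower the minimum, which is what makes the covmingrad composition sensitive to logging scars.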
In general, all models applied to covmingrad and cov filtered and unfiltered images presented good performance, with accuracy above 90%, which demonstrates the ability of these models to classify selective logging in SAR images. These results are similar, in terms of accuracy, to those obtained by [57] with the application of an Artificial Neural Network Multi-Layer Perceptron (ANN-MLP) on attributes extracted by the authors from the products of the same bitemporal COSMO-SkyMed images used in this study. The CNN approach eliminates the attribute generation step, reducing processing time, and its main advantage compared to its predecessors is that it automatically detects and learns texture and context features that describe the target without any human supervision. However, in general, the training time for CNNs was higher than for ANN-MLPs, as CNNs are pre-trained on a large dataset (ImageNet, for example). As the name indicates, convolutional networks explore, through multiple filters applied in the convolution windows, textural characteristics of images. Thus, neighborhood relationships between targets (context) are not taken into account, which is an important parameter in the detection of activities such as logging. Kuck et al. [57] explored, in addition to textural parameters, spectral, spatial, and context parameters, and obtained results similar to those of CNNs. A future approach could combine the attributes of the two ML techniques.
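The embed-then-classify workflow described above can be sketched as follows. The deep embedder is replaced by a stand-in of simple image statistics so the pipeline runs end to end, and all subimages are synthetic; in the study itself, embeddings came from pre-trained CNNs (Painters, Inception V3, SqueezeNet):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def embed(subimage):
    # Placeholder embedding: mean, spread, and simple texture measures.
    # In the study, this step is a pre-trained CNN feature extractor.
    return np.array([subimage.mean(), subimage.std(),
                     np.abs(np.diff(subimage, axis=0)).mean(),
                     np.abs(np.diff(subimage, axis=1)).mean()])

rng = np.random.default_rng(0)
# hypothetical 32x32 subimages: "logging" patches darker and more
# variable (canopy-gap shadows), "forest" patches brighter and smoother
logging = [rng.normal(0.3, 0.15, (32, 32)) for _ in range(40)]
forest = [rng.normal(0.6, 0.05, (32, 32)) for _ in range(40)]
X = np.stack([embed(s) for s in logging + forest])
y = np.array([1] * 40 + [0] * 40)

# shallow classifier on top of the frozen embeddings
clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, y, cv=5).mean()
```

The design point is that only the small classifier on top of the embeddings is trained on the SAR subimages, which is what makes transfer learning feasible with the limited sample sizes typical of logging-scar datasets.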
The Painters model was selected for the generalizability test, as it showed the greatest success for the selective logging class. The Painters embedder was developed within the framework of the Painter by Numbers competition at Kaggle [55], whose objective was to examine pairs of paintings to determine whether they were painted by the same artist. The training set consisted of artworks and their corresponding class labels (painters). Its ability to identify the unique styles of painters, in works that present abstract rather than standard features (albeit with characteristic styles and traits in the case of works by the same painter), may have been an important characteristic for the classification of selective logging and non-logging, which likewise do not show a shape pattern (unlike the identification of a face, for example, where there is a characteristic pattern).
The classification test of the covmingrad image from the unfiltered bitemporal COSMO-SkyMed pair resulted in 90% accuracy with the Painters embedder, which was able to correctly classify 81.2% of the selective logging subimages. This result indicates that CNNs are able to correctly classify these targets even under the speckle effect, which, in general, makes it difficult or even impossible to identify targets in SAR images, and whose suppression or minimization is the focus of several studies in the microwave remote sensing area [46].
Regarding the grouping methodology presented, in Figure 7 it is possible to see that the distribution is concentrated to the left for the areas of illegal selective logging; however, it has a greater variance than the distances presented for legal selective logging. This is because legal selective logging is planned to cause the least possible impact on the forest: the skid trails are used to transport more than one log, and the trees to be cut are selected according to a plan based on the forest inventory [63]. In illegal logging, this planning and control do not take place. Logging intensity is an important metric given its correlation with damage to the remaining forest. The low understory damage values found are, in part, explained by the low exploitation intensity [63].
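The distance analysis behind Figure 7 can be reproduced in spirit by computing each gap's nearest-neighbour distance. A minimal sketch, assuming gap centroids in a projected (metric) CRS and using hypothetical coordinates:

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_distances(centroids):
    # Distance from each gap centroid to its nearest neighbour, in the
    # units of the projected coordinate system. The distribution of these
    # distances guided the choice of the DBSCAN eps parameter
    # (60 m legal / 100 m illegal in this study).
    tree = cKDTree(centroids)
    dist, _ = tree.query(centroids, k=2)  # k=2: first hit is the point itself
    return dist[:, 1]

# hypothetical centroids (metres)
gaps = np.array([[0.0, 0.0], [0.0, 30.0], [40.0, 0.0], [500.0, 500.0]])
d = nn_distances(gaps)
```

Comparing the spread of these distances for legal versus illegal areas quantifies the concentrated-versus-dispersed scar patterns the text describes.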
Our study is, at the moment, the first to present an alternative for the operational monitoring of selective logging based on X-band SAR data and CNNs, which allows monitoring even in periods of high cloud cover in the Amazon, comprising the months between October and April, thereby overcoming a limitation of optical data [17]. This work demonstrates that such monitoring on a large scale is possible, since a well-trained network can have high generalizability. Mitchell et al. [14], in their 2017 review of remote sensing applied to the study of forest degradation, reported that initiatives using X-band for fine-scale detection (logging scars) had, until that time, no large-scale demonstration or operational application, and that in all case studies presented, X-band images were acquired in spotlight mode (VHR covering only a small geographic area) [67]. Having demonstrated high generalizability, the technique presented here may represent an advance both for operational monitoring and for large-scale application, although new tests are required. Another advance of this study is the possibility of detection at the individual level. Although Bullock et al. [15] obtained high accuracy in detecting degradation through Landsat (optical) images, their detections are restricted to intense disturbances (patches of degraded areas caused by fire or logging infrastructure). One of the first tree-level estimates of tree loss obtained 64% accuracy with a Random Forest model and multitemporal VHR imagery from the WorldView-2 and GeoEye-1 satellites [16]. However, this low accuracy was attributed to view-illumination geometry issues, inherent in optical data, that create shadows not associated with treefalls and tree loss and confuse the classifier. Our estimates, and SAR data in general, do not suffer from these issues, enabling more precise detection of tree loss associated with logging.
Multitemporal SAR data, if acquired under the same geometric and radiometric parameters, present differences related to land cover changes, the presence of dense meteorological formations, or changes in the moisture content of the targets. These differences affect backscatter and can produce false detections [5]. More studies should be carried out to quantify the effect of these artifacts on detections through CNNs, although [57] have shown that ANN-MLPs are capable of separating these artifacts from selective logging.
Although our method achieved high accuracy, there are still caveats to be acknowledged. The method confused selective logging with non-logging, probably due to foliar loss, the natural death of trees, or canopy geometry (trees hidden in the shadow of their neighbors). This hypothesis must be tested through a field survey that specifically addresses these features. Another potential factor that could affect our estimates is seasonality and moisture content due to rainfall in the forest. In this experiment, we controlled for seasonality by choosing a pair of images both acquired during the dry season. Furthermore, a previous study pointed out that seasonality might not affect degradation detections using machine learning with X-band SAR data [57]. These caveats thus constitute some of the directions on which future studies can build to improve the methodology. Finally, we only tested COSMO-SkyMed SAR data; other sensors, such as ICEYE and Sentinel-1, could be tested, including other bands, such as C-band, and other acquisition geometries. We expect this method to be continuously improved and perhaps used for operational monitoring, given the availability of SAR data for tropical forest areas.
The deep learning model experiments provided in this paper are a first step towards monitoring tropical forest degradation. This is an important topic in the face of climate change and the deforestation and degradation reductions pledged by the Brazilian government up to 2030, as recently highlighted at COP26. We believe our approach can be improved to reach the scale necessary for operational monitoring towards achieving those difficult but important goals. For this purpose, investment in SAR data and computing resources by the government would of course be required. Nevertheless, this is a step forward in fighting forest crime and helping mitigate climate change.

Conclusions
Bitemporal features generated from the pair of SAR images used in this study, acquired in X-band and HH polarization by the COSMO-SkyMed constellation before and after the period of legal logging in the Jamari NF, in conjunction with the CNN techniques employed, enabled the detection of scars caused by selective logging in both legal and illegal logging areas. The highest success rate for the selective logging class was obtained by the Painters model. However, in relation to accuracy, all models showed similar performance.
The present study represents an evolution of the study presented by [57], with the advantage that the convolutional network itself extracts the image attributes, eliminating the need for this step in the classification process. The reduction of stages is especially important when the objective is the systematic and operational monitoring of the entire Brazilian Amazon territory, which has an area larger than 5 million km2. It is suggested that further studies explore deep learning techniques such as U-Net, based on semantic segmentation, taking the bitemporal images themselves as input and eliminating the need to generate bitemporal products (coefficient of variation, ratio, minimum values, and gradient, for example). It is also suggested to increase the training samples and carry out tests in areas of different biophysical composition to estimate the generalization capacity of these networks in different environments, and to expand the findings of this study and its advances to automate the application at a regional scale.
The tests showed that CNNs were able to present good results, in the case studied, even when applied to bitemporal products from unfiltered images. Many studies have presented alternatives to reduce the speckle effect on SAR images given that such an effect reduces the target detection and classification capacity of these images. It is suggested that future studies be carried out to measure the contribution of this effect in reducing the performance of techniques based on machine learning for the classification of SAR images.
The DBSCAN clustering method was presented as an alternative for identifying areas at an early stage of illegal selective logging, as well as for measuring the intensity of logging in legal and illegal areas. Although this line of work has already been started, more studies should be carried out to establish the correlation between the area of clearings and the intensity of exploitation.

Data Availability Statement:
The data presented in this study are available upon request from the author.