Convolutional Neural Network with a Learnable Spatial Activation Function for SAR Image Despeckling and Forest Image Analysis

Synthetic aperture radar (SAR) images are often corrupted by speckle noise, which makes SAR image interpretation more difficult, so speckle suppression is an essential pre-processing step. In recent years, approaches based on convolutional neural networks (CNNs) have achieved good results in SAR image despeckling. However, these CNN-based approaches usually require large computational resources, especially when the training data are large. In this paper, we propose a SAR image despeckling method using a CNN with a new learnable spatial activation function, which requires significantly fewer network parameters without degrading performance relative to state-of-the-art despeckling methods. Specifically, we redefine the rectified linear unit (ReLU) function by adding a convolutional kernel that produces a weight map for each pixel, making the activation function learnable. We designed several experiments to demonstrate the advantages of our method. In total, 400 Google Earth images covering various scenes were selected as the training set, and 10 Google Earth images including an athletic field, buildings, a beach, and bridges were used as the test set. The method achieved good despeckling results both visually and in terms of quantitative indexes (peak signal-to-noise ratio (PSNR): 26.37 ± 2.68; structural similarity index (SSIM): 0.83 ± 0.07 across different speckle noise levels). Extensive experiments on synthetic and real SAR images demonstrate the effectiveness of the proposed method, which showed a superior despeckling effect and higher equivalent number of looks (ENL) values than existing methods. The method was also applied to images of coniferous forest, broad-leaved forest, and conifer broad-leaved mixed forest and showed a good despeckling effect (PSNR: 23.84 ± 1.09; SSIM: 0.79 ± 0.02).
Our method provides a robust deep-learning-based framework for speckle noise suppression in various remote sensing images.


Introduction
With the development of radio technology, radar is used not only in military fields such as target detection [1,2] and reconnaissance [3,4] but also in weather forecasting [5], environmental protection [6,7], and other applications. Synthetic aperture radar (SAR) [8,9] is an efficient type of radar system that can generate high-resolution images from moving platforms such as airplanes and satellites. As the platform moves, the radar transmits electromagnetic waves that scan the ground targets and records the echo signals they reflect. The radar image is then synthesized from the collected two-dimensional echo signal.
Compared with optical and infrared imaging systems, SAR has an inherent all-day, all-weather acquisition capability, which makes some difficult tasks possible, such as detecting hidden targets and interferometry [10,11]. However, SAR images are often corrupted by speckle, which is formed by the interference of the echoes within each resolution cell and makes SAR image processing and interpretation difficult. Image despeckling is therefore crucial and is used as a pre-processing step in various SAR applications [12,13].
For SAR images, the main source of contamination is multiplicative speckle noise, which can be modeled as

$$Y = X \odot N \tag{1}$$

where Y denotes the observed intensity image of size W × H, X is the corresponding clean image of the same size, N is the speckle noise factor, and ⊙ denotes elementwise multiplication. For an intensity SAR image, N follows a Gamma distribution with unit mean and variance 1/L [14]:

$$p(N) = \frac{L^{L} N^{L-1} e^{-LN}}{\Gamma(L)} \tag{2}$$

where L ≥ 1, N ≥ 0, Γ(·) denotes the Gamma function, and L is the equivalent number of looks (ENL). CNN-based speckle suppression methods aim to learn the nonlinear mapping between noisy images and the corresponding clean images.
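The multiplicative model with Gamma-distributed speckle described above can be simulated in a few lines. The sketch below is a minimal illustration, assuming NumPy's Gamma sampler with shape L and scale 1/L, which gives the unit mean and 1/L variance stated above; the helper names `gamma_speckle` and `add_speckle` are hypothetical, not taken from the authors' code.

```python
import numpy as np


def gamma_speckle(shape, looks, rng):
    """Unit-mean Gamma speckle with variance 1/looks."""
    # Shape L and scale 1/L give mean L * (1/L) = 1 and variance L * (1/L)**2 = 1/L.
    return rng.gamma(shape=looks, scale=1.0 / looks, size=shape)


def add_speckle(clean, looks, rng):
    """Multiplicative speckle model: observed = clean * noise (elementwise)."""
    return clean * gamma_speckle(clean.shape, looks, rng)


rng = np.random.default_rng(0)
noise = gamma_speckle((512, 512), looks=4, rng=rng)
print(noise.mean(), noise.var())  # close to 1.0 and 0.25
```

Because the noise multiplies the signal, bright regions receive proportionally stronger fluctuations, which is why additive-noise denoisers are not directly suitable.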
Since the 1980s, despeckling methods have been proposed based on various techniques, such as multilook processing [15][16][17][18], spatial domain filters [19][20][21][22], wavelet transforms [23][24][25][26], nonlocal filtering [27][28][29][30], and total variation [31][32][33][34]. Multilook processing suppresses speckle noise simply and effectively, but it reduces the resolution of the SAR image. Spatial domain filters can effectively suppress noise, but they tend to oversmooth edges and details. Wavelet-transform-based methods outperform spatial domain filters in speckle suppression; however, they still cannot preserve the texture details of the image effectively. Nonlocal methods, such as probabilistic patch-based (PPB) filtering [28] and SAR block-matching 3D (SAR-BM3D) [29], achieve better results in both speckle suppression and texture preservation. Their basic idea is that an image contains many similar blocks, and the self-similarity between these blocks is exploited. However, searching for similar blocks increases the computational complexity of nonlocal methods. Although the above methods have achieved good despeckling results, some of them still struggle to preserve fine details in regions with complex texture.
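The resolution/noise trade-off of multilook processing can be illustrated numerically. This is a hedged sketch, assuming a flat (homogeneous) scene, single-look intensity speckle drawn from an exponential distribution (the L = 1 case of the Gamma model above), and statistically independent neighboring pixels; in real SAR data neighboring pixels are correlated, so the ENL gain would be smaller. The function name `multilook` is illustrative.

```python
import numpy as np


def multilook(intensity, factor):
    """Average non-overlapping factor x factor blocks of an intensity image."""
    h, w = intensity.shape
    h, w = h - h % factor, w - w % factor  # crop to a multiple of the block size
    blocks = intensity[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))  # each output pixel averages factor**2 samples


rng = np.random.default_rng(0)
single_look = rng.exponential(scale=1.0, size=(512, 512))  # L = 1 intensity speckle, flat scene
four_look = multilook(single_look, 2)                       # 2 x 2 blocks: about 4 looks
print(single_look.shape, four_look.shape)                   # (512, 512) (256, 256)
print(four_look.mean() ** 2 / four_look.var())              # ENL close to 4
```

Averaging 2 × 2 blocks roughly quadruples the ENL but halves the resolution in each direction, which is exactly the drawback noted above.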
In recent years, with the development of computer hardware, various methods based on deep convolutional neural networks have been successfully applied to image denoising [35]. Compared with traditional SAR despeckling algorithms, deep neural networks are more powerful for solving complex nonlinear problems. Chierchia [36] used homomorphic processing and residual learning [37] to train a convolutional neural network on log-transformed images. Wang proposed the image despeckling convolutional neural network (ID-CNN), which uses a component-wise division-residual layer to estimate speckle [38]. Zhang [39] combined skip connections [40] and dilated convolution [41] for SAR despeckling. Similarly, Gui [42] proposed a network using dilated convolution and a densely connected network [43]. Lattari [44] successfully used the U-Net CNN architecture for SAR image speckle suppression. Moreover, some researchers have proposed SAR despeckling schemes that combine CNNs with other methods, such as nonlocal methods [45][46][47] and guided filtering (GFF) methods [48][49][50]. Although CNN-based methods have been applied successfully to despeckling, these networks are becoming deeper and wider, which leads to heavy computation in both training and inference. To reduce the number of network parameters, we propose a method using a new learnable spatial activation function based on xUnit [51]. With the same number of parameters, this activation function obtains better despeckling results than commonly used functions such as the rectified linear unit. Moreover, it achieves the same performance as the original structure with fewer layers, avoiding more complex network structures.
In this study, we aimed to develop a better convolutional neural network method for SAR image speckle suppression, and we therefore introduce a speckle suppression method based on a learnable activation function. To improve speckle suppression performance while minimizing the use of computing resources, an activation function with learnable parameters is proposed by viewing ReLU, a common activation function, as a threshold unit. Comparative experiments were designed to analyze the performance of this activation function. The contributions of this paper are as follows: first, a novel algorithm for speckle suppression is proposed; second, a complete experimental pipeline is established, from theory to simulation and then to application on real SAR images; third, the proposed method is not limited to SAR images but is also applied to forest image denoising for comparison.
The rest of the paper is organized as follows. Section 2 introduces the proposed scheme, including the network architecture, the modified xUnit (M-xUnit), and the evaluation indexes of SAR image speckle suppression. Section 3 presents the results on synthetic and real SAR images and compares them with other state-of-the-art methods. Section 4 presents the discussion, and Section 5 concludes the paper.

Structure Design of Learnable Activation Function
Convolution and average pooling are linear operations, and a stack of linear layers remains linear, so a network built only from such layers can solve only linear problems, whereas most practical problems are nonlinear. This is why a CNN needs activation layers, whose role is to inject nonlinearity into the network so that it can approximate complex functions and handle practical problems. Common activation functions include the logistic function (also known as the Sigmoid function) [52], the hyperbolic tangent function (Tanh) [53], and the rectified linear unit (ReLU) [54]. Their expressions are as follows:

$$\mathrm{Sigmoid}(x) = \frac{1}{1 + e^{-x}}, \qquad \mathrm{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}, \qquad \mathrm{ReLU}(x) = \max(0, x) \tag{3}$$

As Figure 1 shows, the Sigmoid function compresses the input signal into the interval (0, 1) [52], so it is mainly used to transform the input into a probability. However, when the input is very large or very small, the output saturates near 1 or 0 and the gradient approaches zero, which causes the vanishing gradient problem. The Tanh function can be obtained by scaling and translating the Sigmoid function [53]. Tanh is zero-centered and converges faster than the Sigmoid function, but it still suffers from vanishing gradients. ReLU is therefore the most commonly used activation function: it produces sparse activations, avoids vanishing gradients in the positive interval, and converges much faster than the Sigmoid and Tanh functions. However, because its gradient is zero in the negative interval, the parameters of units that receive only negative inputs are difficult to update.
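The properties of the three activation functions stated above can be checked directly. The following sketch uses only the Python standard library; it verifies that Tanh is a scaled and shifted Sigmoid, tanh(x) = 2·Sigmoid(2x) − 1, and compares the Sigmoid gradient, which never exceeds 0.25 and vanishes for large |x|, with the ReLU gradient. The helper names are illustrative.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x):
    # Tanh is a scaled and shifted Sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    return max(0.0, x)


def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, and close to 0 for large |x|


def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 on the positive side, 0 on the negative side


for x in (-10.0, -1.0, 0.5, 10.0):
    print(x, tanh_via_sigmoid(x) - math.tanh(x), sigmoid_grad(x), relu_grad(x))
```

The zero ReLU gradient for negative inputs is the "difficult to update" behavior mentioned above.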
Several improved activation functions have also been proposed to address these problems. Although there are many kinds of convolutional neural network structures, they are basically similar, consisting mainly of convolutions and activation functions. Mathematically, the relationship between layer k and layer k + 1 is

$$x^{k+1} = f\left(W^{k} x^{k} + b^{k}\right) \tag{4}$$

where $x^{k}$ is the input of layer k, $W^{k}$ is the convolution operation, $b^{k}$ is the bias term, f(·) is the nonlinear activation function, and $x^{k+1}$ is the input of layer k + 1.
Taking the ReLU activation function as an example, since f(0) = 0, the input of layer k + 1 in Equation (4) can be rewritten as

$$x^{k+1} = g^{k} \odot z^{k}, \qquad z^{k} = W^{k} x^{k} + b^{k} \tag{5}$$

where ⊙ denotes the Hadamard product, that is, the elementwise product of two matrices, and $g^{k}$ is the weight map associated with $z^{k}$, defined as

$$g^{k} = \frac{f\left(z^{k}\right)}{z^{k}} \tag{6}$$

with the division taken elementwise and $g^{k}$ set to 0 wherever $z^{k}$ is 0. Differentiating the ReLU curve in Figure 1c gives

$$f'(z) = \begin{cases} 1, & z > 0 \\ 0, & z \le 0 \end{cases} \tag{7}$$

Although CNNs with various complex structures have been proposed to improve SAR speckle suppression, these structures cannot avoid a large number of network parameters. Here, a modified xUnit activation is proposed and incorporated into the ID-CNN structure to further improve despeckling performance. The ReLU curve in Figure 1c yields the derivative curve shown in Figure 1d, so ReLU can be regarded as a threshold unit (0 or 1), as shown in Figure 2a, where $g^{k}$ denotes the threshold unit and $x^{k}$ and $x^{k+1}$ denote the input and output of the unit, respectively. By contrast, a learnable spatial activation function constructs a weight map in the range [0, 1] in which each element depends on the spatial neighborhood of the corresponding input element [51].
It follows that $g^{k}$ is a threshold unit determined by $z^{k}$:

$$g^{k} = \begin{cases} 1, & z^{k} > 0 \\ 0, & z^{k} \le 0 \end{cases} \tag{8}$$

Remote Sens. 2021, 13, 3444 5 of 22

Equation (8) represents the redefined ReLU activation function, whose structure is shown in Figure 2a, where the Multiply node denotes the elementwise (Hadamard) product.
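The threshold-unit view of ReLU (Equation (8)) says that ReLU is just a 0/1 gate multiplied elementwise with its input. A minimal NumPy sketch (helper names are hypothetical) confirms that the gated form reproduces max(z, 0); note that each gate value depends only on the element itself, which is the limitation M-xUnit addresses.

```python
import numpy as np


def relu_gate(z):
    """Threshold unit: 1 where z > 0, else 0 (each entry depends only on itself)."""
    return (z > 0).astype(z.dtype)


def gated_relu(z):
    """ReLU written as a gate times its input: x_next = g * z (Hadamard product)."""
    return relu_gate(z) * z  # '*' on NumPy arrays is elementwise


z = np.array([[-1.5, 0.0], [0.3, 2.0]])
print(relu_gate(z))
print(np.array_equal(gated_relu(z), np.maximum(z, 0.0)))  # True
```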
As Figure 2a shows, the input $x^{k}$ is convolved to obtain $z^{k}$. ReLU can be regarded as setting the threshold unit $g^{k}$ according to each element of $z^{k}$ and multiplying it by $z^{k}$ to obtain the output $x^{k+1}$. In the M-xUnit function proposed in this section, by contrast, $g^{k}$ depends not only on the corresponding element of $z^{k}$ but also on its spatially adjacent elements. The basic idea is to construct a weight map in which the weight of each element depends on the spatially adjacent input elements, a relationship that can be realized by a convolution. As shown in Figure 2c, the weight map is obtained by passing the input through a ReLU activation, a depthwise convolution, and the Tanh function, which maps the weights into the range [−1, 1]. The depthwise convolution makes the weight of each element depend on its spatially adjacent elements. Tanh addresses the fact that the output of ReLU is not zero-centered, and it compensates for ReLU setting every element less than or equal to zero to zero, which avoids units that are never activated.
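In Figure 2c, the weight map $g^{k}$ of the M-xUnit activation unit is defined as

$$g^{k} = \tanh\left(d^{k}\right), \qquad d^{k} = H^{k} * \mathrm{ReLU}\left(z^{k}\right) \tag{9}$$

where $H^{k}$ denotes the depthwise convolution, $*$ denotes convolution, and $d^{k}$ is the output of the depthwise convolution; as in Equation (5), the output of the unit is $x^{k+1} = g^{k} \odot z^{k}$.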
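A forward pass of the M-xUnit weight map described above (ReLU, depthwise convolution, then Tanh, followed by the Hadamard product with z) can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the gate branch takes ReLU(z) as input as described for Figure 2c, the depthwise convolution uses zero padding and a per-channel bias, and the helper names are hypothetical. The explicit loops favor clarity over speed; in PyTorch the same depthwise layer would typically be written as `nn.Conv2d(C, C, kernel_size=9, padding=4, groups=C)`.

```python
import numpy as np


def depthwise_conv2d(x, kernels, bias):
    """Per-channel 'same' convolution: channel c is filtered only by kernels[c].

    x: (C, H, W); kernels: (C, k, k) with odd k; bias: (C,).
    Like PyTorch's Conv2d, this computes cross-correlation (no kernel flip).
    """
    c, h, w = x.shape
    k = kernels.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H x W
    out = np.empty_like(x, dtype=float)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(xp[ch, i:i + k, j:j + k] * kernels[ch]) + bias[ch]
    return out


def m_xunit(z, kernels, bias):
    """Sketch of the M-xUnit: g = tanh(H * ReLU(z)), output g * z (elementwise)."""
    d = depthwise_conv2d(np.maximum(z, 0.0), kernels, bias)  # depends on a k x k neighborhood
    g = np.tanh(d)                                           # weight map in [-1, 1]
    return g * z                                             # Hadamard product with z


rng = np.random.default_rng(0)
z = rng.standard_normal((4, 16, 16))             # 4 feature maps (toy size)
kernels = 0.05 * rng.standard_normal((4, 9, 9))  # 9 x 9 depthwise kernels, as in the text
bias = np.zeros(4)
print(m_xunit(z, kernels, bias).shape)           # (4, 16, 16)
```

Unlike the ReLU gate, each weight here depends on a 9 × 9 neighborhood of the input, which is the "spatial" part of the activation.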
To make the xUnit shown in Figure 2b perform better in the ID-CNN architecture and the SAR despeckling task, its Gaussian function was replaced with a hyperbolic tangent function (Tanh), and its two batch normalization (BN) [55] layers were removed. Tanh maps the dynamic range to [−1, 1], so its output is zero-centered, and it approximates the identity function for inputs near zero. BN layers were found not to improve SAR despeckling performance; a possible reason is that normalizing the hidden-layer outputs with BN destroys the distribution of the original space [39,56]. The modified xUnit is shown in Figure 2c, where Conv2d denotes an ordinary convolution and Conv2d Depthwise denotes a depthwise convolution [57] with a kernel size of 9 × 9. Figure 3 shows the difference between depthwise and ordinary convolution. In a depthwise convolution, each kernel corresponds to exactly one channel and each channel is convolved by only one kernel, so the number of output channels equals the number of input channels. Compared with ordinary convolution, depthwise convolution has fewer parameters and a lower computational cost, which is the main reason why M-xUnit adds few parameters. Although M-xUnit has more parameters than a parameter-free activation function such as ReLU, the increase is small compared with the parameters of the whole network, and the speckle suppression performance of the network improves. Assume that the input image has size M × N with m channels, the convolution kernels have size k × k, and there are n kernels.
As shown in Figure 3, in an ordinary convolution each output channel is computed from all m input channels, so the computational cost is M × N × k × k × m × n. In a depthwise convolution, each output channel is computed from only one input channel with one kernel (so n = m), and the cost is M × N × k × k × n. The computational cost of depthwise convolution is therefore 1/m that of ordinary convolution.
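The cost expressions above can be evaluated for concrete sizes. The sizes below (a 40 × 40 patch, 9 × 9 kernels, 64 channels) are assumptions chosen for illustration; only the ratio 1/m follows from the text, and the function names are hypothetical.

```python
def conv_macs(M, N, k, m, n):
    """Multiply-accumulates of a standard k x k convolution: m input -> n output channels."""
    return M * N * k * k * m * n


def depthwise_macs(M, N, k, m):
    """Depthwise k x k convolution: one kernel per channel, so n = m outputs."""
    return M * N * k * k * m


M, N, k, m = 40, 40, 9, 64  # assumed sizes: 40 x 40 patch, 9 x 9 kernel, 64 channels
std = conv_macs(M, N, k, m, n=m)
dw = depthwise_macs(M, N, k, m)
print(std, dw, dw / std)  # 530841600 8294400 0.015625
```

With 64 channels, the depthwise layer costs 1/64 of a standard convolution of the same kernel size.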
However, depthwise convolution cannot change the number of feature maps, and because the input channels are convolved separately, it cannot exploit the information of different channels at the same spatial position. A pointwise (1 × 1) convolution is therefore usually added to combine the feature maps produced by the depthwise convolution into new feature maps. The combination of the two forms a depthwise separable convolution, which is well suited to lightweight networks for mobile devices and is the core of MobileNet [57] and Xception [58]. However, the basic idea of the M-xUnit activation function is to relate the weight map to the spatially adjacent elements of each input element. This requires only one convolution and does not need to combine information across channels at the same spatial position. Therefore, only one depthwise convolution layer is added to the M-xUnit activation unit.
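For reference, the sketch below compares the weights of a standard convolution with those of a full depthwise separable convolution (depthwise plus pointwise), ignoring biases; the ratio reduces to 1/n + 1/k², the usual MobileNet-style saving. M-xUnit keeps only the depthwise term. The 3 × 3, 64-channel sizes and the function names are assumptions for illustration.

```python
def standard_params(k, m, n):
    return k * k * m * n  # one k x k x m kernel per output channel


def separable_params(k, m, n):
    depthwise = k * k * m  # one k x k kernel per input channel
    pointwise = m * n      # 1 x 1 convolution mixing the m channels into n
    return depthwise + pointwise


k, m, n = 3, 64, 64
ratio = separable_params(k, m, n) / standard_params(k, m, n)
print(ratio, 1 / n + 1 / k ** 2)  # both equal 1/n + 1/k^2
```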

Loss Function
The loss function is a basic and critical component that measures how well a model's predictions match the targets [59,60]; defining and optimizing a suitable loss allows deep learning to be applied effectively to various tasks. In Section 2.2, a combination of the Euclidean loss and the total variation (TV) loss is used as the loss function. The Euclidean loss minimizes the error between the estimated image and the target image, and the TV loss smooths the predicted image. They are defined as

$$L_{E} = \frac{1}{WH} \sum_{w=1}^{W} \sum_{h=1}^{H} \left(\hat{X}^{w,h} - X^{w,h}\right)^{2}$$

$$L_{TV} = \sum_{w=1}^{W} \sum_{h=1}^{H} \sqrt{\left(\hat{X}^{w+1,h} - \hat{X}^{w,h}\right)^{2} + \left(\hat{X}^{w,h+1} - \hat{X}^{w,h}\right)^{2}}$$

$$L = L_{E} + \lambda_{TV} L_{TV}$$

where $L_{E}$ is the Euclidean loss, $L_{TV}$ is the TV loss, W and H are the width and height of the image, and $X^{w,h}$ and $\hat{X}^{w,h}$ are the pixel values of the clean image and the estimated image, respectively. The TV weight $\lambda_{TV}$ is set to 2 × 10^−7 so that the Euclidean loss dominates.
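The combined loss can be sketched in NumPy as below. This is a minimal illustration, assuming forward differences evaluated only where both neighbors exist (the text does not specify boundary handling) and the TV weight of 2 × 10^−7 given above; a training implementation would compute the same quantities on framework tensors so that gradients are available. The function names are hypothetical.

```python
import numpy as np


def euclidean_loss(estimate, clean):
    """L_E: mean squared error over the W x H pixels."""
    return np.mean((estimate - clean) ** 2)


def tv_loss(estimate):
    """L_TV: isotropic total variation of the estimate (forward differences)."""
    dx = estimate[1:, :-1] - estimate[:-1, :-1]  # difference along the first axis (w)
    dy = estimate[:-1, 1:] - estimate[:-1, :-1]  # difference along the second axis (h)
    return np.sum(np.sqrt(dx ** 2 + dy ** 2))


def total_loss(estimate, clean, lambda_tv=2e-7):
    """L = L_E + lambda_TV * L_TV."""
    return euclidean_loss(estimate, clean) + lambda_tv * tv_loss(estimate)


rng = np.random.default_rng(0)
clean = rng.random((40, 40))
estimate = clean + 0.01 * rng.standard_normal((40, 40))
print(euclidean_loss(estimate, clean), tv_loss(estimate), total_loss(estimate, clean))
```

With such a small weight, the TV term only nudges the solution toward smoothness while the pixel-wise error drives training.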

Evaluation Index of SAR Image Speckle Suppression
The quality of despeckled images can be judged both subjectively and objectively. Subjective evaluation observes, analyzes, and judges the speckle suppression result by human vision, focusing mainly on the preservation of image texture and detail. Objective evaluation uses quantitative indexes; commonly used ones are the peak signal-to-noise ratio (PSNR) [61], the structural similarity index (SSIM) [62], the equivalent number of looks (ENL) [63], the mean value [64], and the standard deviation [65]. PSNR and SSIM compare the result with an undistorted reference image, whereas ENL needs no reference. In this paper, PSNR and SSIM are used to evaluate the simulated SAR experiments, and ENL is used to evaluate the real SAR image experiments.
(1) PSNR. PSNR is the most widely used objective index based on the error between corresponding pixels and is usually defined through the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \left[X(i,j) - Y(i,j)\right]^{2}$$

where X and Y are two images of size m × n. PSNR is then defined as

$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\right)$$

where MAX is the maximum possible pixel value of the image. The higher the PSNR, the better the noise suppression.
(2) SSIM. SSIM measures the similarity between the denoised image and the reference image in terms of luminance, contrast, and structure. SSIM values generally lie between 0 and 1; the closer the value is to 1, the more similar the two images are and the better the processing result. SSIM is calculated as

$$\mathrm{SSIM}(X, Y) = \frac{\left(2\mu_{X}\mu_{Y} + c_{1}\right)\left(2\sigma_{XY} + c_{2}\right)}{\left(\mu_{X}^{2} + \mu_{Y}^{2} + c_{1}\right)\left(\sigma_{X}^{2} + \sigma_{Y}^{2} + c_{2}\right)}$$

where X and Y are the two images, $\mu_{X}$ and $\mu_{Y}$ are their means, $\sigma_{X}^{2}$ and $\sigma_{Y}^{2}$ are their variances, $\sigma_{XY}$ is their covariance, and $c_{1}$ and $c_{2}$ are small constants that prevent division by zero.
(3) ENL. ENL is a widely accepted speckle reduction index for SAR images that measures the smoothness of homogeneous regions; the larger the value, the smoother the region and the better the speckle suppression. It is defined as

$$\mathrm{ENL} = F_{C} \frac{\mu^{2}}{\sigma^{2}}$$

where μ and σ² are the mean and the variance of the region, respectively, and $F_{C}$ is a constant that depends on the SAR image format: $F_{C} = 1$ for intensity images and $F_{C} = 4/\pi - 1$ for amplitude images.
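The three indexes can be sketched as plain NumPy functions. These are hedged illustrations with hypothetical names: `global_ssim` evaluates the SSIM formula once over the whole image, whereas standard SSIM implementations average it over local (typically Gaussian-weighted 11 × 11) windows, and the constants c1 = (0.01·MAX)² and c2 = (0.03·MAX)² are the common defaults rather than values stated in this paper. `enl` uses the population variance and the F_C convention given above.

```python
import numpy as np


def psnr(x, y, max_val=255.0):
    """PSNR = 10 log10(MAX^2 / MSE)."""
    mse = np.mean((x.astype(float) - y.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)


def global_ssim(x, y, max_val=255.0):
    """SSIM formula evaluated once over the whole image (no sliding window)."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2  # common default constants
    mx, my = x.mean(), y.mean()
    vx, vy = ((x - mx) ** 2).mean(), ((y - my) ** 2).mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def enl(region, amplitude=False):
    """ENL = F_C * mu^2 / sigma^2 over a homogeneous region."""
    f_c = 4.0 / np.pi - 1.0 if amplitude else 1.0
    return f_c * region.mean() ** 2 / region.var()


rng = np.random.default_rng(0)
clean = rng.uniform(50, 200, size=(64, 64))
noisy = clean * rng.gamma(4, 0.25, size=clean.shape)  # 4-look intensity speckle
print(psnr(clean, noisy), global_ssim(clean, noisy))
flat = 100.0 * rng.gamma(4, 0.25, size=(32, 32))      # homogeneous region
print(enl(flat))                                      # close to 4
```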

Results
In this section, a series of experiments evaluates the performance of the proposed model. The despeckling results on synthetic SAR images are presented, and the despeckling performance is investigated as the number of network parameters changes. In addition, real SAR images are used to test the effectiveness of the proposed method, and its performance, measured by ENL, is compared with several state-of-the-art methods, including PPB, SAR-BM3D, and ID-CNN. Forest images are also used to verify the effectiveness of the method. Note that ID-CNN was shown to outperform PPB and SAR-BM3D in [38].

Performance Analysis of M-xUnit Activation Function
In these experiments, the NWPU-RESISC45 dataset [66] was used for training and testing. From this dataset, 400 images of 256 × 256 pixels were chosen for training and 10 images of 256 × 256 pixels were selected for testing. The images cover more than 100 countries and regions worldwide, including developing, transitioning, and highly developed economies, and were collected from Google Earth (Google Inc.) by experts in remote sensing image interpretation. The training and test images are shown in Figures 4 and 5, respectively. To augment the training data, the images were rescaled by factors of 1, 0.9, 0.8, and 0.7, and the rescaled images were randomly flipped and rotated. Patches of 40 × 40 pixels were extracted from the training images with a stride of 10, yielding 547,584 patches. Finally, speckle noise was applied to these patches to obtain synthetic SAR images. Figure 6a shows the SAR image simulation process: speckle noise is generated according to Equation (2) and then combined with the clean image through the multiplicative noise model. The model was trained for 60 epochs with a mini-batch size of 128 using the Adam optimizer [67,68] with default settings. The initial learning rate was 0.001 and was multiplied by a decay factor of 0.1 after 30 epochs.
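The patch extraction and learning-rate schedule described above can be sketched as follows. This is an assumption-laden illustration: rescaling, flipping, and rotation are omitted, and the schedule is written as a plain function; in PyTorch, `torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)` produces the same step decay. The function names are hypothetical.

```python
import numpy as np


def extract_patches(image, size=40, stride=10):
    """Slide a size x size window with the given stride over a 2-D image."""
    h, w = image.shape
    return np.stack([image[i:i + size, j:j + size]
                     for i in range(0, h - size + 1, stride)
                     for j in range(0, w - size + 1, stride)])


def learning_rate(epoch, base_lr=1e-3, decay=0.1, step=30):
    """Step schedule: base_lr for epochs 0-29, base_lr * 0.1 from epoch 30 on."""
    return base_lr * decay ** (epoch // step)


patches = extract_patches(np.zeros((256, 256)))
print(patches.shape)  # (484, 40, 40): 22 x 22 window positions
print(learning_rate(0), learning_rate(45))
```

A 256 × 256 image yields 22 window positions per axis at stride 10; the rescaled copies contribute fewer positions each.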
The proposed method was implemented in PyTorch, and all experiments were run under Windows 10 on a machine with a 3.7 GHz Intel Core CPU and an NVIDIA RTX 2080 GPU.
Inspired by the principle of xUnit [51], M-xUnit was applied to the SAR image despeckling task. In this paper, ID-CNN [38] was used to test the performance of this learnable spatial activation function. The noise estimation part of the network consists of eight convolutional layers. ID-CNN was chosen mainly because its structure does not include additional components such as dilated convolution, skip connections, or densely connected networks, which could otherwise affect the comparison. The proposed CNN architecture is shown in Figure 6b, and its detailed configuration is given in Table 1. Unlike ID-CNN, layers L2 to L7 use a series structure of convolution, batch normalization (BN), and M-xUnit. Figure 7a,b shows the average PSNR results of the two denoisers, whose activation functions are M-xUnit and ReLU, respectively. Figure 7a shows the results of ID-CNN with different numbers of M-xUnit layers; M-xUnit-1 means that only one "Conv + BN + M-xUnit" layer was placed in the middle of the network, giving three layers in total. Figure 7b presents the results of ID-CNN with the regular ReLU activation function, using the same layer setups as Figure 7a for comparison. ID-CNN with M-xUnit outperformed the original ID-CNN for all six layer setups. To further demonstrate the advantage of M-xUnit, Figure 7c,d plots performance against the number of parameters, showing the relationship between despeckling performance and network complexity. The PSNR and SSIM values obtained with two "Conv + BN + M-xUnit" layers were equivalent to those obtained with six "Conv + BN + ReLU" layers, and with three M-xUnit layers the network outperformed ID-CNN. The network with two "Conv + BN + M-xUnit" layers also required fewer than half the parameters of ID-CNN; the comparison is shown in Table 2.
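To see why two M-xUnit layers can need fewer than half the parameters of ID-CNN, the counts can be estimated under explicit assumptions. The sketch assumes 64 feature maps, 3 × 3 convolutions with biases, a first Conv + ReLU layer and a last Conv layer, learnable BN scale and shift in every middle layer, and a 9 × 9 depthwise convolution with bias inside each M-xUnit; these assumptions may differ from Table 1, so the numbers are illustrative rather than the values in Table 2. The function names are hypothetical.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Weights (and optional biases) of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def count_params(n_middle, activation, channels=64, k=3, dw_k=9):
    """Hypothetical count for: Conv+ReLU, n_middle x (Conv+BN+activation), Conv."""
    total = conv_params(k, 1, channels) + conv_params(k, channels, 1)  # first and last layers
    per_layer = conv_params(k, channels, channels) + 2 * channels      # conv + BN (gamma, beta)
    if activation == "m-xunit":
        # Depthwise: one dw_k x dw_k kernel (seeing a single channel) per channel, plus bias.
        per_layer += conv_params(dw_k, 1, channels)
    return total + n_middle * per_layer


print(count_params(6, "relu"), count_params(2, "m-xunit"))  # 223553 85825
```

Under these assumptions, the 9 × 9 depthwise kernel adds about 5 k parameters per layer, far less than the roughly 37 k of each 3 × 3 convolution it allows the network to drop.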
Comparing the structures of M-xUnit and ReLU in Figure 2 shows that simply replacing ReLU with M-xUnit increases the number of network parameters, which means more memory and running time in training and testing. However, in [51], the xUnit-based structure achieved the same performance as ReLU with fewer network layers, so fewer network parameters were needed overall.
To compare M-xUnit with the original xUnit on the ID-CNN structure, a preliminary experiment was conducted. Table 3 compares the average PSNR of the two structures on the 10 test images shown in Figure 5 with a speckle noise level of L = 10. M-xUnit and xUnit performed almost identically on the 10 test images, but the M-xUnit structure is simpler, so it requires fewer training parameters and fewer computational resources.

Results on Synthetic SAR Images
To verify the despeckling effectiveness at known noise levels, three speckle noise levels, L = 1, 4, and 10, were applied to the test images. PSNR and SSIM were used to evaluate the despeckling results on the synthetic SAR images, which are listed in Table 4. As Table 4 shows, the proposed method obtained the best results among all compared methods at every noise level. Figures 8 and 9 show the despeckled results for a speckle noise level of L = 4. The zoomed-in patches in the lower right corner of each image show that the visual results are consistent with the quantitative results.

Results on Real SAR Images
In Section 3.3, the proposed method and several other state-of-the-art methods were tested on real SAR images of Flevoland and Death Valley, as shown in Figures 10 and 11. Both images were acquired by the Airborne Synthetic Aperture Radar (AIRSAR) and cropped to 600 × 600 pixels for testing.
The despeckled result of SAR-BM3D still contained residual speckle noise, and PPB introduced some texture distortions. Visually, ID-CNN performed almost as well as the proposed method, with only small differences. Because no speckle-free reference data exist for real SAR images, ENL was used to measure the performance of the different methods. In Figures 10 and 11, the ENL values were estimated from the two homogeneous regions within the red squares. As listed in Tables 5 and 6, the proposed method achieved better speckle reduction than the other methods.

Method Validation on Optical Images
The above subjective analysis shows that the proposed method was very effective for SAR image speckle suppression. For further analysis and verification, three tree images were selected from the perspective of human vision. These are optical images taken by a small unmanned aerial vehicle (UAV) produced by DJI, China (AIR 2S, DA2SUE1). As shown in Figure 12, they depict a coniferous forest, a broad-leaved forest, and a conifer broad-leaved mixed forest.
Most conifers [69] are evergreens, many of which have long, slender, needlelike leaves; this includes most of the Taxodiaceae. These leaves are linear, flattened, straight or slightly curved, and pectinately arranged. They are obtusely pointed or shortly mucronate and taper abruptly towards the articulated junction of the lamina with the decurrent base. Broad-leaved trees [70] (such as maple or oak) can be distinguished from trees bearing needlelike leaves (such as most conifers) by their relatively broad, flat leaves and their leaf texture. The most common are oak (sessile and pedunculate) and birch (silver and downy), but ash, sycamore, and beech are also quite common. From an airborne perspective, coniferous trees are generally darker than broad-leaved trees and have clear crown boundaries, but their individual leaves appear blurred. The main reason is that broad leaves have an obvious texture while conifer needles do not [71][72][73]. This experiment used three sites: the Metasequoia glyptostroboides forest of Nanjing University of Science and Technology as the coniferous forest image, the broad-leaved trees near the bandstand of Sun Yat-sen Mausoleum as the broad-leaved forest image, and the trees near Sun Yat-sen Mausoleum Meiling Palace as the conifer broad-leaved mixed forest image. These are shown in Figure 12a,d,g, respectively. The tree species in Figure 12d,g include Taxodiaceae, pine, plane tree, Cinnamomum camphora, Photinia serrulata, etc.
Visually, the leaves of the coniferous forest were relatively blurred after denoising, and individual leaves were hard to distinguish (Figure 12c). This is because the numerous needles interlace with each other and are poorly resolved at the limited pixel resolution. In contrast, the leaves of the broad-leaved forest retained clear boundaries after denoising (Figure 12f), because each leaf occupies several pixels, which favors texture rendering. Figure 12i shows the same pattern after denoising. The conifer leaves were blurred into blocks (marked with red circles), whereas the broad-leaved trees retained obvious texture (marked with red rectangles) and appeared sharper than the conifers.
For better analysis, two relevant indices, PSNR and SSIM, are given in Table 7. A higher PSNR indicates better noise suppression. The coniferous forest image had the lowest PSNR (22.75 dB) and the broad-leaved forest image the highest (24.94 dB), showing that denoising worked better on the broad-leaved forest than on the coniferous forest. The closer SSIM is to one, the more similar the two images are, and hence the better the image details are preserved. The SSIM of the coniferous forest was 0.778 and that of the broad-leaved forest was 0.806, from which we infer that the image details of the broad-leaved forest were better preserved after denoising. Because the coniferous and broad-leaved tree crowns were interspersed, both indices of the conifer broad-leaved mixed forest (PSNR 23.46 dB, SSIM 0.785) fell between those of the coniferous and broad-leaved forests. In Figure 12i, the coniferous tree crowns appear ambiguous (red circles) because the activation function is insensitive to minute details, which causes information loss. The broad-leaved trees, by contrast, show clear texture (red rectangles) because the activation function responds more strongly to texture.
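To make the PSNR comparison concrete, the sketch below implements the standard definition PSNR = 10·log10(R²/MSE). It assumes 8-bit images, so the peak value R is 255. It also converts the reported PSNR gap into an MSE ratio. The helper names are hypothetical. SSIM is not sketched here because it requires a windowed implementation, such as scikit-image's `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(reference, estimate, data_range=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(data_range ** 2 / mse))


def mse_ratio_from_psnr_gap(gap_db):
    """Factor by which the MSE is larger for an image whose PSNR is gap_db lower."""
    return 10.0 ** (gap_db / 10.0)


# Coniferous (22.75 dB) vs broad-leaved (24.94 dB) forest, as reported in Table 7.
print(round(mse_ratio_from_psnr_gap(24.94 - 22.75), 2))  # about 1.66
```

Under the assumption that both images share the same data range, the 2.19 dB gap means the residual error of the denoised coniferous image has roughly 1.66 times the mean squared error of the broad-leaved one.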
Our method yields well-denoised images, which helps to distinguish conifers from broad-leaved trees in the conifer broad-leaved mixed forest. It can also be concluded that our method is effective for multiplicative noise.
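For readers unfamiliar with multiplicative noise, a widely used model for fully developed L-look intensity speckle is y = x · n, where n follows a Gamma distribution with shape L and scale 1/L. This distribution has unit mean and variance 1/L. This section does not state which noise model was used for the optical forest images, so the sketch below is an assumption, and `add_speckle` is a hypothetical helper.

```python
import numpy as np


def add_speckle(clean, looks, rng=None):
    """Fully developed L-look speckle: y = x * n with n ~ Gamma(shape=L, scale=1/L).

    E[n] = 1 and Var[n] = 1/L, so the noise scales with the signal (multiplicative).
    """
    rng = np.random.default_rng() if rng is None else rng
    clean = np.asarray(clean, dtype=np.float64)
    noise = rng.gamma(shape=looks, scale=1.0 / looks, size=clean.shape)
    return clean * noise, noise


rng = np.random.default_rng(42)
clean = np.full((256, 256), 100.0)  # homogeneous toy intensity image
noisy, n = add_speckle(clean, looks=4, rng=rng)

# The log transform turns the multiplicative model into an additive one.
assert np.allclose(np.log(noisy), np.log(clean) + np.log(n))
print(round(float(n.mean()), 3), round(float(n.var()), 3))  # approximately 1 and 1/L = 0.25
```

Because the noise standard deviation grows with the signal, bright regions receive stronger absolute perturbations than dark ones. This property is what separates speckle from additive Gaussian noise.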

Discussion
Convolutional neural networks were used here to suppress speckle because of a limitation of traditional algorithms. Traditional algorithms can improve noise suppression by introducing new algorithmic structures, but detailed information may still be lost, and retaining more details degrades speckle suppression performance. They therefore have to seek the best balance between speckle suppression and information preservation. Speckle suppression is essentially a mapping problem from the observed image to the noise-free image. Owing to their powerful feature extraction ability, convolutional neural networks learn this mapping end to end and achieve good results in both speckle suppression and information preservation. These advantages, which traditional algorithms lack, explain why convolutional neural networks are widely used for SAR image speckle suppression. Deep convolutional neural networks have achieved unprecedented performance in many vision tasks, such as super-resolution reconstruction, image denoising, target detection, and recognition. However, state-of-the-art results usually come from very deep networks with tens of millions of parameters, which greatly limits deployment on resource-limited platforms. Running such network models on low-power, resource-limited platforms therefore remains a challenge.
Much research has been devoted to reducing the number of model parameters. One approach is to improve the convolution operation, for example with 1 × 1 convolution, ACNet, and MobileNet. Another is to improve the activation function, for example with Leaky ReLU, PReLU, DyReLU, and xUnit. Among these, PReLU, DyReLU, and xUnit are activation functions with learnable parameters: they add learnable parameters to the activation function of the convolution layer and can perform considerably better than the standard ReLU. Although they introduce additional parameters, such functions can match the performance of the original network model with fewer network layers, thereby reducing the total number of parameters without degrading performance. Convolutional neural networks have achieved very good results in SAR image speckle suppression. However, their structures have grown from simple to complex and from shallow to deep, which increases complexity and demands large computing resources. This paper further improved performance by introducing a learnable activation function at the cost of a small number of parameters.
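The phrase "at the cost of a small number of parameters" can be quantified with simple arithmetic. The sketch below is purely illustrative and relies on several assumptions not taken from Table 1:

- 64 channels
- 3 × 3 convolutions
- a learnable activation that adds one 9 × 9 depthwise convolution with bias per block

All helper names are hypothetical, and the paper's actual configuration may differ.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Learnable weights of a k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def bn_params(c):
    """Batch norm learns a scale and a shift per channel (running stats are not trained)."""
    return 2 * c


def relu_block_params(c, k=3):
    """'Conv + BN + ReLU': ReLU itself has no parameters."""
    return conv_params(k, c, c) + bn_params(c)


def mxunit_block_params(c, k=3, d=9):
    """'Conv + BN + M-xUnit', assuming the activation adds one depthwise d x d conv with bias."""
    depthwise = c * (d * d + 1)  # one d x d filter plus one bias per channel
    return relu_block_params(c, k) + depthwise


c = 64  # hypothetical channel width
base = relu_block_params(c)
extra = mxunit_block_params(c) - base
print(base, extra, f"{100 * extra / base:.1f}% per block")

# With these assumptions, six M-xUnit blocks are cheaper than seven ReLU blocks.
print(6 * mxunit_block_params(c), 7 * relu_block_params(c))
```

Under these assumed sizes, each block grows by about 14%. This overhead pays off whenever the learnable activation allows at least one block to be dropped for every six or so that remain.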
We introduced parameters that are trained together with the network into the nonlinear operation. In other words, spatial processing was introduced into the nonlinearity, and this constitutes the core of the algorithm. As shown in Figure 2c, the weight map is constructed by passing the input through a ReLU activation function, then a depthwise convolution, and finally the Tanh function, which maps the weights into the range [−1, 1]. Tanh was introduced because the output of ReLU is not zero-centered. It also compensates for ReLU setting every non-positive element to zero, which can prevent some units from being activated. Compared with other algorithms, this algorithm is easy to implement, so readers can readily analyze and reproduce it. In Section 3, the proposed method and ID-CNN were evaluated on the 10 test images while gradually reducing the number of "Conv + BN + M-xUnit" and "Conv + BN + ReLU" blocks, respectively, as shown in Figure 7a,b. The training configuration was the same for both networks, and the network configuration is given in Table 1. The performances of the modified networks based on M-xUnit and ReLU were compared as a function of the number of parameters, as shown in Figure 7c,d. The figure shows that the proposed method achieved higher PSNR and SSIM with the same number of parameters. Equivalently, the modified xUnit-based network achieved the same PSNR and SSIM with significantly fewer parameters. This suggests that, when training models with many parameters, the "Conv + BN + ReLU" blocks can be replaced with fewer "Conv + BN + M-xUnit" blocks. Finally, this algorithm required little computational memory and time for despeckling.
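The weight-map construction described above (ReLU, then depthwise convolution, then Tanh) can be sketched in NumPy as follows. This is a minimal illustration under several assumptions:

- Following xUnit, the weight map multiplies the input element-wise.
- Batch normalization inside the activation is omitted.
- Kernels are random rather than trained.

The function names are hypothetical, and Figure 2c may differ in details.

```python
import numpy as np


def depthwise_conv2d(x, kernels, bias):
    """'Same'-padded depthwise 2-D cross-correlation (one d x d filter per channel).

    x: (C, H, W) feature map; kernels: (C, d, d) with odd d; bias: (C,).
    """
    x = np.asarray(x, dtype=np.float64)
    c, h, w = x.shape
    d = kernels.shape[-1]
    p = d // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps the H x W size
    out = np.zeros((c, h, w))
    for i in range(d):
        for j in range(d):
            # Accumulate each kernel tap times the correspondingly shifted input.
            out += kernels[:, i, j][:, None, None] * xp[:, i:i + h, j:j + w]
    return out + bias[:, None, None]


def m_xunit(x, kernels, bias):
    """Sketch of a spatial activation: y = x * tanh(DWConv(ReLU(x))).

    Returns the activated map and the per-pixel weight map, which lies in [-1, 1].
    """
    x = np.asarray(x, dtype=np.float64)
    weights = np.tanh(depthwise_conv2d(np.maximum(x, 0.0), kernels, bias))
    return x * weights, weights


# Usage: random 4-channel feature map with random (untrained) 9 x 9 kernels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 16, 16))
k = 0.05 * rng.standard_normal((4, 9, 9))
y, w = m_xunit(feat, k, np.zeros(4))
print(y.shape, float(w.min()), float(w.max()))
```

Unlike ReLU, the weight at each pixel depends on a d × d neighborhood, which is what makes the activation spatial. Because Tanh can be negative, the learned weights may also flip the sign of a response.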
Based on the above analysis and discussion, the experiments achieved the expected results, and the proposed method outperformed the traditional methods.

Conclusions
This paper proposed a SAR image despeckling method using a CNN platform with a new learnable spatial activation function, M-xUnit. Compared with state-of-the-art despeckling methods, it requires fewer network parameters for training without degrading performance. Besides designing complex network structures for better despeckling results, improving the activation function is also a promising option for SAR image speckle suppression. A total of 400 training images and 10 test images were used to illustrate the performance of the proposed method, and its effectiveness was further verified on real SAR images and forest optical images. Despeckling experiments on both synthetic and real SAR images indicate that the proposed method outperforms several state-of-the-art despeckling methods, and it also achieved good results on forest optical images. We further found a clear difference in despeckling effect between the coniferous and broad-leaved forests, with the broad-leaved forest being despeckled better than the coniferous forest.
In future work, the proposed method will be extended by incorporating other algorithms to improve its performance and broaden its range of applications.

Data Availability Statement:
The NWPU-RESISC45 dataset used in this study is openly available (reference [66]). Publicly available forest images were analyzed in this study. These data can be found here: https://pan.baidu.com/s/1SDGmDxxHN_Fxc5PZIigqjg (senn) (accessed on 1 July 2021).