An Effective Image Denoising Method for UAV Images via Improved Generative Adversarial Networks

Unmanned aerial vehicles (UAVs) are an inexpensive platform for collecting remote sensing images, but UAV images suffer from a content loss problem caused by noise. To address this noise problem, we propose a new method for denoising UAV images. This paper introduces a novel deep neural network method based on generative adversarial learning to trace the mapping relationship between noisy and clean images. In our approach, a perceptual reconstruction loss is used to establish a loss equation that continuously optimizes a min-max game-theoretic model to obtain better UAV image denoising results. The denoised images generated by the proposed method have clearer ground-object edges and more detailed ground-object textures. In addition to the traditional comparison methods, denoised UAV images and the corresponding original clean UAV images were used to perform image matching based on local features. A classification experiment on the denoised images was also conducted to compare the denoising results of the proposed method with those of other methods. The proposed method achieved better results in these comparison experiments.


Introduction
UAV technology is evolving rapidly, and the increased availability of unmanned aerial vehicles (UAVs) has drawn attention for their capability to generate ultra-high spatial resolution images. However, a UAV is a low-altitude remote sensing platform that is affected by lightning, ground electromagnetic waves, illumination change, and mechanical noise from the UAV itself. These factors are sources of noise in UAV images; therefore, it is especially important to study how to remove noise from UAV images.
In recent years, there has been a growing body of research on noise removal in remote sensing images. Liu et al. [1] used an auxiliary noise-free image as a prior, proposing a denoising method for remote sensing images based on partial differential equations. Rajapriyadharshini et al. [2] clustered the noisy images into several disjoint local regions to denoise SAR images. Bhosale et al. [3] designed a wavelet filter to restore remote sensing images and explored the effects of noise. Wang et al. [4] proposed high-order balanced multi-band multiwavelet packet transforms to denoise remote sensing images and claimed that an appropriate band number can improve the denoising performance. Xu et al. [5] provided a method based on the blockwise nonlocal means algorithm to denoise repetitive image patches in remote sensing images. Penna et al. [6] utilized nonlocal means with stochastic distances to denoise SAR images. Kim et al. [7] adopted background registration processing and robust principal component analysis, proposing a noise filtering method for LWIR/MWIR imaging sensors.
Some researchers have studied the denoising of remote sensing images based on sparse representation. Chang et al. [8] combined unidirectional total variation and sparse representation to learn a dictionary trained to fit the input data, removing random noise from remote sensing images. Cerra et al. [9] reduced the weight of the sparse unmixing problem, proposing a denoising method based on sparse reconstruction of simulated EnMAP data. Xu et al. [10] used a nonlocal sparse model and an iterative regularization technique to denoise SAR images.
Deep learning is now generating new ideas, and convolutional neural networks (CNNs) have become recognized as an efficient method that automatically learns deep-level feature representations from images. Denoising algorithms for natural images based on deep learning are an emerging trend. Jain et al. [11] synthesized training samples from specific noise models and used convolutional networks as an unsupervised learning procedure and image processing architecture. Xie et al. [12] put forward an image denoising method that combined sparse representation and deep neural networks pre-trained by denoising auto-encoders. Burger et al. [13] introduced a plain multi-layer perceptron that maps a noisy image to a noise-free image. Wu et al. [14] used the rectified linear function instead of the sigmoid function as the hidden layer activation function of deep neural networks to achieve image denoising. Xu et al. [15] provided a method that uses deep convolutional neural networks as a reliable support for robust deconvolution against artifacts in image restoration. Li et al. [16] combined sparse coding and auto-encoders to achieve image denoising. Mao et al. [17] put forward networks with multiple layers of convolution and deconvolution operators to learn end-to-end mappings from a noisy image to a clean image.
Many practical remote sensing applications require clear textures for ground objects in remote sensing images. Because remote sensing images cover a wide area, remote sensing image denoising must consider the overall spatial distribution of ground objects, which differs from natural images. Deep texture features in remote sensing images are easier to extract with a deeper neural network structure. Nevertheless, deeper networks are often more complicated to train because of internal covariate shift. Batch normalization can address this problem because it back-propagates the gradients through the normalization parameters and preserves the representation ability of the networks [18][19][20]. The residual learning framework solves the gradient vanishing problem of deeper neural networks because it explicitly lets each few stacked layers fit a residual mapping instead of expecting these layers to directly fit a desired underlying mapping [21][22][23][24]. Visual Geometry Group networks (VGGs) are beneficial for the deep-level representation of images [25][26][27], which is of great help for modeling mappings of highly complex ground features. Johnson et al. [27] utilized pre-trained VGG nets to extract high-level features by optimizing the perceptual loss function. Kiasari et al. [28] used perceptual loss to alleviate the problem of blurry image generation. Generative Adversarial Networks (GAN), with constantly improving optimization strategies, have been applied widely to image generation and classification problems [20,22,29,30]. Dosovitskiy et al. [31] combined Euclidean distances with GAN to calculate the distances between extracted image features.
Given the distinctive demand for texture features in UAV images and the continuous development of deep learning methodology, we propose a method based on generative adversarial networks to obtain clearer feature textures in denoised UAV images. In this paper, pretrained VGG networks are employed to extract deep-level complex features of UAV images, and a perceptual reconstruction loss is combined with the pixel-based Euclidean loss to continuously optimize the game-theoretic min-max model, generating fake clean UAV images using the powerful generation capability of GAN. To make the network structure deeper and more robust, batch normalization is used to regularize the training data and multiple residual blocks are used to build the networks.

Deep Architecture
GAN can effectively learn the distributions of clean and noisy UAV images, using clean UAV images to denoise UAV images. The proposed GAN-based neural network directly learns the mapping between clean UAV images and noisy UAV images. It builds on the work of Goodfellow et al. [29], who generated realistic data by effectively learning the distribution of a training data set. They adopted a min-max adversarial game-theoretic optimization framework to train a generative model G and a discriminative model D simultaneously. GAN continuously trains model G so that the probability distribution of the images generated by G is indistinguishable from that of real images, thus fooling model D. Following this idea, we treat the UAV image denoising problem as a generative adversarial problem, aiming to learn a mapping directly from noisy UAV images to clean UAV images by constructing a GAN-based deep network called the Denoise UAV image Generative Adversarial Network (DUGAN). The goal is to learn a generator that can fool a discriminative model D, which is trained to distinguish clean UAV images from generated denoised UAV images. Thus, a generative model G and a discriminative model D were designed.
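The min-max game described above can be illustrated with a small numeric sketch. It assumes the standard value function of Goodfellow et al. [29], the mean of log D over real samples plus the mean of log(1 - D) over generated samples; the function name `gan_value` and the clipping constant `eps` are hypothetical and are not taken from the paper.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Assumed standard GAN value function V(D, G).

    d_real: discriminator probabilities D(.) on real clean images.
    d_fake: discriminator probabilities D(.) on generated images.
    D tries to maximize this value; G tries to minimize it.
    """
    # Clip probabilities away from 0 so that log() stays finite.
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator scores higher than one that is fooled.
confident = gan_value([0.9, 0.95], [0.1, 0.05])
fooled = gan_value([0.5, 0.5], [0.5, 0.5])
print(confident > fooled)
```

When the generator fools the discriminator completely, D outputs 0.5 everywhere and the value equals -log 4, which is the equilibrium value in the original GAN analysis.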

Generative Model
The primary aim of generative model G is to generate clean denoised UAV images from noisy UAV images. Because the original GAN model is unstable, generative model G can synthesize artifacts in the output images, which do not meet the ground-object texture requirements of UAV image applications. Therefore, the critical step is to design a deeper structure to generate denoised UAV images. We constructed a generative model G containing 14 residual blocks (Figure 1) so that the deeper networks in the G model can be trained efficiently.
The gradient can easily vanish during back propagation when training deeper networks, which may result in the loss of image details and textures. Residual networks reference each layer's input and learn a residual function rather than an unreferenced function. This residual function is easier to optimize, which addresses the gradient vanishing problem structurally and allows the number of network layers to be greatly increased. In the G model, each residual block comprises two convolutional layers and two batch normalization layers, and a skip connection is established in each residual block. The skip connection back-propagates the gradient to deeper layers and can pass gradients through multiple residual blocks smoothly, which helps recover the details in UAV images and assists the CNN in denoising UAV images effectively.
The distribution of the data influences the training of deep networks. As the depth of a network increases, the overall distribution of the activation inputs gradually approaches the upper and lower limits of the value interval of the non-linear function, resulting in slow network convergence. Batch normalization forces the distribution of the input values of any neuron in each layer back to the standard normal distribution, so that the activation inputs fall into the sensitive region of the non-linear function. Small changes in the input can then lead to large changes in the loss function, which makes training faster and easier. In our generative model G, the activation function of the two batch normalization layers in each residual block is relu. In addition to these residual blocks, we add two convolutional layers to generate simulated clean UAV images using the tanh activation function. The specific settings of each layer of the generative model are described using the following notation. C(r,64) denotes a set of convolutional layers with 64 feature maps and activation function relu; C(64)B(r)C(64)B(r)SC represents a residual block; B(r) is a batch normalization layer with activation function relu; and SC denotes a skip connection. There are 14 residual blocks in total. C(t,3) represents a convolutional layer with three feature maps and activation function tanh.
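The residual block pattern C(64)B(r)C(64)B(r)SC can be sketched in a few lines. This is a simplified illustration on flat lists of numbers, not the authors' implementation: the elementwise scaling in `residual_branch` is a hypothetical stand-in for a real convolution, and all function names are assumptions.

```python
import math

def batch_norm(values, eps=1e-5):
    """Normalize a batch of activations to zero mean and (near) unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]

def relu(values):
    """Rectified linear activation applied elementwise."""
    return [max(0.0, v) for v in values]

def residual_branch(x, weight1=0.5, weight2=0.5):
    """Stand-in for C(64)B(r)C(64)B(r): 'conv' -> BN+relu -> 'conv' -> BN+relu.

    The scaling by weight1/weight2 is a placeholder for a convolution.
    """
    h = relu(batch_norm([weight1 * v for v in x]))
    h = relu(batch_norm([weight2 * v for v in h]))
    return h

def residual_block(x, branch):
    """Skip connection (SC): the block outputs x + F(x)."""
    return [a + b for a, b in zip(x, branch(x))]

activations = [1.0, 2.0, 3.0, 4.0]
print(residual_block(activations, residual_branch))
```

If the branch outputs zeros, the block reduces to the identity mapping, which is why stacking many such blocks does not make the signal or its gradient harder to propagate.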

Discriminative Model
GAN continuously trains the generative model G until the discriminative model D can no longer differentiate the generated images from actual, real image samples. Generative model G continuously generates better images so that their distribution becomes indistinguishable from the distribution of real images. The discriminative model D is employed to distinguish simulated clean UAV images synthesized by the generative model from the corresponding actual clean UAV images. Model D can also be regarded as a judge of, and a guide for, the generative model G. We constructed a discriminative model D and alternately update G and D to solve the min-max adversarial game-theoretic optimization problem (Equation (1)):

$$\min_{G} \max_{D} \; \mathbb{E}_{I^{C} \sim p_r}\left[\log D\left(I^{C}\right)\right] + \mathbb{E}_{G^{CI} \sim p_g}\left[\log\left(1 - D\left(G^{CI}\right)\right)\right] \quad (1)$$

Here, $I^{C}$ represents the actual real clean image, $G^{CI}$ denotes the generated clean image, $p_r$ is the sample distribution of the real clean images, and $p_g$ is the sample distribution of the generated clean images. The proposed discriminative model is shown in the bottom part of Figure 1. The specific settings of each layer of the discriminative model are as follows:

Loss Function
The definition of the loss function is critical for the performance of the proposed method. Some image restoration losses are optimized at the pixel level [31,33,34], so the resulting images are typically overly smooth, lack high-frequency content, and have poor perceptual quality. Some researchers have argued that reconstruction results are better when a perceptual loss function is optimized by minimizing the perceptual differences between reconstructed images and the ground-truth images [27,35,36]. The perceptual reconstruction loss can improve the quality of image features so that they meet the requirements that UAV image applications place on ground-object textures. GAN has a powerful image generation ability, obtained by alternately updating the generative networks G and the discriminative networks D. Dosovitskiy et al. [31] combined Euclidean distances with generative adversarial training to establish a loss function. Combining the perceptual reconstruction loss function with VGG networks encourages the networks to produce feature representations of the denoised images that are similar to those of actual clean images. Therefore, we propose a new loss equation for better denoising of UAV images. The perceptual reconstruction loss, the generative adversarial loss, and the Euclidean loss are combined to formulate the proposed loss function as follows:

$$L = x L_{pe} + y L_{ga} + L_{pi} \quad (2)$$

Here, $L_{pe}$ is the perceptual reconstruction loss, a measure based on features extracted from pretrained VGG networks instead of low-level pixel-wise error measures; $L_{ga}$ is the adversarial loss; and $L_{pi}$ is the pixel loss between the pixels of the generated UAV images and the pixels of the clean UAV images.
$x$ and $y$ are the weights of $L_{pe}$ and $L_{ga}$, respectively. The perceptual reconstruction loss, based on the relu activation layers of the pretrained 19-layer VGG networks, is defined as in [26,30]. The aim is to minimize the distances between high-level features, and $L_{pe}$ is defined in Equation (3), where $C_i$, $W_i$, and $H_i$ represent the channels, width, and height of the images, respectively, and $V$ represents a non-linear CNN transformation pretrained by VGG19. $UI^{GCI}$ denotes the generated clean UAV image and $UI^{CI}$ denotes the corresponding actual clean UAV image.
The generative adversarial loss encourages our networks to obtain better solutions lying in the manifold of reconstructed images by trying to fool the discriminative model. It is defined based on the probability that the discriminative model considers the generated denoised UAV images to be actual clean UAV images, as shown in Equation (4), where $D(UI^{GCI})$ is the probability that the generated images are real clean UAV images. We minimize $L_{ga}$ continuously to obtain better denoising results for UAV images. The per-pixel Euclidean loss is defined in Equation (5), where $UI^{GCI}$ is the generated clean UAV image and $UI^{CI}$ represents the corresponding real clean UAV image.
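To make the combination of the three terms concrete, the sketch below computes each loss on flat lists of numbers and sums them with the weights x and y reported in the parameter settings. The exact forms are assumptions, not the paper's equations: the perceptual and pixel terms are taken as mean squared differences, the adversarial term as -log D, and the total as a weighted sum. The feature lists stand in for VGG19 activations, and all function names are hypothetical.

```python
import math

def mean_squared_error(a, b):
    """Mean of squared differences between two equal-length flat sequences."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def adversarial_loss(d_prob, eps=1e-12):
    """Assumed form -log D(generated image); small when D is fooled."""
    return -math.log(max(d_prob, eps))

def total_loss(gen_pixels, clean_pixels, gen_feats, clean_feats, d_prob,
               x=0.5e-3, y=2e-6):
    """Assumed weighted sum: x * L_pe + y * L_ga + L_pi."""
    l_pe = mean_squared_error(gen_feats, clean_feats)    # perceptual term
    l_ga = adversarial_loss(d_prob)                      # adversarial term
    l_pi = mean_squared_error(gen_pixels, clean_pixels)  # pixel term
    return x * l_pe + y * l_ga + l_pi

# Toy values: pixels in [0, 1], features as a short stand-in vector.
loss = total_loss([0.2, 0.8], [0.25, 0.75], [1.0, 2.0], [1.5, 2.5], d_prob=0.4)
print(loss)
```

With the small weights x and y, the pixel term dominates numerically in this toy setting, while the perceptual and adversarial terms act as corrections that favor sharper textures.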

Experimental Data
Because of the lack of UAV image data sets for denoising training and assessment, a new UAV image set for training and testing the proposed networks was built for our experiments. The UAV image training set was obtained by a CW-30 UAV in Guiyang city, Guizhou Province. The camera was an H5D-50 with a focal length of 50 mm; the size of each entire image was 8176 × 6132 pixels and the flying height was 600-800 m. For the convenience of training, we cut the entire images into small images using Photoshop; 400 images of 360 × 360 pixels were used as the clean training data, and different levels of noise were added to these clean images to produce noisy images. The testing set includes two parts: one part consists of 40 UAV images and the other consists of 100 smaller UAV images of cars and trucks. The testing set was obtained from other mapping areas. One part of the testing set was collected with a CW-30 UAV in Yangjiang city, Guangdong Province. The camera was an SWDC5 with a focal length of 50 mm; the size of each entire image was 8206 × 6078 pixels and the flying height was 600-800 m. The other test data set was obtained using a CW-10 UAV in Wuhan city, Hubei Province. The camera was an ILCE-7R with a focal length of 28 mm; the size of each entire image was 7360 × 4916 pixels and the flying height was 400-600 m.
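The data preparation can be approximated programmatically. The sketch below is an illustration under stated assumptions, not the authors' procedure: they cut the tiles manually in Photoshop, whereas this code uses a regular non-overlapping grid, and it assumes additive Gaussian noise whose standard deviation equals the noise level on a 0-255 intensity scale. The function names are hypothetical.

```python
import random

def tile_origins(width, height, tile=360):
    """Top-left corners of non-overlapping tile x tile crops that fit fully."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]

def add_gaussian_noise(pixels, sigma, seed=0):
    """Add zero-mean Gaussian noise (assumed noise model) and clip to [0, 255]."""
    rng = random.Random(seed)
    return [min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]

# One 8176 x 6132 training image yields a 22 x 17 grid of 360 x 360 tiles.
origins = tile_origins(8176, 6132)
print(len(origins))  # 374

# Corrupt a toy row of pixel intensities at an assumed noise level of 35.
noisy = add_gaussian_noise([0.0, 64.0, 128.0, 255.0], sigma=35)
print(noisy)
```

A fixed seed keeps the synthetic noise reproducible, which matters when several denoising methods are compared on the same noisy inputs.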

Parameter Settings and Model Details
The entire networks were trained on an Nvidia GRID M60-8Q (8G) GPU using the TensorFlow framework, and the number of training iterations was 160 k. Due to the limited memory of the computer, the batch size in our experiment was 1. We used Adam as the optimization algorithm and set the learning rate to 0.9. In the training process, we set $x = 0.5 \times 10^{-3}$ and $y = 2 \times 10^{-6}$ (in Equation (2)) based on experimental experience. In the generative model G, all strides were 1 and all convolutions used 3 × 3 kernels, except for the last convolution, which used a 1 × 1 kernel. In the discriminative model D, the first six layers used 4 × 4 kernels with a stride of 2, the next layer used 1 × 1 kernels with a stride of 1, and the last two layers used 3 × 3 kernels with a stride of 1. In both the generative model G and the discriminative model D, all convolutions use edge padding.
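The effect of the discriminator strides on the feature-map size can be traced with a short calculation. This sketch assumes that the edge padding corresponds to "same"-style padding, where the output size is the input size divided by the stride and rounded up, and that the discriminator receives a 360 × 360 input; both are assumptions, and the function names are hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Spatial output size of a convolution with 'same'-style padding."""
    return math.ceil(size / stride)

def discriminator_sizes(input_size=360):
    """Trace the spatial size through the nine discriminator layers."""
    # Six stride-2 layers, one stride-1 (1 x 1) layer, two stride-1 (3 x 3) layers.
    strides = [2] * 6 + [1] + [1] * 2
    sizes = [input_size]
    for stride in strides:
        sizes.append(same_padding_output(sizes[-1], stride))
    return sizes

print(discriminator_sizes())
# [360, 180, 90, 45, 23, 12, 6, 6, 6, 6]
```

Under these assumptions the six stride-2 layers reduce a 360 × 360 input to 6 × 6, while the stride-1 layers in both models leave the spatial size unchanged.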

Comparison and Qualitative Evaluation
We added synthesized noise to the testing images at three noise levels (20, 35, and 55) and compared the proposed method with several state-of-the-art methods used for denoising remote sensing images. These comparative results are presented in the following subsections. Figure 2 shows the results when the noise level was 35. The images in columns a, b, c, d, and e are five randomly picked testing images showing different ground objects. The labels a1-e1 denote the actual clean UAV images, a2-e2 denote the synthetic noisy images, a3-e3 denote the UAV images denoised by method [8], a4-e4 denote the UAV images denoised by method [5], a5-e5 denote the UAV images denoised by method [17], and a6-e6 denote the UAV images denoised by the proposed method. In each image, small white and blue rectangles mark areas that are enlarged in the larger white and blue rectangles for closer examination.
Figure 2. Testing results of several denoising methods for UAV images with several ground objects. In each group of images, the 1st image (a1, b1, c1, d1, e1) is the ground truth; the 2nd image (a2, b2, c2, d2, e2) is a noisy image with noise level 35; the 3rd image (a3, b3, c3, d3, e3) presents the denoising results of method [8]; the 4th image (a4, b4, c4, d4, e4) presents the denoising results of method [5]; the 5th image (a5, b5, c5, d5, e5) presents the denoising results of method [17]; and the 6th image (a6, b6, c6, d6, e6) presents the denoising results of the proposed method.
Figures 2 and 3 provide the experimental results for five randomly picked testing images of different ground objects. In the testing experiments with different noise levels, the proposed method preserves more distinct ground edges and clearer ground-object textures. The UAV images denoised by DUGAN are also better in overall visual effect and closer to the true ground objects in terms of UAV image structures. Because we use relatively deeper networks and the loss function better preserves the overall style of the UAV images, deeper ground feature information can be extracted, which is of great help for later applications of UAV images.
As can be seen in Figure 2, at noise level 35 the proposed method outperforms the other tested methods. For example, in the e series of images, both the full-size image and the enlarged portion processed using the proposed method are clearer, with sharper edges, than the results from the other tested methods. Figure 3 is similar to Figure 2, with the same labeling scheme, order, and enlarged details, but shows results at noise level 55. As Figures 2 and 3 show, a qualitative visual comparison of the proposed method and the other tested methods at different noise levels indicates that the proposed method preserves distinct ground edges and clear ground-object textures. The denoised UAV images produced by the proposed method are better in overall visual effect and closer to the true ground objects in terms of UAV image structures. Because the proposed model has relatively deeper networks and the loss function preserves the structure of the terrain shown in the UAV images, deeper ground feature information can be extracted, which is of great help in later applications using these UAV images.
Tables 1 and 2 quantitatively compare the denoised images obtained with the tested denoising methods using the Peak Signal to Noise Ratio (PSNR) and the Structural Similarity Index (SSIM). Table 1 presents comparative image denoising results using PSNR for the tested methods at different noise levels. The columns indicate the tested method; rows 3-7 give the PSNR values of the denoised images when the noise level is 20, rows 10-14 give the PSNR values when the noise level is 35, and rows 17-21 give the PSNR values when the noise level is 55. Table 2 is similar to Table 1, but shows results using SSIM for the tested methods at different noise levels. As can be seen in Tables 1 and 2, the proposed method has the highest PSNR and SSIM [37] at the different noise levels, which is consistent with the visual effects of the denoised UAV images.
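For reference, PSNR can be computed from the mean squared error between a clean image and its denoised counterpart. The sketch below uses the standard definition, 10 · log10(MAX² / MSE), on flat lists of pixel intensities; the 8-bit peak value of 255 and the function name are assumptions, since the paper does not state its evaluation code.

```python
import math

def psnr(clean, denoised, max_value=255.0):
    """Peak Signal to Noise Ratio in dB between two equal-length pixel lists."""
    if len(clean) != len(denoised):
        raise ValueError("images must have the same number of pixels")
    mse = sum((c - d) ** 2 for c, d in zip(clean, denoised)) / len(clean)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

clean = [100.0, 100.0, 100.0, 100.0]
close = [101.0, 99.0, 100.0, 100.0]   # small residual error
far = [110.0, 90.0, 105.0, 95.0]      # larger residual error
print(psnr(clean, close) > psnr(clean, far))  # True
```

A higher PSNR means a smaller pixel-wise error, which is why it is reported together with SSIM, a measure that is more sensitive to structure.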

Compare Denoising Results Using Image Matching
To verify our method further, the denoised UAV images produced by several methods were matched with the real clean UAV images, and the matching results were used to compare the outcomes of denoising. Scale Invariant Feature Transform (SIFT) [38] is a well-known image matching algorithm based on local image features. The quality of the matching results often reflects the similarity of the local characteristics of the two images, and the number of matching point pairs is the standard for judging the quality of the matching results. For UAV image applications, the textures of ground features are important expressions of the local features of UAV images. Therefore, the SIFT matching results for denoised UAV images can indicate the quality of the denoising.
To ensure the objectivity of the experiment, the denoising results of the above five UAV images are matched with the corresponding real clean UAV images using the SIFT algorithm. Figures 4 and 5 show the SIFT matching results. In Figure 4, where the noise level is 35, the labels b1-e1 indicate the SIFT matching results between the original clean image and the denoised image obtained by method [8]; b2-e2 indicate the matching results for the denoised image obtained by method [5]; b3-e3 indicate the matching results for the denoised image obtained by method [17]; and b4-e4 indicate the matching results for the denoised image obtained by the proposed method. Figure 5 is similar to Figure 4, but shows results at noise level 55.
Figure 4. In each matching image, the left image is the original real clean UAV image, and the right image is the denoised UAV image (noise level 35). The 1st image (b1, c1, e1) presents SIFT matching results between the UAV images denoised using method [8] and the original real clean UAV images. The 2nd image (b2, c2, e2) presents SIFT matching results between the UAV images denoised using method [5] and the original real clean UAV images. The 3rd image (b3, c3, e3) presents SIFT matching results between the UAV images denoised using method [17] and the original real clean UAV images. The 4th image (b4, c4, e4) presents SIFT matching results between the UAV images denoised using the proposed method and the original real clean UAV images.
Figure 5. In each matching image, the left image is the original real clean UAV image, and the right image is the denoised UAV image (noise level 55). The 1st image (b1, c1, e1) presents SIFT matching results between the UAV images denoised using method [8] and the original real clean UAV images. The 2nd image (b2, c2, e2) presents SIFT matching results between the UAV images denoised using method [5] and the original real clean UAV images. The 3rd image (b3, c3, e3) presents SIFT matching results between the UAV images denoised using method [17] and the original real clean UAV images. The 4th image (b4, c4, e4) presents SIFT matching results between the UAV images denoised using the proposed method and the original real clean UAV images.
In Table 3, Column 1 lists the five randomly picked testing images, and Columns 2, 3, 4, and 5 give the numbers of correct matching pairs between the original clean image and the denoised image obtained by method [8], method [5], method [17], and the proposed method, respectively. As shown in Figures 3-5, the denoised images generated by the proposed method at both noise levels (35 and 55) yield more correct matching pairs than those of the other methods.

Compare Denoising Results Using Image Classification
In this experiment, we used 100 denoised UAV images (60 cars and 40 trucks) in an image classification experiment to compare the denoising results of several methods (Figure 6). The classification network consists of five layers: one input layer, three hidden convolutional layers (convolution kernel size 3), and a softmax output layer. We used the labels 1 and 0 to represent clean images of cars and trucks, respectively. We then employed 80 clean UAV images to train the classification network and 20 clean UAV images to test it. After training, the classification correct rate on the testing images reached 85%. The classification correct rate is defined in Equation (6):

$$\text{classification correct rate} = \frac{N_C}{N_T} \qquad (6)$$

where $N_C$ is the number of correctly classified UAV images and $N_T$ is the total number of UAV images. We added noise at two levels (35 and 55) to the 100 clean images and obtained denoised images with several denoising methods. The trained classification network was then used to classify the denoised UAV images. Table 4 shows that the denoised images obtained by our method are classified with a higher accuracy rate, which indirectly indicates that they are more similar to the clean images.
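The classification correct rate is simply the ratio of correctly classified images to all classified images. A minimal Python sketch follows; the function name and the toy label lists are hypothetical and are not data from the experiment. Labels follow the convention above (1 for car, 0 for truck).

```python
def classification_correct_rate(predicted, actual):
    """Return N_C / N_T, the fraction of images whose predicted label is correct."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one image is required")
    n_correct = sum(1 for p, a in zip(predicted, actual) if p == a)  # N_C
    return n_correct / len(actual)  # N_C / N_T


# Hypothetical 20-image test split: 12 cars (label 1) and 8 trucks (label 0).
actual = [1] * 12 + [0] * 8
# Made-up predictions: two cars and one truck are misclassified.
predicted = [1] * 10 + [0] * 2 + [0] * 7 + [1] * 1

rate = classification_correct_rate(predicted, actual)
print(f"classification correct rate: {rate:.0%}")
```

With these made-up predictions, 17 of the 20 labels agree, so the function returns 0.85.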
These three comparative experiments at different noise levels show that the proposed method obtains better denoising results. With the SIFT matching algorithm, the denoised images produced by our method yield more correct matching pairs, which shows that our method better restores the local characteristics of UAV images and preserves their textures. The classification experiments likewise indicate that the denoised images generated by our method are more similar to real clean images.

Figure 6. […] (b) Denoising results of method [8]; (c) denoising results of method [5]; (d) denoising results of method [17]; (e) denoising results of the proposed method.

Conclusions
In this paper, we combine a perceptual reconstruction loss function with a pixel-based loss function and propose a method for denoising UAV images based on generative adversarial networks. Because UAV images place special demands on the textures of ground objects, multiple residual blocks are used to build the deep learning framework, which allows the denoised images to retain more texture detail. The proposed method achieves better results under the traditional evaluation methods. It also achieves good results in the experiments based on local feature matching and image classification, which is helpful for subsequent applications of UAV images. In a deep network, each layer can be viewed as a filter. A deeper network applies more filters to UAV image denoising, which makes the entire network more nonlinear. This nonlinearity is the critical factor that makes the proposed method superior to other denoising methods. In future work, we will further explore how to simulate real UAV noise so as to obtain better denoising of UAV images based on adversarial learning.
Author Contributions: R.W. executed all the analyses and wrote most of the paper; X.X. and R.W. reviewed the content and offered substantial improvements to the paper; B.G. and Q.Q. developed and coordinated the project and helped in the interpretation of the results; R.C. shaped significant ideas for the analyses in this paper.