1. Introduction
The past decade has seen intensive development of unmanned aerial vehicles (UAVs) and drones [1,2]. They have found numerous civil and military applications, including urban planning, precision agriculture, battlefield monitoring, and so on [3,4,5]. Modern UAVs and drones are usually equipped with one or several sensors able to acquire images or video of the sensed territories [6]. Optical sensors are the most popular and widespread ones, providing information in a commonly perceived way. Object localization and classification are typical operations in analyzing such images [7,8,9]. Convolutional neural networks (CNNs) are the typical tools applied for solving these tasks; according to [7], the existing CNNs can be divided into two groups, two-stage [10,11] and one-stage [12] CNNs, where each group has its own advantages and drawbacks.
The quality of images acquired by UAV-based sensors is not perfect [13,14,15]. Blur, adverse weather conditions, noise, and other factors can significantly reduce image quality and degrade the performance of object localization and classification [16,17,18]. In particular, noise can arise due to low-light conditions and the low quality of cheap cameras installed on board drones. The negative influence of noise on object localization and classification has been clearly demonstrated in [18]. Note that the sensitivity of different CNNs to noise is not the same; for example, SSD Lite [12], Faster R-CNN [11], and RetinaNet [19] are quite sensitive.
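The degradation discussed here, additive white Gaussian noise (AWGN) of a given standard deviation (STD), and the PSNR used to quantify it, can be reproduced with a few lines of NumPy (a minimal sketch; the function names are ours):

```python
import numpy as np

def add_awgn(image, std, seed=0):
    """Corrupt an 8-bit image with additive white Gaussian noise of a given STD."""
    rng = np.random.default_rng(seed)
    noisy = image.astype(np.float64) + rng.normal(0.0, std, image.shape)
    return np.clip(noisy, 0.0, 255.0)  # keep values in the valid 8-bit range

def psnr(reference, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - distorted) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

# A mid-gray test frame: for STD = 10 the measured PSNR is close to the
# theoretical 20*log10(255/10) ≈ 28.1 dB, since clipping is negligible here.
frame = np.full((256, 256, 3), 128.0)
noisy = add_awgn(frame, std=10)
```

For real camera noise the clipping at 0 and 255 is not negligible in dark or saturated regions, which is one reason measured PSNR can deviate from the theoretical value.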
Image/video denoising can be helpful for improving the performance of UAV-based imaging systems [20,21,22]. Wang et al. [20] proposed to apply noise reduction using Improved Generative Adversarial Networks; they demonstrated that image quality becomes better in terms of the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) metrics, which is favorable for image matching based on local features. Jin et al. [17] have shown that their denoising method based on a Guided Pixel Aggregation Network is able to significantly improve maritime image quality and can be useful for maritime target detection. A wavelet-based denoising technique has been proposed by Niu et al. and successfully tested on UAV images [20]. A rather complex denoising technique has been designed by Lu et al. [21] and applied to gas and oil pipeline inspection.
In addition, numerous filters have been proposed for other color imaging and remote sensing applications. Most of them are based either on the non-local processing principle and orthogonal transforms [23,24,25] or on trained neural networks (see [26,27] and references therein). The use of denoising in remote sensing (RS) data processing has some peculiar features. Alongside characterizing denoising efficiency in terms of standard metrics such as the mean square error, PSNR, and SSIM, it is also necessary to consider metrics (criteria) characterizing the final goal of RS data use, such as classification accuracy [28,29], target detection reliability [30], segmentation characteristics [31], and so on, depending on the application. In particular, the authors of [28,29] have shown that efficient pre-filtering is able to considerably improve RS data classification. Good denoising is able to improve target detection [30] and image segmentation [31] characteristics. Although it is intuitively clear that more efficient filtering, on average, results in better performance of image processing operations at further stages, no strict dependence has been established yet. There are classes for which texture feature preservation is important. Most misclassifications are observed in the neighborhood of edges and fine details [28], so edge/detail preservation [32], associated with visual quality [33], is also important. Note that SSIM [34] is obviously not the best visual quality metric [35].
Concerning object localization and classification, special criteria are usually employed, including the Intersection over Union (IoU) [36] and the F1 score [37]. Our motivation is to examine how noise intensity influences performance according to these criteria. Moreover, image filtering efficiency should also be assessed in terms of these criteria; this is one novel aspect of our research. As possible filtering approaches, we consider the block-matching 3D (BM3D) filter [24,25], one of the best representatives of non-local transform-based techniques, and the DRUNet neural network (NN)-based filter [38] as a useful NN denoiser. Their performance is compared in terms of several performance criteria, which constitutes another novelty of this paper. Eleven modern CNNs that can be applied to the considered task are studied; their performance, including computational efficiency, is analyzed and compared. This is the third novel aspect. We pay attention not only to detection characteristics for all types of objects but also, especially, to the localization and classification of small-sized objects. This is another specific feature and novel aspect of our paper.
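As an illustration of the criteria just named, IoU for two axis-aligned boxes and F1 from true/false positive and false negative counts can be computed as follows (a self-contained sketch; the corner-box convention (x1, y1, x2, y2) is our assumption for illustration):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)

# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / (100 + 100 - 25).
example_iou = iou((0, 0, 10, 10), (5, 5, 15, 15))
```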
The paper is organized as follows. First, image and noise models are introduced, some aspects of CNN training are discussed, and quantitative criteria of localization and classification accuracy are considered. Then, Section 3 deals with the experiment description, whilst Section 4 is devoted to the analysis of the obtained results. Computational aspects are discussed in Section 5. Finally, the conclusions follow.
3. Experiment Description
3.1. Used Dataset
To carry out experiments, we need a dataset with the following properties:
- (1) It should contain images (image fragments) typical of UAV-based imaging.
- (2) The objects in this dataset have to belong to several typical classes of interest.
- (3) These objects have to be of different sizes (to provide an opportunity to analyze the influence of this feature on object localization and classification); they have to be annotated in advance to ensure easy training and the determination of quantitative performance criteria.
- (4) The number of objects of each class should be large enough to provide appropriate training and verification for obtaining reliable statistics of the applied performance indicators.
- (5) The objects have to be placed on a quite complex and diverse background to correspond to possible practical situations.
One good candidate for solving our tasks is the VisDrone dataset [46]. This dataset can be used for localization and classification tasks, as well as for object tracking. The images were captured in different regions, with different traffic and people densities, as well as in various environments and shooting conditions. In total, the dataset consists of 263 videos and 10,209 images that do not overlap with each other; the total number of frames, including video frames and still images, is 179,264. Considering that the dataset contains 10,209 images with low redundancy, in our opinion, it is a good option for our research task.
The images in the dataset were annotated into regions divided into 10 classes: “pedestrian”, “person”, “bicycle”, “car”, “van”, “truck”, “tricycle”, “tricycle with awning”, “bus”, and “motorcycle”. The distribution of classes in the dataset is quite uneven. In general, the classes can be combined into several more abstract categories: “person”, “car”, “truck”, “bus”, “tricycle”, and “bicycle”.
In general, the dataset contains almost 340 thousand annotated regions, which is sufficient for high-quality training of neural networks. These regions are annotated on images of different types, captured under different weather conditions, different camera inclinations relative to the earth’s surface, and other varying factors. The markup was performed for all objects in the images, including small-sized objects. An example of the markup is given in Figure 2.
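To give an idea of how such markup is stored, the sketch below parses one detection-annotation line in the comma-separated form distributed with VisDrone (eight fields: left, top, width, height, score, category index, truncation, occlusion). The field order and the class-index mapping are stated here to the best of our knowledge and should be checked against the official dataset toolkit:

```python
# Category indices as used in the VisDrone detection annotations (index 0
# marks "ignored" regions); names follow the class list given above.
CLASS_NAMES = {1: "pedestrian", 2: "person", 3: "bicycle", 4: "car", 5: "van",
               6: "truck", 7: "tricycle", 8: "tricycle with awning", 9: "bus",
               10: "motorcycle"}

def parse_annotation(line):
    """Parse one annotation line into a corner-form box and a class name."""
    left, top, width, height, score, category, trunc, occ = (
        int(v) for v in line.strip().strip(",").split(",")[:8])
    return {"box": (left, top, left + width, top + height),
            "class": CLASS_NAMES.get(category, "ignored/other"),
            "score": score}

region = parse_annotation("684,8,273,116,0,1,0,0")  # a hypothetical line
```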
The main factors in choosing VisDrone are its size and the quality of its annotations. Another important factor that influenced the choice is the expanded number of classes and the possibility of combining them into more abstract categories. The variety of shooting parameters and the large size of the images are also important advantages of this dataset.
The VisDrone dataset has been used in our previous studies [47,48]. Paper [48] gives details on CNN training, whilst paper [47] presents data showing that objects smaller than 150–200 pixels are localized and classified worse than larger objects.
To improve the quality and stability of the obtained results, the AU-AIR dataset [41] was additionally used to test the performance of the neural networks under various conditions. The structure of this dataset is quite similar to that of the training dataset and therefore allows its use without additional processing. This approach allows for a better determination of the accuracy of the neural networks, prevents memorization, and increases the amount of data used to calculate the accuracy metrics of the methods.
3.2. Preliminary Results
All CNNs mentioned in Section 2.2 have been trained on noise-free images (12,900 images) and then applied to the images in the verification set (3200 images), both noise-free and corrupted by noise. Recall that the number of images in the verification set here is significantly larger than for the data presented in [18].
Let us start with the analysis of F1. The results are presented in Figure 3. The analysis shows the following:
- (1) For almost all types of CNNs, a larger STD (more intensive noise) leads to F1 reduction; for STD ≤ 10, the reduction is not observed or is negligible.
- (2) This reduction differs between networks, i.e., some CNNs are more robust with respect to noise; for example, Faster R-CNN (ResNet50) is the most robust.
- (3) For some CNNs, F1 is reduced by almost 0.1 for STD = 25 (e.g., for SSD MobileNetV2), i.e., such a CNN performs well for almost noise-free images but is certainly not the best choice for intensive noise.
IoU is an important performance indicator (criterion); the corresponding data are presented in Figure 4. Surprisingly, the IoU values are practically the same for all considered values of σ, i.e., noise does not essentially influence the accuracy of localization. According to this criterion, the results are practically at the same level for all considered CNNs.
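For completeness, the way detection-level counts are typically obtained from IoU is sketched below: each prediction is greedily matched to the best still-unmatched ground-truth box and counted as a true positive when the IoU reaches a threshold (0.5 here). This is a common convention, not necessarily the exact matching rule of every evaluation toolkit:

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_detections(gt_boxes, pred_boxes, iou_thr=0.5):
    """Greedy one-to-one matching; returns (TP, FP, FN) counts."""
    matched, tp = set(), 0
    for pred in pred_boxes:
        best_iou, best_idx = 0.0, None
        for i, gt in enumerate(gt_boxes):
            if i not in matched and box_iou(pred, gt) > best_iou:
                best_iou, best_idx = box_iou(pred, gt), i
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            tp += 1
    return tp, len(pred_boxes) - tp, len(gt_boxes) - tp

# One hit, one spurious detection, one missed ground-truth object:
counts = match_detections([(0, 0, 10, 10), (20, 20, 30, 30)],
                          [(1, 1, 10, 10), (100, 100, 110, 110)])
```

Note that under such a scheme the IoU statistic is averaged only over matched (detected) boxes, which helps explain why mean IoU can stay nearly constant while F1 drops: missed objects reduce F1 but simply leave the IoU average.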
Finally, Figure 5 presents data for PoCPR. As seen, for many CNNs, performance according to this criterion does not depend on noise intensity; only for YOLOv5m is the performance reduction essential.
Thus, denoising seems expedient if the noise STD > 5, i.e., if the noise is visible. Moreover, denoising is expedient if (1) it improves F1; (2) it does not make other performance characteristics worse; and (3) it is fast enough and does not require too many resources.
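The visibility threshold quoted here in terms of STD maps directly to input PSNR: for 8-bit data corrupted by AWGN with standard deviation σ, PSNR = 20·log10(255/σ). A one-line helper makes the conversion explicit:

```python
import math

def awgn_psnr(std, peak=255.0):
    """Theoretical input PSNR (dB) of an 8-bit image under AWGN with the given STD."""
    return 20.0 * math.log10(peak / std)

# STD = 5  -> ~34.2 dB (noise barely visible)
# STD = 8  -> ~30.1 dB
# STD = 25 -> ~20.2 dB (noise clearly visible)
```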
One can wonder why IoU remains practically the same (Figure 4) for different STDs whilst the CNN performance becomes worse according to other criteria. To understand this, we have carried out special experiments, examining many particular images and the results of their processing. Figure 6 shows one particular case. An annotation sample for the original (noise-free) image is presented in Figure 6a. There are three objects, all of the same class, “car”. As one can see, the red rectangles obtained by the markup of the considered dataset are considerably larger than the green rectangles generated by the SSD (VGG16) CNN.
This takes place for many other images, although there are also images (e.g., Figure 2) where objects and their positions are marked more carefully and accurately. In our opinion, this is the main reason why IoU is only slightly sensitive to noise. Figure 6b demonstrates an example where one object (the leftmost car) is not detected due to noise.
Figure 7 shows the image processing results for another CNN, YOLOv5m. Again, the green rectangles are considerably smaller than the corresponding red ones, and this influences IoU. In this case, the leftmost object marked as “car” is classified as two objects (“truck” and “car”, see Figure 7a,c). Due to intensive noise, the leftmost object is not correctly detected (Figure 7b).
5. Computational Complexity and Discussion
Each of the studied neural networks has its own positive features. For example, RetinaNet provides the best metrics for the percentage of correctly predicted regions, while YOLO has high classification accuracy. In general, the choice of a particular neural network reduces to a trade-off between accuracy and speed (or computational load). For the neural networks under consideration, parameters were calculated to characterize the computational load, namely the number of floating-point operations (FLOPs) and the number of parameters (Params). The results are shown in Table 5. Based on these indicators, the SSD MobileNetV2 and YOLO neural networks have the lowest computational load, while the Faster R-CNN networks require the most resources.
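Params and FLOPs figures of this kind are obtained with standard layer-wise counting; for a single convolutional layer the closed-form expressions are shown below (a sketch of the counting rule, not the exact profiler used here; FLOPs counts one multiply and one add per multiply-accumulate):

```python
def conv2d_params(c_in, c_out, kernel, bias=True):
    """Learnable parameters of a 2-D convolution layer."""
    return (kernel * kernel * c_in + (1 if bias else 0)) * c_out

def conv2d_flops(c_in, c_out, kernel, h_out, w_out):
    """Floating-point operations for one forward pass of the layer
    (2 ops per multiply-accumulate; stride/padding are folded into h_out, w_out)."""
    return 2 * kernel * kernel * c_in * c_out * h_out * w_out

# A typical first layer, 3 -> 64 channels with a 3x3 kernel:
n_params = conv2d_params(3, 64, 3)  # (3*3*3 + 1) * 64 = 1792
```

Summing such per-layer counts over a whole network reproduces the orders of magnitude by which lightweight backbones (e.g., MobileNetV2) and heavy two-stage detectors differ.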
The filtering methods used in this work also affect the processing time of a single image and the computational load of the pipeline as a whole. To determine this impact, the time consumption of each of the methods was evaluated on an image of size 1920 × 1080 pixels. The time measurements were performed on the same device with an Intel Core i9-10900 processor. Considering that DRUNet is a neural network, the number of floating-point operations (FLOPs) and the number of parameters were also determined for it. The results are presented in Table 6. The analysis shows that DRUNet is faster than BM3D, although its image processing time is still fairly long.
To study the impact of filtering on the total processing time of localization and classification, the image processing time was measured from image loading until the localization result was obtained. The experiments were conducted on the available devices, so a GPU (NVIDIA GeForce RTX 4060 Ti) was used to run DRUNet, as was the case for all neural networks. An Intel Core i9-10900 was used to run the BM3D filtering algorithm, as it was not possible to run this algorithm directly on the GPU. The results are presented in Table 7. It is noticeable that the processing time of one image using BM3D is significantly longer than with DRUNet. Comparing the values obtained with and without filtering, it can be concluded that the filtering algorithms significantly slow down the overall processing (increase time expenses).
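Per-image timings of this kind can be reproduced with simple wall-clock measurement; the sketch below times an arbitrary filtering callable on a Full HD frame (the placeholder filter is ours — a real BM3D or DRUNet call would be substituted):

```python
import time
import numpy as np

def time_filter(filter_fn, image, runs=3):
    """Average wall-clock time (seconds) of one filtering call after a warm-up."""
    filter_fn(image)  # warm-up run (caches, JIT compilation, GPU initialization)
    start = time.perf_counter()
    for _ in range(runs):
        filter_fn(image)
    return (time.perf_counter() - start) / runs

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)       # 1920 x 1080 test frame
identity = lambda img: img.astype(np.float64) / 255.0   # placeholder "filter"
seconds = time_filter(identity, frame)
```

Averaging over several runs after a warm-up matters especially for GPU-based denoisers, whose first call includes one-time initialization costs.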
Considering the obtained accuracy estimates of the neural networks when processing noisy images as well as images after filtering, and taking into account the processing speed of each filtering method, it can be concluded that the DRUNet neural network is the preferable method for improving image quality. Noise pre-filtering to improve localization and classification accuracy is effective and expedient for STD of about 10 and larger. On the one hand, denoising can be accelerated both on board and on land; on the other hand, the use of denoising seems more reasonable for on-land processing, where computational facilities are usually considerably better than on board.
Recall that in all cases (for original images, images with different intensities of AWGN, and filtered images), the CNNs trained on noise-free images have been applied. Ref. [51] shows that the use of noisy and/or filtered images in classifier training can improve classification results. It can therefore be expected that similar effects might take place for the considered task of object localization and classification. In other words, training the CNNs with AWGN-augmented images can help improve performance, especially for on-board processing.
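A training-time augmentation of the kind suggested here would simply inject AWGN with a randomly drawn STD into each training image; a minimal sketch (the STD range is our assumption, chosen to match the noise levels studied above):

```python
import numpy as np

def awgn_augment(image, std_range=(0.0, 25.0), rng=None):
    """Return a copy of an 8-bit image corrupted by AWGN with a random STD."""
    rng = rng or np.random.default_rng()
    std = rng.uniform(*std_range)
    noisy = image.astype(np.float64) + rng.normal(0.0, std, image.shape)
    return np.clip(noisy, 0, 255).astype(image.dtype)

sample = np.full((64, 64, 3), 100, dtype=np.uint8)
augmented = awgn_augment(sample, rng=np.random.default_rng(0))
```

Drawing the STD anew for every image exposes the detector to the whole range of noise intensities rather than to a single fixed level, which is the usual way to narrow the train/test domain gap.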
The task of small-sized object localization and classification is, as shown, more complicated than the task of object localization and classification in general. It requires special attention and, probably, special approaches, including the determination of the CNN type and parameters best suited to providing high performance, specific approaches to training, and input image pre-processing.
Above, we have considered object localization and classification in general, without paying attention to a particular class or classes that can be of interest for a given application, e.g., the detection of forest fires or polluted regions. For such cases, the analysis has to be modified to better fit the considered application.
These results already allow for improved urban monitoring systems, specifically for maintaining order on streets, monitoring complex road sections, identifying citizens in need of assistance, etc. It should also be noted that the obtained data are highly important for the authors’ future work on identifying small and camouflaged objects: they will allow for better noise filtering in images without the loss of important information, enabling object detection without reducing the probability of detection.
6. Conclusions
The task of object localization and classification in noisy color images acquired from UAVs has been considered. It is demonstrated that noise has a negative impact on most performance characteristics (especially F1), but mainly starting from the moment it becomes visible, i.e., when the input PSNR drops below about 30 dB. This means that the noise intensity (or PSNR) has to be controlled, which might complicate image processing.
It should be noted that the F1-score demonstrated the most pronounced degradation with increasing noise levels compared to other metrics (IoU, PoCPR). This is explained by its high sensitivity to classification errors: in the presence of noise, even a slight increase in the number of false positives or false negatives leads to a significant decrease in the final value. Furthermore, the uneven distribution of classes in the dataset used amplifies this effect, since noise has a stronger impact on small categories, reducing the Recall score and, consequently, the F1-score. Another important factor is the fact that the CNN was trained exclusively on clean images, which causes a mismatch between the training and test sets (domain gap) and further degrades classification results. Thus, the F1-score is the most rigorous indicator of classification quality in the presence of noisy UAV images, and its suboptimal values indicate the need to use preprocessing methods (e.g., DRUNet) or expand the training sets with data containing synthetic noise.
Different CNN architectures have different robustness with respect to noise; in particular, YOLOv5m is quite sensitive. The use of pre-filtering turns out to be expedient if the input PSNR is less than 30 dB. Both considered filters are, in general, able to improve performance, especially if the noise is very intensive. Meanwhile, on average, the use of the DRUNet filter is preferable.
It is also shown that the localization and classification of small-sized objects, which might correspond to such classes as “person” or “pedestrian”, is an even more complex task than the localization of objects having a size of one thousand pixels or more. Special efforts are needed to improve CNN performance for such classes.
We have studied only the simplest noise model, AWGN. The use of more adequate models of noise and other degradations is desirable in the future; in particular, signal-dependent and/or spatially correlated noise has to be considered. It is also worth studying different models of the YOLO family of CNNs, which develop quickly with continuous improvement of their properties.