The following studies were excluded during the quality assessment stage and are therefore not included in the results. Ref. [37] was inaccessible via our institutional tools and network; consequently, it was excluded from our review. Studies [38,39] were deemed false positives, as they did not mention the use of GANs in their papers; moreover, they were authored by the same individuals and focused on similar research themes. Refs. [40,41] were rejected because the primary focus of the research was on creating datasets using GANs, which lies outside the scope of our systematic review.
Table 4. Results of the quality assessment. “Total” represents the cumulative points assigned to each study based on the inclusion criteria (IC1–IC7). “Base” denotes the source database where the paper is indexed, while “Pub. type” indicates the publication format, distinguishing between journal papers (a) and conference papers (b). “Target” indicates the kind of object that the study aimed to detect.
Study | Year | IC1 | IC2 | IC3 | IC4 | IC5 | IC6 | IC7 | Total | Citations | Base | Pub. Type | Target |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
[42] | 2017 | 0.5 | 1.0 | 0.5 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Roads |
[43] | 2017 | 0.5 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.5 | - | IEEE Xplore | b | Transmission Lines |
[44] | 2018 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Stingrays |
[45] | 2019 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | - | IEEE Xplore | a | Diverse entities |
[46] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Cars |
[47] | 2019 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 33 | SCOPUS | a | Diverse entities |
[48] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 8 | SCOPUS | b | Vehicles |
[49] | 2019 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Vehicles |
[50] | 2020 | 0.0 | 0.0 | 1.0 | 0.5 | 0.0 | 0.0 | 1.0 | 2.5 | 12 | SCOPUS | a | Markers |
[51] | 2020 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.5 | 15 | SCOPUS | a | Diverse entities |
[52] | 2020 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 7 | SCOPUS | a | Insulators |
[53] | 2020 | 0.5 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.5 | - | IEEE Xplore | b | Vehicles |
[54] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 4.0 | 5 | SCOPUS | a | Pedestrians |
[55] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | 1 | SCOPUS | a | Diverse entities |
[56] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 23 | SCOPUS | a | Plants |
[57] | 2021 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 0 | SCOPUS | a | Diverse entities |
[58] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 26 | SCOPUS | a | Small Entities |
[59] | 2021 | 0.5 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | 2.5 | 10 | SCOPUS | a | Insulators |
[60] | 2021 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 6 | SCOPUS | a | Diverse entities |
[61] | 2021 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Pedestrians |
[62] | 2021 | 0.0 | 0.0 | 0.0 | 0.5 | 0.0 | 1.0 | 1.0 | 2.5 | - | IEEE Xplore | b | Living beings |
[63] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | 1 | SCOPUS | b | Debris |
[64] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 23 | SCOPUS | a | Pavement cracks |
[65] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 0 | SCOPUS | b | Transmission line defects |
[66] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 8 | SCOPUS | a | Wildfire |
[67] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | 8 | SCOPUS | a | Anomaly Entities |
[68] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 3.0 | 8 | SCOPUS | a | Small drones |
[69] | 2022 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 3.0 | 1 | SCOPUS | a | Peach tree crowns |
[70] | 2022 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | UAVs |
[71] | 2022 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3.0 | - | IEEE Xplore | b | Human faces |
[72] | 2022 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | Object distances from UAV |
[73] | 2022 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | - | IEEE Xplore | b | Diverse entities |
[74] | 2023 | 1.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 4.0 | 1 | SCOPUS | b | Vehicles |
[75] | 2023 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 3.0 | 2 | SCOPUS | a | Diverse entities |
[76] | 2023 | 0.5 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 2.5 | - | IEEE Xplore | b | Small entities |
[77] | 2023 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 0.0 | 1.0 | 5.0 | - | IEEE Xplore | b | Drones |
[78] | 2023 | 1.0 | 0.0 | 1.0 | 0.5 | 0.0 | 0.0 | 1.0 | 3.5 | - | IEEE Xplore | b | Small objects |
3.1. Human and Animal Detection
From the articles focused on the detection of humans and animals, we obtained the following outcomes.
Ref. [54] proposes a weight GAN sub-network to enhance the local features of small targets and introduces sample balance strategies to mitigate the imbalance among training samples, both between positive and negative samples and between easy and hard samples. This object detection technique addresses problems such as drone movement instability, tiny object size, poor lighting, rain, and fog, all of which can hinder identification. The study reported improvements in detection performance compared to other methods: 5.46% over Large Scale Images, 3.91% over SRGAN, 3.59% over ESRGAN, and 1.23% over Perceptual GAN. This work would be an excellent reference for addressing issues related to images taken from medium or high altitudes in SAR operations. The authors used accuracy as a metric, comparing its value with those of the SRGAN, ESRGAN, and Perceptual GAN models. Other metrics presented for evaluating the work include AP (average precision) and AR (average recall).
Ref. [44] used a Faster-RCNN, but for detecting stingrays. The work proposes applying a GLO model (a variation of a GAN in which the discriminator is removed and the model learns to map images to noise vectors by minimizing a reconstruction loss) to augment the dataset and thereby improve object detection algorithms. The model used (C-GLO) learns to generate synthetic foreground objects (stingrays) given background patches using a single network, without relying on a pre-trained model for this specific task. In other words, the study utilized a modified GAN network to expand the dataset of stingray images in oceans, considering the scarcity of such images, which complicates the training of classification algorithms. Thus, the dataset was augmented through C-GLO, and the data were analyzed by Faster-RCNN. They utilized the AP metric to assess the performance of the RCNN applied to the images across various latent code dimensions.
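For illustration, the following is a minimal sketch of the GLO idea described above, written in PyTorch: a per-image learnable latent code and a generator are optimized jointly under a reconstruction loss, with no discriminator. The architecture, image size, and hyperparameters are illustrative assumptions and do not reproduce the C-GLO model of [44].

```python
# Minimal sketch of Generative Latent Optimization (GLO): a learnable latent code
# per training image and a generator are jointly optimized to reconstruct the
# training images (no discriminator). Shapes and hyperparameters are illustrative.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        return self.net(z)

def train_glo(images, latent_dim=64, epochs=100, lr=1e-3):
    """images: float tensor of shape (N, 3, 32, 32) scaled to [-1, 1]."""
    n = images.shape[0]
    codes = nn.Parameter(torch.randn(n, latent_dim))        # one learnable code per image
    gen = Generator(latent_dim)
    opt = torch.optim.Adam([{"params": gen.parameters()}, {"params": [codes]}], lr=lr)
    loss_fn = nn.L1Loss()                                    # reconstruction loss
    for _ in range(epochs):
        opt.zero_grad()
        recon = gen(codes)                                   # reconstruct all training images
        loss = loss_fn(recon, images)
        loss.backward()
        opt.step()
    return gen, codes.detach()
```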
Ref. [61] proposes a model for generating pedestrian silhouette maps, used for pedestrian recognition; however, the application of GANs is not addressed. It was rejected because GANs were not used in its development.
Ref. [62] was also rejected because it did not incorporate GANs in its development, despite utilizing YOLOv3 for target detection in UAV images.
3.2. Object Detection
In considering studies oriented to object detection, Ref. [47] aimed to address object detection challenges in aerial images captured by UAVs in the visible light spectrum. To enhance object detection in these images, the study proposes a GAN-based super-resolution method. This GAN solution is specifically designed to up-sample low-resolution images, improving the overall detection accuracy in aerial imagery.
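As a rough illustration of GAN-based super-resolution used as a detection pre-processing step, the sketch below (assuming PyTorch) pairs a small upsampling generator with a discriminator and alternates the usual adversarial updates. It is a generic SRGAN-style baseline, not the model proposed in [47].

```python
# Generic sketch of adversarial super-resolution: a generator upsamples
# low-resolution crops and a discriminator judges realism. Architectures and
# loss weights are placeholders, not those of the reviewed study.
import torch
import torch.nn as nn

class SRGenerator(nn.Module):
    def __init__(self, scale=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.PReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),                    # (N, 3, H*scale, W*scale)
        )

    def forward(self, lr):
        return self.net(lr)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # one realism logit per image
        )

    def forward(self, img):
        return self.net(img)

def train_step(gen, disc, opt_g, opt_d, lr_imgs, hr_imgs):
    bce = nn.BCEWithLogitsLoss()
    # discriminator update: real HR vs. generated SR
    sr = gen(lr_imgs).detach()
    d_loss = bce(disc(hr_imgs), torch.ones(hr_imgs.size(0), 1)) + \
             bce(disc(sr), torch.zeros(sr.size(0), 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # generator update: pixel reconstruction loss + adversarial loss
    sr = gen(lr_imgs)
    g_loss = nn.functional.l1_loss(sr, hr_imgs) + \
             1e-3 * bce(disc(sr), torch.ones(sr.size(0), 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```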
Ref. [74] utilized a GAN-based real-time data augmentation algorithm to enhance the training data for UAV vehicle detection tasks, specifically focusing on improving the accuracy of detecting vehicles, pedestrians, and bicycles in UAV images. By incorporating the GAN approach, along with enhancements such as FocalLoss and a redesigned target detection head combination, the study achieved a 4% increase in detection accuracy over the original YOLOv5 model.
Ref. [67] introduces a novel two-branch generative adversarial network architecture designed for detecting and localizing anomalies in RGB aerial video streams captured by UAVs at low altitudes. The primary purpose of the GAN-based method is to enhance anomaly detection and localization in challenging operational scenarios, such as identifying small dangerous objects like improvised explosive devices (IEDs) or traps in various environments. The GAN architecture consists of two branches: a detector branch and a localizer branch. The detector branch focuses on determining whether a given video frame depicts a normal scene or contains anomalies, while the localizer branch is responsible for producing attention maps that highlight abnormal elements within the frames when anomalies are detected. In the context of search and rescue operations, the GAN-based method can be instrumental in identifying and localizing anomalies or potential threats in real-time aerial video streams. For example, in search and rescue missions, the system could help in detecting hazardous objects, locating missing persons, or identifying obstacles in disaster-affected areas. By leveraging the GAN’s capabilities for anomaly detection and localization, search and rescue teams can enhance their situational awareness and response effectiveness in critical scenarios.
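The two-branch idea can be sketched as a shared backbone feeding a frame-level detector head and a pixel-level localizer head, as below (PyTorch; layer sizes are placeholders, and the adversarial training used in [67] is omitted).

```python
# Sketch of a two-branch network: one branch scores whether a frame is anomalous,
# the other produces an attention map over the frame. Illustrative only.
import torch
import torch.nn as nn

class TwoBranchAnomalyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(                    # shared feature extractor
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.detector = nn.Sequential(                    # frame-level anomaly score
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, 1),
        )
        self.localizer = nn.Sequential(                   # per-pixel attention map
            nn.Conv2d(64, 1, 1),
            nn.Sigmoid(),
        )

    def forward(self, frame):
        feats = self.backbone(frame)
        score = self.detector(feats)          # (N, 1) logit: anomalous vs. normal
        attention = self.localizer(feats)     # (N, 1, H/4, W/4) map of abnormal regions
        return score, attention

# usage: score, attn = TwoBranchAnomalyNet()(torch.rand(1, 3, 256, 256))
```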
Ref. [57] proposes a novel end-to-end multi-task GAN architecture to address the challenge of small-object detection in aerial images. The GAN framework combines super-resolution (SR) and object detection tasks to generate super-resolved versions of input images, enhancing the discriminative detection of small objects. The generator in the architecture consists of an SR network with additional components such as a gradient guidance network (GGN) and an edge-enhancement network (EEN) to mitigate structural distortions and improve image quality. In the discriminator part of the GAN, a faster region-based convolutional neural network (FRCNN) is integrated for object detection. Unlike traditional GANs that estimate the realness of super-resolved samples using a single scalar, realness distribution is used as a measure of realness. This distribution provides more insights for the generator by considering multiple criteria rather than a single perspective, leading to an improved detection accuracy.
Ref. [68] introduces a novel detection network, the Region Super-Resolution Generative Adversarial Network (RSRGAN), to enhance the detection of small infrared targets. The GAN component of the RSRGAN focuses on the super-resolution enhancement of infrared images, improving the clarity and resolution of small targets like birds and leaves. This enhancement aids in accurate target detection, particularly in challenging scenarios. In the context of search and rescue operations, the application of the RSRGAN could be beneficial for identifying small targets in infrared imagery with greater precision. By enhancing the resolution of images containing potential targets, such as individuals in distress or objects in need of rescue, the RSRGAN could assist search and rescue teams in quickly and accurately locating targets in various environmental conditions. The improved detection capabilities offered by the RSRGAN could enhance the efficiency and effectiveness of search and rescue missions, ultimately contributing to saving lives and optimizing rescue efforts.
Ref. [48] concentrates on enhancing small-object detection in UAV aerial imagery captured by optical cameras mounted on unmanned aerial systems (UAVs). The proposed GAN solution, known as a classification-oriented super-resolution generative adversarial network (CSRGAN), aims to improve the classification results of tiny objects and enhance the detection performance by recovering discriminative features from the original small objects. In the context of search and rescue operations, the application of CSRGAN could be beneficial for identifying and locating small objects, such as individuals or objects, in aerial images. By enhancing the resolution and classification-oriented features of these small objects, CSRGAN could assist in improving the efficiency and accuracy of search and rescue missions conducted using UAVs. This technology could aid in quickly identifying and pinpointing targets in large areas, ultimately enhancing the effectiveness of search and rescue operations.
Ref. [60] focuses on LighterGAN, an unsupervised illumination enhancement GAN model designed to improve the quality of images captured in low-illumination conditions using urban UAV aerial photography. The primary goal of LighterGAN is to enhance image visibility and quality in urban environments affected by low illumination and light pollution, making them more suitable for various applications in urban remote sensing and computer vision algorithms. In the context of search and rescue operations, the application of LighterGAN could be highly beneficial. When conducting search and rescue missions, especially in low-light or nighttime conditions, having clear and enhanced images from UAV aerial photography can significantly aid in locating individuals or objects in need of assistance. By using LighterGAN to enhance images captured by UAVs in low illumination scenarios, search and rescue teams can improve their visibility, identify potential targets more effectively, and enhance their overall situational awareness during critical operations.
Ref. [43] explores the use of GANs to enhance image quality through a super-resolution deblurring algorithm. The GAN-based approach aims to improve the clarity of images affected by motion blur, particularly in scenarios like UAV (unmanned aerial vehicle) image acquisition. By incorporating defocused fuzzy kernels and multi-direction motion fuzzy kernels into the training samples, the algorithm effectively mitigates blur and enhances image data captured by UAVs.
Ref. [49] introduces a novel approach utilizing a GAN to address the challenge of small-object detection in aerial images captured by drones or unmanned aerial vehicles (UAVs). By leveraging the capabilities of GAN technology, the research focused on enhancing the resolution of low-quality images depicting small objects, thereby facilitating more accurate object detection algorithms.
Ref. [45] utilized a generative adversarial network (GAN) solution to augment typical, easily confused negative samples in the pretraining stage of a saliency-enhanced multi-domain convolutional neural network (SEMD) for remote sensing target tracking in UAV aerial videos. The GAN’s purpose is to enhance the network’s ability to distinguish between targets and the background in challenging scenarios by generating additional training samples. In SAR operations, the study can assist in distinguishing between targets and the background.
Ref. [46] introduces a generative adversarial network named VeGAN, trained to generate synthetic images of vehicles from a top-down aerial perspective for semantic segmentation tasks. By leveraging the GAN for content-based augmentation of training data, the study aimed to enhance the accuracy of a semantic segmentation network in detecting cars in aerial images. The study can serve as a basis for training models to identify other targets.
Ref. [56] aimed to enhance maize plant detection and counting using deep learning algorithms applied to high-resolution RGB images captured by UAVs. To address the challenge of low-quality images affecting detection accuracy, the study proposes a GAN-based super-resolution method. This method aims to improve results on native low-resolution datasets compared to traditional up-sampling techniques. This study was rejected because it focused on agricultural purposes rather than SAR.
Ref. [58] utilized a generative adversarial network (GAN), specifically the CycleGAN model, for domain adaptation in bale detection for precision agriculture. The primary objective was to enhance the performance of the YOLOv3 object detection model in accurately identifying bales of biomass in various environmental conditions. The GAN is employed to transfer styles between images with diverse illuminations, hues, and styles, enabling the YOLOv3 model to effectively detect bales under different scenarios. We also rejected it because of its non-SAR purpose.
Ref. [69] utilized a conditional generative adversarial network (cGAN) for the automated extraction and clustering of peach tree crowns based on UAV images in a peach orchard. The primary focus was on monitoring and quantitatively characterizing peach tree crowns using remote sensing imagery. It was also rejected because it focuses on agriculture.
Ref. [70] proposes a novel approach using the Pix2Pix GAN architecture for unmanned aerial vehicle (UAV) detection. The GAN was applied to detect UAVs in images captured by optical sensors, aiming to enhance the efficiency of UAV detection systems. In utilizing the GAN framework, the study focused on improving the accuracy and effectiveness of identifying UAVs in various scenarios, including adverse weather conditions. We rejected this study because it is aimed at air defense, identifying airborne UAVs using ground-based sensors.
Ref. [65] employed GANs to enhance transmission line images. The article does not mention YOLO, and the dataset was built from scratch; we can therefore conclude that the GAN was used for super-resolution. Since the study cannot be classified as SAR-focused, we rejected it at this stage.
3.3. Infrared Spectrum
Considering studies with images in the infrared spectrum, Ref. [55] employed GANs to facilitate the translation of color images to thermal images, specifically aiming to enhance the performance of color-thermal ReID (re-identification). This translation process involves converting probe images captured in the visible range to the infrared range. By utilizing the GAN framework for color-to-thermal image translation, the study aimed to improve the effectiveness of object recognition and re-identification tasks in cross-modality scenarios, such as detecting objects in thermal images and matching them with corresponding objects in color images. YOLO and another object detector were mentioned, and the study utilized various metrics to evaluate the ThermalReID framework and modern baselines: for the object detection task, they used the Intersection over Union (IoU) and mean average precision (mAP) metrics, while for the ReID task, they employed Cumulative Matching Characteristic (CMC) curves and the normalized Area Under the Curve (nAUC).
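For reference, IoU for axis-aligned bounding boxes can be computed as in the generic snippet below; this is the standard definition, not code from [55].

```python
# Intersection over Union (IoU) for axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # intersection rectangle
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # union = area A + area B - intersection
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# e.g. iou((0, 0, 10, 10), (5, 5, 15, 15)) == 25 / 175 ≈ 0.143
```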
In [63], the primary objective of utilizing GANs was to address the challenge posed by the differing characteristics of thermal and RGB images, such as varying dimensions and pixel representations. By employing GANs, the study aimed to generate thermal images that are compatible with RGB images, ensuring a harmonious fusion of data from both modalities.
The StawGAN in [76] was used to enhance the translation of night-time thermal infrared images into daytime color images. The StawGAN model was specifically designed to improve the quality of target generation in the daytime color domain based on the input thermal infrared images. By leveraging the GAN architecture, which comprises a generator and a discriminator network, the StawGAN model aims to produce more realistic and well-shaped objects in the target domain, thereby enhancing the overall image translation process.
Ref. [77] employed a GAN as a sophisticated image processing technique to enhance the quality of input images for UAV target detection tasks. The primary objective of integrating GAN technology into the research framework was to elevate the accuracy and reliability of the target detection process, particularly in the context of detecting UAVs. By harnessing the capabilities of GANs as image fusion technology, the study focused on amalgamating images captured from diverse modalities, such as those obtained from both the infrared and visible light spectra. This fusion process is crucial, as it enriches the visual information available for identifying and pinpointing UAV targets within the imagery. Essentially, the GAN functions as a tool to generate fused images by adapting and refining the structures of both the generator and discriminator components within the network architecture. Through this innovative approach, the research aimed to enhance the precision and robustness of the target detection mechanism embedded within the YOLOv5 model. By leveraging the power of GAN-based image fusion, the study endeavored to optimize the focus and clarity of the target detection process, ultimately leading to an improved performance in identifying UAV targets within complex visual environments.
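As a simplified illustration of feeding fused multimodal imagery to a detector, the toy snippet below blends an infrared channel with a visible-light image using a fixed weight. The study above learns the fusion with a GAN, so this stands in only for the overall pipeline idea, not the method itself.

```python
# Toy infrared/visible fusion as a detection pre-processing step: a fixed
# per-pixel weighted blend (the reviewed study uses a GAN instead).
import numpy as np

def fuse_ir_visible(ir_gray: np.ndarray, visible_rgb: np.ndarray, alpha: float = 0.5):
    """ir_gray: (H, W) float in [0, 1]; visible_rgb: (H, W, 3) float in [0, 1]."""
    ir_3ch = np.repeat(ir_gray[..., None], 3, axis=-1)    # broadcast IR to 3 channels
    fused = alpha * visible_rgb + (1.0 - alpha) * ir_3ch  # per-pixel weighted blend
    return np.clip(fused, 0.0, 1.0)

# The fused image would then be passed to the detector (e.g., a YOLOv5 model)
# in place of the raw visible-light frame.
```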
Ref. [72] focuses on utilizing a conditional generative adversarial network (CGAN), specifically the Pix2Pix model, to generate depth images from monocular infrared images captured by a camera. This application of CGAN aims to enhance collision avoidance during drone flights at night by providing crucial depth information for safe navigation. The research emphasizes the use of CGAN for converting infrared images into depth images, enabling the drone to determine distances to surrounding objects and make informed decisions to avoid collisions during autonomous flight operations in low-light conditions. This study can be leveraged in drone group operations, but in terms of ground object identification, it is not applicable. Therefore, we rejected the study.
Ref. [52] proposes a novel approach for insulator object detection in aerial images captured by drones utilizing a Wasserstein generative adversarial network (WGAN) for image deblurring. The primary purpose of the GAN solution was to enhance the clarity of insulator images that may be affected by factors such as weather conditions, data processing, camera quality, and environmental surroundings, leading to blurry images. By training the GAN on visible light spectrum images, the study aimed to improve the detection rate of insulators in aerial images, particularly in scenarios where traditional object detection algorithms may struggle due to image blurriness. It was rejected because it was not oriented toward search and rescue.
3.4. YOLO Versions
While some studies utilized Faster-RCNN [44,49,51] or custom object detection solutions [68], the majority of the selected studies employed some version of YOLO, with versions 3 and 5 being the most common, as depicted in Figure 5a.
Ref. [66] aimed to enhance wildfire detection using GANs to produce synthetic wildfire images. These synthetic images were utilized to address data scarcity issues and enhance the model’s detection capabilities. Additionally, Weakly Supervised Object Localization (WSOL) was applied for object localization and annotation, automating the labeling task and mitigating data shortage issues. The annotated data generated through WSOL were then used to train an improved YOLOv5-based detection network, enhancing the accuracy of the wildfire detection model. The integrated use of GANs for image generation, WSOL for annotation, and YOLOv5 for detection aimed to enhance the model’s performance and automate the wildfire detection process. This study could also aid in search and rescue operations, as the presence of fire in an area may indicate potential areas of interest during search efforts.
Ref. [75] is centered on image deblurring in the context of aerial remote sensing to enhance the object detection performance. It introduces the adaptive multi-scale fusion blind deblurred generative adversarial network (AMD-GAN) to address image blurring challenges in aerial imagery. The AMD-GAN leverages multi-scale fusion guided by image blurring levels to improve the deblurring accuracy and preserve texture details. In the study, the AMD-GAN was applied to deblur aerial remote sensing images, particularly in the visible light spectrum, to enhance object detection tasks. The YOLOv5 model was utilized for object detection experiments on both blurred and deblurred images. The results demonstrated that deblurring with the AMD-GAN significantly improves object detection indices, as evidenced by increased mean average precision (MAP) values and an enhanced detection performance compared to using blurred images directly with YOLOv5.
Ref. [78] focuses on enhancing small-object detection in drone imagery through the use of a Collaborative Filtering Mechanism (CFM) based on a cycle generative adversarial network (CycleGAN). The purpose of the GAN in the study was to improve the object detection performance by enhancing small-object features in drone imagery. The CFM, integrated into the YOLO-V5s model, filters out irrelevant features during the feature extraction process to enhance object detection. By applying the CFM module to YOLO-V5s and evaluating its performance on the VisDrone dataset, the study demonstrated significant improvements in the detection performance, highlighting the effectiveness of the GAN-based approach in enhancing object detection capabilities in drone imagery.
Ref. [64] aimed to develop a portable and high-accuracy system for detecting and tracking pavement cracks to ensure road integrity. To address the limited availability of pavement crack images for training, a GAN called PCGAN is introduced. PCGAN generates realistic crack images to augment the dataset for an improved detection accuracy using an improved YOLO v3 algorithm. The YOLO-MF model, a modified version of YOLO v3 with acceleration and median flow algorithms, was employed for crack detection and tracking. This integrated system enhances the efficiency and accuracy of pavement crack detection and monitoring for infrastructure maintenance. We rejected this study because it lacks relation to SAR operations.
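The general augmentation pattern, sampling synthetic images from a trained generator and concatenating them with real training data before detector training, can be sketched as follows (PyTorch; the generator, latent size, and shapes are hypothetical placeholders, not PCGAN).

```python
# Sketch of GAN-based dataset augmentation: synthetic images sampled from a
# trained generator are mixed with real images before training a detector.
import torch

def augment_with_gan(real_images: torch.Tensor, generator: torch.nn.Module,
                     n_synthetic: int, latent_dim: int = 128) -> torch.Tensor:
    """real_images: (N, 3, H, W); returns (N + n_synthetic, 3, H, W)."""
    generator.eval()
    with torch.no_grad():
        z = torch.randn(n_synthetic, latent_dim)   # random latent vectors
        synthetic = generator(z)                   # synthetic target images
    return torch.cat([real_images, synthetic], dim=0)

# The augmented tensor (plus matching labels, produced manually or via weak
# supervision) would then feed the YOLO training pipeline.
```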
Ref. [50] focuses on addressing the challenges of motion deblurring and marker detection for autonomous drone landing using a deep learning-based approach. To achieve this, the study proposes a two-phase framework that combines a slimmed version of the DeblurGAN model for motion deblurring with the YOLOv2 detector for object detection. The purpose of the DeblurGAN model is to enhance the quality of images affected by motion blur, making it easier for the YOLOv2 detector to accurately detect markers in drone landing scenarios. By training a variant of the YOLO detector on synthesized datasets, the study aimed to improve the marker detection performance in the context of autonomous drone landing. Overall, the study leveraged the DeblurGAN model for motion deblurring and the YOLOv2 detector for object detection to enhance the accuracy and robustness of marker detection in autonomous drone landing applications. We rejected it as its focus is on landing assistance rather than search and rescue.
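A minimal sketch of such a two-phase pipeline, chaining a deblurring model and a detector at inference time, is shown below. Both models are assumed to be pre-trained placeholders; this is not the authors’ implementation.

```python
# "Deblur, then detect" two-phase inference pipeline: a restoration model is
# applied first, and its output is passed to an object detector.
import torch

def detect_with_deblurring(frame: torch.Tensor,
                           deblur_model: torch.nn.Module,
                           detector: torch.nn.Module):
    """frame: (1, 3, H, W) image tensor affected by motion blur."""
    deblur_model.eval()
    detector.eval()
    with torch.no_grad():
        restored = deblur_model(frame)     # phase 1: remove motion blur
        detections = detector(restored)    # phase 2: detect markers/objects
    return restored, detections
```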
Ref. [58] utilized a GAN solution, specifically the CycleGAN model, for domain adaptation in the context of bale detection in precision agriculture. The primary objective was to enhance the performance of the YOLOv3 object detection model for accurately detecting bales of biomass in various environmental conditions. The GAN was employed to transfer styles between images with diverse illuminations, hues, and styles, enabling the YOLOv3 model to be more robust and effective in detecting bales under different scenarios. By training the YOLOv3 model with images processed through the CycleGAN for domain adaptation, the study aimed to improve the accuracy and efficiency of bale detection, ultimately contributing to advancements in agricultural automation and efficiency. The study was rejected because its focus is more aligned with the application of UAVs in agriculture.
Ref. [59] introduces InsulatorGAN, a novel model based on conditional generative adversarial networks (GANs), designed for insulator detection in high-voltage transmission line inspection using unmanned aerial vehicles (UAVs). The primary purpose of InsulatorGAN is to generate high-resolution and realistic insulator detection images from aerial images captured by drones, addressing limitations in existing object detection models due to dataset scale and parameters. In the study, the authors leveraged the YOLOv3 neural network model for real-time insulator detection under varying image resolutions and lighting conditions, focusing on identifying ice, water, and snow on insulators. This application of YOLOv3 demonstrates the integration of advanced neural network models within the context of insulator detection tasks. While the study does not explicitly mention the use of pre-trained models or training from scratch for InsulatorGAN, the emphasis is on enhancing the quality and resolution of generated insulator images through the proposed GAN framework. By combining GAN technology with YOLOv3 for insulator detection, the study aimed to advance the precision and efficiency of detecting insulators in transmission lines using UAV inspection, contributing to the field of computer vision and smart grid technologies. We rejected it due to its lack of emphasis on search and rescue operations.
Figure 5a compiles the YOLO versions and the number of studies that utilize each one of them.
3.5. Pre-Trained Models
Considering the use of pre-trained models and weights, Ref. [51] aimed to predict individual motion and view changes in objects in UAV videos for multiple-object tracking. To achieve this, the study proposes a novel network architecture that includes a social LSTM network for individual motion prediction and a Siamese network for global motion analysis. Additionally, a GAN is introduced to generate more accurate motion predictions by incorporating global motion information and objects’ positions from the last frame. The GAN was specifically utilized to enhance the final motion prediction by leveraging the individual motion predictions and view changes extracted by the Siamese network. It plays a crucial role in generating refined motion predictions based on the combined information from the individual and global motion analysis components of the network. Furthermore, the Siamese network is initialized with parameters pre-trained on ImageNet and fine-tuned for the task at hand. This pre-training step helps the Siamese network learn relevant features from a large dataset like ImageNet, which can then be fine-tuned to extract changing information in the scene related to the movement of UAVs in the context of the study.
Ref. [73] focuses on using generative adversarial networks (GANs) to enhance object detection performance under adverse weather conditions by restoring images affected by weather corruption. Specifically, the Weather-RainGAN and Weather-NightGAN models were developed to address challenges related to weather-corrupted images, such as rain streaks and night scenes, to improve the object detection accuracy for various classes like cars, buses, trucks, motorcycles, persons, and bicycles in driving scenes captured in adverse weather conditions. The study can provide valuable insights into SAR scenarios in snow-covered regions or other severe weather conditions.
Ref. [71] introduces a GAN for a specific purpose, although the exact application domain is not explicitly mentioned in the provided excerpts. The GAN was crafted to achieve a particular objective within the context of the research, potentially linked to tasks in image processing or computer vision. Furthermore, the study incorporated the use of a pre-trained model, which served a specific purpose in developing or enhancing the proposed GAN solution. The application of the pre-trained model within the study likely aimed to leverage existing knowledge or features to improve the performance or capabilities of the GAN in its intended application domain. This study, despite its focus on human face recognition, was deemed unnecessary for SAR operations, as the UAV is anticipated to operate at high altitudes where facial images of potential individuals would not be readily identifiable. Therefore, we rejected this study.
Ref. [42] introduces a dual-hop generative adversarial network (DH-GAN) to recognize roads and intersections from aerial images automatically. The DH-GAN was designed to segment roads and intersections at the pixel level from RGB imagery. The first level of the DH-GAN focuses on detecting roads, while the second level is dedicated to identifying intersections. This two-level approach allows the end-to-end training of the network, with two discriminators ensuring accurate segmentation results. Additionally, the study utilized a pre-trained model within the DH-GAN architecture to enhance the intersection detection process. By incorporating the pre-trained model, the DH-GAN can effectively extract intersection locations from the road segmentation output. This integration of the pre-trained model enhances the overall performance of the DH-GAN in accurately identifying intersections within the aerial images. We declined it because it was not closely aligned with the target topic, as it focuses on road detection rather than the detection of objects, animals, or people.
Ref. [53] aimed to enhance the tracking performance in UAV videos by transferring contextual relations across views. To achieve this, a dual GAN learning mechanism is proposed. The tracking-guided CycleGAN (T-GAN) transfers contextual relations between ground-view and drone-view images, bridging appearance gaps. This process helps adapt to drone views by transferring contextually stable ties. Additionally, an attention GAN (A-GAN) refines these relations from local to global scales using attention maps. The pre-trained model, a ResNet50 model, is fine-tuned to output context operations for the actor–critic agent, which dynamically decides on contextual relations for vehicle tracking under changing appearances across views. Typically, SAR operations are conducted in remote or hard-to-reach areas, making ground-based image capture impractical. Therefore, we rejected this study.
This systematic literature review yielded highly detailed and diverse results, considering that drone images can serve various purposes such as agriculture, automatic landing, face recognition, the identification of objects on power lines, and so on. Since our focus was primarily on SAR operations, we rejected papers that were unrelated to this topic.
Table 5 and Table 6 display the selected results along with the metrics used in their work. Here, AP stands for average precision; AR, average recall; AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic curve; MAP, mean average precision; NIQE, natural image quality evaluator; AG, average gradient; PIQE, perception index for quality evaluation; PSNR, Peak Signal-to-Noise Ratio; SSIM, Structural Similarity Index; FID, Fréchet Inception Distance; DSC, Dice Similarity Coefficient; S-Score, Segmentation Score; MAE, Mean Absolute Error; IS, Inception Score; SMD, Standard Mean Difference; EAV, Edge-Adaptive Variance; PI, Perceptual Index; and IOU, Intersection over Union. Two further metrics denote the center position errors in the longitudinal and lateral driving directions.
Figure 5b shows the most used metrics, along with the number of studies that use them.
Ref. [77] conducted a comprehensive comparison of the precision, recall, and mean average precision (MAP) metrics between their proposed improved versions of YOLOv5 and the following models: the original YOLOv5, YOLOv5 + CBAM, and YOLOv5 + Image Fusion. The results demonstrated the superior performance of their proposed models in terms of object detection accuracy. This comparative analysis not only highlights the advancements achieved with the enhanced YOLOv5 variants but also underscores the importance of employing classical evaluation metrics in assessing the efficacy of GANs and YOLOv5 in practical applications.
Ref. [78] also uses YOLOv5 and compares different object detection techniques using mAP@0.5, mAP@0.5:0.95, precision, and recall.
Ref. [51] presented a comprehensive array of metrics, including Identification Precision (IDP), Identification Recall (IDR), IDF1 score (F1 score), Multiple-Object Tracking Accuracy (MOTA), Multiple-Object Tracking Precision (MOTP), Mostly Tracked targets (MT), Mostly Lost targets (ML), number of False Positives (FPs), number of False Negatives (FNs), number of ID Switches (IDSs), and the number of times a trajectory is Fragmented (FMs). The authors utilized diverse datasets and compared object monitoring performances across various techniques, namely, Faster-RCNN, R-FCN, SSD, and RDN. This extensive metric evaluation renders this reference an excellent resource for validating metrics applied in target identification in SAR applications.
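For context, MOTA is commonly defined from the per-frame false negatives, false positives, and identity switches relative to the number of ground-truth objects, as in the standard formulation below (the exact variant used in [51] may differ):

$$\mathrm{MOTA} = 1 - \frac{\sum_t \left(\mathrm{FN}_t + \mathrm{FP}_t + \mathrm{IDSW}_t\right)}{\sum_t \mathrm{GT}_t}$$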
Ref. [75] conducted a comparison of the mean average precision (MAP), precision, and recall in object identification using the methods GT + YOLOV5, Blur + YOLOV5, and AMD-GAN + YOLOV5. It serves as an excellent resource for detection comparisons with YOLOV5. Regarding the GAN utilized, the study employed metrics such as the Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM), comparing these metrics with various methods, including AMD-GAN without NAS MFPN, AMD-GAN with NAS MFPN, AMD-GAN without AMS Fusion, and AMD-GAN with AMS Fusion.
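For reference, PSNR and SSIM between a ground-truth image and a restored image can be computed with scikit-image as in the snippet below (grayscale inputs in [0, 1] are assumed; this is a generic illustration, not the evaluation script of [75]).

```python
# PSNR and SSIM between a reference image and a restored image using scikit-image.
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def image_quality(reference: np.ndarray, restored: np.ndarray):
    """Both inputs: grayscale float arrays of the same shape, values in [0, 1]."""
    psnr = peak_signal_noise_ratio(reference, restored, data_range=1.0)
    ssim = structural_similarity(reference, restored, data_range=1.0)
    return psnr, ssim

# e.g. image_quality(ground_truth_patch, deblurred_patch) -> (PSNR in dB, SSIM)
```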
Ref. [60] compared the performances of several algorithms with respect to the PIQE metric, including LighterGAN, EnlightenGAN, CycleGAN, Retinex, LIME (Low-Light Image Enhancement via Illumination Map Estimation), which showed the highest PIQE score, and DUAL (Dual Illumination Estimation for Robust Exposure Correction).
Ref. [73] employed the average precision (AP) and mean average precision (mAP) metrics to assess the performances of various restoration methods, including Gaussian Gray Denoising Restormer, Gaussian Colour Denoising Restormer, Weather-RainGAN, and Weather-NightGAN. It evaluated these metrics across different object classes such as car, bus, person, and motorcycle. This comparison serves as an excellent resource for applying these metrics in the context of object detection in works involving the application of GANs.
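For reference, per-class AP can be computed from a ranked list of detections with the standard all-point interpolation, as sketched below; mAP is then the mean of AP over classes. This is the generic procedure, not the exact evaluation protocol of [73].

```python
# Average precision (AP) from a ranked list of detections for one class, using
# the standard all-point interpolation of the precision-recall curve.
import numpy as np

def average_precision(scores, is_true_positive, n_ground_truth):
    """scores: confidence per detection; is_true_positive: matching booleans."""
    order = np.argsort(-np.asarray(scores))                # rank by confidence
    tp = np.asarray(is_true_positive, dtype=float)[order]
    fp = 1.0 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(n_ground_truth, 1)
    precision = cum_tp / np.maximum(cum_tp + cum_fp, 1e-9)
    # add sentinels and take the precision envelope (monotone non-increasing)
    recall = np.concatenate(([0.0], recall, [1.0]))
    precision = np.concatenate(([0.0], precision, [0.0]))
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    # sum rectangle areas where recall changes
    idx = np.where(recall[1:] != recall[:-1])[0]
    return float(np.sum((recall[idx + 1] - recall[idx]) * precision[idx + 1]))

# mAP = np.mean([average_precision(...) for each object class])
```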
Ref. [76] utilized the PSNR and SSIM, in addition to the FID and IS, to compare image modality translation among the methods pix2pixHD, StarGAN v2, PearlGAN, TarGAN, and StawGAN. Furthermore, they compared the segmentation performances using metrics such as the Dice Similarity Coefficient (DSC), S-Score, and Mean Absolute Error (MAE), specifically for TarGAN and StawGAN. This comprehensive evaluation provides insights into the effectiveness of these methods for image translation and segmentation tasks.
Ref. [49] compared the PSNR, SSIM, and average PI with SRGAN, ESRGAN, and their proposed model. They also used the MAP metric to evaluate different methods for object detection such as SSD and Faster R-CNN.
Ref. [44] used the average precision to compare augmentation methods to Faster R-CNN without augmentation, in the context of the detection of stingrays.
Ref. [46] employed the IOU metric to quantify the degree of overlap between predicted car regions and ground truth car regions in the images. Through the analysis of the IOU values and the center position errors in the longitudinal and lateral driving directions, the study shows how accurately the segmentation network was able to localize and position the detected cars within the images.
Table 7 and Table 8 present the selected studies divided into clusters, where each cluster represents a trend of application among the grouped studies, with columns indicating the method applied to UAV-generated images, the main advantage of its application, the main disadvantage, and an example of how the study could be applied in search and rescue scenarios. Cluster A includes studies focusing on GANs for data augmentation, super-resolution, and motion prediction. Cluster B comprises studies utilizing GANs for deblurring, augmentation, and super-resolution. Cluster C consists of studies employing GANs for small-object detection, fusion, and anomaly detection. Cluster D encompasses studies utilizing GANs for image translation and fusion. Cluster E includes studies focusing on GANs for weather correction, adverse condition handling, and deblurring.