Review

The Survey of Evolutionary Deep Learning-Based UAV Intelligent Power Inspection

1 School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China
2 State Key Laboratory of Intelligent Power Distribution Equipment and System, Hebei University of Technology, Tianjin 300401, China
* Author to whom correspondence should be addressed.
Drones 2026, 10(1), 55; https://doi.org/10.3390/drones10010055
Submission received: 3 December 2025 / Revised: 4 January 2026 / Accepted: 8 January 2026 / Published: 12 January 2026

Highlights

What are the main findings?
  • This paper systematically introduces evolutionary deep learning (EDL) technology into the field of UAV intelligent power inspection. It reviews the current application status of DL models in power inspection, proposes the use of EDL technology to enhance detection performance in power inspection, and summarizes and analyzes key technical solutions for object detection models in addressing challenges such as small objects, complex backgrounds, and sample imbalance.
  • This paper analyzes the urgent need for lightweight DL models in UAV power inspection under the “cloud–edge collaboration” architecture. It points out that EDL technology can effectively balance conflicting objectives such as model accuracy, computational efficiency, and energy consumption by automatically optimizing the network architecture and hyperparameters, and can automatically generate object detection models that combine light weight, high precision, and environmental robustness.
What are the implications of the main findings?
  • This paper is the first review that combines evolutionary computation (EC) and DL and specifically applies them to UAV intelligent power inspection. EDL technology automates the optimization of deep network model structures and hyperparameters, effectively addressing the issues of time-consuming manual design and reliance on expert knowledge, and providing a systematic reference for subsequent research and engineering applications.
  • Through multi-objective optimization (e.g., balancing accuracy, parameter size, latency and energy consumption), EDL can directly search for efficient model architectures suitable for embedded platforms. This addresses key bottleneck issues in UAV power inspection scenarios, such as high model design costs and the conflict between edge computing demands and real-time requirements. It offers a novel solution for intelligent detection technology on resource-constrained devices like UAVs.

Abstract

With the rapid development of the power Internet of Things (IoT), the traditional manual inspection mode can no longer meet the growing demand for power equipment inspection. Unmanned aerial vehicle (UAV) intelligent inspection technology, with its efficiency and flexibility, has become the mainstream solution. The rapid development of computer vision and deep learning (DL) has significantly improved the accuracy and efficiency of UAV intelligent inspection systems for power equipment. However, mainstream deep learning models have complex structures, and manual design is time-consuming and labor-intensive. In addition, the images collected by UAVs during power inspection suffer from complex backgrounds, uneven lighting, and large differences in object size, so designing models suited to UAV power inspection scenarios requires expert DL domain knowledge and many trial-and-error experiments. In response to these difficulties, evolutionary computation (EC), which simulates the natural evolutionary process, has demonstrated unique advantages: it can autonomously design lightweight, high-precision deep learning models by automatically optimizing the network structure and hyperparameters. Therefore, this review summarizes the development of evolutionary deep learning (EDL) technology and provides a reference for applying EDL in object detection models used in UAV intelligent power inspection systems. First, the application status of DL-based object detection models in power inspection is reviewed. Then, how EDL technology improves model performance in challenging scenarios such as complex terrain and extreme weather through network architecture optimization is analyzed. 
Finally, the challenges and future research directions of EDL technology in the field of UAV power inspection are discussed, including key issues such as improving the environmental adaptability of the model and reducing computing energy consumption, providing theoretical references for promoting the development of UAV power inspection technology to a higher level.

1. Introduction

A safe and stable electricity supply is the most important and basic need of modern society. Power equipment is exposed to various harsh environments for long periods and is prone to failure or damage, which poses major safety risks to normal power system operation. To ensure the safe operation of the power grid, any defects in the power equipment must be detected in time. Power inspection is mostly used to monitor the statuses of devices such as insulators [1,2], transformers [3], transmission lines [4,5] and poles [6] in the power grid to prevent uncontrolled power outages caused by the failure of these devices. In addition, in some disaster scenarios, it is necessary to quickly obtain panoramic images of the disaster-stricken area to assess the functional damage to power equipment. In scenarios of external force damage, such as detecting construction machinery intrusion and kites wound around wires, it is necessary to achieve second-level object tracking and generate an alarm to ensure the safety of the power grid. However, the condition of power equipment is conventionally monitored via manual inspection, which involves a heavy workload, poor real-time performance, low efficiency and numerous risk factors. Unmanned aerial vehicles (UAVs) can closely observe and monitor power equipment in complex geographical environments and climates, quickly covering large areas of power lines and facilities. In some emergency scenarios, such as postdisaster assessment and external force damage detection, they can identify potential threats in real time and transmit data to achieve a rapid response, and their multiple sensors eliminate inspection blind spots, improving the comprehensiveness and accuracy of detection. Therefore, using UAVs as detection tools to locate faults in electrical equipment has become a new trend in power inspection.
By integrating UAV power inspection with artificial intelligence technology, the intelligence level of UAVs can be effectively enhanced [7,8,9]. Early fault detection of power equipment usually combined manual feature extraction with machine learning [10,11], an approach characterized by low robustness and long processing times. It performs poorly in visual power inspection involving complex backgrounds, and manual feature design requires professional knowledge. With the development of deep learning (DL) and artificial intelligence, UAV-based intelligent power inspection via deep neural networks (DNNs) has gradually replaced traditional methods [12,13], achieving more accurate object detection and positioning through end-to-end learning. Compared with the original machine vision-based fault detection methods, DL-based object detection models are composed of multiple convolutional layers and have a stronger learning ability. These models can efficiently and automatically extract image features and locate the fault areas in the image, greatly improving the efficiency and accuracy of detection.
In recent years, the application of DL technology in the field of power inspection has significantly increased. A bibliometric analysis based on the Web of Science Database (search keywords: “deep learning”, “object detection”, “power inspection”, and “UAV power inspection”) revealed that the number of related research papers has continued to increase from 2019 to 2024. Specifically, Figure 1 shows the changes in the number of research papers on DL technology in terms of power inspection and UAV power inspection, whereas Figure 2 presents the overall research trend of DL-based object detection technology in the application of power inspection and UAV power inspection. Data show that DL technology is becoming increasingly popular in the field of inspection, and the application of DL-based object detection algorithms in these two fields is also maintaining a steady growth. This development trend fully demonstrates the significant value of DL technology in enhancing the intelligence level of power inspection. Through DL algorithms, especially object detection technology, the accuracy and efficiency of defect identification in power equipment can be significantly improved, providing strong technical support for the safe operation of the power system.
At present, the UAV power inspection system based on DL is undergoing a technological transformation from a centralized cloud computing architecture to an edge–cloud collaboration architecture. Under the traditional cloud computing model, the inspection data collected by UAVs through multimodal sensors (such as visible light cameras, infrared thermal imagers and LiDARs) need to be sent back to the cloud for processing [14]. Although this solves the computing power bottleneck of embedded devices in UAVs, there is a significant data transmission delay, and it is suitable only for nonreal-time batch processing scenarios. To meet real-time requirements, the emerging edge computing paradigm has achieved localized lightweight inference processing of data by deploying computing nodes at the network edge or integrating edge computing devices on UAVs [15], significantly reducing the latency and energy consumption problems caused by cloud data transmission. Therefore, the edge–cloud collaboration architecture is gradually becoming the mainstream solution for UAV power inspection. Its typical implementation approach is to handle lightweight detection tasks with high real-time requirements at the edge while offloading computationally intensive and complex tasks to the cloud for processing. As shown in Figure 3, by equipping edge computing devices in UAVs, real-time processing and analysis of multisource heterogeneous data (including high-definition visible light images, infrared thermal imaging, and LiDAR point clouds) have been achieved through computer vision algorithms based on DL. Real-time detection and abnormal early warning of power equipment defects are completed at the edge end, and key data are uploaded to the cloud at the same time. 
By aggregating data from edge nodes, high-precision detection models are trained on the cloud using massive amounts of data and pushed back to edge devices, thereby enhancing the accuracy of fault detection at the edge end and forming a virtuous cycle of “edge–cloud collaboration”. However, although DL, with its powerful feature learning and modeling capabilities, can represent data in a more abstract and essential way, its high demand for computing resources conflicts with the limited computing power of edge devices [16,17]. This resource constraint makes efficient deployment of traditional deep models in resource-constrained environments such as UAV-embedded platforms difficult. Therefore, developing a lightweight object detection model that accounts for real-time performance, accuracy and low resource consumption has become a key breakthrough for promoting the application of UAV intelligent power inspection technology.
In recent years, evolutionary computation (EC) has shown great potential in the field of automatic search for DL architectures [18,19]. This stochastic optimization technique, inspired by natural evolution mechanisms, can effectively solve non-convex, non-differentiable and complex multiobjective optimization problems by simulating processes such as natural selection and genetic variation [20,21], providing an innovative solution for improving the performance of DL models. In intelligent UAV power inspection, UAVs are usually equipped with multimodal sensors (including high-definition cameras and LiDARs) for real-time data collection, which is then combined with DL technology for real-time analysis and prediction. However, traditional manually designed DL models face multiple challenges. First, UAV power inspection scenarios are complex and changeable (with diverse equipment types and unstable lighting conditions), and manually designed DL models have difficulty adapting to different sensor data (visible light, infrared, ultraviolet, etc.). In contrast, EC can generate the optimal model architecture for multimodal data by automatically searching for the network structure and hyperparameter configuration. Second, in practical application scenarios, it is necessary not only to consider the number of model parameters but also to balance multiple conflicting goals, such as model accuracy, computational cost, time delay, and energy consumption. For example, UAVs need to achieve real-time detection with limited computing power, and EC can balance model accuracy and computational efficiency through a population evolution mechanism. In addition, defect samples from power equipment are scarce and unevenly distributed. EC can be combined with small-sample learning strategies to automatically adjust the data augmentation and feature fusion methods, thereby enhancing the robustness of the deep learning model. 
Therefore, EC theory offers a new intelligent solution for UAV power inspection. By applying evolutionary deep learning (EDL) technology to a UAV power inspection system and automatically searching for the optimal model architecture, a more efficient and reliable solution for the intelligent detection of power equipment can be obtained, demonstrating significant theoretical value and application prospects.
The current research status indicates that there are already reviews on the independent applications of EC and DL in the field of power systems. For example, Valencia-Rivera et al. [22] published a review on the application of EC to solve various optimization problems in power systems, and Pop et al. [23] reviewed the optimization applications of swarm intelligence algorithms, a branch of EC, in renewable-powered smart grids. However, neither of these reviews involves DL. Massaoudi et al. [24] summarized the deep learning methods used in smart grids. Vanting et al. [25] introduced the application of DNNs in power load forecasting. Aguiar-Perez et al. [26] researched demand forecasting for smart grids based on DL. Ruan et al. [27] reviewed the cybersecurity of smart grids based on DL. Although these works provide up-to-date analyses of deep learning applications in power system fields such as load forecasting, demand forecasting, and cybersecurity, they do not cover EC, and the DL models applied were still obtained through manual experiments. Some reviews on EDL have also been published [20,28,29,30,31], but none have touched upon the scenario of intelligent power inspection by UAVs. Therefore, there is still a significant gap in research combining EC and DL and specifically applying them to intelligent UAV power inspection. Collaborative innovation research on the two has not yet been systematically carried out, especially in the special scenario of UAV power inspection, which requires both complex environmental adaptability and real-time performance. This review therefore aims to fill this research gap. By analyzing the potential of EDL in UAV power inspection, a new paradigm for automatically generating lightweight and highly robust detection models is explored, providing methodological support and technical route guidance for the construction of next-generation intelligent power inspection systems.
The structure of this article is shown in Figure 4. Section 2 first comprehensively reviews the current application status of DL in the field of UAV intelligent power inspection. It then describes the development history of EDL, including hyperparameter optimization, network architecture optimization, and multiobjective model optimization, and finally analyzes and compares the performance characteristics of one-stage and two-stage DL-based object detection models. Section 3 explores the innovative application of DL models (focusing on one-stage, two-stage and lightweight object detection models) and EDL theory in intelligent power inspection scenarios, and summarizes the key technical solutions for improving power inspection detection performance. Section 5 analyzes the technical bottlenecks currently faced in the field of UAV intelligent power inspection, including key issues such as insufficient detection accuracy for small objects and a scarcity of training data, and on this basis proposes forward-looking future research directions. Finally, Section 6 summarizes this paper.

2. Foundations

2.1. UAV Intelligent Power Inspection Technology

A malfunction in a power system can cause many inconveniences to people’s lives and very large economic losses. Therefore, strict inspection and maintenance must be carried out. In the early days, power inspection often involved staff using the naked eye or optical inspection equipment to observe whether equipment had malfunctioned. However, this process is time-consuming, highly dangerous and costly. With the development of computer vision technology, power inspection has entered a new era of intelligence. UAVs have the advantages of high flexibility, miniaturization, low energy consumption and high intelligence. An increasing number of UAVs equipped with intelligent vision systems are being used in power inspection, providing a safer, more efficient and more cost-effective method for the inspection process.
Object detection for UAV intelligent power inspection is challenging. In the field of UAV power inspection, object detection methods can be divided into traditional manual feature extraction methods and end-to-end methods based on DL. Traditional object detection relies mainly on features such as color, shape and edges in the target image for detection and recognition. Usually, feature extraction algorithms are combined with machine learning classification algorithms to achieve object recognition. For example, Zhai et al. [32] utilized Haar features and the AdaBoost method to automatically extract insulators from aerial images. Wang et al. [33] utilized the SVM algorithm to identify insulators from aerial images. In traditional machine learning, the feature extraction and object detection stages are often carried out separately, resulting in low accuracy and slow recognition, and the hand-crafted features rely on expert experience and are easily affected by the environment. Such methods are suitable only for extracting a single type of target and rarely address the recognition of multiple types of power equipment.
DL has a powerful feature representation ability, and using DL to enhance the intelligence level of UAV power inspection is currently a research hotspot. DL has end-to-end learning capabilities, making it possible for UAVs to achieve real-time autonomous detection and inspection in power line inspection. The UAV power inspection method based on DL technology does not require manual feature extraction; it can extract the deep features of images through successive nonlinear transformations. Therefore, it is highly capable of handling visible light images with complex backgrounds and power inspection scenarios with variable scenes and diverse object features, and it meets the requirements of intelligent image processing in power inspection. Typically, a DL-based UAV inspection system collects data in the power detection environment via high-definition cameras and sensor devices and uses DL algorithms for object positioning and fault detection. According to the sensors used, the detection technologies commonly applied in power equipment inspection can be divided into visible light imaging, infrared thermal imaging, laser radar, ultraviolet imaging and multispectral imaging.
At present, most mainstream DL detection schemes are based on visible light datasets [4,34,35]. However, visible light imaging mainly detects external faults in power equipment and cannot identify internal abnormal defects. Common faults can cause abnormal changes in internal temperature; therefore, thermal imaging technology can assess the status of electrical equipment through color differences in infrared images. First, thermal imaging or infrared cameras receive the infrared radiation emitted by power equipment. After processing, an infrared thermal image is generated, in which different colors represent different temperatures. By directly observing the temperature on the surface of the equipment, the fault point can be precisely located. For example, Li et al. [36] proposed a lightweight power equipment detection network (PEDNet) based on YOLOv4-Tiny for real-time detection of infrared images of substation equipment. This network achieved good results in terms of both detection accuracy and speed. Ou et al. [13] proposed an improved model based on Faster R-CNN [37] for five infrared electrical devices in substations. This model not only has high detection accuracy and speed but is also robust to noise and brightness variations in infrared images.
Laser radar (LiDAR) acquires the position and speed of an object by emitting high-frequency laser pulses toward the detection object and can quickly and accurately obtain three-dimensional spatial information of the object. Through three-dimensional measurements and a morphological analysis of power facilities, the efficiency and accuracy of power inspection have greatly improved. For example, Zhu et al. [38] proposed an end-to-end multi-branch network (EM-Net) for the automatic extraction of transmission equipment from point cloud data acquired by UAV-based LiDAR scanners.
In most cases, incipient equipment faults cause corona or partial discharge, at which point the acoustic waves or ultraviolet (UV) radiation generated by air ionization can be detected. UV imaging technology captures UV signals generated by external discharges of electrical equipment through UV cameras and assesses the defect level on the basis of the number of photons in the UV image. For example, Davari et al. [39] proposed a Faster R-CNN-based method that uses UV-visible light video to diagnose incipient faults in power distribution lines. Uckol et al. [40] first used the YOLOv8 model to locate corona discharges and then used a DNN to classify the corona discharge patterns. However, compared with conventional tools, UV imaging is limited to surface discharges and remains cost-prohibitive.
While optical imagery detects visible faults, thermal and UV imaging reveal invisible anomalies, and radar provides richer 3D measurement data than other sensors do. Therefore, multispectral sensing technology, which can obtain the spectral information of objects in multiple bands, is on the rise. Multispectral sensing technology provides an advanced method for precise monitoring. Specifically, UAVs are equipped with multispectral cameras to synchronously collect data such as the appearance, temperature distribution and corona discharge information from power equipment, enabling state monitoring and early warnings via multispectral image analysis. For example, Han et al. [41] combined the temperature information of infrared thermal imaging in multispectral images with the morphological features of visible light images and utilized DL algorithms to achieve intelligent recognition of typical appearance defects of substation equipment. Yang et al. [42] fused RGB images, temperature data, and 3D point clouds for fault prediction of substation equipment. Among them, LiDAR was used to collect spatial information of the substation equipment, visible light cameras were used to obtain visible appearance information, and infrared cameras were used to obtain the radiation temperature information of the substation equipment. Zhao et al. [43] combined infrared detection, UV detection and visible light detection and integrated them into an UAV platform, providing more comprehensive criterion support for the defect diagnosis of composite insulators.
In summary, images from diverse sensors facilitate the detection of various power equipment faults. DL models leverage robust feature extraction and pattern recognition to effectively fuse multisource heterogeneous sensor data, markedly improving detection accuracy and robustness for compound faults (e.g., insulator damage, foreign objects on conductors, hardware corrosion) in UAV power inspections while meeting real-time requirements for edge devices.

2.2. EDL Research

Manually designing deep learning detection models is both time-consuming and inefficient. The parameters and computational load of general DL models are relatively large, making real-time detection difficult on UAV embedded devices with limited computing and storage capabilities. Furthermore, these generic network models are typically specialized for specific tasks or datasets and cannot guarantee optimal performance in real-time, energy-efficient power inspection tasks. This motivates researchers to adopt optimization algorithms that automatically search for better model parameters and architectures to enhance network performance.
EC is an efficient intelligent optimization method based on biological evolution mechanisms and population behaviors, and it mainly includes evolutionary algorithms (EAs) and swarm intelligence (SI) algorithms [44]. An EA is a population-based metaheuristic that simulates biological evolution, using crossover, mutation, and environmental selection operators during the iterative process to find solutions to optimization problems. For multiobjective optimization problems that require the simultaneous optimization of multiple conflicting objectives, multiobjective evolutionary algorithms (MOEAs) exhibit excellent performance. Commonly used EAs include the genetic algorithm (GA) [45], evolution strategy (ES) [46] and differential evolution (DE) [47]. SI algorithms solve complex optimization problems by simulating the intelligent group behaviors exhibited by biological populations in nature [48]; examples include particle swarm optimization (PSO) [49] and the firefly algorithm (FA) [50].
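To make the evolutionary loop described above concrete, the following minimal sketch implements a real-coded GA with binary tournament selection, uniform crossover and Gaussian mutation on a toy two-variable maximization problem. The operator choices, rates and population sizes are illustrative assumptions for exposition, not recommendations drawn from the surveyed literature.

```python
import random

def evolve(fitness, bounds, pop_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.1):
    """Minimal real-coded GA (maximization): tournament selection,
    uniform crossover, and Gaussian mutation clamped to the bounds."""
    pop = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        def select():  # binary tournament: keep the fitter of two individuals
            a, b = random.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        offspring = []
        while len(offspring) < pop_size:
            p1, p2 = select(), select()
            child = ([random.choice(genes) for genes in zip(p1, p2)]
                     if random.random() < crossover_rate else list(p1))
            for i, (lo, hi) in enumerate(bounds):  # per-gene Gaussian mutation
                if random.random() < mutation_rate:
                    child[i] = min(hi, max(lo, child[i] + random.gauss(0, 0.1 * (hi - lo))))
            offspring.append(child)
        pop = offspring  # environmental selection: offspring replace parents
    return max(pop, key=fitness)

# Toy usage: maximize -(x - 0.3)^2 - (y + 0.5)^2 over [-1, 1]^2
random.seed(0)
best = evolve(lambda v: -(v[0] - 0.3) ** 2 - (v[1] + 0.5) ** 2,
              bounds=[(-1, 1), (-1, 1)])
```

In EDL, the same loop applies unchanged; only the representation differs: each individual encodes network hyperparameters or an architecture instead of two real numbers, and the fitness call trains and validates the corresponding model.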
Research that combines EC with neural networks has long attracted the attention of scholars. Early EC-based neural network optimization efforts focused mainly on using the EC to search for the model parameters [51] and architectures [52] of shallow neural networks. Currently, neural architecture search based on EC (ENAS) for searching for DL architectures is becoming increasingly popular [53,54,55].
Compared with reinforcement learning (RL)-based or gradient-based neural architecture search (NAS) methods, EC-based approaches show clear advantages in UAV power inspection tasks. In UAV deployment, we care not only about accuracy but also about several competing goals such as inference latency, energy consumption, and model size. Manual design is usually interpretable and can reuse expert knowledge, but its design space is limited by human intuition; as a result, it is hard to optimize multiple objectives at the same time, and the large architecture space cannot be fully explored. Gradient-based NAS methods such as DARTS are often more search-efficient, but their search space is restricted to differentiable operations, which makes discrete architectural choices difficult to handle. For the same reason, hardware-related objectives are not easily included in the loss function, and many gradient-based methods mainly optimize a single objective, typically accuracy. RL-based NAS methods such as NASNet can handle discrete search spaces and support end-to-end optimization, but they often suffer from low sample efficiency and unstable training; reward design can also be difficult, multi-objective balancing is limited, and performance can be sensitive to hyperparameters. In contrast, EC does not require gradients: black-box hardware constraints can be added directly to the fitness function, enabling hardware-aware model search. Moreover, gradient-based search can get stuck in local optima, and RL methods usually rely on a controller, such as an RNN, to generate architectures sequentially and then update it with policy gradients, a procedure that is often sample-inefficient, can converge unstably, and is hard to parallelize. EC, however, maintains population diversity and therefore supports more global exploration in a large non-convex search space. In addition, EC naturally supports parallel evaluation and variable-length encoding: networks with different depths and widths can be evolved, which helps find a better balance between accuracy and efficiency under limited compute. This advantage is difficult to match with traditional methods. Table 1 provides a comparative analysis of these neural network optimization methods.
Based on the table above, EDL shows the best fit for UAV-based power line inspection. Its main strengths appear in several key aspects. First, with multiobjective evolutionary algorithms, EDL can naturally search the Pareto front that balances detection accuracy and resource cost, providing multiple architecture options for different inspection needs; for example, the system can switch between a high-accuracy mode and an endurance-first mode. For hardware awareness, EDL can place real on-device metrics, such as inference latency and energy consumption on the UAV platform, into the fitness function. This is more reliable than using FLOPs as a proxy, because FLOP counts often fail to reflect real runtime. Meanwhile, EDL is not limited by differentiability, so it can explore complex search spaces that include quantization, pruning, and lightweight convolutions such as depthwise separable convolution and MBConv. In addition, the population-based search makes EDL more robust to initialization and hyperparameter choices, which reduces the need for manual tuning.
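As a concrete illustration of how such hardware-aware multiobjective comparison works, the sketch below filters candidate architectures down to their Pareto front over three minimized objectives (detection error, latency, energy). The candidate names and measurement values are hypothetical placeholders; in practice they would come from on-device profiling.

```python
def dominates(a, b):
    """a Pareto-dominates b if it is no worse in every objective
    and strictly better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(candidates):
    """Keep only the non-dominated (name, objectives) pairs."""
    return [(name, f) for name, f in candidates
            if not any(dominates(g, f) for _, g in candidates if g != f)]

# Hypothetical on-device measurements for three candidate architectures:
# objectives = (1 - mAP, inference latency in ms, energy per frame in mJ)
candidates = [
    ("A_high_acc",  (0.08, 45.0, 120.0)),  # accurate but slow and power-hungry
    ("B_balanced",  (0.12, 18.0,  60.0)),  # a cheaper trade-off
    ("C_dominated", (0.15, 20.0,  70.0)),  # worse than B in every objective
]
front = pareto_front(candidates)  # A and B survive; C is dominated by B
```

A full MOEA such as NSGA-II builds on exactly this dominance test, using it each generation to rank the population and preserve a diverse set of trade-off architectures.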
From a deployment perspective, EDL usually has a high search cost, often thousands of GPU-hours. However, for UAV power inspection, this “search once, deploy many times” pattern makes the cost acceptable. The discovered architectures can be deployed across many UAVs of the same model, which spreads the cost effectively. More importantly, task- and hardware-specific architectures can be searched for different missions and platforms, such as Jetson Xavier NX and AGX Orin. This enables tight co-optimization between the task and the hardware. Also, the search cost can be reduced significantly with advanced techniques such as surrogate models and weight sharing (see Section 2.2.4). For these reasons, researchers are motivated to use EDL to automatically find better model parameters and architectures, and thereby improve the overall performance of UAV power inspection systems.

2.2.1. Hyperparameter Optimization for EDL

Hyperparameter optimization for DL models refers to the process of finding the best hyperparameter setting, given a neural network architecture, to maximize model performance. Hyperparameters include architecture-related choices, such as the number of hidden layers or modules, kernel size and number of convolution filters, stride, pooling type, whether batch normalization is used, activation function type, and the width of fully connected layers. They also cover training-strategy settings, including the learning rate and its scheduling policy, optimizer type and momentum, regularization coefficients, the form of the loss function, batch size, and the strength of data augmentation. This is a large-scale, highly non-convex optimization problem over both continuous and discrete variables. For example, Sun et al. [56] proposed a GA-based method for evolving deep CNNs (EvoCNN). The encoding strategy in this method supports variable-length chromosomes, and its evolved hyperparameters include the number and sizes of the filters, the stride and convolution type of each convolutional layer, the filter size, stride and pooling type of each pooling layer, and the number of neurons in each fully connected layer. Wang et al. [57] utilized multiobjective PSO (MOPSO) to optimize the number of dense blocks, their growth rates and their numbers of layers to balance classification accuracy and inference delay. Real et al. [58] used a large-scale evolutionary algorithm, LargeEvo, to automatically optimize the number of network layers, connectivity patterns, filter sizes, channel numbers, and the learning rate; the algorithm also decides whether batch normalization and activation functions are applied. Experiments show that the method can automatically evolve better DNNs for a given dataset. Rajesh et al. [59] proposed a new evolutionary model based on DE (DEvoNet) that finds the optimal CNN architecture by optimizing hyperparameters such as the activation functions (e.g., the rectified linear unit (ReLU) and tanh functions), network blocks (e.g., dense and residual blocks) and optimizers (e.g., adaptive moment estimation (Adam) and stochastic gradient descent (SGD)).
From the methods above, we can see that the focus of hyperparameter optimization is shifting from a few training parameters to architectural components, and the field is gradually converging with neural architecture search in methodology. However, hyperparameter optimization faces multiple challenges. First, modern deep learning models often include dozens or even hundreds of hyperparameters, which form a complex, high-dimensional search space with both continuous and discrete variables. Second, the non-convex nature of the problem makes the search harder: the relationship between hyperparameters and model performance is highly non-linear, and many local optima may exist. At the same time, evaluation is extremely costly, since each hyperparameter setting typically has to be evaluated by fully training a deep learning model, consuming large computational resources. Given these challenges, a better balance between algorithmic efficiency and optimization performance is urgently needed.
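The evolutionary loop behind hyperparameter optimization methods such as those above can be sketched in a few lines. The search space, the elitist truncation selection, and the toy fitness function below are illustrative assumptions standing in for a real validation-accuracy evaluation; they do not reproduce any specific published method.

```python
# Minimal GA sketch for hyperparameter optimization. The search space and
# the toy fitness function are illustrative assumptions only.
import random

SPACE = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "activation": ["relu", "tanh"],
    "optimizer": ["adam", "sgd"],
}

def random_individual(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}

def mutate(ind, rng, rate=0.3):
    """Resample each hyperparameter independently with probability `rate`."""
    child = dict(ind)
    for k, options in SPACE.items():
        if rng.random() < rate:
            child[k] = rng.choice(options)
    return child

def evolve(fitness, pop_size=8, generations=5, seed=0):
    """Generational GA with elitist truncation selection. `fitness` maps a
    hyperparameter dict to a score to MAXIMIZE (in practice: validation
    accuracy after a short training run)."""
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
    return max(pop, key=fitness)

# Toy stand-in for an expensive training run: prefers adam + mid batch size.
def toy_fitness(ind):
    score = {"adam": 1.0, "sgd": 0.5}[ind["optimizer"]]
    score += 1.0 if ind["batch_size"] == 32 else 0.0
    return score

best = evolve(toy_fitness)
```

In practice the fitness call is the expensive step, which is exactly what motivates the low-fidelity and surrogate evaluation techniques discussed later in this section.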

2.2.2. Network Architecture Optimization for EDL

ENAS usually consists of three steps: first, design a search space that includes multiple model architectures; second, use EC to find candidate architectures within the search space; and finally, select high-performance network architectures on the basis of an evaluation strategy [60,61]. The basic ENAS framework is shown in Figure 5. Early NAS works focused mainly on multiobjective search in classification tasks [62,63]. Most NAS approaches reuse an architecture learned on a classification task as their object detection component [64] and do not directly search for the optimal network architecture on the object detection task itself. Compared with classification architectures, object detection architectures are more complex and consist of several modules, such as backbone networks and FPNs. It has been shown that a better trade-off between network complexity and accuracy can be attained by directly searching for the architecture of an object detection network. In recent years, some works [65,66,67,68,69,70,71,72,73] have proposed the use of NAS to design architectures that are specifically suited to object detection tasks. Among these, DetNAS [65] was the first approach to apply ENAS to an object detection framework. Inspired by one-shot NAS [74], DetNAS searches for the optimal backbone network of an object detector by decoupling weight training from structure search. The training process is divided into three steps: pretraining the supernetwork on ImageNet, fine-tuning the supernetwork on the detection dataset, and searching with the trained supernetwork via evolutionary search. Compared with a general backbone, DetNAS shows that the searched backbone is better suited to detection tasks. Liang et al. [75] proposed a new search space for the FPN and designed six paths that can aggregate multilayer information, capturing much richer and more diverse information than the standard FPN does. Yao et al. 
[76] proposed SM-NAS, a two-stage structural-to-modular neural network search strategy. In the structural stage, a coarse search over model architectures determines the best architecture for the current task (for example, whether to use a one-stage or two-stage detector and what type of backbone) together with the matching input image size. In the modular stage, the backbone module is fine-tuned to further improve the performance of the model. From the macro architecture (such as the choice between one- and two-stage detectors) to the micro modules (such as backbone optimization), end-to-end optimization is achieved, resolving the compatibility issues of manually designed modules. Therefore, ENAS can overcome the bottleneck of manual design through automated search, solve the problem of cooperatively optimizing the multiple modules of an object detector (backbone network, FPN, etc.), and achieve an optimal balance between complexity and accuracy in detection tasks.
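The three ENAS steps described at the start of this subsection (define a search space, search it with EC, evaluate candidates) can be sketched as follows. The block names, the variable-length list encoding, and the stub evaluation function are illustrative assumptions; a real system would train each candidate briefly and measure validation mAP.

```python
# Illustrative sketch of the three ENAS steps: a block-based search space,
# an evolutionary search over encodings, and a (stub) evaluation strategy.
import random

BLOCK_TYPES = ["conv3x3", "conv5x5", "dw_sep_conv", "residual", "pool"]

def random_architecture(rng, min_len=3, max_len=8):
    """Variable-length encoding: a list of block identifiers."""
    return [rng.choice(BLOCK_TYPES) for _ in range(rng.randint(min_len, max_len))]

def mutate(arch, rng):
    """Point mutation, insertion, or deletion on the block list, so both
    the composition and the depth of the architecture can evolve."""
    arch = list(arch)
    op = rng.choice(["change", "insert", "delete"])
    if op == "change":
        arch[rng.randrange(len(arch))] = rng.choice(BLOCK_TYPES)
    elif op == "insert":
        arch.insert(rng.randrange(len(arch) + 1), rng.choice(BLOCK_TYPES))
    elif op == "delete" and len(arch) > 1:
        del arch[rng.randrange(len(arch))]
    return arch

def evaluate(arch):
    """Stub standing in for 'train briefly, measure validation mAP'.
    Here it simply rewards depth up to a point to keep the sketch runnable."""
    return min(len(arch), 6) - 0.1 * len(arch)

def search(generations=10, pop_size=6, seed=1):
    rng = random.Random(seed)
    pop = [random_architecture(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(rng.choice(survivors), rng)
                           for _ in range(pop_size - len(survivors))]
    return max(pop, key=evaluate)

best = search()
```

The variable-length encoding here corresponds to the chromosome styles discussed later under "Encoding strategies"; detection-specific ENAS methods such as DetNAS replace the stub evaluator with supernetwork-based evaluation.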

2.2.3. Multiobjective Model Optimization in EDL

Deploying DNN models with hundreds of millions of parameters on resource-constrained embedded devices is highly challenging. Traditional EDL methods take only prediction accuracy as the optimization goal, which often leads to overly complex model structures with many parameters that cannot meet the resource constraints of embedded devices. Multiobjective model search based on EC introduces a multiobjective optimization mechanism: while preserving the model's accuracy, key indicators such as the number of parameters, inference delay, and number of floating-point operations (FLOPs) are incorporated into the optimization objectives. For example, Elsken et al. [18] proposed the Lamarckian EA (LEMONADE) to optimize the performance prediction process and resource constraints of their network architecture. Wu et al. [77] established a multiobjective NN pruning (MONNP) model with the accuracy and sparsity of the pruned network as the evolutionary objectives and used a multiobjective PSO algorithm to optimize the pruning threshold, so that the final network could run efficiently on mobile or embedded devices. Lu et al. [78] proposed a GA-based NAS method (NSGA-Net), which, by minimizing classification error and computational complexity, automatically generates a Pareto front of neural network architectures competitive with single-objective search and hand-designed networks. Xue et al. [79] proposed an NAS based on an MOEA with a probability stack (MOEA-PS). In this method, structure blocks are stacked according to two objectives, precision and complexity, to generate a CNN. Hu et al. [80] proposed a classification surrogate model based on an MOEA for the NAS process. This method balances limited computing resources and predictive performance by minimizing the validation error and computational complexity and uses KNN as a surrogate model to reduce the computational cost of individual evaluation. 
In addition, the NASNet-based search space has been improved by adding new components [81], and a new non-linear activation function, h-swish [82], has been used instead of ReLU to improve model accuracy. Jiang et al. [83] proposed an NAS method called MOPSO/D-NET based on a decomposition-based multiobjective PSO algorithm. In this method, two conflicting objectives, the classification error rate and the number of network parameters, are optimized, and a hybrid binary network encoding is employed. Ma et al. [84] used ENAS to balance the computational complexity and performance error of their network, encoding the different connection modes between convolutional layers as binary strings. Lu et al. [85] used NSGA-II to simultaneously optimize a many-objective NAS problem with five objectives, namely, the accuracy achieved on ImageNet, the number of parameters, the number of multiply-add operations, and the latencies on the CPU and GPU. Yao et al. [86] proposed a lightweight object detection model called GhostShuffleNet (GSNet), which is based on zero-shot NAS. First, a lightweight search space was designed based on ghost shuffle (GS) cells. Then, the number of parameters, number of FLOPs, number of layers, and memory access cost (MAC) were added to the search strategy as constraints. Finally, a feature fusion module based on GhostPAN [87] was designed. Yan et al. [88] proposed an adaptive segmented multiobjective ENAS (ASMEvoNAS) method that takes classification accuracy, the number of architecture parameters and the number of FLOPs as optimization objectives. During population initialization, architectures with excessive parameter counts are filtered out, and the evolutionary process is divided into three stages so that, at each phase, the most relevant objectives are adaptively selected to evaluate individuals, reducing the computational cost of fitness evaluation. 
Zhou et al. [89] proposed an EA-based method for shallowing CNNs at the block level (ESNB), which takes the number of blocks and accuracy as objectives and uses knowledge distillation to improve the performance of the shallowed model. Wang et al. [90] proposed an evolutionary multiobjective model compression (EMOMC) method to jointly optimize the accuracy, energy consumption and model size of NNs. The Pareto solution set is searched in the network pruning and quantization space to satisfy the requirements of different embedded devices, and a two-stage cooperative pruning and quantization optimization strategy is designed to improve the evolutionary speed of the method.
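At the core of most of the multiobjective methods above is nondominated sorting, which ranks candidate models into Pareto fronts, as in NSGA-II and its derivatives. The sketch below assumes two objectives to minimize (classification error and parameter count); the objective values are illustrative.

```python
# Sketch of nondominated sorting into Pareto fronts, as used by
# NSGA-II-style multiobjective NAS. Objective values are illustrative.

def nondominated_sort(objs):
    """Sort candidates into Pareto fronts (front 0 = best).
    `objs` is a list of tuples of objectives to MINIMIZE,
    e.g. (classification_error, n_params)."""
    # a dominates b: no worse in every objective, better in at least one
    dominates = lambda a, b: all(x <= y for x, y in zip(a, b)) and a != b
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Hypothetical candidates: (error, parameter count).
objs = [(0.10, 5e6), (0.08, 9e6), (0.12, 3e6), (0.11, 9e6)]
fronts = nondominated_sort(objs)
# The last candidate is dominated (higher error AND no fewer parameters
# than others), so it falls into a later front.
```

Methods such as NSGA-Net [78] combine this ranking with crowding-distance selection to keep the front well spread between accuracy-oriented and lightweight architectures.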

2.2.4. Key Technical Components of EDL

Existing ENAS methods differ in their search spaces, encoding strategies, search strategies and evaluation mechanisms. The details are as follows:
  • Search spaces. The search space must define the number of layers or blocks in the network, how they are connected, and the type and number of kernels in each layer or block. The search space therefore determines both the efficiency of the search process and the quality of the resulting architecture. A layer-based search space is very large, making an effective search over DNNs impractical. Popular architecture design methods that stack blocks or cells can greatly reduce the complexity of the search space [64] and accelerate the automatic search for CNNs by finding the best blocks or cells instead of entire CNN architectures. The cell-based search framework is shown in Figure 6. Several researchers have studied cell-based search spaces in depth. Some methods involve squeeze-and-excitation blocks [91], pyramidal convolution operations [92], depthwise separable convolution operations [93], dilated convolution operations [94] and bottleneck convolution operations. High-performance modules such as the above operations [95] can be added to the search space to improve the efficiency of the search process and the quality of the obtained architectures [96,97,98]. For example, Szwarcman et al. [99] proposed an NAS scheme based on an improved quantum evolutionary algorithm and added more complex convolution and residual units to the search space. Lu et al. [100] proposed a CNN architecture search algorithm based on evolutionary blocks (EB-CNN), designed a search space based on four types of depthwise separable convolution blocks and two types of pooling blocks, and proposed a variable-length coding strategy; a new crossover-mutation operator was used to optimize the types, number and combination of the two kinds of blocks and the width and depth of the resulting architecture. Huang et al. [101] designed a search space based on lightweight mobile inverted bottleneck convolution (MBConv) blocks. 
The MBConv blocks remove the element-wise shortcut connections when the numbers of input and output feature maps are inconsistent, and use efficient channel attention (ECA) blocks instead of SE blocks to capture cross-channel interactions more efficiently. Fang et al. [102] proposed a search space based on a Reg Block, comprising group convolution [103] and squeeze-and-excitation network (SENet) modules, and designed a variable-architecture coding strategy based on the Reg Block, which effectively expanded the search space. Dong et al. [104] proposed a CNN architecture search method based on a fast memetic algorithm. In this scheme, a separated search space with two subspaces was first proposed and an improved DAG was designed to represent CNN architectures; a global search operator and a local search operator were then designed to search for high-quality architectures; and finally, a one-epoch-based performance evaluation strategy was proposed to accelerate the evaluation process.
  • Encoding strategies. Encoding strategies map between the individuals in a population and CNN architectures and are therefore crucial for solving ENAS optimization problems. They can be divided into fixed-length encoding (e.g., the fixed encoding strategy and the adjacency matrix encoding strategy shown in Figure 7) and variable-length encoding (the adjacency list encoding strategy shown in Figure 7) [20]. Fixed-length encoding makes evolutionary operations such as crossover and mutation easier to perform [105], whereas variable-length encoding can describe more diverse NNs but requires carefully designed evolutionary operators. Sun et al. [106] designed a variable-length encoding strategy to represent ResNet blocks, DenseNet blocks and pooling units, where each ResNet block consists of three convolutional layers and a skip connection that uses tensor summation to combine the block input with the output of the last layer. In contrast, each DenseNet block consists of four convolutional layers, the input of each layer being a combination of all previous outputs; new crossover and mutation operators are used to evolve the cell parameters and adaptively find the optimal CNN architecture with an appropriate depth. Wang et al. [107] proposed a CNN model optimization method based on a hybrid differential EA, in which an encoding strategy based on internet protocol addresses is used, and a mutation operator and two crossover operators are designed for this encoding. The second crossover operator, which is based on a Gaussian distribution, can generate offspring of different lengths to satisfy the variable-architecture evolution requirement of a CNN. Xue et al. [19] used an encoding strategy based on an adjacency list. Zhang et al. [108] proposed an improved adaptive scalable NAS method (AS-NAS) based on IDEA [109]. 
This method uses variable-length coding for the numbers of layers and channels within neural network architectures, as well as for the connection relationships between different convolutional layers. In addition, L1/2 regularization can be used to further improve the sparsity of AS-NAS. Wen et al. [110] proposed an ENAS approach based on reparameterized visual geometry group (RepVGG) nodes [111] and developed a novel encoding strategy based on directed acyclic graphs; this method can use fixed-length chromosomes to represent block structures of variable depth. Huang et al. [101] proposed a two-level variable-length PSO method to evolve the architecture. This approach encodes each individual in two parts: a variable-length encoding represents the depth of the network, the number of filters, and the types, numbers and locations of pooling layers in the MBConv blocks, while a binary encoding represents the connection information between nodes. Finally, a scheme combining extended dynamic early stopping, downsampling (which reduces the feature resolution) and architecture downscaling (which reduces the number of filters) is employed to accelerate the fitness evaluation process.
  • Evolutionary strategies. Currently, most NAS algorithms use evolutionary algorithms (EAs) [58], reinforcement learning (RL) [60,112], or gradient descent [113] as their search strategies. RL-based approaches require more computational resources than the other two types of search strategies. Gradient-based algorithms are only suitable for continuous optimization problems; they can reduce the computational cost but require the construction of a supernetwork in advance and are prone to falling into local optima [114]. EAs have great advantages in solving non-convex, non-differentiable and complex multiobjective optimization problems [20,21], so NAS based on EC (ENAS) has attracted increasing attention.
    EAs form a population-based evolutionary paradigm that mimics biological evolution or group behaviors found in nature. Different evolutionary strategies lead to different evolutionary effects (common crossover and mutation strategies are described in Figure 8 and Figure 9, respectively). For example, Real et al. [115] incorporated each individual's age into the selection process, and the improved algorithm could automatically search for better models than the manually designed models available at the time. Tong et al. [116] proposed a reference-point-based nondominated sorting genetic algorithm II (NSGA-II) scheme for NAS, which not only balances the objectives of accuracy and computational complexity but also accounts for the preferences of decision-makers. Li et al. [117] used a hybrid-model estimation of distribution algorithm to optimize different types of hyperparameters for each network layer in a CNN and proposed an orthogonal initialization strategy, which enabled the algorithm to explore the large-scale search space more uniformly and efficiently to find the optimal solution. Soniya et al. [98] proposed a hybrid method combining gradient descent and an EA to simultaneously find the optimal network architecture and its corresponding parameters. Xue et al. [118] proposed an adaptive mutation-based NAS algorithm, in which a self-adaptive mutation strategy pool with three candidate offspring generation strategies (COGSs) was designed to adaptively adjust the mutation strategies employed during evolution, and an environment selection operator based on semicomplete binary competition was designed to prevent the elimination of excellent individuals. Yuan et al. 
[119] proposed an ENAS algorithm based on an efficient autoencoder-based PSO (EAEPSO) strategy, which uses an autoencoder to convert variable-length dense block vectors into fixed-length continuous decimal latent vectors, together with a dynamic hierarchical fitness evaluation method to estimate the performance of individuals. Qiu et al. [120] proposed an efficient self-learning evolutionary neural architecture search (ESE-NAS) method, which uses an adaptive learning strategy for mutation sampling to adjust the probability distribution of the mutation operators according to the size of the mutated architecture, thereby determining which type of mutation operator to use. Louati et al. [121] proposed a two-layer optimization method for designing and compressing CNN architectures based on a coevolutionary migration-based algorithm (CEMBA): at the upper level, crossover and mutation generate an architecture with minimal numbers of convolutional blocks and convolutional nodes within these blocks, while at the lower level, filters are pruned for each architecture to determine the optimal number of filters per layer. Zhang et al. [122] proposed a two-stage NN architecture search algorithm comprising block-level and network-level searches; this method uses an enhanced gradient to search for high-performance, low-complexity blocks, after which a multiobjective EA based on nondominated sorting automatically constructs a block-based network. Shang et al. [105] proposed an adaptive parameter adjustment mechanism based on network architecture diversity. The population is divided into exploitation and exploration parts: individuals in the exploitation part evolve via potential-contribution-guided mutation (PCGM) based on a gene distribution index matrix, while the exploration part generates offspring via crossover and low-rate mutation operations. 
In addition, an aging-mechanism-based environment selection method was designed to increase the exploration ability of the algorithm in the search space, thus obtaining better solutions. Zhang et al. [96] proposed a new evolutionary one-shot NAS (Evo-OSNAS) framework. This framework uses a neural architecture representation composed of an architecture chromosome and a switch chromosome, which together generate candidate submodels from a one-shot model; it employs a matching crossover operator consisting of node crossover and switch crossover, and introduces a set of pyramidal convolution (PyConv) operations into the search space to improve the feature processing ability of the algorithm. An et al. [61] proposed an EA based on knowledge reconstruction for NAS. First, a search space construction method based on network morphism was designed to avoid training networks from scratch; then, a hierarchical variable-length encoding strategy was designed to encode the architecture and weights simultaneously; and finally, a DE algorithm was used to explore the optimal network architecture. Du et al. [123] proposed a Dual-Space K-Medoids niching technique (DSKM). By combining information from the decision space (parameter values) and the objective space (model performance), the method partitions a complex search space into several simpler and more promising subregions, namely niches, so that diversity can be maintained during the search.
  • Performance evaluation. The performance evaluation process in ENAS usually requires each candidate architecture to be trained on the given dataset until convergence, which consumes considerable search time and computational resources. To reduce the evaluation cost, many studies have used low-fidelity evaluation techniques, i.e., reducing the number of training epochs or using a small portion of the dataset to train the networks [19,64,100,115,124,125]. Other works have built surrogate models that predict the performance of candidate networks to improve evaluation efficiency. For example, Sun et al. [126] used an end-to-end random forest model to predict the performance of deep learning models instead of running expensive training processes. Li et al. [117] proposed a surrogate-assisted multilevel evaluation (SME) method, which first conducts surrogate-based evaluations of CNNs and then conducts training-based evaluations of the promising CNNs, forming a two-layer evaluation scheme. Ma et al. [127] proposed a Pareto-wise end-to-end ranking classifier to identify optimal architectures during the multiobjective search process; this scheme divides architectures into “good” and “poor” categories on the basis of clustering and α-dominance strategies, which are used to train and validate the proposed classifier. Therefore, no real objective evaluation is performed on the architectures in the environment selection step; instead, the classifier predicts the dominance relationships between the candidate architectures and the reference architectures. Wang et al. 
[128] proposed a surrogate-assisted PSO algorithm to automatically evolve CNNs; this method integrates a surrogate binary classification model based on a support vector machine (SVM) with a surrogate dataset construction method that reduces the sample size of the dataset and decreases the image resolution during PSO, significantly reducing the computational cost. Wei et al. [129] proposed an ENAS strategy guided by a graph-based neural predictor, which directly outputs a performance prediction for each architecture. Chen et al. [130] proposed a surrogate-assisted cooperative coevolutionary algorithm based on radial basis functions, with a segment-based overlapping decomposition (SOD) strategy used to optimize the hyperparameters of chained CNNs. Luong et al. [131] proposed an enhanced training-free multiobjective evolutionary NAS (E-TF-MOENAS) scheme, which uses multiple training-free performance indicators to evaluate architectures during the search stage with NSGA-II; this method obtains a better Pareto front while reducing the computational cost. Sun et al. [132] proposed a new training protocol for performance predictors in ENAS: the difference between any two sampled architectures is used as a training example, and a pairwise ranking index is designed to construct the training objectives; that is, the ranking information between any two training samples is used to train a logistic regression model. Peng et al. [133] proposed a predictor-assisted evolutionary NAS (PRE-NAS) process. By learning from a small number of representative training samples and adopting a fixed-size weight inheritance strategy that retains the characteristics of the network as far as possible, the method reduces training costs and improves the accuracy of the predicted architecture performance rankings. 
In addition, weight-sharing schemes represented by one-shot models regard all candidate networks as subnetworks of an overarching supernetwork [134,135] and reduce the computational cost by sharing weights [97]. For example, Zhang et al. [136] trained individuals on small batches of training data and adopted a node inheritance strategy so that the fitness values of all offspring individuals could be evaluated without training. Several evaluation methods combine more than one of the above techniques. For example, Liao et al. [97] trained each of their networks through parameter inheritance on a small training dataset to significantly reduce the computational overhead. Yuan et al. [137] proposed a random-forest-based performance predictor to predict the probability that an offspring outperforms its parents, together with a weight inheritance method including parent inheritance and weight-pool inheritance to further improve evaluation efficiency.
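As a concrete illustration of a surrogate-based evaluator in the spirit of the KNN surrogates mentioned above, the sketch below predicts an architecture's score from previously evaluated neighbors instead of training it. The numeric encodings (depth, width multiplier) and accuracy values are hypothetical.

```python
# Sketch of a KNN surrogate for ENAS performance evaluation: predict a
# candidate's score from the nearest previously evaluated architectures.
import math

class KNNSurrogate:
    """Predicts an architecture's score as the mean score of its k nearest
    previously evaluated architectures (encoded as numeric vectors)."""

    def __init__(self, k=3):
        self.k = k
        self.memory = []  # list of (encoding, true_score) pairs

    def add(self, encoding, score):
        """Record a fully evaluated (trained) architecture."""
        self.memory.append((encoding, score))

    def predict(self, encoding):
        """Cheap estimate: average the scores of the k nearest neighbors."""
        nearest = sorted(self.memory,
                         key=lambda m: math.dist(m[0], encoding))[: self.k]
        return sum(s for _, s in nearest) / len(nearest)

surrogate = KNNSurrogate(k=2)
# Hypothetical records: (depth, width_multiplier) -> validation accuracy.
surrogate.add((10, 1.0), 0.80)
surrogate.add((20, 1.0), 0.86)
surrogate.add((40, 2.0), 0.90)
pred = surrogate.predict((18, 1.0))
```

In a full ENAS loop, such a surrogate filters the population so that only the most promising candidates receive a real (expensive) training-based evaluation, as in the two-layer SME scheme described above.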

2.3. Research on Object Detection Algorithms Based on DL

Generally, an object detection network consists of a backbone network for feature extraction, a neck network for feature fusion, and a head network for classification and localization. On the basis of their head networks, object detection networks can be divided into one-stage and two-stage detection networks, as shown in Figure 10 and Figure 11, respectively.
Two-stage detection algorithms are represented by Faster R-CNN [37] and Mask R-CNN [138]. These methods first use a Region Proposal Network (RPN) to generate candidate regions and then perform refined classification and regression on those proposals. Two-stage models usually achieve high detection accuracy on general benchmarks. However, their two-step inference pipeline is complex, which leads to high computational cost and slower speed. As a result, they are often unable to meet the strict real-time requirements of UAVs or edge devices. In contrast, one-stage detectors remove the region proposal step and perform end-to-end prediction directly on feature maps. Typical models include the YOLO family [82,139], SSD [140], and RetinaNet [141]. YOLO, proposed by Redmon et al. [82], divides the input image into an S×S grid, and each grid cell directly outputs bounding box coordinates and class scores. Later, YOLO v3 [142] adopted Darknet-53 as the backbone and introduced the FPN [143] for multi-scale feature fusion, which improved small-object detection. Building on this, YOLO v4 [144] used the CSPDarknet53 backbone, Mosaic data augmentation, and the CIOU loss to further enhance feature learning. YOLO v5 improved network efficiency through the Focus module, CSP structures, and adaptive anchor computation. YOLO v7 [139] introduced the E-ELAN module, which extracts richer features by controlling gradient paths. YOLO v8 adopted the C2f module and an anchor-free strategy, and its decoupled head design improved detection accuracy. Although YOLO detectors are relatively fast, they struggle with small objects. Liu et al. [140] proposed the SSD, with VGG16 as its backbone network, which performs classification and regression on multiscale feature maps and achieves better performance than YOLO on small objects; however, its detection speed is lower than YOLO's. 
RetinaNet [141] uses a focal loss to address the severe imbalance between positive and negative samples in one-stage algorithms, which enables the accuracy of a one-stage detector to reach or even exceed that of a two-stage detector. Considering that operations such as non-maximum suppression (NMS) over anchors slow down detection, Zhou et al. [145] proposed the anchor-free CenterNet model, which only predicts the position of the center point of each bounding box. Table 2 and Table 3 compare the common two-stage and one-stage detection algorithms, respectively.
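The focal loss used by RetinaNet can be written compactly for a single binary prediction. This is a minimal sketch with the commonly reported defaults gamma = 2 and alpha = 0.25; it is not tied to any particular detector implementation.

```python
# Sketch of the focal loss for one binary prediction: easy examples
# (pt close to 1) are down-weighted by the (1 - pt)^gamma factor.
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """p: predicted probability of the positive class; y: label in {0, 1}.
    Reduces to alpha-weighted cross-entropy when gamma == 0."""
    pt = p if y == 1 else 1.0 - p         # probability of the true class
    a = alpha if y == 1 else 1.0 - alpha  # class-balance weight
    return -a * (1.0 - pt) ** gamma * math.log(pt)

# An easy positive (p = 0.9) contributes far less than a hard one (p = 0.1),
# so the loss is not dominated by the many easy background anchors.
easy = focal_loss(0.9, 1)
hard = focal_loss(0.1, 1)
```

This down-weighting of easy negatives is precisely what lets a one-stage detector train stably despite the extreme foreground/background imbalance noted above.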
Although the detectors mentioned above achieve strong results on general datasets, they still show clear limitations in power inspection scenarios. For example, defects on power equipment are often extremely small, such as tiny cracks on insulators or missing pins. Mainstream backbones, like the CSPDarknet used in YOLO v5 and YOLO v7 and the ResNet used in Faster R-CNN, apply multiple down-sampling stages to obtain a larger receptive field and richer semantic features. After repeated down-sampling, however, such targets occupy only a few pixels on the feature maps; their features can be overwhelmed or quantized away in deeper layers, and the objects may not be detected at all. In addition, power inspection involves a huge scale range: detecting large transmission towers requires a very large receptive field to capture the global structure, while detecting bolts needs a very small receptive field to avoid introducing background noise. Existing backbones struggle to satisfy both extremes within the same feature level. Meanwhile, objects such as transmission conductors and Stockbridge dampers are very thin and long, with extreme aspect ratios. Generic anchors are often clustered from the COCO dataset, where common ratios are around 1:1, 1:2, and 2:1, which leads to low IoU matching and failed positive-sample assignment. If anchors are made small enough to favor small objects, the computational cost increases significantly; if they are not sufficiently small, small objects are treated as background.
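The anchor-matching problem for thin, elongated targets can be made concrete with a small IoU computation. The box coordinates below are illustrative; even a square anchor centered on a conductor-like box yields an IoU far below the usual 0.5 positive-matching threshold.

```python
# Sketch: why square COCO-style anchors match thin, elongated power-line
# objects poorly. Boxes are (x1, y1, x2, y2); all values are illustrative.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

conductor = (0, 45, 200, 55)       # 200x10 px: thin, elongated target
square_anchor = (50, 0, 150, 100)  # 100x100 px, ratio 1:1, centered on it
iou_val = iou(conductor, square_anchor)
# iou_val is about 0.09: the anchor would be assigned as background even
# though it sits directly on the object.
```

This is why power-line detectors typically need anchor ratios clustered on the inspection data itself, or anchor-free assignment, rather than COCO-derived defaults.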
These observations show that mainstream detectors in power inspection suffer from small-object feature disappearance, strong background interference, and difficult matching for irregularly shaped objects. In essence, manually designed general-purpose network structures cannot adapt dynamically to the unique properties of power inspection scenes. Under these bottlenecks, relying on manual trial and error to adjust network depth, width, or module combinations is inefficient, and it is also hard to reach the performance limit. Therefore, it has become necessary to introduce EDL and use the strong search and optimization ability of evolutionary computation (EC) to automatically find network architectures that best fit power inspection tasks, so that inspection performance can be improved.
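The anchor-matching problem for slender objects can be illustrated with a toy IoU computation. The box sizes below are illustrative only; even with an anchor centered perfectly on a 10:1 conductor segment, the best IoU achievable with generic 1:1/1:2/2:1 anchor shapes stays well below a typical 0.5 positive-matching threshold:

```python
def centered_iou(a, b):
    """IoU of two axis-aligned boxes given as (w, h), sharing one center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

target = (200, 20)                        # slender conductor segment, 10:1
anchors = [(64, 64), (45, 90), (90, 45)]  # generic 1:1, 1:2, 2:1 shapes
best_iou = max(centered_iou(target, a) for a in anchors)
# best_iou = 0.288, below a typical 0.5 positive-matching threshold
```

Because even the best-placed anchor fails the matching threshold, such objects never receive positive training samples, which is exactly the failure mode described above.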

3. Research on UAV Intelligent Power Inspection Methods Based on EDL

With the rapid growth of power systems, UAV-based intelligent power inspection has become a key approach to ensuring grid safety. Although DL-based object detection has achieved notable results in tasks such as insulator localization and defect identification, most mainstream models (e.g., the YOLO and R-CNN series) still rely heavily on manual experience for network architecture design and hyperparameter tuning. In power inspection, three core challenges are repeatedly encountered: large scale variation in targets, complex backgrounds, and limited computing resources on edge devices. Under these constraints, the traditional trial-and-error design process shows clear drawbacks: it is inefficient, generalizes poorly, and often fails to achieve the best trade-off between accuracy and speed. EDL simulates biological evolution mechanisms, so it can automatically search for better neural network parameter settings and topology structures. This section first provides a brief review of how traditional deep learning is used in inspection tasks and where it falls short. It then focuses on hyperparameter optimization and NAS based on evolutionary computation. Finally, we analyze how EDL can be tailored to address the key technical challenges in UAV power inspection.

3.1. Intelligent Power Inspection Methods Based on One-Stage Object Detection Models

One-stage detectors predict object classes and locations directly from network features. They run fast, so they dominate power inspection tasks with strong real-time requirements, such as insulator condition monitoring and foreign-object detection on transmission lines using UAVs. At present, the YOLO family from v3 to v11 is the most widely used in practical power inspection applications [146,147,148]. Other one-stage frameworks have also shown specific strengths in this field. For example, Zheng et al. [3] designed an improved Iresgroup backbone for CenterNet to enhance feature learning from infrared images. Zheng et al. [1] improved the FSSD by fusing multi-scale feature maps, which strengthens infrared feature extraction. Xu et al. [149] and Wang et al. [150] used the SSD to localize and identify defects in insulators and transmission lines, respectively. Mao et al. [151] replaced the original centerness prediction branch in FCOS with an IoU prediction branch that shares weights with the regression branch, improving localization and recognition of small defects on transmission lines. Xu et al. [152] replaced RetinaNet's backbone with DenseNet121 and optimized the anchor-box design, achieving higher defect detection accuracy for transmission lines while reducing the number of parameters. Table 4 summarizes the research progress of one-stage object detection models in the field of power inspection.
To handle complex backgrounds and multi-scale targets, earlier studies mainly improved models based on human expertise. Effective modifications in the literature can be summarized as follows:
Network architecture optimization: Multi-scale feature extraction and fusion were strengthened by introducing improved structures such as densely connected FPN [153], CSPD Block [154], SPP-Net [151], and BiFPN variants [155]. Researchers also enhanced performance by optimizing FSSD [1], the CenterNet backbone [3], and the RetinaNet backbone [152].
Attention mechanisms: By embedding SENet [156], visual attention mechanisms [153], the channel-spatial attention (CSAttention) module [157], CBAM [155], CoordAtt [158], and channel attention (SE module) [159], models can focus on key features, suppress background noise and improve small-object detection.
Loss function innovation: Focal Loss [158], SIoU [160], and DIoU-NMS [161] were adopted to address class imbalance and bounding-box regression issues.
Adaptive anchors: The use of k-means/k-means++ clustering to generate anchor boxes that are more closely matched to the object sizes has been widely adopted [154,155,158,161].
Lightweight design: The computational complexity is reduced through model design (such as Hybrid-YOLO combining YOLOv5 and ResNet-18 [162]) and component improvement (such as replacing with grouped convolution [3]) to meet the deployment requirements of edge devices such as UAVs [163].
Data augmentation techniques: To increase dataset diversity and cope with challenging environments, researchers built task-specific datasets (e.g., the SDFC transmission-line faulty component dataset [164] and a substation equipment infrared dataset [3]) and developed synthesis methods (e.g., the synthetic fogging algorithm [159]).
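The adaptive-anchor idea above can be sketched as plain k-means over (width, height) box dimensions. Real implementations typically use 1 − IoU as the distance metric, as in YOLO-style anchor clustering; Euclidean distance and the toy box sizes below are simplifications for illustration only:

```python
import random

def kmeans_anchors(boxes, k=3, iters=50, seed=0):
    """Plain k-means on (w, h) box dimensions to propose anchor shapes.
    YOLO-style clustering usually minimizes 1 - IoU instead; Euclidean
    distance is used here for brevity."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            nearest = min(range(k),
                          key=lambda c: (w - centers[c][0]) ** 2
                                        + (h - centers[c][1]) ** 2)
            clusters[nearest].append((w, h))
        # Recompute each center as the cluster mean; keep the old center
        # if a cluster happens to be empty.
        centers = [(sum(w for w, _ in cl) / len(cl),
                    sum(h for _, h in cl) / len(cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return sorted(centers)

# Illustrative box sizes: small pins, medium insulators, long thin conductors
boxes = [(8, 8), (10, 9), (60, 30), (64, 28), (220, 18), (210, 22)]
anchors = kmeans_anchors(boxes, k=3)
```

On such data, the clustered anchors separate into a small, a medium, and a wide slender shape, matching the object-size distribution far better than COCO-derived defaults.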
Although these manual improvements have shown clear benefits, they are often limited to local module tuning. Architecture design also relies heavily on human experience, so global optimality is hard to guarantee, and generalization across diverse power equipment and complex environments is uncertain. In addition, many experiments are needed to validate different hyperparameter combinations. From the EDL perspective, however, the diverse one-stage architectures (YOLO, RetinaNet, FCOS, etc.) and their rich components (attention modules, feature pyramids, activation functions) actually form a large search space for architecture exploration. These modules naturally serve as “encodable” gene loci for EDL, including backbone block types, feature-layer connection patterns, attention insertion positions, loss combinations, sample assignment and NMS strategies, anchor-based vs. anchor-free choices, input resolution, and augmentation policies. If we abstract the above modifications into a modular and composable library, EDL can automatically assemble them and perform multi-objective optimization under constraints such as speed, accuracy, energy use, latency, and model size.
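As a minimal illustration of treating such modules as gene loci, the sketch below encodes a detector configuration as a dictionary genome with point mutation. The locus names and option lists are hypothetical stand-ins for the modular library described above, not taken from any cited work:

```python
import random

# Hypothetical modular library: each key is one "gene locus" and each
# list holds the alternative modules that locus can take.
SEARCH_SPACE = {
    "backbone_block": ["CSP", "Ghost", "DWConv"],
    "neck":           ["FPN", "PAN", "BiFPN"],
    "attention":      ["none", "SE", "CBAM", "CoordAtt"],
    "head":           ["anchor_based", "anchor_free"],
    "input_size":     [416, 512, 640],
}

def random_genome(rng=random):
    """Sample one detector configuration from the library."""
    return {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}

def mutate(genome, rate=0.2, rng=random):
    """Point mutation: resample each locus with probability `rate`."""
    return {k: (rng.choice(SEARCH_SPACE[k]) if rng.random() < rate else v)
            for k, v in genome.items()}

parent = random_genome()
child = mutate(parent)
```

An evolutionary loop would repeatedly mutate and recombine such genomes, train the corresponding detectors (or a cheap proxy), and select by a multi-objective fitness over accuracy, latency, and model size.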
Table 4. Intelligent power inspection based on one-stage object detection models.
Original Method | Improvement | Application Scenario | Purpose | Performance Improvement | Reference
YOLO v2 | Data enhancement | Insulator detection on transmission lines | accurately locate insulators | Classification accuracy: 87% | [147]
YOLO v3 | Introduced Dense Connected Feature Pyramid Network (DC-FPN) in the neck | Insulator detection on transmission lines | improve the detection of small objects and reduce overfitting | mAP: +3.25% | [153]
YOLO v3 | Introduced CSPNet and DenseNet; constructed CSPD module; CIoU loss; k-means++ for anchors | Insulator detection on transmission lines | enhance feature extraction to improve detection of small objects in complex backgrounds | AP: +4.9% | [154]
YOLO v3 | Rapid localization in visible light; extracted discharge features for status judgment | Insulator location recognition | use dual-path visible-light and UV information to realize intelligent inspection | mAP: 93.07%; detection time: 0.0182 s | [148]
YOLO v4 | Image enhancement/denoising; 4× upsampling in backbone; Focal Loss | Fault detection of substation power equipment | solve localization difficulty and positive/negative sample imbalance | mAP: +2% | [160]
YOLO v4 | Data enhancement; k-means for anchors; DIoU-NMS | Foreign object detection on transmission lines | improve the detection of small objects and reduce overfitting | mAP: +8.39% | [161]
YOLO v5 | “Locate first, classify later” framework; VGG16 backbone; SPP-Net neck; transfer learning | Insulator detection on transmission lines | improve the accuracy of insulator defect detection | Accuracy: 89% | [165]
YOLO v5 | CSPNet backbone; FPN+PAN “double tower” strategy; slicing operation | Recognition and positioning of bird’s nests | realize real-time identification and positioning of bird’s nests | mAP: 83.4%; FPS: 85.32 | [166]
YOLO v5 | k-means clustering; Mosaic data augmentation | Insulator detection on transmission lines | improve the detection of small objects and reduce overfitting | Precision: +0.6% | [146]
YOLO v5 | CSAttention in backbone; MAttention in FPN | Insulator detection on transmission lines | enhance feature learning and extraction capabilities | AP: +0.4%; FPS: −2 | [157]
YOLO v5 | Synthetic dataset (SDFC); CBAM; SimCSPSPPF; optimized head; two-stage transfer learning | Defect detection of components on transmission lines | reduce complex-background interference and improve small-object detection accuracy | mAP: +7.6% | [164]
YOLO v5 | Synthetic fog algorithm; SFID dataset; SE module in backbone | Insulator detection on transmission lines | expand the small-sample dataset and improve detection accuracy | mAP: +1.3%; F1: +0.7% | [159]
YOLO v5 | ResNet-18 backbone; improved PANet neck | Insulator detection on transmission lines | improve detection accuracy on small datasets and reduce computation | mAP: +3.5%; F1: +1.7% | [162]
YOLO v5 | k-means++; CBAM in backbone; BiFPN design | Insulator detection on transmission lines | improve detection of small objects in complex backgrounds | AP: +0.57% | [155]
YOLO v7 | k-means++; Coordinate Attention; HorBlock; optimized loss and NMS | Insulator detection on transmission lines | improve small-object defect detection accuracy and solve positive/negative sample imbalance | mAP: +3.7% | [158]
YOLO v8 | Constructed dataset with 6020 insulator images | Insulator detection on transmission lines | reduce overfitting due to small samples | mAP: 99.4% | [163]
SSD | Multi-scale feature maps | Insulator detection on transmission lines | improve small-object defect detection accuracy | Time: 0.03 s; mAP: 94.7% | [149]
SSD | Compared 5 extraction networks; multi-scale training; horizontal mirroring | Defect detection of components on transmission lines | improve small-object defect detection accuracy in complex backgrounds | Accuracy: 98.9% | [150]
FSSD | Feature enhancement in backbone; improved fusion; adaptive bounding-box ratios | Insulator defect detection in substations | utilize shallow feature maps and improve detection accuracy across object scales | AP: +0.72% | [1]
CenterNet | Proposed Iresgroup backbone; constructed infrared dataset | Defect detection of substation equipment | improve small-object defect detection accuracy in complex backgrounds | mAP: +2.8% | [3]
FCOS | Binocular feature fusion; feature screening module; improved head; loss optimization | Transmission line component defect detection | improve small-object detection in complex backgrounds | mAP: +4.1% | [151]
RetinaNet | DenseNet121 backbone; optimized anchor design | Transmission line component defect detection | reduce parameters; improve accuracy | mAP: +1.23%; Params: −38.2 M | [152]

3.2. Intelligent Power Inspection Methods Based on Two-Stage Object Detection Models

Two-stage detectors (such as Faster R-CNN, Mask R-CNN) keep an accuracy advantage in tasks like tiny defect diagnosis on insulators, transmission line monitoring, and fine-grained inspection of substation equipment, thanks to the RPN and the RoI refinement mechanism [167,168]. However, their computation cost is high, which limits direct deployment on UAV platforms. Table 5 summarizes the research progress of two-stage object detection models in the field of power inspection. Current studies mainly improve them in the following directions:
Architecture optimization: lightweight backbones (replacing the backbone with MobileNet architectures [4,169]), enhanced feature extraction (ResNet50+FPN to preserve small-object features [170], regional self-attention modules [171]), temporal fusion (using LSTM to learn region dependencies [87]), and special anchor design (1:3/3:1 aspect ratios to fit slender devices [13,103]).
Detection mechanism optimization: RoIAlign to reduce quantization errors [170], crop_and_resize for faster processing [172], center-guided NMS to reduce redundant boxes [169], and refinable RPN to improve recall [171].
Data augmentation and training strategies: rotation, brightness adjustment, and Gaussian noise to expand the training set [173], inter-class sampling to balance long-tail distributions [169], and multi-scale training [171].
System-level innovation: federated learning for collaborative edge training [4], cloud–edge collaboration architectures that reduce UAV computing load by 60% [174], and UV–visible multi-modal fusion with hierarchical detection for fault diagnosis [39].
Because two-stage models are computationally heavy, it is difficult for them to meet real-time requirements on edge devices. In addition, parameters such as anchor sizes, IoU thresholds, and NMS strategies have a strong impact on performance. The design of modules like RPN and RoI also requires deep domain knowledge, so manual tuning is challenging and rarely reaches a global optimum. This is where EDL becomes useful. By parameterizing each component of a two-stage detector, EDL can leverage population-based global search to break the limits of hand-crafted designs.
For example, for power inspection tasks, RPN structures, anchor configurations and feature extraction strategies can be automatically optimized to match power equipment objects that often have “slender shapes and tiny local defects”. RoI feature alignment choices (RoIPooling/RoIAlign/crop_and_resize) and post-processing strategies (NMS variants) can also be included in the search, so recall and false positives are better balanced. Moreover, EDL can perform Pareto search across objectives such as “detection accuracy, recall, inference latency, and power consumption”. This makes it possible to automatically decide whether a two-stage solution should run on the cloud side, the edge side, or a collaborative setup, and which high-cost modules are worth keeping.
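At its core, the Pareto search described here reduces to non-dominated filtering over candidate deployments. The following sketch keeps only candidates that no other candidate beats on both objectives; the (mAP, latency) numbers are illustrative placeholders for measured cloud, edge, and collaborative variants:

```python
def dominates(a, b):
    """a dominates b if it is no worse on both objectives and strictly
    better on at least one. Each candidate is (mAP, latency_ms):
    mAP is maximized, latency is minimized."""
    return (a[0] >= b[0] and a[1] <= b[1]) and (a[0] > b[0] or a[1] < b[1])

def pareto_front(candidates):
    """Keep every candidate that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Illustrative (mAP, latency) points for different deployment variants
cands = [(0.91, 120.0), (0.88, 45.0), (0.86, 60.0), (0.80, 20.0)]
front = pareto_front(cands)
# (0.86, 60.0) is dominated by (0.88, 45.0) and drops out of the front
```

In a full NSGA-II-style loop this filtering would guide selection every generation; here it simply extracts the final trade-off set from which a cloud-side, edge-side, or collaborative configuration can be chosen.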
Table 5. Intelligent power inspection based on two-stage object detection models.
Original Method | Improvement | Application Scenario | Purpose | Performance Improvement | Reference
Faster R-CNN | Added an LSTM layer between the RoI pooling layer and the fully connected layer | Fault diagnosis of substation equipment | reduce the influence of occlusion on object recognition | mAP: +3% | [87]
Faster R-CNN | Used crop_and_resize to replace RoI Pooling; retained small proposal boxes; increased mini-batch to 128; data augmentation | Multi-class defect detection of electrical equipment | improve inspection accuracy and speed | mAP: 81.11% | [172]
Faster R-CNN | Inter-class sampling; center-guided NMS; category-adaptive threshold; replaced ResNet50 with MobileNet v2 | Defect detection of substation equipment | solve uneven sample distribution; reduce redundant boxes and parameters | mAP: +4.4% | [169]
Faster R-CNN | Deep residual network for feature extraction; optimized RPN; joint training for shared conv layers | Defect detection of substation equipment | detect more slender targets and improve detection ability | Accuracy: +9.17%; F1: +8.96% | [103]
Faster R-CNN | Discarded some high-level conv layers of VGG16 to reduce parameters; optimized anchor ratios | Defect detection of substation equipment | detect more slender targets and reduce the number of parameters | mAP: +0.94% | [13]
Faster R-CNN | ResNet50 instead of VGGNet16; introduced FPN; RoIAlign instead of RoIPooling | Insulator detection on transmission lines | enhance feature extraction and improve small-object detection accuracy | mAP: +7.52%; F1: +7.02 | [170]
Faster R-CNN | Introduced FPN; Hough transform for tilt-angle calculation and horizontal correction | Insulator detection on transmission lines | improve the accuracy of insulator recognition and fault detection | mAP: +2.7% | [175]
Faster R-CNN | GoogLeNet Inception V2 backbone; data augmentation | Foreign object detection on transmission lines | expand the small-sample dataset and improve detection accuracy | Precision: +10.4% | [173]
Faster R-CNN | Spatial region attention blocks; multitask framework; refinable RPN; multi-scale training | Power line parts detection | enhance feature extraction and improve multi-scale target detection accuracy | mAP: +4.7% | [171]
Mask R-CNN | Added mask branch to Faster R-CNN; RoIAlign to replace RoI Pooling | Foreign object detection on power lines | improve the speed, efficiency, and accuracy of foreign object detection | mAP: +2.2% | [168]
Mask R-CNN | Pixel-level segmentation; transfer learning; dynamic learning rate; infrared grayscale-to-temperature mapping | Insulator detection on transmission lines | increase convergence speed and improve detection accuracy with small samples | mAP: 77%; FPS: 5.07 | [176]
Mask R-CNN | Adaptive federated learning (Adaptive FL); MobileNet backbone; RoIAlign | Insulator detection on transmission lines | improve training speed and reduce communication cost | mAP: +1.1%; Det. time: −40%; Train time: −10% | [4]

3.3. Intelligent Power Inspection Methods Based on Lightweight Object Detection Models

To fit the resource limits of UAVs and other edge devices, lightweight design has become a key direction for intelligent power inspection. Table 6 summarizes the research progress of lightweight object detection models in the field of power inspection. Traditional solutions mainly rely on manually designed lightweight backbones (such as MobileNet v2 [4], MobileNet v3 [177,178], RepVGG-A0 [14]) and model pruning [179]. In addition, computation can be reduced by introducing depthwise separable convolution [180,181], Ghost modules [177], anchor-free detection heads [182], and parameter-free attention mechanisms (SimAM) [183].
Although these hand-crafted lightweight strategies have achieved some gains, it is still hard to find the best trade-off between accuracy and compactness. Moreover, there are many possible combinations of depthwise separable convolution, Ghost modules, and related blocks, so manual search is not practical. EDL provides a new automated design paradigm for building lightweight models. It can be summarized into four actionable paths:
  • EDL-NAS-style architecture search: Evolutionary strategies can encode and search across dimensions such as lightweight backbone block types, operator sets (depthwise separable, grouped, and Ghost-like convolutions), neck fusion topology, detection head type (anchor-free or anchor-based), and attention insertion points. This directly produces architectures that satisfy on-device constraints.
  • EDL-driven multi-objective compression: Pruning ratios, layer retention policies, quantization bit-widths, and distillation weights can be treated as evolutionary variables. Pareto optimization is then performed between accuracy and latency/energy, rather than applying a single compression plan with a fixed pruning ratio.
  • EDL-based joint training-strategy search: For a target deployment platform (ARM CPU, NVIDIA Jetson), EDL can automatically adjust hyperparameters such as input resolution, augmentation combinations, loss weights, positive/negative sample assignment, and NMS thresholds.
  • ENAS for Pareto-frontier lightweight architectures: Within a large search space, model accuracy, FLOPs, and inference latency can be set as constraints or as multi-objective functions in the evolutionary algorithm. Evolution can then automatically produce lightweight architectures on the Pareto frontier, which can be selected based on the task and the platform.
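The multi-objective compression path can be sketched as a toy constrained search in which pruning ratio and quantization bit-width are the genes. The sketch degenerates evolution to feasibility-filtered random sampling for brevity, and the surrogate accuracy/latency functions are illustrative placeholders for real validation runs and on-device measurements:

```python
import random

def surrogate_accuracy(prune, bits):
    """Toy stand-in for validated mAP after compression: accuracy
    degrades with heavier pruning and with 4-bit quantization."""
    return 0.90 - 0.20 * prune - (0.06 if bits == 4 else 0.0)

def surrogate_latency(prune, bits):
    """Toy stand-in for measured on-device latency in ms."""
    return 80.0 * (1.0 - prune) * (bits / 8.0)

def evolve_compression(budget_ms=40.0, iters=200, seed=0):
    """Sample (pruning ratio, bit-width) genes, discard individuals that
    violate the latency budget, keep the most accurate survivor."""
    rng = random.Random(seed)
    best = None
    for _ in range(iters):
        prune = rng.uniform(0.0, 0.8)   # pruning-ratio gene
        bits = rng.choice([4, 8])       # quantization bit-width gene
        if surrogate_latency(prune, bits) > budget_ms:
            continue                    # infeasible under the budget
        acc = surrogate_accuracy(prune, bits)
        if best is None or acc > best[0]:
            best = (acc, prune, bits)
    return best

acc, prune, bits = evolve_compression()
```

Even this degenerate search already exposes the key trade-off: under a tight latency budget, aggressive 8-bit pruning and light 4-bit quantization are competing routes, and the fitness function, not a fixed recipe, decides between them.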

3.4. Intelligent Power Inspection Methods Based on EDL

In the studies discussed above, most improvements still belong to “manual DL design”. Researchers first select a network architecture such as YOLOv5 or Faster R-CNN, choose a backbone type and attention modules, and then tune hyperparameters based on experience. In the end, a model that performs well on a specific dataset is obtained. However, UAV-based power inspection faces three key challenges: small targets, complex environments, and limited edge computing. Under these conditions, the trial-and-error design paradigm has several clear limitations. First, the architecture design space is extremely large. The combinations of network depth, channel width, and connection patterns grow exponentially, so manual search cannot cover it. Hyperparameters such as learning rate, batch size, and loss weights also interact in complex ways, which makes experience-based tuning slow and inefficient. Second, the best structure can vary greatly across scenarios. For specific power equipment, such as different insulator types, the network often needs to be redesigned, and hyperparameters must be retuned when the model is transferred. Third, when compute and communication are limited, there is no unified way to balance accuracy, complexity, and robustness.
EDL combines EC with DNN training, turning DL design into a searchable optimization problem. By performing evolutionary search in a large combinatorial space, EDL can automatically find optimal or near-optimal solutions under constraints and greatly reduce the cost of manual trial and error. The following sections discuss parameter optimization and architecture optimization, and provide a targeted comparison in the context of UAV power inspection.

3.4.1. Network Parameter Optimization

In existing EDL studies for power inspection, parameter-level optimization is the dominant direction. It mainly includes two types: weight/bias optimization and hyperparameter tuning.
For weight and bias optimization, Huang [189] used an EA to directly optimize the connection weights and biases of a feedforward neural network, which improved the accuracy of transformer fault diagnosis. Meng et al. [190] introduced quantum-inspired particle swarm optimization (QPSO) into an RBF neural network to tune the hidden-layer kernel parameters and the output connection weights, so the output error of the transformer fault diagnosis model was minimized. Haghnegahdar et al. [191] employed WOA [192] to initialize and adjust ANN weights, with minimum mean square error as the objective, and achieved higher accuracy and better stability for power-system attack and fault prediction.
For hyperparameter optimization, Mitra et al. [193] applied particle swarm optimization to select the number and size of convolution kernels in a 1D CNN. This enabled joint optimization of band partitioning and feature extraction for transmission line fault detection. Zhang et al. [194] and Tao et al. [195] used modified differential evolution whale optimization (MDE-WOA) and an improved salp swarm algorithm (ISSA), respectively, to optimize the smoothing factor of a probabilistic neural network, thereby improving classification accuracy in transformer fault diagnosis. Lu et al. [196] adopted an improved Sand Cat Swarm Optimization method to tune BiGRU hyperparameters (batch size, alpha, number of hidden layers, and the number of neurons per layer), which enhanced generalization and correct classification. Elmasry et al. [197] proposed a Double PSO-based Algorithm to determine optimal features and hyperparameters (maximum number of iterations, swarm size, minimum velocity, maximum velocity, inertia weight constant, acceleration coefficients, stopping threshold, weight parameter, and free parameter), aiming to improve electrical fault detection. Yang et al. [198] developed a chaotic adaptive simulated annealing whale optimization algorithm (CASAWOA) to optimize RBFNN parameters (center weights, hidden-layer neuron width, and output weights) and network size for electronic current transformer fault diagnosis. Klaar et al. [199] combined empirical wavelet transform (EWT), an attention mechanism and LSTM, and then used Optuna to automatically tune hyperparameters (hidden units, activation functions such as ReLU, ELU, and Tanh, and the learning rate) for insulator fault prediction.
Closer to real UAV-based visual inspection is the evolutionary optimization of hyperparameters for object detection networks. Jiang et al. [200] used PSO to jointly optimize the batch size and input resolution of YOLOX. They also designed a composite metric that combines mAP, recall, and precision, and used it as the PSO fitness function. After PSO, the best batch size was set to 54 and the optimal input resolution was 480×480. Experiments showed that the PSO-optimized YOLOX could detect collapsed utility poles in aerial images both accurately and quickly. It outperformed the original YOLOX, YOLOv3, Faster R-CNN, and ELM in accuracy and robustness, and it was especially strong when dealing with “false tilt” cases and complex backgrounds. Stefenon et al. [201] proposed a GA-based insulator defect detection model, called hypertuned-YOLO. They used detection mAP as the fitness function and automatically tuned 21 hyperparameters in the YOLO framework, including learning-rate and optimizer-related settings, warmup parameters, loss gains, threshold and anchor settings, and data augmentation parameters. This tuning significantly improved both performance and convergence speed for distribution network fault detection.
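A minimal global-best PSO loop of the kind used in [200] can be sketched as follows. The composite fitness here is a toy surrogate that peaks near batch size 54 and resolution 480 purely to mirror the reported optimum; in practice, the fitness would be the measured mAP/recall/precision composite from validation runs:

```python
import random

def composite_fitness(batch, reso):
    """Toy surrogate for the mAP/recall/precision composite in [200]:
    peaks at batch=54, resolution=480 purely for illustration."""
    return -((batch - 54) / 64.0) ** 2 - ((reso - 480) / 640.0) ** 2

def pso(iters=60, n=12, seed=1):
    """Minimal global-best PSO over (batch size, input resolution)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(8, 128), rng.uniform(320, 800)] for _ in range(n)]
    vel = [[0.0, 0.0] for _ in range(n)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=lambda p: composite_fitness(*p))[:]
    for _ in range(iters):
        for i in range(n):
            for d in range(2):
                vel[i][d] = (0.7 * vel[i][d]                           # inertia
                             + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                             + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if composite_fitness(*pos[i]) > composite_fitness(*pbest[i]):
                pbest[i] = pos[i][:]
                if composite_fitness(*pos[i]) > composite_fitness(*gbest):
                    gbest = pos[i][:]
    return gbest

best_batch, best_reso = pso()
```

In the real setting, each fitness evaluation is a training-plus-validation run, so the swarm size and iteration count must be kept small and evaluations parallelized or shortened with early stopping.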
From the key challenges of UAV power inspection, the role of parameter-level EDL can be summarized as follows:
  • For small-object detection, evolutionary search over input resolution, layer-wise weights in multi-scale feature fusion, anchor scales and matching thresholds, and positive/negative sample sampling strategies can automatically find configurations that best fit small targets such as insulators, fittings, and conductor bolts in the current inspection data. As a result, recall can be greatly improved. Compared with manual tuning, evolutionary algorithms explore a much wider space and may discover non-obvious combinations. For example, smaller anchors can be paired with a higher IoU threshold and specific data augmentation policies, which helps reduce confusion between small objects and the background.
  • For adaptation to complex environments, EDL can use robustness metrics as the fitness function, such as mAP on subsets with rain/fog, backlight, or strong background interference. By adjusting loss weights, classification/regression balance factors, hyperparameters of Focal Loss and IoU loss, and augmentation strength, the method can automatically find a model that performs well across diverse conditions. In this way, it can partly replace hand-crafted training strategies designed for different weather and lighting.
  • Under edge-deployment constraints, evolutionary optimization can explicitly include model size, FLOPs, and inference latency in the fitness function. This turns the task into a multi-objective or constrained optimization problem that balances accuracy and complexity. It then searches for hyperparameter settings that meet real-time and power limits on onboard UAV computing platforms.
However, hyperparameter optimization alone often brings only “local gains on a fixed architecture”. When the bottleneck comes from limited model capacity or a mismatch between the architecture and deployment constraints, the improvement will be limited.

3.4.2. Network Architecture Optimization

Architecture evolution better matches the real needs of UAV inspection. In this setting, there are many equipment types, object scales vary widely, and edge-side constraints are strict. As a result, architecture-level automatic design is urgently needed. However, architecture-level EDL is still relatively rare in the power inspection literature. The CPSOTJUTT model proposed by Lv et al. [202] is one of the few works that attempts to automatically design a DNN architecture for power-system inspection. It is therefore worth describing in more detail and summarizing its insights for inspection scenarios.
The key idea of this method is to treat deep network architecture design as a high-dimensional, non-convex combinatorial optimization problem. The optimization procedure is as follows:
  • Defining the Search Space
    The search space defines the types of networks the algorithm can generate. This study adopts a block-based binary encoding scheme:
    • Node Definition: A fixed number of convolutional layer blocks are established.
    • Connection Encoding: Connections between layers are optimized using binary bits. A “1” indicates that the current layer receives output from the previous layer, while a “0” means there is no connection.
    • Backend Structure: A unified backend is preset based on the task: a Fully Connected (FC) layer and SoftMax layer for classification, or RPN/RoI pooling layers for object detection.
  • Setting the Fitness Function
    The fitness function evaluates the quality of a generated architecture. This study minimizes Cross-Entropy Loss to assess performance.
  • Detailed Optimization Pipeline
    • Step 1: Architecture Evolution (Bottom Layer)
      A population is initialized using a Genetic Algorithm (GA).
      Crossover and Mutation operations are performed to generate new architectures.
      For each architecture, Consensus-based Particle Swarm Optimization (CPSO) is used for preliminary training. This allows for rapid fitness evaluation to select the optimal architecture (CPGA-DNN).
    • Step 2: Weight Refinement (Middle Layer)
      For the optimal architecture selected in Step 1, the CPSOTJUTT three-stage algorithm is initiated to optimize the DNN weight parameters (w):
      Stage I: Exploration and Consensus
      Goal: Optimize the initial search area for weights.
      Process: PSO is run until particles reach a "consensus," locking onto candidate regions containing high-quality Local Optimal Solutions (LOS).
      Algorithm: CPSO combined with Mini-batch K-Means clustering. Clustering is used to reduce the computational burden and select representative starting points.
      Stage II: Robust Convergence
      Goal: Optimize the convergence path of the weights.
      Process: DNN training is mapped to a dynamical system. This ensures weights converge robustly and quickly from the initial point to a high-quality Stable Equilibrium Point (SEP), solving the issue of sensitivity to initial values.
      Algorithm: Trajectory Unification (TJU) technique combined with local solvers (e.g., Adam or SGD).
      Stage III: Search Optimal
      Goal: Optimize the quality of the local optima.
      Process: By finding “exit points” on the stability boundary, the algorithm actively escapes the current local optimal region. It searches along the gradient direction for other weight combinations (SEPs) in neighboring regions that offer better performance, resulting in multiple high-quality sub-models.
      Algorithm: TRUST-TECH method.
    • Step 3: Ensemble Construction (Top Layer)
      The multiple high-quality sub-models obtained in Step 2 serve as “hidden nodes”. The CPSOTJUTT algorithm is applied to optimize the ensemble weights (σ) and fusion strategies:
      Classification Tasks: Optimize the combination weights σ for each sub-model in the final decision.
      Detection Tasks: Optimize parameters for Weighted Bounding Box Fusion, including coordinate updates and confidence calculations.
      Objective: To fully leverage the strengths of different sub-models, improving generalization and final accuracy in complex inspection environments.
  • Experimental Validation and Application
    • Datasets: The method was tested on public datasets (MNIST, CIFAR-10, CIFAR-100) and three self-constructed power system inspection datasets: Insulator Defects (PLIID), Substation Equipment Defects (PSSID), and Transmission Line Obstacles (PLOID).
    • Key Conclusions:
      Superior Performance: On CIFAR datasets, the automatically designed model (CPGA-DNN) outperformed mainstream models (like ResNet, VGG) and other automated design algorithms in terms of test error, parameter count, and computational cost (GPU days).
      Strong Robustness: Compared to traditional SGD training, CPSOTJUTT enabled 83% of initial points to converge to the optimal region, whereas SGD only achieved 18%.
      Significant Inspection Results: In power inspection tasks, the ensemble model significantly improved recognition accuracy in complex scenes (e.g., smoke, foreign objects, equipment damage), performing particularly well on class-imbalanced datasets.
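As an illustration of the ensemble construction in Step 3, the sketch below tunes combination weights with a generic PSO over toy sub-model outputs. The sub-model probabilities, labels, and PSO constants are invented for illustration; this is not the authors' CPSOTJUTT implementation.

```python
import random

# Toy sub-model outputs: each row holds one sample's class-1 probability
# from three hypothetical sub-models; LABELS are the ground truth.
SUB_MODEL_PROBS = [
    (0.9, 0.4, 0.7), (0.2, 0.6, 0.1), (0.8, 0.9, 0.3), (0.1, 0.3, 0.2),
]
LABELS = [1, 0, 1, 0]

def accuracy(weights):
    """Accuracy of the sigma-weighted ensemble vote."""
    total = sum(weights)
    correct = 0
    for probs, y in zip(SUB_MODEL_PROBS, LABELS):
        fused = sum(w * p for w, p in zip(weights, probs)) / total
        correct += int((fused >= 0.5) == bool(y))
    return correct / len(LABELS)

def pso_ensemble_weights(n_models=3, n_particles=12, iters=40, seed=0):
    """Plain PSO over the weight simplex (weights clamped to [0.01, 1])."""
    rng = random.Random(seed)
    pos = [[rng.random() + 0.01 for _ in range(n_models)] for _ in range(n_particles)]
    vel = [[0.0] * n_models for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = max(pbest, key=accuracy)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_models):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.4 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.4 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(1.0, max(0.01, pos[i][d] + vel[i][d]))
            if accuracy(pos[i]) >= accuracy(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = max(pbest, key=accuracy)
    return gbest
```

In a detection setting the same loop would instead optimize the parameters of weighted bounding-box fusion, with validation mAP replacing the toy accuracy.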
CPSOTJUTT is an important step in power-system inspection research, moving from “parameter-level optimization” toward “architecture-level automatic design”. It offers a useful technical route for building ENAS tailored to UAV-based power inspection. However, its search space targets general DNN architectures rather than detector architectures, and it does not include modern modules such as residual connections and attention mechanisms. Even so, the main ideas can be transferred to UAV image object detection frameworks. For example, the “macro-architecture” and “module choices” of detectors such as YOLO and CenterNet can be treated as evolutionary variables. These variables may include backbone depth and width (the number of layers and the channels per layer), residual or bottleneck block types (standard convolution, depthwise separable convolution, Ghost modules), feature fusion paths (FPN, PAN, BiFPN) and their connection patterns, the number of detection heads and their scales, and the configuration of attention modules (whether to add CBAM, CoordAtt, or SimAM).
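Such a detector-oriented search space can be encoded directly as a genotype. The variable names and option lists below are illustrative assumptions, not a specification taken from the CPSOTJUTT paper:

```python
import random

# Hypothetical search space for detector "macro-architecture" variables.
SEARCH_SPACE = {
    "backbone_depth": [3, 4, 5],                    # number of backbone stages
    "base_channels":  [16, 32, 48, 64],
    "block_type":     ["conv", "dwconv", "ghost"],
    "fusion":         ["fpn", "pan", "bifpn"],
    "num_heads":      [2, 3, 4],                    # detection heads / scales
    "attention":      ["none", "cbam", "coordatt", "simam"],
}

def random_genotype(rng):
    """Sample one candidate architecture uniformly from the space."""
    return {gene: rng.choice(opts) for gene, opts in SEARCH_SPACE.items()}

def mutate(genotype, rng, p=0.3):
    """Point mutation: each gene is resampled with probability p."""
    child = dict(genotype)
    for gene, opts in SEARCH_SPACE.items():
        if rng.random() < p:
            child[gene] = rng.choice(opts)
    return child

rng = random.Random(42)
parent = random_genotype(rng)
child = mutate(parent, rng)
```

A decoder would then translate each genotype into a concrete network before training and evaluation.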
In addition, CPSOTJUTT does not include real hardware latency or power usage in its fitness function, so the resulting architectures may be hard to deploy. To address this, the evolutionary algorithm can add constraints such as parameter count, FLOPs, and an inference-latency limit on the target edge platform. In this way, it can automatically search for the best architecture that satisfies “maximize small-object detection performance under limited compute resources”. It also helps overcome a common weakness of lightweight design, where modules are swapped in based on experience and global trade-offs are difficult to reason about.
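A penalty-based version of such a constrained fitness can be sketched as follows; the thresholds (8 M parameters, 20 GFLOPs, 33 ms for a roughly 30 FPS target) are illustrative assumptions:

```python
def constrained_fitness(map_small, params_m, flops_g, latency_ms,
                        max_params_m=8.0, max_flops_g=20.0, max_latency_ms=33.0):
    """Maximize small-object mAP subject to deployment constraints.

    Constraint violations are converted into relative penalties so the
    evolutionary algorithm can still rank infeasible candidates instead
    of discarding them outright (illustrative thresholds).
    """
    penalty = 0.0
    for value, limit in ((params_m, max_params_m),
                        (flops_g, max_flops_g),
                        (latency_ms, max_latency_ms)):
        if value > limit:
            penalty += (value - limit) / limit   # relative overshoot
    return map_small - penalty
```

A feasible candidate keeps its raw mAP as fitness, while an over-budget one is pushed down in proportion to how far it exceeds each limit.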
Finally, CPSOTJUTT does not address the search-cost issue. If evolutionary architecture search is evaluated by fully training each candidate, the cost becomes extremely high. Cost-saving strategies are therefore needed, such as weight inheritance, early stopping, and surrogate models. Future work should fill two gaps: building a detector-oriented search space and defining an edge-side multi-objective fitness function. Only then can CPSOTJUTT-like methods be truly deployed in UAV inspection.
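Among the cost-saving strategies, a surrogate model is the simplest to sketch: predict a candidate's fitness from an archive of already-evaluated genotypes and reserve full training for the most promising ones. The k-nearest-neighbour predictor, toy search space, and archive values below are illustrative assumptions (a real system would more likely use Gaussian process regression):

```python
# Toy categorical search space for the surrogate's input encoding.
SPACE = {"depth": [3, 4, 5], "width": [16, 32, 64], "fusion": ["fpn", "pan", "bifpn"]}

def encode(genotype):
    """Map a categorical genotype onto a numeric vector in [0, 1]^d."""
    return [SPACE[g].index(genotype[g]) / (len(SPACE[g]) - 1) for g in SPACE]

def surrogate_predict(archive, genotype, k=3):
    """Predict fitness as the mean of the k nearest evaluated genotypes."""
    x = encode(genotype)
    by_dist = sorted(
        archive,
        key=lambda rec: sum((a - b) ** 2 for a, b in zip(encode(rec[0]), x)))
    top = by_dist[:k]
    return sum(fit for _, fit in top) / len(top)

# Archive of (genotype, measured fitness) pairs from earlier full trainings.
archive = [
    ({"depth": 3, "width": 16, "fusion": "fpn"},   0.61),
    ({"depth": 4, "width": 32, "fusion": "pan"},   0.70),
    ({"depth": 5, "width": 64, "fusion": "bifpn"}, 0.66),
]
pred = surrogate_predict(archive, {"depth": 4, "width": 32, "fusion": "bifpn"}, k=2)
```

Only candidates whose predicted fitness clears a threshold would then be trained in full, and their measured results are appended to the archive to refine the surrogate.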

3.4.3. Comparative Analysis of EDL Models for Power Inspection

Table 7 provides a comparative analysis of these neural network optimization methods. From the perspective of optimization targets, existing EDL studies can be roughly grouped into three categories: weight-level, hyperparameter-level, and architecture-level optimization. Weight-level optimization (e.g., [189,190,191,194,195,196]) mostly focuses on fault diagnosis for equipment such as transformers and instrument transformers. These methods mainly handle 1D time-series or frequency-domain data, and they aim to improve diagnostic accuracy and stability. They are usually run offline, so real-time constraints are not strict. Hyperparameter-level optimization covers a broader range: 1D CNN kernel settings [193], BiGRU structural parameters [196], joint optimization of feature selection and classifier parameters [197], and hyperparameter tuning for object detectors such as YOLOX and YOLO [200,201]. Compared with weight-level work, the detector-oriented studies act directly on UAV inspection image-based object detection tasks, so they are more comparable and have higher engineering value. Architecture-level optimization has only been explored in a few studies; a typical example is CPSOTJUTT proposed by Lv et al. [202], which reflects the trend toward automatic architecture design.
From the task-scenario viewpoint, signal-based fault diagnosis EDL work often targets cases such as transformer oil dissolved gas analysis and current or voltage waveforms. The models in these settings are relatively compact, and compute pressure is limited. In contrast, image and video inspection tasks must process high-resolution inputs with multiple objects and complex backgrounds. Therefore, they place much higher demands on model design and inference efficiency. The studies by Jiang et al. [200] and Stefenon et al. [201] show that combining PSO or GA with YOLO-series detectors can significantly reduce manual tuning effort and improve detection performance, while the overall network architecture is kept largely unchanged. This is an effective path for introducing EDL into UAV inspection vision models. Still, these methods mainly focus on hyperparameter optimization, and they have not fully exploited the potential of architecture search for lightweight design and multi-scenario adaptability.
From the optimization-objective dimension, most existing EDL research is still driven by a single objective, such as “maximize accuracy” or “minimize error”. Deployment-related metrics, such as model complexity and inference latency, are less often included. This differs from the needs of UAV-based power inspection, where real-time response and energy consumption are highly sensitive. It suggests that future EDL should adopt multi-objective or constrained optimization more often, unifying accuracy, speed, and resource usage within one evolutionary framework so that the resulting models can be deployed on onboard platforms in practice. Although CPSOTJUTT [202] has not been directly applied to image object detection, it provides an important insight for building model sets that fit different inspection tasks and platforms: multiple complementary architectures can be obtained through evolutionary search and then selected or ensembled dynamically based on field requirements, which improves overall system robustness.
Overall, EDL-based power inspection research has reached a certain scale in parameter optimization, especially for transformer fault diagnosis and transmission-line signal analysis. However, EDL for UAV power inspection is still at an early stage. Current work mainly comprises evolutionary hyperparameter optimization for YOLO-series models [200,201] and a small number of general DNN architecture design attempts [202]. Compared with the many manually designed one-stage, two-stage, and lightweight detectors, introducing EDL offers a new way to address three key challenges: small-object detection, adaptation to complex environments, and edge-deployment constraints. On the one hand, global search and multi-objective optimization reduce reliance on expert experience and manual trial-and-error; on the other hand, architecture-level evolution lays the foundation for customized object detection tailored to specific power inspection scenarios.

3.4.4. EDL Solutions for UAV Power Inspection Challenges

Based on the earlier analysis of power inspection studies using traditional object detection models and EDL, the key challenges can be grouped into three types: small objects that are hard to detect, weak adaptation to complex environments, and limited resources for edge deployment. To address these issues, EDL offers targeted benefits at two levels: parameter optimization and architecture search.
  • Multi-scale Architecture Adaptive Optimization for Small Object Detection
    Power equipment defects such as insulator cracks often have widths smaller than 5 pixels, and corona discharge may cover less than 0.1% of an image. This makes detection highly sensitive to how the feature extractor sets its receptive field. Traditional methods often rely on a manually designed FPN to mitigate this problem. However, hyperparameters such as kernel size, depth, channel width, and skip-connection patterns depend heavily on expert experience, and manual designs do not cover the full search space, so they transfer poorly across different inspection scenarios.
    For small and slender objects, a more effective EDL path is to co-optimize the architecture and the training strategy. At the architecture level, deformable convolution can be introduced to handle shape changes, and dilated convolutions with different dilation rates can capture features at different scales. Evolution can also be applied directly to the multi-scale feature fusion topology, such as how FPN connections are built, how many fusion layers are used, and whether cross-scale skip links are added; this improves the flow and fidelity of high-resolution information. In addition, the resolution and number of branches in the detection head can be searched; for example, adding a higher-resolution branch such as P2 may increase recall for small objects. Attention is not inserted everywhere by default; instead, evolution can decide the location and type of each attention module, for example placing lightweight attention only on high-resolution layers to avoid large latency increases on edge devices. For slender objects such as fittings and foreign objects on conductors, anchor-based or anchor-free settings can also be included in the search variables, and the anchor size and aspect-ratio sets can be optimized together so that the matching process better fits the object shapes. Overall, architecture-level EDL can automatically combine lightweight backbones, enhanced FPN or SPP, cross-scale attention, and small-object-focused detection heads. This avoids tuning a single component manually while ignoring global interactions, and it helps build an end-to-end small-object-friendly architecture for complex transmission scenarios. At the training level, EDL can search for data augmentation combinations that benefit small objects, such as random crop ratios and haze synthesis strength; this increases how often small objects appear and improves the share of effective positive samples. It can also tune loss function hyperparameters, such as Focal-style weights and IoU loss settings, to reduce training instability caused by scarce positive samples.
  • Robust Architecture Evolution for Complex Environments
    Power inspection scenarios face challenges such as extreme lighting conditions (overexposure from strong backlighting, noise in nighttime infrared imaging) and severe occlusion (trees blocking lines, fog reducing visibility). Traditional methods often add special data augmentation or attention modules to improve robustness against rain, fog, backlight and occlusion. However, the combination of these techniques (such as attention module insertion positions and augmentation strategy intensity) creates a vast design space, making manual tuning inefficient and prone to local optima.
    EDL can include metrics such as mAP and false positive rate on specific subsets (fog, glare, complex background) in the fitness function. During evolution, the algorithm is thus guided to care about both overall performance and performance in difficult conditions, so the system can automatically find more robust hyperparameter and architecture combinations. For example, increasing the weight of foggy or highly reflective samples during evolution can push the model toward stronger deep semantic features and better local attention fusion, which helps reduce feature confusion at its root. If visible-light data and infrared or ultraviolet data are available, multi-modal fusion networks can also be designed automatically: for visible, infrared, and ultraviolet multi-source data, the evolutionary search can optimize the depth of each modality branch, the fusion layer positions, and feature alignment strategies. Data augmentation can be optimized further by treating offline augmentation (geometric transforms, color jitter) together with online augmentation (adversarial sample generation) as evolution variables, which improves generalization.
  • Resource-Constrained Optimization for Edge Deployment
    The computing power limitations of UAV platforms (the Jetson Xavier NX offers roughly 21 TOPS), memory bottlenecks (typically 4–8 GB on embedded modules), and power budgets (around 15 W) fundamentally conflict with high-precision detection needs. Edge-deployment resource constraints are therefore not a problem that a single metric can capture; they form a multi-objective trade-off among accuracy, recall, latency, VRAM usage, parameter count, and FLOPs. This setting is especially well suited to using EDL to return a set of Pareto-optimal solutions rather than one model that is “best on average”. Compared with traditional lightweight design, which often relies on researchers manually choosing modules such as MobileNet, RepVGG, or Ghost, EDL can automatically generate a deployable family of models. It can also treat compression and deployment-related hyperparameters as evolution variables and search them jointly, such as pruning rate and pruning location, quantization bit-width, distillation temperature and loss weights, and even engineering factors like the ratio of TensorRT-fusible operators, input resolution, and batch strategy. This makes the resulting Pareto front better matched to a specific hardware target, and researchers can directly pick the most suitable deployment option from the Pareto set for each scenario.
    EDL can also be combined with cloud–edge collaboration architectures [4,174], so that the “Pareto set from multi-objective evolution” becomes a practical end-to-end deployment workflow. On the cloud side, where compute is abundant, EDL runs large-scale evolutionary search and training and builds a hierarchical model family for different inspection scenarios. Within one family, the Pareto frontier spans from high-performance models to ultra-light models, offering optimal trade-offs across accuracy, recall, end-to-end latency, peak VRAM, model size, and energy use. To improve deployability, the cloud stores not only architectures and weights, but also key deployment metadata, including measured inference time, peak VRAM, and power curves on different hardware platforms, as well as inference-engine settings such as operator fusion ratios and input-resolution configurations. When an edge device connects, the cloud selects the best-fit lightweight model from the Pareto set based on hardware constraints, such as GPU type, available memory, power limits, and target FPS, together with task requirements like minimum detection accuracy and maximum allowed latency, and then delivers it to the device. During real deployment, the edge device can adapt within the same model family through a lightweight model-switching mechanism driven by battery level, thermal status, and task priority; for example, a higher-accuracy model can be used when power is sufficient, and an ultra-light model can be selected automatically under low-battery conditions to extend flight time. Meanwhile, new samples and performance feedback collected during inspection are periodically uploaded to the cloud for continuous evolutionary optimization. Based on this feedback, the cloud updates and re-evaluates the objective setup, iterates to produce a new Pareto model family that better reflects real deployment conditions, and pushes updates back to the edge to complete a closed-loop upgrade. 
In this way, a cloud-evolution design stage, efficient edge inference, and feedback-driven improvement form a closed-loop collaborative system, allowing model performance and deployment efficiency to keep improving as scenarios and hardware states change.
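The Pareto-set selection underlying this workflow can be made concrete with a small non-dominated filter; the candidate models and their metric values below are invented for illustration:

```python
def dominates(a, b):
    """a dominates b: no worse in every objective and better in at least one.
    Objective tuples are (mAP to maximize, latency_ms to minimize,
    size_mb to minimize)."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and better

def pareto_front(candidates):
    """Keep only candidates not dominated by any other candidate."""
    return [c for c in candidates
            if not any(dominates(o, c) for o in candidates if o is not c)]

# Invented model family: (name, mAP, latency in ms, size in MB).
models = [
    ("heavy", 0.82, 95.0, 240.0),
    ("mid",   0.78, 40.0,  60.0),
    ("light", 0.71, 18.0,  12.0),
    ("bad",   0.70, 50.0,  80.0),   # dominated by "mid" on all three axes
]
front = pareto_front([m[1:] for m in models])
```

The cloud side would store such a front per model family and match an edge device's constraints (memory, power, target FPS) against it when selecting which member to deliver.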

4. Discussion

Although EDL shows clear promise, applying it to UAV power inspection still faces major obstacles. Existing work has not discussed several key aspects in depth:
  • Trade off between computational cost and performance gain
    EDL is inherently compute-intensive. Many studies report accuracy gains after evolution, but few clearly disclose the extra resources needed to obtain them, such as GPU hours and wall-clock time. In real inspection projects, if training cost grows steeply while performance improves only slightly, the return on investment becomes questionable. Future work should define explicit metrics that quantify the ratio between “evolution cost” and “performance benefit”.
  • Contradiction between offline evolution and online adaptation
    Almost all current EDL work is conducted offline: searches are performed on ground servers or in the cloud, and the optimal model is then deployed to the onboard platform. Conducting complete evolutionary searches on the onboard platform is usually impractical unless extremely small populations, very short training epochs, or surrogate-model evaluation are adopted, which sacrifices search reliability. However, power inspection environments are complex and variable, making it difficult for offline-evolved models to adapt to unseen, out-of-distribution environmental conditions. Truly “intelligent” inspection should possess online evolution or lifelong learning capabilities, where UAVs can fine-tune models based on new data during flight. Limited by onboard computing power (e.g., Jetson Xavier/Orin), current evolutionary algorithms are difficult to run in real time on edge devices. How to design lightweight, hardware-constraint-driven online evolution strategies remains a gap in this field.
  • Single-Objective Fitness Function Design and Multi-Objective Conflicts
    The fitness function guides the direction of evolution. UAV inspection objectives are not simply to maximize mAP; they form a typical multi-objective constrained problem. An excellent inspection model requires not only high accuracy but also low latency, low energy consumption, and small model size. Evolution that ignores hardware constraints often steers the search toward complex models that cannot actually run on UAVs. In existing research, most work still uses accuracy metrics as the single fitness or primary optimization objective. Even among the few studies claiming to balance efficiency, many only use Params or FLOPs as efficiency proxies. However, on embedded platforms, real latency and energy consumption depend more strongly on operator types, memory access, and inference-engine optimization levels. Therefore, “hardware-aware” fitness should directly include measurements on target devices or calibrated surrogate models, rather than approximating with FLOPs alone; such attempts are still lacking in current research.
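A hardware-aware fitness term ultimately reduces to timing inference on the device itself. A minimal, framework-agnostic benchmarking helper (warm-up runs plus a median over repeated timings) might look like the sketch below; in practice the callable would wrap a real inference-engine execution rather than an arbitrary function:

```python
import time

def measure_latency_ms(fn, warmup=5, runs=30):
    """Median wall-clock latency of fn() in milliseconds.

    Warm-up iterations let caches, JIT compilation, and clock governors
    settle before measurement; the median is robust to scheduler outliers.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return samples[len(samples) // 2]
```

The returned value can be fed straight into a fitness or penalty term, replacing a FLOPs proxy with a device-side measurement.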
  • Computational Bottlenecks and Algorithm Innovation Requirements for EDL
    EDL provides a novel way to solve complex optimization problems, but practical deployment remains difficult. From a compute perspective, evolutionary computation needs many iterations to evaluate candidates, while deep learning training already consumes substantial compute; when combined, resource demand can grow rapidly and training efficiency is heavily constrained. From an algorithm design perspective, many UAV power inspection studies still rely on classic methods such as PSO and GWO, and advanced multi-objective optimization methods with strong Pareto-front search ability are not well integrated. The balance between exploration (global search) and exploitation (local refinement) directly affects whether high-quality solutions can be found within a limited number of iterations. Future work can focus on algorithm-level innovation in EDL. For example, surrogate predictors based on radial basis functions, Gaussian process regression, or neural networks can replace expensive full training during evaluation; decomposition strategies can convert a multi-objective problem into multiple single-objective subproblems; and adaptive evolutionary operators can be designed, with mutation rates adjusted based on population diversity: higher mutation early to encourage exploration and lower mutation later for fine exploitation.
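The adaptive-operator idea can be sketched as a generation-based mutation schedule with a diversity safeguard: the rate decays from exploration to exploitation, but snaps back up if the population collapses. The rates and the diversity threshold below are illustrative assumptions:

```python
def diversity(population):
    """Mean pairwise Hamming distance of equal-length genomes, in [0, 1]."""
    n, length = len(population), len(population[0])
    total, pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(a != b for a, b in zip(population[i], population[j]))
            pairs += 1
    return total / (pairs * length)

def adaptive_mutation_rate(gen, max_gen, population,
                           p_min=0.02, p_max=0.30, floor_div=0.1):
    """Linear decay from p_max to p_min over the run (explore early,
    exploit late); if diversity has collapsed below floor_div, mutation
    returns to p_max to restore exploration."""
    rate = p_max - (p_max - p_min) * gen / max_gen
    if diversity(population) < floor_div:
        rate = p_max
    return rate
```

The same feedback signal can also drive other operators, for example widening crossover when diversity drops.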
In summary, EDL-based power inspection has achieved progress in parameter optimization, but it is still limited by compute bottlenecks and overly narrow optimization targets. Key future directions include shifting from simple hyperparameter tuning to fully automated NAS, moving from offline training to cloud-edge collaborative online evolution, and building a multi-dimensional fitness framework that jointly considers energy, latency, and accuracy. Only by critically examining and resolving these deep-seated contradictions in hardware-software co-design can evolutionary deep learning truly empower intelligent UAV power inspection.

5. Challenges and Future Development Trends

As elaborated in Section 3, with the development of DL-based object detection models, the use of DL models for power inspection has become increasingly popular. The performance of DL-based power inspection models depends mainly on the quality of image acquisition and the efficacy of the object detection algorithm. Many researchers have successfully used EC to improve DL models and have made good progress. From the investigations and discussions presented in Section 3, we can identify the primary technical challenges associated with optimizing DL with EC and applying EDL in UAV power inspection. In addition, research in this field is not yet systematic, and many issues merit further exploration. Therefore, this section focuses on the challenges faced in EDL-based power inspection and discusses forward-looking research directions to advance deeper and more comprehensive research on EDL.

5.1. Automated EDL Search for Robust Architectures in Complex Environments

The practical application of UAV power inspection technology faces multiple technical challenges. During the image acquisition process, the complex and variable background environment, the small size of the object equipment (which may occupy only a minuscule portion of the image), and the recurrent occlusion problems significantly affect the detection accuracy. Moreover, variable lighting conditions and diverse weather interference further increase the detection difficulty. To address object recognition and small object detection problems in complex scenarios, researchers have enhanced feature extraction ability mainly by adding attention mechanisms or improving feature fusion networks. For example, Li et al. [36] proposed a Global Information Aggregation Module (GIAM). It uses a Pyramid Pooling Module (PPM) to model global context and suppress background interference. Zhang et al. [153] proposed a densely connected FPN and embedded the SENet visual attention mechanism to improve feature effectiveness. However, the attention type, how modules are combined, and where they are inserted depend strongly on the data distribution and target scales. Manual trial and error rarely reaches the best configuration. Therefore, several forward-looking directions should be highlighted:
Automated search and composition of attention mechanisms: EDL can place attention choices into a unified search space, including attention types (channel attention, spatial attention, self-attention), insertion positions (different stages of the backbone, different levels of the feature pyramid), and module hyperparameters (such as the reduction ratio). With multi-objective evolutionary algorithms such as NSGA-III and SMS-EMOA, attention configurations can be explored automatically to balance background suppression and computational efficiency in power inspection scenes. In this way, stronger saliency focus and better robustness to occlusion and complex backgrounds can be achieved.
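As a toy illustration of such a search space, the snippet below enumerates attention type/position pairs and keeps only the non-dominated configurations. The gain and cost numbers are invented stand-ins for measured mAP deltas and relative FLOPs overheads, not results from any cited study:

```python
from itertools import product

# Illustrative attention options: type -> (toy mAP gain, base relative cost);
# position -> cost multiplier (high-resolution feature maps cost more).
TYPES     = {"se": (0.010, 1.0), "cbam": (0.015, 2.5), "self-attn": (0.022, 9.0)}
POSITIONS = {"backbone_late": 1.0, "fpn_high_res": 3.0, "fpn_low_res": 0.6}

def enumerate_configs():
    """List every (config, gain, cost) combination in the toy space."""
    out = []
    for (t, (gain, cost)), (pos, scale) in product(TYPES.items(), POSITIONS.items()):
        out.append(((t, pos), gain, cost * scale))
    return out

def non_dominated(configs):
    """Keep configs for which no other config has >= gain and <= cost
    with at least one strict improvement."""
    keep = []
    for c in configs:
        if not any(o[1] >= c[1] and o[2] <= c[2] and (o[1] > c[1] or o[2] < c[2])
                   for o in configs):
            keep.append(c)
    return keep
```

In a real search, gain and cost would come from trained-and-measured candidates, and the multi-objective algorithm would maintain this front across generations.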
Adaptive optimization of fusion topology: EDL can be used to automatically design the connection patterns of feature fusion networks, such as FPN and PANet variants. The goal is to balance semantic richness in multi-scale features with localization accuracy.
Co-optimized multimodal fusion strategies: Because single optical imaging has clear limits under challenging conditions, multimodal approaches such as visible/infrared/ultraviolet fusion are becoming increasingly important [43]. EDL can automatically design cross-modal fusion topologies, including early fusion, late fusion, or hybrid strategies, and can also choose alignment mechanisms such as feature-level or decision-level alignment.

5.2. EDL-Driven Few-Shot Learning and Data Augmentation

The application of DL in power inspection faces the challenge of scarce public datasets, which is reflected mainly in two aspects. First, the number of images in existing datasets is insufficient to meet the training requirements of deep models. Second, the lack of data diversity makes it impossible to cover the complete detection scenarios of power line components (such as insulators and conductors). Existing datasets generally have problems of insufficient sample size and class imbalance, which seriously affect the robustness and detection accuracy of the model. To address these issues, current research focuses mainly on the following directions: First, data augmentation techniques are implemented, including basic methods such as geometric transformations (rotation and scaling), illumination adjustment, and noise injection [161], as well as advanced processing methods such as adaptive threshold binarization and median filtering [203]. Second, transfer learning strategies are leveraged, where pretraining is conducted on general datasets (such as ImageNet) and then fine-tuned for specific scenarios. Additionally, few-shot learning and meta-learning techniques can effectively enhance the model’s ability to extract patterns from limited samples, and when combined with transfer learning, they can further alleviate the problem of insufficient data. Third, generative adversarial networks (GANs) can generate many realistic defect images from a single sample. Finally, the fusion of multimodal data such as infrared thermography and vibration signals has become a new approach for improving the recognition rate of small samples. However, EDL can further turn the “data scarcity problem” into something that can be optimized:
Automated search for augmentation policies: EDL can treat operations such as geometric transforms, illumination perturbations, noise injection, and blur or degradation, together with their magnitudes, as searchable genes. A genetic algorithm can then be used to find augmentation policies that best fit power inspection scenarios.
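A minimal genetic search over augmentation magnitudes might look like the sketch below. The gene layout (rotation, brightness, noise, haze strength), the value ranges, and the `proxy_score` stand-in for validation mAP after training with a policy are all illustrative assumptions:

```python
import random

# Gene = (rotation_deg, brightness_delta, noise_sigma, haze_strength).
# TARGET is an assumed "sweet spot"; proxy_score peaks there, standing in
# for an expensive train-and-validate evaluation.
TARGET = (15.0, 0.2, 0.05, 0.3)

def proxy_score(policy):
    return -sum(((p - t) / (t + 1.0)) ** 2 for p, t in zip(policy, TARGET))

def evolve_policy(generations=60, pop_size=20, seed=1):
    """Elitist GA: keep the best quarter, refill with Gaussian mutations."""
    rng = random.Random(seed)
    ranges = [(0.0, 45.0), (0.0, 0.6), (0.0, 0.2), (0.0, 1.0)]
    pop = [tuple(rng.uniform(lo, hi) for lo, hi in ranges)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=proxy_score, reverse=True)
        elite = pop[:pop_size // 4]          # truncation selection
        pop = elite[:]
        while len(pop) < pop_size:
            base = rng.choice(elite)
            child = tuple(
                min(hi, max(lo, g + rng.gauss(0.0, 0.1 * (hi - lo))))
                for g, (lo, hi) in zip(base, ranges))
            pop.append(child)
    return max(pop, key=proxy_score)
```

Replacing `proxy_score` with short proxy trainings on a small-object validation split turns this into the augmentation-policy search described above.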
Synergy between meta-learning and evolutionary algorithms: EDL can search for the best meta-learner architecture and hyperparameter settings through evolutionary strategies. For example, under the MAML (Model-Agnostic Meta-Learning) framework, EDL can automatically search the initialization scheme, the number of inner-loop update steps, and the task sampling distribution. As a result, the model can still adapt quickly to new defect types even when only a small number of power fault samples are available.
Evolutionary optimization of generative model architectures: EDL can be used to automatically design generative models such as GANs to synthesize realistic defect samples under resource constraints. At the same time, multi-objective evolutionary algorithms can balance generation quality and computational cost.

5.3. Multiobjective Evolutionary Search for Lightweight Models

Deploying DL models in embedded systems and edge computing devices faces severe resource constraint challenges, which are particularly prominent in applications with strict requirements for real-time performance and energy efficiency, such as UAV power inspection. Currently, the academic community mainly adopts two technical routes: one is to enhance computational efficiency by optimizing the backbone network or feature fusion network of object detection models, but this often leads to a degradation of model accuracy; the other is to use model compression techniques (such as pruning, quantization, and distillation) to reduce computational load and memory usage. However, the methods above have inherent limitations. On the one hand, manual architecture level tweaks often reduce accuracy. On the other hand, in specialized domains such as power inspection, model compression is hard to engineer. It usually requires domain experts to spend substantial effort on hyperparameter tuning. Therefore, future research can leverage EDL as a systematic framework for model lightweighting under resource constraints:
Multiobjective co-optimization: Build the fitness function from detection accuracy (mAP), inference latency (to meet a 30 FPS real-time target), energy consumption (to extend flight time), and model size (to fit edge-chip memory). Multi-objective evolutionary algorithms can then output a Pareto set, so deployment can choose the best trade-off for a given platform.
Hardware-aware architecture search: EDL can embed latency and energy models of a specific hardware platform into the fitness function, and then search for architectures that perform best on the target device. For example, a GPU-friendly design may include more parallel convolution layers, while an NPU-oriented design may prefer regular tensor operators. This kind of hardware customization is difficult to achieve with generic compression methods.
Joint optimization of structure and compression: Treat the pruning ratio, quantization bit-width, and distillation temperature as evolvable variables, and optimize them together with network depth and width, operator types, and FPN topology. In this way, uncontrolled accuracy loss caused by post hoc compression can be avoided.
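The joint encoding can be sketched as one gene vector mixing structural and compression variables; the parameter-count and accuracy models below are deliberately crude toys, only meant to show the encoding and a size-budget penalty:

```python
def compressed_size_mb(params_m, prune_ratio, quant_bits):
    """Rough post-compression model size in MB: millions of parameters,
    thinned by pruning, at quant_bits per weight (ignores sparse-index
    overhead and per-layer sensitivity)."""
    return params_m * (1.0 - prune_ratio) * quant_bits / 8.0

def joint_fitness(gene, base_map=0.75, budget_mb=16.0):
    """gene = (depth_mult, width_mult, prune_ratio, quant_bits, distill_T).

    Toy fitness: an accuracy proxy that degrades with aggressive pruning
    and low bit-widths, penalized when the size budget is exceeded.
    distill_T would shape a KD loss in a real pipeline; it is carried in
    the gene but unused by this proxy.
    """
    depth, width, prune, bits, temp = gene
    params_m = 6.0 * depth * width * width            # crude parameter model
    acc = base_map * (1.0 - 0.25 * prune) * min(1.0, (bits / 8.0) ** 0.15)
    size = compressed_size_mb(params_m, prune, bits)
    return acc if size <= budget_mb else acc - (size - budget_mb) / budget_mb
```

Because all five genes are scored by one function, the evolutionary algorithm trades off pruning against width reduction instead of compressing a fixed architecture after the fact.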
Injecting domain knowledge into the search space: Because power inspection has specific patterns, such as small objects and long, thin structures, EDL can incorporate priors into the search space. For instance, high-resolution feature layers can be kept, and convolution kernels with elongated receptive fields can be enforced. This helps the search maintain key detection capability while still improving efficiency.

5.4. Cross-Modal Architecture Search for Multimodal Data Fusion

Most current research on power inspection focuses on the optical imaging domain, with little exploration of other imaging modes. Visible-spectrum cameras have inherent limitations in complex environmental conditions and are ineffective at detecting certain specific types of power faults. Existing studies have shown that infrared and ultraviolet imaging technologies exhibit unique complementary advantages in detecting power equipment faults: infrared imaging can accurately capture abnormal thermal distributions caused by leakage currents, whereas ultraviolet imaging is adept at identifying corona discharge phenomena. By fusing multiple imaging modes such as optical, infrared, and ultraviolet, the limitations of single technologies can be overcome and more comprehensive detection information obtained. However, the application of multimodal technologies in power inspection still faces many challenges. First, the technology for the collaborative analysis of multisource data is not yet mature, and cross-modal alignment is very difficult, especially in terms of spatial and temporal synchronization. For example, when both ultraviolet and infrared imaging are used in UAV inspection, the location information of corona discharge hotspots and thermal anomaly regions must be precisely aligned. Second, existing DL models are mostly designed for a single sensor modality and often ignore the complementary information between modalities, making it difficult to integrate heterogeneous data features effectively; they also perform poorly when modalities are missing or in interactive tasks. To address these issues, EDL can be used to automatically design cross-modal fusion topologies:
Automated search for fusion topology: EDL can encode the fusion topology as an evolvable genotype and then explore the best fusion strategy across network stages for multimodal data, including:
Fusion level: at which stages modalities interact (shallow, middle, or deep), and whether multi-level fusion is needed (for example, fusing on multiple feature maps).
Fusion operators: search over combinations of operators such as element-wise addition, concatenation, attention-weighted fusion (learning modality importance), gating, Transformer-style cross-attention, and tensor fusion.
Alignment mechanisms: include Spatial Transformer Networks (STNs), temporal alignment modules, and related components in the search space. Alignment-network parameters can then be adjusted automatically by evolutionary algorithms. With multi-objective optimization, an optimal fusion topology for power inspection scenarios can be discovered automatically.
Unified trade-off with lightweight design: Multimodality introduces extra branches and computation. EDL can include decisions such as whether to enable a modality, when to enable it, and at what resolution to run it in a multi-objective search, helping find the best compromise under the given task and platform constraints.
Hardware co-designed fusion optimization: Multimodal processing creates even stronger compute pressure. During architecture search, EDL can take heterogeneous processors into account and automatically assign workloads to the best-suited units, such as using the CPU for optical features, the NPU for infrared features, and a DSP for feature alignment. In this way, end-to-end software–hardware co-optimization can be achieved.
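The fusion decisions listed above can be sketched as an evolvable genotype. The encoding below is a hypothetical example (the stage names, operator list, and mutation rate are illustrative): for each network stage it records which modalities are active and which fusion operator joins them, and a point-mutation operator perturbs those choices.

```python
import random

# Hypothetical genotype sketch for cross-modal fusion search. Stage names,
# operator list, and probabilities are illustrative placeholders.
FUSION_OPS = ["add", "concat", "attention", "gating", "cross_attention"]
MODALITIES = ["optical", "infrared", "ultraviolet"]
STAGES = ["shallow", "middle", "deep"]

def random_genotype(rng=random):
    """Sample one fusion topology: per stage, active modalities + fusion op."""
    return {
        stage: {
            "active": {m: rng.random() < 0.7 for m in MODALITIES},
            "op": rng.choice(FUSION_OPS),
        }
        for stage in STAGES
    }

def mutate(geno, rate=0.2, rng=random):
    """Point mutation: flip modality switches or swap the fusion operator."""
    child = {s: {"active": dict(g["active"]), "op": g["op"]}
             for s, g in geno.items()}
    for stage in STAGES:
        if rng.random() < rate:
            child[stage]["op"] = rng.choice(FUSION_OPS)
        for m in MODALITIES:
            if rng.random() < rate:
                child[stage]["active"][m] = not child[stage]["active"][m]
    # repair: keep the optical stream on at the shallow stage as the anchor
    # modality, so every candidate remains a valid network
    child["shallow"]["active"]["optical"] = True
    return child

parent = random_genotype()
child = mutate(parent)
```

A multi-objective fitness (e.g., mAP versus added latency per enabled modality) would then rank such genotypes; the repair step illustrates how structural validity can be guaranteed by construction rather than by penalty.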

5.5. Federated Evolutionary Learning with Distributed Collaboration and Privacy Protection

As UAV swarm inspection becomes a mainstream solution in power systems, a distributed scenario is emerging in which multiple UAVs cooperatively cover transmission lines. This shift brings both new challenges and new opportunities. In the traditional centralized training paradigm, images collected by each UAV must be uploaded to the cloud for processing, and three difficulties arise. First, bandwidth is limited: a single high-resolution inspection image can be up to 10 MB, so transmitting large-scale data consumes substantial communication resources, which is especially problematic in remote mountain areas or during cross-region missions. Second, privacy risks increase because inspection data may contain sensitive information such as grid topology, equipment health status, and fault distribution; centralized storage and transmission make data leakage and attacks more likely. Third, timeliness becomes a bottleneck, since the cloud training and redeployment loop is long, and it is difficult to keep up with fast environmental changes along transmission corridors, such as seasonal vegetation growth or equipment condition evolution under extreme weather.
By combining EDL with the privacy-preserving mechanism of federated learning, a Federated Evolutionary Learning (FEL) framework can be built to address these issues. In FEL, each UAV acts as a client and performs NAS and fitness evaluation on local data. Instead of sending raw images or full model parameters, the client periodically uploads only the fitness information of candidate architectures, such as mAP, inference latency, and energy consumption, together with architecture metadata such as layer configurations and operator types, to a central server or an edge base station. After fitness feedback from multiple UAVs is aggregated, a new generation of architecture populations is produced on the server side using evolutionary operators, and it is then distributed back to the clients for continued local evaluation and optimization. In this way, "distributed NAS and optimization" across UAVs can be achieved without uploading raw data, and even without transmitting full model parameters.
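One FEL round as described above can be sketched as follows. This is an assumed minimal protocol, not a specific framework: the client function is a stand-in for on-UAV evaluation (a real client would report local mAP, latency, and energy), and the server applies tournament selection plus mutation to the reported fitness values only, so no raw data crosses the link.

```python
import random

# Minimal sketch of one Federated Evolutionary Learning round (assumed
# protocol). Architectures are toy genotypes: lists of layer widths.

def client_evaluate(arch, local_seed):
    """Stand-in for on-UAV evaluation: fitness computed from local data only.
    Here a seeded pseudo-random score replaces the real local mAP/latency."""
    rng = random.Random((tuple(arch), local_seed).__hash__())
    return rng.random()

def server_round(population, client_seeds):
    """Aggregate per-client fitness reports, then select and mutate."""
    fitness = {
        tuple(a): sum(client_evaluate(a, s) for s in client_seeds) / len(client_seeds)
        for a in population
    }
    next_pop = []
    for _ in range(len(population)):
        # binary tournament on aggregated fitness
        a, b = random.sample(population, 2)
        winner = a if fitness[tuple(a)] >= fitness[tuple(b)] else b
        # point mutation: change one layer width
        child = list(winner)
        child[random.randrange(len(child))] = random.choice([16, 32, 64, 128])
        next_pop.append(child)
    return next_pop

population = [[32, 64, 64], [16, 32, 128], [64, 64, 32], [32, 32, 32]]
new_population = server_round(population, client_seeds=[1, 2, 3])
```

Note that only `(architecture, fitness)` pairs flow to the server; the averaging step is one simple aggregation choice, and weighting clients by local dataset size would be an equally plausible alternative.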

5.6. EDL Co-Design for Emerging Hardware

5.6.1. Neural Architecture Adaptation for Event Cameras

As a bionic visual sensor, the event camera asynchronously outputs only those brightness-change events that exceed a threshold, via a pixel-level change-detection mechanism. This characteristic gives it unique advantages in power inspection. Compared with traditional cameras, applying an event camera to UAV power inspection has the following advantages. (1) It reduces computational complexity and power consumption and extends UAV flight time. A frame camera must process every input frame, and a high-speed conventional camera generates at least hundreds of frames per second; the resulting data volume demands heavy on-chip computation and consumes considerable power. The pixels of an event camera respond only to local changes, automatically filtering out static, uninformative content, and generate only a few hundred kilobytes of data per second. (2) It improves environmental adaptability. An event camera has a dynamic range of up to 140 dB, far exceeding that of an ordinary camera (about 60 dB), and each pixel can respond in extremely low-light or high-light conditions, so image clarity is maintained under extreme changes in illumination. In addition, the high temporal resolution of an event camera means that the event stream is free of the motion blur that affects traditional images. Event-based UAVs can therefore provide reliable visual information in extreme lighting and highly dynamic environments. (3) It improves flight safety. A traditional frame camera outputs frames one by one with delayed response, whereas an event camera has a very high sensing rate; it can quickly perceive high-speed moving objects and avoid fast dynamic obstacles, ensuring flight safety.
However, traditional vision algorithms cannot be applied directly to event cameras, mainly because the output is asynchronous and carries no absolute intensity information (only the polarity of intensity changes). Traditional cameras produce frame-based time series, whereas event cameras generate sparse event streams, so conventional CNNs cannot be applied directly. Processing discrete event streams usually requires temporal models such as RNNs or LSTMs, or spiking neural networks (SNNs). EDL offers an architecture-optimization framework for event-based processing, helping fully exploit the strengths of event-driven sensing in the following ways:
Evolutionary optimization of SNNs: SNNs are a good fit for event cameras because both are event-driven, and SNNs can run at very low power. However, the design space is more complex. It includes neuron models (e.g., LIF, Izhikevich), spike encoding schemes, and synaptic connectivity patterns. With neural architecture search, EDL can automatically design SNN topologies and tune hyperparameters such as depth, neuron count, and time constants. Combined with training methods like surrogate gradients, accuracy and energy efficiency can be optimized together.
Co-design of frame–event hybrid architectures: In some scenarios, combining a standard frame camera with an event camera provides complementary benefits. Frame images offer rich texture and color, while event streams provide high-temporal-resolution motion cues. EDL can automatically design frame–event fusion architectures, including: (1) two-stream network topology (how and where to fuse frame features and event features across layers); (2) attention mechanisms (weights between the two streams are adjusted based on scene dynamics); (3) time-alignment strategies (handling timestamp gaps between asynchronous events and synchronous frames). Using MOEAs, overall latency and power can be minimized while detection accuracy is maintained.
Hardware-aware co-optimization: The benefits of event cameras are most visible when deployed on edge devices. EDL can include target-hardware constraints—such as neuromorphic chips (e.g., Intel Loihi, IBM TrueNorth) or FPGAs—during the search, enabling hardware–algorithm co-optimization. For example, to match the sparse connectivity favored by neuromorphic chips, evolutionary search can discover SNN topologies with high sparsity, improving hardware parallelism and energy efficiency.
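To make the SNN evolution concrete, the sketch below pairs a one-line leaky integrate-and-fire (LIF) update, the neuron model named above, with a toy evolutionary loop over the SNN hyperparameters mentioned in the text (depth, neuron count, membrane time constant). The surrogate fitness is a placeholder: a real fitness would combine accuracy on event-stream data with measured energy, typically via surrogate-gradient training.

```python
import random

# Illustrative sketch, not a full SNN trainer. Genotype fields and the
# surrogate fitness are assumptions for demonstration.

def lif_step(v, i_in, tau=10.0, v_th=1.0, dt=1.0):
    """One Euler step of a leaky integrate-and-fire neuron: (v, spiked)."""
    v = v + dt * (-v / tau + i_in)   # leak toward 0, integrate input current
    if v >= v_th:
        return 0.0, True             # reset membrane after a spike
    return v, False

def surrogate_fitness(geno):
    # placeholder objective favoring moderate depth and time constant
    return -abs(geno["depth"] - 4) - abs(geno["tau"] - 10.0) / 10.0

def evolve(pop, gens=20, rng=random):
    """Elitist (mu + mu) loop mutating depth and membrane time constant."""
    for _ in range(gens):
        pop.sort(key=surrogate_fitness, reverse=True)
        elite = pop[: len(pop) // 2]
        children = [
            {
                "depth": max(1, p["depth"] + rng.choice([-1, 0, 1])),
                "neurons": p["neurons"],
                "tau": max(1.0, p["tau"] + rng.gauss(0.0, 1.0)),
            }
            for p in elite
        ]
        pop = elite + children
    return max(pop, key=surrogate_fitness)

pop = [{"depth": d, "neurons": 128, "tau": float(t)}
       for d, t in [(2, 5), (6, 15), (3, 8), (8, 20)]]
best = evolve(pop)
v, spiked = lif_step(0.0, 2.0)   # a strong input drives an immediate spike
```

The same loop structure carries over to the frame–event hybrid search: only the genotype (fusion points, attention weights, alignment strategy) and the fitness terms change.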

5.6.2. Architecture-Hardware Co-Evolution for In-Memory Computing Chips

With the rapid development of AI technology, the demand for deep neural networks in edge-computing scenarios such as UAV power inspection is becoming increasingly prominent. Traditional hardware based on the von Neumann architecture suffers from high energy consumption and latency caused by data movement between memory and compute units. In-memory computing architectures offer a new solution to this bottleneck: by integrating computing and storage units, they shorten data-transmission paths, significantly reduce system response time, and substantially improve energy efficiency, making them well suited to the computing requirements of cloud computing, the Internet of Things (IoT), and AI edge devices. In particular, a sensing–memory–computing integrated architecture, which combines sensing, storage, and computing functions on a single chip, has been proposed. Its advantages are reflected in three aspects. First, eliminating the data transfer process significantly reduces system power consumption, which is crucial for battery-powered UAVs. Second, on-chip processing ensures data privacy and security, making it particularly suitable for inspection tasks involving critical power grid facilities. Finally, the collaborative design of sensing, memory, and computing shortens the system response time to the millisecond level, providing a hardware foundation for UAV autonomous obstacle avoidance and real-time decision-making.
In the future, an important direction for EDL is algorithm–hardware co-search for in-memory computing platforms. Quantization bit-width, sparsity patterns, operator feasibility, on-chip cache limits, and bandwidth constraints can be built into the fitness function. This allows EDL to directly search for detection and fusion networks that are mappable, acceleratable, and robust, and to achieve end-to-end co-optimization from sensor input to on-chip inference. Based on this, future work can explore EDL-driven in-memory computing chips:
Hardware-aware architecture search: The physical constraints of emerging memory devices—such as memristor crossbar arrays and phase-change memory (PCM)—can be explicitly modeled as NAS constraints. These constraints may include read/write latency, energy cost, and accuracy loss. Hardware-friendly network architectures are then searched under these limits.
Co-optimization of operator mapping: EAs can automatically decide how operators such as convolution, pooling, and activation are mapped onto in-memory computing chips. For example, the search can choose the degree of parallelism and the data tiling or partition strategy.
Joint algorithm–hardware optimization: The neural network architecture and low-level hardware settings can be searched at the same time. This includes parameters such as compute-in-memory array size and ADC/DAC precision. In this way, end-to-end co-design is achieved rather than tuning the algorithm and hardware separately.
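A small sketch of how such hardware constraints enter the fitness function: the crossbar height, feasible bit-widths, and energy proxy below are assumed placeholder values, not measurements from any real chip. Unmappable layers get a hard infeasibility penalty, while the energy term softly rewards low bit-widths.

```python
# Illustrative sketch: folding in-memory-computing constraints into a scalar
# fitness. Array size, bit-widths, and the energy proxy are assumptions; a
# real system would use numbers measured on the target chip.

CROSSBAR_ROWS = 256   # assumed memristor array height (weights per column)

def layer_is_mappable(in_features, bitwidth):
    """A layer maps onto the array only if its fan-in fits the crossbar
    height and its weights use a supported bit-width."""
    return in_features <= CROSSBAR_ROWS and bitwidth in (2, 4, 8)

def fitness(arch, accuracy):
    """Hard-penalize unmappable layers; softly reward low-energy configs."""
    if not all(layer_is_mappable(l["in"], l["bits"]) for l in arch):
        return float("-inf")   # infeasible: cannot be mapped onto the chip
    # crude energy proxy: device columns scale with fan-in times bit-width
    energy_proxy = sum(l["in"] * l["bits"] for l in arch)
    return accuracy - 1e-6 * energy_proxy

arch_ok = [{"in": 128, "bits": 4}, {"in": 256, "bits": 8}]
arch_bad = [{"in": 512, "bits": 4}]   # fan-in exceeds the crossbar height
score_ok = fitness(arch_ok, 0.9)
score_bad = fitness(arch_bad, 0.9)
```

In a full multi-objective search, the accuracy and energy terms would be kept as separate objectives rather than scalarized, but the mapping check would remain a hard constraint in either formulation.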

6. Conclusions

With improvements in computing hardware and the growth of big data, DL-based object detection has demonstrated unique advantages in UAV power inspection. First, its powerful feature-extraction capability can effectively identify power equipment defects in complex scenarios. Second, end-to-end training significantly reduces the workload of manual feature engineering. Evolutionary algorithms, global search methods inspired by biological evolution, have been widely applied to the parameter optimization and architecture search of DL models. This review therefore systematically explores EDL-based solutions to the key challenge in UAV-based intelligent power inspection: achieving accurate, real-time detection with limited computing resources. First, it provides an overview of the basic principles of mainstream deep learning object detection models and EDL technology. It then analyzes in detail how the two cooperate in power inspection scenarios. Finally, it discusses the challenges and future directions of applying EDL to UAV power inspection. For example, datasets can be expanded through data augmentation, and object detection networks can be improved to enhance detection performance; lightweight strategies can be adopted and in-memory computing chips developed to improve deployment flexibility and reduce power consumption; and event camera technology can be introduced to reduce the motion blur and redundant information caused by UAV movement, thereby reducing computational load and enhancing system robustness. The integration of these technical paths will provide an important theoretical basis and technical support for building a new generation of intelligent power inspection systems.

Author Contributions

Methodology, S.F.; software, S.F.; validation, S.F. and B.C.; formal analysis, S.F.; investigation, S.F.; writing—original draft preparation, S.F.; writing—review and editing, S.F.; visualization, S.F.; supervision, B.C.; project administration, B.C.; funding acquisition, B.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (NSFC) (Grant No. 62473129), the Natural Science Fund of Hebei Province for Distinguished Young Scholars (Grant No. F2021202010), the Science and Technology Project of Hebei Education Department (Grant No. JZX2023007) and the Interdisciplinary Graduate Student Training Project of Hebei University of Technology (Grant No. HEBUT-Y-XKJC-2022116). The APC was funded by the same grant.

Data Availability Statement

No numerical data were used in the present study; the reported values are synthesized from the literature.

Conflicts of Interest

The authors declare no conflicts of interest.

  83. Jiang, J.; Han, F.; Ling, Q.; Wang, J.; Li, T.; Han, H. Efficient network architecture search via multiobjective particle swarm optimization based on decomposition. Neural Netw. 2020, 123, 305–316. [Google Scholar] [CrossRef]
  84. Ma, A.; Wan, Y.; Zhong, Y.; Wang, J.; Zhang, L. SceneNet: Remote sensing scene classification deep learning network using multi-objective neural evolution architecture search. ISPRS J. Photogramm. Remote Sens. 2021, 172, 171–188. [Google Scholar] [CrossRef]
  85. Lu, Z.; Deb, K.; Goodman, E.; Banzhaf, W.; Boddeti, V.N. Nsganetv2: Evolutionary multi-objective surrogate-assisted neural architecture search. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part I 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 35–51. [Google Scholar]
  86. Yao, F.; Wang, S.; Ding, L.; Zhong, G.; Bullock, L.B.; Xu, Z.; Dong, J. Lightweight network learning with zero-shot neural architecture search for UAV images. Knowl.-Based Syst. 2023, 260, 110142. [Google Scholar] [CrossRef]
  87. Xiong, X.; Xu, S.; Wu, W.; Tu, D.; Zhang, J.; Wei, Z. Identification of Electrical Equipment Based on Faster LSTM-CNN Network. In Proceedings of the 2020 IEEE International Conference on Networking, Sensing and Control (ICNSC), Nanjing, China, 30 October–2 November 2020; pp. 1–6. [Google Scholar] [CrossRef]
  88. Yan, L.; Zhang, Z.; Liang, J.; Qu, B.; Yu, K.; Wang, K. ASMEvoNAS: Adaptive segmented multi-objective evolutionary network architecture search. Appl. Soft Comput. 2023, 146, 110639. [Google Scholar] [CrossRef]
  89. Zhou, Y.; Yen, G.G.; Yi, Z. Evolutionary Shallowing Deep Neural Networks at Block Levels. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 4635–4647. [Google Scholar] [CrossRef]
  90. Wang, Z.; Luo, T.; Li, M.; Zhou, J.T.; Goh, R.S.M.; Zhen, L. Evolutionary Multi-Objective Model Compression for Deep Neural Networks. IEEE Comput. Intell. Mag. 2021, 16, 10–21. [Google Scholar] [CrossRef]
  91. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  92. Duta, I.C.; Liu, L.; Zhu, F.; Shao, L. Pyramidal convolution: Rethinking convolutional neural networks for visual recognition. arXiv 2020, arXiv:2006.11538. [Google Scholar] [CrossRef]
  93. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  94. Yu, F.; Koltun, V. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
  95. Lin, M. Network in network. arXiv 2013, arXiv:1312.4400. [Google Scholar]
  96. Zhang, H.; Jin, Y.; Hao, K. Evolutionary search for complete neural network architectures with partial weight sharing. IEEE Trans. Evol. Comput. 2022, 26, 1072–1086. [Google Scholar] [CrossRef]
  97. Liao, Y.; Li, J.; Wei, S.; Xiao, X. Evolutionary Search via channel attention based parameter inheritance and stochastic uniform sampled training. Comput. Vis. Image Underst. 2024, 243, 104000. [Google Scholar] [CrossRef]
  98. Soniya; Singh, L.; Paul, S. Hybrid evolutionary network architecture search (HyENAS) for convolution class of deep neural networks with applications. Expert Syst. 2023, 40, e12690. [Google Scholar] [CrossRef]
  99. Szwarcman, D.; Civitarese, D.; Vellasco, M. Quantum-inspired evolutionary algorithm applied to neural architecture search. Appl. Soft Comput. 2022, 120, 108674. [Google Scholar] [CrossRef]
  100. Lu, Z.; Liang, S.; Yang, Q.; Du, B. Evolving block-based convolutional neural network for hyperspectral image classification. IEEE Trans. Geosci. Remote. Sens. 2022, 60, 5525921. [Google Scholar] [CrossRef]
  101. Huang, J.; Xue, B.; Sun, Y.; Zhang, M.; Yen, G.G. Particle swarm optimization for compact neural architecture search for image classification. IEEE Trans. Evol. Comput. 2022, 27, 1298–1312. [Google Scholar] [CrossRef]
  102. Fang, W.; Zhu, Z.; Zhu, S.; Sun, J.; Wu, X.; Lu, Z. LoNAS: Low-Cost Neural Architecture Search Using a Three-Stage Evolutionary Algorithm [Research Frontier]. IEEE Comput. Intell. Mag. 2023, 18, 78–93. [Google Scholar] [CrossRef]
  103. Cheng, Y.; Xia, L.; Yan, B.; Chen, J.; Hu, D.; Zhu, L. A defect detection method based on faster RCNN for power equipment. In Journal of Physics: Conference Series, Proceedings of the 2020 3rd International Symposium on Power Electronics and Control Engineering (ISPECE 2020), Chongqing, China, 27–29 November 2020; IOP Publishing: Bristol, UK, 2021; Volume 1754, p. 012025. [Google Scholar] [CrossRef]
  104. Dong, J.; Hou, B.; Feng, L.; Tang, H.; Tan, K.C.; Ong, Y.S. A cell-based fast memetic algorithm for automated convolutional neural architecture design. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9040–9053. [Google Scholar] [CrossRef] [PubMed]
  105. Shang, R.; Zhu, S.; Liu, H.; Ma, T.; Zhang, W.; Feng, J.; Jiao, L.; Stolkin, R. Evolutionary architecture search via adaptive parameter control and gene potential contribution. Swarm Evol. Comput. 2023, 82, 101354. [Google Scholar] [CrossRef]
  106. Sun, Y.; Xue, B.; Zhang, M.; Yen, G.G. Completely automated CNN architecture design based on blocks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 1242–1254. [Google Scholar] [CrossRef]
  107. Wang, B.; Sun, Y.; Xue, B.; Zhang, M. A hybrid differential evolution approach to designing deep convolutional neural networks for image classification. In Proceedings of the AI 2018: Advances in Artificial Intelligence: 31st Australasian Joint Conference, Wellington, New Zealand, 11–14 December 2018; Proceedings 31. Springer: Berlin/Heidelberg, Germany, 2018; pp. 237–250. [Google Scholar]
  108. Zhang, T.; Lei, C.; Zhang, Z.; Meng, X.B.; Chen, C.P. AS-NAS: Adaptive scalable neural architecture search with reinforced evolutionary algorithm for deep learning. IEEE Trans. Evol. Comput. 2021, 25, 830–841. [Google Scholar] [CrossRef]
  109. Chen, C.P.; Zhang, T.; Chen, L.; Tam, S.C. I-Ching divination evolutionary algorithm and its convergence analysis. IEEE Trans. Cybern. 2016, 47, 2–13. [Google Scholar] [CrossRef] [PubMed]
  110. Wen, L.; Gao, L.; Li, X.; Li, H. A new genetic algorithm based evolutionary neural architecture search for image classification. Swarm Evol. Comput. 2022, 75, 101191. [Google Scholar] [CrossRef]
  111. Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. Repvgg: Making vgg-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13733–13742. [Google Scholar]
  112. Lyu, B.; Wen, S.; Shi, K.; Huang, T. Multiobjective reinforcement learning-based neural architecture search for efficient portrait parsing. IEEE Trans. Cybern. 2021, 53, 1158–1169. [Google Scholar] [CrossRef] [PubMed]
  113. Liu, H.; Simonyan, K.; Yang, Y. Darts: Differentiable architecture search. arXiv 2018, arXiv:1806.09055. [Google Scholar]
  114. Morse, G.; Stanley, K.O. Simple evolutionary optimization can rival stochastic gradient descent in neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference 2016, Denver, CO, USA, 20–24 July 2016; pp. 477–484. [Google Scholar]
  115. Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 4780–4789. [Google Scholar]
  116. Tong, L.; Du, B. Neural architecture search via reference point based multi-objective evolutionary algorithm. Pattern Recognit. 2022, 132, 108962. [Google Scholar] [CrossRef]
  117. Li, J.Y.; Zhan, Z.H.; Xu, J.; Kwong, S.; Zhang, J. Surrogate-assisted hybrid-model estimation of distribution algorithm for mixed-variable hyperparameters optimization in convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2338–2352. [Google Scholar] [CrossRef]
  118. Xue, Y.; Wang, Y.; Liang, J.; Slowik, A. A self-adaptive mutation neural architecture search algorithm based on blocks. IEEE Comput. Intell. Mag. 2021, 16, 67–78. [Google Scholar] [CrossRef]
  119. Yuan, G.; Wang, B.; Xue, B.; Zhang, M. Particle swarm optimization for efficiently evolving deep convolutional neural networks using an autoencoder-based encoding strategy. IEEE Trans. Evol. Comput. 2023, 28, 1190–1204. [Google Scholar] [CrossRef]
  120. Qiu, Z.; Bi, W.; Xu, D.; Guo, H.; Ge, H.; Liang, Y.; Lee, H.P.; Wu, C. Efficient self-learning evolutionary neural architecture search. Appl. Soft Comput. 2023, 146, 110671. [Google Scholar] [CrossRef]
  121. Louati, H.; Bechikh, S.; Louati, A.; Aldaej, A.; Said, L.B. Joint design and compression of convolutional neural networks as a bi-level optimization problem. Neural Comput. Appl. 2022, 34, 15007–15029. [Google Scholar] [CrossRef]
  122. Zhang, H.; Hao, K.; Gao, L.; Tang, X.S.; Wei, B. Enhanced Gradient for Differentiable Architecture Search. IEEE Trans. Neural Netw. Learn. Syst. 2023, 35, 9606–9620. [Google Scholar] [CrossRef]
  123. Du, W.; Ren, Z.; Li, F.; Lin, Y. Surrogate-Assisted Niching Differential Evolution for hyperparameter optimization in Convolutional Neural Networks. Swarm Evol. Comput. 2025, 99, 102176. [Google Scholar] [CrossRef]
  124. Yang, S.; Tian, Y.; Xiang, X.; Peng, S.; Zhang, X. Accelerating evolutionary neural architecture search via multifidelity evaluation. IEEE Trans. Cogn. Dev. Syst. 2022, 14, 1778–1792. [Google Scholar] [CrossRef]
  125. Xue, Y.; Zha, J.; Pelusi, D.; Chen, P.; Luo, T.; Zhen, L.; Wang, Y.; Wahib, M. Neural Architecture Search With Progressive Evaluation and Sub-Population Preservation. IEEE Trans. Evol. Comput. 2024, 29, 1678–1691. [Google Scholar]
  126. Sun, Y.; Wang, H.; Xue, B.; Jin, Y.; Yen, G.G.; Zhang, M. Surrogate-assisted evolutionary deep learning using an end-to-end random forest-based performance predictor. IEEE Trans. Evol. Comput. 2019, 24, 350–364. [Google Scholar]
  127. Ma, L.; Li, N.; Yu, G.; Geng, X.; Cheng, S.; Wang, X.; Huang, M.; Jin, Y. Pareto-wise ranking classifier for multi-objective evolutionary neural architecture search. IEEE Trans. Evol. Comput. 2023, 28, 570–581. [Google Scholar] [CrossRef]
  128. Wang, B.; Xue, B.; Zhang, M. Surrogate-assisted particle swarm optimization for evolving variable-length transferable blocks for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3727–3740. [Google Scholar] [CrossRef]
  129. Wei, C.; Niu, C.; Tang, Y.; Wang, Y.; Hu, H.; Liang, J. Npenas: Neural predictor guided evolution for neural architecture search. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 8441–8455. [Google Scholar] [CrossRef]
  130. Chen, A.; Ren, Z.; Wang, M.; Chen, H.; Leng, H.; Liu, S. A surrogate-assisted highly cooperative coevolutionary algorithm for hyperparameter optimization in deep convolutional neural networks. Appl. Soft Comput. 2023, 147, 110794. [Google Scholar] [CrossRef]
  131. Luong, N.H.; Phan, Q.M.; Vo, A.; Pham, T.N.; Bui, D.T. Lightweight multi-objective evolutionary neural architecture search with low-cost proxy metrics. Inf. Sci. 2024, 655, 119856. [Google Scholar] [CrossRef]
  132. Sun, Y.; Sun, X.; Fang, Y.; Yen, G.G.; Liu, Y. A novel training protocol for performance predictors of evolutionary neural architecture search algorithms. IEEE Trans. Evol. Comput. 2021, 25, 524–536. [Google Scholar] [CrossRef]
  133. Peng, Y.; Song, A.; Ciesielski, V.; Fayek, H.M.; Chang, X. Pre-nas: Evolutionary neural architecture search with predictor. IEEE Trans. Evol. Comput. 2022, 27, 26–36. [Google Scholar] [CrossRef]
  134. Yang, Z.; Wang, Y.; Chen, X.; Shi, B.; Xu, C.; Xu, C.; Tian, Q.; Xu, C. Cars: Continuous evolution for efficient neural architecture search. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1829–1838. [Google Scholar]
  135. Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameters sharing. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4095–4104. [Google Scholar]
  136. Zhang, H.; Jin, Y.; Cheng, R.; Hao, K. Efficient evolutionary search of attention convolutional networks via sampled training and node inheritance. IEEE Trans. Evol. Comput. 2020, 25, 371–385. [Google Scholar] [CrossRef]
  137. Yuan, G.; Xue, B.; Zhang, M. An evolutionary neural architecture search method based on performance prediction and weight inheritance. Inf. Sci. 2024, 667, 120466. [Google Scholar] [CrossRef]
  138. He, K.; Gkioxari, G.; Dollar, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  139. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475. [Google Scholar]
  140. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Berlin/Heidelberg, Germany, 2016; pp. 21–37. [Google Scholar]
  141. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
  142. Farhadi, A.; Redmon, J. Yolov3: An incremental improvement. In Proceedings of the Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; Springer: Berlin/Heidelberg, Germany, 2018; Volume 1804, pp. 1–6. [Google Scholar]
  143. Lin, T.Y.; Dollar, P.; Girshick, R.; He, K.; Hariharan, B.; Dollar, P. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  144. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar] [CrossRef]
  145. Zhou, X.; Wang, D.; Krähenbühl, P. Objects as points. arXiv 2019, arXiv:1904.07850. [Google Scholar] [PubMed]
  146. Feng, Z.; Guo, L.; Huang, D.; Li, R. Electrical insulator defects detection method based on yolov5. In Proceedings of the 2021 IEEE 10th Data Driven Control and Learning Systems Conference (DDCLS), Suzhou, China, 14–16 May 2021; IEEE: New York, NY, USA, 2021; pp. 979–984. [Google Scholar]
  147. Sadykova, D.; Pernebayeva, D.; Bagheri, M.; James, A. IN-YOLO: Real-time detection of outdoor high voltage insulators using UAV imaging. IEEE Trans. Power Deliv. 2019, 35, 1599–1601. [Google Scholar] [CrossRef]
  148. Liu, Y.; Ji, X.; Pei, S.; Ma, Z.; Zhang, G.; Lin, Y.; Chen, Y. Research on automatic location and recognition of insulators in substation based on YOLOv3. High Volt. 2020, 5, 62–68. [Google Scholar] [CrossRef]
  149. Xu, C.; Bo, B.; Liu, Y.; Tao, F. Detection method of insulator based on single shot multibox detector. In Journal of Physics: Conference Series, Proceedings of the 3rd Annual International Conference on Information System and Artificial Intelligence (ISAI2018), Suzhou, China, 22–24 June 2018; IOP Publishing: Bristol, UK, 2018; Volume 1069, p. 012183. [Google Scholar] [CrossRef]
  150. Wanguo, W.; Zhenli, W.; Bin, L.; Yuechen, Y.; Xiaobin, S. Typical defect detection technology of transmission line based on deep learning. In Proceedings of the 2019 Chinese Automation Congress (CAC), Hangzhou, China, 22–24 November 2019; IEEE: New York, NY, USA, 2019; pp. 1185–1189. [Google Scholar]
  151. Mao, M.; Liu, L.; Chen, W.; Xiong, W.; Xi, X.; Zhu, G.; Zhang, Y.; Wang, S.; Chen, Y. Power Transmission Line Defect Recognition Method Based on Binocular Feature Fusion and Improved FCOS Detection Head. In Proceedings of the 2022 International Conference on Sensing, Measurement & Data Analytics in the era of Artificial Intelligence (ICSMD), Harbin, China, 30 November–2 December 2022; pp. 1–4. [Google Scholar] [CrossRef]
  152. Xu, C.; Xin, M.; Wang, Y.; Gao, J. Transmission line defect identification based on improved RetinaNet algorithm. In Proceedings of the 2023 4th International Conference on Mechatronics Technology and Intelligent Manufacturing (ICMTIM), Nanjing, China, 26–28 May 2023; pp. 591–598. [Google Scholar] [CrossRef]
  153. Zhang, X.; Zhang, Y.; Liu, J.; Zhang, C.; Xue, X.; Zhang, H.; Zhang, W. InsuDet: A Fault Detection Method for Insulators of Overhead Transmission Lines Using Convolutional Neural Networks. IEEE Trans. Instrum. Meas. 2021, 70, 5018512. [Google Scholar] [CrossRef]
  154. Liu, C.; Wu, Y.; Liu, J.; Sun, Z.; Xu, H. Insulator faults detection in aerial images from high-voltage transmission lines based on deep learning model. Appl. Sci. 2021, 11, 4647. [Google Scholar] [CrossRef]
  155. Wang, J.; Zhang, T.; Xue, X.; Chen, L. Real-time recognition of transmission line insulators under complex backgrounds: A Yolov5s approach. In Proceedings of the 2022 4th International Conference on Power and Energy Technology (ICPET), Xining, China, 28–31 July 2022; IEEE: New York, NY, USA, 2022; pp. 77–83. [Google Scholar]
  156. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 2011–2023. [Google Scholar] [CrossRef]
  157. Shen, W.; Fang, M.; Wang, Y.; Xiao, J.; Chen, H.; Zhang, W.; Li, X. AE-YOLOv5 for Detection of Power Line Insulator Defects. IEEE Open J. Comput. Soc. 2024, 5, 468–479. [Google Scholar] [CrossRef]
  158. Zheng, J.; Wu, H.; Zhang, H.; Wang, Z.; Xu, W. Insulator-defect detection algorithm based on improved YOLOv7. Sensors 2022, 22, 8801. [Google Scholar] [CrossRef]
  159. Zhang, Z.D.; Zhang, B.; Lan, Z.C.; Liu, H.C.; Li, D.Y.; Pei, L.; Yu, W.X. FINet: An Insulator Dataset and Detection Benchmark Based on Synthetic Fog and Improved YOLOv5. IEEE Trans. Instrum. Meas. 2022, 71, 6006508. [Google Scholar] [CrossRef]
  160. Chen, X.; An, Z.; Huang, L.; He, S.; Zhang, X.; Lin, S. Surface Defect Detection of Electric Power Equipment in Substation Based on Improved YOLOV4 Algorithm. In Proceedings of the 2020 10th International Conference on Power and Energy Systems (ICPES), Chengdu, China, 25–27 December 2020; pp. 256–261. [Google Scholar] [CrossRef]
  161. Song, Y.; Zhou, Z.; Li, Q.; Chen, Y.; Xiang, P.; Yu, Q.; Zhang, L.; Lu, Y. Intrusion detection of foreign objects in high-voltage lines based on YOLOv4. In Proceedings of the 2021 6th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 9–11 April 2021; IEEE: New York, NY, USA, 2021; pp. 1295–1300. [Google Scholar]
  162. Souza, B.J.; Stefenon, S.F.; Singh, G.; Freire, R.Z. Hybrid-YOLO for classification of insulators defects in transmission lines based on UAV. Int. J. Electr. Power Energy Syst. 2023, 148, 108982. [Google Scholar] [CrossRef]
  163. Panigrahy, S.; Karmakar, S. Real-Time Condition Monitoring of Transmission Line Insulators Using the YOLO Object Detection Model With a UAV. IEEE Trans. Instrum. Meas. 2024, 73, 1–9. [Google Scholar] [CrossRef]
  164. Song, J.; Qin, X.; Lei, J.; Zhang, J.; Wang, Y.; Zeng, Y. A fault detection method for transmission line components based on synthetic dataset and improved YOLOv5. Int. J. Electr. Power Energy Syst. 2024, 157, 109852. [Google Scholar] [CrossRef]
  165. Zhang, X.; Zhang, Y.; Hu, M.; Ju, X. Insulator defect detection based on YOLO and SPP-Net. In Proceedings of the 2020 International Conference on Big Data & Artificial Intelligence & Software Engineering (ICBASE), Bangkok, Thailand, 30 October–1 November 2020; IEEE: New York, NY, USA, 2020; pp. 403–407. [Google Scholar]
  166. Ge, Z.; Li, H.; Yang, R.; Liu, H.; Pei, S.; Jia, Z.; Ma, Z. Bird’s nest detection algorithm for transmission lines based on deep learning. In Proceedings of the 2022 3rd International Conference on Computer Vision, Image and Deep Learning & International Conference on Computer Engineering and Applications (CVIDL & ICCEA), Changchun, China, 20–22 May 2022; IEEE: New York, NY, USA, 2022; pp. 417–420. [Google Scholar]
  167. Lei, X.; Sui, Z. Intelligent fault detection of high voltage line based on the Faster R-CNN. Measurement 2019, 138, 379–385. [Google Scholar] [CrossRef]
  168. Chen, W.; Li, Y.; Li, C. A visual detection method for foreign objects in power lines based on mask R-CNN. Int. J. Ambient. Comput. Intell. (IJACI) 2020, 11, 34–47. [Google Scholar] [CrossRef]
  169. Ying, Y.; Wang, Y.; Yan, Y.; Dong, Z.; Qi, D.; Li, C. An improved defect detection method for substation equipment. In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; IEEE: New York, NY, USA, 2020; pp. 6318–6323. [Google Scholar]
  170. Tang, J.; Wang, J.; Wang, H.; Wei, J.; Wei, Y.; Qin, M. Insulator Defect Detection Based on Improved Faster R-CNN. In Proceedings of the 2022 4th Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 25–28 March 2022; IEEE: New York, NY, USA, 2022; pp. 541–546. [Google Scholar] [CrossRef]
  171. Zhang, H.; Wu, L.; Chen, Y.; Chen, R.; Kong, S.; Wang, Y.; Hu, J.; Wu, J. Attention-guided multitask convolutional neural network for power line parts detection. IEEE Trans. Instrum. Meas. 2022, 71, 5008213. [Google Scholar] [CrossRef]
  172. Lan, M.; Zhang, Y.; Zhang, L.; Du, B. Defect Detection from UAV Images Based on Region-Based CNNs. In Proceedings of the 2018 IEEE International Conference on Data Mining Workshops (ICDMW), Singapore, 17–20 November 2018; pp. 385–390. [Google Scholar] [CrossRef]
  173. Zu, G.; Wu, G.; Zhang, C.; He, X. Detection of common foreign objects on power grid lines based on Faster R-CNN algorithm and data augmentation method. In Journal of Physics: Conference Series, Proceedings of the 2020 3rd International Conference on Modeling, Simulation and Optimization Technologies and Applications (MSOTA) 2020, Bijing, China, 22–23 November 2020; IOP Publishing: Bristol, UK, 2021; Volume 1746, p. 012039. [Google Scholar] [CrossRef]
  174. Song, C.; Xu, W.; Han, G.; Zeng, P.; Wang, Z.; Yu, S. A Cloud Edge Collaborative Intelligence Method of Insulator String Defect Detection for Power IIoT. IEEE Internet Things J. 2021, 8, 7510–7520. [Google Scholar] [CrossRef]
  175. Zhao, W.; Xu, M.; Cheng, X.; Zhao, Z. An insulator in transmission lines recognition and fault detection model based on improved faster RCNN. IEEE Trans. Instrum. Meas. 2021, 70, 5016408. [Google Scholar] [CrossRef]
  176. Wang, B.; Dong, M.; Ren, M.; Wu, Z.; Guo, C.; Zhuang, T.; Pischler, O.; Xie, J. Automatic fault diagnosis of infrared insulator images based on image instance segmentation and temperature analysis. IEEE Trans. Instrum. Meas. 2020, 69, 5345–5355. [Google Scholar] [CrossRef]
  177. Li, Z.; Wang, Q.; Zhang, T.; Ju, C.; Suzuki, S.; Namiki, A. UAV High-Voltage Power Transmission Line Autonomous Correction Inspection System Based on Object Detection. IEEE Sens. J. 2023, 23, 10215–10230. [Google Scholar] [CrossRef]
  178. Gevorgyan, Z. SIoU loss: More powerful learning for bounding box regression. arXiv 2022, arXiv:2205.12740. [Google Scholar] [CrossRef]
  179. Zhang, Y.; Yuan, B.; Zhang, J.; Li, Z.; Pang, C.; Dong, C. Lightweight PM-YOLO network model for moving object recognition on the distribution network side. In Proceedings of the 2022 2nd Asia-Pacific Conference on Communications Technology and Computer Science (ACCTCS), Shenyang, China, 25–27 February 2022; IEEE: New York, NY, USA, 2022; pp. 508–516. [Google Scholar]
  180. Qiu, Z.; Zhu, X.; Liao, C.; Qu, W.; Yu, Y. A lightweight yolov4-edam model for accurate and real-time detection of foreign objects suspended on power lines. IEEE Trans. Power Deliv. 2022, 38, 1329–1340. [Google Scholar] [CrossRef]
  181. Wang, Q.; Zhang, Z.; Chen, Q.; Zhang, J.; Kang, S. Lightweight Transmission Line Fault Detection Method Based on Leaner YOLOv7-Tiny. Sensors 2024, 24, 565. [Google Scholar] [CrossRef] [PubMed]
  182. Jeddi, A.B.; Shafieezadeh, A.; Nateghi, R. PDP-CNN: A deep learning model for post-hurricane reconnaissance of electricity infrastructure on resource-constrained embedded systems at the edge. IEEE Trans. Instrum. Meas. 2023, 72, 2504109. [Google Scholar] [CrossRef]
  183. Huang, H.; Lan, G.; Wei, J.; Zhong, Z.; Xu, Z.; Li, D.; Zou, F. TLI-YOLOv5: A lightweight object detection framework for transmission line inspection by unmanned aerial vehicle. Electronics 2023, 12, 3340. [Google Scholar] [CrossRef]
  184. Han, S.; Yang, F.; Jiang, H.; Yang, G.; Zhang, N.; Wang, D. A Smart Thermography Camera and Application in the Diagnosis of Electrical Equipment. IEEE Trans. Instrum. Meas. 2021, 70, 5012108. [Google Scholar] [CrossRef]
  185. Wang, Y.; Li, Q.; Liu, Y.; Wang, C. Insulator defect detection based on improved YOLOv5 algorithm. In Proceedings of the 2023 IEEE 12th Data Driven Control and Learning Systems Conference (DDCLS), Xiangtan, China, 12–14 May 2023; pp. 770–775. [Google Scholar] [CrossRef]
  186. Shi, C.; Lin, L.; Sun, J.; Su, W.; Yang, H.; Wang, Y. A Lightweight YOLOv5 Transmission Line Defect Detection Method Based on Coordinate Attention. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; Volume 6, pp. 1779–1785. [Google Scholar] [CrossRef]
  187. Ye, T.; Qin, W.; Zhao, Z.; Gao, X.; Deng, X.; Ouyang, Y. Real-Time Object Detection Network in UAV-Vision Based on CNN and Transformer. IEEE Trans. Instrum. Meas. 2023, 72, 2505713. [Google Scholar] [CrossRef]
  188. Li, Y.; Liu, M.; Li, Z.; Jiang, X. CSSAdet: Real-Time end-to-end small object detection for power transmission line inspection. IEEE Trans. Power Deliv. 2023, 38, 4432–4442. [Google Scholar] [CrossRef]
  189. Huang, Y.C. Evolving neural nets for fault diagnosis of power transformers. IEEE Trans. Power Deliv. 2003, 18, 843–848. [Google Scholar] [CrossRef]
  190. Meng, K.; Dong, Z.Y.; Wang, D.H.; Wong, K.P. A Self-Adaptive RBF Neural Network Classifier for Transformer Fault Analysis. IEEE Trans. Power Syst. 2010, 25, 1350–1360. [Google Scholar] [CrossRef]
  191. Haghnegahdar, L.; Wang, Y. A whale optimization algorithm-trained artificial neural network for smart grid cyber intrusion detection. Neural Comput. Appl. 2020, 32, 9427–9441. [Google Scholar] [CrossRef]
  192. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  193. Mitra, S.; Mukhopadhyay, R.; Chattopadhyay, P. PSO driven designing of robust and computation efficient 1D-CNN architecture for transmission line fault detection. Expert Syst. Appl. 2022, 210, 118178. [Google Scholar] [CrossRef]
  194. Zhang, W.; Yang, X.; Deng, Y.; Li, A. An inspired machine-learning algorithm with a hybrid whale optimization for power transformer PHM. Energies 2020, 13, 3143. [Google Scholar] [CrossRef]
  195. Tao, L.; Yang, X.; Zhou, Y.; Yang, L. A novel transformers fault diagnosis method based on probabilistic neural network and bio-inspired optimizer. Sensors 2021, 21, 3623. [Google Scholar] [CrossRef]
  196. Lu, W.; Shi, C.; Fu, H.; Xu, Y. A power transformer fault diagnosis method based on improved sand cat swarm optimization algorithm and bidirectional gated recurrent unit. Electronics 2023, 12, 672. [Google Scholar] [CrossRef]
  197. Elmasry, W.; Wadi, M. Edla-efds: A novel ensemble deep learning approach for electrical fault detection systems. Electr. Power Syst. Res. 2022, 207, 107834. [Google Scholar] [CrossRef]
  198. Yang, P.; Wang, T.; Yang, H.; Meng, C.; Zhang, H.; Cheng, L. The performance of electronic current transformer fault diagnosis model: Using an improved whale optimization algorithm and RBF neural network. Electronics 2023, 12, 1066. [Google Scholar] [CrossRef]
  199. Klaar, A.C.R.; Stefenon, S.F.; Seman, L.O.; Mariani, V.C.; Coelho, L.d.S. Optimized EWT-Seq2Seq-LSTM with attention mechanism to insulators fault prediction. Sensors 2023, 23, 3202. [Google Scholar] [CrossRef]
  200. Jiang, H.; Wang, B.; Wu, L.; Chen, J.; Liu, X.; Miao, X. Fallen detection of power distribution poles in UAV inspection using improved YOLOX with particle swarm optimization. Multimed. Tools Appl. 2024, 83, 69601–69617. [Google Scholar] [CrossRef]
  201. Stefenon, S.F.; Seman, L.O.; Klaar, A.C.R.; Ovejero, R.G.; Leithardt, V.R.Q. Hypertuned-YOLO for interpretable distribution power grid fault location based on EigenCAM. Ain Shams Eng. J. 2024, 15, 102722. [Google Scholar] [CrossRef]
  202. Lv, X.L.; Chiang, H.D.; Dong, N. Automatic DNN architecture design using CPSOTJUTT for power system inspection. J. Big Data 2023, 10, 150. [Google Scholar] [CrossRef]
  203. Zhao, Y.; Li, J.; Zhang, Q.; Lian, C.; Shan, P.; Yu, C.; Jiang, Z.; Qiu, Z. Simultaneous Detection of Defects in Electrical Connectors Based on Improved Convolutional Neural Network. IEEE Trans. Instrum. Meas. 2022, 71, 3511710. [Google Scholar] [CrossRef]
Figure 1. Number of papers published per year on CNN-based power inspection and UAV power inspection.
Figure 2. Number of papers published per year on object detection-based power inspection and UAV power inspection.
Figure 3. Intelligent inspection framework based on edge computing.
Figure 4. The paper overview.
Figure 5. Basic framework for ENAS.
Figure 6. The cell-based architecture search, where the input and output sizes of the normal cell are the same, and the output size of the reduction cell is 1/2 the size of the input.
Figure 7. Encoding strategy. (In the figure, light blue represents convolutional layers, dark blue represents fully connected layers, and gray represents pooling layers).
Figure 8. Crossover strategy. (In the figure, light blue represents convolutional layers, dark blue represents fully connected layers, and gray represents pooling layers. The red-labeled number 3 indicates the crossover operation performed on node 3).
Figure 9. Mutation strategy. (In the figure, light blue represents convolutional layers, dark blue represents fully connected layers, and gray represents pooling layers. The red-labeled number 3 indicates the mutation operation performed on node 3).
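Figures 7–9 illustrate the encoding, crossover, and mutation strategies used to evolve network architectures. A minimal sketch of these three operators on a layer-type chromosome follows; the gene alphabet and helper names are assumptions made for illustration, not taken from any surveyed implementation.

```python
import random

# Hypothetical gene alphabet mirroring the figure legend:
# 'C' = convolutional layer, 'F' = fully connected layer, 'P' = pooling layer.
LAYER_TYPES = ['C', 'F', 'P']

def encode(depth=6, rng=random):
    """Encoding (Figure 7): a variable-length chromosome with one
    layer-type gene per network node."""
    return [rng.choice(LAYER_TYPES) for _ in range(depth)]

def crossover(parent_a, parent_b, node):
    """One-point crossover (Figure 8 performs it at node 3): swap the
    tails of two chromosomes after the given node index."""
    return (parent_a[:node] + parent_b[node:],
            parent_b[:node] + parent_a[node:])

def mutate(chromosome, node, rng=random):
    """Point mutation (Figure 9 mutates node 3): resample the layer
    type at one node, forcing it to change."""
    child = list(chromosome)
    child[node] = rng.choice([t for t in LAYER_TYPES if t != child[node]])
    return child
```

For example, crossing `['C', 'C', 'P', 'F']` with `['C', 'P', 'C', 'F']` at node 2 exchanges the segments after the second gene, while mutating node 1 of the first parent replaces its second `'C'` with a different layer type.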
Figure 10. The framework of one-stage object detection.
Figure 11. The framework of two-stage object detection.
Table 1. The comparative analysis of NAS methods.
| Method | Advantages | Limitations | Applicability to UAV Power Inspection |
| --- | --- | --- | --- |
| Manual design | Strong interpretability; reusable experience. | Design space is limited by human expert experience; difficult to optimize multiple objectives simultaneously; unable to fully explore the architecture space; limited optimization capability for specific hardware platforms. | Adjustments are made on top of general architectures; difficult to perform deep optimization specifically for inspection tasks and embedded platforms. |
| Gradient-based NAS | High search efficiency (GPU-hour level); differentiable operations; relatively low memory footprint. | Search space is limited to differentiable operations; difficult to handle discrete architecture choices; weak modeling capability for hardware constraints; primarily optimizes a single objective (accuracy). | Fast search speed, but difficult to handle multi-constraint optimization and the discrete hardware characteristics of embedded devices. |
| RL-based NAS | Can handle discrete search spaces; end-to-end optimization. | Low sample efficiency (requires a large number of architecture evaluations); unstable training; difficult reward-function design; hard to balance multiple objectives; sensitive to hyperparameters. | Can handle discrete spaces, but training costs are high and the multi-objective balancing mechanism is less natural than in evolutionary algorithms. |
| ENAS | Strong global search capability; naturally supports multi-objective optimization; no gradient information required; can handle complex constraints; architecture evaluation can be parallelized; insensitive to initialization. | High computational cost (requires evaluating a large number of candidate architectures); long search time (though this can be mitigated by techniques such as surrogate models). | Can simultaneously optimize multiple objectives such as accuracy, latency, and power consumption; can directly model hardware constraints; suitable for offline search followed by deployment. |
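The ENAS entry in Table 1 highlights global search with natural multi-objective support. A toy sketch of such an evolutionary loop, selecting by Pareto dominance over an (accuracy proxy, parameter count) pair, is shown below; the surrogate fitness, search space, and all names are illustrative assumptions, not a description of any real NAS system.

```python
import random

# Toy search space: an architecture is a list of layer widths.
LAYER_CHOICES = [16, 32, 64, 128]

def random_arch(max_depth=6):
    return [random.choice(LAYER_CHOICES)
            for _ in range(random.randint(2, max_depth))]

def evaluate(arch):
    """Surrogate fitness standing in for real training/validation:
    returns (accuracy proxy, parameter count) -- both hypothetical."""
    params = sum(a * b for a, b in zip(arch, arch[1:])) + arch[0]
    acc = 1.0 - 1.0 / (1.0 + 0.001 * params)  # wider nets "score" higher
    return acc, params

def dominates(f1, f2):
    """Pareto dominance: higher accuracy is better, fewer params is better."""
    (a1, p1), (a2, p2) = f1, f2
    return a1 >= a2 and p1 <= p2 and (a1 > a2 or p1 < p2)

def mutate(arch):
    child = list(arch)
    child[random.randrange(len(child))] = random.choice(LAYER_CHOICES)
    return child

def crossover(a, b):
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def enas_search(pop_size=20, generations=10, seed=0):
    random.seed(seed)
    pop = [random_arch() for _ in range(pop_size)]
    for _ in range(generations):
        children = [mutate(crossover(*random.sample(pop, 2)))
                    for _ in range(pop_size)]
        union = pop + children
        fits = [evaluate(a) for a in union]
        # Keep the non-dominated front first, then fill by accuracy proxy.
        front = [a for a, f in zip(union, fits)
                 if not any(dominates(g, f) for g in fits)]
        rest = sorted((a for a in union if a not in front),
                      key=lambda a: -evaluate(a)[0])
        pop = (front + rest)[:pop_size]
    return pop
```

The returned population approximates an accuracy/size trade-off front, which matches the table's point that evolutionary search can weigh accuracy against deployment cost in a single run.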
Table 2. The two-stage object detection algorithms.
| Algorithm | Region Proposal | Backbone | Loss Function | Params (M) | GFLOPs | mAP@0.5 | Date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R-CNN | Selective Search | AlexNet | Hinge loss, bounding-box regression | 60 | 200 | 58.5% (VOC2007) | 2014 |
| SPP-Net | Edge Boxes | ZFNet | Hinge loss, bounding-box regression | 60 | 50 | 59.2% (VOC2007) | 2015 |
| Fast R-CNN | Selective Search | AlexNet, VGG16 | Class log loss, bounding-box regression | 60 | 45 | 70.0% (VOC2007) | 2015 |
| Faster R-CNN | Region Proposal Network | ZFNet, VGG | Class log loss, bounding-box regression | 130 | 180 | 73.2% (VOC2007) | 2015 |
| Mask R-CNN | Region Proposal Network | ResNet | Class log loss, bounding-box regression, binary cross-entropy | 150 | 220 | 78.0% (VOC2007) | 2016 |
Table 3. The one-stage object detection algorithms.
| Algorithm | Backbone | Neck | Head | Params (M) | GFLOPs | mAP@0.5 | Date |
| --- | --- | --- | --- | --- | --- | --- | --- |
| YOLO v1 | GoogLeNet | – | Fully connected layer, anchor-free | 62 | 45 | 63.4% (VOC2007) | 2015 |
| YOLO v2 | Darknet-19 | – | Fully connected layer, anchor-based | 34 | 35 | 76.8% (VOC2007) | 2016 |
| YOLO v3 | Darknet-53 | FPN | Coupled head, anchor-based | 61 | 65 | 60.6% (COCO2017) | 2018 |
| YOLO v4 | CSPDarkNet-53 | SPP, PANet | Coupled head, anchor-based | 20 | 100 | 65.7% (COCO2017) | 2020 |
| YOLO v5 | Focus, CSP1-based CSPDarknet | SPPF, CSP2-based PAN | Coupled head, anchor-based | 7.2 | 16.5 | 56.8% (COCO2017) | 2020 |
| YOLO v6 | EfficientRep | SimSPPF, Rep-PAN | Efficient decoupled head, anchor-free | 17.2 | 44.2 | 60.4% (COCO2017) | 2022 |
| YOLO v7 | CSPDarknet based on E-ELAN and MPConv | SPPCSPC, PAN based on E-ELAN | Coupled head, anchor-based | 11 | 28.1 | 69.7% (COCO2017) | 2022 |
| YOLO v8 | C2f-based CSPDarknet | SPPF, C2f-based PAN | Decoupled head, anchor-free | 11.2 | 28.6 | 61.8% (COCO2017) | 2023 |
| YOLO v9 | GELAN | PGI-based PAN | Decoupled head, anchor-free | 7.1 | 26.4 | 63.4% (COCO2017) | 2024 |
| YOLO v10 | CSPNet and Pyramid Scene Attention (PSA) | PAN | Consistent dual assignments, anchor-free | 7.2 | 21.6 | 46.8% (COCO2017) | 2024 |
| YOLO v11 | C3k2-based CSPDarknet and C2PSA | SPPF, C3k2-based PAN | Decoupled head, anchor-free | 9.4 | 21.7 | 71.2% (COCO2017) | 2024 |
| YOLO v12 | Vision Transformer | Attention-Enhanced Cross-Feature module (A2C2f)-based PAN | Decoupled head, anchor-free | 9.3 | 21.4 | 48.0% (COCO2017) | 2025 |
| SSD | VGG-16 | FPN | Anchor-based | 26.8 | 99.5 | 76.9% (VOC2007) | 2016 |
| RetinaNet | ResNet50 | FPN | Anchor-based | 32.6 | 100 | 59.1% (COCO2017) | 2017 |
| CenterNet | ResNet | FPN | Anchor-free | 11.7 | 50 | 44.9% (COCO2017) | 2019 |
Table 6. Power intelligent inspection based on lightweight object detection model.
| Original Method | Improvement | Application Scenario | Purpose | Performance Improvement | Reference |
| --- | --- | --- | --- | --- | --- |
| YOLO v4-Tiny | Introduced a Global Information Aggregation Module (GIAM), an improved Interior Spatial Transformer Network (ISTN), and a Feature Enhancement Fusion Network (FEFN) | Fault diagnosis of substation equipment | Improve detection accuracy and real-time detection speed | mAP: +3.36% | [36] |
| YOLO v4 | Converted infrared images to grayscale inputs and compressed backbone network channels | Fault diagnosis of substation equipment | Design a lightweight network suited to edge AI devices | mAP: +10.6%; engine file size: −192 MB | [184] |
| YOLO v4 | Replaced CSPDarkNet53 with MobileNetV2 embedded with SENet; replaced standard convolution with depthwise separable convolution; embedded CBAM into SPP and PANet | Foreign object detection on transmission lines | Improve inspection accuracy and speed | mAP: +0.91%; FPS: +26 | [180] |
| YOLO v5n | Introduced the SimAM attention module; adopted the Wise-IoU loss; combined transfer learning (COCO pre-trained weights) with cosine learning-rate decay | Fault detection on transmission lines | Enhance feature extraction capability and improve model accuracy and robustness | mAP: +2.91%; F1 score: +1.69% | [183] |
| YOLO v5 | Introduced RepVGG re-parameterization; introduced DBB to replace downsampling; introduced ECA in the neck; simplified SPP; TensorRT optimization | Pin defect detection on transmission lines | Improve detection performance and make the model suitable for deployment on edge devices | mAP: +1.2%; time: −27 ms | [14] |
| YOLO v5 | Replaced the backbone with MobileNetV3; used K-means++ for anchor boxes; introduced Coordinate Attention (CA); replaced CIoU with the EIoU loss | Insulator detection on transmission lines | Improve small-object detection accuracy in complex backgrounds and reduce the number of parameters | Parameters: −51%; GFLOPs: −62%; mAP: +2%; inference time: −21 ms | [185] |
| YOLO v5 | Introduced reverse depthwise separable convolution and a lightweight Coordinate Attention (CA) mechanism | Defect detection on transmission lines | Reduce model parameters and improve small-object detection accuracy in complex backgrounds | mAP: −0.7%; FLOPs (G): −60.94% | [186] |
| YOLOX | Replaced the backbone with MobileNetV3; introduced the Ghost module, depthwise separable convolution, CA, and the α-DIoU loss in the neck | Defect detection on transmission lines | Reduce model parameters and improve small-object detection accuracy in complex backgrounds | mAP: +2%; FPS: +6 | [177] |
| YOLOv7-Tiny | Replaced standard convolution with depthwise separable convolution; used SiLU; introduced SP multi-scale spatial attention and the FCIoU loss | Defect detection on transmission lines | Reduce model parameters and improve small-object detection accuracy in complex backgrounds | mAP: +0.1%; model size: −20% | [181] |
| CNN, Transformer | Introduced a Lightweight Feature Extraction Module (LEM) in the backbone; introduced ECTB and multi-scale feature fusion/attention prediction | Defect detection on transmission lines | Reduce model parameters and improve small-object detection accuracy in complex backgrounds | mAP: 86.4%; FPS: 312 | [187] |
| One-stage detection model | Introduced cross-scale attention and spatial attention mechanisms in the neck network | Pin defect detection on transmission lines | Enhance feature extraction capability and improve multi-scale object detection accuracy | mAP: +7.7% | [188] |
| PDP-CNN | Adopted depthwise separable convolution and an anchor-free back end; introduced a bipartite matching loss and a bounding-box loss | Damage detection of power distribution poles | Trade off accuracy against computational cost; design lightweight models suitable for embedded systems | mAP: 86.1%; FPS: 28.1 | [182] |
Table 7. Power intelligent inspection based on EDL.
| EDL Type | Algorithm | Optimization Content | Application Scenario | Performance Improvement | Reference |
| --- | --- | --- | --- | --- | --- |
| Parameter optimization | EC | Connection weights and bias terms | Power transformers | Accuracy: +4.76% | [189] |
| Parameter optimization | QPSO | Number of neurons, centers and radii of hidden-layer activation functions, and output connection weights of an RBFNN | Power transformers | Accuracy: +3.3% | [190] |
| Parameter optimization | Double PSO | Optimal features and hyperparameters | Symmetrical and unsymmetrical faults of electrical power systems | Accuracy: +4.36% | [197] |
| Parameter optimization | CASAWOA | Center weights, hidden-layer neuron width, output weights, and network size of an RBFNN | Electronic current transformers | Accuracy: +8.49% | [198] |
| Parameter optimization | ISCSO | Hidden batch size, alpha, number of hidden layers, and number of neurons per layer for a BiGRU model | Power transformers | Accuracy: +9.8% | [196] |
| Parameter optimization | MDE-WOA | Smoothing factor of a PNN | Power transformers | Accuracy: +14.28%; MSE: −0.21 | [194] |
| Parameter optimization | ISSA | Smoothing factor of a PNN | Power transformers | Accuracy: +12.95%; MSE: −0.25 | [195] |
| Parameter optimization | Optuna | Hidden units, activation functions, and learning rate of EWT-Seq2Seq-LSTM | Insulator detection | MSE: −0.12 × 10⁻⁵; MAE: −0.24 × 10⁻³; MAPE: −0.2 × 10⁻² | [199] |
| Parameter optimization | PSO | Batch size and input resolution of YOLOX | Distribution pole inspection | AP: +15.2%; precision: +17.7% | [200] |
| Parameter optimization | GA | 21 hyperparameters of YOLO v5 | Insulator detection | mAP@0.5: +2.7%; time: −15.9 h | [201] |
| Parameter optimization | PSO | Number and size of 1D-CNN kernels | Transmission lines | Emulation time: −14.5%; FPS: +65% | [193] |
| Architecture search | CPSOTJUTT | DNN topology | Multi-scenario | Test error: −12.6% | [202] |
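Several Table 7 entries tune detector hyperparameters with PSO (e.g., [200] tunes YOLOX's batch size and input resolution). A generic PSO sketch over a two-dimensional hyperparameter space is given below; the objective function, bounds, and coefficient values are illustrative assumptions standing in for a real train-and-validate loop.

```python
import random

# Hypothetical stand-in for validation error of a detector trained with
# these hyperparameters (learning rate, batch size); purely illustrative.
def objective(params):
    lr, batch = params
    return (lr - 0.01) ** 2 + ((batch - 32) / 100.0) ** 2

BOUNDS = [(1e-4, 0.1), (8, 128)]  # (learning rate, batch size)

def pso(n_particles=15, iters=40, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for lo, hi in BOUNDS]
           for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [list(p) for p in pos]               # personal bests
    pbest_f = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_f[i])
    gbest, gbest_f = list(pbest[g]), pbest_f[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(BOUNDS):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = objective(pos[i])
            if f < pbest_f[i]:
                pbest[i], pbest_f[i] = list(pos[i]), f
                if f < gbest_f:
                    gbest, gbest_f = list(pos[i]), f
    return gbest, gbest_f
```

In a real EDL pipeline the objective would train (or cheaply proxy-train) the detector at each candidate setting, which is why the table's methods dominate the total cost; the swarm mechanics themselves are this simple.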