Novel MOA Fault Detection Technology Based on Small Sample Infrared Image

Abstract: This paper proposes a novel metal oxide arrester (MOA) fault detection technology based on small-sample infrared images. The research covers both the detection process and data enhancement. A lightweight MOA identification and location algorithm is designed at the edge, which not only reduces the amount of data uploaded but also narrows the search space of the cloud algorithm. To improve the accuracy and generalization ability of the defect detection model under small-sample conditions, a multi-model fusion detection algorithm is proposed: different features of the image are extracted by multiple convolutional neural networks, multiple classifiers are trained and a weighted voting strategy is then used for fault diagnosis. In addition, an expansion model for fault samples is constructed with transfer learning and deep convolutional generative adversarial networks (DCGAN) to solve the problem of unbalanced training data sets. The experimental results show that the proposed method can accurately locate arresters under small-sample conditions, and after data expansion the recognition rate of arrester anomalies improves from 83% to 85%, showing high effectiveness and reliability.


Introduction
A metal oxide arrester (MOA) is widely used as an important piece of protection equipment for the safe operation of power transmission and transformation systems. However, under the impact of lightning overvoltage and switching overvoltage, as well as environmental temperature and humidity, the characteristics of the MOA change [1]. Good operating characteristics of the MOA are therefore particularly important for the power transmission and transformation system.
There are two main methods for MOA condition monitoring. The first is based on leakage current and can be divided into three approaches: the full current method [2][3][4], the third harmonic current method [5][6][7] and the capacitive current compensation method [8,9]. However, these methods require an aging test, are inefficient and have difficulty overcoming interference from harmonics in the operating voltage. The second method is based on infrared thermal imaging [10,11]. Because MOA faults cause a local temperature rise, the grade of a heating defect can be judged by comparing the temperature differences among the parts of the MOA. Compared with other defect detection methods, infrared detection is simple, safe and more efficient. However, the complex environment and the scarcity of fault samples make intelligent inspection difficult. How to realize automatic fault identification is therefore a hot topic for researchers.
Early research on state detection is mainly based on traditional image recognition and machine learning methods for image data mining. References [12,13] proposed using self-organizing mapping (SOM) to analyze the thermal characteristics of MOA infrared images under working voltage, so as to determine the MOA state; in [14], …
The system consists of edge devices (patrol smart car, etc.) and servers (cloud server, etc.). The infrared thermal images of the MOA are collected and stored by the edge devices, processed quickly on the spot according to the needs of communication and cloud diagnosis, and uploaded to the cloud server for fault identification over a high-speed communication network (4G, a power wireless private network, etc.). Computing is distributed across the whole system network, including the edge intelligent devices and the cloud servers, and data is stored in the intelligent devices at the edge of the network. The system can therefore meet the construction needs of a low-delay, low-energy-consumption, high-precision power Internet of Things.

MOA Identification and Localization
To address the difficulty traditional detection methods have in overcoming the complex background interference of a power grid, key-component localization is proposed to locate and extract different types of MOAs in substations and transmission lines. To suit edge devices, an improved SSD-MobileNet network that performs well in both speed and scale is adopted.

Infrared Thermal Fault Detection of the MOA
However, achieving full automation of MOA defect detection is still very challenging due to the visual complexity of defects and the small number of defective MOAs.
(1) The amount of abnormal MOA data is not enough to train a robust classification model. (2) Existing deep learning-based fault detection algorithms usually rely on a single neural network, which is often limited by the characteristics of that network when facing different application backgrounds. (3) The visual complexity of defects makes it difficult, if not impossible, to construct a precise model.
On the one hand, a data expansion model is constructed through transfer learning and a deep convolutional generative adversarial network to solve the problem of data imbalance. On the other hand, different infrared features are extracted by multiple neural networks and multiple classifiers are trained; a combination strategy then fuses the prediction results to improve the accuracy and generalization ability of the detection model.


MOA Identification and Localization
A single shot multibox detector (SSD) is a classic one-stage target detection model proposed by Wei Liu in 2016 [25]. As a fast recognition and positioning network, SSD is widely used in target detection; its architecture is shown in Figure 2. To cope with the limited computing resources at the edge, in this paper we use the lightweight MobileNet structure to replace the original VGG16 base network and cut the average pooling layer and the fully connected layer. MobileNet is a series of lightweight networks proposed by Google [26]. Figure 3 shows the standard convolution and the MobileNet structure. MobileNet uses depthwise separable convolution instead of traditional convolution, decomposing the original standard convolution into a depthwise convolution and a pointwise convolution: each channel of the input data is first convolved separately, and then a 1 × 1 convolution whose kernel depth equals the number of input channels combines the results, eliminating a large amount of redundant calculation. First, the MOA image is resized to a fixed size of 300 × 300. Then, forward propagation through the base network extracts features to form the feature maps. Finally, the additional feature network performs regression calculation and non-maximum suppression to generate predictions of the target object's bounding box and category.
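The saving from depthwise separable convolution can be made concrete by comparing multiply-add counts; the layer dimensions below are illustrative assumptions, not the actual SSD-MobileNet configuration:

```python
# Cost comparison: standard convolution vs. MobileNet's depthwise separable
# convolution (depthwise k x k per channel, then 1 x 1 pointwise).

def standard_conv_cost(k, c_in, c_out, h, w):
    """Multiply-adds of a k x k standard convolution over an h x w feature map."""
    return k * k * c_in * c_out * h * w

def separable_conv_cost(k, c_in, c_out, h, w):
    """Depthwise k x k on each channel, then 1 x 1 pointwise across channels."""
    depthwise = k * k * c_in * h * w
    pointwise = c_in * c_out * h * w
    return depthwise + pointwise

# Assumed mid-level feature map: 3 x 3 kernel, 256 -> 256 channels, 38 x 38 map.
k, c_in, c_out, h, w = 3, 256, 256, 38, 38
ratio = separable_conv_cost(k, c_in, c_out, h, w) / standard_conv_cost(k, c_in, c_out, h, w)
print(round(ratio, 4))  # ~ 1/c_out + 1/k^2 = 1/256 + 1/9, about 0.115
```

The ratio 1/c_out + 1/k² explains why the separable form removes most of the redundant computation for large channel counts.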


Data Expansion
As shown in Figure 4, a data expansion model based on transfer learning and a deep convolutional generative adversarial network (TL-DCGAN) is proposed. First, transfer learning is used to train a model, DCGAN1, which can generate normal samples, using the large number of existing normal MOA images. Then, the weights of DCGAN1 are transferred again, and the limited fault data is used to train the data expansion model DCGAN2.
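The two-stage weight transfer can be sketched schematically; the dict-based "model" and the train() stub below are illustrative stand-ins, not the paper's DCGAN implementation:

```python
# Schematic of TL-DCGAN's two-stage transfer: classical-network weights seed
# DCGAN1 (trained on abundant normal images); DCGAN1's weights then seed
# DCGAN2, which is fine-tuned on the scarce fault images.

def train(weights, data, steps):
    """Stub trainer: each step nudges every weight toward the data mean."""
    mean = sum(data) / len(data)
    for _ in range(steps):
        weights = {k: v + 0.1 * (mean - v) for k, v in weights.items()}
    return weights

pretrained = {"conv1": 0.0, "conv2": 0.0}                     # classical-network init
dcgan1 = train(dict(pretrained), data=[1.0] * 100, steps=50)  # many normal samples
dcgan2 = train(dict(dcgan1), data=[2.0] * 5, steps=5)         # few fault samples

# Because DCGAN2 starts from DCGAN1's weights, even a few steps on a handful of
# fault samples move it from the "normal" statistics toward the fault statistics,
# rather than starting from scratch.
print(dcgan1["conv1"], dcgan2["conv1"])
```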

Generative Adversarial Network
As shown in Figure 5, the GAN structure mainly comprises the generator (G) and the discriminator (D). The objective function of GAN training is

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

where p_data(x) is the probability distribution of the real samples, p_z(z) is the distribution of the input random noise and V(D, G) is the cross-entropy loss.
The loss function of G is

L_G = \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

and the loss function of D is

L_D = -\mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]

where x_real denotes a real sample, x_fake a generated (false) sample and p(s|x) the probability assigned by D that sample x is real. The optimization goal of the GAN model is to make the samples generated by G indistinguishable from real ones by D. Therefore, in training, G wants p(s|x_fake) to be as large as possible, corresponding to min_G V(D, G); D wants p(s|x_real) to be large for real samples and p(s|x_fake) to be small for generated samples, corresponding to max_D V(D, G).
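The objective can be made concrete by evaluating the two losses at a few toy discriminator outputs. The non-saturating generator loss -log D(G(z)) is used here, a common practical variant of the formula above, and the probability values are illustrative assumptions:

```python
import math

# D(x) = p(s|x): the discriminator's probability that x is a real sample.

def d_loss(p_real, p_fake):
    """Discriminator loss: -[log D(x_real) + log(1 - D(x_fake))]."""
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def g_loss(p_fake):
    """Generator loss, non-saturating form: -log D(x_fake)."""
    return -math.log(p_fake)

# Early in training D separates real from fake easily, so G's loss is large:
print(g_loss(0.05), d_loss(0.95, 0.05))
# Near equilibrium D outputs ~0.5 for both kinds of sample, and the losses
# settle near log 2 and log 4, respectively:
print(g_loss(0.5), d_loss(0.5, 0.5))
```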


Data Expansion Model Based on TL-DCGAN
Compared with the GAN, a variant called the deep convolutional GAN (DCGAN) was proposed in 2016 [27]; it uses mature convolutional neural networks (CNNs) instead of the MLP and removes the pooling layers, making the overall network model differentiable.

Improved Generator Structure
To improve the resolution of the generated MOA images, a convolution layer is added on the basis of DCGAN. The generator structure of the MOA fault image expansion model is shown in Figure 6. This also brings the generated data distribution closer to the real data distribution, prevents gradient vanishing and improves network stability.

The generator is mainly composed of the input layer, the fully connected layer, convolution layers and residual blocks; the convolution layers use fractionally strided (transposed) convolution, and the activation function is ReLU. First, a set of uniformly distributed random noise is input and expanded to a 4 × 4 × 1024 feature matrix through the fully connected layer. Then, in the first convolution layer, deconvolution, batch normalization and activation are performed, and the output feature matrix has size 8 × 8 × 512. Next, two residual blocks bring the output feature matrix to 16 × 16 × 256, increasing the network depth and improving the network's representation ability. Finally, after further convolution layers and residual blocks, a 128 × 128 MOA image is produced. See Table 1 for the parameters of the generator's convolution layers, where the convolution kernel size is 3 × 3 and the stride is 2.
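The spatial-size progression above (4 → 8 → 16 → … → 128) can be sketched as follows, assuming every upsampling stage is a stride-2 fractionally strided convolution with "same"-style padding so that each stage doubles the feature-map side:

```python
# Spatial-size progression of the generator, from the 4 x 4 x 1024 feature
# matrix to the 128 x 128 output image.

def upsample(side, stride=2):
    """Output side length of a stride-2 'same'-padded transposed convolution."""
    return side * stride

def generator_size_progression(start=4, target=128):
    sides = [start]
    while sides[-1] < target:
        sides.append(upsample(sides[-1]))
    return sides

print(generator_size_progression())  # [4, 8, 16, 32, 64, 128]
```

Five doublings are therefore needed, consistent with the 4 × 4 input and the 128 × 128 output stated above.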

Improved Discriminator Structure
As shown in Figure 7, compared with the original DCGAN network, the discriminator structure designed in this paper adds one more convolution layer. In addition, to improve network performance, residual modules are constructed similarly to those in the generator above. The leaky ReLU activation function is used, and the convolution kernel size is 3 × 3; after each convolution, batch normalization and activation are performed. The input of the discriminator is a real or generated MOA image of size 128 × 128. The image size is reduced by downsampling in the convolution layers, the network is then deepened by two residual blocks and the extracted feature information is transmitted to the deeper layers of the network. Finally, after several convolution layers and residual blocks, the feature map reaches 4 × 4 × 1024 and is fed into the fully connected layer to obtain the image discrimination result. The parameters of the convolution layers in the discriminator are shown in Table 2, in which the convolution kernel size is 3 × 3 and the stride is 2.

Defect Detection
As shown in Figure 8, an MOA infrared state detection framework based on multi-model fusion is proposed. Because the number of MOA infrared fault samples is small, using a traditional single neural network to extract the feature vector easily causes the classifier to overfit during training. Therefore, this paper uses several convolutional neural networks to extract a variety of MOA fault features, selects the relevance vector machine (RVM) as the feature-vector classifier to train multiple weak learners, and finally fuses them with a combination strategy for defect detection.


Deep Feature Extraction
In recent years, deep learning has developed rapidly; deep convolutional neural networks in particular have achieved good results in image classification and target recognition and have greatly improved efficiency. Therefore, they are used here as feature extractors to identify infrared thermal faults of the MOA.
In the deep convolution neural network, most of the neurons only connect with the nearby neurons and share the weights, which greatly reduces the network parameters and improves the training speed. As shown in Figure 9, there are three main structures in the deep convolution network: the convolution layer, pooling layer and full connection layer.



In the convolution layer, the input data is convolved with a linear filter and the feature map is then obtained through a nonlinear activation function. Each feature map contains one feature and shares the same parameters; different feature maps use different parameters to extract different features. The convolution formula is

x_{ij}^{k} = f\left( w_{ij}^{k} * x_{ij}^{k-1} + b_{j}^{k} \right)

where x_{ij}^{k} is the feature map of the k-th layer, i and j are the input dimensions, x_{ij}^{k-1} is the input data from the previous layer, the convolution filter of layer k is determined by the weight w_{ij}^{k} and the bias term b_{j}^{k}, and f is the nonlinear activation function. The pooling layer downsamples the feature map, reducing the dimension of the feature map and the number of network parameters, making the features easier to process in subsequent stages and reducing overfitting to a certain extent. The pooling formula is

x_{ij}^{k} = f\left( \beta_{ij}^{k} \, \mathrm{down}\left( x_{ij}^{k-1} \right) + b_{j}^{k} \right)

where down is the downsampling function; if the downsampling window size is n × n, each side of the output feature map is reduced by a factor of n. β_{ij}^{k} and b_{j}^{k} are the multiplicative and additive bias parameters, respectively. The fully connected layer is similar to a traditional neural network, in which each neuron is connected to all inputs.
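The two formulas above can be sketched numerically with NumPy; the toy input size, the 3 × 3 averaging filter and the ReLU choice are illustrative assumptions, not the network's actual configuration:

```python
import numpy as np

def conv2d_valid(x, w, b):
    """Valid 2-D convolution of one channel with one filter, followed by ReLU."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return np.maximum(out, 0.0)  # f = ReLU

def max_pool(x, n=2):
    """n x n max pooling: each side of the feature map shrinks by a factor n."""
    oh, ow = x.shape[0] // n, x.shape[1] // n
    return x[:oh * n, :ow * n].reshape(oh, n, ow, n).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)  # toy 6 x 6 input "image"
w = np.ones((3, 3)) / 9.0                     # 3 x 3 averaging filter
feat = conv2d_valid(x, w, b=0.0)              # 4 x 4 feature map
pooled = max_pool(feat, n=2)                  # 2 x 2 after pooling
print(feat.shape, pooled.shape)               # (4, 4) (2, 2)
```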
Although classifiers based on supervised learning are very mature, they need a large amount of labeled data to train a classification model with high accuracy and strong generalization. In an actual power grid system, however, infrared samples of faulty MOAs are usually scarce and the image background is complex. Therefore, in this paper different convolutional neural networks (AlexNet, GoogLeNet, ResNet, RetinaNet) are used to extract different features of the MOA image, so that the image can be comprehensively analyzed from different aspects to obtain more reliable detection results.

MOA Fault Detection Based on Ensemble Learning
To obtain more accurate judgments and improve the generalization ability of the defect recognition model, this paper proposes a multi-model combination strategy based on the weighted voting rule and the F1 score. The F1 score, also known as the balanced F-score, is the harmonic mean of the model's precision (P) and recall (R); its maximum is 1 and its minimum is 0, and it is often used to measure the accuracy of binary classification models.
The F1 score of each weak learner M_i is calculated as

F1_i = \frac{2PR}{P + R}

where P is the precision and R is the recall. Then, according to the performance of each RVM classifier on the validation set, its voting weight w_i is calculated from the F1 score, giving higher weight to classifiers with higher reliability and thereby improving the reliability of the ensemble classifier. Finally, from the prediction result h_i(x) of each weak learner and its voting weight w_i, the final prediction H(x) is obtained by the weighted voting rule

H(x) = \arg\max_y \sum_{i=1}^{n} w_i \, \mathbb{I}(h_i(x) = y)

where n is the number of weak learners, that is, the number of deep feature types, and \mathbb{I}(\cdot) is the indicator function. From the prediction H(x) of the ensemble classifier, whether the MOA is abnormal can be determined.
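The strategy can be sketched in a few lines. The normalization w_i = F1_i / ΣF1 is an illustrative assumption; the text only states that higher-F1 classifiers receive higher weights:

```python
from collections import defaultdict

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if p + r else 0.0

def weighted_vote(predictions, f1s):
    """predictions[i] is classifier i's label; f1s[i] is its validation F1."""
    total = sum(f1s)
    weights = [f / total for f in f1s]          # assumed normalization
    score = defaultdict(float)
    for label, w in zip(predictions, weights):
        score[label] += w
    return max(score, key=score.get)

# Three weak learners: two agree on "normal"; the strongest single learner
# says "fault", but its weight does not outvote the two agreeing learners.
f1s = [f1_score(0.70, 0.60), f1_score(0.65, 0.55), f1_score(0.95, 0.90)]
print(weighted_vote(["normal", "normal", "fault"], f1s))  # normal
```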

Experimental Results and Analysis
To evaluate the performance of the proposed MOA defect detection system, we tested it on an MOA image data set from a substation in Jiangxi Province, China. The data acquisition equipment is an advanced pistol-grip thermal imager with 640 × 480 infrared resolution, model FLIR E98 (Shenzhen Keruijie Technology Co., Ltd., Shenzhen, China). The experimental environment is as follows: Windows 10, TensorFlow 1.3, Anaconda (Python 3.6), Keras 2.1.5, a Core i9-9900K CPU and a GTX 2080 GPU with 8 GB of memory.

MOA Positioning Experiment
In the experiment, the model parameters are initialized with the weights of a classical network, and the infrared MOA data is divided into training, validation and test sets at a ratio of 6:1:3. Empirical values are chosen as the initial hyperparameters: the learning rate is set to 0.0015, the batch_size to 16 and the number of epochs to 500. As shown in Figure 10, the proposed MOA identification and location algorithm can effectively identify and locate different types of MOAs (rated voltages of 110 kV, 220 kV and 500 kV, respectively).
To further verify the advantages of the proposed method in MOA identification and location, the proposed algorithm and commonly used deep learning algorithms are tested and compared on the same data set. The results are shown in Table 3, which compares the algorithms in terms of mAP, recognition speed and model training time. It can be seen that the one-stage algorithms are superior to the two-stage algorithms in recognition speed, model size and training time, while the recognition accuracy of the two-stage algorithms is significantly higher. Although the proposed algorithm is slightly inferior to the two-stage algorithms in accuracy and slightly slower than You Only Look Once (YOLO) in speed, it is the most suitable for deployment at the edge on embedded devices in terms of overall ability, and can identify and locate the MOA quickly and with high precision.
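The 6:1:3 split above can be sketched as follows; the shuffle seed and file-name scheme are illustrative assumptions, and the data set size of 2435 images is taken from the data expansion experiment:

```python
import random

def split_dataset(items, ratios=(6, 1, 3), seed=42):
    """Shuffle and split items into train/validation/test by integer ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test

images = [f"moa_{i:04d}.jpg" for i in range(2435)]  # hypothetical file names
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))  # 1461 243 731
```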


Data Expansion Experiment
To maintain sample diversity and enhance the generalization ability of the training model, a total of 2435 infrared images of MOAs under different natural conditions in several areas were obtained from a power grid company, including 1981 normal samples and 454 fault samples. First, the fault samples are expanded to 1696 by traditional augmentation methods, and then the original DCGAN model is trained. In the experiment, the Adam optimizer is used, the learning rate is set to 0.0002, the momentum value to 0.5 and the batch_size to 64.
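The traditional expansion step (454 to 1696 fault samples) can be sketched as cycling a pool of simple transforms (e.g. flips and rotations) over the originals until the target count is reached. This is a minimal illustration; the transform callables are hypothetical stand-ins, not the paper's exact augmentation set:

```python
def expand_samples(samples, target, transforms):
    """Augment by cycling transforms over the originals until `target` samples exist."""
    out = list(samples)
    i = 0
    while len(out) < target:
        img = samples[i % len(samples)]                     # pick an original in round-robin order
        t = transforms[(i // len(samples)) % len(transforms)]  # rotate through the transform pool
        out.append(t(img))
        i += 1
    return out
```

With 454 originals, a target of 1696 adds 1242 augmented copies on top of the originals.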
It can be seen from Figure 11a that, in the original DCGAN model without transfer learning, some MOA contour information appears only at 100 epochs, training is relatively slow, and a complete, usable normal MOA image cannot be generated even after 500 epochs of training. As shown in Figure 11b, the DCGAN1 model, which migrates the weights of the classical algorithm, can generate the basic features of the MOA image at 100 epochs, such as orientation features and target contour, and can generate a more complete image at 500 epochs. On the basis of the trained DCGAN1 model, we continue to apply transfer learning. Because the weights of DCGAN1 are reused, it can be clearly seen in Figure 11c that DCGAN2 already captures the basic characteristics of the MOA at 100 epochs.
To judge the performance of the improved model more accurately, the discriminator and generator loss curves of the original DCGAN model and the improved DCGAN2 model are drawn, respectively, as shown in Figure 12; the x-axis is the training moment and the y-axis is the loss function value of the discriminator or generator. In the initial stage of training, the generator has been updated few times, the extracted MOA features are not comprehensive, the generated images differ greatly from real MOAs, and the discriminator can easily tell real images from fake ones. Therefore, the generator loss is much larger than the discriminator loss. As training progresses, the MOA features captured by the generator become more sufficient, and the generated images come closer and closer to the real sample data. Comparing the loss curves of the two models shows that the loss of the improved DCGAN2 model finally converges, which indicates that the generated images are of better quality.
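The weight migration behind DCGAN1 and DCGAN2 can be illustrated as copying pretrained parameters into a new model wherever layer names and shapes match, leaving the remaining layers at their fresh initialization. A simplified sketch in which state dicts map layer names to flat weight lists; all layer names are illustrative:

```python
def transfer_weights(pretrained, target):
    """Return a copy of `target` with matching pretrained parameters migrated in."""
    migrated = dict(target)
    for name, weights in pretrained.items():
        # Only migrate layers that exist in the target with the same size
        if name in target and len(weights) == len(target[name]):
            migrated[name] = list(weights)
    return migrated
```

Layers present only in the pretrained model are dropped; layers present only in the target keep their initial values.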

MOA Thermal Fault Detection Experiment
In addition to the average accuracy mentioned in the previous chapter, the F1 score is added to evaluate the performance of the classifier. The fault identification results for different types of MOAs are shown in Figure 13. It can be seen from Table 4 that the highest recognition rate of any model trained with a single neural network is only 76%, while the multi-model fusion classifier proposed in this paper improves the recognition accuracy by 5% to 81% and can effectively identify infrared thermal faults of the MOA.
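The weighted voting fusion and the F1 score used above can be sketched as follows; this is a minimal illustration, and the classifier weights and class labels are hypothetical:

```python
from collections import Counter

def weighted_vote(predictions, weights):
    """Fuse the labels predicted by several classifiers via weighted voting."""
    scores = Counter()
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.get)

def f1_score(y_true, y_pred, positive="fault"):
    """F1 = harmonic mean of precision and recall for the positive (fault) class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

With three classifiers weighted 0.3, 0.5 and 0.4, two "fault" votes (combined weight 0.7) outvote a single "normal" vote of 0.5.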

In addition, the proposed method identifies and locates the MOA before condition detection, which reduces the search space of the fault detection model. As shown in Figure 14, the fault detection accuracy for the MOA after positioning is significantly higher than with global detection; the average accuracy of the final detection therefore increases from 81% to 83%.
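The benefit of positioning before detection can be illustrated by cropping the located MOA box out of the full frame so the classifier only sees the region of interest. A minimal sketch over an image stored as a list of pixel rows; the box format and function names are our assumptions:

```python
def crop_roi(image, box):
    """Crop the located MOA region; `box` is (x1, y1, x2, y2) in pixel coordinates."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def detect_fault(image, locate, classify):
    """Two-stage pipeline: locate the MOA first, then classify only the cropped region."""
    return classify(crop_roi(image, locate(image)))
```

Here `locate` stands in for the edge-side identification and location model and `classify` for the cloud-side fusion classifier.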

Conclusions
In this paper, an infrared thermal fault detection method for small samples is proposed: (1) To solve the problem of sample imbalance, transfer learning and deep convolutional generative adversarial networks (DCGAN) are used to expand the data of faulty MOAs. Experiments show that the expanded training set can improve the accuracy of the fault detection model by 2%. (2) To minimize background interference in defect detection, detection is divided into two steps: target recognition and state detection. First, the improved SSD algorithm is used to identify and locate the MOA; the experimental results show that the proposed algorithm can accurately locate different types of MOAs in different scenarios. (3) Multiple convolutional neural networks are used to extract a variety of MOA features, multiple weak classifiers are then trained, and a combination strategy integrates their predictions, further improving the prediction accuracy and generalization ability of the model. (4) The proposed method is based on simulation data and real cases, and several problems remain for further study: combining the fault characteristics of the equipment to make the model interpretable and improve identification accuracy, and improving the real-time performance of the detection system through the cooperation of edge computing and cloud computing to meet engineering applications, are the next research directions.
