Thermal Defect Detection for Substation Equipment Based on Infrared Image Using Convolutional Neural Network

Thermal defects of substation equipment have a great impact on the stability of power systems. Temperature is crucial for thermal defect detection in infrared images. The traditional detection methods, which have low efficiency and poor accuracy, record the temperature of infrared images manually. In this study, a thermal defect detection method based on infrared images using a convolutional neural network (CNN) is proposed. Firstly, the improved pre-processing method is applied to reduce background information, and the region of interest is located according to the contour and position information, hence improving the quality of images. Then, the temperature values are segmented to establish the dataset (T-IR11), which contains 11 labels. Finally, the CNN model is constructed to extract features, and the support vector machine is trained for classification. To verify the effectiveness of the proposed method, precision, recall, and F1 score are adopted and 10-fold cross-validation is employed on the T-IR11 dataset. The results demonstrate that the accuracy of the proposed method is 99.50%, and the performance is superior to that of previous methods in terms of infrared images. The proposed method can realize automatic temperature recognition and equipment with thermal defects can be recorded systematically, which has significant practical value for defect detection in substation equipment.


Introduction
Substation equipment is an important part of power transmission, and its safe operation directly affects the stability of the power system. Owing to long-term exposure to the natural environment, substation equipment is prone to thermal defects caused by corrosion, oxidation, and aging [1][2][3], which result in abnormal temperature rise. In recent years, non-destructive testing technologies such as X-ray [4], ultrasound [5,6], photoacoustic [7], and eddy-current [8] testing have developed rapidly because they are fast and efficient. However, there are potential safety hazards in the detection of substation equipment, which can lead to false detection and missed detection. Infrared thermography can quickly detect the state of a device based on the principle of thermal radiation [9][10][11]. It has the advantages of high sensitivity and resistance to electromagnetic interference. Presently, to improve the detection efficiency, infrared thermal imagers usually generate a temperature map on the right side of the image, and the maximum and minimum temperature values are marked to make temperature matching convenient [12,13]. There are two methods of recording the temperature of infrared images: manual and automatic. The disadvantages of traditional manual recording are low efficiency and large error, whereas those of automatic recording are low accuracy and poor stability due to the complex background and the unstable chromaticity caused by illumination [14,15]. Therefore, research on temperature value recognition algorithms with verified effectiveness and accuracy is conducive to the rapid screening of thermal defects. On this basis, substation equipment with thermal defects can be selected according to the temperature, which improves the efficiency and accuracy of thermal defect detection.
The rest of the paper is arranged as follows: Section 2 introduces the proposed method for the infrared image of substation equipment. Section 3 presents the experiment and results. The discussion for the proposed method is illustrated in Section 4. Section 5 concludes this paper.

Proposed Methods
The block diagram of the proposed method is illustrated in Figure 1. The method is divided into three stages: image pre-processing, temperature segmentation, and temperature recognition.

Improved Image Pre-Processing Method
Infrared images of substation equipment usually contain complex backgrounds, such as trees, nests, high-voltage lines, and buildings, and are significantly influenced by light and environmental factors [26]. Image pre-processing usually includes gray transformation and binarization, which prepare the infrared image for recognition by removing irrelevant information [27][28][29]. Analysis of the infrared images shows that green has the smallest proportion, followed by red, whereas blue has the largest, so a weighted method is used for the gray transformation. Because the gray image suffers from poor contrast and loss of detail, the gamma correction method is applied to normalize it [30][31][32], as expressed in Equation (1):

$$I_b(x, y) = c \, I(x, y)^{\gamma} \qquad (1)$$

where $I_b(x, y)$ is the corrected image, $I(x, y)$ is the original image, $\gamma$ is the correction parameter, and $c$ is a constant.
Binarization based on the histogram is then carried out; the histogram directly reflects the brightness of the image and represents its distribution characteristics [33]. To accurately explain the difference between images and eliminate accidental background influence, 100 infrared and visible images were randomly selected for the analysis. The statistical results of the histograms are shown in Figure 2. Figure 2a shows that the pixels of the visible image are evenly distributed and that the peak appears at very low pixel values, but the trough is not obvious. Therefore, it is difficult to select an appropriate threshold to distinguish between the background and the region of interest (RoI) [34,35]. In contrast, the histogram of the infrared images has two large peaks and a visible trough, as observed in Figure 2b. Taking the gray values on the two sides of the trough as thresholds gives the binarization results shown in Figure 3.
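The gamma correction in Equation (1) can be illustrated with a minimal sketch. The values of gamma and c, the normalization of pixel values to the range 0 to 1, and the function name `gamma_correct` are assumptions made for illustration only; the paper does not state the parameters it used.

```python
def gamma_correct(gray, gamma=0.5, c=1.0):
    """Apply I_b = c * I**gamma to an 8-bit grayscale image stored as nested lists.

    gamma and c are illustrative defaults, not values taken from the paper.
    """
    corrected = []
    for row in gray:
        new_row = []
        for value in row:
            normalized = value / 255.0             # assumed: work on intensities in 0..1
            out = c * normalized ** gamma          # Equation (1)
            # Scale back to 0..255 and clip to the valid 8-bit range.
            new_row.append(min(255, max(0, round(out * 255))))
        corrected.append(new_row)
    return corrected


image = [[0, 64], [128, 255]]
result = gamma_correct(image, gamma=0.5, c=1.0)
print(result)
```

With gamma below 1, mid-range gray levels are brightened while black and white stay fixed, which is one common way to recover detail in a low-contrast image.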

Electronics 2021, 10, 1986

Figure 3 shows that when the threshold is 105, the contour of the device is preserved, although the binarization effect on the temperature value is poor. When the threshold is 210, the contour of the device is invisible, but the region of the temperature value is well preserved, and there are no salt-and-pepper noise points in the maximum and minimum temperature areas. Therefore, an improved adaptive binarization method based on the infrared image histogram is proposed. The threshold is determined adaptively from the histogram of the infrared image after gray transformation and gamma correction, which yields the desired binarization result.
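The paper does not give the exact rule by which the threshold is read off the histogram, so the following is only a sketch of one plausible histogram-based selection: find the two dominant peaks and take the lowest bin between them. The function names, the `min_separation` parameter, and the toy histogram are all hypothetical.

```python
def valley_threshold(hist, min_separation):
    """Return the index of the lowest bin between the two dominant histogram peaks.

    Assumes the histogram is bimodal and that the peaks are at least
    min_separation bins apart; raises ValueError if no bin is that far away.
    """
    first_peak = max(range(len(hist)), key=lambda i: hist[i])
    far_bins = [i for i in range(len(hist)) if abs(i - first_peak) >= min_separation]
    second_peak = max(far_bins, key=lambda i: hist[i])
    lo, hi = sorted((first_peak, second_peak))
    return min(range(lo, hi + 1), key=lambda i: hist[i])


def binarize(gray, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [[1 if value > threshold else 0 for value in row] for row in gray]


# Toy 10-bin histogram with peaks at bins 2 and 7 and a trough at bin 5.
toy_hist = [1, 8, 20, 9, 3, 1, 4, 15, 6, 2]
threshold = valley_threshold(toy_hist, min_separation=3)
mask = binarize([[2, 5, 7], [9, 0, 6]], threshold)
print(threshold, mask)
```

A real 256-bin histogram is usually noisy, so it would typically be smoothed before peak picking; that step is omitted here to keep the sketch short.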

RoI Extraction Based on Contour Information
After pre-processing, the image of the substation equipment still contains irrelevant information, such as time and watermark. The processing speed, efficiency, and accuracy will be affected due to the image containing dense pixel information [36,37]. Therefore, it is necessary to locate and segment the RoI from the background. By analyzing the images, it can be found that the rectangular box of the temperature map in the binary image is completely preserved, and the position is relatively fixed with the maximum and minimum temperatures. Therefore, a pixel accumulation method based on the contour and position information for the location is proposed to obtain the RoI. The location flowchart is shown in Figure 4.
According to the number of column pixels of the image, continuous pixels are accumulated in the direction of the length of the rectangular box, which is selected to be equal to the number of columns with the continuous pixels. The short edge of the rectangular box is used as a reference to locate the pixel coordinates of the four corners. The region of the maximum and minimum temperatures is located according to the relative position relationship between the rectangular box and the temperature value.
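One way to picture the pixel accumulation step is sketched below, under the assumption that the long edges of the temperature-map box appear as the longest vertical runs of foreground pixels in the binary image. The helper names and the `min_len` parameter are hypothetical, and the paper's actual procedure (Figure 4) may differ in detail.

```python
def longest_vertical_run(binary, col):
    """Return (length, start_row) of the longest run of 1s in one column."""
    best_len = best_start = 0
    run_len = run_start = 0
    for row in range(len(binary)):
        if binary[row][col]:
            if run_len == 0:
                run_start = row
            run_len += 1
            if run_len > best_len:
                best_len, best_start = run_len, run_start
        else:
            run_len = 0
    return best_len, best_start


def locate_box(binary, min_len):
    """Return (top, bottom, left, right) of the box, or None if no column qualifies."""
    columns = [c for c in range(len(binary[0]))
               if longest_vertical_run(binary, c)[0] >= min_len]
    if not columns:
        return None
    left, right = columns[0], columns[-1]
    length, top = longest_vertical_run(binary, left)
    return top, top + length - 1, left, right


toy_binary = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],   # isolated noise pixel, too short to qualify
]
box = locate_box(toy_binary, min_len=4)
print(box)
```

Once the four corners are known, the maximum and minimum temperature regions can be cropped at fixed offsets from the box, since their positions relative to it are described as stable.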

To accurately segment the characters, the vertical projection method is used to project the RoI in the vertical direction, as expressed in Equation (2):

$$V_x = \sum_{m=1}^{n} f(x, m), \qquad x = 1, 2, \ldots, I_x \qquad (2)$$

where $V_x$ is the vertical projection of the image, $f(x, m)$ is the pixel value of column $x$ and row $m$ in the RoI, and $n$ and $I_x$ are the lengths of rows and columns in the RoI, respectively.
Figure 5 shows the vertical projection results for the RoI. The image is first scanned from left to right, and the pixel values of each column are then accumulated [38]. By counting the number of pixels in each column, it can be observed that the cumulative value of pixels undergoes a sudden change at the junction of two characters, where the cumulative value is at its minimum. In Figure 5, two peaks correspond to the boundary area of the characters, which shows that there are two characters in the region. Subsequently, the location of the characters is determined according to their characteristics. By selecting the sudden change as the segment point, the temperature values are segmented.
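The projection-and-split idea can be sketched as follows. This simplified version assumes a clean binary RoI in which characters are separated by at least one empty column, so it cuts wherever the projection drops to zero; the paper instead cuts at the sudden change in the projection, and the function names here are hypothetical.

```python
def vertical_projection(roi):
    """Sum the pixel values of each column of a binary RoI (Equation (2))."""
    return [sum(row[x] for row in roi) for x in range(len(roi[0]))]


def split_characters(projection):
    """Return inclusive (start, end) column ranges of non-empty runs."""
    segments = []
    start = None
    for x, value in enumerate(projection):
        if value > 0 and start is None:
            start = x                          # a character begins
        elif value == 0 and start is not None:
            segments.append((start, x - 1))    # a character ended at the previous column
            start = None
    if start is not None:
        segments.append((start, len(projection) - 1))
    return segments


toy_roi = [
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 0, 1],
]
projection = vertical_projection(toy_roi)
segments = split_characters(projection)
print(projection, segments)
```

When neighbouring characters touch, the projection does not reach zero, and a cut at the local minimum would be needed instead.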

Recognition Based on CNN
In the recognition stage, a CNN is designed according to the characteristics of the temperature values, as shown in Figure 6. C1 and C2 are convolution layers, P1 and P2 are pooling layers, and FC is the fully connected layer.

Both convolution layers use a 5 × 5 kernel with a stride of 1 to extract features. Max-pooling is adopted in the pooling layer P1, with a kernel size of 1 × 1 and a stride of 1. In the pooling layer P2, a kernel size of 2 × 2 with a stride of 2 is used to further extract the image features, and the resulting feature map has a size of 4 × 4 × 12. Finally, the features are converted to a size of 1 × 192 in the fully connected layer (FC). To improve the accuracy of recognition, an SVM, which has the capability of nonlinear mapping and generalization, is employed instead of Softmax as the classifier using the features from FC. The temperature images obtained by pre-processing are used for training, and the network weights are updated by the adaptive moment estimation (Adam) method. Specifically, the collected infrared images of substation equipment are input into the trained CNN model to automatically recognize the temperature values, and the abnormal infrared images with thermal defects are selected according to the temperature.
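The layer sizes above can be checked with simple shape arithmetic. The input size is not stated in the text; the sketch below assumes a 16 × 16 input and no padding, which is one choice that reproduces the reported 4 × 4 × 12 feature map and the 1 × 192 vector. The function names are hypothetical.

```python
def out_size(size, kernel, stride):
    """Spatial output size of a convolution or pooling layer without padding."""
    return (size - kernel) // stride + 1


def feature_map_side(input_side):
    """Trace the side length through C1, P1, C2, and P2 as described in the text."""
    side = out_size(input_side, kernel=5, stride=1)   # C1: 5 x 5, stride 1
    side = out_size(side, kernel=1, stride=1)         # P1: 1 x 1, stride 1
    side = out_size(side, kernel=5, stride=1)         # C2: 5 x 5, stride 1
    side = out_size(side, kernel=2, stride=2)         # P2: 2 x 2, stride 2
    return side


side = feature_map_side(16)          # 16 is an assumed input side length
flattened = side * side * 12         # 12 channels after C2, as reported
print(side, flattened)
```

Under these assumptions the side length goes 16, 12, 12, 8, 4, and 4 × 4 × 12 gives the 192 features passed to the SVM.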

Process of the Proposed Method
Based on the above algorithms, the process of the proposed method is displayed in Figure 7.
(1) Image acquisition. The infrared images of substation equipment, such as insulators, high-voltage bushings, and transfer switches, are acquired by the infrared thermal imager.
(2) Image pre-processing. Gray transformation and gamma correction are carried out first, and the improved adaptive binarization method is then used to remove the complex background.

Experiment and Results
The hardware configuration is an Intel(R) Core(TM) i5-10400F @ 2.90 GHz with 16.0 GB RAM and an NVIDIA GTX 2060, and the software is MATLAB 2019b.

T-IR11 Dataset
The experimental images were collected with a FLIR infrared thermal imager (T420, FLIR Systems, Inc., Wilsonville, OR, USA) from substation equipment in the Jiangsu area. The measuring equipment is shown in Figure 8a. The infrared images have a resolution of 320 × 240 pixels and cover six types of substation equipment: insulators, bushings, transfer switches, lightning arresters, circuit breakers, and transformers. Some of the images are shown in Figure 8b.
Based on 600 infrared images, a temperature value dataset with 2200 images was established, evenly distributed across 11 labels (the digits "0" to "9" and the symbol "-"), as shown in Figure 8c. The dataset was divided into training and testing sets at a ratio of 8:2, containing 1760 and 440 images, respectively. Four hundred images were used to test the proposed temperature value recognition method.
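The split sizes follow directly from the numbers above: 2200 images over 11 labels gives 200 per label, and an 8:2 split gives 160 training and 40 testing images per label. A per-label (stratified) split is sketched below; the shuffling, the seed, and the function name are assumptions, since the paper does not describe how images were assigned to each set.

```python
import random


def stratified_split(labels, train_ratio=0.8, seed=0):
    """Split sample indices so every label keeps the same train/test ratio."""
    by_label = {}
    for index, label in enumerate(labels):
        by_label.setdefault(label, []).append(index)
    rng = random.Random(seed)            # fixed seed only for reproducibility
    train, test = [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_ratio)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# 11 labels with 200 images each, mirroring the dataset sizes in the text.
all_labels = [symbol for symbol in "-0123456789" for _ in range(200)]
train_idx, test_idx = stratified_split(all_labels)
print(len(train_idx), len(test_idx))
```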

Evaluation Method
Precision, recall, and F1 score were adopted to evaluate the performance of the proposed method. The recognition results can be divided into true positive (TP), false positive (FP), true negative (TN), and false negative (FN) [39,40]. Precision is the percentage of actual positive samples among all samples predicted as positive. Recall is the percentage of correctly predicted positive samples among all actual positive samples. The F1 score considers both precision and recall, which makes it a good comprehensive evaluation index. Based on the confusion matrix of the experiment, the three indices are expressed as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
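As a worked illustration of these indices, the sketch below builds a confusion matrix from the test outcome reported in this paper (40 test images per label, with one "3" read as "2" and one "8" read as "3") and computes the per-label scores. The data structure and function name are choices made for this example only.

```python
labels = ["-"] + [str(digit) for digit in range(10)]

# confusion[true][predicted] = number of test images
confusion = {t: {p: 0 for p in labels} for t in labels}
for label in labels:
    confusion[label][label] = 40
confusion["3"]["3"], confusion["3"]["2"] = 39, 1   # one "3" recognized as "2"
confusion["8"]["8"], confusion["8"]["3"] = 39, 1   # one "8" recognized as "3"


def per_class_scores(confusion, label):
    """Return (precision, recall, F1) for one label.

    Assumes the label was predicted at least once and has at least one sample.
    """
    tp = confusion[label][label]
    fp = sum(confusion[t][label] for t in confusion if t != label)
    fn = sum(count for p, count in confusion[label].items() if p != label)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


correct = sum(confusion[label][label] for label in labels)
total = sum(sum(row.values()) for row in confusion.values())
print(per_class_scores(confusion, "3"), correct / total)
```

Label "3" loses recall through the "3" read as "2" and loses precision through the "8" read as "3", so both come out at 39/40 = 97.5%, while the overall accuracy is 438/440.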

Training Process
The parameters of the CNN model are set as follows: the learning rate is 0.5, the decay rate is 0.99, the loss function coefficient is 0.01, the batch size is 55, and the number of iterations is 1600. The parameters of the SVM are set as follows: the kernel function is the radial basis function (RBF), the kernel parameter is 0.09, and the penalty parameter is 1. The loss and accuracy curves are shown in Figure 9. The loss decreases rapidly during the first 300 iterations and then gradually converges until 600 iterations. Thereafter, the loss tends to zero until the end of training. The recognition accuracy increases gradually after 600 iterations and improves sharply to reach 99.55% after 650 iterations.
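For reference, the RBF kernel used by the SVM can be written in a few lines. The sketch assumes the common form K(u, v) = exp(-gamma * ||u - v||^2) and treats the reported kernel parameter 0.09 as gamma; the paper does not say whether 0.09 denotes gamma or a width parameter, so this reading is an assumption.

```python
import math


def rbf_kernel(u, v, gamma=0.09):
    """RBF kernel exp(-gamma * ||u - v||^2); gamma = 0.09 is an assumed reading."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])    # identical vectors
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])    # squared distance 1
far = rbf_kernel([0.0, 0.0], [10.0, 0.0])    # squared distance 100
print(same, near, far)
```

The kernel equals 1 for identical feature vectors and decays toward 0 as they move apart, which is what gives the SVM its nonlinear decision boundary over the 192 FC features.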

Experiment Results
The infrared images were first pre-processed. The results of the gray transformation and gamma correction are shown in Figure 10a,b. As shown in Figure 10c, the threshold of 218 is selected adaptively according to the histogram of the infrared image. The binarized image is shown in Figure 10d; the temperature value region is independent of the background color, watermark, and brightness.
The segmentation results are shown in Figure 11. The rectangular box and the temperature values "81" and "38" are accurately located based on contour and relative position information, as shown in Figure 11a. Figure 11b shows that the RoI is clearly extracted from the background without salt-and-pepper noise points. In Figure 11c, the characters are effectively segmented by utilizing the peak feature of the vertical projection, and the characters "8" and "1" are cropped to a uniform size.
After training, the overall accuracy and the accuracy for each temperature value label are shown in Table 1. A total of 438 temperature value images are correctly recognized, and the overall accuracy is 99.55%. Only two images, one "3" and one "8", are recognized incorrectly, and the recognition accuracy of the other labels is 100%.

Table 1. Distribution and recognition of labels.

| Label | - | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Test number | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| Correct number | 40 | 40 | 40 | 40 | 39 | 40 | 40 | 40 | 40 | 39 | 40 |
| Accuracy (%) | 100 | 100 | 100 | 100 | 97.5 | 100 | 100 | 100 | 100 | 97.5 | 100 |

Four hundred images were tested to further verify the performance of the proposed method on infrared images. Figure 12a shows the field data acquisition. In addition, a thermal defect detection system for substation equipment was designed to apply the proposed method, as shown in Figure 12b. It is divided into three modules: (1) image loading; (2) image recognition; and (3) defect detection.
Finally, 398 images were correctly recognized, giving an accuracy of 99.50%, which is consistent with the test accuracy. The experimental results show that the proposed method has good generalization capability and that the temperature value region can be accurately extracted from the infrared image.

Discussion
Presently, there are several binarization methods including the 2-model, p-quantile, Otsu, and maximum entropy thresholding methods [41]. The results of binarization through these methods are shown in Figure 13. It can be observed that the previous binarization methods weaken the foreground information and the binarization effect is not ideal. In addition, there are numerous salt and pepper noise points in the temperature region that affect the accuracy of recognition. This may be due to the difference in color composition between infrared and visible images, which makes some traditional methods unsuitable for processing infrared images.

To better analyze the recognition of labels in the testing set, a confusion matrix was produced. Table 2 shows that label "8", when recognized incorrectly, is most likely to be confused with "3", and that label "3" is recognized as "2" in the test. The main reason is that the right sides of "3" and "8" are similar, although their left sides are different. However, in image segmentation, the boundary difference between the pixel peaks is not obvious, so the left side of "8" is easily disconnected when it is segmented. This leads to confusion between "8" and "3", and the confusion between "3" and "2" arises in a similar way.
Table 2. Confusion matrix of labels.
Table 3 shows the precision, recall, F1 score, and 10-fold cross-validation results of the proposed method. As can be seen, the precision and recall of label "3" are the lowest, both 97.50%, because it is easily confused with "8" and "2". The F1 scores are close to 1 and vary little across labels. These three indices demonstrate that the proposed method is suitable for temperature recognition.
Comparison experiments were employed to verify the effectiveness of the proposed method. Figure 14 shows that using a trained SVM instead of Softmax for classification gives better recognition accuracy than the CNN alone.
In addition, the training loss of the CNN+SVM model is lower than that of the CNN throughout the whole process, while its test accuracy is higher both at the beginning and at the end. The training process of the CNN+SVM model is more stable than that of the traditional CNN model and converges rapidly, which can save training time effectively.
Table 4 displays the comparative experiment of the classic HOG+SVM, PCANet, traditional CNN, and the proposed method. According to the experimental results, the training accuracies of the proposed method and the PCANet method are both higher than 99%. However, the testing accuracy of the proposed method reaches 99.50%, which is superior to the other methods. This shows that although PCANet has high training accuracy for images with low background interference, it is prone to overfitting during the training process. When applied to infrared images, it is more susceptible to interference from complex background information. The classic HOG+SVM method achieves the lowest training and testing accuracy because it is sensitive to noise and is not robust to scale variations. Compared with the traditional CNN model, replacing the Softmax layer with a trained SVM classifies more accurately in both training and testing. The proposed method is therefore more effective, and its generalization capability is significantly improved.
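The per-label precision, recall, and F1 score discussed for Table 3 can be reproduced from the error pattern described in the text. The sketch below rebuilds a confusion matrix from Table 1 (40 test images per label) and the two stated misrecognitions ("8" read as "3", "3" read as "2"). Treating each error as a single image is an assumption inferred from the 39/40 counts, and the function names are hypothetical.

```python
labels = ["-", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
n_per_label = 40

# (true label, predicted label) -> number of images; assumed from the prose.
errors = {("8", "3"): 1, ("3", "2"): 1}


def build_confusion(labels, n_per_label, errors):
    """Build cm[true][pred] starting from a perfect diagonal, then move the errors."""
    cm = {t: {p: 0 for p in labels} for t in labels}
    for t in labels:
        cm[t][t] = n_per_label
    for (t, p), count in errors.items():
        cm[t][t] -= count
        cm[t][p] += count
    return cm


def scores(cm, label):
    """Return (precision, recall, F1) for one label."""
    tp = cm[label][label]
    predicted = sum(cm[t][label] for t in cm)   # column sum
    actual = sum(cm[label].values())            # row sum
    precision = tp / predicted
    recall = tp / actual
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


cm = build_confusion(labels, n_per_label, errors)
p3, r3, f3 = scores(cm, "3")
p8, r8, f8 = scores(cm, "8")
p2, r2, f2 = scores(cm, "2")
```

Under these assumptions, label "3" has 39 true positives out of 40 predicted and 40 actual images, so its precision and recall are both 97.5%, in line with the values quoted from Table 3.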

Figure 14. Comparison of the CNN and CNN+SVM models (panels: loss and accuracy).

Conclusions
This paper presents an efficient thermal defect detection method based on infrared images using a CNN. To overcome the problem of a complex background, an improved pre-processing method is proposed based on infrared image characteristics. In the segmentation stage, the RoI is effectively extracted according to contour and position information. Then, the T-IR11 dataset of temperature values is established. Finally, a CNN+SVM model is constructed to extract features and is trained for classification, thus realizing thermal defect detection for substation equipment. The conclusions can be drawn as follows:
(1) Compared with other binarization methods, the proposed improved pre-processing method can accurately remove irrelevant information and retain effective regions by selecting the appropriate threshold adaptively. In addition, combined with contour information, the position of the temperature values can be accurately segmented, solving the problem of temperatures overlapping the background.
(2) The T-IR11 dataset established in this study is crucial for thermal defect detection. The dataset, which contains 11 labels, is extracted from infrared images collected in the actual environment and provides the foundation for the subsequent defect detection work.
(3) The CNN model is constructed to extract features, and the trained SVM is used to replace the Softmax layer for classification. Precision, recall, and F1 score indices are used to evaluate the performance of the proposed method, and 10-fold cross-validation is employed on the dataset. The accuracy of the proposed method is 99.50%, which is the highest compared with the previous studies on infrared images.
(4) The proposed method realizes the rapid screening and recording of thermal defect images. It is beneficial for reducing the labor intensity of power grid inspectors and improving work efficiency. In the future, the speed of recognition needs to be further improved to realize real-time recognition and automatic recording. Moreover, the training samples will be augmented to improve the accuracy of the proposed method for engineering applications.
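The 10-fold cross-validation mentioned in conclusion (3) partitions the dataset into ten disjoint folds and uses each fold once for testing. A minimal index-splitting sketch is given below. The sample count of 110 is purely illustrative, since the size of the T-IR11 dataset is not stated in this section, and the function name and the shuffling seed are assumptions.

```python
import random
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering every sample once as test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # reproducible shuffle
    folds = [indices[i::k] for i in range(k)]     # k nearly equal folds
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        splits.append((train, test))
    return splits


# Illustrative sample count only (hypothetical).
splits = k_fold_indices(110, k=10)
for fold_id, (train, test) in enumerate(splits):
    print(fold_id, len(train), len(test))
```

In practice the per-fold metric (for example, accuracy) would be averaged over the ten folds; a stratified split that keeps the 11 labels balanced in every fold is a common refinement.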