Comparison of Traditional Image Segmentation Methods Applied to Thermograms of Power Substation Equipment

Abstract: The variation in the thermal state of electrical energy substation equipment is normally associated with natural wear or equipment failure. This can be detected by infrared thermography, but analyzing these images manually takes a long time. Computational analysis allows an automated, more agile, and more efficient detection of overheated regions in thermographic images. To do so, it is necessary to segment the region of interest in the images; however, the results may diverge depending on the technique used. Thus, this article presents the improvement of four different techniques implemented in Python and applied in a substation under real operating conditions for a period of eleven months. The performance of the four methods was compared using eight statistical performance measures, and the efficiency was measured by the runtime. The segmentation results showed that the methods based on a threshold (Otsu and Histogram-Based Threshold) were fast, with processing times of 0.11 to 0.24 s, but caused excessive segmentation, presenting the lowest accuracy (0.160 and 0.444) and precision (0.004 and 0.049, respectively). The clustering-based methods (Cluster K-means and Fuzzy C-means) showed similar results to each other but were more accurate (0.936 to 1.000), more precise (0.965 to 1.000), and slower, with 2.55 and 38.8 s, respectively, compared to the threshold methods. The Fuzzy C-means method obtained the highest values of specificity, accuracy, and precision among the methods under analysis, followed by the Cluster K-means method.


Introduction
Substations play an essential role in the electrical power system, as they are responsible for the safe and reliable operation of the electrical network [1,2]. The failures observed in substations are predominantly caused by overloads, timeworn equipment, and unbalanced loads [3]. These failures can make corrective maintenance more costly, in addition to compromising the operational safety and reliability of the network [1,3,4]. Monitoring the equipment through infrared thermal images is one of the recommended methods to detect these failures, as an abnormal increase in temperature is linked to equipment problems and can lead to a severe failure of the power system [5][6][7]. Therefore, the fast and accurate detection of such thermal phenomena is essential to ensure system integrity and reduce financial losses [1,5].
Infrared thermography (IRT) is a technique that consists of capturing images of the infrared radiation emitted by objects [6,8]. This technique has some advantages, as it is contact-free and free from electromagnetic interference from the substation equipment itself [8][9][10]. In addition, it is safe for those involved, and it allows us to monitor electrical equipment in real time, without interrupting the system's operation [8,10]. On the other hand, IRT also has disadvantages, such as the measurement of only the surface temperatures, interference from environmental conditions, reading errors caused by the different types of emissivity of materials that make up the equipment, the high cost of high-quality cameras, and the need for specific technical knowledge [11].
Another aspect regarding the use of IRT in the traditional way is the need for a team to travel to the location to monitor the substation equipment, followed by the long time spent screening images to identify the indicators of failures [7]. Afterward, another team must return to carry out the repair. In addition, if the failure occurs between inspections, it can lead to an interruption in the system's operation [5,7,12] because, according to the Brazilian standard ABNT-NBR-15763, a thermal inspection must be performed every six months. A consistent solution to the issues presented above is the application of the computational analysis of infrared images, increasing the efficiency of diagnostics based on infrared detection [2][3][4].
Several works have approached computational analysis in the field of thermography, using different image processing and computer vision techniques, as well as artificial intelligence algorithms for problem prediction and classification, which may require a high level of processing. In recent decades, significant progress has been made in image processing technology [13,14]. Processing must detect the overheated region of the electrical equipment, called the region of interest (ROI), then extract characteristic information from the ROI, and finally determine the indicators of failure according to predefined classification criteria. Among image processing techniques, the overheated-area detection technique predominantly found in the literature across different application areas is infrared image segmentation [3,5-8,12].
Different segmentation methods are found in the literature; these are classified as region-based [15], contour-based [5], thresholding-based [12], and clustering-based [16]. However, the methods are not equally suitable for all types of images, because they are usually developed for specific image types and applications. Thus, an algorithm may be suitable for one type of image but not for another. This is the case with segmentation methods developed for optical images, which do not give good results when applied to infrared images. One example is the contour-based method, which had its use discontinued due to malfunction and was not tested in this work. This occurs because of the pixel density and low-intensity contrast of infrared images, which reduce the definition between the edges of the image objects, making it difficult to divide the image into two planes, with the ROI being precisely in the foreground [8,17]. Another influencing factor is the location of the equipment; sometimes, it is installed in a very hostile environment, which can cause noise in the image acquisition.
The fields of application of the methods are varied, but authors usually do not clearly show the details of the algorithm steps, which makes the development of the methods difficult. Applying these methods in substations with hundreds of pieces of equipment can effectively reduce the inspection workload and the risk of making the wrong decision.
Therefore, this work seeks the best "absolute" method among the improvements of four traditional segmentation methods for detecting the overheating of electrical equipment. The contribution of this article is the proposal to detect hotspots and extract the characteristic information from the ROI to replace human screening, making the diagnosis more agile and efficient with a moderate level of image processing. For this, several standardized manual acquisitions were made weekly over a period of eleven months, always during the morning to have similar load conditions, with approximately 7500 images at the end.

Methodology
The analysis was developed in a real application, where all hotspots above 50 °C revealed in a substation under real operating conditions were evaluated between February and November 2021. In this period, five pieces of equipment with hotspots were identified and monitored.
It is a step-down distribution substation, with an area of approximately 6000 m², which has a dual 69 kV supply and lowers the voltage level to eight 13.8 kV bays. The power processed by the substation at peak hours is normally around 30 MW.
During this period, the substation was inspected weekly using a FLIR ® T540 thermal camera, whose technical specifications are presented in Table A1 of Appendix A.
This thermal camera captures two images simultaneously, as shown in Figure 1: an optical image and an infrared image. Because the ROI is relatively small in relation to the whole image, the image processing results are shown zoomed in in the following figures for better visualization, with the zoomed area indicated by the red squares in Figure 1. Fragmenting the image into regions facilitates its analysis and allows more information to be obtained about an ROI [15]. The ROI is defined as the segmented region of the thermal image that contains the object of interest. The critical information acquired from it is the maximum temperature in that area and the average temperature of the pixels that form the ROI. In the present work, segmentation was performed using four different techniques implemented in Python, always on the same computer, with an i5-3230M 3rd-generation processor, 4 GB of DDR3 1600 MHz RAM, and an Intel HD Graphics 4000 graphics card, and their results are compared in Section 3. These techniques were applied to infrared images, with each pixel having a 16-bit temperature representation.
For this, it was first necessary to read the metadata of the radiometric files and extract the Raw Thermal Image file, which has pixel values between 0 and 65,535 (16 bits), corresponding to the A/D conversion of the radiation captured by the camera sensor. The equations in Appendix B take each pixel of this file as input, and the result is a matrix of temperatures in degrees Celsius in floating-point representation [18], which is subsequently normalized. This normalization is necessary because it is not possible to store an image with pixels using floating-point representation, so we chose to normalize the values so that they fit in a 16-bit integer variable through Equation (1):

$$T_{norm} = \frac{T - T_{min}}{\Delta T} \times 65535 \tag{1}$$

where $T$ is the temperature in degrees Celsius, $T_{min}$ is the minimum temperature value in the image, and $\Delta T$ is the difference between the maximum and minimum temperatures of the image. The maximum, minimum, and delta temperature values are stored in a supporting .txt file. By applying the inverse of the equation, it is possible to retrieve the temperature values in degrees Celsius. Then, the four segmentation techniques were improved to work with 16-bit files and were performed as detailed in the algorithm steps.
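As a minimal sketch of the normalization in Equation (1) and its inverse, the functions below map a flat list of temperatures to 16-bit codes and back. The function names, the use of rounding, and the handling of a constant image are assumptions made for illustration; the original implementation is not shown in the text.

```python
def normalize_to_uint16(temps_c):
    """Map temperatures (deg C) to integer codes in 0..65535 (Equation (1))."""
    t_min = min(temps_c)
    delta = max(temps_c) - t_min
    if delta == 0:
        # Constant image: every pixel maps to 0 (assumed convention).
        return [0] * len(temps_c), t_min, delta
    # Rounding to the nearest integer is an assumption; truncation is also possible.
    codes = [round((t - t_min) / delta * 65535) for t in temps_c]
    return codes, t_min, delta


def denormalize(codes, t_min, delta):
    """Inverse of Equation (1): recover temperatures in deg C."""
    return [c / 65535 * delta + t_min for c in codes]


temps = [21.5, 35.0, 130.3]
codes, t_min, delta = normalize_to_uint16(temps)
restored = denormalize(codes, t_min, delta)
```

With a temperature span of about 109 °C, one 16-bit step corresponds to roughly 0.002 °C, so the quantization error is far below the camera accuracy quoted later in the text.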
First method: Image segmentation using the Otsu method consists of three steps: threshold determination, image thresholding, and ROI location. This method automatically finds the best threshold for the image. This threshold is determined by minimizing the variance within classes or, equivalently, by maximizing the variance between classes [17]. Otsu suggested that the optimal threshold can be obtained from Equation (2):

$$\sigma_B^2(t) = \omega_f(t)\,\omega_b(t)\,\left[\mu_f(t) - \mu_b(t)\right]^2 \tag{2}$$

where $\omega_f$ and $\omega_b$ are the foreground and background probabilities, and $\mu_f$ and $\mu_b$ are the foreground and background mean values, respectively; the optimal threshold $t$ is the one that maximizes $\sigma_B^2(t)$. Then, the image is thresholded into the two planes, with the foreground being the location of the ROI. To improve this method, a morphological erode filter is applied, which eliminates edges and slightly reduces the segmented area.
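The threshold search described above can be sketched as follows. This is a plain-Python illustration that maximizes the between-class variance over the values present in a flat list of 16-bit pixel codes; the function name and the convention that pixels at or below the threshold are background are assumptions, and the morphological erosion step is not included.

```python
from collections import Counter


def otsu_threshold(pixels):
    """Return the pixel value t that maximizes the between-class variance."""
    counts = Counter(pixels)
    levels = sorted(counts)
    total = len(pixels)
    total_sum = sum(level * n for level, n in counts.items())

    best_t, best_var = levels[0], -1.0
    n_bg = 0      # number of background pixels (values <= t)
    sum_bg = 0    # sum of background pixel values
    for level in levels[:-1]:  # the last level would leave the foreground empty
        n_bg += counts[level]
        sum_bg += level * counts[level]
        n_fg = total - n_bg
        mu_bg = sum_bg / n_bg
        mu_fg = (total_sum - sum_bg) / n_fg
        var_between = (n_bg / total) * (n_fg / total) * (mu_bg - mu_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, level
    return best_t


# Hypothetical data: a cool background and a small hot region.
pixels = [1000] * 90 + [1010] * 5 + [60000] * 5
t = otsu_threshold(pixels)
roi = [p for p in pixels if p > t]
```

Iterating only over the values that actually occur keeps the search short for 16-bit data, where most of the 65,536 possible levels are usually empty.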
Second method: Image segmentation using the Histogram-Based Threshold method consists of three steps: threshold determination, image thresholding, and ROI location. This method also splits the image into the foreground and background but automatically determines the best threshold based on the shape properties of the histogram using different techniques, such as the analysis of the distance of the convex hull histogram, the smoothing of two peaks through autoregressive modeling, the coarser rectangular approximation for the histogram lobes, or the analysis of peaks and valleys [20]. The threshold is in the visible and deep valley of the bimodal histogram [21]. The algorithm starts with the construction of the input image histogram by finding a threshold that separates the histogram into two parts according to the probability criteria for the foreground and background areas. These are calculated using Equation (3) [20].
The background is expressed as $B$, $0 \le i \le t$, and the foreground is expressed as $F$, $t < i \le L$, where $t$ is the threshold and $L$ is the maximum temperature value of a pixel, which, in this case, is 65,535 for a 16-bit image.
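One of the variants mentioned above, the analysis of peaks and valleys, can be sketched as below. This is not the implementation used in this work: the number of bins, the rule used to pick the second peak (bin count weighted by the squared distance from the first peak), and the function name are all assumptions chosen for illustration.

```python
def valley_threshold(pixels, n_bins=64):
    """Threshold at the lowest histogram bin between the two dominant peaks."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return float(lo)
    width = (hi - lo) / n_bins
    hist = [0] * n_bins
    for p in pixels:
        hist[min(int((p - lo) / width), n_bins - 1)] += 1

    # First peak: the fullest bin. Second peak: a full bin far from the first.
    first = max(range(n_bins), key=lambda k: hist[k])
    second = max(range(n_bins), key=lambda k: hist[k] * (k - first) ** 2)
    left, right = sorted((first, second))
    if right - left < 2:
        # No bin lies strictly between the peaks: use their shared boundary.
        return lo + right * width
    # First (leftmost) bin with the lowest count between the two peaks.
    valley = min(range(left + 1, right), key=lambda k: hist[k])
    return lo + (valley + 0.5) * width


# Hypothetical bimodal data: cool background and a small hot region.
pixels = [10] * 100 + [12] * 80 + [200] * 30 + [205] * 20
t = valley_threshold(pixels)
roi = [p for p in pixels if p > t]
```

When the histogram is not clearly bimodal, the second peak chosen by this rule may not correspond to a real hot region, which is consistent with the restriction to bimodal histograms noted in the text.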
The next two methods are clustering methods, in which the image is divided into regions that are represented by their respective centers; the ROI is the cluster with the highest center temperature, which contains the pixel with the highest temperature in the image. The first step in applying these algorithms is to input the number of clusters, which is typically entered manually in other research, although this number may vary by image. To make this step automatic, an improvement was developed in this work through a preliminary algorithm for the automatic determination of the best number of clusters. Using the curve of the within-cluster sum-of-squares (Cluster K-means) and of the final fuzzy partition coefficient (Fuzzy C-means), it was possible to apply the elbow method, which detects a sharp reduction in the error, forming an elbow before stabilization [22]. Figure 2 shows the within-cluster sum-of-squares curve as the number of clusters tested varies from 2 to 30; the elbow, that is, the greatest distance between the green line and the curve, occurs at eight clusters. In this example, the ideal number of clusters is therefore 8.

Figure 2. Within-cluster sum-of-squares curve resulting from the application of the elbow method, with the tested clusters varying from 2 to 30 and the "elbow" being identified in 8 clusters. The red arrow shows the best number of clusters, which is the greatest distance between the green and blue curves.
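The elbow selection can be sketched as picking the point of the curve that lies farthest from a straight reference line. The sketch assumes that the green line joins the first and last points of the curve, which the text does not state explicitly; the function name and the sample values are hypothetical.

```python
import math


def elbow_point(ks, scores):
    """Return the k whose point is farthest from the line joining the curve ends."""
    x1, y1 = ks[0], scores[0]
    x2, y2 = ks[-1], scores[-1]
    length = math.hypot(x2 - x1, y2 - y1)
    best_k, best_dist = ks[0], -1.0
    for k, s in zip(ks, scores):
        # Perpendicular distance from (k, s) to the line through the end points.
        dist = abs((y2 - y1) * k - (x2 - x1) * s + x2 * y1 - y2 * x1) / length
        if dist > best_dist:
            best_k, best_dist = k, dist
    return best_k


# Hypothetical within-cluster sum-of-squares values for k = 2..6.
ks = [2, 3, 4, 5, 6]
wcss = [100.0, 20.0, 15.0, 12.0, 10.0]
best_k = elbow_point(ks, wcss)
```

Because all distances are measured to the same line, rescaling either axis does not change which k is selected.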
Third method: Image segmentation using the Cluster K-means method consists of three steps: reporting the number of clusters, image clustering, and ROI location. This is a vector quantization method, which divides the dataset into k clusters in an iterative process, with each cluster represented by the average of all of its data [23]. For each cluster, the midpoint, called the centroid, is calculated. Then, pixels are chosen at random, and each is assigned to the cluster whose centroid is at the shortest distance. After associating all of the pixels, new centroids are calculated. The loop is repeated and is only interrupted when the centers no longer change. Thus, the within-cluster sum-of-squares [24], that is, the variance given by Equation (4), is minimized:

$$J = \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left\lVert x_i^{(j)} - c_j \right\rVert^2 \tag{4}$$

where $\lVert x_i^{(j)} - c_j \rVert^2$ is the distance between the pixel $x_i^{(j)}$ and the centroid $c_j$, and $n_j$ is the number of pixels in cluster $j$.

Fourth method: Image segmentation using the Fuzzy C-means (FCM) method consists of three steps: determining the number of clusters, image clustering, and ROI location. This method performs clustering based on fuzzy theory and provides more flexible clustering results, starting with initial cluster centers or arbitrary association values [25]. To improve this method, a standard seed was applied, avoiding the arbitrary assignment of the initial center in each execution of the algorithm. Each pixel is assigned a degree of association with each cluster, and the quadratic criterion is minimized, where clusters are represented by their respective centers [12]. The FCM criterion (J) is given by Equation (5):

$$J = \sum_{i=1}^{N} \sum_{j=1}^{C} u_{ij}^{m} \left\lVert x_i - c_j \right\rVert^2 \tag{5}$$

where $u_{ij}$ is the degree of association of pixel $x_i$ with cluster $j$, $c_j$ is the center of cluster $j$, $N$ is the number of pixels, $C$ is the number of clusters, and $m > 1$ is the fuzziness exponent.

FCM iterates up to a maximum number of times while updating the degrees of association $u_{ij}$ and cluster centers $c_j$, found through Equations (6) and (7):

$$u_{ij} = \frac{1}{\sum_{l=1}^{C} \left( \dfrac{\lVert x_i - c_j \rVert}{\lVert x_i - c_l \rVert} \right)^{2/(m-1)}} \tag{6}$$

$$c_j = \frac{\sum_{i=1}^{N} u_{ij}^{m}\, x_i}{\sum_{i=1}^{N} u_{ij}^{m}} \tag{7}$$

Once the segmentation processes were completed, the performance of the different segmentation methods was compared against a reference based on human evaluation. To standardize and perform a fair comparison between the methods, a reference standard was generated. Initially, this was the region of overheating marked by a technician when detecting a hotspot during an inspection, that is, the region that would be reported in the electrical-equipment-overheating technical report based on the analysis conducted by the technician in FLIR ® Tools 6.4.18039.1003 software (Wilsonville, United States), adopted here as the human standard. Subsequently, it was found that the tool used had a limitation in the selection of the ROI due to its low accuracy, which, at best, would make reaching the expected result very time-consuming. Therefore, an improved human standard was proposed and adopted as the reference or ground truth in this work. It was obtained by processing the IRT image through an algorithm in Python, in which the threshold was entered manually after being defined by human criteria according to visual perception. Then, the image was separated into the background and foreground. The next section explains the selection of both patterns.
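Returning to the fourth method, the FCM loop and the ROI location step can be sketched in plain Python on a flat list of pixel values. The sketch uses the textbook FCM membership and center updates, assumed here to correspond to Equations (6) and (7); the evenly spaced initial centers are a deterministic stand-in for the fixed seed mentioned in the text, and the function names, the fuzziness exponent of 2, and the stopping tolerance are assumptions.

```python
def fuzzy_c_means(pixels, n_clusters, m=2.0, max_iter=100, tol=1e-6):
    """One-dimensional FCM; returns (centers, memberships)."""
    lo, hi = min(pixels), max(pixels)
    # Deterministic initial centers, evenly spaced over the value range.
    centers = [lo + (hi - lo) * (j + 0.5) / n_clusters for j in range(n_clusters)]
    power = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(max_iter):
        memberships = []
        for x in pixels:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0.0:
                # Pixel coincides with one or more centers: split membership among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d_j / d_l) ** power for d_l in dists) for d_j in dists]
            memberships.append(row)
        new_centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # keep an unused center in place
            else:
                new_centers.append(sum(w * x for w, x in zip(weights, pixels)) / total)
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships


def hottest_cluster_mask(pixels, n_clusters):
    """ROI location: pixels whose strongest membership is the hottest cluster."""
    centers, memberships = fuzzy_c_means(pixels, n_clusters)
    hot = max(range(n_clusters), key=lambda j: centers[j])
    return [max(range(n_clusters), key=lambda j: row[j]) == hot for row in memberships]


# Hypothetical temperatures in deg C: cool background plus three hot pixels.
pixels = [20.0] * 50 + [21.0] * 40 + [80.0, 82.0, 81.0]
centers, _ = fuzzy_c_means(pixels, 2)
mask = hottest_cluster_mask(pixels, 2)
```

Choosing the cluster with the highest center follows the ROI definition given for the clustering methods; an image would first be flattened to a list and the mask reshaped afterwards.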
Then, several parameters corresponding to statistical performance measures were calculated. To compare the results of automatic segmentations with the improved human standard, a binary classification of pixels was performed, where TP is true-positive pixels, FP is false-positive pixels, TN is true-negative pixels, and FN is false-negative pixels [26].
From these data, statistical performance measures were also calculated, that is, criteria for selecting the best method, namely, sensitivity, specificity, accuracy, positive predictive value (PPV) or precision, negative predictive value (NPV), false-positive rate (FPR), false discovery rate (FDR), and false-negative rate (FNR), through the formulas shown in Table 1 [27]. In addition to the statistical performance measures, the efficiencies of the methods in the form of the runtime were analyzed. For this metric, the runtime of a purely human analysis was compared with the automatic computational methods.
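The pixel classification and the eight measures listed above can be sketched as follows, using the standard definitions of each measure (assumed here to match Table 1, which is not reproduced in this text). The function names are illustrative, and masks are taken as flat sequences of booleans with the improved human standard as the reference.

```python
def confusion_counts(predicted, reference):
    """Count TP, FP, TN, FN pixels between a predicted mask and the reference mask."""
    tp = fp = tn = fn = 0
    for p, r in zip(predicted, reference):
        if p and r:
            tp += 1
        elif p and not r:
            fp += 1
        elif r:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def performance_measures(tp, fp, tn, fn):
    """Standard definitions of the eight measures; NaN when a denominator is zero."""
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "fpr": ratio(fp, fp + tn),
        "fdr": ratio(fp, fp + tp),
        "fnr": ratio(fn, fn + tp),
    }


# Eight TP and seven FP pixels, as reported for K-means in case 2;
# the TN count and FN = 0 are hypothetical placeholders.
measures = performance_measures(tp=8, fp=7, tn=1000, fn=0)
```

With eight TP and seven FP pixels, PPV is 8/15 ≈ 0.533 and FDR is 7/15 ≈ 0.467, which shows how a handful of false positives dominates these two measures when the ROI itself is only a few pixels.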

Results
In this section, the results of all hotspots detected in infrared images during the study period are presented, as well as analyzed and compared with each other regarding their performance and efficiency (runtime).
Case 1 detected a hotspot in a 13.8 kV Tandem disconnect switch (Figure 3a), and case 2 detected a hotspot in an insulator connection after passing through the circuit breaker at 69 kV (Figure 3b). Case 3 detected a hotspot on an expander bar of the 13.8 kV three-pole disconnect switch (Figure 3c), case 4 detected a hotspot in a 13.8 kV blade disconnect switch (Figure 3d), and case 5 detected a hotspot in a 13.8 kV-region three-pole disconnect switch connector (Figure 3e).
The pixel with the highest temperature in each infrared image was extracted, as can be seen in Figure 3a. After analyzing the images, a technician selected the ROI using the tools available in the FLIR ® Tools 6.4.18039.1003 software (Wilsonville, United States), resulting in the regions shown in Figure 3b,f,j,n,r. To maintain the same pattern of image processing and make it possible to compare them with the other methods, these ROIs were manually segmented by coloring the pixels using the GIMP ® image editing software and separating the IRT image into two planes, as can be seen in Figure 3c,g,k,o,s, termed standard human segmentation for each case.
As can be seen in these figures, a significant number of pixels were not selected due to the limitation mentioned above, which would compromise the analysis of automatic methods, thus requiring an improved human standard. The results of the improved human pattern segmentation are shown in Figure 3d,h,l,p,t. Then, the thermographic images were submitted to the automatic segmentation algorithms shown in this work. The results of these techniques are shown in Figure 4, where the ROI of the improved human pattern is superimposed over the automatic segmentations in order to make the process more visually accessible to the reader. During the analyzed period, varying environmental conditions were observed. Such conditions may influence the segmentation process when applied directly, as is the case of thermographic images on days with the presence of clouds.
As shown in Figure 4a,b,e,i,j,m,n,q,r, many pixels were marked as hotspots, i.e., FP (shown in white), due to the characteristics of infrared images, with a high pixel density and soft edges between the image objects. This reinforces that segmentation methods developed for optical images cause excessive segmentation when applied to infrared images. For the case of Figure 4k,l,o,p,s,t, too many pixels were detected as FN (shown in red and highlighted in the enlarged region) due to excessive segmentation. In the case of Figure 4f, only two pixels were detected as FN; this is linked to the high variance in temperature, which impacted the correct choice of the threshold. In Figure 4c,d,g,l,s,t, a small number of pixels were detected as FP, whereas in Figure 4h, no pixels were incorrectly detected: i.e., all pixels were correctly determined as TP.
The five examples below show the processing time and the maximum and average temperatures of the ROI resulting from each method. The data are presented in Table 2. The camera's temperature measurement accuracy is ±2 °C or ±2% of the reading (as presented in Thermal Camera Technical Specifications-Appendix A). When comparing the maximum temperatures between the improved human standard and the automatic computational methods, the error identified was equal to 0.98%, 0.88%, 7.09%, 0.24%, and 0.17% for cases 1, 2, 3, 4, and 5, respectively. These temperature differences are related to the imperfect mathematical conversion of bits into temperature, which is not sufficient to compromise the response of any method. The error in case 3, despite being high, does not compromise the method because, as the camera was in the range of 20-120 °C, the software ran into a temperature limit (130.3 °C). However, when we extracted the metadata, this range did not matter, as we accessed the data directly from the sensor.
Therefore, there was an agreement between the methods regarding the maximum extracted temperature. However, the average temperatures showed high variation because of the different segmented regions. With the excessive segmentation of pixels, there was a significant drop in the average temperature, as the average temperature ended up being calculated with pixels from a non-hot or even cold region (Otsu and Histogram-Based Threshold, with the exception of case 2).
This same factor also affected the average temperature. The Otsu and Histogram-Based Threshold methods were the most affected ones, but the Cluster K-means method also had problems in case 2. In cases 3 and 4, the Cluster K-means and Fuzzy C-means methods presented a higher average temperature due to sub-segmentation.
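The effect described above, where excessive segmentation lowers the average temperature while leaving the maximum unchanged, can be illustrated with a short sketch. The temperatures and masks are hypothetical, and the function name is illustrative.

```python
def roi_stats(temps_c, mask):
    """Maximum and average temperature (deg C) of the pixels selected by the mask."""
    roi = [t for t, selected in zip(temps_c, mask) if selected]
    return max(roi), sum(roi) / len(roi)


# Hypothetical pixel temperatures: three cool pixels and a two-pixel hotspot.
temps = [25.0, 26.0, 30.0, 85.0, 90.0]
tight_mask = [False, False, False, True, True]   # only the hotspot
loose_mask = [False, True, True, True, True]     # excessive segmentation

tight_max, tight_mean = roi_stats(temps, tight_mask)
loose_max, loose_mean = roi_stats(temps, loose_mask)
```

Both masks contain the hottest pixel, so the maximum stays at 90.0 °C, while the average falls from 87.5 °C to 57.75 °C once cool pixels enter the ROI.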
All of the automatic computational methods proved to be agile concerning the improved human standard. The threshold methods segmented the areas of interest with a runtime between 0.11 and 0.24 s, while the Cluster K-means method segmented with a runtime between 2.55 and 2.90 s, and the FCM method segmented with a runtime between 8.50 and 38.80 s. This longer FCM runtime was caused by the automatic determination process of the number of clusters (used in pre-processing), which represented about 95% of the total time.
Considering that a technician from the research partner energy company takes around five minutes to analyze the IRT results for each image, even if they use the proposed FCM technique, the longest runtime will take approximately thirty seconds. This represents a saving of approximately four minutes and thirty seconds per image analyzed; that is, this automatic processing method could be ten times faster. Therefore, using an automatic segmentation method provides speed and reduces the effort and processing time, even for the slowest case, which is acceptable for the application proposed in this article. As expected, automatic segmentation methods are much faster than a human; however, there is a large variation in the computational time between the methods, and, depending on the application, this can be a determining factor.
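The time saving quoted above follows from simple arithmetic, sketched below. The five-minute human analysis time and the thirty-second figure come from the text; the helper name is illustrative, and the last line repeats the calculation with the longest FCM runtime reported in this section (38.8 s) for comparison.

```python
def speedup(human_seconds, method_seconds):
    """How many times faster the automatic method is than the human analysis."""
    return human_seconds / method_seconds


human_s = 5 * 60          # about five minutes per image for a technician
fcm_s = 30                # approximate FCM runtime used in the text

saving_min, saving_sec = divmod(human_s - fcm_s, 60)   # 4 min 30 s saved
nominal = speedup(human_s, fcm_s)                      # 10 times faster
slowest = speedup(human_s, 38.8)                       # longest measured FCM run
```

With the 38.8 s worst case, the gain is about 7.7 times rather than 10, which is still a substantial reduction per image.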
The validation of numerical results to diagnose electrical equipment failures was performed after segmentation. Table 3 refers to the TP, FP, TN, and FN data obtained through the binary classification of pixels in the comparison between automatic segmentation and improved human standard segmentation. Statistical performance measures, namely, sensitivity, specificity, accuracy, PPV, NPV, FPR, FDR, and FNR, are shown in Table 4. Here, for a perfect result, the first five indices must be equal to 1, and the last three must be equal to 0. PPV or precision is very important for this analysis, as it provides the ratio of correctly selected pixels to all selected pixels, so when PPV is equal to 1, every selected pixel is a true selection. However, it is desirable that the segmentation method combines good accuracy and PPV results.
Some statistical measures effectively reflected cases where segmentation was excessive, such as specificity (when values are less than 1) and FDR and FPR (when values are greater than zero).
As the analyzed methods showed low FN, some statistical measures did not reveal a significant direct relationship in the analysis of the results, such as sensitivity, NPV, and FNR. When the sensitivity has a high value, it is understood that the TP pixels are correct, but in the case of excessive area segmentation, the sensitivity will also be high, as the TP pixels will be inside the excessive area segmentation. When the sensitivity has a low value and the FNR has a high value, it is understood that there is sub-segmentation, as in the results of case 3 in the K-means method. Jaffery [26] reported that excessive sensitivity leads to excessive area segmentation, but this work disagrees with this statement, as excessive area segmentation is related to the presence of FP but not FN pixels.
The segmentation obtained through the Otsu method was not able to segment only the superheated area, causing excessive area segmentation, represented by high FDR and FPR indices, which should be equal to zero but showed values between 0.254 and 0.996 in cases 1 to 5, with the exception of FPR in case 1, with 0.105, and low specificities of 0.427 to 0.895 in cases 1 to 5, which should be equal to 1. The excessive area segmentation caused by this method agrees with the results shown in [5,12]. Low accuracy indices and PPV also disfavor this method. However, it has good results for the segmentation of electrical equipment, similar to the results presented by Fan et al. [3]. Thus, this method can be useful for pre-segmentation, aimed at removing interference from the background and then associating it with another method.
Despite being the second-fastest method, the segmentations obtained through the Histogram-Based Threshold method did not show a good pattern in the results because of the characteristics of the infrared images. In general, when the definition between the edges of the objects in the image is poor, the separation between the two planes is poor. In cases 1, 3, 4, and 5, FDR and FPR were close to 1 and the specificity was between 0.146 and 0.681, which characterize excessive area segmentation, which is also reflected in the low accuracy and PPV indices. In case 2, the results were the opposite: FDR and FPR were equal to zero, specificity equaled 1, and the accuracy and PPV indices were high, characterizing correct segmentation. These opposing results make the method fragile. There are recommendations for the use of this method for images that present a bimodal histogram, and it is not considered good for all types of images, as shown in [21,28,29].
The Cluster K-means method segmented the ROI, showing high indices of specificity, accuracy, and PPV between 0.936 and 1.000 and low indices of FPR and FDR that varied between 0.001 and 0.035, except for case 2, where the FDR of 0.467 is considered high, and the PPV of 0.533 is considered low, both caused by the proportionality between the eight TP and seven FP pixels. However, it did not accurately detect the overheated area in all cases, as shown in Figure 4c,g,k,o,s. Due to the size of the ROI and the proportionality with the FP, the error was high in case 2, but considering the results obtained in case 1 and sub-segmentation in cases 3, 4, and 5, this is the second method recommended in this work. The results presented by Fan et al. [3] showed that the K-means method was not able to accurately discover the real heating area. Muhammad et al. [16] recommended this method because it is more agile than FCM but did not analyze the quantitative performance of the methods.
The FCM clustering method was the only method that segmented the superheated area and reached the maximum values of the statistical measures in case 2. In case 1, only seven FP pixels were detected, which led to a slight drop in the statistical measures of specificity and accuracy (0.999) and PPV (0.965) and low FPR and FDR indices (0.001 and 0.035, respectively). In case 3, only 8 FP pixels, in addition to the 97 FN pixels, were detected, which led to a greater drop in the statistical measures of specificity and accuracy, with 0.999 and 0.983, respectively. It also had a PPV of 0.976 and great FPR and FDR indices. Cases 4 and 5 produced results that were identical or very close to those of the Cluster K-means method. Our work found results similar to some studies, such as those that showed the longest runtime for FCM, having shown greater precision in [3,16] and being different from studies that showed excessive area segmentation in [8,12]. The possibility of its combination with other segmentation methods (Otsu + FCM) was presented by Fan et al. [3], with promising results. Therefore, this is the first method recommended in this work.
The limitation of this work was due to the selection of segmentation methods based on the number of citations. Thus, some more recently developed and more efficient methods may not have been selected due to the lower number of citations. According to the nature of the work, methods that require a lot of computational time and investment in hardware were not applied, for example, methods that use neural networks, which require more investment, time for training, and trained technicians to create references for the network to follow and a longer application time after the completion of the network.

Conclusions
The detection of equipment overheating can guarantee the integrity of the electrical power system, as this is one of the indications of a failure in electrical equipment inside substations. A computational analysis instead of a human technical analysis provides agility in the automatic detection of overheated regions of electrical equipment. Still, a variety of segmentation methods in the literature may or may not be suitable for infrared images.
As our final result, we recommend the Fuzzy C-means method. This was the method that obtained higher scores in most cases for sensitivity, specificity, accuracy, positive predictive value, and negative predictive value and lower scores for the false-positive rate, false discovery rate, and false-negative rate in the analyzed infrared images.

Appendix B
The following pertains to the implemented equations used to return the temperature value in degrees Celsius from the raw values of the metadata of the images acquired with FLIR cameras.
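The equations themselves are not reproduced in this text. As a hedged illustration only, the sketch below shows the Planck-based conversion commonly used for FLIR radiometric files, with the calibration constants R1, R2, B, F, and O read from the image metadata. It ignores atmospheric transmission, and the constant values, default parameters, and function names are assumptions for illustration rather than the implementation used in this work.

```python
import math


def celsius_to_raw(temp_c, r1, r2, b, f, o):
    """Raw sensor value that a blackbody at temp_c would produce (inverse relation)."""
    return r1 / (r2 * (math.exp(b / (temp_c + 273.15)) - f)) - o


def raw_to_celsius(raw, r1, r2, b, f, o, emissivity=1.0, reflected_c=20.0):
    """Convert a raw 16-bit value to deg C, ignoring atmospheric effects."""
    # Signal contributed by radiation reflected from the surroundings.
    raw_reflected = celsius_to_raw(reflected_c, r1, r2, b, f, o)
    # Remove the reflected part and scale by the emissivity of the object.
    raw_object = (raw - (1.0 - emissivity) * raw_reflected) / emissivity
    return b / math.log(r1 / (r2 * (raw_object + o)) + f) - 273.15


# Illustrative calibration constants only; real values come from each file's metadata.
planck = dict(r1=21106.77, r2=0.012545258, b=1501.0, f=1.0, o=-7340.0)

raw_blackbody = celsius_to_raw(75.0, **planck)
recovered = raw_to_celsius(raw_blackbody, **planck)

# Same object at emissivity 0.95 with 20 deg C surroundings.
raw_mixed = 0.95 * raw_blackbody + 0.05 * celsius_to_raw(20.0, **planck)
recovered_mixed = raw_to_celsius(raw_mixed, emissivity=0.95, reflected_c=20.0, **planck)
```

The round trip recovers the 75 °C input in both cases, which is a consistency check on the two functions rather than a validation against camera data.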