Partial Discharge Pattern Recognition Based on 3D Graphs of Phase Resolved Pulse Sequence

Partial discharge (PD) is an important phenomenon that reflects the insulation condition of electrical equipment. To protect the safety of power grids, it is important to diagnose the type of insulation defect inside the equipment accurately and early through PD pattern recognition. In this article, 3D phase resolved pulse sequence (PRPS) graphs were constructed from the PD pulse data acquired from a gas-insulated switchgear (GIS), histogram of oriented gradient (HOG) features were then extracted directly from the 3D PRPS graphs, and finally an attribute selective Naïve Bayes classifier was used to recognize the discharge pattern. In addition, this method was compared with two traditional methods, i.e., the statistical method and the grayscale gradient co-occurrence matrix method, from three aspects. The results show that 3D PRPS graphs have visually different morphological characteristics under different defects, and that the similarity among graphs at different applied voltages is higher than that among different defects, so it is reasonable to use them as the basis for PD pattern recognition. The comparison indicates that the HOG method not only has the highest accuracy with the least requirement for pretreatment and training, but is also robust when the applied voltage changes. Consequently, this method is universally applicable to PD pattern recognition based on 3D PRPS graphs.


Introduction
With the gradual expansion of the power grid scale and the gradual rise in voltage level, the safety and reliability of electrical equipment become increasingly important. Partial discharge (PD), an important sign of insulation deterioration in electrical equipment, continually degrades the insulation condition and, if not prevented, endangers the safety of the power grid [1]. The PD signals caused by different insulation defects usually show different characteristics, so the type of insulation defect inside the equipment can be accurately diagnosed from these characteristics. Maintenance personnel can thus identify the existing defect and take appropriate measures in time to prevent accidents [2][3][4]. Therefore, it is an important premise to construct an appropriate PD pattern, extract representative feature parameters, and design a discharge-type classifier with good performance.
The PD patterns are mainly divided into three types: time resolved partial discharge (TRPD) [5,6], phase resolved partial discharge (PRPD) [7][8][9], and phase resolved pulse sequence (PRPS) [10]. Since TRPD is greatly affected by the signal propagation path and noise interference, it is not suitable for pattern recognition. Therefore, most studies recognize 2D PRPD graphs, which first requires tedious statistical calculations on the PRPS data to construct 3D PRPD graphs and then requires conversion between the dimensions of the graphs. In fact, 3D PRPS graphs can be easily constructed from the primary data acquired by the detection system, and PRPS covers the essential PD information, such as apparent charge, time of occurrence, and phase of the applied voltage, which makes it possible to calculate nearly all PD graphs, such as the charge magnitude distribution meeting the criterion of analysis formulated by the International Council on Large Electric Systems [11]. As a result, PRPS can be used as the basis for pattern recognition without complex pretreatment.
However, few reports have recognized PD patterns directly from 3D PRPS graphs because of the limitations of traditional recognition methods. The existing methods usually extract traditional low-dimensional, multiparameter features such as statistical features [12][13][14], wavelet coefficients [15][16][17], fractal features [18,19], Tamura texture features [20], and Weibull parameters [21], and then use classifiers such as support vector machines [22], neural networks [5,23], and clustering [19] to realize the recognition. One limitation is that the above methods apply almost exclusively to 2D graphs and have difficulty recognizing 3D graphs. The other limitation is that the features are too simple to be universal for PRPS, some characteristics of which may change for the same insulation defect under different applied voltages [10]. Hence, this article aims to develop an innovative method for PD pattern recognition based on 3D PRPS graphs.
This article constructs a database of 3D PRPS graphs by collecting the PD pulse data of a gas-insulated switchgear (GIS), then extracts the histogram of oriented gradient (HOG) features directly from the 3D PRPS graphs, and finally designs the attribute selective Naïve Bayes (ASNB) classifier to recognize the samples. Section 2 introduces the materials and methods used in this article, including the experiment platform for the construction of the database as well as the methods of feature extraction and classifier design. In Section 3, the results of applying the above method are presented in detail. In Section 4, the proposed method is compared with two traditional methods to verify its advantages: a lower requirement for pretreatment and training, higher recognition accuracy, and better robustness when the applied voltage changes.

Database Construction: 3D Graphs of PRPS
To acquire the PD pulses, this paper selected a real GIS with a rated voltage of 126 kV as the test object. The GIS test object is about 4 m long, with 3 air chambers separated by insulators. Four kinds of typical insulation defects [8,14,24,25] were made by hand:

• Free metal particles: This model was made by placing 10 copper balls with a radius of 0.5 mm at the ground electrode. When a high voltage is applied, the balls jitter randomly, which produces PD and excites a high-frequency electromagnetic wave.
• Suspended electrodes: This model was made by placing a copper cube with a side length of 20 mm in a cylindrical epoxy resin. The epoxy resin is 30 mm high, with the top surface connected to a high voltage electrode and the bottom surface connected to a ground electrode.
• Metal tip: This model was made of copper wire with a radius of 5 mm and a length of 20 mm. One end of the copper wire was ground to a needle tip with a curvature radius of about 0.1 mm, and the tip end was 5 mm away from the ground electrode.
• Air gap: This model was made by sealing an air gap with a radius of about 3 mm in the middle of a cylindrical epoxy resin with a height of 30 mm. The epoxy resin was sandwiched between the high voltage and ground electrodes.

Since the ultra-high frequency (UHF) method is the most efficient detection technique for GIS and has good anti-interference performance [26], an external UHF sensor based on a fractal microstrip antenna was placed on the surface of the insulators. The UHF sensor was self-developed according to the simulation result of electromagnetic waves in GIS and has the following indices:

• central frequency at 702 MHz;
• absolute bandwidth at 646-784 MHz, with a standing-wave ratio (SWR) less than 2;
• bandwidth ranges with a gain of more than 7 dB at 630-810 MHz and 1290-1420 MHz.
The sensor was connected to a self-developed detection system composed of a signal conditioner, a sampler, and software. The signal conditioner filters noise, amplifies the PD signal, and synchronizes the phase in hardware. The sampler collects the PD pulses of 50 continuous power frequency cycles per second by peak envelope demodulation, also implemented in hardware. Each power frequency cycle was divided into 100 phase intervals, so every sample contained 5000 data points (50 cycles × 100 phase intervals) of discharge amplitude, each tagged with its power frequency cycle and phase. All the sampled data can be stored by the software to construct the database.
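The sample layout described above can be sketched as a 50 × 100 grid. This is a minimal illustration, not the authors' acquisition software: the `(cycle, phase_deg, amplitude)` tuple layout, the function name `build_prps_matrix`, and the choice to keep the peak amplitude per interval are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

N_CYCLES = 50        # power frequency cycles per sample (from the text)
N_PHASE_BINS = 100   # phase intervals per cycle (from the text)


def build_prps_matrix(pulses: Iterable[Tuple[int, float, float]]) -> List[List[float]]:
    """Bin (cycle, phase_deg, amplitude) pulses into a 50 x 100 grid.

    Each cell keeps the largest amplitude seen in that cycle/phase interval.
    The tuple layout is hypothetical, chosen only for this sketch.
    """
    grid = [[0.0] * N_PHASE_BINS for _ in range(N_CYCLES)]
    for cycle, phase_deg, amplitude in pulses:
        if not 0 <= cycle < N_CYCLES:
            continue  # ignore pulses outside the 50-cycle window
        # Multiply before dividing so exact bin edges (e.g. 90 degrees) stay exact.
        col = int((phase_deg % 360.0) * N_PHASE_BINS / 360.0)
        col = min(col, N_PHASE_BINS - 1)
        grid[cycle][col] = max(grid[cycle][col], amplitude)
    return grid


example_pulses = [(0, 90.0, 0.8), (0, 91.0, 0.5), (49, 270.0, 1.2), (50, 10.0, 9.9)]
prps = build_prps_matrix(example_pulses)
n_points = sum(len(row) for row in prps)  # 50 cycles x 100 intervals
```

Each row of `prps` corresponds to one power frequency cycle and each column to one 3.6° phase interval, so one sample always holds the same number of data points regardless of how many pulses occurred.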
The conventional electrical detection method was also used to provide a reference. The complete experiment platform was established according to the test circuit shown in Figure 1.

Figure 1. Test circuit of experiment platform for gas-insulated switchgear (GIS) partial discharge (PD). Labeled components include the autotransformer, step-up transformer, and coupling capacitor.
Three-dimensional PRPS graphs show the distribution of the discharge amplitude over the phase and power frequency cycle [10]. To construct the database, we executed the experiment according to the following steps:

1. Select a defect model to be placed inside the middle chamber of the GIS, and fill the chamber with SF6 at 0.5 MPa.
2. After examining the connection of the platform, begin to apply the voltage. Increase the applied voltage in steps of 0.1 kV. Observe the standard PD instrument, and temporarily stop the boosting when PD occurs. Record the inception voltage and apparent PD quantity.
3. Hold the inception voltage, and begin to sample PD pulses with the detection system. After 3 min, stop the detection system to finish the first sampling process, and save the samples in the basic database.
4. Continue to increase the voltage, and observe the screen of the standard PD instrument. If obvious changes occur in the discharge information, restart the detection system to acquire samples to be saved in the extra database. For the metal tip and air gap, continue boosting up to the breakdown voltage. For free metal particles and the suspended electrode, stop the boosting when the voltage reaches three times the inception voltage because the breakdown voltage of these two types is too high [27].
5. Step down the voltage gradually until the power supply is turned off. Remove the platform wiring, and replace the model with another defect model. Repeat the above steps until all four defect models have been used.

For each sample in the database, a 3D graph of PRPS can be plotted. The X-axis represents the power frequency cycle from 0 to 50, the Y-axis represents the phase from 0° to 360°, and the Z-axis represents the normalized discharge amplitude. To reduce the calculation scale, we plotted the graph in grayscale with a resolution of (256,256).
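One possible way to turn a sample into a (256,256) grayscale image is sketched below. The article does not state how the graph was rendered, so the mapping of amplitude to gray level, the nearest-neighbour upsampling, the row/column orientation, and the name `to_gray_image` are assumptions for illustration only.

```python
def to_gray_image(prps, size=256):
    """Normalize amplitudes by the sample peak and upsample to size x size.

    Rows follow the power frequency cycle and columns follow the phase
    (an assumed orientation). Gray levels are integers in 0..255.
    """
    peak = max(max(row) for row in prps)
    n_rows, n_cols = len(prps), len(prps[0])
    image = []
    for r in range(size):
        src_row = prps[r * n_rows // size]          # nearest source cycle
        image.append([
            round(255 * src_row[c * n_cols // size] / peak) if peak > 0 else 0
            for c in range(size)
        ])
    return image


# Toy 50 x 100 sample with a single strong pulse at cycle 10, phase interval 25.
toy_prps = [[0.0] * 100 for _ in range(50)]
toy_prps[10][25] = 2.0
gray = to_gray_image(toy_prps)
```

Normalizing by the peak of each sample keeps the image insensitive to the absolute pulse amplitude, which matches the normalized Z-axis described above.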

Feature Extraction: HOG Attribute Space
HOG is a high-dimensional feature operator frequently used in the field of computer vision to capture the shape of local target edges in images [28]. This article applied it to 3D graphs of PRPS. The HOG feature vector of a single sample can be extracted by the following steps:

1. Calculate the gradient of each image pixel
Mark the gray value of pixel (x,y) as L(x,y), then calculate its gradient values G_x(x,y) and G_y(x,y) in the horizontal and vertical directions. The gradient amplitude G(x,y) and gradient direction θ(x,y) are then calculated from G_x(x,y) and G_y(x,y).

2. Calculate the HOG of a cell unit
Divide the image into uniform rectangular cell units with a size of (8,8) so that there are 64 gradient values per cell unit. Divide the gradient direction into nine equal groups; the i-th group of the histogram represents the gradient direction interval ((i − 1)π/9, iπ/9). Check the gradient direction of each pixel in the cell unit: if θ(x,y) ∈ ((i − 1)π/9, iπ/9), the i-th group of the histogram is weighted by the gradient amplitude of this pixel; if θ(x,y) = kπ/9 (k = 0, 1, ..., 9), carry out a bilinear interpolation in direction and position for this pixel. After all pixels of a cell unit are checked, its HOG feature vector is obtained with a dimension of nine.

3. Construct the HOG of a sliding window
Combine four adjacent cell units into a sliding window, which gives the HOG feature vector V_HOG with a dimension of 36. To reduce the influence of the contrast between image foreground and background, the HOG feature vector of each sliding window is normalized.

4. Construct the HOG of the whole image
Move the sliding window with a fixed step, and concatenate the HOG of every sliding window to acquire the HOG of the whole image. The shorter the step, the richer the information, but also the higher the dimension of the HOG. As a balance, this paper selects half of the length of the sliding window as the moving step. With a 256 × 256 image, this gives 31 × 31 window positions of 36 dimensions each, so the HOG feature vector of a single sample has 34,596 dimensions.
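For reference, the quantities named in steps 1 and 3 are commonly written as below in HOG implementations. The central-difference form of the gradient, the folding of the direction into [0, π), and the L2 form of the window normalization (with a small constant ε) are assumptions taken from common practice, not formulas confirmed by this article.

```latex
% Step 1: pixel gradients (central differences, assumed form)
G_x(x,y) = L(x+1,y) - L(x-1,y), \qquad
G_y(x,y) = L(x,y+1) - L(x,y-1)

% Gradient amplitude and direction (direction taken modulo \pi)
G(x,y) = \sqrt{G_x(x,y)^2 + G_y(x,y)^2}, \qquad
\theta(x,y) = \arctan\frac{G_y(x,y)}{G_x(x,y)}

% Step 3: normalization of a 36-dimensional window vector (L2 form, assumed)
V_{HOG} \leftarrow \frac{V_{HOG}}{\sqrt{\lVert V_{HOG} \rVert_2^2 + \varepsilon^2}}
```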
Because the feature extraction of each sample follows the same algorithm above, the HOG feature vectors of different samples have the same attribute dimension and the same arrangement order. Therefore, the HOG attribute space M of multiple samples can be constructed by stacking the feature vectors according to the ordinal number of the dimension. The size of M is (m,n), where m is the number of samples and n is the number of attribute dimensions.
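The four extraction steps can be sketched with NumPy as follows. This is a simplified illustration rather than the authors' implementation: it uses hard binning instead of the bilinear interpolation described in step 2, `np.gradient` (halved central differences, one-sided at the borders) for the pixel gradients, and an L2 window normalization; the function name `hog_features` and its parameters are hypothetical.

```python
import numpy as np


def hog_features(image, cell=8, bins=9, block=2):
    """Simplified HOG: unsigned gradients, hard binning, L2-normalized windows."""
    img = np.asarray(image, dtype=float)
    gy, gx = np.gradient(img)                      # axis 0 = rows (y), axis 1 = columns (x)
    magnitude = np.hypot(gx, gy)                   # gradient amplitude G(x, y)
    direction = np.arctan2(gy, gx) % np.pi         # gradient direction folded into [0, pi)
    bin_idx = np.minimum((direction / (np.pi / bins)).astype(int), bins - 1)

    # Step 2: one 9-bin histogram per (8, 8) cell, weighted by gradient amplitude.
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    cell_hist = np.zeros((n_cy, n_cx, bins))
    for i in range(n_cy):
        for j in range(n_cx):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            cell_hist[i, j] = np.bincount(
                bin_idx[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )

    # Steps 3 and 4: 2 x 2 cell windows moved by one cell (half the window length).
    features = []
    for i in range(n_cy - block + 1):
        for j in range(n_cx - block + 1):
            v = cell_hist[i:i + block, j:j + block].ravel()   # 36 dimensions
            features.append(v / np.sqrt(np.sum(v ** 2) + 1e-12))
    return np.concatenate(features)


rng = np.random.default_rng(0)
toy_image = rng.integers(0, 256, size=(256, 256))   # synthetic stand-in for a PRPS graph
feature_vector = hog_features(toy_image)
```

Stacking one such vector per sample row by row yields the attribute space M of size (m,n), with n equal to the length of `feature_vector`.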

Classifier Design: ASNB
The Naïve Bayes (NB) classifier is widely used because of its reliable mathematical theory and simple application [29]. It determines the type c_out of a sample based on the principle of maximum probability:

$$c_{out} = \arg\max_{c \in C} p(c \mid F)$$

where C is the type set, including the four types in this paper, i.e., free metal particles, suspended electrode, metal tip, and air gap; and p(c|F) is the probability that sample F belongs to type c, whose calculation is based on the Bayes theorem.
The NB classifier is premised on the attributes being mutually independent, but HOG attributes depend on each other. Moreover, the large number of blank background pixels in PRPS graphs carry no type difference, so the redundancy of the HOG attribute space is very high, which brings a large amount of calculation and may even crash the self-learning process of the NB classifier. Therefore, the HOG attribute space needs to be reformed to improve the NB classifier.
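The role of the independence premise can be made explicit by expanding p(c|F) with the Bayes theorem. The attribute symbols f_j below are introduced only for this illustration and do not appear in the article.

```latex
% Bayes theorem for a sample F = (f_1, f_2, \ldots, f_n)
p(c \mid F) = \frac{p(c)\, p(F \mid c)}{p(F)}

% Independence premise of the NB classifier
p(F \mid c) = \prod_{j=1}^{n} p(f_j \mid c)

% p(F) does not depend on c, so it can be dropped from the maximization
\arg\max_{c \in C} p(c \mid F) = \arg\max_{c \in C}\; p(c) \prod_{j=1}^{n} p(f_j \mid c)
```

If the attributes are correlated, the product in the second line no longer equals p(F|c), which is why the attribute space is decorrelated before classification.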
The ASNB classifier selects a subset from the attribute space to reduce the redundancy [29]. The subset should be as small as possible; meanwhile, the attributes selected should be mutually independent and as representative as possible. In this article, the ASNB classifier is designed by the following steps:

1. Reconstruction of HOG attribute space
Because the covariance reflects well how two attributes vary together, this paper adopts the covariance matrix C_{n×n} to construct orthogonal axes whose number equals the number of attribute dimensions in the original HOG attribute space. C_{n×n} can be calculated by the following definition:

$$C_{n \times n} = (C_{ij}), \qquad C_{ij} = \mathrm{cov}(H_i, H_j)$$

where C_{ij} represents the element in the i-th row and j-th column of the covariance matrix C_{n×n}, and H_i and H_j are respectively the i-th and j-th dimensions of the attribute vectors of the HOG attribute space M:

$$M = [H_1, H_2, \ldots, H_n]$$

The covariance of two vectors X and Y is calculated by:

$$\mathrm{cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$$

According to the principle of eigenvalue decomposition, C_{n×n} can be decomposed as:

$$C_{n \times n} = Q \Lambda Q^{-1} \qquad (9)$$

where Q represents the parameter matrix of a linear transformation, and Λ is a diagonal matrix whose diagonal elements λ_1, λ_2, ..., λ_n respectively correspond to the contribution of each dimension of the reconstructed attribute space. Q is rearranged to make sure that λ_1 ≥ λ_2 ≥ ... ≥ λ_n. Through the linear transformation in Equation (9), the mapping of the original HOG attributes to the orthogonal axes can be realized by Equation (10):

$$M' = MQ \qquad (10)$$

where M' represents the reconstructed attribute space, whose attributes satisfy the requirement of mutual independence.

2. Selection of attributes
To seek the optimum subset, this paper uses the greedy search method: starting from an empty target attribute subset, we add attributes to the subset in descending order of contribution, train the classifier, and record the recognition rate each time. The search stops when the rate no longer changes.

Finally, the ASNB classifier determines the type c_out of a sample by Equation (11):

$$c_{out} = \arg\max_{c \in C}\; p(c) \prod_{i=1}^{k} p(S_i \mid c) \qquad (11)$$

where S is the subset, S_i is the i-th attribute in the subset, and k is the number of dimensions of the subset.
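The two design steps can be sketched on toy data as follows. This is an illustration under stated assumptions, not the authors' classifier: the per-attribute likelihoods are taken as Gaussian, the eigenvectors are fitted on the training samples only, the search stops as soon as the rate no longer rises, and all function names are hypothetical. The synthetic 20-dimensional data merely stands in for the HOG attribute space, whose full covariance matrix would be far larger.

```python
import numpy as np


def eigen_reconstruction(M_train):
    """Eigenvalues (descending) and eigenvector matrix Q of the attribute covariance."""
    C = np.cov(M_train, rowvar=False)        # n x n covariance matrix of the attributes
    eigvals, Q = np.linalg.eigh(C)           # C is symmetric; eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # rearrange by descending contribution
    return eigvals[order], Q[:, order]


def fit_gaussian_nb(X, y):
    """Class priors plus per-class mean and variance of each attribute."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict_gaussian_nb(model, X):
    """argmax over c of log p(c) + sum_i log p(S_i | c), with Gaussian likelihoods."""
    classes, priors, means, variances = model
    diff = X[:, None, :] - means[None, :, :]                  # (samples, classes, k)
    log_like = -0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=2)
    return classes[np.argmax(np.log(priors) + log_like, axis=1)]


def greedy_select(train_X, train_y, test_X, test_y):
    """Add attributes in descending order of contribution until the rate stops rising."""
    best_k, best_rate = 0, -1.0
    for k in range(1, train_X.shape[1] + 1):
        model = fit_gaussian_nb(train_X[:, :k], train_y)
        rate = float(np.mean(predict_gaussian_nb(model, test_X[:, :k]) == test_y))
        if rate <= best_rate:
            break
        best_k, best_rate = k, rate
    return best_k, best_rate


# Synthetic stand-in: 4 defect types, 40 samples each, 20 correlated attributes.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(4), 40)
M = rng.normal(scale=3.0, size=(4, 20))[labels] + rng.normal(size=(160, 20))
idx = rng.permutation(160)
train, test = idx[:80], idx[80:]

eigvals, Q = eigen_reconstruction(M[train])
M_prime = M @ Q                                               # mapping onto orthogonal axes
best_k, best_rate = greedy_select(M_prime[train], labels[train],
                                  M_prime[test], labels[test])
```

After the mapping, the covariance matrix of the training attributes is diagonal, so the selected attributes are uncorrelated, which is the property the NB premise relies on. Note that scoring the subset on the same test samples used to choose it gives an optimistic rate; a separate validation split would avoid this.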

Summary of Entire Process
Figure 2 shows a flow chart of the presented method, from signal registration to final diagnosis.

Figure 2. Flow chart of the presented method (first recoverable steps: Start; Acquisition of PD data points).

Conventional Discharge Information
Through the standard PD instrument in Figure 1, the inception voltage and apparent discharge quantity were obtained as follows:

• Free metal particles: The inception voltage is 13 kV, and the apparent discharge quantity is between 142 and 473 pC. The data logged over time are shown in Figure 3a.

As can be seen in Figure 3, the time-domain waveform from conventional electrical detection can hardly indicate the discharge type. Hence, the self-developed UHF detection system was used to sample the data points of the discharge amplitude together with their phase and power frequency cycle. The 3D PRPS graphs were thus built from a large number of data points for pattern recognition.

Basic Database
According to the method introduced in Section 2.1, the 3D PRPS graphs under the four types of insulation defects are plotted in Figure 4. It can be seen that the discharge pulses of free metal particles are sparse and scattered without regularity. For the suspended electrode, there is a cluster of evident discharge pulses in both the positive and negative halves of the phase. In addition, there are some small pulses near 0° and 200°. The discharge pulses of the metal tip are mainly distributed in the third quadrant. The discharge pulses of the air gap are mainly distributed near 90° and 270°, a distribution similar to that of the suspended electrode. However, the pulses of the suspended electrode are sparser. In brief, the PRPS graphs under different defects show visually different morphology, so they can be used as the basis for PD pattern recognition.
Because some samples were acquired while the detection system was unstable during start-up and termination in step 3 of the experiment, such samples were discarded, and the smallest number of remaining samples among the defects was taken as the reference to balance the classes. As a result, 140 samples for each defect were retained, so the basic database contains 560 samples in total.

Extra Database
As the applied voltage was raised further from the inception voltage, the PRPS graphs of free metal particles and the air gap showed no major change, while the PRPS graphs of the suspended electrode and metal tip were greatly affected. The change is caused by a different generation mechanism of the PD signal [30]. The differences are shown in Figure 5. From Figure 5a,b, it can be seen that the small pulses of the suspended electrode disappeared as the applied voltage was raised, leaving only two clusters of pulses with high amplitude. From Figure 5c,d, it can be seen that some sparse pulses emerged in the first quadrant in the graph of the metal tip. In addition, a few sparse pulses have greater amplitudes than the pulses that always exist in the third quadrant, leading to a visually lower kurtosis. In brief, the increase of the applied voltage caused a certain degree of change in the discharge amplitude and phase distribution, but the main difference occurred in regions where the discharge amplitude is weak or the discharge pulses are sparse. Therefore, although the same defect looks different under different applied voltages, its morphological similarity is still much greater than that between different defects. This article extracted HOG features, which describe the morphology of the PRPS graph, so the recognition result should not be disturbed by the applied voltage. To verify the robustness of the algorithm in this article, an extra database composed of 140 samples of the suspended electrode at 25.8 kV and 140 samples of the metal tip at 15.1 kV was constructed.

Feature Extracted: HOG Attribute Space
According to the method introduced in Section 2.2, the HOG feature vector of each sample was obtained. Since the dimension is too high to be listed, Figures 6 and 7 respectively show the global and local visualization results of the HOG features extracted from 6 samples in the basic database. Each point in Figures 6 and 7 represents a cell unit. Each line represents a gradient angle group, and its length represents the relative weight of this gradient. It can be seen from Figure 6 that the HOG feature describes the morphology of the PRPS graph. To show this clearly, Figure 7 gives a randomly selected local region of the PRPS graph, obtained by zooming in on Figure 6. In the magnified local region, it can be seen that the HOG feature vector takes different values under different defects, which provides a good basis for recognition. Afterwards, the original HOG attribute space was constructed by combining the feature vectors of the samples.

Classifier Designed: ASNB
According to the method introduced in Section 2.3, the ASNB classifier was designed. We randomly selected 120 samples from the basic database as the training samples and used the remaining 440 samples in the basic database to test the recognition rate. The search process for the optimum subset of the reconstructed attributes is shown in Table 1. To avoid chance results caused by the random selection of training samples, 10 tests were repeated each time an attribute was added to the subset. As can be seen in Table 1, the recognition rate increased with the number of attributes selected. When the subset contained only the first attribute of the reconstructed HOG space, the recognition rate was only 52.95%-65.68%, which is too low to meet the requirements of the classifier. The recognition rate grew enormously when the size of the subset was raised to two. However, there was a large gap between the lowest and the highest rate, which indicates that the recognition result is easily affected by the selection of training samples. When the third attribute was added to the subset, the recognition rate reached 95.91%-99.32%, which already exceeds most current recognition algorithms. Considering the worst case, the lowest recognition rate should be chosen as the search criterion, so we continued to add attributes to the subset. The test results show that the recognition rate converged when the size of the subset reached four. After that, the result remained unchanged no matter how many attributes were added. Consequently, the optimum subset consists of the first four attributes of the reconstructed HOG space. Taking four samples of different defects as an example, Table 2 lists the values of the attributes selected. In summary, when the number of training samples is 120, the designed ASNB classifier selects the above four attributes as the input features. The overall recognition rate can reach up to 99.32%.
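The repeated random selection of training samples can be sketched as below. The split is assumed to be unstratified, and `placeholder_evaluate` is a hypothetical stand-in that returns a dummy rate; a real version would train the ASNB classifier on the training indices and score it on the test indices.

```python
import random


def repeated_holdout(n_samples, n_train, repeats, evaluate, seed=0):
    """Run `evaluate(train_idx, test_idx)` on fresh random splits; return (lowest, highest)."""
    rng = random.Random(seed)
    rates = []
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        rates.append(evaluate(indices[:n_train], indices[n_train:]))
    return min(rates), max(rates)


split_sizes = []


def placeholder_evaluate(train_idx, test_idx):
    """Hypothetical stand-in for training and testing a classifier."""
    split_sizes.append((len(train_idx), len(test_idx)))
    return 1.0  # dummy recognition rate, not a measured result


# 560 samples in the basic database, 120 for training, 10 repetitions per subset size.
lowest, highest = repeated_holdout(560, 120, 10, placeholder_evaluate)
```

Reporting both the lowest and the highest rate over the repetitions, as in Table 1, shows how sensitive a given subset size is to the choice of training samples.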

Performances in Contrast with Traditional Methods
To verify the advantages of the method proposed in this article, it was compared with two widely used traditional methods: the statistical method and the grayscale gradient co-occurrence matrix (GGCM) method [31]. The feature vectors in these two methods are low dimensional compared to the HOG feature vector, and each parameter has a different meaning, so mutual independence of the attributes is also satisfied. Hence, the features can be directly input into the NB classifier without reconstruction and selection of attributes.

Requirement for Pretreatment and Training
During feature extraction, the statistical method requires tedious statistical calculation on the PRPS data and conversion between the 3D and 2D graphs. In addition, the maximum discharge phase distribution and the average discharge phase distribution were plotted by statistical calculation on the PRPS. From these two 2D distributions, 12 parameters were extracted, including the global average discharge amplitude, standard deviation, information entropy, and energy density, as well as the skewness and kurtosis in the positive and negative half-cycles of the two distributions.
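As an illustration of the half-cycle parameters, the sketch below computes skewness and kurtosis for one phase distribution. It assumes the definitions commonly used for PD phase distributions, in which the distribution values are normalized into weights over the phase windows and kurtosis is reported relative to a normal distribution; the exact formulas used in this article may differ, and the sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def skewness_kurtosis(
    phases: Sequence[float], values: Sequence[float]
) -> Tuple[float, float]:
    """Skewness and excess kurtosis of a phase distribution.

    phases[i] is the centre of phase window i (degrees) and values[i] is
    the distribution value in that window, e.g. maximum or average amplitude.
    """
    total = sum(values)
    if total <= 0:
        raise ValueError("distribution must have positive total weight")
    probs = [v / total for v in values]  # normalize values into weights
    mean = sum(x * p for x, p in zip(phases, probs))
    var = sum((x - mean) ** 2 * p for x, p in zip(phases, probs))
    if var == 0:
        raise ValueError("distribution has zero spread")
    sigma = math.sqrt(var)
    skew = sum((x - mean) ** 3 * p for x, p in zip(phases, probs)) / sigma ** 3
    kurt = sum((x - mean) ** 4 * p for x, p in zip(phases, probs)) / sigma ** 4 - 3.0
    return skew, kurt


# Hypothetical positive half-cycle: five phase windows with a symmetric peak.
phases = [18.0, 54.0, 90.0, 126.0, 162.0]
values = [1.0, 2.0, 3.0, 2.0, 1.0]
skew, kurt = skewness_kurtosis(phases, values)
print(skew, kurt)
```

A symmetric distribution gives a skewness of zero, and a negative kurtosis indicates a shape flatter than a normal distribution, which is why these two parameters are sensitive to shifts in the phase distribution.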
The GGCM method first needs to calculate the GGCM from the gray pixel matrix of the images, and then 15 parameters are extracted from the GGCM. These features include the small gradient advantage, large gradient advantage, gray scale distribution inhomogeneity, gradient distribution inhomogeneity, etc.
Therefore, both the statistical method and the GGCM method place more demands on pretreatment. In contrast, the HOG method extracts features directly from the pixel matrix of the 3D PRPS graphs, which is easy to manipulate.
In terms of the training requirement, this section varied the number of training samples to compare the recognition results. The number of training samples was increased in steps of 20, 50 tests were repeated at each step, and the lowest overall recognition rates were taken as the points plotted in Figure 8. As can be seen from Figure 8, the overall recognition accuracy increases with the number of training samples for all methods. It gradually reaches the maximum value and finally converges. The methods differ in the smallest number of training samples needed for convergence. The statistical method converges to 97.5%-98% at 140 samples, while the GGCM method converges to only 92%-92.5% at 200 samples. In contrast, the HOG method already reaches 96.09% at 100 samples and rapidly converges to over 99% at 120 samples. As a result, the HOG method requires the fewest training samples to reach the highest accuracy, thus reducing the training cost.
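The procedure behind such a curve can be sketched as follows: take the lowest rate among the repeated tests at each training-set size, then find the smallest size from which the curve stays close to its final value. This is a minimal sketch; the function names, the tolerance of 0.5 percentage points, and the sample rates are assumptions for illustration and are not the data behind Figure 8.

```python
from typing import Dict, List


def worst_case_curve(results: Dict[int, List[float]]) -> Dict[int, float]:
    """Lowest recognition rate (%) observed at each training-set size."""
    return {n: min(rates) for n, rates in sorted(results.items())}


def convergence_size(curve: Dict[int, float], tol: float = 0.5) -> int:
    """Smallest size from which the curve stays within `tol` of its final value."""
    sizes = sorted(curve)
    final = curve[sizes[-1]]
    for i, n in enumerate(sizes):
        if all(abs(curve[m] - final) <= tol for m in sizes[i:]):
            return n
    return sizes[-1]  # not reached in practice: the last size always qualifies


# Made-up repeated-test results for illustration only.
toy_results = {
    80: [90.0, 93.1, 94.0],
    100: [96.1, 97.0, 98.2],
    120: [99.2, 99.4, 99.3],
    140: [99.3, 99.5, 99.4],
}
curve = worst_case_curve(toy_results)
print(curve)
print(convergence_size(curve))
```

Plotting the minimum rather than the mean makes the curve conservative: a method is only credited with convergence once even its least favourable training draw performs well.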

Recognition Accuracy
From the above test, it can be roughly seen that the HOG method has higher accuracy than the others. For the sake of fairness, this section compares the three methods under their optimal conditions. To do this, we randomly selected 200 samples from the basic database as training samples, because this is the minimum number required by the GGCM method. The remaining 360 samples in the basic database were used to test the recognition rate. The specific results for each defect are listed in Tables 3-5. The proposed method has the highest overall recognition accuracy, which is 1.94 percentage points higher than that of the statistical method and 6.94 percentage points higher than that of the GGCM method. Specifically, the statistical method recognizes the suspended electrode, the metal tip, and the air gap well, but it mistakes free metal particles for an air gap in 10.11% of cases. This is because the pulses of free metal particles are random and may therefore resemble those of other types. The GGCM method raises the recognition rate of free metal particles from 89.89% to 95.35%, but at the cost of occasionally misidentifying the metal tip and the air gap. The GGCM method has the lowest overall accuracy, only 92.5%, which indicates that traditional image features are not well suited to 3D PRPS graphs, even though they have been applied successfully to 2D images. The HOG method, however, overcomes the problems of these two traditional methods. It raises the recognition rate of free metal particles to 100%. This is because the HOG method extracts features from every small cell unit in the graph, and the resulting high-dimensional feature is more capable of resisting accidental similarity. Meanwhile, the suspended electrode and the air gap are still fully identified, and the recognition rate of the metal tip decreases only slightly. The reason is that the HOG features describe the morphological characteristics of a 3D PRPS graph in more detail.
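As a quick consistency check on the overall accuracies, the differences quoted above can be converted into counts of correctly recognized samples out of the 360 test samples. The sketch assumes that the 1.94 and 6.94 differences are percentage points relative to the 92.5% of the GGCM method; the counts it prints are inferred from the text and are not reported in Tables 3-5.

```python
n_test = 360  # test samples left after drawing 200 of the 560 for training

ggcm_rate = 92.5             # overall GGCM accuracy quoted in the text (%)
hog_rate = ggcm_rate + 6.94  # HOG is 6.94 points above GGCM
stat_rate = hog_rate - 1.94  # statistical method is 1.94 points below HOG

counts = {}
for name, rate in [("HOG", hog_rate), ("statistical", stat_rate), ("GGCM", ggcm_rate)]:
    correct = round(rate / 100 * n_test)  # nearest whole number of samples
    counts[name] = correct
    print(f"{name}: {correct}/{n_test} = {100 * correct / n_test:.2f}%")
```

Under this reading, each rate corresponds to a whole number of test samples, which supports interpreting the stated differences as percentage points rather than relative improvements.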

Robustness when Voltage Applied Changes
In Section 3.1, an extra database composed of 140 samples of the suspended electrode at 25.8 kV and 140 samples of the metal tip at 15.1 kV was constructed. To compare the robustness of the different methods, the classifier trained in Section 4.2 was used to test the 280 samples in the extra database. Table 6 shows the recognition accuracy of the suspended electrode and metal tip at the different applied voltages. As can be seen from Table 6, with the HOG method, the recognition accuracy of the suspended electrode at 25.8 kV reaches 97.86% even though no sample of this group was used for training. The same holds for the metal tip: the training samples are at 6.5 kV, but the recognition accuracy of the samples at 15.1 kV is only 2.04% lower than that of the samples at 6.5 kV.
However, the statistical method, which fully identifies the samples of the suspended electrode and metal tip in the basic database, degrades severely when the applied voltage varies. For the metal tip in particular, the recognition accuracy falls abruptly from 100% to 69.29%. The reason is that both the relative amplitude and the phase distribution of the discharge pulses change, which affects the 2D PRPD graph, the maximum discharge phase distribution, and the average discharge phase distribution. The values of the extracted statistical parameters change so much that the recognition result is no longer accurate.
Both the GGCM method and the HOG method were applied to 3D PRPS graphs. The results in Table 6 indicate that these two methods are more robust when the applied voltage changes and are thus less dependent on the comprehensiveness of the training samples. The reason is that the morphology of the PRPS is more similar among different applied voltages than among different defects, as Figure 4 shows. Considering that the GGCM method is less accurate, the HOG method is more universally applicable for recognizing 3D PRPS graphs.

Conclusions
To achieve pattern recognition of PD in GIS, this article applied the HOG method to 3D PRPS graphs. Firstly, 5000 PD pulses were collected through the UHF detection system established in the laboratory to plot each 3D PRPS graph. According to the voltage applied, a basic database composed of 560 samples and an extra database composed of 280 samples were constructed. The result shows that 3D PRPS graphs under different defects are visually different in morphology, and the similarity among different applied voltages is higher than among different defects, so it is reasonable to use them as the basis for PD pattern recognition. Secondly, this article extracted HOG features, which describe the morphological characteristics in detail, from each sample, and subsequently constructed an HOG attribute space. To overcome the limitations of the NB classifier, the attributes were reconstructed, and an optimum subset composed of the first four attributes was obtained. Under this condition, the overall recognition rate reaches 99.32% with 120 training samples. Finally, the comparison with the statistical method and the GGCM method indicates that the proposed method avoids tedious calculation in pretreatment and the need for massive training samples, and it maintains high recognition accuracy even when the applied voltage changes. In short, the HOG method performs well in describing 3D PRPS graphs, and the ASNB classifier can readily realize the pattern recognition of a single PD source according to the HOG features.
Nevertheless, it should be noted that the experiment was carried out in a laboratory setting. The on-site environment is more complicated because of various noise sources and multiple simultaneous PD sources. To solve this problem, a sensor array is strongly advised in real applications. By installing more UHF sensors at different positions, different signals, including noise and multiple PD sources, could be clustered through the analysis of time delay and amplitude difference, which is a frequently used technique in PD source positioning. Through this step, the problem could be converted into the single-source problem addressed in this article. Since pattern recognition of a single PD source has been verified to be feasible by extracting HOG features from 3D PRPS graphs, it should have great application value in combination with PD source positioning technology for diagnosing insulation defects in GIS. Another point to note is that this article only studied GIS. The applicability of the presented method to other power equipment needs to be studied further.