Pattern Recognition of DC Partial Discharge on XLPE Cable Based on ADAM-DBN

Abstract: Pattern recognition of DC partial discharge (PD) receives considerable attention, and recent research mainly focuses on the static characteristics of PD signals. In order to improve the recognition accuracy for DC cable defects and extract information from PD waveforms, a modified deep belief network (DBN) supervised fine-tuned by the adaptive moment estimation (ADAM) algorithm is proposed to recognize four typical insulation defects of DC cable from the PD pulse waveforms. Moreover, the effect of the training sample set size on recognition accuracy is analyzed. Compared with naive Bayes (NB), K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural networks (BPNN), the ADAM-DBN method has higher accuracy on four different defect types due to its excellent ability to extract features from PD pulse waveforms. Moreover, increasing the training sample set size increases the recognition accuracy within a certain range.


Introduction
HVDC (High Voltage Direct Current) transmission has the advantages of low line cost and freedom from synchronous stability constraints, so DC cable projects have developed rapidly in recent years [1][2][3]. Owing to the excellent performance of DC cross-linked polyethylene (XLPE) cables, the usage of XLPE cables has gradually outstripped oil-filled and impregnated paper cables since their first adoption in Sweden [4]. In order to improve the stability and reliability of DC transmission, technologies such as online detection and condition evaluation of cables and their accessories have become a major concern for researchers.
The accumulation of space charge inside the polymer leads to local electric field distortion, which may result in partial discharge (PD), especially at insulation defects. Thus, PD detection is often used to evaluate the operating condition of XLPE cables. There is a large body of literature on PD in AC XLPE cables, but few standards or recognized diagnostic methods for DC cables [5,6]. PD analysis methods can be divided into phase resolved partial discharge (PRPD) mode and time resolved partial discharge (TRPD) mode [7]. Because the PD signal under DC has no phase information, the time interval ∆t between two adjacent discharges is usually used as an important characteristic parameter [8]. Yang achieved pattern recognition from an H(q, ∆t) feature map using sparse representation classification based on compressed sensing theory [9]. Mazroua used a neural network for training and identification based on characteristics such as the average discharge current, repetition rate, and discharge energy [10]. Ma applied time-frequency analysis of the waveform to pattern recognition using fuzzy C-means clustering and a minimum two-way support vector machine [11]. Morette proposed supervised and unsupervised classification methods based on feature extraction from PD signals [12,13]. However, with features extracted from TRPD, neural networks, support vector machines, and other algorithms require considerable effort to determine how to extract features efficiently. The deep belief network (DBN) is a deep learning model with excellent feature learning ability and outstanding performance on degraded data and in feature recognition, and it is widely used in image processing and speech recognition [14][15][16]. Wang applied the DBN to hybrid fault diagnosis of transformer oil [17], and Zhang used DBN to identify incipient faults in analog circuits [18]. Both achieve good performance and demonstrate that deep learning methods have strong feature learning ability based on the original data. However, the effectiveness of DBN-based PD diagnosis for DC XLPE cables has not been tested because of limited data capacity.
In this paper, a modified DBN algorithm based on DC PD pulse waveforms is proposed to achieve pattern recognition of DC cable insulation defects. First, the PD signals of cables with typical insulation defects are collected on a DC experimental platform. Second, the PD pulse waveforms are pre-processed using the Canny algorithm and used as input samples for the classification model. Third, a DBN recognition model optimized by the adaptive moment estimation (ADAM) algorithm is built to achieve pattern recognition of the different defect types. Finally, compared with recognition methods based on manually extracted features, the classification performance on the various defects and the effect of training sample size on the classification models are analyzed. The experimental results show that the proposed method improves the recognition accuracy for DC XLPE cable insulation defects.

Deep Belief Networks
The deep belief network is a non-convolutional generative model proposed by Hinton et al. in 2006, which addresses the difficulty of optimizing deep models. The deep belief network consists of multiple stacked restricted Boltzmann machines (RBM). It uses the contrastive divergence (CD) algorithm for unsupervised training of each RBM and then uses supervised training to fine-tune the entire DBN network.

RBM
The RBM is an undirected probabilistic graphical model consisting of a visible layer and a hidden layer, with no connections between units within the visible layer or within the hidden layer. The RBM is an energy-based model with a visible layer v and a hidden layer h, whose energy function is [17]:

E(v, h \mid \theta) = -\sum_{i=1}^{n_v} a_i v_i - \sum_{j=1}^{n_h} b_j h_j - \sum_{i=1}^{n_v} \sum_{j=1}^{n_h} v_i w_{ij} h_j  (1)

where \theta = \{a_i, b_j, w_{ij}\} is the parameter set of the RBM model, a_i and b_j are the biases of the visible unit v_i and the hidden unit h_j, respectively, w_{ij} is the connection weight between the visible unit v_i and the hidden unit h_j, and n_v and n_h are the numbers of visible and hidden units, respectively. The joint probability distribution is specified by the energy function [19]:

P(v, h \mid \theta) = \frac{1}{Z_\theta} \exp(-E(v, h \mid \theta))  (2)

where Z_\theta is a normalization factor called the partition function:

Z_\theta = \sum_{v,h} \exp(-E(v, h \mid \theta))  (3)

The states of the hidden units in an RBM are mutually independent given the visible units, and when the visible unit vector v is given, the probability that the hidden unit h_j is activated (set to 1) is [20]:

P(h_j = 1 \mid v) = \mathrm{sigmoid}\Big(b_j + \sum_{i=1}^{n_v} v_i w_{ij}\Big)  (4)

Similarly, when the state of the hidden unit vector h is determined, the probability that the visible unit v_i is activated is [18]:

P(v_i = 1 \mid h) = \mathrm{sigmoid}\Big(a_i + \sum_{j=1}^{n_h} w_{ij} h_j\Big)  (5)

where sigmoid(x) = (1 + exp(−x))^{−1} is an activation function that maps x to the interval (0, 1).
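As a minimal numerical sketch of Equations (4) and (5) (the RBM sizes and random weights here are illustrative, not from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical small RBM: n_v = 4 visible units, n_h = 3 hidden units
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(4, 3))  # weights w_ij
a = np.zeros(4)                        # visible biases a_i
b = np.zeros(3)                        # hidden biases b_j

v = np.array([1.0, 0.0, 1.0, 1.0])     # one visible configuration

# Equation (4): P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij)
p_h = sigmoid(b + v @ W)

# Equation (5): P(v_i = 1 | h) for a hidden state sampled from p_h
h = (rng.random(3) < p_h).astype(float)
p_v = sigmoid(a + W @ h)
```

Both conditional probabilities fall in (0, 1) by construction, which is what makes Gibbs-style sampling between the two layers possible.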
For an RBM with given numbers of visible and hidden units, the parameter set θ must be determined through training, where the training goal is to make the data reconstructed by the RBM model under the parameter θ as consistent as possible with the given training samples. Because the partition function Z_θ of the RBM is difficult to compute by simple methods, this paper uses the contrastive divergence (CD) algorithm proposed by Hinton in 2002 for fast unsupervised training of the RBM to solve for the optimal θ [21].
The CD algorithm first initializes the parameters θ randomly, uses a training sample as the visible unit vector v^(0), and calculates the hidden unit vector h^(0) according to Equation (4). Then, it reconstructs the visible units v^(1) according to Equation (5) and recalculates the hidden units h^(1). The parameter update formulas are given in Equation (6):

\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_0 - \langle v_i h_j \rangle_1 \right), \quad \Delta a_i = \varepsilon \left( \langle v_i \rangle_0 - \langle v_i \rangle_1 \right), \quad \Delta b_j = \varepsilon \left( \langle h_j \rangle_0 - \langle h_j \rangle_1 \right)  (6)

where ε is the learning rate of the contrastive divergence method and ⟨·⟩ denotes the mathematical expectation, with subscript 0 taken over the data and subscript 1 over the reconstruction. In actual training, because of the large sample size, the training samples are randomly divided into small mini-batches to train the RBM and improve computational efficiency. Supposing the mini-batch size is N, the weight update formula for the k-th training batch is:

w_{ij}^{(k+1)} = w_{ij}^{(k)} + \frac{\varepsilon}{N} \sum_{n=1}^{N} \left( v_i^{(n,0)} h_j^{(n,0)} - v_i^{(n,1)} h_j^{(n,1)} \right)  (7)

with analogous updates for the biases a_i and b_j.
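A CD-1 mini-batch update consistent with Equations (6) and (7) might be sketched as follows; the function name and the use of activation probabilities rather than binary samples in the reconstruction step are implementation choices, not from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, a, b, batch, eps=0.1, rng=None):
    """One CD-1 update of RBM parameters on a mini-batch (rows = samples)."""
    if rng is None:
        rng = np.random.default_rng(0)
    v0 = batch
    p_h0 = sigmoid(b + v0 @ W)                     # Equation (4)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    p_v1 = sigmoid(a + h0 @ W.T)                   # reconstruction, Equation (5)
    p_h1 = sigmoid(b + p_v1 @ W)
    n = v0.shape[0]
    # <.>_0 - <.>_1, averaged over the mini-batch, as in Equations (6)-(7)
    W += eps * (v0.T @ p_h0 - p_v1.T @ p_h1) / n
    a += eps * (v0 - p_v1).mean(axis=0)
    b += eps * (p_h0 - p_h1).mean(axis=0)
    return W, a, b
```

Repeating this update over shuffled mini-batches constitutes the unsupervised pre-training of one RBM layer.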

Pre-Training of DBN
The classification model based on the DBN is stacked from several RBMs and consists of an input layer, several hidden layers, and an output layer formed by the Softmax classifier. The structure is shown in Figure 1.
Each pair of adjacent layers among the input layer and the hidden layers constitutes an RBM, as shown in Figure 1. RBM1 consists of the visible units v1 and the hidden units h1, which also act as the visible units v2 of RBM2; in general, the visible units v_k of RBM_k are the hidden units h_{k-1} of RBM_{k-1}. The top layer of the network is the Softmax classifier, which maps the features the DBN extracts from the original data to the categories to be classified.
In a DBN, the RBM network is trained layer by layer without supervision. First, the input data v0 are used to train RBM1 with the CD algorithm, yielding the parameters θ1. Then, the activation probabilities of h1 serve as training data for RBM2, which is trained to obtain θ2, and so on. Finally, the pre-trained network weights and biases are obtained.


Supervised Fine-Tuning Based on ADAM
RBM training is carried out independently for each layer, so the model parameters are only locally optimal within each RBM. Therefore, the parameters need to be fine-tuned at the end of the pre-training phase. For a given set of training samples {x, y}, the relationship between input and output can be expressed as:

\hat{y}_i = f(x_i)

where f is the nonlinear mapping realized by the network, x_i is the i-th sample in the training set x = [x_1, x_2, \dots, x_N], and \hat{y}_i is the label of the i-th sample predicted by the DBN model. The average cross entropy between the predicted and actual values is used as the loss function:

J(\theta) = -\frac{1}{N} \sum_{i=1}^{N} y_i \ln \hat{y}_i

where N is the number of training samples and y_i is the actual label of the i-th sample.
The supervised fine-tuning of a DBN usually uses the gradient descent method or the conjugate gradient method, but it is difficult to select an appropriate learning rate, and these methods tend to converge to local optima. The adaptive moment estimation (ADAM) algorithm was proposed by Kingma in 2015 and has excellent performance in dealing with sparse gradients and non-stationary objectives, requiring only a small amount of tuning [22]. Given the network parameters θ_k at the k-th iteration, the gradient g_k = ∇J(θ_k) is calculated, and the biased first-order moment estimate m_{k+1} and the biased second-order moment estimate v_{k+1} are updated:

m_{k+1} = \beta_1 m_k + (1 - \beta_1) g_k
v_{k+1} = \beta_2 v_k + (1 - \beta_2) g_k^2

Then, the bias-corrected first-order moment \hat{m}_{k+1} and second-order moment \hat{v}_{k+1} are calculated:

\hat{m}_{k+1} = \frac{m_{k+1}}{1 - \beta_1^{\,k+1}}, \quad \hat{v}_{k+1} = \frac{v_{k+1}}{1 - \beta_2^{\,k+1}}

The network parameters are updated:

\theta_{k+1} = \theta_k - \alpha \frac{\hat{m}_{k+1}}{\sqrt{\hat{v}_{k+1}} + \tau}

where α is the step length and τ is a stability constant. The ADAM algorithm parameters are set according to reference [20]: α = 0.001, τ = 10^{−8}, β_1 = 0.9, β_2 = 0.999. ADAM makes full use of the adaptive learning rates derived from the first- and second-order moment means to alleviate the problems of convergence to local suboptima and stagnation at saddle points.
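The update rules above can be sketched as follows, with the paper's parameter settings as defaults; the quadratic toy objective in the usage example is illustrative only:

```python
import numpy as np

def adam_step(theta, g, m, v, k, alpha=0.001, beta1=0.9, beta2=0.999, tau=1e-8):
    """One ADAM iteration; k is the 1-based iteration index."""
    m = beta1 * m + (1 - beta1) * g          # biased first-order moment
    v = beta2 * v + (1 - beta2) * g ** 2     # biased second-order moment
    m_hat = m / (1 - beta1 ** k)             # bias-corrected moments
    v_hat = v / (1 - beta2 ** k)
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + tau)
    return theta, m, v

# Toy usage: minimize J(theta) = theta^2, whose gradient is 2*theta
theta = np.array([1.0])
m = np.zeros(1)
v = np.zeros(1)
for k in range(1, 5001):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, k)
```

In the paper's setting, g would be the gradient of the cross-entropy loss with respect to the DBN weights rather than this toy gradient.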


Insulation Defect Design and Test System Construction
Based on common insulation faults in the actual operation of XLPE cables, this paper designs four typical defect models: (1) Conductor burr, defect C1: a 1.5 mm long metal needle is embedded with one end touching the cable copper core and the other end hanging inside the XLPE insulation. (2) External semi-conductive layer residue, defect C2: when peeling the semi-conductive layer, a 3 mm wide and 30 mm long strip of the semi-conductive layer is left on the main insulation surface. (3) Internal air gap, defect C3: several micro-holes are made in the XLPE cable insulation interface to allow a small amount of air to enter the insulation; the holes and the surrounding area are then sealed with epoxy resin. (4) Scratch on the insulation surface, defect C4: scratches 1 mm wide and 1 mm deep are made along the axial direction on the surface of the XLPE cable insulation. The typical insulation defects for XLPE cables are shown in Figure 2.

The experimental system is shown in Figure 3. The experiment was carried out in a shielding chamber, and the ambient temperature remained within (293 ± 5) K. The high-voltage DC source used in the experiment is a 5 kVA/200 kV DC generator; C1 and C2 are 200 kV/100 pF DC dividers, C3 is a 200 kV/10,000 pF filter capacitor, R1 is a 10 kΩ protective water resistor, and C4 is a 150 kV/300 pF coupling capacitor. The experiment used an HFCT detection system to detect partial discharges. The passband of the Rogowski coil is 3 MHz to 50 MHz. The LeCroy high-speed oscilloscope has a bandwidth of 2 GHz and a maximum sampling rate of 20 GS/s. The HFCT detection coil is clamped on the ground wire of the copper shielding layer of the cable and connected to the oscilloscope through the PD signal acquisition module. The rise time of a PD pulse current is usually on the order of nanoseconds, and the pulse duration is about 1000 ns.
The sampling rate of the oscilloscope is set to 200 MSa/s, which meets the requirements of the Nyquist sampling theorem. Each cable defect model is tested under constant DC voltage, with the applied voltage maintained at the partial discharge inception voltage. In the withstand voltage test, constant-voltage tests were carried out at five voltage levels; the voltage level gradient was about 4.5 kV, and each voltage level was maintained for at least 20 min.


Sample Collection
The four defect models were connected to the system individually, and the voltage was then raised slowly; when the PD was relatively stable, the PD pulse waveform data were collected through the HFCT. The measured pulse waveform signal is shown in Figure 4; the PD signal acquired by the oscilloscope contains redundant information before and after the pulse. Therefore, this paper introduces a modified Canny edge detection operator to pre-process the pulse waveform and intercepts the part in the red area in the figure so as to maximize the information efficiency.
Canny edge detection has the advantages of strong noise immunity and easy detection of weak edges [23]. The steps for using the Canny algorithm to extract effective information from the waveform are as follows: (1) Use Gaussian filtering to smooth the waveform. (2) Take the derivative of the waveform, perform non-maximum suppression, and retain only the derivative maxima. (3) Perform dual-threshold detection: suppress derivatives below the low threshold, mark points above the high threshold as strong edge points, and mark the remaining points as weak edge points. (4) Suppress isolated weak edge points and weak edge points with no strong edge point nearby.
After the starting point of the discharge waveform is determined, the discharge waveform is intercepted according to the waveform length and the sampling rate; the intercepted length is set to 600 points in this paper. Because the stable discharge voltages of different defects differ, the amplitude is normalized after the waveform is intercepted.
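A pure-NumPy sketch of this pre-processing pipeline is given below. The threshold fraction, Gaussian width, and function name are assumptions for illustration; the paper does not report its exact Canny parameters.

```python
import numpy as np

def intercept_pulse(signal, length=600, sigma=3.0, high=0.3):
    """Smooth the waveform, locate the first strong edge of its derivative,
    then cut a fixed-length window and normalize the amplitude."""
    # Step (1): Gaussian smoothing via convolution
    x = np.arange(-4 * int(sigma), 4 * int(sigma) + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(signal, kernel, mode="same")
    # Step (2): derivative magnitude
    grad = np.abs(np.gradient(smooth))
    # Step (3): simplified dual-threshold detection (high threshold only here)
    strong = np.flatnonzero(grad > high * grad.max())
    start = int(strong[0]) if strong.size else 0
    # Interception and amplitude normalization
    window = signal[start:start + length]
    window = np.pad(window, (0, length - window.size))
    peak = np.abs(window).max()
    return window / peak if peak > 0 else window
```

Applied to a recorded pulse, this yields a 600-point, unit-amplitude sample ready to feed into the DBN input layer.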
The pre-processed PD waveforms of the four typical defects are shown in Figure 5. As shown in the figure, the discharge currents of C1 and C2 have approximately 3 or 4 fluctuations and their amplitudes decay quickly, with the current amplitude fluctuations of C2 declining especially rapidly. The current waveforms of C3 and C4 are similar, with about 6-7 triangular fluctuations and slow amplitude decay, where the current of C4 degenerates into irregular violent jitter at a later stage. The pre-processed current waveform is used as sample data, and the sample dimension is the intercepted signal length. A total of 6400 samples were collected, with 1600 discharge signal samples for each type of insulation defect. In addition, 14 characteristics of the PD current waveforms are extracted as manually extracted features for comparison with the method proposed in this paper: skewness S_k, steepness K_u, number of local peaks P_k, waveform rise time T_r, peak time T_p, fall time T_d, pulse width T_w, pulse mean u_t, pulse variance σ_t², frequency of the characteristic spectral peak in the frequency domain p, spectral peak amplitude F, number of spectral peaks M_p, spectral mean u_f, and spectral variance σ_f².

Pattern Recognition Step Based on ADAM-DBN
The PD pattern recognition steps for DC cable based on ADAM-DBN are as follows: (1) The original data are pre-processed and divided proportionally into training and test sets. (2) The DBN recognition model is constructed and pre-trained with the CD algorithm to obtain the pre-trained network parameters of the identification model. (3) Using the ADAM algorithm, the DBN model is supervised-trained with the training sample labels, fine-tuning the network parameters to optimize the model. (4) The test data set is used as input to evaluate the trained DBN model.

Experimental Evaluation Indicators
Accuracy and recall are used as indicators to measure the effectiveness of the various methods for diagnosing insulation defects. In multi-class problems, accuracy represents the proportion of correctly predicted samples among all samples:

\mathrm{Accuracy} = \frac{\sum_i |X_i|}{N}

The recall rate represents the proportion of samples of a given actual type that are correctly identified as that type:

\mathrm{Recall}_i = \frac{|X_i|}{|C_i|}

where X_i is the set of samples whose actual type and recognized type are both i, C_i is the set of samples whose actual type is i, and N is the total number of samples.
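These two indicators can be computed directly from the actual and predicted labels; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def accuracy_and_recall(y_true, y_pred, n_classes=4):
    """Overall accuracy and per-class recall for a multi-class problem."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    accuracy = float(np.mean(y_true == y_pred))          # |correct| / |total|
    recall = [float(np.mean(y_pred[y_true == c] == c))   # |X_i| / |C_i|
              for c in range(n_classes)]
    return accuracy, recall
```

For example, with labels [0, 0, 1, 1, 2, 3] and predictions [0, 1, 1, 1, 2, 3], the accuracy is 5/6 and the recall of class 0 is 0.5.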
Through the analysis of accuracy and recall rate, we can not only evaluate the overall effectiveness of each pattern recognition method but also examine its specific identification performance for each defect and analyze the applicability of each algorithm to different defect types.

Structure and Parameter Settings
The initial biases of the parameters in the DBN model are set to 0, and the initial weights are random numbers generated from a Gaussian distribution whose mean is 0 and whose standard deviation is the inverse of the square root of the number of units in the visible layer. The batch size in the pre-training stage is 100, the RBM learning rate is 0.1, and the learning cycle of the fine-tuning stage is preset to 200. Because the hidden layer structure and the learning cycle have a great influence on recognition accuracy, this paper determines their specific values through experiments.
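The initialization described above can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def init_rbm_params(n_visible, n_hidden, rng=None):
    """Zero biases; Gaussian weights with mean 0 and std = 1/sqrt(n_visible)."""
    if rng is None:
        rng = np.random.default_rng(0)
    W = rng.normal(0.0, 1.0 / np.sqrt(n_visible), size=(n_visible, n_hidden))
    a = np.zeros(n_visible)  # visible biases
    b = np.zeros(n_hidden)   # hidden biases
    return W, a, b
```

For the 600-point input samples and a 50-unit hidden layer used in this paper, this gives a 600 × 50 weight matrix with standard deviation 1/√600 ≈ 0.041.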
The number of hidden layers and the number of hidden units per layer are obtained by enumeration. First, the model is given one hidden layer and the number of hidden units is gradually increased; the average accuracy of 10 experiments is recorded as the performance indicator, and the optimal number of hidden units is determined. Then, one more hidden layer is added and the optimal number of hidden units for the second layer is chosen, and so on, until the validation score no longer improves.
The relationship between the number of hidden units and the validation score is shown in Figure 6, where N indicates one hidden layer with N hidden units, and 50-N indicates two hidden layers with 50 and N hidden units, respectively. Considering computational efficiency, the network structure in this paper contains two hidden layers, with the numbers of hidden units in the first and second layers both set to 50.
After determining the structure of the hidden layers, the effect of the learning cycle on recognition is analyzed. Figure 7 shows the validation score for learning cycles from 20 to 400 (interval 20). The validation score increases significantly with the learning cycle when the learning cycle is below 200, but only insignificantly beyond 200. Balancing recognition performance and computational cost, it is reasonable to set the learning cycle to 200.

Results Analysis
We compare ADAM-DBN with naïve Bayes (NB), K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural networks (BPNN). As can be seen from Figure 8, the overall identification accuracy of the ADAM-DBN model is the highest, with the identification accuracy of all four defect types reaching more than 95%; moreover, the accuracy in identifying C1 and C2 exceeds 98.5%. Compared with the traditional DBN method, ADAM supervises and optimizes the DBN network weights more efficiently, and the network's mapping of the original signal effectively removes redundant information, so the recognition accuracy is further improved.
The NB, KNN, SVM, BPNN, and DBN methods have good overall prediction effectiveness, especially for the identification of defect C2, but perform poorly in the identification of C3 and C4. The identification accuracy for C3 and C4 is less than 90% for NB and KNN, and less than 92% for SVM and BPNN. This is because C3 and C4 both involve situations where part of the insulation is filled with air, so there is a certain similarity between them. From Figure 8 we can also see that C3 and C4 are mainly misidentified as each other, indicating that the manually extracted characteristics perform well for C2 but lack identification ability for C3 and C4.
In order to analyze the effect of the training data scale on recognition accuracy, 4000 of the 6400 samples were used as the test data set, and the remaining samples were used to form training sets of 400, 800, 1200, 1600, 2000, and 2400 samples, respectively. The identification accuracies of NB, KNN, SVM, BPNN, DBN, and ADAM-DBN under the six different training set scales are shown in Table 1. From Table 1, it can be seen that as the training set grows, the recognition accuracies of KNN, SVM, BPNN, traditional DBN, and ADAM-DBN increase, but the accuracy of NB does not improve significantly. With a small training set, the recognition accuracy of DBN is similar to that of the other classification algorithms. However, as the training set grows, the features of the local pulse waveform extracted by the deep learning network become more comprehensive, so the recognition accuracy of DBN is clearly better than that of the other four methods. At the same time, the modified DBN model supervised fine-tuned by the ADAM algorithm converges faster, and its recognition accuracy is higher than that of the other methods at the same training scale.

Conclusions
In this paper, a PD pattern recognition method for DC XLPE cable based on the DBN algorithm is proposed to distinguish different types of defects. Using the PD signal data collected from different artificial defects as input, the pattern recognition effect is analyzed, and the conclusions are as follows:

1. The Canny operator is used to pre-process the PD pulse waveforms of the XLPE cable, and the modified DBN model supervised fine-tuned by the ADAM algorithm is trained to obtain the pattern recognition result.

2. Compared with the NB, KNN, SVM, and BPNN algorithms, ADAM-DBN can extract the characteristic information contained in the PD pulse waveforms without supervision, and its recognition accuracy for the typical insulation defects of DC XLPE cables is higher than that of the other methods.

3. In the experiments, it was found that the traditional classification methods based on statistical characteristics do not perform well in identifying air gap and scratch defects. However, the pattern recognition method based on DBN can effectively characterize the intrinsic relationship between the insulation defects and the PD pulse current waveforms, and has a better recognition effect on all kinds of defects.

4. As the training sample size increases, the recognition accuracy of DC cable defects based on ADAM-DBN increases, and the recognition effect is better than that of traditional DBN and the other classification methods.

Figure 2. Typical insulation defects for XLPE cables.

The experimental system is shown in Figure 3. The experiment was carried out in a shielding chamber and the ambient temperature remained in the range of (293 ± 5) K. The high-voltage DC source used in the experiment is a 5 kVA/200 kV DC generator; C1 and C2 form a 200 kV/100 pF DC divider, C3 is a 200 kV/10,000 pF filter capacitor, R1 is a 10 kΩ protective water resistance, and C4 is a 150 kV/300 pF coupling capacitor. The experiment used an HFCT detection system to detect partial discharges. The passband of the Rogowski coil is 3 MHz to 50 MHz. The Lecroy high-speed oscilloscope has a bandwidth of 2 GHz and a maximum sampling rate of 20 GS/s. The HFCT detection coil is clamped on the ground wire of the copper shielding layer of the cable and then connected to the oscilloscope through the PD signal acquisition module. The rise time of a PD pulse current is usually on the nanosecond order, and the pulse duration is about 1000 ns. The sampling rate of the oscilloscope is set to 200 MSa/s, which meets the requirements of the Nyquist sampling theorem. Each cable defect model was tested under constant DC voltage, maintaining the applied voltage at the partial discharge inception voltage. In the voltage withstand test, a constant-voltage test was carried out at five voltage levels; the voltage step was about 4.5 kV, and each voltage level was maintained for at least 20 min.
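As a quick arithmetic check, the chosen sampling rate can be compared against the upper edge of the HFCT passband. This minimal sketch uses only the figures quoted above:

```python
# Sampling-chain sanity check using only the figures quoted in the text.
coil_upper_passband_hz = 50e6   # Rogowski coil passband: 3-50 MHz
sample_rate_hz = 200e6          # oscilloscope sampling rate: 200 MSa/s

nyquist_limit_hz = sample_rate_hz / 2
# the Nyquist criterion requires sampling at >= twice the highest signal frequency
assert nyquist_limit_hz >= coil_upper_passband_hz
margin = nyquist_limit_hz / coil_upper_passband_hz
print(f"Nyquist limit {nyquist_limit_hz/1e6:.0f} MHz, margin {margin:.0f}x")  # 100 MHz, 2x
```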

Figure 3. PD Experiment and Detection System of DC XLPE Cable.

Figure 4. Time domain signals of discharge pulse.

(2) Construct the DBN recognition model, pre-train it with the CD algorithm, and obtain the pre-trained network parameters of the identification model. (3) Using the ADAM algorithm, the DBN model is supervised trained with the training sample labels, fine-tuning the network parameters to optimize the model. (4) Use the test data set as input to evaluate the trained DBN model.
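Steps (2) and (3), CD pre-training followed by ADAM fine-tuning, can be sketched in miniature. The toy RBM and hand-rolled ADAM update below are illustrative stand-ins (tiny layer sizes, visible biases omitted, hypothetical prototype patterns), not the paper's network:

```python
import math
import random

rng = random.Random(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-max(-60.0, min(60.0, x))))

class RBM:
    """Tiny RBM pre-trained with one-step contrastive divergence (CD-1).
    Visible biases are omitted to keep the sketch short."""
    def __init__(self, n_vis, n_hid):
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(n_hid)] for _ in range(n_vis)]
        self.b_hid = [0.0] * n_hid

    def hidden_probs(self, v):
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b_hid))]

    def cd1_update(self, v0, lr=0.1):
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if rng.random() < p else 0.0 for p in h0]
        # one Gibbs step: reconstruct the visible layer, then recompute hidden probs
        v1 = [sigmoid(sum(h0_sample[j] * self.W[i][j] for j in range(len(h0_sample))))
              for i in range(len(v0))]
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for j in range(len(h0)):
            self.b_hid[j] += lr * (h0[j] - h1[j])

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM update over a flat parameter list (bias-corrected moments)."""
    for k in range(len(w)):
        m[k] = b1 * m[k] + (1 - b1) * g[k]
        v[k] = b2 * v[k] + (1 - b2) * g[k] * g[k]
        m_hat = m[k] / (1 - b1 ** t)
        v_hat = v[k] / (1 - b2 ** t)
        w[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)

# step (2): pre-train on two hypothetical prototype "pulse" patterns
patterns = [([1.0, 1.0, 0.0, 0.0], 0), ([0.0, 0.0, 1.0, 1.0], 1)]
rbm = RBM(n_vis=4, n_hid=3)
for _ in range(200):
    for v0, _ in patterns:
        rbm.cd1_update(v0)

# step (3): fine-tune a logistic readout on the hidden features with ADAM
w, m, v = [0.0] * 4, [0.0] * 4, [0.0] * 4          # 3 hidden units + bias
for t in range(1, 201):
    for x, y in patterns:
        h = rbm.hidden_probs(x) + [1.0]            # append bias input
        p = sigmoid(sum(wi * hi for wi, hi in zip(w, h)))
        grad = [(p - y) * hi for hi in h]          # logistic-loss gradient
        adam_step(w, grad, m, v, t, lr=0.05)
```

The real model stacks several RBMs and back-propagates the ADAM updates through the whole network; this sketch only shows the two training phases the procedure names.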

Figure 6. Relationship between the number of hidden units and the validation score.

Figure 7. Relationship between learning cycle and the validation score.


4.3. Results Analysis

We compare ADAM-DBN with naïve Bayes (NB), K-nearest neighbor (KNN), support vector machine (SVM), and back propagation neural networks (BPNN) to evaluate the effectiveness of pattern recognition. Here, the Lidstone smoothing parameter of the NB model is α = 2^n, n = −10, −9, ..., 10. The KNN model uses Euclidean distance as the judging criterion, and the number of neighbors is set to 20. The SVM model uses a radial basis function (RBF) kernel, which takes σ² = 2^n, wherein n = −10, −9, ..., 0, ..., 10, as the error penalty parameter; for optimal test accuracy, σ² = 4 is chosen as the setting parameter. The BPNN model has 14 input-layer nodes, 10 hidden-layer nodes, a learning rate of 0.01, and 200 learning cycles. For comparison, the traditional DBN model uses an RBM learning rate of 0.1, a fine-tuning learning rate of 0.4, and 200 fine-tuning cycles, and the method of Section 3.2 is used to obtain the hidden-layer structure, which has two hidden layers with 50 hidden units each. The pre-processed data sample set is divided into training and testing sets at a ratio of 2:8 as the input of the ADAM-DBN model and the traditional DBN model. The 14-feature sample sets are also divided into training and testing sets at a ratio of 2:8 as the input of the NB, KNN, SVM, and BPNN models.
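For illustration, the KNN baseline configuration described above (Euclidean distance, 20 neighbors) can be sketched in a few lines of plain Python; the defect labels and two-dimensional toy features here are hypothetical stand-ins for the 14 statistical features:

```python
import math
from collections import Counter

def knn_predict(train, x, k=20):
    """k-nearest-neighbour majority vote with Euclidean distance (k = 20 as in the text)."""
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# hypothetical 2-D stand-ins for the 14-dimensional statistical feature vectors
train = [([0.0, 0.0], "corona")] * 25 + [([5.0, 5.0], "air gap")] * 25
print(knn_predict(train, [0.4, 0.2]))   # -> corona
```

The other baselines follow the same shape: fit on the 2:8 training split of the 14-feature vectors, predict on the held-out set, and score accuracy per defect class.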

Figure 8. Confusion matrix of insulation recognition results.

Table 1. Recognition accuracy on different scales of training sample sets.