1. Introduction
CIGRE’s statistics show that about 30% of the dielectric failures of gas-insulated switchgear (GIS) are related to design deficiencies [1]. Through the analysis of a large amount of partial discharge (PD) data from GIS in service, we also found that a high proportion of PD cases are caused by design issues. As a result, GIS equipment of the same type from the same manufacturer is susceptible to repeated partial discharge at similar locations. This provides the basis for case-based reasoning (CBR) in GIS. Case-based reasoning is a branch of artificial intelligence (AI) that answers new questions based on experience from historical cases [2,3]. In recent studies, CBR has been used in load forecasting, energy management, grid system safety assessment, and power equipment failure assessment [4,5,6,7]. In Reference [8], a case-based reasoning method is used to diagnose incipient faults of power transformers, with pretreated dissolved gas analysis (DGA) data used in the CBR system. Reference [9] developed a case-based reasoning approach for identifying and filtering acoustic emission (AE) noise signals and proposed a parametric case representation method for AE signal processing. Because CBR requires cases and data to be accumulated beforehand, no CBR-related literature has yet been published in the field of partial discharge. Now that a large amount of GIS PD detection data has been accumulated from substation sites, CBR can provide new ideas for the interpretation and evaluation of partial discharge data. The key step in a CBR system is the case matching strategy, and PD data are one of the key features of a GIS PD detection case. This paper therefore focuses on the data matching problem in establishing the CBR system. The structure of GIS CBR based on the PD data match degree is shown in Figure 1.
Some phase resolved pulse sequence (PRPS) graphs are used in Figure 1 to represent the data detected by GIS partial discharge ultra-high frequency (UHF) detection. The specific procedure is as follows. First, the historical data are retrieved from the historical case database according to the operating conditions of the detected equipment, the manufacturer, and other search conditions. Second, the detected data are matched with the historical data, and those cases whose data match degree exceeds a threshold are considered matched cases. Finally, from the matched cases, we can obtain information such as the most probable PD location in the detected power equipment, the most likely cause of the PD, and pictures of the disassembled power equipment from the historical cases. Maintenance plans can be developed based on this match information. PD history detection data can therefore be utilized more effectively and can provide a basis for data-driven equipment status evaluation.
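The three-step procedure above (retrieve, match, threshold) can be sketched in Python as follows. This is only an illustrative sketch: the `Case` fields, the `match_cases` function, and the choice of manufacturer and equipment type as the search conditions are hypothetical names introduced here, and the match degree function is passed in as a parameter rather than fixed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Case:
    """One historical PD case. All field names are hypothetical."""
    manufacturer: str
    equipment_type: str
    features: Sequence[float]  # feature vector extracted from the PD data
    pd_location: str
    pd_cause: str


def match_cases(
    detected: Sequence[float],
    manufacturer: str,
    equipment_type: str,
    database: List[Case],
    match_degree: Callable[[Sequence[float], Sequence[float]], float],
    threshold: float,
) -> List[Tuple[float, Case]]:
    """Return (match degree, case) pairs above the threshold, best first."""
    # Step 1: retrieve candidate cases using the search conditions.
    candidates = [
        c for c in database
        if c.manufacturer == manufacturer and c.equipment_type == equipment_type
    ]
    # Step 2: compute the match degree between detected and historical data.
    scored = [(match_degree(detected, c.features), c) for c in candidates]
    # Step 3: keep only cases whose match degree exceeds the threshold.
    matched = [(md, c) for md, c in scored if md > threshold]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return matched
```

The returned cases carry the historical PD location and cause, which is the information the procedure uses to support maintenance planning.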
There are two key processes in calculating the PD data match degree. The first is to extract valid eigenvalues (features) from the PD data, and the second is to obtain the match degree (MD) from the resulting eigenvectors. Traditional feature extraction methods for PD data extract a variety of statistical features from, for example, histograms, scatter plots, and grayscale images based on phase resolved partial discharge (PRPD) data [10,11,12]. Other algorithms have also been applied to PD data feature extraction, such as principal component analysis (PCA) [13], wavelet packet transformation [14], sparse representation [15], and signal norms [16]. The algorithms proposed in these references performed well in PD pattern recognition. However, owing to the multi-source heterogeneity of the data accessed in big data centers, the large differences in the performance of PD detection instruments, and the complex operating environments in substations, the features obtained by traditional statistical methods have become inadequate for identifying typical partial discharge types. In addition, PD data matching imposes even more stringent requirements than PD pattern recognition.
In recent years, related technologies such as deep auto-encoders, deep convolutional networks, recurrent neural networks, and deep belief networks have shown good performance in many fields, including image processing and speech processing [17,18,19,20,21]. Reference [22] studied the application of deep neural networks in the diagnosis of partial discharges and demonstrated the improvements in accuracy and visualization that can be obtained through deep learning. Reference [23] obtained a two-dimensional spectral frame representation of a UHF signal by time-frequency analysis and then used a deep convolutional network to obtain enhanced features for different PD sources. Auto-encoding (AE) is an unsupervised feature learning method whose hidden layer can effectively extract the internal representation of data. Its deep structure brings the network closer to the hierarchical information processing of the human brain and gives it better nonlinear modelling ability [24,25]. The variational autoencoder (VAE) proposed by Kingma et al. is a generative network based on variational Bayesian inference [26]. It avoids the computational complexity of dataset likelihood calculations and traditional Monte Carlo sampling and is therefore becoming an area of considerable research interest in text classification, semi-supervised learning, and other related fields.
This paper presents a PD data matching method based on VAE. The network uses a variational Bayesian method to quickly approximate the posterior probability and extract the deep features of PD data. Euclidean distance, cosine distance, and correlation coefficient (Cc) methods are used to measure the similarity between different data, and their comparative results are also shown in this paper.
The rest of the paper is organized as follows. Section 2 introduces basic information on variational autoencoder networks. Section 3 provides further information on the proposed partial discharge data matching approach. The dataset used in this paper is described in Section 4. Section 5 validates the data matching approach with different case studies and discusses the results obtained. The conclusions are presented in Section 6.
2. Variational Autoencoder
Variational Bayes inference [27] is a deterministic approximation method that maximizes the lower bound of the marginal likelihood function of the observed data by iteratively updating the variational parameters and approximates the posterior probability of unobservable variables.
For a sample set X, define the eigenvalues of the data as latent variables z because they cannot be directly observed. According to Bayes' rule, the posterior probability of the latent variables z is
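By Bayes' rule, this posterior takes the standard form below. The expression is reconstructed from the surrounding text, with p(z) denoting the prior over the latent variables and p(x|z) the likelihood, so the exact notation should be read as an assumption.

```latex
p(z \mid x) = \frac{p(x \mid z)\, p(z)}{p(x)},
\qquad
p(x) = \int p(x \mid z)\, p(z)\, dz
```

The integral in the denominator is the marginal likelihood p(x), which is the quantity that is hard to evaluate in closed form.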
It is difficult to obtain an exact analytical solution for p(x); therefore, in variational Bayes inference, an approximate distribution q(z|x) is introduced to fit the true posterior distribution p(z|x). The Kullback-Leibler (KL) divergence is used to measure the similarity of the two distributions.
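For reference, the standard definition of the KL divergence between the approximate and true posteriors is given below. Whether the paper writes it in exactly this form is an assumption; the definition itself is the usual one.

```latex
D_{\mathrm{KL}}\big(q(z \mid x)\,\|\,p(z \mid x)\big)
  = \int q(z \mid x)\,\log\frac{q(z \mid x)}{p(z \mid x)}\,dz
  \;\ge\; 0
```

The divergence is not symmetric in its two arguments, so the order (approximate distribution first) matters.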
The approximate distribution q(z|x) is estimated by an auto-encoder network in VAE. VAE consists of a probabilistic encoder and a probabilistic decoder and uses a stochastic gradient variational Bayes algorithm to optimize the posterior distribution model of the hidden layer.
According to the variational Bayes method, the log marginal likelihood of the sample data X can be simplified as shown below.
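A reconstruction of this decomposition is given below in the notation of the original VAE derivation by Kingma and Welling. It assumes that q with subscript φ denotes the approximate posterior of the hidden layer and p with subscript θ denotes the true posterior, and the numbering as Equation (3) is inferred from the later references in the text.

```latex
\log p_{\theta}\big(x^{(i)}\big)
  = D_{\mathrm{KL}}\Big(q_{\phi}\big(z \mid x^{(i)}\big)\,\Big\|\,p_{\theta}\big(z \mid x^{(i)}\big)\Big)
  + \mathcal{L}\big(\theta, \phi; x^{(i)}\big)
  \tag{3}
```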
where θ is the parameter of the true posterior distribution, and φ is the parameter of the approximate distribution of the hidden layer. The first term is the KL divergence between the approximate distribution of the hidden layer and the true posterior distribution. Since the KL divergence is nonnegative and is zero only if the two distributions are exactly the same [28], log pθ(x(i)) ≥ L(θ, φ; x(i)). Equation (3) can be expanded:
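In the standard VAE derivation, the variational bound expands into a regularization term and a reconstruction term, as reconstructed below. Here p with subscript θ and argument z is the prior over the latent variables; the numbering as Equation (4) is inferred from the later reference to it as the target optimization function.

```latex
\mathcal{L}\big(\theta, \phi; x^{(i)}\big)
  = -D_{\mathrm{KL}}\Big(q_{\phi}\big(z \mid x^{(i)}\big)\,\Big\|\,p_{\theta}(z)\Big)
  + \mathbb{E}_{q_{\phi}(z \mid x^{(i)})}\Big[\log p_{\theta}\big(x^{(i)} \mid z\big)\Big]
  \tag{4}
```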
The optimal approximation of pθ(x(i)) for the sample set can be obtained by maximizing the variational bound L(θ, φ; x(i)) [29].
3. Data Matching Method of Partial Discharge Based on VAE
The encoder section of the VAE model for partial discharge data can be represented by Equation (5).
where W and b are the weights and biases of each layer, and x is the input vector. h1, μenc, and σenc are the outputs of the first and second layers of the network. f is the activation function. Based on the Gaussian distribution parameters μ and σ, the hidden layer output z is obtained by sampling q(z|x(i)), and N(0, I) is the standard normal distribution.
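A minimal sketch of such an encoder and of the sampling step is shown below in plain Python. Several choices are assumptions made for illustration and are not confirmed by the text: a single hidden layer, tanh as the activation function f, a second layer that outputs the log-variance (so that σ stays positive), and the function names `affine`, `encode`, and `sample_latent`.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Sequence[float], b: Sequence[float]) -> Vector:
    """Compute W x + b for one dense layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(
    x: Sequence[float],
    W1: Matrix, b1: Sequence[float],
    W_mu: Matrix, b_mu: Sequence[float],
    W_logvar: Matrix, b_logvar: Sequence[float],
) -> Tuple[Vector, Vector]:
    """Map an input vector to the Gaussian parameters (mu, sigma) of q(z|x)."""
    # First layer: h1 = f(W1 x + b1), with f = tanh (assumed).
    h1 = [math.tanh(v) for v in affine(W1, x, b1)]
    # Second layer: mean and log-variance heads (log-variance is an assumption).
    mu = affine(W_mu, h1, b_mu)
    log_var = affine(W_logvar, h1, b_logvar)
    sigma = [math.exp(0.5 * lv) for lv in log_var]
    return mu, sigma


def sample_latent(mu: Sequence[float], sigma: Sequence[float],
                  rng: random.Random) -> Vector:
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (reparameterization)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
```

Writing the sample as a deterministic function of μ, σ, and an independent noise term is what allows gradients to pass through the sampling step during training.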
The decoder section of the VAE model for partial discharge data can be represented by the following equation.
where W and b are the weights and biases of each layer, h2, μdec, and σdec are the outputs of each layer of the decoder, and f is the activation function.
The target optimization function of Equation (4) can be rewritten as Equation (7).
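For a Gaussian approximate posterior and a standard normal prior, the KL term of the bound has a closed form and the expectation is estimated by sampling. The expression below follows the estimator given by Kingma and Welling; that the paper's Equation (7) matches it term by term is an assumption based on the symbols J and L defined in the text.

```latex
\begin{aligned}
\mathcal{L}\big(\theta, \phi; x^{(i)}\big)
  &\simeq \frac{1}{2}\sum_{j=1}^{J}
     \Big(1 + \log\big((\sigma_j^{(i)})^2\big) - (\mu_j^{(i)})^2 - (\sigma_j^{(i)})^2\Big)
   + \frac{1}{L}\sum_{l=1}^{L} \log p_{\theta}\big(x^{(i)} \mid z^{(i,l)}\big),
   \\
z^{(i,l)} &= \mu^{(i)} + \sigma^{(i)} \odot \epsilon^{(l)},
\qquad \epsilon^{(l)} \sim \mathcal{N}(0, I)
\end{aligned}
\tag{7}
```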
where J is the dimension of the latent variables z, and L is the number of samples of the latent variables z drawn from the posterior distribution.
The parameters of the probabilistic encoder and the probabilistic decoder are then optimized by the stochastic gradient descent algorithm. When Equation (7) converges or stabilizes, the output of the encoder part of the VAE is taken as the extracted eigenvalues.
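To make the optimization target concrete, the sketch below evaluates the variational bound for one sample in plain Python. It assumes a diagonal Gaussian encoder, a standard normal prior, and a Gaussian decoder with outputs μdec and σdec; the function names are hypothetical, and the gradient step itself is omitted.

```python
import math
from typing import List, Sequence, Tuple


def kl_to_standard_normal(mu: Sequence[float], sigma: Sequence[float]) -> float:
    """Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over J dimensions."""
    return -0.5 * sum(
        1.0 + math.log(s * s) - m * m - s * s for m, s in zip(mu, sigma)
    )


def gaussian_log_likelihood(
    x: Sequence[float], mu_dec: Sequence[float], sigma_dec: Sequence[float]
) -> float:
    """log p(x|z) for a diagonal Gaussian decoder output (assumed decoder type)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * s * s) - (xi - m) ** 2 / (2.0 * s * s)
        for xi, m, s in zip(x, mu_dec, sigma_dec)
    )


def variational_bound(
    x: Sequence[float],
    mu_enc: Sequence[float],
    sigma_enc: Sequence[float],
    decoder_outputs: List[Tuple[Sequence[float], Sequence[float]]],
) -> float:
    """Estimate the bound: -KL + average reconstruction log-likelihood.

    decoder_outputs holds one (mu_dec, sigma_dec) pair per latent sample,
    so its length plays the role of L.
    """
    recon = sum(
        gaussian_log_likelihood(x, m, s) for m, s in decoder_outputs
    ) / len(decoder_outputs)
    return -kl_to_standard_normal(mu_enc, sigma_enc) + recon
```

Training maximizes this quantity (equivalently, minimizes its negative) over the encoder and decoder parameters.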
Figure 2 shows the matching process of partial discharge data based on the VAE model.
The match degree of partial discharge data can be obtained by calculating the cosine distance between the eigenvectors of the partial discharge data using Equation (8).
where Va and Vb are the eigenvectors extracted from the two PD datasets, and ‖V‖ is the length of the vector V.
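A minimal Python sketch of the similarity measures is given below. It assumes that Equation (8) is the standard cosine similarity, that is, the dot product of Va and Vb divided by the product of their lengths. The Euclidean distance and correlation coefficient named in the paper are included for comparison, and all function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_match_degree(va: Sequence[float], vb: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors, in [-1, 1]."""
    dot = sum(a * b for a, b in zip(va, vb))
    norm_a = math.sqrt(sum(a * a for a in va))
    norm_b = math.sqrt(sum(b * b for b in vb))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def euclidean_distance(va: Sequence[float], vb: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors (0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(va, vb)))


def correlation_coefficient(va: Sequence[float], vb: Sequence[float]) -> float:
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    mean_a = sum(va) / len(va)
    mean_b = sum(vb) / len(vb)
    return cosine_match_degree(
        [a - mean_a for a in va], [b - mean_b for b in vb]
    )
```

Unlike the Euclidean distance, the cosine measure depends only on the direction of the vectors, so two feature vectors that differ by a positive scale factor receive a match degree of 1.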
6. Conclusions
This paper has proposed a PD data matching method based on a VAE network for mining historical PD databases. Similar cases found by the method can provide abundant information for PD diagnosis and equipment status evaluation. A PD dataset was established from laboratory partial discharge experiments and live detection in substations. On this dataset, comparative experiments were conducted between the VAE and the comparison methods. The experimental results show the following:
(1) Compared with traditional statistical eigenvalues, deep learning methods such as CNN, DBN, and VAE identify different PD types more effectively on complex datasets;
(2) Compared with CNN, the DBN and VAE models extract partial discharge data eigenvalues with better expressive ability, and their degree of discrimination is higher in the data matching experiment;
(3) With a large number of samples, the MD calculation based on cosine distance is more precise than those based on Euclidean distance and the correlation coefficient.
The work in this paper provides a new approach to PD data mining in the context of big data. In further research, a better match strategy will be designed to meet the engineering requirements of PD data mining. Benchmarking criteria for the MD of PD data are the key issue to be studied next.