Prediction Method for Power Transformer Running State Based on LSTM_DBN Network

Abstract: It is of great significance to accurately track the running state of power transformers and to detect potential transformer faults in time. This paper presents a prediction method for the transformer running state based on the LSTM_DBN network. Firstly, based on the trend of the gas concentrations in transformer oil, a long short-term memory (LSTM) model is established to predict the future characteristic gas concentrations. Then, the accuracy and influencing factors of the LSTM model are analyzed with examples. A deep belief network (DBN) model is used to classify the transformer running state using the information in a transformer fault case library; its classification accuracy is higher than that of the support vector machine (SVM) and the back-propagation neural network (BPNN). Finally, combined with actual transformer data collected from the State Grid Corporation of China, the LSTM_DBN model is used to predict the transformer state. The results show that the method has high prediction accuracy and can analyze potential faults.


Introduction
As one of the most important pieces of equipment in the power system, the power transformer directly influences the stability and safety of the entire power grid. If a transformer fails in operation, it causes power outages and can also damage the transformer itself and the power system, which may result in greater losses [1]. It is therefore necessary to monitor the transformer's condition in real time and make diagnoses to predict its future running states. If potential failures are discovered in time and the potential failure types are analyzed, early warning signals can be sent to maintainers and corresponding measures taken promptly, reducing the possibility of an accident.
At present, there is much research on transformer fault diagnosis, but relatively few studies on predicting the future running states of transformers and on fault prediction. During the operation of a transformer, its internal insulating oil and solid insulating material decompose, due to aging or to the electric field and humidity, into gases that dissolve in the insulating oil. The content of the various gas components in the oil and the proportional relationships between the different components are closely related to the running state of the transformer. Before the occurrence of an electrical or thermal fault, the concentrations of the various gases change gradually and regularly with time. Therefore, dissolved gas analysis (DGA) is an important method for finding transformer defects and latent faults. It is highly feasible and accurate to predict transformer running states and classify future faults based on the historical trend of each gas concentration and the ratios between gas concentrations [2][3][4]. Current methods include oil gas ratio analysis [5][6][7], SVM [8,9] and artificial neural networks.

Prediction of Dissolved Gases Concentration
Transformer oil chromatographic analysis technology has become one of the important methods for monitoring the early latency faults of oil-immersed power transformers and analyzing fault nature and locations after failure. Condition-based maintenance of oil-immersed transformers is fully based on oil chromatographic data. Transformer oil chromatographic analysis test can quickly and effectively find potential faults and defects without interruption of power. It has high recognition of overheating faults, discharge faults, and dielectric breakdown failures.
Most transformers use oil-paper composite insulation. Even when the transformer operates normally, the insulating oil and solid insulating material gradually deteriorate and decompose a small amount of gas, mainly H2, CH4, C2H2, C2H4, C2H6, CO, and CO2. When an internal fault occurs, the rate at which these gases are generated accelerates. As the failure develops, the decomposed gas forms bubbles that flow and diffuse in the oil. The composition and content of the gas are closely related to the type and severity of the fault. Therefore, during the operation of the transformer, chromatographic analysis of the oil is performed at regular intervals so that potential internal equipment failures can be detected as early as possible, avoiding equipment failure or greater losses. However, because the transformer oil chromatography test is complex to perform and the sampling interval is long, it is of great significance to predict the future development trend from the historical trend of the gas concentrations in the transformer oil.


Principles of Prediction
The LSTM network is an improved model based on the RNN. While retaining the recursive nature of RNNs, it solves the problems of gradient disappearance and gradient explosion in the RNN training process [24][25][26][27]. A basic RNN network is shown in Figure 1a. It consists of an input layer, a hidden layer, and an output layer. The RNN network timing diagram is shown in Figure 1b. x = [x(1), x(2), ..., x(n)] is the input vector and y = [y(1), y(2), ..., y(n)] is the output vector. h is the state of the hidden layer. Wxh is the weight matrix from the input layer to the hidden layer, Why is the weight matrix from the hidden layer to the output layer, and Whh is the weight matrix applied to the hidden layer state when it is fed back as input at the next moment. The hidden layer state h(t−1) is used as an additional input at time t, so when the input at time t is x(t), the hidden layer value is h(t) and the output value is y(t):

y(t) = g(Why h(t)) (1)

h(t) = f(Wxh x(t) + Whh h(t−1)) (2)

where f is the hidden layer activation function and g is the output layer activation function. Substituting (2) into (1), we can get:

y(t) = g(Why f(Wxh x(t) + Whh h(t−1))) (3)

From (3), it can be seen that the output value y(t) of the RNN network is affected not only by the input x(t) at the current moment but also by the previous input values x(1), ..., x(t−1). The RNN network thus has a memory function and can effectively deal with non-linear time series. However, when the RNN processes a time sequence with a long delay, gradient disappearance and gradient explosion occur during the back-propagation through time (BPTT) training process. As an improved model, the LSTM adds a gating unit that allows the model to store and transmit information over a longer period through the selective passage of information.
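The recursion described above can be sketched in a few lines of NumPy. This is a minimal illustration of the basic RNN forward pass, not the paper's implementation; the dimensions, random weights, and choice of tanh for f and the identity for g are assumptions for the example.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, W_hy, f=np.tanh, g=lambda z: z):
    """Run a basic RNN over an input sequence.

    Implements h(t) = f(Wxh x(t) + Whh h(t-1)) and y(t) = g(Why h(t)).
    """
    h = np.zeros(W_hh.shape[0])    # initial hidden state h(0)
    outputs = []
    for x_t in x_seq:              # iterate over time steps
        h = f(W_xh @ x_t + W_hh @ h)
        outputs.append(g(W_hy @ h))
    return np.array(outputs)

# toy example: 1-D input, 4 hidden units, 1-D output
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 1))
W_hh = rng.normal(size=(4, 4))
W_hy = rng.normal(size=(1, 4))
x_seq = [np.array([0.1]), np.array([0.2]), np.array([0.3])]
y_seq = rnn_forward(x_seq, W_xh, W_hh, W_hy)
```

Because h is carried across iterations, the output at each step depends on all earlier inputs, which is exactly the memory property stated after Equation (3).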
The gating unit of the LSTM is shown in Figure 2. It consists of an input gate, a forget gate, and an output gate. The workflow of the LSTM gate unit is as follows: (1) Input the sequence value x(t) at time t and the hidden layer state h(t−1) at time t − 1. The information to discard is determined by the activation function:

f(t) = σ(Wf [h(t−1), x(t)] + bf)

where f(t) is the result of the forget gate, Wf is the weight matrix of the forget gate, and bf is the offset of the forget gate. σ is the activation function, usually a tanh or sigmoid function.
(2) Take the gate unit state c(t−1) at time t − 1 and determine the information to update, then update the gate unit state c(t) at time t:

i(t) = σ(Wi [h(t−1), x(t)] + bi)

c̃(t) = tanh(Wc [h(t−1), x(t)] + bc)

c(t) = f(t) • c(t−1) + i(t) • c̃(t)

where i(t) is the input gate state result, c̃(t) is the candidate cell state at time t, Wi is the input gate weight matrix, Wc is the cell state weight matrix, bi is the input gate bias, and bc is the cell state bias. • means multiplication by elements.
(3) The output of the LSTM is determined by the output gate and the unit state:

o(t) = σ(Wo [h(t−1), x(t)] + bo)

h(t) = o(t) • tanh(c(t))

where o(t) is the output gate state result, Wo is the output gate weight matrix, and bo is the output gate offset.
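The three gating stages above can be sketched as a single LSTM cell step in NumPy. This is an illustrative sketch of the standard LSTM equations, not the paper's code; the dictionary layout for the weights and the sigmoid gates are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM gating-unit step, following the three stages in the text.

    W holds the matrices Wf, Wi, Wc, Wo applied to [h(t-1), x(t)];
    b holds the corresponding offsets bf, bi, bc, bo.
    """
    z = np.concatenate([h_prev, x_t])       # [h(t-1), x(t)]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: what to discard
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: what to update
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde      # element-wise gate mixing
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                # new hidden state
    return h_t, c_t

# toy example: 2-D input, 3 hidden units
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(3, 5)) for k in "fico"}
b = {k: np.zeros(3) for k in "fico"}
h, c = lstm_step(np.array([0.1, 0.2]), np.zeros(3), np.zeros(3), W, b)
```

The multiplications inside `c_t` and `h_t` are element-wise, matching the "•" operator in the equations.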

Transformer Running Status Analysis
For the running state classification of the transformer, the state is first divided into healthy (H) and potential failure (P). According to the IEC 60599 standard, potential transformer faults can be classified into partial discharge (PD), low-energy discharge (LD), high-energy discharge (HD), thermal fault of low temperature (LT), thermal fault of medium temperature (MT), and thermal fault of high temperature (HT) [21]. Thus, the predicted running state of the transformer is divided into 7 (6 + 1) types.
Due to the normal aging of the transformer, the decomposed gas in the transformer oil is in an unstable state and will accumulate over time and change dynamically. Even though different transformers are in healthy operation, because of their different operating times, the concentration of dissolved gases in the oil varies greatly among different transformers. Therefore, it is necessary to use the ratio between the gas concentrations instead of the simple gas concentration as a reference vector for the prediction of the final running state.
The ratios currently in use include the IEC ratios, Rogers ratios, Dornenburg ratios, and Duval ratios. This paper combines these four methods with other codeless ratio methods. The gas concentration ratios used are shown in Table 1.
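As an illustration of how such ratio features are formed, the sketch below computes the three classical IEC ratios from a set of gas concentrations. The exact ratio set fed to the DBN is defined by Table 1 of the paper (not reproduced here); the function name and the zero-division guard are assumptions of this example.

```python
def iec_ratios(gas):
    """Compute the three classical IEC ratios from gas concentrations (uL/L).

    `gas` maps gas names to concentrations; these three ratios are an
    illustration -- the full feature set is given by Table 1.
    """
    eps = 1e-9  # guard against division by zero for trace gases
    return {
        "C2H2/C2H4": gas["C2H2"] / (gas["C2H4"] + eps),
        "CH4/H2":    gas["CH4"]  / (gas["H2"]   + eps),
        "C2H4/C2H6": gas["C2H4"] / (gas["C2H6"] + eps),
    }

sample = {"H2": 120.0, "CH4": 60.0, "C2H2": 1.0, "C2H4": 40.0, "C2H6": 20.0}
features = iec_ratios(sample)
```

Using ratios rather than raw concentrations makes the features comparable across transformers with different ages and gas accumulation levels, as argued above.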

Deep Belief Network
The RBM, as the building block of the DBN, includes a visible layer v and a hidden layer h. The structure of the RBM is shown in Figure 3. The visible layer consists of visible units vi and is used to input the training data. The hidden layer is composed of hidden units hj and is used for feature detection. w represents the weights between the two layers. In an RBM, the neurons of the visible and hidden layers are fully connected between layers, while neurons within the same layer are not connected [28][29][30][31].

Transformer Running Status Analysis
For the running state classification of the transformer, it is firstly divided into healthy state (H) and potential failure (P). According to the IEC60599 standard, the types of potential transformer faults can be classified into thermal fault of partial discharge (PD), low-energy discharge (LD), and high-energy discharge (HD), low temperature (LT), thermal fault of medium temperature (MT), thermal fault of high temperature (HT) [21]. Thus, the predicted running state of the transformer is divided into 7 (6 + 1) types.
Due to the normal aging of the transformer, the decomposed gas in the transformer oil is in an unstable state and will accumulate over time and change dynamically. Even though different transformers are in healthy operation, because of their different operating times, the concentration of dissolved gases in the oil varies greatly among different transformers. Therefore, it is necessary to use the ratio between the gas concentrations instead of the simple gas concentration as a reference vector for the prediction of the final running state.
The currently used ratios include IEC ratios, Rogers ratios, Dornenburg ratios and Duval ratios. This paper combines these four methods with other codeless ratio methods. The gas concentration ratios used is shown in Table 1.

For a specific set of (v, h), the energy function of the RBM is defined as:

E(v, h | θ) = −Σi ai vi − Σj bj hj − Σi Σj vi ωij hj

where θ = (ωij, ai, bj) are the parameters of the RBM, ωij is the connection weight between the visible layer node vi and the hidden layer node hj, and ai and bj are the offsets of vi and hj, respectively. According to this energy function, the joint probability density of (v, h) is:

P(v, h | θ) = e^(−E(v, h | θ)) / Z(θ), Z(θ) = Σv,h e^(−E(v, h | θ))

The probabilities that the jth hidden unit and the ith visible unit are activated are:

P(hj = 1 | v) = σ(bj + Σi vi ωij)

P(vi = 1 | h) = σ(ai + Σj ωij hj)

where σ(·) is the activation function; the sigmoid, tanh, or ReLU function can usually be chosen. Since the ReLU function improves the convergence speed of the model and does not saturate, this paper uses the ReLU function as the activation function. Given a set of training samples S, with ns the number of training samples, the RBM is trained by maximizing the likelihood function.
The DBN network is essentially a deep neural network composed of multiple RBM networks and a classified output layer. Its structure is shown in Figure 4.
The DBN training process includes two stages: pre-training and fine-tuning. In the pre-training phase, a contrastive divergence (CD) algorithm is used to train each RBM layer by layer; the output of each RBM's hidden layer is used as the input of the next RBM. In the fine-tuning phase, the gradient descent method propagates the error between the actual output and the numerical label from the top layer back down to the bottom, optimizing the parameters of the entire DBN model.
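A single pre-training update can be sketched as follows. This is a minimal CD-1 step for the textbook binary RBM with sigmoid activations; it is not the paper's implementation (the paper states it uses ReLU activations), and the learning rate and batch layout are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, a, b, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.

    v0: batch of visible vectors, shape (n_samples, n_visible).
    W:  weights (n_visible, n_hidden); a, b: visible/hidden offsets.
    """
    rng = rng or np.random.default_rng(0)
    # positive phase: hidden activation probabilities given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)  # sample h ~ P(h|v)
    # negative phase: one Gibbs step back to the visible layer and up again
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # gradient approximation: data statistics minus model statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b

rng = np.random.default_rng(0)
v0 = (rng.random((5, 4)) < 0.5).astype(float)  # toy binary training batch
W, a, b = cd1_update(v0, np.zeros((4, 3)), np.zeros(4), np.zeros(3))
```

Stacking such layers, each trained on the previous layer's hidden activations, is the layer-by-layer pre-training described above; fine-tuning then adjusts all layers jointly by back-propagation.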

Transformer State Prediction Process
With the continuous development of power equipment on-line monitoring technology, the monitoring data are also increasing rapidly. Utilizing the existing historical state information, such as the type and development law of the characteristic gas in the insulating oil, and analyzing the change of the running state is of great significance to the state assessment and prediction.
The flowchart of the transformer running state prediction method based on the LSTM_DBN model is shown in Figure 5. The specific steps are as follows: (1) Collect the transformer oil chromatographic data and select the characteristic parameters H2, CH4, C2H2, C2H4, and C2H6 as inputs for the model;


Gas Concentration Prediction
This paper takes the oil chromatographic monitoring data collected by a 220 kV transformer oil chromatography online monitoring device as an example. The sampling interval is 1 day. For the methane gas concentration sequence, 800 monitoring data are selected as training samples and 100 monitoring data are used as test samples. The prediction results are shown in Figure 6.
In order to evaluate the accuracy and validity of the prediction model proposed in this paper, the relative percentage error and its average are used as evaluation criteria:

e = (1/N) Σ(i=1..N) |x̂i − xi| / xi × 100% (20)

where N is the number of samples, xi is the real value, and x̂i is the predicted value.
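The evaluation criteria can be sketched directly. The exact form of Equation (20) is inferred from the average and maximum relative percentage errors reported with Figures 6 and 7, so this implementation is an assumption rather than a transcription of the paper's formula.

```python
import numpy as np

def avg_relative_error(actual, predicted):
    """Average relative percentage error, as used for Figure 7."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / actual) * 100.0)

def max_relative_error(actual, predicted):
    """Maximum relative percentage error over the test set."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.max(np.abs(predicted - actual) / actual) * 100.0)

avg_err = avg_relative_error([100.0, 200.0], [101.0, 202.0])  # 1.0 (%)
```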


As shown in Figure 6, the prediction model proposed in this paper has good fitting ability and closely follows the changing trend of the methane gas concentration. The relative percentage error between the true and predicted values is shown in Figure 7; the average relative percentage error is 0.26% and the maximum relative percentage error is 1.21%.
The LSTM model is also used to predict the other gas concentrations. The predicted results are shown in Table 2, which shows that the average error of the LSTM method is lower than that of the general regression neural network (GRNN), DBN, and SVM. It can therefore be seen that using the LSTM to predict transformer gas concentrations offers high stability and reliability.

Transformer Running State Classification
The transformer oil chromatographic gas concentration ratios are used as the input to the DBN network and the seven transformer running states are output. The case database used in this paper contains a total of 3870 datasets, including 838 normal cases and 3032 failure cases (521 LT cases, 376 MT cases, 587 HT cases, 519 PD cases, 489 LD cases and 540 HD cases). 90% of the sample data are randomly selected from the database to train the DBN network, leaving 10% of the sample data as the test sample to test the accuracy of the classification.
The classification results of the DBN, SVM, and BPNN in one test are shown in Figure 8. This paper evaluates the classification results of the transformer running states by drawing confusion matrices. The light green squares on the diagonal indicate the number of samples whose predicted category matches the actual category, and the blue squares indicate the number of falsely identified samples. The last row of gray squares gives the precision (the number of correctly predicted samples/the number of predicted samples). The last column of orange squares gives the recall (the number of correctly predicted samples/the actual number of samples). The last purple square gives the accuracy (all correctly predicted samples/all samples).
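The precision, recall, and accuracy definitions above can be computed from any confusion matrix. This is a generic sketch, assuming the row/column convention described for Figure 8 (rows = actual class, columns = predicted class).

```python
import numpy as np

def confusion_metrics(cm):
    """Per-class precision and recall, and overall accuracy.

    cm[i, j] = number of samples with actual class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    precision = correct / cm.sum(axis=0)  # correct / predicted, per class
    recall = correct / cm.sum(axis=1)     # correct / actual, per class
    accuracy = correct.sum() / cm.sum()   # correct / all samples
    return precision, recall, accuracy

# toy two-class example
precision, recall, accuracy = confusion_metrics([[8, 2], [1, 9]])
```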

From Figure 8, it can be seen that, compared with the SVM model and the BPNN model, the DBN model has the highest classification accuracy, exceeding them by 9.6% and 16.2%, respectively. The precision and recall of the DBN model are both high, exceeding 85%. The comparison shows that the DBN model works well for the classification of transformer running states. Since a single experiment may be accidental, this paper repeats 10 sets of tests on the DBN, SVM, and BPNN models to obtain their average accuracies. The average accuracies of the three models are 89.4%, 80.1%, and 71.9%, respectively. It can therefore be seen that the DBN model has strong classification stability while maintaining high accuracy.

Running State Prediction
The oil chromatogram data from January to October 2015 of a main transformer in a substation are selected for analysis. The sampling interval of the data points is 12 h. The original data are shown in Figure 9.

First, using the IEC three-ratio method integrated in the original system for analysis, there is no abnormal warning before September. In September, the measured ratio code is 021, which corresponds to a thermal fault of medium temperature; an abnormal warning should have been issued at this time. Secondly, using the threshold method integrated in the original system, an H2 content in excess of 150 μL/L is detected in October, which also requires an early warning signal.
Using the LSTM_DBN model proposed in this paper, the transformer running state is predicted and evaluated. Starting from the fifth month, the LSTM model predicts the transformer gas concentration values for the next month; the gas concentration ratios are then calculated and input into the DBN network to obtain the transformer's future running state. The transformer's running state from May to October is shown in Table 3. As can be seen from Table 3, the percentage of fault cases obtained with the LSTM_DBN model increases gradually: the percentage in August already exceeds 50%, and the highest percentage, 74.2%, occurs in October. This indicates a potential operational failure. Among all the fault type analysis results, MT faults are the most numerous, so the potential fault type is a thermal fault of medium temperature, and an early warning signal needs to be sent. Since the oil chromatography monitoring device can be disturbed by the external environment, causing errors in data acquisition, the staff should be alerted as soon as the fault cases exceed 50%. For this case, an equipment early warning should be issued in August: "Closely monitor the development trend of the chromatographic data and check the transformer status in a timely manner".
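The aggregation and alarm rule described above can be sketched as follows. This is a hypothetical illustration of the >50% fault-percentage rule; the function name, label strings, and sample counts are assumptions, not the paper's code or data.

```python
from collections import Counter

FAULT_STATES = {"LT", "MT", "HT", "PD", "LD", "HD"}  # 'H' = healthy

def monthly_warning(predicted_states):
    """Aggregate one month of predicted states into the alarm decision.

    predicted_states: list of per-sample state labels from the DBN.
    Returns (fault_percentage, dominant_fault, warn), where warn follows
    the >50% rule described in the text.
    """
    counts = Counter(predicted_states)
    n_fault = sum(counts[s] for s in FAULT_STATES)
    pct = 100.0 * n_fault / len(predicted_states)
    faults = {s: counts[s] for s in FAULT_STATES if counts[s]}
    dominant = max(faults, key=faults.get) if faults else None
    return pct, dominant, pct > 50.0

# hypothetical month: 60 predicted samples, mostly MT faults
states = ["MT"] * 35 + ["H"] * 20 + ["HT"] * 5
pct, dominant, warn = monthly_warning(states)
```

With these toy inputs, the fault percentage exceeds 50% and MT dominates, so a warning would be raised, mirroring the August situation in the case study.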
The operation and maintenance personnel's detection records show that the oil temperature rose abnormally from June and the core grounding current increased gradually. The H2 value measured by the oil chromatographic device exceeded 150 µL/L from October to December. During the outage maintenance in 2016, traces of burning were found at the end of the winding and the B-phase winding was distorted. The prediction results of the transformer running state obtained with the LSTM_DBN model are consistent with this actual situation. This example shows that the transformer running state prediction method based on the LSTM_DBN model can detect the abnormal upward trend of oil chromatographic data in time and provide early warning of the abnormal state of the transformer.

Conclusions
(1) The LSTM model has excellent ability to process time series and solves problems such as gradient disappearance, gradient explosion, and lack of long-term memory in the training process. It can fully utilize historical data. The DBN model can extract the characteristic information hidden in fault case data layer by layer and has high classification ability.
(2) The transformer running state prediction method based on the LSTM_DBN model presented in this paper has high accuracy and can give timely warnings of potential transformer faults. Compared with the standard threshold method and the state prediction methods in the research literature, the method in this paper makes full use of the historical and current state data.
(3) In future work, we will focus on improving the LSTM and DBN models and optimizing their parameters to further increase the transformer state prediction accuracy. Because only a small number of substations have complete online monitoring equipment and rich state data, the method proposed in this paper needs further verification.
Author Contributions: Jun Lin designed the algorithm, tested the example, and wrote the manuscript. Lei Sun, Gehao Sheng, Yingjie Yan, Da Xie and Xiuchen Jiang helped design the algorithm and debug the code.