A Novel Fault Early Warning Model Based on Fault Gene Table for Smart Distribution Grids

Abstract: Since a smart distribution grid has a diversity of components and a complicated topology, it is very hard to achieve fault early warning for each part. A fault early warning model for smart distribution grids that combines a back propagation (BP) neural network with a gene sequence alignment algorithm is proposed. Firstly, the operational state of the smart distribution grid is divided into four states, and a BP neural network is adopted to evaluate the operational state from the historical fault data of the smart distribution grid. This yields the relationship between each state transition time sequence and the corresponding fault, which is used to construct the fault gene table. Then, a state transition time sequence is obtained online periodically and matched against each gene in the fault gene table by an improved Smith–Waterman algorithm. If the maximum match score exceeds the given threshold, an early warning is issued for the relevant fault. Finally, extensive time domain simulations of the proposed fault early warning model are performed on the IEEE-14 bus system. The simulation results show that the proposed model can achieve efficient fault early warning for smart distribution grids.


Introduction
In recent years, the smart grid has become a major concern of the international community [1–7]. Smart distribution grids are the main connection between the main grid and the power supply to users and are an important part of the smart grid [8]. Whether their operational state is normal directly affects the power supply to thousands of users. Meanwhile, with the access of distributed generation, the popularization of electric vehicles, and the increase of user interaction, the dynamic behavior of the smart distribution grid becomes complicated, and the operational risk increases greatly [9]. A power outage has a great impact on social life and causes economic losses [10,11]. With the help of condition monitoring and fault early warning techniques, predictive maintenance and condition-based maintenance have been increasingly adopted [12]. Therefore, it is necessary to carry out more in-depth research on condition monitoring and fault early warning of the smart distribution grid so as to provide guidance for the relevant management staff in the predictive maintenance of smart distribution grids.
At present, many domestic and foreign scholars have put forward various solutions for the fault early warning of smart distribution grids from different angles. In [13], a fault early warning method based on harmonic current and suitable for the active distribution grid is proposed. Firstly, a cloud model of harmonic current is designed, and the entropy based on the cloud model is used to measure the range of harmonic current during normal operation, so as to determine the anomaly threshold of harmonic current before and after the interconnection of intermittent energy. Then, the measured data are compared with the harmonic anomaly threshold to determine whether the harmonic

Design of the Fault Gene Table
The faults of smart distribution grids have a certain mapping relationship with the operation data before the fault; therefore, the operation data can be represented as the gene of a fault. In analogy to the human gene composed of four canonical bases, the operational state of the smart distribution grid can be divided into four states: excellent, good, middle and bad. Then, the BP neural network evaluation model is adopted to transform the operation data of the smart distribution grid into an ordered state transition time series, namely a gene. The inputs of the BP neural network are the state goals of each bus of the smart distribution grid. Thus, the mapping relationship between each state transition time series and the related fault can be obtained from historical fault data sources of smart distribution grids so as to construct the fault gene table.

State Division of Smart Distribution Grids
There is no unified standard for the state division of a distribution grid, and different studies take different points of view. In [26], it is divided into six kinds of state: fault, early warning, over-threshold, incomplete operation, safety and optimal. In [27], it is divided into six states: normal, early warning, threshold, emergency and recovery.
In this paper, in analogy to the human gene composed of the four canonical bases, the operational state of a smart distribution grid is divided into four states: excellent, good, middle and bad, labeled E, G, M and B, as shown in Table 1.

Operational State Evaluation of Each Bus
The operational state goal of each bus is determined by its voltage, active power and reactive power, as shown in Equation (1).
where $G_b$, the operational state goal of each bus, ranges from 0 to 1; the closer it is to 1, the better the state. $\lambda_V$ is the weight of voltage, $g_V$ is the voltage goal of each bus, $\lambda_P$ is the weight of active power, $g_P$ is the active power goal of each bus, $\lambda_Q$ is the weight of reactive power, and $g_Q$ is the reactive power goal of each bus.
The voltage goal can be calculated as Equation (2).
where $V$ is the present value of voltage, $\bar{V}$ is the average voltage of the related bus in the historical data set, $V_{\max}$ is the maximum voltage of the related bus in the historical data set, and $V_{\min}$ is the minimum voltage of the related bus in the historical data set.
The active power goal can be calculated as Equation (3).
where $P$ is the present value of active power, $\bar{P}$ is the average active power of the related bus in the historical data set, $P_{\max}$ is the maximum active power of the related bus in the historical data set, and $P_{\min}$ is the minimum active power of the related bus in the historical data set.
The reactive power goal can be calculated as Equation (4).
where $Q$ is the present value of reactive power, $\bar{Q}$ is the average reactive power of the related bus in the historical data set, $Q_{\max}$ is the maximum reactive power of the related bus in the historical data set, and $Q_{\min}$ is the minimum reactive power of the related bus in the historical data set.
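The combination of the three per-parameter goals into one bus goal can be illustrated with a short sketch. It assumes that Equation (1) is the weighted sum suggested by the definitions above and that the weights sum to 1, so that the bus goal stays between 0 and 1. The function name and the default weights are hypothetical placeholders, not values taken from the paper.

```python
def bus_state_goal(g_v, g_p, g_q, w_v=0.4, w_p=0.2, w_q=0.4):
    """Weighted sum of the voltage, active power and reactive power goals.

    Assumption: Equation (1) has the form G_b = w_v*g_v + w_p*g_p + w_q*g_q.
    The default weights are illustrative only.
    """
    if abs(w_v + w_p + w_q - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    for g in (g_v, g_p, g_q):
        if not 0.0 <= g <= 1.0:
            raise ValueError("each goal is assumed to lie in [0, 1]")
    return w_v * g_v + w_p * g_p + w_q * g_q


if __name__ == "__main__":
    # A bus with a mediocre voltage goal, a perfect active power goal
    # and a poor reactive power goal.
    print(bus_state_goal(0.5, 1.0, 0.0))
```

With weights that sum to 1, a bus whose three goals are all 1 receives the best possible goal of 1, which matches the stated range of the bus goal.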

Operational State Evaluation Model of Smart Distribution Grid
In order to dynamically evaluate the current operational state of the smart distribution grid, a comprehensive grasp of its operational condition is necessary. Therefore, a state evaluation model based on a BP neural network is proposed, which obtains the operational state by comprehensively taking the state of each bus into account. A BP neural network offers good fault tolerance and strong adaptive learning ability, which can improve the fault tolerance and the accuracy rate of the evaluation. The BP neural network is composed of an input layer, a hidden layer and an output layer. The input layer receives the state goal of each bus, and the output layer gives the operational state of the smart distribution grid. The model of the BP neural network is shown in Figure 1. The output is divided into four states, as shown in Table 2.

Fault Gene Table Construction Procedure of Smart Distribution Grid
1. k-means clustering is adopted to cluster the state goals of all buses of the smart distribution grid in the historical fault data source into four classes, labeled E, G, M and B according to the magnitude of the average state goal $G_{avg}$ calculated by Equation (5).

2. One part of the clustered data is regarded as the training sample matrix $T_{n \times m}(k)$, which is shown as Equation (6), where $n$ is the number of training samples, and $t_{nm}$ represents the state goal of the $m$-th bus in the $n$-th training sample.

3. The other part of the clustered data is regarded as the evaluation matrix $A_{n^* \times m}$, which is shown as Equation (7), where $n^*$ is the number of evaluation samples, and $a_{n^* m}$ represents the state goal of the $m$-th bus in the $n^*$-th evaluation sample. The training sample matrix $T_{n \times m}(k)$ is input into the BP neural network to determine the weights and thresholds. The training process is as follows:
a. Let the weight and threshold between the input layer and the hidden layer be $w_{ij}$ and $\gamma_j$, the weight and threshold between the hidden layer and the output layer be $w_j$ and $\varnothing$, the number of nodes in the hidden layer be $N$, the learning rate be $\eta$, and the expected output of the state goal be $\hat{y}$, which takes a distinct value according to the class of the input sample. Specifically, $\hat{y}$ for Bad (B) can be a reasonable value between 0 and 0.25, $\hat{y}$ for Middle (M) a reasonable value between 0.25 and 0.5, $\hat{y}$ for Good (G) a reasonable value between 0.5 and 0.75, and $\hat{y}$ for Excellent (E) a reasonable value between 0.75 and 1. For example, the expected outputs can take the midpoint of each range, namely 0.125, 0.375, 0.625 and 0.875, respectively.
b. Choose the sigmoid function as the activation function of the hidden layer and the output layer, namely $f_1$ and $f_2$, which is shown as Equation (8).
c. Calculate the input and output value of each hidden neuron, which is shown as Equation (9), where $h_j$ is the input value of each hidden neuron, $b_j$ is the output value of each hidden neuron, $r = 1, 2, \ldots, n$ denotes the index of the training sample, and $t_{ri}$ represents the state goal of the $i$-th bus in the $r$-th training sample.
d. Calculate the output value $y$ from the hidden layer output $b_j$ and the weight and threshold between the hidden layer and the output layer, $w_j$ and $\varnothing$, which is shown as Equation (10).
e. Calculate the output error according to Equation (11).
f. Calculate the generalization error of the output layer by Equation (12).
g. Calculate the generalization error of the hidden layer by Equation (13).
h. Adjust the weight and threshold between the hidden layer and the output layer, $w_j$ and $\varnothing$, by Equation (14), which gives their values after adjustment.
i. Adjust the weight and threshold between the input layer and the hidden layer, $w_{ij}$ and $\gamma_j$, by Equation (15), which gives their values after adjustment.
j. When $r$ has run from 1 to $n$, all training samples have been processed, and the global error $E_g$ is calculated by adding up the errors of all training samples. If $E_g$ falls within the specified error range, go to step k; otherwise, reset $E_g$ to zero and return to step c to repeat the training.
k. End the training and record the weights and thresholds of the current network.

4. Input the evaluation matrix $A_{n^* \times m}$ into the trained BP neural network to get the state goal of the smart distribution grid at each moment, and then compare it with the expected state to verify the effectiveness of the model.
The trained BP neural network is then used to build the mapping relationship between state transition time sequences and faults so as to construct the fault gene table. The construction flow chart of the fault gene table for the smart distribution grid is shown in Figure 2. The constructed fault gene table is shown in Figure 3.
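Training steps a to k can be sketched in plain Python as follows. Because Equations (8)-(15) are not reproduced in this text, the sketch assumes the textbook single-hidden-layer BP update rules with a squared-error loss and thresholds subtracted before the sigmoid. The function names, the initial weight range and the assignment of boundary values in `to_state` are assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_bp(samples, targets, n_hidden=12, eta=0.5, tol=1e-3,
             max_epochs=2000, seed=0):
    """Online BP training; returns (predict, per-epoch global errors)."""
    rng = random.Random(seed)
    m = len(samples[0])
    # Step a: weights and thresholds (assumed random initial values).
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(m)]
    gamma = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    phi = rng.uniform(-0.5, 0.5)

    def forward(t):
        # Steps c and d: hidden outputs b_j and network output y.
        b = [sigmoid(sum(w_in[i][j] * t[i] for i in range(m)) - gamma[j])
             for j in range(n_hidden)]
        y = sigmoid(sum(w_out[j] * b[j] for j in range(n_hidden)) - phi)
        return b, y

    history = []
    for _ in range(max_epochs):
        e_g = 0.0  # global error, reset at the start of every pass
        for t, y_hat in zip(samples, targets):
            b, y = forward(t)
            e_g += 0.5 * (y_hat - y) ** 2                      # step e
            g = y * (1.0 - y) * (y_hat - y)                    # step f
            e = [b[j] * (1.0 - b[j]) * w_out[j] * g            # step g
                 for j in range(n_hidden)]
            for j in range(n_hidden):                          # step h
                w_out[j] += eta * g * b[j]
            phi -= eta * g
            for j in range(n_hidden):                          # step i
                for i in range(m):
                    w_in[i][j] += eta * e[j] * t[i]
                gamma[j] -= eta * e[j]
        history.append(e_g)
        if e_g <= tol:                                         # steps j, k
            break

    return (lambda t: forward(t)[1]), history


def to_state(y):
    """Map a network output to B, M, G or E using the ranges of step a."""
    if y < 0.25:
        return "B"
    if y < 0.5:
        return "M"
    if y < 0.75:
        return "G"
    return "E"
```

Applying `to_state` to the outputs obtained at successive sampling instants gives a string such as `"EEGGMMBB"`, which plays the role of a state transition time sequence, that is, a gene.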

Fault Early Warning by Improved Smith-Waterman
After the fault gene table is constructed, the operation data of all buses in the smart distribution grid are obtained periodically and transformed into a gene by the BP neural network. Then, a sequence alignment algorithm is adopted to get the match score between this gene and the genes in the fault gene table. If the match score exceeds the given threshold, an early warning is issued for the related fault. Because the length of the gene to be matched increases gradually and the genes in the fault gene table have different lengths, a local sequence alignment is preferable. In addition, fault early warning places high demands on sensitivity. Therefore, the Smith-Waterman algorithm is chosen to achieve the fault early warning of smart distribution grids, and some corresponding improvements are made to adapt it to the characteristics of smart distribution grids.

Improved Smith-Waterman
The Smith-Waterman algorithm aligns two sequences by matches or mismatches (also known as substitutions), insertions and deletions. Both insertions and deletions are operations that introduce gaps, which are represented by dashes. The Smith-Waterman algorithm, based on the dynamic programming technique, provides high sensitivity [28,29]. The procedure consists of three steps:
• Fill in the dynamic programming matrix.
• Find the maximal score in the matrix.
• Trace back the path that leads to the maximal score to find the optimal local alignment.
When the Smith-Waterman algorithm is used in biological gene sequence alignment, the four bases are equally important, so the substitution matrix assigns the same score to every pair of identical bases. In its application to smart distribution grids, the four analogous bases must be scored differently, because the operational state of the smart distribution grid is divided into four types with distinct performance, namely E, G, M and B.
To meet the requirements of online fault early warning of smart distribution grids, the traditional Smith-Waterman algorithm must be improved, specifically in the design of the substitution matrix, which follows two rules:
1. If two states are matched, the worse the two states are, the higher the match score; namely, the match scores of E with E, G with G, M with M and B with B are in ascending order.
2. If two states are mismatched, the bigger the difference between the two states, the lower the match score. For instance, the match scores of E with G, E with M and E with B are in descending order.
The first rule makes matches between lower operational states more important, which can improve the accuracy rate of fault early warning. The second rule reduces the negative effect on fault early warning when two aligned states are mismatched.
According to the importance degree of each operational state and the design rules, the substitution matrix can be designed as Equation (16).
where $\sigma$ is the importance degree of an operational state, $s_{l_i}$ is the state of the $l_i$-th element in the gene sequence $S$, and $u_{l_j}$ is the state of the $l_j$-th element in the gene sequence $U$; both sequences are defined in the matching procedure that follows.
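The two design rules can be checked mechanically on a candidate substitution matrix. The sketch below does not reproduce Equation (16); it assumes hypothetical importance degrees (E=1, G=2, M=3, B=4), scores a match with the importance degree of the state, and scores a mismatch with the negative difference of the two importance degrees.

```python
# Hypothetical importance degrees; the paper's own values are in its tables.
SIGMA = {"E": 1, "G": 2, "M": 3, "B": 4}
STATES = "EGMB"


def score(s, u):
    """Assumed scoring: sigma on a match, -|sigma difference| on a mismatch."""
    if s == u:
        return SIGMA[s]
    return -abs(SIGMA[s] - SIGMA[u])


def substitution_matrix():
    return {(s, u): score(s, u) for s in STATES for u in STATES}


def follows_rules(matrix):
    """Check rule 1 (matches ascending) and rule 2 (mismatches descending)."""
    diag = [matrix[(s, s)] for s in STATES]
    rule1 = all(a < b for a, b in zip(diag, diag[1:]))
    from_e = [matrix[("E", u)] for u in "GMB"]
    rule2 = all(a > b for a, b in zip(from_e, from_e[1:]))
    return rule1 and rule2


if __name__ == "__main__":
    print(follows_rules(substitution_matrix()))
```

Any other matrix, including the one defined by Equation (16), can be passed to `follows_rules` in the same dictionary form.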

1. The operation data of the smart distribution grid are obtained periodically in real time and transformed by the BP neural network into a state transition time sequence, namely the gene to be matched. Its length gradually increases with the passage of time.

2. The gene to be matched and each gene in the fault gene table are matched periodically by the improved Smith-Waterman algorithm. The matching process is as follows:
a. Let the gene to be matched be $S = s_1 s_2 \ldots s_{l_i} \ldots s_{l_S}$ and the gene in the fault gene table be $U = u_1 u_2 \ldots u_{l_j} \ldots u_{l_U}$, with lengths $l_S$ and $l_U$, respectively.
b. Determine the substitution matrix and the gap penalty scheme.
c. Construct a scoring matrix $D$ and initialize its first row and first column. The size of the scoring matrix is $(l_S + 1) \times (l_U + 1)$.
d. Fill the scoring matrix using Equations (17) and (18), in which $1 \le l_i \le l_S$, $1 \le l_j \le l_U$, $\mathrm{Score}(s_{l_i}, u_{l_j})$ can be calculated by Equation (16), and $\mathrm{Score}(s_{l_i}, 0)$ and $\mathrm{Score}(0, u_{l_j})$ are the gap penalties when $u_{l_j}$ or $s_{l_i}$ is a dash.
e. Find $l_i^*$ and $l_j^*$ such that $D(l_i^*, l_j^*)$ is the highest score in the scoring matrix $D$.
f. Repeat the alignment until the gene to be matched has been aligned with all genes in the fault gene table and all highest scores are obtained.

3. Find the maximum score among all the highest scores and compare it to the given threshold. If the maximum score exceeds the threshold, an early warning is issued for the related fault. The block diagram of fault early warning for the smart distribution grid realized by the improved Smith-Waterman algorithm is shown in Figure 4.
In the process of gene sequence alignment, setting an appropriate early warning threshold is critical. In this paper, the early warning threshold is set as a percentage of the full score of each fault gene in the fault gene table. Therefore, the match score required for early warning differs with the length of the gene in the fault gene table, which gives better adaptability. In the following simulation, the most suitable threshold, namely the one that gives the proposed model the best accuracy rate of fault early warning, is chosen from different early warning thresholds.
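The matching procedure and the percentage threshold can be sketched together. The recurrence used here is the standard Smith-Waterman local alignment with a linear gap penalty, assumed to correspond to Equations (17) and (18). The scoring function, the gap penalty of -1, the threshold ratio of 0.8 and the example genes are hypothetical. Taking the threshold as a fraction of the full score of the best-matching fault gene is one reading of the threshold description above.

```python
SIGMA = {"E": 1, "G": 2, "M": 3, "B": 4}  # hypothetical importance degrees
GAP = -1                                   # hypothetical linear gap penalty


def score(s, u):
    """Assumed substitution score (not Equation (16) itself)."""
    return SIGMA[s] if s == u else -abs(SIGMA[s] - SIGMA[u])


def highest_score(gene, fault_gene, gap=GAP):
    """Fill the (l_S + 1) x (l_U + 1) scoring matrix and return its maximum."""
    rows, cols = len(gene) + 1, len(fault_gene) + 1
    d = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            d[i][j] = max(
                0,
                d[i - 1][j - 1] + score(gene[i - 1], fault_gene[j - 1]),
                d[i - 1][j] + gap,   # gap in the fault gene
                d[i][j - 1] + gap,   # gap in the gene to be matched
            )
            best = max(best, d[i][j])
    return best


def full_score(fault_gene):
    """Score of a fault gene aligned perfectly with itself."""
    return sum(score(x, x) for x in fault_gene)


def early_warning(gene, fault_table, ratio=0.8):
    """Return (fault, score) if the best match exceeds its threshold, else None."""
    best_fault, best = None, 0
    for fault, fault_gene in fault_table.items():
        s = highest_score(gene, fault_gene)
        if s > best:
            best_fault, best = fault, s
    if best_fault is not None and best > ratio * full_score(fault_table[best_fault]):
        return best_fault, best
    return None


if __name__ == "__main__":
    table = {"line": "GMMBB", "bus": "EGGMB"}  # illustrative fault gene table
    print(early_warning("EEGGMMBB", table))
    print(early_warning("EEEE", table))
```

Only the highest score is needed for the warning decision, so the traceback step of the algorithm is omitted in this sketch.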

Simulation and Analysis
PSAT (Power System Analysis Toolbox), which is based on MATLAB, is adopted to run time domain simulation tests of the proposed model. The IEEE-14 bus system is used as the simulation object, and random disturbances are introduced during the simulation. The disturbances are randomly introduced into one of four components of the IEEE-14 bus system: the constant power (PQ) load, the automatic voltage regulator (AVR), the transmission line and the synchronous machine.

Simulation Parameters
The simulation parameters comprise those of the proposed model and those of the IEEE-14 bus system.
1. Simulation parameters of the proposed model are shown in Tables 3-6.
2. Simulation parameters of the IEEE-14 bus system are shown in Figure 5 and Table 7.

Procedure of Simulation
During the simulation, random disturbances are introduced into each model of the IEEE-14 bus system so as to simulate the actual operating environment of the distribution grid and shorten the time until faults appear. The simulation process is divided into two steps: the simulation of fault gene table construction, and the simulation of fault early warning of the smart distribution grid based on the obtained fault gene table.

Simulation of Fault Gene Table Construction
Firstly, PSAT based on MATLAB is adopted to run a time domain simulation on the IEEE-14 bus system. The simulation time is 20 s. Two types of operation data are obtained for each bus. One is the normal operation data (voltage, active power and reactive power) of the distribution grid without any disturbance, shown in Figure 6. The other is the operation data with random disturbances introduced every second during the simulation, shown in Figure 7.
In Figure 6a-c, the three parameters of each bus change very regularly; that is, without any disturbance the state of each bus fluctuates regularly. In Figure 7a-c, the voltage and reactive power of each bus fluctuate strongly, while the active power of each bus fluctuates weakly.
The simulation data of the three parameters of each bus up to the first fault are exported from PSAT, giving 166 groups each of voltage, active power and reactive power. The comprehensive operational state goal of each bus can be calculated by Equations (1)-(4) from the simulation data. Then, k-means clustering is adopted to cluster the state goals of the buses into four classes, labeled E, G, M and B according to the magnitude of the average state goal $G_{avg}$ calculated by Equation (5). The clustered data form a high-dimensional dataset because the number of buses is 14. A dimensionality reduction algorithm, t-Distributed Stochastic Neighbor Embedding (t-SNE) [30,31], is adopted to map each high-dimensional data point to a two-dimensional point, which is then visualized in the scatter plot shown in Figure 8.
A 10-fold cross-validation is performed on the obtained 166 groups of data to get the relation between the average accuracy rate of the BP neural network evaluation model and the number of neurons in the hidden layer. With the number of neurons in the hidden layer ranging from 4 to 20, the relation between them is shown in Figure 9. In Figure 9, the BP neural network evaluation model reaches its highest average accuracy rate, close to 95%, when the number of neurons in the hidden layer is 12.
After the BP neural network state evaluation model is trained with the number of neurons in the hidden layer set to 12, the goal of each bus in each time domain simulation with random disturbances is first transformed into a data sequence with a sampling period of 0.5 s. This data sequence is then input into the trained BP neural network evaluation model to obtain a state transition time sequence and its related fault. The flow diagram of the fault gene table construction in each time domain simulation is shown in Figure 10.
According to Figure 10, a large number of time domain simulations with random disturbances are run, and the relationship between the fault and the state transition time sequence obtained in each time domain simulation is used to construct the fault gene table. The faults in the fault gene table are mainly composed of the four component faults shown in Table 8.

Table 8. Component faults in the fault gene table.
Component fault | Bus | Generator | Line | Breaker
Fault number | 60 | 50 | 100 | 40

Three typical relationships between state transition time sequences and related component faults are given in Figure 11.
In Figure 11, it can be concluded that the smart distribution grid has been in a lower operational state before the occurrence of a fault.
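The clustering and labeling step applied to the bus state goals before training can be sketched in plain Python. The sketch assumes that the average state goal of Equation (5) is the mean of the bus goals and that the cluster with the highest average goal is labeled E and the lowest B. The deterministic initialization and the function names are choices made for illustration, not details from the paper.

```python
def mean(xs):
    return sum(xs) / len(xs)


def kmeans(samples, k=4, iters=100):
    """Plain k-means on lists of floats; returns (assignments, centroids)."""
    # Deterministic start: pick samples spread evenly over the sorted means.
    ordered = sorted(samples, key=mean)
    centroids = [list(ordered[(2 * c + 1) * len(ordered) // (2 * k)])
                 for c in range(k)]
    assign = None
    for _ in range(iters):
        new_assign = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
            for x in samples
        ]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for c in range(k):
            members = [x for x, a in zip(samples, assign) if a == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return assign, centroids


def label_clusters(assign, centroids):
    """Tag clusters E, G, M, B from the highest to the lowest average goal."""
    order = sorted(range(len(centroids)),
                   key=lambda c: mean(centroids[c]), reverse=True)
    tag = {c: t for c, t in zip(order, "EGMB")}
    return [tag[a] for a in assign]


if __name__ == "__main__":
    # Twelve synthetic three-bus samples in four well-separated groups.
    data = [[base + d] * 3
            for base in (0.9, 0.65, 0.4, 0.1)
            for d in (-0.02, 0.0, 0.02)]
    assign, centroids = kmeans(data)
    print(label_clusters(assign, centroids))
```

For the 14-bus data described above, each sample would be a list of 14 bus goals; the three-bus samples here only keep the example short.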

Simulation of Fault Early Warning
PSAT based on MATLAB is adopted to run time domain simulations with random disturbances on the IEEE-14 bus system, with a simulation time of 20 s. During the time domain simulation, a state transition time sequence is obtained periodically as a gene and matched against the genes in the fault gene table by the improved Smith-Waterman algorithm. If the match score exceeds the given threshold, an early warning is issued for the related fault. The flow diagram of fault early warning is shown in Figure 12. According to Equation (16) and Table 6, the substitution matrix for gene sequence alignment is shown in Table 9.

If a fault occurs during a time domain simulation and the proposed model fails to warn of it, or the gene related to the fault is of a new type, the gene must be stored in the fault gene table to improve the accuracy rate of fault early warning in subsequent simulations. The number of correct fault early warnings given by the proposed model and the total number of faults that occur are recorded, and the accuracy rate of fault early warning is then calculated by Equation (20).
where T is the number of correct fault early warnings, C is the total number of faults that occur, and ρ is the accuracy rate of fault early warning.
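Read together with the definitions of T, C and ρ, the accuracy rate is presumably a simple ratio. The form below, including the percentage scaling, is an assumption consistent with those definitions rather than a quotation of Equation (20):

```latex
\rho = \frac{T}{C} \times 100\%
```

For instance, under this assumed form, 45 correct early warnings out of 50 faults would give an accuracy rate of 90%.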
If the threshold and the weights of the three parameters of each bus are different, the accuracy rate of fault early warning is different. To find the relation between them, four sets of different weights are chosen and the threshold ranges from 0.6 to 0.9. The relations between the average accuracy rate of fault early warning and the threshold under different weights are obtained from a large number of simulations, as shown in Figure 13.

If the threshold and weights of three parameters in each bus are different, the accuracy rate of fault early warning is different.In order to know the relation between them, four sets of different weights are chosen and threshold ranges from 0.6 to 0.9.The relations between the average accuracy rate of fault early warning and threshold in different weights are obtained using plenty of simulations as shown in Figure 13.In Figure 13, the following can be concluded: 1.In terms of weights: the accuracy rate of fault early warning is higher when the weights of voltage and reactive power of each bus is bigger.The threshold has a trend of decreasing at the maximum accuracy rate of fault early warning with the decrease of voltage weight.2. In terms of threshold: when the threshold is too small, the probability of error is higher due to the interference of the other similar fault genes.While the threshold is too large, the probability of that fault occurring before the match score reaches the threshold is higher.
Therefore, when selecting the weights of the bus parameters, it is better to choose slightly higher weights for the voltage and reactive power of each bus and a slightly lower weight for the active power. The threshold should be neither too large nor too small. In Figure 13, when λ_V = 0.4, λ_P = 0.2, λ_Q = 0.4 and the threshold is 0.8, the proposed model achieves its highest average accuracy rate of fault early warning, 94%.
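The bookkeeping behind Equation (20) and the choice of threshold can be sketched as follows. The function names and the counts per threshold are hypothetical; the counts are placeholders invented for illustration and are not the simulation results of Figure 13. The accuracy rate is returned as a fraction rather than a percentage.

```python
from typing import Dict, Tuple


def accuracy_rate(correct_warnings: int, total_faults: int) -> float:
    """Equation (20) as a fraction: rho = T / C."""
    if total_faults <= 0:
        raise ValueError("total_faults must be positive")
    if not 0 <= correct_warnings <= total_faults:
        raise ValueError("correct_warnings must lie between 0 and total_faults")
    return correct_warnings / total_faults


def best_threshold(counts: Dict[float, Tuple[int, int]]) -> Tuple[float, float]:
    """Pick the threshold whose recorded (T, C) pair gives the highest accuracy rate."""
    rates = {th: accuracy_rate(t, c) for th, (t, c) in counts.items()}
    th = max(rates, key=rates.get)
    return th, rates[th]


# Placeholder (T, C) counts per threshold, invented purely for illustration.
placeholder_counts = {0.6: (6, 10), 0.7: (7, 10), 0.8: (9, 10), 0.9: (8, 10)}
print(best_threshold(placeholder_counts))
```

In practice one such table of counts would be recorded for each set of weights, so that the comparison across weights and thresholds rests on the same set of simulated faults.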
As the number of simulations increases, the fault genes for which the model failed to give an early warning are added to the fault gene table, which can raise the accuracy rate of later early warnings. To examine the relation between the accuracy rate and the number of simulations, the current best weights and threshold are chosen (λ_V = 0.4, λ_P = 0.2, λ_Q = 0.4, threshold = 0.8) and the number of simulations is varied from 30 to 150. The resulting accuracy rate is shown in Figure 14.
In Figure 14, the trend of the accuracy rate can be divided into two parts. In the first part, the accuracy rate increases with the number of simulations because the fault genes that failed to trigger an early warning are added to the fault gene table. In the second part, the accuracy rate decreases with the number of simulations because the number of fault genes in the fault gene table becomes too large, which causes redundancy and introduces disturbances. The accuracy rate of fault early warning therefore tends to converge toward one in the first part. When the number of simulations reaches 120, the accuracy rate of fault early warning is close to 97%.
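The table update that drives the first part of this trend can be sketched as below. The function name, the dictionary layout (gene mapped to fault label) and the guard against storing an identical gene twice are assumptions for illustration; the paper does not specify how redundant genes are handled.

```python
from typing import Dict, Optional


def update_gene_table(gene_table: Dict[str, str], observed_gene: str,
                      actual_fault: Optional[str], warned_fault: Optional[str]) -> bool:
    """Store the gene of a fault that was missed or wrongly warned. Returns True if a gene was added."""
    if actual_fault is None:
        # No fault occurred in this simulation, so there is nothing to store.
        return False
    if warned_fault == actual_fault:
        # The early warning was correct, so the table already covers this gene.
        return False
    if observed_gene in gene_table:
        # Assumed guard: do not store a verbatim duplicate of an existing gene.
        return False
    gene_table[observed_gene] = actual_fault
    return True


# Hypothetical usage: the model gave no warning, so the observed gene is stored.
table: Dict[str, str] = {}
print(update_gene_table(table, "EEGGEGMMEGBB", "fault B", None))
```

A guard of this kind only removes exact duplicates. The redundancy reported in the second part of Figure 14 involves similar, not identical, genes and would need a separate refinement step.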
In the field of fault early warning for smart distribution grids, differences in data source access and in the data required by different models make it hard to compare the proposed model quantitatively with existing fault early warning models. Therefore, the models are compared qualitatively, as shown in Table 10.

Conclusions
This paper proposes a fault early warning model that combines a BP neural network with the Smith-Waterman gene sequence alignment algorithm and exploits the fault features of smart distribution grids, offering a new approach to fault early warning for smart distribution grids. In practice, the historical fault data, including the voltage, active power and reactive power of each bus, can be transformed into a fault gene table by the BP neural network, and an improved Smith-Waterman algorithm is then adopted to match the current state transition time sequence (gene) with the genes in the fault gene table. If the match score exceeds a given threshold, an early warning is issued for the related fault. The proposed model is versatile and adaptable because a different fault gene table can be constructed for a smart distribution grid of a different scale or with a more complicated topology. PSAT based on MATLAB is used to run time domain simulations of the proposed model on the IEEE-14 bus system. The simulation results show that the proposed model achieves fault early warning for smart distribution grids efficiently, with a high accuracy rate that tends to converge toward one. It provides operational monitoring and maintenance guidance for the managers of smart distribution grids and improves the scientific basis and predictability of operational decision-making for power systems.
The proposed model has some limitations. It can only give early warning of faults that develop as a trend with a cumulative effect, which account for most faults in smart distribution grids. Transient faults caused by improper operation or extreme weather are hard for the proposed model to address.
Fault gene tables for smart distribution grids of different scales differ, but they also share some generality. Further research could therefore use an association rule algorithm such as Apriori or FP-Growth to refine the fault gene tables of grids of different scales so that the tables generalize better.

Figure 1. Model of three-layer back propagation (BP) neural network.

Figure 3. Fault gene table of smart distribution grid.

Figure 4. The block diagram of fault early warning for smart distribution grid.

Figure 7. Three parameters of each bus with random disturbances: (a) Voltage; (b) Active power; (c) Reactive power.

Figure 8. The 166 clustered groups of 14-dimensional data, reduced to two-dimensional points.

Figure 9. Relation between the average accuracy rate of BP neural network evaluation model and the number of neurons in a hidden layer.

Figure 10. Flow chart of fault gene table construction for smart distribution grid in the simulation test.

Figure 11. Typical relationship between state transition time sequences and component faults in the simulation test.

Figure 12. Flow chart of fault early warning for smart distribution grid in the simulation test.

Figure 13. Relations between the average accuracy rate of fault early warning and threshold in different weights.

Figure 14. Relations between the accuracy rate of fault early warning and times of simulation.

Table 1. State division of smart distribution grid.

Table 2. Division rule of output state in BP neural network.
Output: 0 ≤ y < 0.25; 0.25 ≤ y < 0.5; 0.5 ≤ y < 0.75; 0.75 ≤ y ≤ 1

1, 2, 3, 4 denote the different classes, m is the number of buses, ϕ_k represents the number of groups in the kth class, and G_b(i, z) represents the state goal of the smart distribution grid in the mth bus of the zth group.
2. One part of the clustered data is regarded as the training sample matrix, shown as Equation (6).

The construction flow chart of the fault gene table for the smart distribution grid is shown in Figure 2. The constructed fault gene table is shown in Figure 3.

Figure 2. Fault gene table construction of smart distribution grid. [Flow chart; recoverable steps: construct the evaluation sample matrix; obtain the mapping relationships between state transition time sequences and faults (fault gene table).]

Fault gene table entries recoverable from Figure 3: 1st state transition time sequence (EEEGEEGMEGMB...); Ith state transition time sequence (EEEEEEEGEEGMEGMB...); (I+1)th state transition time sequence (EEGGEGMMEGBB...); Jth state transition time sequence (EEEEEEGGEGMMEGBB...).

Table 3. Weights of bus operational state goal.

Table 4. Simulation parameters of BP neural network.

Table 5. Simulation parameters of Smith-Waterman algorithm.

Table 6. Importance degree of four operational states in smart distribution grid.

Table 8. Component faults in fault gene table.

Table 9. Substitution matrix for gene sequence alignment.