A Review of Fault Diagnosing Methods in Power Transmission Systems

Abstract: Transient stability is important in power systems. Disturbances such as faults need to be isolated to restore transient stability. A comprehensive review of fault diagnosing methods in the power transmission system is presented in this paper. Typically, voltage and current samples are used for analysis. Three tasks, namely fault detection, classification, and location, are presented separately to convey a more logical and comprehensive understanding of the concepts. Feature extraction and transformations with dimensionality reduction methods are discussed. Fault classification and location techniques largely use artificial intelligence (AI) and signal processing methods. After the discussion of overall methods and concepts, advancements and future aspects are discussed. Generalized strengths and weaknesses of different AI and machine learning-based algorithms are assessed. A comparison of different fault detection, classification, and location methods is also presented considering features, inputs, complexity, system used and results. This paper may serve as a guideline for researchers to understand different methods and techniques in this field.


Introduction
It is a challenging task for power system operators (PSO) to supply uninterrupted electric power to end-users. Although fault occurrence is beyond human control, it is essential to accurately detect, classify and locate faults. Fault detection, classification and location finding methods in power transmission systems have been extensively studied [1][2][3][4][5][6]. Efforts are under way to develop an intelligent protection system that is able to detect, classify and locate faults accurately.
Advancements in signal processing techniques, artificial intelligence (AI) and machine learning (ML) have aided researchers in adopting a more comprehensive and dedicated approach in studies associated with conventional fault protection strategies. Moreover, two established limitations of online fault detection mechanisms are being dealt with. The first limitation is the difficulty in obtaining the needed data. In order to gain information at different nodes/buses in the grids, intelligent electronic devices (IED) are installed [7,8]. In addition, the development of self-powered, non-intrusive sensors makes it possible to build sensor networks for smart online monitoring [9,10]. The knowledge obtained from the data related to various transmission grid conditions has enabled researchers to develop intelligent fault protection/diagnosis systems. The effects of diversified and complex transmission system topologies can be minimized by using the interspersed sensors for the collection of voltage and current signals. The second limitation is the lack of computational capability and communication. Synchronized global positioning system (GPS) sampling and high-speed broadband communications for IEDs in power grids are proposed in [8]. These technical advancements ensure quick response to faulty scenarios and the effective functioning of online monitoring mechanisms based on sensor networks. The availability of high-performance computing solutions enables the implementation of methods with higher computational complexity [7].
Short circuit (shunt) faults are more likely to appear in power systems (PS) than series faults, which are breaks in the path of current. Shunt faults result in catastrophes and leave hazardous effects on PS. Short circuit faults can be divided into symmetrical and asymmetrical faults; further classification is presented in Figure 1 for the three-phase system [11].
Appl. Sci. 2019, 9, x FOR PEER REVIEW
An in-depth review of the different techniques used for fault detection, classification, and location estimation is presented in this paper. This paper also presents a detailed comparison of various fault detection, classification and location methods based on the algorithm used, input, test system, features extracted, complexity level and results. Throughout this paper, complexity is rated as simple, medium or complex by considering the number of inputs and rules involved in algorithm development. The rest of the paper is organized as follows: feature extraction based on transformation, dimensionality reduction, and modal transformation is discussed in Section 2. Fault detection methods largely based on feature extraction techniques are presented in Section 3. Fault-type classification methods are presented in Section 4 while a comparison of fault-type classification techniques is presented in Section 5. Section 6 describes the future prospects of fault-type classification. Fault location finding approaches are reviewed in Section 7 and a comparison of fault location finding methods is outlined in Section 8. Future trends in fault location finding techniques are proposed in Section 9. Strengths and weaknesses of notable emerging computational intelligence methods are presented in Section 10. Finally, conclusions are drawn in Section 11.

Disclose the Valuable Information
Power transmission grid information includes voltage and current signals. However, it is difficult to apply a fixed set of rules and criteria to disclose the useful information contained in the sampled signals. Feature extraction approaches are therefore handy for disclosing the valuable information and reducing the influence of the variance within the system under study. Researchers may attain better awareness of fault type, classification, and location by using an appropriate feature extraction approach. Furthermore, dimensionally reduced data boost the performance of the algorithm employed within locators or classifiers, thus providing robust and precise results. Different feature extraction methods are discussed in the subsequent sections with applications [7].

Transformations
It is well established that the frequency characteristics of voltage and current profiles change dramatically on the occurrence of a fault. If the fault is identified and investigated accurately, it can help to protect the affected transmission line (TL) to a great extent [12]. A number of methods are available to analyze the frequency characteristics of time-domain signals, but the wavelet transform, Fourier transform and S-transform are the techniques most frequently employed in fault identification systems/protective relays.

Wavelet Transform (WT)
A comprehensive note on the wavelet transform (WT) is provided in [13]. WT is a widely used feature extraction approach in different fault diagnosis systems. In practice, the discrete wavelet transform (DWT) is used rather than the continuous WT (CWT) to obtain characteristics of voltage and current signals in multiple frequency bands. In DWT implementations, it is therefore important to select the mother wavelet (MW) and decomposition level before actually creating the features. Gawali et al. provided a detailed comparison of various mother wavelets for fault detection and classification [14] and recommended the Bior3.9, Db10, Meyer, Sym8, and Coif5 mother wavelets for fault detection. It is to be noted that different sampling rates were adopted, so the frequency bounds are more important than the decomposition level itself. Detail-level coefficients are selected as features in [15][16][17]. Kashyp et al. adopted a Meyer wavelet with frequency bands of 1-2 kHz [15] while the Db2 wavelet is selected in [16,17] with frequency bands of 4-8 kHz. Summations of absolute coefficients of detail levels are used to extract features in [18][19][20][21]. The coefficients in  Hz or 99-199 Hz frequency bands are assumed in [18][19][20] and three Daubechies wavelets, namely Db1, Db4, and Db8, are selected.
Coefficients can be used in another way by calculating the energies of the detail levels. The energy of the 3540-7680 Hz frequency band is chosen in [22] while the frequency band of 1.5625-3.125 kHz is used in [23]. Wavelet energy entropy (WEE), based on the wavelet energies, is introduced in [24] and used in [25]. Moreover, wavelet singular entropy (WSE), a combination of singular value decomposition and Shannon entropy, is introduced in [26] to produce the features. The literature discussed demonstrates the success of DWT methods for fault detection and classification with a variety of mother wavelets and coefficients adopted in both low and high-frequency detail levels.
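To make the two feature types above concrete (sums of absolute detail coefficients and detail-level energies), the following is a minimal sketch using the Haar (Db1) wavelet written out by hand. The sampling rate, the test signal, the injected step disturbance and all function names are illustrative assumptions, not the setup of any cited study.

```python
import math

def haar_dwt_step(signal):
    """One level of the orthonormal Haar (Db1) DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx, detail = [], []
    for even, odd in zip(signal[0::2], signal[1::2]):
        approx.append(s * (even + odd))
        detail.append(s * (even - odd))
    return approx, detail

def detail_features(signal, levels):
    """Per level: (sum of absolute detail coefficients, detail energy)."""
    features = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt_step(approx)
        abs_sum = sum(abs(d) for d in detail)
        energy = sum(d * d for d in detail)
        features.append((abs_sum, energy))
    return features

# Hypothetical test signal: 50 Hz sinusoid sampled at 3.2 kHz (64 samples per cycle).
# A step is injected from sample 65 onward to mimic a fault transient.
fs, f0, n = 3200, 50, 128
healthy = [math.sin(2 * math.pi * f0 * k / fs) for k in range(n)]
faulted = [x + (0.5 if k >= 65 else 0.0) for k, x in enumerate(healthy)]

# With fs = 3.2 kHz, the level-1 detail nominally covers 800-1600 Hz.
healthy_feats = detail_features(healthy, levels=3)
faulted_feats = detail_features(faulted, levels=3)
```

In this sketch the abrupt change raises the level-1 detail energy of the disturbed signal relative to the healthy one, which is the kind of contrast a detector or classifier would exploit.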

Fourier Transform (FT)
The Fourier transform (FT) is an extensively used mathematical tool for the frequency-domain analysis of signals. The discrete Fourier transform (DFT) is used where both frequency and time domain coefficients are discrete, and it is computed via the fast Fourier transform (FFT). S. Yu et al. adopted full-cycle DFT (FCDFT) and half-cycle DFT to remove harmonics and dc components and to assess the phasor elements [27]. Half-cycle DFT is also used to calculate fundamental and harmonic phasors employed for fault-type classification [28,29]. In [30], full-cycle FFT is used to find the fundamental components of voltage and current.
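As an illustration of full-cycle DFT phasor estimation, the sketch below extracts the fundamental phasor from one cycle of samples. The function name, the window length of 32 samples per cycle and the synthetic signal are assumptions for illustration; only a constant dc offset and an integer harmonic are considered here, not a decaying dc component.

```python
import cmath
import math

def full_cycle_dft_phasor(window, harmonic=1):
    """Peak-amplitude phasor of the given harmonic from one full cycle of samples."""
    n = len(window)
    acc = sum(x * cmath.exp(-2j * math.pi * harmonic * k / n)
              for k, x in enumerate(window))
    return 2.0 * acc / n

# Hypothetical one-cycle window: fundamental of amplitude 10 at 30 degrees,
# plus a constant dc offset and a third harmonic of amplitude 1.5.
N = 32
samples = [
    10.0 * math.cos(2 * math.pi * k / N + math.pi / 6)
    + 2.0
    + 1.5 * math.cos(3 * 2 * math.pi * k / N)
    for k in range(N)
]

fundamental = full_cycle_dft_phasor(samples, harmonic=1)
third = full_cycle_dft_phasor(samples, harmonic=3)
```

Because the window spans exactly one cycle, the dc offset and the third harmonic are orthogonal to the fundamental and do not disturb its estimated magnitude or angle.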

S-Transform (ST)
An S-transform (ST), derived from CWT, offers a time-frequency representation with frequency-dependent resolution based on a scalable and moving localized Gaussian window [31]. Local spectral characteristics can be obtained effectively from the S-transform, which is useful in analyzing transients [32]. ST calculations are stored in the S-matrix, ST contours are plotted for two-dimensional visualization, and then features are extracted. ST is used to overcome the shortcomings of the DWT, such as sensitivity to noise and the inability to exactly reveal the characteristics of harmonics [33,34]. In [35], S-transform contour energy and standard deviation are used to select faulty lines and sections, respectively. S. Samantaray et al. used ST to find amplitude, impedance and fault points [36].

Dimensionality Reduction
Principal component analysis (PCA) maps data from a high-dimensional space to a low-dimensional subspace in which the variance of the data is captured in the best possible way, thereby reducing the dimensionality of the data [37]. Thukaram et al. considered the PCA method to extract features from voltage and current signals [38]. The PCA approach is applied to wavelet coefficients, and the resulting principal components are used for fault-type classification and location finding. A feature extraction method based on random dimensionality reduction projection (RDRP) is proposed in [39], which reduces the dimensionality of the original vector using a Gaussian random matrix, making RDRP independent of training data. Further, little memory is required as feature extraction is accomplished with matrix multiplication [39].
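The idea of reducing dimensionality with a Gaussian random matrix can be sketched as a generic random projection. This is not claimed to reproduce the RDRP scheme of [39]; the function names, the scaling factor, the seed and the dimensions are assumptions chosen for illustration.

```python
import random

def gaussian_projection_matrix(out_dim, in_dim, seed=0):
    """Random matrix with i.i.d. Gaussian entries, scaled by 1/sqrt(out_dim)."""
    rng = random.Random(seed)
    scale = 1.0 / (out_dim ** 0.5)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]

def project(matrix, vector):
    """Feature extraction is a single matrix-vector multiplication."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

# Hypothetical 16-dimensional feature vector reduced to 4 dimensions.
matrix = gaussian_projection_matrix(4, 16, seed=1)
original = [float(i) for i in range(16)]
reduced = project(matrix, original)
```

The matrix is drawn without looking at any data, which is why such a projection needs no training set, in contrast to PCA.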

Modal Transformation
Clarke transformation (CT), a modal transformation technique, is used in [40][41][42] to decouple and transform three-phase quantities a, b and c into α, β and 0 components. The fault type is then characterized by relating the modal components to the phase quantities [42]. Calculations for fault detection and location indices were carried out in [40,41]. Researchers have also used slightly modified forms of CT, namely the Clarke-Concordia transformation [43,44] and the Karrenbauer transformation [45], to facilitate the extraction of fault characteristics.
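For reference, a minimal sketch of the Clarke transformation is given below in its amplitude-invariant form; the cited works may use a different scaling convention, and the function name and sample values are assumptions for illustration.

```python
import math

def clarke(a, b, c):
    """Amplitude-invariant Clarke transformation of phase quantities a, b, c."""
    alpha = (2.0 * a - b - c) / 3.0
    beta = (b - c) / math.sqrt(3.0)
    zero = (a + b + c) / 3.0
    return alpha, beta, zero

# Balanced three-phase sample at an arbitrary angle: the zero component vanishes.
theta = 0.4
balanced = clarke(
    math.cos(theta),
    math.cos(theta - 2 * math.pi / 3),
    math.cos(theta + 2 * math.pi / 3),
)

# Hypothetical sample with current only in phase a: the zero component is non-zero.
unbalanced = clarke(5.0, 0.0, 0.0)
```

For the balanced set the α and β components equal cos θ and sin θ, while a non-zero 0 component, as in the second sample, indicates that the three phase quantities no longer sum to zero.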

Fault Detection (FD)
Typically, fault detection is done prior to classification and location estimation. Fault detection is performed based on the extracted features. Whenever an independent technique is used for fault detection, the classifier and the locator are activated only after a fault is definitely detected. Furthermore, there is no need to devise fault detection algorithms when classifiers and locators are able to distinguish between healthy and abnormal conditions. Nevertheless, some fault detection methods are discussed here.
Martinez et al. used negative sequence components for fault detection [46]. To minimize the chances of false fault detection, the time derivative of the negative sequence components is convolved with a triangular wave to obtain a joint fault indicator (JFI). This JFI-based fault detection is robust to amplitude variation and frequency deviation.
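The negative sequence component referred to above can be obtained from three phasors with the standard symmetrical-component (Fortescue) transformation. The sketch below shows only that step together with a simple ratio check; it does not reproduce the JFI of [46], and the threshold value and function names are hypothetical.

```python
import cmath
import math

ROT = cmath.exp(2j * math.pi / 3)  # 120-degree rotation operator

def sequence_components(va, vb, vc):
    """Return (zero, positive, negative) sequence phasors of a three-phase set."""
    zero = (va + vb + vc) / 3
    positive = (va + ROT * vb + ROT * ROT * vc) / 3
    negative = (va + ROT * ROT * vb + ROT * vc) / 3
    return zero, positive, negative

def negative_sequence_alarm(va, vb, vc, ratio_threshold=0.1):
    """Hypothetical check: flag when |negative| exceeds a fraction of |positive|."""
    _, positive, negative = sequence_components(va, vb, vc)
    return abs(negative) > ratio_threshold * abs(positive)

# Balanced a-b-c set: only the positive sequence is present.
balanced = sequence_components(1, ROT * ROT, ROT)

# Quantity present in phase a only: all three sequence components are equal.
single_phase = sequence_components(1, 0, 0)
```

A balanced set yields a negligible negative sequence component, whereas any asymmetry between phases produces one, which is why it is a natural indicator for asymmetrical faults.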
A wavelet-based method is proposed for real-time fault recognition in TLs [47]. This technique is not affected by the choice of mother wavelet and has no time delay in fault detection for both long and compact wavelets.
Numerous studies have been conducted on the detection of high impedance faults (HIF) [48][49][50], as conventional algorithms may fail to detect HIF. Wai et al. extracted high-frequency data by using DWT with a quadratic spline mother wavelet for HIF detection [48]. Wavelet coefficients from DWT and converted scale coefficients are used to detect HIF in [49]. In [50], the mean of DWT coefficients is obtained via PCA to reduce the dimensionality of the features at different frequency bands.
Normally, the fault detection time does not significantly affect the overall performance of the protection system, which includes fault detection, classification and location mechanisms. Typically, fault detection is achieved in 2-10 ms as compared to 30 ms for fault-type classification. Therefore, more details on fault-type classification and location techniques are presented in this paper. Nevertheless, a comparison of different fault detection methods is given in Table 1 considering the algorithm used in those methods, complexity level, employed system, inputs, features, and results. Here complexity is rated as simple, medium or complex by considering the number of inputs and rules involved in algorithm development. Further, the complexity level is selected based on feature training and testing time, accuracy, convergence, and variance along with the data required.

Fault-Types Classification (FC)
Classification of fault type plays a vital role in the protection of transmission lines; thus scholars have shown increased interest in developing robust, novel and precise fault-type classification approaches. Most of the available classification methods are based on classifiers and statistical learning theories [51] while some are based on logical methods [52]. Development in fault classification techniques is highly dependent on improvements in machine learning and pattern recognition methods. A detailed review of fault-type classification techniques is presented in the subsequent sections.

Artificial Neural Network (ANN) Based FC
Artificial neural networks (ANNs) belong to the family of learning algorithms and non-linear statistical models that aim to emulate the behavior of linked neurons within biological neural systems. Various ANN algorithms have been developed for various applications, including fault-type classification in TLs.

Feedforward Neural Network (FNN)
The feedforward neural network (FNN) is the simplest configuration among the ANN models, characterized as a single or multi-layer perceptron. Typically, an FNN has an input layer, a hidden layer and an output layer, as shown in Figure 2 [52]. The neurons in adjacent layers are fully connected, and the weights/parameters determine the output of the network. Put simply, learning is accomplished by fine-tuning the parameters/weights of the ANN in such a way that the output fulfills certain conditions [59]. In 1986, Rumelhart et al. proposed the back-propagation (BP) method [60]. Because the FNN training process is mainly based on the BP algorithm, the term BPNN is also used. Applications of FNN with BP to power systems are found in [61,62]. Bo et al. decomposed the voltage profile into six frequency bands and determined the energy of each band, creating 18 features for the input layer [63]. This 18-12-3 FNN is able to select a faulty line. Hagh et al. used separate FNN units to detect various types of fault, so that each neural network (NN) has to learn fewer patterns [30].
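The forward pass of a fully connected 18-12-3 network can be sketched as follows. The sigmoid activation, the random untrained weights, the seed and the random input features are assumptions for illustration only; BP training is omitted, so the outputs carry no diagnostic meaning.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases):
    """Fully connected layer: weighted sum per neuron followed by a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def init_layer(n_out, n_in, rng):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

rng = random.Random(42)
hidden_w, hidden_b = init_layer(12, 18, rng)   # 18 inputs -> 12 hidden neurons
output_w, output_b = init_layer(3, 12, rng)    # 12 hidden -> 3 output neurons

def forward(features):
    hidden = dense(features, hidden_w, hidden_b)
    return dense(hidden, output_w, output_b)

# Placeholder input: 18 features (e.g., band energies), here random numbers.
features = [rng.random() for _ in range(18)]
outputs = forward(features)
```

Training would adjust `hidden_w`, `hidden_b`, `output_w` and `output_b` by propagating the output error backwards through these two layers.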

Radial Basis Function Network (RBFN)
A radial basis function network (RBFN) is a type of FNN that uses radial basis activation functions for hidden nodes, as shown in Figure 3 [64]. Typically, the activation functions are Gaussian functions and an RBFN has one hidden layer [65]. RBFNs are used to construct fault classifiers in view of the shortcomings of FNNs with sigmoid activation functions [66]. Mahanty et al. trained two RBFNs separately to classify faults with and without earth involvement, respectively [67].
[70]. DWT is used to extract features for PNN in [71] with nine pattern nodes. In [72], the authors compared PNN, FNN, and RBFN; PNN required less training time and provided better classification results.
[73]. In ChNN, polynomial functional expansion is used to map the original input into a higher-dimensional space; the hidden layer is replaced, leaving only one layer in the network, as shown in Figure 5 [74]. Thus only one parameter needs to be tuned in ChNN because of its single-layer structure, making it easier to implement than other ANN models while giving efficient fault classification results.
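The Gaussian hidden layer of an RBFN, as described earlier in this subsection, can be sketched as follows. The centers, widths and output weights are hypothetical placeholders; in practice they would be obtained by training.

```python
import math

def gaussian_rbf(x, center, width):
    """Gaussian radial basis activation: depends only on the distance to the center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * width ** 2))

def rbfn_output(x, centers, widths, out_weights, bias=0.0):
    """Single hidden layer of Gaussian units followed by a linear output node."""
    hidden = [gaussian_rbf(x, c, s) for c, s in zip(centers, widths)]
    return sum(w * h for w, h in zip(out_weights, hidden)) + bias

# Hypothetical two-unit network: one center per class in a 2-D feature space.
centers = [[0.0, 0.0], [10.0, 10.0]]
widths = [1.0, 1.0]
out_weights = [1.0, -1.0]

near_first = rbfn_output([0.0, 0.0], centers, widths, out_weights)
near_second = rbfn_output([10.0, 10.0], centers, widths, out_weights)
```

Each hidden unit responds strongly only near its own center, so the sign of the output indicates which center the input is closer to in this toy configuration.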

FC Based on Fuzzy Inference Systems (FIS)
Fuzzy logic is applied by performing an inference operation based on fuzzy if-then rules in fuzzy inference systems (FIS). It differs from Boolean logic in that the degree of truth is represented by a value in [0,1], where 0 represents absolute falseness and 1 represents absolute truth. A basic FIS is divided into three stages, namely fuzzification, inference and defuzzification, as shown in Figure 6 [75]. In [76], samples of three-phase currents for the post-fault conditions are analyzed to evaluate the characteristic features for the fuzzy rules. The features are computed as differences of normalized ratios of the maxima of the phase currents. Samantaray developed initial fuzzy rules from an already trained decision tree (DT), and the rules were then simplified by using a genetic algorithm (GA) and similarity measures [77]. The E-algorithm, a heuristic for discovering the optimal fuzzy rules, is proposed in [78] to distinguish faults caused by lightning, animals, and trees from imbalanced data. Wang et al. proposed a fuzzy-neural approach, a combination of fuzzy and neural logic, for TL fault classification [79]; negative, zero and positive sequence current components are the inputs [79]. Adaptive network-based FIS (ANFIS) is used for fault classification in [80,81]. Hassan confirmed the effectiveness, precision, and robustness of ANFIS by adding white noise to the test data [82].
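The three FIS stages can be illustrated with a toy system of two normalised input features. The triangular membership functions, the four rules, their crisp consequents and the weighted-average defuzzification are all assumptions chosen for brevity and are not taken from the cited works.

```python
def triangular(x, left, peak, right):
    """Triangular membership function with support (left, right)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def fuzzify(x):
    """Stage 1: membership of a feature in [0, 1] to the 'low' and 'high' sets."""
    return {
        "low": triangular(x, -1.0, 0.0, 1.0),
        "high": triangular(x, 0.0, 1.0, 2.0),
    }

def infer(feature_a, feature_b):
    """Stages 2 and 3: rule evaluation (min as fuzzy AND) and defuzzification."""
    a, b = fuzzify(feature_a), fuzzify(feature_b)
    # Each hypothetical rule: (firing strength, crisp consequent score).
    rules = [
        (min(a["high"], b["high"]), 1.0),
        (min(a["high"], b["low"]), 0.6),
        (min(a["low"], b["high"]), 0.6),
        (min(a["low"], b["low"]), 0.0),
    ]
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    # Weighted average of consequents as a simple defuzzification step.
    return sum(strength * score for strength, score in rules) / total

score_both_high = infer(1.0, 1.0)
score_mixed = infer(0.5, 0.5)
```

Inputs that lie between the two sets fire several rules at once, and the defuzzified score blends their consequents instead of switching abruptly as a Boolean rule would.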

FC Based on Decision Tree (DT) Technique
The term decision tree (DT) refers to a graph that is able to make decisions; its basics are detailed in [83,84]. Three sorts of nodes are involved in a DT, namely the root node, internal nodes, and leaf nodes. Decision making for classification starts from the root node, and the class label is represented by a leaf node, as shown in Figure 7 [85]. A suboptimal decision tree is obtained from training data via a greedy algorithm, e.g., C4.5, regression tree, ID3, and classification and regression tree (CART), with increased accuracy in reasonably reduced time [84]. A random forest (RF) comprising a finite number of DTs is used in [86] for fault classification in single and double circuit transmission lines. The decision-making mechanism can be performed accurately in less than a quarter-cycle via DT [28,29]. DWT coefficients are used as features for CART-DT and a performance comparison was made with FNN [20]. Both mechanisms obtained a high degree of accuracy, with CART-DT performing better.
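The root-to-leaf decision process can be sketched with a small hand-written tree. The feature names, thresholds and class labels below are hypothetical; a real tree would be induced from training data by a greedy algorithm such as CART.

```python
def classify(node, sample):
    """Walk from the root through internal nodes until a leaf label is reached."""
    while "label" not in node:
        value = sample[node["feature"]]
        node = node["left"] if value <= node["threshold"] else node["right"]
    return node["label"]

# Hypothetical tree for an already detected fault, using normalised
# zero- and negative-sequence magnitudes as features.
tree = {
    "feature": "zero_seq", "threshold": 0.1,          # root node
    "left": {
        "feature": "neg_seq", "threshold": 0.1,       # internal node
        "left": {"label": "three-phase"},             # leaf nodes
        "right": {"label": "line-to-line"},
    },
    "right": {"label": "ground fault"},
}

result = classify(tree, {"zero_seq": 0.02, "neg_seq": 0.4})
```

Classification costs one threshold comparison per level of the tree, which is consistent with the short decision times reported for DT-based schemes.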


FC Based on Support Vector Machine (SVM)
Cortes and Vapnik introduced the support vector machine (SVM) in 1995 [87]. A theoretical foundation can be found in [88]. The SVM structure is shown in Figure 8 [89]. SVM classifiers find the optimal hyperplane that maximizes the margin between two classes. SVM avoids over-fitting and does not fall into local optima due to its risk-minimizing ability, which makes it an attractive tool for fault classification in transmission lines. SVM is employed on series compensated TLs for fault classification in [90,91], where three SVMs were employed for the three phases and a separate SVM for ground. In [17], [92][93][94], features extracted by the DWT are fed as input to SVMs. SVM classifiers in [95,96] used features extracted from the S-transform. Shahid et al. selected a quarter sphere support vector machine (QSSVM) to identify and classify the faults [97]. QSSVM gives satisfactory fault detection and classification results through temporal-attribute QSSVM and attribute QSSVM, respectively.
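The margin-based decision rule can be illustrated with a linear soft-margin SVM trained by sub-gradient descent on the regularised hinge loss. This is a simplification: SVMs are normally trained by solving a quadratic program, often with kernels. The data set, learning rate, regularisation constant and function names are assumptions for illustration.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Full-batch sub-gradient descent on the L2-regularised hinge loss."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # sample lies inside the margin or is misclassified
                grad_w = [g - y * xi / n for g, xi in zip(grad_w, x)]
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical, linearly separable two-class feature set (labels +1 and -1).
samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0),
           (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
```

A per-phase arrangement such as the one described above would train one such binary classifier for each phase and one for ground, each answering whether its conductor is involved in the fault.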


FC Based on Logic Flow (LF)
Typically, a tree-like logic flow with multiple criteria is used if no AI or ML-based algorithms are devised. Kezunovic et al. compared the extracted features for ground and the three phases with pre-set thresholds [98]. Any value exceeding the pre-set threshold indicates a fault on the corresponding phase or ground. Comparisons are conducted between thresholds and feature values at each node within the logic flow. In [26,99], Shannon entropy and WT are used to yield features, and logic flows are implemented. Jiang et al. employed Clarke transformation to create fault detection indices for each phase, which are then compared with a threshold to complete classification [41]. Karrenbauer's transformation with wavelet transform is used in [45], and the modulus maxima of WT are employed in a logic flow to determine the type of fault.

A comparison of different fault-type classification algorithms is given in Table 2 considering the employed method within the algorithm, complexity level, input, test system, features, and results. Here complexity is rated as simple, medium or complex by considering the number of inputs and rules involved in algorithm development. Further, the complexity level is selected based on feature training and testing time, accuracy, convergence, and variance along with the data required.
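The threshold comparison at the heart of logic-flow classification, described earlier in this subsection, can be sketched as follows. The feature values, the thresholds and the naming of the result are hypothetical and serve only to show the mechanism.

```python
def classify_fault(features, thresholds):
    """Flag every phase (A, B, C) and ground (G) whose feature exceeds its threshold."""
    involved = [name for name in ("A", "B", "C", "G")
                if features[name] > thresholds[name]]
    if not involved:
        return "no fault"
    return "".join(involved)

# Hypothetical pre-set thresholds and per-conductor feature values.
thresholds = {"A": 0.5, "B": 0.5, "C": 0.5, "G": 0.5}
fault_type = classify_fault({"A": 0.9, "B": 0.1, "C": 0.1, "G": 0.8}, thresholds)
```

Such a scheme needs no training data, but its accuracy depends entirely on how well the thresholds are chosen for the system at hand.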


Future Trends in Fault-Type Classification
The transmission line fault classification studies discussed above mainly selected mature machine learning approaches such as FIS, ANN, SVM or DT. However, considerable advancements and new trends have been seen in the field of data mining and machine learning. Hinton et al. proposed an approach using restricted Boltzmann machine (RBM) learning to extract feature characteristics [112], which laid the groundwork for deep learning (DL). The structure of a deep learning model resembles a multi-layer FNN, with the difference that unsupervised feature learning from a large amount of unlabeled input data helps keep the model from overfitting and from falling into local optima. The fault classification capability of DL has increased recently [113], and its application to power systems is encouraging. Convolutional neural networks (CNNs) are recommended for multi-channel sequence recognition problems in [114], a promising idea for the fault classification task.

Fault Location Finding Methods
A comprehensive review of existing techniques for finding fault location (FL) is provided in [115][116][117]. Fundamentals and new progress in fault location methods, based on existing literature, are discussed in this paper. FL techniques can be categorized by the source of data: double-end, single-end, and wide-area. Wide-area methods are discussed in this paper because of their relevance to future smart grids. Similarly, series compensated and hybrid TLs are considered because their properties distinguish them from normal lines. Modern AI-based methods are discussed alongside because of their good performance in FL finding and their broad application prospects.

Wide-Area FL Approach
Conventional fault location techniques are not able to trace faults when either of the monitoring devices installed at the end terminals of a TL fails to record changes in the voltage and current profiles. Wide-area FL methods can be a possible solution [118] for this kind of scenario. In wide-area FL methods, a replica of each application/algorithm runs at different transmission substations, as shown in Figure 9 [119], to avoid overloading the available computation and communication resources of any particular station. Thus, a fault can be located even with fewer devices installed at the end terminals of transmission links. Optimization-based synchronized algorithms are proposed for fault location [120,121]. Traveling-wave (TW) methods are also applied for fault location finding with single-end and two-end data [122]. The linear least square (LLS) method is employed to locate the fault position. A non-iterative substitution algorithm based on synchronized voltages is proposed for fault location estimation in [123]. This method is based on the positive and negative sequence impedance matrices obtained from the network topology. A matching degree factor equal to zero in the positive sequence network represents the fault point location [124]. Thus, the matching degree is used to point out the faulty bus in the entire system. In [125], a hierarchical impedance-based routine is used to locate the faulted zone, line and point.
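As a hedged illustration of how a linear least square estimate can be combined with traveling-wave arrival times, the sketch below assumes a single uniform line, a known wave velocity `v`, synchronized detectors, and prior knowledge of which detectors lie on each side of the fault. These assumptions, and the function name `locate_fault_lls`, are introduced here for illustration and are not taken from [122]. With the fault at position d and inception time t0, a detector at position s on the left side records t = t0 + (d - s)/v and one on the right side records t = t0 + (s - d)/v, which is linear in d and c = v*t0.

```python
def locate_fault_lls(left, right, v):
    """Least-squares estimate of fault position and inception time.

    left, right: lists of (position_km, arrival_time_s) for detectors
                 on the left and right side of the fault.
    v: traveling-wave velocity in km/s (assumed known and uniform).
    Returns (d_km, t0_s).
    """
    # Each row is a*d + c = b with a = +1 (left) or -1 (right), c = v*t0.
    rows = [(1.0, v * t + s) for s, t in left]
    rows += [(-1.0, v * t - s) for s, t in right]
    n = len(rows)
    sa = sum(a for a, _ in rows)
    sb = sum(b for _, b in rows)
    sab = sum(a * b for a, b in rows)
    det = n * n - sa * sa  # a*a == 1 for every row
    if det == 0:
        raise ValueError("detectors are needed on both sides of the fault")
    # Closed-form solution of the 2x2 normal equations.
    d = (n * sab - sa * sb) / det
    c = (n * sb - sa * sab) / det
    return d, c / v


v = 2.9e5  # km/s, illustrative value
left = [(0.0, 0.01 + 60.0 / v), (20.0, 0.01 + 40.0 / v)]
right = [(100.0, 0.01 + 40.0 / v), (130.0, 0.01 + 70.0 / v)]
print(locate_fault_lls(left, right, v))
```

With more than two detectors the system is overdetermined, so measurement noise in individual arrival times is averaged out rather than propagated directly into the estimate.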

Fault Location Finding Algorithm for Series Compensated TLs
Typically, series compensation is achieved through series capacitors and metal oxide varistors. The non-linear nature of series compensation devices makes it difficult to locate the faulty segment and hence the fault location. Thus, traditional approaches [126,127] need to be modified to address such cases. The generalized procedure for FL finding in series compensated TLs is shown in Figure 10. In [128], an impedance-based approach is proposed using double-end voltage and current samples. Swetapadma et al. used an artificial intelligence-based algorithm to locate single and multiple faults [129]. Third level wavelet coefficients (62.5-125 kHz) are extracted by DWT from two post-fault cycles and one pre-fault cycle. Features based on the standard deviation of the coefficients of the voltage and current signals serve as input for the ANN. Nobakhti used synchronized measurements from the ends of the distributed transmission line and the transient resistance nature of the thyristor-controlled series capacitor as an indicator of the faulty section [130].
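The idea of using the standard deviation of third-level detail coefficients as an ANN input feature can be sketched as follows. This is a minimal illustration that assumes the Haar wavelet for simplicity; the mother wavelet, sampling rate and window length used in [129] are not reproduced here, and the function names are hypothetical.

```python
import math
import statistics


def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def detail_coefficients(x, level):
    """Detail coefficients at the requested decomposition level."""
    approx = list(x)
    detail = []
    for _ in range(level):
        approx, detail = haar_step(approx)
    return detail


def std_feature(x, level=3):
    """Standard deviation of the level-`level` detail coefficients."""
    return statistics.pstdev(detail_coefficients(x, level))


# A step change inside the window produces non-zero detail coefficients.
signal = [1.0] * 20 + [0.0] * 12
print(std_feature(signal))
```

One such feature would be computed per voltage and current channel, and the resulting vector would then be passed to the ANN.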

FL Methods for Hybrid TLs
Hybrid TLs consisting of both underground cables and overhead (OH) TLs show discontinuity at joints, where reflections of voltage and current signals are produced. Velocities of traveling waves are different in cables and OH lines. Conventional approaches need improvements before they can be implemented for hybrid transmission systems [131]. Traveling wave velocities can also be employed for fault location estimation because of the different TW velocities within hybrid transmission systems. Niazy et al. proposed a TW-based fault location technique using transients caused by circuit breaker operation instead of fault-induced transients [132]. The arrival time of traveling-wave components is measured by WT, and the fault zone is estimated via the polarity of reflections. The wave speed is also calculated, and the double-end traveling wave method is employed to locate the fault. The dc offset is removed through a finite impulse response (FIR) filter. DWT plays a significant role in gathering details of wavelets and voltage signal coefficients. Such details are then fed as input to the neuro-fuzzy system. This helps in determining the fault location (either on an overhead transmission line or on underground cables) [133]. The time-reversal method is also applicable for fault location finding [134]. The maximum energy point is extracted by comparing the energies of different points and is treated as the fault point.
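The effect of different wave velocities on double-end traveling-wave location can be sketched for the simplest hybrid layout: one OH section starting at terminal A, followed by one cable section ending at terminal B. The layout, the synchronized arrival times and the function name `locate_fault_hybrid` are assumptions made for illustration; reflections at the joint are ignored and only the first arrivals are used.

```python
def locate_fault_hybrid(t_a, t_b, len_oh, len_cable, v_oh, v_cable):
    """Double-end TW fault location on an OH line followed by a cable.

    t_a, t_b: first-arrival times at terminals A and B (s, synchronized).
    Returns ('overhead', distance from A) or ('cable', distance from joint).
    """
    dt = t_a - t_b
    # Hypothesis 1: fault on the OH section, x km from terminal A.
    x = 0.5 * (len_oh + v_oh * (dt + len_cable / v_cable))
    if 0.0 <= x <= len_oh:
        return "overhead", x
    # Hypothesis 2: fault on the cable section, y km from the joint.
    y = 0.5 * (len_cable + v_cable * (dt - len_oh / v_oh))
    if 0.0 <= y <= len_cable:
        return "cable", y
    raise ValueError("arrival times are inconsistent with the line model")


v_oh, v_cable = 2.9e5, 1.6e5  # km/s, illustrative values
# Fault 5 km into a 20 km cable that follows an 80 km OH section.
t_a = 80.0 / v_oh + 5.0 / v_cable
t_b = 15.0 / v_cable
print(locate_fault_hybrid(t_a, t_b, 80.0, 20.0, v_oh, v_cable))
```

Testing one hypothesis per section is what distinguishes the hybrid case from a uniform line, where a single formula with one velocity is sufficient.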

ANN-based Algorithm for FL
The fault location finding task can also be achieved in transmission networks by applying different kinds of ANNs, as they offer self-organization, self-learning, high fault tolerance, fast processing, and non-linear function approximation. An ANN is trained on detail coefficients obtained by DWT using the Levenberg-Marquardt algorithm to locate the fault [135]. In [136], fundamental components of voltage and current signals are extracted by DFT. Different modular ANNs are employed, with triplet vectors serving as their input. The best performance, with respect to fault location accuracy and training speed, is obtained by features containing information from both voltage and current signals. Complex domain ANNs are extensions of real domain ANNs whose input, output and hidden layers all operate on complex numbers. The Db2 mother wavelet is employed to generate the input to the complex domain ANN for finding the location of the fault [137]. PNN reduced the fault location error to 0 km for different circuit topologies, including both loop and single-circuit topology [138]. Gayathri et al. proposed a two-stage FL finding method using a radial basis function (RBF) kernel-based SVM and a scaled conjugate gradient (SCALCG) based ANN [139]. In the first stage, the fault area is estimated by measuring the magnitudes of the fundamental harmonics of the voltage and current signals, which are then fed as input to the RBF-based SVM. In the second stage, high-frequency characteristics serve as input for the SCALCG based ANN to obtain a precise fault location.
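Extraction of the fundamental component by DFT, used above to build ANN inputs, can be sketched with a full-cycle DFT. The sketch assumes that the sample window covers exactly one fundamental period; the function name `fundamental_phasor` is hypothetical, and the windowing and filtering used in [136] are not reproduced.

```python
import cmath
import math


def fundamental_phasor(samples):
    """Full-cycle DFT estimate of the fundamental phasor.

    samples: one full fundamental period of a voltage or current signal.
    Returns a complex number whose magnitude is the peak amplitude and
    whose angle is the phase of the cosine component.
    """
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k / n) for k, x in enumerate(samples))
    return 2.0 * acc / n


# Illustrative signal: 10 (peak) at a phase of 0.5 rad, 32 samples per cycle.
n = 32
signal = [10.0 * math.cos(2.0 * math.pi * k / n + 0.5) for k in range(n)]
phasor = fundamental_phasor(signal)
print(abs(phasor), cmath.phase(phasor))
```

The magnitude and angle of the voltage and current phasors are typical candidates for the feature vector fed to the network.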

FIS Based Algorithm for FL
The self-learning and fault-tolerant abilities of FIS algorithms let them refine pre-set fuzzy rules, which are then employed for fault location finding [140,141]. The Db4 mother wavelet with ANFIS is used to locate faults; the efficiency is validated by Monte Carlo simulation and the error is found to be 5% [140]. The norm entropy of the harmonic coefficients (62.5-500 Hz), main frequency coefficients (0-62.5 Hz) and transient coefficients (500-4000 Hz), obtained by a 6-level DWT using the Db4 mother wavelet, is treated as the input for ten ANFIS regression algorithms trained by the BP gradient descent technique along with the LLS method [141].

Support Vector Regression-Based Approach for FL
Regression problems can be solved via SVM by introducing the ε-insensitive loss function. This technique is known as support vector regression (SVR). SVR retains the properties of SVM: the possibility of over-fitting is minimized by selecting discriminative functions based on the principle of structural risk minimization, and a global solution can be obtained because training is a convex optimization problem [142,143]. In [144], noise removal and offset reduction are done by the stationary WT (SWT). A special determinant transform function is used to extract features from the level 2 to 5 SWT coefficients. A radial basis kernel SVR corresponding to the fault type is then employed after classification by SVM. Eleven different kinds of features are obtained for fault estimation through the spatio-temporal prediction HST matrix in [95]. It is implemented by replacing the Gaussian window of the ST with a hyperbolic window, used as an asymmetrical window, to extract features from current and voltage signals. Distinctive fault features are extracted using wavelet packet decomposition (WPD) with the Db1 mother wavelet from the first half-cycle of post-fault voltage samples [145].
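The ε-insensitive loss that turns SVM into a regression method can be written in a few lines. This is a minimal sketch of the standard loss only, not of a complete SVR solver; the function name and the example values (distances in km) are illustrative.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Loss is zero inside the epsilon tube and grows linearly outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


# Estimated fault distance within 0.5 km of the true value: no penalty.
print(epsilon_insensitive_loss(50.0, 50.3, 0.5))
# Estimate 2 km off: only the part outside the tube is penalized.
print(epsilon_insensitive_loss(50.0, 52.0, 0.5))
```

Because errors smaller than ε are ignored, only the training samples on or outside the tube become support vectors, which keeps the regression model sparse.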

Comparison of Fault Location Methods
A comparison of various fault location techniques is given in Table 3, considering the algorithm employed, input, test system, complexity level, features, and results. Here, complexity is classified as simple, medium or complex by considering the number of inputs and rules involved in algorithm development. Further, the complexity level is selected based on feature training and testing time, accuracy, convergence, and variance, along with the data required.

The computation burden is reduced as it is a reduction technique.
Pre and post-fault is a half-cycle.

Medium
The maximum error in finding FL is expected to be below 0.07% [155] During training, maximum percentage errors of 0.031% and 0.0109% are observed for TL and cable, respectively.
During testing, maximum percentage errors of 0.0277% and 0.039% are observed for TL and cable, respectively.

Future Trends in Fault Location Estimation
As the transmission network grows in complexity, inadequate measurements are expected to become common. Wide-area methods are therefore likely to be employed widely for fault location finding in the near future. However, machine learning algorithms are more flexible and less affected by line parameters than TW or impedance-based approaches. Increased involvement of communication and computation is foreseen in power systems. Thus, machine learning methods, including deep learning, should be explored for future fault location finding.

Weaknesses and Strengths of Different Emerging Computational Intelligence Methods
Generalized strengths and weaknesses of different artificial intelligence and machine learning-based algorithms are given in Table 4. It may help researchers to select a method for fault detection, classification, and location based on its strengths, features and complexity level.

ANN Technique
ANN performs well in determining the exact fault type and is easy to implement.
The training process is quite complex for high-dimension problems.
It is easy to use, with only a few parameters to adjust.
The gradient-based back-propagation technique provides only a local optimum solution for non-linearly separable pattern classification problems.
It has many applications in real-life problems.
ANN converges slowly with the BP algorithm.
ANN learns and needs no reprogramming.
Convergence depends on the selection of the initial values of the weights connected to the network.

PNN Technique
The learning process is not required.
It requires high processing time for large networks.
Determination of the initial weights of the network is not needed.
No correlation between the recalling process and the learning process.
Convergence in the Bayesian classifier is certain.
Not easy to determine how many layers and neurons are required.
PNN shows fast learning time.
Large memory space is required to save the model.

Fuzzy Methods
Simple 'if-then' relations are used to solve uncertainty problems.
No robustness is observed.
Experts are mandatory in order to determine the membership functions and fuzzy rules for large training data.

ANFIS Technique
Parameters are tuned properly by the hybrid learning rule.
ANFIS is highly complex in computation.
It offers faster convergence.
The search space dimension is reduced.
Modal parameters are required.
The single transformation matrix is for the three-phase system (identical for currents and voltages).
ANFIS is smooth and adaptable.
Transposition and non-transposition of electrical values are done by simple multiplication of matrices.
No convolution methods are required.
Not reliable for complex structures.

Deep Learning
Best-in-class performance that outperforms other solutions in multiple domains by a significant margin.
A large amount of data is required.
DL reduces the need for feature engineering, one of the most time-consuming parts of machine learning practice.
DL is computationally expensive to train and can take weeks to train on hundreds of machines equipped with expensive graphics processing units (GPUs).
It is an architecture that can be adapted to new problems with relative ease, e.g., time series and languages, using techniques such as convolutional neural networks, recurrent neural networks, and long short-term memory.
Determining the topology/training method for DL is a black art with no theory.

Conclusions
A comprehensive review of fault detection, classification, and location in transmission lines has been presented in this paper. A range of techniques and methods is presented in addition to representative works.
Before introducing the methods used in fault detection, classification and location, an overview of feature extraction methods, the groundwork for fault identification algorithms, is presented. Various transforms along with dimensionality reduction techniques have also been discussed. Newly developed ideas and their comparison with some noteworthy aspects of fault detection are also discussed.
Machine learning-based methods are widely employed by researchers for fault-type classification. In addition to SVM, FIS, ANN, and DT, promising deep learning-based algorithms such as CNN and RBM are recommended for fault classification.
Fault location finding algorithms are discussed together with AI-based methods. Machine learning methods, including deep learning, are recommended for future FL finding because of the increased involvement of communication and computation in transmission systems.
Generalized strengths and weaknesses of different artificial intelligence and machine learning-based algorithms are discussed. A comparative survey of all three tasks (fault detection, classification, and location) is also presented in tabulated form, considering features, inputs, complexity, system used and results. This paper may provide researchers with a basic foundation and directions for further study in this field.

Figure 1. Various types of faults that can occur within three-phase power transmission systems.

An in-depth review of the different techniques used for fault detection, classification, and location estimation has been presented in this paper. This paper also presents a detailed comparison of various fault detection, classification and location methods based on the algorithm used, input, test system, features extracted, complexity level and results. Throughout this paper, complexity is classified as simple, medium or complex by considering the number of inputs and rules involved in algorithm development. The rest of the paper is organized as follows: feature extraction based on transformation, dimensionality reduction, and modal transformation is discussed in Section 2. Fault detection methods, largely based on feature extraction techniques, are presented in Section 3. Fault-type classification methods are presented in Section 4, while a comparison of fault-type classification techniques is presented in Section 5. Section 6 describes the future prospects of fault-type classification. Fault location finding approaches are reviewed in Section 7, and a comparison of fault location finding methods is outlined in Section 8. Future trends in fault location finding techniques are proposed in Section 9. Strengths and weaknesses of notable emerging computational intelligence methods are presented in Section 10. Finally, conclusions are drawn in Section 11.

Figure 2. A two-layer feedforward neural network.

4.1.2. Radial Basis Function Network (RBFN)
A radial basis function network (RBFN) is a type of FNN that uses radial basis activation functions for hidden nodes, as shown in Figure 3 [64]. Typically, the activation functions are Gaussian functions and an RBFN has one hidden layer [65]. RBFNs have been used to construct fault classifiers in view of the limitations that FNNs show with sigmoid activation functions [66]. Mahanty et al. trained two RBFNs separately to classify faults with and without earth involvement, respectively [67].

Figure 3. Radial basis function network structure.

4.1.3. Probabilistic Neural Network (PNN)
The probabilistic neural network (PNN) structure, another type of FNN, was proposed by Specht in 1989 [68]. It has four layers, namely the input layer, pattern/hidden layer, summation layer and output layer, as shown in Figure 4 [69]. Mo et al. used PNN for fault classification and found that the classification accuracy of PNN is 10% higher than that of FNN [70]. DWT is used to extract features for a PNN with nine pattern nodes in [71]. In [72], the authors compared PNN, FNN, and RBFN; PNN required less training time and provided better classification results.

Figure 4. Probabilistic neural network layout.

4.1.4. Chebyshev Neural Network (ChNN)
Vyas et al. used a Chebyshev neural network (ChNN) for fault classification in TLs [73]. In ChNN, functional expansion with Chebyshev polynomials is used to map the original input into a higher-dimensional space; the hidden layer is replaced by this expansion, leaving only one layer in the network, as shown in Figure 5 [74]. Thus, only one parameter needs to be tuned in ChNN because of its single-layer structure, making it easier to implement than other ANN models while giving efficient fault classification results.
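The functional expansion used in ChNN can be sketched with the standard Chebyshev recurrence T0(x) = 1, T1(x) = x, Tn+1(x) = 2x Tn(x) - Tn-1(x). The function names and the choice to drop the constant term T0 are assumptions made for illustration and are not taken from [73]; inputs are assumed to be scaled to the interval [-1, 1].

```python
def chebyshev_expand(x, order):
    """Return [T_0(x), ..., T_order(x)] using the Chebyshev recurrence."""
    terms = [1.0, x]
    for _ in range(2, order + 1):
        terms.append(2.0 * x * terms[-1] - terms[-2])
    return terms[: order + 1]


def expand_features(features, order):
    """Map each input feature to its Chebyshev terms T_1..T_order."""
    expanded = []
    for x in features:
        expanded.extend(chebyshev_expand(x, order)[1:])  # skip constant T_0
    return expanded


# Two input features become six expanded features at order 3.
print(expand_features([0.5, -1.0], 3))
```

The expanded vector is then fed to a single layer of trainable weights, which is why no hidden layer is needed.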

Figure 9. Wide-area fault location method: the local site is where the application request originates and is responsible for coordinating the remote replicas.

Figure 10. FL algorithm for series compensated transmission lines.

Table 1. Comparison of different fault detection methods.

Table 2. Comparison of different fault-type classification methods.

Table 3. Comparison of different fault location finding methods.

Table 4. Strengths and weaknesses of various emerging computational intelligence methods.