Decision Tree Method for Fault Causes Classification Based on RMS-DWT Analysis in 275 kV Transmission Lines Network

This paper presents a statistical algorithm for classifying the causes of faults on power transmission lines. The proposed algorithm combines three features measured at the sending end of a line, namely the root mean square (RMS) fault current duration, the voltage dip, and the discrete wavelet transform (DWT), with the decision tree method, a widely accessible classification technique. These features expose concealed information in the fault signature and serve as inputs to the decision tree, which is used to classify the various fault causes. The proposed method was implemented in MATLAB/SIMULINK using data generated from the fault analysis of a sample 275 kV transmission line under wide variations in the operating conditions. The classifier performance for different parameters was also compared in confusion matrix form to obtain the best classification results of the decision tree.


Introduction
Unplanned electrical power outages have become a major issue for power utilities [1,2]. Even a temporary interruption of the electric power supply can have economic and security consequences. An outage caused by equipment tripping or failure is categorized as a forced outage. According to IEEE Std 524, an equipment failure, also known as an electrical fault, is defined as a physical condition that causes a device, component, or element to fail to perform in a required manner. Transmission line networks, which consist of overhead lines and cable lines, are susceptible to various system faults. To address the issues that faults bring to the power system, the root cause of a fault must be identified from its signature. Knowing the cause of an outage immediately after the fault is highly helpful in reducing the outage's duration [3]. Several studies have been carried out to identify root causes of faults such as natural phenomena, equipment failures, and human error. The most common faults in overhead transmission lines are caused by natural phenomena such as lightning, wind, tree growth, and bushfire.
Among the many root causes of faults, several occur prominently in Malaysia's transmission line system, such as lightning, tree encroachment, crane encroachment, insulator degradation, and bushfire [4][5][6]. In most cases, a fault poses a risk and reduces the productivity of the installation, in addition to incurring the maintenance cost of restoring the system to normal conditions and the associated losses. Early fault detection is therefore pursued to reduce the maintenance cost and to increase system availability and productivity. Faults can be classified from different aspects, in particular the fault current duration as the current rises by a certain percentage, the voltage dip percentage, and the energy decomposition value. The classification tool works on acquired process parameters such as the current signal, voltage signal, and frequency. The acquired signals contain dynamic information about the fault duration and voltage dip.
Regardless of which measurement parameters are chosen, signal processing techniques in the time domain, frequency domain, and time-frequency domain analyze many signal features to anticipate various types of fault. Many researchers have previously proposed signal processing methods to detect and classify faults. Saravanan et al. [2] diagnosed gear box faults using the discrete wavelet transform (DWT) and classified the faults using an artificial neural network (ANN). The ANN is capable of classifying the gear box fault under various conditions based on numerical values extracted from the wavelet energy decomposition. In [7], Malathy et al. proposed a continuous density hidden Markov model (CDHMM) to determine the dynamics of the state transition due to fault occurrences and classified the condition using neural networks. In addition, Sheng et al. proposed rotating machinery fault diagnosis using a convolutional neural network (CNN) [8]. Fault detection using a fuzzy method has also been proposed for photovoltaic (PV) protection [9]. Voltage ratio (VR) and power ratio (PR) are applied as input data for an ANN to categorize fault regions in the PV system. A second technique is then implemented to detect the exact fault in the PV system. The centroid type is chosen for the defuzzification process, and 10 different membership functions are considered for the fuzzy logic process. In [10], the author proposed a fuzzy cause-and-effect network (FCE) in DS for fault diagnosis. The measurements of feeder currents and bus voltage derived from SCADA are converted into fuzzy terms before being specified in the membership functions of the fuzzy sets. Furthermore, decision trees are widely used for fault detection and classification [11]. Saravanan et al. proposed decision tree classification rules to build a repository of gear box faults based on statistical values extracted from the wavelet transform [12]. Upendar et al. proposed the same method to classify the types of faults in 400 kV power transmission line networks and compared the accuracy with that obtained from a back-propagation neural network [13]. Rabah et al. implemented a decision tree to detect and diagnose faults in grid-connected photovoltaic systems under several weather conditions [14].
In much of the previous research, fault detection in overhead transmission lines focuses mainly on the type of fault, that is, whether it is a line-to-line fault or a ground fault. However, because of the diversity of root causes along the complex connections of transmission lines, it is difficult to identify the fault cause from the recorded signal waveform, and this problem has been less explored in previous studies. In the proposed fault cause identification, we provide fault signatures based on signal characterization to differentiate four types of fault causes that mainly occur in the Malaysian 275 kV overhead line system. The fault causes are then classified using the decision tree technique, since it offers reasonable accuracy at a low computational cost compared with other techniques [11]. To extract useful information, the RMS fault current duration, voltage dip, and DWT features are extracted, and the significant features are selected from the raw signals using a decision tree. These criteria may also be informative for further analysis to identify other fault causes such as bushfire, animal encroachment, and falling objects. A set of 840 sample signals is classified in a decision tree with different predictor selections to find the most accurate classifier with the best computation time.

Related Works
The analysis of the fault signature involves four essential steps: collection of statistical and waveform data, fault characterization, development of fault models in a simulation study, and fault classification. The overall steps involved in the fault signature analysis are illustrated in Figure 1. In the first stage, the statistical fault data endorsed by the committee meeting were obtained to identify the root causes. The main root causes of faults are sorted and matched with the system voltage involved, the related activities, and the time of occurrence. Then, the voltage and current waveforms are acquired and the signatures of the different fault causes are identified. In the fault characterization stage, the chosen fault current and voltage waveforms are converted to root mean square (RMS) waveforms to establish their criteria. The behavior of each parameter is determined and evaluated. Since the actual data are insufficient for classification, several fault models for the fault causes are developed in the MATLAB/SIMULINK environment. Several parameters such as fault distance, inception angle, and load are varied to study their effects on the fault signature. The waveforms generated by the simulation are then compared with the actual waveforms from the field to validate the results. Finally, the actual waveforms and the waveforms generated by the simulation are used to train and test a decision tree tool for classification. The decision tree handles numerical and categorical variables by applying conditions on the features through the splitting method.

Parameters Condition
The transmission line has been modeled using the frequency-dependent phase model, which is the most accurate model, as it represents all frequency-dependent effects of a transmission line and is very useful for studying the fault behavior of the line. MATLAB-Simulink is used to generate four types of fault causes in a three-phase transmission line system. In this model, the transmission line connects sending-end and receiving-end voltage sources at the same voltage level, with 100 MVA and 275 kV buses. The load is assumed to be connected at the receiving end of the line before the measurement takes place. A current transformer model is implemented for measurement purposes, and a grounding circuit is used in the system. The fault model is connected to the transmission line, which has a total length of 300 km. The Simulink parameters of the test system model are set as per Table 1.
Several parameters are varied to determine their effects on the fault signatures. The variable parameters are fault distance, fault inception angle, and load, as described in Table 2. The fault distance is varied from 10 km to 100 km from the receiving end, where the total line length is 300 km. The fault current and voltage waveforms are measured as the fault distance is varied in 10 km steps. Meanwhile, the fault inception angle takes the values 0°, 30°, 45°, 60°, 90°, 180°, and 270°. Finally, the load, modeled as an RL circuit, is set to 300 MW, 330 MW, and 150 MW.
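The parameter variation described above can be sketched as a simple enumeration of simulation cases. This is only an illustration: the names `FAULT_CAUSES`, `build_cases`, and the dictionary keys are hypothetical, and the assumption that every distance, angle, and load is combined with every fault cause (a full factorial design) is not stated in the text. Under that assumption the count comes to 840, the same size as the sample set mentioned in the Introduction.

```python
from itertools import product

# Values below are taken from the text; combining them as a full
# factorial grid is an assumption made for this sketch.
FAULT_CAUSES = ["tree", "crane", "insulator", "lightning"]
FAULT_DISTANCES_KM = list(range(10, 101, 10))   # 10 km to 100 km in 10 km steps
INCEPTION_ANGLES_DEG = [0, 30, 45, 60, 90, 180, 270]
LOADS_MW = [300, 330, 150]


def build_cases():
    """Enumerate every (cause, distance, angle, load) combination."""
    return [
        {"cause": c, "distance_km": d, "angle_deg": a, "load_mw": p}
        for c, d, a, p in product(
            FAULT_CAUSES, FAULT_DISTANCES_KM, INCEPTION_ANGLES_DEG, LOADS_MW
        )
    ]


cases = build_cases()
print(len(cases))  # 4 causes x 10 distances x 7 angles x 3 loads
```

Each dictionary could then be passed to whatever routine configures and runs one simulation.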

Fault Model
The following subsections explain the models adopted for the fault causes used in the study.

Tree Fault Model
The dangers of a downed conductor are obvious to all. The major concerns are the possibility of fire, property damage, and harm to anything that comes into contact with the live conductor, which produces an arc and causes a fault [15]. Faults caused by tree and crane contact are categorized as high impedance faults (HIF). In the HIF model used, the parameters Vp and Vn model the contact surfaces. During an HIF, the current in the positive half cycle is higher than in the negative half cycle, so the waveform is unsymmetrical, as was experimentally proven by Emanuel et al. To model this phenomenon, Vn must be greater than Vp (Vn > Vp), with Vn − Vp = ∆V, where ∆V is the unsymmetrical voltage. Moreover, it was shown that a less densely packed contact surface yields a higher arc voltage than a contact surface with high density. Using this as a guide, tree encroachment and crane contact are modeled to obtain the specified current magnitudes. Furthermore, the values of Rp and Rn, which represent the effective fault resistance for the positive and negative half cycles, respectively, are randomly varied within ±10% of the specified steady-state values [16][17][18][19][20][21]. The HIF model based on the Emanuel arc is shown in Figure 2. The equations for this model are given in Equations (1)-(4): they define the fault resistance for the tree cause, the parallel combination of the total R_FCtr in the Emanuel arc, and the resulting fault impedance for a tree.
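The asymmetry produced by Vn > Vp can be illustrated with a minimal sketch of a two-branch arc model. The conduction rule used here (the positive branch conducts when the voltage exceeds Vp, the negative branch when it falls below −Vn) is the common textbook formulation of an Emanuel-type model and is assumed rather than taken from the paper's equations. The function name and all numerical values are hypothetical.

```python
import math


def emanuel_hif_current(v, vp, vn, rp, rn):
    """Current of a two-branch Emanuel-type arc model for instantaneous voltage v.

    The positive branch (vp, rp) conducts when v > vp; the negative
    branch (vn, rn) conducts when v < -vn; otherwise no current flows.
    """
    if v > vp:
        return (v - vp) / rp
    if v < -vn:
        return (v + vn) / rn
    return 0.0


# Hypothetical values chosen only to show the asymmetry.
V_PEAK = 10_000.0            # peak voltage at the fault point, V
VP, VN = 1_000.0, 2_000.0    # arc voltages with Vn > Vp
RP = RN = 500.0              # branch resistances, ohms

voltage = [V_PEAK * math.sin(2 * math.pi * k / 200) for k in range(200)]
current = [emanuel_hif_current(v, VP, VN, RP, RN) for v in voltage]

pos_peak = max(current)      # peak of the positive half cycle
neg_peak = -min(current)     # magnitude of the negative half cycle peak
print(pos_peak, neg_peak)
```

With equal branch resistances, the larger Vn alone makes the negative half cycle smaller than the positive one, which matches the unsymmetrical waveform described in the text.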

Crane Fault Model
The crane contact usually contains harmonic current, which appears as an arc in the output waveform. The harmonic current source is connected in series with the Emanuel arc model to represent the arc. Based on the literature, the equation for the crane is nearly the same as for the tree model. However, the fault current is slightly different and the fault develops faster than in the tree case. The fault current in the crane model includes added harmonics, which are represented by a current injection model, shown in Figure 3. Harmonic analysis of the acquired current has shown values of 11.6% third-order harmonics, which are within the acceptable range of HIF standard characteristics [16][17][18][19][20][21]. The current injection is defined in Equations (5)-(7), where the total harmonic current is determined by adding the fundamental, third, ninth, and fifteenth harmonic current sources. Combining the Emanuel arc model with the harmonic current injection then gives the fault impedance for the crane.
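The summation of harmonic current sources can be sketched as follows. Only the 11.6% third-harmonic level comes from the text. The 50 Hz fundamental, the ninth and fifteenth harmonic levels, zero phase angles, and the names `HARMONICS` and `injected_current` are assumptions made for illustration.

```python
import math

F0 = 50.0  # assumed fundamental frequency, Hz

# Harmonic order -> amplitude relative to the fundamental.
# Only the 11.6% third-harmonic figure comes from the text;
# the 9th and 15th levels are placeholders.
HARMONICS = {1: 1.0, 3: 0.116, 9: 0.02, 15: 0.01}


def injected_current(t, i_fund_peak):
    """Sum of the fundamental, 3rd, 9th, and 15th harmonic sources at time t (s)."""
    w = 2 * math.pi * F0
    return sum(
        i_fund_peak * rel * math.sin(order * w * t)
        for order, rel in HARMONICS.items()
    )


# Sample one cycle of the injected current at 100 samples per cycle.
one_cycle = [injected_current(k / (100 * F0), 100.0) for k in range(100)]
```

In a fuller model this injected current would be combined with the arc branch current; here it is shown on its own to keep the harmonic content visible.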

Insulator Fault Model
Insulator failure in an overhead line system can be caused by various factors such as ageing or degradation of the crossarm and pin insulator. As a result, the conductor breaks and falls onto the ground. Most of the waveforms recorded for insulator failure show a line-to-ground fault. In principle, the insulator failure signature could be determined by evaluating the leakage current in the pre-fault current waveform. However, most waveforms are recorded after the protection has operated, at which point the leakage current is too small and difficult to determine [22,23]. In some cases, neutral current distortion appears in the pre-fault waveform. Therefore, the easier way to represent a fault due to insulator failure is to use lumped parameters consisting of a fault resistance and a ground resistance component [24]. Figure 4 shows the model of the fault impedance in this approach, and Equations (8)-(10) give the resistance value of the fault: they define the fault resistance of the insulator failure in the phase line and, based on it, the fault impedance.

Lightning Fault Model
The lightning current is represented by a Norton circuit with a current source (Iy) of 40 kA in parallel with an impedance (Ry), as shown in Figure 5. The impedance of the lightning channel is taken to be about 400 ohms, although the CIGRE, IEC, and IEEE standards assume higher values [25,26]. The tower electrical parameters for a 275 kV overhead line are presented in Table 3, which also considers a tower footing resistance in series with the entire structure. The parameters are established using the model proposed in [26].
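As a small worked example of the Norton representation, the sketch below converts the source to its Thevenin equivalent and computes the voltage developed when the source feeds an external impedance. The 40 kA and 400 ohm values come from the text. The 100 ohm struck impedance and the function names are hypothetical and used only for illustration.

```python
def norton_to_thevenin(i_source_a, r_parallel_ohm):
    """Convert a Norton source (I in parallel with R) to Thevenin form (V, R)."""
    return i_source_a * r_parallel_ohm, r_parallel_ohm


def strike_point_voltage(i_source_a, r_parallel_ohm, z_struck_ohm):
    """Voltage across z_struck_ohm when it is connected to the Norton source."""
    z_eq = r_parallel_ohm * z_struck_ohm / (r_parallel_ohm + z_struck_ohm)
    return i_source_a * z_eq


# 40 kA source in parallel with a 400 ohm channel impedance (values from the text).
v_th, r_th = norton_to_thevenin(40e3, 400.0)

# Hypothetical 100 ohm impedance seen at the strike point.
v_strike = strike_point_voltage(40e3, 400.0, 100.0)
print(v_th, v_strike)
```

The open-circuit value is an upper bound; the voltage actually developed depends on the impedance the stroke sees, which is why the tower and footing parameters matter.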
The multistory tower model is used in this study. It is composed of four main parts representing the tower sections between the cross arms, as illustrated in Figure 6. Each section consists of a lossless line in series with a parallel R-L circuit, included for attenuation of the traveling waves along the tower, a_1. The propagation velocity of a traveling wave along the tower, c_0, is assumed equal to 300 m/µs. Note that the overvoltage obtained by simulation when the simplest models are used should be the same between the terminals of all insulator strings, since these models do not distinguish between line phases. In practice, some differences are expected due to the different coupling between the shield wires and the phase conductors located at different heights above the ground. This study does not vary the tower height, as the 275 kV system voltage from Tenaga Nasional Berhad (TNB) is maintained throughout.
A transmission tower is represented by four distributed-parameter lines as defined in Table 3, where Z_t1 is the surge impedance from the tower top to the upper phase arm, which is equal to that of the upper-to-middle and middle-to-lower sections. Meanwhile, Z_t4 is the surge impedance of the tower to the tower bottom. The values of R and L are defined by expressions (11) to (13), where the heights h_i are indicated in Figure 6 and τ = h_t/c_0 is the travelling time along the tower.

Figure 5. Norton circuit.
Table 3. Tower model parameters for 275 kV system voltage [26]. Columns: Tower Height/Geometry (m); Surge Impedance (Ω).

Proposed Methodology
This section describes the characterization of the fault signal and the classification algorithm.

Characterization of the Root Cause of Fault
Characterizing the fault is the process of determining the relevant features of the root cause of a fault as well as finding indicators capable of measuring these features. Based on the results of the actual data acquisition, the characteristics of the fault are observed. Searching for indicators is one of the key steps in this research, since the indicators chosen to describe and characterize the fault determine the accuracy that can be achieved. Most of the indicators of fault occurrence can be categorized as follows: the duration of the fault current rise from 10% to 90% of the maximum value, the fault current duration at 20% of the maximum value, and the fault current duration at 50% of the maximum value. In addition, other evidence of a fault can be observed in the voltage dip percentage and the wavelet energy extracted from the voltage waveform.

Fault Current Duration
The RMS fault current duration for the rise from 10% to 90% is extracted because it provides a picture of the dynamic state of the fault during the initial transient stage. It indicates the rate of change of the current relative to the network or load prior to the fault occurrence. Equation (14) defines T_10/90, the duration of the current rise from 10% to 90% of the maximum value.
The fault current duration at 20% of the maximum is extracted because it indicates the dynamic state of the fault at the steady stage. The underlying mechanism by which the fault current flows differs for each of the major fault causes. Fault current due to tree encroachment and crane encroachment flows through a high impedance medium, whilst fault current due to lightning is conducted via air particles. The resistivity of these media differs significantly, as does the duration of the fault.
The fault current duration at 50% of the maximum is extracted because it indicates the dynamic state of the fault at the final stage before fault extinction. The durations at 20% and at 50% are each calculated with a corresponding equation, and Figure 7 illustrates the current durations extracted from the RMS waveform used in this study.
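A minimal sketch of how the 10% to 90% rise duration might be extracted from a sampled RMS current waveform is given below. It assumes that the thresholds are taken relative to the maximum of the RMS waveform and that crossing times are found by linear interpolation between samples. Neither detail is specified in the text, and the function names are hypothetical.

```python
def first_crossing_time(times, values, level):
    """Linearly interpolated time at which `values` first reaches `level`."""
    if values[0] >= level:
        return times[0]
    for k in range(1, len(values)):
        lo, hi = values[k - 1], values[k]
        if lo < level <= hi:
            frac = (level - lo) / (hi - lo)
            return times[k - 1] + frac * (times[k] - times[k - 1])
    raise ValueError("level is never reached")


def rise_time_10_90(times, rms_current):
    """Duration of the rise from 10% to 90% of the maximum RMS current."""
    peak = max(rms_current)
    t10 = first_crossing_time(times, rms_current, 0.1 * peak)
    t90 = first_crossing_time(times, rms_current, 0.9 * peak)
    return t90 - t10


# Toy RMS waveform: time in ms, current in A.
t_ms = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
i_rms = [0.0, 0.0, 50.0, 100.0, 100.0, 100.0]
print(rise_time_10_90(t_ms, i_rms))
```

If the pre-fault load current already exceeds 10% of the maximum, the first threshold is reported at the first sample, so a baseline-relative definition may be preferable in practice.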

(c) Sub-Band Filters
In DWT, the signal is analyzed at different frequency bands with different resolutions using digital filtering techniques. This divides the signal into approximation and detail signals. The signal is passed through high pass and low pass filters. At the first stage, the original signal is split into two halves of its bandwidth, which are fed to the two filters. Next, the output of the low pass filter is further divided into halves of its frequency bandwidth and passed to the next stage. This step is then iterated at each subsequent stage.

Decision Tree Algorithm
The decision tree classifier has advantages in terms of flexibility, its nonparametric nature, and its ability to handle non-linear relations between features and classes. An input sample is classified into one of its possible classes through the tree structure of the decision tree [14,33]. The tree structure is defined by a model of decision rules based on if/else instructions. The decision tree is one of the well-known classification tools since it offers reasonable accuracy at a low computational cost [11]. In this paper, the decision tree, a type of supervised learning, is simulated in MATLAB using the 'fitctree' command. The target output supervises the training sets using the recursive binary partitioning method. Successive questions with yes or no answers are asked to separate the sample space. The nodes are the points where a test is performed on the elements. The test results then lead to further nodes, which can be seen as branches. There are three kinds of node in a decision tree, namely the root node, the leaf node, and the internal node, as illustrated in Figure 11. The outcome of the test is determined by the purity of each node. Splitting at a node stops once it achieves an optimal level of class purity. The optimal level is reached when the node contains only one class.
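One step of recursive binary partitioning can be sketched as a search for the threshold on a single feature that gives the purest child nodes. Gini impurity is used as the purity measure here as an assumption, since the text does not say which split criterion was used. The toy voltage-dip values and labels are invented for illustration, and this is not a reimplementation of MATLAB's `fitctree`.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best split `value < threshold`."""
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold can separate equal feature values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best


# Invented toy data: voltage dip (%) for two fault causes.
dip = [5.0, 8.0, 12.0, 40.0, 55.0, 60.0]
cause = ["lightning", "lightning", "lightning", "tree", "tree", "tree"]
threshold, impurity = best_split(dip, cause)
print(threshold, impurity)
```

A full tree would apply `best_split` over all features and recurse into each child until the nodes are pure or another stopping rule is met.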

Voltage Dip
The RMS voltage dip of the fault is evaluated over a window length of two cycles, which is equivalent to 200 samples. This feature indicates the degree of unbalance during the fault. The minimum voltage, taken at the last point of the second cycle, is subtracted from the maximum voltage, taken half a cycle before the fault is initiated. The percentage voltage dip is then evaluated. Figure 8 illustrates an example of a voltage dip with the minimum voltage indicated. The voltage dip during the fault is expressed as a percentage relative to the maximum value.
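The percentage dip can be sketched as below. The 100 samples per cycle follows from the two-cycle, 200-sample window stated in the text. The one-cycle sliding RMS, the formula (Vmax − Vmin)/Vmax × 100, and the way the maximum and minimum are picked from the toy signal are assumptions made for this example.

```python
import math

SAMPLES_PER_CYCLE = 100  # two cycles correspond to 200 samples in the text


def sliding_rms(signal, window=SAMPLES_PER_CYCLE):
    """One-cycle sliding RMS; output k covers signal[k : k + window]."""
    out = []
    for k in range(len(signal) - window + 1):
        chunk = signal[k:k + window]
        out.append(math.sqrt(sum(x * x for x in chunk) / window))
    return out


def voltage_dip_percent(v_max, v_min):
    """Voltage dip as a percentage of the maximum (pre-fault) RMS voltage."""
    return (v_max - v_min) / v_max * 100.0


# Toy signal: two healthy cycles at 1.0 pu, then two faulted cycles at 0.4 pu.
signal = [
    (1.0 if k < 200 else 0.4) * math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
    for k in range(400)
]
rms = sliding_rms(signal)
dip = voltage_dip_percent(rms[0], rms[-1])
print(dip)
```

On recorded waveforms the two RMS values would be read at the instants described in the text rather than at the ends of the record.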

Time-Frequency Domain Analysis Using Discrete Wavelet Transform (DWT)
This subsection describes the time-frequency domain analysis of the voltage signal using DWT. The DWT, a time-frequency analysis technique, is deployed because it can extract features in both time resolution and frequency resolution [27][28][29][30]. The wavelet transform and its mathematical algorithm are discussed in detail in the subsections that follow.

(a) Discrete Wavelet Transform Algorithm
Wavelet transform is a powerful signal processing tool used in recognizing power disturbance patterns based on feature extraction [31]. It can analyze a signal at multiple resolutions, localized in either time or space. Previously, the Fourier Transform was used to analyze stationary signals, with limited capability for non-stationary analysis because the time information is lost [31]. The Fourier Transform in the frequency domain can be defined as:
From the equation, the integration limits of −∞ to +∞ indicate that time information will be lost. The wavelet transform is therefore an effective tool for analyzing non-stationary signals because the mother wavelet function is used as the basis function. The mother wavelet function ψ(t) is defined as:
where 1/a is the frequency and 1/√a is the normalizing constant for each scale parameter. Meanwhile, b is the parallel translation along the time axis. The continuous wavelet transform (CWT) is defined as:
Terms a and b indicate the dilation and translation, which determine the frequency length of the wavelet and its shifting position, respectively. ψ is the mother wavelet, and * indicates that the complex conjugate is used in the case of a complex wavelet. The discrete wavelet transform (DWT), an extension of the CWT, is introduced to overcome the computational burden of the CWT. The DWT is defined as:
where ψ_{m,n}(t) is the scaled and translated mother wavelet, m, n ∈ Z represent the frequency localization and time localization, respectively, and x(t) denotes the signal in the time domain. In this paper, the db4 mother wavelet is used to detect the disturbance signal and to obtain time detection information and detail frequency.
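For reference, the standard textbook forms of the four expressions described above are sketched below. The symbols X(ω), a_0, and b_0 do not appear in the text and are assumed here; in the common dyadic case a_0 = 2 and b_0 = 1.

```latex
\begin{align*}
X(\omega) &= \int_{-\infty}^{+\infty} x(t)\, e^{-j\omega t}\, dt \\
\psi_{a,b}(t) &= \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right) \\
\mathrm{CWT}(a,b) &= \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} x(t)\, \psi^{*}\!\left(\frac{t-b}{a}\right) dt \\
\mathrm{DWT}(m,n) &= \int_{-\infty}^{+\infty} x(t)\, \psi^{*}_{m,n}(t)\, dt,
\qquad \psi_{m,n}(t) = a_0^{-m/2}\, \psi\!\left(\frac{t - n b_0 a_0^{m}}{a_0^{m}}\right)
\end{align*}
```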

(b) Decomposition
Discrete wavelet transform is a very useful technique for analyzing transient phenomena. Multiresolution analysis (MRA) is one of the tools of DWT; it decomposes a non-stationary signal into a low-frequency signal, known as the approximation, and high-frequency signals, called details. In this stage, the original disturbance waveforms are decomposed using DWT at the desired level j with the Daubechies wavelet function of order n. The decomposition of the PQ waveform into various frequency bands is achieved by applying a high pass filter and a low pass filter to the time domain signal. The flow of the filter division is further explained in the next subsection. Figure 9 (D4)-(D1) illustrates the four-level decomposition coefficients of the voltage dip fault utilised in this paper.

(c) Sub-Band Filters
In DWT, the signal is analyzed at different frequency bands with different resolutions using digital filtering techniques, which divide the signal into approximation and detail signals. The signal is passed through high pass and low pass filters. At the first stage, the original signal is split into two halves of the bandwidth, one sent to each filter. Next, the output of the low pass filter is further divided into half of the frequency bandwidth and passed on to the next stage. The step is iterated until the agreed level is reached, which is four levels in this case; this is known as an iterated filter bank. The amount of detailed information is measured by the resolution of the signal, which is altered by the filtering operations, while the scale is changed by the downsampling and upsampling operations.
The relation between the low pass and high pass filters and the scaling function φ(t) and the mother wavelet function ψ(t) can be defined as follows: The low pass filter and high pass filter are not independent of each other; they are related by: where g[n] is the low pass filter, h[n] is the high pass filter, and L is the filter length (total number of points). The impulse response h[n] is applied when the signal is passed through a half band low pass filter. The convolution operation that performs this filtering in discrete time is defined as follows: Here h[n] can be any filter's impulse response.
Based on Equations (27) and (28), the high and low frequency components are defined as follows: where y_high[k] and y_low[k] are the outputs of the high pass and low pass filters after subsampling by 2. Here y_high[k] denotes the detail component, y_low[k] denotes the approximation component, and A_0 denotes the input signal. The sub-band filter algorithm for the high pass and low pass filters of the wavelet decomposition, in levels of approximation and detail coefficients, is described in [32]. After the decomposition process, the wavelet signal is reconstructed and the reconstructed energy of all levels is combined to obtain a single fixed value. At this stage, higher noise levels, where the fault signal deviates more from the normal one, produce more energy. However, only absolute energy values above the set threshold are evaluated. In this case, the threshold is initially set to 10% of the maximum magnitude. Figure 10 illustrates the reconstructed coefficient energy that is extracted from the voltage dip fault signal and used in this study.
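The iterated filter bank can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the two-tap Haar filters in place of db4 to keep the coefficients short, and the function names and test signal are hypothetical. Each level convolves the current approximation with the low pass and high pass filters, keeps every second sample, and repeats on the low pass output for four levels.

```python
import math

# Haar analysis filters, assumed here for brevity (the paper itself uses db4).
SQRT_HALF = 1.0 / math.sqrt(2.0)
LOW_PASS = [SQRT_HALF, SQRT_HALF]    # g[n], approximation branch
HIGH_PASS = [SQRT_HALF, -SQRT_HALF]  # h[n], detail branch


def filter_and_downsample(signal, taps):
    """Convolve signal with taps and keep every second output sample."""
    output = []
    for k in range(len(signal) // 2):
        acc = 0.0
        for n, tap in enumerate(taps):
            index = 2 * k + 1 - n
            if 0 <= index < len(signal):  # samples outside the signal count as zero
                acc += tap * signal[index]
        output.append(acc)
    return output


def decompose(signal, levels):
    """Iterated filter bank: the low pass output is split again at each level."""
    approximation = list(signal)
    details = []  # details[0] is D1 (finest), details[-1] is the coarsest
    for _ in range(levels):
        details.append(filter_and_downsample(approximation, HIGH_PASS))
        approximation = filter_and_downsample(approximation, LOW_PASS)
    return approximation, details


def energy(coefficients):
    """Sum of squared coefficients."""
    return sum(c * c for c in coefficients)


# Hypothetical test signal: two periods of a sine with one transient spike.
signal = [math.sin(2 * math.pi * i / 16) for i in range(32)]
signal[20] += 1.0
approx, details = decompose(signal, 4)
detail_energies = [energy(d) for d in details]
```

Because the Haar filters are orthonormal, the energy of the input equals the energy of the final approximation plus the energies of all detail levels, which gives a quick sanity check on the decomposition.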

Decision Tree Algorithm
The decision tree classifier has advantages in terms of flexibility, its nonparametric nature, and its capability to handle non-linear relations between features and classes. An input sample can be classified into its possible classes through the tree structure of the decision tree [14,33]. The tree structure is defined as a model of decision rules based on if/else instructions. The decision tree is one of the well-known classification tools since it gives reasonable accuracy at low computational cost [11]. In this paper, the decision tree, a type of supervised learning, is simulated in MATLAB using the 'fitctree' command. The target output supervises the training sets using the recursive binary partitioning method. Successive yes/no questions are asked to separate the sample space. The nodes are the points where a test is performed on the elements. The test results are then passed to another node along what can be seen as branches. There are three kinds of node in a decision tree, namely the root node, the leaf node, and the internal node, as illustrated in Figure 11. The outcome of the test is determined by the purity of each node. A node stops splitting once it achieves an optimal level of class purity, which is reached when the node contains only one output type. To classify a new sample, its element values are tested against the decision tree. The class prediction for the tested sample is given by the path from the root node to a leaf node. The basic process of building a decision tree is to repeatedly find the attribute to be tested at a node and then branch to another node. This whole process of identifying the test and the branch is known as splitting.
The splitting process minimizes the impurity in the dataset with respect to class at the later stage. The process requires an information gain calculation, which is accomplished in two stages: entropy and entropy splitting index.
Based on Figure 11b, t_P is the parent node, t_L is the left child node, and t_R is the right child node. The entropy index is also known as impurity, and the measure of impurity/entropy i(t) at node t is denoted as:
$$i(t) = -\sum_{j} p(w_j \mid t)\, \log p(w_j \mid t) \qquad (29)$$
where p(w_j|t) is the proportion of patterns x_j allocated to class w_j at node t. Each non-terminal node is divided into nodes t_L and t_R as shown, where x_j^R represents the best splitting value of variable x_j. The corresponding proportions of entities in the new nodes are P_L and P_R. The best division (entropy splitting index) is that which maximizes the difference given by Equation (30).
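The two stages can be illustrated with a short Python sketch. It assumes the usual form of the splitting index, i(t_P) - P_L i(t_L) - P_R i(t_R), and a base-2 logarithm; neither choice is confirmed by the text, and the function names and label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy impurity i(t) of the class labels at one node (base-2 log assumed)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def split_gain(parent, left, right):
    """Assumed splitting index: parent impurity minus weighted child impurities."""
    p_left = len(left) / len(parent)
    p_right = len(right) / len(parent)
    return entropy(parent) - p_left * entropy(left) - p_right * entropy(right)


# Hypothetical node holding two fault classes (1 = lightning, 2 = insulator degrading).
parent = [1, 1, 2, 2]
perfect_gain = split_gain(parent, [1, 1], [2, 2])  # children are pure
useless_gain = split_gain(parent, [1, 2], [1, 2])  # children are as mixed as the parent
```

A split that yields pure children recovers the full parent entropy as gain, while a split that leaves both children as mixed as the parent yields no gain, so the tree prefers the former.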
Furthermore, to find the best split predictor at each node, this study implemented three predictor selection algorithms, namely all splits, curvature, and interaction-curvature. All splits, the standard classification decision tree approach, selects the split predictor that maximizes the split-criterion gain over all possible splits of all predictors. Curvature selects the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response; its training speed is similar to that of the standard classification decision tree. Finally, interaction-curvature chooses the split predictor that minimizes the p-value of chi-square tests of independence between each predictor and the response, and that minimizes the p-value of a chi-square test of independence between each pair of predictors and the response. Its training speed can be slower than that of the standard classification decision tree [34][35][36].

Confusion Matrix Algorithm
The confusion matrix contains information about the predicted and actual classifications. It is usually presented as a table and is used to evaluate the performance of the classifier on a set of testing data for which the true values are known. Figure 12 illustrates the basic terms of the confusion matrix in table form. The terms are usually whole numbers:
• True Positive (TP): cases that were predicted yes and are actually positive.
• True Negative (TN): cases that were predicted no and are actually negative.
• False Positive (FP): cases that were predicted yes but are actually negative.
• False Negative (FN): cases that were predicted no but are actually positive.
Based on the basic terms of the confusion matrix, a list of rates that express the classifier performance is computed. Table 4 lists the rates used in this paper together with their definitions. Five rates are used: accuracy, sensitivity, specificity, precision, and F1 score.
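The five rates can be computed directly from the four counts. The Python sketch below uses the standard definitions, which are assumed to match those in Table 4; the function name and the example counts are hypothetical, and no guard is included for zero denominators.

```python
def classifier_rates(tp, tn, fp, fn):
    """Standard confusion-matrix rates for one class (assumed to match Table 4)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1_score = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1_score,
    }


# Hypothetical counts for a single fault class.
rates = classifier_rates(tp=40, tn=50, fp=5, fn=5)
```

For a multiclass problem such as the four fault causes, the counts are typically taken per class by treating that class as positive and all other classes as negative.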

Fault Model Validation Result
The fault model proposed in the study was validated by comparing the simulation results with field data acquired from the fault recorder. The current waveform generated from the simulation was validated by evaluating its root mean square error (RMSE) for each case. The equation of the RMSE is defined as: where x is the actual waveform while x_i is the generated waveform. Detailed validation results are described in the following subsection.
Figure 13a illustrates the generated simulation and actual waveforms due to a lightning strike. The generated waveform at 180-degree inception angle and 330 MVAR load was chosen for comparison with the actual fault, which occurred within 20 km from the substation. The RMSE value of the generated current waveform is 0.0342. Figure 13b illustrates the generated simulation and actual waveforms due to insulator degrading. The generated waveform at 180-degree inception angle and 330 MVAR load was evaluated against the actual fault, which occurred within 60 km from the substation, giving an RMSE value of 0.0498. Furthermore, Figure 13c illustrates the generated simulation and actual waveforms due to tree encroachment. The actual waveform was recorded on 30 December 2019 during the fault occurrence. The generated waveform at 30-degree inception angle and 330 MVAR load was compared with the actual fault, which occurred within 10 km from the substation. The RMSE value of the generated current waveform is 0.0696. Meanwhile, Figure 13d illustrates the generated simulation and actual waveforms due to crane encroachment. The generated waveform, with the fault triggered at 30-degree inception angle and 330 MVAR load, was evaluated against the actual fault, which occurred within 30 km from the substation. The RMSE value of the generated current waveform is 0.0498. Therefore, the RMSE values of all simulated fault causes are less than 0.1, which is considered small and indicates the correctness of the models.
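The validation metric can be sketched in Python as follows, using the standard RMSE definition, which is assumed to match the paper's equation. The function name and the short waveforms are hypothetical and do not reproduce the recorded data.

```python
import math


def rmse(actual, generated):
    """Root mean square error between two equally long waveforms."""
    if len(actual) != len(generated) or not actual:
        raise ValueError("waveforms must be non-empty and of equal length")
    squared_error = sum((a - g) ** 2 for a, g in zip(actual, generated))
    return math.sqrt(squared_error / len(actual))


# Hypothetical normalized current samples.
actual = [0.0, 1.0, 0.0, -1.0]
generated = [0.0, 0.9, 0.1, -1.0]
error = rmse(actual, generated)
```

Whether an RMSE below 0.1 counts as small depends on the scale of the waveforms, so both signals should be expressed in the same units or normalized in the same way before comparison.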

Fault Signature Characterization
Initially, the fault occurring in the transmission line system is simulated based on the signature obtained from the raw neutral current and voltage signals. The single line to ground fault has been chosen for this study because it is the most prominent fault type compared with the line to line fault, the double line to ground fault, and the three phase fault. Moreover, four fault causes, namely lightning, insulator degrading, tree encroachment, and crane encroachment, are chosen because they were the prominent causes of tripping in Malaysia from 2016 to 2020. The signatures of the faults based on neutral current and voltage waveforms, which have different patterns, can be observed in Figures 14 and 15a-d.
Based on Figure 14a, tree encroachment shows a gradual current increase that lasts for six cycles, whilst the lightning fault shows a current increase lasting for two cycles with a high magnitude of about 2 kA (Figure 14b). In some cases, the lightning fault can last for three to three and a half cycles. As for crane encroachment, the fault current shows a gradual increase lasting for three cycles, with harmonic content in the pre-fault period, as shown in Figure 14c. Finally, for insulator degrading (Figure 14d), the fault current increases within three cycles, which is the same pattern as the lightning fault but with a lower magnitude. However, the fault current for lightning is usually higher than that for insulator degrading due to its high current, which can be up to thousands of kiloamps and is dependent on the load carried in the lines. The higher the load, the greater the fault current magnitude.
In addition, the characteristics of the faults were observed in the voltage signature. Based on Figure 15a,c, the sinusoidal voltage pattern for tree encroachment and crane encroachment shows no difference from the normal condition until the circuit breaker is closed. Meanwhile, Figure 15b shows that a voltage dip of more than 20% was produced in the system for lightning, with a slight dip in the case of insulator degrading, as in Figure 15d. Based on these observations, the raw voltage and current fault signals do give several signatures, but these are insufficient for analysis because similar characteristics are obtained in several cases. Therefore, further analyses, such as converting the raw current and voltage into RMS and DWT, have been considered and proposed in this paper.

Decision Tree Method Sample Selection
This subsection explores the performance of the decision tree classification method in detecting the four types of faults during operation. All extracted features are implemented in MATLAB, where a set of 840 samples containing the four types of fault is mixed randomly and used to train the classification system. Several parameters of the decision tree, namely the maximum number of splits and the predictor selection, are varied to find the best classifier for the study based on its percentage accuracy. Furthermore, the confusion matrix algorithm is adopted to present the classification accuracy of the decision tree for the fault causes. Table 5 shows a set of 10 samples containing five variables as the input data, with the four types of fault assigned as 1, 2, 3, and 4, respectively. Table 6 defines the actual output variables, which are later set as an array in the system to ease the computation process.
Figure 16 shows the decision tree diagram with an average number of splits of five, which is not cross-validated at this early stage. The root node tests the feature T1090: if the duration is greater than or equal to 0.0857 s, the fault is defined as tree encroachment. Otherwise, if it is less than 0.0857 s, another internal node is created. If the voltage dip is equal to or greater than 19.52%, the fault cause is identified as lightning; otherwise, another internal node is created. Next, if the wavelet energy is less than 0.113, the leaf node is concluded as crane encroachment. Otherwise, the decision tree looks into T50 and T20. Based on these two features, the fault can be categorized as crane encroachment or insulator degrading. If T50 is less than 0.0599 s and T20 is greater than or equal to 0.0716, the leaf node is defined as insulator degrading. Otherwise, if T50 is greater than or equal to 0.0599 s and T20 is less than 0.0716, the fault cause is concluded as crane encroachment.
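The rules read from Figure 16 translate directly into if/else statements. The Python sketch below transcribes only the thresholds quoted above; the function and argument names are hypothetical, and returning None for combinations of T50 and T20 that the text does not describe is an assumption rather than the behavior of the trained tree.

```python
def classify_fault(t1090, v_dip, energy, t50, t20):
    """Rule-based reading of the non-cross-validated tree described in the text.

    t1090, t50, t20: current durations (s); v_dip: voltage dip (%);
    energy: reconstructed wavelet energy.
    """
    if t1090 >= 0.0857:
        return "tree encroachment"
    if v_dip >= 19.52:
        return "lightning"
    if energy < 0.113:
        return "crane encroachment"
    if t50 < 0.0599 and t20 >= 0.0716:
        return "insulator degrading"
    if t50 >= 0.0599 and t20 < 0.0716:
        return "crane encroachment"
    return None  # combination not covered by the description (assumption)


# Hypothetical feature vector: short duration, small dip, high wavelet energy.
label = classify_fault(t1090=0.05, v_dip=5.0, energy=0.5, t50=0.05, t20=0.08)
```

Writing the tree out this way makes it easy to see which feature decides each class: duration separates tree encroachment, voltage dip separates lightning, and wavelet energy together with T50 and T20 separates crane encroachment from insulator degrading.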

Var 5: Wavelet energy (Ener)
Categorical variable / Output: 1 = Lightning, 2 = Insulator degrading, 3 = Tree encroachment, 4 = Crane encroachment
Figure 16. Decision tree diagram of fault causes without cross-validation.
In addition, the decision tree is cross-validated using 10 folds and iterated 10 times, each with a different maximum number of splits. In this study, maximum numbers of splits from 1 to 10 are evaluated, and the accuracy for each is recorded. Table 7 shows that the lowest accuracy, 52.048%, is obtained when the number of splits is set to 1, while the highest accuracy, 99.829%, is obtained when the number of splits is set to 3, 4, 8, or 9. The smallest number of splits giving the highest accuracy, which is 3, is therefore chosen as the optimum. Accordingly, the decision tree algorithm with a maximum of 3 splits is implemented in the next process.
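The selection rule, taking the smallest number of splits among those that reach the highest accuracy, can be written as a short Python sketch. Only the accuracies quoted in the text are included; the remaining entries of Table 7 are omitted, and the function and variable names are hypothetical.

```python
def smallest_best_split(accuracy_by_splits):
    """Return the smallest split count that attains the highest accuracy."""
    best_accuracy = max(accuracy_by_splits.values())
    return min(
        splits
        for splits, accuracy in accuracy_by_splits.items()
        if accuracy == best_accuracy
    )


# Cross-validated accuracies (%) quoted in the text, keyed by maximum number of splits.
accuracy_by_max_splits = {1: 52.048, 3: 99.829, 4: 99.829, 8: 99.829, 9: 99.829}
optimum_splits = smallest_best_split(accuracy_by_max_splits)
```

Preferring the smallest tree among equally accurate candidates keeps the rule set short and reduces the risk of overfitting.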

Decision Tree Classification Performance of Fault Causes Based on Different Predictor Selection
Test sets comprising 30% and 20% of the samples are created for testing the decision tree algorithm. The samples for each ratio are randomly selected using a MATLAB command and used as input for creating the decision tree, while the targeted output is validated based on the classification criteria. The results for a training set of 70% and a test set of 30% of the data samples are described in Tables 8-10. Based on Table 8, the classification with the all splits predictor selection gives 100% prediction accuracy. Meanwhile, Table 9 provides the classification accuracy of the decision tree when the predictor selection is set to curvature. Three samples of crane encroachment are misclassified into the tree encroachment region, resulting in about 85.71% accuracy. Furthermore, the classification accuracy based on the interaction-curvature predictor selection is evaluated in Table 10. From the table, one sample of lightning is misclassified into the insulator degrading region, which increases the classification error by 1.6%. In summary, based on Tables 8-10, the decision tree classification has an average prediction accuracy of 97.4%, although some misclassifications are recorded.
In the other case, the results for a training set of 80% and a test set of 20% of the data samples are described in Tables 11-13. When the all splits predictor selection is applied to the decision tree, one sample from the crane encroachment region is misclassified into the tree encroachment region, resulting in 97.4% prediction accuracy. The rest of the regions reach 100% accuracy without any misclassified samples. The classification accuracy of the decision tree with respect to the curvature predictor selection is provided in Table 12. All fault causes are correctly classified, giving 100% prediction accuracy.
The classification accuracy of the decision tree with respect to interaction-curvature is provided in Table 13. Three samples from the crane encroachment region are misclassified into tree encroachment, resulting in the lowest prediction accuracy, 93.2%. Overall, although some misclassifications occurred, the decision trees have an average prediction accuracy of 96.87%, which is considered suitably high, as depicted in Tables 11-13.

Computational Time
The computation time of each predictor selection is also compared, as listed in Table 14. Overall, the computation times for both sample ratios are nearly the same. On closer inspection, however, the all splits predictor with a sample ratio of 70/30 ran fastest, at 11.622430 s, while the curvature predictor with a sample ratio of 80/20 was the slowest.

Confusion Matrix Performance for Decision Tree on Different Predictor Selection
The confusion matrix presents the performance of the decision tree for different predictor selections and different testing samples. Figure 17 illustrates the forms of the confusion matrix for the decision tree method. In this study, one versus one (OVO) error correcting output codes (ECOC) are trained, with Bag ensemble parameters implemented as the binary learners to improve the multiclass output. Accuracy, sensitivity, specificity, precision, and F1 scores are extracted from the confusion matrix, as shown in Table 15.
Figure 17. Confusion matrix for training (a) and testing (b) performance of decision tree.
The fault due to lightning, when tested with the 30% and 20% data sets, shows the best performance (100%) for all predictor selections except interaction-curvature, where the accuracy, specificity, precision, and F1 score are 98.8%, 94.5%, 94.8%, and 97.3%, respectively. The same pattern is obtained for the fault due to insulator degrading, where interaction-curvature gives the worst performance compared with the other predictor selections: the accuracy is 99.6%, the specificity is 99.4%, the precision is 98.6%, and the F1 score is 99.3%. As for tree encroachment, more data are misclassified than for lightning and insulator degrading, and the confusion matrix performance is also lower. Based on Table 15, the curvature predictor selection shows the worst performance when the tested sample is 30%, while all splits and interaction-curvature show low performance when the tested sample is 20%. Among all of them, however, interaction-curvature gives the worst performance, with only 98.2% accuracy, 97.8% specificity, 91.4% precision, and 95.5% F1 score. Finally, the fault due to crane encroachment shows the worst performance with the curvature predictor selection for the 30% tested sample, while the interaction-curvature predictor selection gives the worst performance for the 20% tested sample.
Overall, the interaction-curvature predictor selection gives the worst performance for this classification application, and all splits gives the best performance for most of the faults classified. It can be concluded that several samples are misclassified because of confusing features, where the system sees similar features for different faults. The best predictor selection for this classification is therefore all splits.

Conclusions
This paper outlined the fault signatures that occur in transmission line networks by characterizing and classifying them using the decision tree classification method. The RMS fault current duration and voltage dip percentage, as well as DWT, were evaluated for several cycles during the fault condition. The proposed fault classification technique is simple and can achieve very high precision: most of the classifiers recorded performance greater than 93%, with a computation time within 12.089 s on average. A decision tree can readily handle both numerical and categorical variables. The method can also lift the decision list, be incorporated with the search steps, apply the decision tree rules to fault detection, and improve the accuracy of fault tree classification. Another advantage of the decision tree method is its robustness to outliers, since the splitting algorithm will usually separate the outliers into an individual node or nodes. An important practical property of a decision tree is that the structure of its classification trees is invariant with respect to monotonic transformations of the independent variables: any variable can be replaced with its logarithm or square root without changing the structure of the tree. Supervised fault classification with decision tree analysis is a successful method that can be effectively implemented for creating a rule-based classification when expert knowledge is inadequate.
Funding: The authors would like to thank Universiti Tenaga Nasional for the BOLD Scholarship and FRGS (20180112FRGS). Special thanks to Tenaga Nasional Berhad (Grid Maintenance) team for their kind support on the data.