Entropy SVM–Based Recognition of Transient Surges in HVDC Transmissions

Protection based on transient information is the primary protection of high voltage direct current (HVDC) transmission systems. As a major part of protection function, accurate identification of transient surges is quite crucial to ensure the performance and accuracy of protection algorithms. Recognition of transient surges in an HVDC system faces two challenges: signal distortion and small number of samples. Entropy, which is stable in representing frequency distribution features, and support vector machine (SVM), which is good at dealing with samples with limited numbers, are adopted and combined in this paper to solve the transient recognition problems. Three commonly detected transient surges—single-pole-to-ground fault (GF), lightning fault (LF), and lightning disturbance (LD)—are simulated in various scenarios and recognized with the proposed method. The proposed method is proved to be effective in both feature extraction and type classification and shows great potential in protection applications.


Introduction
High voltage direct current (HVDC) transmission plays an important role in power transmission due to advantages such as large transmission capacity and good performance in power flow control [1,2]. It has been widely applied in delivering large amounts of power and connecting asynchronous power grids. Generally, traveling wave-based protection or voltage derivative-based protection is used as the primary protection, and under-voltage protection or current-differential protection is adopted as the backup protection in HVDC systems [3,4]. Traveling wave-based protection captures the transient traveling surges on transmission lines and responds within a very short time. As the shunt capacitors and the smoothing inductors in the converter station can effectively reflect traveling waves, traveling wave-based protection can easily distinguish faults beyond the protected zone. Time-domain features, such as the magnitude and rate of change of electrical measurements, are commonly used in protection judgement [5,6].
However, the time domain-based method is sensitive to surge disturbances such as lightning strokes [7,8]. In some cases, the transient waveforms of surge interferences look similar to those of ground faults, which makes them difficult to discriminate. To improve the reliability of protection actions, the identification of transient surges is a critical function of the protection algorithm, and it includes two important aspects: feature extraction and classification algorithm.
To effectively identify transient surges, the unique features of various signals should be extracted, and the features should represent the signals with stable and reliable performance. Frequency analysis, which can provide more details on spectrum differences, is often used to fully utilize the transient information in both the time and frequency domains and gives better characterization performance. Various frequency-based features are adopted and reported to generate good performance-for

Fundamentals of HVDC
Depending on its grounding method, a single-pole DC system can form a symmetric or an asymmetric DC transmission. Both kinds of grounding methods are widely used in practical projects. The symmetric DC transmission employs only one converter to construct the positive and negative poles and is more popular for voltage-source converter high voltage direct current (VSC-HVDC) systems. A typical two-terminal VSC-HVDC system, as shown in Figure 1, is analyzed in this research and modeled on the platform of PSCAD/EMTDC. The midpoint of the DC supporting capacitor is grounded to form a symmetric DC transmission. Such a grounding method can reduce the insulation requirement of DC devices and avoid live currents in grounding loops under normal operation [27,28]. The bus voltage is controlled at VSC1, while the power is controlled at VSC2.
Measuring units M are installed at the terminals of the transmission lines to provide useful information for protection devices. The smoothing reactor L on both terminals of the transmission line can block high frequency components of the transient signal from the converters and thus reduce the influence of interferences beyond the protection zone of the transmission lines. However, the transient interferences generated on the transmission lines can still affect the protection judgement. To improve the reliability of DC line protections, the transients that can be detected by measuring units M

Pole-To-Ground Fault
GF is actually a kind of short circuit fault. It is the most commonly encountered fault in practical HVDC transmission lines. For the symmetric grounding VSC-HVDC transmission system mentioned above, the fault transient procedure includes two stages: discharge of supporting capacitor and current feeding of alternating current (AC) sources [17]. In the first stage, the traveling wave produced by faults moves quickly along the transmission line. When it reaches the converter station, the supporting capacitor discharges quickly. Such discharge causes a large amplitude decrease of DC bus voltage and quick rising fault currents. As the bus voltage decreases to AC phase voltages, the AC sources begin to feed the fault circuit and the second stage starts. The diodes on each leg of convertor are commutated in an uncontrolled commutation mode, and the overall current tends to be steady. Figure 2a illustrates five typical waveforms of GF, and Figure 2b shows their frequency spectrums. The details of these five GFs are as follows: (1) Zg = 20 Ω, Lf = 10 km; (2) Zg = 20 Ω, Lf = 50 km; (3) Zg = 10 Ω, Lf = 100 km; (4) Zg = 1 Ω, Lf = 150 km; (5) Zg = 0.01 Ω, Lf = 200 km. Here, Zg stands for the grounding impedance of fault, and Lf denotes the distance of fault point. Although these waveforms look quite different in the time domain, they have similar frequency spectrums. All of their spectrums decrease smoothly from lower frequencies to higher ones. Small ripples that distribute equally along the frequency range can be found.

Lightning Transients
For long distance transmission, overhead lines are preferred for reasons of economy and maintenance. However, overhead lines are generally erected high above the ground and located in open areas, where lightning disturbance occurs. Lightning strokes can produce fast-rising transients that are interferences in some cases, or faults when the insulation breaks down [29]. Therefore, recognition of lightning transient surges is of great significance for improving the reliability of traveling wave-based protection.
The lightning stroke is a discharge of electric charge between clouds and the earth. It can be modeled by a current source that injects current into the transmission system. In terms of transmission line protection, lightning surges can be divided into two kinds: lightning disturbance (LD) and lightning fault (LF). The formation of these two transient surges is similar, but the results are different. Both LD and LF are caused by overvoltage that is produced by direct strokes or indirect strokes. For overhead transmission lines, direct strokes that hit the bare conductor can generate large overvoltage and result in short-circuit of lightning surge arresters. Although a large overvoltage is produced, the current is not so large due to the characteristic impedance of the transmission line (around several hundred ohms) and the operation of lightning protection devices at the line terminals. This kind of lightning stroke only generates electrical interferences. The lightning-caused traveling wave continuously refracts due to discontinuities at both ends of the line and eventually decays to zero [30]. Indirect strokes hit a point in the vicinity of the transmission lines, for example, the tower or shielding wire. If the lightning stroke current is high enough to cause tower-to-conductor flashover or shielding wire failure, LF occurs [31,32]. In this case, the induced currents in the transmission line could be greater than those due to LDs.
Generally, the lightning strike is modeled mathematically with a double exponential equation, Equation (1) [33,34], where $A$ is the magnitude correction coefficient, $I_L$ is the amplitude of the lightning strike, and $\alpha$ and $\beta$ are the waveform coefficients that stand for the rising and falling time of the lightning impulse, respectively. Different lightning strikes are added to produce both LD and LF transients. The current waveforms of grounding faults and disturbances due to lightning strokes are simulated and displayed in Figure 2c,e, and their frequency spectrums are shown in Figure 2d.
Generally, the LF and GF currents increase gradually, whereas the LD current oscillates around the normal operating value. An easy method to distinguish fault and disturbance is to integrate the current waveforms, but this method usually needs tens of milliseconds, which is too long for a DC protection to make a judgement. Differences between the three kinds of transient surges are clearly revealed in the frequency domain even though the time window of the transient signal is only a few milliseconds. Appropriate selection of distribution features in the frequency spectrum can help to discriminate the transients within an extremely short duration.
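The double exponential lightning model mentioned above can be sketched in a few lines. This is only an illustration: it assumes the common form i(t) = A·I_L·(e^(−αt) − e^(−βt)) with β > α > 0, and the coefficient values below are placeholders chosen for the example, not the values used in the simulations.

```python
import math


def lightning_current(t, amplitude, alpha, beta, correction=1.0):
    """Double exponential surge current A * I_L * (exp(-alpha*t) - exp(-beta*t)).

    Assumed convention: beta > alpha > 0, so the current is positive for t > 0.
    """
    if t < 0:
        return 0.0  # no current before the stroke
    return correction * amplitude * (math.exp(-alpha * t) - math.exp(-beta * t))


def peak_time(alpha, beta):
    """Instant of maximum current, obtained by setting di/dt = 0."""
    return math.log(beta / alpha) / (beta - alpha)


# Placeholder coefficients for illustration only.
ALPHA = 5.0e4   # 1/s
BETA = 2.0e5    # 1/s
I_L = 10e3      # A, inside the 5 kA to 15 kA range quoted for LD

t_peak = peak_time(ALPHA, BETA)
i_peak = lightning_current(t_peak, I_L, ALPHA, BETA)
```

In practice the correction coefficient A is chosen so that the peak of the waveform equals the nominal amplitude, since the bracketed difference of exponentials never reaches 1.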

Definition of FSE
Entropy, a convenient tool for measuring the overall disorder of a system, has been effectively used in the field of signal processing [13,35]. If the frequency spectrum of a signal is considered as a system, its distribution can be characterized by entropy. In this paper, the frequency spectrum is generated by Fourier transform. The frequency spectrum is divided equally into m bands. The amplitude of the whole frequency spectrum is treated as a dataset, which is divided into n intervals to calculate the histogram of the frequency spectrum. The number of coefficients of the ith (1 ≤ i ≤ m) band $x_i$ that fall in the jth (1 ≤ j ≤ n) interval is denoted as $x_{ij}$, and the probability $p(x_{ij})$ is calculated according to Equation (2). The definition of the FSE $H_i$ is shown in Equation (3). Each frequency band produces one entropy value. An FSE vector $H_{FSE}$ with a size of m is formed, as shown in Equation (4).
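For orientation, the quantities described above can be written out as follows. This is a sketch that assumes the usual histogram probability and Shannon entropy; the base of the logarithm is not stated in the text and is left unspecified here, so the exact forms of Equations (2) to (4) may differ in notation.

```latex
% Assumed forms, using the symbols defined in the paragraph above
\[ p(x_{ij}) = \frac{x_{ij}}{\sum_{j=1}^{n} x_{ij}} \]
\[ H_i = -\sum_{j=1}^{n} p(x_{ij}) \log p(x_{ij}) \]
\[ H_{FSE} = [H_1, H_2, \ldots, H_m] \]
```

Intervals that contain no coefficient contribute nothing to the sum, following the convention 0 log 0 = 0.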

FSE Representation of Transient Surges
To illustrate the performance of FSE in representing different transient surges, the entropy vectors are analyzed under various scenarios using the simulation model shown in Figure 1. The transient signal is sampled at a rate of 100 kHz, and the time window size is 3 milliseconds. Therefore, the data segment contains only 300 values (100 × 10³ samples per second × 3 × 10⁻³ seconds). After Fourier transform, the frequency spectrum from the lowest frequency (except 0 Hz) to 50 kHz is divided into six frequency bands. Each frequency band has a range around 8.5 kHz.
The reliability of FSE representation is also tested with different transition resistances and locations: the transition resistance equals 0.01 Ω, 1 Ω, and 10 Ω in grounding faults, and transients occur at 10 km, 100 km, and 200 km, respectively. As illustrated in Figure 4, the trends of FSE distribution vary only slightly when the parameters change.
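The feature extraction described above (300 samples, six bands, a histogram of spectral amplitudes per band) can be sketched with the standard library only. Several details are assumptions made for this example: the amplitude axis is taken as n equal intervals between 0 and the largest spectral magnitude, the natural logarithm is used, and the names `dft_magnitudes` and `fse_vector` are hypothetical.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitudes of DFT bins 1..N/2 (the 0 Hz bin is excluded)."""
    n = len(signal)
    mags = []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def fse_vector(signal, n_bands=6, n_intervals=30):
    """One entropy value per frequency band, from a shared amplitude histogram grid."""
    mags = dft_magnitudes(signal)
    top = max(mags)
    if top == 0.0:
        return [0.0] * n_bands          # an all-zero spectrum has no disorder
    width = top / n_intervals            # assumed: equal intervals over [0, max]
    band_len = len(mags) // n_bands      # trailing bins are dropped if uneven
    vector = []
    for b in range(n_bands):
        band = mags[b * band_len:(b + 1) * band_len]
        counts = [0] * n_intervals
        for m in band:
            counts[min(int(m / width), n_intervals - 1)] += 1
        total = len(band)
        entropy = -sum((c / total) * math.log(c / total) for c in counts if c)
        vector.append(entropy)
    return vector


# Synthetic 3 ms window at 100 kHz: a decaying 5 kHz oscillation (illustrative only).
FS = 100_000
sample = [math.exp(-i / 100.0) * math.cos(2 * math.pi * 5000 * i / FS)
          for i in range(300)]
features = fse_vector(sample)
```

With 300 samples the spectrum has 150 non-zero-frequency bins up to 50 kHz, so each of the six bands holds 25 coefficients, and a band whose coefficients all fall in one amplitude interval gets an entropy of zero.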

Fundamentals of SVM
SVM is a kind of machine learning algorithm based on statistical learning theory, Vapnik-Chervonenkis theory, and structural risk minimization. It has unique advantages in solving small sample, non-linear, and high-dimensional pattern recognition problems and has been widely used in the fields of pattern recognition and regression analysis [25,36].
SVM is a non-probabilistic binary linear classifier. Its main idea is to establish a classification hyperplane as the decision-making plane that maximizes its distance to the data [26]. For linearly separable data with l training samples, the design algorithm for an SVM is reduced to a convex optimization problem, as described in Equation (5), and its binary classification can be represented by Equation (6).
where $X_i \in \mathbb{R}^n$ is the ith feature, $Y_i \in \{-1, 1\}$ is the target label value (binary problem), $w \in \mathbb{R}^m$ is the weight vector, $\alpha_i$ are the Lagrange coefficients, $\langle X_i, X_j \rangle$ is the inner product of the input feature vectors $X_i$ and $X_j$, $b$ is the bias term, and $d(w, b) = w^T X_i + b = 0$ defines the decision function (classification hyperplane). The weight vector $w$ and the bias term $b$ of the decision function can be computed by Equations (7) and (8).
In practical applications, most kinds of data are not linearly separable in their original spaces. The original finite-dimensional space is then mapped to a much higher-dimensional space to make separation easier. The penalty parameter $C$ and slack variables $\varepsilon_i$ are added to the decision function, as shown in Equation (9).
Such optimization can be represented by a binary classification, as shown in Equation (10).
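For orientation, the textbook forms of these problems can be written with the same symbols as follows. This is a sketch of the standard SVM formulation, and the exact forms of Equations (5) to (10) may differ in notation.

```latex
% Hard-margin dual problem and the resulting classifier (linearly separable case)
\[ \max_{\alpha}\; \sum_{i=1}^{l}\alpha_i
   - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j Y_i Y_j \langle X_i, X_j\rangle
   \quad \text{subject to} \quad \sum_{i=1}^{l}\alpha_i Y_i = 0,\; \alpha_i \ge 0 \]
\[ f(X) = \operatorname{sign}\Big(\sum_{i=1}^{l}\alpha_i Y_i \langle X_i, X\rangle + b\Big) \]

% Weight vector and bias recovered from the optimal multipliers
\[ w = \sum_{i=1}^{l}\alpha_i Y_i X_i, \qquad
   b = -\frac{1}{2}\Big(\max_{i:\,Y_i=-1} w^{T}X_i + \min_{i:\,Y_i=1} w^{T}X_i\Big) \]

% Soft-margin primal problem with penalty C and slack variables
\[ \min_{w,\,b,\,\varepsilon}\; \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{l}\varepsilon_i
   \quad \text{subject to} \quad Y_i\,(w^{T}X_i + b) \ge 1 - \varepsilon_i,\; \varepsilon_i \ge 0 \]
```

In the soft-margin case the dual keeps the same objective, and the only change is that each multiplier is bounded above by the penalty parameter, 0 ≤ α_i ≤ C.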
To amplify the differences or the margins between data, every inner product $\langle X_i, X_j \rangle$ that is related to the feature vectors is replaced by a nonlinear kernel function, as shown in Equation (11).
Here, $K(X_i, X_j)$ is the kernel function and $\phi$ is the nonlinear mapping. The use of a kernel function allows the maximum-margin hyperplane to linearly separate data in the transformed higher dimensional feature space. The kernel function is selected to suit the particular classification problem by testing the performance of candidate kernel functions. The most commonly used kernel functions are listed in Table 1.

Table 1. Typical kernel functions (columns: Type, Definition), where γ, r, and p are kernel parameters.
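The four kernel types compared later in the paper (linear, polynomial, RBF, and sigmoid) are commonly written with the parameters γ, r, and p as in the sketch below. These are the usual textbook definitions, assumed here for illustration rather than copied from Table 1, and the function names are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equally long feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u, v):
    return dot(u, v)


def polynomial_kernel(u, v, gamma=1.0, r=0.0, p=3):
    return (gamma * dot(u, v) + r) ** p


def rbf_kernel(u, v, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * squared_distance)


def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(u, v) + r)
```

The RBF kernel depends only on the distance between the two vectors, so it equals 1 for identical inputs and decays toward 0 as they move apart, with γ setting how fast.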
Although it originated from binary classification, SVM can solve multi-classification problems by construction [37–39]. The construction of a multi-class SVM can be divided into two categories: the direct method and the indirect method. The direct method is generally completed by modifying the objective function. This method has high computational complexity and is somewhat difficult to implement. The indirect method is usually achieved by combining multiple binary classifiers. This solution is simple and easy to use. "One-to-One" construction is one of the most commonly used indirect construction methods. It designs an SVM between any two types of samples and determines the type of an unknown sample according to the category scores given by each SVM pair. This construction method greatly reduces the calculation complexity of each classification problem by increasing the number of binary classifiers, and the parallel computation of multiple classifiers improves the overall training speed and the classification accuracy. A "One-to-One" construction method is thus adopted to achieve multi-classification in this research.
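The "One-to-One" construction can be sketched as a vote over pairwise classifiers. The sketch below assumes a simple majority vote with ties left undecided; the function name and the toy threshold classifiers are hypothetical stand-ins for trained SVMs.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_predict(sample, classes, pairwise_classifiers):
    """Majority vote over binary classifiers, one per pair of classes.

    pairwise_classifiers maps (class_a, class_b) to a callable that returns
    either class_a or class_b for the given sample.
    """
    votes = Counter()
    for pair in combinations(classes, 2):
        votes[pairwise_classifiers[pair](sample)] += 1
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie: no class wins a majority of its pairings
    return ranked[0][0]


# Toy stand-ins for trained binary SVMs, acting on a single number.
CLASSES = ("LD", "LF", "GF")
toy_classifiers = {
    ("LD", "LF"): lambda x: "LD" if x < 1.0 else "LF",
    ("LD", "GF"): lambda x: "LD" if x < 2.0 else "GF",
    ("LF", "GF"): lambda x: "LF" if x < 3.0 else "GF",
}
label = one_vs_one_predict(0.5, CLASSES, toy_classifiers)
```

With three classes, a class wins only when both binary classifiers that involve it agree, which is why three pairwise classifiers are enough.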

Recognition Method
Combining the advantages of FSE and SVM, a transient surge recognition method is proposed. Its flowchart is shown in Figure 5. Four steps are included in the proposed method. Since voltage is controlled in VSC-HVDC systems, the current measurements contain more transient information than the voltage ones and are thus employed in recognition. To avoid the influence of communications, only local measurements are used.

Data processing
A time window of 3 milliseconds is used to capture the starting part of the transient surges. Fourier transform is adopted to generate frequency spectrum. The FSE vector is calculated according to Equation (4).

SVM training
Training SVM is crucial for accurate discrimination of faults and disturbances. The structure of SVM is defined by using the "One-to-One" method.

Transient recognition
The trained SVM is tested with test samples and used for recognizing different kinds of transients.

Simulation Model
A two-terminal point-to-point VSC-HVDC system, as shown in Figure 1, is used for simulation, and the double exponential model of Equation (1) is used to simulate lightning strokes. A typical 8 μs/20 μs current wave shape is adopted with ±10% variations in its parameters [34,40]. The amplitude of lightning strokes varies from 5 kA to 15 kA for LD, and from 30 kA to 100 kA for LF. The sampling rate is 100 kHz and the time window for surge capturing is 3 ms.

Data Processing
The frequency spectra of the measured current surges are generated by Fourier transform. The whole frequency spectrum (frequency range (0 Hz, 50 kHz]) is divided into 6 frequency bands, and the total amplitude is divided into 30 intervals for FSE calculation. So, the FSE vectors have a size of 6, $E_{FSE} = [E_1, E_2, E_3, E_4, E_5, E_6]$. For each kind of transient, 200 samples are collected; 100 samples of each kind are randomly selected to form the training sample set, and the remaining 100 samples are used for testing. Since six-dimensional data cannot be shown graphically, the $E_{FSE}$ vector is decomposed into two vectors: $E_{FSE1} = [E_1, E_2, E_3]$, which represents the lower frequency distributions, and $E_{FSE2} = [E_4, E_5, E_6]$, which represents the higher frequency distributions. Figure 6 shows all samples used for training: 100 samples for each kind of transient surge. The feature maps, or space distributions, of the two decomposed vectors $E_{FSE1}$ and $E_{FSE2}$ are shown in Figure 6a,b, respectively.
As illustrated in Figure 6, the decomposed $E_{FSE1}$ and $E_{FSE2}$ vectors are not linearly separable in their three-dimensional spaces. In Figure 6a, the lower frequency features $E_{FSE1}$ of the three kinds of transient surges are mixed together, especially the features of LD and LF. The FSE features $E_{FSE2}$ of LD and LF in the higher frequency range are close to each other. A few of the $E_{FSE2}$ features of GF are mixed with those of LD. Therefore, the original feature vectors $E_{FSE}$ of the three kinds of samples are also linearly inseparable in the six-dimensional space. A kernel function is thus needed to construct a decision surface in a higher dimensional space.

SVM Training
As aforementioned, the "One-to-One" method is adopted in SVM training. Since there are three kinds of transient surges (LD, GF, and LF), three SVMs are employed to construct the following pairs: (i) 1st pair: LD-LF (SVM1), (ii) 2nd pair: LD-GF (SVM2), and (iii) 3rd pair: LF-GF (SVM3). The type of an unknown transient is determined by combining the results of each SVM. For example, an unknown transient surge is determined to be LD only when both SVM1 and SVM2 produce LD classification results.
The selection of a suitable kernel function is crucial for an SVM classifier. The rate of correct recognition is used to evaluate the training performance. Four kinds of kernel functions (linear, polynomial, RBF, and sigmoid) are discussed. K-fold cross validation (K-CV) is commonly used to choose the parameter combination that achieves the highest classification accuracy while avoiding both over-learning and under-learning. The main idea of K-CV is to divide the original data into K groups and to use each group in turn for testing while the remaining groups are used for training. The highest classification accuracy is taken as the objective function to determine the parameters of the SVM classifier. K equals 5 in this research. The mean recognition rates of the five-fold cross validations are listed in Table 2. As illustrated by Table 2, the RBF kernel function produces higher recognition rates than the others and is thus selected for FSE-SVM based recognition.

Table 2. Mean recognition rates of five-fold cross validation for each SVM with the linear, polynomial, RBF, and sigmoid kernel functions.
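The selection procedure described above can be sketched as a K-fold loop that averages a score over the folds and keeps the best candidate. The helper names and the `make_scorer` callback are hypothetical; in a real setup the callback would train an SVM with the candidate kernel or parameters on the training indices and return its recognition rate on the test indices.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds.

    The caller is expected to shuffle the samples beforehand.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def mean_cv_score(n_samples, fit_and_score, k=5):
    """Average of fit_and_score(train, test) over the k folds."""
    scores = [fit_and_score(train, test)
              for train, test in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


def select_best(candidates, n_samples, make_scorer, k=5):
    """Return the candidate (kernel or parameter set) with the highest mean score."""
    return max(candidates,
               key=lambda c: mean_cv_score(n_samples, make_scorer(c), k))
```

For a pair of classes with 100 training samples each, `k_fold_splits(200, 5)` gives five folds of 40 test samples and 160 training samples.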
Other parameters, such as the penalty parameter C and the kernel function parameter γ, are also selected by K-CV (K = 5). The common practice for parameter selection is to search the relevant parameters within a certain range. Both parameters are searched within a range from 0 to 1000. The values that give the highest mean recognition rate are kept and used as the parameters of the trained SVM. Table 3 lists the selected parameters of each SVM and the overall mean recognition rates.
As shown in Figure 6, the FSE features of LD and LF are quite similar. Two high frequency features, $E_5$ and $E_6$, are selected for illustration because their distributions are relatively far away from each other. The intersection of the decision function $d(w, b)$ with the plane of features defines the optimal separation hyperplane, as shown in Figure 7b. By calculating the sign of the decision surface $d(w, b)$, the classification of LF and LD can be realized to some extent. As only two features are used for illustration, not all samples are effectively classified. With all 6 features, the classification results can be more effective.
Figures 8 and 9 show the binary classifications of LD vs. GF and LF vs. GF, respectively, using only the features $E_1$ and $E_2$. As shown in Figure 6, the large amplitude FSE features of GF gather more in the lower frequency range, which is different from those of transients caused by lightning strikes. As demonstrated by Figure 8b, the samples of LD and GF can be effectively classified with only $E_1$ and $E_2$. Also, most samples of LF and GF can be correctly distinguished with only the two features $E_1$ and $E_2$. With the whole FSE vector, which includes six features, GF can be effectively recognized from lightning-caused transients.
Hence, the SVM structure with appropriate kernel functions and parameters can effectively classify different kinds of transient surges.

Transient Recognition
The performance of the trained SVM is tested with test samples, and the recognition results are shown in Table 4. The GF can be discriminated with 100% recognition rates. Only four LF samples are classified to be LDs, and four LD samples are classified to be LFs. The overall recognition rate of proposed FSE-SVM based method is 97.33%, which shows great potential in protection application.
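The overall rate follows directly from the counts quoted above. The short check below assumes 100 test samples per class, as stated in the data processing step, and uses a hypothetical helper name.

```python
def recognition_rates(confusion):
    """Per-class and overall recognition rates from confusion[actual][predicted]."""
    per_class = {actual: row[actual] / sum(row.values())
                 for actual, row in confusion.items()}
    correct = sum(row[actual] for actual, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return per_class, correct / total


# Counts taken from the text: all GF correct, four LF taken as LD, four LD taken as LF.
confusion = {
    "GF": {"GF": 100, "LF": 0, "LD": 0},
    "LF": {"GF": 0, "LF": 96, "LD": 4},
    "LD": {"GF": 0, "LF": 4, "LD": 96},
}
per_class, overall = recognition_rates(confusion)
print(f"{100 * overall:.2f}%")  # 97.33%
```

That is, 292 of the 300 test samples are recognized correctly.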

Comparisons
To demonstrate the effectiveness of the proposed FSE-SVM-based method, the feature FSE and the classifier SVM are compared with existing popular methods, namely energy distribution and the artificial neural network (ANN), respectively.

Comparison of Features
Energy distribution is one of the commonly used methods for frequency-domain analysis of signals, and it has the advantages of simple calculation and intuitive expression. However, the energy distribution depends heavily on the amplitude of the transient surge in each frequency band. The energy distribution defined by Equation (13) is used to characterize the frequency spectrum of transient surges [41]:
$$E_i = \lVert c_i \rVert = \sqrt{\sum_{j} \lvert c_{i,j} \rvert^{2}}, \quad i = 1, 2, \ldots, M \qquad (13)$$
The frequency spectrum of a transient surge is generated by the Fourier transform and divided into M bands. The energy $E_i$ of the ith frequency band is the norm of $c_i$, the vector of all Fourier coefficients $c_{i,j}$ in that band, i.e., the square root of the sum of their squared magnitudes. The M energy values $E_i$ form the energy distribution vector E.
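As a minimal sketch of this feature, the following code computes a band-energy vector, assuming that the "norm" in the text is the Euclidean norm of the Fourier coefficients in each band and that the bands have equal width. The function names and the naive DFT are illustrative choices, not the authors' implementation.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform (fine for short illustrative segments)."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def energy_distribution(signal, num_bands):
    """Return [E_1, ..., E_M], where E_i is the Euclidean norm of band i."""
    half = dft(signal)[: len(signal) // 2]  # bins below the Nyquist frequency
    band_size = len(half) // num_bands
    if band_size < 1:
        raise ValueError("signal too short for the requested number of bands")
    energies = []
    for i in range(num_bands):
        start = i * band_size
        # The last band absorbs any leftover bins.
        stop = start + band_size if i < num_bands - 1 else len(half)
        energies.append(math.sqrt(sum(abs(c) ** 2 for c in half[start:stop])))
    return energies
```

Doubling the amplitude of the input doubles every $E_i$ in this sketch, which illustrates the amplitude dependence noted in the text.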
The energy distribution vector E is used as the feature, and the same SVM structure is adopted as the classifier. The training procedure of the SVM is the same as the one discussed in Section 5. Table 5 shows the recognition results. As demonstrated in Table 5, the recognition rates of the energy-based representation are lower than those of the proposed FSE-based one, and the overall recognition rate is only 92.33%. The energy-based feature can effectively discriminate LF from other surges. However, it has difficulty in distinguishing faults and disturbances caused by lightning strokes. Only 90% of LDs are correctly recognized: among the misjudgments, four LDs are classified as GFs and six LDs are regarded as LFs. In addition, 13% of GF samples are classified as LFs. This might be due to the energy attenuation and distortion of transient surges during propagation. The magnitude of the energy values varies considerably, whereas the distribution, or disorder, of the frequency spectrum changes only slightly. Compared with the energy-based feature, the entropy-based spectrum distribution is therefore more effective in representing transient surges.

Comparison of Classifiers
The back-propagation (BP) ANN, which is a multilayer feedforward network trained by the error back-propagation algorithm, is one of the most widely used neural network models [42]. Unlike a single SVM, which can only distinguish two kinds of samples, a single BP ANN with proper design can recognize multiple types. As the FSE dimension is six and the number of transient types is three, the numbers of neurons in the input and output layers of the ANN are six and three, respectively. To achieve better performance, a number of experiments are carried out to select the size of the hidden layer, and six neurons are finally chosen. The hyperbolic tangent function and the linear function are selected as the transfer functions of the hidden layer and the output layer, respectively. A training function based on the gradient descent algorithm with a dynamic adaptive learning rate is used.
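The network structure described above (six inputs, six hidden neurons with a hyperbolic tangent transfer function, three linear outputs) can be sketched as a forward pass. The random initial weights, the class order, and the function names are assumptions made for illustration. Training by gradient descent with an adaptive learning rate is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights and zero biases."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, hidden, output):
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    w1, b1 = hidden
    w2, b2 = output
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(w2, b2)]


def predict(x, hidden, output, classes=("GF", "LF", "LD")):
    """Pick the class whose output neuron is largest (class order is assumed)."""
    y = forward(x, hidden, output)
    return classes[max(range(len(y)), key=y.__getitem__)]
```

A 6-6-3 network in this sketch would be built with `init_layer(6, 6, rng)` for the hidden layer and `init_layer(6, 3, rng)` for the output layer, where `rng` is a `random.Random` instance.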
The samples used in Section 5 are also characterized by FSE and recognized by the ANN. Table 6 shows the recognition results of the FSE-ANN-based method. As shown in Table 6, the overall recognition rate of the FSE-ANN-based method is slightly lower than the 97.33% achieved by the FSE-SVM-based method. As with the SVM-based recognition results, misjudgments occur for both LD and LF when the ANN is adopted, but more samples are misjudged.
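For contrast with the single multi-output ANN, the sketch below shows one way that three binary SVMs could be combined into a three-class recognizer. It assumes majority voting over the pairwise decisions, which the text does not specify. The linear decision functions and their thresholds are hypothetical stand-ins for trained SVM decision functions d(w, b).

```python
from collections import Counter


# Hypothetical stand-ins for trained pairwise decision functions.
# x is a six-element FSE vector; a non-negative value selects the first class.
def ld_vs_lf(x):
    return x[4] - x[5]


def ld_vs_gf(x):
    return 1.0 - (x[0] + x[1])


def lf_vs_gf(x):
    return 1.0 - (x[0] + x[1])


PAIRWISE = {
    ("LD", "LF"): ld_vs_lf,
    ("LD", "GF"): ld_vs_gf,
    ("LF", "GF"): lf_vs_gf,
}


def one_vs_one_predict(x, pairwise=PAIRWISE):
    """Each binary classifier casts one vote by the sign of its decision value."""
    votes = Counter()
    for (first, second), decision in pairwise.items():
        votes[first if decision(x) >= 0 else second] += 1
    # Ties fall to the class that received a vote first (an arbitrary choice).
    return votes.most_common(1)[0][0]
```

With three classes, each class takes part in two of the three pairwise decisions, so a class that wins both of its comparisons is selected unambiguously.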

Conclusions
This paper proposed an FSE-SVM-based method to distinguish the three most commonly encountered kinds of transient surges in HVDC transmission lines. The proposed method generates effective recognition results and helps improve the reliability of protection with a relatively low sampling frequency (100 kHz) and an extremely short data segment (3 ms). Simulations and comparisons with the energy-based feature and the ANN classifier demonstrate that the FSE is stable in characterizing the frequency spectrum of transient surges, and that the "One-to-One" SVM structure is simple and effective to train and stable in performance. With training samples from precisely modeled simulation systems, the trained SVM can perform well and respond quickly in practical applications.