A Support Vector Machine Learning-Based Protection Technique for MT-HVDC Systems

Abstract: High voltage direct current (HVDC) transmission systems are suitable for power transfer to meet the increasing demands of bulk energy and encourage interconnected power systems to incorporate renewable energy sources without any fear of loss of synchronism, reliability, and efficiency. The main challenge associated with DC grid protection is the timely diagnosis of DC faults, because the fault current builds up rapidly and can cause failures of power electronic circuitry. Therefore, damage to HVDC systems is avoided by identifying, classifying, and locating DC faults within milliseconds (ms). In this research, a support vector machine (SVM)-based protection algorithm is developed so that DC faults can be identified, classified, and located in multi-terminal high voltage direct current (MT-HVDC) systems. A four-terminal HVDC system is developed in Matlab/Simulink for the analysis of DC voltages and currents. Pole to ground and pole to pole faults are applied at different locations and times. Principal component analysis (PCA) is used to extract reduced dimensional features. These features are employed for the training and testing of the SVM. It is found from simulations that DC faults are identified, classified, and located within 0.15 ms, ensuring speedy DC grid protection. The realization and practicality of the proposed machine learning algorithm are demonstrated by analyzing more straightforward computations of standard deviation and normalization.


Introduction
Intercontinental super-grid and integration of a large number of renewable energy sources to the conventional grid are the fruitful developments achieved through the promising technology of multi-terminal high voltage direct current (MT-HVDC) transmission systems [1][2][3]. Most of the renewable energy sources are located far from load centers, and therefore, the transfer of energy must be done effectively [4]. For example, an offshore super-grid will be inevitable in the future for the interconnection of offshore wind farms of Northern Europe and the United Kingdom [5]. Similarly, solar installations in Africa's deserts can only be connected to load centers of Europe and Asia via MT-HVDC grids [5].
The MT-HVDC system's technical and economic feasibility is proved by recent developments in voltage source converters (VSCs) and DC circuit breakers [6]. However, there is still a challenge in expanding point-to-point HVDC systems to MT-HVDC systems, i.e., developing reliable and quick protection systems to interrupt the abrupt rise of DC fault currents. In the literature, HVDC point-to-point links are interrupted from the AC side of the converter station if a DC fault persists [6], which results in the shutdown of the entire DC system [7]. Thus, this technique is not recommended for MT-HVDC systems because healthy links are tripped along with faulted circuits. Therefore, it is always desired to develop (i) an appropriate relaying mechanism and (ii) HVDC circuit breakers for the [...]
Moreover, the SVM-based analysis of standard deviation and normalization is added to demonstrate the realization of fault diagnosis in a time frame of 0.15 ms.
This research paper consists of the following sections: The mathematical formulation and proposed fault diagnostic algorithm based on the support vector machine is described in Section 2. MT-HVDC test system is explained in Section 3. Simulation results are discussed in Section 4. Main achievements are added in Section 5. Finally, conclusions are drawn in Section 6.

Support Vector Machines (SVM)
A support vector machine is a statistical supervised machine learning technique employed for both regression and classification. Vapnik and Cortes originally developed it in 1995 [44,45]. Although this technique differs from ANN, some authors describe SVM as a special type of ANN [46,47]. A rigorous mathematical and statistical approach is employed for the development of SVM [48][49][50][51][52][53]. Its mechanism was developed to address both binary and multi-class classification problems. In binary linear SVM, the optimal hyperplane acts as the decision boundary between two classes, based on the training dataset. There are two ways to achieve the optimal classification of training datasets.
Hard margin optimality can be employed to achieve a perfect separation between the classes of the training dataset, as shown in Figure 1. The hyperplane decision boundary is chosen to maximize the distance between the hyperplane and the nearest training data points. Soft margin optimality is employed if perfect classification is not required. In that case, the hyperplane provides an adjustable trade-off between two competing objectives: minimizing the failure (misclassification) rate and maximizing the distance between the hyperplane and the nearest correctly classified training point.
In SVM classification, the decision boundary hyperplane is determined by the support vectors. After the hyperplane is determined, a different dataset can be applied to the SVM classifier. A class label of +1 or −1 is assigned depending on the location of the data point with respect to the decision boundary. In the case of a multi-class classification problem, as shown in Figure 2, different approaches, such as pairwise and one-versus-all, are employed to convert multi-class classification to binary-class classification [48]. In addition to this, reformulations of binary classification for compact multi-class classification problems are also developed in the literature [54]. All the variables used in this research paper are listed in the Nomenclature section.
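The soft-margin trade-off described above can be illustrated with a minimal sketch. It assumes a linear decision boundary and plain full-batch subgradient descent, which is not necessarily the solver used in this work; the function names, the toy data, and the values of C, lr, and epochs are hypothetical.

```python
def train_linear_svm(X, d, C=1.0, lr=0.01, epochs=200):
    """Minimise 0.5*||w||^2 + C*sum(hinge losses) by full-batch subgradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)              # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for x_i, d_i in zip(X, d):
            margin = d_i * (sum(wj * xj for wj, xj in zip(w, x_i)) + b)
            if margin < 1.0:          # margin violated: hinge loss is active
                for j in range(n_features):
                    grad_w[j] -= C * d_i * x_i[j]
                grad_b -= C * d_i
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Assign +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable two-class training data.
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
d = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, d)
labels = [predict(w, b, x) for x in X]
```

A larger C penalizes margin violations more heavily and moves the solution toward the hard-margin case, while a smaller C favors a wider margin at the cost of more misclassified training points.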

Mathematical Formulation
The objective function of the SVM classification problem is given as:

$$\min_{w,\,b,\,\xi}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_i$$

where $d_i \in \{-1, +1\}$ represents the class corresponding to the ith input vector. C is a user-specified parameter that controls the trade-off between misclassification and the maximum inter-class margin. The minimum of the objective function, i.e., the optimal weight vector, should satisfy the following conditions:

$$d_i \left( w^{T} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n$$

where $\{(x_i, d_i)\}_{i=1}^{n}$ is the training data set, $w$ is the weight vector, $b$ is the bias, and $\xi_i$ is a slack variable. In practice, not every classification problem can be solved with a simple hyperplane as the decision boundary. Therefore, a more complex and flexible decision boundary is required. In SVM, a non-linear transformation is applied to map the input space of dimension $m_o$ to a feature space of dimension $m_f > m_o$. As a result, the probability of misclassification in the transformed feature space is reduced. Radial basis functions, higher-order polynomials, and sigmoids are common transformation functions. In the non-linear classification problem, the hyperplane decision boundary is defined in the feature space and given as:

$$w^{T} \phi(x) + b = 0$$

where $\phi(x)$ is the point in the transformed feature space, $x \in \mathbb{R}^{m_o}$ and $\phi(x) \in \mathbb{R}^{m_f}$. The weight vector $w$ can be optimized by the Lagrange multipliers method [45] and is given as:

$$w = \sum_{i=1}^{n} \alpha_i d_i \, \phi(x_i)$$

where $\alpha_i$ represents the Lagrange multipliers. The optimized decision boundary is then:

$$\sum_{i=1}^{n} \alpha_i d_i \, \phi^{T}(x_i)\, \phi(x) + b = 0$$

Now, with the assumption of $u_i = \alpha_i d_i$ and $K(x, x_i) = \phi^{T}(x_i)\, \phi(x)$, the optimized decision function $y$ is simplified as:

$$y = \sum_{i=1}^{n} u_i \, K(x, x_i) + b$$

In the scenario of linear classification, $K(x, x_i)$ represents the traditional Euclidean inner product of the input vector $x$ with the support vector $x_i$. In the scenario of non-linear classification, $K(x, x_i)$ represents the inner product of the transformed input vector $\phi(x)$ with the non-linear transformation $\phi(x_i)$ of the support vector $x_i$. The proposed flow chart of support vector machine-based classification is shown in Figure 3.
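As a small illustration of the kernel form of the decision function, the sketch below evaluates y for a new point from a set of support vectors. It assumes a radial basis function kernel with a hypothetical width parameter gamma; the support vectors, the coefficients u_i, and the bias b are made-up values, not results from this study.

```python
import math


def rbf_kernel(x, x_i, gamma=0.5):
    """K(x, x_i) = exp(-gamma * ||x - x_i||^2), one common non-linear kernel."""
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, x_i))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, u, b, kernel=rbf_kernel):
    """y(x) = sum_i u_i * K(x, x_i) + b, with u_i = alpha_i * d_i."""
    return sum(u_i * kernel(x, x_i) for u_i, x_i in zip(u, support_vectors)) + b


# Hypothetical support vectors and coefficients (u_i = alpha_i * d_i).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
u = [1.0, -1.0]
b = 0.0

y_pos = decision_function([0.9, 1.1], support_vectors, u, b)    # near the +1 support vector
y_neg = decision_function([-1.2, -0.8], support_vectors, u, b)  # near the -1 support vector
```

The sign of y gives the predicted class. Only kernel evaluations against the support vectors are needed, so the transformation ϕ never has to be computed explicitly.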

Principal Component Analysis (PCA)
Principal component analysis is a statistical technique employed to study the inherent structure of the information. This method reduces the dimensions of data based on the rotation of coordinate axes. Eigenvalues and eigenvectors are produced from eigenvalue decomposition to represent the variation in the sensed information. Uncorrelated lower-dimensional information is extracted from a set of correlated high-dimensional information. PCA minimizes the squared reconstruction error in dimensionality reduction. The sensed information is represented by a matrix $A \in \mathbb{R}^{p \times q}$ with p raw samples (rows) and q process variables (columns) and is expressed as:

$$A = [x_1, x_2, \dots, x_p]^{T}$$

where $x_i$ is the ith normalized sample column vector. Correlation is represented by the covariance matrix Cov, expressed as:

$$\mathrm{Cov} = \frac{1}{p-1} A^{T} A$$

An eigenvalue decomposition of Cov is carried out, and a sample vector $a$ can be represented by:

$$a = a_e + a_r$$

where $a_e$ and $a_r$ are the projection vectors of $a$ onto the principal component subspace and the residual subspace, respectively. The projection vector onto the residual subspace, $a_r$, aids in the identification of faults and can be described as:

$$a_r = \left( I - E E^{T} \right) a$$

where $E \in \mathbb{R}^{q \times k}$ consists of the first k columns of the eigenvector matrix, whose column vectors correspond to nonnegative real eigenvalues. The eigenvalues, ordered by magnitude, are expressed as:

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_q \ge 0$$

A parameter $S$ is defined as an indication of the identification of a fault. It determines whether the sensed data belongs to the particular fault type or not and is expressed as:

$$S = \lVert a_r \rVert^{2} = a^{T} \left( I - E E^{T} \right) a$$
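The residual-subspace idea can be sketched in a few lines. The example assumes a single retained principal component (k = 1), obtained by power iteration, and assumes that the fault indicator is the squared norm of the residual projection a_r; the data and function names are hypothetical.

```python
import math


def covariance(A):
    """Sample covariance of the columns of A (rows are samples, already centred)."""
    p, q = len(A), len(A[0])
    return [[sum(A[r][i] * A[r][j] for r in range(p)) / (p - 1)
             for j in range(q)] for i in range(q)]


def leading_eigenvector(M, iterations=100):
    """Power iteration for the eigenvector of the largest eigenvalue."""
    size = len(M)
    v = [1.0] + [0.0] * (size - 1)
    for _ in range(iterations):
        v = [sum(M[i][j] * v[j] for j in range(size)) for i in range(size)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return v


def residual_indicator(a, e):
    """Squared norm of a_r = a - e * (e^T a) for one retained component e."""
    proj = sum(ai * ei for ai, ei in zip(a, e))
    a_r = [ai - proj * ei for ai, ei in zip(a, e)]
    return sum(c * c for c in a_r)


# Hypothetical centred training data whose two variables are perfectly correlated.
A = [[-2.0, -2.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
e = leading_eigenvector(covariance(A))

s_normal = residual_indicator([1.5, 1.5], e)    # follows the training correlation
s_faulty = residual_indicator([1.5, -1.5], e)   # breaks the training correlation
```

A sample that follows the correlation structure of the training data has an indicator close to zero, whereas a sample that breaks it leaves a large residual.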

PCA-Based SVM
Each sensed dataset consists of g samples. Training of the PCA model is conducted with a random selection of h samples (h < g) from each dataset. The rest of the samples are used to evaluate the accuracy of the PCA-based features. The following are the steps behind the identification and classification of faults based on data obtained from multiple sensing terminals:

1. Sensed data obtained from different sensing terminals are employed as input.
2. Data are normalized to zero mean and unit variance.
3. Eigenvalue decomposition is carried out for the normalized data. Optimal principal components are determined by the Scree test [55], given as:

$$Scr = \begin{cases} 1, & \lambda_i > 1 \ \text{and} \ C_i > 0.5 \\ 0, & \text{otherwise} \end{cases}$$

where Scr is used to minimize the number of principal components and determine the optimal components. Each of the q process variables is associated with a communality $C_i$. In general, components with high eigenvalues are retained, and components with low eigenvalues are eliminated.
4. An indication S is created based on the optimal principal components.
5. The indication is classified by the SVM model based on $S_i$. The PCA model is trained to extract each $S_i$. The indicators $S_0$, $S_1$, $S_2$, $S_3$ have the indexes i = (0, 1, 2, 3). In the process of training, h samples of sensed data are used to prepare each indicator.
6. The proposed algorithm is terminated with the results of fault identification and classification.
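Steps 2 and 3 above can be sketched as follows. This is an illustrative fragment only: the sample standard deviation (divisor n − 1) is an assumption, and the eigenvalues and communalities passed to the Scree rule are hypothetical numbers.

```python
import math


def zscore(column):
    """Scale one process variable to zero mean and unit (sample) variance."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std for v in column]


def scree_flags(eigenvalues, communalities):
    """Scr = 1 when lambda_i > 1 and C_i > 0.5, otherwise Scr = 0."""
    return [1 if lam > 1.0 and c > 0.5 else 0
            for lam, c in zip(eigenvalues, communalities)]


# Hypothetical DC voltage samples in kV around the 100 kV rating.
normalized = zscore([98.0, 100.0, 102.0])

# Hypothetical eigenvalues and communalities for three candidate components.
flags = scree_flags([2.5, 1.2, 0.3], [0.9, 0.4, 0.8])
```

Here only the first component is retained: the second fails the communality condition and the third fails the eigenvalue condition.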

Proposed SVM-Based Protection Algorithm
[...] Figure 4. Maximum and minimum values of the standard deviation (SD) and normalization (N) features, extracted from DC current, increase with the decrease in the distance between the sensing terminal and the fault point, as presented in Figure 5.
In an NPG fault, measurements of DC voltage and current are taken at the negative pole. SD and N are the features extracted from the measurements. Maximum and minimum values of SD and N, extracted from DC voltage, decrease with the increase in the distance between the converter station and the DC fault location, as presented in Figure 6. For the features derived from DC current measurements, maximum and minimum values of SD and N increase with the increase in the distance between the VSC and the fault point, as depicted in Figure 7.
In the case of a PP fault, DC voltages and DC currents are measured at both positive and negative poles simultaneously. SD and N are the features extracted from the pole measurements. For DC voltage measurements, maximum and minimum values of SD and N increase with the increase in the distance between the relay position and the fault location for the positive pole, and decrease with the increase in the distance between the VSC and the fault point for the negative pole, as depicted in Figure 8. For DC current measurements, maximum and minimum values of SD and N decrease with the increase in the distance between the sensing terminal and the fault location for the positive pole, and increase with the increase in the distance between the relay terminal and the fault location for the negative pole, as presented in Figure 9.
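The SD and N features can be sketched as below. The exact definitions are not given in this section, so the sketch assumes that SD is a standard deviation taken over short sliding windows and that N is a zero-mean, unit-variance scaling of the samples; the window length and the current samples are hypothetical.

```python
import statistics


def sd_feature(signal, window=4):
    """Standard deviation over consecutive sliding windows (assumed definition)."""
    return [statistics.pstdev(signal[i:i + window])
            for i in range(len(signal) - window + 1)]


def n_feature(signal):
    """Zero-mean, unit-variance normalization of the samples (assumed definition)."""
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)
    return [(v - mean) / std for v in signal]


def extrema(values):
    """Return the (maximum, minimum) pair used as a compact feature."""
    return max(values), min(values)


# Hypothetical per-unit DC current samples: steady state, then a fault transient.
current_pu = [0.0, 0.0, 0.0, 0.0, 1.0, 3.0, 3.0, 3.0]

sd_max, sd_min = extrema(sd_feature(current_pu))
n_max, n_min = extrema(n_feature(current_pu))
```

The four extrema form a small feature vector per measurement, which is the kind of reduced input that the distance trends described above refer to.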

MT-HVDC Test System under Study
A single line diagram of the four-terminal HVDC system is shown in Figure 10. This test model consists of two offshore VSC stations, i.e., rectifier stations (RS-I and RS-II), and two onshore VSC stations, i.e., inverter stations (IS-I and IS-II). An average-value model of two-level VSCs is employed [56]. DC voltage droop and reactive power controls are applied to the onshore VSC stations [57,58]. At the offshore VSC stations, active power and AC voltage controls are used to ensure a constant power flow in the grid via AC voltage regulation [59]. The dq control is applied at the primary level of the VSC stations [60]. In this bipolar HVDC transmission system, positive and negative DC voltages and DC currents are recorded for analysis. The four DC links are designated as L1, L2, L3, and L4, with lengths of 300 km, 200 km, 300 km, and 200 km, respectively.

Simulations and Discussion
The four-terminal HVDC test system of Figure 10 is developed in Matlab/Simulink to analyze the proposed protection algorithm. The parameters of the test system are given in Table A1 of Appendix A.
The prime objective of this research is to identify, locate, and classify the faults in the MT-HVDC system. Therefore, three types of DC faults are studied on the four-terminal HVDC test system: positive pole to ground (PPG) fault, negative pole to ground (NPG) fault, and pole to pole (PP) fault.
DC faults are critical for MT-HVDC systems because the rapid build-up of DC fault current results in the collapse of the DC grid. Faults can occur at different locations within the MT-HVDC grid and are thus designated as the different cases shown in Table 1.

DC Voltage and DC Current Analysis
Voltage analysis is conducted on the premise that any electrical or mechanical change in the MT-HVDC system can be studied through DC voltages.
Similarly, the current is considered the primary measured quantity in the MT-HVDC system because the DC fault current rises rapidly. Thus, without DC current observations, it is impossible to select or deploy protective equipment for expensive converter stations. Therefore, DC voltages and currents are measured at all VSC stations (RS-I, RS-II, IS-I, IS-II), as shown in Figure 11.
Under the no-fault case, DC voltages are measured at all four VSC stations, i.e., RS-I, IS-I, RS-II, and IS-II. Under normal conditions, the DC voltage reaches the rated DC-link voltage, i.e., 100 kV (1 pu), in less than 0.15 ms. Thus, 1 pu is considered the steady state. It is found from the analysis that transient behavior persists in the system until the DC capacitors are charged and the DC filters take effect. The variation of DC voltages and the decay of transients at all four VSCs are shown in Figure 12. In the no-fault scenario, DC currents are also measured at all VSCs, i.e., RS-I, IS-I, RS-II, and IS-II, as shown in Figure 13. The transients decay in less than 0.15 ms, and steady state is attained. The DC currents are reduced to zero at the VSC stations. The protection scheme for no-fault state identification in the MT-HVDC system is proposed in Figure 14.
This DC voltage analysis is further supported by the DC current analysis, as shown in Figure 15b. It is observed from the simulations that the magnitude of the initial transients developed during the PPG fault at the positive pole is approximately three to five times higher than that in the no-fault condition.
Hence, the drop of DC voltage to zero at the positive pole and the high magnitudes of the transients in the DC currents are the main indications of a PPG fault. The protection scheme for the identification of a PPG fault is presented in Figure 16.
This DC voltage analysis is further backed by the current analysis, as shown in Figure 17b. It is observed from the simulations that the magnitude of the initial transients developed during the NPG fault at the negative pole is approximately three to five times higher than that in the no-fault state.
Hence, the drop of DC voltage close to zero at the negative pole and the high magnitudes of the transients in the DC currents are the main indications of an NPG fault. The protection scheme for the identification of an NPG fault is given in Figure 18.
On the other hand, the initial rise in the DC fault current indicates a fault. It is found that the magnitude of the transients during a PP fault increases to 3.0 pu, an indication of fault presence, as shown in Figure 19b.
Further, it is also observed that the offshore VSC stations follow the same characteristics, i.e., maintaining a DC current of 3 pu on both poles for 0.2 ms. In contrast, the current in the onshore VSC stations decays to zero after developing high transients, in less than 0.1 ms for IS-I and 0.025 ms for IS-II. This observation helps to determine the fault location in the case of a PP fault. A protection scheme for the identification of a PP fault is shown in Figure 20.
When a PPG fault occurs at 100 km from RS-I on line L1, a rapid decay of DC voltage at the negative pole and a rapid rise in DC current at the positive pole are observed in Figure 21a. The increase in the magnitude of the DC current at the positive pole is more noticeable than the rise in DC current at the negative pole. It is found from the simulations that the rising characteristics of the DC current under a PPG fault at 100 km from RS-I are the same as those observed for a PPG fault at RS-I, as shown in Figure 21b. Therefore, a more in-depth analysis must be carried out, or more features must be explored, to determine the exact location of the PPG fault. Similarly, a rapid decay in DC voltage at the negative pole and a rapid rise of DC fault current at the negative pole are observed in the case of an NPG fault at 100 km from RS-I on L1, as depicted in Figure 22. This observation closely resembles the NPG fault at RS-I. Therefore, additional features are required to determine the exact location of the fault.
Lastly, during a PP fault, an abrupt rise in DC current and a decay of DC voltage to 0.2 pu at both poles are noticed and are employed for identification, as shown in Figure 26. An in-depth study of the transients present in DC voltages and DC currents is used to find the location of the fault in the MT-HVDC system.
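The identification rules discussed in this subsection can be summarized as a simple rule-based sketch. The threshold values v_low and i_ratio_min are hypothetical choices made for illustration (the text reports a voltage collapse to roughly 0 to 0.2 pu and current transients about three to five times the no-fault level); they are not settings taken from the protection schemes in the figures.

```python
def classify_fault(v_pos_pu, v_neg_pu, i_peak_ratio, v_low=0.3, i_ratio_min=3.0):
    """Return 'PP', 'PPG', 'NPG', or 'no-fault' from per-unit pole measurements.

    v_pos_pu, v_neg_pu : magnitudes of the pole voltages in per unit
    i_peak_ratio       : peak transient current relative to the no-fault case
    v_low, i_ratio_min : hypothetical thresholds, not values from this paper
    """
    pos_collapsed = v_pos_pu < v_low
    neg_collapsed = v_neg_pu < v_low
    high_current = i_peak_ratio >= i_ratio_min
    if pos_collapsed and neg_collapsed and high_current:
        return "PP"        # both poles collapse together
    if pos_collapsed and high_current:
        return "PPG"       # only the positive pole collapses
    if neg_collapsed and high_current:
        return "NPG"       # only the negative pole collapses
    return "no-fault"


healthy = classify_fault(1.0, 1.0, 1.0)
ppg = classify_fault(0.0, 1.0, 4.0)
npg = classify_fault(1.0, 0.05, 3.5)
pp = classify_fault(0.2, 0.2, 3.0)
```

Rules of this kind separate the fault types but, as noted above, they do not resolve the fault location, which is why additional features are passed to the SVM.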

SVM Algorithm for Fault Location
It is found from the DC voltage and current analysis that faults in MT-HVDC systems can not only be identified but also classified by type, i.e., PPG fault, NPG fault, and PP fault. However, locating the fault requires further exploration of the features of DC voltage and current.
In this research, a support vector machine (SVM) is applied to determine the location of the fault, and the simulations show that the SVM determines the fault location precisely in the presence of multi-dimensional data. This technique is memory efficient because the SVM learns from small subsets of the training data. In this test model, out of a total of 81,001 × 17 data values, a sample of 120 × 4 data values is used for training the SVM. Clear separation is achieved in multi-dimensional data by the proposed algorithm. Different cases, listed in Table 2, are considered to demonstrate the performance of the proposed technique.
After successful training of the algorithm with 100% accuracy, testing is conducted by applying different cases as described below.

Scenario I
In scenario I, the trained SVM is tested with the data of DC voltage and current measured at RS-I to determine the location of PPG fault and no-fault cases, as shown in Figure 30.
This classification technique is further supported by an analysis of standard deviation and normalization to demonstrate the realization of fault diagnosis through the proposed technique. It is interesting to note that the fault locations can easily be determined with simpler computations of the relative maxima and minima of standard deviation and normalization. Therefore, dimensionality-reduced data samples are employed to determine the relative maxima and minima of standard deviation and normalization, as shown in Figures 4 and 5.

Scenario II
In scenario II, the trained SVM is tested with the data of DC voltage and current measured at RS-I to determine the location of NPG fault and no-fault cases, as shown in Figure 31.
Table 3 presents the relative maxima and minima of standard deviation and normalization based on DC voltage and current. These values are employed in the SVM algorithm to support the location-based classification in the test system. The procedure reduces computation by using reduced-feature data samples, and the diagnosis is carried out in less than 0.15 ms. Moreover, the results suggest that this fault location methodology can be extended to any number of VSCs in an MT-HVDC system, which supports its practical implementation. The performance evaluation of the proposed algorithm for fault identification, classification, and location is supported by the confusion matrix results, as shown in Figure 33. The trained algorithm is tested with true and predicted classes of data for different fault locations. Accuracy is demonstrated by the agreement between the true classes and the predicted classes, as shown in Figure 33a. The fault's location is determined with a 100% true positive rate, and the predicted location is classified accurately within a period of 0.15 ms, as shown in Figure 33b. The probability of predicting the positive value of fault location is 1, as shown in Figure 33c. Hence, the probability of a false prediction of the fault location is zero, which shows the excellent performance of the proposed technique for determining fault location in MT-HVDC systems.
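The confusion matrix and true positive rate used in this evaluation can be computed as in the following sketch. The class labels and the label sequences are hypothetical and serve only to show the calculation; they are not the test data of this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: k for k, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def true_positive_rates(matrix):
    """Per-class TPR: correctly predicted samples divided by all samples of that class."""
    return [row[k] / sum(row) if sum(row) else 0.0
            for k, row in enumerate(matrix)]


# Hypothetical fault location classes and test labels.
classes = ["no-fault", "RS-I", "100 km", "200 km"]
true_labels = ["no-fault", "no-fault", "RS-I", "RS-I",
               "100 km", "100 km", "200 km", "200 km"]

perfect = confusion_matrix(true_labels, list(true_labels), classes)
tpr_perfect = true_positive_rates(perfect)

# One RS-I sample mistaken for the 100 km class, for comparison.
predicted = list(true_labels)
predicted[2] = "100 km"
tpr_one_error = true_positive_rates(confusion_matrix(true_labels, predicted, classes))
```

A purely diagonal matrix corresponds to a true positive rate of 1 for every class, which is the situation reported for the trained SVM; any off-diagonal entry lowers the rate for the affected true class.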

Proposed Structure of SVM-Based Protection Algorithm
The applied structure of the proposed SVM-based protection technique is shown in Figure 34. DC voltages and currents are measured. PCA-based features are extracted from these measurements in order to reduce the amount of data and the computation time. The relative maxima and minima of the PCA-based features are evaluated and fed into the trained SVM-based model. This model-based relay identifies, classifies, and locates the fault and sends a trip signal to the corresponding breaker for fault current interruption.

Performance Comparison of Protection Techniques
Reduced computational time, accuracy, and realization with simpler calculations are the convincing features of the proposed SVM-based protection scheme. A detailed comparison of the proposed protection technique with recent protection methods from the literature is presented in Table 4.

Accuracy (traveling wave-based techniques): Accuracy is higher in the case of two-terminal data and is compromised in the case of single-terminal data. Accuracy is greatly affected by wave speed, non-synchronization, and sampling frequency.
Complexity: Implementation is easy (proposed technique); implementation is easy (second compared technique); complex digital processing techniques and synchronization (for two-terminal data) are required (traveling wave-based techniques).
Realization and Practicality: A simple analysis of relative maxima and minima of standard deviation and normalization is done to prove its realization (proposed technique); error analysis is done to prove its realization (second compared technique); traveling wave-based relays are practically developed for the Brasada-Harney transmission system and Bonneville Power Administration's field, and attenuation and dispersion are the factors that change the accuracy of finding the fault location (traveling wave-based techniques).

Additional Discussion and Achievements
As previously explained in the Introduction section, the existing strategies and techniques for DC grid protection are inaccurate, not robust, sensitive to electrical noise, slow in response, computationally complex, and dependent on expensive signal processing tools. This means that there is not yet a single mature technique for the protection of HVDC systems. Moreover, the novel aspects of the proposed approach can be highlighted because it differs considerably from the solutions found in the existing literature. Firstly, this technique manages to diagnose the DC fault within a time frame of 0.15 ms. To the best of the authors' knowledge, this is the shortest reported fault diagnosis time, as previous solutions require comparatively longer times. Secondly, this technique requires simpler computations for relay operation. The magnitudes of DC voltage and DC current and the maxima and minima of standard deviation and normalization are the parameters employed for protective relaying. Indeed, the proposed technique offers reliable and rapid primary protection for DC grids. Thirdly, the proposed scheme utilizes the relative maxima and minima of standard deviation and normalization. This information is easy to understand and is aligned with the technical language of protection systems. Hence, the decision-making process of the protective relays remains intuitive from a human perspective. Fourthly, the maxima and minima of standard deviation serve as a second step for fault identification and fault type-based classification, following the DC voltage and DC current values, but serve as the first step in the case of fault location. This means that identification and classification are cross-checked in the proposed algorithm, which supports its security and reliability. Moreover, the confusion matrix-based prediction analysis demonstrates its efficiency and enables the forecasting of fault conditions in MT-HVDC systems.
Finally, the feasibility of deploying the proposed fault diagnostic method in a real system is discussed. Computational complexity is reduced by the extraction of DC voltage and current based features. This characteristic reduces computational time, resulting in rapid training and testing of support vector machines. Based on the simulations, the proposed algorithm appears well suited for real HVDC systems. Although the proposed algorithm involves capital cost for hardware development, software development, and data transformation, it can be viewed as a long-term investment enabling reliability and the successful integration of renewable energy. In short, it becomes profitable after a certain number of years because of the secure transfer of green energy.

Conclusions
In this research, a support vector machine-based protection technique is proposed for MT-HVDC systems. Simulations are developed for a four-terminal HVDC system in Matlab/Simulink under no-fault, pole to ground fault, and pole to pole fault conditions at different locations. The proposed protection technique offers rapid fault identification, classification, and location within a time frame of 0.15 ms, enabling circuit breakers to respond accordingly. The computational complexity and training time of the SVM for fault diagnosis are reduced by PCA-based features. Therefore, the build-up of large DC fault currents that could damage the MT-HVDC system and its expensive power electronic circuitry is avoided by quick fault diagnosis. Moreover, the robustness and speed of the proposed protection algorithm enhance the safety margins of the VSCs. Thus, less insulation is required for converter stations against the rapid rise of DC fault currents under abnormal conditions. An applied demonstration of the proposed protection algorithm is carried out by the supportive analysis of the relative maxima and minima of standard deviation and normalization. Confusion matrices confirm its realization with more straightforward computations.
Acknowledgments: The authors are thankful to the Department of Electrical Engineering, University of Lahore, Lahore, Pakistan, for providing facilities to conduct this research.

Appendix A
The parameters of the test system are presented here.