A Single-Terminal Fault Location Method for HVDC Transmission Lines Based on a Hybrid Deep Network

Abstract: High voltage direct current (HVDC) transmission systems play an increasingly important role in long-distance power transmission. Accurate and timely fault location on transmission lines is essential for the safe operation of power systems. With the development of modern data acquisition and deep learning technology, deep learning methods have become feasible for engineering application in fault location. The traditional single-terminal traveling wave method is used for fault location in HVDC systems. However, it faces many challenges when a high impedance fault occurs, including dependence on a high sampling frequency and difficulty in determining the wave velocity and identifying wave heads. To resolve these problems, this work proposes a deep hybrid convolutional neural network (CNN) and long short-term memory (LSTM) network model for single-terminal fault location in an HVDC system containing mixed cable and overhead line segments. A variational mode decomposition-Teager energy operator (VMD-TEO) is used in feature engineering to improve model training. A 2D-CNN is employed as a classifier to identify fault segments, and an LSTM regressor integrates the fault segment information from the classifier to achieve precise fault location. The experimental results demonstrate that the proposed method locates faults with high accuracy when fault types, noise, sampling frequency, and different HVDC topologies are taken into consideration.


Introduction
Renewable power generation has been widely used in recent years. High voltage direct current (HVDC) transmission systems can provide high power transmission capability over long distances. This technology is practical in the long-distance transmission of a large amount of wind power from ocean to land. Using submarine cables at sea and overhead lines on land is a practical method for connecting offshore wind farm stations with shore stations [1]. Owing to its long-distance transmission and complex natural environmental factors, the probability of line failures in this transmission system with mixed lines is high. Therefore, achieving timely and accurate fault location is a prerequisite for improving the reliable operation of the power system.
Fault location techniques can be divided into several categories, such as impedance-based, traveling wave-based, and machine learning-based methods [2]. Impedance-based methods [3] determine the fault distance by using fault voltage and current measurements from one or more terminals to calculate the line impedance. These methods use a simplified equation that ignores the capacitance and conductance of the transmission line to ground, which limits the accuracy of fault location [4]. Traveling wave-based methods determine the fault distance from the reflection and refraction of the voltage and current traveling waves at the fault point and terminals, together with the traveling wave propagation speed [5,6]. In particular, the single-ended traveling wave method requires neither a global positioning system nor communication equipment, and it is widely applied to fault location in HVDC systems [7,8]. However, this method must accurately identify both the initial and the reflected traveling wave heads and determine the traveling wave velocity [9]. Classical signal singularity detection methods, including wavelet transform (WT) [10,11], Hilbert-Huang transform (HHT) [12], and variational mode decomposition-Teager energy operator (VMD-TEO) [13,14], have been adopted to identify traveling wave heads. Traveling wave heads can be identified during a low impedance fault (LIF). However, the traveling wave becomes extremely weak during a high impedance fault (HIF), and identifying the wave head is difficult, which causes large fault location errors [15]. In addition, a fixed aerial-mode traveling wave velocity is used for calculation in the traveling wave methods [10-14], but the actual velocity varies with frequency and is difficult to determine [16,17].
With the rapid development of artificial intelligence, machine learning has begun to be applied to various tasks in the electrical engineering field, such as load forecasting [18], optimal scheduling [19], fault diagnosis [20], etc. Machine learning-based methods are considered to be a tool for performing soft computing in fault location [21]. Shallow neural networks were used to distinguish fault locations considering different fault types [22]. In References [7,23], hybrid shallow machine learning and traveling wave methods were used for fault location. Support vector machine was first applied to distinguish the fault sections whether at the cable section or the overhead line section and whether at the front 1/2 or the back 1/2 of the line length of this section. Then fault distances were calculated. However, the methods in References [7,23] still need to obtain accurate traveling wave velocity and understand the complex laws of traveling wave propagation on transmission lines, especially considering the discontinuous wave impedance. Deep learning methods have become an effective means to complete fault location tasks without a lot of expert knowledge. The 1D-convolutional neural network (CNN) model [15] for double-terminal fault location overcomes difficulties in determining traveling wave velocity and identifying the wave head during HIF. However, the double-ended measuring devices increase the cost in actual engineering. In Reference [15], CNN's regression mechanism performed fault location. Its essence is similar to completing a task of time series forecasting. Recurrent networks, such as recurrent neural network (RNN) [24], long short-term memory (LSTM) [25], and gated recurrent unit (GRU) [26] are better than CNN in processing time series forecasting. The advantage of LSTM and GRU over RNN is that they can learn long-range-dependency time series. 
GRU is a simplified model of the LSTM structure, and its computational efficiency is higher than that of LSTM [26]. The bidirectional gated recurrent unit (Bi-GRU) can learn the characteristics of time series in both the forward and reverse directions [27]. These recurrent networks can be used as regressors for fault location. In Reference [21], Bi-GRU was used to identify faulty lines and locate faults in a distribution network. However, because HIF makes traveling wave characteristics hard to identify, that approach may not handle scenarios with a large transition resistance.
The laws of refraction and reflection of traveling waves on actual HVDC transmission lines vary across fault segments. This research is therefore divided into two tasks: fault segment identification, which is a classification task, and precise fault location, which is a regression task. After a fault occurs on the DC side line of the HVDC system, the voltage and current measured at the terminals change. To fully integrate the fault information in the voltage and current signals, these signals can be converted into 2D tensors similar to single-channel grayscale images. The advantages of 2D-CNN in image classification tasks have been verified in References [28-30]. Therefore, the fault segment identification task can be completed by 2D-CNN. After the output of the 2D-CNN passes through the flatten layer, a time series containing rich feature information is obtained. This time series is used for precise fault location by a regressor such as 1D-CNN, LSTM, GRU, or Bi-GRU. Although CNN inherently has feature extraction ability, feature engineering can improve the learning effect of the model when the sample size is limited. In Reference [15], empirical mode decomposition (EMD) was used for feature engineering, but the EMD decomposition process may suffer from mode aliasing. In addition, the method proposed in Reference [15] is still strongly dependent on the sample size and may occupy a large amount of memory. Therefore, feature engineering and deep learning models need to be further studied for fault location.
The above analysis shows that the regressor needs to integrate the information of the classification results to achieve accurate fault location. Therefore, a hybrid model of 2D-CNN and LSTM (CNN-LSTM) is adopted in this study. The main contributions of this work are as follows: (1) CNN-LSTM is used to overcome several shortcomings of the single-ended traveling wave method, including the need for a high sampling frequency and the difficulty of determining the wave velocity and identifying wave heads when an HIF occurs. It provides high precision and strong robustness to fault types, noise, sampling frequency, and different HVDC topologies in fault location. The rest of the paper is organized as follows: Section 2 introduces the principle of the single-ended traveling wave method and points out the difficulties of its application; it also introduces feature engineering by VMD-TEO and fault section identification by 2D-CNN. Section 3 describes a fault location approach and presents a hybrid model comprising CNN and LSTM. Section 4 discusses the simulation parameters, results, and analysis. Section 5 provides the conclusions.

Feature Engineering and Acquisition of Samples
This study mainly investigated the fault location of hybrid transmission lines in a bipolar HVDC system, and its structure is shown in Figure 1. The transmission line on the DC side contains a mix of cables and overhead lines. Their connection point is denoted as J. The DC side close to the power supply and power receiving terminals are denoted as M and N, respectively. To identify the fault sections, 2D-CNN was considered to classify the characteristic quantities of the fault voltage and current signals at terminal M. Feature engineering is extremely important for identifying fault sections. VMD-TEO was mainly used for this task in this work. A large number of fault signals could be obtained as input samples for CNN considering the combination scenarios of different fault locations, fault types, and transition resistance. The process of obtaining one of these samples is described below.
Bipolar lines are mutually coupled, and their parameters are frequency dependent. Phase-mode analysis is therefore necessary to simplify the calculation before feature extraction from the fault signals. The phase-mode transformation is expressed as follows [31]:

$$\begin{bmatrix} i_{m0} \\ i_{m1} \end{bmatrix} = \frac{\sqrt{2}}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} i_p \\ i_n \end{bmatrix} \quad (1)$$

where $i_{m0}$ and $i_{m1}$ are the current or voltage of the ground-mode and aerial-mode traveling waves after decoupling, respectively, and $i_p$ and $i_n$ are the fault current or voltage of the positive and negative poles, respectively. The principle of VMD-TEO singularity detection is described in detail in References [13,14], whose authors compared this method with WT, HHT, and EMD to demonstrate its detection performance. Feature engineering mainly consists of decomposing the fault voltage and current signals at terminal M into several intrinsic mode function (IMF) [32,33] components by VMD and then applying TEO to the first IMF component (IMF1) to obtain Teager energy values (TEVs).
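The decoupling and energy steps described above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the normalizing factor √2/2 in `phase_mode_transform` is an assumption (a common choice for bipolar lines), the VMD stage is omitted so the Teager energy operator is applied directly to the aerial-mode signal rather than to IMF1, and the function names and toy pole currents are hypothetical.

```python
import numpy as np

def phase_mode_transform(pos, neg):
    """Decouple pole quantities into ground-mode (m0) and aerial-mode (m1)."""
    pos = np.asarray(pos, dtype=float)
    neg = np.asarray(neg, dtype=float)
    k = np.sqrt(2.0) / 2.0  # assumed normalizing factor
    return k * (pos + neg), k * (pos - neg)

def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.zeros_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi  # the two end points are left at zero

# Toy pole currents (hypothetical): a step on the positive pole only.
i_p = np.array([1.0, 1.0, 1.0, 2.0, 2.0, 2.0])
i_n = np.array([-1.0, -1.0, -1.0, -1.0, -1.0, -1.0])

i_m0, i_m1 = phase_mode_transform(i_p, i_n)
tev = teager_energy(i_m1)  # nonzero only around the step
```

The Teager energy is zero on the flat parts of the signal and nonzero only at the samples adjacent to the step, which is the property that makes it useful for marking wave heads.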
To improve the classification effect of 2D-CNN, the TEVs of voltage and current are preprocessed using min-max normalization so that they lie in the interval [0, 1]:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (2)$$

where $x^{*}$ is the normalized value, and $x_{\max}$ and $x_{\min}$ are the maximum and minimum values in the sample data, respectively. The specific HVDC simulation model and related parameter settings are described in detail in Section 4. The sampling frequency was selected to be 100 kHz to improve the accuracy of fault location using the traveling wave method. The time window is 40.96 ms, i.e., each time window contains 4096 sampling points. Considering the influence of the measurement noise at terminal M, 1% reference signal noise is added to the fault voltage and current of the two poles. Phase-mode transformation is conducted first, and the TEVs of the voltage and current are then calculated and normalized. In this way, a matrix of size 2 × 4096 is obtained, which serves as a 2D-tensor input sample. Considering different fault distances, transition resistances, and fault types, many input samples can be obtained.
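As a minimal sketch of this preprocessing, the snippet below normalizes two TEV sequences and stacks them into one 2 × 4096 sample. The random arrays are placeholders standing in for real TEVs, and the function and constant names are hypothetical.

```python
import numpy as np

def min_max_normalize(x):
    """Scale a 1D signal into [0, 1] using its own minimum and maximum."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:  # guard against a constant signal
        return np.zeros_like(x)
    return (x - x_min) / (x_max - x_min)

SAMPLING_HZ = 100_000   # 100 kHz sampling frequency
WINDOW_S = 40.96e-3     # 40.96 ms time window
n_points = round(SAMPLING_HZ * WINDOW_S)

# Placeholder TEVs; real values would come from VMD-TEO of the measured signals.
rng = np.random.default_rng(0)
tev_voltage = rng.random(n_points) * 5.0
tev_current = rng.random(n_points) * 0.2

# Row 0: normalized voltage TEVs, row 1: normalized current TEVs.
sample = np.stack([min_max_normalize(tev_voltage),
                   min_max_normalize(tev_current)])
```

Normalizing each row separately keeps the voltage and current channels on the same scale even though their raw amplitudes differ.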
It is assumed that a negative ground (NG) fault occurs at a distance of 350 km from the M terminal, with a transition resistance of 0.03 Ω, a fault inception time of 4 s, and a duration of 0.02 s. The aerial-mode components of the fault voltage and fault current after noise is added, together with the TEVs before normalization, are shown in Figure 2. VMD has a good filtering effect because the fault signals in Figure 2c,d are much smoother than those in Figure 2a,b. The TEVs calculated from IMF1 are shown in Figure 2e,f. They contain far more high-frequency features than the signals in Figure 2c,d. Therefore, more useful information about the fault signals can be learned after VMD-TEO feature engineering. Besides the amplitude, the TEVs of voltage and current in Figure 2e,f differ slightly in the distribution of their extreme values. The TEVs of voltage and current are therefore combined into a 2D-tensor so that their information is complementary. Classifying fault segments based on this characteristic information is the task of the 2D-CNN. The feature engineering process described above is shown in Figure 3.

Fault Segment Identification Based on 2D-CNN
A deep 2D-CNN model was built, as shown in Figure 4. This model consists of 19 layers, including an input layer, six convolutional layers ($C_1$-$C_6$), six pooling layers ($S_1$-$S_6$), a flatten layer ($R_1$), two dense layers ($F_1$, $F_2$), two dropout layers ($D_1$, $D_2$), and a softmax layer ($F_3$). The parameter configuration of each layer in this model is listed in Table 1. The convolution layer operation can be described as:

$$z_j^{l} = \sum_{i \in M_j} x_i^{l-1} * w_{ij}^{l} + b_j^{l}, \qquad x_j^{l} = f\left(z_j^{l}\right) \quad (3)$$

where * is the convolution operation, l is the l-th layer in the network, i is the i-th weight in the kernel, j is the j-th kernel, M denotes the local receptive field, w is the weight of the synaptic connections, b is the bias coefficient, z is the result of the convolution operation, $x_i$ is the input sample or the feature sample of the previous layer, $x_j$ is the output feature sample, and f is the activation function to be selected. The main activation function of the proposed model is the exponential linear unit (ELU). Compared with the rectified linear unit function, ELU has a negative part. The linear part of ELU can alleviate the vanishing gradient problem, and the negative part makes it robust to input changes or noise [34].
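To make the convolution-plus-activation step concrete, the following sketch applies a single kernel to a small 2D input and passes the result through ELU. The kernel size, weights, and bias are hypothetical (the actual layer settings are those of Table 1), and α = 1.0 is an assumed ELU parameter. As in most CNN frameworks, the "convolution" is implemented as a sliding dot product without flipping the kernel.

```python
import numpy as np

def conv2d_valid(x, w, b=0.0):
    """Single-kernel 'valid' 2D convolution (sliding dot product) plus bias."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    z = np.empty((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            z[r, c] = np.sum(x[r:r + kh, c:c + kw] * w) + b
    return z

def elu(z, alpha=1.0):
    """ELU: identity for z > 0, alpha * (exp(z) - 1) otherwise."""
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))

# Hypothetical 2 x 4 input (two signal rows, four time samples).
x = np.arange(8.0).reshape(2, 4)
w = np.array([[1.0, -1.0]])   # 1 x 2 difference kernel (illustrative)
z = conv2d_valid(x, w, b=0.5)  # pre-activation feature map
feature = elu(z)               # output feature map
```

Because every entry of `z` is negative here, the example exercises the negative branch of ELU, which saturates smoothly toward -α instead of clipping to zero as ReLU would.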
The pooling layer can compress and filter the feature samples output by the convolution layer, thereby reducing redundant information and network parameters and improving the performance and robustness of the network. The pooling layer operation can be described as:

$$x_j^{l} = f\left(\lambda_j^{l}\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right) \quad (4)$$

where down(·) is the subsampling method to be selected, and λ is the weight of the pooling layer.
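A small sketch of the pooling operation follows. Max pooling along the time axis is used as the subsampling method down(·); this choice, the window size, and the function names are assumptions made for illustration, since the subsampling method is left open in the text.

```python
import numpy as np

def max_pool_time(x, k=2):
    """down(.): non-overlapping max pooling of width k along the last axis."""
    x = np.asarray(x, dtype=float)
    rows, cols = x.shape
    cols_kept = (cols // k) * k  # drop a trailing remainder, if any
    return x[:, :cols_kept].reshape(rows, cols // k, k).max(axis=2)

def pooling_layer(x, lam=1.0, bias=0.0, k=2, activation=None):
    """Weighted subsampling: activation(lam * down(x) + bias)."""
    y = lam * max_pool_time(x, k) + bias
    return y if activation is None else activation(y)

feature_map = np.array([[1.0, 3.0, 2.0, 0.0, 5.0],
                        [4.0, 4.0, 7.0, 1.0, 9.0]])
pooled = pooling_layer(feature_map)  # halves the time dimension
```

With the default weight of 1 and zero bias, the layer reduces to plain max pooling, which is how pooling layers are typically configured in practice.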
The fully connected layer can integrate the features of all neurons in the previous layer. The first such layer, the flatten layer ($R_1$) in Figure 4, has a tiling function that flattens the neurons of multiple multidimensional feature samples into 1D vectors. The operation of the fully connected layer can be described as:

$$x^{l} = f\left(\zeta^{l} x^{l-1} + b^{l}\right) \quad (5)$$

where ζ is the weight of the fully connected layer, and f is the ELU activation function.
To prevent overfitting during the learning process, regularization is needed to reduce the generalization error of the model. Batch normalization can prevent vanishing and exploding gradients. The dropout layer randomly deactivates neurons with a certain probability, thereby reducing the number of active neurons, improving the model learning speed, and effectively reducing overfitting. Here the dropout rates of layers $D_1$ and $D_2$ are set to 0.3 and 0.2, respectively.

Table 1. Architecture of the proposed 2D-CNN model.
[Table body not recoverable; surviving column headers: Activation Function, Outputs; first row: Input layer.]

Single-Ended Traveling Wave Method for Fault Location
The laws of refraction and reflection of traveling waves on actual HVDC transmission lines vary in different fault segments [7]. The reflected wave head identified on the rectifier side may come from the fault point, the connection point of the hybrid line, or even the bus on the inverter side [23]. Different calculation formulas are adopted in the single-ended traveling wave method when faults occur in different sections [7,23]. When the fault is located in the first or second half of the cable, the fault traveling wave propagation is depicted in Figure 5. The lengths of the overhead line and the cable are $L_1$ and $L_2$, respectively. As reported in Reference [16], the wave velocity is a frequency-dependent variable. The traveling wave velocities on the overhead line and the cable are $v_1(\omega)$ and $v_2(\omega)$, respectively. Here ω is the frequency when the traveling wave component reaches the rectifier side. ∆t is the time difference between the second traveling wave head and the first traveling wave head identified at the M terminal. The distance from the fault point to the M terminal is x, which is the fault location result. As shown in Figure 5, $L_1 < x < L_1 + L_2/2$ when the fault occurs at $F_1$, and $L_1 + L_2/2 < x < L_1 + L_2$ when the fault occurs at $F_2$. When the fault point is at $F_1$, the path of the traveling wave is $F_1$-J-$F_1$-J-M, i.e., the second traveling wave head recognized by the M terminal is the reflected wave from the fault point. The fault distance can be calculated using the following formula:

$$x = L_1 + \frac{v_2(\omega)\,\Delta t}{2} \quad (6)$$

When the fault point is at $F_2$, the path of the traveling wave is $F_2$-N-$F_2$-J-M, i.e., the second traveling wave head recognized by the M terminal is the reflected wave from the opposite bus N. The fault distance can be calculated by the following formula:

$$x = L_1 + L_2 - \frac{v_2(\omega)\,\Delta t}{2} \quad (7)$$

There is a complicated nonlinear relationship among the fault distance, the arrival time of the traveling wave at the rectifier side, and the traveling wave velocity.
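The two cases can be sketched as a small helper. The formulas in the comments follow from the stated propagation paths under the simplifying assumption of a fixed cable velocity: in both cases the extra path of the second wave head lies entirely within the cable, so ∆t corresponds to a round trip at the cable velocity. The function name, argument names, and numerical values below are hypothetical.

```python
def fault_distance_cable(delta_t, v_cable, len_overhead, len_cable, second_head):
    """Single-ended distance estimate for a fault on the cable section.

    delta_t      : time between first and second wave heads at terminal M (s)
    v_cable      : assumed fixed wave velocity on the cable (km/s)
    len_overhead : overhead line length L1 (km)
    len_cable    : cable length L2 (km)
    second_head  : 'fault_point' (fault at F1) or 'opposite_bus' (fault at F2)
    """
    half_trip = v_cable * delta_t / 2.0
    if second_head == "fault_point":
        # Extra path F1-J-F1: delta_t = 2 * (x - L1) / v_cable
        return len_overhead + half_trip
    if second_head == "opposite_bus":
        # Extra path F2-N-F2: delta_t = 2 * (L1 + L2 - x) / v_cable
        return len_overhead + len_cable - half_trip
    raise ValueError("second_head must be 'fault_point' or 'opposite_bus'")

# Hypothetical line: 100 km overhead line, 100 km cable, 200,000 km/s.
x_f1 = fault_distance_cable(4e-4, 200_000.0, 100.0, 100.0, "fault_point")
x_f2 = fault_distance_cable(4e-4, 200_000.0, 100.0, 100.0, "opposite_bus")
```

The same ∆t yields two different distances depending on where the second wave head originated, which is why the fault segment must be identified before the distance is computed.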

Fault Location Based on CNN-LSTM
The information on the arrival time of the traveling wave at the rectifier side and on the wave velocity can be learned by combining the fault signal after feature engineering with the corresponding fault distance. The relationship between these feature quantities and the fault distance is nonlinear. These feature quantities are correlated in time and can therefore be regarded as time series, so LSTM can be used for the regression prediction of fault distances. The complete fault location process of CNN-LSTM is shown in Figure 6. The fault voltage and current are obtained at the M terminal. High-frequency components (TEVs) of the fault traveling wave are extracted by phase-mode transformation and VMD-TEO. The TEVs are then assembled into a 2D-tensor as the input of the 2D-CNN. Finally, the fault location is performed by the trained LSTM model.

Theoretical Background of LSTM
An LSTM network is an improved recurrent neural network. Its gating mechanism, including the forget gate, mitigates the vanishing gradient problem during training, enabling LSTM to learn both long-term and short-term dependencies in time series [18,30]. Its basic unit, shown in Figure 7, contains forget, input, and output gates. In the forget gate, the input $x_t$, the state memory unit $S_{t-1}$, and the intermediate output $h_{t-1}$ together determine the forgotten part of the state memory unit. In the input gate, $x_t$ passes through the sigmoid and tanh functions, which jointly determine the vector retained in the state memory unit. The intermediate output $h_t$ is jointly determined by the updated $S_t$ and $o_t$. The calculation formulas of these processes are expressed as follows:

$$\begin{aligned}
f_t &= \sigma\left(W_{fx} x_t + W_{fh} h_{t-1} + b_f\right) \\
i_t &= \sigma\left(W_{ix} x_t + W_{ih} h_{t-1} + b_i\right) \\
g_t &= \phi\left(W_{gx} x_t + W_{gh} h_{t-1} + b_g\right) \\
o_t &= \sigma\left(W_{ox} x_t + W_{oh} h_{t-1} + b_o\right) \\
S_t &= g_t \odot i_t + S_{t-1} \odot f_t \\
h_t &= \phi\left(S_t\right) \odot o_t
\end{aligned}$$

where $f_t$, $i_t$, $g_t$, $o_t$, $h_t$, and $S_t$ are the states of the forget gate, input gate, input node, output gate, intermediate output, and state unit, respectively; $W_{fx}$, $W_{fh}$, $W_{ix}$, $W_{ih}$, $W_{gx}$, $W_{gh}$, $W_{ox}$, and $W_{oh}$ are the weight matrices of the corresponding gates multiplied by the input $x_t$ and the intermediate output $h_{t-1}$; and $b_f$, $b_i$, $b_g$, and $b_o$ are the bias terms of the corresponding gates. ⊙ denotes element-wise multiplication of vectors, σ is the sigmoid function, and ϕ is the tanh function.
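One time step of this unit can be written directly in NumPy, which may help when checking the roles of the individual gates. The sketch below follows the gate definitions given above; the dictionary layout for the weights, the layer sizes, and the random initialization are illustrative assumptions, not the trained model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, s_prev, W, b):
    """One LSTM step; W holds the eight weight matrices, b the four biases."""
    f_t = sigmoid(W["fx"] @ x_t + W["fh"] @ h_prev + b["f"])  # forget gate
    i_t = sigmoid(W["ix"] @ x_t + W["ih"] @ h_prev + b["i"])  # input gate
    g_t = np.tanh(W["gx"] @ x_t + W["gh"] @ h_prev + b["g"])  # input node
    o_t = sigmoid(W["ox"] @ x_t + W["oh"] @ h_prev + b["o"])  # output gate
    s_t = g_t * i_t + s_prev * f_t   # updated state memory unit
    h_t = np.tanh(s_t) * o_t         # intermediate output
    return h_t, s_t

# Hypothetical sizes and random weights, for illustration only.
n_in, n_hidden = 3, 4
rng = np.random.default_rng(0)
W = {g + "x": rng.normal(size=(n_hidden, n_in)) for g in "figo"}
W.update({g + "h": rng.normal(size=(n_hidden, n_hidden)) for g in "figo"})
b = {g: np.zeros(n_hidden) for g in "figo"}

h_t = np.zeros(n_hidden)
s_t = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_in)):  # a five-step input sequence
    h_t, s_t = lstm_step(x_t, h_t, s_t, W, b)
```

Since the output is the product of a tanh and a sigmoid, every component of `h_t` stays strictly inside (-1, 1) regardless of the input scale.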

CNN-LSTM Hybrid Model for Fault Location
A CNN-LSTM hybrid model, shown in Figure 8, is built on the previous 2D-CNN structure. The 2D-CNN in the left half acts as a classifier that identifies fault segments, and the LSTM in the right half acts as a regressor that performs precise fault location. The fault segment with the highest probability output by the softmax function is selected as the current fault segment. The feature information output by the flatten layer ($R_1$) of the 2D-CNN and the probability information of the fault segment are processed by the regressor. The experimental results demonstrate that six LSTM layers should be used in the CNN-LSTM network, with 64 neurons in each LSTM layer. The dense layer ($F_4$) uses a single neuron. The loss function is the mean squared error, and the optimizer is Adam. The proposed hybrid network model integrates information from different fault sections and the corresponding fault distances, and continuously optimizes and updates the network parameters to bring the model close to the ideal. With the development of modern data acquisition and deep learning technology, the proposed method is feasible for engineering applications. A high-precision transient recorder with adjustable sampling frequency is installed at the M terminal to obtain a large amount of fault sample data. The proposed method can be accelerated by parallelizing it on graphics processing units or tensor processing units. Once trained, the model can be used repeatedly for fault location, with a computation time on the order of milliseconds. Circuit breakers are a key element for interrupting and clearing faults in HVDC systems [35]. The fault location speed of the proposed method can meet the operating time requirements of an actual circuit breaker. Feature engineering is performed in MATLAB 2019a. Model training and testing are carried out in Python 3.7 with the Keras framework.
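The hand-off from classifier to regressor can be illustrated as follows. How the flatten-layer features and the segment probabilities are merged is not specified in the text, so the simple concatenation used here is an assumption, as are the function names and the toy values (four segments instead of the full set, three features instead of the real flatten output).

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D vector of class scores."""
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def regressor_input(flatten_features, segment_logits):
    """Pick the most probable segment and build the regressor input vector."""
    probs = softmax(segment_logits)
    segment = int(np.argmax(probs))  # index of the selected fault segment
    # Assumed merge strategy: append the probabilities to the features.
    combined = np.concatenate(
        [np.asarray(flatten_features, dtype=float), probs])
    return segment, combined

features = [0.2, 0.7, 0.1]        # toy flatten-layer output
logits = [0.0, 2.0, 1.0, -1.0]    # toy classifier scores for 4 segments
segment, combined = regressor_input(features, logits)
```

Passing the full probability vector rather than only the winning index lets the regressor see how confident the classifier was, which may matter near segment boundaries.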

Simulation Model and Related Parameters
The voltage source converter (VSC)-HVDC transmission system with mixed lines in Figure 1 is constructed in the power system simulation software PSCAD/EMTDC. The model proposed in Reference [14] is used in this study, and its structure and related parameter settings are unchanged. A bipolar DC transmission line and a power-current double closed-loop proportional-integral (PI) control are adopted. The overall VSC control system structure is shown in Figure 9a. It comprises the inner loop current controller, outer loop power regulator, phase lock synchronization, trigger pulse generation, and other components. The inner loop current controller directly controls the waveform and phase of the AC side current of the converter so that it rapidly tracks the reference current.

Experimental Result of Traveling Wave Method
The fault resistance value used to distinguish HIF from LIF is related to the specific HVDC system parameters. In this research, the distinction is mainly based on the change of the fault current. Assume that a positive ground (PG) fault occurs at a distance of 50, 150, 300, or 400 km from the M terminal. The aerial-mode component of the fault current for varying transition resistance $R_f$ is shown in Figure 11. When $R_f$ is less than 100 Ω, the fault current amplitude is larger and its variation is more obvious than when $R_f$ is between 800 and 1200 Ω. This analysis indicates that [0, 100 Ω] for LIF and [800 Ω, 1200 Ω] for HIF are reasonable fault resistance ranges. In Reference [7], the hybrid line was divided into four fault segments, and SVM was used to identify these sections. In this work, 2D-CNN is used for this purpose. A large number of CNN input samples can be obtained through different fault scenarios. The sampling frequency and time window are 100 kHz and 40.96 ms, respectively, so each time window contains 4096 sampling points. The three fault types are PG, NG, and a short circuit between the positive and negative poles (PN). The maximum transition resistance is 100 Ω when LIF is considered. The performance of the proposed method is evaluated for various fault scenarios under different system conditions. The simulations are conducted with the following values or types:
(1) The change step size of the fault distance is taken as 1 km.
(2) Transition resistance $R_f$ = 0.0001%, 1%, 2%, 3%, ..., 100% of the maximum transition resistance.
When LIF occurs, the test results are represented by a normalized confusion matrix, as shown in Figure 13a. The four fault sections are recognized very well, with an accuracy over 99.5%. When HIF is considered, the fault resistance ranges over [800 Ω, 1200 Ω] with a step of 4 Ω. The other fault scenario settings are the same as above, and a total of 151,197 different cases can also be obtained. The normalized confusion matrix of the test results in this case is shown in Figure 13b. The four fault sections are recognized extremely poorly, with reported recognition rates of only 46.1% and 57.3%. The four-section classification method can therefore be used only in the case of LIF and is not applicable to HIF.
Assume that a PN LIF occurs at a distance of 450 km from the M terminal, with a transition resistance of 10 Ω, a fault inception time of 4 s, and a duration of 0.02 s, and that the fault section is correctly identified. The fault current signal at the M terminal and the result of the singularity detection using VMD-TEO are shown in Figure 14. The IMF1 obtained by VMD reflects the essential characteristics of the fault signal. After the TEV of IMF1 is calculated, evident extreme points are obtained. The positions of these extreme points reflect the positions of the singular points of the signal. In accordance with the previous analysis, the first two extreme points are the initial traveling wave head and the wave head reflected from the opposite bus, respectively. The fault distance can be calculated using Formula (7), where $t_{d1}$ = 4.00175 s and $t_{d2}$ = 4.00233 s.
According to the theoretical analysis [7,14], it is assumed that the parameters of the aerial-mode component do not change much, and the influence of frequency on the traveling wave velocity is ignored. Since the HVDC model and parameters in this study are exactly the same as those described in Reference [14], the fixed aerial-mode traveling wave velocities in Reference [14] are used for calculation, where $v_1$ = 293,997.1102 km/s and $v_2$ = 196,333.3333 km/s. The calculated fault distance is 443.063 km, the absolute error is 6.937 km, and the relative error is 1.54%. This error does not meet actual engineering requirements. A possible reason for the error is that fixed traveling wave velocities are adopted while their frequency variation characteristics are ignored.
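The reported numbers can be checked with a few lines of arithmetic. The sketch assumes the bus-reflection case, in which the fault distance equals the total line length minus half the distance traveled in the cable during ∆t, and it assumes a total line length of 500 km; that length is not stated in this section and is inferred here because it reproduces the reported 443.063 km.

```python
V_CABLE_KM_S = 196_333.3333   # fixed aerial-mode velocity on the cable (v2)
T_D1_S = 4.00175              # arrival time of the first wave head
T_D2_S = 4.00233              # arrival time of the second wave head
TOTAL_LENGTH_KM = 500.0       # assumed L1 + L2 (inferred, not stated here)
ACTUAL_KM = 450.0             # simulated fault distance from terminal M

delta_t = T_D2_S - T_D1_S
estimated_km = TOTAL_LENGTH_KM - V_CABLE_KM_S * delta_t / 2.0
abs_error_km = ACTUAL_KM - estimated_km
rel_error_pct = 100.0 * abs_error_km / ACTUAL_KM

print(f"{estimated_km:.3f} km, {abs_error_km:.3f} km, {rel_error_pct:.2f}%")
```

Under these assumptions the estimate, absolute error, and relative error agree with the values quoted in the text, with the relative error taken with respect to the actual 450 km fault distance.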

Experimental Result of CNN-LSTM
The previous fault location analysis of the traveling wave method has several shortcomings. The connection points of the fault sections are not considered, and the identification of HIF is extremely poor. To solve these problems, many overlapping fault sections are considered.
The length of the mixed line and the overlap of the fault sections are considered as the length of the sample (L s ) and its offset (∆s), respectively. When ∆s is large, the number of categories (N) of the fault sections is small. When N is small, the recognition rate for HIF is low. When N is large, the data amount and learning effect of CNN-LSTM are affected. Therefore, the selection of appropriate ∆s is very important for accurate fault location.
The results for different values of ∆s are shown in Figure 15. The width of the ribbon in Figure 15 reflects the accuracy range of each fault segment identification, where the edges of the ribbon reflect the maximum and minimum accuracy, and the solid line reflects the average accuracy. When ∆s = 25, i.e., N = 19, the classification of HIF and LIF is best, with accuracies of 99.98% and 99.99%, respectively.
The results for different numbers of 2D-CNN layers are shown in Figure 16, which is similar to Figure 15. As the number of 2D-CNN layers changes, the accuracy range of the classification over the 19 fault sections is given by its maximum, minimum, and average values. Properly increasing the number of 2D-CNN layers can improve the classification ability of the model. However, the accuracy decreases when the number of 2D-CNN layers exceeds six, indicating that the model is overfitting. When the number of 2D-CNN layers is six, the accuracy of the classifier over the 19 fault sections is close to 100%, and the variation range is the smallest. Therefore, the six-layer 2D-CNN model is a reasonable classifier.
The results for different numbers of LSTM layers are shown in Figure 17, which is similar to Figures 15 and 16. As the number of LSTM layers changes, the accuracy range of the regression prediction in each fault section is given by its maximum, minimum, and average values. The accuracy calculation uses a fault distance tolerance of ±0.5%, which meets actual engineering requirements. The predicted values of multiple samples for each fault distance are averaged, and the prediction is recorded as accurate when the average is within the fault distance tolerance. When the number of LSTM layers is more or less than six, the accuracy of the regressor is not as good as that of the six-layer LSTM. The predicted fault distances are highly accurate in each fault section.
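The tolerance-based accuracy described above can be sketched as follows. The text does not say what the ±0.5% is relative to; the sketch assumes it is relative to the actual fault distance. The function name and the prediction values are hypothetical.

```python
import numpy as np

def tolerance_accuracy(predictions_by_distance, tolerance=0.005):
    """Percentage of fault distances whose averaged prediction is in tolerance.

    predictions_by_distance maps an actual distance (km) to the predicted
    distances (km) of all test samples at that distance.
    """
    hits = 0
    for actual_km, preds in predictions_by_distance.items():
        mean_pred = float(np.mean(preds))  # average over the samples
        # Assumed tolerance: +/- 0.5% of the actual fault distance.
        if abs(mean_pred - actual_km) <= tolerance * actual_km:
            hits += 1
    return 100.0 * hits / len(predictions_by_distance)

# Hypothetical predictions for three fault distances.
predictions = {
    100.0: [99.8, 100.3, 100.1],  # mean within 0.5 km of 100 km
    200.0: [201.5, 201.9],        # mean 1.7 km off, outside 1.0 km
    300.0: [299.0, 300.5],        # mean within 1.5 km of 300 km
}
accuracy_pct = tolerance_accuracy(predictions)
```

Averaging before thresholding means that a single outlying sample does not by itself mark a fault distance as inaccurate.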

Influence of Sampling Frequency
The traveling wave method requires the measurement device to use a high sampling frequency, which is disadvantageous for practical engineering applications. To illustrate the effect of sampling frequency on the experimental results of CNN-LSTM, sampling frequencies of 100, 50, 20, 10, and 2 kHz are considered. Regression prediction is performed on 1000 groups of data with fault distances of 50, 150, 250, 350, and 450 km. The accuracy percentage of the fault location is calculated from the actual fault distance $y_{act}$ and the predicted fault distance $y_{pred}$, where $N_1$ and $N_2$ are the numbers of test samples for HIF and LIF, respectively, and $N_1 = N_2 = 500$. The accuracy of fault location at different sampling frequencies is shown in Figure 19. When the sampling frequency is between 20 and 100 kHz, the accuracy of CNN-LSTM fault prediction remains relatively stable. Therefore, the method requires a lower sampling frequency than the traveling wave method.

Influence of Noise
In the previous experimental analysis, the influence of measurement noise at the M terminal is considered, and 1% reference signal noise is added. Considering the effect of load changes during the actual operation of the HVDC system, 2% and 5% reference signal noises are added to the voltage and current signals at the M terminal. Similar to the process of analyzing the effect of sampling frequency on the accuracy of fault location, 1%, 2%, and 5% reference signal noises are added for fault prediction, and the sampling frequency is 50 kHz. The accuracy of fault location testing at different noise levels is shown in Figure 20. The effect of fault location is slightly affected by different levels of noise environment. The roles of VMD and CNN make the CNN-LSTM fault location method robust to noise.

Comparison of Other Methods
This study reduces the sampling frequency from 100 to 50 kHz to test HIF. To achieve a fair comparison, the following methods use the same training and testing samples. Different feature engineering methods, namely WT, HHT, and the VMD-TEO used in this study, are compared experimentally. The experimental results are displayed in Table 2. The CNN-LSTM results in the table are the average of 500 sets of test data. Since the data in the table are average values, some test results may not meet the allowable error range of actual engineering applications. Table 2 shows that the choice of feature engineering method has a great influence on the accuracy of fault location. These results verify that the VMD-TEO method from our previous work [14] is superior to WT and HHT in terms of singularity recognition and robustness. The error of WT may be caused by the difficulty of selecting the basis functions and decomposition scales, and the error of HHT may stem from mode aliasing in the EMD decomposition process.
To facilitate the comparison, the 2D-CNN classifier structure in the left half of the proposed model remains unchanged. Other regression algorithms, namely 1D-CNN, GRU, and Bi-GRU, replace the regressor in the right half of the proposed hybrid model. The experimental results of the different regression methods are listed in Table 3. The comparison shows that LSTM performs significantly better as the regressor than 1D-CNN, GRU, and Bi-GRU. The fluctuation range of the error of CNN-LSTM fault location is within the error tolerance of actual engineering applications. 1D-CNN has the worst experimental accuracy because its ability to learn the time correlation of fault signals is not as good as that of GRU, Bi-GRU, and LSTM.
GRU is a simplified version of the LSTM structure that reduces the amount of calculation. However, reducing the number of model parameters does not guarantee a better regression effect than LSTM. Bi-GRU is a GRU that learns the time series in both the forward and reverse directions. Bi-GRU is less accurate than GRU and LSTM because its regression prediction capability must come from the forward learning of the network, while the reverse learning performs poorly on this task.
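The reduction in parameters can be made concrete with the standard cell formulations: an LSTM layer has four weight blocks (input, forget, and output gates plus the candidate cell state), a GRU layer has three (update and reset gates plus the candidate state), and a Bi-GRU uses one GRU per direction. The sketch below counts trainable parameters under these textbook definitions with one bias vector per block. The layer sizes are hypothetical and are not the ones used in this study, and some framework implementations store additional bias terms, so their reported counts may differ slightly.

```python
def lstm_params(input_size, hidden_size):
    """Parameters of one LSTM layer: 4 blocks of input weights,
    recurrent weights, and one bias vector each."""
    return 4 * (hidden_size * (input_size + hidden_size) + hidden_size)

def gru_params(input_size, hidden_size):
    """Parameters of one GRU layer: 3 blocks with the same shapes."""
    return 3 * (hidden_size * (input_size + hidden_size) + hidden_size)

def bigru_params(input_size, hidden_size):
    """A bidirectional GRU holds one GRU for each direction."""
    return 2 * gru_params(input_size, hidden_size)

# Hypothetical sizes: 8 input features, 64 hidden units.
n_in, n_hidden = 8, 64
print(lstm_params(n_in, n_hidden))   # 18688
print(gru_params(n_in, n_hidden))    # 14016
print(bigru_params(n_in, n_hidden))  # 28032
```

For equal layer sizes, the GRU has three quarters of the LSTM's parameters, while the Bi-GRU has more than the LSTM, which is consistent with the observation that model size alone does not determine regression accuracy.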

Other HVDC Model
To verify the feasibility of the proposed method on another HVDC model, the topology in Figure 1 is changed to the form in Figure 21. Here L1 and L3 are overhead lines with a length of 150 km, and L2 is a cable with a length of 150 km. The CNN-LSTM model is retrained following the previous procedure, and 10 samples are randomly selected for testing, as shown in Table 4. The error of the proposed method lies in the interval [0.249, 0.379] km, which also meets engineering needs. From the analysis of Table 4, it can be concluded that this fault location method retains high accuracy under a different HVDC topology and is little affected by fault type and fault resistance.
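To illustrate how a located fault distance relates to the line segments of this topology, the sketch below maps a distance measured from the M terminal to a segment label and computes the absolute location error. It assumes, since Figure 21 is not reproduced here, that L1, L2, and L3 are connected in series in that order starting from the M terminal and that each segment is 150 km long. The names `SEGMENTS`, `locate_segment`, and `absolute_error_km` are hypothetical, and a fault exactly at a junction is assigned to the following segment by an arbitrary convention.

```python
from bisect import bisect_right
from itertools import accumulate

# Assumed series order from the M terminal: (name, line type, length in km).
SEGMENTS = [
    ("L1", "overhead line", 150.0),
    ("L2", "cable", 150.0),
    ("L3", "overhead line", 150.0),
]
# Cumulative segment end points: [150.0, 300.0, 450.0].
BOUNDARIES = list(accumulate(length for _, _, length in SEGMENTS))

def locate_segment(distance_km):
    """Return (name, line type) of the segment containing the fault."""
    if not 0.0 <= distance_km <= BOUNDARIES[-1]:
        raise ValueError("fault distance lies outside the line")
    # Clamp so that the far line end still maps to the last segment.
    index = min(bisect_right(BOUNDARIES, distance_km), len(SEGMENTS) - 1)
    name, line_type, _ = SEGMENTS[index]
    return name, line_type

def absolute_error_km(predicted_km, actual_km):
    """Absolute fault location error in kilometres."""
    return abs(predicted_km - actual_km)

print(locate_segment(200.0))  # ('L2', 'cable')
```

In such a setup, the segment label would correspond to the classifier output and the distance to the regressor output, and the absolute error is the quantity reported as the error interval above.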

Conclusions
A deep CNN-LSTM method was proposed to locate faults in HVDC systems with mixed cables and overhead lines. For both LIF and HIF, this method achieves high fault location accuracy when the effects of fault types, noise, sampling frequency, and different HVDC topologies are taken into consideration.
VMD-TEO is used in feature engineering to improve the learning effect of the model. Experimental results show that this method is superior to WT and HHT in feature extraction. This single-ended intelligent method divides the research problem into two tasks: fault section identification by a 2D-CNN classifier and precise fault location by an LSTM regressor. The regressor integrates the fault section information from the classifier to complete the fault location task. Other deep learning methods, namely 1D-CNN, GRU, and Bi-GRU, replace LSTM in the regressor as an experimental comparison. Experimental results show that the fault location accuracy of LSTM is better than that of the other methods.
The choice of deep learning algorithm needs to be based on actual data. Although References [26,27] state that GRU and Bi-GRU are optimizations of LSTM, the analysis of the experimental data in this study does not support such a conclusion. GRU simplifies the structure of LSTM and accelerates computation, but its learning effect is not as good as that of LSTM. Bi-GRU learns the characteristics of the time series in both the forward and reverse directions, but learning in the reverse direction is far worse than in the forward direction. The experimental results of Bi-GRU are not as good as those of LSTM, and are even worse than those of GRU.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The raw/processed data required to reproduce these findings cannot be shared at this time as the data also forms part of an ongoing study.

Conflicts of Interest:
The authors declare no conflict of interest.