Intelligent Sensors for dc Fault Location Scheme Based on Optimized Intelligent Architecture for HVdc Systems

We develop a probabilistic model for locating dc-link faults in multi-terminal HVdc (MT-HVdc) networks using discrete wavelet transforms (DWTs), Bayesian optimization (BO), and multilayer artificial neural networks (ANNs) based on local information. Feedforward neural networks (FFNNs) are trained using the Levenberg–Marquardt backpropagation (LMBP) method, and their hyperparameters are tuned by multi-stage BO for efficiency. During training, the feature vectors at the sending terminal of the dc link are selected based on the norm values of the observed waveforms at various frequency bands. The multilayer ANN is trained using a comprehensive set of offline data that takes the denoising scheme into account. This choice not only helps to reduce the computational load but also provides better accuracy. An overall percentage error of 0.5144% is observed for the proposed algorithm when tested against fault resistances ranging from 10 to 485 Ω. The simulation results show that the proposed method accurately estimates the fault site for fault resistances up to 485 Ω and is more robust.


Introduction
To date, China celebrates the completion of 30,000 km of ultra-high-voltage lines connecting six regional grids with a total transmission capacity of close to 150 gigawatts [1]. However, power engineers struggle to manage and regulate the impact of dc-link faults in hybrid ac/dc systems [2]. Let us say the 8 GW dc-link from Gansu reports a fault unexpectedly, and the protection algorithm cannot locate it. The power outage might start a chain reaction, resulting in widespread blackouts throughout Hunan and beyond. As a result, ensuring accurate fault location is beneficial to minimize the threat of possible failure and is a prerequisite for the successful and safe operation of dc transmission systems [3]. Furthermore, accurate fault location estimation is important for maintaining the voltage stability of the power system [4] and operating the electricity market efficiently [5].
The prediction of correct fault sites in dc transmission systems has been shown to be reliable by frequency extraction, fault signal analysis, and travelling-wave (TW) approaches in previous studies [2,3,6]. Currently, TW methods based on the concept of travelling-wave reflections are preferred in dc transmission projects since they are highly accurate, reliable, and have high fault resistance [7]. The advancement of TW theory has led to the development of several signal processing techniques, such as wavelet transformation (WT) [8], S and Hilbert-Huang transform [6,9], empirical mode decomposition (EMD) [10], etc. A waveform's characteristics are analyzed using approximate or detailed coefficients in WT to predict fault locations. However, conventional TW methods require a very high sampling

• Our initial goal is to create a learning-based algorithm that relies on measurements from only one end of the dc link for fault location, hence eliminating reliance on a communication link.
• In general, a signal detected by a sensor is invariably interfered with by the surrounding environment or modified by the detecting equipment during the detection process, increasing the chance of failure. The DWT-based signal analysis model is used to eliminate interference from the observed signal to improve signal analysis and recognition.
• The energy or norm of the current and voltage signals at each frequency band gives a unique signature for different fault locations and has been found to be robust against noise. Therefore, it is used as an extracted feature for pattern recognition.
• The proposed algorithm must be able to locate internal faults with high fault impedances at long distances.
The remainder of the paper is organized as follows. Section 2 derives the mathematical model, from the simple backpropagation algorithm to the improved backpropagation algorithm, and discusses the implementation of the proposed framework. Section 3 introduces the conditions and properties of the chosen system model, which has been developed to capture fault data under dynamic fault scenarios. Section 4 covers the methodologies used to analyze the input features extracted under dynamic fault scenarios, including the denoising scheme applied to the features before training and the data preprocessing. Section 5 presents comparisons and analyses against competing approaches. Finally, Section 6 concludes with a summary of the proposed algorithm. An overview of the proposed method is presented in the next section.

Proposed Framework
Figure 1 illustrates the architecture of the proposed methodology. During fault localization, the proposed method has two stages. In the first one, the captured fault window, set to 10 ms, is filtered using discrete wavelets to suppress noise. After denoising, the measured time-domain segment is transformed into the time-frequency domain by splitting it into low- and high-frequency components with a DWT-based multi-resolution analysis (MRA) technique. The multilayer neural network receives the denoised information from the voltage and current signals as inputs. It then estimates the fault site using the activation function. Levenberg-Marquardt backpropagation (LMBP) is implemented instead of a standard backpropagation algorithm, with a performance function that depends on the ANN regression output and the ground truth of the fault sites. The Levenberg-Marquardt method is used to update the weights and biases. The Jacobian matrix of the performance function with respect to the weight and bias variables is calculated via the proposed backpropagation algorithm. After updating the weights and biases, the multilayer ANN is applied to determine the fault site. The multi-stage BO procedure is conducted prior to the update, aiming to increase accuracy during training and to provide an optimal multilayer FFNN by optimizing the hyperparameters.
The hyperparameters, unlike internal parameters (weights, biases, etc.), are set before the neural network is trained, and they influence the neural network's performance. Tuning them by trial and error lengthens the training set-up time and may reduce accuracy. Hence, optimizing these hyperparameters enhances the accuracy and convergence speed [24]. In the following sub-section, the detailed architecture of the proposed algorithm is described.

Feedforward Neural Network (FFNN)
This study uses a feedforward neural network with a single hidden layer (i.e., one input layer, one hidden layer, and one output layer), because such a network can handle most complex functions [27]. In a multilayer FFNN, the basic building block is a neuron that mimics a biological neuron's functions and behavior [27]. The schematic structure based on the neuron is shown in Figure 2.


Usually, a neuron has multiple inputs. Each element of the input vector $p = [p_1, p_2, \ldots, p_R]$ is weighted by the elements $w_1, w_2, \ldots, w_R$ of the weight matrix $W$. Next, the bias $b$ of each neuron is summed with the weighted inputs to form the net input $n$, expressed as:

$$n = \sum_{j=1}^{R} w_j p_j + b = Wp + b \tag{1}$$

Following that, the net input $n$ is sent through an activation function $f$, which results in the neuron's output $a$:

$$a = f(n) \tag{2}$$

In this work, the activation function is the hyperbolic tangent sigmoid transfer function:

$$f(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}} \tag{3}$$

With reference to Figure 2, the multi-input FFNN executes the following equation:

$$a^2 = f^2\left(\sum_{i=1}^{S} w^2_{1,i}\, f^1\left(\sum_{j=1}^{R} w^1_{i,j}\, p_j + b^1_i\right) + b^2\right) \tag{4}$$

The output of the neural network is represented by $a^2$. $R$ stands for the number of inputs, the number of neurons in the hidden layer is denoted by $S$, and the jth input is represented by $p_j$. The activation functions of the output and hidden layers are represented by $f^2$ and $f^1$, respectively. The bias of the ith hidden neuron is denoted by $b^1_i$, whereas the bias of the neuron in the output layer is represented by $b^2$. The weight $w^1_{i,j}$ represents the connection between the jth input and the ith neuron of the hidden layer. Meanwhile, the weight connecting the ith hidden neuron to the output layer neuron is denoted by $w^2_{1,i}$.
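The forward pass of the single-hidden-layer network described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the layer sizes, the random weights, and the linear output layer ($f^2$ as the identity) are assumptions made only for the example.

```python
import numpy as np

def ffnn_forward(p, W1, b1, W2, b2):
    """a2 = f2(W2 @ f1(W1 @ p + b1) + b2) for one hidden layer."""
    n1 = W1 @ p + b1      # net input of the S hidden neurons
    a1 = np.tanh(n1)      # hyperbolic tangent sigmoid (f1)
    n2 = W2 @ a1 + b2     # net input of the output neuron
    return n2             # ASSUMPTION: linear output layer (f2 = identity)

# Hypothetical sizes and weights, for illustration only.
rng = np.random.default_rng(0)
R, S = 4, 3
W1, b1 = rng.normal(size=(S, R)), rng.normal(size=S)
W2, b2 = rng.normal(size=(1, S)), rng.normal(size=1)
p = rng.normal(size=R)
a2 = ffnn_forward(p, W1, b1, W2, b2)
```

The matrix products are the vectorized form of the double sum over hidden neurons and inputs.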

Backpropagation Algorithm
Following the definition of the FFNN, the next step is to create an algorithm for training such networks. To train the established multilayer FFNN, an error backpropagation algorithm based on the steepest descent technique is typically utilized [28]. For the proposed three-layer FFNN, the output of unit i in layer m + 1 is expressed as:

$$a^{m+1}_i = f^{m+1}\left(n^{m+1}_i\right), \qquad n^{m+1}_i = \sum_{j=1}^{S^m} w^{m+1}_{i,j}\, a^m_j + b^{m+1}_i \tag{5}$$

where $n^{m+1}_i$ is the net input to unit i. The neurons in the first layer receive the features extracted from the MT-HVdc system, which provides the initial condition for Equation (5):

$$a^0 = p \tag{6}$$

Equation (5) is written in matrix form for a neural network with M layers as:

$$a^{m+1} = f^{m+1}\left(W^{m+1} a^m + b^{m+1}\right), \qquad m = 0, 1, \ldots, M-1 \tag{7}$$

where $a^{m+1}$ and $a^m$ are the outputs of the network's (m+1)th and mth layers, and $b^{m+1}$ is the bias vector of the (m+1)th layer. With the external inputs propagated through the network via Equation (7), the overall network outputs are equal to the outputs of the neurons in the last layer:

$$a = a^M \tag{8}$$

The objective of this study is to locate dc-link faults. Therefore, the proposed multilayer FFNN requires a set of input-output pairs that characterize the behavior of an MT-HVdc system under faulty conditions:

$$\{(p_1, t_1), (p_2, t_2), (p_3, t_3), \ldots, (p_Q, t_Q)\} \tag{9}$$

where $p_q$ is an input and $t_q$ is the corresponding target used for training.

After each input propagates through the multilayer FFNN during training, the network output is compared to the target. The performance index for the backpropagation algorithm is the mean-square error (MSE), which is reduced by modifying the network parameters:

$$F(x) = E[e^2] = E[(t - a)^2] \tag{10}$$

Here, x is the vector containing the network weights and biases. However, in our case, the proposed network has multiple outputs. Therefore, Equation (10) generalizes to:

$$F(x) = E[e^T e] = E[(t - a)^T (t - a)] \tag{11}$$

Since the steepest descent rule is utilized for the standard backpropagation algorithm, the performance index F(x) can be approximated as follows:

$$\hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k)\, e(k) \tag{12}$$

That is, the squared error at iteration step k replaces the expectation of the squared error in Equation (11). The steepest (gradient) descent algorithm for the estimated MSE is then:

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial \hat{F}}{\partial w^m_{i,j}} \tag{13}$$

$$b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial \hat{F}}{\partial b^m_i} \tag{14}$$

where $\alpha$ is the learning rate; like the number of neurons (S), it is a hyperparameter. Define

$$s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i} \tag{15}$$

as the sensitivity of the performance index $\hat{F}$ to changes in the net input of the ith element in layer m. Next, based on the chain rule, the derivatives in Equations (13) and (14) can be simplified using Equations (5), (12) and (15) as:

$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j \tag{16}$$

$$\frac{\partial \hat{F}}{\partial b^m_i} = s^m_i \tag{17}$$

With this definition of the gradient, the steepest descent algorithm becomes:

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j \tag{18}$$

$$b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i \tag{19}$$

The sensitivities satisfy the following recurrence relation in matrix form [29,30]:

$$s^m = \dot{F}^m(n^m)\,(W^{m+1})^T s^{m+1} \tag{20}$$

Equation (20) expresses the step used to propagate the sensitivities backward through the neural network:

$$s^M \rightarrow s^{M-1} \rightarrow \cdots \rightarrow s^2 \rightarrow s^1 \tag{21}$$

where

$$\dot{F}^m(n^m) = \mathrm{diag}\left(\dot{f}^m(n^m_1), \dot{f}^m(n^m_2), \ldots, \dot{f}^m(n^m_{S^m})\right) \tag{22}$$

$$\dot{f}^m(n^m_j) = \frac{\partial f^m(n^m_j)}{\partial n^m_j} \tag{23}$$

The recurrence relation is initialized at the final layer as:

$$s^M = -2\,\dot{F}^M(n^M)\,(t - a) \tag{24}$$

The overall backpropagation (BP) algorithm based on steepest descent can now be summarized as follows:
(1) Use Equations (6)-(8) to propagate the input through the network.
(2) Use Equations (20) and (24) to backpropagate the sensitivities.
(3) Use Equations (18) and (19) to update the weights and biases.
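The three steps of steepest-descent backpropagation summarized above (forward pass, backward sensitivity pass, parameter update) can be sketched as follows. This is an illustrative example only: the tanh hidden layer with a linear output layer, the learning rate of 0.05, and the single hand-picked training pair are assumptions, not values from the paper.

```python
import numpy as np

def bp_step(p, t, W1, b1, W2, b2, alpha=0.05):
    """One steepest-descent backpropagation update for a single training pair."""
    # (1) Forward pass.
    a0 = p
    a1 = np.tanh(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                   # ASSUMPTION: linear output layer
    e = t - a2
    # (2) Backward pass: sensitivities of the squared error.
    s2 = -2.0 * e                       # output layer, derivative of identity is 1
    s1 = (1.0 - a1**2) * (W2.T @ s2)    # hidden layer, tanh'(n) = 1 - tanh(n)^2
    # (3) Steepest-descent update of weights and biases.
    W2 = W2 - alpha * np.outer(s2, a1)
    b2 = b2 - alpha * s2
    W1 = W1 - alpha * np.outer(s1, a0)
    b1 = b1 - alpha * s1
    return W1, b1, W2, b2, float(e @ e)  # squared error before the update

# Hypothetical toy problem: one input-target pair.
W1 = np.array([[0.1, -0.2], [0.3, 0.4]]); b1 = np.zeros(2)
W2 = np.array([[0.5, -0.5]]);             b2 = np.zeros(1)
p, t = np.array([1.0, 0.5]), np.array([1.0])
errors = []
for _ in range(20):
    W1, b1, W2, b2, sse = bp_step(p, t, W1, b1, W2, b2)
    errors.append(sse)
```

With a small learning rate the squared error shrinks slowly from step to step, which is the slow convergence that motivates the Levenberg-Marquardt variant.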

Levenberg-Marquardt Backpropagation
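The backpropagation algorithm exhibits asymptotic convergence properties while training the multilayer FFNN, which causes a slow convergence rate due to minor weight changes around the solution. Meanwhile, Levenberg-Marquardt (LM) backpropagation [29] is a variant of Newton's method that inherits the stability of the steepest descent algorithm and the speed of the Gauss-Newton algorithm [27,29,30]. Suppose we want to optimize the performance index F(x); then, Newton's method is:

$$x_{k+1} = x_k - \left[\nabla^2 F(x_k)\right]^{-1} \nabla F(x_k) \tag{25}$$

where $\nabla^2 F(x)$ is the Hessian matrix and $\nabla F(x)$ is the gradient. Let us assume that F(x) is a sum-of-squares function:

$$F(x) = \sum_{i=1}^{N} v_i^2(x) = v^T(x)\, v(x) \tag{26}$$

Then the gradient and Hessian matrix are expressed in matrix form as:

$$\nabla F(x) = 2 J^T(x)\, v(x) \tag{27}$$

$$\nabla^2 F(x) = 2 J^T(x) J(x) + 2 S(x) \tag{28}$$

where J(x) denotes the Jacobian matrix:

$$J(x) = \begin{bmatrix} \dfrac{\partial v_1}{\partial x_1} & \dfrac{\partial v_1}{\partial x_2} & \cdots & \dfrac{\partial v_1}{\partial x_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial v_N}{\partial x_1} & \dfrac{\partial v_N}{\partial x_2} & \cdots & \dfrac{\partial v_N}{\partial x_n} \end{bmatrix} \tag{29}$$

and

$$S(x) = \sum_{i=1}^{N} v_i(x)\, \nabla^2 v_i(x) \tag{30}$$

Assume that $S(x) \approx 0$; then the Hessian matrix in Equation (28) is approximated as $\nabla^2 F(x) \cong 2 J^T(x) J(x)$. Substituting Equation (27) and this approximation of Equation (28) into Equation (25) gives the Gauss-Newton update:

$$x_{k+1} = x_k - \left[J^T(x_k) J(x_k)\right]^{-1} J^T(x_k)\, v(x_k) \tag{31}$$

The matrix $H = J^T J$ may not be invertible in the Gauss-Newton method. This issue can be fixed by modifying the approximate Hessian matrix as follows:

$$G = H + \mu I \tag{32}$$

This modification to the Gauss-Newton method leads to the LM algorithm [29]:

$$\Delta x_k = -\left[J^T(x_k) J(x_k) + \mu_k I\right]^{-1} J^T(x_k)\, v(x_k) \tag{33}$$

Using the direction $\Delta x_k$, F(x) is recalculated. If a smaller value is obtained, then the computation procedure is repeated, with the parameter $\mu_k$ divided by a factor $\alpha > 1$. If the value of F(x) does not decrease, then $\mu_k$ is multiplied by $\alpha$ for the next attempt.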
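The calculation of the Jacobian matrix is an essential step in the LM method. The elements of the Jacobian matrix are calculated using a slight modification to the BP algorithm to address the NN mapping difficulty [29]. Similar to Equation (12), the performance index over the Q training pairs is the sum of squared errors:

$$F(x) = \sum_{q=1}^{Q} (t_q - a_q)^T (t_q - a_q) = \sum_{q=1}^{Q} e_q^T e_q = \sum_{q=1}^{Q} \sum_{j=1}^{S^M} (e_{j,q})^2 = \sum_{i=1}^{N} v_i^2$$

where $e_{j,q}$ is the jth element of the error for the qth input-target pair. The error vector and the parameter vector are

$$v^T = [v_1\ v_2\ \ldots\ v_N] = [e_{1,1}\ e_{2,1}\ \ldots\ e_{S^M,1}\ e_{1,2}\ \ldots\ e_{S^M,Q}]$$

$$x^T = [x_1\ x_2\ \ldots\ x_n] = [w^1_{1,1}\ w^1_{1,2}\ \ldots\ w^1_{S^1,R}\ b^1_1\ \ldots\ b^1_{S^1}\ w^2_{1,1}\ \ldots\ b^M_{S^M}]$$

with $N = Q \times S^M$. Similarly, the subscript n is defined as $n = S^1(R + 1) + S^2(S^1 + 1) + \ldots + S^M(S^{M-1} + 1)$. Making these substitutions in Equation (29), the Jacobian matrix becomes:

$$J(x) = \begin{bmatrix} \dfrac{\partial e_{1,1}}{\partial w^1_{1,1}} & \dfrac{\partial e_{1,1}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{1,1}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{1,1}}{\partial b^1_1} & \cdots \\ \dfrac{\partial e_{2,1}}{\partial w^1_{1,1}} & \dfrac{\partial e_{2,1}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{2,1}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{2,1}}{\partial b^1_1} & \cdots \\ \vdots & \vdots & & \vdots & \vdots & \\ \dfrac{\partial e_{S^M,1}}{\partial w^1_{1,1}} & \dfrac{\partial e_{S^M,1}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{S^M,1}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{S^M,1}}{\partial b^1_1} & \cdots \\ \dfrac{\partial e_{1,2}}{\partial w^1_{1,1}} & \dfrac{\partial e_{1,2}}{\partial w^1_{1,2}} & \cdots & \dfrac{\partial e_{1,2}}{\partial w^1_{S^1,R}} & \dfrac{\partial e_{1,2}}{\partial b^1_1} & \cdots \\ \vdots & \vdots & & \vdots & \vdots & \end{bmatrix}$$

In the standard BP algorithm, the terms that are computed are the derivatives of the squared errors:

$$\frac{\partial \hat{F}(x)}{\partial x_l} = \frac{\partial\, e_q^T e_q}{\partial x_l}$$

Meanwhile, in the LM algorithm, the elements of the Jacobian matrix are:

$$[J]_{h,l} = \frac{\partial v_h}{\partial x_l} = \frac{\partial e_{k,q}}{\partial x_l}$$

Thus, rather than computing the derivatives of the squared errors as in standard backpropagation, the modified Levenberg-Marquardt algorithm computes the derivatives of the errors. Similar to the standard backpropagation sensitivities, a new Marquardt sensitivity is defined as follows:

$$\tilde{s}^m_{i,h} \equiv \frac{\partial v_h}{\partial n^m_{i,q}} = \frac{\partial e_{k,q}}{\partial n^m_{i,q}}, \qquad h = (q - 1) S^M + k$$

The elements of the Jacobian matrix then follow as: if $x_l$ is a weight,

$$[J]_{h,l} = \frac{\partial e_{k,q}}{\partial w^m_{i,j}} = \tilde{s}^m_{i,h}\, a^{m-1}_{j,q}$$

or, if $x_l$ is a bias,

$$[J]_{h,l} = \frac{\partial e_{k,q}}{\partial b^m_i} = \tilde{s}^m_{i,h}$$

As previously stated, the Marquardt sensitivity can be determined using the same recurrence relation as the standard sensitivities. The only modification is at the final layer, where the new Marquardt sensitivity is:

$$\tilde{s}^M_{i,h} = \frac{\partial (t_{k,q} - a^M_{k,q})}{\partial n^M_{i,q}} = \begin{cases} -\dot{f}^M(n^M_{i,q}), & i = k \\ 0, & i \neq k \end{cases}$$

that is, it equals zero for $i \neq k$. Note that $\dot{f}^M$ and its matrix $\dot{F}^M$ are defined with the help of Equations (22) and (23). In the proposed model, when the features extracted from the MT-HVdc network are applied to the multilayer FFNN as an input ($p_q$) and the corresponding output ($a^M_q$) is computed, the LMBP algorithm is initialized with:

$$\tilde{S}^M_q = -\dot{F}^M(n^M_q) \tag{41}$$

Each column of the matrix in Equation (41) is a sensitivity vector that must propagate back through the network to generate one row of the Jacobian matrix. The columns are propagated backward as follows:

$$\tilde{S}^m_q = \dot{F}^m(n^m_q)\,(W^{m+1})^T\, \tilde{S}^{m+1}_q \tag{42}$$

The following augmentation then yields the overall Marquardt sensitivity matrix for each layer:

$$\tilde{S}^m = \left[\tilde{S}^m_1 \mid \tilde{S}^m_2 \mid \ldots \mid \tilde{S}^m_Q\right] \tag{43}$$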
The proposed algorithm for fault location, based on the Levenberg-Marquardt backpropagation algorithm, is given for clarity in Table 1.

Table 1. LMBP algorithm.
a. With initial weights and biases (randomly generated), feed all extracted features into the FFNN as inputs. Compute the outputs for the corresponding features using Equations (6) and (7), followed by the prediction errors $e_q = t_q - a^M_q$.
b. Calculate the sum of squared errors for all inputs with the Q targets in the training set.
c. After initializing with Equation (41), calculate the sensitivities using Equation (42) and augment the individual matrices into the Marquardt sensitivities using Equation (43). Meanwhile, Equations (37) and (38) are used to determine the elements of the Jacobian matrix.
d. Solve Equation (33) to obtain $\Delta x_k$ and adjust the weights and biases.
e. Using $x_k + \Delta x_k$, recalculate the sum of squared errors. If the new value is less than the previous one, then divide $\mu_k$ by $\alpha$ and return to step a with $x_{k+1} = x_k + \Delta x_k$. If the recalculated value does not decrease, then multiply $\mu_k$ by $\alpha$ and return to step c with the new weights.
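The LMBP procedure can be sketched for a small network as follows. This is a minimal illustration under stated assumptions, not the authors' code: a single tanh hidden layer with one linear output, a toy sine-fitting dataset, and hypothetical defaults (5 hidden neurons, initial damping 0.01, factor 10). The Jacobian rows hold derivatives of the errors (not squared errors), obtained by backpropagating a sensitivity of -1 from the linear output.

```python
import numpy as np

def unpack(x, R, S):
    W1 = x[:S * R].reshape(S, R); b1 = x[S * R:S * R + S]
    W2 = x[S * R + S:S * R + 2 * S]; b2 = x[-1]
    return W1, b1, W2, b2

def errors_and_jacobian(x, P, T, R, S):
    """Errors e_q = t_q - a_q and Jacobian de_q/dx for all Q samples."""
    W1, b1, W2, b2 = unpack(x, R, S)
    A1 = np.tanh(P @ W1.T + b1)              # hidden outputs, shape (Q, S)
    e = T - (A1 @ W2 + b2)                   # ASSUMPTION: linear output layer
    S2 = -np.ones(len(T))                    # Marquardt sensitivity at the output
    S1 = (1.0 - A1**2) * W2 * S2[:, None]    # propagated back to the hidden layer
    J = np.hstack([
        (S1[:, :, None] * P[:, None, :]).reshape(len(T), S * R),  # d e / d W1
        S1,                                                       # d e / d b1
        S2[:, None] * A1,                                         # d e / d W2
        S2[:, None],                                              # d e / d b2
    ])
    return e, J

def train_lm(P, T, S=5, mu=0.01, alpha=10.0, max_iter=100, seed=0):
    R = P.shape[1]
    x = np.random.default_rng(seed).normal(scale=0.5, size=S * R + 2 * S + 1)
    e, J = errors_and_jacobian(x, P, T, R, S)
    sse = float(e @ e)
    history = [sse]
    for _ in range(max_iter):
        accepted = False
        while mu < 1e10 and not accepted:
            dx = -np.linalg.solve(J.T @ J + mu * np.eye(x.size), J.T @ e)
            e_new, J_new = errors_and_jacobian(x + dx, P, T, R, S)
            if float(e_new @ e_new) < sse:   # improvement: accept, reduce damping
                x, e, J, sse = x + dx, e_new, J_new, float(e_new @ e_new)
                mu = max(mu / alpha, 1e-10)
                accepted = True
            else:                            # no improvement: increase damping
                mu *= alpha
        if not accepted or sse < 1e-12:
            break
        history.append(sse)
    return x, history

# Hypothetical toy regression problem.
P = np.linspace(-1.0, 1.0, 40).reshape(-1, 1)
T = np.sin(np.pi * P[:, 0])
x_opt, history = train_lm(P, T)
```

Because only steps that reduce the sum of squared errors are accepted, `history` is non-increasing by construction; the damping term keeps the linear system solvable even when $J^T J$ is close to singular.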

Parameter Optimization
Hyperparameters should be distinguished from internal parameters such as weights and biases, which are handled by the Levenberg-Marquardt backpropagation algorithm in the FFNN model. However, finding hyperparameter values for optimal fitting is a nonconvex optimization problem. This is because, like the MT-HVdc system, most existing systems do not respond linearly to their control parameters. From the standpoint of optimization, the problem can be presented as follows:

$$x^* = \arg\min_{x \in \mathbb{R}^d} f(x) \tag{44}$$

where x is the input vector (control parameters) of dimension d, and f(x) is an objective function that depicts a multiscale system with high-dimensional control parameters functioning under high-speed channels, such as an FFNN-based relaying model under dynamic conditions to protect the MT-HVdc grid. It is not a simple task to create a precise and accurate model of such systems in this situation. As a result, it is necessary to approach the problem in Equation (44) using the black-box settings shown in Figure 3.

Black-Box Settings
In most black-box systems, including MT-HVdc grid relaying models, it is not easy to acquire gradient information of f(x) at an arbitrary value of x. However, gradient information is not required when employing BO based on Gaussian processes (GPs) [31]. As a result, it is a promising and appropriate candidate for black-box optimization. BO is an active learning method that chooses the next observation so as to maximize the reward for solving Equation (44). Its foundation is Bayes' theorem:

$$P(f \mid D_{1:t}) \propto P(D_{1:t} \mid f)\, P(f) \tag{45}$$

where $P(f)$, $P(f \mid D_{1:t})$ and $P(D_{1:t} \mid f)$ are the prior, posterior, and likelihood, respectively, based on the current observations, i.e., $D_{1:t} = \{x_{1:t}, f(x_{1:t})\}$. Various predictive and distributional models can be used as priors in BO, but the GP is preferred due to its practical and theoretical advantages [31].

Gaussian Process (GP)
In the GP, the surrogate model replicates the behavior of the expensive underlying function. The underlying function f(x) that requires optimization is represented in BO as a joint, multidimensional Gaussian process, specified by its mean ($\mu$) and covariance (K) functions:

$$f(x) \sim \mathcal{GP}\left(\mu(x), K(x, x')\right) \tag{46}$$

In BO, Equation (46) describes the process by which the predictive GP is trained. It is worth noting that, unlike other machine-learning algorithms, the goal of BO is to properly forecast where global extrema are situated in the sample space based on previous observations rather than to develop predictors that cover the entire sample space. Furthermore, the problem in Equation (44) is solved using black-box settings, implying that we do not have any prior information about the underlying function. Therefore, to improve the regression quality of the GP, we use a popular kernel/covariance function called the automatic relevance determination (ARD) Matern 5/2 function in conjunction with a zero-mean GP for $P(f)$, given as:

$$K(x, x') = \sigma_f^2 \left(1 + \sqrt{5}\, r + \frac{5}{3} r^2\right) \exp\left(-\sqrt{5}\, r\right)$$

where $r = \left(\sum_{d} (x_d - x'_d)^2 / \sigma_d^2\right)^{1/2}$; $\sigma_f$ and $\sigma_d$ are hyperparameters of K(x). These hyperparameters are adjusted throughout the training phase to reduce the GP's negative log marginal likelihood using a global or local method. Each parameter in an ARD-type kernel has a scaling parameter that must be set. If the $\sigma_d$ of one parameter is larger than the others after the GP-based predictive model has been trained, then it can be assumed that the prediction is less sensitive to a change in this parameter. Furthermore, if a certain parameter has a greater effect, then the proposed solution in BO will alter the training process to reduce the $\sigma_d$ of that parameter in comparison to others. These advantages make the underlying function more interpretable and serve as an implicit sensitivity analysis.
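The role of the per-dimension length scales $\sigma_d$ in the ARD Matern 5/2 kernel can be illustrated with a short sketch. The function name and the numerical values below are hypothetical and chosen only to show that a larger length scale makes the covariance less sensitive to changes along that dimension.

```python
import numpy as np

def ard_matern52(x, x2, sigma_f, sigma_d):
    """ARD Matern 5/2 covariance between two points x and x2."""
    r = np.sqrt(np.sum(((np.asarray(x) - np.asarray(x2)) / np.asarray(sigma_d)) ** 2))
    return sigma_f**2 * (1.0 + np.sqrt(5.0) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5.0) * r)

# Two points that differ only along dimension 0 (illustrative values).
xa, xb = np.array([0.0, 0.0]), np.array([1.0, 0.0])
k_short = ard_matern52(xa, xb, sigma_f=1.0, sigma_d=[0.5, 1.0])   # short length scale in dim 0
k_long = ard_matern52(xa, xb, sigma_f=1.0, sigma_d=[10.0, 1.0])   # long length scale in dim 0
```

Here `k_long` stays close to $\sigma_f^2$ while `k_short` drops markedly, which is the implicit sensitivity analysis described above.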

Acquisition Function
Since the original function f(x) is hard to estimate, an acquisition function u(x), based on a predefined strategy and auxiliary optimization, is used to find the next point $x_{t+1}$ of the solution. It is worth noting that u(x) does not require any additional points; instead, it relies on past sample knowledge to make predictions at candidate points.
The predictive distribution at the next point is then given as:

$$P(f_{t+1} \mid D_{1:t}, x_{t+1}) = \mathcal{N}\left(\mu_t(x_{t+1}), \sigma_t^2(x_{t+1})\right)$$

The most prominent acquisition functions in BO are the probability of improvement, upper confidence bound, and expected improvement per second. However, we propose an expected improvement per second-plus in this paper. In comparison, it allows for faster model building and optimization, and the 'plus' modification prevents a region from being overexploited (it favors a wider search for a global minimum). Expected improvement (EI) is given as:

$$EI(x) = \begin{cases} \left(\hat{f}^* - \mu(x) - \zeta\right) \Phi(Z) + \sigma(x)\, \phi(Z), & \sigma(x) > 0 \\ 0, & \sigma(x) = 0 \end{cases} \qquad Z = \frac{\hat{f}^* - \mu(x) - \zeta}{\sigma(x)}$$

where $\hat{f}^*$ is the best point observed so far, $\zeta$ is a hyperparameter of EI, and $\phi(\cdot)$ and $\Phi(\cdot)$ are the probability density function and cumulative distribution function of the standard normal distribution. EI per second (EIpS) is then:

$$EIpS(x) = \frac{EI(x)}{\mu_S(x)}$$

where $\mu_S(x)$ is the posterior mean of the timing Gaussian process model. The next sampling point $x_{t+1}$ is found by maximizing the expected improvement per second-plus acquisition function EIpSp(x).
In doing so, the proposed acquisition function escapes a local minimum of the objective function and searches for a global minimum. Let $\sigma_f(x)$ be the standard deviation of the posterior objective function $P(f \mid D_{1:t})$ at point x, and let $\sigma_{NP}$ be the posterior standard deviation of the additive noise, so that

$$\sigma_Q^2(x) = \sigma_f^2(x) + \sigma_{NP}^2$$

The positive exploration ratio is denoted by $t_{\sigma_{NP}}$. After each iteration, the acquisition function evaluates whether the next point x satisfies $\sigma_f(x) < t_{\sigma_{NP}}\, \sigma_{NP}$. If this is the case, then the acquisition function declares that x is overexploiting and adjusts its kernel function by multiplying the kernel parameters $\theta$ by the number of iterations [32]. Compared to EIpS(x), this adjustment increases the variance $\sigma_Q$ for points between observations. It then creates a new point using the newly fitted kernel function. However, if the new point x is still overexploiting, then the function multiplies $\theta$ by a further factor of ten and tries again. This process is repeated up to five times, with the goal of generating a point x that is not overexploiting. The new x is accepted as the next point by the proposed acquisition function. As a result, the exploration ratio manages the tradeoff between examining new points in search of a better global solution and focusing on the neighborhood of already investigated points. The whole process optimizes the FFNN structure in a much faster and more efficient manner with a reduced computation burden.
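The pieces of the acquisition step can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the EI closed form is the usual one for minimization, the function names are hypothetical, and the default exploration ratio of 0.5 is an assumed value, not one reported in the paper.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def expected_improvement(mu, sigma, f_best, zeta=0.0):
    """EI for minimization at a point with posterior mean mu and std sigma."""
    if sigma <= 0.0:
        return 0.0
    improvement = f_best - mu - zeta
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)

def ei_per_second(mu, sigma, f_best, mu_time, zeta=0.0):
    """EI divided by the posterior mean evaluation time (timing GP)."""
    return expected_improvement(mu, sigma, f_best, zeta) / mu_time

def is_overexploiting(sigma_f, sigma_noise, exploration_ratio=0.5):
    """'Plus' check: the point is overexploiting if sigma_f < ratio * sigma_noise."""
    return sigma_f < exploration_ratio * sigma_noise

# Hypothetical posterior values at two candidate points.
cheap = ei_per_second(mu=0.8, sigma=0.3, f_best=1.0, mu_time=2.0)
slow = ei_per_second(mu=0.8, sigma=0.3, f_best=1.0, mu_time=4.0)
```

Dividing by the predicted evaluation time favors candidates that promise the same improvement at lower training cost, and the overexploitation check flags candidates whose posterior uncertainty is small relative to the noise level.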

Implementation of Proposed Framework
The steps to train the FFNN model with the LMBP algorithm and optimize the network hyperparameters with the Bayesian algorithm are shown in Figure 4. In step 1, the fault location and impedance are varied over several simulations to create the training and testing datasets. Additionally, data events are labeled and normalized to improve the training process. The ANN hyperparameters are determined by feeding the training dataset into BO's AI model until the maximum number of iterations is reached. The AI model is updated each time the maximum number of iterations is reached. In step 2, the optimal hyperparameters of the ANN, which give the minimum root-mean-square error (RMSE), are selected by BO, and the FFNN is trained on the given training data with the help of the LMBP algorithm. In step 3, the trained ANN model is evaluated on a testing dataset that is different from the training dataset.
To prevent overfitting, K-fold cross-validation with K = 5 was used during the assessment. The RMSE is computed as

$$\mathrm{RMSE} = \sqrt{\frac{1}{R_z} \sum_{n=1}^{R_z} \left(y_n - y^{\circ}_n\right)^2}$$

where $R_z$ stands for the data size, $y_n$ for the actual output, and $y^{\circ}_n$ for the predicted output. The proposed framework can now be implemented; a system model is presented in the next section, which enables the collection and analysis of input features for fault types, matching the theoretical foundation to real-world fault scenarios, and using intelligent computation to train and evaluate the framework's effectiveness.
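The evaluation metric and the 5-fold split can be sketched as below. The helper names and the random seed are hypothetical; the dataset size of 714 matches the number of training scenarios quoted for the system model, but the shuffling strategy is an assumption.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between actual and predicted outputs."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

folds = list(kfold_indices(714, k=5))
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Each sample appears in exactly one test fold, so averaging the per-fold RMSE gives an estimate of the error on unseen fault scenarios.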

System Model
The electrical power from two offshore wind farms is transferred to two onshore converters through dc transmission, as shown in Figure 5 [33]. A boundary is defined by installing current limiter inductors at the end of a dc line. Other test grid settings and MMC parameters are provided in Tables 2 and 3. The cable specifications are provided in Table 4. It is a single-end scheme, which means that information will be gathered near circuit breakers and inductor lines.


Model Output
As shown in Table 5, the examined system model has several outputs that can be used to determine the fault distance from a relay contact point. Table 5 also lists the fault resistances and fault types, giving a total of 714 dc-link fault scenarios (k) for training. The dc-link faults are categorized into pole-to-pole (PTP) and pole-to-ground (PTG) faults. It is important to note that a dc-link fault is an internal fault, so the criterion ($dV_{dc}/dt$) is applied to detect an internal failure. Once this criterion is activated, the trained algorithm begins sampling the relevant values over the 10 ms time window and estimating the fault distance. Note that the fault detection strategy is selective in nature.
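A rate-of-change trigger followed by extraction of a 10 ms window can be sketched as below. The threshold value, the per-unit step waveform, and the function name are hypothetical; the 50 kHz sampling frequency is taken from the decomposition discussion in the paper, and the 10 ms window from the text above.

```python
import numpy as np

def fault_window(v_dc, fs, threshold, window_s=0.010):
    """Return (start_index, samples) of the window after |dV/dt| first exceeds threshold."""
    dvdt = np.diff(v_dc) * fs                      # finite-difference derivative
    hits = np.flatnonzero(np.abs(dvdt) > threshold)
    if hits.size == 0:
        return None                                # criterion never activated
    start = int(hits[0])
    n = int(round(window_s * fs))                  # 10 ms -> 500 samples at 50 kHz
    return start, v_dc[start:start + n]

# Hypothetical per-unit dc voltage with a step collapse at sample 1000.
fs = 50_000
v = np.ones(2000)
v[1000:] = 0.2
start, window = fault_window(v, fs, threshold=1000.0)
```

In practice the threshold would be tuned so that external disturbances do not activate the criterion.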

Data Processing
This study exploits the fact that the initial travelling waves of voltage and current induced by dc-link faults in the system above carry useful information about the fault distance [7]. However, noise interference is expected, considering the dynamic disturbances associated with the MT-HVdc system. Therefore, the following sub-section discusses the noise suppression mechanism applied before the data are processed for the regression model.

Signal Processing
The implementation of the DWT to suppress noise in a measured signal is shown in Figure 6.

Setting Numbers of Decomposition Layers
Using more decomposition layers in the discrete wavelet transform helps separate noise from the original signal, resulting in better signal filtering. We have chosen eight levels to balance the signal processing burden against robustness to noise, corresponding to the frequency band of 195.3-390.6 Hz at a sampling frequency of 50 kHz.

Selection of Mother Wavelet Function
The next critical step in the denoising scheme is choosing a mother wavelet. A literature review and the practical results presented in previous studies show that Daubechies (db) is an appropriate mother wavelet family for analyzing fault signals [35]. In this study, the Pearson correlation coefficient is used to quantify how well each Daubechies wavelet function matches the cable fault signals, and thereby to determine the best mother wavelet function. The correlation coefficient is written as follows:
$$r = \frac{\sum_i \left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)}{\sqrt{\sum_i \left(X_i - \bar{X}\right)^2 \sum_i \left(Y_i - \bar{Y}\right)^2}}$$
where $X$ is the original fault signal, $\bar{X}$ denotes the original fault signal's average, $Y$ denotes the noise-eliminated fault signal, and $\bar{Y}$ denotes the noise-eliminated fault signal's average.
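
A minimal sketch of this selection step is given below, assuming the denoised version of the fault signal has already been computed for each candidate wavelet. The helper names `pearson_r` and `best_wavelet`, and the idea of passing candidates as a dictionary keyed by wavelet name, are illustrative assumptions rather than the authors' procedure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def best_wavelet(original, denoised_by_name):
    """Pick the candidate whose denoised output correlates best with the original."""
    return max(denoised_by_name,
               key=lambda name: pearson_r(original, denoised_by_name[name]))
```

The candidate with the highest coefficient is the one whose denoised output preserves the shape of the fault transient most faithfully.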

Set the Threshold and Filter the Signal
After selecting the mother wavelet, the noise can be filtered out of the fault signal. After wavelet decomposition, the threshold is set automatically by multiplying the universal threshold by the median of each decomposition layer:
$$\lambda_j = \sigma_j \sqrt{2 \ln n_j}$$
where $\lambda_j$ is the threshold of the jth decomposition layer, $\sigma_j$ is the median of the jth decomposition layer, and $n_j$ is the signal length of the jth decomposition layer. After setting the threshold, the noise is filtered out through thresholding, which usually takes the form of either a soft or a hard threshold [35]. In this study, a hard threshold is used:
$$\hat{w}_{j,i} = \begin{cases} w_{j,i}, & |w_{j,i}| \ge \lambda_j \\ 0, & |w_{j,i}| < \lambda_j \end{cases}$$
where $w_{j,i}$ is the ith wavelet coefficient of the jth layer and $\hat{w}_{j,i}$ is its thresholded value. This equation shows that the hard threshold retains the larger wavelet coefficients, while coefficients below the threshold are set to zero. Finally, using the inverse DWT (IDWT), the thresholded coefficients are reconstructed layer by layer into a denoised signal. The implementation of the proposed denoising approach with a 20 dB signal-to-noise ratio (SNR) is shown in Figure 7.
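
The threshold rule and the hard-thresholding step can be sketched for a single decomposition layer as follows. One assumption is made explicit: the "median of the layer" is taken here as the median of the absolute coefficient values, since the plain median of zero-mean detail coefficients would be close to zero. The function names are hypothetical.

```python
import math
import statistics


def universal_threshold(coeffs):
    """Layer threshold: median(|coeffs|) * sqrt(2 * ln(n))."""
    # Assumption: sigma_j is the median of the absolute coefficient values.
    sigma = statistics.median(abs(c) for c in coeffs)
    return sigma * math.sqrt(2.0 * math.log(len(coeffs)))


def hard_threshold(coeffs, lam):
    """Keep coefficients with magnitude >= lam unchanged; zero the rest."""
    return [c if abs(c) >= lam else 0.0 for c in coeffs]


def denoise_layer(coeffs):
    """Apply the automatically chosen hard threshold to one layer."""
    return hard_threshold(coeffs, universal_threshold(coeffs))
```

Unlike a soft threshold, the retained coefficients are not shrunk towards zero, so the amplitude of the surviving transient components is preserved.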

Feature Extraction Set-Up
After the signal has been selected and denoised, the feature extraction stage is critical for data-driven fault detection and location estimation. Extracted features are measurable quantities taken from the transients of the filtered current and voltage signals to create a feature vector. This feature vector should be dimensionally compact so that the estimation algorithm for fault location can learn and generalize successfully. The feature extraction stage is divided into two sub-stages. The first stage decomposes all generated samples for each fault location up to eight levels using DWT-MRA to obtain the wavelet coefficients, namely the approximation coefficients $A_j$ and the detail coefficients $D_j$. For each fault location, vectors of D1-D8 and A8 coefficients are obtained. The second stage selects effective and appropriate statistical parameters for building the feature vector, which reduces the collected data and improves estimation performance.

Feature Extraction Results
When a large number of high-frequency components of the voltage and current signals are fed in for training, several learning tools face problems due to a limitation on the input space dimension. These learning tools cannot provide suitable learning patterns with a large number of features, because the structure grows and the number of learning parameters increases sharply [11]. The regression model used in this study is therefore designed to train on the second norm (referred to as the norm) of the wavelet coefficients. In general, the norm of the decomposed signal's wavelet coefficients is determined as follows:
$$\lVert D_j \rVert = \sqrt{\sum_{i=1}^{n} \left|D_{j,i}\right|^2}, \quad j = 1, \ldots, N, \qquad \lVert A_N \rVert = \sqrt{\sum_{i=1}^{n} \left|A_{N,i}\right|^2}$$
where $j$ denotes the decomposition level, $N$ is the maximum level of decomposition, and the detail and approximation coefficients have $n$ values at level $j$. Overall, the proposed energy vector obtained from the MRA-based DWT for any current or voltage signal in a given time window is represented as
$$E = \left[\lVert D_1 \rVert, \lVert D_2 \rVert, \ldots, \lVert D_N \rVert, \lVert A_N \rVert\right]$$
Using the MRA-based DWT, the norm values of the current for ground faults at various sites are calculated and presented in Figure 8. There is a distinct difference in the norms between the given fault locations at levels D6 through D8. These differences indicate that the obtained features contain distinct fingerprints for estimating ground faults at various places. Figure 9 shows the obtained features of the voltage signal for ground faults at locations between 40 and 200 km.
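
The construction of the norm-based feature vector can be sketched in a few lines. To keep the example self-contained, an orthonormal Haar transform is swapped in for the Daubechies wavelet used in the paper, so the numerical values will differ from those in Figures 8 and 9; only the structure (eight detail norms plus one approximation norm) is the point. All function names are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(x) % 2:
        x = x + [x[-1]]  # pad odd-length input by repeating the last sample
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


def l2_norm(coeffs):
    """Second norm of a coefficient vector."""
    return math.sqrt(sum(c * c for c in coeffs))


def energy_vector(signal, levels=8):
    """Return [||D1||, ..., ||DN||, ||AN||] for an N-level decomposition."""
    approx = list(signal)
    norms = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        norms.append(l2_norm(detail))
    norms.append(l2_norm(approx))
    return norms
```

With eight levels, each signal is reduced to nine numbers regardless of the window length, which is what keeps the input dimension of the regression model small.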

Figure 9. Feature vector extracted for ground fault at various locations of the voltage signal.
In Figure 9, the norm values for each location are significantly different in the dominant frequency band between D5 and D8 and can be used as input vectors to establish fault estimation rules. Similarly, as illustrated in Figure 10, a unique signature of the pole-to-pole fault may be derived at different frequency bands. A schematic diagram of the feature vector development process is shown in Figure 11.

Training Set-Up
Following the preprocessing steps, the extracted features are standardized for computational simplicity. Once the feature vectors have been determined, the cleaned training dataset is applied to the BO-based AI model to find appropriate hyperparameters for the FFNN. The input vector $p = (x_1, x_2, x_3, x_4)$, built from the 10 ms window, is designed for the FFNN input; two inputs $(x_1, x_2)$ represent the second norm of the transient dc current from the positive and negative poles, while the rest $(x_3, x_4)$ represent the second norm of the dc voltage from the positive and negative poles. Since each of the four signals contributes nine norm values (D1-D8 and A8), this corresponds to 36 inputs for each training sample (total training samples = k = 714). BO's AI model is updated at each iteration until the maximum number of iterations is reached. BO then selects the FFNN hyperparameters that result in the lowest RMSE, and the FFNN is trained using the LMBP algorithm. The final RMSE obtained is 0.0132, with a total evaluation time of 39.3428 s for 30 iterations. Some key hyperparameters of the multilayer FFNN model obtained via BO are presented in Table 6.
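
The assembly and standardization of the FFNN input can be illustrated with a short sketch. It assumes each of the four signals has already been reduced to a nine-element norm vector, and it uses column-wise z-scoring as the standardization, which is an assumption since the paper does not name the exact scaling. The function names are hypothetical.

```python
import math


def build_input(i_pos, i_neg, v_pos, v_neg):
    """Concatenate four per-signal norm vectors into one FFNN input vector."""
    return list(i_pos) + list(i_neg) + list(v_pos) + list(v_neg)


def standardize(samples):
    """Column-wise z-score over all samples; constant columns map to 0."""
    n = len(samples)
    dim = len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(dim)]
    stds = [math.sqrt(sum((s[j] - means[j]) ** 2 for s in samples) / n)
            for j in range(dim)]
    return [[(s[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0
             for j in range(dim)]
            for s in samples]
```

The same means and standard deviations computed on the training set would need to be reused when scaling unseen test samples.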

A. Metric for Evaluation and Testing Set-Up
Although the selected models' average estimation accuracy during validation was 98.94%, we tested our method further using the case studies given in Table 7. For verification and more in-depth analysis, a performance index based on percentage error was used as follows:
$$\text{Percentage error} = \frac{\left|\text{Actual location} - \text{Predicted location}\right|}{\text{Total length of transmission line}} \times 100 \qquad (61)$$
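
The performance index is simple enough to check by hand. The sketch below evaluates it for the 385 Ω example reported in this paper (actual distance 185 km, predictions 183.63147 km and 186.93141 km); the 200 km line length is an assumption inferred from the reported percentage errors, and the function name is hypothetical.

```python
def percentage_error(actual_km, predicted_km, line_length_km):
    """Absolute location error as a percentage of the total line length."""
    if line_length_km <= 0:
        raise ValueError("line length must be positive")
    return abs(actual_km - predicted_km) / line_length_km * 100.0


# Assumed total line length of 200 km (not stated explicitly alongside Eq. 61).
ptp_error = percentage_error(185.0, 183.63147, 200.0)
ptg_error = percentage_error(185.0, 186.93141, 200.0)
```

Under that assumption the two values agree with the 0.68427% and 0.96571% reported for the PTP and PTG faults.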

Case 1 (Fault Location)
In Case 1 (varying fault locations and fault resistances), the functionality of the proposed technique was tested using the scenarios given in Table 7. After thorough training, fault analyses were carried out with varying fault distances and resistances. Table 8 shows the 800 test samples together with the absolute and percentage errors for the two types of dc-link faults, PTP and PTG. The percentage error for the testing dataset was 0.4927% for PTP faults and 0.5361% for PTG faults. The proposed technique's total percentage error was 0.5144%, which shows that the estimation error was well within acceptable bounds. In addition, Figure 12 depicts the percentage error of the proposed technique in locating PTP faults on line 13, with fault distances ranging from 5 km to 200 km. With a maximum percentage error of 1.3174% at 175 km and a minimum of 0.00103% at 15 km, the findings show that the fault distance had no major impact on the accuracy of the proposed algorithm. Therefore, the proposed approach is suitable for locating both close-in and far-away faults.
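
As a quick consistency check, and assuming the PTP and PTG test sets are weighted equally (the paper does not state the split), the overall figure is the mean of the two per-type errors:

```latex
\frac{0.4927\% + 0.5361\%}{2} = \frac{1.0288\%}{2} = 0.5144\%
```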

Case 2 (Fint)
Apart from the fault location, the characteristics and amplitude of the fault signals, such as the voltage and current measured at the local terminal, also depend on fault parameters such as the fault resistance. It is therefore important to assess the proposed approach under diverse fault resistances. This section analyzes the algorithm's performance for fault resistances ranging from 10 to 385 Ω; the results are given in Table 9. Notably, for a high fault resistance such as 385 Ω with an actual fault distance of 185 km, the energy of the travelling waves tended to be low, so the measured signals stayed close to their steady-state values. Nevertheless, the proposed algorithm with the selected features still extracted the small variations in the voltage and current. For example, the predicted fault distances for PTP and PTG faults at 385 Ω were 183.63147 km and 186.93141 km, respectively. The associated percentage errors of 0.68427% and 0.96571% were well within acceptable limits.

Case 3 (Noisy Events)
In this case, white Gaussian noise was added to the testing signals to examine the proposed fault-locating scheme under various noisy conditions. Signals with SNRs ranging from 20 to 45 dB were employed to assess the fault location performance. Table 10 indicates that the proposed scheme could locate all types of faults with a reasonable mean percentage error for close-in, mid-line, and far-end faults. For a 45 dB SNR and a far-end fault at 155 km on the dc link, the total mean percentage error was 0.72424% and 0.83147% for PTP and PTG faults, respectively. The proposed method is noise-resistant because of the denoising process with its threshold settings and functions. This maintained the estimation accuracy even at the high noise level of 20 dB SNR, with an overall mean percentage error of 0.9411% and 0.8561% for PTP and PTG faults, respectively.
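
For readers who want to reproduce this kind of test, the sketch below shows one common way to add white Gaussian noise at a prescribed SNR. It is an illustration under stated assumptions, not the authors' MATLAB routine: signal power is taken as the mean square of the clean samples, and the function names are hypothetical.

```python
import math
import random


def add_awgn(signal, snr_db, rng=None):
    """Return signal plus zero-mean white Gaussian noise at the requested SNR (dB)."""
    rng = rng or random.Random()
    p_signal = sum(s * s for s in signal) / len(signal)  # mean-square power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    sigma = math.sqrt(p_noise)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB computed from the clean and noisy sequences."""
    p_signal = sum(c * c for c in clean) / len(clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(p_signal / p_noise)
```

A lower SNR in dB means a higher noise level, so the 20 dB case is the most demanding of the tested range.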

Case 4 (Comparison with Existing Methods)
To further validate the proposed scheme's robustness, Figure 13 compares it with intelligent counterparts such as the conventional FFNN and BP-NN, which take the original current signal as the input, under the testing conditions listed in Table 7. On a dual 2.9 GHz Intel Core i7 with 16 GB RAM, the current version of the algorithm implemented in MATLAB R2020a took 39.3428 s to run. Thirty ANN models were selected, trained, and validated within this runtime, which is approximately five times faster than a conventional FFNN whose hyperparameters are configured manually. The results showed that the proposed algorithm performed better than the BP-NN and had the lowest percentage error (0.49% for PTP faults, 0.54% for PTG faults, and 0.51% overall). The conventional FFNN, with hyperparameters such as 15 neurons in the hidden layer and a learning rate of 0.01, gave an average percentage error of 0.56%. This showed that the efficient features and regulating parameters in the proposed algorithm helped to increase the interpretability of the spectrum generated by the wavelet.

Comparison and Analysis
This section compares the proposed methodology with existing fault estimation schemes for the MT-HVdc grid.

Non-AI-Based Methods
The fault location method proposed in [36] applies a continuous wavelet transform to the dc line current signals in the MT-HVdc network. The technique is quite efficient; however, it requires a high sampling frequency of 200 kHz and time-synchronized measurements. Further, its performance under high fault resistance has not been investigated thoroughly. Another work used time-stamped measurements to locate faults at a 200 kHz sampling frequency [37]. That model is robust against measurement noise, but the high sampling frequency and synchronized measurements could be a barrier to practical applications. The single-ended TW-based fault location model has no synchronized measurement issue [38] but requires a high sampling frequency (100 kHz) [39]. In another example, modal voltage and current measurements are sampled at 1 MHz to develop a single-end fault location model [40]. However, it has only been tested for a 100 Ω fault resistance. All the aforementioned TW-based fault location models require a high sampling frequency for good accuracy, which is frequently considered a drawback. In comparison, the proposed single-end fault location approach operates at a moderate sampling frequency of 50 kHz and has been tested against fault resistances as high as 485 Ω.

AI-Based Methods
Among the fault location approaches, learning-based techniques fall into a distinct category. Even though such techniques are commonly used for fault localization in ac systems, few papers discuss their relevance to MT-HVdc networks. For example, an extreme learning machine was proposed to locate faults in the MT-HVdc network [41]. Voltage and current measurements were captured at a 500 kHz sampling frequency during the learning phase, and the wavelet transform and S-transform were applied for feature extraction. However, the entire scheme has only been tested for fault resistances up to 100 Ω. Similarly, the method in [42] relies on voltage and current measurements sampled at a high rate of 200 kHz and does not investigate highly resistive faults. Another method applied a traditional two-ended TW-based fault location algorithm to current measurements sampled at 5 kHz [43]. The distance inaccuracy caused by the moderate sampling frequency was subsequently reduced using a machine-learning approach. However, the use of multiple distributed sensors on long transmission lines added cost to the method. With the help of the ANN, the real-time implementation of the proposed method is quite efficient, and ANN-based schemes have been shown to have a low execution time on low-spec machines [44]. Further, none of the aforementioned models discusses the optimization of the machine-learning model, whereas the proposed approach optimizes the pre-training set-up with the help of Bayesian optimization.

Conclusions
In this work, a novel AI-based dc fault location scheme for a meshed dc grid is proposed. A BO-based FFNN model combined with the DWT is used, in which BO determines the hyperparameters that improve the selected model's performance while keeping the RMSE low. Levenberg-Marquardt backpropagation is used to adjust the weights and biases during training of the chosen multilayer FFNN model. The contributions of this work are summarized as follows:

1. The wavelet coefficient energies of the voltage and current over 10 ms are calculated and denoised during the learning phase for feature extraction. This yields fewer features while keeping the learning model robust.

2. A comprehensive training dataset is collected to train the multilayer FFNN model for different fault locations by varying the fault impedance.

3. The performance of the model is evaluated on data points that are not included in the training dataset. The study results show that the fault location can be calculated using the FFNN for fault resistances up to 485 Ω.

4. Because the signal and Gaussian noise are integrated into the FFNN training sets, the influence of a noisy environment is reduced.

5. Owing to its plug-and-play capability, the suggested intelligent algorithm is suited to a multi-vendor fault location estimation strategy in meshed MT-HVdc grids.

6. The case studies show that the proposed scheme performs well against many variables, such as different fault resistances, transmission line lengths, and non-ideal noise events, which makes it feasible for practical application in the MT-HVdc grid.

In future work, variable time windows will be used to consider the effect of the fault location, fault resistance, and computational burden. This work provides an analysis of the fault location estimation method for HVdc cable grids that can be applied to hybrid cable-overhead line systems as well.