2.2. Selection of Inversion Parameters
When multiple thyristors suffer insulation degradation, the system may face performance deterioration and fault risks, severely affecting safe operation. For a thyristor level, the main indicator of insulation degradation is reverse leakage current, which can be represented by the leakage resistance of the thyristor. Therefore, by identifying the equivalent insulation resistance formed by the parallel combination of the voltage-sharing resistor and the off-state leakage resistance of the thyristor level, the insulation degradation of the thyristor can be reflected [15].
In addition, during system operation, the damping capacitor in the damping circuit connected across the thyristor is frequently charged and discharged. The metal-film capacitor used in this circuit gradually deteriorates over time, thereby affecting the normal turn-on and operation of the thyristor. Engineering practice shows, however, that the performance of the damping resistor in the damping circuit is relatively stable.
By analyzing the variations of equivalent insulation resistance and damping capacitance in the circuit, the degradation of the circuit connected across the thyristor can be assessed [16,17]. Therefore, this paper selects the equivalent insulation resistance and damping capacitance of the thyristor level as the key electrical parameters for evaluating the thyristor health status.
A thyristor converter valve typically consists of dozens of thyristor levels connected in series, and the variations of parameters such as equivalent insulation resistance exhibit nonlinear characteristics. Because deriving and solving the circuit state equations is highly complex, this paper adopts a data-driven parameter inversion method to perform the inversion more efficiently.
First, a power-system simulation model is built according to the actual converter valve, and the original dataset is collected using this simulation model. The voltage and current at both ends of a bridge arm reflect the impedance characteristics of the entire valve. In addition, to ensure safe and reliable operation, the thyristor control unit (TCU) of each thyristor level sends feedback signals to the valve base electronics (VBE) when preset voltage thresholds are reached [18]. The most common signals are the forward feedback signal POS, triggered when the thyristor-level voltage reaches 70 V, and the negative feedback signal NEG, triggered when it reaches −150 V. Therefore, the selected input data consist of the voltage and current signals at both ends of the bridge arm, together with the pulse timing of the POS and NEG feedback signals.
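As an illustration of how the feedback-signal timing can be turned into model inputs, the following Python sketch extracts the first POS and NEG threshold-crossing instants from a thyristor-level voltage waveform. It is a hypothetical helper, not the authors' implementation (which uses MATLAB/Simulink), and the synthetic waveform is only a placeholder.

```python
import numpy as np

def feedback_times(t, v_level, pos_threshold=70.0, neg_threshold=-150.0):
    """Return the first instants (s) at which a thyristor-level voltage
    crosses the POS (+70 V) and NEG (-150 V) feedback thresholds."""
    pos_hits = np.flatnonzero(v_level >= pos_threshold)
    neg_hits = np.flatnonzero(v_level <= neg_threshold)
    t_pos = t[pos_hits[0]] if pos_hits.size else np.nan
    t_neg = t[neg_hits[0]] if neg_hits.size else np.nan
    return t_pos, t_neg

# Placeholder waveform: one 50 Hz cycle with a 300 V amplitude.
t = np.linspace(0.0, 0.02, 2001)
v = 300.0 * np.sin(2 * np.pi * 50 * t)
print(feedback_times(t, v))  # first +70 V and -150 V crossing times
```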
Next, wavelet packet decomposition is used to extract time-frequency-domain features from the original data so as to distill feature inputs related to parameter inversion and ensure that key characteristics of the circuit dynamic behavior are captured. After obtaining the feature inputs, a suitable network architecture is selected and trained using the original dataset and its corresponding parameter labels. Ultimately, the trained neural network can not only accurately identify the parameters of the simulation model but also be applied to practical systems having the same structure to realize accurate parameter inversion. The overall inversion framework is shown in Figure 3.
2.3. Parameter Inversion Method
According to the six-pulse converter valve model described in Section 2.1, a Simulink circuit simulation model is established, as shown in Figure 4. Based on parameter values used in practical engineering, the simulation parameters are set as shown in Table 1.
Since all thyristor levels are identical when their parameters are equal, the system exhibits a certain symmetry: when the parameters of only one thyristor level change, the remaining levels still behave identically. Therefore, this paper mainly changes the parameters of the first level in the seven-level thyristor simulation model and observes the timing variations of the feedback signals for the first level and the other thyristor levels. The damping capacitance and equivalent insulation resistance of a thyristor level need to be replaced when their degradation reaches 5% and 30%, respectively [19,20].
Figure 5 presents an extreme-case mechanism analysis in which both the damping capacitance and the equivalent insulation resistance are reduced to 0.7 p.u. This 0.7 p.u. condition is introduced only to illustrate the sensitivity of the valve-voltage distribution and feedback-signal timing to severe parameter degradation, rather than to define the neural network training domain.
As the equivalent insulation resistance of the first level degrades from 100% to 70% in decrements of 3%, resulting in 11 cases in total, the changes in the feedback signals are shown in Figure 6. For both feedback signals, the signal of the degraded first thyristor level advances, whereas the signals of the remaining levels are delayed. This is mainly because changes in equivalent insulation resistance only alter the voltage-sharing among levels and do not affect the voltage variation rate.
Figure 7 shows the changes in the feedback signals when the damping capacitance of the first level degrades from 100% to 70% in decrements of 3%, again resulting in 11 cases. The POS feedback signal of the first level advances, whereas the NEG feedback signal changes nonlinearly.
When the number of thyristor levels in a converter valve is large, traditional grid sampling leads to an excessive dataset size. Assuming each parameter takes 10 possible values and each converter valve arm contains n thyristor levels, the required dataset size would be 10²ⁿ, which is almost impossible to realize for large numbers of levels. To reduce the data demand, Latin hypercube sampling (LHS) rather than conventional grid-like sampling is adopted. LHS is a random sampling method for multi-dimensional variable spaces. It partitions the range of each variable and performs random sampling in each interval, thereby ensuring uniform coverage of the multi-dimensional space and improving sampling efficiency and representativeness. Meanwhile, training the network using the whole valve would make the network structure complex and prone to error accumulation. Therefore, the parameters of each thyristor level are identified independently so as to simplify the model structure. It is important to note that in real multi-level converter valves, dynamic coupling exists among levels. When multiple thyristor levels degrade simultaneously, the voltage redistribution effect becomes highly nonlinear, which may introduce cross-coupling errors and reduce the inversion accuracy of the current independent identification model. However, in the context of early condition monitoring, the probability of simultaneous severe degradation occurring across multiple levels is relatively low. The proposed independent identification strategy primarily serves as an efficient and computationally viable method for isolated early fault detection. For larger-scale systems or scenarios involving complex multi-level simultaneous degradation, future work will address this by expanding the multi-dimensional sampling space to include coupled degradation scenarios and exploring decoupling network architectures to further enhance the robustness of the inversion method.
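As a sketch of how such an LHS design could be generated, the following Python snippet uses SciPy's quasi-Monte Carlo module (an assumed tool choice; the paper's workflow is in MATLAB/Simulink). The per-unit bounds correspond to the sampling ranges described in the next paragraph, with two parameters per thyristor level.

```python
import numpy as np
from scipy.stats import qmc

n_levels = 7           # thyristor levels per bridge arm
n_samples = 2000       # parameter degradation combinations

# Two parameters per level: damping capacitance and equivalent insulation
# resistance, both expressed in per unit of their nominal values.
dim = 2 * n_levels
l_bounds = np.array([0.90, 0.65] * n_levels)   # [C_min, R_min] per level
u_bounds = np.array([1.05, 1.05] * n_levels)   # [C_max, R_max] per level

sampler = qmc.LatinHypercube(d=dim, seed=0)
unit_samples = sampler.random(n=n_samples)        # uniform in [0, 1)^dim
combinations = qmc.scale(unit_samples, l_bounds, u_bounds)

print(combinations.shape)  # (2000, 14): one row of parameter values per simulation run
```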
To ensure reliable prediction over a relatively wide degradation range, some margin must be reserved in the sampling range for neural network training. The sampling ranges are therefore set according to practical engineering requirements, with a limited margin around the normal operating region: the damping capacitance is sampled from 90% to 105%, and the equivalent insulation resistance from 65% to 105%. The trained network is thus intended for accurate inversion within these sampled ranges, whereas the 0.7 p.u. cases in Figure 5, Figure 6 and Figure 7 are used only for qualitative sensitivity analysis.
In the seven-level six-pulse simulation study presented in Section 2, 2000 parameter degradation combinations obtained by LHS are used in the simulation model. Because each bridge arm contains seven thyristor levels, these combinations yield a total of 14,000 original single-level samples for the simulation-based method development. To construct the datasets, these 14,000 samples were divided into training, validation, and test sets, accounting for 70%, 15%, and 15% of the total data, respectively. Crucially, to prevent information leakage and ensure independence between subsets, all seven single-level samples generated from the exact same operating condition were assigned to the same subset, guaranteeing no overlap between training and testing. Furthermore, during neural network training, feature normalization was performed using only the training set to compute the mean and standard deviation. These exact normalization parameters were subsequently applied to the validation and test sets, ensuring fair and reliable evaluation.
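A condition-grouped split with training-set-only normalization of the kind described above could look like the following Python sketch. The array names and placeholder data are hypothetical; the paper's implementation is in MATLAB.

```python
import numpy as np

rng = np.random.default_rng(0)
n_conditions, n_levels = 2000, 7

# features: one row per single-level sample; condition_id groups the seven
# level-samples generated from the same operating condition.
features = rng.normal(size=(n_conditions * n_levels, 15))   # placeholder data
condition_id = np.repeat(np.arange(n_conditions), n_levels)

# Split by condition (not by sample) so that all seven level-samples of a
# given operating condition end up in the same subset.
perm = rng.permutation(n_conditions)
n_train, n_val = int(0.70 * n_conditions), int(0.15 * n_conditions)
train_ids = perm[:n_train]
val_ids = perm[n_train:n_train + n_val]

subset = np.array(["test"] * len(condition_id), dtype=object)
subset[np.isin(condition_id, train_ids)] = "train"
subset[np.isin(condition_id, val_ids)] = "val"

# Normalization statistics come from the training set only and are then
# applied unchanged to the validation and test sets.
mu = features[subset == "train"].mean(axis=0)
sigma = features[subset == "train"].std(axis=0)
features_norm = (features - mu) / sigma
```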
- (1) Selection of the Mother Wavelet
To extract useful information from the waveforms more effectively, wavelet packet decomposition is employed to decompose the voltage and current waveforms into different frequency bands, and the energy proportion of each band is taken as the feature value. Since some mother wavelets are not suitable for wavelet packet transforms, four specific mother wavelets are selected, namely Daubechies 4 (Db4), Symlet 4 (Sym4), Biorthogonal 1.5 (Bior1.5), and Coiflet 4 (Coif4). Taking the valve voltage signal measured when all thyristor levels are free of aging as an example, the performance of these four mother wavelets at the third decomposition level is compared. At the third decomposition level, the signal is decomposed into eight groups of wavelet coefficients. The ratio of the energy to Shannon entropy of each group of wavelet coefficients is calculated to evaluate the feature extraction performance. This ratio serves as a criterion for signal quality: higher energy implies more concentrated useful information, while lower Shannon entropy indicates less uncertainty or noise. Therefore, a higher ratio means the corresponding mother wavelet is more effective at extracting distinct features. The results are summarized in Table 2.
As can be seen from Table 2, the Db4 mother wavelet has the highest energy-to-Shannon-entropy ratio. Therefore, Db4 is ultimately selected as the mother wavelet for analyzing the voltage and current signals of the thyristor converter valve.
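A sketch of this comparison criterion is given below, assuming the PyWavelets package. The signal is a placeholder (the paper evaluates the measured valve voltage), and averaging the ratio over the eight terminal nodes is an assumption about how the per-band ratios are aggregated.

```python
import numpy as np
import pywt

def energy_entropy_ratio(signal, wavelet, level=3):
    """Mean energy-to-Shannon-entropy ratio over the terminal nodes of a
    wavelet packet decomposition at the given level."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, mode="symmetric",
                            maxlevel=level)
    ratios = []
    for node in wp.get_level(level, order="natural"):
        coeffs = np.asarray(node.data)
        energy = np.sum(coeffs ** 2)
        p = coeffs ** 2 / energy                    # normalized coefficient energies
        entropy = -np.sum(p * np.log(p + 1e-12))    # Shannon entropy
        ratios.append(energy / entropy)
    return np.mean(ratios)

# Placeholder valve-voltage signal: fundamental plus a small high-frequency component.
t = np.linspace(0, 0.02, 4096)
v = np.sin(2 * np.pi * 50 * t) + 0.1 * np.sin(2 * np.pi * 650 * t)

for w in ["db4", "sym4", "bior1.5", "coif4"]:
    print(w, energy_entropy_ratio(v, w))
```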
- (2) Feature Dimensionality Reduction
A three-level wavelet packet decomposition is selected, yielding eight feature values for each waveform. The feature values are first indexed: the eight feature values obtained from voltage decomposition are denoted λ1–λ8, and those obtained from current decomposition are denoted λ9–λ16.
After three-level wavelet packet decomposition of the voltage and current, a total of 16 feature values are obtained. Because these 16 feature values are composed of the energy proportions of each frequency band, a certain degree of redundancy exists, and dimensionality reduction is therefore required. The feature values corresponding to changes in capacitance and resistance are then analyzed. The coefficient of variation and the explained variance ratio are two key indicators, and the results are shown in Table 3. The coefficient of variation is used to measure the degree of dispersion of the data and is calculated as the ratio of the standard deviation to the mean. In feature selection, it helps identify feature values with large fluctuations. The explained variance ratio indicates the contribution of each feature value to the total variance and reflects its importance in the data. A high explained variance ratio means that the feature contains more information and variation and is therefore a major feature.
Table 3 shows the indicators for feature value dimensionality reduction.
The voltage feature set {λ1–λ8} and current feature set {λ9–λ16} are both normalized energy proportions obtained from a three-level wavelet packet decomposition; therefore, the sum of the features within each set is equal to 1. To eliminate linear dependence, one feature can be removed from each set. Considering parameter sensitivity, λ1 and λ9 have the smallest coefficients of variation in the voltage and current feature sets, respectively, under parameter degradation and are therefore discarded as redundant features with the weakest discrimination ability. In addition, λ11 exhibits a low explained variance ratio and limited contribution to parameter variation, and is therefore also removed. Consequently, the retained features are λ2–λ8, λ10, and λ12–λ16. Together with the POS and NEG timing features, the baseline input vector contains 15 elements.
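The two screening indicators can be computed directly from the feature matrix, as in the Python sketch below. The feature matrix here is a placeholder, and the explained variance ratio is taken as each feature's share of the total variance, per the description above.

```python
import numpy as np

def screen_features(X, names):
    """Coefficient of variation and explained variance ratio per feature column."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))       # coefficient of variation
    evr = X.var(axis=0) / X.var(axis=0).sum()         # share of the total variance
    return dict(zip(names, zip(cv, evr)))

# Placeholder matrix: rows = samples, columns = lambda_1 ... lambda_16
rng = np.random.default_rng(0)
X = np.abs(rng.normal(loc=0.1, scale=0.02, size=(500, 16)))
names = [f"lambda_{i}" for i in range(1, 17)]

indicators = screen_features(X, names)

# Features dropped in the paper: lambda_1 and lambda_9 (smallest coefficients
# of variation in their sets) and lambda_11 (low explained variance ratio).
kept = [n for n in names if n not in {"lambda_1", "lambda_9", "lambda_11"}]
print(len(kept), "retained wavelet features")   # 13, plus POS/NEG timings = 15 inputs
```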
- (3) Neural Network Training
Step 1: A BP neural network is a multilayer feedforward network whose training is based on the gradient descent method [21]. Through repeated learning and training, the weights and thresholds are adjusted until the network parameters corresponding to the minimum error are determined, thereby completing training. However, BP neural networks also have some drawbacks, such as a slow training speed and difficulty in determining network parameters.
Because the inputs are the voltage and current feature values together with the feedback-signal times, whereas the outputs are the equivalent insulation resistance and damping capacitance, whose numerical ranges differ greatly, direct training can easily lead to non-convergence of the network. Therefore, the data are first normalized using min-max scaling:

x* = (x − x_min)/(x_max − x_min)

where x is the original value, x_min and x_max are the minimum and maximum of the corresponding variable over the training data, and x* is the normalized value.
Step 2: After normalization, all data fall within the interval [0, 1]. The number of neurons in the hidden layer is determined by combining an empirical formula with practical tests, and its value range is estimated as

h = √(m + n) + a

where h is the number of neurons in the hidden layer, m is the number of neurons in the input layer, n is the number of neurons in the output layer, and a is a constant, typically ranging from 1 to 10.
Step 3: The Grey Wolf Optimizer (GWO) was first proposed by Mirjalili et al. in 2014 [22,23,24]. Its main advantages are simplicity, ease of implementation, fast convergence, and insensitivity to parameter selection, which allow it to effectively avoid local optima. Before training begins, the initial weights are optimized by the GWO algorithm. Multiple candidate weight combinations are first generated to simulate the hunting strategy of grey wolves. The first step is encircling the prey:
D_α = |C_1·X_α − X(t)|,  D_β = |C_2·X_β − X(t)|,  D_δ = |C_3·X_δ − X(t)|

X_1 = X_α − A_1·D_α,  X_2 = X_β − A_2·D_β,  X_3 = X_δ − A_3·D_δ

A = 2a·r_1 − a,  C = 2·r_2,  a = 2(1 − t/t_max)

where t denotes the current iteration number; X(t) denotes the current position of an individual grey wolf; X_α, X_β, and X_δ denote the positions of the alpha, beta, and delta wolves, respectively; X_1, X_2, and X_3 denote their updated positions; D_α, D_β, and D_δ denote the corresponding search steps; A and C are control vectors; r_1 and r_2 are random numbers in (0, 1); a is the convergence factor; and t_max is the maximum number of iterations.
The prey is then attacked. Under the guidance of the three leading wolves, the ω wolves surround and hunt the prey, and the position update equations are as follows:
X(t + 1) = (X_1 + X_2 + X_3)/3

where X_α, X_β, and X_δ are the current positions of the alpha, beta, and delta wolves, respectively; D_α, D_β, and D_δ represent the search steps of the three leading wolves; and X(t + 1) denotes the updated position of an individual grey wolf under the guidance of the alpha, beta, and delta wolves.
Finally, the weight combination with the best fitness value is selected as the initial weight of the neural network.
In neural network training, the GWO algorithm was used to optimize the initial weights of the BP network. The parameters of the GWO were set as follows: a population size of 30, maximum number of iterations of 100, fitness function defined as the mean squared error between network outputs and target values, and search limits within [−1, 1]. The stopping criterion was either reaching the maximum number of iterations or the fitness function converging below 1 × 10⁻⁹. To ensure reliability, each training session was independently repeated five times, and the best weight combination was selected for the final network training. This configuration ensures that the GWO can converge quickly while effectively avoiding local optima during weight optimization.
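For illustration, a minimal Python sketch of the GWO loop under the settings above (population 30, 100 iterations, search limits [−1, 1]) is given below. It is not the authors' MATLAB implementation; the fitness shown is a placeholder, whereas in the paper the fitness is the MSE of the BP network evaluated with the candidate initial weights.

```python
import numpy as np

def gwo(fitness, dim, n_wolves=30, max_iter=100, lb=-1.0, ub=1.0, tol=1e-9):
    rng = np.random.default_rng(0)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))     # candidate weight vectors
    scores = np.array([fitness(x) for x in X])
    for t in range(max_iter):
        order = np.argsort(scores)
        alpha, beta, delta = X[order[0]], X[order[1]], X[order[2]]
        if scores[order[0]] < tol:                    # convergence criterion
            break
        a = 2.0 * (1.0 - t / max_iter)                # convergence factor: 2 -> 0
        for i in range(n_wolves):
            new_pos = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2.0 * a * r1 - a                  # control vector A
                C = 2.0 * r2                          # control vector C
                D = np.abs(C * leader - X[i])         # search step toward this leader
                new_pos += (leader - A * D) / 3.0     # average of the three guides
            X[i] = np.clip(new_pos, lb, ub)
            scores[i] = fitness(X[i])
    best = np.argmin(scores)
    return X[best], scores[best]

# Placeholder fitness (simple quadratic); the real fitness would evaluate the
# BP network's MSE for a given vector of initial weights and biases.
dim = 15 * 10 + 10 * 2 + 10 + 2   # weights and biases of a 15-10-2 network
weights, err = gwo(lambda w: float(np.sum(w ** 2)), dim=dim)
print(weights.shape, err)
```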
Step 4: The sample data are passed from the input layer to the hidden layer, and the output of the hidden layer is calculated through an activation function. The Sigmoid function is selected as the activation function.
Step 5: The output of the hidden layer is further processed by the activation function of the output layer. Since parameter inversion is a regression problem, a linear activation function is used in this paper.
Step 6: The loss function is established as the mean square error (MSE), based on the sum of squared differences between the network output and the target value:

E = (1/N)·Σ_{i=1}^{N} (y_i − ŷ_i)²

where N is the number of training samples, y_i is the target value, and ŷ_i is the corresponding network output.
Step 7: By calculating the gradient of the error, the weights and biases between the input layer and hidden layer and between the hidden layer and output layer are adjusted:

w ← w − η·∂E/∂w,  b ← b − η·∂E/∂b

where w and b denote a weight and a bias, respectively, and η is the learning rate.
Step 8: The above steps are repeated until the parameter inversion accuracy requirement is met or the maximum number of training iterations is reached. The network model is then successfully trained. The training flowchart is shown in Figure 8.
Based on the baseline input–output requirements, the number of input nodes in the neural network was set to 15 and the number of output nodes to 2, and the hidden-node range was estimated to be 5 to 15. Within this range, simulations were conducted for different numbers of hidden nodes, recording the training and validation mean squared errors and the training time.
Table 4 shows that with 10 hidden nodes, both the training and validation errors reached a low level while keeping training time reasonable. Fewer than 10 nodes resulted in insufficient network capacity and higher errors, whereas more than 10 nodes offered limited error reduction but a significantly longer training time. Therefore, 10 hidden nodes were selected to balance network accuracy and training efficiency. The maximum number of iterations was set to 1000, the error threshold to 1 × 10⁻⁹, and the learning rate to 0.01. In this way, the input dataset after parameter feature extraction was used to train the neural network. To ensure the reproducibility of the computational performance and to provide a fair basis for comparing the inversion and training times, all neural network training and parameter inversion simulations were executed on a workstation. The hardware environment consisted of an Intel Core i7-13700K processor (Intel Corporation, Santa Clara, CA, USA) and 16 GB of RAM. The algorithms were implemented and tested using MATLAB R2023a (MathWorks, Natick, MA, USA) running on a Windows 11 operating system (Microsoft Corporation, Redmond, WA, USA). All execution times were recorded under standard conditions without other computationally intensive background tasks.
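To make Steps 4–8 concrete, the following Python sketch trains a 15–10–2 network with a sigmoid hidden layer, a linear output layer, the MSE loss, and the settings listed above (learning rate 0.01, at most 1000 iterations, error threshold 1 × 10⁻⁹). It is an illustrative re-implementation on placeholder data, not the MATLAB code used in the paper, and it omits the GWO weight initialization.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_bp(X, Y, n_hidden=10, lr=0.01, max_iter=1000, tol=1e-9, seed=0):
    """Plain gradient-descent BP training for a single-hidden-layer regressor."""
    rng = np.random.default_rng(seed)
    n, n_in = X.shape
    n_out = Y.shape[1]
    W1 = rng.uniform(-1, 1, (n_in, n_hidden));  b1 = np.zeros(n_hidden)
    W2 = rng.uniform(-1, 1, (n_hidden, n_out)); b2 = np.zeros(n_out)
    for _ in range(max_iter):
        H = sigmoid(X @ W1 + b1)           # hidden-layer output (Step 4)
        Y_hat = H @ W2 + b2                # linear output layer (Step 5)
        err = Y_hat - Y
        mse = np.mean(err ** 2)            # loss function (Step 6)
        if mse < tol:
            break
        # Gradients of the MSE with respect to weights and biases (Step 7)
        dY = 2.0 * err / (n * n_out)
        dW2 = H.T @ dY;                 db2 = dY.sum(axis=0)
        dH = (dY @ W2.T) * H * (1.0 - H)   # sigmoid derivative
        dW1 = X.T @ dH;                 db1 = dH.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1, W2, b2), mse

# Placeholder data: 15 normalized features -> 2 normalized parameters.
rng = np.random.default_rng(0)
X = rng.random((200, 15))
Y = rng.random((200, 2))
_, final_mse = train_bp(X, Y)
print("final training MSE:", final_mse)
```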