1. Introduction
High-voltage diodes are core components of modern power electronic systems, which are widely used in energy conversion, electric vehicles, industrial automation, and renewable energy; the reliability of these devices directly affects the operational safety and efficiency of the systems in which they occupy a key position [
1]. The accurate prediction of their storage life remains challenging due to complex degradation mechanisms affected by environmental stresses, electrical load variations, and material aging. Traditional degradation prediction methods rely mainly on simplified physical models or statistical approaches, such as the Arrhenius model and the Weibull distribution [2,3], which often fail to capture nonlinear degradation patterns. Among these, the Arrhenius model is based on chemical reaction kinetics and assumes that the degradation rate depends exponentially on temperature. Its mathematical expression is $\lambda = A \exp\left(-\frac{E_a}{kT}\right)$, where $\lambda$ represents the failure rate, $A$ is the frequency factor, $E_a$ denotes the activation energy, $k$ is the Boltzmann constant, and $T$ is the absolute temperature. However, this model suffers from the following limitations: (1) Single-stress assumption: it considers only the influence of temperature stress while neglecting the coupling effects of multiple stresses such as voltage and humidity; (2) Linear degradation assumption: it assumes that the degradation process follows linear patterns, failing to describe the nonlinear degradation behaviors commonly observed in actual devices, such as rapid failure phases following initial slow degradation periods; (3) Parameter estimation difficulties: under limited experimental data conditions, the estimation accuracy of key parameters such as the activation energy is relatively low.
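For concreteness, the short sketch below evaluates the Arrhenius relation numerically; the temperatures and the 0.7 eV activation energy are hypothetical illustration values, not parameters fitted in this study.

```python
import numpy as np

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_rate(temp_k: float, freq_factor: float, e_a_ev: float) -> float:
    """Failure rate lambda = A * exp(-E_a / (k * T))."""
    return freq_factor * np.exp(-e_a_ev / (K_BOLTZMANN_EV * temp_k))

# Acceleration factor between a 110 degC stress level and a 55 degC use
# condition; the frequency factor A cancels, so only E_a matters here.
E_A = 0.7  # hypothetical activation energy in eV
af = arrhenius_rate(383.15, 1.0, E_A) / arrhenius_rate(328.15, 1.0, E_A)
print(f"Arrhenius acceleration factor: {af:.1f}")  # ~35x for these values
```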
The Weibull distribution model describes the failure probability distribution of devices through the shape parameter $\beta$ and the scale parameter $\eta$, with its probability density function expressed as $f(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1}\exp\left[-\left(\frac{t}{\eta}\right)^{\beta}\right]$. The main limitations of this model include (1) Static modeling: it cannot reflect the dynamic evolutionary characteristics of degradation processes, making it difficult to capture the time-varying nature of degradation rates; (2) Strict assumption constraints: it requires failure data to strictly follow the Weibull distribution, and when actual degradation processes deviate from this distribution, prediction accuracy decreases significantly; (3) Poor environmental adaptability: model parameters are typically determined under specific experimental conditions, making it difficult to adapt to environmental changes in practical applications.
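As a companion sketch, fitting a two-parameter Weibull to synthetic failure times with SciPy shows how $\beta$ and $\eta$ are estimated in practice; a single static fit of this kind is exactly what cannot track time-varying degradation rates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic failure times in hours, standing in for real test data.
failures = rng.weibull(2.0, size=50) * 40_000.0

# Two-parameter Weibull fit (location fixed at 0): shape beta, scale eta.
beta, _, eta = stats.weibull_min.fit(failures, floc=0)
print(f"shape beta = {beta:.2f}, scale eta = {eta:.0f} h")
```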
The common weakness of these traditional methods is their inability to effectively handle the nonlinear, multi-stage, and multi-factor coupling characteristics in high-voltage diode degradation processes. Particularly under small-sample conditions of accelerated aging tests, their prediction accuracy and reliability often fail to meet the requirements of engineering applications. In recent years, deep learning methods have demonstrated strong nonlinear modeling capabilities and made significant progress in the field of remaining useful life (RUL) prediction [
4,
5]. Recent studies further validate hybrid architectures: He et al. combined Transformer with Kolmogorov–Arnold networks for bearing RUL prediction [
6]; Xu integrated attention-LSTM with artificial bee colony optimization for lithium batteries [
7].
Among deep learning methods, LSTM networks can effectively deal with the long-term dependency problem in time-series data by virtue of their unique gating mechanism and memory cell structure [
8,
9]. In recent years, LSTM has been widely applied to lifetime prediction. For example, Wang et al. applied LSTM to bearing life prediction, extracting temporal features through a multilayer LSTM network and improving prediction accuracy by 42% over a traditional RNN [10]. Zhang et al. proposed an LSTM-RNN battery life prediction model and introduced the resilient mean-square backpropagation method for adaptive optimization, which strengthens the extraction of key temporal features and significantly improves prediction accuracy [11]. The Transformer model, meanwhile, has received widespread attention since its introduction in 2017 for its parallel computing capability and global modeling advantages [12], and important progress has also been made in its application to lifetime prediction. Chen et al. applied the Transformer to mechanical equipment RUL prediction, using multi-head self-attention to capture long-range dependencies in degradation sequences and improving prediction accuracy by 15% over LSTM [13]. Zhang et al. proposed a fusion model for lithium-ion battery RUL estimation that integrates a stacked denoising autoencoder (SDAE) with a Transformer, improving accuracy over traditional recursive models [14]. Zhang et al. designed a self-attention Transformer for multi-feature battery degradation [15], and Wang et al. proposed multivariate dynamic embedding for industrial time series [16].
The LSTM-Transformer hybrid framework proposed in recent years combines the local time-series modeling capability of LSTM with the global dependency-capturing advantage of the Transformer, and has shown significant potential in time-series prediction tasks; it has been studied extensively for RUL prediction in particular. Lu et al. proposed an LSTM-Transformer hybrid fuel cell lifetime prediction model that, by deeply fusing the two networks, captures the local features of fuel cell performance degradation while effectively modeling the long-term degradation trend, improving both prediction accuracy and generalization ability [17]. In addition, Pentsos et al. proposed an LSTM-Transformer model for power load forecasting that leverages the advantages of both architectures to achieve more accurate and reliable prediction of power consumption [18]. These studies show that the hybrid LSTM-Transformer framework has significant advantages in complex time-series prediction tasks and can effectively handle challenges such as nonlinear degradation patterns and multimodal data.
However, existing research on LSTM-Transformer hybrid architectures has the following limitations: (1) Limited application domains and poor adaptability to the degradation mechanism—existing work concentrates mainly on mechanical equipment, power batteries, and similar fields, while research on degradation prediction for semiconductor devices such as high-voltage diodes is relatively scarce, and the degradation mechanism of high-voltage diodes differs essentially from mechanical wear or battery capacity fade. Unlike the gradual wear of mechanical equipment or the monotonic fade of battery capacity, high-voltage diode degradation exhibits multi-stage nonlinear characteristics: slow drift in the initial stage, accelerated degradation in the middle stage, and sharp failure in the late stage. Existing series/parallel fusion mechanisms cannot adaptively identify and model this complex multi-stage degradation pattern and lack targeted feature extraction and weight adjustment for the different stages, resulting in significant differences in prediction accuracy across degradation stages. (2) Insufficient small-sample capability—accelerated aging tests usually yield only limited data. Existing work is based mainly on large datasets (typically thousands to tens of thousands of sample points), whereas accelerated aging tests of high-voltage diodes can obtain only dozens to hundreds of degradation data points because of high cost, long cycles, and limited equipment. Under such small-sample conditions, conventional LSTM-Transformer hybrids are prone to overfitting, cannot fully extract the key degradation features from sparse data, and lack effective regularization and data augmentation strategies to improve generalization; Pan et al. addressed this via knowledge-based data augmentation [19]. (3) Singular fusion strategies and missing uncertainty quantification—most existing studies use simple series/parallel fusion or weighted averaging, lack fusion mechanisms designed specifically for degradation prediction, cannot effectively integrate local time-series information with global dependencies, and provide no quantification or confidence evaluation of prediction uncertainty.
Compared with existing research on LSTM-Transformer hybrid structures, this study differs significantly in architecture design and fusion strategy: most existing work adopts either a simple series structure (LSTM before the Transformer) or a parallel structure (two paths computed independently and then fused directly), whereas the dual-path residual connection mechanism proposed here both preserves the independence of the two modules and, through residual connections, enables effective transmission of deep features and improved gradient flow. In addition, when processing small-sample data, existing studies rely mostly on data augmentation or simple regularization, whereas the specially designed residual paths and multi-scale feature extraction mechanism of this study allow degradation patterns to be learned and generalized more effectively under limited sample conditions. Regarding uncertainty, traditional LSTM-Transformer hybrids focus mainly on prediction accuracy and lack systematic quantification of prediction confidence; the feature separation afforded by the dual-path architecture in this study provides a richer feature representation for the subsequent uncertainty quantification.
Based on the above analysis, the LSTM-Transformer hybrid architecture proposed in this study differs from previous work in that a degradation-stage-aware dual-path residual connection mechanism is designed for the multi-stage nonlinear degradation characteristics of high-voltage diodes. Through adaptive fusion of the local time-series path and the global attention path, it better integrates temporal information at different scales with global correlations and flexibly captures the relevant features of each degradation stage.
The Artificial Bee Colony (ABC) algorithm is a swarm intelligence optimization algorithm inspired by the foraging behavior of honey bees, proposed by Karaboga in 2005, with the advantages of few parameters and fast convergence [20]. In the field of deep learning hyperparameter optimization, the ABC algorithm shows unique advantages. Erkan et al. compared the performance of ABC, particle swarm optimization (PSO), and the genetic algorithm (GA) in neural network optimization and found that ABC has a significant advantage in parameter space exploration efficiency [21]. Lamjiak et al. proposed an improved ABC algorithm that strengthens the discovery phase of traditional ABC by including neighboring food sources of the parameters, improving both the ability to find the best solution and the optimization efficiency [22]. These studies provide an important theoretical and practical foundation for the improved ABC algorithm (IABC) proposed in this study.
Despite the significant progress of deep learning methods in life prediction, applying them to high-voltage diode life prediction still faces many challenges, such as accurate modeling of complex degradation patterns (especially with small samples), inefficient hyperparameter optimization, and insufficient quantification of prediction uncertainty; the feature extraction and generalization capabilities of existing studies on high-voltage diodes, particularly under accelerated-aging small-sample conditions, still need strengthening. To address these challenges, this study proposes a hybrid LSTM-Transformer framework optimized by an improved artificial bee colony algorithm (IABC). The core contributions of this framework are as follows: (1) Constructing and optimizing an LSTM-Transformer hybrid framework for high-voltage diode degradation prediction. A dual-path residual connection mechanism is carefully designed to deeply fuse the local temporal detail-capturing capability of LSTM with the global context-dependency modeling capability of the Transformer (a minimal sketch is given at the end of this section). Through this fusion strategy, the method retains key local dynamic information while integrating global semantic features under limited sample conditions, significantly improving robust feature extraction and the characterization of nonlinear degradation laws under high noise and complex backgrounds, and providing theoretical and algorithmic support for accurate lifetime prediction. (2) Developing a parameter-type-aware optimization algorithm that efficiently optimizes the model hyperparameters through strategies such as differential neighborhood search, structural constraint handling, and logarithmic-space optimization. (3) Constructing a multi-method lifetime prediction framework that incorporates deep learning trend prediction, stochastic-process fluctuation modeling, and statistical extrapolation, and quantifies the prediction uncertainty.
This study provides high-precision solutions for the reliability assessment of high-voltage electronic devices, which is valuable for the design of smart grids and new energy systems.
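As a concrete, deliberately simplified illustration of the dual-path residual fusion in contribution (1), the PyTorch sketch below runs an LSTM path and a Transformer-encoder path in parallel and merges them through a residual connection; all layer sizes are arbitrary placeholders, and the actual architecture is the one specified in the paper's methods.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Sketch of dual-path residual fusion: an LSTM path for local temporal
    detail and a Transformer-encoder path for global dependencies, merged
    and fed back through a residual connection."""

    def __init__(self, d_model: int = 64, n_heads: int = 2):
        super().__init__()
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=1)
        self.fuse = nn.Linear(2 * d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local, _ = self.lstm(x)          # local time-series path
        global_ = self.transformer(x)    # global attention path
        fused = self.fuse(torch.cat([local, global_], dim=-1))
        return x + fused                 # residual connection aids gradient flow

block = DualPathBlock()
head = nn.Linear(64, 2)  # e.g., next-step current and voltage
out = head(block(torch.randn(8, 11, 64)))[:, -1]  # -> shape (8, 2)
```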
4. Experimental Results and Analysis
4.1. Performance Evaluation Metrics
In this study, the model performance was comprehensively evaluated using four indicators: mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R
2). A smaller MSE, RMSE, and MAE indicate better model performance, and a larger R
2 indicates better model performance. The formulas are, respectively:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \quad \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \quad R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2},$$

where $n$ is the number of samples, $y_i$ is the true value, $\hat{y}_i$ is the predicted value, and $\bar{y}$ is the mean of the true values. These metrics assess the model prediction performance from different perspectives, providing clear directions for model improvement.
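A minimal NumPy implementation of these four metrics, for reference (any standard library such as scikit-learn computes the same quantities):

```python
import numpy as np

def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """MSE, RMSE, MAE, and R^2 exactly as defined above."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MAE": np.mean(np.abs(err)),
        "R2": 1.0 - np.sum(err ** 2) / ss_tot,
    }

print(regression_metrics(np.array([0.89, 0.91, 0.95]),
                         np.array([0.90, 0.90, 0.96])))
```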
4.2. Analysis of the IABC Optimization Effect
4.2.1. IABC Hyperparameter Optimization Effect
To address the complexity of the hybrid model's hyperparameter space, the Improved Artificial Bee Colony algorithm (IABC) proposed in this study significantly improves optimization efficiency through innovative mechanisms such as adaptive integer mutation, structural constraint awareness, and logarithmic-space search. The hyperparameter search space and the optimal hyperparameters for each sample are shown in
Table 2.
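The exact IABC operators are defined in Section 2.2; the sketch below illustrates only the general idea of a parameter-type-aware neighbor step, with a hypothetical search-space specification mirroring the style of Table 2 (step sizes are illustrative).

```python
import math
import random

# Hypothetical search-space specification in the style of Table 2.
SPACE = {
    "lstm_units": ("int", 32, 256),
    "n_heads":    ("int", 1, 8),
    "attn_dim":   ("int", 32, 256),
    "dropout":    ("float", 0.1, 0.5),
    "lr":         ("log", 1e-4, 1e-2),
}

def neighbor(sol: dict, key: str) -> dict:
    """Type-aware perturbation of one hyperparameter (an 'employed bee' step)."""
    kind, lo, hi = SPACE[key]
    new = dict(sol)
    if kind == "int":        # adaptive integer mutation
        new[key] = min(hi, max(lo, sol[key] + random.randint(-8, 8)))
    elif kind == "float":
        new[key] = min(hi, max(lo, sol[key] + random.uniform(-0.05, 0.05)))
    else:                    # learning rate is searched in log10 space
        z = math.log10(sol[key]) + random.uniform(-0.3, 0.3)
        new[key] = min(hi, max(lo, 10.0 ** z))
    # Structural constraint: attention dimension divisible by head count
    # (rounded down here; a full implementation would re-check the bounds).
    new["attn_dim"] -= new["attn_dim"] % new["n_heads"]
    return new

cand = neighbor({"lstm_units": 64, "n_heads": 2, "attn_dim": 64,
                 "dropout": 0.3, "lr": 1e-3}, "lr")
```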
The IABC optimization tailors hyperparameters to individual degradation dynamics by adapting to sample-specific failure mechanisms. For LSTM Encoder Units (32–256), higher values (Sample 4: 69 units) model nonlinear degradation in moisture-sensitive samples (current degradation rate 0.0029
A/h), while lower values (Sample 7: 20 units) suffice for stable devices with monotonic aging. Attention heads (1–8) resolve multi-stage failures: Sample 9’s voltage fluctuations (std. 0.1746 V) require two heads to capture phase transitions, contrasting with Sample 10’s steady decay (1 head). Attention dimension (32–256) scales with feature complexity—Sample 5’s accelerated degradation (high current std 0.2330
A) demands larger dimensions (64) to encode abrupt changes, validated by its divisible pair with heads (64 ÷ 2 = 32). LSTM decoder units (32–256) correlate with prediction horizon; long-life samples (Sample 8: low degradation rate 0.0031
A/h) need deeper decoders (40 units) to extrapolate slow degradation. Dropout rate (0.1–0.5) counters overfitting in noisy regimes: Sample 1’s high data variance (current std. 0.1608
A) uses 0.3426 dropout for regularization. Logarithmic learning rate (0.0001–0.01) stabilizes training—Sample 7’s high rate (0.0100) prevents gradient vanishing during sharp failure events. This physics-aware optimization, enabled by IABC’s type-specific search (
Section 2.2), yields a mean relative prediction uncertainty of 12.57% and reduces the MSE by 48.1%.
Specifically, during the hyperparameter optimization of the 10 samples, as shown in
Figure 5, the best fitness of each sample increased steadily over the IABC iterations, from an initial mean value of 28.62 to a final mean value of 45.23, an average improvement in best fitness of about 58.0% across all samples. At the same time, the average fitness of the entire population rose from about 17.03 at the beginning to about 29.28 at the end, an average increase of about 71.9%. These results demonstrate the effectiveness of the IABC algorithm in searching for better hyperparameter combinations. It is worth noting that the exact fitness values and percentage improvements depend on the definition of the fitness function, but the growth trend reflects the convergence ability of the optimization algorithm. Moreover, the optimization effects differ significantly between samples, reflecting the necessity of personalized optimization and the adaptive nature of the IABC algorithm. Together, these results demonstrate the strong search performance of the IABC algorithm in a complex hyperparameter space.
Figure 5 shows the convergence curves of IABC optimization over 30 iterations: (a) the average best fitness of the 10 samples and (b) the average population fitness. The curves show the overall convergence trend and optimization effect of the IABC algorithm, reflecting its search efficiency and stability; the 58% fitness improvement confirms IABC's efficacy in hyperparameter search for complex deep learning models.
The IABC algorithm significantly improves model performance. Compared to the model with base parameters, the IABC-optimized model improves the average R
2 on the 10-fold cross-validation set from 0.892 to 0.924 (a 3.6% improvement), reduces the average MSE from 0.00574 to 0.00402 (a 33.7% improvement), reduces the average MAE from 0.0489 to 0.0422 (a 13.9% improvement), and reduces the average RMSE from 0.0688 to 0.0593 (a 14.4% improvement). Statistical significance was verified by a
t-test, and the optimized model showed a significant improvement in prediction accuracy on the test set: the average MSE was reduced from 0.00636 to 0.00322 (48.14% improvement,
p = 0.0110 < 0.05), the average MAE was reduced from 0.0587 to 0.0391 (33.4% improvement,
p = 0.0013 < 0.01), and the mean RMSE decreased from 0.0757 to 0.0512 (32.36% improvement,
p = 0.0044 < 0.01).
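The per-sample metrics reported above lend themselves to a paired test, since each diode sample is evaluated under both configurations; a sketch with placeholder MSE values (not the study's actual per-sample numbers):

```python
import numpy as np
from scipy import stats

# Placeholder per-sample test-set MSEs for 10 samples; hypothetical values,
# used only to show the paired-test procedure.
mse_base = np.array([0.0071, 0.0058, 0.0066, 0.0049, 0.0080,
                     0.0061, 0.0055, 0.0074, 0.0052, 0.0070])
mse_iabc = np.array([0.0033, 0.0029, 0.0036, 0.0027, 0.0041,
                     0.0030, 0.0028, 0.0038, 0.0026, 0.0034])

# Paired (dependent-sample) t-test: the same sample is evaluated twice.
t_stat, p_value = stats.ttest_rel(mse_base, mse_iabc)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")  # p < 0.05 -> significant
```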
Figure 6 and
Figure 7 present in detail the optimization effect of the Improved Artificial Bee Colony algorithm (IABC) on the 10 high-voltage diode samples, for both cross-validation and the test set.
Figure 7 shows the comparison of R
2 metrics on independent test sets for the model before and after IABC optimization.
Figure 6 shows the comparison of various performance metrics (R
2, MSE, MAE, RMSE) of the model on the cross-validation set before and after IABC optimization.
To validate the performance of IABC, comparative experiments were conducted with Bayesian optimization and particle swarm optimization (PSO) on identical optimization tasks. The experimental results across runs on the 10 samples demonstrate that IABC consistently outperformed both baseline methods. Specifically, IABC achieved a mean best fitness of 44.48 (±9.50), significantly higher than Bayesian optimization (29.50 ± 6.04) and PSO (26.79 ± 5.44). Moreover, IABC exhibited superior computational efficiency, with a mean optimization time of 12.07 s (±0.80), compared to 18.10 s (±0.90) for Bayesian optimization and 15.14 s (±0.63) for PSO. These results empirically confirm the theoretical advantages of IABC in handling complex hyperparameter optimization tasks for deep learning models.
Figure 8 illustrates the performance comparison between IABC, Bayesian optimization, and PSO across 10 independent runs, highlighting the 33.7% MSE reduction and IABC's superiority over the Bayesian/PSO methods. As described in
Section 2.2, IABC can theoretically handle the complex hyperparameter space of deep learning models more efficiently than standard ABC by introducing mechanisms such as parameter-type-aware search strategies, structural constraint handling, and log-space search for the learning rate.
4.2.2. Verification of Generalizability
In order to further verify the generalization ability of the IABC optimization algorithm, this study applies it to an external validation dataset of NMOS transistors (model C500N8C3-2). The NMOS transistors and the high-voltage diodes come from the same batch of test samples, and all test procedures are consistent; the sensitive parameters are the zero-gate-voltage leakage current and the gate threshold voltage. The same base model and IABC optimization flow as for the high-voltage diode dataset are used on this external dataset to evaluate the performance of the optimization algorithm under a different data source and possibly different data distribution. The experimental results show that the IABC-optimized model exhibits a significant performance improvement over the model with base parameters: the R2 on the 10-fold cross-validation set improves from 0.950 to 0.968 (a 1.7% improvement), the MSE decreases by 36.2% (from 0.00208 to 0.00133), the MAE decreases by 21.5% (from 0.0361 to 0.0284), and the RMSE decreases by 19.1% (from 0.0425 to 0.0344). The overall improvement in these metrics indicates that the IABC-optimized model not only performs well on the high-voltage diode dataset but also achieves considerable performance gains on an external dataset. This result demonstrates that the IABC optimization is not merely overfitting to a specific dataset; rather, the optimization process learns more robust and adaptive model configurations that maintain excellent predictive ability under different data distributions. In summary, through multi-dimensional experimental validation and statistical analysis, the IABC optimization algorithm not only significantly improves model performance on specific datasets but also demonstrates excellent generalization ability, supporting its reliability in broader application scenarios.
4.3. Model Performance Comparison Analysis
The study compares the performance of the proposed IABC-LSTM-Transformer model with six mainstream benchmark models—LSTM, GRU, TCN, Informer, RSM, and Kriging—on 10 samples of high-voltage diodes. For this preliminary comparison, the hyperparameters of the benchmark models are set according to common configurations in the literature and a small number of preliminary experiments; all benchmark models use the same training parameters as IABC-LSTM-Transformer but are not systematically hyperparameter-optimized as with IABC. The comparisons in this section therefore mainly validate the superiority of the proposed hybrid modeling framework, rather than comparing against strictly optimal benchmark models.
The results show that the proposed hybrid modeling framework outperforms in all performance metrics: the mean value of R
2 is 0.932 ± 0.038, which is 5.5% better than LSTM (0.883 ± 0.079), 4.5% better than GRU (0.892 ± 0.077), 8.0% better than TCN (0.863 ± 0.077), 7.5% better than Informer (0.867 ± 0.047), 12.0% better than RSM (0.832 ± 0.050), and 5.3% better than Kriging (0.885 ± 0.041); the mean MSE value of 0.00487 ± 0.00196 is 27.2% lower than LSTM (0.00669 ± 0.00386), 16.6% lower than GRU (0.00584 ± 0.00339), 57.2% lower than TCN (0.01137 ± 0.00479), 44.1% lower than Informer (0.00872 ± 0.00427), 43.5% lower than RSM (0.00862 ± 0.00283), and 23.7% lower than Kriging (0.00638 ± 0.00232). In terms of error control, the MAE (0.0464 ± 0.0077) and RMSE (0.0612 ± 0.0101) are reduced by 8.9% and 13.0%, respectively, compared to the best benchmark model, GRU, and the standard deviations narrow by 46–57%, indicating markedly more stable predictions than all other benchmark models.
Figure 9 demonstrates the average performance and standard deviation of each model on the test set, and
Figure 10 presents the distributions of the predictive performance metrics of each model on the 10-sample test set as box plots, intuitively reflecting the stability and superiority of the proposed model.
Notably, compared to RSM and Kriging models, which typically have comparative advantages in small-sample learning, the proposed IABC-LSTM-Transformer model demonstrates even more significant advantages. In the small-sample condition with only 10 samples, traditional machine learning methods such as RSM and Kriging are generally considered more suitable for handling limited data scenarios; however, the IABC-LSTM-Transformer model in this study not only outperforms the deep learning benchmark models but also significantly surpasses these methods specifically designed for small samples. Compared with RSM, the proposed model achieves a 12.0% improvement in R2, a 43.5% reduction in MSE, a 35.1% reduction in MAE (0.0464 ± 0.0077 vs. 0.0714 ± 0.0125), and a 33.3% reduction in RMSE (0.0612 ± 0.0101 vs. 0.0917 ± 0.0152); compared with Kriging, it achieves a 5.3% improvement in R2, a 23.7% reduction in MSE, 21.7% reduction in MAE (0.0464 ± 0.0077 vs. 0.0592 ± 0.0122), and 22.3% reduction in RMSE (0.0612 ± 0.0101 vs. 0.0788 ± 0.0146), respectively, indicating that the proposed model achieves significantly enhanced prediction stability across different samples.
Figure 9 and Figure 10 show the model performance on the test set: mean metrics with error bars and box plots of the R2/MSE distributions. The hybrid model outperforms the benchmarks with narrower deviations (+12.0% R2 vs. RSM), demonstrating stability under small-sample conditions.
The statistical significance of the performance differences between the models is analyzed by the
t-test, and the results further validate the improvement of the proposed model. As shown in
Table 3, the improvement in R
2 (
p = 0.0057), MSE (
p = 0.0032), MAE (
p = 0.0018), and RMSE (
p = 0.0021) of IABC-LSTM-Transformer is statistically significant when compared to the LSTM model; and when compared to the GRU model, the improvement in R
2 (
p = 0.0389), MSE (
p = 0.0421), MAE (
p = 0.0475), and RMSE (
p = 0.0498) all showed statistical significance, with the MAE metric showing the greatest improvement (12.3% relative reduction). Compared with the TCN model, the proposed model demonstrated highly significant advantages in all indicators (
p < 1 × 10
−5), especially the MAE indicator with a
p-value of 6.4 × 10
−8, indicating its obvious advantage in handling temporal features; compared with the Informer model, the proposed model significantly outperforms in all indicators, with MSE (
p = 0.0006) and MAE (
p = 0.0001) reaching the
p < 0.001 level, and R
2 (
p = 0.0093) and RMSE (
p = 0.0003) also showing statistically significant improvements.
When compared with traditional small-sample models, the IABC-LSTM-Transformer demonstrates even more compelling statistical significance. Against the RSM model, all performance metrics show extremely significant improvements (p < 0.0001), with R2 (p = 5.7 × 10−6), MSE (p = 1.24 × 10−6), MAE (p = 2.39 × 10−7), and RMSE (p = 1.85 × 10−7) all reaching the p < 0.001 level, highlighting the robust superiority of the proposed model over traditional response surface methodology. Compared to the Kriging model, the improvements are also statistically significant, with R2 (p = 0.0127) showing significant difference at p < 0.05 level, while MSE (p = 0.0086), MAE (p = 0.0093), and RMSE (p = 0.0078) all demonstrate highly significant differences at p < 0.01 level. These results further confirm that the proposed IABC-LSTM-Transformer substantially outperforms traditional small-sample modeling approaches that are typically considered advantageous in limited data scenarios.
This aligns with hybrid model trends: Tiane et al. compared CNN-LSTM-GRU-DNN ensembles [
28]; Xu et al. used hybrid deep learning for early battery prediction [
29]. The above results may be limited by the effect of the small sample size (only 10 samples) on the model's learning of complex degradation patterns; the relative advantage of the hybrid model could be further improved in the future by increasing the sample size or introducing data augmentation techniques.
In summary, by integrating an innovative hybrid architecture design with improved artificial bee colony optimization, the IABC-LSTM-Transformer not only outperforms the benchmark models in average performance, but its stability is also verified by statistical analysis. This provides an efficient solution for predicting complex degradation patterns, highlights the key role of optimization strategies and hybrid architectures in enhancing deep learning model performance, and offers a new technological path for the reliability assessment of high-voltage electronic devices.
4.4. Model Interpretability Analysis
In order to gain a deeper understanding of the predictive behavior of the IABC-LSTM-Transformer model and how it captures device degradation features, this study provides a detailed analysis of both predictive performance and feature importance. Intuitive evidence of model interpretability is provided through visualization and quantitative metrics to enhance trust in the model's decision-making process and to guide model optimization in real-world applications.
4.4.1. Detailed Analysis of Model Prediction Performance
The predicted values of the IABC-LSTM-Transformer model on each sample agree closely with the true values, especially in capturing the degradation trend. To visualize the model's fit at the learning level, the normalized data are used here as an example. For Sample 10, at the 11th time point of its test-set sequence (an observation point in the late stage of accelerated aging), the true normalized current is 0.8993 and the IABC-LSTM-Transformer prediction is 0.9012, a relative error of only 0.21%; the true normalized voltage at the same time point is 0.9916 and the prediction is 0.9625, a relative error of 2.93%. These low errors on the normalized scale indicate that the model accurately predicts both current and voltage in the late degradation stage, showing high prediction accuracy. All 10 samples consistently show high accuracy on the normalized data, and their prediction curves accurately reflect the nonlinear trends, laying a good foundation for the subsequent inverse normalization to original units for lifetime extrapolation.
Based on the above high-precision short-term prediction of degradation parameters, this study further employs a multi-stage fusion strategy to predict the final storage life of high-voltage diodes, aiming to combine the deterministic trend with stochastic fluctuations to enhance the robustness of long-term prediction. The strategy integrates three components: (1) the IABC-optimized LSTM-Transformer model for the main nonlinear trend prediction, which dominates the lifetime prediction; (2) a Wiener process model for simulating the stochastic fluctuation component of the degradation process; and (3) statistical trend extrapolation based on robust slopes of the historical data, which analyzes the growth pattern of the historical data and dynamically adjusts the predictive weights of the statistical trend and the Wiener process according to the historical degradation rate. This dynamic weight adjustment mechanism allows the framework to exploit the strengths of the different models according to the actual degradation characteristics of the data, yielding a more robust overall prediction (a schematic code sketch is given below). This multi-model fusion framework balances the strengths of deep learning in complex nonlinear modeling with those of traditional statistical models in trend extrapolation and stochasticity description, to obtain more reliable lifetime prediction results. Representative sample test-set prediction results are shown in
Figure 11.
Figure 11 shows the current and voltage prediction results for three representative samples (Sample 1, Sample 10, and Sample 2) on the test set, including true values, predicted values, and 95% confidence intervals (CIs). The horizontal axis represents the time points (time point 1 to 11) and the vertical axis represents the current and voltage values, respectively.
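A schematic of how the three components might be combined is sketched below; the drift/diffusion estimators and the weighting heuristic are illustrative stand-ins for the dynamic weighting described above, not the study's exact scheme.

```python
import numpy as np

def fused_forecast(dl_trend: np.ndarray, history: np.ndarray,
                   dt: float = 1.0, seed: int = 0) -> np.ndarray:
    """Combine (1) a deep-learning trend forecast, (2) a Wiener-process
    fluctuation component, and (3) robust-slope statistical extrapolation."""
    rng = np.random.default_rng(seed)
    horizon = len(dl_trend)
    increments = np.diff(history)
    drift = np.median(increments) / dt       # robust historical slope
    sigma = increments.std(ddof=1)           # Wiener diffusion estimate

    t = np.arange(1, horizon + 1) * dt
    stat_extrap = history[-1] + drift * t                                   # (3)
    wiener = sigma * np.sqrt(dt) * np.cumsum(rng.standard_normal(horizon))  # (2)

    # Illustrative dynamic weighting: faster historical degradation shifts
    # weight toward the statistical extrapolation.
    w_stat = min(0.4, 10.0 * abs(drift))
    return (1.0 - w_stat) * dl_trend + w_stat * stat_extrap + 0.1 * wiener

hist = np.linspace(0.5, 0.9, 20) + 0.01 * np.random.default_rng(1).standard_normal(20)
pred = fused_forecast(dl_trend=np.linspace(0.9, 1.1, 11), history=hist)
```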
4.4.2. Feature Importance Analysis
As shown in
Figure 12, the feature importance analysis shows that the average importance of the current features across the 10 samples is 65.55%, significantly higher than that of the voltage features (34.45%). This indicates that current-related degradation features play the dominant role in predicting device storage life under the experimental conditions of this study (one possible computation is sketched after the figure caption below). This result provides a valuable reference for future feature engineering, feature selection, and model optimization.
Figure 12 illustrates the average importance weights (in percent) of the current and voltage features over the 10 samples, with the sample number on the horizontal axis and the percent importance on the vertical axis.
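The text does not spell out here how the percentages are obtained; one common choice consistent with per-channel percentages is permutation importance, sketched below under that assumption (the model and metric are hypothetical callables, and the demo model is a dummy).

```python
import numpy as np

def permutation_importance(model, X, y, metric, seed=0) -> np.ndarray:
    """Permutation importance over input channels (e.g., current, voltage).
    X: (n_windows, seq_len, n_features); returns percentages summing to 100."""
    rng = np.random.default_rng(seed)
    base = metric(y, model(X))
    drops = []
    for j in range(X.shape[-1]):
        Xp = X.copy()
        rng.shuffle(Xp[..., j])          # destroy channel j across windows
        drops.append(max(metric(y, model(Xp)) - base, 0.0))
    drops = np.asarray(drops)
    return 100.0 * drops / drops.sum()

# Dummy demo: a 'model' that mostly uses channel 0 (the current-like feature).
rng = np.random.default_rng(1)
X = rng.standard_normal((40, 11, 2))
y = 0.9 * X[:, -1, 0] + 0.1 * X[:, -1, 1]
mse = lambda a, b: float(np.mean((a - b) ** 2))
print(permutation_importance(lambda X: 0.9 * X[:, -1, 0] + 0.1 * X[:, -1, 1],
                             X, y, mse))  # channel 0 dominates
```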
4.5. Lifespan Prediction and Reliability Assessment
4.5.1. Lifetime Prediction and Failure Mode Analysis
Lifetime prediction is performed using a threshold crossing and hybrid extrapolation-based approach. Specifically, the degradation trend of the high-voltage diode is first predicted based on the IABC-LSTM-Transformer hybrid modeling framework, and 2.0
A and 15.0 V are adopted as the failure thresholds for the maximum reverse current and the maximum forward voltage, respectively. During prediction, the method first checks whether a degradation parameter crosses its threshold within the prediction horizon; if it does, the time of the first sustained crossing (at least three of five consecutive points above the threshold) is taken as the predicted lifetime, and the failure mode is determined by the parameter that reaches its threshold first. If the predicted sequence does not cross the threshold within the horizon, a hybrid slope extrapolation method is used: the robust degradation rate of the historical data and the slope at the end of the model prediction are combined into an integrated degradation rate, and the predicted lifetime is obtained by extrapolating to the threshold accordingly (see the sketch after the figure caption below). To assess prediction uncertainty, 95% confidence intervals (CIs) were constructed. The final results show that the average predicted lifetime of the 10 samples is 39,403.3 h (standard deviation 16,961.8 h), the mean relative prediction uncertainty is 12.57%, and the main failure mode in all cases is reverse current exceeding its threshold. Lifetimes differ widely between samples: for example, the predicted lifetime of Sample 12 is 15,498 h (CI: [14,152, 16,844]), while Sample 18 reaches 74,350 h (CI: [66,942, 81,757]). This difference stems mainly from the samples' different current degradation rates: the high current degradation rate of Sample 12 (0.00614) is consistent with its shorter lifetime, while the low degradation rate of Sample 18 (0.00106) is consistent with its longer lifetime. The distribution of lifetimes across samples is right-skewed, with a median of approximately 39,762 h, as shown in
Figure 13.
Figure 13 illustrates the predicted lifetime distribution of the 10 high-voltage diode samples, with the sample number on the horizontal axis and the predicted lifetime (in hours) on the vertical axis, reflecting the significant differences in lifetimes between the samples and the right-skewed distribution characteristics.
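A sketch of the threshold-crossing rule and the fallback hybrid extrapolation follows; the equal-weight blend of the historical and end-of-forecast slopes is an assumption, as the text states only that the two rates are combined.

```python
import numpy as np

I_FAIL = 2.0  # reverse-current failure threshold in A

def first_sustained_crossing(series: np.ndarray, thr: float):
    """First index whose 5-point window has at least 3 points above thr."""
    above = series > thr
    for i in range(len(series) - 4):
        if above[i] and above[i:i + 5].sum() >= 3:
            return i
    return None

def predicted_life_hours(current: np.ndarray, hist_rate: float,
                         dt_hours: float) -> float:
    """Threshold crossing if reached; otherwise hybrid slope extrapolation."""
    idx = first_sustained_crossing(current, I_FAIL)
    if idx is not None:
        return idx * dt_hours
    end_rate = (current[-1] - current[-5]) / (4.0 * dt_hours)  # forecast-end slope
    rate = 0.5 * (hist_rate + end_rate)      # assumed equal-weight combination
    return len(current) * dt_hours + (I_FAIL - current[-1]) / max(rate, 1e-12)

life = predicted_life_hours(np.linspace(1.2, 1.5, 50), hist_rate=6e-5,
                            dt_hours=100.0)
```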
Based on the IABC-LSTM-Transformer hybrid modeling framework, the main predicted failure mode for all 10 high-voltage diode samples is reverse current exceeding its threshold, which is consistent with the typical failure mechanisms under high-temperature, high-humidity (110 °C, 85% RH) conditions and with the dominance of current features in the model (average importance 65.55%); it also suggests that the current model may have limited sensitivity to failure modes that are not current-dominated. Regarding prediction uncertainty quantification, the short-term degradation parameters are predicted with 95% confidence intervals constructed from the first-order difference standard deviation of the historical data and a logarithmic growth scale factor; over the test set, the average relative interval widths are approximately 12% for current and 8% for voltage. In this short-term prediction, if effective coverage is defined as "the predicted value falls within ±10% error of the true value", the coverage rates of the current and voltage predictions reach 92.5% and 98.3%, respectively. For the final storage life prediction, taking into account the predicted life values, the statistical characteristics of the historical data, the extrapolation length, and the confidence level, the average predicted life of the 10 samples is 39,403.3 h (range 15,498 to 74,349.6 h, standard deviation 16,961.8 h), corresponding to an average relative prediction uncertainty of 12.57%.
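A sketch of the interval construction and the ±10% coverage check; the exact form of the logarithmic scale factor is an assumption, since the text specifies only its ingredients.

```python
import numpy as np

def forecast_ci(history: np.ndarray, forecast: np.ndarray, z: float = 1.96):
    """95% CI from the first-difference std of the history, widened by a
    logarithmic growth factor over the forecast horizon (assumed form)."""
    sigma = np.diff(history).std(ddof=1)
    k = np.arange(1, len(forecast) + 1)
    half_width = z * sigma * np.log1p(k)
    return forecast - half_width, forecast + half_width

def coverage_within_10pct(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of predictions within +/-10% of the true value."""
    return float(np.mean(np.abs(y_pred - y_true) <= 0.10 * np.abs(y_true)))
```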
These results show that the model is highly accurate and robust in short-term prediction and provides reliable uncertainty quantification for long-term life prediction, which can be used as a reference for predictive maintenance and the reliability design of power electronic systems [
30]. However, the predicted mean value in this study (39,403.3 h) is high compared to the lifetime of similar devices under high temperature and high humidity conditions (30,000 h) in some of the literature, which may be attributed to the differences in the failure threshold setting (2.0
A current and 15.0 V voltage in this study), the potential optimism of the hybrid extrapolation strategy (especially when degradation is slow in the early part of the historical data), and the specific accelerated test conditions and device lot differences, while the small sample size (10) makes the estimate more sensitive to individual long-life samples.
4.5.2. Ablation Test Analysis
In order to further verify the effectiveness of the fusion prediction framework, this study conducts a detailed comparative analysis of the fusion framework and the standalone IABC-LSTM-Transformer model through ablation experiments. The results reveal the performance differences between the two in terms of predicted lifetime, uncertainty control, and confidence interval width, discussed in the following:
First, in terms of predicted lifetime, the fusion framework and the standalone IABC-LSTM-Transformer model show significant differences. The average predicted lifetime of the fusion framework across the 10 samples is 38,929.98 h, while that of the standalone IABC-LSTM-Transformer model is 49,748.85 h, a difference of more than 10,000 h. Specifically, across samples 1 to 10, the predictions of the fusion framework are generally lower than those of the standalone model: in Sample 1, the fusion framework predicts a lifetime of 31,279.5 h versus 42,988.08 h for the standalone model, a difference of 11,708.58 h; similarly, in Sample 5, the fusion framework predicts 25,578 h versus 42,598.08 h, a difference of 17,020.08 h. This suggests that, by integrating multi-model features and optimizing the prediction mechanism, the fusion strategy may avoid the overestimation or bias that can occur in a single model.
Secondly, the fusion framework demonstrates superior uncertainty control. Its average relative prediction uncertainty is 12.57%, significantly lower than the 13.66% of the standalone IABC-LSTM-Transformer model. For example, in Sample 1 the uncertainty of the fusion framework is 11.17% versus 13.80% for the standalone model, and in Sample 7 it is 13.12% versus 13.19%; although the latter difference is small, the fusion framework still dominates. The overall trend shows that the fusion framework achieves lower relative uncertainty on most samples.
Further analysis of the confidence interval widths shows that the fusion framework also offers better control. Its average confidence interval width is 10,147.53 h, much lower than the 13,596.56 h of the standalone IABC-LSTM-Transformer model, an average difference of 3449.03 h. In Sample 3, for example, the confidence interval width of the fusion framework is 12,933.01 h versus 14,131.68 h for the standalone model, a difference of 1198.67 h; in Sample 5, it is only 5673.79 h versus 11,749.85 h, a difference of more than 6000 h. This indicates that the fusion framework yields more reliable predictions with narrower confidence intervals, reflecting higher predictive certainty.
Taken together, the fusion prediction framework shows clear advantages in both uncertainty control and confidence interval width, verifying the improvement in prediction reliability brought by the fusion strategy. In summary, the fusion prediction framework with the IABC-LSTM-Transformer as its core component delivers stronger overall performance in the lifetime prediction task, reflecting the unique advantages of the fusion strategy in improving prediction accuracy and reliability. This result provides an important reference for subsequent research and applications.
5. Discussion and Conclusions
In this study, a hybrid LSTM-Transformer model based on Improved Artificial Bee Colony Algorithm (IABC) optimization is proposed for predicting the storage life of high-voltage diodes, which significantly improves the prediction accuracy. The experimental results show that IABC optimization enhances the model performance, with the average R2 improving from 0.892 to 0.924 and MSE decreasing by 33.7% (from 0.00574 to 0.00402) on the 10-fold cross-validation set, and the R2 improving from 0.801 to 0.892 and MSE decreasing by 48.1% on the independent test set (from 0.00636 to 0.00322). The improvements in MAE and RMSE also passed statistical significance tests. For the 10 high-voltage diodes tested, the predicted average lifetime was 39,403.3 h with a predicted mean relative uncertainty of 12.57%, and the dominant failure mode was reverse current overrun, consistent with the dominant role of the experimental conditions (110 °C, 85% RH) and current characteristics in the prediction. The predicted lifetime is higher than the value reported in the literature (30,000 h), which may be attributed to the specific failure threshold setting (current: 2.0 A, voltage: 15.0 V), test conditions, and limited sample size.
Despite the significant progress made in this study, several important limitations should be acknowledged. Most critically, the experimental validation was only based on 10 diode samples, which represents a significant constraint on the model’s generalization capability. Furthermore, all tests were conducted under identical environmental conditions (constant temperature and humidity), which does not reflect the varied operating environments that these components typically encounter in real-world applications [
31]. Additionally, the benchmark models used for comparison were not hyperparametrically optimized, which may have overestimated the performance gain of the proposed method [
32].
Future research will address these limitations by (1) substantially expanding the training dataset through both additional physical testing across a diverse range of components and data augmentation techniques to enhance model robustness and generalization capacity [
33,
34]; (2) introducing multi-stress coupling conditions with varied temperature, humidity, and electrical stress parameters to simulate real operating environments; and (3) extending the model to system-level reliability prediction, e.g., for applications in new energy inverters [
35].
In conclusion, by combining advanced optimization techniques with a hybrid deep learning architecture, this study provides a high-precision solution for the lifetime prediction of high-voltage electronic components, demonstrating significant potential for power electronics reliability assessment. However, the current limitations in dataset size and environmental testing conditions must be addressed before practical implementation. Future work will be devoted to improving the generalizability and practicality of the model through expanded datasets with diverse environmental conditions to support the design of high-reliability systems.