1. Introduction
Insulated Gate Bipolar Transistor (IGBT) modules are critical components in modern power electronic systems, enabling energy conversion in applications ranging from renewable energy grids to electric vehicles [1]. Their operational reliability directly affects system safety and economic viability: field statistics indicate that power semiconductor devices account for ≥31% of power converter failures [2]. Among the predominant failure mechanisms, bond wire degradation is the primary failure mode in standard packaging architectures, responsible for ∼70% of module failures [3]. Thermomechanical stress induced by coefficient of thermal expansion (CTE) mismatch between aluminum bond wires (
ppm/K) and silicon chips (
ppm/K) initiates interfacial cracking during thermal cycling [4]. Progressive crack propagation increases electrical impedance and junction temperature; heel cracking has been reported to raise local stress by 77.4 MPa, and lift-off of five wires to raise adjacent bond temperatures by 48.7 °C [5]. Ultimately, this leads to catastrophic open-circuit failure. Consequently, understanding and monitoring bond wire degradation is imperative for predictive maintenance of high-reliability power systems.
Bond wire failure manifests primarily through two distinct modes: heel cracking at the wire-chip interface and lift-off detachment. The former results from cyclic shear stresses (
MPa) during power cycling, while the latter occurs due to cumulative fatigue at bonding points [6]. Crucially, the spatial distribution of failures significantly influences electro-thermal characteristics. Concentrated lift-off on a single IGBT chip increases local current density by ≥35%, elevating junction temperature by
°C and accelerating aging [7]. Multitier wire layouts mitigate thermal imbalance but introduce multicellular electro-thermal effects that exacerbate local overheating when bond contacts degrade non-uniformly [8]. These phenomena underscore the complex interplay between mechanical degradation, electrical redistribution, and thermal runaway, necessitating physics-based monitoring approaches.
Existing bond wire health assessment techniques can be categorized into three primary approaches. Voltage-based methods dominate industrial practice due to implementation feasibility, where collector-emitter on-state voltage (
) monitoring leverages its linear relationship with bond wire resistance (
) [9, 10]. However,
exhibits strong temperature dependence (
), necessitating complex junction temperature compensation that introduces ≥15% error. Dynamic voltage signatures like gate overshoot (
) [11] and collector undershoot (
) [12] provide higher sensitivity but require high-bandwidth measurement circuits susceptible to electromagnetic interference. Current-based approaches utilize signatures like short-circuit current (
) or differential-mode EMI spectrum shifts [13], but suffer from implementation complexity and noise susceptibility [14]. Advanced techniques include module transconductance (
) monitoring [
15], on-state inductance (
) measurement [16], and convolutional neural networks processing gate waveforms [17]. While these offer improved specificity, they typically require specialized sensors or complex calibration procedures. Active sensing techniques such as Spread Spectrum Time Domain Reflectometry (SSTDR) enable non-intrusive fault detection through impedance discontinuity analysis [18], yet struggle with signal superposition in multichip modules. Multiphysics simulation approaches using Finite Element Analysis (FEA) combined with cohesive zone models (CZMs) or Paris law accurately capture crack propagation dynamics at micro-scales [19, 20], but incur prohibitive computational costs (>6 h/simulation) unsuitable for real-time monitoring [21].
The inherent trade-offs between measurement accuracy, implementation cost, and electromagnetic compatibility (EMC) in existing approaches motivate a co-design methodology that addresses these constraints simultaneously. This work resolves these conflicts through a non-intrusive architecture with three key aspects. First, it relies exclusively on standard industrial sensors, such as current shunts and thermocouples, for feature extraction. This eliminates the need for specialized circuitry while maintaining full compatibility with conventional gate-drive designs. Second, a physics-constrained ridge regression meta-model is developed that combines predictions from three gradient boosting models: XGBoost (Extreme Gradient Boosting), LightGBM (Light Gradient Boosting Machine), and CatBoost (Categorical Boosting). These tree-based ensemble algorithms are known for their efficiency, accuracy, and ability to handle complex feature interactions, which are essential for modeling the nonlinear electro-thermal behavior of IGBTs. The meta-model embeds fundamental semiconductor constraints, expressed as
to ensure thermodynamic consistency during parameter drift. Finally, the feature engineering incorporates non-linear and interaction terms—including
,
,
, and
—which exhibit intrinsic noise immunity through quadratic regularization. This hardware-algorithm integration establishes a monitoring paradigm robust to the voltage transients characteristic of modern SiC-based converters while maintaining compatibility with normal operating conditions.
The remainder of this paper is organized as follows:
Section 2 introduces the physics-constrained ensemble framework for bond wire health assessment via
monitoring.
Section 3 details the
-based feature extraction methodology and its integration with the
prediction system.
Section 4 presents the experimental validation platform and results analysis. Conclusions are provided in
Section 5.
4. Integrated Framework Implementation for IGBT Health Monitoring
The proposed framework operates through a four-stage pipeline that transforms raw sensor data into a physics-consistent prediction. The process begins with physics-informed feature extraction, in which raw sensor measurements are acquired and transformed into a 16-dimensional feature vector. This transformation involves computing the total power loss, calculating the normalized thermal gradient, generating three interaction terms, constructing three quadratic features, and applying a logarithmic transformation to the aging cycle count.
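The kinds of transformations listed above can be illustrated with a short sketch. It is not the paper's 16-dimensional feature vector: the input names (`i_c`, `v_ce`, `t_center`, `t_edge`, `cycles`), the use of conduction loss alone as the power term, and the particular interaction and quadratic terms chosen here are all assumptions made for illustration. Only the 15 mm thermocouple spacing is taken from the experimental description.

```python
import math

def extract_features(i_c, v_ce, t_center, t_edge, cycles, spacing_mm=15.0):
    """Build an illustrative feature dict from raw measurements.

    Hypothetical definitions: power loss is approximated by the conduction
    term V*I, and the thermal gradient is the center-to-edge temperature
    difference divided by the thermocouple spacing.
    """
    p_loss = v_ce * i_c                         # conduction loss, W (assumed form)
    grad_t = (t_center - t_edge) / spacing_mm   # thermal gradient, degC/mm
    return {
        "i_c": i_c,
        "t_center": t_center,
        "p_loss": p_loss,
        "grad_t": grad_t,
        # interaction terms: products of two degradation drivers
        "i_c_x_t": i_c * t_center,
        "p_loss_x_grad_t": p_loss * grad_t,
        # quadratic terms
        "i_c_sq": i_c ** 2,
        "t_center_sq": t_center ** 2,
        # log1p keeps the transform finite at zero cycles
        "log_cycles": math.log1p(cycles),
    }

features = extract_features(i_c=40.0, v_ce=1.8, t_center=95.0,
                            t_edge=80.0, cycles=50_000)
```

The logarithm compresses the cycle count so that early-life and late-life samples sit on a comparable scale, which is the usual motivation for such a transform.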
Following feature extraction, three specialized gradient boosting models process the feature vector concurrently. The XGBoost model, configured with 200 estimators and a maximum depth of 6, is trained on late-stage degradation data (exceeding 50,000 cycles) to capture saturation effects. The LightGBM model uses leaf-wise growth with 31 leaves and is optimized for high-temperature conditions, targeting thermal stress modeling. The CatBoost model employs symmetric trees with ordered boosting and is trained on high-current operation data, targeting overcurrent scenarios. Each model generates its own prediction, and the three predictions are passed to the fusion stage.
The dynamic weight adjustment mechanism then computes context-aware fusion weights based on localized model performance. This process begins by constructing a KD-tree
from the historical dataset
using the dominant degradation features
. For the current operating point
, the algorithm queries
for the 50 nearest neighbors
using Mahalanobis distance, which accounts for correlations between degradation drivers. For each model
m, a weighted mean absolute error is computed as:
The final model weights are derived through softmax normalization with a sensitivity parameter
:
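Since the weighting expressions themselves do not appear in this section, the following sketch shows one plausible form of the two steps just described. The inverse-distance kernel, the sign convention of the softmax (lower local error gives a larger weight), the name `beta` for the sensitivity parameter, and the toy numbers are all assumptions rather than the paper's definitions.

```python
import math

def local_weighted_mae(distances, abs_errors, eps=1e-9):
    """Inverse-distance-weighted MAE of one model over the k neighbours."""
    kernel = [1.0 / (d + eps) for d in distances]   # closer neighbours count more
    total = sum(kernel)
    return sum(k * e for k, e in zip(kernel, abs_errors)) / total

def softmax_weights(maes, beta):
    """Softmax over negative errors: lower local MAE -> larger weight."""
    scores = [-beta * m for m in maes]
    top = max(scores)                               # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Toy neighbourhood of 4 historical points (the paper uses 50 neighbours)
distances = [0.5, 1.0, 2.0, 4.0]
errors = {                                          # per-neighbour |error|, V
    "xgboost":  [0.010, 0.012, 0.020, 0.030],
    "lightgbm": [0.004, 0.005, 0.015, 0.040],
    "catboost": [0.020, 0.018, 0.010, 0.008],
}
maes = {name: local_weighted_mae(distances, e) for name, e in errors.items()}
weights = dict(zip(maes, softmax_weights(list(maes.values()), beta=200.0)))
```

In this form a larger `beta` concentrates weight on the locally best model, while `beta = 0` reduces to a uniform average of the three models.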
Finally, the physics-constrained fusion stage combines the weighted predictions through a meta-learner that enforces fundamental semiconductor properties. The prediction matrix
is formed, and the final
prediction is obtained by solving the ridge regression problem with physics-based regularization:
with the composite loss function defined as:
This formulation promotes physical consistency by penalizing violations of two constraints, a positive temperature coefficient of the on-state voltage and a positive current-temperature cross term, so that predictions adhere to the inherent thermal-electrical relationships of IGBT devices. The framework outputs the fused prediction as the final health indicator, providing a robust and physically consistent estimate of bond wire degradation across the device’s operational lifespan.
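As a rough illustration of the fusion step, the sketch below solves a plain ridge regression over the three base-model predictions using the normal equations. It deliberately omits the physics-based penalty, whose exact form is not shown in this section, so it should be read as ordinary ridge stacking and not as the paper's meta-learner. The function name `ridge_fuse`, the value of `lam`, and the toy voltages are hypothetical.

```python
def ridge_fuse(preds, target, lam):
    """Solve (P^T P + lam*I) w = P^T y for the fusion weights w.

    preds: one row per sample, one column per base model.
    """
    m = len(preds[0])
    # Normal-equation matrix A and right-hand side b
    a = [[sum(row[i] * row[j] for row in preds) + (lam if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    b = [sum(row[i] * y for row, y in zip(preds, target)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    w = [0.0] * m
    for i in reversed(range(m)):
        w[i] = (b[i] - sum(a[i][j] * w[j] for j in range(i + 1, m))) / a[i][i]
    return w

# Toy data: columns are XGBoost, LightGBM, CatBoost predictions (V)
preds = [[1.50, 1.52, 1.49], [1.80, 1.79, 1.83],
         [2.10, 2.14, 2.08], [2.40, 2.37, 2.43]]
target = [1.51, 1.80, 2.11, 2.39]
w = ridge_fuse(preds, target, lam=1e-3)
fused = [sum(wi * p for wi, p in zip(w, row)) for row in preds]
```

Because the base-model predictions are strongly correlated, the ridge term is what keeps the normal equations well conditioned. A physics penalty would be added to the same objective as an extra loss term.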
5. Experimental Validation and Analysis
5.1. Experimental Platform and Methodology
The experimental validation platform integrates precision instrumentation designed to validate the physics-constrained monitoring framework, as illustrated in
Figure 4. The power stage consists of a SEMIKRON IGBT module with programmable DC supplies. The measurement system employs a HIOKI MR8875 DAQ (HIOKI E.E. CORPORATION, Nagano, Japan) with
probes achieving ±0.1% accuracy, while thermal management is provided by a precision chamber offering a temperature range of 25–150 °C. Thermocouple placement follows the spatial configuration shown in
Figure 3 with 15 mm spacing to enable
computation.
The experimental methodology encompasses four key phases designed to validate the framework’s capabilities. First, physics-guided monitoring implementation involves sensor deployment with K-type thermocouples installed at the IGBT baseplate center (
) and periphery (
) per
Figure 3, along with
probes using Kelvin connections to minimize measurement error. This configuration enables capture of the normalized thermal gradient
for degradation precursor analysis. Feature acquisition operates the IGBT at
V and
kHz while sweeping
(10–50 A) and
(25–150 °C) to generate the 16-dimensional feature vectors for ensemble training. Physics-constraint validation measures
at
°C under current steps of 30, 40, and 50 A, verifying
to enforce semiconductor physics in meta-model predictions.
Second, baseline characterization stabilizes the IGBT at °C using the thermal chamber, configures V using the DC supply, powers the gate driver with the DC supply (15 V/0.5 A), sweeps from 10 A to 50 A in 10 A increments and repeats this characterization at values of 40 °C, 80 °C, 125 °C, and 150 °C to establish reference under healthy conditions.
Third, accelerated aging protocol applies power cycling with °C, °C, maintaining A during the conduction phase. Continuous monitoring configures the MR8875 DAQ with a 1 ms sampling interval for while recording , fault signals, and values, terminating when degradation is reached to simulate bond-wire fatigue mechanisms.
Fourth, data processing and model training involves dynamic weighting calibration by computing Mahalanobis distances using and optimizing via cross-validation on aging datasets to implement context-aware model fusion. Physics-regularized training employs the meta-model with penalty for violations of and validates high compliance with the IGBT property to ensure predictions obey carrier mobility degradation principles.
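The Mahalanobis neighbour search mentioned above can be sketched as follows. Whitening the data with the Cholesky factor of the covariance matrix turns Mahalanobis distance into ordinary Euclidean distance, which is also what would allow a standard Euclidean KD-tree to be used. For brevity this sketch uses a brute-force search in place of a KD-tree, and the two-feature history (current in A, temperature in °C) is a hypothetical stand-in for the paper's degradation features.

```python
import math

def covariance(rows):
    """Sample mean and covariance matrix of a list of feature rows."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mean, cov

def cholesky(a):
    """Lower-triangular factor low with low @ low^T == a."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def whiten(x, mean, low):
    """Solve low @ z = (x - mean) by forward substitution."""
    z = []
    for i in range(len(x)):
        s = x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    return z

def nearest(query, rows, k):
    """Return the k (mahalanobis_distance, row_index) pairs closest to query."""
    mean, cov = covariance(rows)
    low = cholesky(cov)
    zq = whiten(query, mean, low)
    dists = [(math.dist(zq, whiten(r, mean, low)), idx)
             for idx, r in enumerate(rows)]
    return sorted(dists)[:k]

history = [[10.0, 25.0], [20.0, 60.0], [30.0, 80.0], [40.0, 125.0], [50.0, 150.0]]
hits = nearest([30.0, 80.0], history, k=2)
```

The whitening step requires a positive-definite covariance matrix, so perfectly collinear features would need to be removed or regularized first.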
5.2. Experimental Results and Analysis
This section presents the experimental validation of the proposed physics-guided ensemble framework for IGBT bond wire health assessment. The results cover prediction accuracy, physical consistency, and robustness across diverse operating conditions.
5.2.1. Overall Prediction Accuracy
Figure 5 demonstrates the correlation between predicted and measured
values across 3100 operational samples. The ensemble model achieves MAE = 0.0066 V and R² = 0.9998, indicating high prediction fidelity. Detailed performance metrics are presented in
Table 4. Three methodological innovations contribute to this improvement: the
interaction term explicitly models electro-thermal stress acceleration in bond wires; quadratic features (
,
) provide inherent noise rejection while preserving degradation signatures; and the revised physics regularization ensures predictions strictly adhere to semiconductor principles, eliminating non-physical artifacts.
5.2.2. Adaptive Weighting Performance
Figure 6 provides empirical validation of the context-aware weighting mechanism. The weight distributions are:
Normal: XGBoost (0.35), LightGBM (0.35), CatBoost (0.30)
Overcurrent: XGBoost (0.15), LightGBM (0.20), CatBoost (0.65)
Thermal Stress: XGBoost (0.15), LightGBM (0.70), CatBoost (0.15)
End-of-Life: XGBoost (0.65), LightGBM (0.20), CatBoost (0.15)
The dynamic weighting scheme (Figure 6 and Figure 7) demonstrates specialized model activation: CatBoost dominates at high currents (60–75% weight), LightGBM at high temperatures (65–75%), and XGBoost in late-life stages (60–70%), as quantified in Table 5. Smooth transitions occur along diagonal operational paths. This context-aware selection leverages inherent model strengths: CatBoost’s ordered boosting handles current transients affecting bond wire stress; LightGBM efficiently processes thermal gradients; XGBoost models degradation saturation. The Mahalanobis distance metric in
space (
,
,
) ensures neighborhoods share equivalent aging acceleration profiles, crucial for consistent health assessment during operational transitions.
5.2.3. Physical Constraint Verification
Figure 8 confirms strict adherence to
(0.025–0.075 V/°C) and
(0.02–0.04 V/(A·°C)) across the operational envelope. The
regularization term enforces two fundamental relationships: positive temperature coefficient from reduced carrier mobility (
) and current-dependent heating (
) that exacerbates bond wire thermo-mechanical stress. The verification shows that 99.1% of points satisfy
within the operational envelope (
A,
°C), using a numerical tolerance of
V/(A·°C). Violations occur only in low-stress regions where the derivative magnitude is below this tolerance.
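A compliance figure of this kind can be obtained by evaluating the predictor on a grid and checking the sign of a finite-difference cross-derivative. The sketch below shows the idea under stated assumptions: the constraint is taken to be a non-negative mixed derivative with respect to current and temperature, `v_model` is a hypothetical bilinear surrogate standing in for the trained ensemble, and the grid, step sizes, and tolerance value are illustrative only.

```python
def cross_derivative(f, i, t, di=0.5, dt=0.5):
    """Central finite-difference estimate of d2f/(dI dT) at (i, t)."""
    return (f(i + di, t + dt) - f(i + di, t - dt)
            - f(i - di, t + dt) + f(i - di, t - dt)) / (4.0 * di * dt)

def compliance(f, currents, temps, tol):
    """Fraction of grid points whose cross-derivative is >= -tol."""
    checks = [cross_derivative(f, i, t) >= -tol
              for i in currents for t in temps]
    return sum(checks) / len(checks)

# Hypothetical surrogate for the predictor (coefficients are arbitrary)
def v_model(i, t):
    return 0.9 + 0.015 * i + 0.002 * t + 0.0003 * i * t

currents = range(10, 51, 10)    # A
temps = range(25, 151, 25)      # degC
rate = compliance(v_model, currents, temps, tol=1e-6)
```

The tolerance absorbs numerical noise in regions where the derivative is close to zero, so small negative estimates there are not counted as violations.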
5.2.4. Prediction Residual Analysis Across Operating Conditions
Residual analysis (
Figure 9) confirms prediction consistency across all conditions. Residual distributions are approximately Gaussian with near-zero mean (
V), satisfying the theoretical expectation of unbiased estimation. The ensemble reduces residual spread by 35–62% versus base models, with the tightest distribution under thermal stress (
V). This validates the framework’s noise immunity during high-temperature operation. The residual analysis shows MAE values for each condition: Normal (0.0086 V), Overcurrent (0.0062 V), Thermal Stress (0.0006 V), and End-of-Life (0.0069 V). These values are consistent with the corresponding standard deviations (
). The theoretical relationship
for Gaussian residuals is approximately observed in the Thermal Stress condition; deviations in other conditions reflect non-Gaussian residual distributions, as evidenced by the reported skewness and kurtosis.
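For zero-mean Gaussian residuals, the expected absolute error equals σ·√(2/π), roughly 0.80σ, which is the kind of check used above to compare MAE against standard deviation. The sketch below verifies this ratio numerically. The value of `sigma` is a hypothetical residual spread chosen for illustration and is not taken from the reported results.

```python
import math
import random

def mae_to_sigma_ratio():
    """E|x| / sigma for a zero-mean Gaussian: sqrt(2/pi)."""
    return math.sqrt(2.0 / math.pi)

random.seed(0)
sigma = 0.008   # hypothetical residual standard deviation, V
samples = [random.gauss(0.0, sigma) for _ in range(200_000)]
empirical_mae = sum(abs(x) for x in samples) / len(samples)
empirical_ratio = empirical_mae / sigma
```

A measured MAE/σ ratio that departs noticeably from this value is a quick indication that the residuals are skewed, heavy-tailed, or biased.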
Figure 10 provides insight into error distribution patterns across operating regimes. The ensemble achieves the smallest residual magnitude in all conditions, with reductions of 45% under overcurrent, 18% under thermal stress, and 43% at end-of-life. This pattern is consistent with the dynamic weighting mechanism: the meta-learner identifies the best-suited base model for each operating regime, combines predictions to compensate for individual model weaknesses, and maintains physical consistency through the ridge regression constraints.
5.2.5. Degradation Tracking and Health Assessment
Figure 11 demonstrates accurate aging progression prediction across health states. The ensemble maintains MAE
V even at >20% degradation, validating the effectiveness of aging-sensitive features designed in the framework. The
feature linearizes initial bond wire fatigue progression, while
captures thermal acceleration effects. XGBoost’s progressive weighting (65%) in late stages prevents underestimation of bond wire lift-off—a critical advantage for remaining useful life (RUL) estimation in power modules. The aging trajectory includes 12,000 data points (sampled at 1 ms intervals) providing dense temporal validation of the tracking capability.
Table 6 provides a comprehensive comparison of bond wire monitoring techniques across multiple dimensions. All comparative methods are evaluated on the same dataset under identical experimental conditions to ensure fair comparison. The proposed method demonstrates superior accuracy (0.0066 V MAE), high physical consistency (99.1%), real-time computational capability, low implementation complexity using standard sensors, excellent multi-chip capability through spatial
averaging, high noise immunity via quadratic regularization, and early detection capability through the
precursor.
5.3. Discussion and Comparative Analysis
The experimental validation demonstrates several key innovations of the proposed framework. The physics-algorithm co-design achieves 48.4% MAE reduction versus base models and maintains 99.1% physical constraint compliance. Multi-timescale adaptation combines dynamic weighting for operational transients with aging feature sensitivity for long-term degradation tracking. Early detection capability is demonstrated through the precursor, which identifies degradation 5000 cycles before a 5% increase occurs.
Comparative analysis reveals distinct advantages across multiple dimensions. The proposed method demonstrates superior accuracy (0.0066 V MAE) compared to voltage-based (0.0195 V), current-based (0.0273 V), and advanced techniques including transconductance (0.0142 V) and on-state inductance (0.0167 V) methods. Physical consistency reaches 99.1%, significantly higher than CNN-based gate analysis (82.4%) and voltage-based approaches (84.2%). Implementation complexity remains low through exclusive use of standard industrial sensors, avoiding specialized circuitry required by methods like SSTDR or thermal imaging. Multi-chip capability is excellent due to spatial averaging, overcoming limitations of current-based methods with poor multi-chip performance. Noise immunity is enhanced through quadratic regularization, providing superior performance in EMI-rich power electronics environments compared to voltage and current-based techniques. Early detection capability through the precursor surpasses conventional electrical measurement approaches.
The framework offers practical implications for predictive maintenance systems by providing 15–20% earlier warning through monitoring. Industrial compatibility is ensured through exclusive use of standard sensors without requiring impractical junction temperature monitoring. Scalability to multi-chip modules is enabled by spatial averaging that accommodates non-uniform aging distributions.
Several limitations and future research directions are identified. Thermal measurement currently requires thermocouple placement that necessitates device disassembly; future work will investigate infrared thermography for non-contact monitoring. Cross-coupling effects in parallel chips experiencing non-uniform aging warrant further investigation through distributed temperature sensing arrays. Real-world validation in grid-scale converters is ongoing to assess performance under field conditions with complex loading profiles and environmental variations. Future developments will focus on wireless sensing integration and enhanced remaining useful life (RUL) estimation algorithms for comprehensive prognostic health management systems.
6. Conclusions
This paper has introduced and validated a physics-constrained ensemble learning framework for accurate and robust bond wire health assessment in IGBT power modules. The effectiveness of the proposed methodology is demonstrated through comprehensive experimental evaluation, which confirms significant performance improvements across multiple dimensions.
The framework achieves high prediction accuracy, with the ensemble model attaining a mean absolute error (MAE) of 0.0066 V and a coefficient of determination (R²) of 0.9998 for on-state voltage estimation. This represents a 48.4% reduction in MAE compared to the best-performing individual base model. The integration of physics-based regularization ensures high thermodynamic consistency, with the model maintaining 99.1% compliance with the enforced semiconductor constraint. A key strength of the approach is its context-aware adaptability. The dynamic weighting mechanism activates specialized models under their respective extreme conditions (CatBoost under high current with 60–75% weight, LightGBM under thermal stress with 65–75%, and XGBoost during late-life degradation with 60–70%), resulting in smooth and robust performance transitions across operational regimes. Furthermore, the model demonstrates strong noise immunity, reducing residual spread by 35–62% compared to base models while showing minimal bias. Comparative analysis confirms the practical advantages of the proposed system. It outperforms conventional voltage-based, current-based, and advanced monitoring techniques in accuracy (0.0066 V MAE vs. 0.0195–0.0350 V), physical consistency (99.1% vs. 78.5–92.1%), and implementation simplicity, relying solely on standard industrial sensors without requiring junction temperature measurement or specialized circuitry.
In summary, the framework establishes an effective synergy between data-driven learning and domain knowledge, providing a reliable, accurate, and industrially viable solution for IGBT health monitoring. Future work will focus on non-contact thermal sensing, validation in grid-scale applications, and integration with prognostic algorithms for remaining useful life estimation.