Behavioral Fault Diagnosis in Inverter-Driven PMSM Systems Using a Hybrid CNN–BiLSTM–Attention Deep Learning Framework with SHAP-Based Interpretability

Yılmaz, Ümit

doi:10.3390/machines14060638

by

Ümit Yılmaz

Quality Coordination Office, Bursa Technical University, Bursa 16310, Türkiye

Machines 2026, 14(6), 638; https://doi.org/10.3390/machines14060638

Submission received: 6 May 2026 / Revised: 26 May 2026 / Accepted: 27 May 2026 / Published: 1 June 2026

(This article belongs to the Special Issue New Advances in Electric Power Systems and Microgrids)

Abstract

Reliable fault detection and diagnosis (FDD) plays a key role in inverter-driven permanent magnet synchronous motor (PMSM) systems, especially in applications where operational continuity cannot be compromised. In this work, a hybrid deep learning framework is developed by combining one-dimensional convolutional neural networks (CNN), bidirectional long short-term memory networks (BiLSTM), and a multi-head self-attention mechanism. The model targets multi-class fault classification in a three-phase PMSM inverter system. Its effectiveness is evaluated on a publicly available experimental dataset consisting of 10,892 multi-sensor samples collected under nine operating conditions, including normal operation, open-circuit faults, short-circuit faults, and half-bridge overheating scenarios. To avoid temporal data leakage, a block-aware chronological splitting strategy is applied. Model hyperparameters are determined through a validation process involving 24 different configurations. Over five independent runs, the proposed CNN–BiLSTM–Attention model achieves a macro F1-score of 0.9681 ± 0.0195, accuracy of 0.9810 ± 0.0102, Matthews correlation coefficient (MCC) of 0.9757 ± 0.0130, and ROC-AUC of 0.9996 ± 0.0003; its accuracy and MCC are the highest among all evaluated models. The Random Forest baseline attains a marginally higher macro F1-score (0.9747) by operating on temporally aggregated features without temporal modelling, but the proposed model provides superior discrimination across the full confusion matrix structure alongside end-to-end temporal interpretability. Model interpretability is provided through SHAP (SHapley Additive exPlanations) GradientExplainer analysis, which reveals that temperature-related features dominate fault discrimination, particularly for overheating conditions, while current imbalance features are critical for distinguishing open- and short-circuit faults.

Keywords:

fault detection and diagnosis; permanent magnet synchronous motor; inverter; convolutional neural network; bidirectional LSTM; multi-head attention; explainable AI; SHAP

1. Introduction

Permanent magnet synchronous motors (PMSMs) are widely used in industrial and transportation systems, particularly in applications such as electric vehicles, wind energy conversion, and precision manufacturing. Their widespread adoption is largely driven by advantages such as high power density, energy efficiency, and compact design [1,2]. In inverter-driven systems, PMSMs function alongside power electronic converters that adjust the supply voltage through high-frequency switching. This enables flexible speed control and enhances overall system performance. However, the same integration also increases system complexity and brings a wider range of potential fault conditions. These include motor-related faults, such as inter-turn short-circuit (ITSC) and demagnetization, as well as inverter-related faults, including open-circuit and short-circuit switch failures and overheating conditions [3,4]. Because these electromechanical systems are tightly coupled, a fault in a single component may spread across the system within a short time. A failure of this kind may lead to unplanned downtime. It can also increase maintenance costs and raise safety concerns, especially in critical applications [5,6].

Reliable fault detection and diagnosis (FDD) plays an important role in condition-based maintenance of modern electric drive systems. Common signal-processing approaches such as motor current signature analysis (MCSA), Fast Fourier Transform (FFT), and wavelet decomposition are typically used to identify periodic fault patterns, particularly under stable operating conditions [7,8]. In inverter-driven PMSMs, these methods tend to lose effectiveness. Current signals are often non-stationary, operating speed varies, and switching harmonics are present, together masking fault-related spectral features and making reliable detection more difficult [9]. In real-world conditions, degradation does not usually occur in isolation. Different fault types may appear in various forms and with varying levels of severity, reducing the effectiveness of simple binary classification approaches and making more advanced multi-class diagnostic frameworks necessary [10].

Deep learning has improved fault detection and diagnosis (FDD) in electric drive systems. Convolutional neural networks (CNNs) are especially useful because they can learn spatial features and short-term temporal patterns directly from raw sensor data, reducing the need for manual feature extraction and making the diagnostic process more straightforward [2,11]. For time-series fault signals, recurrent architectures such as long short-term memory (LSTM) networks and their bidirectional extension (BiLSTM) are well-suited for capturing sequential dependencies [4,12]. Recent studies have also incorporated attention-based mechanisms inspired by transformer architectures. These approaches allow the model to focus on the most informative time steps and feature channels when identifying faults [13,14]. Using a single model architecture can limit performance. Hybrid approaches, on the other hand, combine different modeling strengths and usually lead to more robust and accurate results within an end-to-end framework [15,16].

Despite their strong performance, many deep learning-based FDD models suffer from limited interpretability. In safety-critical settings, it is not enough to achieve high accuracy. The reasoning behind each fault classification also needs to be clearly understood. SHapley Additive exPlanations (SHAP), which is based on cooperative game theory, offers a model-agnostic framework for quantifying the contribution of each feature to individual predictions [17]. Recent studies have applied SHAP to vibration-based machine learning models for motor diagnostics [18] and to multimodal predictive maintenance systems [19]. Its use in end-to-end deep learning architectures for PMSM inverter fault diagnosis, however, remains limited. Expanding its application in this context would help improve transparency, support operator decision-making, and increase trust in industrial systems.

A hybrid deep learning framework combining CNN, BiLSTM, and a multi-head self-attention mechanism is developed for multi-class fault diagnosis in inverter-driven PMSM systems. The approach is validated using a publicly available multi-sensor experimental dataset consisting of 10,892 samples collected under nine operational conditions [20]. The architecture sequentially extracts local temporal features via CNN, models bidirectional long-range dependencies via BiLSTM, and applies adaptive time-step weighting via a multi-head self-attention sub-layer with residual connections. A block-aware chronological data splitting strategy is adopted to prevent temporal data leakage, and hyperparameters are selected through a 24-configuration validation sweep. SHAP GradientExplainer analysis is applied post hoc to the trained model, yielding physically interpretable feature importance rankings that are validated against the known mechanisms of each fault class. The principal contributions of this work are as follows: (i) a systematic ablation study quantifying the incremental contribution of each architectural component; (ii) a methodologically rigorous evaluation protocol that avoids temporal leakage in block-structured datasets; and (iii) SHAP-based interpretability analysis that links model decisions to measurable physical sensor signatures across nine fault classes.

The remainder of this paper is organised as follows. Section 2 reviews related work on deep learning-based PMSM fault diagnosis, attention mechanisms, and SHAP-based interpretability. The experimental dataset and its structural characteristics are presented in Section 3. Section 4 outlines the proposed methodology, including preprocessing procedures, data partitioning, model architecture, and the training strategy. Comparative performance results and ablation analysis are reported in Section 5. Section 6 provides the SHAP-based explainability analysis. Section 7 discusses the results in the context of related literature, and Section 8 concludes with directions for future research.

2. Related Work

Early deep learning approaches to PMSM fault diagnosis relied predominantly on one-dimensional CNN architectures applied to raw current or vibration signals. Song et al. [6] proposed a multiscale kernel residual CNN for inter-turn short-circuit estimation that demonstrated effectiveness under complex operating conditions. Li et al. [11] extended this direction by developing a mechanism-based fault diagnosis method using time-frequency image representations, achieving over 98.6% accuracy on ITSC and demagnetisation faults. The repeated finding across this body of work is that CNNs effectively extract discriminative local features from sensor time series, but their inherently fixed receptive fields limit the capture of long-range temporal dependencies that span multiple electrical cycles.

To address the temporal modelling limitations of CNNs, recurrent neural network variants have been incorporated into FDD pipelines. Yan and Hu [4] demonstrated that a multiscale residual dilated CNN combined with a BiLSTM layer achieved 4.2% higher accuracy than a standalone CNN and 29.06% higher than a standalone BiLSTM for ITSC and demagnetisation fault diagnosis in ship PMSMs, directly motivating the hybrid design adopted in the present study. Yatak [3] proposed a hybrid deep model for simultaneous inverter-driven and stator winding fault detection in PMSMs, achieving 99.44% and 99.98% accuracy on the two fault categories using multiple signal transforms. Peng et al. [9] proposed a self-attention-enhanced convolutional architecture capable of diagnosing early PMSM faults under multiple unseen operating conditions by modelling long-range dependencies in two-phase current signals without relying on sequential recurrent structures. Lee et al. [12] demonstrated that attention recurrent neural networks could reliably estimate ITSC fault severity across varying operating points, establishing the diagnostic value of attention-gated recurrent processing for severity-sensitive applications. Gmati et al. [21] proposed a BiLSTM-based open-circuit fault diagnosis approach for induction motor drives and reported only marginal accuracy gains from bidirectionality over standard LSTM (98.07% vs. 97.69%), suggesting that the utility of bidirectional temporal modelling is task-dependent and may vary with signal characteristics and fault type.

The recognition that CNN and recurrent components capture complementary aspects of fault signals (local feature patterns versus long-range temporal dependencies) has motivated a growing class of hybrid architectures. Xu et al. [13] proposed a CNN–LSTM–Attention model for PMSM fault diagnosis that achieved at least 97% accuracy with strong adaptability across common fault types, confirming the generalisability of the hybrid design principle. Yang et al. [15] developed a hybrid CNN–BiLSTM–Multi-Head Self-Attention model for rotor motor bearing fault diagnosis that achieved 99.33% accuracy under variable speeds and demonstrated stability in real-world conditions. Fan et al. [14] proposed a large-kernel group convolutional perceptron attention network for ITSC fault diagnosis in PMSMs, in which multi-head self-attention improved both feature representation and interpretability. Overall, these results suggest that combining CNN, BiLSTM, and attention mechanisms generally provides more consistent accuracy gains than using a single model across different PMSM fault scenarios.

Beyond hybrid CNN-recurrent designs, attention-based and transformer-inspired architectures have also been explored for motor fault diagnosis. Zheng et al. [16] developed an interpretable harmonic-aware dual-branch neural network that achieved 99.90% accuracy and 99.91% F1-score under signal disturbances for open-circuit fault diagnosis in dual three-phase PMSMs, integrating SHAP to support interpretability. Sun et al. [22] proposed a 1D-CNN–MLP–cross-attention architecture with a golden cosine scheduler, demonstrating the utility of cross-attention for fusing time-domain and frequency-domain feature representations. Their model achieved 99.83% baseline accuracy and demonstrated strong robustness by maintaining over 90% accuracy even under extreme 0 dB noise conditions. The present study adopts a multi-head self-attention sub-layer with residual connections following the BiLSTM encoder, as this configuration has been shown to provide more stable training than full transformer encoders for low-frequency industrial time series.

A parallel research direction emphasises the fusion of heterogeneous sensor modalities to improve diagnostic comprehensiveness and noise robustness. Fan and Hu [23] reported 98.2% accuracy by fusing vibration, temperature, and electrical signals within an attention-based lightweight architecture suitable for real-time edge deployment. Cömert et al. [10] demonstrated 100% and 98.95% accuracy for ITSC and inter-coil fault detection by combining current and vibration signals in a data fusion framework. Wang et al. [5] showed that the synchronised fusion of current and vibration signals, tuned via Bayesian hyperparameter optimisation, improves robustness of severity estimation for early ITSC diagnosis. The present study evaluates a multi-sensor dataset comprising current, DC bus, temperature, and driver voltage measurements, leveraging the complementary fault-discriminative information across these modalities without requiring external sensor synchronisation.

Hyperparameter optimisation has received increasing attention as a means of improving the generalisation and efficiency of deep FDD models. Wang et al. [5,24] applied Bayesian optimisation for hyperparameter tuning in CNN-based ITSC diagnosis, demonstrating improved accuracy and reduced model complexity. Zhang et al. [25] employed a multi-objective tree-structured Parzen estimator to optimise a residual CNN for ITSC fault diagnosis, achieving 99.62% accuracy with improved noise robustness. In the present study, a systematic grid sweep over 24 configurations is used to select CNN filter counts, BiLSTM unit sizes, dropout rates, and learning rates, providing a transparent and reproducible hyperparameter selection protocol. Exhaustive enumeration is computationally tractable for this four-dimensional discrete grid (2 × 2 × 3 × 2) and eliminates the stochastic coverage gaps inherent to random or Bayesian sampling on sparse grids; reproducibility is also maximised since every configuration is evaluated under identical conditions. The number of attention heads (num_heads = 4) was fixed a priori following standard transformer encoder practice and was not included in the sweep, as the search space was already fully enumerated without this dimension. XGBoost hyperparameters, by contrast, were selected via RandomizedSearchCV because that search space is substantially larger and partly continuous, making exhaustive enumeration infeasible; the two strategies are therefore complementary choices matched to their respective search space sizes.

The use of explainable artificial intelligence (XAI) in deep learning-based FDD systems has become more prominent, especially in safety-critical settings where transparency and regulatory requirements play an important role. Shojaeinasab et al. [17] proposed a unified XAI framework for signal-based models, integrating SHAP-based feature selection with interpretable outputs, and reported improved model simplicity without loss of accuracy. In a related study, Wang and Wang [18] applied SHAP to vibration-based machine learning models for motor fault diagnosis, showing that interpretability can enhance both model reliability and alignment with physical system behavior. Sharma et al. [26] incorporated SHAP into an ensemble model combining CNN, LSTM, and random forest, achieving strong classification performance and enabling real-time analysis of feature contributions. In another study, Awan et al. [27] developed an explainable framework for power electronics fault diagnosis using LIME, SHAP, and attention mechanisms, and evaluated its interpretability on both simulated and real-world datasets. Despite these advances, the coherent integration of SHAP with deep hybrid CNN–BiLSTM–Attention architectures applied to multi-class PMSM inverter fault datasets remains an underexplored area, constituting the primary interpretability contribution of the present study.

Taken together, the surveyed literature demonstrates that the individual components (CNN-based feature extraction, BiLSTM temporal modelling, multi-head attention, and SHAP interpretability) have each been validated independently for motor fault diagnosis. Their systematic combination within a single end-to-end framework, evaluated under a rigorous non-leaking temporal split protocol on a multi-fault PMSM inverter dataset, has not been previously reported. The majority of existing studies either employ random data splits that inflate performance estimates by distributing time-adjacent samples across training and test partitions, or they limit interpretability to global attention visualisations without per-feature Shapley attributions at the fault-class level. The present study addresses both gaps simultaneously, contributing a methodologically robust evaluation and physically validated explainability analysis for the nine-class PMSM inverter fault diagnosis task.

3. Dataset Description

The experimental evaluation uses the publicly available multi-sensor PMSM inverter fault dataset introduced by Bacha [20]. The dataset was collected from a custom-built laboratory test bench comprising a three-phase two-level MOSFET inverter powered by a 15 V DC supply and driving a PMSM converted from a DENSO car alternator. Data acquisition was performed at a sampling frequency of 10 Hz using an Arduino-based system, with motor speed regulated at a constant 10 rad/s via Field-Oriented Control throughout all recording sessions. Speed variation primarily affects the fundamental electrical frequency and the amplitude of current harmonics, whereas load variation modulates steady-state current magnitude and thermal dissipation patterns. Holding the operating point constant ensures that signal variations across fault classes are attributable to fault conditions rather than changes in external loading.

The dataset contains 10,892 samples organised into nine operational classes, as presented in Table 1: one normal operating condition (F0) and eight fault scenarios covering high-side open-circuit faults (F1), low-side open-circuit faults (F2), low-side short-circuit faults (F3), high-side short-circuit faults (F4, F5), and overheating conditions affecting individual or multiple half-bridge modules (F6, F7, F8). Each sample comprises eight raw sensor measurements (two phase currents Ia and Ib, DC bus voltage VDC, DC bus current IDC, three half-bridge temperatures T1, T2, and T3, and driver voltage VD), together with fifteen derived features including physical unit conversions, DC power, AC power, current imbalance, maximum temperature difference, normalised currents, and moving averages and rates of change for key signals.

The distribution of the analysed classes is shown in Figure 1, where F0 constitutes 39.4% of total samples while fault classes range from 3.1% (F4) to 15.9% (F7), indicating a noticeable imbalance typical of real-world data. The dataset has a temporal structure of nine consecutive blocks, each corresponding to a single experimental recording session conducted under a specific operating condition. This block structure affects the experimental design and is discussed further in Section 4.2.

Figure 2 shows the Pearson correlation heatmap for the eight raw sensor channels. A clear negative correlation appears between Ia and Ib (r = −0.77), consistent with the expected phase relationship in a three-phase system. Moderate positive correlations were observed among the temperature sensors T1, T2, and T3 (r = 0.29–0.52), reflecting thermal interactions between adjacent half-bridge modules. The voltage-related channels (VDC, VD, IDC) show weak correlations with both current and temperature signals, indicating that each sensor group captures distinct aspects of system behavior. This distinction supports the multi-sensor fusion approach adopted in the proposed framework.

Figure 3 illustrates the temporal evolution of current imbalance and maximum temperature difference across the complete dataset. In the OC and SC fault regions (F1–F5), current imbalance shows clear transient peaks rather than a stable pattern, whereas the overheating regions (F6–F8) display a gradual increase in temperature difference. These distinct responses suggest that each fault type produces its own characteristic signature in the sensor signals. Figure 4 presents per-class box plots for T1 and Ia, further illustrating the separation between overheating faults, characterised by elevated T1 distributions, and OC/SC faults, distinguished by their Ia amplitude and spread patterns.

4. Methodology

4.1. Pre-Processing and Feature Selection

Among the 25 available features, seven were excluded from the model input based on redundancy and data quality considerations. The Timestamp column encodes data acquisition order and carries no physical sensor information. Ia_arduino and Ib_arduino are alternative calibration estimates of Ia_original and Ib_original with Pearson correlation coefficients exceeding 0.99, and their inclusion would introduce near-perfect multicollinearity without adding diagnostic information. IDC_arduino and IDC_original are both derived from the same DC current channel and are therefore redundant with the retained IDC measurement. Ia_Normalized and Ib_Normalized together contain 28 infinite values resulting from division by zero in low-current transients; since the raw current signals are normalised by StandardScaler within the pipeline, these features are not required. The final feature set consists of 18 variables: eight raw sensor measurements (Ia, Ib, VDC, IDC, T1, T2, T3, VD) and ten derived features (Ia_original, Ib_original, Power_DC, Power_AC, Current_Imbalance, Temp_Diff_Max, VDC_RateOfChange, IDC_RateOfChange, VDC_MovingAvg, IDC_MovingAvg).

Data quality issues identified prior to splitting were addressed as follows. Three features contained infinite values (Current_Imbalance: 21; Ia_Normalized: 14; Ib_Normalized: 14) arising from division operations on near-zero denominators; of these, only Current_Imbalance is retained in the final feature set, since Ia_Normalized and Ib_Normalized are excluded. Four features contained isolated NaN values (VDC_RateOfChange: 1; IDC_RateOfChange: 1; VDC_MovingAvg: 9; IDC_MovingAvg: 9) resulting from differencing and windowing operations at the start of each recording. Infinite values were replaced by NaN, and all NaN values were imputed using the training-set median of the corresponding feature; the medians were computed on the training split only and then applied unchanged to the validation and test splits. This imputation order preserves temporal data integrity and prevents any information flow from the test partition into the training pipeline.
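The imputation order described above can be illustrated with a minimal NumPy sketch. The function names `fit_median_imputer` and `apply_imputer` and the toy arrays are illustrative assumptions, not taken from the original pipeline; the sketch only shows that medians are estimated on the training split and then reused for the other splits.

```python
import numpy as np

def fit_median_imputer(train):
    """Per-feature medians of the training split, ignoring inf and NaN entries."""
    cleaned = np.where(np.isfinite(train), train, np.nan)
    return np.nanmedian(cleaned, axis=0)

def apply_imputer(split, medians):
    """Replace inf and NaN entries of any split with the training-set medians."""
    cleaned = np.where(np.isfinite(split), split, np.nan)
    rows, cols = np.where(np.isnan(cleaned))
    cleaned[rows, cols] = medians[cols]
    return cleaned

# Toy data with two features (values are illustrative only).
train = np.array([[1.0, 10.0], [np.inf, 20.0], [3.0, np.nan], [5.0, 30.0]])
test = np.array([[np.nan, 40.0], [7.0, -np.inf]])

medians = fit_median_imputer(train)          # statistics come from training data only
train_filled = apply_imputer(train, medians)
test_filled = apply_imputer(test, medians)   # test split never contributes statistics
```

Because the test split is only ever transformed, never fitted, no statistic derived from held-out samples can reach the training pipeline.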

4.2. Block-Aware Data Splitting

The dataset is organised in temporal blocks, with each fault class captured as one continuous recording segment. A similar structure has been reported in other inverter-driven synchronous motor fault datasets [29]. A random stratified split would distribute time-adjacent samples across partitions, constituting data leakage and producing optimistically biased accuracy estimates.

To address this, a block-aware chronological splitting strategy is adopted. Within each class block, samples are partitioned sequentially into 65% training, 20% validation, and 15% test sub-blocks. Windows are subsequently constructed independently within each sub-block, ensuring that no sliding window crosses a training–validation–test boundary. This procedure yields 1393 training windows, 413 validation windows, and 306 test windows across all nine classes. The increased validation fraction of 20% (compared to the symmetric 15%/15% split used in preliminary experiments) was adopted to stabilise hyperparameter selection, as a 305-window validation set produced several degenerate configurations with perfect validation F1 = 1.0 due to insufficient sample diversity.
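A minimal sketch of the block-aware chronological split is given below. It assumes that class blocks can be recovered from changes in the label sequence and that sub-block sizes are obtained by truncation, with the test sub-block taking the remainder; the function name `block_aware_split` and these rounding details are assumptions made for illustration.

```python
import numpy as np

def block_aware_split(labels, fractions=(0.65, 0.20, 0.15)):
    """Split every contiguous class block chronologically.

    Returns a dict mapping split name to a list of (class, start, stop)
    index ranges, so that windows can later be built inside each range.
    """
    labels = np.asarray(labels)
    # A new block starts wherever the class label changes.
    change = np.flatnonzero(labels[1:] != labels[:-1]) + 1
    starts = np.r_[0, change]
    ends = np.r_[change, len(labels)]
    splits = {"train": [], "val": [], "test": []}
    for start, end in zip(starts.tolist(), ends.tolist()):
        n = end - start
        n_train = int(n * fractions[0])   # earliest samples of the block
        n_val = int(n * fractions[1])     # middle samples
        cls = int(labels[start])
        splits["train"].append((cls, start, start + n_train))
        splits["val"].append((cls, start + n_train, start + n_train + n_val))
        # The test sub-block takes the trailing remainder of the block.
        splits["test"].append((cls, start + n_train + n_val, end))
    return splits

# Toy label sequence: a 100-sample block of class 0, then 40 samples of class 1.
labels = [0] * 100 + [1] * 40
splits = block_aware_split(labels)
```

Each returned range is contiguous in time and belongs to exactly one partition, which is the property that keeps time-adjacent samples from straddling a partition boundary.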

The temporal structure visible in Figure 3 (transient current imbalance peaks in F1–F5 and a gradual temperature ramp-up in F6–F8) raises the question of whether boundary artefacts near block edges influence evaluation. Because the sequential split is applied within each class block, the early transient region of each class falls predominantly in the training sub-block. The test sub-block corresponds to the trailing 15% of each class recording, capturing quasi-steady-state fault signatures after startup transients have settled. Windows are constructed exclusively within each sub-block with no window crossing a block boundary, so neither the training–validation nor the validation–test boundary generates contaminated windows that mix transient and steady-state samples. The reported test performance therefore reflects the model’s ability to diagnose established fault conditions; fault onset detection represents a distinct diagnostic problem requiring a different experimental protocol.

4.3. Sliding Window Construction

Sequential sensor measurements are organised into fixed-length sliding windows to provide temporal context for the recurrent and attention components of the proposed architecture. A window size of w = 15 samples (1.5 s at 10 Hz) and a stride of s = 5 samples (0.5 s) were selected based on the window size sensitivity analysis presented in Section 5.3. At 10 Hz, the acquisition system resolves thermal dynamics and steady-state current asymmetry (the persistent, low-frequency signatures that each fault class imprints on the sensor channels) but cannot capture high-frequency switching transients such as PWM-induced voltage spikes at the inverter switching frequency, which typically falls in the kHz range. The 1.5 s window captures the slowly evolving fault envelope rather than instantaneous switching behaviour, which is consistent with the nature of the features retained in the final feature set. Regarding cycle coverage, the motor operates at a constant speed of 10 rad/s; the fundamental mechanical frequency is therefore approximately 1.6 Hz, and the 1.5 s window spans roughly 2.4 mechanical cycles, which is sufficient to capture the repeating current asymmetry and thermal patterns that characterise each fault class under steady-state conditions. Each window is represented as a tensor of shape (15, 18) corresponding to 15 consecutive time steps and 18 input features. Windows are constructed per class within each split; no window crosses a class boundary, preserving the semantic integrity of fault episodes.
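The window construction and the cycle-coverage arithmetic can be sketched as follows. The helper `make_windows` and the synthetic 100-sample sub-block are illustrative assumptions; only w = 15, s = 5, the 10 Hz sampling rate, and the 10 rad/s speed come from the text.

```python
import numpy as np

def make_windows(block, window=15, stride=5):
    """Cut one (n_samples, n_features) sub-block into overlapping windows."""
    n = len(block)
    if n < window:
        # Sub-block too short to hold a single window.
        return np.empty((0, window, block.shape[1]))
    starts = range(0, n - window + 1, stride)
    return np.stack([block[s:s + window] for s in starts])

# Synthetic sub-block: 100 time steps, 18 features (placeholder values).
sub_block = np.arange(100 * 18, dtype=float).reshape(100, 18)
windows = make_windows(sub_block)

# Cycle coverage: 10 rad/s corresponds to 10 / (2*pi) mechanical revolutions per second.
sampling_hz = 10.0
mech_freq_hz = 10.0 / (2 * np.pi)
cycles_per_window = (15 / sampling_hz) * mech_freq_hz
```

A sub-block of n samples yields floor((n − w)/s) + 1 windows, so the 100-sample example produces 18 windows of shape (15, 18). Calling `make_windows` once per sub-block, rather than on the concatenated recording, is what keeps windows from crossing class or partition boundaries.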

StandardScaler normalisation is applied by fitting the scaler on the training split feature matrix and applying the identical transformation to the validation and test splits. Class imbalance is handled through weighted loss during deep learning training (class weights computed using sklearn compute_class_weight with the balanced strategy) and through sample weights for the XGBoost baseline model.
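The scaling and class-weighting steps can be sketched in plain NumPy. The two helpers below are written to mirror the behaviour of scikit-learn's StandardScaler (population standard deviation, zero-variance features left unscaled) and the balanced heuristic n_samples / (n_classes × class count); the function names and toy arrays are assumptions for illustration, and the original pipeline uses the scikit-learn implementations directly.

```python
import numpy as np

def fit_standardizer(train_features):
    """Mean and population std per feature, estimated on the training split only."""
    mean = train_features.mean(axis=0)
    std = train_features.std(axis=0)
    std[std == 0] = 1.0  # constant features are left unscaled
    return mean, std

def balanced_class_weights(y):
    """Balanced heuristic: n_samples / (n_classes * count of each class)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy data (illustrative values only).
train = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
test = np.array([[7.0, 2.0]])
mean, std = fit_standardizer(train)
scaled_train = (train - mean) / std
scaled_test = (test - mean) / std      # same transformation, no refitting

class_weights = balanced_class_weights(np.array([0, 0, 0, 1]))
```

In the toy label vector the minority class receives a weight of 2.0 and the majority class a weight of 2/3, so each class contributes equally to the weighted loss in aggregate.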

4.4. Proposed Architecture: CNN–BiLSTM–Attention

The proposed model processes windowed multi-sensor sequences through four sequential functional blocks, as illustrated schematically in Table 2.

The CNN block consists of two one-dimensional convolutional layers with ReLU activation, batch normalisation, and dropout. The first layer applies fc filters of kernel size 3 to the input sequence, extracting local temporal patterns across the 18-dimensional feature space. The second layer doubles the filter count to 2fc, deepening the hierarchical representation. MaxPooling is deliberately omitted to preserve the temporal resolution required by the subsequent recurrent layers.

The BiLSTM block comprises two stacked bidirectional LSTM layers. The first processes the CNN output sequence in both forward and backward directions using lu × 2 recurrent units per direction, capturing long-range temporal dependencies from both past and future context within the window. The second BiLSTM layer uses lu recurrent units per direction, producing a sequence of hidden states of dimensionality lu × 2 that serves as input to the attention sub-layer.

The multi-head self-attention sub-layer computes scaled dot-product attention with four heads and key dimensionality 16 over the BiLSTM output sequence. A residual connection adds the attention output to the BiLSTM sequence, followed by layer normalisation. A position-wise feed-forward sub-layer with dimensionality lu × 2 and dropout is then applied with a second residual connection and layer normalisation, following the standard transformer encoder block design [30].
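To make the attention computation concrete, the following NumPy sketch applies scaled dot-product self-attention with four heads and key dimensionality 16 to a random sequence of shape (15, 64), which corresponds to lu = 32. It is a simplified illustration under stated assumptions: weights are random rather than trained, bias terms and the learned scale and shift of layer normalisation are omitted, and the feed-forward sub-layer is not shown.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalise each time step over the feature axis (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_self_attention(x, wq, wk, wv, wo):
    """x: (T, d_model); wq, wk, wv: (heads, d_model, d_k); wo: (heads, d_k, d_model)."""
    q = np.einsum("td,hdk->htk", x, wq)
    k = np.einsum("td,hdk->htk", x, wk)
    v = np.einsum("td,hdk->htk", x, wv)
    # Scaled dot-product scores between every pair of time steps, per head.
    scores = np.einsum("htk,hsk->hts", q, k) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    context = np.einsum("hts,hsk->htk", weights, v)
    # Output projection; summing over heads equals concatenation followed by a dense layer.
    return np.einsum("htk,hkd->td", context, wo), weights

rng = np.random.default_rng(0)
T, d_model, heads, d_k = 15, 64, 4, 16
x = rng.normal(size=(T, d_model))                  # stand-in for the BiLSTM output
wq, wk, wv = (rng.normal(scale=0.1, size=(heads, d_model, d_k)) for _ in range(3))
wo = rng.normal(scale=0.1, size=(heads, d_k, d_model))

attn_out, attn_weights = multi_head_self_attention(x, wq, wk, wv, wo)
attended = layer_norm(x + attn_out)                # residual connection + layer norm
```

Each of the four heads produces a 15 × 15 weight matrix whose rows sum to one, which is the adaptive time-step weighting referred to in the text.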

The classifier head applies GlobalAveragePooling1D to aggregate the attended sequence into a fixed-size vector, followed by two dense layers with ReLU activation and dropout, and a final softmax output layer with nine units corresponding to the nine operational classes.
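The four blocks described in this section can be assembled into a Keras model along the following lines. This is a sketch under several assumptions that the text does not specify: "same" padding in the convolutions, the ordering of activation, batch normalisation, and dropout, a ReLU activation in the feed-forward sub-layer, and the widths of the two dense layers in the classifier head (64 and 32 here are hypothetical). The function name `build_cnn_bilstm_attention` is likewise illustrative.

```python
def build_cnn_bilstm_attention(window=15, n_features=18, n_classes=9,
                               fc=64, lu=32, dropout=0.2):
    """Sketch of the CNN-BiLSTM-Attention model; unspecified details are assumed."""
    # Imported lazily so that this module loads even where TensorFlow is absent.
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(window, n_features))
    x = inputs
    # CNN block: two Conv1D layers (fc, then 2*fc filters), no pooling.
    for filters in (fc, 2 * fc):
        x = layers.Conv1D(filters, 3, padding="same", activation="relu")(x)  # padding assumed
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(dropout)(x)
    # BiLSTM block: 2*lu units per direction, then lu units per direction.
    x = layers.Bidirectional(layers.LSTM(2 * lu, return_sequences=True))(x)
    x = layers.Bidirectional(layers.LSTM(lu, return_sequences=True))(x)
    # Attention sub-layer with residual connection and layer normalisation.
    attn = layers.MultiHeadAttention(num_heads=4, key_dim=16)(x, x)
    x = layers.LayerNormalization()(layers.Add()([x, attn]))
    # Position-wise feed-forward sub-layer of width 2*lu with a second residual.
    ff = layers.Dense(2 * lu, activation="relu")(x)  # activation assumed
    ff = layers.Dropout(dropout)(ff)
    x = layers.LayerNormalization()(layers.Add()([x, ff]))
    # Classifier head: pooling, two dense layers (hypothetical widths), softmax.
    x = layers.GlobalAveragePooling1D()(x)
    for units in (64, 32):
        x = layers.Dense(units, activation="relu")(x)
        x = layers.Dropout(dropout)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)
```

Calling `build_cnn_bilstm_attention().summary()` prints the layer-by-layer output shapes, which is a quick way to confirm that the temporal dimension of 15 is preserved up to the pooling layer.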

4.5. Hyperparameter Selection

Hyperparameters for the proposed model were selected through a systematic grid sweep over 24 configurations covering: filters fc ∈ {64, 128}, LSTM units lu ∈ {32, 64}, dropout rate ∈ {0.1, 0.2, 0.3}, and learning rate lr ∈ {10⁻³, 5 × 10⁻⁴}. Each configuration was trained for up to 60 epochs with early stopping (patience = 8) and evaluated on the validation set using macro F1. Figure 5 presents the mean validation F1 and standard deviation for each hyperparameter value, averaged across all configurations sharing that value. The best configuration (fc = 64, lu = 32, dropout = 0.2, lr = 10⁻³) achieved a validation F1 of 1.000 and was chosen as the most parsimonious configuration at the top of the ranking. All ablation and competitor models were constructed using the same fc, lu, and dropout values to ensure fair component-wise comparison.
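The exhaustive enumeration of the 2 × 2 × 3 × 2 grid can be sketched with the standard library. The selection rule shown here (highest validation F1, ties broken in favour of fewer filters and then fewer LSTM units) is an assumed reading of "most parsimonious", and the scores in `demo_results` are placeholders rather than values from the actual sweep.

```python
from itertools import product

grid = {
    "fc": [64, 128],
    "lu": [32, 64],
    "dropout": [0.1, 0.2, 0.3],
    "lr": [1e-3, 5e-4],
}
# Every combination of the four hyperparameters: 2 * 2 * 3 * 2 configurations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]

def select_best(results):
    """Pick the highest-scoring config; ties go to the smaller model (assumed rule)."""
    return min(results, key=lambda r: (-r[1], r[0]["fc"], r[0]["lu"]))[0]

# Placeholder scores: pretend every dropout = 0.2 configuration ties at 1.0.
demo_results = [(cfg, 1.0 if cfg["dropout"] == 0.2 else 0.9) for cfg in configs]
best = select_best(demo_results)
```

Because the grid is small and fully discrete, every configuration is visited exactly once, which is what makes the sweep reproducible without a sampling seed.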

XGBoost hyperparameters were selected via RandomizedSearchCV with 30 iterations and five-fold stratified cross-validation on the training windows, optimising for macro F1. The search covered n_estimators ∈ [100, 500], max_depth ∈ [3, 9], learning_rate ∈ [0.01, 0.20], subsample ∈ [0.6, 1.0], colsample_bytree ∈ [0.6, 1.0], and min_child_weight ∈ [1, 10]. The best configuration achieved a cross-validation macro F1 of 0.9657.

4.6. Training Protocol

All deep learning models were compiled with the Adam optimiser and sparse categorical cross-entropy loss. Training proceeded for up to 100 epochs with early stopping (patience = 12, monitor = val_loss, restore_best_weights = True) and ReduceLROnPlateau (factor = 0.5, patience = 5, minimum learning rate = 10⁻⁶). A mini-batch size of 32 was used throughout. Class imbalance was addressed by passing per-class weights to the Keras class_weight argument during training, computed as described in Section 4.3.

To quantify stochastic training variability, each deep learning model was trained five times using independently initialised random seeds (42, 43, 44, 45, 46). In each run, all stochastic elements (weight initialisation, mini-batch sampling order, and dropout masks) are re-seeded independently. Performance metrics are recorded on the fixed held-out test set for each run, and the standard deviation across the five values quantifies sensitivity to training stochasticity. This protocol does not incorporate data resampling uncertainty such as bootstrap confidence intervals, as the block-aware split is fixed to preserve temporal integrity; the reported standard deviation therefore reflects initialisation and optimisation variance only. Classical ML models are deterministic given a fixed random seed; therefore, single-run results are reported for these baselines without standard deviation. All experiments were conducted on Google Colaboratory using an NVIDIA Tesla T4 GPU (16 GB VRAM). The five independent training runs for the proposed model were completed in approximately 6 min in total (approximately 1–2 min per run), with early stopping terminating training before the 100-epoch limit in all runs. The complete experimental pipeline, including exploratory analysis, hyperparameter sweep, all baseline and ablation model training, and SHAP attribution computation, was completed in under 50 min.
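The mean ± standard deviation reporting used for the five-seed protocol can be sketched as follows. The text does not state whether the population or the sample standard deviation is reported, so the helper exposes both; the function name and the placeholder scores are hypothetical and are not results from the study.

```python
import statistics

SEEDS = (42, 43, 44, 45, 46)  # seeds listed in the text

def summarise_runs(scores, sample=False):
    """Return (mean, std) of one metric over repeated runs.

    sample=False gives the population standard deviation (divide by n);
    sample=True gives the sample standard deviation (divide by n - 1).
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return mean, std

# Placeholder scores for illustration only; these are NOT results from the study.
placeholder_f1 = [0.95, 0.96, 0.97, 0.98, 0.99]
mean, std = summarise_runs(placeholder_f1)
print(f"{mean:.4f} ± {std:.4f}")
```

With only five runs the two estimators differ noticeably, so stating which one is used helps readers compare reported spreads across papers.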

4.7. Evaluation Metrics

Model performance is assessed using four complementary metrics that together provide a comprehensive view of classification quality under class imbalance. Macro-averaged F1-score (Macro F1) assigns equal weight to each class regardless of sample count and is therefore sensitive to minority-class performance. Matthews Correlation Coefficient (MCC) is a single scalar summary of the full confusion matrix that, in its multi-class form, accounts for every cell of the matrix simultaneously and is considered one of the most informative metrics for multi-class imbalanced classification. Overall accuracy measures the fraction of correctly classified test windows. The macro-averaged Area Under the ROC Curve (AUC) measures how effectively the model separates classes across different decision thresholds. Precision, recall, and F1-score are also reported for each class individually.
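To make the two confusion-matrix metrics concrete, the sketch below computes macro F1 and the multi-class generalisation of MCC directly from a square confusion matrix (rows are true classes, columns are predicted classes). The function names and the toy matrix are illustrative assumptions; in practice a library routine would normally be used.

```python
import math

def macro_f1(cm):
    """Macro F1 from a square confusion matrix (rows = true, cols = predicted)."""
    k = len(cm)
    f1_scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / k

def multiclass_mcc(cm):
    """Multi-class Matthews correlation coefficient from a confusion matrix."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                           # total samples
    c = sum(cm[i][i] for i in range(k))                       # correctly classified
    t = [sum(cm[i]) for i in range(k)]                        # true count per class
    p = [sum(cm[r][i] for r in range(k)) for i in range(k)]   # predicted count per class
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) *
                    (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0

# Toy two-class matrix for illustration only.
toy_cm = [[3, 1],
          [1, 3]]
print(macro_f1(toy_cm), multiclass_mcc(toy_cm))
```

Macro F1 averages per-class scores, so a single weak minority class pulls it down, whereas MCC pools all cells of the matrix; this is why the two metrics can rank models differently.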

5. Results

5.1. Comparative Performance

Table 3 summarises the performance of all evaluated models. Among the classical ML baselines, Random Forest attains the highest macro F1 score of 0.9747, exceeding the performance of the tuned XGBoost (F1 = 0.9406) and MLP (F1 = 0.9092). This result is noteworthy because all tabular baselines aggregate the sliding window into a single mean feature vector, discarding temporal structure. The competitive performance of Random Forest under these conditions indicates that fault-discriminative information is partially encoded in cross-sensor relationships rather than exclusively in temporal dynamics, consistent with the strong inter-class separation visible in the box plots of Figure 4.
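The aggregation applied for the tabular baselines, as described above, reduces each window of shape (T, F) to a single length-F mean vector. The minimal sketch below (function name and toy values are assumptions) shows why this discards temporal structure: two windows with opposite trends map to the same feature vector.

```python
def window_to_mean_vector(window):
    """Collapse a window of shape (T, F) into a length-F vector of per-feature means."""
    n_steps = len(window)
    n_features = len(window[0])
    return [sum(step[f] for step in window) / n_steps for f in range(n_features)]

# Two toy windows with the same per-feature means but opposite trends.
rising = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
falling = [[3.0, 10.0], [2.0, 10.0], [1.0, 10.0]]
print(window_to_mean_vector(rising) == window_to_mean_vector(falling))
```

Any information a tabular model uses after this step must therefore come from the levels of, and relationships between, the sensor channels rather than from their evolution within the window.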

Among the deep learning sequence models, CNN only achieves a macro F1 of 0.9754 ± 0.0127, which is comparable to Random Forest, confirming that local convolutional feature extraction provides strong discriminative capacity even without explicit temporal modelling. The overlapping standard deviation intervals between CNN only and the proposed model (0.9754 ± 0.0127 vs. 0.9681 ± 0.0195) indicate that the macro F1 difference does not reach practical significance under the five-run protocol; the proposed model’s primary gains are in accuracy (0.9810 vs. 0.9791) and MCC (0.9757 vs. 0.9734), reflecting more consistent discrimination across the full confusion matrix. The pure BiLSTM model (F1 = 0.9423 ± 0.0085) and LSTM (F1 = 0.9404 ± 0.0081) both underperform CNN only, suggesting that temporal dependencies alone are insufficient when the convolutional feature hierarchy is absent. CNN + BiLSTM (F1 = 0.9564 ± 0.0112) improves over both recurrent-only models, although its macro F1 remains below that of CNN only.
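The overlap argument for CNN only versus the proposed model can be checked with a few lines of arithmetic on the Table 3 values. The helper names are illustrative, and overlap of mean ± std bands is only an informal indication, not a statistical significance test.

```python
def interval(mean, std):
    """Return the (low, high) band mean ± std."""
    return mean - std, mean + std

def overlaps(a, b):
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

cnn_only = interval(0.9754, 0.0127)   # macro F1 from Table 3
proposed = interval(0.9681, 0.0195)   # macro F1 from Table 3
print(overlaps(cnn_only, proposed))
```

The two bands are roughly [0.9627, 0.9881] and [0.9486, 0.9876], so the CNN-only band lies almost entirely inside the band of the proposed model.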

The proposed CNN–BiLSTM–Attention model achieves the highest accuracy (0.9810 ± 0.0102) and MCC (0.9757 ± 0.0130) across all evaluated models, and a macro F1 of 0.9681 ± 0.0195. It is important to acknowledge that Random Forest achieves a macro F1 of 0.9747, which marginally exceeds the proposed model’s 0.9681 ± 0.0195. This result reflects the partial separability of fault classes through cross-sensor relationships captured in temporally aggregated features, consistent with the strong inter-class separation visible in the box plots of Figure 4; however, the proposed model achieves higher accuracy (0.9810 vs. 0.9575) and MCC (0.9757 vs. 0.9480), metrics that account for the full confusion matrix structure, and additionally provides temporal interpretability through SHAP attributions and calibrated probabilistic outputs. Notably, the addition of the attention mechanism to CNN + BiLSTM increases the standard deviation of macro F1 from ±0.0112 to ±0.0195 in this run set; however, the model simultaneously achieves the highest mean accuracy and MCC of all evaluated models, which suggests more consistent discrimination of the most challenging minority classes on average. The CNN + Transformer competitor achieves F1 = 0.9569 ± 0.0191 and accuracy = 0.9725 ± 0.0127, both below the proposed model. These results suggest that the recurrent temporal modelling of BiLSTM combined with attention-based time-step weighting is more effective than the purely attention-based temporal modelling of the Transformer encoder for this dataset, where the 10 Hz sampling rate and 1.5 s window capture predominantly steady-state and slowly varying fault dynamics rather than high-frequency transients.

5.2. Ablation Study

Table 3 serves simultaneously as an ablation study by tracking performance as each architectural component is added incrementally. Beginning from a pure LSTM baseline (F1 = 0.9404), the substitution of LSTM with the full BiLSTM captures bidirectional temporal context but yields only a marginal improvement (F1 = 0.9423), consistent with other studies that find the added benefit of bidirectionality to be task-dependent [21]. The integration of CNN feature extraction ahead of BiLSTM (CNN + BiLSTM; F1 = 0.9564) provides a clear improvement, confirming that local convolutional feature learning and recurrent temporal modelling are complementary. The addition of the multi-head self-attention sub-layer to CNN + BiLSTM yields the proposed model (F1 = 0.9681), a further improvement of 0.012 in macro F1 together with the highest accuracy (0.9810) and MCC (0.9757) across all deep learning models. These results collectively validate the three-stage design rationale: CNN for local feature extraction, BiLSTM for bidirectional temporal modelling, and attention for adaptive time-step weighting.

5.3. Window Size Sensitivity Analysis

The sensitivity of the proposed model to the sliding window size is summarised in Table 4 based on three independent runs for each setting. Each configuration uses the same best_cfg hyperparameters, so the effect of window size can be examined independently. The results show that accuracy remains relatively stable across window sizes, staying above 0.95 in all settings. Macro F1 is highest at ws = 30 (F1 = 0.9620 ± 0.0155) and ws = 15 (F1 = 0.9579 ± 0.0324), with ws = 20 showing the lowest F1 (0.9469 ± 0.0144). Run-to-run variability is largest at ws = 15, which has the highest standard deviation in all three metrics (F1 std = 0.0324), and smallest at ws = 20 (F1 std = 0.0144), with ws = 30 close behind (F1 std = 0.0155); Table 4 therefore does not show a monotonic relationship between window length and training stability. The selection of ws = 15 balances diagnostic temporal coverage, sufficient to capture approximately one to two electrical cycles at the 10 Hz sampling rate, with the constraint that the smallest class (F4, 341 samples) must yield an adequate number of test windows for reliable evaluation.
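The trade-off between window length and the number of windows available from a small class can be sketched with the usual sliding-window count. The stride is not stated in this section, so the value of 5 used below is an assumption; it is at least consistent with Table 4, where the window counts fall by 9 for every 5-step increase in ws (one window fewer per class block), though that reading is an inference. The function names are hypothetical.

```python
def count_windows(block_length, window_size, stride):
    """Number of full windows that fit in one contiguous block of samples."""
    if block_length < window_size:
        return 0
    return (block_length - window_size) // stride + 1

def window_duration_s(window_size, sampling_rate_hz=10.0):
    """Time span covered by one window, in seconds."""
    return window_size / sampling_rate_hz

STRIDE = 5  # assumed value; the stride is not stated in this section
for ws in (10, 15, 20, 30):
    # 341 is the size of the smallest class (F4) before any train/test split.
    print(ws, window_duration_s(ws), count_windows(341, ws, STRIDE))
```

After the chronological split only a fraction of each block is available for testing, so the number of test windows for F4 is considerably smaller than these whole-block counts.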

5.4. Per-Class Evaluation of the Proposed Model

The confusion matrix for the best single run of the proposed model is presented in Figure 6. Seven of nine classes achieve perfect precision (1.000), reflecting highly selective decision boundaries. The most challenging class is F2 (OC S6 Low-side), where precision reaches 0.969 and recall reaches 1.000, indicating that every true F2 window is detected and that the errors involving F2 are false positives, that is, windows from another class predicted as F2. F0 (Normal operation) achieves a precision of 1.000 and recall of 0.992, with one normal sample misclassified as F2. These confusions are physically interpretable: low-side open-circuit faults (F2) produce current waveform distortions that may, in certain time windows, resemble normal operation transients.
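The relationship between the reported F2 and F0 figures can be illustrated with a small worked example. The counts below are hypothetical: they are chosen only because they reproduce the rounded precision and recall values quoted above and are not read from Figure 6.

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Fraction of true positives that are detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts chosen only to match the rounded values in the text.
f2_tp, f2_fp, f2_fn = 31, 1, 0     # one non-F2 window predicted as F2
f0_tp, f0_fp, f0_fn = 124, 0, 1    # one normal window predicted as F2

print(round(precision(f2_tp, f2_fp), 3), round(recall(f2_tp, f2_fn), 3))
print(round(precision(f0_tp, f0_fp), 3), round(recall(f0_tp, f0_fn), 3))
```

Under these assumed counts a single F0 window predicted as F2 accounts for both the F0 recall shortfall and the F2 precision shortfall, which matches the pattern described in the text.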

6. Explainability Analysis

Model interpretability is provided through SHAP GradientExplainer applied to the complete trained CNN–BiLSTM–Attention model, from the raw input layer to the nine-unit softmax output. The explainer backpropagates gradients from each class-specific output neuron through the full forward pass (CNN feature extraction, BiLSTM temporal encoding, multi-head self-attention, and global average pooling) to the raw input tensor of shape (15, 18), using 100 randomly sampled training windows as background references and computing attributions for 200 test windows. The resulting SHAP arrays have shape (n_samples, T, F) per class; the time dimension is averaged (mean |SHAP| over T = 15 steps) to yield per-feature importance vectors of length F = 18 for each class, and global importance is obtained by averaging across all nine classes. This temporal averaging is appropriate because the diagnostic question of interest concerns which sensor channels are most discriminative for each fault class, rather than which specific sub-second intervals within the 1.5 s window are most informative. Attributions therefore reflect end-to-end contributions through all architectural components rather than being local to any single intermediate layer. GradientExplainer provides gradient-based approximations of Shapley values rather than exact solutions; for architectures containing recurrent components such as BiLSTM, the approximation quality is bounded by the smoothness of the gradient landscape, and the resulting attributions should therefore be interpreted as directional indicators of feature importance rather than precise Shapley values.
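The aggregation from per-class SHAP arrays to feature rankings can be sketched without any SHAP dependency. The code below assumes the attributions are already available as nested lists of shape (n_samples, T, F) per class and averages |SHAP| over both samples and time steps; the function names and toy values are illustrative, and the averaging over samples is an assumption about how the per-class vectors are formed.

```python
def per_feature_importance(shap_values):
    """Mean |SHAP| over samples and time steps.

    shap_values: nested list of shape (n_samples, T, F) for one class.
    Returns a list of length F.
    """
    n_samples = len(shap_values)
    n_steps = len(shap_values[0])
    n_features = len(shap_values[0][0])
    totals = [0.0] * n_features
    for sample in shap_values:
        for step in sample:
            for f, value in enumerate(step):
                totals[f] += abs(value)
    return [total / (n_samples * n_steps) for total in totals]

def global_importance(shap_by_class):
    """Average the per-class importance vectors across classes."""
    per_class = [per_feature_importance(sv) for sv in shap_by_class]
    n_classes = len(per_class)
    return [sum(vec[f] for vec in per_class) / n_classes
            for f in range(len(per_class[0]))]

# Toy attributions: 2 classes, 1 sample, 2 time steps, 2 features.
toy = [
    [[[0.2, -0.1], [-0.4, 0.1]]],
    [[[0.0, 0.3], [0.0, -0.3]]],
]
print(global_importance(toy))
```

Taking the absolute value before averaging matters: in the second toy class the signed attributions of the second feature cancel over time, yet the feature is clearly used by the model.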

Figure 7 presents the global feature importance ranking. Temperature-related features dominate: T1 (half-bridge 1 temperature) and Temp_Diff_Max (maximum temperature differential) rank first and second, with mean |SHAP| values of 0.0083 and 0.0077 respectively, followed by T2 (0.0050) and T3 (0.0027). Current-related features occupy intermediate positions: Ib and Power_AC rank fifth and sixth, while Current_Imbalance, Ib_original, and Ia_original follow closely. Voltage and DC bus features (VDC, IDC, VD) consistently rank in the lower tier.

The per-class SHAP heatmap is shown in Figure 8, highlighting different sensor activation patterns for each fault type. For overheating faults (F6, F7, F8), T1, T2, and Temp_Diff_Max show the largest SHAP values, consistent with the direct thermal response of these sensors during half-bridge overheating. For open-circuit faults (F1, F2) and short-circuit faults (F3, F4, F5), current-related features, particularly Ib, Ia_original, and Current_Imbalance, show elevated importance. For normal operation (F0), feature importance remains low across both temperature and current signals, suggesting that the model relies on the absence of strong distinguishing patterns. These results are consistent with the expected physical behavior of the system and reinforce confidence in the model’s diagnostic decisions [16,17,18,28].

7. Discussion

The experimental results highlight several important points. The CNN–BiLSTM–Attention architecture delivers the strongest performance on the Bacha [20] dataset, with higher accuracy and MCC values than both baseline and ablation models. The ablation analysis indicates that each component contributes to the final outcome: CNN layers extract local patterns, BiLSTM layers capture temporal relationships, and the attention mechanism adjusts the importance of different time steps. The choice of a block-aware chronological split is also important for this dataset. Each fault class is recorded as a single continuous sequence, so a random split would place time-adjacent samples in both training and test sets, inflating performance estimates. Preserving the temporal structure leads to a more realistic evaluation of model generalisation. SHAP analysis further clarifies how the model responds to different fault types. For overheating faults (F6, F7, F8), temperature sensors account for a large share of the importance (33.2%), with T1 and Temp_Diff_Max standing out in the global ranking, consistent with their direct response to thermal effects during half-bridge heating. For OC and SC faults, current imbalance and phase current features become more prominent, aligning with the known role of stator current asymmetry as a diagnostic indicator for switch-level faults in three-phase inverters [31,32].
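The leakage argument for the block-aware chronological split can be made concrete with a short sketch. It assumes, as stated above, that each class forms one contiguous block in recording order; the 70/15/15 proportions and the function name are illustrative assumptions, since the actual split ratios are not given in this section.

```python
def block_aware_split(labels, train_pct=70, val_pct=15):
    """Split each contiguous class block chronologically.

    labels: class label per sample, in recording order.
    Returns three lists of sample indices (train, val, test).
    The percentages are illustrative assumptions.
    """
    train, val, test = [], [], []
    start = 0
    n = len(labels)
    while start < n:
        end = start
        while end < n and labels[end] == labels[start]:
            end += 1                      # extend to the end of this block
        length = end - start
        n_train = length * train_pct // 100
        n_val = length * val_pct // 100
        train.extend(range(start, start + n_train))
        val.extend(range(start + n_train, start + n_train + n_val))
        test.extend(range(start + n_train + n_val, end))
        start = end
    return train, val, test

# Toy recording: a block of 20 normal samples followed by 10 F1 samples.
labels = ["F0"] * 20 + ["F1"] * 10
train, val, test = block_aware_split(labels)
print(len(train), len(val), len(test))
```

Within every block, all training indices precede all validation indices, which in turn precede all test indices, so no test sample is flanked by training samples from the same recording segment.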

Several limitations of the present study should be acknowledged. The dataset was collected under fixed operating conditions (constant rotor speed of 10 rad/s, DC bus voltage of 15 V, and ambient temperature of 25 °C), representing a deliberate laboratory simplification. In field deployments, motor speed varies continuously, load torque fluctuates, and ambient temperature spans a wide range; each of these variations modulates the baseline current, voltage, and thermal signatures from which fault features are extracted, potentially shifting class boundaries and degrading model performance. Validating the proposed framework under variable-speed, variable-load, and wide-temperature protocols remains a priority direction for future work and will likely require either multi-condition experimental datasets or physics-informed domain adaptation strategies. The 10 Hz sampling rate is sufficient to capture thermal and steady-state current dynamics but may not resolve the high-frequency switching transients that provide early fault signatures at higher acquisition rates. The GradientExplainer approach provides gradient-based approximations rather than exact Shapley values; for models with recurrent components, the approximation quality is theoretically bounded, and the importance values should be interpreted as directional indicators rather than precise attributions. Furthermore, while SHAP GradientExplainer identifies which sensor channels are most discriminative at the feature level, it does not directly reveal which time steps within the 1.5 s window the attention mechanism focuses on. Attention weight visualisation across time steps would complement the present feature-level analysis and is left for future work.

8. Conclusions

This study proposed a hybrid CNN–BiLSTM–Attention deep learning framework for multi-class fault diagnosis in inverter-driven PMSM systems and evaluated it on a publicly available multi-sensor experimental dataset spanning nine operational conditions. The proposed model shows the best performance in terms of accuracy (0.9810 ± 0.0102) and MCC (0.9757 ± 0.0130) when compared with all evaluated alternatives, including classical ML approaches, sequence-based models, and a CNN–Transformer architecture. It is noted, however, that the Random Forest baseline attains a higher macro F1 score (0.9747 vs. 0.9681), reflecting the partial separability of fault classes through aggregated cross-sensor features without temporal modelling; this finding underscores the importance of reporting complementary metrics when evaluating diagnostic models under class imbalance. Results from the ablation analysis indicate that each component contributes to the overall performance. In addition, SHAP GradientExplainer analysis offers feature importance rankings that are consistent with known fault mechanisms and provide meaningful physical interpretation.

The adoption of a block-aware chronological data splitting strategy, five-run statistical reporting, and validation-set-based hyperparameter selection represents a methodologically rigorous evaluation framework that avoids common sources of inflated performance estimates in temporal FDD benchmarks. Future work will investigate the generalisation of the proposed architecture to multi-speed and multi-load operating conditions, the incorporation of physics-informed constraints as regularisation terms to enforce thermodynamic and electromagnetic consistency, and the deployment of lightweight model variants suitable for embedded edge inference in real-time predictive maintenance systems.

Funding

This research received no external funding.

Data Availability Statement

The dataset used in this study is publicly available at https://doi.org/10.5281/zenodo.14482932. The analysis code will be made available upon reasonable request.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Yu, Y.; Yuan, C.; Zeng, D.; Carbone, G.; Hu, Y.; Yang, J. Conceptual Approach to Permanent Magnet Synchronous Motor Turn-to-Turn Short Circuit and Uniform Demagnetization Fault Diagnosis. Actuators 2024, 13, 511.
2. Al-Hindawi, D.; Al-Greer, M.; Bashir, I.; Ayub, A.A. Advanced Multi-Fault Diagnosis of PMSMs using Deep Transfer Learning and CWT-Based 2D CNNs. In Proceedings of the IECON 2025—51st Annual Conference of the IEEE Industrial Electronics Society, Madrid, Spain, 14–17 October 2025; pp. 1–6.
3. Yatak, M.Ö. Inverter-Driven and Stator Winding Fault Detection in Permanent Magnet Synchronous Motors with Hybrid Deep Model. Electronics 2025, 14, 4289.
4. Yan, G.; Hu, Y. Inter-turn short circuit and demagnetization fault diagnosis of ship PMSM based on multiscale residual dilated CNN and BiLSTM. Meas. Sci. Technol. 2024, 35, 046105.
5. Wang, M.; Lai, W.; Zhang, H.; Liu, Y.; Song, Q. Intelligent Fault Diagnosis of Inter-Turn Short Circuit Faults in PMSMs for Agricultural Machinery Based on Data Fusion and Bayesian Optimization. Agriculture 2024, 14, 2139.
6. Song, Q.; Wang, M.; Lai, W.; Zhao, S. Multiscale Kernel-Based Residual CNN for Estimation of Inter-Turn Short Circuit Fault in PMSM. Sensors 2022, 22, 6870.
7. Kao, I.H.; Wang, W.J.; Lai, Y.H.; Perng, J.W. Analysis of Permanent Magnet Synchronous Motor Fault Diagnosis Based on Learning. IEEE Trans. Instrum. Meas. 2019, 68, 310–324.
8. Chen, Z.; Liang, K.; Peng, T.; Wang, Y. Multi-Condition PMSM Fault Diagnosis Based on Convolutional Neural Network Phase Tracker. Symmetry 2022, 14, 295.
9. Peng, T.; Ye, C.; Yang, C.; Chen, Z.; Liang, K.; Fan, X. A novel fault diagnosis method for early faults of PMSMs under multiple operating conditions. ISA Trans. 2022, 130, 463–476.
10. Cömert, M.; Şahin Sadık, E.; Ünsal, A. Data Fusion Based Multimodal Fault Diagnosis in Permanent Magnet Synchronous Motors. Kahramanmaraş Sütçü İmam Üniversitesi Mühendislik Bilim. Dergisi 2025, 28, 1546–1557.
11. Li, L.; Liao, S.; Zou, B.; Liu, J. Mechanism-Based Fault Diagnosis Deep Learning Method for Permanent Magnet Synchronous Motor. Sensors 2024, 24, 6349.
12. Lee, H.; Jeong, H.; Koo, G.; Ban, J.; Kim, S.W. Attention Recurrent Neural Network-Based Severity Estimation Method for Interturn Short-Circuit Fault in Permanent Magnet Synchronous Machines. IEEE Trans. Ind. Electron. 2021, 68, 3445–3453.
13. Xu, J.; Zhou, Y.; Zhang, C.; He, L.; Li, X.; Liu, Y. Fault Diagnosis Method of Permanent Magnet Synchronous Motor Based on CNN-LSTM-Attention. In Proceedings of the 2025 IEEE 20th Conference on Industrial Electronics and Applications (ICIEA), Yantai, China, 3–6 August 2025; pp. 1–5.
14. Fan, S.; Huang, M.; Zhu, L.; Wu, W.; Wang, K.; Yao, Z. Large-Kernel Group Convolutional Perceptron Attention Network for Interturn Short Circuit Fault Diagnosis in PMSM. In Proceedings of the 2024 IEEE 10th International Power Electronics and Motion Control Conference (IPEMC2024-ECCE Asia), Chengdu, China, 17–20 May 2024; pp. 1836–1842.
15. Yang, Z.; Li, W.; Yuan, F.; Zhi, H.; Guo, M.; Xin, B.; Gao, Z. Hybrid CNN-BiLSTM-MHSA Model for Accurate Fault Diagnosis of Rotor Motor Bearings. Mathematics 2025, 13, 334.
16. Zheng, B.; Liu, B.; Yan, J.; Tang, M.; Zanchetta, P.; Yu, H. Interpretable Harmonic-Aware Dual-Branch Neural Network for Trustworthy Diagnosis of OCFs in DTP-PMSMs With Enhanced Disturbance Robustness. IEEE Trans. Power Electron. 2026, 41, 103–108.
17. Shojaeinasab, A.; Jalayer, M.; Baniasadi, A.; Najjaran, H. Unveiling the Black Box: A Unified XAI Framework for Signal-Based Deep Learning Models. Machines 2024, 12, 121.
18. Wang, Y.; Wang, P. Explainable machine learning for motor fault diagnosis. In Proceedings of the 2023 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Kuala Lumpur, Malaysia, 22–25 May 2023; pp. 1–6.
19. Nguyen, D.A.; Jose, S.; Nguyen, T.P.K.; Medjaher, K. Explainable multimodal learning for predictive maintenance of steam generators. In Proceedings of the Asia Pacific Conference of the PHM Society 2023, Tokyo, Japan, 4 September 2023; pp. 1–7.
20. Bacha, A. Comprehensive Dataset for Fault Detection and Diagnosis in Inverter-Driven PMSM Systems. Zenodo 2024.
21. Gmati, B.; Ben Rhouma, A.; Meddeb, H.; Khojet El Khil, S. Diagnosis of Multiple Open-Circuit Faults in Three-Phase Induction Machine Drive Systems Based on Bidirectional Long Short-Term Memory Algorithm. World Electr. Veh. J. 2024, 15, 53.
22. Sun, A.; He, K.; Dai, M.; Ma, L.; Yang, H.; Dong, F.; Liu, C.; Fu, Z.; Song, M. Bearing Fault Diagnosis Based on Golden Cosine Scheduler-1DCNN-MLP-Cross-Attention Mechanisms (GCOS-1DCNN-MLP-Cross-Attention). Machines 2025, 13, 819.
23. Fan, H.; Hu, J. Application of Machine Algorithm in Electrical Equipment Fault Warning and Diagnosis. In Proceedings of the 2025 International Conference on Computing Technologies & Data Communication (ICCTDC), Hassan, India, 4–5 July 2025; pp. 1–7.
24. Wang, M.; Lai, W.; Sun, P.; Li, H.; Song, Q. Severity Estimation of Inter-Turn Short-Circuit Fault in PMSM for Agricultural Machinery Using Bayesian Optimization and Enhanced Convolutional Neural Network Architecture. Agriculture 2024, 14, 2214.
25. Zhang, W.; Xu, Q.; Zhang, Y.; Wang, Y.; Yang, Y.; Cai, H. Multi-objective tree-structured Parzen estimator optimized Res-Net for ITSC fault diagnosis of PMSM. Meas. Sci. Technol. 2025, 36, 026002.
26. Sharma, A.; Sim, K.Y.; Chandrasekaran, S. A Comparative Study of Hybrid AI Models for Predictive Maintenance in Machine Processes. In Proceedings of the 2025 IEEE 4th International Conference on Computing and Machine Intelligence (ICMI), Mount Pleasant, MI, USA, 5–6 April 2025; pp. 1–6.
27. Awan, D.; Khan, M.U.; Zia, M.; Lais, M.; Tabassum, S.; Hamza, A.; Khan, M.S.; Awan, M.S. Development of Explainable AI (XAI) Framework for Fault Diagnosis in Power Electronics Systems. Phys. Educ. Health Soc. Sci. 2025, 3, 110–124.
28. Bacha, A.; El Idrissi, R.; Janati Idrissi, K.; Lmai, F. Comprehensive dataset for fault detection and diagnosis in inverter-driven permanent magnet synchronous motor systems. Data Brief 2025, 58, 111286.
29. Sun, Z.; Machlev, R.; Wang, Q.; Belikov, J.; Levron, Y.; Baimel, D. A public data-set for synchronous motor electrical faults diagnosis with CNN and LSTM reference classifiers. Energy AI 2023, 14, 100274.
30. Wang, R.; Dong, E.; Cheng, Z.; Liu, Z.; Jia, X. Transformer-based intelligent fault diagnosis methods of mechanical equipment: A survey. Open Phys. 2024, 22, 20240015.
31. Sun, X.; Diao, N.; Song, C.; Qiu, Y.; Zhao, X. An Open-Circuit Fault Diagnosis Method Based on Adjacent Trend Line Relationship of Current Vector Trajectory for Motor Drive Inverter. Machines 2023, 11, 928.
32. Hyon, B.J.; Hwang, D.Y.; Jang, P.; Noh, Y.-S.; Kim, J.-H. Offline Fault Diagnosis for 2-Level Inverter: Short-Circuit and Open-Circuit Detection. Electronics 2024, 13, 1672.

Figure 1. Class distribution of the PMSM inverter fault dataset (N = 10,892).

Figure 2. Pearson correlation heatmap of raw sensor measurements.

Figure 3. Time-series visualisation of current imbalance (dimensionless [20,28]; upper panel) and maximum temperature difference (°C; lower panel) across the complete dataset. Shaded regions indicate fault class boundaries.

Figure 4. Per-class box plots of half-bridge temperature T1 (°C; left) and phase current Ia (ADC counts; right) for the nine operational classes.

Figure 5. Validation macro F1 scores across DL hyperparameter sweep configurations.

Figure 6. Confusion matrix for the best single run of the proposed CNN–BiLSTM–Attention model on the test set.

Figure 7. Global feature importance derived from SHAP GradientExplainer applied to the proposed CNN–BiLSTM–Attention model.

Figure 8. Per-class SHAP feature importance heatmap.

Table 1. Distribution of fault scenarios in the PMSM inverter fault dataset [20].

Class	Location	Description	Samples	Proportion (%)
F0	-	Normal operating condition	4295	39.4
F1	S3 (high side)	Open-circuit fault	692	6.4
F2	S6 (low side)	Open-circuit fault	1122	10.3
F3	S2 (low side)	Short-circuit fault	407	3.7
F4	S3 (high side)	Short-circuit fault	341	3.1
F5	S5 (high side)	Short-circuit fault	412	3.8
F6	HB1	Overheating fault	854	7.8
F7	HB1 & HB2	Overheating fault	1735	15.9
F8	HB3	Overheating fault	1034	9.5

Table 2. Proposed CNN–BiLSTM–Attention architecture summary (best configuration: fc = 64, lu = 32, dropout = 0.2, lr = 0.001).

Block	Layer	Configuration
Input	Input layer	(15, 18)—15 time steps × 18 features
CNN Block	Conv1D + BN + Dropout	64 filters, kernel = 3, ReLU, BN, do = 0.2
CNN Block	Conv1D + BN + Dropout	128 filters, kernel = 3, ReLU, BN, do = 0.2
BiLSTM Block	Bidirectional LSTM 1	64 units × 2 directions, return_sequences = True, do = 0.2
BiLSTM Block	Bidirectional LSTM 2	32 units × 2 directions, return_sequences = True, do = 0.2
Attention Block	MultiHeadAttention	4 heads, key_dim = 16, + residual + LayerNorm
Attention Block	Feed-Forward + Residual	Dense(64, ReLU), Dropout(0.2), + residual + LayerNorm
Classifier Head	GlobalAveragePooling1D	Temporal aggregation
Classifier Head	Dense + Dropout	64 units, ReLU, do = 0.2
Classifier Head	Dense + Dropout	32 units, ReLU, do = 0.2
Output	Dense (softmax)	9 units, softmax

Table 3. Performance comparison of all evaluated models on the test set.

Model	Accuracy	Macro F1	MCC	ROC-AUC
Logistic Regression	0.8431	0.8124	0.8176	0.9866
Random Forest	0.9575	0.9747	0.9480	0.9999
XGBoost (tuned)	0.9542	0.9406	0.9429	0.9998
MLP	0.9542	0.9092	0.9410	0.9971
LSTM only	0.9536 ± 0.0089	0.9404 ± 0.0081	0.9406 ± 0.0113	0.9967 ± 0.0012
CNN only	0.9791 ± 0.0039	0.9754 ± 0.0127	0.9734 ± 0.0049	0.9998 ± 0.0002
BiLSTM only	0.9542 ± 0.0058	0.9423 ± 0.0085	0.9420 ± 0.0076	0.9983 ± 0.0005
CNN + BiLSTM	0.9699 ± 0.0139	0.9564 ± 0.0112	0.9619 ± 0.0173	0.9995 ± 0.0003
CNN + Transformer	0.9725 ± 0.0127	0.9569 ± 0.0191	0.9649 ± 0.0160	0.9989 ± 0.0010
CNN–BiLSTM–Attn (Proposed)	0.9810 ± 0.0102	0.9681 ± 0.0195	0.9757 ± 0.0130	0.9996 ± 0.0003

Table 4. Window size sensitivity analysis for the proposed CNN–BiLSTM–Attention model.

ws	Train Windows	Test Windows	Acc Mean	F1 Mean	MCC Mean
10	1402	315	0.9671 ± 0.0069	0.9489 ± 0.0202	0.9639 ± 0.0081
15	1393	306	0.9739 ± 0.0237	0.9579 ± 0.0324	0.9628 ± 0.0295
20	1384	297	0.9542 ± 0.0057	0.9469 ± 0.0144	0.9415 ± 0.0068
30	1366	279	0.9613 ± 0.0112	0.9620 ± 0.0155	0.9497 ± 0.0149

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yılmaz, Ü. Behavioral Fault Diagnosis in Inverter-Driven PMSM Systems Using a Hybrid CNN–BiLSTM–Attention Deep Learning Framework with SHAP-Based Interpretability. Machines 2026, 14, 638. https://doi.org/10.3390/machines14060638

