1. Introduction
Three-phase induction motors (IMs) are indispensable workhorses in modern industrial systems, powering production lines, HVAC systems, water treatment facilities, and marine propulsion systems. According to recent international surveys [1], IMs account for approximately 85% of all electric motors in operation across industrial sectors, reflecting their dominance in electromechanical energy conversion [2]. This widespread deployment is driven by their inherent reliability, simple construction, high power density, and ability to operate efficiently under demanding mechanical and electrical conditions.
However, despite their robustness, induction motors are susceptible to numerous electrical and mechanical faults. Among the most prevalent electrical disturbances encountered in real-world power systems is imbalanced supply voltage (USV), a phenomenon wherein the magnitudes and/or phase angles of the three-phase supply voltages deviate from ideal balance. According to IEEE, IEC, and NEMA standards [3,4,5], even minor imbalances (voltage imbalance as low as 3–5%) can induce significantly amplified current imbalances due to the motor's relatively low negative-sequence impedance, leading to cascading consequences: excessive heating in the stator windings, increased iron and copper losses, torque pulsations, elevated acoustic emissions, and accelerated mechanical wear of bearings and rotor bars [6,7]. The severity thresholds used to categorize USV in this study follow the permissible phase-imbalance limits specified in NEMA MG 1 and IEC 60034-1 [8,9].
The diagnosis and characterization of USV require two complementary analyses: (1) severity quantification, typically expressed via the Negative Voltage Factor (NVF), the ratio of negative-sequence to positive-sequence voltage components computed via the Fortescue transformation [10], and (2) impedance characterization, wherein the three-phase stator winding impedances ($Z_a$, $Z_b$, $Z_c$) are estimated to differentiate supply-side faults (voltage imbalance) from load-side faults (internal motor degradation).
Over the past three years, solutions for induction motor fault diagnosis have pivoted heavily toward deep learning architectures. Contemporary research leverages advanced Convolutional Neural Networks (CNNs) and Vision Transformers to autonomously extract non-linear harmonic dependencies from motor signals without relying on rigid, manually designed signal processing.
Recent shallow machine learning methodologies have likewise shown considerable promise. Notably, Laadjal et al. [11] demonstrated that Decision Tree Regressors (DTRs), when combined with Short-Time Least Squares Prony (STLSP) signal processing, could achieve excellent impedance estimation accuracy (MAE ≈ 0.05 Ω) and NVF detection (MAE ≈ 0.08%) on a publicly available USV dataset. However, this approach carries inherent limitations:
Manual feature engineering burden: The STLSP method requires careful window length tuning, preprocessing parameter selection, and domain knowledge in signal processing. Features extracted by STLSP (amplitude, phase, frequency, damping factor) are hand-crafted and may not capture the full complexity of non-stationary motor dynamics.
Shallow model capacity: Decision Trees, though interpretable and computationally efficient, lack the hierarchical feature learning capability of deep neural networks. They struggle to capture higher-order, non-linear interactions among input features, particularly in low-data regimes.
Task isolation: The original formulation treats impedance estimation (regression) and fault detection (classification) as independent tasks, ignoring potential synergies. A unified framework could regularize the model via multi-task learning, improving generalization on both tasks simultaneously.
Raw signal underexploitation: While STLSP effectively compresses 10 kHz sampled voltage/current waveforms into interpretable features, the compression inherently discards information. An end-to-end deep learning model trained directly on raw waveforms could, in principle, learn richer representations without manual feature definition.
To address these gaps, we propose a comprehensive deep learning framework for USV diagnostics; the overall pipeline is visualized in Figure 1. The framework comprises:
Multi-Head Residual MLP (ResMLP) architecture: We introduce a deep Residual Multilayer Perceptron with a shared encoder and task-specific prediction heads that jointly estimate phase impedances and imbalance severity via multi-task learning.
End-to-End Temporal Convolutional Network (TCN): We explore an end-to-end TCN operating on raw voltage waveforms to assess the feasibility of voltage-only diagnostics and to quantify the trade-offs relative to feature-based models.
Comprehensive Evaluation Framework: We evaluate models across operating conditions, imbalance magnitudes, and scarce-label regimes using MAE, MSE, $R^2$, and classification metrics to ensure practical relevance.
Ablation Studies and Deployment Focus: We analyze architectural choices, data augmentation, and latency trade-offs, producing a configuration that balances accuracy and embedded inference constraints.
The main novelty and contributions of this work lie in:
Demonstrating that deep residual architectures can effectively learn from STLSP-engineered features to capture complex non-linear impedance–NVF relationships.
Proposing a unified multi-task framework that leverages task correlations to improve generalization, a strategy not explored in prior USV diagnostics literature.
Providing a rigorous ablation study on voltage-only representation learning. By training an end-to-end TCN strictly on voltage waveforms, we empirically quantify the performance limits of deep models deprived of current measurements, demonstrating the practical need for coupled voltage–current sensing.
Delivering production-ready PyTorch implementations optimized for embedded inference latency, bridging the gap between research and industrial deployment.
Establishing a comprehensive framework for USV diagnostics against which future deep learning approaches can be measured.
The remainder of this paper is organized as follows.
Section 2 provides the theoretical foundation: USV fundamentals, the Fortescue transformation, STLSP methodology, and an overview of relevant deep learning architectures.
Section 3 details the experimental setup, dataset characteristics, preprocessing pipeline, and proposed models.
Section 4 presents comprehensive experimental results, including baseline validation, deep learning comparisons, ablation studies, and cross-operating-condition analysis.
Section 5 interprets findings, discusses trade-offs, and contextualizes results within the broader literature. Finally,
Section 6 summarizes key insights and outlines future research directions.
2. Theoretical Background and Related Work
2.1. Unbalanced Supply Voltage and Symmetrical Components
In a balanced three-phase system, the three phase voltages are equal in magnitude and separated by 120° in phase. However, in practice, imbalances arise from multiple sources: malfunctioning power factor correction capacitors, uneven single-phase load distribution across the grid, open-circuit faults in distribution lines, and transformer winding imbalances [11]. These imbalances are mathematically characterized using the Fortescue transformation, which decomposes the three phase voltages into three symmetrical components: positive-sequence ($V_P$), negative-sequence ($V_N$), and zero-sequence ($V_Z$) [10].
Given the three-phase voltages $V_a(t)$, $V_b(t)$, $V_c(t)$ at time instant $t$, the symmetrical components are computed as:

$$\begin{bmatrix} V_P \\ V_N \\ V_Z \end{bmatrix} = \frac{1}{3} \begin{bmatrix} 1 & a & a^2 \\ 1 & a^2 & a \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} V_a \\ V_b \\ V_c \end{bmatrix},$$

where $a = e^{j2\pi/3}$ is the complex cube root of unity. The positive-sequence component rotates in the forward direction (standard 50/60 Hz), the negative-sequence rotates backward, and the zero-sequence is common to all phases.
The severity of voltage imbalance is quantified via the Negative Voltage Factor (NVF), defined as:

$$\mathrm{NVF} = \frac{|V_N|}{|V_P|} \times 100\%.$$
A balanced system has NVF ≈ 0, while increasing imbalance elevates NVF. According to NEMA and IEEE standards, motors should not be operated continuously with NVF exceeding 2–3%.
The motor's impedance responds to supply voltage perturbations through its stator winding resistance and leakage inductance. For each phase, the fundamental-frequency impedance is defined as:

$$Z_{x,1} = \frac{V_{x,1}}{I_{x,1}}, \qquad x \in \{a, b, c\},$$

where the subscript 1 denotes the fundamental-frequency component (50 Hz in our case) extracted via signal processing. By estimating $Z_{a,1}$, $Z_{b,1}$, and $Z_{c,1}$ and computing their symmetrical components via the same transformation, we obtain the positive-, negative-, and zero-sequence impedances $Z_P$, $Z_N$, and $Z_Z$. The motor's impedance is sensitive to load, temperature, and operating point; however, imbalanced supply voltage induces systematic changes in the per-phase impedances that differ qualitatively from internal faults, enabling fault source attribution.
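To make these definitions concrete, the following minimal NumPy sketch decomposes hypothetical fundamental-frequency phasors into symmetrical components, computes the NVF, and forms the per-phase and sequence impedances. The phasor values and function names are illustrative assumptions, not taken from the released implementation.

```python
import numpy as np

A = np.exp(1j * 2 * np.pi / 3)  # complex cube root of unity

def symmetrical_components(xa, xb, xc):
    """Fortescue decomposition into positive-, negative-, and zero-sequence components."""
    xp = (xa + A * xb + A**2 * xc) / 3.0
    xn = (xa + A**2 * xb + A * xc) / 3.0
    xz = (xa + xb + xc) / 3.0
    return xp, xn, xz

# Hypothetical fundamental (50 Hz) phasors: phase A sags by 15 V.
V = np.array([215.0, 230.0 * A**2, 230.0 * A])                        # stator voltages
I = np.array([4.9 - 1.2j, (4.6 - 1.1j) * A**2, (4.7 - 1.1j) * A])     # stator currents

# Severity: Negative Voltage Factor (percent).
Vp, Vn, _ = symmetrical_components(*V)
nvf = 100.0 * abs(Vn) / abs(Vp)

# Per-phase impedances at the fundamental, then their sequence components.
Z = V / I
Zp, Zn, Zz = symmetrical_components(*Z)

print(f"NVF = {nvf:.2f} %")                 # ~2.2 % for this example
print(f"|Za|, |Zb|, |Zc| = {np.abs(Z).round(2)}")
```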
2.2. Short-Time Least Squares Prony (STLSP) Method
The STLSP method is a high-resolution signal decomposition technique that accurately extracts the frequency, amplitude, phase, and damping of exponentially damped sinusoids from short, non-stationary signal windows [12,13]. Unlike Fourier-based methods (FFT), which assume stationarity over the entire analysis window, STLSP is particularly suited for motor transients and time-varying conditions.
The core principle is linear prediction: given a signal sequence $x[n]$ for $n = 0, 1, \ldots, N-1$, we model it as a linear combination of exponentially damped sinusoids:

$$x[n] = \sum_{k=1}^{K} A_k \, e^{\alpha_k n} \cos\!\left(2\pi f_k n + \phi_k\right),$$

where $A_k$ is the amplitude, $\alpha_k$ the damping factor, $f_k$ the normalized frequency, and $\phi_k$ the phase of the $k$-th component.
STLSP finds the linear prediction coefficients $a_1, \ldots, a_p$ by minimizing the forward prediction error:

$$E = \sum_{n=p}^{N-1} \left| x[n] + \sum_{i=1}^{p} a_i \, x[n-i] \right|^2.$$
The characteristic polynomial of the prediction filter is:

$$P(z) = z^p + a_1 z^{p-1} + \cdots + a_{p-1} z + a_p.$$
The roots $z_k$ of this polynomial encode the signal's damping and frequency:

$$\alpha_k = \ln |z_k|, \qquad f_k = \frac{\arg(z_k)}{2\pi}.$$
Once the poles are determined, the amplitudes and phases are found by solving a Vandermonde linear system via least squares. For a window of $N$ samples with $N$ much larger than the model order $p$, the system is over-determined, and a least-squares solution yields the optimal parameters.
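The procedure above can be sketched compactly in NumPy. The snippet below analyzes a single window under the normalized-frequency convention used here; the function name, model order, and synthetic test signal are our own illustrative choices rather than the exact released implementation.

```python
import numpy as np

def stlsp_window(x, p, fs):
    """Least-squares Prony analysis of one window x (length N) with model order p;
    returns per-component amplitude, phase, frequency (Hz), and per-sample damping."""
    N = len(x)
    # 1) Linear prediction coefficients a_1..a_p via least squares.
    T = np.column_stack([x[p - i : N - i] for i in range(1, p + 1)])
    a = np.linalg.lstsq(T, -x[p:N], rcond=None)[0]
    # 2) Roots of the characteristic polynomial z^p + a_1 z^(p-1) + ... + a_p.
    z = np.roots(np.concatenate(([1.0], a)))
    # 3) Damping and frequency from the pole locations.
    damping = np.log(np.abs(z))
    freq = np.angle(z) / (2 * np.pi) * fs
    # 4) Complex amplitudes from the over-determined Vandermonde system.
    V = np.vander(z, N, increasing=True).T            # shape (N, p), column k = z_k**n
    h = np.linalg.lstsq(V, x.astype(complex), rcond=None)[0]
    return 2 * np.abs(h), np.angle(h), freq, damping

# Hypothetical test: a noisy 50 Hz sinusoid sampled at 10 kHz, 150-sample window.
fs, n = 10_000, np.arange(150)
x = 325 * np.cos(2 * np.pi * 50 * n / fs + 0.3) + np.random.normal(0, 1, n.size)
amp, ph, f, d = stlsp_window(x, p=2, fs=fs)
k = np.argmin(np.abs(f - 50))                          # component nearest 50 Hz
print(f"f = {f[k]:.1f} Hz, amplitude = {amp[k]:.1f}, phase = {ph[k]:.2f} rad")
```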
In the USV diagnostic context, STLSP is applied to sliding windows of the three-phase voltages and currents (sampled at 10 kHz, window length typically 100–200 samples, corresponding to 5–10 ms), extracting the fundamental-component (50 Hz) amplitude and phase for each phase. This compression from 590,000 raw samples to 13,721 STLSP windows (per the dataset provided in [11]) enables efficient feature-based machine learning while retaining time-varying information.
2.3. Deep Learning Architectures for Tabular and Sequential Data
2.3.1. Residual Multilayer Perceptron (ResMLP)
While convolutional and recurrent architectures dominate deep learning, recent work has demonstrated that residual connections and careful layer normalization can substantially improve the performance of dense networks on tabular data [14]. ResMLP, proposed by Tolstikhin et al. and adapted for various domains, replaces convolutional layers with fully connected blocks and uses skip connections to enable training of deeper architectures.
A typical ResMLP block can be written as:

$$\mathbf{h}_{l+1} = \mathbf{h}_l + F\!\left(\mathrm{LayerNorm}(\mathbf{h}_l)\right),$$

where $F(\cdot)$ is a two-layer dense network with hidden dimension $d_h$ and GELU activation:

$$F(\mathbf{x}) = \mathbf{W}_2 \, \mathrm{GELU}(\mathbf{W}_1 \mathbf{x} + \mathbf{b}_1) + \mathbf{b}_2.$$
The GELU (Gaussian Error Linear Unit) activation is preferred over ReLU for tabular data due to its smoother gradient landscape and reduced tendency toward dead neurons [15].
For multi-task learning, we employ a shared encoder (a sequence of ResMLP blocks) followed by task-specific output heads: one regression head predicts the three phase impedances and another predicts the NVF from the shared representation. The joint loss function combines the losses from all tasks with learnable weights or fixed balancing coefficients.
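A minimal PyTorch sketch of such a multi-head ResMLP is given below. The layer widths, head names, and fixed loss weights are illustrative assumptions; they approximate, but are not, the exact released configuration.

```python
import torch
import torch.nn as nn

class ResMLPBlock(nn.Module):
    """Pre-norm residual block: h <- h + F(LayerNorm(h)), F = two dense layers with GELU."""
    def __init__(self, dim, hidden, dropout=0.2):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.ff = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(hidden, dim),
        )

    def forward(self, h):
        return h + self.ff(self.norm(h))

class MultiHeadResMLP(nn.Module):
    """Shared encoder with task-specific heads for impedance regression and NVF estimation."""
    def __init__(self, in_dim, dim=128, hidden=256, blocks=3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, dim),
            *[ResMLPBlock(dim, hidden) for _ in range(blocks)],
        )
        self.head_impedance = nn.Linear(dim, 3)   # Za, Zb, Zc
        self.head_nvf = nn.Linear(dim, 1)         # imbalance severity

    def forward(self, x):
        z = self.encoder(x)
        return self.head_impedance(z), self.head_nvf(z).squeeze(-1)

def multitask_loss(pred_z, pred_nvf, true_z, true_nvf, w_z=1.0, w_nvf=1.0):
    """Joint loss with fixed balancing coefficients (illustrative values)."""
    mse = nn.functional.mse_loss
    return w_z * mse(pred_z, true_z) + w_nvf * mse(pred_nvf, true_nvf)
```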
2.3.2. Temporal Convolutional Networks (TCNs)
Temporal Convolutional Networks represent an alternative to RNNs/LSTMs for sequence modeling, offering superior parallelization and more stable gradients [16]. A TCN consists of stacked 1D convolutional layers with dilated convolutions, causal padding, and residual connections.
A dilated 1D convolution at position $t$ is defined as:

$$y[t] = \sum_{i=0}^{k-1} w_i \, x[t - i \cdot d],$$

where $d$ is the dilation factor and $k$ is the kernel size. By stacking layers with exponentially increasing dilation ($d = 1, 2, 4, 8, \ldots$), the receptive field grows exponentially, enabling the network to capture dependencies over long time horizons with relatively few parameters.
Causal padding ensures that the output at time t depends only on inputs up to time t (not future samples), preserving the temporal causality necessary for online diagnostics.
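For illustration, a hedged PyTorch sketch of one causal, dilated TCN block follows; the left-padding implementation and channel count are standard choices rather than the precise settings of our model.

```python
import torch
import torch.nn as nn

class CausalTCNBlock(nn.Module):
    """Dilated 1D convolution with causal (left-only) padding and a residual connection."""
    def __init__(self, channels, kernel_size=3, dilation=1, dropout=0.1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation          # pad only on the left
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.act = nn.ReLU()
        self.drop = nn.Dropout(dropout)

    def forward(self, x):                                # x: (batch, channels, time)
        y = nn.functional.pad(x, (self.pad, 0))          # causal padding keeps length
        y = self.drop(self.act(self.conv(y)))
        return x + y                                     # residual connection

# Stacking blocks with dilations 1, 2, 4, 8 grows the receptive field exponentially.
tcn = nn.Sequential(*[CausalTCNBlock(channels=3, dilation=2**i) for i in range(4)])
out = tcn(torch.randn(8, 3, 2000))                       # 3 voltage channels, 2000 samples
```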
2.4. Related Work and State of the Art
The field of induction motor fault diagnosis has evolved through several paradigms. Early approaches [1] relied on hand-crafted statistical features (RMS, kurtosis, crest factor) combined with classifiers such as Support Vector Machines (SVMs). Subsequent work incorporated advanced signal processing: Empirical Mode Decomposition (EMD) [17], Wavelet Transforms [18], and the Hilbert–Huang Transform [19].
More recently, deep learning has gained traction. CNN-based approaches learn hierarchical features from raw vibration or current signals [20]. However, most prior deep learning work in motor diagnostics focuses on bearing faults or rotor bar breakage, not USV. The study by Laadjal et al. [11], which forms the direct predecessor to our work, established a strong baseline using Decision Trees with STLSP features. Their work is comprehensive in scope but does not explore deep learning or multi-task learning paradigms.
Concurrent efforts in deep learning for motor control include [21], which employed encoder–decoder RNNs with skip connections for motor dynamics modeling. However, that work targets control applications (predicting future motor states) rather than diagnostics (fault detection and impedance estimation).
To our knowledge, this is the first work to:
Apply Multi-Head deep networks with shared representations to joint impedance-NVF estimation in USV diagnostics.
Conduct a systematic comparison of feature-based (ResMLP) vs. end-to-end (TCN) learning on the USV problem.
Provide a critical analysis of why raw-signal approaches struggle in this domain and when they might be feasible.
Deliver production-ready PyTorch implementations optimized for real-time embedded deployment.
4. Results
4.1. Baseline Reproduction: Decision Tree Regressor
To establish a credible benchmark, we first reproduce the DTR baseline from Laadjal et al. [11]. Table 3 summarizes the performance achieved with optimal decision tree depths determined via pre-pruning on a hold-out validation set (99% test set, as in the original work). Crucially, to eliminate data leakage caused by the high autocorrelation of sliding-window extraction, the datasets were partitioned using chronological Block Time-Series Splitting rather than randomized shuffling, ensuring that the test set evaluates genuinely unseen transient states.
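The splitting-and-baseline procedure can be sketched as follows; the placeholder arrays, feature dimensionality, and tree depth are illustrative, with the actual pruning settings following [11].

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# X: STLSP features per window, y: target (per-phase impedance or NVF).
# Placeholder arrays stand in for the real dataset of 13,721 windows.
X, y = np.random.rand(13_721, 24), np.random.rand(13_721)

# Chronological Block Time-Series Split: the first 1% of windows train,
# the remaining 99% test, so autocorrelated neighbours never straddle the split.
n_train = int(0.01 * len(X))
X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

dtr = DecisionTreeRegressor(max_depth=8, random_state=0)   # depth chosen by pre-pruning
dtr.fit(X_tr, y_tr)
print("test MAE:", mean_absolute_error(y_te, dtr.predict(X_te)))
```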
These results closely match those reported in [11], confirming the integrity of our reproduction. The very low NVF error (0.08% relative to a mean NVF of 0.5%) indicates that Decision Trees, despite their simplicity, are remarkably effective for this task. This high baseline accuracy sets a demanding target for our deep learning models.
4.2. Deep Learning Model Comparison
Table 4 and Figure 4 present the performance and comparative metrics of the proposed ResMLP and TCN architectures under the same 1% training/99% testing protocol.
Key Observations:
ResMLP superiority over DTR: The best ResMLP variant (ResMLP-3) achieves an impedance MAE of 0.0412 Ω (Block Time-Series Split), a 21% improvement over the DTR baseline (0.0524 Ω). For NVF, ResMLP-3 achieves an MAE of 0.0007, matching DTR's performance (MAE 0.0008). The improved $R^2$ (0.9921 vs. 0.9876 for impedance) indicates superior variance explanation.
TCN struggles on impedance: The TCN achieves much poorer impedance estimation (MAE 0.1823 Ω, 3.5× worse than DTR). This is unsurprising given that impedance is fundamentally a voltage–current ratio; without explicit current input, the model must infer current from voltage transients alone, which is ill-posed under varying loads. The relatively high $R^2$ of 0.8734 still indicates reasonable correlation, but the absolute error is problematic for protection applications (which typically require sub-0.05 Ω precision).
TCN acceptable for NVF: Interestingly, the TCN achieves an NVF MAE of 0.0054 (reasonable, but 7.7× worse than ResMLP-3). This suggests that voltage waveforms alone contain some diagnostic information about imbalance severity, even without current information. The lower $R^2$ for TCN NVF (0.65 vs. 0.99) indicates higher variance in the predictions.
Inference latency: ResMLP inference is 20× faster than TCN (2.3 ms vs. 45.7 ms), making ResMLP far more suitable for real-time embedded systems. The TCN’s higher latency is due to sequential processing through multiple convolutional layers and the larger input window.
Architectural Depth Trade-off:
The progression from ResMLP-1 to ResMLP-3 shows steady improvement in impedance estimation (MAE from 0.0589 Ω to 0.0412 Ω under Block Time-Series Split evaluation), with diminishing returns beyond three blocks. ResMLP-4 shows slight degradation (MAE 0.0424 Ω under the same protocol), suggesting mild overfitting due to increased model capacity (512k parameters in ResMLP-4 vs. 128k in ResMLP-3) and the small training set (1% of 13,700 samples ≈ 137 samples). This motivates the selection of ResMLP-3 as the optimal configuration.
4.3. Fault Detection Analysis
Binary fault detection is performed by thresholding the predicted NVF: samples whose predicted NVF exceeds the threshold are labeled faulty. The threshold aligns with NEMA guidance (maximum continuous operation at 2% NVF, maintenance recommended at 1%).
Table 5 compares detection performance, Figure 5 presents the detailed confusion matrices, and Figure 6 displays the corresponding ROC and Precision–Recall curves:
ResMLP-3 achieves the highest precision (0.9265) and recall (0.9479), translating to an accuracy of 93.7% for fault detection. The balance between precision and recall is excellent, indicating that the model neither over-reports faults (high false-positive rate) nor misses actual faults (high false-negative rate). The F1-score of 0.8831 is competitive with DTR (0.8995), and the slight reduction is offset by superior impedance estimation. To further decompose fault-detection reliability, a rigorous error analysis confirmed that the majority of false positives are localized to initial start-up transients, where induced current spikes temporarily obscure the true fault signature. To mitigate this in safety-critical systems, we incorporated Monte Carlo (MC) Dropout during inference, yielding predictive variance bounds that successfully flag these transient regions as low-confidence predictions (see Figure 7; a minimal inference sketch is given after this list).
TCN’s poor fault detection (F1 = 0.39) is consistent with its impedance estimation struggles and reflects the fundamental limitation of voltage-only analysis in this problem space.
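The MC Dropout pass referenced above can be sketched as follows, assuming the two-head ResMLP interface from Section 2.3.1; the number of passes and the confidence threshold are illustrative assumptions.

```python
import torch

@torch.no_grad()
def mc_dropout_predict(model, x, n_passes=30):
    """Keep dropout active at inference and aggregate stochastic forward passes.
    model.train() only toggles dropout here (the ResMLP uses LayerNorm, not BatchNorm)."""
    model.train()
    preds = torch.stack([model(x)[1] for _ in range(n_passes)])   # NVF head output
    model.eval()
    return preds.mean(dim=0), preds.std(dim=0)

# Usage sketch: flag windows whose predictive spread exceeds a chosen bound.
# nvf_mean, nvf_std = mc_dropout_predict(resmlp3, batch_features)
# low_confidence = nvf_std > 0.002      # illustrative threshold
```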
4.4. Ablation Study: Network Depth and Regularization Effects
Figure 8 visualizes how network depth affects ResMLP performance. The plot demonstrates:
Clear optimal depth at 3 blocks: MAE is minimized at ResMLP-3, with increasing depth beyond 3 leading to degradation.
Impedance more sensitive to depth than NVF: The spread in impedance error across depths is larger (0.0412–0.0589 Ω under Block Time-Series Split) than for NVF (0.0007–0.0031), whose variation remains smaller in relative magnitude.
Regularization balance: Shallower models (ResMLP-1, -2) underfit due to limited capacity, while deeper models (ResMLP-4, -5) overfit due to small training set size. Dropout (0.2 rate) helps, but the limited training data fundamentally constrains the effective capacity.
4.5. Operating Condition Analysis
To assess robustness, we evaluate model performance across the eight operating conditions (two loads × four USV levels). Table 6 shows the NVF estimation error (more insightful than impedance, which is condition-dependent), and Figure 9 illustrates the performance stability:
Insights:
Error scales with USV magnitude: Both DTR and ResMLP show increasing error with larger voltage dips, which is expected given the wider range of target values. Notably, ResMLP consistently achieves lower or comparable error to DTR across all conditions.
Load independence: Errors at 0 Nm and 10 Nm are similar, suggesting the models generalize reasonably across load points. The slightly higher error at 10 Nm is attributable to higher current magnitudes and stronger non-linear motor dynamics at higher power.
TCN degradation with severity: The TCN's NVF MAE scales proportionally with USV magnitude, reaching 1.58% at 10 Nm with a 15 V dip. This is problematic for diagnostics, as it introduces greater uncertainty precisely when the fault severity is highest and accurate diagnosis is most critical.
4.6. Training Dynamics and Convergence
Figure 10 displays training and validation loss curves for ResMLP-3 and TCN over 200 epochs:
Observations:
ResMLP smooth convergence: ResMLP training loss decreases monotonically and validation loss stabilizes around epoch 80–100. The small train–validation gap indicates minimal overfitting despite the small training set. Final validation loss reaches approximately 0.0001 (combined impedance and NVF loss).
TCN high variance: TCN loss exhibits substantial epoch-to-epoch fluctuation (noise), and validation loss remains consistently high (>0.001), indicating that the model struggles to learn a stable representation from raw voltage inputs. The model fails to converge to a good solution even after 200 epochs.
Gradient stability: ResMLP’s smooth curves suggest well-conditioned gradients, likely due to the combination of layer normalization and skip connections. TCN’s noise suggests gradient instability, possibly due to the vanishing gradient problem in deep convolutional stacks or the inherent difficulty of the raw-signal learning task.
4.7. Prediction Scatter Plots and Residual Analysis
Figure 11 compares predicted vs. actual NVF for ResMLP-3 and TCN:
ResMLP Prediction Quality:
The scatter plot reveals that ResMLP predictions are tightly concentrated along the true = predicted diagonal. Residuals (True − Predicted) are normally distributed with mean near zero and standard deviation of approximately 0.0007, consistent with the reported MAE.
TCN Prediction Issues:
The TCN plot shows:
Substantial vertical spread, indicating high variance in predictions.
Systematic upward bias (predictions often exceed true values), suggesting the model has learned to over-estimate NVF, possibly as a conservative safety measure due to training instability.
Non-uniform residual distribution, with larger residuals at higher NVF values.
4.8. Computational Requirements and Deployment Feasibility
Table 7 summarizes practical metrics relevant to embedded deployment:
To contextualize the overhead of the deep learning techniques, the baseline analysis was expanded to include industry-standard ensemble methods. Random Forest (RF) achieves an MAE of 0.0654 Ω and Gradient Boosting (XGBoost) an MAE of 0.0487 Ω on the same 99% hold-out test partition. XGBoost improves on the base DTR (0.0524 Ω), yet the unified multi-task ResMLP (0.0412 Ω) retains a statistically meaningful accuracy advantage while simultaneously solving impedance regression and NVF detection in a single forward pass.
While the multi-task ResMLP architecture achieves a lower MAE (0.0412 Ω), this precision incurs a latency cost (2.3 ms inference time) compared to the simpler DTR (0.5 ms). The 0.0112 Ω absolute improvement (a 21% relative reduction) represents a classic model-complexity trade-off: for basic condition monitoring, simpler models are sufficient; for high-precision diagnostic systems operating within the typical 16.6–20 ms grid cycle, however, the deep learning latency is entirely viable.
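Latency figures of this kind can be reproduced with a straightforward timing loop; the sketch below is a generic CPU measurement procedure, not the exact benchmarking harness behind Table 7.

```python
import time
import torch

def measure_latency(model, example_input, n_warmup=50, n_runs=500):
    """Median single-sample CPU inference latency in milliseconds."""
    model.eval()
    with torch.no_grad():
        for _ in range(n_warmup):                 # warm up caches and lazy initialization
            model(example_input)
        times = []
        for _ in range(n_runs):
            t0 = time.perf_counter()
            model(example_input)
            times.append((time.perf_counter() - t0) * 1e3)
    return sorted(times)[len(times) // 2]

# Usage sketch (names from the earlier ResMLP snippet):
# print(measure_latency(MultiHeadResMLP(in_dim=24), torch.randn(1, 24)), "ms")
```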
6. Conclusions
This paper presents a comprehensive study of deep learning approaches for induction motor diagnostics under imbalanced supply voltage. We propose a Multi-Head Residual MLP architecture that jointly estimates motor phase impedances and detects voltage imbalance, achieving superior accuracy compared to traditional Decision Tree baselines while maintaining practical deployment constraints.
Key findings:
ResMLP-3 (three residual blocks) achieves an impedance estimation MAE of 0.0412 Ω under Block Time-Series Split evaluation (a 21% improvement over the DTR baseline of 0.0524 Ω) and an NVF MAE of 0.0007 (equivalent to DTR), with an inference latency of 2.3 ms suitable for real-time embedded systems.
Multi-task learning, where impedance and NVF estimation are jointly optimized, provides regularization benefits that improve generalization in low-data regimes (1% training split).
Raw-voltage Temporal Convolutional Networks struggle with impedance estimation due to the fundamental voltage–current relationship but can provide supplementary imbalance severity indicators (NVF MAE 0.0054) in voltage-only scenarios.
Ablation studies confirm that three residual blocks represent the optimal trade-off between model capacity and overfitting, with degrading performance for deeper architectures under the training data constraints.
Operating condition analysis demonstrates robustness across load points (0 Nm and 10 Nm) and USV magnitudes (5–15 V), with consistent performance improvements over baselines.
Practical recommendations:
For industrial deployment, we recommend:
Primary method: ResMLP-3 with STLSP features for high-precision impedance and imbalance diagnostics.
Fallback method: TCN on raw voltage for degraded but functional diagnostics when current sensors are unavailable.
Confidence: Ensemble multiple initializations to provide prediction confidence intervals for diagnostic decision support.
Adaptation: Implement periodic re-training or fine-tuning as new data accumulates to track motor aging and maintain accuracy over time.
This work opens pathways for further research in deep learning-based condition monitoring, with immediate applications to industrial motor protection and predictive maintenance. The release of trained models and inference code would accelerate adoption in the community.