Article

The Unreasonable Effectiveness of Neural Operators and Mambas in Detecting and Quantifying Electrical Machine Faults: A Case Study on Eccentricity

by Latifa Yusuf *, Belaid Moa and Ilamparithi Thirumarai Chelvan
Electrical and Computer Engineering Department, University of Victoria, Victoria, BC V8P 5C2, Canada
* Author to whom correspondence should be addressed.
Machines 2026, 14(5), 574; https://doi.org/10.3390/machines14050574
Submission received: 29 March 2026 / Revised: 15 May 2026 / Accepted: 16 May 2026 / Published: 21 May 2026
(This article belongs to the Special Issue Data-Driven Fault Diagnosis for Machines and Systems, 2nd Edition)

Abstract

Reliable fault detection and quantification are essential for the operational integrity of electric machines. While traditional current-based analysis relies on harmonic signatures or wavelet-based time-frequency representations, this study investigates modern learning formulations that capture spectral, multiscale, and temporal characteristics of fault-affected signals. Moving beyond conventional models, including our earlier CNN-based approaches, we develop sequence-based and operator-learning architectures within a multi-output formulation for eccentricity fault analysis. Three models are investigated: Mamba for temporal dynamics, the Fourier Neural Operator for global spectral mapping, and the Wavelet Neural Operator for localized multiscale decomposition. Evaluated on induction, salient pole synchronous, and inverter-based reluctance synchronous machines, each model maps stator current waveforms to multiple diagnostic quantities, including voltages, operating conditions, and fault severity. With time-delay embedding, all three achieve low prediction errors, with severity RMSE reaching the $10^{-4}$ scale for the induction machine, a notable reduction from the 0.04 errors of our earlier hierarchical CNN models. These results show that modern sequence-based and operator-learning formulations can broaden machine fault analysis by enabling simultaneous prediction and estimation of multiple aspects of machine condition within a single model.

1. Introduction

1.1. Background Study

Electric motors and other electromechanical systems form the backbone of modern industrial infrastructure, making their reliable operation critical for productivity and safety. The ability to swiftly detect and precisely diagnose faults in these systems, including those arising from eccentricity and other subtle internal abnormalities, is therefore crucial for minimizing downtime, reducing maintenance costs, and prolonging equipment life. Historically, fault detection has primarily relied on analytical and physics-based models, which require detailed knowledge of system parameters and idealized operational assumptions. However, such classical approaches often struggle to accurately represent complex, nonlinear behavior, particularly under varying load conditions and dynamic environmental influences—conditions commonly associated with eccentricity and similar electromechanical faults [1]. In response, recent research has shifted toward data-driven methodologies as a promising alternative to address these challenges.
Unlike traditional approaches, data-driven fault detection methods construct models from measured operational data, enabling the identification of intricate relationships between system behavior and emerging faults. This capability is particularly valuable for capturing uncertainty, component interactions, and gradual degradation processes that are challenging to describe analytically. By leveraging techniques such as machine learning (ML), especially deep learning (DL), these methods can distill informative features from raw signals and adapt to changing operational scenarios with minimal prior assumptions. Furthermore, their scalability and flexibility make them well suited to large-scale, end-to-end monitoring applications, in which high-dimensional datasets and diverse machine configurations are prevalent [2].
Among the frequently encountered faults in electric machines is eccentricity, which, especially in its early stages, can produce subtle signatures due to an uneven distribution of the air-gap between the rotor and the stator. Eccentricity typically arises from mechanical imperfections, such as manufacturing tolerances, bearing degradation, shaft misalignment, or gradual wear during prolonged operation. It is commonly categorized into three forms: static eccentricity (SE), characterized by a fixed displacement of the rotor axis relative to the stator axis; dynamic eccentricity (DE), in which the rotor rotates around its axis while the center of rotation itself revolves around the stator center; and mixed eccentricity (ME), which combines features of both SE and DE conditions [3,4]. These fault types provide the physical basis for the diagnostic problem considered in this study. Each type introduces characteristic but often faint fault signatures into electrical and mechanical measurements, complicating early diagnosis. The nonlinear nature and load dependency of these signatures further challenge conventional detection strategies. Consequently, the development of learning-based frameworks for detecting and quantifying eccentricity faults has become an area of growing interest.
In contrast to classical approaches, such as Motor Current Signature Analysis (MCSA), equivalent circuit models, or observer-based estimators, which often rely on simplified descriptions or restrictive assumptions, data-driven methods learn discriminative patterns autonomously from empirical data [5]. This approach facilitates the recognition of low-amplitude, nonlinear features that characterize eccentricity under various load and speed conditions. By systematically analyzing operational measurements, such as stator currents, voltage waveforms, or vibration signals, ML and DL models can achieve enhanced sensitivity, earlier fault detection, and improved adaptability across diverse operational environments. These capabilities position data-driven strategies as a compelling and practical foundation for advancing the state of health monitoring and fault diagnosis in electromechanical systems [6].
Existing data-driven eccentricity studies can be broadly grouped into handcrafted-feature methods and deep learning-based methods. Early applications primarily relied on combining manually extracted features with classical ML classifiers. In these approaches, informative indicators such as sideband amplitudes in the current spectrum, time-domain statistical measures, or Park’s vector components were derived from electrical or vibrational measurements to characterize fault conditions. Methods including Support Vector Machines (SVMs), k-Nearest Neighbors (k-NN), Decision Trees (DTs), Discriminant Analysis, Principal Component Analysis (PCA), and Artificial Neural Networks (ANNs) [7,8,9,10,11,12,13,14,15,16,17] were then employed to distinguish between healthy and eccentric states. These methods demonstrated notable improvements over purely analytical or threshold-based techniques, particularly in their ability to capture subtle variations associated with fault progression. However, their performance often depended heavily on the quality and consistency of the handcrafted features, which were sensitive to load fluctuations, measurement noise, and variations in machine topology. Furthermore, the reliance on expert-designed signal transformations limited the models’ adaptability across different machines or operating environments.
More recently, DL approaches have been explored to reduce reliance on manual feature extraction and improve the accuracy of eccentricity fault diagnosis. Convolutional Neural Networks (CNN) have been applied to raw time-domain current signals or transformed forms such as spectrograms and frequency spectra, demonstrating superior classification performance under a range of loading and fault conditions [17,18,19,20]. Autoencoder architectures (AEs) and Recurrent Neural Networks (RNNs) have also been investigated to capture temporal dependencies through unsupervised feature learning. These methods have improved detection accuracy and reduced the need for expert intervention in signal preprocessing [21].

1.2. Motivation and Scope of the Study

While DL approaches have improved detection and classification, most methods still assign discrete labels to operating conditions and often require separate models for classification and regression. This focus can limit their effectiveness in scenarios where predicting multiple outputs, such as discrete fault classes and continuous severity indices, is required for comprehensive monitoring and maintenance planning. In addition, conventional architectures typically learn abstract features that are optimized for discrimination but do not explicitly model the functional relationship between input waveforms and target outputs. In contrast, operator-based models provide a structured approach to approximating this relationship, potentially enhancing the expressiveness and scalability of fault diagnosis frameworks [22].
Modeling the relationship between input and output functions is crucial in electromechanical systems, in which fault progression manifests as continuous changes across entire waveforms rather than isolated categorical shifts. Conventional ML and DL architectures typically map a window of input samples to a single-output label or scalar regression value, which limits their ability to capture the underlying functional dependencies that govern eccentricity behavior. By treating waveform data as discrete points rather than full functions, these approaches overlook the harmonic interactions, amplitude–phase coupling, and load-dependent distortions that evolve throughout the electrical cycle. Operator learning provides a principled alternative by directly learning mappings between functions, preserving global structure and temporal coherence. This functional viewpoint naturally supports sequence-to-sequence prediction, enabling severity estimates to evolve smoothly across each waveform segment. It also facilitates joint multi-output regression, allowing fault type and severity information to be inferred within a single modeling framework. These properties align closely with the physics of eccentricity faults, in which interactions among harmonics and load conditions shape the diagnostic signatures.
Beyond classical DL architectures, modern sequence models such as Mamba offer an alternative approach to modeling waveform evolution. Mamba employs selective state-space dynamics to capture long-range temporal dependencies and smooth progression patterns for sequence-to-sequence fault estimation. Within this context, operator-learning frameworks such as the Fourier Neural Operator (FNO) [23], Wavelet Neural Operator (WNO) [24], DeepONet [25], and Koopman-inspired models [26], have emerged as powerful approaches for modeling function-to-function relationships. These architectures differ in how they parameterize operators, ranging from global spectral mixing to localized multiscale decompositions to branch-trunk formulations and linearized dynamical embeddings. Yet, they share the common objective of learning mappings between infinite-dimensional functional spaces. This characteristic makes them well suited for electromechanical diagnostics, where input currents, output voltages, and fault-related quantities all evolve as continuous functions with coupled temporal and spectral behavior. By integrating physical structure directly into the learning process, operator-learning models can represent waveform-level dependencies that conventional networks treat only implicitly.
In this work, eccentricity fault diagnosis is addressed using a sequence-to-sequence learning approach that supports continuous estimation of machine behavior and multiple diagnostic quantities directly from measured inputs. We evaluate three representative realizations: Mamba, the FNO, and the WNO. These models are selected because they provide complementary ways of representing fault-affected input electrical signals, including their temporal evolution, harmonic structure, and localized variations. Other operator-learning approaches exist; however, the selected models provide a sufficient basis for evaluating the proposed formulation without expanding the scope of the study beyond its primary objective. It should be noted that the focus of this study is not on developing new deep learning model architectures but on creating an adaptable framework for existing learning models for electromechanical fault detection, progression, and estimation. The main contributions of this study are summarized as follows:
1. This study formulates electromechanical fault diagnosis as a multi-output waveform-mapping problem, where measured current signals are used to estimate voltage waveforms, operating quantities, and fault severity.
2. It evaluates three complementary learning models, Mamba, FNO, and WNO, for the same diagnostic formulation, allowing temporal, spectral, and multiscale features of fault-affected electrical signals to be examined.
3. These proposed formulations are validated across three machine platforms, SPSM, INV-RSM, and IM, covering both grid-connected and inverter-fed machine conditions.
The remainder of this article is organized as follows: Section 2 describes the experimental setup and data acquisition across the three machine platforms. Section 3 presents the Mamba model and its selective state-space formulation, while Section 4 introduces the neural operator models, detailing the FNO and WNO architectures and their function-to-function mapping mechanisms. These sections also report the corresponding sequence-to-sequence, multi-output prediction results for eccentricity detection and severity estimation. Section 5 discusses the implications of the findings and outlines opportunities for extending operator-learning approaches to broader machine-health applications. Section 6 concludes the article.

2. Experimental Setup, Data Preparation

2.1. Experimental Setup

This study uses datasets collected from three machine platforms: a salient-pole synchronous motor (SPSM), an inverter-fed reluctance synchronous machine (INV-RSM), and an induction motor (IM). Specifications for each machine are provided in Table A4, Table A5 and Table A6. Figure 1 shows the INV-RSM setup. The RSM was driven by an ME2 series inverter (Motortronics, Clearwater, FL, USA), with the inverter switching frequency set at 4000 Hz. The motor was operated under open-loop control throughout the experiment. The experimental setups for the SPSM and IM followed a similar configuration, except that they operated under grid supply rather than inverter excitation [19] (see Figure 2 for the actual experimental setup of the SPSM). Figure 3 shows the 40SE and 40DE sleeves, which are representative examples of the eccentricity components used to impose controlled fault conditions in the experiments. Eccentricity faults were introduced mechanically using eccentric bushings and sleeves whose inner and outer centers were intentionally offset relative to each other. For SE (Figure 3c), eccentric bushings were placed between the bearings and the end plates on both sides of the rotor, producing a fixed displacement between the rotor and stator axes. For DE (Figure 3d), eccentric sleeves were placed between the shaft and the bearings, producing an offset associated with the rotating assembly. ME was obtained by combining both arrangements, with eccentric bushings introduced at the end-plate locations and eccentric sleeves introduced between the shaft and the bearings. The percentage eccentricity level represents the imposed center displacement expressed relative to the nominal air-gap length. For example, a 40% eccentricity level corresponds to a center offset equal to 40% of the nominal air gap. During installation, markings on the bushings and sleeves were used to control the minimum air gap and to avoid inclined eccentricity.

2.2. Data Acquisition

Three-phase stator line currents and phase-to-phase voltages were recorded over 15 s intervals under controlled eccentricity conditions. Figure 4 and Figure 5 compare the healthy and faulty 40SE waveforms across the three machines for the line current $I_a$ and line-to-line voltage $V_{ab}$, respectively. These plots are provided as qualitative examples from separate healthy and faulty recordings and should not be interpreted as synchronized waveform comparisons or as indicators that faults can be segregated merely by observing the difference between the two signals. For the SPSM, measurements were acquired at a sampling rate of 3.6 kHz, yielding 54,000 samples per condition. For the INV-RSM, a sampling rate of 24 kHz was used, yielding 360,000 samples per condition. For the IM, measurements were also collected at 3.6 kHz, with 54,000 samples per condition. Additional SPSM datasets were acquired under three power factor (PF) operating points: 0.9 lagging, 0.9 leading, and unity. Table 1 summarizes the health and fault conditions investigated across all three machines. The NI DAQPad-6070E (National Instruments Corporation, Austin, TX, USA) was used as the data acquisition device to collect data from the machines.

2.3. Data Preprocessing

The input sequences are constructed from raw current recordings using time-delay embedding applied only to the current channels. Let $I_a[n]$ and $I_b[n]$ denote the two measured line currents. Only two current channels were used because the machines were connected in a three-phase, three-wire setup with no neutral connection. According to Kirchhoff’s current law, the instantaneous line currents satisfy $I_a[n] + I_b[n] + I_c[n] = 0$, so the third current can be obtained as $I_c[n] = -(I_a[n] + I_b[n])$. Similarly, only $V_{ab}$ and $V_{bc}$ were used as voltage targets because Kirchhoff’s voltage law gives $V_{ab}[n] + V_{bc}[n] + V_{ca}[n] = 0$, so $V_{ca}[n] = -(V_{ab}[n] + V_{bc}[n])$. Thus, including $I_c$ and $V_{ca}$ would introduce redundant channels rather than additional independent information. In addition, the selected current channels were processed using time-delay embedding (TDE), which expands each measured current signal into delayed coordinates and, in the sense of Takens’ theorem [27], recovers the machine dynamics from the observed current signatures. A TDE of dimension $m = 64$ with Takens’ delay step $d = 7$ is formed by collecting lagged samples as shown in Equation (1).
$$I_r[n],\; I_r[n+d],\; \ldots,\; I_r[n+(m-1)d], \qquad r \in \{a, b\}, \tag{1}$$
over the valid range $n = 0, \ldots, N-(m-1)d-1$. Stacking all lagged copies of both currents produces a $(2m) \times T$ sequence, where $T = N-(m-1)d$. The voltage channels $V_{ab}[n]$ and $V_{bc}[n]$ are not embedded. Scalar targets (load, SE, DE, and PF, where applicable) do not require temporal alignment and are broadcast as constant sequences of length $L$. The input sequences were initially constructed without time-delay embedding (No-TDE) to establish a baseline representation. While the models learned meaningful mappings under this formulation, introducing time-delay embedding led to a clear reduction in RMSE across all machine platforms. Consequently, the TDE-based representation was adopted throughout this study, and all results reported in the main text correspond to this formulation. The No-TDE results are provided in Appendix A.1, Appendix A.2 and Appendix A.3 for the three model formulations. All recordings were partitioned into three parts: the first 80% was dedicated to the training, the next 10% was used for validation, and the last 10% was used for testing. Each part was then subjected to TDE and segmented into overlapping fixed-length windows using strides. This strategy serves two purposes: first, it compensates for the smaller number of healthy recordings by applying shorter strides to healthy signals, thereby generating more windows; second, it enriches temporal dependencies across windows, ensuring that the models are trained on sequences that preserve continuity while reducing redundancy in the more abundant faulty data. A batch size of 200 was used uniformly across all the models with a learning rate of 0.001. The L1 loss function was employed for optimization. RMSE was used as the baseline evaluation metric for all models, with additional metrics introduced in the appropriate model subsections. Figure 6 summarizes the overall workflow of the proposed analysis.
First, the measured line-current signals are preprocessed using segmentation, time-delay embedding, and dataset partitioning. The processed sequences are then used as inputs for three representative model realizations, Mamba, FNO, and WNO, each of which produces multi-output predictions of the diagnostic quantities considered in this study.
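The preprocessing steps described in this subsection can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used in the study: the function names (`time_delay_embed`, `embed_currents`, `sliding_windows`) are hypothetical, and the window length and stride are left as free parameters because their values are not stated in this section. Only $m = 64$ and $d = 7$ come from the text. Alignment of the non-embedded voltage targets with the embedded index is not shown.

```python
import numpy as np

def time_delay_embed(x, m=64, d=7):
    """Return an (m, T) array whose row j holds x[n + j*d], with T = N - (m-1)*d."""
    x = np.asarray(x)
    T = x.shape[0] - (m - 1) * d
    if T <= 0:
        raise ValueError("signal too short for the requested embedding")
    # Row j is the signal shifted by j*d samples, truncated to the valid range.
    return np.stack([x[j * d : j * d + T] for j in range(m)], axis=0)

def embed_currents(i_a, i_b, m=64, d=7):
    """Stack the embeddings of both line currents into a (2m, T) sequence."""
    return np.concatenate(
        [time_delay_embed(i_a, m, d), time_delay_embed(i_b, m, d)], axis=0
    )

def sliding_windows(seq, length, stride):
    """Cut a (C, T) sequence into overlapping windows of shape (num_windows, C, length).

    A shorter stride yields more windows, which is how the text describes
    compensating for the smaller number of healthy recordings.
    """
    starts = range(0, seq.shape[1] - length + 1, stride)
    return np.stack([seq[:, s : s + length] for s in starts], axis=0)
```

As a worked instance of $T = N-(m-1)d$: the training part of a 54,000-sample recording contains $0.8 \times 54{,}000 = 43{,}200$ samples, so after embedding with $m = 64$ and $d = 7$ it has $T = 43{,}200 - 63 \times 7 = 42{,}759$ time steps and $2m = 128$ channels.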

3. Mamba Model

Building on the motivation for sequence-to-sequence modeling in the introduction, this section first examines Mamba as a complementary temporal framework. While distinct from neural operators in formulation, Mamba captures the evolving state dynamics that motivate operator-based learning, providing a sequential perspective for comparison and continuity. Mamba, also known as Structured Selective State Space Models (SSSMs), are a class of dynamical models that capture time-varying system behavior by allowing system parameters to adapt in response to input observations. Unlike fixed-parameter state-space models, such as pure linear models used for generic monitoring, Mamba enable input-conditioned transitions suitable for modeling complex phenomena, such as mechanical faults in rotating machinery. The core discrete-time dynamics are defined in Equation (2) as
$$h_t = A_t h_{t-1} + B_t x_t, \qquad y_t = C_t h_t, \tag{2}$$
where $x_t \in \mathbb{R}^{d_{\text{head}}}$, $h_t \in \mathbb{R}^{d_{\text{state}}}$, and $y_t \in \mathbb{R}^{d_{\text{head}}}$ represent the input, hidden state, and output, respectively, and $A_t$, $B_t$, $C_t$ are computed via linear projections from the input $x_t$ [28,29,30,31]. A block diagram for the above state space model is shown in Figure 7.
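To make the recurrence in Equation (2) concrete, the following toy NumPy loop evaluates it step by step with input-dependent parameters. It is a simplified illustration under stated assumptions and not the Mamba2 implementation: the projection matrices `W_a`, `W_b`, and `W_c` are hypothetical, $A_t$ is taken to be diagonal with entries squashed into $(0, 1)$ by a sigmoid, and the discretization, convolution, gating, and parallel scan used in practice are all omitted.

```python
import numpy as np

def selective_ssm(x, W_a, W_b, W_c):
    """Evaluate h_t = A_t h_{t-1} + B_t x_t, y_t = C_t h_t over a sequence.

    x   : (T, d_head) input sequence
    W_a : (d_head, d_state)            projection producing the diagonal of A_t
    W_b : (d_head, d_state * d_head)   projection producing B_t
    W_c : (d_head, d_head * d_state)   projection producing C_t
    """
    T, d_head = x.shape
    d_state = W_a.shape[1]
    h = np.zeros(d_state)
    y = np.empty((T, d_head))
    for t in range(T):
        # Assumption: diagonal A_t in (0, 1), obtained from the current input.
        a_t = 1.0 / (1.0 + np.exp(-(x[t] @ W_a)))
        B_t = (x[t] @ W_b).reshape(d_state, d_head)
        C_t = (x[t] @ W_c).reshape(d_head, d_state)
        h = a_t * h + B_t @ x[t]   # state update
        y[t] = C_t @ h             # readout
    return y
```

Because $A_t$, $B_t$, and $C_t$ are recomputed from $x_t$ at every step, the amount of past information retained in $h_t$ depends on the input itself, which is the "selective" behavior referred to in the text.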
To capture long-term dependencies and complex dynamics, Mamba rely on stacking SSSM models into deep architectures as shown in Figure 8. In addition to the core SSSM transformation, a Mamba2 block [31] involves several key supporting components: normalization layers (like RMSNorm) to stabilize activations, parameter-splitting projection layers to generate the input-dependent parameters for the SSSM, a 1D convolution to capture immediate local context, non-linear activation (SiLU), and a multiplicative gating mechanism to modulate the SSSM output. Following the standard design in deep architectures, we incorporate a residual connection (skip connection) around each Mamba layer, summing its input with its transformed output. This enables effective gradient propagation and information flow throughout the deep stack of layers.
In eccentricity fault detection, where variations in the radial or axial air gap between the rotor and stator result in rotor misalignment, electromagnetic disturbances evolve with rotor position, load, and fault severity. Mamba are well suited to modeling these changes by continuously updating internal states based on observed inputs. By embedding input-driven transitions within an interpretable modeling structure, Mamba offer a principled approach to capturing fault-induced deviations as they unfold. This approach integrates three key components. First, it incorporates sensitivity to frequency-domain features often associated with eccentricity-related distortions. Second, it leverages residual-based diagnostic strategies grounded in physical system behavior. Third, it introduces adaptive temporal learning to capture temporal transitions, in contrast to convolutional models that operate over static receptive fields. By maintaining internal memory, Mamba can represent fault progression more effectively [32]. This means that Mamba do not simply classify a fault at a single instant but rather accumulate information about its evolution over time. This property makes Mamba suitable for adaptation to monitoring tasks in which fault severity changes over time.

3.1. Mamba Methodology

The Mamba framework was applied to the INV-RSM, IM, and SPSM datasets described in Table 1 following the preprocessing described in Section 2.3 and data segmentation shown in Table 2.
Each input sample consists of two line currents $(I_a, I_b)$, while the output stack includes the line-to-line voltages $(V_{ab}, V_{bc})$, load level, and the severity of SE and DE. For the SPSM dataset, PF is included as well. Scalar targets are broadcast along the sequence dimension to align with waveform predictions. The network adopts a residual state-space design based on Mamba2 [31]. The preprocessed sequences are provided to the Mamba model as length-$L$ windows, where each sample consists of the $2m$-dimensional vector obtained from the time-delay embedding of the two line-current channels. A linear projection first maps each embedded vector to a 128-dimensional hidden representation. The sequence is then passed through seven stacked Mamba2 blocks [31]. Each block performs layer normalization, a selective state-space update with a state dimension of 64 and a head dimension of 32, followed by a residual connection. The expansion ratio within each block is set to two, which defines the width of the intermediate projection used in the selective update. This configuration allows the network to represent both short-range variations within the waveform and the long-range temporal structure created by the delay embedding. A final linear projection maps the hidden representation to the model’s output channels. Scalar targets such as load, SE, DE, and PF (for SPSM) are broadcast along the time axis so that all outputs share the same sequence length. The voltage channels are predicted as full waveforms across the window, while the scalar targets remain constant in time. Treating all outputs as sequences allows a unified L1 loss to be applied across all channels and timesteps. Training uses the L1 loss computed over the full output window. RMSprop is used for the INV-RSM dataset, while AdamW is used for SPSM and IM. The maximum number of epochs is set to 20,000 for INV-RSM and 10,000 for both SPSM and IM.
All models are implemented in PyTorch Lightning version 2.5.3 and trained on the Rorqual computing cluster using one NVIDIA GPU and 12 CPU cores. Memory allocations were 120 GB for INV-RSM and 64 GB for SPSM and IM. Model performance is reported using RMSE for each output channel.
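The way scalar targets are broadcast so that a single L1 loss covers every output channel can be illustrated as follows. This is a minimal NumPy sketch with hypothetical helper names (`build_target_stack`, `l1_loss`) and an assumed channels-first layout; the study itself uses PyTorch, and its exact tensor layout is not given in this section.

```python
import numpy as np

def build_target_stack(v_ab, v_bc, load, se, de, pf=None):
    """Return a (C, L) target array: two voltage waveforms plus broadcast scalars.

    The scalar targets (load, SE, DE, and optionally PF) are repeated along the
    time axis so that every channel has the same sequence length L.
    """
    v_ab = np.asarray(v_ab, dtype=float)
    v_bc = np.asarray(v_bc, dtype=float)
    L = v_ab.shape[0]
    scalars = [load, se, de] + ([pf] if pf is not None else [])
    rows = [v_ab, v_bc] + [np.full(L, s, dtype=float) for s in scalars]
    return np.stack(rows, axis=0)

def l1_loss(pred, target):
    """Mean absolute error over all channels and time steps."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(target))))
```

With PF included (the SPSM case) the stack has six channels; without it (INV-RSM and IM) it has five. Since waveform and scalar channels share one shape, the same loss expression applies to all of them.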

3.2. Mamba Results and Evaluation Metrics

The performance of the Mamba model was evaluated using the root mean squared error (RMSE) computed independently for each predicted output channel. This is defined in Equation (3) as:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2}, \tag{3}$$
where $y_i$ and $\hat{y}_i$ are the ground truth and predicted values, respectively, and $N$ is the number of samples. RMSE is used as the primary performance metric throughout this work, as it provides a consistent measure of average prediction accuracy across sequence outputs. Additional error statistics, including maximum absolute error (MaxAE) and the 95th percentile error (Q95), are reported in the appendices for completeness but are not used for comparative interpretation. Figure 9 summarizes the RMSE values across machines and targets.
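A per-channel evaluation along the lines of Equation (3) might look like the sketch below. The function name `channel_metrics` is hypothetical, and Q95 is assumed here to mean the 95th percentile of the absolute error, which this section does not state explicitly.

```python
import numpy as np

def channel_metrics(y_true, y_pred):
    """Compute RMSE, MaxAE, and Q95 independently for each output channel.

    y_true, y_pred : arrays of shape (num_samples, C), where windows and
    time steps have already been flattened into the first axis.
    """
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    abs_err = np.abs(err)
    return {
        "rmse": np.sqrt(np.mean(err ** 2, axis=0)),
        "max_ae": abs_err.max(axis=0),
        # Assumption: Q95 is the 95th percentile of the absolute error.
        "q95": np.quantile(abs_err, 0.95, axis=0),
    }
```

Reporting the three statistics per channel keeps a few large voltage-waveform errors from hiding behind the near-constant severity channels, or the reverse.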
For the SPSM dataset, RMSE values remained on the order of $10^{-3}$ for all output channels, including $V_{ab}$, $V_{bc}$, load, SE, DE, and PF. Both voltage waveforms were reconstructed with high accuracy, and severity estimates closely followed the imposed fault levels across the sequence windows.
For the INV-RSM dataset, RMSE values for load and eccentricity severity outputs were on the order of $10^{-3}$ to $10^{-2}$, while voltage channels exhibited greater errors, reaching the $10^{-2}$ range. This behavior is consistent with the increased waveform complexity introduced by inverter switching and higher sampling rates. Despite this, the model produced stable sequence-level predictions across all outputs.
On the IM dataset, RMSE values across voltage, load, and severity targets remained in the $10^{-4}$ to $10^{-3}$ range, indicating consistently low prediction error across sequence windows. The limited spread in RMSE values suggests that the state-space formulation combined with time-delay embedding effectively captured the dominant temporal structure of the IM signals. The full numerical results across the three machines are provided in Table 3. Overall, the RMSE results indicate that the Mamba architecture supports stable multi-output sequence-to-sequence prediction from current measurements across different machine platforms with time-delay embedding. Figure 10, Figure 11 and Figure 12 provide representative examples of the SSM predictions for the IM case across all output channels. Each waveform segment is indexed along the horizontal axis, with signal amplitude shown on the vertical axis. For the severity outputs, discrete target levels of 0.2, 0.4, and 0.6 correspond to fault severities of 20%, 40%, and 60%, respectively. For SE and DE, the predicted outputs remain at the imposed severity levels throughout each segment. For $V_{ab}$, the predicted signals closely follow the measured waveforms, capturing the dominant oscillatory behavior across operating conditions.

4. Neural Operator Models

Building on the state-space formulation of Mamba, which models temporal evolution via input-conditioned dynamics, this section extends the sequence-to-sequence framework to operator-based learning. The Neural Operator family, comprising the Fourier and Wavelet variants, offers complementary formulations that learn mappings across entire waveform functions rather than stepwise state transitions, thereby broadening the perspective established by the Mamba model.

4.1. Fourier Neural Operators (FNOs)

Mamba demonstrated that a sequential state-space formulation can reliably track eccentricity faults across the three testbed machines; its strength lies in modeling temporal dynamics step by step via a stacking of linear models composed with non-linearity operators in the time domain. However, this is not compatible with the MCSA view, which relies on the harmonic signature and the frequency analysis of the waveforms. In this regard, FNO can provide a complementary perspective via spectral convolutions that propagate information across the entire waveform in parallel, offering a different approach to function-to-function learning that scales more efficiently with longer sequences and multiple outputs. The FNO architecture is fundamentally different: it transforms the entire input signal into the frequency domain, learns in spectral space, and projects back to the time domain, enabling it to capture correlations and patterns across the entire input. This characteristic aligns well with the physical nature of electrical machine waveforms, which often exhibit global harmonic patterns under fault conditions. Practical considerations also motivated the shift to FNO. Scaling models such as Mamba to longer sequences or multi-output channels, e.g., (SE, DE, load, voltages, power factor), incurs significantly higher GPU memory requirements and computational overhead. While CNN-based models, whether in the time or frequency domain, rely on convolutional kernels to learn local patterns, they struggle to capture global dependencies governed by underlying physics. FNOs provide a principled solution by learning mappings between infinite-dimensional function spaces [22].
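The transform, filter, and inverse-transform idea described above can be illustrated with a single-channel spectral convolution in NumPy. This is a hedged sketch rather than the FNO used in the study: the function name `spectral_conv_1d` is hypothetical, the complex weights would be learned in an actual model, and the lifting, channel mixing, skip connection, and activation of a full FNO layer are omitted.

```python
import numpy as np

def spectral_conv_1d(u, weights):
    """Filter a real 1D signal in the Fourier domain using a few low modes.

    u       : (T,) real-valued signal sampled on a uniform grid
    weights : (n_modes,) complex multipliers, one per retained Fourier mode
    """
    n_modes = weights.shape[0]
    u_hat = np.fft.rfft(u)                        # to the frequency domain
    v_hat = np.zeros_like(u_hat)
    v_hat[:n_modes] = weights * u_hat[:n_modes]   # act on retained modes only
    return np.fft.irfft(v_hat, n=u.shape[0])      # back to the time domain
```

Each retained mode is a global basis function, so every output sample depends on the whole input window. With all weights set to one, the operation reduces to a low-pass filter that passes the first `n_modes` harmonics unchanged and discards the rest.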
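Instead of learning a point-wise mapping, FNOs aim to learn a full operator $\mathcal{G}$ that maps functions in a space $\mathcal{A}$ to functions in a space $\mathcal{B}$, as defined in Equation (4):
$$\mathcal{G} : \mathcal{A} \to \mathcal{B}, \qquad u \mapsto \mathcal{G}(u) = v, \tag{4}$$
where $u$ and $v$ are functions in $\mathcal{A}$ and $\mathcal{B}$, respectively. The functions in $\mathcal{A}$ and $\mathcal{B}$ share the same domain $\Omega \subset \mathbb{R}^{d}$ and take values in $\mathbb{R}^{d_a}$ and $\mathbb{R}^{d_b}$, respectively.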
FNOs are known to excel at solving PDEs, where $u$ represents the initial conditions and $v$ the final solution at a later time. This is especially useful for problems in which outputs depend on spatial–temporal relationships, as in fluid mechanics and electromagnetism. In our context, the operator $\mathcal{G}$ represents an actual electrical machine that takes three-phase currents, $I = (I_a(t), I_b(t))$, considered functions in $\mathcal{A} = L^2(\mathbb{R}_+, \mathbb{R}^2)$, and produces the three-phase voltages, $V = (V_{ab}(t), V_{bc}(t))$, giving rise to those currents, as well as any conditions, considered as functions, which affect the currents. In our general setting, the functional conditions are the load, $L(t)$, the power factor, $P(t)$, the SE severity, $S(t)$, and the DE severity, $D(t)$. Therefore, $\mathcal{B}$, in our general setting, is the functional space $L^2(\mathbb{R}_+, \mathbb{R}^6)$, representing the tuple of functions $(V_{ab}(t), V_{bc}(t), L(t), P(t), S(t), D(t))$.
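As an illustration of this function-space view, the following minimal sketch shows one way a sampled training pair could be assembled. It assumes that scalar conditions (load, power factor, SE, and DE severity) are broadcast to constant functions over the waveform window; the function name `build_operator_pair`, the window length, and the example values are hypothetical and are not taken from the study.

```python
import numpy as np

def build_operator_pair(i_a, i_b, v_ab, v_bc, load, pf, se, de):
    """Stack sampled waveforms and scalar labels into (input, target) arrays.

    Returns u with shape (T, 2) and v with shape (T, 6). Broadcasting the
    scalar conditions to constant channels is an assumption of this sketch.
    """
    i_a = np.asarray(i_a, dtype=float)
    i_b = np.asarray(i_b, dtype=float)
    n_samples = i_a.shape[0]

    def constant(value):
        # A scalar condition viewed as a constant function over the window.
        return np.full(n_samples, float(value))

    u = np.stack([i_a, i_b], axis=-1)
    v = np.stack(
        [
            np.asarray(v_ab, dtype=float),
            np.asarray(v_bc, dtype=float),
            constant(load),
            constant(pf),
            constant(se),
            constant(de),
        ],
        axis=-1,
    )
    return u, v

# Hypothetical one-cycle window of a 60 Hz machine, 256 samples.
t = np.linspace(0.0, 1.0 / 60.0, 256, endpoint=False)
u, v = build_operator_pair(
    i_a=np.sin(2 * np.pi * 60 * t),
    i_b=np.sin(2 * np.pi * 60 * t - 2 * np.pi / 3),
    v_ab=np.cos(2 * np.pi * 60 * t),
    v_bc=np.cos(2 * np.pi * 60 * t - 2 * np.pi / 3),
    load=0.5, pf=0.9, se=0.2, de=0.0,
)
```

Each row of `u` is a sample of the input function in $\mathcal{A}$, and each row of `v` is the corresponding sample of the six-channel target function in $\mathcal{B}$.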
The FNO layer consists of the following steps:
1. Lifting Layer: Lift the input function $u(x)$ to a higher-dimensional representation using a pointwise (local) linear transformation:
$$u_0(x) = P(u(x)), \qquad P : \mathbb{R}^{m} \to \mathbb{R}^{d}.$$
2. Fourier Transform: Apply a Fourier transform to $u_0(x)$:
$$\hat{u}_0(k) = \mathcal{F}[u_0](k) = \int_{\Omega} u_0(x)\, e^{-2\pi i k \cdot x}\, dx.$$
3. Spectral Convolution: Apply a learnable complex-valued filter $W(k)$ in the Fourier domain:
$$\hat{v}(k) = W(k) \cdot \hat{u}_0(k), \quad \text{for a limited number of modes } k.$$
4. Inverse Fourier Transform: Transform back to the spatial domain:
$$v(x) = \mathcal{F}^{-1}[\hat{v}](x) = \sum_{k} \hat{v}(k)\, e^{2\pi i k \cdot x}.$$
5. Nonlinear Activation and Skip Connection: Add a weight-based skip connection and apply a nonlinear function (e.g., ReLU or GELU):
$$u_{j+1}(x) = \sigma\big(v(x) + W_j u_j(x)\big).$$
6. Projection Layer: This is a pointwise linear layer that projects the FNO output back to the target dimension:
$$v_f(x) = Q(v(x)), \qquad Q : \mathbb{R}^{d} \to \mathbb{R}^{p}.$$
A complete FNO model stacks several such layers. The lifting and the projection layers ensure that the input and output spaces match those of the FNO. Figure 13 shows the architecture of an FNO.
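The six steps above can be sketched in a few lines of NumPy. This is a minimal, untrained forward pass for a one-dimensional signal, not the implementation used in this study: the number of layers (4), the random initialization, and the helper names are assumptions, while the width (20), the number of retained modes (32), and the input and output channel counts (2 and 6) follow the values given in the text.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU nonlinearity.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def spectral_conv_1d(u, weights):
    """Steps 2-4: FFT, per-mode channel mixing on the lowest modes, inverse FFT.

    u: real array (T, d); weights: complex array (modes, d, d).
    """
    n_samples, modes = u.shape[0], weights.shape[0]
    u_hat = np.fft.rfft(u, axis=0)                    # (T // 2 + 1, d)
    v_hat = np.zeros_like(u_hat)                      # discarded modes stay zero
    v_hat[:modes] = np.einsum("kio,ki->ko", weights, u_hat[:modes])
    return np.fft.irfft(v_hat, n=n_samples, axis=0)   # back to (T, d), real

def fno_forward(u, params):
    h = u @ params["P"]                               # step 1: lifting, m -> d
    for w_spec, w_skip in params["layers"]:
        # step 5: spectral branch plus pointwise skip branch, then activation
        h = gelu(spectral_conv_1d(h, w_spec) + h @ w_skip)
    return h @ params["Q"]                            # step 6: projection, d -> p

def init_params(rng, m=2, d=20, p=6, modes=32, n_layers=4):
    # Random weights for illustration only; n_layers is an assumed value.
    layers = []
    for _ in range(n_layers):
        w_spec = (rng.standard_normal((modes, d, d))
                  + 1j * rng.standard_normal((modes, d, d))) / d
        w_skip = rng.standard_normal((d, d)) / d
        layers.append((w_spec, w_skip))
    return {
        "P": rng.standard_normal((m, d)) / np.sqrt(m),
        "layers": layers,
        "Q": rng.standard_normal((d, p)) / np.sqrt(d),
    }

rng = np.random.default_rng(0)
params = init_params(rng)
out_long = fno_forward(rng.standard_normal((1024, 2)), params)
out_short = fno_forward(rng.standard_normal((512, 2)), params)
```

Because the learned weights act on Fourier modes rather than on time samples, the same `params` can be applied to the 1024-sample and the 512-sample inputs without modification, provided the segment contains at least as many frequency bins as retained modes.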
FNOs are powerful because they:
  • Learn directly in the frequency domain, enabling global information propagation.
  • Handle varying input resolutions and geometries.
  • Can effectively learn models for PDEs and many physical systems.
  • Support efficient multi-output regression within a unified architecture.

4.1.1. FNO Methodology

The FNO framework was applied to the INV-RSM, IM, and SPSM datasets described in Table 1, following the data preprocessing and segmentation described in Section 2.3 and shown in Table 4.
Each input sequence consisted of two line currents, and the target outputs included voltages, load level, and the severity of SE and DE, with PF also included for the SPSM dataset. In all FNO models across the three machines, 32 Fourier modes were retained, ensuring that both fundamental and harmonic components of interest were preserved while avoiding excessive computational cost. The network width was set to 20, which defines the number of hidden features per layer and governs the model’s capacity. The maximum number of epochs was fixed at 20,000 for INV-RSM and 10,000 for SPSM and IM. Training was conducted on the Narval, Fir, and Rorqual supercomputing clusters of the Digital Research Alliance of Canada. The FNO and WNO experiments used an NVIDIA A100 GPU with 40 GB of VRAM, and the Mamba experiments used an H100. CPU memory allocations varied with dataset size, ranging from 48 GB for IM to 84 GB for SPSM. Optimization was performed with RMSprop using a learning rate of 0.001 and L2 regularization set to $10^{-6}$.
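For reference, the settings reported above can be collected in a small configuration sketch. The class and field names are illustrative only. The helper `highest_retained_frequency` shows how the number of retained modes relates to the highest preserved frequency; the sampling rate and segment length passed to it below are hypothetical placeholders, since the actual values are given in Section 2.3 and Table 4 rather than here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FNOTrainingConfig:
    """Hyperparameters as reported in the text; names are illustrative."""
    machine: str
    max_epochs: int
    modes: int = 32                 # retained Fourier modes
    width: int = 20                 # hidden features per layer
    optimizer: str = "RMSprop"
    learning_rate: float = 1e-3
    l2_regularization: float = 1e-6

CONFIGS = {
    "INV-RSM": FNOTrainingConfig("INV-RSM", max_epochs=20_000),
    "SPSM": FNOTrainingConfig("SPSM", max_epochs=10_000),
    "IM": FNOTrainingConfig("IM", max_epochs=10_000),
}

def highest_retained_frequency(modes, sample_rate_hz, segment_length):
    """Frequency of the last retained mode when modes 0..modes-1 are kept.

    Mode k of a length-N segment sampled at fs corresponds to k * fs / N.
    """
    return (modes - 1) * sample_rate_hz / segment_length

# Hypothetical sampling rate and segment length, for illustration only.
f_max = highest_retained_frequency(CONFIGS["IM"].modes, 7680.0, 1024)
```

A check of this kind makes explicit which harmonics of the supply frequency fall inside the retained band for a given segmentation.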

4.1.2. FNO Results and Evaluation Metrics

The performance of the FNO model was further evaluated using RMSE, MaxAE and Q95 metrics. All metrics were computed independently for each output target, namely $V_{ab}$, $V_{bc}$, load, SE, DE, and PF where applicable. The complete numerical results for the FNO model are summarized in Table 5. A consistent pattern emerges in the severity estimation performance across the different machine platforms. For the SPSM and IM datasets, the RMSE values associated with both SE and DE remain on the order of $10^{-3}$. In contrast, for the INV-RSM dataset, the SE RMSE increases to the order of $10^{-2}$, while the DE RMSE reaches a few times $10^{-2}$. These results define the characteristic severity estimation error scale achieved by the operator-based framework across the three machines.
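A minimal sketch of the per-target evaluation is given below. It assumes that MaxAE denotes the maximum absolute error and that Q95 denotes the 95th percentile of the absolute error; the function name and the array layout (segments, samples, targets) are choices made for this illustration.

```python
import numpy as np

def per_target_metrics(y_true, y_pred, names):
    """Compute RMSE, MaxAE, and Q95 independently for each output channel.

    y_true, y_pred: arrays of shape (n_segments, n_samples, n_targets).
    Q95 is taken here as the 95th percentile of |error| (an assumption).
    """
    abs_err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    flat = abs_err.reshape(-1, abs_err.shape[-1])   # pool segments and samples
    return {
        name: {
            "RMSE": float(np.sqrt(np.mean(flat[:, c] ** 2))),
            "MaxAE": float(np.max(flat[:, c])),
            "Q95": float(np.quantile(flat[:, c], 0.95)),
        }
        for c, name in enumerate(names)
    }

# Tiny synthetic example with one target channel.
truth = np.zeros((1, 4, 1))
pred = np.array([0.1, -0.2, 0.3, 0.0]).reshape(1, 4, 1)
metrics = per_target_metrics(truth, pred, ["SE"])
```

Reporting the three quantities together separates the typical error (RMSE) from rare large deviations (MaxAE) and from the bulk of the error distribution (Q95).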
Figure 14 presents a bar chart illustrating the distribution of RMSE values across all predicted targets for the test machines. In addition, Figure 15, Figure 16, Figure 17 and Figure 18 show representative comparisons between predicted and measured signals for all targets in the SPSM case. In these plots, the horizontal axis corresponds to the time index of the waveform segments, while the vertical axis represents the signal amplitude. For SE and DE, the target labels 0.2, 0.4, and 0.6 denote 20%, 40%, and 60% fault severity levels, respectively. The predicted outputs closely track these discrete severity levels across segments, indicating stable and consistent severity estimation. For PF, the target values correspond to 0.9 lagging, 0.9 leading, and unity operating conditions, and the predictions accurately reproduce these discrete operating points with minimal deviation. For $V_{ab}$, the predicted sinusoidal waveforms closely match the measured signals across operating conditions, demonstrating that the FNO model generalizes effectively to continuous waveform prediction.
The FNO results highlight a distinct advantage of operator-based learning: the model predicts complete sequences of voltages, load, and fault severities, providing a continuous trajectory of system behavior rather than a single scalar per window. This representation reveals how the predicted waveforms evolve in time according to the input dynamics. Therefore, when trained properly, FNOs can capture a detailed view of dynamic fault progression. Such sequence-to-sequence inference marks a conceptual shift from traditional approaches and lays the foundation for subsequent operator formulations, such as the WNO, which further enhance temporal and spectral localization.

4.2. Wavelet Neural Operators (WNOs)

The Wavelet Neural Operator (WNO) was explored as a natural extension of the FNO to investigate whether multi-resolution representations could yield superior predictive accuracy. Unlike the Fourier transform, which decomposes signals into global sinusoidal modes with uniform frequency resolution, the discrete wavelet transform (DWT) provides localized, scale-dependent features that can capture both transient and steady-state components of the electrical signals. This property is particularly appealing for fault detection, as eccentricity faults may manifest as both low-frequency harmonics and short-lived perturbations in the current and voltage waveforms [33,34,35].
The WNO retains the same high-level architecture as the FNO described in Section 4.1, with the exception of the spectral transformation stage. Specifically, steps 2 and 4 of the FNO framework (the Fourier transform and its inverse) are replaced by a multi-level DWT and its inverse, as shown in Equations (5)–(7):
$$\big(a_J, \{d_j\}_{j=1}^{J}\big) = \mathcal{W}[u_0(x)] = \Big(\langle u_0, \phi_{J,n} \rangle, \{\langle u_0, \psi_{j,n} \rangle\}_{j=1}^{J}\Big). \tag{5}$$
$$\hat{v}(J, n) = W_a\, a_J[n], \qquad \hat{v}(j, n) = W_{d,j}\, d_j[n], \quad j = 1, \dots, J \tag{6}$$
$$v(x) = \mathcal{W}^{-1}\Big(W_a\, a_J, \{W_{d,j}\, d_j\}_{j=1}^{J}\Big). \tag{7}$$
where $a_J$ denotes the approximation coefficients at the coarsest scale $J$ (low-frequency content), $d_j$ denotes the detail coefficients at scale $j$ (localized high-frequency content), $\phi_{J,n}(x)$ is the scaling function (low-pass basis), and $\psi_{j,n}(x)$ is the wavelet function (high-pass basis). The DWT decomposes the signal into approximation and detail coefficients at multiple scales, enabling joint time–frequency localization. The learned spectral filters $W_a$ and $W_{d,j}$ are applied directly in the wavelet domain to $\{a_J, d_j\}$, followed by the inverse DWT to return to the spatial domain for the nonlinear activation and skip connection steps. The lifting and projection layers remain unchanged from the FNO case.
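Equations (5)–(7) can be illustrated with a short NumPy sketch. It uses the orthonormal Haar wavelet purely for simplicity, which is an assumption, since the wavelet family is not specified in this section. The weights are channel-mixing matrices shared across positions $n$, as in Equation (6), and the function names are hypothetical.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x, levels):
    """Multi-level orthonormal Haar DWT along axis 0 (Equation (5)).

    The length of x must be divisible by 2**levels. Returns (a_J, [d_1..d_J]),
    where d_1 is the finest scale.
    """
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)   # detail (high-pass) coefficients
        approx = (even + odd) / SQRT2          # approximation (low-pass)
    return approx, details

def haar_idwt(approx, details):
    """Inverse of haar_dwt: rebuild the signal from coarse to fine scales."""
    for detail in reversed(details):
        out = np.empty((2 * approx.shape[0],) + approx.shape[1:])
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx

def wavelet_conv(u0, w_a, w_d):
    """Equations (6)-(7): mix channels per scale, then invert the DWT.

    u0: (T, d); w_a: (d, d); w_d: list of J matrices of shape (d, d).
    """
    a_j, d = haar_dwt(u0, levels=len(w_d))
    return haar_idwt(a_j @ w_a, [d_j @ w for d_j, w in zip(d, w_d)])

rng = np.random.default_rng(0)
u0 = rng.standard_normal((64, 3))
eye = np.eye(3)
v = wavelet_conv(u0, eye, [eye, eye, eye])    # identity weights, J = 3
```

With identity weights the layer reduces to a perfect reconstruction of `u0`, which is a convenient sanity check before learned weights are introduced.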

4.2.1. WNO Methodology

In contrast to conventional implementations of Wavelet Neural Operators (WNOs), which retain only the approximation coefficients and the final detail coefficients at each layer, we implemented a hybrid spectral–spatial mixing strategy. This means that the mixing operation incorporates both inter-scale (wavelet-level) and intra-scale (spatial) mixing, facilitating interactions not only across frequency bands but also among neighboring spatial locations. This hybrid method enhances representational capacity by integrating spectral decomposition with spatial context, albeit at the cost of increased computational complexity. In this case, the learned spectral weights $W_a$ and $W_{d,j}$ in Equation (6) depend on $n$ as well, making the formulation more general and more computationally expensive. Additionally, this implementation introduces a masking mechanism that allows users to selectively exclude specific wavelet levels during mixing. This flexibility is critical for:
  • Noise suppression: Omitting irrelevant or noisy frequency bands.
  • Computational efficiency: WNOs exhibit high training costs, particularly when retaining all decomposition levels. Masking enables adaptive complexity reduction, optimizing performance in latency-sensitive applications [36].
The data preprocessing for WNO is the same as for FNO, as explained in Section 2.3. The WNO model takes as input a two-channel sequence of line currents and maps it to a multi-channel output sequence consisting of voltage, load, eccentricity severity, and, for SPSM, power factor. Processing occurs in the wavelet domain via a multi-level DWT, where the number of decomposition scales J was dataset-specific: 16 for INV-RSM, eight for IM, and eight for SPSM. For each retained subband, a fixed number of wavelet modes was used for spectral mixing, balancing representational power and computational efficiency. The model width was set to 20 across all machines, with additional convolutional layers (filter size eight, pooling size eight) applied to reduce sequence dimensionality before projection to the output space. Training was performed on the Narval supercomputing cluster using single NVIDIA A100 GPUs. Table 4 shows the data segmentation used for training, and Table 6 summarizes the dataset-specific hyperparameters.
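The level-masking idea can be illustrated with the sketch below. It interprets masking as zeroing the detail coefficients of excluded levels before the inverse transform; this interpretation, the Haar wavelet, and the names `masked_reconstruction` and `keep_detail` are assumptions made for illustration, and the actual mechanism in the implementation may differ.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt(x, levels):
    # Multi-level orthonormal Haar DWT; details[0] is the finest scale.
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / SQRT2)
        approx = (even + odd) / SQRT2
    return approx, details

def haar_idwt(approx, details):
    # Inverse transform, rebuilding from the coarsest to the finest scale.
    for detail in reversed(details):
        out = np.empty((2 * approx.shape[0],) + approx.shape[1:])
        out[0::2] = (approx + detail) / SQRT2
        out[1::2] = (approx - detail) / SQRT2
        approx = out
    return approx

def masked_reconstruction(x, levels, keep_detail):
    """Zero the detail levels whose mask entry is False, then invert.

    keep_detail[j - 1] refers to detail level j, with j = 1 the finest scale.
    """
    if len(keep_detail) != levels:
        raise ValueError("keep_detail needs one entry per decomposition level")
    approx, details = haar_dwt(x, levels)
    kept = [d if keep else np.zeros_like(d) for d, keep in zip(details, keep_detail)]
    return haar_idwt(approx, kept)

# Synthetic example: a slow sinusoid plus a sample-to-sample ripple.
n = np.arange(64)
slow = np.sin(2 * np.pi * n / 64)
ripple = 0.1 * (-1.0) ** n
filtered = masked_reconstruction(slow + ripple, levels=3, keep_detail=[False, True, True])
```

In this example the ripple lives entirely in the finest detail level, so excluding that level removes it while the slow component is only mildly smoothed. Skipping a level also removes the associated weights from the mixing step, which is the source of the computational saving mentioned above.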

4.2.2. WNO Results and Evaluation Metrics

The WNO model was evaluated on the SPSM, INV-RSM, and IM datasets using the same metrics described earlier (RMSE, MaxAE, Q95), with full numerical results provided in Table 7. The WNO results also showed consistent severity estimates across machines. For the SPSM and IM datasets, RMSE values for both SE and DE remain on the order of $10^{-3}$. For the INV-RSM dataset, the SE and DE RMSE values increase to the order of $10^{-2}$. Collectively, the severity RMSE stays within a narrow range, from $10^{-3}$ to $10^{-2}$, across the three machine platforms. Figure 19 summarizes RMSE performance across all machines and targets in a bar chart.
Although the WNO was introduced with the expectation that its localized multiscale representation could enhance sensitivity to localized variations, the results did not show a consistent performance gain. Based on the mathematical formulation of WNOs, particularly their ability to retain temporal localization through wavelet decomposition, a noticeable improvement in accuracy was anticipated. However, in practice, the overall performance did not improve upon that of the FNO, with each model exhibiting similar strengths and weaknesses across the different targets and machine types. The findings therefore indicate that, despite their architectural differences, both FNO and WNO can deliver accurate predictions for the current eccentricity fault detection and quantification tasks. The results reinforce the flexibility of neural operator frameworks in this domain and highlight that gains from adopting a wavelet-based variant may depend on application-specific factors beyond those considered in this study. Together with FNO, the WNO confirms that operator-based mappings can flexibly adapt to different spectral formulations while maintaining consistent diagnostic accuracy.

5. Discussion and Implications

The results across the three frameworks show that sequence-to-sequence learning can effectively represent fault-related behavior in electromechanical systems. Although each model approaches this task differently, with Mamba relying on selective state updates, FNO using global spectral mapping, and WNO using multiscale wavelet representations, all three produced sample-level predictions of voltage, load, and eccentricity severity across waveform segments. This consistency indicates that the sequence-to-sequence formulation remained effective across different learning paradigms and across multiple machine platforms. From a physical standpoint, these findings are consistent with signal distortions arising from eccentricity-related air-gap asymmetry. The FNO results show waveform reconstructions that reflect the harmonic structure typically associated with such distortions. The WNO formulation provides a localized multiscale representation relevant to waveform variations across different scales, although in the present study, it did not yield a consistent performance gain over FNO. The temporal coherence observed in the Mamba predictions likewise indicates that state-space sequence modeling can represent the dominant temporal structure of fault-affected signals across waveform segments. Taken together, these observations show that the learned mappings capture meaningful aspects of the electromechanical behavior associated with eccentricity faults.
Across the three machine platforms, a consistent performance pattern was observed for all formulations. The INV-RSM dataset yielded higher errors than SPSM and IM, consistent with the increased waveform complexity introduced by inverter excitation and switching-related distortion. However, the prediction errors remained sufficiently low to indicate that the proposed formulations still provided effective multi-output estimation under these more demanding conditions. In contrast, the IM dataset generally yielded the lowest error levels, suggesting that its dominant signal structure was more readily captured under the present learning setup. The SPSM results also remained low across the principal outputs, showing that the same formulation can maintain stable predictive behavior across different electromechanical platforms while still reflecting meaningful differences in task difficulty. To further contextualize the present IM prediction accuracy within the literature, the IM fault-severity RMSE values obtained by the proposed Mamba, FNO, and WNO models were compared with representative induction motor-based approaches, including conventional spectral-feature regression, PCA-based feature regression, hierarchical CNN variants, and feature-inherited HCNN [37]. The literature reports eccentricity as a single ECC severity target, whereas the present IM results report SE and DE severities separately. Table 8 summarizes the compared methods, their main modeling strategy, and the RMSE values, while Figure 20 provides a visual comparison of the RMSE values. A similar RMSE-based comparison was not included for synchronous machines because the available eccentricity studies in the literature primarily report eccentricity fault detection and accuracy, rather than continuous severity estimation RMSE.
Compared with earlier Hierarchical CNN-based (HCNN) results [19,38], the present study extends the modeling scope and achieves lower severity estimation errors across multiple machine platforms. In contrast to the HCNN formulation, the current frameworks operate within a single learning setup while supporting multi-output prediction from current-based measurements. In earlier HCNN-based formulations, expanding the diagnostic scope would require additional output-specific modeling blocks, whereas the present frameworks jointly estimate multiple diagnostic quantities within one model. Unlike the HCNN architecture, which requires machine-specific tuning of the number of layers, all new models employ a fixed layer count across all machines. Notably, for the INV-RSM, the new models achieve results comparable to those of the tuned HCNN model reported in [38] without hyperparameter tuning. The comparatively higher errors observed for the INV-RSM, relative to the IM and SPSM, are attributed to the pronounced noise in the input currents (as shown in Figure 4) and the voltage discontinuities (as shown in Figure 5).
A key practical advantage of the proposed framework is that explicit load, PF, and voltage information are not required as inputs; only the currents are required. These currents inherently encode the operating condition, and the models learn to map raw current waveforms directly to multiple targets, including load, voltage, and both SE and DE severities. The Time Delay Embedding (TDE) technique enriches the dynamical structure of the currents, enabling the models to capture machine dynamics and, implicitly, infer the load profile and other outputs from the current signatures alone. This practical strength means the proposed approach remains fully operational in real-world settings, even when the load, for example, is unmeasured, unknown, or constantly varying. This is reflected in the experimental design, which included multiple load conditions: no load, 50% load, 75% load, and full load. The current magnitude varies systematically with load, as expected, and, rather than treating each load condition separately, the models were trained across these diverse operating points, demonstrating that they generalize across load levels without requiring load as an explicit input. The fact that the models successfully predict the load as an output further confirms that the current waveforms contain sufficient information about the machines’ operating conditions.
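A minimal sketch of time delay embedding for a multi-channel current signal is shown below. The embedding dimension and the delay are hypothetical values chosen for illustration; the settings used in this study are described with the data preprocessing and are not reproduced here.

```python
import numpy as np

def time_delay_embed(x, dim, delay):
    """Time delay embedding of a sampled signal.

    x: array of shape (T,) or (T, C). Returns an array of shape
    (T - (dim - 1) * delay, C * dim) whose row t concatenates
    x[t], x[t + delay], ..., x[t + (dim - 1) * delay].
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n_rows = x.shape[0] - (dim - 1) * delay
    if n_rows <= 0:
        raise ValueError("signal too short for the requested embedding")
    lagged = [x[k * delay : k * delay + n_rows] for k in range(dim)]
    return np.concatenate(lagged, axis=1)

# Two synthetic line currents, embedded with hypothetical dim=3 and delay=2.
t = np.arange(100)
currents = np.stack([np.sin(0.1 * t), np.sin(0.1 * t - 2 * np.pi / 3)], axis=1)
embedded = time_delay_embed(currents, dim=3, delay=2)
```

Each embedded row exposes a short history of both currents to the model at once, which is how the embedding enriches the dynamical structure available at every sample.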

6. Conclusions

This study introduced and evaluated three sequence-to-sequence learning frameworks for eccentricity fault estimation in an Induction Machine (IM), a Salient Pole Synchronous Machine (SPSM), and an Inverter-fed Reluctance Synchronous Machine (INV-RSM). The Mamba, Fourier Neural Operator (FNO), and Wavelet Neural Operator (WNO) models were trained to map stator current waveforms to multiple diagnostic quantities, including voltages ($V_{ab}$, $V_{bc}$), load, and eccentricity severity levels (SE and DE), with power factor included where applicable. Across the three machine platforms, all three frameworks achieved low severity estimation errors for the principal diagnostic targets. For SPSM and IM, severity RMSE values remained in the $10^{-3}$ range or lower across all models, with IM reaching $10^{-4}$. For INV-RSM, severity RMSE values increased relative to SPSM and IM but remained on the order of $10^{-2}$, indicating effective severity estimation under inverter-fed conditions despite the presence of inverter-induced waveform distortion and the absence of explicit noise filtering of the current inputs. The models were trained and tested on real measured currents from multiple loads and varying power factors. These real-world measurements inherently contain ambient noise, sensor noise, and electrical interference. The fact that our models achieved low prediction errors under these realistic conditions provides indirect evidence of robustness to typical noise levels. Nevertheless, a controlled noise injection study (e.g., adding synthetic Gaussian noise at varying SNRs) is necessary to systematically characterize this robustness.
Beyond numerical accuracy, the results highlight an important modeling advantage of the proposed formulations. Unlike conventional approaches that require separate models or increased architectural complexity as the number of diagnostic targets grows, both the sequence-based and operator-based formulations enable multi-output prediction within a single model. A shared representation of electrical behavior is learned from input measurements, from which multiple diagnostic quantities are inferred simultaneously, without requiring separate models or architectural redesign. This property supports multi-output fault monitoring within the machine platforms considered in this study and suggests potential for broader machine health assessment. It is important to note that this study is limited to three machine types and eccentricity faults under controlled laboratory conditions. Real-time deployment and large-scale field validation were not considered, and generalization beyond the tested configurations remains to be established. Future work will use Grad-CAM [39] to identify which features in the line currents are essential for severity quantification. This will help us understand the capabilities of each architecture and, therefore, support more detailed interpretation of their results. Moreover, we will need to adjust the architectures to address the comparatively higher RMSE for the INV-RSM, extend them to additional fault mechanisms, explore hybrid formulations that integrate state-space and operator reasoning, and investigate adaptive and transfer learning strategies to improve cross-machine generalization. Finally, systematic noise injection studies and labeled multi-fault datasets for fault distinguishability analysis represent essential next steps toward real-world deployment.

Author Contributions

Conceptualization, L.Y. and B.M.; methodology, L.Y. and B.M.; validation, L.Y. and B.M.; formal analysis, L.Y. and B.M.; data curation, L.Y. and B.M.; writing—original draft preparation, L.Y. and B.M.; writing—review and editing, L.Y., B.M. and I.T.C.; visualization, L.Y. and B.M.; supervision, B.M. and I.T.C.; Funding acquisition, I.T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the University of Victoria, the Natural Sciences and Engineering Research Council of Canada (NSERC), and Mitacs.

Data Availability Statement

The data is available on request.

Acknowledgments

The authors would like to acknowledge the Digital Research Alliance of Canada (alliancecan.ca) for providing the computational resources. The authors would also like to acknowledge Albert Gu for his guidance and help with the Mamba model.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation | Definition |
| --- | --- |
| SE | Static Eccentricity |
| DE | Dynamic Eccentricity |
| FNO | Fourier Neural Operator |
| WNO | Wavelet Neural Operator |
| SSSM | Selective State Space Model |
| CNN | Convolutional Neural Network |
| HCNN | Hierarchical Convolutional Neural Network |
| SPSM | Salient Pole Synchronous Motor |
| IM | Induction Motor |
| INV-RSM | Inverter-connected Reluctance Synchronous Machine |
| TDE | Time Delay Embedding |
| MCSA | Motor Current Signal Analysis |
| Grad-CAM | Gradient-weighted Class Activation Mapping |

Appendix A. Results with No TDE

Appendix A.1. Mamba

Table A1. Performance Results for the Mamba model with no TDE.

| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.00229 | 0.02184 | 0.00417 | 0.03809 | 2.47164 | 0.02254 | 0.07142 | 1.70361 | 0.02824 |
| Vbc | 0.00167 | 0.01880 | 0.00325 | 0.03722 | 2.42769 | 0.02191 | 0.07659 | 1.29847 | 0.03267 |
| Load | 0.00152 | 0.12040 | 0.00255 | 0.00543 | 1.00599 | 0.00310 | 0.09422 | 1.53130 | 0.22926 |
| SE | 0.00219 | 0.39718 | 0.00360 | 0.06531 | 3.34854 | 0.19819 | 0.03787 | 0.81054 | 0.00524 |
| DE | 0.00191 | 0.57587 | 0.00325 | 0.09054 | 3.14798 | 0.20090 | 0.01887 | 0.52793 | 0.00274 |
| PF | 0.00151 | 0.01994 | 0.00252 | | | | | | |

Appendix A.2. FNO

Table A2. Performance Results for the FNO model with no TDE.

| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.00446 | 0.05073 | 0.00908 | 0.28495 | 2.96175 | 0.95111 | 0.04178 | 0.92372 | 0.01715 |
| Vbc | 0.00449 | 0.05152 | 0.00954 | 0.22999 | 2.44524 | 0.40334 | 0.04653 | 0.87518 | 0.01958 |
| Load | 0.00286 | 0.01184 | 0.00514 | 0.00427 | 0.09171 | 0.00921 | 0.06632 | 1.00024 | 0.00760 |
| SE | 0.00562 | 0.19763 | 0.00567 | 0.00515 | 0.16546 | 0.01089 | 0.02621 | 0.40207 | 0.00393 |
| DE | 0.00343 | 0.16335 | 0.00644 | 0.00877 | 0.33023 | 0.00790 | 0.00899 | 0.20213 | 0.00271 |
| PF | 0.00516 | 0.07169 | 0.00454 | | | | | | |

Appendix A.3. WNO

Table A3. Performance Results for the WNO model with no TDE.

| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.01052 | 0.40156 | 0.02012 | 0.25863 | 2.47817 | 0.68593 | 0.58379 | 1.24781 | 1.06592 |
| Vbc | 0.01043 | 0.31890 | 0.01944 | 0.22704 | 2.43632 | 0.42964 | 0.58396 | 1.24854 | 1.06449 |
| Load | 0.00303 | 0.08700 | 0.00586 | 0.00211 | 0.05100 | 0.00407 | 0.00429 | 0.14080 | 0.00754 |
| SE | 0.00820 | 0.32112 | 0.00943 | 0.15238 | 0.40151 | 0.39957 | 0.00552 | 0.21231 | 0.00505 |
| DE | 0.00310 | 0.15572 | 0.00561 | 0.12976 | 0.40695 | 0.38408 | 0.00619 | 0.24827 | 0.00272 |
| PF | 0.00147 | 0.01990 | 0.00292 | | | | | | |

Appendix B. Experimental Testbed Parameters

Appendix B.1. SPSM

Table A4. Specifications of the SPSM used in the experimental setup.

| Machine Parameter | Value |
| --- | --- |
| Rated power | 2 kW |
| Stator voltage | 208 V |
| Number of phases | 3 |
| Number of poles | 4 |
| Speed | 1800 rpm |
| Frequency | 60 Hz |
| Type of stator winding | Double layer, lap |
| Number of turns per phase | 144 |
| Number of stator slots | 36 |
| Number of rotor bars | 20 (5 bars per pole) |
| Stack length | 76 mm |
| Stator inner diameter | 148 mm |
| Rotor outer diameter | 146.8 mm |
| Stator resistance per phase | 0.6 Ω |
| Stator leakage inductance per phase | 0.0079 H |
| Rotor bar resistance | 5.827 μΩ |
| Rotor bar leakage inductance | 0.034 μH |
| End ring resistance | 0.4531 μΩ |
| End ring leakage inductance | 17.5 nH |
| Field winding resistance | 81 Ω |
| Field winding inductance | 6 H |
| Nominal air gap along d-axis | 0.6 mm |
| Nominal air gap along q-axis | 40.27 mm |
| Effective air gap along d-axis | 1.7769 mm |
| Effective air gap along q-axis | 59.1058 mm |

Appendix B.2. IM

Table A5. Specifications of the IM used in the experimental setup.

| Machine Parameter | Value |
| --- | --- |
| Rated power | 2 kW (1.5 hp) |
| Stator voltage | 460 V |
| Number of phases | 3 |
| Number of poles | 4 |
| Speed | 1800 rpm |
| Frequency | 60 Hz |
| Type of stator winding | Single layer, concentric |
| Number of turns per phase | 282 |
| Number of stator slots | 36 |
| Number of rotor bars | 24 (6 bars per pole) |
| Stack length | 114.055 mm |
| Stator outer diameter | 143.5 mm |
| Stator inner diameter | 93.50 mm |
| Rotor outer diameter | 92.716 mm |
| Stator resistance per phase | 3.4 Ω |
| Stator leakage inductance per phase | 0.0472 H |
| Rotor bar resistance | 100 μΩ |
| Rotor bar leakage inductance | 0.168 μH |
| End ring resistance | 5.35 μΩ |
| End ring leakage inductance | 26.9 nH |
| Moment of inertia | 0.00651 kg·m² |
| Nominal air gap along d-axis | 0.392 mm |
| Nominal air gap along q-axis | 11.56 mm |
| Effective air gap along d-axis | 0.5912 mm |
| Effective air gap along q-axis | 37.3134 mm |

Appendix B.3. INV-RSM

Table A6. Specifications of the INV-RSM used in the experimental setup.

| Machine Parameter | Value |
| --- | --- |
| Rated power | 2 kW (1.5 hp) |
| Stator voltage | 460 V |
| Number of phases | 3 |
| Number of poles | 4 |
| Speed | 1800 rpm |
| Frequency | 60 Hz |
| Type of stator winding | Single layer, concentric |
| Number of turns per phase | 282 |
| Number of stator slots | 36 |
| Number of rotor bars | 24 (6 bars per pole) |
| Stack length | 114.055 mm |
| Stator outer diameter | 143.5 mm |
| Stator inner diameter | 93.50 mm |
| Rotor outer diameter | 92.716 mm |
| Stator resistance per phase | 3.4 Ω |
| Stator leakage inductance per phase | 0.0472 H |
| Rotor bar resistance | 100 μΩ |
| Rotor bar leakage inductance | 0.168 μH |
| End ring resistance | 5.35 μΩ |
| End ring leakage inductance | 26.9 nH |
| Moment of inertia | 0.00651 kg·m² |
| Nominal air gap along d-axis | 0.392 mm |
| Nominal air gap along q-axis | 11.56 mm |
| Effective air gap along d-axis | 0.5912 mm |
| Effective air gap along q-axis | 37.3134 mm |

References

  1. Khorasgani, H.; Farahat, A.; Gupta, C. Data-driven Residual Generation for Early Fault Detection with Limited Data. arXiv 2021, arXiv:2110.15385.
  2. Yin, S.; Li, X.; Gao, H.; Kaynak, O. Data-Based Techniques Focused on Modern Industry: An Overview. IEEE Trans. Ind. Electron. 2015, 62, 657–667.
  3. Toliyat, H.; Al-Nuaim, N. Simulation and detection of dynamic air-gap eccentricity in salient-pole synchronous machines. IEEE Trans. Ind. Appl. 1999, 35, 86–93.
  4. Nandi, S.; Toliyat, H. Fault diagnosis of electrical machines-a review. In Proceedings of the IEEE International Electric Machines and Drives Conference; IEMDC’99. Proceedings (Cat. No.99EX272); IEEE: Piscataway, NJ, USA, 1999; pp. 219–221.
  5. Henao, H.; Capolino, G.A.; Fernandez-Cabanas, M.; Filippetti, F.; Bruzzese, C.; Strangas, E.; Pusca, R.; Estima, J.; Riera-Guasp, M.; Hedayati-Kia, S. Trends in Fault Diagnosis for Electrical Machines: A Review of Diagnostic Techniques. IEEE Ind. Electron. Mag. 2014, 8, 31–42.
  6. Zhao, R.; Yan, R.; Chen, Z.; Mao, K.; Wang, P.; Gao, R.X. Deep learning and its applications to machine health monitoring. Mech. Syst. Signal Process. 2019, 115, 213–237.
  7. Al-Sabbagh, Q.S.; Alwan, H.E. Detection of Static Air-Gap Eccentricity in Three Phase induction Motor by Using Artificial Neural Network (ANN). J. Eng. 2009, 15, 4176–4192.
  8. Matić, D.; Kulić, F.; Pineda-Sanchez, M.; Pons-Llinares, J. Artificial Neural Networks Eccentricity Fault Detection of Induction Motor. In Proceedings of the 2010 Fifth International Multi-Conference on Computing in the Global Information Technology; IEEE: Piscataway, NJ, USA, 2010; pp. 1–4.
  9. Irgat, E.; Ünsal, A.; Canseven, H.T. Detection of Eccentricity Faults of Induction Motors Based on Decision Trees. In Proceedings of the 2021 13th International Conference on Electrical and Electronics Engineering (ELECO); IEEE: Piscataway, NJ, USA, 2021; pp. 435–439.
  10. Alameh, K.; Cité, N.; Hoblos, G.; Barakat, G. Feature extraction for vibration-based fault detection in Permanent Magnet Synchronous Motors. In Proceedings of the 2015 Third International Conference on Technological Advances in Electrical, Electronics and Computer Engineering (TAEECE); IEEE: Piscataway, NJ, USA, 2015; pp. 163–168.
  11. Dussa, R.K.; Kumar N, P. Implementation of Machine Learning to Analyze Static Eccentricity Fault in SPMSM using FEM. In Proceedings of the 2023 IEEE 8th International Conference for Convergence in Technology (I2CT); IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
  12. Huang, X.; Habetler, T.G.; Harley, R.G. Detection of Rotor Eccentricity Faults in a Closed-Loop Drive-Connected Induction Motor Using an Artificial Neural Network. IEEE Trans. Power Electron. 2007, 22, 1552–1559.
  13. Ilamparithi, T.C.; Nandi, S. Analysis, modeling and simulation of static eccentric reluctance synchronous motor. In 8th IEEE Symposium on Diagnostics for Electrical Machines, Power Electronics & Drives; IEEE: Piscataway, NJ, USA, 2011; pp. 45–50.
  14. Yusuf, L.; Ilamparithi, T.C. Dynamic Eccentricity Fault Detection in Synchronous Machines Using Principal Component Analysis. In Proceedings of the 2023 IEEE Canadian Conference on Electrical and Computer Engineering (CCECE); IEEE: Piscataway, NJ, USA, 2023; pp. 348–353.
  15. Senanayaka, J.S.L.; Van Khang, H.; Robbersmyr, K.G. Toward Self-Supervised Feature Learning for Online Diagnosis of Multiple Faults in Electric Powertrains. IEEE Trans. Ind. Inform. 2021, 17, 3772–3781.
  16. Imamura, B.; Le Menach, Y.; Tounzi, A.; Sadowski, N.; Guillot, E. Study of static and dynamic eccentricities of a synchronous generator using 3-D FEM. IEEE Trans. Magn. 2010, 46, 3516–3519.
  17. Shejwalkar, A.; Yusuf, L.; Ilamparithi, T.C. Comparative Analysis of Machine Learning Algorithms for Eccentricity Fault Classification in Salient Pole Synchronous Machine. In Proceedings of the 2024 IEEE Texas Power and Energy Conference (TPEC); IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
  18. Liu, R.; Meng, G.; Yang, B.; Sun, C.; Chen, X. Dislocated Time Series Convolutional Neural Architecture: An Intelligent Fault Diagnosis Approach for Electric Machine. IEEE Trans. Ind. Inform. 2017, 13, 1310–1320.
  19. Yusuf, L.; Shejwalkar, A.; Moa, B.; Ilamparithi, T. Classification and Severity Estimation of Eccentricity Faults in Salient Pole Synchronous Machine Using Deep Learning. IEEE Trans. Ind. Appl. 2025, 61, 6193–6204.
  20. Gao, Y.; Gao, L.; Li, X.; Cao, S. A Hierarchical Training-Convolutional Neural Network for Imbalanced Fault Diagnosis in Complex Equipment. IEEE Trans. Ind. Inform. 2022, 18, 8138–8145.
  21. Zhu, Z.; Lei, Y.; Qi, G.; Chai, Y.; Mazur, N.; An, Y.; Huang, X. A review of the application of deep learning in intelligent fault diagnosis of rotating machinery. Measurement 2023, 206, 112346.
  22. Li, Z.; Kovachki, N.; Azizzadenesheli, K.; Liu, B.; Bhattacharya, K.; Stuart, A.; Anandkumar, A. Fourier Neural Operator for Parametric Partial Differential Equations. arXiv 2021, arXiv:2010.08895.
  23. Rani, J.; Tripura, T.; Goswami, U.; Kodamana, H.; Chakraborty, S. Fault detection using Fourier neural operator. In Computer Aided Chemical Engineering; Elsevier: Amsterdam, The Netherlands, 2023; Volume 2, pp. 1897–1902.
  24. Rani, J.; Tripura, T.; Kodamana, H.; Chakraborty, S. Generative adversarial wavelet neural operator: Application to fault detection and isolation of multivariate time series data. arXiv 2024, arXiv:2401.04004.
  25. Lu, L.; Jin, P.; Pang, G.; Zhang, Z.; Karniadakis, G.E. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nat. Mach. Intell. 2021, 3, 218–229. [Google Scholar] [CrossRef]
  26. Bakhtiaridoust, M.; Irani, F.N.; Yadegar, M.; Meskin, N. Data-driven sensor fault detection and isolation of nonlinear systems: Deep neural-network Koopman operator. IET Control Theory Appl. 2023, 17, 123–132. [Google Scholar] [CrossRef]
  27. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381. [Google Scholar] [CrossRef]
  28. Gu, A.; Goel, K.; Ré, C. Efficiently Modeling Long Sequences with Structured State Spaces. In Proceedings of the International Conference on Learning Representations (ICLR), Online, 25–29 April 2022. [Google Scholar]
  29. Gu, A.; Johnson, I.; Goel, K.; Saab, K.; Dao, T.; Rudra, A.; Ré, C. Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers. Adv. Neural Inf. Process. Syst. 2021, 34, 572–585. [Google Scholar]
  30. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752. [Google Scholar]
  31. Dao, T.; Gu, A. Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality. In Proceedings of the International Conference on Machine Learning (ICML), Vienna, Austria, 21–27 July 2024. [Google Scholar]
  32. Wang, Y.; Zhao, H.; Lin, H.; Xu, E.; He, L.; Shao, H. A Generalizable Physics-Enhanced State Space Model for Long-Term Dynamics Forecasting in Complex Environments. arXiv 2025, arXiv:2507.10792. [Google Scholar] [CrossRef]
  33. Bessous, N.; Zouzou, S.E.; Sbaa, S.; Bentrah, W.; Becer, Z.; Ajgou, R. Static eccentricity fault detection of induction motors using MVSA, MCSA and discrete wavelet transform (DWT). In Proceedings of the 2017 5th International Conference on Electrical Engineering—Boumerdes (ICEE-B); IEEE: Piscataway, NJ, USA, 2017; pp. 1–10. [Google Scholar] [CrossRef]
  34. Mehrjou, M.R.; Mariun, N.; Karami, M.; Samsul, B.M.; Sahar, Z.; Norhisam, M.; Mohd Zainal, A.; Mohd Amran, M.R.; Mohammad, H.M. Wavelet-Based Analysis of MCSA for Fault Detection in Electrical Machine. In Wavelet Transform and Some of Its Real-World Applications; Baleanu, D., Ed.; IntechOpen: Rijeka, Croatia, 2015; Chapter 5. [Google Scholar] [CrossRef]
  35. Verma, A.K.; Radhika, S.; Padmanabhan, S.V. Wavelet Based Fault Detection and Diagnosis Using Online MCSA of Stator Winding Faults Due to Insulation Failure in Industrial Induction Machine. In Proceedings of the 2018 IEEE Recent Advances in Intelligent Computational Systems (RAICS); IEEE: Piscataway, NJ, USA, 2018; pp. 204–208. [Google Scholar] [CrossRef]
  36. Jansen, M.; Patrick, O. Second Generation Wavelets and Applications; Springer: London, UK, 2005. [Google Scholar] [CrossRef]
  37. Park, C.H.; Kim, H.; Lee, J.; Ahn, G.; Youn, M.; Youn, B.D. A Feature Inherited Hierarchical Convolutional Neural Network (FI-HCNN) for Motor Fault Severity Estimation Using Stator Current Signals. Int. J. Precis. Eng. Manuf.-Green Technol. 2021, 8, 1253–1266. [Google Scholar] [CrossRef]
  38. Yusuf, L.; Moa, B.; Ilamparithi, T. Static and Dynamic Eccentricity Fault Detection and Quantification in an Inverter-Fed Reluctance Synchronous Machine Using Machine Learning. In Proceedings of the 2025 IEEE 34th International Symposium on Industrial Electronics (ISIE); IEEE: Piscataway, NJ, USA, 2025; pp. 1–7. [Google Scholar] [CrossRef]
  39. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization. Int. J. Comput. Vis. 2019, 128, 336–359. [Google Scholar] [CrossRef]
Figure 1. The experimental setup for the inverter-fed RSM, showing a 3-phase RSM connected to an inverter. Two current sensors, two differential probes, and a data acquisition device are used for data collection.
Figure 2. The experimental setup showing a 3-phase SPSM driving a separately excited DC generator.
Figure 3. The inverter drive (a), the RSM (b), and the eccentricity sleeves (c,d).
Figure 4. Healthy and 40SE $I_a$ waveforms for the SPSM, INV-RSM, and IM motors.
Figure 5. Healthy and 40SE line-to-line voltage $V_{ab}$ waveforms for the SPSM, INV-RSM, and IM motors.
Figure 6. A summary of the machine learning workflow followed to benchmark the proposed models: FNO, WNO, and MAMBA.
Figure 7. Discrete-time state-space representation of the input $x_t$, the hidden-state update $h_t$, the previous hidden state $h_{t-1}$, and the output $y_t$. $Z^{-1}$ is the unit delay.
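The recurrence that Figure 7 depicts can be sketched in a few lines. This is a minimal illustration, assuming the standard linear form $h_t = A h_{t-1} + B x_t$, $y_t = C h_t$ with fixed parameters; the names `ssm_scan`, `A`, `B`, and `C` and the numeric values are hypothetical and not taken from the article. Mamba additionally makes these parameters depend on the input (the selective mechanism), which this sketch omits.

```python
def ssm_scan(A, B, C, xs, h0=None):
    """Run h_t = A h_{t-1} + B x_t, y_t = C h_t over a scalar input sequence.

    A is an n x n list of lists, B and C are length-n lists (assumed shapes).
    """
    n = len(A)
    h = list(h0) if h0 is not None else [0.0] * n
    ys = []
    for x in xs:
        # The unit delay supplies h_{t-1}: on the right-hand side `h` still
        # holds the previous state, and the assignment stores the new h_t.
        h = [sum(A[i][j] * h[j] for j in range(n)) + B[i] * x for i in range(n)]
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys


# Hypothetical two-state system driven by a unit impulse.
A = [[0.5, 0.0], [0.0, 0.25]]
B = [1.0, 1.0]
C = [1.0, 2.0]
ys = ssm_scan(A, B, C, [1.0, 0.0, 0.0])
```

With a diagonal `A`, each state decays independently after the impulse, so the output is a weighted sum of two geometric sequences.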
Figure 8. Mamba2-based sequence model architecture with stacked selective state-space layers.
Figure 9. RMSE comparison across machine platforms and target outputs for the Mamba model.
Figure 10. Actual and predicted $V_{ab}$ waveform for the IM using the Mamba model.
Figure 11. Actual and predicted SE severities for the IM using the Mamba model.
Figure 12. Actual and predicted DE severities for the IM using the Mamba model.
Figure 13. Fourier Neural Operator architecture. Fourier layers are stacked together between the lifting and the projection layers.
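As a rough illustration of what one Fourier layer in Figure 13 does, the sketch below implements only the spectral path for a single-channel real signal: transform, keep the lowest modes, scale them by weights, and transform back. It assumes the general structure described in [22]; the function name `spectral_conv_1d`, the weights, and the test signal are hypothetical, and a full Fourier layer would also add a pointwise linear path, a nonlinearity, and channel mixing.

```python
import cmath
import math


def spectral_conv_1d(x, weights):
    """Spectral path of a 1D Fourier layer for a real signal x.

    Keeps the len(weights) lowest Fourier modes, scales each by a (possibly
    complex) weight, discards the remaining modes, and transforms back.
    Assumes len(weights) < len(x) / 2 so no retained mode is the Nyquist mode.
    """
    n = len(x)
    # Forward DFT, computed only for the retained low-frequency modes.
    modes = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(len(weights))
    ]
    scaled = [w * m for w, m in zip(weights, modes)]
    # Inverse transform of a real signal: mode 0 counts once; every other
    # retained mode counts twice because it stands in for its conjugate partner.
    out = []
    for t in range(n):
        acc = scaled[0].real
        for k in range(1, len(scaled)):
            acc += 2.0 * (scaled[k] * cmath.exp(2j * math.pi * k * t / n)).real
        out.append(acc / n)
    return out


# Hypothetical signal: a constant offset plus one cycle of a cosine.
n = 8
x = [1.0 + math.cos(2 * math.pi * t / n) for t in range(n)]
identity = spectral_conv_1d(x, [1.0, 1.0])   # both modes passed unchanged
dc_only = spectral_conv_1d(x, [1.0, 0.0])    # cosine mode suppressed
```

Passing both retained modes with unit weights reproduces the input, while zeroing the second weight leaves only the constant offset, which is the sense in which the layer acts as a learnable global filter.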
Figure 14. RMSE comparison across machine platforms and target outputs for the FNO model.
Figure 15. Actual and predicted $V_{ab}$ waveform for the SPSM using the FNO model.
Figure 16. Actual and predicted SE severities for the SPSM using the FNO model.
Figure 17. Actual and predicted DE severities for the SPSM using the FNO model.
Figure 18. Actual and predicted PF for the SPSM using the FNO model.
Figure 19. RMSE comparison across machine platforms and target outputs for the WNO model.
Figure 20. Log-scale horizontal dot plot comparing RMSE for IM severity estimation between the literature and the proposed models.
Table 1. Dataset acquired from different machines, SPSM, INV-RSM, and IM, at different eccentricity fault levels (all at 0%, 50%, 75%, and 100% loads).
| SPSM Condition | INV-RSM Condition | IM Condition |
| --- | --- | --- |
| Healthy (HL) | Healthy (HL) | Healthy (HL) |
| 20% SE (20SE) | 10% DE (10DE) | 20% SE (20SE) |
| 20% DE (20DE) | 10% SE (10SE) | 20% DE (20DE) |
| 40% SE (40SE) | 20% SE (20SE) | 40% SE (40SE) |
| 40% DE (40DE) | 20% DE (20DE) | 40% SE (40SE) |
| 60% SE (60SE) | 40% SE (40SE) | 20% SE & 20% DE |
| 60% DE (60DE) | 40% DE (40DE) | 40% SE & 20% DE |
| 20% SE & 20% DE | | |
| 20% SE & 40% DE | | |
| 40% SE & 20% DE | | |
Table 2. Data preprocessing parameters and batch sizes for the Mamba model.
| Machine | Seq. Length | Stride (Faulty/HL) | Batch Size |
| --- | --- | --- | --- |
| SPSM | 480 | 75, 50 | 200 |
| INV-RSM | 2048 | 500, 500 | 200 |
| IM | 480 | 25, 25 | 200 |
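Table 2 lists a sequence length and a stride per machine. A minimal sketch of how such parameters could segment a recording into overlapping windows follows, assuming the stride is the hop between consecutive window starts. The segmentation code itself is not shown in this section, so `sliding_windows` and the synthetic 1000-sample signal are hypothetical.

```python
def sliding_windows(signal, seq_len, stride):
    """Split a 1D sequence into windows of seq_len samples, hopping by stride.

    Trailing samples that do not fill a complete window are dropped.
    """
    last_start = len(signal) - seq_len
    return [signal[s:s + seq_len] for s in range(0, last_start + 1, stride)]


# Synthetic stand-in for one recorded current waveform (not real data).
signal = list(range(1000))

# SPSM settings from Table 2: sequence length 480, stride 75 (faulty) or 50 (HL).
faulty_windows = sliding_windows(signal, seq_len=480, stride=75)
healthy_windows = sliding_windows(signal, seq_len=480, stride=50)
```

Under this reading, a smaller stride yields more windows from the same recording, so using different strides for faulty and healthy data changes how many samples each class contributes.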
Table 3. Performance results for the Mamba model with TDE.
| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.00229 | 0.02184 | 0.00417 | 0.02388 | 1.47506 | 0.01267 | 0.00097 | 0.00730 | 0.00185 |
| Vbc | 0.00167 | 0.01880 | 0.00325 | 0.02320 | 1.32059 | 0.01202 | 0.00098 | 0.00568 | 0.00188 |
| Load | 0.00152 | 0.12040 | 0.00255 | 0.00207 | 0.33549 | 0.00397 | 0.00016 | 0.00172 | 0.00031 |
| SE | 0.00219 | 0.39718 | 0.00360 | 0.05890 | 0.53427 | 0.19503 | 0.00018 | 0.03149 | 0.00037 |
| DE | 0.00191 | 0.57587 | 0.00325 | 0.07705 | 0.45538 | 0.11246 | 0.00011 | 0.05887 | 0.00019 |
| PF | 0.00151 | 0.01994 | 0.00252 | – | – | – | – | – | – |
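The three error columns used in the performance tables can be computed as sketched below. This assumes MaxAE denotes the maximum absolute error and Q95 the 95th percentile of the absolute error; the linear-interpolation quantile convention and the name `error_metrics` are assumptions, since the section does not define them, and the toy inputs are not data from the article.

```python
import math


def error_metrics(actual, predicted, q=0.95):
    """Return (RMSE, MaxAE, Q-quantile of absolute error) for paired sequences."""
    abs_err = sorted(abs(a - p) for a, p in zip(actual, predicted))
    n = len(abs_err)
    rmse = math.sqrt(sum(e * e for e in abs_err) / n)
    max_ae = abs_err[-1]
    # Quantile by linear interpolation between the two nearest order statistics
    # (an assumed convention; other definitions give slightly different values).
    pos = q * (n - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, n - 1)
    q_val = abs_err[lo] + (pos - lo) * (abs_err[hi] - abs_err[lo])
    return rmse, max_ae, q_val


# Toy example: absolute errors are 0, 1, 2, 3, 4.
rmse, max_ae, q95 = error_metrics([0.0] * 5, [0.0, 1.0, 2.0, 3.0, 4.0])
```

Reporting all three together separates typical accuracy (RMSE) from the tail (Q95) and the single worst prediction (MaxAE), which is why MaxAE can be orders of magnitude larger than RMSE in the same row.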
Table 4. Data preprocessing parameters and batch size for FNO.
| Machine | Seq. Length | Stride (Faulty/HL) | Batch Size |
| --- | --- | --- | --- |
| SPSM | 480 | 18, 6 | 200 |
| INV-RSM | 2048 | 69, 23 | 200 |
| IM | 256 | 9, 3 | 200 |
Table 5. Performance results for the FNO model with TDE.
| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.00371 | 0.04947 | 0.00709 | 0.08156 | 1.92916 | 0.03884 | 0.00478 | 0.06673 | 0.00989 |
| Vbc | 0.00370 | 0.04713 | 0.00703 | 0.08367 | 2.31717 | 0.01832 | 0.00444 | 0.05113 | 0.00910 |
| Load | 0.00348 | 0.01253 | 0.00695 | 0.00284 | 0.01960 | 0.00576 | 0.00168 | 0.01039 | 0.00356 |
| SE | 0.00367 | 0.16080 | 0.00425 | 0.01655 | 0.33989 | 0.00697 | 0.00298 | 0.01620 | 0.00656 |
| DE | 0.00220 | 0.08919 | 0.00339 | 0.03308 | 0.40550 | 0.00646 | 0.00183 | 0.00790 | 0.00351 |
| PF | 0.00130 | 0.01116 | 0.00267 | – | – | – | – | – | – |
Table 6. WNO configuration and training parameters across machines.
| Parameter | INV-RSM | IM | SPSM |
| --- | --- | --- | --- |
| Wavelet modes | 16 | 8 | 8 |
| Basis function | db24 | db4 | db24 |
| Width | 20 | 20 | 20 |
| Scales (levels $J$) | 16 | 8 | 8 |
| Epochs | 35,000 | 15,000 | 20,000 |
Table 7. Performance results for the WNO model with TDE.
| Target | SPSM RMSE | SPSM MaxAE | SPSM Q95 | INV-RSM RMSE | INV-RSM MaxAE | INV-RSM Q95 | IM RMSE | IM MaxAE | IM Q95 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Vab | 0.00867 | 0.33045 | 0.01719 | 0.12587 | 2.32731 | 0.03622 | 0.00741 | 0.08512 | 0.01489 |
| Vbc | 0.00849 | 0.15733 | 0.01722 | 0.12491 | 1.96437 | 0.04822 | 0.00693 | 0.08667 | 0.01389 |
| Load | 0.00425 | 0.06353 | 0.00835 | 0.00882 | 0.31322 | 0.01725 | 0.00211 | 0.05537 | 0.00404 |
| SE | 0.00482 | 0.21504 | 0.00852 | 0.06578 | 0.42933 | 0.18325 | 0.00325 | 0.06765 | 0.00642 |
| DE | 0.00298 | 0.22277 | 0.00551 | 0.08902 | 0.47180 | 0.20320 | 0.00358 | 0.17829 | 0.00630 |
| PF | 0.00161 | 0.04757 | 0.00359 | – | – | – | – | – | – |
Table 8. RMSE comparison for IM severity estimation from literature and the proposed models.
| Method | Description | Target | RMSE |
| --- | --- | --- | --- |
| Spectral features with SVR [37] | Conventional MCSA-based spectral features used with support vector regression. | ECC | 0.0663 |
| PCA features with SVR [37] | FFT-magnitude features reduced using PCA and used with support vector regression. | ECC | 0.0539 |
| Input-reuse HCNN with FI-HCNN-style branch [37] | Hierarchical CNN variant corresponding to Rep-HCNN1, where raw input is reused in the severity branch. | ECC | 0.0141 |
| Input-reuse HCNN with identical branches [37] | Hierarchical CNN variant corresponding to Rep-HCNN2, where diagnosis and severity branches use identical structures. | ECC | 0.0117 |
| Feature-inherited HCNN [37] | Hierarchical CNN that transfers latent features from the diagnosis module to the severity module. | ECC | 0.0061 |
| Earlier HCNN model [19] | Hierarchical CNN model previously evaluated for IM generalization using current, load, and time-delay inputs. | SE/DE | 0.0067/0.0235 |
| Proposed Mamba model | Selective state-space model used for multi-output prediction from current inputs. | SE/DE | 0.00018/0.00011 |
| Proposed FNO model | Fourier operator model used for multi-output prediction from current inputs. | SE/DE | 0.00298/0.00183 |
| Proposed WNO model | Wavelet operator model used for multi-output prediction from current inputs. | SE/DE | 0.00325/0.00358 |
Note: ECC denotes the single eccentricity severity target reported in [37].
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Yusuf, L.; Moa, B.; Thirumarai Chelvan, I. The Unreasonable Effectiveness of Neural Operators and Mambas in Detecting and Quantifying Electrical Machine Faults: A Case Study on Eccentricity. Machines 2026, 14, 574. https://doi.org/10.3390/machines14050574
