1. Introduction
An electric motor is a complex system in which multiple physical fields, such as electric and magnetic fields, are coupled [1]. It is widely applied in industry, agriculture, transportation, national defense, and daily life [2,3]. With the rapid advancement of industrial automation and intelligent manufacturing, the demand for efficient and accurate fault diagnosis systems has grown significantly, particularly for the early detection of potential failures and the prevention of unexpected downtime [4,5]. However, conventional motor fault diagnosis approaches largely rely on one-dimensional time- and frequency-domain features and signal analysis techniques, which often struggle under non-stationary, nonlinear, and noise-intensive operating conditions [6,7]. Their diagnostic accuracy degrades further when training data are insufficient and deep learning models overfit, which limits their applicability in modern industrial environments requiring real-time and intelligent solutions [8,9,10]. Therefore, investigating motor fault diagnosis under small-sample conditions by combining image representation with deep learning classifiers is crucial for improving fault detection efficiency and enhancing the operational robustness of motor systems [11,12,13].
In recent years, significant advances in fault diagnosis have been achieved by integrating time-series-to-image transformations with deep neural networks. The recurrence plot (RP), Gramian angular field (GAF), and Markov transition field (MTF) are widely used to convert the original one-dimensional vibration signal into a two-dimensional representation that characterizes temporal patterns and structural correlations more richly. For instance, Xu developed a method that combines block 2D principal component analysis with a depthwise separable convolutional model, demonstrating enhanced performance in rolling bearing fault identification [14]. More recently, Zhao proposed a multi-source information fusion approach that integrates time-frequency image representations with one-dimensional convolutional neural network (CNN) features, achieving nearly 99% diagnostic accuracy and high generalization across datasets [15]. Similarly, Zhang discussed the application of deep learning in bridge health monitoring, emphasizing its adaptability and transferability in complex engineering systems [16]. Applications in other disciplines also suggest the potential of deep learning combined with spectral imaging for fast identification tasks, as demonstrated by Raman spectroscopy-based bacterial detection [17]. Despite these advances, data scarcity remains a major bottleneck in motor fault diagnosis. To address this challenge, transfer learning, generative adversarial networks (GANs), and their variants, such as the deep convolutional GAN (DCGAN) and the Wasserstein GAN (WGAN), have been extensively studied. With architectural refinements and spectral regularization strategies, these models further improve the reliability of classification in small-sample scenarios [18]. To mitigate the mode collapse and unstable sample quality that GANs suffer under extremely limited data conditions, a variety of enhanced architectures have been proposed. For instance, an enhanced GAN demonstrated improved performance for the imbalanced fault diagnosis of rotating machines [19]; the modified conditional generative adversarial network (MCGAN), based on memristive hyperchaotic sequences and a mixed-dimensional convolutional neural network (MCNN), was proposed to augment fault samples and reduce the imbalance rate [20]; and a residual signal-driven GAN integrating dual-stream attention was reported to extract the fault characteristics of induction motors [21]. Furthermore, hybrid approaches combining DCGANs with CNNs have been successfully applied to permanent magnet motor diagnosis under small-sample settings [22], and an improved DCGAN combined with the Gramian angular summation field (GASF) has been applied to arc fault diagnosis of photovoltaic arrays [23]. Although standard deep convolutional networks show strong feature extraction capabilities, they usually give equal weight to all extracted feature channels. In real industrial settings, the signal is often accompanied by strong background noise, and not all extracted frequency features have the same diagnostic value. Consequently, combining spectral constraints for high-quality data augmentation with architectures that capture both local details and global dependencies remains a critical direction for future research in motor fault diagnosis. This is the first motivation of this study.
With the development of neural networks, CNNs and their variants have been widely recognized as powerful tools for fault diagnosis [24]. Wen proposed a data-driven CNN-based framework that demonstrated superior feature extraction capability compared with traditional signal processing approaches, laying the groundwork for end-to-end diagnostic models [25]. Building upon CNNs, residual networks (ResNets) were introduced to overcome the vanishing gradient problem in deep structures, making the learning of hierarchical fault features more robust. For example, ResNet-based fault diagnosis strategies have been successfully applied to driving motor diagnostics, significantly enhancing recognition accuracy under limited data conditions [26]. An auxiliary classifier GAN that incorporates convolution and transformer modules extracts both global and local features and improves classification accuracy [27]. Similarly, dual-mode attention-based residual architectures have been developed to capture both the spatial and temporal dependencies of vibration signals, thereby achieving improved adaptability under variable operating conditions [28]. In addition, recent systematic reviews have highlighted the growing role of CNNs and ResNets as mainstream architectures for fault diagnosis, emphasizing their capacity to generalize across diverse machinery and fault scenarios [29,30]. Overall, CNN- and ResNet-based models provide strong foundations for designing diagnostic frameworks, but their effectiveness is still closely tied to data sufficiency and model generalization, which motivates their integration with generative and attention-based mechanisms in recent research. This gives us a clear research direction.
Although substantial progress has been achieved in fault diagnosis through time-series-to-image conversion and deep neural network classification, several key challenges remain unresolved. First, most image transformation techniques fail to incorporate spectral characteristics adequately, resulting in limited sensitivity to the frequency-domain information of fault signals. Second, in terms of model architecture, the integration of local features with long-range dependencies is still insufficiently explored, and more research is needed to design diagnostic networks with stronger representational capability. Based on the above discussion, we study RP-based spectrum-aware generative models. The RP retains the complete temporal dependence of the signal and reduces sensitivity to random noise; combined with a DCGAN, it is used to generate high-quality samples and balance the classes. Because the spatial layout of an RP image encodes time delays and topological symmetry rather than physical shape, spatial attention is easily misled, which calls for attention at the channel level. This need is the core motivation for choosing the squeeze-and-excitation (SE) mechanism. Unlike spatial attention, the SE mechanism explicitly models the dependencies between channels to achieve adaptive feature recalibration, thereby dynamically amplifying channels that contain key mechanical fault frequencies and suppressing noise-dominated channels. An improved DCGAN-ResNet-SE model for motor fault diagnosis is proposed and verified on both mechanical and electrical faults, such as motor bearing faults and stator inter-turn short circuit faults. The main contributions are as follows.
- (1) A spectrum-aware generative learning framework is proposed for intelligent motor fault diagnosis under small-sample conditions. By incorporating frequency-domain consistency constraints into the adversarial training process, the proposed method effectively improves the quality and reliability of generated samples, thereby alleviating the data scarcity problem in practical industrial systems.
- (2) A spectrum-aware filtering and feedback mechanism is developed to evaluate the similarity between generated samples and real signals in the frequency domain. The screening strategy selectively retains high-quality synthetic samples and reinjects them into the adversarial training loop, forming a closed-loop optimization process that enhances both the stability and diversity of the generated data.
- (3) An attention-enhanced deep classification network based on ResNet is designed to improve diagnostic performance. By integrating the SE mechanism with residual learning, the proposed model captures both local transient features and long-range dependencies in vibration signal representations, leading to more robust fault identification under limited data conditions.
The remainder of this paper is organized as follows:
Section 1 introduces the research background, motivation, and contributions;
Section 2 reviews the theoretical background;
Section 3 presents the proposed spectrum-aware DCGAN with dynamic threshold filtering in detail;
Section 4 reports the experimental setup and results; and
Section 5 concludes the paper and discusses future research directions.
2. Theoretical Background
2.1. Recurrence Plot
The recurrence plot (RP) [31] is a technique used to visualize the dynamics of time series data. By transforming a one-dimensional signal into a two-dimensional representation, the RP highlights temporal patterns and structural correlations. Given a signal sequence $\{x_1, x_2, \ldots, x_N\}$, the reconstructed state vectors with a time delay can be expressed as

$$X_i = \left(x_i, x_{i+\tau}, \ldots, x_{i+(m-1)\tau}\right),$$

where $i = 1, 2, \ldots, N-(m-1)\tau$, $\tau$ denotes the delay time, and $m$ is the embedding dimension. The optimal values of $\tau$ and $m$ are typically determined using the autocorrelation function method.
The distance between any two vectors in the reconstructed phase space is evaluated as

$$d_{ij} = \left\| X_i - X_j \right\|.$$

Based on these distances, the recurrence matrix of the phase space can be defined as

$$R_{ij} = \Theta\left(\varepsilon - d_{ij}\right),$$

where $\Theta(\cdot)$ is the Heaviside step function and $\varepsilon$ is a threshold for recurrence, commonly chosen as approximately 15% of the standard deviation of the original series.
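The construction above can be illustrated with a minimal NumPy sketch. It assumes a Euclidean norm for the distance and takes the threshold as a fixed fraction of the signal's standard deviation; the function name `recurrence_plot` and the parameters `m`, `tau`, and `eps_ratio` are illustrative choices, not names taken from this work.

```python
import numpy as np

def recurrence_plot(x, m=3, tau=1, eps_ratio=0.15):
    """Binary recurrence matrix of a 1-D signal via time-delay embedding."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * tau  # number of reconstructed state vectors
    if n_vec <= 0:
        raise ValueError("signal too short for the chosen m and tau")
    # Row i holds the delayed coordinates (x_i, x_{i+tau}, ..., x_{i+(m-1)tau}).
    states = np.stack([x[k * tau : k * tau + n_vec] for k in range(m)], axis=1)
    # Pairwise Euclidean distances between all state vectors.
    diff = states[:, None, :] - states[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Threshold as a fraction of the standard deviation of the original series.
    eps = eps_ratio * x.std()
    return (dist <= eps).astype(np.uint8)

# Example: a sinusoid produces the diagonal line structure typical of periodic signals.
t = np.linspace(0, 4 * np.pi, 200)
rp = recurrence_plot(np.sin(t), m=3, tau=5)
```

The resulting matrix is symmetric with ones on the main diagonal, since every state vector recurs with itself; it can be rendered directly as a grayscale image.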
2.2. Generative Adversarial Network
GANs [32] are a class of generative models with two core components: a generator $G$ and a discriminator $D$, trained simultaneously in an adversarial manner. The generator samples from a latent space and produces data resembling the distribution of real samples, while the discriminator determines whether the input comes from the true dataset or is created by $G$. During training, $G$ improves its ability to “fool” $D$, while $D$ enhances its ability to distinguish real from synthetic data.
Mathematically, the training objective of a GAN is a minimax optimization problem:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right],$$

where $x$ denotes real data from the distribution $p_{data}(x)$, and $p_z(z)$ is the prior noise distribution. The operator $\mathbb{E}$ denotes the expectation, and $G(z)$ represents synthetic samples produced by the generator.
A key advantage of GANs is their ability to learn complex data distributions without requiring explicit probability density modeling. However, the non-convex nature of the minimax game can lead to training instability or mode collapse. To address this, architectures such as deep convolutional GANs (DCGANs) and Wasserstein GANs (WGANs) have been proposed to improve sample quality and convergence stability.
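As a small numerical illustration of the minimax objective, the value function can be estimated from discriminator outputs on a batch of real and generated samples. The sketch below is only a Monte Carlo estimate under the standard GAN formulation; the function name `gan_value` and the example probabilities are hypothetical.

```python
import math

def gan_value(d_real, d_fake):
    """Estimate V(D, G) from discriminator outputs in (0, 1).

    d_real: D(x) for real samples; d_fake: D(G(z)) for generated samples.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot separate real from generated data outputs 0.5 everywhere.
v_confused = gan_value([0.5, 0.5], [0.5, 0.5])
# A confident and correct discriminator drives the value towards its maximum of 0.
v_sharp = gan_value([0.99, 0.98], [0.01, 0.02])
```

The discriminator tries to increase this value while the generator tries to decrease it; when the discriminator outputs 0.5 for every input, the estimate equals $-\log 4$.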
2.3. Deep Convolutional GAN
In the conventional GAN framework, both the generator and discriminator are typically implemented using fully connected layers. However, such architectures are prone to excessive parameterization when handling high-dimensional image data, leading to inefficient training and difficulty in preserving spatial structures in the generated samples. To overcome these drawbacks, Radford et al. introduced the Deep Convolutional GAN (DCGAN) [33], which incorporates convolutional neural network architectures to enhance both spatial consistency in generation and discriminative capability in classification.
In the generator, DCGAN employs transposed convolution layers to progressively map the low-dimensional latent vector $z$ into high-dimensional images. Formally, the mapping process can be expressed as

$$h^{(l)} = \phi\left(\mathrm{ConvT}\left(h^{(l-1)}; W^{(l)}\right) + b^{(l)}\right), \quad h^{(0)} = z,$$

where $\mathrm{ConvT}(\cdot)$ denotes the transposed convolution operation, $W^{(l)}$ and $b^{(l)}$ are the generator parameters, and $\phi$ is generally a ReLU or Leaky ReLU activation. This hierarchical expansion allows the model to increase the resolution of feature maps step by step, producing images with meaningful local spatial structures.
For the discriminator, DCGAN leverages conventional convolution and down-sampling layers to compress the input image progressively, extracting features that are discriminative between real and generated samples. Its discriminative function can be written as

$$D(x) = \sigma\left(f_D(x)\right),$$

where $f_D(x)$ denotes the output of the convolutional feature extraction layers and $\sigma$ represents the sigmoid activation function, outputting the probability that the input image belongs to the real data distribution.
Through these architectural innovations, DCGAN demonstrates significantly more stable convergence than the original GAN and generates images with superior visual quality and structural fidelity. These advantages not only enhance training stability and sample realism but also establish DCGAN as a widely adopted data augmentation technique for addressing small sample challenges in fault diagnosis of rotating machinery.
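The step-by-step increase in resolution can be made concrete with the standard output-size relation for a transposed convolution (no output padding, unit dilation). The layer settings below (kernel 4, stride 2, padding 1, starting from a 4 × 4 feature map) are a common DCGAN configuration assumed here for illustration only; they are not stated in this work.

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Spatial output size of a transposed convolution (no output padding, dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_resolutions(start=4, layers=4, kernel=4, stride=2, padding=1):
    """Feature-map side length after each transposed convolution layer."""
    sizes = [start]
    for _ in range(layers):
        sizes.append(conv_transpose_out(sizes[-1], kernel, stride, padding))
    return sizes

# With the assumed settings, every layer doubles the resolution.
sizes = generator_resolutions()
```

Under these assumptions, four layers take a 4 × 4 map to 64 × 64, which matches the image size used as classifier input later in this paper.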
2.4. Convolutional Neural Network
CNNs are a category of deep learning models characterized by local connectivity, shared parameters, and hierarchical feature extraction capabilities. Compared with traditional fully connected neural networks, CNNs can significantly reduce the number of parameters when processing images, audio, or time-series signals, while more effectively capturing local spatial correlations. In the context of mechanical fault diagnosis, vibration signals transformed into images typically exhibit two-dimensional textures with temporal patterns. CNNs are capable of progressively extracting multi-level features from these representations, ranging from low-level edge textures to high-level semantic patterns. The primary components of a CNN include convolutional layers, activation layers, pooling layers, and fully connected layers.
Although CNNs excel in feature extraction, increasing network depth can lead to issues such as vanishing gradients or performance degradation, where the model’s accuracy may actually drop beyond a certain number of layers. To address this problem, He et al. proposed the Residual Network (ResNet) [34].
The key innovation of ResNet is residual learning. By introducing identity-based skip connections, ResNet allows a deep network to directly learn the residual mapping between the input and output:

$$y = F(x) + x,$$

where $F(x)$ represents a nonlinear transformation composed of convolutional layers and activation functions, $x$ is the input feature, and $y$ is the output feature.
This design ensures that, even if $F(x)$ fails to learn an effective mapping, the network can still preserve the identity mapping, thus avoiding degradation.
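The identity-preserving property of the skip connection can be checked with a minimal NumPy sketch. For brevity, the learned transformation is modeled with two dense layers and a ReLU rather than convolutions; `residual_block`, `w1`, and `w2` are illustrative names, not part of the proposed model.

```python
import numpy as np

def residual_block(x, w1, w2):
    """Output = F(x) + x, with F(x) = w2 @ relu(w1 @ x) standing in for the conv layers."""
    hidden = np.maximum(w1 @ x, 0.0)  # ReLU activation
    return w2 @ hidden + x            # skip connection adds the input back

x = np.array([1.0, -2.0, 3.0])

# If the learned transformation contributes nothing, the block reduces to the identity.
zeros = np.zeros((3, 3))
y_identity = residual_block(x, zeros, zeros)

# With non-zero weights, the block learns a correction on top of the input.
y_shifted = residual_block(x, np.eye(3), 0.5 * np.eye(3))
```

Because the input is always added back, stacking such blocks cannot make the representation worse than passing the input through unchanged, which is the intuition behind avoiding degradation in deep networks.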
2.5. SE Mechanism
In deep convolutional neural networks, the contribution of each extracted feature channel to the final classification result varies. Traditional convolutional mappings treat all channels with equal weight, which limits the discriminative power of the model in fault diagnosis scenarios characterized by significant background noise. To address this, the SE (squeeze-and-excitation) mechanism [35] is introduced in this study to model the inter-dependencies between channels and achieve adaptive feature recalibration. The SE module primarily consists of two key processing units: squeeze and excitation.
Squeeze Unit: This unit aims to compress spatial information into channel descriptors. Given an input representation $U \in \mathbb{R}^{H \times W \times C}$, a global average pooling mapping is employed to compress the $H \times W$ spatial features of each channel into a single global descriptive value $z_c$:

$$z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j),$$

where $u_c$ denotes the feature mapping of the $c$-th channel.
This operation enables the model to obtain a global receptive field and capture the global distribution of features across all channels.
Excitation Unit: This unit captures non-linear dependencies between channels through two fully connected layers. To reduce the parameter count, a dimensionality reduction weight matrix $W_1 \in \mathbb{R}^{\frac{C}{r} \times C}$ is first applied, followed by a dimensionality recovery weight matrix $W_2 \in \mathbb{R}^{C \times \frac{C}{r}}$ to return to the original channel dimension:

$$s = \sigma\left(W_2\, \delta\left(W_1 z\right)\right),$$

where $r$ is the reduction ratio, $\delta$ denotes the ReLU activation, and $\sigma$ represents the Sigmoid activation function, which maps the output to a weight coefficient $s_c$ ranging from 0 to 1.
Feature Recalibration: Finally, the generated weight coefficients are applied to the corresponding channels of the original feature map $U$ to perform feature reconstruction:

$$\tilde{x}_c = s_c \cdot u_c.$$
This channel-wise attention mechanism allows the model to autonomously learn the importance of different fault feature channels. In the context of fault diagnosis, the integration of the SE mechanism enables the network to reinforce feature channels sensitive to fault types (such as periodic textures in recurrence plots) while suppressing invalid channels affected by environmental noise. Consequently, this significantly improves the recognition accuracy and robustness of the model under complex operating conditions.
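The squeeze, excitation, and recalibration steps described above can be sketched in a few lines of NumPy. The sketch follows the standard SE formulation with a ReLU between the two fully connected layers and omits bias terms; the reduction ratio `r = 4`, the random weights, and the function name `se_block` are assumptions made purely for illustration.

```python
import numpy as np

def se_block(u, w1, w2):
    """Channel recalibration of a feature map.

    u:  feature map of shape (H, W, C)
    w1: reduction weights of shape (C // r, C)
    w2: recovery weights of shape (C, C // r)
    """
    z = u.mean(axis=(0, 1))                    # squeeze: global average pooling, shape (C,)
    hidden = np.maximum(w1 @ z, 0.0)           # excitation, first layer with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # excitation, second layer with sigmoid
    return u * s, s                            # recalibration: scale every channel by its weight

rng = np.random.default_rng(0)
H, W, C, r = 8, 8, 16, 4
u = rng.standard_normal((H, W, C))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
out, weights = se_block(u, w1, w2)
```

Each channel of `out` is the corresponding input channel multiplied by a single scalar in (0, 1), so the spatial pattern inside a channel is left untouched and only its relative importance changes.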
3. Improved DCGAN-ResNet-SE Model for Motor Fault Diagnosis
3.1. Improved DCGAN Algorithm Based on Spectrum-Aware Filtering
To further improve the quality of synthetic samples and ensure their physical plausibility in the frequency domain, a spectrum-aware filtering module is integrated into the training loop. The design of this module is rooted in the physical nature of rotating machinery faults, and the similarity measure is chosen accordingly. Traditional signal similarity evaluation often relies on the L1 (Manhattan) or L2 (Euclidean) distance. However, these two distance measures are highly sensitive to signal phase offset and random ambient noise. Even in the frequency domain, load changes across operating conditions cause the absolute spectral amplitude induced by the same physical fault to fluctuate strongly, so the L1/L2 distance fails to evaluate sample fidelity reliably. In contrast, the Pearson correlation coefficient used in this module measures the linear correlation between two vectors. Combined with the fast Fourier transform (FFT), it is largely insensitive to the overall amplitude scaling and phase differences caused by changes in working conditions, and it focuses on the consistency of the spectral contour between generated and real samples. This screening mechanism, based on spectral contour similarity, encourages the generator to learn the frequency distribution of faults, so that the retained synthetic samples reproduce the resonance band structure of the corresponding faults and provide physically interpretable, high-quality data for the subsequent classification model.
Specifically, every 10 epochs, the generator produces 100 candidate images, which are compared with reference samples from the minority class. Each candidate is converted into grayscale and transformed via a two-dimensional FFT. The magnitude spectrum is then logarithmically scaled, and the Pearson correlation coefficient is computed against the spectra of reference samples. For each candidate, the maximum correlation score is taken as its similarity measure. Samples with similarity exceeding a dynamic threshold are retained, while others are discarded. To prevent overly strict filtering, a minimum-keep rule is applied: at least Kmin = 5 candidates are preserved by selecting the top-ranked samples. All retained images are stored and iteratively fed back into the training process, providing high-quality synthetic data that enhances both the diversity and reliability of the classifier under limited-sample conditions. The procedure is as follows.
Candidate Sample Generation: At each screening step, the generator G produces a batch of candidate image samples.
Spectral Mapping and Similarity Computation: Each candidate sample is transformed via an FFT to obtain its spectral representation, which is compared with the spectrum of the real reference sample to define the normalized spectrum difference.
Dynamic Threshold Filtering: A gradually tightening spectral similarity threshold is set, whose value decreases with increasing training epoch t.
where
represents the Sigmoid activation function, which maps the output to a weight coefficient
ranging from 0 to 1.
Feedback and Closed-Loop Optimization: The qualified samples are not only used to expand the training set of the classifier but are also fed into the next round of adversarial training, so that the discriminator is optimized on the joint distribution of real samples and high-quality generated samples and the generator gradually converges towards spectral consistency. The new adversarial objective can be expressed as.
where
$p_{spec}(x)$ represents the distribution of high-quality generated samples after spectral screening. The complete procedure is summarized in Algorithm 1.
| Algorithm 1. Adversarial Training with Spectrum-Aware Filtering |
Require: Training data X, reference set Xr, noise distribution N(0, I)
Ensure: Best generator G*
1: Initialize generator G, discriminator D, optimizers;
2: for epoch t = 1 to T do
3:   for each mini-batch of size B do
4:     Sample noise Z ~ N(0, I);
5:     Generate fake samples X^ = G(Z);
6:     Update discriminator D with real X and fake X^;
7:     Update generator G using discriminator feedback;
8:   end for
9:   if t mod fs = 0 then
10:    Sample candidate noise Zc;
11:    Generate candidate set X^c = G(Zc);
12:    Compute FFT spectra and correlation with reference set Xr;
13:    Select samples with similarity s(x^) ≥ τt;
14:    if number of selected samples < kmin then
15:      Keep top-kmin most similar candidates;
16:    end if
17:    Save selected samples and metadata;
18:  end if
19: end for
20: return best generator G*
|
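The screening step of Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: it assumes that the images are already grayscale arrays, uses `log1p` for the logarithmic scaling to avoid taking the logarithm of zero, and takes the threshold as a plain argument instead of a schedule. The names `log_spectrum`, `pearson`, and `spectrum_filter` are hypothetical.

```python
import numpy as np

def log_spectrum(img):
    """Flattened log-magnitude spectrum of a grayscale image (2-D FFT)."""
    return np.log1p(np.abs(np.fft.fft2(img))).ravel()

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D vectors."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def spectrum_filter(candidates, references, threshold, k_min=5):
    """Return indices of retained candidates and all similarity scores."""
    ref_specs = [log_spectrum(r) for r in references]
    # Similarity of a candidate = best correlation over all reference spectra.
    scores = np.array([
        max(pearson(log_spectrum(c), rs) for rs in ref_specs)
        for c in candidates
    ])
    keep = np.flatnonzero(scores >= threshold)
    if len(keep) < k_min:
        # Minimum-keep rule: fall back to the k_min highest-scoring candidates.
        keep = np.argsort(scores)[::-1][:k_min]
    return keep, scores

# Toy data: one candidate is a copy of a reference, the rest are unrelated noise images.
rng = np.random.default_rng(0)
refs = [rng.random((64, 64)) for _ in range(3)]
cands = [refs[0].copy()] + [rng.random((64, 64)) for _ in range(9)]
keep, scores = spectrum_filter(cands, refs, threshold=0.9, k_min=5)
```

In this toy example only the copied candidate passes the threshold, so the minimum-keep rule takes over and returns the five best-scoring candidates, with the copy ranked first. Because the Pearson coefficient removes the mean and normalizes the scale of both spectra, a candidate whose log-spectrum differs from a reference only by a constant offset would still receive a high score.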
3.2. Classification Model Based on ResNet-SE
The ResNet-SE classification model constructed in this study builds upon a conventional CNN by integrating residual connections, feature concatenation, and the SE mechanism to achieve high-precision classification of fault signals. In practical diagnosis, vibration signals are often accompanied by strong environmental noise. The model compresses and extracts key features through the SE mechanism: the input features are compressed by global average pooling, which transforms the spatial information into a global channel descriptor. This compression enables the network to down-weight background noise and assign higher weights to the feature channels containing periodic fault pulses, thus linking the channel recalibration to physically meaningful fault signatures. The network architecture primarily consists of an input layer, dual-branch feature extraction paths, a feature fusion module, convolutional and pooling layers, fully connected layers, and an attention mechanism, with a final Softmax layer for classification. The overall structure is illustrated in Figure 1.
Input Layer: Accepts color images of size 64 × 64 × 3, which represent the visualized vibration signals. This layer provides the raw information necessary for subsequent feature extraction.
Main Path: Uses a 4 × 4 convolutional layer to perform initial feature extraction on the input images, producing four output channels. The convolution is followed by batch normalization (BN) and ReLU activation to accelerate training convergence and enhance nonlinear feature representation. A 3 × 3 max-pooling layer then performs spatial downsampling, reducing the feature dimension and improving feature compactness while retaining the significant information.
Residual Path: To preserve essential input information and facilitate gradient propagation, a parallel residual branch is established. This branch employs a 1 × 1 convolution to adjust the channel dimensions, ensuring compatibility with the main path output. The residual features are later fused with the main path features during the feature concatenation stage, enriching feature representation and mitigating gradient vanishing issues.
Feature Concatenation Layer: Outputs from the main and residual paths are merged to form dual-channel features. The fused features then pass through multiple layers of convolution, batch normalization, ReLU activation, and pooling. The second convolutional layer outputs 16 channels, and the third convolutional layer outputs 32 channels. This hierarchical convolution and normalization process enables the model to progressively extract multi-scale fault features, from shallow texture to deep semantic patterns.
Fully Connected and SE Layer: After convolutional feature extraction, a flattening layer is employed to reshape the multi-dimensional feature maps, ensuring that the data format is compatible with the subsequent fully connected mapping and processing units. A 24-dimensional fully connected layer is then used for feature compression and combination. Subsequently, the SE mechanism is applied to perform an adaptive recalibration of the feature channels, enhancing critical fault-related features while suppressing redundant information. This process results in a more precise and discriminative feature representation for fault diagnosis.
Output Layer: The final fully connected layer outputs four feature dimensions, which are normalized by a Softmax layer to form a probability distribution across fault classes. The Softmax ensures that the sum of probabilities equals one, and the class with the highest probability is taken as the predicted fault type.
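The output stage can be illustrated with a short NumPy sketch of the Softmax normalization and the final class decision. The four logits below are arbitrary example values, not outputs of the proposed network.

```python
import numpy as np

def softmax(logits):
    """Numerically stable Softmax over a 1-D vector of class scores."""
    shifted = logits - np.max(logits)  # subtracting the maximum avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical scores from the final fully connected layer for four fault classes.
logits = np.array([2.0, 0.5, -1.0, 0.1])
probs = softmax(logits)
predicted = int(np.argmax(probs))      # index of the most probable fault class
```

The probabilities sum to one, and the predicted class is the one with the largest score, since Softmax preserves the ordering of the logits.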
5. Conclusions
To address the challenge of insufficient training samples in motor fault diagnosis, this work proposes a DCGAN-based method with a spectrum-aware screening mechanism. Taking vibration signals as the research object and integrating image constraints with high spectral similarity into the generative network, the proposed approach improves both the distributional consistency and the discriminative power of synthetic samples, thereby ensuring higher-quality signal generation. For classification, a ResNet enhanced with an SE module is designed to capture both local details and global temporal dependencies, leading to more robust feature representations. Extensive experiments were conducted on the CWRU fault datasets, the stator inter-turn fault dataset, and the HUST fault datasets. The results demonstrate the effectiveness of the method under limited data conditions. For example, on the HUST motor fault dataset, the diagnostic accuracy increased by 2% after adding the generated samples. For the stator inter-turn short circuit fault dataset, the diagnostic accuracy was 95.11% without generated samples and increased to 99.44% when 40 generated samples per class were used. On the CWRU dataset, the model achieved nearly 100% accuracy, supporting the generalization capability of the approach. Moreover, the key novelty of this work lies in the spectrum-aware closed-loop design, which distinguishes our approach from conventional loss-regularized GANs. By screening frequency-consistent samples and reinjecting them into training, we ensure both stability and diversity in small-sample augmentation. In addition, the integration of a ResNet-SE classifier allows for the simultaneous capture of local and global features, achieving superior accuracy under highly constrained conditions. This dual-level innovation, at both the data generation and classification stages, provides a systematic solution to small-sample motor fault diagnosis, with strong potential for extension to other rotating machinery.
Future research will focus on addressing the limitations of laboratory-based data and further enhancing the practicality and adaptability of the proposed method. First, we will study fault diagnosis methods under variable operating conditions and strong noise environments, aiming to narrow the gap between laboratory research and real-world applications. Second, we will optimize the generative network to adapt to different types of motor faults and motor models (e.g., high-voltage industrial motors and permanent magnet synchronous motors), improving the generalization ability of synthetic samples across diverse fault categories and motor configurations. Finally, we will investigate the combination of the spectrum-aware GAN framework with transfer learning or federated learning to address the problem of insufficient labeled real-world fault data, further improving the diagnostic performance and practical value of the method in small-sample industrial scenarios.