Article

Rolling Element Bearing Fault Diagnosis Based on Adversarial Autoencoder Network

1 College of Mechanical and Electrical Engineering, Kunming University, Kunming 650214, China
2 School of Mechanical and Electrical Engineering, Kunming University of Science and Technology, Kunming 650500, China
* Author to whom correspondence should be addressed.
Processes 2026, 14(2), 245; https://doi.org/10.3390/pr14020245
Submission received: 11 December 2025 / Revised: 5 January 2026 / Accepted: 8 January 2026 / Published: 10 January 2026

Abstract

Rolling bearing fault diagnosis is critical for the reliable operation of rotating machinery. However, many existing deep learning-based methods rely on complex signal preprocessing and lack interpretability. This paper proposes an adversarial autoencoder (AAE)-based framework that integrates adaptive, data-driven signal decomposition directly into a neural network. A convolutional autoencoder is employed to extract latent representations while preserving temporal resolution, enabling encoder channels to be interpreted as nonlinear signal components. A channel attention mechanism adaptively reweights these components, and a classifier acts as a discriminator to enhance class separability. The model is trained in an end-to-end manner by jointly optimizing reconstruction and classification objectives. Experiments on three benchmark datasets demonstrate that the proposed method achieves high diagnostic accuracy (99.64 ± 0.29%) without additional signal preprocessing and outperforms several representative deep learning-based methods. Moreover, the learned representations exhibit interpretable characteristics analogous to classical envelope demodulation, confirming the effectiveness and interpretability of the proposed approach.

1. Introduction

Rolling bearings, often referred to as the joints of industry, are widely used in various industrial scenarios such as aerospace, wind power generation, and manufacturing, serving to reduce friction and support loads. The health condition of rolling bearings is critical to the safe operation of mechanical equipment [1]. In operation, rolling bearings often run under complex working conditions and are subjected to alternating loads, making them among the components most prone to failure [2]. Therefore, research on fault diagnosis technology for rolling bearings is of great significance.
Traditional fault diagnosis methods typically extract fault characteristics using signal processing techniques and then compare them with fault characteristic frequencies derived from fault mechanism analysis. Li et al. [3] systematically investigated several key factors influencing the performance of Singular Value Decomposition (SVD) and proposed a Correlated SVD (C-SVD) algorithm, which proved effective in bearing fault diagnosis. Liu et al. [4] proposed a sparse time-frequency analysis method, the Time-reassigned Multi-synchrosqueezing S-Transform (TMSSST), which achieves high-resolution time-frequency representation and robust bearing fault diagnosis, outperforming traditional methods in transient feature extraction. Ren et al. [5] investigated the mathematical properties of Time-Synchronous Averaging (TSA) and demonstrated its advantages in quasi-periodic signal processing through its successful application to rolling bearing signal analysis. However, these methods require substantial domain knowledge, extensive diagnostic experience, and a high level of user expertise.
With the development of computer technology, researchers have introduced machine learning algorithms into the field of fault diagnosis. Cao et al. [6] extracted frequency-domain and time-domain features from bearing vibration signals to construct feature vectors, applied Principal Component Analysis (PCA) for dimensionality reduction, and finally fed the reduced features into a Random Forest (RF) model for fault diagnosis. Zhou et al. [7] employed Variational Mode Decomposition (VMD) to extract bearing fault features, which were then fed into a Support Vector Machine (SVM) for fault classification. Both the feature extraction and classification processes were optimized using the Whale Gray Wolf Optimization Algorithm (WGWOA). Li et al. [8] proposed a method based on wavelet packet decomposition to extract fault features, which were subsequently fed into an SVM optimized by an improved sparrow search algorithm for rolling bearing fault diagnosis. In these methods, machine learning algorithms act merely as classifiers and do not fully exploit the signal information. As a result, the effectiveness of these approaches largely depends on the quality of the extracted features.
In the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) of 2012, a deep learning-based model, AlexNet, achieved a breakthrough victory [9], surpassing traditional image recognition methods and igniting a surge of interest in deep learning. Since then, deep learning has taken center stage in subsequent competitions. Over the following years, deep learning algorithms have demonstrated remarkable performance in domains such as image processing and speech recognition, motivating researchers to explore their potential in fault diagnosis. For example, Huang et al. [10] proposed a multi-scale parallel convolutional neural network integrated with a channel attention mechanism, enabling effective rolling bearing fault diagnosis under noisy conditions and varying working speeds. Zhao et al. [11] applied Continuous Wavelet Transform (CWT) to convert raw vibration signals into images, which were subsequently fed into a Convolutional Deep Belief Network (CDBN) with Gaussian distribution for fault diagnosis. Duan et al. [12] employed Markov Transition Fields (MTF) to transform time-domain signals into two-dimensional feature maps, which were then input into a Multidimensional Supervised Module Convolutional Neural Network (MSMCNN) for fault diagnosis. Guo et al. [13] used Gramian Angular Field (GAF) encoding to transform time-domain signals into images, which were then fed into an improved ConvNeXt network for fault diagnosis.
Despite recent advances, existing fault diagnosis models face two major limitations: they either rely heavily on complex signal preprocessing procedures for performance improvement or lack interpretability due to opaque internal decision mechanisms. In particular, conventional signal decomposition algorithms, which are widely used as preprocessing tools, are separated from the downstream classifier and optimized independently, often leading to suboptimal diagnostic performance.
Recent work in explainable AI (XAI) and physics-informed deep learning (e.g., refs. [14,15,16,17]) has aimed to enhance model transparency or embed domain knowledge into fault diagnosis. However, XAI methods typically extract post hoc features associated with classification, and physics-informed networks require manually designed physics-based components, relying heavily on expert knowledge. In contrast, our framework aims to make the learning process itself interpretable by embedding an adaptive, data-driven signal decomposition mechanism into the network and jointly optimizing reconstruction and classification objectives.
Interestingly, many traditional signal decomposition algorithms share a formal similarity with autoencoders in their transform–inverse transform structure. Signal decomposition can be understood as breaking a complex signal into several basic constituent patterns, while different channels in a convolutional neural network (CNN) act as filters that extract distinct components from the input signal. In this context, the convolution operation in CNNs can be interpreted as waveform matching using convolution kernels. Accordingly, training a CNN corresponds to adaptively learning a set of basis-like functions that are most responsive to fault-related waveforms, which aligns conceptually with classical basis-function signal decomposition methods based on inner products [18,19]. From this viewpoint, a convolutional autoencoder can be regarded as a data-driven signal decomposition framework in which encoder channels capture latent nonlinear components and the decoder performs a corresponding inverse transformation to reconstruct the original signal. Motivated by this insight, this paper proposes a novel Adversarial Autoencoder (AAE) framework that integrates adaptive signal decomposition directly into the neural network. Specifically, a convolutional autoencoder is designed as the feature extraction module, where each encoder channel can be interpreted as a nonlinear signal component and the decoder acts as the inverse transformation to ensure reconstruction fidelity. A channel attention mechanism is further introduced to adaptively reweight feature components, while a classification network functions as a discriminator to enforce class separability in the latent representation. In this way, the feature decomposition process is directly optimized toward fault discrimination, achieving both adaptivity and interpretability within an end-to-end pipeline. Validation on three benchmark datasets demonstrates the effectiveness and robustness of the proposed method.

2. Theoretical Background

2.1. Convolutional Neural Network

CNNs are currently the mainstream framework of deep learning algorithms due to their excellent feature extraction capabilities, and they are also widely used in the field of fault diagnosis. The convolutional layer plays a central role in CNNs. Figure 1 illustrates the 1D convolution operation, where ∗ denotes the convolution operation, and its computational formula is given as follows:
z[i] = Σ_{m=0}^{K−1} x[i·S + m − P] · w[m] + b,  i = 0, 1, …, L_out − 1
where z[i] denotes the output of the convolutional layer, x[i] represents the input (with x[i] = 0 when the index is out of range), w[m] is the convolution kernel, K is the kernel size, S is the stride, P is the padding, b is the bias, and L_out is the output sequence length of the convolutional layer.
From the formula, it can be seen that the convolution operation essentially performs an inner product between the convolution kernel and local segments of the input sequence, which is consistent with signal decomposition methods based on inner-product principles. Therefore, this paper employs convolutional neural networks as the backbone for feature extraction.
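For illustration, the convolution formula above can be implemented directly. The following NumPy sketch (not the authors' code) computes each output z[i] as the inner product of the kernel with a zero-padded local segment of the input:

```python
import numpy as np

def conv1d(x, w, b=0.0, stride=1, padding=0):
    """Direct implementation of the 1D convolution formula above:
    z[i] = sum_{m=0}^{K-1} x[i*S + m - P] * w[m] + b,
    with x[.] treated as 0 when the index falls outside the signal."""
    K, N = len(w), len(x)
    L_out = (N + 2 * padding - K) // stride + 1
    z = np.zeros(L_out)
    for i in range(L_out):
        for m in range(K):
            j = i * stride + m - padding
            if 0 <= j < N:            # zero-padding outside the signal
                z[i] += x[j] * w[m]
        z[i] += b
    return z

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0])
print(conv1d(x, w))  # [3. 5. 7.] -- each output is an inner product with a local segment
```

Each output value is thus a similarity score between the kernel waveform and one segment of the signal, which is the inner-product view referred to in the text.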

2.2. Residual Networks

As the depth of neural networks increases, they encounter the problem of vanishing and exploding gradients, which makes training deep models difficult. Blindly increasing the number of layers may even degrade network performance. To address this issue, He et al. [20] proposed ResNet in 2016, where shortcut connections are introduced. These connections allow gradients to be propagated without attenuation during backpropagation, effectively alleviating the vanishing and exploding gradient problems. Based on this idea, the proposed network in this paper is constructed using Residual Blocks (RBs), which not only facilitate convergence but also make it easy to add or remove layers when the model suffers from underfitting or overfitting, thereby enabling further improvements. The network used in this study is built upon the RB structure shown in Figure 2.
Each Residual Block consists of three convolutional modules, each comprising a one-dimensional convolutional layer (1D_Conv), a Batch Normalization (BN) layer, and an activation function. The kernel sizes of the three convolutional modules are 1, 3, and 1, respectively, with paddings of 0, 1, and 0 and convolution strides of 1 or 2. The activation function used is ReLU, due to its simplicity, relatively low computational cost, and effectiveness in alleviating the vanishing gradient problem. In addition, the convolutional layer with a kernel size of 1 facilitates adjustment of the number of channels when the input and output dimensions differ, while also enhancing the model’s nonlinearity.
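The 1-3-1 Residual Block described above can be sketched in PyTorch as follows. The stride placement, the shortcut design, and the position of the final activation are assumptions for illustration, not the authors' published code:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of a 1-3-1 Residual Block: three conv modules with kernel
    sizes 1, 3, 1 and paddings 0, 1, 0, each followed by BatchNorm; the
    stride is applied in the middle convolution (an assumption)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(in_ch, out_ch, kernel_size=1, padding=0),
            nn.BatchNorm1d(out_ch), nn.ReLU(),
            nn.Conv1d(out_ch, out_ch, kernel_size=3, stride=stride, padding=1),
            nn.BatchNorm1d(out_ch), nn.ReLU(),
            nn.Conv1d(out_ch, out_ch, kernel_size=1, padding=0),
            nn.BatchNorm1d(out_ch),
        )
        # 1x1 shortcut adjusts channels/length when input and output dims differ
        self.shortcut = (nn.Identity() if in_ch == out_ch and stride == 1
                         else nn.Conv1d(in_ch, out_ch, kernel_size=1, stride=stride))
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

x = torch.randn(8, 16, 1024)                      # (batch, channels, length)
print(ResidualBlock(16, 32, stride=2)(x).shape)   # torch.Size([8, 32, 512])
```

With stride 2 the block halves the sequence length while the 1×1 shortcut keeps the residual sum dimensionally consistent.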

2.3. Autoencoder

The autoencoder is an unsupervised learning model that consists of an encoder and a decoder, as shown in Figure 3.
Autoencoder- and VAE-based methods have been explored to learn representations directly from raw vibration signals. However, conventional autoencoders generally compress the high-dimensional input signal x into a low-dimensional latent variable h optimized independently for reconstruction or classification, which limits interpretability and prevents decomposition into meaningful signal components. In this process, the decoder attempts to reconstruct the original signal from the latent variable, forcing the network to learn the most representative features of the input and making the autoencoder an effective feature extractor. When the autoencoder is implemented using convolutional neural networks (CNNs) and each latent channel is interpreted as a distinct component, the architecture becomes formally analogous to a signal decomposition process, with each channel capturing a latent component and the decoder performing the corresponding inverse transformation to reconstruct the original signal.
For the encoder:
T: x → h
For the decoder:
T⁻¹: h → x
Here, T represents a functional transformation and T⁻¹ its inverse; x ∈ R^N, where N is the length of the signal; h ∈ R^{M×L}, where M is the number of decomposed signals, corresponding to the number of output channels of the encoder, and L is the length of each decomposed signal.
This formulation reveals a key insight: a convolutional autoencoder is mathematically analogous to traditional signal decomposition algorithms. Building on this equivalence, we replace conventional handcrafted decomposition procedures with a convolutional autoencoder, thereby embedding an adaptive functional decomposition transformation directly within the neural network. Unlike conventional autoencoders that primarily aim to downscale the input into a compact latent vector, the proposed encoder–decoder architecture performs an upscaling transformation, projecting the original signal into a higher-dimensional representation in which each latent channel preserves the same temporal length as the input signal.
In this expanded feature space, the energy of the signal is redistributed across channels, enabling fault-related components that are inseparable in the original signal domain to become distinguishable. Furthermore, by jointly optimizing reconstruction and classification objectives in an end-to-end manner, the classifier can be interpreted as a metric function that explicitly constrains the encoded representations to remain separable across different fault categories. If the encoder output achieves high classification accuracy under this metric, the resulting transformation can be regarded as a reversible, discriminative signal decomposition that simultaneously ensures interpretability and fault separability. This framework eliminates the need for explicit handcrafted signal decomposition steps, simplifies the diagnostic pipeline, and improves robustness by learning the decomposition process directly from data within a unified optimization framework.
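The length-preserving, upscaling encoder–decoder idea can be sketched as below. The layer counts and channel numbers here are illustrative only and do not reproduce the paper's Table 1; the key property is that no pooling or striding is used, so every latent channel keeps the input's temporal length:

```python
import torch
import torch.nn as nn

# Toy sketch of the "decomposition" autoencoder: the encoder upscales a
# 1-channel signal of length N into M latent channels of the SAME length N
# (no pooling/striding), and the decoder maps them back to one channel.
M = 8
encoder = nn.Sequential(
    nn.Conv1d(1, M, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(M, M, kernel_size=3, padding=1),
)
decoder = nn.Sequential(
    nn.Conv1d(M, M, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(M, 1, kernel_size=3, padding=1),
)

x = torch.randn(4, 1, 1024)
h = encoder(x)                       # latent "components": (4, M, 1024)
x_rec = decoder(h)                   # inverse transform:   (4, 1, 1024)
recon_loss = nn.functional.mse_loss(x_rec, x)
print(h.shape, x_rec.shape)
```

Because h has shape (M, N) per sample, each of the M channels can be read as one decomposed component of the input, matching the T / T⁻¹ formulation above.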

3. Adversarial Autoencoder Model

3.1. Structure and Parameters

The network used in this paper is constructed with Residual Blocks and consists of an encoder, a decoder, and a classifier, as shown in Figure 4.
The encoder is built with seven Residual Blocks, and the decoder is designed to be symmetrical to the encoder. Following the principle of an autoencoder, the number of channels in the network is designed to decrease from large to small. The network parameters for the encoder, decoder and classifier are shown in Table 1.
In this paper, pooling layers and down-sampling layers are omitted in the autoencoder, since such operations are irreversible and hinder signal reconstruction. As a result, the output length of the latent variable layer remains consistent with the length of the input layer. Each channel of the latent variable layer is treated as a component, making the encoder analogous to a functional decomposition transformation, while the decoder serves as its inverse transformation. Subsequently, the output of the latent variable layer is connected to the classifier for classification. The AvgPool denotes the average pooling layer with a pooling kernel size of 4, a stride of 2, and a padding of 1. AdaptiveAvgPool refers to the adaptive average pooling layer in PyTorch, which automatically adjusts pooling parameters according to the target output length; in this case, the output length is set to 1, equivalent to computing the average value of each channel.
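The pooling behavior of the classifier described above can be checked numerically. This small sketch only demonstrates the stated layer parameters (kernel 4, stride 2, padding 1, and adaptive pooling to length 1); it is not taken from the authors' implementation:

```python
import torch
import torch.nn as nn

# AvgPool1d with kernel 4, stride 2, padding 1 halves the sequence length:
# L_out = floor((L + 2*1 - 4)/2) + 1 = L/2 for even L.
x = torch.arange(8, dtype=torch.float32).reshape(1, 1, 8)  # one channel, length 8

avg = nn.AvgPool1d(kernel_size=4, stride=2, padding=1)
print(avg(x).shape)        # torch.Size([1, 1, 4])

# AdaptiveAvgPool1d(1) reduces each channel to a single value, its mean.
gap = nn.AdaptiveAvgPool1d(1)
print(gap(x))              # the per-channel average of 0..7, i.e. 3.5
```

This confirms that AdaptiveAvgPool with output length 1 is equivalent to taking the average value of each channel, as stated in the text.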

3.2. Channel Attention Layer

The encoder’s different components capture distinct fault-related characteristics. To reduce the interference of irrelevant features on the classifier, an attention mechanism [21] is introduced, and a channel attention layer is incorporated into the classifier. This layer enhances the network’s focus on critical features by adaptively reweighting channels. Moreover, since each channel is assigned different weights during training, the autoencoder is encouraged to produce diverse feature representations across channels, thereby improving model performance. The structure of the channel attention layer is illustrated in Figure 4.
The channel attention layer first compresses the input features along the channel dimension using different pooling functions. The compressed features are then passed through a fully connected layer for affine transformation, followed by a sigmoid activation to obtain the channel weight matrix. Finally, the input features are multiplied by the weight matrix to generate the output features. The computation of the channel weight matrix M A is given by:
M_A = Sigmoid(FC(ReLU(FC(MaxPool(F) + AvgPool(F)))))
In the formula, M_A represents the attention weight matrix, F represents the input features, Sigmoid(·) and ReLU(·) are activation functions, FC(·) denotes a fully connected layer, and MaxPool(·) and AvgPool(·) represent max pooling and average pooling, respectively.
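A minimal PyTorch sketch of this channel attention layer follows. The bottleneck reduction ratio is an assumption (the paper does not state it here), and max/avg pooling over the length dimension stand in for MaxPool(·) and AvgPool(·):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Sketch of M_A = Sigmoid(FC(ReLU(FC(MaxPool(F) + AvgPool(F))))):
    global max- and average-pooled channel descriptors are summed, passed
    through a two-layer FC bottleneck, and squashed by a sigmoid to give
    one weight per channel (reduction ratio is an assumption)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, F):                      # F: (batch, C, L)
        s = F.amax(dim=2) + F.mean(dim=2)      # MaxPool(F) + AvgPool(F) over length
        M_A = torch.sigmoid(self.fc(s))        # channel weight matrix, (batch, C)
        return F * M_A.unsqueeze(-1)           # reweight each channel

F = torch.randn(2, 16, 1024)
out = ChannelAttention(16)(F)
print(out.shape)  # torch.Size([2, 16, 1024])
```

The output keeps the input shape; only the relative energy of the channels changes, which is what lets the classifier down-weight irrelevant components.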

3.3. Adversarial Training Strategy

For the training of autoencoders, a common approach is to first pre-train the encoder and decoder and then jointly train the encoder and the classifier [22]. However, to enhance the effectiveness of the features extracted by the encoder, this paper adopts the network training strategy proposed in [23], which draws inspiration from the adversarial training paradigm of Generative Adversarial Networks (GANs). GANs have been widely applied in fault diagnosis, such as domain adversarial learning [24,25] and fault signal generation [26,27], although their training process is often reported to suffer from convergence difficulties and instability. In this paper, inspired by the adversarial training strategy of GANs, the classifier is regarded as a discriminator that guides the gradient descent of the autoencoder. In this way, the autoencoder is encouraged to achieve a low reconstruction error while simultaneously ensuring that the encoded features are more discriminative across different fault categories.
Regarding training stability, no notable convergence difficulties were observed in our experiments. This can be attributed to the fact that the reconstruction loss remains significantly smaller than the classification loss during training, such that the overall optimization process is dominated by the classification objective. Consequently, the adversarial-inspired training strategy does not introduce severe instability in practice while still effectively enhancing the discriminative capability of the learned features. It should be noted that, as with adversarial training in general, convergence behavior may be influenced by factors such as loss weighting and network initialization; however, such issues were not observed under the experimental settings considered in this work.
In practice, an alternating iterative training strategy is adopted: the autoencoder network is updated first, followed by the classifier network. The mean squared error is used as the reconstruction loss for the autoencoder, while cross-entropy is employed as the classification loss.
AE_loss = (1/N) Σ_{i=1}^{N} (x_i − x̃_i)²
C_loss = −Σ_{i=1}^{N} t_i ln(y_i)
Let the model parameters for the encoder, decoder, and classifier be θ E , θ D , and θ C respectively. The encoder and decoder are regarded as one sub-network, while the encoder and classifier constitute another sub-network. These two sub-networks are optimized alternately to ensure a balance between reconstruction and classification.
In the first phase, the encoder and decoder are updated to minimize the reconstruction loss:
θ_E ← θ_E − η · ∂AE_loss/∂θ_E,  θ_D ← θ_D − η · ∂AE_loss/∂θ_D
In the second phase, the encoder and classifier are updated to minimize the classification loss:
θ_E ← θ_E − η · ∂C_loss/∂θ_E,  θ_C ← θ_C − η · ∂C_loss/∂θ_C
where η represents the learning rate, and ∂ denotes the partial derivative operator.
One complete training iteration consists of executing both phases, enabling the encoder to learn features for reconstruction and classification simultaneously. The hardware, software, and training configurations of the experimental platform are summarized in Table 2, and the training strategy is illustrated in Figure 5.
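The two-phase alternating iteration can be sketched as a single training step. The grouping of parameters into two optimizers (encoder+decoder, encoder+classifier) and the toy sub-networks used below are assumptions for illustration; only the loss functions and update order follow the text:

```python
import torch

def train_step(x, labels, encoder, decoder, classifier, opt_ae, opt_cls):
    """One alternating iteration: phase 1 updates encoder+decoder on the MSE
    reconstruction loss; phase 2 updates encoder+classifier on cross-entropy."""
    # Phase 1: reconstruction (updates theta_E and theta_D)
    opt_ae.zero_grad()
    ae_loss = torch.nn.functional.mse_loss(decoder(encoder(x)), x)
    ae_loss.backward()
    opt_ae.step()

    # Phase 2: classification (updates theta_E and theta_C)
    opt_cls.zero_grad()
    c_loss = torch.nn.functional.cross_entropy(classifier(encoder(x)), labels)
    c_loss.backward()
    opt_cls.step()
    return ae_loss.item(), c_loss.item()

# Toy sub-networks standing in for the paper's encoder/decoder/classifier:
enc = torch.nn.Conv1d(1, 8, 3, padding=1)
dec = torch.nn.Conv1d(8, 1, 3, padding=1)
cls = torch.nn.Sequential(torch.nn.AdaptiveAvgPool1d(1),
                          torch.nn.Flatten(), torch.nn.Linear(8, 13))
opt_ae = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()))
opt_cls = torch.optim.Adam(list(enc.parameters()) + list(cls.parameters()))

x, y = torch.randn(4, 1, 1024), torch.randint(0, 13, (4,))
ae_l, c_l = train_step(x, y, enc, dec, cls, opt_ae, opt_cls)
print(ae_l, c_l)
```

Note that the encoder appears in both parameter groups, so it receives gradients from both objectives across one full iteration, as in the two update equations above.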

3.4. Model Training

The flowchart of the proposed fault diagnosis method is shown in Figure 6. First, the dataset is segmented into samples of a fixed length and then split into training, validation, and testing sets according to a predefined ratio. Each sample is subsequently normalized. The model is then trained on the training set, and the parameters corresponding to the highest validation accuracy are saved, as this model is considered to have the best generalization ability. Finally, after training, the model is evaluated on the test set to assess its performance.

4. Experiments

4.1. Data Preprocessing

The collected signals are standardized using the min–max normalization method, as shown in Equation (9), to scale the amplitudes of different signals to the range [0, 1], thereby reducing the influence of varying signal magnitudes.
x̄ = (x − min(x)) / (max(x) − min(x))
In the equation, x̄ ∈ R^N represents the normalized signal, where N is the length of the signal, x ∈ R^N represents the original signal, and max(·) and min(·) denote the maximum and minimum value functions, respectively.
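The normalization is a one-liner; this NumPy sketch reproduces the equation above:

```python
import numpy as np

def minmax_normalize(x):
    """Min-max normalization to [0, 1], per the equation above."""
    return (x - x.min()) / (x.max() - x.min())

x = np.array([-2.0, 0.0, 2.0])
print(minmax_normalize(x))  # [0.  0.5 1. ]
```

Each sample is normalized independently, so signals of different absolute magnitude are brought onto a common [0, 1] scale before training.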

4.2. Experimental Dataset

To evaluate the performance of the proposed model, three datasets are used for validation: a rolling bearing fault dataset, referred to as Dataset_1, collected on a rotating machinery fault simulation test bench in our laboratory, and two public datasets.
The structure of the test rig is shown in Figure 7, which consists of an electric motor, a bearing housing, a belt pulley, and other components. The bearing used is an SKF-6204, and faults are introduced by Wire Electrical Discharge Machining (WEDM), as illustrated in Figure 8. Notches of different depths (0.2 mm, 0.4 mm, and 0.6 mm) are machined to simulate varying fault severities, resulting in a total of 13 fault types, as listed in Table 3. The motor operates at 1500 RPM, and acceleration signals along the Z-axis of the bearing housing are collected at a sampling frequency of 12.8 kHz, with each recording lasting 30 s. In addition, the height of the tension pulley is adjusted to vary the bearing load, thereby enriching the dataset. Each fault type is recorded under two different load conditions. The collected data is segmented into 9750 non-overlapping samples of 1024 points each and further divided into training, validation, and test sets with a ratio of 3:1:1.
The rolling bearing fault dataset from Case Western Reserve University (CWRU) [28] contains fault data under four load conditions and four levels of fault severity, sampled at a frequency of 12 kHz. Fault data with fault depths of 7 mils, 14 mils, and 21 mils at the drive end are selected. Fault data of the same type but under different loads are grouped into a single fault category, resulting in a total of 10 fault categories, each containing data from four different load conditions, as shown in Table 4. The model’s classification performance under varying load conditions is evaluated. The dataset consists of 5927 non-overlapping samples, each with 1024 data points, which are divided into training, validation, and testing sets in a ratio of 3:1:1.
The rolling bearing fault dataset from Huazhong University of Science and Technology (HUST) [29] contains vibration signals of bearings under nine different health conditions and four different speeds, sampled at a frequency of 25.6 kHz. Each dataset includes vibration signals in three directions. Signals of the same fault type at different speeds are merged into a single category, and the acceleration signals along the Z-axis are selected, resulting in data for nine fault categories. Each fault category therefore includes signals from four different speeds, as shown in Table 5.
The model’s classification performance under varying rotational speeds is evaluated. The vibration data are segmented into non-overlapping samples of 1024 points, yielding a total of 9216 samples, which are then divided into training, validation, and testing sets in a ratio of 3:1:1.
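The segmentation and 3:1:1 split applied to all three datasets can be sketched as below. Whether the samples are shuffled before splitting is not stated in the paper, so the shuffle here is an assumption:

```python
import numpy as np

def segment_and_split(signal, sample_len=1024, ratios=(3, 1, 1), seed=0):
    """Cut a long recording into non-overlapping fixed-length samples and
    split them 3:1:1 into train/val/test sets (shuffling is an assumption)."""
    n = len(signal) // sample_len
    samples = signal[: n * sample_len].reshape(n, sample_len)
    idx = np.random.default_rng(seed).permutation(n)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    return (samples[idx[:n_train]],
            samples[idx[n_train:n_train + n_val]],
            samples[idx[n_train + n_val:]])

sig = np.arange(10 * 1024, dtype=float)
tr, va, te = segment_and_split(sig)
print(tr.shape, va.shape, te.shape)  # (6, 1024) (2, 1024) (2, 1024)
```

Non-overlapping segmentation guarantees that no raw data point appears in more than one sample, which keeps the train/test separation honest.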

4.3. Comparative Experiment

The time-domain signals of the rolling bearings, after standardization processing, were directly fed into the neural network for training on the training set. The classification accuracy of the model was evaluated on the test set, and the average performance over 10 training runs was reported as the final result for each dataset. The proposed model achieved an average reconstruction error of 0.000436 across the three datasets, while the average classification accuracy reached 99.64 ± 0.29%. These results demonstrate that the proposed network is capable of simultaneously achieving low reconstruction error and high classification accuracy, indicating that a reversible signal decomposition process can be simulated by training an autoencoder, and the decomposed signals can be effectively utilized for fault classification. The confusion matrix of the test set is shown in Figure 9, and the training loss curves and validation accuracy curves are presented in Figure 10.
Dataset_1 and the CWRU dataset correspond to variable load conditions. As shown in the confusion matrix, the network misclassified some samples with different fault depths of rolling elements due to load variations and relative rolling motion. The HUST dataset, on the other hand, corresponds to variable speed conditions, where speed fluctuations caused the network to confuse certain samples of different fault types. Overall, the proposed network is capable of achieving high diagnostic accuracy under both variable speed and variable load conditions.
In addition, adversarial training strategies are often prone to issues such as mode collapse and convergence difficulty [30,31]. However, no such problems were observed during the training of the proposed model. As indicated by the loss curves, the reconstruction error is significantly smaller than the classification error, suggesting that the classification error dominates during training. This may explain why the model did not encounter convergence difficulties.
To further evaluate the performance of the model, the proposed method is compared with existing mechanical fault diagnosis methods. The methods compared in this paper mainly come from Ref. [32], as the selected methods in Ref. [32] provide complete implementation details and publicly available source code, ensuring fair comparison and experimental reproducibility. The compared methods include AlexNet, BiLSTM, CNN, CNN+CWT, ResNet18, and an autoencoder (AE) composed of fully connected layers, as well as an additional model, CNN+EMD, which first decomposes the signals using Empirical Mode Decomposition (EMD) and selects the five components with the highest correlation to the original signal as input to the CNN. All networks use the Adam optimizer, with 128 training epochs and a batch size of 32. The saved model parameters correspond to the highest accuracy historically achieved on the validation set. The AE network for comparison is trained in the same manner as in this paper. The final results are shown in Table 6, Table 7 and Table 8.
From Table 6, Table 7 and Table 8, it can be seen that the proposed model achieves high diagnostic accuracy on all three datasets, with the smallest standard deviation. In terms of classification results, Dataset_1 exhibits the highest classification difficulty, followed by the HUST dataset, and then the CWRU dataset. While the proposed model achieves the highest average accuracy on Dataset_1 and the HUST dataset, it does not reach the highest accuracy on the CWRU dataset. This performance drop may be attributed to mild overfitting caused by the relatively excessive network capacity for this simpler dataset. Supporting this, the loss curve in Figure 10b shows larger fluctuations in gradient magnitude during training, suggesting less stable optimization dynamics when the model capacity exceeds the data complexity. In such cases, appropriately reducing the model parameters may lead to improved performance on simpler datasets.
Furthermore, the comparatively lower performance of the fully connected stacked autoencoder (AE) can be explained by two factors. First, fully connected AEs often involve a large number of parameters, leading to inefficient utilization for one-dimensional vibration signals. Second, unlike convolutional autoencoders, AEs do not explicitly capture local structural characteristics of the signal. Since bearing fault signals typically exhibit localized impulsive patterns, convolutional architectures are better suited to extract these discriminative features, resulting in superior diagnostic performance.

4.4. Ablation Study

To investigate the impact of each module on the model’s performance, the decoder part of the proposed method was removed, leaving only the encoder and classifier to form a classification model, referred to as Model I. The attention layer was then removed from both the proposed method and Model I to form Models II and III, respectively. Ablation experiments were conducted to verify the effectiveness of each component of the model. The experimental results are shown in Table 9, Table 10 and Table 11.
From the results, it can be observed that the introduction of adversarial training, channel attention layers and residual structure can enhance the model’s feature extraction capability. This improvement is particularly evident when the dataset presents a higher classification difficulty. However, when the dataset is relatively simple, overfitting may occur, leading to a decrease in classification accuracy. In such cases, it is appropriate to reduce the number of layers in the model.
To facilitate observation of the signals decomposed by the encoder, the outputs of the encoder trained on Dataset_1 were summed across channels, and the resulting signals were plotted in both the time and frequency domains for comparison with the original signals, as shown in Figure 11.
For rolling bearing faults, the main characteristic lies in the periodic impact components of the signal. From the time-domain signal obtained by summing the encoder outputs across channels, it can be observed that the impacts are more pronounced after the encoder transformation, and the impact frequency is close to the fault frequency of 124 Hz. Moreover, the fault frequency can be directly identified from the frequency-domain representation of the summed encoder outputs. The signals processed by the encoder thus exhibit an effect similar to envelope demodulation. For comparison, the envelope spectrum of the original signal is shown in Figure 12, from which it can be observed that the identified fault characteristic frequencies are consistent with those revealed by the envelope spectrum.
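The envelope-spectrum comparison used here can be reproduced with Hilbert demodulation. The following sketch uses a synthetic impact-modulated signal at the 124 Hz fault frequency mentioned above (the resonance frequency and signal length are invented for illustration):

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs):
    """Envelope spectrum via Hilbert demodulation: the amplitude envelope of
    the analytic signal is Fourier-transformed, so a periodic impact train
    appears as a spectral line at the fault characteristic frequency."""
    env = np.abs(hilbert(x))                  # amplitude envelope
    env -= env.mean()                         # drop the DC component
    spec = np.abs(np.fft.rfft(env)) / len(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return freqs, spec

# Synthetic signal: a 124 Hz modulation (the fault frequency) riding on a
# 3 kHz carrier resonance, sampled at 12.8 kHz as in Dataset_1.
fs, f_fault, f_res = 12800, 124.0, 3000.0
t = np.arange(4096) / fs
x = (1 + np.cos(2 * np.pi * f_fault * t)) * np.sin(2 * np.pi * f_res * t)

freqs, spec = envelope_spectrum(x, fs)
peak = freqs[np.argmax(spec)]
print(peak)  # dominant envelope line near 124 Hz
```

The dominant line of the envelope spectrum falls at the modulation (fault) frequency rather than at the carrier, which is the same demodulation effect the summed encoder outputs exhibit.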
Similarly, the time-domain and frequency-domain waveforms of the sum of encoder outputs across all channels, trained on the HUST dataset, were plotted and compared with the original signals, as shown in Figure 13. For reference, the envelope spectrum of the original signal is shown in Figure 14. The summed encoder outputs in the time domain exhibit noticeable impact components; however, since the impacts in the original signals are less pronounced, only the more prominent portions are highlighted. The rotational speed of the bearing corresponding to the original signals is 75 Hz, and the calculated inner-race fault frequency is approximately 407 Hz. This fault frequency can also be identified in the frequency-domain representation of the summed encoder outputs, and it is consistent with the characteristic frequency obtained from envelope spectrum analysis. The signals processed by the encoder continue to exhibit an effect similar to envelope demodulation.
The rolling bearing fault signals collected in our laboratory were input into the encoder trained on the CWRU dataset. The time-domain and frequency-domain waveforms of the original signals and the sum of encoder outputs across all channels are shown in Figure 15. In addition, the envelope spectrum of the original signal is presented in Figure 16. These signals were not used in neural network training, and both the bearing model and operating conditions differ from those in the training data. Nevertheless, the summed encoder outputs in the time domain still exhibit pronounced impact components. The rotational speed corresponding to the original input signals is 25 Hz, and the calculated outer-race fault frequency is approximately 76 Hz. This fault frequency can also be identified in the frequency-domain representation of the summed encoder outputs and is consistent with the characteristic frequency obtained from envelope spectrum analysis, indicating that the encoder has effectively learned to extract fault-related features.
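The characteristic frequencies quoted above follow from the standard bearing kinematics formulas. The sketch below implements them with hypothetical geometry values, since the bearing dimensions are not listed in this section:

```python
import numpy as np

def bearing_fault_frequencies(fr, n, d, D, phi=0.0):
    """Classical characteristic frequencies for a rolling bearing.

    fr  : shaft rotational frequency (Hz)
    n   : number of rolling elements
    d   : rolling element diameter
    D   : pitch diameter (same unit as d)
    phi : contact angle in radians
    """
    ratio = (d / D) * np.cos(phi)
    bpfo = 0.5 * n * fr * (1.0 - ratio)   # outer-race fault frequency
    bpfi = 0.5 * n * fr * (1.0 + ratio)   # inner-race fault frequency
    return bpfo, bpfi
```

For example, at the 25 Hz shaft speed mentioned above, a hypothetical bearing with 8 rolling elements and d/D = 0.24 at zero contact angle yields an outer-race fault frequency of 76 Hz; these geometry values are illustrative only and are not the actual parameters of the laboratory bearing.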
These observations suggest that through training, the autoencoder can learn an effective adaptive functional decomposition for signal preprocessing, allowing impact components in the signals to be effectively identified. Moreover, this transformation exhibits a degree of transferability, which may enable future applications in signal analysis: by feeding complex signals into the encoder and analyzing its outputs, characteristic patterns corresponding to different fault types can be discerned.

5. Conclusions

This paper proposes a novel adversarial autoencoder framework for rolling bearing fault diagnosis, addressing the limitations of existing methods that either rely on complex signal preprocessing or lack interpretability. Inspired by the formal analogy between signal decomposition algorithms and autoencoders, as well as the equivalence between CNN convolution operations and inner-product-based waveform matching, a convolutional autoencoder was designed to perform adaptive signal decomposition within the network. In this view, the autoencoder can be regarded as a data-driven signal decomposition approach, where each encoder channel corresponds to a nonlinear signal component, and the decoder ensures reconstruction fidelity. A channel attention mechanism adaptively reweights these components, and a classifier serves as a discriminator to enforce class separability in the latent space.
The autoencoder is trained via an adversarial-inspired strategy, achieving both low reconstruction error and highly discriminative latent features. Ablation studies confirm the effectiveness of the key components, including attention mechanisms and the adversarial training strategy. Analysis of encoder outputs demonstrates that the learned transformation emphasizes impact components in the signals, producing effects similar to envelope demodulation, and exhibits a degree of transferability across different datasets and operating conditions. Quantitative evaluation demonstrates the model’s robust performance, achieving an accuracy of 99.64 ± 0.29%, recall of 99.62 ± 0.29%, and an F1-score of 99.63 ± 0.29%, confirming the reliability and effectiveness of the proposed framework for rolling bearing fault diagnosis.
Overall, the proposed method provides an end-to-end, interpretable, and adaptive framework for intelligent fault diagnosis that eliminates separate preprocessing steps while achieving high classification accuracy. Certain limitations remain. The model may overfit on simpler, low-complexity datasets; its robustness under extreme noise or highly variable operating conditions has not been fully evaluated; and the convolutional architecture may be less sensitive to non-local signal patterns. Future work will address these limitations by enhancing robustness to noise, extending the framework to extract and enhance additional types of signal components, and incorporating model transfer strategies to facilitate knowledge transfer across different machines and operating environments, thereby improving generalization to more complex and previously unseen conditions.

Author Contributions

Conceptualization, W.Z. and H.X.; software, H.X.; validation, W.Z. and X.Z.; investigation, X.Z.; resources, W.Z.; data curation, H.X.; writing—original draft preparation, X.Z.; writing—review and editing, W.Z. and H.X.; funding acquisition, W.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Yunnan Provincial Key Laboratory of Intelligent Logistics Equipment and Systems (Grant No.: 202449CE340008), Shen Weiming Academician Workstation of Yunnan Province (Grant No.: 202505AF350084) and Science and Technology Project of Yunnan Provincial Universities Serving Key Industries (Grant No.: FWCY-ZNT2025015).

Data Availability Statement

The data supporting this study can be obtained upon request from the corresponding author. However, due to privacy considerations and the presence of undisclosed intellectual property, these data are not accessible to the public.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. 1D convolution operation.
Figure 2. Residual block structure.
Figure 3. Autoencoder structure.
Figure 4. Adversarial autoencoder network architecture.
Figure 5. Training strategy.
Figure 6. Model training flow chart.
Figure 7. Test bench structure.
Figure 8. Rolling bearing fault type.
Figure 9. Confusion matrix of test set.
Figure 10. Training losses and accuracy curves.
Figure 11. Time-domain and frequency-domain waveforms of the raw signal from the bearing with an inner race fault and the encoder output summed across all channels (Dataset_1). (a) Time-domain waveform of the original signal. (b) Time-domain waveform of the summed encoder outputs. (c) Frequency-domain waveform of the original signal. (d) Frequency-domain waveform of the summed encoder outputs.
Figure 12. Envelope spectrum of the raw signal from the bearing with an inner race fault (Dataset_1).
Figure 13. Time-domain and frequency-domain waveforms of the raw signal from the bearing with an inner race fault and the encoder output summed across all channels (HUST dataset). (a) Time-domain waveform of the original signal. (b) Time-domain waveform of the summed encoder outputs. (c) Frequency-domain waveform of the original signal. (d) Frequency-domain waveform of the summed encoder outputs.
Figure 14. Envelope spectrum of the raw signal from the bearing with an inner race fault (HUST dataset).
Figure 15. Time-domain and frequency-domain waveforms of the untrained bearing fault signal and the encoder output summed across all channels. (a) Time-domain waveform of the original signal. (b) Time-domain waveform of the summed encoder outputs. (c) Frequency-domain waveform of the original signal. (d) Frequency-domain waveform of the summed encoder outputs.
Figure 16. Envelope spectrum of the untrained bearing fault signal.
Table 1. Parameters of the autoencoder and classifier.

| Encoder Module | Output Size | Decoder Module | Output Size | Classifier Module | Output Size |
|---|---|---|---|---|---|
| Input Layer | 1 × 1024 | Latent Variable Layer | 64 × 1024 | Latent Variable Layer | 64 × 1024 |
| RB 1 | 128 × 1024 | RB 8 | 32 × 1024 | Channel Attention Layer | 64 × 1024 |
| | 128 × 1024 | | 32 × 1024 | RB 15 | 64 × 512 |
| | 128 × 1024 | | 64 × 1024 | | 64 × 512 |
| RB 2 | 64 × 1024 | RB 9 | 32 × 1024 | | 128 × 512 |
| | 64 × 1024 | | 32 × 1024 | RB 16 | 64 × 512 |
| | 128 × 1024 | | 64 × 1024 | | 64 × 512 |
| RB 3 | 64 × 1024 | RB 10 | 32 × 1024 | | 128 × 512 |
| | 64 × 1024 | | 32 × 1024 | AvgPool | 128 × 256 |
| | 128 × 1024 | | 64 × 1024 | RB 17 | 64 × 128 |
| RB 4 | 64 × 1024 | RB 11 | 64 × 1024 | | 64 × 128 |
| | 64 × 1024 | | 64 × 1024 | | 256 × 128 |
| | 64 × 1024 | | 128 × 1024 | RB 18 | 128 × 128 |
| RB 5 | 32 × 1024 | RB 12 | 64 × 1024 | | 128 × 128 |
| | 32 × 1024 | | 64 × 1024 | | 512 × 128 |
| | 64 × 1024 | | 128 × 1024 | AdaptiveAvgPool | 512 × 1 |
| RB 6 | 32 × 1024 | RB 13 | 64 × 1024 | Dropout | P = 0.5 |
| | 32 × 1024 | | 64 × 1024 | FC | Classes |
| | 64 × 1024 | | 128 × 1024 | | |
| RB 7 | 32 × 1024 | RB 14 | 128 × 1024 | | |
| | 32 × 1024 | | 128 × 1024 | | |
| | 64 × 1024 | | 1 × 1024 | | |
| Latent Variable Layer | 64 × 1024 | Output Layer | 1 × 1024 | | |
Table 2. Implementation details.

| Item | | Details |
|---|---|---|
| Hardware | CPU | 13th Gen Intel® Core™ i7-13700KF 3.40 GHz (Intel Corporation, Santa Clara, CA, USA) |
| | GPU | NVIDIA GeForce RTX 4090 24 GB (NVIDIA Corporation, Santa Clara, CA, USA) |
| Software | Operating system | Windows 11 (Microsoft Corporation, Redmond, WA, USA) |
| | Python | 3.9.10 (Python Software Foundation, Wilmington, DE, USA) |
| | PyTorch | 1.13.0 (Linux Foundation, San Francisco, CA, USA) |
| | MATLAB | R2018a (The MathWorks Inc., Natick, MA, USA) |
| Model | Optimizer | Adam |
| | Learning rate | 0.001 |
| | Batch size | 32 |
| | Epochs | 128 |
Table 3. Fault types of Dataset_1.

| Failure Mode | Sample Number | Label |
|---|---|---|
| Roller fault 0.2 mm | 750 | B02:0 |
| Roller fault 0.4 mm | 750 | B04:1 |
| Roller fault 0.6 mm | 750 | B06:2 |
| Combination fault 0.2 mm | 750 | C02:3 |
| Combination fault 0.4 mm | 750 | C04:4 |
| Combination fault 0.6 mm | 750 | C06:5 |
| Inner ring fault 0.2 mm | 750 | I02:6 |
| Inner ring fault 0.4 mm | 750 | I04:7 |
| Inner ring fault 0.6 mm | 750 | I06:8 |
| Normal bearing | 750 | N:9 |
| Outer ring fault 0.2 mm | 750 | O02:10 |
| Outer ring fault 0.4 mm | 750 | O04:11 |
| Outer ring fault 0.6 mm | 750 | O06:12 |
Table 4. Fault types of the CWRU dataset.

| Failure Mode | Sample Number | Label |
|---|---|---|
| Roller fault 07 mils | 473 | B07:0 |
| Roller fault 14 mils | 475 | B14:1 |
| Roller fault 21 mils | 475 | B21:2 |
| Inner ring fault 07 mils | 476 | I07:3 |
| Inner ring fault 14 mils | 472 | I14:4 |
| Inner ring fault 21 mils | 474 | I21:5 |
| Normal bearing | 1657 | N:6 |
| Outer ring fault 07 mils | 475 | O07:7 |
| Outer ring fault 14 mils | 474 | O14:8 |
| Outer ring fault 21 mils | 476 | O21:9 |
Table 5. Fault types of the HUST dataset.

| Failure Mode | Sample Number | Label |
|---|---|---|
| Medium ball fault | 1024 | 0.5X_B:0 |
| Medium combination fault | 1024 | 0.5X_C:1 |
| Medium inner race fault | 1024 | 0.5X_I:2 |
| Medium outer race fault | 1024 | 0.5X_O:3 |
| Severe ball fault | 1024 | B:4 |
| Severe combination fault | 1024 | C:5 |
| Healthy bearing | 1024 | H:6 |
| Severe inner race fault | 1024 | I:7 |
| Severe outer race fault | 1024 | O:8 |
Table 6. Diagnostic accuracy of each model (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. Acc. |
|---|---|---|---|---|
| AlexNet | 98.18 ± 0.28 | 99.18 ± 0.20 | 99.34 ± 0.19 | 98.90 ± 0.57 |
| BiLSTM | 97.05 ± 0.25 | 99.76 ± 0.11 | 99.11 ± 0.17 | 98.64 ± 1.19 |
| CNN | 99.13 ± 0.16 | 99.92 ± 0.07 | 99.24 ± 0.14 | 99.43 ± 0.38 |
| CNN+CWT | 98.77 ± 0.25 | 99.49 ± 0.23 | 99.51 ± 0.11 | 99.26 ± 0.40 |
| CNN+EMD | 98.51 ± 0.14 | 99.99 ± 0.03 | 97.85 ± 0.22 | 98.79 ± 0.92 |
| ResNet18 | 98.95 ± 0.17 | 99.86 ± 0.11 | 99.71 ± 0.14 | 99.50 ± 0.43 |
| AE | 43.79 ± 0.00 | 78.36 ± 1.00 | 71.74 ± 1.34 | 64.63 ± 15.27 |
| Proposed | 99.29 ± 0.16 | 99.92 ± 0.07 | 99.73 ± 0.09 | 99.64 ± 0.29 |
Table 7. Recall of each model (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. Recall |
|---|---|---|---|---|
| AlexNet | 98.12 ± 0.29 | 98.96 ± 0.25 | 99.35 ± 0.19 | 98.81 ± 0.57 |
| BiLSTM | 96.97 ± 0.26 | 99.69 ± 0.13 | 99.14 ± 0.15 | 98.60 ± 1.21 |
| CNN | 99.10 ± 0.16 | 99.90 ± 0.10 | 99.24 ± 0.13 | 99.41 ± 0.37 |
| CNN+CWT | 98.74 ± 0.25 | 99.38 ± 0.28 | 99.52 ± 0.11 | 99.21 ± 0.41 |
| CNN+EMD | 98.59 ± 0.13 | 99.99 ± 0.04 | 97.84 ± 0.22 | 98.81 ± 0.92 |
| ResNet18 | 98.92 ± 0.17 | 99.81 ± 0.14 | 99.72 ± 0.13 | 99.48 ± 0.43 |
| AE | 43.94 ± 0.00 | 74.53 ± 1.46 | 71.91 ± 1.34 | 63.46 ± 14.12 |
| Proposed | 99.27 ± 0.16 | 99.88 ± 0.09 | 99.72 ± 0.10 | 99.62 ± 0.29 |
Table 8. F1-score of each model (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. F1 |
|---|---|---|---|---|
| AlexNet | 98.12 ± 0.29 | 98.94 ± 0.25 | 99.35 ± 0.19 | 98.80 ± 0.57 |
| BiLSTM | 96.97 ± 0.26 | 99.70 ± 0.13 | 99.12 ± 0.16 | 98.59 ± 1.21 |
| CNN | 99.11 ± 0.16 | 99.90 ± 0.10 | 99.23 ± 0.14 | 99.41 ± 0.37 |
| CNN+CWT | 98.74 ± 0.25 | 99.27 ± 0.29 | 99.52 ± 0.11 | 99.21 ± 0.41 |
| CNN+EMD | 98.58 ± 0.14 | 99.99 ± 0.03 | 97.84 ± 0.22 | 98.80 ± 0.92 |
| ResNet18 | 98.92 ± 0.17 | 99.82 ± 0.14 | 99.71 ± 0.14 | 99.48 ± 0.43 |
| AE | 42.41 ± 0.00 | 73.89 ± 1.29 | 71.66 ± 1.65 | 62.65 ± 14.63 |
| Proposed | 99.27 ± 0.16 | 99.88 ± 0.09 | 99.73 ± 0.09 | 99.63 ± 0.29 |
Table 9. Ablation experimental results—Accuracy (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. Acc. |
|---|---|---|---|---|
| Model I | 99.22 ± 0.27 | 99.88 ± 0.16 | 99.50 ± 0.12 | 99.53 ± 0.33 |
| Model II | 99.17 ± 0.25 | 99.88 ± 0.09 | 99.38 ± 0.16 | 99.48 ± 0.35 |
| Model III | 98.95 ± 0.15 | 99.92 ± 0.11 | 99.30 ± 0.10 | 99.39 ± 0.42 |
| Proposed | 99.29 ± 0.16 | 99.91 ± 0.07 | 99.73 ± 0.09 | 99.64 ± 0.29 |
Table 10. Ablation experimental results—Recall (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. Recall |
|---|---|---|---|---|
| Model I | 99.20 ± 0.28 | 99.86 ± 0.18 | 99.50 ± 0.12 | 99.52 ± 0.34 |
| Model II | 99.15 ± 0.26 | 99.84 ± 0.12 | 99.37 ± 0.16 | 99.45 ± 0.35 |
| Model III | 98.92 ± 0.15 | 99.90 ± 0.14 | 99.29 ± 0.10 | 99.37 ± 0.43 |
| Proposed | 99.27 ± 0.16 | 99.88 ± 0.09 | 99.72 ± 0.10 | 99.62 ± 0.29 |
Table 11. Ablation experimental results—F1-score (%).

| Model | Dataset_1 | CWRU | HUST | Total Avg. F1 |
|---|---|---|---|---|
| Model I | 99.21 ± 0.27 | 99.85 ± 0.20 | 99.50 ± 0.12 | 99.52 ± 0.33 |
| Model II | 99.16 ± 0.25 | 99.84 ± 0.12 | 99.38 ± 0.16 | 99.46 ± 0.34 |
| Model III | 98.93 ± 0.15 | 99.90 ± 0.15 | 99.30 ± 0.10 | 99.37 ± 0.43 |
| Proposed | 99.27 ± 0.16 | 99.88 ± 0.09 | 99.73 ± 0.09 | 99.63 ± 0.29 |