Article

A Hybrid Approach Based on a Windowed-EMD Temporal Convolution–Reallocation Network and Physical Kalman Filtering for Bearing Remaining Useful Life Estimation

1 School of Mechanical Engineering, Shenyang University of Technology, Shenyang 110870, China
2 The Key Laboratory of Intelligent Manufacturing and Industrial Robots in Liaoning Province, Shenyang 110870, China
3 Ansteel Group Automation Co., Ltd., Anshan 114009, China
* Author to whom correspondence should be addressed.
Machines 2025, 13(9), 802; https://doi.org/10.3390/machines13090802
Submission received: 24 July 2025 / Revised: 29 August 2025 / Accepted: 1 September 2025 / Published: 3 September 2025

Abstract

Rolling bearings are one of the core components of industrial equipment. Owing to the rapid development of deep learning methods, a multitude of data-driven remaining useful life (RUL) estimation approaches have been proposed recently. However, several challenges persist in existing methods: the limited accuracy of traditional data-driven models, instability in sequence prediction, and poor adaptability to diverse operational environments. To address these issues, we propose a novel prognostics approach integrating three key components: time-intrinsic mode functions-derived feature representation (TIR) sequences, a one-dimensional temporal feature convolution–reallocation network (TFCR) with a flexible configuration scheme, and a physics-based Kalman filtering method. The approach first converts denoised signals into TIR-sequences using windowed empirical mode decomposition (EMD). The TFCR network then extracts hidden high-dimensional features from these sequences and maps them to the initial RUL. Finally, physics-based Kalman filtering is applied to enhance prediction stability and enforce physical constraints, producing refined RUL estimates. Experimental results on the XJTU-SY dataset demonstrate the effectiveness of the proposed approach and confirm its feasibility for bearing RUL estimation.

1. Introduction

The Internet of Things (IoT) connects various devices and objects to the network, and its applications drive the transformation of multiple industries through intelligent data collection, analysis, and processing [1]. Among these applications, the Industrial Internet of Things (IIoT) applies IoT technology within the industrial sector to enhance industrial intelligence [2,3]. Prognostics and health management (PHM) is part of this advancement and uses cloud and edge computing patterns to maintain equipment health more effectively [4,5]. This cloud–edge computing pattern in PHM, which rapidly processes historical and local data, can prevent failures and improve the production efficiency of industrial equipment [1,6,7]. As a crucial aspect of prognostics, remaining useful life (RUL) estimation contributes significantly to the PHM of industrial parts [8]. For example, rolling bearings are core components of industrial rotary equipment [9]. Their operating status directly affects the overall performance and production efficiency of the equipment. Accurate RUL estimation for rolling bearings can reduce unexpected downtime and maintenance costs and enhance equipment reliability [10,11]. Typical methods for predicting RUL fall into two categories [12]:
  • Model-based methods: These methods are based on physical models and use mathematical equations to describe degradation mechanisms such as fatigue, wear, or corrosion [13]. However, their modeling complexity and limited generalizability make them difficult to apply to practical prognostics [14].
  • Data-driven methods: These approaches use statistical techniques or machine learning algorithms, e.g., the Markov process, Kalman filter, Wiener process, or particle filter, to model degradation processes [15]. While effective, the large volume of monitoring data makes it difficult to capture complex mappings with statistical distributions [16].
Recently, deep learning has effectively addressed this problem by automatically extracting features to reveal hidden degradation information [17,18]. Furthermore, deep learning models can be deployed across the source domain without prior knowledge or assumptions, making them more suitable for complex systems and domains [19]. Firstly, the effectiveness of the estimation heavily depends on the quality of the input features [15]; feature handling can be divided into three main approaches:
  • Direct approach: In this approach, the model employs an end-to-end prediction strategy by integrating the feature extraction step directly into the model framework. For instance, Xia et al. [20] propose a Long Short-Term Memory (LSTM) neural network for prognostics that segments time-series data with sliding windows and applies the LSTM model to extract features and predict the RUL for each window. Similarly, Wang et al. [21] introduce a Deep Separable Convolutional Neural Network (DSCN) that processes overlapping sampling windows to predict RUL directly. These methods segment the raw signal data into a large number of windows mapped to RUL and solve specific nonlinear problems, but the model must adjust dynamically to large volumes of windowed data, which increases the computational complexity of training.
  • Hybrid approach: This approach flexibly extracts physically meaningful features based on the characteristics of the signal and then evaluates a health indicator (HI), in two steps. For example, Chen et al. [22] propose a hybrid prognostics method that combines five bandpass energy values from the frequency spectrum with an attention-based recurrent neural network (RNN) for further prediction. Similarly, Guo et al. [23] propose a health indicator (RNN-HI) that combines correlation similarity features and classical time-domain features using recurrent neural networks. Moreover, Zhou et al. [24] introduce a hybrid transformer (TRM) approach that performs RUL tasks based on 14 time-domain features. Furthermore, Cao et al. [25] propose TCN-RSA, which employs causal dilated convolutions to capture long-term dependencies and extract high-level features from the time–frequency domain, incorporating residual self-attention to weight feature contributions at different time steps during bearing degradation. Additionally, Liu et al. [26] propose a hybrid framework that combines temporal convolutional networks (TCNs) and LSTM networks with a Convolutional Block Attention Module (CBAM) for multi-dimensional feature weighting. In short, these methods have proven effective for prognostics. The two-step design reduces the number of parameters and increases computation speed, but it may struggle with high-interference noise and complicated long-range mappings [27].
  • Physics-informed hybrid approach: This emerging paradigm integrates physical degradation laws and domain knowledge into data-driven models to enhance prediction reliability and interpretability. Specifically, Lu et al. [28] integrate physical consistency constraints into the LSTM loss function to ensure predictions comply with monotonic degradation laws. Similarly, Yang et al. [29] utilize dynamic adaptive IDFT frequency domain blocks with multi-state memory units for physics-constrained time–frequency fusion prediction. Furthermore, Hu et al. [30] combine separable convolutional feature extraction with physics-informed neural networks, learning implicit physical degradation patterns while incorporating physics constraint losses. Collectively, these methods address the black-box nature of data-driven approaches by incorporating physical principles, thereby enhancing model interpretability and prediction reliability. Nevertheless, they may face challenges in modeling complex real-world physical relationships and require domain expertise for proper constraint formulation.
Secondly, the model architecture substantially impacts prognostic accuracy. Fixed input dimensions and model structures can reduce prediction effectiveness, limiting suitability for practical engineering applications. For instance, Yin et al. [8] emphasize that effective feature extraction and appropriate model parameter settings can significantly enhance prediction performance.
Thirdly, complex architectures with numerous parameters are typically required to capture temporal and nonlinear degradation information. However, this complexity can result in high latency and poor performance when deployed on cloud and edge platforms [31].
An effective prediction depends on well-informed input for deep-learning-based approaches [15]. Among signal processing methods, Empirical Mode Decomposition (EMD) [32] is particularly suitable for processing complex non-stationary and nonlinear signals, and it has been shown to effectively enrich the input information for prediction tasks [33,34,35]. As part of hybrid feature engineering, EMD adaptively decomposes the raw signal into intrinsic mode functions (IMFs) at different time–frequency scales, thereby mitigating the influence of noise on the original signal; this advantage addresses the hybrid-approach limitation mentioned above. Moreover, one-dimensional convolutional neural networks (1D-CNNs) have recently demonstrated considerable potential in processing long sequences [36]. Convolution inherently shares parameters, which reduces the parameter count, and the count can be reduced further by adjusting the convolution type, kernel sizes, and number of channels [37,38]. In a typical 1D-CNN architecture, the number of channels gradually increases with network depth, while the time dimension of the feature map is gradually reduced through pooling operations [21,39,40]. However, reducing the temporal resolution through pooling can discard important temporal information, which is critical for tasks such as RUL prediction. In this regard, a physics-informed hybrid bearing RUL estimation method is proposed that combines a windowed EMD feature engineering method, an effective multi-model flexible configuration strategy, and one-dimensional temporal convolution neural network models, providing a robust and accurate solution for bearing RUL estimation. It primarily comprises three aspects:
  • Vibration signals are monitored and analyzed, providing comprehensive degradation information. The feature engineering method is an adjustable windowed EMD time-domain feature extraction method, whose purpose is to convert the denoised vibration signals into TIR-sequences as the input. Experimental validation demonstrates that this method is feasible for characterizing degradation.
  • A one-dimensional temporal convolution–reallocation model (TFCR) is proposed to extract hidden features from TIR-sequences and map them to RUL. Specifically, the proposed model employs a causal convolution architecture in the time dimension to preserve temporal information. In parallel, a bottleneck top–down design is used in the channel dimension, with channel weight reallocation at the final stage, ensuring that channel compression does not lead to a significant loss of critical features. Experimental results also demonstrate that it can address bearing life prediction tasks with high accuracy and efficiency.
  • A flexible configuration scheme selects the most appropriate TFCR variant, adapting to complex working conditions with different data distributions. TFCR is then combined with physics-based Kalman filtering to improve estimation stability.
The remainder of this paper is organized as follows. Section 2 presents the proposed TIR-sequences, and Section 3 presents the components of the TFCR model and the architecture of each layer. Section 4 introduces the flexible configuration scheme and a physically constrained Kalman filter for constraining, smoothing, and suppressing outliers in the RUL series produced by TFCR. Finally, the validation experiments and a comparison with related studies are presented in Section 5, while Section 6 discusses limitations and future research directions.

2. The TIR Sequences for the Temporal Degradation Model

In this section, the rolling bearing temporal degradation model is introduced. Subsequently, the process and method for obtaining TIR sequences are presented, which serve as the input for the TFCR model discussed in Section 3.

2.1. The Temporal Degradation Model

Rolling bearings inevitably undergo a nonlinear degradation process until failure. However, the inherent complexity of this process poses significant challenges for accurate predictive modeling. The temporal degradation model simplifies this nonlinearity by approximating it with a linearly decreasing number of remaining sampling cycles.
As illustrated in Figure 1, the circular symbols $n_1, n_2, \ldots, n_d$ represent feature vectors extracted during the sampling duration, with the horizontal axis depicting the time series of the bearing degradation process. Together, these elements establish the temporal framework for mapping features to their corresponding degradation stages.
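As a minimal illustration of this labeling scheme, and assuming the RUL target is normalized to the (0, 1) range used later with the sigmoid output (Section 4.1), the remaining sampling cycles can be converted into a linearly decreasing target as follows. This is a sketch, not the authors' exact labeling code:

```python
import numpy as np

def linear_rul_labels(num_cycles: int) -> np.ndarray:
    """Assign each sampling cycle a linearly decreasing, normalized RUL target:
    1.0 at the first cycle, 0.0 at the failure cycle."""
    remaining = np.arange(num_cycles - 1, -1, -1, dtype=float)  # d-1, ..., 1, 0
    return remaining / (num_cycles - 1)

labels = linear_rul_labels(123)   # e.g., a bearing with 123 sampling cycles
print(labels[0], labels[-1])      # 1.0 at the start, 0.0 at failure
```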

2.2. The TIR Sequences Extraction Method Based on Windowed EMD

The TIR sequence based on windowed EMD feature engineering contains three key steps:
  • Step 1: Divide the vibration signal at each time node into adjustable, non-overlapping time windows, and apply EMD to decompose the denoised signal into intrinsic mode functions (IMFs) and residual function (RES).
  • Step 2: Extract time-domain features from the IMFs and RES within each window, and concatenate them to form a feature vector representing that time node.
  • Step 3: Arrange these feature vectors chronologically to construct a complete TIR-sequence with temporal and feature dimensions.
It should be noted that the original signal has been denoised before applying EMD, because EMD is mainly used for signal decomposition rather than noise reduction in this approach. Figure 1 shows the complete flowchart of these steps. (TIR-sequences incorporate features extracted from both IMFs and residual components obtained through EMD decomposition, providing a comprehensive representation of the signal characteristics across time windows). The feature dimension is calculated as follows:
$$ n_d = n_i \times n_f \times n_w \times n_s \qquad (1) $$
where $n_d$ is the extracted feature dimension, $n_i$ is the number of components obtained after the EMD process, $n_f$ is the number of time-domain features from [24] that have been verified as degradation-sensitive, $n_w$ denotes the number of adjustable windows into which the sampling time is divided, and $n_s$ is the number of sensors involved.
Unlike traditional EMD, which continues decomposition until the residual becomes monotonic, this approach uses a fixed number of decomposition layers (m) based on empirical analysis of bearing signal characteristics. This modification prevents over-decomposition while ensuring consistent feature dimensionality across all time windows. Ultimately, Equation (2) shows the final decomposition with m modes:
$$ x(t) = \sum_{i=1}^{m} \mathrm{IMF}_i(t) + r_m(t) \qquad (2) $$
Regarding feature processing, the cumulative feature transformation method proposed in [24] was initially considered to enhance monotonicity and trend characteristics through point-wise accumulation and scaling operations. However, two issues were identified when applying this method after TIR sequences extraction: (1) noise amplification in the transformed features, and (2) severe overfitting during model training. Consequently, we discarded the cumulative feature transformation method and used the raw features directly extracted from IMFs and residuals.
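A condensed sketch of the three extraction steps is given below. It assumes the PyEMD package for the EMD step and uses a small illustrative set of time-domain statistics for a single sensor; the actual feature set, window count $n_w$, decomposition depth $m$, and sensor count $n_s$ follow reference [24] and Table 1, so the names and statistics here are placeholders rather than the authors' implementation.

```python
import numpy as np
from PyEMD import EMD  # assumed third-party package for empirical mode decomposition

def time_domain_features(x: np.ndarray) -> np.ndarray:
    """Illustrative subset of degradation-sensitive time-domain statistics."""
    rms = np.sqrt(np.mean(x ** 2))
    return np.array([rms, np.std(x), np.max(np.abs(x)),
                     np.mean(np.abs(x)), np.max(np.abs(x)) / (rms + 1e-12)])

def tir_vector(signal: np.ndarray, n_w: int, m: int) -> np.ndarray:
    """Steps 1-2: split one sampling cycle into n_w non-overlapping windows,
    decompose each window into m IMFs plus a residual (fixed depth, Equation (2)),
    and concatenate the per-component feature vectors (single-sensor case)."""
    feats = []
    for window in np.array_split(signal, n_w):
        emd = EMD()
        emd.emd(window, max_imf=m)                 # fixed number of decomposition layers
        imfs, residual = emd.get_imfs_and_residue()
        for component in list(imfs) + [residual]:
            feats.append(time_domain_features(component))
    return np.concatenate(feats)

def tir_sequence(cycles: list, n_w: int, m: int) -> np.ndarray:
    """Step 3: stack per-cycle vectors chronologically -> (time, feature) array."""
    return np.stack([tir_vector(c, n_w, m) for c in cycles])
```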

3. The TFCR Model Architecture

This section introduces the proposed TFCR model, which consists of temporal feature compression layers and channel weight redistribution layers and can be configured as large or light variants.

3.1. Temporal Feature Compression Layer

The temporal feature compression layers use a causal convolution architecture [36], implementing masked padding in the temporal dimension to form a unidirectional structure, which effectively addresses the limitations of conventional convolutions in capturing temporal dependencies, as shown in Figure 2 and Figure 3. The architecture maintains the temporal dimension across layers while compressing the channel dimension, thereby reducing the parameters and achieving dimensionality reduction. In addition, compared with layer a, layer b expands the temporal receptive field by employing dilated convolution kernels. The convolution operations of these layers are given by Equations (3) and (4):
$$ y(t,c) = \sum_{m=1}^{M} \sum_{k=0}^{K-1} w_{k,m,c} \cdot x(t - d\cdot k,\, m) \qquad (3) $$
$$ x(t - d\cdot k,\, m) = \begin{cases} x(t - d\cdot k,\, m), & t - d\cdot k \geq 0 \\ \text{masked}, & t - d\cdot k < 0 \end{cases} \qquad (4) $$
where $w_{k,m,c}$ is the kernel weight associated with temporal index $k$, input channel $m$, and output channel $c$; $y(t,c)$ is the output at time node $t$ and channel $c$; $x(t - d\cdot k, m)$ is the input value at time node $t - d\cdot k$ and channel $m$, which is masked (typically set to zero) when $t - d\cdot k < 0$; and $d$ is the dilation coefficient that controls the expansion of the receptive field.
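In PyTorch, the masked padding of Equation (4) amounts to padding the temporal dimension with $(K-1)\cdot d$ zeros on the left only, so the output at time $t$ never depends on future inputs. A minimal sketch follows; the class name and the channel/dilation values are illustrative, not the exact TFCR implementation:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution with left-only ("masked") padding, as in Equations (3)-(4)."""

    def __init__(self, in_channels: int, out_channels: int,
                 kernel_size: int, dilation: int = 1):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_channels, out_channels,
                              kernel_size, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time). Pad only on the left so position t depends
        # on inputs at t - d*k with k >= 0 (masked when t - d*k < 0).
        x = nn.functional.pad(x, (self.left_pad, 0))
        return self.conv(x)

# Example: channel compression (e.g., 1056 -> 512) while keeping the time length.
layer = CausalConv1d(in_channels=1056, out_channels=512, kernel_size=3, dilation=2)
out = layer(torch.randn(8, 1056, 40))   # -> torch.Size([8, 512, 40])
```

Because the padding is applied only on the left, the temporal length is preserved exactly, which matches the design goal of compressing channels without shrinking the time dimension.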

3.2. Channel Weight Redistribution Layer

The channel weight redistribution layer dynamically adjusts feature importance. Figure 4 shows its architecture with a global average pooling layer and a feature redistribution block. In contrast to the SE block [41], which uses fully connected layers for dimensionality reduction, this layer employs 1 × 1 convolutions without channel scaling, preserving all feature information. The output is obtained by element-wise multiplication with the generated channel weights. The mathematical formulation of this layer is given by Equation (5):
$$ \mathrm{FRL}(x)_{t,c} = \sigma\!\left( w_2\, \delta\!\left( w_1 \cdot \frac{1}{T} \sum_{i=1}^{T} x_{i,c} \right) \right) \cdot x_{t,c} \qquad (5) $$
where $x_{t,c}$ denotes the input at time step $t$ and channel $c$, and $T$ represents the total number of time steps. Matrices $w_1$ and $w_2$ are learnable parameters. The ReLU activation function $\delta$ introduces nonlinearity, while the Sigmoid function $\sigma$ constrains the channel weights to the range $(0, 1)$. These weights selectively emphasize important channel features while suppressing less relevant ones.
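Equation (5) corresponds to a squeeze-and-excitation-style gate that keeps the full channel width (1 × 1 convolutions without a reduction ratio). A minimal PyTorch sketch under that reading, with illustrative names:

```python
import torch
import torch.nn as nn

class ChannelWeightRedistribution(nn.Module):
    """Channel gating of Equation (5): global average pooling over time, two 1x1
    convolutions without channel reduction, ReLU + Sigmoid, then rescaling."""

    def __init__(self, channels: int):
        super().__init__()
        self.w1 = nn.Conv1d(channels, channels, kernel_size=1)  # no dimensionality reduction
        self.w2 = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        s = x.mean(dim=-1, keepdim=True)          # (1/T) * sum_i x_{i,c}
        s = torch.relu(self.w1(s))                # delta(w1 ...)
        weights = torch.sigmoid(self.w2(s))       # sigma(w2 ...), values in (0, 1)
        return x * weights                        # reweight each channel

block = ChannelWeightRedistribution(channels=160)
y = block(torch.randn(8, 160, 40))                # same shape as the input
```

Keeping the full channel width (rather than the usual SE reduction ratio) trades a few extra parameters for not discarding channel information before the gate is computed.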

4. The Implementation of RUL Estimation

This section integrates the padded and normalized TIR sequences with the proposed TFCR models using a flexible scheme strategy to generate initial RUL predictions. Additionally, physics-based Kalman filtering is introduced to refine these initial RUL estimates.

4.1. Data Preparation and Flexible TFCR Scheme

To enhance model convergence and performance, TIR sequences undergo normalization, scaling the data to a range compatible with the output range of the sigmoid activation function (0–1). Additionally, padding is applied at the beginning of the sequences to ensure alignment between input and output sample counts, as shown in Equations (6)–(8).
$$ \tilde{x}_{f,t} = \frac{x_{f,t} - x_{f,\min}}{x_{f,\max} - x_{f,\min}} \qquad (6) $$
$$ [\,P \mid X\,] = \begin{bmatrix} p_1 & \cdots & p_{l-1} & x_{1,1} & \cdots & x_{1,t} \\ p_1 & \cdots & p_{l-1} & x_{2,1} & \cdots & x_{2,t} \\ \vdots & & \vdots & \vdots & & \vdots \\ p_1 & \cdots & p_{l-1} & x_{f,1} & \cdots & x_{f,t} \end{bmatrix} \qquad (7) $$
$$ |P| = l - 1 \qquad (8) $$
The above equations represent the preprocessing steps applied to the TIR sequences. Here, $f$ represents the feature dimension, $t$ denotes the temporal dimension, and $l$ is the input sequence length for the model. Equation (6) performs feature-wise Min–Max normalization, where $x_{f,t}$ is the original value of feature $f$ at time step $t$, and $x_{f,\min}$ and $x_{f,\max}$ are the minimum and maximum values of feature $f$ across all time steps. Equations (7) and (8) illustrate the zero-padding process, where a matrix $P$ of length $l-1$ is prepended to the original data matrix $X$ to ensure temporal alignment when using a stride of 1.
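Equations (6)–(8) amount to feature-wise min–max scaling followed by prepending $l-1$ zero columns, so that a stride-1 sliding window of length $l$ yields one prediction per original time step. A brief sketch under those assumptions (dimensions are illustrative):

```python
import numpy as np

def normalize_and_pad(X: np.ndarray, seq_len: int) -> np.ndarray:
    """X: (features, time) TIR sequence. Returns the min-max-normalized sequence
    with seq_len - 1 zero columns prepended (Equations (6)-(8))."""
    x_min = X.min(axis=1, keepdims=True)
    x_max = X.max(axis=1, keepdims=True)
    X_norm = (X - x_min) / (x_max - x_min + 1e-12)      # Equation (6)
    pad = np.zeros((X.shape[0], seq_len - 1))            # matrix P with |P| = l - 1
    return np.concatenate([pad, X_norm], axis=1)         # [P | X]

X = np.random.rand(220, 123)            # 220 features, 123 time steps (Light 4 scale)
X_in = normalize_and_pad(X, seq_len=40)
print(X_in.shape)                        # (220, 162): 39 padded + 123 original columns
```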
The flexible scheme is based on a dynamic adjustment strategy that adjusts the feature dimension of the TIR-sequences according to the number of windows $n_w$ and the decomposition depth $m$ described in Section 2; the input dimension and internal parameters of TFCR are reduced accordingly to adapt to changes in the input. For example, Table 1 shows four distinct schemes of this approach with their corresponding parameters. These schemes can be divided into large and light models and represent only a subset of possible configurations.

4.2. Physics-Based Kalman Filtering for RUL Refinement

The raw network outputs often exhibit fluctuations, particularly for long sequence outputs, which the filter would otherwise misinterpret as observation noise. Therefore, trend windows $F_t$ are employed to compute averaged RUL values as observation inputs for the Kalman filter. Simultaneously, all trend values within the smoothing window $F_s$ are compared against an anomaly threshold $C_o$ using the Z-score in Equation (9). When an anomaly is detected (i.e., $Z_k > C_o$), the system adaptively increases the observation variance to reduce the Kalman gain, while decreasing the trust parameter so that the filtering result stays closer to historical states (Equation (10)). Finally, the adaptive filtering output is generated through a trust-weighted combination (Equation (11)), and RUL physical constraints are applied to ensure the rationality of the predictions.
$$ Z_k = \frac{\left| z_k^{t} - \mathrm{MD}(H_k) \right|}{\mathrm{MAD}(H_k) \times 1.4826} \qquad (9) $$
$$ \alpha_k = C_t \times \left( 1 + \min\!\left( \frac{Z_k}{C_o},\, 3.0 \right) \right)^{-1}, \qquad R_k = C_m \times \left( 1 + \min\!\left( \frac{Z_k}{C_o},\, 3.0 \right) \right) \qquad (10) $$
$$ \hat{x}_k = \alpha_k \cdot \hat{x}_k^{\,raw} + (1 - \alpha_k) \cdot \hat{x}_{k-1} \qquad (11) $$
where $z_k^{t}$ represents the trend value at time $k$ obtained by averaging the DL predictions within the trend window, $H_k$ denotes the historical trend values within the smoothing window, $\mathrm{MD}(\cdot)$ and $\mathrm{MAD}(\cdot)$ denote the median and the median absolute deviation, $\hat{x}_k^{\,raw}$ is the standard Kalman filter output, and $\hat{x}_{k-1}$ is the previous filtering result. The scaling factor of 1.4826 and the saturation threshold of 3.0 are adopted from robust statistical practice to provide meaningful parameter ranges, with all final parameter values optimized through a systematic grid search on validation data rather than theoretical derivation.
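Equations (9)–(11) can be sketched as the simplified scalar filter below. This is a hedged illustration under stated assumptions (a random-walk state model, the parameter names $C_p$, $C_m$, $C_t$, $C_o$ from Table 4, and illustrative trend/smoothing window sizes); it is not the authors' exact implementation, and the decay factor $F_d$ is omitted for brevity.

```python
import numpy as np

def robust_zscore(trend_value: float, history: np.ndarray) -> float:
    """Equation (9): MAD-based Z-score of the current trend value."""
    med = np.median(history)
    mad = np.median(np.abs(history - med)) * 1.4826
    return abs(trend_value - med) / (mad + 1e-12)

def adaptive_kalman_step(z_t, history, x_prev, p_prev,
                         C_p=0.05, C_m=0.0001, C_t=0.95, C_o=5.0):
    """One refinement step: adapt R and the trust factor from the Z-score
    (Equation (10)), run a scalar Kalman update, blend with the previous state
    (Equation (11)), and clip to the physically valid (normalized) RUL range."""
    Z = robust_zscore(z_t, history)
    scale = 1.0 + min(Z / C_o, 3.0)
    alpha = C_t / scale                       # lower trust when the trend is anomalous
    R = C_m * scale                           # inflate the observation variance

    p_pred = p_prev + C_p                     # random-walk predict step
    K = p_pred / (p_pred + R)                 # Kalman gain
    x_raw = x_prev + K * (z_t - x_prev)       # standard update
    p_new = (1.0 - K) * p_pred

    x_new = alpha * x_raw + (1.0 - alpha) * x_prev   # trust-weighted combination
    x_new = float(np.clip(x_new, 0.0, 1.0))          # RUL physical constraint
    return x_new, p_new

# Example: refine a noisy, normalized RUL trajectory produced by the network.
raw = np.clip(np.linspace(1, 0, 200) + 0.05 * np.random.randn(200), 0, 1)
x, p, refined = raw[0], 1.0, []
for k in range(1, 200):
    trend = raw[max(0, k - 5):k + 1].mean()   # trend window (F_t-like averaging)
    hist = np.array(refined[-15:] or [x])     # smoothing window (F_s-like history)
    x, p = adaptive_kalman_step(trend, hist, x, p)
    refined.append(x)
```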

5. Experimental Evaluation of the Proposed RUL Estimation Approach

5.1. Introduction to the XJTU-SY Dataset

The XJTU-SY bearing dataset contains vibration signals from bearings with various failure modes (inner/outer ring wear, ring fracture, and cage fracture). Data were collected at a 25.6 kHz sampling frequency with dual accelerometers until failure occurred (defined as the vibration amplitude reaching 10× the normal level). Table 2 shows the training, validation, and test sets used in this paper.

5.2. Training Strategy in Experiment

The training uses the AdamW optimizer with MSE loss, variable learning rates, and early stopping. Training runs for 100–800 epochs depending on dataset size, with the learning rate starting at $1 \times 10^{-4}$ ($1 \times 10^{-5}$ for condition 3) and reduced by 50% after 60 epochs without improvement. Regularization includes weight decay ($1 \times 10^{-5}$, $5 \times 10^{-5}$), gradient clipping (0.8, 1.0, 2.0), and Kaiming initialization [42] for the temporal layers. The implementation was conducted using PyTorch 2.1.0 on an Intel i7-12700 CPU (Intel Corp., Santa Clara, CA, USA) and an NVIDIA RTX 3070 Ti GPU (NVIDIA Corp., Santa Clara, CA, USA).
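For concreteness, this training setup can be expressed as the following PyTorch sketch. The network, data tensors, and epoch budget are placeholders (Light 4 input dimensions are used for illustration), and only one of the listed weight-decay and clipping values is shown; it is not the authors' training script.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder network and synthetic tensors; the real model and TIR sequences
# come from Sections 2-4 and Table 1.
model = nn.Sequential(nn.Conv1d(220, 60, 3, padding=1), nn.ReLU(),
                      nn.Conv1d(60, 1, 1))
train_loader = DataLoader(TensorDataset(torch.randn(64, 220, 40),
                                        torch.rand(64, 1, 40)), batch_size=16)

criterion = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-5)
# Halve the learning rate after 60 epochs without improvement.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=60)

for m in model.modules():                     # Kaiming initialization [42]
    if isinstance(m, nn.Conv1d):
        nn.init.kaiming_normal_(m.weight, nonlinearity='relu')

for epoch in range(100):                      # 100-800 epochs depending on dataset size
    epoch_loss = 0.0
    for x, y in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)
```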
The synthetic time-series data preserve temporal characteristics based on the linear degradation model from Section 2. The synthetic component generates new RUL values with small perturbations, then uses linear interpolation to map RUL to features. Three perturbations ensure uniqueness: (1) horizontal perturbations based on inter-bearing variation, (2) vertical perturbations proportional to feature standard deviation, and (3) degradation perturbations across degradation stages. Synthetic samples use 50–70% of the original bearing data length. To validate the quality of the synthetic data, a comprehensive similarity analysis was performed between training, testing, and validation datasets using working conditions 1 and 2 as representative examples. Figure 5 presents the comparative analysis of dataset similarity on normalized features using multiple metrics, including statistical moments, quantiles, distribution shapes, KS-statistic-based similarity scores, and correlation structures, with averaged similarity scores for evaluation.
$$ y = y_1 + \frac{x - x_1}{x_2 - x_1}\,(y_2 - y_1) + \Delta_{\mathrm{horizontal},j} + \Delta_{\mathrm{vertical},j} + \Delta_{\mathrm{degradation},j} \qquad (12) $$
where $x$ is the newly generated RUL value, $x_1$ and $x_2$ are adjacent RUL values from the original data ($x_1 \leq x \leq x_2$), and $y_1$ and $y_2$ are the corresponding feature values. The three perturbations are as follows: $\Delta_{\mathrm{horizontal},j} \sim \mathcal{N}(0, \sigma_{\mathrm{inter},j})$ represents inter-bearing differences, where $\sigma_{\mathrm{inter},j}$ is the standard deviation of the means of feature $j$ across all training bearings; this fixed offset simulates the differences in characteristics between bearings. $\Delta_{\mathrm{vertical},j} \sim \mathcal{N}(0, \sigma_{\mathrm{feature},j} \cdot r_{\mathrm{noise}})$ represents measurement variations, where $\sigma_{\mathrm{feature},j}$ is the internal variability of the selected bearing for feature $j$. $\Delta_{\mathrm{degradation},j} \sim \mathcal{N}\!\left(0,\ \sigma_{\mathrm{feature},j} \cdot \frac{RUL_{\max} - RUL_{\mathrm{current}}}{RUL_{\max} - RUL_{\min}}\right)$ represents degradation-dependent noise that increases as the RUL decreases, simulating signal instability near failure.
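A simplified single-feature sketch of this interpolation-plus-perturbation scheme is shown below. The sigma values, noise ratio r_noise, and toy degradation trend are illustrative assumptions, not the values used in the paper.

```python
import numpy as np

def synth_feature(rul_new, rul_orig, feat_orig, sigma_inter, sigma_feat,
                  r_noise=0.05, rng=None):
    """Equation (12) for one feature j: linear interpolation between adjacent
    original (RUL, feature) points plus horizontal (per-bearing), vertical
    (measurement), and degradation-dependent perturbations."""
    rng = rng or np.random.default_rng(0)
    base = np.interp(rul_new, rul_orig, feat_orig)           # interpolation term
    d_h = rng.normal(0.0, sigma_inter)                        # fixed inter-bearing offset
    d_v = rng.normal(0.0, sigma_feat * r_noise, size=rul_new.shape)
    severity = (rul_new.max() - rul_new) / (rul_new.max() - rul_new.min() + 1e-12)
    d_deg = rng.normal(0.0, 1.0, size=rul_new.shape) * sigma_feat * severity  # grows near failure
    return base + d_h + d_v + d_deg

# Original (RUL, feature) pairs for one bearing; RUL is ascending for np.interp.
rul_orig = np.linspace(0.0, 1.0, 100)
feat_orig = 1.0 - rul_orig                                    # toy monotonic degradation trend
rul_new = np.sort(np.random.uniform(0.0, 1.0, 60))            # ~60% of the original length
synthetic = synth_feature(rul_new, rul_orig, feat_orig,
                          sigma_inter=0.03, sigma_feat=float(feat_orig.std()))
```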

5.3. TFCR–Kalman Filter for RUL Estimation of TIR Sequences

Table 3 shows the average experimental results of the various models at their optimal sequence lengths on the test set. The evaluation metrics include RMSE, Score (as defined by the PHM 2012 Challenge), Time (average processing time per sample in ms), and Size (model size in MB). Although the models demonstrate similar overall average prediction results, each excels under its own advantageous conditions and scenarios. Figure 6, Figure 7 and Figure 8 show the prediction curves of each model under different conditions, where the black line represents the actual RUL values.
The objective function adapts to the sequence length: short sequences prioritize MSE accuracy ($F_t = 1$), while long sequences minimize center-deviation penalties with distance-based penalty scaling. Table 4 presents the optimal parameters identified through grid search and the corresponding ablation study results for the Kalman filter configuration and performance evaluation.
The proposed model is evaluated in terms of computational speed and accuracy. Table 5 compares the computational efficiency of the proposed TFCR model with established sequential models from the NLP domain, including GRU [43], LSTM [44], and Transformer [45], under identical experimental conditions (as specified in Table 1). The average computational time is normalized by sequence length to enable a fair comparison across different input sizes. Table 6 compares the prediction accuracy of the proposed model with that of other studies on the same dataset, demonstrating its effectiveness.

5.4. Results

The experimental results demonstrate that the proposed TFCR method successfully maps TIR sequences to RUL, with the physical Kalman filtering integration providing consistent performance improvements across all variants. The four TFCR configurations achieved baseline RMSE values of 0.0884, 0.0953, 0.1023, and 0.1135, respectively. After physical Kalman filter integration, these improved to 0.0767, 0.0827, 0.0894, and 0.0946, representing RMSE reductions of 13.24%, 13.22%, 12.61%, and 16.65%, respectively. Similarly, MAE improvements ranged from 9.3% to 21.4% (9.3%, 12.6%, 21.4%, and 14.2% for schemes 1–4), while prediction scores improved by 8.8%, 7.4%, 6.7%, and 12.5%, respectively.

6. Discussion

The experimental validation confirms the feasibility of the proposed hybrid method. Although our best-performing variant (TFCR-KF 1, RMSE 0.0767) lags behind the leading approach (TCN-RSA, 0.0699; see Table 6), our hybrid method provides superior interpretability and ensures the physical consistency of the RUL estimates.
At the same time, our method has notable limitations. First, the TIR-sequence extraction process requires extensive data preprocessing, substantially increasing system complexity. Second, our TFCR network faces bottlenecks in feature representation and RUL mapping capability, particularly in frequency-domain processing. While our approach employs windowed EMD with temporal feature extraction, leading methods utilize more sophisticated frequency analysis techniques. For instance, advanced approaches such as TCN-RSA employ marginal spectrum analysis from the Hilbert–Huang transform, which provides richer frequency-domain representations than our basic windowed EMD approach. Similarly, other state-of-the-art methods incorporate advanced spectral techniques such as adaptive frequency decomposition, multi-scale wavelet analysis, and frequency-specific attention mechanisms, whereas our method relies on relatively simple temporal feature extraction from EMD components. Third, while Kalman filtering provides physical constraints, it also introduces additional parameter tuning requirements that may affect the method's generalization capability.
Inspired by the foundational role of pre-training in large language models, future work could explore using TFCR as a temporal autoencoder component to perform pre-training dimensionality reduction on high-dimensional TIR sequences, which could then serve as input for downstream time-series models (such as TimesNet, Informer, etc.) to further enhance performance and efficiency.

Author Contributions

Conceptualization, L.L. and Z.W.; methodology, L.L.; validation, L.L. and Z.W.; formal analysis, L.L. and M.C.; investigation, L.L.; resources, C.G., E.T. and L.C.; writing—original draft preparation, L.L.; writing—review and editing, Z.W.; supervision, Z.W.; project administration, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 51975386), Liaoning Province “Unveiling and Commanding” Technology Projects (Grant No. 2022020630-JH1/108), and the Science and Technology Research and Development Program of China National Railway Group Corporation (Grant No. N2022J014). The APC was funded by the National Natural Science Foundation of China (Grant No. 51975386).

Data Availability Statement

This study uses publicly available benchmark datasets including the XJTU-SY bearing dataset (available at https://biaowang.tech/xjtu-sy-bearing-datasets/, accessed on 31 August 2025). No new datasets were generated during this study.

Acknowledgments

The authors would like to thank all of the collaborators for their valuable contributions to this research.

Conflicts of Interest

Authors Chao Ge, Enguo Tong and Liang Chen were employed by the company Ansteel Group Automation Co., Ltd. All authors declare no conflicts of interest.

References

  1. Wang, X.; Yang, L.T.; Xie, X.; Jin, J.; Deen, M.J. A Cloud-Edge Computing Framework for Cyber-Physical-Social Services. IEEE Commun. Mag. 2017, 55, 80–85. [Google Scholar] [CrossRef]
  2. Wang, X.; Yang, L.T.; Song, L.; Wang, H.; Ren, L.; Deen, M.J. A Tensor-Based Multiattributes Visual Feature Recognition Method for Industrial Intelligence. IEEE Trans. Ind. Inform. 2021, 17, 2231–2241. [Google Scholar] [CrossRef]
  3. Ren, L.; Meng, Z.; Wang, X.; Zhang, L.; Yang, L.T. A data-driven approach of product quality prediction for complex production systems. IEEE Trans. Ind. Inform. 2020, 17, 6457–6465. [Google Scholar] [CrossRef]
  4. Hamdan, S.; Ayyash, M.; Almajali, S. Edge-computing architectures for internet of things applications: A survey. Sensors 2020, 20, 6441. [Google Scholar] [CrossRef]
  5. Deng, S.; Zhao, H.; Fang, W.; Yin, J.; Dustdar, S.; Zomaya, A.Y. Edge intelligence: The confluence of edge computing and artificial intelligence. IEEE Internet Things J. 2020, 7, 7457–7469. [Google Scholar] [CrossRef]
  6. Compare, M.; Baraldi, P.; Zio, E. Challenges to IoT-enabled predictive maintenance for industry 4.0. IEEE Internet Things J. 2019, 7, 4585–4597. [Google Scholar] [CrossRef]
  7. Hamadache, M.; Jung, J.H.; Park, J.; Youn, B.D. A comprehensive review of artificial intelligence-based approaches for rolling element bearing PHM: Shallow and deep learning. JMST Adv. 2019, 1, 125–151. [Google Scholar] [CrossRef]
  8. Yin, Y.; Tian, J.; Liu, X. Remaining useful life prediction based on parallel multi-scale feature fusion network. J. Intell. Manuf. 2025, 36, 3111–3127. [Google Scholar] [CrossRef]
  9. Rai, A.; Upadhyay, S.H. A review on signal processing techniques utilized in the fault diagnosis of rolling element bearings. Tribol. Int. 2016, 96, 289–306. [Google Scholar] [CrossRef]
  10. Li, P.; Liu, X.; Yang, Y. Remaining useful life prognostics of bearings based on a novel spatial graph-temporal convolution network. Sensors 2021, 21, 4217. [Google Scholar] [CrossRef]
  11. Yang, W.; Wang, Z.; Ma, H.; Qiao, H.; Diao, N. Gear Fault Diagnosis Based on Parameter Optimization VMD and Kurtosis Criterion. Modul. Mach. Tool Autom. Manuf. Tech. 2023, 0, 13–16+21. [Google Scholar] [CrossRef]
  12. Zhao, H.; Liu, H.; Jin, Y.; Dang, X.; Deng, W. Feature extraction for data-driven remaining useful life prediction of rolling bearings. IEEE Trans. Instrum. Meas. 2021, 70, 1–10. [Google Scholar] [CrossRef]
  13. Tinga, T.; Loendersloot, R. Physical Model-Based Prognostics and Health Monitoring to Enable Predictive Maintenance. In Predictive Maintenance in Dynamic Systems: Advanced Methods, Decision Support Tools and Real-World Applications; Lughofer, E., Sayed-Mouchaweh, M., Eds.; Springer International Publishing: Cham, Switzerland, 2019; pp. 313–353. [Google Scholar] [CrossRef]
  14. Ramezani, S.; Moini, A.; Riahi, M. Prognostics and Health Management in Machinery: A Review of Methodologies for RUL prediction and Roadmap. Int. J. Ind. Eng. Manag. Sci. 2019, 6, 38–61. [Google Scholar]
  15. Wang, X.; Wang, T.; Ming, A.; Han, Q.; Chu, F.; Zhang, W.; Li, A. Deep spatiotemporal convolutional-neural-network-based remaining useful life estimation of bearings. Chin. J. Mech. Eng. 2021, 34, 62. [Google Scholar] [CrossRef]
  16. Alzubaidi, L.; Bai, J.; Al-Sabaawi, A.; Santamaría, J.; Albahri, A.S.; Al-dabbagh, B.S.N.; Fadhel, M.A.; Manoufali, M.; Zhang, J.; Al-Timemy, A.H.; et al. A survey on deep learning tools dealing with data scarcity: Definitions, challenges, solutions, tips, and applications. J. Big Data 2023, 10, 46. [Google Scholar] [CrossRef]
  17. Ye, Y.; Yong, Z.; Han, D. Research on key technology of industrial artificial intelligence and its application in predictive maintenance. Acta Autom. Sin. 2020, 46, 2013–2030. [Google Scholar]
  18. Lei, Y.; Li, N.; Guo, L.; Li, N.; Yan, T.; Lin, J. Machinery health prognostics: A systematic review from data acquisition to RUL prediction. Mech. Syst. Signal Process. 2018, 104, 799–834. [Google Scholar] [CrossRef]
  19. Li, W.; Zhang, L.C.; Wu, C.H.; Wang, Y.; Cui, Z.X.; Niu, C. A data-driven approach to RUL prediction of tools. Adv. Manuf. 2024, 12, 6–18. [Google Scholar] [CrossRef]
  20. Xia, M.; Zheng, X.; Imran, M.; Shoaib, M. Data-driven prognosis method using hybrid deep recurrent neural network. Appl. Soft Comput. 2020, 93, 106351. [Google Scholar] [CrossRef]
  21. Wang, B.; Lei, Y.; Li, N.; Yan, T. Deep separable convolutional network for remaining useful life prediction of machinery. Mech. Syst. Signal Process. 2019, 134, 106330. [Google Scholar] [CrossRef]
  22. Chen, Y.; Peng, G.; Zhu, Z.; Li, S. A novel deep learning method based on attention mechanism for bearing remaining useful life prediction. Appl. Soft Comput. 2020, 86, 105919. [Google Scholar] [CrossRef]
  23. Guo, L.; Li, N.; Jia, F.; Lei, Y.; Lin, J. A recurrent neural network based health indicator for remaining useful life prediction of bearings. Neurocomputing 2017, 240, 98–109. [Google Scholar] [CrossRef]
  24. Zhou, Z.; Liu, L.; Song, X.; Chen, K. Remaining useful life prediction method of rolling bearing based on Transformer model. J. Beijing Univ. Aeronaut. Astronaut. 2021, 49, 430–443. [Google Scholar] [CrossRef]
  25. Cao, Y.; Ding, Y.; Jia, M.; Tian, R. A novel temporal convolutional network with residual self-attention mechanism for remaining useful life prediction of rolling bearings. Reliab. Eng. Syst. Saf. 2021, 215, 107813. [Google Scholar] [CrossRef]
  26. Liu, Q.; Dai, Z.; Chen, P.; Lai, H.; Liang, Y.; Chen, M.; Xu, X.; Hou, M.; Wang, G. Remaining useful life prediction of rolling bearings based on TCN-LSTM. In Proceedings of the 14th International Conference on Quality, Reliability, Risk, Maintenance, and Safety Engineering (QR2MSE 2024), Harbin, China, 24–27 July 2024; Volume 2024, pp. 1016–1020. [Google Scholar] [CrossRef]
  27. Zhao, D.; Li, J.; Cheng, W.; He, Z. Generalized demodulation transform for bearing fault diagnosis under nonstationary conditions and gear noise interferences. Chin. J. Mech. Eng. 2019, 32, 7. [Google Scholar] [CrossRef]
  28. Lu, W.; Wang, Y.; Zhang, M.; Gu, J. Physics guided neural network: Remaining useful life prediction of rolling bearings using long short-term memory network through dynamic weighting of degradation process. Eng. Appl. Artif. Intell. 2024, 127, 107350. [Google Scholar] [CrossRef]
  29. Yang, S.; Tang, B.; Wang, W.; Yang, Q.; Hu, C. Physics-informed multi-state temporal frequency network for RUL prediction of rolling bearings. Reliab. Eng. Syst. Saf. 2024, 242, 109716. [Google Scholar] [CrossRef]
  30. Hu, Y.; Chao, Q.; Xia, P.; Liu, C. Remaining useful life prediction using physics-informed neural network with self-attention mechanism and deep separable convolutional network. J. Adv. Manuf. Sci. Technol. 2024, 4, 2024018. [Google Scholar] [CrossRef]
  31. Merenda, M.; Porcaro, C.; Iero, D. Edge machine learning for ai-enabled iot devices: A review. Sensors 2020, 20, 2533. [Google Scholar] [CrossRef]
  32. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. Ser. A Math. Phys. Eng. Sci. 1998, 454, 903–995. [Google Scholar] [CrossRef]
  33. Su, B.; Sun, Y. Intelligent Prediction of Bearing Remaining Useful Life Based on Data Enhancement and Adaptive Temporal Convolutional Networks. J. Fail. Anal. Prev. 2023, 23, 2709–2720. [Google Scholar] [CrossRef]
  34. Meng, Z.; Xie, Y.; Sun, J. Short-term load forecasting using neural attention model based on EMD. Electr. Eng. 2022, 104, 1857–1866. [Google Scholar] [CrossRef]
  35. Li, G.; Tian, T.; Hao, F.; Yuan, Z.; Tang, R.; Liu, X. Day-Ahead Photovoltaic Power Forecasting Using Empirical Mode Decomposition Based on Similarity-Day Extension Without Information Leakage. Arab. J. Sci. Eng. 2024, 49, 6941–6957. [Google Scholar] [CrossRef]
  36. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar] [CrossRef]
  37. Li, X.; Ding, Q.; Sun, J.Q. Remaining useful life estimation in prognostics using deep convolution neural networks. Reliab. Eng. Syst. Saf. 2018, 172, 1–11. [Google Scholar] [CrossRef]
  38. Ren, L.; Liu, Y.; Wang, X.; Lü, J.; Deen, M.J. Cloud–edge-based lightweight temporal convolutional networks for remaining useful life prediction in iiot. IEEE Internet Things J. 2021, 8, 12578–12587. [Google Scholar] [CrossRef]
  39. Eknath, K.G.; Diwakar, G. Prediction of Remaining Useful Life of Rolling Bearing using Hybrid DCNN-BiGRU Model. J. Vib. Eng. Technol. 2023, 11, 997–1010. [Google Scholar] [CrossRef]
  40. Du, X.; Jia, W.; Yu, P.; Shi, Y.; Gong, B. RUL prediction based on GAM–CNN for rotating machinery. J. Braz. Soc. Mech. Sci. Eng. 2023, 45, 142. [Google Scholar] [CrossRef]
  41. Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. arXiv 2017, arXiv:1709.01507. [Google Scholar]
  42. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
  43. Cho, K.; van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv 2014, arXiv:1406.1078. [Google Scholar] [CrossRef]
  44. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  45. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
Figure 1. Rolling bearing temporal degradation model. (1) Signal windowing and EMD decomposition; (2) Feature extraction from IMFs and RES; (3) Chronological arrangement into TIR-sequence.
Figure 2. TFCR model output flowchart.
Figure 3. Temporal feature compression layers.
Figure 4. Channel weight redistribution layer.
Figure 5. Comparative analysis of similarity.
Figure 6. Comparison estimation in condition 1.
Figure 7. Comparison estimation in condition 2.
Figure 8. Comparison estimation in condition 3.
Table 1. TIR sequences input and network configurations.

Scheme  | n_i | n_w | n_s | Channel    | Layer | Dilation | Blocks
Large 1 | 6   | 8   | 2   | 1056 → 512 | a     | 1        | 2
        |     |     |     |            | b     | 2        | 2
        |     |     |     |            | c     | 1        | 1
Large 2 | 5   | 8   | 2   | 880 → 220  | a     | 1        | 2
        |     |     |     |            | b     | 2        | 2
        |     |     |     |            | c     | 1        | 1
Light 3 | 5   | 4   | 2   | 440 → 160  | a     | 1        | 1
        |     |     |     |            | b     | 2        | 2
        |     |     |     |            |       | 4        | 1
        |     |     |     |            | c     | 1        | 1
Light 4 | 5   | 2   | 2   | 220 → 60   | a     | 1        | 1
        |     |     |     |            | b     | 2        | 2
        |     |     |     |            |       | 4        | 1
        |     |     |     |            | c     | 1        | 1
Table 2. Dataset split for training, validation, and testing sets.

Condition No. | Load (kN) | Speed (rpm) | Train              | Validation | Test
1             | 12        | 2250        | 1_1, 1_2, 1_3, 1_4 | Synthetic  | 1_5
2             | 11        | 2500        | 2_1, 2_2, 2_3, 2_4 | Synthetic  | 2_5
3             | 10        | 2400        | 3_1, 3_2, 3_3      | Synthetic  | 3_4
Table 3. Performance comparison of models.

Scheme | RMSE (C1 / C2 / C3 / Avg.)    | Score (C1 / C2 / C3 / Avg.) | Time, ms (C1 / C2 / C3) | Size, MB | Best Length (C1 / C2 / C3)
1      | 0.073 / 0.081 / 0.112 / 0.088 | 0.81 / 0.53 / 0.12 / 0.49   | 0.105 / 0.076 / 0.056   | 24.89    | 18 / 40 / 40
2      | 0.084 / 0.107 / 0.095 / 0.095 | 0.82 / 0.40 / 0.14 / 0.45   | 0.121 / 0.104 / 0.054   | 10.77    | 20 / 56 / 48
3      | 0.091 / 0.102 / 0.114 / 0.102 | 0.73 / 0.38 / 0.15 / 0.42   | 0.092 / 0.065 / 0.041   | 3.80     | 20 / 40 / 40
4      | 0.101 / 0.105 / 0.132 / 0.113 | 0.70 / 0.42 / 0.11 / 0.40   | 0.065 / 0.058 / 0.029   | 0.98     | 16 / 48 / 40
Table 4. Kalman filter configuration and performance summary.

Parameter                   | Scheme 1               | Scheme 2               | Scheme 3               | Scheme 4
Core Parameters (C1, C2, C3)
Process Variance (C_p)      | 0.05, 0.05, 0.001      | 0.05, 0.0005, 0.0005   | 0.02, 0.001, 0.005     | 0.05, 0.02, 0.0005
Meas. Variance (C_m)        | 0.0001, 0.0001, 0.005  | 0.0001, 0.0001, 0.02   | 0.0001, 0.05, 0.02     | 0.0001, 0.0005, 0.05
Trust Factor (C_t)          | 0.95, 0.6, 0.85        | 0.9, 0.6, 0.9          | 0.8, 0.6, 0.95         | 0.6, 0.6, 0.98
Outlier Sensitivity (C_o)   | 5.0, 5.0, 4.0          | 5.5, 2.5, 3.0          | 2.0, 3.0, 3.5          | 2.5, 3.0, 2.5
Filter Parameters (C1, C2, C3)
Smoothing Window (F_s)      | 15, 3, 19              | 15, 3, 19              | 3, 2, 13               | 9, 2, 19
Decay Factor (F_d)          | 0.998, 0.9995, 0.9999  | 0.998, 0.9998, 0.9995  | 0.995, 0.9998, 0.9995  | 0.999, 0.998, 0.9995
Trend Window (F_t)          | 0, 4, 5                | 0, 3, 11               | 0, 2, 11               | 0, 3, 11
Performance Changes (C1, C2, C3)
RMSE (↓)                    | 5.9, 12.6, 14.3        | 4.3, 8.2, 30.3         | 15.4, 19.5, 29.2       | 24.3, 6.5, 13.5
Score (↑)                   | 2.5, 14.0, 9.9         | 0.8, 7.9, 13.4         | 7.2, 9.3, 3.7          | 7.5, 15.4, 14.6
MAE (↓)                     | 5.6, 11.9, 10.4        | 5.1, 3.2, 29.5         | 15.1, 29.4, 19.9       | 25.3, 9.2, 8.2
Table 5. Computational speed comparison.

Model       | Time per Sample/Length (ms)
GRU         | 0.0101
LSTM        | 0.0120
Transformer | 0.0153
TFCR        | 0.0067
Table 6. Prediction accuracy (RMSE) comparison.

Model              | RMSE
DSCN               | 0.0739
RCNN               | 0.0803
RVM                | 0.1082
MTCN               | 0.0732
Transformer        | 0.0701
PGLSTM             | 0.0866
TCN-RSA            | 0.0699
Proposed TFCR 1    | 0.0884
Proposed TFCR 2    | 0.0953
Proposed TFCR 3    | 0.1023
Proposed TFCR 4    | 0.1135
Proposed TFCR-KF 1 | 0.0767
Proposed TFCR-KF 2 | 0.0827
Proposed TFCR-KF 3 | 0.0894
Proposed TFCR-KF 4 | 0.0946
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
