Remaining Useful Life Prediction of Rolling Bearings Based on Deep Time–Frequency Synergistic Memory Neural Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this manuscript, the authors tried to use a predictive framework that combines dynamic weighting mechanisms with hybrid deep learning. This framework incorporates continuous wavelet transform to generate two-dimensional time-frequency feature maps as degradation indicators, employs CNN for extracting local detailed features, integrates iTransformer modules with dynamic weighting mechanisms to enhance the focus on early subtle features, and leverages the time-dependent modeling capabilities of BiLSTM.
To the reviewer's point of view, the topic is interesting, and the results are promising. However, the following comments must be addressed before recommending it for publication:
(1) The state-of-the-art must be emphasized in the introduction.
(2) Several equations need at least one reference. Please mention at least one reference for each.
(3) In many places in the manuscript, the reference is missing and I encountered "Error! Reference resource not found", like lines 142, 190, 249, 276, 277, etc. It is really not good. Please correct these inappropriate mistakes.
(4) Section 6 is called Patents! Do the authors think that it is a really good choice for this section?
Comments on the Quality of English Language
The quality of English is good.
Author Response
Response to Reviewer #1
We express our sincere thanks to you and the reviewers for valuable comments on our manuscript. Those comments are very helpful for us to revise and improve our paper. According to these comments, we have made extensive and careful revisions one by one and would like to resubmit it for your reconsideration. The major revised portions are highlighted in red color in the revised paper. The details are shown in the following point-by-point.
Reviewer’s comments:
In this manuscript, the authors tried to use a predictive framework that combines dynamic weighting mechanisms with hybrid deep learning. This framework incorporates continuous wavelet transform to generate two-dimensional time-frequency feature maps as degradation indicators, employs CNN for extracting local detailed features, integrates iTransformer modules with dynamic weighting mechanisms to enhance the focus on early subtle features, and leverages the time-dependent modeling capabilities of BiLSTM.
To the reviewer's point of view, the topic is interesting, and the results are promising. However, the following comments must be addressed before recommending it for publication:
- The state-of-the-art must be emphasized in the introduction.
Reply: Thank you very much for the reviewer’s constructive and helpful suggestion. State-of-the-art technologies have been supplemented in the introduction. The highlighted details are as follows:
Line 83-91, in Section 1, Page 2:
Wang et al. [22] proposed a temporal graph convolution model, and Cao et al. [23] introduced an equidistant feature mapping TCN model, both aiming to enhance feature correlation through spatial topology modeling. Zheng et al. [24] developed a deep reinforcement learning (DRL)-integrated framework to mitigate RUL prediction instability stemming from overlooked temporal dependencies in conventional deep learning approaches, employing an autoencoder for degradation feature extraction coupled with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to establish temporal state dependencies.
Corresponding references, Line 506-511:
- Wang, Y.P.; Xu, Z.S.; Zhao, S.T.; et al. Performance degradation prediction of rolling bearing based on temporal graph convolutional neural network. J. Mech. Sci. Technol. 2024, 38, 4019–4036. DOI: 10.1007/s12206-024-0702-z.
- Cao, X.G.; Zhang, F.Q.; Zhao, J.B.; et al. Remaining useful life prediction of rolling bearing based on multi-domain mixed features and temporal convolutional networks. Appl. Sci. 2024, 14, 2354. DOI: 10.3390/app14062354.
- Zheng, G.K.; Li, Y.S.; Zhou, Z.; et al. A remaining useful life prediction method of rolling bearings based on deep reinforcement learning. IEEE Internet Things J. 2024, 11, 22938–22949. DOI: 10.1109/JIOT.2024.3363610.
- Several equations need at least one reference. Please mention at least one reference for each.
Reply: Thank you for your valuable suggestion. We have carefully added appropriate references to support the equations in our manuscript. Specifically, we have cited Ref. 25 for equation (3), Ref. 27 for equations (4)-(9) and Ref. 28 for equations (12)-(15). These references have been added to provide proper attribution and strengthen the theoretical foundation of our work.
- In many places in the manuscript, the reference is missing and I encountered "Error! Reference resource not found", like lines 142, 190, 249, 276, 277, etc. It is really not good. Please correct these inappropriate mistakes.
Reply: Thanks a lot for the reviewer’s professional comments. We carefully checked each reference and confirmed its source could be found.
- Section 6 is called Patents! Do the authors think that it is a really good choice for this section?
Reply: Thanks for the reviewer’s professional suggestion. Section 6 has been deleted.
Reviewer 2 Report
Comments and Suggestions for Authors
The present manuscript introduces a Machine Learning algorithm based on Deep Learning to predict RUL of rolling element bearings. The method presents certain novelty and is well performed. Nevertheless, it should be improved prior to publication:
1- Citing 5 references at a time (line 54) is not a well recommended scientific practice.
2- Why did you utilize CWT and not other linear and computationally lighter time-frequency representations such as STFT?
3 - It would be useful to further introduce the utilized data by providing some experimental setup details. What do you mean by horizontal and vertical?
4 - Why is the WMA smoothing required?
5 - Which are the physical features of the vibration signals indicating degradation and thus, affecting the RUL? Explain from an engineering standpoint.
6 - Figure 5. How do you define the frequency and time limits? Is the time defined in seconds?
7 - Is the time of color map (thermometer color map) relevant for the results?
8 - Figure 6. What are the blue and red lines meaning? And the tag? Attention must be paid to all figure tags and axis.
Author Response
Response to Reviewers
We express our sincere thanks to you for valuable comments on our manuscript. Those comments are very helpful for us to revise and improve our paper. According to these comments, we have made extensive and careful revisions one by one and would like to resubmit it for your reconsideration. The major revised portions are highlighted in red color in the revised paper. The details are shown in the following point-by-point.
Reviewer’s comments:
The present manuscript introduces a Machine Learning algorithm based on Deep Learning to predict RUL of rolling element bearings. The method presents certain novelty and is well performed. Nevertheless, it should be improved prior to publication:
- Citing 5 references at a time (line 54) is not a well recommended scientific practice.
Reply: Thank you very much for the reviewer’s constructive and helpful suggestion. We have modified the text and now present an expanded overview of the references cited at line 54 of the original manuscript. The highlighted details are as follows:
Line 62-63, in Section 1, Page 2:
Deep learning technology has enabled data-driven feature extraction methods to demonstrate significant advantages [8-10].
Line 86-91, in Section 1, Page 2:
Zheng et al. [24] developed a deep reinforcement learning (DRL)-integrated framework to mitigate RUL prediction instability stemming from overlooked temporal dependencies in conventional deep learning approaches, employing an autoencoder for degradation feature extraction coupled with the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm to establish temporal state dependencies. (Ref. 10 in original manuscript)
- Why did you utilize CWT and not other linear and computationally lighter time-frequency representations such as STFT?
Reply: Thank you very much for your attention to the selection of time-frequency analysis methods. It is true that the computational complexity of CWT, O(N log N), is higher than the O(N) of STFT. The main reasons for using the continuous wavelet transform (CWT) rather than STFT are as follows:
- Non-stationary signal adaptability: the vibration signal has transient impact and nonlinear characteristics (such as transient pulse caused by local defects of bearings). The variable scale characteristics of CWT can provide more refined temporal resolution in high frequency band and retain frequency resolution in low frequency band, which is crucial for early weak fault feature extraction.
- Edge effect suppression: The long fixed window of STFT causes insufficient high-frequency time resolution, while CWT effectively suppresses the spectrum leakage through the exponential attenuation characteristics of Morlet wavelets.
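The variable-scale behaviour described in this reply can be illustrated with a minimal sketch. It is not the authors' implementation: it is a direct-summation CWT with a complex Morlet wavelet in pure Python, and the centre frequency `w0 = 6`, the 4-scale-width kernel truncation, and the helper names `morlet`, `scale_for_frequency`, and `cwt_magnitude` are all assumptions made for illustration.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (normalisation constant omitted)."""
    return cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def scale_for_frequency(freq_hz, fs_hz, w0=6.0):
    """Scale (in samples) whose Morlet pseudo-frequency equals freq_hz."""
    return w0 * fs_hz / (2.0 * math.pi * freq_hz)


def cwt_magnitude(signal, scales, w0=6.0):
    """Return |W(a, b)| as one row per scale, by direct summation.

    W(a, b) = a**-0.5 * sum_n x[n] * conj(psi((n - b) / a)).
    Small scales use short kernels (fine time resolution); large scales
    use long kernels (fine frequency resolution).
    """
    n = len(signal)
    rows = []
    for a in scales:
        half = int(4 * a)  # assumed cut-off: envelope is negligible beyond 4a
        kernel = [morlet(k / a, w0).conjugate() / math.sqrt(a)
                  for k in range(-half, half + 1)]
        row = []
        for b in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = b + j - half
                if 0 <= idx < n:  # samples outside the record count as zero
                    acc += signal[idx] * w
            row.append(abs(acc))
        rows.append(row)
    return rows
```

Applied to a slow sinusoid with a short high-frequency burst superimposed, the small-scale row of such a transform localises the burst in time while the large-scale row tracks the slow component, which is the property the reply appeals to for early weak-fault features.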
- It would be useful to further introduce the utilized data by providing some experimental setup details. What do you mean by horizontal and vertical?
Reply: Thank you for pointing out the shortcomings of the experimental description. We will add the following experimental details in the revision:
Line 284-291, in Subsection 3.1, Page 8:
The horizontal signal represents the vibration measurements in the radial direction of the bearing, exhibiting high sensitivity to radial loads and radial faults (e.g., outer ring defects and rolling element failures). Conversely, the vertical signal corresponds to the vibration measurements in the axial direction of the bearing, demonstrating significant sensitivity to axial loads and axial faults (e.g., inner ring defects and misalignment issues). The test unit collected a 0.1s vibration signal every 10s with a sampling frequency of 25.6 kHz. A total of 17 sets of bearing vibration data were collected under three different load and speed conditions as detailed in Table 1.
- Why is the WMA smoothing required?
Reply: Due to uncertainties in sensor measurements and actual operational processes, the data in this study may contain short-term fluctuations or noise. These random factors could obscure long-term trends or periodic patterns in the data. Smoothing processing is essential to effectively reduce noise interference and highlight the core variation patterns of the data, thereby providing a more robust foundation for subsequent analyses. The following explanations have been added to the revised manuscript. Thanks a lot for the reviewer’s professional comments.
Line 305-311, in Subsection 3.2, Page 9:
WMA assigns higher weights to recent data. This design allows WMA to sensitively capture the latest changes while maintaining reasonable utilization of historical information. Such a characteristic is particularly suitable for our focus on early degradation characteristics in this study. Furthermore, by adjusting the window size and weight distribution, WMA achieves a balance between noise suppression and effective signal preservation, thereby avoiding the loss of critical information caused by oversmoothing.
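As a hedged illustration of the smoothing described in this passage, the sketch below implements a causal weighted moving average with linearly increasing weights, so the newest sample in each window counts most. The linear weight profile, the default window of 5, and the function name are assumptions; the response does not state the weights or window size actually used.

```python
def weighted_moving_average(values, window=5):
    """Causal WMA: weights 1..k over the last k <= window samples.

    The most recent sample receives the largest weight. At the start of
    the series, where fewer than `window` samples exist, the window is
    shortened rather than padded.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        weights = range(1, len(chunk) + 1)  # oldest -> 1, newest -> len(chunk)
        total = sum(w * v for w, v in zip(weights, chunk))
        smoothed.append(total / sum(weights))
    return smoothed
```

A larger window suppresses more noise but delays the response to a real trend change, which is the trade-off the passage describes between noise suppression and signal preservation.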
- Which are the physical features of the vibration signals indicating degradation and thus, affecting the RUL? Explain from an engineering standpoint.
Reply: Thank you for the insightful question. From an engineering standpoint, we present the following information in the revised manuscript:
Line 49-58, in Section 1, Page 2:
The vibration amplitude follows a nonlinear progression during wear development - exhibiting gradual accumulation during initial stages followed by accelerated growth as damage propagates. Characteristic fault frequencies (such as raceway defect frequencies) and their associated harmonics/sidebands show progressive amplification with advancing degradation. The enlargement of localized defects enhances modulation phenomena, leading to increasingly intricate spectral compositions. These inherent properties make vibration analysis particularly valuable for RUL prediction in rolling bearing applications. However, due to the strongly nonlinear and time-varying relationships between these features and RUL, conventional physical models struggle to characterize dynamic degradation mechanisms under multi-fault coupling.
- Figure 5. How do you define the frequency and time limits? Is the time defined in seconds?
Reply: Thank you for raising this critical question. Here is a detailed clarification: Each subplot in Figure 5 displays a time-scale representation. The horizontal axis originally represented the time scale (dimensionless), corresponding to the wavelet translation parameter. The vertical axis represented the frequency scale (dimensionless), mapped to the wavelet dilation parameter. Based on the source data sampling frequency of 25.6 kHz, the total duration covered by the window is approximately 5 milliseconds. The figure with revised axes has been updated in the revised manuscript.
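The arithmetic behind these axis conversions can be sketched as follows. Only the 25.6 kHz sampling frequency and the 0.1 s snapshot length come from the responses; the 128-sample window (which would correspond to the quoted 5 ms), the Morlet centre frequency `w0 = 6`, and the function names are assumptions for illustration.

```python
import math

FS_HZ = 25_600      # sampling frequency stated in the response
SNAPSHOT_S = 0.1    # length of one recorded snapshot stated in the response

# Number of samples in one 0.1 s snapshot.
samples_per_snapshot = round(FS_HZ * SNAPSHOT_S)


def window_duration_ms(n_samples, fs_hz=FS_HZ):
    """Physical duration of n_samples translation steps, in milliseconds."""
    return 1000.0 * n_samples / fs_hz


def scale_to_frequency_hz(scale, fs_hz=FS_HZ, w0=6.0):
    """Pseudo-frequency (Hz) of a Morlet wavelet at a scale given in samples.

    Assumes the common convention f = w0 * fs / (2 * pi * scale).
    """
    return w0 * fs_hz / (2.0 * math.pi * scale)
```

Under these assumptions, a dimensionless translation index converts to seconds by dividing by the sampling frequency, and a dimensionless scale converts to hertz through the wavelet's centre frequency, which is how the revised axes can be given physical units.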
- Is the time of color map (thermometer color map) relevant for the results?
Reply: The time axes in the thermometer color maps presented in Fig. strictly correspond to the time resolution of wavelet transform. This temporal dimension inherently reflects the dynamic evolution process of vibration signals within the time-frequency domain. By preserving complete time-scale information, we achieve precise detection of transient pulse event occurrences during mechanical degradation (e.g., impacts when rolling elements pass through defect zones) while monitoring the temporal evolution patterns of amplitude modulation and frequency components. Furthermore, the sensitivity of time-scale characteristics proves crucial for capturing early-stage weak degradation features and determining RUL prediction outcomes. This preservation ensures the retention of transient event localization capabilities and maintains the integrity of dynamic feature correlations.
- Figure 6. What are the blue and red lines meaning? And the tag? Attention must be paid to all figure tags and axis.
Reply: Thank you for pointing out the issue with the figure labeling. In response to this comment, we have revised Figure 6 with the following modifications and clarifications:
Curve Definitions:
Red Solid Line: True RUL value of the measured signal.
Blue Solid Line: RUL predicted value based on the proposed model.
Reviewer 3 Report
Comments and Suggestions for Authors
In this paper, the authors propose a remaining useful life prediction of rolling bearings based on deep time-frequency synergistic memory NN.
First, some Typos: “Multivriate” to replace by “Multivariate”
Remarks and suggestions
CWT and CNN are well known in the literature, so I think it is not necessary to give details about them in the paper.
I don’t see how the optimized inverted transformer impacted the proposed approach mathematically or/and physically. In the section 3, the authors do not explain this impact! More interpretations are needed.
In the section 2, there is a missing subsection 2.5 (to be added) explaining the generic structure of the complete algorithm (flowchart and pseudo-code), applied in section 3. Because when reading section 3, there is a lot of confusion due to the absence of such an algorithm.
Knowing the domain of prognosis and the calculation of RUL, I am surprised by the precision of the calculation of obtained RUL. How did the authors proceed in the exploration of the data (learning, testing, validation)?
Author Response
Response to Reviewers
We express our sincere thanks to you and the reviewers for valuable comments on our manuscript. Those comments are very helpful for us to revise and improve our paper. According to these comments, we have made extensive and careful revisions one by one and would like to resubmit it for your reconsideration. The major revised portions are highlighted in red color in the revised paper. The details are shown in the following point-by-point.
Reviewer’s comments:
In this paper, the authors propose a remaining useful life prediction of rolling bearings based on deep time-frequency synergistic memory NN.
- First, some Typos: “Multivriate” to replace by “Multivariate”
Reply: We sincerely appreciate the reviewer's constructive feedback. The term "Multivriate" has been corrected to "Multivariate" throughout the manuscript. The correction is specifically reflected in Section 2.4 and in Figure 3, where the term originally appeared.
Figure 3. Optimized iTransformer structure.
- CWT and CNN are well known in the literature, so I think it is not necessary to give details about them in the paper.
Reply: We appreciate the reviewer's valuable perspective on methodological exposition. While fully acknowledging the established status of CWT and CNN in technical literature, we strategically retained concise descriptions (approximately 20 lines) in Sections 2.1 and 2.2 to ensure methodological self-containment. This selective presentation specifically highlights their customized implementation in our framework: 1) CWT's time-frequency localization capability maintains simultaneous preservation of temporal dynamics and spectral signatures in vibration signals; 2) The multi-layer CNN architecture extracts hierarchical discriminative features to enhance sensitivity to incipient weak fault signatures. This technical synergy justifies the retained methodological exposition, as it directly supports the model's capability to detect early-stage degradation patterns critical for RUL prediction.
- I don’t see how the optimized inverted transformer impacted the proposed approach mathematically or/and physically. In the section 3, the authors do not explain this impact! More interpretations are needed.
Reply: Thanks a lot for the reviewer’s professional comments. We will add the following experimental details in the revision:
Line 345-350, in Section 3, Page 11:
The DWM employs Global Average Pooling (GAP) to extract global feature representations and utilizes a Multi-Layer Perceptron (MLP) to automatically generate channel-wise attention weights, which are adaptively optimized through back-propagation within the overall network framework. The weighted sequences are independently embedded into fixed-dimensional token representations before being fed into the iTransformer module for subsequent processing.
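The forward pass of such a dynamic weighting mechanism can be sketched in a few lines. This is a minimal illustration, not the authors' network: the two-layer MLP with a ReLU hidden layer and a sigmoid output gate follows the common squeeze-and-excitation pattern and is an assumption, as are all function names. The parameters are passed in as fixed lists; in the described model they would be learned by back-propagation.

```python
import math


def global_average_pool(features):
    """features: list of channels, each a list of values -> one mean per channel."""
    return [sum(channel) / len(channel) for channel in features]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def channel_weights(features, w1, b1, w2, b2):
    """GAP -> MLP -> one attention weight in (0, 1) per channel."""
    pooled = global_average_pool(features)
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]   # ReLU (assumed)
    logits = dense(hidden, w2, b2)
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]     # sigmoid gate (assumed)


def apply_weights(features, weights):
    """Rescale every channel by its attention weight."""
    return [[w * v for v in channel] for channel, w in zip(features, weights)]
```

The reweighted channels keep their original shape, so they can be embedded as tokens and passed to the iTransformer module as the passage describes.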
- In the section 2, there is a missing subsection 2.5 (to be added) explaining the generic structure of the complete algorithm (flowchart and pseudo-code), applied in section 3. Because when reading section 3, there is a lot of confusion due to the absence of such an algorithm.
Reply: Thank you very much for the reviewer’s constructive and helpful suggestion.
We have added this part and then revised the manuscript:
Text part:
2.5. Model composition
To address the long-term dependencies in rolling bearing vibration signals, this paper proposes a Time-Frequency Collaborative Dynamic Weighted Memory Network for RUL prediction. The model first employs a CNN to extract deep-level features from input 2D time-frequency (T-F) feature maps. These features are subsequently fed into a dynamic weighting mechanism layer to assign adaptive weights, followed by an iTransformer encoder to model global temporal dependencies. The enhanced features are then processed through a BiLSTM network to capture bidirectional long-term temporal relationships. Finally, the fused representations are transmitted to fully connected layers for bearing RUL prediction. The model framework is illustrated in Figure 4.
Figure 4. The model framework.
The algorithm of our proposed method is shown in Figure 5.
Figure 5. Pseudo-code of method steps
- Knowing the domain of prognosis and the calculation of RUL, I am surprised by the precision of the calculation of obtained RUL. How did the authors proceed in the exploration of the data (learning, testing, validation)?
Reply: We sincerely appreciate your insightful feedback and the opportunity to clarify our methodology. The concerns regarding data partitioning and RUL calculation are addressed as follows:
To rigorously evaluate the model’s generalization capability and perform ablation studies, we conducted experiments using full-life cycle data from the tested bearings. The dataset was partitioned into training, validation, and test sets at a ratio of 6:2:2, ensuring no temporal leakage between subsets. This division allows the model to learn degradation patterns across the entire operational lifespan while maintaining strict separation for unbiased evaluation. The corresponding description is also updated in the revision version:
Line 411-412, in Subsection 1, Page 13:
The division of the training set, validation set and test set is performed on the data set of the same bearing, with a ratio of 6:2:2.
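One way to realise a 6:2:2 partition of a single bearing's run-to-failure record without temporal leakage is a chronological split, sketched below. The response gives only the ratio and the same-bearing constraint; the chronological ordering (training on the earliest samples, testing on the latest) and the function name are assumptions.

```python
def chronological_split(samples, ratios=(0.6, 0.2, 0.2)):
    """Split a time-ordered sequence into train/validation/test blocks.

    Blocks are contiguous and kept in time order, so no validation or
    test sample precedes a training sample.
    """
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values summing to 1")
    n = len(samples)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]  # remainder, absorbs rounding
    return train, val, test
```

A random shuffle of snapshots before splitting would also give a 6:2:2 ratio, but neighbouring snapshots from the same degradation stage would then land in both training and test sets, which tends to inflate apparent accuracy.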
Please download the attachment for the specific picture
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors addressed the provided comments satisfactorily.