Health Status Recognition Method for Rotating Machinery Based on Multi-Scale Hybrid Features and Improved Convolutional Neural Networks

Rotating machinery is susceptible to harsh environmental interference, and fault signal features are challenging to extract, leading to difficulties in health status recognition. This paper proposes multi-scale hybrid features and improved convolutional neural networks (MSCCNN) health status identification methods for rotating machinery. Firstly, the rotating machinery vibration signal is decomposed into intrinsic modal components (IMF) using empirical wavelet decomposition, and multi-scale hybrid feature sets are constructed by simultaneously extracting time-domain, frequency-domain and time-frequency-domain features based on the original vibration signal and the intrinsic modal components it decomposes. Secondly, using correlation coefficients to select features sensitive to degradation, construct rotating machinery health indicators based on kernel principal component analysis and complete health state classification. Finally, a convolutional neural network model (MSCCNN) incorporating multi-scale convolution and hybrid attention mechanism modules is developed for health state identification of rotating machinery, and an improved custom loss function is applied to improve the superiority and generalization ability of the model. The bearing degradation data set of Xi’an Jiaotong University is used to verify the effectiveness of the model. The recognition accuracy of the model is 98.22%, which is 5.83%, 3.30%, 2.29%, 1.52%, and 4.31% higher than that of SVM, CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features, respectively. The PHM2012 challenge dataset is used to increase the number of samples to validate the model effectiveness, and the model recognition accuracy is 97.67%, which is 5.63%, 1.88%, 1.36%, 1.49%, and 3.69% higher compared to SVM, CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features methods, respectively. The MSCCNN model recognition accuracy is 98.67% when validated on the degraded dataset of the reducer platform.


Introduction
With the development of industrialization, rotating machinery has achieved a wide range of applications among industries [1]. Affected by the harsh environmental interference and non-smooth working conditions during operation [2,3], the critical components of rotating machinery are prone to deformation, cracks and fractures, and other damages, and if they fail, they will cause incalculable economic losses to average production and even threaten personal safety. Therefore, the study of rotating machinery health status identification method is of great significance to improve the operational reliability and stability of rotating machinery [4].
The vibration signal of rotating machinery monitoring contains a lot of information, and the equipment health status identification method based on vibration signal has become a research hot spot. Rotating machinery health status recognition is mainly divided into methods based on signal processing, methods based on the combination of feature extraction and pattern recognition, and methods based on deep learning, among which the construction of feature engineering is the crucial step of rotating machinery health status recognition. The quality of the feature set directly determines the accuracy of health status recognition. Currently, related scholars have studied the extraction of statistical parameters in the time domain, frequency domain, and time-frequency domain based on signal processing methods, and have achieved specific results [5,6]. Liu [7] used the generalized S-transform to transform the original vibration signal in the time-frequency domain to obtain a two-dimensional time-frequency matrix and the singular value decomposition of the eigenmatrix to obtain the particular value vector group characterizing the characteristic information of the specific state parts of cylindrical roller bearings to achieve the identification of different state types of cylindrical roller bearings. Huang [8] used Transformer to extract features to obtain fine-grained better feature representation, combined with adversarial domain training and local maximum mean difference to guide feature learning by minimizing the difference in distribution between source and target domains resulting in better diagnosis. Choudhary [9] uses multi-input convolutional neural network technology to integrate the features of vibration signals and sound signals, then convert them into time-frequency spectrum, thus improving the accuracy of fault diagnosis. However, when coupled components are present in mechanical devices and harsh and complex environments, the devices tend to exhibit nonlinear behavior. Therefore, simple statistical feature parameter extraction cannot accurately identify fault characteristics. In the current research, few studies have unified advanced signal processing methods by extracting statistical parameters or entropy indicators in the time, frequency, and timefrequency domains [10], to construct complete multiscale hybrid domain feature sets to represent the degradation trend of rotating machinery.
In recent years, experts and scholars at home and abroad in the research of equipment health status identification methods can be broadly divided into three categories: knowledge-driven, model-driven, and data-driven [11]. In order to avoid problems such as a large amount of human experience required for knowledge-driven and the difficulty of establishing mathematical degradation models for model-driven, the data-driven health status identification methods have gained widespread attention [12]. Traditional data-driven methods for equipment health status identification based on data mainly include Bayesian classification [13], support vector machines [14], and artificial neural networks [15]. With the rapid development of deep learning, deep learning-related models have been developed in the field of rotating machinery health status identification due to their nonlinear solid mapping capability [16,17]. Lei [18] used mechanical frequency domain signals to train deep neural networks, combined unsupervised and supervised learning, and adaptively extracted extensive data fault features to identify the health conditions of multi-stage gear trains effectively. Chen [19] proposed a deep convolutional neural network-based health state identification method for planetary gearboxes, fusing the raw data of horizontal and vertical vibration signals and using a deep network to extract features for identification automatically. Zhang [20] stacked multiple self-encoders and classification layers to construct an integrated model of state feature self-learning and state recognition to achieve state recognition of rolling bearings. Chen [21] proposed a rolling bearing life state identification method based on source domain multi-sample integration, using random sampling to obtain multiple training samples within the source domain by multiple equal random selections of inter-class pieces in the source domain, constructing a kernel classifier using the kernel matrix and outputting the state identification results. However, the above methods still have shortcomings, such as difficulty in feature extraction and low accuracy of model recognition. To address the above problems, the health status recognition method for rotating machinery based on multi-scale hybrid features and improved convolutional neural networks is proposed. The main contributions of this paper are summarized as follows: • Empirical wavelet decomposition (EWT) of vibration signals, extraction of time domain, frequency domain and time-frequency-domain features on the decomposed signals, and establishment of multi-scale hybrid features sets.

•
The correlation coefficients are used to select features sensitive to degradation. Then the kernel principal component analysis method (KPCA) is used to construct health state identification indexes and complete the state category classification.

•
The convolutional neural network model is improved from the perspectives of multiscale features extraction, attention mechanism to screen optimal feature information, and optimization of losses to improve the accuracy of the health status of crucial components of rotating machinery.

Empirical Wavelet Transform
The empirical wavelet decomposition (EWT) constructs a set of filters by adaptive partitioning of the Fourier spectrum to obtain empirical wavelet functions with orthogonal properties, and performs empirical wavelet transform on the partitioned interval to propose amplitude-modulated frequency modulated single component components with tightly supported Fourier spectrum, which is better adaptive than wavelet transform and empirical modal decomposition, and does not have the problems of modal confusion and sizeable computational effort. EWT [22][23][24] essentially decomposes the signal into a sum of N + 1 eigenmode functions, which is calculated as follows: where: X(t) is the time domain signal; X i (t) is the useful amplitude modulation frequency component. Firstly, segment the signal spectrum X(w), build a wavelet filter bank, normalize the signal Fourier spectrum to [0, π], and then decompose it into N continuous spaces, use w n to represent the boundary of each continuous space, each interval is defined as n = [w n−1 , w n ], obviously there are N n=1 ∧ n = [0, π], and then take w n as the center frequency and 2τ n as the transition segment of adjacent intervals. For the determined N intervals, the empirical wavelet is defined as a bandpass filter on each interval, and Gilles constructs the empirical wavelet according to the construction method of Meyer wavelet, and uses the empirical wavelet function and the empirical scale function to decompose the signal by EWT. The empirical scale function and empirical wavelet function are calculated as: where: β is an arbitrary C k ([0, 1]) function that satisfies: The signal reconstruction is decomposed by EWT to obtain the IMF component f i (t) defined as:

Convolutional Neural Network
There are various structures of convolutional neural networks [25] (CNN). Still, the basic framework is similar and generally consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer, with the different layers connected by means of feature mapping [26].
The fundamentals of each layer are: The input layer is responsible for initially feeding the raw data into the neural network. The input layer is characterized as M-dimensional.
The convolutional layer is the core layer of CNN, which extracts the features of the input samples and outputs them to the next layer through the convolutional kernel. Convolution has two new features compared to the traditional fully connected neural network: one is the sparse connection, which connects with some neurons in the previous layer, and this area of the connected neuron is also called the perceptual field; the other is weight sharing, where each convolution kernel completes all processing with only one set of weight parameters. The convolutional operation is formulated as: where: X n j denotes the jth feature mapping of the nth layer, f denotes the activation function, M denotes the number of input feature mappings, X n−1 i denotes the ith feature mapping of the (n − 1)th layer, W n ij denotes the weights, and b j denotes the bias. The pooling layer is a way to reduce the dimensionality of the input data, thus representing the signal with high-quality features and reducing the amount of data processing in the next layer, preventing the overfitting of the model. The standard pooling methods are maximum, mean, random, and median pooling. In this paper, the maximum pooling is chosen, and the calculation formula is: where: n denotes the number of feature layers, and a n i denotes the ith feature of the nth layer. The fully-connected layer is to connect the pooled features to convert them into 1 × N dimensional matrices corresponding to each of the N outcomes to achieve classification. Each neuron in the fully connected layer is connected to all neurons in the previous layer. Therefore, the training parameters in this layer account for a larger proportion.
The output layer is then connected to the fully connected layer and outputs the final classification result. The structure of the convolutional neural network is shown in Figure 1.

Methodology of this Paper
The flow of the rotating machinery health state assessment model based on multiscale hybrid features and improved convolutional neural network (MSCCNN) established in this paper is shown in Figure 2. The specific contents of this paper are as follows: Firstly, multi-scale hybrid feature extraction of the original vibration signal of rotating machinery. Secondly, feature dimensionality reduction and health indicator construction for rotating machinery. Finally, the MSCCNN model for health status identification of rotating machinery.

Methodology of this Paper
The flow of the rotating machinery health state assessment model based on multiscale hybrid features and improved convolutional neural network (MSCCNN) established in this paper is shown in Figure 2. The specific contents of this paper are as follows: Firstly, multi-scale hybrid feature extraction of the original vibration signal of rotating machinery. Secondly, feature dimensionality reduction and health indicator construction for rotating machinery. Finally, the MSCCNN model for health status identification of rotating machinery.

Methodology of this Paper
The flow of the rotating machinery health state assessment model based on multiscale hybrid features and improved convolutional neural network (MSCCNN) established in this paper is shown in Figure 2. The specific contents of this paper are as follows: Firstly, multi-scale hybrid feature extraction of the original vibration signal of rotating machinery. Secondly, feature dimensionality reduction and health indicator construction for rotating machinery. Finally, the MSCCNN model for health status identification of rotating machinery.

Multi-Scale Hybrid Features Extraction
When a device fails, the relevant signal characteristics change simultaneously, including the amplitude and probability distribution in the time domain, the energy at different frequencies in the frequency domain, the position of the leading edge of the primary energy spectrum, and the structural distribution of the time-frequency power. Feature extraction is the key to health status identification. Single-scale features can identify and analyze the equipment under specific working conditions, but it isn't easy to reflect the operation status of the machinery comprehensively. In complex working conditions, the decomposition of vibration signals and the establishment of hybrid features at multiple scales can characterize the health state of rotating machinery more completely. For the above problems, EWT has a good effect on the decomposition of vibration signals of rotating machinery. In order to fully express the degradation characteristics of rotating machinery, this paper proposes a multi-scale hybrid features extraction method, which combines EWT decomposition to decompose the original vibration signal of rotating machinery at a deeper level, and extracts multi-scale hybrid features in the time domain, frequency domain, and time-frequency domain for the original vibration signal and its intrinsic modal components.
In this paper, the multiscale hybrid features are extracted simultaneously in the time domain, frequency domain, and time-frequency domain for the rotating machine vibration signal and the third-order IMF components of the vibration signal decomposition. The vibration signals of the rotating machine are extracted in X and Y directions, and the original vibration signals in X direction and their third-order IMF components are extracted with 30 features in the time domain, frequency domain, and time-frequency domain, respectively, which means that 120 features are extracted from the four vibration signals. One hundred twenty features are extracted from the original vibration signals in Y direction by the same operation, which means that 240 features are extracted in total.
A total of 12 features are extracted in the time domain, among which the dimensioned features include mean, variance, standard deviation, maximum value, minimum value, peak-to-peak value, root mean square value, absolute mean value, and square root amplitude, the dimensionless features include waveform indicator, peak indicator, and pulse indicator, whose expressions are shown in Table 1. For the convenience of experimental statistics, the time domain features extracted from the original vibration signal in the X direction are called XV-F 1 to XV-F 12 , and the time domain features extracted from its first order IMF components are named XIMF1-F 1 to XIMF1-F 12 . The naming of the time domain features extracted from the second and third order IMF components is the same as that of the time domain features extracted from the first order IMF components.

Parameter Expressions
Parameter Expressions A total of 10 features are extracted from the frequency domain, including spectral mean value, spectral mean square heel value, frequency domain frequency skewness, frequency domain frequency kurtosis, the center of gravity frequency, root mean square frequency, frequency amplitude variance, and frequency domain frequency kurtosis index, which reflect the magnitude of vibration energy in the frequency domain. Frequency domain analysis of vibration signals first requires the conversion of the time domain waveform of Sensors 2023, 23, 5688 7 of 21 the movement into spectral information with the help of the discrete Fourier transform, as follows: where: x(k∆tz) is the sampled value of the vibration signal; N is the number of sampling points; ∆t is the sampling interval; k is the ordinal number of the discrete value in the time domain. After obtaining the spectral information, the frequency domain index calculation formula is shown in Table 2. For the convenience of experimental statistics, the frequency domain features extracted from the original vibration signal in the X direction are named XV-F 13 to XV-F 22 , the frequency domain features extracted from its first order IMF components are named XIMF1-F 13 to XIMF1-F 22 . The naming of the frequency domain features extracted from the second and third order IMF components is the same as that of the frequency domain features extracted from the first order IMF components. Table 2. Frequency domain characteristic parameter expression.

Parameter Expressions
Parameter Expressions In order to facilitate the experimental statistics, the time-frequency domain features extracted from the original vibration signal in X direction are named XV-F 23 to XV-F 30 , the time-frequency domain features extracted from its first order IMF components are named XIMF1-F 23 to XIMF1-F 30 . The naming of the time-frequency domain features extracted from the second and third order IMF components is the same as that of the time-frequency domain features extracted from the first order IMF components. The calculation formula is: where: E l is the wavelet energy spectrum of the original signal on n scales. When the scale is a, the wavelet energy spectrum is calculated as follows: where: W f (a, b) is the amplitude of the wavelet transform.

Multi-Scale Features Selection and Health Index Construction
To select the most representative degradation features, eliminate redundant information, ensure that the selected features have good degradation performance and improve the classification accuracy, this paper uses correlation coefficients [27] to complete the selection of degradation-sensitive features. In order to better satisfy the health state classification, a good performance health degradation index needs to be established, and the KPCA algorithm [28] is selected to fuse the optimal multiscale hybrid features after selection, and the health index is constructed according to the cumulative contribution rate of the principal elements. The specific steps are: First, the data are nonlinearly mapped from space R into the high-dimensional space H: The covariance matrix of the sample is expressed as: where: x i (i = 1, 2, . . . , Z) is the sequence sample, Z is the input sequence length, and φ(x i ) is the feature sample point. Solving the characteristic equation for C H : where: V is the eigenvector and λ q is the eigenvalue. Multiplying φ(x i ) equally on both sides of the above equation: After refining V, the above equation is expressed as: where: γ and ω are different spatial signal samples, and α γ is the presence coefficient.
To avoid solving the unspecifiable representation of φ(x i ), the kernel matrix ] is introduced and the above equation is simplified as: where: α µ = (α 1 , α 2 , . . . , α n ) T . Finally, feature downscaling fusion is performed, the feature values are arranged in descending order, η is the cumulative contribution rate threshold, and the cumulative contribution rate of the first J features in the total number of Q features is: According to expert experience, in order to achieve good results in KPCA screening of degraded features, a threshold value η ≥ 85% is generally set to determine the J most important principal features. This paper uses the first principal feature as a health indicator to complete the health state.

Rotating Machinery Health Identification (MSCCNN) Modeling
The structure of the rotating machinery health recognition (MSCCNN) model is shown in Figure 3. Firstly, a multi-scale convolution layer is built to extract multi-scale features under the different field of view sizes. Secondly, the mixed features after multi-scale convolution are spliced and fused, convolved, and pooled to obtain higher level features. The features are input to the attention mechanism module (CBAM) to complete adaptive feature extraction, and the extracted features are convolved and pooled again to extract more abstract features. Finally, the output of the classification results is performed through a fully connected layer, and the model parameters are optimized by improving the loss.
features under the different field of view sizes. Secondly, the mixed features after multiscale convolution are spliced and fused, convolved, and pooled to obtain higher level features. The features are input to the attention mechanism module (CBAM) to complete adaptive feature extraction, and the extracted features are convolved and pooled again to extract more abstract features. Finally, the output of the classification results is performed through a fully connected layer, and the model parameters are optimized by improving the loss.  (1) Multi-scale convolution The multi-scale convolution uses convolution kernels of different sizes to convolve the input features to obtain features at different convolution scales, then splices the features at different convolution scales to receive the fused multi-scale features, and finally integrates the features through pooling. The multi-scale convolution can enrich the proposed data features, extract sensitive feature information from the input data from a global perspective, and then improve the segmentation performance of the data. The multi-scale convolution is shown in Figure 4.  (2) Attention mechanism module The attentional mechanism (CBAM) is a simple and effective attentional module for feed-forward convolutional neural networks. Given an intermediate feature map, the CBAM module sequentially infers attention along two independent dimensions, channel The multi-scale convolution uses convolution kernels of different sizes to convolve the input features to obtain features at different convolution scales, then splices the features at different convolution scales to receive the fused multi-scale features, and finally integrates the features through pooling. The multi-scale convolution can enrich the proposed data features, extract sensitive feature information from the input data from a global perspective, and then improve the segmentation performance of the data. The multi-scale convolution is shown in Figure 4.
features under the different field of view sizes. Secondly, the mixed features after multiscale convolution are spliced and fused, convolved, and pooled to obtain higher level features. The features are input to the attention mechanism module (CBAM) to complete adaptive feature extraction, and the extracted features are convolved and pooled again to extract more abstract features. Finally, the output of the classification results is performed through a fully connected layer, and the model parameters are optimized by improving the loss. (1) Multi-scale convolution The multi-scale convolution uses convolution kernels of different sizes to convolve the input features to obtain features at different convolution scales, then splices the features at different convolution scales to receive the fused multi-scale features, and finally integrates the features through pooling. The multi-scale convolution can enrich the proposed data features, extract sensitive feature information from the input data from a global perspective, and then improve the segmentation performance of the data. The multi-scale convolution is shown in Figure 4.  (2) Attention mechanism module The attentional mechanism (CBAM) is a simple and effective attentional module for feed-forward convolutional neural networks. Given an intermediate feature map, the CBAM module sequentially infers attention along two independent dimensions, channel (2) Attention mechanism module The attentional mechanism (CBAM) is a simple and effective attentional module for feed-forward convolutional neural networks. Given an intermediate feature map, the CBAM module sequentially infers attention along two independent dimensions, channel and space, and subsequently multiplies them by the input features to perform adaptive feature extraction. Because CBAM is a lightweight, general-purpose module, it can be seamlessly integrated into any convolutional neural network architecture with negligible consumption and can be trained end-to-end with the underlying convolutional neural network. The CBAM structure is shown in Figure 5. and space, and subsequently multiplies them by the input features to perform adaptive feature extraction. Because CBAM is a lightweight, general-purpose module, it can be seamlessly integrated into any convolutional neural network architecture with negligible consumption and can be trained end-to-end with the underlying convolutional neural network. The CBAM structure is shown in Figure 5. (3) Custom loss function The loss function is a non-negative real-valued function that measures the degree of inconsistency between the predicted and actual values of the model. The smaller the value of its loss function, the better the robustness of the model. Convolutional neural networks use an error backpropagation algorithm to train the model. The MSCCNN model in this paper uses the "MAE," "MSE," and "Log cosh" loss functions and a modified custom loss function (LM) to perform the test. The "LM" loss function combines the advantages of the "Log cosh" and "MSE" loss functions and avoids their respective disadvantages. The improved custom loss function (LM) formula in this paper is: where denotes the true value and denotes the predicted value. The innovation of the MSCCNN model built in this paper: (1) Enhancement of feature extraction field of view and enrichment of feature information by multi-scale convolution. (2) Introducing a self-attention mechanism to complete degradation-sensitive feature self-extraction. (3) Customized loss function to ensure optimization of model parameters and improve model robustness.

XJTU-SY Data Set Introduction
In this paper, the effectiveness of the proposed method is verified using accelerated life test data of XJTU-SY bearing, and the experimental bench consists of an AC motor, speed controller, rotating shaft, support bearing, and test bearing [29]. The experimental data consisted of failure data of 15 bearings in total for three different operating conditions. The accelerated life experiment platform of rolling bearings of Xi'an Jiaotong University is shown in Figure 6, and the bearing data information is shown in Table 3.

(3) Custom loss function
The loss function is a non-negative real-valued function that measures the degree of inconsistency between the predicted and actual values of the model. The smaller the value of its loss function, the better the robustness of the model. Convolutional neural networks use an error backpropagation algorithm to train the model. The MSCCNN model in this paper uses the "MAE," "MSE," and "Log cosh" loss functions and a modified custom loss function (LM) to perform the test. The "LM" loss function combines the advantages of the "Log cosh" and "MSE" loss functions and avoids their respective disadvantages. The improved custom loss function (LM) formula in this paper is: where y denotes the true value and f (x) denotes the predicted value. The innovation of the MSCCNN model built in this paper: (1) Enhancement of feature extraction field of view and enrichment of feature information by multi-scale convolution. (2) Introducing a self-attention mechanism to complete degradation-sensitive feature self-extraction. (3) Customized loss function to ensure optimization of model parameters and improve model robustness.

XJTU-SY Data Set Introduction
In this paper, the effectiveness of the proposed method is verified using accelerated life test data of XJTU-SY bearing, and the experimental bench consists of an AC motor, speed controller, rotating shaft, support bearing, and test bearing [29]. The experimental data consisted of failure data of 15 bearings in total for three different operating conditions. The accelerated life experiment platform of rolling bearings of Xi'an Jiaotong University is shown in Figure 6, and the bearing data information is shown in Table 3.
In this paper, the experimental data of bearings 1, 2, 3, and 4 under working condition one and bearings 1, 3, 4, and 5 under working condition two are selected as the training set, with a total of 1969 sample data. The sample data of the training set is divided into two parts: validation and test. The ratio of validation samples to test samples is 8:2.   In this paper, the experimental data of bearings 1, 2, 3, and 4 under working condition one and bearings 1, 3, 4, and 5 under working condition two are selected as the training set, with a total of 1969 sample data. The sample data of the training set is divided into two parts: validation and test. The ratio of validation samples to test samples is 8:2.

XJTU-SY Data Set Experimental Verification and Analysis
(1) Multi-scale features extraction and selection Taking bearing 1 under working condition one as an example, a multi-scale hybrid features set is extracted and constructed. First, its data set is decomposed to third-order using EWT. Then the original vibration signal and the third order IMF components are extracted from the time domain, frequency domain, and time-frequency domain with a total of 30 features. Since the test bearing has two kinds of original vibration signals, the horizontal original vibration signal and its third-order IMF components are extracted with a total of 120 features, and the vertical original vibration signal and its third-order IMF components are extracted with a total of 120 features, so a total of 240 multi-scale hybrid features are extracted after processing. The processing results are shown in Table 4.

XJTU-SY Data Set Experimental Verification and Analysis
(1) Multi-scale features extraction and selection. Taking bearing 1 under working condition one as an example, a multi-scale hybrid features set is extracted and constructed. First, its data set is decomposed to third-order using EWT. Then the original vibration signal and the third order IMF components are extracted from the time domain, frequency domain, and time-frequency domain with a total of 30 features. Since the test bearing has two kinds of original vibration signals, the horizontal original vibration signal and its third-order IMF components are extracted with a total of 120 features, and the vertical original vibration signal and its third-order IMF components are extracted with a total of 120 features, so a total of 240 multi-scale hybrid features are extracted after processing. The processing results are shown in Table 4.
Secondly, the correlation coefficient is used to select features in the multi-scale mixed features vector set that can better characterize the fault degradation process of bearing 1 under operating condition one. In this paper, 60 multi-scale mixed features with high ratings are selected to form the optimal feature vector set. The multi-scale hybrid features selection and the comparison of superior difference features are shown in Figure 7.  Secondly, the correlation coefficient is used to select features in the multi-scale mixed features vector set that can better characterize the fault degradation process of bearing 1 under operating condition one. In this paper, 60 multi-scale mixed features with high ratings are selected to form the optimal feature vector set. The multi-scale hybrid features selection and the comparison of superior difference features are shown in Figure 7. Finally, the processed multi-scale mixed features vector set is then fused and downscaled by KPCA to output the health status index of the bearing. The constructed rolling bearing health state index is shown in Figure 8. Finally, the processed multi-scale mixed features vector set is then fused and downscaled by KPCA to output the health status index of the bearing. The constructed rolling bearing health state index is shown in Figure 8. According to the constructed health status index, analysis of domestic and international literature related to the field of research equipment health identification reveals that equipment performance degradation can usually be classified into four states [30,31]. Therefore, combined with the fundamental operation of rolling bearings and expert experience, the rolling bearing health status is divided into four levels [32], and each level and the corresponding health status index interval are shown in Table 5.  According to the constructed health status index, analysis of domestic and international literature related to the field of research equipment health identification reveals that equipment performance degradation can usually be classified into four states [30,31]. Therefore, combined with the fundamental operation of rolling bearings and expert experience, the rolling bearing health status is divided into four levels [32], and each level and the corresponding health status index interval are shown in Table 5. (2) MSCCNN model training and validation. The MSCCNN model and training parameters built in this paper are as follows: 1 The 60 features are convolved by convolution of 3 channels and 3 scales, with a convolution kernel size of 3, 5, and 7, channel size of 32, and activation function of Relu activation function. 2 The multi-scale hybrid features are stitched by the Add function and input to the pooling layer for pooling with a pooling step of 2. 3 The convolution operation is performed by 1 convolution layer with a convolution kernel size of 3 and a channel size of 64, and using the Relu activation function. 4 The features are input to the CBAM module for more advanced feature extraction, followed by convolutional operation of the features through one convolutional layer, and the convolved features are input to the pooling layer, where the convolutional kernel size is 3, the channel size is 32, the pooling step is 2, and the activation function is the Relu activation function. 5 The features after the convolution are input to the last convolutional layer for the convolution operation, where the convolution kernel size is 5, the channel size is 16, the pooling step size is 2, and the activation function is the Relu activation function. 6 Finally, the bearing health status is classified into four categories: health, good, deterioration and failure through the full connection layer and "Softmax" function. 7 The optimizer is Adam, the learning rate is 0.001, and the training times are 200.
During the experiment, the optimizer and loss function of the model have a significant impact on how well the model is trained. When the model is trained, firstly, the improved custom loss function is selected, and the optimizer is selected from SGD, RMSprop, and Adam for comparison. Through the experiment, the model reaches the optimum, and the model accuracy is the highest when the Adam optimizer is selected. The recognition accuracy results of different optimizers are shown in Figure 9. Then, after the optimizer of the model is determined, the loss functions "MAE," "MSE," and "Log cosh" are selected for comparison with the improved custom loss functions. Through experiments, the enhanced custom loss function (LM) is chosen to make the highest accuracy of the model in this paper. It is proved that the improved custom loss function in this paper outperforms the "MAE," "MSE," and "Log cosh" loss functions. The recognition accuracies of different loss functions are shown in Figure 10. Then, after the optimizer of the model is determined, the loss functions "MAE," "MSE," and "Log cosh" are selected for comparison with the improved custom loss functions. Through experiments, the enhanced custom loss function (LM) is chosen to make the highest accuracy of the model in this paper. It is proved that the improved custom loss function in this paper outperforms the "MAE," "MSE," and "Log cosh" loss functions. The recognition accuracies of different loss functions are shown in Figure 10.

SGD
RMSprop Adam Then, after the optimizer of the model is determined, the loss functions "MAE," "MSE," and "Log cosh" are selected for comparison with the improved custom loss functions. Through experiments, the enhanced custom loss function (LM) is chosen to make the highest accuracy of the model in this paper. It is proved that the improved custom loss function in this paper outperforms the "MAE," "MSE," and "Log cosh" loss functions. The recognition accuracies of different loss functions are shown in Figure 10.  In summary, the finalized optimizer of the MSCCNN model is Adam, and the improved custom loss function is selected as the loss function of the model in this paper. When the parameters of the MSCCNN model are optimal, the confusion matrix of the test results of the model for bearing health status identification in the validation sample is shown in Figure 11. In summary, the finalized optimizer of the MSCCNN model is Adam, and the improved custom loss function is selected as the loss function of the model in this paper. When the parameters of the MSCCNN model are optimal, the confusion matrix of the test results of the model for bearing health status identification in the validation sample is shown in Figure 11. The accuracy of the model in assessing the health status of the bearings as measured by the test samples is as high as 98.22%, and the accuracy of identification is good in each state.

MAE
To further validate the accuracy and superiority of the MSCCNN health state assessment model proposed in this paper, it is compared with support vector machine (SVM), CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features (60 features in the time domain, frequency domain and time-frequency domain) [33], respectively, and the same training set is used to conduct the experiments. The confusion matrix results of each method for the accuracy of bearing health status identification are shown in Figure 12, and the specific data are shown in Table 6. The accuracy of the model in assessing the health status of the bearings as measured by the test samples is as high as 98.22%, and the accuracy of identification is good in each state.
To further validate the accuracy and superiority of the MSCCNN health state assessment model proposed in this paper, it is compared with support vector machine (SVM), CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features (60 features in the time domain, frequency domain and time-frequency domain) [33], respectively, and the same training set is used to conduct the experiments. The confusion matrix results of each method for the accuracy of bearing health status identification are shown in Figure 12, and the specific data are shown in Table 6.
To further validate the accuracy and superiority of the MSCCNN health state assessment model proposed in this paper, it is compared with support vector machine (SVM), CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features (60 features in the time domain, frequency domain and time-frequency domain) [33], respectively, and the same training set is used to conduct the experiments. The confusion matrix results of each method for the accuracy of bearing health status identification are shown in Figure 12, and the specific data are shown in Table 6.  From the specific data in Figure 12 and Table 6, it can be seen that the multi-scale hybrid features and improved convolutional neural network (MSCCNN) proposed in this paper are more accurate for the bearing health status identification method, with 5.83%, 3.30%, 2.29%, 1.52%, and 4.31% improvement compared to SVM, CNN, CNN + CBAM, MSCNN, and MSCCNN + conventional features methods, respectively.

XJTU-SY Data Set MSCCNN Model Generalization Performance Validation
Data samples of Bearing1_5, Bearing2_2, and Bearing3_3 are selected for testing to verify the model generalization capability. The results of the MSCCNN model for bearing health status identification are shown in Table 7. The data set [34] was provided by the PRONOSTIA experimental platform of the FEMTO-ST Institute, which describes the degradation of ball bearings throughout their service life and consists of three main parts, including the rotating part, the degradation generating part (load part) and the measuring part. The raw vibration signal is collected by installing acceleration sensors on the outer ring of the test stand, and the radial load force is applied to accelerate the degradation of the ball bearing. The bearing vibration signals are divided into two directions, horizontal and vertical, with a sampling rate of 25.6 kHz, sampled every 10 s and lasting 0.1 s, so 2560 data points are collected each time. 3 operating conditions and 17 bearings are included in the PHM2012 challenge data set. The specific information of the bearing operating conditions data used in this paper is shown in Table 8. In this paper, the degraded data of bearing 1 and 3 under working condition one, bearing 1 under working condition two, and bearing 2 under working condition three are selected to form the training set with a total of 7726 sample data. The sample data of the training set is divided into two parts: validation and test, and the ratio of validation samples to test samples is 8:2.
By the method of this paper, the health status of the ball bearing is identified, and the sample classification and identification results are shown in Table 9.
As shown in Table 9, the multi-scale hybrid features and improved convolutional neural network (MSCCNN) proposed in this paper are more accurate in identifying the health status of ball bearings, with 5.63%, 1.88%, 1.36%, 1.49%, and 3.69% improvement compared to the SVM, CNN, CNN + CBAM, MSCNN, and MSCCNN + traditional features, respectively. To verify the generalization ability of the model, the degradation data of bearing 2 under working condition two and bearing 3 under working condition three were selected for testing. The accuracy of the MSCCNN model for identifying the health status of the two bearings was 80.93% and 84.10%, respectively, with an average accuracy of 82.52%, which proved that the MSCCNN model has good generalization ability for identifying the health status of the bearings.

Platform Introduction
A rotating machinery (reducer) experimental platform is built to verify the effectiveness of the proposed method. The laboratory platform mainly consists of a magnetic powder brake, a gear reducer, an AC motor, a shaft coupling, and various sensors. Vibration, speed, temperature, and other sensors are installed to collect signals during the operation of the reducer. The specific types of sensors and their installation locations are shown in Figure 13.  The data acquisition card model of the decelerator experimental platform is MCC USB-1608FS-Plus, the vibration acceleration sensor model is CT1050LC with a sampling frequency of 5 kHz, and the speed and torque sensor model is HCNJ-101 with a sampling frequency of 1 kHz. The vibration sensor CT1050LC is installed in the horizontal and vertical direction of the input shaft of the reducer. The frequency response range is 0.5-1.5 kHZ, and the vibration measurement range is 0-10 g. The torque sensor measures the torque range of 0-200 N.m, the torque output is 5-15 KHz, and the measuring speed is 0-3000 r/min. The sensor sends the collected signals to the cloud server for storage and then carries out relevant data analysis by reading the data stored in the server database.
Since vibration signals contain rich state information, the following two parameters are selected in this paper as the data basis for the health state identification of the reducer. The specific parameters are shown in Table 10. In the experiment, the vibration data sampling frequency is 5000 Hz, and the vibration signal is recorded once in 1 min.

Serial Number
Parameter Name Unit 1 X-directional vibration of reducer input shaft g The data acquisition card model of the decelerator experimental platform is MCC USB-1608FS-Plus, the vibration acceleration sensor model is CT1050LC with a sampling frequency of 5 kHz, and the speed and torque sensor model is HCNJ-101 with a sampling frequency of 1 kHz. The vibration sensor CT1050LC is installed in the horizontal and vertical direction of the input shaft of the reducer. The frequency response range is 0.5-1.5 kHZ, and the vibration measurement range is 0-10 g. The torque sensor measures the torque range of 0-200 N.m, the torque output is 5-15 KHz, and the measuring speed is 0-3000 r/min. The sensor sends the collected signals to the cloud server for storage and then carries out relevant data analysis by reading the data stored in the server database.
Since vibration signals contain rich state information, the following two parameters are selected in this paper as the data basis for the health state identification of the reducer. The specific parameters are shown in Table 10. In the experiment, the vibration data sampling frequency is 5000 Hz, and the vibration signal is recorded once in 1 min. The multi-scale optimal and worst features curves extracted in this paper are shown in Figure 14.
torque range of 0-200 N.m, the torque output is 5-15 KHz, and the measuring speed is 0-3000 r/min. The sensor sends the collected signals to the cloud server for storage and then carries out relevant data analysis by reading the data stored in the server database.
Since vibration signals contain rich state information, the following two parameters are selected in this paper as the data basis for the health state identification of the reducer. The specific parameters are shown in Table 10. In the experiment, the vibration data sampling frequency is 5000 Hz, and the vibration signal is recorded once in 1 min.

Serial Number
Parameter Name Unit 1 X-directional vibration of reducer input shaft g 2 Y-directional vibration of reducer input shaft g

Decelerator Data Experimental Results and Analysis
The multi-scale optimal and worst features curves extracted in this paper are shown in Figure 14.  From the figure, it can be seen that the good characteristic curves in the multi-scale hybrid features have obvious monotonicity, which can well characterize the degradation status of the reducer with time. The health state index of the reducer constructed by multi-scale hybrid features selection and KPCA dimensionality reduction is shown in Figure 15. From the figure, it can be seen that the good characteristic curves in the multi-scale hybrid features have obvious monotonicity, which can well characterize the degradation status of the reducer with time. The health state index of the reducer constructed by multiscale hybrid features selection and KPCA dimensionality reduction is shown in Figure 15. is proposed to improve the accuracy of health status recognition of rotating machinery. The multi-scale hybrid features are extracted based on the modal components of the empirical wavelet decomposition. The correlation coefficient is used to select the degradationsensitive features, and the kernel principal component analysis is used to construct the rotating machinery health indexes based on which the health state classification is completed. A convolutional neural network model incorporating multi-scale convolution and hybrid attention mechanism modules is developed for the health status identification of rotating machines, and the loss function is customized to improve the superiority and generalization ability of the model. The accuracy of model identification was 98.22% on the bearing degradation dataset of Xi'an Jiaotong University and 98.67% on the gearbox platform degradation dataset. It effectively solves the problem of difficulty in identifying the health status of rotating machinery due to the difficulty in extracting data features under severe working conditions. Future research should: (1) study signal noise reduction methods under strong noise background and signal feature extraction methods sensitive to degradation; (2) study signal feature extraction and selection methods under variable working conditions and establish rotating machinery health state identification models under complex working conditions. (3) deep learning evaluation models are poorly interpretable, and subsequent research should be conducted on model interpretability and feature visualization.
Funding: This research was funded by the National Natural Science Foundation of China, grant number "51834006" and "52274158".

Data Availability Statement:
The data set of the reducer platform provided in this study is available from the corresponding author upon request. Due to project confidentiality, these data will not be made public.