Fault Diagnosis of Rotating Machinery: A Highly Efficient and Lightweight Framework Based on a Temporal Convolutional Network and Broad Learning System

Efficient fault diagnosis of rotating machinery is essential for the safe operation of equipment in the manufacturing industry. In this study, a robust and lightweight framework consisting of two lightweight temporal convolutional network (LTCN) backbones and a broad learning system with incremental learning (IBLS) classifier called LTCN-IBLS is proposed for the fault diagnosis of rotating machinery. The two LTCN backbones extract the fault’s time–frequency and temporal features with strict time constraints. The features are fused to obtain more comprehensive and advanced fault information and input into the IBLS classifier. The IBLS classifier is employed to identify the faults and exhibits a strong nonlinear mapping ability. The contributions of the framework’s components are analyzed by ablation experiments. The framework’s performance is verified by comparing it with other state-of-the-art models using four evaluation metrics (accuracy, macro-recall (MR), macro-precision (MP), and macro-F1 score (MF)) and the number of trainable parameters on three datasets. Gaussian white noise is introduced into the datasets to evaluate the robustness of the LTCN-IBLS. The results show that our framework provides the highest mean values of the evaluation metrics (accuracy ≥ 0.9158, MP ≥ 0.9235, MR ≥ 0.9158, and MF ≥ 0.9148) and the lowest number of trainable parameters (≤0.0165 M), indicating its high effectiveness and strong robustness for fault diagnosis.


Introduction
Rotating machinery, which consists of many rolling components, is the most widely used mechanical equipment [1] and is essential in many fields, such as aviation, transportation, and chemical industries. Due to technological advances in the manufacturing industry, rotating machinery has become increasingly complex and automated, increasing the requirements for the safe operation of rotating machinery [2]. However, some key components, such as bearings and gears, are susceptible to damage in complex and harsh operating environments, resulting in significant economic loss and human casualties [3,4]. Therefore, accurate and efficient fault diagnosis of the key components is crucial to guarantee the stable and safe operation of rotating machinery.
The development of artificial intelligence technology has attracted widespread attention for use in fault diagnosis based on monitoring data collected from sensors. Traditional intelligent fault diagnosis consists of two steps: (1) signal analysis of the monitoring data and manual features selection; (2) artificial fault assessment based on the signal analysis results and/or classification algorithms [2,5]. However, the effectiveness of the traditional fault diagnosis method relies largely on the experience of the maintenance personnel, leading to variable diagnosis results [6]. Besides, the non-stationarity of vibration signals and the nonlinear characteristics of rotating machinery in different scenarios can interfere with the extraction and identification of fault features [7]. Thus, it is necessary to research more intelligent methods to understand the relationship between the monitoring data and the equipment conditions to reduce the dependence of fault diagnosis on expert experience.
Deep learning-based fault diagnosis (DLFD), which can automatically extract features and establish the relationship between monitoring data and fault modes [8], has been increasingly applied to fault diagnosis research since it reduces the reliance on expert knowledge. DLFD methods can be categorized as: (1) fault diagnosis based on raw monitoring data and (2) fault diagnosis based on time-frequency domain-transformed monitoring data. Xie et al. [9] developed a deep one-dimensional (1-D) convolutional neural network (CNN) model to diagnose bearing faults using raw monitoring data. The effectiveness of the model was verified on the Case Western Reserve University (CWRU) dataset, achieving an average accuracy of 0.9958. Zhang et al. [10] proposed a deep residual network (ResNet) model based on residual learning to diagnose rotating machinery faults. The model used the raw data after 1-D convolution and achieved the highest accuracy of 0.9999 on the CWRU dataset. Although some studies, e.g., [9,10], successfully proposed methods to diagnose faults using raw data, the structure and complexity of the equipment produce non-stationary raw data [11], resulting in an inability to identify the fault features of the rotating machinery accurately in many applications. Appropriate methods to transform the data into the time-frequency domain can substantially increase the information extracted from the non-stationary raw signal [12]. Liang et al. [13] used time-frequency features extracted via wavelet transform (WT) and designed an effective fault diagnosis method consisting of a generative adversarial network and a CNN. The method achieved average accuracies of 0.9924 and 0.9789 on the CWRU dataset and a laboratory dataset, respectively. Zhao et al. [14] proposed a fault diagnosis model called multiple wavelet regularized deep ResNet. 
The model prevented overfitting for insufficient training data and improved the average accuracy by 8.19% and 3.62% compared with traditional CNNs and deep ResNets, respectively, on an experimental dataset. Some studies [13,14] designed outstanding fault diagnosis methods using the time-frequency features of the raw data. Nevertheless, some studies [15,16] have shown that the time-frequency analysis of the raw data could corrupt the structure of the vibration signals and alter the fault features, causing a one-sided fault feature representation and reducing the accuracy of fault identification. Therefore, it is necessary to research better methods to extract the complementary fault features and analyze the measured signals comprehensively to compensate for the shortcomings of existing fault diagnosis methods.
Another concern in fault diagnosis is the complexity (number of trainable parameters) of models. Many trainable parameters require high computing power and reduce the practicality of the models. The emergence of broad learning systems (BLSs) [17] has made it possible to reduce the computing cost of fault diagnosis while ensuring high accuracy. A BLS can be rapidly expanded by adding nodes through incremental learning, reducing the computational cost because training is not performed on the entire model [18]. BLSs have been used for fault diagnosis in the last five years. Fu et al. [18] designed an efficient fault diagnosis algorithm called adaptive BLS (ABLS) with two adaptive strategies to accelerate algorithm convergence and prevent under-fitting and over-fitting. Zhao et al. [19] proposed a fault diagnosis framework for rotors based on principal component analysis (PCA) and BLS, reducing the linear correlation between the data and eliminating redundant fault features. Guo et al. [20] designed a novel recurring BLS fault diagnosis model based on the original BLS. The model inherited the advantages of the BLS and achieved nearly 100% accuracy on two public datasets. The BLS has proven successful for fault diagnosis; however, although its single-hidden-layer structure results in a low computational cost and fast calculation speed, the deep internal features contained in the data have not been sufficiently exploited.
To solve these problems, we propose an intelligent framework (LTCN-IBLS) consisting of a 1-D lightweight temporal convolutional network (LTCN) backbone, a two-dimensional (2-D) LTCN backbone, and a BLS with incremental learning (IBLS) classifier to diagnose faults of rotating machinery. The comparison of the LTCN-IBLS and other methods is summarized in Table 1, where TDFs, TFDFs, and HLSFs denote time-domain features, time-frequency domain features, and high-level semantic features, respectively.
The main contributions of this paper can be summarized as follows: (1) The 1-D LTCN and 2-D LTCN are proposed to diagnose faults of rotating machinery.
The number of trainable parameters of the LTCN-IBLS is lower than that of many networks that only use 1-D convolution.
The rest of this paper is organized as follows. Section 2 presents the related works. Section 3 describes the details of the proposed LTCN-IBLS framework. Section 4 presents the details of the datasets and experiments and provides the results and discussion. Section 5 concludes the paper.

Continuous Wavelet Transform
Suitable data can significantly improve the performance of DLFD models. With the development of sensor technology, many types of sensors, such as accelerometers and built-in encoders, have been used for mechanical condition monitoring [21]. The temporal signal collected by a sensor during working conditions is complex and non-stationary, complicating the identification of the fault features [11,22]. Therefore, signal-processing algorithms, such as the Fourier transform (FT), short-time Fourier transform (STFT), and wavelet transform (WT), are useful for fault diagnosis. The FT is a widely used signal-processing algorithm but has several disadvantages [23]. It cannot reveal the local frequency-domain features of a signal together with the corresponding time-domain information, which prevents the accurate processing of non-stationary signals. Although the STFT can process non-stationary signals, it cannot change the time-frequency resolution during signal processing [24]. The WT offers a wider choice of basis functions and can perform multi-resolution signal analysis; thus, it is a preferred algorithm for fault diagnosis of rotating machinery [25,26]. Therefore, the WT is employed in our work.
The continuous wavelet transform (CWT) is an excellent signal processing tool. Its mathematical expression is as follows:

$$\mathrm{CWT}(\gamma, \varepsilon) = \gamma^{-1/2} \int_{-\infty}^{+\infty} x(t)\, \theta^{*}\!\left(\frac{t - \varepsilon}{\gamma}\right) dt$$

where $x(t)$ is the raw 1-D vibration signal, $\theta$ is the mother wavelet, and $*$ denotes the complex conjugate. $\varepsilon$ denotes the translation parameter, and $\gamma$ represents the scale factor. The factor $\gamma^{-1/2}$ normalizes the energy across scales so that the transformed signal has the same energy on each scale [27]. The CWT converts the raw time-domain signals into 2-D time-frequency images that can be used as the input of the convolutional layer [28]. The CWT has a fast response speed and provides information with a good balance between the time and frequency resolutions [29,30]. Therefore, we employ the CWT as the signal processing tool to convert the raw data from the time domain into the time-frequency domain.
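The CWT definition above can be illustrated with a direct discretisation of the integral. The following is only a minimal sketch: it assumes a real-valued Morlet wavelet (so the complex conjugate has no effect), a hypothetical centre frequency `w0 = 5`, and a synthetic 50 Hz test tone; none of these settings are taken from the paper.

```python
import numpy as np

def morlet(t, w0=5.0):
    """Real-valued Morlet mother wavelet; w0 is an assumed centre frequency."""
    return np.exp(-0.5 * t ** 2) * np.cos(w0 * t)

def cwt(x, scales, dt=1.0, w0=5.0):
    """Discretised CWT: one row per scale, one column per translation."""
    n = len(x)
    t = np.arange(n) * dt
    out = np.empty((len(scales), n))
    for i, gamma in enumerate(scales):
        # arg[e, k] = (t_k - t_e) / gamma: rows are translations, columns are times.
        arg = (t[None, :] - t[:, None]) / gamma
        # Riemann sum of x(t) * wavelet((t - eps) / gamma), scaled by gamma^(-1/2).
        out[i] = (morlet(arg, w0) @ x) * dt / np.sqrt(gamma)
    return out

# Synthetic example (hypothetical): a 50 Hz tone sampled at 1 kHz for 0.5 s.
fs = 1000.0
t = np.arange(500) / fs
x = np.sin(2 * np.pi * 50.0 * t)
freqs = np.array([10.0, 25.0, 50.0, 100.0, 200.0])
scales = 5.0 / (2 * np.pi * freqs)   # scale whose wavelet oscillates at each frequency
coef = cwt(x, scales, dt=1.0 / fs)   # 2-D time-frequency "image"
```

The rows of `coef` form the time-frequency image; for the test tone, the row associated with 50 Hz carries most of the energy. For long signals, an FFT-based implementation would be preferable to this quadratic-cost loop.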

Temporal Convolutional Network
The TCN, an improved 1-D CNN, was proposed by Bai et al. [31] and has a powerful memory ability to mine historical information from sequential data. The structure of the TCN is shown in Figure 1. It utilizes dilated causal convolution (DCC) and residual learning [32]. The K and D in Figure 1 represent the kernel size and dilation, respectively. Unlike the traditional 1-D CNN, the TCN utilizes DCC to capture more useful historical information without stacking too many layers [33]. Furthermore, the DCC results in two other characteristics. First, the current output of the TCN is only affected by information from the current and past moments. Second, the output and input of the TCN have the same sequence length. Residual learning enables the TCN to mitigate the impact of exploding or vanishing gradients on the performance of the deep network. TCNs have advantages over recurrent neural networks (RNNs) in parallel data processing. The performance of TCNs has exceeded that of RNNs in many fields, such as audio synthesis and machine translation [31]. Therefore, TCNs have helped researchers reduce the dependence on RNNs in sequence modeling and temporal feature extraction tasks, which is beneficial to the diversity of solutions.
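The two properties of dilated causal convolution described above (no dependence on future inputs, and equal input and output lengths) can be checked with a small NumPy sketch. The function names, the kernel, and the dilation values are illustrative assumptions, and the receptive-field formula assumes one convolution per layer rather than the exact residual block of Figure 1.

```python
import numpy as np

def dilated_causal_conv1d(x, kernel, dilation=1):
    """y[t] = sum_j kernel[j] * x[t - j*dilation], with zeros before t = 0."""
    x = np.asarray(x, dtype=float)
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])   # left padding only: no future leakage
    y = np.zeros(len(x))
    for j, w in enumerate(kernel):
        start = pad - j * dilation            # tap j looks j*dilation steps back
        y += w * xp[start:start + len(x)]
    return y

def receptive_field(kernel_size, dilations):
    """Receptive field of stacked layers with one dilated convolution each."""
    return 1 + (kernel_size - 1) * sum(dilations)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = dilated_causal_conv1d(x, kernel=[1.0, 1.0], dilation=2)   # x[t] + x[t-2]
rf = receptive_field(kernel_size=3, dilations=[1, 2, 4])      # 15 time steps
```

Doubling the dilation at each layer lets the receptive field grow exponentially with depth while the number of weights grows only linearly, which is why few layers are needed.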

Broad Learning System with Incremental Learning of Additional Enhancement Nodes
Traditional DL algorithms typically suffer from a complex model structure, causing time-consuming training. Chen and Liu [17] used a random vector functional-link neural network and proposed the BLS to overcome the dependence on the deep structure of the DL algorithms. The incremental learning of additional enhancement nodes (ILAEN) [17], used in this study, adjusts the model to provide better performance without requiring retraining. The structure of the IBLS is shown in Figure 2. The mapped features ($Z_i$) and the enhancement nodes ($H_j$) are expressed as follows:

$$Z_i = \phi(X W_{ei} + b_{ei}), \quad i = 1, 2, \cdots, n$$

$$H_j = \varepsilon(Z_n W_{hj} + b_{hj}), \quad j = 1, 2, \cdots, m$$

where $Z_n \equiv [Z_1, Z_2, \cdots, Z_n]$. $X$ is the input data. $W_{ei}$ and $b_{ei}$ represent the weights and biases of the mapped feature node of the $i$th set, respectively. $W_{hj}$ and $b_{hj}$ are the weights and biases of the enhancement node of the $j$th set. $\phi(\cdot)$ and $\varepsilon(\cdot)$ are activation functions. $W_m$ can be obtained from the following equation:

$$W_m = [Z_n \mid [H_1, H_2, \cdots, H_m]]^{\aleph}\, Y \tag{5}$$

where $(\cdot)^{\aleph}$ represents the operation to derive the pseudo-inverse matrix, and $Y$ denotes the labels. When the performance of the BLS with the ILAEN is unsatisfactory, the model can be rapidly adjusted by adding additional enhancement nodes. Only the pseudo-inverse matrix of the additional enhancement nodes has to be computed without retraining the entire model. Denote $A_m = [Z_n \mid [H_1, H_2, \cdots, H_m]]$ and $F = (A_m)^{\aleph} H_{m+1}$; the weights ($W_{m+1}$) of additional enhancement nodes are defined in Equations (6) and (7). The derivation process of the IBLS has been described in [17]. The BLS has been widely used in classification and regression tasks due to its advantages of fast training speed and high accuracy. Besides, the BLS enables researchers to build models without being limited to deep network structures. However, the accuracy of the BLS largely depends on the number of nodes, resulting in redundant nodes and trainable parameters [34].
Sensors 2023, 23, 5642
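The incremental update can be sketched numerically. The block below is a minimal illustration, not the authors' implementation: it follows the standard block pseudo-inverse update from the BLS literature [17], written as $C = H_{m+1} - A_m F$, $B^{T} = C^{\aleph}$ (when $C \neq 0$), and $W_{m+1} = [W_m - F B^{T} Y;\ B^{T} Y]$, which is assumed here to correspond to Equations (6) and (7). The toy data, node counts, and Tanh activations are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_layer(n_in, n_out):
    """Random weights and biases drawn uniformly from [-1, 1]."""
    return rng.uniform(-1, 1, (n_in, n_out)), rng.uniform(-1, 1, n_out)

# Toy data (hypothetical): 200 samples, 8 inputs, 3 one-hot classes.
X = rng.normal(size=(200, 8))
Y = np.eye(3)[rng.integers(0, 3, size=200)]

We, be = random_layer(8, 10)
Z = np.tanh(X @ We + be)             # mapped features Z_n
Wh, bh = random_layer(10, 20)
H = np.tanh(Z @ Wh + bh)             # enhancement nodes H_m
A = np.hstack([Z, H])                # A_m = [Z_n | H_m]
A_pinv = np.linalg.pinv(A)
W = A_pinv @ Y                       # W_m = (A_m)^+ Y

# Add one extra group of enhancement nodes without refitting A_m.
Wh2, bh2 = random_layer(10, 5)
H_new = np.tanh(Z @ Wh2 + bh2)       # H_{m+1}
F = A_pinv @ H_new
C = H_new - A @ F                    # part of H_{m+1} not explained by A_m
if np.linalg.norm(C) > 1e-10:
    Bt = np.linalg.pinv(C)
else:
    Bt = np.linalg.solve(np.eye(F.shape[1]) + F.T @ F, F.T @ A_pinv)
W_inc = np.vstack([W - F @ (Bt @ Y), Bt @ Y])   # incrementally updated weights

# Reference: refit from scratch on the widened matrix.
W_full = np.linalg.pinv(np.hstack([A, H_new])) @ Y
```

In this toy setting the incremental weights agree with the weights obtained by recomputing the pseudo-inverse of the widened matrix, while only the small matrix `C` has to be pseudo-inverted.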

Evaluation Metrics
Four common evaluation metrics (accuracy, macro-precision (MP), macro-recall (MR), and macro-F1 score (MF)) are used to evaluate the model's classification performance. The accuracy is the ratio of the number of correctly classified samples to the total number of samples [35]. The MP and the MR are the averages of the precision values and the recall values over all labels, respectively. The MF is the harmonic mean of the MP and the MR. The four evaluation metrics are defined as follows:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{MP} = \frac{1}{n} \sum_{i=1}^{n} P_i, \qquad \mathrm{MR} = \frac{1}{n} \sum_{i=1}^{n} R_i$$

$$\mathrm{MF} = \frac{2 \times \mathrm{MP} \times \mathrm{MR}}{\mathrm{MP} + \mathrm{MR}}$$

where $n$ is the number of labels, and $P_i$ and $R_i$ represent $TP/(TP + FP)$ and $TP/(TP + FN)$ for the $i$th label, respectively. TP, FP, TN, and FN denote the true positive, false positive, true negative, and false negative, respectively.
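The four metrics can be computed from predicted and true labels as in the sketch below. It follows the definitions given here, in particular the MF as the harmonic mean of the MP and the MR (rather than the mean of per-label F1 scores); the function name and the toy labels are illustrative assumptions.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, MP, MR, MF) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)   # P_i
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)      # R_i
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    mp = sum(precisions) / len(labels)
    mr = sum(recalls) / len(labels)
    mf = 2 * mp * mr / (mp + mr) if mp + mr else 0.0            # harmonic mean
    return accuracy, mp, mr, mf

# Toy example with three labels (hypothetical data).
acc, mp, mr, mf = macro_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0])
```

For the toy labels, the per-label precisions are 1/2, 2/3, and 1, and the per-label recalls are 1/2, 1, and 1/2, giving MP = 13/18, MR = 2/3, and MF = 52/75.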

Proposed Method
An intelligent fault diagnosis framework (shown in Figure 3a) for rotating machinery (LTCN-IBLS), which has excellent fault diagnosis performance and fewer trainable parameters, is proposed based on the 1-D LTCN, the 2-D LTCN, and the IBLS. In traditional fault diagnosis models, the extracted features must be input into the fully connected classifier for fault identification. However, stacking multiple dense layers is generally required to meet diagnostic requirements and improve accuracy, resulting in excessive trainable parameters for the classifier. Therefore, we replace the dense layers with the IBLS to achieve a lightweight classifier. In addition, we propose a feature extraction stage (stage 1) with a lightweight network structure to perform adaptive representation learning of data before identifying the fault classes to minimize the redundancy of the IBLS nodes and trainable parameters and achieve high accuracy.
In stage 1, two branches are used to obtain comprehensive fault information from time-frequency features and temporal features: a 2-D LTCN backbone (branch 1) and a 1-D LTCN backbone (branch 2). The raw vibration signal is divided into multiple samples without shuffling; each sample contains N data points. We consider the following two factors to determine the value of N. First, each sample must contain enough data points to represent the fault features of a full rotation cycle of the equipment. Second, N is minimized subject to the first factor to reduce the computational cost. Therefore, N is determined by the rule in Equation (12), where $f_s$ is the sampling frequency, and $s_r$ is the rotation speed of the equipment.
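As an illustration of the two factors, the sketch below assumes that the rule amounts to the smallest integer number of points covering one full rotation, that is, the ceiling of $60 f_s / s_r$ with $f_s$ in Hz and $s_r$ in rpm. This closed form is an assumption; it reproduces the sample lengths reported for the two public datasets (417 and 600), but the exact rule used by the authors may contain additional terms.

```python
import math

def points_per_sample(fs_hz, speed_rpm):
    """Smallest N covering one shaft rotation (assumed form of the rule)."""
    rotations_per_second = speed_rpm / 60.0
    return math.ceil(fs_hz / rotations_per_second)

n_cwru = points_per_sample(12_000, 1730)   # 12 kHz, 1730 rpm -> 417
n_case2 = points_per_sample(20_000, 2000)  # 20 kHz, 2000 rpm -> 600
```

Using the minimum rotation speed of a dataset gives the longest rotation period, so every sample still covers at least one full cycle at the other operating speeds.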

In branch 1 (shown in Figure 3a), the raw signal is converted into time-frequency images using the CWT. The sampling period of the frequency is the same as that of the vibration signal. The Morlet wavelet is used in our work because its shape is similar to the pulse signal of mechanical faults, facilitating fault diagnosis [4]. The 2-D LDCC block (shown in Figure 3b) consists of two group convolution (GConv) layers, two cutting layers, two batch normalization (BN) layers, two ReLU layers, one channel shuffle (CS) layer, and one adaptive max pooling (AMP) layer.
The 2-D LTCN Block 1 and 2-D LTCN Block 2 are shown in Figure 3c. They consist of three GConv layers, two cutting layers, two BN layers, two ReLU layers, two CS layers, and two AMP layers. The GConv layers substantially reduce the number of trainable parameters of the 2-D LTCN backbone by grouping the input time-frequency features. The CS layers ensure the information flow between different groups by exchanging features among them. The cutting layers remove the frequency features of future moments from the feature images to ensure that the output of the 2-D LTCN backbone has strict time constraints. The AMP 1 of the two 2-D LTCN blocks adjusts the size of the feature images after the cutting layers to obtain a residual connection. The output size of the AMP of the 2-D LDCC block is 25 × 25. The output sizes of the 2-D LTCN block 1 and block 2 are 11 × 11 and 1 × 1, respectively. It is worth noting that the cutting layers are activated only in the direction of the time axis during convolution to ensure that the frequency information at the edge of the feature maps is not lost. The detailed parameters of the 2-D LTCN backbone are summarized in Table 2.
In branch 2, a 1-D LTCN backbone with a one-dimensional lightweight DCC (1-D LDCC) block and two 1-D LTCN blocks is utilized to extract the sequential features from the raw data. The 1-D LDCC Block (shown in Figure 3d) is composed of two 1-D GConv layers, two cutting layers, two BN layers, two ReLU layers, one CS layer, and one AMP layer.
The 2-D and 1-D LTCN backbones ensure that the feature extraction stage of the LTCN-IBLS has sufficient receptive fields to extract representative temporal features and high-level semantic information from the time-frequency images and raw vibration signals while minimizing the number of trainable parameters. The cross-entropy function (Equation (13)) is used as the loss function for training the 2-D LTCN backbone and the 1-D LTCN backbone.
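The effect of the GConv and CS layers can be illustrated with a short NumPy sketch. The channel counts, kernel size, and group numbers below are hypothetical and are not the values in Table 2; the shuffle follows the common reshape-transpose-reshape formulation, which is assumed to match the CS layer used here.

```python
import numpy as np

def conv1d_params(c_in, c_out, kernel_size, groups=1, bias=True):
    """Trainable parameters of a (grouped) 1-D convolution layer."""
    assert c_in % groups == 0 and c_out % groups == 0
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * kernel_size + (c_out if bias else 0)

def channel_shuffle(x, groups):
    """Interleave channels across groups; x has shape (channels, length)."""
    c, n = x.shape
    return x.reshape(groups, c // groups, n).transpose(1, 0, 2).reshape(c, n)

dense = conv1d_params(16, 16, 3, groups=1, bias=False)     # 768 weights
grouped = conv1d_params(16, 16, 3, groups=4, bias=False)   # 192 weights

x = np.arange(6).reshape(6, 1)             # channels 0..5, two groups of three
shuffled = channel_shuffle(x, groups=2)    # channel order: 0, 3, 1, 4, 2, 5
```

Grouping divides the weight count by the number of groups, and the shuffle lets the next grouped convolution mix channels that came from different groups.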
The time-frequency features ($F_1$) extracted by the 2-D LTCN backbone and the sequential features ($F_2$) extracted by the 1-D LTCN backbone are fused into fused feature vectors ($Fu$).
After completing the representation learning of data in stage 1, we input the extracted fault features into the fault diagnosis stage (stage 2) and adopt the IBLS to replace the fully connected classifier to improve the ability to establish complex nonlinear relationships between features and fault classes. In stage 2, the IBLS receives the fused feature vectors from the feature extraction stage and diagnoses the faults. Z, H, and A in Figure 3a denote the feature mapping nodes, enhancement nodes, and additional enhancement nodes, respectively. The feature mapping nodes and the enhancement nodes are defined as follows:

$$Z_i = \phi(Fu\, W_{ei} + b_{ei}), \quad i = 1, 2, \cdots, r$$

$$H_j = \varepsilon(Z_r W_{hj} + b_{hj}), \quad j = 1, 2, \cdots, s$$

where $Z_r \equiv [Z_1, Z_2, \cdots, Z_r]$. $\phi(\cdot)$ and $\varepsilon(\cdot)$ are Tanh functions. $W_{ei}$ and $b_{ei}$ denote the weights and the biases of the $i$th set of the feature mapping nodes, respectively. They are randomly generated in the range of −1 to 1. $W_{hj}$ and $b_{hj}$ represent the weights and the biases of the $j$th set of the enhancement nodes and are also randomly generated in the range of −1 to 1. r and s are 1. Equation (5) is used to define the input weights of the output layer in Equation (16). When the fault diagnosis results are unsatisfactory, additional enhancement nodes can be added to improve the results. According to Equations (6) and (7), denoting $F = (A_s)^{\aleph} H_{s+1}$, the weights ($W_{s+1}$) between the additional enhancement nodes and the output layer are obtained using Equations (17) and (18), where $(\cdot)^{\aleph}$ represents the operation to acquire the pseudo-inverse matrix, and $Y$ denotes the fault labels.
The proposed LTCN-IBLS framework is programmed using PyTorch 1.7.1. The framework's workflow is summarized in Figure 4.

Results and Discussion
We used three datasets (two public datasets and one laboratory dataset) to verify the fault diagnosis performance of the proposed LTCN-IBLS framework and compared it with six other powerful models (R-O-IBLS, Deep Convolutional Neural Networks with Wide First-layer Kernels (WDCNN) [36], 1-D deep CNN (1-D DCNN) [9], Deep ResNet [10], CWT-CNN [29], and AlexNet [37]). In the R-O-IBLS, the 2-D LTCN backbone shown in Figure 3a was replaced by a fully connected layer. During training, the learning rate was multiplied by 0.9 after every five epochs. The batch size was 0.1 times the number of training samples. The model was assumed to have converged when the change in the validation loss was less than 0.01.
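The training settings above can be written as three small helpers. This is a sketch under stated assumptions: the initial learning rate (0.01 in the example) is hypothetical because it is not given here, and the convergence test is interpreted as the absolute change in validation loss between two consecutive epochs.

```python
def learning_rate(initial_lr, epoch, decay=0.9, step=5):
    """Learning rate after multiplying by `decay` every `step` epochs."""
    return initial_lr * decay ** (epoch // step)

def batch_size(n_train, fraction=0.1):
    """Batch size as a fixed fraction of the training set size."""
    return max(1, int(round(n_train * fraction)))

def has_converged(val_losses, tol=0.01):
    """True when the last change in validation loss is below `tol`."""
    return len(val_losses) >= 2 and abs(val_losses[-1] - val_losses[-2]) < tol

lr_epoch_4 = learning_rate(0.01, 4)     # still the initial rate
lr_epoch_10 = learning_rate(0.01, 10)   # decayed twice: 0.01 * 0.9**2
bs = batch_size(1400)                   # 140 samples per batch
```

In PyTorch, the same step decay is available as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.9)`.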
Gaussian white noise with a signal-to-noise ratio (SNR) equal to 0 was introduced into the datasets to verify the models' robustness. Five-fold cross-validation was utilized to assess the performance of the training process. Four evaluation metrics (accuracy, MR, MP, and MF) and the number of trainable parameters were employed to assess the results of the experiments. The objective was to obtain high evaluation metrics while minimizing the number of trainable parameters. All the experimental results are the average value of ten repeated trials to reduce the effect of randomness. All the experiments were implemented on a computer with two NVIDIA Tesla V100s graphics cards, and programming was performed with PyTorch 1.7.1.
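Adding Gaussian white noise at a prescribed SNR can be sketched as follows. The block assumes that the SNR is expressed in decibels, so that an SNR of 0 means equal signal and noise power; the test tone and the random seed are hypothetical.

```python
import numpy as np

def add_white_noise(signal, snr_db, rng):
    """Return signal plus zero-mean Gaussian noise at the requested SNR (dB)."""
    signal = np.asarray(signal, dtype=float)
    p_signal = np.mean(signal ** 2)                  # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # noise power for this SNR
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 50.0 * np.arange(20_000) / 12_000.0)
noisy = add_white_noise(clean, snr_db=0.0, rng=rng)

# Empirical SNR of the generated data, in dB.
measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
```

At 0 dB the noise carries as much power as the vibration signal itself, which makes this a demanding robustness test.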

Dataset Description
The CWRU dataset [38] provided by the bearing center at CWRU is a well-known and representative bearing dataset that contains sufficient standard experimental data. It has been used extensively for fault diagnosis research [39]. The sampling frequency in the experiments is 12 kHz. The equipment used in the experiment is shown in Figure 5. Three types of faults (inner race defect, ball defect, and outer race defect) were created in the experimental bearing components using electro-discharge machining. Each fault type has bearing components with three fault diameters (7 mils, 14 mils, and 21 mils). The experiments were conducted under four different loads (0 hp, 1 hp, 2 hp, and 3 hp). Therefore, the dataset had ten classes (one healthy class and nine fault classes). The minimum rotation speed of the facility under the four different loads was 1730 rpm. According to Equation (12), the number of data points per sample was 417. Without shuffling or using duplicate data, 70% of the data was used for training, and 30% was used for testing. The details of the dataset are summarized in Table 3.
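The segmentation and the unshuffled 70/30 split can be sketched as below. The signal is a synthetic placeholder, and the split is assumed to be applied to the ordered list of non-overlapping samples; whether the original split was made before or after segmentation is not stated here.

```python
import numpy as np

def segment(signal, n_points):
    """Cut a 1-D signal into consecutive, non-overlapping samples."""
    signal = np.asarray(signal)
    n_samples = len(signal) // n_points            # drop the incomplete tail
    return signal[: n_samples * n_points].reshape(n_samples, n_points)

def chronological_split(samples, train_fraction=0.7):
    """Split without shuffling: the earliest samples are used for training."""
    n_train = int(round(len(samples) * train_fraction))
    return samples[:n_train], samples[n_train:]

signal = np.arange(417 * 100 + 50)                 # placeholder for a vibration record
samples = segment(signal, n_points=417)            # shape (100, 417)
train, test = chronological_split(samples)         # 70 and 30 samples
```

Because the samples do not overlap and the order is preserved, no data point appears in both the training and test sets.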
The CWRU dataset [38] provided by the bearing center at CWRU is a well-known and representative bearing dataset that contains sufficient standard experimental data. It has been used extensively for fault diagnosis research [39]. The sampling frequency in the experiments is 12 kHz. The equipment used in the experiment is shown in Figure 5. Three types of faults (inner race defect, ball defect, and outer race defect) were created in the experimental bearing components using electro-discharge machining. Each fault class has bearing components with three diameters (7 mils, 14 mils, and 21 mils). The experiments were conducted under four different loads (0 hp, 1 hp, 2 hp, and 3 hp). Therefore, the dataset had ten fault classes (one healthy class and 9 fault classes). The minimum rotation speed of the facility under the four different loads was 1730 rpm. According to (12), the number of data points per sample was 417. Without shuffling or using duplicate data, 70% of the data was used for training, and 30% was used for testing. The details of the dataset are summarized in Table 3.  The mean values (MV) and standard deviations (SD) of the ablation experiments' results are summarized in Table 4    The mean values (MV) and standard deviations (SD) of the ablation experiments' results are summarized in Table 4 Figure 6) was used to perform a bearing run-to-failure test to collect the bearing data. The four test bearings were installed on a shaft driven by an AC motor with a rotation speed of 2000 rpm. A constant load of 6000 lbs was added to the shaft, and the sampling rate was 20 kHz [41]. After a certain amount of metal debris had adhered to the magnetic plug, an advanced stage of degradation was reached, and the test was stopped. 
The samples of the three fault classes (healthy, inner race defect, and roller element defect) were obtained from the data files collected on 25 November 2003 [40], and those of the outer race defect were obtained from the data files collected on 19 February 2004 [40]. According to [12], the number of data points of each sample was 600. Without shuffling the data, we used 357,000 data points from each fault class as the training dataset and 153,000 as the test dataset. The details of the dataset are presented in Table 8.  Figure 6) was used to perform a bearing run-to-failure test to collect the bearing data. The four test bearings were installed on a shaft driven by an AC motor with a rotation speed of 2000 rpm. A constant load of 6000 lbs was added to the shaft, and the sampling rate was 20 kHz [41]. After a certain amount of metal debris had adhered to the magnetic plug, an advanced stage of degradation was reached, and the test was stopped. The samples of the three fault classes (healthy, inner race defect, and roller element defect) were obtained from the data files collected on 25 November 2003 [40], and those of the outer race defect were obtained from the data files collected on 19 February 2004 [40]. According to [12], the number of data points of each sample was 600. Without shuffling the data, we used 357,000 data points from each fault class as the training dataset and 153,000 as the test dataset. The details of the dataset are presented in Table 8.    As shown in Table 9, the LTCN-IBLS achieved the best performance in the ablation experiments based on all four evaluation metrics, with an accuracy of 0.9999, an MR of 0.9999, an MP of 0.9999, and an MF of 0.9999. The OLTCN-TLTCN achieved the second-best performance, and all metric values were 0.9961. The OLTCN-IBLS and TLTCN-IBLS had similar values of the four metrics, ranging from 0.9860 to 0.9880. The overall fault diagnosis results of the different models for case 2 are summarized in Table 10. 
The LTCN-IBLS achieved the highest performance, with the highest values of the four performance metrics and the lowest number of trainable parameters. The four metric values of the LTCN-IBLS were 0.0141, 0.0131, 0.0139, and 0.0141 higher than those of the AlexNet, and the number of trainable parameters of the LTCN-IBLS was 57.0035 M lower than that of the AlexNet.
This dataset consisted of vibration data obtained from an experimental platform of a multi-stage centrifugal air compressor unit (MCACU). It contained several common faults of gears and bearings. Five fault components (three bearings with inner race defects, outer race defects, and missing balls, and two gears with missing teeth) were used in our experiments, and both single faults and compound faults were considered. The details of the five fault components are displayed in Figure 7 and Table 13. The components and parameters of the experimental platform are shown in Figure 8a and Table 14, respectively. Figure 8b shows the details of the parts inside the red rectangle in Figure 8a, including the positions of the fault components and the velocity sensor used for data sampling.
A data acquisition system (Figure 8c) was used to acquire and store the vibration data. The system includes a velocity sensor (Figure 8b), a signal conditioning module (not shown), a server, a data collector module, and a monitor. Two independent experiments with the MCACU were conducted in the same environment (temperature: 20 ± 1 °C, relative humidity: 65 ± 5%) to obtain the training samples and the test samples. Gross errors were removed during data collection. The sampling rate was 1024 Hz, and the data were saved every 20 s (8192 data points were saved each time). The data of the first 10 s of each experiment were not saved so that only stable operation data were used in the analysis. Following [12], each sample contained 248 (62 × 4) data points. For each fault class, 347,200 data points were collected to form 1400 training samples, and 148,800 data points were collected to form 600 test samples. The details of the dataset are summarized in Table 15. The average results of the ablation experiments for case 3 are summarized in Table 16.
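The sample counts for cases 2 and 3 follow from cutting each recording into consecutive, non-overlapping samples. The sketch below illustrates that arithmetic only; it is not the authors' preprocessing code. The function name `segment`, the placeholder signals, and the assumption that the training span of case 2 precedes the test span are all hypothetical.

```python
def segment(signal, sample_len):
    """Cut a 1-D sequence into consecutive, non-overlapping samples."""
    usable = len(signal) - len(signal) % sample_len  # drop a trailing partial sample
    return [signal[i:i + sample_len] for i in range(0, usable, sample_len)]


# Case 2 (assumed layout): one recording per class, first 357,000 points for
# training and the following 153,000 for testing, 600 points per sample.
recording = list(range(357_000 + 153_000))  # placeholder values, not real data
train_case2 = segment(recording[:357_000], 600)
test_case2 = segment(recording[357_000:], 600)
print(len(train_case2), len(test_case2))  # 595 255

# Case 3: training and test data come from two separate experiments,
# 248 points per sample.
train_case3 = segment(list(range(347_200)), 248)
test_case3 = segment(list(range(148_800)), 248)
print(len(train_case3), len(test_case3))  # 1400 600
```

Because the data are not shuffled before the split, no test sample shares points with a training sample.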

Diagnosis Results of the Experiments for Case 3 under Noisy Conditions
The ablation experiments' results under noisy conditions are shown in Table 18.

As shown in Figure 10, all samples of the healthy condition (label 0) are correctly classified, and only one sample of the single faults is misclassified. However, many samples of the compound faults (especially labels 2 and 4) are misclassified.
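The four evaluation metrics can be computed directly from a confusion matrix of the kind discussed here. The following is a minimal sketch, not the authors' evaluation code: the matrix values are made up for illustration, and MF is taken as the mean of the per-class F1 scores, which is one common definition and an assumption here.

```python
def macro_metrics(confusion):
    """confusion[i][j] = number of samples with true label i predicted as label j."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    accuracy = sum(confusion[k][k] for k in range(n)) / total
    recalls, precisions, f1_scores = [], [], []
    for k in range(n):
        tp = confusion[k][k]
        actual = sum(confusion[k])                          # row sum: true class k
        predicted = sum(confusion[i][k] for i in range(n))  # column sum: predicted k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        precisions.append(precision)
        f1_scores.append(f1)
    return {
        "accuracy": accuracy,
        "MR": sum(recalls) / n,
        "MP": sum(precisions) / n,
        "MF": sum(f1_scores) / n,
    }


# Hypothetical 3-class result with 10 test samples per class.
toy = [[10, 0, 0],
       [0, 8, 2],
       [0, 3, 7]]
metrics = macro_metrics(toy)
print({name: round(value, 4) for name, value in metrics.items()})
```

When every class has the same number of test samples, as in this toy matrix, accuracy and MR coincide, so the two metrics differ only for unbalanced test sets.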

Discussion
The proposed lightweight fault diagnosis framework for rotating machinery exhibited outstanding fault classification performance. Its effectiveness was verified by two types of experiments: (1) ablation experiments, which revealed the contributions of the framework's components, and (2) comparative experiments, which compared the performance of the proposed framework with that of other models. The robustness of the models was evaluated comprehensively using three datasets under noisy conditions.
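The noisy conditions are obtained by adding Gaussian white noise to the vibration data. A minimal sketch of that step is given below, assuming the SNR is expressed in decibels and defined as the ratio of signal power to noise power; the function names, the fixed seed, and the 50 Hz test tone are hypothetical and only serve the illustration.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, rng):
    """Return signal plus zero-mean white Gaussian noise at the requested SNR (dB)."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))  # SNR = 10*log10(Ps / Pn)
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR (dB) of a noisy signal relative to its clean version."""
    signal_power = sum(x * x for x in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical clean signal: a 50 Hz tone sampled at 20 kHz for one second.
clean = [math.sin(2 * math.pi * 50 * t / 20_000) for t in range(20_000)]
noisy = add_gaussian_noise(clean, snr_db=0, rng=rng)
print(round(measured_snr_db(clean, noisy), 2))  # close to the requested 0 dB
```

Under this assumption, an SNR of 0 dB means that the added noise carries as much power as the signal itself.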
The results of the ablation experiments for the three cases indicate that the fault diagnosis performance of the LTCN-IBLS was significantly better than that of the OLTCN-IBLS and TLTCN-IBLS, demonstrating that both the time-frequency domain information and the sequential features made non-negligible contributions to fault diagnosis. The fused features obtained in the feature extraction stage provided a comprehensive and advanced representation of the rotating machinery faults, which substantially improved the fault diagnosis performance of the framework. In addition, the ablation results (especially Tables 9 and 18) showed that the temporal features and the time-frequency features were almost equally important for fault diagnosis. The LTCN-IBLS outperformed the OLTCN-TLTCN in all three cases, indicating that the IBLS has a stronger classification ability than a traditional classifier with a fully connected layer. Ablation experiments were also conducted using the three datasets with added Gaussian white noise (SNR = 0). The LTCN-IBLS again achieved the best performance, further supporting the design of the framework.
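The comparison between the IBLS and a fully connected classifier rests on how a broad learning system is trained: random feature nodes and enhancement nodes are concatenated, and only the output weights are solved in closed form by ridge regression. The sketch below shows this basic (non-incremental) scheme on a toy two-class problem. It is not the authors' IBLS configuration: the node counts, the regularization value, the toy data, and all function names are assumptions, the incremental update of the output weights is omitted, and the small matrix helpers use only the standard library where a numerical library would normally be used.

```python
import math
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def solve(m, rhs):
    """Solve m @ w = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(m[i]) + list(rhs[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0.0:
                factor = aug[r][c]
                aug[r] = [v - factor * pv for v, pv in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def random_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, bias


def apply_layer(x, layer, activation):
    weights, bias = layer
    return [[activation(v + b) for v, b in zip(row, bias)] for row in matmul(x, weights)]


def bls_nodes(x, feat, enh):
    z = apply_layer(x, feat, lambda v: v)  # linear feature nodes
    h = apply_layer(z, enh, math.tanh)     # nonlinear enhancement nodes
    return [zr + hr for zr, hr in zip(z, h)]  # A = [Z | H]


def bls_fit(x, y_onehot, n_feature, n_enhance, lam, rng):
    feat = random_layer(len(x[0]), n_feature, rng)  # random, never trained
    enh = random_layer(n_feature, n_enhance, rng)   # random, never trained
    a = bls_nodes(x, feat, enh)
    at = transpose(a)
    gram = matmul(at, a)
    for i in range(len(gram)):
        gram[i][i] += lam                           # ridge term: A^T A + lam * I
    w_out = solve(gram, matmul(at, y_onehot))       # only trained parameters
    return feat, enh, w_out


def bls_predict(x, model):
    feat, enh, w_out = model
    scores = matmul(bls_nodes(x, feat, enh), w_out)
    return [max(range(len(s)), key=s.__getitem__) for s in scores]


rng = random.Random(0)
# Toy stand-in for fused features: two well-separated 2-D clusters.
x = [[rng.gauss(-2, 0.2), rng.gauss(-2, 0.2)] for _ in range(20)]
x += [[rng.gauss(2, 0.2), rng.gauss(2, 0.2)] for _ in range(20)]
labels = [0] * 20 + [1] * 20
y_onehot = [[1.0, 0.0] if label == 0 else [0.0, 1.0] for label in labels]

model = bls_fit(x, y_onehot, n_feature=6, n_enhance=8, lam=1e-3, rng=rng)
predictions = bls_predict(x, model)
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(accuracy)
```

Because the output weights come from a single linear solve rather than from gradient descent, training such a classifier involves no iterative backpropagation through the classifier itself.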
In the comparative experiments, the proposed LTCN-IBLS framework was compared with other state-of-the-art models on three datasets. The results showed that the LTCN-IBLS framework achieved outstanding fault diagnosis performance with the lowest number of trainable parameters in all three cases. The proposed 2-D LTCN backbone provided sufficient receptive fields to extract the global and high-level semantic information from the time-frequency images of the rotating machinery faults, enabling the framework to capture the changing trend of the frequency intensity over time. The proposed 1-D LTCN backbone had a strong temporal feature extraction ability and captured the amplitude variation over time hidden in the raw vibration data. The LTCN-IBLS showed better diagnostic performance than the R-O-IBLS in the three cases, indicating that the time-frequency features extracted by the 2-D LTCN backbone contained information indispensable for accurate fault identification. The features extracted by the 2-D LTCN and 1-D LTCN were subject to stricter time constraints than those extracted by the other models (WDCNN, 1-D DCNN, Deep ResNet, CWT-CNN, and AlexNet). This strategy ensured that the feature representations corresponding to different fault classes could be distinguished, enabling the classifier to identify the fault categories accurately. In addition, unlike the traditional dense channels of the TCN, the lightweight 2-D LTCN and 1-D LTCN backbones substantially reduced the number of trainable parameters, lowering the complexity of the network structure. The experimental results of the three cases demonstrated that the introduction of noise degraded the fault diagnosis performance of all models. Nevertheless, the proposed LTCN-IBLS still achieved the best performance under noisy conditions, indicating that it is more robust than the comparable models.
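The time constraints discussed above correspond to causal convolution, the basic operation of a temporal convolutional network: the output at time t may depend on the input at time t and earlier, never on later inputs. The sketch below shows a plain 1-D causal dilated convolution and the receptive field of a stack with one convolution per dilation level. It is a generic illustration rather than the LTCN architecture; the kernel, the dilation values, and the function names are hypothetical, and the 2-D variant along the time axis of a time-frequency image is not covered.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with zeros before the start of x."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding; indices after t are never read
                acc += w * x[idx]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Receptive field of stacked causal convolutions, one per dilation level."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1, 2, 3, 4, 5]
y = causal_dilated_conv1d(x, kernel=[1, 1], dilation=2)
print(y)  # [1.0, 2.0, 4.0, 6.0, 8.0]

# Causality check: changing the last input leaves all earlier outputs unchanged.
x_changed = [1, 2, 3, 4, 50]
y_changed = causal_dilated_conv1d(x_changed, kernel=[1, 1], dilation=2)
print(y_changed[:4] == y[:4])  # True

print(receptive_field(kernel_size=3, dilations=[1, 2, 4, 8]))  # 31
```

Doubling the dilation at each level makes the receptive field grow exponentially with depth while the number of weights grows only linearly, which is how such a stack can cover a long time span with few parameters.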
As shown in Table 20, the convergence time of the training process (CTTP) was compared for the models with 2-D data processing capability (CWT-CNN, AlexNet, and LTCN-IBLS). The proposed LTCN-IBLS converged fastest during training on all three datasets. Notably, it did so even though the input size of the CWT-CNN is smaller than that of the LTCN-IBLS, indicating that the LTCN-IBLS has better training performance.

Conclusions
In this study, we proposed an intelligent framework (called LTCN-IBLS) consisting of a feature extraction stage and a fault identification stage for data representation learning and fault diagnosis. In the feature extraction stage, the 1-D LTCN and 2-D LTCN backbones were used to extract the time-dependent information, including the time-frequency features and the temporal features. Information from future inputs along the time axis was removed, while the complete frequency edge and corner information was retained during the extraction of the time-frequency features. The time-frequency features were fused with the temporal features to obtain more in-depth and higher-quality information on the faults and thus improve the performance of the IBLS classifier in the fault identification stage. The IBLS classifier established an accurate mapping between the fused features and the fault categories and exhibited a better nonlinear mapping ability than fully connected layers. Ablation experiments demonstrated the rationality and contributions of the framework's components. Under non-noisy conditions, the mean values (MVs) of the accuracy, MP, MR, and MF of the LTCN-IBLS were up to 0.0560, 0.0426, 0.0560, and 0.0584 higher than those of the comparable models. Under noisy conditions, the MVs of the accuracy, MP, MR, and MF of the LTCN-IBLS were up to 0.1446, 0.1365, 0.1444, and 0.1492 higher than those of the comparable models. The LTCN-IBLS had the lowest number of trainable parameters (≤0.0165 M) among all models. The experimental results demonstrate that the proposed framework is effective, lightweight, and robust for fault diagnosis.
This study provided insights and solutions for establishing lightweight neural network models that diagnose faults of rotating machinery with minimal manual intervention. However, due to the large amount of computation required to process 2-D data, our proposed model is slower than many other models with 1-D input, although it has a low number of trainable parameters. In addition, Figures 9 and 10 indicate that compound faults are more difficult to diagnose than single faults. Therefore, in future studies, we will focus on more effective intelligent algorithms for improving the computation speed of the models and diagnosing the compound faults of rotating machinery.