Motor Bearing Fault Diagnosis Based on Current Signal Using Time–Frequency Channel Attention

As rolling bearings are core components of the drive motor in electric vehicles, their accurate fault diagnosis is key to ensuring the safe operation of electric vehicles. At present, intelligent diagnostic methods based on current signals (CSs) are widely used because CSs are easy to collect, low-cost, and non-invasive. However, in practical applications, the fault characteristics of CSs are weak, so diagnostic performance falls short of expected standards. In this paper, a diagnosis method is proposed to address this problem and improve diagnostic accuracy. First, the CSs of two phases are processed by periodic resampling to enhance data features and are then fused by splicing. Next, a feature enhancement module is constructed that uses multi-scale feature fusion to decompose the input. Finally, a diagnosis model is constructed using an improved channel attention module (CAM) to enhance diagnostic performance. Experiments on two different bearing datasets show that the proposed method extracts high-quality fault features and improves diagnostic accuracy, indicating its potential for intelligent fault diagnosis and maintenance of electric vehicles.


Introduction
Rolling bearings are critical components of permanent magnet synchronous motors, which are the core power output components of electric vehicles. The bearings must therefore operate reliably while an electric vehicle is running, so research on diagnostic methods for bearing faults is of great importance for the safe driving of electric vehicles [1].
In recent years, with the rapid development of artificial intelligence algorithms, deep learning has been widely applied in bearing fault diagnosis [2][3][4][5][6]. Early deep learning studies of bearing fault diagnosis often fed vibration signals (VSs), which contain rich information on the health status of bearings, into the models [7][8][9]. Lee et al. [10] proposed a fault diagnosis method that uses the classical deep learning model, Convolutional Neural Networks (CNNs), to analyze VSs from motors. Wang et al. [11] proposed a fault diagnosis method that combines the Short-Time Fourier Transform, used to extract time-frequency features of vibration signals, with a CNN, achieving a classification accuracy of approximately 97%. Deng et al. [12] proposed a multi-sensor fusion method in which signals collected by multiple vibration sensors are transformed into time-frequency images via the Continuous Wavelet Transform. The images from the different sensors are then concatenated through pooling and classified using a 2D CNN classifier, achieving a classification accuracy of approximately 98%. As research progressed, some scholars considered the operating status of mechanical equipment under actual working conditions and pointed out that, because space in the drive system of new energy vehicles is narrow, it is difficult to install additional sensors on the motor, which makes vibration-based fault diagnosis methods hard to apply in new energy vehicles [13]. CSs, in contrast, do not require additional sensors to be installed in electric vehicles. Therefore, CS-based bearing diagnosis methods are considered to have broad development prospects [14][15][16][17].
World Electr. Veh. J. 2024, 15, 281
Blodt et al. [18] treated the sudden vibrations caused by bearing faults as periodic load disturbances and derived a formula for the fault current characteristic frequency from the physical equations of the motor. Based on this theory, Lessmeier et al. [19] extracted 23 features from the original CSs in the time domain, frequency domain, and time-frequency domain. The maximum separation distance was used as an indicator of the importance of each feature, and, according to the evaluation results, nine features were selected as the input of a machine learning algorithm. The final classification accuracy was about 93.3%, which is not satisfactory. Jiang et al. [20] proposed a hybrid feature extraction method that uses wavelet transform and empirical mode decomposition to capture more comprehensive information from current signals; these features were classified by artificial neural networks with good diagnostic results. Li et al. [21] proposed a method that combines generative adversarial networks with expert diagnostic knowledge, obtains fault knowledge from normal signals, and evaluates the system status through the Wasserstein distance index. Using line spectrum feature technology, the method removes the main frequency component of the current signal to strengthen the feature expression of the fault signal. Although this method achieves excellent diagnostic results, it cannot effectively classify faults. Sabir et al. [22] decomposed CSs with wavelet packets to extract eight features, which were used as the input of a long short-term memory (LSTM) network; the final classification accuracy was about 96%. Ince et al. [23] proposed a method combining the frequency features of the CS with an adaptive one-dimensional (1D) CNN, achieving a final classification accuracy of approximately 97%. Hoang et al. [24] proposed a diagnostic method based on the fusion of CSs, in which the two-phase CSs were separately converted into grayscale images. These images were then input into different CNN models to obtain diagnostic results. The probability matrices produced by these diagnoses served as the training set for a Multi-Layer Perceptron (MLP), which ultimately achieved a classification accuracy of approximately 98%.
In summary, current CS-based bearing fault diagnosis methods face several challenges and limitations, mainly in two respects: first, because the fault features of CSs are weak, features usually have to be extracted manually before being fed into the diagnostic model; second, the diagnostic performance of CS-based methods is still unsatisfactory [25][26][27][28]. To address these challenges, this paper proposes an improved 1DCNN diagnosis method that aims to improve classification accuracy in three respects: multisource signal fusion, feature enhancement, and an improved classifier. The main contributions of this paper are as follows:
1. In view of the weak characteristics of the current signal, a data augmentation method based on periodic sampling of the current signal is proposed at the data end, and the augmented data are spliced and fused;
2. At the input layer of the model, a multi-scale feature fusion method that enhances fault features through self-learning is proposed. The input is dimensionally transformed, multiple convolution layers decompose the signal, and a fusion operation then provides additional fault features;
3. To address the poor classification performance of existing diagnostic models, an improved 1DCNN is constructed by extending the CAM from the time domain to the frequency domain.
The remainder of the paper is organized as follows. Section 2 presents the periodic sampling and fusion methods. Section 3 details the feature enhancement module. Section 4 presents the fault diagnosis model. Section 5 validates the performance of the proposed method through laboratory experiments. Finally, the conclusions are presented.

Periodic Sampling
Considering the scarcity of fault data in real-world operating environments, data augmentation is needed to enhance data features before data fusion. The sliding window is a commonly used data augmentation method. However, existing research has not provided specific formulas for calculating the window length L and the stride S. Therefore, this paper proposes a periodic sliding window sampling method specifically for periodic nonlinear signals such as CSs.
Specifically, L and S are determined from the rotational speed of the motor and the sampling period. The sampling process is described below using the A-phase current amplitude data as an example; Figure 1 illustrates the specific steps.
First, the number of collected signal points N is calculated from the sampling frequency $f_s$ and the sampling time t as $N = f_s \cdot t$. During each rotation cycle, the bearing fault produces specific fault signal points in the CSs. Specifically, when the motor completes one rotation, fewer fault sample points are collected at low speed than at high speed. Therefore, to ensure that each window contains sufficient fault samples while maintaining training efficiency, L is calculated from the mechanical frequency $f_r$ at low speed by Equation (1), where ⌊·⌋ denotes rounding down to two decimal places. The stride S is related to the overlap rate η between samples. If η is too small, the correlation between samples is lost and the number of samples is reduced; if η is too large, the model tends to overfit. Therefore, to avoid overfitting while preserving the correlation between samples, this paper defines S as

$$ S = L(1 - \eta) $$

where η is set directly to 0.5. Using these values, the signal is divided into n samples, denoted $X_{Ai}^{j}$. The same operation is applied to the B-phase current amplitude signal to obtain the sample set $X_{Bi}^{j}$.
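The periodic sliding window can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the stride rule S = L(1 − η), which reproduces the S = 3200 reported for L = 6400 and η = 0.5, keeps only full windows, and uses the standard window count n = ⌊(N − L)/S⌋ + 1 (integer floor, which is an assumption since the original formula for n is not recoverable). The helper `periodic_windows` and the 4 s placeholder sinusoid are hypothetical.

```python
import numpy as np


def periodic_windows(signal: np.ndarray, L: int, eta: float = 0.5) -> np.ndarray:
    """Slice a 1D current signal into overlapping windows of length L.

    Assumes the stride rule S = L * (1 - eta), where eta is the overlap rate,
    and keeps only full windows: n = (N - L) // S + 1.
    Returns an array of shape (n, L); row k is signal[k*S : k*S + L].
    """
    S = int(round(L * (1 - eta)))  # stride in sample points
    N = len(signal)
    if S <= 0 or N < L:
        raise ValueError("eta must be < 1 and the signal at least L points long")
    n = (N - L) // S + 1
    return np.stack([signal[k * S : k * S + L] for k in range(n)])


# Hypothetical input: a 4 s record at fs = 64 kHz (placeholder sinusoid, not PU data)
fs = 64_000
t = np.arange(4 * fs) / fs
i_a = np.sin(2 * np.pi * 50 * t)

X_A = periodic_windows(i_a, L=6400, eta=0.5)  # stride S = 3200
print(X_A.shape)  # (79, 6400)
```

Because consecutive windows share half their points at η = 0.5, every fault-related segment near a window boundary also appears well inside a neighboring window.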

Multi-Channel Fusion
This section describes the fusion process, which is illustrated in Figure 2.
First, take $X_{Ai}^{j}$ as an example, which is expressed as follows:

$$ X_{Ai}^{j} = \left(x_{Ai,1}^{j}, x_{Ai,2}^{j}, \ldots, x_{Ai,n}^{j}\right), \quad i \in [1, K], \quad j \in [1, m] $$

where i denotes the bearing state, with K states in total (the normal state and several fault states), and j denotes the working condition, with m working conditions in total (defined by speed and load). Each sample contains the A-phase current amplitude data for working condition j and bearing state i. Aggregating the sample sets from the different working conditions gives $X_{Ai}$:

$$ X_{Ai} = \left(X_{Ai}^{1}, X_{Ai}^{2}, \ldots, X_{Ai}^{m}\right) $$

where $X_{Ai}$ is the sample set covering all working conditions for bearing state i. The same process applied to the B-phase current signal gives $X_{Bi}$. Then, $X_{Ai}$ and $X_{Bi}$ are placed in separate channels and concatenated to obtain the merged sample set $X_i$.
The final fused dataset, denoted data, is

$$ \mathrm{data} = \left\{(X_1, y_1), (X_2, y_2), \ldots, (X_K, y_K)\right\} $$

where $y_i$ is the label of bearing state i.
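A minimal NumPy sketch of the two-phase fusion follows, assuming that each windowed sample set is an array of shape (n, L), that "different channels" means stacking the A and B phases along a new channel axis, and that labels are the state indices 0 to K − 1. The helper `fuse_two_phase` and the random toy arrays are hypothetical stand-ins for real windowed CS data.

```python
import numpy as np


def fuse_two_phase(windows_a, windows_b):
    """Build the fused dataset from per-state, per-condition window sets.

    windows_a[i][j], windows_b[i][j]: arrays of shape (n, L) holding the A- and
    B-phase samples for bearing state i and working condition j.
    Returns data of shape (total_samples, 2, L) and integer labels y.
    """
    xs, ys = [], []
    for i, (conds_a, conds_b) in enumerate(zip(windows_a, windows_b)):
        x_ai = np.concatenate(conds_a, axis=0)  # X_Ai: all m working conditions
        x_bi = np.concatenate(conds_b, axis=0)  # X_Bi: same for the B phase
        x_i = np.stack([x_ai, x_bi], axis=1)    # X_i: phases as two channels
        xs.append(x_i)
        ys.append(np.full(len(x_i), i))         # y_i: label of bearing state i
    return np.concatenate(xs, axis=0), np.concatenate(ys)


rng = np.random.default_rng(0)
K, m, n, L = 5, 4, 3, 8  # toy sizes: 5 states, 4 conditions, 3 windows, length 8
wa = [[rng.standard_normal((n, L)) for _ in range(m)] for _ in range(K)]
wb = [[rng.standard_normal((n, L)) for _ in range(m)] for _ in range(K)]
data, labels = fuse_two_phase(wa, wb)
print(data.shape, labels.shape)  # (60, 2, 8) (60,)
```

Keeping the phases on separate channels, rather than appending them end to end, lets the first convolution layer see the A- and B-phase values at the same time index together.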

Ablation Experiment
To verify the effectiveness of the proposed periodic sampling method, this section conducts ablation experiments on the Paderborn University (PU) dataset [19]. First, the parameters of the PU dataset are specified. K = 5 denotes five bearing conditions: one normal (NOR) bearing, two bearings with inner ring (IR) faults, and two bearings with outer ring (OR) faults, where the two bearings of each fault type differ in fault size. Each bearing is labeled by its condition and damage level as IR1, IR2, NOR, OR1, or OR2. The sampling frequency $f_s$ is 64 kHz, and m = 4 denotes four operating conditions, detailed in Table 1. Table 1 shows that $\mathrm{RPM}_{high}$ is 1500 r/min and $\mathrm{RPM}_{low}$ is 900 r/min. Calculations from Equation (1) yield L = 6400 and S = 3200; with these values, $X_{Ai}$ is divided into dataset M0 and $X_i$ into dataset M1. Dividing $X_{Ai}$ with L = 3200 and S = 1600 gives M2; with L = 9600 and S = 4800, M3; with L = 6400 and S = 1600, M4; and with L = 6400 and S = 4800, M5. Each dataset is then split into training and test sets at a ratio of 8:2. The baseline diagnostic model is a 1DCNN that incorporates Batch Normalization and residual connections; its structure is shown in Figure 3.
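The ablation datasets differ only in L and S, so the overlap rate each one implies can be worked out directly. The sketch below assumes the stride relation S = L(1 − η) (consistent with L = 6400, S = 3200 at η = 0.5) and counts full windows with n = ⌊(N − L)/S⌋ + 1 for a hypothetical record length of N = 256,000 points (4 s at 64 kHz); that record length is an assumption for illustration and is not stated in the text.

```python
CONFIGS = {  # dataset: (window length L, stride S), as listed in the text
    "M0/M1": (6400, 3200),
    "M2": (3200, 1600),
    "M3": (9600, 4800),
    "M4": (6400, 1600),
    "M5": (6400, 4800),
}


def implied_overlap(L, S):
    """Overlap rate eta implied by the assumed rule S = L * (1 - eta)."""
    return 1 - S / L


def windows_per_record(N, L, S):
    """Number of full windows of length L and stride S in a record of N points."""
    return (N - L) // S + 1


N_HYPOTHETICAL = 256_000  # assumed record length (4 s at 64 kHz), not from the text

for name, (L, S) in CONFIGS.items():
    print(f"{name}: eta = {implied_overlap(L, S):.2f}, "
          f"windows per record = {windows_per_record(N_HYPOTHETICAL, L, S)}")
```

Under these assumptions, M0 to M3 all correspond to η = 0.50, while M4 and M5 correspond to η = 0.75 and η = 0.25, with 79, 159, 52, 157, and 53 windows per record, respectively.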
The number of training epochs is set to 50. To reduce random effects, each experiment is run three times, and the average test accuracy and the training time of a single run are compared for each dataset. Accuracy is calculated as

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} $$

where TP, TN, FN, and FP denote the numbers of true-positive, true-negative, false-negative, and false-positive samples, respectively. The experimental results are shown in Table 2. Table 2 shows that, for a fixed η, a larger L yields higher accuracy and longer training times. Conversely, for a fixed L, a larger η yields shorter training times and lower accuracy. Comparing datasets M0, M2, M3, M4, and M5 shows that M0 has the best overall performance, indicating the feasibility of the periodic sampling method. Meanwhile, M1 outperforms all other datasets, indicating that the proposed two-phase current fusion method substantially improves feature effectiveness at the data layer.
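For a five-class problem, the accuracy formula is usually applied as the fraction of correctly classified test samples. The sketch below shows both the count-based form and that multi-class form, plus the averaging over repeated runs; it assumes this is how the counts are aggregated, and all numbers are toy values, not results from the paper.

```python
import numpy as np


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def overall_accuracy(y_true, y_pred) -> float:
    """Multi-class accuracy: share of test samples whose predicted label is correct."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


# Toy numbers only, not results from the paper
print(accuracy_from_counts(tp=45, tn=50, fp=3, fn=2))  # 0.95

# Labels 0..4 stand for NOR, IR1, IR2, OR1, OR2 (an arbitrary ordering)
y_true = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
y_pred = [0, 1, 2, 3, 4, 0, 1, 2, 4, 4]  # one OR1 sample misread as OR2
print(overall_accuracy(y_true, y_pred))  # 0.9

# Averaging three repeated runs, as in the experiments
runs = [0.96, 0.97, 0.98]
print(round(sum(runs) / len(runs), 4))  # 0.97
```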

Feature Enhance
To address the issue of weak fault features in CSs, this paper constructs a feature enhancement module at the input layer. The specific structure of this module is illustrated in Figure 4.
First, the 1D signal of each sample $x = [x_1, x_2, \ldots, x_L]$ is reshaped into a 2D signal so that a 2D structure can strengthen the feature representation. Specifically, the sample length L is factored into two positive integers h and w such that L = h × w. The 1D signal is then rearranged into a 2D array with h rows and w columns, in which each row holds a contiguous sub-segment of the sample, which helps capture the fault characteristics of the signal. The reshaped array is

$$ x_{2d} = \begin{bmatrix} x_1 & x_2 & \cdots & x_w \\ x_{w+1} & x_{w+2} & \cdots & x_{2w} \\ \vdots & \vdots & \ddots & \vdots \\ x_{(h-1)w+1} & x_{(h-1)w+2} & \cdots & x_{hw} \end{bmatrix} $$
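The 1D-to-2D reconstruction amounts to a row-major reshape. The sketch below assumes that row-major ordering (row r holds points r·w + 1 to (r + 1)·w) and keeps leading axes such as the two current channels intact; the helper `to_2d` is hypothetical, and h = w = 80 is just one possible factorization of L = 6400.

```python
import numpy as np


def to_2d(x: np.ndarray, h: int, w: int) -> np.ndarray:
    """Reshape the last axis (length L = h * w) into h rows of w points each.

    Row r holds the contiguous sub-segment x[r*w : (r+1)*w] (row-major order).
    Leading axes, such as the two current channels, are preserved.
    """
    if h * w != x.shape[-1]:
        raise ValueError(f"L = {x.shape[-1]} is not equal to h * w = {h * w}")
    return x.reshape(*x.shape[:-1], h, w)


x = np.arange(1, 13)  # toy sample with L = 12
print(to_2d(x, 3, 4))
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]

x_two_phase = np.zeros((2, 6400))  # A- and B-phase channels, L = 6400
print(to_2d(x_two_phase, 80, 80).shape)  # (2, 80, 80)
```

After the reshape, a 2D kernel sees both neighboring points within a row and points exactly w samples apart in adjacent rows, which a 1D kernel of the same size cannot.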

Second, this paper designs a multi-scale feature fusion module to extract features from $x_{2d}$; its structure is shown in Figure 5. The module consists of multiple parallel convolution layers with kernels of different sizes, so that each layer extracts local features at a different scale. This diversity of kernel sizes allows features to be extracted across different scales and depths of the signal structure. A multi-scale architecture is adopted because fault characteristics often manifest at different scales and resolutions within the signal: small kernels capture fine-grained details, whereas large kernels capture broader signal patterns, yielding a robust feature representation for fault detection. Specifically, each convolution layer applies its kernel to the input to extract features at its scale. The outputs of all layers are then summed and activated by a ReLU function to form a unified feature set, denoted $x_{2d}^{fus}$. Because the branch outputs are summed with equal weight, features from every scale contribute to the final representation, providing a more detailed and discriminative feature set for fault diagnosis. The aggregated features are then flattened and normalized to produce the enhanced feature vector $x_e$:

$$ x_{2d}^{fus} = \mathrm{ReLU}\left(\sum_{k} \mathrm{Conv}_k(x_{2d})\right), \qquad x_e = \mathrm{Norm}\left(\mathrm{Flatten}\left(x_{2d}^{fus}\right)\right) $$

where $\mathrm{Conv}_k$ is the k-th parallel convolution layer. Finally, x and $x_e$ are concatenated to produce the output $x_{concat}$, which contains both the intuitive information of the original input and the processed deep features, giving the model more comprehensive information to learn from.
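The forward pass of the feature enhancement module can be sketched in NumPy as below. This is an untrained, illustrative sketch under several assumptions that are not in the text: the kernels are random rather than learned, each branch keeps 2 output channels (so that concatenating x and $x_e$ along the channel axis gives the 4 input channels mentioned in the ablation), Norm(·) is a per-channel z-score, and h = w = 80 factorizes L = 6400. The helpers `conv2d_same` and `feature_enhance` are hypothetical.

```python
import numpy as np


def conv2d_same(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Zero-padded 'same' 2D cross-correlation (what CNN layers compute).

    x: (C_in, h, w); weight: (C_out, C_in, k, k) with odd k.
    Returns an array of shape (C_out, h, w).
    """
    c_out, _, k, _ = weight.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps h x w
    h, w = x.shape[1:]
    out = np.zeros((c_out, h, w))
    for di in range(k):
        for dj in range(k):
            patch = xp[:, di:di + h, dj:dj + w]  # input shifted by (di, dj)
            out += np.einsum("oc,chw->ohw", weight[:, :, di, dj], patch)
    return out


def feature_enhance(x, h, w, kernels, eps=1e-8):
    """Sketch of the feature enhancement path for one sample.

    x: (C, L) raw two-phase sample; kernels: one (C, C, k, k) array per branch.
    Returns x_concat with shape (2C, L): original channels, then x_e.
    """
    C, L = x.shape
    if h * w != L:
        raise ValueError("L must equal h * w")
    x2d = x.reshape(C, h, w)  # 1D -> 2D, row-major
    # Sum the multi-scale branch outputs with equal weight, then apply ReLU
    fus = np.maximum(sum(conv2d_same(x2d, kern) for kern in kernels), 0.0)
    flat = fus.reshape(fus.shape[0], -1)  # flatten back to length L
    # Assumed Norm(.): per-channel z-score
    mean = flat.mean(axis=1, keepdims=True)
    std = flat.std(axis=1, keepdims=True)
    x_e = (flat - mean) / (std + eps)
    return np.concatenate([x, x_e], axis=0)  # channel-wise concatenation


rng = np.random.default_rng(0)
x = rng.standard_normal((2, 6400))  # placeholder two-phase sample
kernels = [0.1 * rng.standard_normal((2, 2, k, k)) for k in (3, 5)]  # untrained
x_concat = feature_enhance(x, 80, 80, kernels)
print(x_concat.shape)  # (4, 6400)
```

In a trained model the branches would be learnable convolution layers; the sketch only fixes the data flow and the tensor shapes.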

Ablation Experiment
The ablation study in this section uses dataset M1 with the following configurations: M6 uses x as the input, M7 uses $x_e$, and M8 uses $x_{concat}$. The convolution layers in the multi-scale feature fusion module use two kernel sizes, 3 and 5. The diagnostic model is illustrated in Figure 6. Apart from the first convolution layer, whose input channel count changes from 2 to 4, all other structures and experimental parameters remain unchanged.
The diagnostic results of the three experiments are shown in Figure 7. Compared with the original input, using the features after multi-scale feature fusion as the input improves the diagnostic accuracy by approximately 0.5%, and using the output of the full feature enhancement module as the input improves it by approximately 1.5%. These results demonstrate that the proposed multi-scale feature fusion and feature enhancement modules are effective and improve the ability of the model to recognize complex data patterns.

Improved Channel Attention Module
To address the issue of suboptimal diagnostic results, the diagnosis model presented in this paper enhances the 1DCNN framework with an improved CAM. The structure of the traditional CAM is shown in Figure 8.

Improve Channel Attention Module Structure
Improved Channel Attention Module Structure
To address the issue of suboptimal diagnostic results, the diagnosis model presented in this paper enhances the 1DCNN framework by incorporating an improved CAM. The structure of the traditional CAM is shown in Figure 8. It can be seen that the traditional CAM consists of three main parts. The first comprises a Global Average Pooling (GAP) layer and a Global Max Pooling (GMP) layer, which extract information from the input and yield the features $F_c^{avg}$ and $F_c^{max}$, respectively. The second is a shared Multi-Layer Perceptron (MLP), which processes $F_c^{avg}$ and $F_c^{max}$ with the same network and outputs the 1D vectors $M_c^{avg}$ and $M_c^{max}$, whose length equals the number of channels. The third combines these two vectors by element-wise addition and applies a sigmoid function to obtain the channel attention scores $M_c$, which represent the weight of each input channel. Finally, $M_c$ is multiplied element-wise with the input $x_{concat}$ to produce the final result $x_{out}$. The computational process is described as follows:
$$F_c^{avg}=\mathrm{GAP}(x_{concat}),\qquad F_c^{max}=\mathrm{GMP}(x_{concat})$$
$$M_c^{avg}=\mathrm{MLP}(F_c^{avg}),\qquad M_c^{max}=\mathrm{MLP}(F_c^{max})$$
$$M_c=\sigma\left(M_c^{avg}+M_c^{max}\right),\qquad x_{out}=M_c\otimes x_{concat}\tag{11}$$
where σ represents the sigmoid function and ⊗ denotes element-wise multiplication, so that each channel of $x_{concat}$ is scaled by its attention score. For ease of description, Equation (11) is denoted as CAM($x_{concat}$). The conventional CAM, however, does not consider the frequency-domain characteristics of the input, whereas the literature [18] indicates that the frequency content of the current changes when a bearing fault occurs. Consequently, this paper extends the feature extraction capability of the CAM to the frequency domain; the resulting module is named the time-frequency CAM (TFCAM), and its structure is shown in Figure 9.
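The traditional CAM computation described above can be sketched in plain Python for a single sample stored as a list of channels, each a list of time steps. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-layer MLP shape (C to C/r to C), the ReLU hidden activation, the absence of biases, the weight values, and the function names `shared_mlp`, `cam_scores`, and `cam` are all hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one sample: [channels][time steps]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def shared_mlp(v: List[float], w1: Matrix, w2: Matrix) -> List[float]:
    """Assumed two-layer MLP (C -> C/r -> C) with ReLU and no biases, shared by both branches."""
    hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def cam_scores(x: Matrix, w1: Matrix, w2: Matrix) -> List[float]:
    """Pre-sigmoid channel scores M_c^avg + M_c^max."""
    f_avg = [sum(ch) / len(ch) for ch in x]  # GAP over time -> F_c^avg
    f_max = [max(ch) for ch in x]            # GMP over time -> F_c^max
    m_avg = shared_mlp(f_avg, w1, w2)        # M_c^avg
    m_max = shared_mlp(f_max, w1, w2)        # M_c^max
    return [a + b for a, b in zip(m_avg, m_max)]


def cam(x: Matrix, w1: Matrix, w2: Matrix) -> Matrix:
    """CAM(x_concat): scale every channel by M_c = sigmoid(M_c^avg + M_c^max)."""
    m_c = [sigmoid(s) for s in cam_scores(x, w1, w2)]
    return [[m * v for v in ch] for m, ch in zip(m_c, x)]


x_concat = [[1.0, 2.0, 3.0, 2.0], [0.5, 0.5, 0.5, 0.5]]  # 2 channels x 4 time steps
w1 = [[0.5, -0.5]]    # C/r x C with C = 2, r = 2 (hypothetical values)
w2 = [[1.0], [-1.0]]  # C x C/r
print(cam_scores(x_concat, w1, w2))  # [2.0, -2.0]
print(cam(x_concat, w1, w2))
```

With these toy weights the first channel receives a weight of σ(2) ≈ 0.88 and the second σ(-2) ≈ 0.12, which shows how the scores rescale whole channels without changing their temporal shape.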
As shown in Figure 9, unlike the traditional CAM, TFCAM processes the input $x_{concat}$ in parallel with two distinct blocks, a time block and a frequency block, which generate the time-domain channel attention scores $M_c^{time}$ and the frequency-domain channel attention scores $M_c^{freq}$, respectively. The time block has a structure similar to that of the traditional CAM, while the structure of the frequency block is illustrated in Figure 10. When $x_{concat}$ enters the frequency block, it is first transformed to the frequency domain using the Fast Fourier Transform (FFT):
$$x_{freq}=\mathrm{FFT}(x_{concat})$$
The transformed frequency-domain data $x_{freq}$ then serve as the input to a time block, which yields the frequency-domain channel attention scores:
$$M_c^{time}=f_{time}(x_{concat}),\qquad M_c^{freq}=f_{time}(x_{freq})$$
where $f_{time}(\cdot)$ denotes a time block. After element-wise addition of $M_c^{time}$ and $M_c^{freq}$, the final channel attention scores $M_c^{enh}$ are obtained through the sigmoid activation:
$$M_c^{enh}=\sigma\left(M_c^{time}+M_c^{freq}\right)$$
Then, $M_c^{enh}$ is multiplied element-wise with the input, giving the final output $x_{out}^{enh}$:
$$x_{out}^{enh}=M_c^{enh}\otimes x_{concat}$$
These equations show that TFCAM has a broader analytical scope than the traditional CAM. Whereas the traditional CAM relies solely on the time-domain information of the input to discern channel relationships, TFCAM additionally incorporates frequency-domain information, which allows it to characterize channel relationships more precisely.
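A possible plain-Python sketch of TFCAM follows. Several details are assumptions because the text does not fix them: the frequency block uses the magnitude of a naive discrete Fourier transform (standing in for an FFT, since the MLP needs real inputs), each time block returns pre-sigmoid scores so that the sigmoid is applied once after the addition, and the two blocks have separate weights. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Tuple

Matrix = List[List[float]]       # one sample: [channels][samples]
Weights = Tuple[Matrix, Matrix]  # (w1: C/r x C, w2: C x C/r), assumed MLP shape


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def block_scores(x: Matrix, weights: Weights) -> List[float]:
    """Time block: GAP and GMP over samples, shared ReLU MLP, pre-sigmoid channel scores."""
    w1, w2 = weights

    def mlp(v: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * a for w, a in zip(row, v))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    f_avg = [sum(ch) / len(ch) for ch in x]
    f_max = [max(ch) for ch in x]
    return [a + b for a, b in zip(mlp(f_avg), mlp(f_max))]


def dft_magnitude(signal: List[float]) -> List[float]:
    """|DFT| of one channel; a naive O(n^2) transform standing in for an FFT."""
    n = len(signal)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(signal)))
        for k in range(n)
    ]


def tfcam(x: Matrix, time_w: Weights, freq_w: Weights) -> Matrix:
    """Time-frequency channel attention applied to one sample x_concat."""
    m_time = block_scores(x, time_w)                          # M_c^time
    x_freq = [dft_magnitude(ch) for ch in x]                  # x_freq (magnitude spectrum)
    m_freq = block_scores(x_freq, freq_w)                     # M_c^freq from a time block
    m_enh = [sigmoid(a + b) for a, b in zip(m_time, m_freq)]  # M_c^enh = sigmoid of the sum
    return [[m * v for v in ch] for m, ch in zip(m_enh, x)]   # x_out^enh
```

For real-valued inputs the magnitude spectrum is symmetric, so a practical implementation might keep only the one-sided half; that choice, like the use of the magnitude, is not specified in the text.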

Ablation Experiment
The ablation study in this section compares three variants: the time block alone (TCAM), the frequency block alone (FCAM), and both blocks combined (TFCAM). Each variant is inserted into the diagnostic model after every convolution layer at which the number of channels changes, as shown in Figure 11.
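The placement rule used in the ablation study, an attention module after every convolution layer that changes the number of channels, can be expressed as a small helper that builds a layer plan for each variant. The channel configuration below is hypothetical; the actual layer sizes of the diagnostic model are given in Figure 11 and are not reproduced here.

```python
from typing import List, Tuple


def attention_positions(conv_channels: List[Tuple[int, int]]) -> List[int]:
    """Indices of convolution layers whose output channel count differs from their input."""
    return [i for i, (c_in, c_out) in enumerate(conv_channels) if c_in != c_out]


def build_layer_plan(conv_channels: List[Tuple[int, int]], attention: str = "TFCAM") -> List[str]:
    """Interleave an attention module (TCAM, FCAM, or TFCAM) after each channel-changing layer."""
    insert_at = set(attention_positions(conv_channels))
    plan: List[str] = []
    for i, (c_in, c_out) in enumerate(conv_channels):
        plan.append(f"Conv1d({c_in}->{c_out})")
        if i in insert_at:
            plan.append(f"{attention}({c_out})")
    return plan


# Hypothetical (in_channels, out_channels) pairs for five convolution layers.
config = [(1, 16), (16, 16), (16, 32), (32, 64), (64, 64)]
for variant in ("TCAM", "FCAM", "TFCAM"):
    print(variant, build_layer_plan(config, variant))
```

Keeping the insertion points identical across the three variants means that only the attention block differs between runs, which is what makes the ablation comparison meaningful.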

The experimental parameter settings are consistent with those described in Section 3.2. The average accuracies of FCAM, TCAM, and TFCAM are 97.36%, 98.51%, and 99.32%, respectively. To visualize the effectiveness of the feature extraction more clearly, t-SNE dimensionality reduction is employed to visualize the classification results, as shown in Figure 12. The five labels, IR1, IR2, NOR, OR1, and OR2, are represented by five distinct colors. Examining the clustering of each color yields the following observations. In FCAM, the IR1 features are clearly split into separate clusters; this splitting is alleviated in TCAM, but in TCAM the OR1 features are noticeably split instead. These observations suggest that both TCAM and FCAM have limitations in diagnosing early-stage faults. In contrast, TFCAM keeps all classes well separated from one another without evident splitting within any individual class, which indicates that TFCAM discriminates fault features more effectively.

Experiment and Analysis
Integrating the three methods introduced in the previous sections yields the diagnostic process illustrated in Figure 13. It consists of three steps: data collection, data processing, and fault diagnosis analysis. Since the data processing and fault diagnosis techniques have been detailed earlier in this paper, the following discussion concentrates on how the dataset used in this section was collected and then compares the proposed method experimentally with other methods.
The following discussion proceeds according to the process shown in Figure 13. First, the experimental platform for the laboratory dataset is presented in Figure 14. From right to left, the platform comprises a drive motor, a bearing module, and a hysteresis brake, all interconnected via couplings. The drive motor is a four-pole, 2 kW permanent magnet synchronous motor, which provides the rotational speed and is the source of the CSs. There are six bearings in total: one normal bearing, one ball (BA) fault bearing, two inner race (IR) fault bearings, and two outer race (OR) fault bearings. The three fault types are depicted in Figure 15. All bearings are of type 6205, and each fault takes the form of a crack produced by electrical machining. The parameters of each bearing are detailed in Table 3. The hysteresis brake adjusts the load torque in closed loop to simulate different operating conditions. Six distinct conditions are established for this experiment; their settings also vary the switching frequency (SWF) to make the data collection comprehensive and the experimental results reliable. The parameters of each condition are listed in Table 4.
The data collection device is shown in Figure 16. The acquisition process involves three steps. First, Hall sensors are connected to two phase lines of the inverter and convert the actual current into an analog signal. Next, the acquisition card connected to the Hall sensors converts the analog signal into a digital signal. Finally, the PC connected to the acquisition card stores the digital data locally at a sampling rate of 64 kHz.
As shown in Figure 17, methods a and b do not yield satisfactory results, whereas methods c and d, which both employ information fusion, perform notably better. This suggests that information fusion is highly effective for CS-based bearing fault diagnosis. Although methods c and d both use information fusion, method d still outperforms method c. Because the dataset used in this paper covers different working conditions, the results also indicate that the proposed method is more robust and can adapt to diagnosis tasks under variable speed and variable load.
To further analyze why method d performs best, t-SNE is employed to visualize the clustering of features, as shown in Figure 18. The clustering results are represented by six distinct colors. Methods a and b exhibit many overlaps, primarily between OR1 and OR2. Method c improves on this, although some feature boundaries, such as that between IR1 and IR2, are not distinct. Method d performs best, with clear distances between feature clusters and no overlapping points. This indicates that the proposed method separates fault features better than the other methods and is well suited to multi-class bearing fault diagnosis tasks.

Conclusions
In response to the limitations of vibration signals in diagnosing electric vehicle bearings, this paper proposes a diagnostic method based on CSs. First, the two-phase current signals are enhanced through periodic sampling and then concatenated and fused to strengthen the feature representation. Subsequently, dimension transformation and multi-scale feature fusion are introduced to construct the Feature Enhance module, which increases the diversity and effectiveness of the input features. Finally, TFCAM, which extends channel attention to both the time and frequency domains, is integrated into a 1D-CNN to construct the TFCAM-CNN, which strengthens the extraction of deep features and improves the fault diagnosis results. Three ablation studies based on lifecycle bearing failures confirm the effectiveness of the three methods proposed in this paper. Comparative experiments based on manufactured bearing faults show that the diagnostic performance of the proposed method surpasses that of the comparative methods, indicating significant potential for CS-based bearing fault diagnosis. However, for early faults with shallow fault depth, the diagnostic accuracy, although good, does not reach 100%. Therefore, future research will continue to focus on the diagnosis of early bearing faults. In addition, because current signals are easy to collect and are not easily affected by the environment, the method can be extended to cross-device diagnosis and real-time diagnosis in the future.

Figure 1. The flow chart of periodic sampling.

Figure 2. The flow chart of multi-channel fusion.

Figure 3. The basic structure of the 1D-CNN.

The number of epochs for the experiment is set to 50. To reduce random interference, all experiments are conducted three times, and the average accuracy on each dataset's test set and the training time of a single run are compared. Accuracy is calculated as follows:
$$\mathrm{Accuracy}=\frac{TP+TN}{TP+FP+FN+TN}\times 100\%$$
where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
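A minimal sketch of the accuracy computation and of the averaging over three repeated runs described above. The count-based function mirrors the formula; for a multi-class task, accuracy is commonly computed as the share of correctly classified test samples, which is what `accuracy_from_labels` does. Function names and example labels are illustrative only.

```python
from statistics import mean
from typing import Sequence


def accuracy_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy in percent: (TP + TN) / (TP + FP + FN + TN) * 100."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


def accuracy_from_labels(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Multi-class accuracy in percent: share of test samples predicted correctly."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


def average_accuracy(run_accuracies: Sequence[float]) -> float:
    """Mean test accuracy over repeated runs (three runs in the experiments above)."""
    return mean(run_accuracies)


print(accuracy_from_counts(tp=45, tn=45, fp=5, fn=5))  # 90.0
print(accuracy_from_labels(["NOR", "IR1", "OR1", "OR1"], ["NOR", "IR1", "OR2", "OR1"]))  # 75.0
```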

Figure 4. The basic structure of the Feature Enhance module.

First, the 1D signal of each sample, $x=[x_1,x_2,\dots,x_L]$, is reconstructed into a 2D signal so that a 2D structure can enhance its feature representation. Specifically, the sample length L is factorized into two positive integers h and w such that L = h × w, and the 1D signal is rearranged into a 2D array with h rows and w columns, in which each row holds a consecutive sub-segment of the sample; this arrangement is intended to improve the capture of fault characteristics in the signal. The reconstruction into a 2D array is shown in the following equation:
$$X=\begin{bmatrix}x_1 & x_2 & \cdots & x_w\\ x_{w+1} & x_{w+2} & \cdots & x_{2w}\\ \vdots & \vdots & \ddots & \vdots\\ x_{(h-1)w+1} & x_{(h-1)w+2} & \cdots & x_{hw}\end{bmatrix}$$
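The 1D-to-2D reconstruction can be sketched as a row-major reshape. How h and w are chosen is not stated in this section; the `choose_shape` helper below simply picks the factor pair closest to a square, which is a hypothetical rule for illustration only.

```python
import math
from typing import List, Sequence, Tuple


def choose_shape(length: int) -> Tuple[int, int]:
    """Hypothetical rule: the factor pair h <= w with h * w == length closest to a square."""
    if length <= 0:
        raise ValueError("length must be positive")
    h = math.isqrt(length)
    while length % h:
        h -= 1
    return h, length // h


def reshape_2d(x: Sequence[float], h: int, w: int) -> List[List[float]]:
    """Row-major reshape: row r holds the consecutive sub-segment x[r*w:(r+1)*w]."""
    if h * w != len(x):
        raise ValueError("h * w must equal the sample length L")
    return [list(x[r * w:(r + 1) * w]) for r in range(h)]


signal = [float(v) for v in range(1, 13)]  # toy sample with L = 12
h, w = choose_shape(len(signal))           # (3, 4)
print(reshape_2d(signal, h, w))
```

For a prime length the rule degenerates to a single row (h = 1), so in practice L would be chosen with useful factors in mind.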

Figure 5. The structure of the multi-scale feature fusion module.

Figure 6. The structure of the fault diagnosis model with the added Feature Enhance module.

Figure 7. The results of the Feature Enhance module experiment.

Figure 8. The structure of the traditional CAM.

Figure 9. The structure of TFCAM.

Figure 10. The structure of the frequency block.

Figure 11. The structure of the fault diagnosis model with the added TFCAM.

Figure 12. t-SNE visualization of different channel attention methods.

Figure 13. The framework of the proposed diagnosis method.

Figure 15. The fault types of the bearings in the laboratory dataset.

Figure 16. Data collection device for laboratory data.

Secondly, as indicated by Table 4, $RPM_{high}$ is 1500 r/min and $RPM_{low}$ is 900 r/min. Using the formula provided in Section 2, L = 6400 and S = 3200 are obtained, and the dataset M10 is constructed accordingly. Finally, comparative experiments are conducted on dataset M10 with four diagnostic models: (a) the LSTM diagnostic method proposed by Sabir, (b) the 1DCNN diagnostic method proposed by Ince, (c) the fusion CNN method proposed by Hoang, and (d) the TFCAM-1DCNN method proposed in this paper. The experimental parameters are the same as those used in the ablation study described in Section 4.2. The results of the three repeated experiments are shown in Figure 17.
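The segmentation step can be sketched as follows, under the assumption that L is the window length and S the step between consecutive windows; the Section 2 formula that produces L = 6400 and S = 3200 from the speed range is not reproduced here. The helper `samples_per_revolution` only relates the 64 kHz sampling rate to shaft speed and is included for orientation; both function names are hypothetical.

```python
from typing import List, Sequence

FS_HZ = 64_000  # sampling rate of the laboratory acquisition system


def samples_per_revolution(rpm: float, fs: float = FS_HZ) -> float:
    """Samples recorded during one shaft revolution at the given speed (r/min)."""
    return fs * 60.0 / rpm


def sliding_windows(signal: Sequence[float], length: int, step: int) -> List[List[float]]:
    """Cut a 1D signal into windows of `length` samples whose starts are `step` samples apart."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    return [list(signal[i:i + length]) for i in range(0, len(signal) - length + 1, step)]


print(samples_per_revolution(1500))  # 2560.0 samples per revolution at RPM_high
print(samples_per_revolution(900))   # about 4266.7 samples per revolution at RPM_low
windows = sliding_windows([0.0] * 20_000, length=6400, step=3200)
print(len(windows))                  # 5
```

Under these assumptions a 6400-sample window spans 2.5 revolutions at 1500 r/min and 1.5 revolutions at 900 r/min, and consecutive windows overlap by half.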

Figure 17. Experimental results based on different diagnostic methods.

Table 1. The working condition parameters of the PU dataset.

Table 2. The results of the multi-channel fusion ablation experiment.

Table 3. Fault bearing label information.
