An Improved Fault Diagnosis Using 1D-Convolutional Neural Network Model

Abstract: Diagnosing rolling bearings to monitor their status is critical for maintaining industrial equipment that uses them. Traditional methods of diagnosing rolling bearing faults have low identification accuracy and rely on manual feature extraction to improve it. The one-dimensional convolutional neural network (1D-CNN) method can not only diagnose bearing faults accurately, but also overcome the shortcomings of the traditional methods. Unlike machine learning and other deep learning models, the 1D-CNN method does not need to pre-process the one-dimensional vibration data of the rolling bearing. In this paper, a 1D-CNN network architecture is proposed in order to effectively improve the accuracy of rolling bearing diagnosis; in this architecture, the number of convolution kernels decreases as the convolution kernel size is reduced. The method obtains high accuracy and improves the generalization ability by introducing the dropout operation. The experimental results show an average accuracy of 99.2% under a single load and 98.83% under different loads.


Introduction
Modern industrial equipment uses many rolling bearings, which play a significant role in mechanical transmission. The failure of a rolling bearing prevents mechanical equipment from operating normally and efficiently, reduces safety, and shortens the service life. Statistics show that bearing failure causes 45-55% of mechanical failures [1]. Therefore, in order to ensure the reliable and normal operation of industrial equipment, intelligent monitoring technology is needed [2,3]. The recognition of rolling bearing faults is based on either modeling or monitoring signals. A model-based diagnosis uses a mathematical model that simulates the real system under the necessary assumptions and compares the data of the monitoring system with the mathematical model in order to predict the faults of rolling bearings [4]. The signal-based method extracts bearing fault features for fault diagnosis using various signal analysis techniques [5], from time-domain signals [6] and frequency-domain signals [7]. The time-frequency analyses of rolling bearing vibration signals include wavelet analysis [8], the short-time Fourier transform (STFT) [9], empirical mode decomposition [10], and singular value decomposition [11].
Machine learning methods have applications in many fields. For example, Battineni et al. [12] comprehensively analyzed the application of machine learning models to chronic disease diagnosis and concluded that support vector machine (SVM), logistic regression (LR), and clustering will become more important for chronic disease prediction and diagnosis. Machine learning is also applied in the field of bearing fault diagnosis, where it addresses the shortcoming of traditional fault diagnosis methods, which require rich mechanical knowledge and expert experience. The machine learning method for (CWRU) data set. Zhang et al. [24] suggested a bearing fault diagnosis method based on 1D-CNN, which takes the original vibration signal as the input without de-noising and achieves high accuracy, even with noise and different loads. The experimental results show that the average accuracy can reach 95.5% under different loads. Ma et al. [25] proposed a lightweight CNN with fast training and strong transfer learning. The experimental results show that the proposed algorithm is superior to the existing algorithms in terms of accuracy and transfer performance. The average accuracy of transfer learning under different loads is 98.7%. Wang et al. [26] proposed a method for fusing multimodal sensor signals (i.e., data from accelerometers and microphones) and used a 1D-CNN in order to extract features from the fused vibration and acoustic signals. The experimental results show that the algorithm can still achieve more than 98.87% accuracy under the influence of noise, with high accuracy and strong robustness. Shenfield et al. [27] proposed a new two-path recurrent neural network (RNN-WDCNN), which operates directly on the original vibration signal of bearings.
RNN-WDCNN combines elements of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) in order to capture distant dependencies in time series data and suppress high-frequency noise in the input signals. The experimental results show that RNN-WDCNN is superior to the existing networks in terms of domain adaptation and noise suppression. Yuan et al. [28] used the continuous wavelet transform to convert one-dimensional original vibration signals into two-dimensional time-frequency images and fed these images into a CNN-SVM model. The experiments indicate that the diagnostic accuracy of this method can reach 98.75% for the CWRU dataset and 98.89% for the MFPT dataset, which verifies the flexibility and practicability of the constructed model. Han et al. [29] added red to the time-domain color feature map (TDCF), which significantly enhanced the fault characteristics of the signal. The experimental results show that the CNN fault diagnosis method that is based on 0.4TDCF can still achieve an accuracy of more than 93.7% under the condition of strong noise.
In this paper, a bearing fault diagnosis method that uses a 1D-CNN classifier for automatic feature extraction and fault identification from original time series sensor data is studied. A convolutional neural network (CNN) is a feed-forward neural network that is usually applied to two-dimensional data. Convolution and pooling operations are usually performed alternately, and the convolutional layer is modeled on the cells of the human visual cortex [30]. For given input data, the convolution kernel can automatically extract features. In the supervised phase of training, back-propagation optimizes the parameters of the convolution kernel, so that the convolution kernel better extracts the appropriate features from the input data. After proper training of the network model with the bearing vibration signal data set, the model can automatically extract the best classification features in order to realize the fault diagnosis classification of rolling bearings. The effectiveness of the proposed approach is assessed by using the rolling bearing data set supplied by CWRU. The main contributions of this paper are as follows: (1) a network structure in which the number of convolution kernels decreases together with the kernel size effectively improves the accuracy of bearing fault diagnosis, (2) the dropout operation effectively improves the fault diagnosis accuracy of the 1D-CNN model across loads and, as a result, the generalization ability is enhanced, and (3) the 1D-CNN model can achieve high accuracy within a finite number of iterations.
The remainder of this paper is arranged as follows. In Section 2, the convolutional neural network is briefly introduced and the 1D-CNN model is proposed. Section 3 describes the CWRU dataset and the selection of model parameters. Section 4 compares the experimental results of this method with those of other methods on the CWRU dataset and demonstrates its effectiveness. Finally, Section 5 presents the conclusion of this study.

A Brief Introduction to Convolutional Neural Networks
The convolutional neural network (CNN) has a unique network architecture that effectively reduces the complexity and overfitting of a neural network. The CNN is similar to the biological visual system [31]. In the biological visual system, neurons in the visual cortex only respond to the stimulation of certain specific areas. That is, the neurons only receive local information, and biological cognition of the external environment expands from local to global. Therefore, the neurons in the neural network do not need to perceive the whole image; they only perceive local features of the image, and the local information from each neuron is synthesized at the highest level to obtain the global information of the image. The following part mainly introduces the activation function, fully connected layer, Softmax, and dropout operation that are commonly used in the convolutional neural network.
After the convolution operation, the activation function transforms the output value nonlinearly. The original multi-dimensional features are mapped in order to enhance the linear separability of the extracted features. The hyperbolic tangent (Tanh) and the rectified linear unit (ReLU) are the activation functions used in the neural network, and their expressions are shown in the following equations.
$$a^{l(i,j)} = \tanh\left(y^{l(i,j)}\right) = \frac{e^{y^{l(i,j)}} - e^{-y^{l(i,j)}}}{e^{y^{l(i,j)}} + e^{-y^{l(i,j)}}}$$
$$a^{l(i,j)} = \max\left\{0,\; y^{l(i,j)}\right\}$$
where $y^{l(i,j)}$ represents the output value of the convolution operation and $a^{l(i,j)}$ represents the activation value of the output after passing through the convolutional layer. In Section 3.2 of this paper, the activation functions ReLU and Tanh are compared. The experimental results show that the activation function Tanh performs better in the 1D-CNN model.
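As a minimal sketch, the two activation functions compared above can be written in plain Python. The function names and the sample input values are illustrative assumptions and are not taken from the experiments in this paper.

```python
import math


def tanh(y: float) -> float:
    """Hyperbolic tangent: (e^y - e^-y) / (e^y + e^-y), bounded in (-1, 1)."""
    return math.tanh(y)


def relu(y: float) -> float:
    """Rectified linear unit: negative inputs are mapped to zero."""
    return max(0.0, y)


# Apply both activations to a few hypothetical convolution outputs.
conv_outputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
tanh_values = [tanh(y) for y in conv_outputs]
relu_values = [relu(y) for y in conv_outputs]
```

Tanh keeps the sign of negative inputs and saturates towards -1 and 1, whereas ReLU discards negative inputs entirely.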
The fully connected layer classifies the features that are extracted by the convolution kernels. Specifically, the output of the previous layer is first flattened into a one-dimensional vector (as shown in Figure 1), which is used as the input of the fully connected layer, and the input and output are fully connected. The formula for the fully connected layer is
$$z^{l+1}_{j} = \sum_{i} W^{l}_{ij}\, a^{l}_{i} + b^{l}_{j}$$
where $W^{l}_{ij}$ represents the weight between the ith neuron in the lth layer and the jth neuron in the (l + 1)th layer, $a^{l}_{i}$ represents the output of the ith neuron in the lth layer, $z^{l+1}_{j}$ represents the logit value of the jth output neuron in the (l + 1)th layer, and $b^{l}_{j}$ represents the bias from the neurons in layer l to the jth neuron in layer l + 1.
The Softmax function calculates the probability of each classification label and is the multi-class generalization of logistic regression. It is often used in multi-classification problems. The specific expression is as follows
$$\mathrm{Softmax}\left(z^{0}(j)\right) = \frac{e^{z^{0}(j)}}{\sum_{k=1}^{M} e^{z^{0}(k)}}$$
where $z^{0}(j)$ represents the output value of the jth neuron in the output layer and M represents the total number of categories.
The dropout proposed by Hinton et al. [32] reduces overfitting and enhances the generalization ability of a neural network. The dropout algorithm sets the neurons in a certain layer of the neural network to zero with a certain probability p, as shown in Figure 2. The algorithm weakens the co-adaptation of the neural nodes in the same layer and improves the generalization ability. With the dropout algorithm, a neural network with N nodes can be regarded as a set of $2^{N}$ models. The number of training parameters is unchanged, and training selects the best model from among the $2^{N}$ models.
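The fully connected layer, the Softmax function, and the dropout operation described above can be sketched in a few lines of plain Python. This is only an illustration: the weights and feature values are hypothetical, and rescaling the surviving activations by 1 / (1 - p) ("inverted dropout") is an assumed convention that the description above does not specify.

```python
import math
import random


def dense(inputs, weights, biases):
    """Fully connected layer: z_j = sum_i W[i][j] * a_i + b_j."""
    return [
        sum(a_i * weights[i][j] for i, a_i in enumerate(inputs)) + biases[j]
        for j in range(len(biases))
    ]


def softmax(logits):
    """Turn logits into class probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(activations, p, rng):
    """Zero each activation with probability p (training-time behaviour).

    Assumption: survivors are rescaled by 1 / (1 - p) ("inverted dropout").
    """
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)
features = [0.2, -0.4, 0.9]                        # hypothetical flattened features
weights = [[0.5, -0.1], [0.3, 0.8], [-0.2, 0.4]]   # W[i][j]: 3 inputs, 2 outputs
biases = [0.0, 0.1]
probs = softmax(dense(dropout(features, 0.3, rng), weights, biases))
```

At test time dropout is switched off, so the full set of neurons contributes to the prediction.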

Fault Diagnosis Process Based on 1D-CNN
The fault identification procedure of the rolling bearing that is based on the 1D-CNN is as follows.
(1) A sensor is installed at the corresponding position of the rolling bearing.
(2) The one-dimensional vibration signal is first collected as the raw data, and then the signal data are divided into training, validation, and test sets.
(3) The training set is used as the input of the 1D-CNN network. The model is trained, the validation set is used in order to verify the model performance, and appropriate network model parameters are selected.
(4) The test set is fed to the trained model, and the performance of the model is evaluated.
The 1D-CNN was evaluated while using the one-dimensional original vibration data set of the rolling bearing supplied by CWRU [33]. The dataset consists of 1000 samples, which are divided into the training set, validation set, and test set, according to the ratio of 6:2:2. Different loads were grouped into different fault categories, as shown in Table 1. The "0123 HP" in Table 1 represents a new bearing vibration data set that is composed of bearing vibration data sets, with loads of 0 HP, 1 HP, 2 HP, and 3 HP.
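The 6:2:2 division can be illustrated with a short sketch. Shuffling the sample indices before splitting and the fixed random seed are assumptions made here for reproducibility; the text does not state how the samples were assigned to the three sets.

```python
import random


def split_indices(n_samples, ratios=(6, 2, 2), seed=0):
    """Shuffle sample indices and split them by the given ratio."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))  # 600 200 200
```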

1D-CNN Structure
In a conventional convolutional neural network, the number of convolution kernels gradually increases as the size of the convolution kernel is reduced, whereas in the 1D-CNN model proposed in this paper, the number of convolution kernels gradually decreases as the size of the convolution kernel is reduced. The experimental results show that the 1D-CNN model whose number of convolution kernels decreases with the kernel size has higher accuracy at different loads, as shown in Figure 3. The 1D-CNN model structure and parameter settings in this paper were obtained through several experiments. The 1D-CNN structure incorporates five convolution layers, four pooling layers, and two fully connected layers (Figure 4). Table 2 shows the detailed network structure parameters.
Step 1 In each convolution layer, convolution kernels of the appropriate number and size perform one-dimensional convolution operations. The input data are a one-dimensional signal with a length of 1024. The five convolution layers (Conv1 to Conv5) use 128 convolution kernels of size 16 × 1 (Conv1), 64 of size 8 × 1 (Conv2), 32 of size 4 × 1 (Conv3), 16 of size 4 × 1 (Conv4), and eight of size 4 × 1 (Conv5). The hyperbolic tangent (Tanh) is the activation function for the five convolution layers.
Step 2 A pooling layer is appended to each of Conv1, Conv2, Conv3, and Conv4 and carries out a 2 × 2 max-pooling operation. A dropout operation with a ratio of 0.3 is executed after the first and second pooling layers. A dropout operation with a ratio of 0.25 is performed after the third pooling layer, and another with a ratio of 0.25 after the fifth convolution layer. The dropout operation randomly selects and removes neurons from the model in order to form a random subset of the neurons, which mitigates the overfitting problem and enhances the generalization ability of the neural network model. As a result, the model does not depend on specific connections between particular neurons. In the flatten layer, the features extracted by the five convolution layers are expanded into a one-dimensional vector. The output layer contains 10 neurons. With Softmax as the activation function, 10 types of faults are identified after training.
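To make the layer sizes in Steps 1 and 2 concrete, the following sketch traces the signal length, the number of channels, and the number of trainable convolution parameters through the five convolution layers. It rests on assumptions that are not stated in the text: a stride of 1 with "same" padding (so that a convolution does not change the signal length), a single input channel, and max pooling that halves the length. The values in Table 2 may differ if other settings were used.

```python
# (name, number of kernels, kernel size, followed by max pooling?)
CONV_LAYERS = [
    ("Conv1", 128, 16, True),
    ("Conv2", 64, 8, True),
    ("Conv3", 32, 4, True),
    ("Conv4", 16, 4, True),
    ("Conv5", 8, 4, False),
]


def trace_shapes(input_length=1024, input_channels=1):
    """Return (name, output length, channels, trainable parameters) per layer."""
    length, channels = input_length, input_channels
    rows = []
    for name, kernels, size, pooled in CONV_LAYERS:
        params = size * channels * kernels + kernels  # weights + biases
        channels = kernels   # assumed "same" padding keeps the length unchanged
        if pooled:
            length //= 2     # assumed pooling window of 2 halves the length
        rows.append((name, length, channels, params))
    return rows


layers = trace_shapes()
flatten_size = layers[-1][1] * layers[-1][2]
for row in layers:
    print(row)
print("flatten size:", flatten_size)
```

Under these assumptions the flatten layer receives 64 × 8 = 512 values. Dropout layers are omitted from the trace because they do not change the shape.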

Data Set Description
The 1D-CNN network model was programmed in Spyder in Anaconda3 (Python 3.7) using Keras 2.0. The model was tested on a computer equipped with a 1.8 GHz quad-core i5-8265U, 8 GB of RAM, and an NVIDIA MX250 graphics card. The CWRU data set is a benchmark data set that is widely used in research on diagnosing rolling bearing faults. The experimental platform was composed of a motor, torque sensor, power tester, and electronic controller, as indicated in Figure 5. SKF6205 and SKF6203 rolling bearings were used at the driving end and fan end of the experimental platform, respectively. Single-point damage was machined on the bearing by electric discharge machining (EDM). The diameter of the damage was 0.0028 mm, 0.0056 mm, 0.0083 mm, 0.011 mm, and 0.0157 mm. The damage points of the bearing outer ring (fixed in operation) are set at 3 o'clock, 6 o'clock, and 12 o'clock, respectively, in order to make the collected fault data of the outer ring real and effective. The vibration acceleration signals of the rolling bearing are collected by acceleration sensors that are mounted on the fan end and the motor drive end housing. The sampling frequency at the fan end is 12 kHz, and at the drive end it is 12 kHz and 48 kHz. The bearing test platform uses 16-channel data recorders to collect the vibration signals and a torque sensor to measure load and speed.
Electronics 2021, 10
In the experiment, the vibration data from the acceleration sensor at the driving end, sampled at 12 kHz, were selected. The data set covered the normal state of the bearing and nine fault types: damage to the bearing inner ring, the ball, and the outer ring at diameters of 0.0028 mm, 0.0056 mm, and 0.0083 mm. The damage points of the bearing's outer ring were in the direction of 3, 6, and 12 o'clock. The vibration signals of the 10 fault types were selected when the load was 0 HP, 1 HP, 2 HP, and 3 HP, with 1024 data points per sample. (The motor speed is 1797 r/min. At the sampling frequency of 12 kHz, there are 60/1797 × 12000 = 400.67 sampling points in each cycle. The sample length is set to 1024, which contains 1024/400 = 2.56 cycles of data, so as to ensure that each sample contains abundant fault feature information.) Different fault types under different loads are divided into the training set, validation set, and test set according to the ratio of 6:2:2. Table 1 shows the data set of rolling bearing vibration signals. Figure 6 shows the original vibration signals of the 10 fault types under 0 HP.
Figure 6. Original vibration signals of the 10 fault types under 0 HP (see Table 1). The y-axis represents the amplitude of the signal and the x-axis is the number of sampling points.
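The arithmetic behind the sample length can be checked with a short sketch, which also shows one way of cutting a record into 1024-point samples. The `segment` helper and its non-overlapping default step are assumptions for illustration; the text does not state whether consecutive samples overlap.

```python
def points_per_revolution(speed_rpm, sampling_hz):
    """Number of samples recorded during one shaft revolution."""
    return 60.0 / speed_rpm * sampling_hz


def segment(signal, length=1024, step=1024):
    """Cut a 1D signal into windows of `length` points, `step` points apart."""
    return [signal[start:start + length]
            for start in range(0, len(signal) - length + 1, step)]


per_rev = points_per_revolution(1797, 12000)
print(round(per_rev, 2))         # 400.67
print(round(1024 / per_rev, 2))  # 2.56

# Hypothetical record of 5000 points, used only to show the windowing.
windows = segment(list(range(5000)))
print(len(windows), len(windows[0]))  # 4 1024
```

A step smaller than the window length would produce overlapping samples, which is a common way of obtaining more samples from a record of limited length.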

1D-CNN Model Parameter Selection
Mathematically, the neural network is a multilayer composite function. Without an activation function, the neural network would reduce to a linear function. However, samples are not always linearly separable, so a nonlinear activation function, such as Tanh or ReLU, should be used in order to solve problems that a linear model cannot. ReLU sets the output of neurons with a negative input value to zero, which reduces the interdependence among parameters and speeds up the calculation. In order to explore the influence of Tanh and ReLU on the 1D-CNN network, both activation functions are used in the experiment under the load of 0 HP. Figures 7 and 8 show the curves of the loss function and accuracy during the training process.
Figure 7 shows the loss curves over 30 iterations. The loss values of both activation functions decrease steadily; however, with Tanh the curve converges significantly faster and finally approaches zero, whereas with ReLU the convergence is slower. The accuracy curve depicted in Figure 8 shows that, as the number of iterations increases, the model with Tanh converges faster and to a higher accuracy than the model with ReLU. In order to further justify selecting Tanh as the activation function, experiments were carried out at 0 HP, 1 HP, 2 HP, 3 HP, and 0123 HP. The experimental results that are shown in Figure 9 indicate that the 1D-CNN network model proposed in this paper is more accurate with Tanh than with ReLU. According to these results, ReLU was deemed to be unsuitable for the 1D-CNN model that is proposed in this paper.
When training the network model, the batch size affects not only the training speed, but also the accuracy. A large batch size can expedite the training process, but it requires a large amount of computer memory. With a small batch size, the operation speed is slower and some noise is produced, but this noise also helps to prevent the training process from falling into a local optimum. Therefore, it is very important to select an appropriate batch size. In this paper, six different batch sizes (8, 16, 32, 64, 128, and 256) were selected for comparison tests under different loads. The experimental results, as shown in Figure 10, indicate that, when the batch size is 8, 16, 32, or 64, the average accuracy of the 1D-CNN model on different loads reaches more than 98%, while when the batch size is 128 or 256, the average accuracy on different loads is 97.28% and 93.86%, respectively. The experimental results show that the 1D-CNN model has the highest average accuracy of 99.3% under different loads when the batch size is 64.
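One way to see the trade-off between batch size and training behaviour is to count how many weight updates a single pass over the training set produces for each candidate batch size. The sketch below assumes 600 training samples (60% of a 1000-sample data set); this figure is an assumption for illustration and the actual number per load may differ.

```python
import math

BATCH_SIZES = [8, 16, 32, 64, 128, 256]


def updates_per_epoch(n_train, batch_size):
    """Number of mini-batches (weight updates) needed to see every sample once."""
    return math.ceil(n_train / batch_size)


# Assumption: 600 training samples, i.e. 60% of a 1000-sample data set.
steps = {b: updates_per_epoch(600, b) for b in BATCH_SIZES}
print(steps)  # {8: 75, 16: 38, 32: 19, 64: 10, 128: 5, 256: 3}
```

With a fixed number of passes over the data, larger batches leave the optimizer far fewer updates, which is one plausible reason for the lower accuracy observed at batch sizes of 128 and 256.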

The Specific Parameters of Six Models
The 1D-CNN model is compared with five different models that are based on machine learning and deep learning, so as to demonstrate the effectiveness of the 1D-CNN model in the fault diagnosis of rolling bearings. The five models are LSTM (long short-term memory), MLP, SVM, RandomForest, and KNN. The data sets shown in Table 1 were used by all six models. The specific parameters of the LSTM and MLP network models were selected experimentally, with the same selection method as for the parameters of the 1D-CNN. GridSearchCV (with 10-fold cross-validation) is used in order to select several parameters that affect the performance of the SVM, RandomForest, and KNN models. Experiments were carried out under loads of 0 HP, 1 HP, 2 HP, 3 HP, and 0123 HP (a new bearing vibration data set that is composed of the bearing vibration data sets with loads of 0 HP, 1 HP, 2 HP, and 3 HP). The parameters of each model are as follows:
(1) 1D-CNN model. The learning rate was set to 0.001 and the activation function was set to Tanh. The optimizer was Adam, which combines the advantages of the Adagrad and RMSprop algorithms and has high computing efficiency and a low memory requirement. The loss function was categorical_crossentropy, the batch size was set to 64, and the number of iterations was 30.
(2) LSTM model. The first layer was an LSTM layer with 32 neurons and Tanh as the activation function. The second layer was a fully connected layer with 32 neurons and ReLU as the activation function. The third layer had 10 neurons and performed the classification with Softmax. The learning rate was set to 0.001 and the optimizer was Adam. The loss function was categorical_crossentropy. The batch size was 32 and the number of iterations was 30.
(3) MLP model. The first, second, third, and fourth layers were fully connected layers with 300, 400, 200, and 100 neurons, respectively. The activation function was ReLU. Dropout operations with a probability of 0.4 were adopted in each fully connected layer. The fifth layer was the output layer with 10 neurons and performed the classification with Softmax. The learning rate was set to 0.002 and the optimizer was Adam. The loss function was categorical_crossentropy. The batch size was 32 and the number of iterations was 40.
(4) SVM model. GridSearchCV (with 10-fold cross-validation) is adopted. The Gaussian kernel (RBF) is selected as the kernel function of the SVM. The penalty factor C is determined to be 128, and gamma (which controls the width of the Gaussian kernel and determines the distribution of the data mapped to the new feature space) is 0.002.
(5) RandomForest model. GridSearchCV (with 10-fold cross-validation) is adopted. Three hundred decision trees were used in order to construct the random forest model. The maximum depth of the trees was 16, and the minimum number of samples required to split a node was 5.
(6) KNN model. GridSearchCV (with 10-fold cross-validation) is used for the KNN model, and the best K value is determined to be 1.
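The parameter selection used for the SVM, RandomForest, and KNN models combines two ideas: splitting the data into 10 folds and scoring every combination of candidate parameters. The sketch below illustrates both in plain Python rather than with scikit-learn's GridSearchCV. The candidate grid and the `toy_score` function are hypothetical placeholders; the grids searched in the experiments are not given in the text, and a real scoring function would train the model on each training fold and average the validation accuracy.

```python
import itertools


def kfold_indices(n_samples, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def grid_search(param_grid, score_fn):
    """Return the parameter combination with the highest score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical candidate values, chosen only to exercise the search.
svm_grid = {"C": [1, 8, 64, 128], "gamma": [0.0002, 0.002, 0.02]}


def toy_score(params):
    """Placeholder score; a real one would average accuracy over the folds."""
    return -abs(params["C"] - 64) - abs(params["gamma"] - 0.02)


best, _ = grid_search(svm_grid, toy_score)
folds = list(kfold_indices(1000, k=10))
```

Each sample appears in exactly one validation fold, so every candidate is scored on all of the data without ever being validated on samples it was trained on.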

Compared with Other Model Experiments
Each model is evaluated with 10-fold cross-validation when tested on different loads in order to better assess its performance. The experimental results of the various rolling bearing fault diagnosis methods are shown in Tables 3 and 4 and Figure 10. Table 3 shows the accuracies of the six models under different loads; the average accuracy of the 1D-CNN network over the different loads is 99.2%. The 1D-CNN's average accuracy is 65.94, 30.82, and 28.15 percentage points higher than that of KNN, RandomForest, and SVM, respectively. As the results show, these three machine-learning-based bearing fault diagnosis methods perform worse than the 1D-CNN network proposed in this paper. The main causes are as follows: the KNN algorithm requires the data to be preprocessed to some extent; SVM does not perform well on data sets with many feature points; and RandomForest is sensitive to noise, which easily leads to overfitting.
The average accuracy of the 1D-CNN is 12.41 and 20.61 percentage points higher than that of LSTM and MLP, respectively. The average accuracies of LSTM (86.79%) and MLP (78.59%) are higher than those of RandomForest (68.38%), KNN (33.26%), and SVM (71.05%). From the experimental results, the deep learning methods are superior to the traditional machine learning methods, mainly because the latter cannot learn some nonlinear relations in complex bearing vibration signals well, while deep learning has great advantages in analyzing complex and non-stationary signals. In the experiment, the six methods were tested on the loads of 0 HP, 1 HP, 2 HP, 3 HP, and 0123 HP, and the 1D-CNN model achieved high accuracy on each load, with the difference between the highest and the lowest accuracy being only 2.3%. Table 4 and Figure 11 show the average accuracies of the different models under different loads. The 1D-CNN had accuracies of 99.3%, 97.6%, 99.5%, 99.9%, and 99.68% under the different loads, with an average accuracy of 99.2%, which was higher than those of the other five models. Moreover, the standard deviation of the 1D-CNN was only 0.82%, which is lower than the 5.17%, 5.50%, 3.56%, 4.08%, and 3.89% of the other five methods. These results demonstrate the effectiveness of the 1D-CNN in fault diagnosis under different loads.
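The summary statistics quoted for the 1D-CNN can be reproduced from its five per-load accuracies with the Python standard library. The sketch assumes that the reported standard deviation is the population standard deviation, which is the variant that matches the quoted 0.82%.

```python
import statistics

# Accuracies (%) of the 1D-CNN under the five load settings, as listed in the text.
cnn_accuracy = [99.3, 97.6, 99.5, 99.9, 99.68]

mean_acc = statistics.mean(cnn_accuracy)
std_acc = statistics.pstdev(cnn_accuracy)   # population standard deviation
spread = max(cnn_accuracy) - min(cnn_accuracy)

print(round(mean_acc, 1), round(std_acc, 2), round(spread, 1))  # 99.2 0.82 2.3
```

The sample standard deviation (`statistics.stdev`) of the same values is about 0.92, so the choice of estimator matters when comparing against the other five methods.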

Performances under Different Loads
A Dropout layer is added to the 1D-CNN network model in order to avoid overfitting and enhance its generalization ability. The experimental results shown in Figure 11 indicate that the 1D-CNN can achieve higher accuracy in cross-load training. Cross-load training was carried out under the three loads of 1 HP, 2 HP, and 3 HP, as shown in Figure 12, where "3 HP→1 HP", for example, means training at 3 HP and testing at 1 HP. The experimental results show that the generalization ability of the 1D-CNN network model is significantly improved by the addition of the Dropout layer; in particular, the model accuracy increases by 9.5% under the 3 HP→1 HP condition. The average accuracy of cross-load training increased from 94.25% (without Dropout) to 98.83% (with Dropout).
In practical applications, fault samples of equipment under various loads are difficult to collect, so the faults of rolling bearings are usually collected under a single load. However, training a fault diagnosis model requires fault data under different loads. In this paper, therefore, the generalization ability of the 1D-CNN under different loads was investigated and compared with that of Shufflenet V2, MobileNet, ICN [34], DFCNN [35], and PFC-CNN [36]. Table 5 and Figure 13 show the detailed comparison results. Table 5 shows that the highest and lowest accuracies of the 1D-CNN are 100% and 97%, respectively. Its lowest accuracy is higher than the lowest accuracies of Shufflenet V2 (96.3%) and ICN (94.17%). The average accuracy was 98.3%, which was higher than that of Shufflenet V2 (97.36%), MobileNet (94.38%), ICN (97.07%), DFCNN (90.05%), and PFC-CNN (93.31%). The results validate that the 1D-CNN is effective in diagnosing bearing faults under different loads.
Figure 13 shows the diagnosis accuracies of the different methods under the various cross-load conditions. Except for the 2 HP→1 HP case, the 1D-CNN model had higher accuracies than Shufflenet V2 and ICN, which shows that the proposed model is effective in diagnosing faults under cross-load conditions.
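The Dropout operation discussed in this subsection can be illustrated with a small sketch. It shows the common "inverted dropout" formulation in plain Python; the function name `dropout`, the rate of 0.5, and the example activations are assumptions for illustration, since the text does not state the rate or the implementation used in the 1D-CNN.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)  # inference: activations pass through unchanged
    keep = 1.0 - rate
    # Scale surviving units by 1/keep so the expected activation is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


features = [0.5, 1.2, -0.3, 0.8]                       # hypothetical activations
train_out = dropout(features, rate=0.5, rng=random.Random(0))
test_out = dropout(features, rate=0.5, training=False)
```

Because a different random subset of units is silenced at every training step, the network cannot rely on any single feature, which is the usual explanation for the better generalization from one load to another.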

Figure 13. Accuracy of different models across loads.

Visual Analysis of Validity of 1D-CNN Model
The results under different loads are summarized in confusion matrices in order to assess the accuracy of the 1D-CNN bearing fault diagnosis more intuitively. Table 6 lists the bearing status represented by each number in Figure 14. Each confusion matrix presents the predicted results of the samples on the horizontal axis and their actual labels on the vertical axis. Under 0 HP, 5% of the ball fault (0.0056 mm) samples were incorrectly predicted as ball fault (0.0028 mm), and under 1 HP, 12% of the ball fault (0.0084 mm) samples were incorrectly predicted as ball fault (0.0028 mm). Errors thus appeared in the diagnosis of the ball fault under the loads of 0 and 1 HP, as noise masks the characteristic information of the ball fault under lower loads. In the other cases, the 1D-CNN predictions are accurate.
t-SNE is a dimension reduction algorithm based on manifold learning, which distinguishes it from the traditional PCA (Principal Component Analysis) and LDA (Linear Discriminant Analysis) methods. t-SNE models the similarity between data features in the high-dimensional space with a normalized Gaussian distribution and the similarity between points in the low-dimensional space with a t-distribution. Minimizing the KL divergence between the two similarity distributions brings them close together and allows high-dimensional data to be visualized in two or three dimensions. The t-SNE visualization algorithm is used here to show that the 1D-CNN distinguishes different fault types (Figure 15).
By reducing the dimension of the prediction results under different loads [37], the t-SNE algorithm visualizes the prediction results for the different loads of the CWRU data set (see Table 6 for the bearing status represented by each number). Figure 15 shows that the features of the same fault type are clustered together and that different fault types are separated, which shows that the 1D-CNN effectively distinguishes fault characteristics under different loads.
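The t-SNE objective described above can be made concrete with a small sketch in plain Python. It only evaluates the KL divergence for a given embedding and does not perform the gradient-based optimization; it also uses a single fixed Gaussian bandwidth `sigma` instead of the per-point, perplexity-based bandwidths of the full algorithm. All function names and the toy points are assumptions for illustration.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_affinities(points, sigma=1.0):
    """Normalized Gaussian similarities p_ij in the high-dimensional space."""
    n = len(points)
    w = [[math.exp(-sq_dist(points[i], points[j]) / (2 * sigma ** 2)) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def student_t_affinities(points):
    """Normalized Student-t (one degree of freedom) similarities q_ij in the embedding."""
    n = len(points)
    w = [[1.0 / (1.0 + sq_dist(points[i], points[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def kl_divergence(p, q):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij), the quantity t-SNE minimizes."""
    return sum(pij * math.log(pij / qij)
               for prow, qrow in zip(p, q)
               for pij, qij in zip(prow, qrow) if pij > 0.0)


# Toy data: two well-separated pairs of points in three dimensions.
high = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
good_2d = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]  # keeps the pairs together
bad_2d = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0)]   # splits the pairs apart

p = gaussian_affinities(high)
kl_good = kl_divergence(p, student_t_affinities(good_2d))
kl_bad = kl_divergence(p, student_t_affinities(bad_2d))
```

An embedding that keeps neighbouring samples together yields a smaller divergence than one that separates them, which is why samples of the same fault type form compact clusters in a well-optimized t-SNE plot.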

Conclusions
A new method for diagnosing rolling bearing faults using a 1D-CNN is proposed. The vibration dataset of rolling bearings supplied by Case Western Reserve University (CWRU) is used to verify the model. The following conclusions can be drawn from the experiments in this paper.
(1) The proposed method achieves an average accuracy of 99.2% under a single load and 98.83% across different loads. Moreover, the original vibration data of the bearings are used directly, without preprocessing.
(2) The proposed 1D-CNN network structure, in which the number of convolution kernels decreases as the convolution kernel size decreases, effectively improves the accuracy of bearing fault diagnosis.
(3) The 1D-CNN model has great advantages over traditional machine learning methods in analyzing complex and non-stationary signals.
(4) The Dropout layer added to the 1D-CNN model effectively improves the accuracy of cross-load training and enhances the generalization ability of the model.
In an actual industrial environment, the vibration data of a bearing are disturbed by strong noise, and the vibration data of rolling bearings become increasingly complicated. In future research, with a view to better integrating deep learning and fault diagnosis, we will continue to explore how to diagnose bearing faults accurately from complex, noisy bearing vibration data.

Data Availability Statement:
We include a data availability statement with all Research Articles published in an MDPI journal. The data are stored in .mat files and can be accessed on the website https://csegroups.case.edu/bearingdatacenter/home; there are no restrictions on data access. The data used to support the findings of this study are included in the article.