Bearings are essential mechanical parts that operate for long periods as consumables, so bearing wear is considerable. In this section, an open bearing dataset was used to preliminarily validate the proposed method. To simulate an actual manufacturing field, the complete wear-process data were assumed unavailable, and the data were labeled by the categorical RUL ratio. The effects of Ls, Ns, and Ts on the 1-D CNN with clustering loss were also analyzed. The proposed RUL estimation method was verified by comparing its estimation error and functionality with other studies.
3.1. Data Acquisition and Processing
The open bearing dataset was obtained from the Institute of Electrical and Electronics Engineers (IEEE) Prognostics and Health Management (PHM) 2012 Prognostic Challenge [24]. It is a run-to-failure experiment with online health monitoring through the accelerated degradation of bearings under adjustable operating conditions. The data, gathered under three different loads (rotating speed and load force), contain the rotating vibration, speed, load force, and temperature of the bearings. The sampling frequency was 25.6 kHz, the recording time was 0.1 s, and the time interval between pieces of data was 10 s. Furthermore, six run-to-failure datasets were provided to build the prognostic models, and the 11 remaining bearings were used to evaluate the estimation accuracy of the bearings' remaining useful life (RUL). In this experiment, the RUL of the bearings stood in for the wear amount to simulate manufacturers' lack of complete wear-process data.
The RUL ratio of the bearings was first used as the target for the model estimation. The RUL ratio of each piece of data was obtained from its elapsed time from the beginning divided by the total wear time of the bearing. Next, since the model requires time-series input, the data must be arranged: according to the designed Ls, Ns, and Ts, Ns pieces of one-axis vibration signal of length Ls, together with the RUL ratio of the last signal, were set as an input–output pair (training pattern). Note that this study assumed that the manufacturer acquires only a few vibration signals and wear-failure samples. Arranged data with an RUL ratio greater than 0.75 were treated as OK data, and those with an RUL ratio less than 0.25 as NG data. To evaluate the estimation performance, the whole dataset of the 17 bearings was kept with the original RUL ratio target as an estimation set.
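As an illustration, the arrangement and labeling steps above can be sketched as follows. This is a minimal sketch with our own function and variable names: Ts is expressed as a step counted in data pieces (each piece being 10 s apart), and the exact RUL-ratio definition (1 − elapsed/total) is our assumption.

```python
import numpy as np

def arrange_patterns(pieces, times, total_time, Ls=2048, Ns=5, Ts=1):
    """Arrange one bearing's vibration pieces into (Ns, Ls) time-series
    inputs labeled OK/NG by the RUL ratio of the last piece.

    pieces : (N, M) array of vibration pieces (M >= Ls samples each)
    times  : elapsed time of each piece from the beginning
    Mid-life data (RUL ratio in [0.25, 0.75]) is discarded, mirroring the
    assumption that only early (OK) and near-failure (NG) data exist.
    """
    X, y = [], []
    span = (Ns - 1) * Ts
    for i in range(len(pieces) - span):
        idx = [i + k * Ts for k in range(Ns)]
        rul = 1.0 - times[idx[-1]] / total_time  # assumed RUL-ratio definition
        if rul > 0.75:
            label = 0  # OK
        elif rul < 0.25:
            label = 1  # NG
        else:
            continue   # mid-life data withheld
        X.append(pieces[idx, :Ls])
        y.append(label)
    return np.asarray(X), np.asarray(y)
```

Each returned input has shape (Ns, Ls), matching the time-series input scheme analyzed in Section 3.2.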
Generally, a total dataset is divided into three parts: training, validation, and test sets. Accordingly, for building the 1-D CNN classification model, 80% of the data from the six bearings were used as the training set and the remaining 20% as the validation set; the data of the 11 bearings were used as the test set, as in the IEEE PHM challenge. Finally, the input data were normalized to prevent abnormal calculation values, and the vibration signal was re-scaled to within the range [−1, 1].
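The re-scaling into [−1, 1] can be done with a per-signal min–max mapping; this is a minimal sketch, as the paper does not spell out the exact normalization formula used.

```python
import numpy as np

def rescale_signal(x):
    """Min-max re-scale a vibration signal into [-1, 1].

    Assumes x is non-constant; a constant signal would divide by zero.
    """
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0
```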
3.2. 1-D CNN with Clustering Loss Model Analysis
In this section, the training results and the effects of the selected parameters Ls, Ns, and Ts on the 1-D CNN with clustering loss are introduced. For a rotation speed of about 1500~1800 rpm, the characteristic defect frequencies of the bearing were higher than 25 Hz, so the longest signal period of one cycle was at most 1024 samples, and the input signal length Ls was selected as 2048. Since a single separated piece of data contains at most 2560 sampling points, a design of Ls = 2048 is feasible.
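The arithmetic behind this choice can be checked directly: at a 25.6 kHz sampling rate, a 25 Hz defect frequency spans 1024 samples per cycle, so Ls = 2048 covers two full cycles of the slowest defect frequency while still fitting inside each recorded piece.

```python
fs = 25_600           # sampling frequency, Hz
f_defect_min = 25     # lowest characteristic defect frequency, Hz

samples_per_cycle = fs // f_defect_min  # longest defect period in samples
Ls = 2 * samples_per_cycle              # chosen input signal length
piece_length = int(fs * 0.1)            # samples per 0.1 s recording
```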
The proposed 1-D CNN shown in Figure 3 was adopted to treat the problem. Herein, the convolutions of the first few layers are not stacked with pooling layers; after multiple convolution layers, pooling layers are interleaved to reduce excessive calculation, and the hidden layer features f1 and f2 are output. The clustering loss was added to the intermediate hidden layer output calculation so that the features extracted from the OK and NG data exhibit a clustering effect. Subsequently, the classifier part distinguishes the eight hidden layer features of the OK and NG data into two classes; its structure is a simple fully connected neural network of [2, 8, 32] (one hidden layer). The final part is the estimation. Since the data were pre-processed by dividing them at the 0.75 and 0.25 RUL ratios (i.e., fµ0 and fµ1 were 0.875 and 0.125), the estimation value P is obtained by weighting the classifier outputs for the OK and NG classes by fµ0 and fµ1, respectively.
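Under the assumption that P is the softmax-weighted combination of the class-center RUL ratios fµ0 = 0.875 and fµ1 = 0.125 (our reading of the estimation step; the exact formula is given by the original equation), the calculation can be sketched as:

```python
import numpy as np

def estimate_rul_ratio(logits, f_mu0=0.875, f_mu1=0.125):
    """Assumed estimation step: soften the two classifier outputs into
    probabilities and weight them by the OK/NG class-center RUL ratios."""
    e = np.exp(logits - logits.max())  # numerically stable softmax
    p_ok, p_ng = e / e.sum()
    return f_mu0 * p_ok + f_mu1 * p_ng
```

With this formulation, P ranges continuously between 0.125 (confident NG) and 0.875 (confident OK).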
The learning parameters designed are shown in Table 1, and the training result in terms of the root mean square error (RMSE) is shown in Figure 7 (blue: training loss; orange: validation loss). This shows that the training was successful and the overfitting phenomenon was not serious.
Discussion 1: Learning Algorithm Selection
The effects and results are compared with other popular algorithms in Table 2, which lists the results of the same model trained with different algorithms. It can be seen that Adam obtained a lower loss and higher accuracy under the same initial learning rate and epoch number. For reasons of efficiency and convenience, Adam was selected as the training algorithm.
Discussion 2: Time-Series Input Scheme
Herein, a comparison of the parameter analysis for the time-series input scheme is introduced in Table 3, where each trained model is denoted by (Ns, Ts). It can be observed that the classification ability is feasible, since the classification accuracy on the training and validation sets was almost 100% and the accuracy on the test set was greater than 80%. Then, the data of the training-set bearings with the original RUL ratio target were used to compare each model. The mean square error (MSE) of each model's estimation is shown in Figure 8, where the color bar indicates the corresponding MSE. The model (5, 10) had the minimum MSE of 0.0139. Hence, the parameters (Ns, Ts) of the 1-D CNN with clustering loss were suggested as (5, 10). Moreover, the entire process data of the training sets were used to observe the continuous monitoring ability, which is shown in Figure 9 (blue: estimated RUL ratio; orange: actual time of vibration signal data). The estimated RUL ratio gradually decreased over time.
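The selection of (Ns, Ts) by minimum estimation MSE amounts to a simple grid search, which can be sketched as follows. Only the (5, 10) value of 0.0139 comes from the results above; the other grid entries are hypothetical placeholders.

```python
def select_parameters(mse_grid):
    """Pick the (Ns, Ts) pair with the minimum estimation MSE from a
    dict mapping (Ns, Ts) -> MSE on the original RUL-ratio target."""
    best = min(mse_grid, key=mse_grid.get)
    return best, mse_grid[best]

# Hypothetical MSE values; only (5, 10) = 0.0139 is reported in the text.
mse_grid = {(3, 10): 0.021, (5, 10): 0.0139, (5, 20): 0.018}
```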
Discussion 3: CNN with Clustering Loss
In addition, the corresponding features after training are introduced in Figure 10, where ● and × denote OK and NG, respectively. A model was also trained without clustering loss to compare the effect of the clustering loss, confirming that the clustering loss is feasible for clustering the intermediate feature outputs into their respective classes. From Figure 10a, the CNN with clustering loss separated the features of the two clusters; in contrast, without it the feature outputs overlapped and mingled messily with each other, as shown in Figure 10b. Meanwhile, Table 4 shows the cluster distance D and the cluster radii r0 and r1 of each cluster. Although the distance D without clustering loss was larger than that of the model with clustering loss, the radii and Lcluster(Dnorm) with clustering loss were smaller. Consequently, the clustering loss was effective in causing the feature outputs to cluster into each category.