Multi-Sensor Data Fusion for Remaining Useful Life Prediction of Machining Tools by IABC-BPNN in Dry Milling Operations

Inefficient remaining useful life (RUL) estimation may cause unpredictable failures and unscheduled maintenance of machining tools. Multi-sensor data fusion will improve the RUL prediction reliability by fusing more sensor information related to the machining process of tools. In this paper, a multi-sensor data fusion system for online RUL prediction of machining tools is proposed. The system integrates multi-sensor signal collection, signal preprocess by a complementary ensemble empirical mode decomposition, feature extraction in time domain, frequency domain and time-frequency domain by such methods as statistical analysis, power spectrum density analysis and Hilbert-Huang transform, feature selection by a Light Gradient Boosting Machine method, feature fusion by a tool wear prediction model based on back propagation neural network optimized by improved artificial bee colony (IABC-BPNN) algorithm, and the online RUL prediction model by a polynomial curve fitting method. An example is used to verify whether if the prediction performance of the proposed system is stable and reliable, and the results show that it is superior to its rivals.


Introduction
In an automatic manufacturing system, machining tools of computer numerical control (CNC) have always been a crucial factor for machining quality. Machining tools wear or breakage may significantly decrease machining quality, increase production costs or even interrupt the running of the manufacturing system [1], and it is estimated 20% of downtime is attributed to tool failures [2]. Therefore, online remaining useful life (RUL) prediction and replacement of machining tools in time are urgently needed to assure machining quality and system reliability [3].
A huge amount of research work on RUL prediction of machining tools or equipment has been done over the last decade. In general, RUL prediction methods are divided into main three kinds, which are experience-based models, physics-based models and data-based methods [4,5].
According to the observed situation, experience-based models usually utilize engineering experience and expert knowledge to infer RUL from historical data. In diagnostics and prognostics, fuzzy logic methods and expert systems are two typical experience-based methods. Khelif et al. proposed an experience-based method, which uses the experience gained from solving similar and already seen problems, to predict the RUL of Li-ion batteries [6]. Yan et al. presented a fuzzy logic combined logistic regression method to predict RUL of gas turbine hot components and to assess sensing signals, including single sensor signal and multi-sensor signals. Compared to the single sensor signal, multi-sensor signals can provide more information about machining tools in machining process and make the RUL prediction result more reliable [3]. Thus, acquiring the most effective feature information and fusion from multi-sensor signals is a hot topic. Yu et al. proposed a novel weighted HMM-based approach for RUL prediction. The wear evolution process was discretized into five wear stages, and was formulated by multiple HMMs with different steps in each stage. The weighted HMM model was effectively fused based on multi-sensor signals and the predicted the RUL of tools [22]. Traditionally, feature extraction and selection is the key to multi-sensor data fusion. Many effective methods, like statistical analysis, time-frequency analysis and deep learning, have been used to extract features, and those, like correlation analysis, monotonicity analysis and residual analysis, have been used to select optimum features. Wu et al. utilized ensemble empirical mode decomposition method to eliminate noises of multi-sensor signals, statistic methods to extract feature, three methods including correlation analysis, monotonicity analysis and residual analysis to select optimum features, and adaptive NFIS to fuse feature, and then built an RUL prediction model [16]. Generally, soft computing techniques are applied for undertaking the fusion combing with some classical methods like SVM, NFIS and logistic regression, and an effective RUL of machining tool prediction model is ultimately formed. However, in the actual machining process, due to the randomness or nonlinearity between the level of tool wear and the feature of multi-sensor signals extracted and selected, the prediction model makes it difficult to predict the RUL of machining tools accurately and quickly.
In order to solve the above problems, an online RUL of machining tool prediction system, using back propagation neural network optimized by improved artificial bee colony algorithm (IABC-BPNN), based on multi-sensor data fusion is proposed in this paper. First, a multi-sensor data fusion online RUL prediction system scheme is introduced, which is based on massive sensor signals, and divided into an online signal data process and an offline signal data process. Then, the captured signals from force and vibration sensors are de-noised by a complementary ensemble empirical mode decomposition (CEEMD). The de-noised signals are used for effective feature extraction by statistical analysis, time domain analysis, frequency domain analysis and Hilbert-Huang transform (HHT). Next, a Light Gradient Boosting Machine (LightGBM) method-based feature selection is presented to obtain the optimal features. Finally, IABC-BPNN model is constructed to implement the feature fusion and predict the tool wear, and a polynomial curve fitting method (PCF) is used to predict online RUL of the machining tool.
The remainder of this paper is organized as follows: Section 2 proposes a RUL prediction system of machining tools based on multi-sensor data fusion. Section 3 introduces the signal preprocess method called CEEMD. Section 4 discusses different feature extraction methods in different domains, and the optimal features selection by the LightGBM method. Section 5 explains the IABC-BPNN prediction model-based data fusion and an online RUL prediction model building. Section 6 represents an experimental example study of the multi-sensor data fusion system, and discusses the experimental results. Section 7 summarizes the paper and looks forward to the future.

RUL Prediction System of Machining Tools Based on Multi-Sensor Data Fusion
As shown in Figure 1, the proposed RUL prediction system of machining tools based on multi-sensor data fusion is consisted of five parts: multi-sensor signal database (offline and online data), signal preprocess(de-noising), feature extraction, feature selection, feature fusion based on the IABC-BPNN model and RUL prediction by PCF method. The system involves two types of signal data process: offline and online.
For offline signal data process, multi-sensors, such as vibration and force, are installed around the workpiece to acquire different signals from CNC machining tools. First, a large volume of signal data from different sensors that are regularly received and stored in multi-sensor signal database. Next, these stored raw signal data are de-noised by CEEMD and features in time domain, frequency domain and time-frequency domain are extracted. The optimal features, which are those that are more related to tool wear, are selected by LightGBM method from all the extracted features. Finally, the selected features are inputted into the IABC-BPNN model to train and then to predict tool wear.
Once the trained model based on IABC-BPNN is proven to be feasible, it will trigger the process of online signal data process. Multi-sensor online signals are first acquired and de-noised by CEEMD. Next, three types of features are extracted and then are selected. Finally, the selected features as input data are sent to the trained model to obtain the tool wear. According to the tool wear levels, the RUL of machining tools is predicted using PCF.

Signal Preprocess
Due to the influence of the processing environment and other unavoidable factors, raw signals acquired from multi-sensors contain a lot of redundant information with noise, while the redundant information has a certain interference on the analysis of the signal, and affects the state monitoring of the equipment during the machining process, so further signal preprocess is needed before analysis.
De-noising is the most common method for signal preprocess. There are many methods for de-noising, amongst which wavelet threshold de-noising and empirical mode decomposition (EMD) are commonly used. The former needs to select the wavelet basis function, the number of decomposition layers, the threshold value, the threshold function, etc., which affect the accuracy of the final de-noising effect; while the latter does not need to set any basis function with prior knowledge, decomposes the signal into a set of intrinsic mode functions (IMFs) and a residue according to the time scale characteristics of the data, and each IMF component decomposed contains the local characteristics of different time scales of the original signals and can efficiently control the level of noise. Therefore, EMD is adaptive and suitable for analyzing non-linear and non-stationary signal sequences.
However, there are also some problems with EMD, among which is mode mixing problem. To deal with the problem, this paper introduces the CEEMD method proposed by Yeh et al. [24], which is an improved EMD method. The CEEMD method is mainly to add two opposite white noise signals to the analyzed signal many times, then perform EMD decomposition separately, and average the results of the multiple decompositions to obtain the final IMF. With enough the ensemble number of the white noise time series, noise in the signal can be reduced, or even completely eliminated. Figure 2 shows the flow chart of CEEMD preprocess for multi-sensor signals. The specific steps are described as follows. (1) The opposite white noise time series n i (t), whose variance is unity and mean value is zero, are added to the raw signal s(t) respectively and two new noise-added signal s + i0 (t) and s − i0 (t) are produced and expressed as where N is the number of ensemble and set to 80, and ε is the signal to noise ratio coefficient and set to [0.1, 0.2]. (2) The two new noised-added signal s + i0 (t) and s − i0 (t) are discomposed into the first IMF E + 1 (s + i0 (t)) and E − 1 (s − i0 (t)) using EMD method, then IMF i1 (t) can be described as The first residue r i1 (t) can be calculated as If r i1 (t) is monotonic, the decomposition will stop. Otherwise, two new noise-added signal s + i1 (t) and s − i1 (t) are produced by adding the opposite white noise time series E 1 (n i (t)) into r i1 (t) and expressed as according to the above decomposition process, the second IMF and the second residue r i2 (t) are calculated as The above decomposition is repeated until the residue is monotonic, and the final IMF and residue r iM (t) can be given as where M represents the number of signal decompositions and IMFs, and r iM (t) can be thought of as IMF i(M+1) (t).
(3) Repeating the above two steps for N trials and adding the opposite white noise time series into the signal very trial, we will obtain the final IMFs and residual of the signals, which are expressed as: Finally, the effective IMFs are selected to eliminate the noise in sensor signals, and the reconstruction of the raw signal can be expressed as

Feature Extraction and Selection
By extracting and analyzing features in time domain (TD), frequency domain (FD) and time-frequency domain (TFD) of the de-noised signals, the evolution of randomness or nonlinearity for machining tools can be tracked and described, so as to establish the RUL of machining tools prediction model.

Feature Extraction of the Multi-Sensor Signals
TD features (TDFs), FD features (FDFs) and TFD features (TFDFs) can reflect the state change of tools during machining, and they are also the effective features for the RUL prediction of machining tools [23,25,26]. By processing the multi-sensor signals after de-noising, TDFs, FDFs and TFDFs of signals at different stages during the machining process are extracted.
In this paper, a total of 10 TDFs are extracted from the multi-sensor de-noising signals by statistical analysis, which include mean value (T mv ), maximum (T max ), root mean square (T rms ), variance (T vr ), standard deviation (T sd ), peak-to-peak (T p2p ), waveform factor (T wf ), skewness factor (T sf ), kurtosis factor (T kf ) and crest factor (T cf ). Among them, T mv , T max , T rms , T vr , T sd and T p2p reflect the amplitude and energy of the signals over time domain, while T wf , T sf , T kf and T cf reflect the distribution situation over time domain. In frequency domain, a total of 7 FDFs are extracted by power spectrum density analysis, including mean (F mv ), maximum (F max ), root mean square (F rms ), variance (F vr ), skewness (F sf ), kurtosis (F kf ), and relative spectral peak per band (F rs ) of power spectrum, among which the first five describe the variation of main frequency band position of the signals over frequency domain while the last two describe the dispersion of spectral energies over frequency domain. Table 1 summarizes these TDFs and FDFs, where n is the number of sampling points (in time domain) or spectrum lines (in frequency domain).
Root mean square of power spectrum (Frms) Relative spectral peak per band (Frs) In time-frequency domain, TFDFs of the top 10 IMFs of the multi-sensor de-noising signals are extracted by Hilbert-Huang transform (HHT) which is based on the instantaneous frequencies resulting from IMFs of the analyzed signals [27][28][29]. HHT represents a time-frequency domain analysis method of signal by combining EMD with Hilbert transform [30]. Comparing with Fourier spectral analysis and Wavelet packet transform, HHT is mainly based on the instantaneous frequency calculation generated by Hilbert transform of the analyzed signals which are a series of IMFs decomposed by EMD. For any signal s(t), its Hilbert transform H[s(t)] is defined as Then, it can constitute an analytic signal z(t) whose amplitude and instantaneous frequency can be expressed as where, . Finally, the Hilbert spectrum of signal energy distribution in time and frequency is denoted as where Re denotes the real part of the analytic signal. H(ω, t) reflects the changing law of signal amplitude with time and frequency in the whole frequency band. In this paper, we selected the top 10 IMFs of signal to perform HHT, and any intrinsic energy feature is represented by E k :

Feature Selection of the Multi-Sensor Signals
Not all of the extracted features are perfectly related to the RUL prediction. On the contrary, some redundant or irrelevant features might reduce the accuracy of the prediction model, thereby decreasing the accuracy and efficiency of online prediction system. Therefore, the optimal feature selection of the multi-sensor signals is very important to improve the performance of the prediction system.
In the paper, the LightGBM method is used to select the optimal features. The literature has confirmed that LightGBM is on the top in machine learning in terms of computational accuracy and running speed, which is especially suitable for the processing of big data [31,32]. LightGBM proposed by Ke et al. [33] is a highly efficient gradient boosting decision tree (GBDT), including two algorithms: gradient-based one-side sampling (GOSS) and exclusive feature bundling (EFB). GOSS is used to split the optimal node in order to acquire a more accurate information gain estimation, while EFB is employed to bundle exclusive features into dense features in order to reduce the size of the training data. Then LightGBM is trained in sequence to fit the negative gradient of loss function in each iteration. According to the weighted combination scheme, LightGBM model F M (x) can be obtained as where m is the iteration number, M is the maximum iteration number, h m (x) represents the base decision tree, x is the data sample, and , y is the class label, and x and y combine a training set (x 1 , y 1 ), (x 2 , y 2 ), · · · , (x n , y n ) ). The extracted features are input into the LightGBM model for calculation, and the nonlinear relationship between the sequence features (the extracted features) and the class labels (tool wear) is mined. By calling the optimizing function in the encapsulated Sklearn class, the important features will be found in each iteration and given variable importance measures (VIM). The optimal features are selected from these important features with high VIM scores. In the LightGBM model, VIM usually is expressed using the Gini index (GI) from the random forest (RF) algorithm. Given that there are M features X 1 , X 2 , · · · , X c , GI score, VIM represents the average change of node splitting impurity of the jth feature in all RF trees. The formula of GI is where K is the number of categories in the sample data set, and p mk is the probability that the sample belongs to category k at node m. The importance of feature X j at node m, this is, GI change before and after node m branching, is where GI l and GI r indicate GI of two new nodes after branching, respectively. If the node where feature X j appears in the decision tree i is in the set M, then the importance of feature X j in the ith tree is Given that there are n trees in RF, then Finally, perform a normalization process on all the obtained importance scores to acquire the VIM score of feature X j

Feature Fusion and Tool Wear Prediction Model Based on Back Propagation Neural Network Optimized by Improved Artificial Bee Colony (Iabc-Bpnn) Algorithm
Once the optimal features are selected, IABC-BPNN optimization algorithm can be used for feature fusion, and the tool wear prediction model can be trained to obtain tool wear level as a health index of machining tools.

Improved Artificial Bee Colony (IABC) Algorithm
ABC algorithm was proposed and improved by Karaboga et al. [34][35][36], which is a swarm intelligence algorithm and simulates the foraging behaviors of honey bee swarm. The algorithm describes the foraging process of searching the food sources and sharing the information about the found sources among the three groups of bees, including the employed bees, the onlookers and the scouts. The employed bees are connected with the food sources being employed currently, explore the neighborhood through their memory and simultaneously share the information of their food sources with others; the onlookers choose food sources by the information from the employed bees; the role of the scouts is to randomly search a new food source. There is a mutual transformation relationship among them. The employed bees may be transformed into a few onlookers or scouts if they abandon their food sources to search other food sources. The onlookers may be transformed into a few scouts or employed bees if they abandon their food sources and follow other bees to search new ones, or share the information of their food sources with others. The scouts may be transformed into a few employed bees or onlookers if they find new food sources. In the algorithm, the position of a food source represents a candidate solution to a given problem in the search space, and its nectar amount corresponds to the fitness value. The number of the employed bees and the onlookers represents the number of solutions in the population, each of which accounts for half of the population.
Given the number of food sources is SN, the initial population can be represented as NP = {X 1 , X 2 , . . . , X i , . . . , X SN } each food source (candidate solution) is represented by . , x iN ) in a N-dimensional search space. In initial stage, the population P is generated by Equation (23) where L j and U j are the lower and upper bounds of jth dimension of the search space, respectively. In the employed bee stage, each employed bee X i will search in its neighborhood to find a new food source (a candidate solution), new_X i , according to Equation (24). Through greedy selection, if the fitness of new_X i is better than X i , then the new one replaces the previous one. When the times of the employed bee search exceeds the threshold limit, the food source is abandoned and a new one is randomly generated. new_X where i denotes the current solution, k is a random solution but k i, and i, k ∈ {1, 2, . . . , SN}, j represents the jth element of the corresponding solution, and R is a uniform random number in the rang [−1, 1].
In the onlooker bee stage, the onlooker bee will select a food source according to Equation (25), and this is a way of sharing information between the employed bees and the onlookers. The new solution is updated and selected as in the employed bee stage by Equation (24) and greedy rule.
where P i and f i denote the following probability and the fitness of the ith solution, respectively, M is the number of the onlookers in the population, and i ∈ {1, 2, . . . , M}.
In the scout bee stage, a scout bee searches for new solutions by Equation (23) in the case of the limit is exceeded. The pseudo code of the original ABC algorithm can be described in Algorithm 1.
It is well known that exploration and exploitation are very important for the population-based optimization algorithms, such as GA [37], WOA [38,39] and SSA [40]. In these algorithms, the exploration represents the ability of the algorithm to find the global optimum in the solution space, while the exploitation represents the ability of the algorithm to find a better solution using the previous good solution. In practice, whether an algorithm has good optimization performance mainly depends on whether it can balance the exploration and exploitation abilities well. In the ABC algorithm, the generation of a new candidate solution is based on the change in position (close to or far away) between the current solution and another randomly selected solution in the population by Equation (24). This randomness leads to the new candidate solution is not necessarily better than the previous one. In addition, R is a uniform random number, which also greatly increases the random exploration ability of Equation (24). In summary, the solution search equation described in Equation (24) is more exploratory but insufficiently exploitable. In order to improve the exploitation ability of ABC in the process of optimization, many scholars have rewritten Equation (24) in the form of Equation (26), by adding a term called global-best term close to or far away the global optimal solution (X g ) [41][42][43].
where β is a uniform number ranged in [0,C], where C is a nonnegative constant. By adjusting the value of β, the exploration and exploitation ability of the algorithm can be well balanced, but the global optimization ability can also be reduced in a certain degree.
In this paper, we improve ABC algorithm by replacing Equation (24) with Equation (27), which combines two search strategies form Equation (24) and Equation (26). In the early stage of the iteration, the algorithm is mainly based on the exploration efficiency, which can quickly find the global optimization, and also has a certain local exploitation ability. In the later stage of the iteration, the algorithm is mainly based on the exploitation ability, which can quickly jump out of the local optimization, and also has a certain global exploration efficiency. The solution search equation is described as where cr= 0.3, α is the variable step coefficient, b is an adjustment parameter,iter denotes the number of current iteration,maxiter is the maximum number of iterations, and round() is the rounding function.
As the optimization approaches to the optimal value, the step size in this process should be gradually reduced to decrease the turbulence around the optimal value. The relationship between the variable step coefficient α and the number of iterations is shown in Figure 3. In the process of iteration, the change of α is controlled by adjusting the value of b, which affects the time when the global-best term participates in the iteration. The smaller the value of b, the larger the value range of α is, the earlier the global best term participates in the iteration, and vice versa. The pseudo code of the IABC algorithm is described in Algorithm 2.

Back Propagation Neural Network (BPNN)
BPNN is a multi-layer feed-forward neural network using an error back propagation algorithm, which contains an input layer, an output layer, and one or more hidden layers. As a result of its simple structure and being easy to realize, it is widely applied in various aspects, such as prediction and pattern recognition [44][45][46].
The structure of BPNN is shown in the Figure 4, where j ∈ {1, 2, . . . , M},i ∈ 1, 2, . . . , q , k ∈ {1, 2, . . . , L} represent the number of input layer neurons, hidden layer neurons and output layer neurons, respectively; x 1 , x 2 , . . . , x M , y 1 , y 2 , . . . , y L and t k (k = 1, 2, . . . , L) denote the actual input and output, and target output of network, respectively; e k (k = 1, 2, . . . , L) is the output error of the network; w ij and w ki denote the connection weight of between input layer and hidden layer and between hidden layer and output layer, respectively. The input and output expressions of the hidden layer are expressed, respectively, as where HI i and HO i denote the input and output of the hidden layer neuron j, and b i is the corresponding threshold of the neuron j.
The input and output expressions of the output layer are expressed, respectively, as where YI k and YO k denote the input and output of the output layer neuron k, and b k is the corresponding threshold of the neuron k.
The signal is processed step by step from the input layer to the hidden layer until to the output layer, and each layer parameters only effect the next one. If the result of output layer does not meet to anticipant result, the back propagation will be switched by the network. According to the prediction error, the weight and threshold values can be adjusted continuously to make the outcome close to the expected output. The prediction error is usually expressed by minimizing the mean square error (MSE) of the output layer, as shown in Equation (33)

BPNN Optimized by Improved Artificial Bee Colony Algorithm (IABC-BPNN)
The BPNN optimized by improved artificial bee colony algorithm (IABC-BPNN) takes the selected features as the input of BPNN, and the weights and thresholds of neurons as a bee individual for ABC algorithm as shown in Figure 5, in which the thresholds and weights of BPNN are optimized by IABC, thus, avoiding falling into local optimization early, and improving the optimization ability of the algorithm.

The Rul Prediction of Machining Tools Base on A Polynomial Curve Fitting
A polynomial curve fitting method is used to fit the tool wear data from the output of IABC-BPNN. The polynomial function is described as follows: where x i is the number of ith machining, l j is the coefficient of the least squares polynomial by computing, and n is a polynomial factorial. Next, referring to the wear standard of machining tools, the max machining times MT of the machining tool can be deduced by regression analysis of the curve. The RUL of machining tools may be obtained as follows: where MT i is the machining times of the ith.

Experimental Equipment and Data Description
This study uses a CNC milling machine to perform the milling experiment of the tool, and multi-sensors to collect the data generated during the cutting process to verity the effectiveness of the RUL prediction system of machining tools proposed. Figure 6 shows the experimental equipment and connection diagram for measuring tool wear and predicting the RUL. The experimental equipment includes a CNC vertical machining center (G-VM8L, Spindle speed 50-8000 rpm/min, Cutting feed speed X, Y, Z: 5-6000 mm/min), two types of sensors (vibration sensor M69 and force sensor Kistler 9257 A), as well as their supporting charge amplifiers, data acquisitions card and software measuring system, a portable digital microscope (MSUSB401), a notebook, a workpiece (material:C45E, size: 250 mm × 100 mm × 70 mm) and five milling tools (two-edge micro-grain tungsten steel milling cutter SJY H550, type:D6 × 15 × 50 × 2F, HRC 55). In the G-VM8L CNC center, the workpiece (C45E) is dry milled using a two-edge micro-grain tungsten steel milling cutter with a diameter of 6 mm. The spindle speed is 1200 rpm/min, the milling depth in the z-axis direction is 0.2 mm, the feed rate is 200 mm/min, and the machining length in the feed direction (y-axis) is 70 mm. Each time the machining in the feed direction is completed, the cutter returns to the starting point and is taken a photo with the portable digital microscope MSUSB401 after a pause, and then the next machining operation repeats. The microscope and its own application software can acquire and store images, and measure and record the tool wear after each cut during dry milling operations [23]. In this experiment, the cutter is used to machine the groove of the workpiece, each cutter is machined 300 times or cuts and the total cutting length is 70 mm. There are five milling tools called Ci (i = 1, 2, 3, 4, 5), among which C1 and C3 are measured by the microscope and used as the offline tool wear data to train the prediction model, C2 and C4 are used as the offline data to test the prediction model, and C5 is used as the online data to predict the machining tool RUL.
In the milling process, the signals are simultaneously acquired at 1 KHz sampling frequency by the wireless three-axis Accelerometer M69 (vibration sensor) and Dynamometer Kistler 9257A (force sensor), which are respectively installed on the workpiece, and the between of the workpiece and the table. Specifically, Kistler 9257A is fixed on the worktable, and the workpiece is mounted on the clamping table of Kistler 9257A, which is used to measure the force signal of the workpiece during processing; M69 is fixed on the non-milled surface of the workpiece to measure the vibration signal of the workpiece during processing. M69 collects the cutting vibrations in three directions, whose coordinate system is consistent with the CNC's as in Figure 6. The cutting vibration signals are sent to a computer after being conditioned by the supporting wireless base station, and displayed in real time by the software MKServer installed on the computer. Simultaneously, Kistler 9257A, as well as its charge amplifier and data acquisition card, collects the cutting forces in three directions, which are also sent to the computer and displayed in real time by the software DynoWare.

Results and Analysis
As shown in Figures 7 and 8, six channel signals from the two sensors for one machining process are sampled and preprocessed by CEEMD. Their waveforms in different directions reflect the changes in the force or displacement of the cutter at a certain moment during the machining process, and are also different manifestations of tool wear, which are conducive to more accurately extract features for RUL prediction. The reconstructed signals (marked as CEEMD in Figures 7 and 8) almost coincide with the raw signals, which indicates the CEEMD method can effectively decompose the raw signals.
In this paper, TDFs, FDFs and TFDFs are extracted by different analysis methods and every channel of sensors can get 10 TDFs, 7 FDFs and top 10 IMFs' TFDFs, and, finally, a total of 162 features are acquired. These extracted features can reflect the change trend of tool wear during milling. Taking the data of C1 as an example, the corresponding TFDFs of the force sensor and vibration sensor change trends in Z-direction at the 50th, 150th and 250th cut are shown in Figures 9 and 10. With the increase of tool wear, we can find that the maximal amplitude in Z-direction of the two sensors in the C1 are enhanced, while the change trend of dominant frequency is not obvious in Z-direction of the vibration sensor, and gradually decreased in the force sensor.
To reduce the dimension of the features and select the optimal features, the lightGBM method is used to carry out the correlation calculation of features and acquire the correlation coefficient scores between different features. High-score features can better reveal the relationship between features and the RUL of the machining tools. From Figure 11, it is found that not all features are suitable to predict the RUL of the machining tools, only those with high VIM scores are selected as the optimal features. For example, 16 optimal features scored greater than 0.6 are selected in total, and they occur in T max , T rms , T vr , T sd , T p2p , F mv , F vr , E 5 , E 7 and E 8 , among which 11 features from the force sensor and six features from the vibration sensor. This shows that the tool wear correlation is related to those features, and, meanwhile, multi-sensor signals can play a very complementary role in predicting the RUL of the machining tools.     Next, the selected 16 features are used as the input of the IABC-BPNN model, and the tool wear as the output. According to the number of the selected features, the model structure of the network can be determined as 16-33-1, which means an input layer with 16 neurons, a hidden layer with 33 neurons and an output layer with 1 neuron. In the IABC-BPNN model, the size of bee colony is 50, the number of employed bees is 25, the dimension of individuals is 595 (528 + 33 + 33 + 1), the learning rate is 0.1, the training target is 0.01, and the max epoch is 500. The IABC-BPNN automatically optimizes the network weights and thresholds, and uses a backward feedback mechanism to train the neural network until the minimal error appears.
After the IABC-BPNN model parameters are determined, the different data with tool wear characteristics in the offline data set from C1 are selected as the training data (10 groups) and the testing data (four groups) from C1, respectively. After the model trained 10 times with 10 groups of training data, the measured values, the predicted values, the predicted values' standard deviation (STD), error percentage and confidence interval are shown in Table 2, and the results for 10 groups training are depicted Figure 12. Comparing Table 2 and Figure 12, we can easily find that the measured values and predicted values all are in the confidence interval, which are consistent with the results of each training in the box plot, especially the change ranges of the predicted values obtained are very small, indicating the stability of the model is very good. The four groups of testing data are used to test the built model. As shown in Figure 13, the 45 • line is a zero-error line, and the predicted value is within 5% error percentage of the 45 • line, indicating that the built model is reliable.   Once the IABC-BPNN model is confirmed, the tool wear values of C2 and C4 can be predicted. Figure 14 describes the relationship between the wear of the 4 tools and the number of machining times, where the wear values of C1 and C3 are measured by the portable digital microscope MSUSB401, while those of C2 and C4 can be predicted by the IABC-BPNN model. According to the PCF method to analyze the data of Figure 14, a polynomial curve of the tool wear can be obtained and descripted in Figure 14. The max tool wear is set to 0.3 mm, and the max machining times MT of the machining tools is computed using regression analysis of the curve, this is, there are 347, 329, 330 and 326 cuts for C1, C2, C3 and C4 respectively. In Figure 15, MT of each tool is represented by the number of corresponding machining times when the polynomial curve of each tool intersects the max tool wear line.
Then, the established IABC-BPNN can predict the machining tool RUL using the PCF method. After de-noising, feature extraction and selecting the optimal known features, the online data is input into the IABC-BPNN model to predict the tool wear in machining process online. According to Equation (35), the machining tool RUL can be acquired. As shown in Figure 16, the curve of C5 is the tool wear of prediction online, and the residual rate of RUL is expressed by ( RUL i MT ) × 100%, so it is easy to find that the RUL corresponding to points E, F and G is 0%, 10% and 40% of MT, respectively. In particular, Point E corresponds to the max wear (0.3 mm) of the machining tool, its corresponding machining times MT = 330, and its corresponding residual rate of RUL is 0%, this is, the RUL of machining tool is zero; point F corresponding to the tool wear is 0.21 mm; its machining times is 297 cuts; its residual rate of RUL is 10%; and the RUL of machining tool is 33 cuts (330 × 10%); point G corresponding to the tool wear is 0.095 mm; its machining times is 198 cuts; its residual rate of RUL is 40%; and the RUL of machining tool is 132 cuts (330 × 40%).   Finally, several compared methods such as NFIS, radial basis function networks (RBFN) and BPNN, are used to predict the tool wear using the same data set (there are 9600 data for C1 and C3). The training target of all these methods is 0.01, the max epoch is 500 and their other parameters are shown in Table 3. Meanwhile, the prediction performances of these methods are measured by such statistical indices as root mean square error (RMSE), mean absolute percentage error (MAPE) and the absolute fraction of the variance (R2). Table 4 gives the results of the statistical performances for the IABC-BPNN and compared methods, from which it can be observed that the proposed method performs better than the others.

Conclusions and Outlook
In this paper, a novel multi-sensor data fusion for online RUL prediction system of machining tools is proposed. This system integrates multi-sensor signals, signal preprocess, feature extraction, feature selection, IABC-BPNN model-based feature fusion and the RUL prediction, using the PCF method. First, multi-sensor signals from the vibration and force sensors are collected and de-noised by the CEEMD. Then, the multidimensional features are extracted in time domain, frequency domain and time-frequency domain by such methods as statistical analysis, power spectrum density analysis and HHT. Furthermore, the LightGBM method is used to select the optimal features that are important to improve the performance of the prediction system. Next, the IABC-BPNN model-based feature fusion trained by selected features is established and used to implement the tool wear prediction. Finally, an experimental example is implemented to verify the proposed system. The experimental study shows that the proposed method can precisely predict the machining tool RUL which verifies the feasibility of the proposed method in practical application. Meanwhile, compared with its rivals, the IABC-BPNN model shows better prediction and performance.
It should be pointed out that the proposed IABC-BPNN model is only used to predict the tool wear and RUL under a single working condition. In the future, the proposed method will be applied to predict the tool wear and RUL under multi-working conditions and different types of machining tools, and the feasibility of the proposed method is further improved by the automatic optimization of parameters. In addition, the health monitoring of machining tools should be combined with the RUL prediction.