1. Introduction
Intelligent cutting is of great significance to the intelligence of the manufacturing industry as a whole [1]. Intelligent monitoring of machining is a key technical link in realizing intelligent manufacturing. Intelligent condition monitoring provides real-time perception of the equipment state during processing, guides the adjustment of processing parameters, helps optimize product quality, and gives a timely warning when the remaining equipment life is insufficient or a failure occurs. In mechanical processing, hole processing accounts for one-third of the total, and deep-hole processing accounts for more than half of hole processing [2]. Because of the particular structure of deep-hole parts, a boring bar with a large aspect ratio is generally needed for processing. Such a boring bar generally has low rigidity and can easily vibrate violently or even chatter during processing. In the cutting process, vibration and chatter degrade the quality of the machined surface, accelerate tool wear, and can even leave chatter marks on the workpiece and damage the workpiece and tools [3]. In deep-hole machining, it is generally difficult to observe the vibration state of the boring bar directly, so condition monitoring combined with deep learning can play a key role in vibration monitoring. Timely detection of changes in the vibration state of the boring bar helps guide the adjustment of processing parameters and improves the quality and efficiency of deep-hole processing.
Artificial intelligence is an important tool for realizing intelligent cutting, and deep learning is an important part of artificial intelligence. Deep learning is derived from machine learning: multi-layer neural networks adaptively extract and learn features, enabling the recognition, classification, and regression of data or images. In recent years, with advances in computer technology, deep learning has developed rapidly and is widely used in machining [4]. To make full use of this adaptive feature extraction, some scholars feed the raw sensor signals directly into the deep learning model. Li et al. [5] input the vibration and sound signals from the boring process into a long short-term memory (LSTM) network to recognize tool bluntness and tool breakage in deep-hole boring. Liu et al. [6] used the PHM2010 dataset as training data, used a parallel residual network to adaptively extract the internal features of the multi-dimensional sensor signals, used a stacked bidirectional long short-term memory network to extract the time-series features of the signals, and then established an effective mapping to the tool wear value, yielding a deep learning model with high accuracy. He et al. [7] constructed a long short-term memory neural network and input the force, vibration, and acoustic emission signals from the milling process into it. An effective mapping between the data and the tool wear values was established, which demonstrated the effectiveness and feasibility of the network. Xu et al. [8] designed a deep neural network that combines one-dimensional dilated convolution kernels with residual blocks and used this model to predict the wear of tap tools.
Although deep learning can adaptively extract data features, in many cases the features extracted by the model itself do not fully reflect the characteristics of the data. Therefore, more researchers use data preprocessed by mathematical methods as the training input of the deep learning model. Zhou et al. [9] decomposed the spindle torque signal by empirical mode decomposition (EMD) and input the feature matrix composed of process parameters and workpiece information into the LSTM model to predict the tool life. Chen et al. [10] combined a convolutional neural network (CNN) with a deep bidirectional gated recurrent unit neural network, collected the milling acceleration signal, input the signal into the neural network after wavelet threshold denoising, and introduced the attention mechanism to adaptively perceive the network weights associated with the wear state, so as to predict the tool wear state accurately and in real time. Li et al. [11] collected the spindle current signal of the machine tool, used compressed sensing to compress the frequency domain characteristics of the signal, and input the data into a stacked sparse auto-encoder network for training, which successfully recognized the wear state of the milling cutter.
Because deep learning is widely used in image recognition, some scholars also convert data into images as the model input, in order to make better use of the ability of convolution kernels to extract image features. Ren et al. [12] proposed the spectral principal energy vector method to combine eigenvalues into a 64 × 64 feature map, designed an 8-layer CNN to predict the bearing life, and used a smoothing technique to address discontinuous prediction results. Wen et al. [13] improved the LeNet-5 model, converted the original signals into two-dimensional images as the model input, and verified the fault identification accuracy of the model on a motor-bearing dataset, a self-priming centrifugal pump dataset, and an axial piston hydraulic pump dataset. Liu et al. [14] proposed a milling chatter monitoring method based on unlabeled dynamic signals. It uses an unsupervised clustering algorithm, so it does not need labeled data, and it is reported to be applicable across processing parameters and processing environments, to be stable, and to identify chatter effectively. Pagani et al. [15] processed RGB and HSV channel images of chips as input data to predict tool wear, and the method was used in stable processing scenarios.
The literature above shows that data processing and deep learning have been widely used in the field of cutting. Scholars in many countries have conducted in-depth research on identifying and monitoring tool life and equipment status. However, there are few research results on real-time intelligent monitoring of boring bar vibration by deep learning. To monitor the vibration state of the boring bar accurately and in real time during machining, this paper proposes a monitoring method based on the smoothed pseudo-Wigner–Ville distribution (SPWVD) and the Shuffle-BiLSTM network, combining time–frequency analysis with the image recognition ability of deep learning. SPWVD is used to extract time–frequency domain features from the collected signals as the model input, which addresses the problem that the features extracted from the signal by the deep learning model cannot fully reflect the data characteristics. The shuffle unit is used in combination with BiLSTM: the BiLSTM structure enhances the memory ability of the network and improves the recognition accuracy, while the lightweight, fast shuffle unit shortens the training time of the model and improves real-time monitoring. Experimental analysis shows that the model runs fast and has a high recognition accuracy, and it has good research and application value.
3. Proposed Shuffle-BiLSTM Model
Based on the functional characteristics of group convolution and channel shuffle, combined with the BiLSTM structure, this paper constructed the Shuffle-BiLSTM network. Its network architecture is shown in Figure 1.
The network model is mainly composed of three parts: the shuffle unit module, the BiLSTM module, and the vibration state monitoring module. First, the 2D time–frequency spectra of the three-direction acceleration signal and the sound pressure signal extracted by SPWVD were scaled to 0–255 grayscale images. These images formed a 256 × 256 × 3 matrix, which fused the time–frequency domain features of the two sensor signals. Next, after the initial convolution, pooling, and other operations, the data entered the shuffle unit. The shuffle unit is mainly composed of group convolution and channel shuffle, plus the necessary batch normalization, pooling, and leaky ReLU layers, and it imitates the residual network by adding a shortcut connection, which avoids the gradient explosion problem in the training process to a certain extent. Because the group convolution operation reduces the accuracy of the network, multiple shuffle units were stacked to increase the network depth. Then, a BiLSTM layer was added after the shuffle units to further extract the time-series features of the data and adaptively filter the implicit features corresponding to each class. Finally, the learned features were fed to the classification layer, composed of a fully connected layer and a SoftMax classifier, to identify and output the vibration state of the boring bar.
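The image-construction step can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the min-max scaling rule, the helper names `to_grayscale` and `stack_channels`, and the random stand-in maps are all assumptions, and the text does not state how the individual spectra are assigned to the three channels.

```python
import numpy as np

def to_grayscale(tf_map):
    """Scale a real-valued time-frequency map to uint8 values in 0..255.

    Min-max scaling is an assumed rule; the paper only states the target range.
    """
    tf_map = np.asarray(tf_map, dtype=np.float64)
    span = tf_map.max() - tf_map.min()
    if span == 0:
        # A constant map carries no contrast; map it to all zeros.
        return np.zeros(tf_map.shape, dtype=np.uint8)
    scaled = (tf_map - tf_map.min()) / span
    return np.round(scaled * 255).astype(np.uint8)

def stack_channels(maps):
    """Stack equally sized grayscale maps along a trailing channel axis."""
    return np.stack([to_grayscale(m) for m in maps], axis=-1)

# Random stand-ins for three 256 x 256 time-frequency maps (hypothetical data).
rng = np.random.default_rng(0)
demo_maps = [rng.random((256, 256)) for _ in range(3)]
image = stack_channels(demo_maps)  # shape (256, 256, 3), dtype uint8
```

Scaling each map independently keeps a low-energy signal from being washed out by a high-energy one, at the cost of discarding the relative amplitude between channels.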
3.1. Shuffle Unit
The shuffle structure was originally proposed as a lightweight network structure by Xiangyu Zhang et al. [19]. By introducing group convolution and channel shuffle, this structure greatly reduced the number of parameters of the deep learning model and effectively improved the calculation speed and accuracy of the model.
3.1.1. Group Convolution
As described above, the convolution kernel in an ordinary convolution layer passes all the information of a feature layer to the next feature layer through the convolution operation, as shown in Figure 2a. The number of parameters in this convolution operation is:

$$P_{1} = K_{h} \times K_{w} \times D_{0} \times D_{1} \tag{17}$$

where $D_0$ is the number of layers of the input matrix, that is, the depth of the input layer, $D_1$ is the depth of the output layer, and $K_h \times K_w$ is the size of the convolution kernel.

Group convolution differs from the ordinary convolution layer. It divides the input layer into groups and then uses a separate group convolution for each. Each group convolution kernel is convolved only with its corresponding part of the input layer, as shown in Figure 2b. The number of parameters of group convolution is:

$$P_{2} = K_{h} \times K_{w} \times \frac{D_{0}}{G} \times \frac{D_{1}}{G} \times G \tag{18}$$

where $G$ is the number of groups. Comparing Formula (17) with Formula (18) gives:

$$P_{2} = \frac{P_{1}}{G}$$

Group convolution can therefore greatly reduce the number of parameters and improve the calculation speed of the model.
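The saving from group convolution can be checked numerically. The sketch below counts weights only (bias terms are ignored), and the function names, the 3 × 3 kernel, and the channel counts are illustrative assumptions rather than values taken from the paper.

```python
def conv_params(kernel_h, kernel_w, in_depth, out_depth):
    """Weight count of an ordinary convolution layer (bias ignored)."""
    return kernel_h * kernel_w * in_depth * out_depth

def group_conv_params(kernel_h, kernel_w, in_depth, out_depth, groups):
    """Weight count of a group convolution layer (bias ignored)."""
    if in_depth % groups or out_depth % groups:
        raise ValueError("input and output depth must be divisible by groups")
    # Each group maps in_depth/groups input channels to out_depth/groups outputs.
    per_group = kernel_h * kernel_w * (in_depth // groups) * (out_depth // groups)
    return per_group * groups

# Illustrative sizes only: 3 x 3 kernels, 240 input and output channels, 3 groups.
standard = conv_params(3, 3, 240, 240)                  # 518400
grouped = group_conv_params(3, 3, 240, 240, groups=3)   # 172800
ratio = standard // grouped                             # 3, the number of groups
```

With these sizes the grouped layer holds one-third of the weights of the ordinary layer, which matches a reduction by a factor equal to the number of groups.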
3.1.2. Channel Shuffle
Although group convolution significantly reduces the computational cost, the output of each group then comes from only a part of the input layer, as shown in Figure 3. The groups are isolated from each other and no information flows between them, which reduces the learning ability of the model.
To solve this problem of group convolution, the channel shuffle method was introduced. After channel shuffle was applied to the output of the group convolution, the channels of each group were further divided according to the total number of groups and then mixed with each other, so that each large group contained features from the other groups and could serve as the input layer of the next convolution operation. The specific implementation is shown in Figure 4. Let the number of groups of the feature layer be G and the total depth be D. The channels were arranged into a matrix of shape (G, D/G), which was then transposed and flattened to achieve channel shuffling. The diagram shows that the channels after shuffling avoid the separation and isolation between groups described above, to a certain extent.
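The reshape, transpose, and flatten sequence can be sketched in NumPy as follows. The function name `channel_shuffle` and the channel-first layout (channels, height, width) are assumptions made for this illustration.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Shuffle the channels of a (channels, height, width) feature map.

    The channel axis is reshaped to (groups, channels // groups), the two
    axes are swapped, and the result is flattened back to one channel axis.
    """
    channels, height, width = x.shape
    if channels % groups:
        raise ValueError("channel count must be divisible by groups")
    x = x.reshape(groups, channels // groups, height, width)
    x = x.transpose(1, 0, 2, 3)  # swap the group axis and the within-group axis
    return x.reshape(channels, height, width)

# Six channels labeled 0..5 in three groups: [0, 1], [2, 3], [4, 5].
features = np.arange(6).reshape(6, 1, 1)
shuffled = channel_shuffle(features, groups=3)
order = shuffled[:, 0, 0].tolist()  # [0, 2, 4, 1, 3, 5]
```

After the shuffle, each run of three consecutive channels contains one channel from every original group, so a following group convolution sees information from all groups.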
3.2. Bidirectional Long Short-Term Memory Network
Long short-term memory (LSTM) recognizes, stores, and forgets features through the ‘gate’ mechanism. LSTM can use this memory-like feature to filter and save the implicit features of the input data, identify and save the features associated with the current state, discard redundant features, and repeatedly update the memory as the data are continuously input.
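LSTM has three gates, namely the input gate, the forget gate, and the output gate. The forget gate determines which memory information needs to be modified and passed to the next step. The input gate is responsible for integrating the previous memory and the new input, and the output gate passes the filtered memory information on. The mathematical expression is as follows [20]:

$$
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\
\tilde{c}_t &= \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
$$

where $i_t$, $f_t$, $o_t$, and $c_t$, respectively, represent the input gate, forget gate, output gate, and cell state at time $t$. $W_i$, $W_f$, $W_o$, and $W_c$ are the weights of each gate. $b_i$, $b_f$, $b_o$, and $b_c$ are the biases of the respective gates. $\sigma$ stands for the sigmoid activation function, $\odot$ denotes element-wise multiplication, $x_t$ is the input information at time $t$, and $h_{t-1}$ and $h_t$ are the hidden layer information at times $t-1$ and $t$, respectively. $c_{t-1}$ is the cell state information at time $t-1$, and $\tilde{c}_t$ is the candidate cell state.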
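One time step of an LSTM cell with these three gates can be sketched in NumPy. This follows the common formulation in which each gate has a single weight matrix acting on the concatenated previous hidden state and current input; the function names and the dictionary layout of the parameters are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """Advance an LSTM cell by one time step.

    W maps each of "f", "i", "o", "c" to a (hidden, hidden + input) matrix,
    and b maps the same keys to (hidden,) bias vectors.
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # updated cell state
    h_t = o_t * np.tanh(c_t)                 # updated hidden state
    return h_t, c_t

# With all-zero parameters every gate equals 0.5 and the candidate state is 0,
# so the cell state is simply halved: c_t = 0.5 * c_prev.
hidden, n_in = 2, 3
W = {k: np.zeros((hidden, hidden + n_in)) for k in "fioc"}
b = {k: np.zeros(hidden) for k in "fioc"}
h_t, c_t = lstm_step(np.ones(n_in), np.zeros(hidden), np.ones(hidden), W, b)
```

The zero-parameter case is only a sanity check; trained weights let each gate open or close depending on the input and the previous hidden state.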
Because feature transfer in an LSTM is directional, a single transfer direction may not fully extract the implicit features in the data. Therefore, bidirectional long short-term memory (BiLSTM) was introduced: a forward LSTM and a backward LSTM extract the feature sequence in both directions, and their outputs are combined to better capture the sequence features and achieve accurate classification of the state.
The overall structure of BiLSTM is shown in Figure 5.
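The bidirectional pass can be sketched by running one LSTM over the sequence and a second, independently parameterized LSTM over the reversed sequence. Combining the two outputs by concatenation at each time step is an assumption here (it is the common choice), and the helper names, sizes, and random parameters are illustrative only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_pass(seq, W, b, hidden):
    """Run an LSTM over seq (steps, n_in) and return all hidden states."""
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x_t in seq:
        z = np.concatenate([h, x_t])
        f, i, o = (sigmoid(W[k] @ z + b[k]) for k in "fio")
        c = f * c + i * np.tanh(W["c"] @ z + b["c"])
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def bilstm(seq, fwd, bwd, hidden):
    """Concatenate forward and backward hidden states at each time step."""
    forward = lstm_pass(seq, *fwd, hidden)
    # Process the reversed sequence, then flip back to align the time steps.
    backward = lstm_pass(seq[::-1], *bwd, hidden)[::-1]
    return np.concatenate([forward, backward], axis=1)

def random_params(rng, hidden, n_in):
    """Small random weights and zero biases (placeholder values)."""
    W = {k: rng.normal(scale=0.1, size=(hidden, hidden + n_in)) for k in "fioc"}
    b = {k: np.zeros(hidden) for k in "fioc"}
    return W, b

rng = np.random.default_rng(0)
hidden, n_in, steps = 4, 3, 5
seq = rng.normal(size=(steps, n_in))
fwd = random_params(rng, hidden, n_in)
bwd = random_params(rng, hidden, n_in)
features = bilstm(seq, fwd, bwd, hidden)  # shape (steps, 2 * hidden)
```

Each row of `features` holds a forward state that has seen the sequence up to that step and a backward state that has seen the sequence from that step to the end.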
3.3. Vibration State Monitoring Module
The vibration state monitoring module outputs the vibration state of the monitored boring bar. It is mainly composed of a fully connected layer and a classifier, as shown in Figure 1. The fully connected layer performed weighted regression on the high-level features learned by the shuffle unit module and the BiLSTM module, and the classifier then identified the three vibration states of the boring bar and output the corresponding labels.
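A fully connected layer followed by a SoftMax classifier over three classes can be sketched as follows. The label strings, the weights, and the two-element feature vector are hypothetical placeholders; the real layer sizes and state names are not given in this section.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1D vector of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def classify(features, weights, bias, labels):
    """Apply a fully connected layer, then pick the most probable label."""
    logits = weights @ features + bias
    probs = softmax(logits)
    return labels[int(np.argmax(probs))], probs

# Hypothetical label names for the three vibration states.
labels = ["state_0", "state_1", "state_2"]

# Placeholder parameters chosen so the logits are [1, 2, -3].
features = np.array([1.0, 2.0])
weights = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [-1.0, -1.0]])
bias = np.zeros(3)

label, probs = classify(features, weights, bias, labels)  # label is "state_1"
```

Subtracting the maximum logit before exponentiating leaves the probabilities unchanged and avoids overflow for large logits.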
5. Conclusions and Future Works
In deep-hole boring, the vibration state of the boring bar is difficult to monitor. To solve this problem, this study proposed an intelligent monitoring technology for the boring bar's vibration state based on data acquisition, signal processing, and deep learning. When the proposed technology is applied to a boring monitoring system, the vibration state of the boring bar can be perceived in real time, and operators can adjust the processing parameters according to the perception results to improve the efficiency and accuracy of processing. The effectiveness and superiority of the model were verified through a large number of experiments and comparison with several conventional deep models. The main conclusions are as follows:
(1) The quadratic time–frequency representation method with a kernel function (SPWVD) was used to process the experimental data. The original data were transformed into a two-dimensional time–frequency spectrum, which was then recognized by the deep network model. Deeper features were extracted, and the effective classification of the vibration state was realized.
(2) Group convolution was used to extract features from parts of the input layer, and the output channels of the group convolution layer were rearranged by channel shuffling. This reduced the amount of calculation, shortened the calculation time of the model, and improved real-time monitoring, while avoiding the loss of learning ability caused by isolated groups. BiLSTM was used to extract and screen the memory characteristics of the data, which enhanced the memory ability of the network and realized the accurate classification of the boring bar's vibration state.
(3) Cutting experiments covering different vibration states of the boring bar were designed, and 192 groups of cutting experiments were carried out with varying experimental parameters. The vibration and sound pressure data were collected and used as the original data for vibration state perception. The deep network model was trained and tested, and the model used in this paper reached a test classification accuracy of 91.2% with only 1.9 M parameters. A variety of typical deep network models and a single-signal input model were also tested for performance comparison. The test results showed the advantages of the models and methods used in this paper.
This study provides a practical option for civil and military enterprises involved in deep-hole boring. To better guide industrial production, future research can explore the following directions: accurately identifying the vibration state of the boring bar under variable working conditions, and designing an appropriate deep transfer learning model to deal with small samples or incomplete datasets.