The structure of the proposed model can be divided into three main parts: data augmentation, physical deep learning and roughness prediction. Part I is responsible for generating the data used throughout the model and performing data augmentation through physical mechanisms. Part II performs feature extraction and learning based on CNN–GRU, with physical constraints introduced through a physically guided loss function. Part III feeds the milling features derived from Part II into the prediction model for surface roughness prediction. The interaction between the three parts is a progression: Part I provides data embodying the physical laws for Part II, and Part II provides the physically constrained surface roughness prediction model for Part III. The proposed model incorporates physical knowledge both before and during training. Before training, data augmentation was performed on the limited experimental data by constructing a surface roughness mechanism model of tolerable accuracy. During training, a physically guided loss function was constructed to steer the training process with physical knowledge. Considering the excellent feature extraction capabilities of convolutional neural networks and gated recurrent units in the spatial and temporal dimensions, respectively, a CNN–GRU model was adopted as the main prediction model for surface roughness. Meanwhile, bidirectional gated recurrent units and a multi-headed self-attention mechanism were introduced to enhance data correlation. The following sections describe in detail the sub-models and components of the physically guided deep prediction model.
3.1. Surface Roughness Mechanism Model
During mechanical milling, the machined surface of the workpiece tends not to be completely flat. As an indicator for monitoring the machining quality of the workpiece, surface roughness is directly related to micro-geometry errors such as small spacings and micro-scale peak and valley inhomogeneities in the machining plane [36,37]. The relative motion between the workpiece and the tool during milling makes it impossible to completely eliminate the theoretical cutting area in actual machining. As shown in Figure 2, many areas are left on the workpiece surface after the actual milling process, which are referred to as residual areas. The value of the theoretical roughness is directly related to the maximum residual height of the residual area.
The physical modeling of surface roughness was carried out by combining geometric and mechanical models for the residual areas of the machined surface described above. The full modeling procedure is given in [15]; this paper presents a simplified version. Assuming that the surface roughness of the machined workpiece is related only to the height of the residual area in the actual machining process, the theoretical expression for the arithmetic mean deviation of the profile, Ra, is given in Equation (1), where the proportionality coefficient relates Ra to Rmax, and Rmax denotes the maximum residual height of the machined residual region. The formation of the residual height of the machined surface as the tool moves from position a to position b is visualized in Figure 2a. The three points marked in the diagram are the intersections between the horizontal dashed line and the solid tool profile, and together they construct a right triangle. From the geometric relationships labeled in the diagram, Equations (2) and (3) can be obtained, in which the quantities involved are the auxiliary angle of the tool, the arc radius rε of the tool tip and the feed per tooth fz. Simplifying these equations yields the theoretical expression for the maximum residual height, Rmax, given in Equation (4).
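As a rough numeric check of this geometry, the widely used small-feed approximation Rmax ≈ fz²/(8rε), a simplification that ignores the auxiliary-angle terms of Equation (4), can be sketched as follows (function name and example values are illustrative):

```python
def max_residual_height(feed_per_tooth_mm: float, nose_radius_mm: float) -> float:
    """Approximate maximum residual height (mm) left between tool passes.

    Uses the common small-feed approximation R_max ~ f_z^2 / (8 * r_e),
    a simplification of the geometry in Figure 2a; the paper's Equation (4)
    additionally accounts for the tool's auxiliary angle.
    """
    if feed_per_tooth_mm <= 0 or nose_radius_mm <= 0:
        raise ValueError("feed per tooth and nose radius must be positive")
    return feed_per_tooth_mm ** 2 / (8.0 * nose_radius_mm)

# Example: 0.1 mm/tooth feed with a 0.8 mm nose radius
# gives R_max = 0.01 / 6.4 = 0.0015625 mm (about 1.56 um)
print(max_residual_height(0.1, 0.8))
```

As the quadratic dependence on fz suggests, halving the feed per tooth cuts the theoretical residual height by a factor of four.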
In the actual machining process, the surface roughness of the workpiece is related to both geometric factors and the mechanical properties of the material. The extrusion of the tool against the workpiece surface during cutting results in elastic–plastic deformation of the machined surface. In view of this influence of the mechanical properties on surface roughness, a corresponding physical model was constructed for the residual height of the workpiece surface. The geometric meaning of the residual height is shown in Figure 2b: it consists of the plastic deformation height and the elastic rebound, and its theoretical expression is given in Equation (5).
The plastic deformation height can be obtained from Kragelsky's principle of frictional wear calculation, expressed mathematically in Equations (6) and (7), where the micro-asperity radius is the arc radius of the tool tip during cutting, HV denotes the Vickers hardness of the material and σs denotes the flow stress. After simplifying these equations, the plastic deformation height can be expressed as in Equation (8).
where the flow stress σs can be obtained from the Johnson–Cook constitutive equation, given in Equation (9), in which A, B and C denote the equation coefficients, ε denotes the equivalent plastic strain, ε̇ and ε̇0 denote the equivalent strain rate and the reference strain rate, respectively, T denotes the workpiece temperature, Tr denotes room temperature and Tm denotes the melting temperature of the workpiece. Besides the plastic deformation height, the elastic rebound in the residual height can be calculated by Hertzian elastic contact theory, expressed mathematically in Equations (10)–(12).
where Fz denotes the z-axis cutting force during cutting, R1 and R2 denote the radii of the two contact bodies, E1 and E2 denote the elastic moduli of the tool and the machined material, respectively, and ν1 and ν2 denote the Poisson's ratios of the tool and the machined material, respectively. On this basis, the elastic rebound in the residual height can be calculated by Equation (13).
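The Johnson–Cook relation of Equation (9) can be sketched as a short function. The default coefficients below are values commonly quoted in the literature for AISI 1045 (an S45C-equivalent steel), not the paper's fitted parameters:

```python
import math

def johnson_cook_stress(strain, strain_rate, temp,
                        A=553.1, B=600.8, n=0.234, C=0.0134, m=1.0,
                        ref_rate=1.0, T_room=293.0, T_melt=1733.0):
    """Johnson-Cook flow stress, cf. Equation (9).

    sigma = (A + B*eps^n) * (1 + C*ln(rate/ref_rate)) * (1 - T*^m),
    with homologous temperature T* = (T - T_room) / (T_melt - T_room).
    Default coefficients are illustrative literature values for a
    medium-carbon steel, not the paper's fitted parameters.
    """
    T_star = (temp - T_room) / (T_melt - T_room)
    return ((A + B * strain ** n)
            * (1.0 + C * math.log(strain_rate / ref_rate))
            * (1.0 - T_star ** m))

# At zero strain, reference strain rate and room temperature,
# the flow stress reduces to the yield coefficient A.
print(johnson_cook_stress(0.0, 1.0, 293.0))
```

The three bracketed factors separate strain hardening, strain-rate sensitivity and thermal softening, which is why the model is convenient for cutting simulations.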
3.2. Physically Guided Deep Prediction Model
Deep learning has been demonstrated to be robust in feature extraction, time series prediction and related tasks. This paper combines physical knowledge with deep learning to propose a physically guided surface roughness prediction model, consisting of physical knowledge and neural networks. Considering the strong feature extraction capability of CNNs for time-series data, this paper adopted a CNN for milling feature extraction to track the spatial pattern variations caused by differences in process parameters. The CNN can take pre-processed data directly as input and extract the deep non-linear features hidden in it by combining multiple convolution and pooling layers. Since GRUs are capable of learning sequential and time-varying patterns from the original dataset, this paper employed a GRU for the dynamic tracking of surface changes over the tool life cycle. Compared with the LSTM architecture, GRUs converge faster and have fewer parameters. This paper therefore adopted a hybrid deep learning model, CNN–GRU, as the neural network part of the prediction model. Physical knowledge was integrated mainly in the training phase and the loss function in order to guide the training process of the model.
CNNs are one of the most successful deep learning methods, and their network structure can be divided into 1D-CNNs, 2D-CNNs and 3D-CNNs. A 1D-CNN is very suitable for time series analyses of sensor data or analyses of periodic signal data, and its specific structure is shown in
Figure 3.
The CNN architecture typically consists of convolutional layers and pooling layers that filter and extract useful features from the input data. The leftmost part of the figure shows the input multi-dimensional time series data. Each convolutional layer has a corresponding convolutional kernel, and each colored box on the input data in the figure represents a filter. The filter slides top-down over the entire input matrix, producing convolutional features of the input data through its coefficient matrix. A single filter extracts a convolutional feature vector of dimension L × 1, where L is determined by the input dimension, the filter size and the convolution stride. Assuming that M convolutional kernels are applied to the input data, the dimension of the extracted convolutional features is L × M. Each convolutional layer is typically followed by a nonlinear activation function and then immediately by a pooling layer. The pooling layer is a subsampling technique that transforms and aggregates each convolutional feature matrix into a low-dimensional feature matrix according to specific rules. For example, under the max-pooling rule, the maximum value in the current sliding window, i.e., the most critical feature in the window, is output. Pooling operations enhance the robustness of the system and reduce the sensitivity of the pooled output to changes in the input. In summary, the CNN architecture is suitable for extracting robust features from time series data while avoiding the iterative expansion of matrix dimensions.
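The dimension bookkeeping above can be illustrated with a plain numpy sketch of one convolution-plus-max-pooling stage (the shapes, stride and pool size are illustrative, not the paper's configuration):

```python
import numpy as np

def conv1d_maxpool(x, kernels, stride=1, pool=2):
    """Valid 1D convolution followed by max-pooling, as in Figure 3.

    x       : (N, D) multi-dimensional time series (N steps, D channels)
    kernels : (M, k, D) array of M filters of length k
    Returns pooled features of shape (L // pool, M), where
    L = (N - k) // stride + 1 is the convolution output length.
    """
    N, D = x.shape
    M, k, _ = kernels.shape
    L = (N - k) // stride + 1
    conv = np.empty((L, M))
    for m in range(M):
        for i in range(L):
            window = x[i * stride : i * stride + k, :]  # sliding window
            conv[i, m] = np.sum(window * kernels[m])    # filter response
    conv = np.maximum(conv, 0.0)                        # ReLU activation
    # max-pooling: keep the most critical feature in each window
    pooled = conv[: (L // pool) * pool].reshape(L // pool, pool, M).max(axis=1)
    return pooled

x = np.random.randn(100, 3)   # 100 time steps, 3 sensor channels
w = np.random.randn(8, 5, 3)  # 8 kernels of length 5
feats = conv1d_maxpool(x, w)
print(feats.shape)            # (48, 8): L = 96, halved by pooling
```

With N = 100, k = 5 and stride 1, the convolution length is L = 96, and max-pooling with window 2 yields the 48 × 8 feature map printed above.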
Recurrent neural networks (RNNs) are capable of memorizing historical data features and are applicable to a variety of sequential data problems. Gated recurrent unit (GRU) neural networks selectively filter and remember historical information through update gates and reset gates, alleviating the gradient vanishing, gradient explosion and poor long-term memory problems of traditional RNNs. Due to the temporal characteristics of vibration data, this paper employs a GRU to extract vibration sequence features. The detailed internal structure of the GRU model is shown in Figure 4, where ht−1 and ht represent the hidden states of the previous unit and the current unit, respectively, h̃t represents the candidate state of the current unit and xt represents the input tensor of the current unit. Specifically, ht and h̃t are obtained through the reset and update gates, which are calculated as shown in Equations (14)–(17).
where rt and zt represent the reset gate and the update gate, Wr, Wz and Wh represent weight matrices, and ht combines historical and current state information: the term (1 − zt) ⊙ ht−1 represents the selective forgetting of historical information and zt ⊙ h̃t represents the selective memory of current information. When processing vibration data with a GRU, the clarity of the historical information inherited from previous units is negatively correlated with the length of the sequence. To enhance the correlation between the dimensions of the time-series vibration data, a bidirectional GRU and a multi-headed self-attention mechanism were introduced. The time-series vibration data are fed into the bidirectional GRU in the forward and reverse directions, respectively; the specific structure of the network and its cells is shown in Figure 4. The bidirectional GRU combines the acquired forward and reverse output features into a single output feature map, which enhances the correlation between the dimensions of the time-series vibration data.
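Equations (14)–(17) follow the standard GRU formulation; the sketch below assumes that form, with each weight matrix acting on the concatenation of the previous hidden state and the current input, and biases omitted for brevity:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wr, Wz, Wh):
    """One GRU step in the form of Equations (14)-(17), biases omitted."""
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(Wr @ hx)  # reset gate
    z_t = sigmoid(Wz @ hx)  # update gate
    # candidate state built from the reset-gated history and current input
    h_cand = np.tanh(Wh @ np.concatenate([r_t * h_prev, x_t]))
    # (1 - z_t) * h_prev : selective forgetting of historical information
    # z_t * h_cand       : selective memory of current information
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand
    return h_t

rng = np.random.default_rng(0)
H, D = 4, 3                               # hidden size, input size
h = gru_step(rng.standard_normal(D), np.zeros(H),
             rng.standard_normal((H, H + D)),
             rng.standard_normal((H, H + D)),
             rng.standard_normal((H, H + D)))
print(h.shape)  # (4,)
```

A bidirectional layer simply runs this step forward and backward over the sequence and concatenates the two hidden-state streams.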
A multi-headed self-attention mechanism can enhance the correlation between the dimensions of sequence data and is widely used in the fields of natural language processing and image processing. This paper set the number of heads of the multi-headed self-attention mechanism to eight, and its specific structure is shown in
Figure 5.
The output feature map of the bidirectional GRU served as its input tensor, and the three feature matrices Query, Key and Value were calculated with the weight matrices WQ, WK and WV, respectively. The correlation between the Query matrix and the Key matrix was characterized by the dot-product operation, as shown in Equation (18). The result was then scaled by the column dimension of the Query matrix to avoid excessively large values. Finally, the result was converted into a weight matrix by the softmax operation, and the Z matrix was obtained by multiplying the weight matrix with the Value matrix.
The multiple Z matrices calculated above were concatenated and transformed into the target output tensor X*. The input and output tensors have the same data structure, and every dimension of the output tensor takes the relevance of the remaining dimensions into account. This mechanism can fully extract and enhance the correlation between the vibration-data dimensions.
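The computation in Equation (18) matches standard scaled dot-product attention; a numpy sketch under that assumption, with eight heads and illustrative dimensions, is shown below:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, Wq, Wk, Wv, Wo, heads=8):
    """Multi-headed self-attention over the bidirectional GRU output X.

    X : (T, D) feature map; Wq/Wk/Wv/Wo : (D, D) projection matrices.
    Each head computes Z = softmax(Q K^T / sqrt(d_k)) V, cf. Equation (18);
    the Z matrices are concatenated and projected back to shape (T, D).
    """
    T, D = X.shape
    d_k = D // heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    outs = []
    for h in range(heads):
        s = slice(h * d_k, (h + 1) * d_k)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)  # scaled dot product
        Z = softmax(scores, axis=-1) @ V[:, s]       # weight matrix times Value
        outs.append(Z)
    return np.concatenate(outs, axis=-1) @ Wo        # target output tensor X*

rng = np.random.default_rng(1)
T, D = 10, 32
X = rng.standard_normal((T, D))
W = [rng.standard_normal((D, D)) * 0.1 for _ in range(4)]
X_star = multi_head_self_attention(X, *W)
print(X_star.shape)  # (10, 32): same structure as the input tensor
```

Because the output keeps the (T, D) structure of the input, the block can be dropped between GRU layers without reshaping the feature map.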
The GRU part of the network mainly consists of three GRU layers and a multi-headed self-attention mechanism. In order to keep the training time reasonable at the current parameter scale, only the first GRU layer is bidirectional, extracting vibration-data features in both the forward and reverse directions. A multi-headed self-attention mechanism was then placed after the bidirectional GRU layer for weight setting; it aggregates the dimensional weights of the output features extracted by the bidirectional GRU layer to enhance data correlation. Finally, the remaining GRU layers were set as ordinary GRU layers to extract features from the output matrix of the multi-headed self-attention mechanism.
Physical knowledge was introduced in the training phase and in the loss function of the above deep learning model, so that physical knowledge guides the training process. Compared with a conventional loss function, the proposed physically guided loss function is capable of capturing dynamic patterns that generalize within the framework of the defined physical laws; its mathematical model is shown in Equation (19), where the first two terms are the empirical loss and regularization factors used in most machine learning models, and the third term denotes a loss factor based on physical knowledge, which ensures that the training process of the prediction model remains consistent with physical laws.
The specific expression of the physical knowledge normally depends on factors such as the milling material, the milling environment and the static parameter settings. Take S45C steel as an example. In the practical milling process, the surface roughness is positively correlated with the vibration frequency, i.e., the surface roughness increases as the vibration frequency increases. The vibration frequency decreases as the spindle speed increases. When the feed speed falls within a certain range (in mm/min), the vibration frequency increases with the feed speed; once the feed speed exceeds this range, the vibration frequency changes insignificantly as the feed speed increases. As a result, the surface roughness of S45C steel decreases with increasing spindle speed as well as increasing feed rate. To enable the above physical laws to guide the model training process, a positive and negative monotonicity loss function was constructed, the mathematical model of which is shown in Equations (20) and (21).
where the two terms denote the domain losses under the positive and negative physical monotonicity constraints, and the logical operator AND (∧) is responsible for determining the monotonicity relationship between each feature and the surface roughness. As a nonlinear function, ReLU helps the neural network model learn nonlinear relationships; compared with a linear function, it better captures the nonlinear relationships in the data during training, thus improving the performance of the model. In addition, the gradient of the sigmoid function easily vanishes during backpropagation in deep neural networks, whereas ReLU enhances the sparsity of the network, reduces the interdependence between parameters and alleviates overfitting, improving training efficiency. Therefore, ReLU was selected as the activation function in this paper. A mask mechanism was applied to the ReLU function to ensure that the physical training loss is considered only when the predictions of the neural network model contradict physical laws. When a selected feature is physically negatively correlated with the surface roughness, the corresponding monotonicity term is added to the total model loss if the predictions are positively correlated; conversely, the complementary term is used to calculate the physically guided loss. The physically guided loss function steers the learning route of the model towards the region of the function space that conforms to physical laws, which effectively improves the training efficiency and physical rationality of the model. The above loss function considers violations of physical laws from two different directions of evolution: predictions that conform to physical laws are filtered out by the corresponding masking mechanism, and only the offending terms are accumulated, effectively placing physical constraints on the deep learning model. In summary, the physically guided loss function constructed above is incorporated into the global loss function to form the final loss function under the physical law constraints. This loss function enables the learning route of the model to converge to the region of the function space that conforms to physical laws, and it is expressed in Equation (22).
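A minimal sketch of this ReLU-masked monotonicity penalty, in the spirit of Equations (20)–(22), is given below; the function names and the λ weighting are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0.0)

def monotonicity_loss(y_pred_sorted, direction):
    """Physically guided monotonicity loss, cf. Equations (20) and (21).

    y_pred_sorted : predicted roughness for samples sorted by increasing
                    value of the physical feature under consideration.
    direction     : +1 if roughness should physically increase with the
                    feature, -1 if it should decrease.
    The ReLU acts as the mask mechanism: increments that already obey
    the physical law contribute zero; only violating terms accumulate.
    """
    dy = np.diff(y_pred_sorted)
    return float(np.sum(relu(-direction * dy)))

def total_loss(y_true, y_pred, physical_losses, lam=0.1):
    """Equation (22)-style combination: empirical loss plus weighted physical loss."""
    mse = float(np.mean((y_true - y_pred) ** 2))
    return mse + lam * sum(physical_losses)

# Roughness predicted to fall as spindle speed rises: no physical violation,
# so the masked loss is zero.
ra_by_speed = np.array([1.8, 1.5, 1.3, 1.2])
print(monotonicity_loss(ra_by_speed, direction=-1))  # 0.0
```

Reversing any increment in `ra_by_speed` would make the corresponding ReLU term positive, so only law-violating predictions contribute to the total loss, exactly the filtering behavior the masking mechanism is meant to provide.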