Adaptive BP Network Prediction Method for Ground Surface Roughness with High-Dimensional Parameters

Abstract: Ground surface roughness is difficult to predict with a physical model because of its complex influencing factors. BP neural networks (BPNNs), a promising method, have been widely applied to the prediction of surface roughness. This paper uses a BPNN to predict ground surface roughness while considering the state of the grinding wheel. However, as the number of input parameters increases, the local-optimum problem of the model becomes more serious. Therefore, "identify factors" are designed to judge the iterative state of the model, whilst "memory factors" are designed to store the best weights during network training. The iterative termination conditions of the model are improved, and the learning rate and the update rules of the weights are adjusted to avoid local optimal solutions. The results show that the prediction accuracy of the presented model is higher and more stable than that of the traditional model. Under three types of iteration steps, the average prediction accuracy improved from 0.071, 0.065, and 0.066 to 0.049, 0.042, and 0.039, and the standard deviation of the prediction decreased from 0.0017, 0.0166, and 0.0175 to 0.0017, 0.0070, and 0.0076, respectively. Therefore, the proposed method provides guidance for improving the global optimization ability of BPNNs and for developing more accurate surface roughness prediction models.


Introduction
Grinding is a widely used practice in precision machining, and the grinding quality directly affects the surface finish and working life of products. Surface roughness is an important parameter for evaluating the quality of ground surfaces and the competitiveness of the overall grinding system, which is closely related to the assembly accuracy, corrosion resistance and wear resistance of the products [1]. Predicting the surface roughness accurately for the grinding process is beneficial for selecting efficient process parameters and guaranteeing the grinding quality. Hence, high-precision prediction of ground surface roughness is of great importance [2,3].
In the research of grinding mechanisms, most physical models only consider the grinding wheel speed, workpiece speed, and depth of cut. However, surface roughness is equally sensitive to wheel wear, and it is difficult for a physical model to account for the time-varying state of the grinding wheel. Statistical analysis is commonly used for prediction [4,5], but it has limitations when solving complex nonlinear relationships such as those in grinding processes. Therefore, to meet the control needs of actual processing, researchers are more inclined to use machine learning models to predict surface roughness [6][7][8][9][10]. Nikolaos et al. [11] took the depth of cut, feed rate, and spindle speed as input parameters to establish an artificial neural network to predict the ground surface roughness Ra. The results showed that the prediction accuracy was as high as 0.796, which proved the validity of the model. Jiao et al. [12] addressed the control of surface roughness during high-speed machining.

In the above studies, different optimization algorithms were applied to improve the global search ability of the BPNN in the prediction of surface roughness. The first two types of approaches improved the global search ability by expanding the number of weight solutions, but the model could still fall into a local optimal solution. In addition, the threshold for iteration termination was selected artificially, resulting in over-convergence or non-convergence of the network; these approaches could not achieve fast convergence and accurate local search of the network at the same time, nor maximize the prediction performance of the model. In the third type of approach, the learning rate could not be adjusted according to the network training state. Adjusting the learning rate too early or too late reduces the prediction accuracy and makes it impossible to effectively solve the local optimal solution problem.
Therefore, improving the local optimal solution and convergence of a BPNN while ensuring the convergence efficiency and prediction accuracy is still a problem to be solved.
In this paper, an adaptive BP network for the prediction of ground surface roughness is proposed to settle these matters, processing the force signal and extracting the features of grinding wheel wear as input parameters. In addition, optimizing the BP algorithm, improving the iterative termination conditions of the prediction model, and adjusting the weight update rules help to solve the problem of local optimal solutions and reduce the impact of human factors. Furthermore, dynamically adjusting the learning rate based on the iterative state of the network maximizes the prediction performance of the model itself and improves the search accuracy of the target weights while ensuring rapid convergence. The rationality of the method is verified by comparing the prediction performance of the BP model of ground surface roughness before and after optimization.

Experimental Setup
The experimental material was quartz glass, which was cut into standard specimens with a size of 15 mm × 15 mm × 10 mm. To explore the effect of grinding wheel wear on ground surface roughness, the same grinding wheel was used continuously throughout the experiment. The grinding wheel was #80 electroplated diamond, with a grit size of approximately 190 µm and a diameter of 20 mm. Three grinding parameters were selected to carry out experiments with 3 × 3 × 3 combinations; the specific parameters are shown in Table 1. A CNC milling machine was used for the grinding experiments. First, the test piece was bonded to an iron plate fixed on the milling machine to ensure synchronous movement of the test piece and the worktable. Then, a triaxial force sensor fixed on the iron plate measured the force on the specimen; the specific experimental process is shown in Figure 1.

Data Processing and Selection
The grinding force signal is closely related to the wear of the grinding wheel. It is therefore feasible to extract features of grinding wheel wear from the normal force data and use them as parameters for the subsequent BPNN prediction; features are extracted in the time, frequency, and time-frequency domains to retain as much information from the normal force signal as possible. Due to the large amount of experimental data, 20,000 data points in the smooth grinding state were selected for analysis in each group of experiments.

Due to machine tool vibration and ambient temperature, there was a lot of noise interference in the collected normal force signal, which affected subsequent feature analysis. Therefore, noise reduction and zero-drift compensation were performed on the normal force signal to reduce the impact of the environment and ensure a high signal-to-noise ratio of the signal.
According to the experimental environment, the force signal was processed by low-pass filtering with a cutoff frequency of 500 Hz. Finally, the features of grinding wheel wear were determined in the time domain, frequency domain, and time-frequency domain, as shown in Table 2.  Figure 2 shows the processing of the normal force signal. In the time domain and frequency domain, the mean value AVG, the root mean square value RMS, and the barycenter frequency FC were often used to characterize grinding wheel wear. In the time-frequency domain, wavelet packet decomposition was used to analyze the non-stationary signals. From the frequency domain analysis, it was known that 0-62.5 Hz is the main frequency band of the normal force signal, and the four-layer wavelet packet decomposition was performed on this frequency band to observe the change of the energy ratio of the first eight frequency bands. By comparison, it was found that the energy of the fourth frequency band (16.5-19.5 Hz) and the total energy of the first eight frequency bands (0-31.25 Hz) were sensitive to grinding wheel wear. Therefore, the energy value ratios of the above two frequency bands were selected as the time-frequency domain features of grinding wheel wear.
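As a sketch of how such features could be computed, the fragment below derives the mean value AVG, root mean square RMS, barycenter frequency FC, and the two band energy ratios from a normal force signal with NumPy. The sampling rate is an assumption, and FFT band energies stand in for the paper's four-layer wavelet packet decomposition, purely for illustration:

```python
import numpy as np

def wear_features(force, fs=1000.0):
    """Illustrative wheel-wear features from a normal-force signal.

    force : 1-D array of normal grinding force samples
    fs    : sampling frequency in Hz (assumed value)
    """
    avg = np.mean(force)                    # time domain: mean value AVG
    rms = np.sqrt(np.mean(force ** 2))      # time domain: root mean square RMS

    # Frequency domain: barycenter (spectral centroid) frequency FC.
    spectrum = np.abs(np.fft.rfft(force)) ** 2
    freqs = np.fft.rfftfreq(force.size, d=1.0 / fs)
    fc = np.sum(freqs * spectrum) / np.sum(spectrum)

    # Time-frequency domain: the paper uses four-layer wavelet packet
    # decomposition; here the two band energy ratios are approximated
    # with FFT band energies over the same ranges.
    total = np.sum(spectrum)
    band4 = np.sum(spectrum[(freqs >= 16.5) & (freqs < 19.5)]) / total
    low8 = np.sum(spectrum[(freqs >= 0.0) & (freqs < 31.25)]) / total
    return avg, rms, fc, band4, low8
```

For a pure 18 Hz test tone, FC lands at 18 Hz and almost all energy falls in the 16.5-19.5 Hz band, which is the kind of band sensitivity the paper exploits.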
According to the processed data, the parameters of the input layer are shown in Table 3. The processed data were divided into training samples and testing samples in a ratio of 3:1, which was convenient for the training and testing of the BP prediction model of ground surface roughness.
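A 3:1 split could be sketched as follows; the 27 sample groups (from the 3 × 3 × 3 combinations of Section 2) and the shuffling seed are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed: the 27 parameter combinations (3 x 3 x 3) yield 27 processed
# samples; a random 3:1 split gives 20 training and 7 testing samples.
samples = np.arange(27)
rng.shuffle(samples)
n_train = round(len(samples) * 3 / 4)
train, test = samples[:n_train], samples[n_train:]
```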

Presented BP Neural Network Prediction Model
The core content of the BP algorithm is to update the parameters through gradient descent and error back-propagation to find the optimal result over the entire path. When predicting ground surface roughness, the BP model can be trained on a certain number of training samples to fit the relationship between the ground surface roughness and the grinding parameters under specific processing conditions. The presented BP prediction model of ground surface roughness, constructed with a single hidden layer, is shown in Figure 3. According to the basic principle of the BPNN, a single hidden layer is simpler than multiple hidden layers while still fitting nonlinear functions well. The number of nodes in the input layer depends on the dimension of the input samples; the parameters of the input layer are shown in Table 3. The number of nodes in the output layer is 1, and the output is the predicted value of the surface roughness Ra. The number of hidden layer nodes is determined through the principle of the golden section method:

h = √(m + n) + a

where h is the number of hidden layer nodes, m is the number of input layer nodes, n is the number of output layer nodes, and a is an adjustment constant from 1 to 10. In this work, a is set as 5 − √10, so the number of hidden layer nodes h is 5.

In addition, the activation function of the hidden layer of the prediction model is the Sigmoid function, the activation function of the output layer is the identity function, and the error function is the global mean square error. The default learning rate µ is 0.024, the initial weights and biases are generated randomly, and the learned function is used to update the weights and biases of each layer.
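The hidden-layer size quoted above is consistent with the common empirical sizing rule h = √(m + n) + a; the closed form is an assumption here, but it reproduces the paper's numbers (m = 9, n = 1, a = 5 − √10 gives h = 5):

```python
import math

def hidden_nodes(m, n, a):
    # Assumed empirical rule h = sqrt(m + n) + a; with m = 9, n = 1,
    # and a = 5 - sqrt(10) it yields exactly 5 hidden nodes.
    return round(math.sqrt(m + n) + a)

print(hidden_nodes(9, 1, 5 - math.sqrt(10)))  # 5
```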

The Standard BP Algorithm
The input layer vector of the neural network is x = (x1, x2, ···, x9), the hidden layer vector is y = (y1, y2, ···, y5), and w and γ represent the connection weights and biases between the input layer and the hidden layer. The output layer vector is z, whilst v and θ represent the connection weights and biases between the hidden layer and the output layer. The activation functions of the hidden layer and output layer are f1, f2.

The information forward propagation process of the neural network can be expressed as

yj = f1( Σi wij·xi + γj ),  j = 1, 2, ···, 5
z = f2( Σj vj·yj + θ )
After the BPNN completes one forward propagation of the information, the back propagation of the error is carried out. First, the error function of the neural network is calculated:

e = (1/2)·Σ (M − z)²

where e is the error function and M is the expected output vector, with the sum taken over the training samples. According to the negative gradient direction of the error function e, the connection weights and biases between the hidden layer and the output layer are updated:

vj ← vj − µ·∂e/∂vj,  θ ← θ − µ·∂e/∂θ

where µ is the learning rate. The connection weights and biases between the input layer and the hidden layer are then updated in the same way; the process is shown in Figure 3.
The above is an iterative process of the BPNN. The training and prediction of the model can be achieved by setting the number of iterations or the threshold of the error function e.
Finally, the developed predictive model can be represented by Equation (9):

Ra = z = f2( Σj vj · f1( Σi wij·xi + γj ) + θ )
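The forward and backward passes described above can be sketched as a minimal NumPy implementation of the 9-5-1 architecture, with a Sigmoid hidden layer, identity output, mean square error, and the default learning rate µ = 0.024. The weight scale, seed, and training data below are synthetic stand-ins, not the paper's samples:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# 9-5-1 architecture: 9 inputs (Table 3), 5 hidden nodes, 1 output (Ra).
w = rng.normal(scale=0.5, size=(9, 5)); gamma = np.zeros(5)   # input -> hidden
v = rng.normal(scale=0.5, size=(5, 1)); theta = np.zeros(1)   # hidden -> output
mu = 0.024                                                    # default learning rate

def train_step(x, M):
    """One forward pass and one gradient-descent update; returns the error e."""
    global w, gamma, v, theta
    y = sigmoid(x @ w + gamma)            # hidden layer, f1 = Sigmoid
    z = y @ v + theta                     # output layer, f2 = identity
    e = 0.5 * np.mean((z - M) ** 2)       # global mean square error
    dz = (z - M) / len(x)                 # back-propagate the error
    dv, dtheta = y.T @ dz, dz.sum(axis=0)
    dy = (dz @ v.T) * y * (1.0 - y)       # through the Sigmoid derivative
    dw, dgamma = x.T @ dy, dy.sum(axis=0)
    v -= mu * dv; theta -= mu * dtheta    # negative gradient direction
    w -= mu * dw; gamma -= mu * dgamma
    return e

# Synthetic stand-in for the training samples (real inputs come from Table 3).
X = rng.normal(size=(20, 9))
Ra = rng.uniform(0.1, 0.5, size=(20, 1))
errors = [train_step(X, Ra) for _ in range(2000)]
```

Over the 2000 iterations the error function value decreases, which is the iterative process that the termination conditions in the following sections act upon.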

The Local Optimal Solution
Based on mathematical theory, the BPNN uses the local gradient of the error function to update the weights within a finite number of iterative steps and finally obtains a minimum of the training function. However, the minimum found is not necessarily the global optimal solution. From the perspective of the training function, there are multiple local optimal solutions besides the global optimal solution, so the algorithm finds a local optimal solution with high probability. From the perspective of the gradient descent method, the adjustment principle of the weights is purely local, and there is no algorithmic mechanism to avoid local optima. Once the initial weights and biases are determined, the local optimal solution is also determined, which is an important reason why many types of neural networks are difficult to optimize. The selection of the initial weights and the learning rate are the two main factors that affect the generation of local optimal solutions. The initial weights are sensitive parameters of the BP network, and different initial weights often cause the model to fall into different local optimal solutions. The learning rate affects the convergence characteristics of the neural network. When the learning rate is too large, the BP network may not converge, or may converge rapidly in the early iterations but skip over the local and global optimal solutions in the later iterations, which reduces the accuracy of the prediction model. When the learning rate is too small, the iteration of the BP network is slow, which reduces the efficiency of the model and makes it fall into a local optimal solution.
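The dependence on the initial weights can be illustrated on a one-dimensional toy objective (not the BP error function itself): gradient descent started from two different points settles into two different local minima of the same non-convex function:

```python
import math

def grad_descent(x0, lr=0.01, steps=5000):
    """Minimize the toy non-convex function f(x) = sin(3x) + 0.1 x^2."""
    x = x0
    for _ in range(steps):
        grad = 3.0 * math.cos(3.0 * x) + 0.2 * x   # f'(x)
        x -= lr * grad
    return x

# Two initializations, two different local minima of the same function.
a = grad_descent(-2.0)   # converges near x = -2.56
b = grad_descent(2.0)    # converges near x = 1.54
```

Both end points have a vanishing gradient, yet they are different minima; in the BPNN the random initial weights play exactly the role of x0 here.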

The Development of the Presented BP Algorithm
The traditional model does not judge the state of the network; it iterates until the end of training. To solve the local optimal solution problem, a network state identification method is proposed in this paper. First, "identify factors" are designed to determine whether the BP network has fallen into a local optimal solution. Then, "memory factors" are designed to dynamically update and store the best weights during training. It is stipulated that after the error function value oscillates three consecutive times during model training, "identify factors" is 1 and the BP network is considered to have fallen into a local optimal solution; in all other cases, "identify factors" is 0:

identify factors = 1, if (eN − eN−1) alternates in sign for N = t − 1, t, t + 1; identify factors = 0, otherwise
where eN is the error value of the Nth training iteration, eN−1 is the error value of the (N − 1)th iteration, and t − 1, t, and t + 1 are three consecutive positive integers. The "identify factors" are used later to solve the convergence and local optimum problems. A fixed learning rate µ in the traditional model cannot guarantee fast convergence and accurate prediction of the BPNN at the same time. Therefore, according to the different characteristics of the BP network in the early and later iterations, the learning rate µ is adjusted dynamically, with the iterative state of the BPNN judged via the "identify factors". In the early iterations, the learning rate µ is kept at the default value, which improves the convergence speed and quickly approaches the optimal solution area. In the later iterations, when "identify factors" = 1, the learning rate µ decays through Equation (11), which improves the solution accuracy and avoids model oscillation:

µ = µ0/(h + 1)

where µ0 is the default learning rate. To ensure that the learning rate µ decays moderately, h is set as the number of adjustments, taking the values 0, 1, and 2.
After falling into a local optimal solution, the traditional model takes no corrective action and continues to iterate until the end of training. In this paper, when h = 2, the learning rate µ is reset to the default value, h is reset to 0, and random oscillation is loaded onto the weights of each layer stored in "memory factors", so that the model escapes the local optimal solution. Mem1 and Mem2 are the weight matrices of the hidden layer and the output layer in "memory factors"; p is a random number in (0, 2) that represents the range of the random change of the weights; the sin function represents the direction in which the weights randomly change; and i, j, and rand(0, 2) produce random changes between different weights. The BP network is then trained with wij and vj as the weights of each layer. After the training, the best weights in "memory factors" are taken as the final solution. The comparison of the traditional and proposed updating principles of the weights and thresholds is shown in Figure 4. In the training process of the presented BP, the adjustment of the learning rate and the update of the weights influence each other mutually.
The change of the learning rate affects the result of the weight update, and the update of the weights affects the iterative state of the network, which in turn affects the value of the "identify factors" and finally affects the change of the learning rate. The specific process of the prediction model of ground surface roughness is shown in Figure 5.
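The interplay of "identify factors", "memory factors", and the decaying learning rate can be sketched as below. The decay rule µ0/(h + 1) is an assumption inferred from the reported sequence 0.024 → 0.012 → 0.008, the perturbation form for the stored weights is an assumed stand-in for the paper's Mem1/Mem2 update, and a toy non-convex loss replaces the BP error surface:

```python
import numpy as np

rng = np.random.default_rng(1)

def identify_factor(err_hist):
    """1 if the error value oscillated three consecutive times
    (the error difference alternated in sign), else 0."""
    if len(err_hist) < 4:
        return 0
    d = np.diff(err_hist[-4:])
    return int(d[0] * d[1] < 0 and d[1] * d[2] < 0)

# Toy non-convex stand-in for the BP error surface over a weight vector.
def loss(wv):
    return float(np.sum(np.sin(3 * wv) + 0.1 * wv ** 2))

def grad(wv):
    return 3 * np.cos(3 * wv) + 0.2 * wv

mu0 = 0.024                            # default learning rate
mu, h = mu0, 0                         # h = number of learning-rate adjustments
wv = rng.normal(size=5)                # stand-in for all network weights
best_w, best_e = wv.copy(), loss(wv)   # "memory factors"
e0 = best_e
hist = []

for step in range(4000):
    wv = wv - mu * grad(wv)
    e = loss(wv)
    hist.append(e)
    if e < best_e:                     # memory factors keep the best weights
        best_w, best_e = wv.copy(), e
    if identify_factor(hist):          # trapped in a local optimum
        if h < 2:
            h += 1
            mu = mu0 / (h + 1)         # assumed decay: 0.024 -> 0.012 -> 0.008
        else:                          # h = 2: reset the rate and randomly
            h, mu = 0, mu0             # oscillate the stored weights (assumed form)
            p = rng.uniform(0, 2, wv.shape)
            wv = best_w * (1 + p * np.sin(rng.uniform(0, 2 * np.pi, wv.shape)))
        hist.clear()
```

After training, the best weights held in the "memory factors" (best_w) are taken as the final solution, mirroring the rule in the text.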


The Performance Evaluation of the Presented BP
The prediction performance of the presented BP network for ground surface roughness is evaluated from two aspects: firstly, analyzing the influence of grinding wheel wear parameters on the prediction model of ground surface roughness; secondly, comparing the prediction performance of the BP network before and after optimization.
The traditional BP network is used to predict the ground surface roughness to analyze the influence of grinding wheel wear, and the global relative error is used as the measurement index of the prediction accuracy:

δ = (1/N)·Σi |µi − xi| / µi

where δ is the global relative error, N is the number of testing samples, and µi and xi are the measured and predicted values of the surface roughness of the ith sample in the testing samples.
Comparing the prediction performance of the BP network before and after optimization, Equation (14) is used as the measurement index of the prediction accuracy, and the standard deviation is used as the measurement index of the stability of the prediction model:

s = √( Σi (δi − δ̄)² / (N − 1) )

where s is the sample standard deviation, δi is the relative error of the predicted value of the ith sample in the testing samples, and δ̄ is the mean relative error.
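The two indices can be computed as follows; the roughness values below are hypothetical numbers for illustration only, and the normalization of each error by the measured value is an assumption consistent with the term "relative error":

```python
import numpy as np

def global_relative_error(measured, predicted):
    """delta = (1/N) * sum(|mu_i - x_i| / mu_i) over the testing samples."""
    measured = np.asarray(measured, float)
    predicted = np.asarray(predicted, float)
    return float(np.mean(np.abs(measured - predicted) / measured))

def prediction_std(measured, predicted):
    """Sample standard deviation s of the per-sample relative errors."""
    measured = np.asarray(measured, float)
    predicted = np.asarray(predicted, float)
    rel = np.abs(measured - predicted) / measured
    return float(np.std(rel, ddof=1))      # ddof=1 gives the (N - 1) form

# Hypothetical measured/predicted Ra values for illustration.
Ra_true = [0.32, 0.41, 0.28, 0.35]
Ra_pred = [0.30, 0.43, 0.27, 0.36]
delta = global_relative_error(Ra_true, Ra_pred)
s = prediction_std(Ra_true, Ra_pred)
```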

Influence of Grinding Wheel Wear Features
The traditional BP network was used to predict the ground surface roughness to verify the influence of grinding wheel wear parameters. The basic process parameters in Table 3 were selected as the input parameters of the neural network Traditional BP1 (Tra BP1), and all the parameters in Table 3 were selected as the input parameters of the neural network Traditional BP2 (Tra BP2). The numbers of input layer nodes of Traditional BP1 and Traditional BP2 are 5 and 9, respectively, whilst the other parameters are set to be the same. The numbers of nodes in the hidden layer and output layer are 5 and 1, respectively; the default learning rate is 0.024; and the three types of iteration steps K are set to 2000, 10,000, and 15,000.
Both models were trained with the training samples and then tested 20 times with the testing samples for each of the three types of iteration steps. The relative error was used to characterize the prediction accuracy; the results are shown in Table 4, whilst the experimental and predicted results appear in Appendix A at the end of the paper. After the grinding wheel wear features were introduced, the best prediction accuracy of the surface roughness Ra improved from 0.062, 0.062, and 0.061 to 0.045, 0.035, and 0.031, respectively. The average prediction accuracy also improved, from 0.071, 0.065, and 0.066 to 0.050, 0.053, and 0.054, respectively. The BP prediction model is essentially an optimal nonlinear function that maps the input parameters to the output parameters. The results show that the wear features of the grinding wheel enhance the correlation between the input parameters and the ground surface roughness and promote the development of the BP model during iterative training, i.e., the update of the weights and biases of each layer. Consistent with the physical model, the increase in the dimension of the input parameters improves the mapping ability of the model (the best prediction accuracy of the BP network).
The specific data of the relative prediction error of Traditional BP1 and Traditional BP2 are shown in Figure 6. Compared with K = 10,000 and K = 15,000, the prediction accuracy of Traditional BP1 fluctuates more when K = 2000. Because the number of iterations is too small, the training level of Traditional BP1 is low, and a good mapping relationship between the input parameters and the surface roughness Ra cannot be established. In contrast, the prediction accuracy of Traditional BP2 tends to be stable and is significantly higher than that of Traditional BP1, because the grinding wheel wear features enhance the mapping relationship between input and output, so high prediction accuracy is achieved with fewer iteration steps.

As the number of iteration steps increases, the prediction accuracy of Traditional BP1 gradually stabilizes around 0.065, which indicates that the model has reached a saturated training level. However, the fluctuation of the prediction accuracy of Traditional BP2 becomes larger, which is caused by overfitting: after the model training is saturated, continuing to iterate reduces the generalization ability and leads to overfitting. At the same time, the increase of the input parameter dimension and the randomness of the initial weights make the model more likely to fall into a local optimal solution. This is a typical defect of the traditional BPNN, and it is the problem that the next section focuses on solving.
In summary, the selected grinding wheel wear features enhance the correlation between the input parameters and the ground surface roughness and improve the mapping ability and prediction performance of the model. However, they also make the model fall into a local optimal solution more easily.

Comparison of Prediction Models before and after Optimization
Section 4.1 proves the effect of grinding wheel wear, but the local optimal solution and over-convergence of the BP model become more serious due to the increase of the input dimension. On this basis, the iterative termination conditions of the prediction model, the weight update rules, and the learning rate should be improved. To compare the prediction performance of the BP network before and after optimization, all the parameters in Table 3 are selected as the input parameters of the traditional BP network, Traditional BP (Tra BP), and the presented BP network, Presented BP (Pre BP). The numbers of input layer, hidden layer, and output layer nodes of both are 9, 5, and 1, respectively; the default learning rate is 0.024; the three types of iteration steps K are set to 2000, 10,000, and 15,000; and the initial weights are the same and generated randomly. The training and prediction process of the two models is the same as in Section 4.1; the results are shown in Figure 7, whilst the experimental and predicted results appear in Appendix B at the end of the paper.


Influence of "Identify Factors"
The error functions in the training of the Traditional BP and the Presented BP were analyzed, as shown in Figure 8. In the early iterations, the error functions of both decrease rapidly. In the later iterations, the error function curve of Traditional BP decreases very slowly, while that of Presented BP changes suddenly and then decreases rapidly. Under the same number of iteration steps, the error function value eP of Presented BP is smaller than the error function value eT of Traditional BP, and the final training accuracy of Presented BP is higher. This is because the two models deal with the local optimal solution problem differently during training. After falling into a local optimal solution, Traditional BP takes no corrective action and continues to iterate until the end of training. Presented BP judges the network state by the "identify factors". When "identify factors" = 1, the network has fallen into a local optimal solution, and the model begins to dynamically adjust the learning rate and selectively update the weights of each layer, avoiding the local optimal solution and continuing to search for the global optimal solution. Hence, the "identify factors" in this paper can effectively determine whether the BP network has fallen into a local optimal solution, which remedies a key defect of the BP algorithm, and Presented BP can keep searching for the global optimal solution throughout the training process.
Figure 9 shows the change in learning rate during the training of Presented BP. During training, the learning rate µ of Presented BP changes dynamically at specific iteration steps, decaying from 0.024 to 0.012 and eventually to 0.008 at the end. According to the principle of the optimized algorithm, in the early iterations, the learning rate µ is kept at the default value, so that the model converges quickly to a local optimal solution.
In the later iterations, after the "identify factors" judge that the network has fallen into a local optimal solution, the learning rate µ is dynamically attenuated according to Equation (11), which reduces the magnitude of the weight updates and improves the accuracy of the target weights. After the network escapes the local optimal solution, the learning rate µ is reset to the default value to adapt to the next early-iteration phase. The dynamic learning rate not only ensures the training efficiency of the prediction model, but also improves the accuracy of the target weights.
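The identify/decay/reset cycle described above can be sketched as follows. This is illustrative only: the stall window, tolerance, and decay factor are assumptions, and the paper's Equation (11) defines the actual attenuation rule, which is not reproduced here. Only the default rate 0.024 and the final rate 0.008 come from the text.

```python
# Illustrative sketch of the "identify factors", dynamic learning-rate,
# and "memory factor" logic described above. The stall window, tolerance,
# and decay factor are assumptions; the paper's Equation (11) defines the
# actual attenuation rule and is not reproduced here.
DEFAULT_LR = 0.024   # default learning rate from the text
MIN_LR = 0.008       # final learning rate reported in the text
DECAY = 0.5          # assumed attenuation factor per adjustment
STALL_TOL = 1e-6     # assumed relative-change threshold for stagnation

def identify_factor(errors, window=50):
    """Return 1 if the error curve has stagnated (local optimum), else 0."""
    if len(errors) < window:
        return 0
    old, new = errors[-window], errors[-1]
    return 1 if abs(old - new) / max(abs(old), 1e-12) < STALL_TOL else 0

def update_learning_rate(lr, errors):
    """Attenuate lr while stalled; restore the default once training moves."""
    if identify_factor(errors) == 1:
        return max(MIN_LR, lr * DECAY)   # dynamic attenuation (cf. Eq. (11))
    return DEFAULT_LR                    # escaped: reset to the default rate

def remember_best(best, error, weights):
    """Memory factor: keep the lowest-error weights seen so far."""
    if best is None or error < best[0]:
        return (error, [list(w) for w in weights])
    return best
```

Calling `update_learning_rate` once per iteration with the running error history reproduces the pattern in Figure 9: the rate stays at the default while the error falls, halves on stagnation, and resets after an escape.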

Comparison of Prediction Accuracy
Under three types of iteration steps, the prediction accuracy of Traditional BP and Presented BP is compared, as shown in Figure 10. Under the iteration steps of K = 2000, K = 10,000, and K = 15,000, the best prediction accuracy of Traditional BP is 0.0466, 0.0344, and 0.0306, while that of Presented BP is 0.0452, 0.0324, and 0.0295, respectively. The difference between the best prediction accuracies of the two models is small because, under the same number of iteration steps, the dimension and feature quantity of the input parameters are the main factors affecting the best prediction accuracy of the BP model. Nevertheless, the accuracy of Presented BP is slightly higher. In the early iterations of network training, neither model has fallen into a local optimal value, and both are iterated and updated in the same way. When both fall into the same local optimal solution, Traditional BP is limited by its own algorithm and takes no action until the end of training. Presented BP, however, judges the network state through the "identify factors", dynamically adjusts the learning rate µ (which reduces the weight update range and improves the solution accuracy of the target weights), and selectively updates the weights (which lets the model escape the local optimal solution and search for the global optimal solution until the end of training). Presented BP can continuously search for the global optimal value, so its best prediction accuracy is improved.
With the increase of the number of iteration steps, the average prediction accuracy of Traditional BP worsens (0.050, 0.054, and 0.055) while that of Presented BP improves (0.049, 0.042, and 0.039). During the training process, after falling into a local optimal solution, Traditional BP continues to iterate and overfits, which reduces the generalization ability and prediction accuracy of the model. In Presented BP, by contrast, there is no over-convergence phenomenon. As the number of iteration steps increases, Presented BP continues to escape local optimal solutions and search for the global optimal solution, so the average prediction accuracy improves.
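The best, average, and standard-deviation figures used in this comparison are summary statistics over repeated training runs. A small sketch of that computation, with placeholder error lists standing in for the paper's actual run data:

```python
import statistics

# Sketch of how the accuracy/stability comparison above is computed from
# repeated training runs. The error lists are hypothetical placeholders,
# not the paper's actual data.
def summarize(errors):
    """Best, average, and standard deviation of per-run prediction errors."""
    return {
        "best": min(errors),
        "average": statistics.mean(errors),
        "stdev": statistics.pstdev(errors),
    }

tra_bp_errors = [0.048, 0.052, 0.050]  # hypothetical runs, Traditional BP
pre_bp_errors = [0.040, 0.043, 0.041]  # hypothetical runs, Presented BP

for name, errs in [("Tra BP", tra_bp_errors), ("Pre BP", pre_bp_errors)]:
    s = summarize(errs)
    print(f"{name}: best={s['best']:.4f} avg={s['average']:.4f} std={s['stdev']:.4f}")
```

Population standard deviation (`pstdev`) is used here; whether the paper uses the population or sample form is not stated.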

Comparison of Predictive Stability
Under three types of iteration steps, the prediction stability of Traditional BP and Presented BP is compared, as shown in Figure 11. Under the iteration steps of K = 2000, K = 10,000, and K = 15,000, the prediction standard deviations of Traditional BP and Presented BP are 0.0017, 0.0166, 0.0175 and 0.0017, 0.0079, 0.0076, respectively. As the number of iteration steps increases, the prediction stability of Presented BP becomes significantly better, because its algorithm structure is more reasonable: the dynamic learning rate and the improved weight update rules keep the model from falling into local optimal solutions and keep the prediction stable. When the input parameters are unchanged, the prediction performance of Traditional BP mainly depends on the fixed learning rate and the initial weights, so its stability is extremely poor. As the number of iteration steps increases, the over-convergence phenomenon of Traditional BP becomes more serious, and the prediction fluctuation grows. In summary, as the number of iteration steps increases, Presented BP continuously escapes local optimal solutions and the dependence of the model on the initial weights decreases, so the stability of the prediction accuracy improves.


Conclusions
Based on the physical model of ground surface roughness, this paper proposes an adaptive BP network prediction model considering grinding wheel wear. However, the increase in input parameter dimension makes the local optimal solution problem of the BP network more serious. In this situation, improving the iterative termination conditions of the prediction model, dynamically decaying the learning rate, and adjusting the weight update rules help to solve the problem of local optimal solutions. Comparing the prediction performance of the presented BP network and the traditional BP network, the results show the following: (1) The features of the force signal selected in this paper contain enough grinding wheel state information, which enhances the correlation between the input parameters and the ground surface roughness Ra and improves the prediction performance of the model. (2) The "identify factors" effectively judge whether the BP network has fallen into a local optimal solution and reduce the influence of human factors, while the "memory factors" update and store the best weights in real time during network training. In summary, the presented BP prediction model of ground surface roughness considering grinding wheel wear not only greatly improves prediction accuracy, but also significantly enhances prediction stability.
Author Contributions: Funding acquisition, P.Z.; supervision, Y.Y. and P.Z.; validation, X.L., Y.P. and Y.W.; writing-original draft preparation, X.L.; writing-review and editing, X.L., Y.P., Y.Y., Y.W. and P.Z. All authors have read and agreed to the published version of the manuscript.
Data Availability Statement: All data generated or analyzed during this study are included in the present article.