The Use of Neural Networks and Genetic Algorithms to Control Low Rigidity Shafts Machining

The article presents an original machine-learning-based automated approach for controlling the machining of low-rigidity shafts using artificial intelligence methods. Three models of hybrid controllers based on different types of neural networks and genetic algorithms were developed. In this study, the objective function optimized by a genetic algorithm was replaced with a neural network trained on real-life data. The task of the genetic algorithm is to select the values of the neural network's input parameters that minimize deviation. Both the input vector values and the neural network's output values are real numbers, which means that the problem under consideration is a regression problem. The performance of three types of neural networks was analyzed: a classic multilayer perceptron network, a nonlinear autoregressive network with exogenous input (NARX), and a deep recurrent long short-term memory (LSTM) network. Algorithmic machine learning methods were used to achieve a high level of automation of the control process. By training the network on data from real measurements, we were able to control the reliability of the turning process, taking into account many factors that are usually overlooked during mathematical modelling. Positive results of the experiments confirm the effectiveness of the proposed method for controlling low-rigidity shaft turning.


Introduction
About half of all parts used in different types of machinery and mechanical devices are rotating parts. They include gears, cylinders, bushings, discs, and hubs. A large share of those rotating parts (approximately 40%) are various types of shafts, of which about 12% are low-rigidity shafts. There is no single accepted definition of low-rigidity shafts; however, this category is commonly taken to include shafts with a length-to-diameter ratio of no less than 10. This means that these parts have irregular, strongly elongated shapes. Low-rigidity shafts are used in the electromechanical, tool-making, automotive, and aerospace industries, as well as in precision mechanics and many other areas of application.
Rotating parts, including shafts, are most commonly machined by turning. During this type of machining operation, the workpiece rotates at a certain angular velocity, which promotes vibration. The lower the contact stiffness of the workpiece, the greater its susceptibility to vibrations, which means that low-rigidity shafts are particularly liable to chatter. The vibrations that occur during the machining of shafts reduce the reliability of the turning process and adversely affect the dimensional accuracy, waviness, and roughness of turned surfaces. Turning accuracy is commonly measured as the deviation y expressed in millimeters. If we assume that deviation is a function of n arguments

The Application of Artificial Intelligence in the Machining of Parts
The literature provides numerous examples of the application of artificial intelligence methods in controlling production processes. Most of them are solutions implemented in numerically controlled machine tools. Yu, Kabaldin, and Shatagin, for example, describe the use of artificial neural networks for the classification of a point cloud in a 3D model of a workpiece, which allows automatic analysis of the shape of the workpiece machined in the working space of a CNC machine tool [8]. In turn, Moreira et al. applied fuzzy logic and neural networks to develop a controller for adjusting milling parameters in real time [9]. Mironova describes an intelligent system based on functional semantic network technologies, developed to ensure the accuracy of machining with point tools [10]. She found that the axial misalignment of openings machined with high-speed steel drills depended on the tool feed and its rotational frequency. Sharma, Chawla, and Ram describe machine learning algorithms, namely support vector machine (SVM), restricted Boltzmann machine (RBM), and deep belief network (DBN), for the automatic programming of a computer numerical control machine [11]. Yusup et al. estimated optimal abrasive waterjet machining control parameters using an artificial bee colony algorithm [12]. Fang, Pai, and Edwards developed a model for predicting the roughness of machined surfaces. They used multilayer perceptron (MLP) neural networks to process multidimensional signals generated during metal machining operations, including three-dimensional cutting force signals and three-dimensional cutting vibration signals [13]. Naresh, Bose, and Rao report the results of a comparative study of artificial neural network (ANN) and adaptive neuro-fuzzy inference system models for the improved prediction of wire electro-discharge machining responses, such as the material removal rate and surface roughness of a Nitinol alloy [14].
These examples indicate that machine learning methods can be used as effective predictive tools in controlling machining processes. Optimization methods can also be successfully used as components of production process control, as evidenced by various publications [6,15-19].

Innovative Aspects of the Proposed Approach
The level of automation of CNC machine tools can be increased by applying dynamic methods of machining control. Yusuf Altintas presents many examples of open-loop and closed-loop robust nonlinear control systems in his book [20]. Modern control of such systems is based on adaptive methods such as feed drive control, sliding mode control with disturbance estimation, or intelligent machining modules. With intelligent machining, tasks such as adaptive control, tool condition monitoring, and process control can be performed. Given the diversity of tasks carried out simultaneously in a modern automated CNC machining process, machining dynamics analysis methods can serve as supplementary tools for controlling the turning of low-rigidity shafts. Combining different control methods makes it possible to achieve a synergy effect. However, such hybrid solutions require additional research in specific areas of CNC machining. Chatter can be detected by continuously monitoring the amplitude of the sound spectrum, which can be measured with a microphone. Vibrations can be reduced by adjusting the cutting speed; however, this reduces the efficiency of the machining process.
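The spectrum-monitoring idea mentioned above can be illustrated with a minimal sketch. It assumes that the microphone signal is already available as a list of samples; the function names, the monitored frequency band, the threshold, and the synthetic test signal are hypothetical choices made only for illustration and are not taken from the cited works.

```python
import cmath
import math


def amplitude_spectrum(samples, sample_rate):
    """Return (frequencies, amplitudes) of a real signal using a naive DFT."""
    n = len(samples)
    freqs, amps = [], []
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        acc = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                  for i, s in enumerate(samples))
        freqs.append(k * sample_rate / n)
        # Scaling so that a pure sine of amplitude A gives a peak of about A.
        amps.append(2.0 * abs(acc) / n)
    return freqs, amps


def chatter_detected(samples, sample_rate, band=(100.0, 400.0), threshold=0.5):
    """Flag chatter when the peak amplitude inside the band exceeds the threshold."""
    freqs, amps = amplitude_spectrum(samples, sample_rate)
    peak = max((a for f, a in zip(freqs, amps) if band[0] <= f <= band[1]),
               default=0.0)
    return peak > threshold


def synthetic_signal(chatter_amplitude, sample_rate=1000, n=200):
    """Hypothetical signal: 50 Hz background plus an optional 250 Hz tone."""
    return [0.2 * math.sin(2 * math.pi * 50 * i / sample_rate)
            + chatter_amplitude * math.sin(2 * math.pi * 250 * i / sample_rate)
            for i in range(n)]


print(chatter_detected(synthetic_signal(1.0), 1000))
print(chatter_detected(synthetic_signal(0.0), 1000))
```

In a real system the naive DFT would normally be replaced by an FFT over a sliding window, and the band and threshold would have to be identified experimentally for a given machine and workpiece.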
Wang et al. [21] described an adaptive intelligent control system based on constant cutting force and a smart cutting tool. The smart cutting tool developed by the authors provided cutting force measurement data and had a plug-and-produce feature, which resulted in a simple, compact, and low-cost sensing tool configuration. The authors state that adaptive smart machining based on smart cutting tools and the associated smart algorithms minimizes the machining time and improves surface roughness. The results presented in their paper support this claim.
The goal of this article is to present an original method for the adaptive control of turning low-rigidity shafts based on artificial intelligence and machine learning methods. A predictive controller algorithm was developed in which neural networks and genetic algorithms (GA) were implemented. The neural network generates the value of deviation y on the basis of two input variables: F_x1, the moving tensile force, and e, the eccentricity of the tensile force. In mathematical terms, the neural network, treated as a black box, plays the role of the optimized fitness function given by the general Formula (1).
In the next step, the objective function (1) is optimized using a GA. The GA minimizes the deviation y by appropriately selecting force F_x1 and eccentricity e. The novelty of the presented concept lies in the proper training of the neural network, which, once it acquires the ability to generalize, can effectively convert input data into deviation. Because the neural network was trained on real-life measurement data, it takes into account the impact of many interfering factors which, unlike in mathematical modelling, do not have to be known explicitly: all the necessary information is contained in the learning data set. The use of a neural network as an objective function for a GA is also a novel idea. By replacing the classic objective function, which is given by a detailed mathematical formula, with a neural network, we solved the problem of disturbances and difficult-to-define factors affecting the turning process. The level of process automation was also increased.
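The idea of letting a GA minimize the output of a trained network can be sketched as follows. This is a minimal illustration under stated assumptions, not the controller used in the study: surrogate_deviation is a hypothetical stand-in for the trained network, and the bounds, population size, selection scheme, and mutation settings are arbitrary illustrative choices.

```python
import random


def surrogate_deviation(force, ecc):
    """Hypothetical stand-in for the trained network y = f(F_x1, e)."""
    return (force - 1.2) ** 2 + 0.5 * (ecc - 0.4) ** 2


def clip(value, low, high):
    return max(low, min(high, value))


def genetic_minimize(objective, bounds, pop_size=40, generations=60,
                     mutation_sigma=0.1, seed=0):
    """Real-coded GA: elitism, tournament selection, blend crossover, Gaussian mutation."""
    rng = random.Random(seed)
    population = [[rng.uniform(lo, hi) for lo, hi in bounds]
                  for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population, key=lambda ind: objective(*ind))
        next_gen = [scored[0][:], scored[1][:]]  # keep the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the best of three random individuals.
            p1 = min(rng.sample(scored, 3), key=lambda ind: objective(*ind))
            p2 = min(rng.sample(scored, 3), key=lambda ind: objective(*ind))
            child = []
            for (lo, hi), g1, g2 in zip(bounds, p1, p2):
                w = rng.random()
                gene = w * g1 + (1 - w) * g2 + rng.gauss(0.0, mutation_sigma)
                child.append(clip(gene, lo, hi))
            next_gen.append(child)
        population = next_gen
    best = min(population, key=lambda ind: objective(*ind))
    return best, objective(*best)


# Hypothetical search ranges for F_x1 and e.
bounds = [(0.0, 2.0), (0.0, 1.0)]
best_inputs, best_deviation = genetic_minimize(surrogate_deviation, bounds)
print(best_inputs, best_deviation)
```

Because the GA only calls the objective as a black box, the stand-in function could be replaced by the prediction routine of any trained regression model without changing the optimizer.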
The article comprises four sections. Section 1 presents the theoretical aspects of turning low-rigidity shafts and a review of the relevant literature. Section 2 describes the key aspects of modelling the machining process and the application of algorithmic methods. Section 3 reports the results obtained using the algorithmic methods developed in this study and compares them to verify the effectiveness of the network methods used. The article concludes with Section 4, which contains observations and reflections made during the experiments, analyses, and modelling.
Figure 1 shows how the low-rigidity shaft was secured in the turning machine. The line defining the elastic limit of the workpiece is marked in red. Numerous theoretical methods exist for controlling the accuracy of machining elastically deformable shafts [22]. The accuracy of machining low-rigidity shafts can be effectively improved by increasing their stiffness via an oriented change in the elastic-deformable state [23]. This type of process control can be exerted by applying a tensile force to the workpiece, which, combined with the cutting force, produces longitudinal-transverse loads. Additionally, one can control the rotation angle of the cross-section of the workpiece at the holding point by applying a tensile force displaced relative to the axis of the lathe centers [24]. This work-holding solution can be depicted as a movable rotary support (Figure 1). The present experiments were carried out using this method of controlling the machining of low-rigidity shafts. Formulas (2)-(4) specify the loading conditions of the shaft shown in Figure 1.

Materials and Methods
where d is the shaft diameter and e is the eccentricity.
Figure 1. Work-holding method for securing the low-rigidity shaft specimen in the turning machine. Notation: F_be, bending force exerted by the cutting tool bit; F_x, tensile force along the x axis; x_2, y_1, y_2, current coordinates at each section of the workpiece; a, distance from the spindle to the tip of the cutting tool bit; L, length of the shaft; M_0, Q_0, initial parameters (moment and transverse force at the holding point, respectively); M_1, moment generated by the axial component of the cutting force; M_2, moment generated at the holding point at which the part is secured to the tailstock of the turning machine.
where F_f is the axial component of the cutting force.
Because the kinematic functions describing shaft turning do not take into account all the factors involved in the process, they are not sufficient to ensure an optimal level of control. In the present model, the turning process was optimized by minimizing the deviation function (where deviation is a measure of the quality of turning), which had a direct impact on the quality of the surface of machined shafts. Figure 2 shows photographic images of the test stand used in the experiments. The spindle of the turning machine shown in the images is equipped with a rotary collet vise, which makes it possible to stretch the shaft secured in it. In addition, the position of the tailstock can be adjusted to change the angle of rotation of the shaft cross-section at the holding point by applying a tensile force displaced relative to the axis of the lathe centers. In this setup, the application of one controllable force factor (eccentric stretching) produces two force factors in any pre-defined section of the specimen (particularly in the machining zone): longitudinal force F_x1 and bending moment M_2 = F_x1·e, which counteracts the cutting forces.
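As a short worked illustration of the relation M_2 = F_x1·e, assume a tensile force of 2 kN and an eccentricity of 1 mm. Both values are hypothetical and chosen only to show the order of magnitude; they are not measured settings from the experiments.

```latex
M_2 = F_{x1} \cdot e = 2000\,\mathrm{N} \times 0.001\,\mathrm{m} = 2\,\mathrm{N\,m}
```

For a fixed tensile force, the counteracting moment therefore scales linearly with eccentricity, which is why a single eccentric stretching input yields two adjustable force factors.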

Data Preparation
Figure 2. (a) Roughness measuring instrument; tailstock collet assembly for machining elastic-deformable shafts: (b) idle position, tensile force of 2 kN; (c) view of the test stand with the shaft secured in the lathe (Ø6, L = 300 mm); (d) specimens.
Data for neural network training were collected during test stand experiments. A mechanical system was developed in which the process of machining a low-rigidity shaft was controlled using two control inputs: tensile force F_x1 and eccentricity e. An optimization problem described by the objective function (5) was formulated. It was assumed that all parameters of the objective function (5), except for F_x1 and e, remained constant during turning. A shaft with a length of L = 300 mm was subjected to turning. Figures 3-6 show the curves of the objective function y, tensile force F_x1, and eccentricity e for predefined shaft diameters and cutting forces. Data collected during the tests were modelled using a two-dimensional gradient descent search algorithm. The section of the shaft from 1 to 300 mm was sampled at 5981 measuring points spaced 0.05 mm apart. F_x10 is the initial value of force F_x1.
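The number of cases follows directly from the sampling of the shaft section: a 0.05 mm step between 1 mm and 300 mm, with both end points included, gives 5981 positions. The short sketch below reproduces this count; the function name and its defaults are assumptions made for illustration.

```python
def measuring_positions(start_mm=1.0, end_mm=300.0, step_mm=0.05):
    """Return measuring positions along the shaft, end points included."""
    count = round((end_mm - start_mm) / step_mm) + 1
    # Positions are built from integer indices to avoid accumulating float error.
    return [round(start_mm + i * step_mm, 2) for i in range(count)]


positions = measuring_positions()
print(len(positions))  # 5981
```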

The data described above were used to train three types of neural networks: a shallow MLP ANN, a shallow nonlinear autoregressive network with exogenous input (NARX) designed for the prediction of multidimensional time series and signals, and a long short-term memory (LSTM) neural network, which is a recurrent deep learning network. Algorithm 1 shows the workflow of the neuro-genetic controller.

Algorithm 1. Workflow of the neuro-genetic controller.
1. Determine the initial conditions by performing a turning operation to produce one low-rigidity shaft: bending force F_be; cutting force component F_f; initial distance a_0 from the cutting edge to the point at which the workpiece is secured in the spindle; initial axial force F_x10; initial eccentricity e_0.
2. Record measurements of F_x1, e, and deviation y every 0.05 mm while changing the values of parameters F_x1 and e within the pre-defined range.
3. Use the measurement data to train an ANN to predict y = f(F_x1, e).
4. Minimize deviation y, the output of the neural network, which serves as the objective function for the GA:
$$y = \min_{F_{x1},\, e} \varphi(d, L, F_f, v, a_p, f, a, x, F_{x1}, e)$$

Shallow Neural Network
In the first variant of the experiment, an MLP ANN was developed (Figure 7). This network has three inputs (a, F_x1, and e), one hidden layer containing 10 neurons, and one output layer with a single output y representing deviation, which is a measure of the roughness of the machined shaft surface. A hyperbolic tangent sigmoid transfer function was used in the hidden layer, and a linear transfer function was used in the output layer.

Two measures of the quality of the trained network were used: mean square error (MSE) and regression R. Formula (6) gives the method of calculating MSE:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y_i^{*}\right)^{2} \qquad (6)$$
where n is the number of cases in a given set, y_i is the reference value for the i-th shaft section, and y_i^* is the predicted value for the i-th shaft section.
Sensors 2020, 20, 4683
The regression coefficient R is calculated according to Formula (7), where σ_y is the standard deviation of the reference values and σ_y* is the standard deviation of the predicted values.
In Table 1, the data set with 5981 cases is divided into three subsets: a training subset, a validation subset, and a test subset in a ratio of 70:15:15. Table 1 also shows the MSE and R values determined for variant (I) data, which are visualized in Figure 3. The lower the MSE and the higher the R, the better the generalization capacity of the trained network. Figure 8 shows a plot of MSE over training epochs. The learning curve (Figure 8a) has a regular hyperbolic shape, which indicates the high quality of the trained network. The high degree of overlap between the curves for the training, validation, and test subsets also confirms that the network has a high ability to generalize predictions and that there is no overfitting.
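The two quality measures and the 70:15:15 split can be sketched as follows. The sketch assumes that R is the Pearson correlation coefficient between reference and predicted values, which is consistent with the standard deviations named above but is not confirmed by the surviving text; the function names and the rounding rule used for the split are likewise assumptions.

```python
import math


def split_counts(n_cases, ratios=(0.70, 0.15, 0.15)):
    """Return (train, validation, test) sizes that sum to n_cases."""
    n_val = round(n_cases * ratios[1])
    n_test = round(n_cases * ratios[2])
    return n_cases - n_val - n_test, n_val, n_test


def mse(reference, predicted):
    """Mean square error between reference and predicted values."""
    n = len(reference)
    return sum((y - p) ** 2 for y, p in zip(reference, predicted)) / n


def pearson_r(reference, predicted):
    """Pearson correlation coefficient (assumed here to correspond to R)."""
    n = len(reference)
    mean_y = sum(reference) / n
    mean_p = sum(predicted) / n
    cov = sum((y - mean_y) * (p - mean_p) for y, p in zip(reference, predicted))
    var_y = sum((y - mean_y) ** 2 for y in reference)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_y * var_p)


print(split_counts(5981))  # (4187, 897, 897)
print(mse([1.0, 2.0], [2.0, 4.0]))  # 2.5
```

The subset sizes produced by a particular training toolbox may differ by one case from these counts, depending on how it rounds the ratios.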
To prevent overfitting of the ANN, the early stopping technique was used. The method consists in monitoring the validation error in successive epochs. If the error did not decrease for six consecutive epochs, training was terminated. Network training was also constrained by a limit on the maximum number of epochs, which in this case was 20.
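The stopping rule described above (six consecutive epochs without improvement, at most 20 epochs) can be sketched as a small function that replays a recorded sequence of validation errors. The function name and the example error sequence are hypothetical; a real training loop would compute the validation error after each epoch instead of reading it from a list.

```python
def early_stopping_epoch(validation_errors, patience=6, max_epochs=20):
    """Return (stop_epoch, best_epoch) for per-epoch validation errors."""
    best_error = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, error in enumerate(validation_errors[:max_epochs], start=1):
        if error < best_error:
            best_error, best_epoch = error, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch  # patience exhausted
    # Either the data ran out or the epoch limit was reached.
    return min(len(validation_errors), max_epochs), best_epoch


# Hypothetical run: the error improves for three epochs, then stagnates.
errors = [1.0, 0.5, 0.4] + [0.45] * 10
print(early_stopping_epoch(errors))  # (9, 3)
```

In practice the weights from the best epoch, rather than the last one, are usually restored after stopping.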
The ANN was trained using the Levenberg-Marquardt algorithm (LMA), an optimization method that interpolates between gradient descent and the Gauss-Newton method. Used within backpropagation training, it is characterized by high speed and high memory requirements. Two important quantities monitored during LMA training are the gradient and the damping parameter Mu. Figure 9b shows a graph of gradient values during ANN training. The lack of clear fluctuations and the downward trend demonstrate that the training procedure worked well. Figure 9c shows a graph of Mu values. Mu is reduced after each successful step, so the closer the search gets to the minimum of the objective function, the lower the Mu. Figure 9a shows an error histogram. The fact that the shape of the histogram resembles a normal distribution curve and that most errors are close to zero indicates that the trained network is of good quality and shows no symptoms of overfitting.

NARX Neural Network
In the second variant of the experiment, we used a NARX network with feedback connections. NARX networks are recurrent dynamic neural networks designed to predict single or multiple time series. Prediction can be made in the so-called closed-loop mode, in which the output values are passed back to the input, thus supporting the prediction. In the case at hand, the NARX network inputs included components F_x1 and e and also, in the closed-loop variant, the previously obtained value of deviation y(t − 1). The equation that defines the operation of the NARX network is given by Formula (8) [25]:
$$y(t+1) = F\left[y(t), y(t-1), \ldots, y(t-n_y), x(t+1), x(t), x(t-1), \ldots, x(t-n_x)\right] \qquad (8)$$
where F(·) is the mapping function; y(t + 1) is the output of the NARX network at moment t for moment t + 1; y(t), y(t − 1), ..., y(t − n_y) are the actual past values of the signal; x(t + 1), x(t), x(t − 1), ..., x(t − n_x) are the sequential values of the NARX input signal; n_y is the number of past output values; and n_x is the number of past input values. According to Formula (8), the next value of the output signal is regressed on previous values of the input signal and previous values of the output signal. Figure 10 shows the structure of the NARX network. The network has two inputs, which are signals with values F_x1 and e. The hidden layer contains 10 neurons, and the output layer consists of a single deviation signal y. The network was created and trained in open-loop form, as shown in Figure 10a. Open-loop (single-step) training is more efficient than closed-loop (multi-step) training. An open loop allows the network to be fed with the correct previous output values to produce the correct current outputs. After training, the network is converted to the closed-loop form required by the application. Table 2 shows the data divided into training, validation, and test subsets, as well as the results of training the open-loop NARX network. It is worth noting that although the training results are better than for the shallow ANN, they are not yet final results.
The actual quality of a NARX network is measured by calculating the MSE and R parameters after it has been transformed into a closed-loop network.
Figure 11a,b confirm the high quality of NARX training. The MSE curves for the training, validation, and test subsets are almost identical. Training was terminated after 20 epochs. As with the ANN, early stopping was used to protect the NARX network against overfitting. The LMA algorithm was used to train the NARX network. Figure 12a-c also confirms the high quality of the training process. The explanations and conclusions that can be drawn from the analysis of the data shown in those figures are analogous to those for Figure 9a-c.
Table 3 shows the results obtained after the open-loop NARX network had been converted into a closed-loop network. It can be seen that the NARX network, which predicts results one step ahead, shows excellent performance. The quality of the prediction for the whole sequence is much worse; however, this was not important in the context of the present study, because predictions were not made here for horizons longer than one step. For this reason, indicators of the quality of the closed-loop NARX for whole-sequence prediction were not taken into account when assessing the performance of the neural networks for controlling the process of turning low-rigidity shafts.
Figure 13 shows regression statistics for the closed-loop step-ahead NARX for all cases from the training, validation, and test subsets. Figure 13a shows the regression for the entire set; R is close to 1. Figure 13b shows data for 16 randomly selected cases, which makes it possible to observe deviations of predictions from the reference value (target). For this set of 16 cases, R = 0.99946, which is confirmed by the high degree of overlap of the measuring points and the fit line with the ideal prediction line Y = T.
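The difference between open-loop (one-step-ahead) and closed-loop (multi-step) prediction discussed in this section can be sketched as follows. The sketch uses a single output delay for brevity, and one_step_model is a hypothetical linear stand-in for the trained mapping F in Formula (8); the input and output values are invented for illustration and are not measurement data.

```python
def one_step_model(y_prev, x_now):
    """Hypothetical stand-in for the trained mapping F."""
    force, ecc = x_now
    return 0.5 * y_prev + 0.1 * force - 0.2 * ecc


def open_loop_predict(model, inputs, measured_y):
    """One-step-ahead prediction: each step is fed the measured previous output."""
    return [model(measured_y[t - 1], inputs[t]) for t in range(1, len(inputs))]


def closed_loop_predict(model, inputs, y0):
    """Multi-step prediction: each step is fed the model's own previous output."""
    predictions, y_prev = [], y0
    for t in range(1, len(inputs)):
        y_prev = model(y_prev, inputs[t])
        predictions.append(y_prev)
    return predictions


# Invented (F_x1, e) inputs and measured deviations.
inputs = [(1.0, 0.2), (1.1, 0.2), (1.2, 0.3), (1.3, 0.3)]
measured_y = [0.05, 0.09, 0.11, 0.12]

open_loop = open_loop_predict(one_step_model, inputs, measured_y)
closed_loop = closed_loop_predict(one_step_model, inputs, measured_y[0])
print(open_loop)
print(closed_loop)
```

The first prediction is identical in both modes; from the second step onward the closed-loop sequence depends on its own earlier predictions, which is why errors can accumulate over long horizons while one-step-ahead accuracy stays high.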

Deep Network LSTM
In the third variant of the predictive model for controlling the process of turning low-rigidity shafts, we used a deep LSTM neural network. LSTM is a recurrent network. It has a more complex structure than MLP, which endows it with special properties for learning long-term relationships between individual sequential cases. Figure 14 shows the workflow of the LSTM network [26].

Figure 14. Structure of a long short-term memory (LSTM) layer [26].
Each LSTM layer maintains two states: $h_t$, the hidden (output) state at time step t, and $c_t$, the cell state at time step t. The cell state contains information learned in previous time steps. At each time step, the layer adds information to or removes information from the cell state. These updates are controlled by gates: the forget gate (f) controls the level of cell state reset, the input gate (i) controls the level of cell state update, the cell candidate (g) supplies the new information for the update, and the output gate (o) controls how much of the cell state is passed to the hidden state.
Equation (9) describes the components of the LSTM layer at time step t:

$$
\begin{aligned}
i_t &= \sigma(W_i x_t + R_i h_{t-1} + b_i),\\
f_t &= \sigma(W_f x_t + R_f h_{t-1} + b_f),\\
g_t &= \sigma_c(W_g x_t + R_g h_{t-1} + b_g),\\
o_t &= \sigma(W_o x_t + R_o h_{t-1} + b_o),
\end{aligned}
\qquad (9)
$$

where $x_t$ is the input at time step t, W denotes the input weights, R the recurrent weights, b the biases, $\sigma$ the sigmoid gate activation function $\sigma(x) = (1 + e^{-x})^{-1}$, and $\sigma_c$ the state activation function. The hidden state at time step t is $h_t = o_t \odot \sigma_c(c_t)$, and the cell state at time step t is $c_t = f_t \odot c_{t-1} + i_t \odot g_t$, where $\odot$ denotes element-wise multiplication of vectors.
The architecture of the LSTM network used in this study is shown in Table 4. As in the NARX network, the input layer of the LSTM network consists of the variables $F_{x1}$ and e and the fed-back value of y. Accordingly, there are three activations in the input layer. The second layer is a bidirectional LSTM (BiLSTM) layer with 200 activations, which learns bidirectional long-term dependencies between the steps of a sequence. Such dependencies can be useful when the network should learn from the full time series at each step. The next layer is a fully connected layer with one activation. The last layer is the regression output for variable y.
The LSTM network was trained using the adaptive moment estimation (ADAM) optimization method with the following settings: L2 regularization factor 1 × 10^-4, initial learning rate 0.05, learning rate drop factor 0.1, learning rate drop period 10, and momentum 0.9. The learning conditions included a maximum of 5 epochs and a mini-batch size of 64. Two measures of learning quality were used: RMSE and loss. RMSE is the square root of the MSE (Equation (10)). The loss function is given by Equation (11), where m is the number of observations, n is the number of responses, $y_n$ are the reference values, and $y^*_n$ are the response values.
The learning effectiveness of the LSTM is illustrated in Figures 15 and 16. The RMSE and loss curves are similar, and both show that the learning process proceeded correctly. The initially high error and loss values decreased quickly and eventually stabilized at a constant level, at which point network training was terminated.
Table 5 presents additional parameters illustrating the learning process. Mini-batch RMSE stabilized after the first epoch, while mini-batch loss stabilized after the second epoch. The base learning rate was 0.05 throughout the entire learning process, which is consistent with the drop period of 10 exceeding the maximum of 5 epochs.
Table 5. Layers of the LSTM after feature extraction. Column headers: Epoch, Iteration, Mini-Batch RMSE, Mini-Batch Loss, Base Learning Rate.
Table 6 presents the results of training the LSTM network. Step-ahead prediction was very effective, although the MSE parameter was slightly lower than in the case of NARX. Whole sequence prediction was much less effective. This, however, was of no consequence for the present experiments, because this type of prediction was not used in controlling the accuracy of the machining of shafts.
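To make the gate mechanism of the LSTM layer described in this section more concrete, the following minimal sketch computes a single time step of one unidirectional LSTM layer in plain Python. It is an illustration only, not the network used in the study (which is a BiLSTM layer with 200 activations). The function names, the dictionary layout of W, R, and b, the zero-valued parameters, and the choice of tanh as the state activation function are all assumptions made for this example.

```python
import math

def sigmoid(x):
    # Gate activation: sigma(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    # Plain matrix-vector product for lists of lists
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lstm_step(x_t, h_prev, c_prev, W, R, b, state_act=math.tanh):
    """One LSTM time step. W, R, b are dicts keyed by 'i', 'f', 'g', 'o'.

    state_act is the state activation function; tanh is an assumption here.
    """
    def pre_activation(key):
        wx = matvec(W[key], x_t)
        rh = matvec(R[key], h_prev)
        return [a + r + bias for a, r, bias in zip(wx, rh, b[key])]

    i_t = [sigmoid(z) for z in pre_activation("i")]    # input gate
    f_t = [sigmoid(z) for z in pre_activation("f")]    # forget gate
    g_t = [state_act(z) for z in pre_activation("g")]  # cell candidate
    o_t = [sigmoid(z) for z in pre_activation("o")]    # output gate

    # Cell state: keep part of the old state, add part of the candidate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # Hidden state: gated, squashed cell state
    h_t = [o * state_act(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t

# Toy example: 3 inputs (Fx1, e, previous y), 2 hidden units, all parameters zero.
n_in, n_hidden = 3, 2
W = {k: [[0.0] * n_in for _ in range(n_hidden)] for k in "ifgo"}
R = {k: [[0.0] * n_hidden for _ in range(n_hidden)] for k in "ifgo"}
b = {k: [0.0] * n_hidden for k in "ifgo"}

h_t, c_t = lstm_step([900.0, 1.0, 0.4], [0.0, 0.0], [1.0, -1.0], W, R, b)
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate to 0, so the step simply halves the previous cell state. This makes the roles of the forget and output gates easy to check by hand.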

GA-Based Controller
Genetic algorithms (GA) are based on natural evolutionary processes. In nature, individuals that are best adapted to specific conditions have the best chances of survival and reproduction. As a result, subsequent generations are better adapted than the previous ones, because they have inherited from their parents the traits best suited to their living conditions. The same idea is used in evolutionary computational algorithms. GA are able to solve optimization problems with both real and integer types of constraints. They are stochastic, population-based algorithms that search randomly by applying mutation and crossover to elements of the population. Each population consists of a set of chromosomes, and each chromosome is a vector composed of genes. Genes have binary values of 0 or 1. The computational process for a classical GA comprises six stages: encoding, evaluation, selection, crossover (reproduction), mutation, and decoding.
Encoding consists in stochastically generating the initial population. In the next step, the degree of fit of each chromosome is evaluated by calculating the value of the fitness function for that chromosome. In the present case, a neural network plays the role of the fitness function. The higher the value of the objective function for a given chromosome, the better suited it is to solve the problem described by the objective function. The evaluation parameters assigned to chromosomes determine the likelihood that a given chromosome will be carried to the next stage (mutation). Mutation is the transformation $O_m : D(P) \to D(P)$, which randomly alters the l-th component of the solution (chromosome) $X_i^t$ with a predefined probability. Figure 17 shows a general scheme of the neural-genetic controller.
A genetic minimizer was used to control the machining of low-rigidity shafts. The optimization problem can be formulated as $\min_X f(X)$, where X is the vector of input variables. Figure 18 shows the best fitness plot of the GA. It plots the best function value in each generation against the iteration number. In the present optimization, the best fitness function value was 0.35655, and the mean value was 0.356526.
Figure 18. Genetic algorithm: the best fitness plot.
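The six stages of a classical GA listed in this section (encoding, evaluation, selection, crossover, mutation, decoding) can be sketched as follows. This is a minimal illustration, not the controller used in the study: the function `surrogate` is a hypothetical smooth stand-in for the trained neural network, the parameter ranges in `BOUNDS`, the 12-bit resolution, the tournament selection, and the elitism step are all assumptions made for the example. Because the study uses a genetic minimizer, lower values are treated as better here.

```python
import random

BITS = 12                                 # genes per variable (assumed resolution)
BOUNDS = [(0.0, 10.0), (500.0, 1500.0)]   # hypothetical ranges for e and Fx1

def surrogate(e, fx1):
    # Hypothetical stand-in for the neural network that predicts deviation y
    return 0.35 + 0.01 * (e - 4.0) ** 2 + 1e-6 * (fx1 - 900.0) ** 2

def decode(chromosome):
    # Decoding: map each group of binary genes to a real value within its bounds
    values = []
    for k, (low, high) in enumerate(BOUNDS):
        bits = chromosome[k * BITS:(k + 1) * BITS]
        number = int("".join(map(str, bits)), 2)
        values.append(low + (high - low) * number / (2 ** BITS - 1))
    return values

def fitness(chromosome):
    # Evaluation: predicted deviation for the decoded parameters (lower is better)
    return surrogate(*decode(chromosome))

def tournament(population, rng, k=3):
    # Selection: best of k randomly drawn chromosomes
    return min(rng.sample(population, k), key=fitness)

def crossover(parent_a, parent_b, rng):
    # Single-point crossover
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]

def mutate(chromosome, rng, probability=0.02):
    # Mutation: flip each gene with a predefined probability
    return [1 - g if rng.random() < probability else g for g in chromosome]

def run_ga(pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    # Encoding: random initial population of binary chromosomes
    population = [[rng.randint(0, 1) for _ in range(BITS * len(BOUNDS))]
                  for _ in range(pop_size)]
    best = min(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best]                 # elitism keeps the best chromosome
        while len(children) < pop_size:
            child = crossover(tournament(population, rng),
                              tournament(population, rng), rng)
            children.append(mutate(child, rng))
        population = children
        best = min(population, key=fitness)
        history.append(fitness(best))
    return decode(best), fitness(best), history

(best_e, best_fx1), best_value, history = run_ga()
```

The list `history` plays the role of a best fitness plot: because the best chromosome is always carried over, its values never increase from one generation to the next.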

Results and Discussion
As previously mentioned, the main task of the predictive controller algorithm is to minimize the deviation function (1). The deviation y is directly correlated with the surface roughness of a machined shaft, and it is a measure of the quality of turning. The quality of the neural network prediction is expressed by means of the MSE and R indicators ( Table 7). The more accurate the prediction of the neural network and the higher the level of optimization of the fitness function performed by the genetic algorithm, the smaller the deviation y that expresses the quality of turning. In order to best assess the quality of the neural networks used, a number of cases were extracted from the training subset before training, which were later used to test the individual network variants. Table 7 shows the results of tests determining the performance of the individual types of neural networks in controlling the machining of low-rigidity shafts.
An analysis of the data given in Table 7 indicates that the best results for the tested data set were obtained using the LSTM network. It is worth noting that the differences in MSE and R between the three types of networks are negligible. When MSE and regression R are considered, even the results of the least effective network (MLP ANN) are sufficient to control the process of turning low-rigidity shafts. LSTM and NARX are all the more suitable for this purpose.
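For readers who wish to reproduce quality indicators of this kind on their own data, the sketch below computes MSE, RMSE, and the regression coefficient R for a set of reference and predicted values. It assumes that R denotes the Pearson correlation coefficient between targets and network outputs; the sample values are invented for illustration and are not taken from Table 7.

```python
import math

def mse(targets, predictions):
    # Mean squared error between reference and predicted values
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def rmse(targets, predictions):
    # Root mean squared error
    return math.sqrt(mse(targets, predictions))

def pearson_r(targets, predictions):
    # Pearson correlation coefficient (assumed meaning of R)
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    var_t = sum((t - mean_t) ** 2 for t in targets)
    var_p = sum((p - mean_p) ** 2 for p in predictions)
    return cov / math.sqrt(var_t * var_p)

# Invented sample data for illustration only
targets = [1.0, 2.0, 3.0, 4.0]
predictions = [1.1, 1.9, 3.2, 3.8]

sample_mse = mse(targets, predictions)    # approximately 0.025
sample_rmse = rmse(targets, predictions)
sample_r = pearson_r(targets, predictions)
```

A low MSE together with R close to 1 indicates that the predictions both lie near the reference values and follow their trend, which is why the two indicators are reported side by side.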

Shallow MLP Network
An MLP ANN is fundamentally different from a NARX and LSTM. It not only has a distinct structure and lacks feedback and recurrent solutions, but, above all, it does not take into account the order of occurrence of the individual measurements on the time axis. For this reason, ANNs are rarely used to predict time sequences or event sequences. This does not mean, however, that they cannot be used for those purposes.
Parametric data collected during the turning of shafts constitute a sequence. To provide sequencing information, the value of parameter a (the distance from the cutting edge to the point at which the workpiece is secured in the spindle, Figure 1) was entered into the MLP ANN input data vector. As a result, each three-component MLP ANN input vector, consisting of variables a, $F_{x1}$, and e, had a specific index (variable a). This made it possible to extract the test subset from the 5981-element data set by first randomly shuffling the cases and then assigning cases 1 to 5500 to the training set and cases 5501 to 5981 to the test set. The test subset obtained in this way contained 481 cases. This means that the ANN was trained on a set that was unordered but indexed by the value of a. Owing to this, training produced very good results. Figure 19 shows the differences between predicted values and reference values of deviation y. Figure 20 shows a detail of Figure 19 to better visualize the deviations between the prediction line and the reference line. The 481 measurements were plotted on the horizontal axis in such a way that this axis corresponded to the total length of the machined shaft, L = 300 mm.
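The split described above can be sketched in a few lines. The example below is only an illustration under stated assumptions: the data set is replaced by placeholder tuples (a, Fx1, e, y) with invented values, and the function name `shuffle_split` and the fixed random seed are hypothetical. Only the sizes (5981 cases, 5500 for training, 481 for testing) come from the text.

```python
import random

def shuffle_split(cases, n_train=5500, seed=0):
    # Shuffle a copy of the cases, then cut it into training and test subsets
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# Placeholder data: 5981 cases (a, Fx1, e, y); only a carries meaningful values here
cases = [(300.0 * k / 5980, 0.0, 0.0, 0.0) for k in range(5981)]

train_set, test_set = shuffle_split(cases)

# Because a is part of every case, the test cases can be put back in order
# along the shaft length before plotting predictions against references.
test_sorted = sorted(test_set, key=lambda case: case[0])
```

Keeping a in each case is what allows the shuffled test subset to be re-ordered along the shaft, so the 481 test points can still be drawn over the full length L = 300 mm.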

NARX Neural Network
In the case of NARX, it was impossible to apply the test set extraction method used for the ANN, because the NARX input vector contained no index. To preserve the order of the sequence, NARX was designed to include feedback connections, in which the previous value of deviation, $y_{t-1}$, was provided as a third input vector component, alongside $F_{x1}$ and e. Figure 21 shows the deviation of predicted values from reference values for the NARX network. As in the case of the ANN, 481 measurements were plotted on the horizontal axis so that this axis corresponded to the total length of the machined shaft, L = 300 mm. Figure 22 shows the machining quality prediction (a detail of Figure 21) to better visualize the deviations between the prediction line and the reference line.
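The role of the fed-back deviation can be illustrated with a short sketch that contrasts step-ahead prediction with whole-sequence prediction. The interpretation used here is an assumption: in step-ahead mode each prediction receives the measured previous deviation, whereas in whole-sequence mode each prediction receives the previous prediction. The function `toy_model` is a hypothetical stand-in for a trained network, and the sample values are invented.

```python
def toy_model(fx1, e, y_prev):
    # Hypothetical stand-in for a trained network with inputs (Fx1, e, y_{t-1})
    return 0.9 * y_prev + 0.0001 * fx1 + 0.01 * e

def step_ahead(model, fx1, e, y_measured):
    # Each prediction uses the measured deviation from the previous step
    return [model(fx1[t], e[t], y_measured[t - 1]) for t in range(1, len(fx1))]

def whole_sequence(model, fx1, e, y_start):
    # Each prediction feeds back the previous prediction, so errors can accumulate
    predictions = []
    y_prev = y_start
    for t in range(1, len(fx1)):
        y_prev = model(fx1[t], e[t], y_prev)
        predictions.append(y_prev)
    return predictions

# Invented sample sequence for illustration only
fx1 = [900.0, 910.0, 920.0, 930.0]
e = [1.0, 1.0, 1.0, 1.0]
y_measured = [0.50, 0.40, 0.45, 0.42]

one_step = step_ahead(toy_model, fx1, e, y_measured)
full_run = whole_sequence(toy_model, fx1, e, y_measured[0])
```

The two modes agree on the first prediction and may diverge afterwards, since only the step-ahead mode is corrected by a new measurement at every step. This is one plausible reading of why whole-sequence prediction is reported as less accurate than step-ahead prediction.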


Deep LSTM Network
The data for training the LSTM were prepared in the same way as for NARX. The input vector was also the same for both networks. A comparison of the deviations of the LSTM network shown in Figures 23 and 24 with the deviations of the NARX network shows that their predictions are very similar. This is associated with the similar nature of the two networks, which, owing to the use of feedback $y_{t-1}$, are well suited for predicting time series and sequences and can therefore be employed for predicting various types of processes.

Figure 25 shows two cases for L = 300 mm in which the controller adjusts the parameters e and $F_{x1}$ to minimize deviation y. In the first case, optimization was performed for turning length a = 100 mm, and in the second case for a = 200 mm. As can be seen, there is a space between the two curves that makes it possible to select parameters e and $F_{x1}$ within a certain range. The complex shape of the curves implies that the relationship between the two parameters is highly complex. There are also large differences in the course of the two curves, especially in the $F_{x1}$ range from 800 N to 1050 N. Figure 26 shows a plot similar to that in Figure 25, but for a = 250 mm.

Neural-Genetic Controller
In Figures 25 and 26, there are many optimal pairs of parameters e and $F_{x1}$ for particular values of turning length a. As a consequence, it is possible to fix one of these parameters at a constant value that is not modified during machining, while the second one is adjusted. It can happen that the controller issues a command to set the value of parameter e or $F_{x1}$ beyond the permissible range. In such cases, both parameters must be changed simultaneously.
The tests showed that the quality of GA-based control of the turning process mainly depends on the effectiveness of the objective function. Therefore, it can be assumed that the most effective variant among the ones investigated in this study is the one that combines LSTM with GA.

Conclusions
This article presents an original approach to controlling the process of turning low-rigidity shafts with the use of a hybrid neural-genetic controller. It was assumed that the use of an ANN in place of a GA objective function would increase the effectiveness of control compared to other known methods. Direct comparisons with other methods of controlling the machining of this type of shafts are not possible without maintaining exactly the same material, machine and measurement conditions, and parameters. For this reason, we compared three variants of machine learning algorithms we developed especially for this study: MLP ANN, NARX, and LSTM.
The tests confirmed that properly prepared measurement data are of key importance for the quality of the controller. A prerequisite for high-quality prediction, and thus for effective optimization and ultimately control, is the use of a training set that includes measurements for the full range of turning lengths. Hence, before starting the production of a new batch of products, a pilot process of turning one reference shaft should be performed. This step is necessary for the acquisition of training data. This study shows that improper division of measurement data into training and test subsets may seriously reduce the quality of prediction and, consequently, the efficiency of control.
Because the best-performing controllers used feedback, their computational speed could become a critical practical issue. If the controller works too slowly, then either sampling has to be done less frequently or the turning process must be slowed down. In the present case of controlling the turning process, the quantity of data did not cause any efficiency problems: the neural networks were trained in a few seconds, and the results were generated in an even shorter time. One limitation of the proposed solution is that the GA, being iterative, slowed down the optimization; however, it can be replaced by other, faster optimizers. The choice involves a trade-off between the speed and the effectiveness of optimization.
A clear advantage of the presented solution is that it reveals and takes into account many hidden but important factors that affect the effectiveness of control. Real-life data contain information that, for obvious reasons of space, cannot be included in mathematical models. Although all models, both mathematical and neural, are simplified representations of real objects, neural networks can reproduce real-life processes more accurately because they are able to generalize and to take into account the large amounts of information contained in measurement data.
Author Contributions: A.Ś. gave the theoretical and substantive background for the developed solution and conceived and designed the experiments, D.W. prepared and provided the mathematical description of the method, G.K. made and experimentally verified the solution, A.G. provided technical guidance and critically reviewed this paper. All authors have read and agreed to the published version of the manuscript.

Funding:
The project/research was financed in the framework of the project Lublin University of Technology-Regional Excellence Initiative, funded by the Polish Ministry of Science and Higher Education (contract no. 030/RID/2018/19).

Conflicts of Interest:
The authors declare no conflict of interest.