Adaptive PID Control and Its Application Based on a Double-Layer BP Neural Network

Abstract: In this paper, to address the inconvenience of variable value PID control based on manual parameter adjustment for the hydraulic drive unit (HDU) of a legged robot, a method employing double-layer back propagation (BP) neural networks for learning the law of PID control parameters is proposed. The first layer is used to learn the relationship between different control parameters and the control performance of the system under various working conditions. The second layer is used to learn the relationship between the working condition parameters and the optimized control parameters under various working conditions. The effectiveness of the proposed control method was verified by simulation and experiment. The results showed that the proposed method can provide a theoretical and experimental basis for the selection of control parameters and can be extended to similar controllers, and it therefore has engineering application value.


Introduction
Robots can move in a variety of ways. At present, the movement forms can be roughly divided into wheeled [1], tracked [2], wheel-foot compound [3], snake-like [4], bionic legged [5], and so on. Compared with other types of robots, bionic-legged robots have the characteristic of discontinuous support because their leg structure is similar to that of tetrapods. When combined with hydraulic drive, which has a high power-to-weight ratio, such a robot not only adapts well to unknown and unstructured environments but can also cross obstacles. Therefore, this type of robot is particularly suitable for use in complex environments in the wild.
The leg controller serves as the bottom-level controller of this kind of robot, and each leg of the robot has several degrees of freedom controlled by highly integrated valve cylinders, also known as the hydraulic drive unit (HDU) [6,7]. While the HDU serves as the bottom-level controller of each leg, its control performance directly affects the control strategy and performance of the robot. Commonly, HDU bottom-level control methods can be divided into position control and force control. Based on bottom-level control, control methods of the leg can be extended to compliance control, contact force control, and so on. The above methods are not only applied in electrically driven robots such as Scara [8] and Stewart [9], but they can also be applied to robots such as Bigdog [10], Hydraulic quadrupedal (HyQ) [11], Light Weight Robot (LWR) [12], and Atlas [13].
This paper mainly studies the performance of the HDU in position control. The position control system of the HDU is a high-order nonlinear system. Designing a superior control method requires a very detailed understanding of the characteristics of the controlled system. Establishing a mathematical model involves analysis of the controlled system, and an accurate mathematical model can faithfully reflect the dynamic characteristics of the system, reproduce the actual system in simulation research, and shorten the design cycle of the control method. High-performance intelligent control methods suitable for low-order nonlinear systems can also be applied to the HDU. However, in order to ensure the control stability and reliability of the whole machine, such control methods are not often used in engineering practice. Traditional control methods are simple to implement, and their effect is obvious. Furthermore, the change in the control parameters directly reflects the system characteristics, which can be used to conduct a preliminary analysis of the system performance. Thus, the HDU position control system is still based on traditional PID control.
A neural network is a computational model that comprehensively simulates the neural network of the human brain in terms of structure, mechanism, and function [14][15][16]. By virtue of its complex nonlinear network structure and efficient iterative learning performance, it has obvious advantages compared with other nonlinear optimization methods. Some research works have shown that neural networks can fit arbitrary nonlinear functions. Swic presented an original machine learning-based automated approach for controlling the machining of low-rigidity shafts using artificial intelligence methods, in which three models of hybrid controllers based on different types of neural networks and genetic algorithms were developed [17]. Rego dealt with the problem of finding the control Lyapunov function that keeps the system stable and proposed the use of reinforcement learning with two neural networks based on Lyapunov stability theory [18]. Nobahari focused on developing a nonlinear controller based on convolutional neural networks to control different plants, assuming that prior knowledge of the plants is very limited and that only a history of sensory input-output data of the plants is available [19]. Wang studied the hysteresis nonlinear characteristics of piezoelectric actuators and proposed a novel hybrid modeling method based on long short-term memory (LSTM) and nonlinear autoregressive with external input (NARX) neural networks [20].
In this paper, a neural network is used to learn the relationship between the parameters and the control performance under different working conditions and to find the optimal control parameters under the current working conditions, which can improve the control accuracy of the system under various working conditions and eliminate the manual adjustment of parameters. Compared with variable value PID based on manual parameter adjustment, the method based on neural networks can output parameters that vary continuously with the working conditions, thereby improving the accuracy of control. In addition, the neural-network-based method is not restricted to the specific, finite number of conditions listed in an expert table. Extending the applicable scope of the expert table in this way is of great significance for engineering applications.
The structure and the contribution of this paper are as follows. In Section 2, a mathematical model is established for the HDU position control system. In the model, many factors are carefully considered, such as servo valve nonlinearity, flow-pressure nonlinearity, and load characteristics. In Section 3, to address the inconvenience of variable value PID based on manual parameter adjustment in engineering practice, a method of employing double-layer back propagation (BP) neural networks for learning the law of PID control parameters is proposed, and the simulation results are shown; this is the main contribution of the paper. In Section 4, experimental research is carried out on the HDU performance test platform.

Introduction to the Sampling System
The HDU is a highly integrated system that includes a servo valve-controlled cylinder, which is the joint actuator of the legged robot. Figure 1 shows the quadruped robot prototype, the single-leg hydraulic drive system, and the HDU. The parameter definitions and simulation values of the above system are shown in Table 1. The purpose of this paper is to present a new PID controller whose control parameters are supplied by neural networks, in place of the PID control parameters in Figure 2.

Learning Strategy Design
Neurons are the basic units of neural networks, and their main function is to simulate the functional characteristics of biological neurons [21][22][23]. Considering that the input of the neural network in this paper comes from the sensor data of the control system, the Tanh activation function, which belongs to the family of Sigmoid (S-shaped) activation functions, was selected as the activation function of the neurons.
In order to make the system automatically output the optimal control parameters according to the working conditions, it is necessary to design an appropriate neural network structure first. If the neural network is too simple, the fitting accuracy will be reduced; if the neural network is too complex, the convergence will be slow, and the generalization ability of the neural network may even be reduced. Therefore, it is very important to design a neural network with an appropriate structure. Learning strategies that enable the neural network to learn effectively are then needed, including the learning objects of the neural network, the selection of samples, the initial processing of samples, and the iterative learning methods. In this section, a parameter learner based on a double-layer BP neural network is designed, which can realize automatic parameter learning. The overall learning strategy is shown in Figure 3, and the details are explained in the following sections.

Generation of Learning Samples
The sample is a very important part of neural network learning problems and is the source of effective information for learning. The sample data in this paper were obtained from simulation of the HDU position control system or collected from it experimentally. The data contained random interference generated by the system itself, and the range of each variable was also different, so it was necessary to process the data before they were used for learning. The sample data used in this section had to meet the following conditions:
(1) The samples should cover as wide a range of working conditions and control parameters as possible, and the performance indexes under the corresponding working conditions should be obtained through experiment or simulation, so that the neural network can learn the characteristics of the control system and improve the adaptive ability of the control method.
(2) The samples should be universal. The hydraulic system is a highly nonlinear time-varying system, and the dynamic characteristics of the system change with the external conditions. Data should be collected after the hydraulic system has been started up and has run stably under good heat dissipation conditions.
(3) The data interval of each variable in the sample should be as consistent as possible, which is beneficial for improving the convergence speed and stability of neural networks.
According to the above conditions and principles, a plan for collecting learning data for the PID position control system of the HDU was designed in this section. By generating the input signals and the change signals of the control parameters and then importing them into the control model, automatic data acquisition was realized.
In order to prove the effectiveness of the proposed learning strategy, part of the overall working conditions of the HDU were selected for verification to reduce unnecessary work, and the control parameter range was then simplified based on the simulation results of the PID control system shown in Section 2. The working conditions and control parameters finally determined in this section are shown in Table 2. The final working conditions are generated by the permutation and combination of sinusoidal frequency, sinusoidal amplitude, and P gain in Table 2: there are eight groups of sinusoidal frequency, 15 groups of P gain, and 10 groups of sinusoidal amplitude, giving 8 × 15 × 10 = 1200 working conditions in total. In order to avoid mutual influence between two adjacent working conditions, each working condition runs for two cycles, with an overall sampling time of approximately 1632 s. Moreover, the mean of the absolute value of the control deviation over the last cycle is taken as the basis for evaluating the control performance.
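The permutation and combination described above can be sketched as a Cartesian product. This is a minimal illustration only: the numeric ranges below are hypothetical placeholders (the actual values are those of Table 2), and the helper names `build_conditions` and `total_sampling_time` are introduced here for illustration.

```python
import itertools
import numpy as np

# Hypothetical ranges for illustration only; the real values are given in Table 2.
freqs_hz = np.linspace(0.5, 4.0, 8)    # 8 groups of sinusoidal frequency
amps_mm = np.linspace(1.0, 10.0, 10)   # 10 groups of sinusoidal amplitude
p_gains = np.linspace(1.0, 15.0, 15)   # 15 groups of P gain


def build_conditions(freqs, amps, gains):
    """Return every (frequency, amplitude, P gain) combination."""
    return list(itertools.product(freqs, amps, gains))


def total_sampling_time(conditions, cycles=2):
    """Total run time in seconds when each condition runs for `cycles` periods."""
    return sum(cycles / f for f, _, _ in conditions)


conditions = build_conditions(freqs_hz, amps_mm, p_gains)
print(len(conditions))                  # number of working conditions
print(total_sampling_time(conditions))  # depends on the assumed frequencies
```

The count depends only on the number of groups per variable, whereas the total sampling time depends on the actual frequencies, since each condition lasts two periods of its own sinusoid.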
The desired input signals in the simulation are shown in Figure 4. Because of the long sampling time, the sinusoidal curves at different frequencies appear relatively dense in the figure.
The working condition parameters are the sinusoidal frequency and amplitude of the input signal, the control parameter is the P gain of the PID control method, and the performance index of the system is the mean of the absolute value of the control deviation. It can be seen from Table 2 that these three variables differ in size by an order of magnitude, which is not beneficial to the learning of the neural network. Therefore, the three variables should be appropriately transformed so that their intervals lie roughly between 0 and 1. Accordingly, "data after processing" in the following sections refers to the normalized data. The P gain of the controller in the simulation is shown in Figure 5.
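One common way to bring variables of different magnitude into roughly the interval from 0 to 1 is min-max scaling. The sketch below assumes that transformation (the paper does not state which one was used), and the function names and sample values are hypothetical.

```python
import numpy as np


def minmax_fit(data):
    """Column-wise minimum and maximum of the raw samples."""
    data = np.asarray(data, dtype=float)
    return data.min(axis=0), data.max(axis=0)


def minmax_apply(data, lo, hi):
    """Scale each column to [0, 1] using the fitted bounds."""
    return (np.asarray(data, dtype=float) - lo) / (hi - lo)


def minmax_invert(scaled, lo, hi):
    """Map scaled values back to physical units."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo


# Made-up rows of (frequency in Hz, amplitude in mm, P gain)
raw = np.array([[0.5, 1.0, 2.0],
                [1.0, 3.0, 16.0],
                [2.0, 5.0, 30.0]])
lo, hi = minmax_fit(raw)
scaled = minmax_apply(raw, lo, hi)
```

Keeping `lo` and `hi` from the training samples allows the network output to be mapped back to a physical P gain with `minmax_invert`.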

Performance Fitting of Control System
In Section 3.2, the mean of control deviation e under different working conditions and control parameters was obtained through simulation. In this section, neural network 1 is used to fit the relationship among the working condition parameters, the control parameters, and the mean of control deviation e. Neural network 1 can then be used to calculate the mean of control deviation e for different control parameters under each working condition. The parameters giving the minimum mean of control deviation e under each working condition are selected, which completes the optimization of the control parameters.
(1) Input and output of the neural network. Neural network 1 was designed. The input of the neural network is a three-dimensional vector, which represents the sinusoidal frequency of the input signal, the sinusoidal amplitude of the input signal, and the P gain, respectively, and the output is the mean of control deviation e for the corresponding set of parameters.
(2) Selection of the loss function. The loss function is the index used to evaluate the fitting effect of the model, and the goal of neural network learning is to make the loss function as small as possible. Because the input and output variables of the neural network are continuous values, the mean square error function is adopted. In its standard form, it is expressed as follows:
$$E = \frac{1}{N}\sum_{k=1}^{N}\left(y_k - \hat{y}_k\right)^2$$
where $N$ is the number of samples, $y_k$ is the sample value of the output for the $k$-th sample, and $\hat{y}_k$ is the corresponding output of the neural network.
(3) Determination of the neural network structural parameters. The total number of neural network layers is three: the input layer, one hidden layer, and the output layer. The number of neurons in the hidden layer is 13, and the activation function is Sigmoid. The overall structure of neural network 1 is shown in Figure 6. The sinusoidal input signals and control parameters are shown in Table 2; the output of the neural network is the mean of control deviation e between the input signals and output signals of the HDU position control system. The input of neural network 1 after data processing is shown in Figure 7, and the output after data processing is shown in Figure 8.
The processed data are fed into the neural network for learning until the gradient is less than $10^{-6}$ or the mean square deviation is less than $10^{-4}$.
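The structure and stopping rule of neural network 1 can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the training code used in the paper: plain batch gradient descent, a Tanh hidden layer, a linear output neuron, the learning rate, and the function name `train_bp` are all choices made here for the sketch.

```python
import numpy as np


def train_bp(X, y, hidden=13, lr=0.05, max_epochs=5000,
             grad_tol=1e-6, mse_tol=1e-4, seed=0):
    """Train an input / tanh-hidden / linear-output network with an MSE loss."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n, n_in = X.shape
    W1 = rng.normal(0.0, 0.5, (n_in, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, 1))
    b2 = np.zeros(1)
    mse = float("inf")
    for _ in range(max_epochs):
        # Forward pass
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - y
        mse = float(np.mean(err ** 2))
        # Backward pass (gradients of the mean square error)
        d_out = 2.0 * err / n
        g_W2 = H.T @ d_out
        g_b2 = d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * (1.0 - H ** 2)
        g_W1 = X.T @ d_hid
        g_b1 = d_hid.sum(axis=0)
        grad_norm = np.sqrt(sum(float(np.sum(g ** 2))
                                for g in (g_W1, g_b1, g_W2, g_b2)))
        # Stop when the error or the gradient is small enough
        if mse < mse_tol or grad_norm < grad_tol:
            break
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def predict(Xq):
        Xq = np.asarray(Xq, dtype=float)
        return (np.tanh(Xq @ W1 + b1) @ W2 + b2).ravel()

    return predict, mse
```

Here `X` would hold the normalized (frequency, amplitude, P gain) rows and `y` the normalized mean of control deviation e. The returned `mse` is the value computed at the last forward pass, so it shows whether the error threshold was actually reached within `max_epochs`.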

Optimization of the Control Parameters
The sinusoidal frequency and amplitude of the input signals are fixed for a specific working condition. Taking the control parameters as the independent variable, the mapping established by neural network 1 as the function, and the mean of control deviation e as the dependent variable, the relationship between the control performance and the control parameters can be obtained for this working condition. This relationship follows a clear trend, so the control parameters with better control performance can be read from the resulting curves. The optimal control parameters under each working condition are selected according to certain rules, and neural network 2 is used to learn the relationship between the working condition parameters and the selected control parameters. After learning, neural network 2 is used to adaptively change the control parameters according to the working conditions, so as to realize adaptive control. The specific learning model was designed as follows:
(1) Selection of the neural network input and output. The purpose of neural network 2 is to calculate the control parameters that meet the rules under different working conditions. Therefore, the inputs of the neural network are the sinusoidal frequency and amplitude of the input signals, which are generated through permutation and combination of sinusoidal frequencies of 0.4~2 Hz and sinusoidal amplitudes of 1~5 mm, at intervals of 0.01 Hz and 0.05 mm, respectively. The outputs of the neural network are the selected control parameters, which are used in the control model of Figure 2 in place of the PID control parameters. The overall structure of neural network 2 is shown in Figure 9.
(2) Rules of parameter selection. The control parameters with the minimum mean of control deviation e are selected to form the output sample of the neural network.
(3) Training of neural network 2. Neural network 2 consists of three layers, with a single hidden layer of 10 neurons. The activation function is Sigmoid, and the loss function is the mean square error.
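The selection rule in item (2) amounts to an argmin over candidate gains for each working condition. The sketch below shows how the training set for neural network 2 could be assembled from any deviation predictor. The function `select_optimal_gains` and the stand-in predictor `toy_predict` are hypothetical, and in practice the inputs would be normalized before being passed to neural network 1.

```python
import numpy as np


def select_optimal_gains(predict_dev, freqs, amps, gain_candidates):
    """For each (frequency, amplitude) pair, keep the P gain whose predicted
    mean control deviation is smallest."""
    gains = np.asarray(gain_candidates, dtype=float)
    inputs, targets = [], []
    for f in freqs:
        for a in amps:
            X = np.column_stack([np.full_like(gains, f),
                                 np.full_like(gains, a),
                                 gains])
            dev = predict_dev(X)
            inputs.append((f, a))
            targets.append(gains[np.argmin(dev)])
    return np.array(inputs), np.array(targets)


def toy_predict(X):
    """Stand-in for neural network 1: deviation is smallest at Kp = 2 + f."""
    f, a, kp = X[:, 0], X[:, 1], X[:, 2]
    return (kp - (2.0 + f)) ** 2 + 0.01 * a


nn2_inputs, nn2_targets = select_optimal_gains(
    toy_predict, freqs=[0.5, 1.0], amps=[1.0, 2.0],
    gain_candidates=np.arange(1.0, 6.0, 0.5))
```

The pairs `nn2_inputs` and `nn2_targets` then play the role of the input and desired output samples of neural network 2.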

Simulation
After training, neural network 2 was applied to the HDU position control system. The updated schematic diagram of the HDU position control system is shown in Figure 10.
When the working condition parameters changed, neural network 2 automatically adjusted the control parameters according to the working condition to realize adaptive control. Based on the MATLAB/Simulink model of the system established in Section 2, a MATLAB function module was introduced in this section to perform the neural network 2 calculation, and its results were output to the PID control model.
In the simulation, the initial position of the hydraulic cylinder piston was 25 mm, the P gain was the output of neural network 2, the I gain was 2, and the D gain was 0. The simulation working conditions are shown in Table 3.
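The way a scheduled P gain enters the control loop can be illustrated with a minimal discrete PID loop. This is only a sketch: the first-order plant below is a toy stand-in and not the HDU model of Section 2, `gain_fn` is a hypothetical placeholder for the trained neural network 2, and the time step is an assumption. Only the fixed I gain of 2 and D gain of 0 are taken from the text.

```python
def simulate_scheduled_pid(reference, gain_fn, condition,
                           ki=2.0, kd=0.0, dt=0.001, plant_pole=1.0):
    """Run a PID loop on a toy plant dx/dt = -plant_pole * x + u.

    The P gain is obtained once from gain_fn(frequency, amplitude), which
    stands in for neural network 2; the I and D gains stay fixed.
    """
    kp = gain_fn(*condition)
    x = 0.0
    integral = 0.0
    prev_err = reference[0] - x
    errors = []
    for r in reference:
        err = r - x
        integral += err * dt
        deriv = (err - prev_err) / dt
        u = kp * err + ki * integral + kd * deriv
        x += dt * (-plant_pole * x + u)   # explicit Euler step of the toy plant
        prev_err = err
        errors.append(err)
    return errors


# Hypothetical constant scheduler and a 10 s unit-step reference
errors = simulate_scheduled_pid([1.0] * 10000, lambda f, a: 5.0, (1.0, 2.0))
```

Replacing the lambda with a function that evaluates a trained network on the current frequency and amplitude gives the adaptive behaviour described above, with the rest of the loop unchanged.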
The ideal control deviation (reference signal) was 0 which means that there is no control deviation between the input and the output. The comparison curves with constant and variable value PID are shown in Figure 11 (adaptive PID control based on a neural network is neural network PID for short, control deviation e is deviation e for short).
The control deviation of the adaptive PID control system based on the neural network (the blue curves in Figure 11) is shown in Table 4 (the maximal relative deviation is the ratio of the maximum deviation to the sinusoidal amplitude).
According to the simulation results, under the three working conditions, the maximum relative deviation of the adaptive PID method based on a neural network decreased by an average of 31.3% compared with that of the constant value PID and increased by 7.87% compared with that of the variable value PID. The deviation of the adaptive method was thus greatly reduced compared with the constant value PID and approached the effect of the manually adjusted PID control parameters, maintaining good control performance under multiple working conditions. Due to space limitations, additional simulation results are not included in this paper.
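The two indices used in this section, the mean absolute deviation over the last cycle (Section 3.2) and the maximal relative deviation, can be computed from a sampled deviation signal as sketched below. The function names and the assumption of a uniformly sampled signal are introduced here for illustration.

```python
import numpy as np


def mean_abs_deviation_last_cycle(errors, freq_hz, sample_rate_hz):
    """Mean of |e| over the final period of the sinusoidal reference."""
    n_cycle = int(round(sample_rate_hz / freq_hz))
    last = np.asarray(errors, dtype=float)[-n_cycle:]
    return float(np.mean(np.abs(last)))


def max_relative_deviation(errors, amplitude):
    """Ratio of the maximum absolute deviation to the sinusoidal amplitude."""
    return float(np.max(np.abs(errors)) / amplitude)
```

For a deviation signal sampled at 1 kHz under a 1 Hz reference, for example, the first function averages the last 1000 samples.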

Introduction to the Experimental System
The experiments in this study were carried out on the performance test platform of the HDU. The platform is mainly composed of two HDUs, which are installed at the top. The HDU on the left adopts position closed-loop control, while the HDU on the right adopts force closed-loop control. In the experiment, the HDU on the left carried out the performance test of the relevant control algorithm, and the HDU on the right carried out zero-force servo control. In each experiment, the working conditions of the left and right HDUs were the same. A photo of the experimental platform is shown in Figure 12a. After the control algorithm was built in MATLAB/Simulink, automatic code generation was used to produce target C code that could be recognized by the controller. Compared with manual C coding, combining MATLAB/Simulink with the code generator makes it possible to design and test control algorithms quickly, avoids the complexity of writing the underlying C code, and speeds up the controller implementation stage. In the experiment, the data sampling frequency was 1 kHz. Figure 13 is the schematic diagram of the experimental signal input and data acquisition.

Collection of Learning Samples
As a joint actuator of robots, the HDU is the key to determining the motion performance of robots. According to the movement of the robot during trotting, pacing, and other gaits, the proposed sampling range of experimental learning samples is shown in Table 5.
The final working conditions were obtained by permutation and combination of the values in the table, giving a total of 324 groups of working conditions, and each group ran for three cycles. In order to avoid mutual influence between adjacent conditions, the mean of control deviation over the last two cycles of each working condition was taken as the performance index. The generated system input signal sequence is shown in Figures 14 and 15, and the signal acquisition interface is shown in Figure 16.

Optimization of the Control Parameters
The samples obtained in Section 4.2 were used to learn the relationship among the working condition parameters, the control parameters, and the control performance, and the neural network structure and data processing methods used were the same as those in Section 3.3. The training performance of the neural network is shown in Figure 17. It can be seen that after the completion of neural network learning, the mean square error reached the order of magnitude of $10^{-4}$, so the network estimated the control performance well and laid a foundation for the subsequent calculation of control parameters.
The control performance index of the HDU was set as follows: the maximum of control deviation e should not exceed 5% of the sinusoidal amplitude. Based on the obtained neural network, the system performance corresponding to different working conditions and control parameters was calculated, and the control parameters that meet the control performance requirement were selected. The working condition parameters were taken as the input of neural network 2, and the selected control parameters were taken as the desired output of neural network 2. The sinusoidal frequency of the input signal was 0.5-2 Hz and the amplitude was 5-15 mm, and the input signals were generated by permutation and combination at intervals of 0.01 Hz and 0.05 mm, respectively.
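A threshold-based selection of this kind can be sketched as follows. The text does not say which of the admissible gains is chosen, so the tie-breaking rule used here (the smallest gain that meets the 5% limit, with a fallback to the best available gain) is an assumption, and the function name `pick_gain_meeting_spec` is hypothetical.

```python
import numpy as np


def pick_gain_meeting_spec(gains, predicted_max_dev, amplitude, limit=0.05):
    """Return a P gain whose predicted maximum deviation is within
    `limit` times the sinusoidal amplitude.

    Assumed rule: take the smallest admissible gain; if no candidate meets
    the limit, fall back to the gain with the smallest relative deviation.
    """
    gains = np.asarray(gains, dtype=float)
    rel = np.asarray(predicted_max_dev, dtype=float) / amplitude
    ok = np.flatnonzero(rel <= limit)
    if ok.size:
        return float(gains[ok].min())
    return float(gains[np.argmin(rel)])
```

A rule of this type accepts any gain that satisfies the specification rather than the one with the lowest deviation, so the resulting deviation under a given working condition is not necessarily the smallest achievable.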
The neural network structure and data processing methods used were the same as those used in Section 3.4. The learning performance of neural network 2 is shown in Figure 18. It can be seen that the neural network converged rapidly, and the mean square error reached the order of magnitude of $10^{-1}$ after learning, which meets the accuracy requirement for control parameter adjustment.

Experiment of Adaptive PID Control Based on a Neural Network
In order to verify the performance of the adaptive PID control based on a neural network, an experiment was carried out on the performance test platform of the HDU under the working conditions shown in Table 3, and the control performance of the system under different working conditions was tested.
The initial position of the piston of the HDU was 25 mm, and the oil source pressure of the system was 5 MPa. The working conditions were input into the adaptive PID control system based on the neural network, and a deviation curve was obtained, which was compared with the deviation curves of the PID control with constant and variable values, as shown in Figure 19. The control deviation of the adaptive PID method based on the neural network (the blue curves in Figure 19) is shown in Table 6.
As shown in Figure 19 and Table 6, because of the parameter selection rules, the control deviation of the adaptive PID method was slightly larger than that of the constant value PID under working condition 1, but it was greatly reduced relative to the constant value PID under the other two working conditions. The maximum relative deviation over the three working conditions was reduced by 22.13% on average compared with that of the constant value PID method, which is close to the deviation level of the variable value PID method. On the whole, the control accuracy of the adaptive PID method based on a neural network lay between that of the constant value PID method and that of the variable value PID method: it was slightly worse than the variable value PID method but better than the constant value PID method, and the method adapts well and maintains good control accuracy under various working conditions. With the method proposed in this paper, more parameter information corresponding to working conditions can be learned, and the same research idea can be extended to other control systems with similar structures. Moreover, building on this double-layer BP neural network, other machine learning methods such as deep deterministic policy gradient (DDPG) could be investigated.

Conclusions
In this paper, an adaptive PID control method using a double-layer BP neural network was designed. Neural network 1 is used to fit the relationship among the working condition parameters, the control parameters, and the control performance. Neural network 2 is used to fit the relationship between the working condition parameters and the selected control parameters, and to realize the adaptive adjustment of the PID control parameters according to the working condition parameters. The results showed that the designed method can automatically adjust the control parameters within the learning range and for working conditions near it, that it has a certain adaptability, and that it basically achieved the desired control precision. Compared with the constant value PID method, the maximum relative deviation was reduced by 31.3% on average in simulation and by 22.13% in the experiment, and the performance was close to that of the variable value PID. Because it avoids the repeated manual adjustment of parameters required by the variable value PID, the method has practical value in engineering.

Appendix A.1
Whether the electro-hydraulic servo valve can output the corresponding flow and pressure under the control of electrical analog signals is the core of the electro-hydraulic servo control system, and the accuracy of the servo valve model has a great influence on the overall modeling accuracy. The general modeling method for the electro-hydraulic servo valve is linearization at a specific working point (usually the zero position of the servo valve). However, this method cannot accurately reconstruct the characteristics of the servo valve over its entire working range.
In order to improve the accuracy of the model, the nonlinear factors of pressure and flow for the electro-hydraulic servo valve are considered in this paper, and the flow equations of the electro-hydraulic servo valve were obtained as follows:
The inlet oil flow of the servo valve is:
The return oil flow of the servo valve is:
The equivalent flow coefficient $K_d$ is expressed as:
For the convenience of expression and research, let:
According to Equations (A1), (A2), (A4), and (A5), the flow equations of the servo valve can be written.
The inlet oil flow of the servo valve is:
The return oil flow of the servo valve is:
The response of the servo valve is often much faster than that of the hydraulic power components. In order to simplify the analysis and design of the dynamic characteristics of the system, the electro-hydraulic servo valve is treated as a second-order oscillation link, and the transfer function between the spool position and the input voltage of the servo valve is given in Equation (A8).
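For reference, a second-order oscillation link has the generic textbook form below. The symbols are assumed notation introduced here, not necessarily those of Equation (A8): $X_v(s)$ is the spool position, $U(s)$ the input voltage, $K_{sv}$ the servo valve gain, $\omega_{sv}$ the natural frequency, and $\zeta_{sv}$ the damping ratio.

$$\frac{X_v(s)}{U(s)} = \frac{K_{sv}\,\omega_{sv}^{2}}{s^{2} + 2\zeta_{sv}\,\omega_{sv}\,s + \omega_{sv}^{2}}$$

At frequencies well below $\omega_{sv}$ this expression reduces to the static gain $K_{sv}$, which is consistent with the remark that the valve responds much faster than the hydraulic power components.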
The hydraulic cylinder is a component of the hydraulic actuator, the final carrier of the output power of the hydraulic system, and the control object of the electro-hydraulic servo valve. Its dynamic characteristics largely determine the performance of the system. The following assumptions are made: the diameter of the connecting pipe between the servo valve and the hydraulic cylinder is large enough that the pressure loss, the influence of fluid mass, and the pipeline dynamic characteristics can all be ignored; the pressure is equal everywhere within each working cavity of the hydraulic cylinder; the oil bulk modulus and the oil temperature are constant; and the internal and external leakage of the hydraulic cylinder are laminar flow. Under these assumptions, the flow equations of the two working cavities of the asymmetric hydraulic cylinder can be obtained.
The rodless cavity flow of the asymmetric hydraulic cylinder and the volume from the servo valve to the rodless cavity are:
The rod cavity flow of the asymmetric hydraulic cylinder and the volume from the servo valve to the rod cavity are:
The inlet and return oil cavities of the HDU are all set inside the servo cylinder body. Considering the difference in the initial position of the piston in the servo cylinder, the initial volumes of the rodless and rod cavities can be obtained as:
Because the Coulomb friction force of the hydraulic cylinder is very small relative to the load force, it is included in the load force and not considered separately. According to Newton's second law, the dynamic equilibrium equation on the piston is:
The transfer function between the feedback voltage of the position sensor and the position of the piston rod in the servo cylinder is:
A block diagram of the position closed-loop control system of the HDU can be obtained by combining Equations (A1)-(A13), which is shown in Figure 2 in Section 2.

Appendix A.2. Neuron Model
Let the input of a neuron be the n-dimensional vector $u = [u_1; u_2; \cdots; u_n]$. The neuron assigns a weight to each input element, and the weighted sum, called the net input and denoted here by $z$, is:
$$z = \sum_{i=1}^{n} w_i u_i + b = w^{\mathrm{T}} u + b$$
where $w = [w_1; w_2; \cdots; w_n]$ is the n-dimensional weight vector and $b$ is the offset value. In the human brain, different input signals cause the neurons to produce different electrical signals. Artificial neurons use a nonlinear function to simulate this behavior and finally obtain the output value of the neuron $x$:
$$x = f(z)$$
where $f(\cdot)$ is referred to as an activation function. The introduction of the activation function improves the expressive and learning ability of neural networks. Differentiable activation functions allow numerical optimization methods to be used to update the network parameters, and appropriately defined activation functions can limit the range of the input and output of the neural network, keeping the overall calculation domain within a reasonable range and thereby improving the stability of learning.
Sigmoid activation functions are S-shaped on the whole, close to linear near 0 and tending to saturate at both ends [24,25]. The commonly used Sigmoid activation functions can be divided into logistic activation functions and Tanh activation functions.
Logistic activation functions are expressed as:
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
where $z$ is the argument of the function (the net input of the neuron). It can be seen that the standard logistic activation function maps data from the real line to the interval (0, 1). After a suitable transformation, the input can cover the whole range of the sensor data in the control system, and the output can be limited to a certain effective interval; the function is also continuously differentiable.
Tanh activation functions are expressed as:
$$\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}$$
The standard Tanh activation function maps data from the real line to the interval (−1, 1), which can be used to bound the control output value in the system.
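The neuron model and the two Sigmoid-type activation functions described in this appendix can be written compactly as follows. This is a minimal sketch in NumPy; the function names are chosen here for illustration.

```python
import numpy as np


def logistic(z):
    """Standard logistic function, mapping the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))


def tanh(z):
    """Hyperbolic tangent, mapping the real line to (-1, 1)."""
    return np.tanh(np.asarray(z, dtype=float))


def neuron_output(u, w, b, f=tanh):
    """Single neuron: weighted sum of the inputs plus offset, then activation."""
    return f(np.dot(w, u) + b)


z = np.linspace(-10.0, 10.0, 201)
# The two functions differ only by scaling and shifting:
# tanh(z) = 2 * logistic(2 z) - 1
same_shape = np.allclose(tanh(z), 2.0 * logistic(2.0 * z) - 1.0)
```

Because the two functions are related by this scaling and shift, the choice between them mainly determines whether the neuron output is centered on 0.5 or on 0.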