Simple Learning-Based Robust Trajectory Tracking Control of a 2-DOF Helicopter System

Stabilization and tracking control of Unmanned Aircraft Systems (UASs) such as helicopters in complex environments with system uncertainties, unknown disturbances, and noise is a challenging task. To compensate for system uncertainties and unknown disturbances, this paper presents a trajectory tracking control strategy for a 2-DOF (degree of freedom) helicopter system testbed. The strategy employs a gradient descent-based simple learning control law that minimizes a cost function corresponding to the desired closed-loop error dynamics of the nonlinear system under control. A stability analysis of the closed-loop nonlinear system is also provided. The learning capability of the designed controller allows it to account for system uncertainties and unknown disturbances. Results of computer simulations and a real-time experiment using the Quanser AERO helicopter demonstrate the effectiveness of the designed control strategy.

A helicopter is one of the most flexible types of rotorcraft UASs, with horizontally spinning rotors that supply thrust and lift. This structure provides the capabilities of Vertical Takeoff and Landing (VTOL), hovering, and flying in different directions. Numerous linear and nonlinear control strategies have been designed to control helicopters. Linear control techniques for helicopters include Proportional-Integral-Derivative (PID), Linear Quadratic Regulator (LQR), and linearization-based adaptive and optimal control methods [5][6][7][8]. For example, an adaptive augmentation method based on linear quadratic techniques was designed and experimentally applied to control the Quanser 2-DOF helicopter in [5]. The authors in [6] developed an optimal LQR controller for attitude tracking control of the 2-DOF helicopter, using the Adaptive Particle Swarm Optimization (APSO) approach to obtain the Q and R matrices. In another study [7], a control framework was designed to track desired trajectories for the 2-DOF helicopter by applying an extended linearization method. Later, the authors in [8] developed a Multi-Step Q-Learning (MsQL) method for solving the optimal output regulation problem for the Quanser 2-DOF helicopter. Although these linear control techniques are relatively easy to design and implement, their operational range is limited.
Because helicopter systems are inherently nonlinear and subject to numerous sources of uncertainty and disturbance, many problems remain open in the design of control algorithms that can effectively compensate for these effects; therefore, simple linear control strategies may not meet the desired performance requirements [27]. To this end, in this paper, we focus on developing robust nonlinear control strategies for tracking control of 2-DOF helicopter systems. In the past few years, several studies have been devoted to designing and validating nonlinear adaptive and robust controllers that better compensate for the nonlinearity, uncertainties, and disturbances in 2-DOF helicopter systems. For example, a backstepping controller incorporating nested saturation feedback functions was designed for tracking control of helicopters in [14]. In a related study [28], a backstepping adaptive nonlinear controller was designed to track the yaw and pitch position references independently. The authors in [16] implemented an optimized fractional-order sliding mode controller (SMC) on the 2-DOF Quanser AERO helicopter testbed, choosing the sliding surface in a fractional-order hyperplane to reduce chattering. Other nonlinear and learning-based control strategies developed for helicopter systems can be found in [11][12][13]15,[17][18][19].
In our previous work [9], an observer-based SMC strategy was developed to achieve both asymptotic regulation to desired set points and trajectory tracking of a 2-DOF helicopter system. Moreover, Lyapunov-based analyses were provided to prove set-point stabilization, robust trajectory tracking of the helicopter, and convergence of the velocity estimates. In [10], we extended the results of [9] to achieve robust nonlinear tracking control of the 2-DOF helicopter system by replacing the sliding mode observer with a bank of dynamic filters. Numerical simulation and experimental results on the Quanser 2-DOF AERO helicopter were provided to illustrate the effectiveness of this dynamic filter-based tracking control strategy.
Although these nonlinear and learning-based methods offer advantages such as handling uncertainties and disturbances, most of them are too computationally expensive for real-time implementation. To this end, in this paper, we follow the key steps presented in [20] to design a simple learning control strategy for trajectory tracking of a 2-DOF helicopter system. Our main contributions are as follows: (i) development of a simple learning-based robust control algorithm that is computationally efficient enough for real-time implementation; and (ii) demonstration of the effectiveness of the proposed tracking control law through experimental results using the Quanser 2-DOF AERO helicopter.
The rest of this paper is organized as follows. The mathematical model of the 2-DOF helicopter system is presented in Section 2. In Section 3, the design of the gradient descent-based simple learning control strategy is developed and discussed. Section 4 illustrates the performance of the proposed method through several numerical simulations and an experimental implementation, and Section 5 presents conclusions and further extensions.

Mathematical Model
Consider the schematic representation of a 2-DOF helicopter system shown in Figure 1, where the right-handed frame B is a body-fixed frame with axes x, y, and z. The system has two identical propellers. The horizontal propeller generates a thrust force $F_p$ at distance $r_p$, which results in a pitch torque about the y-axis. The vertical propeller generates a thrust force $F_y$ at distance $r_y$, which results in a yaw torque about the z-axis. Let θ and ψ denote the pitch and yaw angles, respectively. Then, following [9], the dynamical model of the helicopter is given by Equations (1) and (2), where $m$ is the helicopter mass, $l_{cm}$ is the distance between the center of mass and the point of rotation, $(D_p, D_y)$ are viscous friction coefficients, $(K_{pp}, K_{py}, K_{yp}, K_{yy})$ are the thrust torque constants, and $J_p$ and $J_y$ are the moments of inertia (MOIs) about the pitch and yaw axes, respectively. The control inputs are the voltages $V_p$ and $V_y$ applied to the DC torque motors that drive the pitch and yaw propellers, respectively. These input voltages are restricted to [−24, 24] V.

Control Design
Consider the nonlinear dynamical system defined by (1) and (2). Define an input transformation from $(V_p, V_y)$ to $(u_1, u_2)$ so that (1) and (2) can be rewritten in terms of $u_1$ and $u_2$. Each of the two resulting equations can then be written in the same second-order form, with state vector $x = [x_1, x_2]^T \in \mathbb{R}^2$ and control input $u \in \mathbb{R}$. The modeling uncertainties and external disturbances are lumped together into a disturbance $d \in \mathbb{R}$, which is assumed to be bounded. The control objective is to design a control input $u$ such that the system tracks a given reference trajectory $r(t) = [r_1(t), r_2(t)]^T$, where $r_2 = \dot r_1$, while all the states and the input remain bounded.
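The input transformation can be illustrated with a short sketch. It assumes, as a hypothesis not stated in the text, that the transformation is linear, $u = K V$ with $K = [[K_{pp}, K_{py}], [K_{yp}, K_{yy}]]$ mapping the motor voltages to thrust torques, so that the voltages are recovered by solving a 2×2 linear system. The function name `voltages_from_inputs` and the numerical entries of `K_example` are hypothetical placeholders, not Quanser AERO parameters; only the ±24 V limit comes from the model description.

```python
import numpy as np

V_MAX = 24.0  # motor voltage limit [V], as stated for the 2-DOF helicopter


def voltages_from_inputs(u, K):
    """Recover motor voltages (V_p, V_y) from virtual inputs (u_1, u_2).

    Assumes a linear torque map u = K @ V, with the hypothetical
    K = [[K_pp, K_py], [K_yp, K_yy]] required to be invertible.
    The result is clipped to the +/-24 V actuator range.
    """
    K = np.asarray(K, dtype=float)
    u = np.asarray(u, dtype=float)
    V = np.linalg.solve(K, u)  # undo the cross-coupling between the two rotors
    return np.clip(V, -V_MAX, V_MAX)


# Hypothetical thrust-torque constants, for illustration only.
K_example = [[0.02, 0.005], [0.004, 0.03]]
V_cmd = voltages_from_inputs([0.05, -0.04], K_example)
```

Solving the linear system rather than forming an explicit inverse is the usual numerically safer choice; the clipping step mirrors the actuator saturation, which the control law itself does not model.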

Remark 1.
Although several works on UASs and similar systems, such as [29,30], do not require any knowledge of the system parameters, our work demonstrates that exploiting some information about the system dynamics can significantly reduce the computational complexity of the designed controller and make it suitable for real-time implementation. Our results could be extended by designing an approximator to estimate the system parameters, e.g., [21].
Define the tracking error variables and consider the following control input, where $\hat d$ is the estimated disturbance and $k_i > 0$, $i = 1, 2$, are control gains. The closed-loop error dynamics can then be derived as follows: $\dot e_1 = e_2$ (11). For robust control performance, the desired closed-loop error dynamics should converge to zero. Here, the gradient descent method is used to minimize the closed-loop error function (or cost function) $C(e, k^{des})$. The time-update rule for each controller gain $k_i$ follows the negative gradient of this cost, where $\alpha_i > 0$ is the learning rate for the $i$th controller gain. Similarly, the update rule for the disturbance estimate $\hat d$ follows the negative gradient of the cost with learning rate $\alpha_d$. The controller gains and disturbance estimate are updated until the cost function reaches zero, i.e., $C(e, k^{des}) = 0$.
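The gradient-descent update can be sketched in a few lines. Because the control law and update equations are not reproduced above, this sketch fills them in under explicit assumptions: the closed-loop error acceleration is taken as $\dot e_2 = -k_1 e_1 - k_2 e_2 + d - \hat d$, the desired-dynamics residual as $c = \dot e_2 + k^{des}_2 e_2 + k^{des}_1 e_1$ (the same form as the yaw expression (26) in Section 4), and the cost as $C = \tfrac12 c^2$. Under these assumptions, gradient descent gives $\dot k_i = -\alpha_i\,\partial C/\partial k_i = \alpha_i c\, e_i$ and $\dot{\hat d} = -\alpha_d\,\partial C/\partial \hat d = \alpha_d c$. All function names are hypothetical.

```python
def error_accel(e1, e2, k1, k2, d, d_hat):
    """Assumed closed-loop error acceleration: de2/dt = -k1 e1 - k2 e2 + d - d_hat."""
    return -k1 * e1 - k2 * e2 + d - d_hat


def cost(e1, e2, k1, k2, d, d_hat, k_des1, k_des2):
    """Return (C, c): c is the desired-dynamics residual and C = 0.5 * c**2."""
    c = error_accel(e1, e2, k1, k2, d, d_hat) + k_des2 * e2 + k_des1 * e1
    return 0.5 * c * c, c


def learning_rates(e1, e2, c, alpha1, alpha2, alpha_d):
    """Gradient-descent rates (dk1/dt, dk2/dt, dd_hat/dt).

    Since dc/dk1 = -e1, dc/dk2 = -e2 and dc/dd_hat = -1,
    the rates -alpha * dC/dparam equal alpha * c * (e1, e2, 1).
    """
    return alpha1 * c * e1, alpha2 * c * e2, alpha_d * c
```

In a simulation or on hardware, these rates would be integrated (for example with a forward-Euler step) alongside the plant; the parameters stop changing once $c = 0$, i.e., once the error follows the desired dynamics.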

Stability Proof
The closed-loop error dynamics can be rewritten in the form (17). We assume that the average rate of change of the disturbance $d$ is much smaller than that of the error state variables, so that taking the time derivative of (17) with $\dot d = 0$ yields ...
Substituting the expressions for $\dot k_i$ and $\dot{\hat d}$ into (18) yields ...
where $\xi = [e_1, \dot e_1, \ddot e_1]^T$ is the state, and Equation (19) is in pseudo-linear form. Following [31], we perform a stability analysis based on the Routh-Hurwitz criterion. Clearly, $a_i(\xi) > 0$, $i = 1, 2, 3$, for all ξ, and the characteristic equation corresponding to $a_i(0)$, $i = 1, 2, 3$, is given by (22). Applying the Routh-Hurwitz criterion, it can be shown that all roots of the characteristic Equation (22) have negative real parts if $a_1(0)a_2(0) > a_3(0)$. Expressed in terms of $k_i$, $k^{des}_i$, and $\alpha_d$, this inequality gives the condition under which the closed-loop error system is asymptotically stable. In what follows, we choose $k_i$, $k^{des}_i$, and $\alpha_d$ such that this stability condition is satisfied.
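The steps from (17) to the stability condition can be sketched as follows. The forms used here are assumptions chosen to be consistent with the text, not equations quoted from it: (17) is taken as $\ddot e_1 = -k_1 e_1 - k_2 \dot e_1 + d - \hat d$, the cost as $C = \tfrac12 c^2$ with $c = \ddot e_1 + k^{des}_2 \dot e_1 + k^{des}_1 e_1$, and the updates as $\dot k_i = -\alpha_i\,\partial C/\partial k_i$ and $\dot{\hat d} = -\alpha_d\,\partial C/\partial \hat d$, with the gains frozen when forming the coefficients.

```latex
\begin{aligned}
\dot k_1 &= \alpha_1 c\, e_1, \qquad \dot k_2 = \alpha_2 c\, \dot e_1, \qquad \dot{\hat d} = \alpha_d c, \\
\dddot e_1 &= -\dot k_1 e_1 - k_1 \dot e_1 - \dot k_2 \dot e_1 - k_2 \ddot e_1 - \dot{\hat d}
  \qquad (\dot d = 0) \\
&= -k_1 \dot e_1 - k_2 \ddot e_1 - \beta(\xi)\, c,
  \qquad \beta(\xi) = \alpha_d + \alpha_1 e_1^2 + \alpha_2 \dot e_1^2, \\
0 &= \dddot e_1 + a_1(\xi)\, \ddot e_1 + a_2(\xi)\, \dot e_1 + a_3(\xi)\, e_1, \\
a_1(\xi) &= k_2 + \beta(\xi), \qquad a_2(\xi) = k_1 + \beta(\xi)\, k^{des}_2, \qquad a_3(\xi) = \beta(\xi)\, k^{des}_1 .
\end{aligned}
```

At ξ = 0 we have $\beta = \alpha_d$, so under these assumptions the Routh-Hurwitz condition $a_1(0)a_2(0) > a_3(0)$ reads $(k_2 + \alpha_d)(k_1 + \alpha_d k^{des}_2) > \alpha_d k^{des}_1$, which involves exactly the quantities $k_i$, $k^{des}_i$, and $\alpha_d$ named in the text; the positivity $a_i(\xi) > 0$ follows from $k_i, k^{des}_i, \alpha_i, \alpha_d > 0$.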

Numerical Simulation and Experimental Results
This section presents the results of computer simulations and an experimental implementation of the 2-DOF helicopter control under three different scenarios. The performance of the control law proposed in Section 3 is evaluated using the system parameters of the Quanser AERO 2-DOF helicopter. Nominal and estimated system parameters are given in Tables 1 and 2, respectively. Note that the proposed controller is robust to variations in the system parameters: its performance is satisfactory regardless of whether the values in Table 1 or Table 2 are used. The performance of the proposed method is further validated by comparing it with the PID and LQR controllers available on the Quanser website.
The first two scenarios (Sections 4.1 and 4.2) are computer-based numerical simulations, and the last scenario (Section 4.3) is an experimental validation. In the first scenario, the proposed simple learning-based controller is used to track sinusoidal trajectories, while in the second scenario, it is used to track constant trajectories. In the numerical simulation scenarios, the total simulation time is 60 s. All experimental implementations are carried out on a platform with the following specifications: Windows 10 Enterprise; processor: Intel(R) Core(TM) i7-9700 CPU @ 3.00 GHz, 8 cores, 8 logical processors; RAM: 32.0 GB. The initial conditions for the simulations are selected as The expressions for the desired closed-loop pitch and yaw error dynamics are given by
$c_p(e, k^{des}) = \ddot e_p + k^{des}_{p2} \dot e_p + k^{des}_{p1} e_p$ (25)
$c_y(e, k^{des}) = \ddot e_y + k^{des}_{y2} \dot e_y + k^{des}_{y1} e_y$ (26)
where $(e_p, e_y)$ are the pitch and yaw errors, $(\dot e_p, \dot e_y)$ are the pitch-rate and yaw-rate errors, and $(\ddot e_p, \ddot e_y)$ are the pitch-acceleration and yaw-acceleration errors. The parameters $(k^{des}_{p1}, k^{des}_{y1})$ and $(k^{des}_{p2}, k^{des}_{y2})$ are the desired simple-learning gains for the pitch and yaw angles and their rates, respectively.
The following gains for the desired closed-loop error dynamics are used in both simulation cases (Sections 4.1 and 4.2): The initial controller gains and their learning rates are given by In both simulation scenarios, external torques of 0.5 N·m are applied to both the pitch and yaw axes as external disturbances. The initial disturbance estimates and disturbance learning rates are $\hat d_p(0) = 0$, $\hat d_y(0) = 0$, $\alpha_{d_p} = 1$, $\alpha_{d_y} = 1$.
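Candidate gains can be checked against the stability condition $a_1(0)a_2(0) > a_3(0)$ with a short script. It uses the coefficients $a_1(0) = k_2 + \alpha_d$, $a_2(0) = k_1 + \alpha_d k^{des}_2$, $a_3(0) = \alpha_d k^{des}_1$, which are derived here rather than quoted from the text, under the assumption that the error obeys $\ddot e_1 = -k_1 e_1 - k_2 \dot e_1 + d - \hat d$ and the parameters follow gradient descent on $C = \tfrac12 c^2$ with $c = \ddot e_1 + k^{des}_2 \dot e_1 + k^{des}_1 e_1$. The Routh-Hurwitz test is cross-checked against the roots returned by `numpy.roots`. The gains in the example are hypothetical placeholders, not the values used in the paper; only $\alpha_d = 1$ matches the simulation setup.

```python
import numpy as np


def coefficients_at_origin(k1, k2, k_des1, k_des2, alpha_d):
    """a_i(0) of s^3 + a1 s^2 + a2 s + a3 (assumed forms, see lead-in)."""
    a1 = k2 + alpha_d
    a2 = k1 + alpha_d * k_des2
    a3 = alpha_d * k_des1
    return a1, a2, a3


def routh_hurwitz_stable(a1, a2, a3):
    """Routh-Hurwitz test for a monic cubic: all a_i > 0 and a1 * a2 > a3."""
    return a1 > 0 and a2 > 0 and a3 > 0 and a1 * a2 > a3


def roots_stable(a1, a2, a3):
    """Numerical cross-check: every root has a negative real part."""
    return bool(np.all(np.roots([1.0, a1, a2, a3]).real < 0))


# Hypothetical gains (placeholders, not the paper's values).
coeffs = coefficients_at_origin(k1=4.0, k2=4.0, k_des1=4.0, k_des2=4.0, alpha_d=1.0)
print(coeffs, routh_hurwitz_stable(*coeffs), roots_stable(*coeffs))
# prints: (5.0, 8.0, 4.0) True True
```

For these placeholder values the polynomial factors as $(s+1)(s+2)^2$, so both tests agree that the origin is stable; the learning rates $\alpha_1, \alpha_2$ do not enter the condition at ξ = 0.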

Case I: Tracking Sinusoidal Trajectories
In the first simulation, time-varying trajectory tracking control of the 2-DOF helicopter is investigated. The desired trajectory is given as the following: Figure 2 shows the sinusoidal time-varying desired trajectories and the tracking outputs of the system, where the proposed controller is shown in green (solid line), the LQR in magenta (dotted line), the PID in cyan (dashed-dotted line), and the desired trajectory in black (dashed line). The control gains $(k_{p1}, k_{p2})$ and $(k_{y1}, k_{y2})$, the estimated disturbances $(\hat d_p, \hat d_y)$, and the control input voltages $(V_p, V_y)$ for Case I are shown in Figures 3-6. Figure 2 shows that the pitch and yaw angles track the reference trajectory with all three controllers; however, the proposed controller outperforms the LQR and PID controllers, especially in tracking the pitch angle.
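The learning loop can be made concrete with a minimal single-axis forward-Euler simulation sketch. It is not the Quanser AERO model: it assumes a hypothetical unit-gain double integrator $\dot x_1 = x_2$, $\dot x_2 = u + d$, the control law $u = \dot r_2 - k_1 e_1 - k_2 e_2 - \hat d$, gradient descent on $C = \tfrac12 c^2$, and that the error acceleration is measurable (here it is read directly from the simulated plant). The constant disturbance of 0.5, $\hat d(0) = 0$, and $\alpha_d = 1$ follow the simulation setup; the reference $r_1 = \sin t$, the gains, the learning rates $\alpha_1, \alpha_2$, and the function name `simulate_axis` are hypothetical placeholders.

```python
import math


def simulate_axis(t_end=30.0, dt=1e-3, d_true=0.5):
    """Forward-Euler simulation of one axis under the gradient-descent learning law.

    Hypothetical plant: x1' = x2, x2' = u + d (unit input gain).
    Returns (final position error, final d_hat, final k1, final k2).
    """
    k_des1, k_des2 = 4.0, 4.0            # placeholder desired gains
    k1, k2 = 4.0, 4.0                    # placeholder initial controller gains
    alpha1, alpha2, alpha_d = 0.5, 0.5, 1.0
    d_hat = 0.0                          # initial disturbance estimate
    x1, x2, t = 0.0, 0.0, 0.0

    for _ in range(int(round(t_end / dt))):
        r1, r2, r2_dot = math.sin(t), math.cos(t), -math.sin(t)  # r2 = dr1/dt
        e1, e2 = x1 - r1, x2 - r2

        u = r2_dot - k1 * e1 - k2 * e2 - d_hat  # assumed control law
        x2_dot = u + d_true                     # plant acceleration

        e_dd = x2_dot - r2_dot                  # error acceleration ("measured")
        c = e_dd + k_des2 * e2 + k_des1 * e1    # desired-dynamics residual

        # Gradient descent on C = 0.5 * c**2.
        k1 += dt * alpha1 * c * e1
        k2 += dt * alpha2 * c * e2
        d_hat += dt * alpha_d * c

        x1 += dt * x2
        x2 += dt * x2_dot
        t += dt

    return x1 - math.sin(t), d_hat, k1, k2
```

Calling `e_final, d_hat_final, k1_final, k2_final = simulate_axis()` runs 30 s of simulated time; with these placeholder gains the linearized error dynamics have poles at −1, −2, −2, so the tracking error should decay and $\hat d$ should settle near the injected disturbance. This illustrates the mechanism only, not the results shown in Figure 2.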

Case II: Tracking Constant Trajectories
In the second simulation, constant trajectory tracking control of the 2-DOF helicopter is investigated. The following set point is considered as the desired trajectory: Figure 7 shows the constant desired trajectories and the tracking outputs of the system, where the proposed controller is shown in green (solid line), the LQR in magenta (dotted line), the PID in cyan (dashed-dotted line), and the desired trajectory in black (dashed line). The control gains $(k_{p1}, k_{p2})$ and $(k_{y1}, k_{y2})$, the estimated disturbances $(\hat d_p, \hat d_y)$, and the control input voltages $(V_p, V_y)$ for Case II are shown in Figures 8-11. As seen in Figure 7, with the proposed controller the pitch and yaw angles converge to the desired set point, whereas the performance of the LQR and PID controllers is degraded, especially in tracking the pitch angle. Note that the parameters of all the controllers were kept the same as in the previous scenario (Section 4.1), which further supports the effectiveness of the proposed controller under varying conditions.

Case III: Experiment
We experimentally validated the proposed method on the Quanser 2-DOF AERO helicopter testbed (see Figure 12). The desired trajectory is given as: The following gains for the desired closed-loop error dynamics are used for experimental validation: The initial controller gains and their learning rates are given by The disturbance vector is initialized as $\hat d_p(0) = 0$, $\hat d_y(0) = 0$. The disturbance learning rates are Figure 13 shows the sinusoidal time-varying desired trajectories and the tracking outputs of the system, where the proposed controller is shown in green (solid line), the LQR in magenta (dotted line), the PID in cyan (dashed-dotted line), and the desired trajectory in black (dashed line). The control gains $(k_{p1}, k_{p2})$ and $(k_{y1}, k_{y2})$, the estimated disturbances $(\hat d_p, \hat d_y)$, and the control input voltages $(V_p, V_y)$ for Case III are shown in Figures 14-17. As seen in Figure 13, with the proposed controller the pitch and yaw angles track the desired trajectory closely, whereas the performance of the LQR and PID controllers is degraded, especially in tracking the pitch angle.

Conclusions
This paper presents a trajectory tracking control strategy for a 2-DOF helicopter system testbed. The proposed learning strategy compensates for model uncertainties and disturbances by means of a gradient descent-based simple learning control law that minimizes a cost function defined by the error dynamics of the nonlinear system. The effectiveness of the attitude trajectory tracking control method is validated through both computer simulations and experimental results. Comparison with well-known existing methods, namely LQR and PID controllers, further validates the satisfactory performance of the developed method.
Future work will consider extending our results by designing an approximator to estimate the system parameters in real time, which will make the current architecture less dependent on knowledge of the system.