1. Introduction
Quadrotor unmanned aerial vehicles (UAVs) have attracted attention thanks to their ability to hover and to take off and land vertically. Because of their under-actuated nature, quadrotor position control is performed by controlling the attitude angles [1]. For this reason, attitude control of quadrotors has been a hot research topic in recent years. However, quadrotors are subject to parameter uncertainty and external disturbances, which threaten flight safety and pose significant challenges to controller design [2]. In addition, as quadrotors become more widespread, higher requirements are being placed on their controllers. Thus, there is a pressing need for advanced controllers that improve reliability and response speed.
In the literature, many approaches have been studied for the quadrotor attitude control problem. As a classical controller, proportional–integral–derivative (PID) control is widely used because of its simple structure and good performance [3,4,5]. Tayebi et al. [6] developed an augmented proportional–derivative (PD) attitude controller that guarantees exponential stability. Cao et al. [7] focused on the position control of quadrotors using an inner–outer loop control structure: the outer loop generates a saturated thrust together with reference roll and pitch angles, while the inner loop tracks these reference angles using a traditional PID controller.
Owing to nonlinearities and disturbances, however, the performance of PID control is often unsatisfactory. Sliding mode control (SMC), one of the most important robust control techniques, is able to handle nonlinear systems subject to external disturbances. Based on second-order SMC, Zheng et al. [8] designed a controller for a small quadrotor UAV. Xiong et al. [9] designed a controller for a highly coupled, nonlinear, fully actuated UAV using a novel robust terminal sliding mode control algorithm. Nevertheless, the chattering caused by SMC is the main obstacle restricting its application.
To achieve robust performance and stabilization, the robust control framework pioneered by George Zames has been widely studied [10]. Owing to the uncertain nature of aircraft systems, Babar et al. [11] improved the traditional inner–outer loop strategy and adopted a robust controller for the inner control loop. Liu et al. [12] designed a distributed robust controller, consisting of a position controller and an attitude controller, for multiple quadrotors with nonlinearities and disturbances.
To deal with nonlinearities and disturbances, active disturbance rejection control (ADRC) estimates the total disturbance in real time and compensates for it, thereby reducing the plant, whether linear or nonlinear, to a simple cascade of integrators [13,14]. To address the problem that UAV tracking control relies too heavily on mathematical modeling and measurement accuracy, Niu et al. [15] proposed a longitudinal pitch-angle control system based on nonlinear ADRC. Lotufo et al. [16] combined ADRC with embedded model control (EMC), relying on the disturbance rejector to bridge the gap between model and reality.
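To illustrate the cascade-of-integrators idea, the sketch below implements a generic third-order linear extended state observer (ESO) for a double-integrator plant; this is a textbook construction under assumed values (observer bandwidth wo, input gain b0), not the specific controller proposed in this paper.

```python
import numpy as np

def eso_step(z, y, u, b0, wo, dt):
    """One Euler step of a third-order linear ESO.

    z = [x_hat, xdot_hat, f_hat] for a double-integrator plant
    x'' = f + b0 * u, where f_hat estimates the lumped "total
    disturbance" f. Gains follow the standard bandwidth
    parameterization, placing all observer poles at -wo.
    """
    l1, l2, l3 = 3 * wo, 3 * wo**2, wo**3
    e = y - z[0]                       # output estimation error
    dz = np.array([z[1] + l1 * e,
                   z[2] + b0 * u + l2 * e,
                   l3 * e])
    return z + dt * dz

# Open-loop demo: a constant unknown disturbance f = -2.0 acts on the plant.
dt, f_true, b0, wo = 1e-3, -2.0, 1.0, 50.0
x = np.zeros(2)    # true plant state [x, xdot]
z = np.zeros(3)    # observer state
u = 0.0
for _ in range(5000):                  # 5 s of simulation
    z = eso_step(z, x[0], u, b0, wo, dt)
    x = x + dt * np.array([x[1], f_true + b0 * u])

print(z[2])        # f_hat converges to the true disturbance, about -2.0
```

Once f_hat is accurate, feeding back u = (v - f_hat) / b0 cancels the disturbance and leaves the nominal cascade of integrators x'' = v. The single parameter wo is the bandwidth knob: raising it speeds up disturbance estimation at the cost of noise sensitivity, which is precisely the robustness/transient tradeoff discussed later in this section.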
However, several issues remain that deserve attention [17].
Classical controller design relies on an understanding of flight physics and has difficulty handling coupled multi-loop design tasks. In other words, the classical one-loop-at-a-time design cannot guarantee success when more loops are added and coupled.
Modern control techniques often require exact knowledge of the model and are sensitive to parameter uncertainty and external disturbances [18]. However, the varying payload of each flight mission introduces uncertainty into the system parameters, and some parameters, especially aerodynamic ones, may be difficult to obtain. This can lead to unstable behavior, limiting the application of model-based controllers.
For modern robust controllers [12], it is usually difficult to obtain the upper bounds of the external disturbance and parameter uncertainty, which leads to unsatisfactory performance.
In the ADRC algorithm, a fixed predefined closed-loop bandwidth cannot guarantee a good tradeoff between robustness and transient tracking performance. Meanwhile, the accuracy of parameter estimation affects the controller's ability to reject disturbances [14].
To address the controller parameter tuning problem, many optimization algorithms have been employed, including genetic algorithms (GA) [19], particle swarm optimization (PSO) [20], and grey wolf optimization (GWO) [21]. Bolandi et al. [22] used an analytical optimization method to tune a conventional PID controller for the stabilization and disturbance rejection of quadrotors.
With the development of computer science and technology, reinforcement learning (RL), which autonomously learns optimal strategies through continuous interaction with the environment, is considered one of the most promising approaches to general artificial intelligence [23]. Lee et al. [24] proposed an RL-based adaptive PID controller for dynamic positioning systems; the results showed better station-keeping performance without any deterioration in control efficiency. Gheisarnejad et al. [25] proposed a deep deterministic policy gradient (DDPG)-based supplementary controller to enhance the adaptive capability in the tracking control problem. Zhao et al. [26] employed RL to update the optimal control weights in a fault-tolerant formation control law. Zheng et al. [27] used the Q-learning algorithm to select adaptive parameters for ADRC. However, because Q-learning stores its value estimates in a Q-table, both states and actions must be discrete; by itself, it cannot handle complex continuous problems such as UAV attitude control. RL, which can solve the nonlinear optimal consensus control problem, is also widely used in fault-tolerant control. Ma et al. [28] presented an adaptive model-free fault-tolerant control scheme based on integral RL, introducing the integral of the tracking error. Li et al. [29] designed direct adaptive optimal controllers by combining the backstepping technique with RL, where a critic network approximates the strategic utility functions and an action network approximates the unknown desired control input signals.
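To make the discreteness limitation of Q-learning concrete, the following toy example (hypothetical, not from the cited works) learns a Q-table for a five-state chain: the value function must be stored as a finite table indexed by discrete states and actions, which is exactly what rules the method out for continuous attitude angles and motor torques. DDPG sidesteps this by replacing the table with actor and critic networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 5-state chain MDP: action 0 = left, 1 = right; reward 1 only on
# reaching the rightmost state, which ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # the Q-table: one cell per (state, action)
alpha, gamma = 0.5, 0.9

def step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == n_states - 1 else 0.0
    return s2, reward, s2 == n_states - 1

for _ in range(500):                       # episodes
    s = 0
    for _ in range(50):
        a = int(rng.integers(n_actions))   # Q-learning is off-policy, so a
        s2, r, done = step(s, a)           # purely random policy suffices here
        target = r + (0.0 if done else gamma * np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2
        if done:
            break

policy = np.argmax(Q, axis=1)   # greedy policy: move right in every
print(policy[:-1])              # non-terminal state
```

Every continuous quantity would have to be quantized into such table cells, and the table size grows exponentially with the state dimension; this is the motivation for the function-approximation approach adopted in this paper.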
Motivated by the above discussion, an ADRC scheme based on DDPG is proposed in this paper. The main contributions are as follows:
(1) A realistic nonlinear quadrotor model is established that accounts for parameter uncertainty and external disturbances.
(2) Online continuous adjustment of the closed-loop bandwidth is realized by DDPG, which helps balance robustness against transient tracking performance.
(3) DDPG is adopted to achieve fast and accurate compensation of the total system disturbance, further improving response speed and control accuracy.
The remainder of this paper is organized as follows. In Section 2, the proposed dynamic quadrotor model with internal and external disturbances is provided. The proposed DDPG-based ADRC is presented in Section 3. The simulation results are provided and analyzed in Section 4. Finally, Section 5 presents our conclusions.