Deep Deterministic Policy Gradient-Based ADRC for Quadrotor Altitude and Attitude Control Subject to Disturbance

Sanal, Sini; Thangavelu, Ananthan

doi:10.3390/automation7030091


by Sini Sanal * and Ananthan Thangavelu

Department of Electrical and Electronics Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India

* Author to whom correspondence should be addressed.

Automation 2026, 7(3), 91; https://doi.org/10.3390/automation7030091 (registering DOI)

Submission received: 8 March 2026 / Revised: 6 June 2026 / Accepted: 9 June 2026 / Published: 12 June 2026


Abstract

This paper proposes a reinforcement learning-assisted active disturbance rejection control (ADRC) framework for a nonlinear quadrotor unmanned aerial vehicle (UAV). Conventional ADRC controllers are designed for the quadrotor altitude and attitude channels. To evaluate robustness under disturbance-intensive conditions, a composite external disturbance is injected into the roll-channel dynamics. A Deep Deterministic Policy Gradient (DDPG)-based adaptive tuning mechanism is integrated into the roll-channel ADRC for the nonlinear state error feedback (NLSEF) gain adaptation, while fixed-parameter ADRC is retained for the remaining three channels. Without requiring system linearization or prior knowledge of disturbance models, the reinforcement learning agent learns a gain adaptation policy directly through interaction with the nonlinear roll subsystem. Quantitative simulations demonstrate superior roll-axis disturbance rejection compared to conventional ADRC, with a 90% faster settling time, a 5.1% reduction in root mean square (RMS) control effort, and a 7.6% peak input suppression. The learning-based adaptation maintains comparable tracking accuracy across all channels while significantly improving transient recovery and control smoothness in the most disturbance-sensitive axis, validating selective reinforcement learning integration for robust nonlinear quadrotor flight control.

Keywords:

quadrotor; Deep Deterministic Policy Gradient; disturbances; active disturbance rejection control; altitude control

1. Introduction

Quadrotor attitude and altitude control has recently become an active area of research due to its critical role in fields such as aerial photography, geographic mapping, shipping and delivery, disaster management, search and rescue, law enforcement, wildlife monitoring, precision agriculture, and weather forecasting. In most of these applications, quadrotors are required to hover, track paths, and remain stable in outdoor environments [1,2,3,4,5]. However, outdoor environments are affected by wind disturbances, turbulence, and sudden gusts, which can seriously affect flight stability and mission success. Several control strategies have been reported in the literature for quadrotor position and attitude tracking under disturbances. In [6], a sliding mode control (SMC)-based tracking controller was proposed for quadrotor UAVs considering nominal dynamics with bounded uncertainties; however, the method suffers from oscillations and chattering, and lacks an explicit adaptive disturbance estimation mechanism. In [7], a fault-tolerant predictive control scheme was developed using a discrete-time sliding mode observer to handle generic bounded disturbances, but the approach remains sensitive to observer noise, exhibits chattering, and does not incorporate adaptive learning capabilities. Similarly, disturbance observer-based attitude control methods for aggressive maneuvers are highly model-dependent and predominantly compensate only for matched disturbances, which limits their robustness against unmodelled dynamics and external wind disturbances [8]. These limitations motivate the need for an adaptive, learning-assisted disturbance rejection framework capable of handling nonlinear uncertainties with reduced chattering and improved robustness.

In recent years, active disturbance rejection control (ADRC) and intelligent optimization techniques have been widely explored for UAV attitude control under disturbances. A dual closed-loop ADRC scheme for quadrotor attitude stabilization under wind gusts [9] employs a Proportional-Derivative (PD) outer loop, which limits disturbance rejection performance, and lacks validation with realistic wind field models. In [10], an improved extended state observer (ESO) with nonlinear feedback was developed to address actuator faults and wind gusts; however, the method lacks online parameter adaptation. Hybrid intelligent algorithms, such as fish swarm optimization and Particle Swarm Optimization (PSO) with elite Gaussian learning, have been proposed for ADRC parameter optimization in fixed-wing UAVs; nevertheless, the tuning remains environment-independent, and disturbances are treated as unknown without adaptive updating [11]. A pigeon-inspired optimization approach has also been proposed for ADRC tuning in Vertical Take-off and Landing (VTOL) UAVs, achieving faster convergence but without providing rigorous stability analysis [12]. In [13], an improved Beetle Antennae Search–Sine Cosine Algorithm (BAS–SCA)-based ADRC tuning method with variable step-size search was proposed, alongside a second-order ADRC combined with Ant Colony Optimization (ACO) and BAS optimization; both methods rely on offline parameter tuning and focus only on attitude control, without addressing highly dynamic wind conditions. An adaptive composite disturbance rejection control (ACDRC) scheme using iterative learning was proposed for agricultural micro-UAVs in [14].

In recent years [15,16,17,18], ADRC improvements have focused on enhanced observer structures and composite disturbance rejection. An improved ESO was developed to address measurement noise, along with an adaptive composite disturbance rejection scheme for quadrotor attitude control. Later, a finite-time composite control strategy for wind disturbance rejection in UAVs was introduced, and a switching semi-decoupled ADRC framework was proposed for ground vehicles. Although these approaches enhance robustness and convergence, they still rely on fixed or offline-tuned parameters and do not provide online learning capability, which limits their adaptability under rapidly varying and highly nonlinear disturbance conditions in realistic quadrotor operations. The adaptive fast-finite-time observer in [19] provides rapid convergence under uncertainties; however, it requires complex adaptive gain design and may increase implementation complexity for high-dimensional nonlinear UAV systems. The sliding mode observer presented in [20] exhibits strong robustness against uncertainties and disturbances, but the discontinuous switching structure may introduce chattering effects and increased sensitivity to measurement noise. The nonlinear ESO-based sliding mode controller in [21] demonstrates improved disturbance estimation capability; nevertheless, the control structure involves additional optimization and sliding-mode components that increase computational burden. The Extended Kalman Filter (EKF)-based observer in [22] achieves effective state estimation for robotic manipulators, but its performance strongly depends on accurate system modeling and covariance tuning, which may limit robustness under significant model uncertainties and external disturbances.

In contrast, the proposed ESO offers a comparatively simpler model-independent disturbance estimation framework with lower computational complexity and easier parameterization for real-time quadrotor implementation. Moreover, the integration of the ESO with the DDPG-based adaptive NLSEF tuning mechanism enhances disturbance rejection and tracking performance under nonlinear and uncertain operating conditions without requiring precise system modeling or complex switching logic.

More recent studies have integrated fuzzy logic, neural networks, and deep reinforcement learning with ADRC to enhance disturbance rejection capability [23,24,25,26]. A fuzzy ADRC scheme with parameters optimized using an improved whale optimization algorithm achieved reduced steady-state error and improved anti-interference performance; however, the parameter adaptation is performed offline, limiting real-time adaptability. An intelligent attitude controller combining ADRC with fuzzy logic and an adaptive radial basis function neural network was proposed to tune ESO and NLSEF parameters online, but the approach lacks extensive real-world validation. A reinforcement learning-based parameter optimization strategy for the ADRC of an autonomous underwater vehicle was also proposed, in which discrete action spaces are formulated, making the approach unsuitable for continuous-control quadrotor systems.

A PID–deep reinforcement learning (DRL)-based wind disturbance compensation strategy was also proposed; although the controller improves adaptability compared to classical PID, it exhibits limited responsiveness to fast, highly time-varying real-world wind fields.

Overall, existing ADRC-based and optimization-assisted controllers either rely on offline tuning, lack rigorous stability guarantees, or are validated under mild or unrealistic disturbance conditions, and many learning-based methods adopt discrete action spaces that are unsuitable for UAV dynamics, thereby highlighting the need for an online adaptive and learning-based disturbance rejection framework operating in a continuous-control space, capable of handling strong, time-varying disturbances in full-scale quadrotor systems.

The main contributions of this article are summarized as follows:

A Deep Deterministic Policy Gradient (DDPG)-based adaptive tuning strategy is proposed for the nonlinear active disturbance rejection control (NLADRC) framework, with specific emphasis on the online optimization of the NLSEF gain parameter β₁. In contrast to existing reinforcement learning (RL)-based ADRC approaches that tune multiple parameters or observer gains, the proposed method emphasizes selective tuning of the most sensitive nonlinear feedback gain, thereby reducing learning complexity while improving control smoothness and preventing actuator saturation under composite time-varying disturbances.
A composite control architecture is developed by integrating the DDPG-optimized NLADRC with a complete six-degree-of-freedom (6-DOF) quadrotor dynamic model. The extended state observer (ESO) estimates the combined effects of system uncertainties and disturbances in real time, while the DDPG agent adaptively tunes the roll-channel NLSEF gain $β_{1}$. Unlike many existing studies based on simplified or decoupled models, the proposed framework considers a composite disturbance, enabling a more realistic robustness evaluation.
Numerical simulations demonstrate that the proposed roll-channel DDPG-NLADRC scheme achieves significant improvements in trajectory tracking accuracy, disturbance rejection capability, and actuator stress mitigation.

The structure of this article is organized as follows:

Section 2 presents the mathematical modeling of the quadrotor system; Section 3 describes the design of ADRC strategy; Section 4 provides the design of proposed DDPG-based ADRC scheme; Section 5 provides numerical simulation, results and comparative analysis; finally, Section 6 concludes the paper and discusses future research directions.

2. Nonlinear Quadrotor System Modeling

A quadrotor has three linear position variables and three angular position variables. Accurate derivation of the quadrotor dynamics is challenging due to its under-actuated nature, strong coupling among states, and high sensitivity to external disturbances. In this work, the system dynamics are formulated using the Newton–Euler framework. To describe the motion, two coordinate systems are defined: an earth-fixed inertial reference frame $E = \{x_{e}, y_{e}, z_{e}\}$ and a body-fixed reference frame $B = \{x_{B}, y_{B}, z_{B}\}$. The quadrotor motion is governed by the combined lift forces generated by the four rotors $(f_{1}, f_{2}, f_{3}, f_{4})$ together with gravitational effects. Both translational and rotational motions are regulated through appropriate adjustment of the individual rotor speeds [27].

Quadrotor flight states are defined by three translational coordinates $(x, y, z)$ and three rotational angles $(ϕ, θ, ψ)$, as depicted in Figure 1. The roll angle $ϕ$ describes the rotation about the x-axis, the pitch angle $θ$ describes the rotation about the y-axis, and the yaw angle $ψ$ describes the rotation about the z-axis. The quadrotor dynamics are described using two coordinate frames: the inertial frame (Earth frame), which is fixed to the ground and used to represent the global position and orientation of the vehicle, and the body-fixed frame, which is attached to the quadrotor’s center of mass and used to express the forces, torques, and rotational dynamics.

Using the coordinate transformation principle, the rotation matrix that maps vectors from the body-fixed frame to the earth-fixed inertial frame is as given in the following equation:

R = \begin{bmatrix} C_{ψ} C_{θ} & C_{ψ} S_{θ} S_{ϕ} - S_{ψ} C_{ϕ} & C_{ψ} S_{θ} C_{ϕ} + S_{ψ} S_{ϕ} \\ S_{ψ} C_{θ} & S_{ψ} S_{θ} S_{ϕ} + C_{ψ} C_{ϕ} & S_{ψ} S_{θ} C_{ϕ} - C_{ψ} S_{ϕ} \\ - S_{θ} & C_{θ} S_{ϕ} & C_{θ} C_{ϕ} \end{bmatrix}

(1)

where $C_{x} = \cos(x)$, $S_{x} = \sin(x)$, and $T_{x} = \tan(x)$. The thrust $T$ along the body z-axis is created by the combined rotor forces. The Euler-angle rates are related to the body-frame angular velocities by the transformation

\begin{bmatrix} \dot{ϕ} \\ \dot{θ} \\ \dot{ψ} \end{bmatrix} = \begin{bmatrix} 1 & S_{ϕ} T_{θ} & C_{ϕ} T_{θ} \\ 0 & C_{ϕ} & - S_{ϕ} \\ 0 & \frac{S_{ϕ}}{C_{θ}} & \frac{C_{ϕ}}{C_{θ}} \end{bmatrix} \begin{bmatrix} p \\ q \\ r \end{bmatrix}

(2)

where $v = [p \; q \; r]^{T}$ represents the angular velocities in the body frame. The angular speed of the $i$th rotor is $ω_{i}$, and it generates the force $f_{i}$ in the rotor-axis direction. The rotor’s angular velocity and acceleration produce a torque $τ_{i}$ around the rotor axis:

τ_{i} = b ω_{i}^{2}, f_{i} = k ω_{i}^{2},

(3)

where $b$ is the drag constant and $k$ is the thrust factor. The combined forces generated by the rotors produce a total thrust, $T$, along the body $z$-axis. The torque vector $τ_{B}$ consists of the roll, pitch, and yaw components $τ_{ϕ}$, $τ_{θ}$, and $τ_{ψ}$ acting about the corresponding body-frame axes:

τ_{B} = \begin{bmatrix} τ_{ϕ} \\ τ_{θ} \\ τ_{ψ} \end{bmatrix} = \begin{bmatrix} l k (- ω_{2}^{2} + ω_{4}^{2}) \\ l k (- ω_{1}^{2} + ω_{3}^{2}) \\ \sum_{i = 1}^{4} τ_{i} \end{bmatrix}

(4)

where $l$ is the distance between the rotor and the center of mass of the quadrotor. The quadrotor is assumed to be a rigid body, and the Newton–Euler equations are used to describe the system dynamics. The combined effect of the inertial and centrifugal forces is equal to the gravitational force and the rotors’ total thrust:

m \dot{V}_{B} + v \times (m V_{B}) = R^{T} G + T_{B}

(5)

In the inertial frame, quadrotor motion is governed by only gravitational force and the rotor generated thrust:

\begin{cases} \ddot{x} = \frac{T}{m} (C_{ψ} S_{θ} C_{ϕ} + S_{ψ} S_{ϕ}) \\ \ddot{y} = \frac{T}{m} (S_{ψ} S_{θ} C_{ϕ} - C_{ψ} S_{ϕ}) \\ \ddot{z} = - g + \frac{T}{m} C_{θ} C_{ϕ} \end{cases}

(6)
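As an illustrative cross-check of Equations (1) and (6), the following sketch builds the body-to-inertial rotation matrix in the standard yaw-pitch-roll (ZYX) form and computes the inertial acceleration by rotating the body-z thrust. The function names, the mass, and the thrust value are assumptions introduced only for this example; they are not taken from the paper's parameter table.

```python
import numpy as np


def rotation_matrix(phi, theta, psi):
    """Body-to-inertial rotation for ZYX Euler angles (roll phi, pitch theta, yaw psi)."""
    c_phi, s_phi = np.cos(phi), np.sin(phi)
    c_th, s_th = np.cos(theta), np.sin(theta)
    c_psi, s_psi = np.cos(psi), np.sin(psi)
    return np.array([
        [c_psi * c_th, c_psi * s_th * s_phi - s_psi * c_phi, c_psi * s_th * c_phi + s_psi * s_phi],
        [s_psi * c_th, s_psi * s_th * s_phi + c_psi * c_phi, s_psi * s_th * c_phi - c_psi * s_phi],
        [-s_th, c_th * s_phi, c_th * c_phi],
    ])


def translational_accel(thrust, mass, phi, theta, psi, g=9.81):
    """Inertial acceleration: thrust along body z rotated into the inertial frame, minus gravity."""
    thrust_body = np.array([0.0, 0.0, thrust])
    gravity = np.array([0.0, 0.0, g])
    return rotation_matrix(phi, theta, psi) @ thrust_body / mass - gravity


# Hypothetical values: reference attitude from the simulation section, assumed mass and thrust.
R = rotation_matrix(0.5, 0.2, 0.005)
acc = translational_accel(thrust=12.0, mass=1.2, phi=0.5, theta=0.2, psi=0.005)
```

A proper rotation matrix is orthonormal with unit determinant, and its third column scaled by T/m reproduces the right-hand sides of Equation (6), which makes a convenient sanity check when transcribing the model.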

In the body frame,

I \dot{v} + v \times (I v) + Γ = τ

(7)

where $Γ$ is the gyroscopic force and $τ$ is the external torque.

\begin{cases} \ddot{ϕ} = \frac{(I_{yy} - I_{zz}) q r}{I_{xx}} - \frac{q ω_{Γ}}{I_{xx}} + \frac{τ_{ϕ}}{I_{xx}} \\ \ddot{θ} = \frac{(I_{zz} - I_{xx}) p r}{I_{yy}} + \frac{p ω_{Γ}}{I_{yy}} + \frac{τ_{θ}}{I_{yy}} \\ \ddot{ψ} = \frac{(I_{xx} - I_{yy}) p q}{I_{zz}} + \frac{τ_{ψ}}{I_{zz}} \end{cases}

(8)

where $ω_{Γ} = ω_{1} - ω_{2} + ω_{3} - ω_{4}$. Based on the complex dynamic characteristics of the quadrotor model, the overall control system is typically decomposed into four independent control channels: the altitude control channel ($z$), the pitch angle channel ($θ$), the roll angle channel ($ϕ$), and the yaw angle channel ($ψ$). This four-channel decomposition leads to a decentralized control design and easier implementation of control algorithms. However, due to the strong coupling between translational and rotational motions, a robust and adaptive control strategy is essential to ensure stability and tracking performance under disturbances and modeling uncertainties.
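The torque mapping of Equations (3) and (4) and the angular accelerations of Equation (8) can be sketched as follows. This is a minimal transcription of the equations as written in this section, not the authors' simulation code; the function names and all numerical parameter values are hypothetical.

```python
import numpy as np


def body_torques(omega, k, b, l):
    """Roll, pitch, and yaw torques from the four rotor speeds, following Eqs. (3)-(4)."""
    w2 = np.asarray(omega, dtype=float) ** 2
    tau_phi = l * k * (-w2[1] + w2[3])
    tau_theta = l * k * (-w2[0] + w2[2])
    # Yaw torque is the plain sum of tau_i = b * omega_i^2, as Eq. (4) is written here.
    tau_psi = b * np.sum(w2)
    return np.array([tau_phi, tau_theta, tau_psi])


def angular_accel(rates, omega, tau, inertia):
    """Angular accelerations (phi_dd, theta_dd, psi_dd) of Eq. (8)."""
    p, q, r = rates
    ixx, iyy, izz = inertia
    w_gamma = omega[0] - omega[1] + omega[2] - omega[3]
    phi_dd = ((iyy - izz) * q * r - q * w_gamma + tau[0]) / ixx
    theta_dd = ((izz - ixx) * p * r + p * w_gamma + tau[1]) / iyy
    psi_dd = ((ixx - iyy) * p * q + tau[2]) / izz
    return np.array([phi_dd, theta_dd, psi_dd])


# Hypothetical rotor speeds (rad/s) and parameters, chosen only to exercise the functions.
omega = [100.0, 100.0, 100.0, 120.0]
tau = body_torques(omega, k=1e-5, b=1e-7, l=0.2)
acc = angular_accel((0.0, 0.1, 0.0), omega, tau, inertia=(0.01, 0.01, 0.02))
```

With equal rotor speeds the roll and pitch torques vanish, so a roll torque is produced only by a speed difference between rotors 2 and 4, which is the channel the roll controller acts on.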

3. Active Disturbance Rejection Controller Design

Owing to its dynamic structure, the quadrotor system can be represented by four decoupled control channels: the altitude ($z$) channel, the pitch angle ($θ$) channel, the roll angle ($ϕ$) channel, and the yaw angle ($ψ$) channel. ADRC is a nonlinear control strategy developed in response to the limitations of the PID control algorithm. It comprises a nonlinear tracking differentiator (TD), an extended state observer (ESO), and a nonlinear state error feedback (NLSEF) law. The TD eliminates sudden changes in the setpoint and provides a smoothed input signal. The ESO estimates each state variable and the total disturbance. The NLSEF law generates the control signal for the controlled plant. A nonlinear TD is used because numerical integration is more reliable and stable than numerical differentiation in a noisy environment. The design of the roll-channel controller is presented in this section as an example. A second-order TD can be designed as follows [27]:

\begin{cases} v_{1}(k + 1) = v_{1}(k) + h \, v_{2}(k) \\ v_{2}(k + 1) = v_{2}(k) + h \, \mathrm{fhan}(v_{1}(k) - v_{0}(k), v_{2}(k), r, h_{0}) \end{cases}

(9)

where $v_{0}$ is the reference signal, $v_{1}$ is the tracking signal and $v_{2}$ is its derivative, $h$ is the sampling time, $r$ is the speed factor that determines the convergence rate of the tracking differentiator, and $h_{0}$ is the filtering factor associated with the sampling step size. The function $\mathrm{sign}(\cdot)$ denotes the signum function. The nonlinear function fhan(·) is referred to as the optimal synthetic rapid control function and is defined as follows:

\mathrm{fhan}(x_{1}, x_{2}, r, h_{0}): \quad \begin{cases} d = r h_{0}^{2} \\ a_{0} = h_{0} x_{2} \\ y = x_{1} + a_{0} \\ a_{1} = \sqrt{d (d + 8 |y|)} \\ a_{2} = a_{0} + \mathrm{sign}(y) \frac{a_{1} - d}{2} \\ s_{y} = \frac{\mathrm{sign}(y + d) - \mathrm{sign}(y - d)}{2} \\ a = (a_{0} + y - a_{2}) s_{y} + a_{2} \\ s_{a} = \frac{\mathrm{sign}(a + d) - \mathrm{sign}(a - d)}{2} \\ \mathrm{fhan} = - r \left(\frac{a}{d} - \mathrm{sign}(a)\right) s_{a} - r \, \mathrm{sign}(a) \end{cases}

(10)
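A minimal sketch of the tracking differentiator in Equations (9) and (10) is given below. The first two arguments of `fhan` are the generic inputs of Equation (10), which Equation (9) calls with the tracking error and the derivative estimate. The values of `r`, `h0`, and the step reference are assumptions chosen for illustration; only the sampling time of 0.005 s appears in the paper.

```python
import numpy as np


def fhan(x1, x2, r, h0):
    """Optimal synthetic rapid control function of Eq. (10)."""
    d = r * h0 ** 2
    a0 = h0 * x2
    y = x1 + a0
    a1 = np.sqrt(d * (d + 8.0 * abs(y)))
    a2 = a0 + np.sign(y) * (a1 - d) / 2.0
    sy = (np.sign(y + d) - np.sign(y - d)) / 2.0
    a = (a0 + y - a2) * sy + a2
    sa = (np.sign(a + d) - np.sign(a - d)) / 2.0
    return -r * (a / d - np.sign(a)) * sa - r * np.sign(a)


def td_step(v1, v2, v0, r, h, h0):
    """One discrete update of the second-order tracking differentiator, Eq. (9)."""
    v1_next = v1 + h * v2
    v2_next = v2 + h * fhan(v1 - v0, v2, r, h0)
    return v1_next, v2_next


# Smooth a step reference of 0.5 rad (hypothetical r and h0).
v1, v2 = 0.0, 0.0
for _ in range(2000):
    v1, v2 = td_step(v1, v2, v0=0.5, r=50.0, h=0.005, h0=0.01)
```

The output of `fhan` is bounded in magnitude by `r`, so `r` acts as an acceleration limit on the smoothed reference, while choosing `h0` somewhat larger than `h` adds filtering.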

The extended state observer (ESO) designed for the roll dynamics is expressed by the following equations:

\begin{cases} e = z_{1}(k) - y(k) \\ z_{1}(k + 1) = z_{1}(k) + h (z_{2}(k) - β_{01} \mathrm{fal}(e, a_{1}, δ)) \\ z_{2}(k + 1) = z_{2}(k) + h (z_{3}(k) - β_{02} \mathrm{fal}(e, a_{2}, δ) + b_{0} u(k)) \\ z_{3}(k + 1) = z_{3}(k) - h β_{03} \mathrm{fal}(e, a_{3}, δ) \end{cases}

(11)

where $z_{1}$, $z_{2}$ and $z_{3}$ are the observed values of $v_{1}$, $v_{2}$ and the roll-channel total disturbance, respectively. $β_{01}$, $β_{02}$ and $β_{03}$ denote the roll-channel observer gains, whereas $y$ denotes the output of the roll channel. The parameter $h$ denotes the sampling step size, the parameter $b_{0}$ represents the compensation factor, and $u(k)$ is the control input. $a_{1}$, $a_{2}$ and $a_{3}$ are nonlinear factors; $δ$ is a filter factor. fal(·) is a nonlinear function, employed to improve estimation accuracy and enhance disturbance rejection capability, and it is defined as

\mathrm{fal}(e, a, δ) = \begin{cases} \frac{e}{δ^{1 - a}}, & |e| \leq δ \\ |e|^{a} \, \mathrm{sign}(e), & |e| > δ \end{cases}

(12)
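The piecewise definition in Equation (12) translates directly into code. The sketch below treats the boundary case |e| = δ as part of the linear branch, which is an assumption that keeps the function continuous; the sample arguments are illustrative only.

```python
import numpy as np


def fal(e, a, delta):
    """Nonlinear gain of Eq. (12): linear near zero, power law for larger errors."""
    if abs(e) <= delta:
        # Linear segment avoids the very high gain of |e|**a near e = 0 when a < 1.
        return e / delta ** (1.0 - a)
    return np.sign(e) * abs(e) ** a


small = fal(0.005, 0.5, 0.01)   # inside the linear region
large = fal(0.5, 0.5, 0.01)     # power-law region
```

For 0 < a < 1 the function applies a relatively high gain to small errors and a relatively low gain to large ones, which is the property ADRC relies on for fast convergence without excessive control action.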

Finally, with the properly designed extended state observer, the nonlinear state error feedback law (NLSEF) for the roll channel is given by

\begin{cases} e_{1} = v_{1} - z_{1} \\ e_{2} = v_{2} - z_{2} \\ u_{0} = β_{1} \mathrm{fal}(e_{1}, a_{1}, δ) + β_{2} \mathrm{fal}(e_{2}, a_{2}, δ) \end{cases}

(13)

u (k) = u_{0} (k) - \frac{z_{3} (k)}{b_{0}}

(14)

where $β_{1}$, $β_{2}$ are the nonlinear combination coefficients, $a_{1}$, $a_{2}$ are nonlinear factors, and $u(k)$ is the final control quantity.
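The observer and feedback law of Equations (11), (13), and (14) can be combined into a single control step, sketched below. Two points are assumptions of this sketch: the extended state z3 is updated from the observer error only, with the b0·u term entering only the z2 update (the standard ESO form), and all gains and parameter values are hypothetical rather than the paper's tuned values.

```python
import numpy as np


def fal(e, a, delta):
    """Nonlinear gain of Eq. (12)."""
    if abs(e) <= delta:
        return e / delta ** (1.0 - a)
    return np.sign(e) * abs(e) ** a


def eso_step(z, y, u, h, b0, betas, alphas, delta):
    """One discrete ESO update; z = (z1, z2, z3), y is the measured roll angle."""
    z1, z2, z3 = z
    e = z1 - y
    z1_next = z1 + h * (z2 - betas[0] * fal(e, alphas[0], delta))
    z2_next = z2 + h * (z3 - betas[1] * fal(e, alphas[1], delta) + b0 * u)
    z3_next = z3 - h * betas[2] * fal(e, alphas[2], delta)  # assumed: no b0*u term here
    return np.array([z1_next, z2_next, z3_next])


def nlsef(v1, v2, z, beta1, beta2, a1, a2, delta, b0):
    """Nonlinear state error feedback with disturbance compensation, Eqs. (13)-(14)."""
    e1 = v1 - z[0]
    e2 = v2 - z[1]
    u0 = beta1 * fal(e1, a1, delta) + beta2 * fal(e2, a2, delta)
    return u0 - z[2] / b0


# One control step with hypothetical gains and measurements.
z = np.zeros(3)
u = nlsef(v1=0.1, v2=0.0, z=z, beta1=10.0, beta2=1.0, a1=0.75, a2=1.25, delta=0.01, b0=1.0)
z = eso_step(z, y=0.0, u=u, h=0.005, b0=1.0,
             betas=(60.0, 1200.0, 8000.0), alphas=(1.0, 0.5, 0.25), delta=0.01)
```

In this structure `beta1` is the only quantity the learning agent would need to change at run time, since it enters the control law as a plain multiplier on the position-error term.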

4. DDPG-Enhanced ADRC Design and Convergence Analysis

Reinforcement learning (RL) has emerged as an effective framework for enhancing controller autonomy in complex dynamical systems by enabling performance improvement through continuous interaction with the environment, rather than relying on explicit system models. In this work, a Deep Deterministic Policy Gradient (DDPG) algorithm is employed due to its suitability for continuous state and action spaces, which is essential for online tuning of ADRC parameters in nonlinear quadrotor dynamics. The DDPG architecture consists of actor and critic neural networks, where the actor generates continuous-valued tuning actions and the critic evaluates their quality by jointly processing system states and control inputs through parallel information pathways to estimate the action–value function.

The observation vector of the RL agent is designed to capture both the transient and steady-state characteristics of the system under disturbed operating conditions. The reward function is formulated to penalize tracking errors, excessive control effort, and abrupt variations in the control input. Furthermore, episode termination is implemented through the “Is Done” condition, whereby the training episode is terminated whenever actuator constraints are violated, safety limits are exceeded, or the maximum episode duration is reached, thereby ensuring stable and safe learning. Through the integration of observation space, reward shaping, and termination logic, the DDPG-based agent learns optimal ADRC tuning parameters in real time, enabling adaptive disturbance rejection and improved control performance under varying flight conditions.

In this study, a Deep Deterministic Policy Gradient (DDPG) framework is employed to adaptively tune the nonlinear state error feedback (NLSEF) gain $β_{1}$ of the ADRC controller for roll motion control of the quadrotor, as shown in Figure 2. This choice is motivated by the dominant role of $β_{1}$ in shaping the transient response and control effort in ADRC. In comparison, $β_{2}$ mainly affects damping characteristics, while the ESO gains primarily influence observer convergence and disturbance estimation dynamics. Therefore, by restricting the reinforcement learning agent to a single, highly sensitive parameter, the complexity of the learning problem is significantly reduced, leading to improved convergence and stability of the DDPG algorithm.

The state (observation) vector is defined as $s = [e_{ϕ}, \dot{e}_{ϕ}, ϕ, \dot{ϕ}]^{T}$, where $e_{ϕ}$ and $\dot{e}_{ϕ}$ denote the roll angle tracking error and its derivative, respectively, while $ϕ$ and $\dot{ϕ}$ represent the roll angle and roll rate. The action of the agent is chosen as the incremental adjustment of the NLSEF gain, expressed as $a = Δβ_{1} \in [-30, 30]$, and the adaptive NLSEF gain is updated according to $β_{1}(t) = β_{1,0} + Δβ_{1}(t)$. This formulation allows the agent to dynamically tune the control effort in response to system dynamics and disturbances. The reward function is designed to penalize large tracking errors and angular velocities, and it is given by $r = -(10^{4} e_{ϕ}^{2} + 10 \dot{ϕ}^{2}) + R_{b}$, where the bonus term $R_{b} = 5$ is awarded when $|e_{ϕ}| < 0.01$, and $R_{b} = 0$ otherwise. Stable training and safe exploration are ensured through proper reward formulation, while the training episode is terminated if the roll angle exceeds 0.8 rad or the roll rate exceeds 5 rad/s.

The main hyperparameters used for training are as follows: the sampling time is set to 0.005 s, the discount factor is chosen as 0.99, and the target network smoothing factor is fixed at $10^{-3}$. A mini-batch size of 64 is employed with a replay buffer capacity of $10^{5}$. The initial exploration noise standard deviation is set to 0.3 and decays at a rate of $10^{-5}$ to ensure sufficient exploration during early training and stable convergence thereafter. The actor network is trained by performing gradient ascent on the expected Q-value, following the deterministic policy gradient formulation. The policy gradient is computed as

\nabla_{θ^{μ}} J \approx \frac{1}{N} \sum_{i = 1}^{N} \nabla_{a} Q (s_{i}, a ∣ θ^{Q}) \nabla_{θ^{μ}} μ (s_{i} ∣ θ^{μ})

(15)

The critic is optimized by minimizing the mean squared error (MSE) loss between its predicted Q-value and the target Q-value, expressed as

L = \frac{1}{N} \sum_{i = 1}^{N} {(Q (s_{i}, a_{i} ∣ θ^{Q}) - y_{i})}^{2},

(16)

where the target value is defined as

y_{i} = r_{i} + γ Q^{'} (s_{i + 1}, μ^{'} (s_{i + 1}))

(17)

where $μ(\cdot ∣ θ^{μ})$ denotes the actor network parameterized by $θ^{μ}$, which maps the state $s_{i}$ to a continuous action $a_{i}$, and $Q(\cdot ∣ θ^{Q})$ represents the critic network parameterized by $θ^{Q}$, which estimates the action–value function. The symbols $Q'(\cdot)$ and $μ'(\cdot)$ indicate the corresponding target networks used to stabilize learning. The term $r_{i}$ denotes the immediate reward at sample $i$, $γ \in (0, 1)$ is the discount factor, and $N$ is the mini-batch size drawn from the replay buffer. The gradients $\nabla_{a} Q(s_{i}, a ∣ θ^{Q})$ and $\nabla_{θ^{μ}} μ(s_{i} ∣ θ^{μ})$ represent the critic and actor gradients, respectively, which together drive the policy update through backpropagation. This learning mechanism enables the DDPG agent to iteratively improve the control policy by accurately estimating the long-term reward associated with each NLSEF gain adjustment.

For the Lyapunov-based stability analysis of the proposed DDPG-based ADRC applied to the roll channel of the quadrotor system, the roll dynamics can be expressed in the ADRC canonical form as

\dot{ϕ} = p, \dot{p} = f (ϕ, p, t) + b u

(18)

where $f(ϕ, p, t)$ represents the lumped total disturbance, including model uncertainties and external aerodynamic effects, and $b$ denotes the nominal control gain. The tracking errors are defined as

e_{1} = ϕ_{ref} - ϕ, e_{2} = {\dot{e}}_{1} = - p

(19)

A standard quadratic Lyapunov candidate function is selected as

V_{c} = \frac{1}{2} (e_{1}^{2} + e_{2}^{2})

(20)

which is positive definite and radially unbounded. The time derivative of $V_{c}$ along the system trajectories is given by

{\dot{V}}_{c} = e_{1} {\dot{e}}_{1} + e_{2} {\dot{e}}_{2} = e_{1} e_{2} + e_{2} (- f - b u)

(21)

The DDPG-based ADRC control law is defined as

u = \frac{1}{b} (u_{0} - \hat{f})

where $\hat{f} = z_{3}$ is the ESO estimate of the total disturbance and $u_{0}$ is the virtual control signal generated by the DDPG-tuned nonlinear state error feedback. Substituting the control law yields

{\dot{V}}_{c} = e_{1} e_{2} - e_{2} u_{0} - e_{2} \tilde{f}

(22)

where $\tilde{f} = f - \hat{f}$ denotes the ESO estimation error. Since the ESO is designed to be asymptotically convergent, the estimation error satisfies $\lim_{t \to \infty} \tilde{f} = 0$. Furthermore, the DDPG policy generates a stabilizing feedback law of the form $u_{0} = k_{1} e_{1} + k_{2} e_{2}$, with $k_{1} > 0$ and $k_{2} > 0$, leading to $\dot{V}_{c} = -k_{2} e_{2}^{2} - (k_{1} - 1) e_{1} e_{2} - e_{2} \tilde{f}$. For $k_{1} > 1$ and $k_{2} > 0$, the first two terms are negative semi-definite, while the last term vanishes asymptotically due to ESO convergence.

Hence, the Lyapunov derivative satisfies $\dot{V}_{c} \leq -λ ∥[e_{1} \; e_{2}]∥^{2}$ for some $λ > 0$ after a finite transient. To account for observer dynamics, the ESO estimation errors are defined as $\tilde{z}_{i} = x_{i} - z_{i}$, $i = 1, 2, 3$, and a composite Lyapunov function is constructed as $V_{tot} = V_{c} + \frac{1}{2} \sum_{i = 1}^{3} \tilde{z}_{i}^{2}$. Using standard ADRC observer theory, the ESO error dynamics are exponentially stable, which implies $\dot{V}_{tot} \leq -α ∥e∥^{2} - β ∥\tilde{z}∥^{2}$, with positive constants $α, β > 0$. Therefore, all tracking errors and ESO estimation errors are uniformly ultimately bounded, and the closed-loop roll subsystem under the proposed DDPG-based ADRC is globally asymptotically stable in the sense of Lyapunov.
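Because the cross term $(k_{1} - 1) e_{1} e_{2}$ in the derivative of $V_{c}$ can take either sign, it may help to see how a weighted candidate removes it. The following is a supplementary sketch and not the authors' derivation; it assumes a constant reference $ϕ_{ref}$ and constant gains $k_{1}, k_{2} > 0$, whereas the DDPG agent varies the gain over time.

```latex
\begin{aligned}
V_{w} &= \tfrac{1}{2}\left(k_{1} e_{1}^{2} + e_{2}^{2}\right), \\
\dot{e}_{1} &= e_{2}, \qquad
\dot{e}_{2} = -f - b u = -u_{0} - \tilde{f} = -k_{1} e_{1} - k_{2} e_{2} - \tilde{f}, \\
\dot{V}_{w} &= k_{1} e_{1} e_{2} + e_{2}\left(-k_{1} e_{1} - k_{2} e_{2} - \tilde{f}\right)
            = -k_{2} e_{2}^{2} - e_{2} \tilde{f}.
\end{aligned}
```

With $\tilde{f} = 0$ this gives $\dot{V}_{w} = -k_{2} e_{2}^{2} \leq 0$, and since $e_{2} \equiv 0$ forces $k_{1} e_{1} = 0$, LaSalle's invariance principle yields convergence of both errors to zero. For a bounded, non-zero $\tilde{f}$, the same expression gives ultimate boundedness of $e_{2}$ rather than asymptotic convergence.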

The stability analysis of the proposed DDPG-assisted ADRC framework is performed under the assumption that the extended state observer (ESO) provides sufficiently accurate disturbance estimation and that the DDPG algorithm converges to a bounded control policy. In practical implementations, reinforcement learning-induced uncertainties such as exploration noise, neural network approximation errors, and incomplete convergence may influence the transient system response and robustness characteristics. Nevertheless, since the DDPG algorithm is utilized mainly for adaptive parameter tuning, the primary closed-loop stability behavior is governed by the ADRC framework and ESO dynamics. Therefore, the proposed method preserves stable tracking performance under bounded disturbances and parameter variations within the considered operating conditions.

To assess the convergence characteristics of the proposed DDPG-assisted ADRC framework, the reinforcement learning agent was trained for a total of 300 episodes, each consisting of a maximum of 500 time steps. The evolution of the episode reward and the corresponding 20-episode moving-average reward during the training process is illustrated in Figure 3. As observed, the episode reward exhibits minor fluctuations during the initial training phase due to the exploration mechanism of the DDPG algorithm. However, the moving-average reward converges smoothly and stabilizes at approximately 8957 after nearly 100 episodes, indicating successful policy convergence. Although several isolated reward drops are observed throughout the training process, these are attributed to exploratory actions and do not affect the overall learning stability. Furthermore, all training episodes reached the maximum episode length of 500 steps without premature termination, demonstrating stable closed-loop operation during learning. The critic-estimated Q-value increased progressively throughout the training process and reached 1133.29 at the final episode, confirming successful value-function learning and convergence of the DDPG algorithm.
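The reward, termination rule, gain update, and target computation described in this section can be sketched in a few lines. The numerical constants are those stated in the text; the nominal gain `BETA1_NOMINAL`, the use of absolute values in the termination test, and the soft-update form of the target-network smoothing are assumptions of this sketch, and no particular reinforcement learning library is implied.

```python
import numpy as np

BETA1_NOMINAL = 50.0  # hypothetical nominal gain beta_{1,0}; not given in this section


def reward(e_phi, phi_dot):
    """Quadratic penalty on tracking error and roll rate, plus a bonus near the target."""
    bonus = 5.0 if abs(e_phi) < 0.01 else 0.0
    return -(1e4 * e_phi ** 2 + 10.0 * phi_dot ** 2) + bonus


def is_done(phi, phi_dot, step, max_steps=500):
    """Terminate on the roll-angle or roll-rate limit, or at the maximum episode length."""
    return abs(phi) > 0.8 or abs(phi_dot) > 5.0 or step >= max_steps


def adapted_gain(delta_beta1, beta1_nominal=BETA1_NOMINAL):
    """beta_1 = beta_{1,0} + delta_beta_1, with the action clipped to [-30, 30]."""
    return beta1_nominal + float(np.clip(delta_beta1, -30.0, 30.0))


def td_target(r, q_next, gamma=0.99):
    """Critic target of Eq. (17): y = r + gamma * Q'(s', mu'(s'))."""
    return r + gamma * q_next


def soft_update(target_params, online_params, tau=1e-3):
    """Assumed Polyak averaging of target-network parameters with smoothing factor tau."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_params, online_params)]


r = reward(e_phi=0.005, phi_dot=0.1)
beta1 = adapted_gain(12.5)
```

Clipping the action before it is added to the nominal gain keeps the adapted gain inside a known interval regardless of what the actor outputs, which is one simple way to keep the policy bounded as the stability discussion assumes.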

5. Numerical Simulation

The parameters used in the quadrotor simulation are listed in Table 1. The performance of the proposed control framework is investigated through time-domain simulations under both nominal and disturbed operating conditions. Initially, the quadrotor is commanded to track a constant reference altitude of $z_{d} = 5$ m, while the desired attitude angles are set to $[ϕ_{d}, θ_{d}, ψ_{d}] = [0.5, 0.2, 0.005]$ rad. These reference values are selected to represent a realistic non-hover maneuver involving simultaneous roll and pitch deflections, rather than a trivial equilibrium condition. The initial state of the system is chosen as $[z_{0}, ϕ_{0}, θ_{0}, ψ_{0}] = [0, 0.5732, 0, 0]$, where altitude is expressed in meters and angular states are expressed in radians. The chosen initial conditions introduce a moderate initial roll angle deviation while maintaining zero initial displacement in the remaining channels, thereby facilitating a meaningful understanding of the transient behavior, coupling effects, and convergence rate between the translational and rotational dynamics.

Two simulation scenarios are considered to evaluate the disturbance rejection capability and robustness of the control strategies under roll-channel disturbances. In the first case, the roll channel is subjected to external disturbances, and the system performance is analyzed using the conventional ADRC controller (Figure 4, Figure 5, Figure 6 and Figure 7). In the second case, the same reference trajectory, initial conditions, and disturbance profile are maintained. However, the controller is replaced with the proposed DDPG-based ADRC framework (Figure 8, Figure 9, Figure 10 and Figure 11).

This allows the comparison of the roll tracking performance, control effort, and robustness improvement achieved through reinforcement learning-based adaptive tuning over the conventional ADRC approach. To evaluate the disturbance rejection capability of the nominal and proposed DDPG-based ADRC schemes, an external disturbance is injected into the roll channel in the form of a composite torque disturbance. The disturbance is mathematically modeled as

$$d_{\phi}(t) = 0.5\,u(t - 1) + 0.5\sin(3t) + 0.0316\,\xi(t),$$

where $u(t - 1)$ represents a step input applied at $t = 1$ s, $\sin(3t)$ denotes a sinusoidal disturbance component, and $\xi(t) \sim N(0, 1)$ is a zero-mean, unit-variance Gaussian white noise process, so that the scaled noise term $0.0316\,\xi(t)$ has a standard deviation of 0.0316 (a variance of approximately $10^{-3}$). This composite disturbance emulates realistic environmental effects such as sudden wind gusts, periodic aerodynamic oscillations, and measurement or process uncertainties. Among all attitude channels, the roll dynamics are particularly sensitive to such disturbances and are strongly coupled with the overall system behavior. Hence, injecting disturbances only in the roll channel is considered sufficient to evaluate the global disturbance rejection capability of the controller.

Furthermore, the NLSEF gain parameter $\beta_{1}$ of the roll channel is adaptively tuned using the DDPG agent. The parameter $\beta_{1}$ directly governs the nonlinear feedback intensity and significantly influences transient performance indices such as rise time, overshoot, and control effort. Adapting a single, critical control parameter reduces the learning complexity without compromising performance, resulting in improved stability, faster convergence, and practicality of the proposed control framework.
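The composite roll disturbance can be generated sample by sample as sketched below. This is a minimal illustration, assuming a fixed sample time of 1 ms, a 10 s horizon, and a fixed random seed, none of which are specified in the text; the white noise is approximated by drawing one independent standard normal sample per time step.

```python
import math
import random


def roll_disturbance(t, rng, step_time=1.0):
    """Composite roll disturbance: step + sinusoid + scaled Gaussian noise."""
    step = 0.5 if t >= step_time else 0.0    # 0.5 * u(t - 1)
    sinusoid = 0.5 * math.sin(3.0 * t)       # 0.5 * sin(3t)
    noise = 0.0316 * rng.gauss(0.0, 1.0)     # 0.0316 * xi(t), xi ~ N(0, 1)
    return step + sinusoid + noise


rng = random.Random(0)                       # fixed seed (assumption)
dt = 0.001                                   # sample time in s (assumption)
times = [k * dt for k in range(10001)]       # 0 to 10 s (assumed horizon)
d_phi = [roll_disturbance(t, rng) for t in times]
```

Passing the random generator explicitly keeps the disturbance profile repeatable, which is convenient when the same profile must be applied to both controllers.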

The quantitative performance comparison between the conventional ADRC and the proposed DDPG-based ADRC is summarized in Table 2 for all four channels under disturbed operating conditions. Both controllers are evaluated using standard time-domain performance indices, including root mean square error (RMSE), settling time ($T_{s}$), steady-state error ($e_{ss}$), and control effort metrics. Tracking accuracy in each channel is evaluated using RMSE, defined as

$$\mathrm{RMSE} = \sqrt{\frac{1}{T} \int_{0}^{T} \left(y(t) - r(t)\right)^{2} dt},$$

where $y(t)$ is the system output and $r(t)$ is the reference trajectory. A lower RMSE indicates superior trajectory tracking and enhanced disturbance rejection capability. RMS control effort gives a quantitative measure of the average control energy utilized by the system throughout the simulation interval,

$$U_{rms} = \sqrt{\frac{1}{T} \int_{0}^{T} u^{2}(t)\, dt}.$$

Peak control effort is the maximum absolute control magnitude applied, $U_{peak} = \max_{t} |u(t)|$.
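For sampled simulation data, these indices can be approximated numerically. The sketch below uses a rectangle-rule approximation of the integrals and a 2% band for the settling time; the band and the function names are assumptions made for illustration, since the settling criterion is not stated in the text.

```python
import math


def rmse(y, r, dt):
    """Rectangle-rule approximation of sqrt((1/T) * integral of (y - r)^2 dt)."""
    total_time = len(y) * dt
    squared_error = sum((yi - ri) ** 2 for yi, ri in zip(y, r))
    return math.sqrt(squared_error * dt / total_time)


def rms_effort(u, dt):
    """Rectangle-rule approximation of sqrt((1/T) * integral of u^2 dt)."""
    total_time = len(u) * dt
    return math.sqrt(sum(ui ** 2 for ui in u) * dt / total_time)


def peak_effort(u):
    """Maximum absolute control magnitude."""
    return max(abs(ui) for ui in u)


def settling_time(y, r_final, dt, band=0.02):
    """Time after which |y - r_final| stays within band * |r_final|.

    The 2% band is an assumed criterion, not one taken from the paper.
    """
    tolerance = band * abs(r_final)
    last_violation = -1
    for k, yk in enumerate(y):
        if abs(yk - r_final) > tolerance:
            last_violation = k
    return (last_violation + 1) * dt


# Usage on a synthetic first-order response (time constant 0.5 s, unit setpoint).
dt = 0.01
y = [1.0 - math.exp(-k * dt / 0.5) for k in range(1000)]
ts = settling_time(y, 1.0, dt)
```

For this synthetic response the 2% settling time is close to the analytical value of $0.5 \ln 50 \approx 1.96$ s, which provides a simple sanity check of the implementation.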

The tracking response of the baseline controller, illustrated in Figure 4, demonstrates that the ADRC is capable of maintaining stable reference tracking across all four channels, namely altitude, roll, pitch, and yaw. In the altitude channel, the controller achieves an RMSE of 3.30 m with a settling time of 1.81 s. The response is smooth and free from overshoot, indicating satisfactory baseline tracking capability. An RMS control effort of 174.02 N and a peak value of 218.88 N indicate sustained actuator activity during disturbance rejection. The ESO outputs and disturbance estimates of each channel are depicted in Figure 5 and Figure 6, respectively.

Among the attitude channels, the roll channel exhibits the most pronounced transient oscillations, reflected by an RMSE of 0.23 rad and a settling time of 5.0 s. The oscillatory behavior of the roll channel results from the combined effects of strong nonlinear coupling among the rotational dynamics, high sensitivity of angular acceleration to torque disturbances, and rapid disturbance variation.

This results in a high RMS control effort of 191.78 N·m and a peak value of 267.19 N·m, confirming the aggressive compensation depicted in Figure 6. The pitch channel performs better, with an RMSE of 0.08 rad, a settling time of 0.88 s, an RMS control effort of 174.24 N·m, and a peak control effort of 220.01 N·m. The yaw channel achieves an RMSE of 0.002 rad with a settling time of 0.63 s, while requiring an RMS control effort of 152.83 N·m and a peak control effort of 251.85 N·m.

The yaw channel is characterized by relatively low rotational inertia, and it exhibits a particularly high peak torque because small angular deviations require rapid corrective action. Overall, although the conventional ADRC guarantees system stability and satisfactory tracking performance across all channels, the results reveal that its primary drawback is excessive control effort and increased actuator stress, particularly in the roll channel. This motivates the need for an adaptive learning-based enhancement to improve transient performance while simultaneously reducing control aggressiveness.

The DDPG-optimized ADRC tracking responses are depicted in Figure 8. The altitude RMSE remains unchanged at 3.30 m, demonstrating that steady-state accuracy is preserved. However, the RMS control effort decreases from 174.02 N to 166.55 N, representing improved actuation smoothness and energy efficiency. Although the peak control effort increases slightly to 228.21 N, the transient duration is reduced, suggesting sharper but shorter corrective action. This reflects a more efficient redistribution of control energy. The state estimates and disturbance estimates provided by the ESO in each channel with the proposed controller are depicted in Figure 9 and Figure 10, respectively.

Table 2 presents a quantitative comparison between conventional ADRC and DDPG-based ADRC under composite disturbances. In the disturbance-sensitive roll channel, the DDPG-based ADRC demonstrates a marked improvement in transient and steady-state behavior. The settling time is reduced to 0.50 s with a reduced RMSE of 0.10 rad and zero steady-state error. The reduction in settling time highlights the effectiveness of the learned adaptation in enhancing system responsiveness. Furthermore, the reduced control effort indices, characterized by an RMS value of 182.02 N·m and a peak value of 246.81 N·m, indicate that the improved transient performance is achieved without significant increase in actuation demand. From the results, it is clear that the DDPG-assisted tuning mechanism successfully balances disturbance rejection and control smoothness. In particular, the moderate RMS value reflects reduced average control energy expenditure, while the bounded peak torque confirms the absence of aggressive or impulsive control action during disturbance rejection.
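The relative improvements in the roll channel can be verified directly from the Table 2 entries. The following short calculation is a sketch of that arithmetic; the helper name `percent_reduction` is illustrative.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Roll-channel entries from Table 2: (conventional ADRC, DDPG-based ADRC).
roll = {
    "RMSE (rad)": (0.23, 0.10),
    "Ts (s)": (5.00, 0.50),
    "RMS effort (N·m)": (191.78, 182.02),
    "Peak effort (N·m)": (267.19, 246.81),
}

for name, (adrc, ddpg) in roll.items():
    print(f"{name}: {percent_reduction(adrc, ddpg):.1f}% reduction")
```

Evaluated on the tabulated values, this gives reductions of 56.5% in RMSE, 90.0% in settling time, 5.1% in RMS control effort, and 7.6% in peak control effort, in agreement with the settling time and control effort figures reported in the abstract.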

In the pitch channel, the RMSE is slightly reduced from 0.08 rad to 0.07 rad, while the RMS control torque decreases from 174.24 N·m to 167.15 N·m, indicating that improved tracking accuracy is achieved with lower average actuation effort. A slight increase in peak control effort to 231.69 N·m along with reduced RMS control effort indicates that the controller applies a brief but stronger corrective torque during the initial transient phase to accelerate disturbance suppression, while maintaining lower average energy consumption over the full response. In yaw dynamics, RMSE is maintained at 0.002 rad, indicating excellent tracking accuracy and effective disturbance attenuation. Settling time decreases from 0.63 s to 0.56 s, reflecting faster convergence of yaw angle error. The RMS control torque remaining nearly unchanged indicates that the improved transient response is achieved without increasing the average control energy. This reflects efficient gain tuning, where the controller enhances responsiveness while preserving energy neutrality over the full simulation horizon. The peak control effort decreases significantly from 251.85 N·m to 214.39 N·m. This reduction in maximum instantaneous torque suggests that the DDPG-based tuning mitigates abrupt control spikes typically associated with disturbance rejection in yaw motion. The proposed controller reduces actuator stress and potential saturation risk, thereby enhancing hardware implementation feasibility.

The DDPG-based ADRC controller produces higher-frequency variations in the control signal during the transient interval (1–2 s) than the conventional ADRC. This behavior arises primarily from the interaction between the learned policy and the observer dynamics. In particular, the DDPG policy adaptively increases the NLSEF gain to rapidly reduce the tracking error during transients. As a result, the controller becomes highly sensitive to small variations in the error states and to residual estimation errors from the extended state observer (ESO), which is still converging in this interval. These effects cause rapid gain fluctuations and overcompensation, which in turn lead to oscillatory control action. While this improves tracking accuracy, the resulting control signal may not be directly implementable in practical quadrotor systems due to actuator bandwidth and rate limitations. To address this, smoothing techniques such as low-pass filtering or control rate limiting can be incorporated.
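One possible form of such smoothing is a first-order low-pass filter followed by a rate limiter applied to the commanded control signal. The sketch below illustrates the idea; it is not part of the evaluated controller, and the cutoff frequency, rate limit, and function name are placeholder assumptions that would have to be matched to the actual actuator.

```python
import math


def smooth_control(u_raw, dt, cutoff_hz=10.0, max_rate=50.0):
    """Low-pass filter and then rate-limit a sampled control signal.

    cutoff_hz and max_rate (control units per second) are placeholder
    values, not parameters taken from the paper.
    """
    tau = 1.0 / (2.0 * math.pi * cutoff_hz)   # filter time constant
    alpha = dt / (tau + dt)                   # backward-Euler discretization
    max_step = max_rate * dt                  # largest allowed change per sample
    u_filt = u_raw[0]
    u_out = u_raw[0]
    smoothed = [u_out]
    for u in u_raw[1:]:
        u_filt += alpha * (u - u_filt)                 # low-pass stage
        delta = u_filt - u_out
        delta = max(-max_step, min(max_step, delta))   # rate-limit stage
        u_out += delta
        smoothed.append(u_out)
    return smoothed


# Usage on a synthetic chattering command that alternates between -1 and +1.
dt = 0.001
chatter = [1.0 if k % 2 else -1.0 for k in range(200)]
out = smooth_control(chatter, dt)
```

Both stages add lag to the loop, so in practice the cutoff frequency and rate limit would need to be chosen against the bandwidth of the ESO and the closed-loop dynamics.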

Overall, a substantial performance enhancement is observed with the DDPG-based ADRC compared to the conventional ADRC. The roll channel exhibits a reduction in settling time and steady-state error, demonstrating superior disturbance rejection and faster transient recovery. Collectively, these results confirm that adaptive tuning of the NLSEF gain through the DDPG method enhances the inherent robustness of the ADRC framework while significantly improving overall control performance. The proposed DDPG-based adaptation not only accelerates the dynamic response and improves disturbance rejection capability, but also optimizes control energy utilization and reduces excessive actuation demands. Consequently, the controller achieves a balanced trade-off between responsiveness, robustness, and actuator efficiency, thereby strengthening the practical feasibility and real-time deployment potential of the control system.

The proposed DDPG-assisted ADRC framework was evaluated under a representative disturbance profile consisting of step, sinusoidal, and noise components. Although improved tracking and disturbance rejection performance were achieved under the considered conditions, the generalization capability of the learned policy under unseen disturbance scenarios has not been fully investigated in this study. Further validation under varying disturbance conditions and parameter uncertainties will be considered in future work.

While the proposed DDPG-based NLADRC framework demonstrates improved tracking performance and control smoothness, certain limitations should be noted. First, the adaptive mechanism is restricted to tuning only a single NLSEF parameter (β₁), which may limit the achievable performance compared to full multi-parameter optimization of the ADRC structure. Parameters related to the extended state observer (ESO) and other NLSEF gains remain fixed, which may reduce adaptability under significantly varying operating conditions. Future work will focus on extending the proposed approach to multi-parameter and multi-axis tuning, as well as experimental validation on real quadrotor platforms.
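To make the single-parameter restriction concrete, the sketch below shows one way a bounded agent action could be mapped to $\beta_{1}$ while the second NLSEF gain stays fixed. It assumes the commonly used `fal`-based NLSEF form, $u_{0} = \beta_{1}\,\mathrm{fal}(e_{1}, \alpha_{1}, \delta) + \beta_{2}\,\mathrm{fal}(e_{2}, \alpha_{2}, \delta)$, and a linear mapping of an action in $[-1, 1]$ to a gain range; the control law defined earlier in the paper takes precedence, and all numerical values and function names here are hypothetical.

```python
import math


def fal(e, alpha, delta):
    """Standard ADRC nonlinear function: linear near zero, power law outside."""
    if abs(e) <= delta:
        return e / (delta ** (1.0 - alpha))
    return math.copysign(abs(e) ** alpha, e)


def scale_action(action, beta_min, beta_max):
    """Map an agent action in [-1, 1] linearly onto [beta_min, beta_max]."""
    a = max(-1.0, min(1.0, action))          # clip to the valid action range
    return beta_min + 0.5 * (a + 1.0) * (beta_max - beta_min)


def nlsef(e1, e2, beta1, beta2, alpha1=0.75, alpha2=1.5, delta=0.01):
    """Assumed fal-based NLSEF law; alpha and delta values are placeholders."""
    return beta1 * fal(e1, alpha1, delta) + beta2 * fal(e2, alpha2, delta)


# Only beta1 is adapted; beta2 is held fixed (all values are hypothetical).
beta1 = scale_action(0.2, beta_min=10.0, beta_max=50.0)
u0 = nlsef(e1=0.05, e2=-0.1, beta1=beta1, beta2=5.0)
```

Extending the adaptation to further gains would amount to enlarging the action vector and applying the same mapping per parameter, at the cost of a larger search space for the agent.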

The present study mainly focuses on the closed-loop performance of the proposed DDPG-assisted ADRC framework under the considered simulation conditions. Detailed convergence analysis and evaluation across multiple random seeds have not been included in the current work. Further repeatability and convergence investigations will be considered in future work. Although the proposed method is validated through numerical simulations, its practical implementation is feasible due to the structure of the ADRC framework and the use of continuous-control actions generated by the DDPG agent. However, real-world deployment may introduce additional challenges, including sensor noise, actuator constraints, communication delays, and onboard computational limitations. Hardware-in-the-loop (HIL) simulation and real-time experimental validation are necessary to further assess the performance of the proposed method under practical operating conditions. These aspects will be considered as part of future work.

6. Conclusions

The simulation-based comparative evaluation shows that the proposed DDPG-based ADRC achieves tracking accuracy comparable to that of the conventional ADRC across the altitude and attitude channels, with closely matched RMSE and steady-state error indices. Marginal deviations in the error metrics of certain states remain within acceptable bounds and can be attributed to the inherent exploration and adaptive nature of the proposed method. The learning-based framework delivers pronounced improvements in control effort quality. In the roll channel, the proposed DDPG-based ADRC achieves a 56.5% reduction in RMSE (from 0.23 rad to 0.10 rad in Table 2) and 90% faster settling, along with a 5.1% reduction in RMS control effort and a 7.6% suppression of peak control torque, demonstrating substantially improved transient performance with enhanced control efficiency. These enhancements yield smoother actuation, superior actuator efficiency, and reduced control aggressiveness, all while maintaining high performance. Future work will focus on multi-parameter adaptive tuning and hardware validation to ensure scalability under real-world uncertainties.

Author Contributions

Conceptualization, S.S.; methodology, S.S.; software, S.S.; validation, S.S.; investigation, S.S.; data curation, S.S.; writing—original draft preparation, S.S.; writing—review and editing, S.S.; visualization, S.S.; supervision, A.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are available upon request.

Acknowledgments

No external support was received for the preparation of this manuscript beyond the contributions described in the author contribution and funding sections. The research data presented in this study originate from the authors’ own investigations. All figures and graphs were generated using the authors’ own MATLAB R2023b code based on the collected research data. The authors have carefully reviewed and edited the generated outputs and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADRC	Active Disturbance Rejection Control
UAV	Unmanned Aerial Vehicle
DDPG	Deep Deterministic Policy Gradient
NLSEF	Nonlinear State Error Feedback
SMC	Sliding Mode Control
ESO	Extended State Observer
PSO	Particle Swarm Optimization
VTOL	Vertical Take-off and Landing
BAS-SCA	Beetle Antennae Search–Sine Cosine Algorithm
ACO	Ant Colony Optimization
DRL	Deep Reinforcement Learning
NLADRC	Nonlinear Active Disturbance Rejection Control
TD	Tracking Differentiator
RL	Reinforcement Learning
RMSE	Root Mean Square Error
RMS	Root Mean Square

Figure 1. Free-body representation of quadrotor dynamics.

Figure 2. Flowchart for DDPG-based ADRC algorithm for quadrotor roll control.

Figure 3. Episode reward and 20-episode moving-average reward.

Figure 4. Setpoint tracking with ADRC.

Figure 5. ESO outputs with ADRC.

Figure 6. Disturbance estimates with ADRC.

Figure 7. Control effort with ADRC.

Figure 8. Setpoint tracking with DDPG-based ADRC.

Figure 9. ESO outputs with DDPG-based ADRC.

Figure 10. Disturbance estimates with DDPG-based ADRC.

Figure 11. Control effort with DDPG-based ADRC.

Table 1. Quadrotor model parameters.

Parameter	Value	Unit
Mass	m = 0.468	kg
Distance between rotor and center of mass of quadrotor	l = 0.225	m
Lift constant	k = 2.95 × 10⁻⁶
Drag constant	b = 1.14 × 10⁻⁷
Inertia moment of rotor	I_M = 3.357 × 10⁻⁵	kg·m²
x-axis moment of inertia	I_xx = 4.856 × 10⁻³	kg·m²
y-axis moment of inertia	I_yy = 4.856 × 10⁻³	kg·m²
z-axis moment of inertia	I_zz = 8.801 × 10⁻³	kg·m²

Table 2. Quantitative comparison between conventional ADRC and DDPG-based ADRC under composite disturbances.

Channel	Controller	RMSE	Ts (s)	RMS Control Effort	Peak Control Effort
Altitude	ADRC	3.30 m	1.81	174.02 N	218.88 N
Altitude	DDPG-ADRC	3.30 m	1.81	166.55 N	228.21 N
Roll	ADRC	0.23 rad	5.00	191.78 N·m	267.19 N·m
Roll	DDPG-ADRC	0.10 rad	0.50	182.02 N·m	246.81 N·m
Pitch	ADRC	0.08 rad	0.88	174.24 N·m	220.01 N·m
Pitch	DDPG-ADRC	0.07 rad	0.87	167.15 N·m	231.69 N·m
Yaw	ADRC	0.002 rad	0.63	152.83 N·m	251.85 N·m
Yaw	DDPG-ADRC	0.002 rad	0.56	152.84 N·m	214.39 N·m
