Article

Precision Docking of a Foldable Quadrotor on a Wheel-Legged Robot via CFNTSM with GFA-FEO and FiLM-SAC Deep Reinforcement Learning

1
2011 College, Nanjing Tech University, Nanjing 211816, China
2
College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, China
*
Author to whom correspondence should be addressed.
Drones 2026, 10(5), 378; https://doi.org/10.3390/drones10050378
Submission received: 13 April 2026 / Revised: 8 May 2026 / Accepted: 11 May 2026 / Published: 14 May 2026

Highlights

What are the main findings?
  • An air–ground cooperative robotic system is presented, featuring a foldable quadrotor and a small wheel-legged robot that employs electro-permanent magnets for autonomous deployment and retrieval.
  • A hierarchical control framework integrating a continuous fast nonsingular terminal sliding mode (CFNTSM) controller, a gait-frequency-aware finite-time extended observer (GFA-FEO), and a feature-wise linear modulation soft actor–critic (FiLM-SAC) deep reinforcement learning policy is developed, enabling sub-centimetre docking precision (<10 mm) and robust payload-adaptive takeoff under a twofold mass increase.
What are the implications of the main findings?
  • This framework presents a promising and comprehensively modeled approach toward autonomous air–ground robotic coordination for factory inspection, geographical surveying, and disaster-response missions.
  • The proposed control method resolves the severe control-authority degradation during in-flight folding and compensates for gait-induced periodic vibrations.

Abstract

Deploying unmanned aerial vehicles (UAVs) cooperatively with legged robots for disaster response and inspection requires autonomous docking on miniature walking platforms. This study addresses the problem of landing a foldable quadrotor onto the back of a trotting wheel-legged robot (300 × 180 mm) and subsequently taking off while carrying it as a payload. Four tightly coupled challenges distinguish this task from conventional mobile-platform landing: (i) an extremely small landing surface, (ii) gait-induced periodic vibrations at 2.5 Hz, (iii) continuous platform translation at 0.3–0.8 m/s, and (iv) surface docking that requires simultaneous position and attitude matching rather than mere point tracking. The proposed framework comprises four components: (1) a novel single-servo crank-rocker folding mechanism that reduces the folded body footprint by 48.5% and the maximum linear dimension from 590 mm to 309 mm (↓47.6%) compared with the prior dual-servo design; (2) a staged Continuous Fast Nonsingular Terminal Sliding Mode (CFNTSM) controller combined with a Gait-Frequency-Aware Finite-time Extended Observer (GFA-FEO); (3) a Feature-wise Linear Modulation Soft Actor-Critic (FiLM-SAC) residual reinforcement-learning policy conditioned on physical states and mission phase, with an adaptive trust weight $\lambda(t)$; and (4) a payload-adaptive takeoff strategy with parameter hot-switching to handle the twofold mass increase. Extensive Monte Carlo simulations and ablation studies across three experiment groups demonstrate that the proposed hierarchical framework achieves sub-centimetre (<10 mm) position accuracy and <3° attitude matching on a walking platform. Quantitatively, the full method reduces docking RMSE by 42% relative to the model-based CFNTSM + GFA-FEO controller without residual RL (4.2 vs. 7.2 mm) and reduces post-lock takeoff RMSE by 63% through FEO hot-switching (16.2 vs. 44.2 mm).

1. Introduction

Disaster response, post-earthquake reconnaissance, hazardous industrial inspection, and large-scale geographic surveys all require the terrain-traversing capability of legged robots and the three-dimensional mobility of unmanned aerial vehicles (UAVs). Recent air–ground robotic systems exploit this complementarity for precision farming, collaborative perception, navigation, and cooperative localization [1,2,3,4]. However, most such systems treat aerial and ground robots as independent cooperative agents rather than integrated bodies that must physically dock, lock, and relaunch. An attractive operational concept involves a quadrotor that carries a miniature wheel-legged robot over obstacles, deploys it with precision, and subsequently re-docks and retrieves it (Figure 1). During inactive phases, the UAV remains stowed on the robot’s back with minimal power consumption.
Our system pairs a foldable quadrotor with a compact wheel-legged robot (back surface 300 × 180  mm). Docking is achieved via electro-permanent magnets (EPMs): the robot’s back carries a steel plate, while the UAV’s underside mounts four corner EPMs. A foldable airframe is essential because (i) the deployed propeller span (∼420 mm) far exceeds the 180 mm back width, and protruding blades would collide with swinging legs; (ii) the folded UAV forms a compact backpack with minimal drag; and (iii) a reduced footprint lowers collision risk in confined spaces.
This scenario introduces four tightly coupled challenges that distinguish it from conventional UAV landing settings:
1. Tiny landing surface with gait-induced vibration. The landing surface is extremely compact (EPM tolerance ±30 mm), while the trot gait at 2.5 Hz produces ±5 mm vertical and ±2.0°/±1.5° roll/pitch oscillations; the robot walks at 0.3–0.8 m/s throughout docking.
2. Surface docking, not point tracking. The UAV must match a rigid, tilting surface rather than merely track a point, with tolerances $\|\Delta p_{xy}\| < 10$ mm, $|\Delta\phi|, |\Delta\theta| < 3^\circ$, and $|\dot{z}| < 0.1$ m/s (Section 2.5).
3. Fold-induced control-authority loss. Folding substantially reduces motor moment arms and introduces transient zero-crossing singularities, degrading attitude authority precisely when millimetre-level alignment is required (Section 2.2.2).
4. Payload-adaptive takeoff with mass doubling. At EPM lock the coupled mass approximately doubles, the centre of mass shifts, and the inertia tensor changes abruptly while the arms must unfold to restore authority.
Prior work has addressed several parts of this problem. Air–ground robot teams have been studied for collaborative perception, occlusion-aware navigation, GNSS-challenged localization, formation control, and task allocation [1,2,3,4,5,6,7,8]. UAV landing on moving platforms has also been investigated using visual servoing, visual–inertial guidance, shipboard landing, and MPC-based methods [9,10,11,12]. These studies provide the cooperative and target-tracking background, but they usually assume loose air–ground coupling or comparatively large rigid landing areas rather than a small gait-excited robot back that must become mechanically locked to the UAV.
Aerial manipulation, grasping, and perching show how contact can extend the role of UAVs beyond free flight [13,14,15,16]. In parallel, morphing and foldable UAVs demonstrate that mechanical reconfiguration can improve access, agility, or mission versatility while introducing configuration-dependent dynamics [17,18,19,20,21,22,23,24,25]. However, these works mainly address branches, walls, gaps, perching, or free-flight morphing; they do not link fold-angle-dependent moment arms to surface docking tolerances and post-lock coupled-body dynamics.
Learning-enhanced UAV control and simulation have recently improved agile flight, residual compensation, and robust tracking under disturbances [26,27,28,29,30,31,32,33,34,35,36,37,38]. Yet existing learning and observer designs rarely condition residual authority simultaneously on fold angle, gait state, and mission phase, and they do not address the observer reset caused by a sudden mass/inertia jump after EPM locking. The missing link is therefore a unified formulation for a foldable quadrotor that must dock on a small vibrating walking-robot’s back and then take off with the robot as payload.
These gaps motivate four objectives: (G1) precision surface docking under fold-induced moment-arm degradation; (G2) joint exploitation of the periodic gait structure and fold-angle-dependent authority; (G3) physically conditioned, phase-aware residual RL with an adaptive trust weight; and (G4) payload-adaptive takeoff after EPM locking under an abrupt mass/inertia change.
The main contributions are:
1. Fold-induced coupling analysis and surface-docking formulation (G1). We derive the closed-form dependence of $B(\alpha)$ and $J(\alpha)$ on fold angle $\alpha$, revealing how a $121.5^\circ$ shuriken fold degrades $\sigma_2(B)$ by 48% and transiently drives motor moment arms through zero. To the best of our knowledge, this is the first fold-angle-dependent surface-docking model for a foldable quadrotor landing on a walking platform.
2. Staged CFNTSM with GFA-FEO (G2). A gait-frequency-aware finite-time extended observer fuses velocity differencing, ESO, and a periodic internal model tuned to $f_g$; the CFNTSM operates with FSM-scheduled gains and activates enhanced-gain mode during terminal descent.
3. FiLM-SAC residual RL with adaptive $\lambda(t)$ (G3). A residual policy conditioned via FiLM on fold state, gait parameters, and mission phase is modulated by an adaptive trust weight $\lambda(t)$ based on tracking error, observer residual, and FSM transition signals, extending fixed-weight residual formulations.
4. Payload-adaptive takeoff with FEO hot-switching (G4). Upon EPM locking, an FEO hot-switch protocol updates the plant model and resets observer states; the FiLM-SAC residual bridges the re-convergence blind window for stable takeoff from the stowed state.
The rest of this paper is organised as follows. Section 2 derives the dynamic model. Section 3 presents the three-layer controller and FSM. Section 4 details PSO optimisation. Section 5 reports simulation results. Section 6 discusses findings and limitations. Section 7 concludes.

2. System Description and Dynamic Modelling

This section presents the hardware configuration of the air–ground cooperative system, derives the folding kinematics and fold-dependent moment arms of the single-servo mechanism, and establishes the dynamic models for the foldable quadrotor, the wheel-legged robot, and the coupled body formed upon docking.
The model equations are derived by combining geometric mechanism closure with rigid-body dynamics. The folding linkage is first reduced to a planar four-bar loop-closure problem, yielding Freudenstein’s relation and the fold angle  α as the mechanism configuration variable. The motor locations are then obtained from the arm-pivot geometry, and the control moment arms are derived from the cross product τ = r × F . The quadrotor translation and rotation equations follow the Newton–Euler formulation in the body frame, with  B ( α ) and J ( α ) updated from the fold geometry. The walking platform is represented by a kinematic gait-excitation model whose dominant frequencies are set by stride length and walking speed, while the post-docking body is obtained by mass aggregation and the parallel-axis theorem. Ground effect is included as an empirical near-surface thrust correction during docking and takeoff.

2.1. System Overview and Coordinate Frames

The system comprises two agents: a foldable quadrotor ( m u = 2.50  kg) and a wheel-legged robot ( m d = 2.50  kg, back surface 300 × 180  mm). The folding mechanism and hardware configuration are detailed in Section 2.2; Figure 2 shows the prior dual-servo design alongside the proposed crank-rocker design for direct comparison.
Three right-handed coordinate frames are used throughout (Figure 3). The world frame { W } is Earth-fixed with the z-axis pointing upward (ENU convention). The body frame { B } has its origin at the UAV centre of mass, z B along the thrust axis, x B pointing toward arm 1 (front-right in the deployed X-configuration); Euler angles η = [ ϕ , θ , ψ ] T (roll, pitch, yaw) describe the rotation from { W } to { B } via the ZYX convention. The dog frame { D } is centred on the robot’s back surface, with x D along the walking direction and z D normal to the back surface.

2.2. Single-Servo Crank-Rocker Folding Mechanism

The folding mechanism supplies the configuration variable used by the allocation and inertia models. The single-servo architecture defines the geometric constraints, the four-bar loop closure maps servo motion to fold angle, and the resulting motor positions determine the moment arms available to the controller.

2.2.1. Mechanism Design

The folding mechanism adopts the single-servo four-bar crank-rocker architecture demonstrated by Tuna et al. [22], who validated a 0.6 s fold/deploy cycle in the FOLLY quadcopter. However, FOLLY’s flight experiments were conducted exclusively in the fully deployed configuration; the effects of in-flight morphing on dynamics and control authority were not investigated. The present work fills this gap by deriving the fold-angle-dependent allocation matrix  B ( α ) and embedding  α into both the dynamic model and the control design.
The airframe uses a 150 × 150 mm square base plate with four arms ($L_a = 116$ mm) attached at corner pivots. A central servo drives an 18/62-tooth gear pair ($n_g = 3.44{:}1$ reduction), a cross-shaped crank ($r_c = 45$ mm), and four connecting rods ($l_r = 75.93$ mm) that synchronise all arms. Compared to the prior dual-servo design (Figure 2a), the single-servo mechanism reduces the folded body footprint from 88,500 mm² to 45,546 mm² (−48.5%), shrinks the maximum linear dimension from 590 mm to 309 mm (−47.6%), and eliminates front/rear servo synchronisation. The $n_g = 3.44{:}1$ gear reduction is selected primarily to preserve torque margin for the 2.50 kg airframe and the four-arm linkage, rather than to pursue faster morphing than FOLLY. In the commanded fold/deploy profile used by the simulations, the servo traverses $205^\circ$ over $T_{\mathrm{fold}} = 0.6$ s, so the mechanism retains the same nominal deployment-time benchmark while trading excess speed for higher output torque at the crank. The controller uses the measured/commanded fold angle $\alpha(t)$ rather than assuming instantaneous deployment; if hardware load tests require a slower profile, the same model remains valid after updating $T_{\mathrm{fold}}$.
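The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the numbers stated in the text; the variable and function names are illustrative and not part of the original design files.

```python
# Values quoted in the text; names are illustrative only.
TEETH_PINION, TEETH_GEAR = 18, 62
SERVO_TRAVEL_DEG = 205.0

# Gear reduction between the servo and the cross-shaped crank.
gear_ratio = TEETH_GEAR / TEETH_PINION
# Crank rotation that results from the full commanded servo travel.
crank_rotation_deg = SERVO_TRAVEL_DEG / gear_ratio


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


footprint_cut = reduction_percent(88_500.0, 45_546.0)  # folded footprint, mm^2
length_cut = reduction_percent(590.0, 309.0)           # max linear dimension, mm

print(f"gear ratio       : {gear_ratio:.2f}:1")
print(f"crank rotation   : {crank_rotation_deg:.2f} deg")
print(f"footprint change : -{footprint_cut:.1f} %")
print(f"length change    : -{length_cut:.1f} %")
```

With the exact tooth ratio 62/18, the crank rotation evaluates to roughly 59.5°, which agrees with the rounded value quoted in Section 2.2.2 to within rounding of the gear ratio.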

2.2.2. Kinematics, Fold Geometry, and Moment Arms

Each arm’s motion is governed by a planar four-bar linkage (ground link $G = 88.4$ mm, crank $r_c = 45$ mm, coupler $l_r = 75.93$ mm, output link $d_e = 32.16$ mm). The loop closure yields Freudenstein’s relation:
$$K_1 \cos\phi - K_2 \cos\theta_c + K_3 = \cos(\theta_c - \phi), \tag{1}$$
with $K_1 = G/r_c$, $K_2 = G/d_e$, and $K_3 = (d_e^2 - l_r^2 + r_c^2 + G^2)/(2 r_c d_e)$. The mechanism satisfies the Grashof condition ($d_e + G < r_c + l_r$, i.e., 120.56 mm against 120.93 mm, a margin of about 0.4 mm). Because this margin is small, the prototype does not rely on the Grashof inequality alone for mechanical robustness. The practical design uses a finite mechanical end-stop, clearance at the pin joints, and a slew-limited servo command to keep the linkage away from a high-speed toggle crossing. Thermal expansion over the expected indoor/outdoor operating range is much smaller than the link lengths and is absorbed by the joint clearances, while any rise in fold-servo current or a mismatch between commanded and measured $\alpha$ can be used as a jamming indicator in the hardware implementation. A crank rotation of $59.55^\circ$ (servo travel $205^\circ$, 76% utilisation) drives each arm through $\Delta\alpha = 121.5^\circ$: from the deployed X-configuration ($\alpha = +45^\circ$) through a transient +-pattern ($\alpha \approx 0^\circ$) to the fully folded shuriken at $\alpha_{\mathrm{fold}} = -76.5^\circ$ (Figure 4). The propeller-tip bounding box contracts from 420 × 420 mm to 309 × 309 mm (−45.6% area); the folding-body envelope reaches 213 × 213 mm when folding propellers are considered. The mechanical end-stop is set by motor-housing–pivot-shaft contact at $\alpha = -76.6^\circ$.
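For readers who wish to evaluate Freudenstein’s relation numerically, the sketch below solves it for the output angle at a given crank angle by rewriting the relation as $A\cos\phi + B\sin\phi = C$. It uses the link lengths quoted above; the function names, the choice of returning both assembly branches, and the angle reference (both angles measured from the ground link) are assumptions made for illustration and may differ from the convention used in the prototype CAD model.

```python
import math

# Link lengths quoted in the text (mm).
G, R_C, L_R, D_E = 88.4, 45.0, 75.93, 32.16

K1 = G / R_C
K2 = G / D_E
K3 = (D_E**2 - L_R**2 + R_C**2 + G**2) / (2.0 * R_C * D_E)

# Grashof margin: (sum of the two intermediate links) - (shortest + longest).
grashof_margin_mm = (R_C + L_R) - (D_E + G)


def residual(theta_c: float, phi: float) -> float:
    """Left-hand side minus right-hand side of Freudenstein's relation."""
    return (K1 * math.cos(phi) - K2 * math.cos(theta_c) + K3
            - math.cos(theta_c - phi))


def output_angles(theta_c: float) -> list[float]:
    """Solve the relation for phi (rad) at crank angle theta_c (rad).

    Returns both assembly branches, or [] if the loop cannot close.
    """
    # Expand cos(theta_c - phi) and collect terms: A cos(phi) + B sin(phi) = C.
    a = K1 - math.cos(theta_c)
    b = -math.sin(theta_c)
    c = K2 * math.cos(theta_c) - K3
    r = math.hypot(a, b)
    if r == 0.0 or abs(c) > r:
        return []
    base = math.atan2(b, a)
    delta = math.acos(c / r)
    return [base + delta, base - delta]


if __name__ == "__main__":
    print(f"Grashof margin: {grashof_margin_mm:.2f} mm")
    for phi in output_angles(math.radians(90.0)):
        print(f"phi = {math.degrees(phi):8.2f} deg, "
              f"residual = {residual(math.radians(90.0), phi):.1e}")
```

The function returns an empty list for crank angles at which the relation has no real solution, which is one simple way to flag configurations outside the reachable range.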
The motor positions and moment arms are computed directly from the arm-pivot geometry. Arm $k$ pivots at $P_k$ on the 125 × 125 mm pivot square and carries its motor at radius $L_a$. The motor position is:
$$p_{m,k}(\alpha_k) = P_k + L_a \begin{bmatrix} \cos\alpha_k \\ \sin\alpha_k \end{bmatrix}, \tag{2}$$
where $a_f = 125/2 = 62.5$ mm is the half-side of the pivot square, so that the pivot coordinates $a_{f,x,k}$ and $a_{f,y,k}$ take the values $\pm a_f$. The roll and pitch moment arms follow from $\tau = r \times F$:
$$l_{\phi,k}(\alpha_k) = a_{f,y,k} + L_a \sin\alpha_k, \qquad l_{\theta,k}(\alpha_k) = -\left(a_{f,x,k} + L_a \cos\alpha_k\right). \tag{3}$$
At deployment ($\alpha = 45^\circ$) all motors have equal moment arms $|l_{\phi,k}| = 144.8$ mm; at full fold the average drops to ∼70 mm (−52%), and individual motors transiently pass through zero moment arms at intermediate angles ($\alpha \approx -33^\circ$), creating control-authority singularities that necessitate the gain-scheduled controller of Section 3.5 (Figure 5).
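The moment-arm expressions can be evaluated directly to locate the zero crossing. The sketch below does so for a single arm whose pivot is assumed to sit at $(+a_f, +a_f)$; this pivot choice and the function names are illustrative assumptions, and the other three arms follow by symmetry.

```python
import math

A_F = 62.5   # mm, half-side of the pivot square (from the text)
L_A = 116.0  # mm, arm length (from the text)


def roll_arm(alpha_deg: float, pivot_y: float = A_F) -> float:
    """Roll moment arm l_phi = a_{f,y} + L_a * sin(alpha), in mm."""
    return pivot_y + L_A * math.sin(math.radians(alpha_deg))


def pitch_arm(alpha_deg: float, pivot_x: float = A_F) -> float:
    """Pitch moment arm l_theta = -(a_{f,x} + L_a * cos(alpha)), in mm."""
    return -(pivot_x + L_A * math.cos(math.radians(alpha_deg)))


# Fold angle at which the roll arm of this motor vanishes:
# a_f + L_a * sin(alpha) = 0  =>  alpha = asin(-a_f / L_a).
alpha_zero_deg = math.degrees(math.asin(-A_F / L_A))

if __name__ == "__main__":
    for alpha in (45.0, 0.0, alpha_zero_deg, -76.5):
        print(f"alpha = {alpha:7.2f} deg  "
              f"l_phi = {roll_arm(alpha):7.2f} mm  "
              f"l_theta = {pitch_arm(alpha):8.2f} mm")
```

With the stated $a_f$ and $L_a$, the deployed roll arm evaluates to about 144.5 mm, close to the reported 144.8 mm, and the zero crossing falls near −32.6°, consistent with the intermediate-angle singularity described above.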

2.3. Foldable Quadrotor Dynamics

The following standard assumptions are adopted for the foldable-quadrotor model:
  • A1. The airframe (central plate, arms, and motors) is rigid; structural deformation during folding is negligible.
  • A2. Roll and pitch angles are bounded in $(-\pi/2, \pi/2)$, and the yaw angle is bounded in $(-\pi, \pi)$.
  • A3. External disturbances (wind, ground effect, and gait coupling) and their time derivatives are bounded: $\|d_{\mathrm{ext}}\| \le \bar{d}$, $\|\dot{d}_{\mathrm{ext}}\| \le \bar{d}_1$, $\|\tau_{\mathrm{ext}}\| \le \bar{\tau}$, $\|\dot{\tau}_{\mathrm{ext}}\| \le \bar{\tau}_1$.

2.3.1. Thrust, Attitude, and Translational Dynamics

Each rotor $k$ ($k = 1, \ldots, 4$) spins at angular speed $\Omega_k$ and produces a thrust force and a reactive drag torque along $z_B$:
$$f_k = k_f \Omega_k^2, \qquad \tau_{d,k} = k_\tau \Omega_k^2, \tag{4}$$
where $k_f$ and $k_\tau$ are the thrust and torque coefficients, respectively. The total thrust is $F = \sum_{k=1}^{4} f_k$. The torque-to-thrust ratio is $c_\tau = k_\tau / k_f$. Rotors 1 and 3 spin clockwise (CW), rotors 2 and 4 counter-clockwise (CCW); their spin-direction sign is $\sigma_k \in \{-1, +1, -1, +1\}$, so that the net yaw torque vanishes in hover.
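For the attitude kinematics, let $\eta = [\phi, \theta, \psi]^T$ denote the ZYX Euler angles (roll, pitch, yaw) from $\{W\}$ to $\{B\}$. The corresponding rotation matrix $R \in SO(3)$ that maps body-frame vectors to the world frame is:
$$R(\eta) = \begin{bmatrix} c_\theta c_\psi & s_\phi s_\theta c_\psi - c_\phi s_\psi & c_\phi s_\theta c_\psi + s_\phi s_\psi \\ c_\theta s_\psi & s_\phi s_\theta s_\psi + c_\phi c_\psi & c_\phi s_\theta s_\psi - s_\phi c_\psi \\ -s_\theta & s_\phi c_\theta & c_\phi c_\theta \end{bmatrix}, \tag{5}$$
where $c_{(\cdot)} = \cos(\cdot)$ and $s_{(\cdot)} = \sin(\cdot)$.
The relation between the Euler-angle rates $\dot{\eta}$ and the body angular velocity $\omega = [p, q, r]^T$ is:
$$\dot{\eta} = W^{-1}(\eta)\, \omega, \tag{6}$$
with
$$W^{-1} = \begin{bmatrix} 1 & s_\phi t_\theta & c_\phi t_\theta \\ 0 & c_\phi & -s_\phi \\ 0 & s_\phi / c_\theta & c_\phi / c_\theta \end{bmatrix}, \tag{7}$$
where $t_\theta = \tan\theta$. Note that $W^{-1}$ is singular at $\theta = \pm\pi/2$, which is excluded by Assumption A2.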
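As a quick numerical check of the attitude kinematics, the following sketch implements the ZYX rotation matrix and the Euler-rate map in plain Python. The function names and the singularity guard threshold are illustrative assumptions; they are not taken from the flight code.

```python
import math


def rotation_zyx(phi: float, theta: float, psi: float) -> list[list[float]]:
    """Body-to-world rotation matrix for ZYX Euler angles (roll, pitch, yaw)."""
    cph, sph = math.cos(phi), math.sin(phi)
    cth, sth = math.cos(theta), math.sin(theta)
    cps, sps = math.cos(psi), math.sin(psi)
    return [
        [cth * cps, sph * sth * cps - cph * sps, cph * sth * cps + sph * sps],
        [cth * sps, sph * sth * sps + cph * cps, cph * sth * sps - sph * cps],
        [-sth, sph * cth, cph * cth],
    ]


def euler_rates(phi: float, theta: float,
                omega: tuple[float, float, float]) -> tuple[float, float, float]:
    """Map body rates (p, q, r) to Euler-angle rates via W^{-1}(eta)."""
    cth = math.cos(theta)
    if abs(cth) < 1e-6:  # guard threshold is an assumption of this sketch
        raise ValueError("pitch too close to +/- 90 deg (excluded by A2)")
    p, q, r = omega
    sph, cph, tth = math.sin(phi), math.cos(phi), math.tan(theta)
    return (
        p + sph * tth * q + cph * tth * r,
        cph * q - sph * r,
        (sph * q + cph * r) / cth,
    )


if __name__ == "__main__":
    R = rotation_zyx(0.1, -0.2, 0.3)
    thrust_axis_world = [row[2] for row in R]  # R e_3, the thrust direction
    print("thrust axis in world frame:", thrust_axis_world)
    print("Euler rates:", euler_rates(0.1, -0.2, (0.0, 0.1, 0.0)))
```

The third column of $R$ gives the thrust direction in the world frame, which is the quantity that enters the translational dynamics.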
For translation, let $p = [x, y, z]^T$ denote the UAV centre-of-mass position in $\{W\}$. The translational dynamics are:
$$m \ddot{p} = R(\eta)\, F e_3 - m g e_3 + d_{\mathrm{ext}}, \tag{8}$$
where $m$ is the (possibly time-varying) system mass, $F$ the total thrust from (4), $R$ the rotation matrix (5), and $d_{\mathrm{ext}}$ the external disturbance satisfying Assumption A3.
Since the arms rotate in the horizontal plane, all propellers remain vertical. Thrust efficiency remains constant ( η T = 1.0 ) for all fold angles α , indicating that folding does not compromise vertical thrust generation.

2.3.2. Rotational Dynamics with Variable Moment Arms

Defining the angular-momentum vector $G(\alpha) = J(\alpha)\,\omega$, the rotational dynamics follow from Euler’s equation in the body frame:
$$\dot{G}(\alpha) = J(\alpha)\,\dot{\omega} + \dot{J}(\alpha)\,\omega = -\omega \times J(\alpha)\,\omega + \tau_{\mathrm{ctrl}} + \tau_G + \tau_D + \tau_{\mathrm{ext}}, \tag{9}$$
where the terms on the right-hand side are, respectively, the gyroscopic cross-coupling, the control torque from differential thrust, the rotor gyroscopic torque, the aerodynamic drag torque, and the external disturbance torque.
Remark 1. 
When the fold angle $\alpha$ varies in flight, the inertia tensor changes with time and $\dot{J} \neq 0$. In the proposed docking manoeuvre, the folding motion is scheduled during the early descent phase and completed before the terminal capture window, so that $\dot{\alpha} = 0$ during the critical landing segment and $\dot{J} = 0$. When folding does occur in flight (e.g., during cruise), the ratio $\|\dot{J}\omega\| / \|J\dot{\omega}\|$ is small because the fold transition (≈0.6 s) is much slower than the rotational dynamics; this term is therefore absorbed into $\tau_{\mathrm{ext}}$ and handled by the observer (Section 3).
Modelling each arm-plus-motor assembly as a point mass $m_a$ at position $p_{m,k}$ from (2), the inertia tensor about the body-frame origin is:
$$J(\alpha) = J_0 + \sum_{k=1}^{4} m_a \left( \|p_{m,k}(\alpha)\|^2 I_3 - p_{m,k}(\alpha)\, p_{m,k}^T(\alpha) \right), \tag{10}$$
where $J_0$ is the central-body inertia (batteries, electronics, frame plate) and $m_a$ is the lumped mass of one arm assembly (arm tube, motor, propeller, ESC). Owing to the four-fold shuriken symmetry ($\alpha_k = \alpha_1 + (k-1) \cdot 90^\circ$), the products of inertia vanish for all $\alpha$ and $J(\alpha)$ remains diagonal:
$$J(\alpha) = \mathrm{diag}\left( J_{xx}(\alpha),\ J_{yy}(\alpha),\ J_{zz}(\alpha) \right), \tag{11}$$
with $J_{xx} = J_{yy}$ (roll and pitch inertias are equal) due to the $90^\circ$ rotational symmetry. Since all arm masses lie in the $z_B = 0$ plane, the arm contribution to $J_{zz}(\alpha)$ equals the sum of the arm contributions to $J_{xx}(\alpha)$ and $J_{yy}(\alpha)$ by the perpendicular-axis theorem, so $J_{zz}$ likewise varies with $\alpha$.
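The symmetry properties of the arm inertia can be checked numerically. The sketch below evaluates the point-mass sum for four arms whose pivots and directions are obtained by successive 90° rotations of arm 1; the arm mass `M_ARM` is a hypothetical placeholder (the actual value belongs to the parameter table), and the central-body term $J_0$ is omitted.

```python
import math

A_F, L_A = 0.0625, 0.116  # m, pivot half-side and arm length (from the text)
M_ARM = 0.15              # kg, hypothetical lumped arm mass for illustration


def motor_positions(alpha_deg: float) -> list[tuple[float, float]]:
    """Planar motor positions for arms k = 1..4 at fold angle alpha."""
    points = []
    for k in range(4):
        rot = math.radians(90.0 * k)
        # Pivot of arm k: the corner (a_f, a_f) rotated by k * 90 deg.
        px = A_F * (math.cos(rot) - math.sin(rot))
        py = A_F * (math.sin(rot) + math.cos(rot))
        ang = math.radians(alpha_deg) + rot  # alpha_k = alpha_1 + (k-1)*90 deg
        points.append((px + L_A * math.cos(ang), py + L_A * math.sin(ang)))
    return points


def arm_inertia(alpha_deg: float) -> tuple[float, float, float, float]:
    """Arm contribution (Jxx, Jyy, Jzz, Jxy) of the point-mass sum, z = 0."""
    jxx = jyy = jzz = jxy = 0.0
    for x, y in motor_positions(alpha_deg):
        jxx += M_ARM * y * y            # ||p||^2 - x^2
        jyy += M_ARM * x * x            # ||p||^2 - y^2
        jzz += M_ARM * (x * x + y * y)  # ||p||^2
        jxy -= M_ARM * x * y            # off-diagonal product of inertia
    return jxx, jyy, jzz, jxy


if __name__ == "__main__":
    for alpha in (45.0, 0.0, -76.5):
        jxx, jyy, jzz, jxy = arm_inertia(alpha)
        print(f"alpha={alpha:6.1f}  Jxx={jxx:.5f}  Jyy={jyy:.5f}  "
              f"Jzz={jzz:.5f}  Jxy={jxy:.1e}")
```

Under these assumptions the product of inertia stays at numerical zero, the roll and pitch terms coincide, and the arm contribution shrinks as the arms fold.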
The control torque $\tau_{\mathrm{ctrl}}$ is produced by differential thrust via the allocation matrix:
$$\tau_{\mathrm{ctrl}}(\alpha) = B(\alpha)\, f, \qquad f = [f_1, f_2, f_3, f_4]^T, \tag{12}$$
where the allocation matrix $B(\alpha)$ encodes the $\alpha$-dependent moment arms from (3):
$$B(\alpha) = \begin{bmatrix} y_{m,1} & y_{m,2} & y_{m,3} & y_{m,4} \\ -x_{m,1} & -x_{m,2} & -x_{m,3} & -x_{m,4} \\ \sigma_1 c_\tau & \sigma_2 c_\tau & \sigma_3 c_\tau & \sigma_4 c_\tau \end{bmatrix}, \tag{13}$$
with $c_\tau = k_\tau / k_f$ the torque-to-thrust ratio and $\sigma_k \in \{-1, +1, -1, +1\}$ the propeller spin-direction sign (rotors 1, 3 CW; rotors 2, 4 CCW).
The reactive gyroscopic torque due to spinning rotors is:
$$\tau_G = \sum_{k=1}^{4} J_r\, (\omega \times e_3)\, \sigma_k \Omega_k, \tag{14}$$
where J r is the rotor moment of inertia about its spin axis. This term is small in near-hover but becomes non-negligible during aggressive attitude manoeuvres or rapid motor speed changes.
The body drag force from rotor downwash and the associated drag torque are modelled as:
$$\tau_D = -k_d\, \mathrm{diag}(1, 1, 1)\, \omega, \tag{15}$$
where k d > 0 is the lumped aerodynamic drag coefficient.
As α changes, B ( α ) varies continuously, introducing roll–pitch cross-coupling that compounds the moment-arm degradation analysed in Section 2.2.2.
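To illustrate how the allocation matrix changes with fold angle, the sketch below assembles $B(\alpha)$ row by row from the motor positions, with the roll row given by $y_{m,k}$, the pitch row by $-x_{m,k}$ (from $\tau = r \times F$), and the yaw row by $\sigma_k c_\tau$. The value of `C_TAU` and the pivot layout are assumptions made for illustration only.

```python
import math

A_F, L_A = 0.0625, 0.116  # m (from the text)
C_TAU = 0.016             # m, hypothetical torque-to-thrust ratio
SIGMA = (-1, 1, -1, 1)    # spin-direction signs for rotors 1..4


def allocation_matrix(alpha_deg: float) -> list[list[float]]:
    """Return the 3x4 matrix B(alpha) mapping thrusts to body torques."""
    row_roll, row_pitch = [], []
    for k in range(4):
        rot = math.radians(90.0 * k)
        ang = math.radians(alpha_deg) + rot
        x = A_F * (math.cos(rot) - math.sin(rot)) + L_A * math.cos(ang)
        y = A_F * (math.sin(rot) + math.cos(rot)) + L_A * math.sin(ang)
        row_roll.append(y)    # tau_x = sum(y_k * f_k)
        row_pitch.append(-x)  # tau_y = -sum(x_k * f_k)
    row_yaw = [s * C_TAU for s in SIGMA]
    return [row_roll, row_pitch, row_yaw]


def control_torque(alpha_deg: float, thrusts: list[float]) -> list[float]:
    """Evaluate tau_ctrl = B(alpha) f for a thrust vector f (N)."""
    return [sum(b * f for b, f in zip(row, thrusts))
            for row in allocation_matrix(alpha_deg)]


if __name__ == "__main__":
    for alpha in (45.0, -33.0, -76.5):
        roll_row = allocation_matrix(alpha)[0]
        print(f"alpha={alpha:6.1f}  roll arms (m):",
              [round(v, 4) for v in roll_row])
    print("torque for equal thrusts:", control_torque(45.0, [6.0] * 4))
```

Equal thrusts on all four rotors produce zero net torque at any fold angle in this layout, while the roll-row entries shrink and change sign as the arms fold, which is the authority loss the gain scheduling is meant to handle.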

2.3.3. Ground Effect and Model Parameters

During the terminal descent ($h < 0.3$ m above the robot’s back) and at takeoff, the thrust is amplified by ground effect. We adopt the Cheeseman–Bennett model:
$$K_{\mathrm{GE}}(h) = \frac{1}{1 - \left( R_p / 4h \right)^2}, \tag{16}$$
where $R_p$ is the propeller radius and $h$ is the distance from the propeller disc to the nearest ground surface (or robot back surface). As $h \to R_p/4$, $K_{\mathrm{GE}} \to \infty$, producing strong thrust perturbations in the final approach. During near-surface flight, the effective thrust in (8) is replaced by $F_{\mathrm{eff}} = K_{\mathrm{GE}}(h)\, F$.
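A minimal sketch of the ground-effect correction is given below. The propeller radius `R_P` and the saturation limit `k_max` are hypothetical values chosen for illustration; the saturation itself is an assumption added here to keep the gain finite near the singular height and is not stated in the model above.

```python
R_P = 0.127  # m, hypothetical propeller radius for illustration


def ground_effect_gain(h: float, r_p: float = R_P, k_max: float = 1.5) -> float:
    """Cheeseman-Bennett thrust ratio K_GE(h), saturated at k_max.

    h is the height of the propeller disc above the surface in metres.
    The saturation is an assumption of this sketch, not part of the model.
    """
    if h <= 0.0:
        raise ValueError("height above the surface must be positive")
    ratio_sq = (r_p / (4.0 * h)) ** 2
    if ratio_sq >= 1.0:  # at or below the singular height h = R_p / 4
        return k_max
    return min(1.0 / (1.0 - ratio_sq), k_max)


def effective_thrust(thrust: float, h: float) -> float:
    """F_eff = K_GE(h) * F, used in place of F near the surface."""
    return ground_effect_gain(h) * thrust


if __name__ == "__main__":
    for h in (0.05, 0.10, 0.30, 1.00):
        print(f"h = {h:4.2f} m  K_GE = {ground_effect_gain(h):.4f}")
```

At one propeller radius above the surface the unsaturated gain is $16/15 \approx 1.067$, and it decays towards 1 as the height grows.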
The model parameters in the preceding equations are fixed before controller design so that the control and optimisation sections can be interpreted against a single physical platform. Table 1 summarises the key physical and aerodynamic values used in the dynamic model, including the deployed and folded inertia values that drive the fold-aware gain scheduling.

2.4. Walking Robot Platform Model

The walking robot contributes a moving, periodically tilted landing surface. For docking control, the dominant effects are the stride-driven heave, roll, and pitch components at the robot’s back. The gait frequency is therefore estimated from robot geometry and walking speed, then mapped into the 6-DoF pose of the landing surface.

Gait-Induced Surface Pose Model

This model converts the walking speed of the ground robot into the time-varying pose of the landing surface. The first step estimates the dominant gait frequency, and the second maps this frequency into the heave, roll, and pitch components tracked by the UAV controller.
The gait frequency serves as the fundamental excitation frequency for the GFA-FEO internal model. Estimating it from robot walking speed and leg-defined stride length enables changes in robot velocity to be mapped directly into the observer harmonics.
The trot stride length is set by the leg geometry and dynamic similarity [39]:
$$l_{\mathrm{stride}} \approx l_{\mathrm{leg}} = L_{\mathrm{thigh}} + L_{\mathrm{shank}}, \tag{17}$$
yielding:
$$f_g = \frac{v}{l_{\mathrm{stride}}} \approx \frac{v}{l_{\mathrm{leg}}}. \tag{18}$$
For the target robot ($L_{\mathrm{thigh}} = 0.10$ m, $L_{\mathrm{shank}} = 0.10$ m, nominal $v = 0.5$ m/s): $f_g = 0.5 / 0.20 = 2.5$ Hz. The operating range $v \in [0.3, 0.8]$ m/s maps to $f_g \in [1.5, 4.0]$ Hz.
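The speed-to-frequency mapping is simple enough to state as a one-line function. The sketch below reproduces the nominal value and the operating range quoted above; the function name and default arguments are illustrative.

```python
def gait_frequency(speed_mps: float,
                   l_thigh_m: float = 0.10,
                   l_shank_m: float = 0.10) -> float:
    """Estimate the trot gait frequency f_g = v / (L_thigh + L_shank) in Hz."""
    stride_m = l_thigh_m + l_shank_m
    if stride_m <= 0.0:
        raise ValueError("leg segment lengths must sum to a positive stride")
    return speed_mps / stride_m


if __name__ == "__main__":
    for v in (0.3, 0.5, 0.8):
        print(f"v = {v:.1f} m/s -> f_g = {gait_frequency(v):.2f} Hz")
```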
The UAV controller tracks the landing surface rather than the ground robot centre. Therefore, the gait excitation is added to the nominal dog pose at the back-surface frame, producing the time-varying position and roll–pitch attitude that define the docking reference.
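The landing surface centre position and attitude are:
$$p_{\mathrm{land}}(t) = p_{\mathrm{dog}}(t) + R_{\mathrm{dog}}\, [0, 0, h_{\mathrm{back}}]^T, \tag{19}$$
$$\eta_{\mathrm{land}}(t) = \eta_{\mathrm{dog}}(t) + \Delta\eta_{\mathrm{gait}}(t), \tag{20}$$
where $\Delta\eta_{\mathrm{gait}}$ is the gait-induced oscillation:
$$\Delta\eta_{\mathrm{gait}}(t) = \begin{bmatrix} A_\phi \sin\left(2\pi (f_g/2)\, t + \varphi_\phi\right) \\ A_\theta \sin\left(2\pi f_g t + \varphi_\theta\right) \\ 0 \end{bmatrix}, \qquad \Delta z_{\mathrm{gait}}(t) = A_z \sin\left(2\pi \cdot 2 f_g \cdot t + \varphi_z\right). \tag{21}$$
Typical amplitudes: $A_\phi \approx 1.5^\circ$ (roll, at $f_g/2$), $A_\theta \approx 2.0^\circ$ (pitch, at $f_g$), $A_z \approx 5$ mm (heave, at $2 f_g$).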
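The three harmonic components can be generated as follows. This is a minimal sketch using the typical amplitudes listed above as defaults; the phase offsets default to zero, which is an assumption, since the text leaves them as free parameters.

```python
import math


def gait_offsets(t: float, f_g: float,
                 a_phi_deg: float = 1.5,
                 a_theta_deg: float = 2.0,
                 a_z_mm: float = 5.0,
                 phases: tuple[float, float, float] = (0.0, 0.0, 0.0),
                 ) -> tuple[float, float, float]:
    """Return (roll offset [deg], pitch offset [deg], heave offset [mm]).

    Roll oscillates at f_g / 2, pitch at f_g, and heave at 2 * f_g.
    """
    ph_phi, ph_theta, ph_z = phases
    d_phi = a_phi_deg * math.sin(2.0 * math.pi * (f_g / 2.0) * t + ph_phi)
    d_theta = a_theta_deg * math.sin(2.0 * math.pi * f_g * t + ph_theta)
    d_z = a_z_mm * math.sin(2.0 * math.pi * (2.0 * f_g) * t + ph_z)
    return d_phi, d_theta, d_z


if __name__ == "__main__":
    f_g = 2.5  # Hz, nominal gait frequency at 0.5 m/s
    for i in range(5):
        t = 0.05 * i
        roll, pitch, heave = gait_offsets(t, f_g)
        print(f"t={t:4.2f} s  roll={roll:+.3f} deg  "
              f"pitch={pitch:+.3f} deg  heave={heave:+.3f} mm")
```

At the nominal 2.5 Hz gait frequency, the pitch component reaches its 2.0° peak a quarter period (0.1 s) after a zero crossing, while the heave component has already completed half a cycle.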

2.5. Docking Interface and Coupled-Body Transition

The docking interface links the pre-lock capture condition to the post-lock dynamic model. Before the EPM is energised, the UAV must enter the capture window in position, attitude, and vertical contact speed. After magnetic locking, the vehicle and ground robot are treated as a rigid coupled body for payload-adaptive takeoff.
The EPM array can only capture the steel plate if position, attitude, and contact velocity are simultaneously within tolerance. These physical limits define the docking event used by the FSM and by the simulation success metrics.
A docking attempt is deemed successful if and only if, at the instant of EPM activation, all of the following hold simultaneously:
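$$\left\| p_{\mathrm{UAV},xy} - p_{\mathrm{land},xy} \right\| < 10\ \mathrm{mm}, \quad |\phi_{\mathrm{UAV}} - \phi_{\mathrm{land}}| < 3^\circ, \quad |\theta_{\mathrm{UAV}} - \theta_{\mathrm{land}}| < 3^\circ, \quad |\dot{z}_{\mathrm{contact}}| < 0.1\ \mathrm{m/s}, \quad |\dot{\phi}|, |\dot{\theta}| \approx 0\ \mathrm{rad/s}. \tag{22}$$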
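The capture condition translates naturally into a boolean predicate. The sketch below is one possible encoding; the data-class layout is illustrative, and the angular-rate tolerance `rate_tol_rps` is a hypothetical threshold introduced here because the condition above only requires the rates to be approximately zero.

```python
import math
from dataclasses import dataclass


@dataclass
class RelativeState:
    """UAV state relative to the landing surface at EPM activation."""
    dx_mm: float           # lateral offset along x
    dy_mm: float           # lateral offset along y
    droll_deg: float       # roll mismatch
    dpitch_deg: float      # pitch mismatch
    z_rate_mps: float      # vertical contact velocity
    roll_rate_rps: float   # UAV roll rate
    pitch_rate_rps: float  # UAV pitch rate


def docking_ok(s: RelativeState, rate_tol_rps: float = 0.05) -> bool:
    """True if every capture tolerance holds simultaneously.

    rate_tol_rps is a hypothetical bound standing in for "approximately zero".
    """
    return (
        math.hypot(s.dx_mm, s.dy_mm) < 10.0
        and abs(s.droll_deg) < 3.0
        and abs(s.dpitch_deg) < 3.0
        and abs(s.z_rate_mps) < 0.1
        and abs(s.roll_rate_rps) < rate_tol_rps
        and abs(s.pitch_rate_rps) < rate_tol_rps
    )


if __name__ == "__main__":
    good = RelativeState(4.0, 3.0, 1.0, -2.0, -0.05, 0.01, 0.0)
    bad = RelativeState(8.0, 7.0, 1.0, -2.0, -0.05, 0.01, 0.0)
    print(docking_ok(good), docking_ok(bad))
```

Note that the lateral test uses the Euclidean norm of the planar offset, so an offset of 8 mm in x and 7 mm in y fails even though each component is below 10 mm.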
After EPM locking, the UAV no longer behaves as the original 2.50 kg airframe. The takeoff model uses a rigid coupled body whose mass doubles, whose centre of mass shifts toward the ground robot, and whose inertia tensor is updated by the parallel-axis theorem.
Upon EPM locking, the UAV and robot form a rigid coupled body:
$$m_c = m_u + m_d = 2.50 + 2.50 = 5.00\ \mathrm{kg} \quad (\text{twofold mass increase}), \tag{23}$$
$$p_{\mathrm{CoM}} = \frac{m_u p_u + m_d p_d}{m_c}, \tag{24}$$
$$J_c = J_u(\alpha) + J_d + m_d \left( \|d\|^2 I_3 - d\, d^T \right), \tag{25}$$
where $d = p_d - p_{\mathrm{CoM}}$ and the parallel-axis theorem is applied. The translational and rotational dynamics (8) and (9) remain valid with the substitutions $(m, J) \to (m_c, J_c)$.
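The coupled-body update can be written compactly as below. The sketch mirrors the printed expressions term by term (in particular, only the robot inertia is shifted by the offset $d$); the positions and inertia values in the usage example are hypothetical placeholders, not prototype measurements.

```python
Vec3 = list[float]
Mat3 = list[list[float]]


def coupled_body(m_u: float, p_u: Vec3, j_u: Mat3,
                 m_d: float, p_d: Vec3, j_d: Mat3) -> tuple[float, Vec3, Mat3]:
    """Return (m_c, p_CoM, J_c) for the rigidly locked UAV + robot."""
    m_c = m_u + m_d
    p_com = [(m_u * a + m_d * b) / m_c for a, b in zip(p_u, p_d)]
    d = [b - c for b, c in zip(p_d, p_com)]  # robot offset from the new CoM
    d_sq = sum(x * x for x in d)
    j_c = [
        [
            j_u[i][j] + j_d[i][j]
            + m_d * ((d_sq if i == j else 0.0) - d[i] * d[j])
            for j in range(3)
        ]
        for i in range(3)
    ]
    return m_c, p_com, j_c


if __name__ == "__main__":
    # Hypothetical inertias (kg m^2) and a 0.10 m vertical offset, for illustration.
    J_U = [[0.02, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.04]]
    J_D = [[0.01, 0.0, 0.0], [0.0, 0.02, 0.0], [0.0, 0.0, 0.02]]
    m_c, p_com, j_c = coupled_body(2.5, [0.0, 0.0, 0.0], J_U,
                                   2.5, [0.0, 0.0, -0.10], J_D)
    print("m_c =", m_c, "kg, CoM =", p_com)
    print("J_c diagonal =", [round(j_c[i][i], 5) for i in range(3)])
```

Because the two masses are equal, the combined centre of mass lies midway between the two bodies, and a purely vertical offset increases only the roll and pitch inertias.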

2.6. Experimental Prototype Platform

To support the proposed system concept beyond simulation, a full-scale hardware prototype has been designed and assembled. The platform comprises two agents; the resulting hardware layout, including the sensing, compute, folding, propulsion, and ground-robot communication modules, is summarised in Figure 6.

2.6.1. Foldable Quadrotor (Aerial Segment)

The airframe is a custom-designed multi-layer foldable frame that implements the crank-rocker mechanism of Section 2.2 with four 2810-1300 KV BLDC motors (Epower, Shanghai, China), a 70 A 4-in-1 ESC (MicroAir Technology, Shenzhen, Guangdong, China), and a 20 kg digital servo (Xinhui Power Technology, Shenzhen, Guangdong, China) driving the fold linkage via a custom-built 5 V UBEC (Nanjing Tech University, Nanjing, China). The avionics stack is organised in four tiers: Level 1 houses a 6S 5300 mAh 75C LiPo battery (AHTECH, Xuzhou, Jiangsu, China) and an Intel RealSense D415 depth camera (Intel Corporation, Santa Clara, CA, USA). Level 2 carries the foldable frame with gear–crank–connecting-rod mechanism, BLDC motors, servo, and 5 V UBEC. Level 3 hosts an NVIDIA Jetson Orin Nano (NVIDIA Corporation, Santa Clara, CA, USA) running the proposed CFNTSM + GFA-FEO + FiLM-SAC stack and a MicroAir743v2 flight controller (MicroAir Technology, Shenzhen, Guangdong, China; PX4 firmware v1.16.0), communicating via USB-C; the Jetson also drives the fold servo over a PWM signal line. Level 4 mounts a Unitree L2 4D LiDAR (Unitree Robotics, Hangzhou, China) powered by a dedicated custom-built 12 V UBEC (Nanjing Tech University, Nanjing, China). A separate custom-built UPS module (12 V; Nanjing Tech University, Nanjing, China) provides independent power to the Jetson. The docking interface uses the custom-built EPM array (Nanjing Tech University, Nanjing, China) mating with a custom-fabricated Q235 galvanised-steel plate (Nanjing Tech University, Nanjing, China) on the robot’s back.

2.6.2. Wheel-Legged Robot Dog (Ground Segment)

The ground platform is a custom-built quadruped robot (Nanjing Tech University, Nanjing, China) with hybrid wheel–leg locomotion. A custom-built STM32-based main board (Nanjing Tech University, Nanjing, China) coordinates all actuators and communicates with a custom-built handheld controller (Nanjing Tech University, Nanjing, China) via a custom-built ESP32 radio module (Nanjing Tech University, Nanjing, China). Each leg is driven by a crank-rocker linkage with two bus servos (Hiwonder Technology, Shenzhen, Guangdong, China) (4 × 2 in total), while one DC TT geared motor per wheel (Hongyu Electronic Technology, Shenzhen, Guangdong, China) (4 × 1 in total) enables continuous wheeled travel. Power is supplied by a custom-built rechargeable 3 × 18650 battery module (Nanjing Tech University, Nanjing, China). An onboard IMU streams gait orientation data to the UAV through the ESP32 wireless link, providing the GFA-FEO with the prior gait-frequency estimate $\hat{f}_g$. This bi-directional data link ensures that the aerial controller receives real-time gait phase information without relying on an external motion-capture system.

2.6.3. Relative-Pose Sensing Pipeline

The controller requires the relative pose Δ x u d = ( Δ p , Δ η ) between the UAV and the robot back during the final docking phases. In the simulation experiments, this quantity is provided to the controller as a noisy relative-pose measurement; the noise level is consistent with the sensor-noise floor used in the performance-ceiling analysis ( σ p = 0.5  mm per position channel, Section 3.7). For the assembled hardware prototype, the intended onboard pipeline separates the visual and LiDAR front ends and fuses them at the state-estimation layer. VINS-Fusion is a suitable representative visual–inertial front end for the D415/IMU stream, whereas the FAST-LIO family and LIO-SAM are suitable LiDAR–inertial front ends for the Unitree L2/IMU stream [40,41,42]. They are therefore not claimed to be single camera–LiDAR fusion algorithms; instead, their odometry and registration residuals are transformed into a common robot-deck/UAV frame using offline extrinsic calibration and time synchronisation. At long and medium range, the Unitree L2 point cloud provides a coarse robot/deck pose through plane extraction and ICP-based registration to the known landing-plate geometry. At short range, the RealSense D415 refines the estimate using depth-assisted fiducial/edge detection around the EPM plate; this vision update is used for the final M 4  dock entry test. The dog IMU and ESP32 link provide a roll/pitch and gait-phase prior, and an EKF (or equivalent factor-graph estimator) fuses the VIO increments, LiDAR–inertial odometry, LiDAR registration residuals, depth-camera pose residuals, UAV inertial data, and gait prior into the relative pose and covariance used by the FSM.
The $M_4$ transition is therefore gated by both the estimated pose and its covariance: if the lateral uncertainty exceeds the millimetre-level docking margin or the attitude uncertainty approaches the $3^\circ$ capture limit, the FSM remains in the align or descend phase instead of energising the EPM. The present study reports simulation-level validation of the controller under the above measurement-noise model; implementing and experimentally validating the full onboard fusion stack is part of the planned hardware tests. The conversion from sensing outputs to docking/takeoff control inputs is detailed in Figure 7.

3. Hierarchical Control Design

This section presents the three-layer hierarchical controller for precision docking and payload-adaptive takeoff. The overall architecture adopts a position–attitude dual-loop cascade: the outer (position) loop generates desired attitude commands, and the inner (attitude) loop produces torque commands that are mapped to individual motor thrusts via a fold-aware mixer.
Within each loop, three complementary control layers operate in parallel (Figure 8): (1) a gait-frequency-aware finite-time extended observer (GFA-FEO) that cancels the dominant periodic disturbance at f g and its harmonics; (2) a continuous fast nonsingular terminal sliding-mode controller (CFNTSM) with α -scheduled gains for broadband disturbance rejection and finite-time convergence; and (3) a FiLM-SAC residual RL policy conditioned on fold angle, gait parameters, and mission phase, with an adaptive trust weight  λ ( t ) . A finite-state machine (FSM) orchestrates the mission by switching controller parameters, observer gains, and the RL trust weight across eight operational phases.

3.1. Controller Interaction and Coordination

The three controller components are coordinated as a parallel compensation structure under FSM supervision, rather than as independent methods that are activated in isolation. At each control step, the fold-angle-dependent plant model provides g ( α ) , B ( α ) , and  J ( α ) to both the CFNTSM law and the fold-aware allocation layer. The GFA-FEO estimates the lumped disturbance in the same standardised plant coordinates and passes d ^ to the CFNTSM law as a feedforward compensation term. CFNTSM then computes the stabilising nominal input, while the FiLM-SAC actor receives the tracking errors, observer residuals, fold state, gait estimate, and FSM phase and outputs only a bounded residual correction after the model-based input has been formed.
The FSM coordinates transitions by updating three quantities together: observer bandwidth and internal-model gains, CFNTSM gain-scheduling multipliers, and the residual-RL trust bound λ ¯ ( M i ) . These updates are filtered or ramped at phase boundaries so that the commanded torque does not jump when the mission changes from approach to descent, docking, locking, stowing, or takeoff. Consequently, the model-based CFNTSM + GFA-FEO backbone remains active throughout the mission, while the learned residual is allowed to contribute mainly in phases where structured residual errors are expected, such as fold-through descent and the observer re-convergence window after EPM locking.
Figure 8 summarises this coordinated signal flow from reference generation through the GFA-FEO, CFNTSM, FiLM-SAC residual, fold-aware mixer, and folded-airframe plant.

3.2. Mission-Phase Finite-State Machine

The docking-and-takeoff mission is decomposed into eight sequential phases managed by a deterministic FSM. Each phase defines a distinct set of reference trajectories, controller gains, observer bandwidths, and RL trust-weight bounds. Table 2 lists all phases with their entry/exit conditions.
Each phase $M_i$ carries a parameter dictionary
\[ \Theta_i = \left\{ k_{1,p}^{(i)},\, k_{2,p}^{(i)},\, \beta_p^{(i)},\, k_{1,\eta}^{(i)},\, k_{2,\eta}^{(i)},\, \beta_\eta^{(i)},\, L_p^{(i)},\, L_\eta^{(i)},\, \bar{\lambda}^{(i)} \right\} \]
that is loaded upon entry and held constant within the phase. Here, the subscripts p and η distinguish the position- and attitude-loop parameters, which are tuned independently (Section 4.1). Transitions between adjacent phases are irreversible in the nominal mission; an emergency abort returns to M 1 .
Three phases deserve particular attention. During descend ($M_3$), the GFA-FEO switches to enhanced-gain mode (Section 3.4), the CFNTSM gains are boosted to compensate for moment-arm degradation during folding, and $\bar{\lambda}$ is elevated to allow larger residual-RL corrections. At lock ($M_5$), the RL trust weight is ramped to zero ($\lambda \to 0$) to prevent learned actions from disturbing the mechanical locking process. During takeoff ($M_7$), the FEO parameters have already been hot-switched at the $M_5 \to M_6$ transition (Section 3.9) to accommodate the doubled mass; $\lambda$ is temporarily elevated, but remains bounded, to bridge the observer re-convergence blind window ($T_{\text{blind}} \approx 0.2$–$0.3$ s). If the observer residual or tracking error does not decay after this window, the FSM holds the vehicle in a conservative vertical-climb mode and prevents further mission progression until the residual falls below threshold.

3.3. Standardised Plant Model

Both the position and attitude loops are expressed in a unified second-order form. Define the generalised state $x \in \{p, \eta\}$ and the corresponding tracking error $e = x - x_{\text{ref}}$. The error dynamics are:
\[ \ddot{e} = f(x, \dot{x}) + g(\alpha)\,u + d(t), \tag{26} \]
where $u$ is the control input, $g(\alpha)$ is the fold-angle-dependent control-effectiveness matrix, and $d(t)$ is the lumped disturbance satisfying $\|d\| \le \bar{d}$ and $\|\dot{d}\| \le \bar{d}_1$ (Assumption A3).
For the position loop, from (8) with $e_p = p - p_{\text{land}}$:
\[ \ddot{e}_p = \frac{F}{m} R e_3 - g e_3 - \ddot{p}_{\text{land}} + \frac{1}{m} d_{\text{ext}}, \tag{27} \]
where $F$ is the total thrust, $R$ the rotation matrix (5), and $\ddot{p}_{\text{land}}$ includes the gait-induced acceleration of the landing surface. The position-loop lumped disturbance is $d_p = -\ddot{p}_{\text{land}} + d_{\text{ext}}/m + (K_{\text{GE}} - 1) F R e_3 / m$, incorporating the reference acceleration, external wind, and ground-effect residual.
For the attitude loop, from (9) with $e_\eta = \eta - \eta_d$ (where $\eta_d$ comes from the outer loop):
\[ \ddot{e}_\eta = J^{-1}(\alpha)\left[ -\omega \times J(\alpha)\,\omega + B(\alpha) f + \tau_G + \tau_D + \tau_{\text{ext}} \right] - \ddot{\eta}_d. \tag{28} \]
Here the control-effectiveness matrix is $g_\eta(\alpha) = J^{-1}(\alpha) B(\alpha)$, which varies continuously with the fold angle and becomes asymmetric in the folded state (Section 2.2.2). The attitude-loop lumped disturbance is $d_\eta = J^{-1}\left[ -\omega \times J\omega + \tau_G + \tau_D + \tau_{\text{ext}} \right] - \ddot{\eta}_d$.
Unlike conventional quadrotor models, where $g$ is a constant diagonal matrix, $g_\eta(\alpha)$ is non-diagonal in the folded state because the moment arms are asymmetric. This $\alpha$-dependence propagates to gain scheduling (Section 3.5) and the allocation inverse (Section 3.6).

3.4. Gait-Frequency-Aware Finite-Time Extended Observer

The GFA-FEO is designed to estimate the lumped disturbance d ( t ) in (26) with two key enhancements over a standard finite-time extended observer: (i) a periodic internal-model term tuned to the gait frequency f g and its harmonics, and (ii) an enhanced-gain mode activated during the descend phase.

3.4.1. Observer Structure

Consider the standardised plant (26) with state vector $[e, \dot{e}, d]^T$. The GFA-FEO introduces observer states $[\hat{e}, \hat{\dot{e}}, \hat{d}]$ and estimation errors $\tilde{e}_i = \hat{(\cdot)}_i - (\cdot)_i$:
\[ \begin{aligned} \dot{\hat{e}} &= \hat{\dot{e}} - \ell_1 L \lfloor \tilde{e} \rceil^{1-\varepsilon_1}, \\ \dot{\hat{\dot{e}}} &= g(\alpha)\,u + \hat{d} + \hat{d}_{\text{IM}} - \ell_2 L^2 \lfloor \tilde{e} \rceil^{1-2\varepsilon_1} + \hat{\dot{e}}_{\text{vel}}, \\ \dot{\hat{d}} &= -\ell_3 L^3 \lfloor \tilde{e} \rceil^{1-3\varepsilon_1} + \hat{d}_{\text{IM}}, \end{aligned} \tag{29} \]
where $\lfloor \cdot \rceil^{\gamma}$ denotes the signed fractional power $\lfloor x \rceil^{\gamma} = |x|^{\gamma}\operatorname{sign}(x)$, $L > 0$ is the observer bandwidth (tuned separately for the position and attitude loops as $L_{\text{pos}}$ and $L_{\text{att}}$ to account for their different noise environments and disturbance spectra), $\ell_1, \ell_2, \ell_3 > 0$ are design gains, and $\varepsilon_1 \in (0, 1/2)$ is the fractional exponent satisfying the finite-time convergence condition [43]. The combined disturbance estimate $\hat{d} + \hat{d}_{\text{IM}}$ is clamped to $[-\hat{d}_{\max}, \hat{d}_{\max}]$ per channel, where $\hat{d}_{\max} > 0$ is a saturation bound that prevents transient over-compensation during observer initialisation or sudden disturbance transitions.
The velocity-error correction term $\hat{\dot{e}}_{\text{vel}}$ exploits the measured velocity (from state estimation or optical flow) to accelerate convergence:
\[ \hat{\dot{e}}_{\text{vel}} = -\mu_v \left( \hat{\dot{e}} - \dot{e}_{\text{meas}} \right), \quad \mu_v > 0. \tag{30} \]
Both the position and velocity measurements fed to the observer are first passed through a discrete first-order low-pass filter $y_k = \alpha_f x_k + (1 - \alpha_f)\, y_{k-1}$, where $\alpha_f = \Delta t / (\Delta t + 1/(2\pi f_c))$. The position-channel cutoff is $f_{c,p} = 30$ Hz and the velocity-channel cutoff is $f_{c,v} = 60$ Hz, chosen to attenuate sensor noise above the gait-harmonic band ($2 f_g = 5$ Hz) while preserving sufficient phase margin for the observer and control loops.
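The measurement pre-filter can be sketched directly from the recursion above. The 1 kHz sample rate used in the usage note is an assumption; the control rate is not stated in this section.

```python
import math

def lpf_alpha(dt, f_c):
    """Smoothing factor alpha_f = dt / (dt + 1 / (2*pi*f_c))."""
    return dt / (dt + 1.0 / (2.0 * math.pi * f_c))

class FirstOrderLPF:
    """Discrete first-order low-pass filter y_k = a*x_k + (1 - a)*y_{k-1}."""

    def __init__(self, dt, f_c, y0=0.0):
        self.alpha = lpf_alpha(dt, f_c)
        self.y = y0

    def update(self, x):
        self.y = self.alpha * x + (1.0 - self.alpha) * self.y
        return self.y

# Assumed 1 kHz loop; cutoffs taken from the text.
position_filter = FirstOrderLPF(dt=1e-3, f_c=30.0)
velocity_filter = FirstOrderLPF(dt=1e-3, f_c=60.0)
```

With these assumed settings the velocity channel gets the larger smoothing factor, so it reacts faster and filters less, which matches the higher 60 Hz cutoff.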

3.4.2. Gait-Frequency Internal Model

A standard FEO treats the disturbance as wideband and estimates it through high-gain injection. However, the gait-induced disturbance is dominated by three narrow-band components at f g / 2 , f g , and  2 f g (Section 2.4). A wideband observer achieves zero steady-state error only for constant disturbances; for sinusoidal signals at frequency ω n , a phase lag ω n / L persists unless an internal model at that frequency is embedded.
The internal-model augmentation generates a feedforward correction in the disturbance channel:
\[ \hat{d}_{\text{IM}} = \sum_{n \in \mathcal{H}} \left[ a_n \sin(2\pi n f_g t) + b_n \cos(2\pi n f_g t) \right], \tag{31} \]
where $\mathcal{H} = \{\tfrac{1}{2}, 1, 2\}$ indexes the three gait harmonics. The Fourier coefficients $a_n, b_n$ are adapted online via:
\[ \dot{a}_n = -\gamma_n\, \tilde{e}\, \sin(2\pi n f_g t) - \sigma_{\text{IM}}\, a_n, \qquad \dot{b}_n = -\gamma_n\, \tilde{e}\, \cos(2\pi n f_g t) - \sigma_{\text{IM}}\, b_n, \tag{32} \]
with adaptation rate $\gamma_n > 0$. The $\sigma$-modification terms $\sigma_{\text{IM}} a_n$ and $\sigma_{\text{IM}} b_n$ ($\sigma_{\text{IM}} > 0$ small) provide a leakage that prevents unbounded growth of the Fourier coefficients when the gait frequency drifts or the observer operates in transient conditions, at the cost of a small steady-state bias that is negligible for $\sigma_{\text{IM}} \ll \gamma_n$. By the internal-model principle, and up to this small leakage-induced bias, the closed-loop observer asymptotically cancels the gait-periodic component regardless of the observer bandwidth $L$, a property unattainable by the standard FEO alone.
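The adaptation of the Fourier coefficients with leakage can be illustrated in isolation. The sketch below is a simplification and not the paper's observer: it drives the adaptation with the disturbance-estimate residual directly, rather than with the observer error $\tilde{e}$ that passes through the observer dynamics, and the values of `gamma`, `sigma_im`, `dt`, and the true coefficients are assumed for illustration.

```python
import math

HARMONICS = (0.5, 1.0, 2.0)  # f_g/2, f_g, 2*f_g

def adapt_internal_model(true_coeffs, f_g=2.5, gamma=4.0, sigma_im=0.01,
                         dt=1e-3, t_end=10.0):
    """Euler-integrate the sigma-modified gradient adaptation of (a_n, b_n).

    true_coeffs: list of (a_n, b_n) pairs of the synthetic disturbance.
    Returns the adapted coefficient lists (a, b).
    """
    a = [0.0] * len(HARMONICS)
    b = [0.0] * len(HARMONICS)
    for k in range(round(t_end / dt)):
        t = k * dt
        sines = [math.sin(2.0 * math.pi * n * f_g * t) for n in HARMONICS]
        cosines = [math.cos(2.0 * math.pi * n * f_g * t) for n in HARMONICS]
        d_true = sum(ta * s + tb * c
                     for (ta, tb), s, c in zip(true_coeffs, sines, cosines))
        d_im = sum(ai * s + bi * c for ai, bi, s, c in zip(a, b, sines, cosines))
        residual = d_im - d_true  # estimate minus truth (assumed error signal)
        for i in range(len(HARMONICS)):
            a[i] += dt * (-gamma * residual * sines[i] - sigma_im * a[i])
            b[i] += dt * (-gamma * residual * cosines[i] - sigma_im * b[i])
    return a, b
```

In this simplified setting the coefficients settle close to the true values, with a small offset caused by the leakage term, which is the trade-off the $\sigma$-modification accepts in exchange for bounded coefficients.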
Figure 9 compares the disturbance estimation error | d ˜ / d | ( j ω ) for the standard FEO and the proposed GFA-FEO. The internal-model terms create deep notches at f g / 2 , f g , and  2 f g (the three dominant gait harmonics), reducing the estimation error by over 30 dB relative to the standard observer at those frequencies. In enhanced-gain mode ( L ( M 3 ) = 2 L ), the entire curve shifts downward, further improving broadband disturbance rejection during the critical descent phase.
When the gait frequency is not known a priori (e.g., the robot changes speed), f g is estimated from the dog-frame IMU by peak detection on the pitch-rate spectrum and communicated to the UAV at 10 Hz.
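Spectral peak detection of this kind can be sketched with a plain discrete Fourier transform over a short window of pitch-rate samples. The window length, sample rate, search band, and the assumption that the $f_g$ component dominates the pitch-rate spectrum are illustrative choices, not values from the paper.

```python
import math

def estimate_gait_frequency(pitch_rate, fs, f_min=0.5, f_max=5.0):
    """Return the frequency (Hz) of the strongest DFT bin inside [f_min, f_max]."""
    n = len(pitch_rate)
    mean = sum(pitch_rate) / n
    centred = [x - mean for x in pitch_rate]  # remove the DC offset
    best_f, best_power = None, -1.0
    for k in range(1, n // 2):
        f = k * fs / n
        if not (f_min <= f <= f_max):
            continue
        re = sum(x * math.cos(2.0 * math.pi * k * i / n)
                 for i, x in enumerate(centred))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n)
                 for i, x in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f
```

The frequency resolution is `fs / n`, so a 4 s window resolves 0.25 Hz; a longer window sharpens the estimate at the cost of slower tracking when the robot changes speed.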

3.4.3. Enhanced-Gain Mode

The observer must be more aggressive during terminal descent than during approach because the landing surface motion is both closer and more consequential: a few millimetres of phase lag can decide whether the EPM array enters the capture window. Enhanced-gain mode is thus implemented as an FSM-triggered temporary increase in observer bandwidth and internal-model adaptation, rather than as a globally large gain that would amplify noise throughout the mission.
During the descend phase ($M_3$), the UAV approaches the vibrating surface, and the gait-induced disturbance amplitude grows. Simultaneously, the arm folding degrades moment arms, reducing the disturbance-rejection bandwidth. To counteract both effects, the FSM triggers an enhanced-gain mode:
\[ L^{(M_3)} = \kappa_L\, L^{(M_2)}, \qquad \gamma_n^{(M_3)} = \kappa_\gamma\, \gamma_n^{(M_2)}, \tag{33} \]
where $\kappa_L > 1$ and $\kappa_\gamma > 1$ are boost factors. The transition is smoothed by a first-order filter $L(t) = L^{(M_2)} + \left( L^{(M_3)} - L^{(M_2)} \right)\left( 1 - e^{-t/\tau_L} \right)$ to avoid observer transients.
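The smoothed bandwidth transition is a one-line exponential blend. In the sketch below, `t` is measured from entry into $M_3$, and the numerical values in the usage note are assumed for illustration.

```python
import math

def smoothed_bandwidth(t, L_m2, kappa_L, tau_L):
    """Observer bandwidth t seconds after entering M3.

    Starts at the M2 value L_m2 and approaches kappa_L * L_m2 with time
    constant tau_L, so the observer gain never steps.
    """
    L_m3 = kappa_L * L_m2
    return L_m2 + (L_m3 - L_m2) * (1.0 - math.exp(-t / tau_L))

# Assumed values: L = 50, boost factor 2, time constant 0.1 s.
bandwidth_after_50_ms = smoothed_bandwidth(0.05, L_m2=50.0, kappa_L=2.0, tau_L=0.1)
```

After one time constant the bandwidth has covered about 63% of the step, and after roughly five time constants it is effectively at the boosted value.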

3.4.4. GFA-FEO Convergence

Theorem 1. 
Consider the standardised plant (26) with bounded disturbance derivative $\|\dot{d}\| \le \bar{d}_1$. The GFA-FEO (29)–(32) guarantees that the observation errors $\tilde{e}$, $\tilde{\dot{e}}$, and $\tilde{d}$ converge to a neighbourhood of the origin in finite time $T_{\text{obs}}$. Furthermore, the steady-state disturbance estimation error for any component at frequency $n f_g$, $n \in \mathcal{H}$, satisfies $\|\tilde{d}_{n f_g}\| \to 0$ as $t \to \infty$.
Proof. 
Define the scaled estimation errors $\xi_1 = L^{-1}\tilde{e}$, $\xi_2 = L^{-2}\tilde{\dot{e}}$, $\xi_3 = L^{-3}\tilde{d}$. Under this coordinate transformation, the error dynamics become:
\[ \dot{\xi}_i = L\left[ \xi_{i+1} - \ell_i \lfloor \xi_1 \rceil^{1 - i\varepsilon_1} \right] + \delta_i, \quad i = 1, 2, 3, \tag{34} \]
with $\xi_4 = 0$ and $\delta_i$ containing the internal-model terms and $\dot{d}/L^3$. Consider the Lyapunov function:
\[ V_{\text{obs}} = \sum_{i=1}^{3} \tfrac{1}{2}\, \xi_i^T \xi_i. \tag{35} \]
Its time derivative satisfies (following the homogeneous-domination approach of [43]):
\[ \dot{V}_{\text{obs}} \le -c_1 L\, V_{\text{obs}}^{\,1 - \varepsilon_1/2} + c_2 \frac{\bar{d}_1}{L^3}, \tag{36} \]
where $c_1, c_2 > 0$ depend on $\ell_1, \ell_2, \ell_3$. For sufficiently large $L$, the first term dominates, yielding finite-time convergence to the ball $\|\xi\| \le \left( c_2 \bar{d}_1 / (c_1 L^4) \right)^{1/(2 - \varepsilon_1)}$.
For the periodic component, the analysis proceeds in the regime $t > T_{\text{obs}}$ where $\|\tilde{e}\| \le \bar{\xi}$ is already small. Define $\tilde{a}_n = \hat{a}_n - a_n$, $\tilde{b}_n = \hat{b}_n - b_n$, where $a_n, b_n$ are the true Fourier coefficients. Consider the augmented Lyapunov function $V_{\text{IM}} = \sum_n \frac{1}{2\gamma_n} \left( \tilde{a}_n^2 + \tilde{b}_n^2 \right)$. Substituting the adaptation law (32), with the small leakage terms neglected:
\[ \dot{V}_{\text{IM}} = -\sum_n \tilde{a}_n^T\, \tilde{e}\, \sin(2\pi n f_g t) - \sum_n \tilde{b}_n^T\, \tilde{e}\, \cos(2\pi n f_g t). \tag{37} \]
The gait frequencies $\{ n f_g : n \in \mathcal{H} \}$ are distinct and non-resonant, so the sinusoidal regressors are persistently exciting. By the Barbalat–LaSalle argument (the integrand is uniformly continuous and the integral is bounded), $\tilde{a}_n$ and $\tilde{b}_n$ converge to zero as $t \to \infty$, guaranteeing complete cancellation of the gait-periodic disturbance.    □

3.5. α -Scheduled CFNTSM Controller

The observer of Section 3.4 provides disturbance estimates, while the tracking law must preserve authority as folding reduces the rotor moment arms. The controller therefore combines a thrust–attitude cascade, a nonsingular terminal sliding surface, fold-angle gain scheduling, and a fold-aware control law.

3.5.1. Cascade Connection: Thrust–Attitude Extraction

The position-loop CFNTSM computes a desired inertial-frame acceleration $a_{\text{cmd}} \in \mathbb{R}^3$ (see (43) below); adding the gait-acceleration feedforward yields the total commanded acceleration $a = a_{\text{cmd}} + \ddot{p}_{\text{land}}$. To compensate for the phase delay introduced by the first-order motor dynamics ($\tau_m = 20$ ms), the gait-acceleration feedforward $\ddot{p}_{\text{land}}$ is evaluated at the phase-advanced time $t + \tau_m$ rather than the current time $t$, i.e., $\ddot{p}_{\text{land}}(t) \leftarrow \ddot{p}_{\text{land}}(t + \tau_m)$. This predictive shift reduces the steady-state vertical tracking bias caused by the lag between desired and realised thrust. The desired thrust magnitude and body-frame attitude are extracted via the standard differential-flatness relations:
\[ F_{\text{cmd}} = m \sqrt{a_x^2 + a_y^2 + (a_z + g)^2}, \quad \phi_{\text{des}} = \operatorname{arctan2}\!\left( a_{y,b},\; a_z + g \right), \quad \theta_{\text{des}} = \operatorname{arctan2}\!\left( a_{x,b},\; \sqrt{a_{y,b}^2 + (a_z + g)^2} \right), \tag{38} \]
where $a_{x,b} = a_x \cos\psi + a_y \sin\psi$ and $a_{y,b} = a_x \sin\psi - a_y \cos\psi$ are the yaw-rotated horizontal acceleration components. The desired Euler angles fed to the inner loop are then $\eta_d = [\phi_{\text{des}} + \phi_{\text{gait}},\; \theta_{\text{des}} + \theta_{\text{gait}},\; \psi_d]^T$, where $\phi_{\text{gait}}, \theta_{\text{gait}}$ are the gait-induced body-roll and pitch of the landing platform.
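The thrust–attitude extraction can be written out as a short function. This is a sketch under the sign conventions read from the relations above; the function name and the gravity constant `G` are assumptions, and the gait roll/pitch offsets are left to the caller.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (assumed value)

def thrust_attitude(accel, psi, mass):
    """Map a commanded inertial acceleration to (F_cmd, phi_des, theta_des).

    accel: (a_x, a_y, a_z) total commanded acceleration in m/s^2
    psi:   current yaw angle in rad
    mass:  vehicle mass in kg
    """
    a_x, a_y, a_z = accel
    f_cmd = mass * math.sqrt(a_x ** 2 + a_y ** 2 + (a_z + G) ** 2)
    # Rotate the horizontal components into the yaw-aligned frame.
    a_xb = a_x * math.cos(psi) + a_y * math.sin(psi)
    a_yb = a_x * math.sin(psi) - a_y * math.cos(psi)
    phi_des = math.atan2(a_yb, a_z + G)
    theta_des = math.atan2(a_xb, math.sqrt(a_yb ** 2 + (a_z + G) ** 2))
    return f_cmd, phi_des, theta_des
```

At hover the commanded acceleration is zero, so the thrust equals the weight and both desired angles vanish, which is a quick sanity check on the implementation.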
The inner-loop angular-velocity reference is ω ref = W ( η d ) η ˙ d , where W is the Euler-to-body-rate Jacobian and η ˙ d denotes the total rate of change of η d —including both the gait-rate contribution and the rate generated by the outer-loop acceleration command (obtained via numerical differentiation at the control rate, low-pass clipped at ±20 rad/s to prevent derivative spikes).
The sliding surface combines position or attitude error with a nonsingular terminal term in the error derivative. This choice keeps the finite-time convergence property of terminal sliding mode control while avoiding the singularity that would otherwise appear when e ˙ j approaches zero.
For each control channel of (26), the continuous fast nonsingular terminal sliding-mode (CFNTSM) surface is defined as:
\[ s_j = e_j + \beta_j\, \dot{e}_j^{\,p_j/q_j}, \quad j \in \{x, y, z, \phi, \theta, \psi\}, \tag{39} \]
where $\beta_j > 0$ is the surface parameter, and $p_j, q_j$ are positive odd integers satisfying $1 < p_j/q_j < 2$ to ensure nonsingularity of the control law when $\dot{e}_j = 0$.

3.5.2. Fold-Angle-Dependent Gain Scheduling

As $\alpha$ decreases from $45^\circ$ (deployed) to $-76.5^\circ$ (folded), the effective moment arms shrink significantly (Section 2.2.2). If the controller gains remain fixed, the closed-loop bandwidth degrades proportionally, potentially destabilising the attitude loop in the folded state.
To compensate, the CFNTSM gains are scheduled as functions of the second singular value of the torque allocation matrix $B \in \mathbb{R}^{3 \times 4}$:
\[ k_{1,j}(\alpha) = k_{1,j}^{\text{nom}} \cdot \frac{\sigma_2(B_{\text{dep}})}{\sigma_2(B(\alpha)) + \epsilon_\sigma}, \qquad k_{2,j}(\alpha) = k_{2,j}^{\text{nom}} \cdot \frac{\sigma_2(B_{\text{dep}})}{\sigma_2(B(\alpha)) + \epsilon_\sigma}, \tag{40} \]
where $k_{1,j}^{\text{nom}}, k_{2,j}^{\text{nom}}$ are the nominal gains tuned for the deployed configuration, $\sigma_2(B_{\text{dep}})$ is the second singular value of the deployed allocation matrix (a constant reference that normalises the scheduling ratio to approximately unity at $\alpha = 45^\circ$), $\sigma_2(B(\alpha))$ is the current second singular value (computed in real time from the known $\alpha$), and $\epsilon_\sigma > 0$ is a small regularisation constant preventing division by zero at hypothetical rank-deficient configurations. The second singular value is chosen instead of the minimum ($\sigma_3$) because $\sigma_3$ is dominated by the yaw channel (whose moment arm is an order of magnitude smaller than roll/pitch), and scheduling on $\sigma_3$ would yield excessively large gain ratios that destabilise the roll/pitch loops.
This scheduling ensures that the effective control authority experienced by the sliding-mode controller remains approximately constant across the entire fold range: as the physical moment arms shrink, the gains increase to maintain the same closed-loop bandwidth.
The regularisation constant $\epsilon_\sigma$ also sets the practical trade-off between convergence speed and control-input limits. The nonsingular terminal surface itself is shaped by the exponent $p_j/q_j$ in (39); the small “epsilon” used in the CFNTSM implementation is the gain-scheduling regulariser $\epsilon_\sigma$, distinct from the observer exponent $\varepsilon_1$ and the ultimate-bound radius $\epsilon_s$ in the stability proof. From (40), the scheduled gains are upper-bounded by $k_{1,j}^{\max} = k_{1,j}^{\text{nom}}\, \sigma_2(B_{\text{dep}}) / \epsilon_\sigma$, and analogously for $k_{2,j}$. A smaller $\epsilon_\sigma$ therefore increases $k_{1,j}$ and $k_{2,j}$ when the folded configuration makes $\sigma_2(B(\alpha))$ small, which increases the finite-time convergence coefficient in (55) and reduces settling time. The cost is a larger requested $u_{\text{CFNTSM},j}$ and a higher risk of motor-thrust clipping, noise amplification, or chattering near rank-deficient configurations. Conversely, a larger $\epsilon_\sigma$ softens the gain amplification and keeps the allocated thrusts farther from actuator limits, but it slows the terminal convergence and leaves a larger residual error during fold-through descent. In implementation, $\epsilon_\sigma$ is chosen so that the scheduled gains at $\alpha = -76.5^\circ$ remain compatible with the damped pseudo-inverse allocation and the motor thrust bounds; the condition-number warning and thrust clipping in Section 3.6 provide the final input-limit safeguards.
Because the gait-induced disturbance amplitude grows with frequency, a gain-scheduling ratio tuned at the nominal $f_g^{\text{nom}} = 2.5$ Hz can be overly aggressive at lower gait frequencies, where less compensatory authority is needed. To prevent this, the reference singular value is scaled by a frequency-dependent factor:
\[ \sigma_2^{\text{eff}} = \sigma_2(B_{\text{dep}}) \cdot \min\!\left( 1,\; \frac{f_g}{f_g^{\text{nom}}} \right), \tag{41} \]
so that (40) becomes $k_{1,j}(\alpha, f_g) = k_{1,j}^{\text{nom}}\, \sigma_2^{\text{eff}} / \left[ \sigma_2(B(\alpha)) + \epsilon_\sigma \right]$. At $f_g = f_g^{\text{nom}}$ the factor is unity and the original schedule is recovered; at lower frequencies the effective ceiling is reduced proportionally, preventing over-amplified gains from degrading precision.
Additionally, the FSM applies a phase-specific multiplier:
\[ k_j^{(M_i)}(\alpha) = \zeta^{(M_i)} \cdot k_j(\alpha), \tag{42} \]
where $\zeta^{(M_i)} \ge 1$ is a phase boost factor (e.g., $\zeta^{(M_3)} > 1$ during descend for tighter tracking near the vibrating surface).
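The three scheduling factors (singular-value ratio, frequency scaling, and phase boost) combine multiplicatively. The sketch below takes the singular values as inputs, since the allocation matrix is not reproduced here; the singular values in the usage note are hypothetical placeholders, not the prototype's.

```python
def scheduled_gain(k_nom, sigma2_dep, sigma2_alpha, f_g,
                   f_g_nom=2.5, eps_sigma=1e-3, zeta=1.0):
    """Fold-, frequency-, and phase-scheduled CFNTSM gain.

    k_nom:        nominal gain tuned for the deployed configuration
    sigma2_dep:   second singular value of the deployed allocation matrix
    sigma2_alpha: second singular value at the current fold angle
    f_g:          current gait frequency in Hz
    zeta:         FSM phase boost factor (>= 1)
    """
    sigma2_eff = sigma2_dep * min(1.0, f_g / f_g_nom)
    return zeta * k_nom * sigma2_eff / (sigma2_alpha + eps_sigma)

# Hypothetical numbers: authority halves when folded, nominal gait frequency.
k_folded = scheduled_gain(k_nom=10.0, sigma2_dep=0.2, sigma2_alpha=0.1, f_g=2.5)
```

With these placeholder values the folded gain is close to twice the nominal gain, and halving the gait frequency would bring it back down by the same factor.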

3.5.3. Control Law

The control input combines three terms: cancellation of the known channel dynamics, compensation of the GFA-FEO disturbance estimate, and finite-time feedback on the sliding variable. Written per standardised channel, the CFNTSM law is
\[ u_{\text{CFNTSM},j} = -g_j^{-1}(\alpha) \left[ f_j + \hat{d}_j + \frac{q_j}{p_j \beta_j}\, |\dot{e}_j|^{2 - p_j/q_j} \operatorname{sign}(\dot{e}_j) + k_{1,j}(\alpha) \lfloor s_j \rceil^{\rho_1} + k_{2,j}(\alpha)\, s_j \right], \tag{43} \]
where $\hat{d}_j$ is the GFA-FEO estimate from (29), $f_j$ is the known drift term, and $\rho_1 \in (0, 1)$ provides the finite-time convergence characteristic. For the position loop, $f_j = 0$ because the gravity term is absorbed into the thrust–attitude extraction (38). For the attitude loop, the known drift is the gyroscopic coupling
\[ f_j = \left[ -J^{-1}(\alpha) \left( \omega \times J(\alpha)\, \omega \right) \right]_j, \tag{44} \]
which is computable in real time from the measured angular velocity and the known inertia tensor. The remaining nonlinear terms (drag torques $\tau_D$, gravity torques $\tau_G$, external disturbances $\tau_{\text{ext}}$, and the reference acceleration $\ddot{\eta}_d$) are absorbed into $d_\eta$ and estimated by the GFA-FEO.
The term $g_j^{-1}(\alpha)$ cancels the fold-dependent control effectiveness; combined with the gain scheduling (40), this constitutes a double compensation for the moment-arm degradation: cancellation in the plant model and gain amplification for disturbance rejection.
Remark (non-diagonal attitude effectiveness). For the position loop, $g_j(\alpha)$ is the scalar thrust-to-force effectiveness for each translational channel. For the attitude loop, $g_j(\alpha)$ denotes the $j$-th diagonal element of $g_\eta(\alpha) = J^{-1}(\alpha) B(\alpha)$. As noted in Section 3.3, this matrix becomes non-diagonal in the folded state. The off-diagonal coupling terms are treated as part of the lumped disturbance $d_\eta$ under Assumption A3 and estimated by the GFA-FEO. This approximation is valid provided the off-diagonal entries remain smaller than the minimum diagonal gain, a condition satisfied in hardware for $\alpha \ge -76.5^\circ$ (verified numerically in Section 5).
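A scalar, per-channel sketch of the CFNTSM law may help to see how the terms combine. It assumes a stabilising sign convention (a leading minus on the bracketed terms, with the channel effectiveness `g` positive) and hypothetical parameter values; `sig` is the signed fractional power, which coincides with the odd-ratio power used in the sliding surface.

```python
import math

def sig(x, power):
    """Signed fractional power |x|^power * sign(x)."""
    return math.copysign(abs(x) ** power, x)

def cfntsm_input(e, e_dot, f, d_hat, g, k1, k2,
                 beta=1.0, p=5, q=3, rho1=0.5):
    """Return (u, s) for one standardised channel.

    e, e_dot: tracking error and its derivative
    f:        known drift term; d_hat: disturbance estimate
    g:        channel control effectiveness (assumed nonzero)
    p, q:     positive odd integers with 1 < p/q < 2
    """
    s = e + beta * sig(e_dot, p / q)
    # Equivalent-control term; the exponent 2 - p/q lies in (0, 1), so it is
    # finite at e_dot = 0 (the nonsingular property).
    equivalent = (q / (p * beta)) * sig(e_dot, 2.0 - p / q)
    reaching = k1 * sig(s, rho1) + k2 * s
    u = -(f + d_hat + equivalent + reaching) / g
    return u, s
```

In this sketch a positive error with zero rate produces a negative input, and a pure disturbance estimate is cancelled one-for-one after scaling by the effectiveness.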

3.5.4. Continuity at FSM Transitions

The FSM changes controller gains at phase boundaries; for example, when the controller switches from approach to descent or from docking to takeoff. Although these changes are physically motivated, applying them as instantaneous steps would inject artificial torque spikes into the allocation layer. The smoothing rule below preserves the intended phase-dependent gains while making the commanded input continuous.
When the FSM transitions from phase $M_i$ to $M_{i+1}$, the gain parameters $k_j$ change discontinuously. To avoid control chattering, a first-order exponential filter smooths the transition:
\[ k_j(t) = k_j^{(M_i)} + \left[ k_j^{(M_{i+1})} - k_j^{(M_i)} \right] \left( 1 - e^{-(t - t_{\text{sw}})/\tau_{\text{sw}}} \right), \tag{45} \]
where $t_{\text{sw}}$ is the switching instant and $\tau_{\text{sw}} > 0$ is the smoothing time constant (typically 50–100 ms).

3.6. Fold-Aware Control Allocation

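The CFNTSM attitude controller outputs a desired torque vector $\tau_{\text{cmd}} \in \mathbb{R}^3$ (roll, pitch, and yaw). Together with the desired total thrust $F_{\text{cmd}}$ from the position loop, the full allocation problem is
\[ \underbrace{\begin{bmatrix} 1 \;\; 1 \;\; 1 \;\; 1 \\ B(\alpha) \end{bmatrix}}_{M(\alpha)\, \in\, \mathbb{R}^{4 \times 4}} \begin{bmatrix} f_1 \\ f_2 \\ f_3 \\ f_4 \end{bmatrix} = \begin{bmatrix} F_{\text{cmd}} \\ \tau_{\text{cmd}} \end{bmatrix}. \tag{46} \]
In the deployed X-configuration, $M$ is well-conditioned and its inverse is the classical symmetric mixer. As $\alpha$ decreases, $M(\alpha)$ becomes progressively ill-conditioned: the minimum singular value drops from $\underline{\sigma} = 0.116$ at $\alpha = 45^\circ$ to $\underline{\sigma} = 0.055$ at $\alpha = -76.5^\circ$.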
The fold-aware allocation computes the motor thrusts via the damped pseudo-inverse
\[ f = M^T(\alpha) \left[ M(\alpha) M^T(\alpha) + \delta^2 I_4 \right]^{-1} \begin{bmatrix} F_{\text{cmd}} \\ \tau_{\text{cmd}} \end{bmatrix}, \tag{47} \]
where $\delta > 0$ is a damping factor that prevents numerical blow-up near rank deficiency. The result is clipped to the feasible thrust range $f_k \in [0, f_{\max}]$.
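The damped pseudo-inverse allocation can be sketched without external libraries by solving the regularised 4×4 system with Gaussian elimination. The mixer matrix in the usage note (arm length 0.1, yaw coefficient 0.02) is a hypothetical symmetric X-configuration, not the prototype's $M(\alpha)$, and the singular-value warning check is not included.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def allocate(m, wrench, delta=1e-3, f_max=10.0):
    """Motor thrusts f = M^T (M M^T + delta^2 I)^-1 wrench, clipped to [0, f_max]."""
    n = len(m)
    gram = [[sum(m[i][k] * m[j][k] for k in range(n))
             + (delta ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    y = solve_linear(gram, wrench)
    f = [sum(m[i][k] * y[i] for i in range(n)) for k in range(n)]  # M^T y
    return [min(max(fk, 0.0), f_max) for fk in f]

# Hypothetical deployed X-configuration mixer (rows: thrust, roll, pitch, yaw).
ARM, YAW = 0.1, 0.02
M_DEPLOYED = [[1.0, 1.0, 1.0, 1.0],
              [-ARM, ARM, ARM, -ARM],
              [ARM, ARM, -ARM, -ARM],
              [YAW, -YAW, YAW, -YAW]]
hover_thrusts = allocate(M_DEPLOYED, [4.0, 0.0, 0.0, 0.0])
```

The damping term slightly shrinks the solution relative to the exact inverse, which is the price paid for a bounded result when $M(\alpha)$ approaches rank deficiency.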
At each control step, σ ̲ ( M ( α ) ) is computed and compared against a threshold σ warn . If σ ̲ < σ warn , the controller limits the demanded torque magnitude to prevent actuator saturation, and a warning flag is set in the FSM.

3.7. Model-Based Performance Ceiling Analysis

Before introducing the learned residual layer, it is instructive to characterise the architectural performance limit of the model-based controller alone—i.e., the best tracking accuracy attainable by CFNTSM + GFA-FEO regardless of parameter tuning.
To quantify this ceiling, we conducted an exhaustive multi-phase parameter sweep (over 200 configurations) covering the CFNTSM gains $(k_1, k_2, \beta, p/q, \rho_1)$, the GFA-FEO bandwidth and internal-model gains $(\ell_i, L, \varepsilon_1, \gamma_n, \mu_v)$, and the measurement LPF cutoffs $(f_{c,p}, f_{c,v})$. Each configuration was evaluated under the nominal docking scenario ($v_d = 0.5$ m/s, $f_g = 2.5$ Hz, $v_w = 2$ m/s) with 10 independent random seeds.

3.7.1. Bottleneck Identification

The ceiling analysis has two key findings. First, each successive model-based enhancement yields diminishing returns: the gap from PID to CFNTSM is 6.3 mm, from CFNTSM to +FEO is 2.3 mm, and from +FEO to +GFA-FEO is 1.5 mm. The position RMSE converges to a plateau around 5.1 mm despite exhaustive tuning of all available degrees of freedom.
The representative values supporting this comparison are summarised in Table 3.
Figure 10 visualises the same performance ceiling and decomposes the residual error sources at the model-based optimum.
Panel (b) decomposes the residual error at the optimum into per-axis contributions. The z-axis component (3.5 mm) is dominated by a ≈ 3.0 mm steady-state bias from the motor time constant τ m ; the phase-advance feedforward compensates the dominant harmonic, but the nonlinear interaction between fold-angle-dependent thrust effectiveness and the first-order motor lag leaves a residual beyond linear phase compensation. The x-axis component (3.2 mm) is driven by gait-coupled lateral disturbance at the walking frequency and the GFA-FEO finite-time transient during frequency lock-in. The y-axis component (1.9 mm) approaches the sensor noise floor ( σ p = 0.5  mm per sample, amplified by differentiation in the velocity channel).

3.7.2. Implication: Necessity of Learned Residual Compensation

The 5.1 mm plateau establishes the noise-free architectural limit of the model-based controller. When measurement noise is included ( σ p = 0.5  mm per position channel, amplified by differentiation in the velocity feedback path), the effective baseline RMSE rises to approximately 7 mm in the nominal scenario—still within the < 10 mm docking tolerance (22), but leaving limited margin under more challenging conditions (higher  f g , stronger wind). The remaining structured error stems from three sources that resist analytical cancellation: (i) the nonlinear coupling between fold-angle-dependent dynamics and motor lag, (ii) the finite convergence time of the FEO during transient disturbances, and (iii) sensor noise amplification in the velocity feedback path.
A learned residual policy, operating on the rich observation vector o conditioned on the fold angle  α and FSM phase, can compensate for these structured residuals through nonlinear function approximation—particularly the sensor-noise-induced gap between the 5.1 mm noise-free limit and the ∼7 mm with-noise performance—and improve robustness across the operational envelope.

3.8. FiLM-SAC Residual Reinforcement Learning

The model-based controller leaves a small but structured residual tracking error near the docking tolerance. The FiLM-SAC layer is introduced as a bounded residual compensator that can adapt its action to the fold state, gait estimate, and FSM phase while leaving the CFNTSM–GFA-FEO controller as the stabilising backbone (Figure 11).

3.8.1. Residual Control Framework and Network Architecture

The residual layer is deliberately added after the model-based law so that learning only corrects the remaining bounded error rather than replacing the stabilising controller. The total control input is
\[ u = u_{\text{CFNTSM}} + \lambda(t)\, \Delta u_{\text{RL}}, \tag{48} \]
where $\lambda(t) \in [0, \bar{\lambda}(M_i)]$ is the adaptive trust weight (Section 3.8.2). The policy receives an observation $o \in \mathbb{R}^{18}$ ($e_p$, $\dot{e}_p$, $e_\eta$, $\omega$, $\hat{d}_p$, $\hat{d}_\eta$, each 3-dim) and a conditioning vector $c = [\alpha, \dot{\alpha}, \hat{f}_g, \hat{A}_g, M_{\text{FSM}}/8]^T \in \mathbb{R}^5$ encoding fold state, gait parameters, and mission phase. Following [26], $c$ modulates the observation encoder via FiLM:
\[ z = \gamma \odot \phi(o) + \beta, \quad (\gamma, \beta) = h_\psi(c), \tag{49} \]
where $\phi$ is a two-layer encoder (256 units, ReLU) and $h_\psi$ produces per-channel scale and shift. The modulated features are decoded into $\Delta u_{\text{RL}} \in \mathbb{R}^4$. The twin critics share encoder $\phi$, so the residual action and its SAC value estimates are learned in a common feature space.
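The FiLM operation itself is a per-channel affine modulation of the encoded features. The sketch below uses plain lists and models $h_\psi$ as a single linear layer for brevity; that choice, the three-channel feature size, and all weights are assumptions for illustration, whereas the paper's encoder has 256 units.

```python
def linear(weights, bias, x):
    """Dense layer without activation: weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def film_modulate(features, cond, w_gamma, b_gamma, w_beta, b_beta):
    """Apply z = gamma * features + beta, with (gamma, beta) computed from cond.

    features: encoder output phi(o), one value per channel
    cond:     conditioning vector c (fold state, gait estimate, FSM phase)
    """
    gamma = linear(w_gamma, b_gamma, cond)
    beta = linear(w_beta, b_beta, cond)
    return [g * h + b for g, h, b in zip(gamma, features, beta)]

# Toy example: 3 feature channels, 5-dimensional conditioning vector.
cond = [0.5, 0.0, 2.5, 0.01, 3 / 8]
zeros = [[0.0] * 5 for _ in range(3)]
z = film_modulate([1.0, -2.0, 0.5], cond, zeros, [1.0] * 3, zeros, [0.0] * 3)
```

With zero weights, unit scale bias, and zero shift bias, the modulation reduces to the identity, which is a common initialisation so that conditioning starts as a no-op.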

3.8.2. Training Objective and Adaptive Trust Weight

The reward balances docking precision with residual smoothness so that the learned term complements the stabilising model-based controller. The crash indicator suppresses unsafe exploration, while domain randomisation exposes the policy to mass, wind, and gait uncertainty.
The policy is trained with SAC [44] (automatic entropy tuning). The reward is
\[ r_t = -w_p \|e_p\|^2 - w_\eta \|e_\eta\|^2 - w_u \|\Delta u_{\text{RL}}\|^2 - w_s\, \mathbb{1}[\text{crash}], \tag{50} \]
with domain randomisation over mass (±10%), inertia (±15%), wind (≤3 m/s), gait frequency ($f_g \in [1.5, 3.0]$ Hz), and initial conditions.
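The reward can be sketched as a weighted sum of squared norms plus a crash penalty. The weight values below are placeholders chosen for illustration; the paper's values are not given in this section.

```python
def docking_reward(e_p, e_eta, du_rl, crashed,
                   w_p=1.0, w_eta=0.1, w_u=0.01, w_s=100.0):
    """Per-step reward: penalise position error, attitude error, residual
    effort, and crashes. All weights here are assumed placeholder values."""
    def sq_norm(v):
        return sum(x * x for x in v)

    penalty = w_p * sq_norm(e_p) + w_eta * sq_norm(e_eta) + w_u * sq_norm(du_rl)
    if crashed:
        penalty += w_s
    return -penalty
```

Because every term is a penalty, the reward is bounded above by zero, and the residual-effort term discourages the policy from issuing corrections that the model-based controller does not need.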
The adaptive trust weight gates the learned residual using online evidence of unmodelled dynamics: tracking error, observer residual magnitude, and the proximity of an FSM transition. In phases where the analytical controller already tracks within the nominal noise level, the learned authority is reduced.
Unlike prior residual-RL works that fix $\lambda$ as a constant hyperparameter [27,35], we adapt $\lambda(t)$ in real time based on three signals:
\[ \lambda(t) = \bar{\lambda}(M_i) \cdot \sigma\!\left( -c_e \|e\| + c_d \|\hat{d}_{\text{FEO}}\| - c_s\, \Delta M_{\text{FSM}}(t) \right), \tag{51} \]
where $\sigma(\cdot)$ is the sigmoid function (output in $(0, 1)$), $c_e, c_d, c_s > 0$ are design parameters (tracking-error suppression, disturbance-residual amplification, and transition-inhibitor weights, respectively), and $\Delta M_{\text{FSM}}(t) = \kappa_{\text{sw}}\, e^{-(t - t_{\text{sw}})/\tau_\lambda}$ is a decaying inhibitor that temporarily reduces $\lambda$ after each FSM transition at time $t_{\text{sw}}$.
Thus, the learned residual is suppressed during large tracking excursions, amplified when the FEO residual indicates unmodelled disturbances, and transiently reduced at FSM transitions to prevent overreaction to parameter step changes.
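The qualitative behaviour just described can be checked with a small sketch of the trust-weight law. The signs follow the roles stated for the three weights (suppression, amplification, inhibition), and every numerical value below is an assumed placeholder rather than a tuned parameter from the paper.

```python
import math

def trust_weight(err_norm, d_hat_norm, t_since_switch, lam_bar,
                 c_e=50.0, c_d=2.0, c_s=5.0, kappa_sw=1.0, tau_lam=0.1):
    """Adaptive trust weight in (0, lam_bar).

    err_norm:       tracking-error norm (suppresses the residual)
    d_hat_norm:     observer disturbance-estimate norm (amplifies it)
    t_since_switch: seconds since the last FSM transition (t - t_sw >= 0)
    lam_bar:        phase-dependent upper bound on the trust weight
    """
    inhibitor = kappa_sw * math.exp(-t_since_switch / tau_lam)
    z = -c_e * err_norm + c_d * d_hat_norm - c_s * inhibitor
    return lam_bar / (1.0 + math.exp(-z))
```

The sigmoid keeps the weight strictly inside its phase bound, so the learned residual can never exceed the authority that the FSM grants for the current phase.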

3.9. Payload-Adaptive Takeoff with FEO Hot-Switching

Upon EPM engagement (lock, M 5 ), the system mass doubles to  m c (23). The inertia tensor changes abruptly, and the centre of mass shifts. The GFA-FEO, designed for mass m u , would exhibit a large transient estimation error if its internal states were simply carried over.
At the lock→stow transition ($M_5 \to M_6$), the following hot-switch protocol is executed: (1) model parameters are updated to $m \leftarrow m_c$, $J \leftarrow J_c$ from (23)–(25); (2) observer states are reset ($\hat{d} \leftarrow 0$, $\hat{a}_n \leftarrow 0$, $\hat{b}_n \leftarrow 0$) since the internal-model coefficients must be re-learned for the new plant; and (3) the bandwidth is boosted via $L \leftarrow \kappa_{\text{hs}} L$ with $\kappa_{\text{hs}} > 1$ (typically 1.5–2.0) for fast re-convergence, then ramped back to the nominal value over $T_{\text{ramp}} = 0.5$ s.
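The three steps of the hot-switch protocol can be sketched on a simplified scalar observer state. The data structure, the omission of the inertia update, and the linear shape of the ramp back to nominal bandwidth are assumptions; the text specifies only the ramp duration.

```python
from dataclasses import dataclass, field

@dataclass
class ObserverState:
    mass: float       # model mass used by the observer, kg
    bandwidth: float  # observer bandwidth L
    d_hat: float = 0.0
    a_hat: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    b_hat: list = field(default_factory=lambda: [0.0, 0.0, 0.0])

def hot_switch(obs, combined_mass, kappa_hs=1.5):
    """Apply the hot-switch in place and return the nominal bandwidth."""
    nominal_bandwidth = obs.bandwidth
    obs.mass = combined_mass                       # (1) update model parameters
    obs.d_hat = 0.0                                # (2) reset observer states
    obs.a_hat = [0.0] * len(obs.a_hat)
    obs.b_hat = [0.0] * len(obs.b_hat)
    obs.bandwidth = kappa_hs * nominal_bandwidth   # (3) boost the bandwidth
    return nominal_bandwidth

def ramped_bandwidth(t_since_switch, nominal_bandwidth, kappa_hs=1.5, t_ramp=0.5):
    """Bandwidth during the ramp back to nominal (linear ramp assumed)."""
    frac = min(max(t_since_switch / t_ramp, 0.0), 1.0)
    return nominal_bandwidth * (kappa_hs + (1.0 - kappa_hs) * frac)
```

Returning the nominal bandwidth from `hot_switch` lets the caller drive `ramped_bandwidth` at each control step until the ramp time has elapsed.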
Between the observer reset and re-convergence (the “blind window”, duration $T_{\text{blind}} \approx 0.2$–$0.3$ s), the FEO estimate is unreliable. During this interval, the FiLM-SAC residual policy serves as the primary disturbance compensator:
\[ \lambda(t) = \bar{\lambda}_{\max} \cdot \left( 1 - e^{-(t - t_{\text{sw}})/\tau_b} \right), \quad t \in [t_{\text{sw}},\; t_{\text{sw}} + T_{\text{blind}}], \tag{52} \]
where $t_{\text{sw}}$ is the $M_5 \to M_6$ transition instant (when the hot-switch protocol fires and the observer is reset), $\bar{\lambda}_{\max}$ is the maximum permissible trust weight (phase $M_7$), and $\tau_b$ controls the ramp-up. After $T_{\text{blind}}$, the observer has re-converged, and $\lambda$ reverts to the adaptive law (51). The elevated residual authority is not an open-loop override: the CFNTSM backbone and thrust allocator remain active, and $\lambda(t)\, \Delta u_{\text{RL}}$ is saturated by $\bar{\lambda}_{\max} \bar{u}_{\text{RL}}$. If $\hat{d}$ or the position error remains above the takeoff threshold at $t_{\text{sw}} + T_{\text{blind}}$, the trust weight is ramped down, and the FSM keeps the vehicle in the conservative climb/hold state rather than switching to cruise.
This design ensures that at no point during the mass-doubling transient is the system without active disturbance compensation: either the FEO or the RL policy (or both) is providing corrective action.

3.10. Closed-Loop Stability Analysis

Theorem 2. 
Consider the foldable-quadrotor system (8) and (9) with the three-layer controller (48), the GFA-FEO (29)–(32), and the $\alpha$-scheduled CFNTSM (43). Under Assumptions A1–A3, if the observer bandwidth $L$ is sufficiently large, the CFNTSM gains satisfy $k_{1,j} > 0$, $k_{2,j} > 0$, and the RL residual is bounded, $\|\Delta u_{\text{RL}}\| \le \bar{u}_{\text{RL}}$, then the tracking errors $e_p$ and $e_\eta$ converge to a neighbourhood of the origin in finite time.
Proof. 
The proof proceeds in three steps corresponding to the three layers.
Step 1 (GFA-FEO convergence). By Theorem 1, the observation error converges in finite time $T_{obs}$, after which $\|\tilde d\| \le \bar\epsilon_d$ with $\bar\epsilon_d = O(L^{-1})$.
Step 2 (CFNTSM finite-time convergence). For $t > T_{obs}$, substitute $\hat d = d + \tilde d$ into (43). The closed-loop sliding dynamics become:
$$\dot s_j = -k_{1,j}(\alpha)\,|s_j|^{\rho_1}\operatorname{sign}(s_j) - k_{2,j}(\alpha)\,s_j + g_j^{-1}(\alpha)\left(\tilde d_j + \lambda\,\Delta u_{RL,j}\right). \tag{53}$$
Consider the Lyapunov function $V_s = \tfrac{1}{2}\sum_j s_j^2$. Its derivative is:
$$\dot V_s \le -\sum_j \left[ k_{1,j}(\alpha)\,|s_j|^{1+\rho_1} + k_{2,j}(\alpha)\,s_j^2 - |s_j|\,\frac{\bar\epsilon_d + \bar\lambda\,\bar u_{RL}}{\underline g(\alpha)} \right], \tag{54}$$
where $\underline g(\alpha) = \min_j |g_j(\alpha)| > 0$ is guaranteed by the damped allocation (47) and gain scheduling (40). Applying Young’s inequality, for $k_{2,j} > \bar\epsilon_{d,j} / (\underline g\,\epsilon_s^2)$:
$$\dot V_s \le -c_s\,V_s^{(1+\rho_1)/2} + \epsilon_V, \tag{55}$$
where $c_s > 0$ and $\epsilon_V$ is a residual proportional to $\bar\epsilon_d + \bar\lambda\,\bar u_{RL}$. By finite-time stability theory, the sliding variable converges to $|s_j| \le \epsilon_s$ in time $T_s \le V_s(0)^{(1-\rho_1)/2} / [c_s (1-\rho_1)/2]$.
Once $s_j \to 0$, the sliding dynamics $e_j + \beta_j\,\dot e_j^{\,p_j/q_j} = 0$ drive the tracking error to zero in finite time $T_e \le \beta_j^{\,q_j/(q_j - p_j)}\,|e_j(T_s)|^{(q_j - p_j)/q_j} \cdot q_j/(q_j - p_j)$.
Step 3 (RL residual as bounded perturbation). The residual-RL term appears in (53) as a bounded perturbation scaled by $\lambda(t) \le \bar\lambda$. By the adaptive law (51), $\lambda$ decreases when the tracking error is small, forming a self-regulating mechanism: large corrections occur only when the model-based controller is insufficient, and they diminish as tracking improves. The ultimate bound on $\|e\|$ is thus:
$$\|e\| \le \frac{\bar\epsilon_d + \bar\lambda\,\bar u_{RL}}{k_{\min} \cdot \underline g(\alpha)}, \tag{56}$$
which can be made arbitrarily small by increasing the observer bandwidth $L$ (reducing $\bar\epsilon_d$) or the controller gains $k_{\min}$.
Cascade stability. The position and attitude loops are connected in cascade: the outer loop generates η d for the inner loop. The attitude loop is designed with a closed-loop bandwidth at least three times that of the position loop (a standard cascaded-UAV design assumption), so the inner loop may be treated as approximately settled on the timescale of the outer loop. By the semi-global practical stability theorem for cascades [28], if each loop individually converges to its respective bound, the cascaded system is semi-globally practically stable, completing the proof.    □
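The reaching-time bound used in Step 2 can be checked numerically on a scalar, disturbance-free sliding variable. The sketch below is an illustration under stated assumptions rather than part of the proof: it simulates $\dot s = -k_1 |s|^{\rho_1}\operatorname{sign}(s) - k_2 s$ with forward Euler, takes $c_s = k_1 2^{(1+\rho_1)/2}$ (which follows from $V_s = s^2/2$ when the $k_2$ term is dropped from the bound), and uses made-up gain values.

```python
import math

def reaching_time_bound(s0, k1, rho):
    """Upper bound V(0)^((1-rho)/2) / (c_s * (1-rho)/2) for V = s^2 / 2."""
    v0 = 0.5 * s0 ** 2
    c_s = k1 * 2.0 ** ((1.0 + rho) / 2.0)  # from k1*|s|^(1+rho) = c_s * V^((1+rho)/2)
    return v0 ** ((1.0 - rho) / 2.0) / (c_s * (1.0 - rho) / 2.0)

def simulate_reaching(s0, k1, k2, rho, dt=1e-4, tol=1e-6, t_max=10.0):
    """Forward-Euler time for |s| to fall below tol (no disturbance)."""
    s, t = s0, 0.0
    while abs(s) >= tol and t < t_max:
        s_dot = -k1 * abs(s) ** rho * math.copysign(1.0, s) - k2 * s
        s += dt * s_dot
        t += dt
    return t

# Hypothetical gains, chosen only for illustration
bound = reaching_time_bound(s0=1.0, k1=2.0, rho=0.5)
t_hit = simulate_reaching(s0=1.0, k1=2.0, k2=1.0, rho=0.5)
```

For these values the simulated reaching time stays below the bound, as expected, because the linear $k_2 s$ term only accelerates convergence.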

4. PSO Parameter Optimisation

The hierarchical controller of Section 3 introduces 18 tunable hyperparameters in the CFNTSM–GFA-FEO control core whose values significantly affect tracking precision, docking success rate, and disturbance rejection. Hand-tuning an 18-dimensional space is impractical and provides no optimality guarantee. Following recent quadrotor control-optimisation studies [28,30], we employ Particle Swarm Optimisation (PSO) [45] with domain-randomisation-robust evaluation and validate the resulting parameters through a structured Monte Carlo cross-validation protocol.
The 18 parameters span the CFNTSM sliding-mode gains and the GFA-FEO observer hyperparameters. Phase-scheduling parameters (enhanced-gain boost factors, RL trust-weight coefficients, and FSM smoothing constants) are designed as fixed ratios relative to the optimised base values and are therefore not included in the search space, following the principle of minimising the optimisation dimensionality for robust convergence [45]. The resulting search ranges, initial values, and layer assignments are listed in Table 4.

4.1. Optimisation Problem Formulation

The PSO search space contains the controller and observer parameters that directly influence docking precision. Candidate solutions are ranked by a fitness function that combines tracking error, contact velocity, and hard penalties for failed or unsafe docking.

4.1.1. Decision Variables

The 18 hyperparameters are partitioned into two functional layers.
Layer 1: CFNTSM base parameters (8 dimensions). Unlike conventional implementations that share gains across position and attitude loops, the cascade architecture requires separate tuning because the two loops have fundamentally different dynamics: the position loop drives force commands through gravity compensation, while the attitude loop drives torque commands through an $\alpha$-dependent inertia tensor. The position-loop parameters are $\theta_{1,p} = \{k_1^{pos}, k_2^{pos}, \beta^{pos}\}$ and the attitude-loop parameters are $\theta_{1,\eta} = \{k_1^{att}, k_2^{att}, \beta^{att}\}$, where superscripts distinguish the nominal deployed-configuration gains for each loop. Two parameters are shared: the terminal-mode exponent $p/q \in (1, 2)$ (continuously optimised, not discretised) and the finite-time convergence exponent $\rho_1 \in (0, 1)$.
Layer 2: GFA-FEO observer parameters (10 dimensions). These govern the observer convergence speed, internal-model adaptation, and disturbance estimate conditioning: $\theta_2 = \{\ell_1, \ell_2, \ell_3, L_{pos}, L_{att}, \varepsilon_1, \gamma_n, \mu_v, \sigma_{IM}, \hat d_{\max}\}$. Here, $\ell_1, \ell_2, \ell_3$ are the finite-time gains (29); $L_{pos}$ and $L_{att}$ are separate observer bandwidths for the position and attitude loops, reflecting their different noise environments and disturbance spectra; $\varepsilon_1 \in (0, \tfrac{1}{2})$ is the fractional exponent; $\gamma_n$ is the internal-model adaptation rate (32); $\mu_v$ is the velocity correction gain (30); $\sigma_{IM}$ is the $\sigma$-modification leakage coefficient (32); and $\hat d_{\max}$ is the saturation bound on the disturbance estimate, which prevents transient over-compensation during observer initialisation.

4.1.2. Fitness Function

Each candidate parameter vector $\theta = [\theta_{1,p}, \theta_{1,\eta}, \theta_2]$ is evaluated on a complete docking simulation (20 s, $\Delta t = 1$ ms, RK4) in which the UAV descends from 2 m onto a walking robot. To ensure robustness, each evaluation draws $N_{DR} = 12$ domain-randomised conditions (mass $m \sim U(0.9, 1.1)\,m_0$, wind $v_w \sim U(0, 3)$ m/s at random azimuth, and gait frequency $f_g \sim U(2.0, 3.0)$ Hz), replicated across $N_{seed} = 4$ independent random seeds. The fitness is the mean over all $N_{DR} \times N_{seed} = 48$ episodes, suppressing seed-dependent variance that could bias the optimiser.
The composite fitness minimises:
$$J_{PSO}(\theta) = w_1\,\mathrm{RMSE}_p + w_2\,\mathrm{RMSE}_\eta + w_3\,|v_{z,\mathrm{contact}}| + P(\theta), \tag{57}$$
where $\mathrm{RMSE}_p$ is the position root-mean-square error (mm) computed over the last 40% of episode steps (focusing on steady-state docking precision), $\mathrm{RMSE}_\eta$ is the attitude RMSE (°) over the same window, and $|v_{z,\mathrm{contact}}|$ is the absolute vertical velocity at the final step (m/s), penalising hard contact; the weight vector $w = [10, 5, 50]$ prioritises soft contact and position accuracy. The penalty term $P(\theta)$ enforces hard constraints:
$$P = 100 \cdot \mathbb{1}_{\neg\mathrm{dock}} + 20 \cdot \mathbb{1}_{\eta_{\mathrm{peak}} > 15^\circ} + 30 \cdot \mathbb{1}_{\|e_{p,\mathrm{final}}\| > 50\,\mathrm{mm}}, \tag{58}$$
where $\mathbb{1}_{(\cdot)}$ denotes the indicator function. Any simulation divergence ($\|e_p\| > 0.5$ m or $|\phi|, |\theta| > 60^\circ$) results in a default penalty $J_{PSO} = 2000$.
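As a compact illustration of how the composite fitness and its penalties combine, the following Python sketch evaluates one episode. The function and argument names are hypothetical, and the per-episode metrics are assumed to have been computed already by the simulator.

```python
def docking_penalty(docked, peak_attitude_deg, final_pos_err_mm):
    """Hard-constraint penalty: +100 no dock, +20 attitude > 15 deg, +30 error > 50 mm."""
    penalty = 0.0
    if not docked:
        penalty += 100.0
    if peak_attitude_deg > 15.0:
        penalty += 20.0
    if final_pos_err_mm > 50.0:
        penalty += 30.0
    return penalty

def pso_fitness(rmse_p_mm, rmse_eta_deg, v_z_contact, docked,
                peak_attitude_deg, final_pos_err_mm,
                diverged=False, weights=(10.0, 5.0, 50.0)):
    """Composite fitness for one episode; diverged runs get the default 2000."""
    if diverged:
        return 2000.0
    w1, w2, w3 = weights
    return (w1 * rmse_p_mm + w2 * rmse_eta_deg + w3 * abs(v_z_contact)
            + docking_penalty(docked, peak_attitude_deg, final_pos_err_mm))

# Example: 4 mm position RMSE, 1 deg attitude RMSE, 0.05 m/s contact speed
j_ok = pso_fitness(4.0, 1.0, -0.05, True, 5.0, 4.0)
```

Because the weights multiply quantities in different units (mm, degrees, m/s), the relative scale of the three terms depends on those unit choices as much as on the weights themselves.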

4.2. Domain-Randomisation-Robust PSO

The optimisation procedure is designed to avoid tuning a controller only for one nominal trajectory. We initialise the swarm from a known feasible baseline, optimise all layers jointly, and evaluate each particle under freshly sampled uncertainty in mass, wind, gait frequency, and random seed.

4.2.1. Initialisation and PSO Update Rule

Rather than starting from a Latin Hypercube Sample, we initialise one particle at the hand-tuned baseline (Table 4, “Init.” column) and the remaining $N_p - 1$ particles uniformly at random within the search bounds. This warm-start strategy preserves the engineering knowledge embedded in the baseline while allowing the swarm to explore the full search space.
After initialisation, a standard PSO [45] is applied to all 18 dimensions simultaneously. Compared to multi-phase decomposition strategies, joint optimisation avoids the risk of sub-optimal coordination between separately tuned layers and simplifies the implementation.
The swarm uses N p = 50 particles ( 2.8 × the search dimension), constant inertia weight w = 0.7 , cognitive and social coefficients c 1 = c 2 = 1.5 , and velocity clamp at 50 % of the per-dimension search range. The large velocity clamp and moderate inertia weight encourage exploration in the early iterations, while the constant inertia avoids the convergence stagnation sometimes observed with linearly decaying schedules in high-dimensional noisy landscapes [45]. Initial velocities are sampled uniformly at ± 20 % of the range. Boundary violations are handled by elastic reflection: particles that exceed the bound are reflected by 50% of the overshoot distance, then clipped.
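The update rule with velocity clamping and elastic reflection can be sketched for a single particle as follows. This is a minimal illustration under assumptions: the function name is hypothetical, and the velocity is left unchanged after a reflection because the text does not say whether it is reversed.

```python
import random

def pso_step(x, v, pbest, gbest, lo, hi, rng,
             w=0.7, c1=1.5, c2=1.5, v_frac=0.5, reflect=0.5):
    """One PSO update for a single particle (all arguments are equal-length lists)."""
    new_x, new_v = [], []
    for d in range(len(x)):
        span = hi[d] - lo[d]
        vel = (w * v[d]
               + c1 * rng.random() * (pbest[d] - x[d])
               + c2 * rng.random() * (gbest[d] - x[d]))
        v_max = v_frac * span                    # clamp at 50% of the search range
        vel = max(-v_max, min(v_max, vel))
        pos = x[d] + vel
        if pos > hi[d]:                          # reflect by 50% of the overshoot
            pos = hi[d] - reflect * (pos - hi[d])
        elif pos < lo[d]:
            pos = lo[d] + reflect * (lo[d] - pos)
        pos = max(lo[d], min(hi[d], pos))        # final clip to the bounds
        new_x.append(pos)
        new_v.append(vel)
    return new_x, new_v

# One-dimensional example on [0, 1]: the particle overshoots the upper bound
x1, v1 = pso_step([0.9], [1.0], [0.9], [0.9], [0.0], [1.0], random.Random(0))
```

In the example the inertia term gives a velocity of 0.7, which is clamped to 0.5; the tentative position 1.4 overshoots by 0.4 and is reflected to 0.8.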

4.2.2. Domain-Randomisation Evaluation

Each particle is evaluated under N DR = 12 randomised environment conditions (mass, wind, and gait frequency) with N seed = 4 independent random seeds, yielding 48 full 20 s episodes per particle per iteration. Crucially, each iteration draws fresh random conditions to prevent the optimiser from overfitting to a fixed set of perturbations. The per-particle fitness is the arithmetic mean of all 48 episode fitness values. This domain-randomisation scheme is critical: a preliminary version using N DR = 2 exhibited extreme overfitting, with apparent fitness improvements that did not generalise (see Section 4.4).
Each iteration evaluates $N_p \times N_{DR} \times N_{seed} = 50 \times 12 \times 4 = 2400$ episodes. The total budget over 100 iterations is 240,000 simulations. With the Numba JIT-compiled physics engine running in parallel across 24 CPU threads via prange, each iteration completes in approximately 21 s, giving a total wall-clock time of about 35 min on an AMD Ryzen 9 7945HX laptop.
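The evaluation loop and its budget can be sketched as below. The names `sample_dr_conditions`, `particle_fitness`, and `run_episode` are hypothetical, and drawing a fresh condition set on every call is an assumption: the text states only that conditions are redrawn each iteration, not whether particles within an iteration share them.

```python
import math
import random

N_PARTICLES, N_DR, N_SEED, N_ITER = 50, 12, 4, 100

def sample_dr_conditions(rng, n_dr=N_DR, m0=2.5):
    """Draw n_dr randomised conditions: mass, wind speed/azimuth, gait frequency."""
    return [{
        "mass": rng.uniform(0.9, 1.1) * m0,          # kg
        "wind_speed": rng.uniform(0.0, 3.0),         # m/s
        "wind_azimuth": rng.uniform(0.0, 2.0 * math.pi),
        "gait_hz": rng.uniform(2.0, 3.0),
    } for _ in range(n_dr)]

def particle_fitness(run_episode, params, rng, n_dr=N_DR, n_seed=N_SEED):
    """Arithmetic mean of the episode fitness over n_dr x n_seed episodes.

    run_episode(params, condition, seed) is a user-supplied simulator call.
    """
    conditions = sample_dr_conditions(rng, n_dr)
    scores = [run_episode(params, cond, seed)
              for cond in conditions for seed in range(n_seed)]
    return sum(scores) / len(scores)

episodes_per_iteration = N_PARTICLES * N_DR * N_SEED
total_episodes = episodes_per_iteration * N_ITER
```

At roughly 21 s per iteration, 100 iterations take about 2100 s, which matches the quoted 35 min.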

4.3. Monte Carlo Cross-Validation Protocol

To verify that the PSO-optimised parameters generalise beyond the training distribution, we design a structured cross-validation protocol that extends the evaluation to conditions outside the PSO training range. The grid stresses the two dominant external factors in the model: gait frequency sets the periodic disturbance harmonics, while wind speed sets the broadband force disturbance. Extending both ranges beyond the PSO training distribution tests whether the tuned parameters retain margin outside the optimisation episodes.
Gait frequency $f_g$ and wind speed $v_w$ are swept over a $50 \times 50$ uniform grid with $f_g \in [1.5, 4.0]$ Hz and $v_w \in [0, 4.0]$ m/s (fixed wind direction $\psi_w = 0.5$ rad). The PSO training distribution uses $f_g \in [2.0, 3.0]$ Hz and $v_w \in [0, 3.0]$ m/s; the cross-validation grid extends to 1.5 Hz, 4.0 Hz, and 4.0 m/s to test out-of-distribution robustness. UAV mass is held at the nominal value $m_0 = 2.5$ kg to isolate the effects of gait frequency and wind.
The success criterion follows the docking definition in Section 2.5 and is evaluated through the same scalar fitness used during optimisation. A successful run must meet position, attitude, contact-velocity, and bounded-excursion limits. Each of the 2500 grid cells runs one full 20 s docking simulation per parameter configuration, yielding 5000 simulations total (2500 hand-tuned + 2500 PSO-optimised). A simulation is classified as a docking success if the composite fitness satisfies J PSO < 100 , i.e., the docking penalty (+100 for failure) is not triggered. This binary criterion is more stringent than simply checking the final position, as it also requires bounded attitude excursion, low contact velocity, and sub-50 mm final position error (via (58)).
Two aggregate metrics summarise robustness across the grid. The overall success rate (SR) is the fraction of all 2500 grid cells achieving $J_{PSO} < 100$ and measures global robustness. The operational-band SR applies the same criterion to the cells with $f_g \in [2.0, 3.0]$ Hz, the gait frequencies expected during normal wheel-legged walking (typical quadruped walking cadences).
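Both aggregate metrics reduce to counting grid cells below the fitness threshold. A minimal sketch, assuming the per-cell fitness is stored as one row per gait frequency (the function name and data layout are hypothetical):

```python
def success_rates(fitness_grid, gait_freqs, threshold=100.0, band=(2.0, 3.0)):
    """Return (overall SR, operational-band SR) for a fitness grid.

    fitness_grid[i][j] is the fitness at gait frequency gait_freqs[i]
    and the j-th wind speed; a cell succeeds when its fitness < threshold.
    """
    total = ok = band_total = band_ok = 0
    for f_g, row in zip(gait_freqs, fitness_grid):
        in_band = band[0] <= f_g <= band[1]
        for j_cell in row:
            success = j_cell < threshold
            total += 1
            ok += success
            if in_band:
                band_total += 1
                band_ok += success
    return ok / total, band_ok / band_total

# Tiny made-up grid: three gait frequencies, two wind speeds each
overall_sr, band_sr = success_rates(
    [[50.0, 150.0], [60.0, 70.0], [300.0, 400.0]],
    [1.5, 2.5, 4.0],
)
```

In this toy grid three of six cells succeed overall, while both cells inside the 2.0–3.0 Hz band succeed.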

4.4. Optimisation Results

The optimisation results are reported in three steps: convergence of the swarm, physical interpretation of the optimised parameters, and Monte Carlo generalisation outside the PSO training distribution.

4.4.1. PSO Convergence

Figure 12 shows the global-best fitness $J_{PSO}$ versus iteration. The initial swarm-best fitness is 691.58 (predominantly driven by the docking-failure penalty from randomly initialised particles). Rapid improvement occurs in the first 30 iterations, reducing $J_{PSO}$ to 76.83, which accounts for 99.1% of the total improvement ($691.58 \to 71.02$). The curve then enters a plateau: iterations 30–100 contribute only incremental refinement ($76.83 \to 71.02$), confirming that the 100-iteration budget is sufficient and the swarm has effectively converged. The final improvement occurs at iteration 93 ($72.21 \to 71.02$), after which no further progress is observed.
To verify the robustness of the optimised parameters, the final g-best vector is re-evaluated under an independent protocol: 8 random seeds × 12 DR conditions (96 episodes), yielding a mean fitness of 146.05 compared to 309.07 for the hand-tuned baseline—a 52.7% improvement. The difference between the g-best fitness (71.02) and the re-evaluation mean (146.05) reflects the broader DR sampling in the independent test (which includes harder conditions near the boundary of the training distribution).
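As a quick arithmetic check, the two percentages quoted for the convergence share and the re-evaluation gain follow directly from the reported fitness values:

```latex
\begin{align*}
\text{share of improvement in first 30 iterations}
  &= \frac{691.58 - 76.83}{691.58 - 71.02}
   = \frac{614.75}{620.56} \approx 0.991, \\[4pt]
\text{re-evaluation improvement over baseline}
  &= \frac{309.07 - 146.05}{309.07}
   = \frac{163.02}{309.07} \approx 0.527 .
\end{align*}
```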

4.4.2. Optimised Parameters

The optimised parameter vector has a clear physical interpretation and reveals three dominant parameter shifts. First, attitude gains are dramatically increased ($k_1^{att}$: $7.1 \to 29.5$, $+315\%$): the hand-tuned baseline underestimated the gains required for the folded configuration, and since moment arms shrink substantially during folding, the $\alpha$-scheduling (40) amplifies these nominal gains to maintain bandwidth; the PSO raises the baseline so that even the deployed configuration tracks more tightly. Second, observer bandwidths and velocity correction are substantially increased ($L_{pos}$: $2.9 \to 12.6$, $\mu_v$: $6.9 \to 17.7$): the hand-tuned observer was overly conservative, and the increased disturbance-channel gain ($\ell_3$: $0.21 \to 2.27$, $+998\%$) dramatically improves broadband rejection beyond what the internal-model harmonics provide. Third, the internal-model adaptation rate is reduced ($\gamma_n$: $104.5 \to 14.9$, $-86\%$), a counter-intuitive shift that compensates for the much higher observer bandwidth: with a faster broadband observer, the IM adaptation can afford to be slower, avoiding interactions between the IM dynamics and the finite-time observer transient; the increased $\sigma$-modification coefficient ($\sigma_{IM}$: $0.19 \to 0.29$) reinforces this conservative tuning. The position-loop CFNTSM gains remain largely unchanged ($k_1^{pos}$: $+12\%$, $k_2^{pos}$: $-20\%$), suggesting that the original position tracking was adequate and the primary bottleneck was attitude regulation during folding.
Table 5 lists the optimised values alongside the hand-tuned initialisations.

4.4.3. Monte Carlo Cross-Validation

Figure 13 and Table 6 summarise the $50 \times 50$ cross-validation results. The overall docking success rate increases from 37% to 59%, and within the operational gait-frequency band $f_g \in [2.0, 3.0]$ Hz the rate reaches 97%, up from 59%. Figure 13 visualises the per-cell fitness $J_{PSO}$ on a logarithmic colour scale across the $(f_g, v_w)$ grid; the solid contour marks the $J = 100$ success boundary. The hand-tuned configuration (a) achieves low fitness only in a narrow band $f_g \approx 1.8$–$2.6$ Hz, with scattered failures even within this range at higher wind speeds. The PSO-optimised configuration (b) extends the low-fitness region to $f_g \in [1.5, 2.9]$ Hz across the full wind range $v_w \in [0, 4]$ m/s, with sharply increasing fitness only at $f_g \ge 3.1$ Hz where the gait harmonics approach the Nyquist limit of the 50 Hz control loop.
The residual failures at high $f_g$ are not a parameter-tuning deficiency but rather reflect a fundamental limitation: at $f_g = 4$ Hz, the highest of the three internal-model harmonics ($2 f_g = 8$ Hz) exceeds the observer bandwidth even in enhanced-gain mode, causing the disturbance estimate to lag behind the actual perturbation. These conditions fall outside the target operational envelope and would require architectural changes (e.g., a higher control rate or additional IM harmonics) rather than parameter re-tuning. The PSO-optimised parameters are used in all subsequent simulation experiments (Section 5).

5. Simulation Results

The proposed controller is evaluated through three experiment groups of increasing scope: precision landing and docking (Section 5.2), payload-adaptive takeoff (Section 5.3), and a full end-to-end mission cycle (Section 5.4). All experiments use the PSO-optimised parameters from Section 4 and share the simulation environment described below.

5.1. Simulation Environment

The simulation is implemented in custom Python code (Python 3.8.10; NumPy 1.24.4 for numerical computation and Matplotlib 3.7.5 for plotting) with a 6-DoF rigid-body integrator (RK4, Δ t = 1 ms). The foldable-quadrotor dynamics (8) and (9) with fold-angle-dependent inertia (Table 1) and the walking-robot gait model (21) are evaluated at every time step. The Cheeseman–Bennett ground-effect model (16) is active for z < 4 R p . Motor dynamics are modelled as first-order with τ m = 20 ms. Sensor noise is additive Gaussian: position σ p = 0.5 mm, attitude σ η = 0.1 ° , angular rate σ ω = 0.5 ° /s.
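The motor and sensor models described above are simple enough to sketch directly. The snippet below is an illustration only: the function names are hypothetical, and the forward-Euler discretisation of the first-order motor lag is an assumption (the text specifies the time constant and step size, not the discretisation).

```python
import random

def motor_lag_step(omega, omega_cmd, dt=1e-3, tau_m=0.020):
    """One Euler step of first-order motor dynamics: tau_m * d(omega)/dt = cmd - omega."""
    return omega + dt * (omega_cmd - omega) / tau_m

def noisy_measurement(true_value, sigma, rng):
    """Additive zero-mean Gaussian sensor noise with standard deviation sigma."""
    return true_value + rng.gauss(0.0, sigma)

# Unit step response after one time constant (20 steps of 1 ms)
omega = 0.0
for _ in range(20):
    omega = motor_lag_step(omega, 1.0)

# Position measurement with sigma_p = 0.5 mm around a true value of 0 mm
rng = random.Random(0)
samples = [noisy_measurement(0.0, 0.5, rng) for _ in range(5000)]
```

After one time constant the discrete response reaches about 64% of the commanded value, close to the continuous-time 63.2%.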
The nominal scenario parameters are: dog walking speed v d = 0.5 m/s (gait frequency f g = 2.5 Hz), lateral wind v w = 2 m/s (constant), and initial UAV position ( 0 , 0 , 2.0 ) m with the dog at the origin heading along the + x axis.
Compared methods. Five configurations are tested in Experiment 1 and four in Experiment 2. All learning-based methods use the same pre-trained FiLM-SAC policy (10 M environment steps with domain randomisation). Table 7 summarises the methods and indicates which components are active.

5.2. Experiment 1: Precision Landing and Docking

Experiment 1 isolates the most critical part of the mission: terminal landing and EPM docking on the walking platform. Ablated variants remove FiLM-SAC, GFA-FEO, or adaptive λ to quantify the role of each layer against the PID baseline.
Figure 14 first visualises the representative landing geometry before the aggregate metrics are reported. The grey deck marks the $300 \times 180$ mm walking-platform landing surface, and the green band denotes the $\pm 10$ mm EPM alignment tolerance. M1 follows a smooth, direct descent with sub-centimetre lateral deviations during the final 0.5 m, whereas M2 (w/o RL) exhibits larger lateral excursions during $M_3$–$M_4$, and M5 (PID) shows sustained oscillations that prevent clean convergence to the docking window.
The UAV starts at $p_0 = (0, 0, 2.0)$ m and must dock on the robot’s back while the robot walks at $v_d = 0.5$ m/s. The mission proceeds through the three FSM phases (approach, descent, and docking); arms fold from deployed ($\alpha = 45^\circ$) to fully stowed ($\alpha = 76.5^\circ$) during descent. Each method is evaluated over 100 independent runs at each of three gait frequencies $f_g \in \{2.0, 2.5, 2.8\}$ Hz, with domain randomisation of wind direction and speed (0–5 m/s), mass ($\pm 10\%$), approach duration (1–3 s), and descent duration (6–10 s). A run is declared successful if and only if condition (22) is satisfied within 20 s. Table 8 reports the nominal condition $f_g = 2.5$ Hz; cross-frequency results are discussed below.

5.2.1. Docking Quantitative Results

Table 8 compares docking precision, attitude matching, contact speed, and success rate across the full controller, three ablated variants, and the PID baseline. Each value is computed from 100 randomised runs with wind, mass, timing, and gait perturbations.
Table 8 shows that M1 (Ours Full) achieves the best docking position RMSE (4.2 mm) and maintains a 100% success rate across all three tested gait frequencies. Statistical significance is assessed via Welch’s t-test on the per-seed docking position RMSE. The key observations are:
(i) Necessity of the residual RL (M1 vs. M2). Removing FiLM-SAC increases the docking position RMSE by 71% (from 4.2 to 7.2 mm, $p = 6.2 \times 10^{-85}$) and the descent-phase RMSE by 43% (from 6.5 to 9.3 mm). At $f_g = 2.5$ Hz, M2’s success rate is 99%; at $f_g = 2.8$ Hz, it drops further to 94%, whereas M1 remains at 100% across all frequencies. The performance gap is most pronounced during descent, where the RL residual compensates for the unmodelled aerodynamic coupling between folding arms and the time-varying allocation matrix, nonlinearities that lie outside the GFA-FEO’s disturbance class.
(ii) Necessity of the GFA framework (M1 vs. M3). Replacing GFA-FEO (which combines gait-frequency internal-model harmonics at $f_g/2$, $f_g$, and $2 f_g$ with $\alpha$-scheduled gains) with a standard FEO increases the docking RMSE by 29% (from 4.2 to 5.4 mm, $p = 7.0 \times 10^{-8}$) and reduces the success rate from 100% to 92%. The standard FEO cannot fully cancel the periodic gait disturbance nor compensate for the fold-angle-dependent control effectiveness, leading to a substantially higher variance ($\pm 2.0$ mm vs. $\pm 0.7$ mm) that reflects inconsistent performance across domain-randomised conditions. At $f_g = 2.8$ Hz, the gap persists (4.5 vs. 5.0 mm, SR $100\% \to 95\%$), confirming the benefit at higher gait frequencies where the periodic disturbance is more aggressive.
(iii) Benefit of adaptive $\lambda(t)$ (M1 vs. M4). Fixing $\lambda = 0.3$ (the time average of the adaptive gate) increases the docking RMSE by 33% (from 4.2 to 5.6 mm, $p = 5.1 \times 10^{-39}$) and the descent-phase RMSE by 26% (from 6.5 to 8.2 mm). Both methods maintain 100% success at $f_g \le 2.5$ Hz (M4 drops to 99% at $f_g = 2.8$ Hz); the primary difference lies in precision. The adaptive law concentrates RL authority during descent and docking, where it matters most, while attenuating RL influence during the approach phase, where the model-based controller already suffices.
(iv) PID baseline (M5). Despite PSO-optimised gains, the PID controller achieves a docking RMSE of 7.9 mm, 88% worse than M1 ($p = 2.9 \times 10^{-65}$), and a descent-phase RMSE of 11.9 mm (83% worse). Its success rate drops to 90% at $f_g = 2.0$ Hz and 87% at $f_g = 2.8$ Hz. Without observer-based disturbance compensation, the gait-induced oscillation propagates directly into the position loop, and the absence of gain scheduling produces fixed control authority that cannot adapt to the varying allocation matrix during arm folding. Notably, PID exhibits the lowest peak attitude excursion ($3.3^\circ$ vs. M1’s $5.0^\circ$) and lowest contact velocity (0.016 m/s vs. 0.070 m/s), reflecting its conservative gains; however, this conservatism comes at the cost of substantially worse tracking precision.
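For readers who want to reproduce this kind of comparison, the Welch statistic and the relative-increase figures can be computed with the Python standard library alone. The sketch below is illustrative: the function names are hypothetical, the sample data are made up, and the p-value is omitted because it requires the Student-t CDF, which the standard library does not provide.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = va / na + vb / nb
    t = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof

def relative_increase(baseline, value):
    """Fractional increase of value over baseline, e.g. 4.2 -> 7.2 mm."""
    return (value - baseline) / baseline

# Made-up per-seed RMSE samples (mm), for illustration only
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
rl_gap = relative_increase(4.2, 7.2)
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and also returns the p-value.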

5.2.2. Time-Series Analysis

Figure 15 presents the time histories of key signals for M1 during a representative run. The FSM phases $M_1$–$M_6$ are indicated by background shading. The GFA-FEO disturbance estimate $\hat d_z$ tracks the phase and frequency of the true gait-induced acceleration but attenuates the peak amplitude by roughly half. This is an intentional consequence of the leakage term $\sigma_{IM}$ in the internal-model adaptation law (32): setting $\sigma_{IM} > 0$ prevents the Fourier coefficients from diverging when the actual gait frequency deviates from the nominal value, at the cost of a steady-state amplitude bias.
Panel (a) confirms that the vertical error remains inside the ± 10 mm docking tolerance after capture, while panel (b) compares the UAV roll angle with the landing-surface roll and zooms into the descent interval. In panel (c), the GFA-FEO estimate d ^ z is overlaid with the scaled gait heave acceleration; the Pearson correlation r = 0.86 and amplitude ratio 0.45 show that the observer captures the dominant phase while intentionally attenuating the amplitude. The uncompensated residual—visible as the gap between the grey and red envelopes—is precisely the broadband disturbance that the RL residual δ u corrects, explaining why removing FiLM-SAC (M2) degrades tracking by 71%. Panel (d) shows that λ ( t ) peaks at ≈0.4 during M 3 , where fold-through causes the strongest gait coupling, and falls to near zero during M 5 after locking.
Off-diagonal coupling verification. As noted in Section 3.5, the off-diagonal terms of $g_\eta(\alpha)$ are treated as part of the lumped disturbance. At the most extreme fold angle $\alpha = 76.5^\circ$, the maximum off-diagonal-to-diagonal ratio is $\max_{i \ne j} |g_{\eta,ij}| / \min_i |g_{\eta,ii}| = 0.18$, confirming the approximation is valid (the GFA-FEO estimates this coupling as part of $d_\eta$).
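The coupling ratio is straightforward to compute for any square control-effectiveness matrix. In the sketch below the function name is hypothetical and the matrix entries are invented purely to exercise the formula; they are not the paper's $g_\eta$.

```python
def offdiag_ratio(g):
    """max_{i != j} |g[i][j]| divided by min_i |g[i][i]| for a square matrix."""
    n = len(g)
    max_off = max(abs(g[i][j]) for i in range(n) for j in range(n) if i != j)
    min_diag = min(abs(g[i][i]) for i in range(n))
    return max_off / min_diag

# Made-up 3x3 matrix, for illustration only
g_example = [
    [10.0, 0.9, 0.0],
    [0.5, 5.0, -0.2],
    [0.0, 0.3, 8.0],
]
ratio = offdiag_ratio(g_example)
```

A ratio well below one indicates that each axis is dominated by its own diagonal term, which is the condition under which lumping the cross-terms into the disturbance is reasonable.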

5.3. Experiment 2: Payload-Adaptive Takeoff

Experiment 2 evaluates the opposite transition: after docking, the UAV must take off while carrying the robot as payload. This test focuses on whether the hot-switch protocol correctly updates the observer and plant model after the sudden mass and inertia change.
Starting from the fully stowed and locked state ($M_6$: $\alpha = 76.5^\circ$, coupled mass $m_c = 5.00$ kg), the UAV receives a takeoff command and must climb to $z = 3.0$ m while unfolding arms ($\alpha$: $76.5^\circ \to 45^\circ$) and stabilising the coupled body. A lateral wind gust of 1–5 m/s is applied at a randomised time $t_{gust} \in [1, 3]$ s. Each method is evaluated over 100 independent runs with randomised wind direction and speed, ramp duration (2–4 s), unfold duration (3–5 s), mass perturbation ($\pm 10\%$), and sensor noise.
Four methods are compared (Table 7): M1 (Ours Full with hot-switch), M2 (w/o RL), a variant M1′ that omits the hot-switch protocol (the observer retains pre-locking parameters), and M5 (PID with hot-switch).

Takeoff Quantitative Results

Table 9 focuses on the transient response after the EPM lock and payload takeoff command. In addition to altitude tracking, it reports peak roll excursion and early-window position RMSE, both of which are sensitive to the observer and plant-model reset.
The main observations are as follows.
(i) FEO hot-switch is critical (M1 vs. M1′). Without hot-switching, the observer retains the pre-locking mass $m_u = 2.50$ kg in its internal model. The resulting plant-model mismatch ($m_c / m_u = 2.0$) causes the FEO to grossly overestimate the control effectiveness, leading to a position RMSE that almost triples (from 16.2 to 44.2 mm, $p = 2.3 \times 10^{-19}$) and peak roll excursions that increase by 68% (from $3.4^\circ$ to $5.7^\circ$). Figure 16c shows the initial 85 mm position error spike caused by the mass mismatch, followed by persistent oscillations of 15–30 mm.
(ii) CFNTSM + FEO vs. PID (M1 vs. M5). The PID controller, despite PSO-optimised gains, exhibits 12× higher altitude overshoot (5.1 vs. 0.4 cm) and 72% worse position RMSE (27.9 vs. 16.2 mm, $p = 1.9 \times 10^{-34}$). Panel (c) shows that M5 suffers large error transients after the wind gust, reflecting the lack of model-based disturbance compensation.
(iii) Role of RL during takeoff (M1 vs. M2). M2 (model-based controller only, δ u = 0 ) achieves slightly lower RMSE than M1 (13.9 vs. 16.2 mm). This result is expected: the FiLM-SAC policy was trained exclusively for the landing phase (Exp. 1), and its residual actions—calibrated for descent/docking dynamics—are not optimised for the fundamentally different takeoff trajectory. The 17% overhead confirms that the residual RL is benign (not destabilising) even when operating outside its training distribution, while the model-based controller alone is sufficient for the less demanding takeoff task.
Figure 16 visualises the corresponding trajectory and disturbance-estimation behaviour.

5.4. Experiment 3: Full Mission Cycle

Experiment 3 combines the preceding phases into a continuous mission cycle. The trajectory passes through approach, folding, docking, payload takeoff, cruise, and final delivery to test error boundedness across phase transitions.

5.4.1. Scenario

The full mission unfolds over 40 s and traverses six phases. The primary evaluation uses M1 (Ours Full); a PSO-tuned PID baseline (M5) is overlaid in the timeline figure for visual comparison. Phase I (0–5 s) is a cruise approach at $z = 2.0$ m with arms deployed ($\alpha = 45^\circ$). In Phase II (5–8 s), a cosine-ramp descent brings the UAV from $z = 2.0$ m to $z_{dog} = 0.35$ m while the arms fold proportionally ($45^\circ \to 76.5^\circ$). Phase III (8–11 s) is the docking hold: the EPM lock triggers once $\|e_p\| < 15$ mm, $\|e_\eta\| < 5^\circ$, and $|\dot z - \dot z_{gait}| < 0.10$ m/s have been sustained for 0.3 s, at which point the hot-switch protocol (Section 3.9) fires. Phase IV (11–16 s) is takeoff with the coupled body via a cosine ramp to $z_{del} = 3.0$ m while the arms unfold. Phases V–VI (16–40 s) comprise the cruise to the delivery altitude and station-keeping. Domain randomisation matches Experiments 1–2: $m \sim U(0.9, 1.1)\,m_{nom}$, $f_g \sim U(2.0, 2.8)$ Hz, and wind speed $\sim U(0, 5)$ m/s with random direction. The RL residual is gated: active in Phases II–IV (descent, dock, and takeoff) and disabled in Phases I, V, and VI (cruise and station-keep). Each of 100 seeds produces one complete 40 s trajectory.
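The altitude reference and the lock trigger of the scenario can be sketched as follows. This is a minimal illustration under assumptions: the half-cosine blend is one common reading of "cosine ramp" (the exact profile is not given in the text), and the class and function names are hypothetical.

```python
import math

def cosine_ramp(t, t0, t1, z0, z1):
    """Smooth reference from z0 at t0 to z1 at t1 (ASSUMED half-cosine blend)."""
    if t <= t0:
        return z0
    if t >= t1:
        return z1
    blend = 0.5 * (1.0 - math.cos(math.pi * (t - t0) / (t1 - t0)))
    return z0 + (z1 - z0) * blend

class LockDwell:
    """Fires once all docking conditions have held continuously for `hold` seconds."""

    def __init__(self, hold=0.3):
        self.hold = hold
        self.elapsed = 0.0

    def update(self, dt, pos_err_mm, att_err_deg, rel_vz):
        ok = pos_err_mm < 15.0 and att_err_deg < 5.0 and abs(rel_vz) < 0.10
        self.elapsed = self.elapsed + dt if ok else 0.0  # any violation resets the timer
        return self.elapsed >= self.hold - 1e-9          # tolerance for float summation

# Phase II reference halfway through the descent (t = 6.5 s)
z_mid = cosine_ramp(6.5, 5.0, 8.0, 2.0, 0.35)
```

The dwell timer resets on any violated condition, so a brief excursion outside the thresholds restarts the 0.3 s count rather than pausing it.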

5.4.2. Mission Timeline

Figure 17 shows a representative mission. The system achieves a 100% docking success rate and 100% mission completion rate across all 100 runs. Docking lock-on occurs at t = 8.75 ± 0.20 s (i.e., 0.75 s into Phase III), confirming rapid convergence to the surface.
Panel (a) shows altitude tracking: the actual trajectory follows the commanded cosine-ramp descent to the dog surface, docking hold, and ascent to the delivery altitude. Panel (b) reports the fold-angle trajectory, where $\alpha(t)$ completes the full cycle $45^\circ \to 76.5^\circ \to 45^\circ$. Panel (c) compares the position-error norm of M1 with the PID baseline; M1 peaks during descent fold-through and then settles below 10 mm in cruise, whereas PID remains substantially less accurate throughout the mission. Panel (d) overlays the FEO vertical disturbance estimate with the scaled gait heave acceleration, showing that the observer tracks the periodic gait component across the phase sequence, including after the mass-doubling transition; the PID baseline has no corresponding disturbance observer.
Table 10 reports position and attitude RMSE by phase for both M1 (Ours Full) and M5 (PID), each evaluated over 100 random seeds. These phase-wise metrics provide the numerical basis for interpreting where the full mission is most demanding and where the proposed controller gains the largest advantage over PID.
For M1, the most demanding interval is Phase II (descent), with position RMSE 16.90 ± 2.55 mm and attitude RMSE 2.51 ± 0.68 ° , driven by the simultaneous altitude ramp and arm folding through the minimum- σ 2 region. This phase combines the least favourable actuator geometry with the strongest gait-induced deck motion, so it is the point at which the gain scheduler and disturbance observer are most heavily exercised. Phase IV (takeoff) produces the second-largest transient ( 11.78 ± 0.32 mm) because the EPM-locked system undergoes a sudden mass-doubling step and FEO re-initialisation before the observer states settle to the coupled-body dynamics. After the transition settles, Phase V (cruise) and Phase VI (station-keep) exhibit the tightest tracking: 4.61 ± 0.36 mm and 4.22 ± 0.17 mm, respectively, confirming that the observer and gain scheduler fully adapt to the coupled-body dynamics.
The peak instantaneous position error across the entire mission is 27.43 ± 3.63 mm, occurring during the descent-to-dock transition, and it remains below the tolerance margin required for magnetic capture.
By contrast, the PID baseline (M5) exhibits a mission-wide peak error of 62.51 ± 20.34 mm— 2.3 × larger than M1—with the worst degradation in Phase II ( 37.73 ± 12.49 mm vs. 16.90 mm for M1) and Phase III ( 17.27 ± 2.99 mm vs. 5.06 mm). The largest gap, therefore, occurs exactly where the system requires fold-aware allocation, gait-frequency disturbance rejection, and residual correction simultaneously. These results confirm that the CFNTSM + GFA-FEO architecture with RL residual provides substantially better disturbance rejection than a tuned PID controller across the full mission (Figure 17c,d).

5.4.3. 6-DoF Error Panorama

Figure 18 decomposes the tracking error into six channels over the full 40 s. The most informative observations are as follows. During descent fold-through ($t \approx 5$–8 s), $e_z$ peaks at ∼20 mm as $\alpha$ traverses the minimum-$\sigma_2$ region; the adaptive gain scheduler (40) maintains control authority. At the mass transition ($t \approx 8$ s), the FEO reset produces a brief $e_x$ spike (∼10 mm) that decays within 0.5 s.
During post-coupling takeoff ($t \approx 11$ s), $e_z$ shows a smooth ∼15 mm undershoot during the cosine-ramp ascent with doubled mass, fully absorbed by the CFNTSM. Yaw coupling remains small ($|e_\psi| < 1^\circ$ during mass transitions, $< 0.1^\circ$ in cruise), confirming that four-rotor yaw authority is preserved even at $\alpha_{fold}$. In all critical intervals, the errors remain well within the docking tolerance or recover within 0.5 s, confirming that the gain scheduling, hot-switch, and RL residual mechanisms work in concert to maintain tracking across the full flight envelope.
The temporal error distribution also explains why the full-stack controller outperforms the PID baseline in the mission-level statistics. The largest excursions coincide with known physical events—fold-through, magnetic locking, payload coupling, and takeoff mass doubling—rather than with slow drift or accumulated observer bias. This indicates that the remaining error is dominated by short, bounded transients at mode transitions, while the controller recovers to millimetre-level tracking during the quasi-steady cruise and station-keeping phases.
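Recovery claims such as "decays within 0.5 s" can be checked numerically on a logged error channel. The sketch below is one possible definition of settling time, assuming a fixed tolerance band and a finite inspection window after the event; the band, the window, and the synthetic exponential spike are hypothetical choices, not values from the paper.

```python
import math


def settling_time(times, errors, t_event, band, window):
    """Time after t_event at which |error| enters `band` and stays inside it
    until t_event + window. Returns None if the trace never settles."""
    seg = [(t, abs(e)) for t, e in zip(times, errors)
           if t_event <= t <= t_event + window]
    if not seg:
        return None
    last_outside = None
    for t, e in seg:
        if e > band:
            last_outside = t
    if last_outside is None:
        return 0.0  # already inside the band at the event
    for t, _ in seg:
        if t > last_outside:
            return t - t_event  # first sample after the last excursion
    return None  # still outside the band at the end of the window


# Synthetic 10 mm spike at t = 8 s with a 0.1 s time constant (illustration).
dt = 0.01
times = [8.0 + k * dt for k in range(101)]
errors = [10.0 * math.exp(-(k * dt) / 0.1) for k in range(101)]
t_settle = settling_time(times, errors, t_event=8.0, band=1.0, window=1.0)
```

With this definition a transient counts as settled only from the last time it leaves the band, so a trace that re-enters and exits again is not reported as recovered early.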

5.5. Summary of Results

Table 11 consolidates the key findings across all three experiments and links each result to the control module that it isolates. Experiment 1 explains the landing-side accuracy gains by separating the contributions of the RL residual, gait-frequency-aware observer, and adaptive trust weight. Experiment 2 tests the payload transition and shows that the hot-switch protocol is needed when the coupled mass and inertia change abruptly. Experiment 3 then verifies that these improvements remain compatible over the full docking-to-delivery mission rather than only in isolated phases.
Taken together, the results show that no single layer accounts for the reported performance. The model-based CFNTSM + GFA-FEO backbone supplies bounded tracking and periodic disturbance rejection, the fold-aware scheduling preserves authority through changing geometry, and the FiLM-SAC residual improves precision in the nonlinear descent and docking regime. The table, therefore, serves as a compact cross-check between the ablation evidence, the payload-takeoff validation, and the complete-mission benchmark.

6. Discussion

The three experiment groups indicate that reliable docking on a walking platform depends on the coordinated action of all control layers rather than on any single module. The model-based CFNTSM + GFA-FEO backbone rejects the dominant gait-periodic disturbance and adapts the control gains to the fold-dependent allocation matrix $B(\alpha)$. The FiLM-SAC residual then compensates for unmodelled aerodynamic coupling between the folding arms and rotor wake, which lies outside the disturbance class represented by the linear internal model, while the adaptive trust weight $\lambda(t)$ concentrates learned authority during descent and docking. The ablation results in Table 11 therefore support the intended division of labour: observer-based periodic rejection, fold-aware finite-time control, and bounded learned residual compensation each provide a statistically significant reduction in tracking error.
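The division of labour described above amounts to adding a bounded, trust-weighted learned term to the model-based command. The following is a minimal sketch of that blending step, assuming a per-channel saturation on the residual and a scalar trust weight in [0, 1]; the function name, the saturation form, and the numbers are hypothetical, and the schedule that produces $\lambda(t)$ is not reproduced here.

```python
def blend_control(u_model, u_residual, trust, residual_limit):
    """Combine a model-based command with a bounded learned residual.

    u_model        : per-channel command from the model-based backbone
    u_residual     : per-channel raw output of the learned policy
    trust          : scalar trust weight, clipped to [0, 1]
    residual_limit : symmetric bound applied to each residual channel
    """
    lam = min(max(trust, 0.0), 1.0)
    blended = []
    for um, ur in zip(u_model, u_residual):
        ur_sat = max(-residual_limit, min(residual_limit, ur))
        blended.append(um + lam * ur_sat)
    return blended


# Illustrative values only.
u = blend_control([1.0, -2.0], [0.5, 3.0], trust=0.5, residual_limit=1.0)
u_no_rl = blend_control([1.0, -2.0], [0.5, 3.0], trust=0.0, residual_limit=1.0)
```

Setting the trust weight to zero recovers the model-based command exactly, which is the property that keeps the learned term from dominating outside the phases where it is intended to act.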
The hot-switch protocol is particularly important during the transition from docking to payload-carrying takeoff. Without updating the plant model at magnetic locking (M1′), the twofold mass mismatch produces a 173% increase in takeoff RMSE (Table 11). Under domain randomisation, the observer reset, boosted bandwidth, bounded residual authority, and fallback climb/hold condition collectively prevent the blind window from becoming an open-loop interval. The residual policy was trained only for landing and still introduces only a modest 17% overhead in the takeoff comparison, suggesting that the learned component remains well bounded outside its primary training distribution; nevertheless, including takeoff transients in future residual-policy training should further reduce this margin.
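The hot-switch steps named in this paragraph (plant-model update, observer reset, and temporarily boosted bandwidth) can be sketched as a small state update. This is an illustration under stated assumptions: the class and function names are hypothetical, the disturbance estimate is assumed to be reset to zero, the boost factor of 2 mirrors the enhanced-gain mode quoted for Figure 9, and the 1 s boost window is an arbitrary placeholder. The bounded residual authority and the fallback climb/hold condition are not modelled.

```python
from dataclasses import dataclass


@dataclass
class ObserverState:
    mass: float         # plant mass used by the model (kg)
    bandwidth: float    # observer bandwidth currently applied
    d_hat: float        # current disturbance estimate
    boost_until: float  # time at which the boosted bandwidth expires (s)


def hot_switch(t_lock, coupled_mass, nominal_bandwidth,
               boost_factor=2.0, boost_window=1.0):
    """Return the observer state to use right after magnetic locking.

    Assumptions: the estimate is re-initialised to zero and the bandwidth is
    multiplied by `boost_factor` for `boost_window` seconds.
    """
    return ObserverState(
        mass=coupled_mass,
        bandwidth=boost_factor * nominal_bandwidth,
        d_hat=0.0,
        boost_until=t_lock + boost_window,
    )


def bandwidth_at(state, t, nominal_bandwidth):
    """Bandwidth in effect at time t: boosted inside the window, else nominal."""
    return state.bandwidth if t < state.boost_until else nominal_bandwidth


# Illustrative numbers: 2.50 kg vehicle, roughly doubled after coupling.
state = hot_switch(t_lock=8.0, coupled_mass=5.0, nominal_bandwidth=20.0)
```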
The full-mission experiment demonstrates completion of a 40 s, six-phase docking-to-delivery cycle in simulation. Across 100 randomised seeds, both the docking and mission success rates remain at 100%, indicating that the FSM transitions, gain scheduler, FEO re-initialisation, and RL gating operate coherently over the complete sequence. The largest position error (27.4 ± 3.6 mm) occurs during descent fold-through, where $\sigma_2(B)$ reaches its minimum and the vehicle is in its most mechanically constrained configuration. Even in this interval, the transient remains within the EPM capture tolerance and decays rapidly after the mode transition.
Several limitations should be noted. Although a full-scale hardware prototype has been assembled (Section 2.6), the present results are derived from numerical simulation with domain randomisation and synthetic sensor noise. Flight experiments on the physical Jetson Orin Nano/PX4 platform therefore represent the immediate next step. On the perception side, the study assumes sub-centimetre relative-pose accuracy, whereas outdoor deployment will require real-time fusion of RealSense D415 visual/depth updates, Unitree L2 LiDAR registration, and inertial measurements in the onboard estimator. On the mechanical side, the crank-rocker mechanism satisfies the Grashof condition with a geometric margin of only 0.3 mm. Hardware validation must therefore verify repeatability under tolerance stack-up, servo heating, gear backlash, and repeated high-frequency folding. The design mitigates these risks through joint clearance, mechanical end-stops, a slew-limited 0.6 s fold command, and current/fold-angle consistency checks for jam detection, but these provisions still require bench endurance testing before aggressive in-flight morphing is attempted. Finally, the FiLM-SAC policy was trained at a single gait frequency and evaluated over $f_g \in [2.0, 2.8]$ Hz. Extending validation to broader gait speeds and different robot morphologies, and evaluating multi-joint folding architectures that preserve greater control authority at extreme fold angles, are important directions for future investigation.

7. Conclusions

This work presents a hierarchical control framework designed for autonomous docking of a foldable quadrotor on a trotting wheel-legged robot, followed by payload-adaptive takeoff. The framework integrates an $\alpha$-scheduled CFNTSM controller, a gait-frequency-aware finite-time extended observer (GFA-FEO), and a FiLM-SAC residual RL policy with an adaptive trust weight $\lambda(t)$.
The simulation results support three main conclusions. First, the fold-aware CFNTSM + GFA-FEO backbone maintains the vehicle inside the EPM capture window despite gait-induced surface motion and fold-angle authority loss, while the FiLM-SAC residual further reduces docking RMSE from 7.2 ± 0.5 mm to 4.2 ± 0.7 mm at $f_g = 2.5$ Hz. Second, adaptive trust weighting improves precision relative to a fixed residual weight, reducing docking RMSE by 25%. Third, after EPM locking, the FEO hot-switch protocol is necessary for payload-adaptive takeoff: it reduces the early takeoff RMSE by 63% after the coupled mass approximately doubles. In the complete 40 s simulated mission, the full system achieves 100% success over 100 randomised seeds, with the largest transient position error (27.4 ± 3.6 mm) occurring during the mechanically constrained descent fold-through phase.
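The percentages quoted in the Discussion and Conclusions can be cross-checked with short arithmetic. The lines below use only figures stated in the text; the symbol $r$ for the ratio of takeoff RMSE without and with the hot-switch is introduced here for illustration, and the ± spreads are ignored.

```latex
\begin{align}
r &= 1 + 1.73 = 2.73,
  & 1 - \frac{1}{r} &\approx 0.634 \quad \text{(the quoted 63\% reduction)},\\
\frac{7.2 - 4.2}{7.2} &\approx 0.417
  & &\text{(about 42\% lower mean docking RMSE with the residual)}.
\end{align}
```

The first line shows that a 173% increase without the hot-switch and a 63% reduction with it are two statements of the same ratio.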
These conclusions are supported within the simulated and domain-randomised operating envelope considered in this study. Because hardware flight tests have not yet been conducted, these results should be regarded as simulation-level evidence for the proposed mechanism–control concept rather than final field-deployment validation. A hardware prototype has been assembled (Section 2.6); future efforts will target sim-to-real transfer validation, onboard visual-inertial state estimation, and multi-phase reinforcement-learning policy refinement.

Author Contributions

Conceptualization, Q.G. and Z.S.; methodology, Q.G.; software, Q.G.; validation, Q.G.; formal analysis, Q.G.; investigation, Q.G.; writing—original draft preparation, Q.G.; writing—review and editing, Z.S.; supervision, Z.S.; project administration, Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The simulation code and data will be made available upon acceptance.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pretto, A.; Aravecchia, S.; Burgard, W.; Chebrolu, N.; Dornhege, C.; Falck, T.; Fleckenstein, F.; Fontenla, A.; Imperoli, M.; Khanna, R.; et al. Building an Aerial-Ground Robotics System for Precision Farming: An Adaptable Solution. IEEE Robot. Autom. Mag. 2021, 28, 29–49. [Google Scholar] [CrossRef]
  2. Zhou, Y.; Quang, L.; Nieto-Granda, C.; Loianno, G. CoPeD—Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments. IEEE Robot. Autom. Lett. 2024, 9, 6416–6423. [Google Scholar] [CrossRef]
  3. Wang, J.; Guan, X.; Sun, Z.; Shen, T.; Huang, D.; Liu, F.; Cui, H. OMEGA: Efficient Occlusion-Aware Navigation for Air-Ground Robots in Dynamic Environments via State Space Model. arXiv 2024, arXiv:2408.10618. [Google Scholar] [CrossRef]
  4. Xu, H.; Wang, C.; Bo, Y.; Jiang, C.; Liu, Y.; Yang, S.; Lai, W. An Aerial and Ground Multi-Agent Cooperative Location Framework in GNSS-Challenged Environments. Remote Sens. 2022, 14, 5055. [Google Scholar] [CrossRef]
  5. Lu, W.; Bin, D.; Ma, L.; Ma, M.; Ma, Z.; Chen, X.; Wang, L.; Feng, Y.; Jiang, Z.; Shi, Y.; et al. Semi-Distributed Cross-Modal Air-Ground Relative Localization. arXiv 2025, arXiv:2511.06749. [Google Scholar]
  6. Shi, Y.; Hua, Y.; Yu, J.; Dong, X.; Lü, J.; Ren, Z. Cooperative Fault-Tolerant Formation Tracking Control for Heterogeneous Air–Ground Systems Using a Learning-Based Method. IEEE Trans. Aerosp. Electron. Syst. 2024, 60, 1742–1755. [Google Scholar] [CrossRef]
  7. Yang, J.-X.; Xu, Y.; Wu, Z.-G.; Li, Y. Distributed Estimation and Data-Driven Formation Control of Air–Ground Systems via Efficient Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 16622–16633. [Google Scholar] [CrossRef]
  8. Li, Z.; Mao, R.; Chen, N.; Xu, C.; Gao, F.; Cao, Y. ColAG: A Collaborative Air–Ground Framework for Perception-Limited UGVs’ Navigation. In Proceedings of the 2024 IEEE International Conference on Robotics and Automation (ICRA), Yokohama, Japan, 13–17 May 2024. [Google Scholar]
  9. Baca, T.; Hert, D.; Loianno, G.; Saska, M.; Kumar, V. Model Predictive Trajectory Tracking and Collision Avoidance for Reliable Outdoor Deployment of Unmanned Aerial Vehicles. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 6753–6760. [Google Scholar]
  10. Keipour, A.; Pereira, G.A.S.; Bonatti, R.; Garg, R.; Rastogi, P.; Dubey, G.; Scherer, S. Visual Servoing Approach to Autonomous UAV Landing on a Moving Vehicle. Sensors 2022, 22, 6549. [Google Scholar] [CrossRef]
  11. Chang, C.-W.; Lo, L.-Y.; Cheung, H.C.; Feng, Y.; Yang, A.-S.; Wen, C.-Y.; Zhou, W. Proactive Guidance for Accurate UAV Landing on a Dynamic Platform: A Visual–Inertial Approach. Sensors 2022, 22, 404. [Google Scholar] [CrossRef]
  12. Huang, Y.; Zhu, M.; Zheng, Z.; Low, K.H. Linear Velocity-Free Visual Servoing Control for Unmanned Helicopter Landing on a Ship With Visibility Constraint. IEEE Trans. Syst. Man Cybern. Syst. 2021, 52, 2979–2993. [Google Scholar] [CrossRef]
  13. Ollero, A.; Tognon, M.; Suarez, A.; Lee, D.; Franchi, A. Past, Present, and Future of Aerial Robotic Manipulators. IEEE Trans. Robot. 2022, 38, 626–645. [Google Scholar] [CrossRef]
  14. Meng, J.; Buzzatto, J.; Liu, Y.; Liarokapis, M. On Aerial Robots With Grasping and Perching Capabilities: A Comprehensive Review. Front. Robot. AI 2022, 8, 739173. [Google Scholar] [CrossRef]
  15. Zheng, P.; Xiao, F.; Nguyen, P.H.; Farinha, A.; Kovac, M. Metamorphic Aerial Robot Capable of Mid-Air Shape Morphing for Rapid Perching. Sci. Rep. 2023, 13, 1297. [Google Scholar] [CrossRef]
  16. Wüest, V.; Jeger, S.; Feroskhan, M.; Ajanic, E.; Bergonti, F.; Floreano, D. Agile Perching Maneuvers in Birds and Morphing-Wing Drones. Nat. Commun. 2024, 15, 8330. [Google Scholar] [CrossRef]
  17. Li, C.-X.; Wu, H.-N.; Yang, T. Coordinated Control of Flight and Morphing for Morphing Quadrotor via Reinforcement Learning. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 12755–12766. [Google Scholar] [CrossRef]
  18. Idrissi, M.; Salami, M.; Annaz, F. A Review of Quadrotor Unmanned Aerial Vehicles: Applications, Architectural Design and Control Algorithms. J. Intell. Robot. Syst. 2022, 104, 22. [Google Scholar] [CrossRef]
  19. Singh, K.; Mehndiratta, M.; Feroskhan, M. QuadPlus: Design, Modeling, and Receding Horizon-Based Control of a Hyperdynamic Quadrotor. IEEE Trans. Aerosp. Electron. Syst. 2021, 58, 1766–1779. [Google Scholar] [CrossRef]
  20. Lou, H.; Wu, Q.; Wang, H.; Li, M.; Liu, H.; Sun, N. Structure, Modeling, and Control of Morphing Quadrotors: A Review. Int. J. Precis. Eng. Manuf. 2026, 27, 825–842. [Google Scholar] [CrossRef]
  21. Falanga, D.; Kleber, K.; Mintchev, S.; Floreano, D.; Scaramuzza, D. The Foldable Drone: A Morphing Quadrotor that can Squeeze and Fly. IEEE Robot. Autom. Lett. 2019, 4, 209–216. [Google Scholar] [CrossRef]
  22. Tuna, T.; Ovur, S.E.; Gokbel, E.; Kumbasar, T. Design and Development of FOLLY: A Self-Foldable and Self-Deployable Quadcopter. Aerosp. Sci. Technol. 2020, 100, 105807. [Google Scholar] [CrossRef]
  23. Yang, T.; Wu, H.-N.; Wang, J.-W. cc-DRL: A Convex Combined Deep Reinforcement Learning Flight Control Design for a Morphing Quadrotor. arXiv 2024, arXiv:2408.13054. [Google Scholar] [CrossRef]
  24. Hu, D.; Pei, Z.; Shi, J.; Tang, Z. Design, Modeling and Control of a Novel Morphing Quadrotor. IEEE Robot. Autom. Lett. 2021, 6, 8013–8020. [Google Scholar] [CrossRef]
  25. Cao, C.; Li, F.; Ding, R.; Huang, T.; Yang, C.; Gui, W. Intelligent Attitude Control for Morphing Flight Vehicle: A Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2025, 74, 8851–8865. [Google Scholar] [CrossRef]
  26. Bauersfeld, L.; Kaufmann, E.; Scaramuzza, D. User-Conditioned Neural Control Policies for Mobile Robotics. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 1342–1348. [Google Scholar]
  27. Zhang, R.; Zhang, D.; Mueller, M.W. ProxFly: Robust Control for Close Proximity Quadcopter Flight via Residual Reinforcement Learning. In Proceedings of the 2025 IEEE International Conference on Robotics and Automation (ICRA), Atlanta, GA, USA, 19–23 May 2025. [Google Scholar]
  28. Ullah, M.; Gao, H.; Nasir, A.; Wang, Y.; Wang, C. Adaptive-Neural Finite-Time Sliding Mode Control for Quadrotor Helicopter Attitude Stabilization in Complex Environments. IEEE Trans. Aerosp. Electron. Syst. 2025, 61, 1175–1185. [Google Scholar] [CrossRef]
  29. Wang, S.; Hao, W.; Ma, W.; Mei, T. Reinforcement Learning Based Optimized Sliding Mode Attitude Control Strategy for Quadrotor Against Unknown Time-Varying Disturbances. Eng. Res. Express 2026, in press. [Google Scholar] [CrossRef]
  30. Mahmood, A.A.; García, F.; Al-Kaff, A. A Novel Design of a Sliding Mode Controller Based on Modified ERL for Enhanced Quadcopter Trajectory Tracking. Drones 2025, 9, 737. [Google Scholar] [CrossRef]
  31. Foehn, P.; Kaufmann, E.; Romero, A.; Penicka, R.; Sun, S.; Bauersfeld, L.; Laengle, T.; Cioffi, G.; Song, Y.; Loquercio, A.; et al. Agilicious: Open-Source and Open-Hardware Agile Quadrotor for Vision-Based Flight. Sci. Robot. 2022, 7, eabl6259. [Google Scholar] [CrossRef]
  32. Kaufmann, E.; Bauersfeld, L.; Loquercio, A.; Müller, M.; Koltun, V.; Scaramuzza, D. Champion-Level Drone Racing Using Deep Reinforcement Learning. Nature 2023, 620, 982–987. [Google Scholar] [CrossRef]
  33. Dimmig, C.A.; Silano, G.; McGuire, K.; Gabellieri, C.; Hönig, W.; Moore, J.; Kobilarov, M. Survey of Simulators for Aerial Robots: An Overview and In-Depth Systematic Comparisons. IEEE Robot. Autom. Mag. 2024, 32, 153–166. [Google Scholar] [CrossRef]
  34. Staessens, T.; Lefebvre, T.; Crevecoeur, G. Adaptive Control of a Mechatronic System Using Constrained Residual Reinforcement Learning. IEEE Trans. Ind. Electron. 2022, 69, 10447–10456. [Google Scholar] [CrossRef]
  35. Ishihara, Y.; Hazama, Y.; Suzuki, K.; Yokono, J.J.; Sabe, K.; Kawamoto, K. Improving Wind Resistance Performance of Cascaded PID Controlled Quadcopters Using Residual Reinforcement Learning. arXiv 2023, arXiv:2308.01648. [Google Scholar] [CrossRef]
  36. Wen, S.; Shu, Y.; Rad, A.; Wen, Z.; Guo, Z.; Gong, S. A Deep Residual Reinforcement Learning Algorithm Based on Soft Actor-Critic for Autonomous Navigation. Expert Syst. Appl. 2025, 259, 125238. [Google Scholar] [CrossRef]
  37. Hua, H.; Fang, Y. A Novel Reinforcement Learning-Based Robust Control Strategy for a Quadrotor. IEEE Trans. Ind. Electron. 2023, 70, 2812–2821. [Google Scholar] [CrossRef]
  38. Yu, C.; Rosendo, A. Multi-Modal Legged Locomotion Framework with Automated Residual Reinforcement Learning. IEEE Robot. Autom. Lett. 2022, 7, 10312–10319. [Google Scholar] [CrossRef]
  39. McN. Alexander, R. Mechanics of bipedalism. In Scale Effects in Animal Locomotion; Academic Press: London, UK, 1977; pp. 93–110. [Google Scholar]
  40. Qin, T.; Pan, J.; Cao, S.; Shen, S. A General Optimization-Based Framework for Local Odometry Estimation With Multiple Sensors. arXiv 2019, arXiv:1901.03638. [Google Scholar] [CrossRef]
  41. Xu, W.; Zhang, F. FAST-LIO: A Fast, Robust LiDAR-Inertial Odometry Package by Tightly-Coupled Iterated Kalman Filter. IEEE Robot. Autom. Lett. 2021, 6, 3317–3324. [Google Scholar] [CrossRef]
  42. Shan, T.; Englot, B.; Meyers, D.; Wang, W.; Ratti, C.; Rus, D. LIO-SAM: Tightly-Coupled Lidar Inertial Odometry via Smoothing and Mapping. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 5135–5142. [Google Scholar]
  43. Du, H.; Qian, C.; Yang, S.; Li, S. Recursive Design of Finite-Time Convergent Observers for a Class of Time-Varying Nonlinear Systems. Automatica 2013, 49, 601–609. [Google Scholar] [CrossRef]
  44. Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: New York, NY, USA, 2018; Volume 80, pp. 1861–1870. [Google Scholar]
  45. Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the IEEE International Conference on Neural Networks (ICNN), Perth, Australia, 27 November–1 December 1995; pp. 1942–1948. [Google Scholar]
Figure 1. Representative application scenarios. (a) Factory inspection. (b) Geographic survey and mapping. (c) Aerial retrieval.
Figure 2. Comparison of the prior and proposed folding mechanisms. (a) Prior dual-servo design folding into a 150 × 590 mm elongated strip. (b) Proposed single-servo crank-rocker design. (c) Engineering drawing with key dimensions.
Figure 3. Coordinate frames and spatial relationships used in the dynamic model. $\{W\}$: Earth-fixed ENU world frame (blue); $\{B\}$: UAV body frame (red); $\{D\}$: robot-dog back-surface frame (green). The position vector $p$, rotation $R(\eta)$, gravity, and thrust vectors are annotated.
Figure 4. Fold transition key frames from deployed ($\alpha = 45^\circ$) to fully folded ($\alpha = 76.5^\circ$). Blue: body bounding box (BBox); red dashed: propeller-tip BBox.
Figure 5. Fold-angle dependence of control moment arms. (a,b) Roll and pitch moment arms per motor. (c) $\sigma_2(B(\alpha))$, dropping ∼48% from deployed to fully folded.
Figure 6. Hardware prototype collage and annotated assembled configuration of the air–ground cooperative robot prototype.
Figure 7. Perception-to-control pipeline used by the hardware concept. Depth-camera, LiDAR, dog-IMU, and UAV-inertial measurements are first processed by separate visual–inertial, LiDAR–inertial, and landing-plate registration modules, then fused into a relative-pose EKF. The FSM uses both the pose estimate and covariance to gate docking/takeoff phases and passes the resulting state to GFA-FEO, CFNTSM, FiLM-SAC, and the fold-aware mixer.
Figure 8. Block diagram of the three-layer hierarchical control architecture. The FSM selects phase-dependent gains ($\Theta_i$) for both the position and attitude loops. In each loop, GFA-FEO provides feedforward disturbance compensation, CFNTSM provides robust sliding-mode feedback, and FiLM-SAC adds a learned residual correction modulated by $\lambda(t)$. The fold-aware mixer inverts $B(\alpha)$ in real time.
Figure 9. Frequency-domain comparison of the disturbance estimation error $|\tilde{d}/d|$ (dB). Blue: standard FEO ($L = 20$, no internal model). Red: proposed GFA-FEO ($L = 20$) with internal-model harmonics at $H = \{f_g/2, f_g, 2f_g\}$. Green dashed: GFA-FEO in enhanced-gain mode ($L = 40$, $\kappa_L = 2$). Orange dotted lines mark the three gait-harmonic frequencies.
Figure 10. Model-based performance ceiling analysis (noise-free evaluation). (a) Position RMSE under progressive addition of controller components showing diminishing returns and a noise-free plateau at 5.1 mm; the dashed red line marks the 10 mm docking tolerance (22) and the dotted purple line indicates the 7 mm with-noise baseline. (b) Per-axis RMSE decomposition at the model-based optimum, annotated with dominant bottleneck sources; the shaded region between the 5.1 mm plateau and the 10 mm tolerance represents the RL compensation margin.
Figure 11. Integrated FiLM-SAC residual-learning architecture. The actor maps the observation o and conditioning vector c to a bounded residual action through FiLM modulation; the environment applies this residual on top of the CFNTSM + GFA-FEO baseline under domain randomisation and returns transitions stored in replay buffer D . Twin critics share the encoder and are trained with SAC.
Figure 12. PSO convergence curve: global-best fitness $J_{\mathrm{PSO}}$ versus iteration (18-D, 50 particles, 12 DR × 4 seeds per evaluation). Dashed red line: hand-tuned baseline fitness (re-evaluated). The swarm reaches 99% of its total improvement within 30 iterations.
Figure 13. Monte Carlo cross-validation fitness landscape ($50 \times 50 = 2500$ conditions per configuration, logarithmic colour scale). Dark green indicates low fitness (good); dark red indicates high fitness (failure). Solid contour: $J_{\mathrm{PSO}} = 100$ success boundary. (a) Hand-tuned parameters ($\mathrm{SR}_{\mathrm{all}}$ = 37%, $\mathrm{SR}_{\mathrm{op}}$ = 59%); (b) PSO-optimised parameters ($\mathrm{SR}_{\mathrm{all}}$ = 59%, $\mathrm{SR}_{\mathrm{op}}$ = 97%). Dashed box: operational band $f_g \in [2, 3]$ Hz.
Figure 14. Experiment 1 representative trajectories in the $xz$-plane (dog-body frame). The panels show lateral ($y$) error versus altitude during descent and final attitude matching between $\phi_{\mathrm{UAV}}$ and $\phi_{\mathrm{land}}$.
Figure 15. Experiment 1 time series for M1 (Ours Full): vertical error, roll tracking, disturbance estimation, fold angle, and adaptive trust weight.
Figure 16. Experiment 2: payload-adaptive takeoff (representative run). (a) Altitude tracking error: M1′ (no hot-switch) reaches 8 cm due to the $2\times$ mass mismatch; M5 (PID) shows the largest overshoot. (b) FEO estimate $\hat{d}_z$: M1 (with hot-switch) maintains a bounded ±3 m/s² oscillation, whereas M1′ diverges to a 10 m/s² bias, demonstrating hot-switch necessity. (c) Smoothed 3-D position error for M1, M1′, and M5: M1′ peaks at 85 mm; M1 achieves the lowest steady-state error among the three methods.
Figure 17. Experiment 3: full mission cycle for a representative run. (a) Altitude tracking; (b) fold-angle cycle; (c) position-error norm; (d) vertical disturbance estimate. Background bands denote the six mission phases.
Figure 18. 6-DoF tracking error over the full 40 s mission. (a) Position errors $e_x$, $e_y$, $e_z$ (mm); (b) attitude errors $e_\phi$, $e_\theta$, $e_\psi$ (°). Background bands mark the six mission phases. The descent fold-through ($t \approx 5$–8 s) and takeoff mass transition ($t \approx 11$ s) produce the largest transients, both of which settle within 0.5 s.
Table 1. Quadrotor dynamic-model parameters.
Parameter | Symbol | Value
Total mass | $m$ | 2.50 kg
Arm assembly mass (each) | $m_a$ | 0.18 kg
Arm length (pivot to motor) | $L_a$ | 116 mm
Propeller radius | $R_p$ | 65 mm
Thrust coefficient | $k_f$ | $1.5 \times 10^{-5}$ N·s²
Torque coefficient | $k_\tau$ | $2.4 \times 10^{-7}$ N·m·s²
Torque-to-thrust ratio | $c_\tau$ | 0.016 m
Rotor inertia | $J_r$ | $3.8 \times 10^{-5}$ kg·m²
Aerodynamic drag coeff. | $k_d$ | $2.5 \times 10^{-3}$ N·m·s
Roll/pitch inertia (deployed) | $J_{xx}^{\mathrm{dep}}$ | $1.2 \times 10^{-2}$ kg·m²
Roll/pitch inertia (folded) | $J_{xx}^{\mathrm{fold}}$ | $5.8 \times 10^{-3}$ kg·m²
Yaw inertia (deployed) | $J_{zz}^{\mathrm{dep}}$ | $2.1 \times 10^{-2}$ kg·m²
Yaw inertia (folded) | $J_{zz}^{\mathrm{fold}}$ | $1.0 \times 10^{-2}$ kg·m²
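Two entries of Table 1 can be cross-checked against each other, and the thrust coefficient gives a rough hover operating point. The sketch below reads the thrust and torque coefficients with negative exponents ($10^{-5}$ and $10^{-7}$), which is the reading consistent with the tabulated ratio of 0.016 m. It also assumes the common quadratic rotor model $T = k_f \omega^2$ with $\omega$ in rad/s, equal thrust on four rotors, and $g = 9.81$ m/s²; these modelling assumptions are not stated in this section.

```python
import math

K_F = 1.5e-5    # thrust coefficient (N*s^2), Table 1
K_TAU = 2.4e-7  # torque coefficient (N*m*s^2), Table 1
MASS = 2.50     # total mass (kg), Table 1
G = 9.81        # gravitational acceleration (m/s^2), assumed


def torque_to_thrust_ratio(k_tau, k_f):
    """Ratio c_tau = k_tau / k_f, in metres."""
    return k_tau / k_f


def hover_rotor_speed(mass, k_f, n_rotors=4, g=G):
    """Rotor speed (rad/s) at which n_rotors * k_f * w^2 balances the weight."""
    return math.sqrt(mass * g / (n_rotors * k_f))


c_tau = torque_to_thrust_ratio(K_TAU, K_F)
w_hover = hover_rotor_speed(MASS, K_F)
w_hover_coupled = hover_rotor_speed(2.0 * MASS, K_F)
```

Under these assumptions the hover speed is roughly 639 rad/s for the vehicle alone, and doubling the mass raises it by a factor of $\sqrt{2}$, which gives a feel for the thrust step the controller faces after coupling.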
Table 2. Mission-phase FSM: phases and transition conditions.
ID | Phase | Entry Condition
$M_1$ | approach | Mission start; UAV airborne, $\lVert p - p_{\mathrm{dog}} \rVert > 2$ m
$M_2$ | align | $\lVert p_{xy} - p_{\mathrm{land},xy} \rVert < 0.5$ m and $\lvert z - z_{\mathrm{land}} \rvert < 1.0$ m
$M_3$ | descend | Aligned: $\lVert \Delta p_{xy} \rVert < 50$ mm; arms begin folding ($\dot{\alpha} < 0$)
$M_4$ | dock | Docking window: $\lVert \Delta p_{xy} \rVert < 30$ mm, $\lvert \Delta\phi \rvert, \lvert \Delta\theta \rvert < 3^\circ$
$M_5$ | lock | EPM activated; docking condition (22) satisfied
$M_6$ | stow | Locked; arms fold to $\alpha_{\mathrm{fold}}$; motors idle
$M_7$ | takeoff | Stow complete; takeoff command received
$M_8$ | cruise | $z > z_{\mathrm{land}} + 1.0$ m and $\lVert \dot{p} \rVert < 0.2$ m/s
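The entry conditions of Table 2 map naturally onto a table-driven state machine. The following is a minimal sketch, assuming the phases are visited strictly in the listed order and that docking condition (22), EPM state, stow completion, and the takeoff command arrive as boolean inputs; the `RelState` fields and the `step` function are hypothetical names, and the real FSM also gates on estimator covariance, which is omitted here.

```python
from dataclasses import dataclass


@dataclass
class RelState:
    xy_err_m: float            # horizontal distance to the landing point (m)
    z_rel_m: float             # height relative to the landing plate (m)
    roll_err_deg: float = 0.0
    pitch_err_deg: float = 0.0
    docking_ok: bool = False   # docking condition (22), treated as an input
    epm_on: bool = False       # electro-permanent magnet engaged
    stow_done: bool = False
    takeoff_cmd: bool = False
    speed_mps: float = 0.0


ORDER = ["approach", "align", "descend", "dock",
         "lock", "stow", "takeoff", "cruise"]

# Entry condition of each phase, keyed by the phase it admits (Table 2).
ENTRY = {
    "align": lambda s: s.xy_err_m < 0.5 and abs(s.z_rel_m) < 1.0,
    "descend": lambda s: s.xy_err_m < 0.050,
    "dock": lambda s: (s.xy_err_m < 0.030
                       and abs(s.roll_err_deg) < 3.0
                       and abs(s.pitch_err_deg) < 3.0),
    "lock": lambda s: s.epm_on and s.docking_ok,
    "stow": lambda s: s.epm_on,  # "locked" approximated by the EPM flag
    "takeoff": lambda s: s.stow_done and s.takeoff_cmd,
    "cruise": lambda s: s.z_rel_m > 1.0 and s.speed_mps < 0.2,
}


def step(phase, state):
    """Advance to the next phase if its entry condition holds, else stay."""
    i = ORDER.index(phase)
    if i + 1 < len(ORDER) and ENTRY[ORDER[i + 1]](state):
        return ORDER[i + 1]
    return phase
```

For example, `step("approach", RelState(xy_err_m=0.4, z_rel_m=0.8))` advances to `"align"`, whereas a horizontal error of 0.6 m keeps the machine in `"approach"`.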
Table 3. Model-based performance ceiling: position RMSE under progressive component addition (10-seed average, nominal docking scenario, and noise-free evaluation). All configurations use PSO-optimised parameters.
Configuration | $\bar{e}_p$ (mm) | $e_x$ (mm) | $e_y$ (mm) | $e_z$ (mm) | $\bar{z}_{\mathrm{bias}}$ (mm) | $\bar{e}_\eta$ (°)
PID baseline | 15.2 ± 0.3 | | | | | 3.8
CFNTSM only | 8.9 ± 0.1 | 4.8 | 2.5 | 7.0 | 7.0 | 2.0
CFNTSM + FEO (no IM) | 6.6 ± 0.1 | 3.5 | 2.0 | 5.2 | 4.8 | 2.2
CFNTSM + GFA-FEO | 5.1 ± 0.0 | 3.2 | 1.9 | 3.5 | 3.0 | 2.4
Note: Bold indicates the best value for the primary position-RMSE metric.
Table 4. PSO decision-variable space (18 dimensions).
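| Layer | Symbol | Lower | Upper | Init. | Description |
|---|---|---|---|---|---|
| CFNTSM | $k_1^{\mathrm{pos}}$ | 3 | 50 | 12.66 | Position sliding gain |
| CFNTSM | $k_2^{\mathrm{pos}}$ | 1 | 25 | 5.63 | Position reaching gain |
| CFNTSM | $\beta^{\mathrm{pos}}$ | 0.05 | 3.0 | 0.102 | Position surface coeff. |
| CFNTSM | $k_1^{\mathrm{att}}$ | 2 | 80 | 7.11 | Attitude sliding gain |
| CFNTSM | $k_2^{\mathrm{att}}$ | 2 | 40 | 25.91 | Attitude reaching gain |
| CFNTSM | $\beta^{\mathrm{att}}$ | 0.1 | 3.0 | 0.770 | Attitude surface coeff. |
| CFNTSM | $p/q$ | 1.05 | 1.8 | 1.064 | Terminal exponent |
| CFNTSM | $\rho_1$ | 0.3 | 0.98 | 0.691 | Finite-time exponent |
| GFA-FEO | $\ell_1$ | 0.3 | 5.0 | 1.61 | Position-channel gain |
| GFA-FEO | $\ell_2$ | 0.3 | 8.0 | 1.73 | Velocity-channel gain |
| GFA-FEO | $\ell_3$ | 0.1 | 4.0 | 0.206 | Disturbance-channel gain |
| GFA-FEO | $L^{\mathrm{pos}}$ | 1 | 60 | 2.91 | Position observer bandwidth |
| GFA-FEO | $L^{\mathrm{att}}$ | 1 | 60 | 9.39 | Attitude observer bandwidth |
| GFA-FEO | $\varepsilon_1$ | 0.05 | 0.45 | 0.188 | Fractional exponent |
| GFA-FEO | $\gamma_n$ | 10 | 200 | 104.5 | IM adaptation rate |
| GFA-FEO | $\mu_v$ | 0.5 | 20 | 6.88 | Velocity correction gain |
| GFA-FEO | $\sigma_{\mathrm{IM}}$ | 0.01 | 1.0 | 0.191 | $\sigma$-modification coeff. |
| GFA-FEO | $\hat{d}_{\max}$ | 10 | 100 | 99.4 | Disturbance saturation |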
Table 5. Hand-tuned vs. PSO-optimised parameter values.
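| Parameter | Hand-Tuned | PSO-Optimised | Change |
|---|---|---|---|
| $k_1^{\mathrm{pos}}$ | 12.66 | 14.13 | +12% |
| $k_2^{\mathrm{pos}}$ | 5.63 | 4.51 | −20% |
| $\beta^{\mathrm{pos}}$ | 0.102 | 0.101 | −1% |
| $k_1^{\mathrm{att}}$ | 7.11 | 29.54 | +315% |
| $k_2^{\mathrm{att}}$ | 25.91 | 33.08 | +28% |
| $\beta^{\mathrm{att}}$ | 0.770 | 1.097 | +42% |
| $p/q$ | 1.064 | 1.059 | 0% |
| $\rho_1$ | 0.691 | 0.706 | +2% |
| $\ell_1$ | 1.61 | 1.37 | −15% |
| $\ell_2$ | 1.73 | 3.35 | +94% |
| $\ell_3$ | 0.206 | 2.265 | +998% |
| $L^{\mathrm{pos}}$ | 2.91 | 12.57 | +332% |
| $L^{\mathrm{att}}$ | 9.39 | 13.92 | +48% |
| $\varepsilon_1$ | 0.188 | 0.069 | −63% |
| $\gamma_n$ | 104.5 | 14.93 | −86% |
| $\mu_v$ | 6.88 | 17.67 | +157% |
| $\sigma_{\mathrm{IM}}$ | 0.191 | 0.287 | +50% |
| $\hat{d}_{\max}$ | 99.4 | 62.24 | −37% |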
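The "Change" column of Table 5 and the search bounds of Table 4 can be checked together. The sketch below does this for a handful of rows. The ASCII parameter names are illustrative stand-ins for the symbols, and the relative change is assumed to be measured against the hand-tuned value.

```python
# (lower, upper, hand-tuned init, PSO-optimised) for selected rows of Tables 4 and 5.
PARAMS = {
    "k1_pos": (3.0, 50.0, 12.66, 14.13),
    "k1_att": (2.0, 80.0, 7.11, 29.54),
    "L_pos": (1.0, 60.0, 2.91, 12.57),
    "eps_1": (0.05, 0.45, 0.188, 0.069),
    "gamma_n": (10.0, 200.0, 104.5, 14.93),
    "d_hat_max": (10.0, 100.0, 99.4, 62.24),
}


def percent_change(old: float, new: float) -> float:
    """Relative change of `new` with respect to `old`, in percent."""
    return 100.0 * (new - old) / old


def in_bounds(lower: float, upper: float, value: float) -> bool:
    """True when `value` lies inside the closed PSO search interval."""
    return lower <= value <= upper


for name, (lower, upper, init, opt) in PARAMS.items():
    ok = in_bounds(lower, upper, init) and in_bounds(lower, upper, opt)
    print(f"{name}: {percent_change(init, opt):+.0f} %, within bounds: {ok}")
```

For these rows the recomputed changes match the tabulated percentages, including the signs of the decreases, and both the initial and optimised values fall inside the search intervals.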
Table 6. Monte Carlo cross-validation results ($50 \times 50 = 2500$ conditions per configuration).
| Metric | Hand-Tuned | PSO-Optimised |
|---|---|---|
| Overall success rate (%) | 37 | 59 |
| Operational-band SR (%, $f_g \in [2, 3]$ Hz) | 59 | 97 |
Table 7. Compared methods and active components. A check mark indicates that the component is active, whereas a dash indicates that the component is absent.
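| ID | Method | CFNTSM | GFA-FEO | FiLM-SAC | $\alpha$-Sched. | Adaptive $\lambda$ |
|---|---|---|---|---|---|---|
| M1 | Ours (Full) | | | | | |
| M2 | w/o RL | | | | | |
| M3 | w/o GFA | | FEO only | | | |
| M4 | Fixed-$\lambda$ | | | | | $\lambda = 0.3$ |
| M5 | PID | PID | | | | |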
Table 8. Experiment 1 results ($f_g = 2.5$ Hz): precision landing and docking (100 runs per method; mean ± std).
| Metric | M1 | M2 | M3 | M4 | M5 |
|---|---|---|---|---|---|
| Pos. RMSE$_{\mathrm{dock}}$ (mm) ↓ | 4.2 ± 0.7 | 7.2 ± 0.5 | 5.4 ± 2.0 | 5.6 ± 0.5 | 7.9 ± 1.1 |
| Att. RMSE$_{\mathrm{dock}}$ (°) ↓ | 2.0 ± 0.2 | 2.1 ± 0.2 | 1.9 ± 0.3 | 2.1 ± 0.2 | 2.4 ± 0.1 |
| Pos. RMSE$_{\mathrm{desc}}$ (mm) ↓ | 6.5 ± 0.7 | 9.3 ± 2.8 | 6.9 ± 0.7 | 8.2 ± 0.7 | 11.9 ± 3.1 |
| Contact vel. (m/s) ↓ | 0.070 ± 0.027 | 0.067 ± 0.028 | 0.061 ± 0.025 | 0.070 ± 0.029 | 0.016 ± 0.010 |
| Success rate (%) ↑ | 100 | 99 | 92 | 100 | 94 |
| Peak att. excursion (°) ↓ | 5.0 ± 0.7 | 4.8 ± 1.2 | 4.8 ± 0.7 | 4.7 ± 0.6 | 3.3 ± 0.5 |
Note: Bold indicates the best value for each metric where applicable.
Table 9. Experiment 2 results: payload-adaptive takeoff (100 runs per method; mean ± std).
| Metric | M1 | M2 | M1 (No HS) | M5 |
|---|---|---|---|---|
| Altitude overshoot (cm) ↓ | 0.4 ± 0.2 | 0.2 ± 0.1 | 0.3 ± 0.5 | 5.1 ± 1.8 |
| Settling time to ±5 cm (s) ↓ | 2.8 ± 0.5 | 2.8 ± 0.5 | 2.8 ± 0.5 | 2.9 ± 0.3 |
| Peak roll excursion (°) ↓ | 3.4 ± 0.4 | 2.9 ± 0.4 | 5.7 ± 1.6 | 7.7 ± 2.2 |
| Pos. RMSE, $t \in [0, 5]$ s (mm) ↓ | 16.2 ± 1.8 | 13.9 ± 2.1 | 44.2 ± 24.8 | 27.9 ± 6.4 |
Note: Bold indicates the best value for each metric where applicable.
Table 10. Experiment 3: per-phase position RMSE for M1 (Ours Full) and M5 (PID), each over 100 runs (mean ± std). Attitude RMSE (M1) is reported for the phases where gait-coupled attitude tracking is most critical.
| Phase | M1 Pos. RMSE (mm) | M5 Pos. RMSE (mm) | M1 Att. RMSE (°) | Key Challenge |
|---|---|---|---|---|
| I Approach | 6.07 ± 0.54 | 8.18 ± 1.05 | | Gait tracking |
| II Descend | 16.90 ± 2.55 | 37.73 ± 12.49 | 2.51 ± 0.68 | $\alpha$ fold-through |
| III Dock | 5.06 ± 1.00 | 17.27 ± 2.99 | 2.09 ± 0.08 | Surface hold |
| IV Takeoff | 11.78 ± 0.32 | 15.26 ± 0.77 | | Mass doubling |
| V Cruise | 4.61 ± 0.36 | 7.64 ± 0.51 | | Coupled cruise |
| VI Station | 4.22 ± 0.17 | 7.52 ± 0.53 | | Long-term hold |

Peak error (mission-wide): M1 27.43 ± 3.63 mm; M5 62.51 ± 20.34 mm.
Docking time: M1 8.75 ± 0.20 s; M5 8.87 ± 0.20 s.
Table 11. Summary of key results across experiments.
| Finding | Exp. | Compared | Result |
|---|---|---|---|
| RL residual necessity | 1 | M1 vs. M2 | Dock RMSE −42% (4.2 vs. 7.2 mm, $p < 10^{-84}$) |
| GFA internal model | 1 | M1 vs. M3 | Dock RMSE −22% (4.2 vs. 5.4 mm), SR +8 pp |
| Adaptive $\lambda(t)$ | 1 | M1 vs. M4 | Dock RMSE −25% (4.2 vs. 5.6 mm, $p < 10^{-38}$) |
| FEO hot-switch | 2 | M1 vs. M1 (No HS) | Pos. RMSE −63% (16.2 vs. 44.2 mm, $p < 10^{-18}$) |
| CFNTSM + FEO vs. PID | 2 | M1 vs. M5 | Pos. RMSE −42% (16.2 vs. 27.9 mm, $p < 10^{-33}$) |
| Full-cycle feasibility | 3 | M1 vs. M5 | 100% SR; peak err. M1 27.4 vs. M5 62.5 mm |
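The percentage figures summarised in Table 11 follow from the mean values in Tables 8 and 9 if each reduction is measured relative to the comparison method. That convention is an assumption here, as is the helper name `relative_reduction`.

```python
def relative_reduction(ours: float, baseline: float) -> float:
    """Percentage by which `ours` is lower than `baseline`."""
    return 100.0 * (baseline - ours) / baseline


# (M1 mean, comparison mean) in mm, taken from Tables 8 and 9.
COMPARISONS = {
    "M1 vs. M2 (dock RMSE)": (4.2, 7.2),
    "M1 vs. M3 (dock RMSE)": (4.2, 5.4),
    "M1 vs. M4 (dock RMSE)": (4.2, 5.6),
    "M1 vs. M1 without hot-switch (pos. RMSE)": (16.2, 44.2),
    "M1 vs. M5 (pos. RMSE)": (16.2, 27.9),
}

for label, (ours, baseline) in COMPARISONS.items():
    print(f"{label}: {relative_reduction(ours, baseline):.0f} % lower")
```

With this convention the recomputed reductions round to the 42%, 22%, 25%, 63%, and 42% listed in the summary.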
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Gu, Q.; Sun, Z. Precision Docking of a Foldable Quadrotor on a Wheel-Legged Robot via CFNTSM with GFA-FEO and FiLM-SAC Deep Reinforcement Learning. Drones 2026, 10, 378. https://doi.org/10.3390/drones10050378
