Article

Delay-Aware UAV Swarm Formation Control via Imitation Learning from ARD-PF Expert Policies

by Rodolfo Vera-Amaro 1,*, Alberto Luviano-Juárez 1 and Mario E. Rivero-Ángeles 2
1 SEPI-UPIITA, Instituto Politécnico Nacional, Mexico City 07740, Mexico
2 CIC-IPN, Instituto Politécnico Nacional, Mexico City 07738, Mexico
* Author to whom correspondence should be addressed.
Drones 2026, 10(1), 34; https://doi.org/10.3390/drones10010034
Submission received: 25 November 2025 / Revised: 2 January 2026 / Accepted: 4 January 2026 / Published: 6 January 2026
(This article belongs to the Special Issue Advanced Flight Dynamics and Decision-Making for UAV Operations)

Highlights

What are the main findings?
  • Training imitation-learning policies with delay-augmented ARD-PF demonstrations leads to a measurable expansion of the probabilistic stability region across swarm size, formation spacing, and communication delay.
  • The admissible delay boundary follows a consistent scaling law with swarm cardinality and inter-agent distance, which can be identified from simulation data using a low-dimensional parametric model.
What are the implications of the main findings?
  • Communication latency should be treated as a first-class design variable in learning-based swarm controllers, rather than as a post hoc disturbance or tuning parameter.
  • The proposed delay–stability relationship enables joint control–communication design by explicitly linking formation geometry and swarm size to delay tolerance limits.

Abstract

This paper studies delay-aware formation control for unmanned aerial vehicle (UAV) swarms operating under realistic air-to-air communication latency. An attractive–repulsive distance-based potential-field (ARD-PF) controller is used as an expert to generate demonstrations for imitation learning in multi-UAV cooperative systems. By augmenting the training data with communication delay, the learned policy implicitly compensates for outdated neighbor information and improves swarm coordination during autonomous flight. Extensive simulations across different swarm sizes, formation spacings, and delay levels show that delay-robust imitation learning significantly enlarges the probabilistic stability region compared with classical ARD-PF control and non-robust learning baselines. Formation control performance is evaluated using internal geometric error, global offset, and multi-run stability metrics. In addition, a predictive delay–stability model is introduced, linking the maximum admissible communication delay to swarm size and inter-agent spacing, with low fitting error against simulated stability boundaries. The results provide quantitative insights for designing communication-aware UAV swarm systems under latency constraints.

1. Introduction

Unmanned aerial vehicle (UAV) swarms are emerging as a key technology for large-scale monitoring, infrastructure inspection, precision agriculture, environmental surveillance, cooperative logistics, and disaster-response missions, driven by their inherent redundancy, scalability, and distributed sensing capabilities [1,2]. In many UAV swarm missions, coordinated motion among multiple vehicles is not optional but a functional requirement to ensure adequate area coverage, timely data acquisition, and tolerance to individual agent failures. Under this perspective, formation control becomes a key enabler for real-world swarm deployment. Nonetheless, maintaining formation integrity in realistic scenarios is far from straightforward. Coordination among UAVs relies on wireless links that inherently suffer from propagation delays, buffering effects, routing latency, and channel degradation. These sources of delay introduce outdated or inconsistent relative state information, weaken synchronization among agents, and can substantially deteriorate the closed-loop response of formation controllers. When latency becomes significant, the resulting behavior often includes sustained oscillations, progressive deformation of the desired geometry, or, in extreme cases, a breakdown of swarm coherence [3,4,5].
As a consequence, the impact of communication delay has been examined extensively in the multi-agent and multi-UAV control literature. Several studies have focused on identifying admissible delay bounds for formation and path-following controllers and on characterizing how stability margins shrink as latency increases [2,3]. Other contributions introduce explicit delay-compensation strategies or redesigned feedback architectures to mitigate the destabilizing effects of time-varying communication delays [4,6]. Along a different line, event-driven communication policies, topology-switching schemes, and delay-tolerant coordination mechanisms have been proposed to cope with intermittent connectivity while reducing overall communication load [1,7]. Despite these advances, most delay-aware solutions remain based on hand-crafted analytical controllers, whose performance can deteriorate under nonlinear dynamics or unmodeled delay-induced distortions.
Artificial potential-field (APF) and distance-based potential controllers remain attractive for UAV formation due to their decentralized structure and low computational complexity. However, they typically assume synchronous or near-instantaneous access to neighbor states, which makes them particularly sensitive to delayed or asynchronous information exchange. In contrast, learning-from-demonstration (LfD) and imitation learning (IL) provide a data-driven alternative by synthesizing control policies directly from expert behavior. Recent surveys report rapid progress in LfD for robotic systems [8,9], and emerging works explore IL for multi-robot coordination and spatial organization [10,11]. However, despite parallel advances in delay-aware formation control and imitation learning for multi-agent systems, the use of distance-based potential-field controllers as expert demonstrations for training delay-robust learned policies remains largely unexplored.
In this paper, we address this gap by proposing a delay-aware learning-from-demonstration framework in which (i) an attractive–repulsive distance-based potential-field (ARD-PF) controller is simulated under realistic leader–follower and inter-follower communication delays, (ii) a dataset of expert demonstrations is collected across varying swarm sizes, formation spacings, and delay magnitudes, and (iii) a neural behavior cloning (BC) policy is trained to imitate expert actions while implicitly compensating for delay-induced distortions. The main contributions of this work are summarized as follows:
  • A delay-aware ARD-PF formulation that explicitly incorporates leader–follower and inter-follower latency in the formation control process.
  • A dataset generation methodology capturing ARD-PF expert demonstrations across heterogeneous swarm geometries and communication-delay conditions.
  • A behavior cloning policy that learns implicit delay-compensation behaviors directly from expert trajectories, without explicit delay modeling.
  • A comprehensive quantitative evaluation demonstrating that the learned policy improves formation stability, reduces geometric distortion, and increases tolerance to large communication delays compared with classical ARD-PF control.
The remainder of this manuscript is organized as follows. Section 2 reviews prior work on delay-aware UAV formation control and learning-based multi-agent coordination. Section 3 presents the system model, including the formation geometry, UAV dynamics, communication-delay model, the ARD-PF expert controller, and the problem formulation. The proposed control architecture, including the expert policy, observation design, and learning framework, is described in Section 4. Section 5 presents the simulation environment and discusses the obtained experimental results. Finally, Section 6 summarizes the main findings of this study and outlines directions for future work.

2. Related Work

2.1. UAV Swarm Formation and Communication Delay

Formation control is a fundamental enabler of cooperative UAV swarm operation and has been widely studied under paradigms such as leader–follower schemes, virtual structures, and behavior-based approaches. Despite substantial progress, recent surveys consistently identify scalability, robustness, and communication constraints as the main challenges in multi-UAV and multi-robot systems [12,13,14].
Recent work has increasingly highlighted the close interaction between communication performance and cooperative UAV control. Topology control strategies for multi-UAV networks show how dynamic connectivity, link quality, and latency affect coordination and information flow, motivating communication-aware designs in flying ad hoc networks (FANETs) [15]. In parallel, reinforcement learning approaches, such as Proximal Policy Optimization (PPO), have been applied to distributed UAV formation control, demonstrating the ability of learning-based policies to handle complex coordination tasks under realistic networking conditions [16].
Building on these insights, a growing body of research has focused on formation and path-following control under non-negligible communication delays. Existing studies derive explicit stability conditions, propose delay-compensation mechanisms, and explore topology-switching and event-triggered coordination to mitigate the effects of outdated information and bandwidth constraints [1,2,3,4].
Overall, these contributions demonstrate that communication latency can substantially shrink stability margins and impair formation performance, which directly motivates the development of coordination strategies that explicitly account for delay effects.

2.2. Potential Field and Distance-Based Formation Control

Artificial potential-field (APF) and distance-based potential methods remain attractive for UAV formation control due to their decentralized structure, low computational cost, and natural encoding of collision avoidance and spacing constraints. Recent studies have focused on improving convergence properties, reducing oscillations, and enhancing robustness under non-ideal conditions. For instance, piecewise potential-field formulations have been proposed to improve smoothness and reduce oscillatory behavior in fixed-wing UAV formations [17], while improved adaptive APF methods combined with sliding-mode control have been shown to increase robustness against disturbances and dynamic obstacles [18]. Other works integrate APF-based formation with consensus mechanisms or model predictive control to better handle nonlinear agent interactions and environmental constraints [1,19]. Despite these advances, most APF-based strategies implicitly assume near-synchronous state updates or address communication delays only through conservative gain tuning, limiting their effectiveness under variable or severe latency. In contrast, the present work leverages an attraction–repulsion distance-based potential-field controller (ARD-PF) as an expert whose delay-affected behavior is explicitly exploited to train a learning-based controller.

2.3. Imitation Learning and Data-Driven Multi-Robot Control

Learning-from-demonstration (LfD) and imitation learning (IL) offer data-driven alternatives to analytically designed controllers by learning policies directly from expert trajectories, without requiring reward design or online exploration. Recent surveys report rapid progress in LfD across diverse robotic domains, including manipulation, autonomous driving, and human–robot interaction [8,9]. In the context of multi-robot and swarm systems, IL has been used to reproduce collective behaviors from simulated or expert data. Agunloye et al. propose a framework for imitating spatial organization patterns in multi-robot systems, with emphasis on steady-state geometric arrangements [10], while Wu et al. introduce an adversarial imitation-learning approach with deep attention mechanisms to reproduce a variety of swarm behaviors from global trajectory observations [20]. These works demonstrate that IL can capture complex interaction rules in distributed systems; however, they typically rely on delay-free demonstrations or focus on steady-state collective behavior rather than dynamic leader–follower formation tracking under communication latency.
Overall, the literature shows substantial progress in delay-aware formation control, enhanced potential-field methods, and imitation learning for multi-agent coordination. Nevertheless, these research directions have largely evolved in parallel: delay-aware formation controllers rarely exploit learning-based policies; potential-field approaches generally assume synchronous or low-latency communication; and existing IL-based swarm controllers typically learn from delay-free data or emphasize static spatial organization. This motivates the need for a unified framework that combines the interpretability and structure of distance-based potential-field control with the flexibility and robustness of learned policies. In this work, we address this gap by generating demonstrations from a delay-affected ARD-PF controller and training a behavior cloning policy to implicitly compensate for latency-induced distortions, as detailed in the following sections.

3. System Model

We consider a swarm of N unmanned aerial vehicles (UAVs) composed of one leader and N − 1 followers, as shown in Figure 1. The swarm operates in three-dimensional space; however, formation geometry and coordination are defined on the horizontal plane, which is standard practice in multi-UAV formation control and allows isolating the impact of communication delays on relative motion [21,22]. The position of UAV i at time step k is described by the state vector
$p_i[k] = \begin{bmatrix} x_i[k] & y_i[k] & z_i[k] \end{bmatrix}^{\top} \in \mathbb{R}^3,$    (1)
with sampling period $dt$. The leader trajectory $p_0[k]$ is assumed to be known to the system and generated by a reference path-planning routine, which may follow a parametric or waypoint-based specification [23]. Followers are required to maintain prescribed relative offsets with respect to the leader while ensuring collision avoidance and swarm cohesion.

3.1. Formation Geometry

Each follower $i \in \{1, \ldots, N-1\}$ is associated with a prescribed relative displacement vector
$d_i = \begin{bmatrix} d_i^{x} & d_i^{y} & d_i^{z} \end{bmatrix}^{\top} \in \mathbb{R}^3,$    (2)
which defines its nominal position within the formation with respect to the leader. In this work, a rotated V-shaped configuration is considered. This formation pattern has been widely adopted in both biological flocking studies and engineered multi-agent systems, as it provides a practical compromise between inter-agent visibility, spacing, and coordination efficiency [24,25]. In addition, aerodynamic analyses of natural formations indicate that V-shaped arrangements can improve collective flight efficiency, which further motivates their use in UAV swarm applications [26,27].
To maintain consistency between the formation geometry and the leader’s motion, the desired position of follower i at time step k is expressed as
$p_i^{*}[k] = p_0[k] + R[k]\, d_i,$    (3)
where $p_0[k]$ denotes the leader position. The rotation matrix $R[k]$, associated with the planar rotation group $\mathrm{SO}(2)$, accounts for the leader's yaw angle $\theta_0[k]$ and is given by
$R[k] = \begin{bmatrix} \cos\theta_0[k] & -\sin\theta_0[k] & 0 \\ \sin\theta_0[k] & \cos\theta_0[k] & 0 \\ 0 & 0 & 1 \end{bmatrix}.$    (4)
This rotation matrix applies a planar rotation in the horizontal $(x, y)$ plane while keeping the vertical component $d_i^{z}$ constant, ensuring that the formation undergoes a coherent rigid-body rotation aligned with the leader's heading and preserves the intended relative geometry throughout the maneuver [21,28].
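To make the formation-geometry computation concrete, the following Python sketch evaluates (3)–(4) for a set of followers. The function name and the specific V-shaped offsets are illustrative assumptions and are not taken from the authors' implementation.
```python
import numpy as np

def desired_positions(p0, yaw, offsets):
    """Desired follower positions p_i*[k] = p_0[k] + R[k] d_i, as in (3)-(4):
    the nominal offsets are rotated about the vertical axis by the leader yaw,
    leaving their z components unchanged."""
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    return p0 + offsets @ R.T          # one row per follower

# Example: four followers in an assumed V-shaped layout with 5 m spacing
d_sep = 5.0
offsets = np.array([[-d_sep, -d_sep, 0.0],
                    [-d_sep,  d_sep, 0.0],
                    [-2 * d_sep, -2 * d_sep, 0.0],
                    [-2 * d_sep,  2 * d_sep, 0.0]])
print(desired_positions(np.array([10.0, 0.0, 5.0]), np.pi / 4, offsets))
```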

3.2. UAV Dynamics

The motion of each UAV is described using a discrete-time kinematic model,
$p_i[k+1] = p_i[k] + v_i[k]\, dt,$    (5)
where $v_i[k] \in \mathbb{R}^3$ denotes the commanded velocity at time step k. This velocity-based representation is commonly adopted in formation-control studies because it captures the essential relative motion between agents while avoiding the complexity of full attitude and thrust dynamics [22,23]. Physical feasibility is enforced by constraining the control input according to
$\| v_i[k] \| \leq v_{\max},$    (6)
which reflects typical actuation limits of multirotor UAV platforms.

3.3. Delay Model

Communication among agents is affected by latency arising from wireless propagation, onboard computation, buffering, and asynchronous update processes [2,3,4]. As a result, the information available to agent i from agent j corresponds to a previous sampling instant,
$k - \tau_{ij}[k],$    (7)
where $\tau_{ij}[k] \in \mathbb{Z}_{\geq 0}$ represents the communication delay. In the simulations, communication delays are specified in seconds and converted to integer-valued sample offsets according to
$\tau_{ij}[k] = \frac{\tau_{ij}^{(s)}}{dt},$    (8)
where $\tau_{ij}^{(s)}$ denotes the delay expressed in seconds and $dt$ is the sampling period. The controller therefore relies on delayed position measurements of the form
$\tilde{p}_j[k] = p_j[\,k - \tau_{ij}[k]\,],$    (9)
rather than on the most recent state information. This bounded-delay formulation captures the dominant effect of outdated neighbor data on formation behavior, which is later quantified through multi-run simulations and probabilistic stability analysis.
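A minimal sketch of the delay operator in (8)–(9) follows, assuming delayed states are read from a stored trajectory buffer; the rounding convention and the clamping at the first sample are our own assumptions rather than details stated in the text.
```python
import numpy as np

def delay_in_samples(delay_s, dt=0.1):
    """Convert a delay expressed in seconds into an integer sample offset,
    as in (8).  The rounding convention is an assumption of this sketch."""
    return int(round(delay_s / dt))

def delayed_position(history, k, tau):
    """Return p_j[k - tau] from a stored trajectory, as in (9), clamping at
    the first sample so the controller always receives some (possibly very
    old) neighbor state."""
    return np.asarray(history)[max(k - tau, 0)]

# Example: read a neighbor position that is 0.5 s old (dt = 0.1 s -> 5 samples)
traj = np.cumsum(np.full((50, 3), 0.1), axis=0)   # hypothetical neighbor path
tau = delay_in_samples(0.5)
print(delayed_position(traj, k=30, tau=tau))
```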

3.4. ARD-PF Expert Controller

The ARD–PF controller combines an attractive component, which drives each follower toward its assigned formation position, with a repulsive interaction that enforces collision avoidance and minimum spacing. The resulting artificial potential acting on follower i is expressed as
$U_i = U_i^{\mathrm{att}} + U_i^{\mathrm{rep}},$    (10)
with the attractive component given by
$U_i^{\mathrm{att}} = \frac{K_{\mathrm{att}}}{2} \left\| \tilde{p}_0[k] + R[k]\, d_i - p_i[k] \right\|^2,$    (11)
where $R[k]$ aligns the reference offsets $d_i$ with the leader's orientation. Collision avoidance is introduced through the repulsive term
$U_i^{\mathrm{rep}} = \sum_{j \neq i} K_{\mathrm{rep}} \frac{1}{\left\| \tilde{p}_j[k] - p_i[k] \right\|^2 + \varepsilon},$    (12)
with a small constant ε > 0 preventing singularities at short range.
The control action is computed from the negative gradient of the potential,
$v_i[k] = -\nabla_{p_i} U_i,$    (13)
yielding a decentralized formation controller whose behavior is strongly influenced by communication delays. Owing to its analytical transparency and sensitivity to latency, the ARD–PF controller serves as an appropriate expert policy for generating delay-aware demonstrations in imitation-learning studies [3,4,10,29].

3.5. Problem Formulation

The objective is to learn a delay-robust control policy of the form
$\pi_{\theta} : \mathbb{R}^m \rightarrow \mathbb{R}^3,$    (14)
parameterized by neural network weights $\theta$, mapping delayed observations to follower velocities. For imitation learning, the expert action is defined as the commanded velocity after integration and saturation, rather than the raw potential gradient. This choice reflects the physically realizable control input applied to the UAVs. The behavior cloning objective seeks the parameter vector minimizing the expert imitation loss
$\theta^{*} = \arg\min_{\theta} \sum_{k,i} \left\| v_i^{\mathrm{expert}}[k] - \pi_{\theta}(o_i[k]) \right\|^2,$    (15)
where $o_i[k]$ includes delayed states, velocity history, and formation descriptors. This formulation follows the classical behavior cloning framework [30,31].

4. Delay-Aware Control Architecture

This section presents the proposed delay-aware control pipeline, which integrates an analytical ARD-PF expert controller with a data-driven imitation-learning framework. The architecture consists of three tightly coupled components: (i) an ARD-PF expert operating under delayed information, (ii) a dataset generation process that explicitly injects communication latency into the observations, and (iii) a behavior cloning (BC) policy trained to imitate and implicitly compensate for delay-distorted expert actions. This design preserves the interpretability and structure of potential-field control while enabling improved robustness to communication-induced delays, as demonstrated in the experimental results.

4.1. ARD-PF Expert Policy for Demonstration Generation

The ARD-PF expert controller described in Section 3 is used to generate the demonstrations driving the imitation-learning pipeline. At each time step, follower i computes attractive and repulsive interaction terms following the artificial potential-field framework [22,32]. The resulting expert control input is
$u_i^{\mathrm{exp}}[k] = K_{\mathrm{att}} \left( p_i^{*}[k] - p_i[k] \right) + K_{\mathrm{rep}} \sum_{j \neq i} \frac{p_i[k] - p_j[k]}{\left\| p_i[k] - p_j[k] \right\|^{3} + \varepsilon},$    (16)
where the desired formation point $p_i^{*}[k]$ incorporates rotation through (3). The control input is translated into a bounded velocity update,
$v_i[k+1] = \mathrm{sat}_{v_{\max}}\left( v_i[k] + u_i^{\mathrm{exp}}[k]\, dt \right),$    (17)
followed by the discrete-time kinematic propagation
$p_i[k+1] = p_i[k] + v_i[k+1]\, dt.$    (18)
The ARD-PF controller is particularly suitable as an expert for imitation learning, due to its smooth decentralized behavior and its well-documented sensitivity to delayed state information [3,4,29]. Consequently, the collected demonstrations naturally encode both nominal formation dynamics and delay-induced distortions, which are later exploited by the learning-based controller.
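The expert update can be summarized in a few lines of Python. The sketch below follows (16)–(18) with the parameter values of Table 2; the norm-based saturation is one plausible reading of $\mathrm{sat}_{v_{\max}}(\cdot)$ and the function layout is an assumption, not necessarily the authors' exact implementation.
```python
import numpy as np

def ardpf_expert_step(p, v, p_star, i, k_att=2.0, k_rep=2.0,
                      eps=1e-2, v_max=1.0, dt=0.1):
    """One ARD-PF expert update for follower i, following (16)-(18):
    attraction toward the desired formation point, inverse-cube repulsion
    from the other (possibly delayed) agent positions, velocity saturation,
    and discrete-time kinematic propagation.  Gains mirror Table 2."""
    u = k_att * (p_star - p[i])                          # attractive term
    for j in range(len(p)):
        if j != i:
            diff = p[i] - p[j]
            u += k_rep * diff / (np.linalg.norm(diff) ** 3 + eps)  # repulsion
    v_new = v[i] + u * dt
    speed = np.linalg.norm(v_new)
    if speed > v_max:                                    # sat_{v_max}(.)
        v_new = v_new * (v_max / speed)
    p_new = p[i] + v_new * dt                            # kinematic update
    return p_new, v_new, u

# Example: two followers, with follower 0 driven toward an assumed reference
p = np.array([[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]])
v = np.zeros_like(p)
print(ardpf_expert_step(p, v, p_star=np.array([-3.0, 3.0, 0.0]), i=0))
```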

4.2. Observation Model for Learning-Based Control

The learned controller adopts a decentralized observation model commonly used in multi-UAV coordination [21,22]. Rather than relying on full-state broadcasts, each follower constructs a compact observation vector capturing local geometry, short-term motion history, and the effects of delayed communication.
The first component is a temporal stack of relative leader–follower displacements,
$h_i(t) = p_0[k-t] - p_i[k-t], \qquad t = 0, \ldots, H-1,$    (19)
which summarizes recent motion trends and implicitly encodes velocity and acceleration information. Such history-based representations are widely used in delay-aware and memory-augmented control policies [10,33], enabling the policy to reason over asynchronous state updates.
The second component is the instantaneous velocity v i [ k ] , which provides dynamic information that cannot be reconstructed from position data alone and has been shown to improve the stability of learning-based controllers [34,35]. The third component of the observation is the geometric descriptor d i , which encodes the specific role of follower i within the formation. By explicitly including this descriptor, the policy can distinguish between symmetric agents, preserve the intended formation geometry, and generalize more effectively across different swarm sizes and formation layouts.
The full observation vector is obtained by concatenating all components,
$o_i[k] = \left[\, h_i(0), \ldots, h_i(H-1), v_i[k], d_i \,\right] \in \mathbb{R}^m,$    (20)
resulting in a compact yet expressive local representation of the swarm state. This design follows established practices in decentralized and learning-based multi-agent control, where local observations are constructed to balance informativeness and scalability [31,36].
When communication is assumed to be ideal, the expert trajectories generated by the ARD-PF controller yield a nominal supervised dataset composed of input–output pairs,
$\left( o_i[k],\, u_i^{\mathrm{exp}}[k] \right),$    (21)
which provides a baseline for imitation learning and serves as a reference for subsequent comparisons under delayed communication conditions [30,37].
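For concreteness, a minimal sketch of the observation assembly in (19)–(20) is given below; the helper name, the toy trajectories, and the assumption that leader states arrive undelayed in this nominal case are illustrative only.
```python
import numpy as np

def build_observation(leader_hist, follower_hist, v_i, d_i, k, H=3):
    """Assemble the local observation o_i[k] of (19)-(20): an H-step stack of
    leader-follower displacements, the current velocity, and the formation
    descriptor d_i.  With H = 3 and 3-D vectors this yields m = 15 entries."""
    rel = [leader_hist[k - t] - follower_hist[k - t] for t in range(H)]
    return np.concatenate(rel + [v_i, d_i])

# Example with hypothetical 10-step trajectories
leader = np.linspace([0.0, 0.0, 5.0], [9.0, 0.0, 5.0], 10)
follower = leader + np.array([-5.0, -5.0, 0.0])
obs = build_observation(leader, follower,
                        v_i=np.array([1.0, 0.0, 0.0]),
                        d_i=np.array([-5.0, -5.0, 0.0]), k=9)
print(obs.shape)   # (15,)
```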

4.3. Delay-Augmented Dataset for Robust Learning

To explicitly incorporate the effect of communication latency during training, delays are injected directly into the observation histories used by the learning algorithm. Communication delays are injected into both leader and neighbor information used in the observation vector. Specifically, delayed leader–follower and inter-agent relative position histories are constructed for training and evaluation. For each training instance, a delay value is sampled from the discrete set
$\tau \in \{ 0, 1, \ldots, \tau_{\max} \},$    (22)
and the corresponding historical state measurements are shifted accordingly,
$\tilde{p}_j[k-t] = p_j[k - t - \tau].$    (23)
Each realization of the delay variable produces an augmented supervised training pair of the form
$\left( o_i^{(\tau)}[k],\, u_i^{\mathrm{exp}}[k] \right),$    (24)
thereby constructing a dataset that spans a wide range of latency conditions. This augmentation mechanism is conceptually related to domain-randomization and disturbance-injection techniques [38,39], which are commonly employed to improve robustness against distribution shifts. In the context of UAV swarms, the proposed approach effectively emulates communication latency arising from wireless channel variability, queuing effects, and multi-hop routing processes [2,4].
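The augmentation step can be sketched as follows. The uniform sampling over $\{0, \ldots, \tau_{\max}\}$ matches the text, while the clamping at the first sample and the choice to shift only the communicated leader history (the follower's own state stays current) are assumptions of this illustration.
```python
import numpy as np

rng = np.random.default_rng(0)

def delay_augmented_pair(leader_hist, follower_hist, v_i, d_i, u_exp, k,
                         tau_max, H=3):
    """Build one delay-augmented pair (o_i^(tau)[k], u_i^exp[k]) following
    (22)-(24): tau is drawn uniformly from {0,...,tau_max} and the
    communicated history is shifted by tau samples."""
    tau = int(rng.integers(0, tau_max + 1))
    rel = [leader_hist[max(k - t - tau, 0)] - follower_hist[k - t]
           for t in range(H)]
    return np.concatenate(rel + [v_i, d_i]), u_exp, tau

# Example: up to 2 s of delay with dt = 0.1 s (tau_max = 20 samples)
leader = np.linspace([0.0, 0.0, 5.0], [19.0, 0.0, 5.0], 20)
follower = leader + np.array([-5.0, -5.0, 0.0])
obs, action, tau = delay_augmented_pair(leader, follower,
                                        v_i=np.array([1.0, 0.0, 0.0]),
                                        d_i=np.array([-5.0, -5.0, 0.0]),
                                        u_exp=np.array([0.2, 0.0, 0.0]),
                                        k=19, tau_max=20)
```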

4.4. Behavior Cloning Controller

The learning-based formation controller is implemented as a multilayer perceptron (MLP) [40] with parameters θ , which maps the local decentralized observation vector to a three-dimensional control command,
$\hat{u}_i[k] = \pi_{\theta}(o_i[k]),$    (25)
where o i [ k ] is defined in (20). The network consists of two hidden layers with 64 neurons each, using ReLU activation functions [41]. This architecture offers a favorable balance between expressive capacity and training stability, and is widely used in continuous control and imitation-learning applications [34].
The policy parameters are obtained by minimizing a supervised imitation objective,
$\theta^{*} = \arg\min_{\theta} \sum_{i,k} \left\| u_i^{\mathrm{exp}}[k] - \pi_{\theta}(o_i[k]) \right\|^2,$    (26)
which corresponds to an $\ell_2$ regression problem over the expert control commands [30,35]. Although this formulation does not explicitly impose stability guarantees, it allows the learned policy to faithfully reproduce the expert behavior and to generalize it when operating under delayed observations.
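A minimal training sketch consistent with the stated architecture and the objective in (26) is shown below, using PyTorch as one possible framework (the paper does not state which library was used); the synthetic tensors stand in for the actual ARD-PF demonstration pairs.
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Behavior-cloning setup as described in the text: an MLP with two hidden
# layers of 64 ReLU units, trained with Adam (lr = 1e-3, batch size 256,
# 150 epochs) on an l2 regression loss.  The random tensors are placeholders
# for the demonstration pairs (o_i[k], u_i^exp[k]).
obs_dim, act_dim = 15, 3
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

observations = torch.randn(4096, obs_dim)   # placeholder for o_i[k]
actions = torch.randn(4096, act_dim)        # placeholder for u_i^exp[k]
loader = DataLoader(TensorDataset(observations, actions),
                    batch_size=256, shuffle=True)

for epoch in range(150):
    for obs_batch, act_batch in loader:
        optimizer.zero_grad()
        loss = loss_fn(policy(obs_batch), act_batch)   # ||u_exp - pi_theta(o)||^2
        loss.backward()
        optimizer.step()
```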
Following this training pipeline, three imitation-learning controllers are obtained, differing only in how communication delays are handled during training and evaluation. These variants are summarized in Table 1. The first controller (IL no delay) is trained and evaluated exclusively on delay-free data. The second variant (IL non-robust) is trained without delay but deployed under delayed observations, resulting in a clear distribution mismatch [42]. The proposed controller (IL delay-robust) is trained using the delay-augmented dataset, thereby exposing the policy to a broad range of latency realizations during learning.
By explicitly randomizing delays during training, the delay-robust controller learns control responses that are less sensitive to outdated state information. This effect directly translates into improved probabilistic stability and higher delay tolerance, as demonstrated in the experimental results reported in Section 5.

4.5. Simulation Under Delay

During evaluation, both ARD-PF and learned policies operate on delayed state information reconstructed as
$\tilde{p}_j[k] = p_j[\,k - \tau_{ij}[k]\,],$    (27)
with delays $\tau_{ij}[k]$ sampled uniformly from $\{0, 1, \ldots, \tau_{\max}\}$. Identical delay realizations are used across controllers to enable controlled and fair comparisons of formation accuracy, trajectory deviation, and inter-agent spacing.

4.6. Delay–Stability Predictive Model

Communication latency affects closed-loop stability through its interaction with swarm size, formation density, and the nonlinear coupling induced by the ARD-PF controller. To capture these effects, we introduce the predictive model
$\tau_{\max}(N, d_{\mathrm{sep}}) = \frac{\pi}{2\, k_0\, N^{\alpha}\, d_{\mathrm{sep}}^{-\beta}},$    (28)
which characterizes how the admissible delay margin scales with swarm cardinality $N$ and inter-agent spacing $d_{\mathrm{sep}}$.
The structure of (28) is motivated by classical results on time-delay systems, where local linearization of the error dynamics yields
$\dot{e}(t) \approx -K_{\mathrm{eff}}\, e(t - \tau),$    (29)
and stability requires
$\tau < \frac{\pi}{2 K_{\mathrm{eff}}},$    (30)
as established in the literature [43,44,45]. To connect this bound with swarm properties, the effective stiffness is modeled as
$K_{\mathrm{eff}} \approx k_0\, N^{\alpha}\, d_{\mathrm{sep}}^{-\beta},$    (31)
where $N^{\alpha}$ captures the increase in coupling density with swarm size, $d_{\mathrm{sep}}^{-\beta}$ reflects the influence of formation compactness, and $k_0$ aggregates the baseline stiffness induced by ARD-PF gains. Substitution into the delay–stability inequality yields (28), which provides a compact, physically grounded predictor of delay tolerance consistent with the empirical stability regions observed in the results section.
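The parameters of (28) can be identified by nonlinear least squares. The sketch below uses scipy.optimize.curve_fit with placeholder boundary values (not the paper's data) and mirrors the relative-error measure reported in the results section; the starting point and the specific numbers are assumptions.
```python
import numpy as np
from scipy.optimize import curve_fit

def tau_max_model(X, k0, alpha, beta):
    """Predicted admissible delay from (28):
    tau_max = pi / (2 k0 N^alpha d_sep^(-beta)),
    so tolerance shrinks with swarm size and grows with spacing."""
    N, d_sep = X
    return np.pi / (2.0 * k0 * N ** alpha * d_sep ** (-beta))

# Hypothetical stability boundaries; placeholders, not the paper's data.
N_vals     = np.array([5, 5, 5, 7, 7, 7, 10, 10, 10], dtype=float)
d_sep_vals = np.array([3, 5, 8, 3, 5, 8, 3, 5, 8], dtype=float)
tau_sim    = np.array([5.0, 8.0, 12.0, 4.0, 6.5, 10.0, 2.5, 4.5, 7.0])

params, _ = curve_fit(tau_max_model, (N_vals, d_sep_vals), tau_sim,
                      p0=[0.1, 1.0, 1.0], maxfev=10000)
k0, alpha, beta = params
pred = tau_max_model((N_vals, d_sep_vals), *params)
rel_err = np.linalg.norm(tau_sim - pred) / np.linalg.norm(tau_sim)
print(f"k0={k0:.3f}, alpha={alpha:.2f}, beta={beta:.2f}, rel. error={rel_err:.1%}")
```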

5. Experimental Results

In this section, we evaluated the performance of three formation-control strategies: classical ARD-PF, imitation learning trained without delay awareness, and the proposed delay-robust imitation-learning controller. The evaluation spanned a wide range of communication delays, swarm sizes, and formation spacings. Formation stability, trajectory tracking, and collision safety were assessed through qualitative trajectory inspection, time-domain error metrics, probabilistic stability analysis, and validation of an analytical delay–stability model that predicted the maximum admissible delay as a function of swarm density.

5.1. Simulation Environment

All our experiments were conducted in a custom simulation environment written in Python 3.11, implementing the kinematic model in (5), the ARD-PF expert controller in (16), and the discrete communication-delay operator in (7)–(9).
The leader UAV followed a square trajectory at constant velocity, generated by a piecewise-linear path planner. At each time step, the leader’s heading determined the rotation matrix R [ k ] used to compute the desired follower positions via (3). The followers tracked these reference positions using either the ARD-PF controller, the imitation-learning policy, or the delay-robust imitation-learning policy.
We evaluated the swarms with
$N \in \{5, 7, 10\}, \qquad d_{\mathrm{sep}} \in \{3, 5, 8\}\ \mathrm{m},$
covering dense, moderate, and sparse formations. Communication delays were sampled uniformly as
$\tau_{ij}[k] \in \{0, 1, \ldots, \tau_{\max}\},$
with
$\tau_{\max} \in \{0.1, 0.5, 1, 2, 5, 10, 15\}\ \mathrm{s},$
where $dt = 0.1$ s.
Each simulation spanned five complete laps of the leader's trajectory and produced full three-dimensional trajectories for all agents. Representative results are shown in Figure 2 for the configuration $N = 5$, $d_{\mathrm{sep}} = 5$ m, $v_{\max} = 1.0$ m/s, and $\tau_{\max} = 2$ s. This scenario provided a controlled yet sufficiently challenging setting to highlight the qualitative impact of communication delay on formation stability.
Clear and consistent differences in formation behavior emerged across the evaluated controllers. In the absence of communication delay, the ARD-PF controller maintained a compact formation with only mild oscillatory motion. When delay was introduced, ARD-PF remained nominally stable; however, noticeable drift and phase lag developed along the trajectory. The imitation-learning controller trained exclusively under delay-free conditions accurately reproduced the nominal formation when communication was ideal, but rapidly lost coherence once delayed observations were introduced, a consequence of the resulting distribution mismatch. By contrast, the delay-robust imitation-learning controller sustained a compact and nearly symmetric formation under the same delay realizations, providing clear evidence of effective implicit compensation for delayed state information.

5.2. Reproducibility and Implementation Details

To ensure reproducibility, all simulation, control, and learning parameters are reported explicitly. The simulations were executed with a fixed sampling interval $dt = 0.1$ s, maximum velocity $v_{\max} = 1.0$ m/s, and a square leader trajectory of side length 30 m repeated for five laps. The observation history length was $H = 3$, yielding a temporal window of 0.3 s.
The communication delays were implemented as integer-valued sampling offsets according to (7)–(9). The behavior cloning dataset was generated by simulating the ARD-PF expert across all combinations of $N$, $d_{\mathrm{sep}}$, and $\tau_{\max}$, resulting in approximately $1.2 \times 10^{5}$ state–action pairs.
The neural policy was trained using the Adam optimizer with learning rate $10^{-3}$, batch size 256, and 150 training epochs. A fixed random seed was used to ensure repeatability. All the parameters are summarized in Table 2 and Table 3.

5.3. Performance Metrics

Formation stability was quantified using two complementary metrics: internal formation error and global XY offset. These metrics captured geometric distortion within the swarm and macroscopic drift relative to the leader.
  • Internal formation error:
$e_{\mathrm{form}}[k] = \frac{1}{N-1} \sum_{i=1}^{N-1} \left\| \left( p_i[k] - p_0[k] \right) - \tilde{o}_i[k] \right\|,$
where $\tilde{o}_i[k]$ denotes the rotated reference offset $R[k]\, d_i$ of follower i.
  • Global XY offset:
$e_{\mathrm{off}}[k] = \frac{1}{N-1} \sum_{i=1}^{N-1} \left\| \left( p_i[k] - p_0[k] \right)_{xy} \right\|.$
Figure 3 and Figure 4 report the smoothed time-domain evolution of both metrics. Delay-robust imitation learning maintained systematically lower error levels, reducing formation distortion by approximately 15–25% relative to delayed ARD-PF and 20–30% relative to non-robust imitation learning, averaged over all tested ( N , d sep , τ max ) configurations.
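A small sketch of how these two metrics can be evaluated at a single time step is given below. The array layout and example values are assumptions, and the metric expressions simply mirror the reconstructed definitions above.
```python
import numpy as np

def formation_metrics(p, p0, rotated_offsets):
    """Internal formation error and global XY offset at one time step:
    the former averages the deviation of each follower from its rotated
    reference offset, the latter averages the planar offset magnitude of
    each follower relative to the leader."""
    rel = p - p0                                   # follower positions relative to leader
    e_form = np.mean(np.linalg.norm(rel - rotated_offsets, axis=1))
    e_off = np.mean(np.linalg.norm(rel[:, :2], axis=1))
    return e_form, e_off

# Example with two followers and their rotated reference offsets (values assumed)
p  = np.array([[4.8, -5.2, 5.0], [5.1, 5.3, 5.0]])
p0 = np.array([10.0, 0.0, 5.0])
offsets = np.array([[-5.0, -5.0, 0.0], [-5.0, 5.0, 0.0]])
print(formation_metrics(p, p0, offsets))
```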

5.4. Stability Map Across Delay, Swarm Size, and Spacing

Robustness under stochastic delays was evaluated probabilistically. For each ( N , d sep , τ max ) configuration, multiple simulation runs were executed and stability was quantified as
$\rho(N, d_{\mathrm{sep}}, \tau_{\max}) = \frac{S}{R},$
where S denotes the number of stable runs out of R realizations.
Figure 5 shows that increasing the swarm size and enforcing tighter inter-agent spacing reduce the overall delay tolerance of the formation. Across all evaluated configurations, the delay-robust imitation learning controller consistently attains higher stability probabilities, particularly under moderate-to-large communication delays. For comparison purposes, a stability threshold of 70%, indicated by the blue dashed line, is adopted as a decision boundary: configurations whose stability probability lies above this threshold are considered stable, whereas those below it are classified as unstable.

5.4.1. Stability Classification Criterion

Each simulation run was classified as stable or unstable based on two objective criteria reflecting formation coherence and collision safety. We let $e_{\mathrm{form}}[k]$ denote the internal formation error defined in Section 5.3, and we let $d_{\min}[k]$ be the minimum inter-agent distance at time step k. A run was declared unstable if, at any time during the simulation, either
$e_{\mathrm{form}}[k] > E_{\max} = 25\ \mathrm{m},$
indicating a clear breakdown of the desired formation geometry, or
$d_{\min}[k] < d_{\mathrm{coll}} = 0.5\ \mathrm{m},$
indicating loss of collision safety.
For each configuration ( N , d sep , τ max ) , multiple simulation runs with independent random seeds were executed. The resulting stability rate was defined as the fraction of runs that remained stable over the entire simulation horizon. Table 4 reports these stability rates, where a configuration is labeled stable if at least 80 % of the runs satisfy the above criteria.
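A sketch of this classification and of the resulting stability rate used in Table 4 follows; the error and distance traces in the example are hypothetical.
```python
import numpy as np

def classify_run(e_form, d_min, e_max=25.0, d_coll=0.5):
    """Label one run as stable (True) or unstable (False) using the two
    criteria of this subsection: the internal formation error must never
    exceed E_max and the minimum inter-agent distance must never fall
    below d_coll."""
    return bool(np.all(np.asarray(e_form) <= e_max)
                and np.all(np.asarray(d_min) >= d_coll))

def stability_rate(runs):
    """Fraction of stable runs, rho = S / R, for one (N, d_sep, tau_max) cell."""
    labels = [classify_run(e, d) for e, d in runs]
    return sum(labels) / len(labels)

# Example: three hypothetical runs (formation-error trace, min-distance trace)
runs = [(np.array([2.0, 3.5, 4.0]),  np.array([2.5, 2.2, 2.0])),
        (np.array([3.0, 12.0, 27.0]), np.array([2.4, 1.8, 1.1])),   # error blow-up
        (np.array([2.5, 2.8, 3.0]),  np.array([1.2, 0.6, 0.4]))]    # near-collision
print(stability_rate(runs))   # 1 of 3 runs is stable under the default thresholds
```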

5.5. Delay Stability Model Validation

The predictive delay–stability model of (28) was validated by comparing the simulated and theoretical stability boundaries for the imitation learning controllers. The parameters were fitted via nonlinear least squares over all the tested ( N , d sep , τ max ) configurations.
Figure 6 shows that the model accurately captured the dominant scaling trends of the stability boundary. The mean relative fitting errors remained below 12%, confirming that the model provides a reliable predictor of admissible delay margins for learning-based formation control.
The discrepancy between predicted and simulated delay boundaries was quantified by
$\varepsilon = \frac{\left\| \tau_{\max}^{\mathrm{sim}} - \tau_{\max}^{\mathrm{model}} \right\|_2}{\left\| \tau_{\max}^{\mathrm{sim}} \right\|_2},$
yielding $\varepsilon_{\mathrm{IL}} \approx 6.3\%$, $\varepsilon_{\mathrm{IL\_delay}} \approx 11.8\%$, and $\varepsilon_{\mathrm{IL\_robust}} \approx 7.4\%$.
All our results were obtained in simulation; experimental multi-UAV flight validation has been left for future work.

6. Conclusions

This paper investigated delay-aware formation control for UAV swarms by combining a distance-based attractive–repulsive potential-field (ARD-PF) controller with imitation learning. By generating expert demonstrations under explicitly delayed leader–follower and inter-agent communication, and by training a behavior cloning policy with delay-augmented observations, the proposed approach enables implicit compensation of outdated state information without requiring explicit delay estimation or prediction mechanisms.
Our extensive simulation results demonstrate that the delay-robust imitation learning controller consistently outperformed both classical ARD-PF control and non-robust imitation learning baselines. Across a wide range of swarm sizes, formation spacings, and communication delays, the proposed policy achieved lower internal formation error, reduced global trajectory offset, and significantly higher probabilistic stability. These results confirm that exposing learning-based controllers to communication delays during training is a key factor in achieving robust coordination in realistic UAV swarm deployments.
Beyond empirical performance gains, this work introduces a predictive delay–stability model that captures how the maximum admissible communication delay scales jointly with swarm cardinality and formation density. The proposed model accurately reproduces the observed stability boundaries with low fitting error and provides a compact analytical link between communication latency and closed-loop formation stability. This relationship offers practical insight for the joint design of control and communication parameters in UAV swarms. Future work will focus on experimental validation with real UAV platforms, as well as the inclusion of packet loss, intermittent connectivity, and adaptive communication-aware control mechanisms.

Author Contributions

Conceptualization, R.V.-A.; methodology, R.V.-A., A.L.-J., and M.E.R.-Á.; software, R.V.-A.; validation, R.V.-A., A.L.-J., and M.E.R.-Á.; formal analysis, R.V.-A.; investigation, R.V.-A.; resources, A.L.-J. and M.E.R.-Á.; data curation, R.V.-A.; writing–original draft preparation, R.V.-A.; writing–review and editing, R.V.-A., A.L.-J., and M.E.R.-Á.; visualization, R.V.-A.; supervision, A.L.-J. and M.E.R.-Á.; project administration, R.V.-A.; funding acquisition, R.V.-A., A.L.-J., and M.E.R.-Á. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Instituto Politécnico Nacional with grant number SIP-20250270.

Institutional Review Board Statement

The research presented in this article involved exclusively computational simulations and did not include human participants, animals, or sensitive data. Accordingly, no institutional ethical approval was required.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Tao, C.; Zhang, R.; Song, Z.; Wang, B.; Jin, Y. Multi-UAV Formation Control in Complex Conditions Based on Switching Topology and Communication Constraints. Drones 2023, 7, 185. [Google Scholar] [CrossRef]
  2. Pham, T.V.; Nguyen, T.D. Path-Following Formation of Fixed-Wing UAVs under Communication Delay: A Vector Field Approach. Drones 2024, 8, 237. [Google Scholar] [CrossRef]
  3. Du, Z.; Qu, X.; Shi, J.; Lu, J. Formation Control of Fixed-Wing UAVs with Communication Delay. ISA Trans. 2024, 146, 154–164. [Google Scholar] [CrossRef] [PubMed]
  4. Wang, L.; Zhao, Y.D.; Zhang, B.L.; Cai, Z.; Xue, J.; Zhao, Y. Delay-Based Feedback Formation Control for Unmanned Aerial Vehicles with Feedforward Components. In Advances in Guidance, Navigation and Control, Proceedings of 2022 International Conference on Guidance, Navigation and Control, Harbin, China, 5–7 August 2022; Yan, L., Duan, H., Deng, Y., Eds.; Springer: Singapore, 2023; pp. 6846–6857. [Google Scholar]
  5. Guo, M.; Zhang, J.; Yan, G. Consensus and Formation Control of Multi-Agent Systems under Communication Delays: A Review. Sensors 2022, 22, 1037. [Google Scholar] [CrossRef]
  6. Yan, Z.; Han, L.; Li, X.; Dong, X.; Li, Q.; Ren, Z. Event-Triggered Formation Control for Time-Delayed Discrete-Time Multi-Agent Systems Applied to Multi-UAV Formation Flying. J. Franklin Inst. 2023, 360, 3677–3699. [Google Scholar] [CrossRef]
  7. Wang, C.; Wang, J.; Wu, P.; Gao, J. Consensus Problem and Formation Control for Heterogeneous Multi-Agent Systems with Switching Topologies. Electronics 2022, 11, 2598. [Google Scholar] [CrossRef]
  8. Correia, A.; Alexandre, L.A. A Survey of Demonstration Learning. Robot. Auton. Syst. 2024, 182, 104812. [Google Scholar] [CrossRef]
  9. Sosa-Cerón, A.D.; González-Hernández, H.G.; Reyes-Avendaño, J.A. Learning from Demonstrations in Human–Robot Collaborative Scenarios: A Survey. Robotics 2022, 11, 126. [Google Scholar] [CrossRef]
  10. Agunloye, A.O.; Ramchurn, S.D.; Soorati, M.D. Learning to Imitate Spatial Organization in Multi-robot Systems. arXiv 2024, arXiv:2407.11592. [Google Scholar] [CrossRef]
  11. Spatharis, C.; Blekas, K.; Vouros, G.A. Modelling Flight Trajectories with Multi-Modal Generative Adversarial Imitation Learning. Appl. Intell. 2024, 54, 7118–7134. [Google Scholar] [CrossRef]
  12. Bu, Y.; Yan, Y.; Yang, Y. Advancement Challenges in UAV Swarm Formation Control: A Comprehensive Review. Drones 2024, 8, 320. [Google Scholar] [CrossRef]
  13. Ouyang, Q.; Wu, Z.; Cong, Y.; Wang, Z. Formation Control of Unmanned Aerial Vehicle Swarms: A Comprehensive Review. Asian J. Control 2023, 25, 570–593. [Google Scholar] [CrossRef]
  14. Alqudsi, Y.; Makaraci, M. UAV Swarms: Research, Challenges, and Future Directions. J. Eng. Appl. Sci. 2025, 72, 82. [Google Scholar] [CrossRef]
  15. Alam, M.M.; Trestian, R.; Ghinea, G. Topology Control Algorithms in Multi-Unmanned Aerial Vehicle Networks: An Extensive Survey. J. Netw. Comput. Appl. 2022, 207, 103495. [Google Scholar] [CrossRef]
  16. Yu, H.; Liu, Y.; Wang, Z. A Proximal Policy Optimization Method in UAV Swarm Formation Control. Alex. Eng. J. 2024, 100, 268–276. [Google Scholar] [CrossRef]
  17. Fang, Y.; Yao, Y.; Zhu, F.; Chen, K. Piecewise-potential-field-based path planning method for fixed-wing UAV formation. Sci. Rep. 2023, 13, 2234. [Google Scholar] [CrossRef]
  18. Zhang, P.; Lü, T.S.; Song, L.B. Enhanced Multi-UAV Formation Control and Obstacle Avoidance Based on Improved Adaptive Artificial Potential Field (IAAPF) with Sliding Mode Control. Drones 2024, 8, 514. [Google Scholar] [CrossRef]
  19. Li, Y.; Liu, D.; Ma, K.; Chen, Y.; Liu, Y.; Chen, C.; Chen, X.; Liu, X.; Xu, N.; Yuan, J. Control of Unmanned Aerial Vehicle Swarms to Cruise and Loiter Under Strong Convective Turbulence. Drones 2025, 9, 271. [Google Scholar] [CrossRef]
  20. Wu, Y.; Wang, T.; Liu, T.; Zheng, Z.; Xu, D.; Peng, X. Adversarial Imitation Learning with Deep Attention Network for Swarm Systems. Complex Intell. Syst. 2025, 11, 26. [Google Scholar] [CrossRef]
  21. Wang, J.; Hu, X. Distributed Consensus in Multi-Vehicle Cooperative Control: Theory and Applications (Ren, W.; Beard, R.W.; 2008) [Book Shelf]. In IEEE Control Systems Magazine; IEEE: New York, NY, USA, 2010; Volume 30, pp. 85–86. [Google Scholar] [CrossRef]
  22. Olfati-Saber, R. Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Trans. Autom. Control 2006, 51, 401–420. [Google Scholar] [CrossRef]
  23. Landry, B. Planning and Control for Quadrotor Flight Through Cluttered Environments. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 2015. Available online: https://groups.csail.mit.edu/robotics-center/public_papers/Landry15.pdf (accessed on 3 January 2026).
  24. Miyazaki, R.; Yasuta, Y.; Han, X.; Tomita, K.; Kamimura, A. Decentralized Multi-UAV Formation Control and Navigation over a Self-Organizing Coordination Network. In Proceedings of the 2023 IEEE/SICE International Symposium on System Integration (SII), Atlanta, GA, USA, 17–20 January 2023; pp. 1–6. [Google Scholar] [CrossRef]
  25. Oh, K.M.; Park, M.J.; Ahn, H.S. A survey of multi-agent formation control. Automatica 2015, 53, 424–440. [Google Scholar] [CrossRef]
  26. Lissaman, P.B.S.; Schollenberger, C.A. Formation Flight of Birds. Science 1970, 168, 1003–1005. [Google Scholar] [CrossRef]
  27. Harvey, C.; Inman, D. Aerodynamic Efficiency of Gliding Birds vs. Comparable UAVs: A Review. Bioinspiration Biomimetics 2021, 16, 031001. [Google Scholar] [CrossRef]
  28. Fax, J.A.; Murray, R.M. Information flow and cooperative control of vehicle formations. IEEE Trans. Autom. Control 2004, 49, 1465–1476. [Google Scholar] [CrossRef]
  29. Jin, F.; Ye, Z.; Li, M.; Xiao, H.; Zeng, W.; Wen, L. A New Hybrid Reinforcement Learning with Artificial Potential Field Method for UAV Target Search. Sensors 2025, 25, 2796. [Google Scholar] [CrossRef] [PubMed]
  30. Argall, B.D.; Chernova, S.; Veloso, M.; Browning, B. A survey of robot learning from demonstration. Robot. Auton. Syst. 2009, 57, 469–483. [Google Scholar] [CrossRef]
  31. Ho, J.; Ermon, S. Generative Adversarial Imitation Learning. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; pp. 4572–4580. [Google Scholar]
  32. Khatib, O. Real-time Obstacle Avoidance for Manipulators and Mobile Robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA), St. Louis, MO, USA, 25–28 March 1985; pp. 500–505. [Google Scholar] [CrossRef]
  33. Ishida, Y.; Noguchi, Y.; Kanai, T.; Shintani, K.; Bito, H. Robust Imitation Learning for Mobile Manipulator Focusing on Task-Related Viewpoints and Regions. arXiv 2024, arXiv:2410.01292. [Google Scholar] [CrossRef]
  34. Schaal, S. Is Imitation Learning the Route to Humanoid Robots? Trends Cogn. Sci. 1999, 3, 233–242. [Google Scholar] [CrossRef]
  35. Ross, S.; Gordon, G.J.; Bagnell, D. A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS) 2011, Fort Lauderdale, FL, USA, 11–13 April 2011; pp. 627–635. Available online: http://proceedings.mlr.press/v15/ross11a/ross11a.pdf (accessed on 3 January 2026).
  36. Cao, Y.; Yu, W.; Ren, W.; Chen, G. Recent Progress in the Consensus of Multi-Agent Systems. IEEE Trans. Ind. Inform. 2013, 9, 427–438. [Google Scholar] [CrossRef]
  37. Pomerleau, D.A. Efficient Training of Artificial Neural Networks for Autonomous Navigation. Neural Comput. 1991, 3, 88–97. [Google Scholar] [CrossRef] [PubMed]
  38. Tobin, J.; Fong, R.; Ray, A.; Schneider, J.; Zaremba, W.; Abbeel, P. Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World. In Proceedings of the IROS, Vancouver, BC, Canada, 24–28 September 2017; pp. 23–30. [Google Scholar] [CrossRef]
  39. Kurutach, T.; Clavera, I.; Duan, Y.; Tamar, A.; Abbeel, P. Model-Ensemble Trust-Region Policy Optimization. arXiv 2018, arXiv:1802.10592. [Google Scholar] [CrossRef]
  40. Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks Are Universal Approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
  41. Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the ICML, Haifa, Israel, 21–24 June 2010; pp. 807–814. Available online: https://www.cs.toronto.edu/~fritz/absps/reluICML.pdf (accessed on 3 January 2026).
  42. Li, Z.; Babuska, R.; Della Santina, C.; Kober, J. Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning. arXiv 2025, arXiv:2502.07645. [Google Scholar] [CrossRef]
  43. Niculescu, S.-I. Delay Effects on Stability: A Robust Control Approach; Lecture Notes in Control and Information Sciences; Springer: Berlin/Heidelberg, Germany, 2001; Volume 269, ISBN 978-1-84628-553-0. [Google Scholar]
  44. Li, Z.; Yan, H.; He, Y.; Park, J.H.; Peng, Y. Stability analysis of linear systems with time-varying delay via intermediate polynomial-based functions. Automatica 2020, 113, 108756. [Google Scholar] [CrossRef]
  45. Park, J.M.; Park, P.G. Finite-interval quadratic polynomial inequalities and their application to time-delay systems. J. Franklin Inst. 2020, 357, 4316–4327. [Google Scholar] [CrossRef]
Figure 1. Formation geometry. A leader follows a reference trajectory while each follower maintains a rotated offset $R[k]\, d_i$.
Figure 2. Representative 3D swarm trajectories under the evaluated control strategies. Each simulation spanned five laps of the leader's trajectory.
Figure 3. Internal formation error for all control strategies.
Figure 4. Global XY offset for all control strategies.
Figure 5. Empirical stability probability $\rho$ as a function of maximum delay for different swarm sizes and inter-agent spacings.
Figure 6. Regression of the delay–stability predictive model for the three IL variants.
Table 1. Comparison of imitation learning variants evaluated in this work.
Controller | Delay in Training | Delay in Evaluation | Dataset Size
IL (no delay) | No | No | Same
IL (non-robust) | No | Yes | Same
IL (delay-robust) | Yes (random $\tau$) | Yes | Same
Table 2. Simulation and ARD-PF parameters used throughout the experiments.
Parameter | Value | Description
$dt$ | 0.1 s | Simulation sampling interval
$v_{\max}$ | 1.0 m/s | Maximum UAV velocity
$H$ | 3 | Observation history length
$K_{\mathrm{att}}$ | 2.0 | Attractive potential gain
$K_{\mathrm{rep}}$ | 2.0 | Repulsive potential gain
$\varepsilon$ | $10^{-2}$ | Repulsion smoothing constant
$d_{\mathrm{sep}}$ | {3, 5, 8} m | Nominal inter-agent spacing
$N$ | {5, 7, 10} | Total number of UAVs
$\tau_{\max}$ | 0.1–15 s | Maximum communication delay
Table 3. Training parameters for imitation learning controllers.
Parameter | Value | Description
Training epochs | 150 | Number of supervised training epochs
Batch size | 256 | Mini-batch size
Learning rate | $10^{-3}$ | Adam learning rate
Random seeds | 10 | Independent seeds
Dataset size | $1.2 \times 10^{5}$ | State–action pairs
Delay augmentation | Uniform $[0, \tau_{\max}]$ | Robust IL only
Network architecture | 2 × 64 ReLU | MLP hidden layers
Table 4. Empirical stability outcomes based on multi-run evaluation ($R = 10$ runs per configuration). A configuration is labeled Stable if at least 80% of runs satisfied the stability criteria of Section 5.4.1.
$d_{\mathrm{sep}}$ (m) | N | $\tau_{\max}$ (s) | ARD-PF | ARD-PF (Delay) | IL (No Delay) | IL (Delay) | IL (Delay-Robust)
3 | 5 | 0.1–2 | Stable | Stable | Stable | Stable | Stable
3 | 5 | 5 | Stable | Stable | Stable | Unstable | Stable
3 | 5 | 10–15 | Stable | Unstable | Stable | Unstable | Stable
3 | 7 | 0.1–2 | Stable | Stable | Stable | Stable | Stable
3 | 7 | 5 | Stable | Stable | Stable | Unstable | Stable
3 | 7 | 10–15 | Stable | Unstable | Stable | Unstable | Stable
3 | 10 | 0.1–0.5 | Stable | Stable | Stable | Unstable | Unstable
3 | 10 | 1–5 | Stable | Stable | Unstable | Unstable | Stable
3 | 10 | 10–15 | Stable | Stable | Unstable | Unstable | Unstable
5 | 5 | 0.1–2 | Stable | Stable | Stable | Stable | Stable
5 | 5 | 5 | Stable | Unstable | Stable | Unstable | Stable
5 | 5 | 10–15 | Unstable | Unstable | Stable | Unstable | Stable
5 | 7 | 0.1–0.5 | Stable | Stable | Stable | Stable | Stable
5 | 7 | 1–5 | Stable | Stable | Unstable | Unstable | Stable
5 | 7 | 10–15 | Unstable | Unstable | Unstable | Unstable | Stable
5 | 10 | 0.1–2 | Stable | Unstable | Unstable | Unstable | Unstable
5 | 10 | 5–15 | Stable | Unstable | Unstable | Unstable | Unstable
8 | 5–10 | all | Mostly unstable | Mostly unstable | Mostly unstable | Mostly unstable | Mostly unstable
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
