Bio-Inspired Fission–Fusion Control and Planning of Unmanned Aerial Vehicles Swarm Systems via Reinforcement Learning

Swarm control of unmanned aerial vehicles (UAVs) has emerged as a challenging research area, primarily because of conflicting behaviors among individual UAVs and external disturbances acting on the motion of UAV swarms. However, limited attention has been paid to the fission–fusion motion of UAV swarms facing unknown dynamic obstacles, as opposed to static ones. This paper presents a Bio-inspired Fission–Fusion control and planning via Reinforcement Learning (BiFRL) algorithm for UAV swarm systems, which tackles fission–fusion behavior in the presence of dynamic obstacles with homing capabilities. First, we establish the kinematic models for the UAV and the swarm controller, and then propose a probabilistic starling-inspired topological interaction that achieves reduced communication overhead and faster local convergence. Next, we develop a self-organized fission–fusion control framework and a fission decision algorithm. When dealing with various situations, the swarm can autonomously reconfigure itself by fissioning an optimal number of agents to fulfill the corresponding tasks. Finally, we design a sub-swarm confrontation algorithm for path planning optimized by reinforcement learning, with which the sub-swarm can engage dynamic obstacles while minimizing energy expenditure. Simulation experiments demonstrate the capability of the UAV swarm system to accomplish self-organized fission–fusion control and planning under different interference scenarios. Moreover, the proposed BiFRL algorithm successfully handles adversarial motion with dynamic obstacles and effectively safeguards the parent swarm.


Introduction
UAV swarm systems exhibit collective behavior among drones and their interaction with the environment [1]. Nature frequently serves as a source of inspiration for the development of these systems, drawing from phenomena like insect movement [2], bird flocks [3], and fish schools [4,5]. The motion of such swarms is inherently dynamic, with the group size and composition undergoing frequent changes as individual members split or merge, referred to as "fission-fusion" behavior [6][7][8]. Fission-fusion of the swarm holds significant importance in the natural world for various animal species. For instance, tree-dwelling bats rely on shared survival space within the group [9], while bird flocks enhance their chances of survival [10], and herds of bison and giraffes evade predators [11].
Recently, researchers from diverse fields have increasingly recognized the significance of fission-fusion behaviors. They have successfully emulated these behaviors to achieve robot-controlled motion [12][13][14], enhance the efficiency of swarm resource search [15], and accomplish planning objectives, such as task allocation and obstacle avoidance [1,16,17]. For example, Wang et al. [13] incorporate the fission-fusion motion into the formation controller of an underwater vehicle, Nauta et al. [15] study population resource search with the fission-fusion concept, and Reséndiz-Benhumea et al. [16] integrate the fission-fusion movements observed in ant colonies into the task assignment algorithm for robot swarms. As such, the investigation of fission-fusion behavior in swarm systems holds substantial value for both theoretical research and practical application.
Swarm fission-fusion behavior involves the ability of individuals within a certain range to gather into a cohesive group with synchronized movement through specific interactions. Reynolds et al. [18] propose three fundamental behavioral rules to address this issue: collision avoidance, speed consistency, and mutual aggregation. Subsequently, extensive research has been conducted based on these rules for implementing swarm controls [19][20][21] and exploring swarm fission-fusion dynamics [17,22–24]. Nevertheless, most of these studies focus on collision avoidance for static obstacles within a single swarm or during the fission-fusion process in a known environment with complete knowledge. There has been limited research on swarm planning in the presence of unknown dynamic obstacles, particularly those with tracking capabilities. This is primarily because existing path planning strategies for static or dynamic obstacles often rely on global environment information [25][26][27][28]. For instance, Wang et al. [26] perform path planning through a sampling-based algorithm, while Garrett et al. [27] propose to address the continuous space subproblems and integrate the discrete and continuous aspects of the search process to achieve path planning objectives. In the real world, however, many dynamic obstacles remain unknown or possess tracking capabilities, continually intruding upon and pursuing the targeted swarm to disrupt its progress. Previous methods [22][23][24][25] for swarm control and planning may therefore suffer from a limited success rate and poor resource utilization in reaching the intended destination.
Typical swarm controllers incorporate fixed-neighbor distance interactions for fission-fusion motions [1,22,29–31]. However, this manner of interaction imposes a communication load that increases with the swarm size, making it infeasible for large-scale swarms [25,29–31]. In practice, large-scale swarms are usually achieved through limited interactions among a subset of individuals, enabling the formation of expansive swarms [32][33][34]. Notably, studies have shown that each starling in a flock interacts with only six to seven of its closest neighbors, allowing swarms of thousands of individuals to form through topological interaction structures [32]. As such, numerous researchers have investigated the starling topology and its utilization in swarms [35][36][37]. However, in the context of UAV swarms, the reliance on a seven-nearest-neighbor topological interaction model often leads to local convergence, as shown in Figure 1. Maintaining swarm integrity without local convergence while reducing the communication load among UAV swarm members therefore remains an open problem.
Based on the above discussion, this paper investigates the bio-inspired fission-fusion control and planning of UAV swarm systems by enhancing the starling topology and incorporating a reinforcement learning algorithm. The proposed approach, referred to as Bio-inspired Fission-Fusion Control and Planning via Reinforcement Learning (BiFRL), comprises several key components. First, we introduce a probabilistic starling-inspired topological interaction (PSTI) structure, which departs from the traditional fixed-range neighbor interactions in swarm motion while effectively reducing the communication load and the likelihood of local convergence. Next, a self-organized fission-fusion control framework and a fission decision algorithm are developed, which enable the swarm to autonomously divide into sub-swarms when encountering dynamic obstacles and allow for precise control of sub-swarm units. Then, we propose sub-swarm confrontation via reinforcement learning (SCRL), which efficiently plans the confrontation movement with minimal energy loss. As a result, the original objectives of the parent swarm remain unaffected by dynamic obstacles, while the sub-swarms can seamlessly reintegrate into the parent swarm once the interference ceases. Finally, the effectiveness and robustness of the proposed BiFRL algorithm for UAV swarms are validated through simulations that consider dynamic obstacles approaching from different directions. The present study provides the following contributions:
• We introduce a probabilistic starling-inspired topological interaction structure, which effectively reduces the communication load and the likelihood of local convergence.
• A self-organized fission-fusion control framework for UAV swarms is presented that extends the existing social force model, where a fission decision algorithm is designed to manipulate the composition of sub-swarms and target the dynamic obstacles.
• We develop a reinforcement learning sub-swarm confrontation algorithm to achieve a self-organized sub-swarm against dynamic obstacles in unknown environments, which significantly improves the adaptability of UAV swarms.
• The feasibility and validity of the proposed method are demonstrated through extensive numerical simulations, accompanied by the development of several numerical evaluation indicators.
The subsequent sections of this paper are structured as follows: In Section 2, we present the formulation of the problems and a review of existing models pertaining to fission-fusion control and planning in UAV swarms. Section 3 presents the proposed BiFRL framework and its components in detail. We then conduct the simulation experiments to analyze the proposed method in Section 4. Finally, the findings and conclusions of this study are presented in Section 5.

Unmanned Aerial Vehicles Kinematic Model
Assume that each UAV is equipped with a tri-loop autopilot for velocity, heading angle, and altitude control. The kinematic model of the UAV can then be simplified as follows [38]:

where the model involves the rate and control input command of the horizontal velocity; $\psi_i^{\varpi}$ and $\Psi$, the rate and control input command of the heading angle; and $\lambda_i^{\varpi}$, the altitude change rate, together with the altitude control input command of agent $i$ in $\varpi$. $\alpha_\psi$, $\alpha_v$, $\alpha_h$, and $\alpha_\lambda$ are the autopilot control parameters. The following UAV flight condition constraints are taken into account:

where $v_{\min}$ and $v_{\max}$ are the minimum and maximum horizontal speeds, respectively; $\phi_{\max}$ is the maximum lateral overload; $g$ is the gravitational acceleration; and $\lambda_{\min}$ and $\lambda_{\max}$ are the minimum and maximum altitude change rates, respectively, all of which are greater than zero.

Dynamic Obstacle Movement Model
In this study, a dynamic obstacle with a wide sensing range is considered, which selects the UAVs in its proximity as tracking targets. The kinematic equation governing the dynamic obstacle is formulated as follows:

where $\mathbf{x}_{invader} = (x_{invader}, y_{invader}, h_{invader}) \in \mathbb{R}^n$ represents the three-dimensional position coordinates of the dynamic obstacle; $v_{invader} \in \mathbb{R}^n$ and $u_{invader} \in \mathbb{R}^n$ are the velocity vector and control input of the dynamic obstacle; $\kappa^{aut}_{invader}$ and $\kappa_{invader}$ are the inertia coefficient and tracking factor of the dynamic obstacle; and $-\delta_{invader}\|v_{invader}\|^2 v_{invader}$ is the frictional force generated by the interaction between the dynamic obstacle and its surrounding environment.

Traditional Unmanned Aerial Vehicles Swarm Dynamics Model
In this study, two swarm systems, $\varpi \in [\text{sub-swarm}, \text{parent-swarm}]$, comprising a total of $N$ agents are examined. The swarms traverse a three-dimensional space without any imposed boundary constraints. The two swarms are equivalent and autonomous, each adhering to its own set of swarm movement rules. The motion of each agent is governed by the following double integrator:

where $\mathbf{x}_i^{\varpi} = (x_i^{\varpi}, y_i^{\varpi}, h_i^{\varpi}) \in \mathbb{R}^3$ is the position of UAV $i$ in $\varpi$; $v_i^{\varpi} \in \mathbb{R}^3$ is the velocity vector; $u_i^{\varpi} \in \mathbb{R}^3$ denotes the control input acting on the agent; $m_i^{\varpi}$ is the mass; $-\xi\|v_i^{\varpi}\|^2 v_i^{\varpi}$ is the air friction; and $\xi$ is the air damping factor.

Appl. Sci. 2024, 14, 1192

The majority of internal interactions between agents adhere to the cohesion, alignment [22], and separation rules [21]. A prototypical implementation is outlined as follows:

where the control input is composed of the velocity alignment term, the navigation function, and the position cooperation term, and $\gamma_{ine}$ is the inertia coefficient. The position cooperation term is defined as follows:

where $\Gamma_{pos}$ is the coefficient of position cooperation; $l_a^{\varpi}$ and $l_c^{\varpi}$ are the desired agent spacing and the motion attenuation factor; $d_{ij}^{\varpi}$ is the distance between two agents; $N_i^{\varpi}$ is the number of agents interacting with agent $i$ in $\varpi$; and $N_i^{\varpi}(t)$ is the set of neighbors in $\varpi$ interacting with agent $i$ at time $t$.
The velocity alignment term is typically defined as follows:

where $\Gamma_{vel}$ is the velocity alignment coefficient. Equation (8) represents the widely recognized "velocity consensus" algorithm, which allows the swarm to quickly reach a common velocity and sustain a homogeneous state [18]. The ability of a swarm using Equation (8) to perform self-organized fission movements when only a few agents in the swarm detect an obstacle has been investigated; however, the findings revealed that the swarm was unable to execute self-organized fission movements in such a scenario [34]. To address this limitation, Yang et al. [17] introduced an intermittent selective mechanism that allows swarms encountering static obstacles to engage in fission-fusion motion. However, when confronted with dynamic obstacles possessing tracking capabilities or uncertain interference directions, the UAV swarm often struggles to complete its flight with optimal resource utilization. Furthermore, such situations may severely disrupt the normal movement of the swarm. The effects of dynamic obstacles on traditional swarm motion are illustrated in Figure 2.
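As a rough illustration of the velocity consensus idea behind Equation (8), the following sketch nudges each agent's velocity toward the average velocity of its interaction neighbors. The averaged-difference form, the function name `velocity_consensus`, and the gain `gamma_vel` are assumptions made for illustration; they are not taken from the exact expression of the paper.

```python
from typing import Dict, List, Sequence

Vec3 = List[float]


def velocity_consensus(velocities: Sequence[Vec3],
                       neighbors: Dict[int, List[int]],
                       gamma_vel: float = 0.5) -> List[Vec3]:
    """Return one alignment control term per agent (assumed form).

    u_i = gamma_vel / |N_i| * sum_{j in N_i} (v_j - v_i)
    """
    controls = []
    for i, v_i in enumerate(velocities):
        nbrs = neighbors.get(i, [])
        u = [0.0, 0.0, 0.0]
        for j in nbrs:
            for k in range(3):
                u[k] += velocities[j][k] - v_i[k]
        if nbrs:
            u = [gamma_vel * c / len(nbrs) for c in u]
        controls.append(u)
    return controls


if __name__ == "__main__":
    v = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    nbrs = {0: [1], 1: [0]}
    dt = 0.1
    for _ in range(200):  # simple Euler integration of dv/dt = u
        u = velocity_consensus(v, nbrs)
        v = [[a + dt * b for a, b in zip(vi, ui)] for vi, ui in zip(v, u)]
    print(v)  # both velocities approach the common mean (0.5, 0.5, 0.0)
```

Because every agent is pulled toward its neighbors, a rule of this kind tends to keep the swarm together, which is consistent with the observation that it does not produce fission on its own.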
Hence, it is of great interest to develop a new control algorithm that facilitates the self-organized fission-fusion of UAV swarms in three-dimensional space while maintaining a lightweight structure and low communication costs. In this study, we propose a probabilistic starling-inspired topological interaction approach, which enables the UAV swarm to execute fission-fusion motions while minimizing communication requirements and reducing local convergence among swarm members. Additionally, we propose a sub-swarm confrontation algorithm based on reinforcement learning that is designed specifically for unknown dynamic obstacles with tracking capabilities. This algorithm allows the sub-swarm to self-organize and effectively confront unknown dynamic obstacles with minimal energy loss while simultaneously safeguarding the parent swarm from the disruptive influence of these obstacles.

Conversion Relations between Swarm Controller and Kinematic Model
The swarm control input $u_i^{\varpi}$ is converted into the autopilot control input of the UAV as follows:

where $u_{x_i}^{\varpi}$ and $u_{y_i}^{\varpi}$ are the swarm control inputs in the horizontal directions and $u_{h_i}^{\varpi}$ is the control input in the height direction.
The output values from the UAV kinematic model can be translated into vectors denoting both the position and velocity. These vectors serve as the inputs for the UAV swarm controller, as outlined below:

Bio-Inspired Fission-Fusion Control and Planning via Reinforcement Learning Algorithm
The proposed BiFRL algorithm consists of four parts. First, we propose a probabilistic starling-inspired topological interaction that provides a starling-like communication structure for UAV swarms and avoids local convergence. Second, in the SFCRL algorithm, we develop a self-organized fission-fusion control framework tailored for UAV swarms. On this basis, we then propose a fission decision algorithm and a sub-swarm confrontation via reinforcement learning algorithm, which together realize fission-fusion swarm movement with a controllable number of sub-swarm agents and confrontation movement against unknown dynamic obstacles.

Probabilistic Starling-Inspired Topological Interaction
In practical scenarios, interaction structures that rely on fixed-distance neighbors often incur large communication overheads, which hinders large-scale UAV swarming in real-world applications. To address this issue, we present a probabilistic starling-inspired topological interaction approach, given in Algorithm 1, as an alternative to the existing seven-nearest-neighbor topological interaction structure. It draws inspiration from the topological interaction observed in starling flocks and mitigates the communication burden associated with fixed-distance neighbors while reducing the occurrence of local convergence. It introduces a probabilistic decision model that is triggered when the UAV swarm converges locally: if two groups are identified as locally convergent, the UAVs in each group interact with UAVs that are farther away from them, which forms the new topology.
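The idea of mixing nearest-neighbor topology with an occasional link to a farther agent can be sketched as follows. This is a minimal sketch and not Algorithm 1 itself: the function name `psti_neighbors`, the probability `p_far`, and the rule of swapping only the last neighbor slot are assumptions chosen for illustration.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def psti_neighbors(positions: Sequence[Point], i: int, k: int = 7,
                   p_far: float = 0.2,
                   rng: Optional[random.Random] = None) -> List[int]:
    """Pick k interaction partners for agent i (illustrative rule).

    Start from the k nearest agents (topological rule) and, with
    probability p_far, swap the last slot for a randomly chosen farther
    agent so that separated clusters can still be linked.
    """
    rng = rng or random.Random()
    others = sorted((j for j in range(len(positions)) if j != i),
                    key=lambda j: math.dist(positions[i], positions[j]))
    chosen, farther = others[:k], others[k:]
    if farther and rng.random() < p_far:
        chosen[-1] = rng.choice(farther)
    return chosen


if __name__ == "__main__":
    pts = [(float(n), 0.0, 0.0) for n in range(12)]
    # With p_far = 0 the rule reduces to plain seven-nearest-neighbor selection.
    print(psti_neighbors(pts, 0, k=7, p_far=0.0))  # [1, 2, 3, 4, 5, 6, 7]
```

Each agent still talks to only `k` partners, so the per-agent communication cost stays fixed regardless of swarm size; the occasional long-range link is what gives isolated groups a chance to merge.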

Self-Organized Fission-Fusion Control Framework
The fission-fusion dynamics within a swarm involve two contrasting and competing behaviors. The fusion behavior requires all individuals to form a collectively coordinated ensemble, while the fission behavior requires a disruption of the original order, giving rise to distinct smaller sub-swarms [5,19]. To account for the influence of dynamic obstacles, we integrate intrusive forces into the self-organized fission-fusion control algorithm, as delineated below:

where $\gamma_{ine} v_i^{\varpi}$ represents the inertial term associated with the velocity of the UAV, and $\zeta\vartheta_i^{\varpi}$ is the stochastic disturbance term of the UAV. To address the interference posed by dynamic obstacles, we extend the velocity coordination term as follows:

where $u_{lur}$ represents the attractive force generated when a sub-swarm identifies dynamic obstacles, and $u_{tra}$ is the capture force generated when a sub-swarm engages in capturing dynamic obstacles; the specific mechanism behind the capture force is analyzed in Section 3.4. $\wp_j^{\varpi}$ is the state of the agent. We also take into consideration the impact of dynamic obstacles on the sub-swarm and expand the position cooperation term based on Equation (7). The position cooperation term is defined as follows:

where $\varepsilon_{invader}$ is the position interference coefficient of the dynamic obstacle, and $d_{i,invader}^{\varpi}$ represents the distance between the dynamic obstacle and agent $i$. According to Equation (14), the closer the sub-swarm is to the dynamic obstacle, the farther apart its UAVs move from each other, which prevents the agents from clustering so tightly that they become collectively vulnerable to attack.

Fission Decision Algorithm
Algorithm 1 depicts how the interaction topology of the entire swarm is established through a limited number of interactions among agents. We further illustrate the sub-swarm selection mechanism employed by the swarm when encountering an obstacle, as shown in Algorithm 2. Under the influence of topological interactions, the algorithm selectively organizes the sub-swarm based on the interference direction of the dynamic obstacle while keeping the number of agents in the sub-swarm controllable.
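One simple way to picture a direction-based selection with a controllable sub-swarm size is to rank agents by how far they lie toward the obstacle and take the first few. The sketch below is only an assumed stand-in for Algorithm 2: the name `select_sub_swarm`, the centroid-based projection, and the parameter `n_sub` are illustrative choices, not the paper's procedure.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def select_sub_swarm(positions: Sequence[Point], obstacle: Point,
                     n_sub: int) -> List[int]:
    """Return indices of the n_sub agents lying furthest toward the obstacle."""
    n = len(positions)
    centroid = tuple(sum(p[k] for p in positions) / n for k in range(3))
    direction = tuple(obstacle[k] - centroid[k] for k in range(3))

    def projection(idx: int) -> float:
        # Unnormalized projection of the agent offset onto the
        # centroid-to-obstacle direction; ranking does not need the norm.
        return sum((positions[idx][k] - centroid[k]) * direction[k]
                   for k in range(3))

    ranked = sorted(range(n), key=projection, reverse=True)
    return sorted(ranked[:n_sub])


if __name__ == "__main__":
    swarm = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(select_sub_swarm(swarm, (10.0, 0.0, 0.0), 2))   # [2, 3]
    print(select_sub_swarm(swarm, (-10.0, 0.0, 0.0), 2))  # [0, 1]
```

The same swarm yields a different sub-swarm depending on where the interference comes from, while `n_sub` fixes how many agents are split off.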

Sub-Swarm Confrontation via Reinforcement Learning
After splitting off the sub-swarms, we formulate the sub-swarm path planning problem as a Markov decision process (MDP). An MDP provides a mathematical framework for modeling sequential decision-making problems where an agent must choose actions in a sequence to achieve a desired goal. The MDP can be denoted as a 4-tuple $\langle S, A, P, R \rangle$, where $S$ is the set of states $s_t$, $A$ is the set of available actions $a_t$ of the UAV, $P$ is the transition probability distribution, and $R: S \times A \to \mathbb{R}$ is the reward function.
In time slot $t$, the state of the environment is denoted as $S_t = \{c_{ps}, c_{ss}, c_e, c_t\}$, whose elements are the coordinates of the parent swarm, sub-swarm, dynamic obstacle, and target, respectively. The action of a UAV is written as $A_t = \{v_t, \theta_t\}$, denoting the flying speed and direction of the UAV, respectively.
The reward function in time slot $t$ comprises three parts. The first term, $r_{s2e}$, is a reward for the distance between the sub-swarm and the dynamic obstacle, defined in terms of the safe distance and the capture distance between the sub-swarm and the dynamic obstacle. This term is parameterized through $\sigma$ and $\alpha$.
The second term, $r_{s2t}$, is a reward for the distance between the sub-swarm and the target, given by

$$r_{s2t} = -\beta d_{s2t} + c_1 \tag{17}$$

where $d_{s2t}$ is the distance between the sub-swarm and the target, and $\beta$ and $c_1$ are adjustable parameters. The third term, $r_{e2p}$, is a reward for the distance between the dynamic obstacle and the parent swarm.
To address the path planning problem of the sub-swarm, we propose an RL-based algorithm. Reinforcement learning is an effective approach for tackling sequential decision problems in MDPs, aiming to maximize the cumulative reward within an episode. Among various RL algorithms, the proximal policy optimization (PPO) algorithm stands out as one of the most efficient methods for policy optimization. Another notable algorithm is the off-policy soft actor-critic (SAC) algorithm, which demonstrates high sample efficiency. Additionally, deep Q-learning (DQN) is a widely used RL method, which, however, is limited to discrete action spaces. In this study, we adopt the SAC algorithm to realize the confrontation motion of the sub-swarm against dynamic obstacles.
The concept of swarm motion order proposed by Vicsek [38] is typically assessed using an order parameter. In this section, we employ a set of quantitative metrics as order parameters to analyze the fission-fusion motion of the UAV swarm.
Polarization Index. $\phi \in [0, 1]$ denotes the degree to which all drones tend to move in the same direction at a given moment. Given the introduction of stochastic disturbances, we establish a benchmark $\phi_{flock}$ for the polarization index. A value above $\phi_{flock}$ indicates the formation of a stable swarm.

where $v_i^{\varpi} \in \mathbb{R}^3$ is the velocity vector.
Differentiation Index. The polarization index alone does not adequately capture the movement characteristics of the swarm during sub-swarm movements. To address this limitation, we incorporate the differentiation index [17] to assess the velocity variation among agents within the swarm, defined as follows:

where $\Gamma$ and $\hbar$ indicate the skewness and kurtosis of the velocity distribution of the UAV swarm, respectively. The differentiation index satisfies $\lambda \in [0, 1]$; $\lambda = 1$ corresponds to two independent Bernoulli distributions and signifies the complete fission of velocities. In this study, we determine that two independent swarms have been formed when $\lambda > 0.9$.
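For the polarization index, a common choice in the flocking literature is the norm of the mean unit-velocity vector (the Vicsek-style order parameter). The sketch below assumes that standard form; the function name `polarization_index` and the handling of zero-speed agents are illustrative assumptions.

```python
import math
from typing import Sequence


def polarization_index(velocities: Sequence[Sequence[float]]) -> float:
    """Norm of the mean unit-velocity vector; 1 = aligned, 0 = disordered."""
    total = [0.0, 0.0, 0.0]
    count = 0
    for v in velocities:
        speed = math.sqrt(sum(c * c for c in v))
        if speed == 0.0:
            continue  # a hovering agent has no heading; skip it
        for k in range(3):
            total[k] += v[k] / speed
        count += 1
    if count == 0:
        return 0.0
    return math.sqrt(sum(c * c for c in total)) / count


if __name__ == "__main__":
    aligned = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    opposed = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
    print(polarization_index(aligned))  # 1.0
    print(polarization_index(opposed))  # 0.0
```

Two sub-swarms flying in opposite directions give a low value even though each is internally ordered, which is why a second index is needed to detect fission.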

Performance Evaluation Index
To assess the effectiveness of the swarm in resisting the interference from dynamic obstacles, this section proposes the precision of stimuli as an indicator of the swarm's resilience to interference. Additionally, we define a communication load to quantify the communication pressure.
Precision of Stimuli. The precision of stimuli denotes the proximity of the swarm's motion direction to that of the dynamic obstacles. $\Lambda = 0$ signifies a lack of interference, as the motion directions are dissimilar. Conversely, $\Lambda = 1$ indicates severe interference, as the motion directions are entirely congruent. The definition is as follows:

where $r_i$, $r_{att}$, and $r_o$ signify the unit velocity directions of UAV $i$, the swarm, and the dynamic obstacles, respectively.
Communication Load. This term represents the average communication cost incurred by the UAV swarm, where a lower communication load indicates a reduced average communication cost. It is defined as follows:

where $N_{il}^{\varpi}$ represents the number of UAVs interacting with UAV $i$.
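Reading the communication load as the mean number of interaction partners per UAV, it can be computed directly from the interaction topology. The sketch below assumes that reading; `communication_load` and the dictionary-based neighbor map are hypothetical names introduced for illustration.

```python
from typing import Dict, List


def communication_load(neighbors: Dict[int, List[int]]) -> float:
    """Average number of interaction partners per agent (assumed definition)."""
    if not neighbors:
        return 0.0
    return sum(len(partners) for partners in neighbors.values()) / len(neighbors)


if __name__ == "__main__":
    # Agent 0 talks to two agents; agents 1 and 2 each talk to one.
    topology = {0: [1, 2], 1: [0], 2: [0]}
    print(communication_load(topology))
```

Under this reading, a fixed-distance rule makes the load grow with local density, whereas a topological rule with `k` partners caps it at `k`.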

Detailed Simulation Parameters
Tables 1 and 2 show the detailed simulation parameters of the experiments and of the RL-based path planning of the sub-swarm, respectively. The parameter lists are provided so that researchers can reproduce the methodology of this study more quickly. The parameters in the tables can be modified according to different research situations, which does not affect the validity of the proposed algorithm.

Simulation Results Analysis
In this section, we assess the ability of the UAV swarm to achieve fission-fusion and perform confrontation movements, ensuring that the parent swarm movement remains unaffected during anti-confrontation scenarios.

Evaluation of the Probabilistic Starling-Inspired Topological Structure
For the validation of the PSTI structure, we conducted multiple tests with varying birth ranges for the agents (ranging from 4 to 13). Each test consisted of 1000 trials, with 130 iterations per trial. Figure 3 displays the probabilities of generating a stable swarm using the previous seven-nearest-neighbor structure and our proposed PSTI structure, respectively, after randomly generating 20 agents at different birth ranges.
From Figure 3, it is observed that both methods achieve high swarming rates within a small range. However, as the range increases, the seven-nearest-neighbor topological interaction model gradually exhibits local convergence problems. Even at a range of 15, the probability of local convergence reduces to 50 percent. Conversely, the proposed PSTI structure consistently produces superior results across all tested ranges, which demonstrates its ability to enhance the swarming efficiency of UAV swarms while reducing the occurrence of local convergence. It is worth noting that the PSTI structure does exhibit some instances of local convergence due to the random characteristics of the incorporated probability coefficients. However, by balancing the probability factors, our proposed approach can maintain a low communication load while achieving superior swarming performance.

Simulation of the Sub-Swarm Confrontation via Reinforcement Learning Algorithm
We build a simulation environment that provides abundant interactions between the swarm and the dynamic obstacle. The mission area is assumed to be 100 × 100. The coordinates of the parent swarm, sub-swarm, and dynamic obstacle are generated by the planning algorithms mentioned above. We use the center point of the sub-swarm to optimize its trajectory, and the sub-swarm flies at the constant altitude of its initial height. When an episode starts, the coordinates of both swarms and the dynamic obstacle are generated. The episode ends once the parent swarm is captured by the dynamic obstacle, i.e., when $d_{e2p} \le d_{e2p}^{\min}$, or once the sub-swarm has run for more than $T = 28$ steps. The maximal speeds of the parent swarm, sub-swarm, and dynamic obstacle are $v_{p}^{\max} = 1$, $v_{sub}^{\max} = 2$, and $v_{e}^{\max} = 1.5$, respectively.
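The episode settings above can be collected in a small configuration object together with the termination test. This is a sketch under stated assumptions: `EpisodeConfig` and `episode_done` are hypothetical names, positions are treated as planar coordinates, and the value of `d_min_e2p` is a placeholder because the text does not give it.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EpisodeConfig:
    area_size: float = 100.0      # mission area is 100 x 100
    max_steps: int = 28           # T
    v_max_parent: float = 1.0     # maximal speed of the parent swarm
    v_max_sub: float = 2.0        # maximal speed of the sub-swarm
    v_max_obstacle: float = 1.5   # maximal speed of the dynamic obstacle
    d_min_e2p: float = 1.0        # ASSUMED placeholder; not stated in the text


def episode_done(obstacle_xy: Sequence[float],
                 parent_xy: Sequence[float],
                 step: int,
                 cfg: EpisodeConfig = EpisodeConfig()) -> bool:
    """Return True when the parent swarm is captured or the step budget is exceeded."""
    d_e2p = math.dist(obstacle_xy, parent_xy)
    captured = d_e2p <= cfg.d_min_e2p
    timed_out = step > cfg.max_steps  # "more than T steps"
    return captured or timed_out
```

Keeping the two stopping conditions separate makes it easy to log which one ended an episode when inspecting training runs.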
We then evaluate the performance of RL-based algorithms in the simulation environment, including the PPO, SAC, and DQN methods. Because the DQN algorithm cannot handle a continuous action space, we discretize each action component of the sub-swarm (i.e., the flying speed $v_t$ and direction $\theta_t$) into 10 levels. The three algorithms share the same hyperparameters: $\gamma = 0.99$, a learning rate of $l_r = 0.0003$, and a network architecture of $N = 128 \times 128 \times 128$. Figure 4 shows the accumulated reward of these RL algorithms over training steps for the same network depth. The algorithms achieve similar accumulated rewards, while the agent based on the SAC algorithm converges faster and is as stable as the agent based on the PPO algorithm.
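The discretization used for DQN can be sketched as a mapping from a single action index to a (speed, direction) pair. The sketch assumes that speed is sampled uniformly in [0, v_max_sub], that direction is sampled uniformly in [-π, π), and that the two components are combined into 10 × 10 = 100 joint actions; none of these details are stated in the text, and `decode_action` is a hypothetical helper.

```python
import math
from typing import Tuple

N_LEVELS = 10      # levels per action component, as in the text
V_MAX_SUB = 2.0    # maximal sub-swarm speed, as in the text


def decode_action(index: int) -> Tuple[float, float]:
    """Map a discrete action index in [0, 100) to (speed, heading in radians)."""
    if not 0 <= index < N_LEVELS * N_LEVELS:
        raise ValueError(f"action index out of range: {index}")
    v_idx, theta_idx = divmod(index, N_LEVELS)
    # ASSUMPTION: evenly spaced speeds including both 0 and V_MAX_SUB.
    speed = V_MAX_SUB * v_idx / (N_LEVELS - 1)
    # ASSUMPTION: evenly spaced headings over one full turn, end point excluded.
    heading = -math.pi + 2.0 * math.pi * theta_idx / N_LEVELS
    return speed, heading
```

A coarser grid shrinks the Q-network's output layer but limits how finely the sub-swarm can steer, which is one reason continuous-action methods such as PPO and SAC are natural baselines here.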


Simulation of the Bio-Inspired Fission-Fusion Control and Planning via Reinforcement Learning Algorithm
Simulation results of swarm motion are depicted in Figure 6a-f. Figure 6a showcases the whole process of fission-fusion motion of the UAV swarm when encountering dynamic obstacles. This motion is achieved through our proposed BiFRL algorithm. The sub-swarm successfully executes the confrontation movement and seamlessly integrates back into the parent swarm upon completion. The point of origin for the dynamic obstacle is (40, 60, 30). When the swarm perceives the presence of dynamic obstacles, it initiates a self-organizing fission into two swarms: a parent swarm and a sub-swarm. The two swarms swiftly synchronize their positions. During the confrontation process, the sub-swarm effectively confronts the interference from dynamic obstacles using the SFCRL algorithm. Finally, once the sub-swarm's engagement is resolved, it autonomously converges back into the parent swarm, fusing into a stable and cohesive swarm.

Figure 6b illustrates the random initial positions of all agents. Figure 6c demonstrates the rapid formation of a stable swarm by intelligent agents from random initial positions. In Figure 6d, the swarm perceives the obstacle, leading to its self-organization and fission into two stable swarms. The confrontation motion against dynamic obstacles is executed based on the SFCRL algorithm. Figure 6e exhibits the sub-swarm completing its confrontation movement and initiating its return to the parent swarm after the dynamic obstacle stops tracking. Lastly, Figure 6f illustrates the self-organization of the sub-swarm as it returns to the parent swarm following the conclusion of the antagonistic movement, culminating in a fusion.
Appl. Sci. 2024, 14, x FOR PEER REVIEW 17 of 21

Evaluation of the Bio-Inspired Fission-Fusion Control and Planning via Reinforcement Learning Algorithm
Figure 7 depicts the temporal evolution of the polarization index of the UAV swarm combined with reinforcement learning. The polarization indices in Figure 7 demonstrate that the entire swarm completes the swarming within 10 s. After the fission campaign, both swarms remain strongly robust. Upon the sub-swarm's return to the parent swarm, there is a brief decrease in the polarization index, followed by stabilization around 1. Notably, the parent swarm is unaffected by the dynamic obstacle throughout the entire process.

The differentiation index shows that 20 randomly generated UAVs rapidly establish a stable swarm within the first 0-3 s after initialization, which persists until approximately 24 s. At around 27 s, the swarm undergoes rapid fission, forming two stable sub-swarms, while maintaining a constant differentiation index of approximately 1. Subsequently, at approximately 92 s, after completing the confrontation and returning to the parent swarm, the sub-swarm seamlessly fuses back into a stable swarm, thereby validating the effectiveness of our proposed algorithm.
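To make the "stabilization around 1" reading of the polarization index concrete, the sketch below uses the commonly used definition of polarization, the norm of the mean unit heading vector. This is an assumption: the index used for Figure 7 may differ in detail, and `polarization_index` is a hypothetical name.

```python
import numpy as np


def polarization_index(velocities: np.ndarray) -> float:
    """Norm of the mean unit heading; 1 means fully aligned, 0 means no net heading.

    `velocities` has shape (n_agents, dim). Agents with zero speed have no
    heading and are ignored.
    """
    speeds = np.linalg.norm(velocities, axis=1)
    moving = speeds > 0
    if not moving.any():
        return 0.0
    headings = velocities[moving] / speeds[moving, None]
    return float(np.linalg.norm(headings.mean(axis=0)))
```

Under this definition, a brief dip during fusion is expected: while the returning sub-swarm turns to match the parent swarm, the headings momentarily disagree, and the index recovers once they align.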
Figure 9 illustrates the precision of stimuli exhibited by both swarms under dynamic obstacle interference. Both swarms demonstrate commendable precision of response stimuli throughout the fission-fusion process. The precision of stimuli of the sub-swarm, influenced by the SFCRL algorithm, exhibits some variability; however, it consistently demonstrates a high level of accuracy. In contrast, the parent swarm experiences slight fluctuations in response stimuli when dynamic obstacles are detected. These fluctuations primarily arise from the need for the parent swarm to stabilize into a new formation during the fission-fusion process and are not directly caused by the dynamic obstacles themselves. Furthermore, the obstacles do not significantly impact the parent swarm for the remainder of the process.
Figure 10 presents the dynamics of communication load in UAV swarms employing different interaction structures. In this study, we compare the communication load of the PSTI structure with various fixed-distance communication structures. The simulation results demonstrate that, in the absence of fission-fusion, the probabilistic starling-inspired topological interaction reduces the communication load by 50% to 85% compared to the other structures. Even during the fission-fusion process, our proposed interaction structure continues to exhibit advantages. Conversely, as the fixed-distance interaction structure adopts smaller interaction distances, the likelihood of local convergence increases during the swarming process.
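The trade-off described above can be illustrated by counting directed interaction links under the two kinds of structure. This is a rough sketch under assumptions: the number of directed links is used as a proxy for communication load, which may not match the metric plotted in Figure 10; the topological case uses a plain k-nearest-neighbor rule without the probability coefficients of the PSTI structure; and `link_counts` is a hypothetical helper.

```python
import numpy as np
from typing import Tuple


def link_counts(points: np.ndarray, radius: float, k: int) -> Tuple[int, int]:
    """Return (fixed-distance links, topological links) as directed link counts."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # an agent does not link to itself
    # Fixed-distance structure: every ordered pair within `radius` is a link,
    # so the count grows with local density.
    fixed_distance_links = int((dist <= radius).sum())
    # Topological structure: each agent listens to at most k neighbors,
    # so the count is capped regardless of density.
    topological_links = n * min(k, n - 1)
    return fixed_distance_links, topological_links
```

In a dense swarm the fixed-distance count approaches n(n - 1), whereas the topological count stays at n·k; shrinking the radius lowers the first number but risks leaving groups of agents with no link between them.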

Conclusions
In recent years, researchers have shown significant interest in UAV swarms, particularly in self-organized fission-fusion control methods. However, dealing with dynamic obstacles that possess tracking capabilities remains a challenge in UAV swarm control. To tackle this issue, we present a bio-inspired fission-fusion control and planning algorithm for UAV swarm systems via reinforcement learning. In contrast to existing methodologies, our approach is targeted at resolving the interference posed by unknown dynamic obstacles while using fewer resources. The proposed self-organized fission-fusion framework facilitates autonomous grouping and separation in response to dynamic obstacles while maintaining control over the size of the sub-swarm. Additionally, we introduce sub-swarm confrontation via reinforcement learning to handle the selection of confrontation paths in the presence of unknown disturbance directions. After completing the confrontation, the sub-swarm seamlessly integrates back into the parent swarm through self-organization. Furthermore, we propose a probabilistic starling-inspired topological interaction structure, which effectively mitigates the swarm local convergence issue encountered by existing seven-nearest-neighbor algorithms. To validate the competitiveness of our approach, we conduct extensive simulations involving different swarm initial ranges and evaluate the communication load as a performance metric. The results demonstrate the effectiveness and feasibility of our proposed BiFRL algorithm, which combines reinforcement learning with UAV swarm fission-fusion control and planning to handle unknown dynamic obstacle disturbances. We believe that the proposed algorithm improves the efficiency of swarm control in dynamic environments and the ability of swarms to counter dynamic disturbances, and that it can benefit research in the multi-swarm field.
However, the resource terms considered in this study are limited, and more need to be taken into account, e.g., electromagnetic interference in the environment, the leakage rate during recognition of dynamic obstacles, and other factors. These issues will make the dynamic environment more complex but also more realistic. We will introduce more AI algorithms to try to solve such complex dynamic problems, which will, in turn, improve the robustness and applicability of the algorithms. We will perform in-depth sensitivity analyses for more complex environments to observe the robustness of the algorithms [39]. In future work, we will further investigate the interference of complex dynamic obstacles within the swarm fission-fusion approach and study the influence of specific parameters on the algorithm in greater depth.

Figure 1. Illustration of the interaction manners. (a) Comparison of the fixed-distance (left) and topological (right) interaction structures; (b) local convergence issue of the topological interaction structure.

Figure 2. Effects of the dynamic obstacles on typical swarm movement. The swarm avoids dynamic obstacles with tracking functions (a) by evasion only or (b) by a simple fission-fusion operation, in which case it either cannot reach or misses the target point; (c) by constant fission and meandering or (d) by continuous fission-fusion operations on a fixed path, both of which incur a large resource cost.
$$\begin{cases} \ldots\, d_{s2e}^{safe}), & d_{s2e} < d_{s2e}^{safe} \\ 0, & d_{s2e} \ge d_{s2e}^{safe} \end{cases} \qquad (16)$$
where $d_{s2e}$ is the distance between the sub-swarm and the dynamic obstacle,

e2p ( 18 )
where $d_{e2p}$ is the distance between the dynamic obstacle and the parent swarm, $d_{e2p}^{\min}$ is the minimum distance between the dynamic obstacle and the parent swarm, $\gamma$ and $c_2$ are parameters for adjustment, and $\mathrm{clip}\left(\log\frac{d_{e2p}}{d_{e2p}^{\min}} - 1,\ -1,\ 1\right)$ is a mathematical operation which removes the incentive for moving $\log\frac{d_{e2p}}{d_{e2p}^{\min}} - 1$ outside of the interval $[-1, 1]$. The parent swarm is captured by the dynamic obstacle if $d_{e2p} \le d_{e2p}^{\min}$.
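The effect of the clip operation can be checked numerically. The sketch below covers only the clipped term, not the full reward, and it assumes the reading in which the logarithm is applied to the distance ratio and 1 is then subtracted; `clipped_log_distance` is a hypothetical name.

```python
import math


def clipped_log_distance(d_e2p: float, d_min_e2p: float) -> float:
    """Return clip(log(d_e2p / d_min_e2p) - 1, -1, 1).

    ASSUMPTION: the logarithm applies to the ratio only, and the natural
    logarithm is used.
    """
    if d_e2p <= 0 or d_min_e2p <= 0:
        raise ValueError("distances must be positive")
    raw = math.log(d_e2p / d_min_e2p) - 1.0
    return max(-1.0, min(1.0, raw))
```

Under this reading the term sits at its lower bound of -1 at the capture distance and saturates at 1 once the obstacle is about e² (roughly 7.4) times the minimum distance away, so pushing the obstacle even farther earns nothing extra.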

Figure 4. Accumulated reward versus training steps of different RL algorithms.

Figure 5 illustrates the trajectory of the sub-swarm planned by using the PPO algorithm. The visualization demonstrates the efficient response of the sub-swarm to dynamic obstacles, considering various obstacle coordinates. The sub-swarm consistently approaches the target successfully by the end of each episode.

Figure 6. Self-organized fission-fusion process of UAV swarm. (a) indicates the complete UAV swarm fission-fusion movement, while (b-f) denote different steps within the fission-fusion process.


Figure 7. Polarization index of the UAV swarm.

Figure 8 illustrates the evolution of the differentiation index in the swarm, integrated with reinforcement learning.

Figure 8. Differentiation index of the UAV swarm.

Figure 9. Precision of stimuli of the UAV swarm.

Algorithm 3: Sub-swarm confrontation algorithm
Input: $c_t^{parent-swarm}$, $c_t^{sub-swarm}$, $c_t^{enemy}$, $c_t^{target}$
Output: $u_{lur}$
Function: Sub-swarm confrontation with dynamic obstacles
Initialize parameter vectors $\psi$, $\bar{\psi}$, $\theta$, $\phi$
Initialize replay buffer $D$
for each iteration do
    for each time step do
        Sample $a_t$ according to $a_t \sim \pi_\phi(a_t \mid s_t)$ from target network
        Observe $s_{t+1}$ according to $s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)$