A Collaborative Navigation Algorithm for Unmanned Aerial Vehicles Based on Joint Cognition and Risk Perception

Huang, Chenkang; Wei, Ruixuan; Jiang, Benqi; Wei, Pengfei; Zhang, Qirui

doi:10.3390/drones10030186

by Chenkang Huang ¹, Ruixuan Wei ², Benqi Jiang ¹, Pengfei Wei ¹ and Qirui Zhang ²,*

¹ Graduate School, Air Force Engineering University, Xi’an 710038, China

² Aviation Engineering School, Air Force Engineering University, Xi’an 710038, China

* Author to whom correspondence should be addressed.

Drones 2026, 10(3), 186; https://doi.org/10.3390/drones10030186

Submission received: 3 February 2026 / Revised: 5 March 2026 / Accepted: 6 March 2026 / Published: 9 March 2026

(This article belongs to the Section Artificial Intelligence in Drones (AID))

Highlights

What are the main findings?

A novel Joint Cognition and Risk Perception (JCRP) framework for multi-UAV cooperative navigation is proposed, which integrates sequential cooperation mechanisms, dynamic trust evaluation, and risk-aware path planning to address the conflict between prior cognition and real-time perception in dynamically unknown environments.
Experiments in both static and dynamic maze environments show that, compared with baseline methods, JCRP reduces the path length of follower UAVs by approximately 41.39% and improves the safe decision ratio by 10.9 percentage points. Real-world physical platform tests further validate its robustness under practical conditions.

What are the implications of the main finding?

The JCRP framework establishes a new paradigm for multi-UAV cooperative navigation in complex dynamic environments, and advances autonomous systems toward a better balance between safety and efficiency through cognitive transfer and risk-adaptive mechanisms.
The proposed framework and its physical verification demonstrate the feasibility of deployment in real-world scenarios including search and rescue, infrastructure inspection, and swarm logistics, providing support for the engineering application of multi-UAV cooperative technologies.

Abstract

Addressing the challenges of cooperative navigation for unmanned aerial vehicles (UAVs) in dynamic unknown environments, this paper proposes a collaborative method based on Joint Cognition and Risk Perception (JCRP). The method employs a sequential cooperative framework, where a pioneer UAV constructs a transferable environmental map, while successor UAVs integrate this prior knowledge with real-time perceptions to form a joint cognitive representation. A dynamic trust mechanism quantitatively evaluates cognitive reliability, enabling risk-aware path planning that balances safety and efficiency. Simulations and physical experiments demonstrate that JCRP reduces the path length of follower UAVs by approximately 41.39% and improves the safe decision ratio by 10.9 percentage points over baseline methods. These results validate the method’s robustness in complex scenarios, such as maze-like environments, highlighting its potential for applications in search-and-rescue.

Keywords: UAV; cooperative navigation; knowledge transfer; joint cognition; risk perception

1. Introduction

In recent years, Unmanned Aerial Vehicle (UAV) technology has witnessed rapid advancement, with applications expanding from open skies to complex indoor environments. This expansion demonstrates significant potential in critical domains such as search and rescue, infrastructure inspection, and logistics distribution [1,2,3]. These scenarios are typically characterized by labyrinthine properties, including intricate environmental structures, limited prior information, and frequent dynamic disturbances, which pose substantial challenges to UAV path planning in terms of autonomy, adaptability, and cooperative capabilities [4]. Within such dynamic and unknown settings, traditional planning methods that rely extensively on static and complete environmental knowledge exhibit inherent limitations due to their dependence on precise environmental models, thereby hindering practical deployment.

To enhance mission effectiveness in dynamic unknown environments, the research paradigm has evolved significantly from single-agent intelligence to collaborative swarm intelligence. While single-agent approaches such as hybrid GWODE [2], improved PSO [5], and ACO-VP [6] algorithms have addressed specific aspects of environmental uncertainty, their reliance on isolated decision-making suffers from inherent efficiency bottlenecks. A single UAV is constrained by limited perceptual range and computational resources, resulting in inefficient exploration in large-scale environments [7]. This fundamental limitation necessitates a paradigm shift toward multi-UAV cooperative systems.

The concept of multi-UAV cooperation encompasses a spectrum of collaborative paradigms designed to enhance overall system efficacy [8]. Among these, the sequential cooperation paradigm emerges as an efficient strategy that facilitates experience transfer and learning among UAV systems. This paradigm decouples exploration and exploitation processes along the temporal dimension: a pioneer UAV constructs an environmental cognitive map, while successor UAVs reuse this prior experience to achieve systemic performance improvements. However, the central challenge lies in enabling successor UAVs to develop a deep understanding and effective utilization of prior cognition, so that they ensure safety while attaining qualitative improvements in task efficiency rather than merely executing pre-defined paths.

To address these cooperative challenges, knowledge transfer methods have shifted research focus towards behavioral-level adaptation. Methods based on transfer learning, such as multi-target intention recognition [9], pigeon-inspired optimization [10], and neural network-based multitask optimization [11], essentially perform policy transfer: they migrate parametric decision rules or action strategies across tasks, so agents learn “how to act” without developing structured cognition of why the environment is as it is. While effective in static or similar environments, this paradigm exhibits fundamental limitations: the transferred knowledge remains a “black box” encoding input-output mappings rather than an interpretable environmental model, rendering agents incapable of reasoning about environmental dynamics or adapting to structural changes beyond the training distribution. Consequently, their adaptability diminishes under previously unencountered environmental changes, because insufficient environmental understanding limits generalization.

To achieve deeper environmental understanding and address the limitations of policy transfer, research has advanced into perception and data fusion techniques, giving rise to the paradigm of cognitive transfer. Unlike policy transfer, which centers on behavioral strategies, cognitive transfer prioritizes the migration of structured environmental representations. Its core goal is to equip agents with explicit cognitive models of the environment, enabling them to understand “what the environment is” rather than merely learning “how to act.” Existing studies have employed multi-model fusion and multi-sensor integration for trajectory correction and dynamic obstacle avoidance [12,13], as well as deep transfer learning frameworks for knowledge migration [14]. Despite establishing foundations for cognitive transfer, these approaches remain insufficient: they primarily focus on instantaneous perceptual data processing within single-agent contexts, lacking systematic mechanisms to integrate prior cognitive maps from pioneer agents with successor agents’ real-time observations; moreover, when environmental dynamics cause conflicts between prior knowledge and current sensory inputs, they possess no quantitative framework to evaluate information source credibility, rendering them incapable of discriminating between obsolete prior knowledge and reliable real-time data. These deficiencies critically limit genuine intelligent fusion and proactive risk assessment capabilities, thereby compromising system adaptability and safety in dynamic unknown environments.

To address these limitations, this paper investigates cooperative navigation for multiple UAVs in dynamic unknown environments from a joint cognition perspective. We propose a Joint Cognition and Risk-aware cooperative autonomous navigation method (JCRP) that incorporates a cognitive fusion network based on Bayesian inference with a dynamic trust mechanism to resolve cognitive discrepancies. This approach enables effective integration of prior cognitive maps with real-time local perceptions. The main contributions are summarized as follows:

A cooperative cognitive framework underpinned by dynamic trust assessment is developed, which effectively integrates prior cognition with real-time perception through Bayesian inference. This integration is enhanced by a dynamic trust mechanism that quantifies cognitive discrepancies, thereby significantly improving the system’s perceptual consistency and adaptability in highly dynamic environments;
A risk-aware autonomous navigation method is proposed, incorporating cognitive credibility to achieve a balance between safety and operational efficiency via a multi-factor cost function. This approach strengthens the overall robustness and practical utility of the system when operating under dynamic uncertainties;
To establish a thorough validation framework, comprehensive simulation testbeds and a real-world UAV platform are constructed, enabling rigorous assessment of method performance across simulated and physical environments.

The remainder of this paper is organized as follows: Section 2 summarizes the related work. Section 3 presents the problem statement and mathematical formulation. Section 4 elaborates on the proposed methodology, including cognitive experience construction, joint cognitive fusion, dynamic trust modeling, and risk-aware navigation strategies. Section 5 validates the algorithm’s efficacy through extensive simulations and physical experiments conducted in both static and dynamic environments. Finally, Section 6 concludes the paper by summarizing the principal findings and highlighting the key contributions.

2. Related Work

2.1. UAV Path Planning

Path planning methodologies for UAVs can be systematically classified along a continuum of environmental complexity, spanning from static known environments to dynamic known environments, static unknown environments, and ultimately the most challenging domain of dynamic unknown environments [15]. In scenarios where a complete environmental model is available a priori, classical algorithms such as A* [16], Dijkstra [17], and the Rapidly-exploring Random Tree (RRT) [18] can generate globally optimal or sub-optimal paths. However, these algorithms exhibit inherent limitations in practical applications characterized by incomplete prior information or dynamic uncertainties, as their performance critically depends on the accuracy of the underlying environmental model [19].

The shift toward online perception and real-time decision-making for completely unknown environments has produced innovative approaches using reinforcement learning and intelligent optimization [20]. For instance, while Zhou et al.’s optimized Q-learning algorithm demonstrates improved convergence [21], such methods inherit inherent limitations including computational intensity, sensitivity to local optima, and substantial training requirements. Similarly, sampling-based methods like RRT variants achieve exploration efficiency but sacrifice path optimality and real-time responsiveness [22]. These trade-offs highlight a fundamental challenge: current approaches prioritize either optimality or adaptability, but rarely achieve both simultaneously.

In recent years, path planning for dynamic partially-known environments has emerged as a significant research focus. This class of methods integrates offline global planning with online real-time replanning principles. By combining prior knowledge with real-time sensory data, these approaches aim to construct adaptive decision-making models capable of handling dynamic and uncertain environments. Gelli et al.’s trajectory planning technique [23] exemplifies this hybrid approach. However, like similar methods, it exhibits a common limitation in that it lacks robust mechanisms for quantitatively evaluating and reconciling discrepancies between prior knowledge and real-time perceptions. This limitation is particularly pronounced in multi-agent cooperative settings, where inadequate management of cognitive differences may compromise risk assessment efficacy and affect overall system dependability. While algorithmic refinements continue to emerge, the field still requires frameworks capable of dynamically evaluating information credibility and adaptively adjusting decision-making processes. The methodology presented in this work addresses this challenge through its integrated architecture for cognitive fusion and quantitative trust assessment.

2.2. Multi-Robot Cooperative Exploration

Multi-robot systems can substantially enhance the efficiency of unknown environment exploration through task partitioning and cooperative mechanisms. Existing research has predominantly focused on real-time cooperative strategies, which rely on continuous and tight data interaction among multiple robots during task execution to achieve consistent and highly efficient collective behaviors. However, such requirements are often impractical in real-world environments with communication constraints. For instance, Vetrella et al. developed a cooperative navigation method based on a central filter that fuses real-time perceptual data from swarm members to collectively improve localization accuracy in GPS-denied environments [24]. In contrast, Yu et al. employed a central planner to assign pre-defined scanning paths to the swarm and utilized optimization algorithms for task allocation, minimizing the total mission time and achieving high parallel efficiency under ideal communication conditions [7]. Zhou et al. developed the RACER system, which embodies a decentralized cooperative exploration strategy. This methodology dynamically allocates exploration regions via online pairwise interactions, enabling efficient and rapid coverage of unknown environments [25]. These studies reflect the strategic diversity of real-time cooperation from different perspectives. While efficient, such approaches impose stringent requirements on the real-time performance and reliability of inter-robot communication, which can easily become a system bottleneck in complex indoor or communication-restricted environments.

In contrast to these communication-intensive paradigms, our work proposes a sequential cooperation framework that reconceives the multi-robot exploration problem. The core concept involves the temporal decoupling of exploration and exploitation processes: a pioneer UAV is dedicated to environmental exploration and cognitive map construction, while successor UAVs focus exclusively on efficient navigation by leveraging the transferred prior cognition. This approach markedly reduces the dependency on continuous high-bandwidth communication.

2.3. Knowledge Transfer

In the domain of robotic learning, transfer learning techniques substantially enhance learning efficiency and system adaptability by facilitating knowledge migration across related tasks or environments. Rusu et al. introduced Progressive Neural Networks, which enable effective policy transfer from simulation to real-world settings through hierarchical feature extraction, thereby mitigating domain shift impacts [26]. Subsequently, Lan Bo et al. developed a transfer reinforcement learning approach based on Low-Rank Adaptation (LoRA), incorporating pre-trained models with an enhanced experience replay mechanism to achieve efficient path planning and obstacle avoidance in unknown environments [27]. These approaches transfer black-box policy parameters rather than explicit environmental understanding. As a result, the recipient agent inherits behavioral patterns without grasping the underlying environmental dynamics. This makes the agent unable to adapt when pre-trained policies fail to address unforeseen dynamic changes. This limitation is particularly severe in multi-UAV cooperation, where rigid reliance on fixed policies undermines adaptive decision-making processes.

In response to this deficit, recent research has shifted toward structured cognitive model transfer. This research direction prioritizes the migration of interpretable environmental representations over implicit policies. In power inspection applications, Fan et al. implemented a multi-sensor fusion strategy with Bayesian-updated occupancy grid mapping, effectively addressing dynamic obstacle identification and avoidance through environmental topology modeling. This approach facilitates the formation of structured cognitive representations via spatial reasoning [13]. In the realm of evolutionary optimization for dynamic tasks, Chen et al. introduced the Multi-Source Knowledge Transfer Evolutionary Algorithm. This algorithm fuses temporally predictive solutions with those from analogous environments to enable effective knowledge reuse [28]. Li et al. further refined this approach with the MTEA-DSAT algorithm, which designs a dynamic selection mechanism to adaptively transfer elite solution information or search directions based on quantified task similarity [29]. These approaches significantly enhance system interpretability and generalization capability through the extraction and transfer of structured knowledge. However, these methods lack a rigorous mechanism for dynamic credibility assessment of transferred knowledge. Recipient agents may uncritically rely on outdated or invalid transferred knowledge, compromising both system safety and efficiency. The Joint Cognition and Risk Perception framework proposed in this work embeds a dynamic trust mechanism into cognitive knowledge transfer to enable adaptive and reliable decision-making.

The proposed JCRP framework fundamentally diverges from existing methodologies, as systematically compared in Table 1, through its integrated architecture that unifies cognitive fusion with dynamic trust assessment. Unlike policy-transfer methods that migrate opaque decision rules, JCRP transfers structured environmental representations while continuously evaluating their validity. Distinct from cognitive-transfer approaches that assume uniform reliability of prior knowledge, JCRP quantifies spatially-varying credibility through Bayesian discrepancy detection and temporal consistency validation. This dual capability lays the foundation for safe and efficient sequential cooperation in dynamic environments.

3. Problem Formulation

3.1. Problem Description

This research considers a dynamic, unknown two-dimensional labyrinthine environment containing complex obstacles, such as suddenly falling gravel, designed to emulate the spatial uncertainty and dynamic variability characteristic of real-world scenarios such as indoor search and disaster rescue operations. Within this framework, efficient cognitive experience transfer and collaborative navigation between a pioneer UAV (UAV-1) and a successor UAV (UAV-2) are realized through structured interaction. The specific process is structured as follows: UAV-1 initiates from the launch area and conducts autonomous exploration in a completely unknown dynamic labyrinth environment, aiming to identify a safe path to the designated landing area while ensuring obstacle avoidance throughout the process. Subsequently, UAV-1 transmits the acquired environmental cognitive map to UAV-2 via an onboard data link. Finally, by leveraging this prior knowledge, UAV-2 rapidly plans a safe trajectory and efficiently reaches the destination with minimal or no dependence on real-time obstacle detection.

The labyrinth environment is represented by a continuous coordinate space, denoted as $W \subset \mathbb{R}^{2}$. To facilitate computational processing, this space is discretized into a collection of N grid cells, represented by the set $M = \{m_{1}, m_{2}, \dots, m_{N}\}$. Each grid cell is characterized by the coordinates of its center point.

To enhance visual clarity and intuitive representation of the labyrinth structure, adjacent grid cells whose states are both identified as “occupied” are interconnected and abstracted as wall structures within the diagram, as illustrated in Figure 1. The occupancy state of each grid cell $m_{i}$ within the environment is represented by a binary random variable $O_{i}$, where $O_{i} = 1$ signifies “occupied” and $O_{i} = 0$ denotes “free.” The collective set of all grid states constitutes the ground truth map of the environment, defined as $O = \{O_{1}, O_{2}, \dots, O_{N}\}$.
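As an illustration of this discretization, the following minimal sketch maps a continuous point of W to a grid index and back to a cell center. The row-major layout, the zero-based index (the text numbers cells from 1), the uniform cell size, and the names `world_to_cell` and `cell_center` are assumptions made for the example, not details given in the paper.

```python
import math

def world_to_cell(x, y, cell_size, n_cols):
    """Map a continuous point (x, y) to a zero-based, row-major cell index."""
    col = int(math.floor(x / cell_size))
    row = int(math.floor(y / cell_size))
    return row * n_cols + col

def cell_center(i, cell_size, n_cols):
    """Return the center coordinates that characterize cell i."""
    row, col = divmod(i, n_cols)
    return ((col + 0.5) * cell_size, (row + 0.5) * cell_size)

# Hypothetical 3 x 4 grid with 0.5 m cells; O holds the ground-truth states
# (1 = occupied, 0 = free), stored as a flat list.
n_rows, n_cols, cell_size = 3, 4, 0.5
O = [0] * (n_rows * n_cols)
hit = world_to_cell(1.2, 0.7, cell_size, n_cols)
O[hit] = 1  # mark the cell containing the point (1.2, 0.7) as occupied
```

Here the point (1.2, 0.7) falls in column 2 of row 1, so a single entry of `O` is set to occupied.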

Building upon the aforementioned environmental modeling foundation, the path planning problem for UAV-2 can be formally articulated as the following optimization problem: to find a path $\pi = (m_{1}, m_{2}, \dots, m_{L})$ from the start point S to the goal point G, pursuing the minimization of its expected cumulative cost, expressed as

$$\min_{\pi} C(\pi) = \mathbb{E}\left[ C_{\text{distance}}(\pi) + \lambda \cdot C_{\text{risk}}(\pi, T) \right] \tag{1}$$

where $C_{\text{distance}}(\pi)$ denotes the total path length. $C_{\text{risk}}(\pi, T)$ signifies the accumulated risk along the trajectory. This risk is a trust-based comprehensive indicator. It does not represent geometric proximity to obstacles or collision probability alone. Instead, it quantifies potential safety threats arising from conflicts between the prior cognitive map and real-time perception. Such threats include unexpected obstacles, sudden environmental changes, or temporary blockages in previously passable regions. $\lambda$ serves as a weighting coefficient designed to balance efficiency against safety.
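To make the role of the weighting coefficient concrete, the sketch below evaluates the cost in Equation (1) for one candidate path, without the expectation. The per-cell risk of one minus the trust value, the default trust of 1.0 for cells without an entry, and the name `path_cost` are assumptions for illustration only; this part of the paper does not give a closed form for the risk term.

```python
import math

def path_cost(path, trust, lam=1.0):
    """Distance plus lam-weighted risk for one path.

    path:  list of (x, y) cell centers, from start S to goal G
    trust: dict mapping a cell center to a trust value in [0, 1]
    Assumption: a cell contributes risk 1 - trust; unlisted cells are fully trusted.
    """
    distance = sum(math.dist(a, b) for a, b in zip(path, path[1:]))
    risk = sum(1.0 - trust.get(cell, 1.0) for cell in path)
    return distance + lam * risk

# One low-trust cell sits on the direct route; the detour avoids it.
trust = {(1, 0): 0.5}
short_path = [(0, 0), (1, 0), (2, 0)]
detour = [(0, 0), (0, 1), (1, 1), (2, 1), (2, 0)]

prefer_short = path_cost(short_path, trust, lam=1.0) < path_cost(detour, trust, lam=1.0)
prefer_detour = path_cost(short_path, trust, lam=5.0) > path_cost(detour, trust, lam=5.0)
```

With a small weight the shorter route through the doubtful cell costs less; with a larger weight the longer detour becomes cheaper, which is the efficiency-versus-safety balance the coefficient is meant to control.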

This optimization must adhere to the following constraints:

1. Safety Constraint: As shown in Figure 2, the UAV is required to maintain a safe distance $r_{\text{safe}}$ from all obstacles at every time instant t, expressed mathematically as

$$\min_{m_{i} \in M_{\text{occ}}(t)} \left\| p(t) - p_{m_{i}} \right\|_{2} \geq r_{\text{safe}}, \quad t \in [0, T] \tag{2}$$

where $p(t)$ represents the position coordinates of UAV-2 at time t, $M_{\text{occ}}(t) \subset M$ denotes the set of grid cells classified as occupied at time t, and $p_{m_{i}}$ is the center coordinate of grid cell $m_{i}$.

2. Dynamic Adaptation Constraint: Upon detection of a discrepancy between the prior cognitive map $M_{\text{prior}}$ and the real-time situational awareness, specifically when $O_{i}(t) \neq O_{i}(M_{\text{prior}})$ for any grid cell, the path must be adjusted promptly to ensure operational safety.
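Both constraints can be checked with a few lines of code. The sketch below tests the clearance condition of Equation (2) at a single time instant and lists the cells whose observed state disagrees with the prior map. Thresholding the prior occupancy probability at 0.5 to obtain a binary state, and the function names, are assumptions of this example rather than choices stated in the paper.

```python
import math

def min_clearance(position, occupied_centers):
    """Smallest Euclidean distance from the UAV to any occupied cell center."""
    return min((math.dist(position, c) for c in occupied_centers), default=math.inf)

def satisfies_safety(position, occupied_centers, r_safe):
    """Safety constraint at one time instant: clearance must be at least r_safe."""
    return min_clearance(position, occupied_centers) >= r_safe

def conflicting_cells(prior_prob, observed, threshold=0.5):
    """Cells whose observed binary state differs from the thresholded prior.

    prior_prob: dict cell index -> prior occupancy probability
    observed:   dict cell index -> currently observed state (0 or 1)
    Assumption: a prior probability >= threshold counts as occupied.
    """
    return [i for i, o in observed.items() if int(prior_prob[i] >= threshold) != o]

# Cell 2 was believed free (0.2) but is now observed as occupied.
prior_prob = {0: 0.1, 1: 0.9, 2: 0.2}
observed = {0: 0, 2: 1}
needs_replan = bool(conflicting_cells(prior_prob, observed))
```

A non-empty conflict list is the trigger for the prompt path adjustment that the second constraint requires.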

3.2. UAV Kinematics Model

To reduce computational complexity in autonomous UAV navigation, we use a fixed altitude to simplify the 3D motion to 2D planar analysis, as shown in Figure 3.

Following this model reduction, the resulting two-dimensional kinematic representation of the UAV is depicted in Figure 4. The system’s kinematic behavior can be described by the following set of differential equations:

$$\begin{cases} \dot{x}(t) = v(t) \cdot \cos(\varphi(t)) \\ \dot{y}(t) = v(t) \cdot \sin(\varphi(t)) \\ \dot{\varphi}(t) = \omega(t) \end{cases} \tag{3}$$

Here, $\dot{x}(t)$ and $\dot{y}(t)$ denote the velocity components of the UAV along the x and y axes at time t, respectively, $v(t)$ represents the UAV’s flight speed, while $\varphi(t)$ and $\omega(t)$ correspond to the heading angle and yaw rate, respectively. Thus, the kinematic state of the UAV at any time t can be comprehensively characterized by a four-dimensional vector $\xi(t) = [x(t), y(t), \varphi(t), v(t)]$, encapsulating its positional and dynamic attributes.
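For simulation purposes, the model in Equation (3) can be advanced in discrete time. The following sketch uses a forward Euler step, which is an assumption of this example (the paper does not state an integration scheme), and holds the speed constant over the step because Equation (3) specifies no speed dynamics.

```python
import math

def step(state, omega, dt):
    """Advance the state [x, y, phi, v] by one forward Euler step of length dt."""
    x, y, phi, v = state
    x += v * math.cos(phi) * dt
    y += v * math.sin(phi) * dt
    # Wrap the heading to [-pi, pi) so it stays bounded over long runs.
    phi = (phi + omega * dt + math.pi) % (2.0 * math.pi) - math.pi
    return (x, y, phi, v)

# Fly straight along the x axis at 1 m/s for ten steps of 0.1 s.
state = (0.0, 0.0, 0.0, 1.0)
for _ in range(10):
    state = step(state, omega=0.0, dt=0.1)
```

After the loop the UAV has moved about one meter along the x axis with its heading unchanged. A smaller step or a higher-order integrator would reduce the discretization error during turns.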

3.3. UAV Detection Model

To empower UAVs with the capability for real-time obstacle detection and avoidance in unknown environments, this paper constructs an obstacle perception model, illustrated in Figure 5. The hardware foundation of this model comprises eight infrared sensors mounted on the UAV. Physically, these sensors are arranged in four orthogonal directions, with two sensors per direction for improved reliability. Given that the detection threshold is comparable to the UAV size, the circular detection area can be simplified into eight equally spaced directions to ensure full environmental coverage, as illustrated in Figure 6.

Each infrared sensor detects the presence of obstacles within its specific directional sector. When the distance between a sensor and an obstacle falls below a predefined detection threshold, the corresponding direction is identified as obstructed. The detection outcome of each sensor is binarized, where sensors with obstacle occlusions are flagged as “1” and unobstructed sensors are flagged as “0”. By sequentially arranging the detection results from all sensors in a clockwise manner, an 8-bit infrared obstacle detection vector $S_{\text{all}}$ is obtained:

$$S_{\text{all}} = [s_{1}, s_{2}, s_{3}, s_{4}, s_{5}, s_{6}, s_{7}, s_{8}] \tag{4}$$

where $s_{i} \in \{0, 1\}$ represents the state of the i-th sensor. This binary vector provides a compact and efficient representation of the instantaneous obstacle distribution in the UAV’s immediate vicinity, serving as crucial perceptual input for subsequent navigation decision-making processes.
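The binarization described above can be sketched as follows. The input is assumed to be eight range readings already ordered clockwise; which physical direction corresponds to the first element, the threshold value, and the helper `to_byte` that packs the vector into an integer are assumptions added for illustration.

```python
def detection_vector(distances, threshold):
    """Binarize eight clockwise range readings into the detection vector.

    A reading below the threshold marks that direction as obstructed (1).
    """
    if len(distances) != 8:
        raise ValueError("expected eight directional readings")
    return [1 if d < threshold else 0 for d in distances]

def to_byte(s_all):
    """Pack the 8-bit vector into one integer, first sensor as the top bit."""
    value = 0
    for bit in s_all:
        value = (value << 1) | bit
    return value

# Hypothetical readings in meters: obstacles close in directions 1 and 5.
readings = [0.2, 1.0, 1.0, 1.0, 0.3, 1.0, 1.0, 1.0]
s_all = detection_vector(readings, threshold=0.5)
packed = to_byte(s_all)
```

Packing the vector into a single byte is only a convenience for logging or transmission; the navigation logic can work on the list directly.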

4. Proposed Approach

4.1. Overall Framework

The JCRP algorithm proposed in this paper addresses the fundamental challenges of multi-UAV cooperative navigation in dynamic unknown environments. Departing from conventional real-time multi-agent interaction approaches, the JCRP framework employs an asynchronous sequential cooperative paradigm that temporally decouples environmental exploration from experience utilization. This design facilitates efficient migration and intelligent exploitation of prior cognition. By incorporating a dynamic trust mechanism, the JCRP algorithm enables UAVs to quantify cognitive conflicts and execute risk-aware decision-making, thereby achieving a paradigm shift from reactive obstacle avoidance to proactive risk mitigation. The overall technical architecture of the JCRP algorithm comprises two distinct phases, as illustrated in Figure 7.

The first phase focuses on prior cognition construction and transmission. UAV-1 performs autonomous environmental exploration using onboard sensors and incrementally builds a transferable prior cognitive map through Simultaneous Localization and Mapping (SLAM) techniques. Upon map completion, UAV-1 transmits the cognitive map to UAV-2 via a communication link, establishing the foundation for subsequent cooperative navigation.

The second phase constitutes an intelligent decision-making closed loop for autonomous cooperative navigation, featuring sequential and interdependent real-time interaction among three core modules. The Joint Cognitive Fusion Module first fuses the prior cognitive map from UAV-1 with real-time perceptual data acquired by UAV-2’s onboard sensors to generate a hybrid cognitive map that accurately characterizes the current environmental state. Taking this hybrid cognitive map and real-time perceptual data as dual inputs, the Dynamic Trust Module quantifies discrepancies between prior cognitive knowledge and real-time environmental observations, and further constructs a dynamic trust map to establish a quantitative reliability assessment framework for the hybrid cognitive map. The Risk-Aware Path Planning Module then synthesizes environmental state information from the hybrid cognitive map and risk metric data derived from the dynamic trust map to generate optimal navigation trajectories that achieve an adaptive balance between operational efficiency and navigation safety.

4.2. Collaborative-Oriented Cognitive Experience Construction and Fusion

4.2.1. Prior Cognitive Construction

To facilitate the quantification and transfer of cognitive representations, this study adopts a grid-based map as a unified formalism for environmental cognition. Through an environmental mapping mechanism, traversable regions and obstacles within the operational domain are projected onto a prior cognitive map, thereby constructing a probabilistic understanding of the environment. Each grid cell $m_{i}$ is characterized by a binary random variable $O_{i}$ following a Bernoulli distribution, where the probability distribution $P(O_{i})$ explicitly encodes the confidence level of occupancy for the corresponding grid. This probabilistic framework elegantly captures environmental uncertainties: $P(O_{i} = 1) = 1$ signifies absolute certainty of obstruction, while $P(O_{i} = 1) = 0$ indicates confirmed free space.

UAV-1, equipped with infrared sensors, sequentially acquires environmental observations denoted as $z_{1:t}^{(\text{UAV1})}$. Leveraging a simultaneous localization and mapping (SLAM) pipeline, the system iteratively refines the occupancy probability for each grid cell $m_{i}$. To enhance numerical stability and computational efficiency, the update process is implemented in log-odds form, circumventing precision underflow and simplifying arithmetic operations. The recursive update rule is formulated as follows:

$$L(m_{i} \mid z_{1:t}^{(\text{UAV1})}) = L(m_{i} \mid z_{1:t-1}^{(\text{UAV1})}) + \log\left(\frac{P(O_{i} = 1 \mid z_{t}^{(\text{UAV1})})}{1 - P(O_{i} = 1 \mid z_{t}^{(\text{UAV1})})}\right) - L_{0} \tag{5}$$

where $L(m_{i} \mid z_{1:t}^{(\text{UAV1})})$ represents the posterior Log-Odds value after incorporating all observations up to time t, $L(m_{i} \mid z_{1:t-1}^{(\text{UAV1})})$ denotes the prior Log-Odds value at time $t - 1$, and the logarithmic term $\log\left(\frac{P(O_{i} = 1 \mid z_{t}^{(\text{UAV1})})}{1 - P(O_{i} = 1 \mid z_{t}^{(\text{UAV1})})}\right)$ constitutes the inverse observation model, quantifying the evidential contribution of the current measurement $z_{t}^{(\text{UAV1})}$ to the grid cell’s state estimation. The parameter $L_{0}$ signifies the initial prior Log-Odds value, typically assuming complete environmental ignorance with $P(O_{i} = 1) = 0.5$, thereby yielding $L_{0} = 0$.

The occupancy probability for grid cell $m_{i}$ is subsequently derived through the logistic transformation of the updated Log-Odds value:

$$P(O_{i} = 1 \mid z_{1:t}^{(\text{UAV1})}) = 1 - \frac{1}{1 + \exp\left(L(m_{i} \mid z_{1:t}^{(\text{UAV1})})\right)} \tag{6}$$

This transformation effectively converts the additive evidence accumulation in the log-odds domain back to a probabilistic interpretation bounded between zero and one. The prior cognitive map is formally defined as the comprehensive set of all grid cell probabilities: $M_{\text{prior}} = \{P(O_{i} = 1 \mid z_{1:T}^{(\text{UAV1})})\}_{i=1}^{N}$, which constitutes the transferable prior cognitive experience.
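The recursion in Equation (5) and the conversion in Equation (6) can be sketched for a single grid cell as follows. The inverse sensor model value of 0.7 for a detection is a hypothetical number chosen for the example, and the function names are not taken from the paper.

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def update_log_odds(l_prev, p_occ_given_z, l0=0.0):
    """Equation (5): add the evidence of one measurement and subtract the initial prior."""
    return l_prev + logit(p_occ_given_z) - l0

def occupancy_probability(l):
    """Equation (6): map a log-odds value back to an occupancy probability."""
    return 1.0 - 1.0 / (1.0 + math.exp(l))

# Start from complete ignorance (log-odds 0, probability 0.5) and apply three
# detections, each with an assumed inverse sensor model value of 0.7.
l = 0.0
for _ in range(3):
    l = update_log_odds(l, 0.7)
p_occupied = occupancy_probability(l)
```

Three consistent detections raise the occupancy probability from 0.5 to roughly 0.93, which shows how evidence accumulates additively in the log-odds domain.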

4.2.2. Joint Cognitive Development

The primary objective of UAV-2 is to intelligently leverage rather than blindly follow the prior cognitive map $M_{\text{prior}}$. To achieve this, UAV-2 must effectively integrate this prior knowledge with real-time local sensor observations $z_{t}^{(\text{UAV2})}$ to generate a hybrid cognitive map $M_{\text{posterior}}$ that accurately captures the current environmental situation.

The cognitive fusion challenge can be formally formulated as a sequential Bayesian estimation problem: given the prior probability $P(O_{i}) = P(O_{i} = 1 \mid z_{1:T}^{(\text{UAV1})})$, incorporate new observational evidence $z_{t}^{(\text{UAV2})}$ to update the occupancy state of grid cell $m_{i}$. According to Bayes’ theorem:

$$P(O_{i} \mid z_{t}^{(\text{UAV2})}) = \frac{P(z_{t}^{(\text{UAV2})} \mid O_{i}) \, P(O_{i})}{P(z_{t}^{(\text{UAV2})})} \tag{7}$$
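The step from Bayes’ theorem to an additive log-odds form can be made explicit by writing the posterior for both states of the cell and dividing, so that the evidence term in the denominator cancels. The short derivation below is a sketch of this standard step; the shorthand z for the current UAV-2 observation is introduced here only for brevity.

```latex
\begin{align*}
\frac{P(O_i = 1 \mid z)}{P(O_i = 0 \mid z)}
  &= \frac{P(z \mid O_i = 1)\, P(O_i = 1)}{P(z \mid O_i = 0)\, P(O_i = 0)} \\[4pt]
\log \frac{P(O_i = 1 \mid z)}{1 - P(O_i = 1 \mid z)}
  &= \log \frac{P(O_i = 1)}{1 - P(O_i = 1)}
   + \log \frac{P(z \mid O_i = 1)}{P(z \mid O_i = 0)}
\end{align*}
```

The first term on the right-hand side is the log-odds of the prior, and the second is the log-likelihood ratio contributed by the new observation.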

The joint cognitive fusion formulation for UAV-2 is consequently derived as

L_{posterior} (m_{i}) = L_{prior} (m_{i}) + L_{inv} (z_{t}^{(UAV 2)} ∣ m_{i})

(8)

where

L_{prior} (m_{i}) = \log (\frac{P (O_{i})}{1 - P (O_{i})})

represents the prior cognitive knowledge from UAV-1, and

L_{inv} (z_{t}^{(UAV 2)} ∣ m_{i}) = \log (\frac{P (z_{t}^{(UAV 2)} ∣ O_{i} = 1)}{P (z_{t}^{(UAV 2)} ∣ O_{i} = 0)})

denotes the inverse observation model for UAV-2, computed similarly to UAV-1’s sensor model.

This formulation essentially performs weighted integration of prior and real-time cognition within a Bayesian framework, enabling UAV-2 to progressively rectify errors or outdated information in the prior cognitive map. The resulting hybrid cognitive map

M_{posterior} = {L_{posterior} (m_{i})}_{i = 1}^{N}

provides a dynamically updated environmental representation, establishing a robust probabilistic foundation for subsequent trust evaluation and collaborative navigation tasks.
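As a hedged sketch of Equation (8), the snippet below fuses a prior occupancy probability with a likelihood ratio from UAV-2. The function names, the clipping constant `EPS`, and the likelihood values are assumptions made for illustration; the paper does not specify them.

```python
import numpy as np

EPS = 1e-6  # assumed guard that keeps log-odds finite for priors of 0 or 1


def fuse_log_odds(p_prior, p_z_given_occupied, p_z_given_free):
    """Equation (8): L_posterior = L_prior + log(P(z | O=1) / P(z | O=0))."""
    p = np.clip(np.asarray(p_prior, dtype=float), EPS, 1.0 - EPS)
    l_prior = np.log(p / (1.0 - p))
    l_inv = np.log(np.asarray(p_z_given_occupied, dtype=float)
                   / np.asarray(p_z_given_free, dtype=float))
    return l_prior + l_inv


def to_probability(log_odds):
    """Logistic map, algebraically equal to Equation (6)."""
    return 1.0 / (1.0 + np.exp(-log_odds))


# Three hypothetical cells: a prior contradicted by UAV-2, a prior with an
# uninformative observation, and a prior confirmed by UAV-2.
p_prior = np.array([0.1, 0.1, 0.9])
p_z_occ = np.array([0.9, 0.5, 0.9])
p_z_free = np.array([0.1, 0.5, 0.1])
p_posterior = to_probability(fuse_log_odds(p_prior, p_z_occ, p_z_free))
```

In this example a single strongly contradicting observation moves the first cell from 0.1 to 0.5, an uninformative observation leaves the second cell at 0.1, and a confirming observation raises the third cell from 0.9 to about 0.988.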

4.3. Dynamic Trust Modeling Based on Cognitive Discrepancy

The hybrid cognitive map

M_{posterior}

accurately represents probabilistic estimates of environmental states; however, it lacks an inherent mechanism to discriminate the reliability of cognitive information. In dynamic environments, the trustworthiness of prior knowledge exhibits spatial heterogeneity. To enhance the system’s adaptability to environmental changes, this study introduces a dynamic trust degree

T_{i} \in [0, 1]

for each grid cell

m_{i}

, quantifying its reliability relative to the prior cognition

M_{prior}

. A higher

T_{i}

value signifies stronger confidence in the consistency between the current environment and prior cognition within the corresponding region, whereas a lower value indicates potential obsolescence of the prior knowledge.

To achieve dynamic trust assessment, quantifying the discrepancy between prior cognition and real-time perception is essential. This work employs the Kullback-Leibler (KL) divergence as a metric for evaluating cognitive differences. Defining the prior cognition constructed by UAV-1 as

Q_{i}

(representing the occupancy probability of grid cell

m_{i}

) and the local cognition obtained by UAV-2 via real-time observation as

P_{i}^{t}

, the KL divergence from the prior cognition to the local cognition at time t for grid cell

m_{i}

is formulated as

D_{K L} (P_{i}^{t} ‖ Q_{i}) = \sum_{x \in {0, 1}} P_{i}^{t} (x) \log (\frac{P_{i}^{t} (x)}{Q_{i} (x)})

(9)

which expands to the operational expression:

D_{K L}^{(i, t)} = p_{posterior}^{(i, t)} \log (\frac{p_{posterior}^{(i, t)}}{p_{prior}^{(i)}}) + (1 - p_{posterior}^{(i, t)}) \log (\frac{1 - p_{posterior}^{(i, t)}}{1 - p_{prior}^{(i)}})

(10)

where

D_{K L}^{(i, t)} \geq 0

, with equality holding if and only if the two distributions are identical. This metric provides a principled measure of information gain when updating beliefs from the prior distribution

Q_{i}

to the posterior distribution

P_{i}^{t}

, effectively capturing the magnitude of environmental changes through probabilistic divergence analysis.
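Equation (10) is the KL divergence between two Bernoulli distributions, which can be sketched as follows. The natural logarithm and the clipping constant `eps` are assumptions of this example; the text does not state the logarithm base or how probabilities of exactly 0 or 1 are handled.

```python
import math


def binary_kl(p_posterior, p_prior, eps=1e-9):
    """Equation (10): KL divergence between two Bernoulli distributions."""
    # Clip both probabilities so the logarithms stay finite (assumed guard).
    p = min(max(p_posterior, eps), 1.0 - eps)
    q = min(max(p_prior, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


unchanged = binary_kl(0.3, 0.3)  # identical beliefs give zero divergence
changed = binary_kl(0.9, 0.1)    # a free cell that now looks occupied
```

The divergence is not symmetric: `binary_kl(0.9, 0.5)` and `binary_kl(0.5, 0.9)` differ, so the order of the posterior and prior arguments matters.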

Direct utilization of instantaneous Kullback-Leibler (KL) divergence for decision-making is susceptible to transient fluctuations induced by sensor noise, potentially leading to unreliable trust assessments. To enhance robustness, this study implements a filtering mechanism grounded in a temporal sliding window approach. This strategy mitigates the impact of sporadic measurement anomalies by incorporating historical consistency checks.

Initially, a binary cognitive discrepancy indicator

D_{i}^{t}

is defined based on a preset divergence threshold

τ

:

D_{i}^{t} = I [D_{K L}^{(i, t)} > τ]

(11)

where

I [\cdot]

denotes the indicator function. A value of

D_{i}^{t} = 1

signifies a statistically significant discrepancy between the current observation and the prior cognition at grid cell

m_{i}

.

Subsequently, a temporal sliding window

W = [t - K, t]

of length K is introduced to evaluate persistence of discrepancies. The cumulative occurrence of

D_{i}^{t}

within this window is computed as

C_{i}^{t} = \sum_{k = t - K}^{t} D_{i}^{k}

, yielding a validated change flag

S_{i}^{t}

:

S_{i}^{t} = I [C_{i}^{t} \geq Γ]

(12)

where

Γ

represents the count threshold for triggering confirmed environmental changes within the window. The trust degree

T_{i}

is dynamically updated according to

S_{i}^{t}

through the following rule:

T_{i}^{t} = \begin{cases} γ \cdot T_{i}^{t - 1}, & \text{if } S_{i}^{t} = 1 \\ \min (1, T_{i}^{t - 1} + δ), & \text{if } S_{i}^{t} = 0 \end{cases}

(13)

Here,

γ \in (0, 1)

serves as a decay factor, while

δ

functions as a recovery increment, controlling the gradual restoration of trust in the absence of changes. This design reflects the system’s prudent skepticism toward prior knowledge, ensuring that trust degrades rapidly upon detecting persistent changes but recovers conservatively during stable periods.
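Equations (11) to (13) can be combined into a small per-cell state machine. The sketch below is illustrative only: the class name `CellTrust` and all parameter values are hypothetical, and the window is stored as a deque of K + 1 flags so that it covers k = t - K, ..., t as in the summation above.

```python
from collections import deque


class CellTrust:
    """Trust state of one grid cell, following Equations (11)-(13)."""

    def __init__(self, tau, K, Gamma, gamma, delta, trust=1.0):
        self.tau = tau        # KL divergence threshold
        self.Gamma = Gamma    # count threshold inside the window
        self.gamma = gamma    # decay factor in (0, 1)
        self.delta = delta    # recovery increment
        self.trust = trust    # assumed initial trust degree
        self.window = deque(maxlen=K + 1)  # flags D_i^k for k = t-K .. t

    def step(self, d_kl):
        self.window.append(1 if d_kl > self.tau else 0)   # Equation (11)
        confirmed = sum(self.window) >= self.Gamma        # Equation (12)
        if confirmed:                                     # Equation (13)
            self.trust = self.gamma * self.trust
        else:
            self.trust = min(1.0, self.trust + self.delta)
        return self.trust


# Hypothetical parameters and a short divergence sequence for one cell.
cell = CellTrust(tau=0.5, K=4, Gamma=3, gamma=0.5, delta=0.1)
history = [cell.step(d) for d in [1.0, 0.0, 1.0, 1.0, 0.0, 0.0]]
```

With these values, trust stays at 1.0 until the third discrepancy falls inside the window, halves twice while the change remains confirmed, and then starts to recover by `delta`.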

Considering the spatial correlation inherent in environmental changes, when the trust degree of a grid cell

m_{i}

decays, its impact propagates to the spatial neighborhood

N (m_{i})

. The trust degree of neighboring grid cells

m_{j} \in N (m_{i})

is updated as

T_{j}^{t} = η \cdot T_{i}^{t} + (1 - η) \cdot T_{j}^{t - 1}, \forall m_{j} \in N (m_{i})

(14)

where

η

denotes the spatial propagation coefficient, governing how strongly the central grid cell’s trust variation influences its vicinity.

Through this multi-layered mechanism, the system dynamically generates and maintains a trust map

T = {T_{i}}

. This map exists independently of the hybrid cognitive map and explicitly encodes the “reliability” of the environment, providing a critical foundation for risk-aware decision-making in dynamic scenarios.
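The neighborhood update of Equation (14) can be sketched on a 2D trust array. The 4-connected neighborhood and the function name `propagate_trust` are assumptions of this example; the text does not specify how N(m_i) is defined.

```python
import numpy as np


def propagate_trust(trust, row, col, eta):
    """Equation (14): blend the decayed trust of cell (row, col) into its neighbors.

    `trust[row, col]` is assumed to already hold the updated value T_i^t,
    while the neighbors still hold their previous values T_j^{t-1}.
    """
    out = trust.copy()
    for d_row, d_col in [(-1, 0), (1, 0), (0, -1), (0, 1)]:  # assumed 4-neighborhood
        r, c = row + d_row, col + d_col
        if 0 <= r < trust.shape[0] and 0 <= c < trust.shape[1]:
            out[r, c] = eta * trust[row, col] + (1.0 - eta) * trust[r, c]
    return out


# Hypothetical 3 x 3 trust map whose center cell has just decayed to 0.4.
trust_map = np.ones((3, 3))
trust_map[1, 1] = 0.4
spread = propagate_trust(trust_map, 1, 1, eta=0.5)
```

Here the four edge neighbors drop to 0.7 while the corner cells, which lie outside the assumed neighborhood, keep a trust of 1.0.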

4.4. Risk-Aware Autonomous Navigation

The JCRP algorithm implements a local real-time optimization strategy guided by a global reference path. This section elaborates the navigation decision-making process of UAV-2 when cognitive conflicts are detected. By holistically integrating the hybrid cognitive map

M_{posterior}

, the trust map

T

, and an incorporated curiosity mechanism, the algorithm formulates a multi-factor cost function that enables the UAV to dynamically balance target-directed attraction, curiosity-driven exploration, and risk-averse behavior at each decision point.

In autonomous robotic exploration, curiosity mechanisms propel agents to actively acquire environmental information for reducing uncertainty. This work adopts information entropy as the quantitative curiosity metric, where higher entropy values correspond to greater state uncertainty within a region, indicating enhanced potential information gain through exploratory actions.

For each grid cell

m_{i}

in the grid map, the uncertainty of its occupancy state can be measured by the information entropy

H (m_{i})

, defined as

H (m_{i}) = - p_{i} \log_{2} (p_{i}) - (1 - p_{i}) \log_{2} (1 - p_{i})

(15)

where

p_{i} = P (O_{i} = 1 | M_{posterior})

represents the occupancy probability of grid cell

m_{i}

. The entropy reaches its maximum value

H_{\max} = 1

when

p_{i} = 0.5

, indicating complete uncertainty about the grid’s state. Conversely, entropy achieves its minimum value

H_{\min} = 0

when

p_{i} = 0

or

p_{i} = 1

, reflecting certain knowledge about the grid being either free or occupied. Therefore, the entropy value directly quantifies the exploration value of investigating grid cell

m_{i}

.
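A minimal sketch of Equation (15) follows. It assumes NumPy and applies the usual convention that 0 · log 0 = 0, so that cells with a probability of exactly 0 or 1 receive zero entropy.

```python
import numpy as np


def cell_entropy(p):
    """Equation (15): binary entropy in bits for an array of occupancy probabilities."""
    p = np.asarray(p, dtype=float)
    h = np.zeros_like(p)
    inside = (p > 0.0) & (p < 1.0)  # entropy is 0 at p = 0 and p = 1
    q = p[inside]
    h[inside] = -q * np.log2(q) - (1.0 - q) * np.log2(1.0 - q)
    return h


entropies = cell_entropy([0.0, 0.5, 1.0, 0.9])
```

The first three entries evaluate to 0, 1, and 0, matching the minimum and maximum values stated above.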

Upon detection of cognitive discrepancies, the algorithm engages in dynamic environmental reasoning to adaptively respond to changing conditions. During each decision cycle, the system generates optimal control commands within a finite horizon by optimizing a multi-objective cost function

C (π)

, formulated as

C (π) = α \cdot C_{distance} (π) + β \cdot C_{curiosity} (π, M_{posterior}) + λ \cdot C_{risk} (π, T)

(16)

where

α

,

β

, and

λ

denote weighting coefficients that dynamically balance the relative importance of distinct objectives. The constituent cost components are meticulously designed as follows:

1. Target-Directed Cost: This component ensures purposive navigation by incentivizing progression toward the global goal point G. It quantifies the path efficiency through the cumulative Euclidean distance between consecutive grid centers along the trajectory:

C_{distance} (π) = \sum_{l = 1}^{L - 1} {∥ p (m_{l + 1}) - p (m_{l}) ∥}_{2}

(17)

where

p (m_{l})

represents the Cartesian coordinates of the center of grid cell

m_{l}

.

2. Curiosity-Driven Exploration Cost: To actively reduce environmental uncertainty, this term promotes exploration of high-information-entropy regions. It computes the negative average entropy of grids traversed by the path:

C_{curiosity} (π, M_{posterior}) = - \frac{1}{| M_{π} |} \sum_{m_{l} \in π} H (m_{l})

(18)

where

| M_{π} |

denotes the cardinality of the path-covered grid set, and

H (m_{l})

signifies the information entropy of grid

m_{l}

, with higher values indicating greater exploratory value.

3. Risk-Averse Cost: This component penalizes traversal through low-trust regions by directly incorporating the dynamic trust map T. The cumulative risk escalates as path segments intersect areas with diminished reliability:

C_{risk} (π, T) = \sum_{m_{l} \in π} (1 - T_{l})

(19)

where

T_{l}

denotes the dynamic trust degree of grid cell

m_{l}

. A diminished trust degree

T_{l}

indicates conflicts between prior cognition and real-time perception. It means the prior cognitive knowledge of the region is more likely to be invalid or obsolete. Such regions may contain unexpected obstacles, sudden environmental changes, or temporary blockages. The potential navigation risk is therefore higher. Consequently, even when the hybrid cognitive map

M_{posterior}

indicates a region as unobstructed, the planner exhibits detouring behavior due to elevated risk penalties.

The local path planning challenge thus reduces to a constrained optimization problem:

π^{*} = \arg \min_{π \in Π} C (π)

(20)
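Equations (16) to (20) can be illustrated by scoring a finite set of candidate paths and selecting the cheapest one. This is a sketch under stated assumptions: paths are lists of (row, column) cells, grid centers coincide with cell indices (unit cell size), the candidate set Π is given explicitly, and the weights are hypothetical values.

```python
import numpy as np


def cell_entropy(p):
    """Equation (15) for a single cell, with zero entropy at p = 0 or p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return float(-p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p))


def path_cost(path, p_post, trust, alpha, beta, lam):
    """Equation (16) evaluated for one candidate path."""
    centers = np.array(path, dtype=float)
    c_distance = np.linalg.norm(np.diff(centers, axis=0), axis=1).sum()  # Eq. (17)
    c_curiosity = -np.mean([cell_entropy(p_post[cell]) for cell in path])  # Eq. (18)
    c_risk = sum(1.0 - trust[cell] for cell in path)                      # Eq. (19)
    return alpha * c_distance + beta * c_curiosity + lam * c_risk


def best_path(candidates, p_post, trust, alpha, beta, lam):
    """Equation (20): arg min of the cost over an explicit candidate set."""
    return min(candidates,
               key=lambda path: path_cost(path, p_post, trust, alpha, beta, lam))


# Hypothetical 3 x 3 map: every cell is believed free, but the center has low trust.
p_post = np.zeros((3, 3))
trust = np.ones((3, 3))
trust[1, 1] = 0.2
direct = [(1, 0), (1, 1), (1, 2)]
detour = [(1, 0), (0, 0), (0, 1), (0, 2), (1, 2)]
risk_averse = best_path([direct, detour], p_post, trust, alpha=1.0, beta=0.5, lam=5.0)
risk_tolerant = best_path([direct, detour], p_post, trust, alpha=1.0, beta=0.5, lam=1.0)
```

With a large risk weight the planner prefers the longer detour even though the posterior map marks the center cell as free; with a small risk weight it takes the direct route through the low-trust cell.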

Based on the aforementioned optimization objectives, the autonomous navigation of UAV-2 constitutes a persistent optimization closed-loop, with its operational workflow delineated as follows:

Step 1: System Initialization and Path Generation. The system initializes by loading the prior cognitive map

M_{prior}

provided by UAV-1. Utilizing

M_{prior}

, a global reference path

π_{global}

from the start point S to the goal point G is generated using a global planning algorithm.

Step 2: Task Execution and Cognitive Fusion. UAV-2 navigates along the global path

π_{global}

, while continuously acquiring real-time environmental observations

z_{t}^{(UAV 2)}

through onboard sensors. The local environmental state is updated based on these observations to enable cognitive fusion.

Step 3: Cognitive Conflict Detection and Trust Assessment. The system perpetually compares real-time perceptual data with the prior cognitive map

M_{prior}

. For each grid cell

m_{i}

within the perceptual range, a cognitive discrepancy flag

D_{i}^{t} = 1

is activated upon detection of a new obstacle. When the cumulative discrepancy count

C_{i}^{t}

for a specific region within the sliding window reaches the predefined count threshold

Γ

, a valid environmental change is confirmed, triggering an update to the dynamic trust degree.

Step 4: Risk-Aware Decision-Making. Upon detection of a cognitive conflict, the system engages in local decision-making based on the latest hybrid cognitive map

M_{posterior}

and the trust map T. The planner ceases reliance on the global path

π_{global}

, and instead generates local paths using a multi-objective cost function. For a candidate local path segment

π_{local}

, the cost function is defined as

C (π) = α \cdot C_{distance} (π) + β \cdot C_{curiosity} (π, M_{posterior}) + λ \cdot C_{risk} (π, T)

(21)

By solving

π_{local}^{*} = \arg \min_{π \in Π} C (π)

, the optimal local action for the current state is derived.

Step 5: Control Command Execution and State Update. UAV-2 executes the control commands derived from

π_{local}^{*}

, and the system state is updated incrementally to reflect the new configuration.

Step 6: State Evaluation. After each movement step, the system assesses the following conditions: if a cognitive conflict persists, the process returns to Step 3 to reinitiate cognitive conflict detection and trust assessment; if risks are successfully mitigated and the global path is reconnected or the goal is reached, the task is considered complete or the process returns to Step 2; otherwise, it proceeds to Step 5 for the next step of local path planning.

Step 7: Algorithm Termination. The autonomous navigation workflow terminates when UAV-2 confirms arrival at the goal point G.
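The branching in Step 6 can be summarized as a small transition function. This is only one possible reading of the workflow: the enum names are hypothetical, and giving goal arrival priority over the other conditions is an assumption of this sketch rather than something stated in the steps above.

```python
from enum import Enum


class Step(Enum):
    FOLLOW_GLOBAL_PATH = 2   # Step 2: task execution and cognitive fusion
    DETECT_CONFLICT = 3      # Step 3: conflict detection and trust assessment
    EXECUTE_COMMAND = 5      # Step 5: control command execution
    TERMINATE = 7            # Step 7: algorithm termination


def next_step(at_goal, conflict_persists, global_path_reconnected):
    """Step 6: choose the next workflow step after a movement step."""
    if at_goal:                      # assumed to take priority over other checks
        return Step.TERMINATE
    if conflict_persists:
        return Step.DETECT_CONFLICT
    if global_path_reconnected:
        return Step.FOLLOW_GLOBAL_PATH
    return Step.EXECUTE_COMMAND


after_detour = next_step(at_goal=False, conflict_persists=False,
                         global_path_reconnected=True)
```

In this reading, a UAV that has cleared the conflict and rejoined the global path resumes Step 2, while an unresolved conflict sends it back to Step 3.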

5. Experiment and Result Analysis

5.1. Experimental Environment Setting

To ensure the methodological rigor and empirical validity of the experimental evaluation, this section establishes a testing environment based on the theoretical models developed in Section 3. A 60 × 60 two-dimensional labyrinth environment was constructed on the MATLAB R2024b platform and discretized into a grid-based map to simulate complex indoor scenarios with intricate spatial configurations. The algorithmic parameters are configured as detailed in Table 2, providing a standardized foundation for comparative performance analysis.

To quantitatively evaluate the performance of the proposed JCRP framework, two core evaluation metrics are defined to characterize navigation safety and risk resistance, respectively:

1. Safe Decision Ratio: This metric quantifies the ability of the UAV to make safe navigation decisions by measuring the proportion of high-trust grid nodes traversed during exploration. It is defined as

η = \frac{N_{safe}}{N_{total}} \times 100 %

(22)

where

N_{safe}

denotes the number of visited nodes with a trust degree greater than 0.7, and

N_{total}

is the total number of nodes visited during the exploration phase. A higher value of

η

indicates a stronger capability to avoid high-risk regions and make safe navigation decisions.

2. Risk Exposure Level: This metric assesses the algorithm’s risk resistance by quantifying the proportion of low-trust grid nodes encountered along the navigation path. It is defined as

ρ = \frac{N_{risk}}{N_{total}} \times 100 %

(23)

where

N_{risk}

denotes the number of visited nodes with a trust degree less than 0.3, and

N_{total}

is the total number of nodes visited during the exploration phase. A lower value of

ρ

indicates a reduced exposure to high-risk areas and a more robust risk avoidance capability.
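Both metrics reduce to counting visited nodes by trust degree, as in the following sketch. The function name and the sample trust values are hypothetical; the thresholds 0.7 and 0.3 are the ones given in the definitions above.

```python
def safety_metrics(visited_trust, safe_threshold=0.7, risk_threshold=0.3):
    """Equations (22) and (23): safe decision ratio and risk exposure level in percent."""
    n_total = len(visited_trust)
    if n_total == 0:
        raise ValueError("at least one visited node is required")
    n_safe = sum(1 for t in visited_trust if t > safe_threshold)
    n_risk = sum(1 for t in visited_trust if t < risk_threshold)
    return 100.0 * n_safe / n_total, 100.0 * n_risk / n_total


# Hypothetical trust degrees of the nodes visited along one path.
safe_ratio, risk_exposure = safety_metrics([0.9, 0.8, 0.5, 0.2, 0.75])
```

For this sample, three of five nodes exceed 0.7 and one falls below 0.3, giving a safe decision ratio of 60% and a risk exposure level of 20%. Nodes with trust between the two thresholds count toward neither metric.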

5.2. Algorithm Effectiveness in Static Environments

To evaluate the fundamental performance of the JCRP algorithm under idealized conditions, this section conducts simulation validations within a static labyrinth environment. By comparing the exploration trajectory of UAV-1 with the optimized path of UAV-2, which leverages prior cognitive knowledge, we assess the efficacy of the cooperative cognitive mechanism in enhancing navigation efficiency.

As illustrated in Figure 8, the path planned by UAV-2 based on the prior cognitive map represents the current joint-cognitive optimal solution, demonstrating significant advantages over the tortuous exploration path of UAV-1 in an unknown environment. The path statistics show that the exploration path of UAV-1 has a mean length of 390.19 and a large standard deviation of 181.69, with values ranging from 196.0 to 1329.0. This spread reflects the high randomness and instability of blind exploration in unknown static environments. In sharp contrast, the optimized path of UAV-2 achieves a much lower mean length of 202.58 and a markedly reduced standard deviation of 28.07, with values ranging from 157.0 to 329.0. The detailed statistical distribution, calculated from 500 independent runs, is presented in the box plot of Figure 9. Throughout the navigation process, UAV-2 consistently maintains safe distances from obstacles without any collisions, demonstrating the reliability of the proposed method in static environments.

As shown in Figure 10, multiple randomized tests with feasible path solutions confirm that the path length of UAV-2 is shortened by approximately 41.39% on average compared to UAV-1, validating the effectiveness of the cooperative cognitive mechanism in avoiding repetitive exploration and improving task efficiency. Under scenarios with reliable prior cognition, the JCRP algorithm successfully achieves efficient and safe cooperative navigation through cognitive transfer, establishing a solid foundation for applications in structured environments.

5.3. Algorithm Performance Comparison in Dynamic Environments

To evaluate the adaptability and robustness of the JCRP algorithm under abrupt environmental changes, this study first verifies the algorithm’s basic path optimization performance in a static environment without dynamic obstacles as shown in Figure 11 and then constructs a dynamic simulation environment where passage channels are blocked due to sudden geological condition alterations for further performance testing. Under this experimental setup, a comparative analysis is conducted between JCRP and the baseline algorithm—Blind Trust in Prior (BTP). The BTP algorithm is chosen as the baseline because it adopts a typical strategy that fully relies on prior cognition and ignores real-time environmental changes, making it a direct and effective control for verifying the proposed risk-aware mechanism. Other path planning methods are not suitable as they depend on fixed environmental structures or pre-trained models and cannot adapt to the randomly changing and highly dynamic environment in this work.

The simulation results are illustrated in Figure 12 and Figure 13. Due to its over-reliance on prior cognition, the BTP algorithm’s planned paths tend to converge toward severely congested areas, resulting in significantly increased flight risks. In contrast, the JCRP algorithm effectively identifies high-risk regions through differential analysis between real-time sensory data and prior cognition. As shown in Figure 14, the trust values in areas where obstacles appear exhibit significant attenuation characteristics, indicating elevated risk levels in these regions. The risk-aware path planning mechanism enables the UAV to proactively generate safe detour trajectories, demonstrating the algorithm’s adaptability to environmental dynamic changes.

Statistical analysis based on multiple random seed experiments further validates the above conclusions. A total of 500 independent experiments are conducted to ensure statistical reliability. As shown in Figure 15 and Figure 16, in the absence of a trust model, the BTP algorithm frequently traverses low-trust regions in pursuit of geometric optimality due to its lack of risk perception capability, resulting in significantly higher risk exposure levels compared to JCRP. The JCRP algorithm achieves a mean safe decision ratio of 62.57% with a standard deviation of 6.87, which is 10.83 percentage points higher than the mean value of 51.74% with a standard deviation of 7.45 obtained by the BTP algorithm. Meanwhile, the JCRP algorithm reduces the mean risk exposure level to 16.78% with a standard deviation of 2.62 from the mean value of 21.58% with a standard deviation of 5.53 observed in the BTP algorithm, representing an average decrease of 4.80 percentage points. Although the complete JCRP algorithm shows a marginal increase in path length, its average path risk value and other safety indicators are significantly reduced. The dynamic trust mechanism achieves substantial improvement in safety performance at a controllable cost of efficiency.

5.4. Physical Experiment Validation

To verify the practical utility of the JCRP algorithm from simulation to real-world applications, this study established a physical UAV testing platform. The experimental platform employed a small rotorcraft UAV equipped with omnidirectional infrared rangefinding sensors, with a physical maze environment constructed outdoors as shown in Figure 17. The experimental procedure maintained consistency with simulation settings: UAV-1 initially conducted environmental exploration to construct a prior cognitive map, followed by UAV-2 utilizing the JCRP algorithm for navigation.

As illustrated in Figure 18, UAV-1 exhibited characteristically cautious and deliberate exploration behavior during its initial mission and required 11 min and 58 s to complete the task; the key flight moments of the exploration phase are shown in Figure 19. In contrast, UAV-2 efficiently leveraged the prior cognitive experience constructed by UAV-1 to rapidly plan and execute a safe navigation path, completing the task in just 6 min and 54 s; the core flight moments of the navigation phase are presented in Figure 20. The results presented here are derived from a single real-world flight test. The actual flight trajectories showed consistency with simulation trends, with UAV-2 achieving an approximately 42.34% reduction in task completion time compared to UAV-1. Despite the presence of real-world challenges including sensor noise, localization errors, and control latency, the JCRP algorithm successfully guided the UAV to avoid obstacles and reach the target efficiently. The physical experimental results provide preliminary validation of the method’s effectiveness and practicality in real physical systems, establishing a solid foundation for its application in more complex scenarios.
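The reported reduction in completion time can be reproduced directly from the two durations stated above. The short check below uses only those durations; the helper name `to_seconds` is introduced for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert a duration in minutes and seconds to seconds."""
    return 60 * minutes + seconds


t_uav1 = to_seconds(11, 58)  # UAV-1 exploration time: 718 s
t_uav2 = to_seconds(6, 54)   # UAV-2 navigation time: 414 s
reduction_percent = 100.0 * (t_uav1 - t_uav2) / t_uav1
```

The result is (718 − 414) / 718 ≈ 42.34%, in agreement with the stated figure.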

6. Conclusions

This study systematically investigates the challenge of multi-unmanned aerial vehicle cooperative navigation in dynamic unknown environments, with a specific focus on the critical issue of credibility assessment when conflicts arise between prior cognition and real-time perception. To address this challenge, we propose a novel collaborative autonomous navigation method termed Joint Cognition and Risk Perception. The core of the JCRP framework involves the introduction of a dynamic trust mechanism and the construction of a Bayesian cognitive fusion network. In contrast to existing methods that either passively rely on prior knowledge or rely solely on real-time perception, this integrated architecture enables UAVs to quantitatively evaluate the reliability of prior knowledge and proactively adjust their reliance on prior cognition based on real-time perceptual feedback. Traditional methods often fail to quantify the credibility of prior knowledge, leading to either overly conservative behavior or risky decisions in dynamic environments. JCRP addresses this critical issue by explicitly modeling trust and risk, enabling a more intelligent balance between exploration and exploitation.

The effectiveness and robustness of the JCRP algorithm were validated through simulations and physical experiments conducted in both static and dynamic labyrinth-type obstacle environments. The quantitative results demonstrate that, in static environments, JCRP reduces the path length of successor UAVs by an average of 41.39%, thereby significantly improving navigation efficiency. In dynamic environments characterized by abrupt obstacle changes, JCRP increases the safe decision ratio by 10.83 percentage points and reduces the risk exposure level to 16.78%, demonstrating superior performance in safety-critical scenarios. Furthermore, in physical platform experiments, the task completion time of successor UAVs is shortened by 42.34%, confirming the algorithm’s practicality and the consistency of its performance with the simulation results.

In summary, the JCRP algorithm presented in this study provides a novel and effective solution for achieving reliable cooperative navigation of multiple UAVs operating under uncertainty. Future research endeavors will focus on three specific directions: 1. Scalability to large-scale clusters: Extending the JCRP framework to scenarios involving multiple pioneer and successor UAVs, and validating its performance in swarm navigation tasks. 2. Cognitive experience transmission under communication constraints: Developing lightweight cognitive map compression and incremental transmission mechanisms to enable efficient knowledge sharing in bandwidth-limited environments. 3. Multi-sensor fusion for enhanced perception: Integrating vision and LiDAR sensors to improve the accuracy and robustness of real-time environmental perception, thereby further strengthening the cognitive fusion process.

These directions aim to further enhance the practicality and generalization capabilities of the algorithm, enabling its deployment in more complex and demanding real-world applications.

Author Contributions

Conceptualization, C.H. and R.W.; Methodology, C.H., R.W., B.J. and Q.Z.; Software, C.H.; Validation, C.H., P.W.; Formal analysis, C.H.; Investigation, C.H., B.J. and P.W.; Data curation, C.H.; Writing—original draft, C.H.; Writing—review & editing, R.W. and Q.Z.; Visualization, C.H.; Supervision, R.W., Q.Z.; Project administration, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All experimental test data in this study contain proprietary experimental details and internal testing information, which means the raw data cannot be made publicly available. For reasonable requests from qualified researchers, the relevant data may be obtained from the corresponding author solely for the purposes of research replication and verification.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Pham, H.X.; La, H.M.; Feil-Seifer, D.; Nguyen, L.V. Autonomous UAV navigation using reinforcement learning. arXiv 2018, arXiv:1801.05086.
2. Yu, X.; Jiang, N.; Wang, X.; Li, M. A hybrid algorithm based on grey wolf optimizer and differential evolution for UAV path planning. Expert Syst. Appl. 2023, 215, 119327.
3. Xiao, L.; Xiao, Z.; Yang, L.; Xiong, L.; Hu, H.; Dai, M.; Xu, X. Application and Performance Evaluation of a UAV Navigation System Based on D* Algorithm for Substation Inspection. J. Nanchang Univ. (Nat. Sci.) 2026, 1–7.
4. Cheng, X.; Wang, L.; Zhao, W.; Xu, Q.; He, Z. Autonomous Motion Planning Algorithm for Unmanned Aerial Vehicles in Maze Scenarios. In Proceedings of the International Conference on Guidance, Navigation and Control; Springer: Berlin/Heidelberg, Germany, 2024; pp. 408–416.
5. Sonny, A.; Yeduri, S.R.; Cenkeramaddi, L.R. Autonomous UAV path planning using modified PSO for UAV-assisted wireless networks. IEEE Access 2023, 11, 70353–70367.
6. Li, J.; Xiong, Y.; She, J. UAV path planning for target coverage task in dynamic environment. IEEE Internet Things J. 2023, 10, 17734–17745.
7. Yu, Y.; Lee, S. Efficient multi-UAV path planning for collaborative area search operations. Appl. Sci. 2023, 13, 8728.
8. Wang, L.; Huang, W.; Li, H.; Li, W.; Chen, J.; Wu, W. A review of collaborative trajectory planning for multiple unmanned aerial vehicles. Processes 2024, 12, 1272.
9. Wan, S.; Li, H.; Hu, Y.; Wang, X.; Cui, S. A multi target intention recognition model of drones based on transfer learning. J. Syst. Eng. Electron. 2025, 36, 1247–1258.
10. Ruan, W.; Duan, H.; Deng, Y. Autonomous maneuver decisions via transfer learning pigeon-inspired optimization for UCAVs in dogfight engagements. IEEE/CAA J. Autom. Sin. 2022, 9, 1639–1657.
11. Xue, Z.F.; Wang, Z.J.; Zhan, Z.H.; Kwong, S.; Zhang, J. Neural network-based knowledge transfer for multitask optimization. IEEE Trans. Cybern. 2024, 54, 7541–7554.
12. Wang, W.; She, D.; Wang, J.; Han, D.; Jin, B. An Abnormal UAV Trajectory Correction Method Based on Multi-Model Fusion. J. Electron. Inf. Technol. 2025, 47, 1332–1344.
13. Fan, Y.; Chen, L.; Zhang, B.; Liu, Y.; Li, Z.; Sun, Y. Autonomous Inspection and Attitude Control of UAV for Overhead Lines in Distribution Networks Based on Multi-Sensor. Electr. Meas. Instrum. 2024, 61, 186–194.
14. Wan, L.; Liu, R.; Sun, L.; Nie, H.; Wang, X. UAV swarm based radar signal sorting via multi-source data fusion: A deep transfer learning framework. Inf. Fusion 2022, 78, 90–101.
15. Meng, W.; Zhang, X.; Zhou, L.; Guo, H.; Hu, X. Advances in UAV Path Planning: A Comprehensive Review of Methods, Challenges, and Future Directions. Drones 2025, 9, 376.
16. Attoyibi, M.M.; Fikrisa, F.E.; Handayani, A.N. The implementation of a star algorithm (A*) in the game education about numbers introduction. In Proceedings of the 2nd International Conference on Vocational Education and Training (ICOVET 2018); Atlantis Press: Paris, France, 2019; pp. 234–238.
17. Zhou, X.; Yan, J.; Yan, M.; Mao, K.; Yang, R.; Liu, W. Path planning of rail-mounted logistics robots based on the improved Dijkstra algorithm. Appl. Sci. 2023, 13, 9955.
18. Kothari, M.; Postlethwaite, I.; Gu, D.W. Multi-UAV path planning in obstacle rich environments using rapidly-exploring random trees. In Proceedings of the 48h IEEE Conference on Decision and Control (CDC) Held Jointly with 2009 28th Chinese Control Conference; IEEE: Piscataway, NJ, USA, 2009; pp. 3069–3074.
19. Jones, M.; Djahel, S.; Welsh, K. Path-planning for unmanned aerial vehicles with environment complexity considerations: A survey. ACM Comput. Surv. 2023, 55, 234.
20. Chen, X.; Tang, J.; Ruan, Y.; Zhan, J. Path Planning Methods for UAVs: A Survey. In Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, Xi’an, China, 26–28 January 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 894–903.
21. Zhou, Q.; Lian, Y.; Wu, J.; Zhu, M.; Wang, H.; Cao, J. An optimized Q-Learning algorithm for mobile robot local path planning. Knowl.-Based Syst. 2024, 286, 111400.
22. dos Santos, M.A.A.; Vivaldini, K.C.T. A review of the informative path planning, autonomous exploration and route planning using UAV in environment monitoring. In Proceedings of the 2022 International Conference on Computational Science and Computational Intelligence (CSCI); IEEE: Piscataway, NJ, USA, 2022; pp. 445–450.
23. Gelli, M.; Bigazzi, L.; Boni, E.; Basso, M. Suboptimal Trajectory Planning Technique in Real UAV Scenarios with Partial Knowledge of the Environment. Drones 2024, 8, 211.
24. Vetrella, A.R.; Opromolla, R.; Fasano, G.; Accardo, D.; Grassi, M. Autonomous flight in GPS-challenging environments exploiting multi-UAV cooperation and vision-aided navigation. In Proceedings of the AIAA Information Systems-AIAA Infotech@ Aerospace, Grapevine, TX, USA, 9–13 January 2017; p. 0879.
25. Zhou, B.; Xu, H.; Shen, S. Racer: Rapid collaborative exploration with a decentralized multi-UAV system. IEEE Trans. Robot. 2023, 39, 1816–1835.
26. Rusu, A.A.; Večerík, M.; Rothörl, T.; Heess, N.; Pascanu, R.; Hadsell, R. Sim-to-real robot learning from pixels with progressive nets. In Proceedings of the Conference on Robot Learning, PMLR, Mountain View, CA, USA, 13–15 November 2017; pp. 262–270.
27. Bo, L.; Zhang, T.; Zhang, H.; Hong, J.; Liu, M.; Zhang, C.; Liu, B. 3D UAV path planning in unknown environment: A transfer reinforcement learning method based on low-rank adaption. Adv. Eng. Inform. 2024, 62, 102920.
28. Chen, G.; Guo, Y.; Yang, X.; Ma, T.; Li, C.; Yuan, L.; Han, S. A Dynamic Constrained Multi-Objective Evolutionary Algorithm Based on Multi-Source Knowledge Transfer Strategy. Comput. Eng. Appl. 2025, 1–15.
29. Li, T.; Ma, H.; Li, X.; Xu, J. A Multi-Objective Multi-Task Evolutionary Algorithm with Dynamic Knowledge Selection and Adaptive Transfer. Appl. Res. Comput. 2025, 1–10.

Figure 1. Schematic diagram of the labyrinth environment: (a) Grid-based Maze Environment. (b) Corridor-style Maze Environment.

Figure 2. Schematic of UAV Safety Distance Constraint.

Figure 3. Diagram of Kinematic Model Simplification Process.

Figure 4. UAV Kinematic Model.

Figure 5. UAV Obstacle Perception Model: (a) Schematic of multi-directional infrared ranging. (b) Schematic of effective detection range.

Figure 6. Installation diagram of UAV infrared sensors.

Figure 7. Technical framework of UAV collaborative navigation.

Figure 8. Trajectory comparison between UAV-1 and UAV-2 in a static environment.

Figure 9. Path length comparison in static environment based on 500 independent runs.

Figure 10. Path length comparison in static environment.

Figure 11. Trajectory comparison in static environment.

Figure 12. Trajectory comparison without trust guidance in dynamic environment.

Figure 13. Trust-guided trajectory comparison in dynamic environment.

Figure 14. Visualization of trust degree map.

Figure 15. Comparison of safety decision ratios in dynamic scenarios.

Figure 16. Risk exposure degree comparison in dynamic scenarios.

Figure 17. Configuration of flight test scenario.

Figure 18. Comparative path analysis between UAV-1 and UAV-2.

Figure 19. Selected flight moments of UAV-1 during exploration phase.

Figure 20. Selected flight moments of UAV-2 during navigation phase.

Table 1. Comparative analysis of cooperative navigation methodologies and their limitations.

Methodology Category	Representative Approaches	Advantages	Limitations
Traditional Path Planning	A* [16], Dijkstra [17], RRT [18]	Theoretical optimality, completeness	Heavy dependency on accurate prior maps; vulnerable to environmental dynamics
Real-time Cooperative Strategies	Vetrella et al. [24], Yu et al. [7], Zhou et al. [25]	High parallelism, real-time coordination	Stringent communication requirements; single-point failures in centralized architectures
Policy Transfer Methods	Rusu et al. [26], Bo et al. [27]	Reduced training time, transfer learning efficacy	Black-box decision-making; limited generalization to novel scenarios
Cognitive Transfer Approaches	Fan et al. [13], Chen et al. [28], Li et al. [29]	Enhanced interpretability, causal reasoning capabilities	Lack of quantitative credibility assessment; limited dynamic fusion mechanisms

Table 2. Parameter Settings.

Parameter	Symbol	Value
Map size	–	60 × 60
Start coordinate	S	(60,0)
Goal coordinate	G	(0,60)
Simulation step size	$Δ t$	0.1 s
Sliding window length	K	0.5
Valid change count threshold	$Γ$	3
Decay factor	$γ$	0.9
Trust recovery increment	$δ$	0.1
Propagation coefficient	$η$	0.3
Initial trust	$T_{0}^{i}$	1.0
Path length weight	$α$	0.3
Unknown area cost weight	$β$	0.3
Risk cost weight	$λ$	0.4
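
The settings in Table 2 can be collected into a single configuration object, which makes it easy to check that the three planning weights sum to one. The sketch below is illustrative only: the class name, field names, and the `weighted_cost` helper are hypothetical, and the plain weighted-sum form of the cost is an assumption made here for illustration, not a formula taken from this table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JCRPParams:
    """Values copied from Table 2; the field names are illustrative."""
    map_size: tuple = (60, 60)
    start: tuple = (60, 0)        # S
    goal: tuple = (0, 60)         # G
    dt: float = 0.1               # simulation step size, seconds
    window_length: float = 0.5    # K, value as listed in the table
    change_threshold: int = 3     # valid change count threshold (Gamma)
    decay: float = 0.9            # decay factor (gamma)
    trust_recovery: float = 0.1   # trust recovery increment (delta)
    propagation: float = 0.3      # propagation coefficient (eta)
    initial_trust: float = 1.0    # T_0^i
    alpha: float = 0.3            # path length weight
    beta: float = 0.3             # unknown area cost weight
    lam: float = 0.4              # risk cost weight (lambda)


def weighted_cost(p, path_length, unknown_cost, risk_cost):
    """ASSUMED form: a plain weighted sum of the three cost terms."""
    return p.alpha * path_length + p.beta * unknown_cost + p.lam * risk_cost


params = JCRPParams()
weight_sum = params.alpha + params.beta + params.lam
```

With these values the risk term carries the largest single weight (0.4), so under the assumed weighted sum a unit of risk cost counts for more than a unit of path length or unknown-area cost.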
