Article

An Online Human-Aware Behavior Planning Method for Nondeterministic UAV System Under Probabilistic Model Checking

College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(12), 832; https://doi.org/10.3390/drones9120832
Submission received: 15 July 2025 / Revised: 14 November 2025 / Accepted: 25 November 2025 / Published: 1 December 2025

Highlights

What are the main findings?
  • The Markov decision process is used to construct the global probabilistic behavior model of the UAV offline, while a finite state automaton within a finite horizon is dynamically constructed online.
  • A value iteration algorithm is introduced to solve for the optimal behavior plan online within the finite horizon, from which an infinite-horizon planning and execution algorithm is formed.
What is the implication of the main finding?
  • This paper proposes an online human-aware behavior planning method that enables a UAV to dynamically satisfy high-level LTL task descriptions from its human collaborators, with the potential to be applied to human–UAV collaboration.

Abstract

This paper proposes an online human-aware behavior planning method to enable Unmanned Aerial Vehicles (UAVs) to dynamically satisfy high-level linear temporal logic (LTL) task descriptions from human collaborators. The proposed method has the potential to be applied to industrial human–UAV collaboration. The specific process is as follows. Firstly, the global high-level task of the UAV is described using the formal language of LTL, which can usually be issued by human collaborators in natural language. Secondly, the task description is transformed into a deterministic Rabin automaton, and the states of the automaton are assigned values according to their distance from the accepted states. Thirdly, the Markov decision process is used to construct the probabilistic behavior model of the UAV offline. Based on this model, a finite state automaton within the finite horizon is dynamically constructed online, and the Cartesian product system within the horizon is built, in which the expected arrival states within the horizon are set by the minimum state values. Finally, aiming to maximize the reachability probability of the expected states, a value iteration algorithm is introduced to solve for the optimal behavior plan online within the finite horizon; on this basis, an infinite-horizon planning and execution algorithm is formed. Experiments and analysis show that the results of the online behavior planning are consistent with the task logic at the semantic level, indicating the correctness of the proposed method. Moreover, the proposed method can effectively alleviate the state explosion caused by global probabilistic model checking and improve the efficiency of plan generation.

1. Introduction

Nowadays, numerous practical applications of UAV technologies have been investigated and discussed [1,2]. Recently, formal languages such as linear temporal logic formulas have been proposed to specify complex tasks for UAVs, as an intuitive yet powerful way to describe both spatial and temporal requirements on UAV behaviors. In this work, the UAV is allowed to operate alongside human collaborators in close proximity, performing the desired tasks to achieve shared human–UAV production goals [3,4]. Similar to human–human collaboration, the UAV should be able to infer the intentions of its human collaborators and generate behavior that avoids collisions or improves fluency during human–UAV collaboration, which is referred to as human-aware behavior planning [5]. Some research studies [6,7] indicate that human-aware behavior planning can produce positive effects during human–UAV collaboration: across experiments with different settings, more behavior-level UAV adaptation to the human's behavior yields better team fluency and efficiency, as well as greater human satisfaction and comfort.

To incorporate human intervention into behavior planning, shared autonomy has been studied in numerous research works. The domain of shared autonomy investigates approaches that bring the human into the loop of UAV control while maintaining a degree of autonomy [8,9]. In particular, the need for methods to design and verify autonomous systems and provide better feedback in human–UAV collaboration systems has been stressed. Model checking, a formal method traditionally used in the design and verification of complex behaviors of software systems [10,11], has the potential to address these needs. Under model checking theory, considerable attention has been devoted to the automatic synthesis of controllers for UAVs [12,13] that perform complex, high-level tasks described as temporal logic formulas, such as 'periodically visit the locations A, B, and C, while avoiding the location D'. Research on this kind of problem mainly relies on a three-step hierarchical procedure [14,15,16]. First, the dynamics of the UAV are abstracted into a finite discrete transition system based on sampling [17] or cell decomposition. Second, using formal verification techniques, a discrete behavior plan is generated to complete the task. Third, the discrete behavior plan is transformed into a continuous controller for the original system. In this hierarchy, generating the discrete behavior plan first requires combining the UAV system model and an automaton derived from the temporal logic task into a large-scale product system, which results in high computational complexity and is not suitable for online human–UAV collaboration scenarios. Many scholars therefore seek to introduce model predictive control or receding horizon control techniques into the generation of discrete behavior plans.

1.1. Related Work

At present, human-aware behavior planning methods for UAVs that aim to dynamically satisfy linear temporal logic descriptions of high-level tasks are mainly designed for two system models: simple deterministic transition systems (DTSs) and affine systems built on deterministic transition systems. For deterministic transition systems, Ding et al. [18] consider receding horizon control of finite deterministic systems that must satisfy high-level task descriptions represented by linear temporal logic formulas. They propose a controller synthesis framework inspired by model predictive control, in which the rewards of the states are locally optimized at each time step in the finite horizon and real-time optimal control is used; by applying appropriate constraints, the infinite trajectory generated by the controller meets the desired temporal logic formula. Ding et al. [19] also consider a scenario in which UAVs monitor areas with multiple targets and must satisfy a set of high-level and rich LTL task descriptions. They use a receding horizon controller to compute the optimal path of the UAV in a subset of the task space, and provide a framework that ensures that the overall strategy of the system meets the expected LTL description, with control decisions based on local information obtained online. Cai et al. [20] study the online optimal behavior planning problem of autonomous UAVs subject to linear temporal logic constraints. Hard and soft LTL constraints are considered, in which the hard constraints enforce security requirements, while the soft constraints may be relaxed for user-specified tasks that cannot be fully adhered to. A receding horizon controller is synthesized to maximize the cumulative utility over a finite horizon while ensuring that the security constraints, and most of the soft constraints, are satisfied. Ulusoy et al. [21] propose a receding horizon method to control autonomous UAVs to meet rich task descriptions arising from service requests in different areas. The global task description consists of a set of static, pre-defined temporal logic representations of requests, a set of regular expressions for dynamic requests that can only be perceived locally, and a service priority order over the dynamic requests. Tumova et al. [22] study the synthesis of strategies for multi-robot systems to achieve the complex, high-level, long-term goals assigned to each robot; the aim is to reduce computational complexity by decomposing the strategy synthesis problem into a series of short-horizon planning problems solved iteratively at the robots' runtime, and the correctness of the method is discussed under certain assumptions. For deterministic affine systems, Wongpiromsarn et al. [23] propose a receding horizon framework for a class of linear temporal logic specifications that can describe a wide range of properties, including safety, stability, response, and guarantee; it is proved that the composition of the goal generator, trajectory planner, and continuous controller under the corresponding receding horizon framework ensures the correctness of the system. Nenchev et al. [24] solve the control problem of mobile UAVs that must complete a finite task in a partially unknown static environment in minimum time, with the task expressed as a syntactically co-safe linear temporal logic (scLTL) formula.
Instead of abstracting the hybrid system of behaviors in the modeled environment, they propose a method based on the parameterization of continuous behaviors and introduce a violation measure to enforce satisfaction of the specification; a parametric optimal control problem (OCP) is then formulated, which is re-solved in a receding horizon fashion only after previously unknown environment attributes are detected. Wongpiromsarn et al. [25] describe TuLiP, a Python-based (Python 2.7) software toolbox for the synthesis of embedded control software that is provably correct with respect to an expressive subset of linear temporal logic specifications. TuLiP integrates finite-state abstraction of the control system, discrete synthesis from LTL descriptions, and receding horizon planning; it uses the receding horizon framework to decompose the synthesis problem into a series of smaller problems, thereby ensuring correctness while reducing the computational complexity of the synthesis procedure. Shaffer et al. [26,27] propose the hierarchical integration of reactive synthesis, which assures desired system design traits, with dynamic allocation, used for heuristic-based decisions, to manage UAVs fighting a wildfire. Yoo et al. [28] present an algorithm for automatically synthesizing a continuous nonlinear flight controller from a complex temporal logic task specification that can include contingency planning rules. Zhang et al. [17] propose a randomized sampling-based motion planning algorithm with probabilistic guarantees of completeness and optimality, which greatly reduces computation time. Li et al. [29] propose an optimal policy based on model-checking techniques to satisfy complex tasks. Clearly, the above methods are designed for deterministic systems. To the best of our knowledge, an unresolved issue that has not received sufficient attention is online strategy synthesis for nondeterministic systems under probabilistic model checking [30,31,32], which aims to dynamically perform probabilistic behavior modeling, formal verification, and behavior planning.

1.2. Contribution

The specific process of the method proposed in this paper is as follows. First, the global high-level task of the UAV is described using the formal language of linear temporal logic, which can usually be issued by the human collaborators in natural language. Second, the task description is transformed into a deterministic Rabin automaton using the ltl2dstar tool [33,34], and state values are assigned to the automaton based on the distance from its accepted states. Third, the Markov decision process is used to construct the global probabilistic behavior model of the UAV offline, where the transitions triggered by the actions at a state are probabilistic rather than deterministic. Based on this model, a finite state automaton within the finite horizon is dynamically constructed online, and the product system within the horizon is constructed by Cartesian product, in which the expected arrival states within the horizon are set by the minimum state values. Fourth, aiming to maximize the reachability probability of the expected states, a value iteration algorithm is introduced to solve for the optimal behavior plan online within the finite horizon; on this basis, an infinite-horizon planning and execution algorithm is formed. Simulation experiments show that the strategies produced by the online behavior planning are consistent with the task logic at the semantic level, indicating the correctness of the proposed method; moreover, the proposed method can effectively alleviate the state explosion caused by global probabilistic model checking and improve the efficiency of plan generation. Finally, based on the Robot Operating System (ROS) and lidar, a smart UAV capable of simultaneous mapping and navigation is built, and the speech recognition software and the algorithm of this paper are integrated to complete the autonomous navigation of the smart UAV under the voice commands of the human collaborators, which verifies the effectiveness of the method.
The organizational structure of the paper is as follows. In Section 2, some preliminaries on linear temporal logic and probabilistic behavior models are given. In Section 3, a shared autonomy framework for industrial human–UAV collaboration is proposed. In Section 4.1, the global task description for the UAV is first specified; then the probabilistic behavior model of the UAV is established offline, and the Cartesian product system is constructed within the finite time domain. Online behavior planning within the finite time domain and execution across the infinite time domain are presented in Section 4.2 and Section 4.3, respectively. Simulations are conducted in Section 5, and the method is subsequently applied to online human-aware behavior planning for a smart UAV in Section 6. Finally, a brief summary and outlook are given in Section 7.

2. Preliminaries

2.1. Linear Temporal Logic

Linear temporal logic (LTL) is employed to describe the high-level task specifications of UAVs. Roughly, an LTL formula is composed of a set of atomic propositions Π, the standard Boolean operators ¬ (negation), ∨ (disjunction), ∧ (conjunction), and → (implication), and the temporal operators X (next), U (until), □ (always), and ◊ (eventually). The syntax of an LTL formula is as follows:
$$\varphi ::= \top \mid p \mid \varphi_1 \wedge \varphi_2 \mid \neg\varphi \mid \mathsf{X}\,\varphi \mid \varphi_1\,\mathsf{U}\,\varphi_2$$
where $\top \triangleq \mathrm{True}$ and $p \in \Pi$. The semantics of LTL are given over infinite words on the alphabet $2^{\Pi}$. A word satisfies an LTL formula $\varphi$ if $\varphi$ is true at the first position of the word; $\Box\varphi$ means that $\varphi$ is true at all positions of the word; $\Diamond\varphi$ means that $\varphi$ eventually becomes true in the word; $\mathsf{X}\,\varphi$ means that $\varphi$ is true at the next position of the word; and $\varphi_1\,\mathsf{U}\,\varphi_2$ means that $\varphi_1$ has to hold at least until $\varphi_2$ is true. A detailed description of the syntax and semantics of LTL can be found in [10].
To apply these property patterns to UAV missions, let the atomic propositions indicate the most recent waypoint visited by the UAV and consider the patterns in Example 1. Based on these property patterns, a number of useful mission specifications can be developed by an operator.
Example 1.
For instance, an operator might surveil a particular set of ordered waypoints according to a specification formed by a simple ‘sequencing’ property, e.g.,
$$\Diamond\big(w_1 \wedge \Diamond(w_2 \wedge \Diamond\, w_3)\big)$$
Or, the operator might want to surveil one of the waypoints once using a ‘coverage’ property and establish a repeating patrol route for a set of ordered waypoints using a ‘recurrent sequencing’ property, e.g., using the specification
$$\Diamond\, w_2 \wedge \Box\Diamond\big(w_3 \wedge \Diamond(w_1 \wedge \Diamond\, w_5)\big)$$
Alternatively, the operator might require that the UAV never surveil a particular waypoint, e.g.,
$$\Box\, \neg w_1$$
The set of words satisfying LTL descriptions over the atomic proposition set can be represented by a deterministic Rabin automaton (DRA).
Definition 1
(DRA [10]). A DRA is defined as a tuple $\mathcal{R} = (Q_R, q_{0,R}, \Sigma_R, \delta_R, Acc_R)$, where $Q_R$ is a finite set of states; $\Sigma_R$ is an alphabet; $\delta_R \subseteq Q_R \times \Sigma_R \times Q_R$ is a transition function; $q_{0,R} \in Q_R$ is the initial state; and $Acc_R \subseteq 2^{Q_R} \times 2^{Q_R}$ is a set of accepted state pairs, $Acc_R = \{(L_1, K_1), (L_2, K_2), \dots, (L_N, K_N)\}$ with $L_i, K_i \subseteq Q_R$, $i = 1, 2, \dots, N$. An infinite run $q_0 q_1 q_2 \cdots$ of $\mathcal{R}$ is accepted if there is some state pair $(L_i, K_i) \in Acc_R$ such that $\exists n \ge 0, \forall m \ge n: q_m \notin L_i$, and $q_n \in K_i$ for infinitely many $n$. In other words, the run should intersect $L_i$ only finitely many times and $K_i$ infinitely many times.
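To make Definition 1 concrete, the following is a minimal Python sketch of a DRA as a transition table, together with a Rabin acceptance check for ultimately periodic (lasso-shaped) words. The class layout and symbol encoding (symbols as frozensets of atomic propositions) are illustrative assumptions, not the output format of ltl2dstar.

from dataclasses import dataclass

@dataclass
class DRA:
    """Deterministic Rabin automaton (Definition 1); encoding illustrative."""
    states: set      # Q_R
    init: object     # q_{0,R}
    delta: dict      # (state, symbol) -> successor state; symbol in 2^Pi
    acc: list        # [(L_i, K_i), ...], with L_i and K_i subsets of Q_R

    def step(self, q, symbol):
        return self.delta[(q, symbol)]

    def accepts_lasso(self, prefix, cycle):
        """Check acceptance of the ultimately periodic word prefix.cycle^w:
        the states visited infinitely often must avoid some L_i while
        intersecting the matching K_i."""
        q = self.init
        for sym in prefix:
            q = self.step(q, sym)
        boundaries, segments = [q], []
        while True:
            seg = set()
            for sym in cycle:
                q = self.step(q, sym)
                seg.add(q)
            segments.append(seg)
            if q in boundaries:                    # run has become periodic
                i = boundaries.index(q)
                inf = set().union(*segments[i:])   # states visited forever
                return any(not (inf & L) and (inf & K) for L, K in self.acc)
            boundaries.append(q)

Because the automaton is deterministic and finite, the boundary state after reading each copy of the cycle must eventually repeat, at which point the infinitely visited set is exactly the union of the intervening segments.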

2.2. Probabilistic Behavior Model

The Markov decision process is used to model the global behavior of UAVs; it describes the probabilistic and uncertain behavior of the UAV.
Definition 2
(MDP [10]). An MDP is defined as a tuple $\mathcal{M} = (S, \bar{s}, \alpha_M, \delta_M, L)$, where $S$ is a finite set of states; $\bar{s} \in S$ is the initial state; $\alpha_M$ is a set of behaviors; $\delta_M : S \times \alpha_M \to \mathrm{Dist}(S)$ is a probabilistic transition function, with $\mathrm{Dist}(S)$ the set of distributions over $S$; and $L : S \to 2^{AP}$ is a labeling function, which maps each state to a set of atomic propositions.
A transition between states of the MDP involves two steps. First, a behavior is selected from the behavior set $\alpha_M$; the set of behaviors valid at state $s$ is given by $A(s) \triangleq \{a \in \alpha_M \mid \delta_M(s,a)\ \text{is defined}\}$, and the transition generated by selecting a behavior $a \in A(s)$ is uncertain. Second, a successor state $s'$ is selected randomly according to the probability distribution $\delta_M(s, a)$; that is, the probability of transitioning to state $s'$ equals $\delta_M(s, a)(s')$. In deterministic models, by contrast, each selected behavior triggers a unique, certain transition.
An infinite path of the MDP is a state–behavior sequence $\pi = s_0 a_0 s_1 a_1 \cdots$ in which, for all $i \in \mathbb{N}$, $s_i \in S$, $a_i \in A(s_i)$, and $\delta_M(s_i, a_i)(s_{i+1}) > 0$. A finite path $\rho = s_0 a_0 s_1 a_1 \cdots a_{n-1} s_n$ is a prefix of an infinite path that terminates at state $s_n$. $FPath_{\mathcal{M},s}$ and $IPath_{\mathcal{M},s}$ denote, respectively, the sets of finite and infinite paths of $\mathcal{M}$ starting from state $s$. For a finite path $\rho$, $|\rho| = n$ denotes its length and $last(\rho) = s_n$ its final state. For an infinite path $\pi = s_0 a_0 s_1 a_1 \cdots$, its $(i{+}1)$-th state $s_i$ is written $\pi(i)$, and its trace $tr(\pi)$ is the behavior sequence $a_0 a_1 \cdots$.
Definition 3
(Plan [12]). A plan of the MDP is a function $\sigma : FPath_{\mathcal{M}} \to \mathrm{Dist}(\alpha_M)$ such that $\sigma(\rho)(a) > 0$ only if $a \in A(last(\rho))$.
The plan selects a behavior at each state of the MDP so as to resolve the nondeterminism among behaviors. The set of all plans for $\mathcal{M}$ is denoted $Adv_{\mathcal{M}}$.
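The MDP of Definition 2 admits a very direct dictionary encoding, sketched below in Python. The field names and the distribution representation are illustrative assumptions; this encoding is reused by the later sketches in this paper.

import random
from dataclasses import dataclass

@dataclass
class MDP:
    """Probabilistic behavior model of the UAV (Definition 2); illustrative."""
    states: set     # S
    init: object    # s_bar
    actions: set    # alpha_M
    delta: dict     # (s, a) -> {s': probability}, the distribution delta_M(s, a)
    label: dict     # L: s -> set of atomic propositions

    def enabled(self, s):
        """A(s): the behaviors with a transition distribution defined at s."""
        return [a for a in self.actions if (s, a) in self.delta]

    def sample_step(self, s, a, rng=random):
        """Resolve the probabilistic outcome of performing behavior a at s."""
        dist = self.delta[(s, a)]
        return rng.choices(list(dist), weights=list(dist.values()))[0]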

3. Proposed Framework

This section presents the proposed shared autonomy framework for human–UAV collaboration, which is shown in Figure 1 and divided into several phases: task description, task interpretation, probabilistic behavior modeling, product system modeling, plan generation, task execution, and task monitoring.
(1)
Task description aims to formalize the high-level task instructions given by the human collaborator in natural language as an LTL formula.
(2)
Task interpretation is responsible for converting the LTL formula into an automaton, which can be understood by the UAV.
(3)
Probabilistic behavior modeling is used to model the probabilistic behavior of the UAV.
(4)
The probabilistic behavior model is synthesized with a task automaton in the form of a Cartesian product system.
(5)
In the plan generation, the behavior plan is generated with a specific strategy generation algorithm.
(6)
The corresponding task achievement probability is given to the task monitor.
Through task monitoring, the human collaborators can check whether the UAV completes the task as required and thus dynamically adjust the high-level task instruction.

4. Methodology

4.1. Product System Based on Finite State Automaton

4.1.1. Finite State Automaton Within h

Based on the state values obtained from the DRA (transformed from the global LTL task), a finite state automaton is constructed online by graph search, and the accepted states within the horizon are determined.
Definition 4
(FSA$_h$). The finite state automaton $\mathcal{A}_h$ within the time domain $h$ is defined as $\mathcal{A}_h = (Q_A, q_{init,A}, \Sigma_A, \delta_A, F_A)$, where
  • $Q_A \subseteq Q_R$ is a finite set of states;
  • $q_{init,A}$ is the initial state within the time domain;
  • $\Sigma_A = \Sigma_R$;
  • $Q_A^0 = \{q_{init,A}\}$;
  • for all $1 \le j \le h$, $q' \in Q_A^j$ and $(q, \sigma, q') \in \delta_A^j$ if and only if $q \in Q_A^{j-1}$ and $(q, \sigma, q') \in \delta_R$;
  • finally, $Q_A = \bigcup_{0 \le j \le h} Q_A^j$ and $\delta_A = \bigcup_{1 \le j \le h} \delta_A^j$;
  • the local target state $q_{\max} \in Q_A$ satisfies $q_{init,A} \xrightarrow{*} q_{\max} \xrightarrow{*} q_{acc}$ for some accepted state $q_{acc}$, with $\delta(q_{\max}, q_{acc})$ minimal.
FSA$_h$ is not a DRA: it reads finite rather than infinite words, and can therefore be regarded as a directed graph.
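Since FSA$_h$ is simply the $h$-step unfolding of the DRA, it can be built by a bounded breadth-first search. The following is a minimal sketch under the dictionary encoding of $\delta_R$ assumed in the DRA sketch of Section 2.1; for a deterministic automaton, the resulting transition relation can itself be stored as a dictionary.

def build_fsa_h(dra_delta, q_init, h):
    """Bounded BFS unfolding (Definition 4): keep the DRA states and
    transitions reachable from q_init within h steps."""
    layers = [{q_init}]
    delta_A = {}
    for _ in range(h):
        nxt = set()
        for (q, sigma), q2 in dra_delta.items():
            if q in layers[-1]:          # transition leaves the current layer
                delta_A[(q, sigma)] = q2
                nxt.add(q2)
        layers.append(nxt)
    return set().union(*layers), delta_A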
By calculating the shortest distance from a state $q \in Q_R$ to the accepted states, the following distance to completion can be defined for the DRA.
Definition 5
(Distance to completion $d_\varphi$). If $q \notin K_i$ for all $i \in \{1, \dots, N\}$ but there exists an accepted state $q''$ such that $q \xrightarrow{*} q''$ in $Acc_R$, where $*$ denotes a connecting path, then
$$d_\varphi(q) = \min_{q' \in \mathrm{Suc}(q)} \big( d_\varphi(q') + \delta(q, q') \big) \qquad (1)$$
where $\delta(q, q')$ denotes the transition distance from state $q$ to state $q'$. If $q \in K_i$ for some $i \in \{1, \dots, N\}$, then $d_\varphi(q) = 0$. For any other state $q$, $d_\varphi(q) = |Q_R|$, where $|Q_R|$ denotes the total number of states of $\mathcal{R}$.
As can be seen from the above definition, $d_\varphi$ represents the minimum transition distance to an accepted state set $K_i$. If an accepted state set $K_i$ of $Acc_R$ can be reached from $q$, then $d_\varphi(q) \le |Q_R| - 1$. If $q$ is an accepted state, $d_\varphi(q) = 0$. If there is no transition from $q$ to any accepted state set $K_i$ in $Acc_R$, then $d_\varphi(q) = |Q_R|$. The value of $d_\varphi$ can be calculated by the fixed-point iteration method as follows:
$$d_\varphi^{k+1}(q) = \min\Big\{ d_\varphi^{k}(q),\ \min_{q' \in \mathrm{Suc}(q)} \big( d_\varphi^{k}(q') + \delta(q, q') \big) \Big\} \qquad (2)$$
where $d_\varphi^{0}(q) = 0$ if $q \in K_i$ for some $i \in \{1, \dots, N\}$, and $d_\varphi^{0}(q) = |Q_R|$ for every other state $q$.
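The fixed-point iteration of Eq. (2) is straightforward to implement. The following is a minimal sketch that treats the DRA as a directed graph with a successor map suc(q) and unit transition distances; both the interface and the unit-distance choice are assumptions about the encoding, not part of the paper's formulation.

def distance_to_completion(states, suc, accepted):
    """Fixed-point iteration of Eq. (2) with unit transition distances:
    d_phi(q) is the minimum number of transitions from q to an accepted
    state; states that cannot reach one keep the value |Q_R|."""
    INF = len(states)                                        # |Q_R|
    d = {q: (0 if q in accepted else INF) for q in states}   # d_phi^0
    changed = True
    while changed:                                           # iterate to the fixed point
        changed = False
        for q in states:
            if q in accepted:
                continue
            best = min((d[q2] + 1 for q2 in suc(q)), default=INF)
            if best < d[q]:
                d[q] = best
                changed = True
    return d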

4.1.2. Product System Within H

The finite state automaton and its accepted states allow us to define tasks to be completed in time and in order, guiding the UAV's transitions toward completing the global task. We do this by defining a product system, which captures the allowed behavior (state–behavior pairs) of the UAV within the time domain.
Definition 6
(Product system within H). Within the time domain $H$, the product system $\mathcal{P}_H$ of the MDP $\mathcal{M}$ and the finite state automaton $\mathcal{A}_h$ is defined as $\mathcal{P}_H = (Q_P, q_{init,P}, \Sigma_P, \delta_P)$, where we have the following:
  • $Q_P \subseteq S \times Q_A$ is a finite set of states;
  • $q_{init,P} = (s_{init}, q_{init,A})$, where $s_{init}$ is the initial state of the MDP within $H$;
  • $\Sigma_P = \Sigma_A$;
  • $Q_P^0 = \{q_{init,P}\}$;
  • for all $1 \le j \le H$, $(s', q') \in Q_P^j$ and $((s, q), \sigma, (s', q')) \in \delta_P^j$ if and only if $(s, q) \in Q_P^{j-1}$ and there exists $a \in A(s) \subseteq \alpha_M$ with $\delta_M(s, a)(s') > 0$, $\sigma = L(s')$, and $(q, \sigma, q') \in \delta_A$;
  • finally, $Q_P = \bigcup_{0 \le j \le H} Q_P^j$ and $\delta_P = \bigcup_{1 \le j \le H} \delta_P^j$.
The tuple $\mathcal{P}_H$ is a product MDP and can be regarded as a directed graph. The state values of $Q_P$ are inherited from the local automaton:
$$d_\varphi(q_P) = \begin{cases} d_\varphi(q_A), & \text{if } q_P = (s, q_A) \in Q_P, \\ 0, & \text{otherwise.} \end{cases}$$
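A layered construction of $\mathcal{P}_H$ following Definition 6 can be sketched as follows, reusing the illustrative MDP encoding from Section 2.2 and the FSA transition dictionary produced by the build_fsa_h sketch above; all of these interfaces are assumptions.

def build_product(mdp, fsa_delta, s_init, q_init, H):
    """Layered construction of P_H (Definition 6): states reachable from
    (s_init, q_init) in at most H steps, where an MDP move to s' is paired
    with the automaton move on the label sigma = L(s')."""
    layer = {(s_init, q_init)}
    Q_P, delta_P = set(layer), set()
    for _ in range(H):
        nxt = set()
        for (s, q) in layer:
            for a in mdp.enabled(s):
                for s2, p in mdp.delta[(s, a)].items():
                    if p <= 0:
                        continue
                    sigma = frozenset(mdp.label[s2])
                    if (q, sigma) in fsa_delta:        # automaton can follow
                        q2 = fsa_delta[(q, sigma)]
                        delta_P.add(((s, q), a, (s2, q2)))
                        nxt.add((s2, q2))
        Q_P |= nxt
        layer = nxt
    return Q_P, delta_P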

4.1.3. Determining the Target State Within H

As this paper constructs the product system from a local automaton, there may be no accepted states within the current horizon. Therefore, the following method is proposed to determine the local target state for the product system $\mathcal{P}_H$ within the finite horizon. By Definition 6, each state of $\mathcal{P}_H$ carries a distance $d_\varphi$ from the accepted states; the smaller the distance, the closer the UAV is to satisfying the task specification when it reaches that state. The states with the minimum distance in the current horizon, $Q_P^M = \arg\min_{q_P \in Q_P} d_\varphi(q_P)$, are therefore the candidates for the local target state. Because $\mathcal{P}_H$ is obtained as the Cartesian product of the MDP within horizon $H$ and the automaton within horizon $h$, multiple states may share the same minimum distance, but not all of them qualify as the local target state owing to the acceptance condition of the automaton. Thus, the classical Dijkstra algorithm is introduced to find the local target state $Q_P^F \in Q_P^M$ closest to the initial state of $\mathcal{P}_H$. Finally, a plan $\sigma_P^* = \arg\max_{\sigma_P} \Pr^{\max}_{\mathcal{P}_H, q_{init,P}}(\Diamond\, Q_P^F)$ is generated, driving the UAV from the initial state to $Q_P^F$ with maximum probability, as sketched below.
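The target-selection step, combining the arg-min over $d_\varphi$ with Dijkstra's algorithm over the product graph, can be sketched as follows; unit edge weights and the triple encoding of $\delta_P$ from the earlier sketch are assumptions.

import heapq
import itertools

def local_target(Q_P, delta_P, q_init_P, d_phi):
    """Among product states with minimum d_phi, return the one closest
    (in graph distance) to the initial state of the product system."""
    d_min = min(d_phi[q] for q in Q_P)
    candidates = {q for q in Q_P if d_phi[q] == d_min}
    succ = {}
    for (u, _a, v) in delta_P:
        succ.setdefault(u, set()).add(v)
    tie = itertools.count()                       # tie-breaker for the heap
    dist, heap = {q_init_P: 0}, [(0, next(tie), q_init_P)]
    while heap:
        du, _, u = heapq.heappop(heap)
        if u in candidates:
            return u              # first candidate popped is the closest one
        for v in succ.get(u, ()):
            if du + 1 < dist.get(v, float('inf')):
                dist[v] = du + 1
                heapq.heappush(heap, (du + 1, next(tie), v))
    return None                   # no candidate reachable within the horizon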
Example 2.
An example of determining the local target state within the current horizon is shown in Figure 2. The MDP model of the UAV is illustrated in Figure 2a. The LTL task specification is $\varphi \triangleq \neg A \,\mathsf{U}\, (B \wedge \mathsf{X}\mathsf{F}\,C)$, which means that the UAV cannot pass through A on its way to B, and must go to C after arriving at B. The DRA transformed from this task specification is shown in Figure 2b, with initial state 2. When $h = 1$, the DRA can reach states 1 and 3 from the initial state 2. Reaching state 1 means that the task specification is violated. Owing to the previously defined state values, the UAV will not reach A, so the reachable states of the DRA are limited to states 2 and 3. The constructed finite-horizon product system is therefore shown in Figure 2c. According to the defined distance metric, state 3 of the DRA is closer to the accepted state 0 than state 2, so the distance of state 3 is less than that of state 2. In the constructed product system, the states in a same-colored region have the same distance; the distance in the yellow region is less than that in the green region, so the local target state within the current horizon must be determined in the yellow region. The initial state of the current product system is (1, 2), and the state in the yellow region closest to the initial state is (3, 3). Thus, state (3, 3) is determined as the target state of the current product system, indicating that within the current horizon the UAV can satisfy part of the task specification $\varphi$ and reach B without passing through A.
When $h = 2$, states 1, 3, and 0 can be reached from the initial state 2 of the DRA. As before, reaching state 1 violates the task specification, and owing to the previously defined state values the UAV will not reach A, so the reachable states of the DRA are 2, 3, and 0. The constructed finite-horizon product system is therefore shown in Figure 2d, in which the states in the blue region have the minimum distance; that is, the target state within the current horizon is determined in the blue region. The state closest to the initial state (1, 2) is (4, 0), so the target state within the current horizon is (4, 0). This means that the UAV can satisfy the task specification $\varphi$ within the current horizon.

4.2. Behavior Planning Within the Time Domain H

The value iteration algorithm is used to obtain an optimal plan $\sigma_{H,\max}$ of the product system $\mathcal{P}_H$ that maximizes the cumulative probability of reaching the desired target state. The specific formula is as follows:
$$\sigma_{H,\max}(s) \triangleq \arg\max_{a \in \alpha_M} \sum_{s' \in S} \delta_M(s, a)(s') \cdot \Pr^{\max}_{s'}(\mathrm{reach}\ q_{\max})$$
The reverse value iteration and the plan generation process can be found in ref. [12]. Algorithm 1 shows the procedure of online behavior planning within the finite time domain.
Algorithm 1 Procedure of short_horizon_plan.
  • Input: the MDP $\mathcal{M}$ of the UAV, the deterministic Rabin automaton $\mathcal{R}$, the current state $s$ of the UAV, the current state $q$ of the automaton, and the finite horizons $h, H \in \mathbb{N}$.
    Output: the optimal behavior plan $\sigma_{H,\max}$ maximizing the probability of task satisfiability within the horizon.
1: Initialize $q_{init,P} = (s, q)$;
2: Construct the finite state automaton $\mathcal{A}_h$ and calculate the local target state $q_{\max}$ (Definition 4);
3: Construct the product system $\mathcal{P}_H$ (Definition 6);
4: Generate the optimal behavior plan $\sigma_{H,\max}$ within the horizon based on value iteration.
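A compact sketch of step 4 follows: value iteration for the maximum probability of reaching the local target, and extraction of the greedy plan. The interface functions actions(q) and trans_prob(q, a), returning the enabled behaviors and the successor distribution over product states, are assumptions (they can be derived from the delta_P triples built earlier), and successors are assumed to lie within Q_P.

def max_reach_plan(Q_P, actions, trans_prob, target, eps=1e-6):
    """Value iteration for Pr^max(reach target):
    x(q) <- max_a sum_{q'} P(q, a, q') * x(q'), with x fixed to 1 on target."""
    x = {q: (1.0 if q in target else 0.0) for q in Q_P}
    while True:
        diff = 0.0
        for q in Q_P:
            if q in target:
                continue
            best = max((sum(p * x[q2] for q2, p in trans_prob(q, a).items())
                        for a in actions(q)), default=0.0)
            diff = max(diff, abs(best - x[q]))
            x[q] = best
        if diff < eps:                       # convergence threshold reached
            break
    # greedy plan: the behavior attaining the maximum expected value at q
    plan = {q: max(actions(q),
                   key=lambda a, q=q: sum(p * x[q2]
                                          for q2, p in trans_prob(q, a).items()))
            for q in Q_P if q not in target and actions(q)}
    return x, plan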

4.3. Task Execution Across Infinite Time

In order to meet the global task description, after the UAV obtains the optimal behavior plan within the horizon $H$, it executes according to that plan. When it reaches the expected state within the horizon, it takes the reached expected state as the initial state of a new horizon and replans the optimal behavior plan for the next cycle. The process repeats until the UAV meets the global task description. Algorithm 2 shows this infinite-horizon iteration.
Algorithm 2 Infinite horizon planning and execution.
  • Input: the state set $S$ of the UAV, the behavior set $\alpha_M$, the distribution $Dist(S)$, the alphabet $AP$, the labeling function $L$, and the task description $\varphi$.
    Output: Null.
1: Initialize the horizons $h, H \in \mathbb{N}$;
2: Use the ltl2dstar tool to transform $\varphi$ into the deterministic Rabin automaton $\mathcal{R}$, and calculate the state values (Definition 1);
3: Call short_horizon_plan (Algorithm 1) with the current UAV state and automaton state, and execute the resulting plan $\sigma_{H,\max}$ until the expected state within the horizon is reached;
4: Take the reached expected state as the initial state of the next cycle and return to step 3.
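The receding-horizon loop of Algorithm 2 can be sketched as follows. Here short_horizon_plan is assumed to wrap Algorithm 1 (composing the earlier sketches) and to return both the state–behavior mapping and the local target, while mdp, dra, and d_phi follow the earlier illustrative encodings.

def plan_and_execute(mdp, dra, d_phi, h, H):
    """Receding-horizon execution (Algorithm 2, sketch): replan each time
    the expected state within the horizon is reached."""
    s, q = mdp.init, dra.init
    while d_phi[q] != 0:                 # global task description not yet met
        sigma, target = short_horizon_plan(mdp, dra, s, q, h, H)
        while (s, q) != target and (s, q) in sigma:
            a = sigma[(s, q)]            # optimal behavior at the current state
            s = mdp.sample_step(s, a)    # probabilistic actuator outcome
            q = dra.step(q, frozenset(mdp.label[s]))
        # the reached state seeds the next planning cycle; for recurrence
        # tasks (e.g., GF C) the outer loop is simply kept running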

5. Simulation and Analysis

5.1. Settings

The correctness and effectiveness of the proposed method are verified by simulation. A UAV is required to operate in a search-and-rescue scenario in a gridded mountainous area according to the instructions of human collaborators. Due to the complexity of the environment and the task, it is assumed that the UAV has actuator deviations. The UAV automatically leaves its initial location and successively visits positions A, B, and C for search tasks; it then delivers supplies to the target area. The simulations below are all carried out on a computer with a Core i7-1065G7 CPU.

5.2. Correctness and Effectiveness

Before running the full simulation, we verify the correctness and efficiency of the proposed method on some simple LTL operators. The test case is shown in Figure 3.
Firstly, the LTL task specification of the UAV is set as $\varphi_1 \triangleq \neg A\,\mathsf{U}\,C$, which means that 'the UAV avoids reaching location A until it reaches location C', and the DRA generated by the ltl2dstar tool is shown in Figure 4. As can be seen from the figure, the automaton has three states, of which the initial state is state 0 and the accepted state is state 2. All states can be reached from the initial state in one step, i.e., $h = 1$; the simulation parameters set for $\varphi_1$ are shown in Table 1.
The deterministic plans generated within different time domains are shown in Figure 5.
The related time consumption and the size of the generated state space are shown in Figure 6. It can be seen from the figures that the proposed method can generate the correct plans for the UAV, and can effectively improve the timeliness of plan generation compared with the classical probabilistic model checking method.
Secondly, the LTL task specification of the UAV is set as $\varphi_2 \triangleq \mathsf{G}(A \rightarrow \mathsf{X}\,\neg A)$, which means that 'whenever the UAV visits location A, it avoids that location at the next step', and the generated DRA is shown in Figure 7.
The deterministic plans generated within different horizons are shown in Figure 8, and the simulation parameters set for φ 2 are shown in Table 2.
The related time consumption and the size of the generated state space are shown in Figure 9. It can be seen that the proposed method can generate the correct plan for the UAV, and can effectively improve the efficiency of plan generation compared with the global calculation method.
Thirdly, the LTL task specification of the UAV is set as $\varphi_3 \triangleq \mathsf{G}\,\neg A \wedge \mathsf{G}\mathsf{F}\,C$, which means that 'the UAV always avoids location A and repeatedly reaches location C', and the generated DRA is shown in Figure 10.
The deterministic plans generated within different time domains are shown in Figure 11, and the simulation parameters set for φ 3 are shown in Table 3.
The related time consumption and the size of the generated state space are shown in Figure 12. It can be seen from the figure that the same conclusions as above can be drawn.

5.3. Result Analysis

In this subsection, according to the simulation settings in Section 5.1, the top-level task specification of the UAV is described by the following LTL formula:
$$\varphi_1 \triangleq \mathsf{F}\big(A \wedge \mathsf{X}\,\mathsf{F}\big(B \wedge \mathsf{X}\,\mathsf{F}\big(C \wedge \mathsf{X}\,\mathsf{F}\big(D \wedge \mathsf{X}\,\mathsf{F}\,E\big)\big)\big)\big) \qquad (9)$$
Then, it can be transformed into a deterministic Rabin automaton. Subsequently, the probabilistic behavior model of the UAV is constructed. The set of behaviors that the UAV can choose to perform at each grid of the mountainous area is $\alpha_M = \{\uparrow, \downarrow, \leftarrow, \rightarrow\}$. It is assumed that these behaviors can be realized by the underlying controller of the UAV, so that the UAV can move between adjacent grids. In addition, we assume an actuator bias: for example, when the UAV performs the behavior '←' at state $s_9$, it arrives at state $s_{10}$ with probability 0.8 and at states $s_6$ and $s_{15}$ with probability 0.1 each. When the path between two states is blocked, or there is an impassable state in a crossed grid, the execution probability of the current behavior of the UAV is 1.0 and 0.9, respectively. Moreover, the atomic labels A, B, C, D, and E are set at states $s_{16}$, $s_{25}$, $s_{19}$, $s_5$, and $s_4$, respectively.
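For concreteness, the actuator-bias distribution described above can be written directly in the dictionary encoding of the MDP sketched in Section 2.2; the state and behavior names mirror the text, while the encoding itself is illustrative.

# Biased '<-' behavior at s9: intended successor s10 with probability 0.8,
# lateral deviations to s6 and s15 with probability 0.1 each.
delta_M = {
    ('s9', 'left'): {'s10': 0.8, 's6': 0.1, 's15': 0.1},
}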
Then, Algorithm 1 is used to generate the behavior plans within the finite time domain. Here, the time domain of the automaton is set to $h = 1$ and the time domain of the MDP to $H = 3$. The finite state automata and the product systems within the time domain are then constructed successively. Figure 13 shows the directed graph representation of the product system in the first planning cycle, whose starting node is $q_{init,P} = (s_9, 6)$; the value of the expected state within the time domain is 0.5.
Based on the product system constructed within the finite time domain, the value iteration algorithm is used to obtain the optimal plan maximizing the cumulative probability of reaching the desired states, where the iterative convergence threshold is set to $\varepsilon \triangleq 10^{-6}$; the generated behavior plans within the finite time domain are shown in Figure 14. In the figure, the states of the MDP and the optimal state–behavior pairs corresponding to the product system within the time domain $H = 3$ are marked with a red polygon box and red arrows, respectively. The optimal plan for the UAV to reach the desired state within the time domain $h = 1$, planned from the initial state $s_9$, is therefore $\sigma_{\max} = s_9 \rightarrow s_{10}$, with corresponding optimal behavior values at $s_9$ and $s_{10}$ of about 0.494327 and 0.494450, respectively. Intuitively interpreting the temporal logic of LTL Formula (9), the primary task performed by the UAV is '$\mathsf{F}\,A$' (that is, starting from the initial state $s_9$, reach A). It can be seen that the results of the online behavior planning are consistent with the task logic at the semantic level, indicating the correctness of the finite-time-domain planning method.
Finally, assuming no environmental disturbances and no actuator deviation of the UAV, we use Algorithm 2 to plan the global behaviors of the UAV. Figure 15 shows the results of dynamic modeling, planning, and unbiased execution of the UAV according to LTL Formula (9) from the initial state $s_9$, and marks the corresponding behavior return values, in which the blue letter–number–arrow marks indicate the planning results of the initial points within the horizon, and the red letter–number–arrow marks indicate the planning results of the other states within the horizon. Similarly, a visual analysis of the temporal relationships of the tasks shows that the results of dynamic modeling and planning are consistent with the temporal logic of the LTL formulas, and the planned behaviors within a single horizon are optimal compared with the other behaviors. To sum up, the proposed method under the framework of probabilistic model checking can effectively realize the dynamic modeling of UAVs and online behavior planning.
To further demonstrate the efficiency improvement of the proposed method, the time consumption of plan generation and the size of the generated state space within different horizons are analyzed and compared with the infinite-horizon plan and the sampling-based method [17]. The results, presented in Figure 16 and Figure 17, show that the proposed method can effectively alleviate the state explosion caused by probabilistic model checking and improve the efficiency of plan generation compared with the infinite-time-domain plan, although its computation time is slightly longer than that of the sampling-based method.
To further verify the correctness of the proposed method, in addition to task specification (9), some complex task specifications with different structures are considered as follows. For reasons of space, the corresponding DRAs of task specifications $\varphi_2$ and $\varphi_3$ are not provided here:
$$\varphi_2 \triangleq \mathsf{F}\big(A \wedge \mathsf{X}\big((\neg B\,\mathsf{U}\,C) \wedge \mathsf{X}\,\mathsf{F}(C \wedge \mathsf{X}\,\mathsf{F}\,D)\big)\big)$$
$$\varphi_3 \triangleq \mathsf{F}\big(A \wedge \mathsf{F}\,B \wedge \mathsf{F}\,C\big) \wedge \big(\neg B\,\mathsf{U}\,C\big)$$
Then, the determination of goal states for task specifications containing different types of operators within different horizons is tested; that is, strategies are synthesized under different $H$ and $h$ and compared with global strategy synthesis. For the task specification $\varphi_2$, the product automata are constructed in 0.00914 s and 0.01283 s under $H = 3, h = 1$ and $H = 6, h = 2$, respectively, and the product systems are constructed in 4.1822 s and 6.3167 s, which is faster than global synthesis (0.02732 s for the product automaton and 11.6418 s for the product system). The strategy synthesis times are 2.0745 s, 4.2838 s, and 10.6219 s, respectively. The strategies synthesized with the maximum probability of satisfying $\varphi_2$ under $H = 3, h = 1$ and $H = 6, h = 2$ are shown in Figure 18a and Figure 18b, respectively, while the globally synthesized strategy is shown in Figure 18c.
As can be seen from the above figures, the strategies synthesized by the proposed method are correct under different time horizons, as they move the UAV toward satisfying the task specification $\varphi_2$, the same as global synthesis. For the task specification $\varphi_3$, the product automata are constructed in 0.01223 s and 0.01351 s under $H = 4, h = 1$ and $H = 6, h = 2$, respectively, and the product systems are constructed in 6.3641 s and 9.1905 s, which is much shorter than global synthesis, where constructing the product automaton and the product system takes 0.17683 s and 13.75924 s. The strategy synthesis times are 5.6602 s, 8.0493 s, and 12.4981 s, respectively. The strategies synthesized with the maximum probability of satisfying $\varphi_3$ under $H = 4, h = 1$ and $H = 6, h = 2$ are shown in Figure 19a and Figure 19b, respectively, while the globally synthesized strategy is shown in Figure 19c.
As can be seen from the above figures, the strategy synthesized under $H = 4, h = 1$ is not optimal due to its short horizon, while the strategy synthesized under $H = 6, h = 2$ is the same as that of global synthesis. Nevertheless, all of the above strategies are correct, as they guide the UAV toward satisfying the task specification $\varphi_3$.

6. Experiments

In this section, the proposed method is applied to the developed ROS-based hardware-in-the-loop simulation system shown in Figure 20 and Figure 21, where the environment with the task areas is constructed in Unreal Engine and the dynamics of the UAV are realistically modeled. The human collaborator injects the top-level task instruction at the ground control station, and a behavior plan is generated online. On this basis, the collaborator can adjust the task at any time, thus providing real-time supervision and intervention. We use the remote control PC to log in to the onboard MiniPC (here, a Rockchip 3588) via the router, use the voice assistant toolkit (xfei-asr) to receive and translate the collaborator's voice commands, and then send the generated LTL-form commands to the onboard MiniPC. The onboard MiniPC, as the host computer, runs ROS. Its main functions are as follows: (i) using the package task-generator to convert the LTL-form instructions into a task automaton that the UAV can understand within the finite horizon; (ii) according to the current task progress and the expected task goal within the horizon, using the package plan-generator to compute the behavior plan over the discretized workspace within the horizon; (iii) after receiving the odometry and sensor data fed back by the lower computer, combining the current pose of the UAV with the current state–behavior mapping of the behavior plan and using the package local-planner to compute the next path planning result; and (iv) mapping the planning result to linear and angular velocity commands and feeding them back to the lower computer (here, a Pixhawk). After receiving the linear and angular velocities from the upper computer, the lower computer converts them into the PWM values of the drive motors; at the same time, it converts the encoder information into the linear and angular velocity of the control point and sends it to the upper computer.
For the hardware-in-the-loop experiment, we adopted the task $\varphi_1$ and the scenario described above, where the UAV mainly consists of three parts: the mission computing board running ROS, the flight control board set to HIL mode with a GPS module, and the real dynamic models in the simulation environment. The Rockchip 3588 establishes serial communication with the Pixhawk autopilot to send the target positions through the MAVROS package, and the Pixhawk autopilot connects to the workstation via Universal Serial Bus to send the motor drive signals of the UAV, computed from the target positions, to the simulation environment. The UAV communicates wirelessly with the computer via ROS. Under different horizons, the task plans are generated according to the $\varphi_1$ given by the human collaborator at the ground control station and then deployed to the UAV, which starts to execute the task according to the plans. The process of task execution is shown in Figure 22; as can be seen from the figure, under both horizon settings $H = 3, h = 1$ and $H = 4, h = 2$, the UAV completes the task $\varphi_1$ correctly, reaching the corresponding task areas in time.
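As a flavor of the onboard interface, a minimal rospy sketch of the setpoint forwarding is given below. It assumes a standard MAVROS installation (the /mavros/setpoint_position/local topic with geometry_msgs/PoseStamped messages) and that mode switching and arming are handled elsewhere; the node name and target coordinates are hypothetical.

#!/usr/bin/env python
import rospy
from geometry_msgs.msg import PoseStamped

def send_waypoint(x, y, z):
    """Continuously stream one planned waypoint to the autopilot via MAVROS."""
    pub = rospy.Publisher('/mavros/setpoint_position/local',
                          PoseStamped, queue_size=10)
    rate = rospy.Rate(20)                 # setpoints must be streamed steadily
    sp = PoseStamped()
    sp.pose.position.x, sp.pose.position.y, sp.pose.position.z = x, y, z
    while not rospy.is_shutdown():
        sp.header.stamp = rospy.Time.now()
        pub.publish(sp)
        rate.sleep()

if __name__ == '__main__':
    rospy.init_node('waypoint_forwarder')  # hypothetical node name
    send_waypoint(1.0, 2.0, 1.5)           # hypothetical target position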

7. Conclusions

In this paper, by introducing the receding horizon mechanism into probabilistic model checking, an online human-aware behavior planning method for nondeterministic systems is proposed to make the UAV dynamically satisfy top-level LTL task descriptions from human collaborators. Experiments and analysis show that the results of the online behavior planning are consistent with the task logic at the semantic level. Moreover, the proposed method can effectively alleviate the state explosion caused by global probabilistic model checking.
In follow-up work, the theoretical results will be further refined, and the efficiency gains in computational performance will be analyzed in depth. In addition, the proposed method can be applied to broader real industrial human–UAV collaboration settings.

Author Contributions

Conceptualization, J.Z.; methodology, J.Z.; software, J.Z.; validation, P.W.; formal analysis, J.Z.; investigation, Y.P.; resources, Y.P.; data curation, P.W.; writing—original draft preparation, J.Z.; writing—review and editing, J.Z.; visualization, J.Z.; supervision, Q.Y.; project administration, J.Z.; funding acquisition, P.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Innovation Program of Hunan Province, grant number 2025RC3130.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

We would like to thank Li for providing technical support for this research.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Su, Y.; Huang, L.; LiWang, M. Joint Power Control and Time Allocation for UAV-Assisted IoV Networks Over Licensed and Unlicensed Spectrum. IEEE Internet Things J. 2024, 11, 1522–1533. [Google Scholar] [CrossRef]
  2. He, Y.; Huang, F.; Wang, D.; Chen, B.; Li, T.; Zhang, R. Performance Analysis and Optimization Design of AAV-Assisted Vehicle Platooning in NOMA-Enhanced Internet of Vehicles. IEEE Trans. Intell. Transp. Syst. 2025, 26, 8810–8819. [Google Scholar] [CrossRef]
  3. Ji, Z.; Liu, Q.; Xu, W.; Liu, Z.; Yao, B.; Xiong, B.; Zhou, Z. Towards Shared Autonomy Framework for Human-Aware Motion Planning in Industrial Human-Robot Collaboration. In Proceedings of the 2020 IEEE 16th International Conference on Automation Science and Engineering (CASE), Hong Kong, China, 20–21 August 2020. [Google Scholar]
  4. Liu, Z.; Shi, Z.; Liu, W.; Zhang, L.; Wang, R. Integrated Optimization of Ground Support Systems and UAV Task Planning for Efficient Forest Fire Inspection. Drones 2025, 9, 684. [Google Scholar] [CrossRef]
  5. Alami, R.; Chatila, R.; Clodic, A.; Fleury, S.; Herrb, M.; Montreuil, V.; Sisbot, E.A. Towards human-aware cognitive robots. In Proceedings of the Fifth International Cognitive Robotics Workshop, Washington, DC, USA, 17–19 July 2006. [Google Scholar]
  6. Wu, W.; Chang, T.; Li, X.; Yin, Q.; Hu, Y. Vision-Language Navigation: A Survey and Taxonomy. arXiv 2022, arXiv:2108.11544. [Google Scholar] [CrossRef]
  7. Lasota, P.A.; Shah, J.A. Analyzing the effects of human-aware motion planning on close-proximity human–robot collaboration. Hum. Factors 2015, 57, 21–33. [Google Scholar] [CrossRef] [PubMed]
  8. Aigner, P.; McCarragher, B. Human integration into robot control utilising potential fields. In Proceedings of the International Conference on Robotics and Automation, Albuquerque, NM, USA, 25 April 1997; Volume 1, pp. 291–296. [Google Scholar]
  9. Crandall, J.W.; Goodrich, M.A. Characterizing efficiency of human robot interbehavior: A case study of shared-control teleoperation. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Lausanne, Switzerland, 30 September–4 October 2002; Volume 2, pp. 1290–1295. [Google Scholar]
  10. Baier, C.; Katoen, J.-P. Principles of Model Checking; The MIT Press: Cambridge, MA, USA, 2008. [Google Scholar]
  11. Pronk, C. Model Checking, the technology and the tools. In Proceedings of the 2012 International Conference on System Engineering and Technology (ICSET), Bandung, Indonesia, 11–12 September 2012; pp. 1–2. [Google Scholar] [CrossRef]
  12. Forejt, V.; Kwiatkowska, M.; Norman, G.; Parker, D. Automated Verification Techniques for Probabilistic Systems. In Formal Methods for Eternal Networked Software Systems: 11th International School on Formal Methods for the Design of Computer, Communication and Software Systems, SFM 2011, Bertinoro, Italy, 13–18 June 2011; Advanced Lectures; Bernardo, M., Issarny, V., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6659, pp. 53–113. [Google Scholar]
  13. Kantaros, Y.; Zavlanos, M.M. A Temporal Logic Optimal Control Synthesis Algorithm for Large-Scale Multi-Robot Systems. Int. J. Robot. Res. 2020, 39, 812–836. [Google Scholar] [CrossRef]
  14. Kloetzer, M.; Belta, C. A Fully Automated Framework for Control of Linear Systems from Temporal Logic Specifications. IEEE Trans. Autom. Control 2008, 53, 287–297. [Google Scholar] [CrossRef]
  15. Bhatia, A.; Maly, M.R.; Kavraki, L.E.; Vardi, M.Y. Behavior planning with Complex Goals. IEEE Robot. Autom. Mag. 2011, 18, 55–64. [Google Scholar] [CrossRef]
  16. Kress-Gazit, H.; Fainekos, G.E.; Pappas, G.J. Temporal-Logic-Based Reactive Mission and Behavior planning. IEEE Trans. Robot. 2009, 25, 1370–1381. [Google Scholar] [CrossRef]
  17. Zhang, Z.; Du, R.; Cowlagi, R.V. Randomized Sampling-Based Trajectory Optimization for UAVs to Satisfy Linear Temporal Logic Specifications. Aerosp. Sci. Technol. 2020, 96, 105591. [Google Scholar] [CrossRef]
  18. Ding, X.; Lazar, M.; Belta, C. LTL receding horizon control for finite deterministic systems. Automatica 2014, 50, 399–408. [Google Scholar] [CrossRef]
  19. Ding, X.C.; Belta, C.; Cassandras, C.G. Receding horizon surveillance with temporal logic specifications. In Proceedings of the 49th IEEE Conference on Decision and Control (CDC), Atlanta, GA, USA, 15–17 December 2010; pp. 256–261. [Google Scholar]
  20. Cai, M.; Peng, H.; Li, Z.; Gao, H.; Kan, Z. Receding Horizon Control Based Online Behavior planning with Partially Infeasible LTL Specifications. arXiv 2021, arXiv:2007.12123. [Google Scholar]
  21. Ulusoy, A.; Belta, C. Receding horizon temporal logic control in dynamic environments. Int. J. Robot. Res. 2014, 33, 1593–1607. [Google Scholar] [CrossRef]
  22. Tumova, J.; Dimarogonas, D.V. A Receding Horizon Approach to Multi-Robot Planning from Local LTL Specifications. arXiv 2020, arXiv:1403.4174. [Google Scholar]
  23. Wongpiromsarn, T.; Topcu, U.; Murray, R.M. Receding horizon control for temporal logic specifications. In Proceedings of the 13th ACM International Conference on Hybrid Systems: Computation and Control—HSCC’10, Stockholm, Sweden, 12–15 April 2010. [Google Scholar]
  24. Nenchev, V.; Belta, C. Receding horizon robot control in partially unknown environments with temporal logic constraints. In Proceedings of the 2016 European Control Conference (ECC), Aalborg, Denmark, 29 June–1 July 2016; pp. 2614–2619. [Google Scholar]
  25. Wongpiromsarn, T.; Topcu, U.; Ozay, N.; Xu, H.; Murray, R.M. TuLiP: A software toolbox for receding horizon temporal logic planning. In Proceedings of the 14th International Conference on Hybrid Systems: Computation and Control—HSCC’11, Chicago, IL, USA, 12–14 April 2011. [Google Scholar]
  26. Shaffer, J.A.; Carrillo, E.; Xu, H. Hierarchal Application of Receding Horizon Synthesis and Dynamic Allocation for UAVs Fighting Fires. IEEE Access 2018, 6, 78868–78880. [Google Scholar] [CrossRef]
  27. Shaffer, J.; Carrillo, E.; Xu, H. Receding Horizon Synthesis and Dynamic Allocation of UAVs to Fight Fires. In Proceedings of the IEEE Workshop on Advanced Robotics and its Social Impacts, Copenhagen, Denmark, 21–24 August 2018. [Google Scholar]
  28. Yoo, C.; Fitch, R.; Sukkarieh, S. Online Task Planning and Control for Fuel-Constrained Aerial Robots in Wind Fields. Int. J. Robot. Res. 2016, 35, 438–453. [Google Scholar] [CrossRef]
  29. Li, J.; Cai, M.; Kan, Z.; Xiao, S. Model-Free Reinforcement Learning for Motion Planning of Autonomous Agents with Complex Tasks in Partially Observable Environments. Auton. Agents Multi-Agent Syst. 2024, 38, 14. [Google Scholar] [CrossRef]
  30. Tumova, J.; Dimarogonas, D.V. Multi-robot planning under local LTL specifications and event-based synchronization. Automatica 2016, 70, 239–248. [Google Scholar] [CrossRef]
  31. Liu, Y.; Er, M.J.; Guo, C. Online time-optimal path and trajectory planning for robotic multipoint assembly. Assem. Autom. 2021, 41, 601–611. [Google Scholar] [CrossRef]
  32. Cai, M.; Zhou, Z.; Li, L.; Xiao, S.; Kan, Z. Reinforcement learning with soft temporal logic constraints using limit-deterministic generalized Büchi automaton. J. Autom. Intell. 2025, 4, 39–51. [Google Scholar] [CrossRef]
  33. Babiak, T.; Blahoudek, F.; Křetínský, M.; Strejček, J. Effective Translation of LTL to Deterministic Rabin Automata: Beyond the (F,G)-Fragment; Springer International Publishing: Cham, Switzerland, 2013. [Google Scholar]
  34. Boker, U.; Lehtinen, K.; Sickert, S. On the Translation of Automata to Linear Temporal Logic; Springer International Publishing: Cham, Switzerland, 2022. [Google Scholar]
Figure 1. The schematic diagram of the proposed framework.
Figure 2. Examples of the local target state determination. (a) MDP. (b) DRA. (c) Local target state determination when H = 4, h = 1. (d) Local target state determination when H = 4, h = 2.
Figure 3. The probabilistic behavior model of the UAV, in which s1 denotes the initial location of the UAV, the characters A, B, C indicate target locations, and the gray arrows indicate connectivity between states. The execution probabilities of the UAV behaviors are 0.9 and 0.1, respectively.
Figure 4. DRA corresponding to $\varphi_1 \triangleq \neg A\,\mathsf{U}\,C$.
Figure 5. The behavior plans indicated by the red arrows, which satisfy the top-level LTL task description $\varphi_1 \triangleq \neg A\,\mathsf{U}\,C$.
Figure 6. The performance comparison.
Figure 7. DRA corresponding to $\varphi_2 \triangleq \mathsf{G}(A \rightarrow \mathsf{X}\,\neg A)$.
Figure 8. The behavior plans indicated by the red arrows, which satisfy the top-level LTL task description $\varphi_2 \triangleq \mathsf{G}(A \rightarrow \mathsf{X}\,\neg A)$.
Figure 9. The performance comparison.
Figure 10. DRA corresponding to $\varphi_3 \triangleq \mathsf{G}\,\neg A \wedge \mathsf{G}\mathsf{F}\,C$.
Figure 11. The behavior plans indicated by the red arrows, which satisfy the top-level LTL task description $\varphi_3 \triangleq \mathsf{G}\,\neg A \wedge \mathsf{G}\mathsf{F}\,C$.
Figure 12. The performance comparison.
Figure 13. Directed graph representation of the product system in the first planning cycle.
Figure 14. Plan generation in the first planning cycle. (a) The MDP corresponding to the product system. (b) The optimal state–behavior pairs in the first iteration within the time domain.
Figure 15. Full-cycle behavior planning and execution process (when the actuator is unbiased).
Figure 16. The performance comparison. (a) Time consumption of building product automata (seconds). (b) Time consumption of building the product system (seconds). (c) Time consumption of policy generation (seconds).
Figure 17. Comparison of the size of the generated state space.
Figure 18. The strategy synthesized for $\varphi_2$.
Figure 19. The strategy synthesized for $\varphi_3$.
Figure 20. The framework of the hardware-in-the-loop experiment.
Figure 21. The functional composition of the flight experiment system.
Figure 22. The process of task execution.
Table 1. The simulation parameters for $\varphi_1 \triangleq \neg A\,\mathsf{U}\,C$.

        Group 1    Group 2    Group 3
h       1          1          ∞
H       2          3          ∞

Table 2. The simulation parameters for $\varphi_2 \triangleq \mathsf{G}(A \rightarrow \mathsf{X}\,\neg A)$.

        Group 1    Group 2    Group 3
h       1          2          ∞
H       1          2          ∞

Table 3. The simulation parameters for $\varphi_3 \triangleq \mathsf{G}\,\neg A \wedge \mathsf{G}\mathsf{F}\,C$.

        Group 1    Group 2    Group 3
h       1          2          ∞
H       2          4          ∞
