Drone-Based Autonomous Motion Planning System for Outdoor Environments under Object Detection Uncertainty

Recent advances in autonomy of unmanned aerial vehicles (UAVs) have increased their use in remote sensing applications, such as precision agriculture, biosecurity, disaster monitoring, and surveillance. However, onboard UAV cognition capabilities for understanding and interacting in environments with imprecise or partial observations, for objects of interest within complex scenes, are limited, and have not yet been fully investigated. This limitation of onboard decision-making under uncertainty has delegated the motion planning strategy in complex environments to human pilots, which rely on communication subsystems and real-time telemetry from ground control stations. This paper presents a UAV-based autonomous motion planning and object finding system under uncertainty and partial observability in outdoor environments. The proposed system architecture follows a modular design, which allocates most of the computationally intensive tasks to a companion computer onboard the UAV to achieve high-fidelity results in simulated environments. We demonstrate the system with a search and rescue (SAR) case study, where a lost person (victim) in bushland needs to be found using a sub-2 kg quadrotor UAV. The navigation problem is mathematically formulated as a partially observable Markov decision process (POMDP). A motion strategy (or policy) is obtained once a POMDP is solved mid-flight and in real time using augmented belief trees (ABT) and the TAPIR toolkit. The system’s performance was assessed using three flight modes: (1) mission mode, which follows a survey plan and used here as the baseline motion planner; (2) offboard mode, which runs the POMDP-based planner across the flying area; and (3) hybrid mode, which combines mission and offboard modes for improved coverage in outdoor scenarios. Results suggest the increased cognitive power added by the proposed motion planner and flight modes allow UAVs to collect more accurate victim coordinates compared to the baseline planner. Adding the proposed system to UAVs results in improved robustness against potential false positive readings of detected objects caused by data noise, inaccurate detections, and elevated complexity to navigate in time-critical applications,


Introduction
The elevated number of stranded people and human loss caused by natural disasters, weather events, crime, and military conflicts is an ever-present issue [1]. According to the Australian Institute of Criminology, an average of 38,000 people are reported missing per year in Australia [2]. Unfortunately, around 2% of them (or 720 persons) are never found, and 2600 are reported (long-term) missing for more than three months. Thus, it is beneficial to improve search and rescue (SAR) efforts by developing technology that cooperates with first responders to locate as many victims as soon as an emergency situation is declared.
Recent advances in autonomy of unmanned aerial vehicles (UAVs)-or droneshave increased their adoption in time-critical applications, such as disaster monitoring, surveillance, and SAR [3][4][5][6]. Compared to manned aircraft, UAVs offer high versatility to scout areas due to their reduced size and cost, extensive payload adaptability, and ease of piloting by incorporating automated tasks, including waypoint navigation, obstacle avoidance, and autonomous take-off and landing [7][8][9]. In time-critical applications, such as SAR, real-time camera streaming is critical for UAV operators to understand the situation context and manoeuvre the aircraft during the mission [10]. However, this strong reliance in the communication system compromises the usability of UAVs if such systems fail [11]. Furthermore, operators are susceptible to fatigue in the long run by controlling the UAV and photo-interpreting streamed camera frames to find victims [12]. As a result, developing decision-making algorithms onboard UAVs could further expand their applicability in civilian and time-critical applications. UAVs enriched with autonomous onboard object detection for real-time decision making eliminate the dependency of UAV pilots to photointerpret streamed frames, and in their communication systems to control the aircraft.
Real-world and time-critical UAV applications are full of uncertainties. As shown in Figure 1, uncertainty comes from uncontrolled external factors, such as unknown environment conditions, strong wind gusts, dynamic obstacles, suboptimal survey altitude, and partial observability. Research has demonstrated that applied theory on decision making using partially observable Markov decision processes (POMDPs) can increase cognition capabilities in UAVs for autonomous path planning under uncertainty [13][14][15]. UAV frameworks for real-time target finding and tracking in cluttered indoor environments under uncertainty [16,17] have been developed by using augmented belief trees (ABT) [18], an online POMDP solver that computes and updates motion policies in real time. Similarly, POMDPs have also been used to solve multi-objective UAV tasks, such as collision avoidance, path planning, and multiple target tracking [19,20]. In the SAR domain, autonomous navigation in humanitarian relief operations have also been tested using POMDPs, with most of the implementations achieving experimental results only in simulation [21,22]. A recent research work towards real-world tests shows a UAV motion planner implementation using the SARSOP POMDP solver [23] onboard a small UAV (UAVs with a maximum take-off weight (MTOW) of 13.5 kg [24]). However, the system was only validated using trivial targets and not with humanoid-shaped mannequins [25].
Uncertainty in time-critical UAV operations is not limited to external factors. Internal UAV systems, which also generate uncertainty include noisy camera frames during streaming, low image resolution, suboptimal camera settings, and inaccurate detection outputs from vision-based object detectors. Research works on onboard decision-making in small UAVs for autonomous navigation under object detection uncertainty from Convolutional Neural Network (CNN) models are scarce. Some of the closest approaches are the studies from Sandino et al. [26,27], which describe a framework and POMDP problem formulation for a SAR application in a global navigation satellite system (GNSS)-denied environment using a sub-2 kg UAV. The framework was tested in simulation and with real flight tests indoors, simulating a collapsed building scenario. Another study presented by Chanel et al. [28] shows an autonomous multiple-car detector using a UAV and a POMDP optimised mid-flight. However, there are limitations with reproducibility; relevant experimental details are not disclosed, such as the UAV frame, companion computer, payload, and computer vision algorithms.
This paper presents an autonomous drone motion planning and object finding system under object detection uncertainty for outdoor environments. The UAV exploration and object inspection tasks are mathematically formulated using a POMDP, which models uncertainty with probabilistic distributions. A case study inspired by SAR is illustrated below, where a small UAV is deployed to locate a missing person (victim) in a bushland region. Two victim locations across a simulated flight area were evaluated to test the system at various levels of occlusion. This work evaluates the performance of the proposed motion planner by operating the UAV with the following flight modes: (1) mission mode (baseline motion planner), which uses a traditional survey flight plan to explore the area of interest; (2) offboard mode, where the POMDP-based motion planner fully controls the behaviour of the UAV for exploration and inspection of areas from detections of potential victims; and (3) hybrid mode, which uses mission mode to explore the area of interest and triggers offboard mode to inspect dedicated regions from past detections of potential victims. The proposed UAV system design uses hardware in the loop (HIL) to integrate a physical companion computer into the simulation environment. By using HIL to run the decision-making and vision algorithms, it is possible to obtain performance results comparable with those obtained with real flight tests.

System Architecture
The system architecture uses a modular design that allocates tasks to dedicated nodes, as shown in Figure 2. This architecture design allows users to collect experimental data in simulated environments to be as close as possible to data collected on real-world tests.
The proposed system architecture categorises its modules into two primary categories, namely the simulator and the companion computer (operating under HIL). The simulator is comprised by items within in the simulated environment including models of the UAV body frame, payloads (i.e., cameras), the victim and relevant obstacles. In addition, the simulator contains plugins to add the UAV's autopilot in the software in the Loop (SIL) mode, and relevant peripheral sensors. The readings of these simulated sensors are used by the autopilot to run its local position estimator based on an extended Kalman filter (EKF). The second category contains the system modules that can be run onboard a UAV's companion computer using HIL, namely, the Computer vision module, decision-making module, mapping module, and motion module. Specific implementations to enable SIL and HIL are covered in detail in Section 4. Each sequential decision process starts by capturing observations of the environment, calculating an action using the motion planner, and then the UAV applying the action. The entire sequence is monitored via telemetry using the ground control station, enabling the operator switch between the autonomous motion planner and manual control if required. Beginning the sequence, environment observations are collected using streamed camera frames, and processed in the computer vision module. This module detects people and their location inside the camera frames with a CNN. As CNNs require mathematical operations, which are computationally expensive in embedded hardware, system resources allocated to run CNN models are delegated to the vision processing unit (VPU). Any positive victim detection outputs are ultimately sent to the decision-making module and managed using the observation server.
The decision-making module is primarily constituted by the POMDP motion planner and the observation server. The observation server handles any data traffic requested by the motion planner, such as local position estimations by the autopilot, any positive detections by the computer vision module, and 3D occupancy maps by the mapping module. This map is constituted by volumetric occupancy grids and displays the presence and localisation of obstacles in the surveyed environment. In this implementation, the 3D occupancy map is used by the decision-making and motion modules for obstacle avoidance and collision prevention, respectively.
The motion planner features a POMDP to mathematically formulate the sequential decision-making problem. The POMDP-based motion planner outputs action commands to the motion module, which will subsequently translate and send these commands to the flight controller. Lastly, the flight controller adjusts the control signals to the UAV motors to reflect the requested action from the motion planner, and an idle period is granted to the system while the UAV moves to the next location.

Motion Planner Design
A decision-making problem for autonomous navigation in UAVs can be formulated as the sequence of actions A that the UAV (or agent) needs to take to accomplish a mission goal. This set of actions are limited to the operational abilities of the UAV within the environment. States S are defined as the set of parameters that outline the system. Every action taken by the UAV causes a change of state s ∈ S in the time domain. An illustration of the interaction between the agent and the environment is depicted in Figure 3.  Each change of state (or action-state chain) is quantified with a reward r. By giving numerical values to every action-state chain, the behaviour of the UAV can be influenced by seeking the highest possible reward when specific desired states are reached (e.g., locating a victim without colliding with nearby obstacles). Simultaneously, action-state combinations that lead to undesirable future states (e.g., colliding with nearby obstacles, or flying beyond the survey geofence) could be quantified with a penalty (or negative reward). Therefore, these problems can be solved by finding the optimal sequential set of actions that a UAV should take to obtain the highest possible accumulated reward as quick as possible [29].
Data received by UAV of the environment (or observations) from its sensors and processed vision outputs (i.e., CNN model detections) in real-world applications are, unfortunately, noisy, incomplete or inaccurate. These imperfections restrict the inference of actual system states, which have inspired the mathematical formulation of POMDPs [30].
This work uses a POMDP to compute a motion policy, which outputs low-level action commands derived using environment observations. A POMDP [31] is a tuple A, S, O, T, Z, R, b 0 , γ , where A is a finite set of UAV actions, S is a finite set of states, and O is a finite set of observations. Each time the UAV takes an action a ∈ A from a state s ∈ S, the probability of moving to a new state s ∈ S is defined by a transition function T(s, a, s ) = P(s | s, a). After an action is taken, the UAV receives an observation o ∈ O followed by an observation function Z (s , a, o) = P(o | s , a). Lastly, the model quantifies every decision chain (or action taken a ∈ A from state s ∈ S) with a reward r, calculated using the reward function R(a, s).
In real-world applications, UAVs are limited to capture partial information about the surveyed environment (and the objects to be detected). A POMDP models the uncertainty given by partial observability of its observed states using a probability distribution over the system states. This modelling is known as the belief b, defined as follows: where H is the history of actions, observations, and rewards the UAV has experienced until a time step t. A POMDP solver starts planning from an initial belief b 0 , which is usually generated using the initial conditions (and assumptions) of the flight mission. When an observation is made, the belief distribution is subsequently updated. The motion policy π of the UAV is represented by mapping belief states to actions π : b → A. An illustration of a belief tree and motion policy in POMDPs is shown in Figure 4. . Belief tree and motion policy representation in partially observable MDPs (POMDPs). Each circle illustrates the system belief as the probability distribution over the system states (red dots). When an observation is made, the belief distribution is subsequently updated.
The solution of a POMDP is obtained after finding the optimal policy π * , which is defined as follows: where γ ∈ [0, 1] is the discount factor and defines the relative importance of immediate rewards compared to future rewards. The solver selected in this work uses augmented belief trees (ABT) [18], which reduce computational demands by reusing past computed policies and updating the optimal policy if changes to the POMDP model are detected. Furthermore, formulated problems with ABT allow declaring continuous values for actions, states, and observations. The following subsections detail the problem assumptions, and the components of the POMDP tuple for exploration and object detection applied on multi-rotor UAVs.

Assumptions
The following assumptions are applied to the problem formulation for exploration and object detection (i.e., victims) using multi-rotor UAVs: • An initial 3D occupancy map of the environment is pre-loaded to the UAV before taking-off (further details on how the map is generated can be found in Section 4.4). • Observations come from real-time streaming of processed camera frames (from the computer vision module), the 3D occupancy map, and the estimated local UAV position from the autopilot. • The problem formulation is constrained to detect only a single, static victim. However, multiple victims located in the search area could be detected if the motion planner is launched multiple times (see Section 4.5 for further details). • The motion planner starts once the UAV reaches the initial position waypoint of the flight survey. • The planner concludes computing motion policies when either: (1) the UAV detects a victim with a high confidence value that exceeds a given threshold (covered in Section 3.7); (2) the UAV covers the entire exploration extent of the search area without finding any victims; or (3) the UAV exceeds a maximum flight timeout.

Action Commands
This implementation proposes seven low-level UAV actions, which produce position changes in at least one of axes of the world coordinate frame as shown in Table 1. This design allows adding more actions if the application requires it. Possible scenarios include actions to set pitch and roll angles of a camera gimbal, the heading of the UAV (yaw) and actions applied to more than one Cartesian axis (e.g., up and forward, forward and right, etc). Other UAV actions that are executed out of the scope of the POMDP motion planner, such as take-off, return to launch, and land are instead managed by the autopilot. Table 1. Set of action commands comprised of local position commands referenced to the world coordinate frame. The system keeps records of its current local position at time step k, and calculates δ as the change of position magnitude of coordinates x u , y u and z u from time step k to time step k + 1.
Each time the UAV takes an action, a change δ in UAV local position coordinates is applied. The magnitude for δ x and δ y is variable, and its value is defined by the desired overlap of collected camera frames between time steps, as shown below: where l FOV is the length of the projected camera's field of view (FOV), and λ ∈ [0, 1) is the desired overlap value. High values of λ result in smoother motion response and conservative detection of victims in the scene, whereas low values lead to more aggressive UAV response. A detailed description of FOV estimations from vision-based sensors can be found in Section 3.7.

States
In POMDP formulation for autonomous UAV path-planning, the defined states serve two goals: (1) to model the motion dynamics of the UAV; and (2) to oversee key system conditions during the flight mission. A system state s ∈ S for autonomous UAV exploration and object detection is defined as: where p u is the position of the UAV in the world coordinate frame, f crash is a flag raised when the UAV crashes with an obstacle, f roi is a flag indicating whether the UAV flying beyond the flying limits, f dct is the flag raised if a potential victim is detected by the UAV. If f dct = True, the position of the victim in the world coordinate frame is given in p v , with c v ∈ [0, 1] providing the corresponding detection confidence. The system reaches a terminal state whenever c v ≥ ζ, where ζ is the confidence threshold. The formulated state space is flexible and can be further extended for two or more victims, as well as additional system conditions if required.

UAV Motion Model
The transition from current to new states is defined through a simplified version of the motion dynamics of a multi-rotor UAV as follows: where p u (k) is the position of the UAV at time step k, and ∆p u (k) is the change in the UAV's position from time step k to time step k + 1. While this implementation does not incorporate the UAV Euler yaw angles in the action space, Equation (6) can be further expanded by including the multi-rotor rotation matrix [32] as presented in the problem formulation given in [26]. The procedure to model ∆p u (k) follows the system identification process presented in [27]. The primary workflow is defined as follows: 1.
Track the position responses of the UAV controlled system using a series of step setpoints via a Vicon motion capture system (Vicon, Oxford, UK).

2.
Estimate the transfer function of the UAV controlled system using the MATLAB system identification toolbox.

3.
Discretise the transfer function using the Tustin approximation method.

4.
Obtain the difference equation using the inverse Z transform.
The generic difference equation is defined as: where y(k) is the response of the system, N is the order of the transfer function, a i and b i are the numerical constants for each setpoint and response function variables respectively, and T s is the sampling period. The constants used for the experiments are given in Appendix A.

Reward Function
Any given reward r to the UAV after taking an action a ∈ A from state s ∈ S is calculated using a reward function R(a, s). This function design allows multi-objective task definition, and critically influences the UAV behaviour during flight missions. This work proposes Algorithm 1 as a novel algorithm for R(a, s), which incorporates obstacle avoidance, exploration, and object detection. The proposed algorithm contains several reward variables, whose values are set following an iterative process of the system in the simulator to observe the combination that provides the best behaviour of the UAV to explore the environment and inspect areas with potential victims inside them. Table 2 shows the converged reward values for this implementation. Algorithm 1 Reward function R for exploration and object detection. 1: r ← 0 2: if f crash then 3: r ← r crash UAV crashing cost 4: else if f roi then 5: r ← r out Beyond safety limits cost 6: else if f dct then 7: r ← r dtc Detected object reward 8: if c v ≥ ζ and a = Down then 10: r ← r + r conf 11: end if 12: else 13: r ← r action Action cost 14: 16: r ← r + r fov · ε Footprint overlap cost 17: end if 18: return r The reward function R first evaluates any states that will negatively affect the integrity of the UAV. R returns negative reward values if a given action a provokes a crash or lets the UAV fly beyond the safety limits regardless whether the UAV detects a potential victim. This modelling encourages the UAV to apply a policy that approaches a victim without colliding with an obstacle in the process. If the UAV detects a potential victim (Step 6), R calculates a linear function (Step 8), which returns increased values as the UAV gets closer to the minimum allowed altitude. Once the confidence value c v from the CNN object detector surpasses a pre-defined threshold, a bigger reward is returned as the potential victim is assumed to be found (Step 9 and 10).
In case there are no detections, R applies a set of cost functions to encourage a greedy horizontal exploration of the environment. The first is the inverse function from (Step 8) to encourage the UAV to explore at the maximum possible altitude to maximise the camera's footprint extent. The second is an exponential function (Step 14), which assigns lower cost values the closer the UAV is to the victim. This function calculates the Manhattan distance between the UAV and the victim d v and the maximum exploration distance d w , which are defined as follows: Sub-optimal policies could be generated if a set of distributed victim position particles are equidistant from the UAV by using Equation (10). Moreover, redundant traversed paths have been reported in the literature that implemented similar problem formulations [16,27]. The third cost function (Step 16) counteracts the above-mentioned effects by logging the traversed path by the UAV and introducing the concept of exploration footprint. The concept records the accumulated visual footprint of the downward-looking camera into a two-dimensional (2D) map. The map extent equals the extent limits of the surveyed area, and it is updated after calculating the camera's footprint once the UAV collects a new observation. An illustration of the concept is shown in Figure 5.
The overlap ε between the camera's current footprint and its correspondent location in the footprint map is defined as follows: where F i (p u ) are the pixel values of the projected FOV in the footprint map, and n is the total number of projected pixels in the footprint. A maximum overlap value of 1 indicates that such action will place the UAV to a fully previously explored area, and, as indicated in Step 16 of Algorithm 1, the whole penalty value r fov will be added to the reward. A minimum value of 0 means that a given action will place the UAV in an unexplored area and no penalty will be added to the reward. Intermediate values of ε represent partial overlapping, adding a partial penalty value r fov to the reward.

Observations
A captured observation o ∈ O-data from UAV sensors about the state of the environment and the UAV itself (illustrated in Figure 2)-is defined as follows:  (13), is a summary statistic that measures the frequency of positive detections between the last two observation calls: where n is the number of processed image frames between observation calls, and o dtc is the flag indicating a positive detection per processed frame i.

Observation Model
An advantage of ABT compared to other solvers is that the probabilistic transition functions T and Z do not require to be explicitly defined, but instead it uses a generative model. This model generates T and Z using a modelled observation o given an action a and the next state s . In this implementation, the variables contained in the generative model are the local position of the UAV s p u , the local position of the victim s p v , and the detection confidence o ζ . The estimation of s p u follows the UAV motion estimations detailed in Section 3.4.
Potential victim detections and their subsequent positioning estimations are conditioned by the camera pose at the UAV frame and its projected footprint of the environment. Specifically, if the 2D local position coordinates of the victim s p v (x, y) are within the projected footprint limits of the camera, the victim is assumed to be detected. This estimation is done by calculating the sum of angles between a 2D point (i.e., s p v ) and each pair of points that constitute the footprint boundaries (the footprint rectangular corners) [33]. The 2D projected footprint extent l of a vision-based sensor, illustrated in Figure 6, can be calculated using Equations (14) and (15): where s p u is the UAV altitude, α and β are the camera's pointing angles from the vertical z and horizontal x axis of the world coordinate frame, w is the lens width, h is the lens height, and f is the focal length. The footprint corners c from the camera's local coordinate frame I are translated to the world's coordinate frame W using the following transformation:

Ground
where s p u is the next UAV position state, and ϕ u is the Euler yaw angle of the UAV. However, as no actions involve adjusting the heading of the UAV mid-flight, and assuming yaw estimation errors are negligible, Equation (16) is simplified as follows: The detection confidence o ζ that comes as part of the output data from the CNN object detector is modelled using Equation (18): where ζ min is the minimally accepted confidence threshold, z max and z min are the maximum and minimum UAV flying altitudes respectively, and d uv is the Manhattan distance between the UAV and the victim.

Experiments
The proposed system is evaluated under the context of SAR to autonomously scout a bushland area and find a lost person (victim). The following subsections describe the environment setup, victim locations, hardware and software used, and tested flight modes to accomplish the mission. The experiments assume the environmental terrain and 3D occupancy map have been generated prior to launching the UAV to survey the area.

Environment Setup
The system was tested under a simulated replica of the Queensland University of Technology (QUT) Samford Ecological Research Facility (SERF), located at 148 Camp Mountain Road, Samford QLD 4520, Australia. SERF is a 51 hectare property comprising a protected dry sclerophyll forest and grazing zones. From this property, a smaller 2.25 hectare of a mostly flat area was chosen for the tests as it contains an environment with a rich combination of open, mixed, and dense tree canopy areas, as depicted in Figure 7a.
The environment was emulated using the Gazebo Robotics Simulator. High-fidelity terrain textures from SERF were generated after projecting a high-resolution georeferenced red, green, blue (RGB) mosaic onto the world's surface, as illustrated in Figure 7b. Imagery datasets for the mosaic are captured using a DJI Zenmuse XT2 camera (DJI, Shenzhen, China), mounted on a DJI M210 V2 (DJI, Shenzhen, China). The UAV collected the data at an above ground level (AGL) height of 60 m, 70% overlap, and 60% sidelap, at a constant speed of 10 m/s. Orthomosaics have been produced from video files by extracting and georeferencing still images, significantly increasing the area coverage rates.
Relevant obstacles in the environment, such as trees and greenhouses, are placed following their corresponding projections in the world's terrain. In addition, the altitude values of each tree in the scene are set following a LiDAR dataset collected from a Hovermap mapper (Emesent Pty Ltd., QLD, Australia), shown in Figure 7c. The dataset has been captured by mounting the Hovermap into a DJI M600Pro (DJI, Shenzhen, China) at an AGL height of 60 m and constant speed of 5.4 m/s.

Victim Locations
Two sets of experiments with a single victim (male adult) at two locations are proposed to validate the system in terms of detection complexity. The first location (L1) setup (latitude 152.87309239; longitude −27.38938274) displays the victim in an area free of obstacles and no visibility challenges from streamed UAV data. The second location (L2) setup (latitude 152.87300287; longitude −27.38966054) features the victim located close to one of the trees where partial or full occlusion may occur as portrayed in Figure 8. For both locations, the victim is static, lying on the ground with a heading of 0 • from the north compass.

Hardware
The system architecture shown in this paper ( Figure 2) classifies hardware into two primary components: (1) the companion computer, which runs all of the modules as if the system is tested on real flight tests; and (2) the workstation that supports the simulated environment and the SIL implementation of the autopilot.
The companion computer selected for these experiments follows the recommended hardware used by Sandino et al. [27] for onboard navigation using small quadrotor UAVs. The computer vision, decision-making, motion, and 3D occupancy mapping modules run on an UP 2 (AAEON Technology Inc., New Taipei City, Taiwan). Relevant specifications include a 64-bit quad-core Intel ® Pentium ® N4200 processor running at 1.1 GHz, 64 GB eMMC solid state drive, 8 GB DDR3 RAM, four FL110 USB 3.0 connectors, two Ethernet controllers, two high-speed UART controllers, and one mPCIe connector. The mPCIe connector from the UP 2 is used to plug an AI CORE Movidius Myriad X VPU 2485 and load CNN object detectors and perform real-time inference.
The simulated environment setup, ground control station, and the autopilot (running in SIL mode) are tested under a desktop computer with the following specifications: 64-bit 12-core Intel ® Core ® i7-8700 CPU running at 3.2 GHz, 512 GB solid state drive, 32 GB DDR4 RAM, 6 GB NVIDIA GeForce GTX 1060, six (6) USB 3.0 ports, and one Ethernet controller.

Software and Communications
The presented system applies a set of various software solutions per module. It uses the robot operating system (ROS) [34] as the common middleware for communications and allows the modular design presented in this paper. The software for both the companion computer and desktop runs under Linux Ubuntu 18.04 64-bit operating system.
The CNN model architecture used to detect potential victims in the computer vision module is a Google MobileNet single-shot detector (SSD) [35]. This instance of MobileNet SSD is tuned using pre-trained weights from the PASCAL VOC2012 dataset and deployed using the Caffe framework [36]. The model classifies up to 21 unique objects from the PASCAL dataset, and achieves a mean average precision of 72.7%. The computer vision module only filters and outputs the class person, and positive detections with a confidence of at least 60%. The CNN is loaded to the VPU using OpenVINO and OpenCV libraries. The UAV system is not limited to this detector, and other CNN models trained with airborne specific datasets can be implemented instead. Nevertheless, existing limitations from the chosen model (i.e., most of the labelled images were taken from a front-view camera configuration) aid its intentional usage to demonstrate of the robustness of the motion planner against higher levels of detection uncertainty.
The key nodes of the decision-making module are programmed using the C++ programming language and the TAPIR toolkit [37], which implements the ABT online POMDP solver. The mapping module, which contains 3D occupancy map node for obstacle avoidance, runs under the OctoMap library [38]. A benefit of using OctoMap is the capability of generating and updating occupancy maps from 3D point cloud data such as LiDAR or depth sensors. Therefore, the LiDAR data captured at SERF by Hovermap are loaded to generate an initial occupancy map of the environment, as shown in Figure 9. The emulation of the autopilot is performed using the PX4 SIL plugin for Gazebo. This plugin provides a high-fidelity experience of the interface and flight controller behaviour with other modules. The communication interface between the motion module and the PX4 autopilot (with the SIL Gazebo plugin) is based on the second version of the Micro Air Vehicle Link (MAVLink) protocol [39] and applied through MAVROS, a ROS wrapper of MAVLink. Telemetry to the ground control station is achieved using QGroundControl via a UDP connection.

UAV Flight Modes
Three flight modes are proposed to evaluate the system at each victim location mentioned in Section 4.2. From a SAR application context, the presumed location of the victim is unknown. Hence, the flight modes are designed to scout the entire delimited flying area.

Mission Mode
The first flight mode, denominated here as mission mode, is intended to be served as the UAV motion planner baseline for this study. This flight mode uses the traditional survey approach provided by many modern flight controllers. Typically, a sequential list of position waypoints that follows the desired survey pattern by the pilot are generated by a ground control station software and sent to the UAV autopilot. The flight plan to assess this mode follows the flight parameter values from Table 3, and is compiled using QGroundControl, as shown in Figure 10. This flight mode is triggered with the UAV on the ground and allows the UAV to take off, reach the initial position waypoint, and perform the survey. The mode terminates after completing the survey and manoeuvring the UAV back to its launch position.

Offboard Mode
The second flight mode, denominated here as Offboard mode, runs the POMDP-based motion planner described in Section 3. In concordance to the flight plan design from mission mode, the initial conditions set for the offboard mode and defined variables from Section 3.4 are detailed in Table 4. Before the planner is run at the initial UAV location p u0 , TAPIR-the toolkit the planner is coded-runs an offline policy solver for two seconds. Then, for every time step, an observation is collected and TAPIR triggers the solver online to update the motion policy with a timeout of 800 ms. The remaining time is an idle period while the UAV moves to the next position following the action taken from the updated policy. Compared to the first flight mode, this flight mode finishes once the maximum flight time is exceeded or if the victim is found (i.e., ζ ≥ 85%).
The initial state belief b 0 of the POMDP model consists of the position of UAV and the victim. The UAV position belief follows a normal distribution with a mean of p u0 and standard deviation of 1.5 m. The victim position belief follows a uniform distribution whose limits are equivalent to the survey extent of the flight plan from mission mode, as shown in Figure 11.

Hybrid Mode
The third flight mode, known as hybrid mode, combines the methods from mission and offboard modes. This mode splits the overall navigation task into two components: it uses mission mode to skim the survey area and triggers the POMDP motion planner after a first positive detection. The primary difference in the design of the POMDP motion planner between the offboard and hybrid modes relies on the modelling of the initial belief b 0 . In a hybrid mode, the mean value of the normal distribution of the UAV position belief is now defined as the UAV position coordinate, where the first victim detection takes place. Instead of distributing the victim position belief uniformly across the entire flying area, this mode constraints the search area to the extent of the camera's FOV, as shown in Figure 12.

Results
Validation of the system through the presented SAR case study contains the following performance metrics: (1) Heatmaps of recorded GNSS coordinates of any detected victims; (2) accuracy of output detections and collisions; (3) speed analysis of POMDP-based motion planner in offboard and hybrid flight modes; and (4) system speed to cover the survey area or until the victim is detected. Each performance indicator is evaluated per flight mode and victim locations (which represent various detection complexities because of occlusion from nearby obstacles). Each setup combination (flight mode and victim location) was tested for 20 iterations. Demonstrative videos of the UAV system operating under mission (Video S1), offboard (Video S2), and hybrid (Video S3) flight modes can be found in the Supplementary Materials section.
A series of heatmaps of GNSS coordinates (from detected victims) returned by mission, offboard and hybrid flight modes are shown in Figures 13-15 respectively.
The distribution of GNSS coordinates in the heatmaps suggests a considerably higher precision when the system runs offboard and hybrid flight modes to report victim locations compared to the baseline (mission) flight mode. The visualisation complexity from the defined victim locations does not indicate any significant differences to the precision outputs of the system. The variable distribution of victim GNSS coordinates in mission mode ( Figure 13) was caused by false positive readings from specific object visualisations. For this case study and environment setup, other objects wrongly detected as people included trees, greenhouse rooftops, and to a lesser degree, bare grass. Examples of these detections are displayed in Figure 16.  The detection outputs of the CNN model also had an impact in the accuracy metrics of the system when surveying the studied area in mission mode. A summary of the accuracy and collision metrics per flight mode and victim location is shown in Table 5. Table 5. Accuracy and collision metrics of the system to locate a victim at two locations (L1 and L2). Each setup was evaluated for 20 iterations.

Setup Detections (%) Misses (%) Non-Victims (%) Collisions (%)
Mission ( Regardless of the victim location in the surveyed area, the use of the MobileNet SSD CNN model had a negative impact on any flights using mission mode. After testing each setup combination for 20 iterations, the positive detection rate for victims in a trivial location (or Location 1 (L1) as referred in Table 5) did not exceed 20% of the reported detections. In contrast, offboard and hybrid flight modes reported improved detection rates of 100% and 92.5% respectively. An overall slight decrease in the victim detection rates was observed between trivial and complex locations, where up to 5.0% of the iteration runs failed to confirm a victim's location in hybrid mode. The tests reported that 2.5% of the iterations in hybrid mode (5.0% for Location 1) with the UAV colliding with obstacles. However, taking into account all of the tested runs from offboard and hybrid modes, the overall collision probability experienced using the POMDP-based motion planner is 1.25%.
A speed analysis of the evaluated flight modes tracked the elapsed time from the initial position waypoint until a victim detection was confirmed. The collected data are represented with box plots, as shown in Figure 17. The data distribution shows a wider range in the top quartile and top whisker on the conducted tests using offboard mode for both victim locations. This distribution anomaly was caused by increased uncertainty given to the solver to compute a motion policy for a higher extent of area to survey compared to hybrid mode. Similarly, a bigger search area resulted in increased combinations of traversed paths to survey and locate the victim. Recorded paths indicated a trend of the solver to navigate towards the centre of the area extent, surrounding trees, as shown in Figure 18a.
Data distribution and traversed paths on tests under hybrid flight mode were more consistent as the exploration extent is constrained to the lawnmower pattern followed in mission mode. However, elapsed times in hybrid mode were considerable higher at both victim locations because of the occurrences of false positives. As first detection readings trigger offboard mode to inspect such locations, the system required additional time to confirm or discard the presence of any victim. An illustration of the traversed path by the UAV in hybrid flight mode is shown in Figure 18c.
The average elapsed time to locate the victim in locations 1 and 2, shown in Table 6, indicates contrasting values between offboard and other flight modes. A trend to detect a victim at location 2 quicker than at location 1 was caused by the proximity of the initial position waypoint to begin the search (bottom right corner of the survey pattern). Overall, the system required 2.3 times more time to survey the flight area in hybrid mode than in mission mode. Nonetheless, using tuned CNN models from UAV image datasets is expected to decrease the instances of false positive readings at it was the case in this work. As the primary contribution of this work is on presenting a fully autonomous robust motion planner system against high levels of object detection uncertainty and partial observability, a pre-trained detector from a PASCAL VOC2012 dataset was purposely used to test the performance of the system.  A further preliminary evaluation of the UAV system presented in this paper has been conducted with real flight tests at QUT SERF. The UAV framework has been adopted from the system tested by Sandino et al. [27] for a SAR case study in GNSS-denied environments, which follows a similar system architecture for simulation and real flight tests. The demonstrative flights tested mission, offboard, and hybrid flight modes to locate an adult mannequin placed in a trivial location. As shown in Figure 19, the UAV frame consists of a carbon-fibre Holybro X500 quadrotor, a Pixhawk 4 with GPS and 433 MHz telemetry radio. The tested companion computer in HIL mode (Section 4.3) and software tools (Section 4.4) are integrated to the frame during the tests, as well as a GoPro Hero 9 for real-time streaming of RGB frames. A detailed visual representation of the system can be found in Supplementary Video S4.

Discussion
The presented system constitutes a step towards fully autonomous navigation onboard small UAVs in real-world environments under uncertainty and partial observability. This research expands the navigation problem of Sandino et al. [26,27], by (1) formulating a more sophisticated reward function for a better discrimination between exploration and object inspection; (2) proposing optimised action commands based on the desired overlap between the camera's FOV and the UAV altitude; and (3) incorporating a new flight mode (i.e., hybrid mode) to further optimise the exploration capabilities when partial information on the environment is provided beforehand (e.g., UAV navigation in outdoor scenarios, compared to GNSS-denied environments). Furthermore, this work indicates an enhanced design of UAV simulation environments by using realistic representations of ground textures and elevation, obstacle location and 3D occupancy maps from airborne UAV georeferenced mosaic and LiDAR datasets.
The system architecture and POMDP motion planner presented in this paper improve the autonomy capabilities of small UAVs under environment and object detection uncertainty. Under the presence of uncertainty from potential incorrect readings of vision-based detectors, such as CNNs, the standard objective of many existing approaches is to improve the performance metrics of their customised CNN-based object detectors for real-world UAV operations [40]. In contrast, this paper proposes an alternative approach to this uncertainty issue by augmenting the cognitive power onboard UAVs. Using reactive UAV systems that calculate and update a motion policy while interacting with the environment allow higher flexibility to detect objects (i.e., victims) at various levels of object detection uncertainty, regardless if this uncertainty comes from inaccurate detections by the detector model, noise in the data, or sensor malfunctioning. Even though the performance of the object detector model can be enhanced with tailored datasets and fit models from UAV data [41,42], this approach shows the possibility to use UAVs to navigate autonomously and accomplish a victim finding mission using off-the-shelf detectors, such as MobileNet SSDs.
The speed of the UAV system to survey an area is partly conditioned by the flight plan design, and especially the maximum surveying altitude. The maximum altitude is defined by extrinsic and intrinsic factors on a case study basis. Extrinsic factors include the maximum height of surrounding obstacles, lifting restrictions in high-altitude environments, and local flight regulations. Intrinsic factors are defined by hardware and software limitations in the UAV to detect objects (i.e., victims). For instance, the camera lens properties (e.g., sensor width and focal length) define the camera's FOV and, thus, the footprint extent and scale of observed objects. The image resolution of streamed camera frames into the computer vision module also determines the number of given pixels that represent an object to be detected using CNNs [43]. In this implementation, the system could capture positive victim detections using the MobileNet SSD model for a maximum surveying altitude of 20 m (and camera lens properties shown in Table 3). However, other CNN-based object detectors trained with airborne datasets are expected to allow surveys at higher altitudes and object detections with smaller scale visualisations, which translate in reduced duration for the UAV to survey the same area extent [40,43].
The design of the computer vision module is not limited to the MobileNet SSD detector, but it can be extended with other CNN architectures compatible with OpenVINO. Zhang et al. [44], for instance, propose a global density fused CNN model for UAV images that addresses object detection uncertainty factors such as variations in scale, viewpoint, and occlusion. A survey on CNN models for object detection on the VisDrone2019 datasetan established low-altitude UAV dataset-shows that recent architectures, such as Cascade RCNN, RefineDet, and CornerNet outperform methods such as YOLO, SSD, and faster RCNN [40]. However, low average precision values for the best performing models (i.e., 17.41% on the VisDrone2019 dataset compared to 40.6% on the MS-COCO dataset) implies that further efforts in CNN architectures for object detection from airborne UAV data are required. Future implementations of CNN models discussed above could require a separate study of any model performance impacts caused by the streaming and processing of high-resolution frames onboard sub-2 kg UAVs, and onboard inference of CNN models in embedded computing systems, such as the UP 2 , Jetson Nano, and Edge TPU [45,46].
The UAV system presented in this paper ultimately offers several benefits for first responders in the event of an emergency. The system provides real-time telemetry of camera frames processed onboard the UAV as, thus, prior secured risk and accessibility assessment of the environment, the presence of any found victims, and their visible health conditions. The generated list of GNSS coordinates shown in Figures 13-15 can be further transferred to the ground team (SAR squads) to update intervention strategy, and other UAVs for air-to-ground first aid deployment. An additional extension of this system is the use of other sensors and computer vision models such as Doppler radars, microphones, and thermal cameras [5,47,48], which provide a better understanding of health conditions from potential victims.

Conclusions
This article presented a modular system for autonomous motion planning and object finding, capable of running in resource-constrained hardware onboard small UAVs. The system uses a POMDP to increase autonomy of UAVs for obstacle avoidance, environment exploration, and object inspection in outdoor environments under object detection uncertainty and partial observability. The system was evaluated with an emulated SAR scenario in Gazebo using real-world airborne georeferenced rasters, and airborne LiDAR point clouds to produce 3D occupancy maps. The task consisted of finding a lost person last seen in a bushland in two locations at various levels of occlusion and complex path planning strategies because of nearby obstacles. High-fidelity simulation results were achieved as the proposed system architecture combines SIL to emulate real-behaviour from a PX4 autopilot, and HIL for onboard inference of a MobileNet SSD detector, and the computed motion policy using TAPIR and the ABT solver.
The proposed UAV system was tested in simulation and with preliminary real flight tests with three flight configurations (i.e., mission, offboard and hybrid) and four performance metrics: (1) heatmaps of accumulated GNSS coordinates of the victim; (2) accuracy metrics to detect a victim; (3) speed analysis of proposed flight modes utilising the POMDPbased motion planner; and (4) speed analysis until the victim is found. Heatmaps show that false positive victim detections recorded in mission mode, from objects, such as trees, building structures and grass, were discarded (or rejected) by the UAV after exploring the area using the POMDP-based planner with and without a pre-defined flight plan (i.e., in hybrid and offboard modes respectively). Offboard and hybrid flight modes achieved the highest detection accuracy rates (of 100% and 92.5% respectively), and a slight decrease in detection rates (from 95.0% to 90.0%) was observed between trivial and complex victim locations. The elevated occurrence of false positive detections from the MobileNet SSD detector resulted in increased flight times in offboard and hybrid flight modes. However, the use of other CNN architectures, such as Cascade RCNN, RefineDet, and CornerNet are expected to reduce flight times of the UAV to inspect and reject false positive detections.
Further research avenues include a detailed experimental analysis using real flight tests, static and dynamic obstacles, and with sensors commonly used for SAR applications, such as thermal cameras. Placing the victim at random locations or increasing the number of victims could contribute to further understand the limits of the used POMDP solver. Similar case studies that can also be explored include surveillance for victims, using dynamic camera gimbal configurations and computer vision systems, which provide additional statistics of the victim, such as age, gender, and heartbeat rate. An evaluation of performance impacts in resource-constrained hardware for onboard inference using more complex CNN architectures for UAV object detection, and processing of high-resolution streamed frames, could help understand the limits of the proposed system architecture in sub-2 kg UAVs. Other sequential decision-making algorithms could also be implemented to compare the performance of the ABT solver.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A. Difference Equation Constants
A Holybro S500 quad-rotor UAV (Holybro, China) was tested in a controlled VICON flying arena for a series of position responses in the x, y and z Cartesian axes. The UAV responses per axis were recorded for increments of 0.5 m every five seconds. The obtained difference equation constants for a sampling period T s = 0.1 s are defined in Tables A1-A3: