UAV4PE: An Open-Source Framework to Plan UAV Autonomous Missions for Planetary Exploration

Abstract: Autonomous Unmanned Aerial Vehicles (UAVs) for planetary exploration missions require increased onboard mission-planning and decision-making capabilities to access full operational potential in remote environments (e


Introduction
The use of unmanned aerial vehicles (UAVs), or drones, continues to spread across diverse fields such as ecology, geology, environmental protection and planetary exploration [1][2][3][4][5]. Planetary exploration in particular is an area of growth, with the Mars Helicopter Ingenuity (see Figure 1) exceeding its performance goals. At the time of writing, Ingenuity has completed over 34 successful flights [6], providing key foundational knowledge in using UAVs for planetary exploration.
Originally developed as a technology demonstration, Ingenuity has operated far longer than its baseline mission and increased the efficiency of the National Aeronautics and Space Administration (NASA)'s flagship Perseverance rover mission. Perseverance was designed to study habitability and biosignature preservation in rocks in Mars' Jezero crater, to drill core samples from them, and to prepare the samples to be returned to Earth by future missions [7][8][9]. Ingenuity's performance in scouting locations for Perseverance's main mission gave NASA the confidence to include two Ingenuity-like helicopters to support the Mars Sample Return (MSR) program in the years ahead [10], further consolidating the importance of UAVs for planetary exploration tasks (see Figure 2). One important objective of mission planning for Mars exploration is the search for biosignatures. Biosignatures are morphological, mineral, chemical, or isotopic traces of organisms preserved in the rock record [11]. UAVs can help accelerate the search for biosignatures at different scales, with textural biosignatures being the most promising for near-future UAV-based applications [12]. However, the literature on autonomous UAVs capable of making real-time decisions towards the detection of biosignatures is scarce [5].
The need for higher levels of UAV autonomy for planetary exploration is increasing, especially with the proliferation of UAV missions in complex scenarios such as NASA's Dragonfly mission to Titan, which faces over 2 hours of communication delay. Tasks such as autonomous waypoint surveying or navigation strategies that preserve safety margins have the potential to accelerate the deployment of UAVs for planetary exploration scenarios. However, mission planning and navigation for remote planetary exploration remain an emerging field of research, with few frameworks available with which to perform studies.
Ingenuity validated the value and significance of UAV platforms for planetary exploration missions. The helicopter's capability to accelerate planetary exploration was demonstrated after it moved from a technology demonstration phase to an operational phase [13,14]. NASA scientists reported multiple benefits of having the Ingenuity helicopter available to them, including the capability to acquire contextual data in the area surrounding the rover's workspace, which is critical to the interpretation of the geological features studied in each location visited by the wheeled rover [15].
Scouting further afield is the main application of Ingenuity, and these scouting missions are strictly timed and overseen using state machines and waypoints [15]. Ingenuity has flight-level autonomy and can execute a mission autonomously with close monitoring from Earth. In this scenario, the mission planners are humans on Earth who select the waypoints and commands that Ingenuity will execute autonomously, which is necessary due to the long latency associated with communications between Earth and Mars (over 7 min).
Future UAV missions may be designed to scout or assist a Mars rover in different ways, such as localisation enhancement by context images [16], target identification, and sample collection [10]. Other UAV missions will aim to conduct scientific experiments without a wheeled companion, for example, NASA's Dragonfly mission to Titan [9]. All such UAV applications will require successful autonomous mission planning, take-off, navigation, and landing.
The science yield of UAV missions such as Dragonfly can be increased by implementing uncertainty-tolerant methods to deal with environmental and internal system uncertainty involved in UAV mission planning. Several techniques to deal with mission planning have been proposed and are summarised in [17]. However, complete UAV mission planning implementations for planetary exploration are scarce, particularly those focused on uncertainties. Additionally, the authors are not aware of implementations of tools to test, validate and benchmark UAV autonomous mission planning formulations for planetary exploration.

UAV Autonomous Mission Planning for Planetary Exploration
Through the advancement of autonomous mission planning for UAVs, we aim to leverage the capabilities of UAVs and onboard sensors to accomplish longer and more complex missions. The main prospects include a wide range of applications, such as broader-scale terrain exploration and more-targeted, faster and lower-cost data acquisition. Previous studies have focused on techniques to improve autonomy at different levels in multiple areas such as the overall mission architecture [17][18][19], navigation [20,21], target identification [22,23], and flight [2,8]. Also of relevance are techniques developed for mission planning and navigation in Global Positioning System (GPS)-denied environments [24][25][26], as well as the use of Partially Observable Markov Decision Processes (POMDPs) [27].
POMDP formulations help with decision-making under uncertainty in the system behaviour (actions) and the environment (observations). UAV navigation [28] has benefited from POMDP approaches, and mission planning can also be addressed in this way [29]. One of the main challenges of using POMDP techniques is the computational resources required to generate an optimal solution that can be used online. To address this problem, online solvers have been developed since the introduction of the method by Kaelbling et al. [27] in 1998. Since then, POMDP has been shown to be a robust approach for modelling a system when there is uncertainty in the actions and observations [25]. Another important challenge of using POMDP in real-world applications is the development of solutions (called policies) that adopt safe and conservative behaviours. The trade-off between the exploration of actions and the exploitation of rewards demands significant attention due to its implications for the safe operation of a system [30].
Our previous work formulated a planetary exploration mission planning problem using POMDP [31]. The proposed formulation was initially implemented only in simulation in [24]. The work presented in this paper extends and improves the previous implementation by introducing a realistic simulation scenario using Gazebo (http://gazebosim.org/ (accessed on 10 November 2022)) and deployment on a real platform using the Robot Operating System (ROS) (https://www.ros.org/ (accessed on 10 November 2022)) and PX4 Autopilot (https://px4.io/ (accessed on 10 November 2022)). The POMDP problem formulation was also simplified, and further tests of the formulation parameters are available. To address the paucity of widely available open-source frameworks to test and experiment with different mission-planning and navigation strategy configurations, we present here UAV4PE, an open-source framework that can be modified and extended. The proposed framework provides an option to abstract the complexity of the UAV systems and simulation environments, combining efforts to consolidate and sustain a UAV planetary exploration framework that closes the gap between simulation, emulation and the real world.

Frameworks for UAV Autonomous Planetary Exploration
Multiple frameworks have been proposed in the literature for relevant applications in the planetary exploration domain. Vanegas et al. [20] presented a framework for UAV navigation in GPS-denied environments, where a POMDP-based probabilistic motion planning code is proposed and validated in simulation and with real experiments. Walker et al. [32] proposed a framework that uses multiple UAVs for target-finding in GPS-denied and partially observable environments. This framework focused on simulation and on modelling the exploration planning problem as a decentralised multi-agent graph search solved using a modern POMDP solver. Sandino et al. [23] introduced a framework for navigation and object detection in cluttered indoor environments, presenting results in simulated and real-world scenarios.
F-Prime (https://github.com/nasa/fprime (accessed on 10 November 2022)) is the open-source flight software framework used in Ingenuity [33]. This framework facilitates the implementation of modular components, which are designed to be compact, reusable and portable, and capable of being compiled and executed on diverse hardware architectures, including ARM, x86, and others. The F-Prime framework also facilitates abstraction from the operating system, providing options for dealing with threads, synchronization, files, and time. F-Prime is implemented in C++, which places it close to the hardware resources, facilitating efficient implementations and deployments on final hardware, at the cost of more complex and often slower development and testing procedures.
Maxim et al. [34] presented the POMDP.jl framework for sequential decision-making under uncertainty. This framework uses the Julia (https://julialang.org/ (accessed on 10 November 2022)) programming language, which is growing in popularity due to its fast, composable, general, dynamic, reproducible and open-source philosophy. POMDP.jl aims to be a common programming vocabulary for expressing problems as MDPs and POMDPs, writing solver software, and running simulations efficiently (https://github.com/JuliaPOMDP/POMDPs.jl (accessed on 10 November 2022)). Klimenko et al. [35] proposed TAPIR (https://github.com/rdl-algorithm/tapir (accessed on 10 November 2022)), a software toolkit for approximating and adapting POMDP solutions online. This solver has been widely used for online motion planning and target finding. It also includes standard benchmark tests, including rock sampling, tag, homecare and others, that serve as templates to be extended. However, the above-mentioned frameworks do not address the need for benchmarks and frameworks to study UAV autonomous mission planning for planetary exploration scenarios.

Background
The essential tools on which the UAV4PE framework relies are introduced and explained in this section, including ROS, the UAV flight stack used, and the POMDP.jl library.

ROS
The Robot Operating System (ROS) was chosen as the meta-operating system for the UAV4PE Framework. ROS provides access to hardware abstractions, low-level device control, packages, nodes and communication management. ROS also offers standard and well-known interfaces for the cameras, operating systems, and flight controller (autopilot) systems used in this work. Note that each version of ROS is matched with a version of Ubuntu; in this work, we use ROS Noetic, which was installed on Ubuntu MATE 20.04. Ubuntu MATE was installed on a Raspberry Pi 4 model B, using a microSD card. This selection was based on multiple years of experience with previous embedded onboard computers and versions of ROS and Ubuntu. ROS documentation is extensive and detailed; this section introduces only the key elements needed to clarify the architecture used in the UAV4PE framework. Further documentation and tutorials about ROS can be found on the official ROS website (https://wiki.ros.org/ (accessed on 1 December 2022)).
The ROS concepts of packages and nodes are essential for the UAV4PE Framework. These concepts provide modularity and interoperability across the framework and its components. The framework philosophy separates simulation, emulation, mission planning, navigation, vision, experiments, sensing, and hardware scopes as the main packages. Each package can include a set of nodes running a modular task programmed in Python or C++. Additionally, each package is set as an independent code repository, allowing individual module changes to be tracked independently. This folder structure follows the ROS package architecture. Each package can contain various subfolders for source files or scripts known as nodes, ROS launch files, models, custom ROS message type definitions and configurations.
The executable scripts within the packages are called Nodes and benefit from the ROS communication backend following the subscriber/publisher philosophy. Nodes can publish messages to information channels called Topics in the ROS communication backend; other nodes can access these messages when subscribing to topics. Note that each node can subscribe to and publish various topics and message types.
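The publisher/subscriber pattern described above can be illustrated with a minimal in-process sketch. This is a toy model of the idea only, not the ROS API: the `TopicBus` class, the topic name `/uav4pe/state` and the message content are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


class TopicBus:
    """Toy stand-in for a topic-based communication backend (not ROS)."""

    def __init__(self):
        # Maps a topic name to the list of callbacks subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback that is invoked for every message on `topic`."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver `message` to every callback subscribed to `topic`."""
        for callback in self._subscribers[topic]:
            callback(message)


# A "navigation" node publishes the UAV state; a "planner" node receives it.
bus = TopicBus()
received = []
bus.subscribe("/uav4pe/state", received.append)  # hypothetical topic name
bus.publish("/uav4pe/state", "Hovering")
```

In an actual ROS node the same roles would be played by the publisher and subscriber objects of the ROS client library, with typed messages instead of plain strings.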

UAV Flight Stack
The UAV is controlled by a Pixracer flight controller module using PX4, running the firmware version FMUv4 1.11.3. The PX4 autopilot presents multiple benefits to the framework. It provides a highly customisable flight controller stack, compatible with standard UAV communication interfaces such as MAVLink (https://mavlink.io/en/ (accessed on 10 November 2022)). PX4 firmware also supports a wide range of flight controllers and sensor integrations. It can also be integrated with tools such as QGroundControl (https://docs.qgroundcontrol.com/master/en/getting_started/quick_start.html (accessed on 10 November 2022)), which is also open source, facilitates the visualisation of telemetry information, images, parameter configuration, and sensor calibration, and is available on most operating systems, including Android and iOS. The PX4-based autopilot is commanded using its offboard functionality, where a companion computer is connected through serial communication.
In this work, we also use the Queensland University of Technology Aerospace System (QUTAS) GitHub (https://github.com/qutas (accessed on 10 November 2022)), which hosts a collection of open-sourced projects. This work uses multiple QUTAS projects to control the small UAV platform utilised in real-world experiments. It also provides tools to simulate UAVs, providing quick testing in virtual cost-free environments with sufficient proximity to real-world results.

POMDPs and the POMDP.jl Library
A POMDP is defined by the tuple $\langle S, A, O, T, Z, R, b_0, \gamma \rangle$, where $S$, $A$ and $O$ are finite sets of UAV states, actions, and observations, respectively [36]. Whenever the agent (in POMDP theory), or the UAV in this case, takes an action $a \in A$ from a state $s \in S$, the probability of transitioning to a new state $s' \in S$ is defined by the transition function $T(s, a, s') = P(s' \mid s, a)$.
After taking an action $a \in A$, the UAV receives an observation $o \in O$ encoded by the observation function $Z(s', a, o) = P(o \mid s', a)$. Every decision chain is then costed with a reward $r$, calculated using the reward function $R(a, s)$. Considering that data gathering by UAV systems is imperfect, partial observability about the state of the UAV itself is always present in real-world applications. As a result, a POMDP uses probability distributions over the system states to model the uncertainty of its observed states. This modelling is denominated the belief $b(H) = P[s_1 \mid H], \cdots, P[s_n \mid H]$, where $H$ is the history of actions, observations and rewards the UAV has accumulated until a time step $t$, or $H = a_0, o_1, r_1, \cdots, a_{t-1}, o_t, r_t$. The planning policy $\pi$ of the UAV is represented by mapping belief states to actions, $\pi : b \to A$. The POMDP formulation solution is the optimal policy $\pi^*$, which maximises the expected discounted sum of rewards and is calculated as follows:

$$\pi^* := \arg\max_{\pi} \; \mathbb{E}\left[ \sum_{t=0}^{\infty} \gamma^{t} R(a_t, s_t) \;\middle|\; b_0, \pi \right],$$

where $\gamma \in [0, 1]$ is the discount factor and defines the relative importance of immediate rewards compared to long-term rewards. A given POMDP solver starts planning from an initial belief $b_0$, which is usually defined with the initial conditions (and assumptions) of the flight mission using probabilistic distributions. Multiple POMDP libraries and solver implementations are available in the literature [5]. POMDP.jl is used in this framework due to its flexibility and active community. Additionally, the library provides many MDP and POMDP offline and online solvers. In this work, the POMCP solver [30], supplied as BasicPOMCP.jl (https://github.com/JuliaPOMDP/BasicPOMCP.jl (accessed on 10 November 2022)), was used to generate a policy solution online. The policy was used to choose the next action command, so that the action maximises the expected reward of the mission plan.
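To make the roles of the transition function, the observation function, the belief and the discount factor concrete, the following minimal sketch implements an exact discrete belief update and a discounted return. It is an illustration under stated assumptions, not the method used by the framework: POMCP approximates the belief with sampled particles rather than computing it exactly, and the two-state model, the function names `update_belief` and `discounted_return`, and all probabilities below are hypothetical.

```python
def update_belief(belief, action, observation, T, Z):
    """Exact Bayes update of a discrete belief.

    belief[s]       : probability of being in state s
    T[a][s][s2]     : P(s2 | s, a), the transition function
    Z[a][s2][o]     : P(o | s2, a), the observation function
    """
    n = len(belief)
    # Prediction step: propagate the belief through the transition function.
    predicted = [
        sum(belief[s] * T[action][s][s2] for s in range(n)) for s2 in range(n)
    ]
    # Correction step: weight each predicted state by the observation likelihood.
    unnormalised = [Z[action][s2][observation] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite sequence of rewards."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical two-state, one-action model used only to exercise the functions.
T = [[[0.9, 0.1], [0.2, 0.8]]]
Z = [[[0.8, 0.2], [0.3, 0.7]]]
b0 = [0.5, 0.5]
b1 = update_belief(b0, action=0, observation=0, T=T, Z=Z)
```

With these placeholder numbers, observing `o = 0` shifts the belief towards the first state, and `discounted_return([1, 1, 1], 0.5)` evaluates to 1.75, showing how later rewards count for less.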
In previous work, Serna et al. [31] formulated a high-level approach for the UAV mission planning problem. The formulation in this work is a simplified version of the one proposed by Serna et al. [31], as illustrated in Figure 3. The main differences include the removal of the crashed state. Additionally, the consistency (externals related to surface type and environment) and integrity (internals related to battery levels and system health) components were omitted in this work to increase the framework's simplicity. The proposed simplification allowed for the study and integration of the framework components, providing a concise starting point that can be scrutinised more manageably.

Framework Implementation
The UAV4PE framework is based on commonly used tools, including Linux (Ubuntu 20.04), ROS, PX4 autopilot, and the Julia POMDP Library, POMDP.jl [34]. The combination of these tools provides a promising framework for further investigation into the autonomy of UAVs for planetary exploration. This work also validates the framework's applicability in real-world platforms using embedded computers such as Raspberry Pi. In addition, the framework offers tools to automatically run and report experiments, contributing to the quick and safe development of new navigation and mission planning formulations that can be deployed quickly on real-world platforms based on PX4-compatible autopilots. The UAV4PE framework includes standard metrics that can be used to evaluate, benchmark and compare the performance of future formulations and scenarios.
The POMDP model formulation introduced in this work reduces the observations from the system proposed in [31] to the UAV state obtained by the navigation system. Figure 3 presents the simplification, where the Integrity block is reduced to a battery monitoring function inside the navigation module to command emergency landing if the available power drops below 10%. However, this is not included in the POMDP formulation observations.
The Consistency block reflects the system state and is the only observation used by the POMDP solver at each timestep. The timestep was set to ten seconds, during which the POMDP solver optimises the policy and returns the next action a ∈ A that maximises the reward. This timestep value also ensures the navigation module has enough time to execute the commanded actions.
The biosignature detection module implemented in this work can detect ArUco marker targets and surface types by colour (red and green). However, the surface type detector is not included as an observation of the POMDP formulation. The mission planning block results are visualised using the Tmux (https://github.com/tmux/tmux/wiki (accessed on 10 November 2022)), SSH (https://www.ssh.com/academy/ssh/openssh (accessed on 10 November 2022)), RViz (http://wiki.ros.org/rviz (accessed on 10 November 2022)) and RQT (http://wiki.ros.org/rqt (accessed on 10 November 2022)) programs running in the Ground Control System (GCS) on the ground control computer (Raspberry Pi). The mission planning node implements the UAV planetary exploration mission planner problem formulated as a POMDP, as presented in Section 3.1, and solved online using the BasicPOMCP solver from the POMDP.jl framework [34].
The Robot Operating System (ROS) was used to connect all the framework modules. The ROS software architecture is illustrated in Figure 4, depicting the main nodes, packages and launch files used to run experiments in real-world, emulation or simulation scenarios. The items in red represent the hardware components required only in real-world tests. This separation allows testing without real hardware and eases the integration of additional hardware and software elements.

UAV4PE_Mission_Planner
The UAV4PE_mission_planner package contains a high-level POMDP mission planning formulation and POMDP online solver. The formulation is solved online using the BasicPOMCP solver from the POMDP.jl library, interfaced with ROS using RobotOS.jl (https://jdlangs.github.io/RobotOS.jl/latest/ (accessed on 10 November 2022)). During simulations and emulations, the BasicPOMCP solver runs in a single thread of an AMD® Ryzen 5 2600 six-core processor CPU with 16 GB of RAM and an NVIDIA GP1060 [GeForce GTX 1060 5GB]. In real-world experiments, the solver runs in a single thread on a Raspberry Pi 4 model B.
The POMDP formulation is designed to plan the essential UAV mission steps required for a UAV planetary exploration and target detection mission, as illustrated in Figure 5. The UAV's essential mission steps are modelled as the system's states to depict what tasks the UAV is executing at any time. Each POMDP formulation element is presented and described in detail in the following sections.

States (St)
In this work, each state st ∈ S of the UAV is modelled as a discrete number between 0 and 4 and is used as follows: (0) Landed, (1) Hovering, (2) Horizontal exploration, (3) Vertical inspection, and (4) Landing. Figure 5 illustrates the states, together with the action that has the highest probability of moving the UAV to each state.
Figure 5. UAV mission states modelled in the POMDP formulation. The arrows indicate the action with the highest probability of moving the UAV to that state. Green areas are safe for landing, while red areas are dangerous to land over.

Observations (O)
The main observation used for the POMDP formulation is the mission's current State (St), which is partially observable due to the intrinsic complexity of the tools involved in the navigation and control of the UAV. The uncertainty in the observation was modelled using a sparse categorical probability distribution. The distribution specifies the probabilities of observing a state st ∈ S for a given action a(t) ∈ A. Table 1 presents the probabilities used.

Actions (A)
The proposed actions model the minimum steps required to plan and perform a UAV mission to explore a planetary surface while collecting detailed data on places of interest during the exploration. The finite set of actions a ∈ A is defined as follows:
• Stay On the Ground (a = 0): If this action is commanded from a state other than Landed, the navigation module performs a landing action. This action incorporates charging, processing, and idle tasks.
• Hover (a = 1): A transition action between the Take-off, Explore, Inspect and Landing actions. If commanded from the Landed state, a take-off action is attempted; otherwise, this action holds the position of the UAV.
• Horizontal search or Explore (a = 2): The Explore action commands the UAV to change its airborne horizontal position based on the navigation strategy, with a preference for unexplored regions. By performing this action, the UAV aims to explore the map cell by cell.
• Vertical descent or Inspect (a = 3): The Inspect action commands the UAV to descend and collect surface data at a higher resolution.
• Land (a = 4): This action is the last step of a successful flight and commands the UAV to navigate to the closest safe landing region and land there.

Transition Function (T)
The transition function generates the probability (belief) of a future state $St_{t+1}$ based on a previous state $St_t$ and an action $a_t$ using a sparse categorical probability distribution, explicitly indicating the probability of transition between states after choosing an action (see Figure 6). The transition probabilities are detailed in Table 2.

Reward Function (R)
The reward function is a parameter list that assigns a reward to each pair of state $St_t$ and action $a_t$. The reward function uses conditional if statements based on the state $St_t$ to apply the respective rewards, as presented in Algorithm 1. The reward function is intended to maximise the time spent inspecting or exploring while accounting for the cost of executing a mission by penalising the take-off, hovering, and landing actions. To achieve this, large positive values were used for the inspection and exploration rewards, while small or negative values were used for the other rewards.
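A sparse categorical transition model and a conditional reward function of the kind described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: every probability and reward value is a hypothetical placeholder (the actual values are those of Table 2 and Algorithm 1), and the names `TRANSITIONS`, `REWARDS`, `sample_next_state` and `reward` are assumptions introduced here.

```python
import random
from enum import IntEnum


class State(IntEnum):
    LANDED = 0
    HOVERING = 1
    EXPLORING = 2
    INSPECTING = 3
    LANDING = 4


class Action(IntEnum):
    STAY_ON_GROUND = 0
    HOVER = 1
    EXPLORE = 2
    INSPECT = 3
    LAND = 4


# Sparse categorical distributions: only non-zero outcomes are listed.
# All probabilities are hypothetical placeholders, not the values of Table 2.
TRANSITIONS = {
    (State.LANDED, Action.HOVER): {State.HOVERING: 0.9, State.LANDED: 0.1},
    (State.HOVERING, Action.EXPLORE): {State.EXPLORING: 0.9, State.HOVERING: 0.1},
    (State.EXPLORING, Action.INSPECT): {State.INSPECTING: 0.8, State.EXPLORING: 0.2},
    (State.INSPECTING, Action.HOVER): {State.HOVERING: 0.9, State.INSPECTING: 0.1},
    (State.HOVERING, Action.LAND): {State.LANDING: 0.9, State.HOVERING: 0.1},
}

# Hypothetical reward parameters: large positive for inspecting and exploring,
# small negative for take-off, hovering and landing.
REWARDS = {"inspect": 10.0, "explore": 5.0, "take_off": -2.0,
           "hover": -1.0, "land": -1.0, "idle": 0.0}


def sample_next_state(state, action, rng):
    """Draw the next state from the sparse categorical distribution."""
    # Assumption: pairs that are not listed leave the state unchanged.
    dist = TRANSITIONS.get((state, action), {state: 1.0})
    outcomes = list(dist)
    weights = [dist[s] for s in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def reward(state, action):
    """Conditional reward based mainly on the current state."""
    if state == State.INSPECTING:
        return REWARDS["inspect"]
    if state == State.EXPLORING:
        return REWARDS["explore"]
    if state == State.LANDED and action == Action.HOVER:
        return REWARDS["take_off"]
    if state == State.HOVERING:
        return REWARDS["hover"]
    if state == State.LANDING:
        return REWARDS["land"]
    return REWARDS["idle"]
```

Storing only the non-zero outcomes keeps each distribution short and makes it easy to check that every row sums to one before handing the model to a solver.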

UAV4PE_Environments
This package contains the 3D models required for the Gazebo emulation environment. Furthermore, the UAV4PE_environments package contains launch files to start the multiple components required in simulation and emulation, as illustrated in Figure 4.

Github.com/Qutas
Multiple packages from the Queensland University of Technology Aerospace Systems (QUTAS) GitHub were used. The spar_uavasr.launch file from the qutas/spar package was used to run the different nodes required for the simulation environment. The environment.launch file from the qutas/qutas_lab_450 package was used to activate the connection with the motion tracking system in real-world tests and to activate reference frames and safety areas for the UAV motion in simulation, emulation and real-world experiments. Further details can be found in each package repository.

UAV4PE_Navigation
The UAV4PE_navigation package contains multiple scripts, including a navigation node launched using load_navigation.launch and integrated with the POMDP formulation, as discussed in Section 3.1. This navigation node contains a navigation strategy that commands the UAV to the next available and unexplored cell, checking candidate cells in the following order: (1) north or up, (2) west or left, (3) south or down, and (4) east or right. If none of the contiguous cells is explorable due to obstacles or already explored cells, the same strategy is applied at a distance of n cells: (1) n cells up, (2) n cells left, (3) n cells down, and (4) n cells right. Additionally, when there are no unexplored cells in the surroundings of the UAV, a navigation action towards the next closest valid cell is taken, moving the UAV towards the next available cell on the map from left to right, and top to bottom. Each test starts with the UAV in the same safe cell [8,8], in the centre of the map. The navigation strategy node allows for the exploration of adjacent cells while studying the POMDP formulation results, making the navigation steps easily traceable.
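The cell-selection order described above can be sketched as a small function. This is an illustrative reading of the strategy rather than the code of the navigation node: the function name `next_cell`, the (row, column) indexing with rows increasing southwards, the upper limit on n, and the representation of explorable cells as a set are all assumptions.

```python
def next_cell(position, explorable, size=16):
    """Return the next cell to visit, or None when nothing is left.

    position   : (row, col) of the UAV; rows are assumed to grow southwards
    explorable : set of (row, col) cells that are unexplored and obstacle-free
    """
    row, col = position
    # Search order from the text: north, west, south, east.
    directions = [(-1, 0), (0, -1), (1, 0), (0, 1)]
    # n = 1 covers the contiguous cells; larger n repeats the same order
    # n cells away (the upper bound on n is an assumption of this sketch).
    for n in range(1, size):
        for d_row, d_col in directions:
            candidate = (row + n * d_row, col + n * d_col)
            if candidate in explorable:
                return candidate
    # Fallback: scan the map left to right, top to bottom.
    for r in range(size):
        for c in range(size):
            if (r, c) in explorable:
                return (r, c)
    return None
```

For example, starting from the centre cell (8, 8) with both (7, 8) and (8, 7) explorable, the sketch picks (7, 8) because north is checked before west.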

UAV4PE_Vision
This package includes vision-processing scripts based on OpenCV that can be used in emulation and simulation to detect the surface type (red or green) and ArUco markers. During emulation experiments, the images are generated using the Gazebo camera plugin, which creates a camera view attached to the drone. In real-world experiments, an OAK-D Lite camera was used.

UAV4PE_Experiments
This package contains multiple scripts to run experiments and plot the results generated. Simulation and emulation experiments can be run using the script run_experiments.py contained in this package. Each experiment requires a configuration file that stores the minimum details required to replicate the experiments, including POMDP formulation parameters, solver parameters, and maps used. The configuration files are saved using a unique configuration number incremented for each new configuration, e.g., conf1.
The results from each experiment include (1) the navigation logs generated by the UAV4PE_navigation package and (2) the mission planning information in the form of decision trees generated by the UAV4PE_mission_planner package. The results of each experiment are stored under the local machine timestamp, following the convention UAV4PE_experiments/data/configuration/timestamp/map_name. The experimental results in the data folder facilitate the generation of multiple results plots, as presented in the results in Section 6. Experiments using real-world hardware generated data using the same folder conventions as simulation and emulation. However, running the experiments on real hardware required careful execution of the launch scripts presented in Figure 4 to enhance safety.
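The folder convention above can be expressed as a short helper. This is a sketch under assumptions: the function name `results_dir` and the timestamp format are hypothetical, since the text only states that the local machine timestamp is used.

```python
from datetime import datetime
from pathlib import Path


def results_dir(configuration, map_name, when, root="UAV4PE_experiments/data"):
    """Build the results folder: root/configuration/timestamp/map_name."""
    # The timestamp format below is an assumption made for this sketch.
    timestamp = when.strftime("%Y%m%d_%H%M%S")
    return Path(root) / configuration / timestamp / map_name


# Example with a fixed time so the resulting path is reproducible.
example = results_dir("conf1", "map1", datetime(2022, 11, 10, 12, 0, 0))
```

Grouping results first by configuration and then by timestamp keeps repeated runs of the same configuration side by side, which simplifies plotting them together.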

UAV4PE_Hardware
This package lists and references compatible hardware components that can be used with the framework. The components used in this work are detailed in Section 4.

System Architecture
Previous literature reviews several design concepts for space exploration using UAVs [7]. This work, however, selected a low-cost, commercial off-the-shelf platform as the hardware platform. This facilitates real testing of the framework and UAV platform without requiring sophisticated extraterrestrial testing facilities.
The selected system is a small (under 2 kg) UAV system, composed of an S300 frame, with brushless motors (1400 kv) and 20 A Electronic Speed Controllers (ESCs), piloted by an mRo PixRacer R15 flight controller running PX4 firmware, as described in Figure 7. A Luxonis OAK-D Lite fixed-focus camera sensor is used to collect surface images and detect targets and surface types. The fixed-focus variant was selected for its performance in high-vibration applications, removing the need for damping systems or gimbals. A Raspberry Pi 4 model B running Ubuntu 20.04 and ROS Noetic is used as the onboard computer. A motion tracking array is placed on top of the UAV to capture its position, replacing the GPS used outdoors with an indoor local positioning system. A 3D scanned model of the UAV used in this work is available on Sketchfab (https://skfb.ly/oyKuM (accessed on 10 November 2022)).
The system is powered by a 4000 mAh three-cell lithium polymer battery, which provides between 10.5 and 12.6 V in normal operating conditions. The voltage supplied by the battery is regulated to power the flight controller and the onboard computer, which powers other submodules as illustrated in Figure 8 in red. The battery provides up to 10 min of flight endurance while keeping the weight of the UAV under 2 kg. The UAV system uses multiple communication interfaces between the components, including Universal Asynchronous Receiver-Transmitter (UART) serial communication, Inter-Integrated Circuit (I2C) synchronous communication, the Serial Peripheral Interface (SPI), Pulse Width Modulation (PWM), Pulse Position Modulation (PPM) for radio control, and analog and digital signals, as detailed in Figure 8 in black.
Figure 8. UAV system architecture detailing power distribution from the battery to main components and peripherals using red arrows. The signals are also illustrated with black arrows detailing signal types between components.
The components partially outside the X500 Frame box in grey represent physical interaction with the environment. The real UAV system experiments required extra hardware components, including a Ground Control System (GCS), to visualize the results and command the UAV to start the mission. Figure 9 illustrates the connections between the components in the real test case using the Raspberry Pi 4 model B onboard and a Raspberry Pi 4 model B as a GCS.

Experiments
A simple experimental scenario was introduced to facilitate testing the framework. The proposed scenario emulates a planetary exploration mission, including some of the main elements that can be expected, such as surface types, targets, and UAV conditions. Four maps were created to test the model under different conditions. The maps include the following six types of surfaces: (1) target, (2) unexplored, (3) safe, (4) explored, (5) boundary and (6) dangerous. The different surfaces introduce risks and potential targets (ArUco markers), which emulate the presence of biosignatures. The maps were designed to be easily replicated and implemented virtually and in the real world, and are presented in Figure 10. Each map has a size of 16 × 16 cells, allowing the UAV to execute the exploration mission. Once the UAV moves over a cell, it is marked as explored.
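A grid map with the six surface types listed above can be represented as follows. This is a minimal sketch, not one of the four maps of Figure 10: the layout (a boundary ring with a single safe start cell), the helper names `make_map` and `mark_explored`, and the choice to relabel only unexplored cells are assumptions made for illustration.

```python
from enum import IntEnum


class Surface(IntEnum):
    TARGET = 1
    UNEXPLORED = 2
    SAFE = 3
    EXPLORED = 4
    BOUNDARY = 5
    DANGEROUS = 6


def make_map(size=16, start=(8, 8)):
    """Create a size x size grid: boundary ring, safe start cell, rest unexplored."""
    grid = [[Surface.UNEXPLORED for _ in range(size)] for _ in range(size)]
    for i in range(size):
        grid[0][i] = grid[size - 1][i] = Surface.BOUNDARY
        grid[i][0] = grid[i][size - 1] = Surface.BOUNDARY
    row, col = start
    grid[row][col] = Surface.SAFE
    return grid


def mark_explored(grid, cell):
    """Mark a cell as explored once the UAV has flown over it."""
    row, col = cell
    # Assumption: only unexplored cells are relabelled, so safe, target and
    # dangerous cells keep their surface type.
    if grid[row][col] == Surface.UNEXPLORED:
        grid[row][col] = Surface.EXPLORED
```

Target and dangerous cells would be placed on top of this base layout to reproduce a specific map.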

Experimental Approaches
Three approaches were used to test the framework and scenario proposed in Section 3.1. All three approaches use the UAV4PE_mission_planner formulation with the BasicPOMCP solver and the UAV4PE_navigation module, which interfaces with the UAV depending on the scenario being tested: (1) simulation using the QUTAS uavasr_emulator, which simulates the UAV flight dynamics inside ROS; (2) emulation using the ROS Gazebo simulator and Software In The Loop (SITL), which emulates the UAV flight controller software (PX4); and (3) real-world experiments using the UAV system described in Section 4. The mission concept of operation that applies to the three experimental approaches is presented in Figure 11. The goal is to explore the allocated map in search of a target while avoiding dangerous areas. In the real tests, the target is detected using the OpenCV ArUco marker library; in the simulation, it is detected when the UAV flies above the ArUco marker location. The simulation and emulation experiments were executed using an automated Python script within UAV4PE_experiments.
Figure 11. Mission concept of operation, illustrating the test scenario, the expected mission steps, and surface types.

Simulation Environment
The simulation environment facilitates testing of the POMDP formulation presented in Section 3.1 and of the BasicPOMCP solver parameters. It also provides an ideal testing environment that isolates the mission planner from UAV flight-controller-related issues, offering a clean and simple platform on which to execute experiments. A screenshot of the simulation environment using RViz is shown in Figure 12.
The simulation includes all the simulation software components presented in Section 3 (ROS, RViz, UAV4PE and QUTAS packages) and the formulation parameters presented in Section 3.1.

Emulated Environment
The mission planner system was tested in an emulation environment using the Gazebo simulator (http://gazebosim.org/ (accessed on 10 November 2022)). Gazebo provides a realistic 3D environment with photo-realistic textures of the real environment. PX4 Software In The Loop (SITL) was used (https://docs.px4.io/master/en/simulation/gazebo.html (accessed on 10 November 2022)) to emulate software-level flight controller (autopilot) communication and behaviour. By including the flight controller emulation component, this approach drastically reduces the gap between the simulation and the real scenario, enabling faster system development and debugging. A screenshot of the emulation environment is presented in Figure 13.

Real Environment
The real test scenario consisted of a 4 m × 4 m flying area with a vision-based position tracking system (Vicon) as the localisation source. The floor is covered with red and green carpets, following the surface type colour convention for dangerous and safe areas used across the testing approaches. A picture of the setup is shown in Figure 14.
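To relate the 16 × 16 cell map to the physical flying area, the sketch below converts a cell index into a position in metres. It rests on assumptions that the text does not confirm: that the map spans the whole 4 m × 4 m area (giving 0.25 m cells), that the origin sits at one corner, and that rows map to y and columns to x. The names `CELL_SIZE_M` and `cell_centre` are hypothetical.

```python
AREA_SIDE_M = 4.0    # side of the flying area, from the text
CELLS_PER_SIDE = 16  # map size, from the text
# Cell size under the assumption that the map covers the whole area.
CELL_SIZE_M = AREA_SIDE_M / CELLS_PER_SIDE


def cell_centre(row, col):
    """Return the (x, y) position in metres of the centre of a map cell."""
    if not (0 <= row < CELLS_PER_SIDE and 0 <= col < CELLS_PER_SIDE):
        raise ValueError("cell outside the map")
    x = (col + 0.5) * CELL_SIZE_M
    y = (row + 0.5) * CELL_SIZE_M
    return x, y
```

If the real map used a margin around the carpets or a different origin, only the constants and the offset in `cell_centre` would need to change.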

Experiments Setup
Due to the intrinsic probabilistic variations of each experiment, multiple experiments were run for each map and configuration, including twelve experiments for simulation and emulation and four for the real world. Multiple configurations were tested using different parameters for the reward function of the POMDP formulation and the online solver. The reward function presented in Section 3.1.5, Algorithm 1, was configured to reward the exploration and inspection as the mission's main goals while penalising other states, depending on their impact on battery and mission safety.
The number of configurable parameters in the reward function is six, as defined in Section 3.1. Tables A1 and A2 summarise the reward (R) values tested in simulation, emulation and real experiments. The number of POMDP solver parameters that can be configured in the basicPOMCP solver is nine. However, this work was limited to a single parameter test for the solver, the discount factor (γ, see Section 2.3).
Multiple simulation experiments were performed to explore the influence of the POMDP solver discount factor (γ) on the mission planning formulation. This investigation consisted of ten experiments that shared identical configuration values and differed only in the discount factor, which was increased from 0.1 to 0.9 in increments of 0.1, with 0.99 as the final value.
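The role of the discount factor can be illustrated with a short Python sketch. It assumes that the ten tested values are 0.1 to 0.9 in steps of 0.1 plus 0.99, which is one reading of the description above, and it uses the standard definition of a discounted return; the function names are hypothetical and are not part of the UAV4PE code.

```python
def discount_sweep():
    """Ten discount factors: 0.1 to 0.9 in steps of 0.1, then 0.99 (assumed)."""
    values = [round(0.1 * k, 1) for k in range(1, 10)]
    values.append(0.99)
    return values


def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the step index from 0."""
    return sum(r * gamma ** t for t, r in enumerate(rewards))


def effective_horizon(gamma):
    """Rule-of-thumb planning horizon 1 / (1 - gamma), in steps."""
    return 1.0 / (1.0 - gamma)
```

A small γ makes the planner favour immediate rewards, while values close to 1 give distant rewards, such as finding the target, almost the same weight as immediate ones.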

Data Extraction and Analysis
The data generated during the experiments are stored in multiple formats, as described in Section 3.6. They are then processed and consolidated into a Comma-Separated Values (CSV) file.
The CSV file was loaded into the Jupyter notebook included in the UAV4PE_experiments repository using Google Colab (https://colab.research.google.com/ (accessed on 10 November 2022)). The notebook uses several popular Python data science libraries, including pandas, Matplotlib, seaborn and NumPy. The notebook was used to analyse the results of the experiments, and some of the most relevant analyses are presented in Section 6. A wider range of analyses can be found in the original notebook.
The data are stored in a pandas DataFrame using the structure presented in Table 3. The type column describes the experimental approach of the experiment (simulation, emulation or real). The conf column details the configuration used in the experiment; more details about the tested configurations can be found in Tables A1 and A2. The expFolder column retains information about the experimental execution date and the folder name where the raw data were stored. The map column describes the map used in the experiments; the map options were described in Section 5. The expNumber column holds the experiment identification number, a counter that starts from zero for each new experiment configuration. Finally, a new metric is introduced to identify the parameter configuration that maximises the exploration performance in terms of the total area explored, the number of actions performed and whether the target was found. The metric was called exploredRatio, and Equation (2) presents how it was computed for each experiment using the variable names introduced in Table 3.

Data Filtering
Multiple filters were applied to the data to remove nullified experiments (due to tool issues) and to extract the data from different perspectives. One of the first filters removed the experiments where the area explored was equal to zero. Eleven such experiments were found in emulation. This was attributed to communication issues with the emulated UAV in Gazebo (the UAV arming command was rejected, preventing the UAV from taking off).
The experiments that successfully explored 100% of the maps were also extracted. These extracted data were used to compare the configurations that successfully completed the mission and yielded the best performance (fewer action counts).
The data were also filtered to extract the configurations tested in all three experimental approaches (simulation, emulation and real-world experiments), showing the differences and similarities in the behaviour of the UAV4PE framework across those approaches.
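The three filters described above can be sketched in a few lines of dependency-free Python; the same logic maps directly onto boolean indexing in the pandas-based notebook. The keys `type` and `conf` follow the column names given in the text, whereas `areaExplored` and `actionsCount`, the configuration labels `confA` and `confB`, and all the values in `rows` are made up for illustration and are not results from this work.

```python
def drop_void(rows):
    """Remove nullified runs, i.e. those with zero area explored."""
    return [r for r in rows if r["areaExplored"] > 0]


def fully_explored(rows):
    """Keep only the runs that explored 100% of the map."""
    return [r for r in rows if r["areaExplored"] >= 100]


def confs_in_all_types(rows, types=("simulation", "emulation", "real")):
    """Return the configurations tested in every experimental approach."""
    seen = {}
    for r in rows:
        seen.setdefault(r["conf"], set()).add(r["type"])
    return sorted(c for c, t in seen.items() if set(types) <= t)


# Made-up rows for illustration only; areaExplored is a percentage.
rows = [
    {"type": "simulation", "conf": "confA", "areaExplored": 100, "actionsCount": 80},
    {"type": "emulation", "conf": "confA", "areaExplored": 0, "actionsCount": 0},
    {"type": "real", "conf": "confA", "areaExplored": 60, "actionsCount": 45},
    {"type": "simulation", "conf": "confB", "areaExplored": 100, "actionsCount": 95},
    {"type": "emulation", "conf": "confB", "areaExplored": 40, "actionsCount": 30},
]

valid = drop_void(rows)            # drops the emulation run that never took off
complete = fully_explored(valid)   # runs that covered the whole map
shared = confs_in_all_types(rows)  # configurations present in all approaches
```

Applying `drop_void` first mirrors the order described in the text, so that nullified runs do not distort the later comparisons.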

Results
A total of 2112 experiments were used for analysis, distributed into 1659 experiments in simulation, 409 experiments in emulation and 44 real flights. Of all the experiments for the configurations tested, just 395 (18.7%) successfully completed the mission, exploring 100% of the map. Table 4 summarises the experiments, providing details about minimum and maximum values, standard deviation (std), the mean and percentiles. The overall mean values grouped by map and configuration type are presented in Table 5. The Experiment Number column reflects the predominance of simulation and emulation experiments over real experiments. The Target Found column indicates that experiments using maps 16A and 16AD found the target more often than experiments using 16B and 16BD. This is attributed to the navigation strategy, which prioritises exploring the top-left side of the maps, where the target is located in 16A and 16AD. The Area Explored column indicates that more area was explored in real experiments, followed by emulation and then simulation. The main reason for this difference is that only a limited set of configurations (the best ones) was used in real experiments, compared to those tested in emulation and simulation. The Takeoffs Count column shows that the emulation experiments performed fewer takeoffs. Finally, the Actions Count column indicates that simulation experiments executed more actions than emulation and real experiments. This difference is attributed to the relatively higher number of experiments and configurations tested in simulation. Table 5 gives a broad view of the mean dataset distribution and of the differences between the maps and the experimental approaches. However, because it aggregates a broad set of configurations, it offers only a narrow understanding of the behaviour of the POMDP formulation.
The further analysis in Section 6.1 provides detailed information about the POMDP formulation behaviour and the configurations that yield the most promising results.
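The headline figures above can be cross-checked, and a Table 5 style summary reproduced, with a short Python sketch. The counts in `COUNTS` are taken from the text; the helper names `shares` and `grouped_mean` and the `actionsCount` key used in the usage note are assumptions for illustration rather than the notebook's actual code.

```python
from collections import defaultdict

# Experiment counts per approach, as reported in the text.
COUNTS = {"simulation": 1659, "emulation": 409, "real": 44}


def shares(counts):
    """Percentage of the total contributed by each entry, to one decimal."""
    total = sum(counts.values())
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}


def grouped_mean(rows, keys, field):
    """Mean of `field` for each distinct combination of `keys`."""
    groups = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in keys)].append(r[field])
    return {g: sum(v) / len(v) for g, v in groups.items()}
```

Here `shares(COUNTS)` shows that simulation accounts for roughly four fifths of the dataset, and `grouped_mean(rows, ("map", "type"), "actionsCount")` would give the mean action count per map and approach for a list of experiment records.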

Successful Missions Analysis
A successful mission was defined as a mission where the map's explorable area is fully explored and the target is found. A partially successful mission was defined as a mission where the target is found, but the map's explorable area is not fully explored. Following these definitions, 299 experiments were successful with 100% of the explorable area explored, while the target was found in 946 experiments. Experiments with configuration conf0 were excluded from this analysis as they were generated as a baseline metric and are successful by definition. Figure 15 shows the average success distribution for all the successful configurations tested, grouped by map (see Figure 15a) and by experiment type (see Figure 15b). It indicates a higher percentage of successful missions for the maps with increased dangerous area (16AD and 16BD) than for maps 16A and 16B, suggesting that successful missions are more likely to occur on maps with fewer explorable cells. Additionally, Figure 15b highlights the predominance of simulation and emulation results within the experiments. Partially successful mission results indicate that the target was found in higher ratios in maps 16A and 16AD, as presented in Figure 16a. This difference is attributed to the navigation strategy prioritising the exploration of the top-left area of the maps, where the target in 16A and 16AD is placed. This highlights the direct influence of the navigation strategy on mission planning results. Additionally, keeping the navigation module simple during the initial tuning and development of the mission planning formulation is important so that the results can be analysed.
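The two definitions above translate into a simple classification rule, sketched below in Python. The function name, the argument names and the "unsuccessful" label for every remaining case are assumptions for illustration; the text only defines the successful and partially successful categories.

```python
def classify_mission(area_explored_pct, target_found):
    """Classify a mission following the definitions given in the text.

    area_explored_pct: explored share of the explorable area, in percent.
    target_found: True if the target was detected during the mission.
    """
    if target_found and area_explored_pct >= 100.0:
        return "successful"
    if target_found:
        return "partially successful"
    # Label assumed here for missions that did not find the target.
    return "unsuccessful"
```

Under this rule, every successful mission also counts among the missions that found the target, which is consistent with the target being found in more experiments than were fully successful.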
The configurations that yielded successful missions are presented in Figure 17, which also shows the total number of actions needed to achieve success. The configurations with fewer actions are more efficient, meaning they require fewer actions to complete the mission.
The configurations conf1, conf2, conf7.1 and conf8 showed more efficient behaviours. However, the number of experiments for each configuration presented in Table A3 indicates that configurations conf1 and conf2 yield few successful missions, which, based on the experimental setup of twelve experiments per map, can be attributed to glitches in the emulation approach. The next configuration with enough successful mission experiments is conf7.1, which presents 11 successful cases in real experiments, 32 in emulation and 20 in simulation (see Table A3). Thus, configuration conf7.1 consistently yielded the most successful mission experiments across experimental types. The exploredRatio metric introduced in Section 5.2 was used to visualise and compare the configurations responsible for successful missions. Figure 18 presents the exploredRatio metric for each configuration and map. The higher the exploredRatio metric, the more efficient the configuration. Baseline configuration conf0 shows the theoretical maximum exploration ratio value, followed by configuration conf7.1. This metric provides a direct way to measure and compare the performance of UAV-based POMDP mission planning configurations.
Two configurations (conf7.1 and conf8) were tested in all the experimental approaches, including simulation, emulation and real-world. Figure 19 compares the results and shows that, for all maps and configurations, the real-world experiments used fewer actions to complete the mission successfully. This is attributed to differences between the maximum speed of the real UAV and the UAV model dynamics used in simulation and emulation. The figure also shows that maps 16AD and 16BD were completed with the fewest actions on average in the simulation and emulation experiments. In real experiments, maps 16AD and 16BD showed larger standard deviations than maps 16A and 16B due to the higher number of successful experiments for maps 16AD and 16BD.

Conclusions and Future Work
The results presented here show an approach to modelling UAV mission planning for planetary exploration using POMDP. The proposed POMDP formulation successfully planned and commanded a UAV in simulation, emulation and real-world experiments, exploring multiple maps and finding a target in different unknown locations. The low percentage of successful experiments, around 18.7% of all the configuration tests, indicates that finding optimal configurations is not trivial. However, successful configurations, such as conf7.1, show that POMDP formulations can successfully plan UAV missions for planetary exploration. The results in this work condense the analysis provided in the UAV4PE_experiments notebook, which can be further explored to find detailed relations between configurations.
This work also introduces the main aspects of the UAV4PE framework, whose source code can be accessed, reused and extended, providing extra tools to develop more autonomous and more robust mission planning strategies transferable to real-world applications. The results presented in this work can be used as a benchmark and baseline for further UAV autonomous mission planning studies. The data analysis notebook provides a powerful tool for using data science libraries to extract knowledge from the UAV4PE framework experiments. Furthermore, the new metric introduced in Section 5.2 can be used as a configuration fitness metric, facilitating further studies, for example, using genetic algorithms to search for optimal parameter configurations for the POMDP formulation and solver.
Future work focusing on UAV platforms that resemble UAV concepts for space exploration is encouraged. The development of systems for self-rescue in case of malfunctions is recommended. For this work, the scope was limited to a compact setup that can be replicated affordably, given the challenges related to testing UAV concepts in extraterrestrial environments. Nevertheless, the simulation and emulation environments can be modified to replicate extraterrestrial conditions using the Gazebo physics engine and the simulator UAV motion equations. Safety guidelines for UAV operation must be followed for real-world experiments. Moreover, additional safety features, such as netted flying area facilities, propeller guards, and UAV motor arming safety checks, are recommended. The experiments presented in this work can be extended in future work by adding, for example, experiments with terrain variations and momentary environmental changes such as wind gusts. Future work could focus on the inclusion of more realistic planetary exploration environments such as Mars landscapes and 3D terrain models provided by NASA (https://github.com/nasa/NASA-3D-Resources (accessed on 10 November 2022)). Additionally, natural 3D surfaces digitised via photogrammetry with realistic targets can be added using Digital Elevation Models (DEM) or meshes.
Further improvements can be achieved in the mission planning formulation. The framework modularity allows for adding more detailed and precise models to describe additional system observations and actions. Environment-aware observations such as the sun and wind effects can be introduced. Mission planning formulations can benefit from the inclusion of energy storage dynamics, power management and harvesting, supplementary surface types, and uncertainty in emulating a GPS-denied environment such as Mars. A UAV endurance analysis can also be performed in future work to identify its impact on mission planning in different environments such as Mars or Antarctica. We used an ArUco marker detector as the biosignature detector module; however, we also developed a biosignature detection system separately [12].
Ongoing work aims to enhance the model definition and formulation, improve the framework's documentation, and explore the influence of broader solver configurations. Future work will also involve migrating the packages in the framework to ROS2.

Data Availability Statement:
The collected data supporting this article's research findings are available in UAV4PE_experiments/data.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: