Flying Free: A Research Overview of Deep Learning in Drone Navigation Autonomy

Abstract: With the rise of Deep Learning approaches in computer vision applications, significant strides have been made towards vehicular autonomy. Research activity in autonomous drone navigation has increased rapidly in the past five years, and drones are moving fast towards the ultimate goal of near-complete autonomy. However, while much work in the area focuses on specific tasks in drone navigation, the contribution to the overall goal of autonomy is often not assessed, and a comprehensive overview is needed. In this work, a taxonomy of drone navigation autonomy is established by mapping the definitions of vehicular autonomy levels, as defined by the Society of Automotive Engineers, to specific drone tasks in order to create a clear definition of autonomy when applied to drones. A top-down examination of research work in the area is conducted, focusing on drone navigation tasks, in order to understand the extent of research activity in each area. Autonomy levels are cross-checked against the drone navigation tasks addressed in each work to provide a framework for understanding the trajectory of current research. This work serves as a guide to research in drone autonomy with a particular focus on Deep Learning-based solutions, indicating key works and areas of opportunity for development of this area in the future.


Introduction
Since 2016, drone technology has seen an increase in consumer popularity, growing in market size from 2 billion USD in 2016 [1] to 22.5 billion USD in 2020 [2]. As small form factor UAVs similar to the drone pictured in Figure 1 flooded the market, several industries adopted these devices for use in areas including, but not limited to, cable inspection, product monitoring, civil planning, agriculture and public safety. In research, this technology has been used mostly in areas related to data gathering and analysis to support these applications. However, direct development of navigation systems to provide greater automation of drone operation has become a realistic aim, given the increasing capability of Deep Neural Networks (DNN) in computer vision and their application to the related area of vehicular autonomy. The work outlined in this paper is twofold: (1) it provides a common vocabulary around levels of drone autonomy, mapped against drone functionality, and (2) it examines research works within these functionality areas, so as to provide an indexed top-down perspective of research activity in the autonomous drone navigation sector. With recent advances in hardware and software capability, Deep Learning has become very versatile, and there is no shortage of papers involving its application to drone autonomy. While domain-knowledge engineered solutions exist that utilise precision GPS, lidar, image processing and/or computer vision to form a system for autonomous navigation, these solutions are not robust, have a high implementation cost, and can require important subsystems, such as network access, to be present for optimal operation. The focus in this paper is on navigation works that utilise Deep Learning or similar learning-based solutions as a basis for implementation of navigation tasks towards drone autonomy. Just as Deep Learning underpins the realisation of self-driving cars, the ability of trained Deep Learning models to provide robust interpretation of visual and other sensor data in drones is critical to the ability of drones to reach fully autonomous navigation. This paper aims to highlight the navigation functionality of research works in the autonomous drone navigation area, across the areas of environmental awareness, basic navigation and expanded navigation capabilities. While the general focus is on DNN-based papers, some non-DNN-based solutions are present in the collected papers for contrast. Research projects focused specifically on the development of new navigational techniques, with or without the cooperation of industry partners, form our definition of the state of the art: not solutions currently implemented in industry, but solutions and implementations being actively researched with the potential for future development.

Sources
Our overview covers peer-reviewed publications, acquired using conditional searches of relevant keywords including "drones", "autonomous navigation", "artificial intelligence" and "deep learning", or other similar keywords, in databases of quality research such as Google Scholar, IEEE Xplore and arXiv. The most common source of publications found after selection was the IEEE Xplore database [3], likely due to its high coverage of high-quality published academic research in the areas of electronic engineering and computer science. From the sources found, the most relevant papers on autonomous drone navigation were selected by assessing their relevance to the topic as well as their number of citations per year, as a basic measure for citation analysis [4,5]. The set of papers selected is referred to as the "research pool" (Appendices A-E).

Approach
In this section, we explain the structure and high-level metrics that we apply to this overview.

Levels of Autonomy
As a first step, we need to define the concept of autonomy for drones, with a view to recognising different levels of autonomous navigation. This paper identifies the emergent navigation features in current research against these levels. We apply the six levels of autonomy standard published by the Society of Automotive Engineers (SAE) International. Though these levels were intended by SAE for autonomous ground vehicles, the logic can apply to any vehicle capable of autonomy [6]. The concept of autonomy for cars and drones is similar, implying a gradual removal of driver roles in the navigation of obstacles and path finding, progressing to fully independent autonomous navigation regardless of restrictions due to surface-bound movement or obstacles. By examining the SAE levels of autonomy for cars, we note how each level is directly applicable to drones. This provides a useful line of analysis for our overview. In Figure 2, we set out the functionality of drone navigation, mapped against these levels of autonomy. Autonomy starts at Level 1 with some features assisted, including GPS guidance, airspace detection and landing zone evaluation. These features are designed to provide automated support to a human operator and are already found in commercially available drones. Level 2 autonomous features are navigational operations that are specific and use-case dependent, where an operator must monitor but need not continuously control. In the context of drone operation, this can include features where the drone is directed to navigate autonomously if possible, e.g., the "follow me" and "track target" navigational commands. Some of these features are available in premium commercial products. Level 3 features allow for autonomous navigation in certain identified environments, where the pilot is prompted for engagement when needed. At Level 4, the drone must navigate autonomously within most use cases without the need for human interaction. Level 5 autonomy implies Level 4 autonomy but in all possible use cases, environments and conditions, and as such is considered a theoretical ideal that is outside the scope of this overview. Though this paper aims at evaluating the features of papers in the context of Level 4 autonomy, it was found that the bulk of the papers in the research pool involved Level 2 or 3 autonomy, with the most common project archetype involving DNN training for autonomous navigation in a specific environment.

Features of Autonomy
We identified that autonomous navigation features fall into three distinct groups: "Awareness", which details the vehicle's understanding of its surroundings, as gathered via non-specific sensors; "Basic Navigation", which includes the functionality expected of autonomous navigation, such as avoiding relevant obstacles and collision avoidance strategies; and "Expanded Navigation", which covers features of greater development depth, such as pathway planning and multiple-use-case autonomous navigation. These groupings and their more detailed functional features are listed in Figure 3, as identified for Level 4 automation. In addition, we note that common engineering features are a useful category for this overview of navigation capability, and we include these as a fourth category for analysis. This is done to acknowledge projects in the research pool that aim to achieve a goal within a given hardware limitation, such as optimisations for lower-end hardware and independence from subsystems such as wireless networks [7].

Citations
In this overview, we indicate the level of research activity by functional area of autonomous drone navigation. We note that within the research domain of autonomous drone navigation there is a lack of standard metrics to enable comparison of contribution and performance. In Section 3, we include "number of citations" as a basic indicator of research attention, whilst also acknowledging that the number of citations can be ambiguous. We order the research works by number of citations per year, since older papers have had more elapsed time in which to accumulate citations. We also note that citations in themselves are not a quality indicator, but are simply an indicator of research attention and critical analysis from other works.
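For concreteness, this normalisation takes the following form (the exact divisor is our illustration; the +1 guards against division by zero for papers published in the current year):

\[
\text{citations per year} = \frac{\text{total citations}}{(\text{current year} - \text{publication year}) + 1}
\]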

Evaluation Criteria in the Literature
The most common technical approach in the research pool is that of Deep Learning-based navigation policies implemented on monocular quad-rotor helicopter drones. Within these, the most common criteria for the evaluation of neural networks are accuracy and F1 score. These are applied to assess the ability of a particular DNN to correctly address a particular sensor-data-driven task, such as object detection, image classification or distance assessment. While accuracy is straightforward, being a direct measure of the network's ability to predict values correctly against the test dataset, F1 score is less transparent, being the harmonic mean of precision and recall [8]. As such, a low F1 value implies a high number of false positive and/or false negative predictions. Because DNN accuracy is dependent on the quality of the data, and F1 score is both data-specific and situational, we consider it uninformative to compare the accuracy and F1 score of one DNN architecture to another if the application of the said architecture is in an entirely different environment. Efficiency, in the context of drone navigation, can take the form of processing time in milliseconds (ms), or the power draw while the solution is running in milliwatts (mW). This can be relevant across environments and applications, as it is in part a product of the DNN architecture itself and the implementation of that architecture into experiments, not necessarily the training/test dataset that was fed into it. For this overview, this metric is only represented in the form of processing time, as power draw is more reliant on the engineering of the hardware. Though evaluating quantitative values such as accuracy, efficiency and F1 score is outside the scope of this paper, they are included where visible in the full research pool.
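For reference, with TP, FP and FN denoting true positive, false positive and false negative counts, respectively, the standard definitions are:

\[
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}, \qquad F_1 = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}
\]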

Results
The following results cover a subset of the full research pool, containing the navigation features of the most-cited papers per year published, organised by the feature headers described in Figure 3. Quantitative results, using the aforementioned typical evaluation criteria, are available for reference in Appendices A-E. (A complete evaluation matrix for the research pool, with bold text for readability, is available in Table S1 in the Supplementary Materials; additionally, Table S2 is included in the Supplementary Materials as an abbreviation legend.)

Awareness
This encompasses any feature that is included in the referred solution as analysis of the drone's spatial environment; though basic navigation features can be developed without this understanding, its absence limits the capability of the resulting navigation. Projects that do not include awareness features risk limited command capability and an over-reliance on prediction; the feature mappings of the awareness section can be seen in Table 1.

• Spatial Evaluation (SE): The drone can account for the basic spatial limitations of its surrounding environment, such as walls or ceilings, allowing it to safely operate within an enclosed space.
• Obstacle Detection (ODe): The drone can determine independent objects, such as obstacles beyond the bounds of the previously addressed Spatial Evaluation, but does not make a distinction between those objects.
• Obstacle Distinction (ODi): The drone can identify distinct objects with independent properties or labels, e.g., identifying a target object and treating it differently from other objects or walls/floors in the environment.
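To make the ODe/ODi boundary concrete, the following is a minimal sketch (our own illustration, not an approach taken from the research pool) using an off-the-shelf torchvision detector: discarding the predicted class labels leaves plain obstacle detection, while keeping them provides obstacle distinction.

```python
# Illustrative only (assumes torchvision is available): ODe vs. ODi.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
frame = torch.rand(3, 480, 640)       # dummy RGB camera frame, values in [0, 1]
with torch.no_grad():
    out = model([frame])[0]           # dict with "boxes", "labels", "scores"

obstacles = out["boxes"]                                # ODe: where objects are
distinct = list(zip(out["boxes"], out["labels"]))       # ODi: where and what they are
```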

Basic Navigation
Most of the solutions examined implement features in the category of basic navigation, which we describe as core navigation features for autonomous drones. The Basic Navigation features outlined below are tabulated in Table 2.
• Autonomous Movement (AM): The drone has a navigation policy that allows it to fly without direct control from an operator; this policy can be represented in forms as simple as discrete navigation commands such as "go forward" or as complex as a vector of steering angle and velocity in two dimensions on the x-z plane (see the sketch after this list).
• Collision Avoidance (CA): The drone's navigation policy includes learned or sensed logic to assist in avoiding collisions with non-distinct obstacles.
• Auto Take-off/Landing (ATL): The drone is able to enact self-landing and take-off routines based on information from its awareness of the environment; this includes determining a safe spot to land and a safe thrust vector for take-off.
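As referenced under Autonomous Movement above, a navigation policy's output can be discrete or continuous. The following minimal sketch (type names are our own, for illustration only) makes the two representations concrete:

```python
# Illustrative only: two common representations of a navigation policy's
# output, as seen across the research pool (type names are hypothetical).
from dataclasses import dataclass
from enum import Enum

class DiscreteCommand(Enum):
    """Command-style output: the policy picks one high-level action."""
    GO_FORWARD = 0
    TURN_LEFT = 1
    TURN_RIGHT = 2
    STOP = 3

@dataclass
class ContinuousCommand:
    """Vector-style output: continuous control on the x-z plane."""
    steering_angle: float  # e.g., radians, constrained to [-pi/2, pi/2]
    velocity: float        # forward speed, e.g., metres per second
```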

Expanded Navigation
Expanded navigation covers elements of autonomy that we suggest are second-level navigation autonomy features, relative to those of Section 3.2, and will be addressed at a later stage than the core features of basic navigation. These features would increase the operational capacity of a drone autonomy project that already covers some features of basic navigation; the following features are tabulated in Table 3.

• Path Generation (PG): The drone attempts to generate or optimise a pathway to a given location; the application of the generated pathway can vary depending on the goal of the project (e.g., pathways for safety or pathways for efficiency). The implemented navigational policy makes use of full three-dimensional movement strategies, enabling the drone to navigate above or below obstacles as well as around them. A minimal search-based sketch of path generation follows below.
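The research pool favours learned policies, but the task itself can be illustrated with the simplest non-learned approach: a breadth-first search over a 2D occupancy grid. The sketch below is our own and is not drawn from any surveyed work; it simply makes the "generate a pathway to a given location" task concrete.

```python
# Illustrative only: shortest-path generation via breadth-first search over a
# 2D occupancy grid (0 = free cell, 1 = obstacle).
from collections import deque

def bfs_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # visited set doubling as path memory
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:            # reconstruct path back to the start
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
               and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                     # no pathway exists

# Example: a 2x2 grid with one obstacle yields [(0, 0), (0, 1), (1, 1)].
print(bfs_path([[0, 0], [1, 0]], (0, 0), (1, 1)))
```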

Engineering
This group heading does not tie directly into Level 4 autonomous navigation, but captures additional challenges that apply to a portion of the covered research. It encompasses any feature that advances the robustness of drone physical implementation or addresses any common limitations related to drone hardware in the context of autonomous flight [7]. These feature mappings are visible in Table 4.

• On-Board Processing (OBO): The drone does not rely on external computation for autonomous navigation, and on-board navigation is performed with an efficiency comparable to that of an external system (a brief latency-measurement sketch follows below).
• Extra Sensory (ES): The drone employs sensors other than a camera and rotor movement information such as RPM or thrust. The presence of this feature is not necessarily beneficial; however, the use of additional on-board sensors to aid in autonomous navigation may be worth the weight penalty and computational trade-off.
• Signal Independent (SI): Drone movement policies do not rely on streamed information such as global position from a wireless/satellite network or other subsystems. This can, however, be a limiting factor, as such streamed information may greatly improve the precision of an autonomous system.

Figure 4 indicates the focus of functional features in the research space based on the relative frequency of features appearing in the research pool. This is a potentially useful indicator of which areas are lacking in research attention, versus research areas that are heavily covered. This information is discussed in detail in Section 4.
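Since processing time in milliseconds is the efficiency representation adopted in this overview, the following sketch (our illustration; the input size and averaging scheme are assumptions) shows how it can be estimated for any PyTorch model:

```python
# Illustrative sketch: estimating mean per-frame processing time (ms) for a
# PyTorch model, approximating the efficiency metric used in this overview.
import time
import torch

def mean_latency_ms(model: torch.nn.Module, n_frames: int = 100) -> float:
    model.eval()
    frame = torch.randn(1, 3, 224, 224)  # dummy camera frame; size is an assumption
    with torch.no_grad():
        model(frame)                     # warm-up pass, excluded from timing
        start = time.perf_counter()
        for _ in range(n_frames):
            model(frame)
    return (time.perf_counter() - start) / n_frames * 1000.0
```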

Discussion
Through analysis of the results across the feature headers, and the comparative results between the papers in the research pool, it is shown that there are areas which are significantly more developed in the current research space. Conversely, this analysis also identifies underdeveloped areas where opportunity exists for further research.

Common Learning Models
Three particular Deep Learning models appear most frequently in the research pool in support of autonomous decision making. The first, "VGG-16" [40], is a CNN image classifier that has been trained on the "ImageNet" dataset [41] of over 14 million images matched to thousands of labels. VGG-16 supports wide-ranging image classification or can serve as a base for transfer learning, with fine-tuning using images specific to a target drone environment. The majority of research works in the research pool that adopt it, or the object detection model "YoloV3" [42], use it as a base for collision avoidance or object detection/distinction. The second, the "ResNet" architecture [43], originates from a CNN-based paper discussing the optimisation of the "AlexNet" architecture [44] through the utilisation of residual layer "shortcuts" that can approximate the activity of entire neural layers. Similar to VGG-16, ResNet is trained on the ImageNet dataset. The benefit of ResNet's shortcut architecture is a considerable reduction in processing overhead, resulting in efficient models with low response times while maintaining comparable accuracy. This is favourable for drone operations that require a low CPU overhead. The third, "DroNet" [22], is specific to the area of autonomous drone navigation and applies manually labelled car and bicycle footage as training data for navigation in an urban environment. DroNet's outputs from a single image are specific to the purposes of drone navigation: a steering angle, to keep the drone navigating while avoiding obstacles, and a collision probability, to let the UAV recognise dangerous situations and promptly react to them. As a purpose-built autonomous drone network, the DroNet work is highly cited and used as a base network for several other papers in the research pool.
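To make this transfer-learning pattern concrete, the following is a minimal sketch (our own illustration, not DroNet's published architecture; "NavNet" is a hypothetical name, and a ResNet-18 backbone is used for brevity) of an ImageNet-pretrained backbone fine-tuned to a DroNet-style two-headed output:

```python
# A minimal sketch: transfer learning from an ImageNet-pretrained backbone to
# a DroNet-style two-headed output of steering angle and collision probability.
import torch
import torch.nn as nn
from torchvision import models

class NavNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights="IMAGENET1K_V1")
        # Reuse all layers except the final ImageNet classifier.
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.features.parameters():
            p.requires_grad = False           # freeze; fine-tune only the heads
        self.steer = nn.Linear(512, 1)        # regression head: steering angle
        self.collide = nn.Linear(512, 1)      # classification head: collision risk

    def forward(self, x):
        f = self.features(x).flatten(1)
        return torch.tanh(self.steer(f)), torch.sigmoid(self.collide(f))

model = NavNet()
angle, p_collision = model(torch.randn(1, 3, 224, 224))  # one dummy RGB frame
```

Freezing the backbone keeps training cost low, reflecting the fine-tuning approach described above, where only environment-specific layers are learned from drone data.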

Areas of Concentrated Research Effort
The most common project archetype seen throughout the research pool follows DNN-based autonomous movement with a quad-rotor drone trained from bespoke data [7] or transfer-learned from a pretrained network [25]. The most frequent focus of research work within the research pool was basic autonomous movement. Though the quality of implementations and methods of acquiring results differ, solutions trended towards similar outcomes of approximately 75-95% navigational accuracy within each project's use case. Whilst this is a wide range of navigational accuracy, and exact tasks differ across individual research works, the high levels of accuracy for DNN-based navigation policies indicate that they are effective in the environments that they are trained for. Most projects took the approach of reducing complexity either by not relying on subsystems such as GPS or network access, and/or by partially or fully focusing on optimising network efficiency for on-board operation. Most projects also avoided the use of any additional sensors, instead relying on a single camera system. No papers in the research pool considered the use of dual cameras for spatial awareness, contrary to the authors' expectations.

Areas of Opportunity
A surprising result from the comparative analysis is that few research projects include the environmental distinction feature, and of those that do, no project attempted to distinguish explicitly between two or more environments. Several projects did test their given implementations in various environments [22,29,38], but did not qualify as addressing the environmental distinction feature, as their approaches did not allow the differences between those environments to be represented in the solution itself: no architecture modification considers different environments, and no datasets used in the research pool carry distinct environment labels. This area holds considerable potential, as the recognition of different environments could drastically affect the accuracy and efficiency of a solution, and provides a level of transparency within autonomous navigation that may be necessary for future regulatory compliance. Certain papers, such as Rodriguez et al. [45], took an interesting approach to training datasets by training their model on simulated data, though such an approach can result in a significant trade-off in accuracy under realistic test conditions. However, the visual fidelity of such simulations was poor compared to what is achievable in modern rendering engines, and some reduction in this trade-off can be seen when simulations are run through modern video-game engines [46], such as the Unity or Unreal engines. It is pertinent to note that the drone-specific simulation software Gazebo has been used in some projects, which demonstrates the validity of simulation [47].

Issues
Most research works explain their approach to model training and testing, detailing the chosen ground truth, the labels and how the navigation system interfaces with the CNN model. One issue to highlight, however, is a lack of uniformity of metrics in the domain. Some papers evaluate their approach using environment-specific metrics, such as the number of successful laps [46] or performance at different speeds [23]. In the DNN research space, the inclusion of visual descriptions of architectures and of evaluation results comparing similar architectural or function-level approaches is crucial to the explainability of a project. The use of research-work-specific metrics, when displayed without connection to a more common metric such as accuracy, makes it difficult to compare the performance of autonomous navigation approaches across the domain.
Another typical issue found in the research pool is that various computer and electronic engineering hurdles are either not attempted, left unaddressed, or circumvented by solutions carefully designed to work within the boundaries of such hurdles. This reduces the robustness of the implementations and potentially limits the use cases in which a solution can operate. Power consumption, data processing, latency, sensor design and communication are all areas affected by this issue. We suggest that drone autonomy research projects could benefit greatly from interdisciplinary interaction.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: