Review

Hybrid Machine Learning and Reinforcement Learning Framework for Adaptive UAV Obstacle Avoidance

by Wojciech Skarka 1,* and Rukhseena Ashfaq 2,*

1 Department of Fundamentals of Machinery Design, Silesian University of Technology, 44-100 Gliwice, Poland
2 Department of Mathematics, COMSATS University Islamabad, Lahore Campus, Lahore 54000, Pakistan
* Authors to whom correspondence should be addressed.
Aerospace 2024, 11(11), 870; https://doi.org/10.3390/aerospace11110870
Submission received: 23 August 2024 / Revised: 13 October 2024 / Accepted: 21 October 2024 / Published: 24 October 2024
(This article belongs to the Special Issue UAV System Modelling Design and Simulation)

Abstract

This review explores the integration of machine learning (ML) and reinforcement learning (RL) techniques in enhancing the navigation and obstacle avoidance capabilities of Unmanned Aerial Vehicles (UAVs). Various RL algorithms are assessed for their effectiveness in teaching UAVs autonomous navigation, with a focus on state representation from UAV sensors and real-time environmental interaction. The review identifies the strengths and limitations of current methodologies and highlights gaps in the literature, proposing future research directions to advance UAV technology. Interdisciplinary approaches combining robotics, AI, and aeronautics are suggested to improve UAV performance in complex environments.

Graphical Abstract

1. Introduction

Unmanned aerial vehicles (UAVs), commonly known as drones, have become increasingly widespread across a variety of industries, including surveillance, structural inspection, goods delivery, and search and rescue in disaster scenarios. Their applications range from aerial imaging and the inspection of structures and ridges to deliveries and search and rescue operations [1], and interest in UAVs in the research literature has been rising steadily over the past few years [2,3]. UAVs are aircraft that can fly autonomously and do not require onboard pilots [4]. Their main characteristics are high mobility, ease of deployment and, as a rule, low maintenance requirements [5,6]. Moreover, UAVs can carry different sensors, which makes them versatile for a wide range of crucial tasks [7]. They are used in fields such as wildfire surveillance [8,9], public surveillance systems [10], object search and tracking [11], goods delivery [12], medical aid, search and rescue [13], emergency message relaying [14], and intelligent transportation. However, their behavior in complex dynamic environments remains suboptimal because of reliance on human intervention and the limitations of radio-frequency communication links [4]. Among robot-based infrastructures, UAVs are regarded as one of the most promising solutions for advancing smart environments [15].
The timeline depicted in Figure 1 illustrates the dramatic rise in scholarly publications concerning UAV obstacle avoidance from 2012 to 2024. Initially, the research output was quite limited, with only two publications in 2012, growing slowly to a dozen by 2016. This slow start reflects the early development phase of UAV technology, when the focus was predominantly on basic flight operations rather than the more intricate capability of autonomously navigating around obstacles. As UAV applications expanded into more complex environments, the need for sophisticated navigation systems became clear, sparking increased research in machine learning and reinforcement learning. These fields of artificial intelligence (AI) have been instrumental in training UAVs to navigate and react to obstacles based on data collected by onboard sensors. Reinforcement learning (RL) has proven particularly useful for training UAVs to make timely and appropriate decisions in dynamic scenarios. Since 2017, more researchers have addressed the problem, and by 2023 the annual output had reached 306 publications. This growth reflects improvements in AI and computational resources, as well as the increasing complexity of unmanned operations, which involve both stationary and moving obstacles. The application of modern machine learning and reinforcement learning technologies has thus made it possible to create intelligent, learning UAV systems with higher levels of automation and operational safety. One further factor should be mentioned: the conflict in Ukraine has accelerated the development of many aspects of UAV technology. The need for surveillance, target identification, and impact assessment in conflict regions has continued to drive UAV development for both military and civilian purposes. For example, demands such as operating UAVs over hostile territory while remaining resilient to threats like electromagnetic attack and counter-UAV systems have led to progress in stealth technology, secure communication, and autonomous decision-making. The incorporation of machine learning (ML) with RL has not only improved these systems but made them viable in adversarial and dynamic environments. Current research is concerned with developing UAVs with the swarm behavior, threat identification, and self-navigation features required in military operations. Such scrutiny and accelerated innovation are driven by the operational demands arising from conflicts such as that in Ukraine. This spike in research and development is a clear indication of the scientific community's broader initiative to enhance AI technology in UAVs, ensuring they can withstand the demands of contemporary warfare and take the lead in civilian uses such as disaster aid and environmental surveillance.

Objective of the Review

Various RL algorithms are employed to teach UAVs how to navigate their surroundings independently. RL is effective in addressing diverse challenges because the UAV essentially learns from experience, mimicking the decision-making processes of a human expert in the field. Through interaction with the environment, the UAV processes information about its current state, selects actions accordingly, and receives rewards as feedback. State representation for decision-making is built from data gathered by UAV cameras and sensors scanning the environment [16]. The main goals of this review article are varied and encompass giving readers a thorough grasp of the state of the art in UAV navigation and obstacle avoidance research utilizing ML and RL methods. The objectives include exploring and evaluating various RL algorithms used for teaching UAVs to navigate autonomously and comparing the effectiveness of different RL approaches in improving their navigation capabilities. Another key objective is to analyze how ML techniques can enhance UAV performance in complex and dynamic environments and to examine the integration of ML algorithms with UAV systems for real-time obstacle detection and avoidance.
In this review, we summarize the published work on both UAV navigation and obstacle avoidance, with an emphasis on identifying current approaches and their specific strengths and weaknesses. Moreover, this review aims to identify gaps in the current literature and areas where future research may be needed. We present the open issues, highlight potential areas where complex ML-based techniques can be applied to UAV navigation, and suggest future research to further improve the existing architecture of UAV navigation systems. The review also gives an overview of current computer vision progress in UAV exploration and suggests new methodologies based on a blend of robotics, AI, and flight-related research. The article offers a glimpse into the latest developments and upcoming possibilities in UAV navigation and obstacle avoidance through the application of ML and RL methods.

2. Search Methodology

Literature Search Strategy

We meticulously developed our literature search strategy to cover the broad scope of UAV navigation and obstacle avoidance, illustrated in Figure 2. We began by exploring a broad range of databases, including IEEE Xplore, Google Scholar, PubMed, Science Direct, and the ACM Digital Library, which allowed us to gather a wide variety of interdisciplinary research sources. We selected key search terms, such as “UAV”, “obstacle avoidance”, “reinforcement learning”, “deep reinforcement learning”, “DL”, “ML-Techniques”, “UAV path planning”, “UAV obstacle avoidance”, “AI drone navigation”, “drone collision”, “UAV navigation”, “collision avoidance UAV control”, “drones”, “autonomous flight”, “autonomous drones”, “aerial robots”, “Unmanned Aerial Vehicle”, and “machine learning”, to cover various aspects of UAV technology comprehensively. Initially, we screened each article based on its title and abstract to ensure relevance to our research goals.
In the second phase, illustrated in Figure 3, we honed our strategy by concentrating on the most frequently occurring keywords identified in our initial search. This detailed stage involved examining the frequency and context of these keywords within the articles. This analysis helped us pinpoint major research trends and key methodological approaches in UAV obstacle avoidance. Our two-phase approach not only made the vast amount of literature more manageable but revealed significant research gaps and new advancements in UAV technology. This method provided a strong foundation for our review, ensuring that readers gain a thorough understanding of both the breadth and depth of current research in this field.

3. Machine Learning for Enhanced UAV Obstacle Avoidance

3.1. Overview of Machine Learning

Machine learning is essential to autonomous UAV navigation, as it allows obstacle avoidance tactics to be optimized from data. Alan Turing first proposed the idea of AI in 1950, coinciding with the initial rise of computers [17]. He developed the Turing test to differentiate between machines and humans using a set of questions and answers; if a computer could pass the test, it would be considered to have intelligence comparable to that of humans. Following the Turing test, scientists began to explore AI, ultimately resulting in its formal recognition as an academic field in 1956 [18]. Machine learning enables systems to learn from data and provide data-centric solutions, and ML is extensively utilized on the internet and in remote settings [19]. The integration of UAVs and machine learning enables functionalities such as image categorization and segmentation, drone path mapping, caching, scheduling, and surveillance [20]. ML technology allows UAVs to navigate independently by recognizing objects, modifying flight paths, and managing resources instantly. ML algorithms are also capable of spotting irregularities in data collected by UAV sensors, enabling prompt reactions to security issues and precise trend analysis [21]. The use of ML algorithms for real-time processing of UAV data allows for flexible decision-making and swift responses to dynamic conditions [22].
ML plays a critical role in this field by enabling the analysis and comprehension of the vast amounts of data collected by UAV sensors in real time. Traditional control methods often struggle in dynamic environments where UAVs encounter complex and unpredictable obstacles. Using ML, UAVs can draw on previous experience from different scenarios to improve their navigation strategies over time, in most cases autonomously. In other words, improving the obstacle avoidance of a UAV requires ML. As depicted in Figure 4, the combined use of supervised, unsupervised, and reinforcement models shows that such a framework offers a strong foundation for the practical implementation of intelligent UAV systems, which can learn to fly without external guidance and adapt their operating parameters to the prevailing environmental conditions. The integration of ML into UAV obstacle avoidance has greatly advanced the field by enabling autonomous decision-making.
In Figure 4, to improve UAV obstacle avoidance, a variety of ML techniques have been applied, including RL, unsupervised learning (UL), and supervised learning (SL). To delve deeper into these methods, we will start with supervised learning. This method is widely employed in ML and entails training algorithms with labelled datasets, where each data point is associated with a particular outcome. This training aids UAVs in forecasting results or making decisions using new data. The subsequent section will elaborate on the distinct methods and benefits of employing SL in UAV obstacle avoidance.

3.2. Supervised Learning (SL)

SL is the dominant method in ML, where the agent learns from input–output pairs to create a function that maps inputs to outputs. The goal is to label the output based on the input, especially in classification scenarios where the output falls into different categories [23]. As a learning paradigm, supervised learning (SL) has been the basis for enabling unmanned aerial vehicles to learn from labelled datasets, such as sensor readings and flight data, which can enhance their performance in navigation and obstacle avoidance. ANNs can be trained on UAV sensor data to classify the shape and nature of obstacles, or whether an obstacle is static or dynamic, which helps to reroute the flight path in real time. Deep learning (DL), frequently applied in a supervised setting, enables UAVs to process massive datasets from realistic environments, making them more effective at predicting and avoiding obstacles. For UAV applications that rely on fast decision-making and require autonomous capabilities in dynamic environments, the design of such SL techniques is key. To provide comprehensive insight into these strategies, Table 1 summarizes the various SL approaches used for UAV obstacle avoidance and describes their fundamental purposes.
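As a concrete illustration of the supervised setting described above, the following is a minimal sketch that trains a small classifier to label detected obstacles as static or dynamic from simple hand-crafted sensor features. The feature names, the synthetic data, and the use of scikit-learn's MLPClassifier are illustrative assumptions, not details of any specific study reviewed here.

# Minimal sketch: supervised classification of obstacles as static vs. dynamic.
# The features (range rate, apparent size change, bearing rate) and the data
# are synthetic placeholders; real systems would extract them from UAV sensors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 500

# Hypothetical features per detected obstacle:
#   range_rate   - how fast the range to the obstacle changes [m/s]
#   size_change  - change of apparent size between frames
#   bearing_rate - angular velocity of the obstacle in the camera frame [rad/s]
static = np.column_stack([
    rng.normal(0.0, 0.3, n), rng.normal(0.0, 0.05, n), rng.normal(0.0, 0.02, n)])
dynamic = np.column_stack([
    rng.normal(2.0, 1.0, n), rng.normal(0.2, 0.1, n), rng.normal(0.1, 0.05, n)])

X = np.vstack([static, dynamic])
y = np.array([0] * n + [1] * n)          # 0 = static obstacle, 1 = dynamic obstacle

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))

In a real pipeline the predicted label would feed the path planner, e.g. triggering a re-route only when a dynamic obstacle is detected on the current trajectory.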
In the next section, we will examine the UL techniques crucial for identifying patterns and structures within data without predetermined labels.

3.3. Unsupervised Learning (UL)

UL is the branch of ML that handles inputs without labelled outputs or a known mapping function. The algorithm is not given correct answers but must independently search for patterns and dependencies within the data. In UL, the system finds its own way of organizing data through pattern recognition; the best-known example is clustering, in which data points are grouped into categories that share a set of features [57]. Unsupervised ML methods are in demand for improving UAV navigation and obstacle avoidance because they enable UAVs to learn and act in complex environments where labelled datasets are very costly and time-consuming to collect.
Popular methods for working with obstacle data are K-means, which clusters similar data points, and DBSCAN, which identifies obstacles from point density. Multilevel obstacle cluster analysis is conducted with the help of hierarchical clustering. Moreover, data preprocessing is performed with PCA and t-SNE to produce effective visual representations of the data. Anomaly detection approaches such as Isolation Forest or autoencoders are used to find unexpected obstacles whose patterns differ from those seen before; these methods are vital so that UAVs do not run into new obstacles that cannot be anticipated from the training dataset. Additionally, SOMs reduce obstacle data to a format that makes spatial distribution analysis more manageable. Other notable algorithms include GMMs, Affinity Propagation, Mean Shift clustering, Spectral Clustering, and ICA, all contributing to more efficient and effective UAV obstacle avoidance systems [58].
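The density-based clustering idea mentioned above can be illustrated with a minimal sketch applied to a synthetic 2D point cloud standing in for LiDAR returns; the point distribution and the DBSCAN parameters are illustrative assumptions rather than values from any reviewed study.

# Minimal sketch: grouping synthetic 2D "LiDAR-like" returns into obstacle
# clusters with DBSCAN. Points and parameters are placeholders.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)

# Two dense blobs (obstacles) plus sparse background noise, in metres.
obstacle_a = rng.normal(loc=[5.0, 2.0], scale=0.3, size=(60, 2))
obstacle_b = rng.normal(loc=[12.0, -4.0], scale=0.4, size=(80, 2))
noise = rng.uniform(low=-10, high=20, size=(30, 2))
points = np.vstack([obstacle_a, obstacle_b, noise])

# eps: neighbourhood radius [m]; min_samples: points needed to form a core point.
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(points)

for label in sorted(set(labels)):
    members = points[labels == label]
    if label == -1:
        print(f"noise points: {len(members)}")
    else:
        print(f"obstacle cluster {label}: {len(members)} points, "
              f"centroid = {members.mean(axis=0).round(2)}")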
UL does not involve labelled data or specific target values for direct comparison. Instead, algorithms perform tasks such as clustering or pattern recognition on the dataset, extracting significant features and representations. The trained model then makes predictions or classifications based on these learned patterns.
Figure 5 depicts a visual overview of the discussed ML techniques, encompassing various SL, UL, and RL strategies utilized in UAV obstacle avoidance.

3.4. Challenges in Machine Learning Techniques and the Need for Reinforcement Learning

ML is an ever-evolving field that improves machine intelligence, allowing machines to outperform humans in various tasks. However, ML approaches often face challenges, such as overfitting and underfitting, especially in dynamic situations due to their static nature. In robotics, the main challenges revolve around autonomous learning and adaptation, leading to significant research and development efforts. Specifically regarding UAVs, the primary challenge is to achieve autonomy and efficient failure response. Furthermore, ML systems demand extensive data for learning complex tasks, making data collection a laborious and sometimes impractical process.
RL is adept at adaptation and continuous learning, making it well suited to tasks that involve intricate decision-making, such as operating autonomous vehicles. Because it learns from its own feedback through interactive experience, RL can learn from minimal data. By properly balancing exploration and exploitation, RL can reveal high-value strategies that other ML approaches may not notice, thus providing the flexible solutions essential in changeable environments.

4. Integrating Machine Learning and Reinforcement Learning for UAV Obstacle Avoidance

The development of proactive methods that can cope with multifaceted and ever-changing contexts is vital for the advancement of UAVs. RL has emerged as a technique capable of arriving at decisions in an uncertain environment; it is a powerful constituent in the development of UAVs equipped with an understanding of the environment and the knowledge of how to move around and avoid obstacles. Figure 6 categorizes the use of deep reinforcement learning (DRL) in UAVs and outlines the three major operational categories: path planning, navigation, and control. These areas need to be developed to advance UAV obstacle avoidance performance. In path planning, DRL dynamically plans the route from one point to another by analyzing changes in the environment and the presence of obstacles. In the navigation domain, DRL fuses sensory inputs to help UAVs steer through their areas of operation. Control relies on DRL to adjust and maintain the stability of drone flight and to provide proper reactions to interactions with the environment and changes in flight conditions. Incorporating these DRL applications increases the autonomy and accuracy of UAVs in basic navigation and their ability to effectively avoid obstacles.
Figure 7 shows RL and ML being used to allow UAVs to fly naturally and seamlessly between urban structures. The diagram illustrates the path planning, navigation and control systems needed to achieve mobility in complex, obstacle-filled areas. Path planning is the fundamental component through which the UAV constantly updates its route to bypass hurdles such as trees and buildings. Navigation of the UAV relies on sensors mounted on its body and on GPS. The control part of the system allows the UAV to implement navigational choices while keeping the vehicle stable and responsive to the external environment.
The illustration also shows that well-developed control systems and precise path planning are needed to increase the autonomy of these vehicles, and it emphasizes the importance of modern environment-perception systems for efficient obstacle avoidance. These technologies combine to afford significant autonomy to UAVs, allowing them to switch easily between real-time reaction to the environment and the charting of routes. Such a configuration creates the basis for further study; in the subsequent sections, especially in Section 4.2 on UAV navigation and Section 4.4 on UAV control, we detail the mechanics of UAV navigation, strategies and methods of UAV control, and ways to improve the accuracy and adaptability of UAV control in challenging conditions.
For path planning, navigation and flight control, various approaches are used intensively. In addition to the increasingly popular AI methods, model-based design is classically used. The numerical models built in this way describe the basic issues of aircraft flight, such as flight mechanics and aerodynamics [65]; also important in the context of electric drives are power supply and electric drive models, commonly used energy sources and energy storage [65], as well as the less frequent and more difficult to interpret photovoltaic [66,67] or fuel cell models [68]. These models are used directly in navigation, path planning or control, or indirectly together with artificial intelligence techniques.

4.1. Path Planning with Reinforcement Learning

Path planning is one of the most relevant topics, as it largely determines the extent of a vehicle's autonomy. It not only pertains to the degree of autonomy but also has implications for built-in subroutines, directions, capabilities, and resilience. A crucial concept in vehicle navigation, path planning covers trip duration management, obstacle avoidance, and various other challenges. Research often utilizes reinforcement learning to address these tasks, particularly for UAVs operating in uncertain environments where precise mathematical models are unavailable. Hence, the integration of RL and DL enables vehicles to learn their routes autonomously. By employing sophisticated learning algorithms, UAVs can navigate through dynamic environments safely, mitigating the risk of collisions [69]. Indeed, RL algorithms have found extensive applications across various research domains associated with UAV technology. One study suggested integrating DRL with an LSTM network to speed up the learning algorithm's convergence. Another approach, described in [70], focused on object avoidance and target tracking by incorporating a reward function and penalty actions to attain a smoother trajectory. Further studies employed RL algorithms for precise attitude control [71]. RL has also been applied in conjunction with various optimizers, including the grey wolf optimizer (GWO). The GWO is a metaheuristic optimization algorithm derived from the leadership structure and hunting behavior of grey wolves, and it is defined by an adaptable bionic framework that allows for context-based usage [72]. One UAV path planning algorithm built on it is able to create a proper and efficient path for the UAV; this approach integrates the capabilities of RL and the GWO to improve four essential processes: exploration, geometric convergence, exploitation, and optimal course alteration [73]. Typical strategies for mitigating obstacles while working with large sets of data rely on two computer vision methods: Structure from Motion (SFM) and Simultaneous Localization and Mapping (SLAM) [74]. An important aspect of the SLAM algorithm is building a map of the surroundings in the near vicinity of the UAV while simultaneously updating the UAV's position and planning its route based on information from various sensors. Another method of obstacle detection is to sense an object's characteristics with the help of a depth camera; one such work employs a novel algorithm that combines imitation learning and DRL to train on the data [75]. Moreover, in reference [76], a new learning algorithm was proposed to build a network structure with special attention to IoT devices. This method entailed training a DDQN network while accounting for aspects such as the available flight time of the autonomous drone and the positioning and number of sensors to be deployed. Using this network, the UAV can modify its actions and make navigation decisions when necessary. SFM, in turn, uses optical flow sensors to create a depth map and a structure of the surroundings. Both SFM and SLAM require a planned path for the UAV to navigate; the UAV must therefore first stop to survey the area and plan the path it will follow [77]. These techniques are not well suited to real-time obstacle avoidance applications.
Despite attempts to adapt SLAM for real-time use, such approaches neither handle non-stationary obstacles with unpredictable motion nor identify featureless walls commonly present in indoor settings [27]. In the field of obstacle avoidance, the authors of [78] introduced a self-trained UAV capable of navigating around both stationary and moving obstacles in a 3D urban setting. A deep deterministic policy gradient (DDPG)-based learning method was used in the study to guide the vehicle towards its destination while minimizing the risk of collisions. This algorithm incorporates a reward mechanism aimed at reducing the distance between the UAV and its target, along with penalties for colliding with obstacles. Another approach treats obstacle avoidance as a model-free RL problem, suitable for environments with incomplete prior knowledge. This method employs DRL to train UAVs for indoor navigation in 3D hallway environments, using a diverse dataset of images capturing various environmental conditions. However, despite extensive training, the accuracy of the resulting deep Q-network may be insufficient. Furthermore, this method fails to mimic human learning behavior effectively.
Consequently, researchers have shifted focus towards studying human behavior in obstacle avoidance. Humans rely on memory to store and recall relevant information, guiding decision-making in diverse scenarios. Similarly, UAVs facing partial observability issues require memory-like mechanisms for real-time navigation planning [27]. In another investigation, the combination of obstacle avoidance with reaching the end target via the shortest path and in minimum time was explored. Flight path learning employed Q-learning algorithms, as in prior studies, requiring computational capacity for path planning while evading random obstacles. To ensure calculation speed, a dedicated unit handled the computations, and the acquired results were then transferred to the UAV. The simulation involved a basic indoor environment with walls, obstacles, and start and target positions, each endowed with distinct reward values; multiple maps of varying sizes and obstacle densities were generated accordingly [35]. The methodology employs a temporal attention deep recurrent Q-network to optimize the expected total of discounted rewards and enable effective obstacle avoidance in novel environments through RL. Q-learning guides action selection by maximizing reward values and ensuring optimal performance based on predefined policies [79]. A study similar to [69] employed an RL model to enhance UAV performance by optimizing a position controller. To determine the next course of action for the future state, this model used the vehicle's present position and its learning framework. The resulting position was then fed into the position controller, which instructed the propeller controller to generate enough thrust to move the vehicle to its intended new position. Another methodology explores different obstacles faced by UAVs: when mission-related data are shared online, it suggests using an interference-aware path planning technique to minimize interference with the ground network and lower wireless transmission delay [80]. The challenge is framed as a dynamic non-cooperative game, where the UAV takes on the role of the user. By employing DRL and utilizing echo state network (ESN) cells, trajectories for UAV navigation are optimized in relation to the ground network and other UAVs. This approach allows users to make decisions based on previous rewards, such as location and transmission power level. DRL is increasingly used in UAV path planning, enabling real-time adjustments to changing environments. Chao et al. (2019) introduced a real-time path planning strategy that considered potential threats from enemies, which is particularly crucial for anti-terrorism operations where self-preservation is vital in the face of possible radar detection and missile threats [81]. A simulated environment incorporating adversary elements is created using the STAGE Scenario software tool, which features a comprehensive database of battlefield entities such as radar and missiles [82]. A model is developed to assess the situation using information on the positions of vehicles and enemy units, which consist of radar and surface-to-air missiles. When the UAV is detected within radar range, calculations are performed to determine the distance between the enemy and the vehicle, as well as the maximum kill-zone radius of the missile. The vehicle is trained with a dueling double deep Q-network (D3QN) algorithm for course planning.
At the start of each training episode, the initial parameters of the network are established and the UAV is placed in the STAGE Scenario. The situation map is then created based on the assessment model, indicating the current state. An action is chosen and executed accordingly. Following the action's completion, fresh data are added to the scenario map and an instant reward value is recorded. The most recent version of the data is kept in memory, replacing the previous one [83].
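To make the tabular Q-learning idea referenced above concrete, the following is a minimal sketch of grid-based path planning toward a goal while avoiding obstacle cells. The grid layout, reward values, and hyperparameters are illustrative assumptions, not those used in the cited studies.

# Minimal sketch: tabular Q-learning on a small grid world standing in for an
# indoor map with obstacle cells, a start cell, and a target cell.
import numpy as np

GRID = 6
OBSTACLES = {(1, 2), (2, 2), (3, 4), (4, 1)}   # hypothetical obstacle cells
START, GOAL = (0, 0), (5, 5)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(state, action):
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < GRID and 0 <= c < GRID):  # leaving the map: stay, small penalty
        return state, -1.0, False
    if (r, c) in OBSTACLES:                    # collision: episode ends
        return (r, c), -10.0, True
    if (r, c) == GOAL:                         # reached the target
        return (r, c), 10.0, True
    return (r, c), -0.1, False                 # small step cost encourages short paths

Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.1, 0.95, 0.2

for episode in range(3000):
    state, done = START, False
    while not done:
        a = (np.random.randint(len(ACTIONS)) if np.random.rand() < eps
             else int(np.argmax(Q[state])))
        nxt, reward, done = step(state, ACTIONS[a])
        # Q-learning update: bootstrap from the greedy value of the next state.
        Q[state][a] += alpha * (reward + gamma * np.max(Q[nxt]) - Q[state][a])
        state = nxt

# Greedy rollout of the learned policy.
state, path = START, [START]
for _ in range(2 * GRID * GRID):
    state, _, done = step(state, ACTIONS[int(np.argmax(Q[state]))])
    path.append(state)
    if done:
        break
print("greedy path:", path)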

4.2. Reinforcement Learning for UAV Navigation

In recent efforts to achieve optimal control for UAVs, DL algorithms have gained significant attention. An experimental study was conducted using various reinforcement learning techniques, categorizing them into two primary groups: discrete action space and continuous action space. The study begins by comparing different methods for drone navigation, including SL, UL, and RL, while highlighting the pros and cons of each approach. RL involves training the UAV to navigate and avoid obstacles through trial and error, allowing it to learn autonomously once the training environment is set up [84]. Initial research introduced an approach for model-free DRL using convolutional neural networks [85]. While most UAV research focuses on simulation platforms for training and optimizing algorithms, that study aims to apply RL algorithms on actual hardware to achieve optimal path tracking in unknown environments. It addresses the key challenge of guiding the UAV along an optimal path by determining actions based on its current state (SK) and thrust force (τ). A conventional PID controller is proposed for stable navigation, using learning parameters and gain values as inputs; the controller adjusts the UAV's next state based on accumulated rewards [86]. The work then applies RL by deriving algorithms for sequential decision-making in which the agent moves through a visual world in discrete time intervals. These equations explain how actions are chosen to maximize a reward signal in both discrete and continuous action spaces. For a discrete action space, the agent uses a value-based approach, employing a deep Q-network (DQN) to compute state values from high-dimensional data such as images. To overcome DQN's shortcomings, a new algorithm called double dueling DQN (D3QN) is introduced, combining a double DQN and a dueling DQN; it shows strong performance in experiments by reducing correlation and improving state quality. The study uses AirSim, a simulation platform that produces realistic scenes using Unreal Engine visuals. However, the simulation offers only simple paths for UAVs, with all trees on flat ground [59]. To address this, a new environment with various obstacles was created, using sensors and CNN inputs to find the best path for the drone. Several DRL algorithms were then used for continuous action space learning: actor-critic with Kronecker-factored trust region (ACKTR), proximal policy optimization (PPO), deep deterministic policy gradients (DDPG), and trust region policy optimization (TRPO). Manual data labelling was replaced with a policy gradient technique combined with U-Net-based segmentation. The results showed D3QN's superiority in the discrete action space and ACKTR's smooth trajectories in the continuous action space. Expert pilots performed better in wooded areas, while ACKTR excelled in an arena setting. The study also explored optimizing drone delivery routes using RL algorithms, defining drone delivery as the path a drone takes to reach its destination while navigating various obstacles [87]. Since drones are used for outdoor applications such as surveillance and delivery, precise navigation cannot rely solely on a global navigation satellite system (GNSS) [60]. Outdoor drone tasks require obstacle avoidance, which can be achieved through artificial intelligence methods [69]. The study examines various DRL algorithms that enable drones to reach targets through optimal decision-making driven by reward values.
It introduces conceptual DRL algorithms to address current delivery challenges and suggests innovative architectures to improve existing solutions. In this context, RL represents the decision-maker as the agent and its environment as the surroundings. The agent interacts with the environment, with each action corresponding to a state within the state-space model [48]. The first suggested algorithm is the DQN, which combines Q-learning with neural network techniques [88]. DQN relies heavily on a CNN for action-value optimization, including periodic updates to align action values with target values, thereby reducing correlation. Experience replay, inspired by biology, randomizes the data to eliminate correlation among states. Double DQN addresses positive bias by decoupling action selection from action evaluation. Experiments were conducted in a virtual environment mimicking reality for safety. The state space consists of three states, each paired with a neural network: the original CNN, a 2D JNN, and a modified JNN-3D for vertical movement. Training each model took about 40 h, and checkpoints were used to improve efficiency. Based on the findings, employing checkpoints led to a rise in successful action episodes and a decrease in failed ones. JNN achieved success in fewer episodes than CNN, with higher reward values and faster stabilization [89]. Another evolving use for UAVs is in search and rescue (SAR) operations, which require UAV navigation through unknown terrains filled with dangers and obstacles [90]. Using UAVs in this field has spared human rescuers from entering hazardous environments and reduced the risks associated with sending humans on such missions [91].
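As a concrete illustration of the experience replay and target-network mechanics described above, the following is a minimal sketch of a DQN-style update loop in PyTorch; the network size, state dimensionality, toy transition generator, and hyperparameters are illustrative assumptions rather than the configurations of the cited studies.

# Minimal sketch: DQN-style update with experience replay and a target network.
# A real setup would pull transitions from a UAV simulator instead of the
# random stand-in used here.
import random
from collections import deque

import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4          # e.g. compressed sensor features, 4 motions
GAMMA, BATCH, SYNC_EVERY = 0.99, 32, 100

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                         nn.Linear(64, N_ACTIONS))

online, target = make_net(), make_net()
target.load_state_dict(online.state_dict())
optimizer = torch.optim.Adam(online.parameters(), lr=1e-3)
replay = deque(maxlen=10_000)

def random_transition():
    """Stand-in for one environment step (state, action, reward, next_state, done)."""
    s = torch.randn(STATE_DIM)
    a = random.randrange(N_ACTIONS)
    r = random.uniform(-1.0, 1.0)
    s2 = torch.randn(STATE_DIM)
    done = random.random() < 0.05
    return s, a, r, s2, done

for step_i in range(1, 1001):
    replay.append(random_transition())
    if len(replay) < BATCH:
        continue

    batch = random.sample(replay, BATCH)        # break correlation among states
    s, a, r, s2, done = zip(*batch)
    s, s2 = torch.stack(s), torch.stack(s2)
    a = torch.tensor(a)
    r = torch.tensor(r, dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():                       # bootstrapped target from frozen net
        q_next = target(s2).max(dim=1).values
        y = r + GAMMA * (1.0 - done) * q_next

    loss = nn.functional.smooth_l1_loss(q, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if step_i % SYNC_EVERY == 0:                # periodic target-network sync
        target.load_state_dict(online.state_dict())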
Starting in 2020, studies began addressing the challenge of dynamic obstacles encountered by UAVs during flight, especially in outdoor environments. A study investigated how ML and RL approaches might be used to improve UAV navigation and obstacle avoidance. With an emphasis on state representation from UAV sensors and real-time environmental interaction, the efficacy of various RL algorithms in teaching UAVs autonomous navigation is evaluated. In addition to pointing out gaps in the literature and highlighting the advantages and disadvantages of present approaches, the study suggests future research avenues to develop UAV technology. A taxonomy of these artificial intelligence systems for UAV navigation is provided in Figure 8, which illustrates the various methodologies and their applications. These include learning-based approaches like DRL, asynchronous advantage actor-critic (A3C), partially observable Markov decision process (POMDP), and GWO, as well as optimization-based approaches like differential evolution (DE), genetic algorithm (GA), and particle swarm optimization (PSO). Table 2 illustrates optimization-based AI approaches.

4.3. Learning-Based AI Approaches

4.3.1. Reinforcement Learning

RL stands as a potent and extensively applied AI methodology, acquiring knowledge about its surroundings through action and strategy evaluation. At its core, RL comprises an agent and an environment, wherein the agent, employing the Markov decision process (MDP), engages with the environment, discerning optimal actions to pursue [118]. This technique involves an agent learning through interaction with a real or simulated environment. Periodically, the agent makes decisions, observes outcomes, and receives feedback, thereby improving its task performance over time. RL finds applications in autonomous navigation, where robots adjust strategies autonomously to achieve optimal policies. In essence, RL algorithms learn from past actions to solve specific problems.
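For reference, the standard formalism behind the MDP-based agent-environment loop described above can be written compactly; the equations below are the generic textbook definitions of the action-value function and the Bellman optimality condition, not expressions taken from any particular cited study.

\begin{align}
  Q^{\pi}(s,a) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \,\middle|\, s_0 = s,\; a_0 = a\right] \\
  Q^{*}(s,a)   &= \mathbb{E}\!\left[ r + \gamma \max_{a'} Q^{*}(s',a') \,\middle|\, s, a \right]
\end{align}

Here the state s is built from the UAV's sensor observations, a is a navigation action, r is the reward, and γ is the discount factor that weighs long-term against immediate rewards.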
Table 3, “RL Techniques and Their Applications in UAV Navigation and Obstacle Avoidance”, provides an overview of the different RL approaches and how they are used in UAV navigation and obstacle avoidance. In the context of UAV operations, this table offers a thorough summary of several RL algorithms, their traits, and their particular objectives.

4.3.2. Deep Reinforcement Learning (DRL)

To create solutions that become better with practice, DRL integrates ideas from RL with DL. This approach relies on repeated iterations and the assessment of a reward function to determine the most effective behavior for an agent. DRL approaches may be broadly classified into three categories: model-based, policy-based, and value-based. The goal of value-based RL is to maximize a value function for long-term rewards over a sequence of activities. Policy-based RL focuses on finding the policy that optimizes the objective function, with deterministic and stochastic approaches. Model-based RL either provides the agent with a predefined environmental model or requires the agent to learn the model to complete tasks within that environment.
An overview of the several DRL methods for obstacle avoidance and UAV navigation is given in Table 4 below, along with a description of each method’s unique objectives.

4.3.3. Deep Learning (DL)

Deep learning, especially through deep neural networks (DNNs) used within deep reinforcement learning (DRL), is extensively applied to vision-based UAV navigation. Recent advancements in tasks such as object identification, localization, image segmentation, and depth perception enable the effective application of DNN methods. Researchers have successfully utilized these techniques to identify roads and streets along important routes and in urban areas, aiming to enhance autonomy in self-driving vehicles [139]. DL allows UAVs to interpret large volumes of sensory input in real time, enabling rapid decision-making and adaptive learning through sophisticated neural network topologies. Table 5 illustrates the several DL approaches used for obstacle avoidance and UAV navigation, each with unique features and objectives.
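As a toy illustration of the kind of network such vision pipelines rely on, the sketch below defines a small convolutional classifier that maps a single-channel depth image to one of three hypothetical steering decisions (left, straight, right); the architecture, input size, and labels are assumptions made for illustration only, not an architecture from the reviewed literature.

# Minimal sketch: a small CNN mapping a 1-channel depth image to one of three
# hypothetical steering classes (left / straight / right).
import torch
import torch.nn as nn

class DepthSteeringNet(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.classifier = nn.Linear(64 * 4 * 4, n_classes)

    def forward(self, depth: torch.Tensor) -> torch.Tensor:
        # depth: (batch, 1, H, W) normalized depth image
        x = self.features(depth)
        return self.classifier(x.flatten(start_dim=1))

if __name__ == "__main__":
    net = DepthSteeringNet()
    dummy_depth = torch.rand(2, 1, 120, 160)      # two fake depth frames
    logits = net(dummy_depth)
    print("predicted steering classes:", logits.argmax(dim=1).tolist())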

4.4. Reinforcement Learning for UAV Control

UAV control for specific tasks is complex due to dynamic conditions, energy efficiency requirements, and vulnerability to disturbances. Classical control, based on linearized dynamics, proved insufficient, leading to the adoption of ML, and of RL in particular. DRL emerged to address the need for higher accuracy and robustness, enabling new control tasks in real time without predefined models. It supports various approaches, from attitude control to managing communication channels among multiple UAVs. DRL integrates well-known model-free methods such as DDPG, TRPO, and PPO, each suited to specific applications according to their strengths and weaknesses. Using the PPO algorithm, DRL was implemented for fixed-wing UAV control, addressing nonlinear dynamics and lateral-longitudinal coupling. Despite disturbances and varied conditions, PPO ensured stable flight and achieved the desired speed and angles; with its low computation time, it is suitable for UAV control. The critic network assesses behavior rewards for training, whereas the actor network computes the ideal behavior. The reward function is based on the distance of the state from the desired one, and normalized data inputs were used to reduce computation time. However, successful outcomes were limited to simulations, casting doubt on real-world feasibility [150]. In tackling an attitude control problem, different algorithms, such as DDPG, TRPO, PPO, and PID, were assessed. The findings favored the PPO algorithm, even though it shares similarities with TRPO: PPO's benefits include simpler implementation and tuning. Simulation outcomes, evaluated using metrics such as rise time and stability, indicate suitability for continuous tasks [58]. A hybrid flying system utilizes deep reinforcement learning (DRL), particularly the proximal policy optimization (PPO) algorithm, for on-line learning and collision avoidance. It presents a versatile control approach suitable for multiple UAVs across diverse dynamics and scenarios, and integral error terms guarantee precise control without extra adjustments. The reward function in this DRL setup evaluates energy efficiency, flight stability, and tracking accuracy. While hybrid UAVs present difficulties, flapping wing UAVs demand precise calculations and ongoing adaptation for efficient flight in varying conditions [151]. Another project develops a model-free UAV control system using deep reinforcement learning, specifically the deep deterministic policy gradient (DDPG) method. Deformation control is managed through airfoil-mounted steering gears. Actions are chosen based on the current state, with random and deterministic policies guiding exploration and action evaluation. The reward function factors in angular orientation and linear position tracking [152].
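The reward shaping used in these control studies typically combines tracking accuracy, stability, and energy terms. The sketch below shows one hedged way such a composite reward could be written; the weights and the individual terms are illustrative assumptions, not the reward functions of the cited works.

# Minimal sketch: a composite reward for UAV control combining tracking error,
# attitude stability, control effort (a proxy for energy), and a collision penalty.
import numpy as np

def control_reward(position, target, angular_rates, thrust_cmd,
                   collided, w_track=1.0, w_stab=0.1, w_energy=0.01):
    """Return a scalar reward for one control step.

    position, target : 3D positions [m]
    angular_rates    : body rates [rad/s]; large values indicate unstable flight
    thrust_cmd       : normalized actuator commands in [0, 1]
    collided         : True if the UAV hit an obstacle this step
    """
    tracking_error = np.linalg.norm(np.asarray(target) - np.asarray(position))
    stability_cost = np.linalg.norm(angular_rates)
    energy_cost = float(np.sum(np.square(thrust_cmd)))

    reward = -(w_track * tracking_error
               + w_stab * stability_cost
               + w_energy * energy_cost)
    if collided:
        reward -= 100.0          # large penalty dominates the episode return
    return reward

# Example step: 1 m from the target, mild rotation, moderate thrust, no crash.
print(control_reward(position=[0.0, 0.0, 10.0], target=[1.0, 0.0, 10.0],
                     angular_rates=[0.05, 0.02, 0.0],
                     thrust_cmd=[0.5, 0.5, 0.5, 0.5], collided=False))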
The field of reinforcement learning is advancing quickly, with Robust-DDPG demonstrating better convergence and stability than DDPG. This improvement incorporates delayed learning, adversarial attack filtering, and mixed exploration to guarantee steady learning, resilient control, and swift convergence. The control parameters are roll angle and linear speed, with the reward mechanism targeting effective goal attainment and collision avoidance [153]. Another system ensures the accurate autonomous landing of a UAV on a designated platform using visual markers and camera input. Deep Q-learning networks (DQNs) are utilized, segmented into marker detection and vertical landing tasks. The adoption of a double DQN architecture improves system alignment and complexity handling. Sparse rewards stemming from reliance on Markov decision processes are mitigated through “partitioned buffer replay”, which categorizes experience data to enhance decision-making [62]. Despite the potential shown by deep Q-networks in UAV control, they encounter limitations when tasked with image processing. Employing deep reinforcement learning, a UAV can autonomously capture high-quality images of a person’s face while adjusting its height and position; here the deep deterministic policy gradient (DDPG) outperforms conventional methods such as PID control and 2D face recognition algorithms. The selection of rewards, based on control error, influences both learning speed and tracking accuracy. The method demonstrates effectiveness even under discrete control commands, such as those encountered in simulation environments [154]. The DCNNP controls a swarm of five UAVs tasked with formation and defense using MARL. Centralized learning allows a single agent to coordinate behaviors. DCNNP surpasses a random policy but exhibits slightly lower performance than a perfect policy, particularly with fewer UAVs in surveillance tasks [63].
A new algorithm, DRL-EC3, built on an adjusted DDPG and actor-critic policy, seeks to lower energy usage in swarm UAVs while upholding task precision. The findings indicate the superiority of DRL-EC3 over random and greedy policies, particularly as UAV numbers rise [64]. The research emphasizes the significance of UAV channel allocation for effective task communication within swarms. DRL is preferred over Q-learning for its ability to manage large datasets efficiently. This approach significantly improves communication accuracy and speed among UAVs, marking a notable advancement in UAV control research [155].
A policy is an agent’s decision-making function (control strategy): a mapping from states to actions. RL methods fall into two groups: on-policy and off-policy. Proximal policy optimization (PPO), trust region policy optimization (TRPO), and SARSA are examples of on-policy techniques, which evaluate or improve the same policy that is used to make decisions. By contrast, off-policy approaches such as Q-learning, deep Q-network (DQN), and deep deterministic policy gradient (DDPG) evaluate or improve a policy other than the one used to generate the data: see Figure 9. Without the need for a model, DRL provides a flexible framework for resolving these issues; it can also learn from mistakes and forecast an accurate answer based on future observations. RL algorithms are further divided into two categories: model-based and model-free. Model-free RL algorithms select the best course of action based on the agent’s trial and error, whereas model-based RL algorithms attempt to select the best course of action based on a learned model of the environment. Both model-free and model-based algorithms have benefits and drawbacks. Table 6 summarizes model-free RL techniques.
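The on-policy/off-policy distinction is easiest to see in the tabular updates: SARSA bootstraps from the action actually taken by the behavior policy, while Q-learning bootstraps from the greedy action regardless of what was executed. The equations below are the standard textbook forms, included here only for reference.

\begin{align}
  \text{SARSA (on-policy):} \quad & Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[ r_{t+1} + \gamma\, Q(s_{t+1},a_{t+1}) - Q(s_t,a_t)\right] \\
  \text{Q-learning (off-policy):} \quad & Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\right]
\end{align}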

5. Future Directions and Challenges

Significant progress has been made in UAV technology, especially in the areas of autonomous navigation and obstacle avoidance. Still, several issues require further study and improvement. One of the key remaining challenges is autonomous and dependable navigation in complicated situations. Although UAV obstacle avoidance has greatly improved thanks to current ML and RL techniques, these algorithms perform worse in more complex and unstructured situations. For example, Figure 10 illustrates that RL algorithms perform better in urban settings than in forested and mountainous areas. Aggregated results based on a number of cutting-edge studies are presented in Figure 10, Figure 11 and Figure 12 for obstacle avoidance and safe UAV navigation. The data were sourced from a review of papers published between 2012 and 2024. The reviewed literature covers different methodologies, including simulation and real-world experiments, which we summarized to showcase the trends and progression of reinforcement learning applications on UAVs.
This suggests that stronger algorithms that can manage a variety of unforeseen situations are required. Moreover, the success rate of RL algorithms declines with environmental complexity, underscoring the necessity of creating more sophisticated algorithms capable of adjusting to complicated settings, as seen in Figure 11.
UAV operations need real-time, high-accuracy, low-latency decision-making, particularly in areas where GPS is unavailable. To satisfy these requirements, current methods must be improved. The trade-off between accuracy and latency for various RL algorithms shows that, although some algorithms strike a decent balance, others still have difficulty optimizing both parameters simultaneously. Figure 12 plots the latency and accuracy of several RL algorithms and illustrates Algorithm A’s superior balance.
In addition, UAVs have the potential to revolutionize precision agriculture and disaster relief, significantly enhancing operational efficiency in both sectors. AI-equipped UAVs can optimize crop monitoring and yield projections in precision agriculture, while also improving disaster management by providing real-time data and support during emergency response operations. As the complexity of the environment rises, more sophisticated algorithms, especially RL algorithms, are needed to learn across different environments. Another factor that can increase decision-making speed in dynamic environments is the optimization of UAV computational architectures to accommodate real-time, data-intensive computations. Given the increasing popularity of UAVs, regulatory and safety issues must also be considered. The adoption of standard safety measures will reduce accident frequencies and thus support the effective integration of UAVs into national airspaces. Cooperation between industry representatives, governmental bodies, and technology developers is needed to design the legislation required to protect citizens and their data.
An extensive review of the literature must be carried out to assess the effectiveness of UAVs depending on the context of their use. Establishing norms for response times, failure rates, and accuracy also sets standards for the future evolution of the technology. In these circumstances, quantitative improvements and outcome analyses are critical to addressing these challenges and advancing UAVs to meet the requirements of complex environments.

5.1. Comparative Analysis of Existing Studies

In other sections of this study, such as Section 4.3.3, we discussed in detail the various types of deep learning algorithms used in UAV obstacle avoidance; in this Section 5.1 we instead provide an overview of existing studies, comparing different methods with one another, such as techniques for avoiding dynamic versus static obstacles. This section is therefore devoted to surveying and reviewing previous works rather than to the technical depth found in sections such as Section 4.3.3. As shown in Table 7, this portion of the research describes the various methods and findings from previous surveys connected with UAV obstacle identification and avoidance.

6. Conclusions

By integrating ML approaches with RL, the autonomous navigation and collision avoidance of UAVs have been greatly enhanced. This article highlights a wide range of RL algorithms and applied ML solutions for autonomous UAV navigation and investigates how these algorithms operate in coordination with UAV sensors for environmental assessment and real-time decision-making. The development of RL algorithms has proven experimentally successful in allowing UAVs to make informed decisions quickly. The combined use of SL, UL, and RL forms a strong framework for the design of intelligent UAV systems that can learn under different environmental conditions. This review focuses on how the combination of these approaches has been effective in improving UAV performance, especially in complex and dynamic environments. The results show that these RL algorithms have the potential to solve a variety of autonomous navigation problems for UAVs: they are able to process information from the cameras and sensors of drones and choose from these data an optimal, rewarded action. If UAV systems are combined with ML methods, the efficiency and endurance of obstacle-aware airborne navigation can be substantially increased, for example by closing gaps in real-time visual detection so that the UAV can act promptly, which improves the response time for obstacle avoidance.
The development of UAVs depends on the enhancement of various factors across the multidisciplinary fields of robotics, AI, and aeronautics. This assessment has identified shortcomings in the current body of literature and directions for further study to improve the reliability and effectiveness of UAV navigation systems. Both categories of learning will require constant improvement in order to produce better and more efficient UAV operations. Combining ML with RL therefore has a clear impact on enhancing UAV technology in terms of autonomous flight and the avoidance of moving objects. The effectiveness of various RL algorithms has been meticulously assessed, and the pros and cons of existing approaches have been acknowledged. For scholars and professionals in the field, this study provides a comprehensive overview of the current state of research in UAV navigation and obstacle avoidance, rendering it an invaluable resource.

Author Contributions

Conceptualization, W.S. and R.A.; methodology, W.S.; software, R.A.; validation, W.S. and R.A.; formal analysis, W.S. and R.A.; investigation, W.S. and R.A.; resources, R.A.; data curation, R.A.; writing—original draft preparation, R.A.; writing—review and editing, W.S.; visualization, R.A.; supervision, W.S.; project administration, W.S.; funding acquisition, W.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ACKTR: Actor-Critic using Kronecker-Factored Trust Region
A3C: Asynchronous Advantage Actor-Critic
ACO: Ant Colony Optimization
ADE: Artificial Differential Evolution
AOQPIO: Adaptive Oppositional Quantum-behaved Particle Swarm Optimization
A*: Search Algorithm
ACER: Actor-Critic with Experience Replay
AoL: Age of Information
BKR: Bayesian Kernel Regression
CNN: Convolutional Neural Network
CGAN: Conditional Generative Adversarial Network
CL: Contrastive Learning
CDE: Coupled Differential Evolution
CS: Cuckoo Search
DNN: Deep Neural Network
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
DRL: Deep Reinforcement Learning
DDQN: Double Deep Q-Network
DDPG: Deep Deterministic Policy Gradient
DAC: Differential Algebraic Constraint
DCNN-GA: Deep Convolutional Neural Network with Genetic Algorithm
DPSO: Dynamic Particle Swarm Optimization
DEAKP: Differential Evolution with Adaptive Knowledge Propagation
DCNNP: Deep Convolutional Neural Network Processor
DRL-EC3: Deep Reinforcement Learning with Experience Cumulative Coefficient of Cooperation
ELM: Extreme Learning Machine
ESN: Echo State Network
EDDQN: Ensemble Deep Double Q-Network
ESCSO: Enhanced Semi-Continuous Scheduling Optimization
FNN: Feedforward Neural Network
FSR: Force Sensitive Resistor
FRDDM: Fuzzy Rough Dynamic Decision Making
G-FCNN: Gated Fully Convolutional Neural Network
GMM: Gaussian Mixture Model
GWO: Grey Wolf Optimizer
GNSS: Global Navigation Satellite System
GBPSO: Gaussian-Binary Particle Swarm Optimization
GA: Genetic Algorithm
GPS: Global Positioning System
HIER: Hierarchical Clustering
HR-MAGA: Hierarchical Representation-based Multi-Agent Genetic Algorithm
HSGWO-MSOS: Hybrid Shuffled Grey Wolf Optimizer with Modified Shuffle Operator Selection
ICA: Independent Component Analysis
IoT: Internet of Things
IDSIA: Dalle Molle Institute for Artificial Intelligence Research
IFDS: Iterative Flow Decomposition and Matching
IMUs: Inertial Measurement Units
JNN: Jump Neural Network
LSTM: Long Short-Term Memory
LiDAR: Light Detection and Ranging
LwH: Length-width-height
MARL: Multi-Agent Reinforcement Learning
MBO: Model-Based Optimization
MIMO: Multiple Input Multiple Output
MPP: Maximum Power Point
MRFT: Model Reference Adaptive Control with Tuning
NNP: Neural Network Pipeline
NAF: Normalized Advantage Function
OCGA: Optimized Cellular Genetic Algorithm
OBIA: Object-Based Image Analysis
PCA: Principal Component Analysis
PID: Proportional-Integral-Derivative
PPO: Proximal Policy Optimization
POMDP: Partially Observable Markov Decision Process
PFACO: Parallel Fuzzy Ant Colony Optimization
QoE: Quality of Experience
RDPG: Recurrent Deterministic Policy Gradient
RSS: Received Signal Strength
RWNN: Random Weight Neural Network
RBF: Radial Basis Function
RNN: Recurrent Neural Network
RL: Reinforcement Learning
SL: Supervised Learning
SVR: Support Vector Regression
SVM: Support Vector Machine
SOM: Self-Organizing Map
SLAM: Simultaneous Localization and Mapping
SFM: Structure from Motion
STAGE: Structured Temporal Attention Graph Ensemble
SAR: Synthetic Aperture Radar
SK: Sensitivity Kernel
SaR: Search and Rescue
SARSA: State-Action-Reward-State-Action
SAC: Soft Actor-Critic
SA-ACO: Simulated Annealing based Ant Colony Optimization
TRPO: Trust Region Policy Optimization
T-SNE: t-Distributed Stochastic Neighbor Embedding
UAV: Unmanned Aerial Vehicles
UL: Unsupervised Learning
UAV-BS: Unmanned Aerial Vehicle Base Station
WPCN: Wireless Powered Communication Network

References

  1. Savkin, A.V.; Huang, H. Asymptotically optimal deployment of drones for surveillance and monitoring. Sensors 2019, 19, 2068. [Google Scholar] [CrossRef] [PubMed]
  2. Boucher, P. Domesticating the drone: The demilitarisation of unmanned aircraft for civil markets. Sci. Eng. Ethics 2015, 21, 1393–1412. [Google Scholar] [CrossRef] [PubMed]
  3. Clarke, R. Understanding the drone epidemic. Comput. Law Secur. Rev. 2014, 30, 230–246. [Google Scholar] [CrossRef]
  4. Lu, Y.; Xue, Z.; Xia, G.-S.; Zhang, L. A survey on vision-based UAV navigation. Geo-Spat. Inf. Sci. 2018, 21, 21–32. [Google Scholar] [CrossRef]
  5. Grippa, P.; Behrens, D.; Bettstetter, C.; Wall, F. Job selection in a network of autonomous UAVs for delivery of goods. arXiv 2016, arXiv:1604.04180. [Google Scholar]
  6. Huang, Z.; Chen, C.; Pan, M. Multiobjective UAV path planning for emergency information collection and transmission. IEEE Internet Things J. 2020, 7, 6993–7009. [Google Scholar] [CrossRef]
  7. Liu, M.; Yang, J.; Gui, G. DSF-NOMA: UAV-assisted emergency communication technology in a heterogeneous Internet of Things. IEEE Internet Things J. 2019, 6, 5508–5519. [Google Scholar] [CrossRef]
  8. Arafat, M.Y.; Moh, S. Bio-inspired approaches for energy-efficient localization and clustering in UAV networks for monitoring wildfires in remote areas. IEEE Access 2021, 9, 18649–18669. [Google Scholar] [CrossRef]
  9. Bushnaq, O.M.; Chaaban, A.; Al-Naffouri, T.Y. The role of UAV-IoT networks in future wildfire detection. IEEE Internet Things J. 2021, 8, 16984–16999. [Google Scholar] [CrossRef]
  10. De Moraes, R.S.; De Freitas, E.P. Multi-UAV based crowd monitoring system. IEEE Trans. Aerosp. Electron. Syst. 2019, 56, 1332–1345. [Google Scholar] [CrossRef]
  11. Wan, M.; Gu, G.; Qian, W.; Ren, K.; Maldague, X.; Chen, Q. Unmanned aerial vehicle video-based target tracking algorithm using sparse representation. IEEE Internet Things J. 2019, 6, 9689–9706. [Google Scholar] [CrossRef]
  12. Chung, S.H.; Sah, B.; Lee, J. Optimization for drone and drone-truck combined operations: A review of the state of the art and future directions. Comput. Oper. Res. 2020, 123, 105004. [Google Scholar] [CrossRef]
  13. Wu, C.; Ju, B.; Wu, Y.; Lin, X.; Xiong, N.; Xu, G.; Li, H.; Liang, X. UAV autonomous target search based on deep reinforcement learning in complex disaster scene. IEEE Access 2019, 7, 117227–117245. [Google Scholar] [CrossRef]
  14. Wang, B.; Sun, Y.; Sun, Z.; Nguyen, L.D.; Duong, T.Q. UAV-assisted emergency communications in social IoT: A dynamic hypergraph coloring approach. IEEE Internet Things J. 2020, 7, 7663–7677. [Google Scholar] [CrossRef]
  15. Mohamed, N.; Al-Jaroodi, J.; Jawhar, I.; Idries, A.; Mohammed, F. Unmanned aerial vehicles applications in future smart cities. Technol. Forecast. Soc. Chang. 2020, 153, 119293. [Google Scholar] [CrossRef]
  16. AlMahamid, F.; Grolinger, K. Autonomous unmanned aerial vehicle navigation using reinforcement learning: A systematic review. Eng. Appl. Artif. Intell. 2022, 115, 105321. [Google Scholar] [CrossRef]
  17. Kim, D.K.; Chen, T. Deep neural network for real-time autonomous indoor navigation. arXiv 2015, arXiv:1511.04668. [Google Scholar]
  18. Crevier, D. AI: The Tumultuous History of the Search for Artificial Intelligence; Basic Books, Inc.: New York, NY, USA, 1993. [Google Scholar]
  19. Hu, S.; Chen, X.; Ni, W.; Hossain, E.; Wang, X. Distributed machine learning for wireless communication networks: Techniques, architectures, and applications. IEEE Commun. Surv. Tutor. 2021, 23, 1458–1493. [Google Scholar] [CrossRef]
  20. Pajares, G. Overview and current status of remote sensing applications based on unmanned aerial vehicles (UAVs). Photogramm. Eng. Remote Sens. 2015, 81, 281–330. [Google Scholar] [CrossRef]
  21. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144. [Google Scholar] [CrossRef]
  22. Bojarski, M.; Del Testa, D.; Dworakowski, D.; Firner, B.; Flepp, B.; Goyal, P.; Jackel, L.D.; Monfort, M.; Muller, U.; Zhang, J.; et al. End to end learning for self-driving cars. arXiv 2016, arXiv:1604.07316. [Google Scholar]
  23. Barber, D. Bayesian Reasoning and Machine Learning; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  24. Kurtz, V.; Lin, H. Toward verifiable real-time obstacle motion prediction for dynamic collision avoidance. In Proceedings of the 2019 American Control Conference (ACC), Philadelphia, PA, USA, 10–12 July 2019; pp. 2633–2638. [Google Scholar]
  25. Pedro, D.; Matos-Carvalho, J.P.; Fonseca, J.M.; Mora, A. Collision avoidance on unmanned aerial vehicles using neural network pipelines and flow clustering techniques. Remote Sens. 2021, 13, 2643. [Google Scholar] [CrossRef]
  26. Chen, J.; Zhou, Y.; Lv, Q.; Deveerasetty, K.K.; Dike, H.U. A review of autonomous obstacle avoidance technology for multi-rotor UAVs. In Proceedings of the 2018 IEEE International Conference on Information and Automation (ICIA), Wuyishan, China, 11–13 August 2018; pp. 244–249. [Google Scholar]
  27. Singla, A.; Padakandla, S.; Bhatnagar, S. Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge. IEEE Trans. Intell. Transp. Syst. 2019, 22, 107–118. [Google Scholar] [CrossRef]
  28. Zhang, Z.; Xiong, M.; Xiong, H. Monocular depth estimation for UAV obstacle avoidance. In Proceedings of the 2019 4th International Conference on Cloud Computing and Internet of Things (CCIOT), Changchun, China, 6–7 December 2019; pp. 43–47. [Google Scholar]
  29. Wang, C.; Liang, X.; Zhang, S.; Shen, C. Motion parallax estimation for ultra low altitude obstacle avoidance. In Proceedings of the 2019 IEEE International Conference on Unmanned Systems (ICUS), Beijing, China, 17–19 October 2019; pp. 463–468. [Google Scholar]
  30. Yijing, Z.; Zheng, Z.; Xiaoyi, Z.; Yang, L. Q learning algorithm based UAV path learning and obstacle avoidence approach. In Proceedings of the 2017 36th Chinese Control Conference (CCC), Dalian, China, 26–28 July 2017; pp. 3397–3402. [Google Scholar]
  31. Kurdi, M.M.; Dadykin, A.; ElZein, I.; Ahmad, I.S. Proposed system of artificial Neural Network for positioning and navigation of UAV-UGV. In Proceedings of the 2018 Electric Electronics, Computer Science, Biomedical Engineerings’ Meeting (EBBT), Istanbul, Turkey, 18–19 April 2018; pp. 1–6. [Google Scholar]
  32. Liang, X.; Wang, H.; Li, D.; Liu, C. Three-dimensional path planning for unmanned aerial vehicles based on fluid flow. In Proceedings of the 2014 IEEE Aerospace Conference, Big Sky, MT, USA, 1–8 March 2014; pp. 1–13. [Google Scholar]
  33. Han, X.; Wang, J.; Xue, J.; Zhang, Q. Intelligent decision-making for 3-dimensional dynamic obstacle avoidance of UAV based on deep reinforcement learning. In Proceedings of the 2019 11th International Conference on Wireless Communications and Signal Processing (WCSP), Xi’an, China, 23–25 October 2019; pp. 1–6. [Google Scholar]
  34. Efe, M. Neural Network Assisted Computationally Simple PIλDμ Control of a Quadrotor UAV. IEEE Trans. Ind. Inform. 2011, 7, 354–361. [Google Scholar] [CrossRef]
  35. Gao, W.N.; Fan, J.L.; Li, Y.N. Research on neural network pid control algorithm for a quadrotor. Appl. Mech. Mater. 2015, 719–720, 346–351. [Google Scholar] [CrossRef]
  36. Kurnaz, S.; Cetin, O.; Kaynak, O. Adaptive neuro-fuzzy inference system based autonomous flight control of unmanned air vehicles. Expert Syst. Appl. 2010, 37, 1229–1234. [Google Scholar] [CrossRef]
  37. Chowdhary, G.V.; Johnson, E.N. Theory and flight-test validation of a concurrent-learning adaptive controller. J. Guid. Control Dyn. 2011, 34, 592–607. [Google Scholar] [CrossRef]
  38. Chowdhary, G.; Wu, T.; Cutler, M.; How, J.P. Rapid transfer of controllers between UAVs using learning-based adaptive control. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 5409–5416. [Google Scholar]
  39. Lin, C.-J.; Chin, C.-C. Prediction and identification using wavelet-based recurrent fuzzy neural networks. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2004, 34, 2144–2154. [Google Scholar] [CrossRef]
  40. Lin, C.-M.; Tai, C.-F.; Chung, C.-C. Intelligent control system design for UAV using a recurrent wavelet neural network. Neural Comput. Appl. 2014, 24, 487–496. [Google Scholar] [CrossRef]
  41. Punjani, A.; Abbeel, P. Deep learning helicopter dynamics models. In Proceedings of the 2015 IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA, 26–30 May 2015; pp. 3223–3230. [Google Scholar]
  42. Bansal, S.; Akametalu, A.K.; Jiang, F.J.; Laine, F.; Tomlin, C.J. Learning quadrotor dynamics using neural network for flight control. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 4653–4660. [Google Scholar]
  43. Shin, J.; Kim, H.J.; Kim, Y. Adaptive support vector regression for UAV flight control. Neural Netw. 2011, 24, 109–120. [Google Scholar] [CrossRef]
  44. Kan, E.M.; Lim, M.H.; Ong, Y.S.; Tan, A.H.; Yeo, S.P. Extreme learning machine terrain-based navigation for unmanned aerial vehicles. Neural Comput. Appl. 2013, 22, 469–477. [Google Scholar] [CrossRef]
  45. Loquercio, A.; Maqueda, A.I.; Del-Blanco, C.R.; Scaramuzza, D. Dronet: Learning to fly by driving. IEEE Robot. Autom. Lett. 2018, 3, 1088–1095. [Google Scholar] [CrossRef]
  46. Ross, S.; Melik-Barkhudarov, N.; Shankar, K.S.; Wendel, A.; Dey, D.; Bagnell, J.A.; Hebert, M. Learning monocular reactive UAV control in cluttered natural environments. In Proceedings of the 2013 IEEE International Conference on Robotics and Automation (ICRA), Karlsruhe, Germany, 6–10 May 2013; pp. 1765–1772. [Google Scholar]
  47. Yang, S.; Konam, S.; Ma, C.; Rosenthal, S.; Veloso, M.; Scherer, S. Obstacle avoidance through deep networks based intermediate perception. arXiv 2017, arXiv:1704.08759. [Google Scholar]
  48. Kahn, G.; Villaflor, A.; Pong, V.; Abbeel, P.; Levine, S. Uncertainty-aware reinforcement learning for collision avoidance. arXiv 2017, arXiv:1702.01182. [Google Scholar]
  49. Zhang, T.; Kahn, G.; Levine, S.; Abbeel, P. Learning deep control policies for autonomous aerial vehicles with MPC-guided policy search. In Proceedings of the IEEE International Conference on Robotics and Automation, Stockholm, Sweden, 16–21 May 2016; pp. 528–535. [Google Scholar]
  50. Cherian, A.; Andersh, J.; Morellas, V.; Papanikolopoulos, N.; Mettler, B. Autonomous altitude estimation of a UAV using a single onboard camera. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St. Louis, MO, USA, 10–15 October 2009; pp. 3900–3905. [Google Scholar]
  51. Scherer, S.; Rehder, J.; Achar, S.; Cover, H.; Chambers, A.; Nuske, S.; Singh, S. River mapping from a flying robot: State estimation, river detection, and obstacle mapping. Auton. Robot. 2012, 33, 189–214. [Google Scholar] [CrossRef]
  52. Guo, X.; Denman, S.; Fookes, C.; Mejias, L.; Sridharan, S. Automatic UAV forced landing site detection using machine learning. In Proceedings of the 2014 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Wollongong, Australia, 25–27 November 2014; pp. 1–7. [Google Scholar]
  53. Pérez-Ortiz, M.; Peña, J.M.; Gutiérrez, P.A.; Torres-Sánchez, J.; Hervás-Martínez, C.; López-Granados, F. Selecting patterns and features for between- and within- crop-row weed mapping using UAV-imagery. Expert Syst. Appl. 2016, 47, 85–94. [Google Scholar] [CrossRef]
  54. Rebetez, J.; Satizábal, H.F.; Mota, M.; Noll, D.; Büchi, L.; Wendling, M.; Cannelle, B.; Perez-Uribe, A.; Burgos, S. Augmenting a convolutional neural network with local histograms: A case study in crop classification from high-resolution UAV imagery. In Proceedings of the ESANN 2016, European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, 27–29 April 2016. [Google Scholar]
  55. Giusti, A.; Guzzi, J.; Ciresan, D.C.; He, F.-L.; Rodriguez, J.P.; Fontana, F.; Faessler, M.; Forster, C.; Schmidhuber, J.; Di Caro, G.; et al. A Machine Learning Approach to Visual Perception of Forest Trails for Mobile Robots. IEEE Robot. Autom. Lett. 2015, 1, 661–667. [Google Scholar] [CrossRef]
  56. Smolyanskiy, N.; Kamenev, A.; Smith, J.; Birchfield, S. Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 4241–4247. [Google Scholar]
  57. Choi, S.Y.; Cha, D. Unmanned aerial vehicles using machine learning for autonomous flight; state-of-the-art. Adv. Robot. 2019, 33, 265–277. [Google Scholar] [CrossRef]
  58. Barták, R.; Vomlelová, M. Using machine learning to identify activities of a flying drone from sensor readings. In Proceedings of the Thirtieth International Flairs Conference, Marco Island, FL, USA, 22–24 May 2017. [Google Scholar]
  59. Muñoz, G.; Barrado, C.; Çetin, E.; Salami, E. Deep reinforcement learning for drone delivery. Drones 2019, 3, 72. [Google Scholar] [CrossRef]
  60. Pham, H.X.; La, H.M.; Feil-Seifer, D.; Van Nguyen, L. Reinforcement learning for autonomous UAV navigation using function approximation. In Proceedings of the 2018 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR), Philadelphia, PA, USA, 6–8 August 2018; pp. 1–6. [Google Scholar]
  61. Zhong, W.; Wang, X.; Liu, X.; Lin, Z.; Ali, F. Joint optimization of UAV communication connectivity and obstacle avoidance in urban environments using a double-map approach. EURASIP J. Adv. Signal Process. 2024, 2024, 35. [Google Scholar] [CrossRef]
  62. Polvara, R.; Patacchiola, M.; Sharma, S.; Wan, J.; Manning, A.; Sutton, R.; Cangelosi, A. Toward end-to-end control for UAV autonomous landing via deep reinforcement learning. In Proceedings of the 2018 International Conference on Unmanned Aircraft Systems (ICUAS), Dallas, TX, USA, 12–15 June 2018; pp. 115–123. [Google Scholar]
  63. Tožička, J.; Szulyovszky, B.; de Chambrier, G.; Sarwal, V.; Wani, U.; Gribulis, M. Application of deep reinforcement learning to UAV fleet control. In Intelligent Systems and Applications: Proceedings of the 2018 Intelligent Systems Conference (IntelliSys); Springer International Publishing: Cham, Switzerland, 2019; Volume 2, pp. 1169–1177. [Google Scholar]
  64. Liu, C.H.; Chen, Z.; Tang, J.; Xu, J.; Piao, C. Energy-efficient UAV control for effective and fair communication coverage: A deep reinforcement learning approach. IEEE J. Sel. Areas Commun. 2018, 36, 2059–2070. [Google Scholar] [CrossRef]
  65. Peciak, M.; Skarka, W. Assessment of the Potential of Electric Propulsion for General Aviation Using Model-Based System Engineering (MBSE) Methodology. Aerospace 2022, 9, 74. [Google Scholar] [CrossRef]
  66. Mateja, K.; Skarka, W.; Peciak, M.; Niestrój, R.; Gude, M. Energy Autonomy Simulation Model of Solar Powered UAV. Energies 2023, 16, 479. [Google Scholar] [CrossRef]
  67. Peciak, M.; Skarka, W.; Mateja, K.; Gude, M. Impact Analysis of Solar Cells on Vertical Take-Off and Landing (VTOL) Fixed-Wing UAV. Aerospace 2023, 10, 247. [Google Scholar] [CrossRef]
  68. Tyczka, M.; Skarka, W. Optimisation of Operational Parameters Based on Simulation Numerical Model of Hydrogen Fuel Cell Stack Used for Electric Car Drive. In Transdisciplinary Engineering: Crossing Boundaries; IOS Press: Amsterdam, The Netherlands, 2016; pp. 622–631. [Google Scholar]
  69. Pham, H.X.; La, H.M.; Feil-Seifer, D.; Nguyen, L.V. Autonomous UAV navigation using reinforcement learning. arXiv 2018, arXiv:1801.05086. [Google Scholar]
  70. Li, B.; Wu, Y. Path planning for UAV ground target tracking via deep reinforcement learning. IEEE Access 2020, 8, 29064–29074. [Google Scholar] [CrossRef]
  71. Koch, W.; Mancuso, R.; West, R.; Bestavros, A. Reinforcement learning for UAV attitude control. ACM Trans. Cyber-Physical Syst. 2019, 3, 1–21. [Google Scholar] [CrossRef]
  72. Nadimi-Shahraki, M.H.; Taghian, S.; Mirjalili, S.; Zamani, H.; Bahreininejad, A. GGWO: Gaze cues learning-based grey wolf optimizer and its applications for solving engineering problems. J. Comput. Sci. 2022, 61, 101636. [Google Scholar] [CrossRef]
  73. Qu, C.; Gai, W.; Zhong, M.; Zhang, J. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning. Appl. Soft Comput. 2020, 89, 106099. [Google Scholar] [CrossRef]
  74. Jiang, S.; Jiang, C.; Jiang, W. Efficient structure from motion for large-scale UAV images: A review and a comparison of SfM tools. ISPRS J. Photogramm. Remote Sens. 2020, 167, 230–251. [Google Scholar] [CrossRef]
  75. He, L.; Aouf, N.; Whidborne, J.F.; Song, B. Deep reinforcement learning based local planner for UAV obstacle avoidance using demonstration data. arXiv 2020, arXiv:2008.02521. [Google Scholar]
  76. Bayerlein, H.; Theile, M.; Caccamo, M.; Gesbert, D. UAV path planning for wireless data harvesting: A deep reinforcement learning approach. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–6. [Google Scholar]
  77. Hasheminasab, S.M.; Zhou, T.; Habib, A. GNSS/INS-assisted structure from motion strategies for UAV-based imagery over mechanized agricultural fields. Remote Sens. 2020, 12, 351. [Google Scholar] [CrossRef]
  78. Bouhamed, O.; Ghazzai, H.; Besbes, H.; Massoud, Y. Autonomous UAV navigation: A DDPG-based deep reinforcement learning approach. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–5. [Google Scholar]
  79. Kim, I.; Shin, S.; Wu, J.; Kim, S.D.; Kim, C.G. Obstacle avoidance path planning for UAV using reinforcement learning under simulated environment. In Proceedings of the IASER 3rd International Conference on Electronics, Electrical Engineering, Computer Science, Okinawa, Japan, May 2017; pp. 34–36. [Google Scholar]
  80. Challita, U.; Saad, W.; Bettstetter, C. Interference management for cellular-connected UAVs: A deep reinforcement learning approach. IEEE Trans. Wirel. Commun. 2019, 18, 2125–2140. [Google Scholar] [CrossRef]
  81. Yan, C.; Xiang, X.; Wang, C. Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments. J. Intell. Robot. Syst. 2020, 98, 297–309. [Google Scholar] [CrossRef]
  82. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  83. Wang, Y.-M.; Peng, D.-L. A simulation platform of multi-sensor multi-target track system based on STAGE. In Proceedings of the 2010 8th World Congress on Intelligent Control and Automation (WCICA 2010), Jinan, China, 7–9 July 2010; pp. 6975–6978. [Google Scholar]
  84. Shin, S.-Y.; Kang, Y.-W.; Kim, Y.-G. Obstacle avoidance drone by deep reinforcement learning and its racing with human pilot. Appl. Sci. 2019, 9, 5571. [Google Scholar] [CrossRef]
  85. Tai, L.; Liu, M. A robot exploration strategy based on q-learning network. In Proceedings of the 2016 IEEE International Conference on Real-Time Computing and Robotics (RCAR), Angkor Wat, Cambodia, 6–10 June 2016; pp. 57–62. [Google Scholar]
  86. Bamburry, D. Drones: Designed for product delivery. Des. Manag. Rev. 2015, 26, 40–48. [Google Scholar] [CrossRef]
  87. Hii, M.S.Y.; Courtney, P.; Royall, P.G. An evaluation of the delivery of medicines using drones. Drones 2019, 3, 52. [Google Scholar] [CrossRef]
  88. Altawy, R.; Youssef, A.M. Security, privacy, and safety aspects of civilian drones: A survey. ACM Trans. Cyber-Phys. Syst. 2016, 1, 1–25. [Google Scholar] [CrossRef]
  89. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  90. Liu, Y.; Nejat, G. Robotic urban search and rescue: A survey from the control perspective. J. Intell. Robot. Syst. 2013, 72, 147–165. [Google Scholar] [CrossRef]
  91. Tomic, T.; Schmid, K.; Lutz, P.; Domel, A.; Kassecker, M.; Mair, E.; Grixa, I.L.; Ruess, F.; Suppa, M.; Burschka, D. Toward a Fully Autonomous UAV: Research Platform for Indoor and Outdoor Urban Search and Rescue. IEEE Robot. Autom. Mag. 2012, 19, 46–56. [Google Scholar] [CrossRef]
  92. Jalal, L.D. Three-dimensional off-line path planning for unmanned aerial vehicle using modified particle swarm optimization. Int. J. Aerosp. Mech. Eng. 2015, 9, 1579–1583. [Google Scholar]
  93. Hoang, V.T.; Phung, M.D.; Dinh, T.H.; Ha, Q.P. System architecture for real-time surface inspection using multiple UAVs. IEEE Syst. J. 2019, 14, 2925–2936. [Google Scholar] [CrossRef]
  94. Huang, C.; Fei, J. UAV path planning based on particle swarm optimization with global best path competition. Int. J. Pattern Recognit. Artif. Intell. 2018, 32, 1859008. [Google Scholar] [CrossRef]
  95. Cekmez, U.; Ozsiginan, M.; Sahingoz, O.K. Multi colony ant optimization for UAV path planning with obstacle avoidance. In Proceedings of the 2016 International Conference on Unmanned Aircraft Systems (ICUAS), Arlington, VA, USA, 7–10 June 2016; pp. 47–52. [Google Scholar]
  96. Guan, Y.; Gao, M.; Bai, Y. Double-ant colony based UAV path planning algorithm. In Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; pp. 258–262. [Google Scholar]
  97. Jin, Z.; Yan, B.; Ye, R. The flight navigation planning based on potential field ant colony algorithm. In Proceedings of the 2018 International Conference on Advanced Control, Automation and Artificial Intelligence (ACAAI 2018), Shenzhen, China, 21–22 January 2018; Atlantis Press: Dordrecht, The Netherlands, 2018. [Google Scholar]
  98. Bagherian, M.; Alos, A. 3D UAV trajectory planning using evolutionary algorithms: A comparison study. Aeronaut. J. 2015, 119, 1271–1285. [Google Scholar] [CrossRef]
  99. Tao, J.; Zhong, C.; Gao, L.; Deng, H. A study on path planning of unmanned aerial vehicle based on improved genetic algorithm. In Proceedings of the 2016 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China, 27–28 August 2016; Volume 2, pp. 392–395. [Google Scholar]
  100. Yang, Q.; Liu, J.; Li, L. Path planning of UAVs under dynamic environment based on a hierarchical recursive multiagent genetic algorithm. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  101. Gao, M.; Liu, Y.; Wei, P. Opposite and chaos searching genetic algorithm based for UAV path planning. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 2364–2369. [Google Scholar]
  102. Behnck, L.P.; Doering, D.; Pereira, C.E.; Rettberg, A. A modified simulated annealing algorithm for SUAVs path planning. IFAC-PapersOnLine 2015, 48, 63–68. [Google Scholar] [CrossRef]
  103. Liu, K.; Zhang, M. Path planning based on simulated annealing ant colony algorithm. In Proceedings of the 2016 9th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 10–11 December 2016; Volume 2, pp. 461–466. [Google Scholar]
  104. Xiao, S.; Tan, X.; Wang, J. A simulated annealing algorithm and grid map-based UAV coverage path planning method for 3D reconstruction. Electronics 2021, 10, 853. [Google Scholar] [CrossRef]
  105. Ghambari, S.; Idoumghar, L.; Jourdan, L.; Lepagnot, J. A hybrid evolutionary algorithm for offline UAV path planning. In Artificial Evolution: 14th International Conference, Évolution Artificielle, EA 2019, Mulhouse, France, 29–30 October 2019, Revised Selected Papers 14; Springer International Publishing: Cham, Switzerland, 2020; pp. 205–218. [Google Scholar]
  106. Yu, X.; Li, C.; Zhou, J.F. A constrained differential evolution algorithm to solve UAV path planning in disaster scenarios. Knowl.-Based Syst. 2020, 204, 106209. [Google Scholar] [CrossRef]
  107. Yu, X.; Li, C.; Yen, G.G. A knee-guided differential evolution algorithm for unmanned aerial vehicle path planning in disaster management. Appl. Soft Comput. 2021, 98, 106857. [Google Scholar] [CrossRef]
  108. Zhang, D.; Duan, H. Social-class pigeon-inspired optimization and time stamp segmentation for multi-UAV cooperative path planning. Neurocomputing 2018, 313, 229–246. [Google Scholar] [CrossRef]
  109. Hu, C.; Xia, Y.; Zhang, J. Adaptive operator quantum-behaved pigeon-inspired optimization algorithm with application to UAV path planning. Algorithms 2018, 12, 3. [Google Scholar] [CrossRef]
  110. Xie, C.; Zheng, H. Application of improved Cuckoo search algorithm to path planning unmanned aerial vehicle. In Intelligent Computing Theories and Application: 12th International Conference, ICIC 2016, Lanzhou, China, 2–5 August 2016, Proceedings, Part I 12; Springer International Publishing: Cham, Switzerland, 2016; pp. 722–729. [Google Scholar]
  111. Hu, H.; Wu, Y.; Xu, J.; Sun, Q. Cuckoo search-based method for trajectory planning of quadrotor in an urban environment. Proc. Inst. Mech. Eng. Part G J. Aerosp. Eng. 2019, 233, 4571–4582. [Google Scholar] [CrossRef]
  112. Sundar, K.; Misra, S.; Rathinam, S.; Sharma, R. Routing unmanned vehicles in GPS-denied environments. In Proceedings of the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), Miami, FL, USA, 13–16 June 2017; pp. 62–71. [Google Scholar]
  113. Ghambari, S.; Lepagnot, J.; Jourdan, L.; Idoumghar, L. UAV path planning in the presence of static and dynamic obstacles. In Proceedings of the 2020 IEEE Symposium Series on Computational Intelligence (SSCI), Canberra, ACT, Australia, 1–4 December 2020; pp. 465–472. [Google Scholar]
  114. Zhang, Z.; Wu, J.; Dai, J.; He, C. A novel real-time penetration path planning algorithm for stealth UAV in 3D complex dynamic environment. IEEE Access 2020, 8, 122757–122771. [Google Scholar] [CrossRef]
  115. Qu, C.; Gai, W.; Zhang, J.; Zhong, M. A novel hybrid grey wolf optimizer algorithm for unmanned aerial vehicle (UAV) path planning. Knowl.-Based Syst. 2020, 194, 105530. [Google Scholar] [CrossRef]
  116. Zhang, S.; Zhou, Y.; Li, Z.; Pan, W. Grey wolf optimizer for unmanned combat aerial vehicle path planning. Adv. Eng. Softw. 2016, 99, 121–136. [Google Scholar] [CrossRef]
  117. Dewangan, R.K.; Shukla, A.; Godfrey, W.W. Three dimensional path planning using Grey wolf optimizer for UAVs. Appl. Intell. 2019, 49, 2201–2217. [Google Scholar] [CrossRef]
  118. Ponsen, M.; Taylor, M.E.; Tuyls, K. Abstraction and generalization in reinforcement learning: A summary and framework. In International Workshop on Adaptive and Learning Agents; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–32. [Google Scholar]
  119. Colonnese, S.; Cuomo, F.; Pagliari, G.; Chiaraviglio, L. Q-SQUARE: A Q-learning approach to provide a QoE aware UAV flight path in cellular networks. Ad. Hoc. Netw. 2019, 91, 101872. [Google Scholar] [CrossRef]
  120. Chowdhury, M.M.U.; Erden, F.; Guvenc, I. RSS-based Q-learning for indoor UAV navigation. In Proceedings of the MILCOM 2019—2019 IEEE Military Communications Conference (MILCOM), Norfolk, VA, USA, 12–14 November 2019; pp. 121–126. [Google Scholar]
  121. Zeng, Y.; Xu, X. Path design for cellular-connected UAV with reinforcement learning. In Proceedings of the GLOBECOM 2019—2019 IEEE Global Communications Conference, Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6. [Google Scholar]
  122. Liu, X.; Liu, Y.; Chen, Y.; Hanzo, L. Trajectory design and power control for multi-UAV assisted wireless networks: A machine learning approach. IEEE Trans. Veh. Technol. 2019, 68, 7957–7969. [Google Scholar] [CrossRef]
  123. Hu, J.; Zhang, H.; Song, L. Reinforcement learning for decentralized trajectory design in cellular UAV networks with sense-and-send protocol. IEEE Internet Things J. 2018, 6, 6177–6189. [Google Scholar] [CrossRef]
  124. Liu, X.; Chen, M.; Yin, C. Optimized trajectory design in UAV based cellular networks for 3D users: A double Q-learning approach. J. Commun. Inf. Netw. 2019, 4, 24–32. [Google Scholar] [CrossRef]
  125. Tu, G.-T.; Juang, J.-G. UAV path planning and obstacle avoidance based on reinforcement learning in 3D environments. Actuators 2023, 12, 57. [Google Scholar] [CrossRef]
  126. Kalidas, A.P.; Joshua, C.J.; Quadir, A.; Basheer, S.; Mohan, S.; Sakri, S. Deep Reinforcement Learning for Vision-Based Navigation of UAVs in Avoiding Stationary and Mobile Obstacles. Drones 2023, 7, 245. [Google Scholar] [CrossRef]
  127. Azzam, R.; Chehadeh, M.; Hay, O.A.; Humais, M.A.; Boiko, I.; Zweiri, Y. Learning-based navigation and collision avoidance through reinforcement for UAVs. IEEE Trans. Aerosp. Electron. Syst. 2023, 60, 2614–2628. [Google Scholar] [CrossRef]
  128. Zeng, Y.; Xu, X.; Jin, S.; Zhang, R. Simultaneous navigation and radio mapping for cellular-connected UAV with deep reinforcement learning. IEEE Trans. Wirel. Commun. 2021, 20, 4205–4220. [Google Scholar] [CrossRef]
  129. Huang, H.; Yang, Y.; Wang, H.; Ding, Z.; Sari, H.; Adachi, F. Deep reinforcement learning for UAV navigation through massive MIMO technique. IEEE Trans. Veh. Technol. 2019, 69, 1117–1121. [Google Scholar] [CrossRef]
  130. Oubbati, O.S.; Atiquzzaman, M.; Baz, A.; Alhakami, H.; Ben-Othman, J. Dispatch of UAVs for urban vehicular networks: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2021, 70, 13174–13189. [Google Scholar] [CrossRef]
  131. Oubbati, O.S.; Atiquzzaman, M.; Lakas, A.; Baz, A.; Alhakami, H.; Alhakami, W. Multi-UAV-enabled AoI-aware WPCN: A multi-agent reinforcement learning strategy. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; pp. 1–6. [Google Scholar]
  132. Wang, C.; Wang, J.; Wang, J.; Zhang, X. Deep-reinforcement-learning-based autonomous UAV navigation with sparse rewards. IEEE Internet Things J. 2020, 7, 6180–6190. [Google Scholar] [CrossRef]
  133. Theile, M.; Bayerlein, H.; Nai, R.; Gesbert, D.; Caccamo, M. UAV coverage path planning under varying power constraints using deep reinforcement learning. In Proceedings of the 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, NV, USA, 24 October 2020–24 January 2021; pp. 1444–1449. [Google Scholar]
  134. Chen, Y.; González-Prelcic, N.; Heath, R.W. Collision-free UAV navigation with a monocular camera using deep reinforcement learning. In Proceedings of the 2020 IEEE 30th international workshop on machine learning for signal processing (MLSP), Espoo, Finland, 21–24 September 2020; pp. 1–6. [Google Scholar]
  135. Abedin, S.F.; Munir, S.; Tran, N.H.; Han, Z.; Hong, C.S. Data freshness and energy-efficient UAV navigation optimization: A deep reinforcement learning approach. IEEE Trans. Intell. Transp. Syst. 2020, 22, 5994–6006. [Google Scholar] [CrossRef]
  136. Walker, O.; Vanegas, F.; Gonzalez, F.; Koenig, S. A deep reinforcement learning framework for UAV navigation in indoor environments. In Proceedings of the 2019 IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2019; pp. 1–14. [Google Scholar]
  137. Maciel-Pearson, B.G.; Marchegiani, L.; Akcay, S.; Atapour-Abarghouei, A.; Garforth, J.; Breckon, T.P. Online deep reinforcement learning for autonomous UAV navigation and exploration of outdoor environments. arXiv 2019, arXiv:1912.05684. [Google Scholar]
  138. Theile, M.; Bayerlein, H.; Nai, R.; Gesbert, D.; Caccamo, M. UAV path planning using global and local map information with deep reinforcement learning. In Proceedings of the 2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia, 6–10 December 2021; pp. 539–546. [Google Scholar]
  139. Wang, L.; Wang, K.; Pan, C.; Xu, W.; Aslam, N.; Nallanathan, A. Deep reinforcement learning based dynamic trajectory control for UAV-assisted mobile edge computing. IEEE Trans. Mob. Comput. 2021, 21, 3536–3550. [Google Scholar] [CrossRef]
  140. Wang, L.; Wang, K.; Pan, C.; Xu, W.; Aslam, N.; Hanzo, L. Multi-agent deep reinforcement learning-based trajectory planning for multi-UAV assisted mobile edge computing. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 73–84. [Google Scholar] [CrossRef]
  141. Wang, C.; Wang, J.; Shen, Y.; Zhang, X. Autonomous navigation of UAVs in large-scale complex environments: A deep reinforcement learning approach. IEEE Trans. Veh. Technol. 2019, 68, 2124–2136. [Google Scholar] [CrossRef]
  142. Liu, C.H.; Ma, X.; Gao, X.; Tang, J. Distributed energy-efficient multi-UAV navigation for long-term communication coverage by deep reinforcement learning. IEEE Trans. Mob. Comput. 2019, 19, 1274–1285. [Google Scholar] [CrossRef]
  143. Gao, Y.; Ren, L.; Shi, T.; Xu, T.; Ding, J. Autonomous Obstacle Avoidance Algorithm for Unmanned Aerial Vehicles Based on Deep Reinforcement Learning. Eng. Lett. 2024, 32, 650–660. [Google Scholar]
  144. Wang, F.; Zhu, X.; Zhou, Z.; Tang, Y. Deep-reinforcement-learning-based UAV autonomous navigation and collision avoidance in unknown environments. Chin. J. Aeronaut. 2024, 37, 237–257. [Google Scholar] [CrossRef]
  145. Menfoukh, K.; Touba, M.M.; Khenfri, F.; Guettal, L. Optimized Convolutional Neural Network architecture for UAV navigation within unstructured trail. In Proceedings of the 2020 1st International Conference on Communications, Control Systems and Signal Processing (CCSSP), El Oued, Algeria, 16–17 May 2020; pp. 211–214. [Google Scholar]
  146. Back, S.; Cho, G.; Oh, J.; Tran, X.-T.; Oh, H. Autonomous UAV trail navigation with obstacle avoidance using deep neural networks. J. Intell. Robot. Syst. 2020, 100, 1195–1211. [Google Scholar] [CrossRef]
  147. Maciel-Pearson, B.G.; Carbonneau, P.; Breckon, T.P. Extending deep neural network trail navigation for unmanned aerial vehicle operation within the forest canopy. In Towards Autonomous Robotic Systems: 19th Annual Conference, TAROS 2018, Bristol, UK, 25–27 July 2018, Proceedings 19; Springer International Publishing: Cham, Switzerland, 2018; pp. 147–158. [Google Scholar]
  148. Chhikara, P.; Tekchandani, R.; Kumar, N.; Chamola, V.; Guizani, M. DCNN-GA: A Deep Neural Net Architecture for Navigation of UAV in Indoor Environment. IEEE Internet Things J. 2020, 8, 4448–4460. [Google Scholar] [CrossRef]
  149. Niu, Y.; Yan, X.; Wang, Y.; Niu, Y. 3D real-time dynamic path planning for UAV based on improved interfered fluid dynamical system and artificial neural network. Adv. Eng. Inform. 2024, 59, 102306. [Google Scholar] [CrossRef]
  150. Bohn, E.; Coates, E.M.; Moe, S.; Johansen, T.A. Deep reinforcement learning attitude control of fixed-wing UAVs using proximal policy optimization. In Proceedings of the 2019 International Conference on Unmanned Aircraft Systems (ICUAS), Atlanta, GA, USA, 11–14 June 2019; pp. 523–533. [Google Scholar]
  151. Xu, J.; Du, T.; Foshey, M.; Li, B.; Zhu, B.; Schulz, A.; Matusik, W. Learning to fly: Computational controller design for hybrid UAVs with reinforcement learning. ACM Trans. Graph. (TOG) 2019, 38, 1–12. [Google Scholar] [CrossRef]
  152. Xu, D.; Hui, Z.; Liu, Y.; Chen, G. Morphing control of a new bionic morphing UAV with deep reinforcement learning. Aerosp. Sci. Technol. 2019, 92, 232–243. [Google Scholar] [CrossRef]
  153. Wan, K.; Gao, X.; Hu, Z.; Wu, G. Robust motion control for UAV in dynamic uncertain environments using deep reinforcement learning. Remote Sens. 2020, 12, 640. [Google Scholar] [CrossRef]
  154. Passalis, N.; Tefas, A. Continuous drone control using deep reinforcement learning for frontal view person shooting. Neural Comput. Appl. 2020, 32, 4227–4238. [Google Scholar] [CrossRef]
  155. Yang, J.; You, X.; Wu, G.; Hassan, M.M.; Almogren, A.; Guna, J. Application of reinforcement learning in UAV cluster task scheduling. Future Gener. Comput. Syst. 2019, 95, 140–148. [Google Scholar] [CrossRef]
  156. Mandloi, Y.S.; Inada, Y. Machine learning approach for drone perception and control. In Engineering Applications of Neural Networks: 20th International Conference, EANN 2019, Xersonisos, Crete, Greece, 24–26 May 2019, Proceedings 20; Springer International Publishing: Cham, Switzerland, 2019; pp. 424–431. [Google Scholar]
  157. Lee, K.; Gibson, J.; Theodorou, E.A. Aggressive perception-aware navigation using deep optical flow dynamics and pixelmpc. IEEE Robot. Autom. Lett. 2020, 5, 1207–1214. [Google Scholar] [CrossRef]
  158. Yang, X.; Chen, J.; Dang, Y.; Luo, H.; Tang, Y.; Liao, C.; Chen, P.; Cheng, K.-T. Fast depth prediction and obstacle avoidance on a monocular drone using probabilistic convolutional neural network. IEEE Trans. Intell. Transp. Syst. 2019, 22, 156–167. [Google Scholar] [CrossRef]
  159. Xu, Z.; Zhan, X.; Chen, B.; Xiu, Y.; Yang, C.; Shimada, K. A real-time dynamic obstacle tracking and mapping system for UAV navigation and collision avoidance with an RGB-D camera. In Proceedings of the 2023 IEEE International Conference on Robotics and Automation (ICRA), London, UK, 29 May–2 June 2023; pp. 10645–10651. [Google Scholar]
  160. Wakabayashi, T.; Suzuki, Y.; Suzuki, S. Dynamic obstacle avoidance for Multi-rotor UAV using chance-constraints based on obstacle velocity. Robot. Auton. Syst. 2023, 160, 104320. [Google Scholar] [CrossRef]
  161. Müller, H.; Niculescu, V.; Polonelli, T.; Magno, M.; Benini, L. Robust and efficient depth-based obstacle avoidance for autonomous miniaturized UAVs. IEEE Trans. Robot. 2023, 39, 4935–4951. [Google Scholar] [CrossRef]
  162. Yasin, J.N.; Mohamed, S.A.S.; Haghbayan, M.-H.; Heikkonen, J.; Tenhunen, H.; Plosila, J. Unmanned aerial vehicles (uavs): Collision avoidance systems and approaches. IEEE Access 2020, 8, 105139–105155. [Google Scholar] [CrossRef]
  163. Gandhi, D.; Pinto, L.; Gupta, A. Learning to fly by crashing. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 3948–3955. [Google Scholar]
  164. Santos, M.C.P.; Rosales, C.D.; Sarcinelli-Filho, M.; Carelli, R. A novel null-space-based UAV trajectory tracking controller with collision avoidance. IEEE/ASME Trans. Mechatron. 2017, 22, 2543–2553. [Google Scholar] [CrossRef]
  165. Al-Emadi, S.; Al-Senaid, F. Drone detection approach based on radio-frequency using convolutional neural network. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha, Qatar, 2–5 February 2020; pp. 29–34. [Google Scholar]
  166. Aldao, E.; González-Desantos, L.M.; Michinel, H.; González-Jorge, H. Uav obstacle avoidance algorithm to navigate in dynamic building environments. Drones 2022, 6, 16. [Google Scholar] [CrossRef]
  167. Ming, Z.; Huang, H. A 3d vision cone based method for collision free navigation of a quadcopter UAV among moving obstacles. Drones 2021, 5, 134. [Google Scholar] [CrossRef]
  168. Castillo-Lopez, M.; Sajadi-Alamdari, S.A.; Sanchez-Lopez, J.L.; Olivares-Mendez, M.A.; Voos, H. Model predictive control for aerial collision avoidance in dynamic environments. In Proceedings of the 2018 26th Mediterranean Conference on Control and Automation (MED), Zadar, Croatia, 19–22 June 2018; pp. 1–6. [Google Scholar]
  169. Kouris, A.; Bouganis, C.-S. Learning to fly by myself: A self-supervised cnn-based approach for autonomous navigation. In Proceedings of the 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Madrid, Spain, 1–5 October 2018; pp. 1–9. [Google Scholar]
  170. Zhao, J.; Wang, H.; Bellotto, N.; Hu, C.; Peng, J.; Yue, S. Enhancing LGMD’s looming selectivity for UAV with spatial–temporal distributed presynaptic connections. IEEE Trans. Neural Netw. Learn. Syst. 2021, 34, 2539–2553. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Number of Publications related to UAV obstacle avoidance.
Figure 2. Search methodology.
Figure 3. Keywords used in literature search for ML and RL in UAV research.
Figure 4. Machine learning for UAV obstacle avoidance.
Figure 5. Machine learning techniques.
Figure 6. Deep reinforcement taxonomy algorithms for learning in unmanned aerial vehicle applications [58,59,60,61,62,63,64].
Figure 7. Dynamic navigation of a UAV in an urban environment demonstrating machine learning-aided reinforcement learning for UAV obstacle avoidance.
Figure 8. A classification system for artificial intelligence techniques used in UAV navigation.
Figure 9. Classification of reinforcement learning techniques: dynamic programming (DP), temporal difference (TD), Monte Carlo (MC), imagination-augmented agent (I2A), deep Q-network (DQN), trust region policy optimization (TRPO), actor-critic using Kronecker-factored trust region (ACKTR), actor-critic (AC), advantage actor critic (A2C), asynchronous advantage actor critic (A3C), deep deterministic policy gradient (DDPG), twin delayed DDPG (TD3), and soft actor-critic (SAC).
Figure 10. RL Algorithm success in varied environmental conditions [126,156,157,158].
Figure 11. Success rate variation of RL algorithms with environmental complexity [119,159,160,161,162,163].
Figure 12. Latency and accuracy of RL algorithms [158,159,162,163,164].
Table 1. Key Supervised Learning Techniques and their Applications in UAV Obstacle Avoidance.
Algorithms | Type | Description | References
LSTM RNN | SL | Improved prediction accuracy based on recent observations | [24]
Neural Network Pipeline (NNP) | SL | High accuracy (2%) and fast processing speed. | [25]
Neural Network | SL | Good real-time performance | [26]
Neural Network | SL | Created a depth map using the CGAN network to prevent more collisions. | [27]
Neural Network | SL | CNN network was used to forecast the depth map. | [28]
Neural Network | SL | Suggested a method that combines the epipolar geometry with the depth neural network. | [29]
Neural Network | SL | Utilized CNN to calculate the image's optical flow. | [30]
Neural Network | SL | Suggested an ANN-based route planning method. | [31]
Neural Network | SL | Used the G-FCNN algorithm to assess the route. | [32]
Neural Network | SL | Utilizing the Q-learning method and neural network, the 3D collision avoidance problem was resolved. | [33]
Parameter FNN | SL | Attitude control | [34]
RBF NN | SL | Attitude control | [35]
Adaptive FNN | SL | Control of aerobatic maneuvers | [36]
CL | SL | Tracking trajectory | [37]
CL, BKR-CL | SL | Change of controller | [38]
RWNN | SL | Trajectory tracking | [39,40]
ReLU Network | SL | Aerobatic maneuver | [41]
NN | SL | Generalization of dynamics | [42]
SVR | SL | Attitude control | [43]
ELM | SL | Single UAV navigation | [44]
CNN | SL | Getting around a city | [45]
Imitation learning | SL | MAV flying over a woodland | [46]
CNN | SL | Trajectory prediction | [47]
NN, RL | SL | Estimating collisions | [48]
DNN, RL | SL | State estimation | [49]
Regression | SL | Estimation of altitude | [50]
SVM | SL | Mapping a river | [51]
GMM, SVM | SL | Detecting landing sites | [52]
OBIA | SL | Precision agriculture | [53]
Hybrid deep NN | SL | Classification of crops | [54]
DNN | SL | Perception of mountain trails | [55,56]
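Several of the supervised approaches in Table 1 share the same basic recipe: a convolutional network maps a monocular camera frame to a control-relevant output (a steering command, a depth map, or a collision probability) and is trained on labelled flight or driving data. The following is a minimal, hypothetical PyTorch sketch of that recipe; the layer sizes, the two-head (steering/collision) layout, and the loss weighting are illustrative assumptions and do not reproduce the architecture of any specific work cited above.

```python
# Minimal two-head CNN in the spirit of the SL approaches in Table 1
# (steering regression + collision classification from a monocular frame).
# Layer sizes and loss weighting are illustrative assumptions only.
import torch
import torch.nn as nn

class ObstacleAvoidanceCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                 # -> (B, 64, 1, 1)
        )
        self.steering = nn.Linear(64, 1)    # regression head: steering command
        self.collision = nn.Linear(64, 1)   # classification head: collision logit

    def forward(self, img):
        z = self.features(img).flatten(1)
        return self.steering(z), self.collision(z)

model = ObstacleAvoidanceCNN()
frames = torch.rand(8, 3, 120, 160)          # dummy batch of camera frames
steer_label = torch.zeros(8, 1)              # dummy regression labels
coll_label = torch.zeros(8, 1)               # dummy binary labels

steer_pred, coll_logit = model(frames)
loss = nn.MSELoss()(steer_pred, steer_label) \
       + 0.1 * nn.BCEWithLogitsLoss()(coll_logit, coll_label)
loss.backward()                              # one supervised step (optimizer omitted)
```

In practice, the steering head would be supervised with recorded pilot or driving commands and the collision head with binary safety labels, in the spirit of the imitation-learning and collision-estimation entries above.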
Table 2. Optimization-based AI approaches.
Algorithms, References | Type | Characteristics | Goals
PSO [92] | MPSO | PSO-generated infeasible paths are transformed into feasible ones through the integration of an error factor. | For the best collision-free 3D UAV route planning, use MPSO.
PSO [93] | DPSO | Discrete steps facilitate propagation, while multiple augmentations are utilized to improve convergence. | IoT and image processing in UAV control
PSO [94] | GBPSO | It assesses the current global path against alternative candidates to identify the most optimal choice. | GBPSO for enhanced 3D UAV routing
ACO [95] | Multi-ACO | It addresses the TSP problem, taking into account both intra-colony and inter-colony pheromone values. | ACO with many colonies for effective UAV path planning
ACO [96] | Double-ACO | It employs GA for generating the initial population. | Double-ant colony speeds UAV planning.
ACO [97] | PFACO | Achieves rapid convergence by utilizing MMAS and APF for enhanced global search capabilities. | PFACO for improved UAV path planning.
GA [98] | GA | The UAVs' heading angle rate, acceleration, and rising angle rate make up their chromosomes. | Best possible 3D UAV route planning with GIS information.
GA [99] | Improved-GA | It comprises an encoding vector derived from the sequence of UAV yaw angles. | Using evolutionary algorithms to enhance UAV route planning.
GA [100] | HR-MAGA | It employs a hierarchical recursive procedure to establish a more detailed path. | Recursive hierarchical evolutionary algorithms for UAV route planning optimization.
GA [101] | OCGA | Uses OC and TLBO searches to achieve quick convergence. | Optimizing the OCGA algorithm for UAV route planning.
SA [102] | Modified-SA | Selection of the POIs is stochastic. | Boost the effectiveness of UAV route planning.
SA [103] | SA-ACO | Hybrid algorithm | Enhance efficiency of underground path planning.
SA [104] | Grid-based SA | SA based on grid maps for 2D UAV navigation | Improve the coverage route planning of UAVs.
DE [105] | ADE | Selective mutation in hybrid algorithm. | In urban settings, maximize the effectiveness of UAV path planning.
DE [106] | CDE | Adaptive mutation limited differential evolution algorithm for catastrophe situational awareness in UAV path planning optimization. | Enhancing UAV route planning for disaster awareness using limited differential evolution.
DE [107] | DEAKP | Prioritize knee solutions over the Pareto front within a constrained multi-objective optimization context. | By optimizing distance, risk metrics, and operational limitations, a knee point-guided differential evolution method may be used to increase the efficiency of UAV path planning in catastrophe scenarios.
PIO [108] | SCPIO | The use of randomization, ergodicity of motion trajectories, strong sensitivity to beginning values, and hierarchical social class. | Creating a multi-UAV route planning model using SCPIO and TSS to improve efficiency and cooperation in challenging conditions.
PIO [109] | AOQPIO | The behavior of the system is highly sensitive to initial conditions, exhibits ergodic motion trajectories, and incorporates elements of randomization. | Adaptive QPIO improves UAV route planning in hazardous situations, outperforming PSO and its variations.
CS [110,111] | Improved-CS | Implement crossover and mutation operators within genetic algorithms and employ Chebyshev collocation points for representing coordinates. | Adaptive QPIO improves UAV route planning for better performance in difficult settings.
A* [112] | Dijkstra | Uses Hamiltonian and Eulerian path model and Dijkstra | Putting forward a unique GPS-free route planning and unmanned vehicle localization technique that makes use of cooperative algorithms.
A* [113] | DEA* | Hybrid DE and A* for urban UAV path planning. | Hybrid DE and A* for UAV collision-free path planning.
A* [114] | A* | Multi-step search with real-time A* | Stealth UAV real-time path planning.
GWO [115] | HSGWO-MSOS | Hybrid algorithm | Creating a hybrid HSGWO-MSOS algorithm to plan the best UAV route.
GWO [116] | 2D-GWO | Ensure minimum fuel usage and zero threat | Creating GWO to maximize the effectiveness of UCAV route planning.
GWO [117] | 3D-GWO | Localization and obstacle avoidance | Creating GWO to maximize the effectiveness of UCAV route planning.
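Most of the planners in Table 2 minimize a path cost (length, threat exposure, energy) over candidate waypoint sequences, either with heuristic graph search (A*, Dijkstra) or with population-based metaheuristics (PSO, ACO, GA, GWO). As a common point of reference, the sketch below implements plain A* on a 2D occupancy grid with a Manhattan-distance heuristic; the 4-connected grid, unit step costs, and the toy map are simplifying assumptions, whereas the cited planners operate in 3D and add mission-specific cost terms.

```python
# Minimal A* on a 2D occupancy grid (4-connected, unit step cost).
# A simplified reference point for the heuristic-search planners in Table 2.
import heapq

def astar(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    open_set = [(h(start), 0, start, None)]                  # (f, g, node, parent)
    came_from, g_cost = {}, {start: 0}
    while open_set:
        _, g, node, parent = heapq.heappop(open_set)
        if node in came_from:
            continue                                         # already expanded
        came_from[node] = parent
        if node == goal:                                     # reconstruct path
            path = [node]
            while came_from[path[-1]] is not None:
                path.append(came_from[path[-1]])
            return path[::-1]
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < g_cost.get((nr, nc), float("inf")):
                    g_cost[(nr, nc)] = ng
                    heapq.heappush(open_set, (ng + h((nr, nc)), ng, (nr, nc), node))
    return None                                              # no collision-free path

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],   # 1 = obstacle cell
        [0, 0, 0, 0]]
print(astar(grid, (0, 0), (2, 3)))
```

The metaheuristics in the table replace this exhaustive expansion with stochastic sampling of whole waypoint sequences, trading optimality guarantees for scalability in high-dimensional, non-grid search spaces.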
Table 3. RL Techniques and their Applications in UAV Navigation and Obstacle Avoidance.
Algorithms, References | Type | Characteristics | Goals
RL [60] | Q-Learning | A calibrated reward system, employing e-greedy policy, operates in a coordinate-based state-action space. Q-learning adjusts PID controller parameters for navigation in a 2D environment, with function approximation based on FSR for efficiency. | Indoor navigation
RL [119] | Q-Learning | An adaptive reward system, utilizing e-greedy policy, functions within a goal-oriented state-action space. Navigating a 2D environment, the system autonomously seeks out charging points. | Improving QoE
RL [120] | Q-Learning | Utilizing an RSS-driven state-action space, the system employs a dynamic reward system and e-greedy policy, while also integrating obstacle avoidance. | Indoor SaR
RL [121] | TD-Learning | Employing a discrete state-action space, the system utilizes linear function approximation using tile coding to handle large state-spaces, alongside an e-greedy policy. | UAV-BS navigation
RL [122] | Multi-agent Q-Learning | The system anticipates user movement trajectories within a coordinate-based state-action space, factoring in energy usage with gradual convergence and integrating a dynamic rewarding system. | Data transmission with minimum power
RL [123] | Multi-agent Q-Learning | Real-time decentralized path planning with a model-based reward system, e-greedy policy, and reduced state-action space, all within a 3D environment. | Perform sense and send tasks
RL [124] | Double Q-Learning | An evolving reward system, paired with an e-greedy policy, functions within a goal-centric state-action space in a 2D environment. | Temporary BS support
RL [125] | Q-learning and SARSA | Utilizing reinforcement learning, optimizing UAV energy consumption, enhancing path planning efficiency. | UAV efficiency enhancement
RL [126] | Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Deep Q-Networks (DQN) | Obstacle detection and avoidance via diverse reinforcement learning approaches; contrasts DQN, PPO, and SAC methodologies for addressing static and dynamic obstacles; conducts evaluations in a virtual environment using AirSim and Unreal Engine 4; demonstrates SAC's superior performance; and highlights the effectiveness of off-policy algorithms (DQN and SAC) compared to on-policy counterparts (PPO) for autonomous drone navigation. | Develop a drone system for autonomous obstacle avoidance.
RL [127] | DNN-MRFT | RL agent integration, direct policy transfer, DNN-MRFT parameter identification, reward function design, and fusion of high-level velocity actions and low-level control, validated through simulations and real-world experiments with a 90% success rate. | Closing the sim2real gap in UAVs with RL transfer and integrated control.
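The Q-learning entries in Table 3 all reduce to the same tabular update: the agent observes a discretized state (a coordinate cell, an RSS bin), selects an action with an e-greedy policy, and nudges its Q-estimate toward the reward plus the discounted value of the best next action. The sketch below shows that loop on a toy grid world with obstacle cells; the grid layout, reward shaping, and hyperparameters are illustrative assumptions rather than the settings of any cited study.

```python
# Tabular Q-learning with an epsilon-greedy policy on a toy 4x4 grid.
# Grid layout, rewards, and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]      # up, down, left, right
OBSTACLES, GOAL, SIZE = {(1, 1), (2, 2)}, (3, 3), 4
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = defaultdict(float)                            # Q[(state, action)] -> value

def step(state, action):
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in OBSTACLES:
        return state, -1.0, False                 # blocked: penalty, stay in place
    if nxt == GOAL:
        return nxt, 10.0, True                    # goal reached
    return nxt, -0.1, False                       # small step cost

def greedy(state):
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for _ in range(2000):                             # training episodes
    s, done = (0, 0), False
    while not done:
        a = random.choice(ACTIONS) if random.random() < eps else greedy(s)
        s2, r, done = step(s, a)
        target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
        Q[(s, a)] += alpha * (target - Q[(s, a)])  # Q-learning update
        s = s2

print("Greedy action at start:", greedy((0, 0)))
```

The approaches in the table mainly differ in how this state space is defined (coordinates, RSS measurements, goal-relative features) and in how the reward is shaped for the target application.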
Table 4. Overview of Deep Reinforcement Learning Techniques for UAV Navigation and Obstacle Avoidance.
Algorithms, References | Type | Characteristics | Goals
DRL [128] | MDP-based Dueling DDQN | Utilizing a discrete action space, action approximation is performed through Artificial Neural Networks (ANNs), employing a dual neural network architecture. | Simultaneous UAV navigation and radio mapping
DRL [129] | MDP-based DQN | Employing an RSS-driven state space, alongside an e-greedy policy, the system integrates a dynamic rewarding mechanism. | MIMO-based UAV navigation
DRL [130] | MDP-based DQN | Operating within a discrete action space contingent upon dispatched UAVs, the system incorporates an e-greedy policy and prioritized replay memory. | Urban vehicular connectivity
DRL [131] | MDP-based MADQN | Training centrally with an e-greedy policy, multiple agents share their individual locations, operating within a discrete action space. | AoI-aware WPCN
DRL [132] | MDP-based DQN | Incorporating an exceedingly sparse rewarding system and obstacle avoidance, the system employs LwH to enhance convergence within intricate environments. | UAV navigation in complex environments
DRL [133] | MDP-based DDQN | The system leverages CNN for map-based navigation while taking into account energy consumption. | UAV coverage path planning
DRL [134] | MDP-based DQN | Employing object detection assistance, the system navigates through an object-oriented state space, integrating a binary rewarding system alongside an e-greedy policy. | Collision-free UAV navigation
DRL [135] | MDP-based DQN | The system integrates user data freshness, UAV power consumption, and coordinates into the state space for DRL training with stochastic gradient descent. | UAV-BS navigation
DRL [75] | MDP-based DQN | Implementing TD3fD for policy enhancement, employing a CNN-based DQN, the system includes obstacle avoidance capabilities. | Obstacle-free UAV navigation
DRL [136] | POMDP-based DRL | MDP and POMDP are both employed for global and local planning, with policy improvements managed by TRPO. | Indoor navigation
DRL [137] | EDDQN | Employing a CNN-based DQN for map-guided navigation, the system adapts a modified Q-function to ensure effective obstacle avoidance. | Autonomous UAV exploration
DRL [138] | POMDP-based DDQN | CNN-based DDQN, grid-based navigation and searching | UAV-assisted data harvesting
Advantage Actor-Critic (A3C) [139,140] | A3C-based DRL | Prioritized experience replay is applied based on A3C, while multi-agent DDPG handles policy enhancement, alongside a pre-determined UAV navigation pattern. | UAV-assisted MEC
Advantage Actor-Critic (A3C) [141] | Fast-RDPG | Utilizing Fast-RDPG with LSTM-based DRL, the system achieves rapid convergence through an online algorithm. | UAV navigation in complex environments
Advantage Actor-Critic (A3C) [142] | A3C-based DRL | Utilizing A3C with a tailored policy gradient, the system accounts for energy consumption via an e-greedy policy, enabling decentralized multi-UAV navigation. | Distributed multi-UAV navigation
DRL [143] | DAC-SAC | The DAC-SAC algorithm integrates dual experience buffers, a Convolutional Neural Network (CNN), and a self-attention mechanism to improve UAV obstacle avoidance. Simulated trials affirm its efficacy in diverse environments, including scenarios with depth image inputs. | UAV autonomous obstacle avoidance
DRL [61] | D3QN | A trajectory planning approach for cellular-connected UAVs that employs deep reinforcement learning and integrates connectivity and obstacle data to ensure safe, collision-free flight. | Trajectory planning for cellular-connected UAVs
DRL [144] | FRDDM-DQN | FRDDM-DQN combines Faster R-CNN with a Data Deposit Mechanism, improving obstacle extraction and agent training and adapting to dynamic scenarios. In experiments, it demonstrates autonomous navigation, collision avoidance, and superior performance compared to alternative algorithms. | UAV autonomous navigation and collision avoidance
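The DQN-family entries in Table 4 differ mainly in how the bootstrapped target is formed and how transitions are replayed. The PyTorch sketch below samples a mini-batch from a small replay buffer and contrasts the plain DQN target with the Double-DQN target (the online network selects the next action, the target network evaluates it); the state/action dimensions, network sizes, and discount factor are illustrative assumptions only.

```python
# DQN vs. Double-DQN target computation with a small replay buffer (PyTorch).
# Dimensions, network sizes, and gamma are illustrative assumptions.
import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 8, 4, 0.99
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())    # periodically synchronized copy

replay = deque(maxlen=10_000)
for _ in range(256):                              # fill buffer with dummy transitions
    replay.append((torch.rand(STATE_DIM), random.randrange(N_ACTIONS),
                   random.uniform(-1, 1), torch.rand(STATE_DIM), False))

batch = random.sample(replay, 32)
s = torch.stack([b[0] for b in batch])
a = torch.tensor([b[1] for b in batch])
r = torch.tensor([b[2] for b in batch])
s2 = torch.stack([b[3] for b in batch])
done = torch.tensor([b[4] for b in batch], dtype=torch.float32)

with torch.no_grad():
    # Plain DQN: the target network both selects and evaluates the next action.
    dqn_target = r + GAMMA * (1 - done) * target_net(s2).max(dim=1).values
    # Double DQN: online network selects, target network evaluates,
    # which reduces the overestimation bias noted in Table 6.
    best_a = q_net(s2).argmax(dim=1)
    ddqn_target = r + GAMMA * (1 - done) * \
        target_net(s2).gather(1, best_a.unsqueeze(1)).squeeze(1)

q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
loss = nn.functional.smooth_l1_loss(q_sa, ddqn_target)
loss.backward()                                   # one gradient step (optimizer omitted)
```

The actor-critic entries (A3C, DDPG, SAC variants) replace this value-only update with a separate policy network, but retain the same replayed-transition and bootstrapped-target structure.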
Table 5. Overview of DL Techniques for UAV Navigation and Obstacle Avoidance.

| Algorithms, References | Type | Characteristics | Goals |
|---|---|---|---|
| DL [145] | CNN | CNN trained on the IDSIA dataset with ReLU activation. | UAV navigation within unstructured trails |
| DL [146] | CNN | Follows trails, recovers from disturbances, and avoids obstacles. | UAV trail navigation |
| DL [147] | DNN | IDSIA dataset; trail navigation and guidance. | UAV navigation within forests |
| DL [148] | DCNN-GA | Model based on Xception with GA-based hyper-parameter tuning. | Indoor UAV navigation |
| DL [149] | Dynamic path planning (IFDS with ANN) | Optimization using ESCSO; enhanced path navigation in intricate settings, real-time processing, reliable obstacle avoidance, and improved computational efficiency. | Real-time UAV path planning enhancement |
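As a concrete illustration of the CNN-based approaches in Table 5, the sketch below defines a small convolutional classifier that maps a forward-facing camera frame to a steering class (e.g., turn left, go straight, turn right), in the spirit of IDSIA-style trail-following networks. The layer sizes and input resolution are illustrative assumptions, not the architectures of the cited models.

```python
# Minimal sketch (illustrative only) of a CNN steering classifier for
# trail-following UAV navigation from a single camera image.
import torch
import torch.nn as nn


class TrailCNN(nn.Module):
    def __init__(self, n_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, n_classes)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (batch, 3, H, W) RGB frames from the UAV's forward camera.
        x = self.features(images).flatten(1)
        return self.classifier(x)


if __name__ == "__main__":
    model = TrailCNN()
    dummy = torch.randn(1, 3, 120, 160)   # one illustrative camera frame
    logits = model(dummy)
    print(logits.shape)                    # -> torch.Size([1, 3])
```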
Table 6. Comprehensive Overview of Model-Free RL Algorithms for Enhanced UAV Control.

| Algorithm | Description |
|---|---|
| SARSA | SARSA (State–Action–Reward–State–Action) is a model-free, on-policy temporal difference (TD) learning method that learns an action-value function. |
| SARSA Lambda | SARSA Lambda extends SARSA with eligibility traces, allowing credit from rewards to propagate back over longer sequences of states and actions. It is an on-policy TD learning algorithm. |
| Deep Q-Network (DQN) | DQN merges Q-learning with deep neural networks and operates off-policy. |
| Double DQN | Double DQN, a variant of DQN, uses two separate networks to decouple action selection from action evaluation, mitigating overestimation bias. It is off-policy. |
| Noisy DQN | Noisy DQN, a variant of DQN, injects learnable noise into the network parameters during training to foster exploration and prevent the agent from becoming trapped in local optima. It is off-policy. |
| Prioritized Replay DQN | Prioritized Replay DQN, a modification of DQN, replays experiences in proportion to their significance, focusing learning on the most informative transitions. It is off-policy. |
| Categorical DQN | Categorical DQN, an adaptation of DQN, models the action-value function as a categorical distribution over returns rather than a single expected value. It is off-policy. |
| Distributional DQN (C51) | Distributional DQN (C51) represents the return distribution with a fixed set of 51 discrete atoms, improving learning stability over standard DQN. It is off-policy. |
| Normalized Advantage Functions (NAF) | NAF adapts Q-learning to continuous action spaces by constraining the advantage function to a quadratic form so the greedy action can be computed analytically. It is off-policy. |
| Continuous DQN | Continuous DQN is an adaptation of DQN tailored to environments with continuous action spaces. It is off-policy. |
| REINFORCE (vanilla policy gradient) | REINFORCE updates the policy directly using Monte Carlo estimates of the return. It is an on-policy, policy-based method. |
| Policy Gradient | Policy gradient methods learn a policy by directly optimizing the expected return. They are policy-based. |
| TRPO | Trust Region Policy Optimization (TRPO) is a policy gradient algorithm that constrains each update to a trust region so that policy changes remain safe and stable. It is on-policy. |
| PPO | Proximal Policy Optimization (PPO) is a policy gradient method that uses a clipped surrogate objective to limit the size of policy updates. It is on-policy. |
| A2C/A3C | A2C (Advantage Actor-Critic) and A3C (Asynchronous Advantage Actor-Critic) are policy gradient methods designed to train with multiple parallel actor-learners. Both are on-policy. |
| DDPG | Deep Deterministic Policy Gradient (DDPG) is a policy gradient algorithm tailored to continuous action spaces. It is off-policy. |
| TD3 | Twin Delayed DDPG (TD3), a modified version of DDPG, uses two Q-networks and two target Q-networks to improve stability. It is off-policy. |
| SAC | Soft Actor-Critic (SAC) combines Q-learning and policy gradient techniques within a maximum-entropy framework. It is off-policy. |
| ACER | Actor-Critic with Experience Replay (ACER) is a policy gradient algorithm that incorporates experience replay to improve sample efficiency. It is off-policy. |
| ACKTR | Actor-Critic using Kronecker-Factored Trust Region (ACKTR) is a policy gradient algorithm that applies Kronecker-factored approximate curvature to compute natural-gradient updates. It is on-policy. |
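The on-policy/off-policy distinction that runs through Table 6 is easiest to see in the tabular TD updates sketched below: SARSA bootstraps from the action the behaviour policy will actually take next, whereas Q-learning, the basis of DQN, bootstraps from the greedy action. The state and action representations here are placeholders assumed for illustration, not tied to any specific UAV environment.

```python
# Tabular sketch (illustrative only) of on-policy SARSA vs. off-policy Q-learning.
import random
from collections import defaultdict


def epsilon_greedy(Q, state, n_actions, eps=0.1):
    """Behaviour policy shared by both update rules."""
    if random.random() < eps:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(state, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy target: uses the next action the behaviour policy will take.
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])


def q_learning_update(Q, s, a, r, s_next, n_actions, alpha=0.1, gamma=0.99):
    # Off-policy target: uses the greedy action regardless of what is executed.
    best_next = max(Q[(s_next, b)] for b in range(n_actions))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


# Example of a single transition update with dummy values.
Q = defaultdict(float)
a = epsilon_greedy(Q, state=(0, 0), n_actions=4)
sarsa_update(Q, s=(0, 0), a=a, r=-1.0, s_next=(0, 1), a_next=2)
q_learning_update(Q, s=(0, 0), a=a, r=-1.0, s_next=(0, 1), n_actions=4)
```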
Table 7. Existing surveys related to UAV obstacle detection and avoidance.

| References | Year | Approach | Observations |
|---|---|---|---|
| [156] | 2019 | Obstacle detection in wooded areas and takeoff dynamics | Considerable computational overhead |
| [165] | 2020 | Using radio-frequency signals to detect drones | Comparatively high computational cost |
| [166] | 2022 | Avoiding collisions with stationary and moving objects within a structure | Reduced UAV speed; obstacle detection relies on LiDAR sensors |
| [157] | 2020 | Estimating the movement of pertinent pixels along the robot's intended path | Visual performance degrades in rain or low-light conditions |
| [167] | 2021 | Vision-based obstacle identification and avoidance | Observes only one direction at a time; covering other orientations requires substantial processing power |
| [168] | 2018 | From trajectory tracking to model predictive control in 3D among several moving objects | NMPC is known to be computationally expensive |
| [169] | 2018 | System for avoiding obstacles | High computational cost |
| [25] | 2021 | From trajectory prediction to vision-based collision avoidance with dynamic objects | High cost of processing hardware and computation |
| [170] | 2021 | Vision-based collision avoidance for maneuverable aircraft | Expensive processing and inability to detect small objects |
| [159] | 2023 | Vision-based avoidance of both moving and stationary objects | High failure rate in complicated environments |
| [160] | 2023 | Dynamic obstacle avoidance taking the obstacle's position and velocity into account | Restricted to a 2D setting; assumes the UAV can determine the position and velocity of every obstacle as well as its own |
| [161] | 2023 | Obstacle avoidance with a nano drone | Demonstrated 100% reliability at 0.5 m/s in a typical, previously untested indoor environment |
| [158] | 2019 | Real-time monocular depth prediction and obstacle detection and avoidance using a lightweight probabilistic CNN (pCNN) | Shown to be faster and more accurate than state-of-the-art approaches on devices such as the TX2 and 1050Ti, with accuracy improved by sparse depth from visual odometry |
| [164] | 2017 | Trajectory tracking with collision avoidance using a potential function and a null-space-based controller | Validated on a Parrot AR.Drone; works well in dynamic scenarios on a low-cost RGB-D sensor platform |
| [162] | 2020 | Thorough analysis of current collision avoidance approaches for unmanned vehicles, with particular regard to UAVs | Compares several approaches and sensor types in the context of UAV collision avoidance |
| [163] | 2017 | Gathered a large repository of UAV crash incidents to develop a self-supervised navigation strategy | Uses crash data as evidence for organizing vehicle motion in crowded spaces |