Each domain was analyzed by reviewing several approaches and methods based on evaluating and discussing advantages, disadvantages, outcomes and significance. The following analysis of each domain was carried out with the aim of accelerating development of level 4 or 5 AVS.
  3.2. Decision Making
As the world economy and technology have grown, vehicle ownership has increased rapidly, along with over one million traffic incidents worldwide per year. Statistics indicate that 89.8% of incidents took place because of wrong driver decision-making [193]. To address this issue with AVS, decision making has become a key field of study for combined deep learning and deep reinforcement learning-based approaches that take humanlike driving decisions for accelerating and decelerating, lane shifting, overtaking, emergency braking, collision avoidance, vehicle behavior analysis and safety assessment.
For instance, the automated driving coordination problem was defined as a problem of the Markov Decision Process (MDP) in the research of Yu et al., during the simulation of vehicle interactions applying multi-agent reinforcement learning (MARL) with a dynamic coordination graph to follow lead vehicles or overtaking in certain driving scenarios [
121]. Whereas most studies focused on single-vehicle policies, the proposed mechanism addressed the coordination problem in autonomous driving during overtaking and lane-shifting maneuvers, obtaining higher rewards than rule-based approaches.
In another work, the Driving Decision-Making Mechanism (DDM) was built by Zhang et al., using an SVM algorithm, optimized with the weighted hybrid kernel function and a Particle Swarm Optimization algorithm to solve decision-making issues including free driving, tracking car and lane changing [
122]. The proposed decision-making mechanism obtained 92% accuracy optimizing an SVM model compared with RBF kernel and BPNN model, where the evaluated performance shows that free driving achieved 93.1% and tracking car and lane changing achieved 94.7% and 89.1% accuracy, respectively, in different traffic environments within 4 ms for average reasoning time. The authors presented a hypothesis when analyzing the results: for driving decisions, road conditions have nearly no effect on heavy traffic density. Despite achieving good accuracy, some limitations were mentioned, such as not applying to real-world driving environments and not yet investigating critical driving scenes such as sudden presence of pedestrians or objects.
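To make the idea of a weighted hybrid kernel concrete, the following is a minimal sketch that assumes the common construction of a convex combination of an RBF kernel and a polynomial kernel. The exact kernel form and parameter values used in [122] are not stated here, so `weight`, `gamma`, `degree` and `coef0` are illustrative assumptions; in such a setup, a Particle Swarm Optimization routine would search over these values.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Local kernel: similarity decays with squared Euclidean distance."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Global kernel: polynomial of the inner product."""
    inner = sum(a * b for a, b in zip(x, y))
    return (inner + coef0) ** degree


def hybrid_kernel(x, y, weight=0.7, gamma=0.5, degree=2, coef0=1.0):
    """Convex combination of both kernels (weight is an assumed parameter)."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    local_part = rbf_kernel(x, y, gamma)
    global_part = poly_kernel(x, y, degree, coef0)
    return weight * local_part + (1.0 - weight) * global_part
```

Because a convex combination of valid kernels is itself a valid kernel, a function of this shape could be evaluated pairwise to build a precomputed Gram matrix for an SVM solver.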
The limitation of [122] regarding emergency situations was addressed by Fu et al., who proposed autonomous braking and analyzed a lane-changing behavior decision-making system for emergency situations, implementing an actor-critic-based DRL (AC-DRL) with deep deterministic policy gradient (DDPG) and a multi-objective reward function [123,124], obtaining a 1.43% collision rate. The authors mentioned that training online with a large dataset can be difficult and expensive, and that the continuous action function slowed convergence and could easily become trapped in a local maximum.
Moreover, to overcome the limitation of reinforcement learning in complex urban areas, Chen et al. used model-free deep reinforcement learning approaches named Double Deep Q-Network (DDQN), Twin Delayed Deep Deterministic Policy Gradient (TD3) and Soft Actor-Critic (SAC) to obtain low dimensional latent states with visual encoding [
125]. They improved performance in the CARLA simulator by altering frame dropping and exploration strategies and by using a modified reward and network design. The method was evaluated on one of the most complicated tasks, a busy roundabout, and obtained improved performance compared to the baseline. In the 50 min test, the three approaches were able to enter the roundabout with a high success rate, but the performance of DDQN and TD3 decreased after covering a long distance. In the best case, SAC achieved 86%, 80%, 74%, 64% and 58% success rates for the first, second, third and desired exits and the goal point, respectively, whereas DDQN and TD3 had an almost zero success rate for reaching the desired exit and goal point.
To avoid training complexity in a simulation environment, the DDPG algorithm with actor-critic method was applied in [
124] using deep reinforcement learning (DRL), considering three reward function braking scenarios: braking too early and too late, and too-quick braking deceleration. The outcomes of their proposed methodology showed that the error collision rate was 1.43% which was gained by evaluating the performance of the diverse initial positions and initial speed strategies. The ratio of obtaining maximum deceleration was 5.98% and exceeding jerk was 9.21%, which were much improved compared to DDPG with steering and DQN with discrete deceleration.
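The three braking scenarios can be illustrated with a small reward function. This is only a sketch under stated assumptions: the penalty magnitudes, the `early_distance_m` threshold and the `max_decel_mps2` comfort limit are hypothetical values chosen for illustration and are not taken from [124].

```python
def braking_reward(distance_m, decel_mps2, collided,
                   early_distance_m=30.0, max_decel_mps2=6.0):
    """Combine three penalty terms into one scalar reward.

    All thresholds and penalty weights are hypothetical.
    """
    reward = 0.0
    if collided:
        # Braking too late: the vehicle reached the obstacle.
        reward -= 100.0
    if decel_mps2 > 0.0 and distance_m > early_distance_m:
        # Braking too early: obstacle is still far away.
        reward -= 1.0
    if decel_mps2 > max_decel_mps2:
        # Too-quick deceleration: penalize the excess over the limit.
        reward -= decel_mps2 - max_decel_mps2
    return reward
```

A DDPG agent with a continuous deceleration output would receive this value at every step, so that a collision dominates the return while unnecessary or harsh braking is discouraged more gently.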
A dueling deep Q-network approach was demonstrated by Liao et al. to make a strategy of highway decision making [
126]. The method was built for lane-changing decisions to make a strategy for AVS on highways where the lateral and longitudinal motions of the host and surrounding vehicles were manipulated by a hierarchical control system. The outcomes showed that after 1300, 1700, 1950 episodes, the approach was able to avoid collision after 6 h of training and 26.56 s of testing.
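A dueling network splits the Q-function into a state value and per-action advantages, which are then recombined. The sketch below shows the standard mean-subtracted aggregation used in dueling architectures; the three lane-change actions named in the comment are assumed for illustration and are not the action set reported in [126].

```python
def dueling_q_values(state_value, advantages):
    """Q(s, a) = V(s) + A(s, a) - mean over actions of A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


# Hypothetical action order: keep lane, change left, change right.
example_q = dueling_q_values(1.0, [1.0, 2.0, 3.0])
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, since otherwise a constant could be shifted freely between them.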
In another study, Hoel et al. introduced a tactical framework for the decision-making process of AVS, combining planning with a DRL-extended AlphaGo algorithm [127]. The planning phase was carried out with a modified Monte Carlo Tree Search (MCTS), which builds a search tree by random sampling, and obtained a 70% success rate in highway cases. The difference between traditional MCTS and this variant was that a neural network trained through DRL guided the search towards the most relevant parts of the tree, which decreased the required sample size and helped to identify long temporal correlations. The proposed process considered 20 simulation parameters and 11 inputs to the neural network, which were very efficient and made it more suitable for practical implementation.
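How a neural network can guide the tree search is easiest to see in the child-selection rule. The following sketch uses the PUCT rule popularized by AlphaGo-style methods, in which the network's prior probability scales the exploration bonus. The dictionary layout, the action names and `c_puct` are assumptions for illustration, not details of [127].

```python
import math


def puct_select(children, c_puct=1.0):
    """Pick the child action maximizing Q + U.

    children maps an action name to a dict with keys
    'prior' (network probability), 'visits' and 'value_sum'.
    """
    total_visits = sum(child["visits"] for child in children.values())

    def score(child):
        # Mean value of the child so far (0 if never visited).
        q = child["value_sum"] / child["visits"] if child["visits"] else 0.0
        # Exploration bonus: large for high-prior, rarely visited children.
        u = c_puct * child["prior"] * math.sqrt(total_visits) / (1 + child["visits"])
        return q + u

    return max(children, key=lambda action: score(children[action]))
```

With this rule, an action the network considers promising is tried early even without visits, which is one way the required number of samples can shrink compared with uniform random rollouts.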
Overtaking maneuvers for intelligent decision making while applying a mixed observable Markov decision process was introduced by Sezer, solving overtaking maneuvers on two-track roads [
128]. In this paper, the author presented a new formulation for the issue of double-way overtaking by means of the mixed observability MDP (MOMDP) to identify the best strategy considering uncertainties. This was used for overcoming the problem, and was illustrated by the active solvers’ growth and in cognitive technological advances by reducing time-to-collision (TTC) methods in different simulations. The method surpassed nine periods, relative to both MDP and conventional TTC methods. However, the limitation of proper discretization can also be considered with respect to the actual speed and distance values. The larger number of states required, and the associated computational cost of the MOMDP algorithm, remain a hindrance to actual implementation.
To overcome the issue of vehicle overtaking which needs an agent to resolve several requirements in a wide variety of ways, a multigoal reinforcement learning (MGRL)-based framework was introduced to tackle this issue by Ngai et al. [
129]. A good range of cases of overtaking were simulated to demonstrate the feasibility of the suggested approach. When evaluating seven different targets, either Q-Learning or Double Action QL was being used with a fusion function to assess individual decisions depending upon the interaction of the other vehicle with the agent. The hypothesis of the work was that this proposal was very efficient at taking accurate decisions while overtaking, collision avoiding, arriving on target timely, maintaining steady speed and steering angle.
Brännström et al. presented a collision-avoiding decision-making system adopting a Bayesian network-based probabilistic framework [
130]. A driver model enabled the developer to carry out early actions in many circumstances in which the driver finds it impossible to anticipate the potential direction of other road users. Furthermore, both calculation and prediction uncertainties were formally discussed in the theoretical framework, both when evaluating driver adoption of an action and when predicting whether the decision-making method could avoid collision.
Another important decision-making task is intelligent vehicle lane-changing policy. Based on the area of acceleration and braking mechanism, a method was introduced by Zhu et al. [
131]. First, velocity and relative distance acceleration area was developed based on a braking mechanism and acceleration was used as a safety assessment predictor and then, a method for lane changing with the accelerating field was built, while the driver’s behaviors, performance and safety were taken into consideration. In compliance with the simulation findings, the use of lane-changing decision-making strategies based on the acceleration can be optimized with driver behaviors for lane-change steps, including starting line, span and speed establishing safety at the same time.
Although previous approaches presented a decision-making mechanism for lane changing, most of them did not show DMS for behavior prediction while lane changing [
132]. A fuzzy inference system with an LSTM-based method for AVS was proposed by Wang et al. to analyze the behavior of surrounding vehicles and ensure safety while lane changing, with 92.40% accuracy. The novelty of their work was the dynamic adjustment of the motion state in advance.
Li et al. proposed a framework for the analysis of the behavior, using a gradient-boosting decision tree (GBDT), merging acceleration or deceleration behavior with the data from the trajectory of the vehicle processed in the noise method on the U.S. highway 101 [
133]. The partial dependency plots demonstrated that the effect on the fusion of acceleration or deceleration in independent variables by understanding the key impacts of multiple variables, was non-linear and thus distinct from the car tracking behavior with 0.3517 MAD (Mean Absolute Deviation) value, which suggested that the adoption of typical vehicle models in combination results cannot reflect characteristic behavior.
Further, DRL with Q-masking was applied by Mukadam et al. to make tactical decisions for shifting lanes [
134]. They introduced a system which provided a more organized and data-efficient alternative to learning a complete policy end to end for problems where high-level policies are difficult to formulate through conventional optimization or rule-based methods. The success rate of 91% was 21% higher than human perception, and the 0% collision rate was 24% lower than human perception. This method of DRL with Q-masking worked best in the case of avoiding collision while lane shifting.
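The core of Q-masking is that actions ruled out by prior knowledge are excluded before the greedy choice is made. The sketch below shows this step only; how the validity mask is computed (for example, forbidding a lane change into an occupied lane) is an assumption left outside the function.

```python
def masked_greedy_action(q_values, valid_mask):
    """Return the index of the highest-Q action among valid ones."""
    best_index = None
    for index, (q, is_valid) in enumerate(zip(q_values, valid_mask)):
        if not is_valid:
            continue  # masked actions are never considered
        if best_index is None or q > q_values[best_index]:
            best_index = index
    if best_index is None:
        raise ValueError("no valid action available")
    return best_index
```

For example, with Q-values `[0.9, 0.5, 0.1]` and a mask that forbids the first action, the function selects index 1 even though index 0 has the highest estimate, so the agent never has to learn from experience that the masked action is unsafe.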
Similarly, Wang et al. adopted DRL but combined with rule-based constraints to take lane-changing decisions for AVS in a simulated environment and MDP, which was challenging for high-level policy to develop through conventional methods of optimization or regulation [
135]. The trained agent could take the required action in multiple situations owing to the state representation of the environment, the reward function and the fusion of high-level lateral decision making with rule-based longitudinal regulation and trajectory adjustment. The method was able to obtain a 0.8 safety rate with superior average speed and lane-changing time.
Chae et al. demonstrated an emergency braking system applying DQN [
136]. The brake control problem was formulated as a Markov decision process (MDP), where the state was given by the relative location of the hazard and the speed of the vehicle, and the action space was specified as a set of brake actions: no braking, weak, medium and heavy braking. Across combined vehicle, pedestrian and multiple road-condition scenarios, the obtained collision rate decreased from 61.29% to 0% for a TTC value from 0.9 s to 1.5 s. As a result, this DQN-based approach was selected as one of the most practical systems for AVS in terms of autonomous braking.
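The state and action definitions of such a braking MDP, together with the time-to-collision quantity used for evaluation, can be sketched as follows. The tuple layout of the state and the action labels are illustrative assumptions; only the four braking levels and the TTC concept come from the description above.

```python
# Discrete action set: four braking levels.
BRAKE_ACTIONS = ("none", "weak", "medium", "heavy")


def make_state(relative_position_m, vehicle_speed_mps):
    """State as (relative hazard position, vehicle speed); layout is assumed."""
    return (float(relative_position_m), float(vehicle_speed_mps))


def time_to_collision(gap_m, closing_speed_mps):
    """TTC = gap / closing speed; infinite when the gap is not shrinking."""
    if closing_speed_mps <= 0.0:
        return float("inf")
    return gap_m / closing_speed_mps
```

For instance, a 15 m gap closing at 10 m/s gives a TTC of 1.5 s, the upper end of the range evaluated above.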
Furthermore, to analyze high-accuracy braking action from a driving situation declaring four variables, that is, speed of host vehicle, time to collision, relative speed and distance between host and lead vehicle, Wang et al. used hidden Markov and Gaussian mixture-based (HMGM) approach [
137]. The efficient technique was able to obtain high specificity and 89.41% accuracy despite not considering kinematic characteristics of lead or host vehicle for braking. However, the analysis of four variants while braking could be a pathway to develop an improved version of braking decision making for AVS.
When most of the approaches had dependency on datasets, methods such as DRL that combined DL and RL were extremely efficient for driving decision making in an unknown environment. For example, Chen et al. developed a brain-inspired simulation based on deep recurrent reinforcement Q-learning (DRQL) for self-driving agents with better action and state space inputting only screen pixels [
138]. Although the training process was long, it resulted in better-than-human driving ability and Stanford driving agent in terms of reward gain, which indicates that this approach was one of the most suitable for applying in AVS.
Another DRL-based approach combined with automatically generated curriculum (AGC), was extremely efficient for intersection scenarios with less training cost [
139]. The method obtained 98.69% and 82.1% mean average reward for intersection approaching and traversing, respectively. The approach might lack proper finishing or goal reaching in some cases of intersection traversal, but it is still very efficient because it does not depend on pre-trained datasets.
Similarly, continuous decision-making for intersection cases in top three accident-prone crossing paths in a Carla simulator using DDPG and CNN surpassed the limitation of single scenario with discrete behavior outputs fulfilling the criteria for safe AVS [
140]. DDPG was utilized to address the MDP problem and find the best driving strategy by mapping the link between traffic images and vehicle operations through a CNN, which solved the common drawback of rule-based RL methods deployed in intersection cases. The method obtained standard deviation (SD) values of 0.50 m/s, 0.48 m/s and 0.63 m/s for left turn across path opposite direction, left turn across path lateral direction and straight crossing path, respectively, although it only considered lateral maneuvers and two vehicles in the intersection.
In contrast, an approach was introduced by Deshpande et al. to deal with behavioral decision making in environments full of pedestrians [141]. A deep recurrent Q-network (DRQN) was used for taking safe decisions to reach a goal without collision and succeeded in 70% of cases. Despite the comparatively lower accuracy, this approach could also be very appropriate if deep learning agents were added for better feature analysis.
For AVS navigation avoiding on-road obstacles, a double deep Q-learning (DDQN) and Faster R-CNN in a stochastic environment obtained stable average reward value after only 120 epochs with maximum 94% accuracy after 180,000 training steps with hyper-parameter tuning [
142]. However, this approach only considered vehicles in parallel and did not show how DDQN and Faster R-CNN are fused. Moreover, the approach was still unable to obtain stable performance in uncertain moments.
Mo et al. demonstrated a reinforcement learning agent and an MCTS-based approach to ensure safe decision making and behaviors through a safe policy search and a risk state prediction module [
143]. This research assessed the challenge of decision making for a two-lane overtaking situation using the proposed safe RL approach and comparing it with MOBIL and DRQN. The proposed model outperformed MOBIL and DRQN by scoring 24.7% and 14.3% higher overtaking rate with 100% collision-free episodes and highest speed. Therefore, the proposed Safe RL could be a pathway for current AVS for risk-free trajectory decision making.
In conclusion, decision making is the most vital part of an intelligent system, and to obtain acceptable human-like driving decisions, multiple deep learning and deep reinforcement learning methods were analyzed (shown in Table 10). The discussed approaches were able to resolve severe limitations and performed well in overtaking, braking, behavioral analysis and other significant segments of decision making for full AVS.
  3.3. End-to-End Controlling and Prediction
End-to-end controlling is one of the major fields of study for AVS. Human mistakes were the main cause of road accidents, and fully autonomous vehicles can help reduce these accidents.
To improve the control system of AVS analyzing driving scenarios for lane changing, An et al. [
144] proposed a system that tried to approximate driver’s actions based on the data obtained from an uncertain environment that were used as parameters while transferring to parameterized stochastic bird statecharts (stohChart(p)) in order to describe the interactions of agents with multiple machine learning algorithms. Following that, a mapping approach was presented to convert stohChart(p) to networks of probabilistic timed automata (NPTA) and this statistical model was built to verify quantitative properties [
145]. In the learning case, weighted KNN achieved highest accuracy combined with the proposed method considering training speed and accuracy, where it achieved 85.5% accuracy in 0.223 s and in the best case, time cost for probability distribution time for aggressive, conservative and moderate driving styles was 0.094, 0.793 and 0.113 s, respectively. The authors categorized their work into learning phase, modelling phase and quantitative analyzing phase in order to develop the driving decision-taking phase.
A method was demonstrated by Pan et al. to control independently at high speeds using human-like imitation learning, involving constant steering and acceleration motions [
146]. The dataset’s reference policy was derived from a costly high-resolution model predictive controller, which the CNN subsequently trained to emulate using just low-cost camera sensors for observations. The approach was initially validated in ROS Gazebo simulations before being applied to a real-world 30 m-long dirt track using a one-fifth-scale car. The sub-scale vehicle successfully learnt to navigate the track at speeds of up to 7.5 m/s.
Chen et al. focused on a lane-keeping end-to-end learning model predicting steering angle [
147]. The authors employed CNN to the current NVIDIA Autonomous Driving Architecture, where both incorporated driving image extraction and asserting steering angle values. To test the steering angle prediction while driving, they considered the difference among ground truth angle which was generated by human drivers vs. predicted angle where they acquired higher steering prediction accuracy with 2.42 mean absolute error and suggested for data augmentation for training to achieve a better performance.
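The steering evaluation described here compares human-generated ground-truth angles with predicted angles using mean absolute error. A minimal sketch of that metric follows; the sample angles in the usage note are made up for illustration and are not data from [147].

```python
def mean_absolute_error(ground_truth, predicted):
    """Average absolute difference between paired angle values."""
    if len(ground_truth) != len(predicted):
        raise ValueError("sequences must have equal length")
    if not ground_truth:
        raise ValueError("sequences must not be empty")
    total = sum(abs(t - p) for t, p in zip(ground_truth, predicted))
    return total / len(ground_truth)
```

For example, ground-truth angles `[0.0, 10.0, -5.0]` against predictions `[1.0, 8.0, -5.0]` give absolute errors of 1, 2 and 0, hence an MAE of 1.0 in the same unit as the angles.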
In another work, a multitask learning system for end-to-end estimation of steering angle and speed control was proposed in [148]. Measuring and estimating speed based only on visual perception was considered one of the major challenges. The authors proposed separating speed control functions to accelerate or decelerate, using the front-view camera, depending on whether the front view was impeded or clear. Nevertheless, this showed some shortcomings in precision and pre-fixed speed controls. By incorporating previous feedback speed data as a complement for better and more stable control, they improved the speed control system. This method could help to solve error accumulation in fail-case scenarios of driving data. They scored 1.26° Mean Absolute Error (MAE) in estimating real-time angles, along with 0.19 m/s and 0.45 MAE on both datasets for velocity prediction. Thus, the improved result made the method one of the most applicable versions of CNN and data-driven AV controlling.
While driving, people identify the structures and positions of different objects including pedestrians, cars, signs and lanes with human vision. Upon recognizing several objects, people realize the relation between objects and grasp the driving role. Because a CNN processes single images spatially as three-dimensional arrays, it has certain shortcomings in handling time series. This issue cannot be overcome using CNN alone.
To solve this limitation Lee et al. demonstrated an end-to-end self-driving control framework combining a CNN and LSTM-based time-series image dataset applied in a Euro Truck simulator [
149]. The system created a driving plan which takes the changes into account over time by using the feature map to formulate the next driving plan for the sequence. Moreover, NVIDIA currently has succeeded in training a ConvNet for converting raw camera images into control steering angles [
150]. It resolved end-to-end control by predicting steering angle without explicit labels, with approximately 90% autonomy value and autonomous driving for 98% of the testing period. This approach was one of the most widely demonstrated approaches and boosted research on AVS applying deep learning methods.
A similar method, a deep ConvNet, was trained by Chen et al. to directly extract affordance indicators from front-camera images [151]. A basic control system, based on affordance principles, provided steering directions and the decision to overtake preceding vehicles. Rather than detecting lane markings and other objects to infer the driving state of the car indirectly, a set of driving affordance measures was specified. These included the vehicle location, the gap to the surrounding lane markers and records of previous car driving. While this was a popular concept, for many reasons it may be challenging to handle traffic with complex driving maneuvers and to build a human-like autonomous vehicle control system.
To deploy a human-like autonomous vehicle speed-control decision-making system Zhang et al. proposed a double Q-network-based approach utilizing naturalistic driving data built on the roads of Shanghai inputting low dimensional sensor data and high-dimensional image data obtained from video analysis [
152]. They combined deep neural networks and double Q-learning (DDQL) [
194,
195,
196] to construct the deep Q-network (DQN) model which was able to understand and make optimal control decisions in simultaneous environmental and behavioral states. Moreover, real-world data assessment reveals that DDQN can be used on a scale to effectively minimize these unreliable DQN problems, resulting in more consistent and efficient learning. DDQN had increased both in terms of interest precision and policy efficiency. The model performed 271.13% better than DQN in terms of speed-control decision making. Even so, the proposed approach could be more applicable to an unknown driving environment with combined CNN agent for feature extraction.
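The difference between the DQN and double Q-learning targets, which underlies the more consistent learning reported here, can be shown in a few lines. This sketch uses plain lists as stand-ins for network outputs; `gamma` and the example values in the usage note are illustrative assumptions.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target,
                      gamma=0.99, done=False):
    """Double DQN: online network selects, target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]
```

With a reward of 1.0, `gamma=0.5`, online estimates `[0.2, 0.8]` and target estimates `[0.5, 0.3]`, the DQN target is 1.25 while the double variant gives 1.15, because the action is chosen by one network and scored by the other. Decoupling selection from evaluation in this way reduces the overestimation bias of the plain maximum.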
Chi et al. formulated an ST-LSTM network that incorporates spatial and temporal data from multiple previous frames of a camera’s front view [153]. Several ST-Conv layers were used in the ST-LSTM model to collect spatial information, and a Conv-LSTM layer was used to store temporal data at the minimal resolution on the upper layer. However, the spatial and temporal connection among various feature layers was ignored by this end-to-end model. They obtained a benchmarking 0.0637 RMSE value on the Udacity dataset, creating the smallest 0.4802 MB memory and 37.107 MB model weight. The limitation noted in the paper was that all present end-to-end driving models were trained only on the ground truth of the current-frame steering angle, which indicated a lack of further spatiotemporal data.
Furthermore, to obtain a better control system, the previous issue was tackled, and an end-to-end steering control system was implemented by Wu et al. by concatenating future spatiotemporal features [
154]. They introduced the encoding for an advanced autonomous driving control system of spatiotemporal data on a different scale for steering angle approximation using the Conv-LSTM neural framework with a wide-spectrum spatiotemporal interface module. Sequential data were utilized to improve the space-time expertise of the model during development. This proposed work was compared with end-to-end driving models such as CgNet, NVIDIA’s PilotNet [
155] and ST-LSTM Network [
153], where the root mean square error (RMSE) was 0.1779, 0.1589 and 0.0622, respectively, and showed the lowest RMSE value of 0.0491 to predict steering angles, which was claimed to be more accurate than an expert human driver. Thus, this approach was applicable for a level 4 or 5 autonomous vehicle control system.
Moreover, a deep neural network-based approach with weighted N-version Programming (NVP) was introduced for resilient AV steering controlling [
156]. Compared to the other three networks (chauffeur, autumn, rambo), the proposed network showed 40% less RMSE retrieving steering angles in clear, rain, snow, fog and contrast lighting conditions. However, there was a high failure rate for the large developing cost for training an individual DNN model.
Aiming to build a vehicle motion estimation system for diversity awareness while driving, Huang et al., via latent semantic sampling [
157], developed a new method to generate practical and complex trajectories for vehicles. First, they expanded to include semantic sampling as merging and turning the generative adversarial network (GAN) structure with a low-dimensional semantic domain, formed the space and constructed it. It obtained 8% improvement on the Argoverse validation dataset baseline. They therefore sampled the estimated distribution from this space in a way which helped the method to monitor the representation of semantically different scenarios.
A CNN and state-transitive LSTM-based approach was demonstrated with multi-auxiliary tasks for retrieving dynamic temporal information from different driving scenarios to estimated steering angles and velocity simultaneously [
158]. The method applied the vehicle’s current location to determine the end-to-end driving model sub-goal angle to boost the steering angle estimation accuracy, which forecasted that the efficiency of the driving model would improve significantly. The combined method obtained 2.58° and 3.16° MAE for steering angle prediction and 0.66 m/s and 0.93 m/s speed MAE in GTA V and Guangzhou Automotive Cooperate datasets, respectively. Nevertheless, it showed a slow response in unknown environment, so this method might not be applicable in practical implementation.
In a similar manner, Toromanoff et al. presented a CNN-based model for lateral control of AVS using a fisheye camera with a label augmentation technique for accurate correction labelling under a lateral control rule, to tackle cases of lateral control error in wide FoV [159]. This method was compared with pure offline methods, where feedback from a prediction was not implemented, and resulted in 99.5% and 98.7% autonomy in urban areas and highways after training with 10,000 km and 200 h of driving video.
On the other hand, Smolyakov et al. greatly reduced the number of CNN parameters to avoid overfitting while helping to find dependencies in the data sequence, and implemented the model in the CarND Udacity Simulator for predicting steering angles. However, the obtained result was unsatisfactory compared to other reviewed results, with an accuracy of 78.5% [160].
Similarly, a CNN-based approach was applied for both lateral and longitudinal motion controlling of AVS obtaining 100% autonomy on e-road track on TORCS simulator. Although it had performed very well, contributing to both kinds of motion controlling, it lacked training data for practical implementation and memory consumption for training two different neural networks for speed and steering angle prediction. This method could be better approached by implementing in real scenarios with a good amount of training data [
161].
In another proposal, a reinforcement learning-enabled throttle and brake control system was proposed by Zhu et al. [
162], focusing on a one leader and one follower formation. A neural dynamic programming algorithm evaluating with trial-and-error method was directly applied for adopting near-optimal control law. The control policy included the necessary throttle and brake control commands for the follower according to the timely modified corresponding condition. Simulation experiments were carried out using the well-known CarSim vehicle dynamic simulator to show the reliability of the approach provided.
To overcome traditional sensor-based pipeline for controlling AVS where there is a tendency to learn from direct mapping, Xiao et al. demonstrated multimodal end-to-end AVS applying conditional imitation learning (CIL), taking an RGBD image as raw data in a Carla simulator environment [
163]. The CNN-based CIL algorithm was evaluated in different weather modes to identify the performance for end-to-end control. The success rates of control in the one-turn and dynamic environments were 95% and 84%, respectively, which could be boosted through early fusion by changing the number of color channels from three (RGB) to four (RGBD). However, performance dropped by almost 18.37% and 13.37% when controlling AVS with RGB image input for one turn and dynamic environment, respectively, in a new map of the Carla simulator, which could be considered an uncertain area.
In brief, most of the deep learning approaches for end-to-end controlling and motion prediction were based on CNN, showing efficient outcomes suitable for practical level 4 or 5 AVS. Most of the methods were deployed for estimating continuous steering angle and velocity, while some controlling approaches also addressed blind spots, gap estimation, slow drifting, and both lateral and longitudinal motion control, with methods such as multimodal multitask-based CNN, CNN-LSTM, Deep ConvNet, ST-LSTM, neural dynamic programming-based reinforcement learning with actor-critic network and RL. These methods faced challenges such as noise created by human-factor speed changes causing lower accuracy, training only on the ground truth of the current-frame steering angle, and not being applied in a practical or complex environment. The overall summary of discussed methods is presented in Table 11.
  3.4. Path and Motion Planning
Perception-based autonomous navigation, including path and motion planning in an unknown or complex environment, is one of the critical concerns for developing AVS. To tackle the current problem and analyze the contribution, multiple deep learning and deep reinforcement learning (DRL) combined methods for path and motion planning are reviewed in this section.
Initially, You et al. focused on the issue of path planning of autonomous vehicles in traffic in order to repeat decision making by replicating the optimum driving technique of expert drivers’ actions for lane changing, lane and speed maintenance, acceleration and braking in MDPs on highways [
164]. The optimal control policy for the proposed MDP was resolved using deep inverse reinforcement learning (DIRL) and three MaxEnt IRL algorithms by utilizing a reward function in terms of a linear combination of parameterized function to solve model-free MDP. The trajectory proposals were executed at the time of overtaking and the policy recovery was reduced to 99%, even though there was insufficient evidence for the reflection of stochastic behavior.
To overcome the limitations of rule-based methods for safe navigation and to better handle intersection problems for AVS, a vision-based path and motion planning formulation adopting DRL was used by Isele et al. [165]. Each wait action was followed by another wait or go action, meaning that each pathway was a series of waiting decisions that concluded in a go decision, and the agent was not permitted to wait after the go action had been chosen. The method secured success rates of 99.96%, 99.99%, 99.78% and 98.46% for the forward, right, left turn and challenge scenarios, respectively, and was 28% faster than the TTC (time-to-collision) method, although performance decreased threefold and average time doubled in the challenge scenario.
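The wait/go constraint described above can be illustrated with a minimal sketch. The function names `allowed_actions` and `is_valid_episode` are hypothetical and do not come from Isele et al.; the code only encodes the rule that a pathway is a run of wait decisions ending in go, with no waiting allowed after go.

```python
from typing import List, Optional

WAIT, GO = "wait", "go"


def allowed_actions(previous: Optional[str]) -> List[str]:
    """Actions permitted after `previous` (None means the episode just started)."""
    if previous == GO:
        # Once go has been chosen, the agent may not return to waiting.
        return [GO]
    return [WAIT, GO]


def is_valid_episode(actions: List[str]) -> bool:
    """True if the sequence is zero or more waits followed by at least one go."""
    seen_go = False
    for action in actions:
        if action not in (WAIT, GO):
            raise ValueError(f"unknown action: {action!r}")
        if action == GO:
            seen_go = True
        elif seen_go:
            # A wait after go violates the constraint.
            return False
    return seen_go
```

Restricting the action set this way shrinks the policy search space, since the agent only has to learn when to stop waiting.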
Zhang et al. proposed a risk analysis and motion planning system for autonomous vehicles that focused on predicting the motion of surrounding vehicles in highway scenarios [166]. An interactive multiple model (IMM) and a constant turn rate and acceleration (CTRA) model were used for surrounding-vehicle motion prediction, and model predictive control (MPC) was used for trajectory planning; the motion prediction scored an RMSE of 3.128 after 5 s. Although it was designed for connected AVS, it is efficient for vision-based approaches.
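As an illustration of the CTRA assumption and of how an RMSE over a prediction horizon is computed, the following sketch propagates a vehicle state with constant yaw rate and constant acceleration. It is a simplified forward-Euler approximation, not the IMM filter or the exact formulation of Zhang et al.; the function names, the step size `dt` and the state layout are assumptions made for the example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def predict_ctra(x: float, y: float, heading: float, speed: float,
                 accel: float, yaw_rate: float,
                 horizon: float, dt: float = 0.01) -> List[Point]:
    """Predict positions under constant turn rate and acceleration.

    Uses small forward-Euler steps, so the result approximates the
    closed-form CTRA solution; heading is in radians.
    """
    steps = int(round(horizon / dt))
    path: List[Point] = []
    for _ in range(steps):
        x += speed * math.cos(heading) * dt
        y += speed * math.sin(heading) * dt
        heading += yaw_rate * dt   # constant turn rate
        speed += accel * dt        # constant acceleration
        path.append((x, y))
    return path


def rmse(predicted: List[Point], actual: List[Point]) -> float:
    """Root-mean-square position error between two equally sampled paths."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("paths must be non-empty and of equal length")
    squared = [(px - ax) ** 2 + (py - ay) ** 2
               for (px, py), (ax, ay) in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

For example, `predict_ctra(0, 0, 0, 10, 0, 0, horizon=5.0)` describes a vehicle driving straight at 10 m/s for 5 s, and `rmse` can then compare that prediction against a recorded trajectory sampled at the same rate.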
Another approach, a local and global path planning methodology, was presented in an ROS-based environment for AVS by Marin-Plaza et al., who used Dijkstra and the timed elastic band (TEB) method [167]. The path planning model was able to reach the goal with modest error, measured as the Euclidean distance between local and global plan waypoints, where it scored 1.41 m, which is very efficient. However, this result applied only when the model was not specifically calibrated for the vehicle’s kinematics or when the vehicle was off track, and complex scenarios were not considered.
In another work, Islam et al. established a vision-based autonomous driving system that relied on a DNN, handled regions with unforeseen roadway hazards and could safely maneuver the AVS in such environments [168]. To overcome the unsafe navigation problem, they presented a deep learning architecture based on object detection and structural segmentation, which obtained RMSE values of 0.52, 0.07 and 0.23 for cases 1 to 3, respectively, and a 21% safety enhancement when the hazard-avoiding method was added.
Ma et al. proposed an efficient RRT algorithm that implemented a policy framework based on traffic scenes and an intensive search tree extension strategy to tackle the problems of traditional RRT, namely meandering routes, unreliable terminal states and sluggish exploration, and thereby established more sustainable motion planning for AVS [169]. In addition, the integration of the proposed fast RRT algorithm with the configuration-time space could be adopted in complex obstacle-laden environments to enhance the efficiency of the expected trajectory and of re-planning. An extensive set of experimental results showed that the system was much quicker and more successful in addressing on-road autonomous driving planning queries, demonstrating better performance than previous approaches.
In another work, an optimum route planner integrated with vehicle dynamics was designed by Gu et al., implementing an artificial potential field to provide maximum workable movement while ensuring the stability of the vehicle’s path [170]. In this method, obstacles and road edges were handled as constraints in the optimal control problem rather than as arbitrary features. Therefore, when designing the optimum route using vehicle dynamics, the path-planning method was able to treat various obstacles and road structures sharply in a CarSim simulator. The analysis showed that the method reduced computational cost by estimating a convex function during path planning. A similar method was proposed by Wahid et al., who used an artificial potential field with an adaptive multispeed scheduler for a collision-avoidance motion planning strategy [171].
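To make the artificial potential field idea concrete, the sketch below performs one planning step in the classic formulation: an attractive force toward the goal plus a repulsive force from each obstacle inside an influence radius. This is the textbook scheme, not the constrained optimal-control planner of Gu et al. or the multispeed scheduler of Wahid et al.; the gains `k_att` and `k_rep`, the `influence` radius and the `step` length are hypothetical values chosen for illustration.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def apf_step(pos: Point, goal: Point, obstacles: Iterable[Point],
             k_att: float = 1.0, k_rep: float = 100.0,
             influence: float = 5.0, step: float = 0.1) -> Point:
    """Move `step` metres along the combined potential-field force."""
    # Attractive force: proportional to the vector from position to goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])

    # Repulsive force: active only within `influence` of an obstacle,
    # growing rapidly as the distance d shrinks.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            magnitude = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += magnitude * dx / d
            fy += magnitude * dy / d

    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # forces cancel (local minimum) or goal reached
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

Calling `apf_step` repeatedly yields a path that bends away from obstacles. The zero-force branch marks the well-known local-minimum weakness of plain potential fields.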
Cai et al. demonstrated a novel uncertainty-aware, vision-based trajectory generation network that combined CNN, LSTM and a state model for AVS path planning in urban traffic scenes [172]. The work was divided into two major parts: the first was a CNN bottleneck extractor, and the second included a self-attention module for calculating recurrent history and an LSTM module for processing spatiotemporal characteristics. Taking the image stream and state information of the past 1.5 s as input, the network generated a probable collision-free path with speeds and lateral or longitudinal locations for the next 3.0 s. The method obtained a more centralized error distribution and a lower median error.
For safe navigation of AVS in road scenarios with obstacles, a model predictive control-based advanced dynamic window (ADW) method was introduced by Kiss et al. [173]. The method was demonstrated with a differential drive that reached the destination location regardless of the desired orientation, and it did not require any weighted objective function.
A motion planning model based on the spatiotemporal LSTM network (SLN), which had three major structural components, was proposed by Bai et al. It was able to produce real-time feedback based on the extraction of spatial knowledge [174]. First, convolutional long short-term memory (Conv-LSTM) was applied to sequential image datasets to retrieve hidden attributes. Second, a 3D CNN was used to extract spatiotemporal information. Third, precise visual motion planning was achieved by constructing a control model for the AV steering angle with fully connected neural networks. The outcome showed almost 98.5% accuracy and more stable performance compared with Hotz’s method [147]. Nonetheless, the method was found to overfit on antecedent time-series data from previous steps, which increased computational cost and time.
Another obstacle-avoidance motion planning approach was proposed in a simulation environment [175]. The motion planning method was able to infer and replicate human-like control reasoning in ambiguous circumstances, although it was difficult to establish a rule base to tackle unstructured conditions. The approach was able to execute a 45.6 m planned path in 50.2 s.
In conclusion, very few works have adopted perception-based path and motion planning for AVS, but the existing research adopting deep inverse reinforcement learning and MaxEnt IRL, the deep Q-network time-to-go method, Dijkstra and the timed elastic band method, DNN, advanced RRT, artificial potential field, ADW using model predictive control and fuzzy logic made a remarkable contribution, with high accuracy, collision-free path planning, a 21% safety enhancement from adding a hazard-avoiding method and motion planning in a multilane turn-based intersection. Nevertheless, these methods were either theoretical or not practically implemented, and some of the high-performing approaches were not tested in a real-life environment with heavy traffic. An overview of the deep learning methods selected for analysis to improve AVS is presented in Table 12.
  3.5. AR-HUD
Augmented reality (AR) in a head-up display (HUD) or on the windshield serves as the medium for the final visualization of the outcomes of deep learning approaches, overlaid within an autonomous driving system. The AR-based vehicular display system is essential for driving situation awareness, navigation and overall deployment as a user interface.
Yoon et al. demonstrated an improved forward collision alert system that detected cars and pedestrians using stereo cameras, fused the detections into the HUD with augmented reality and visualized early alerts. An SVM classifier was applied for object recognition and obtained an F1 score of 86.75% for car identification and 84.17% for pedestrian identification [176]. A limitation was observed when the detected object moved rapidly or the car turned suddenly, in which case the visualization was delayed. The proposed system still needed to improve its efficiency and speed so that it responds robustly under diverse vehicle conditions and at different and high speeds.
Another study analyzed personal navigation with an AR navigation aid designed for a volumetric 3D-HUD and utilizing its parameters. An interface was developed to help drivers turn sooner by locating turn points more quickly than with regular navigation [177]. The interface also helped users keep their eyes fixed more precisely on the driving environment after traffic scenes were analyzed with a deep learning algorithm, with proper registration of applications via the spatial orientation of AR views on the interface. The results, however, made evident that depth perception at a specified distance is inadequate on a 2D HUD and that the navigation system’s AR interface was ineffective without a 3D HUD.
An automatic AR registration method based on road tracking information was introduced by Yoon et al., using a SIFT matching function and a homography measurement method to define the matching between the camera and the HUD, provided the driver’s view was directed to the front. The method detected vehicles and pedestrians and converted them into AR content after a projective transformation [178]. This solution performed well in the daytime but had limitations at nighttime. The procedure was able to automate the matching without user intervention, but projecting the outcomes was inconvenient because of errors caused by misreading local correspondences.
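The projective transformation step can be sketched as follows: once a 3×3 homography between the camera image and the HUD plane is known, each detected point is mapped by a matrix product followed by division by the homogeneous coordinate. The matrix used in the usage note is a made-up example, and the function names are hypothetical; in the cited work the homography was estimated from SIFT correspondences, which this sketch does not reproduce.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(H: Matrix3, point: Point) -> Point:
    """Map a camera-image point to HUD coordinates with homography H."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to 2D.
    return (u / w, v / w)


def map_box_to_hud(H: Matrix3, box: Tuple[float, float, float, float]) -> List[Point]:
    """Project the four corners of a detection box (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    return [apply_homography(H, corner) for corner in corners]
```

For instance, with the illustrative matrix `[[0.5, 0, 10], [0, 0.5, 20], [0, 0, 1]]`, the camera point `(100, 200)` maps to `(60.0, 120.0)` on the HUD. Projecting all four corners rather than only the box centre preserves the perspective distortion of the overlay.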
Park et al. demonstrated AR-HUD-based driving safety instruction by identifying vehicles and pedestrians using the INRIA dataset [179]. The identification method was built using SVM and HOG, with reported accuracies of 72% and 74%, and it detected partial obstacles by applying a billboard sweep stereo (BSS) algorithm. The detected vehicles and pedestrians were overlaid on the HUD with the AR technique. Although it detected obstacles in sunny and rainy scenarios, it was not deployed for nighttime scenarios.
To integrate outcomes with AR, Rao et al. divided the system into two parts, 3D object detection and 3D surface reconstruction, and developed object-level 3D reconstruction using a Gaussian Process Latent Variable Model (GPLVM) with SegNet and VPNet for an in-vehicle augmented reality UI and parking system [180]. Their AR-based visualization system was built with monocular 3D shaping, which was a very cost-efficient model and needed only a single frame in the input layer.
Furthermore, a new AR-based traffic sign recognition framework was constructed by Abdi and Meddeb to overlay traffic signs with more recognizable icons in an AR-HUD, increasing visibility for the driver with the aim of improving safety [181]. A Haar cascade detector and hypothesis verification using a bag of visual words (BoVW) were combined with the relative spatial data between visual words, which proved to be a reasonable balance between resource efficiency and overall results. A classifier with an ROI and an allocated 3D traffic sign was subsequently developed using a linear support vector machine that required less training and computation time. During the decision-making process, this methodology influenced the distribution of visual attention and could be made more consistent with an improved deep learning recognition approach relying on the GPU.
Addressing the challenge of overtaking a slow on-road vehicle, a marker-less real-time driving system based on a see-through effect was demonstrated by Rameau et al., applying AR [182]. To overcome the occlusion and produce a seamless see-through effect, a 3D map of the surroundings was created using an upper-mounted camera and an in-vehicle pose predictor system. They presented a fast, novel real-time 2D–3D tracking strategy, running at up to 15 FPS, for localizing the rear car in the 3D map. To decrease bandwidth usage, the ROI was switched to the rear car affected by an occlusion conflict. This tracking method on AR-HUD showed great efficiency and could be easily adopted in vehicle display systems.
To reduce the number of accidents, Abdi et al. proposed an augmented reality-based head-up display that provided more essential surrounding traffic data and increased interactions between drivers and vehicles to enhance drivers’ focus on the road [183]. A custom deep CNN architecture was implemented to identify obstacles, and the final outputs were projected in the AR head-up display. For AR-based projection in the HUD, pose prediction of the targeted ROIs was first carried out, and 3D coordinates were obtained from points after the camera projection matrix was computed, in order to achieve AR 3D registration. This step produced a 6-DOF pose consisting of translation and rotation parameters, which is helpful for motion estimation with a planar homography. Afterwards, the RANSAC method was applied to compute the homography matrix, and the real camera was synchronized in OpenGL with a virtual camera whose projection matrix mapped 2D points using 3D surface points, resulting in a marker-less approach.
Lindemann et al. demonstrated an augmented reality-based windshield display system for autonomous vehicles with a view to assisting driving situation awareness in city areas and raising the automated driving level from level 4 to 5 [184]. This AR-based windshield display UI was built on deep learning-based object detection to enhance situation awareness in both clear and low-visibility conditions. In low-visibility conditions, they obtained very different situation awareness scores with the windshield display disabled but failed to obtain a good score when the windshield UI was enabled. Nevertheless, it worked significantly better in clear weather conditions.
Park et al. presented a vehicle tracking system for AR-HUD that followed the tracking-learning-detection (TLD) scheme, combining a 2D histogram of oriented gradients (HOG) tracker with an online support vector machine (SVM) re-detector and using an equi-height mosaicking image (EHMI) [185]. The system initially performed tracking on the pre-computed 2D HOG EHMI when the vehicle had been identified in the last frame. If the tracking failed, the system started re-detection using online learning-based SVM classification. The tracking system conducted online learning frequently after the vehicle had been registered and minimized the additional calculation necessary for tracking, as the HOG descriptor for the EHMI had already been computed in the detection phase. Because it adopted online learning, the technique was well suited to deployment in scenes with varied lighting and occlusion. If the algorithm is refined for optimized hardware or embedded devices and extended to identify other dangerous obstacles in road scenes effectively, this lightweight architecture could be a more acceptable approach for faster tracking and visualization in HUD.
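The track-first, re-detect-on-failure control flow described above can be sketched generically. The callables `track` and `redetect` are hypothetical stand-ins for the HOG tracker and the online SVM re-detector; neither their signatures nor the result labels come from Park et al.

```python
from typing import Callable, Iterable, List, Optional, Tuple, TypeVar

Frame = TypeVar("Frame")
Box = Tuple[int, int, int, int]  # (x, y, width, height)


def track_with_redetection(
    frames: Iterable[Frame],
    initial_box: Optional[Box],
    track: Callable[[Frame, Box], Optional[Box]],
    redetect: Callable[[Frame], Optional[Box]],
) -> List[Tuple[str, Optional[Box]]]:
    """Track frame by frame, falling back to re-detection when tracking fails.

    `track` returns the updated box or None on failure; `redetect`
    searches the whole frame and returns a box or None.
    """
    box = initial_box
    results: List[Tuple[str, Optional[Box]]] = []
    for frame in frames:
        # Cheap path: follow the target from its last known position.
        new_box = track(frame, box) if box is not None else None
        source = "tracked"
        if new_box is None:
            # Expensive path: run the detector only when tracking is lost.
            new_box = redetect(frame)
            source = "redetected" if new_box is not None else "lost"
        box = new_box
        results.append((source, box))
    return results
```

Keeping the detector off the common path is what makes this arrangement lightweight, since the costly classifier runs only on frames where the tracker reports a failure.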
To represent driving situation awareness data, Park et al. introduced a vehicle augmented-reality system that reduces drivers’ distractions with an AR-based windshield on the Genesis DH model from Hyundai Motors [186]. The system presented driving conditions and warned the driver using a head-up display via augmented reality. The system included a range of sub-modules, including vehicle and pedestrian recognition based on the deep learning model of [179], vehicle state data, driving data, time to collision (TTC), hazard evaluation, alert policy and display modules. During most experiments, the threat levels and the augmented content to be displayed were determined on the basis of TTC values and driver priority.
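A minimal sketch of how a TTC value can be turned into a threat level is given below. TTC is computed in the usual way as the gap divided by the closing speed; the thresholds `warn_s` and `critical_s` and the level names are hypothetical, since the alert policy of Park et al. is not specified here.

```python
import math


def time_to_collision(gap_m: float, closing_speed_mps: float) -> float:
    """Seconds until contact; infinite if the gap is not shrinking."""
    if closing_speed_mps <= 0:
        return math.inf
    return gap_m / closing_speed_mps


def threat_level(ttc_s: float, warn_s: float = 4.0, critical_s: float = 2.0) -> str:
    """Map a TTC value to a coarse alert level (thresholds are illustrative)."""
    if ttc_s <= critical_s:
        return "critical"
    if ttc_s <= warn_s:
        return "warning"
    return "none"
```

For example, a 30 m gap closing at 10 m/s gives a TTC of 3.0 s, which falls in the warning band under these assumed thresholds. A display module could use the level to decide which AR overlay to show.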
In this section, combinations of deep learning algorithms and their outcomes were reviewed in the context of the final task of AVS, namely visualizing them in an AR-based HUD for better driving assistance. AR-HUD was adopted because it supports visualization on the front display for early warning, navigation, object marking by overlay, safety assurance and better tracking. Although these studies reported successful demonstrations, some major limitations were identified in the analysis, such as visual delay during sudden turns or with rapidly moving objects, misreading of local correspondences, high computational cost of 3D shaping, visualization challenges in extreme contrast and distraction caused by complex UI. Table 13 provides a summary of the section.