1. Introduction
Developments in mobility technology arise from the need to achieve greater independence and safety in ambiguous or challenging situations. The latest revolution in transportation is fully autonomous vehicles (AVs), which are equipped with advanced sensor technology, smart controllers, and intelligent actuators with the aim of providing a hassle-free driving experience for passengers and road users. Furthermore, complex algorithms and communication technologies integrated into autonomous driving systems, such as path planners, path trackers, and object detectors, help vehicles plan, analyze, and make appropriate decisions in their environments.
Nevertheless, uncertainty greatly hinders the adoption of AVs in real life. The complexity and natural behavior of the related built-in technologies lead to large errors and ambiguities in all vehicle subsystems. For example, this can be observed under adverse weather conditions: visual sensors cannot collect sufficient valuable information in rain or snow. The subsystems in AVs completely depend on measurements obtained by onboard sensors, including global navigation satellite systems (GNSSs), inertial measurement units (IMUs), cameras, and light detection and ranging (LiDAR) sensors, which are highly susceptible to varying environmental conditions as well as road features and geometries. Noisy data can lead to unsafe behavior and potential traffic accidents. Therefore, to allow vehicles to operate safely in the real world and reduce the ambiguity of sensor measurements, the associated errors require careful handling. Several attempts have been made to address this issue. For instance, some authors (e.g., [
1]) have fused multiple sensor readings to enhance the performance of the navigation system in uncertain situations. Other authors (see [
2,
3]) tackled the uncertainty issue using path planning approaches. However, the aleatoric uncertainty, which is related to the quality of the sensors and their measurement accuracy, is strongly correlated with the environmental conditions, and this correlation has been overlooked by the previous works. A thorough literature review of uncertainty-handling approaches is given in
Section 2.
With attention to residual correction, the Kalman filter is widely utilized as a state-based model and is highly appropriate for modeling the major random processes affecting AVs. In particular, GNSS readings are often far off the actual values. Through the incorporation of other sensor readings (i.e., sensor fusion), such as IMU signals, the Kalman filter is capable of correcting these innovations. One major limitation of this approach, however, is that such a state estimation procedure fails to predict the source of the uncertainty; consequently, the vehicle may not be warned about challenging conditions such that it can act to avoid a crash. Suitable analysis of the driving environment might yield valuable contextual information about the degraded sensor performance. Such information could potentially allow the sensor uncertainty to be modeled using machine learning.
Machine learning models endow AVs with intelligent functions. These models enable AVs to collect massive amounts of data from their environments using sensors, analyze those data, and ultimately make correct decisions accordingly. These models can learn to perform tasks as efficiently as humans. In such a way, machine learning can replace traditional techniques, making it competent for a wide range of AV functions such as object detection, classification, and segmentation. One powerful type of model, called a Bayesian neural network, is a hybrid of a deep neural network (DNN) and a probabilistic model, combining the flexibility of DNNs with the ability to estimate the uncertainty of its predictions.
The aim of the research presented in this paper is to understand how the uncertainty that arises under challenging conditions affects AV models and data, that is, how uncertainty from different sources influences model estimation. An end-to-end predictive approach for the sensor uncertainty of an AV is presented. A Kalman filter outputs the sensor uncertainty for each sensor on each road segment. These uncertainties are then correlated with the environmental and road features to predict sensor behaviors in future similar scenarios. This approach can be used to obtain an estimate of the sensor quality in any given space. The presented technique can be generalized and adapted to any driving subsystem to improve its quality. For instance, the resultant sensor uncertainty estimates can be incorporated into a path planner to avoid high-risk road segments.
In brief, the main contributions of this paper are as follows.
To handle sensor uncertainty in AVs, an end-to-end context-aware approach is proposed that takes advantage of both Kalman filtering and machine learning and allows for both epistemic and aleatoric uncertainties.
An evaluation of the proposed approach is performed using real data, and the results reveal that our approach can estimate the sensor uncertainty with a low margin of error.
The rest of this paper is organized as follows.
Section 2 reviews related works.
Section 3 introduces the proposed method.
Section 4 presents experimental results to illustrate and validate the theoretical findings. Finally,
Section 5 offers conclusions and perspectives on future work.
3. Method
The underlying aim of our approach is to enable cooperation between a Kalman filter and a machine learning model to estimate sensor uncertainty in a particular road segment. There are many possible Kalman filter variants and machine learning models other than those proposed in this paper. However, the results of our experiments demonstrate highly reliable outcomes overall when following the method detailed below. The proposed method includes three primary components: an ensemble-based Kalman filter, a sensor uncertainty estimator, and a BNN model.
Figure 1 presents an overview of the proposed method.
3.1. Ensemble-Based Kalman Filter
The Kalman filter is a powerful and practical estimator that is adopted in many AV systems, such as controllers, localization and mapping systems, and object trackers. The basic Kalman filter can generate estimates only for models described in terms of linear relationships. However, the sensor data collected by AVs are nonlinear; therefore, for such a filter to be used, the corresponding models must be decomposed into linear components. To assimilate these complex data, this work employs the EKF, which associates states with a measurement or observation space using nonlinear functions. The EKF consists of two stages: prediction and correction. The filter is first initialized with a state
with error covariance
. In the prediction stage, the state and the error covariance are projected forward as expressed by the following equations:
where
is a nonlinear model of the state transition dynamics,
u is a control variable vector,
Q is the process noise covariance matrix, and
F is an approximation of
obtained using the first derivative of the Taylor series called the Jacobian matrix, which is defined as
In the correction stage, the filter computes the optimal estimate by updating the initial estimate with a new sensor measurement
z using the relative weight or Kalman gain
K and also updates the error covariance.
where
and
H are functions that associate the vehicle’s state with the available observation space and its Jacobian, respectively, and
R is the sensor noise covariance matrix or measurement error.
In this paper, the Kalman filter is utilized to perform the localization task using only one target sensor and another supporting sensor (i.e., an IMU). The purpose of the EKF is not to localize the vehicle as in standard applications; rather, it is to detect and quantify the variation between the optimal estimated position and the observed position. IMUs as supporting sensors are less susceptible to errors, and the relevant errors can be corrected through filtering. The filtering error for each road segment is stored for use in future uncertainty prediction. Given an estimate of the current position, velocity, and acceleration of the vehicle, the future vehicle motion can be predicted using the laws of motion (or a Newtonian prediction model). Thus, the state vector to be estimated for each movement is defined as:
where
and
are the coordinates observed by sensor
m,
is the angular speed, and
s is the speed at which the vehicle moves. The angular speed and the true speed can be calculated via the following equations:
where
and
are the first derivatives (i.e., velocity) of the vehicle’s movement in the
x and
y coordinates and
and
are the second derivatives (i.e., acceleration). Given the state vector function
, the EKF linearizes the state transition matrix
H by evaluating the Jacobian (Equation (
10)) at the predicted state vector.
Another adjustment worth mentioning is that in the prediction stage, the process noise covariance matrix,
Q, incorporates the third derivatives (i.e., jerk) of the vehicle’s
x and
y positions (see Equation (
11)) as Gaussian random variables with zero mean and variances of
and
. These variances are typically unknown and thus are estimated from the historical data, as shown in Equation (
12). Considering the vehicle’s jerk in
Q enables the correction of errors caused by rapid acceleration changes, which are inevitable in AVs.
The sensor measurement vector, z, is designed to incorporate the noisy readings (x and y coordinates) that require filtering as measured by the target sensor. Reliable sensor signals such as velocity and acceleration are injected into the filtering process in the form of a control vector. The EKF estimates the optimal positions of the vehicle. These positions are then utilized to infer the sensor uncertainty in the current driving environment.
An ensemble-based filtering approach is adopted in this work for each sensor in the AV to enable the filter to converge quickly. Instead of a single EKF, we utilize several EKFs, each of which is initialized with different random values of the EKF parameters. These EKFs receive the same measurements, and their outcomes are aggregated before proceeding to the next step (i.e., uncertainty estimation).
3.2. Sensor Uncertainty Estimation
The objective is to estimate the sensor uncertainty, which quantifies the expected accuracy when using the target sensor in a specific situation. Therefore, the numerical uncertainty value is an estimate of the error. That is, the uncertainty indicates the quality of the sensor measurements, but it is not a guarantee of accuracy. The focus in this paper is to estimate the error as influenced or controlled by certain challenging conditions (i.e., random errors), which is complex enough that it cannot be filtered using the EKF.
One way to measure modeling errors in Gaussian systems is to use the Mahalanobis distances of the measurement residuals. In some of the literature, this approach is used to improve the filtering outcomes and remove outliers [
47,
48]. The Mahalanobis distance is adopted in this paper for sensor uncertainty estimation. The measurement residual of the EKF is defined as:
For the nonlinear Gaussian system presented in (
1) and (
2), the residuals should follow an exact distribution, with the projection of the process uncertainty into the observation space given by
Thus, the Mahalanobis distances
of the measurement residuals should follow the chi-square distribution with the same number of degrees of freedom as the measurement.
Using this approach, we can determine how many standard deviations the current measurement is away from the optimal estimated state. These estimates of sensor uncertainty at every time step are maintained in the uncertainty pool to retrain the predictive model. The uncertainty values vary among different road segments due to changes in the environmental conditions. Therefore, these uncertainty estimates are aggregated by their associated road segments.
3.3. Bayesian Neural Networks
Incorporating a probabilistic approach into deep learning enables the elimination of uncertainty. To achieve the uncertainty elimination, such models assign lower confidence levels to inaccurate predictions. BNNs are utilized in this research to capture the epistemic uncertainty, which is the uncertainty related to the model fitness due to the availability of insufficient data from which to learn effectively. BNNs differ from standard neural networks in that BNNs learn probability distributions over the network weights to encode the uncertainty of the weights, whereas standard neural networks learn certain weight and bias values (see
Figure 2). Before observing any data, a
prior probability distribution is defined. Once data are observed, the model learns the parameters and transforms the distribution into a
posterior distribution. In this work, the
prior and
posterior distributions are Gaussian in nature since the BNN output
y is the expected sensor error indicating the amount of sensor uncertainty, which is a continuous variable. The features considered in the BNN model include road and environmental attributes such as the number of lanes and the weather conditions.
Ideally, high uncertainty in a BNN indicates samples that are far away from the evaluated data, while the uncertainty of close samples is low. Given a training dataset
, the optimal solution for estimating the
posterior distribution of the network parameters is to use full Bayesian inference or Bayes’ theorem, expressed as
where
w denotes the network parameters;
and
are the posterior and prior parameter distributions, respectively;
is the likelihood function; and
is the probability of the training dataset occurring.
A prediction of a new sample,
, can be computed by taking the expected value of the optimized posterior distribution,
, as follows:
This expectation serves as the average prediction of an infinite number of neural networks, weighted by their posterior probabilities. Although this prediction is resistant to noise, an analytical solution for both the posterior distribution and the prediction is computationally intractable. Hence, we resort to approximation. In this work, a deterministic method, i.e., variational inference, is employed. The basic concept of the variational technique is to formulate the inference task as an optimization problem. This optimization problem is tackled by approximating the actual posterior distribution,
, by an instance of tractable distributions,
; i.e., a
that closely resembles
is determined. Because the variational posterior is Gaussian, it is parameterized by
with a mean
and a variance
. Instead of querying
, the BNN queries
to obtain an approximate solution, which is formally accomplished by minimizing the Kullback–Leibler (KL) divergence between the two probability distributions
and
. The related optimization objective is called the
variational free energy and is expressed as a combination of
complexity and
error losses:
where the first term is the
KL divergence between
and
, and the second term is the expectation of the likelihood function under the variational distribution
. The complexity cost determines the network cost, whereas the error cost determines the prediction accuracy
Rearranging the terms in Equation (
18) allows the optimization objective function to be written as
Hence, samples
drawn from
can approximate the objective function as follows:
Each iteration of the training process involves a forward propagation phase and a backward propagation phase. In the forward propagation phase, a random sample is drawn from the variational distribution and propagated through the network until it reaches the final layer. Then, the approximate objective function (Equation (
20)) can be evaluated. The first two terms of Equation (
20) can be evaluated in each layer since they are independent of the data, while the third term can be evaluated only at the forward-propagation end due to its data dependence. In the backward propagation phase, the gradients of the parameters of the variational posterior distribution are computed, and their values are adjusted accordingly.
The BNN model produces different estimates for identical inputs during each run as new weights are sampled from the distributions each time to build the network and generate an output. As the model becomes more certain about its weights, the outputs produced for the same inputs reveal less variability. This epistemic uncertainty is reduced when AVs are exposed to increasingly many different scenarios. Moreover, the output of standard BNN models are a point estimate for a given input. In our proposed approach, the output is modeled as normal distribution with learnable parameters. In this case, the BNN model also incorporates the aleatoric uncertainty caused by irreducible noise in sensor data. Due to this merit, the model can be trained with the negative log-likelihood function as the loss function.
3.4. Pretraining Process
The predictive model starts with equal probabilities for the uncertainty in all road segments, resulting in equal predictions. To improve the initial results, our approach assumes that some data samples exist for pretraining purposes. In this stage, the BNN model is fed with estimates of sensor uncertainty as well as the associated road features and environmental conditions. The resultant model can be utilized as an initial pretrained model and refined as the approach continues to operate.
4. Evaluation and Discussion
Extensive experiments were conducted on a real dataset to evaluate the performance of the proposed approach. Python 3.9 was utilized for implementation. All experiments were carried out on a system running macOS 11.5 Big Sur with a 2.9 GHz dual-core Intel Core i5 processor and 8 GB of memory. The implementation of the Kalman filter was developed from scratch, whereas for the BNN implementation, Keras [
49] and TensorFlow Probability [
50] were used. We evaluated the uncertainty existing in the data and generated by the GNSS.
Our evaluation included two data sources. The first data source was the Ford AV dataset [
51], which logs 18 trajectories captured by a fleet of Ford AVs under different environmental and driving conditions, including different weather, traffic, lighting, and construction conditions. With these diverse conditions, these data encompass different sources of uncertainty and thus are perfectly aligned with the objective of our proposed approach.
Table 1 summarizes the major features of the routes in the Ford AV dataset. The data were collected while the vehicles were being driven manually in Michigan.
The recorded coordinates were matched to the real-world road network in Open Street Map [
52]. Accordingly, the attributes associated with the matched roads could be obtained and processed. To solve the map matching problem, a hidden Markov model approach was exploited [
53]. Given the GNSS observations, each observation was matched to the most likely candidate road segment. An open-source routing software called Valhalla was utilized for map matching [
54]. The GNSS traces were matched based on time, acceleration, and directional angles to achieve the best matching results.
Since our focus was on predicting GNSS uncertainty, we selected four features (layer, geometry, speed limit, lane count) that could capture degradation in GNSS performance and fed them into the predictive model. The layer feature indicates the existence of occlusions preventing signals from reaching the GNSS receiver. The layer feature could have a positive or negative number, where the former indicates bridges, and the latter indicates tunnels and under-bridge road segments; a layer value of zero indicates normal road segments. The geometric complexity feature indicates the number of points representing a particular road segment (also known as resolution). For instance, the geometric complexity feature for a simple straight road segment (a line) has only two points. The posted speed limit feature indicates the road type, such as a highway or a residential road. The lane count feature indicates the road width. On a large road segment, which typically may have multiple-lanes, there is a lower chance of signals being blocked by trees or buildings.
The architecture of the BNN consisted of four layers: an input layer, two hidden layers, and an output layer (see
Figure 2). The input layer included four nodes representing the features mentioned earlier. The first and second hidden layers consisted of six and nine nodes, respectively. The output layer included only one node representing the sensor uncertainty. The
sigmoid activation function was applied for each node in the BNN hidden layers. The optimizer utilized to train the model was the RMSprop algorithm. The BNN was designed to produce a normal distribution so that we could calculate how probable was it that the actual data would be seen in the model’s predicted distribution. Therefore, the model was trained with the negative log-likelihood as the loss function. We also evaluated the accuracy in terms of the root mean square error (RMSE) and the mean absolute error (MAE).
The EKF received sensory and derived measurements, including position, velocity, acceleration, jerk, speed, and angular speed. An instance of sensory measurements fed into the EKF is presented in
Figure 3. Once the EKF had processed these measurements, the sensor uncertainty estimates were stored in the uncertainty pool. The stored estimates were aggregated by their associated road segments, and the average was computed. The resultant data consisted of 971 road segments for analysis. For evaluation purposes, we randomly selected 13 logs for training and 5 logs for testing. The selected test logs were Aug V2—Log 2, Aug V3—Log 1, Aug V3—Log 2, Aug V3—Log 6, and Oct V2—Log 3. The training set was shuffled and split into training and evaluation sets, corresponding to 82% and 18% of the samples, respectively. The entire approach was evaluated with earlier configurations on 13 out of the 18 logs. When testing the BNN outcomes, we predicted the GNSS uncertainties produced for the five different test logs and then compared those results with those estimated by the EKF. The findings revealed promising sensor uncertainty estimation performance.
Figure 4a depicts considerable differences between the GNSS uncertainties on different routes. These discrepancies are anticipated due to varying environmental conditions. However, routes with the same features demonstrate similar trends. For instance, Vehicles 2 and 3 exhibit similar trends on their August routes. Nevertheless, at the micro level, these identical routes have different GNSS uncertainties on individual road segments. In other words, the uncertainties differ within the same road segments.
Figure 4b illustrates the maximum GNSS uncertainties estimated for each route. It is apparent that in this case, the trends between Vehicles 2 and 3 disappear, and the differences between the maximum values can be large, such as the difference between the first logs in August for Vehicles 2 and 3, although they have identical route attributes but not environmental attributes. The fact that these vehicles run under different conditions is confirmed by examining the standard deviations of the GNSS uncertainties for the associated routes in
Figure 4a.
Figure 5 shows how the negative log-likelihood loss changes over the epochs along with the RMSE values for both the training and evaluation datasets. As can be observed, the learning curve converges very quickly during training, after approximately 1800 epochs, while the validation set average loss exhibits fluctuations across the finally epochs. Since the proposed approach is designed to handle thousands of samples, this rapid convergence could be related to the relatively low number of samples used for training. Nevertheless, the difference between the prediction accuracies on the training and evaluation sets is not significant, with training RMSE = 0.15 and evaluation RMSE = 0.155. Moreover, the training losses are higher during some epochs because relatively few samples were used for validation. An evaluation of the trained model on unseen data demonstrates an accuracy very close to the training accuracy (i.e., RMSE = 0.157), which implies that the model is not overfitted.
Each time being called the model produces different outputs because the weights are sampled from the given distributions.
Figure 6a shows an instance of BNN prediction results for the test datasets. The grey vertical bars depict the 95% confidence intervals of the associated predictions. The less certain the model predictions are, the wider the intervals are seen in the output. It can be seen in
Figure 6a that very few points are not predicted accurately. We ascribe mispredictions to the lack of similar examples in the training dataset. For these particular predictions, the RMSE and MAE are 0.119 and 0.087, respectively, (
Figure 6b). It is apparent from
Figure 6b that most of GNSS uncertainties are predicted with errors of less than 0.1 meters.
If provided with a model with a high level of confidence in its estimates of sensor uncertainty, the decision-making and planning for AVs could potentially be improved. Armed with the knowledge of the uncertainty of each sensor on every road segment, an AV could avoid or better maneuver through road segments with high uncertainty. Moreover, the AV might be able to set sensor preferences or prioritize road segments based on their high associated sensing accuracy. In this way, better decisions regarding challenging situations could certainly lead to safer navigation.
5. Conclusions and Future Research
This paper presents an end-to-end approach for exploiting the EKF and a BNN model with contextual information to predict the sensor uncertainties arising in challenging situations. The approach incorporates the EKF to measure the error encountered in the current situation. Using this error, the sensor uncertainty is estimated and associated with the context or environmental information for use in predicting future uncertainties. The proposed approach considers epistemic uncertainties, which are related to the lack of knowledge, and aleatoric uncertainties, which are related to stochastic nature of the data acquisition process. The results show that our learning approach performs very well in predicting GNSS uncertainties in real data. This approach has the potential to improve navigation safety for passengers and road users. With advance knowledge of possible sensor failures, AVs can better maneuver through challenging road segments that present risky conditions. Applications of our proposed approach are not limited to path planning; rather, all AV modules, such as localization, lane-keeping assistance, blind-spot detection, and object detection, can benefit from sensor uncertainty estimates. For instance, upon determination that a navigation-related sensor provides estimates with high uncertainty, the localization module can disregard its readings, avoiding localization with large errors.
However, given that neural networks are data intensive, through sharing data among AVs, the proposed approach can alleviate possible uncertain situations. Since sensors are fabricated by varying vendors and have different configurations, future research should consider modeling features of sensor configurations and qualities along with the environmental features. Moreover, a new decision-making module should be developed to analyze the sensor uncertainty estimates and support AVs in making the proper decision for each task.