A Bayesian Surprise Approach in Designing Cognitive Radar for Autonomous Driving

This article proposes the Bayesian surprise as the main methodology that drives the cognitive radar to estimate a target’s future state (i.e., velocity, distance) from noisy measurements and execute a decision to minimize the estimation error over time. The research aims to demonstrate whether the cognitive radar as an autonomous system can modify its internal model (i.e., waveform parameters) to gain consecutive informative measurements based on the Bayesian surprise. By assuming that the radar measurements are constructed from linear Gaussian state-space models, the paper applies Kalman filtering to perform state estimation for a simple vehicle-following scenario. According to the filter’s estimate, the sensor measures the contribution of prospective waveforms—which are available from the sensor profile library—to state estimation and selects the one that maximizes the expectation of Bayesian surprise. Numerous experiments examine the estimation performance of the proposed cognitive radar for single-target tracking in practical highway and urban driving environments. The robustness of the proposed method is compared to the state-of-the-art for various error measures. Results indicate that the Bayesian surprise outperforms its competitors with respect to the mean square relative error when one-step and multiple-step planning is considered.


Introduction
Despite a precipitous drop in driving during the pandemic, the Governors Highway Safety Association (GHSA) of the United States reported that 2020 had the most significant annual increase in pedestrian deaths [1]. This shocking report indicated that the fatality rate for pedestrians spiked by 21% compared to the previous year. Big technology companies have invested in making autonomous radar an integral part of safety systems to prevent such accidents [2]. Current driver assistance technologies use a combination of sensors (e.g., radar, LiDAR (light detection and ranging), camera, GPS, etc.) and software to identify certain safety risks that help the driver to avoid accidents [3,4]. Compared to video cameras and LiDAR, a radar sensor is unaffected by bad weather and light conditions, and it can also detect hidden targets behind other vehicles [5,6]. Undeniably, a well-designed radar system that further advances safety benefits is indispensable for the evolution of automotive technology.
Cognitive radar, first introduced by S. Haykin [7], is an engineering tool to build intelligent tracking sensors which will eventually make autonomous driving a reality [8]. The model was inspired by the perception-action process that takes place in the brain [9]. The cognitive radar continuously interacts with its surroundings to gather information and adapts its operating parameters to ensure accurate target tracking without human control. The sensor selects a transmit waveform that anticipates a better estimate of the target's state (i.e., distance, velocity) based on the information provided by the received radar measurements. The challenge in designing cognitive radar arises from the uncertain environment [10]. The design objective is to achieve an estimate of the target's state that minimizes the mean squared error and to gain informative radar measurements in the presence of such disturbances. To this end, this paper focuses on quantifying new information from noisy radar measurements to improve the state estimation process over time.
In the perception and learning literature, surprising events which violate prior expectations encourage information-seeking behaviors that affect learning and decision making [11]. Surprise is an emotion resulting from a discrepancy between an expectation and an actual observation [12]. It measures the amount of information that is associated with an unexpected event [13]. Several definitions and expressions of surprise have been proposed in previous studies [14][15][16][17][18]. The most common forms of surprise are the Shannon surprise [14], the Bayesian surprise [15], and the free energy [16]. For a biological agent, the Shannon surprise measures the unlikeliness of an outcome [14]. Meanwhile, the Bayesian surprise measures how much an agent's expectation changes when a new observation is made [15]. In [16], the free energy principle suggests that biological agents make decisions by reducing the Shannon surprise, and adjust their (internal) models to make better predictions by minimizing the Bayesian surprise.
This research adopts surprise as the main methodology to measure the amount of new information within the received radar measurements. In particular, the paper considers the Bayesian surprise since it computes how much information a new measurement provides to estimate future states based on prior knowledge. In previous works, the Bayesian surprise has been applied to different models and applications to acquire information from data [13][19][20][21][22]. In [13,19], the Bayesian surprise measures attention and anticipates the human gaze to enhance computer vision applications. A similar attempt is followed in [20], where the Bayesian surprise detects anomalies for autonomous guided vehicles in an unsupervised fashion. In associative learning, the Bayesian surprise is also employed as an error-correction learning rule for the Rescorla-Wagner model [21]. A recent paper considers a Bayesian interpretation of surprise-based learning to perform model estimation [22]. It should be noted that selecting informative measurements using Bayesian surprise to improve state estimation can be viewed as active measurement selection for regression analysis. Similar ideas have been researched and implemented in the literature, and several previously shown results could be interpreted and re-introduced through the Bayesian surprise framework. The interested reader may wish to consult [23][24][25] for input and insight. Our theory is that the Bayesian surprise framework will help understand commonalities amongst methods, lead to interesting connections, and inspire future developments.
For a simple vehicle-following scenario, this article proposes a new design of cognitive radar that operates based on the Bayesian surprise. This research generates radar measurements from a family of linear Gaussian dynamic systems. Assuming that the parameters of the system are known, the Kalman filter [26] is applied as the optimal estimator in the mean squared error sense. Given the current estimate of the target's state, the sensor plans by measuring how much information each waveform (available from a predefined set) contributes to state estimation, and selects the one that conveys the maximum information based on the Bayesian surprise. In addition, the paper investigates the estimation algorithms for one-step and multiple-step planning. The proposed design assumes that the sensor is equipped with a predefined set of measurement noise covariances, where each one corresponds to a distinct waveform. Compared to other forms of surprise [14,16], the authors of this paper anticipate that the Bayesian surprise provides sufficient information to minimize the state estimation error.
Despite significant achievements in designing cognitive radar [9][27][28][29], there remains limited literature that compares radar designs based on the choice of information measure and its associated waveform-/measurement-selection procedure. For the first time, this paper systematically analyzes the works in [16,27,30] as alternative methods for designing cognitive radar. Except in [27], where the Shannon entropy is used to quantify the information within radar measurements, the free energy principle [16] and the influence matrix [30] have not been directly addressed to solve the state estimation problem in cognitive radar. In addition, this work re-introduces these methods in the context of linear Gaussian dynamic models and shows how they are related to the proposed approach.
Numerous experiments are carried out to comprehensively evaluate and compare the estimation performance of the proposed cognitive radar with the state-of-the-art. The millimeter-wave radar sensor presumes transmitting frequency-modulated continuous wave (FMCW) signals operating in the 77 GHz frequency band [31]. The paper designs the parameters of the FMCW radar in a manner that supports single-target tracking in highway and urban environments. The paper considers various error measures to examine different aspects of estimation performance. The credibility of the proposed estimation algorithm is ranked based on a pairwise comparison scheme [32]. Simulation results determine whether the tracking performance is improved when the radar switches from one-step planning to multiple-step planning.
The rest of this paper is organized as follows. Section 2 presents the model assumptions and defines the research objective in designing cognitive radar for a simple vehicle-following scenario. Section 3 briefly reviews prior works and demonstrates our proposed method to solve the state estimation problem in cognitive radar. Section 4 evaluates the estimation performance of the proposed approach by emulating real-life driving scenarios. Results are compared to alternative designs for different error measures. Finally, Section 5 concludes the paper.

Notation
In this paper, scalar variables are represented by non-bold lowercase letters (e.g., $c$), vectors are denoted by bold lowercase letters (e.g., $\mathbf{x}$), and matrices and sets of vectors are shown as bold uppercase letters (e.g., $\mathbf{F}$). In addition, $\mathrm{tr}\{\cdot\}$, $|\cdot|$, and $\|\cdot\|$ represent the trace operator (e.g., $\mathrm{tr}\{\mathbf{A}\}$), the determinant operator (e.g., $|\mathbf{A}|$), and the norm operator (e.g., $\|\mathbf{x}\|^2_{\mathbf{P}^{-1}} = \mathbf{x}^T \mathbf{P}^{-1} \mathbf{x}$), respectively. Moreover, $\{\cdot\}^T$ applies the transpose operation to matrices (e.g., $\mathbf{F}^T$).

Problem Formulation
Figure 1a illustrates a simple vehicle-following scenario, where a cognitive radar is mounted on the host vehicle, tracking the state dynamics of a target vehicle (i.e., distance, velocity, etc.). Let us consider that something unexpected occurs, and the dynamics of the target vehicle change. To avoid a collision, the cognitive radar must be able to detect these changes to adjust the dynamics of the host vehicle accordingly. The radar signal received at the host vehicle provides information about the target's state. According to this information, the cognitive radar makes a decision and sends a waveform (or signal) that can provide a better estimate of the target's state at future time instances.
Figure 1. (a) A simple vehicle-following scenario [28] and (b) the block diagram of the cognitive radar as an autonomous system.
The goal of the cognitive radar is to ensure reliable and accurate tracking of the target over time. To accomplish this objective, Figure 1b presents a simple design of a cognitive radar as an autonomous system. Inspired by the cognitive dynamic system [27], the model consists of a radar environment, a receiver, an information processor, and a transmitter. The target of interest (i.e., target vehicle) is embedded in the radar environment. It is assumed that the system is equipped with a sensor profile library containing several types of waveforms. Let us consider the stages that the cognitive radar undergoes for a single cycle. Note that the term "cycle" refers to the processes that take place at one time instant. An estimate of a target's state is determined at the receiver by processing measurements from the radar environment. The information processor measures how much information each waveform (available from the sensor profile library) contributes to estimating the target's state for the next cycle by planning multiple time steps. Finally, the transmitter selects the waveform that leads to informative radar measurements and provides an improved estimate of the target's state. The sensor applies the chosen waveform to the radar environment and repeats the same cycle.
This research proposes a holistic methodology to quantify information and maintain informative radar measurements that minimize the state estimation error over time. To this end, the following introduces the assumptions made to model the cognitive radar and formulates the design objectives of this research.

Model Assumptions
The following presents the model assumptions to construct radar measurements for the simple vehicle-following scenario and addresses the sensor profile library.

Linear Gaussian Dynamic System
For the simple driving case depicted in Figure 1a, the radar measurements at time index $k$, denoted as $\mathbf{z}_k \in \mathbb{R}^m$, are obtained from a set of linear Gaussian state-space models [26], expressed as follows:
$$\mathbf{x}_{k+1} = \mathbf{F}_k \mathbf{x}_k + \mathbf{w}_k, \qquad \mathbf{z}_k = \mathbf{H}_k \mathbf{x}_k + \mathbf{v}_k, \quad (1)$$
where the evolution of the state vector, denoted as $\mathbf{x}_k \in \mathbb{R}^n$, follows a first-order Markov chain process. In this problem, the state collects the motion quantities of the host and target vehicles: $v^0_{x,k}$ and $a^0_{x,k}$ are the velocity and acceleration of the host vehicle; $v^1_{x,k}$ and $a^1_{x,k}$ are the velocity and acceleration of the target vehicle; and $d_{x,k}$ represents the longitudinal distance between the two cars. In Equation (1), $\mathbf{F}_k \in \mathbb{R}^{n \times n}$ and $\mathbf{H}_k \in \mathbb{R}^{m \times n}$ are, respectively, the transition matrix and the measurement matrix. Meanwhile, the state noise $\mathbf{w}_k \in \mathbb{R}^n$ and measurement noise $\mathbf{v}_k \in \mathbb{R}^m$ are assumed to be additive zero-mean white Gaussian processes, $\mathbf{w}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}_k)$ and $\mathbf{v}_k \sim \mathcal{N}(\mathbf{0}, \mathbf{R}_k)$, where $\mathbf{Q}_k \in \mathbb{R}^{n \times n}$ and $\mathbf{R}_k \in \mathbb{R}^{m \times m}$ are the state noise covariance and the measurement noise covariance, respectively. Note that the initial state follows a Gaussian distribution, denoted as $\mathbf{x}_0 \sim \mathcal{N}(\hat{\mathbf{x}}(0|0), \mathbf{P}(0|0))$, and is mutually uncorrelated with the noise elements.
According to the equations of motion that presume constant acceleration, $\mathbf{F}_k$ and $\mathbf{Q}_k$ are derived as in [33], where $T_s$ and $\sigma_q^2$ refer to the sample time and the state noise variance, respectively. Since the dynamics of the target vehicle are of interest, the measurement matrix $\mathbf{H}_k$ is assigned so that the velocity of the target vehicle, $v^1_{x,k}$, and the longitudinal distance, $d_{x,k}$, are the available radar measurements. The choice of the measurement noise covariance depends on the waveform that the radar sensor conveys for target tracking. FMCW is the most well-known modulation format, where linear frequency ramps with different slopes are transmitted [6]. The FMCW modulation with a Gaussian-shaped pulse is commonly used in designing autonomous radars since it exhibits excellent range and velocity resolution. Thus, for a Gaussian-shaped pulse with FMCW modulation, $\mathbf{R}_k$ is defined as in [34] (Equation (6)), where $\lambda_{k-1}$, $b_{k-1}$, $c$, $f_c$, $B$, and $\eta$ are the pulse duration, the chirp rate, the speed of light, the carrier frequency, the signal bandwidth, and the received signal-to-noise ratio (SNR), respectively. As shown in Equation (6), the measurement noise covariance depends on the pulse duration and chirp rate at time index $k-1$. This indicates that the system's selection of the transmitted waveform (i.e., $\lambda_{k-1}$ and $b_{k-1}$) at the previous time cycle influences the radar measurements (i.e., $\mathbf{z}_k$) at the current cycle. Both $\lambda_{k-1}$ and $b_{k-1}$ are the design parameters that characterize the radar waveform based on the tracking application (e.g., single- or multiple-target tracking). Since the transmitter and the receiver of the radar sensor are both positioned on the host vehicle, the received SNR for the target vehicle located at distance $d = \sqrt{d_x^2 + d_y^2}$ may be obtained as in [34], where $d_y$ is the lateral distance and $d_0$ is the distance at which 0 dB SNR is achieved. Note that linear Gaussian dynamic systems suffice to model the motion dynamics when simple driving is assumed. However, modeling complex driving situations that consider multiple targets requires switching dynamic models that may not necessarily be expressed as in Equation (1).
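To make the measurement model concrete, the following minimal sketch draws a short trajectory from a linear Gaussian state-space model of the form $\mathbf{x}_{k+1} = \mathbf{F}\mathbf{x}_k + \mathbf{w}_k$, $\mathbf{z}_k = \mathbf{H}\mathbf{x}_k + \mathbf{v}_k$. It is only an illustration: the two-state toy model (distance and relative velocity), the numerical values, and the function name `simulate_lgss` are assumptions and do not reproduce the vehicle-following matrices or the FMCW covariance of the paper.

```python
import numpy as np

def simulate_lgss(F, H, Q, R, x0, steps, rng):
    """Sample states and measurements from x_{k+1} = F x_k + w_k, z_k = H x_k + v_k."""
    n, m = F.shape[0], H.shape[0]
    xs = np.zeros((steps, n))
    zs = np.zeros((steps, m))
    x = np.asarray(x0, dtype=float)
    for k in range(steps):
        xs[k] = x
        # Noisy measurement of the current state, v_k ~ N(0, R)
        zs[k] = H @ x + rng.multivariate_normal(np.zeros(m), R)
        # Propagate the state with process noise, w_k ~ N(0, Q)
        x = F @ x + rng.multivariate_normal(np.zeros(n), Q)
    return xs, zs

# Hypothetical toy values: state = [distance (m), relative velocity (m/s)]
Ts = 0.1
F = np.array([[1.0, Ts], [0.0, 1.0]])
H = np.eye(2)
Q = 1e-3 * np.eye(2)
R = np.diag([0.25, 0.04])
xs, zs = simulate_lgss(F, H, Q, R, x0=[30.0, -1.0], steps=50,
                       rng=np.random.default_rng(0))
```

Swapping `R` for another covariance changes only the measurement noise, which is the lever the cognitive radar is assumed to control through its waveform choice.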

Sensor Profile Library
For the model illustrated in Figure 1b, the cognitive radar is assumed to be equipped with a prescribed set of measurement noise covariances, referred to as the sensor profile library. According to Equation (6), the measurement noise covariance is computed based on the waveform parameters: pulse duration and chirp rate. The sensor profile library holds a large set of measurement noise covariances, denoted as $\mathcal{R}$. Since it is computationally expensive and time-consuming to go through the entire library at each time cycle to select the informative measurement (or optimum waveform), a localized set is adopted instead. As a solution, this paper considers a k-nearest neighbors (kNN) method to obtain the localized set $\mathcal{R}^L_k = \{\mathbf{R}^{(1)}, \mathbf{R}^{(2)}, \ldots, \mathbf{R}^{(N_L)}\} \subset \mathcal{R}$, which includes measurement noise covariances that are neighbors to $\mathbf{R}_k$. The work in [28] views this localization approach as a form of an attention mechanism, which is one of the basic principles of cognition in modeling intelligent radar sensors.
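The localization step can be sketched as a nearest-neighbor search over the library. The text does not state which distance the kNN method uses, so the Frobenius distance between covariance matrices is an assumption here, as are the function name `localized_set` and the toy library.

```python
import numpy as np

def localized_set(library, R_current, n_neighbors):
    """Return the n_neighbors covariances closest to R_current.

    Closeness is measured with the Frobenius norm (an assumed metric).
    """
    dists = [np.linalg.norm(R - R_current, ord="fro") for R in library]
    order = np.argsort(dists)[:n_neighbors]
    return [library[i] for i in order]

# Hypothetical library of 2x2 measurement noise covariances
library = [np.diag([s, 0.1 * s]) for s in (0.1, 0.2, 0.5, 1.0, 2.0, 5.0)]
R_k = np.diag([0.45, 0.045])
neighbors = localized_set(library, R_k, n_neighbors=3)
```

With these toy values the three nearest entries are the ones scaled by 0.5, 0.2, and 0.1, so the planner would evaluate three candidates instead of six.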

Research Objective
The main goal of the cognitive radar is to estimate the target's state from the uncertain radar measurements while maintaining low estimation error on a cycle-by-cycle basis. Given the model assumptions, this work aims to demonstrate how the cognitive radar can manipulate the waveform signal parameters to improve the target's state estimate for the next time instant. In this regard, we mathematically express the design objective in terms of a state estimation problem and propose three research questions on modeling the information processor and the measurement-selection mechanism.
Suppose the motion dynamics of the vehicle-following scenario are expressed by Equation (1); the state estimation problem becomes finding the estimated state, denoted as $\hat{\mathbf{x}}_k \sim p(\mathbf{x}_k|\mathbf{Z}_k)$, that minimizes the following objective function at each time step:
$$\arg\min_{\hat{\mathbf{x}}_k} \; E_{p(\mathbf{x}_k|\mathbf{Z}_k)}\left[\tilde{\mathbf{x}}_k^T \tilde{\mathbf{x}}_k\right], \quad (8)$$
where $\tilde{\mathbf{x}}_k = \mathbf{x}_k - \hat{\mathbf{x}}_k$ is the error between the true state and the estimated state, and $p(\mathbf{x}_k|\mathbf{Z}_k)$ is the probability density function (PDF) of the estimated target's state. Given that the radar measurements are available up to time $k$, $\mathbf{Z}_k = \{\mathbf{z}_i, i \le k\}$, Equation (8) estimates the target's state that minimizes the mean squared error. To accomplish this objective, this paper focuses on modeling the information processor and the measurement-selection technique by proposing the following research problems.
Research Problem 1. Let us assume that the parameters of the model in Equation (1) are known. Determine the amount of new information in radar measurements that contributes to estimating the target's state $\hat{\mathbf{x}}_k$.
The first research problem captures the essence of the information processor. Computing the information of the estimated target's state is crucial because the sensor determines the optimum waveform (or the measurement noise covariance) according to the information processor. Meanwhile, the following research problem deals with how the radar sensor can change to improve the estimate of the target's state.

Research Problem 2. Let us assume that the measurement noise covariance, $\mathbf{R}_k$, can change at any time cycle. Based on Research Problem 1, develop an optimal selection methodology to minimize estimation error and achieve informative measurements with respect to the measurement noise covariance.
This problem represents the measurement-selection procedure that executes a waveform, leading to a better estimate of the target's state. It solves an optimization problem that depends on the choice of information measure. Finally, we contemplate a general setting, which combines Research Problem 1 and Research Problem 2 in designing the cognitive radar.
Research Problem 3. Let us consider that a set of measurement noise covariances is available (i.e., $\mathcal{R}^L_k = \{\mathbf{R}^{(1)}, \mathbf{R}^{(2)}, \ldots, \mathbf{R}^{(N_L)}\}$). Derive the algorithm for the information processor and the measurement-selection procedure by looking one time step ahead. In addition, is it possible to extend the algorithm to acquire informative measurements by planning $L$ steps in advance?

Proposed Method
This section briefly reviews the state-of-the-art designs to model the information processor and its corresponding measurement-selection criteria. The paper addresses Haykin's strategy specific to cognitive radar [27] and, for the first time, discusses alternative approaches that can apply to designing such systems [16,30]. Finally, our proposed solutions to Research Problems 1, 2, and 3 are presented.

Prior Works
Multiple measures are suggested in statistics and information theory to quantify information [14][15][16][30]. The most common is the Shannon entropy, which measures the amount of self-information (or Shannon surprise) of a particular observation, averaged over all possible outcomes [14]. In [27,28], the authors adopt the Shannon entropy to measure the information of the estimated target's state and model the information processor as follows:
$$H_k = -\int p(\mathbf{x}_k|\mathbf{Z}_k) \ln p(\mathbf{x}_k|\mathbf{Z}_k)\, d\mathbf{x}_k, \quad (9)$$
where $p(\mathbf{x}_k|\mathbf{Z}_k)$ is the posterior PDF of the estimated state. Haykin derives $H_k$ in terms of the estimated state covariance, $\mathbf{P}(k|k)$, when the Kalman filter is applied for state estimation (i.e., $p(\mathbf{x}_k|\mathbf{Z}_k) = \mathcal{N}(\hat{\mathbf{x}}(k|k), \mathbf{P}(k|k))$). According to [27,28], the measurement-selection procedure chooses the measurement noise covariance that minimizes the Shannon entropy.
While Haykin considers an information-theoretic approach to design cognitive radar, refs. [15,16] use surprise as the principal mechanism to acquire information. Surprise measures "how much wow" one experiences when encountering uncertain events [13]. The Bayesian surprise measures the Kullback-Leibler (KL) information between a prior probability distribution and its update when a new observation is made [15]. Based on the research objective, the Bayesian surprise is defined as
$$S^B_k(\mathbf{z}_k) = \int p(\mathbf{x}_k|\mathbf{Z}_{k-1}) \ln \frac{p(\mathbf{x}_k|\mathbf{Z}_{k-1})}{p(\mathbf{x}_k|\mathbf{Z}_k)}\, d\mathbf{x}_k, \quad (10)$$
where it determines the effect of the new radar measurement $\mathbf{z}_k$ on the target's state estimation by measuring the KL distance from the predicted PDF to the posterior PDF. In addition, free energy is another type of surprise that measures the information of a new measurement by weighting and averaging it over all possible models [16]. The free energy is determined as in [16] (Equation (11)), where $p(\mathbf{z}_k|\mathbf{x}_k, \mathbf{Z}_{k-1})$ is the probability of the measurements at time $k$, conditioned on the state and all past measurements. Free energy is expressed in terms of the Bayesian surprise and $-\ln p(\mathbf{z}_k|\mathbf{Z}_{k-1})$, which refers to the Shannon surprise within the measurements. Note that the Bayesian surprise and the free energy have been adopted in many works to explain information-seeking behaviors in biological agents [17][19][20][21][22][35][36]. However, these two have not been directly applied to the design of cognitive radar systems.
A classical estimation/control methodology to evaluate the impact of new measurements is computing the trace of the influence matrix that is used in data assimilation for weather forecasting applications [30]. The influence matrix is also suitable for measuring the contribution of radar measurements to estimating the target's state. For a specific configuration in which the Kalman filter is used for state estimation, the influence matrix is obtained as follows:
$$\mathbf{S}_k = \mathbf{H}_k \mathbf{K}_k, \quad (12)$$
where $\mathbf{K}_k$ is the Kalman gain. In [30], the authors determine that the measurement with the maximum trace contributes more information to state estimation. Thus, the measurement noise covariance that maximizes the trace of the influence matrix is selected.

Solution to Research Problem 1
This paper proposes the Bayesian surprise as the main approach to quantifying the amount of new information within the estimated state. The Bayesian surprise demonstrates how the uncertainty in measurements improves future state estimations given prior estimates and provides valuable information to reduce estimation errors over time.
Since the radar measurements are constructed from linear Gaussian state-space models (see Equation (1)), and it is assumed that the parameters of the model are known, the Kalman filter is adopted for state estimation [26]. The Kalman filter is the optimal estimator in the mean squared error sense that solves the research objective given in Equation (8). The filter estimates the state mean, $\hat{\mathbf{x}}(k|k) = E[\mathbf{x}_k|\mathbf{Z}_k]$, and its covariance matrix, $\mathbf{P}(k|k) = E[(\mathbf{x}_k - \hat{\mathbf{x}}(k|k))(\mathbf{x}_k - \hat{\mathbf{x}}(k|k))^T|\mathbf{Z}_k]$, in an iterative manner. Algorithm 1 presents the two-step state prediction and estimation of the Kalman filter.
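Algorithm 1 Kalman filter [26].
Time update (Prediction):
1: $\hat{\mathbf{x}}(k|k-1) = \mathbf{F}_{k-1} \hat{\mathbf{x}}(k-1|k-1)$
2: $\mathbf{P}(k|k-1) = \mathbf{Q}_{k-1} + \mathbf{F}_{k-1} \mathbf{P}(k-1|k-1) \mathbf{F}_{k-1}^T$
Measurement update (Estimation):
3: $\tilde{\mathbf{z}}(k|k-1) = \mathbf{z}_k - \mathbf{H}_k \hat{\mathbf{x}}(k|k-1)$
4: $\mathbf{P}_{\tilde{z}}(k|k-1) = \mathbf{R}_k + \mathbf{H}_k \mathbf{P}(k|k-1) \mathbf{H}_k^T$
5: $\mathbf{K}_k = \mathbf{P}(k|k-1) \mathbf{H}_k^T \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$
6: $\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + \mathbf{K}_k \tilde{\mathbf{z}}(k|k-1)$
7: $\mathbf{P}(k|k) = \mathbf{P}(k|k-1) - \mathbf{K}_k \mathbf{P}_{\tilde{z}}(k|k-1) \mathbf{K}_k^T$

Given that $p(\mathbf{x}_k|\mathbf{Z}_{k-1}) = \mathcal{N}(\hat{\mathbf{x}}(k|k-1), \mathbf{P}(k|k-1))$ and $p(\mathbf{x}_k|\mathbf{Z}_k) = \mathcal{N}(\hat{\mathbf{x}}(k|k), \mathbf{P}(k|k))$ are available from the Kalman filter, the following expression for the Bayesian surprise is achieved [37]:
$$S^B_k(\mathbf{z}_k) = \frac{1}{2}\left[\ln\frac{|\mathbf{P}(k|k)|}{|\mathbf{P}(k|k-1)|} + \mathrm{tr}\{\mathbf{P}(k|k)^{-1}\mathbf{P}(k|k-1)\} - n + \|\hat{\mathbf{x}}(k|k) - \hat{\mathbf{x}}(k|k-1)\|^2_{\mathbf{P}(k|k)^{-1}}\right], \quad (13)$$
where $\hat{\mathbf{x}}(k|k-1)$, $\mathbf{P}(k|k-1)$, and $n$ are the predicted state mean, the predicted state covariance matrix, and the state space dimension, respectively. While the above expression demonstrates the Bayesian surprise in state space, the nature of the research problem requires rewriting Equation (13) in measurement space. In this regard, we determine the Bayesian surprise in measurement space as
$$S^B_k(\mathbf{z}_k) = \frac{1}{2}\left[\ln|\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}| + \mathrm{tr}\{\mathbf{R}_k^{-1}\mathbf{P}_{\tilde{z}}(k|k-1)\} - m + \tilde{\mathbf{z}}(k|k-1)^T\left(\mathbf{R}_k^{-1} - \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}\right)\tilde{\mathbf{z}}(k|k-1)\right], \quad (14)$$
where $\tilde{\mathbf{z}}(k|k-1) = \mathbf{z}_k - \mathbf{H}_k \hat{\mathbf{x}}(k|k-1)$ is the innovation vector, $\mathbf{P}_{\tilde{z}}(k|k-1) = \mathbf{R}_k + \mathbf{H}_k \mathbf{P}(k|k-1)\mathbf{H}_k^T$ is the innovation covariance, and $m$ is the dimension of the measurement space. Equation (14) clearly shows the connection between the Bayesian surprise and the design parameters of the radar sensor (i.e., $\mathbf{R}_k$); it also indicates how the information in a current radar measurement influences the Bayesian surprise (i.e., $\mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$).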
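As a numerical sanity check, the sketch below performs one Kalman measurement update and evaluates the Bayesian surprise twice: once as the KL divergence from the predicted Gaussian to the posterior Gaussian in state space, and once from the innovation quantities in measurement space. The closed forms used are the standard Gaussian KL expressions and are an assumption about how Equations (13) and (14) are written; the function names and all numerical values are hypothetical.

```python
import numpy as np

def kalman_update(x_pred, P_pred, z, H, R):
    """Standard Kalman measurement update; also returns innovation terms."""
    innov = z - H @ x_pred                      # innovation vector
    P_innov = R + H @ P_pred @ H.T              # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(P_innov)   # Kalman gain
    x_est = x_pred + K @ innov
    P_est = P_pred - K @ P_innov @ K.T
    return x_est, P_est, innov, P_innov

def surprise_state_space(x_pred, P_pred, x_est, P_est):
    """KL( N(x_pred, P_pred) || N(x_est, P_est) )."""
    n = x_pred.size
    d = x_est - x_pred
    P_est_inv = np.linalg.inv(P_est)
    _, ld_est = np.linalg.slogdet(P_est)
    _, ld_pred = np.linalg.slogdet(P_pred)
    return 0.5 * (ld_est - ld_pred - n
                  + np.trace(P_est_inv @ P_pred) + d @ P_est_inv @ d)

def surprise_measurement_space(innov, P_innov, R):
    """Same quantity written with R and the innovation covariance only."""
    m = innov.size
    _, ld_R = np.linalg.slogdet(R)
    _, ld_S = np.linalg.slogdet(P_innov)
    R_inv, S_inv = np.linalg.inv(R), np.linalg.inv(P_innov)
    return 0.5 * (ld_R - ld_S + np.trace(R_inv @ P_innov) - m
                  + innov @ (R_inv - S_inv) @ innov)

# Hypothetical three-state example with two measured components
x_pred = np.array([30.0, -1.0, 0.2])
P_pred = np.array([[4.0, 0.5, 0.1], [0.5, 1.0, 0.2], [0.1, 0.2, 0.5]])
H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
R = np.diag([0.25, 0.04])
z = np.array([30.8, -1.3])

x_est, P_est, innov, P_innov = kalman_update(x_pred, P_pred, z, H, R)
s_state = surprise_state_space(x_pred, P_pred, x_est, P_est)
s_meas = surprise_measurement_space(innov, P_innov, R)
```

The two values agree up to floating-point error, which illustrates why the surprise can be evaluated in the lower-dimensional measurement space where the dependence on $\mathbf{R}_k$ is explicit.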
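As shown in Equation (14), the Bayesian surprise at time $k$ is a function of the measurement $\mathbf{z}_k$. In a case where $\mathbf{z}_k$ is not available (e.g., multiple-step planning), this paper proposes computing the expectation of Bayesian surprise instead. The expectation of Bayesian surprise with respect to $p(\mathbf{z}_k|\mathbf{Z}_{k-1}) = \mathcal{N}(\mathbf{H}_k \hat{\mathbf{x}}(k|k-1), \mathbf{P}_{\tilde{z}}(k|k-1))$ is obtained as follows:
$$E_{p(\mathbf{z}_k|\mathbf{Z}_{k-1})}\left[S^B_k(\mathbf{z}_k)\right] = \frac{1}{2}\ln|\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}| + \mathrm{tr}\{(\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1})^{-1}\} - m. \quad (15)$$
The measurement noise covariance, $\mathbf{R}_k$, and the information within the innovation, $\mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$, are the only terms that appear in Equation (15). Equation (15) shows that the uncertainty in measurements, balanced by what the filter thinks about the measurements (i.e., $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$), impacts the expectation of Bayesian surprise.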
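The expectation step can be spelled out as a short worked derivation. It is a sketch that assumes the measurement-space surprise has the standard Gaussian KL form $S^B_k = \tfrac{1}{2}[\ln|\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1}| + \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} - m + \tilde{\mathbf{z}}^T(\mathbf{R}^{-1} - \mathbf{P}_{\tilde{z}}^{-1})\tilde{\mathbf{z}}]$, with time indices dropped for brevity; the grouping of terms may differ from the authors' presentation.

```latex
\begin{align*}
E\!\left[\tilde{\mathbf{z}}^T(\mathbf{R}^{-1}-\mathbf{P}_{\tilde{z}}^{-1})\tilde{\mathbf{z}}\right]
  &= \mathrm{tr}\!\left\{(\mathbf{R}^{-1}-\mathbf{P}_{\tilde{z}}^{-1})\,E[\tilde{\mathbf{z}}\tilde{\mathbf{z}}^T]\right\}
   = \mathrm{tr}\!\left\{(\mathbf{R}^{-1}-\mathbf{P}_{\tilde{z}}^{-1})\,\mathbf{P}_{\tilde{z}}\right\}
   = \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} - m, \\
E\!\left[S^B_k\right]
  &= \tfrac{1}{2}\ln|\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1}|
   + \tfrac{1}{2}\left(\mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} - m\right)
   + \tfrac{1}{2}\left(\mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} - m\right) \\
  &= \tfrac{1}{2}\ln|\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1}|
   + \mathrm{tr}\{(\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1})^{-1}\} - m .
\end{align*}
```

The first line uses the fact that the innovation is zero-mean with covariance $\mathbf{P}_{\tilde{z}}$ under $p(\mathbf{z}_k|\mathbf{Z}_{k-1})$, and the last line uses $\mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} = \mathrm{tr}\{(\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1})^{-1}\}$. The measurement itself no longer appears, which is what makes the quantity usable for planning.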

Solution to Research Problem 2
This section explores the measurement-selection scheme associated with the choice of the information processor. Since the Bayesian surprise and its expectation are proposed to solve Research Problem 1, the challenge of Research Problem 2 becomes achieving an optimum selection mechanism based on the Bayesian surprise (or its expectation). To this end, let us first refer to the definition of the influence matrix given in Equation (12). A connection exists between the expected Bayesian surprise and the influence matrix. This relation is evident when rewriting Equation (12) in the measurement space. By expressing the Kalman gain as $\mathbf{K}_k = \mathbf{P}(k|k)\mathbf{H}_k^T \mathbf{R}_k^{-1}$ and applying the matrix inversion lemma to Equation (12), the following definition is obtained:
$$\mathbf{S}_k = \mathbf{I}_{m \times m} - \mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}, \quad (16)$$
where $\mathbf{I}_{m \times m}$ is the identity matrix. The term $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$ appears in the expectation of Bayesian surprise as well. The influence matrix is a projection matrix (i.e., $\mathbf{S}_k$ is symmetric and idempotent); it is positive semi-definite, and all its diagonal elements are bounded between 0 and 1 [30]. Since the magnitude of the diagonal values of $\mathbf{S}_k$ corresponds to the influence of the measurement, the trace of the influence matrix is acceptable for determining the impact of measurements [30]. An informative measurement that contributes to state estimation maximizes the trace of the influence matrix. The influence matrix trace provides insight into how to select informative radar measurements with respect to the Bayesian surprise. According to the properties of the influence matrix, its trace does not exceed the dimension of the measurement space $m$. Therefore, the maximization of $\mathrm{tr}\{\mathbf{S}_k\}$ is equivalent to minimizing $\mathrm{tr}\{\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}\}$ (see Equation (16)). In this regard, the measurement-selection procedure concerning the influence matrix becomes solving the following optimization problem:
$$\mathbf{R}^{\min}_k = \arg\min_{\mathbf{R}_k} \mathrm{tr}\{\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}\}, \quad (17)$$
where $\mathbf{R}^{\min}_k$ is obtained when $\mathrm{tr}\{\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}\}$ is minimized. To do so, the trace is differentiated with respect to the measurement noise covariance and set equal to zero. Substituting the expression for $\mathbf{P}_{\tilde{z}}(k|k-1)$ and employing the matrix inversion lemma yields a condition in which $\mathbf{R}_k^{-1} = \mathbf{0}_{m \times m}$ is not applicable; hence, $\mathbf{R}^{\min}_k = \mathbf{H}_k \mathbf{P}(k|k)\mathbf{H}_k^T$. For $\mathbf{R}_k = \mathbf{H}_k \mathbf{P}(k|k)\mathbf{H}_k^T$, the trace of the influence matrix reaches its maximum.
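The relation between the influence matrix and the ratio $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$ can be checked numerically. The sketch below assumes the influence matrix is formed as $\mathbf{H}_k\mathbf{K}_k$ with the usual Kalman gain; the function name `influence_matrix` and the numerical values are hypothetical.

```python
import numpy as np

def influence_matrix(P_pred, H, R):
    """Return H K and the innovation covariance for a Kalman update."""
    P_innov = R + H @ P_pred @ H.T               # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(P_innov)    # Kalman gain
    return H @ K, P_innov

P_pred = np.array([[4.0, 0.5], [0.5, 1.0]])
H = np.eye(2)
R = np.diag([0.25, 0.04])
m = H.shape[0]

S, P_innov = influence_matrix(P_pred, H, R)
# Largest entrywise deviation between H K and I - R P_innov^{-1}
identity_gap = np.abs(S - (np.eye(m) - R @ np.linalg.inv(P_innov))).max()
# A tenfold noisier measurement should have less influence
S_noisy, _ = influence_matrix(P_pred, H, 10.0 * R)
```

Here `identity_gap` is at machine-precision level, the trace of `S` lies between 0 and `m`, and inflating `R` lowers the trace, consistent with the statement that maximizing the trace amounts to minimizing $\mathrm{tr}\{\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}\}$.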
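To demonstrate how the selection criterion for the expectation of Bayesian surprise changes when $\mathbf{R}_k = \mathbf{H}_k \mathbf{P}(k|k)\mathbf{H}_k^T$, it is suitable to revise $E[S^B_k(\mathbf{z}_k)]$ in terms of the trace operator:
$$E_{p(\mathbf{z}_k|\mathbf{Z}_{k-1})}\left[S^B_k(\mathbf{z}_k)\right] = \mathrm{tr}\left\{\frac{1}{2}\ln(\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}) + (\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1})^{-1} - \mathbf{I}_{m \times m}\right\}, \quad (20)$$
where for any positive semi-definite square matrix $\mathbf{A}$ (i.e., $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$), $\ln|\mathbf{A}| = \mathrm{tr}\{\ln(\mathbf{A})\}$. According to the bounds of the natural logarithm, if $\mathbf{A} \ll \mathbf{I}_{m \times m}$, then $|\ln \mathbf{A}| < (\mathbf{A}^{-1} - \mathbf{I}_{m \times m})$. In other words, the growth rate of $(\mathbf{A}^{-1} - \mathbf{I}_{m \times m})$ is higher than that of $\ln \mathbf{A}$ when $\mathbf{A} \to \mathbf{0}_{m \times m}$. Applying this condition to Equation (20) for $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1} \to \mathbf{0}_{m \times m}$ makes it safe to say that $E_{p(\mathbf{z}_k|\mathbf{Z}_{k-1})}[S^B_k(\mathbf{z}_k)] \approx \mathrm{tr}\{(\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1})^{-1} - \mathbf{I}_{m \times m}\}$. Therefore, when $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1} \to \mathbf{0}_{m \times m}$ (or $\mathbf{R}_k = \mathbf{H}_k \mathbf{P}(k|k)\mathbf{H}_k^T$), the expectation of Bayesian surprise reaches its maximum value. In an informative radar measurement that decreases the state estimation error, the measurement noise covariance is small, and the inverse of the innovation covariance is maximized.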
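To further elaborate, we consider a simple example by setting $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1} = \alpha \mathbf{I}_{m \times m}$, where $0 \le \alpha \le 1$. We examine two extreme cases of $\alpha = 0.01$ and $\alpha = 0.99$. For $\alpha = 0.01$, the expectation of Bayesian surprise becomes
$$E_{p(\mathbf{z}_k|\mathbf{Z}_{k-1})}\left[S^B_k(\mathbf{z}_k)\right] = \frac{1}{2}\mathrm{tr}\{\ln(0.01\,\mathbf{I}_{m \times m})\} + \mathrm{tr}\{100\,\mathbf{I}_{m \times m}\} - m, \quad (21)$$
and for $\alpha = 0.99$,
$$E_{p(\mathbf{z}_k|\mathbf{Z}_{k-1})}\left[S^B_k(\mathbf{z}_k)\right] = \frac{1}{2}\mathrm{tr}\{\ln(0.99\,\mathbf{I}_{m \times m})\} + \mathrm{tr}\left\{\tfrac{1}{0.99}\,\mathbf{I}_{m \times m}\right\} - m. \quad (22)$$
Evidently, as $\alpha \to 0$, the expectation of Bayesian surprise reaches a higher value than in the case where $\alpha \to 1$. Moreover, the final two terms in Equation (21) (i.e., $\mathrm{tr}\{100\,\mathbf{I}_{m \times m}\} - m = 99m$) are dominant as $\alpha$ approaches zero. Hence, when $\mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1} \to \mathbf{0}_{m \times m}$, the expectation of Bayesian surprise is maximized.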
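The two extreme cases can be evaluated directly. The sketch below assumes the expected Bayesian surprise takes the form $\tfrac{1}{2}\ln|\mathbf{A}| + \mathrm{tr}\{\mathbf{A}^{-1}\} - m$ with $\mathbf{A} = \mathbf{R}_k \mathbf{P}_{\tilde{z}}(k|k-1)^{-1}$, which is consistent with the $99m$ term quoted in the text; the function name and the choice $m = 2$ are illustrative.

```python
import numpy as np

def expected_surprise_from_ratio(A):
    """Expected Bayesian surprise for A = R P_innov^{-1} (assumed closed form)."""
    m = A.shape[0]
    _, logdet = np.linalg.slogdet(A)
    return 0.5 * logdet + np.trace(np.linalg.inv(A)) - m

m = 2
low_noise = expected_surprise_from_ratio(0.01 * np.eye(m))   # alpha = 0.01
high_noise = expected_surprise_from_ratio(0.99 * np.eye(m))  # alpha = 0.99
```

For $m = 2$ this gives roughly 193.39 for $\alpha = 0.01$ and roughly 0.0102 for $\alpha = 0.99$: the $99m$ contribution dwarfs the logarithmic term, so the small-noise case dominates.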

Solution to Research Problem 3
The solution to the final research problem aligns with the discussions carried out in the last two sections. As assumed, the sensor profile library, depicted in Figure 1b, holds a set of measurement noise covariances, defined as $\mathcal{R}^L_k = \{\mathbf{R}^{(1)}, \mathbf{R}^{(2)}, \ldots, \mathbf{R}^{(N_L)}\}$. At time cycle $k$, the cognitive radar estimates the target's state from the radar measurements. Through multiple stages of prediction and estimation (i.e., planning), the information processor measures the contribution of each measurement noise covariance to estimating the target's future state based on the expectation of Bayesian surprise. Eventually, the radar selects the measurement noise covariance with the maximum expectation of Bayesian surprise.
Given that the estimated state covariance, $\mathbf{P}(k|k)$, is accessible from the state estimation process, the proposed algorithm for one-step planning is summarized in Algorithm 2. Algorithm 2 presents the step-by-step procedure for obtaining the expected Bayesian surprise values corresponding to the $i$-th measurement noise covariance, $i = 1, \ldots, N_L$. To this end, the one-step measurement-selection mechanism based on the expectation of Bayesian surprise is demonstrated as follows:
$$\mathbf{R}_{k+1} = \arg\max_{\mathbf{R}^{(i)} \in \mathcal{R}^L_k} E_{p(\mathbf{z}_{k+1}|\mathbf{Z}_k)}\left[S^B_{k+1}(\mathbf{z}_{k+1})\right], \quad (23)$$
where the selected $\mathbf{R}_{k+1} = \mathbf{R}^{(i)} \in \mathcal{R}^L_k$ leads to a better estimate of the target's state for time $k+1$. The waveform associated with $\mathbf{R}_{k+1}$ is applied to the radar environment, and the cycle repeats.
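The one-step procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the closed form $\tfrac{1}{2}\ln|\mathbf{R}\mathbf{P}_{\tilde{z}}^{-1}| + \mathrm{tr}\{\mathbf{R}^{-1}\mathbf{P}_{\tilde{z}}\} - m$ for the expected Bayesian surprise, and the function name `one_step_plan`, the toy model, and the three-entry library are hypothetical.

```python
import numpy as np

def one_step_plan(P_est, F, Q, H, library):
    """Score each candidate R by expected Bayesian surprise at k+1; pick the max."""
    P_pred = Q + F @ P_est @ F.T                 # predicted state covariance
    scores = []
    for R in library:
        P_innov = R + H @ P_pred @ H.T           # innovation covariance for this R
        m = P_innov.shape[0]
        _, ld_R = np.linalg.slogdet(R)
        _, ld_S = np.linalg.slogdet(P_innov)
        # 0.5 ln|R P_innov^{-1}| + tr{R^{-1} P_innov} - m
        scores.append(0.5 * (ld_R - ld_S)
                      + np.trace(np.linalg.solve(R, P_innov)) - m)
    best = int(np.argmax(scores))
    return best, scores

# Hypothetical toy setup
F = np.array([[1.0, 0.1], [0.0, 1.0]])
Q = 1e-3 * np.eye(2)
H = np.eye(2)
P_est = np.diag([0.5, 0.2])
library = [np.diag([1.0, 0.5]), np.diag([0.1, 0.05]), np.diag([2.0, 1.0])]

best, scores = one_step_plan(P_est, F, Q, H, library)
```

In this toy library the candidates are scaled copies of one matrix, so the least noisy entry (index 1) wins. For waveform-derived covariances that trade range accuracy against velocity accuracy, the ranking is less obvious and depends on `P_pred`.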

Algorithm 2
Cognitive radar for one-step planning.
1: P(k|k) and R L k = {R (1) , R (2) , . . ., R (N L ) } are available at time k 2: P(k + 1|k) = Q k + F k P(k|k)F T k 3: for i = 1, . . ., N L 4: P Table 1 summarizes alternative models of the information processor and its corresponding measurement-selection procedure for one-step planning. The expressions are derived when the Kalman filter is presumed for state estimation. It is straightforward to follow Haykin's design [27,28] and the influence matrix approach [30]. However, a detailed description of solving the research problems with respect to free energy is carried out in Appendix A. The authors make a similar case for using the expectation of free energy to model the information processor instead of the free energy itself. In addition, the measurement noise covariance that maximizes the expectation of free energy minimizes the state estimation error for the upcoming cycle. Haykin's method requires an additional step to calculate P(k + 1|k + 1), while the other three share the same term and exclude this extra step. Although Haykin's design implies the connection to the measurement noise covariance, his approach is aligned with the research objective. This is because it reduces the estimation error of the target's state by minimizing the estimated state covariance, P(k + 1|k + 1). Table 1. Information processor and measurement-selection designs for one-step planning.

[Table 1 columns: Method; Information Processor; Measurement Selection. It lists, for each method, including Haykin's approach and the trace of the influence matrix, the information processor and the measurement-selection rule.]

This paper also extends the one-step planning algorithm to $L$ steps. Since $P(k|k)$ and the entire sensor profile library are available, the covariance recursions of the Kalman algorithm can be applied to predict the state covariance and to compute the innovation covariance and the estimated state covariance. The only difference is that these recursions repeat $L$ times to capture the influence of $L$ future measurements. The expectation of Bayesian surprise at time $k+L$ is then calculated for each candidate sequence, where $(ij\ldots l)$ denotes the $L$-length sequence of measurement noise covariances. Note that the expectation is computed with respect to $p(z_{k+L}|Z_{k+L-1})$, whose mean is $H_{k+L}\hat{x}^{(ij\ldots l)}(k+L|k+L-1)$ and whose covariance is the corresponding innovation covariance. Maximizing this expectation yields $R_{k+1} = R^{(i)} \in \mathcal{R}^L_k$ for the next cycle. Algorithm 3 illustrates the proposed cognitive radar algorithm for L-step planning. A similar process applies to the other models for L-step planning with minor modifications.
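The repeated covariance recursions behind L-step planning can be sketched as follows. This is an illustrative sketch, not Algorithm 3 itself: it assumes time-invariant $F$, $Q$, and $H$, enumerates candidate sequences exhaustively, and leaves out the scoring step. The function name `propagate_covariances` and the toy matrices are hypothetical.

```python
import itertools

import numpy as np


def propagate_covariances(P_est, F, Q, H, R_sequence):
    """Run the Kalman covariance recursions once per R in R_sequence.

    Returns the last predicted, innovation, and estimated covariances.
    """
    if len(R_sequence) == 0:
        raise ValueError("R_sequence must contain at least one covariance")
    P = P_est
    identity = np.eye(P_est.shape[0])
    for R in R_sequence:
        P_pred = Q + F @ P @ F.T                # predicted state covariance
        P_z = H @ P_pred @ H.T + R              # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(P_z)   # Kalman gain
        P = (identity - K @ H) @ P_pred         # estimated state covariance
    return P_pred, P_z, P


# Hypothetical toy model and a two-member candidate set.
Ts = 0.1
F = np.array([[1.0, Ts], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
H = np.eye(2)
P_est = np.eye(2)
candidates = [np.diag([1.0, 1.0]), np.diag([0.1, 0.1])]

L = 2
sequences = list(itertools.product(range(len(candidates)), repeat=L))
results = {
    seq: propagate_covariances(P_est, F, Q, H, [candidates[i] for i in seq])
    for seq in sequences
}
print(len(sequences))
```

If the enumeration is exhaustive, as assumed here, a localized library of 25 candidates gives 25^L sequences (625 for L = 2), which is one way to see why the cost grows quickly with L.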

Numerical Results
In this section, simulation results compare the state estimation performance of the proposed cognitive radar with the state-of-the-art designs listed in Table 1. The section first describes the experimental setup and parameter settings for generating radar measurements that emulate the simple vehicle-following scenario in Figure 1a, including a radar configuration suitable for single-target tracking in highway and urban driving environments. Several error metrics are then introduced to examine various aspects of the estimation performance. A series of experiments compares the proposed one-step planning algorithm with its competitors for different state estimation errors, and a further experiment analyzes the impact of multiple-step planning on state estimation performance. Results are verified over numerous Monte Carlo runs.

Simulation Setup and Data Generation
The purpose of the experiment is to evaluate the estimation performance of the proposed cognitive radar. Since the paper adopts the Kalman filter for state estimation, the model parameters in Equation (1) (i.e., $F_k$, $Q_k$, $H_k$, $R_k$, $\hat{x}(0|0)$, and $P(0|0)$) are assumed available. In this regard, the radar sensor configuration and the parameter settings for generating radar measurements are presented.
For the vehicle-following scenario depicted in Figure 1a, the simulation assumes that the two cars are moving forward in the same lane (i.e., $d_y = 0$). The FMCW radar sensor is mounted on the host vehicle and operates in the 77 GHz frequency band for short- and long-range applications [31]. The bandwidth of the transmitted radar signal is set to $B = 100$ MHz, and 0 dB SNR is achieved at $d_0 = 2000$ m. According to Equation (6), the measurement noise covariance depends on the choice of pulse duration and chirp rate, $R_k(\lambda_{k-1}, b_{k-1})$. Assuming that the radar sensor maintains a maximum range of $d_{max} = 100$ m and a maximum velocity of $v_{max} = 100$ m/s, the sensor profile library consists of measurement noise covariances specified for a set of values of $\lambda_{k-1}$ and $b_{k-1}$ that are configured to simulate a practical radar sensor for single-target tracking applications [6]. The sensor profile library is composed of $N = 1810$ measurement noise covariances. Since $N$ is large and searching the entire library at each time instant is computationally costly, this paper adopts the kNN method to obtain a smaller set with $N_L = 25$ members. Figure 2 illustrates an example of a localized set of measurement noise covariances, $\mathcal{R}^L_k$, distinguished by pulse duration and chirp rate.

This article considers highway and urban driving to examine the state estimation performance of the proposed cognitive radar for a realistic vehicle-following scenario. Since the true initial state, $x_0$, and its estimation elements (i.e., $\hat{x}(0|0)$, $P(0|0)$) depend on the driving environment, separate initial conditions are set, without loss of generality, for highway driving and for urban driving, with the urban values adjusted according to an in-city driving experience. Note that the estimated initial state, $\hat{x}_0 \sim \mathcal{N}(\hat{x}(0|0), P(0|0))$, is a random value that changes per Monte Carlo run.
This simulation sets the state noise variance to $\sigma_q^2 = 0.01$ and the sample time to $T_s = 0.1$ s for computing $F_k$ and $Q_k$ to ensure constant acceleration.
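The kNN reduction of the sensor profile library mentioned above can be sketched as follows. This is a hedged illustration only: it assumes that neighbors are found by Euclidean distance in the normalized (pulse duration, chirp rate) plane around the currently used waveform, and that the current waveform is itself a member of the localized set. The grid values, the 10 × 181 layout that yields 1810 entries, and the function name `localized_library` are hypothetical.

```python
import numpy as np


def localized_library(params, current, n_neighbors=25):
    """Return indices of the n_neighbors library entries nearest to `current`.

    params: array of shape (N, 2) holding (pulse duration, chirp rate) pairs.
    """
    # Normalize each parameter to [0, 1] so both contribute comparably.
    lo, hi = params.min(axis=0), params.max(axis=0)
    scaled = (params - lo) / (hi - lo)
    cur = (np.asarray(current, dtype=float) - lo) / (hi - lo)
    dist = np.linalg.norm(scaled - cur, axis=1)
    return np.argsort(dist, kind="stable")[:n_neighbors]


# Hypothetical grid: 10 pulse durations x 181 chirp rates = 1810 entries.
pulse_durations = np.linspace(10e-6, 100e-6, 10)   # seconds (assumed)
chirp_rates = np.linspace(-9e11, 9e11, 181)        # Hz/s (assumed)
params = np.array([(lam, b) for lam in pulse_durations for b in chirp_rates])

idx = localized_library(params, current=params[905], n_neighbors=25)
print(len(params), len(idx))
```

Each returned index would then point to one measurement noise covariance in the localized set evaluated by the planning step.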

Evaluation Metrics
This paper considers several error measures to evaluate and compare the estimation performance of the proposed cognitive radar with the alternative designs in Table 1. The error measures include the root mean square relative error (RMSRE), the average Euclidean relative error (ARE), the harmonic average relative error (HRE), and the geometric average relative error (GRE) [38]. For numerical reasons, the logarithm of the GRE, log(GRE), is calculated instead. Table 2 provides the mathematical expressions of these error measures in terms of the state vector, the estimated state vector, and the state estimation error of the j-th Monte Carlo simulation at time step k; $N_{mc}$ denotes the number of Monte Carlo simulations. The paper considers relative error measures since they are suitable for the performance evaluation of an estimation algorithm. In addition, the absolute value of each error metric, which computes its time average, is used for ranking the overall state estimation performance of the cognitive radar. The absolute error counterparts of the relative error measures are also given in Table 2.

Table 2. Relative error measures and absolute values for performance evaluation [38].

[Table 2 columns: Relative Error Metric; Absolute of Error Metric. Metrics listed: RMSRE, ARE, HRE, and log(GRE), with absolute counterparts ARMSRE, AARE, AHRE, and AGRE.]
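The error measures named above can be computed along the following lines. This sketch assumes the commonly used definitions, in which the relative error of run j at time k is the norm of the estimation error divided by the norm of the true state, and the "absolute" counterpart is the time average of each curve. The exact expressions of Table 2 may differ, and the function names are hypothetical.

```python
import numpy as np


def relative_error_measures(x_true, x_est):
    """Per-time-step error curves from arrays of shape (N_mc, K, n)."""
    err = np.linalg.norm(x_true - x_est, axis=-1)     # ||error||, shape (N_mc, K)
    rel = err / np.linalg.norm(x_true, axis=-1)       # relative error per run
    return {
        "RMSRE": np.sqrt(np.mean(rel ** 2, axis=0)),  # root mean square
        "ARE": np.mean(rel, axis=0),                  # arithmetic average
        "HRE": 1.0 / np.mean(1.0 / rel, axis=0),      # harmonic average
        "log(GRE)": np.mean(np.log(rel), axis=0),     # log of geometric average
    }


def time_average(curve):
    """Assumed 'absolute' counterpart: average of a curve over time."""
    return float(np.mean(curve))


# Tiny deterministic example: 2 runs, 1 time step, scalar state.
x_true = np.array([[[2.0]], [[4.0]]])
x_est = np.array([[[1.0]], [[3.0]]])
measures = relative_error_measures(x_true, x_est)
print({name: time_average(curve) for name, curve in measures.items()})
```

The harmonic and geometric averages are undefined when a relative error is exactly zero, so a practical implementation would need to guard against that case.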

Performance Evaluation and Comparison for One-Step Planning
This section demonstrates the estimation response of the proposed radar design by tracking the velocity of the target vehicle, $v^1_{x,k}$, and the longitude distance, $d_{x,k}$, under one-step planning. The experiment examines target tracking in both highway and urban driving. Results are obtained for $N_{mc} = 10{,}000$ Monte Carlo runs.

Multiple attributes are considered for ranking the overall estimation performance of the radar designs listed in Table 1. This paper applies a pairwise comparison technique that adopts a ranking vector (RV) to compare different estimation algorithms [32]. This method exploits comparison information based on the probability of the relative closeness of competing estimators to the true quantity. The authors in [32] discuss a variety of approaches for determining a unique RV. Here, order-preserving mapping is used to obtain the RV for ranking state estimation performance. Since this method is straightforward, the paper reports only the results of applying it. In addition, only the RMSRE curves of the estimation performance are presented to avoid unnecessary repetition, while the absolute relative error measures (i.e., ARMSRE, AARE, AHRE, and AGRE) are recorded to evaluate the overall estimation performance.

Figures 3 and 4 illustrate, respectively, the RMSRE performance of the velocity and longitude distance of the target vehicle for highway driving. The duration of the experiment is set to 10 s. Although the RMSRE results are plotted on a logarithmic scale, the figures show that the estimation responses of the four radar models are in close proximity. The estimation performance is therefore ranked with the pairwise comparison method for the mentioned error measures. Table 3 provides the absolute error measure values for the velocity of the target vehicle.
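The pairwise "relative closeness" information mentioned above can be estimated empirically as sketched below. This is only an illustration of the comparison step: it does not reproduce the order-preserving mapping of [32] that turns the comparisons into an RV, and the rule of splitting ties equally is an assumption. The function name and the example data are hypothetical.

```python
import numpy as np


def pairwise_closeness(errors):
    """Estimate how often each estimator is closer to the truth than another.

    errors: dict mapping an estimator name to its error norms per run.
    Returns (names, C) where C[a, b] is the fraction of runs in which
    estimator a has the smaller error than b, with ties counted as one half.
    """
    names = list(errors)
    n = len(names)
    C = np.full((n, n), 0.5)
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            ea = np.asarray(errors[names[a]], dtype=float)
            eb = np.asarray(errors[names[b]], dtype=float)
            C[a, b] = np.mean(ea < eb) + 0.5 * np.mean(ea == eb)
    return names, C


# Hypothetical error norms from four Monte Carlo runs.
errors = {"A": [1.0, 1.0, 1.0, 1.0], "B": [2.0, 2.0, 0.5, 2.0]}
names, C = pairwise_closeness(errors)
print(names, C)
```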
For the results in Table 3, the RV based on order-preserving mapping, $r_1$, is computed with its elements in the same order as the entries of Table 3. The magnitude of each element reflects the goodness of the corresponding approach relative to the others: the larger the value, the better the estimation performance. According to $r_1$, the expectation of Bayesian surprise ranks first for the velocity of the target vehicle in the highway scenario, exceeding the alternative designs of the information processor. $r_1$ also implies that the expectation of free energy and Haykin's Shannon entropy rank similarly in estimation performance.

Table 4 reports the absolute error measures of the longitude distance for highway driving. The four radar designs present identical outcomes over 10,000 Monte Carlo runs, so the resulting RV indicates that the estimation performance of the longitude distance ranks the same for all the radar models.

Figures 5 and 6 depict, respectively, the RMSRE curves of the velocity and longitude distance when the vehicle-following scenario takes place in an urban environment. In this experiment, results are simulated for 7 s. While the RMSRE curves of the four designs converge over time, the estimation response based on the trace of the influence matrix experiences the lowest error at earlier time instants. Table 5 lists the absolute relative error metrics of the different cognitive radar models for estimating the target's velocity. As expected, the trace of the influence matrix presents the lowest level of error compared to the alternative designs. As a result, the RV ranks the influence matrix trace topmost in estimating the velocity, and the proposed scheme second.
The expectation of Bayesian surprise experiences a larger error than the trace of the influence matrix, with AHRE as the only exception. The expectation of free energy and Haykin's design present the poorest estimation. Table 6 provides the overall performance for longitude distance in urban driving; it shows that all models offer the same outcome for each error measure. Therefore, all designs are ranked equally regarding estimation performance, similar to highway driving.

According to this experiment, the following remarks can be made for one-step planning. In highway driving, the expectation of Bayesian surprise outperforms the other three techniques in estimating the target's velocity, whereas the trace of the influence matrix is a better choice for modeling the information processor in an urban environment. Note that both methods consider only $R_{k+1} P_z(k+1|k)^{-1}$ as the means to minimize the state estimation error for the next time instant. This implies that the uncertainty in the measurements, balanced by the certainty in the innovation, provides sufficient information to predict and estimate the target's dynamic state ahead of time.
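One way to see why both criteria depend only on $R_{k+1} P_z(k+1|k)^{-1}$ is the short derivation below. It is a sketch under two assumptions that are not stated in this section: the influence matrix is taken to be $S = H_{k+1} K_{k+1}$, with $K_{k+1}$ the Kalman gain, and the Bayesian surprise is taken to be the Kullback-Leibler divergence of the posterior from the prior state density. Here $m$ denotes the measurement dimension.

```latex
\begin{align}
P_z(k+1|k) &= H_{k+1} P(k+1|k) H_{k+1}^T + R_{k+1}, \\
S = H_{k+1} K_{k+1} &= H_{k+1} P(k+1|k) H_{k+1}^T P_z(k+1|k)^{-1}
   = I_m - R_{k+1} P_z(k+1|k)^{-1}, \\
\operatorname{tr}\{S\} &= m - \operatorname{tr}\{R_{k+1} P_z(k+1|k)^{-1}\}, \\
\mathbb{E}\big[S^B_{k+1}\big] &= \tfrac{1}{2} \ln \frac{\det P(k+1|k)}{\det P(k+1|k+1)}
   = -\tfrac{1}{2} \ln \det\!\big(R_{k+1} P_z(k+1|k)^{-1}\big).
\end{align}
```

Under these assumptions, both quantities grow as $R_{k+1} P_z(k+1|k)^{-1}$ shrinks, which is consistent with the remark that the two methods rely on the same term.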

Performance Evaluation and Comparison for L-Step Planning
This experiment evaluates the estimation performance of the proposed cognitive radar when the impact of $L$ future measurements is considered in estimating the target's state for the upcoming time cycle. The results of this experiment are averaged over $N_{mc} = 1000$ Monte Carlo simulations for the highway driving scenario. This section examines how the estimation of $v^1_{x,k}$ improves under multiple-step planning. According to the figures and tables in the previous section, the longitude distance estimate is invariant across the radar designs for the various error measures, so this experiment focuses only on estimating the target's velocity. Figure 7 illustrates the RMSRE performance of the proposed radar design for $L \in \{1, 2, 3\}$. It shows that the estimation error decreases substantially when the planning horizon increases from one step to two. Although three-step planning performs best, the error reduction from $L = 2$ to $L = 3$ is negligible compared to that from $L = 1$ to $L = 2$. Additionally, increasing $L$ entails a longer simulation run time and higher computational complexity. Thus, two-step planning appears to offer the best trade-off for enhancing the state estimation performance of the proposed cognitive radar.
For two-step planning, this section also compares the estimation performance of the proposed cognitive radar with the alternative designs. Figure 8 compares the RMSRE curves of the four radar designs for $L = 2$ and $N_{mc} = 10{,}000$ in highway driving. According to Figure 8, the expectation of Bayesian surprise and the expectation of free energy present the lowest estimation errors with respect to RMSRE, surpassing the other two techniques. Table 7 supports this claim in terms of the absolute RMSRE. The results indicate that, on average, the expectation of Bayesian surprise improves the estimation process when multiple-step planning is considered.

Conclusions
This paper proposed a novel design of cognitive radar that plans the estimation response of the system based on the expectation of Bayesian surprise and makes decisions that reduce the estimation error over time. The radar measurements were expressed through a set of linear Gaussian state-space models describing the motion dynamics of a simple vehicle-following scenario. Assuming that the model parameters are known, the Kalman filter was applied for state estimation. According to the filter's estimate, the radar measures how much information each waveform available from the sensor profile library contributes to estimating the target's future state (i.e., velocity, distance) and chooses the one that maximizes the expectation of Bayesian surprise. This research showed that maximizing the expectation of Bayesian surprise leads to informative measurements and successively decreases the state estimation error. In addition, estimation algorithms for one-step and multiple-step planning were developed. The paper also presented a unified framework that relates different design methodologies for modeling cognitive radar systems. Several experiments were carried out to evaluate and compare the estimation performance of the proposed method with alternative designs, using simulations that emulate real-life highway and urban driving. The paper assessed the proposed approach with a pairwise comparison method for various error measures. Results indicated that, for one-step planning, the balance between the uncertainty in the measurements and the certainty in the innovations provides sufficient information for accurate target tracking. The paper also demonstrated that two-step planning reduces the estimation error significantly compared to one-step planning, and that the proposed radar design exceeds its competitors in overall estimation performance when two-step planning is applied.

Appendix A
To model the information processor based on the free energy, let us refer to Equation (11). Substituting $p(z_k|Z_{k-1}) \sim \mathcal{N}(H_k\hat{x}(k|k-1), P_z(k|k-1))$ into Equation (11) gives Equation (A1), which expresses the free energy in terms of the Bayesian surprise $S^B_k(z_k)$ given in Equation (14). Like the Bayesian surprise, the free energy is a function of $z_k$. When $z_k$ is unavailable, the expectation of the free energy with respect to $p(z_k|Z_{k-1})$ is obtained instead in Equation (A2), whose second term is the expectation of $-\ln p(z_k|Z_{k-1})$. Compared to the expectation of Bayesian surprise, Equation (A2) captures the filter's uncertainty in interpreting the radar measurements. A recent study adopts the expectation of free energy as a means to investigate exploratory behavior in linear Gaussian dynamic systems [39]. An analysis of the measurement-selection procedure equivalent to the one based on the expectation of Bayesian surprise also applies to the expectation of free energy. With some basic manipulations of Equation (A2), the expectation of free energy reaches its maximum when $R_k P_z(k|k-1)^{-1} \to 0_{m \times m}$ (or $R_k = H_k P(k|k) H_k^T$). In this regard, the measurement noise covariance that maximizes the expectation of free energy leads to a better estimate of the target's state at the succeeding time instant.
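For reference, the expectation of $-\ln p(z_k|Z_{k-1})$ has a closed form when the predictive density is Gaussian, as assumed above. The expression below is the standard differential entropy of an $m$-dimensional Gaussian; how it enters Equation (A2) term by term is not reproduced here, and $m$ is assumed to denote the measurement dimension.

```latex
\begin{equation}
\mathbb{E}_{p(z_k|Z_{k-1})}\big[-\ln p(z_k|Z_{k-1})\big]
  = \tfrac{1}{2} \ln\!\big((2\pi e)^{m} \det P_z(k|k-1)\big).
\end{equation}
```

This term depends on the innovation covariance alone, which matches the remark that Equation (A2) additionally captures the filter's uncertainty about the measurements.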