Generation of Realistic Cut-In Maneuvers to Support Safety Assessment of Advanced Driver Assistance Systems

Abstract: Advanced Driver Assistance Systems (ADASs) attract constantly growing attention from academics and industry as more and more vehicles are equipped with such technology. Level-3 ADASs, like the DRIVE PILOT from Mercedes-Benz AG, are expected to appear more and more on the market in the next few years. However, automated driving raises new challenges for the system validation required for series approval. The replacement of a human driver as control instance expands the range of variants to be validated and verified. The scenario-based validation approach meets these challenges by simulating only specific safety-critical driving scenarios using software-in-the-loop simulation. According to the current state of the art, various safety-relevant driving scenarios are parameterized as idealized maneuvers which, however, requires a great modeling effort, and at the same time, such simplifications may bias the safety assessment. Therefore, a novel approach using artificial intelligence methods is taken here to generate more realistic driving scenarios. Namely, a generative model based on a variational autoencoder is trained with real-world data and then used to generate trajectories for a specific driving maneuver. Through a comprehensive analysis of the synthetic trajectories, it becomes clear that the generative model can learn and replicate relevant properties of real driving data as well as their probabilistics much better than the mathematical models used so far. Furthermore, it is proven that both the statistical properties and the time characteristics are almost equal to those of the input data.


Introduction
The validation of Advanced Driver Assistance Systems (ADASs) usually consists of real-world driving tests. With the increasing complexity of ADASs and the resulting increased test effort, these tests are too costly from an economic point of view [1]. More efficient approaches to safeguarding highly automated vehicles are provided by the German research project PEGASUS [2]. In particular, the field of simulative validation offers automated and efficient methods. Especially, the scenario-based approach, where only specific safety-critical driving scenarios are simulated, reduces the test effort to a minimum [3,4,5].
In such an approach, a simulation tool emulates the real ECU code with the automated driving function to perform software-in-the-loop simulation. The inputs for the simulated control device are generated by a simulation environment, including vehicle models and sensor models, as well as data from other ECUs installed in the system vehicle, also called the ego vehicle. To ensure valid input data, the ego vehicle interacts with other traffic participants (road objects) in a virtual environment. The dynamics of the road objects, which ideally follow mathematical models depending on the described driving situations, are recorded by sensor models. Thus, a virtual world that is close to reality is processed and captured by the vehicle model.
Usually, the procedure is based on an ideal mathematical modeling of driving scenarios using physical parameters. For example, the cut-in maneuver in Figure 1a may be parameterized by an initial lateral displacement $d_c^0$ of the cut-in vehicle (blue) with reference to the ego vehicle (red), headway time $t_h^*$ and velocity $v_c^*$ when entering the ego lane, and lateral distance $d_c^1$ at the end of the maneuver; see also Figure 1b. Up to 20 parameters are used to describe such a cut-in maneuver and need to be identified for observed instances from field measurements [6]. A specific choice of these parameter values will then result in a specific trajectory representing a simplified real maneuver, as shown in Figure 1c, which, however, is generally not able to represent all aspects of real traffic situations. In particular, unusual observed trajectories can be approximated only in a crude way, resulting in biased safety estimates. For example, Figure 2 shows over 8000 real cut-in trajectories measured during real drives of a vehicle fleet. High-intensity sections indicate areas where data points occur more frequently. The majority of the maneuvers have an initial lateral offset of 3 to 4 m from the center of the lane of the ego vehicle before changing lanes, which corresponds approximately to the average lane width of German motorways. After a lane change, the cut-in vehicles are often located in the middle of the ego lane. The highlighted curve shows an arbitrarily picked example trajectory, clearly demonstrating the large difference between the idealized lateral position behavior in Figure 1c and real maneuvers. This demonstrates the need for more realistic approximations.
For the safety assessment of ADASs, the generation of realistic maneuvers is not enough; their statistical properties also need to be retained. In the current approach, the parameters of the idealized maneuvers (Figure 1) are identified for a set of real measured drives on German highways (Figure 2) and evaluated statistically [6], as illustrated in Figure 3a. The extracted probability densities are then used to generate a sample of parameter sets of any required size by Monte Carlo sampling [7], reconstructing corresponding idealized maneuvers as time series and feeding a software-in-the-loop simulation model to evaluate the criticality of these maneuvers. The obtained criticalities can finally be analyzed to perform a safety assessment of the investigated ADAS software which, however, is only as reliable as the underlying maneuvers representing the real traffic situations. In summary, the following problems and sources of error are associated with this idealized method:
• Low fitting errors require highly complex mathematical models describing maneuvers.
• Increasing mathematical complexity increases the number of input parameters to be identified.
• Modeling effort increases with the number of logical scenarios to be investigated.
• The method is only applicable to clearly definable and separable driving situations.
Therefore, the present paper focuses on an alternative, AI-based approach for simulative validation of ADASs. In the newly developed methodology shown in Figure 3b, mathematical maneuver modeling is replaced by a generative AI model in the form of a variational autoencoder. This model independently learns an efficient representation of the measured data, where the encoder network maps the given real measuring signals $x$ into a low-dimensional space of latent parameters $z$, and the decoder network generates new but statistically identical samples of time series $\tilde{x}$. For safety assessment, Monte Carlo sampling may generate samples of latent variables $z$ to be transformed to time series $\tilde{x}$ by the decoder and used for simulation analogously to the state-of-the-art approach. However, the newly developed AI model is expected to learn all the major features of real driving data and represent them through latent distributions and the decoder network better than the currently used mathematical model with physical parameters.
In order to apply this approach, the measured data need to be preprocessed, which is shown in Section 2 for the cut-in scenario. Next, the variational autoencoder used for generating realistic cut-in maneuvers is described in Section 3. For the validation of statistical properties, the proposed model is applied to the cut-in scenario in Section 4. Finally, the approximations by the AI model are compared with the current approach.

Data Preprocessing
The quality of a machine learning model depends largely on the data basis used. Therefore, suitable measurement signals for the considered driving maneuver must first be selected and preprocessed. The data are based on ECU signals from the ego vehicle and sensor information on objects in its immediate field of view. The control unit detects and records position and kinematics of such objects, where the available measurement data include, among others, the position $(x_M, y_M)$ of an object relative to the ego vehicle and its absolute speed. The ECU has a sampling rate of 50 Hz, i.e., the time interval between two consecutive data points is 0.02 s.
In the context of this work, the position of an object is represented by so-called L-shapes derived from three corners framing the vehicle; see Figure 4a. From this simplified representation, the position $(x_M, y_M)$ of the center point relative to the center of the ego vehicle can be determined over the entire acquisition time. These raw signals (Figure 4b) need to be preprocessed in several ways: (i) correction of curved roads, (ii) smoothing and removal of dropouts, and (iii) downsampling of data points.
The position of an object relative to the ego vehicle is influenced by the course of the road. When a vehicle in front of the ego vehicle drives on a curve with radius $R$, the measured lateral distance $y_M$ does not equal the lateral lane distance to which an ADAS has to react; see Figure 4a.
In order to eliminate the curvature effect, geometric relations between the measured relative coordinates $(x_M, y_M)$ and the real longitudinal and radial distances $(s, d)$ may be considered (Equation (1)). From the ratio of the two relations, we obtain the angle $\varphi$ (Equation (2)) and, e.g., from the second relation in Equation (1), the correction formula for the lateral lane distance $d$ (Equation (3)). The required curve radius $R$ may be estimated from the time derivative of $s = R\varphi$, resulting in $R = v/\omega$ (Equation (4)), where the current ego velocity $v$ and angular velocity $\omega$ are typically known quantities. It should be noted that Equation (3) is valid for both right curves ($R > 0$) and left curves ($R < 0$).
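As a minimal sketch of this correction, the following Python functions assume one possible form of the geometric relations, namely $x_M = (R - d)\sin\varphi$ and $y_M = R - (R - d)\cos\varphi$, so that $\tan\varphi = x_M/(R - y_M)$ and $d = R - (R - y_M)/\cos\varphi$. The sign convention ($d$ counted positive toward the center of curvature) and the function names are assumptions made for illustration only.

```python
import numpy as np

def curve_radius(v, omega):
    """Estimate the curve radius from ego speed v [m/s] and yaw rate omega [rad/s].

    Follows R = v / omega; a straight road (omega == 0) is not handled here.
    """
    return v / omega

def correct_lateral_distance(x_m, y_m, radius):
    """Map measured (x_m, y_m) to longitudinal and lateral lane distances (s, d).

    Assumed relations (illustrative sign convention):
        x_m = (R - d) * sin(phi),   y_m = R - (R - d) * cos(phi),   s = R * phi
    """
    phi = np.arctan(x_m / (radius - y_m))      # ratio of the two relations
    d = radius - (radius - y_m) / np.cos(phi)  # from the second relation
    s = radius * phi                           # arc length along the lane
    return s, d
```

Using `np.arctan` rather than `np.arctan2` keeps $\varphi$ in $(-\pi/2, \pi/2)$, so the same expressions work for $R > 0$ and $R < 0$ under the assumed convention.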
The recording of data in real driving tests is carried out in continuous series of measurements. However, due to the dynamics of a traffic situation and a constantly changing vehicle environment, gaps, jumps, and superimposed high-frequency noise may appear, as shown in Figure 4. Signals with such perturbations are not suitable as training data for a generative AI model, and the perturbations must be removed [8,9]. In the following, signal snippets with a maximum length of $T = 20$ s are considered to ensure that a maneuver is complete. For signal gaps shorter than a tolerated length $\Delta t = \tau T$ (e.g., $\tau = 0.02$), linear interpolation between margin values is performed to reconstruct the missing signal values. While this approach can be assumed to be sufficient in the case of short signal gaps due to vehicle inertia, signals with larger gaps are discarded. In Figure 4b, two gaps are highlighted in gray and the corresponding interpolation in red.
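The gap handling can be sketched as follows. The sketch assumes that missing samples are encoded as NaN, that the tolerated gap length is converted to a sample count with the 0.02 s sampling interval, and that gaps touching the signal boundary are treated like long gaps because no margin value exists on one side. The function name and these conventions are illustrative assumptions.

```python
import numpy as np

def fill_short_gaps(signal, dt=0.02, T=20.0, tau=0.02):
    """Linearly interpolate NaN gaps of at most tau*T seconds.

    Returns the repaired signal, or None if the signal should be discarded
    (a gap is too long or touches the start or end of the signal).
    """
    x = np.asarray(signal, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    if missing[0] or missing[-1]:
        return None                                # no margin value on one side
    max_gap = int(round(tau * T / dt))             # tolerated gap in samples
    # Locate runs of consecutive NaNs via the edges of the boolean mask.
    padded = np.concatenate(([False], missing, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    if np.any(ends - starts > max_gap):
        return None                                # gap too long: discard
    idx = np.arange(x.size)
    return np.interp(idx, idx[~missing], x[~missing])
```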
In addition to missing measured values, implausible signal changes in particular cause a reduction in signal quality. Therefore, smoothing with a Savitzky-Golay filter is used, which performs a local polynomial regression. Here, a fourth-order polynomial is fitted within a window of fixed width of 13 signal values that slides through the signal [10]. After interpolation and application of the Savitzky-Golay filter, the red curve in Figure 4b no longer shows any discernible noise.
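To illustrate what the local polynomial regression does, the following NumPy sketch derives the filter weights for a window of 13 values and polynomial order 4 from a least-squares fit and applies them as a sliding window. Repeating the boundary values at the signal ends is an assumption of this sketch, as is the function name.

```python
import numpy as np

def savgol_smooth(signal, window=13, order=4):
    """Savitzky-Golay smoothing via a least-squares polynomial fit per window."""
    x = np.asarray(signal, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Columns are the polynomial basis 1, k, k^2, ... on the window offsets.
    basis = np.vander(offsets, order + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted value at the window centre.
    weights = np.linalg.pinv(basis)[0]
    padded = np.pad(x, half, mode="edge")          # assumed edge treatment
    # Reversing the weights turns the convolution into a sliding dot product.
    return np.convolve(padded, weights[::-1], mode="valid")
```

Where SciPy is available, `scipy.signal.savgol_filter(x, window_length=13, polyorder=4)` provides the same filter; only the treatment of the signal ends may differ from this sketch.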
For the majority of common machine learning algorithms, it is necessary that all input data have the same dimension, which should also not be too high. Filling missing signal values with predefined values such as zeros (zero padding) would correspond to incorrect maneuver characteristics. Therefore, and in order to reduce the number of data points, signals $x_j(t)$, $t \in [0, T_j]$, with measured length $T_j$ are downsampled to a fixed size of $N_t = 100$ data points by resampling the time series to $x_{ij} = x_j(t_i)$ at time points $t_i$, $i = 1, \ldots, N_t$, based on linear interpolation between measured data.
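A minimal sketch of this resampling step is given below. It assumes that the new time points are equally spaced over the measured duration, which is an assumption of this sketch; the function name is hypothetical.

```python
import numpy as np

def resample_fixed_length(t, x, n_points=100):
    """Resample a signal x(t) to n_points samples by linear interpolation."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    # Assumed: equidistant time points covering the whole measured interval.
    t_new = np.linspace(t[0], t[-1], n_points)
    return t_new, np.interp(t_new, t, x)
```

Because every maneuver is mapped to the same number of samples, signals of different duration $T_j$ end up with the same dimension without any padding.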

Variational Autoencoder for Generating Realistic Cut-In Maneuvers
The dataset presented in the previous section is used for training a generative AI model with the structure shown in Figure 5. Formally, this task can be described as follows: The training data $x$ result from an unknown distribution $p(x)$, which must be learned by the generative model in order to be able to generate new comparable data $\tilde{x}$ that follow the same probabilistics $p_\theta(\tilde{x}) \approx p(x)$. To generate new samples, a random sample $z \in \mathbb{R}^d$ is taken from a known distribution $p_z(z)$ and then transformed by the generator $D(z; \theta)$ with internal parameters $\theta$ to be learned. In principle, any generative model such as normalizing flows or generative adversarial networks may be used for this generative process. However, the different models are differently suited for specific tasks. Here, a variational autoencoder is used for the described process, which is briefly explained below.
A variational autoencoder (VAE) is a type of generative model that can learn to generate new data by capturing the underlying distribution of the training data [11,12]. VAEs are a variant of autoencoders consisting of an encoder and a decoder, as shown in Figure 5. The main idea behind VAEs is to learn a low-dimensional latent representation of the input data that captures the essential features and variations in the data. This latent representation can then be used to generate new samples that resemble the original data distribution [13].
More precisely, the encoder part $E(x; \psi)$ of a VAE maps the input data $x$ to a latent space representation $z$. It typically consists of several layers of a neural network that progressively reduce the dimensionality of the data, ultimately producing the mean $\mu$ and variance $\sigma$ of a multivariate Gaussian distribution in the latent space. The encoder can be represented as a function of $x$, generating the mean and the natural logarithm of the variance, i.e., $E(x; \psi) = [\mu^T, \ln\sigma^T]^T$ (6), where parameters $\psi$ summarize weights and biases of the artificial neural network. To generate a sample in the latent space, we need to obtain a latent vector $z$ that follows the desired distribution. However, to allow training by backpropagation and stochastic gradient descent, we cannot directly sample from the $z$-distribution. Instead, the reparametrization trick addresses this challenge by introducing a separate normally distributed random variable $\varepsilon \sim \mathcal{N}(0, I)$, which is drawn from a multivariate standard Gaussian distribution. The vector $z$ in the latent space is then obtained by $z = \mu + \sigma \otimes \varepsilon$ (7), where $\otimes$ is an element-wise multiplication of the two vectors $\sigma$ and $\varepsilon$.
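The reparametrization step can be sketched in a few lines of NumPy. The sketch assumes that the second half of the encoder output is used as $\ln\sigma$ in the sense of $\sigma = \exp(\ln\sigma)$; if a network outputs the logarithm of the variance $\sigma^2$ instead, the factor would be $\exp(0.5 \ln\sigma^2)$. The function name is hypothetical.

```python
import numpy as np

def reparameterize(mu, log_sigma, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I), element-wise.

    The randomness enters only through eps, so z stays a deterministic,
    differentiable function of the encoder outputs mu and log_sigma.
    """
    eps = rng.standard_normal(np.shape(mu))
    return mu + np.exp(log_sigma) * eps   # assumes sigma = exp(log_sigma)
```

For example, `reparameterize(mu, log_sigma, np.random.default_rng(0))` returns one latent vector per row of `mu`.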
The decoder part $D(z; \theta)$ of the VAE takes the latent vector $z$ and maps it back to the input space, aiming at a reconstruction of the original data. It also consists of several layers of an artificial neural network with weights and biases summarized in parameter vector $\theta$ to upsample the latent vector and eventually generate a reconstructed output $\tilde{x}$. Now, with both parts of the neural network architecture, an end-to-end training of the VAE by a joint optimization, also called evidence lower bound optimization (ELBO), can be performed [14]. Due to the random character of generating $z$, outputs $\tilde{x}$ and inputs $x$ cannot be compared on a one-to-one basis but only statistically. Therefore, the loss function consists of two terms: the reconstruction loss $\mathrm{MSE} = \mathrm{mean}_k \, \| \tilde{x}^{(k)} - x^{(k)} \|^2$, which measures how well the decoder can reconstruct the input data, and a regularization term, typically the Kullback-Leibler divergence ($D_{KL}$), which encourages the approximate posterior distribution $q_\psi(z \mid x)$ to be close to a prior distribution $p_z(z)$. During the training process, the parameters of the encoder ($\psi$) and decoder ($\theta$) networks are learned by minimizing this loss function using gradient descent or a similar optimization algorithm such as ADAM. Once the VAE is trained, it can generate new samples by sampling from the prior distribution $p_z(z)$ and passing them through the decoder only.
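The two loss terms can be illustrated with the following sketch. It assumes a standard Gaussian prior $p_z(z) = \mathcal{N}(0, I)$ and a diagonal Gaussian posterior, for which the Kullback-Leibler divergence has a closed form, and it takes the logarithm of the variance as input. The weighting factor `beta` and the function names are hypothetical and not taken from the description above.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """D_KL( N(mu, diag(exp(log_var))) || N(0, I) ) for each sample.

    mu, log_var: arrays of shape (batch, d); returns shape (batch,).
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)

def vae_loss(x, x_rec, mu, log_var, beta=1.0):
    """Reconstruction loss plus weighted KL regularization, averaged over the batch."""
    sample_axes = tuple(range(1, x.ndim))
    mse = np.mean(np.sum((x_rec - x) ** 2, axis=sample_axes))  # mean_k ||x~ - x||^2
    kl = np.mean(kl_to_standard_normal(mu, log_var))
    return mse + beta * kl   # beta = 1 is an assumed default weighting
```

The KL term vanishes exactly when the posterior equals the prior, i.e., for `mu = 0` and `log_var = 0`.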
Since the existing training data are multivariate time series signals, the model used must be able to correctly capture temporal dependencies within the data and take them into account when generating new data. Therefore, neural network topologies are needed that can recognize and learn temporal patterns in the training data. Both convolutional neural networks and recurrent neural networks may be considered for the design of the variational autoencoder, since both network structures have proven to be suitable for tasks related to time series signals [15,16]. In the following, convolutional neural networks are chosen, as shown in Figure 6.
The training data comprise $N_f = 3$ different features consisting of time series, namely time points $t_i$, lateral lane distances $d_i$, and velocities $v_i$ of the cut-in vehicle. The training dataset consists of samples $x^{(k)}$ with $k \in \{1, 2, \ldots, N_s\}$, shown in Figure 2. At this point, it should be mentioned that in the context of this work, the focus is on the lateral positions $d_i$.
In the first layer of the encoder, each of the three input channels is weighted with a sliding 3 × 1 convolution filter, again producing time series with 100 elements, which are then summed up and biased to produce a new channel. In total, 100 such operations are performed to finally end up with 100 intermediate results, which are then processed element-wise with the nonlinear Rectified Linear Unit (ReLU) $\sigma_{\mathrm{ReLU}}(x) = \max\{0, x\}$ as the activation function. The second and third layers operate in the same way with reduced numbers of filters, reducing the number of channels first to 50 and then to 25. The latter output is flattened to a 25 × 100 = 2500-dimensional vector and connected to the encoder output $[\mu^T, \ln\sigma^T]^T \in \mathbb{R}^{2d}$ by a fully connected network, where $d$ is the dimension of the latent space.
As shown in Figure 6, the decoder has an almost symmetric structure in reverse order. Instead of the convolutional layers, transposed convolutional layers are used. The output of the decoder network corresponds to the reconstruction of the sample that was used as input for the variational autoencoder. The random input $z \in \mathbb{R}^d$ is first enlarged to a 2500-dimensional vector by a fully connected layer, which is then reshaped to 25 channels of time series with 100 data points each. This is enlarged to 50 and then to 100 channels by transposed convolutions with 3 × 1 filters and ReLU activation functions. Finally, the 100 channels are reduced to three channels by convolution with 3 × 1 filters.
The total number of network parameters $\theta$ and $\psi$ of the variational autoencoder sums up to 227,603. Their training is carried out by an ADAM optimizer for a total of 700 epochs, where each epoch splits the total set of $N_s$ input examples into minibatches of size 32. In order to improve training performance [17], the features in Equation (9), which live on different scales, are initially normalized with respect to their signal amplitudes such that all features have the same range $[-1, 1]$.
For comparability of the output $\tilde{x}$ with this normalized input $x$, the function $\tanh(x) \in (-1, 1)$ is applied element-wise to the last layer as the activation function. In order to generate trajectories similar to the measured ones in Equation (9), these outputs $\tilde{x}$ need to be rescaled to the original signal ranges (Equation (11)).
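The normalization and the subsequent rescaling can be sketched as a pair of inverse mappings. The sketch assumes a min-max scaling per feature over the whole dataset, which is only one way to reach the range $[-1, 1]$; the exact formulas used for the cut-in data may differ. Array layout and function names are hypothetical.

```python
import numpy as np

def fit_ranges(data):
    """Per-feature minimum and maximum of data with shape (N_s, N_t, N_f)."""
    return data.min(axis=(0, 1)), data.max(axis=(0, 1))

def normalize(data, lo, hi):
    """Map each feature linearly from [lo, hi] to [-1, 1] (assumed scaling)."""
    return 2.0 * (data - lo) / (hi - lo) - 1.0

def rescale(normalized, lo, hi):
    """Inverse of normalize: map decoder outputs in (-1, 1) back to physical units."""
    return 0.5 * (normalized + 1.0) * (hi - lo) + lo
```

The ranges `lo` and `hi` are determined once from the training data and reused for rescaling generated trajectories.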

Statistical Validation of VAE
The result of applying the above concept to cut-in maneuvers with latent space dimension $d = 10$ is not only an optimized encoder and, especially, an optimized decoder for generating realistic maneuvers but also a sample $\mu^{(k)}, \ln\sigma^{(k)}$, $k = 1, \ldots, N_s$, with stochastic properties representing those of the input shapes $x_{ij}^{(k)}$, $k = 1, \ldots, N_s$. Kernel Density Estimation (KDE) [17,18] may be used to fit coordinate-wise probability densities $p_\mu(\mu_m)$ and $p_\sigma(\ln\sigma_m)$, $m = 1, \ldots, d$, to the elements of the encoder output in Figure 6. Some examples of these probability densities are shown in Figure 7.
These density functions highlight that the encoder outputs for mean (Figure 7a) and logarithmic variance (Figure 7b) do not always follow a standard Gaussian distribution. Only $\mu_1$ and $\mu_8$ approximately follow a Gaussian distribution, whereas, e.g., $\ln\sigma_{10}$ looks like a mixture of two Gaussian distributions.
The process of generating statistically correct approximation samples $\{\tilde{x}_i^{(k)}, k = 1, \ldots, N_s\}$, which is important for correct failure assessment of ADASs, is then as follows: (i) Choose $\mu^{(k)}$ and $\sigma^{(k)}$ according to their densities like those in Figure 7; (ii) Choose $\varepsilon \sim \mathcal{N}(0, I)$ according to a standard Gaussian distribution; (iii) Superpose these quantities according to Equation (7) to obtain a sample $\{z^{(k)}, k = 1, \ldots, N_s\}$ of latent variables; (iv) Transform these latent variables with the decoder into normalized trajectories $\{\tilde{x}^{(k)} = D(z^{(k)}; \theta), k = 1, \ldots, N_s\}$; (v) Rescale these trajectories by Equation (11).
For the trajectories generated in this way, the high-intensity areas of the lateral position over time match very well with those of the real driving data. From this visual inspection, it can already be concluded that the probability of occurrence for specific driving situations is learned by the model and maintained when generating new data samples. In order to check quantitatively whether the synthetic data have equal statistical properties, density functions for the lateral position are determined with a KDE for three different time strips of 1 s width. Figure 8b shows that the statistical properties are almost equal for the time periods $t_A$, $t_B$, and $t_C$. This property is essential for the validation of ADASs on the basis of a randomized generation of test cases. If, for example, the model gave preference to more critical driving situations, which would then account for a much larger proportion of all generated test cases than is the case in the real driving data, the probability of failure of ADASs would be overestimated.
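The generation steps (i) to (iv) can be sketched as follows; rescaling as in step (v) would follow on the returned trajectories. The sketch draws from coordinate-wise Gaussian kernel density estimates by picking a random encoder output as kernel center and adding Gaussian noise with the kernel bandwidth. The bandwidth rule, the function names, and the `decoder` callable standing in for the trained network are assumptions for illustration.

```python
import numpy as np

def sample_kde(values, n, rng):
    """Draw n samples per coordinate from 1-D Gaussian KDEs.

    values: encoder outputs of shape (N_s, d); each column gets its own KDE.
    """
    n_s, d = values.shape
    # Scott-type rule of thumb for the bandwidth (an assumed choice).
    h = values.std(axis=0, ddof=1) * n_s ** (-1.0 / 5.0)
    idx = rng.integers(0, n_s, size=(n, d))             # kernel centre per coordinate
    centres = np.take_along_axis(values, idx, axis=0)
    return centres + h * rng.standard_normal((n, d))

def generate(mu_enc, log_sigma_enc, decoder, n, rng):
    """Steps (i)-(iv): sample mu and sigma, draw eps, build z, and decode."""
    mu = sample_kde(mu_enc, n, rng)                      # (i) mean
    sigma = np.exp(sample_kde(log_sigma_enc, n, rng))    # (i) sigma from ln(sigma)
    eps = rng.standard_normal(mu.shape)                  # (ii)
    z = mu + sigma * eps                                 # (iii), Equation (7)
    return decoder(z)                                    # (iv) normalized trajectories
```

Sampling each coordinate independently mirrors the coordinate-wise densities $p_\mu(\mu_m)$ and $p_\sigma(\ln\sigma_m)$; correlations between latent coordinates are not modeled in this sketch.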
For a check of the statistical properties with respect to time behavior, another analysis is applied. For every data point $d_i$ of the lateral distance time series, the mean and variance are estimated. The resulting curves for mean and variance of the lateral position $d$ are illustrated in Figure 9. The filled areas mark the intervals $[\mu_{d,i} - \sigma_{d,i}, \mu_{d,i} + \sigma_{d,i}]$, $i = 1, \ldots, N_t$, for the measured cut-in data (Figure 9a) and the generated trajectories (Figure 9b). At the beginning of the maneuvers, the variation is higher than at the end, since the start point of the cut-in maneuver can differ within the lane widths on highways, whereas the goal of each maneuver is to reach the center of the ego lane at the end. Nevertheless, the comparison of measured and AI-generated trajectories highlights that the regions along time are almost identical, and no differences are visible.
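The band computation behind this comparison can be sketched as below, assuming the lateral distances of all maneuvers are stacked into one array with a row per maneuver and a column per time index. The unbiased standard deviation estimate and the function name are choices of this sketch.

```python
import numpy as np

def mean_std_band(d):
    """Pointwise mean and one-sigma band of lateral distances.

    d: array of shape (N_s, N_t), one resampled trajectory per row.
    Returns (mean, lower, upper), each of shape (N_t,).
    """
    mu = d.mean(axis=0)
    sigma = d.std(axis=0, ddof=1)   # unbiased estimate over the maneuvers
    return mu, mu - sigma, mu + sigma
```

Applying the function to the measured and to the generated dataset yields the two bands that can be compared time index by time index.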
Next, we investigate whether the generated trajectories not only fit the statistical properties but also reflect the time characteristics of, e.g., the real cut-in maneuver highlighted in Figure 2. Unfortunately, a direct comparison of a generated trajectory with a given real one is not possible due to the probabilistic character of VAEs. Therefore, the nearest neighbor of the real trajectory within the generated dataset $\{x^{(k)}\}$ is searched, where the Euclidean distances of all data points of the time series are used as the distance measure. Figure 10 shows the original trajectory from Figure 2 (blue dashed) and its closest neighbor (red) among all the synthetic motion curves in the generated dataset, which is very similar to the measured cut-in trajectory.
It is particularly important to emphasize that this is a randomly generated curve and not just a reconstruction of the real trajectory determined, e.g., with a deterministic autoencoder. From the comparison shown, it can be concluded that the generative model is able to depict the real lateral trajectory more realistically than, e.g., the mathematical model [6], which uses a third-order polynomial (black) and about 20 parameters, even though the latter curve was found by optimization as the closest approximation of the given measured trajectory. While the mathematical model can only depict the basic course of the lane change, the generative model can reproduce all the specific characteristics of the maneuvers. With about the same number of parameters, namely µ and σ, the AI model can apparently depict the real driving data more realistically than the mathematical model currently used.
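The nearest-neighbor search used for this comparison can be sketched as follows. The sketch assumes that the distance measure is the Euclidean norm of the pointwise differences between two trajectories sampled on the same time grid, which is one plausible reading of the description above. The function names and the numerical values are illustrative only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long time series."""
    if len(a) != len(b):
        raise ValueError("trajectories must have the same number of samples")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor(reference, candidates):
    """Return (index, distance) of the candidate closest to the reference."""
    best = min(range(len(candidates)),
               key=lambda k: euclidean(reference, candidates[k]))
    return best, euclidean(reference, candidates[best])


# Made-up example: one "measured" trajectory and three "generated" ones.
measured = [3.0, 2.0, 0.5, 0.0]
generated = [
    [3.5, 3.0, 1.5, 0.2],
    [3.1, 1.9, 0.6, 0.0],
    [2.0, 1.0, 0.0, 0.0],
]
index, distance = nearest_neighbor(measured, generated)
print(index, round(distance, 4))
```

Because the search is a simple linear scan, its cost grows linearly with the number of generated trajectories, which is unproblematic for datasets of the size discussed here.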

Conclusions
Simulative validation plays a major role in the validation of highly automated driver assistance systems, such as the Drive Pilot from Mercedes-Benz AG, which was the first Level-3 system (according to the SAE standard) to be approved in Germany. This requires the generation of safety-relevant driving situations with the same characteristic and stochastic properties as real traffic scenarios. The currently applied idealized models are not able to fulfill these needs, whereas the AI-based concept presented in this paper can generate highly variable maneuvers of the same motion type and with the same probabilities as those observed in reality. Statistical validation has shown that both the time characteristics and the statistical properties fit those of the input data. The concept only requires the training of a variational autoencoder with measured data; it needs neither physical understanding nor high-level mathematical modeling effort. Therefore, it may be applied to any other common driving scenario, such as cut-out or cut-through scenarios, and used as a basic tool to support the further development of highly automated driving functions.

Patents
A patent application is in preliminary examination.

Figure 1 .
Figure 1. Cut-in scenario: (a) image sequence, (b) idealized parameterized maneuver, and (c) idealized example trajectory of lateral displacement with reference to ego lane.

Figure 2 .
Figure 2. Variety of measured cut-in maneuvers with one real example trajectory for lateral displacement being highlighted.

Version September 16, 2023 submitted to Appl. Mech.

Figure 3 .
Figure 3. Generation of cut-in maneuvers by (a) idealized causal model and (b) AI model based on Monte Carlo simulation to determine the criticality of driver assistance systems.

Figure 4 .
Figure 4. Data preprocessing: (a) correction of curved trajectory and (b) smoothing and interpolation of measured signal.

Figure 5 .
Figure 5. Structure of used VAE with dimension d of latent space.

Figure 6 .
Figure 6. Structure of used VAE with dimension d of latent space.

Figure 8a shows a set of randomly generated trajectories, equal in number to the measured trajectories in Figure 2.

Figure 8 .
Figure 8. (a) Lateral trajectories generated by the decoder of the trained VAE and (b) corresponding densities of real (underlying gray line) and generated (red line) trajectories for time strips of 1 s width at various time instances.

Figure 9 .
Figure 9. Mean and variation of (a) measured and (b) generated trajectories.