Generation of Multiple Types of Driving Scenarios with Variational Autoencoders for Autonomous Driving
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper proposes a unified Variational Autoencoder framework to generate six types of driving maneuvers for autonomous driving validation. Evaluations demonstrate that the unified model preserves statistical properties of real data, achieves comparable performance to individual maneuver-specific VAEs, and reduces modeling complexity. In general, it presents a viable and concrete contribution to the related topic. Only a minor revision is recommended before accepting for this journal publication. Detailed comments are given as follows.
- The paper is generally well-written, but inconsistent terminology may cause confusion. For instance, "combined VAE" and "unified VAE" are used interchangeably. Standardizing terms would improve clarity. Additionally, "physical space" lacks definition—specify whether it refers to trajectory coordinates or feature distributions.
- Quantitative metrics robustly validate trajectory realism and distribution alignment. However, critical tests for safety assessment are missing. No analysis of generated trajectories violating physical constraints, e.g., abrupt acceleration.
- Cut-through maneuvers show higher MSE, likely due to data scarcity—yet no ablation study confirms this. The unified model’s inference speed vs. individual VAEs is unreported, despite "scalability" being a key claim.
- The unified framework’s ability to handle six maneuvers in one model is a practical advance for industry testing. However, the novelty is incremental: conditional VAEs for multi-class generation are well-established. The paper’s core innovation lies in demonstrating this for driving scenarios with shared latent structure, but deeper discussion of how latent clusters enable "controllable" generation is needed to distinguish it from prior VAE extensions.
- The latest literature on autonomous driving should be reviewed for comprehensiveness. For instance, The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns; A review on reinforcement learning-based highway autonomous vehicle control; Deep transfer learning for intelligent vehicle perception: A survey.
- Figures should be revised to make them more readable and informative. For instance, the font sizes and line widths in all the figures should be identical and large enough to see. Line style rather than line color should be used to differentiate the curves within each figure as the paper may be available at black-and-white print once accepted for publication.
- Table 2 notes reduced batch sizes for CTL/CTR due to data imbalance but lacks empirical validation, e.g., training curves showing instability. The effect of λ_pred on classification accuracy is also unexplored—varying this weight could reveal trade-offs between trajectory quality and maneuver-type fidelity.
Author Response
This paper proposes a unified Variational Autoencoder framework to generate six types of driving maneuvers for autonomous driving validation. Evaluations demonstrate that the unified model preserves statistical properties of real data, achieves comparable performance to individual maneuver-specific VAEs, and reduces modeling complexity. In general, it presents a viable and concrete contribution to the related topic. Only a minor revision is recommended before accepting for this journal publication. Detailed comments are given as follows.
- The paper is generally well-written, but inconsistent terminology may cause confusion. For instance, "combined VAE" and "unified VAE" are used interchangeably. Standardizing terms would improve clarity. Additionally, "physical space" lacks definition—specify whether it refers to trajectory coordinates or feature distributions.
Comment: Thank you for the comment. We have addressed the consistency issues and included a definition of physical space.
- Quantitative metrics robustly validate trajectory realism and distribution alignment. However, critical tests for safety assessment are missing. No analysis of generated trajectories violating physical constraints, e.g., abrupt acceleration.
Comment: Thank you for the comment; however, addressing these aspects would exceed the scope of the current paper. The specific intention here is to develop a simple framework for generating trajectories with the same characteristics as real measured data. As this is almost perfectly achieved, the safety assessment method addressed in one of our earlier papers would produce reliable results. Although extreme maneuvers may be interesting from a theoretical point of view, as long as they are not part of real data they would deteriorate the certification process of ADAS.
- Cut-through maneuvers show higher MSE, likely due to data scarcity—yet no ablation study confirms this. The unified model’s inference speed vs. individual VAEs is unreported, despite "scalability" being a key claim.
Comment: The authors thank the reviewer for the observation. We agree that the MSE for cut-through maneuvers is slightly higher. However, the absolute reconstruction error remains low and within acceptable bounds. Given the marginal difference, we did not conduct a full ablation, but we acknowledge this as a potential area for further study.
To address the comment on inference speed, we have now included a direct comparison in the manuscript.
- The unified framework’s ability to handle six maneuvers in one model is a practical advance for industry testing. However, the novelty is incremental: conditional VAEs for multi-class generation are well-established. The paper’s core innovation lies in demonstrating this for driving scenarios with shared latent structure, but deeper discussion of how latent clusters enable "controllable" generation is needed to distinguish it from prior VAE extensions.
Comment: The authors fully agree that VAEs in general are well established and that the intention here is to make them applicable to a specific, but very important, engineering task. As discussed in the conclusions, controllability of specific maneuvers is already part of our current research, but especially the statistically correct generation at the borders of clusters in the latent space and its correlation with a correct safety assessment need more investigation before they can be published in an upcoming paper.
- The latest literature on autonomous driving should be reviewed for comprehensiveness. For instance, The investigation of reinforcement learning-based end-to-end decision-making algorithms for autonomous driving on the road with consecutive sharp turns; A review on reinforcement learning-based highway autonomous vehicle control; Deep transfer learning for intelligent vehicle perception: A survey.
Comment: Thank you for pointing out these works. Since our study treats the ADAS functionality as a black box, the underlying implementation (rule-based or neural network) does not affect our validation process. Nevertheless, in the revised paper we cited recent literature for completeness.
- Figures should be revised to make them more readable and informative. For instance, the font sizes and line widths in all the figures should be identical and large enough to see. Line style rather than line color should be used to differentiate the curves within each figure as the paper may be available at black-and-white print once accepted for publication.
Comment: In principle, the authors agree with this suggestion. However, as MDPI prints figures in color and color displays and printers are now ubiquitous, we prefer to keep line colors for clarity. Font sizes are now more consistent with the text, and the figure format follows MDPI standards, in line with previous publications.
- Table 2 notes reduced batch sizes for CTL/CTR due to data imbalance but lacks empirical validation, e.g., training curves showing instability. The effect of λ_pred on classification accuracy is also unexplored—varying this weight could reveal trade-offs between trajectory quality and maneuver-type fidelity.
Comment: Thank you for this insightful comment. We acknowledge the value of a deeper analysis, but as our focus is on demonstrating the unified model’s applicability and performance, detailed studies on hyperparameter sensitivity were kept out of scope. The selected values were chosen through grid search optimization and found to offer stable and effective performance across scenarios.
Reviewer 2 Report
Comments and Suggestions for Authors
The paper addresses the important issue of autonomous driving on highways and how scenarios for ADS validation can be designed. The authors propose a new method for generating realistic scenarios that the ADS will have to face on a highway, addressing six maneuver cases where the ADS will respond to avoid collisions and accidents. The main innovation is, first, that they propose an integrated scenario generation, and that they rely on real data for deriving the probabilities that each scenario will appear. There is a lot of work done in this domain, as the authors acknowledge, highway behaviour being an important theme in ADS development. They are thus addressing a known problem, and they provide a new method. The originality of the proposed work is the integration of the scenarios in one model. This is interesting but, in my opinion, not sufficient as is.
As you say (line 315): "Experimental results show that the unified model achieves the same high performance as individually trained models, while preserving correct probabilities of occurrence of the different maneuver types."
So, in the current state the results are the same (at most!). So, I wonder what the advantage of your method (as it is now) is. The future extensions will indeed provide a new perspective (physical parameters, which I expect to be something like wet road; am I correct?).
As you say, you are based on real data from highways. Indeed, these six maneuver types are the most common. However, I am sure that in the available data you also have rarer cases of more complex situations. Can these be integrated into the model? It would be good to describe these situations, which the standard models do not take into account.
Author Response
The paper addresses the important issue of autonomous driving on highways and how scenarios for ADS validation can be designed. The authors propose a new method for generating realistic scenarios that the ADS will have to face on a highway, addressing six maneuver cases where the ADS will respond to avoid collisions and accidents. The main innovation is, first, that they propose an integrated scenario generation, and that they rely on real data for deriving the probabilities that each scenario will appear. There is a lot of work done in this domain, as the authors acknowledge, highway behaviour being an important theme in ADS development. They are thus addressing a known problem, and they provide a new method. The originality of the proposed work is the integration of the scenarios in one model. This is interesting but, in my opinion, not sufficient as is.
As you say (line 315): "Experimental results show that the unified model achieves the same high performance as individually trained models, while preserving correct probabilities of occurrence of the different maneuver types."
So, in the current state the results are the same (at most!). So, I wonder what the advantage of your method (as it is now) is. The future extensions will indeed provide a new perspective (physical parameters, which I expect to be something like wet road; am I correct?).
Comment: As pointed out in the abstract and introduction, the advantages of the unified framework are its efficiency, consistency, and scalability compared to multiple specific VAEs. The revised paper makes this even clearer.
As you say, you are based on real data from highways. Indeed, these six maneuver types are the most common. However, I am sure that in the available data you also have rarer cases of more complex situations. Can these be integrated into the model? It would be good to describe these situations, which the standard models do not take into account.
Comment: Thank you for this important comment. While our current focus is on frequently observed highway maneuvers, rare scenarios, such as pedestrian or obstacle avoidance, are not well represented in the dataset and still require expert-driven description. Nonetheless, integrating such cases into a generative framework is a good direction for future work, and we see potential in extending our model to support such hybrid approaches.
Reviewer 3 Report
Comments and Suggestions for Authors
Strengths
The paper addresses an important problem in the testing and validation of autonomous driving systems, namely the generation of diverse driving scenarios using variational autoencoders. The idea of unifying multiple maneuver types (cut-in, cut-out, cut-through) within a single Variational Autoencoder (VAE) framework is novel in this application domain and contributes to scalability compared to training separate models. The authors provide a clear description of their model architecture, use real-world data, and include both qualitative and quantitative evaluations (heatmaps, MSE, KDE, clustering). The discussion of future work on controllability and multi-agent dynamics points to promising directions.
Weaknesses
- Related work: The literature review is limited. Beyond the authors’ own prior work, few external papers are discussed, especially in relation to diffusion models and conditional generative approaches that are strong alternatives to VAEs.
- Paper structure: There is no clear separation between Methods and Materials, Evaluation, and Discussion. Model architecture, evaluation setup, and results are intermixed, making it difficult for the reader to follow. Reflection on threats to validity is missing.
- Technical clarity:
  - The role of the term ε ~ N(0, I) in the VAE is not consistently explained. It appears in the individual VAE architecture but is only discussed in detail when introducing the unified VAE.
  - It is unclear whether the prior p(z) being a multivariate Gaussian is an assumption (as in standard VAEs) or derived empirically from real-world data. If it is data-driven, this should be emphasized.
  - Some results of the unified model are presented before the unified model is introduced, which disrupts the logical flow.
- Evaluation methodology:
  - Evaluation is mixed with the architecture description. Typically, one would expect a hypothesis and then a quantitative confirmation or falsification. Here, the general purpose of each evaluation method is explained, but the extent to which the purpose was achieved is left to the reader’s interpretation.
  - For the cut-through (CT) maneuvers, higher validation errors are attributed vaguely to “training dynamics.” It would be useful to clarify what training dynamics are meant, how early stopping patience plays a role, and to what extent the lower probability distribution of occurrence for CT classes influences performance.
  - Figure 8 shows that the frequency distribution of generated maneuvers matches the training data very closely. Does this indicate potential overfitting? If not, why is overfitting not an issue in this case?
  - The claim that the unified model achieves the same high performance as the separate models should be rephrased as “similarly high performance,” since the numerical results are not identical.
- Reproducibility: The dataset cannot be shared due to OEM restrictions, which severely limits reproducibility. At minimum, a more detailed description of preprocessing, training setup, and computational cost should be provided.
- Limitations:
  - The approach only considers single surrounding vehicles, not multi-agent interactions, which are critical for safety validation.
  - The generated scenarios appear realistic, but it remains unclear whether they cover rare, safety-critical cases.
  - Scalability claims are made, but training time and resource requirements are not reported.
  - There is no discussion of controllability of generated outputs (beyond the future work section), although simple conditioning (e.g., lane, speed) could have been demonstrated.
- Some of the points above are mentioned briefly in the section Conclusion, but actually I think that the paper would benefit from having a separate subsection discussing the limitations of the approach, instead of the short paragraph in the Conclusion about this.
Suggestions for Improvement
- Strengthen the related work section by discussing recent advances with diffusion models, conditional VAEs, and scenario generation frameworks beyond the authors’ prior work.
- Clearly separate Methods/Materials, Evaluation, and Discussion of Results, and add a section on Threats to Validity (dataset imbalance, OEM data restrictions, generalization beyond highways, etc.).
- Clarify the mathematical notation: explain the role of ε ~ N(0, I) consistently, and specify whether p(z) is an assumption or data-driven.
- Improve the evaluation design: formulate hypotheses (e.g., “the unified VAE preserves statistical distributions across all maneuver classes”) and test them explicitly. Emphasize more explicitly the issues (if any) brought on by the CT class imbalance and training dynamics. Also address the question of overfitting.
- Report computational aspects: training time, hardware used, number of parameters, and scalability in practice.
- Discuss practical implications: how the generated scenarios can be integrated into established AV safety validation pipelines (PEGASUS, ISO 34502).
- Add a discussion of the approach's limitations and future work.
Author Response
Strengths
The paper addresses an important problem in the testing and validation of autonomous driving systems, namely the generation of diverse driving scenarios using variational autoencoders. The idea of unifying multiple maneuver types (cut-in, cut-out, cut-through) within a single Variational Autoencoder (VAE) framework is novel in this application domain and contributes to scalability compared to training separate models. The authors provide a clear description of their model architecture, use real-world data, and include both qualitative and quantitative evaluations (heatmaps, MSE, KDE, clustering). The discussion of future work on controllability and multi-agent dynamics points to promising directions.
Weaknesses
- Related work: The literature review is limited. Beyond the authors’ own prior work, few external papers are discussed, especially in relation to diffusion models and conditional generative approaches that are strong alternatives to VAEs.
- Paper structure: There is no clear separation between Methods and Materials, Evaluation, and Discussion. Model architecture, evaluation setup, and results are intermixed, making it difficult for the reader to follow. Reflection on threats to validity is missing.
- Technical clarity:
  - The role of the term ε ~ N(0, I) in the VAE is not consistently explained. It appears in the individual VAE architecture but is only discussed in detail when introducing the unified VAE.
  - It is unclear whether the prior p(z) being a multivariate Gaussian is an assumption (as in standard VAEs) or derived empirically from real-world data. If it is data-driven, this should be emphasized.
  - Some results of the unified model are presented before the unified model is introduced, which disrupts the logical flow.
- Evaluation methodology:
  - Evaluation is mixed with the architecture description. Typically, one would expect a hypothesis and then a quantitative confirmation or falsification. Here, the general purpose of each evaluation method is explained, but the extent to which the purpose was achieved is left to the reader’s interpretation.
  - For the cut-through (CT) maneuvers, higher validation errors are attributed vaguely to “training dynamics.” It would be useful to clarify what training dynamics are meant, how early stopping patience plays a role, and to what extent the lower probability distribution of occurrence for CT classes influences performance.
  - Figure 8 shows that the frequency distribution of generated maneuvers matches the training data very closely. Does this indicate potential overfitting? If not, why is overfitting not an issue in this case?
  - The claim that the unified model achieves the same high performance as the separate models should be rephrased as “similarly high performance,” since the numerical results are not identical.
- Reproducibility: The dataset cannot be shared due to OEM restrictions, which severely limits reproducibility. At minimum, a more detailed description of preprocessing, training setup, and computational cost should be provided.
- Limitations:
  - The approach only considers single surrounding vehicles, not multi-agent interactions, which are critical for safety validation.
  - The generated scenarios appear realistic, but it remains unclear whether they cover rare, safety-critical cases.
  - Scalability claims are made, but training time and resource requirements are not reported.
  - There is no discussion of controllability of generated outputs (beyond the future work section), although simple conditioning (e.g., lane, speed) could have been demonstrated.
- Some of the points above are mentioned briefly in the section Conclusion, but actually I think that the paper would benefit from having a separate subsection discussing the limitations of the approach, instead of the short paragraph in the Conclusion about this.
Comment: The authors appreciate the very detailed and fully correct assessment of the paper; thank you very much! In the revised paper we tried to address the criticized weaknesses as far as possible within a single short paper.
Suggestions for Improvement
- Strengthen the related work section by discussing recent advances with diffusion models, conditional VAEs, and scenario generation frameworks beyond the authors’ prior work.
Comment: Thank you for the suggestion. We have extended the related work section to include the mentioned alternatives.
- Clearly separate Methods/Materials, Evaluation, and Discussion of Results, and add a section on Threats to Validity (dataset imbalance, OEM data restrictions, generalization beyond highways, etc.).
Comment: Thank you for the suggestion. We retained the current structure due to page limitations and formatting constraints.
- Clarify the mathematical notation: explain the role of ε ~ N(0, I) consistently, and specify whether p(z) is an assumption or data-driven.
Comment: Thank you for pointing this out. We clarified the mathematical notation, explained the role of ε ~ N(0, I) consistently, and specified p(z).
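For readers less familiar with the notation discussed here, the standard VAE reparameterization step that ε ~ N(0, I) belongs to can be sketched as follows. This is a minimal, generic illustration, not the implementation used in the paper; the names `mu`, `log_var`, and `reparameterize` are placeholders:

```python
import numpy as np

def reparameterize(mu, log_var, rng=np.random.default_rng(0)):
    """Standard VAE reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).

    Drawing eps from a fixed standard normal moves the randomness out of the
    learnable path, so gradients can flow through mu and sigma during training.
    """
    eps = rng.standard_normal(mu.shape)   # eps ~ N(0, I)
    sigma = np.exp(0.5 * log_var)         # encoder outputs log-variance; convert to std dev
    return mu + sigma * eps

# Example: a batch of 2 samples with a 3-dimensional latent space
mu = np.zeros((2, 3))
log_var = np.zeros((2, 3))                # sigma = 1 everywhere
z = reparameterize(mu, log_var)
print(z.shape)                            # (2, 3)
```

With this formulation, the prior p(z) = N(0, I) is an assumption baked into the KL term of the standard VAE objective, which is precisely why it matters whether the paper treats it as such or fits it to data.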
- Improve the evaluation design: formulate hypotheses (e.g., “the unified VAE preserves statistical distributions across all maneuver classes”) and test them explicitly. Emphasize more explicitly the issues (if any) brought on by the CT class imbalance, training dynamics. Also address the question of overfitting.
Comment: Thank you for the comment. We revised the introduction to clarify the evaluation design and added a discussion of the training dynamics. We monitored overfitting by measuring the MSE on a validation dataset.
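The kind of validation-based monitoring with early stopping patience mentioned in this exchange can be sketched as follows. This is a hypothetical illustration, not the authors' training code; `early_stopping_monitor` and the history values are invented for the example:

```python
def early_stopping_monitor(val_mse_history, patience=10):
    """Return the epoch at which training should stop, i.e. the first epoch
    where validation MSE has not improved for `patience` consecutive epochs.
    Returns None if no stop is triggered over the given history."""
    best = float("inf")
    best_epoch = 0
    for epoch, mse in enumerate(val_mse_history):
        if mse < best:
            best, best_epoch = mse, epoch   # new best: reset the patience window
        elif epoch - best_epoch >= patience:
            return epoch                     # no improvement for `patience` epochs
    return None

# Example: validation MSE stops improving after epoch 3
history = [1.0, 0.5, 0.3, 0.2, 0.21, 0.22, 0.22, 0.23, 0.22, 0.23,
           0.24, 0.22, 0.23, 0.25]
print(early_stopping_monitor(history, patience=10))  # 13
```

A growing gap between training and validation MSE in such a history is the usual overfitting signal; a stable validation MSE, as reported by the authors, suggests it was not an issue here.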
- Report computational aspects: training time, hardware used, number of parameters, and scalability in practice.
Comment: Thank you for the suggestion. We added details on training time, hardware used, number of parameters, and scalability.
- Discuss practical implications: how the generated scenarios can be integrated into established AV safety validation pipelines (PEGASUS, ISO 34502).
Comment: Thank you for the comment. The described AI-based scenario modelling can be used in scenario-based simulation approaches (analogous to PEGASUS, ISO 34502) as a substitute for the mathematical modelling of driving maneuvers such as cut-in, cut-out, and cut-through that is currently in use. In the paper, we refer to our previous work [23], where this point is explained in detail (comparing the mathematical approach with the AI-based approach).
- Add a discussion of the approach's limitations and future work.
Comment: Thank you for the suggestion. However, instead of adding a separate section, we extended the conclusion section with this very important aspect of explicitly naming the limitations.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
Thank you for the updates and the clarifications. It is much clearer now.
Reviewer 3 Report
Comments and Suggestions for Authors
Thank you for your response and taking care of the issues pointed out in the review.
I would accept the paper in its present form.

