A Reproducible Reference Architecture for Automated Driving Scenario Databases
Abstract
1. Introduction
1.1. Problem Statement and Research Gap
- Fragmented end-to-end pipelines. Scenario generation, ingestion, validation, curation, and enrichment are often handled by separate tools or processes. While this is natural in complex toolchains, it can lead to heterogeneous schemas, implicit assumptions, and limited traceability of how scenarios were created, modified, or validated over time.
- Limited modeling of value-constrained querying. Many repositories support tag- or keyword-based retrieval. More structured constraints (e.g., numeric ranges, parameter thresholds, or attribute-value constraints) are not always modeled as explicit, indexed metadata aligned with standards. This can make precise retrieval more difficult when users require both semantic concepts and quantitative filters for targeted validation campaigns.
1.2. SCDB Usage Contexts, Interoperability, and Federation
1.3. Contributions
- Reference architecture and data model. We formalize an SCDB lifecycle (generation/ingestion, validation, curation, storage, indexing, query, export) and define a data model that treats OpenX artifacts and OpenLABEL metadata as first-class assets with explicit provenance and schema/version control.
- Standards-oriented generation and packaging. We describe a pipeline translating declarative scenario specifications (ODD descriptors, road topology, behavioral templates) into OpenDRIVE/OpenSCENARIO artifacts with OpenLABEL annotations and reproducible manifests.
- OpenLABEL-driven querying with value constraints. We define querying as OpenLABEL-based matching over categorical tags and value-carrying tag_data (numeric and textual), implemented through an explicit tag–scenario association index for scalable retrieval.
- Cloud-native, IaC deployment. We provide an Infrastructure-as-Code (IaC) deployment blueprint enabling reproducible provisioning and low operational overhead, supporting both local adoption and future interoperability patterns.
2. Related Work
3. Reference Architecture and Data Model
3.1. Architectural Requirements
- R1: Standards-first artifact exchange and management. Scenario artifacts and their semantics must use open standards (OpenDRIVE, OpenSCENARIO, and OpenLABEL) without toolchain-specific coupling. The reference implementation stores versioned, standards-compliant scenario packages to maximize interoperability and reproducibility; however, this is an implementation choice rather than an architectural requirement.
- R2: Lifecycle governance. Validation outcomes and curation state must be explicit, queryable, and enforced to prevent accidental reuse of invalid or deprecated scenarios.
- R3: Structured querying. The system must support semantic querying via standardized descriptors and value-constrained querying via numeric and textual tag_data, including combined predicates.
- R4: Reproducibility. Scenario creation, deployment, and export must be reproducible through explicit provenance, versioning, and Infrastructure-as-Code.
- R5: Extensibility. New vocabularies, ontologies, tag schemes, and future enrichment services must be integrable without refactoring the core data model.
3.2. Conceptual SCDB Workflow
- Create/Ingest: Generation or ingestion of scenario packages and metadata.
- Validate: Syntactic and semantic validation of artifacts and descriptors.
- Curate: Assignment of lifecycle state, versioning, and deprecation.
- Store: Persistence of immutable scenario packages and structured metadata.
- Index: Construction of semantic and value indexes for querying.
- Query: Retrieval using combined semantic and value predicates.
- Export: Delivery of reproducible scenario bundles with manifests.
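The Curate and Export stages above imply that lifecycle state is enforced rather than merely recorded. A minimal sketch of such enforcement follows. The state names and the transition policy are illustrative assumptions and are not taken from the paper, which only refers to published, invalid, and deprecated scenarios.

```python
from enum import Enum


class CurationState(Enum):
    # Hypothetical state names chosen for illustration.
    INGESTED = "ingested"
    VALIDATED = "validated"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    REJECTED = "rejected"


# Hypothetical transition policy: which states may follow which.
ALLOWED_TRANSITIONS = {
    CurationState.INGESTED: {CurationState.VALIDATED, CurationState.REJECTED},
    CurationState.VALIDATED: {CurationState.PUBLISHED, CurationState.REJECTED},
    CurationState.PUBLISHED: {CurationState.DEPRECATED},
    CurationState.DEPRECATED: set(),
    CurationState.REJECTED: set(),
}


def transition(current: CurationState, target: CurationState) -> CurationState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


def is_exportable(state: CurationState) -> bool:
    """In this sketch, only published scenarios may be queried or exported."""
    return state is CurationState.PUBLISHED
```

Keeping the policy in a single table makes it straightforward to audit which transitions are possible and to reject any attempt to republish a deprecated scenario.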
3.3. Scenario Record Schema
- OpenLABEL as first-class query metadata
- Logical normalization for querying
- Indexing via tag–scenario association
- Provenance, human description, and lifecycle state
- Query semantics
- Candidate selection: Extraction of tag types from the query and retrieval of matching scenario identifiers via the tag–scenario junction relation.
- Predicate refinement: Application of constraints derived from valued tag_data in the query, and restriction to admissible lifecycle states (typically published).
- Reproducible packaging
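The elements of the scenario record listed above can be pictured as a small data model plus an inverted index. The following is a minimal sketch under stated assumptions: the class names, field names, and the in-memory index are hypothetical and stand in for whatever catalog the reference implementation uses.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass(frozen=True)
class Tag:
    tag_type: str                                # vocabulary or ontology term
    value: Optional[Union[float, str]] = None    # None for categorical tags


@dataclass
class ScenarioRecord:
    scenario_id: str
    version: str
    state: str            # curation state, e.g. "published"
    description: str      # human-readable description
    provenance: dict      # e.g. generator name and version
    tags: list = field(default_factory=list)


def build_tag_index(records):
    """Build the tag-scenario association: tag_type -> set of scenario ids."""
    index = defaultdict(set)
    for rec in records:
        for tag in rec.tags:
            index[tag.tag_type].add(rec.scenario_id)
    return index
```

Storing the association separately from the records is what allows candidate selection to use indexed lookups on tag types before any record is loaded.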
4. Cloud-Native Implementation and Interfaces
4.1. Design Rationale and Deployment Model
4.2. Core Components
- Immutable artifact storage
- Metadata catalog and tag association index
- Stateless compute and API layer
- Lifecycle maintenance
4.3. OpenLABEL-Driven Querying Interface
- Query-as-OpenLABEL principle
- Categorical versus valued tags
- Two-stage retrieval pipeline
- Candidate selection. Tag types are extracted from the query and used to look up the tag–scenario association index. Candidate scenario identifiers are retrieved through indexed lookups, avoiding full-table scans.
- Predicate refinement. Scenario metadata records for candidate identifiers are fetched in batches, and additional predicates are applied. These include constraints derived from valued tag_data (numeric comparisons and textual constraints) and admissibility checks on the curation state (e.g., restricting results to published scenarios).
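The two stages above can be sketched in a few lines. This is an illustration only: the in-memory dictionaries, the operator names, and the example tag names (`ego_speed_kmh`, `weather`, `motorway`, `cut_in`) are assumptions, and a real deployment would run stage 1 against an indexed catalog instead.

```python
import operator

# Comparison operators for valued tag_data (an illustrative subset).
OPS = {"eq": operator.eq, "lt": operator.lt, "le": operator.le,
       "gt": operator.gt, "ge": operator.ge}


def select_candidates(tag_index, required_tag_types):
    """Stage 1: intersect the id sets of all required tag types."""
    id_sets = [tag_index.get(t, set()) for t in required_tag_types]
    if not id_sets:
        return set()
    return set.intersection(*id_sets)


def refine(records, candidate_ids, value_predicates, admissible_states=("published",)):
    """Stage 2: apply value predicates and the curation-state check."""
    results = []
    for sid in sorted(candidate_ids):
        rec = records[sid]
        if rec["state"] not in admissible_states:
            continue
        tag_data = rec["tag_data"]
        if all(tag in tag_data and OPS[op](tag_data[tag], bound)
               for tag, op, bound in value_predicates):
            results.append(sid)
    return results


# Hypothetical example data.
records = {
    "s1": {"state": "published", "tag_data": {"ego_speed_kmh": 80.0, "weather": "rain"}},
    "s2": {"state": "published", "tag_data": {"ego_speed_kmh": 130.0, "weather": "rain"}},
    "s3": {"state": "deprecated", "tag_data": {"ego_speed_kmh": 70.0, "weather": "rain"}},
}
tag_index = {"motorway": {"s1", "s2", "s3"}, "cut_in": {"s1", "s3"}}

candidates = select_candidates(tag_index, ["motorway", "cut_in"])
matches = refine(records, candidates,
                 [("ego_speed_kmh", "le", 100.0), ("weather", "eq", "rain")])
```

Here `s3` satisfies every tag and value constraint but is dropped in stage 2 because its curation state is not admissible.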
4.4. Ingestion and Export Interfaces
- Scenario ingestion
- Scenario export
4.5. Identity, Access Control, and API Contracts
- Normal users, who are authorized to perform read-only querying operations such as OpenLABEL-based querying and metadata retrieval;
- Contributors, who are additionally authorized to perform write operations, including scenario ingestion, metadata updates, and lifecycle transitions (e.g., publishing);
- Administrators, who have extended privileges for governance operations, including user management and maintenance tasks.
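The three roles above map naturally onto a permission table. The sketch below is hypothetical: the role keys and operation names are invented for illustration, and it assumes that administrators also hold all contributor permissions, which the text does not state explicitly.

```python
# Hypothetical operation names grouped by the roles described above.
READ_OPS = {"query", "read_metadata"}
WRITE_OPS = {"ingest", "update_metadata", "transition_lifecycle"}
ADMIN_OPS = {"manage_users", "maintenance"}

ROLE_PERMISSIONS = {
    "user": READ_OPS,
    "contributor": READ_OPS | WRITE_OPS,
    # Assumption: administrators inherit read and write permissions.
    "administrator": READ_OPS | WRITE_OPS | ADMIN_OPS,
}


def is_authorized(role: str, operation: str) -> bool:
    """Return True if the role may perform the operation; unknown roles get nothing."""
    return operation in ROLE_PERMISSIONS.get(role, set())
```

An API layer could call `is_authorized` once per request, after the caller's role has been resolved from the identity provider.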
4.6. Portability Considerations
5. Scenario Generation and Packaging
5.1. Generation Objectives and Scope
5.2. ODD-Centered, Parameter-Based Scenario Modeling
5.3. OpenDRIVE-Centered Scene Anchoring
- External OpenDRIVE ingestion, where complex road networks (e.g., urban layouts or highways) are imported from external sources and reused as immutable anchors;
- Procedural OpenDRIVE synthesis, where parameterized road motifs (e.g., two-way roads, multi-lane highways, merges, junctions, roundabouts) are generated programmatically for controlled experimentation.
5.4. Behavior Templates and OpenSCENARIO Emission
5.5. Constraint-Aware Concretization
- fixation of road parameters,
- assignment of initial states conditioned on road geometry,
- instantiation of behavioral parameters conditioned on initial states,
- assignment of environmental parameters.
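The four steps above describe an ordered sampling scheme in which each step may depend on values fixed by the previous ones. A minimal sketch is given below; every parameter name, range, and the lead-vehicle setup is an assumption made for illustration, not the paper's actual parameterization.

```python
import random


def concretize(seed: int) -> dict:
    """Sample one concrete scenario in a fixed dependency order."""
    rng = random.Random(seed)  # seeded so the same seed yields the same scenario

    # 1. Fix road parameters.
    road = {
        "lanes": rng.choice([2, 3]),
        "length_m": 500.0,
        "speed_limit_kmh": rng.choice([80.0, 100.0, 130.0]),
    }

    # 2. Assign initial states conditioned on road geometry.
    ego = {
        "lane": rng.randint(1, road["lanes"]),
        "s_m": rng.uniform(0.0, 0.2 * road["length_m"]),
        "speed_kmh": rng.uniform(0.5, 1.0) * road["speed_limit_kmh"],
    }
    lead = {"lane": ego["lane"], "s_m": ego["s_m"] + rng.uniform(20.0, 60.0)}

    # 3. Instantiate behavioral parameters conditioned on initial states.
    behavior = {
        "lead_target_speed_kmh": rng.uniform(0.5, 0.9) * ego["speed_kmh"],
        "brake_trigger_gap_m": rng.uniform(10.0, lead["s_m"] - ego["s_m"]),
    }

    # 4. Assign environmental parameters.
    environment = {"weather": rng.choice(["clear", "rain", "fog"])}

    return {"seed": seed, "road": road, "ego": ego, "lead": lead,
            "behavior": behavior, "environment": environment}
```

Because later draws are bounded by earlier ones, constraints such as "the lead vehicle starts ahead of the ego vehicle" hold by construction rather than by rejection, and recording the seed makes each concrete scenario reproducible.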
5.6. Semantic Annotation and Reproducible Packaging
6. Validation, Curation, and Optional Enrichment
6.1. Two-Stage Validation
- Syntactic validation, ensuring conformance to data schemas (e.g., OpenDRIVE/OpenSCENARIO/OpenLABEL) and referential integrity among road, actor, and scenario definitions.
- Semantic validation, enforcing feasibility and plausibility constraints such as collision-free initialization, parameter bounds, and consistency with ODD definitions (e.g., admissible speed ranges under specific environmental conditions).
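The two stages above can be illustrated on a simplified, dictionary-based scenario description. This sketch does not parse real OpenDRIVE or OpenSCENARIO files; the field names, the minimum-gap check, and the single ODD speed bound are assumptions standing in for schema validation and richer feasibility rules.

```python
def validate_syntactic(scenario: dict) -> list:
    """Stage 1: required fields exist and actor road references resolve."""
    errors = [f"missing field: {key}"
              for key in ("road_ids", "actors", "odd") if key not in scenario]
    if errors:
        return errors
    road_ids = set(scenario["road_ids"])
    for actor in scenario["actors"]:
        if actor.get("road_id") not in road_ids:
            errors.append(f"actor {actor.get('id')} references unknown road")
    return errors


def validate_semantic(scenario: dict, min_gap_m: float = 5.0) -> list:
    """Stage 2: speeds within the ODD bound and no overlapping initial positions."""
    errors = []
    max_speed = scenario["odd"]["max_speed_kmh"]
    actors = scenario["actors"]
    for actor in actors:
        if not 0.0 <= actor["speed_kmh"] <= max_speed:
            errors.append(f"actor {actor['id']} speed outside ODD bound")
    for i, a in enumerate(actors):
        for b in actors[i + 1:]:
            same_lane = (a["road_id"], a["lane"]) == (b["road_id"], b["lane"])
            if same_lane and abs(a["s_m"] - b["s_m"]) < min_gap_m:
                errors.append(f"actors {a['id']} and {b['id']} overlap at start")
    return errors


def validate(scenario: dict) -> dict:
    """Run stage 2 only if stage 1 passes; report the stage that was reached."""
    errors = validate_syntactic(scenario)
    if errors:
        return {"stage": "syntactic", "errors": errors}
    return {"stage": "semantic", "errors": validate_semantic(scenario)}
```

Running the semantic checks only on syntactically valid input keeps the second stage free of defensive lookups and makes the reported stage a usable, queryable validation outcome.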
6.2. Optional Execution-Based Enrichment
6.3. Curation Workflow
7. Semantic and Value-Based Querying
8. Evaluation
8.1. Experimental Setup
- scenario ingestion and metadata registration,
- tag-based querying,
- value-constrained querying over tag_data,
- reproducible export of scenario packages.
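For the export operation, reproducibility can be checked by comparing manifests. One common approach, shown here as a sketch and not as the paper's actual manifest format, is to record a content hash per artifact and serialize the manifest deterministically. The function name, field names, and file names are hypothetical.

```python
import hashlib
import json


def build_manifest(scenario_id: str, version: str, artifacts: dict, provenance: dict) -> str:
    """Return a deterministic JSON manifest; artifacts maps file name -> bytes."""
    entries = [
        {"name": name, "sha256": hashlib.sha256(data).hexdigest(), "size": len(data)}
        for name, data in sorted(artifacts.items())  # sorted for stable ordering
    ]
    manifest = {
        "scenario_id": scenario_id,
        "version": version,
        "provenance": provenance,
        "artifacts": entries,
    }
    # sort_keys makes the serialized form independent of dict insertion order.
    return json.dumps(manifest, sort_keys=True, indent=2)
```

Two exports of the same scenario version then yield byte-identical manifests, and any change to an artifact shows up as a changed hash.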
8.2. Results and Interpretation
- Structured querying (R3)
- Lifecycle governance (R2)
- Reproducibility (R4)
9. Discussion
9.1. Positioning for Interoperability and Federation
9.2. Design Trade-Offs
9.3. Threats to Validity and Limitations
- Dataset coverage and representativeness. The initial corpus is designed for structural diversity to exercise the SCDB lifecycle; it is not intended to provide complete scenario-space coverage or statistical representativeness of real-world traffic distributions.
- Execution environment dependence. Execution traces and metrics enrichment depend on the chosen simulator and configuration; measurement values and criticality measures may vary across toolchains unless execution services and metric definitions are standardized and versioned.
- Ontology and tag completeness. Query semantics are constrained by the expressiveness and coverage of the adopted ontologies and tagging practices; incomplete vocabularies can lead to reduced recall even if the underlying artifacts are relevant.
- Implementation-specific effects. While the reference architecture is backend-agnostic, operational characteristics (e.g., cold-start latency, throughput limits, cost profile) depend on the chosen cloud or deployment substrate.
10. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest




| Item | Value |
|---|---|
| Number of scenario packages | 466 |
| Number of unique tags (categorical + valued) | 94 |
| Average tags per scenario | 12 |
| Median query latency (tag-only) | 838 ms |
| P95 query latency (tag-only) | 1010 ms |
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Azar, Y.T.; Ortega, J.D.; Nieto, M. A Reproducible Reference Architecture for Automated Driving Scenario Databases. Vehicles 2026, 8, 88. https://doi.org/10.3390/vehicles8040088

