A Reproducible Reference Architecture for Automated Driving Scenario Databases

Azar, Yavar Taghipour; Ortega, Juan Diego; Nieto, Marcos

doi:10.3390/vehicles8040088


by Yavar Taghipour Azar, Juan Diego Ortega and Marcos Nieto *

Vicomtech Foundation, Basque Research and Technology Alliance (BRTA), 20009 San Sebastián, Spain

* Author to whom correspondence should be addressed.

Vehicles 2026, 8(4), 88; https://doi.org/10.3390/vehicles8040088

Submission received: 12 February 2026 / Revised: 30 March 2026 / Accepted: 5 April 2026 / Published: 10 April 2026


Abstract

As automated vehicles move from controlled environments to unpredictable real-world roads, scenario-based testing has become the cornerstone of safety validation. In recent years, substantial progress has been made in scenario representation standards and generation methodologies. However, integrating scenario generation, standards-aligned packaging, validation, curation, and structured querying into a reproducible end-to-end lifecycle remains challenging in practice. This work presents a reproducible reference architecture for Scenario Databases (SCDBs) that treats scenario collections as lifecycle-governed data systems rather than static repositories. The proposed architecture unifies the scenario lifecycle within a single workflow. It integrates scenario generation and ingestion, validation and curation, immutable storage, semantic and value-based querying, and reproducible export. Scenario semantics are represented using ASAM OpenX formats (OpenDRIVE and OpenSCENARIO), together with ASAM OpenLABEL metadata, enabling standards-aligned interoperability. Querying is performed over categorical and value-carrying metadata without requiring inspection of raw scenario artifacts at query time. The reference implementation is deployed using Infrastructure-as-Code, supporting reproducibility and low operational overhead. Execution-based metric enrichment is supported as an optional extension, enabling scenarios to be augmented with execution-derived measurements and trace metadata. The contribution is not a centralized database, but a reference architecture and deployment blueprint that supports interoperable and federated scenario ecosystems. By framing SCDBs as reproducible lifecycle systems, this work supports scalable scenario reuse and more transparent safety validation workflows.

Keywords: scenario-based testing; scenario database (SCDB); automated driving; connected, cooperative, and automated mobility (CCAM); ontology-driven scenario generation; ASAM OpenX (OpenDRIVE, OpenSCENARIO, and OpenLABEL); validation and curation; semantic and value-based querying; cloud-native architecture; Infrastructure-as-Code (IaC)

1. Introduction

Safety validation is a central challenge for connected, cooperative, and automated mobility (CCAM) systems, as these systems operate in open and highly variable real-world environments. Traditional validation strategies in the automotive domain have relied on distance-based testing, understood as the accumulation of real-world driving mileage to support statistical safety arguments. For higher levels of automation, however, such approaches become infeasible, as rare but safety-critical events occur too infrequently to be observed within practical driving distances. Moreover, reported mileage alone does not demonstrate coverage of relevant operational situations or constitute a complete safety case. Scenario-based testing (SBT) addresses these limitations by enabling the systematic evaluation of representative and safety-relevant situations instead of relying solely on mileage accumulation [1,2].

A scenario is commonly understood as the evolution of traffic scenes over time, encompassing entities, events, and interactions, grounded in established terminology distinguishing scene, situation, and scenario [3]. In contemporary safety assessment practice, scenario representations are typically organized into functional, logical, and concrete layers to support systematic coverage, parameterization, and execution [4,5,6]. Layered environment models (e.g., the six-layer model) further provide a structured taxonomy of operational conditions relevant to scenario instantiation and classification [7].

A parallel line of work has addressed scenario interoperability by standardizing the representation of road networks, dynamic behaviors, and semantic annotations. The ASAM OpenX family has emerged as an exchange layer: OpenDRIVE for road geometry, OpenSCENARIO for behavior specification, and OpenLABEL for semantic labeling [8,9,10]. Complementary initiatives extend these standards with ontology- and taxonomy-driven descriptors for operational design domain (ODD) and behavioral concepts [11,12,13]. At the same time, the broader data stewardship community has emphasized FAIR principles as a foundation for reusable scientific assets and reproducible research workflows [14].

1.1. Problem Statement and Research Gap

Recent years have seen substantial progress in scenario representation standards, generation methods, and scenario sharing platforms. Professional scenario databases and repositories, such as SafetyPool and CommonRoad, play an important role in enabling scenario reuse, benchmarking, and community exchange. These systems often prioritize particular objectives, for example large-scale scenario collection, dataset publication, or evaluation campaigns.

From an engineering perspective, however, operating a Scenario Database (SCDB) as a reproducible lifecycle system involves a broader set of concerns. These include how scenarios are generated or ingested, validated, curated, versioned, documented with metadata, and subsequently retrieved and reused in a traceable manner. In practice, these aspects are frequently addressed through project-specific workflows and surrounding toolchains rather than described as an explicit, end-to-end SCDB lifecycle model [15,16,17].

This work does not aim to replace existing scenario repositories or prescribe a single solution. Instead, it aims to make typical SCDB lifecycle steps and interfaces more explicit and reproducible, so that scenario collections can be managed in a systematic and transparent way. In doing so, we highlight two recurring engineering challenges observed across SCDB-oriented workflows:

Fragmented end-to-end pipelines. Scenario generation, ingestion, validation, curation, and enrichment are often handled by separate tools or processes. While this is natural in complex toolchains, it can lead to heterogeneous schemas, implicit assumptions, and limited traceability of how scenarios were created, modified, or validated over time.
Limited modeling of value-constrained querying. Many repositories support tag- or keyword-based retrieval. More structured constraints (e.g., numeric ranges, parameter thresholds, or attribute-value constraints) are not always modeled as explicit, indexed metadata aligned with standards. This can make precise retrieval more difficult when users require both semantic concepts and quantitative filters for targeted validation campaigns.

1.2. SCDB Usage Contexts, Interoperability, and Federation

SCDBs are used by multiple stakeholder groups with different objectives. Validation engineers require repeatable retrieval of scenario subsets for simulation campaigns. Researchers require stable identifiers, provenance information, and reproducible datasets for benchmarking. Toolchain integrators require standards-aligned exports to simulators and analysis pipelines. Governance and safety-oriented roles require explicit validation status, curation state, and traceability to avoid the accidental reuse of unsuitable artifacts.

In practice, many established scenario repositories are designed for specific communities or goals and are not intended to be unified under a single architecture. Interoperability is therefore commonly pursued through federation, where multiple SCDB hosts expose compatible metadata and exchange interfaces while preserving local ownership and operational constraints.

In this context, our contribution is positioned as a reference architecture and engineering blueprint. It can guide new SCDB initiatives and can also inform incremental harmonization of existing hosts by outlining a small set of reproducible contracts for packaging, governance metadata, and query semantics that support interoperability across systems.

Accordingly, the remainder of the paper focuses on (a) standards-aligned exchange models, (b) explicit lifecycle governance concepts, (c) scalable querying that combines semantic and value-based constraints, and (d) reproducible deployment practices using Infrastructure-as-Code.

1.3. Contributions

This work introduces a reproducible reference architecture for SCDBs and a technical realization, referred to as Demo SCDB. Beyond the implementation itself, the research contribution is the explicit formalization of the SCDB as a lifecycle system—including query semantics and governance state—and a reproducibility methodology that makes such systems deployable and comparable across environments. The abstract-level claims are operationalized through the following contributions:

Reference architecture and data model. We formalize an SCDB lifecycle (generation/ingestion, validation, curation, storage, indexing, query, export) and define a data model that treats OpenX artifacts and OpenLABEL metadata as first-class assets with explicit provenance and schema/version control.
Standards-oriented generation and packaging. We describe a pipeline translating declarative scenario specifications (ODD descriptors, road topology, behavioral templates) into OpenDRIVE/OpenSCENARIO artifacts with OpenLABEL annotations and reproducible manifests.
OpenLABEL-driven querying with value constraints. We define querying as OpenLABEL-based matching over categorical tags and value-carrying tag_data (numeric and textual), implemented through an explicit tag–scenario association index for scalable retrieval.
Cloud-native, IaC deployment. We provide an Infrastructure-as-Code (IaC) deployment blueprint enabling reproducible provisioning and low operational overhead, supporting both local adoption and future interoperability patterns.

Figure 1 summarizes the SCDB lifecycle and highlights the separation between scenario artifacts and evolving metadata. During scenario ingestion, initial metadata and provenance information are created to capture the origin of the scenario and its generation context. Subsequent simulation executions may produce additional information such as evaluation results, execution parameters, and derived metrics. These results are written back to the database to extend the provenance record and enrich the scenario metadata over time.

The remainder of the paper is organized as follows. Section 2 positions the work relative to existing SCDB initiatives and scenario toolchains. Section 3 defines the reference architecture and data model. Section 4 presents the cloud-native realization and interfaces. Section 5 details generation and packaging. Section 6 defines validation, curation, and optional enrichment. Section 7 details semantic and value-based querying. Section 8 reports evaluation. Section 9 discusses implications, design trade-offs, and threats to validity. Section 10 concludes and outlines future work.

2. Related Work

SCDBs have been proposed as enabling infrastructure for systematic scenario-based safety validation and, increasingly, certification-oriented workflows [17]. A prominent direction is the construction of curated repositories with ODD-based tagging and standardized scenario exchange formats. SafetyPool™ exemplifies a platform-level approach focused on scenario sharing and ODD-based classification [15]. Another direction is scenario extraction and structuring from real-world data, as exemplified by scenario.center, which emphasizes methods for transforming naturalistic driving recordings into scenario database entries suitable for downstream usage [16,18].

Closely related to these efforts is the CommonRoad ecosystem, which provides an open-source toolchain and an online platform for creating, managing, and accessing traffic scenarios for automated driving research [19,20]. CommonRoad primarily positions itself as a standardized scenario and map representation and a tooling framework rather than as a database architecture. Its online platform provides access to a large-scale, curated collection of more than 60,000 scenarios, accompanied by rich metadata and a powerful filtering interface supporting road topology, traffic participants, and scenario attributes. These capabilities make CommonRoad a widely adopted resource for benchmarking, motion planning evaluation, and algorithmic comparison.

Parallel research investigates scenario specification languages and domain-specific languages (DSLs), providing formal abstractions for describing scenes, constraints, and stochastic generation mechanisms [21,22,23]. Ontology-driven scenario modeling aims to ensure semantic consistency and machine interpretability of ODD and behavior descriptors [12,13]. Moreover, AI-assisted scenario extraction and description has gained traction, for example, through the use of large language models to map text or dataset elements into structured scenario representations [24].

While these efforts address important sub-problems—such as content creation, tooling, tagging, extraction, or specification—a recurrent limitation is the lack of a reproducible, end-to-end SCDB blueprint that integrates standards-aligned scenario assets, explicit lifecycle governance, and structured querying mechanisms within a deployable architecture. This paper addresses that integration gap by presenting a reference architecture and technical realization that explicitly operationalizes the scenario lifecycle at the database level.

3. Reference Architecture and Data Model

3.1. Architectural Requirements

From the usage contexts in Section 1.2 and recurring needs in scenario library research and safety assurance literature, we derive the following architectural requirements:

R1: Standards-first artifact exchange and management. Scenario artifacts and their semantics must use open standards (OpenDRIVE, OpenSCENARIO, and OpenLABEL) without toolchain-specific coupling. The reference implementation stores versioned, standards-compliant scenario packages to maximize interoperability and reproducibility; however, this is an implementation choice rather than an architectural requirement.
R2: Lifecycle governance. Validation outcomes and curation state must be explicit, queryable, and enforced to prevent accidental reuse of invalid or deprecated scenarios.
R3: Structured querying. The system must support semantic querying via standardized descriptors and value-constrained querying via numeric and textual tag_data, including combined predicates.
R4: Reproducibility. Scenario creation, deployment, and export must be reproducible through explicit provenance, versioning, and Infrastructure-as-Code.
R5: Extensibility. New vocabularies, ontologies, tag schemes, and future enrichment services must be integrable without refactoring the core data model.

3.2. Conceptual SCDB Workflow

We define an SCDB as a lifecycle system composed of the following stages:

Create/Ingest: Generation or ingestion of scenario packages and metadata.
Validate: Syntactic and semantic validation of artifacts and descriptors.
Curate: Assignment of lifecycle state, versioning, and deprecation.
Store: Persistence of immutable scenario packages and structured metadata.
Index: Construction of semantic and value indexes for querying.
Query: Retrieval using combined semantic and value predicates.
Export: Delivery of reproducible scenario bundles with manifests.

3.3. Scenario Record Schema

Each scenario in the SCDB is represented by the structured record defined in Equation (1).

$$S = \langle \mathit{id}, \mathit{ver}, A, L, P, H, \sigma \rangle, \tag{1}$$

where $\mathit{id}$ is a globally stable scenario identifier, $\mathit{ver}$ is the version identifier, $A$ contains immutable references to scenario artifacts, $L$ is the set of OpenLABEL tags associated with the scenario, $P$ represents the OpenSCENARIO/OpenDRIVE parameterization, $H$ captures optional execution history and metadata, and $\sigma$ encodes the storage state of the scenario within the SCDB.

The component $A$ contains immutable references to scenario artifacts, including standards-compliant OpenX files (OpenDRIVE, OpenSCENARIO) and optional auxiliary outputs such as previews, validation reports, and, if produced by external execution services, execution traces (e.g., simulator logs, trajectories, and event timelines).

OpenLABEL as first-class query metadata

Semantic descriptors are encoded in $L$ as an ASAM OpenLABEL document. In contrast to approaches where tags are treated as passive attributes, OpenLABEL is used as a query-carrying representation: query requests are submitted to the SCDB as OpenLABEL documents, and matching is performed against OpenLABEL-derived metadata.

Each OpenLABEL tag is modeled according to Equation (2).

$$t = \langle \mathit{type}, \mathit{ontology}, \mathit{data} \rangle, \tag{2}$$

where $\mathit{type}$ denotes a semantic concept defined in a referenced ontology, $\mathit{ontology}$ identifies the ontology source, and $\mathit{data}$ contains the optional value associated with the tag. Tags with $\mathit{data} = \varnothing$ represent categorical predicates (e.g., actor class, maneuver type), whereas tags with non-empty $\mathit{data}$ encode valued descriptors, such as numeric ranges or textual values. This distinction reflects the OpenLABEL structure, where tags may function either as semantic markers or as typed parameter constraints.

Logical normalization for querying

For querying, $L$ is normalized into the two logical sets defined in Equation (3):

$$T_{cat} = \{\, t \mid t.\mathit{data} = \varnothing \,\}, \qquad T_{val} = \{\, t \mid t.\mathit{data} \neq \varnothing \,\}, \tag{3}$$

where $T_{cat}$ denotes the set of categorical tags without associated values, $T_{val}$ denotes the set of valued tags carrying additional data, and $t.\mathit{data}$ refers to the data field of a tag $t$.
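As a minimal sketch of the normalization in Equation (3), assuming tags are held in memory as small Python objects, the split is a partition on whether the data field is empty. The `Tag` class and the tag names used with it are illustrative assumptions and are not taken from the Demo SCDB code base.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# A tag value may be a number, a string, or a (low, high) numeric range.
TagValue = Union[float, str, Tuple[float, float]]


@dataclass(frozen=True)
class Tag:
    """Illustrative stand-in for the tuple <type, ontology, data> of Equation (2)."""
    type: str
    ontology: str
    data: Optional[TagValue] = None  # None plays the role of the empty value


def normalize(tags):
    """Partition tags into (T_cat, T_val) according to Equation (3)."""
    t_cat = {t for t in tags if t.data is None}
    t_val = {t for t in tags if t.data is not None}
    return t_cat, t_val
```

Testing against `None` rather than truthiness keeps legitimate values such as `0` or an empty string on the valued side of the partition.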

Indexing via tag–scenario association

To enable scalable querying, the SCDB maintains an explicit many-to-many association between tags and scenarios, conceptually modeled by the junction relation in Equation (4).

$$J \subseteq T \times S, \qquad T = T_{cat} \cup T_{val}, \tag{4}$$

where $J$ represents the many-to-many association between tags and scenarios, $T$ is the set of all tags, and $S$ denotes the set of all scenarios stored in the SCDB. This structure enables efficient retrieval of candidate scenario identifiers from tag-based queries without inspecting full scenario records or raw OpenLABEL files.
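The junction relation of Equation (4) can be illustrated with an in-memory inverted index. The sketch below is an assumption-laden simplification: tags are reduced to their type strings, the function names are hypothetical, and multiple query tags are combined conjunctively, which the text does not prescribe.

```python
from collections import defaultdict


def build_junction(scenario_tags):
    """Build J as a set of (tag, scenario_id) pairs plus a tag -> scenario index.

    scenario_tags maps each scenario identifier to an iterable of tag types.
    """
    junction = set()
    by_tag = defaultdict(set)
    for scenario_id, tags in scenario_tags.items():
        for tag in tags:
            junction.add((tag, scenario_id))
            by_tag[tag].add(scenario_id)
    return junction, dict(by_tag)


def candidates(by_tag, query_tags):
    """Return scenarios linked to every queried tag (conjunctive matching, assumed)."""
    id_sets = [by_tag.get(tag, set()) for tag in query_tags]
    return set.intersection(*id_sets) if id_sets else set()
```

Candidate lookup touches only the index entries of the queried tags, so its cost depends on the size of those entries rather than on the total number of scenario records.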

Provenance, human description, and lifecycle state

Provenance and reproducibility metadata are stored in $P$, including generator identifiers, schema versions, and generation parameters. The component $H$ is an optional unstructured human-readable description intended for documentation and review and is not required for automated querying. Finally, $\sigma$ denotes the curation state (e.g., draft, validated, published, quarantined), governing scenario visibility and admissibility.
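To make the record structure concrete, the following sketch expresses the tuple of Equation (1) as a Python dataclass, reading $P$, $H$, and $\sigma$ as described in the paragraph above. The four state names come from the text; the transition table, field names, and method names are hypothetical, since the permitted lifecycle transitions are not enumerated here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CurationState(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"
    PUBLISHED = "published"
    QUARANTINED = "quarantined"


# Hypothetical transition table: an assumption for illustration only.
ALLOWED = {
    CurationState.DRAFT: {CurationState.VALIDATED, CurationState.QUARANTINED},
    CurationState.VALIDATED: {CurationState.PUBLISHED, CurationState.QUARANTINED},
    CurationState.PUBLISHED: {CurationState.QUARANTINED},
    CurationState.QUARANTINED: {CurationState.DRAFT},
}


@dataclass
class ScenarioRecord:
    id: str                                      # globally stable identifier
    ver: str                                     # version identifier
    artifacts: tuple                             # A: references to immutable files
    labels: tuple                                # L: OpenLABEL tags
    provenance: dict                             # P: generator, schema versions, parameters
    description: Optional[str] = None            # H: optional free-text description
    state: CurationState = CurationState.DRAFT   # sigma: curation state

    def transition(self, new_state):
        """Move to new_state, rejecting transitions outside the assumed table."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} not allowed")
        self.state = new_state

    def is_admissible(self):
        """Only published scenarios are returned to ordinary queries."""
        return self.state is CurationState.PUBLISHED
```

Keeping the state on the record, and routing every change through one method, is what makes the governance status both queryable and enforceable.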

Query semantics

Given a query expressed as an OpenLABEL document $L_{q}$, querying proceeds in two stages:

Candidate selection: Extraction of tag types from $L_{q}$ and retrieval of matching scenario identifiers via the junction relation $J$.
Predicate refinement: Application of constraints derived from valued tag_data in $T_{val}$, and restriction to admissible lifecycle states (typically $\sigma \in \{\text{published}\}$).
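The two stages can be sketched in a few lines of Python over in-memory dictionaries. This is an illustration under stated assumptions rather than the Demo SCDB implementation: a numeric constraint is represented as a `(low, high)` tuple, a textual constraint as an exact-match string, and the record layout (`state`, `tag_data`) is hypothetical.

```python
def matches(stored, constraint):
    """Check one stored tag value against one query constraint (assumed encoding)."""
    if isinstance(constraint, tuple):  # numeric range (low, high), inclusive
        low, high = constraint
        return isinstance(stored, (int, float)) and low <= stored <= high
    return stored == constraint        # textual constraint: exact match


def query(query_cat, query_val, by_tag, records, admissible=("published",)):
    """Two-stage retrieval: index lookup, then value and lifecycle filtering.

    query_cat: iterable of categorical tag types.
    query_val: dict mapping valued tag types to constraints.
    by_tag:    dict mapping tag types to sets of scenario identifiers.
    records:   dict mapping scenario identifiers to metadata records.
    """
    # Stage 1: candidate selection through the tag-scenario association.
    wanted = set(query_cat) | set(query_val)
    id_sets = [by_tag.get(tag, set()) for tag in wanted]
    candidate_ids = set.intersection(*id_sets) if id_sets else set()

    # Stage 2: predicate refinement on tag values and curation state.
    hits = []
    for scenario_id in sorted(candidate_ids):
        record = records[scenario_id]
        if record["state"] not in admissible:
            continue
        if all(matches(record["tag_data"].get(tag), constraint)
               for tag, constraint in query_val.items()):
            hits.append(scenario_id)
    return hits
```

For example, a query for a cut-in maneuver with an ego speed between 15 and 25 m/s first collects all scenarios carrying both tag types and only then inspects the stored speed values and states of those candidates.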

Reproducible packaging

Each scenario package includes a manifest recording cryptographic checksums of all artifacts in $A$; schema versions for OpenDRIVE, OpenSCENARIO, and OpenLABEL; and provenance metadata from $P$. This establishes a clear separation between immutable scenario content, standardized queryable metadata, optional human-facing descriptions, and lifecycle governance.
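A manifest of this kind can be sketched with the Python standard library. The choice of SHA-256, the JSON layout, and all field names below are assumptions made for illustration; the text only specifies that checksums, schema versions, and provenance are recorded.

```python
import hashlib
import json


def build_manifest(scenario_id, version, artifacts, schema_versions, provenance):
    """Assemble a manifest dict; artifacts maps file names to their raw bytes."""
    return {
        "scenario_id": scenario_id,
        "version": version,
        "artifacts": {
            name: {"sha256": hashlib.sha256(content).hexdigest(), "size": len(content)}
            for name, content in sorted(artifacts.items())
        },
        "schema_versions": dict(schema_versions),
        "provenance": dict(provenance),
    }


def manifest_json(manifest):
    """Serialize with sorted keys so identical inputs give identical text."""
    return json.dumps(manifest, sort_keys=True, indent=2)


def verify(manifest, artifacts):
    """Return True if every listed artifact is present with a matching checksum."""
    for name, entry in manifest["artifacts"].items():
        content = artifacts.get(name)
        if content is None:
            return False
        if hashlib.sha256(content).hexdigest() != entry["sha256"]:
            return False
    return True
```

Deterministic serialization matters here: two exports of the same scenario version should produce manifests that compare equal as text, which makes differences between packages easy to detect.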

To make this relationship explicit at the data-model level, Figure 2 illustrates how scenarios and tags are linked through a junction (association) table. This structure supports efficient many-to-many mapping between scenario records and their semantic descriptors.

4. Cloud-Native Implementation and Interfaces

4.1. Design Rationale and Deployment Model

To minimize operational overhead while remaining flexible under bursty workloads, the reference implementation of Demo SCDB adopts a serverless, cloud-native composition. The design is driven by three dominant workload characteristics: (i) read-heavy querying workloads dominated by scenario search and metadata retrieval, (ii) intermittent write bursts arising from scenario ingestion and generation pipelines, and (iii) export-oriented access patterns for packaging and downloading scenario artifacts.

The complete backend is provisioned through Infrastructure-as-Code (IaC), enabling deterministic deployment, environment staging, and reproducible infrastructure configuration across development and production contexts. In the current realization, IaC is implemented using the AWS Cloud Development Kit (CDK), which deploys a managed serverless stack comprising Amazon S3 for artifact storage, Amazon DynamoDB for metadata and tag association indexing, AWS Lambda for stateless compute, Amazon API Gateway for REST exposure, and Amazon Cognito for OAuth 2.0–compatible authentication and authorization. Lambda functions are packaged with shared runtime dependencies via a dedicated Lambda Layer, ensuring consistent library versions across endpoints and simplifying maintenance. While this implementation targets a specific managed stack, the architectural principles described here are backend-agnostic and transferable to alternative serverless or container-based environments.

Figure 3 provides an overview of the managed serverless stack used in the reference implementation and highlights the main interactions between storage, metadata, compute, API, and authentication services.

4.2. Core Components

Immutable artifact storage

Scenario artifacts, including OpenDRIVE, OpenSCENARIO, OpenLABEL files, auxiliary metadata, images, and reports, are stored in an object-based storage layer. In the reference implementation this is realized using Amazon S3, configured with encryption at rest and object versioning. Artifacts are addressed via stable object keys following a structured namespace convention (e.g., /<scenarioId>/<fileName>). References to stored artifacts, including storage keys, filenames, content types, and upload timestamps, are recorded in the scenario metadata record, ensuring a clear separation between immutable content and mutable metadata.
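The key convention and the artifact reference stored alongside it can be sketched as follows. The function names and the dictionary fields are illustrative assumptions; only the `/<scenarioId>/<fileName>` pattern and the listed reference attributes come from the text.

```python
from datetime import datetime, timezone


def artifact_key(scenario_id, file_name):
    """Build an object key following the /<scenarioId>/<fileName> convention."""
    for part in (scenario_id, file_name):
        if not part or "/" in part:
            raise ValueError(f"invalid key component: {part!r}")
    return f"/{scenario_id}/{file_name}"


def artifact_reference(scenario_id, file_name, content_type, uploaded_at=None):
    """Describe a stored artifact for inclusion in the scenario metadata record."""
    uploaded_at = uploaded_at or datetime.now(timezone.utc)
    return {
        "key": artifact_key(scenario_id, file_name),
        "filename": file_name,
        "content_type": content_type,
        "uploaded_at": uploaded_at.isoformat(),
    }
```

Rejecting path separators inside the components keeps every artifact of a scenario under a single, predictable prefix.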

Metadata catalog and tag association index

The metadata plane is implemented as a low-latency catalog. In the current realization, Amazon DynamoDB is used to host two logical relations: (i) a scenario table, keyed by a stable scenario identifier, storing scenario-level metadata and file references; and (ii) a tag–scenario association table materializing the many-to-many relationship between semantic tags and scenarios. The association table is keyed by tag identifier and supports efficient tag-driven candidate retrieval without scanning full scenario records [25]. This design directly supports the two-stage retrieval pipeline described in Section 4.3.
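One possible layout for the two relations is sketched below as plain Python dictionaries shaped like the keyword arguments of the boto3 DynamoDB `create_table` call. This is not the CDK definition used by the reference implementation: the table names, attribute names, on-demand billing mode, and the use of the scenario identifier as sort key of the association table are all assumptions.

```python
# Scenario table: one item per scenario, keyed by its stable identifier.
SCENARIO_TABLE = {
    "TableName": "scdb-scenarios",  # hypothetical name
    "KeySchema": [{"AttributeName": "scenario_id", "KeyType": "HASH"}],
    "AttributeDefinitions": [
        {"AttributeName": "scenario_id", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# Association table: one item per (tag, scenario) pair. The tag identifier is
# the partition key, so all scenarios for a tag are read with a single key query.
TAG_LINK_TABLE = {
    "TableName": "scdb-tag-links",  # hypothetical name
    "KeySchema": [
        {"AttributeName": "tag_id", "KeyType": "HASH"},
        {"AttributeName": "scenario_id", "KeyType": "RANGE"},
    ],
    "AttributeDefinitions": [
        {"AttributeName": "tag_id", "AttributeType": "S"},
        {"AttributeName": "scenario_id", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",
}


def link_items(scenario_id, tag_ids):
    """Produce the association items to write when a scenario is ingested."""
    return [{"tag_id": tag, "scenario_id": scenario_id}
            for tag in sorted(set(tag_ids))]
```

With this layout, the composite key makes each tag-scenario pair unique, so re-ingesting the same scenario overwrites existing links instead of duplicating them.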

Stateless compute and API layer

All external operations are exposed through stateless compute functions behind a RESTful API. In the reference stack, AWS Lambda implements ingestion, querying, metadata retrieval, and export handlers, while Amazon API Gateway provides request routing, throttling, and endpoint management. Statelessness ensures horizontal scalability and simplifies reproducible deployment, while routing logic cleanly separates lifecycle operations from query-oriented operations. Shared dependencies (e.g., OpenLABEL parsing and validation utilities) are centralized in a Lambda Layer to reduce duplication and keep function packages lightweight.

Lifecycle maintenance

To maintain consistency across metadata and artifact storage, the implementation includes internal maintenance logic responsible for cascading clean-up operations. When a scenario is removed or invalidated, associated tag links are removed from the tag association table and orphaned artifact references are cleaned up to prevent metadata drift and stale querying results.
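The cascading clean-up can be illustrated with in-memory stand-ins for the scenario table, the tag association table, and the artifact references. The data layout and the function name are hypothetical; the sketch only shows the ordering of the steps, not the actual maintenance handler.

```python
def remove_scenario(scenario_id, scenarios, tag_links, artifact_refs):
    """Remove a scenario and everything that points to it.

    scenarios:     dict scenario_id -> record with an "artifact_keys" list
    tag_links:     dict tag -> set of scenario identifiers
    artifact_refs: dict object key -> reference metadata
    Returns True if the scenario existed, False otherwise.
    """
    record = scenarios.pop(scenario_id, None)
    if record is None:
        return False

    # Drop the scenario from every tag entry and delete entries left empty,
    # so that later queries cannot return a stale identifier.
    for tag in list(tag_links):
        tag_links[tag].discard(scenario_id)
        if not tag_links[tag]:
            del tag_links[tag]

    # Drop references to artifacts that no longer belong to any record.
    for key in record.get("artifact_keys", []):
        artifact_refs.pop(key, None)
    return True
```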

4.3. OpenLABEL-Driven Querying Interface

Query-as-OpenLABEL principle

Demo SCDB adopts OpenLABEL as the authoritative semantic interface for scenario querying. Query requests are submitted as OpenLABEL documents, and matching is performed against OpenLABEL-derived metadata stored in tags and valued tag_data. This approach elevates OpenLABEL from a passive annotation format to an active query interface, ensuring that query semantics remain aligned with standardized vocabularies and ontologies [10].

Categorical versus valued tags

OpenLABEL tags are treated as typed semantic predicates that may either be: (i) categorical tags, which express the presence of a concept and carry no associated value, or (ii) valued tags, which include structured tag_data encoding numeric ranges or textual values. Categorical tags support presence-based filtering (e.g., existence of a maneuver or actor type), while valued tags enable refined constraints (e.g., numeric ranges, identifier matching, textual constraints).

Two-stage retrieval pipeline

Given a query request expressed as an OpenLABEL document $L_{q}$, scenario retrieval proceeds in two stages:

Candidate selection. Tag types are extracted from $L_{q}$ and used to query the tag–scenario association index. Candidate scenario identifiers are retrieved through indexed lookups, avoiding full-table scans.
Predicate refinement. Scenario metadata records for candidate identifiers are fetched in batches, and additional predicates are applied. These include constraints derived from valued tag_data (numeric comparisons and textual constraints) and admissibility checks on the curation state (e.g., restricting results to published scenarios).
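The batched refinement step might look like the sketch below. The `fetch_batch` callable is a hypothetical stand-in for the metadata store; the default batch size of 100 mirrors the per-request item limit of the DynamoDB BatchGetItem operation, and the record field names are assumptions.

```python
def refine(candidate_ids, fetch_batch, predicate, batch_size=100):
    """Fetch candidate records in batches and keep those passing the predicate.

    fetch_batch: callable taking a list of identifiers and returning records
                 (dicts with a "scenario_id" field); injected so the same logic
                 works against any metadata backend.
    predicate:   callable deciding whether a record satisfies the query.
    """
    ids = sorted(candidate_ids)
    hits = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        for record in fetch_batch(batch):
            if predicate(record):
                hits.append(record["scenario_id"])
    return hits
```

Because the store is passed in as a function, the refinement logic can be unit-tested with a dictionary-backed fake before it is wired to a managed database.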

4.4. Ingestion and Export Interfaces

Scenario ingestion

Ingestion operations accept a scenario identifier, structured metadata fields, and a set of associated files. Files are persisted to immutable object storage, while a consolidated scenario record is written to the metadata catalog. Each semantic tag extracted from the scenario’s OpenLABEL annotation is additionally materialized in the tag–scenario association table. In the reference stack, these operations are exposed through API Gateway endpoints backed by Lambda handlers that write to S3 and DynamoDB, respectively. Recommending harmonized OpenDRIVE, OpenSCENARIO, and OpenLABEL formats for user-uploaded scenarios simplifies ingestion, reduces the need for complex format-mapping layers, and enables consistent metadata extraction for storage and querying. This recommendation supports interoperability and scalable ingestion but is not a strict architectural requirement.

Scenario export

Export operations retrieve immutable artifacts associated with one or more scenario identifiers and assemble them into reproducible packages. When exporting to a specific simulation or testing environment, a lightweight adapter/translation step may be required to satisfy tool-specific input requirements while preserving the standardized OpenX/OpenLABEL content as the source-of-truth package. Querying and export are deliberately decoupled: querying returns identifiers and metadata, while bulk artifact transfer is handled by dedicated export endpoints. In the managed implementation, exports are realized via API endpoints that generate time-limited, pre-signed URLs for S3 objects, reducing backend load while preserving access control.
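The decoupling of querying and export can be illustrated with a small function that turns scenario identifiers into download descriptors. The `presign` callable is injected and hypothetical; in an AWS deployment it could wrap the boto3 S3 client method `generate_presigned_url("get_object", Params={"Bucket": ..., "Key": key}, ExpiresIn=expires_in)`. The record layout and the 900-second default lifetime are assumptions.

```python
def build_export(scenario_ids, records, presign, expires_in=900):
    """Return one download descriptor per scenario, with time-limited URLs.

    records: dict scenario_id -> {"version": str, "files": [{"name", "key"}, ...]}
    presign: callable (object_key, expires_in_seconds) -> URL string
    """
    bundle = []
    for scenario_id in scenario_ids:
        record = records[scenario_id]
        bundle.append({
            "scenario_id": scenario_id,
            "version": record["version"],
            "files": [
                {"name": f["name"], "url": presign(f["key"], expires_in)}
                for f in record["files"]
            ],
        })
    return bundle
```

The handler returns only URLs, so artifact bytes flow directly from object storage to the client and never pass through the compute layer.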

4.5. Identity, Access Control, and API Contracts

Access control in Demo SCDB is designed to balance open querying with controlled scenario curation. Authentication and authorization are enforced using an OAuth 2.0–compliant identity provider; in the reference stack this is realized using Amazon Cognito. API endpoints exposed through API Gateway are protected using a Cognito authorizer, ensuring that requests are authenticated before invoking backend logic.

Three logical user roles are distinguished:

Normal users, who are authorized to perform read-only querying operations such as OpenLABEL-based querying and metadata retrieval;
Contributors, who are additionally authorized to perform write operations, including scenario ingestion, metadata updates, and lifecycle transitions (e.g., publishing);
Administrators, who have extended privileges for governance operations, including user management and maintenance tasks.

Authorization is enforced through OAuth 2.0 access tokens carrying explicit scopes. API contracts are documented using OpenAPI specifications, which declare required authentication schemes and scopes for each endpoint, making access policies transparent and portable across backend implementations.
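A scope check of this kind can be sketched as follows. The scope names, routes, and the role-to-scope mapping are hypothetical; the text does not list the actual scopes or endpoints. The sketch assumes the access token carries its scopes as a space-delimited string, as is conventional for OAuth 2.0.

```python
# Hypothetical scopes granted to each logical role.
ROLE_SCOPES = {
    "user": {"scdb:read"},
    "contributor": {"scdb:read", "scdb:write"},
    "administrator": {"scdb:read", "scdb:write", "scdb:admin"},
}

# Hypothetical per-endpoint requirements, as an OpenAPI document might declare them.
ENDPOINT_SCOPES = {
    ("POST", "/query"): {"scdb:read"},
    ("GET", "/scenarios/{id}"): {"scdb:read"},
    ("POST", "/scenarios"): {"scdb:write"},
    ("DELETE", "/scenarios/{id}"): {"scdb:admin"},
}


def is_authorized(scope_claim, method, route):
    """Allow the call only if the token holds every scope the endpoint requires."""
    required = ENDPOINT_SCOPES.get((method, route))
    if required is None:
        return False  # unknown endpoints are denied by default
    return required <= set(scope_claim.split())
```

Declaring the requirements as data, separate from the handlers, mirrors the role of the OpenAPI contract: the policy can be inspected and ported without reading backend code.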

4.6. Portability Considerations

Although the current realization of Demo SCDB is instantiated on a managed serverless cloud stack (S3, DynamoDB, Lambda, API Gateway, Cognito), the architecture is intentionally designed around backend-agnostic abstractions: (i) immutable artifact storage, (ii) a structured metadata catalog, (iii) an explicit tag–scenario association index, and (iv) stateless API handlers governed by a formal contract. None of these elements intrinsically depend on a specific cloud provider. The implementation-specific choices primarily affect operational characteristics (deployment tooling, scaling behavior, and managed-service constraints) rather than the conceptual data model or querying semantics.

5. Scenario Generation and Packaging

5.1. Generation Objectives and Scope

Demo SCDB treats scenario generation as a controlled and reproducible process whose output is not merely executable content, but standardized, versioned assets suitable for long-term storage, validation, and querying. We distinguish between a scenario and a test scenario. A scenario is a descriptive representation of a possible traffic evolution, including road layout, actors, behaviors, and environmental conditions, without implying evaluation criteria. A test scenario, in contrast, binds a concrete scenario instance to a specific test objective and an evaluation specification (test oracle) and serves as input to a validation method or simulator. Demo SCDB stores and exports scenarios as reusable descriptive assets; test scenarios are derived via adapters that specialize a scenario for a given execution environment and evaluation context. The generation pipeline therefore aims to produce (i) standards-compliant OpenX artifacts and (ii) machine-interpretable semantic metadata (OpenLABEL) that can be indexed and queried independently of the generation tooling.

The initial population of the database is performed using a dedicated generation toolkit built on top of the open-source scenariogeneration library [26]. The objective of this first-generation corpus is not exhaustive operational coverage, but rather the construction of a diverse and internally consistent baseline that exercises the SCDB lifecycle end-to-end: ingestion, validation, curation, indexing, querying, and export.

Figure 4 illustrates the generation and packaging workflow used to create these baseline assets before they are ingested into the SCDB for storage and lifecycle management.

5.2. ODD-Centered, Parameter-Based Scenario Modeling

Scenario generation follows the established abstraction chain from functional intent to logical and concrete scenarios [4,27,28]. Functional descriptions are mapped to an explicit Operational Design Domain (ODD), which constrains admissible operating conditions. Logical scenarios are represented as constrained parameter spaces derived from the selected ODD, and concrete scenarios correspond to fully assigned parameter vectors.

The generation process is guided by a structured semantic view aligned with OpenLABEL concepts. At a high level, scenario construction progresses from environment and road context toward actors and behaviors. Concretely, generation first establishes the scene and road topology, then introduces road users and their roles, followed by behavioral templates and interaction patterns. ODD descriptors, actor categories, and behavior concepts correspond to OpenLABEL tag families, ensuring that semantic annotations used for querying are consistent with the generation logic.

Let $\theta$ denote the complete parameter vector defining a scenario. For implementation, we decompose the logical parameter space into the five coupled groups defined in Equation (5), which reflect the generation ordering and the structure of OpenX artifacts without implying a universal parameter taxonomy:

$$\theta = (\theta^{\mathrm{road}}, \theta^{\mathrm{dyn}}, \theta^{\mathrm{behavior}}, \theta^{\mathrm{env}}, \theta^{\mathrm{init}}), \tag{5}$$

where $\theta^{\mathrm{road}}$ captures road geometry and topology, $\theta^{\mathrm{dyn}}$ captures traffic participants and their physical attributes, $\theta^{\mathrm{behavior}}$ captures maneuver and interaction parameters, $\theta^{\mathrm{env}}$ captures environmental conditions (e.g., weather and illumination), and $\theta^{\mathrm{init}}$ captures initial-state variables (positions, speeds, gaps). ODD constraints restrict the admissible domains and combinations across these groups.
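As an illustration of the decomposition in Equation (5), the following minimal sketch represents a logical scenario as discretized parameter domains per group and enumerates the concrete scenarios that satisfy ODD constraints. All parameter names, candidate values, and the two constraints are assumptions chosen for the example, not values from the generation toolkit.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Theta:
    """One concrete scenario: a fully assigned parameter vector (Equation (5))."""
    road: dict      # road geometry and topology
    dyn: dict       # traffic participants and physical attributes
    behavior: dict  # maneuver and interaction parameters
    env: dict       # environmental conditions
    init: dict      # initial-state variables


# Hypothetical logical scenario: each group maps parameter names to candidate values.
LOGICAL = {
    "road": {"lanes": [2, 3], "speed_limit_kph": [80, 120]},
    "dyn": {"target_type": ["car", "truck"]},
    "behavior": {"cut_in_duration_s": [2.0, 4.0]},
    "env": {"weather": ["dry", "rain"]},
    "init": {"ego_speed_kph": [70, 110], "gap_m": [15.0, 40.0]},
}


def odd_admissible(theta):
    """Hypothetical ODD constraints that couple parameters across groups."""
    if theta.init["ego_speed_kph"] > theta.road["speed_limit_kph"]:
        return False
    if theta.env["weather"] == "rain" and theta.init["ego_speed_kph"] > 100:
        return False
    return True


def enumerate_concrete(logical):
    """Yield every admissible concrete scenario of a discretized logical scenario."""
    keys = [(group, name) for group, params in logical.items() for name in params]
    domains = [logical[group][name] for group, name in keys]
    for values in product(*domains):
        groups = {group: {} for group in logical}
        for (group, name), value in zip(keys, values):
            groups[group][name] = value
        theta = Theta(**groups)
        if odd_admissible(theta):
            yield theta


concrete = list(enumerate_concrete(LOGICAL))
```

Exhaustive enumeration is only practical for small discretized spaces; for continuous or large spaces a sampling strategy would replace the `product` loop.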

5.3. OpenDRIVE-Centered Scene Anchoring

Road topology is treated as the structural backbone of each scenario family, as it constrains feasible actor placement, available maneuvers, and admissible parameter combinations. Scenario generation therefore begins with OpenDRIVE, which serves as the authoritative representation of the road.

Two complementary strategies are supported:

External OpenDRIVE ingestion, where complex road networks (e.g., urban layouts or highways) are imported from external sources and reused as immutable anchors;
Procedural OpenDRIVE synthesis, where parameterized road motifs (e.g., two-way roads, multi-lane highways, merges, junctions, roundabouts) are generated programmatically for controlled experimentation.

5.4. Behavior Templates and OpenSCENARIO Emission

Dynamic interactions are specified using reusable, parameterized behavior templates that compile into executable OpenSCENARIO 1.x artifacts. OpenSCENARIO represents behavior through entities, actions, maneuvers, and event-based triggers. Template instantiation resolves high-level intent (e.g., cut-in, following, approach) into an explicit event/action structure, and produces a concrete OpenSCENARIO file that references the corresponding OpenDRIVE road asset.

5.5. Constraint-Aware Concretization

Scenario concretization is performed under explicit feasibility constraints to avoid generating invalid or semantically inconsistent scenarios. Concretization follows an ordered strategy:

fixation of road parameters,
assignment of initial states conditioned on road geometry,
instantiation of behavioral parameters conditioned on initial states,
assignment of environmental parameters.

Constraints arising from spatial arrangement relations, cross-entity dependencies, and parameter dependencies induced by physical or design rules are enforced during parameter assignment and documented as part of the scenario provenance.
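The ordered strategy above can be sketched as a seeded sampler that assigns each group conditioned on the previous ones and rejects infeasible draws. The parameter names, ranges, and the one-second time-gap rule are illustrative assumptions rather than the constraints used in Demo SCDB.

```python
import random


def concretize(seed, max_tries=100):
    """Sample one concrete scenario in the order road -> init -> behavior -> env."""
    rng = random.Random(seed)  # seeded, so a given seed reproduces the same scenario

    # 1. Fix road parameters first; all later groups are conditioned on them.
    road = {"lanes": rng.choice([2, 3]), "length_m": 500.0, "speed_limit_mps": 33.0}

    for attempt in range(1, max_tries + 1):
        # 2. Initial states conditioned on road geometry.
        ego_s = rng.uniform(0.0, road["length_m"] / 2)
        gap = rng.uniform(10.0, 60.0)
        ego_speed = rng.uniform(15.0, road["speed_limit_mps"])
        # Hypothetical feasibility rule: at least 1 s time gap at initialization.
        if gap / ego_speed < 1.0:
            continue
        init = {
            "ego_s_m": ego_s,
            "gap_m": gap,
            "lead_s_m": ego_s + gap,
            "ego_lane": rng.randrange(road["lanes"]),
            "ego_speed_mps": ego_speed,
        }

        # 3. Behavior parameters conditioned on initial states.
        behavior = {"cut_in_trigger_gap_m": rng.uniform(5.0, gap)}

        # 4. Environmental parameters last.
        env = {"weather": rng.choice(["dry", "rain"])}

        # Record what was enforced so the result can be traced and reproduced.
        provenance = {
            "seed": seed,
            "attempts": attempt,
            "constraints": ["lead_on_road", "time_gap_ge_1s", "trigger_gap_le_initial_gap"],
        }
        return {"road": road, "init": init, "behavior": behavior,
                "env": env, "provenance": provenance}

    raise RuntimeError("no feasible assignment found within max_tries")


scenario = concretize(seed=7)
```

Storing the seed and the list of enforced constraints alongside the parameters is one simple way to document the constraints as provenance.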

5.6. Semantic Annotation and Reproducible Packaging

Each generated scenario is exported as a self-contained, versioned package comprising OpenDRIVE, OpenSCENARIO, and an OpenLABEL document encoding ODD descriptors, actor roles, and behavioral semantics. The OpenLABEL annotations reflect the same semantic structure used during generation, enabling consistent lifecycle management and querying. A manifest file accompanies each package and records cryptographic checksums, standard versions, generator provenance, and parameter-space provenance.

Packaged artifacts are treated as immutable: any modification to OpenX artifacts or the exported OpenLABEL document results in a new versioned scenario instance. This guarantees unambiguous ingestion into the SCDB, enables reproducible retrieval and execution, and ensures that subsequent enrichment products can be traced back to the exact scenario definition that produced them.
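A minimal sketch of such a manifest is shown below, assuming SHA-256 checksums and a JSON-serializable structure. The field names, file names, version strings, and the idea of deriving a package version identifier from the checksums are assumptions for illustration, not the Demo SCDB manifest format.

```python
import hashlib
import json


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files, standards, generator, parameters):
    """Build a manifest for a package given as {file name: file bytes}."""
    checksums = {name: sha256_hex(files[name]) for name in sorted(files)}
    # Identifier derived from all checksums: any artifact change yields a new version.
    package_digest = sha256_hex(json.dumps(checksums, sort_keys=True).encode("utf-8"))
    return {
        "package_version_id": package_digest,
        "checksums": checksums,
        "standards": standards,
        "generator": generator,
        "parameters": parameters,
    }


def verify_package(files, manifest):
    """Check that the package contains exactly the recorded files, unmodified."""
    expected = manifest["checksums"]
    if set(files) != set(expected):
        return False
    return all(sha256_hex(files[name]) == expected[name] for name in expected)


# Placeholder artifact contents; real packages would hold full OpenX/OpenLABEL files.
files = {
    "road.xodr": b"<OpenDRIVE/>",
    "scenario.xosc": b"<OpenSCENARIO/>",
    "labels.json": b'{"openlabel": {}}',
}
meta = dict(
    standards={"OpenDRIVE": "1.x", "OpenSCENARIO": "1.x", "OpenLABEL": "1.x"},
    generator={"name": "example-generator", "version": "0.0.0"},
    parameters={"seed": 7},
)
manifest = build_manifest(files, **meta)

# Editing any artifact invalidates the old manifest and produces a new version id.
edited = dict(files)
edited["scenario.xosc"] = b"<OpenSCENARIO><!-- edited --></OpenSCENARIO>"
new_manifest = build_manifest(edited, **meta)
```

The same `verify_package` check can be run after export or after redeployment to confirm that a retrieved package matches the one that was ingested.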

6. Validation, Curation, and Optional Enrichment

6.1. Two-Stage Validation

We adopt a two-stage validation strategy that separates structural and semantic concerns:

Syntactic validation, ensuring conformance to data schemas (e.g., OpenDRIVE/OpenSCENARIO/OpenLABEL) and referential integrity among road, actors, and scenario definitions.
Semantic validation, enforcing feasibility and plausibility constraints such as collision-free initialization, parameter bounds, and consistency with ODD definitions (e.g., admissible speed ranges under specific environmental conditions).

In addition to physical feasibility, semantic validation can incorporate realism-oriented checks by enforcing ODD-consistency and plausibility constraints. Representative examples include admissible speed ranges consistent with road class and curvature, actor placement and initial gaps consistent with lane geometry and traffic rules, and coherence between environmental conditions (e.g., weather, illumination) and the declared ODD. Such policies are explicitly versioned to keep validation reproducible [29].
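The two stages can be sketched as separate functions that each return a list of issues. The example below is deliberately simplified: the syntactic stage only checks well-formedness and two referential-integrity rules on a heavily reduced OpenSCENARIO-like fragment (it is not a schema validation), and the semantic policy, its version label, and its thresholds are hypothetical.

```python
import xml.etree.ElementTree as ET

# Heavily reduced fragment for illustration; a real file carries far more structure.
XOSC = """<OpenSCENARIO>
  <RoadNetwork><LogicFile filepath="road.xodr"/></RoadNetwork>
  <Entities>
    <ScenarioObject name="Ego"/>
    <ScenarioObject name="Target"/>
  </Entities>
  <Storyboard><Init><Actions>
    <Private entityRef="Ego"/>
    <Private entityRef="Target"/>
  </Actions></Init></Storyboard>
</OpenSCENARIO>"""


def syntactic_issues(xosc_text, package_files):
    """Stage 1: well-formedness and referential integrity within the package."""
    try:
        root = ET.fromstring(xosc_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    issues = []
    logic = root.find("./RoadNetwork/LogicFile")
    if logic is None or logic.get("filepath") not in package_files:
        issues.append("road network reference does not resolve")
    declared = {obj.get("name") for obj in root.iter("ScenarioObject")}
    for ref in root.iter("Private"):
        if ref.get("entityRef") not in declared:
            issues.append(f"unknown entity: {ref.get('entityRef')}")
    return issues


# Hypothetical, explicitly versioned semantic policy.
POLICY = {
    "version": "0.1-example",
    "speed_range_mps": {"dry": (0.0, 36.0), "rain": (0.0, 28.0)},
    "min_initial_gap_m": 5.0,
}


def semantic_issues(params, policy):
    """Stage 2: feasibility and ODD-consistency checks on concrete parameters."""
    issues = []
    low, high = policy["speed_range_mps"][params["weather"]]
    if not low <= params["ego_speed_mps"] <= high:
        issues.append("ego speed outside admissible range for weather")
    if params["gap_m"] < policy["min_initial_gap_m"]:
        issues.append("initial gap below minimum")
    return issues


report = {
    "policy_version": POLICY["version"],
    "syntactic": syntactic_issues(XOSC, {"road.xodr", "scenario.xosc"}),
    "semantic": semantic_issues(
        {"weather": "rain", "ego_speed_mps": 30.0, "gap_m": 20.0}, POLICY
    ),
}
```

Recording the policy version in the validation report keeps a later re-validation comparable with the original result.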

6.2. Optional Execution-Based Enrichment

Format validity does not guarantee relevance for safety assessment. Therefore, the architecture allows scenarios to be optionally executed in a compatible simulator or evaluation environment to produce trajectories and event traces. From these traces, execution-derived metrics can be computed and stored as provenance-linked metadata. Toolchains such as CommonRoad-CriMe provide a unified framework for computing criticality measures in a consistent coordinate and vehicle model setting [30]. In the current version of Demo SCDB, querying is driven by OpenLABEL tags and tag_data; ontologies for execution-derived metrics and metric-driven querying can be supported as an extension. Execution in a simulator does not by itself transform a scenario into a test scenario. A scenario becomes a test scenario only when it is associated with a defined test objective and an explicit evaluation specification. The SCDB therefore focuses on storing scenarios and execution-derived metrics, while pass/fail judgments are produced in the test environment.
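To illustrate what an execution-derived metric with provenance could look like, the sketch below computes a minimum time-to-collision (a widely used criticality measure) from a simplified car-following trace and wraps it in a metadata record. It does not use the CommonRoad-CriMe API; the trace layout, record fields, and tool names are assumptions.

```python
import math


def min_time_to_collision(trace):
    """Minimum TTC over a trace of (gap_m, follower_speed_mps, leader_speed_mps).

    TTC is gap / closing speed and is only defined while the follower is closing in;
    math.inf is returned if the follower never closes in.
    """
    best = math.inf
    for gap, v_follower, v_leader in trace:
        closing = v_follower - v_leader
        if closing > 0:
            best = min(best, gap / closing)
    return best


# Hypothetical trace samples from one execution run.
trace = [
    (30.0, 25.0, 20.0),
    (20.0, 25.0, 20.0),
    (12.0, 24.0, 20.0),
    (15.0, 20.0, 20.0),
]

# Provenance-linked enrichment record: the metric points back to the exact
# scenario package and names the tool and configuration that produced it.
enrichment = {
    "scenario_checksum": "sha256:placeholder",
    "metric": "min_ttc_s",
    "value": min_time_to_collision(trace),
    "tool": {"name": "example-simulator", "version": "0.0.0"},
}
```

The record stores a measurement only; whether a given value passes or fails remains a decision of the test environment.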

6.3. Curation Workflow

We model scenario curation as a state machine with the canonical states draft → validated → published, and a failure sink quarantined. Transitions are governed by validation outcomes and contributor privileges. This model supports controlled evolution of the scenario library, ensuring that published artifacts meet validation requirements [29,30].
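A minimal sketch of this state machine is given below. The four states follow the text; the event names, the two roles, and the privilege ranking are hypothetical and only illustrate how validation outcomes and contributor privileges could gate transitions.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"
    PUBLISHED = "published"
    QUARANTINED = "quarantined"


# (current state, event) -> (next state, minimum role required)
TRANSITIONS = {
    (State.DRAFT, "validation_passed"): (State.VALIDATED, "contributor"),
    (State.DRAFT, "validation_failed"): (State.QUARANTINED, "contributor"),
    (State.VALIDATED, "publish"): (State.PUBLISHED, "curator"),
    (State.VALIDATED, "validation_failed"): (State.QUARANTINED, "contributor"),
}

ROLE_RANK = {"contributor": 1, "curator": 2}


def advance(state, event, role):
    """Return the next curation state or raise if the transition is not allowed."""
    if (state, event) not in TRANSITIONS:
        raise ValueError(f"no transition from {state.value} on {event}")
    next_state, required_role = TRANSITIONS[(state, event)]
    if ROLE_RANK[role] < ROLE_RANK[required_role]:
        raise PermissionError(f"{role} may not trigger {event}")
    return next_state


state = State.DRAFT
state = advance(state, "validation_passed", "contributor")
state = advance(state, "publish", "curator")
```

In this sketch, quarantined and published have no outgoing transitions: quarantined acts as a sink, and a change to a published scenario would enter the workflow as a new draft version.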

7. Semantic and Value-Based Querying

Scenario querying in Demo SCDB is driven by OpenLABEL-based queries grounded in an explicit scenario-tag ontology [10]. Users submit query requests as OpenLABEL documents, and the backend performs tag-based matching while enforcing lifecycle admissibility (default: published). In this version, querying supports both (i) categorical tags (presence-based predicates) and (ii) valued tag_data constraints over numeric and textual fields. The ontology defines the semantic vocabulary of the query space, while concrete queryable values are represented at the scenario-instance level through the corresponding OpenLABEL annotations.

To support efficient retrieval at scale, semantic tags are indexed explicitly rather than derived on demand from raw scenario packages. The architecture materializes a tag index supporting many-to-many relationships (scenario–tag) and applies value-based predicate refinement over stored tag_data associated with candidate scenarios. Query results return stable scenario identifiers and package checksums, and exports include manifests capturing the metadata and provenance required to reproduce the retrieved artifacts.
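The two-step pattern (indexed candidate selection, then value-based refinement) can be sketched with an in-memory SQLite database standing in for the actual backend. The table names follow the conceptual data model (Scenarios, Tags, and a TagScenarios junction table); the column names, the numeric-only `num_value` column, the valued tag `EgoSpeedKph`, and the sample rows are assumptions for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Scenarios (id TEXT PRIMARY KEY, checksum TEXT NOT NULL, state TEXT NOT NULL);
CREATE TABLE Tags (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE TagScenarios (
    scenario_id TEXT NOT NULL REFERENCES Scenarios(id),
    tag_id INTEGER NOT NULL REFERENCES Tags(id),
    num_value REAL,  -- numeric tag_data value; NULL for purely categorical tags
    PRIMARY KEY (scenario_id, tag_id)
);
"""


def add_scenario(conn, sid, checksum, state, tags):
    """Register a scenario and its tags ({tag name: numeric value or None})."""
    conn.execute("INSERT INTO Scenarios VALUES (?, ?, ?)", (sid, checksum, state))
    for name, value in tags.items():
        conn.execute("INSERT OR IGNORE INTO Tags (name) VALUES (?)", (name,))
        tag_id = conn.execute("SELECT id FROM Tags WHERE name = ?", (name,)).fetchone()[0]
        conn.execute("INSERT INTO TagScenarios VALUES (?, ?, ?)", (sid, tag_id, value))


def query(conn, required_tags, value_ranges=None, state="published"):
    """Return (id, checksum) of scenarios in `state` carrying all required tags."""
    names = sorted(set(required_tags))
    marks = ",".join("?" for _ in names)
    # Step 1: candidate selection through the tag index, restricted by lifecycle state.
    candidates = conn.execute(
        f"""SELECT s.id, s.checksum
            FROM Scenarios s
            JOIN TagScenarios ts ON ts.scenario_id = s.id
            JOIN Tags t ON t.id = ts.tag_id
            WHERE s.state = ? AND t.name IN ({marks})
            GROUP BY s.id, s.checksum
            HAVING COUNT(*) = ?
            ORDER BY s.id""",
        [state, *names, len(names)],
    ).fetchall()
    # Step 2: value-based refinement over stored tag_data of the candidates only.
    results = []
    for sid, checksum in candidates:
        keep = True
        for tag, (low, high) in (value_ranges or {}).items():
            row = conn.execute(
                """SELECT ts.num_value FROM TagScenarios ts
                   JOIN Tags t ON t.id = ts.tag_id
                   WHERE ts.scenario_id = ? AND t.name = ?""",
                (sid, tag),
            ).fetchone()
            if row is None or row[0] is None or not low <= row[0] <= high:
                keep = False
                break
        if keep:
            results.append((sid, checksum))
    return results


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
overtake = {"MotionOvertake": None, "RoadTypeMotorway": None}
add_scenario(conn, "scn-001", "sha256:aaa", "published", {**overtake, "EgoSpeedKph": 110.0})
add_scenario(conn, "scn-002", "sha256:bbb", "draft", {**overtake, "EgoSpeedKph": 120.0})
add_scenario(conn, "scn-003", "sha256:ccc", "published", {**overtake, "EgoSpeedKph": 80.0})
add_scenario(conn, "scn-004", "sha256:ddd", "published", {"RoadTypeMotorway": None})

tag_only = query(conn, ["MotionOvertake", "RoadTypeMotorway"])
refined = query(conn, ["MotionOvertake", "RoadTypeMotorway"], {"EgoSpeedKph": (100.0, 130.0)})
```

Neither step opens the scenario files themselves, which is why query cost in this pattern depends on the amount of indexed metadata rather than on artifact size. The draft scenario is excluded by the default state filter even though its tags match.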

8. Evaluation

This section provides an empirical and procedural validation of the proposed reference architecture by examining how it operationalizes requirements R1–R5 (Section 3) in a working deployment. The goal of this evaluation is not to benchmark absolute system performance or scenario-space coverage, but to verify that the architecture functions as intended as a lifecycle-governed, standards-aligned scenario database. Accordingly, the evaluation focuses on (i) reproducibility and deployment determinism (R4), (ii) structured querying performance (R3), and (iii) lifecycle governance enforcement (R2).

8.1. Experimental Setup

We evaluate a representative deployment of Demo SCDB provisioned entirely through Infrastructure-as-Code (IaC), using the serverless stack described in Section 4. All infrastructure components, storage resources, and API endpoints are instantiated from version-controlled IaC definitions, enabling deterministic recreation of the system.

The evaluation dataset consists of standards-aligned scenario packages (OpenDRIVE, OpenSCENARIO, OpenLABEL) generated by the pipeline described in Section 5. The dataset is constructed to provide structural diversity across road layouts, actor types, maneuvers, and ODD attributes. Its purpose is to exercise the SCDB lifecycle end-to-end rather than to approximate real-world traffic distributions. The dataset therefore supports validation of ingestion, validation, indexing, querying, and export workflows under heterogeneous but controlled conditions.

Representative workloads include:

scenario ingestion and metadata registration,
tag-based querying,
value-constrained querying over tag_data,
reproducible export of scenario packages.

To contextualize the evaluation, Table 1 summarizes the main characteristics of the dataset and the querying workload used in the experiments.

8.2. Results and Interpretation

Structured querying (R3)

The primary indicator for R3 is end-to-end latency under semantic and value-constrained predicates. The explicit tag–scenario association index enables candidate selection through indexed lookups, while valued tag_data predicates are applied during refinement (Section 4.3). Tag-only queries are dominated by index retrieval and metadata fetches, whereas value-constrained queries incur additional filtering cost proportional to candidate set size.

Observed latencies remain within interactive ranges for scenario selection workflows: for tag-only queries, the median latency is 838 ms and the P95 latency is 1010 ms (Table 1). Importantly, retrieval cost scales with metadata complexity rather than scenario file size, since raw OpenX artifacts are not parsed at query time. This supports the architectural choice of metadata-centric indexing for structured querying.

As an illustrative example, consider the generated motorway overtaking scenario. This scenario describes a two-vehicle interaction in which a follower vehicle performs an overtaking maneuver on a multi-lane motorway.

The scenario is stored in the SCDB together with its semantic annotations and associated metadata. Its categorical tags include MotionOvertake, RoadTypeMotorway, VehicleCar, and DynamicElementsTraffic. A query can then be formulated to retrieve scenarios satisfying specific semantic constraints, for example, those tagged with MotionOvertake and RoadTypeMotorway.

The system first retrieves candidate scenarios using the categorical tag index and then applies the standard query refinement process defined by the SCDB. In this case, the motorway overtaking scenario is retrieved as part of the query result. The resulting scenarios can then be exported as reproducible packages, including OpenDRIVE, OpenSCENARIO, and OpenLABEL files together with their associated metadata and provenance.

Lifecycle governance (R2)

Lifecycle governance enforcement was validated by maintaining scenarios in multiple curation states (draft, validated, published, quarantined) during evaluation. Default querying was confirmed to return only admissible scenarios (typically published), demonstrating that lifecycle state is actively enforced rather than treated as passive metadata. Validation reports and provenance metadata remain traceable for each scenario, supporting auditability and controlled evolution of the scenario library.

Reproducibility (R4)

Reproducibility is evaluated procedurally. The full SCDB stack was redeployed from the same IaC configuration in a separate environment, yielding consistent infrastructure topology and API contracts. Scenario packages include manifests with cryptographic checksums and explicit standard versions, allowing artifact integrity to be verified across deployments. These properties support deterministic redeployment and reproducible scenario exchange, which are central goals of the proposed architecture.

9. Discussion

The primary contribution of Demo SCDB is an operational and reproducible system blueprint that integrates standards-based packaging, lifecycle governance, and structured querying into a coherent database contract. Rather than proposing new scenario representations or metrics, the system demonstrates how OpenX artifacts and OpenLABEL metadata can be organized, validated, curated, and retrieved in a scalable and reproducible manner.

9.1. Positioning for Interoperability and Federation

This work is intended to be compatible with federated SCDB ecosystems. In such settings, individual hosts retain autonomy over their scenario assets and operational constraints, while interoperability is achieved through aligned metadata models, stable identifiers, and standards-oriented exchange. The reference architecture in this paper can therefore be adopted by new SCDB initiatives and can also inform incremental harmonization of existing hosts by identifying clear interfaces for ingestion, governance, querying, and export.

9.2. Design Trade-Offs

Key design trade-offs include the use of serverless components to reduce operational overhead, explicit indexing of OpenLABEL-derived metadata to support value-constrained querying at scale, and the separation of syntactic and semantic validation to allow incremental refinement of validation policies. In particular, decoupling querying from bulk artifact export allows read-heavy workloads to remain low-latency while preserving reproducibility through manifest-based packaging.

9.3. Threats to Validity and Limitations

Our evaluation (Section 8) focuses on architectural properties and representative workload behavior rather than exhaustively characterizing all possible SCDB deployments. Several limitations should be considered:

Dataset coverage and representativeness. The initial corpus is designed for structural diversity to exercise the SCDB lifecycle; it is not intended to provide complete scenario-space coverage or statistical representativeness of real-world traffic distributions.
Execution environment dependence. Execution traces and metrics enrichment depend on the chosen simulator and configuration; measurement values and criticality measures may vary across toolchains unless execution services and metric definitions are standardized and versioned.
Ontology and tag completeness. Query semantics are constrained by the expressiveness and coverage of the adopted ontologies and tagging practices; incomplete vocabularies can lead to reduced recall even if the underlying artifacts are relevant.
Implementation-specific effects. While the reference architecture is backend-agnostic, operational characteristics (e.g., cold-start latency, throughput limits, cost profile) depend on the chosen cloud or deployment substrate.

These limitations motivate the future work outlined in Section 10, especially standardized execution services, metrics ontologies, and broader interoperability patterns.

10. Conclusions and Future Work

This paper presented a reproducible reference architecture for scenario databases supporting automated driving validation. The work formalizes the SCDB as a lifecycle-oriented system that brings scenario generation and ingestion, validation, curation, storage, structured querying, and reproducible export into a unified framework.

The contribution lies in making these lifecycle steps explicit and operational through a standards-aligned data model and a deployable system design. By treating OpenDRIVE, OpenSCENARIO, and OpenLABEL artifacts together with their associated metadata as first-class, versioned assets, the architecture supports traceable scenario management and consistent query semantics across different stages of the workflow.

The reference implementation demonstrates that these concepts can be realized in practice using a cloud-native, Infrastructure-as-Code approach. The evaluation confirms that the system supports reproducible deployment, efficient metadata-driven querying, and enforceable lifecycle governance, while maintaining a clear separation between immutable scenario artifacts and evolving metadata. These conclusions are supported by the empirical results presented in Section 8, where the system demonstrates interactive query latencies under metadata-driven retrieval, consistent enforcement of lifecycle states during querying, and reproducible deployment through Infrastructure-as-Code with verifiable artifact integrity.

This work is not intended as a centralized solution, but as a reference blueprint that can guide the development of interoperable scenario database systems. By making lifecycle structure, metadata organization, and query semantics explicit, the proposed approach aims to support more systematic scenario reuse and more transparent validation processes in automated driving workflows.

Future work will extend the current architecture along several directions. A first step is the integration of execution-derived metrics through a standards-aligned representation, including the definition of a metrics ontology compatible with the existing OpenLABEL-based semantic layer. A second direction is the externalization of scenario execution into a dedicated service with a formal OpenAPI interface, enabling reproducible execution and results management independent of the SCDB core. In addition, we plan to develop modular metric computation services that attach versioned results and provenance to scenarios and enable metric-driven querying as an extension of the current interface. Finally, we will investigate AI-assisted support for selected lifecycle tasks, such as scenario generation, validation triage, and metadata completion, while maintaining explicit governance and traceability constraints.

Author Contributions

Conceptualization, Y.T.A., J.D.O. and M.N.; methodology, Y.T.A.; software, Y.T.A. and J.D.O.; validation, Y.T.A. and J.D.O.; formal analysis, Y.T.A.; investigation, Y.T.A.; resources, M.N.; data curation, Y.T.A. and J.D.O.; writing—original draft preparation, Y.T.A.; writing—review and editing, Y.T.A., J.D.O. and M.N.; visualization, Y.T.A. and J.D.O.; supervision, M.N.; project administration, M.N.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Horizon Europe programme of the European Union under grant agreement No. 101069573 (project SUNRISE). The APC was funded by the SUNRISE project under the same grant agreement.

Data Availability Statement

There is no data published along with the manuscript.

Acknowledgments

This work was funded by the Horizon Europe programme of the European Union, under grant agreement 101069573 (project SUNRISE). Views and opinions expressed here are, however, those of the author(s) only and do not necessarily reflect those of the European Union or CINEA. Neither the European Union nor the granting authority can be held responsible for them.

Conflicts of Interest

There are no conflicts of interest.

References

1. Neurohr, C.; Westhofen, L.; Henning, T.; De Graaff, T.; Möhlmann, E.; Böde, E. Fundamental considerations around scenario-based testing for automated driving. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 121–127.
2. Zhang, X.; Khastgir, S.; Tiele, J.K.; Takenaka, K.; Hayakawa, T.; Jennings, P. ODD and behavior based scenario generation for automated driving systems. IEEE Access 2024, 12, 10652–10663.
3. Ulbrich, S.; Menzel, T.; Reschka, A.; Schuldt, F.; Maurer, M. Defining and Substantiating the Terms Scene, Situation, and Scenario for Automated Driving. In Proceedings of the 2015 IEEE 18th International Conference on Intelligent Transportation Systems (ITSC), Gran Canaria, Spain, 15–18 September 2015; pp. 982–988.
4. PEGASUS Project Consortium. The PEGASUS Method: Measuring Automated Driving Capability (Project Material). 2019. Available online: https://www.pegasusprojekt.de/en/about-PEGASUS (accessed on 12 September 2025).
5. Weber, H.; Bock, J.; Klimke, J.; Roesener, C.; Hiller, J.; Krajewski, R.; Zlocki, A.; Eckstein, L. A framework for definition of logical scenarios for safety assurance of automated driving. Traffic Inj. Prev. 2019, 20, S65–S70.
6. Ko, W.; Park, S.; Yun, J.; Park, S.; Yun, I. Development of a framework for generating driving safety assessment scenarios for automated vehicles. Sensors 2022, 22, 6031.
7. Scholtes, M.; Westhofen, L.; Turner, L.R.; Lotto, K.; Schuldes, M.; Weber, H.; Wagener, N.; Neurohr, C.; Bollmann, M.H.; Körtke, F.; et al. 6-Layer Model for a Structured Description and Categorization of Urban Traffic and Environment. IEEE Access 2021, 9, 59131–59147.
8. ASAM e.V. ASAM OpenDRIVE. Online Resource. 2025. Available online: https://www.asam.net/standards/detail/opendrive/ (accessed on 30 September 2025).
9. ASAM e.V. ASAM OpenSCENARIO 1.2.0 User Guide. Online Resource. 2022. Available online: https://www.asam.net/standards/detail/openscenario/ (accessed on 30 September 2025).
10. ASAM e.V. ASAM OpenLABEL. Online Resource. 2021. Available online: https://www.asam.net/standards/detail/openlabel/ (accessed on 30 September 2025).
11. ASAM e.V. ASAM OpenXOntology. Online Resource. 2024. Available online: https://www.asam.net/standards/detail/openxontology/ (accessed on 30 September 2025).
12. De Gelder, E.; Paardekooper, J.P.; Saberi, A.K.; Elrofai, H.; Op den Camp, O.; Kraines, S.; Ploeg, J.; De Schutter, B. Towards an Ontology for Scenario Definition for the Assessment of Automated Vehicles: An Object-Oriented Framework. IEEE Trans. Intell. Veh. 2022, 7, 300–314.
13. Bagschik, G.; Menzel, T.; Maurer, M. Ontology Based Scene Creation for the Development of Automated Vehicles. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1813–1820.
14. Wilkinson, M.D.; Dumontier, M.; Aalbersberg, I.J.; Appleton, G.; Axton, M.; Baak, A.; Blomberg, N.; Boiten, J.; Santos, L.B.d.; Bourne, P.E.; et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 2016, 3, 160018.
15. SafetyPool.ai. Online Resource. 2025. Available online: https://www.safetypool.ai/ (accessed on 12 September 2025).
16. Schuldes, M.; Glasmacher, C.; Eckstein, L. scenario.center: Methods from Real-world Data to a Scenario Database. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju, Republic of Korea, 2–5 June 2024; pp. 1119–1126.
17. Vass, S.; Galassi, M.C.; Ciuffo, B.; Baldini, G. A common scenario database for Automated Vehicles validation and certification. Transp. Res. Procedia 2023, 72, 3845–3852.
18. Feng, Y.; Bao, S.; Liu, H. Connected and Automated Vehicle (CAV) Testing Scenario Design and Implementation Using Naturalistic Driving Data and Augmented Reality; Technical Report; Report No. UMTRI-2023-6; University of Michigan Transportation Research Institute (UMTRI): Ann Arbor, MI, USA, 2023. Available online: https://rosap.ntl.bts.gov/view/dot/73486 (accessed on 12 September 2025).
19. Althoff, M.; Koschi, M.; Manzinger, S. CommonRoad: Composable Benchmarks for Motion Planning on Roads. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 719–726.
20. Maierhofer, S.; Klischat, M.; Althoff, M. CommonRoad Scenario Designer: An Open-Source Toolbox for Map Conversion and Scenario Creation for Autonomous Vehicles. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 4792–4799.
21. Fremont, D.J.; Dreossi, T.; Ghosh, S.; Yue, X.; Sangiovanni-Vincentelli, A.L.; Seshia, S.A. Scenic: A Language for Scenario Specification and Scene Generation. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), Phoenix, AZ, USA, 22–26 June 2019; pp. 63–78.
22. Zhang, X.; Khastgir, S.; Jennings, P. Scenario Description Language for Automated Driving Systems: A Two Level Abstraction Approach. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 October 2020; pp. 973–980.
23. Majumdar, R.; Mathur, A.; Pirron, M.; Stegner, L.; Zufferey, D. Paracosm: A Language and Tool for Testing Autonomous Driving Systems. arXiv 2019, arXiv:1902.01084.
24. Zhao, Y.; Xiao, W.; Mihalj, T.; Hu, J.; Eichberger, A. Chat2Scenario: Scenario Extraction from Dataset through Utilization of Large Language Models. In Proceedings of the 2024 IEEE Intelligent Vehicles Symposium (IV), Jeju, Republic of Korea, 2–5 June 2024; pp. 559–566.
25. Özsu, M.T.; Valduriez, P. Principles of Distributed Database Systems, 4th ed.; Springer: Berlin/Heidelberg, Germany, 2019.
26. Pyoscx Contributors. Scenariogeneration: A Python Framework for Generating OpenSCENARIO and OpenDRIVE Content. Open-Source Software. 2024. Available online: https://github.com/pyoscx/scenariogeneration (accessed on 12 February 2026).
27. Menzel, T.; Bagschik, G.; Isensee, L.; Schomburg, A.; Maurer, M. From functional to logical scenarios: Detailing a keyword-based scenario description for execution in a simulation environment. In Proceedings of the 2019 IEEE Intelligent Vehicles Symposium (IV), Paris, France, 9–12 June 2019; pp. 2383–2390.
28. Menzel, T.; Bagschik, G.; Maurer, M. Scenarios for development, test and validation of automated vehicles. In Proceedings of the 2018 IEEE Intelligent Vehicles Symposium (IV), Suzhou, China, 26–30 June 2018; pp. 1821–1827.
29. Pek, C.; Rusinov, V.; Manzinger, S.; Üste, M.C.; Althoff, M. CommonRoad drivability checker: Simplifying the development and validation of motion planning algorithms. In Proceedings of the 2020 IEEE Intelligent Vehicles Symposium (IV), Las Vegas, NV, USA, 19 October–13 November 2020; pp. 1013–1020.
30. Lin, Y.; Althoff, M. CommonRoad-CriMe: A toolbox for criticality measures of autonomous vehicles. In Proceedings of the 2023 IEEE Intelligent Vehicles Symposium (IV), Anchorage, AK, USA, 4–7 June 2023; pp. 1–8.

Figure 1. High-level lifecycle of scenario assets centered on the SCDB core: scenarios are generated/ingested, governed through validation and curation, and exported for downstream use; optional execution can feed back additional metadata as an extension.

Figure 2. Conceptual data model for scenarios and tags. A junction table represents the many-to-many relationship between scenario records and semantic tags. The symbols ‘1’ and ‘*’ denote cardinality in the database schema, representing one-to-many relationships that together form a many-to-many relationship between Scenarios and Tags via the TagScenarios junction table.

Figure 3. Managed serverless backend stack used in the reference implementation of Demo SCDB, deployed reproducibly via Infrastructure-as-Code.

Figure 4. Conceptual flow of knowledge-based scenario generation and preparation. The large arrow represents the generation-to-ingestion pipeline that produces curated scenario assets for storage and lifecycle management within the SCDB.

Table 1. Summary of the dataset and querying characteristics used in the evaluation.

Item	Value
Number of scenario packages	466
Number of unique tags (categorical + valued)	94
Average tags per scenario	12
Median query latency (tag-only)	838 ms
P95 query latency (tag-only)	1010 ms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Azar, Y.T.; Ortega, J.D.; Nieto, M. A Reproducible Reference Architecture for Automated Driving Scenario Databases. Vehicles 2026, 8, 88. https://doi.org/10.3390/vehicles8040088

