Design and Implementation of an Ontology for Semantic Labeling and Testing: Automotive Global Ontology (AGO)

Modern Artificial Intelligence (AI) methods are able to massively produce accurate and rich descriptions of data, in domains such as surveillance and automotive. The need to organize data at scale in a semantic structure has therefore arisen for long-term data maintenance and consumption. Ontologies and graph databases have gained popularity as a mechanism to satisfy this need. Ontologies provide the means to formally structure descriptive and semantic relations of a domain. Graph databases allow efficient and well-adapted storage, manipulation, and consumption of these linked-data resources. However, to date, there is no universally defined strategy for building AI-oriented ontologies for the automotive sector. One of the key challenges is the lack of a worldwide standardized vocabulary. Most private initiatives and large open datasets for Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) development include their own definition of terms, with incompatible taxonomies and structures, producing a well-known lack of interoperability. This paper presents a methodology for designing and building domain ontologies as a Knowledge Organization System (KOS) using graph databases (Neo4j). The Automotive Global Ontology (AGO) is presented as a result of implementing the methodology. Two different use cases for AGO are presented to showcase its capabilities: semantic labeling and scenario-based testing. The ontology and related material are made public for subsequent use by the industry and academic communities.


Introduction
Labeling, in the context of Artificial Intelligence (AI), is the process of adding descriptive information to data; for example, collections of images, data series, or sensor measurements. Labeling is a major bottleneck for machine learning (ML) progress, because the current state of the art regarding deep learning (DL) implies creating massive training datasets composed of data samples (e.g., images and point cloud scans). These samples are labeled with descriptions that are learnt by the DL model (e.g., labels with the class name, or the bounding box or shape of objects). As a consequence, better models are typically obtained using larger and carefully crafted datasets, which encompass, as much as possible, all the potential data variability of the domain of interest (e.g., all possible configurations of complex driving scenes).
As a result, business models for labeling have emerged, in which label producers offer services to create large datasets from raw data recordings. As an example, from 2018 to 2020, almost 20

State of the Art
Different works in the transportation field have identified the importance of domain-knowledge structures for different purposes: to assess traffic scenes in real-time applications [3], to provide automatic support for design and analysis of performance monitoring systems for Public Transport Systems [4], or to infer knowledge to aid test management [5].
Several ontologies in the transportation domain can be found in the literature, e.g., the Ontology of Traffic Networks (OTN) [6], which is summarized as a direct encoding of geographic data files (GDF) in OWL, and the Transport Disruption ontology [7], which provides a formal framework for modeling travel planning-related events.
In recent years, the focus has been on knowledge-based approaches representing scenarios with the purpose of promoting the scenario-based evaluation of ADASs and AD. Ontologies have become a key component for formalizing this knowledge. An event-based scenario description for testing was presented [8] based on the three abstraction levels for scenario description: functional, logical, and concrete scenarios, as described in [9] and adopted by the Pegasus project (https://www.pegasusprojekt.de/en/ accessed on 5 April 2021). The strategy was enhanced with a procedure that enabled qualitative description for the generation of more concrete scenarios. This scenario-based testing strategy entails the development of a structured knowledge and formal scenario representation language for driving simulation environments, such as, OpenSCENARIO (https://www.asam.net/project-detail/asam-openscenario-v1x/ accessed on 4 June 2021), CommonRoad (https://commonroad.in.tum.de/ accessed on 4 June 2021), or the Safety Pool's scenario description language (https://www.safetypool.ai/ accessed on 20 June 2021).
Despite these advances, there is currently a lack of an open knowledge base in the automotive domain that covers the needs of the testing and labeling applications. Therefore, one of the aims of this paper is to present AGO. This was built with the purpose of formalizing the terminology used for representing automotive scenarios and providing the required knowledge layer to support semantic-labeling tools.
A small number of studies have been published related to ontology engineering methodologies that present design and construction principles. However, to the best of our knowledge, there is not yet a standardized approach or any formal requirement other than the ontology languages defined by the W3C group [10,11] to define an ontology. In the following, a selection of existing works is discussed. METHONTOLOGY [12], On-To-Knowledge Methodology (OTKM) [13], and DILIGENT [14] constitute the basis for many subsequent proposals. For instance, NeOn [2] emphasizes the reuse of existing resources for building a collaborative ontology rather than starting from scratch, such as in METHONTOLOGY [12].
In relation to the reusable resources related to the automotive domain, in 2018 a survey of existing ontologies for transportation was carried out [15] in which several approaches were studied and compared. Among these, the ontology for road traffic management [16] presents bidirectional axioms ("doesAction" and "isActionDoneBy") that relate classes to driving actions. Other examples also consider attributes by introducing additional axioms into the ontology.
UPON [17] was published in 2005 as a proposal that takes advantage of the Unified Software Development Process and the Unified Modeling Language (UML). The proposed methodology is based on the semantic languages created by the World Wide Web Consortium (W3C), RDF and OWL (see Section 5.2 for further information). These XML-based syntaxes allow the representation of knowledge as triples, i.e., three-entity statements in the form of subject-predicate-object expressions. This atomic structure forms a directed graph; hence, ontologies can be defined as graphs, in which each class is a node (vertex) and is connected to other classes via properties or relations (edges). Some studies have considered the use of graph databases to implement RDF stores [18] or to build an ontology for an automated vehicle's context model [19].
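The correspondence between triples and a directed graph can be illustrated with a minimal Python sketch. The class names and the helper functions (build_graph, ancestors) are hypothetical and chosen only for illustration; they are not part of AGO or of any RDF library.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object). The names are illustrative only.
triples = [
    ("Car", "rdfs:subClassOf", "Vehicle"),
    ("Vehicle", "rdfs:subClassOf", "Object"),
    ("Lane", "isPartOf", "Road"),
]


def build_graph(triples):
    """Return an adjacency map: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
        graph[obj]  # touching the key makes sure the target exists as a node
    return dict(graph)


def ancestors(graph, node, predicate="rdfs:subClassOf"):
    """Follow edges carrying the given label transitively, starting at node."""
    found, seen, stack = [], set(), [node]
    while stack:
        current = stack.pop()
        for label, target in graph.get(current, []):
            if label == predicate and target not in seen:
                seen.add(target)
                found.append(target)
                stack.append(target)
    return found


graph = build_graph(triples)
print(ancestors(graph, "Car"))  # ['Vehicle', 'Object']
```

Each subject and object becomes a vertex and each predicate a labeled, directed edge, so hierarchy questions reduce to graph traversal.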
However, there is not yet a de jure standard for the construction or design of ontologies in the automotive domain. Furthermore, the proposed methodology bases the construction and representation of the ontologies on graph databases. Among the analyzed references, to the best of our knowledge, sound developments based on graphs do not exist. Thus, there is no dominant, standardized methodology based on these databases. In this work, we used Neo4j as the graph database to host the ontology and the Cypher query language to interoperate with it. This database deploys the ontology as a database resource, fostering its utilization to address the new challenges of the labeling industry, such as global networking (e.g., using the Bolt network protocol for client-server communication), Big Data, advanced algorithms (e.g., pathfinding), and visualization applications.

Terminology
In this work, the following definitions were adopted:
• Action: A class understood to be a situation with a semantic meaning, happening in the scene, typically related to Objects, which are either the subjects or the objects of the Action. Actions occur during a specific time interval (frames). It is necessary to distinguish between intransitive and transitive actions because they are semantically different.
• Intransitive action: Expresses the status of Objects, and thus can be expressed as adjectives or verbs in the present continuous form: "the car is parked". In this example, the object is not specified, but the "Car" is known to be the subject and "Parked" is the predicate of the sentence.
• Transitive action: Can be naturally treated as triples, where there is a subject, a predicate, and an object. For example, "a child is running in the park", where "Child" is the subject, "Park" is the object, and "Running" is the predicate or the Action.
• Attribute: A quality or feature of a class element or axiom of the ontology.
• Axiom (Relationship): Statements that are asserted to be true in the domain being described [20]. They structure the ontology and provide semantic information. They are represented as relationships in the graph database.
• Class: Concept of the domain represented as a node in the graph database.
• Context: A class for elements that describe the general situation and circumstances of the scenario. Contextualizing can involve any aspect that helps the user or application define the surroundings and general conditions of the scenario.
• Event: A class to represent anything that happens in an instant of time (frame). Therefore, any instantaneous change of state caused by an Object can be defined as an Event. These changes usually cause a new occurrence and, depending on the duration, this can be defined as a new Event or an Action.
• Individual: Instance of an ontology class. In the case of automotive scenarios, a named class should be assigned to each individual of any scene. Hence, individuals of a class are defined by a unique identifier (UID) and a unique name that is specific to each analyzed case.
• Object: A class that represents anything tangible, i.e., a person or thing. Objects are the main elements of the ontology and can be related to attributes or actions.
• Ontology: Formal description of concepts (Class) and their relations (Axioms) according to a common understanding of experts in the domain. The definition of these elements can be completed with properties or restrictions.
• Scenario: A quantitative and qualitative description of the situation (e.g., traffic environment), as the sequence of Actions and Events performed by Objects.
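The distinction between Individuals, Actions, and Events in the definitions above can be sketched as simple Python data structures. This is only an illustration under stated assumptions: the type names (Individual, Occurrence), the helper make_occurrence, and the example class "StartsBraking" are hypothetical and do not come from AGO, VCD, or OpenLABEL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Individual:
    """Instance of an ontology class, identified by a UID and a unique name."""
    uid: str
    name: str
    class_name: str


@dataclass(frozen=True)
class Occurrence:
    """An Action (frame interval) or an Event (single frame)."""
    kind: str                         # "Action" or "Event"
    class_name: str                   # e.g., "Parked" or "Running"
    frame_start: int
    frame_end: int
    subject: Individual
    obj: Optional[Individual] = None  # only present for transitive occurrences

    @property
    def is_transitive(self) -> bool:
        return self.obj is not None


def make_occurrence(class_name, frame_start, frame_end, subject, obj=None):
    """Assumption: an occurrence covering a single frame is treated as an Event."""
    if frame_end < frame_start:
        raise ValueError("frame_end must not precede frame_start")
    kind = "Event" if frame_start == frame_end else "Action"
    return Occurrence(kind, class_name, frame_start, frame_end, subject, obj)


car = Individual(uid="0", name="car_1", class_name="Car")
parked = make_occurrence("Parked", 0, 120, car)           # intransitive Action
braking = make_occurrence("StartsBraking", 45, 45, car)   # instantaneous Event
print(parked.kind, parked.is_transitive, braking.kind)
```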

The AGO Domain Model
AGO aims to cover the main elements required to support semantic labeling and the description of automotive scenarios for testing environments. Hence, the core concepts defined in the ontology correspond to those used to structure the information in VCD and OpenLABEL. Its high-level structure can be seen in Figure 1. The main superclasses of the ontology are Object, Context, Action, and Event. These elements form the first level of classes and all other classes are derived from them. Furthermore, in VCD and OpenLABEL, the Relation element is required to structure the domain knowledge and semantically enrich the ontology. The RDF language model [10] defines several axioms for describing properties and relationships among named terms: "rdfs:domain", "rdfs:range", "rdfs:subClassOf". Additionally, some non-standard relations are proposed in AGO for the purpose of covering the needs of the description of automotive scenarios.

• "isSubjectOfAction" and "isObjectOfAction": these axioms are defined for actions expressed with transitive verbs. Transitive actions can be naturally treated as triples from a language perspective. Nevertheless, transitive triples do not allow relating the action with other nodes of the ontology (because the action in a transitive triple is the predicate or relation between subject and object, and thus it is not a class). Therefore, transitive actions are unwrapped as two related RDF triples, where the action is a class and the relations are "isSubjectOfAction" and "isObjectOfAction". In the case of intransitive actions, a single triple is generated relating the element that performs the action and the Action itself.
• "isSubjectOfEvent" and "isObjectOfEvent": as for Actions, Events can also be distinguished as either transitive or intransitive. Therefore, the same type of relations is defined in AGO for these elements. One describes "who" performs the Event and the other "who/what" is affected by it.
• "causes": Events are occurrences that happen in a time instant and usually trigger another Event or Action. Therefore, this axiom is adopted to relate Events with Actions and represent this effect.
To provide the ontology with the capability of representing spatio-temporal information by relating the different classes, two additions were made. First, Allen's temporal relations [21] were adopted (e.g., "meets" as a relation used to define the timeline of the scenario by relating Actions and Events in temporal order).
Second, the following relations were adopted for defining the spatial relations among the elements (e.g., required to construct a complete description of the road network):
• "isPartOf": describes a spatial relation between the different Objects (e.g., "Lane"-"isPartOf" -> "Road").
• "isConnectedTo": links the static Objects that form the network of objects. Concatenating these static objects (e.g., "Road"-"isConnectedTo" -> "Intersection") is required to formally represent the road network in OpenDRIVE (https://www.asam.net/standards/detail/opendrive/ accessed on 13 May 2021) format.
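The unwrapping of transitive actions and the "meets" temporal relation described above can be sketched in Python as follows. This is a minimal illustration, not the AGO implementation: the function names are hypothetical, the triple direction (element, relation, Action) is inferred from the relation names, and frame intervals are assumed to be half-open, i.e., [start, end).

```python
def unwrap_action(action, subject, obj=None):
    """Turn an action into triples in which the Action itself is a node.

    A transitive action yields two triples; an intransitive one yields one.
    """
    triples = [(subject, "isSubjectOfAction", action)]
    if obj is not None:
        triples.append((obj, "isObjectOfAction", action))
    return triples


def meets(first, second):
    """Allen's 'meets': the first interval ends exactly where the second starts.

    Intervals are (start, end) frame pairs, assumed half-open for this sketch.
    """
    return first[1] == second[0]


# "A child is running in the park" (transitive) and "the car is parked" (intransitive).
print(unwrap_action("Running", "Child", "Park"))
print(unwrap_action("Parked", "Car"))

# Two consecutive actions on a timeline: frames [0, 40) followed by frames [40, 90).
print(meets((0, 40), (40, 90)))
```

Because the Action is a node rather than an edge, further relations such as "causes" or "meets" can be attached to it, which a plain subject-predicate-object triple would not allow.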

For the representation of the knowledge, in the OWL Reference documentation, the W3C states that an OWL ontology is an RDF graph, which is in turn a set of RDF triples [11] (subject-predicate -> object). Therefore, some notations from the data-modeling vocabulary defined by the RDF Schema and the OWL language principles were adopted for the purpose of this work [10]:

• Classes are identified by an Internationalized Resource Identifier (IRI). In addition, each Class is represented by a lexically meaningful Uniform Resource Identifier (URI) that is unique for each entity.
• The RDF/XML document includes an ontology header with the defined base URI.
• The set of all individuals is defined by the class extension of "owl:Thing".
All listed characteristics are automatically considered by Protégé when including new Class entities in the hierarchy. To complete the description of the elements, a description and a label are included as annotations, which results in a class representation, as depicted in Figure 2.

Methodology: AGO Construction
This section presents the knowledge acquired during the process of constructing AGO. Figure 3 shows the pipeline of the methodology followed for the construction of the automotive domain ontology.

First Phase: Definition of the Scope and Knowledge Acquisition
The first stage is an analysis phase that aims to establish the concrete scope of the use case. This determines which objects are included in the taxonomy of classes. This step requires deep knowledge of the domain of interest and can be the most expensive phase in terms of person-hours. In this case, several large-scale open datasets for the development of AD technologies and the Safety Pool taxonomy were analyzed to acquire knowledge and define the elements of the ontology. The number of elements is summarized in Table 1. The goal was to define generic classes that cover as many elements of the analyzed resources as possible to enhance interoperability, while maintaining a sufficient level of detail for the applications. Thus, the desired output was a domain-specific, hierarchically structured knowledge graph. In general, these datasets do not define classes in a hierarchy but often constitute a one-level list. In addition, only two of the listed datasets cover a few classes related to maneuvers (Action classes in the AGO domain), and in those cases actions are presented as special attributes of the corresponding objects. Due to the manner in which they are defined, these datasets do not consider the need to distinguish between Actions and Events. Moreover, they do not present semantically relevant relations among these elements and the objects. Taking into account that most of these datasets do not consider traffic maneuvers, the "ALKS Regulation UN R157" (https://undocs.org/ECE/TRANS/WP.29/2020/81 accessed on 20 July 2021) was included in AGO so that the ontology gathers the critical scenarios listed in the regulation as Actions, along with some additional Object classes and properties. As a result, AGO is an ontology that offers flexibility for present and future labeling needs, supported by the established mechanism to connect Objects, Actions, and Events through Relations.
AGO also includes, in its current form, a rich set of object classes, which are easily extensible, and an equivalently wide number of actions and maneuvers that can serve as a basis for multiple applications in the domain (e.g., test coverage analysis, online data recording, and digital twins).

Technologies and Tool
The Semantic Web focuses on enabling machines with the capabilities of providing formal structured information using the encoded knowledge from ontologies [22]. There are several languages that enable formalizing and encoding knowledge. For the purpose of this work, the chosen ontology representation language was the Web Ontology Language (OWL) [11]. OWL was developed as a vocabulary extension of the Resource Description Framework (RDF) specification. This linking structure forms a directed, labeled graph, in which the edges represent the named link between two resources, represented by the graph nodes [5].
There are several tools specifically designed to create and manage ontologies: Protégé, Protégé Web, Fluent Editor (FE), OWLGrEd, etc. For this approach, the ontology was defined with Protégé (https://protege.stanford.edu/ accessed on 11 March 2021) because this tool enables intuitive construction of complete ontologies. Similarly, the ontology can be exported into different ontology representation languages, including RDF and OWL. This tool also provides the user with a Taxonomy Tree viewer (see Figure 4) and other visualization tools, such as OWLViz, which generates a diagram of the selected elements in different layouts.
This tool also provides the means to include Annotation properties, which comprise ontology metadata and general properties of the classes. For the metadata, the versionInfo (owl:versionInfo) is defined as a built-in annotation property of OWL [11]. Furthermore, some of the Dublin Core Metadata Initiative (DCMI) [23] terms are adopted: contributor, creator, description, title, date submitted, license, and publisher. In addition, to complete the description of the classes, a brief description and a label are included as defined by the RDF schema [10]: rdfs:comment and rdfs:label.

Ontology Construction
The AGO domain ontology was constructed with the common classes identified among the datasets in the first phase, and the analysis was undertaken with the objective of identifying the main terms and the related concepts. Thus, this phase involved the following tasks:
• Identify the common classes among the datasets;
• Define upper-level classes for logically structuring the hierarchy;
• Include every class defined in the datasets (manage synonyms).
The defined ontology provides a shared understanding of common concepts (Classes) among the main automotive open datasets. Therefore, it can be considered to be a domain ontology. The general classification of the classes was conducted according to the element types defined in the AGO domain model description (Section 4). Hence, the upper level consists of four core concepts, as depicted in Figure 1. In addition, it was constructed in a top-down manner using the Containment Relationship Type [24] ("rdfs:subClassOf") and according to the hierarchy defined in the datasets. Consequently, most of the included classes appear in one or more of the analyzed datasets. After this process, the hierarchy of classes evolves from the simpler tree or plain lists defined in the source datasets into a more complex graph structure in which each class may have several parent nodes.

Knowledge Representation Language and Knowledge Graph Structure
The proposed structure of AGO is a directed graph because this configuration provides the means of representing high-order semantic relations as RDF triples with the basic elements of this type of non-SQL database. In terms of the triples, the subject and object are represented by nodes labelled "Class". Data properties (owl:DatatypeProperty) are also included as nodes under the "dataProperty" label. These serve as attributes of the classes according to the OWL specification. To semantically cover a traffic scenario, several complex relationships are required to link different elements. The user-defined axioms presented in Section 4 are included as object properties (owl:ObjectProperty) which, according to the OWL (RDF/XML) Structural Specification [20], is the construct defined to connect pairs of individuals with user-defined relationships. Therefore, the graph is populated with them as nodes with the "objectProperty" label to represent the predicate of the triples. Both types of property node are related to the class nodes using the "DOMAIN" and "RANGE" edges. This explanation is illustrated in the graph snippet of Figure 5.
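The node and edge layout described above can be mimicked with a small in-memory Python sketch. It assumes that "DOMAIN" and "RANGE" edges point from the property node to the class node; the example nodes and the helper names (triple_patterns, attributes_of) are hypothetical and only illustrate the structure, not the actual content of AGO.

```python
# Node name -> label, following the three node labels described in the text.
nodes = {
    "Object": "Class",
    "Action": "Class",
    "isSubjectOfAction": "objectProperty",
    "color": "dataProperty",
}

# Directed edges as (source, type, target). Assumed direction: property -> class.
edges = [
    ("isSubjectOfAction", "DOMAIN", "Object"),
    ("isSubjectOfAction", "RANGE", "Action"),
    ("color", "DOMAIN", "Object"),
]


def triple_patterns(nodes, edges):
    """List (domain class, object property, range class) patterns in the graph."""
    patterns = []
    for name, label in nodes.items():
        if label != "objectProperty":
            continue
        domains = [dst for src, kind, dst in edges if src == name and kind == "DOMAIN"]
        ranges = [dst for src, kind, dst in edges if src == name and kind == "RANGE"]
        patterns.extend((d, name, r) for d in domains for r in ranges)
    return patterns


def attributes_of(class_name, nodes, edges):
    """List the dataProperty nodes whose DOMAIN edge points to the given class."""
    return [
        src for src, kind, dst in edges
        if kind == "DOMAIN" and dst == class_name and nodes[src] == "dataProperty"
    ]


print(triple_patterns(nodes, edges))
print(attributes_of("Object", nodes, edges))
```

Modeling properties as nodes, rather than as plain edges between classes, is what allows each predicate to carry its own domain and range information.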

Third Phase: Database Building in Neo4j
The third stage implies building the knowledge-graph database and making the ontology available for its use via programmatic interfaces. For this purpose, a database to store the data was selected. In this case, Neo4j (https://neo4j.com/neo4j-graph-database/ accessed on 24 July 2021) neo4j-community-4.2.0) was chosen as the database for storing the ontology. To represent the ontology as a graph in Neo4j following the standards, the neosemantics plugin (n10s) (https://github.com/neo4j-labs/neosemantics accessed on 4 August 2021) is required. This enables importing and exporting the ontology graph from and into OWL files for further use. Using n10s Release 4.1.0, a Cypher query can be used to import the OWL ontology from a local directory or a URL. In addition, the query can be passed with some specific parameters that determine the names given to the elements in the database. The specific query used to structure the ontology is presented in Error! Reference source not found.. The imported data are further processed by including specific Neo4j labels to classify the nodes according to the core concepts. Grouping nodes with tags related to the first level of classes in the hierarchy (i.e., Action, Object, etc.) helps optimization of the queries

Third Phase: Database Building in Neo4j
The third stage implies building the knowledge-graph database and making the ontology available for its use via programmatic interfaces. For this purpose, a database to store the data was selected. In this case, Neo4j (https://neo4j.com/neo4j-graph-database/ accessed on 24 July 2021) neo4j-community-4.2.0) was chosen as the database for storing the ontology. To represent the ontology as a graph in Neo4j following the standards, the neosemantics plugin (n10s) (https://github.com/neo4j-labs/neosemantics accessed on 4 August 2021) is required. This enables importing and exporting the ontology graph from and into OWL files for further use. Using n10s Release 4.1.0, a Cypher query can be used to import the OWL ontology from a local directory or a URL. In addition, the query can be passed with some specific parameters that determine the names given to the elements in the database. The specific query used to structure the ontology is presented in Figure 6. The proposed structure of AGO is a directed graph because this configuration provides the means of representing high-order semantic relations as RDF triples with basic elements of this non-SQL datasets. In terms of the triples, the subject and object are represented by "Class" labelled nodes. Data properties (owl:DataProperty) are also included as nodes under the "dataProperty" label. These serve as attributes to the classes according to the OWL specification. To semantically cover a traffic scenario, several complex relationships are required to link different elements. The user-defined axioms presented in Section Error! Reference source not found. are included as object properties (owl:Ob-jectProperty) which, according to the OWL (RDF/XML) Structural Specification [20], is the construct defined to connect pairs of individuals with user-defined relationships. Therefore, the graph is populated with them as nodes with the "objectProperty" label to represent the predicate of the triples. 
Both types of property nodes are connected to the class nodes using the "DOMAIN" and "RANGE" edges. This structure is illustrated in the graph snippet of Figure 5.
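The import step described above can be sketched in Python as follows. This is a minimal illustration, not the query shown in Figure 6: the configuration keys (classLabel, dataTypePropertyLabel, objectPropertyLabel, domainRel, rangeRel) are assumed from the n10s 4.x documentation, the constraint name is arbitrary, and the connection details are placeholders. The statements are built as plain data so that they can be inspected without a running server.

```python
"""Sketch: import an OWL ontology into Neo4j through the neosemantics (n10s) plugin."""

# Assumed naming scheme, chosen to match the labels and edges described in the text.
GRAPH_CONFIG = {
    "classLabel": "Class",
    "dataTypePropertyLabel": "dataProperty",
    "objectPropertyLabel": "objectProperty",
    "domainRel": "DOMAIN",
    "rangeRel": "RANGE",
}

# n10s expects a uniqueness constraint on Resource.uri (Neo4j 4.x syntax).
CONSTRAINT_QUERY = (
    "CREATE CONSTRAINT n10s_unique_uri ON (r:Resource) ASSERT r.uri IS UNIQUE"
)
INIT_QUERY = "CALL n10s.graphconfig.init($config)"
IMPORT_QUERY = "CALL n10s.onto.import.fetch($url, $rdf_format)"


def build_import_statements(url, rdf_format="RDF/XML", config=None):
    """Return (query, parameters) pairs in the order they should be executed."""
    return [
        (CONSTRAINT_QUERY, {}),
        (INIT_QUERY, {"config": dict(config or GRAPH_CONFIG)}),
        (IMPORT_QUERY, {"url": url, "rdf_format": rdf_format}),
    ]


def run_import(bolt_uri, user, password, url):
    """Execute the statements; needs the `neo4j` Python driver and a running server."""
    from neo4j import GraphDatabase  # lazy import: the sketch loads without the driver

    driver = GraphDatabase.driver(bolt_uri, auth=(user, password))
    try:
        with driver.session() as session:
            for query, params in build_import_statements(url):
                session.run(query, params).consume()
    finally:
        driver.close()
```

Calling run_import with a Bolt URI, credentials, and the URL or file path of the OWL file would execute the three statements in order; only build_import_statements is exercised when no database is available.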

The imported data are further processed by adding specific Neo4j labels that classify the nodes according to the core concepts. Grouping nodes with tags related to the first level of classes in the hierarchy (i.e., Action, Object, etc.) helps to optimize queries to the ontology because the lookup is performed over a smaller subset of nodes. Consequently, ontology-based applications developed in later works should benefit from faster ontology lookups.
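As a rough illustration of the labelling step, the sketch below generates one Cypher statement per core concept. The relationship type "SCO" and the property key "name" are assumed n10s defaults and may differ in the actual AGO database; because Cypher does not accept labels as parameters, the label is interpolated into the query text after a strict identifier check.

```python
import re

# Core concepts named in the paper; used here as Neo4j labels.
CORE_CONCEPTS = ("Action", "Context", "Event", "Object")

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_label_query(concept, sub_class_rel="SCO", name_key="name"):
    """Build a Cypher query that tags a root class and all its descendants.

    `sub_class_rel` and `name_key` are assumptions about how the import
    named the subclass relationship and the class-name property.
    """
    for token in (concept, sub_class_rel, name_key):
        if not _IDENTIFIER.fullmatch(token):
            raise ValueError(f"unsafe identifier: {token!r}")
    return (
        f"MATCH (c:Class)-[:{sub_class_rel}*0..]->(root:Class) "
        f"WHERE root.{name_key} = $root "
        f"SET c:{concept} "
        "RETURN count(c) AS labelled"
    )


# One (query, parameters) pair per core concept.
LABEL_STATEMENTS = [(build_label_query(c), {"root": c}) for c in CORE_CONCEPTS]
```

A later query such as MATCH (c:Object) then scans only the nodes carrying that label instead of every "Class" node.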
The main characteristics of the resulting graph are summarized in Table 2. AGO has a total of 523 nodes and 1365 relationships. Some of the class nodes have related properties to provide the user with complete semantic information about each element. These are known as property keys in Neo4j. When including the elements in Neo4j, the features of each element are added as node properties. Table 3 summarizes the included property keys with the related OWL syntax equivalence. Among them, "Datatypes" represents the type of data that each dataProperty element should have, which is related to the properties of the classes. For example, the "color" attribute should be a string, whereas "height" should be an integer.
The taxonomy tree for the main classes in Protégé and OpenLABEL differs because the relations are not defined as a Class type in the graph representation. The user-defined axioms are defined as object property nodes and connected with the required elements with the "DOMAIN" and "RANGE" relationships to build the RDF triples. Hence, the subject of the triple is represented by the domain and the object by the range terms. Figure 5 represents the "Lane"-"isPartOf" -> "Road" object property triple as a graph. In addition, Figure 5 also depicts how the "curvature" attribute is related to the "Road" class following the data property syntax definition.
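The way triples are recovered from the "DOMAIN" and "RANGE" edges can be sketched as follows. The edge list is a hand-written stand-in for a query result, and the edge direction (from the property node to the class node) is an assumption about the imported graph.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # property node
    relation: str  # "DOMAIN" or "RANGE"
    target: str    # class node


# Toy excerpt based on the "Lane"-"isPartOf"->"Road" example and the
# "curvature" data property of "Road".
EDGES = [
    Edge("isPartOf", "DOMAIN", "Lane"),
    Edge("isPartOf", "RANGE", "Road"),
    Edge("curvature", "DOMAIN", "Road"),
]


def extract_triples(edges):
    """Combine DOMAIN (subject) and RANGE (object) edges into triples."""
    domains, ranges = {}, {}
    for edge in edges:
        if edge.relation == "DOMAIN":
            domains.setdefault(edge.source, []).append(edge.target)
        elif edge.relation == "RANGE":
            ranges.setdefault(edge.source, []).append(edge.target)
    return [
        (subject, prop, obj)
        for prop in domains
        if prop in ranges
        for subject in domains[prop]
        for obj in ranges[prop]
    ]
```

In this toy data the data property "curvature" has only a domain edge, so it yields no class-to-class triple; it would instead be read as an attribute of "Road".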

Use Cases
This section summarizes two different but related use cases in the automotive domain. The first proposes the utilization of the ontology to guide the creation of configuration files for semantic labeling applications. The second relates to the creation of a database of graphical scenario representations from real data labels and expert knowledge.

Semantic Labeling
Labeling is the process of creating descriptions of the content of some data. For images or other sensorial data, labels are typically spatio-temporal entities that determine the presence of objects or actions in the reality captured by the sensors [1]. With the emergence of DL, labeling has become a major activity for automakers and providers of electronics. With DL, a sufficiently large and rich dataset containing sensorial data and labels can be used to train models that learn from the dataset and predict labels on previously unseen data. This ability has triggered the creation of many ADAS and AD functions.
As a consequence, datasets have become a critical asset for players in the automotive market. Labels are frequently defined using function-specific or customized taxonomies and lack a global or universal hierarchy. Datasets are usually incompatible and difficult to merge due to semantic inconsistencies between the terms used.
In this sense, AGO, as a domain ontology, can serve as the core of a data translation function that maps relations between (otherwise non-interoperable) datasets. During the development of the work, all the datasets were represented graphically, and it was verified that the labels of the elements can be mapped to their respective synonyms in the ontology. This translation function requires the definition of advanced search queries. The result may be used to feed training models that need to be adapted for new tests that use different datasets from those used for training. A diagram of the process is presented in Figure 7.

Figure 7. Using the AGO ontology to translate heterogeneous labels (e.g., from different datasets).
As an example of the above-mentioned process, Figure 8 depicts a graphical representation of the classes defined in the "Waymo" dataset mapped to their equivalent AGO classes. The equivalence among elements is defined by the "owl:sameAs" relationship, which means that one can be replaced by the other without altering the meaning and vice versa. Matched classes represent the common terms among the different datasets and, thus, the translation can be undertaken by identifying these equivalences. This type of application allows the relations among different datasets to be identified and the terms to be translated into the required terminology. The translation can be performed automatically or manually, with user interfaces showing a graph visualization and navigation capabilities.
Figure 8. Waymo dataset mapping to AGO example.
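The translation through equivalences can be sketched with the ontology acting as a pivot vocabulary. The mapping below is purely illustrative: the dataset label strings and the AGO class names are assumptions, not the mapping shown in Figure 8.

```python
# Hypothetical "sameAs" links: dataset label -> AGO class name.
SAME_AS = {
    "waymo": {
        "TYPE_PEDESTRIAN": "Pedestrian",
        "TYPE_CYCLIST": "Cyclist",
        "TYPE_SIGN": "TrafficSign",
    },
    "kitti": {
        "Pedestrian": "Pedestrian",
        "Cyclist": "Cyclist",
    },
}


def translate(label, source, target, same_as=SAME_AS):
    """Translate a label between datasets via the shared ontology class.

    Returns None when either dataset has no equivalence for the concept.
    """
    ontology_class = same_as[source].get(label)
    if ontology_class is None:
        return None
    for target_label, target_class in same_as[target].items():
        if target_class == ontology_class:
            return target_label
    return None
```

A label without a counterpart in the target dataset returns None, which is the case a manual, graph-assisted review would need to resolve.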
Furthermore, the ontology can be part of a knowledge management system and used before the labels are created, at the annotation or labeling stage (see Figure 9). Specifically, the ontology can serve as a database with a formal definition of a unified understanding of the terminology related to the use case. In this manner, the database can be used for generating structure or configuration files that can guide a labeling application or process to produce labels not only in the expected format (e.g., OpenLABEL), but also semantically compatible with the ontology, as depicted in Figure 9. In relation to these files, both the structure and the terms defined using natural language can be validated with a content-checker application. Thus, even if there are different users working with the tool or defining several use cases, the terms are chosen from the ontology and remain the same for every use case. For instance, in the case of web-based annotation tools [25], their functionality could be boosted by including relations among the annotated objects. In addition, the previous annotations and configuration files may be checked according to the vocabulary defined in the ontology, enhancing interoperability by ensuring a common understanding of the domain concepts.
Figure 9. Using AGO to specify taxonomies and configuration files before the labeling stage to produce semantically compliant content.
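A content checker of the kind mentioned above could, in its simplest form, compare the terms used in an annotation or configuration file against the ontology vocabulary. The sketch below is one possible approach, not the tool used by the authors; the vocabulary excerpt is hypothetical, and suggestions are computed with the standard-library difflib module.

```python
import difflib

# Hypothetical excerpt of the ontology vocabulary (class names).
AGO_TERMS = ["Car", "Lane", "Pedestrian", "Road"]


def check_terms(terms, vocabulary, max_suggestions=1):
    """Return {unknown term: [closest vocabulary terms]} for invalid terms."""
    vocab = sorted(set(vocabulary))
    report = {}
    for term in terms:
        if term not in vocab:
            report[term] = difflib.get_close_matches(
                term, vocab, n=max_suggestions
            )
    return report
```

An empty report means that every term belongs to the vocabulary; a misspelled term is returned together with its closest valid candidates, if any.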

Graphical Scenario Representation
Scenario representation is also a key aspect of ADAS/AD development and testing. Rich and realistic scenario representations can lead to the generation of simulated environments that can be utilized in virtual testing procedures (e.g., Hardware-in-the-Loop simulations). Scenarios may include descriptions of the participants of a road or driving scene, including their interactions, spatio-temporal relations, etc.
Scenarios can be generated from expert knowledge, i.e., from high-level descriptions of how the situation should be, or from real data, i.e., from semantic labels obtained from annotation processes.
In this use case, both approaches were implemented, creating synthetic scenarios from the ALKS regulation, and real scenarios from the KITTI dataset (http://www.cvlibs.net/datasets/kitti/eval_tracking.php accessed on 4 June 2021). These scenarios were created by first using the VCD toolkit to create the RDF entries in the OpenLABEL format, which is flexible enough to host high-level actions and relations (for the ALKS scenario), and to abstract high-level information from detailed labels (for the KITTI annotations).
Second, these VCD payloads were imported into Neo4j to build a scenario database. In the graph, each scenario is represented by a primary node with the following metadata:
• cnl_text: textual description of the scenario in a Controlled Natural Language (CNL);
• date_db: date and time of the latest update of the node;
• scenario_uid: identifier composed of the information source and a numerical ID;
• schema_version: the VCD version used to represent the imported information.
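The metadata listed above can be assembled as a property map before it is written to the primary scenario node. In the sketch below, the "Scenario" label, the identifier format (source, underscore, zero-padded number), and the ISO 8601 timestamp are illustrative assumptions rather than the conventions used in the published scripts.

```python
from datetime import datetime, timezone

# Upsert the primary scenario node; the "Scenario" label is hypothetical.
MERGE_QUERY = (
    "MERGE (s:Scenario {scenario_uid: $props.scenario_uid}) "
    "SET s += $props"
)


def scenario_properties(source, numeric_id, cnl_text, schema_version, now=None):
    """Build the property map for a primary scenario node."""
    timestamp = now or datetime.now(timezone.utc)
    return {
        "cnl_text": cnl_text,
        "date_db": timestamp.isoformat(timespec="seconds"),
        "scenario_uid": f"{source}_{numeric_id:04d}",  # assumed format
        "schema_version": schema_version,
    }
```

The resulting dictionary would be passed as the $props parameter of MERGE_QUERY, so that re-importing a scenario updates the existing node instead of duplicating it.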
The working example presented in this section is represented by the schematic diagram in Figure 10. Additionally, the scenario is graphically depicted in Figure 11. The depicted example includes some individuals, and these nodes have the following information included as node properties:
• frame_intervals: the start and end frames for each node;
• name: the name of the individual in CamelCase, given as the class name plus a number used to list the individuals of the same type;
• type: the name of the corresponding ontology class.
Continuing with the scenario representation illustrated in Figure 11, the upper nodes represent the core static elements that correspond to the upper items of the VCD JSON schema. Hence, the metadata information is included as the node properties of the center node of the representation.
The individuals are presented with the semantically meaningful relations among them. These relations can be easily translated into the form of RDF triples, taking into account the pre-defined "subClassOf" containment relationship.
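The node properties of the individuals (frame_intervals, name, type) can be illustrated with a small helper that numbers individuals per class. The layout of frame_intervals as a list of frame_start/frame_end pairs is an assumption modelled on the VCD format, and the class names are examples only.

```python
from collections import defaultdict


class IndividualNamer:
    """Create node properties for individuals, numbering them per class."""

    def __init__(self):
        self._counts = defaultdict(int)

    def new_individual(self, ontology_class, frame_start, frame_end):
        """Return the properties of the next individual of `ontology_class`."""
        self._counts[ontology_class] += 1
        return {
            "name": f"{ontology_class}{self._counts[ontology_class]}",
            "type": ontology_class,
            # Assumed layout: one inclusive interval per entry.
            "frame_intervals": [
                {"frame_start": frame_start, "frame_end": frame_end}
            ],
        }
```

Two successive calls with the class "Lane" would produce the individuals "Lane1" and "Lane2", matching the naming used in the scenario description.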
In addition, considering the AGO domain axioms, the list of the extracted triples for the example in Figure 11 can be extended. Taking all these triples into account, a translation into a NL textual description can be easily undertaken. Both the list of triples and the NL textual description are presented in Table 4. At this stage of the pipeline, the resulting data correspond to a functional scenario [9], which can be extended to obtain a logical scenario by considering the data properties defined in the ontology and included with the "hasAttribute" relation to the scenario representation.
Figure 11. Part of the representation as a graph of the KITTI dataset example scenarios' static elements.
Table 4. RDF triples converted into NL textual description for the KITTI dataset example scenario.

NL Description
The ego-vehicle is driving straight in lane1, which is part of a single-way two-lane road. When another car passes the ego-vehicle, then it starts lane changing into the other lane of the same road, lane2.
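A template-based conversion from triples to text is one simple way to obtain a description such as the one above. The sketch below is not the authors' conversion procedure: apart from "isPartOf", the predicates and all templates are hypothetical, and the triples are hand-written for illustration rather than copied from Table 4.

```python
# Sentence templates per predicate; only "isPartOf" appears in the paper.
TEMPLATES = {
    "isPartOf": "{s} is part of {o}",
    "drivesIn": "{s} is driving in {o}",  # hypothetical predicate
    "passes": "{s} passes {o}",           # hypothetical predicate
}


def verbalize(triples, templates=TEMPLATES):
    """Turn (subject, predicate, object) triples into plain sentences."""
    sentences = []
    for subject, predicate, obj in triples:
        template = templates.get(predicate)
        if template is None:
            # Fallback: read the triple literally.
            sentence = f"{subject} {predicate} {obj}"
        else:
            sentence = template.format(s=subject, o=obj)
        sentences.append(sentence + ".")
    return " ".join(sentences)
```

For example, the triples ("EgoVehicle1", "drivesIn", "Lane1") and ("Lane1", "isPartOf", "Road1") are rendered as two short sentences; a fluent description would still require ordering the sentences and merging shared subjects.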

Results
The example ontology produced following the proposed methodology is available online (https://vcd.vicomtech.org/ontology/automotive accessed on 23 August 2021) in RDF format. The automotive ontology is composed of 390 class elements classified into four main groups using Neo4j labels: "Object", "Context", "Action" and "Event". The definition of each element is completed with annotation properties. In addition, the file contains 1367 relationships, of which 398 represent hierarchical relations among the classes (thus defining a graph hierarchy). AGO can be used as a top-level ontology and reused as a starting point to build new domain or application ontologies. Furthermore, SWRL (Semantic Web Rule Language (https://www.w3.org/Submission/SWRL/ accessed on 10 May 2021)) rules are included in the ontology file to extend the axioms of the scenarios by inferring knowledge. They can also be used to validate that the inclusion of the individuals is correct. Nevertheless, these rules are not imported into Neo4j and, therefore, their use is not explored further in this paper.
The scripts used to build the ontology and scenario databases in Neo4j and other additional material, such as ALKS and KITTI functional scenario files, can be found at the GitHub repository (https://github.com/Vicomtech/video-content-description-VCD/tree/master/ontologies accessed on 23 August 2021).

Comparison with Existing Ontologies
Different available ontologies related to the automotive domain were analyzed (summary presented in Table 5) to identify the gaps that need to be addressed for semantic labelling and scenario representation. Starting with the Transport Disruption Ontology [7] mentioned in Section 2, an exhaustive list of hierarchically classified events that may cause disturbances in traffic scenarios is presented. "Agent" is defined to be the subject of the events; however, the only objects defined as its subclasses are "Person", "Group", and "Organization". The scope of the Transport Disruption Ontology differs from the interests of AGO; consequently, this ontology does not cover the whole range of actions and objects related to traffic scenarios. Nevertheless, it may be used to extend the classes by reusing existing developments. Further, the listed "event" classes have related temporal objects based on the OWL-Time ontology, which allows distinctions to be made between occurrences that happen instantaneously and those that span a time range. This means that, although the need to distinguish between Actions and Events has been identified to some extent, the distinction is implemented implicitly rather than explicitly. Hence, some object properties defined in relation to the "time" classes work in the same manner as Allen's temporal relations.
Most of the published approaches cover completely different scopes within the automotive domain; therefore, they lack many of the classes and properties needed to fulfill the requirements of scenario representation and semantic labelling. One example is The Automotive Ontology (AUTO) [26] created by the W3C Automotive Ontology Community Group, which only covers classes related to popular cars, buses, and motorcycles. TTI Core [27], in turn, is a layered approach presented as three core ontologies for safe autonomous driving: car, control, and map ontologies. Each of these covers a minimal portion of the domain-related classes and, despite not covering space-time relationships, presents elements to relate objects to map elements. In contrast, the ontology for scenarios for the assessment of autonomous vehicles [28] is the approach that most resembles AGO because it makes a clear distinction between Action and Event classes. However, the relationships defined to relate classes are not clear because the ontology is not publicly accessible and the information about it is scarce.

Discussion
Most existing ontology building approaches in the automotive domain do not use graph-based tools as part of their pipeline. The presented methodology is based on Neo4j, a graph database that provides flexibility to easily modify and update the ontology with new information. This is a key feature when developing ontology-based approaches, because new scenarios will generate new individuals and, in turn, these elements should be updated for each case. In addition, graph-based data representation models are especially effective for the expression of highly related data, such as hierarchical classification or mappings between concepts. A graph database is also an interesting tool to provide interoperability to the ontology, because it is compatible with numerous services and languages. Therefore, it is easy to use with different applications.
In this work, the area of interest was the automotive applications of semantic labeling and scenario-based testing. Hence, a formal description of the ontology that covers as many driving scenes as possible was presented. The pipeline is based on the taxonomical structure of classes presented in the main automotive datasets, with expressivity and semantic load added via the inclusion of new relations. As a result, a reusable top-level automotive domain ontology, named AGO, was defined.
The methodology is defined with accessible tools and steps that do not require a significant technical background. The objective was to provide as many user profiles as possible with the means to construct and take advantage of ontologies.
The knowledge regarding the construction of a domain ontology presented in this paper, and the two selected use cases, has been made available to the ASAM standardization group (https://www.asam.net/ accessed on 5 June 2021) for the development of the OpenLABEL and OpenXOntology standardization projects (https://www.asam.net/project-detail/asam-openxontology/ accessed on 5 June 2021) (to appear 2021-2022), to contribute to the automotive industry and the scientific community.
Funding: This work has received funding from the European Union's H2020 research and innovation program (grant agreement no 824309, project HEADSTART).