Article

Design of an Extended DCAT-Based Metadata Schema and Data Catalog for Autonomous Vehicle Accident Investigation

by Minwook Kim, Nayeon Kim, Heesoo Kim and Tai-Jin Song *
Department of Urban Engineering, Chungbuk National University, Cheongju-si 28644, Chungcheongbuk-do, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2025, 17(24), 11237; https://doi.org/10.3390/su172411237
Submission received: 22 October 2025 / Revised: 2 December 2025 / Accepted: 9 December 2025 / Published: 15 December 2025

Abstract

Autonomous vehicle (AV) accidents introduce uncertainty in liability attribution, as responsibility is divided between humans and automated systems. The 2018 Arizona crash highlighted growing societal concerns about accountability. To address these issues, prior studies proposed investigation processes considering perception sensors, driving control systems, communication infrastructure, and cybersecurity. However, conducting such investigations requires integrating large-scale data from multiple sources, including vehicle sensors, onboard recorders, V2X communications, and road infrastructure. Raw data often lack descriptive information, limiting their use in real investigations. This study establishes a structured mapping framework linking investigation procedures, responsible entities, items, and data across accident phases. With this backdrop, an autonomous driving–specific metadata schema extending DCAT was designed, comprising 10 Classes and 76 Properties. To demonstrate its applicability, a prototype data catalog user interface (UI) was conceptualized with data discovery and visualization examples. The proposed schema strengthens accountability and interoperability by explicitly aligning responsibilities and data relationships. It enables precise event localization and effective linkage of heterogeneous data. Future work will refine the schema by incorporating DSSAD, V2X, and security log data, and develop a user-tested UI prototype as a practical support tool for AV accident investigation.

1. Introduction

Before the introduction of autonomous vehicle (AV) technology, approximately 94% of traffic accidents were attributable to human factors [1]. Therefore, unless there was clear evidence such as a vehicle defect, liability for accidents was mostly concentrated on human drivers. In March 2018, an AV undergoing a test drive in Arizona, USA, collided with a pedestrian, resulting in the first recorded fatal accident caused by an AV. According to the NTSB’s investigation, the accident’s causes ultimately came down to two factors: (1) the test driver’s inattention and (2) the functional limitations of the autonomous driving system (ADS). However, the court’s ruling placed greater responsibility on the former. Thus, the occurrence of AV accidents heightened uncertainty in assigning responsibility by dividing the liable parties into ‘human’ and ‘system’ [1]. This highlighted the need for an accident investigation system that can clearly determine the cause of accidents involving AVs and identify the responsible party [2].
Accident investigations involving AVs must comprehensively reflect various factors, including information from the vehicle’s internal perception sensors, physical vehicle control data based on driving decision outcomes, external infrastructure and communication data, and cyberattacks. The aim is to enable tracking of the entire ‘perception-decision-control’ process. Investigating such diverse factors requires systems that are more varied and sophisticated than those used in conventional traffic accident investigations. Therefore, developing systems for AV accident investigation necessitates the systematic collection and storage of diverse data, including previously unconsidered physical data on actual driving behavior and information from infrastructure and communication interactions. The diverse raw data generated and collected from vehicles, infrastructure, and other sources must be incorporated into a structured data management system that considers their relevance to accident investigation items and entities [3]. This indicates that establishing an AV accident investigation system begins with defining metadata. Metadata is data containing additional information about stored raw data, such as its content, format, and storage location, and is generally defined as ‘data about data’ [4]. Metadata attaches usable attribute information to raw data and provides a basis for managing it. Effective use of metadata additionally requires a data catalog. A data catalog can provide indexes, access methods, data descriptions, and simple visualizations for raw data based on metadata [5]. The objective of this study is to propose a standardization approach for the metadata schema and data catalog of diverse data generated and collected in autonomous driving environments. The aim is to support effective investigation of traffic accidents involving AVs, which are expected to increase sharply in the future.
To achieve this, the study first establishes a mapping structure based on the autonomous driving accident investigation process. This maps the relationships between the investigation process, entities, items, and utilized data according to the accident occurrence stage (Pre-, In-, and Post-crash). Second, a specialized metadata schema applicable to autonomous driving accident investigations is developed. This data schema applies and extends various schema techniques used in other fields to reflect the unique characteristics of autonomous driving. Finally, the defined metadata schema is utilized for a pilot implementation of a data catalog user interface. The proposed user interface implements functions such as indexing, searching, and visualization to enable efficient utilization during accident investigations.

2. Related Works

Unlike manually driven vehicles (MVs), AVs operate by perceiving and judging surrounding conditions based on various sensors [6]. This characteristic necessitates investigations distinct from traditional accident investigation procedures centered on human factors. Procedural differences imply varying data requirements during investigations, which in turn call for a systematic data management framework and metadata schema design. Therefore, the existing literature is reviewed with a focus on three themes. First, we review studies on accident investigation procedures for AVs and MVs. Second, we examine the types and characteristics of data that can be utilized in the accident investigation processes for AVs and MVs, comparing their differences. Finally, we review studies related to data catalogs and metadata schemas.

2.1. Accident Investigation Process for Traffic Accidents Involving AVs

The procedures for investigating traffic accidents vary in detail depending on each country’s legal system and agency operations, but the basic framework is generally similar. They typically consist of the following stages: accident reporting and initial response, on-site investigation, interviews with parties involved and witnesses, determination of fault, and report preparation [7,8,9,10,11,12]. Furthermore, accident investigations for MVs are categorized as either general accidents or complex accidents based on their complexity [6]. The latter includes cases involving not only simple human factors but also vehicle defects or significant loss of life. Such accidents require in-depth field investigations, data analysis, and multi-agency cooperation, focusing on comprehensively identifying various factors including the vehicle’s physical condition, road environment, and driver behavior [8,13,14]. In contrast, accident investigation procedures for AVs have not yet been institutionally established. While some reports compiled after accidents have been accumulated [15], they have not been institutionalized as a systematic procedure. In this context, some studies have proposed expanded procedures reflecting AV characteristics [1,16]. Hoque and Hasan [16] presented a digital forensic framework applicable to accident investigations, reflecting the characteristics of AVs that rely on various sensors for driving. This framework enables the verification of the confidentiality and integrity of logs collected from sensors installed in AVs. Kim et al. [1] proposed specific investigation items necessary for AV accident investigations, emphasizing that processes for vehicle functionality verification, physical investigation, digital forensics, and sensor data analysis must be included in addition to existing MV investigation procedures.
Synthesizing prior research reveals that AV accident investigation procedures are significantly more complex than those for MVs. They require the utilization of multi-source data to address new investigation items such as sensor errors and cybersecurity factors. Therefore, it is necessary to examine data usable in accident investigation procedures specialized for AVs and to establish a data management system that captures the characteristics of these data.

2.2. Data Usable for Accident Investigation

As noted in the preceding review of accident investigation procedures, investigating accidents involving AVs requires consideration of more comprehensive factors than conventional accident investigations. This calls for identifying and utilizing additional data sources. Therefore, this section examines the supplementary data required when AVs are involved in traffic accidents. Data commonly used in investigations involving both AVs and MVs include driving footage, video, and images recorded by in-vehicle devices, and accident reports [17,18,19,20]. Investigating an accident involving an AV requires additional data on relevant factors such as the vehicle control system, V2X communications, vehicle sensors, road infrastructure, and the accident reports [21,22,23,24,25,26]. Dashcam footage is a primary data source, providing a comprehensive visual context of the driving environment at the time of the accident [17]. The accident report is prepared by police dispatched to the scene and serves as a core document directly linked to accident cause analysis [19]. Recording devices within AVs are categorized as the Event Data Recorder (EDR) and the Data Storage System for Automated Driving (DSSAD). The former provides the vehicle’s mechanical driving records (speed, brake operation, engine RPM, and any other information applicable to the accident investigation) immediately before and after the accident, which are used for accident reconstruction [27]. The latter is a device installed in AVs that can continuously store driving records depending on whether the ADS is active [28]. The device is considered essential for determining the cause of AV accidents because it contains not only the mechanical records but also raw data collected from sensors such as LiDAR, radar, and cameras, along with logs related to the AV’s perception, decision-making, and control [29].
In addition, V2X information is used to compensate for the limitations of sensor perception range [30]. However, this communication network can become a potential pathway for cyberattacks. Log data must be secured to support cyberattack detection, path tracing, and mitigation [31]. Finally, road infrastructure information plays a pivotal role in accident investigation by providing details about the AV’s planning and current surrounding environment, as it supplies the operational design domain (ODD) information necessary for driving [32]. Table 1 summarizes the types and characteristics of key data that can be utilized in investigations. Each data type has distinct storage trigger mechanisms and varying temporal resolutions. Furthermore, even within the same data category, data exist in diverse formats such as images and point clouds.

2.3. Metadata Schema and Data Catalog

A data catalog is defined as a core tool that systematically manages vast data assets and supports users in efficiently exploring and utilizing data [5,49,50]. The catalogs perform functions such as providing information about raw data, indexing, and visualization based on standardized metadata. Implementing these functions begins with building metadata based on a standardized metadata schema [51]. A metadata schema refers to a set of metadata elements defined for a specific purpose [52]. Utilizing the same schema facilitates compatibility between different data catalogs [53]. A prominent example among various schemas is Data Catalog Vocabulary (DCAT) [54]. DCAT categorizes information provided through metadata into Classes (e.g., dataset and service) and Properties (attributes within each Class). DCAT is utilized as an Application Profile (AP) by public data portals in various countries and for sharing data collected in diverse fields [53,55,56,57]. These either use only the necessary Classes based on their domain or introduce and utilize additional Classes. Table 2 compares the Classes defined in each AP, including DCAT. Catalog, Dataset, and Distribution Class are common to all schemas, signifying they are essential components for building data catalogs. Conversely, Classes such as DatasetSeries, Resource, DataService, CatalogRecord, Agent, Location, and LicenseDocument are only utilized in specific schemas, reflecting the particularities of each domain [54,55,56,57]. Notably, DCAT-Trans introduced the Taxonomy Class, emphasizing the importance of managing subject classification systems for transportation data [53]. APs combine commonly required Classes with domain-specific Classes to meet the needs of each field.
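The relationship among the three Classes shared by all of the compared schemas (Catalog, Dataset, and Distribution) can be sketched in a few lines of Python. This is an illustrative model only, not an implementation of DCAT itself: the field names (`access_url`, `media_type`, `byte_size`), the `find` helper, and the example URLs are assumptions introduced here for demonstration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One accessible form of a dataset (loosely modeled on DCAT Distribution)."""
    access_url: str   # hypothetical field name
    media_type: str   # hypothetical field name
    byte_size: int    # hypothetical field name


@dataclass
class Dataset:
    """A described collection of data (loosely modeled on DCAT Dataset)."""
    title: str
    description: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    """A curated collection of dataset descriptions (loosely modeled on DCAT Catalog)."""
    title: str
    datasets: List[Dataset] = field(default_factory=list)

    def find(self, keyword: str) -> List[Dataset]:
        """Return datasets whose title or description mentions the keyword."""
        needle = keyword.lower()
        return [d for d in self.datasets
                if needle in d.title.lower() or needle in d.description.lower()]


# Example entries are invented for illustration; the URLs are placeholders.
catalog = Catalog(
    title="Example AV accident catalog",
    datasets=[
        Dataset("EDR record", "Speed and brake status around the event",
                [Distribution("https://example.org/edr.csv", "text/csv", 2048)]),
        Dataset("Dashcam clip", "Forward-facing video of the scene",
                [Distribution("https://example.org/clip.mp4", "video/mp4", 10_485_760)]),
    ],
)
matches = catalog.find("video")
```

The nesting mirrors the idea that a catalog indexes datasets and each dataset may be offered through one or more distributions; an AP would add further Classes alongside these three.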
Furthermore, at the Property level, the defined attributes are extensive and detailed, necessitating organization through semantic-based categorization. Table 3 presents the results of classifying Properties into 12 types. These include Data Modification and Creation, Index and Classification, Description, Resolution, Metadata, Distribution, Spatiotemporal, Identification, Linkage and Relationship, Access and Rights, Provider and Manager, and Assistant, which are sufficiently applicable for general dataset management or public data portal operation [53,54,55,56,57]. The investigation of accidents involving AVs involves diverse entities collecting and utilizing various data sources, including internal and external vehicle sensors, infrastructure, V2X communications, and high-precision maps. This calls for not only using conventional properties but also developing additional properties for the investigation. These additional properties include separate properties for utilizing non-standardized data of the same type, legal spatial information for accident cause investigations, and attributes specifically utilized in the investigation.
Metadata schema studies can be broadly categorized into four types based on the scope of their contributions: (1) proposing a schema including new properties, (2) presenting structural diagrams, (3) conducting case studies, and (4) validating the proposed schema. As summarized in Table 4, the foundational step of these studies is to define a metadata schema reflecting the requirements of a specific domain. Studies at this stage define schemas by collecting fragmented information through literature reviews and assuming the functions of future systems, even in the absence of actual data. In this process, conceptual diagrams frequently accompany the schema to explain its logical structure [53,58]. Beyond theoretical proposals, research progresses to the case study stage to verify the applicability of the schema in environments where at least partial datasets can be secured. Studies at this stage typically serve as a Proof of Concept (PoC) by implementing pilot systems or visualization user interfaces based on the acquired partial data [59,60,61,62,63]. For instance, Labropoulou et al. [59] and Abaza et al. [62] implemented pilot systems because they could access partial data held in data repositories and registries related to their domains. Additionally, Kim et al. [63] confirmed the utility of the schema through the implementation of a visualization user interface, as they were able to access partial data from ongoing projects in the domain. Finally, empirical validation is performed when both operational systems and data are fully available. As seen in the study by Bermudez-Edo et al. [64], comprehensive validation is possible only when actual IoT systems and sensor data are fully established. However, in domains where acquiring actual data is extremely restricted, such as AV accident data, this level of validation is practically difficult to achieve.
Therefore, despite these limitations, this study advances to the case study stage by implementing a visualization user interface applying the proposed schema. Through this approach, we aim to demonstrate that the schema can be effectively utilized in the actual accident investigation process, even within a limited data environment.

2.4. Contributions

First, previous studies have focused on reviewing accident investigation procedures individually or analyzing data usable for accident investigations in a fragmented manner. This approach has the limitation of not sufficiently reflecting the interconnectivity between procedures and data. Therefore, this study comprehensively reviews accident investigation procedures and usable data, systematically mapping the required data for each stage of the investigation process. Second, metadata schemas designed for building existing data catalogs primarily focused on public data or general dataset management, limiting their scope to defining property types at a level supporting data search and distribution. However, AV data possesses unique characteristics, such as high resolution and multi-sensor fusion, while simultaneously needing to reflect the requirements of accident investigation procedures. Therefore, this study designed a specialized metadata schema for autonomous driving that incorporates these specific characteristics and requirements. Finally, the data catalog UI was proposed not merely as a metadata schema design, but as a support tool that can be directly utilized during accident investigations, featuring indexing, search, and visualization capabilities.

3. Methodology

Figure 1 illustrates the overall flow of the research. The accident occurrence process of AVs is divided into Pre-, In-, and Post-crash stages, with information collected both inside and outside the vehicle being mixed at each stage. However, existing accident investigation procedures suffer from limitations: the division of investigation items among investigative entities is unclear, and the data required for each investigation item is not systematically defined. To address these limitations, the following methodological steps were implemented. First, the relationships between investigative entities, investigation items, and data were mapped according to the procedures for each accident phase, establishing structural linkages. Second, based on the mapping results, additional properties specific to AV accident investigations were derived. Third, reflecting these, an extended metadata schema was designed. Finally, to verify the schema’s practical applicability, a data catalog UI example was conceptualized.

3.1. Mapping of Entities-Items-Data on Investigation Process

Accident investigations involving AVs have a broader scope and require more complex data types compared to traditional accident investigations. Accidents vary in their investigating entities and procedures depending on the stage of occurrence, and the information required at each stage also differs. Accordingly, we mapped the investigation procedures, investigating entities, and investigation items based on the accident occurrence stage to establish a structured relationship. Furthermore, by aligning the derived investigation items with the data that can actually be collected and utilized, we concretized the data utilization structure necessary for autonomous driving accident investigations.
Figure 2 shows the structure mapping the investigation entities, procedures, and investigation items (classification and subclassification) for traffic accidents involving AV. The classification of investigation items references the research by Kim et al. [2]. This figure is broadly divided into the Post, Pre-In, Cause Determination, and Report Writing stages, illustrating the relationship between the investigation entities and procedures performed at each stage, and the investigation items collected and analyzed through them. In the Post stage, local police dispatched upon receiving the accident report handle initial response and on-site investigation. The on-site investigation records and analyzes the accident scene, focusing on external factors such as the accident overview, parties involved, objects, traffic conditions, and environment. Specifically, this includes vehicle movement trajectories, road facility locations, and road obstacles. In the Pre-In phase, the AV accident investigation team leads an in-depth investigation to determine the cause of the accident. This process consists of three divisions: the vehicle investigation team, the digital forensics team, and the virtual environment investigation team.
  • Vehicle Investigation
    The vehicle investigation team conducts physical defect investigations and system defect investigations. Physical defect investigations examine the vehicle’s basic information (operating mode, key functions, vehicle status, etc.), hardware (H/W), and whether the chassis system is functioning properly. System defect investigations check for errors in the Human–Machine Interface (HMI), software, and functional modules, analyzing software versions and system logs as well.
  • Digital Forensics
    The digital forensics team focuses on investigating communication failures and security vulnerabilities. They inspect the V2X communication status of AVs and the safety of communication infrastructure, assess the potential for cybersecurity breaches, and verify whether accidents occurred due to external factors.
  • Virtual Environment Investigation
    The virtual environment investigation team reviews environmental factors such as high-precision maps, road design, and traffic operation status to determine whether AVs accurately perceived the actual road environment through their sensors.
Finally, the findings from each stage are consolidated by the autonomous driving accident analysis team. Based on this, the final cause of the accident is determined and the final report is prepared.
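The stage-entity-item mapping described above can be held in a simple tabular structure and queried by stage. The sketch below is only an illustration of that idea: the row contents paraphrase a subset of the items named in the text, and the `MappingRow` type and `items_by_entity` helper are hypothetical names introduced here, not part of the proposed schema.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class MappingRow(NamedTuple):
    """One link between an accident stage, an investigating entity, and an item."""
    stage: str
    entity: str
    item: str


# A partial, illustrative subset of the mapping described in the text.
ROWS = [
    MappingRow("Post", "Local police", "Accident overview"),
    MappingRow("Post", "Local police", "Traffic conditions"),
    MappingRow("Pre-In", "Vehicle investigation team", "Physical defects"),
    MappingRow("Pre-In", "Vehicle investigation team", "System defects"),
    MappingRow("Pre-In", "Digital forensics team", "Communication failures"),
    MappingRow("Pre-In", "Digital forensics team", "Security vulnerabilities"),
    MappingRow("Pre-In", "Virtual environment investigation team", "High-precision maps"),
    MappingRow("Pre-In", "Virtual environment investigation team", "Road design"),
]


def items_by_entity(rows: List[MappingRow], stage: str) -> Dict[str, List[str]]:
    """Group the investigation items of one stage under their responsible entity."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        if row.stage == stage:
            grouped[row.entity].append(row.item)
    return dict(grouped)


pre_in = items_by_entity(ROWS, "Pre-In")
```

A fourth column naming the data used for each item could be added in the same way once the collectable data for that item are known.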
Table 5 distinguishes between data currently collectable in AV accident investigations and data that are necessary for investigation but not currently collected. For example, DSSAD data enables investigation into the perception-decision-control flow of AVs. V2X data (BSM, CAM, and other sources) can verify external communications and surrounding situational information, while HD Map and Traffic Signal data are utilized to analyze virtual environment and infrastructure elements. Conversely, data gaps exist for HMI, software and hardware versions, physical and cyber security events, and other elements for which no usable data is currently available. These findings should also be reflected in future metadata schema design directions. Data must also be linked according to the investigation phase. Autonomous driving accident investigations can be subdivided into pre-crash, in-crash, and post-crash phases, as well as cause and fault determination stages. This necessitates clear definitions of how various data are utilized according to specific procedures and purposes. Furthermore, differences in the roles and scope of responsibility between the investigating entity and the data holder complicate data sharing for accident investigations. Although data are often collected and utilized simultaneously within a single incident, interconnections between data types (cross-modality references, temporal baselines, and sources) are not systematically documented. Furthermore, cybersecurity and communication-related data are essential for investigating accidents involving AVs and must be incorporated.
Discrepancies in temporal and spatial resolution create the following limitations for data synchronization and fusion analysis: (1) Inconsistent provision of linkage structures and spatial context information between heterogeneous data types such as video, messages, and HD maps; (2) Reduced interoperability due to differing standards and formats adhered to by each dataset; (3) Limited reconstruction of precise incident information at the event level due to differing data collection cycles (e.g., continuous recording vs. event-based recording), and (4) Unclear documentation of the purpose for utilizing individual datasets within accident investigation reports.
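To make limitation (3) concrete, the following sketch pairs event-based records with the nearest sample of a continuously recorded stream and rejects pairs that fall outside a tolerance. It is a generic nearest-timestamp match, not a method taken from this study; the function name, the 2 Hz stream, and the 0.3 s tolerance are all assumptions chosen for illustration.

```python
from bisect import bisect_left
from typing import List, Optional


def nearest_index(timestamps: List[float], t: float, tolerance: float) -> Optional[int]:
    """Return the index of the sample closest to time t.

    `timestamps` must be sorted in ascending order. Returns None when the
    closest sample is farther away than `tolerance` (same unit as timestamps).
    """
    if not timestamps:
        return None
    pos = bisect_left(timestamps, t)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(timestamps)]
    best = min(candidates, key=lambda i: abs(timestamps[i] - t))
    return best if abs(timestamps[best] - t) <= tolerance else None


# Hypothetical continuous stream sampled every 0.5 s, and three event-based records.
continuous = [0.0, 0.5, 1.0, 1.5, 2.0]
events = [0.6, 1.9, 5.0]
matched = [nearest_index(continuous, t, tolerance=0.3) for t in events]
```

The third event has no counterpart within the tolerance, which is the situation in which event-level reconstruction becomes limited; recording the sampling rate and trigger mechanism in metadata lets an investigator anticipate such gaps before attempting fusion.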

3.2. Definition of Classes and Properties Under Essential DCAT and AP

The AP design for AV accident investigations must first review the Classes and Properties defined in the previously introduced DCAT and APs. DCAT centers on three core Classes: Catalog, Dataset, and Distribution [53,54,55,56,57]. These form the minimum units for building a data catalog. The metadata schema for AV accident investigations adds seven Classes (Relationship, Location, DataService, PeriodOfTime, Agent, License, and Checksum) to the three core Classes. Relationship describes the linkage structure between datasets. Location includes spatial attribute information such as the spatial form (point, line, and area) of data used in the investigations and the geographic location of that data (from top-level to bottom-level administrative district). PeriodOfTime refers to comprehensive information about the overall time events from the point of acquisition to the end of the data catalog. DataService supports users in accessing data in real time and receiving it in the desired format through indexing processes. To this end, it includes functions such as standardized endpoints (URLs) for data access and descriptions of these endpoints (API usage methods). Agent and License clearly define the responsible entity for the data and the usage conditions (rights, regulations, and so forth). Finally, Checksum includes content to support the integrity and verification of data recorded in the catalog. Table 6 presents the results of matching the 12 types of Properties proposed in this study to the 10 Classes, along with descriptions of the AV-specific Properties. Properties by type can be used redundantly across Classes, but each Class may contain distinct core Properties. For instance, the Dataset Class includes the Description and Data Modification and Creation types, with detailed properties such as ‘title’, ‘description’, and ‘modified’.
Distribution includes types such as Access and Rights and Distribution, with corresponding detailed properties including ‘byteSize’, ‘formatDetail’, and ‘accessURL’. Finally, Catalog requires the Assistance and Index and Classification types as mandatory, with detailed properties including ‘catalog’, ‘service’, and ‘themeTaxonomy’. Ultimately, the 54 sub-Properties within the 12 borrowed Property types either reuse existing DCAT and AP definitions or extend their meanings to reflect the specific characteristics of AV data. For example, spatialResolution was expanded beyond the quantitative notation (such as ‘1m’) required by existing DCAT. It now allows for the granular expression of spatial unit characteristics, such as point-level resolution for individual objects, line-level resolution for linear objects, and area-level resolution for area objects. Additionally, the functionality of ‘isRequiredBy’ has been enhanced to explicitly specify related datasets that must be secured beforehand to utilize a specific dataset. The existing Properties alone had limitations in fully describing AV accident investigations. For example, issues relating to cybersecurity and communication errors, such as hacking attempts or authentication failures detected during data collection, are not recorded. Multimodal data linkage also fails to account for issues such as time synchronization between sensors, coordinate system mismatches, and data encoding formats. Furthermore, AV accident investigations require distinct investigation items and procedures for each stage (Pre-, In-, and Post-crash), and the entities responsible for conducting them differ. However, the existing Properties fail to reflect these differences and the roles specific to each stage and entity.
This calls for introducing new properties in four key areas: accident investigation process linkage; data heterogeneity management; multimodal data linkage and spatial information; and special causes specific to autonomous driving, such as security and communication issues. First, the accident investigation process linkage properties were introduced to reflect the differences in investigation process and investigating entity across the Post, Pre-In, and Cause Identification stages. These properties include ‘investigationStep’, which indicates at which stage of the investigation the data is utilized, enabling tracking of utilized data by investigation procedure. Unlike existing attributes that solely track data generation time, this property explicitly aligns data with specific investigation phases. They also include ‘prePostEventWindow’, which specifies the range of records before and after the event reference point, enabling the time flow of analyses that use the data to be matched, and ‘caseNum’, which provides the unique number that identifies an individual incident, enabling data collected from various sources to be combined into a single incident. While the DCAT ‘identifier’ manages individual datasets, ‘caseNum’ functions as an overarching identifier that aggregates fragmented evidence into a single unified incident. In addition, ‘investigationAgency’ is a sub-Property that provides information about the agency or department that conducted the investigation, allowing for clarity on the subject and scope of the investigation and a transparent record of evidence utilization. Finally, the group also includes ‘reportingPurpose’, which provides the purpose for which the data or document was generated, and ‘analysisSupportLevel’, which contains information about the scope and constraints of analytics utilization.
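How the process linkage properties might be used in practice can be sketched as follows. The snippet groups dataset records by a case number and checks whether a record's time coverage spans a required window around the event. The record layout, the representation of the event window as a `(seconds_before, seconds_after)` pair, and the helper names are assumptions made for this illustration; the schema itself does not prescribe them.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DatasetRecord:
    """Minimal stand-in for a catalog entry carrying process linkage properties."""
    identifier: str          # per-dataset identifier
    case_num: str            # incident-level identifier (cf. 'caseNum')
    investigation_step: str  # stage where the data is used (cf. 'investigationStep')
    start: float             # first recorded timestamp, in seconds
    end: float               # last recorded timestamp, in seconds


def group_by_case(records: List[DatasetRecord]) -> Dict[str, List[DatasetRecord]]:
    """Collect all dataset records that belong to the same incident."""
    grouped: Dict[str, List[DatasetRecord]] = defaultdict(list)
    for record in records:
        grouped[record.case_num].append(record)
    return dict(grouped)


def covers_window(record: DatasetRecord, event_time: float,
                  window: Tuple[float, float]) -> bool:
    """Check that a record spans `window` = (seconds before, seconds after) the event."""
    before, after = window
    return record.start <= event_time - before and record.end >= event_time + after


# Invented example records; all values are hypothetical.
records = [
    DatasetRecord("edr-01", "CASE-001", "Pre-In", start=95.0, end=102.0),
    DatasetRecord("dashcam-01", "CASE-001", "Post", start=98.0, end=110.0),
    DatasetRecord("v2x-07", "CASE-002", "Pre-In", start=10.0, end=40.0),
]
grouped = group_by_case(records)
usable = [r for r in grouped["CASE-001"]
          if covers_window(r, event_time=100.0, window=(5.0, 2.0))]
```

Here the dashcam clip starts too late to cover the five seconds before the event, so only the EDR record satisfies the window, which is the kind of check that an explicit event window property is meant to support.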
Next, the properties used for data heterogeneity management are designed to manage the differences in format, resolution, and storage conditions of various sensor data in AVs. ‘sensorType’ and ‘dataModality’ provide a separation between the collection equipment and data representation format, enabling the management of different types of data collected from the same sensor. This addresses the limitations of existing properties, which solely describe file extensions and fail to distinguish between sensor modalities critical for AV accident investigation. ‘samplingRate’ and ‘videoResolution’ provide information on the resolution of sensor-specific data to identify resolution differences when synchronizing and fusing with other data. ‘clipLength’ provides information about the length of video data before and after the accident, which can be used with ‘prePostEventWindow’ to secure a baseline for accident recreation, analysis, etc., when analyzing the accident. ‘triggerMechanism’ provides information about continuous recording and event-based storage conditions, allowing you to track data collection intervals, time ranges, etc. ‘formatDetail’ describes the specific format of the data, including how it is encoded, to facilitate interoperability and utilization across data. ‘dataGranularityLevel’ is a property that categorizes and provides a level of resolution so that practitioners can determine how usable the data is. Finally, ‘beforebyteSize’ provides the file size before compression and can be utilized in conjunction with ‘byteSize’, ‘algorithm’, and ’checksumValue’ to ensure the integrity of large log and video data and the reliability of the distribution process. In addition, multimodal data linkage and spatial information enables the fusion and interpretation of data utilized in accident investigations. 
These include ‘multiModalLinkage’, which records the linkage between sensors, such as camera-lidar matching (unlike existing properties that merely indicate general associations, this property provides the specific information required for sensor fusion); ‘videodatabbox’, which describes the spatial boundaries of video data; ‘geocoordinated’, which provides reference coordinate system information; and ‘geoContextType’, which describes the spatial context, such as road network-based or administrative district-based. Finally, properties for special causes specific to autonomous driving are added to support the investigation of non-physical causes such as cyberattacks or communication errors. Two new classes have been added: ‘cyberSecurityEvent’, which records security events detected during data collection and fills a gap in standard DCAT, which lacks properties for recording non-physical anomalies such as unauthorized access attempts; and ‘dataSourceEntity’, which specifies the entity or equipment that collected the data. These extensions bring the total number of classes to 10, including the three core classes, and utilize 76 properties. Of these, 54 are reuses and semantic extensions of existing DCAT properties, and 22 are newly introduced in the four categories that are directly relevant to AV accident investigations: accident investigation process linkage, data heterogeneity management, multimodal data linkage and spatial information, and autonomous driving special cause considerations (security and communication).

3.3. Metadata Schema for AV Accident Investigation

Figure 3 visualizes the core structure of the metadata schema for AV accident investigation in Unified Modeling Language (UML) style. UML is a language that visualizes the structure of complex systems, such as software or data, using classes, properties, and relationship symbols, thereby aiding system design and documentation [68]. This diagram allows for an intuitive understanding of which properties belong to each class and how the connection structure between classes is formed.
The gray boxes represent Classes. The Dataset Class concentrates key Properties reflecting autonomous driving contexts, such as ‘investigationStep’, ‘prePostEventWindow’, ‘eventTimeMarker’, ‘sensorType’, ‘dataModality’, ‘samplingRate’, ‘clipLength’, ‘analysisSupportLevel’, and ‘dataGranularityLevel’. Relationship Classes enable cross-referencing between related datasets, including ‘multiModalLinkage’, which supports linking heterogeneous data. Distribution Classes include ‘formatDetail’, which describes the specific format and standards adhered to by the provided data, and are linked with Checksum Classes to verify the integrity of received data. The Location Class introduces ‘geoContextType’ and ‘videodatabox’ to specifically describe the spatial context of video data, while the Agent Class adds ‘investigationAgency’ to clarify the investigating entity. Notably, UML diagrams can represent data flow and dependencies through association lines between Classes. The Dataset Class connects to Classes such as Distribution, Location, and PeriodOfTime, referencing the spatio-temporal scope and provision method of the data. Furthermore, the Dataset Class links to other datasets via the Relationship Class and connects to the Agent Class to clearly define the investigating entity and usage context. Going beyond simple attribute definitions, this relational structure shows how data is linked within the system for AV accident investigations and which entities utilize it in which procedures.
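The class relationships described above can be approximated in code. The sketch below uses Python dataclasses to mirror a small subset of the Classes and Properties named in the text; the field types, the choice of which properties to include, the ‘target_identifier’ field, and all sample values are assumptions made for illustration and do not reproduce Figure 3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Checksum:
    algorithm: str
    checksumValue: str


@dataclass
class Distribution:
    format: str
    formatDetail: str
    checksum: Optional[Checksum] = None  # Distribution is linked with Checksum


@dataclass
class Location:
    geoContextType: str


@dataclass
class PeriodOfTime:
    startDate: str
    endDate: str


@dataclass
class Agent:
    investigationAgency: str


@dataclass
class Relationship:
    target_identifier: str  # hypothetical field: identifier of the linked dataset
    multiModalLinkage: str  # e.g., a camera-lidar matching description


@dataclass
class Dataset:
    identifier: str
    investigationStep: str
    sensorType: str
    dataModality: str
    distributions: List[Distribution] = field(default_factory=list)
    location: Optional[Location] = None
    temporal: Optional[PeriodOfTime] = None
    agent: Optional[Agent] = None
    relations: List[Relationship] = field(default_factory=list)


# Hypothetical instance showing the associations Dataset -> other Classes
camera = Dataset(
    identifier="front-camera-clip",
    investigationStep="Pre-In",
    sensorType="camera",
    dataModality="video",
    distributions=[Distribution("MP4", "H.264", Checksum("sha256", "<digest>"))],
    location=Location("road network-based"),
    temporal=PeriodOfTime("2025-01-01T12:00:00", "2025-01-01T12:00:30"),
    agent=Agent("Example Investigation Unit"),
    relations=[Relationship("lidar-sweep", "camera-lidar matching")],
)

print([r.target_identifier for r in camera.relations])
```

Modeling the links as typed fields makes the association lines of the diagram explicit: a Dataset reaches its format, spatial context, time span, investigating entity, and related datasets through one object graph.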

3.4. The Visualization UI Based on the Designed Metadata Schema

Figure 4 illustrates how datasets are organized into a multi-level taxonomy and how this taxonomy is mapped to catalogs and datasets to consistently drive filtering and visualization. The first taxonomy (vehicle, communication, road, etc.) and the second taxonomy (coordinates, dynamics, object interaction, system state, messages, etc.) are the higher-level thematic axes of the dataset and can be registered hierarchically in a ‘themeTaxonomy’. This two-level categorization is more than a thematic organization: it is directly related to the topics of AV accident investigations and provides a basis for data utilization by investigation procedure. The third level of detail is assigned as a ‘keyword’ for each dataset to enable multiple filtering capabilities. This tiered taxonomy is mapped to DCAT’s Catalog and Dataset structure, allowing users to filter in the Catalog UI by progressively narrowing the required data from the first to the third taxonomy. In particular, the catalog is divided into Pre-Crash, In-Crash, and Post-Crash, and the required investigation items and datasets correspond to each stage. Thus, while browsing the catalog, users can go beyond selecting datasets by topic and selectively search for data related to procedures before, during, and after an accident. The taxonomy also supports linkage discovery between related datasets. A dataset can be included in multiple catalogs at the same time, and conversely, a catalog can encompass multiple datasets. These many-to-many referential relationships can be described via ‘qualifiedRelation’. The duplication and versioning of datasets are controlled by utilizing ‘identifier’ and ‘version’. As a result, the proposed linkage is more than a data management tool: it provides a basis for data utilization that is directly related to the stages and items of an autonomous driving accident investigation.
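The progressive filtering from catalog phase to taxonomy to keyword can be illustrated with a short sketch. The dictionary layout, the ‘catalogs’ key, the sample datasets, and the function name are hypothetical; only the property names ‘themeTaxonomy’, ‘keyword’, and ‘identifier’ and the three catalog phases come from the text.

```python
def filter_datasets(datasets, phase=None, theme_path=(), keyword=None):
    """Narrow datasets by catalog phase, taxonomy prefix, and keyword."""
    selected = []
    for ds in datasets:
        # A dataset may belong to several catalogs (many-to-many)
        if phase is not None and phase not in ds["catalogs"]:
            continue
        # Compare only as many taxonomy levels as the user has selected
        if tuple(ds["themeTaxonomy"][:len(theme_path)]) != tuple(theme_path):
            continue
        if keyword is not None and keyword not in ds["keyword"]:
            continue
        selected.append(ds["identifier"])
    return selected


# Hypothetical catalog entries (values are illustrative only)
datasets = [
    {"identifier": "vehicle-imu-event", "catalogs": ["In-Crash"],
     "themeTaxonomy": ("vehicle", "coordinates"), "keyword": ["IMU-based"]},
    {"identifier": "spat-messages", "catalogs": ["Pre-Crash", "In-Crash"],
     "themeTaxonomy": ("communication", "messages"), "keyword": ["SPaT"]},
    {"identifier": "road-geometry", "catalogs": ["Pre-Crash"],
     "themeTaxonomy": ("road", "coordinates"), "keyword": ["lane geometry"]},
]

print(filter_datasets(datasets, phase="In-Crash"))
print(filter_datasets(datasets, theme_path=("vehicle",)))
print(filter_datasets(datasets, phase="Pre-Crash", keyword="SPaT"))
```

Because each criterion is optional, the same function covers browsing by accident phase alone as well as the full three-level drill-down.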
The visualization is determined automatically by the results of the classification selection and the resolution metadata of the data. By referring to ‘samplingRate’, ‘temporalResolution’, and ‘prePostEventWindow’ for the temporal axis and ‘spatialResolution’, ‘geometry’, ‘bbox’, and ‘geoContextType’ for the spatial axis, investigators can configure timelines based on frames, minutes, hours, and event windows, as well as map representations at the point, line, plane, and network levels. In addition, differences in data format can be declared as ‘format’ and ‘formatDetail’, and the available level of analysis can be declared as ‘analysisSupportLevel’, so that appropriate visualization features such as trajectory lines, traffic light status, object detection time, and time to collision (TTC) are enabled depending on the selected combination of classification and resolution. Note that the relationship between data and visualization features is not a 1:1 mapping. A dataset can feed multiple visualizations, and a visualization can fuse multiple datasets using ‘multiModalLinkage’ to support fused interpretation of heterogeneous data. For example, one of the visualizations, the accident location marker, displays accident locations as dots on a map. The data utilized here (Vehicle IMU Event) has a classification path of Vehicle → Physical → Coordinates → IMU-based, and the dataset is included in the In-Crash catalog. Candidate datasets are first filtered by the classification path, then ‘eventTimeMarker’ is used to calculate the time interval to be displayed, and the accident location is displayed on the map by referring to the coordinate system described in ‘geocoordinate’. In addition to the vehicle IMU-based location data, the accident point coordinates can be calibrated by referring to the location field of the CAM data or the road geometry to improve temporal and spatial consistency and reliability. These fusion relationships are declared using ‘multiModalLinkage’.
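As a rough illustration of how the displayed time interval could be derived from ‘eventTimeMarker’, ‘prePostEventWindow’, and ‘samplingRate’, consider the sketch below. It assumes that ‘prePostEventWindow’ is encoded as a pair of durations in seconds (before, after) and that ‘samplingRate’ is given in Hz; the paper does not specify these encodings, so both are assumptions, as are the function names.

```python
from datetime import datetime, timedelta


def display_interval(event_time_marker, pre_s, post_s):
    """Return the (start, end) of the timeline around the event reference time."""
    start = event_time_marker - timedelta(seconds=pre_s)
    end = event_time_marker + timedelta(seconds=post_s)
    return start, end


def expected_samples(sampling_rate_hz, pre_s, post_s):
    """Number of samples a source should contribute within the window."""
    return int(round(sampling_rate_hz * (pre_s + post_s)))


# Hypothetical metadata values for one dataset
event = datetime(2025, 1, 1, 12, 0, 0)  # 'eventTimeMarker'
pre_s, post_s = 5, 2                     # assumed 'prePostEventWindow' encoding

print(display_interval(event, pre_s, post_s))
print(expected_samples(10, pre_s, post_s))
```

Comparing the expected sample count with the number of records actually present is one simple way a UI could flag gaps before rendering a timeline.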
Figure 5 depicts an implementation of the autonomous driving-specific metadata schema as a “data discovery UI”. It shows that the autonomous driving-specific properties designed in the schema are not just descriptive but also enable data to be indexed, described, and provided to users in a purposeful and relevant way. The figure is divided into four areas, and each element is mapped one-to-one to a property in the schema. First, the filtering area on the left side of the screen shows the schema’s multi-level taxonomy. The parent taxonomy corresponds to ‘themeTaxonomy’ and the child taxonomy corresponds to ‘keyword’ to support data exploration. The widget for selecting the time range of the data defines the searchable interval by referring to the metadata described in ‘temporal’. In addition, the selection of the data collection device utilizes ‘dataModality’ and ‘sensorType’, and the section for selecting the format of the data to be provided is mapped to ‘format’ and ‘formatDetail’. Options such as the number of preview rows of filtered data are tied to ‘analysisSupportLevel’. Together, these controls allow users to perform complex filtering in the following order: “Select a classification -> Set the required data time range -> Select the format and standard of the data to be provided”. Next, the Metadata and Header viewer at the top right of the screen summarizes key information at the dataset level, using the properties ‘title’, ‘creator’, ‘wasGeneratedBy’, ‘contactPoint’, ‘investigationAgency’, and ‘license’. The header information is implemented by presenting the metadata defined in ‘description’ as a list. The associated navigation area at the center right of the screen provides information about related data. It is generated automatically from the ‘qualifiedRelation’ or ‘multiModalLinkage’ property of the dataset that the user selected through the filter.
Finally, the Provide panel at the bottom right of the screen utilizes the ‘accessService’ and ‘downloadURL’ properties to provide API and file download paths, and provides sample previews based on the ‘analysisSupportLevel’. Users can obtain data immediately in the format and standard of their choice, or use the API to connect with other programs to leverage the data.
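The automatically generated list of related data in the navigation area can be sketched as a lookup over the two linkage properties. In this illustrative example, each property is assumed to hold a list of dataset identifiers and the catalog is assumed to be a dictionary keyed by identifier; neither encoding is specified in the text, and the function name and sample entries are hypothetical.

```python
def related_datasets(selected, catalog):
    """Collect identifiers linked from the selected dataset, without duplicates."""
    seen, related = set(), []
    for prop in ("qualifiedRelation", "multiModalLinkage"):
        for target in selected.get(prop, []):
            # Keep only targets that are actually registered in the catalog
            if target in catalog and target not in seen:
                seen.add(target)
                related.append(target)
    return related


# Hypothetical catalog keyed by dataset identifier
catalog = {
    "vehicle-imu-event": {
        "qualifiedRelation": ["accident-report"],
        "multiModalLinkage": ["cam-messages", "road-geometry", "accident-report"],
    },
    "accident-report": {},
    "cam-messages": {},
    "road-geometry": {},
}

print(related_datasets(catalog["vehicle-imu-event"], catalog))
```

Listing each target once, even when it appears under both properties, keeps the navigation area free of duplicate entries.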

4. Case Study: Application of the New Methodology for Autonomous Vehicle Accident Investigation

The core objective of this section is to demonstrate that a schema constructed based on the procedural and data attributes of AV accidents can approximate the initiation time of security events by combining only the visualization layers selected by the user, after aligning heterogeneous data across temporal and spatial dimensions. Table 7 shows an accident scenario caused by a cybersecurity issue, partially adapted from accident scenarios involving AVs caused by cyberattacks presented in the study by Girdhar et al. [31]. In this scenario, the accident is caused by hacking of the traffic signal control system, which results in erroneous data being transmitted to the AV.
Figure 6 shows an example of applying the proposed AV-specific metadata schema to accident scenario analysis. The UI suggests possible visualization candidates from the metadata, but the user has complete control over which layers to turn on or off. The visualization analysis procedure is as follows: (1) Layer Selection: The user manually selects spatial-based layers (accident location map, vehicle trajectory sequence, geometry visualization) and temporal-based layers (vehicle motion state, vehicle control actions, inter-vehicle interaction, autonomous system response, and external driving context). (2) Analysis Time Window: The temporal scope of the analysis is determined from the ‘prePostEventWindow’ around the reference time ‘eventTimeMarker’. (3) Alignment and Overlay: For the selected visualization layers, the temporal layers are aligned according to ‘samplingRate’ and ‘temporalResolution’ relative to the ‘eventTimeMarker’, and the spatial layers are rendered according to ‘geoContextType’ and ‘spatialResolution’. (4) Anomaly Detection: Based on the visualized layer information, users identify the first point in time, t*, at which discrepancies occur between the infrastructure signal data and the signal data received by the vehicle. The UI provides auxiliary features such as interval changes and change point indicators, but the final determination is made by the user. This configuration utilizes properties specifying data formats and standards to automatically inject parameters for data consistency during visualization, ensuring interoperability and reproducibility. It also records user selection history to maintain analysis transparency. As a result, it achieves a division of responsibilities in which the system handles data alignment and synchronization while the investigator observes and judges the results. This supports practical procedures for rapidly and precisely narrowing down incidents suspected of security compromise based on data.
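Step (4) can be illustrated with a small sketch of the kind of auxiliary indicator the UI might offer. The snippet compares each signal state received by the vehicle with the most recent infrastructure state at or before that time and returns the first mismatch inside the analysis window as a candidate t*. The step-hold alignment rule, the (time, state) tuple layout, and the sample values are assumptions for illustration; in the procedure described above, the final judgment remains with the investigator.

```python
from bisect import bisect_right


def first_discrepancy(infra, vehicle, window):
    """Return the first time in the window where the received state differs.

    infra, vehicle: lists of (time_s, state) tuples sorted by time.
    window: (start_s, end_s) derived from the event reference time.
    """
    start, end = window
    infra_times = [t for t, _ in infra]
    for t, received in vehicle:
        if not (start <= t <= end):
            continue
        # Latest infrastructure record at or before t (step-hold assumption)
        idx = bisect_right(infra_times, t) - 1
        if idx < 0:
            continue  # no infrastructure record yet to compare against
        if infra[idx][1] != received:
            return t
    return None


# Hypothetical signal logs: the vehicle receives "red" while the signal is green
infra = [(0.0, "red"), (5.0, "green"), (10.0, "red")]
vehicle = [(1.0, "red"), (6.0, "green"), (7.0, "red"), (11.0, "red")]

print(first_discrepancy(infra, vehicle, (0.0, 12.0)))
```

Restricting the search to the analysis window keeps the indicator consistent with step (2), so that discrepancies outside the investigated interval are not reported.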

5. Conclusions

Accident investigations for AVs must leverage heterogeneous and massive amounts of data generated from diverse sources, including vehicle sensors, onboard recorders, V2X communications, and road infrastructure. Within this context, this study first systematically mapped the investigation entities, items, and data for each accident phase. Second, it designed an AV-specific metadata schema by extending DCAT. Third, to demonstrate the usability of the design results, it presented an example of a data catalog UI. Specifically, the data linkage according to the accident investigation procedure is explicitly defined, and attributes are structured to express unique requirements of AVs, such as temporal and spatial resolution, storage triggers, data fusion, and cybersecurity. As a result, we propose a schema comprising 76 properties: 54 reused from existing standards plus 22 AV-specific attributes. The proposed schema enhances data continuity and collaborative accountability during accident investigations by specifying responsibilities according to the investigation procedure. It also strengthens interoperability between heterogeneous data by explicitly defining characteristics (resolution, format, standards, etc.) as metadata. Finally, this study presented a case study (cybersecurity scenario) demonstrating how the data catalog UI, by applying this schema, indexes and provides the necessary data while performing multi-data timeline visualization. This case confirmed that by synchronizing DSSAD, V2X, and signal data around security event metadata, the initiation point of a cyberattack and the resulting changes in vehicle control and communication status can be rapidly identified. However, there were several limitations. First, access to security-related data from AVs is currently restricted, preventing the metadata requirements in this domain from being fully reflected. Second, some cases relied on data commonly collected from both AVs and MVs, such as EDRs and accident reports.
Third, data that does not yet exist in practice was described on the basis of the literature. Future research will collect DSSAD, V2X communications, security event logs, and other data generated in actual AV operating environments to enhance the proposed schema. At this stage, data collection from roadways is primarily limited to vehicle status information. However, investigating cyberattacks requires comprehensive communication records, including those from download and upload systems. Therefore, future studies need to implement such scenarios within an autonomous driving simulator environment. In addition, as a next step, a study needs to be conducted to verify how accurately the relevant schema properties map to the data generated in the virtual environment. Furthermore, by developing a data catalog UI prototype based on the enhanced schema and conducting user evaluations, we expect to validate its practical applicability in AV accident investigations.

Author Contributions

Conceptualization, M.K. and T.-J.S.; methodology, M.K. and N.K.; investigation, M.K., N.K. and H.K.; resources, M.K.; data curation, M.K. and N.K.; writing—original draft preparation, M.K.; writing—review and editing, M.K., N.K., H.K. and T.-J.S.; visualization, M.K. and N.K.; supervision, T.-J.S.; project administration, T.-J.S.; funding acquisition, T.-J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Korea Institute of Policy Technology (KIPoT) grant funded by the Korea government (KNPA, Korean National Police Agency) (No. 092021D74000000, Development of a data extraction and analysis system for DSSAD (Data Storage System for Automated Driving)).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data included in this study are available upon request by contact with the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

  1. Chougule, A.; Chamola, V.; Sam, A.; Yu, F.R.; Sikdar, B. A comprehensive review on limitations of autonomous driving and its impact on accidents and collisions. IEEE Open J. Veh. Technol. 2023, 5, 142–161.
  2. Kim, H.; Han, H.; You, Y.; Cho, M.J.; Hong, J.; Song, T.J. A Comprehensive Traffic Accident Investigation System for Identifying Causes of the Accident Involving Events with Autonomous Vehicle. J. Adv. Transp. 2024, 2024, 9966310.
  3. Terrizzano, I.G.; Schwarz, P.M.; Roth, M.; Colino, J.E. Data Wrangling: The Challenging Journey from the Wild to the Lake. In Proceedings of the CIDR, Asilomar, Pacific Grove, CA, USA, 10 October 2015.
  4. Beamer, A. Map metadata: Essential elements for search and storage. Program 2009, 43, 18–35.
  5. Stillerman, J.; Fredian, T.; Greenwald, M.; Manduchi, G. Data catalog project—A browsable, searchable, metadata system. Fusion Eng. Des. 2016, 112, 995–998.
  6. Yeong, D.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140.
  7. Korea Law Information Center. Enforcement Decree of the Road Traffic Act. 2017. Available online: https://www.law.go.kr/ (accessed on 20 July 2025).
  8. National Transportation Safety Board. The Investigative Process. 2025. Available online: https://www.ntsb.gov/ (accessed on 20 July 2025).
  9. Employment and Social Development Canada. Investigations of Motor Vehicle Accidents on Public Roads—IPG-066. Effective Date: January 2009. 2009. Available online: https://www.canada.ca/en/employment-social-development/programs/laws-regulations/labour/interpretations-policies/066.html (accessed on 20 July 2025).
  10. Essex Police. H 0602 Procedure—Road Traffic Collisions (Investigations). Version 18—October 2024. 2024. Available online: https://www.essex.police.uk/ (accessed on 20 July 2025).
  11. Bundesministerium der Justiz und für Verbraucherschutz. Straßenverkehrsgesetz (StVG). 2025. Available online: https://www.gesetze-im-internet.de/stvg/ (accessed on 20 July 2025).
  12. Government of Sweden. Accident Investigation Act (1990:712). 1990. Available online: https://www.shk.se/ (accessed on 20 July 2025).
  13. Korea Road Traffic Authority (Koroad). Engineering Analysis of Traffic Accidents on Request from Judicial Institutions. 2025. Available online: https://www.koroad.or.kr/eng/content/view/ME02080000.do (accessed on 20 July 2025).
  14. College of Policing. Investigation of Fatal and Serious Injury Road Collisions. 2023. Available online: https://www.college.police.uk/ (accessed on 20 July 2025).
  15. California Department of Motor Vehicles. Autonomous Vehicle Collision Reports. 2025. Available online: https://www.dmv.ca.gov/ (accessed on 20 July 2025).
  16. Hoque, M.A.; Hasan, R. AVGuard: A Forensic Investigation Framework for Autonomous Vehicles. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
  17. Giovannini, E.; Giorgetti, A.; Pelletti, G.; Giusti, A.; Garagnani, M.; Pascali, J.; Pelotti, S.; Fais, P. Importance of dashboard camera (Dash Cam) analysis in fatal vehicle-pedestrian crash reconstruction. Forensic Sci. Med. Pathol. 2021, 17, 379–387.
  18. Niehoff, P.; Gabler, H.C.; Brophy, J.; Chidester, C.; Hinch, J.; Ragland, C. Evaluation of event data recorders in full systems crash tests. In Proceedings of the 19th International Conference on the Enhanced Safety of Vehicles, Washington, DC, USA, 6–9 June 2005.
  19. Oh, G.; Ko, W.; Park, J.; Yun, I.; So, J.J. Study on the improvement of traffic accident report for automated vehicle test scenarios. J. Korea Inst. Intell. Transp. Syst. 2022, 21, 167–182.
  20. Kwayu, K.M.; Kwigizile, V.; Lee, K.; Oh, J.S. Discovering latent themes in traffic fatal crash narratives using text mining analytics and network topology. Accid. Anal. Prev. 2021, 150, 105899.
  21. Pisu, P.; Soliman, A.; Rizzoni, G. Vehicle chassis monitoring system. Control Eng. Pract. 2003, 11, 345–354.
  22. Smith, T.; Toth, C.; Timcho, T. Sharing and Using Connected Device Data to Improve Traveler Safety and Traffic Management—Concept of Operations, Use Cases, Traveler Information Needs, Messages, and Requirements; Report FHWA-HRT-23-030; WSP USA Inc.: New York, NY, USA; Cambridge Systematics, Inc.: Medford, MA, USA; Federal Highway Administration, Office of Operations Research, Development, and Technology: McLean, VA, USA, 2023.
  23. National Transportation Safety Board. Collision Between Car Operating with Partial Driving Automation and Truck-Tractor Semitrailer, Delray Beach, Florida, March 1, 2019; Highway Accident Brief NTSB/HAB-20/01; National Transportation Safety Board: Washington, DC, USA, 2020.
  24. National Transportation Safety Board. Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian, Tempe, Arizona, March 18, 2018; Highway Accident Report NTSB/HAR-19/03; National Transportation Safety Board: Washington, DC, USA, 2019.
  25. Feifel, H.; Erdem, B.; Menzel, D.; Gee, R. Reducing Fatalities in Road crashes in Japan, Germany, and USA with V2X-enhanced-ADAS. In Proceedings of the 27th Enhanced Safety of Vehicles (ESV) Conference, Yokohama, Japan, 3–6 April 2023; pp. 3–6.
  26. Khattak, M.; De Backer, H.; De Winne, P.; Brijs, T.; Pirdavani, A. Analysis of Road Infrastructure and Traffic Factors Influencing Crash Frequency: Insights from Generalised Poisson Models. Infrastructures 2024, 9, 47.
  27. da Silva, M.P. Analysis of Event Data Recorder Data for Vehicle Safety Improvement; John, A., Ed.; Technical Report HS-810 935; DOT-VNTSC-NHTSA-08-01; Volpe National Transportation Systems Center (U.S.): Cambridge, MA, USA, 2008.
  28. Kim, I.; Lee, G.; Lee, S.; Choi, W. Data Storage System Requirement for Autonomous Vehicle. In Proceedings of the 2022 22nd International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 27 November–1 December 2022; pp. 45–49.
  29. Hyun, S.; Son, J.; Oh, Y.; You, B. A study of the DSSAD data elements derivation through autonomous driving data analysis on expressways. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 97–106.
  30. Jung, C.; Lee, D.; Lee, S.; Shim, D. V2X-Communication-Aided Autonomous Driving: System Design and Experimental Validation. Sensors 2020, 20, 2903.
  31. Girdhar, M.; You, Y.; Song, T.J.; Ghosh, S.; Hong, J. Post-accident cyberattack event analysis for connected and automated vehicles. IEEE Access 2022, 10, 83176–83194.
  32. Pai, V.N.; Barosan, I.; Khabbaz Saberi, A. Map and Its Impact on the Functional Safety of Automated Driving Vehicles. J. Softw. Eng. Auton. Syst. 2023, 1, 17–27.
  33. Moura, D.; Zhu, S.; Zvitia, O. Nexar Dashcam Collision Prediction Dataset and Challenge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025.
  34. Che, Z.; Li, G.; Li, T.; Jiang, B.; Shi, X.; Zhang, X.; Lu, Y.; Wu, G.; Liu, Y.; Ye, J. D2-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios. arXiv 2019, arXiv:1904.01975.
  35. Chen, R.J.; Tatem, W.M.; Gabler, H.C. Event Data Recorders (EDRs) Duration Study: Final Report; Final Report NHTSA Supplemental Report; Submitted to National Highway Traffic Safety Administration; Virginia Tech, Department of Biomedical Engineering and Mechanics: Blacksburg, VA, USA, 2017.
  36. Gabler, H.; Gabauer, D.; Newell, H.; Glassboro, N. Use of Event Data Recorder (EDR) Technology for Highway Crash Data Analysis. NCHRP Project 2004, 17–24.
  37. Chapman, S. Automated Vehicle Safety Assurance—In-Use Safety and Security Monitoring: Task 2—Minimum Dataset Specification; Published Project Report PPR2017 TETI0042; Prepared for Department for Transport; Version 1.0; Copyright © TRL Limited; TRL Limited: Wokingham, UK, 2022.
  38. UNECE World Forum for Harmonization of Vehicle Regulations (WP.29). DSSAD Guidance Document; Informal Document WP.29-196-09; Submitted to the 196th Session of the World Forum for Harmonization of Vehicle Regulations (WP.29); UNECE: Geneva, Switzerland, 19 June 2025; Available online: https://unece.org/transport/documents/2025/06/informal-documents/grva-dssad-guidance-document (accessed on 20 July 2025).
  39. SAE International. V2X Communications Message Set Dictionary; Technical Report SAE J2735_202409; Revised September 2024; SAE International: Warrendale, PA, USA, 2024.
  40. ETSI. Intelligent transport systems (ITS); vehicular communications; basic set of applications; part 2: Specification of cooperative awareness basic service. Draft ETSI TS 2011, 20, 448–451.
  41. No. EN 302 637-3; Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 3: Specifications of Decentralized Environmental Notification Basic Service. ETSI: Sophia Antipolis, France, 2019.
  42. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
  43. KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti/raw_data.php (accessed on 20 July 2025).
  44. Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454.
  45. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. arXiv 2019, arXiv:1903.11027.
  46. nuScenes Dataset. Available online: https://www.nuscenes.org/nuscenes (accessed on 20 July 2025).
  47. Tampa CV Pilot Signal Phasing and Timing (SPaT) Sample. Available online: https://data.transportation.gov/Automobiles/Tampa-CV-Pilot-Signal-Phasing-and-Timing-SPaT-Samp/xn7c-yu2n/about_data (accessed on 20 July 2025).
  48. Wilson, B.; Qi, W.; Agarwal, T.; Lambert, J.; Singh, J.; Khandelwal, S.; Pan, B.; Kumar, R.; Hartnett, A.; Pontes, J.K.; et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv 2023, arXiv:2301.00493.
  49. Yee, M.; Surkis, A.; Lamb, I.; Contaxis, N. The NYU Data Catalog: A modular, flexible infrastructure for data discovery. J. Am. Med. Inform. Assoc. 2023, 30, 1693–1700.
  50. Dibowski, H.; Schmid, S.; Svetashova, Y.; Henson, C.; Tran, T. Using Semantic Technologies to Manage a Data Lake: Data Catalog, Provenance and Access Control. In Proceedings of the SSWS@ ISWC, Athens, Greece, 2–6 November 2020; pp. 65–80.
  51. Cherradi, M.; Bouhafer, F.; Haddadi, A.E. Data lake governance using IBM-Watson knowledge catalog. Sci. Afr. 2023, 21, e01854.
  52. Anil Hirwade, M. A study of metadata standards. Libr. Hi Tech News 2011, 28, 18–25.
  53. Shin, D.K.; Lee, S.H.; Kang, J.; Park, E.M. Data catalogue standards based on dcat for transportation data: Dcat-trans. J. Korean Soc. Transp. 2019, 37, 430–444.
  54. Albertoni, R.; Browning, D.; Cox, S.; Beltran, A.; Perego, A.; Winstanley, P. Data Catalog Vocabulary (DCAT)-Version 3, 2024. W3C Recommendation. Available online: https://www.w3.org/TR/vocab-dcat-3/ (accessed on 1 October 2025).
  55. European Commission. DCAT Application Profile for Data Portals in Europe (DCAT-AP)—Version 3.0.0, 2024. Interoperable Europe Portal. Available online: https://interoperable-europe.ec.europa.eu/collection/semic-support-centre/solution/dcat-application-profile-data-portals-europe/release/300 (accessed on 1 October 2025).
  56. DCAT-AP.de. DCAT-AP.de Specification—Version 3.0, 2024. DCAT-AP.de Portal. Available online: https://www.dcat-ap.de/def/dcatde/3.0/spec/ (accessed on 1 October 2025).
  57. GeoDCAT-AP. GeoDCAT-AP 3.0.0. 2025. Available online: https://semiceu.github.io/GeoDCAT-AP/releases/3.0.0/ (accessed on 20 July 2025).
  58. Canham, S.; Ohmann, C. A metadata schema for data objects in clinical research. Trials 2016, 17, 557.
  59. Labropoulou, P.; Gkirtzou, K.; Gavriilidou, M.; Deligiannis, M.; Galanis, D.; Piperidis, S.; Rehm, G.; Berger, M.; Mapelli, V.; Rigault, M.; et al. Making metadata fit for next generation language technology platforms: The metadata schema of the european language grid. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 3428–3437.
  60. Welten, S.; Neumann, L.; Yediel, Y.U.; da Silva Santos, L.O.B.; Decker, S.; Beyan, O. DAMS: A distributed analytics metadata schema. Data Intell. 2021, 3, 528–547.
  61. Mukherjee, S.; Das, R. Integration of domain-specific metadata schema for cultural heritage resources to DSpace: A prototype design. J. Libr. Metadata 2020, 20, 155–178.
  62. Abaza, H.; Shutsko, A.; Klopfenstein, S.A.; Vorisek, C.N.; Schmidt, C.O.; Brünings-Kuppe, C.; Clemens, V.; Darms, J.; Hanß, S.; Intemann, T.; et al. Toward a Domain-Overarching Metadata Schema for Making Health Research Studies FAIR (Findable, Accessible, Interoperable, and Reusable): Development of the NFDI4Health Metadata Schema. JMIR Med. Inform. 2025, 13, e63906.
  63. Kim, E.; Kim, J.; Woo, W. Metadata schema for context-aware augmented reality applications in cultural heritage domain. In 2015 Digital Heritage; IEEE: Piscataway, NJ, USA, 2015; Volume 2, pp. 283–290.
  64. Bermudez-Edo, M.; Elsaleh, T.; Barnaghi, P.; Taylor, K. IoT-Lite: A lightweight semantic model for the internet of things and its use with dynamic semantics. Pers. Ubiquitous Comput. 2017, 21, 475–487.
  65. Specka, X.; Gärtner, P.; Hoffmann, C.; Svoboda, N.; Stecker, M.; Einspanier, U.; Senkler, K.; Zoarder, M.M.; Heinrich, U. The BonaRes metadata schema for geospatial soil-agricultural research data–Merging INSPIRE and DataCite metadata schemes. Comput. Geosci. 2019, 132, 33–41.
  66. Manouselis, N.; Costopoulou, C. Quality in metadata: A schema for e-commerce. Online Inf. Rev. 2006, 30, 217–237.
  67. Cano, M.A.; Tsueng, G.; Zhou, X.; Xin, J.; Hughes, L.D.; Mullen, J.L.; Su, A.I.; Wu, C. Schema Playground: A tool for authoring, extending, and using metadata schemas to improve FAIRness of biomedical data. BMC Bioinform. 2023, 24, 159.
  68. Koç, H.; Erdoğan, A.M.; Barjakly, Y.; Peker, S. UML Diagrams in Software Engineering Research: A Systematic Literature Review. Proceedings 2021, 74, 13.
Figure 1. Overall research framework.
Figure 2. Mapping of entities and items.
Figure 3. UML-based metadata schema for AV accident investigation.
Figure 4. Conceptual connections to investigation from data sources by the data catalog functions.
Figure 5. Implementation of the AV-specific metadata schema as a data discovery UI.
Figure 6. Case study: Applying the AV-specific metadata schema to accident scenario analysis.
Table 1. Autonomous and conventional vehicle accident data characteristics.
| Type of Accident | Layers | Data Source | Type of Data Generation | Temporal Resolution | Data Type | Data Contents | Refs. |
|---|---|---|---|---|---|---|---|
| AV & MV | Dashcam footage | Dashcam | When an event occurs; continuously recorded | 25–30 fps | Unstructured (MP4) | Visual context of the driving environment | [33,34] |
| AV & MV | Accident overview | Accident report | When an accident occurs | N/A | Unstructured (PDF) | Vehicle information; Driver information (e.g., DUI); Accident severity; Accident circumstances; Accident overview (e.g., weather, road condition) | [15] |
| AV & MV | Recording devices within vehicles | EDR | When an event occurs (airbag deployment, and so forth) | 2 fps | Unstructured (PDF) | Pre-crash information (e.g., speed, brake engagement status); In-crash information (e.g., airbag deployment time) | [35,36] |
| AV | Recording devices within AV | DSSAD | Continuously recorded; when an event occurs (crash detection, system failure, and so forth) | 2 fps | Structured (timestamped event log) | System status codes (e.g., ADS activation status); Control command logs (e.g., steering angle); V2X messages (e.g., vehicle received messages); Sensor fusion information (e.g., object detection); Vehicle location (e.g., latitude, longitude) | [37,38] |
| AV | V2V communication | BSM (Basic Safety Message) | Continuously recorded | 10 fps | Structured (ASN.1/UPER) | Vehicle location; Braking information (e.g., brake system status); Vehicle dimensions (e.g., length, width); Protected communication zone information | [39] |
| AV | V2V communication | CAM (Cooperative Awareness Message) | Continuously recorded | 1–10 fps | Structured (ASN.1/UPER) | Vehicle location; Vehicle dimensions | [40] |
| AV | V2I communication | TIM (Traveler Information Message) | When an event occurs (road condition changes, pre-defined zones, and so forth) | Event-based | Structured (ASN.1/UPER) | Recommended information (e.g., road construction); Road sign types; Emergency alert | [39] |
| AV | V2I communication | DENM (Decentralized Environmental Notification Message) | When an event occurs (accident occurrence, emergency vehicles approaching, and so forth) | Event-based | Structured (ASN.1/UPER) | Event overview (e.g., type, location); Emergency vehicle information (e.g., speed, direction); Geographic area warning information | [41] |
| AV | Vehicle sensor | Camera | Continuously recorded | 10–12 fps | Unstructured (PNG, TFRecord) | Video of vehicle perception; 3D bounding box for vehicle object perception | [42,43,44,45,46] |
| AV | Vehicle sensor | LiDAR | Continuously recorded | 10–20 fps | Unstructured (bin, TFRecord) | Video of vehicle perception; 3D bounding box for vehicle object perception | [42,43,44,45,46] |
| AV | Vehicle sensor | Radar | Continuously recorded | 13 fps | Unstructured (bin) | Video of vehicle perception; 3D bounding box for vehicle object perception | [45,46] |
| AV | Road infrastructure | Traffic Signal | When an event occurs (signal state changes) | 1 fps | Structured (CSV) | Signal information (e.g., signal state, remaining time) | [47] |
| AV | Road infrastructure | HD Map | Periodically updated | N/A | Unstructured (GeoTIFF) | Road geometry (e.g., lane boundaries); ODD | [48] |
Note. AV = Autonomous Vehicle; MV = Manual Vehicle; fps = frames per second; ODD = Operational Design Domain; N/A indicates that data is not available.
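The temporal resolutions in Table 1 differ by more than an order of magnitude, so the number of records that fall inside a given time window varies strongly by source. The following is a minimal sketch of that arithmetic, not part of the proposed schema. The rates are read from Table 1; reducing a range to its lower bound, and the names `NOMINAL_RATE_HZ`, `sample_period_s`, and `expected_samples`, are assumptions made for illustration.

```python
# Nominal sampling rates in Hz, read from the "Temporal Resolution" column of Table 1.
# Assumption: where the table gives a range (e.g., 10-20 fps), the lower bound is used.
NOMINAL_RATE_HZ = {
    "Dashcam": 25.0,
    "EDR": 2.0,
    "DSSAD": 2.0,
    "BSM": 10.0,
    "CAM": 1.0,
    "Camera": 10.0,
    "LiDAR": 10.0,
    "Radar": 13.0,
    "Traffic Signal": 1.0,
}


def sample_period_s(source: str) -> float:
    """Seconds between consecutive records for a data source."""
    return 1.0 / NOMINAL_RATE_HZ[source]


def expected_samples(source: str, window_s: float) -> int:
    """Number of records a source nominally produces in a window of window_s seconds."""
    return int(NOMINAL_RATE_HZ[source] * window_s)


# Compare three sources over a hypothetical 20 s window around an event.
for name in ("DSSAD", "BSM", "Radar"):
    print(name, sample_period_s(name), expected_samples(name, 20.0))
```

Under these assumptions, a 20 s window holds roughly five times as many BSM records as DSSAD records, which is one reason the schema records resolution explicitly for each dataset.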
Table 2. Comparison classes between DCAT and application profiles.
Class | Description | DCAT [54] | DCAT-AP [55] | DCAT-AP.de [56] | GeoDCAT-AP [57] | DCAT-Trans [53]
CatalogTo provide a list of datasets and data services included in the catalog including title, description, and list of included resourcesOOOOO
DatasetSeriesTo represent a collection of datasets with temporal or periodic continuity, including names, descriptions, and information about the included datasetsOOOO
ResourceTo express the characteristics of resources commonly used across multiple classes, including descriptive information such as unique identifiers and namesOOOO
DatasetTo describe the characteristics of a specific dataset, it includes information such as the title, description, subject, temporal scope, and spatial scopeOOOOO
DataServiceTo explain how to access and utilize the data, it includes information on the service’s name, description, access address, and provided functionsOOOO
DistributionTo explain the actual availability of the dataset, it includes information on data format, access path, download, location, and media typeOOOOO
CatalogRecordTo provide the management history of resources registered in the catalog, including information such as catalog’s creation time and modification timeOOOO
AgentTo describe the organization or individual associated with the dataset, include information on the administrator’s name, contact details, and type OOO
LocationTo express the spatial scope related to the dataset, it includes location coordinates, administrative districts, and geographic area information O OO
LicenseDocumentTo explain the terms of use for the dataset, including the required license type, rights description, and related documentation information O O
ChecksumTo verify data integrity, it includes information such as verification algorithms and hash values O O
RelationshipTo explain the relationships between data, include the names of related data, hierarchical relationships, and association information O OO
KindTo describe the category to which the dataset belongs, it includes information about the characteristics of the type, classification, and category OOO
AttributionTo describe the entities contributing to the dataset, include information about the relevant organizations or individuals, their roles, and their level of contribution O
TaxonomyTo systematically classify the topics within the dataset, it includes information on the topic classification system and topic items O
Note. ‘O’ indicates the presence of the corresponding class. DCAT-AP = DCAT Application Profile for Data Portals in Europe; DCAT-AP.de = German DCAT Application Profile (or DCAT-AP for Germany); GeoDCAT-AP = Geospatial Profile of DCAT-AP; DCAT-Trans = DCAT Application Profile for Transportation Data.
Table 3. Twelve types of properties and sub-properties.
| Types of Property | Description | Sub-Properties | Refs. |
|---|---|---|---|
| Data modification and creation | History of the data's creation, updates, and modifications (e.g., creator, publication date, modification history), enabling tracking of the data lifecycle | wasGeneratedBy, issued | [53,54,55,56,57] |
| Index and classification | Indexing information such as the dataset's classification system, subject terms, and keywords, which helps users find data suited to their purposes | type, keyword | [53,54,55,56,57] |
| Description | Basic information needed to understand the overall characteristics and content of the data, including its title, description, format, and other essential details | description, title | [53,54,55,56,57] |
| Resolution | Temporal and spatial resolution of the data, supporting evaluation of the data against the analysis or utilization purpose | spatialResolutionInMeters | [53,54,55,56,57] |
| Metadata | The metadata's own compliance with standards, reference schemas, and related details, supporting the structural reliability and interoperability of the metadata | conformsTo, isReferencedBy | [53,54,55,56,57] |
| Distribution | Technical distribution information, including the distribution method, format, download path, and file size, supporting data accessibility | downloadURL, byteSize | [53,54,55,56,57] |
| Spatiotemporal | Temporal and spatial scope covered by the data, supporting spatio-temporal analysis or use limited by region and period | temporal, spatial | [53,54,55,56,57] |
| Identification | The data's unique identifier and version information, supporting tracking of the data's identity and history | version, identifier | [53,54,55,56,57] |
| Linkage and Relationship | References and linkages between the data and other related data, supporting related-data indexing and integrated data utilization | relation, isReferencedBy | [53,54,55,56,57] |
| Access and Rights | Permissions, licenses, access paths, and other details for data utilization, supporting legal restrictions on data use and management of authorized access | accessRights, accessURL | [53,54,55,56,57] |
| Provider and Manager | The entity that created and provides the data, the managing agency, and the responsible personnel, supporting assurance of the reliability of the data's source | publisher, contactPoint | [53,54,55,56,57] |
| Assistance | Additional attributes not covered above, such as other reference information and data integrity verification, which are needed for schema composition but do not describe the data itself | servesDataset, checksumValue | [53,54,55,56,57] |
Table 4. Summary of related works on metadata schema in various research domains.
| Authors (Year) | Research Domain | Proposed Schema | Diagram | Case Study | Validation | Note | Ref. |
|---|---|---|---|---|---|---|---|
| S. Canham and C. Ohmann (2016) | Clinical research | O | X | X | X | - | [58] |
| Shin et al. (2019) | Transportation | O | X | X | X | - | [53] |
| X. Specka et al. (2019) | Soil-agricultural | O | O | X | X | - | [65] |
| N. Manouselis and C. Costopoulou (2006) | E-commerce | O | O | X | X | - | [66] |
| M.A. Cano et al. (2023) | Biomedical | O | O | X | X | - | [67] |
| P. Labropoulou et al. (2020) | Language technology | O | O | O | X | Pilot implementation utilizing the schema | [59] |
| M. Bermudez-Edo et al. (2017) | IoT | O | O | O | O | Schema validation based on testbed data | [64] |
| S. Welten et al. (2021) | Medical | O | O | O | X | Visualization implementation utilizing the schema | [60] |
| S. Mukherjee and R. Das (2020) | Cultural heritage | O | O | O | X | Schema validation based on real-world data | [61] |
| H. Abaza et al. (2025) | Health research | O | O | O | X | Visualization implementation utilizing the schema | [62] |
| Kim et al. (2015) | Cultural heritage | O | O | O | X | Visualization implementation utilizing the schema | [63] |
Note. ‘O’ indicates that the element was included in the study scope, and ‘X’ indicates it was not.
Table 5. Accident investigation items by data resources.
ItemsSub-ItemsData ResourceCurrent Data Availability
Essential
information
ODD areaN/AN/A
Partybehavioraccident reportO
trajectoryDSSADO
forward attention statusaccident reportO
Objectcellphone usage statusaccident reportO
fixedaccident reportO
movableaccident reportO
TrafficTraffic flow ProgressionN/AN/A
Environmentroad facility locationHD MapO
sun glareCamera, LiDAR,
Radar
O
traffic signal informationTraffic SignalO
Vehicle
information
vehicle levelaccident reportO
autonomous mode
DSSADO
conventional modeaccident reportO
H/W function
fault
sense functionDSSADO
perception
& localize function
DSSADO
scene functionDSSAD, Camera,
LiDAR, Radar
O
plan & decide functionDSSADO
EV systemDSSADO
Chassis
system
chassis typeN/AN/A
chassis statusN/AN/A
HMIHMI typeN/AN/A
HMI locationN/AN/A
S/W function
fault
sense functionDSSADO
perception
& localize function
DSSADO
scene functionDSSAD, Camera,
LiDAR, Radar
O
plan & decide functionDSSADO
EV systemDSSADO
Other function
fault
ADS operational statusDSSADO
DDT fallback momentN/AO
risk minimization driving intervalDSSADO
Violationtype of violationaccident reportO
System versionsoftware versionN/AN/A
firmware versionN/AN/A
hardware versionN/AN/A
Communicationin-vehicleN/AN/A
externalBSM, CAM,
TIM, DENM
O
Communication
infrastructure
infrastructure typeN/AN/A
infrastructure statusN/AN/A
infrastructure locationN/AN/A
SecurityphysicalN/AN/A
cyberN/AN/A
Virtual
environment
road facilityHD MapO
visibility conditionCamera, LiDAR,
Radar
O
road configurationHD MapO
road operation conditionTIMO
road typeHD MapO
CameraO
road conditionLiDARO
RadarO
security
& communication alert area
DENMO
Note. ‘O’ indicates that data is currently available, while ‘N/A’ indicates that data is not available.
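The item-to-resource mapping in Table 5 can be queried in both directions: which investigation sub-items currently have no data resource, and which sub-items a given resource supports. The following is a minimal sketch over a small subset of the table rows. The dictionary layout, the use of an empty list for "N/A", the labels chosen for the sub-items, and the function names are assumptions made for illustration.

```python
from typing import Dict, List

# Subset of Table 5: investigation sub-item -> data resources.
# Assumption: an empty list stands for "N/A" (no data resource currently available).
ITEM_RESOURCES: Dict[str, List[str]] = {
    "trajectory": ["DSSAD"],
    "traffic signal information": ["Traffic Signal"],
    "road facility location": ["HD Map"],
    "external communication": ["BSM", "CAM", "TIM", "DENM"],
    "software version": [],
    "in-vehicle communication": [],
    "cyber security": [],
}


def data_gaps(mapping: Dict[str, List[str]]) -> List[str]:
    """Sub-items for which no data resource is listed."""
    return sorted(item for item, resources in mapping.items() if not resources)


def items_served_by(mapping: Dict[str, List[str]], resource: str) -> List[str]:
    """Sub-items that a given data resource supports."""
    return sorted(item for item, resources in mapping.items() if resource in resources)


print(data_gaps(ITEM_RESOURCES))
print(items_served_by(ITEM_RESOURCES, "DSSAD"))
```

Listing the gaps this way makes visible which sub-items, such as system versions and security, would depend on data sources that are not yet available.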
Table 6. Properties and sub-Properties for AV accident investigation.
Class12 Types of PropertiesSub-PropertiesDescriptionExampleRemark
DatasetData modification
and Creation
issued *Date the dataset was first created1 June 2025Re-use existing DCAT and APs
definitions or extend meanings
modified *Date the dataset was updated or modified5 June 2025
wasGeneratedBy *How the dataset was createdLiDAR sensor raw data extraction
accrualPeriodicity *Update cycle of the datasetDaily
IdentificationidentifierA unique identifier for a datasetlidar-dataset-2025-06
versionA version for managing modification and update history of the same datasetv.1.2
Index and ClassificationkeywordKeywords of the datasetPerception, Virtual environment
Descriptiontitle *Title of the datasetAV LiDAR perception virtual environment data
description *Description of the datasetLiDAR point cloud data collected during AV operation
distribution *Distribution method of the datasetProvided in compressed file format
ResolutiontemporalResolutionTemporal resolution of the dataset12 fps
spatialResolutionSpatial resolution of the dataset, point (individual object), line (linear object), area (area object) unitsDriving trajectory at 0.5 m resolution
SpatiotemporalspatialSpatial scope of the datasetMajor arterial roads within Seoul Metropolitan City
temporalTemporal scope of the dataset1 June 2025–5 June 2025
Provider and ManagercreatorName of creating agency or administratorOO University AV research Lab.
contactPointContact information for creating agency or administratorlidarlab@univ.ac.kr
MetadataconformsToStandards (e.g., metadata, technical specifications) that the dataset or service complies withInternational Standard
isReferencedByInformation on how the dataset is referenced and utilized by other datasets, documents2025 AV environment perception report
RelationqualifiedRelationRelationship with other datasetsprov:wasDerivedFrom AV2025_v1
videoResolutionResolution of the video dataset1920 × 1080 @ 30 fpsNew Properties
(Data heterogeneity management)
triggerMechanismData storage or generation conditionIMU sensor crash detection
samplingRateTemporal collection frequency or cycle of data12 fps
dataModalityData representation formatVideo (Point cloud)
clipLengthClip length of log or video data45 s
sensorTypeData collection equipment or sensor typeLiDAR (Front)
dataGranularityLevelSpatial and temporal resolution level of dataSpatial: High resolution (1 m or less)
Temporal: High resolution (1 frame per second or more)
eventTimeMarkerSynchronization reference point for multimodal data15 May 2025 T08:30:10ZNew Properties
(Multimodal data linkage and
spatial information)
prePostEventWindowRecording time interval before and after a specific eventPre: 15 s
Post: 5 s
New Properties
(Accident investigation process linkage)
analysisSupportLevelLevel of support data provides for analysisAll levels (Raw data provided)
investigationStepStage of accident investigation where data is utilizedPre-crash (Virtual environment investigation)
reportingPurposePurpose for creating data or documentationComparison of Real-World environments and AV-perception virtual environments
dataSourceEntityData collection entity or equipmentDSSAD_Extractor_Unit_42New Properties
(Special causes specific to AV)
cyberSecurityEventSecurity events detected during the data collection processAuthentication failure warning
DistributionDescriptiontitleTitle of distributionLiDAR data distribution fileRe-use existing DCAT and APs
definitions or extend meanings
descriptionDescription of distributionCompressed LAS (LASer) format Lidar data
Data Modification
and Creation
issuedDate the distribution was first created1 June 2025
modifiedDate the distribution was updated or modified5 June 2025
Distributionformat *Distribution formatLAS
compressFormat *Distribution file compression formatzip
byteSize *Distribution file size25 GB
downloadURL *Distribution file download URLhttps://data.exampl/lidar202506.zip
Access and RightsaccessURL *URL providing access to the specified distribution methodhttps://api.data.example/lidar
accessService *Service endpoint provided for interacting with data distributionREST API (JSON)
beforebyteSizePre-compression distribution file size information36 GBNew Properties
(Data heterogeneity management)
formatDetailDetailed file format (e.g., encoding method) or standard name of the distribution dataVideo data, MP4 (H.264)
CatalogAssistancecatalog *Hierarchical relationships between catalogsTop-level: City data catalog/Sub-level: Vehicle sensor catalogRe-use existing DCAT and APs
definitions or extend meanings
service *Provided information for datasets included in the catalogLiDAR Data Service 2025
dataset *Individual data included in the catalogLiDARDataset_202506
Index and ClassificationthemeTaxonomy *Subject classification criteria supporting dataset organization and retrievalRoad environment perception
RelationshipLinkage and Relationshiprelation *Related datasetsAV camera perception virtual environment dataRe-use existing DCAT and APs
definitions or extend meanings
isRequiredBy *‘A’ dataset required for using ‘B’ dataset to enable complex data interpretation3D Object Detection Dataset
Requires *‘B’ dataset required for using ‘A’ dataset to enable complex data interpretationRoad Surface Condition Dataset
caseNumUnique identifier information for individual accidentsCASE-20250601-0123New Properties
(Accident investigation process linkage)
multiModalLinkageInformation on the relationship between ‘A’ dataset and other sensors, devices, and so forthCamera → LiDAR MatchingNew Properties
(Multimodal data linkage and spatial information)
LocationSpatiotemporalgeometry *Spatial geometry of the dataLINESTRING (127.03 37.50, 127.05 37.52)Re-use existing DCAT and APs
definitions or extend meanings
bbox *Geographic boundaries of the dataPOLYGON (127.02 37.49, 127.06 37.53)
adminUnitL1Highest-level administrative district to which the data belongsRepublic of Korea
adminUnitL2Second-highest administrative district to which the data belongsSeoul Metropolitan City
adminUnitL3Third-highest administrative district to which the data belongsGangnam District
adminUnitL4Lowest-level administrative district to which the data belongsYeoksam 1-dong
videodatabboxGeographic boundary information in video dataFRAMEBOX (Seoul) 00:00:00–01:00:00
FRAMEBOX (Incheon) 01:00:00–02:00:00
New Properties
(Multimodal data linkage and spatial information)
geocoordinateSpatial coordinate system information referenced by the data37.7749° N
geoContextTypeReference system type for spatial data (e.g., road network-based, administrative boundary-based)Road network-based
DataServiceAccess and RightsendpointDescription *Description of operations possible using the endpointRESTful API for LiDAR data queryRe-use existing DCAT and APs
definitions or extend meanings
endpointURL *Endpoint URL of the service providing the datasethttps://api.data.example/lidar/v1
license *Download and operation permissions for the datasetCC BY 4.0
AssistanceservesDatasetDatasets that can be deployed by the data serviceLiDARDataset_202506
PeriodOfTimespatiotemporalendDate *Data release end date5 June 2025Re-use existing DCAT and APs
definitions or extend meanings
startDate *Data release start date1 June 2025
AgentProvider and Managertype *Agent typeOrganizationRe-use existing DCAT and APs
definitions or extend meanings
name *Name of managing agency or administratorOO University AV research center
contactPointContact information for managing agency or administratoravcenter@univ.ac.kr
LicenseAccess and Rightstype *Required License typeCC BY 4.0Re-use existing DCAT and APs
definitions or extend meanings
ChecksumAssistancealgorithm *Algorithm used to generate the checksumValueSHA-256Re-use existing DCAT and APs
definitions or extend meanings
checksumValue *Checksum generated using the algorithm checksumValue9f2c7b3e4a…c12a
Note. Properties marked with an asterisk (*) indicate mandatory fields defined in the standard DCAT and APs.
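To make the role of the mandatory fields and the event-related properties in Table 6 concrete, the following is a minimal sketch of a Dataset record held as a plain dictionary. It checks the asterisked Dataset sub-properties and derives the recording interval from `eventTimeMarker` and `prePostEventWindow`. The dictionary encoding, the ISO 8601 timestamp format, the `pre_s`/`post_s` keys, and the function names are assumptions made for illustration; the schema itself does not prescribe a serialization here.

```python
from datetime import datetime, timedelta

# Dataset sub-properties marked with an asterisk (mandatory) in Table 6.
MANDATORY_DATASET_FIELDS = {
    "issued", "modified", "wasGeneratedBy", "accrualPeriodicity",
    "title", "description", "distribution",
}


def missing_mandatory(record: dict) -> list:
    """Return the mandatory Dataset fields that are absent or empty."""
    return sorted(f for f in MANDATORY_DATASET_FIELDS if not record.get(f))


def event_window(record: dict) -> tuple:
    """Return (start, end) of the recording interval around the event marker."""
    marker = datetime.fromisoformat(record["eventTimeMarker"])
    pre = timedelta(seconds=record["prePostEventWindow"]["pre_s"])
    post = timedelta(seconds=record["prePostEventWindow"]["post_s"])
    return marker - pre, marker + post


# Example values follow Table 6; "distribution" is left out on purpose.
record = {
    "identifier": "lidar-dataset-2025-06",
    "title": "AV LiDAR perception virtual environment data",
    "description": "LiDAR point cloud data collected during AV operation",
    "issued": "2025-06-01",
    "modified": "2025-06-05",
    "wasGeneratedBy": "LiDAR sensor raw data extraction",
    "accrualPeriodicity": "Daily",
    "sensorType": "LiDAR (Front)",
    "eventTimeMarker": "2025-05-15T08:30:10+00:00",  # assumed ISO 8601 encoding
    "prePostEventWindow": {"pre_s": 15, "post_s": 5},  # hypothetical key names
}

print(missing_mandatory(record))
start, end = event_window(record)
print(start.isoformat(), end.isoformat())
```

Because every dataset shares the same event marker convention, the derived interval can serve as a common time base when records from different sensors are linked.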
Table 7. Accident scenario overview.
| Content | Description |
|---|---|
| Accident Location | Signalized Intersection |
| Scenario description | The traffic signal at the intersection was displaying a green light, but the attacker hacked the signal control system and transmitted a red light message to the AV. This caused the AV to brake abruptly, resulting in a rear-end collision by the MV following behind. |
| Driver task | Automated Driving System (ADS) fully engaged |
| AV function issue | Security breach |
| Issue type | Malicious message injection |
| System state | Hacked V2I system |
| Weather | Clear |
| Collision type | AV to MV |
| Crash type | Rear-end collision of MV |
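The scenario in Table 7 turns on a discrepancy between the signal state displayed at the intersection and the state the AV received. The following is a minimal sketch of how such a discrepancy might be surfaced once the traffic signal log and the received V2I messages are linked on a common time base. The tuple layout, the function names, and the sample values are hypothetical and are not taken from the case study.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Each log entry is (time in seconds, signal state).
Entry = Tuple[float, str]


def state_at(change_log: List[Entry], t: float) -> Optional[str]:
    """Signal state in effect at time t, given a time-sorted log of state changes."""
    times = [ts for ts, _ in change_log]
    i = bisect_right(times, t) - 1
    return change_log[i][1] if i >= 0 else None


def mismatched_messages(change_log: List[Entry], received: List[Entry]) -> list:
    """Received messages whose state differs from the logged state at the same time."""
    result = []
    for t, state in received:
        actual = state_at(change_log, t)
        if actual is not None and actual != state:
            result.append((t, state, actual))
    return result


# Hypothetical data: the signal turns green at t = 30 s, but a red message arrives at t = 35 s.
signal_log = [(0.0, "red"), (30.0, "green")]
received_by_av = [(31.0, "green"), (35.0, "red")]

print(mismatched_messages(signal_log, received_by_av))
```

A mismatch of this kind does not by itself establish an attack; it only points the investigator to the interval in which the security and communication logs should be examined.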
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, M.; Kim, N.; Kim, H.; Song, T.-J. Design of an Extended DCAT-Based Metadata Schema and Data Catalog for Autonomous Vehicle Accident Investigation. Sustainability 2025, 17, 11237. https://doi.org/10.3390/su172411237
