Design of an Extended DCAT-Based Metadata Schema and Data Catalog for Autonomous Vehicle Accident Investigation

Kim, Minwook; Kim, Nayeon; Kim, Heesoo; Song, Tai-Jin

doi:10.3390/su172411237

by Minwook Kim, Nayeon Kim, Heesoo Kim and Tai-Jin Song *

Department of Urban Engineering, Chungbuk National University, Cheongju-si 28644, Chungcheongbuk-do, Republic of Korea

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(24), 11237; https://doi.org/10.3390/su172411237

Submission received: 22 October 2025 / Revised: 2 December 2025 / Accepted: 9 December 2025 / Published: 15 December 2025

(This article belongs to the Special Issue Hyper-Connected Cities: Advancing Intelligent Transport Systems for Sustainable and Resilient Mobility)

Abstract

Autonomous vehicle (AV) accidents introduce uncertainty in liability attribution, as responsibility is divided between humans and automated systems. The 2018 Arizona crash highlighted growing societal concerns about accountability. To address these issues, prior studies proposed investigation processes considering perception sensors, driving control systems, communication infrastructure, and cybersecurity. However, conducting such investigations requires integrating large-scale data from multiple sources, including vehicle sensors, onboard recorders, V2X communications, and road infrastructure. Raw data often lack descriptive information, limiting their use in real investigations. This study establishes a structured mapping framework linking investigation procedures, responsible entities, items, and data across accident phases. With this backdrop, an autonomous driving–specific metadata schema extending DCAT was designed, comprising 10 Classes and 76 Properties. To demonstrate its applicability, a prototype data catalog user interface (UI) was conceptualized with data discovery and visualization examples. The proposed schema strengthens accountability and interoperability by explicitly aligning responsibilities and data relationships. It enables precise event localization and effective linkage of heterogeneous data. Future work will refine the schema by incorporating DSSAD, V2X, and security log data, and develop a user-tested UI prototype as a practical support tool for AV accident investigation.

Keywords:

cooperative autonomous vehicle; accident investigation; cyber security; metadata schema; data catalog

1. Introduction

Before the introduction of autonomous vehicle (AV) technology, approximately 94% of traffic accidents were found to be attributable to human factors [1]. Therefore, unless there was clear evidence such as a vehicle defect, liability for accidents was mostly concentrated on human drivers. In March 2018, an AV undergoing a test drive in Arizona, USA, struck a pedestrian, resulting in the first recorded fatal accident caused by an AV. According to the NTSB’s investigation, the causes of the accident ultimately came down to two factors: (1) the test driver’s inattention and (2) the functional limitations of the autonomous driving system (ADS). However, the court’s ruling placed greater responsibility on the former. Thus, the occurrence of AV accidents heightened uncertainty in assigning responsibility by dividing the liable parties into ‘human’ and ‘system’ [1]. This highlighted the need for an accident investigation system that can clearly determine the cause of accidents involving AVs and identify the responsible party [2].

Accident investigations involving AVs must comprehensively reflect various factors, including information from the vehicle’s internal perception sensors, physical vehicle control data based on driving decision outcomes, external infrastructure and communication data, and cyberattacks. The aim is to enable tracking of the entire ‘perception-decision-control’ process. Investigating such diverse factors requires systems that are more varied and sophisticated than those used in conventional traffic accident investigations. Therefore, developing systems for AV accident investigation necessitates the systematic collection and storage of diverse data, including previously unconsidered physical data on actual driving behavior and information from infrastructure and communication interactions. The diverse raw data generated and collected from vehicles, infrastructure, and other sources must be incorporated into a structured data management system that considers their relevance to accident investigation items and entities [3]. This indicates that establishing an AV accident investigation system begins with defining metadata. Metadata is data containing additional information about stored raw data, such as its content, format, and storage location, and is generally defined as ‘data about data’ [4]. Metadata attaches usable attribute information to raw data and provides a system for managing it. Effective use of metadata additionally requires establishing a data catalog. A data catalog can provide indexes, access methods, data descriptions, and simple visualizations for raw data based on metadata [5]. The objective of this study is to propose a standardization approach for the metadata schema and data catalog of diverse data generated and collected in autonomous driving environments. The aim is to support effective investigation of traffic accidents involving AVs, which are expected to surge in the future. 
To achieve this, the study first establishes a mapping structure based on the autonomous driving accident investigation process. This maps the relationships between the investigation process, entities, items, and utilized data according to the accident occurrence stage (Pre-, In-, and Post-crash). Second, a specialized metadata schema applicable to autonomous driving accident investigations is developed. This data schema applies and extends various schema techniques used in other fields to reflect the unique characteristics of autonomous driving. Finally, the defined metadata schema is utilized for a pilot implementation of a data catalog user interface. The proposed user interface implements functions such as indexing, searching, and visualization to enable efficient utilization during accident investigations.

2. Related Works

Unlike manually driven vehicles (MVs), AVs operate by perceiving and judging surrounding conditions based on various sensors [6]. This characteristic necessitates investigations distinct from traditional accident investigation procedures centered on human factors. Procedural differences imply varying data requirements during investigations, directly necessitating a systematic data management framework and metadata schema design. Therefore, the existing literature is reviewed along three themes. First, we review studies on accident investigation procedures for AVs and MVs. Second, we examine the types and characteristics of data that can be used in the accident investigation processes for AVs and MVs and compare their differences. Finally, we review studies related to data catalogs and metadata schemas.

2.1. Accident Investigation Process for Traffic Accidents Involving AVs

The procedures for investigating traffic accidents vary in detail depending on each country’s legal system and agency operations, but the basic framework is generally similar. They typically consist of the following stages: accident reporting and initial response, on-site investigation, interviews with parties involved and witnesses, determination of fault, and report preparation [7,8,9,10,11,12]. Furthermore, accident investigations for MVs are categorized as either general accidents or complex accidents based on their complexity [6]. The latter includes cases involving not only simple human factors but also vehicle defects or significant loss of life. Such accidents require in-depth field investigations, data analysis, and multi-agency cooperation, focusing on comprehensively identifying various factors including the vehicle’s physical condition, road environment, and driver behavior [8,13,14]. In contrast, accident investigation procedures for AVs have not yet been institutionally established. While some reports compiled after accidents have been accumulated [15], they have not been institutionalized as a systematic procedure. In this context, some studies have proposed expanded procedures reflecting AV characteristics [1,16]. Hoque and Hasan [16] presented a digital forensic framework applicable to accident investigations, reflecting the characteristics of AVs that rely on various sensors for driving. This framework enables the verification of the confidentiality and integrity of logs collected from sensors installed in AVs. Kim et al. [1] proposed specific investigation items necessary for AV accident investigations, emphasizing that processes for vehicle functionality verification, physical investigation, digital forensics, and sensor data analysis must be included in addition to existing MV investigation procedures.

Synthesizing prior research reveals that AV accident investigation procedures are significantly more complex than those for MVs. They require the utilization of multi-source data to address new investigation items such as sensor errors and cybersecurity factors. Therefore, it is necessary to examine data usable in accident investigation procedures specialized for AVs and to establish a data management system that captures the characteristics of these data.

2.2. Data Usable for Accident Investigation

As noted in the preceding review of accident investigation procedures, investigating accidents involving AVs requires consideration of more comprehensive factors than conventional accident investigations. This calls for identifying and utilizing additional data sources. Therefore, this section examines the supplementary data required when AVs are involved in traffic accidents. Data commonly used in investigations involving both AVs and MVs include driving footage, video, and images recorded by in-vehicle devices, and accident reports [17,18,19,20]. Investigating an accident involving an AV requires additional data on relevant factors such as the vehicle control system, V2X communications, vehicle sensors, and road infrastructure, in addition to the accident reports [21,22,23,24,25,26]. Dashcam footage is a primary data source, providing a comprehensive visual context of the driving environment at the time of the accident [17]. The accident report is prepared by police dispatched to the scene and serves as a core document directly linked to accident cause analysis [19]. Recording devices within AVs are categorized as the Event Data Recorder (EDR) and the Data Storage System for Automated Driving (DSSAD). The former provides the vehicle’s mechanical driving records (speed, brake operation, engine RPM, and other information used in accident investigation) immediately before and after the accident, which are used for accident reconstruction [27]. The latter is a device installed in AVs that can continuously store driving records according to whether the ADS is active [28]. The device is considered essential for determining the cause of AV accidents because it contains not only the mechanical records but also raw data collected from sensors such as LiDAR, radar, and cameras, along with logs related to the AV’s perception, decision-making, and control [29]. 
In addition, V2X information compensates for the limitations of sensor perception range [30]. However, this communication channel can become a potential pathway for cyberattacks. Log data must be secured to assess efforts at cyberattack detection, path tracing, and mitigation [31]. Finally, road infrastructure information plays a pivotal role in accident investigation by providing details about the AV’s planning and current surrounding environment, as it provides the operational design domain (ODD) information necessary for driving [32]. Table 1 summarizes the types and characteristics of key data that can be utilized in investigations. Each data type has distinct storage trigger mechanisms and varying temporal resolutions. Furthermore, even within the same data category, data exist in diverse formats such as images and point clouds.

2.3. Metadata Schema and Data Catalog

A data catalog is defined as a core tool that systematically manages vast data assets and supports users in efficiently exploring and utilizing data [5,49,50]. The catalogs perform functions such as providing information about raw data, indexing, and visualization based on standardized metadata. Implementing these functions begins with building metadata based on a standardized metadata schema [51]. A metadata schema refers to a set of metadata elements defined for a specific purpose [52]. Utilizing the same schema facilitates compatibility between different data catalogs [53]. A prominent example among various schemas is Data Catalog Vocabulary (DCAT) [54]. DCAT categorizes information provided through metadata into Classes (e.g., dataset and service) and Properties (attributes within each Class). DCAT is utilized as an Application Profile (AP) by public data portals in various countries and for sharing data collected in diverse fields [53,55,56,57]. These either use only the necessary Classes based on their domain or introduce and utilize additional Classes. Table 2 compares the Classes defined in each AP, including DCAT. Catalog, Dataset, and Distribution Class are common to all schemas, signifying they are essential components for building data catalogs. Conversely, Classes such as DatasetSeries, Resource, DataService, CatalogRecord, Agent, Location, and LicenseDocument are only utilized in specific schemas, reflecting the particularities of each domain [54,55,56,57]. Notably, DCAT-Trans introduced the Taxonomy Class, emphasizing the importance of managing subject classification systems for transportation data [53]. APs combine commonly required Classes with domain-specific Classes to meet the needs of each field.
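
The relationship between the three Classes common to all schemas can be sketched in code. The following is a minimal illustration only, assuming a JSON-LD-style serialization; the Python class names, the sample values, and the example.org URL are hypothetical, while the `dcat:` and `dct:` terms are taken from the DCAT vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Distribution:
    title: str
    access_url: str   # maps to dcat:accessURL
    media_type: str   # maps to dcat:mediaType
    byte_size: int    # maps to dcat:byteSize


@dataclass
class Dataset:
    title: str
    description: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Catalog:
    title: str
    datasets: List[Dataset] = field(default_factory=list)


def catalog_to_jsonld(catalog: Catalog) -> Dict:
    """Serialize a Catalog -> Dataset -> Distribution tree as a JSON-LD-style dict."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Catalog",
        "dct:title": catalog.title,
        "dcat:dataset": [
            {
                "@type": "dcat:Dataset",
                "dct:title": ds.title,
                "dct:description": ds.description,
                "dcat:distribution": [
                    {
                        "@type": "dcat:Distribution",
                        "dct:title": d.title,
                        "dcat:accessURL": d.access_url,
                        "dcat:mediaType": d.media_type,
                        "dcat:byteSize": d.byte_size,
                    }
                    for d in ds.distributions
                ],
            }
            for ds in catalog.datasets
        ],
    }


# Hypothetical sample record; none of these values come from the study.
example = Catalog(
    title="AV accident investigation catalog (hypothetical)",
    datasets=[
        Dataset(
            title="Front camera clip",
            description="Video recorded around a hypothetical event",
            distributions=[
                Distribution(
                    title="MP4 file",
                    access_url="https://example.org/data/clip-001.mp4",
                    media_type="video/mp4",
                    byte_size=52428800,
                )
            ],
        )
    ],
)
doc = catalog_to_jsonld(example)
```

An AP built on this core would add further Classes and Properties alongside the three shown here rather than replace them.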

Furthermore, at the Property level, the defined attributes are extensive and detailed, necessitating organization through semantic categorization. Table 3 presents the results of classifying Properties into 12 types: Data Modification and Creation, Index and Classification, Description, Resolution, Metadata, Distribution, Spatiotemporal, Identification, Linkage and Relationship, Access and Rights, Provider and Manager, and Assistant. These types are sufficiently applicable to general dataset management or public data portal operation [53,54,55,56,57]. Investigating accidents involving AVs, however, involves diverse entities collecting and utilizing various data sources, including internal and external vehicle sensors, infrastructure, V2X communications, and high-precision maps. This calls for not only using conventional properties but also developing additional properties for the investigation. These additional properties include separate properties for utilizing non-standardized data of the same type, legal spatial information for accident cause investigations, and attributes specifically utilized in the investigation.

Metadata schema studies can be broadly categorized into four types based on the scope of their contributions: (1) proposing a schema that includes new properties, (2) presenting structural diagrams, (3) conducting case studies, and (4) validating the proposed schema. As summarized in Table 4, the foundational step of these studies is to define a metadata schema reflecting the requirements of a specific domain. Studies at this stage define schemas by collecting fragmented information through literature reviews and assuming the functions of future systems, even in the absence of actual data. In this process, conceptual diagrams frequently accompany the schema to explain its logical structure [53,58]. Beyond theoretical proposals, research progresses to the case study stage to verify the applicability of the schema in environments where at least partial datasets can be secured. Studies at this stage typically serve as a Proof of Concept (PoC) by implementing pilot systems or visualization user interfaces based on the acquired partial data [59,60,61,62,63]. For instance, Labropoulou et al. [59] and Abaza et al. [62] implemented pilot systems because they could access partial data held in data repositories and registries related to their domains. Additionally, Kim et al. [63] confirmed the utility of the schema by implementing a visualization user interface, as they were able to access partial data from ongoing projects in the domain. Finally, empirical validation is performed when both operational systems and data are fully available. As seen in the study by Bermudez-Edo et al. [64], comprehensive validation is possible only when actual IoT systems and sensor data are fully established. However, in domains where acquiring actual data is extremely restricted, such as AV accident data, this level of validation is practically difficult to achieve. 
Therefore, despite these limitations, this study advances to the case study stage by implementing a visualization user interface applying the proposed schema. Through this approach, we aim to demonstrate that the schema can be effectively utilized in the actual accident investigation process, even within a limited data environment.

2.4. Contributions

First, previous studies have focused on reviewing accident investigation procedures individually or analyzing data usable for accident investigations in a fragmented manner. This approach has the limitation of not sufficiently reflecting the interconnectivity between procedures and data. Therefore, this study comprehensively reviews accident investigation procedures and usable data, systematically mapping the required data for each stage of the investigation process. Second, metadata schemas designed for building existing data catalogs primarily focused on public data or general dataset management, limiting their scope to defining property types at a level supporting data search and distribution. However, AV data possesses unique characteristics, such as high resolution and multi-sensor fusion, while simultaneously needing to reflect the requirements of accident investigation procedures. Therefore, this study designed a specialized metadata schema for autonomous driving that incorporates these specific characteristics and requirements. Finally, the data catalog UI was proposed not merely as a metadata schema design, but as a support tool that can be directly utilized during accident investigations, featuring indexing, search, and visualization capabilities.

3. Methodology

Figure 1 illustrates the overall flow of the research. The accident occurrence process of AVs is divided into Pre-, In-, and Post-crash stages, with information collected both inside and outside the vehicle being mixed at each stage. However, existing accident investigation procedures suffer from limitations: the division of investigation items among investigative entities is unclear, and the data required for each investigation item is not systematically defined. To address these limitations, three methodological approaches were implemented. First, the relationships between investigative entities, investigation items, and data were mapped according to the procedures for each accident phase, establishing structural linkages. Second, based on the mapping results, additional properties specific to AV accident investigations were derived. Third, reflecting these, an extended metadata schema was designed. Finally, to verify the schema’s practical applicability, a data catalog UI example was conceptualized.

3.1. Mapping of Entities-Items-Data on Investigation Process

Accident investigations involving AVs have a broader scope and require more complex data types compared to traditional accident investigations. Accidents vary in their investigating entities and procedures depending on the stage of occurrence, and the information required at each stage also differs. Accordingly, we mapped the investigation procedures, investigating entities, and investigation items based on the accident occurrence stage to establish a structured relationship. Furthermore, by aligning the derived investigation items with the data that can actually be collected and utilized, we concretized the data utilization structure necessary for autonomous driving accident investigations.

Figure 2 shows the structure mapping the investigation entities, procedures, and investigation items (classification and subclassification) for traffic accidents involving AVs. The classification of investigation items references the research by Kim et al. [2]. The figure is broadly divided into the Post, Pre-In, Cause Determination, and Report Writing stages, illustrating the relationship between the investigation entities and the procedures performed at each stage, and the investigation items collected and analyzed through them. In the Post stage, local police dispatched upon receiving the accident report handle the initial response and on-site investigation. The on-site investigation records and analyzes the accident scene, focusing on external factors such as the accident overview, parties involved, objects, traffic conditions, and environment. Specifically, this includes vehicle movement trajectories, road facility locations, and road obstacles. In the Pre-In stage, the AV accident investigation team leads an in-depth investigation to determine the cause of the accident. This team consists of three divisions: the vehicle investigation team, the digital forensics team, and the virtual environment investigation team.

Vehicle Investigation
The vehicle investigation team conducts physical defect investigations and system defect investigations. Physical defect investigations examine the vehicle’s basic information (operating mode, key functions, vehicle status, etc.), hardware (H/W), and whether the chassis system is functioning properly. System defect investigations check for errors in the Human–Machine Interface (HMI), software, and functional modules, analyzing software versions and system logs as well.
Digital Forensics
The digital forensics team focuses on investigating communication failures and security vulnerabilities. They inspect the V2X communication status of AVs and the safety of communication infrastructure, assess the potential for cybersecurity breaches, and verify whether accidents occurred due to external factors.
Virtual Environment Investigation
The virtual environment investigation team reviews environmental factors such as high-precision maps, road design, and traffic operation status to determine whether AVs accurately perceived the actual road environment through their sensors.

Finally, the findings from each stage are consolidated by the autonomous driving accident analysis team. Based on this, the final cause of the accident is determined and the final report is prepared.
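
The stage-entity-item mapping described above can be represented as a simple lookup structure. The following is a minimal sketch, assuming a flat list of links; it covers only an illustrative subset of the items named in the text, and the names `Link`, `LINKS`, and `items_by_entity` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Link(NamedTuple):
    stage: str   # investigation stage (e.g., Post, Pre-In)
    entity: str  # investigating entity
    item: str    # investigation item


# Illustrative subset of the stage-entity-item links described in the text.
LINKS: List[Link] = [
    Link("Post", "Local police", "Accident overview"),
    Link("Post", "Local police", "Traffic conditions"),
    Link("Pre-In", "Vehicle investigation team", "Physical defects"),
    Link("Pre-In", "Vehicle investigation team", "System defects"),
    Link("Pre-In", "Digital forensics team", "Communication failures"),
    Link("Pre-In", "Digital forensics team", "Security vulnerabilities"),
    Link("Pre-In", "Virtual environment investigation team", "High-precision maps"),
    Link("Pre-In", "Virtual environment investigation team", "Road design"),
]


def items_by_entity(stage: str) -> Dict[str, List[str]]:
    """Group the investigation items of one stage by the responsible entity."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for link in LINKS:
        if link.stage == stage:
            grouped[link.entity].append(link.item)
    return dict(grouped)


pre_in = items_by_entity("Pre-In")
```

Keeping the links as flat records makes it straightforward to later attach a fourth column for the data used by each item.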

Table 5 distinguishes between data currently collectable in AV accident investigations and data that are necessary for investigation but not collected. For example, DSSAD data enable investigation of the perception-decision-control flow of AVs. V2X data (BSM, CAM, and other messages) can verify external communications and surrounding situational information, while HD map and traffic signal data are used to analyze virtual environment and infrastructure elements. Conversely, data gaps exist for HMI, software and hardware versions, physical and cybersecurity events, and other elements for which no usable data are currently available. These findings should also be reflected in future metadata schema design. Data must also be linked according to the investigation phase. AV accident investigations can be subdivided into pre-, in-, and post-crash phases, as well as cause and fault determination stages. This necessitates clear definitions of how various data are utilized according to specific procedures and purposes. Furthermore, differences in the roles and scope of responsibility between the investigating entity and the data holder complicate data sharing for accident investigations. Although data are often collected and utilized simultaneously within a single incident, interconnections between data types (cross-modality references, temporal baselines, and sources) are not systematically documented. In addition, cybersecurity and communication-related data are essential for investigating accidents involving AVs and must be incorporated. 
Discrepancies in temporal and spatial resolution create the following limitations for data synchronization and fusion analysis: (1) inconsistent provision of linkage structures and spatial context information between heterogeneous data types such as video, messages, and HD maps; (2) reduced interoperability due to the differing standards and formats of each dataset; (3) limited reconstruction of precise incident information at the event level due to differing data collection cycles (e.g., continuous recording vs. event-based recording); and (4) unclear documentation of the purpose for utilizing individual datasets within accident investigation reports.
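
The synchronization difficulty in limitation (3) can be made concrete with a small example. The following is a minimal sketch, assuming sorted integer timestamps in milliseconds and nearest-neighbor matching within a tolerance; the function names, the 10 Hz rate, and the 50 ms tolerance are illustrative assumptions, not values from the study.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def nearest_sample(timestamps: List[int], t: int, tolerance: int) -> Optional[int]:
    """Return the index of the sample closest to t, or None if it is farther than tolerance.

    `timestamps` must be sorted in ascending order.
    """
    if not timestamps:
        return None
    i = bisect_left(timestamps, t)
    # Only the neighbors on either side of the insertion point can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(timestamps)]
    best = min(candidates, key=lambda j: abs(timestamps[j] - t))
    return best if abs(timestamps[best] - t) <= tolerance else None


def align(continuous: List[int], events: List[int], tolerance: int) -> List[Tuple[int, Optional[int]]]:
    """Pair each event timestamp with the index of its nearest continuous sample."""
    return [(t, nearest_sample(continuous, t, tolerance)) for t in events]


# Hypothetical data: a sensor recorded continuously at 10 Hz and three event-based messages.
sensor_ms = list(range(0, 1000, 100))   # 0, 100, ..., 900
message_ms = [40, 260, 1500]
pairs = align(sensor_ms, message_ms, tolerance=50)
```

Events that fall outside the continuous recording, such as the last message here, remain unmatched, which is the kind of gap that metadata on collection cycles and time ranges would help investigators anticipate.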

3.2. Definition of Classes and Properties Under Essential DCAT and AP

The AP design for AV accident investigations must first review the Classes and Properties defined in the previously introduced DCAT and APs. DCAT centers on three core Classes: Catalog, Dataset, and Distribution [53,54,55,56,57]. These form the minimum units for building a data catalog. The metadata schema for AV accident investigations adds seven Classes (Relationship, Location, DataService, PeriodOfTime, Agent, License, and Checksum) to the three core Classes. Relationship describes the linkage structure between datasets. Location includes spatial attribute information such as the spatial form (point, line, and area) of data used in the investigations and the geographic location of that data (from the top-level to the bottom-level administrative district). PeriodOfTime provides comprehensive information about the time span of the data, from the point of acquisition to the end of its listing in the data catalog. DataService supports users in accessing data in real time and receiving it in the desired format through indexing processes. To this end, it includes standardized endpoints (URLs) for data access and descriptions of these endpoints (API usage methods). Agent and License clearly define the entity responsible for the data and the usage conditions (rights, regulations, and so forth). Finally, Checksum supports the integrity and verification of data recorded in the catalog. Table 6 presents the results of matching the 12 Property types proposed in this study to the 10 Classes, along with descriptions of the AV-specific Properties introduced in this study. Property types can be reused across Classes, but each Class may contain distinct core Properties. For instance, the Dataset Class includes the Description and Data Modification and Creation types, with detailed properties such as ‘title’, ‘description’, and ‘modified’. 
Distribution includes types such as Access and Rights and Distribution, with corresponding detailed properties including ‘byteSize’, ‘formatDetail’, and ‘accessURL’. Finally, Catalog requires the Assistance and Index and Classification types as mandatory, with detailed properties including ‘catalog’, ‘service’, and ‘themeTaxonomy’. Ultimately, the 54 sub-Properties within the 12 borrowed Property types either reuse existing DCAT and AP definitions or extend their meanings to reflect the specific characteristics of AV data. For example, ‘spatialResolution’ was expanded beyond the quantitative notation (e.g., ‘1m’) required by existing DCAT. It now allows for the granular expression of spatial unit characteristics, such as point-level resolution for individual objects, line-level resolution for linear objects, and area-level resolution for area objects. Additionally, the functionality of ‘isRequiredBy’ has been enhanced to explicitly specify related datasets that must be secured beforehand to utilize a specific dataset. Even so, the existing Properties alone cannot fully describe AV accident investigations. For example, issues relating to cybersecurity and communication errors, such as hacking attempts or authentication failures detected during data collection, are not recorded. Multimodal data linkage also fails to account for issues such as time synchronization between sensors, coordinate system mismatches, and data encoding formats. Furthermore, AV accident investigations require distinct investigation items and procedures for each stage (Pre-, In-, and Post-crash), and the entities responsible for conducting them differ. However, the existing Properties fail to reflect these differences and the roles specific to each stage and entity. 
This calls for introducing new properties in four key areas: accident investigation process linkage; data heterogeneity management; multimodal data linkage and spatial information; and special causes specific to autonomous driving, such as security and communication issues. First, the accident investigation process linkage properties were introduced to reflect the differences in investigation procedures and entities across the Post, Pre-In, and Cause Identification stages. These properties include ‘investigationStep’, which indicates the stage of the investigation at which the data is utilized, enabling tracking of utilized data by investigation procedure. Unlike existing attributes that solely track data generation time, this property explicitly aligns data with specific investigation phases. ‘prePostEventWindow’ specifies the range of records before and after the event reference point, enabling the time flow of analyses that use the data to be matched. ‘caseNum’ provides the unique number that identifies an individual incident, enabling data collected from various sources to be combined into a single incident. While the DCAT ‘identifier’ manages individual datasets, ‘caseNum’ functions as an overarching identifier that aggregates fragmented evidence into a single unified incident. In addition, ‘investigationAgency’ is a sub-Property that provides information about the agency or department that conducted the investigation, clarifying the subject and scope of the investigation and providing a transparent record of evidence utilization. Finally, the group also includes ‘reportingPurpose’, which provides the purpose for which the data or document was generated, and ‘analysisSupportLevel’, which contains information about the scope and constraints of analytics utilization. 
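
The way ‘caseNum’ and ‘investigationStep’ work together can be sketched as follows. This is a minimal illustration, assuming metadata records held as plain dictionaries keyed by the property names above; the record values and the helper names `group_by_case` and `identifiers_for_step` are hypothetical, and the value formats of these properties are assumptions.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical metadata records; only the property names come from the schema.
RECORDS: List[dict] = [
    {"identifier": "ds-001", "caseNum": "CASE-A", "investigationStep": "Post",
     "investigationAgency": "Local police"},
    {"identifier": "ds-002", "caseNum": "CASE-A", "investigationStep": "Pre-In",
     "investigationAgency": "Digital forensics team"},
    {"identifier": "ds-003", "caseNum": "CASE-A", "investigationStep": "Pre-In",
     "investigationAgency": "Vehicle investigation team"},
    {"identifier": "ds-004", "caseNum": "CASE-B", "investigationStep": "Post",
     "investigationAgency": "Local police"},
]


def group_by_case(records: List[dict]) -> Dict[str, List[dict]]:
    """Aggregate dataset records from different sources under their incident number."""
    cases: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        cases[record["caseNum"]].append(record)
    return dict(cases)


def identifiers_for_step(records: List[dict], case_num: str, step: str) -> List[str]:
    """List the dataset identifiers used at one investigation stage of one incident."""
    return [
        r["identifier"]
        for r in records
        if r["caseNum"] == case_num and r["investigationStep"] == step
    ]


cases = group_by_case(RECORDS)
pre_in_ids = identifiers_for_step(RECORDS, "CASE-A", "Pre-In")
```

Here ‘identifier’ still distinguishes individual datasets, while ‘caseNum’ is the key that gathers them into one incident.
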
Next, the properties used for data heterogeneity management are designed to manage the differences in format, resolution, and storage conditions of the various sensor data in AVs. ‘sensorType’ and ‘dataModality’ separate the collection equipment from the data representation format, enabling the management of different types of data collected from the same sensor. This addresses the limitation of existing properties, which describe only file extensions and fail to distinguish between the sensor modalities critical for AV accident investigation. ‘samplingRate’ and ‘videoResolution’ provide the resolution of sensor-specific data so that resolution differences can be identified when synchronizing and fusing with other data. ‘clipLength’ provides the length of video data before and after the accident and can be used with ‘prePostEventWindow’ to secure a baseline for accident reconstruction and analysis. ‘triggerMechanism’ distinguishes continuous recording from event-based storage conditions, allowing investigators to track data collection intervals and time ranges. ‘formatDetail’ describes the specific format of the data, including how it is encoded, to facilitate interoperability and utilization across data. ‘dataGranularityLevel’ categorizes the level of resolution so that practitioners can determine how usable the data is. Finally, ‘beforebyteSize’ provides the file size before compression and can be used in conjunction with ‘byteSize’, ‘algorithm’, and ‘checksumValue’ to ensure the integrity of large log and video data and the reliability of the distribution process. In addition, the multimodal data linkage and spatial information properties enable the fusion and interpretation of data utilized in accident investigations. 
These include ‘multiModalLinkage’, which records the linkage between sensors, such as camera-lidar matching (unlike existing properties that merely indicate general associations, this property provides specific information required for sensor fusion); ‘videodatabbox’, which describes the spatial boundaries of video data; ‘geocoordinated’, which provides reference coordinate system information; and ‘geoContextType’, which describes the spatial context, such as road network-based or administrative district-based. Finally, properties for special causes specific to autonomous driving were added to support the investigation of non-physical causes such as cyberattacks or communication errors. Two new classes have been added: ‘cyberSecurityEvent’, which records security events detected during data collection, filling the gap in the standard DCAT, which lacks properties for recording non-physical anomalies such as unauthorized access attempts; and ‘dataSourceEntity’, which specifies the entity or equipment that collected the data. These extensions bring the total number of classes to 10, including the three core classes, and utilize 76 properties. Of these, 54 are reuses and semantic extensions of existing DCAT properties, and 22 properties are newly introduced in four categories that are directly relevant to AV accident investigations: accident investigation process linkage, data heterogeneity management, multimodal data linkage and spatial information, and autonomous driving special cause considerations (security and communication).
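As an illustration of how ‘caseNum’ can aggregate evidence that the DCAT ‘identifier’ tracks only dataset by dataset, the following minimal Python sketch groups dataset records by incident. The field names mirror the schema properties, but the value formats (case number strings, step labels, units in hertz and seconds) and the sample records are assumptions made for illustration and are not specified by the schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    identifier: str                 # DCAT 'identifier': one individual dataset
    case_num: str                   # 'caseNum': one incident (string format assumed)
    investigation_step: str         # 'investigationStep' (labels assumed)
    sensor_type: str                # 'sensorType': collection equipment
    data_modality: str              # 'dataModality': representation of the data
    sampling_rate_hz: float         # 'samplingRate' (unit assumed to be Hz)
    pre_post_event_window_s: tuple  # 'prePostEventWindow' as (before, after) seconds


def group_by_case(records):
    """Collect the dataset identifiers that belong to each incident."""
    cases = defaultdict(list)
    for record in records:
        cases[record.case_num].append(record.identifier)
    return dict(cases)


# Hypothetical records: two sensors from one incident, one from another.
records = [
    DatasetRecord("ds-cam-01", "CASE-001", "Pre-In", "camera", "video", 30.0, (10.0, 5.0)),
    DatasetRecord("ds-lidar-01", "CASE-001", "Pre-In", "lidar", "point cloud", 10.0, (10.0, 5.0)),
    DatasetRecord("ds-cam-02", "CASE-002", "Pre-In", "camera", "video", 30.0, (10.0, 5.0)),
]

evidence = group_by_case(records)
print(evidence["CASE-001"])
```

Keeping ‘sensorType’ and ‘dataModality’ as separate fields lets the same grouping be refined later, for example to list only video evidence for a given case.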

3.3. Metadata Schema for AV Accident Investigation

Figure 3 visualizes the core structure of the metadata schema for AV accident investigation in Unified Modeling Language (UML) style. UML is a language that visualizes the structure of complex systems, such as software or data, using classes, properties, and relationship symbols, thereby aiding system design and documentation [68]. This diagram allows for an intuitive understanding of which properties belong to each class and how the connection structure between classes is formed.

The gray boxes represent Classes. The Dataset Class concentrates key Properties reflecting autonomous driving contexts, such as ‘investigationStep’, ‘prePostEventWindow’, ‘eventTimeMarker’, ‘sensorType’, ‘dataModality’, ‘samplingRate’, ‘clipLength’, ‘analysisSupportLevel’, and ‘dataGranularityLevel’. The Relationship Class enables cross-referencing between related datasets and includes ‘multiModalLinkage’, which supports linking heterogeneous data. The Distribution Class includes ‘formatDetail’, which describes the specific format and standards adhered to by the provided data, and is linked with the Checksum Class to verify the integrity of received data. The Location Class introduces ‘geoContextType’ and ‘videodatabox’ to specifically describe the spatial context of video data, while the Agent Class adds ‘investigationAgency’ to clarify the investigating entity. Notably, UML diagrams can represent data flow and dependencies through association lines between Classes. The Dataset Class connects to Classes such as Distribution, Location, and PeriodOfTime, referencing the spatio-temporal scope and provision method of the data. Furthermore, the Dataset Class links to other datasets via the Relationship Class and connects to the Agent Class to clearly define the investigating entity and usage context. Going beyond simple attribute definitions, this relational structure shows how data is linked within the system for AV accident investigations and which entities utilize it in which procedures.
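The link between the Distribution and Checksum Classes can be sketched as a small integrity check. This is a minimal Python sketch under stated assumptions: the class layout is a simplification of Figure 3, ‘algorithm’ is assumed to hold a name accepted by Python's `hashlib` (such as "sha256"), and ‘checksumValue’ is assumed to be a hexadecimal digest. None of these encodings is prescribed by the schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Checksum:
    algorithm: str        # 'algorithm' (assumed to be a hashlib name, e.g. "sha256")
    checksum_value: str   # 'checksumValue' (assumed to be a hex digest)


@dataclass(frozen=True)
class Distribution:
    format_detail: str    # 'formatDetail': specific format and encoding
    byte_size: int        # 'byteSize': size of the distributed file in bytes
    checksum: Checksum


def verify_integrity(payload: bytes, distribution: Distribution) -> bool:
    """Return True only if both the size and the digest match the metadata."""
    if len(payload) != distribution.byte_size:
        return False
    digest = hashlib.new(distribution.checksum.algorithm, payload).hexdigest()
    return digest == distribution.checksum.checksum_value.lower()


# Hypothetical payload standing in for a received log file.
payload = b"timestamp,speed\n0.0,13.9\n0.5,13.7\n"
distribution = Distribution(
    format_detail="CSV, UTF-8",
    byte_size=len(payload),
    checksum=Checksum("sha256", hashlib.sha256(payload).hexdigest()),
)

print(verify_integrity(payload, distribution))         # untouched file
print(verify_integrity(payload + b"x", distribution))  # altered file
```

Checking the size first is a cheap early exit for large video or log files; the digest comparison is what actually establishes integrity.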

3.4. The Visualization UI Based on the Designed Metadata Schema

Figure 4 illustrates how datasets are organized into a multi-level taxonomy, and how this taxonomy is mapped to catalogs and datasets to consistently drive filtering and visualization. The first taxonomy (vehicle, communication, road, etc.) and the second taxonomy (coordinates, dynamics, object interaction, system state, messages, etc.) are the higher-level thematic axes of the dataset, which can be hierarchically registered in a ‘themeTaxonomy’. This two-level categorization is more than a thematic organization: it is directly related to the topics of AV accident investigations, providing a basis for data utilization by investigation procedure. The third level of detail is assigned as a ‘keyword’ for each dataset to enable multiple filtering capabilities. This tiered taxonomy is mapped to DCAT’s Catalog and Dataset structure, allowing users to filter from the Catalog UI by progressively narrowing the data required from the first to the third taxonomy. In particular, the catalog is divided into Pre-Crash, In-Crash, and Post-Crash, and the required investigation items and datasets correspond to each stage. Thus, users can go beyond simply selecting datasets by topic while browsing the catalog and selectively search for data related to procedures before, during, and after an accident. The taxonomy also supports linkage discovery between related datasets. A dataset can be included in multiple catalogs at the same time, and conversely, a catalog can encompass multiple datasets. These many-to-many referential relationships can be described via ‘qualifiedRelation’. The duplication and versioning of datasets is controlled by utilizing ‘identifier’ and ‘version’. As a result, the proposed linkage is more than a data management tool: it provides a basis for data utilization that is directly related to the stages of an autonomous driving accident investigation and the investigation items.
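The progressive filtering described above can be sketched as follows. This is a minimal Python sketch, not the catalog's implementation: the `CatalogEntry` type, the function name, and the sample entries and taxonomy labels are hypothetical, and ‘themeTaxonomy’ is simplified to a two-element tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    title: str
    theme_taxonomy: tuple   # (first level, second level) from 'themeTaxonomy'
    keywords: frozenset     # third level, stored as 'keyword'
    catalogs: frozenset     # accident-phase catalogs that include the dataset


def filter_entries(entries, phase=None, level1=None, level2=None, keyword=None):
    """Narrow the candidates step by step; a criterion left as None is skipped."""
    selected = []
    for entry in entries:
        if phase is not None and phase not in entry.catalogs:
            continue
        if level1 is not None and entry.theme_taxonomy[0] != level1:
            continue
        if level2 is not None and entry.theme_taxonomy[1] != level2:
            continue
        if keyword is not None and keyword not in entry.keywords:
            continue
        selected.append(entry.title)
    return selected


# Hypothetical entries; one dataset may belong to several catalogs (many-to-many).
entries = [
    CatalogEntry("Vehicle IMU Event", ("Vehicle", "Coordinates"),
                 frozenset({"IMU-based"}), frozenset({"In-Crash"})),
    CatalogEntry("CAM messages", ("Communication", "Messages"),
                 frozenset({"CAM"}), frozenset({"Pre-Crash", "In-Crash"})),
    CatalogEntry("Road geometry", ("Road", "Coordinates"),
                 frozenset({"geometry"}), frozenset({"Pre-Crash", "In-Crash", "Post-Crash"})),
]

print(filter_entries(entries, phase="In-Crash"))
print(filter_entries(entries, phase="In-Crash", level2="Coordinates", keyword="IMU-based"))
```

Storing catalog membership as a set on each entry is one simple way to represent the many-to-many relationship between catalogs and datasets.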
The visualization is automatically determined by the results of the classification selection and the resolution metadata of the data. By referring to ‘samplingRate’, ‘temporalResolution’, and ‘prePostEventWindow’ for the temporal axis and ‘spatialResolution’, ‘geometry’, ‘bbox’, and ‘geoContextType’ for the spatial axis, investigators can configure timelines based on frames, minutes, hours, and event windows, as well as map representations at the point, line, plane, and network levels. In addition, differences in data format can be declared as ‘format’ and ‘formatDetail’, and the level of analysis available can be declared as ‘analysisSupportLevel’, to enable appropriate visualization features (trajectory lines, traffic light status, object detection time, TTC, etc.) depending on the selected combination of classification and resolution. Note that the relationship between data and visualization features is not a 1:1 mapping. A dataset can feed multiple visualizations, and a visualization can fuse multiple datasets using ‘multiModalLinkage’ to support fusion interpretation of heterogeneous data. For example, one of the visualizations, the Accident location marker, displays accident locations as dots on a map. The data utilized here (Vehicle IMU Event) has a classification path of Vehicle → Physical → Coordinates → IMU-based, and the dataset is included in the In-crash catalog. Candidate datasets are first filtered by the classification path, then ‘eventTimeMarker’ is used to calculate the time interval to be displayed, and the accident location is displayed on the map by referring to the coordinate system described in ‘geocoordinate’. In addition to the vehicle IMU-based location data, the accident point coordinates can be calibrated by referring to the location field of the CAM data or the road geometry to improve temporal and spatial consistency and reliability. These fusion relationships are declared using ‘multiModalLinkage’.
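The step in which ‘eventTimeMarker’ determines the time interval to be displayed can be sketched in a few lines of Python. The representation of ‘prePostEventWindow’ as a pair of second counts, the function names, and the sample timestamps and coordinates are assumptions for illustration only.

```python
from datetime import datetime, timedelta


def display_interval(event_time_marker, pre_seconds, post_seconds):
    """Interval to show, derived from 'eventTimeMarker' and 'prePostEventWindow'."""
    start = event_time_marker - timedelta(seconds=pre_seconds)
    end = event_time_marker + timedelta(seconds=post_seconds)
    return start, end


def samples_in_interval(samples, interval):
    """Keep (timestamp, value) samples whose timestamp lies inside the interval."""
    start, end = interval
    return [sample for sample in samples if start <= sample[0] <= end]


# Hypothetical event time and IMU-based positions recorded every 5 s.
event = datetime(2025, 1, 1, 12, 0, 0)
interval = display_interval(event, pre_seconds=5, post_seconds=2)
samples = [
    (event + timedelta(seconds=offset), (36.63, 127.46))
    for offset in range(-10, 11, 5)
]

visible = samples_in_interval(samples, interval)
print(len(visible))
```

Only the samples inside the interval would be passed on to the map layer, where the coordinate system declared in the metadata decides how the points are projected.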

Figure 5 depicts an implementation of the autonomous driving-specific metadata schema as a “data discovery UI”. It shows that the autonomous driving-specific properties designed in the schema are not just descriptive, but also enable users to index data, obtain information about the data, and retrieve purposeful and relevant data. The figure is divided into four areas, and each element is mapped one-to-one to a property in the schema. First, the filtering area on the left side of the screen shows the schema’s multi-level taxonomy. The parent taxonomy corresponds to ‘themeTaxonomy’ and the child taxonomy corresponds to ‘keyword’ to support data exploration. The widget for selecting the time range of the data defines the searchable interval by referring to the metadata described in ‘temporal’. In addition, the selection of the data collection device utilizes ‘dataModality’ and ‘sensorType’, and the section for selecting the format of the data to be provided is mapped to ‘format’ and ‘formatDetail’. Options such as the number of preview rows of filtered data are then tied to ‘analysisSupportLevel’. This allows users to perform complex filtering in the following order: “Select a classification -> Set the required data time range -> Select the format and standard of the data to be provided”. Next, the Metadata and Header viewer on the top right of the screen summarizes key information at the dataset level, and the properties used are ‘title’, ‘creator’, ‘wasGeneratedBy’, ‘contactPoint’, ‘investigationAgency’, and ‘license’. The header information is implemented by providing the metadata defined in ‘description’ in the form of a list. The associated navigation area in the center right of the screen provides information about related data. It is automatically generated based on the information in the ‘qualifiedRelation’ or ‘multiModalLinkage’ property of the dataset selected by the user through the filter.
Finally, the Provide panel at the bottom right of the screen utilizes the ‘accessService’ and ‘downloadURL’ properties to provide API and file download paths, and provides sample previews based on the ‘analysisSupportLevel’. Users can obtain data immediately in the format and standard of their choice, or use the API to connect with other programs to leverage the data.

4. Case Study: Application of the New Methodology for Autonomous Vehicle Accident Investigation

The core objective of this section is to demonstrate that a schema constructed based on the procedural and data attributes of AV accidents can approximate the initiation time of security events by combining only the visualization layers selected by the user, after aligning heterogeneous data across temporal and spatial dimensions. Table 7 shows an accident scenario caused by a cybersecurity issue, partially adapted from accident scenarios involving AVs caused by cyberattacks presented in the study by Girdhar et al. [31]. This was caused by hacking the traffic signal control system, resulting in the transmission of erroneous data to the AV.

Figure 6 shows an example of applying the proposed AV-specific metadata schema to accident scenario analysis. The UI suggests possible visualization candidates from the metadata, but the user has complete control over which layers to turn on or off. The visualization analysis procedure is as follows: (1) Layer Selection: The user manually selects spatial-based layers (accident location map, vehicle trajectory sequence, geometry visualization) and temporal-based layers (vehicle motion state, vehicle control actions, inter-vehicle interaction, autonomous system response, and external driving context). (2) Analysis Time Window Specification: The temporal scope of the analysis is determined from the ‘prePostEventWindow’ around the reference time ‘eventTimeMarker’. (3) Alignment and Overlay: Utilizing the selected visualization layers, the Temporal visualization layer is aligned according to ‘samplingRate’ and ‘temporalResolution’ based on the ‘eventTimeMarker’ information. The Spatial visualization layer is rendered according to ‘geoContextType’ and ‘spatialResolution’. (4) Anomaly Detection: Users identify the first time point $t^{*}$ at which discrepancies occur between infrastructure signal data and vehicle reception signal data, based on the visualized layer information. The UI provides auxiliary features such as interval changes and change point indicators, but the final determination is made by the user. This configuration utilizes properties specifying data formats and standards to automatically inject parameters for data consistency during visualization, ensuring interoperability and reproducibility. It also records user selection history to maintain analysis transparency. As a result, it achieves a division of responsibilities where the system handles data alignment and synchronization while the investigator observes and judges the results. This supports practical procedures for rapidly and precisely narrowing down incidents suspected of security compromise based on data.
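The kind of change point indicator that could assist the user in step (4) can be sketched as follows. This is a minimal Python sketch, not the UI's algorithm: it assumes both series are already aligned, expressed in seconds relative to ‘eventTimeMarker’, and sorted by time, and it ignores communication latency, which a practical tool would need to tolerate. The function names and the sample signal states are hypothetical.

```python
from bisect import bisect_right


def state_at(series, t):
    """Latest state at or before time t in a time-sorted list of (time, state)."""
    times = [timestamp for timestamp, _ in series]
    index = bisect_right(times, t)
    return series[index - 1][1] if index else None


def first_discrepancy(infrastructure, vehicle, start, end):
    """Earliest time in [start, end] at which the two series disagree, else None."""
    grid = sorted({ts for ts, _ in infrastructure + vehicle if start <= ts <= end})
    for t in grid:
        sent = state_at(infrastructure, t)
        received = state_at(vehicle, t)
        if sent is not None and received is not None and sent != received:
            return t
    return None


# Hypothetical signal states: what the infrastructure logged versus what the
# vehicle recorded as received, in seconds relative to the event marker.
infrastructure = [(-3.0, "red"), (-2.0, "red"), (-1.0, "red"), (0.0, "red")]
vehicle = [(-3.0, "red"), (-2.0, "green"), (-1.0, "green"), (0.0, "green")]

t_star = first_discrepancy(infrastructure, vehicle, start=-5.0, end=5.0)
print(t_star)
```

The returned value is only a candidate for $t^{*}$; consistent with the division of responsibilities described above, the investigator would still confirm it against the visualized layers.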

5. Conclusions

Accident investigations for AVs must leverage heterogeneous and massive amounts of data generated from diverse sources, including vehicle sensors, onboard recorders, V2X communications, and road infrastructure. Within this context, this study first systematically mapped the investigation entities, items, and data for each accident phase. Second, it designed an AV-specific metadata schema using DCAT. Third, to demonstrate the usability of the design results, an example of a data catalog UI was presented. Specifically, the data linkage according to the accident investigation procedure is explicitly defined, and attributes are structured to express unique requirements of AVs, such as temporal and spatial resolution, storage triggers, data fusion, and cybersecurity. As a result, we propose a schema comprising 76 properties: 54 reused from existing standards plus 22 AV-specific attributes. The proposed schema enhances data continuity and collaborative accountability during accident investigations by specifying responsibilities according to the investigation procedure. It also strengthens interoperability between heterogeneous data by explicitly defining characteristics (resolution, format, standards, etc.) as metadata. Finally, this study presented a case study (cybersecurity scenario) demonstrating how the data catalog UI applies this schema to index and provide the necessary data while performing multi-data timeline visualization. This case confirmed that by synchronizing DSSAD, V2X, and signal data around security event metadata, the initiation point of a cyberattack and the resulting changes in vehicle control and communication status can be rapidly identified. However, there were several limitations. First, access to security-related data from AVs is currently restricted, preventing full reflection of metadata requirements in this domain. Second, some cases relied on data commonly collected from both AVs and MVs, such as EDRs and accident reports.
Third, data that do not yet exist in practice were described on the basis of the literature. Future research will collect DSSAD, V2X communications, security event logs, and other data generated in actual AV operating environments to enhance the proposed schema. At this stage, data collection from roadways is primarily limited to vehicle status information. However, investigating cyberattacks requires comprehensive communication records, including those from download and upload systems. Therefore, future studies need to implement such scenarios within an autonomous driving simulator environment. In addition, as a next step, a study needs to be conducted to verify how accurately the relevant schema properties map to the data generated in the virtual environment. Furthermore, by developing a data catalog UI prototype based on the enhanced schema and conducting user evaluations, we expect to validate its practical applicability in AV accident investigations.

Author Contributions

Conceptualization, M.K. and T.-J.S.; methodology, M.K. and N.K.; investigation, M.K., N.K. and H.K.; resources, M.K.; data curation, M.K. and N.K.; writing—original draft preparation, M.K.; writing—review and editing, M.K., N.K., H.K. and T.-J.S.; visualization, M.K. and N.K.; supervision, T.-J.S.; project administration, T.-J.S.; funding acquisition, T.-J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a Korea Institute of Policy Technology (KIPoT) grant funded by the Korea government (KNPA, Korean National Police Agency) (No. 092021D74000000, Development of a data extraction and analysis system for DSSAD (Data Storage System for Automated Driving)).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data included in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

1. Chougule, A.; Chamola, V.; Sam, A.; Yu, F.R.; Sikdar, B. A comprehensive review on limitations of autonomous driving and its impact on accidents and collisions. IEEE Open J. Veh. Technol. 2023, 5, 142–161.
2. Kim, H.; Han, H.; You, Y.; Cho, M.J.; Hong, J.; Song, T.J. A Comprehensive Traffic Accident Investigation System for Identifying Causes of the Accident Involving Events with Autonomous Vehicle. J. Adv. Transp. 2024, 2024, 9966310.
3. Terrizzano, I.G.; Schwarz, P.M.; Roth, M.; Colino, J.E. Data Wrangling: The Challenging Journey from the Wild to the Lake. In Proceedings of the CIDR, Asilomar, Pacific Grove, CA, USA, 10 October 2015.
4. Beamer, A. Map metadata: Essential elements for search and storage. Program 2009, 43, 18–35.
5. Stillerman, J.; Fredian, T.; Greenwald, M.; Manduchi, G. Data catalog project—A browsable, searchable, metadata system. Fusion Eng. Des. 2016, 112, 995–998.
6. Yeong, D.; Velasco-Hernandez, G.; Barry, J.; Walsh, J. Sensor and Sensor Fusion Technology in Autonomous Vehicles: A Review. Sensors 2021, 21, 2140.
7. Korea Law Information Center. Enforcement Decree of the Road Traffic Act. 2017. Available online: https://www.law.go.kr/ (accessed on 20 July 2025).
8. National Transportation Safety Board. The Investigative Process. 2025. Available online: https://www.ntsb.gov/ (accessed on 20 July 2025).
9. Employment and Social Development Canada. Investigations of Motor Vehicle Accidents on Public Roads—IPG-066. Effective Date: January 2009. 2009. Available online: https://www.canada.ca/en/employment-social-development/programs/laws-regulations/labour/interpretations-policies/066.html (accessed on 20 July 2025).
10. Essex Police. H 0602 Procedure—Road Traffic Collisions (Investigations). Version 18—October 2024. 2024. Available online: https://www.essex.police.uk/ (accessed on 20 July 2025).
11. Bundesministerium der Justiz und für Verbraucherschutz. Straßenverkehrsgesetz (StVG). 2025. Available online: https://www.gesetze-im-internet.de/stvg/ (accessed on 20 July 2025).
12. Government of Sweden. Accident Investigation Act (1990:712). 1990. Available online: https://www.shk.se/ (accessed on 20 July 2025).
13. Korea Road Traffic Authority (Koroad). Engineering Analysis of Traffic Accidents on Request from Judicial Institutions. 2025. Available online: https://www.koroad.or.kr/eng/content/view/ME02080000.do (accessed on 20 July 2025).
14. College of Policing. Investigation of Fatal and Serious Injury Road Collisions. 2023. Available online: https://www.college.police.uk/ (accessed on 20 July 2025).
15. California Department of Motor Vehicles. Autonomous Vehicle Collision Reports. 2025. Available online: https://www.dmv.ca.gov/ (accessed on 20 July 2025).
16. Hoque, M.A.; Hasan, R. AVGuard: A Forensic Investigation Framework for Autonomous Vehicles. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6.
17. Giovannini, E.; Giorgetti, A.; Pelletti, G.; Giusti, A.; Garagnani, M.; Pascali, J.; Pelotti, S.; Fais, P. Importance of dashboard camera (Dash Cam) analysis in fatal vehicle-pedestrian crash reconstruction. Forensic Sci. Med. Pathol. 2021, 17, 379–387.
18. Niehoff, P.; Gabler, H.C.; Brophy, J.; Chidester, C.; Hinch, J.; Ragland, C. Evaluation of event data recorders in full systems crash tests. In Proceedings of the 19th International Conference on the Enhanced Safety of Vehicles, Washington, DC, USA, 6–9 June 2005.
19. Oh, G.; Ko, W.; Park, J.; Yun, I.; So, J.J. Study on the improvement of traffic accident report for automated vehicle test scenarios. J. Korea Inst. Intell. Transp. Syst. 2022, 21, 167–182.
20. Kwayu, K.M.; Kwigizile, V.; Lee, K.; Oh, J.S. Discovering latent themes in traffic fatal crash narratives using text mining analytics and network topology. Accid. Anal. Prev. 2021, 150, 105899.
21. Pisu, P.; Soliman, A.; Rizzoni, G. Vehicle chassis monitoring system. Control Eng. Pract. 2003, 11, 345–354.
22. Smith, T.; Toth, C.; Timcho, T. Sharing and Using Connected Device Data to Improve Traveler Safety and Traffic Management—Concept of Operations, Use Cases, Traveler Information Needs, Messages, and Requirements; Report FHWA-HRT-23-030; WSP USA Inc.: New York, NY, USA; Cambridge Systematics, Inc.: Medford, MA, USA; Federal Highway Administration, Office of Operations Research, Development, and Technology: McLean, VA, USA, 2023.
23. National Transportation Safety Board. Collision Between Car Operating with Partial Driving Automation and Truck-Tractor Semitrailer, Delray Beach, Florida, March 1, 2019; Highway Accident Brief NTSB/HAB-20/01; National Transportation Safety Board: Washington, DC, USA, 2020.
24. National Transportation Safety Board. Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian, Tempe, Arizona, March 18, 2018; Highway Accident Report NTSB/HAR-19/03; National Transportation Safety Board: Washington, DC, USA, 2019.
25. Feifel, H.; Erdem, B.; Menzel, D.; Gee, R. Reducing Fatalities in Road crashes in Japan, Germany, and USA with V2X-enhanced-ADAS. In Proceedings of the 27th Enhanced Safety of Vehicles (ESV), Conference, Yokohama, Japan, 3–6 April 2023; pp. 3–6.
26. Khattak, M.; De Backer, H.; De Winne, P.; Brijs, T.; Pirdavani, A. Analysis of Road Infrastructure and Traffic Factors Influencing Crash Frequency: Insights from Generalised Poisson Models. Infrastructures 2024, 9, 47.
27. da Silva, M.P. Analysis of Event Data Recorder Data for Vehicle Safety Improvement; John, A., Ed.; Technical Report HS-810 935; DOT-VNTSC-NHTSA-08-01; Volpe National Transportation Systems Center (U.S.): Cambridge, MA, USA, 2008.
28. Kim, I.; Lee, G.; Lee, S.; Choi, W. Data Storage System Requirement for Autonomous Vehicle. In Proceedings of the 2022 22nd International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 27 November–1 December 2022; pp. 45–49.
29. Hyun, S.; Son, J.; Oh, Y.; You, B. A study of the DSSAD data elements derivation through autonomous driving data analysis on expressways. J. Korea Inst. Intell. Transp. Syst. 2024, 23, 97–106.
30. Jung, C.; Lee, D.; Lee, S.; Shim, D. V2X-Communication-Aided Autonomous Driving: System Design and Experimental Validation. Sensors 2020, 20, 2903.
31. Girdhar, M.; You, Y.; Song, T.J.; Ghosh, S.; Hong, J. Post-accident cyberattack event analysis for connected and automated vehicles. IEEE Access 2022, 10, 83176–83194.
32. Pai, V.N.; Barosan, I.; Khabbaz Saberi, A. Map and Its Impact on the Functional Safety of Automated Driving Vehicles. J. Softw. Eng. Auton. Syst. 2023, 1, 17–27.
33. Moura, D.; Zhu, S.; Zvitia, O. Nexar Dashcam Collision Prediction Dataset and Challenge. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025.
34. Che, Z.; Li, G.; Li, T.; Jiang, B.; Shi, X.; Zhang, X.; Lu, Y.; Wu, G.; Liu, Y.; Ye, J. D²-City: A Large-Scale Dashcam Video Dataset of Diverse Traffic Scenarios. arXiv 2019, arXiv:1904.01975.
35. Chen, R.J.; Tatem, W.M.; Gabler, H.C. Event Data Recorders (EDRs) Duration Study: Final Report; Final Report NHTSA Supplemental Report; Submitted to National Highway Traffic Safety Administration; Virginia Tech, Department of Biomedical Engineering and Mechanics: Blacksburg, VA, USA, 2017.
36. Gabler, H.; Gabauer, D.; Newell, H.; Glassboro, N. Use of Event Data Recorder (EDR) Technology for Highway Crash Data Analysis. NCHRP Project 2004, 17–24.
37. Chapman, S. Automated Vehicle Safety Assurance—In-Use Safety and Security Monitoring: Task 2—Minimum Dataset Specification; Published Project Report PPR2017 TETI0042; Prepared for Department for Transport; Version 1.0; Copyright © TRL Limited; TRL Limited: Wokingham, UK, 2022.
38. UNECE World Forum for Harmonization of Vehicle Regulations (WP.29). DSSAD Guidance Document; Informal Document WP.29-196-09; Submitted to the 196th Session of the World Forum for Harmonization of Vehicle Regulations (WP.29); UNECE: Geneva, Switzerland, 19 June 2025; Available online: https://unece.org/transport/documents/2025/06/informal-documents/grva-dssad-guidance-document (accessed on 20 July 2025).
39. SAE International. V2X Communications Message Set Dictionary; Technical Report SAE J2735_202409; Revised September 2024; SAE International: Warrendale, PA, USA, 2024.
40. ETSI. Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 2: Specification of Cooperative Awareness Basic Service. Draft ETSI TS 2011, 20, 448–451.
41. No. EN 302 637-3; Intelligent Transport Systems (ITS); Vehicular Communications; Basic Set of Applications; Part 3: Specifications of Decentralized Environmental Notification Basic Service. ETSI: Sophia Antipolis, France, 2019.
42. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, Providence, RI, USA, 16–21 June 2012; pp. 3354–3361.
43. KITTI Vision Benchmark Suite. Available online: https://www.cvlibs.net/datasets/kitti/raw_data.php (accessed on 20 July 2025).
44. Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; et al. Scalability in perception for autonomous driving: Waymo open dataset. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 2446–2454.
45. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuScenes: A multimodal dataset for autonomous driving. arXiv 2019, arXiv:1903.11027.
46. nuScenes Dataset. Available online: https://www.nuscenes.org/nuscenes (accessed on 20 July 2025).
47. Tampa CV Pilot Signal Phasing and Timing (SPaT) Sample. Available online: https://data.transportation.gov/Automobiles/Tampa-CV-Pilot-Signal-Phasing-and-Timing-SPaT-Samp/xn7c-yu2n/about_data (accessed on 20 July 2025).
48. Wilson, B.; Qi, W.; Agarwal, T.; Lambert, J.; Singh, J.; Khandelwal, S.; Pan, B.; Kumar, R.; Hartnett, A.; Pontes, J.K.; et al. Argoverse 2: Next generation datasets for self-driving perception and forecasting. arXiv 2023, arXiv:2301.00493.
49. Yee, M.; Surkis, A.; Lamb, I.; Contaxis, N. The NYU Data Catalog: A modular, flexible infrastructure for data discovery. J. Am. Med Informatics Assoc. 2023, 30, 1693–1700.
50. Dibowski, H.; Schmid, S.; Svetashova, Y.; Henson, C.; Tran, T. Using Semantic Technologies to Manage a Data Lake: Data Catalog, Provenance and Access Control. In Proceedings of the SSWS@ ISWC, Athens, Greece, 2–6 November 2020; pp. 65–80.
51. Cherradi, M.; Bouhafer, F.; Haddadi, A.E. Data lake governance using IBM-Watson knowledge catalog. Sci. Afr. 2023, 21, e01854.
52. Anil Hirwade, M. A study of metadata standards. Libr. Hi Tech News 2011, 28, 18–25.
53. Shin, D.K.; Lee, S.H.; Kang, J.; Park, E.M. Data catalogue standards based on dcat for transportation data: Dcat-trans. J. Korean Soc. Transp. 2019, 37, 430–444.
54. Albertoni, R.; Browning, D.; Cox, S.; Beltran, A.; Perego, A.; Winstanley, P. Data Catalog Vocabulary (DCAT)-Version 3, 2024. W3C Recommendation. Available online: https://www.w3.org/TR/vocab-dcat-3/ (accessed on 1 October 2025).
55. European Commission. DCAT Application Profile for Data Portals in Europe (DCAT-AP)—Version 3.0.0, 2024. Interoperable Europe Portal. Available online: https://interoperable-europe.ec.europa.eu/collection/semic-support-centre/solution/dcat-application-profile-data-portals-europe/release/300 (accessed on 1 October 2025).
56. DCAT-AP.de. DCAT-AP.de Specification—Version 3.0, 2024. DCAT-AP.de Portal. Available online: https://www.dcat-ap.de/def/dcatde/3.0/spec/ (accessed on 1 October 2025).
57. GeoDCAT-AP. GeoDCAT-AP 3.0.0. 2025. Available online: https://semiceu.github.io/GeoDCAT-AP/releases/3.0.0/ (accessed on 20 July 2025).
58. Canham, S.; Ohmann, C. A metadata schema for data objects in clinical research. Trials 2016, 17, 557.
59. Labropoulou, P.; Gkirtzou, K.; Gavriilidou, M.; Deligiannis, M.; Galanis, D.; Piperidis, S.; Rehm, G.; Berger, M.; Mapelli, V.; Rigault, M.; et al. Making metadata fit for next generation language technology platforms: The metadata schema of the european language grid. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Marseille, France, 11–16 May 2020; pp. 3428–3437.
60. Welten, S.; Neumann, L.; Yediel, Y.U.; da Silva Santos, L.O.B.; Decker, S.; Beyan, O. DAMS: A distributed analytics metadata schema. Data Intell. 2021, 3, 528–547.
61. Mukherjee, S.; Das, R. Integration of domain-specific metadata schema for cultural heritage resources to DSpace: A prototype design. J. Libr. Metadata 2020, 20, 155–178.
62. Abaza, H.; Shutsko, A.; Klopfenstein, S.A.; Vorisek, C.N.; Schmidt, C.O.; Brünings-Kuppe, C.; Clemens, V.; Darms, J.; Hanß, S.; Intemann, T.; et al. Toward a Domain-Overarching Metadata Schema for Making Health Research Studies FAIR (Findable, Accessible, Interoperable, and Reusable): Development of the NFDI4Health Metadata Schema. JMIR Med. Inform. 2025, 13, e63906.
63. Kim, E.; Kim, J.; Woo, W. Metadata schema for context-aware augmented reality applications in cultural heritage domain. In 2015 Digital Heritage; IEEE: Piscataway, NJ, USA, 2015; Volume 2, pp. 283–290.
64. Bermudez-Edo, M.; Elsaleh, T.; Barnaghi, P.; Taylor, K. IoT-Lite: A lightweight semantic model for the internet of things and its use with dynamic semantics. Pers. Ubiquitous Comput. 2017, 21, 475–487.
65. Specka, X.; Gärtner, P.; Hoffmann, C.; Svoboda, N.; Stecker, M.; Einspanier, U.; Senkler, K.; Zoarder, M.M.; Heinrich, U. The BonaRes metadata schema for geospatial soil-agricultural research data–Merging INSPIRE and DataCite metadata schemes. Comput. Geosci. 2019, 132, 33–41.
66. Manouselis, N.; Costopoulou, C. Quality in metadata: A schema for e-commerce. Online Inf. Rev. 2006, 30, 217–237.
67. Cano, M.A.; Tsueng, G.; Zhou, X.; Xin, J.; Hughes, L.D.; Mullen, J.L.; Su, A.I.; Wu, C. Schema Playground: A tool for authoring, extending, and using metadata schemas to improve FAIRness of biomedical data. BMC Bioinform. 2023, 24, 159.
68. Koç, H.; Erdoğan, A.M.; Barjakly, Y.; Peker, S. UML Diagrams in Software Engineering Research: A Systematic Literature Review. Proceedings 2021, 74, 13.

Figure 1. Overall research framework.

Figure 2. Mapping of entities and items.

Figure 3. UML-based metadata schema for AV accident investigation.

Figure 4. Conceptual connections to investigation from data sources by the data catalog functions.

Figure 5. Implementation of the AV-specific metadata schema as a data discovery UI.

Figure 6. Case study: Applying the AV-specific metadata schema to accident scenario analysis.

Table 1. Autonomous and conventional vehicle accident data characteristics.

Type of Accident	Layers	Data Source	Type of Data Generation	Temporal Resolution	Data Type	Data Contents	Refs.
AV & MV	Dashcam footage	Dashcam	when an event occurs; continuously recorded	25∼30 fps	Unstructured (MP4)	Visual context of the driving environment	[33,34]
	Accident overview	Accident report	when an accident occurs	N/A	Unstructured (PDF)	Vehicle information; Driver information (e.g., DUI); Accident severity; Accident circumstances; Accident overview (e.g., weather, road condition)	[15]
	Recording devices within vehicles	EDR	when an event occurs (airbag deployment, and so forth)	2 fps	Unstructured (PDF)	Pre-crash information (e.g., speed, brake engagement status); In-crash information (e.g., airbag deployment time)	[35,36]
AV	Recording devices within AV	DSSAD	continuously recorded; when an event occurs (crash detection, system failure, and so forth)	2 fps	Structured (Timestamped event log)	System status codes (e.g., ADS activation status); Control command logs (e.g., steering angle); V2X messages (e.g., vehicle received messages); Sensor fusion information (e.g., object detection); Vehicle location (e.g., latitude, longitude)	[37,38]
	V2V communication	BSM (Basic Safety Message)	continuously recorded	10 fps	Structured (ASN.1/UPER)	Vehicle location; Braking information (e.g., brake system status); Vehicle dimensions (e.g., length, width); Protected communication zone information	[39]
	V2V communication	CAM (Cooperative Awareness Message)	continuously recorded	1∼10 fps	Structured (ASN.1/UPER)	Vehicle location; Vehicle dimensions	[40]
	V2I communication	TIM (Traveler Information Message)	when an event occurs (road condition changes, pre-defined zones, and so forth)	Event-based	Structured (ASN.1/UPER)	Recommended information (e.g., road construction); Road sign types; Emergency alert	[39]
	V2I communication	DENM (Decentralized Environmental Notification Message)	when an event occurs (accident occurrence, emergency vehicles approaching, and so forth)	Event-based	Structured (ASN.1/UPER)	Event overview (e.g., type, location); Emergency vehicle information (e.g., speed, direction); Geographic area warning information	[41]
	Vehicle sensor	Camera	continuously recorded	10∼12 fps	Unstructured (PNG, TFRecord)	Video of vehicle perception; 3D bounding box for vehicle object perception	[42,43,44,45,46]
		LiDAR	continuously recorded	10∼20 fps	Unstructured (bin, TFRecord)	Video of vehicle perception; 3D bounding box for vehicle object perception	[42,43,44,45,46]
		Radar	continuously recorded	13 fps	Unstructured (bin)	Video of vehicle perception; 3D bounding box for vehicle object perception	[45,46]
	Road infrastructure	Traffic Signal	when an event occurs (signal state changes)	1 fps	Structured (CSV)	Signal information (e.g., signal state, remaining time)	[47]
	Road infrastructure	HD Map	Periodically updated	N/A	Unstructured (GeoTIFF)	Road geometry (e.g., lane boundaries); ODD	[48]

Note. AV = Autonomous Vehicle; MV = Manual Vehicle; fps = frames per second; ODD = Operational Design Domain; N/A indicates that data is not available.
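
The temporal resolutions in Table 1 differ by more than an order of magnitude, which is one reason the sources are hard to link directly. As a rough illustration only, the sketch below estimates how many samples each source would contribute over the same time window. The `SourceProfile` type, the `expected_samples` helper, and the 20 s window are assumptions made for this example and are not part of the original study; only sources with a single nominal rate in Table 1 are listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceProfile:
    """One row of Table 1, reduced to the fields needed here (hypothetical type)."""
    name: str
    rate_hz: float    # nominal temporal resolution from Table 1 (given there in fps)
    structured: bool  # True for structured data, False for unstructured


# Values copied from Table 1.
SOURCES = [
    SourceProfile("EDR", 2.0, False),
    SourceProfile("DSSAD", 2.0, True),
    SourceProfile("BSM", 10.0, True),
    SourceProfile("Radar", 13.0, False),
    SourceProfile("Traffic Signal", 1.0, True),
]


def expected_samples(source: SourceProfile, window_s: float) -> int:
    """Approximate number of samples a source produces in window_s seconds."""
    return round(source.rate_hz * window_s)


WINDOW_S = 20.0  # assumed window length, chosen only for illustration
counts = {s.name: expected_samples(s, WINDOW_S) for s in SOURCES}

for name, n in counts.items():
    print(f"{name}: {n} samples in {WINDOW_S:.0f} s")
```

Under these assumptions a radar stream yields 260 samples where a traffic signal log yields 20, so any linkage between them needs an explicit shared time reference rather than index-based matching.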

Table 2. Comparison of classes between DCAT and application profiles.

Class	Description	DCAT [54]	DCAT-AP [55]	DCAT-AP.de [56]	GeoDCAT-AP [57]	DCAT-Trans [53]
Catalog	To provide a list of datasets and data services included in the catalog including title, description, and list of included resources	O	O	O	O	O
DatasetSeries	To represent a collection of datasets with temporal or periodic continuity, including names, descriptions, and information about the included datasets	O	O	O	O
Resource	To express the characteristics of resources commonly used across multiple classes, including descriptive information such as unique identifiers and names	O	O	O	O
Dataset	To describe the characteristics of a specific dataset, it includes information such as the title, description, subject, temporal scope, and spatial scope	O	O	O	O	O
DataService	To explain how to access and utilize the data, it includes information on the service’s name, description, access address, and provided functions	O	O	O	O
Distribution	To explain the actual availability of the dataset, it includes information on data format, access path, download, location, and media type	O	O	O	O	O
CatalogRecord	To provide the management history of resources registered in the catalog, including information such as catalog’s creation time and modification time	O	O	O	O
Agent	To describe the organization or individual associated with the dataset, including the administrator’s name, contact details, and type		O	O	O
Location	To express the spatial scope related to the dataset, it includes location coordinates, administrative districts, and geographic area information		O		O	O
LicenseDocument	To explain the terms of use for the dataset, including the required license type, rights description, and related documentation information		O		O
Checksum	To verify data integrity, it includes information such as verification algorithms and hash values		O		O
Relationship	To explain the relationships between data, include the names of related data, hierarchical relationships, and association information		O		O	O
Kind	To describe the category to which the dataset belongs, it includes information about the characteristics of the type, classification, and category		O	O	O
Attribution	To describe the entities contributing to the dataset, including information about the relevant organizations or individuals, their roles, and their level of contribution			O
Taxonomy	To systematically classify the topics within the dataset, it includes information on the topic classification system and topic items					O

Note. ‘O’ indicates the presence of the corresponding class. DCAT-AP = DCAT Application Profile for Data Portals in Europe; DCAT-AP.de = German DCAT Application Profile (or DCAT-AP for Germany); GeoDCAT-AP = Geospatial Profile of DCAT-AP; DCAT-Trans: DCAT Application Profile for Transportation Data.

Table 3. Twelve types of properties and sub-properties.

Types of Property	Description	Sub-Properties	Refs.
Data modification and creation	History of data creation, updates, and modifications, enabling tracking of the data lifecycle, including the creator, publication date, and modification history	wasGeneratedBy, issued	[53,54,55,56,57]
Index and classification	Information for indexing, including the dataset’s classification system, subject terms, and keywords, which supports users in exploring data suited to their purposes	type, keyword	[53,54,55,56,57]
Description	Basic information necessary for understanding the overall characteristics and content of the data, including its title, description, format, and other essential details supporting data comprehension	description, title	[53,54,55,56,57]
Resolution	Information on the temporal and spatial resolution of the data, supporting data evaluation according to the analysis or utilization purpose	spatialResolutionInMeters	[53,54,55,56,57]
Metadata	Information on the metadata’s own compliance with standards, reference schemas, and other details supporting the structural reliability and interoperability of the metadata	conformsTo, isReferencedBy	[53,54,55,56,57]
Distribution	Technical distribution information, including the data distribution method, format, download path, file size, and other technical details, as well as information supporting data accessibility, such as the data provision method	downloadURL, byteSize	[53,54,55,56,57]
Spatiotemporal	Information on the temporal and spatial scope covered by the data, supporting spatio-temporal analysis or use limited by region and period	temporal, spatial	[53,54,55,56,57]
Identification	The data’s unique identifier and version information, supporting the tracking of the data’s identity and history	version, identifier	[53,54,55,56,57]
Linkage and Relationship	Information on references and linkages between a given dataset and other related data, supporting related data indexing and integrated data utilization	relation, isReferencedBy	[53,54,55,56,57]
Access and Rights	Information regarding permissions, licenses, access paths, and other details for data utilization, supporting legal restrictions on data use and management of authorized personnel access	accessRights, accessURL	[53,54,55,56,57]
Provider and Manager	Information regarding the entity that created and provides the data, the managing agency, and the responsible personnel, supporting the assurance of reliability regarding the data’s source and responsible entity	publisher, contactPoint	[53,54,55,56,57]
Assistance	Additional attributes not covered in the above items, such as other reference information and data integrity verification, which are necessary for schema composition but do not describe the data itself	servesDataset, checksumValue	[53,54,55,56,57]

Table 4. Summary of related works on metadata schema in various research domains.

Authors (Year)	Research Domain	Proposed Schema	Diagram	Case Study	Validation	Note	Ref.
S. Canham and C. Ohmann (2016)	Clinical research	O	X	X	X	-	[58]
Shin et al. (2019)	Transportation	O	X	X	X	-	[53]
X. Specka et al. (2019)	Soil-agricultural	O	O	X	X	-	[65]
N. Manouselis and C. Costopoulou (2006)	E-commerce	O	O	X	X	-	[66]
M.A. Cano et al. (2023)	Biomedical	O	O	X	X	-	[67]
P. Labropoulou et al. (2020)	Language technology	O	O	O	X	Pilot implementation utilizing the schema	[59]
M. Bermudez-Edo et al. (2017)	IoT	O	O	O	O	Schema validation based on testbed data	[64]
S. Welten et al. (2021)	Medical	O	O	O	X	Visualization implementation utilizing the schema	[60]
S. Mukherjee and R. Das (2020)	Cultural heritage	O	O	O	X	Schema validation based on real-world data	[61]
H. Abaza et al. (2025)	Health research	O	O	O	X	Visualization implementation utilizing the schema	[62]
Kim et al. (2015)	Cultural heritage	O	O	O	X	Visualization implementation utilizing the schema	[63]

Note. ‘O’ indicates that the element was included in the study scope, and ‘X’ indicates it was not.

Table 5. Accident investigation items by data resources.

Items	Sub-Items	Data Resource	Current Data Availability
Essential information	ODD area	N/A	N/A
Party	behavior	accident report	O
	trajectory	DSSAD	O
	forward attention status	accident report	O
Object	cellphone usage status	accident report	O
	fixed	accident report	O
	movable	accident report	O
Traffic	Traffic flow Progression	N/A	N/A
Environment	road facility location	HD Map	O
	sun glare	Camera, LiDAR, Radar	O
	traffic signal information	Traffic Signal	O
Vehicle information	vehicle level	accident report	O
	autonomous mode	DSSAD	O
	conventional mode	accident report	O
H/W function fault	sense function	DSSAD	O
	perception & localize function	DSSAD	O
	scene function	DSSAD, Camera, LiDAR, Radar	O
	plan & decide function	DSSAD	O
	EV system	DSSAD	O
Chassis system	chassis type	N/A	N/A
Chassis system	chassis status	N/A	N/A
HMI	HMI type	N/A	N/A
HMI	HMI location	N/A	N/A
S/W function fault	sense function	DSSAD	O
	perception & localize function	DSSAD	O
	scene function	DSSAD, Camera, LiDAR, Radar	O
	plan & decide function	DSSAD	O
	EV system	DSSAD	O
Other function fault	ADS operational status	DSSAD	O
	DDT fallback moment	N/A	O
	risk minimization driving interval	DSSAD	O
Violation	type of violation	accident report	O
System version	software version	N/A	N/A
	firmware version	N/A	N/A
	hardware version	N/A	N/A
Communication	in-vehicle	N/A	N/A
Communication	external	BSM, CAM, TIM, DENM	O
Communication infrastructure	infrastructure type	N/A	N/A
	infrastructure status	N/A	N/A
	infrastructure location	N/A	N/A
Security	physical	N/A	N/A
Security	cyber	N/A	N/A
Virtual environment	road facility	HD Map	O
	visibility condition	Camera, LiDAR, Radar	O
	road configuration	HD Map	O
	road operation condition	TIM	O
	road type	HD Map	O
		Camera	O
	road condition	LiDAR	O
		Radar	O
	security & communication alert area	DENM	O

Note. ‘O’ indicates that data is currently available, while ‘N/A’ indicates that data is not available.
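
The item-to-resource mapping in Table 5 can be read as a lookup from investigation items to the data resources an investigator would need to request. The sketch below encodes a small subset of the table's rows to illustrate that reading. The dictionary layout and the helper names `required_sources` and `uncovered_items` are assumptions made for this example and are not defined in the study; an empty list stands for "N/A".

```python
# Subset of Table 5: (item, sub-item) -> data resources.
# An empty list corresponds to "N/A" (no data currently available).
ITEM_TO_SOURCES = {
    ("Party", "trajectory"): ["DSSAD"],
    ("Environment", "traffic signal information"): ["Traffic Signal"],
    ("H/W function fault", "scene function"): ["DSSAD", "Camera", "LiDAR", "Radar"],
    ("Communication", "external"): ["BSM", "CAM", "TIM", "DENM"],
    ("Virtual environment", "road operation condition"): ["TIM"],
    ("Security", "cyber"): [],
    ("System version", "software version"): [],
}


def required_sources(items):
    """Collect the distinct data resources needed to examine the given items."""
    found = set()
    for key in items:
        found.update(ITEM_TO_SOURCES.get(key, []))
    return sorted(found)


def uncovered_items(items):
    """Return the items for which the mapping lists no available data resource."""
    return [key for key in items if not ITEM_TO_SOURCES.get(key)]


# Hypothetical investigation plan touching three items.
plan = [
    ("Environment", "traffic signal information"),
    ("Communication", "external"),
    ("Security", "cyber"),
]

print(required_sources(plan))
print(uncovered_items(plan))
```

Keeping the uncovered items explicit makes the data gaps in the table visible at query time instead of silently returning nothing for them.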

Table 6. Properties and sub-properties for AV accident investigation.

Class	12 Types of Properties	Sub-Properties	Description	Example	Remark
Dataset	Data modification and Creation	issued *	Date the dataset was first created	1 June 2025	Re-use existing DCAT and APs definitions or extend meanings
		modified *	Date the dataset was updated or modified	5 June 2025
		wasGeneratedBy *	How the dataset was created	LiDAR sensor raw data extraction
		accrualPeriodicity *	Update cycle of the dataset	Daily
	Identification	identifier	A unique identifier for a dataset	lidar-dataset-2025-06
	Identification	version	A version for managing modification and update history of the same dataset	v.1.2
	Index and Classification	keyword	Keywords of the dataset	Perception, Virtual environment
	Description	title *	Title of the dataset	AV LiDAR perception virtual environment data
		description *	Description of the dataset	LiDAR point cloud data collected during AV operation
		distribution *	Distribution method of the dataset	Provided in compressed file format
	Resolution	temporalResolution	Temporal resolution of the dataset	12 fps
	Resolution	spatialResolution	Spatial resolution of the dataset in point (individual object), line (linear object), or area (area object) units	Driving trajectory at 0.5 m resolution
	Spatiotemporal	spatial	Spatial scope of the dataset	Major arterial roads within Seoul Metropolitan City
	Spatiotemporal	temporal	Temporal scope of the dataset	1 June 2025–5 June 2025
	Provider and Manager	creator	Name of creating agency or administrator	OO University AV research Lab.
	Provider and Manager	contactPoint	Contact information for creating agency or administrator	lidarlab@univ.ac.kr
	Metadata	conformsTo	Standards (e.g., metadata, technical specifications) that the dataset or service complies with	International Standard
	Metadata	isReferencedBy	Information on how the dataset is referenced and utilized by other datasets, documents	2025 AV environment perception report
	Relation	qualifiedRelation	Relationship with other datasets	prov:wasDerivedFrom AV2025_v1
		videoResolution	Resolution of the video dataset	1920 × 1080 @ 30 fps	New Properties (Data heterogeneity management)
		triggerMechanism	Data storage or generation condition	IMU sensor crash detection
		samplingRate	Temporal collection frequency or cycle of data	12 fps
		dataModality	Data representation format	Video (Point cloud)
		clipLength	Clip length of log or video data	45 s
		sensorType	Data collection equipment or sensor type	LiDAR (Front)
		dataGranularityLevel	Spatial and temporal resolution level of data	Spatial: High resolution (1 m or less); Temporal: High resolution (1 frame per second or more)
		eventTimeMarker	Synchronization reference point for multimodal data	15 May 2025 T08:30:10Z	New Properties (Multimodal data linkage and spatial information)
		prePostEventWindow	Recording time interval before and after a specific event	Pre: 15 s; Post: 5 s	New Properties (Accident investigation process linkage)
		analysisSupportLevel	Level of support data provides for analysis	All levels (Raw data provided)
		investigationStep	Stage of accident investigation where data is utilized	Pre-crash (Virtual environment investigation)
		reportingPurpose	Purpose for creating data or documentation	Comparison of Real-World environments and AV-perception virtual environments
		dataSourceEntity	Data collection entity or equipment	DSSAD_Extractor_Unit_42	New Properties (Special causes specific to AV)
		cyberSecurityEvent	Security events detected during the data collection process	Authentication failure warning	New Properties (Special causes specific to AV)
Distribution	Description	title	Title of distribution	LiDAR data distribution file	Re-use existing DCAT and APs definitions or extend meanings
	Description	description	Description of distribution	Compressed LAS (LASer) format LiDAR data
	Data Modification and Creation	issued	Date the distribution was first created	1 June 2025
	Data Modification and Creation	modified	Date the distribution was updated or modified	5 June 2025
	Distribution	format *	Distribution format	LAS
		compressFormat *	Distribution file compression format	zip
		byteSize *	Distribution file size	25 GB
		downloadURL *	Distribution file download URL	https://data.example/lidar202506.zip
	Access and Rights	accessURL *	URL providing access to the specified distribution method	https://api.data.example/lidar
	Access and Rights	accessService *	Service endpoint provided for interacting with data distribution	REST API (JSON)
		beforebyteSize	Pre-compression distribution file size information	36 GB	New Properties (Data heterogeneity management)
		formatDetail	Detailed file format (e.g., encoding method) or standard name of the distribution data	Video data, MP4 (H.264)	New Properties (Data heterogeneity management)
Catalog	Assistance	catalog *	Hierarchical relationships between catalogs	Top-level: City data catalog/Sub-level: Vehicle sensor catalog	Re-use existing DCAT and APs definitions or extend meanings
		service *	Provided information for datasets included in the catalog	LiDAR Data Service 2025
		dataset *	Individual data included in the catalog	LiDARDataset_202506
	Index and Classification	themeTaxonomy *	Subject classification criteria supporting dataset organization and retrieval	Road environment perception
Relationship	Linkage and Relationship	relation *	Related datasets	AV camera perception virtual environment data	Re-use existing DCAT and APs definitions or extend meanings
		isRequiredBy *	‘A’ dataset required for using ‘B’ dataset to enable complex data interpretation	3D Object Detection Dataset
		requires *	‘B’ dataset required for using ‘A’ dataset to enable complex data interpretation	Road Surface Condition Dataset
		caseNum	Unique identifier information for individual accidents	CASE-20250601-0123	New Properties (Accident investigation process linkage)
		multiModalLinkage	Information on the relationship between ‘A’ dataset and other sensors, devices, and so forth	Camera → LiDAR Matching	New Properties (Multimodal data linkage and spatial information)
Location	Spatiotemporal	geometry *	Spatial geometry of the data	LINESTRING (127.03 37.50, 127.05 37.52)	Re-use existing DCAT and APs definitions or extend meanings
		bbox *	Geographic boundaries of the data	POLYGON (127.02 37.49, 127.06 37.53)
		adminUnitL1	Highest-level administrative district to which the data belongs	Republic of Korea
		adminUnitL2	Second-highest administrative district to which the data belongs	Seoul Metropolitan City
		adminUnitL3	Third-highest administrative district to which the data belongs	Gangnam District
		adminUnitL4	Lowest-level administrative district to which the data belongs	Yeoksam 1-dong
		videodatabbox	Geographic boundary information in video data	FRAMEBOX (Seoul) 00:00:00–01:00:00 FRAMEBOX (Incheon) 01:00:00–02:00:00	New Properties (Multimodal data linkage and spatial information)
		geocoordinate	Spatial coordinate system information referenced by the data	37.7749° N
		geoContextType	Reference system type for spatial data (e.g., road network-based, administrative boundary-based)	Road network-based
DataService	Access and Rights	endpointDescription *	Description of operations possible using the endpoint	RESTful API for LiDAR data query	Re-use existing DCAT and APs definitions or extend meanings
		endpointURL *	Endpoint URL of the service providing the dataset	https://api.data.example/lidar/v1
		license *	Download and operation permissions for the dataset	CC BY 4.0
	Assistance	servesDataset	Datasets that can be deployed by the data service	LiDARDataset_202506
PeriodOfTime	Spatiotemporal	endDate *	Data release end date	5 June 2025	Re-use existing DCAT and APs definitions or extend meanings
PeriodOfTime	Spatiotemporal	startDate *	Data release start date	1 June 2025	Re-use existing DCAT and APs definitions or extend meanings
Agent	Provider and Manager	type *	Agent type	Organization	Re-use existing DCAT and APs definitions or extend meanings
		name *	Name of managing agency or administrator	OO University AV research center
		contactPoint	Contact information for managing agency or administrator	avcenter@univ.ac.kr
License	Access and Rights	type *	Required License type	CC BY 4.0	Re-use existing DCAT and APs definitions or extend meanings
Checksum	Assistance	algorithm *	Algorithm used to generate the checksumValue	SHA-256	Re-use existing DCAT and APs definitions or extend meanings
Checksum	Assistance	checksumValue *	Checksum value generated using the specified algorithm	9f2c7b3e4a…c12a	Re-use existing DCAT and APs definitions or extend meanings

Note. Properties marked with an asterisk (*) indicate mandatory fields defined in the standard DCAT and APs.
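
To make the Dataset rows of Table 6 more concrete, the sketch below writes one dataset description as a plain Python dictionary and checks it against the sub-properties marked with an asterisk. The values are taken from the "Example" column of the table. The dictionary layout, the nested form of `prePostEventWindow`, and the `missing_mandatory` helper are assumptions made for illustration; the study does not prescribe a serialization or a validation routine.

```python
# Dataset sub-properties marked with an asterisk (mandatory) in Table 6.
MANDATORY_DATASET_FIELDS = (
    "issued", "modified", "wasGeneratedBy", "accrualPeriodicity",
    "title", "description", "distribution",
)

# Example record built from the "Example" column of Table 6.
lidar_dataset = {
    "identifier": "lidar-dataset-2025-06",
    "title": "AV LiDAR perception virtual environment data",
    "description": "LiDAR point cloud data collected during AV operation",
    "issued": "1 June 2025",
    "modified": "5 June 2025",
    "wasGeneratedBy": "LiDAR sensor raw data extraction",
    "accrualPeriodicity": "Daily",
    "distribution": "Provided in compressed file format",
    # AV-specific extension properties proposed in Table 6.
    "sensorType": "LiDAR (Front)",
    "samplingRate": "12 fps",
    "eventTimeMarker": "15 May 2025 T08:30:10Z",
    "prePostEventWindow": {"pre_s": 15, "post_s": 5},  # assumed nested layout
    "investigationStep": "Pre-crash (Virtual environment investigation)",
}


def missing_mandatory(record, required=MANDATORY_DATASET_FIELDS):
    """Return the mandatory fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# A copy of the record with one mandatory field removed, for comparison.
incomplete = {k: v for k, v in lidar_dataset.items() if k != "modified"}

print(missing_mandatory(lidar_dataset))
print(missing_mandatory(incomplete))
```

A check of this kind could run when a record is registered in the catalog, so that incomplete descriptions are flagged before investigators depend on them.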

Table 7. Accident scenario overview.

Content	Description
Accident Location	Signalized Intersection
Scenario description	The traffic signal at the intersection was displaying a green light, but the attacker hacked the signal control system and transmitted a red light message to the AV. This caused the AV to brake abruptly, resulting in a rear-end collision by the MV following behind.
Driver task	Automated Driving System (ADS) fully engaged
AV function issue	Security breach
Issue type	Malicious message injection
System state	Hacked V2I system
Weather	Clear
Collision type	AV to MV
Crash type	Rear-end collision of MV
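
For a scenario like the one in Table 7, an investigator would need every dataset that covers the moments around the abrupt braking event. The sketch below shows one way the `eventTimeMarker` and `prePostEventWindow` properties from Table 6 could drive such a selection. The catalog entries, their identifiers, and their `start` and `end` times are hypothetical, and the functions `event_window` and `overlaps` are names introduced only for this example.

```python
from datetime import datetime, timedelta, timezone


def event_window(marker, pre_s, post_s):
    """Return (start, end) of the interval around the event marker."""
    return marker - timedelta(seconds=pre_s), marker + timedelta(seconds=post_s)


def overlaps(start, end, win_start, win_end):
    """True if the interval [start, end] intersects [win_start, win_end]."""
    return start <= win_end and end >= win_start


def utc(hour, minute, second):
    """Helper for timestamps on the example date (15 May 2025, UTC)."""
    return datetime(2025, 5, 15, hour, minute, second, tzinfo=timezone.utc)


# Example values from Table 6: marker at 08:30:10Z, 15 s before and 5 s after.
marker = utc(8, 30, 10)
win_start, win_end = event_window(marker, pre_s=15, post_s=5)

# Hypothetical catalog entries with their temporal coverage.
catalog = [
    {"identifier": "dssad-log", "start": utc(8, 29, 0), "end": utc(8, 31, 0)},
    {"identifier": "signal-log", "start": utc(8, 30, 0), "end": utc(8, 30, 30)},
    {"identifier": "camera-clip", "start": utc(8, 20, 0), "end": utc(8, 25, 0)},
]

relevant = [
    entry["identifier"]
    for entry in catalog
    if overlaps(entry["start"], entry["end"], win_start, win_end)
]
print(relevant)
```

In this example the vehicle log and the signal log fall inside the window while the earlier camera clip does not, which is the kind of filtering a catalog search keyed on the event marker would perform.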


© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

