AI Training Data Management for Reliable Autonomous Vehicles Using Hashgraph

Suh, Yeonsong; Chung, Yoonseo; Park, Younghoon

doi:10.3390/app15116123

by Yeonsong Suh ¹, Yoonseo Chung ² and Younghoon Park ²,*

¹ Department of Computer Science, Sookmyung Women’s University, Seoul 04310, Republic of Korea
² Division of Computer Science, Sookmyung Women’s University, Seoul 04310, Republic of Korea
* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(11), 6123; https://doi.org/10.3390/app15116123

Submission received: 2 April 2025 / Revised: 23 May 2025 / Accepted: 27 May 2025 / Published: 29 May 2025

(This article belongs to the Topic Emerging AI+X Technologies and Applications)

Abstract

Autonomous vehicles have attracted considerable attention from researchers and organizations, with artificial intelligence (AI) playing a key role in this technology. For AI models in autonomous vehicles to be reliable, the integrity of the training data is crucial, which has motivated the development of various blockchain-based management systems. However, conventional blockchain systems incur significant time delays when processing training data transactions, posing challenges in autonomous vehicle environments that require real-time processing. In this study, we propose a hashgraph-based training data management system for trusted AI. To validate our system, we implemented it on Hedera, a widely adopted hashgraph implementation, conducted simulations using the CARLA simulator, and compared its performance to a conventional Ethereum-based blockchain system. The simulation results show that Hedera achieved significantly lower latencies and better scalability than Ethereum, confirming its suitability for secure and efficient AI data verification in autonomous systems.

Keywords: hashgraph; blockchain; autonomous vehicle; AI; training data

1. Introduction

The global adoption of autonomous vehicles is rapidly transforming modern transportation systems [1]. Companies such as Tesla and Waymo are at the forefront of developing self-driving technologies, while cities such as San Francisco are actively deploying driverless taxi services [2]. At present, China is making substantial progress in autonomous vehicle research through large-scale, government-backed initiatives, underscoring its ambition to become a global leader in this field. These efforts are closely tied to advancements in deep learning and artificial intelligence (AI).

Despite significant progress, ensuring the reliability and safety of autonomous vehicles remains a critical challenge. These systems must function effectively under unpredictable conditions, such as adverse weather, unexpected road obstacles, and deviations from predefined routes. Their performance heavily depends on the quality of training data; poor or biased data can result in critical decision-making errors [3]. For instance, if an AI system misinterprets a partially obscured or degraded traffic sign, this may result in incorrect speed adjustments or hazardous driving behavior, posing serious safety risks. While various factors influence the reliability of AI models, this study focuses on ensuring the integrity and accuracy of training data.

To address this issue, blockchain technology has attracted considerable attention from researchers and developers. Leveraging robust cryptographic mechanisms—such as hash functions and digital signatures—blockchain helps to prevent data tampering [4]. Moreover, distributed ledger technology (DLT) enhances system resilience by eliminating single points of failure, thereby improving overall data reliability. These properties render blockchain a promising tool for safeguarding training datasets against unauthorized modifications.

In the context of autonomous driving AI, blockchain has been explored as a means of securing vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications, ensuring the integrity of the data exchanged among autonomous vehicles and mitigating risks such as data manipulation and identity forgery. Recent studies have proposed blockchain-based frameworks to protect self-driving vehicles from cyberattacks [5]. These frameworks ensure the verifiability and authenticity of shared sensor data, which are crucial for AI-driven decision-making in real-time traffic environments [6].

Beyond securing communication channels, blockchain technologies have also been employed to manage training and sensor datasets in AI-driven environments. This is particularly important in scenarios where data integrity, traceability, and verifiability are essential for building secure and trustworthy AI models. Several studies have proposed blockchain-based frameworks that enable the secure storage, sharing, and validation of AI-related data, especially in domains such as autonomous driving and industrial IoT [7,8]. By leveraging blockchain’s decentralized and tamper-resistant characteristics, these approaches enhance the transparency and reliability of end-to-end AI pipelines.

However, conventional blockchain architectures face inherent limitations that restrict their applicability to real-time AI applications [9]. High computational overhead, slow transaction speeds, and limited scalability remain significant challenges. Many blockchain networks rely on computationally intensive consensus algorithms, resulting in excessive energy consumption and high operational costs. Furthermore, blockchain transactions are typically processed sequentially, causing network congestion and increased latency. These limitations pose critical barriers for autonomous driving systems, where real-time data verification and low-latency decision-making are essential for ensuring safety and performance. Overcoming these barriers requires the exploration of alternative distributed ledger technologies that offer greater efficiency and scalability.

To address these challenges, this study investigates the potential of hashgraph, an advanced distributed ledger technology designed for fast, scalable, and secure transaction processing [10], as a means of building reliable AI models while reducing transaction processing times. Unlike traditional blockchains, hashgraph uses a directed acyclic graph (DAG) structure, allowing for asynchronous and parallel transaction processing [11]. This architecture eliminates sequential bottlenecks and significantly increases throughput, enabling the network to process thousands of transactions per second while maintaining robust security. Additionally, hashgraph’s consensus mechanism offers low-latency finality, rendering it particularly well suited for applications requiring rapid data validation. As each participant maintains their own transaction history, hashgraph is also advantageous for collaborative AI model development, including ensemble and federated learning. These features position hashgraph as a promising solution for managing the vast and continuously growing datasets required for training AI models in autonomous vehicles.

This study proposes a secure and efficient training data management framework for autonomous vehicles using hashgraph. The framework collects training data from autonomous vehicles—including road conditions, traffic signals, and environmental features—and transmits them to both a central server and the hashgraph network. After processing, refined AI models are distributed back to the vehicles to improve real-world performance. To evaluate the feasibility of the proposed approach, we utilized Hedera, the most widely adopted implementation of hashgraph, and conducted simulations using CARLA, an open-source autonomous driving research platform. The results demonstrate that integrating hashgraph into the autonomous vehicle data pipeline ensures secure data storage, efficient verification, and reliable AI training. Leveraging Hedera’s high throughput and tamper-resistant storage capabilities, our proposed system enhances the trustworthiness of AI-driven transportation technologies, contributing to the development of safer and more reliable autonomous mobility solutions.

2. Related Research

Blockchain technology has been extensively explored as a means of ensuring data integrity and security across various domains, including autonomous vehicle systems [12]. Traditional blockchain-based platforms such as Ethereum and Hyperledger employ cryptographic hashing and distributed consensus mechanisms to prevent unauthorized data modifications. However, these solutions face inherent scalability limitations, particularly in real-time environments that demand low-latency data validation [13].

To improve blockchain performance in asynchronous networks, several Byzantine Fault-Tolerant (BFT) consensus algorithms have been proposed. PBFT (Practical Byzantine Fault Tolerance) [14] and Tendermint [15], for instance, adopt a partially synchronous network model to realize consensus under normal operating conditions. Nevertheless, these approaches often encounter network bottlenecks and computational overhead due to their reliance on sequential transaction processing. To address these inefficiencies, asynchronous BFT (aBFT) mechanisms have been introduced [16]. Hashgraph utilizes a gossip-based protocol to achieve aBFT, enabling parallel transaction processing while ensuring data integrity and immutability.

Recent studies have underscored the advantages of hashgraph in AI-driven applications. Lasy et al. [17] investigated its integration into distributed AI models, demonstrating its effectiveness in managing high-throughput data transactions with minimal latency. Alternative consensus protocols such as HoneyBadgerBFT [18] and BEAT [19] have introduced probabilistic mechanisms aimed at enhancing communication efficiency in adversarial network environments. These techniques have been successfully applied to decentralized AI training frameworks, improving both data availability and security while offering viable alternatives to conventional blockchain-based data management approaches.

However, the application of hashgraph in autonomous driving systems remains largely underexplored. Existing implementations have primarily focused on financial services and general AI data management. Recent studies have emphasized the potential of hashgraph-based key management frameworks for secure vehicle-to-vehicle communication, employing logical key hierarchies and batch rekeying to minimize transaction latency and adapt effectively to dynamic traffic conditions [20]. Concurrently, directed acyclic graph (DAG)-based blockchain architectures have emerged as scalable alternatives to conventional blockchains. These systems enable high-throughput, low-latency processing, rendering them particularly well suited for real-time autonomous vehicle coordination and microtransaction scenarios, such as electric vehicle (EV) charging [21]. Moreover, the integration of blockchain with artificial intelligence (AI) and fog computing has been proposed to enhance the reliability of AI-driven decision-making in autonomous systems by ensuring secure and tamper-resistant data processing [22]. Further research is warranted to evaluate the feasibility of hashgraph in safety-critical environments where real-time data processing and decision-making are essential.

As summarized in Table 1, previous studies have addressed only individual aspects, such as hash storage or real-time verification, without presenting an integrated framework. In contrast, our approach combines smart contracts, real-time validation, and cross-platform benchmarking between Hedera and Ethereum, addressing both theoretical limitations and deployment challenges in blockchain-based AI data verification.

3. Blockchain and Hashgraph

Blockchain is a widely adopted distributed ledger technology that ensures data integrity through cryptographic hashing and decentralized consensus mechanisms. Figure 1a illustrates its operational mechanism. Each block in a blockchain contains a list of transactions that are cryptographically linked to the previous block using a hash function [12]. This structure guarantees immutability, as modifying a single transaction would require recalculating all subsequent hashes, making tampering computationally infeasible. Nonetheless, blockchain systems differ widely in how well they support high-performance and low-latency applications.
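The hash-linking described above can be illustrated with a minimal Python sketch. It is a toy model rather than the data structure of any particular blockchain: the block layout, the JSON serialization, and the helper names (`block_hash`, `build_chain`, `first_invalid_block`) are assumptions made for illustration only.

```python
import hashlib
import json


def block_hash(index, transactions, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(batches):
    """Build a list of blocks, each linked to its predecessor by hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder hash for the first block
    for index, transactions in enumerate(batches):
        digest = block_hash(index, transactions, prev_hash)
        chain.append({"index": index, "transactions": transactions,
                      "prev_hash": prev_hash, "hash": digest})
        prev_hash = digest
    return chain


def first_invalid_block(chain):
    """Return the index of the first block whose hash or link is broken, else None."""
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["transactions"], prev_hash)
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain([["tx-a", "tx-b"], ["tx-c"], ["tx-d", "tx-e"]])
intact_result = first_invalid_block(chain)

# Tamper with a transaction in the middle block without recomputing any hashes.
chain[1]["transactions"][0] = "tx-forged"
tampered_result = first_invalid_block(chain)
print(intact_result, tampered_result)
```

In this sketch, hiding the forged transaction would require recomputing the hash of the modified block and of every block after it, which is the property the paragraph above relies on.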

Blockchain networks employ various consensus mechanisms. Proof of Work (PoW)—one of the most common approaches—requires nodes to solve cryptographic puzzles in order to validate transactions. While this ensures security, it results in high latency and energy consumption [13]. Moreover, PoW-based blockchains process transactions sequentially, resulting in network congestion and increased transaction fees, especially during periods of high demand. While consensus protocols such as Proof of Stake (PoS) and Practical Byzantine Fault Tolerance (PBFT) have been introduced as alternatives, they continue to face limitations in scalability and responsiveness within large-scale or real-time networks. These limitations render traditional blockchains unsuitable for real-time applications such as autonomous driving and large-scale IoT networks [23].
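To make the cost of PoW concrete, the following toy sketch searches for a nonce whose SHA-256 digest begins with a given number of zero hexadecimal digits. The puzzle format, the `mine` helper, and the difficulty value are illustrative assumptions and do not reproduce the mining rules of any production network.

```python
import hashlib


def mine(block_data, difficulty):
    """Search for a nonce whose SHA-256 digest starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}:{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


# A small difficulty keeps the search fast enough for a demonstration.
nonce, digest = mine("block with training-data hashes", difficulty=3)
print(f"nonce={nonce} digest={digest[:16]}...")
```

Each additional required zero digit multiplies the expected number of attempts by 16, which illustrates why puzzle-based validation trades latency and energy for security.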

To overcome the limitations of traditional blockchains, hashgraph introduces an innovative consensus mechanism based on parallel event-driven processing as opposed to a conventional sequential block structure [16]. It leverages a directed acyclic graph (DAG) topology, where each transaction is represented as a vertex that references preceding events. This approach facilitates simultaneous transaction recording and significantly improves scalability. Hashgraph achieves consensus through a distinctive integration of “gossip about gossip” and “virtual voting” [24]. In this process, each node randomly communicates with others, exchanging event histories that include both the origin and timestamp of each message. These metadata enable every node to reconstruct a consistent global perspective of the event’s chronology [21]. The gossip protocol ensures that information spreads exponentially across the network, realizing complete dissemination in logarithmic time relative to the number of participants. Through virtual voting, the nodes establish consensus without the need for additional message exchanges, thereby minimizing communication overhead while maintaining consistency and fairness [16].
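The logarithmic dissemination time can be illustrated with a simplified simulation. The sketch below assumes synchronous rounds and push-only gossip of a single event, whereas hashgraph itself is asynchronous and exchanges full event histories; the function name `gossip_rounds` and the seeded random generator are illustrative choices, not part of any hashgraph implementation.

```python
import math
import random


def gossip_rounds(num_nodes, seed=0):
    """Count synchronous rounds until every node has heard one new event."""
    rng = random.Random(seed)
    informed = {0}  # node 0 creates the event
    rounds = 0
    while len(informed) < num_nodes:
        # Every informed node pushes the event to one uniformly random node.
        targets = {rng.randrange(num_nodes) for _ in informed}
        informed |= targets
        rounds += 1
    return rounds


# Compare the simulated round count with ceil(log2(n)), the best possible case
# in which the number of informed nodes doubles in every round.
for n in (16, 256, 4096):
    print(n, gossip_rounds(n), math.ceil(math.log2(n)))
```

Because the informed set can at most double per round, the round count cannot fall below log2(n); in this toy model it typically stays within a small multiple of that bound.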

In addition to its performance advantages, hashgraph provides asynchronous Byzantine Fault Tolerance (aBFT), allowing the network to maintain reliable operations even in adversarial scenarios [25]. Unlike partially synchronous protocols, aBFT guarantees both safety and liveness without imposing constraints on message delivery times, making it highly suitable for unstable or malicious network conditions. Moreover, it ensures equitable transaction ordering, effectively reducing the likelihood of front-running and order manipulation—well-documented vulnerabilities in traditional blockchain systems. For instance, in autonomous vehicle coordination, hashgraph enables real-time consensus with respect to traffic data among vehicles without relying on centralized infrastructure. In IoT deployments, its lightweight framework supports secure data validation across distributed sensor nodes with limited computational capabilities.

By enabling parallel transaction processing, the DAG-based architecture reduces latency and enhances system performance. The consensus algorithm of hashgraph facilitates asynchronous agreement among network participants, maintaining consistency and reliability even under high transactional loads. This combination of parallelism and consensus provides an efficient framework for distributed data management. Due to its minimal latency and excellent scalability, hashgraph is well suited for latency-sensitive applications such as autonomous vehicle networks, large-scale IoT systems, and ad hoc mobile communications. Its resource-efficient design facilitates deployment in constrained environments while ensuring secure and consistent data synchronization.

4. Proposed System

This section outlines the overall structure of the proposed system, highlighting its key components and their interactions.

4.1. System Overview

To ensure the integrity and security of training data in autonomous vehicle systems, we propose a blockchain-enhanced framework that integrates Hedera for verifying data before they are used for AI model training. The overall process consists of the following five stages:

Step 1: Data Collection: Onboard sensors, such as cameras and LiDAR, collect environmental data from the vehicle’s surroundings. This process constitutes a part of the standard data acquisition pipeline.
Step 2: Hashing and Logging: In our proposed system, the collected data are hashed using a cryptographic function (e.g., SHA-256), and the resulting hash value is logged on the Hedera network. This step introduces decentralized integrity validation and is represented in red in Figure 2. The use of Hedera ensures that the fingerprint of the original data is tamper-proof and time-stamped across a distributed ledger.
Step 3: Data Transmission: The raw sensor data are simultaneously transmitted to a centralized server for storage. This step adheres to the conventional AI training workflow.
Step 4: Verification Before Training: Prior to model training, the system recomputes the hash of the locally stored data and compares it to the original hash retrieved from Hedera. A match confirms that the data have not been altered since the initial logging. This verification module—also highlighted in red—acts as a gatekeeper for maintaining data integrity.
Step 5: Model Training: If the integrity check is successful, the validated data are passed to the AI training module. The model then learns from trustworthy inputs, ensuring the reliability of downstream predictions.

Cryptographic hashes are inherently non-reversible, meaning that the hash value alone cannot be used to reconstruct the original data. Therefore, Hedera does not serve as a data storage platform; rather, it serves as a decentralized integrity validation mechanism. This design choice minimizes storage overhead while maximizing verifiability. Through combining real-time data acquisition with immutable, ledger-based verification, the proposed system enhances the trustworthiness of the AI training pipeline. It ensures that only authenticated and untampered data are used during model training, thereby improving both the safety and performance of autonomous driving systems.
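The five steps above can be sketched end to end in Python. The `InMemoryLedger` class is a hypothetical stand-in that only mimics the role of the ledger and is not the Hedera API; the identifier `frame-0001` and the placeholder sensor frame are likewise assumptions.

```python
import hashlib
import time


class InMemoryLedger:
    """Hypothetical stand-in for the ledger; it is NOT the Hedera API."""

    def __init__(self):
        self._records = {}

    def log_hash(self, data_id, digest):
        # Store only the fingerprint and a timestamp, never the raw data.
        self._records[data_id] = (digest, time.time())

    def get_hash(self, data_id):
        return self._records[data_id][0]


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def verified_for_training(ledger, server_storage, data_id):
    """Step 4: recompute the hash of the stored copy and compare with the ledger."""
    return sha256_hex(server_storage[data_id]) == ledger.get_hash(data_id)


ledger = InMemoryLedger()
server_storage = {}

frame = b"\x00" * 1_000_000                         # Step 1: placeholder sensor frame
ledger.log_hash("frame-0001", sha256_hex(frame))    # Step 2: hash and log
server_storage["frame-0001"] = frame                # Step 3: transmit raw data

ok_before = verified_for_training(ledger, server_storage, "frame-0001")

# Simulate tampering on the server: change one byte of the stored copy.
server_storage["frame-0001"] = b"\x01" + frame[1:]
ok_after = verified_for_training(ledger, server_storage, "frame-0001")

print(ok_before, ok_after)  # Step 5 (training) would run only when the check passes
```

Note that the ledger entry is a 64-character hexadecimal digest regardless of the size of the frame, which reflects the storage argument made above.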

4.2. Decentralized Applications

Decentralized applications (DApps) are autonomous software solutions built on blockchain infrastructures, allowing logic execution and data validation to occur without reliance on centralized authorities. These applications leverage smart contracts to implement tamper-proof business rules and are particularly well-suited for use cases that demand transparency, security, and decentralized trust. In autonomous vehicle systems, DApps play a vital role in ensuring the integrity and authenticity of sensor-generated datasets used for AI training. By embedding cryptographic validation mechanisms directly within the data pipeline, DApps enable the real-time decentralized verification of data before they are consumed by deep learning models. This is especially important in safety-critical domains, where decisions depend on complex perception models trained on large volumes of sensor input. To address this, our system incorporates two dedicated DApps, one responsible for securely registering dataset hashes upon data collection and another for verifying data consistency at the point of model training.

4.2.1. DApp for Data Uploading

This DApp establishes a decentralized method for ensuring dataset integrity without storing raw data directly on the blockchain. Instead, it records cryptographic hashes that serve as verifiable proofs of authenticity while significantly reducing storage overhead.

Hashing Data: Before data are transmitted to the server, the system applies the SHA-256 algorithm to each dataset. SHA-256 is a widely recognized cryptographic hash function that produces a fixed-length 256-bit output. It is known for its collision resistance, meaning that it is computationally impractical to find two distinct inputs that generate the same hash. Additionally, its one-way property ensures that the original data cannot be reconstructed from the hash, thereby enhancing data privacy and security. Even minimal alterations in the dataset yield entirely different hashes, enabling robust integrity verification.
Storing Hash Values: The resulting hashes are permanently recorded on the blockchain via smart contracts deployed on platforms such as Hedera Hashgraph or Ethereum Geth. Hedera uses a gossip-based protocol in which nodes randomly exchange information, allowing for rapid and efficient consensus with minimal communication overhead. This mechanism provides strong fault tolerance and scalability, rendering it ideal for applications requiring real-time data handling and high throughput. In contrast, Ethereum has traditionally used a Proof-of-Work (PoW) consensus mechanism, where miners solve complex mathematical problems to validate transactions and append blocks to the chain. Although PoW is highly secure, it suffers from high latencies, substantial energy consumption, and limited scalability, particularly under heavy network traffic.

Incorporating this DApp into the autonomous vehicle data pipeline guarantees that data authenticity is validated before any AI training process begins. By only storing cryptographic fingerprints instead of raw datasets, the system realizes substantial savings in on-chain storage costs and complexity.
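The sensitivity of SHA-256 to small input changes can be observed directly with Python's standard `hashlib` module. The two byte strings below are made-up examples, not data from the experiments.

```python
import hashlib

original = b"speed_limit=50;frame=000123"
modified = b"speed_limit=60;frame=000123"  # a single character differs

digest_a = hashlib.sha256(original).digest()
digest_b = hashlib.sha256(modified).digest()

# Count how many of the 256 output bits differ between the two digests.
differing_bits = sum(bin(x ^ y).count("1") for x, y in zip(digest_a, digest_b))

print(len(digest_a) * 8)                      # digest length in bits
print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(modified).hexdigest())
print(differing_bits)
```

For a well-behaved hash function, roughly half of the output bits are expected to change for any modification of the input, so a tampered file is very unlikely to retain a similar fingerprint.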

4.2.2. DApp for Data Checking

This DApp is designed to verify the consistency of training data throughout the entire pipeline. Before initiating model training, the system retrieves the previously stored cryptographic hash and compares it with a newly generated hash derived from the current version of the dataset. This comparison confirms whether the training data have remained unaltered, ensuring end-to-end data integrity.

Integrity Verification: Prior to granting an AI model access to the dataset, the system performs an integrity check by fetching the original cryptographic hash from the blockchain and comparing it with a freshly computed hash of the present dataset. A match between the two values validates that the data have not been modified or corrupted, thereby allowing the training process to proceed with high assurance. This verification step is crucial for maintaining the accuracy and reliability of training data, especially in safety-critical AI systems.

Such verification is particularly vital in autonomous vehicle applications, where model performance, operational safety, and decision reliability are tightly linked to the quality and trustworthiness of the training data.

4.3. Smart Contract

A smart contract is a self-governing, automatically executed program deployed on a blockchain that enforces predefined rules without dependence on centralized authorities. It improves data reliability, transparency, and security by automating transaction workflows and maintaining immutable records across the distributed ledger. Unlike conventional software systems that require administrative involvement or manual supervision, smart contracts are executed uniformly across all network nodes. This decentralized consensus framework ensures deterministic behavior and verifiable execution, regardless of the system’s conditions or user’s privileges.

In the proposed system architecture, the smart contract is designed to manage cryptographic hash values instead of directly storing raw datasets. This approach significantly reduces storage overhead while ensuring data integrity and supporting system scalability. In contrast to traditional database systems that rely on centralized management and that are prone to single points of failure or unauthorized changes, smart contracts function in a distributed environment. They inherently offer immutability and trustless verification, guaranteeing that once the integrity of a dataset is recorded, it cannot be modified or deleted.

To support these functionalities, the smart contract implemented in our framework consists of two primary components, namely state variables and execution logic. The state variables fulfill the following roles: they store basic text entries for demonstration or testing purposes; associate dataset identifiers with their respective cryptographic hash values; and record the total number of hashes registered in the system. Additionally, the contract defines two fundamental processes to facilitate hash-based verification. The first process records a newly computed hash on the blockchain and links it with a unique internal reference. The second process confirms dataset integrity by comparing the current hash with the previously stored one, thereby verifying whether the data have remained unchanged.
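The state variables and the two processes described above can be modeled in plain Python as follows. This is not the deployed contract code: the class name `HashRegistry`, the attribute names, and the choice to reject overwrites (as one way of modeling immutability) are all illustrative assumptions.

```python
class HashRegistry:
    """Plain-Python model of the contract state; all names are illustrative."""

    def __init__(self, message=""):
        self.message = message      # basic text entry for demonstration or testing
        self._hashes = {}           # dataset identifier -> hash value
        self.hash_count = 0         # total number of registered hashes

    def upload_hash(self, file_index, file_hash):
        """Register a hash once; later overwrites are rejected."""
        if file_index in self._hashes:
            raise ValueError(f"hash for index {file_index} is already registered")
        self._hashes[file_index] = file_hash
        self.hash_count += 1

    def query_hash(self, file_index):
        """Return the stored hash for a dataset identifier."""
        return self._hashes[file_index]

    def verify_hash(self, file_index, current_hash):
        """Compare a freshly computed hash with the stored one."""
        return self._hashes.get(file_index) == current_hash


registry = HashRegistry(message="hello")
registry.upload_hash(0, "ab" * 32)
print(registry.hash_count,
      registry.verify_hash(0, "ab" * 32),
      registry.verify_hash(0, "cd" * 32))
```

On an actual ledger, the write path would be a transaction subject to consensus, while the comparison could be performed through a read-only call.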

Figure 3 uses color-coded blocks to distinguish between storage-related variables and verification methods. Taken together, these components enable the smart contract to manage dataset authenticity and integrity. The contract is designed to support the following two main operations:

Hash registration records cryptographic hashes on the blockchain to generate unique and immutable identifiers for each dataset. This process ensures that all datasets are verifiably protected against unauthorized alterations.
Integrity verification retrieves and compares stored hashes with newly computed ones to detect discrepancies, thereby validating the dataset’s integrity prior to AI training.

Algorithms 1 and 2 formally describe the uploading and verification logic implemented in the smart contract. The process begins with the computation of a dataset’s hash value, which is then registered on the blockchain. Later, the system recomputes the hash and compares it with the stored value to verify whether the dataset has remained intact.

Algorithm 1 Uploading Process

Input: Dataset folder path, smart contract instance
for all files in dataset folder do
    Read file contents
    Compute SHA-256 hash
    Send transaction: upload_hash(fileIndex, fileHash)
    Confirm transaction
end for

Algorithm 2 Verification Process

Input: Dataset folder path, smart contract instance
for all files in dataset folder do
    Read file contents
    Compute SHA-256 hash
    Retrieve stored hash: query_hash(fileIndex)
    if computed hash == stored hash then
        Log success
    else
        Flag mismatch and raise alert
    end if
end for

In the registration phase, the client computes a cryptographic digest for each dataset file, and the smart contract stores it immutably on the blockchain. This ensures that any subsequent attempt to alter the data can be reliably detected by comparing the original hash with a newly generated one. The process establishes a verifiable snapshot of the dataset’s state at the time of its initial upload. During the verification phase, a new hash is computed from the current dataset and checked against the original. A match confirms that the dataset is authentic and suitable for use in AI model training. If the values differ, the system flags a potential breach, triggering an alert for further investigation before proceeding with training or deployment. By incorporating this mechanism, the smart contract provides a transparent and tamper-evident foundation for data governance. It ensures a secure and scalable structure for decentralized data workflows, which is particularly important in real-time AI systems such as autonomous vehicles, where reliability and decision accuracy critically depend on trustworthy input data.
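A runnable Python rendering of Algorithms 1 and 2 is sketched below. The `ContractStub` class is a hypothetical in-memory replacement for the deployed smart contract; only the method names `upload_hash` and `query_hash` are taken from the algorithms, and the temporary files are placeholders for dataset images.

```python
import hashlib
import tempfile
from pathlib import Path


class ContractStub:
    """Hypothetical in-memory replacement for the deployed smart contract."""

    def __init__(self):
        self._hashes = {}

    def upload_hash(self, file_index, file_hash):
        self._hashes[file_index] = file_hash

    def query_hash(self, file_index):
        return self._hashes[file_index]


def file_sha256(path):
    return hashlib.sha256(path.read_bytes()).hexdigest()


def upload_folder(folder, contract):
    """Algorithm 1: hash every file and register the hash under its index."""
    for file_index, path in enumerate(sorted(Path(folder).iterdir())):
        contract.upload_hash(file_index, file_sha256(path))


def verify_folder(folder, contract):
    """Algorithm 2: return the names of files whose hash no longer matches."""
    mismatches = []
    for file_index, path in enumerate(sorted(Path(folder).iterdir())):
        if file_sha256(path) != contract.query_hash(file_index):
            mismatches.append(path.name)
    return mismatches


with tempfile.TemporaryDirectory() as folder:
    for i in range(3):
        (Path(folder) / f"image_{i}.png").write_bytes(f"pixels-{i}".encode())
    contract = ContractStub()
    upload_folder(folder, contract)
    clean = verify_folder(folder, contract)

    # Overwrite one file after registration to simulate tampering.
    (Path(folder) / "image_1.png").write_bytes(b"tampered")
    flagged = verify_folder(folder, contract)

print(clean, flagged)
```

Sorting the directory listing keeps the file index stable between the upload and verification passes, which the index-based lookup depends on.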

5. Experimental Results

To evaluate the proposed system, we conducted experiments in a controlled virtual environment using the CARLA simulator (version 0.9.13). The simulation server was executed via Docker on a Linux-based system with the following hardware specifications:

Operating System: Ubuntu 20.04 LTS (Canonical Ltd., London, UK);
CPU: Intel Xeon Silver 4216 @ 2.10 GHz (32 cores) (Intel Corporation, Santa Clara, CA, USA);
Memory: 256 GB RAM;
GPU: NVIDIA RTX A6000 (NVIDIA Corporation, Santa Clara, CA, USA).

The CARLA simulator was containerized and launched using Docker, with all sensor simulations and environment controls managed internally. Sensor data were collected from simulated autonomous driving scenarios using a custom Python 3.7-based acquisition script integrated with CARLA’s API. This script emulated real-time sensor inputs (camera and LiDAR) to generate raw datasets. To simulate blockchain-based verification, smart contracts were deployed to local test environments using both Hedera and Ethereum platforms. The smart contract logic followed the procedures described in Section 4.3, implementing hash storage and validation using Web3 APIs. Client-side operations, including hashing and transaction requests, were performed on a macOS-based system with the following specifications:

CPU: Apple M1 (Apple Inc., Cupertino, CA, USA);
Memory: 16 GB RAM;
Storage: 1 TB SSD.

Our proposed system is designed for deployment in real-world autonomous vehicle environments, where on-device data integrity verification is critical. Due to practical limitations, however, the experiments were conducted using the CARLA simulator, which offers a realistic and controllable testing environment for autonomous driving research. CARLA enables simulations with virtual vehicle sensors, allowing us to evaluate the performance of the system under conditions that approximate real-world deployment. The results obtained from this simulated environment provide theoretical validation and insights that are transferable to actual use cases, thereby supporting the feasibility of blockchain-based data integrity frameworks. All blockchain-related functionalities, including transaction requests and hash comparisons, were implemented using JavaScript with Web3.js, supporting interoperability across Hedera and Ethereum platforms.

To assess the performance of Hedera and Ethereum, we measured the time required to upload and verify hash values across datasets of different sizes (10, 100, 300, 500, 700, and 1000 images). Each image was processed using the SHA-256 hashing algorithm before being uploaded to the blockchain, and the corresponding hash values were later retrieved and verified via smart contracts.
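The measurement procedure can be outlined with a small timing harness. The sketch below is only an assumption about how such a harness might look: it replaces both ledgers with an in-memory dictionary and uses generated byte strings instead of images, so its timings reflect local hashing only and are not comparable to the results reported in this section.

```python
import hashlib
import time

DATASET_SIZES = (10, 100, 300, 500, 700, 1000)


def make_images(count):
    """Generate placeholder byte strings standing in for image files."""
    return [f"image-{i}".encode("utf-8") * 64 for i in range(count)]


def time_upload_and_verify(images):
    """Return (upload_seconds, verify_seconds) against an in-memory ledger stub."""
    ledger = {}

    start = time.perf_counter()
    for index, image in enumerate(images):
        # Stands in for sending one hash-upload transaction.
        ledger[index] = hashlib.sha256(image).hexdigest()
    upload_seconds = time.perf_counter() - start

    start = time.perf_counter()
    all_match = all(
        hashlib.sha256(image).hexdigest() == ledger[index]
        for index, image in enumerate(images)
    )
    verify_seconds = time.perf_counter() - start

    assert all_match
    return upload_seconds, verify_seconds


timings = {size: time_upload_and_verify(make_images(size)) for size in DATASET_SIZES}
for size, (upload_s, verify_s) in timings.items():
    print(f"{size:>5} images: upload {upload_s:.6f} s, verify {verify_s:.6f} s")
```

Replacing the dictionary writes and reads with real transaction and query calls would turn the same loop into a per-platform latency measurement.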

The experiments were carried out on a local Hedera network and an Ethereum Geth node, with both running under identical hardware conditions. The measured execution times for each dataset size are presented in Figure 4 and Figure 5.

Our analysis showed that Hedera significantly outperformed Ethereum in terms of transaction speed, primarily because of its asynchronous Byzantine Fault-Tolerant (aBFT) consensus algorithm. In contrast, Ethereum exhibited considerably higher latency because of its Proof-of-Work (PoW) consensus mechanism. Hedera also demonstrated near-linear scalability, whereas Ethereum’s execution time increased exponentially with the dataset’s size. Additionally, Ethereum incurred escalating gas fees proportional to the dataset volume, while Hedera maintained consistently low and predictable transaction costs. Ethereum was included in our evaluation as a representative benchmark due to its widespread adoption and foundational role in smart contract development. As one of the most extensively used blockchains for decentralized applications and academic studies, Ethereum provides a meaningful baseline for assessing the practicality of alternative platforms.

These findings suggest that Hedera offers a more efficient and scalable solution for high-volume AI training data verification. Given the stringent demands of real-time, large-scale data integrity validation in autonomous driving systems, Hedera’s consensus protocol emerges as a practical alternative to conventional blockchain frameworks. Moreover, Hedera’s architectural advantages, such as finality without forking and high throughput, enable it to sustain performance under increasing data loads, which is essential for real-world deployment in production-grade AI pipelines. The insights gained from this evaluation can inform the development of future implementations for secure, scalable, and cost-effective data verification frameworks in autonomous vehicle environments.

6. Conclusions and Future Research

Our experimental findings validated Hedera Hashgraph as a more efficient and scalable alternative to Ethereum for managing secure datasets in AI training workflows. Its low latency and high throughput render it particularly suitable for real-time applications, especially in the domain of autonomous vehicle systems. Ensuring the integrity of training data is essential for building reliable AI models, particularly in safety-critical contexts such as autonomous driving. The proposed framework overcomes the limitations of conventional blockchain-based methods by enabling faster verification, minimizing computational overhead, and enhancing scalability through the use of an asynchronous Byzantine Fault-Tolerant (aBFT) consensus algorithm.

Despite the promising results, this study has several limitations. First, the evaluation was conducted in a simulated environment using CARLA and has not yet been validated under real-world deployment conditions. Second, the current system primarily focuses on time and cost efficiency without assessing other dimensions such as energy consumption or resilience against potential security threats. Third, the dataset used was limited in diversity, consisting of a fixed set of images that may not fully capture the variability encountered in actual driving scenarios. Addressing these limitations will be a key focus of future research.

Future studies will focus on integrating deep learning models that leverage the verified datasets for improved AI inference and decision-making accuracy. Additionally, we plan to further optimize smart contract execution on the Hedera platform by investigating lightweight transaction encoding and parallelized logic structures. Field-scale evaluations will be conducted in real-world autonomous driving scenarios to assess the robustness of the system under continuous high-frequency data ingestion. Beyond autonomous vehicles, the proposed framework can be extended to other data-intensive fields—such as medical diagnostics, industrial automation, and smart city infrastructure—where tamper-evident and trustworthy data validation is vital for system performance and safety.

Author Contributions

Conceptualization, Y.S.; Methodology, Y.S.; Software, Y.S. and Y.C.; Investigation, Y.S.; Writing—original draft preparation, Y.S.; Supervision, Y.P.; Project administration, Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Institute of Information and Communications Technology Planning and Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2022-00207391: Development of Hashgraph-based Blockchain Enhancement Scheme and Implementation of Testbed for Autonomous Driving). The APC was funded by the same agency.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The simulation data generated using the CARLA autonomous driving simulator are available upon reasonable request.

Acknowledgments

The authors thank the Institute of Information and Communications Technology Planning and Evaluation (IITP) for their support.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Iordache, S.; Patilea, C.C.; Paduraru, C. Enhancing Autonomous Vehicle Safety with Blockchain Technology: Securing Vehicle Communication and AI Systems. Future Internet 2024, 16, 471.
2. Dikmen, M.; Burns, C. Trust in autonomous vehicles: The case of Tesla Autopilot and Summon. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1093–1098.
3. Author, A.; Author, B. A Study to Investigate the Role and Challenges Associated with Deep Learning Algorithms in Autonomous Vehicles. World Electr. Veh. J. 2024, 15, 518.
4. Author, C.; Author, D. Blockchain-Based Data-Preserving AI Learning Environment Model. Appl. Sci. 2020, 10, 4718.
5. Ali, M.S.; Rehman, M.H.U.; Salah, K.; Jayaraman, R. Blockchain-Enabled Secure Communication for Intelligent Transportation Systems. Electronics 2023, 12, 152.
6. Bendiab, G.; Hameurlaine, A.; Germanos, G.; Kolokotronis, N.; Shiaeles, S. Autonomous Vehicles Security: Challenges and Solutions Using Blockchain and Artificial Intelligence. IEEE Trans. Intell. Transp. Syst. 2023, 24, 3614–3637.
7. Sharma, P.K.; Park, J.H. Blockchain based hybrid network architecture for the smart city. Future Gener. Comput. Syst. 2018, 86, 650–655.
8. Lu, Y.; Huang, X.; Dai, Y.; Maharjan, S.; Zhang, Y. Blockchain and federated learning for privacy-preserved data sharing in industrial IoT. IEEE Trans. Ind. Inform. 2019, 16, 4177–4186.
9. Author, E.; Author, F. Enhancing Security and Accountability in Autonomous Vehicles Using Blockchain Technology. Electronics 2023, 12, 4998.
10. Author, G.; Author, H. Blockchain-Empowered AI for 6G-Enabled Internet of Vehicles. Electronics 2021, 11, 3339.
11. Salah, K.; Rehman, M.H.U.; Nizamuddin, N.; Al-Fuqaha, A. Blockchain for AI: Review and Open Research Challenges. IEEE Access 2019, 7, 10127–10149.
12. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 15 April 2025).
13. Chauhan, A.; Malviya, O.P.; Verma, M.; Mor, T.S. Blockchain and Scalability. In Proceedings of the 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C), Lisbon, Portugal, 16–20 July 2018; pp. 122–128.
14. Castro, M.; Liskov, B. Practical Byzantine Fault Tolerance. In Proceedings of the OSDI, New Orleans, LA, USA, 22–25 February 1999; pp. 173–186.
15. Buchman, E.; Kwon, J.; Milosevic, Z. The Latest Gossip on BFT Consensus. arXiv 2018, arXiv:1807.04938.
16. Baird, L. The Swirlds Hashgraph Consensus Algorithm: Fair, Fast, Byzantine Fault Tolerance; Swirlds Technical Report; SWIRLDS-TR-2016-01; Swirlds: Dallas, TX, USA, 2016; pp. 9–11.
17. Lasy, T. From Hashgraph to a Family of Atomic Broadcast Algorithms. arXiv 2019, arXiv:1912.05895.
18. Miller, A.; Xia, Y.; Croman, K.; Shi, E.; Song, D. The Honey Badger of BFT Protocols. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 31–42.
19. Duan, S.; Reiter, M.K.; Zhang, H. BEAT: Asynchronous BFT Made Practical. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 2028–2041.
20. Jha, S.; Jha, N.; Prashar, D.; Ahmad, S.; Alouffi, B.; Alharbi, A. Integrated IoT-Based Secure and Efficient Key Management Framework Using Hashgraphs for Autonomous Vehicles to Ensure Road Safety. Sensors 2022, 22, 2529.
21. Bai, Y.; Lee, S.; Seo, S.H. A Survey on Directed Acyclic Graph-Based Blockchain in Smart Mobility. Sensors 2025, 25, 1108.
22. Bhumichai, D.; Smiliotopoulos, C.; Benton, R.; Kambourakis, G.; Damopoulos, D. The Convergence of Artificial Intelligence and Blockchain: The State of Play and the Road Ahead. Information 2024, 15, 268.
23. Haque, E.U.; Abbasi, W.; Almogren, A.; Choi, J.; Altameem, A.; Rehman, A.U.; Hamam, H. Performance Enhancement in Blockchain-Based IoT Data Sharing Using Lightweight Consensus Algorithm. Sci. Rep. 2024, 14, 26561.
24. Baird, L.; Harmon, M.; Madsen, P. Hedera: A Public Hashgraph Network and Governing Council; White Paper; Hedera Hashgraph, LLC.: Richardson, TX, USA, 2019; Volume 1, pp. 9–10.
25. Lai, R.; Zhao, G.; He, Y.; Hou, Z. A Robust Sharding-Enabled Blockchain with Efficient Hashgraph Mechanism for MANETs. Appl. Sci. 2023, 13, 8726.

Figure 1. Operational flow of the system: (a) Blockchain (b) Hashgraph.

Figure 2. Proposed model.

Figure 3. Structure of a smart contract.

Figure 4. Uploading time comparison.

Figure 5. Checking time comparison.

Table 1. Differences between this study and prior research.

Study	Core Features	Validation	Multi-Platform
Lasy [17]	Hash Only	No	No
Jha et al. [20]	Hash + Contract	No	No
Bhumichai et al. [22]	None	Yes	No
Our Work	Hash + Contract	Yes	Yes

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
