Blockchain-Based Decentralized Identity Management System with AI and Merkle Trees
Abstract
1. Introduction
2. Background
2.1. Centralized and Decentralized Identity Models
2.2. OCR and Document Field Extraction
2.3. Data Collection and Preprocessing
- Camera-based: Users manually photographed cards under real-world conditions. Preprocessing included resolution normalization, rotation correction, corner detection, and perspective transformation.
- NFC-based: Grayscale images were extracted from embedded NFC chips, offering high fidelity with minimal preprocessing beyond alignment and integrity checks.
2.4. Merkle Tree Anchoring for Image Verification
3. Related Work
Comparison with Existing Frameworks
4. Experiment Setup
- Identity Provider (IdP): Manages the issuance, verification, revocation, and update of identity claims using Merkle Trees.
- Service Provider (SP): Consumes verified identity data to offer services, performing verification based on Merkle hash proofs.
- User: Owns identity attributes stored as Merkle Tree hashes on-chain and interacts with SPs via zero-knowledge-based proofs.
- User Enrollment: Users submit required documents to the Identity Provider. These documents are processed using AI to extract key identity features. OCR technology converts visual content to text, which is subsequently verified by a freelance identity confirmation service. The verified data is encoded into a Merkle Tree structure, with the root hash published on the blockchain, allowing for tamper-proof reference to user attributes.
- Identity Verification: During identity proofing, Service Providers request users to submit Merkle Tree hash certificates. There are two service types:
  - Only requires verification of the identity status and its validator (i.e., the IdP), relying solely on the Merkle root hash.
  - Requires access to specific verified attributes (e.g., address, ID number). The user must approve each access request, reinforcing privacy.
- Authentication: After identity verification, users receive login credentials and a Merkle certificate. These credentials allow for secure login while enabling SPs to verify identity attributes cryptographically without accessing raw data. A challenge–response protocol is also employed to ensure the user’s proof of possession of the claimed attributes.
- Access and Revocation Control: The IdP enforces fine-grained access controls. Only SPs authorized by the user can access the requested attributes. Users retain full authority to revoke or update access permissions. Any update made to an attribute is rehashed and published, with SPs automatically notified through Merkle tree linkage.
4.1. User Enrollment
- Account Initialization via MetaMask: Users begin account creation with MetaMask, a widely used Ethereum wallet and browser extension. MetaMask generates a public–private key pair, allowing users to manage their accounts on the Ethereum-compatible Substrate blockchain and interact with the platform’s decentralized applications (dApps).
- Submission of Personal Documents: Users capture images of the required documents (e.g., passport, driver’s license) with their device’s camera. These images are then encrypted and uploaded to the Substrate blockchain using the public key generated by MetaMask. The blockchain creates a unique hash for each uploaded document, serving as a secure reference.
- Data Extraction: The uploaded document images are processed by a dedicated AI server. This server employs advanced object detection algorithms to identify and extract relevant pieces of information from the documents. The extracted information is segmented into smaller components, such as name and date of birth. Each component undergoes Optical Character Recognition (OCR) to convert it into text. To maintain a link to the original document, a unique identifier (e.g., a hash value) is generated for each extracted component.
- Attribute Generation: The Identity Provider (IdP) uses the OCR-processed information to generate privacy attribute tokens. These tokens are then published on the Attribute Repository Contract (ARC) and are associated with the user’s public key. This association ensures that users can prove ownership of their attributes securely.
- Identity Verification: After completing the registration steps, the system initiates the identity verification process to confirm the authenticity of the user’s information. Various verification methods, including document validation and biometric data checks, are employed to ensure the user’s identity is legitimate.
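The Data Extraction step above links each extracted component back to its source document through a unique identifier. A minimal sketch of one way to derive such an identifier follows. The function name `component_id`, the length-prefixed encoding, and the sample values are illustrative assumptions, not the scheme used in the paper.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()


def component_id(document_hash: str, label: str, text: str) -> str:
    """Derive an identifier tying one OCR component to its source document.

    Hypothetical construction: hash the document hash, the field label,
    and the OCR text together.
    """
    parts = [document_hash.encode(), label.encode(), text.encode()]
    # Length-prefix every part so ("ab", "c") and ("a", "bc") cannot collide.
    payload = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return sha256_hex(payload)


# Placeholder bytes stand in for the uploaded document image.
document_hash = sha256_hex(b"<document image bytes>")
fields = {"Name": "User A", "DateOfBirth": "1990-01-01"}  # sample values
ids = {label: component_id(document_hash, label, text)
       for label, text in fields.items()}
```

Because the document hash is part of the input, the same field text extracted from two different documents yields two different identifiers.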
4.2. Identity Verification
- For each of the 10 labeled segments on the card image, the segmented image region and the corresponding OCR-extracted text are hashed separately.
- Additionally, a comprehensive hash of the entire document image is computed.
- Distribution is managed carefully to prevent any single freelancer from obtaining a complete set of identity data, thereby preserving privacy and reducing potential misuse.
- Freelancers receive specific segment tasks, e.g., verifying whether the OCR-extracted text “User A” matches the image “Name.jpg” labeled as “NAME”.
- For each verification task, freelancers’ signed confirmations are directly recorded on the blockchain:
  - Example: Label: NAME, Image: Name.jpg, Text: “User A”, Verifier: Verifier A
  - Example: Label: NAME, Image: Name.jpg, Text: “User A”, Verifier: Verifier B
- Verifier ID
- Verified text
- Associated segment image
- Timestamp
- It is committed to a blockchain smart contract linked to the user’s public key.
- The IdP or a Third-Party Verifier (TPV) reviews the freelance-verified segments.
- Final attestation includes signatures confirming the integrity and correctness of each attribute.
- The Attribute Repository Contract (ARC) is updated.
- Verified attributes become available for controlled querying by authorized service providers.
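The distribution rule described above (no single freelancer sees a complete set of identity data, and each segment is confirmed by more than one verifier) can be sketched as a capped random assignment. The function `assign_segments`, the cap of three segments per verifier, and the verifier names are assumptions made for illustration. The paper does not specify these parameters.

```python
import random
from collections import defaultdict


def assign_segments(labels, verifiers, per_segment=2, max_per_verifier=3, seed=None):
    """Randomly give each segment to `per_segment` distinct verifiers while
    capping how many segments of one document any single verifier receives."""
    if per_segment > len(verifiers):
        raise ValueError("fewer verifiers than required confirmations per segment")
    if len(labels) * per_segment > len(verifiers) * max_per_verifier:
        raise ValueError("verifier pool too small for the per-verifier cap")
    rng = random.Random(seed)  # a real deployment would need unpredictable randomness
    load = defaultdict(int)    # segments of this document already given to each verifier
    plan = {}
    for label in labels:
        # Only verifiers still under their cap are eligible.
        eligible = [v for v in verifiers if load[v] < max_per_verifier]
        # Shuffle, then stable-sort by load: least-loaded first, ties broken randomly.
        rng.shuffle(eligible)
        eligible.sort(key=lambda v: load[v])
        chosen = eligible[:per_segment]
        if len(chosen) < per_segment:
            raise RuntimeError("ran out of eligible verifiers")
        for v in chosen:
            load[v] += 1
        plan[label] = chosen
    return plan


labels = ["Name", "DateOfBirth", "Sex", "Region", "Address", "Status",
          "DateOfExpiration", "No", "DateOfBegin", "PeriodOfStay"]
verifiers = [f"verifier-{i}" for i in range(8)]  # hypothetical pool
plan = assign_segments(labels, verifiers, seed=1)
```

With 10 segments, two confirmations each, and a cap of three, at least seven verifiers are needed, and no verifier sees more than three of the ten fields.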
4.3. Authentication Process
- The hash of the OCR-verified text or the segment image.
- The necessary sibling hashes forming the Merkle Path up to the Merkle Root.
- The extracted OCR text associated with the segment.
- Retrieves the Merkle Root associated with the user’s public key from the blockchain.
- Reconstructs the Merkle Root from the provided leaf node and Merkle Path.
- Validates the integrity and correctness of the provided proof against the stored Merkle Root.
- Optionally checks the blockchain records for the associated verification history (freelancer verifications, IdP attestations if needed).
- If the proof matches the stored Merkle Root, authentication succeeds, and the SP can proceed with onboarding or service provisioning.
- If validation fails, authentication is denied, and the user may be asked to resubmit or perform additional verification steps.
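The SP-side check described above (rebuild the root from the leaf and the Merkle Path, then compare it with the root stored on-chain) can be sketched as follows. The path encoding, with an "L" or "R" flag stating on which side each sibling sits, and the concatenation order are assumptions. The paper does not state its exact proof format.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest as raw bytes."""
    return hashlib.sha256(data).digest()


def verify_proof(leaf: bytes, path: list[tuple[str, bytes]], root: bytes) -> bool:
    """Fold the sibling hashes into the leaf hash and compare with the stored root.

    `path` lists (side, sibling_hash) pairs from the leaf level upward, where
    side is "L" if the sibling is the left child and "R" if it is the right child.
    """
    node = leaf
    for side, sibling in path:
        if side == "L":
            node = h(sibling + node)
        elif side == "R":
            node = h(node + sibling)
        else:
            raise ValueError(f"unknown side flag: {side!r}")
    return node == root


# Hand-built four-leaf tree with sample attribute values.
l0, l1, l2, l3 = (h(x) for x in (b"User A", b"1990-01-01", b"F", b"Region X"))
n01, n23 = h(l0 + l1), h(l2 + l3)
stored_root = h(n01 + n23)  # stands in for the root fetched from the chain

# Proof for the third leaf: its right-hand sibling, then the left-hand subtree.
proof = [("R", l3), ("L", n01)]
ok = verify_proof(l2, proof, stored_root)
tampered = verify_proof(h(b"M"), proof, stored_root)
```

Here `ok` is true and `tampered` is false: changing the leaf changes every hash on the way up, so the rebuilt root no longer matches the stored one.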
4.4. Access and Revocation Control
- Which service providers have access to which verified fields.
- When the access was granted.
- Whether any expiration time is set.
- Freelancer verifications.
- Identity Provider (IdP) or Third-Party Verifier (TPV) attestations.
- Select the verified attribute(s) (e.g., Date of Birth, Nationality).
- Submit a “Grant Access” transaction to the blockchain, recording:
  - User’s public key.
  - Attribute hash (segment hash).
  - Service Provider ID.
  - Grant timestamp.
  - Optional expiration time.
- Select the SP and attributes they wish to revoke.
- Submit a “Revoke Access” transaction to the blockchain.
- The SP will no longer be able to retrieve or verify Merkle Proofs related to the revoked attributes.
- Smart contract enforcement ensures immediate denial of any new queries.
- Permanently recorded on-chain with timestamps.
- Visible to both the user and the affected SP.
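The grant and revoke transactions above can be modeled off-chain to make the intended semantics concrete. The sketch below is a plain Python stand-in for the smart contract logic. The class names `Grant` and `AccessRegistry`, the integer timestamps, and the tuple-based event log are illustrative assumptions rather than the deployed contract interface.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Grant:
    granted_at: int
    expires_at: Optional[int] = None  # None means no expiration
    revoked_at: Optional[int] = None  # set once the user revokes


@dataclass
class AccessRegistry:
    # (user_key, attr_hash, sp_id) -> Grant
    grants: dict = field(default_factory=dict)
    # Append-only history, mirroring the permanent on-chain record.
    log: list = field(default_factory=list)

    def grant(self, user_key, attr_hash, sp_id, now, expires_at=None):
        self.grants[(user_key, attr_hash, sp_id)] = Grant(now, expires_at)
        self.log.append(("GRANT", user_key, attr_hash, sp_id, now))

    def revoke(self, user_key, attr_hash, sp_id, now):
        g = self.grants.get((user_key, attr_hash, sp_id))
        if g is None or g.revoked_at is not None:
            raise KeyError("no active grant to revoke")
        g.revoked_at = now
        self.log.append(("REVOKE", user_key, attr_hash, sp_id, now))

    def is_allowed(self, user_key, attr_hash, sp_id, now) -> bool:
        """Check performed before an SP may query or verify an attribute."""
        g = self.grants.get((user_key, attr_hash, sp_id))
        if g is None or g.revoked_at is not None:
            return False
        return g.expires_at is None or now < g.expires_at


registry = AccessRegistry()
registry.grant("user-pk", "attr-hash", "sp-1", now=100, expires_at=200)
allowed_before = registry.is_allowed("user-pk", "attr-hash", "sp-1", now=150)
registry.revoke("user-pk", "attr-hash", "sp-1", now=160)
allowed_after = registry.is_allowed("user-pk", "attr-hash", "sp-1", now=170)
```

Every query goes through `is_allowed`, so a revocation takes effect on the next query, and the log keeps both events with their timestamps.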
4.5. Formal Threat Model and Potential Attack Surfaces
- OCR Spoofing: Adversaries may attempt to manipulate identity images to bypass OCR detection. This is mitigated by incorporating human-in-the-loop verification from independent freelance verifiers.
- Verifier Collusion: Multiple freelance verifiers may collude to validate false information. To reduce this risk, we apply a reputation-based scoring system and randomized assignment of verifiers.
- Blockchain Replay Attacks: Reusing valid Merkle proofs from a previous session may allow unauthorized access. This is mitigated through time-stamped transactions and revocation mechanisms.
- Unauthorized Data Access: Intercepting Merkle leaf data or image fragments could lead to data leakage. We prevent this by transmitting only hashed data and verifying proofs without exposing raw identity attributes.
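For the replay threat above, one common way to enforce freshness is a single-use, time-limited challenge issued by the SP. The sketch below illustrates only that bookkeeping. The class `ChallengeBook` and the 60-second window are assumptions, and the signature that would bind the response to the user's key is omitted. The paper itself names only time-stamped transactions and revocation as mitigations.

```python
import secrets


class ChallengeBook:
    """Tracks challenges issued by an SP so each proof presentation is fresh."""

    def __init__(self, max_age_s: float = 60.0):
        self.max_age_s = max_age_s
        self._issued = {}  # nonce -> time of issue

    def issue(self, now: float) -> str:
        """Create a random nonce that the user must return with the Merkle proof."""
        nonce = secrets.token_hex(16)
        self._issued[nonce] = now
        return nonce

    def redeem(self, nonce: str, now: float) -> bool:
        """Accept a nonce at most once, and only within the freshness window."""
        issued_at = self._issued.pop(nonce, None)  # pop makes the nonce single-use
        if issued_at is None:
            return False  # unknown or already used: treat as a replay
        return now - issued_at <= self.max_age_s


book = ChallengeBook(max_age_s=60.0)
nonce = book.issue(now=1000.0)
first_use = book.redeem(nonce, now=1010.0)
replayed = book.redeem(nonce, now=1011.0)
```

A proof captured from an earlier session carries a nonce that has already been redeemed or has expired, so replaying it is rejected even though the Merkle proof itself is still valid.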
Quantitative Analysis of Freelancer Security Economics
- Economic Incentive Model: Formally model the staking and reward mechanism for freelancers using game-theoretic frameworks. Define clear mathematical relationships between staking levels, rewards, penalties, and verifier accuracy. Reference economic models from existing decentralized systems (e.g., Sovrin and uPort) to justify parameter choices [12,13].
- Reputation Scoring Simulation: Establish a quantitative reputation model, incorporating parameters such as historical accuracy, task completion rate, and reliability. Perform sensitivity analysis on reputation score impact on freelancer incentives and security outcomes. Propose a weighted voting mechanism based on reputation scores to enhance system robustness against dishonest verifications.
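As a starting point for the reputation model and weighted voting mechanism proposed above, the sketch below combines the three named parameters into a score and tallies verifier votes by that score. The linear form, the weights (0.6, 0.2, 0.2), and the 0.5 acceptance threshold are placeholder assumptions that the planned sensitivity analysis would need to calibrate.

```python
def reputation(accuracy: float, completion_rate: float, reliability: float,
               weights=(0.6, 0.2, 0.2)) -> float:
    """Weighted sum of historical accuracy, task completion rate, and reliability.

    All inputs are expected in [0, 1]; the weights are illustrative.
    """
    wa, wc, wr = weights
    return wa * accuracy + wc * completion_rate + wr * reliability


def weighted_vote(votes, threshold: float = 0.5) -> bool:
    """Accept a segment if the approving share of total reputation exceeds `threshold`.

    `votes` is a list of (reputation_score, approves) pairs.
    """
    total = sum(score for score, _ in votes)
    if total <= 0:
        raise ValueError("total reputation must be positive")
    approving = sum(score for score, approves in votes if approves)
    return approving / total > threshold


# One high-reputation approval outweighs two low-reputation rejections.
votes = [(0.9, True), (0.3, False), (0.3, False)]
accepted = weighted_vote(votes)
```

In this example the approving share is 0.9 / 1.5 = 0.6, so the segment is accepted even though a simple head count would reject it. The same property means a few colluding low-reputation accounts cannot outvote an established verifier.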
4.6. Scalability and Cost Analysis
- Gas-Cost Evaluation: Deploy the BDIMS smart contracts on a public Ethereum test-net (e.g., Goerli or Sepolia). Record the gas cost breakdown of key functions, including enrollment, attribute verification, Merkle root commitment, and attribute query transactions. This provides practical insights into deployment feasibility and cost-effectiveness in real-world scenarios [5,13].
- Concurrent User Latency Testing: Conduct a stress test to measure the latency of end-to-end identity enrollment and verification processes under high load (from 1000 to 10,000 concurrent users). Use benchmarking tools such as Apache JMeter or Gatling to simulate multiple simultaneous interactions. Collect latency data to identify performance bottlenecks and propose optimization strategies.
- Throughput vs. Block Time Analysis: Perform experiments varying blockchain block time parameters (e.g., on Ethereum-compatible Substrate chains) and measure throughput in terms of transactions per second (TPS). Plot results showing throughput changes relative to block times, thus clearly demonstrating BDIMS scalability limitations and ideal operational conditions.
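Before running the block-time experiments, a back-of-the-envelope upper bound helps set expectations: on an EVM-style chain, throughput is limited by how many transactions fit into one block divided by the block time. The sketch below computes that bound. The gas limit and per-transaction gas figures are hypothetical round numbers, not measurements from BDIMS.

```python
def max_tps(block_gas_limit: int, gas_per_tx: int, block_time_s: float) -> float:
    """Theoretical ceiling on transactions per second for a gas-metered chain."""
    if gas_per_tx <= 0 or block_time_s <= 0:
        raise ValueError("gas_per_tx and block_time_s must be positive")
    tx_per_block = block_gas_limit // gas_per_tx  # whole transactions only
    return tx_per_block / block_time_s


# Hypothetical parameters: 30M gas per block, 100k gas per Merkle-root commitment.
for block_time in (3.0, 6.0, 12.0):
    print(f"block time {block_time:>4} s -> at most "
          f"{max_tps(30_000_000, 100_000, block_time):.1f} TPS")
```

Measured throughput will fall below this ceiling because of network propagation, mempool contention, and mixed transaction types, which is what the planned throughput versus block time plots are meant to quantify.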
5. Results
5.1. Detection of Identity Attributes with YOLO
5.2. Training and Validation Trends
5.3. Confusion Matrix Analysis
5.4. OCR Extraction Accuracy
5.5. Cryptographic Verification Using Merkle Tree
- Each extracted attribute region was hashed using SHA-256.
- All hashes were combined to build a Merkle Tree whose root represents the complete document.
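The two steps above can be sketched in a few lines. Duplicating the last node on odd-sized levels is one common convention and is an assumption here, as are the placeholder region bytes. The paper does not state how odd levels are handled.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest as raw bytes."""
    return hashlib.sha256(data).digest()


def merkle_root(regions: list[bytes]) -> bytes:
    """Hash every attribute region, then pair hashes upward until one root remains."""
    if not regions:
        raise ValueError("at least one region is required")
    level = [h(r) for r in regions]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Placeholder bytes stand in for the ten cropped attribute regions.
regions = [f"region-{i}".encode() for i in range(10)]
root = merkle_root(regions)

# Any change to a single region produces a different root.
modified = regions.copy()
modified[4] = b"region-4-edited"
root_after_edit = merkle_root(modified)
```

With ten leaves the levels shrink from 10 to 5, 3, 2, and 1 nodes, so a proof for any leaf contains four sibling hashes.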
5.5.1. Verification Results
5.5.2. Robustness Against Modifications
5.6. Summary and Future Work
- Gas-Cost Evaluation: Deploying the BDIMS smart contracts on a public Ethereum test-net (e.g., Goerli or Sepolia) to measure detailed gas consumption for enrollment, attribute verification, Merkle root commitments, and attribute queries, thereby providing practical insights into real-world deployment costs.
- Concurrent User Latency Testing: Conducting rigorous load tests to measure the end-to-end latency of identity enrollment and verification processes under scenarios involving 1000 to 10,000 concurrent users, using established performance benchmarking tools.
- Throughput vs. Block Time Analysis: Experimentally evaluating system throughput against varying blockchain block times on Ethereum-compatible Substrate chains, presenting clear throughput versus block time plots to demonstrate scalability under realistic network conditions.
- Quantitative Security Economics Analysis: Performing detailed quantitative modeling of freelancer-based security economics, specifically:
  - Simulating collusion thresholds among freelancers and proposing effective randomized task assignments to mitigate potential collusion.
  - Developing an economic incentive model incorporating staking, rewards, and penalties using game-theoretic frameworks to clearly define relationships between verifier accuracy and economic incentives.
  - Establishing a quantitative reputation scoring mechanism and analyzing its impact on security outcomes through sensitivity analysis and weighted voting schemes.
- Empirical Validation of Threat Models: Conducting extensive adversarial simulations and robustness testing, particularly targeting sophisticated attack scenarios including freelancer collusion, deepfake generation, and advanced OCR spoofing.
- LLM Integration for Enhanced OCR Accuracy: Integrating large language models (LLMs) for post-OCR text correction and semantic validation, particularly beneficial for handling noise, irregular layouts, and multilingual content, to further enhance extraction accuracy.
6. Conclusions
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Bradley, J.; Hill, B.; Sakimura, N. Identity Management Using a Centralized Authority. In Proceedings of the 2014 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA), Hong Kong, China, 14–18 July 2014; pp. 1–8.
- Ziegeldorf, J.H.; Morchon, O.G.; Wehrle, K. Privacy in the Internet of Things: Threats and Challenges. Secur. Commun. Netw. 2014, 7, 2728–2742.
- Callas, J. Decentralized Identity: A New Approach to Identity Management. IEEE Secur. Priv. 2021, 19, 12–18.
- Allen, C. The Path to Self-Sovereign Identity. Life with Alacrity, 25 April 2016. Available online: http://www.lifewithalacrity.com/2016/04/the-path-to-self-soverereign-identity.html (accessed on 4 January 2024).
- Ravidas, S.; Nguyen, K.; Oualha, N. Decentralized identity: A survey on emerging trends and challenges. IEEE Access 2022, 10, 14038–14060.
- Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 4 May 2024).
- Mougayar, W. The Business Blockchain: Promise, Practice, and Application of the Next Internet Technology; Wiley: Hoboken, NJ, USA, 2016.
- Kouhizadeh, M.; Sarkis, J. Blockchain Practices, Potentials, and Perspectives in Greening Supply Chains. Sustainability 2017, 9, 3652.
- Hardjono, T.; Lipton, A.; Pentland, A. Towards an Interoperability Architecture Blockchain Autonomous Systems. IEEE Trans. Eng. Manag. 2020, 67, 1296–1306.
- Reed, D.; Sporny, M.; Longley, D.; Allen, C.; Sabadello, M.; Chadwick, D. Decentralized Identifiers (DIDs) v1.0. World Wide Web Consortium (W3C), Working Draft. 2021. Available online: https://www.w3.org/TR/did-core/ (accessed on 4 October 2024).
- Smith, R. An Overview of the Tesseract OCR Engine. In Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), Curitiba, Brazil, 23–26 September 2007; Volume 2, pp. 629–633.
- Sovrin Foundation. Sovrin: A Protocol and Token for Self-Sovereign Identity and Decentralized Trust. 2021. Available online: https://sovrin.org/library/sovrin-protocol-and-token-white-paper/ (accessed on 4 February 2025).
- Veramo. Veramo: Modular Framework for Decentralized Identity. 2024. Available online: https://veramo.io (accessed on 4 February 2025).
- Microsoft. ION: A Decentralized Identifier Network Built on Bitcoin. 2022. Available online: https://identity.foundation/ion/ (accessed on 4 February 2025).
- Alsobeh, A.M.R.; Magableh, A.A. BlockASP: A framework for AOP-based model checking in blockchain systems. IEEE Access 2023, 11, 115062–115075.
- Abuhasan, F.; Ashqar, H.I.; Alsobeh, A.M.R.; Darwish, O. Blockchain-based national digital identity framework—Case of Palestine. In Proceedings of the International Conference on Intelligent Computing, Communication, Networking and Services (ICCNS2024), Dubrovnik, Croatia, 24–27 September 2024.
Feature | Sovrin | uPort | ION | BDIMS (This Work) |
---|---|---|---|---|
Blockchain Platform | Hyperledger Indy | Ethereum | Bitcoin (Sidetree) | Ethereum-compatible |
Merkle Tree Verification | No | No | No | Yes |
AI-based OCR | No | No | No | Yes |
On-chain Data Policy | Minimal | Off-chain | Anchored logs | Hash-only |
Latency/Throughput | Moderate | Variable | High latency | Low latency |
User-Controlled Access | Yes | Yes | Yes | Yes |
Regulatory Compliance (e.g., GDPR) | Partial | In Progress | Limited | Designed for Compliance |
Smart Contract Utilization | Minimal | High | None | Modular/Lightweight |
Security Audit | Public Review | Informal | Open Spec | Verifiable Fields |
Field | Accuracy (%) | Standard Deviation |
---|---|---|
Name | 98.5 | 0.4 |
DateOfBirth | 98.2 | 0.3 |
Sex | 97.8 | 0.5 |
Region | 97.0 | 0.6 |
Address | 92.1 | 0.9 |
Status | 96.7 | 0.7 |
DateOfExpiration | 96.3 | 0.8 |
No | 97.4 | 0.5 |
DateOfBegin | 97.6 | 0.6 |
PeriodOfStay | 97.1 | 0.6 |
Attribute | Value |
---|---|
Label | Name |
Hash | 741e29ed02e49629943666ed2be39a92be0fba16e715220ee273a2eb014be582 |
Merkle Root | 567929a89761a8676885abcaa00f8fd5e03469f4a2f0cd73719e60ddf273aed3 |
Proof Length | 4 hashes |
Verification Result | Valid (✓) |
Metric | Average Value | Notes |
---|---|---|
Proof Generation Time | 18 ms | For 10 attributes |
Merkle Proof Size | 128 bytes (1024 bits) | 4 hashes, each 256 bits |
Verification Time | 7 ms | On a standard CPU |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Le, H.V.A.; Nguyen, Q.D.N.; Tadashi, N.; Tran, T.H. Blockchain-Based Decentralized Identity Management System with AI and Merkle Trees. Computers 2025, 14, 289. https://doi.org/10.3390/computers14070289