Article

Self-Sovereign Identities and Content Provenance: VeriTrust—A Blockchain-Based Framework for Fake News Detection

by Maruf Farhan 1, Usman Butt 2,*, Rejwan Bin Sulaiman 3 and Mansour Alraja 4,*

1 STEM Department, University of Sussex, The International Study Centre (ISC), Brighton BN1 9RH, UK
2 College of Engineering and Information Technology, Ajman University, Ajman P.O. Box 346, United Arab Emirates
3 Department of Computer Science and Engineering, Northumbria University, London E1 7HT, UK
4 Newcastle Business School, Northumbria University, Newcastle upon Tyne NE1 8ST, UK
* Authors to whom correspondence should be addressed.
Future Internet 2025, 17(10), 448; https://doi.org/10.3390/fi17100448
Submission received: 12 August 2025 / Revised: 13 September 2025 / Accepted: 24 September 2025 / Published: 30 September 2025
(This article belongs to the Special Issue AI and Blockchain: Synergies, Challenges, and Innovations)

Abstract

The widespread circulation of digital misinformation exposes a critical shortcoming in prevailing detection strategies, namely, the absence of robust mechanisms to confirm the origin and authenticity of online content. This study addresses this gap by introducing VeriTrust, a conceptual and provenance-centric framework designed to establish content-level trust by integrating Self-Sovereign Identity (SSI), blockchain-based anchoring, and AI-assisted decentralized verification. The proposed system is designed to operate through three key components: (1) issuing Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs) through Hyperledger Aries and Indy; (2) anchoring cryptographic hashes of content metadata to an Ethereum-compatible blockchain using Merkle trees and smart contracts; and (3) enabling a community-led verification model enhanced by federated learning with future extensibility toward zero-knowledge proof techniques. Theoretical projections, derived from established performance benchmarks, suggest the framework offers low latency and high scalability for content anchoring and minimal on-chain transaction fees. It also prioritizes user privacy by ensuring no on-chain exposure of personal data. VeriTrust redefines misinformation mitigation by shifting from reactive content-based classification to proactive provenance-based verification, forming a verifiable link between digital content and its creator. VeriTrust, while currently at the conceptual and theoretical validation stage, holds promise for enhancing transparency, accountability, and resilience against misinformation attacks across journalism, academia, and online platforms.

1. Introduction

1.1. Background and Motivation

The digital era has revolutionized how news is created and shared, making information more accessible and enhancing global communication while simultaneously enabling the rapid spread of misinformation and fake news. Major events, such as the COVID-19 pandemic and various global political events, have spotlighted the dangerous societal impacts of fake news and the need for reliable tools to verify content and attribute responsibility to creators [1]. Traditional centralized verification models struggle to scale in today’s fast-moving, multi-platform information landscape.
Emerging technologies such as blockchain and self-sovereign identity (SSI) frameworks offer transformative potential towards decentralized, transparent, and tamper-proof systems for content verification and identity management. Blockchain ensures immutability, transparency, and cryptographic security, providing compelling solutions for establishing content provenance and maintaining an auditable trail of information lifecycle [2]. Meanwhile, SSI allows individuals control over their digital identities via cryptographically verifiable credentials, sidestepping reliance on central authorities [3].
By merging blockchain and SSI, smarter, decentralized fake news detection systems can be built that tackle both technical validation and the deeper issue of author accountability. However, this convergence remains largely unexplored in the academic literature, leaving an important research gap that this study addresses.

1.2. Problem Statement

Existing fake news detection tools have several key weaknesses that hinder their effectiveness in real-world deployments. Traditional machine learning (ML) approaches show promise on controlled datasets but falter when applied to dynamic, real-world scenarios, especially where contextual data about content creators and disseminators is lacking. Most current systems focus heavily on content itself, neglecting the equally vital task of verifying who created and shared it, which is crucial for establishing trust in digital information ecosystems.
A major shortfall is the lack of robust mechanisms to trace digital content’s origin and transformation. Without this, identifying misinformation sources or validating authenticity becomes nearly impossible. Dependence on centralized fact-checkers or platform moderation introduces scalability issues and risks of bias.
Identity-related threats, including impersonation and spoofing, further exacerbate the problem. In environments like social media, malicious users can easily create fake profiles and disseminate false information. The absence of verifiable credentials leaves digital ecosystems vulnerable.
While blockchain-based approaches have been proposed as a solution [4], existing implementations often suffer from slow processing speeds and high computational demands, making them unsuitable for widespread deployment. Real-time detection systems must operate efficiently while handling massive volumes of content without compromising security or reliability, a challenge yet to be fully solved.

1.3. Research Objectives

The first objective of this study is to design VeriTrust, a comprehensive blockchain-based framework that integrates SSI principles with content tracking, forming a scalable and secure fake news detection model. VeriTrust addresses key deficiencies by enabling trustworthy author identification, persistent content audit trails, and high-performance verification processes.
The second objective is to propose an architecture that unites SSI with blockchain-led content validation, allowing creators to be held accountable while respecting user privacy and autonomy as core tenets of SSI.
Finally, the third objective is to establish an evaluation framework that measures accuracy, scalability, security, and user experience, helping to assess real-world readiness and uncover paths for refinement and broader adoption.

1.4. Contributions

This work offers several critical contributions, including:
  • VeriTrust Framework: A pioneering solution that combines SSI and blockchain to detect fake news, filling a major gap in current academic discourse.
  • Innovative Cryptographic Protocols: Introduction of privacy-preserving protocols, such as digital signatures and zero-knowledge proofs, that ensure secure identity verification and content validation.
  • Comprehensive Evaluation Framework: A holistic system to assess technical metrics, security robustness, and user experience, grounded in practical deployment scenarios including regulatory and integration considerations.
Beyond technical impact, this research also advances blockchain’s potential for public-good applications. Insights from this work could inform systems beyond misinformation control, such as digital identity verification and trusted content infrastructures.
It should be emphasised that VeriTrust is presented as a conceptual framework rather than a fully implemented prototype. This study focuses on architectural design, mathematical validation of the cryptographic foundations, and workflow-level pseudocode. A detailed elaboration of the integration of SSI (Hyperledger Indy/Aries), blockchain anchoring (Hyperledger Besu), and decentralized storage (IPFS) is included, while full-scale prototype deployment and empirical benchmarking are reserved for future work. Accordingly, the claims made in this paper should be interpreted as demonstrating feasibility and conceptual validity, rather than providing implementation-level performance metrics.

1.5. Paper Organization

The remainder of this paper is structured as follows:
  • Section 2 explores the related literature on fake news detection, SSI frameworks, and blockchain-based content tracking, identifying key gaps.
  • Section 3 details the VeriTrust architecture and design specifications.
  • Section 4 covers implementation, including technology choices and optimization strategies.
  • Section 5 presents the discussion and comparative analysis.
  • Section 6 concludes by summarizing contributions and proposing future research directions.

2. Literature Review

2.1. Knowledge-Based Fake News Detection

The authors [5] aimed to develop a novel and effective method for detecting and classifying fake news by leveraging sequential pattern mining to uncover distinguishing linguistic patterns between real and fake news. They proposed a framework called Fake News Analysis and Classification Using Sequential Pattern Mining (FNACSPM), which involves preprocessing six diverse datasets, extracting frequent word sequences using SPM algorithms such as TKS, CM-SPAM, and ERMiner, and utilizing these patterns as features for classification via machine learning models. While the proposed framework improves classification performance and interpretability, it still relies heavily on text content and may struggle with data from nontextual modalities. Furthermore, the need for preprocessing and parameter tuning could limit scalability and real-time applications.
Another study [6] reviewed prior detection techniques and proposed an enhanced hybrid model combining ML, deep learning, and emotional and frequency-based features. The suggested new system was optimized through intelligent search algorithms, and the approach improved performance but lacked robust support for multilingual content, cross-source fact-checking, and deep emotional tone analysis.
In [7], the EPFMKG (Entity debiasing framework of Multi-view and Knowledge Graph) framework was introduced, aiming to reduce entity bias and improve future generalization by modelling emotional and stylistic features with knowledge graphs. By training entity and content features separately and excluding entity prediction at test time, it showed robust performance on both Chinese and English datasets, particularly in forecasting future news through time-based divides.
A dual BERT-based deep neural network model [8] enhanced detection accuracy by jointly analysing article titles and bodies. The model was trained and evaluated using authentic datasets, demonstrating superior performance compared to standalone BERT and traditional ML approaches. Nonetheless, the system still struggled to differentiate fake news that closely resembled authentic news in writing style and structure, necessitating future enhancements for wider applicability.
Lastly, the LGT (long-range Graph Transformer) model [9] was introduced by researchers who targeted early rumor detection on social media, overcoming GNN’s limitation in capturing long-range user dependencies. It includes the Bot Detection Module, scoring user credibility, and the Rumor Detection Module, combining publisher-news graphs, user interaction features through graph transformers, and text content features using CNNs, fused with attention mechanisms for prediction. Tested on Twitter 15/16 and Weibo16, it achieved 94%+ accuracy. However, evolving bot strategies and privacy concerns pose challenges for future adoption, and future work aims to improve dynamic data modelling, robustness against attacks, and explore AGI applications for social media analysis.

2.2. Style-Based Fake News Detection

Style-based detection methods focus on linguistic and psychological features such as readability, sentiment, and entity usage. A notable study [10] introduced MCRED (Message Credibility), a multi-modal credibility model using text, images, and user interactions with graphical neural networks and contrastive learning to detect misinformation. Despite strong performance, this model struggled with neutral or scientific-toned misinformation and evolving deepfake tactics. These evolving tactics make it harder for the model to generalize and stay effective without continuous updates.
In [11], classic and deep learning models like KNN, Naïve Bayes, logistic regression, and LSTM (Long Short-Term Memory) were trained using TF-IDF and Word2Vec text representations on the ISOT dataset. Performance was evaluated using accuracy, precision, recall, and F1-score. LSTM achieved 99.2% accuracy but lacked validation across diverse datasets (limited to the ISOT dataset) and did not explore newer models like BERT or RoBERTa that could improve findings.
In [12], a deep neural network model was developed using linguistic and psychological features such as sentiment polarity, syntactic complexity, and readability scores. The system achieved 90% accuracy on the FakeNewsNet dataset, but its reliance on handcrafted features makes it vulnerable to domain shifts and writing style mimicry.
A subsequent investigation presented the NER-SA framework [13], which applied named entity recognition (NER) methodologies for stylometric detection, boosting accuracy by 8.9% and F1-score by 24% on the LIAR dataset and accuracy by up to 19.4% on the fake/real news corpus. Yet, it may fail against mimicry of authentic entity patterns or in low-resource languages.
In a more recent approach, Wu et al. proposed SheepDog, a style-robust detection framework that enhances resilience against fake news written in LLM-generated styles using adversarial augmentation and attribution-based learning [14]. While effective, this model still requires frequent retraining as generative language models evolve.

2.3. Propagation-Based Fake News Detection

Previous studies in this area analyze how misinformation spreads within single social media platforms such as Twitter or Facebook, with little attention to cross-platform behaviour. Although LSTM-based models showed high accuracy (up to 99.2%) on datasets like ISOT and reached 94.1% on the MPPFND (Multi-Platform Propagation Fake News Detection) dataset [15], these results stem from limited comparisons and short-term data. Cross-platform propagation behaviour remains underexplored, and no single study has yet fully integrated textual features with platform-specific behavioural patterns.
CoST [16] integrates structural and temporal features from propagation graphs using transformers. When evaluated on Weibo and Twitter datasets, it outperformed static GNNs by 5–7% in ROC-AUC (above 0.93). However, its dependence on detailed temporal data can hinder performance in cases of delay or missing links.
GETAE [17] merges text and propagation signal via a graph-enhanced ensemble architecture, showing 6–8% improvements in F1-score on Twitter-15/16, demonstrating the effectiveness of combined propagation and linguistic signals. Yet, the model assumes complete graph data and does not include any mechanism to verify content origin or author identity.
The CSDA (Causal Subgraph-oriented Domain Adaptive fake news detection) model [18] focuses on extracting causal propagation substructures to improve domain adaptation and generalization in unseen settings. It yielded 7–16% better accuracy in zero-shot OOD (out-of-distribution) tasks across multiple platforms. Despite strong generalization, CSDA lacks identity and provenance verification mechanisms, remaining platform-agnostic.

2.4. Source-Based Fake News Detection

Instead of analysing content, source-based approaches evaluate the metadata, reputation, and social behaviour of publishers or users to gauge credibility [19].
Study [20] explored factors like misinformation frequency, page structure, and author professionalism across platforms, showing their clear influence on credibility scoring. However, it remained descriptive and lacked an operational detection model, highlighting the difficulty of automating these indicators effectively.
The ExFake framework [21] provides an explainable AI (artificial intelligence) system that scores credibility based on user behaviour and interaction with fact-checking sources. Evaluated on real social media data, it outperformed the baseline by roughly 8–12% in F1-score. Nonetheless, it depends on centralized trust metrics and lacks cryptographic safeguards for source authenticity.
Us-DeFake [22] introduces a dual-layer graph (news and user) that incorporates user credibility into fake news detection. Applied to Twitter data, it yielded 5–10% F1-score improvements over propagation-only models. Yet, its reliance on behavioural patterns makes it vulnerable to Sybil attacks and fraudulent identities.

2.5. Blockchain Media and Content Verification

Blockchain technology has gained attention in the media domain for its potential to ensure transparency, traceability, and trust in digital content ecosystems. The following subsections discuss key aspects of this work.

2.5.1. Content Authentication and Integrity

The application of blockchain technology to content authentication has emerged as a critical research area, driven by the urgent need to combat misinformation and establish digital content integrity. Ref. [23] demonstrates significant advances in cryptographic verification mechanisms that leverage blockchain’s immutable ledger properties, developing a blockchain-based privacy-preserving authentication system for ensuring multimedia content integrity. Their work addresses fundamental challenges in maintaining data integrity while preserving user privacy, establishing theoretical foundations for scalable content verification systems.
Ref. [24] expands on these foundations by exploring digital resource copyright protection schemes based on blockchain cross-chain technology. Their research addresses the expansion of the digital publishing industry that has led to increasingly rampant copyright infringement of digital resources, proposing blockchain technology as a comprehensive solution. The authors demonstrate how hybrid architectures combining traditional digital watermarking techniques with blockchain-based verification systems can address the fundamental challenge of balancing security with computational efficiency.
In [25], a blockchain-based digital copyright protection system was proposed that incorporates sharding network technology to address the growing challenges of copyright infringement caused by rapid dissemination and easy duplication of digital content. This study highlights how blockchain’s decentralized structure, tamper-resistance, and irreversible record-keeping offer effective and practical solutions to many of the limitations found in traditional third-party copyright protection methods.

2.5.2. Copyright Protection and Digital Rights Management

The advancement of blockchain in copyright protection marks a major step forward in digital rights management. The authors of [26] examine its use in managing music copyrights through a case study of the VNT chain platform, developing automated systems capable of functioning without centralized intermediaries. This research utilizes smart contracts to embed intricate licensing terms, showcasing how blockchain-based approaches can effectively support copyright management.
Ref. [26] further addresses recent network technology developments that have made digital works easy to publish online, making infringement of digital works, which can severely damage data integrity, a leading issue in the field. Their blockchain-based solution provides transparent and immutable records of copyright ownership and licensing, automatically enforcing licensing terms and distributing royalties without traditional intermediaries.
Ref. [27] advances this research through their investigation of new media digital copyright registration and trading systems based on blockchain technology. Their work addresses challenges posed by the current new media environment, where traditional copyright mechanisms struggle to keep pace with rapid technological advancement. The research provides innovative solutions for content creators to establish and protect their intellectual property rights in digital environments, demonstrating scalable approaches to copyright management.

2.5.3. Systematic Reviews and Meta-Analysis

Systematic literature reviews have provided comprehensive analyses of blockchain applications across multiple domains. Recent systematic reviews have investigated the current status, classification, and open issues in blockchain-based applications, providing valuable insights into the state of research and development [28]. These reviews examine application fields, platforms, and consensus protocols, highlighting the rapid growth and significant interest in blockchain as the underlying technology behind various digital applications.
Despite blockchain’s advantages, including decentralization, data integrity, traceability, and immutability, significant limitations remain, including scalability, resource intensiveness, and governance challenges. Contemporary research has focused on addressing these limitations through various technological innovations and architectural improvements, with particular emphasis on developing solutions that can operate at internet scale while maintaining security properties.

2.5.4. Self-Sovereign Identity (SSI): DIDs and Verifiable Credentials

Self-sovereign identity (SSI) represents a paradigm shift in digital identity management, transferring control from centralized authorities to individual users. At the core of SSI systems are decentralized identifiers (DIDs) and verifiable credentials (VCs), which provide privacy-preserving, cryptographically verifiable claims of identity.
SSI architectures are grounded in cryptographic trust models that ensure non-repudiation and selective disclosure. Recent work has emphasized the importance of anchoring identity on decentralized networks to enhance privacy, eliminate intermediary reliance, and enable cross-domain authentication [29,30]. However, most studies focus on identity control and privacy preservation without establishing direct links between user identity and content authorship.
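As a concrete illustration of the VC concept discussed above, a role credential might be serialized per the W3C Verifiable Credentials Data Model as sketched below. All DIDs, dates, and claim values are placeholders, and the signature is elided; this is not a credential from any cited system.

```python
import json

# Field names follow the W3C VC Data Model; every value below is a placeholder.
credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:journalism-school",    # DID of the issuing authority
    "issuanceDate": "2025-01-15T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:reporter-123",         # DID of the credential holder
        "role": "Certified Journalist",           # the attested claim
    },
    # In a real deployment the proof carries the issuer's signature over the
    # credential, verifiable against a key listed in the issuer's DID document.
    "proof": {
        "type": "Ed25519Signature2020",
        "verificationMethod": "did:example:journalism-school#key-1",
        "proofValue": "<signature elided>",
    },
}

print(json.dumps(credential, indent=2))
```

Because the `credentialSubject.id` is a DID rather than a name or email address, a verifier can check the claim without learning anything the holder did not choose to disclose.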
Several large-scale SSI implementations have emerged globally, offering practical insights into both the potential and limitations of self-sovereign identity systems:
  • European Blockchain Services Infrastructure (EBSI): EBSI aims to deliver blockchain-enabled cross-border public services across EU member states, using SSI for educational credentials and healthcare certification. Pilots have issued tamper-proof diplomas and COVID-19 vaccination certificates as verifiable credentials, shared securely without re-verification from issuers [31,32,33].
  • Hyperledger Indy and Sovrin Network: The Sovrin Foundation built a decentralized identity ecosystem on Hyperledger Indy. In the healthcare sector, Epic Systems’ MyChart application uses Indy-based credentials for privacy-preserving medical data sharing [34,35]. British Columbia’s government also launched the Person Credential initiative, issuing over 100,000 digital identities since 2020 [36].
  • Educational Sector Implementations: MIT has issued over 10,000 blockchain-based diplomas, enabling global verification [37].
  • Public Sector Examples: Estonia’s e-Residency program supports more than 100,000 global digital residents with decentralized identity principles [38]. Singapore’s SingPass has integrated blockchain-based features for secure citizen authentication, serving over 4 million users [39].
Although pilot projects show promise, self-sovereign identity still struggles with scale, regulatory ambiguity, interoperability gaps, complex user experience, and issuer governance overhead [3,40,41,42,43]. At the same time, the full spectrum of fake news detectors—from linguistic classifiers to graph transformers—remains fragmented: blockchains guarantee immutability yet ignore authorship, SSI authenticates people but not their content, and provenance models log edits without a trustworthy identity anchor. No existing solution therefore provides an end-to-end trust pipeline that proves who created a story and how it evolved. Table 1 distils representative work in each domain and highlights where these gaps persist.

2.5.5. Digital Content Provenance

Digital content provenance has emerged as a critical mechanism to ensure trust in media authenticity by tracing the origins, agents, and transformations of digital artifacts. Core models such as the W3C PROV standard provide a formal ontology of Entities, Activities, and Agents that underpin causal chains in data evolution, enabling traceability in formats like PROV-N or PROV-O [44]. While semantically rich, W3C PROV implementations often rely on centralized storage or proprietary metadata formats, which limit trust guarantees in adversarial or decentralized settings [45]. To mitigate these limitations, modern frameworks integrate W3C PROV with cryptographic primitives and blockchain-based anchoring. Merkle trees, originally introduced by Ralph Merkle [46], support scalable and tamper-evident batch verification, while anchoring the computed Merkle root on an immutable blockchain ledger ensures permanent auditability.
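The batch-anchoring pattern can be sketched in a few lines: hash each item of content metadata into a leaf, fold the leaves pairwise into a single root that one on-chain transaction can anchor, and later prove any item's inclusion against that root. This is an illustrative stand-in, not the implementation of any system cited above.

```python
import hashlib

def sha256(data):
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Fold leaf hashes pairwise into a single root (Bitcoin-style:
    the last node is duplicated on levels of odd length)."""
    if not leaves:
        raise ValueError("empty leaf set")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves, index):
    """Collect the sibling hashes needed to recompute the root for one leaf."""
    level = [sha256(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))  # (hash, sibling-is-left?)
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path

def verify_proof(leaf, path, root):
    h = sha256(leaf)
    for sibling, sibling_is_left in path:
        h = sha256(sibling + h) if sibling_is_left else sha256(h + sibling)
    return h == root

# One on-chain transaction can anchor a whole batch of metadata hashes:
batch = [b"article-1-metadata", b"article-2-metadata", b"article-3-metadata"]
root = merkle_root(batch)
proof = merkle_proof(batch, 1)
assert verify_proof(b"article-2-metadata", proof, root)     # inclusion holds
assert not verify_proof(b"tampered-metadata", proof, root)  # tampering detected
```

Only the 32-byte root goes on-chain; each proof is logarithmic in the batch size, which is what makes the scheme tamper-evident yet scalable.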
For instance, the ForensiBlock system proposed by Akbarfam et al. combines Merkle tree constructions with blockchain-based access control to track forensic events and support efficient case history extraction [47]. Similarly, the FACTchain model embeds structured provenance metadata into Ethereum smart contracts to establish immutable edit trails for journalistic content, thereby enhancing transparency and accountability [48]. Scalability constraints, especially in handling voluminous multimedia content, are addressed through hybrid on-chain/off-chain architectures. In these models, only cryptographic hashes and essential metadata are stored on-chain, while original content (e.g., videos, images, full-text documents) is hosted on decentralized platforms like IPFS and Web3.Storage. This ensures verifiability through content-addressable identifiers without inflating on-chain storage.
Recent provenance systems have also incorporated forensic validation tools such as DeepForensics, OpenCV-based perceptual hashing, and semantic integrity scorers to verify visual authenticity at the pixel level [49]. These tools allow the detection of adversarial manipulations that may bypass traditional metadata analysis. Despite these advances, key challenges persist: real-time adaptability in fast-evolving media environments, the integration of privacy-preserving techniques such as zero-knowledge proofs, and the lack of interoperability across blockchain ecosystems. As a result, current provenance solutions remain fragmented, failing to deliver an integrated framework that links content authorship, lifecycle tracking, and forensic verification in a cryptographically verifiable manner. This gap motivates the VeriTrust framework introduced in Section 3, which seeks to address these limitations through a unified architecture combining SSI, blockchain anchoring, and AI-driven content analysis. As summarized in Table 1, we consolidate representative approaches across fake news detection, blockchain provenance, and self-sovereign identity systems to highlight persistent limitations, thereby motivating the unified architecture proposed in our VeriTrust framework.
Despite these challenges, efforts continue to develop more effective blockchain-based solutions for real-world problems. Researchers are addressing methodological challenges that limit blockchain applications. Studies have been conducted in areas such as supply chain [50,51], Higher education [52], Transportation [53], electoral voting system [54], fake product detection [55], medical devices [56], and cybersecurity [57].

3. VeriTrust: A Blockchain-SSI Framework for Content Provenance and Fake News Detection

The “VeriTrust” framework is proposed as a comprehensive blockchain-based solution designed to combat fake news by integrating Self-Sovereign Identities (SSI) with robust content provenance and advanced detection mechanisms. This framework aims to shift from a reactive, post-publication fact-checking model to a proactive, source- and process-verified trust model, fundamentally altering the economics and dynamics of misinformation. Instead of merely detecting and correcting fake news after it spreads, VeriTrust aims to establish verifiable authenticity at the point of creation and throughout its lifecycle.

3.1. Architectural Overview

The VeriTrust framework is composed of three interrelated layers, each vital to verifying content authenticity and combating misinformation:
  • Decentralized Identity and Credential Layer (SSI layer): This foundational layer manages digital identities through Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs), allowing users to control their own identity data without reliance on centralized providers.
  • Content Provenance Layer (Blockchain Layer): This layer acts as the system’s core and maintains an immutable record of content from its origin, promoting transparency and preventing tampering throughout its lifecycle [47].
  • Fake News Detection and Verification Layer (AI/Community Layer): Utilizing sophisticated AI algorithms for initial content analysis, this layer also incorporates a decentralized, community-led mechanism to support verification and resolve disputes [58].
To ensure interoperability, the entire framework aligns with established standards from W3C for DIDs and VCs [34]. To address the scalability challenges associated with storing large volumes of data directly on a blockchain, VeriTrust adopts an on-chain/off-chain storage model. Only essential metadata, such as cryptographic hashes of content, DIDs, and timestamps, are immutably recorded on a public or permissioned blockchain. The actual content (articles, images, videos) is stored off-chain on decentralized storage solutions like IPFS or Web3.Storage. This approach balances the need for immutable verification records with practical scalability requirements.
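The on-chain/off-chain split described above can be sketched as follows. Here `on_chain_ledger` and `off_chain_store` are in-memory stand-ins for the blockchain and for IPFS/Web3.Storage, and all identifiers are illustrative rather than drawn from the framework's implementation.

```python
import hashlib
import time

# In-memory stand-ins: in practice the ledger would be an Ethereum-compatible
# chain and the content store IPFS/Web3.Storage; names here are illustrative.
on_chain_ledger = []   # small metadata records only
off_chain_store = {}   # content-addressable storage keyed by hash

def register_content(content, creator_did):
    """Record hash + DID + timestamp on-chain; keep the full bytes off-chain."""
    content_hash = hashlib.sha256(content).hexdigest()
    off_chain_store[content_hash] = content
    on_chain_ledger.append({
        "contentHash": content_hash,
        "creatorDid": creator_did,
        "timestamp": int(time.time()),
    })
    return content_hash

def verify_content(content):
    """Anyone can recompute the hash and look for a matching on-chain record."""
    h = hashlib.sha256(content).hexdigest()
    return any(record["contentHash"] == h for record in on_chain_ledger)

cid = register_content(b"Full article text ...", "did:example:reporter-123")
assert verify_content(b"Full article text ...")      # original verifies
assert not verify_content(b"Tampered article text")  # any change fails
```

Because the off-chain store is keyed by the same hash that the ledger records, the on-chain entry doubles as a content-addressable pointer, mirroring how an IPFS CID works.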

3.2. Self-Sovereign Identity Integration

The SSI layer is fundamental to establishing trust at the source of information:
  • User Registration and DID Creation: All participants in the content ecosystem, including content creators (e.g., journalists, citizen reporters), publishers, and fact-checkers, register on the VeriTrust platform. During registration, they create their unique Decentralized Identifiers (DIDs). This process ensures that users retain full control over their digital identities, eliminating reliance on centralized services for identity management [59].
  • Verifiable Credentials for Roles and Expertise: To enhance credibility, trusted entities such as journalism schools, professional associations, or established media organizations can issue Verifiable Credentials (VCs) to verify a user’s specific role (e.g., “Certified Journalist,” “Expert Fact-Checker”) or demonstrated expertise [36]. These VCs are held securely by the user in their digital wallet and can be selectively disclosed as Verifiable Presentations (VPs) only when necessary for specific verification processes [37].
  • Digital Signatures for Content Authentication: As a key component of the SSI integration, every content submission and any later edit must be digitally signed with the private key associated with the creator’s or editor’s Decentralized Identifier (DID) [47]. This cryptographic linkage provides irrefutable proof of origin and authorship, safeguards against unauthorized changes, and ensures non-repudiation of the content.
  • Access Control: Leveraging SSI principles, the framework enables granular, user-controlled access to their identity data. Users can selectively disclose specific attributes or credentials for verification purposes, maintaining a high degree of privacy and autonomy over their personal information.
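The selective-disclosure behaviour described above can be sketched in a few lines. The example below is a minimal illustration with hypothetical field names; it is not the W3C Verifiable Presentation format itself, which additionally carries cryptographic proofs:

```python
def build_presentation(credential: dict, disclosed: set) -> dict:
    """Return a presentation exposing only the requested credential attributes."""
    return {
        "holderDID": credential["holderDID"],
        "issuerDID": credential["issuerDID"],
        "claims": {k: v for k, v in credential["claims"].items() if k in disclosed},
    }

vc = {
    "holderDID": "did:example:alice",
    "issuerDID": "did:example:press-association",
    "claims": {"role": "Certified Journalist", "name": "Alice", "memberId": "A-1042"},
}

# Disclose only the role; the name and membership number stay in the wallet.
vp = build_presentation(vc, {"role"})
```

In a real deployment, the verifier would additionally check a proof over the presentation, so that undisclosed attributes remain both hidden and unforgeable.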

3.3. Content Provenance Mechanism

The Content Provenance Layer, built upon blockchain technology, provides an immutable and verifiable history for every piece of content:
  • Initial Content Registration: When a piece of content is first created or uploaded to the platform, its cryptographic hash, along with the creator’s DID and a timestamp, is immutably recorded on the blockchain [60]. This marks the genesis point of the content, establishing its verifiable origin.
  • Lifecycle Tracking: Any subsequent significant events in the content’s lifecycle, such as modifications, re-publications, or flags from fact-checking processes, are also hashed, timestamped, and linked to the relevant DIDs (e.g., the editor’s DID, the fact-checker’s DID) on the blockchain. This creates a transparent and verifiable chain of custody, allowing anyone to trace the content’s complete history.
  • Metadata Documentation: Comprehensive metadata associated with the content—including details about the creation device, geographical location, and links to original source materials—is securely linked to its blockchain record. This rich contextual information provides crucial data for verification purposes.
  • On-chain/Off-chain Storage Model: To ensure scalability while maintaining immutability, only cryptographic hashes and essential provenance metadata are stored on-chain. The full, larger content files (e.g., high-resolution images, videos, full articles) are stored on decentralized off-chain storage solutions like IPFS or Web3.Storage. This hybrid approach optimizes blockchain performance and storage efficiency.

3.4. Fake News Detection Components

The Fake News Detection and Verification Layer integrates advanced computational analysis with community oversight:
  • AI-Powered Content Analysis: A multimodal deep learning model is employed to analyze content across various formats, including text, images, and video, for indicators of misinformation. This AI model leverages federated learning, allowing it to be trained across distributed nodes without centralizing sensitive user data, thereby preserving privacy while maintaining high detection accuracy [61]. This provides an initial, automated assessment of content authenticity.
  • Decentralized Voting and Challenge Mechanism: For content flagged as suspicious by the AI or challenged by users, a community-driven decentralized voting system is activated. Verified fact-checkers, whose expertise is attested by VCs, and potentially a broader community of users, can review the content and vote on its authenticity. Stake-based verification models can be implemented to incentivize honest participation and penalize malicious actors [62]. This hybrid intelligence model, combining AI’s efficiency with human nuance, creates a resilient, self-improving system that leverages the strengths of both while mitigating their individual weaknesses. The ability of such a challenge mechanism to correct a high percentage of AI misclassifications even with adversarial participation is a strong indicator of its robustness.
  • Reputation System: A blockchain-based reputation mechanism tracks the historical accuracy and reliability of all participants: content creators, publishers, and fact-checkers [63]. Participants with a higher reputation, built on consistent contributions of verified content or accurate fact-checking, may be granted more voting power or influence within the decentralized verification process.
  • Smart Contracts for Rules and Incentives: Smart contracts are crucial for automating the execution of verification rules, managing the distribution of token rewards to incentivize accurate fact-checking, and implementing penalties for malicious or dishonest behaviour within the system [59].
  • Provenance-Based Authenticity Checks: The detection system directly integrates with the immutable provenance data recorded on the blockchain. By cross-referencing the content with its creator’s DID, modification history, and associated metadata, the system can perform robust authenticity checks, providing a foundational layer of trust that goes beyond surface-level content analysis.
Figure 1 illustrates the overarching architecture of the VeriTrust framework, structured across four principal components: user interaction, decentralized identity, content provenance, and community-driven verification. These components are functionally interconnected, collectively enabling end-to-end content integrity, reliable source authentication, and resilient, tamper-evident recordkeeping.
As a supplement to the visual overview, Table 2 summarizes the core technological constituents of the VeriTrust framework, namely Decentralized Identifiers (DIDs), Verifiable Credentials (VCs), blockchain ledgers, and federated AI models. For each component, the table delineates its function within the broader system and references the corresponding standards or enabling technologies. This detailed mapping serves as a foundational blueprint for both implementation and validation.

4. Implementation and Validation

This section provides a comprehensive implementation and validation strategy for the VeriTrust framework. It aims to substantiate the theoretical model by translating its components into an operational system, evaluating its cryptographic and performance integrity, and examining potential trade-offs. By systematically elaborating each step of the implementation pipeline—from component selection to mathematical verification and simulation—we demonstrate that VeriTrust is not only conceptually sound but also practically viable. The section is structured into eight detailed subsections covering system architecture, workflow execution, algorithmic logic, layered design, known limitations, formal validation, benchmarking methodology, and reproducibility artifacts. Together, these subsections offer the empirical and analytical foundations for VeriTrust’s security, scalability, and applicability in real-world content provenance and verification use cases.

4.1. System Components and Tool Stack

The implementation of VeriTrust relies on a stack of open-source, standards-compliant technologies chosen for their maturity, security, and alignment with decentralized principles.
  • Blockchain Ledger: Hyperledger Besu is selected as the blockchain platform. As an Ethereum-compatible, permissioned client, it supports smart contracts written in Solidity and offers high throughput, making it suitable for enterprise-level applications like credential management and high-frequency provenance logging [64]. Its adoption by the European Blockchain Services Infrastructure (EBSI) further validates its robustness [31].
  • SSI Framework: Hyperledger Aries and Indy provide the core components for Self-Sovereign Identity. Hyperledger Indy serves as a purpose-built distributed ledger for managing DIDs, while Hyperledger Aries provides a toolkit for building digital wallets and agent-to-agent communication, facilitating the issuance and presentation of VCs. DID methods like did:web or did:key can be used for simple, ledger-independent identity creation.
  • Smart Contracts: Solidity is the language of choice for implementing the on-chain logic. Smart contracts will manage the DID registry, the anchoring of Merkle roots for content provenance, and the rules for the decentralized voting and reputation system.
  • Off-Chain Storage: IPFS (InterPlanetary File System) is used for storing large content files and detailed provenance records off-chain. This approach addresses blockchain scalability limitations by only storing an immutable cryptographic hash (the IPFS CID) on the ledger, which points to the full data.
  • Verifiable Credentials: Credentials will adhere to the W3C VC Data Model, serialized in JSON-LD format. This ensures interoperability and allows credentials to be machine-readable and cryptographically verifiable across different platforms.
  • Frontend Verifier: A browser plugin or API-based service will act as the primary interface for consumer verification. This client-side tool will locally hash content, query the blockchain for the corresponding Merkle root, retrieve the Merkle proof from IPFS, and validate the VC signature against the issuer’s public DID document.
The core software components used in the system, including the version used and each component’s security role are listed in Table 3.
A public repository of reference artifacts has been released to enhance the reproducibility and demonstrate the architectural feasibility of this conceptual work. The repository includes Docker/Compose templates, a minimal smart contract sketch, sample data formats, and a standalone verifier stub that validates the core cryptographic logic (hashing and Merkle inclusion) in an offline setting. These resources are not part of the deployed system and are intended to be illustrative. The repository is publicly available [65] at: https://github.com/marufrigan/veritrust-artifacts (accessed on 25 July 2025).

4.2. Implementation Workflow and Data Flow

The data flow within VeriTrust follows a logical sequence from identity creation to final verification, as depicted in Figure 2.
  • Identity Registration: A creator (e.g., a journalist) and a publisher (e.g., a news agency) each generate their own DIDs and associated key pairs using a compatible digital wallet (built with Hyperledger Aries). Their public keys and service endpoints are published in their respective DID Documents, which are registered on the DID ledger (Hyperledger Indy [66]).
  • VC Issuance: The publisher, acting as a trusted Issuer, creates a W3C-compliant Verifiable Credential asserting the creator’s role (e.g., “Verified Journalist”). This VC is cryptographically signed with the publisher’s private key and sent to the creator, who stores it securely in their digital wallet [66].
  • Content Creation and Hashing: The creator produces a piece of content (e.g., an article). The client application locally computes a SHA-256 hash of the content, creating a unique and tamper-evident fingerprint.
  • Provenance Logging: The creator’s application bundles the content hash with relevant metadata (timestamp, author DID, etc.) into a provenance record. This record is digitally signed using the creator’s private key. The record is then added to a batch of other records, which are used to construct a Merkle Tree. The single Merkle root of this tree is submitted to the VeriTrust smart contract on the Hyperledger Besu blockchain. The full provenance record and the content itself are stored on IPFS.
  • Verification: A consumer’s client (e.g., a browser plugin) encounters the content. It performs the following steps:
    Locally computes the SHA-256 hash of the content it is viewing.
    Queries the blockchain for the transaction containing the claimed provenance.
    Retrieves the Merkle proof and the signed provenance record from IPFS.
    Uses the Merkle proof to validate that its locally computed hash is included in the on-chain Merkle root.
    Resolves the creator’s DID to fetch their public key and verifies the digital signature on the provenance record.
    Optionally, it can request the creator’s VC and verify the publisher’s signature on it, confirming their credentials.
Content Canonicalization for Dynamic Updates: To mitigate the risk of “hash drift” caused by benign content modifications (e.g., small edits or formatting changes), VeriTrust applies a canonicalization step prior to hashing. While immutable core elements such as the author’s DID and the original timestamp are preserved, mutable components (e.g., metadata updates or headline corrections) are versioned and linked to new hashes. This approach ensures the traceability of content evolution without compromising the integrity of the original provenance commitment [67,68].

4.3. Algorithms and Pseudocode

To operationalize the core functionalities of VeriTrust—namely, the submission and verification of content provenance—this section presents two primary algorithms. These algorithms are designed to (i) encode and register the provenance of a content item in a secure and scalable manner, and (ii) verify the integrity and authenticity of content from the perspective of an external verifier. The structure, inputs, and procedures of each algorithm are detailed below in pseudocode and supplemented with narrative explanations. These processes directly map onto the cryptographic foundations discussed in Section 4.6, ensuring traceability, non-repudiation, and computational soundness.

4.3.1. Content Canonicalization for Stable Hashing

VeriTrust incorporates a deterministic canonicalization pipeline prior to computing the content hash, ensuring that identical content consistently generates a stable cryptographic fingerprint. This process eliminates superficial variations that could otherwise lead to inconsistent digests.
The canonicalization process comprises three layers:
  • Textual Data Normalization—To unify character encodings (e.g., different Unicode forms of accented characters are treated equivalently), all textual data is standardized to Unicode Normalization Form C (NFC), eliminating additional whitespace, line breaks, and non-semantic formatting characters.
  • Structured Data Serialization—To ensure the same logical record always produces the same serialized representation, regardless of platform or editor differences, JSON-based metadata is processed by sorting keys lexicographically and serializing values in a uniform format without extra whitespaces.
  • Multimedia Content Preprocessing—To prevent irrelevant changes (such as a device automatically rewriting metadata) from altering the fingerprint of substantive content, hashes of binary artifacts (e.g., images, video, audio) are computed over their raw content bytes after removing non-essential metadata (e.g., EXIF tags in JPEGs, container-level timestamps in MP4 files).
By incorporating canonicalization as a preprocessing step, VeriTrust ensures that the hash values embedded in Merkle trees remain stable, reproducible, and resilient to benign content updates, while still capturing meaningful alterations through versioned re-anchoring events [67,68].
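The textual and structured-data layers above can be sketched as follows. This is a minimal illustration using Python’s standard library; the helper names are ours, and the multimedia branch is omitted:

```python
import hashlib
import json
import unicodedata


def canonicalize_text(text: str) -> str:
    """Normalize to Unicode NFC and collapse non-semantic whitespace."""
    nfc = unicodedata.normalize("NFC", text)
    return " ".join(nfc.split())


def canonicalize_metadata(metadata: dict) -> str:
    """Serialize JSON metadata with sorted keys and no extra whitespace."""
    return json.dumps(metadata, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def stable_fingerprint(text: str, metadata: dict) -> str:
    """SHA-256 digest over the canonical forms of content and metadata."""
    payload = canonicalize_text(text) + "\n" + canonicalize_metadata(metadata)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two superficially different encodings of the same article yield one digest:
# "é" as a single code point vs. "e" plus a combining acute accent, plus
# differing whitespace and JSON key order.
a = stable_fingerprint("Caf\u00e9 report", {"lang": "en", "v": 1})
b = stable_fingerprint("Cafe\u0301  report", {"v": 1, "lang": "en"})
```

Here `a == b` holds, whereas any substantive edit to the text or metadata values produces a different digest.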

4.3.2. Algorithm 1: Content Provenance Submission

The content submission process initiates when a content creator (e.g., a journalist) generates digital material, which is accompanied by organized metadata and a Verifiable Credential (VC) provided by a verified publisher. The submission algorithm’s goal is to cryptographically link these components and securely establish their provenance and authorship.
Algorithm 1 formalizes the content-provenance submission workflow; Table 4 lists its inputs, and Figure 3 shows the flowchart.
Algorithm 1. Content Submission.
  • Input: Content, metadata, verifiable credential (VC), creatorPrivateKey
  • Generate a cryptographic hash of the content using SHA-256.
    • contentHash ← SHA256(content)
  • Create a structured provenance record:
    • provenanceRecord = { contentHash, timestamp, creatorDID, vcID, metadata }
  • Sign the provenance record with the creator’s private key.
    • signedRecord ← signWithKey(provenanceRecord, creatorPrivateKey)
  • Add the signed record to the current batch and update the Merkle tree.
    • merkleRoot ← updateMerkleTree(signedRecord)
  • Submit the Merkle root to the blockchain via a smart contract.
    • transaction ← submitToBlockchain(merkleRoot)
  • Store the full signed record and Merkle proof off-chain (e.g., on IPFS).
    • storeOffChain(signedRecord, getMerkleProof(signedRecord))
  • Output: transaction.id
The submission algorithm begins by computing a cryptographic hash of the content, enabling immediate detection of any alteration. Each structured record contains metadata and a link to the creator’s Verifiable Credential, tying the content to a verified and identifiable origin. Signing the record with the creator’s private key establishes clear proof of authorship and prevents any denial of origin. The signed record is then inserted into a Merkle tree, an efficient data structure that represents thousands of records through a single root hash. The root is immutably written to the blockchain, while the complete record and its verification path are stored on IPFS. This split between on-chain and off-chain storage enhances both scalability and compliance with privacy regulations.
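Steps 1–4 of Algorithm 1 can be sketched as follows. This is an illustrative, self-contained rendering: the signer is a placeholder standing in for ECDSA over the creator’s DID key, and the on-chain submission and IPFS storage steps are stubbed out:

```python
import hashlib
import json
import time


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def merkle_root(leaves: list) -> str:
    """Compute the Merkle root of a list of hex-encoded leaf hashes."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:                 # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256_hex((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]


def submit_content(content: bytes, metadata: dict, creator_did: str,
                   vc_id: str, sign, batch: list) -> dict:
    """Hash, assemble, sign, and batch a provenance record (Algorithm 1, steps 1-4)."""
    record = {
        "contentHash": sha256_hex(content),
        "timestamp": int(time.time()),
        "creatorDID": creator_did,
        "vcID": vc_id,
        "metadata": metadata,
    }
    record["signature"] = sign(json.dumps(record, sort_keys=True).encode())
    batch.append(sha256_hex(json.dumps(record, sort_keys=True).encode()))
    # A real deployment would now submit merkle_root(batch) to the smart
    # contract and store the record plus its Merkle proof on IPFS.
    return {"record": record, "merkleRoot": merkle_root(batch)}


# Placeholder signer (NOT ECDSA): stands in for signing with the creator's key.
fake_sign = lambda msg: sha256_hex(b"creator-private-key" + msg)

batch = []
out = submit_content(b"Example article body", {"title": "Example"},
                     "did:example:alice", "vc:123", fake_sign, batch)
```

Each additional call with the same `batch` list extends the tree, and the returned root changes accordingly, which is what gets anchored per cycle.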

4.3.3. Algorithm 2: Content Verification

The verification algorithm is employed by end-users or content-hosting platforms aiming to confirm the authenticity of encountered content. It verifies the integrity of the material and evaluates the credibility of its creator and issuing authority by leveraging decentralized cryptographic evidence.
Algorithm 2 formalizes the content-verification workflow. The required inputs for the verification procedure are summarized in Table 5.
Algorithm 2. Content Verification.
Inputs: content, signedRecord, vc, merkleProof, onChainMerkleRoot
Output: True if integrity and credentials validate; otherwise False
1. Compute content hash:
 computedHash ← SHA256(content)
2. Validate inclusion and integrity:
 isMerkleProofValid ← verifyMerkleProof(signedRecord, merkleProof, onChainMerkleRoot)
 isHashMatch   ← (computedHash = signedRecord.contentHash)
 if not (isMerkleProofValid ∧ isHashMatch) then
   return False  // integrity validation failed
3. Verify digital signatures and credentials:
 creatorDID    ← signedRecord.creatorDID
 creatorPublicKey  ← resolveDID(creatorDID)
 isRecordSignatureValid ← verifySignature(signedRecord, creatorPublicKey)
 issuerDID    ← vc.issuer
 issuerPublicKey  ← resolveDID(issuerDID)
 isVCSignatureValid  ← verifySignature(vc, issuerPublicKey)
4. Return final result:
 return (isRecordSignatureValid ∧ isVCSignatureValid)
To detect any tampering, the verifier recomputes the content’s hash and checks the inclusion proof against the Merkle root stored on the blockchain, confirming the content’s inclusion in a verified batch. The provenance record’s digital signature is authenticated using the creator’s public key, retrieved from their DID document. Similarly, the issuer’s signature on the Verifiable Credential (VC) is validated to confirm its legitimacy. The system acknowledges the content’s origin only when all cryptographic validations are met successfully. The client-side verification workflow outlined in Algorithm 2 is illustrated in Figure 4.
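The integrity portion of Algorithm 2 (hash recomputation and Merkle inclusion) can be sketched as follows. DID resolution and signature verification are stubbed with a caller-supplied callback, so this is an illustration rather than the released verifier:

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_merkle_proof(leaf_hash: str, proof: list, root: str) -> bool:
    """Recompute the root from a leaf and its sibling path.

    Each proof element is a (sibling_hash, is_left) pair indicating whether
    the sibling sits to the left of the running hash.
    """
    h = leaf_hash
    for sibling, is_left in proof:
        pair = sibling + h if is_left else h + sibling
        h = sha256_hex(pair.encode())
    return h == root


def verify_content(content: bytes, signed_record: dict, proof: list,
                   on_chain_root: str, verify_signature) -> bool:
    """Algorithm 2 with DID resolution and VC checks delegated to a callback."""
    if sha256_hex(content) != signed_record["contentHash"]:
        return False                            # content was altered
    if not verify_merkle_proof(signed_record["leafHash"], proof, on_chain_root):
        return False                            # record not in the anchored batch
    return verify_signature(signed_record)      # creator / issuer signatures


# Toy two-leaf batch: anchor the root, then verify the original content.
leaf = sha256_hex(b"signed-record-bytes")
sibling = sha256_hex(b"another-record")
root = sha256_hex((leaf + sibling).encode())
record = {"contentHash": sha256_hex(b"article body"), "leafHash": leaf}
ok = verify_content(b"article body", record, [(sibling, False)], root, lambda r: True)
```

With the original bytes, `ok` is True; substituting tampered content makes the hash check fail before any signature work is done.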
Collectively, these algorithms provide an auditable and tamper-resistant lifecycle for verifying content provenance. They reflect the foundational values of the VeriTrust framework: decentralization, cryptographic assurance, and scalable architecture through a combination of on-chain and off-chain components. Designed for flexibility, the system’s modular and platform-independent design allows seamless adaptation across environments, from browser extensions to enterprise content verification APIs. The public repository (v0.1.0) includes a runnable offline verifier, available at “verifier-stub/verify.py”, which reproduces the hashing and Merkle inclusion procedures.

4.4. Framework Diagram and Architecture

Figure 5 illustrates the detailed design of the VeriTrust framework, capturing the full lifecycle of content provenance, from the generation of digital identity and credential issuance to final verification on the client side. The system is organized into three core layers:
  • Identity Layer (layer 1): Manages Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs)
  • Provenance Layer (layer 2): Establishes content authenticity using cryptographic hashing and Merkle root embedding
  • Verification Layer (layer 3): Enables validation through a hybrid blockchain record and off-chain data

4.4.1. Layer 1: Identity Layer (Self-Sovereign Identity—SSI)

At the initial level, both the content creator and publisher generate DIDs using agents compatible with Hyperledger Indy, such as Aries wallets. The publisher then issues a VC to the creator that verifies their legitimacy, be it through authority, affiliation, or qualifications. These credentials are stored locally in decentralized wallets and presented during content validation. This SSI approach ensures identities are cryptographically secure, privacy-conscious, revocable, and aligned with evolving W3C standards [69].

4.4.2. Layer 2: Provenance Layer (On-Chain + Off-Chain)

Once the content is created, it undergoes SHA-256 hashing to create a tamper-evident fingerprint. The creator signs this content hash along with metadata and DID information to produce a signed provenance record. These records are then batched into a Merkle Tree, whose root is committed to an on-chain smart contract deployed on Ethereum-compatible ledgers (e.g., Hyperledger Besu). The full signed record and corresponding Merkle proof are stored in an off-chain decentralized storage system such as IPFS or Filecoin, ensuring content retrievability without burdening the blockchain with large data payloads [70].

4.4.3. Layer 3: Verification Layer (Client-Side)

The verification process begins when a content Verifier (e.g., fact-checker, platform, or consumer) views the content and computes a local hash. This hash is compared against the Merkle proof retrieved from IPFS and the Merkle root stored on-chain. The system validates:
  • Content integrity: Is the local hash part of the Merkle tree rooted on-chain?
  • Record authenticity: Is the signed record valid and signed by the creator’s DID?
  • Issuer authority: Is the VC valid, and was it issued by a legitimate DID?
Only if all cryptographic checks pass does the system accept the content as verifiable. The design enforces trust decentralization, privacy preservation, and minimal on-chain load, aligning with the architectural principles of scalable decentralized identity [58].

4.5. Limitations and Trade-Offs

While VeriTrust rests on a robust architecture and strong cryptographic foundations, several intrinsic limitations warrant critical attention:
  • Issuers’ Trustworthiness: The framework’s success depends on the integrity of credential issuers. In the absence of formal governance or vetting protocols, malicious actors could compromise trust by generating fraudulent credentials.
  • ZKP Overhead: Zero-Knowledge Proofs enhance privacy and support selective disclosure, but they introduce computational complexity that may hinder performance on devices with limited processing power.
  • Off-Chain Storage Risks: Since IPFS does not ensure data persistence unless pinning incentives are actively maintained, reliance on it for content storage poses availability concerns.
  • Cold-Start Adoption Barrier: For decentralized infrastructure to function, creators, issuers, and verifiers must simultaneously onboard, demanding technical literacy and platform compatibility across stakeholders.
  • Ethereum Gas Costs: Though Merkle root batching alleviates some transaction fees, gas costs on Ethereum-based chains remain a potential bottleneck in high-frequency scenarios.
  • Regulatory Compliance Challenges: The blockchain’s immutable nature clashes with data protection requirements such as GDPR, particularly with respect to the right to erasure and principles of data minimization.
These challenges underscore the necessity of complementing cryptographic innovation with thoughtful governance, usability, regulatory, and legal compliance strategies.

4.6. Mathematical Validation

To substantiate VeriTrust’s cryptographic integrity and verification assurances, this section formalizes the mathematical principles underpinning its core functions. These include:
  • Secure Hashing: Ensuring content integrity through tamper-evident cryptographic digests
  • Merkle Trees: Enabling scalable and auditable data structures for efficient provenance anchoring
  • ECDSA Signatures: Validating the authenticity of digital credentials via elliptic curve-based cryptography
  • Zero-Knowledge Proofs (ZKPs): Providing selective disclosure capabilities while preserving user privacy
Each component is accompanied by a clear definition, a practical mini-scenario, and a formal statement detailing its security guarantees. Collectively, these constructs form the theoretical bedrock that justifies VeriTrust’s ability to withstand threats such as forgery, unauthorized modification, identity spoofing, and privacy violations.

4.6.1. Hash-Based Content Integrity

Definition 1.
Let H: {0,1}* → {0,1}^256 denote the SHA-256 cryptographic hash function, which maps an arbitrary-length input to a fixed 256-bit digest. The function satisfies the following properties:
  • Preimage Resistance: Given a hash output h, it is computationally infeasible to find any input x such that H(x) = h.
  • Second Preimage Resistance: Given x, it is computationally infeasible to find another x′ ≠ x such that H(x′) = H(x).
  • Collision Resistance: It is infeasible to find any pair x, x′ such that x ≠ x′ and H(x) = H(x′).
Scenario. Suppose Alice creates a digital news article A and submits it to VeriTrust. The content hash h = H(A) is computed and recorded in a signed provenance record. If a malicious actor later attempts to modify the article content to A′, the recomputed hash H(A′) would differ from h, breaking the verification process.
Security Claim. The hash digest h serves as a tamper-evident fingerprint. Any post-submission modification results in a hash mismatch, rendering forgery computationally infeasible under the assumption of SHA-256’s resistance properties [71].
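This tamper-evidence property is easy to demonstrate concretely (an illustrative sketch, not part of the framework code):

```python
import hashlib

article = b"Council approves the 2025 budget."
tampered = b"Council rejects the 2025 budget."

h = hashlib.sha256(article).hexdigest()           # digest anchored at submission
h_prime = hashlib.sha256(tampered).hexdigest()    # digest of the altered article

# Even a single-word edit yields a completely different 256-bit digest,
# so verification against the anchored value fails.
```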

4.6.2. Merkle Tree Inclusion Proof

Definition 2.
A Merkle Tree M is a binary hash tree constructed from n leaf nodes {h1, h2, …, hn}, where each internal node is hij = H(hi || hj) and the root node R represents the aggregated hash commitment. A Merkle Proof for a leaf hi consists of the log2(n) sibling hashes required to recompute R.
Scenario. Alice’s content hash h1 is added to a Merkle Tree with other creators’ content hashes h2, h3, h4. The tree is built as:
  • h12 = H(h1 || h2)
  • h34 = H(h3 || h4)
  • R = H(h12 || h34)
To verify the inclusion of h1, the verifier reconstructs the Merkle root R′ from the proof path {h2, h34} and checks whether R′ = R.
Security Claim. The proof size grows logarithmically with the batch size. This ensures both integrity and scalability: the on-chain root anchors thousands of records without exposing their content, and any single record’s inclusion can be validated in O(log n) time.
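The four-leaf scenario above can be executed directly (toy leaf values; H is SHA-256 as in Definition 1):

```python
import hashlib


def H(data: bytes) -> str:
    """SHA-256 over bytes, returned as a hex digest (as in Definition 1)."""
    return hashlib.sha256(data).hexdigest()


# Four leaves standing in for four creators' content hashes.
h1, h2, h3, h4 = (H(f"content-{i}".encode()) for i in range(1, 5))

# Tree construction exactly as in the scenario.
h12 = H((h1 + h2).encode())
h34 = H((h3 + h4).encode())
R = H((h12 + h34).encode())

# Inclusion proof for h1 is the sibling path {h2, h34}: two hashes for four
# leaves, i.e. log2(4) = 2. The verifier recomputes the root from h1 alone.
R_prime = H((H((h1 + h2).encode()) + h34).encode())
```

Since `R_prime == R`, the verifier accepts h1’s inclusion without ever seeing h3 or h4 individually.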

4.6.3. ECDSA-Based Authenticity

Definition 3.
The Elliptic Curve Digital Signature Algorithm (ECDSA) consists of the following functions:
  • KeyGen() → (sk, pk): Generates a private/public key pair.
  • σ = Sign_sk(m): Generates signature σ on message m using private key sk.
  • Verify_pk(m, σ) → {True, False}: Verifies the signature using pk.
Scenario. Alice signs her provenance record r with her private key sk_A, yielding signature σ_A = Sign_sk_A(r). A verifier retrieves her DID Document from Hyperledger Indy, extracts the public key pk_A, and confirms that Verify_pk_A(r, σ_A) = True.
Security Claim. Under the EUF-CMA assumption (Existential Unforgeability under Chosen-Message Attack), it is computationally infeasible to forge σ_A without sk_A. Thus, authenticity and non-repudiation of content are cryptographically ensured [72].

4.6.4. Zero-Knowledge Privacy Guarantees

Definition 4.
A Zero-Knowledge Proof (ZKP) protocol allows a prover to convince a verifier that a statement is true without revealing any information beyond the statement’s validity. A non-interactive ZKP, such as Groth16, uses pre-processed circuits to verify a proof π over public inputs without revealing the witness w.
Scenario. Alice possesses a Verifiable Credential (VC) asserting her membership in a journalist union. Instead of revealing the entire VC, she uses a ZKP to prove that the VC’s issuer is trusted and the credential is current, without exposing her name or credential attributes.
  • VerifyZKP(π, publicInputs) = True
Security Claim. A valid π satisfies:
  • Completeness: If the statement is true, the verifier accepts.
  • Soundness: A false claim passes verification with negligible probability.
  • Zero-Knowledge: The verifier learns nothing beyond the statement’s validity.
This allows content creators to prove verifiable status (e.g., “verified journalist”) without leaking identity details. These characteristics make ZKPs a foundational element of the proposed privacy-preserving improvements for the VeriTrust framework.

4.6.5. Summary of Formal Guarantees

Table 6 summarizes the key security goals of VeriTrust, the formal property used to realize each goal, and the cryptographic basis for each assertion.
These formal guarantees collectively validate VeriTrust’s design objectives of verifiable authenticity, scalable inclusion, and privacy preservation in the decentralized digital media ecosystem.

4.7. Benchmarking and Evaluation Methodology

To evaluate the practical feasibility and projected performance of the VeriTrust framework, this subsection outlines a benchmarking methodology informed by the established literature across three domains: blockchain anchoring, decentralized identity systems, and verifiable credential workflows. While VeriTrust remains a conceptual framework at this stage, its design decisions and performance expectations are supported by empirical studies and known system behaviours from analogous real-world deployments.

4.7.1. Evaluation Metrics and Performance Objectives

The following core metrics are defined to evaluate VeriTrust’s operational viability:
  • Latency (ms): Time required to perform identity resolution, content verification, and Merkle proof validation.
  • Gas Cost (Gwei): Estimated cost to anchor Merkle roots on-chain using Solidity smart contracts.
  • Proof Size (bytes): The size of Merkle proof versus Zero-Knowledge Proof (ZKP) required for verifiability.
  • Storage Overhead: On-chain vs. off-chain content and metadata storage comparison.
  • Scalability: Number of content records anchored per transaction using Merkle batching.
These metrics are aligned with studies such as [73] on SSI-based authentication latency, and [74] on blockchain throughput constraints.

4.7.2. Benchmarking Simulation Scenarios

Although direct empirical evaluation is beyond this paper’s scope, we simulate expected outcomes using parameters and performance profiles reported in the literature and technical documentation of underlying tools.
Scenario A: Content Anchoring Gas Cost
  • Assuming the submission of a single Merkle root to Hyperledger Besu (IBFT-2, private chain).
  • Based on similar Solidity smart contract interactions [75], the estimated gas cost is gas units.
  • On a permissioned deployment, transaction fees can be negligible or fixed-rate depending on policy enforcement.
Scenario B: Merkle Proof vs. ZKP Overhead
  • A typical Merkle proof of ~320 bytes comprises ten 32-byte sibling hashes, sufficient to prove inclusion in a batch of up to 2^10 = 1024 records.
  • In contrast, a ZKP using Groth16 or Plonk ranges from ~768 bytes to 1.3 KB [76].
  • Thus, Merkle batching is more space-efficient for inclusion verification, while ZKPs offer privacy benefits.
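The proof-size arithmetic in Scenario B can be illustrated with a short, self-contained Python sketch using only the standard library. The record contents are illustrative, not the framework's actual data format; the sketch shows that an inclusion proof for a batch of 1024 leaves consists of 10 sibling hashes (320 bytes):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(leaves):
    """Return the list of tree levels, leaves first, root last."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        levels.append([h(cur[i] + cur[i + 1]) for i in range(0, len(cur), 2)])
    return levels

def proof_for(levels, index):
    """Collect the sibling hash at each level from leaf to root."""
    proof = []
    for level in levels[:-1]:
        proof.append(level[index ^ 1])
        index //= 2
    return proof

def verify(leaf, proof, root, index):
    """Recompute the path to the root from a leaf and its proof."""
    node = leaf
    for sibling in proof:
        node = h(sibling + node) if index % 2 else h(node + sibling)
        index //= 2
    return node == root

# 1024 content fingerprints are committed by one 32-byte root on-chain.
leaves = [h(f"record-{i}".encode()) for i in range(1024)]
levels = build_tree(leaves)
root = levels[-1][0]
proof = proof_for(levels, 42)
assert verify(leaves[42], proof, root, 42)
assert len(proof) * 32 == 320   # 10 sibling hashes, ~320 bytes
```

Note that a tampered leaf fails verification against the anchored root, which is the tamper-evidence property the framework relies on.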
Scenario C: Identity Resolution and VC Validation Latency
  • Identity resolution via Hyperledger Indy takes ~180–500 ms, depending on ledger query delay [77].
  • VC signature verification over secp256r1 (using ECDSA) typically requires ~20–50 ms per credential [Aries Framework Metrics].
  • Total client-side verification latency is thus estimated at ~200–550 ms per content artifact.
  • This version does not include direct empirical benchmarking; instead, reproducibility is supported through publicly released artifacts that detail the data formats and verification logic, available at https://github.com/marufrigan/veritrust-artifacts (v0.1.0) [65] (accessed on 25 July 2025).

4.7.3. Storage Efficiency and Off-Chain Trade-Offs

VeriTrust leverages IPFS to store full provenance records, thereby minimizing on-chain storage burden:
  • An average provenance JSON + signature file is ~1.1 KB.
  • Offloading this to IPFS reduces smart contract data payloads by 90–95% per record.
  • Only the IPFS CID (~46 bytes) and Merkle root (~32 bytes) are stored on-chain.
This aligns with best practices in hybrid blockchain systems, where only commitments, not raw data, are anchored [78].
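The on-chain footprint can be made concrete with a small Python sketch. The provenance record below is a toy example (its field names are illustrative, not the framework's normative schema, and it is smaller than the ~1.1 KB average cited above); only the 46-character CIDv0 string and the 32-byte Merkle root would be committed on-chain:

```python
import hashlib
import json

# Hypothetical provenance record stored off-chain on IPFS.
record = {
    "contentHash": hashlib.sha256(b"article body").hexdigest(),
    "authorDid": "did:indy:example:123456789abcdefghi",
    "issued": "2025-07-25T12:00:00Z",
    "signature": "z" * 88,   # placeholder for an encoded signature
}
off_chain_bytes = len(json.dumps(record).encode())

# Only two commitments go on-chain.
cid = "Qm" + "a" * 44        # a CIDv0 string is 46 characters
merkle_root = bytes(32)      # 32-byte Merkle root
on_chain_bytes = len(cid) + len(merkle_root)

assert on_chain_bytes == 78  # 46 + 32 bytes per anchored batch
```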

4.7.4. Scalability via Merkle Batching

Given that one Merkle root can commit up to 2^k records (for a tree of depth k) in a single transaction, we simulate a batch size of n = 512 content submissions per cycle. This yields:
  • ≈6.4× improvement in on-chain anchoring efficiency.
  • A 90% reduction in gas fees per individual record.
Such batching is suitable for media platforms issuing content in daily or hourly cycles and is consistent with scaling strategies seen in timestamping systems like OpenTimestamps [79].
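The amortization effect of batching can be sketched in a few lines of Python. The gas figures below are hypothetical placeholders for illustration only; actual costs depend on the contract and network configuration:

```python
# Hypothetical gas figures; not measured values.
BASE_TX_GAS = 21_000   # fixed cost of an Ethereum-style transaction
ANCHOR_GAS = 45_000    # assumed cost of storing one 32-byte root

def per_record_gas(batch_size: int) -> float:
    """Amortized gas per record when one root commits a whole batch."""
    return (BASE_TX_GAS + ANCHOR_GAS) / batch_size

# Anchoring one record per transaction vs. batching 512 records:
single = per_record_gas(1)     # 66,000 gas per record
batched = per_record_gas(512)  # ~129 gas per record
```

Under these toy numbers the per-record saving at n = 512 exceeds 99%; the 90% figure above is therefore a conservative bound that also accounts for per-record proof storage and contract overhead.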

5. Discussion and Comparative Analysis

This section evaluates the VeriTrust framework in light of existing systems and challenges in the domain of digital content verification and provenance. The discussion is structured around three core aspects: a comparative performance analysis against state-of-the-art models, a security assessment of the architecture under common threat vectors, and a reflection on the regulatory and ethical implications of deploying such a system at scale.

5.1. Benchmark vs. State-of-the-Art Systems

A growing number of blockchain-based and SSI-enabled frameworks have emerged in recent years to combat fake news, misinformation, and digital identity fraud. While many of these systems share overlapping objectives with VeriTrust, they often fall short in specific technical, ethical, or operational dimensions. Table 7 presents a comparative analysis of selected content verification frameworks across five core evaluation criteria: SSI Support (native integration of DIDs/VCs), Merkle Anchoring (use of Merkle trees for batching), Off-Chain Storage (use of systems like IPFS), VC Integration (support for Verifiable Credentials), and ZKP Support (current or planned integration of Zero-Knowledge Proofs).
Key observations
  • SSI and Credential-Based Trust: VeriTrust is one of the few frameworks to natively incorporate Self-Sovereign Identity (SSI) using Hyperledger Indy and Aries. Unlike centralized key registries used in Factom and ContentProof, VeriTrust binds identity assertions (e.g., “verified journalist”) via W3C Verifiable Credentials issued by decentralized authorities. This greatly improves resistance to impersonation and key forgery [69].
  • Revocation Performance and Scalability: Unlike many existing SSI frameworks, which often encounter difficulties with high-latency revocation checks, VeriTrust utilizes Hyperledger Indy’s cryptographic accumulators to enable efficient, privacy-preserving revocation. Previous research shows that revocation proof verification can be completed within ~50–100 ms, supporting scalability for high-volume media ecosystems [84].
  • Anchored Provenance via Merkle Trees: TruthChain and ContentProof use direct content hashes to anchor items on-chain. VeriTrust improves this by introducing Merkle tree batching, thereby enabling scalable anchoring of thousands of records per transaction—reducing gas costs and enabling public audit trails. The use of Merkle-based verification also simplifies inclusion proofs on the verifier side [85].
  • Future-Proof Privacy with ZKPs: While ZKP support is still in the roadmap for VeriTrust, it is absent in most other systems. ZKPs are critical for privacy-preserving attestations, especially when verifying credentials without revealing full claim content (e.g., “is a certified fact-checker” without revealing identity). Planned integration with Groth16 or PLONK will address this in VeriTrust v2 [76].
  • Federated Learning Role: VeriTrust proposes federated learning as a complementary layer serving two key purposes: (i) enabling client-side misinformation detection without centralizing user data, and (ii) facilitating collective reputation scoring of issuers and creators. Together, this dual application strengthens both technical accuracy and trust governance while preserving user privacy.
  • Off-Chain Data and Decentralized Storage: Factom and ContentProof rely on third-party databases for content hosting, raising long-term reliability issues. VeriTrust instead stores full provenance records on content-addressed IPFS with redundant pinning, anchoring only compact commitments on-chain.
  • Compliance-Aware Design: VeriTrust uniquely considers GDPR implications by salting content hashes, supporting revocation registries, and designing DID resolvers with expiration logic. This addresses the immutability-vs.-erasure tension that undermines many blockchain applications in regulated jurisdictions [86].
Summary
While several existing platforms address partial aspects of the fake news verification problem, VeriTrust distinguishes itself through:
  • Full-stack integration of identity, provenance, and content verification layers
  • Cryptographic verifiability via Merkle proofs and digital signatures
  • Forward compatibility with zero-knowledge technologies and data privacy laws
These features position VeriTrust as a uniquely holistic, secure, and policy-aware solution for provenance-sensitive content ecosystems.

5.2. Security Analysis: Threat Model and Mitigations

VeriTrust achieves security through a multi-layer approach involving cryptographic integrity, identity authentication, and tamper-resistant storage. Its dual-ledger design utilizes Hyperledger Indy for decentralized identity and Hyperledger Besu for immutable content provenance. Cross-ledger consistency is maintained by cryptographically linking provenance records on Besu to credentials managed by Indy. This architecture is fault-tolerant and allows content verification to proceed even if the identity ledger experiences temporary outages. The governance model is strictly permissioned, limiting credential issuance and revocation to authorized entities and enforced through a revocation registry to prevent misuse. This arrangement enforces clear trust boundaries and aligns with governance frameworks proposed by prominent SSI ecosystems like the Sovrin Foundation and the European Blockchain Services Infrastructure (EBSI). This section details the primary threat vectors and the architectural strategies used to mitigate them.
As shown in Figure 6, the VeriTrust threat model maps major risks to mitigation controls across the Identity & Trust, Provenance & Anchoring, and Governance & Verification layers.
Threat 1: Content Tampering (After Publication)
  • Attack Vector: Malicious actors modify digital content after it has been endorsed or published.
  • Mitigation: Content fingerprints are computed via SHA-256 hashes. Any tampering changes the hash, invalidating the Merkle proof. As each record is cryptographically tied to its original content, tampering is instantly detectable [71].
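The tamper-evidence property can be demonstrated with a minimal Python sketch, assuming SHA-256 fingerprints anchored at publication time (the content strings are illustrative):

```python
import hashlib

original = b"Verified article text"
# Fingerprint computed and anchored at publication time.
fingerprint = hashlib.sha256(original).digest()

# Any post-publication edit yields a different digest,
# so the anchored commitment no longer matches.
tampered = b"Verified article text (edited)"
assert hashlib.sha256(original).digest() == fingerprint
assert hashlib.sha256(tampered).digest() != fingerprint
```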
Threat 2: Credential Forgery
  • Attack Vector: An attacker fabricates or reuses Verifiable Credentials to impersonate verified publishers.
  • Mitigation: Each VC is signed using the issuer’s private key and verified using the corresponding DID’s public key from the Hyperledger Indy ledger. Revocation registries are checked during validation, preventing the reuse of revoked credentials [67].
Threat 3: Replay Attacks
  • Attack Vector: A previously valid signed content record is resubmitted to falsely prove new authorship.
  • Mitigation: Timestamps and nonces are incorporated into the provenance record. Merkle root anchoring ensures each batch is uniquely tied to its time of submission. Duplicate entries trigger rejection logic in the smart contract.
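The duplicate-rejection logic can be sketched as follows. The class below is a toy in-memory stand-in for the smart contract's behavior, not the contract itself; because timestamps and nonces are embedded in each provenance record, a re-signed batch produces a different Merkle root, so only literal replays collide:

```python
import time

class AnchorRegistry:
    """Toy stand-in for the smart contract's replay checks;
    the real contract enforces the same rules on-chain."""

    def __init__(self):
        self.anchored = {}                  # merkle_root -> anchor time

    def anchor(self, merkle_root: bytes) -> bool:
        if merkle_root in self.anchored:    # replayed batch: reject
            return False
        self.anchored[merkle_root] = time.time()
        return True

reg = AnchorRegistry()
root = bytes.fromhex("ab" * 32)
assert reg.anchor(root) is True     # first submission accepted
assert reg.anchor(root) is False    # resubmission rejected
```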
Threat 4: Sybil Attacks in the Identity Layer
  • Attack Vector: Adversaries generate multiple fake identities to flood the system with malicious content.
  • Mitigation: VeriTrust relies on verifiable credential issuance by trusted authorities (e.g., academic institutions, journalistic bodies). This makes Sybil identities non-functional unless they obtain VC endorsements, which are cryptographically verified.
Threat 5: Off-Chain Content Deletion
  • Attack Vector: Provenance records stored in IPFS are garbage-collected or lost due to a lack of pinning.
  • Mitigation: Although IPFS ensures decentralized storage, unpinned content risks being removed eventually. VeriTrust addresses this by incorporating multiple availability strategies:
    Decentralized Pinning Services (e.g., Pinata, Eternum) are used to replicate data across independent nodes.
    Institutional Archival Gateways (e.g., university or media organization IPFS nodes) that maintain long-term pinning.
    Hybrid Redundancy Models combine IPFS with conventional cloud storage (e.g., AWS S3), ensuring fallback access [87].
    Multi-Node Replication: Previous research indicates that pinning across ≥3 nodes can achieve retrieval success rates above 99% over extended timeframes [87].
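The replication claim follows from simple independence arithmetic, sketched below. The per-node availability value is a hypothetical assumption for illustration; the >99% figure reported in [87] depends on measured node uptime:

```python
def retrieval_success(p_node: float, replicas: int) -> float:
    """Content is retrievable if at least one of the
    independently pinned replicas is reachable."""
    return 1 - (1 - p_node) ** replicas

# Even with a modest assumed 80% per-node availability,
# three independent pins exceed 99% retrieval success:
assert retrieval_success(0.80, 3) > 0.99   # 1 - 0.2^3 = 0.992
```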
The VeriTrust security model combines:
  • Immutable commitments via Merkle trees
  • Decentralized trust anchors using DIDs and VCs
  • Tamper-evidence and content-bound hashes
  • Smart contract logic for duplication and replay prevention
Table 8 summarizes the key threat categories for VeriTrust and the corresponding mitigation strategies and trade-offs.
As outlined above, this dual-ledger, permissioned governance design follows the governance frameworks proposed by leading SSI ecosystems such as the Sovrin Foundation and the European Blockchain Services Infrastructure [31].
User Roles and Trust Assumptions: Within the governance framework, validator nodes are operated by independent stakeholders, including accredited news agencies, academic institutions, and civil-society organizations. To avoid unilateral control, issuer onboarding requires multiparty approval via the decentralized trust registry. This consortium-based model distributes authority, limiting single points of failure and mitigating centralization risks by preventing any single actor from monopolizing consensus or credential issuance.

5.3. Regulatory and Ethical Considerations

The deployment of content provenance frameworks in digital ecosystems introduces critical ethical and regulatory obligations. VeriTrust is designed to align with emerging data protection laws, content rights frameworks, and digital identity governance principles.
  • GDPR Compliance and the Right to Be Forgotten: VeriTrust avoids storing raw content or personal identifiers directly on-chain. Instead, it uses salted cryptographic hashes and off-chain IPFS storage. In line with GDPR Article 17, revocation registries allow credentials to be invalidated, while DID documents can be deactivated or rotated. This design ensures that data subjects retain meaningful control over their verifiable credentials without violating blockchain immutability constraints [86].
  • Algorithmic Fairness and Credential Bias: SSI-based systems carry risks of embedded bias, particularly when credential issuers act as gatekeepers (e.g., only certain institutions can issue VCs). VeriTrust proposes a governance model where issuers are vetted by a decentralized trust registry. This guards against monopolistic control and enhances inclusiveness [88].
  • Censorship and Decentralized Access: A balance must be struck between content verifiability and freedom of expression. While VeriTrust does not moderate content, it enables third-party verification without central arbitration. All verification logic is transparent, cryptographic, and content-neutral, making the system resistant to political or institutional censorship [89].
  • Legal Standing of Cryptographic Evidence: VeriTrust uses digital signatures (ECDSA) and timestamped Merkle proofs to provide court-admissible evidence trails. Several jurisdictions now recognize such cryptographic attestations under eIDAS (EU), ESIGN (USA), and PKI-based evidence frameworks [90]. Future versions may support notarization plugins for enhanced legal assurance.
  • Ethical Use of AI-Driven Verification Modules: If integrated with AI-based classifiers (e.g., for content quality or misinformation tagging), VeriTrust must ensure transparency and auditability. The use of explainable AI (XAI) and transparent training datasets is encouraged to avoid opacity and hidden bias in downstream use cases [61].
  • GDPR Compliance and Data Protection: VeriTrust incorporates specific safeguards to comply with the EU General Data Protection Regulation (GDPR), balancing blockchain immutability with individual data rights.
  • Salt Management: To mitigate dictionary attacks, NIST guidelines are followed, and hashes are safeguarded with periodically rotated, cryptographically secure random salts.
  • Off-Chain Data Erasure: Upholding the “Right to be Forgotten”, Personally Identifiable Information (PII) is encrypted and stored off-chain. Deletion of this off-chain data allows the associated on-chain hash to be permanently non-resolvable, achieving practical erasure in line with prevailing GDPR interpretations.
  • Real-Time Revocation: Credential validity is governed by a Hyperledger Indy revocation registry. Verifiers must perform real-time queries against this registry to ensure that a credential has not been revoked prior to accepting it as valid.
  • DPIA Summary: An initial Data Protection Impact Assessment (DPIA) highlights and addresses risks such as unauthorized access and issuer misuse through encryption, credential revocation, and transparent governance practices.
VeriTrust demonstrates awareness of key regulatory and ethical concerns and embeds safeguards at both architectural and procedural levels. It does not aim to moderate or censor, but rather empower users, institutions, and platforms to verify content in a cryptographically robust, privacy-aware, and legally resilient manner.
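The salted-hash and off-chain-erasure safeguards described above can be sketched in Python using only the standard library. The dictionary below is a hypothetical stand-in for the encrypted off-chain store; once the off-chain salt and PII are deleted, the immutable on-chain hash becomes non-resolvable, which is the "practical erasure" interpretation referenced above:

```python
import hashlib
import secrets

# Hypothetical off-chain store: salts and PII live here and can be deleted.
off_chain = {}

def commit(record_id: str, pii: bytes) -> bytes:
    """Return the salted hash that would be anchored on-chain."""
    salt = secrets.token_bytes(32)        # CSPRNG salt, fresh per record
    off_chain[record_id] = (salt, pii)
    return hashlib.sha256(salt + pii).digest()

def resolve(record_id: str, on_chain_hash: bytes):
    """Re-derive the commitment; fails once off-chain data is erased."""
    entry = off_chain.get(record_id)
    if entry is None:
        return None                       # erased: hash is non-resolvable
    salt, pii = entry
    if hashlib.sha256(salt + pii).digest() == on_chain_hash:
        return pii
    return None

anchored = commit("rec-1", b"alice@example.org")
assert resolve("rec-1", anchored) == b"alice@example.org"
del off_chain["rec-1"]                    # GDPR erasure request
assert resolve("rec-1", anchored) is None # on-chain hash now orphaned
```

The per-record random salt also blocks dictionary attacks on the anchored hashes, since an attacker cannot precompute digests of guessed PII without the off-chain salt.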

5.4. Reproducibility Statement

A public repository has been released as VeriTrust Artifacts v0.1.0 and includes: (i) annotated Docker/Compose templates for Besu, Indy/Aries, and IPFS; (ii) a minimal Solidity contract (“contracts/ContentRegistry.sol”) demonstrating the anchoring interface; (iii) an offline verifier stub (“verifier-stub/verify.py”) that validates SHA-256 integrity and Merkle inclusion against example JSON; and (iv) example Verifiable Credentials (VCs), Verifiable Presentations (VPs), provenance records, Merkle proofs, and a toy dataset under “data/”. These artifacts are intended to support conceptual reproducibility rather than production deployment, enabling independent inspection of the framework’s cryptographic foundations and workflows [65] URL: https://github.com/marufrigan/veritrust-artifacts (accessed on 25 July 2025).

6. Conclusions and Future Work

6.1. Summary of Contributions

This paper introduces VeriTrust, a provenance-centric framework designed to address a critical shortfall in misinformation mitigation: the lack of verifiable authorship and source accountability. Conventional detection methods, whether content-based, style-based, or graph-based, are largely reactive and often struggle to keep pace with the dynamic evolution of misinformation tactics. Moreover, they lack cryptographic assurances of content integrity.
VeriTrust offers a proactive, trust-anchored architecture comprising three key pillars:
  • Self-Sovereign Identity (SSI): Establishes cryptographically verifiable ties between content creators/publishers and their identities via Decentralized Identifiers (DIDs) and Verifiable Credentials (VCs).
  • Blockchain Technology: Ensures immutable, tamper-evident logging of content provenance by anchoring hash values, timestamps, and Merkle proofs on a distributed ledger.
  • Hybrid AI and Community Validation Layer: Combines AI-driven content analysis with participatory verification mechanisms, such as decentralized voting and reputation models, to foster collective trust.
By shifting the trust architecture from reactive detection to origin-first verification, VeriTrust reframes the misinformation problem from a binary “fake vs. real” to a transparent process of assessing “who authored it”, “when”, and “how”. This reconfiguration creates a decentralized, cryptographically verifiable trust pipeline.
It should be noted that although the present work is largely conceptual, its claims are supported by formal cryptographic validation (Section 4.6) with empirical benchmarking and real-world deployment deferred to future research phases.

6.2. Concluding Remarks

In an age defined by information deluge and synthetically generated content, the crisis extends beyond misinformation; it is the systematic erosion of trust in digital narratives. Much of today’s media circulates stripped of origin, intent, or verifiable lineage, leaving audiences exposed to convincingly fabricated distortions.
While gradual improvements in content classifiers and NLP models have been made, these tools remain vulnerable to adversarial manipulation and lack source traceability. What is urgently needed is not merely smarter detectors, but smarter systems of accountability.
VeriTrust proposes a conceptual shift: by anchoring digital content to verifiable identities and immutable provenance records at the point of creation, trust becomes an intrinsic, provable property rather than a retrofitted guess. While not designed to eliminate all falsehoods, VeriTrust establishes a transparent infrastructure where truth can be cryptographically verified and origin tracked, thus restoring foundational trust in digital communication ecosystems. Accordingly, this work should be viewed as a theoretical and architectural contribution. Practical validation through real-world deployment, stress testing, and usability assessments remains an essential step to confirm the effectiveness of these findings.

6.3. Future Research Directions

The VeriTrust framework lays the foundation for expansive future research aimed at operationalizing, scaling, and evolving its applicability across diverse digital ecosystems.
  • Prototype Development and Benchmarking: A proof-of-concept prototype should be developed using Hyperledger Besu, Aries, and Solidity. Key performance metrics such as latency, throughput, and transaction scalability must be rigorously evaluated under varying content volumes. Critical performance indicators, including client-side verification latency, gas costs for varying Merkle batch sizes, and IPFS retrieval success rates with and without pinning, should be thoroughly assessed in controlled settings, drawing on prior blockchain benchmarking studies [71] and decentralized storage research [87].
  • Privacy-Preserving Verification via ZKPs: Future study should investigate the integration of efficient Zero-Knowledge Proofs (ZKPs) to enable privacy-preserving verification. While the current design does not yet include the incorporation of ZKPs, their adoption represents a key research direction for enabling privacy-preserving verification without disclosing sensitive identifiers, allowing the maintenance of a critical balance between accountability and privacy.
  • Multimodal Provenance Models: Extending provenance to dynamic formats (e.g., deepfake videos, synthetic audio) demands robust PROV-compliant metadata schemas and real-time validation pipelines tailored to composite content verification.
  • Decentralized Governance Protocols: Sustainable trust requires community-driven governance models for VC issuer accreditation, protocol evolution, and dispute resolution, potentially drawing from DAO-inspired voting models.
  • Ecosystem Interoperability: Seamless cross-chain and cross-platform interoperability must be addressed, ensuring that the VeriTrust framework can integrate with varied blockchain networks such as Polygon and Solana, as well as open-source media platforms.
  • Data Protection Impact Assessment (DPIA): Future work should include systematic Data Protection Impact Assessments (DPIAs) to evaluate adherence to GDPR and related data protection frameworks, with a focus on revocation propagation and anonymization of off-chain metadata.

6.4. Limitations of the Study

VeriTrust is introduced as a conceptual and theoretically validated framework; no full prototype or large-scale deployment is reported in this paper. Performance benchmarks (e.g., latency, gas cost, availability) are derived from previous studies and still require empirical validation. While the integration of zero-knowledge proofs is proposed as a future enhancement, it has not yet been implemented. Off-chain persistence currently relies on pinning and institutional mirrors, and long-term availability has not been formally assessed. Finally, governance specifics such as issuer accreditation, validator audits, and revocation operations are described at a high level and must be validated in real-world consortia.

Author Contributions

Methodology, M.F., U.B. and R.B.S.; validation, M.A., M.F. and U.B.; investigation, all authors; formal analysis, R.B.S. and M.A.; writing—original draft, M.F., U.B., M.A. and R.B.S.; writing—review and editing, M.F., U.B. and R.B.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Alsuwat, E.; Alsuwat, H. An improved multi-modal framework for fake news detection using NLP and Bi-LSTM. J. Supercomput. 2024, 81, 177. [Google Scholar] [CrossRef]
  2. Zhang, L. Mechanisms of data mining in analyzing the effects and trends of news dissemination. Appl. Math. Nonlinear Sci. 2024, 9. [Google Scholar] [CrossRef]
  3. Pava-Díaz, R.A.; Gil-Ruiz, J.; López-Sarmiento, D.A. Self-sovereign identity on the blockchain: Contextual analysis and quantification of SSI principles implementation. Front. Blockchain 2024, 7, 1443362. [Google Scholar] [CrossRef]
  4. Torky, M.; Nabil, E.; Said, W. Proof of Credibility: A Blockchain Approach for Detecting and Blocking Fake News in Social Networks. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 321–327. [Google Scholar] [CrossRef]
  5. Nawaz, M.Z.; Nawaz, M.S.; Fournier-Viger, P.; He, Y. Analysis and Classification of Fake News Using Sequential Pattern Mining. Big Data Min. Anal. 2024, 7, 942–963. [Google Scholar] [CrossRef]
  6. Farokhian, M.; Rafe, V.; Veisi, H. Fake news detection using dual BERT deep neural networks. Multimed. Tools Appl. 2023, 83, 43831–43848. [Google Scholar] [CrossRef]
  7. Saha, I.; Puthran, S. Fake News Detection: A Comprehensive Review and a Novel Framework. In Proceedings of the 2024 OITS International Conference on Information Technology (OCIT), Vijayawada, India, 12–14 December 2024; pp. 463–469. [Google Scholar] [CrossRef]
  8. Xia, J.; Li, Y.; Yu, K. Lgt: Long-range graph transformer for early rumor detection. Soc. Netw. Anal. Min. 2024, 14, 100. [Google Scholar] [CrossRef]
  9. Zhu, X.; Li, L. Research on Detecting Future Fake News Based on Multi-View and Knowledge Graph. In Proceedings of the 2024 4th International Conference on Neural Networks, Information and Communication Engineering (NNICE), Guangzhou, China, 19–21 January 2024; pp. 309–315. [Google Scholar] [CrossRef]
  10. Verma, P.K.; Agrawal, P.; Madaan, V.; Prodan, R. MCred: Multi-modal message credibility for fake news detection using BERT and CNN. J. Ambient. Intell. Humaniz. Comput. 2022, 14, 10617–10629. [Google Scholar] [CrossRef]
  11. Hiramath, C.K.; Deshpande, G.C. Fake News Detection Using Deep Learning Techniques. In Proceedings of the 2019 1st International Conference on Advances in Information Technology (ICAIT), Chikmagalur, India, 25–27 July 2019; Volume 4. [Google Scholar] [CrossRef]
  12. Arunthavachelvan, K.; Raza, S.; Ding, C. A deep neural network approach for fake news detection using linguistic and psychological features. User Model. User-Adapt. Interact. 2024, 34, 1043–1070. [Google Scholar] [CrossRef]
  13. Tsai, C.M. Stylometric Fake News Detection Based on Natural Language Processing Using Named Entity Recognition: In-Domain and Cross-Domain Analysis. Electronics 2023, 12, 3676. [Google Scholar] [CrossRef]
  14. Wu, J.; Guo, J.; Hooi, B. Fake News in Sheep’s Clothing: Robust Fake News Detection Against LLM-Empowered Style Attacks. In Proceedings of the KDD’24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; Volume 33, pp. 3367–3378. [Google Scholar] [CrossRef]
  15. Zhao, C.; Wei, L.; Qin, Z.; Zhou, W.; Song, Y.; Hu, S. MPPFND: A Dataset and Analysis of Detecting Fake News with Multi-Platform Propagation. arXiv 2025. [Google Scholar] [CrossRef]
  16. Guo, Z.; Wu, P.; Liu, X.; Pan, L. CoST: Comprehensive structural and temporal learning of social propagation for fake news detection. Neurocomputing 2025, 648, 130618. [Google Scholar] [CrossRef]
  17. Truică, C.-O.; Apostol, E.-S.; Marogel, M.; Paschke, A. GETAE: Graph Information Enhanced Deep Neural NeTwork Ensemble ArchitecturE for fake news detection. Expert Syst. Appl. 2025, 275, 126984. [Google Scholar] [CrossRef]
  18. Gong, S.; Sinnott, R.O.; Qi, J.; Paris, C. Less is More: Unseen Domain Fake News Detection via Causal Propagation Substructures. arXiv 2023, arXiv:2411.09389v1. [Google Scholar]
  19. Bhardwaj, A.; Bharany, S.; Kim, S. Fake Social Media News and Distorted Campaign Detection Framework Using Sentiment Analysis & Machine Learning. Heliyon 2024, 10, e36049. [Google Scholar] [CrossRef] [PubMed]
  20. Pareek, S.; Goncalves, J. Peer–supplied credibility labels as an online misinformation intervention. Int. J. Hum.-Comput. Stud. 2024, 188, 103276. [Google Scholar] [CrossRef]
  21. Amri, S.; Boleilanga, H.C.M.; Aimeur, E. ExFake: Towards an Explainable Fake News Detection Based on Content and Social Context Information. arXiv 2023. [Google Scholar] [CrossRef]
  22. Nasser, M.; Arshad, N.I.; Ali, A.; Alhussian, H.; Saeed, F.; Da’u, A.; Nafea, I. A systematic review of multimodal fake news detection on social media using deep learning models. Results Eng. 2025, 26, 104752. [Google Scholar] [CrossRef]
  23. Li, X.; Wei, L.; Wang, L.; Ma, Y.; Zhang, C.; Sohail, M. A blockchain-based privacy-preserving authentication system for ensuring multimedia content integrity. Int. J. Intell. Syst. 2022, 37, 3050–3071. [Google Scholar] [CrossRef]
  24. Liu, Y.; Zhang, J.; Wu, S.; Pathan, M.S. Research on digital copyright protection based on the hyperledger fabric blockchain network technology. PeerJ Comput. Sci. 2021, 7, e709. [Google Scholar] [CrossRef]
  25. Chen, Q.; Kong, Y.; Cheng, L. A Digital Copyright Protection System Based on Blockchain and with Sharding Network. In Proceedings of the 2023 IEEE 10th International Conference on Cyber Security and Cloud Computing (CSCloud)/2023 IEEE 9th International Conference on Edge Computing and Scalable Cloud (EdgeCom), Xiangtan, China, 1–3 July 2023. [Google Scholar] [CrossRef]
  26. Shi, Q.; Zhou, Y. Application of blockchain technology in digital music copyright management: A case study of VNT chain platform. Front. Blockchain 2025, 7, 1388832. [Google Scholar] [CrossRef]
  27. Wang, J.; Zheng, Q.; Shen, Y. Research on New Media Digital Copyright Registration and Trading Based on Blockchain. In Proceedings of the 2024 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Toronto, ON, Canada, 19–21 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
  28. Xie, R.; Tang, M. A digital resource copyright protection scheme based on blockchain cross-chain technology. Heliyon 2024, 10, e36830. [Google Scholar] [CrossRef] [PubMed]
  29. Stockburger, L.; Kokosioulis, G.; Mukkamala, A.; Mukkamala, R.R.; Avital, M. Blockchain-Enabled Decentralized Identify Management: The Case of Self-Sovereign Identity in Public Transportation. Blockchain Res. Appl. 2021, 2, 100014. [Google Scholar] [CrossRef]
  30. Nokhbeh Zaeem, R.; Chang, K.C.; Huang, T.C.; Liau, D.; Song, W.; Tyagi, A.; Khalil, M.; Lamison, M.; Pandey, S.; Barber, K.S. Blockchain-Based Self-Sovereign Identity: Survey, Requirements, Use-Cases, and Comparative Study. In Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, Melbourne, VIC, Australia, 14–17 December 2021. [Google Scholar] [CrossRef]
  31. European Blockchain Services Infrastructure|Shaping Europe’s Digital Future. Available online: https://digital-strategy.ec.europa.eu/en/policies/european-blockchain-services-infrastructure (accessed on 28 July 2025).
  32. Bashaddadh, O.; Omar, N.; Mohd, M.; Khalid, M.N.A. Machine Learning and Deep Learning Approaches for Fake News Detection: A Systematic Review of Techniques, Challenges, and Advancements. IEEE Access 2025, 13, 90433–90466. [Google Scholar] [CrossRef]
  33. Hernández-Ramos, J.L.; Karopoulos, G.; Geneiatakis, D.; Martin, T.; Kambourakis, G.; Fovino, I.N. Sharing Pandemic Vaccination Certificates through Blockchain: Case Study and Performance Evaluation. Wirel. Commun. Mob. Comput. 2021, 2021, e2427896. [Google Scholar] [CrossRef]
  34. Su, Y.-J.; Hsu, W.-C. Hyperledger Indy-based Roaming Identity Management System. Taiwan Ubiquitous Inf. 2023, 8, 545–557. Available online: https://bit.kuas.edu.tw/~jni/2023/vol8/s2/16.JNI-0774.pdf (accessed on 25 July 2025).
  35. Privacy Policies|Epic. Epic.com. 2025. Available online: https://www.epic.com/en-us/privacypolicies/ (accessed on 30 July 2025).
  36. Digital Government. Digital Government—Making Digital Service Delivery Easier for Public Service Employees. 2024. Available online: https://digital.gov.bc.ca/design/digital-trust/online-identity/person-credential/ (accessed on 30 July 2025).
  37. Schembri, F. Digital Diplomas. MIT Technology Review. 2018. Available online: https://www.technologyreview.com/2018/04/25/143311/digital-diplomas/ (accessed on 30 July 2025).
  38. Hoffmann, T.; Vasquez, M.C.S. The estonian e-residency programme and its role beyond the country’s digital public sector ecosystem. CES Derecho 2022, 13, 184–204. [Google Scholar] [CrossRef]
  39. National Digital Identity and Government Data Sharing in Singapore A Case Study of Singpass and APEX. Available online: https://www.developer.tech.gov.sg/assets/files/GovTech%20World%20Bank%20NDI%20APEX%20report.pdf (accessed on 28 July 2025).
  40. Pöhn, D.; Grabatin, M.; Hommel, W. Analyzing the Threats to Blockchain-Based Self-Sovereign Identities by Conducting a Literature Survey. Appl. Sci. 2023, 14, 139. [Google Scholar] [CrossRef]
  41. Grech, A.; Sood, I.; Ariño, L. Blockchain, Self-Sovereign Identity and Digital Credentials: Promise Versus Praxis in Education. Front. Blockchain 2021, 4, 616779. [Google Scholar] [CrossRef]
  42. Weigl, L.; Barbereau, T.; Fridgen, G. The construction of self-sovereign identity: Extending the interpretive flexibility of technology towards institutions. Gov. Inf. Q. 2023, 40, 101873. [Google Scholar] [CrossRef]
  43. Brăcăcescu, R.V.; Mocanu, Ş.; Ioniţă, A.D.; Brăcăcescu, C. A proposal of digital identity management using blockchain. Rev. Roum. Sci. Tech. Série Électrotechnique Énergétique 2024, 69, 85–90. [Google Scholar] [CrossRef]
  44. PROV-O: The PROV Ontology. W3C Recommendation. 2013. Available online: https://www.w3.org/TR/prov-o/ (accessed on 25 July 2025).
  45. Gil, Y.; Miles, S.; Belhajjame, K.; Deus, H.; Garijo, D.; Klyne, G.; Missier, P.; Soiland-Reyes, S.; Zednik, S. PROV Model Primer: W3C Working Group Note. Research Explorer The University of Manchester. 2013. Available online: https://research.manchester.ac.uk/en/publications/prov-model-primer-w3c-working-group-note (accessed on 30 July 2025).
  46. Merkle, R.C. Protocols for Public Key Cryptosystems. In Proceedings of the 1980 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 14–16 April 1980. [Google Scholar] [CrossRef]
  47. Akbarfam, A.J.; Heidaripour, M.; Maleki, H.; Dorai, G.; Agrawal, G. ForensiBlock: A Provenance-Driven Blockchain Framework for Data Forensics and Auditability. In Proceedings of the 2023 5th IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Atlanta, GA, USA, 1–4 November 2023; pp. 136–145. [Google Scholar] [CrossRef]
  48. Feng, K.; Ritchie, N.; Blumenthal, P.; Parsons, A.; Zhang, A.X. Examining the Impact of Provenance-Enabled Media on Trust and Accuracy Perceptions. Proc. ACM Hum.-Comput. Interact. 2023, 7, 1–42. [Google Scholar] [CrossRef]
  49. Jiang, L.; Li, R.; Wu, W.; Qian, C.; Loy, C.C. DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar] [CrossRef]
  50. Butt, U.J.; Hussien, O.; Hasanaj, K.; Shaalan, K.; Hassan, B.; Al-Khateeb, H. Predicting the impact of data poisoning attacks in blockchain-enabled supply chain networks. Algorithms 2023, 16, 549. [Google Scholar] [CrossRef]
  51. Farhan, M.; Sulaiman, R.B. Developing Blockchain Technology to Identify Counterfeit Items Enhances the Supply Chain’s Effectiveness. FMDB Trans. Sustain. Technoprise Lett. 2023, 1, 123–134. [Google Scholar]
  52. Jahankhani, H.; Jamal, A.; Brown, G.; Sainidis, E.; Fong, R.; Butt, U.J. (Eds.) AI, Blockchain and Self-Sovereign Identity in Higher Education; Springer Nature: Berlin/Heidelberg, Germany, 2023. [Google Scholar]
  53. Farhan, M.; Nur, A.H.; Sulaiman, R.B.; Butt, U.; Zaman, S. Blockchain-Based Railway Ticketing: Enhancing Security and Efficiency with QR Codes and Smart Contract. In Proceedings of the 2025 4th International Conference on Computing and Information Technology (ICCIT), Tabuk, Saudi Arabia, 13–14 April 2025; IEEE: Piscataway, NJ, USA, 2025; pp. 238–244. [Google Scholar]
  54. Farhan, M.; Nur, A.H.; Salih, A.; Sulaiman, R.B. Disrupting the ballot box: Blockchain as a catalyst for innovation in electoral processes. In Proceedings of the 2024 International Conference on Advances in Computing, Communication, Electrical, and Smart Systems (iCACCESS), Dhaka, Bangladesh, 8–9 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
  55. Maruf, F.; Sulaiman, R.B.; Nur, A.H. Blockchain technology to detect fake products. In Digital Transformation for Improved Industry and Supply Chain Performance; Khan, M.R., Khan, N.R., Jhanjhi, N.Z., Eds.; IGI Global: Hershey, PA, USA, 2024; pp. 1–32. [Google Scholar] [CrossRef]
  56. Sourav, M.S.U.; Mahmud, M.S.; Talukder, M.S.H.; Sulaiman, R.B. IoMT-blockchain-based secured remote patient monitoring framework for neuro-stimulation devices. In Artificial Intelligence for Blockchain and Cybersecurity Powered IoT Applications; CRC Press: Boca Raton, FL, USA, 2024; pp. 209–225. [Google Scholar]
  57. Bin Sulaiman, R. Applications of Block-Chain Technology and Related Security Threats. SSRN Electron. J. 2018. Available online: https://ssrn.com/abstract=3205732 (accessed on 30 July 2025). [CrossRef]
  58. Protocol Labs Research. Design and Evaluation of IPFS: A Storage Layer for the Decentralized Web|Protocol Labs Research. Protocol Labs Research. 2022. Available online: https://research.protocol.ai/publications/design-and-evaluation-of-ipfs-a-storage-layer-for-the-decentralized-web/ (accessed on 30 July 2025).
  59. Morales-Alarcón, C.H.; Bodero-Poveda, E.; Villa-Yánez, H.M.; Buñay-Guisñan, P.A. Blockchain and Its Application in the Peer Review of Scientific Works: A Systematic Review. Publications 2024, 12, 40. [Google Scholar] [CrossRef]
  60. Ramachandran, A.; Ramadevi, P.; Alkhayyat, A.; Yousif, Y.K. Blockchain and Data Integrity Authentication Technique for Secure Cloud Environment. Intell. Autom. Soft Comput. 2023, 36, 2055–2070. [Google Scholar] [CrossRef]
  61. Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. Fake News Detection and Classification: A Comparative Study of Convolutional Neural Networks, Large Language Models, and Natural Language Processing Models. Future Internet 2025, 17, 28. [Google Scholar] [CrossRef]
  62. Schardong, F.; Custódio, R. Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy. Sensors 2022, 22, 5641. [Google Scholar] [CrossRef]
  63. Surjatmodjo, D.; Unde, A.A.; Cangara, H.; Sonni, A.F. Information Pandemic: A Critical Review of Disinformation Spread on Social Media and Its Implications for State Resilience. Soc. Sci. 2024, 13, 418. [Google Scholar] [CrossRef]
  64. Shcherbakov, A. Hyperledger Indy Besu as a permissioned ledger in self-sovereign identity. In Proceedings of the Open Identity Summit 2024 (OID 2024), Porto, Portugal, 20–21 June 2024; Roßnagel, H., Schunck, C.H., Sousa, F., Eds.; GI-Edition: Lecture Notes in Informatics (LNI), P-350; Gesellschaft für Informatik e.V.: Bonn, Germany, 2024; pp. 127–137. [Google Scholar] [CrossRef]
  65. [R-VERITRUST]. VeriTrust Artifacts (v0.1.0—Conceptual Scaffold). GitHub. 2025. Available online: https://github.com/marufrigan/veritrust-artifacts (accessed on 30 July 2025).
  66. Mazzocca, C.; Acar, A.; Uluagac, S.; Montanari, R.; Bellavista, P.; Conti, M. A Survey on Decentralized Identifiers and Verifiable Credentials. IEEE Commun. Surv. Tutor. 2025, 1–32. [Google Scholar] [CrossRef]
  67. Verifiable Credentials Data Model v2.0. W3C Recommendation. 2023. Available online: https://www.w3.org/TR/vc-data-model/ (accessed on 30 August 2025).
  68. Exclusive XML Canonicalization Version 1.0. W3C Recommendation. 2002. Available online: https://www.w3.org/TR/xml-exc-c14n/ (accessed on 30 August 2025).
  69. Sporny, M.; Longley, D.; Sabadello, M.; Reed, D.; Steele, O.; Allen, C. Decentralized Identifiers (DIDs) v1.0. W3C Recommendation. 2022. Available online: https://www.w3.org/TR/did-core/ (accessed on 30 July 2025).
  70. Doshi-Velez, F.; Kim, B. Towards A Rigorous Science of Interpretable Machine Learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  71. National Institute of Standards and Technology (NIST). FIPS PUB 180-4: Secure Hash Standard (SHS); NIST: Gaithersburg, MD, USA, 2015. [Google Scholar] [CrossRef]
  72. RFC 6979; Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA). Internet Engineering Task Force (IETF). 2013. Available online: https://datatracker.ietf.org/doc/html/rfc6979 (accessed on 30 July 2025).
  73. Siqueira, A.; da Conceição, A.F.; Rocha, V. Performance Evaluation of Self-Sovereign Identity Use Cases. In Proceedings of the 2023 IEEE International Conference on Decentralized Applications and Infrastructures (DAPPS), Athens, Greece, 17–20 July 2023. [Google Scholar] [CrossRef]
  74. Kokoris-Kogias, E.; Alp, E.C.; Gasser, L.; Jovanovic, P.; Syta, E.; Ford, B. CALYPSO: Private Data Management for Decentralized Ledgers. Cryptology ePrint Archive. 2018. Available online: https://eprint.iacr.org/2018/209 (accessed on 30 July 2025).
  75. Gürsoy, G.; Brannon, C.M.; Gerstein, M. Using Ethereum blockchain to store and query pharmacogenomics data via smart contracts. BMC Med. Genom. 2020, 13, 74. [Google Scholar] [CrossRef]
  76. Ben-Sasson, E.; Bentov, I.; Horesh, Y.; Riabzev, M. Scalable Zero Knowledge with No Trusted Setup. In Advances in Cryptology—CRYPTO 2019, Proceedings of the 39th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2019; Springer eBooks; Springer: Berlin/Heidelberg, Germany, 2019; pp. 701–732. [Google Scholar] [CrossRef]
  77. Empower Healthcare through a Self-Sovereign Identity Infrastructure for Secure Electronic Health Data Access. arXiv 2022. [CrossRef]
  78. Zhang, K.; Jacobsen, H. Towards Dependable, Scalable, and Pervasive Distributed Ledgers with Blockchains. In Proceedings of the 2018 IEEE 38th International Conference on Distributed Computing Systems (ICDCS), Vienna, Austria, 2–6 July 2018. [Google Scholar] [CrossRef]
  79. OpenTimestamps: A Timestamping Proof Standard. 2025. Available online: https://opentimestamps.org/ (accessed on 30 July 2025).
  80. OriginTrail Decentralized Knowledge Graph. 2025. Available online: https://origintrail.io/ (accessed on 30 July 2025).
  81. Content Authenticity Initiative (CAI). 2025. Available online: https://contentauthenticity.org (accessed on 30 July 2025).
  82. Coalition for Content Provenance and Authenticity (C2PA). C2PA Specification, Version 1.3. 2025. Available online: https://c2pa.org/specifications/specifications/1.3 (accessed on 30 July 2025).
  83. Hyperledger Indy. 2025. Available online: https://www.hyperledger.org/projects/hyperledger-indy (accessed on 30 July 2025).
  84. Obaid, A.H.; Al-Husseini, K.A.O.; Almaiah, M.A.; Shehab, R. Dynamic key revocation and recovery framework for smart city authentication systems. Int. J. Innov. Res. Sci. Stud. 2025, 8, 2112–2123. [Google Scholar] [CrossRef]
  85. Benet, J. IPFS—Content Addressed, Versioned, P2P File System. arXiv 2014. [Google Scholar] [CrossRef]
  86. Suripeddi, M.K.S.; Purandare, P. Blockchain and GDPR—A study on compatibility issues of the distributed ledger technology with GDPR data processing. J. Phys. Conf. Ser. 2021, 1964, 042005. [Google Scholar] [CrossRef]
  87. IPFS Documentation|IPFS Docs. 2023. Available online: https://docs.ipfs.tech/ (accessed on 30 July 2025).
  88. Working Groups. Decentralizing Trust in Identity Systems. Decentralized Identity Foundation. 2024. Available online: https://blog.identity.foundation/decentralizing-trust-in-identity-systems/ (accessed on 30 July 2025).
  89. Blockchain and Freedom of Expression. 2019. Available online: https://www.article19.org/wp-content/uploads/2019/07/Blockchain-and-FOE-v4.pdf (accessed on 25 July 2025).
  90. Dumortier, J. Regulation (EU) No 910/2014 on Electronic Identification and Trust Services for Electronic Transactions in the Internal Market (EIDAS Regulation); Edward Elgar Publishing: Cheltenham, UK, 2022. [Google Scholar] [CrossRef]
Figure 1. Layered Architecture of the VeriTrust Framework for Content Provenance and Fake News Detection.
Figure 2. End-to-end implementation pipeline.
Figure 3. Algorithm 1—Content Submission: Flowchart showing canonicalization, hash recomputation, retrieval of record + Merkle proof (IPFS), on-chain Merkle root check, and DID/VC signature validation leading to Trusted/Fail.
Figure 4. Algorithm 2—Content Verification: Flowchart showing retrieval of {signedRecord, merkleProof, VC} from IPFS, canonicalization, and salted hash recomputation SHA-256 (canonical_content∥saltNonce), on-chain Merkle-root check, provenance signature verification via creator DID resolution, and VC signature + revocation check, leading to a trusted/fail decision.
Figure 5. Cryptographic Verification Workflow of VeriTrust.
Figure 6. Threat Model and Mitigations.
Table 1. Summary of existing techniques and identified gaps in fake news detection, identity management, and content provenance.
| Area | Representative Work and Methodology | Identified Gap |
|---|---|---|
| Knowledge-Based Fake News Detection | FNACSPM [5] used sequential pattern mining and machine learning classifiers on six datasets. | Relies only on textual content; lacks integration with source identity or cross-modal misinformation signals. Not designed for real-time adaptability. |
| Style-Based Fake News Detection | MCRED [10] combined text, image, and user interaction with contrastive learning and GNNs. Other studies used TF-IDF, Word2Vec, LSTM, and BERT. | Vulnerable to adversarial writing styles and LLM-generated text. No link to verified author identity or provenance. |
| Propagation-Based Fake News Detection | GETAE [17] and CoST [16] leveraged graph transformers and ensemble propagation-text models. | Needs a complete propagation history; unsuitable for early detection. Lacks identity anchoring or source verification. |
| Source-Based Fake News Detection | ExFake and Us-DeFake evaluated user trust via behavioral metadata and social graphs. | Relies heavily on behavioral patterns, which can be manipulated. No cryptographic user identity or tamper-proof accountability. |
| Blockchain for Content Verification | Attestiv, OriginStamp, and Hyperledger solutions focus on immutable timestamps and hash commitments. | Do not verify content authorship. High computational/storage overhead makes real-time applications difficult. |
| Self-Sovereign Identity (SSI) | Implements DIDs and Verifiable Credentials for user-controlled identity in education and healthcare. | Identity proofs not tied to digital content authorship. No content anchoring or provenance linkage. |
| Digital Content Provenance | W3C PROV, Merkle trees, and ForensiBlock used for editing trails and forensic verification. | Tracks edits and metadata but lacks a link to the verified identity of the author or source of content. |
| Integrated Frameworks | No unified framework combining SSI + provenance + blockchain. | Existing systems work in silos. No holistic pipeline for verifying user identity and content origin simultaneously. |
| Proposed Framework (VeriTrust) | Combines DID/VC-verified authorship + W3C PROV hash lineage + Besu smart-contract anchoring; real-time verifier plugin. | Delivers the first end-to-end trust pipeline: proves who created content (SSI), what/when it was published (blockchain + hash), and how it evolved (PROV). |
Table 2. Core technological components of the VeriTrust framework and their roles in fake news detection.
| Component | Description | Role in Fake News Detection/Content Provenance | Underlying Technology/Standard |
|---|---|---|---|
| Decentralized Identifiers (DIDs) | Unique, cryptographically verifiable identifiers controlled by the user. | Establishes an immutable, user-controlled identity for content creators and verifiers. | W3C DID Standard, DLT (e.g., Ethereum/Polygon) |
| Verifiable Credentials (VCs) | Digital certificates containing claims about a user’s identity, roles, or expertise. | Attests to the credibility and qualifications of content creators and fact-checkers; enables selective disclosure. | W3C VC Standard, Cryptographic Proofs |
| Blockchain Ledger | A distributed, immutable, and tamper-resistant record of transactions. | Stores cryptographic hashes of content, DIDs, timestamps, and provenance events; ensures data integrity and transparency. | Ethereum, Polygon (PoS), Smart Contracts |
| Smart Contracts | Self-executing code stored on the blockchain. | Automates provenance-tracking rules, manages decentralized voting mechanisms, and enforces incentive/penalty systems. | Solidity (for EVM-compatible blockchains) |
| Decentralized Storage (IPFS/Web3.Storage) | Peer-to-peer file storage systems. | Stores full content files (articles, images, videos) off-chain, linked by hashes on the blockchain, addressing scalability. | IPFS, Web3.Storage |
| AI Detection Models | Multimodal deep learning algorithms. | Provides initial automated assessment of content authenticity (text, image, video); identifies potential misinformation. | Deep Learning (CNN, LSTM, Transformer Architectures), Federated Learning |
| Decentralized Voting Mechanism | A community-driven system for verifying content authenticity. | Allows verified fact-checkers and community members to review and vote on content, correcting AI misclassifications and building consensus. | Blockchain, Token Economics, Reputation Systems |
| Reputation System | A blockchain-based mechanism to track the reliability of participants. | Incentivizes honest behaviour and accurate verification; grants influence based on proven trustworthiness. | Blockchain, Token Economics |
Table 3. Core components, version, and security role.
| Component | Version | Security Role |
|---|---|---|
| Hyperledger Besu | 24.3.1 | On-chain anchoring of Merkle roots |
| Solidity Compiler | 0.8.23 | Smart contract bytecode verification |
| Hyperledger Indy | 1.13.2 | DID and schema ledger |
| Hyperledger Aries JS | 0.4.5 | DIDComm messaging and wallet services |
| IPFS (go-ipfs) | 0.25.0 | Off-chain storage (provenance content) |
| Docker Engine | 20.10.21 | Containerization and isolation |
| OpenSSL | 3.2.0 | TLS encryption and signature utilities |
Table 4. Content provenance submission.
| Variable | Type | Domain | Description |
|---|---|---|---|
| content | bytes | {0, 1} * | The raw digital content to be submitted. |
| metadata | JSON | Text | Structured contextual metadata (e.g., title, timestamp). |
| vc | JSON-LD | W3C VC | A Verifiable Credential issued to the creator. |
| creatorPrivateKey | bytes | Private key | The private key used to sign the provenance record. |
Note: In {0, 1} *, the * (Kleene star) denotes “zero or more repetitions”; i.e., the set of all finite binary strings. It is not a footnote marker.
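The `content` variable above is never anchored raw: per the workflow in Figure 4, the verifier recomputes a salted digest SHA-256(canonical_content ∥ saltNonce) before comparing against the anchored record. A minimal sketch in Python, where the canonicalization step is simplified to Unicode NFC normalization plus UTF-8 encoding (an assumption for illustration; structured payloads would use a standard canonicalization such as Exclusive XML Canonicalization [68]):

```python
import hashlib
import os
import unicodedata

def canonicalize(content: str) -> bytes:
    """Toy canonicalization: Unicode NFC normalization + UTF-8 encoding.
    (Stand-in for the framework's full canonicalization step.)"""
    return unicodedata.normalize("NFC", content).encode("utf-8")

def salted_content_hash(content: str, salt_nonce: bytes) -> bytes:
    """SHA-256(canonical_content || saltNonce), as in the Figure 4 workflow."""
    return hashlib.sha256(canonicalize(content) + salt_nonce).digest()

salt = os.urandom(16)                      # fresh per-submission nonce
digest = salted_content_hash("Example article body", salt)
assert len(digest) == 32                   # SHA-256 yields a 32-byte digest
```

The salt also serves the replay defence in Table 8: a record resubmitted with a stale nonce hashes to an already-anchored digest and is rejected by the smart contract’s duplicate-entry check.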
Table 5. Variable definitions.
| Variable | Type | Domain | Description |
|---|---|---|---|
| content | bytes | {0, 1} * | The raw content to be verified. |
| signedRecord | JSON | Text | The provenance record retrieved from IPFS. |
| vc | JSON-LD | W3C VC | The Verifiable Credential associated with the creator. |
| merkleProof | list[bytes] | Hashes | Sibling hashes for verifying Merkle root inclusion. |
| onChainMerkleRoot | bytes | Hash | The Merkle root stored on-chain. |
Note: In {0, 1} *, the * (Kleene star) denotes “zero or more repetitions”; i.e., the set of all finite binary strings. It is not a footnote marker.
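The roles of `merkleProof` and `onChainMerkleRoot` can be sketched concretely: the verifier folds the leaf digest together with each sibling hash and checks that the recomputed root matches the anchored one. The sorted-pair hashing convention below is an assumption for illustration (a common Merkle construction); the paper does not fix a tree layout here.

```python
import hashlib

def verify_merkle_proof(leaf: bytes, proof: list[bytes], root: bytes) -> bool:
    """Recompute the root from a leaf and its sibling hashes (merkleProof),
    then compare against onChainMerkleRoot. Sorted-pair hashing is assumed."""
    node = leaf
    for sibling in proof:
        pair = node + sibling if node <= sibling else sibling + node
        node = hashlib.sha256(pair).digest()
    return node == root

# Tiny two-leaf example: root = H(sorted concatenation of the two leaf hashes)
h1 = hashlib.sha256(b"record-1").digest()
h2 = hashlib.sha256(b"record-2").digest()
root = hashlib.sha256((h1 + h2) if h1 <= h2 else (h2 + h1)).digest()
assert verify_merkle_proof(h1, [h2], root)
assert not verify_merkle_proof(hashlib.sha256(b"tampered").digest(), [h2], root)
```

Only `root` needs to live on-chain; the proof itself travels with the off-chain record, which keeps on-chain storage constant per anchoring batch.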
Table 6. Security goals ↔ formal property ↔ cryptographic basis.
| Security Goal | Formal Property | Cryptographic Basis |
|---|---|---|
| Integrity | H(C) ≠ H(C′) for C ≠ C′ | SHA-256 (FIPS 180-4) |
| Authenticity | Verify_pk(m, σ) = true | ECDSA (RFC 6979) |
| Non-repudiation | Signature binds the sender identity | Public Key Infrastructure |
| Inclusion | R = Merkle(h_i, P) | Merkle trees (binary hashes) |
| Privacy-preserving | VerifyZKP(π, x) = true | ZKPs (Groth16, zk-SNARKs) |
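The integrity row rests on SHA-256’s collision resistance: distinct inputs yield distinct digests in practice, so any post-publication edit (even a single flipped bit) is detectable by hash comparison. A minimal check:

```python
import hashlib

c = b"original content"
c_prime = bytearray(c)
c_prime[0] ^= 0x01   # flip a single bit: C' differs from C in one position

# Tamper evidence: the anchored digest of C no longer matches H(C')
assert hashlib.sha256(c).digest() != hashlib.sha256(bytes(c_prime)).digest()
```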
Table 7. Comparative analysis of content verification frameworks.
| Framework | SSI Support | Merkle Anchoring | Off-Chain Storage | VC Integration | ZKP Support | GDPR-Aware Design | Key Limitation |
|---|---|---|---|---|---|---|---|
| VeriTrust (Ours) | Yes | Yes | Yes | Yes | Planned | Yes | Deployment maturity (in progress) |
| OriginTrail [80] | Yes | Yes | Yes | Partial | No | No | Primarily focused on supply-chain traceability |
| Content Authenticity Initiative [81] | No | Yes | Yes | No | No | Partial | Lacks decentralized identity and credential layer |
| C2PA Standard [82] | No | Yes (signatures + manifests) | Yes | No | No | Partial | Not blockchain-anchored; identity is weakly bound |
| Sovrin / Hyperledger Indy [83] | Yes | No | Yes | Yes | No | Partial | No native content provenance anchoring |
Table 8. Threats, risks, and mitigation strategies in VeriTrust.
| Threat Category | Risk Description | Mitigation Strategy/Control | Trade-Offs |
|---|---|---|---|
| Content Tampering | Alteration of content after publication | SHA-256 hashing, Merkle root anchoring, tamper-evidence checks [71] | Requires recomputation of hashes on updates |
| Credential Forgery | Attackers forge or reuse credentials | VCs signed with issuer’s private key; DID-based public key resolution; revocation checks [67] | Reliant on timely issuer participation |
| Replay Attacks | Re-submission of valid but outdated records | Use of nonces + timestamps; duplicate-entry rejection via smart contract | Extra storage/processing overhead |
| Sybil Attacks (Identity Layer) | Mass creation of fake identities to flood the system | VC issuance restricted to vetted authorities; decentralized trust registries | Governance complexity, risk of gatekeeping |
| Off-Chain Content Deletion | Loss of IPFS data due to garbage collection or node churn | Multi-node pinning, decentralized pinning services, institutional mirrors, hybrid cloud fallback [87] | Additional storage/operational costs |
| Malicious or Colluding Issuers | Issuers collude or issue fraudulent credentials | Accreditation via multi-stakeholder trust registries (e.g., Sovrin, EBSI) [70] | Governance overhead, trust in registry |
| Key Compromise/DID Rotation | Stolen or expired private keys undermine authenticity | DID rotation (per W3C DID Core [58]); credential revocation registries | Requires timely revocation propagation |
| Contract-Level Invariants | Logic flaws or replay vulnerabilities in smart contracts | Formal verification (SMTChecker, Certora) [5]; monotonic timestamp and uniqueness constraints | Verification costs, developer expertise |
| Governance Risks in Permissioned Networks | Collusion among validator nodes undermines consensus | Diversity of operators, external audits, migration paths to public/consortium blockchains [79] | Higher coordination and governance overhead |