You are currently viewing a new version of our website. To view the old version click .
Future Internet
  • Article
  • Open Access

30 November 2025

A Blockchain-Based Framework for OSINT Evidence Collection and Identification

,
,
and
1
Department of Criminal Investigation, Central Police University, Taoyuan 333322, Taiwan
2
Department of Information Management, Central Police University, Taoyuan 333322, Taiwan
*
Author to whom correspondence should be addressed.
Future Internet2025, 17(12), 551;https://doi.org/10.3390/fi17120551 
(registering DOI)
This article belongs to the Special Issue Security and Privacy in Blockchains and the IoT—3rd Edition

Abstract

The rapid advancement of social media and the exponential increase in online information have made open-source intelligence an essential component of modern criminal investigations. However, existing digital forensics standards mainly focus on evidence derived from controlled devices such as computers and mobile storage, providing limited guidance for social media–based intelligence. Evidence captured from online platforms is often volatile, editable, and difficult to verify, which raises doubts about its authenticity and admissibility in court. To address these challenges, this study proposes a systematic and legally compliant open-source intelligence framework aligned with digital forensics principles. The framework comprises five stages: identification, acquisition, authentication, preservation, and validation. By integrating blockchain-based notarization and image verification mechanisms into existing forensic workflows, the proposed system ensures data integrity, traceability, and authenticity. The implemented prototype demonstrates the feasibility of conducting reliable and legally compliant open-source intelligence investigations, providing law enforcement agencies with a standardized operational guideline for social media–based evidence collection.

1. Introduction

Social media has become an indispensable component of modern communication, shaping how individuals interact, share content, and disseminate information. Popular platforms such as Facebook, Instagram, WhatsApp, and others collectively host billions of active users worldwide []. This extensive and diverse ecosystem provides a vast reservoir of publicly accessible information that can be leveraged for investigative purposes, particularly in open-source intelligence (OSINT) contexts.
Law enforcement agencies have increasingly utilized social media content to identify suspects and reconstruct criminal activities. A well-known example occurred after the 2021 U.S. Capitol riot [], when the Federal Bureau of Investigation (FBI) extensively used Facebook data and user-generated posts to identify participants and gather supporting evidence. Similarly, in the 2015 United Kingdom cyclist insurance fraud case [], investigators analyzed Facebook photos to reveal inconsistencies between the suspect’s claims and his actual activities, ultimately leading to a criminal conviction. These cases demonstrate the evidentiary potential of open-source information available through Facebook.
Although digital forensics standards such as ISO/IEC 27037:2012 [] and ISO/IEC 27043:2015 [] provide comprehensive guidance for handling evidence from computers, mobile devices, and storage media, they offer limited applicability to open-source intelligence. Evidence obtained from social media platforms is typically volatile, may be altered or deleted remotely, and is often captured as screenshots without verifiable provenance. Such conditions make it difficult to prove authenticity and maintain evidential integrity in judicial proceedings. Consequently, ensuring the reliability of social media evidence has become a significant challenge in digital forensics and legal practice.
To address these challenges, this study advances three core contributions that differentiate it from prior OSINT or blockchain-based forensic research. First, we operationalize a complete OSINT evidence lifecycle that incorporates multi-factor user identity correlation before acquisition, ensuring that only attributable and case-relevant social media content is collected. This design responds directly to recurring judicial concerns about the inability to link social media accounts to real-world individuals. Second, the proposed system supports both on-site OSINT acquisition for first responders and office-based OSINT processing, reflecting the realities of criminal investigation workflows and enabling consistent handling of OSINT across operational contexts. Third, we introduce on-acquisition notarization through a permissioned judicial blockchain infrastructure, providing immediate and cross-agency verifiable proof of evidentiary integrity, authenticity, and timing for volatile OSINT. These three components collectively close the OSINT-to-courtroom gap and establish an end-to-end, legally oriented framework for handling social media evidence.
To realize this design, this study proposes a blockchain-based forensic framework for OSINT evidence collection and user identification. The framework integrates established digital forensics standards with blockchain-supported notarization and timestamping mechanisms to ensure the integrity, authenticity, and traceability of collected intelligence across various social media and online platforms.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the proposed system architecture. Section 4 discusses experimental results. Section 5 concludes the study and outlines future research directions.

3. Proposed OSINT Evidence Collection Methodology

To address these challenges, this study establishes a systematic and legally compliant framework for the collection and preservation of OSINT evidence. The framework is developed in accordance with international digital-forensics standards, including ISO/IEC 27037, 27041, 27042, and 27043, which collectively define procedures for the identification, acquisition, authentication, preservation, and validation of digital evidence. The proposed methodology follows this structure to ensure procedural clarity and maintain consistency with established forensic best practices.
While ISO/IEC 27037 formally defines four primary stages of digital evidence handling (identification, collection, acquisition, and preservation), this study extends the framework by incorporating the authentication and validation concepts outlined in ISO/IEC 27041 and ISO/IEC 27043. This enhancement ensures procedural transparency, auditability, and legal defensibility throughout the entire OSINT investigation workflow, aligning the proposed framework with the most recent international standards for end-to-end digital forensic processes.
To realize this framework in practice, a digital-evidence collection system named CIB-Triage III (Figure 2) was developed. The system operates the framework by supporting modular acquisition, metadata extraction, hashing, and verification within a unified workflow. It enables evidence collection from mobile devices, computers, and cloud platforms, and its automated processes minimize human intervention, thereby reducing the risk of contamination or tampering.
Figure 2. (a) CIB-Triage III integrates evidence collection tools for computers, mobile devices, and cloud data, along with mechanisms for on-chain evidence registration and verification. (b) The screenshot and recording functions within the digital forensic toolkit allow investigators to capture visual data from social networking platforms as digital evidence. (c) Each captured image is automatically annotated with metadata, including the timestamp and file size, and a cryptographic hash value is generated to verify the integrity and authenticity of the evidence.
Unlike traditional digital forensics, OSINT data are typically obtained from open networks and social media platforms rather than devices owned by suspects. As such, linking online evidence to a specific individual in the real world presents a significant challenge. Establishing this connection requires a combination of behavioral analysis, profile correlation, and network structure assessment to verify that the collected information accurately represents the activities of the identified person. This verification process is essential for ensuring that OSINT evidence is both authentic and attributable before it is formally acquired and preserved.
Within the context of OSINT, the Evidence Identification Phase differs from traditional digital-forensics practice. Although ISO/IEC 27037 defines identification as the process of recognizing potential sources of digital evidence, OSINT extends this phase to include the evaluation of publicly available materials distributed across social media and online ecosystems. This phase focuses on identifying and validating OSINT data that is relevant to the specific investigation target. By examining contextual relationships, user interactions, and data provenance, the process determines whether the collected online content can be legitimately associated with the subject under investigation. This refined identification phase ensures that subsequent acquisition, authentication, preservation, and validation procedures are performed on evidence that is both relevant and verifiably connected to the intended individual or incident.
After OSINT authenticity has been verified and the social media account has been credibly linked to a real-world individual, the corresponding evidentiary data are collected using CIB-Triage III. This system performs modular acquisition of verified OSINT artifacts from online platforms, ensuring that each item is accompanied by metadata, precise timestamps, and cryptographic hash values (e.g., SHA-256, acquisition timestamp, collector ID, content length) to maintain data integrity and verifiability. The evidence bytes remain off-chain in the evidence vault. The verified and authenticated evidence, along with its metadata and hash values, is subsequently uploaded to b-JADE for blockchain notarization. By leveraging the decentralized and immutable architecture of blockchain, the framework preserves authoritative evidence states, enables immediate verification of authenticity, and mitigates risks associated with the volatility of online content. Integration with the Police Digital Evidence Chain further enhances inter-agency trust and procedural transparency, strengthening the legal defensibility of OSINT-derived evidence in judicial and administrative proceedings.
The following sections provide a detailed explanation of the proposed framework, elaborating on the identification, acquisition, authentication, preservation, and validation phases (Figure 3). Each phase is discussed in terms of its objectives, procedures, and mechanisms, illustrating how the framework systematically implements OSINT evidence management in a transparent and legally defensible manner. To further enhance clarity, Figure 4 visualizes the complete data flow from OSINT collection through identity correlation, off-chain storage, and on-chain notarization. This architectural illustration highlights the interaction between CIB-Triage III and b-JADE, the separation of trust boundaries, and the integration between on-chain and off-chain components, providing a clear understanding of how evidence moves through the system.
Figure 3. Operational workflow of the proposed OSINT evidence collection framework.
Figure 4. System architecture of the proposed OSINT evidence framework.

3.1. Identification Phase

3.1.1. Platform and Tool Preparation

The platform and tool preparation sub-phase ensures that all technical resources are ready and that pre-existing intelligence is systematically organized, establishing a solid foundation for subsequent investigative activities. All hardware and software required for evidence collection are configured and validated according to the specific objectives of the investigation. Core tools include screen-capture applications, Optical Character Recognition (OCR) utilities for converting text embedded in images into machine-readable formats, and analytical software such as mind-mapping tools used to conduct relational analysis and visualize complex intelligence structures.
Integrated forensic platforms, such as CIB-Triage III, further enhance operational rigor by enabling continuous recording of digital interactions, automatic extraction of textual content, and secure computation and storage of cryptographic hash values alongside essential metadata, including timestamps and source identifiers. Collected evidence can be uploaded either immediately or in batches to b-JADE, leveraging blockchain’s decentralized, immutable, and auditable characteristics to ensure authenticity and integrity across the entire chain of custody. By formalizing these preparatory procedures and aligning them with forensic and legal standards, the framework enhances both operational efficiency and the reliability of the resulting evidence.

3.1.2. Target Background and Relevant Information

Before initiating active OSINT collection, investigators systematically compile comprehensive background information on the target of the investigation. This preparatory step ensures that subsequent data acquisition and analysis are conducted with precision, legality, and a clear understanding of the context. The compiled information includes personal identifiers such as full names, known aliases, demographic characteristics, professional and educational backgrounds, contact details, frequently visited locations, and photographic records.
The data compilation also extends to individuals closely associated with the target, including family members, colleagues, and other relevant affiliates, along with any supporting documentation, such as news articles, social media posts, or public records. This structured dataset serves as a critical foundation for subsequent stages of OSINT investigation, facilitating correlation, behavioral analysis, and identity verification. By conducting this process under standardized forensic and legal protocols, investigators ensure that all preparatory activities are performed within a methodologically sound and legally defensible framework.

3.1.3. User Identity Correlation Phase

The User Identity Correlation Phase focuses on verifying and correlating online identities with real-world individuals through a progressive multifactor comparison mechanism. The process begins with surface-level indicators available on social media and gradually advances toward more detailed and personalized information that provides stronger corroboration. This layered approach ensures that identity attribution proceeds from low-risk, easily observable factors to high-confidence, evidence-based verification, thereby enhancing both accuracy and legal defensibility.
By integrating multiple identification criteria in a structured sequence, OSINT data are cross-validated across diverse sources to minimize misidentification risks and strengthen investigative precision. The verification process consists of four levels, each increasing in depth and reliability, building upon the previous one.
  • Identical Account or Name Comparison
This method represents the entry point of the analysis, identifying potential users based on identical or highly similar usernames across multiple platforms. For example, if a person uses the account name A1 on platform Alpha and a nearly identical username on platform Beta, there is a strong probability that both accounts are operated by the same individual. The reliability of this approach increases with the uniqueness or rarity of the username.
2.
Relationship Network Comparison
The next level examines social relationships and connections between followers and friends within online platforms. If user A is known to have siblings named Alpha and Beta, and those accounts appear in the friend list of a particular Facebook profile, it can be reasonably inferred that the profile is likely controlled by user A. This method leverages stable interpersonal ties as mid-level corroborative evidence, enhancing confidence in user identification.
3.
Special Event Comparison
At this stage, the analysis incorporates identifiable events with distinct temporal markers such as birthdays or anniversaries. For instance, if user A′s birthday is known to be March 3 and a Facebook account receives numerous birthday messages precisely on that date, the likelihood that the account belongs to user A increases significantly. Event-based temporal correlation thus provides contextual validation linking digital activity with real-world timelines.
4.
Individualized Content Comparison
The final and most in-depth layer focuses on highly personalized information voluntarily or inadvertently disclosed online. Examples include photographs with recognizable locations, visible vehicle license plates, or images related to banks. If user A posts a photograph that clearly displays identifiable features such as a car license plate or a distinctive environment, it provides compelling corroborative evidence that the account is indeed operated by that individual. This method yields the highest level of confidence, provided that the evidence is obtained and preserved in compliance with legal and forensic standards.

3.2. Acquisition Phase

The Acquisition Phase translates the outputs of the Identification Phase into actionable evidence collection. Building on the previously established verified OSINT targets and sources, this stage focuses on the controlled acquisition of relevant data while maintaining strict forensic soundness. All collection activities are conducted in accordance with ISO/IEC 27037 guidelines to ensure that each acquired item retains its authenticity, integrity, and traceability throughout the process.

3.2.1. Collection

The Collection sub-phase involves the systematic acquisition of publicly available data that have been pre-validated as relevant to the investigation. Data sources include major social media platforms (Facebook, Instagram, and X), instant messaging applications, online forums, blogs, and cryptocurrency exchanges. To preserve authenticity and integrity, investigators use digital screenshots, continuous screen recording, and other non-intrusive methods that do not alter the original data.
Key elements of interest include usernames, account identifiers, profile attributes, user-generated content, social relationships (friends, followers, commenters), temporally defined activities (birthday posts, event participation), and individualized digital artifacts such as images, videos, vehicle license plates, and geolocation data. Formalizing and standardizing these procedures ensures reproducibility, completeness, and compliance with evidentiary requirements.

3.2.2. Analysis

After collection, the acquired data are organized and filtered to remove redundancy and irrelevant information. The analysis aims to extract critical, individualized intelligence components that contribute to establishing evidential value. Each dataset is examined for authenticity, integrity, and temporal accuracy before being prepared for authentication and preservation.
Analytical tools assist this process: Google Lens can detect duplicate images across platforms, while deep-fake-detection tools such as Sensity [] and Reality Defender [] verify content authenticity. Captured images are processed with OCR software to facilitate textual indexing, and visualization applications such as Xmind [] help investigators map and understand complex relationships within the collected intelligence.

3.3. Authentication Phase

The authentication phase ensures that all OSINT evidence collected in the acquisition stage is genuine, unaltered, and traceable to its original source. This phase verifies the integrity and authenticity of each data item by applying forensic validation techniques and establishing a clear chain of custody. The authentication process is particularly critical for OSINT because the information originates from open and dynamic online environments, where digital content can be modified, deleted, or fabricated without notice.
Verification begins by confirming the provenance and consistency of each evidence item. Metadata such as timestamps, URLs, device information, and digital signatures are analyzed to confirm their coherence with the original collection records. Cryptographic hash values generated during acquisition are recalculated and compared to the originals to verify that no alterations occurred. Any discrepancy triggers an integrity review, and the affected data are either revalidated or excluded from further evidentiary use.
To further enhance reliability, blockchain-based notarization mechanisms such as b-JADE are utilized to record hash values and essential metadata on a decentralized ledger. This ensures long-term immutability and auditability, allowing investigators and judicial authorities to independently verify the authenticity of evidence at any point in time. Additionally, all authentication activities are documented within the forensic management system to ensure procedural transparency and compliance with ISO/IEC 27001:2022 [] and 27002:2022 [] requirements.
Through this systematic verification process, the framework guarantees that every authenticated OSINT artifact meets the evidentiary standards of integrity, originality, and reproducibility required for legal admissibility.

3.4. Preservation

The Preservation Phase focuses on maintaining the integrity, authenticity, and continuity of OSINT evidence after authentication. To reinforce evidentiary reliability, the proposed framework integrates the preservation process directly into the CIB-Triage III platform. This integration enables automated computation of unique cryptographic hash values for each digital artifact, ensuring that all evidence is systematically verified and securely recorded. Immutable hash records serve as digital fingerprints, preventing post-collection modification and maintaining an unbroken chain of custody.
By embedding OSINT preservation procedures into an established forensic platform, the framework extends conventional digital-evidence management practices. It provides investigators with a standardized, reproducible, and legally defensible methodology for long-term intelligence preservation, fully compliant with the principles outlined in ISO/IEC 27042 and 27043.

3.5. Validation Phase

The Validation Phase consolidates and formalizes all outputs from the preceding collection, analysis, and identification stages into a structured, auditable, and transparent forensic record. Within this phase, all investigative findings, analytical reasoning, and operational methodologies are systematically documented to ensure reproducibility, traceability, and evidentiary validity.
Following the principles outlined in the Berkeley Protocol on Digital Open Source Investigations [], the framework structures the final report to include:
  • Clearly defined investigative objectives;
  • Comprehensive description of the methodology, including procedural steps and adherence to legal and forensic standards;
  • A chronological account of activities conducted during collection, analysis, and identification;
  • Detailed documentation of data sources and provenance to maintain transparency and traceability;
  • Acknowledgment of uncertainties, data gaps, or limitations encountered during the investigation; and
  • Evidence-based findings supported by actionable recommendations.
Furthermore, the framework incorporates visual analytic representations, such as network diagrams, charts, and relational maps, to illustrate complex interconnections and facilitate the interpretation of analytical conclusions. By formalizing the report generation process within this structure, the Validation Phase ensures that OSINT investigations are systematic, transparent, and legally defensible, while also providing a replicable methodology for knowledge transfer and future investigative or judicial use.

3.6. Legal Admissibility and Compliance Mapping

This section maps the OSINT lifecycle employed in this study, which consists of Identification, Acquisition, Authentication, Preservation, and Validation, to the evidentiary requirements of the United States Federal Rules of Evidence (FRE). This alignment ensures that each stage of the proposed framework satisfies specific legal thresholds required for admissibility:
  • Authenticity (FRE 901): The requirement of Rule 901(a) is satisfied through a documented, repeatable process. The system captures comprehensive logs, precise timestamps, and the SHA-256 hash of the original data at the moment of capture. This digest, along with minimal metadata, is immediately notarized on the b-JADE consortium ledger to freeze the state of evidence.
  • Self-Authentication (FRE 902): Requirements under Rules 902(13) and 902(14) regarding certified records of generated electronic information are met through a dual-verification mechanism. This involves a custodian’s certificate describing the system’s operation and a verifiable match between the hash of the preserved object in the off-chain vault and the immutable digest recorded on-chain (referenced by transaction ID).
  • Best Evidence Rule (FRE 1001–1003): The framework complies with the “Best Evidence” principle by preserving the original binary data in a Write-Once-Read-Many (WORM) evidence vault. It allows for the production of the original or a bit-for-bit duplicate, with integrity demonstrated through deterministic re-hashing and on-chain proof verification.
Through these mechanisms, our framework ensures that OSINT evidence collected through the proposed lifecycle meets the legal standards for authentication, integrity, provenance, and evidentiary sufficiency required in judicial proceedings. By embedding admissibility considerations directly into the forensic workflow, the system strengthens the reliability and defensibility of OSINT evidence in both investigative and courtroom contexts.

4. Implementation of the OSINT Evidence Collection Framework

This section presents a prototype implementation of the proposed framework that demonstrates feasibility and workflow integration; large-scale, real-world deployment and evaluation are left to future work. The implementation follows the five sequential phases established in Section 3, namely identification, acquisition, authentication, preservation, and validation. Each phase has been modularized into dedicated software components and operational procedures to ensure that the investigative workflow remains systematic, legally compliant, and forensically verifiable.
Given its large user base, Facebook [] is used here as a case study platform for implementing and demonstrating the pipeline. However, the system architecture is designed to be applicable to other OSINT sources such as X (Twitter), Instagram, and additional social network platforms. This platform, once fully developed, can potentially serve as a foundation for testing and validating the proposed framework. The system architecture is designed to be extensible and can be expanded in future versions to incorporate other OSINT sources.
The implementation emphasizes procedural rigor and reproducibility while integrating cryptographic verification and blockchain-based preservation mechanisms to ensure the integrity of digital evidence throughout the forensic process.

4.1. Identification Phase

In the identification phase, investigators prepare the technical environment and compile relevant background information on the investigative target. This preparation ensures that all required tools and intelligence resources are properly organized before active evidence collection begins. The identification phase, therefore, serves as the foundation of the entire OSINT workflow, linking investigative planning with the operational procedures described in subsequent stages.

4.1.1. Platform and Tools Preparation

The platform and tools preparation phase ensures that investigators are equipped with the essential technical resources, software platforms, and baseline intelligence required to perform efficient and legally compliant OSINT evidence collection. All tools incorporated into the framework correspond to the five functional phases of the OSINT evidence collection process, namely identification, acquisition, authentication, preservation, and validation. Excluding the validation phase, seven primary tools have been implemented to support the core operations of the system. These tools establish a standardized and reproducible technical environment that aligns with the principles of ISO/IEC 27037, supporting accurate, transparent, and legally defensible OSINT operations within a forensic context.
Among these tools, CIB-Triage III serves as the central platform, providing comprehensive functionality for screen capture, continuous screen recording, Optical Character Recognition (OCR), and blockchain-based preservation through b-JADE. The Google Lens utility, accessible directly via the Chrome browser, enables rapid image correlation and cross-platform verification. Xmind (version 26.01) facilitates structured visualization and relationship mapping, while deepfake detection tools such as Sensity and Reality Defender are employed to verify the authenticity of visual content (Figure 5).
Figure 5. Tools utilized in the implementation of the OSINT evidence collection framework.

4.1.2. Target Background and Relevant Information

During this stage, investigators compile comprehensive background and familial information concerning the target. Police information systems are accessed to retrieve a list of the target’s relatives, including the collection of fundamental biographical data such as names, photographs, and dates of birth for all individuals within the fourth-degree kinship range. Public academic databases such as Google Scholar, along with advanced search queries using filters like “site:edu.xx” or “site:ac.xx,” are also utilized to identify the target’s academic background, research expertise, and potential classmates or collaborators.
Additional data sources, such as stop and search records, co-travel information from immigration databases, and visitation logs from detention facilities, are examined to uncover the target’s close associates or possible accomplices. All gathered intelligence is subsequently organized with Xmind to construct a comprehensive mind map of the target (Figure 6), providing a visual overview that facilitates subsequent data analysis and operational assessment.
Figure 6. Map of target basic and relative information.

4.1.3. User Identity Correlation Phase

A multi-factor verification approach is employed to confirm account attribution. Relying on a single factor, such as a matching photograph, provides limited confidence due to the potential for impersonation or coincidental similarity. To address this limitation, the verification process follows a probabilistic reasoning model inspired by Bayesian theory. Each independent indicator contributes to the posterior probability of correct attribution according to the formula:
  P A B 1 , B 1 , , B N = k P A i = 1 N P B i A
where P ( A B 1 , B 1 , , B N ) represents the probability that the account truly belongs to the target given N independent corroborating factors B 1 , B 1 , , B N . The constant k is a normalization factor that ensures the total probability across all potential account candidates equals one. In practical applications, investigators are primarily concerned with relative probabilities rather than the explicit value of k, since the normalization term does not affect comparative attribution results.
In this operational framework, the specific weights P ( B i A ) are assigned according to the evidentiary strength of each category. For example, ‘Individualized Content Comparison’ (high weight) receives a substantially higher weight than ‘Identical Account Name’ (low weight) because the likelihood of accidental duplication is far lower for physical identifiers such as license plates or geolocated imagery than for usernames. This hierarchical weighting ensures that identity correlation is grounded in high-confidence indicators prior to final attribution.
As additional consistent evidence is incorporated, such as overlapping friend networks, recurring event timestamps, and unique identifiers, the cumulative probability of accurate attribution increases multiplicatively. This quantitative framework mirrors fingerprint identification principles, in which each new point of correspondence raises the overall confidence level and significantly reduces the likelihood of false attribution.
To operationalize this framework, investigators apply the four OSINT evidence classification methods introduced in Section 3.1.3. Representative examples are shown in Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 and are described as follows:
Figure 7. LINE and Facebook accounts using the same username “PKNEVXXXXXE,” demonstrating cross-platform identity correlation.
Figure 8. Facebook “Like” interaction records illustrating relationship mapping through engagement patterns.
Figure 9. Comparison between the target’s social media connections and known family and friend relationships.
Figure 10. Facebook post containing a special event message “Happy Birthday!”, showing temporal and relational cues useful for identity verification.
Figure 11. Facebook post displays a car license plate, representing individualized content that supports identity corroboration.
  • Identical Account or Name Comparison. Investigators begin by using keywords or names to locate accounts with identical or similar usernames. For instance, Facebook’s search function can identify accounts displaying public profile pictures, posts, and interaction histories. LINE is a messaging and social platform widely used in East Asia, and its functionalities are similar to those of WhatsApp and Facebook Messenger. LINE allows account searches using either user IDs or phone numbers. As shown in Figure 7, correlation is strengthened when both platforms use the same account name, such as the username “PKNEVXXXXXE.”
  • Relationship Network Comparison. Public friend lists and post-interaction records, such as likes and comments, can be analyzed to map social networks (Figure 8 and Figure 9). Even when friend lists are private, repeated engagement from specific accounts may indicate closer relationships.
  • Special Event Comparison. Posts associated with birthdays, memorials, or anniversaries (Figure 10) often reveal key personal relationships and life events, helping to corroborate identity.
  • Individualized Content Comparison. Posts may inadvertently disclose distinctive information such as locations, vehicle license plates, or household environments. Although each element may appear minor, these details become valuable when corroborated with other intelligence (Figure 11).
Once multi-factor identification confirms the actual user of a social media account, investigators then transition back to the Collection sub-phase to conduct broader data acquisition. At this stage, the CIB-Triage III system functions as the core acquisition platform, providing real-time screen capture and continuous recording capabilities. These functionalities support a comprehensive collection of both static and dynamic data across websites, social media platforms, and other publicly accessible sources, ensuring completeness and traceability of collected intelligence. Intelligence directly related to the target, such as check-in records or tagged locations, provides valuable insights into the target’s activities, timeline, and associations. Finally, all information relevant to the investigation is securely preserved through the Preservation sub-phase by recording integrity-protected artifacts, such as hash values, onto the b-JADE blockchain, ensuring long-term traceability and evidentiary admissibility.

4.2. Acquisition Phase

4.2.1. Collection

The Collection sub-phase involves the systematic acquisition of publicly available data that have been pre-validated as relevant to the investigation. The CIB-Triage III system serves as the core acquisition platform, providing real-time screen capture and continuous recording capabilities (Figure 12). To preserve authenticity and integrity, investigators employ digital screenshots, continuous screen recording, and other non-intrusive techniques that do not modify the original data.
Figure 12. Real-time screen capture and continuous recording are performed using the CIB-Triage III platform.
These functionalities support comprehensive acquisition of both static and dynamic data across websites, social media platforms, and other publicly accessible sources, ensuring completeness and traceability of collected intelligence. By centralizing these operations within a single platform, the framework achieves consistent data-handling procedures that comply with ISO/IEC 27037 guidelines and maintain forensic soundness throughout the acquisition process.

4.2.2. Analysis

After collection, the acquired data are systematically organized, filtered, and verified to remove redundancy and irrelevant information. The analysis aims to extract critical and individualized intelligence components that contribute to establishing evidential value. Each dataset is examined for authenticity, integrity, and temporal accuracy before being prepared for authentication and preservation.
To enhance analytical rigor and efficiency, multiple complementary tools are integrated into the framework:
  • OCR within CIB-Triage III converts textual content embedded in images into searchable formats, enabling rapid extraction of relevant information from large datasets (Figure 13).
    Figure 13. Extraction and analysis of Facebook friend lists using the OCR function within CIB-Triage III.
  • Mind-mapping software, such as Xmind, provides structured visualization and relational analysis, allowing investigators to systematically synthesize fragmented intelligence (Figure 14).
    Figure 14. Mind mapping of the investigative target and associated relationships created with Xmind.
  • Reverse image search tools, such as Google Lens, enable verification of whether visual content is already publicly available, misappropriated, or altered.
  • Deepfake detection platforms, such as Sensity and Reality Defender, are employed to evaluate the authenticity of images and videos, minimizing the risk of incorporating manipulated content into the investigation.
After the analytical process is completed, verified datasets are forwarded to the Authentication Phase, where hash values and metadata are generated to ensure evidentiary integrity and forensic verifiability.

4.3. Authentication and Preservation Phase

In the Authentication Phase, CIB-Triage III (Figure 15) automatically generates cryptographic hash values for all captured screenshots and video recordings. These hash values, together with associated metadata such as timestamps and collector identity, are securely stored in b-JADE (Figure 16). This procedure ensures that each piece of evidence can be independently verified during judicial review and remains tamper-proof throughout its lifecycle.
Figure 15. Hash value generation and metadata recording are performed within the CIB-Triage III platform.
Figure 16. Evidence preservation and blockchain-based verification performed through the integration of CIB-Triage III with the b-JADE.

4.4. Validation Phase

This phase is the final operational step of the framework. It consolidates and validates outputs from identification, acquisition, authentication, and preservation. The system performs consistency checks, revalidates evidence integrity, and generates structured exports to support report preparation for judicial or administrative use.
  • Data integrity revalidation
The platform rechecks hash values, timestamps, and blockchain notarization identifiers produced during authentication and preservation. Hash records stored in b-JADE are compared with the local evidence registry to confirm that no alterations occurred during storage or transfer (Figure 17).
Figure 17. Verification of digital evidence integrity on the b-JADE blockchain. (a) Verification performed through the connection between the CIB-Triage III forensic platform and b-JADE. (b) Verification conducted via the b-JADE Common Verification Platform web interface (https://b-jade.moj.gov.tw/page/hash/hash.aspx) (accessed on 24 November 2025).
2.
Procedural traceability review
Operational logs are compiled and linked to their corresponding digital objects. Logs include acquisition timestamps, collector identifiers, analytic actions, and version information. These linkages establish an auditable chain of custody suitable for judicial review.
3.
Reporting support and structured export
Validated findings, metadata, and integrity references are organized into exportable fields aligned with Section 3.5 and the Berkeley Protocol. The framework does not auto-generate the final report. Investigators can directly use exported fields, such as key findings and identity correlation evidence, as well as the associated acquisition and authentication metadata, cryptographic hashes, and blockchain transaction IDs. Additionally, they can appendices that list artifact manifests and hash indexes.
Through these validations and structured outputs, the Validation Phase completes the forensic workflow, ensuring that OSINT evidence managed by CIB-Triage III is verifiable, traceable, and suitable for legal admissibility.

4.5. Performance Evaluation and Scalability Analysis

Social media evidence is characterized by high velocity and volatility, which requires the underlying notarization mechanism to respond rapidly and remain stable under varying workloads. To evaluate whether the proposed blockchain-based framework meets these operational requirements, we conducted a series of performance and scalability experiments focusing on the hash-to-on-chain process in the b-JADE environment.

4.5.1. Experimental Setup and Scenarios

The experiments were performed on VMware ESXi virtual machines equipped with 4 virtual CPUs, 8 GB of RAM, and a 120 GB solid-state drive, running Ubuntu 20.04 and Hyperledger Fabric. The blockchain network was configured with a batch timeout of 2 s, a maximum message count of 500 transactions per block, and an absolute maximum block size of 10 megabytes. These parameters reflect a representative configuration for permissioned blockchain deployments in practical forensic environments.
To evaluate the impact of increasing data volume on system latency, we generated a comprehensive set of hash values and conducted write-latency tests across four batch sizes: 10, 100, 1000, and 10,000 entries. Because Hyperledger Fabric does not explicitly record block-finalization timestamps, the completion time for each test was determined using the header timestamp of the final transaction included in the last confirmed block, which provides millisecond-level precision. To ensure reliability and reduce the influence of outliers, each test scenario was executed 5 times, and the average completion time was used for analysis.

4.5.2. Experimental Results and Analysis

The results summarized in Table 2 demonstrate a clear relationship between batch size and write latency while confirming the operational stability of the system. For small batches of 10 hashes, the average completion time was 42 milliseconds. As the batch size increased to 100 and 1000 hashes, the average latency rose to 2.24 s and 51.25 s, respectively. The largest batch of 10,000 hashes required 555.21 s (approximately 9 min).
Table 2. Test result for 10 to 10,000 hash values.
These increases in latency reflect the computational overhead inherent to block ordering and validation within a permissioned blockchain network. Nonetheless, the performance remains within acceptable limits for forensic operations. In practical investigative contexts, the ability to notarize thousands of digital artifacts, such as chat logs or image collections, within minutes is sufficient to preserve volatile social media evidence before suspects can alter or delete content. These results demonstrate that the b-JADE infrastructure provides the throughput and stability required for real-world investigative scenarios that demand timely OSINT preservation.

5. Conclusions

This research establishes an integrated digital-forensics framework that transforms OSINT from a loosely organized investigative activity into a legally admissible evidentiary process. By combining standardized identification, controlled acquisition and analysis, cryptographic authentication, and blockchain-based preservation, the proposed model operationalizes the full OSINT lifecycle in accordance with forensic and legal standards.
Through the implementation of the CIB-Triage III prototype, the framework demonstrates that OSINT evidence can be recognized, collected, analyzed, authenticated, and validated within a single coherent workflow. This integration closes the long-standing gap between open-source information gathering and judicially recognized digital evidence. The b-JADE blockchain infrastructure further extends this capability by providing immutable notarization and cross-agency verification, creating an auditable chain of custody that meets the evidentiary requirements of international courts and law enforcement cooperation.
The study’s methodological contribution lies in formalizing a replicable, end-to-end process that aligns the principles of the Berkeley Protocol on Digital Open Source Investigations with ISO/IEC 27037 and related standards. Technically, it delivers a prototype that unifies multi-factor identity correlation, integrity verification, and digital preservation within a single forensic platform. Practically, it offers judicial institutions a verifiable path to admit OSINT-derived evidence while maintaining transparency, reproducibility, and ethical compliance.
By consolidating these theoretical, technical, and procedural advances, the framework establishes a new paradigm for OSINT forensics. It provides an actionable model that enhances investigative credibility, strengthens transnational evidentiary reliability, and promotes judicial confidence in the lawful use of open-source intelligence worldwide. Future research should further evaluate whether the deployment of a production-level version of CIB-Triage III can empirically improve the judicial acceptance and evidentiary weight of OSINT in real court proceedings.
While the current framework establishes a robust foundation for data integrity using permissioned blockchains, future iterations will integrate advanced privacy-preserving technologies and intelligent analysis modules. Zero Knowledge Proofs (ZKPs) represent a promising direction for privacy-preserving OSINT, enabling investigators to prove the possession or validity of evidence without disclosing sensitive collection methods or source identities on the ledger. Furthermore, as discussed in Section 2.4, incorporating AI-driven anomaly detection, deepfake detection, and dynamic graph analysis will enhance automated identification of manipulated content and fraudulent on-chain behaviors. Integrating these smart and privacy-preserving analytical components within the b-JADE infrastructure represents a natural evolution of the framework, transforming it from a passive preservation mechanism into an active, intelligent forensic ecosystem.

Author Contributions

All authors discussed the content of the manuscript and contributed equally to its preparation. Specifically, H.-W.H. performed the experiments; C.-H.S. and C.-Y.L. designed and supervised the study; H.-Y.T. analyzed the experimental results; and C.-Y.L. assisted in interpreting the results. C.-H.S. and C.-Y.L. reviewed the manuscript, while C.-Y.L. and H.-W.H. drafted the initial version. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The CIB-Triage III prototype used in this study is not publicly available due to copyright and intellectual property restrictions. No new datasets were generated or analyzed beyond those incorporated within the framework’s controlled testing environment.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT-5, ChatGPT-4o, and Grammarly (version 1.2.198.1762) to assist with text refinement and language editing through human–machine–human iterations. Furthermore, ChatGPT-5 and Freepik were utilized to optimize the visual aesthetics, specifically the color schemes and structural layout, of Figure 1 and Figure 4 based on the original designs. The authors have carefully reviewed and edited all outputs and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Statista. Biggest Social Media Platforms by Users 2025. Available online: https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/ (accessed on 17 November 2025).
  2. AP News. Capitol Rioters’ Social Media Posts Influencing Sentencings. Available online: https://apnews.com/article/media-prisons-social-media-capitol-siege-sentencing-0a60a821ce19635b70681faf86e6526e (accessed on 10 July 2025).
  3. Insurance Business, UK. Another Fraudulent Insurance Claim Foiled by Social Media. Available online: https://www.insurancebusinessmag.com/uk/news/breaking-news/another-fraudulent-insurance-claim-foiled-by-social-media-193531.aspx (accessed on 10 July 2025).
  4. ISO/IEC 27037:2012; Information Technology—Security Techniques—Guidelines for Identification, Collection, Acquisition, and Preservation of Digital Evidence. ISO: Geneva, Switzerland, 2012.
  5. ISO/IEC 27043:2015; Information Technology—Security Techniques—Guidelines for the Analysis and Interpretation of Digital Evidence. ISO: Geneva, Switzerland, 2015.
  6. ISO/IEC 27041:2015; Information Technology—Security Techniques—Incident Investigation Principles and Processes. ISO: Geneva, Switzerland, 2015.
  7. ISO/IEC 27042:2015; Information Technology—Security Techniques—Guidance on Assuring Suitability and Adequacy of Incident Investigative Method. ISO: Geneva, Switzerland, 2015.
  8. US Law Rule 901. Authenticating or Identifying Evidence. Available online: https://www.law.cornell.edu/rules/fre/rule_901 (accessed on 14 June 2025).
  9. CourtListener. (n.d.). Search Results for Courts: All Query: Social Media Evidence. Available online: https://www.courtlistener.com/?type=r&q=social%20media%20evidence&type=r&order_by=score%20desc (accessed on 21 November 2025).
  10. Gupta, K.; Oladimeji, D.; Varol, C.; Rasheed, A.; Shahshidhar, N. A Comprehensive Survey on Artifact Recovery from Social Media Platforms: Approaches and Future Research Directions. Information 2023, 14, 629. [Google Scholar] [CrossRef]
  11. Patzakis, J. Social Media Evidence Key Factor in Estimated 500,000 Litigation Cases Last Year. Available online: https://blog.pagefreezer.com/social-media-evidence-500000-litigation-cases (accessed on 22 November 2025).
  12. Wright, P. Neal Ysart Legal Scrutiny of OSINT Evidence and the Access to Digital Devices in Court: Key Case Law and Best Practices. Available online: https://www.linkedin.com/pulse/legal-scrutiny-osint-evidence-access-3gbjf/ (accessed on 22 November 2025).
  13. 06-1893—Lorraine et al v. Markel American Insurance Company—Content Details. Available online: https://www.govinfo.gov/app/details/USCOURTS-mdd-1_06-cv-01893 (accessed on 22 November 2025).
  14. 06-1497—Capitol Records, Inc et al v. Thomas-Rasset—Document in Context. Available online: https://www.govinfo.gov/app/details/USCOURTS-mnd-0_06-cv-01497/context (accessed on 22 November 2025).
  15. People v. Price 2017 New York Court of Appeals Decisions New York Case Law New York Law U.S. Law Justia. Available online: https://law.justia.com/cases/new-york/court-of-appeals/2017/58.html (accessed on 10 July 2025).
  16. United States v. Farrad, No. 16-6730 (6th Cir. 2018) Justia. Available online: https://law.justia.com/cases/federal/appellate-courts/ca6/16-6730/16-6730-2018-07-17.html (accessed on 22 November 2025).
  17. Can Social Media Be Used in Court? 23 Court Cases That Prove Social Media Evidence Can Make or Break a Case. Available online: https://blog.pagefreezer.com/social-media-digital-evidence-forensics-court-cases (accessed on 22 November 2025).
  18. Supreme Court of British Columbia, C.C. ICD—R. v. Hamdan—Asser Institute. Available online: https://www.internationalcrimesdatabase.org/Case/3306/ (accessed on 14 June 2025).
  19. Taiwan Supreme Court Judgment No. 320. Available online: https://web.archive.org/web/20251129025708/https://judgment.judicial.gov.tw/FJUD/data.aspx?ty=JD&id=TPSM,112%2c%e5%8f%b0%e4%b8%8a%2c320%2c20230817%2c1 (accessed on 24 November 2025).
  20. Basumatary, B.; Kalita, H.K. Social Media Forensics—A Holistic Review. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 590–597. [Google Scholar] [CrossRef]
  21. Vidya, V.; Saly, K.; Balan, C. Forensic Acquisition and Analysis of Webpage. In Proceedings of the 2022 2nd International Conference on Intelligent Technologies (CONIT), Hubli, India, 24–26 June 2022. [Google Scholar] [CrossRef]
  22. Greco, C.; Ianni, M.; Seminara, G.; Guzzo, A.; Fortino, G. A Forensic Framework for Screen Capture Validation in Legal Contexts. In Proceedings of the 2024 IEEE International Conference on Cyber Security and Resilience (CSR), London, UK, 2–4 September 2024; pp. 127–132. [Google Scholar] [CrossRef]
  23. Guo, H.; Yu, X. A Survey on Blockchain Technology and Its Security. Blockchain Res. Appl. 2022, 3, 100067. [Google Scholar] [CrossRef]
  24. Liu, S.; Zheng, Q. A Study of a Blockchain-Based Judicial Evidence Preservation Scheme. Blockchain Res. Appl. 2024, 5, 100192. [Google Scholar] [CrossRef]
  25. Deng, Z.; Ren, Y.; Liu, Y.; Yin, X.; Shen, Z.; Kim, H.J. Blockchain-Based Trusted Electronic Records Preservation in Cloud Storage. Comput. Mater. Contin. 2019, 58, 135. [Google Scholar] [CrossRef]
  26. Ministry of Justice, Taiwan. Blockchain-Applied Judicial Alliance for Digital Era Common Verification Platform. Available online: https://web.archive.org/web/20250505075608/https://b-jade.moj.gov.tw/Default.aspx (accessed on 24 November 2025).
  27. Transaction Flow—Hyperledger Fabric Docs Main Documentation. Available online: https://hyperledger-fabric.readthedocs.io/en/latest/txflow.html (accessed on 10 July 2025).
  28. Sheng, Z.; Song, L.; Wang, Y. Dynamic Feature Fusion: Combining Global Graph Structures and Local Semantics for Blockchain Phishing Detection. IEEE Trans. Netw. Serv. Manag. 2025, 22, 4706–4718. [Google Scholar] [CrossRef]
  29. Tian, Y.; Song, L.; Qian, P.; Wang, Y.; Sun, J.; Jia, Y. DiT-SGCR: Directed Temporal Structural Representation with Global-Cluster Awareness for Ethereum Malicious Account Detection. arXiv 2025, arXiv:2506.20123. [Google Scholar]
  30. Liu, T.; Wang, Y.; Sun, J.; Tian, Y.; Huang, Y.; Xue, T.; Li, P.; Liu, Y. The Role of Transformer Models in Advancing Blockchain Technology: A Systematic Survey. Eng. Appl. Artif. Intell. 2026, 163, 112968. [Google Scholar] [CrossRef]
  31. Jia, Y.; Wang, Y.; Sun, J.; Tian, Y.; Qian, P. LMAE4Eth: Generalizable and Robust Ethereum Fraud Detection by Exploring Transaction Semantics and Masked Graph Embedding. IEEE Trans. Inf. Forensics Secur. 2025, 20, 10260–10274. [Google Scholar] [CrossRef]
  32. Sun, J.; Jia, Y.; Wang, Y.; Tian, Y.; Zhang, S. Ethereum Fraud Detection via Joint Transaction Language Model and Graph Representation Learning. Inf. Fusion 2025, 120, 103074. [Google Scholar] [CrossRef]
  33. Zafarani, R.; Liu, H. Connecting Corresponding Identities across Communities. In Proceedings of the Third International AAAI Conference on Weblogs and Social Media, San Jose, CA, USA, 17–20 May 2009. [Google Scholar]
  34. Wang, Y.; Liu, T.; Tan, Q.; Shi, J.; Guo, L. Identifying Users across Different Sites Using Usernames. Procedia Comput. Sci. 2016, 80, 376–385. [Google Scholar] [CrossRef]
  35. Li, Y.; Peng, Y.; Ji, W.; Zhang, Z.; Xu, Q. User Identification Based on Display Names Across Online Social Net-works. IEEE Access 2017, 5, 17342–17353. [Google Scholar] [CrossRef]
  36. Motoyama, M.; Varghese, G. I Seek You: Searching and Matching Individuals in Social Networks; ACM Digital Library: New York, NY, USA, 2013; ISBN 9781605588087. [Google Scholar]
  37. Raad, E.; Chbeir, R.; Dipanda, A. User Profile Matching in Social Networks. In Proceedings of the 13th International Conference on Network-Based Information Systems, Takayama, Japan, 14–16 September 2010; pp. 297–304. [Google Scholar] [CrossRef]
  38. Iofciu, T.; Fankhauser, P.; Abel, F.; Bischoff, K. Identifying Users Across Social Tagging Systems. In Proceedings of the Fifth International AAAI Conference on Weblogs and Social Media, Barcelona, Spain, 17–21 July 2011. [Google Scholar]
  39. Na, Y.; Yinliang, Z.; Lili, D.; Genqing, B.; Liu, E.; Clapworthy, G.J. User Identification Based on Multiple Attribute Decision Making in Social Networks. China Commun. 2013, 10, 37–49. [Google Scholar] [CrossRef]
  40. Narayanan, A.; Shmatikov, V. De-Anonymizing Social Networks. Proc IEEE Symp Secur Priv 2009, 173–187. [Google Scholar] [CrossRef]
  41. Cui, Y.; Pei, J.; Tang, G.; Luk, W.-S.; Jiang, D.; Hua, M. Finding Email Correspondents in Online Social Networks. World Wide Web 2012, 16, 195–218. [Google Scholar] [CrossRef]
  42. Sensity AI: Best Deepfake Detection Software in 2025. Available online: https://sensity.ai/ (accessed on 17 June 2025).
  43. Deepfake Detection—Reality Defender. Available online: https://www.realitydefender.com/ (accessed on 17 June 2025).
  44. Xmind—Mind Mapping App. Available online: https://xmind.com/ (accessed on 17 June 2025).
  45. ISO/IEC 27001:2022; Information Security, Cybersecurity and Privacy Protection—Information Security Management Systems—Requirements. ISO: Geneva, Switzerland, 2022.
  46. ISO/IEC 27002:2022; Information Security, Cybersecurity and Privacy Protection—Information Security Controls. ISO: Geneva, Switzerland, 2022.
  47. United Nations Office of the High Commissioner for Human Rights (OHCHR) and Human Rights Center, University of California, Berkeley School of Law. The Berkeley Protocol on Digital Open Source Investigations: A New Standard for Online Investigations. Office of the High Commissioner for Human Rights, Geneva, Switzerland, 2020. Available online: https://www.ohchr.org/en/publications/policy-and-methodological-publications/berkeley-protocol-digital-open-source (accessed on 24 November 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Article Metrics

Citations

Article Access Statistics

Multiple requests from the same IP address are counted as one view.