Article

TDSR: Distributed Data Asset Registration and Cross-Jurisdictional Verification in Trusted Data Spaces

1 School of Computer Science and Engineering, Macau University of Science and Technology, Macau SAR 999078, China
2 Saiyu Technology (Beijing) Co., Ltd., Beijing 100070, China
3 China Internet Network Information Center, Beijing 100070, China
4 China Academy of Industrial Internet, Beijing 100015, China
5 Companhia de Equipamentos Master, Limitada, Macau SAR 999078, China
* Authors to whom correspondence should be addressed.
Electronics 2026, 15(10), 2079; https://doi.org/10.3390/electronics15102079
Submission received: 20 April 2026 / Revised: 9 May 2026 / Accepted: 10 May 2026 / Published: 13 May 2026

Abstract

Trans-border data circulation across multiple jurisdictions faces an operational conflict between ownership provenance prerequisites and data minimisation mandates. The conflict is compounded by the tight coupling of large data payloads with synchronous state consensus ledgers, which forces every consensus node to replicate feature matrices and leads to network saturation. Existing frameworks cannot resolve this, because coupling in-band payload routing with synchronous state ledgers generates communication overheads that scale with data volume. The proposed Trusted Data Space with Registration (TDSR) implements a four-layer protocol stack. A dual-plane topology establishes a decoupled storage–ledger mechanism, partitioning asynchronous payload datastores from synchronous consensus ledgers so that throughput is sustained independently of data dimensionality. Over this infrastructure, the Unified Data Resource Identifier (UDRI) performs out-of-band cross-domain routing without exposing verifier intent. The Oblivious Data Asset Registration (ODAR) mechanism drives a two-phase, four-algorithm lifecycle that establishes end-to-end ownership provenance. This lifecycle shifts hypothesis testing to isolated sandboxes through an algorithm-agnostic mathematical contract, capping external data transit at a constant leakage bound. A testbed deployed across the Guangdong-Hong Kong-Macao Greater Bay Area validates the proposed architecture and supports data circulation across divergent legal jurisdictions.

1. Introduction

The digital economy relies on the circulation of data as a factor of production. The exchange of data resources involves the confirmation, registration, and cross-domain verification of Data Asset Ownership. Trusted Data Spaces provide the networks for this circulation: distributed, sovereignty-preserving infrastructures that enable controlled data sharing across organisational and jurisdictional boundaries, in which providers retain control over their datasets, including Domain Name System (DNS) recursive logs [1,2,3,4]. Trans-border data mobility follows an operational lifecycle. A data provider at node $N_A$ extracts the fingerprint $F_D(t)$ from a source asset $D$ to secure a cryptographic registration anchor. A transaction transfers a subset $d$ to a consumer at node $N_B$, and the receiving node $N_B$ initiates a provenance verification. The system computes a decision $\mathcal{D}(d, D) \in \{0, 1\}$ to prove the condition $d \subseteq D$, which establishes the traceability of ownership. Data minimisation mandates impose an operational constraint: the verification sequence must execute without exposing the source asset $D$ to trans-border network transmission, while the system maintains a consensus state across untrusted domains. Cross-jurisdictional verification therefore exposes a structural conflict between ownership provenance prerequisites and data minimisation mandates.
Existing frameworks are unable to resolve this conflict. Their primary limitation is the in-band coupling of large data payloads with synchronous state consensus ledgers, which replicates feature matrices across all consensus nodes and produces communication overheads that scale linearly with data volume. Federated models manage data exchange via application gateways, where payload data and routing instructions traverse identical network channels [5]. This configuration exposes the verification intent to the host node, and information leakage across jurisdictional boundaries scales linearly with the volume of the queried data. Blockchain technologies provide state registration, but their designs couple state consensus with data storage: consensus nodes replicate feature matrices to maintain the ledger, consuming network bandwidth, so transaction throughput varies with data dimensionality. Identification protocols address routing requirements. Decentralised identifiers (DIDs) authenticate cryptographic keys, but standard formats omit metadata about jurisdictional boundaries and physical storage planes [6], so resolution depends on centralised infrastructure or bilateral legal agreements. Verification mechanisms govern the proof of provenance, yet cryptographic protocols rely on interactive parameter exchanges between nodes, which generate latency across networks. Distributed storage protocols, in turn, lack state transition mechanisms to prevent double-registration [7]. Existing systems must therefore trade off data volume, authentication, and minimisation requirements against one another.
We propose the Trusted Data Space with Registration (TDSR), which implements a four-layer protocol stack for cross-jurisdictional data circulation, defined by three isolation boundaries: privacy–legal, intent–routing, and storage–consensus. Unlike existing Trusted Data Spaces that rely on in-band payload routing through centralised connectors, TDSR achieves constant-bounded leakage and scalability independent of data dimensionality. The infrastructure layer employs a dual-plane topology that partitions the physical network: a data plane executes asynchronous payload distribution, and a trust plane executes synchronous state consensus [8,9]. The Unified Data Resource Identifier (UDRI) protocol binds the cross-layer operations by acting as a pointer that encapsulates the infrastructure target, jurisdictional domain, registrar node, and ownership fingerprint [10,11]. The protocol stack executes a lifecycle driven by the Oblivious Data Asset Registration (ODAR) mechanism, which coordinates out-of-band identity discovery and shifts verification to local execution environments. The architecture supports fingerprint representations regardless of the extraction method and enforces a mathematical bound on data leakage during trans-border data circulation [12,13]. With a median response time of 3.82 s, TDSR is positioned primarily for ownership audits and regulatory compliance verification rather than real-time streaming applications.
The main contributions of this research are as follows:
  • Architecture for TDSR: The architecture establishes isolation boundaries across the protocol stack. Its dual-plane topology establishes a decoupled storage–ledger mechanism, partitioning asynchronous payload datastores from synchronous consensus ledgers. This physical separation prevents ledger saturation and sustains throughput independent of data dimensionality.
  • UDRI pointer and ODAR protocol: The UDRI pointer and the ODAR protocol standardise asset resolution. This combination enables out-of-band cross-domain routing without exposing verifier intent, bypassing the transmission scaling observed in in-band systems.
  • Interface contract and leakage bound: The architecture defines an algorithm-agnostic mathematical contract at the feature extraction interface. This operational sequence shifts hypothesis testing to isolated sandboxes, capping external data transit at a constant leakage bound.
The remainder of this paper is structured as follows. Section 2 reviews related work on data spaces and data asset registration. Section 3 defines the system model and formulation. Section 4 details the TDSR architecture, including the dual-plane mechanism, the UDRI specification, and the algorithmic execution. Section 5 evaluates system performance on the Greater Bay Area (GBA) deployment. Section 6 discusses advantages and limitations. Section 7 concludes and outlines future work.

2. Related Work

Cross-jurisdictional data circulation exposes an operational conflict between sovereignty mandates and distributed consensus mechanisms [3,14]. Research on distributed data spaces demonstrates the throughput penalty incurred when systems couple payload datastores with state ledgers [15,16]. Evaluations of cross-domain routing show that existing identifiers cannot map assets across jurisdictional boundaries [10,17]. Analyses of privacy-preserving protocols reveal the trans-border communication overhead driven by interactive parameter exchange [13]. Together, these gaps motivate the isolation boundaries engineered into the TDSR architecture.

2.1. Distributed Data Spaces and Coupling Frameworks

The paradigm for information management shifted from isolated databases to distributed dataspaces to place data control with the provider [1,2,3]. Federated frameworks, including the International Data Spaces (IDS), utilise connector gateways to mediate exchange [18,19]. The application payload and the network routing instructions traverse identical channels during in-band transmission. Lacking an intent–routing isolation boundary, this network configuration generates communication overheads scaling with data volume during cross-domain verification [5]. Blockchain-based registries deploy Byzantine fault-tolerant (BFT) consensus ledgers to manage state [4,15]. By coupling the payload datastore with the consensus state machine, these networks force nodes to replicate the feature matrix [20,21], and transaction throughput degrades as data dimensionality increases [16]. Distributed storage systems, including the InterPlanetary File System (IPFS) and Chord, decouple storage from routing using Distributed Hash Tables (DHTs) [9,22], but they lack a sequential ledger to anchor ownership or prevent double-registration [23]. Existing trans-border platforms lack a dual-plane infrastructure capable of isolating asynchronous datastores from synchronous consensus ledgers [24].

2.2. Cross-Domain Identifiers and Routing Protocols

Web architectures locate assets through the DNS and Uniform Resource Identifiers [10]. The dependency of these systems on root servers exposes query patterns to network surveillance [6]. Encrypting the transport layer via protocol modifications, including DNS over HTTPS [25,26], EDNS(0) [27], and Oblivious DoH [28,29,30,31], mitigates this exposure. A mechanism for cross-domain data asset registration without central authorities remains unaddressed. Within the Web3 ecosystem, DIDs standardise peer-to-peer identity mapping [32,33]. Consortium blockchains leverage DIDs and Verifiable Credentials (VCs) to execute cross-domain authentication [17,34,35]. Applying standard DIDs to a multi-jurisdictional data space reveals a protocol gap. A DID document authenticates the public key [36,37]. The specification omits the legal jurisdictional boundary, the infrastructure storage plane, and the cryptographic hash of the registered data asset [38,39]. A protocol string featuring cross-layer Protocol Data Unit mapping is absent in the literature. Systems require an identifier to navigate the infrastructure, resolve the compliance domain, and address the oblivious payload [11].

2.3. Data Provenance and Privacy-Preserving Verification

Regulatory mandates restrict the trans-border transfer of source assets [14,40,41,42]. Privacy-preserving frameworks avoid transmitting the dataset $D$ by computing provenance through Multi-Party Computation and Homomorphic Encryption [13,43]. During execution, the data provider and the verifier synchronise intermediate variables across the regulatory boundary. This communication pattern introduces latency across networks and lacks a privacy–compliance isolation boundary. Generating a cryptographic proof of provenance via Non-Interactive Zero-Knowledge Proofs imposes a computational overhead that hinders operation on high-volume datasets such as DNS recursive logs [7]. The literature also presents frameworks such as CoVault, which enables analytics over personal data repositories [44], and protocols such as DP-BREM, which introduces private mechanisms for distributed environments [45]. These solutions nonetheless retain the trans-border traffic overhead, scaling with data volume, that is inherent in routing across regulatory domains. Shifting verification execution to isolated sandboxes within the compliance domain of the verifier bounds trans-border communication [46,47]. However, because existing frameworks couple cryptographic verification mechanisms with representation algorithms, upgrading feature extraction forces a systemic overhaul of their consensus layers [13,43]. This operational dependency motivates an architecture that integrates local execution environments with an algorithm-agnostic interface.

3. Theoretical Formulation and System Model

Trans-border data registration and verification present a multidimensional optimisation problem constrained by network throughput, regulatory compliance, and adversarial threats. The model operates under the Dolev–Yao communication model and an $f < n/3$ Byzantine fault tolerance threshold, excluding hardware side-channels and volumetric Distributed Denial of Service attacks. This section establishes the theoretical formulation for the TDSR architecture. We formulate the topological model to resolve the consensus bottleneck, partitioning the network graph into a synchronous trust plane and an asynchronous data plane. We define the cross-layer resolution protocol, in which a mapping function establishes deterministic addressing across jurisdictional domains without central authorities. We formalise the statistical provenance algorithm as a multivariate hypothesis test and prove that its operational execution confines trans-border information leakage to a constant $O(1)$ bound. These formulations establish the system prerequisites that dictate the structural isolation boundaries engineered in the protocol stack. Table 1 details the formal notation.

3.1. Dual-Plane Topology Formulation

We model the multi-jurisdictional Trusted Data Space as a distributed network graph $G = (V, E)$, where $V$ denotes the set of participating nodes across divergent legal domains and $E$ represents the inter-jurisdictional communication links. Within a unified architecture, data asset registration relies on State Machine Replication via BFT consensus [8]. Let $n = |V|$ be the number of consensus nodes and $|F_D|$ be the payload volume of the extracted fingerprint. The communication complexity $C_{SMR}$ required to reach finality for a single registration block is bounded by:
$$C_{SMR} = O(n^2 \cdot |F_D|)$$
As the feature dimensionality $k$ of the data asset expands, the matrix size $|F_D|$ scales, causing $C_{SMR}$ to saturate the network bandwidth and degrade finality latency. The infrastructure must therefore decouple state consensus from payload storage [48]. We formalise this decoupling mechanism by partitioning the network graph $G$ into two isolated planes:
$$G = P_{trust} \cup P_{data}$$
$P_{trust}$ functions as a synchronous ledger processing constant-size cryptographic pointers ($c_{ptr}$), reducing the consensus complexity to $C^{*}_{SMR} = O(n^2 \cdot c_{ptr})$. $P_{data}$ operates as an asynchronous DHT dedicated to absorbing the variable-size payload. This topological separation manifests as the Layer 1 Infrastructure boundary in the protocol stack.
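To make the gap between $C_{SMR}$ and $C^{*}_{SMR}$ concrete, the following sketch estimates per-block message volume under two assumptions that are ours, not the paper's: the fingerprint $\Omega = (\mu_D, \Sigma_D)$ is encoded as float64 values, so $|F_D| = 8(k + k^2)$ bytes, and the on-ledger pointer $c_{ptr}$ is a 32-byte SHA-256 digest. Constant factors of any particular BFT protocol are ignored; only the $n^2$ scaling is modelled.

```python
"""Per-block BFT message volume: coupled (n^2 * |F_D|) vs decoupled (n^2 * c_ptr)."""

BYTES_PER_FLOAT = 8   # assumption: float64 encoding of fingerprint entries
C_PTR_BYTES = 32      # assumption: SHA-256 digest used as the on-ledger pointer


def fingerprint_bytes(k: int) -> int:
    """|F_D| for Omega = (mu_D, Sigma_D): a k-vector plus a k x k matrix."""
    return BYTES_PER_FLOAT * (k + k * k)


def coupled_volume(n: int, k: int) -> int:
    """Payload replicated through all-to-all consensus rounds, ~ n^2 * |F_D|."""
    return n * n * fingerprint_bytes(k)


def decoupled_volume(n: int) -> int:
    """Only the constant-size pointer enters consensus, ~ n^2 * c_ptr."""
    return n * n * C_PTR_BYTES


if __name__ == "__main__":
    n = 7  # tolerates f = 2 Byzantine nodes, since 2 < 7 / 3
    for k in (8, 64, 256):
        print(f"k={k:>3}  coupled={coupled_volume(n, k):>12,} B  "
              f"decoupled={decoupled_volume(n):>6,} B")
```

Under these assumptions the coupled volume grows quadratically in $k$, while the decoupled volume depends only on $n$.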
This dual-plane formulation provides the physical foundation for the isolation boundaries. By decoupling state consensus from payload storage, the topology ensures that the fingerprints generated at the feature extraction interface are the only assets entering the synchronous control plane, maintaining the structural separation of concerns.
Adversarial model: The system operates under the Dolev–Yao model with $f < n/3$ Byzantine tolerance. A malicious verifier cannot learn the raw data, because only the constant-size payload $\Omega$ crosses the boundary, and a malicious provider or intermediate node cannot learn the verifier's intent, because only the UDRI hash is queried over the DHT. The $O(1)$ leakage bound prevents information leakage beyond the statistical manifold. Sybil attacks are mitigated by the Cryptographically Generated Identifier mechanism, which requires regulatory credentials.
Definition of “oblivious” in TDSR: In this work, the term “oblivious” refers to intent concealment: during a UDRI lookup over the IPFS DHT, the querier retrieves the fingerprint payload $\Omega$ by its content hash without revealing to the data provider or intermediate nodes which subset $d$ is being verified or for what purpose. This is achieved through the standard content-addressed retrieval of the IPFS DHT, not via cryptographic primitives such as Oblivious Transfer or Oblivious RAM. The out-of-band design further ensures that the provider node $N_A$ remains unaware of any verification activity initiated by verifier $N_B$.
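The intent-concealment property rests on content addressing: a lookup names the payload by its hash and says nothing about which local subset $d$ the querier will test. The sketch below illustrates this with a hypothetical in-memory `ToyContentStore`. It is not IPFS, which uses multihash-encoded CIDs and a Kademlia-based DHT; a bare SHA-256 hex digest is used as the key for simplicity.

```python
import hashlib
import json


class ToyContentStore:
    """Hypothetical in-memory stand-in for a content-addressed DHT (not IPFS).

    Keys are SHA-256 hex digests of the stored bytes, so a lookup names *what*
    is wanted (the payload hash) and carries no information about which local
    subset d the querier intends to test.
    """

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, payload: bytes) -> str:
        key = hashlib.sha256(payload).hexdigest()
        self._blocks[key] = payload
        return key

    def get(self, key: str) -> bytes:
        payload = self._blocks[key]
        # Self-certifying check: the retrieved bytes must hash to the requested key.
        if hashlib.sha256(payload).hexdigest() != key:
            raise ValueError("content does not match its address")
        return payload


# Provider N_A publishes a serialised fingerprint Omega (toy values).
omega = json.dumps({"mu": [0.1, 0.2], "sigma": [[1.0, 0.0], [0.0, 1.0]]}).encode()
store = ToyContentStore()
cid = store.put(omega)

# Verifier N_B retrieves by hash only; the request itself says nothing about d.
retrieved = json.loads(store.get(cid))
```

As in the definition above, the store operator still observes which key is requested; what remains hidden is the subset $d$ and the purpose of the verification.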

3.2. Hierarchical Architecture with Pointer and Resolution

To enable deterministic resolution across the decoupled planes ($P_{trust} \cup P_{data}$) without relying on centralised identification systems [10], we define the UDRI protocol. Let $\mathcal{U}$ denote the universe of all valid UDRI strings. We define the resolution mechanism as a cross-layer protocol mapping function $f_{UDRI}$:
$$f_{UDRI} : \mathcal{U} \rightarrow (C_{DID}, \Omega)$$
where $C_{DID}$ represents the verifiable compliance credential anchored on $P_{trust}$, and $\Omega$ represents the content payload addressed on $P_{data}$. The input UDRI string is formulated as a 4-tuple:
$$\mathrm{UDRI} = (\mathrm{Cyberspace}, \mathrm{Domain}, \mathrm{Registrar}, \mathrm{Ownership})$$
By executing $f_{UDRI}$, a verifier node $N_B$ in an external jurisdiction authenticates the identity of the provider $N_A$ via the trust plane and retrieves the statistical parameters from the data plane. This mapping dictates the four-layer protocol stack hierarchy.
The UDRI protocol acts as a persistent pointer that resolves across the hierarchical layers. In our methodology, the resolution process terminates at the feature extraction interface, where it triggers the respective representation or verification functions. This ensures that the pointer resolves to a verifiable functional interface rather than a raw data locator [10].
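The resolution step can be sketched as follows. The paper specifies the UDRI only as the 4-tuple (Cyberspace, Domain, Registrar, Ownership); the `udri://` string syntax, the dictionary-based planes, and the choice of (Domain, Registrar) as the trust-plane key are hypothetical and serve only to show $f_{UDRI}$ returning $(C_{DID}, \Omega)$ from two separate stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UDRI:
    """The 4-tuple (Cyberspace, Domain, Registrar, Ownership)."""
    cyberspace: str  # infrastructure target
    domain: str      # jurisdictional domain
    registrar: str   # registrar node
    ownership: str   # ownership fingerprint (content hash of Omega)

    @classmethod
    def parse(cls, text: str) -> "UDRI":
        # Hypothetical syntax: udri://<cyberspace>/<domain>/<registrar>/<ownership>
        prefix = "udri://"
        if not text.startswith(prefix):
            raise ValueError("not a UDRI string")
        parts = text[len(prefix):].split("/")
        if len(parts) != 4 or not all(parts):
            raise ValueError("expected exactly four non-empty segments")
        return cls(*parts)


def resolve(udri: UDRI, trust_plane: dict, data_plane: dict) -> tuple:
    """f_UDRI: U -> (C_DID, Omega), looking up each half on its own plane."""
    c_did = trust_plane[(udri.domain, udri.registrar)]  # compliance credential
    omega = data_plane[udri.ownership]                  # content payload
    return c_did, omega


# Toy planes with made-up entries.
trust = {("mo", "reg-01"): {"did": "did:example:nodeA", "pk": "pkA"}}
data = {"ab12": {"mu": [0.0], "sigma": [[1.0]]}}
u = UDRI.parse("udri://gba/mo/reg-01/ab12")
credential, payload = resolve(u, trust, data)
```

Keeping the two lookups on separate stores mirrors the dual-plane split: the credential never travels with the payload, and the payload is addressed only by its ownership hash.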

3.3. Statistical Provenance and Hypothesis Testing Formulation

To execute the registration and verification lifecycle, the architecture must prove that a purported data subset $d$ held by the verifier $N_B$ originates from the registered source asset $D$ held by the provider $N_A$. We formulate this as a multivariate statistical hypothesis testing problem.
Feature extraction: Let the source asset $D$ be represented by a multivariate distribution. The provider $N_A$ applies an extraction function $\Phi(D)$ to generate the fingerprint $F_D(t) = (\mu_D, \Sigma_D)$, where $\mu_D$ is the mean vector and $\Sigma_D$ is the covariance matrix of dimension $k \times k$. This tuple is encapsulated into $\Omega$.
Hypothesis testing: $N_B$ establishes the null hypothesis $H_0: d \subseteq D$. Upon retrieving $\Omega$ via the UDRI pointer, the isolated sandbox at $N_B$ computes the sample mean $\mu_d$ of the local subset $d$. The Mahalanobis distance $D_M$ of an evaluated vector $x$ derived from $d$ is calculated at time $t$ using the time-dependent mean vector $\mu_D(t)$ and covariance matrix $\Sigma_D(t)$:
$$D_M^2(x, \mu_D(t)) = (x - \mu_D(t))^{T} \, \Sigma_D^{-1}(t) \, (x - \mu_D(t))$$
For clarity in the operational phase, let $D_M^2(x)$ denote this instantaneous distance under the fixed statistical manifold $\Omega$ retrieved via the UDRI pointer. The system defines a threshold $\tau$ based on a predefined significance level $\alpha$ and $k$ degrees of freedom, where $\tau = \chi^2_{k, 1-\alpha}$. The isolated sandbox then computes the deterministic Boolean decision $\mathcal{D}(d, D; t) \in \{0, 1\}$:
$$\mathcal{D}(d, D; t) = \begin{cases} 1, & \text{if } D_M^2(x) \le \tau \\ 0, & \text{otherwise} \end{cases}$$
The outcome $\mathcal{D} = 1$ validates the cross-domain provenance constraint.
The hypothesis testing framework established here constitutes the mathematical contract for feature extraction interfaces. By formalising verification as a threshold $\tau$ on a statistical distance, the methodology allows the verification interface $V(d, \Phi(D))$ to remain implementation-agnostic, supporting diverse representation algorithms as long as they satisfy the established provenance consistency.
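The decision rule can be sketched with NumPy and SciPy. The names `extract_fingerprint`, `mahalanobis_sq`, and `decide` are hypothetical. The sketch assumes the evaluated vector $x$ is a single observation, for which $D_M^2(x)$ follows $\chi^2_k$ under $H_0$ when $\Omega$ is exact and the data are Gaussian. It solves a linear system rather than forming $\Sigma_D^{-1}$ explicitly, which is the numerically preferred way to evaluate the quadratic form.

```python
import numpy as np
from scipy.stats import chi2


def extract_fingerprint(D: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Phi(D) -> Omega = (mu_D, Sigma_D) for an (N, k) sample matrix."""
    mu = D.mean(axis=0)
    sigma = np.cov(D, rowvar=False)
    return mu, sigma


def mahalanobis_sq(x: np.ndarray, mu: np.ndarray, sigma: np.ndarray) -> float:
    """D_M^2(x) = (x - mu)^T Sigma^{-1} (x - mu), via a linear solve."""
    diff = x - mu
    return float(diff @ np.linalg.solve(sigma, diff))


def decide(x: np.ndarray, omega, alpha: float = 0.05) -> int:
    """Return 1 if D_M^2(x) <= tau = chi2_{k, 1 - alpha}, else 0."""
    mu, sigma = omega
    k = mu.shape[0]
    tau = chi2.ppf(1.0 - alpha, df=k)
    return int(mahalanobis_sq(x, mu, sigma) <= tau)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.multivariate_normal(np.zeros(3), np.eye(3), size=5000)
    omega = extract_fingerprint(D)
    print(decide(omega[0], omega))         # distance 0 at the mean -> 1
    print(decide(omega[0] + 10.0, omega))  # D_M^2 near 300, far above tau -> 0
```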

Formal Contract for Distributional Alignment and Decoupled Interfaces

To resolve the statistical limitation of applying the Mahalanobis distance to non-linear or long-tailed raw data distributions, we refine the feature extraction interface as a composite mapping mechanism. The architecture decouples the extraction and verification into composable functions that adhere to a distributional alignment contract.
Let the extraction interface be defined as $\Phi = \Phi_{stat} \circ \psi$:
  • Alignment mapping ($\psi$): A transformation $\psi: D_{raw} \rightarrow \mathcal{Z} \subseteq \mathbb{R}^k$ that projects the raw dataset $D$ into a latent topological space $\mathcal{Z}$. To satisfy the verification contract, and in particular to address the non-Gaussian characteristics of raw DNS logs, $\psi$ must ensure that the mapped representations $\psi(D)$ approximate a multivariate normal distribution $\mathcal{N}(\mu_D, \Sigma_D)$.
  • Statistical extraction ($\Phi_{stat}$): A function $\Phi_{stat}: \mathcal{Z} \rightarrow \mathbb{R}^k \times \mathbb{R}^{k \times k}$ that computes the constant-size statistical manifold $\Omega = (\mu_D, \Sigma_D)$ from the aligned latent space $\mathcal{Z}$.
Verification interface $V$: A deterministic evaluator $V: (\psi(d), \Omega) \rightarrow \{0, 1\}$ that executes the hypothesis test. The architectural decoupling is valid if and only if the evaluation function depends on the raw data only through $\Omega$ and satisfies:
$$P\big(V(\psi(d), \Phi(D)) = 1 \mid d \subseteq D\big) \ge 1 - \alpha$$
By abstracting $\psi$ as a pluggable boundary, the TDSR architecture maintains algorithm agnosticism. Data providers may implement $\psi$ using various feature representation methods, while the L1 trust plane processes the standardised output $\Omega$. The algorithm-agnostic contract holds for any feature extraction function $\Phi$ that satisfies the distributional alignment condition $\psi(D) \sim \mathcal{N}(\mu_D, \Sigma_D)$, at least approximately. This universality follows because verification depends only on the standardised statistical manifold $\Omega$, not on the internal details of $\psi$.
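The composite contract $\Phi = \Phi_{stat} \circ \psi$ and the coverage condition can be checked empirically. In the sketch below, the choice $\psi = \log$ is an illustrative assumption: it aligns exactly only when the raw features are log-normal, which is how the toy data are generated. Each row of the subset $d$ is treated as a one-element query, so the fraction of accepted rows estimates $P(V = 1 \mid d \subseteq D)$, which should be close to $1 - \alpha$. All function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2


def psi_log(raw: np.ndarray) -> np.ndarray:
    """Example alignment mapping psi (assumption): log of positive, long-tailed
    features. Exactly Gaussian only if the raw data are log-normal."""
    return np.log(raw)


def phi_stat(z: np.ndarray):
    """Phi_stat: Z -> Omega = (mu, Sigma), computed on the aligned space."""
    return z.mean(axis=0), np.cov(z, rowvar=False)


def make_phi(psi):
    """Compose Phi = Phi_stat o psi; psi is the pluggable boundary."""
    return lambda raw: phi_stat(psi(raw))


def verify(z: np.ndarray, omega, alpha: float = 0.05) -> np.ndarray:
    """V(psi(d), Omega) applied row-wise; sees the data only via psi and Omega."""
    mu, sigma = omega
    diff = z - mu
    d2 = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    return (d2 <= chi2.ppf(1.0 - alpha, df=mu.shape[0])).astype(int)


rng = np.random.default_rng(42)
k = 4
cov = 0.5 * np.eye(k) + 0.5  # unit variances, 0.5 correlations
# Long-tailed raw data: exp of a correlated Gaussian (toy stand-in for raw logs).
raw_D = np.exp(rng.multivariate_normal(np.zeros(k), cov, size=50_000))
phi = make_phi(psi_log)
omega = phi(raw_D)

# Empirical estimate of P(V = 1 | d subset of D) on a random subset d.
d = raw_D[rng.choice(len(raw_D), size=20_000, replace=False)]
acceptance = verify(psi_log(d), omega).mean()
```

Swapping `psi_log` for another mapping changes nothing downstream of `make_phi`; only $\Omega$ reaches `verify`, which mirrors the pluggable boundary described above.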

3.4. Trans-Border Information Leakage Bound

To satisfy multi-jurisdictional data minimisation mandates, the system enforces an upper bound on trans-border information leakage [40]. Let $L(N_A, N_B)$ denote the total volume of raw data crossing the legal boundary during a complete verification sequence.
Theorem 1.
The trans-border communication complexity of the TDSR verification lifecycle is bounded by a constant $O(1)$ with respect to the raw source asset volume $|D|$ and the queried subset volume $|d|$.
Proof. 
Let $N_B$ query the provenance of $d$ against $D$. The verification is executed within the local sandbox of $N_B$. The cross-domain payload exchange is restricted to the transmission of the UDRI pointer request and the retrieval of the payload $\Omega$ from $P_{data}$. The total trans-border leakage function is defined as:
$$L_{TDSR}(N_A, N_B) = \mathrm{size}(\mathrm{UDRI}) + \mathrm{size}(\Omega)$$
The UDRI is a fixed-length string, and the Boolean decision $\mathcal{D}$ requires only 1 bit. The feature dimensionality $k$ is a fixed parameter, predefined and constant across all nodes, so $\mathrm{size}(\Omega)$ is determined solely by the $k$-dimensional vector $\mu_D$ and the $k \times k$ covariance matrix $\Sigma_D$. It is therefore a structural constant $C = O(k^2)$, independent of the cardinality of either the subset $|d|$ or the source $|D|$. For any established feature extraction model:
$$L_{TDSR}(N_A, N_B) \le C \in O(1)$$
This proof demonstrates that the architecture guarantees data minimisation. To achieve this constant bound without relying on interactive Multi-Party Computation [13], the architecture must execute the decision $\mathcal{D}(d, D)$ locally, inside the sandbox of $N_B$. This requirement of the theoretical formulation mandates the isolation boundary in the architecture.    □
These formulations define the operational limits for cross-domain data exchange. The consensus complexity bound, the $f_{UDRI}$ cross-layer mapping, and the communication bound establish the structural prerequisites on which the four-layer architecture, with its isolation boundaries and key components, is built.
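This information leakage bound is enforced by the privacy–legal boundary. Since the architecture restricts network inputs to the low-dimensional statistical representation $\Phi(D)$, the mutual information $I(D; \mathrm{Network})$ is bounded by a constant independent of the raw dataset's cardinality $|D|$ [39]: because the traffic is a fixed-length encoding of at most $L_{TDSR}$ bytes, $I(D; \mathrm{Network}) \le H(\mathrm{Network}) \le 8 \cdot L_{TDSR}$ bits.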
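The constant bound of Theorem 1 can be illustrated by measuring the serialised size of $\Omega$ for datasets of very different cardinality. The 128-byte UDRI budget and the float64 encoding are assumptions; under them, $L_{TDSR} = 128 + 8(k + k^2)$ bytes regardless of $|D|$.

```python
import numpy as np

UDRI_BYTES = 128  # hypothetical fixed-length budget for the UDRI string


def omega_nbytes(D: np.ndarray) -> int:
    """Bytes of the float64 encoding of Omega = (mu_D, Sigma_D) for an (N, k) matrix."""
    mu = D.mean(axis=0).astype(np.float64)
    sigma = np.cov(D, rowvar=False).astype(np.float64)
    return len(mu.tobytes()) + len(sigma.tobytes())


def trans_border_bytes(D: np.ndarray) -> int:
    """L_TDSR = size(UDRI) + size(Omega); no raw row of D is counted or sent."""
    return UDRI_BYTES + omega_nbytes(D)


rng = np.random.default_rng(1)
k = 16
# Three datasets whose cardinality spans three orders of magnitude.
sizes = [trans_border_bytes(rng.normal(size=(n_rows, k)))
         for n_rows in (100, 10_000, 100_000)]
```

All three sizes coincide, and multiplying by 8 gives the bit budget that caps $I(D; \mathrm{Network})$.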
The verification process operates within a bounded two-phase commit protocol, and time constraints are introduced to ensure system liveness. Let $T_{elapsed}$ denote the time elapsed since the verification request was initiated, and let the threshold $T_{commit}$ define the maximum time allowed for consensus. If $T_{elapsed} \ge T_{commit}$, the transaction is aborted to prevent network congestion. The system state machine enforces these temporal bounds to maintain liveness and triggers the defined rollback sequence upon timeout.

3.5. State Transition Model for Operational Workflow

To provide a theoretical foundation for the end-to-end execution lifecycle, the trans-border registration and verification process is operationalised as a sequential state machine $M = \langle S, E, \delta \rangle$, where $S$ represents the set of system states, $E$ denotes the protocol events, and $\delta$ is the transition function. The execution integrates temporal boundary checks to resolve the non-deterministic latency inherent in wide-area networks.
The execution lifecycle is partitioned into two non-overlapping phases, enforcing the isolation boundaries:
1. Registration phase ($\delta_{reg}$): Maps an unregistered raw asset state $s_0$ to an anchored state $s_{anchored}$ via the extraction and consensus events:
$$\delta_{reg}: s_0 \xrightarrow{\text{Extract } \Phi(D)} \Omega \xrightarrow{\text{2PC Consensus}} (C_{DID}, \Omega)_{anchored}$$
The transition $\delta_{reg}$ is conditioned upon consensus finality. If the trust plane confirms the anchor within the predefined temporal bound $T_{commit}$, the system transitions to $s_{anchored}$. Otherwise, the rollback function prunes the local datastore to ensure state integrity. This phase guarantees that only the pointer metadata enters the synchronous ledger state.
2. Verification phase ($\delta_{ver}$): Maps a query event originating from an external jurisdiction to a definitive Boolean provenance decision, conditioned upon the anchored state:
$$\delta_{ver}: \big(d, (C_{DID}, \Omega)_{anchored}\big) \xrightarrow{\text{Oblivious Retrieval}} (d, \Omega) \xrightarrow{\text{Sandbox Execution}} \mathcal{D} \in \{0, 1\}$$
This formal state separation ensures that the verification transition $\delta_{ver}$ operates on the retrieved payload $\Omega$, removing any need to query the raw asset state $s_0$. This guarantees the structural integrity of the workflow and enforces the trans-border information leakage bound established in Theorem 1.
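A minimal sketch of $\delta_{reg}$ and $\delta_{ver}$ with the $T_{commit}$ bound follows. `RegistrationFSM`, `verify_phase`, and the injected `consensus`, `sandbox`, and `clock` callables are hypothetical. For simplicity the timeout is checked after the consensus call returns; a deployment would more likely enforce it with a timer that aborts the call.

```python
import time
from enum import Enum, auto
from typing import Callable


class State(Enum):
    S0 = auto()        # unregistered raw asset (also the state after rollback)
    ANCHORED = auto()  # (C_DID, Omega) finalised on the trust plane


class RegistrationFSM:
    """Toy delta_reg with a T_commit liveness bound (hypothetical structure)."""

    def __init__(self, t_commit: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.t_commit = t_commit
        self.clock = clock
        self.state = State.S0
        self.data_plane: dict[str, bytes] = {}

    def register(self, key: str, omega: bytes,
                 consensus: Callable[[str], bool]) -> State:
        start = self.clock()
        self.data_plane[key] = omega        # stage Omega in the local datastore
        finalised = consensus(key)          # anchor only the pointer on the trust plane
        elapsed = self.clock() - start
        if finalised and elapsed < self.t_commit:
            self.state = State.ANCHORED
        else:                               # no finality, or T_elapsed >= T_commit
            self.data_plane.pop(key, None)  # rollback prunes the local datastore
            self.state = State.S0
        return self.state


def verify_phase(fsm: RegistrationFSM, key: str,
                 sandbox: Callable[[bytes], int]) -> int:
    """delta_ver: defined only from the anchored state; reads Omega, never raw D."""
    if fsm.state is not State.ANCHORED:
        raise RuntimeError("verification requires an anchored registration")
    return sandbox(fsm.data_plane[key])
```

Injecting `clock` lets the timeout branch be exercised deterministically, for example with a fake clock that jumps past $T_{commit}$ between the two readings.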

3.6. Security Analysis

Verification of the ODAR protocol applies Burrows–Abadi–Needham (BAN) logic to establish the formal belief of verifier $N_B$ in the authenticity and freshness of the fingerprint $\Omega$ from provider $N_A$. We use the standard notation: $P \mid\equiv X$ ($P$ believes $X$), $P \triangleleft X$ ($P$ sees $X$), $P \mid\sim X$ ($P$ once said $X$), $P \Rightarrow X$ ($P$ has jurisdiction over $X$), and $\#(X)$ ($X$ is fresh). The interaction is idealised as a single message transfer via the UDRI pointer:
$$\text{Message 1}: N_A \rightarrow N_B: \{\Omega, t_{reg}\}_{K_A^{-1}}$$
The verification targets two terminal goals:
  • Goal 1: $N_B \mid\equiv (N_A \mid\equiv \Omega)$ (the verifier believes the provider registered the specific payload).
  • Goal 2: $N_B \mid\equiv \Omega$ (the verifier believes the payload parameters are authentic).
The derivation relies on three initial assumptions mapped to the theoretical boundaries defined in prior sections:
Assumption 1 ($N_B \mid\equiv \xrightarrow{pk_A} N_A$). The cross-domain identity resolution mechanism enforces that $pk_A$ belongs to $N_A$.
Assumption 2 ($N_B \mid\equiv \#(t_{reg})$). The temporal boundary $T_{commit}$ during the state transition consensus ensures that $t_{reg}$ is fresh.
Assumption 3 ($N_B \mid\equiv (N_A \Rightarrow \Omega)$). The feature extraction interface, which isolates raw datasets from external queries, establishes that $N_A$ has jurisdiction over its registered asset $\Omega$.
Step 1, message meaning: Given that $N_B$ holds $pk_A$ (Assumption 1) and sees the signed payload:
$$\frac{N_B \mid\equiv \xrightarrow{pk_A} N_A, \quad N_B \triangleleft \{\Omega, t_{reg}\}_{K_A^{-1}}}{N_B \mid\equiv N_A \mid\sim (\Omega, t_{reg})}$$
Step 2, freshness promotion: Applying the freshness rule to $t_{reg}$ (Assumption 2):
$$\frac{N_B \mid\equiv \#(t_{reg})}{N_B \mid\equiv \#(\Omega, t_{reg})}$$
Step 3, nonce verification: Combining freshness and origin establishes belief in $N_A$'s current state (Goal 1):
$$\frac{N_B \mid\equiv \#(\Omega, t_{reg}), \quad N_B \mid\equiv N_A \mid\sim (\Omega, t_{reg})}{N_B \mid\equiv N_A \mid\equiv \Omega}$$
Step 4, jurisdiction application: Applying Assumption 3 to the belief from Step 3 confirms Goal 2:
$$\frac{N_B \mid\equiv (N_A \Rightarrow \Omega), \quad N_B \mid\equiv N_A \mid\equiv \Omega}{N_B \mid\equiv \Omega}$$
The deduction demonstrates that $N_B$ confirms ownership provenance while maintaining the minimisation constraints of the architecture.
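The four derivation steps can be replayed mechanically. The sketch encodes BAN formulas as nested tuples (a hypothetical encoding) and applies the message-meaning rule for signatures, the freshness rule, the nonce-verification rule, the jurisdiction rule, and the belief-decomposition rule (from $P \mid\equiv Q \mid\equiv (X, Y)$ infer $P \mid\equiv Q \mid\equiv X$), which Step 3 uses implicitly when it concludes $N_B \mid\equiv N_A \mid\equiv \Omega$ from a belief about the pair $(\Omega, t_{reg})$.

```python
from itertools import product


# Hypothetical tuple encoding of BAN formulas.
def believes(p, x): return ("believes", p, x)  # P |= X
def sees(p, x): return ("sees", p, x)          # P <| X
def said(p, x): return ("said", p, x)          # P |~ X
def controls(p, x): return ("controls", p, x)  # P => X
def fresh(x): return ("fresh", x)              # #(X)
def pubkey(k, p): return ("pk", k, p)          # K is P's public key
def signed(x, k): return ("signed", x, k)      # {X} under the private half of K
def pair(a, b): return ("pair", a, b)          # (A, B)


def step(kb: set) -> set:
    """Apply each rule once to all matching formulas in kb."""
    new = set()
    for f, g in product(kb, repeat=2):
        if f[0] != "believes":
            continue
        p, fx = f[1], f[2]
        # Message meaning: P|= (K -> Q), P <| {X}_K^-1  gives  P|= Q|~ X
        if g[0] == "sees" and g[1] == p and fx[0] == "pk" \
                and g[2][0] == "signed" and g[2][2] == fx[1]:
            new.add(believes(p, said(fx[2], g[2][1])))
        if g[0] != "believes" or g[1] != p:
            continue
        gx = g[2]
        # Freshness (restricted to pairs already said): P|= #(Y)  gives  P|= #(X, Y)
        if fx[0] == "fresh" and gx[0] == "said" and gx[2][0] == "pair" \
                and fx[1] in gx[2][1:]:
            new.add(believes(p, fresh(gx[2])))
        # Nonce verification: P|= #(X), P|= Q|~ X  gives  P|= Q|= X
        if fx[0] == "fresh" and gx[0] == "said" and gx[2] == fx[1]:
            new.add(believes(p, believes(gx[1], gx[2])))
        # Jurisdiction: P|= Q => X, P|= Q|= X  gives  P|= X
        if fx[0] == "controls" and gx[0] == "believes" \
                and gx[1] == fx[1] and gx[2] == fx[2]:
            new.add(believes(p, fx[2]))
    # Belief decomposition: P|= Q|= (X, Y)  gives  P|= Q|= X  and  P|= Q|= Y
    for f in kb:
        if f[0] == "believes" and f[2][0] == "believes" and f[2][2][0] == "pair":
            for part in f[2][2][1:]:
                new.add(believes(f[1], believes(f[2][1], part)))
    return new


def closure(kb: set) -> set:
    """Apply rules until no new formula is derivable."""
    kb = set(kb)
    while True:
        new = step(kb) - kb
        if not new:
            return kb
        kb |= new


msg = pair("Omega", "t_reg")
assumptions = {
    believes("NB", pubkey("pkA", "NA")),      # Assumption 1
    believes("NB", fresh("t_reg")),           # Assumption 2
    believes("NB", controls("NA", "Omega")),  # Assumption 3
}
message_1 = sees("NB", signed(msg, "pkA"))
derived = closure(assumptions | {message_1})
goal1 = believes("NB", believes("NA", "Omega"))
goal2 = believes("NB", "Omega")
```

Removing Assumption 2 from the initial set blocks Step 3, and neither goal is derived, which matches the role the freshness assumption plays in the argument.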

4. The Proposed TDSR Architecture

We now operationalise the methodology to construct the TDSR architecture. To specify the hierarchical architecture, we isolate data processing, identity resolution, and state consensus. We propose the protocol stack, formalise the identifier, design the dual-plane network topology, and define the decoupled storage and ledger mechanism. Finally, we present the detailed execution algorithms for distributed data registration and cross-domain verification.

4.1. Design Principles

The development of the TDSR architecture is guided by three fundamental principles to resolve the tension between data sovereignty and distributed scalability:
1. Principle 1: data minimisation. The protocol stack enforces minimality by restricting trans-border communication to constant-size pointers and fingerprints, ensuring the mutual information between the source data and network traffic is bounded.
2. Principle 2: decoupled representation. The system isolates the feature extraction logic from the state consensus layer. This decoupling ensures that algorithmic complexity at Layer 4 does not impact the throughput or finality of the trust plane at Layer 1.
3. Principle 3: cross-domain interoperability. By utilising the UDRI pointer and a dual-plane topology, the architecture enables deterministic asset resolution across heterogeneous jurisdictional boundaries without requiring a centralised certificate authority.

4.2. The Four-Layer Architecture and Isolation Boundaries

TDSR is structured into a four-layer architecture, operationalising the functional separation of data processing, identity resolution, and state consensus. This hierarchical abstraction partitions the protocol stack into discrete layers (L1–L4) to enforce the isolation boundaries. As detailed in the integrated architecture in Figure 1, the UDRI segments resolve to specific layers, while the infrastructure layer maintains the structural decoupling between the data payload datastore and the state consensus ledger.

4.2.1. L1: Infrastructure Layer

This base layer manages the physical network connections, the TCP/IP stack, and the cryptographic operations. It separates the network traffic into a data plane and a trust plane to decouple payload resolution from state consensus. It acts as the entry point for the cross-domain infrastructure queries.

4.2.2. L2: Cross-Domain Interoperability Layer

This middleware layer implements out-of-band identity discovery to support multi-jurisdictional verification. It executes the ODAR protocol to query the DHT [28,29] and parses the jurisdictional segments of each identifier to route requests before they reach the specific host endpoints.

4.2.3. L3: Registration Node Layer

This layer hosts the local registries and executes the core verification logic. It manages the local datastores and coordinates the two-phase commit sequence for asset registration [38,39]. During verification, this layer provisions the isolated execution sandboxes to perform the local χ² hypothesis tests.

4.2.4. L4: Feature Extraction Interface Layer

This layer interacts with the raw data assets, such as DNS recursive logs [13]. It abstracts the feature extraction algorithms as a black-box input module to generate the fingerprint templates. It acts as the final resolution target for the content identifiers.
L4 establishes the mathematical and operational contract between raw data repositories and the control plane ledgers. By formalising this interface contract, TDSR treats feature extraction as a pluggable black-box, operating independently of the representation algorithm’s internal complexity. To enable trust evaluation without data exposure, L4 maps data assets to standard verification interfaces through the following core mechanisms:
Fingerprint Interface Φ(D)
The representation interface abstracts the underlying feature extraction process. Let D denote the raw dataset residing in the data plane. The function Φ(D) maps D to a low-dimensional statistical representation, or fingerprint, which captures the underlying distribution of the data without retaining raw records. To support downstream compliance checks, Φ(D) must satisfy the divisibility property, ensuring that any legitimate subset d ⊆ D maintains a verifiable statistical relationship with the global fingerprint Φ(D). This interface generates the structural payload to which the UDRI points, standardising the metadata format for the control plane.
Verification Interface V(d, Φ(D))
The verification interface provides a standard mechanism to authenticate data subsets across domain boundaries. Given a questioned subset d and the registered global fingerprint Φ(D), the interface executes a Boolean or probabilistic evaluation V(d, Φ(D)) ∈ {0, 1}. This operation is executed within isolated sandbox environments in the application layer. By confining statistical distance computations, such as the Mahalanobis distance, to this interface, the architecture prevents the leakage of global parameters to external queriers. The algorithm-agnostic contract is enforced solely through the structural output format: any implementation of Φ must produce a statistical manifold Ω = (μ_D, Σ_D) ∈ ℝ^k × ℝ^{k×k}, where k is fixed by the consortium. The verification interface V consumes Ω and a subset d and outputs a decision without querying raw data. For example, a hash-based extractor can produce Ω by calculating the mean and covariance of byte-trigram count vectors from network logs; an ML embedding extractor can replace this with the mean and covariance of the penultimate-layer activations. Both conform to the same interface, leaving the consensus and routing layers unaffected.
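The hash-based extractor mentioned above can be sketched as follows, assuming feature hashing to fix the dimension: each byte trigram of a log line is hashed into one of k buckets, and Φ returns the sample mean and covariance of the per-line count vectors. The use of BLAKE2b with 8-byte digests (chosen because Python's built-in `hash()` is salted per process) and the function names are our assumptions, not details given in the paper.

```python
import hashlib


def trigram_vector(line: bytes, k: int) -> list[float]:
    """Count the byte trigrams of one log line, hashed into k buckets (feature hashing)."""
    vec = [0.0] * k
    for i in range(len(line) - 2):
        # Deterministic across processes and hosts, unlike the salted built-in hash().
        digest = hashlib.blake2b(line[i:i + 3], digest_size=8).digest()
        vec[int.from_bytes(digest, "big") % k] += 1.0
    return vec


def extract_manifold(lines: list[bytes], k: int) -> tuple[list[float], list[list[float]]]:
    """Hypothetical Phi: return Omega = (mu, Sigma), the sample mean and covariance."""
    vectors = [trigram_vector(line, k) for line in lines]
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two records to estimate a covariance")
    mu = [sum(v[j] for v in vectors) / n for j in range(k)]
    sigma = [
        [sum((v[a] - mu[a]) * (v[b] - mu[b]) for v in vectors) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    return mu, sigma
```

Swapping in an embedding extractor would only change how each per-record vector is produced; the (μ, Σ) output shape stays the same.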
Decoupling Design
The structural separation of L4 establishes a decoupled registration mechanism. The TDSR architecture treats Φ and V as pluggable modules. The control plane orchestrates routing, consensus, and UDRI resolution without requiring awareness of the specific mathematical implementations of Φ or V [48]. This protocol decoupling preserves system-level security and cross-jurisdictional standardisation regardless of which feature extraction method is chosen [33].
To ensure architectural modularity, we define the verifiable data asset interface as a functional contract I_VDA = ⟨Φ, V⟩ that abstracts the interaction between the data plane and the control plane:
  • Extraction interface Φ: A transformation function Φ: D_raw → Ω that compresses a high-dimensional dataset D into a constant-size fingerprint Ω. This interface ensures that only privacy-preserving metadata is exported.
  • Verification interface V: An evaluation function V: (d, Ω) → {0, 1} that consumes a subset d and the anchored fingerprint Ω to output a Boolean provenance decision.
By formalising this interface, TDSR decouples the specific feature engineering logic from the network’s consensus and routing protocols, allowing for independent algorithmic upgrades.
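The contract I_VDA can be expressed as a structural interface, sketched below in Python under our own naming (`VerifiableDataAsset`, `extract`, `verify`, and `conforms_to_contract` are hypothetical). The second function illustrates the only check the control plane needs to make: that Ω has the shape ℝ^k × ℝ^{k×k} with a symmetric covariance block, irrespective of which algorithm produced it.

```python
from typing import Protocol, Sequence

# Omega = (mu_D, Sigma_D): a k-vector and a k x k matrix.
Manifold = tuple[list[float], list[list[float]]]


class VerifiableDataAsset(Protocol):
    """Hypothetical rendering of I_VDA = <Phi, V>; any extractor/verifier pair can implement it."""

    def extract(self, dataset: Sequence[bytes]) -> Manifold:
        """Phi: D_raw -> Omega."""
        ...

    def verify(self, subset: Sequence[bytes], omega: Manifold) -> int:
        """V: (d, Omega) -> {0, 1}."""
        ...


def conforms_to_contract(omega: Manifold, k: int, tol: float = 1e-9) -> bool:
    """Shape and symmetry check; needs no knowledge of how Phi computed Omega."""
    mu, sigma = omega
    if len(mu) != k or len(sigma) != k or any(len(row) != k for row in sigma):
        return False
    return all(abs(sigma[i][j] - sigma[j][i]) <= tol for i in range(k) for j in range(k))
```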

4.2.5. Stratification Principles and Isolation Boundaries

This stratification is a structural necessity enforced by three constraints in cross-jurisdictional data circulation: data minimisation mandates, multi-domain trust mapping, and consensus ledger scalability [12]. These constraints define three isolation boundaries as follows:
  • Privacy–legal boundary (L4 to L3): Data protection mandates prohibit raw asset transmission [40]. L4 restricts network inputs to fingerprints. Raw data remains within the local jurisdiction.
  • Intent–routing boundary (L3 to L2): Identity resolution is decoupled from local computation [29]. L2 manages cross-domain routing. L3 executes the logic within isolated sandboxes.
  • Storage–consensus boundary (L2/L3 to L1): High-volume payloads are separated from state ordering [48]. L1 implements a dual-plane topology. Asynchronous datastores are isolated from synchronous consensus.

4.3. UDRI Identification and Protocol Stack Mapping

The UDRI functions as a cryptographic pointer that encapsulates the physical resolution path, the legal jurisdiction, the node identity, and the data ownership asset into a single string. We define the formal syntax as follows:
udri:<Infra_ID>:<Domain_ID>:<Node_ID>:<Cont_ID>
The interoperability middleware parses this UDRI to execute cross-domain discovery. The identifier resolves the logical node identity via the BFT ledger and locates the physical data payload via the IPFS DHT [34].
This is an example UDRI string: udri:gba:hk:did:0x1a2b3c:Qmd…. Here, gba identifies the Greater Bay Area consortium infrastructure, hk specifies the legal jurisdiction, did:0x1a2b3c resolves the registrar node via the BFT ledger, and Qmd… is the Content Identifier for the fingerprint payload in the IPFS DHT. This structure enables oblivious routing because the verifier only performs a DHT lookup on the cryptographic hash of the Cont_ID component without revealing its intent or source IP address.
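Parsing this string involves one subtlety: the <Node_ID> segment is itself a DID (did:0x1a2b3c) and contains a colon, so a naive split on ':' yields six fields rather than four. A minimal sketch, assuming that <Infra_ID>, <Domain_ID>, and <Cont_ID> never contain colons (true of the example, and of IPFS CIDs), peels the fixed segments off both ends. The names are hypothetical, and `QmExampleCid` is a placeholder, not a real CID.

```python
from typing import NamedTuple


class UDRI(NamedTuple):
    infra_id: str
    domain_id: str
    node_id: str
    cont_id: str


def parse_udri(text: str) -> UDRI:
    """Split udri:<Infra_ID>:<Domain_ID>:<Node_ID>:<Cont_ID>.

    <Node_ID> is a DID that may contain colons, so take the first two segments from
    the left and the last segment from the right; everything in between is Node_ID.
    """
    scheme, sep, rest = text.partition(":")
    if scheme != "udri" or not sep:
        raise ValueError("not a UDRI")
    infra_id, domain_id, remainder = rest.split(":", 2)  # ValueError if too few segments
    node_id, _, cont_id = remainder.rpartition(":")
    if not (infra_id and domain_id and node_id and cont_id):
        raise ValueError("missing UDRI segment")
    return UDRI(infra_id, domain_id, node_id, cont_id)


def format_udri(u: UDRI) -> str:
    """Inverse of parse_udri."""
    return ":".join(("udri",) + tuple(u))
```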
Governance of top-level identifiers: In the current GBA deployment, < Infra _ ID > and < Domain _ ID > are established through consortium governance: participating jurisdictions negotiate and sign a genesis configuration block that enumerates valid infrastructures and legal domains. Uniqueness is enforced by the BFT trust plane, which rejects duplicate registrations. For inter-consortium resolution across independent TDSR instances, a federation of registrars or existing international legal frameworks would serve as trust anchors, though full decentralisation of this top-level governance remains an open problem.

4.3.1. Infrastructure Identifier (<Infra_ID>)

The <Infra_ID> targets the specific BFT consortium network and IPFS swarm. It directs the middleware gRPC calls to the correct trust plane endpoint when multiple independent consortia operate within the same wide-area network.

4.3.2. Domain Identifier (<Domain_ID>)

The <Domain_ID> specifies the legal jurisdiction of the host node, for example, hk for Hong Kong or gd for Guangdong. It allows the verifier to audit the compliance boundary before initiating cross-border payload retrieval.

4.3.3. Node Identifier (<Node_ID>)

The <Node_ID> maps to the DID of the host node. The BFT ledger resolves this segment to retrieve the DID document and the associated VC to authenticate the node’s regulatory status.

4.3.4. Content Identifier (<Cont_ID>)

The <Cont_ID> corresponds to the hash identifier of the registered data asset. The middleware passes this segment to the data plane to initiate the oblivious DHT resolution for the fingerprint payload.

4.3.5. The Four-Layer Architecture Protocol Stack

Each layer executes defined protocols and processes Protocol Data Units. The UDRI string functions as the cross-layer binding agent [10,11]. Table 2 details the executed protocols at each operational layer and maps them to their corresponding UDRI resolution segments.
The protocol stack dictates a data processing sequence. During registration, L4 protocols generate the payload, L3 protocols coordinate the anchoring state, and L1 protocols execute the physical distribution. During verification, L1 retrieves the network state, L2 protocols authenticate the cross-domain credentials, and L3 protocols execute the hypothesis test. Each layer resolves a specific segment of the UDRI 4-tuple [34].
  • L1: Infrastructure layer: Resolves the <Infra_ID>. It manages physical connections and separates traffic into the trust and data planes [49].
  • L2: Cross-domain interoperability layer: Resolves the <Domain_ID>. It implements out-of-band identity discovery and executes decentralised credential authentication [33].
  • L3: Registration node layer: Resolves the <Node_ID>. It manages local registries and coordinates the two-phase commit sequence for asset anchoring [38].
  • L4: Feature extraction interface: Resolves the <Cont_ID>. It interacts with raw data to generate fingerprints F_D(t) [7].
Traditional network identifiers do not capture the dual-plane semantics of a distributed data space. Building upon the paradigm of Named Data Networking where data is addressed by secure name rather than host location [11], we define a custom protocol identifier, the UDRI, to bridge the decoupled data and trust mechanisms. Unlike DIDs or Uniform Resource Identifiers, which embed or resolve full payload locations and often require in-band data transfer, UDRI functions as an oblivious cryptographic pointer that resolves only the statistical fingerprint Ω via out-of-band DHT lookup. This design decouples metadata resolution from payload transport, achieving constant-bounded leakage and enabling cross-jurisdictional verification without exposing raw data or intent. This approach is further supported by recent hierarchical blockchain frameworks for node authentication in IoT networks [50], which provide complementary mechanisms for decentralised identity management and Sybil-resistant verification.

4.3.6. UDRI Lifecycle Management

The UDRI lifecycle governs the cryptographic binding across the four functional segments: <Infra_ID>, <Domain_ID>, <Node_ID>, and <Cont_ID>. Management operations are synchronised through the trust plane to ensure cross-jurisdictional consistency [21,36].
Initialisation and registration: During the prefix phase, the <Infra_ID> and <Domain_ID> are established through consortium governance protocols, anchoring the infrastructure type and jurisdictional boundary. The <Node_ID> is bound to a provider’s identity via a BFT-backed registration transaction. The <Cont_ID> is generated upon the successful pinning of the data payload on the IPFS plane, completing the four-segment mapping.
Revocation and suspension: To invalidate a data asset, the provider issues a revocation certificate to the trust plane. This operation marks the <Cont_ID> as ‘inactive’ within the global state trie. While the fingerprint payload may persist in the IPFS DHT, the interoperability layer blocks all resolution requests for the associated UDRI, terminating the asset’s lifecycle for cross-domain verification.
Transferability and migration: Ownership migration involves re-binding a <Cont_ID> to a new <Node_ID> or <Domain_ID>. This is executed via a cross-shard transaction on the trust plane that updates the pointer reference without altering the underlying data fingerprint. For infrastructure migrations, the <Infra_ID> is updated, and a new UDRI is issued with a pointer to the historical version to maintain provenance continuity.
Termination and rollback: If a registration fails to reach consensus within the predefined window T_commit, the system triggers a recursive rollback. This initiates the Unpin command for the <Cont_ID> at the data plane and prunes the pending <Node_ID> association from the trust plane mempool, preventing the accumulation of orphaned identifiers.
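The lifecycle rules above amount to a small state machine, sketched below under assumptions of our own: the state and event names are hypothetical, infrastructure migration (which issues a fresh UDRI) is omitted, and only the Node_ID/Domain_ID re-binding path of migration is modelled. Resolution is permitted only in the ACTIVE state, mirroring how the interoperability layer blocks requests for revoked identifiers.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"          # broadcast, awaiting BFT finality
    ACTIVE = "active"            # anchored; resolvable across domains
    INACTIVE = "inactive"        # revoked; resolution blocked even if Omega persists
    ROLLED_BACK = "rolled_back"  # T_commit expired; Omega unpinned, pending entry pruned


# Allowed (state, event) -> next-state transitions (hypothetical encoding of the rules).
TRANSITIONS = {
    (State.PENDING, "finalise"): State.ACTIVE,
    (State.PENDING, "timeout"): State.ROLLED_BACK,
    (State.ACTIVE, "revoke"): State.INACTIVE,
    (State.ACTIVE, "migrate"): State.ACTIVE,
}


class UdriRecord:
    def __init__(self, node_id: str, domain_id: str, cont_id: str) -> None:
        self.node_id, self.domain_id, self.cont_id = node_id, domain_id, cont_id
        self.state = State.PENDING

    def apply(self, event: str, **rebind: str) -> None:
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"illegal event {event!r} in state {self.state.value}")
        if event == "migrate":
            # Only the pointer is re-bound; cont_id (the fingerprint hash) is unchanged.
            self.node_id = rebind.get("node_id", self.node_id)
            self.domain_id = rebind.get("domain_id", self.domain_id)
        self.state = TRANSITIONS[key]

    def resolvable(self) -> bool:
        return self.state is State.ACTIVE
```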

4.4. Dual-Plane Network Topology

The operational conflict between high-volume data transmission and low-latency state consensus necessitates the dual-plane network topology [21], extending the classical clean-slate separation of control and data planes to distributed data spaces [49]. Unified network architectures couple payload resolution with ledger synchronisation. This configuration degrades transaction throughput when consensus nodes process and replicate large feature matrices. To resolve this bottleneck, we partition the infrastructure layer into two isolated operational planes. The data plane manages the asynchronous distribution of variable-size payload objects. The trust plane executes the synchronous consensus of constant-size state anchors. This separation isolates the network traffic, prevents ledger saturation, and maintains block finality bounds during high-frequency data registration [24,51].
Practical integration with existing infrastructures is achieved using standard APIs. The data plane employs an unmodified IPFS DHT for storage and retrieval of statistical fingerprints, while the trust plane utilises any permissioned BFT blockchain, such as FISCO BCOS, for anchoring constant-size UDRIs. No custom modifications to the underlying IPFS or BFT systems are required; communication occurs through well-documented REST and gRPC interfaces.

4.4.1. Data Plane for Payload Resolution

The data plane executes the IPFS protocol stack, and its nodes communicate over the libp2p networking layer. The data plane routes fingerprint objects using DHT protocols [9,22] to provide oblivious state retrieval in O(log n) lookup steps, where n is the number of DHT nodes. This plane isolates data transmission from the physical IP layer using TLS 1.3 encapsulation [52], preventing in-band packet interception [25,46]. To mitigate the risk of data loss due to node churn, the architecture implements multi-node pinning: each fingerprint package Ω is distributed to r independent nodes within the DHT, ensuring availability through redundancy [9,20]. For reliable availability in a multi-jurisdictional setting, TDSR uses a redundancy factor of r = 3, placing copies of each Ω on geographically diverse storage nodes within the consortium. Garbage collection is governed by the UDRI lifecycle: payloads remain pinned as long as the corresponding UDRI is active on the trust plane, and revocation triggers a controlled unpin operation. Storage nodes are operated by consortium members, ensuring compliance with data localisation requirements.
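One way to realise the r = 3 pinning policy across geographically diverse nodes is sketched below. The placement rule is an assumption rather than the paper's: storage nodes are ranked by rendezvous (highest-random-weight) hashing on the Cont_ID, so every consortium member derives the same placement without coordination, and a first pass prefers nodes from distinct jurisdictions. The function name and the jurisdiction labels are hypothetical.

```python
import hashlib


def choose_pin_nodes(cont_id: str, nodes: dict[str, str], r: int = 3) -> list[str]:
    """Pick r storage nodes for one fingerprint payload.

    nodes maps a storage-node name to its jurisdiction label (e.g. "gd", "hk", "mo").
    """
    if r > len(nodes):
        raise ValueError("redundancy factor exceeds number of storage nodes")

    def score(node: str) -> int:
        # Rendezvous hashing: each (content, node) pair gets a pseudo-random weight.
        return int.from_bytes(hashlib.sha256(f"{cont_id}|{node}".encode()).digest(), "big")

    ranked = sorted(nodes, key=score, reverse=True)
    chosen: list[str] = []
    seen: set[str] = set()
    for node in ranked:  # first pass: at most one replica per jurisdiction
        if nodes[node] not in seen:
            chosen.append(node)
            seen.add(nodes[node])
            if len(chosen) == r:
                return chosen
    for node in ranked:  # second pass: fill remaining slots if jurisdictions run out
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == r:
                break
    return chosen
```

Because the ranking depends only on the Cont_ID and the node list, a verifier can recompute the replica set locally; under churn, the next-ranked nodes are the natural re-pinning targets.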

4.4.2. Trust Plane for State Consensus

The trust plane operates the BFT consensus ledger [51]. The nodes communicate via gRPC [53]. The trust plane manages state synchronisation across jurisdictional boundaries [16]. It provides the interface to query the DID and the anchored Content Identifiers without processing the underlying data payloads.
The two planes interact exclusively through the UDRI pointer. The trust plane anchors only the constant-size cryptographic pointer (256 bytes), while the data plane stores the full statistical fingerprint Ω via content-addressed DHT. Data integrity is maintained by anchoring the cryptographic hash of Φ ( D ) on the trust plane, ensuring that any modification to the payload in the data plane invalidates the anchored pointer.

4.5. Decoupled Storage and Consensus Mechanisms

The architectural division between the storage mechanisms and the consensus ledger addresses the operational limits of distributed protocols. BFT blockchains exhibit throughput degradation when nodes replicate high-volume feature matrices. DHTs accommodate variable-size payloads but lack the sequential state machines required to anchor ownership and prevent double-registration [7,15]. We partition the data structures to resolve this incompatibility. The system allocates the variable-size fingerprint matrices to the asynchronous datastore. It allocates the constant-size cryptographic pointers to the synchronous ledger. This decoupling restricts the consensus transaction size and sustains high registration throughput independent of the data dimensionality.

4.5.1. Data Payload Datastore

The data payload datastore operates on the IPFS swarm network [7]. It stores the fingerprint package Ω, which includes the statistical matrices, metadata, and provider signatures. The payload size scales with the feature dimensionality k: because Ω carries a k-dimensional mean vector and a k × k covariance matrix, it grows as O(k²). This datastore distributes objects via content addressing but does not enforce sequential ordering or global timestamping, so the data plane is treated as an untrusted environment. Payload integrity is guaranteed by anchoring Hash(Ω) on the trust plane during registration: the verifier recomputes the hash locally over the retrieved Ω and rejects mismatches. Invalid UDRI hashes injected into the DHT are harmless, because resolution triggers signature verification and unauthenticated payloads are discarded. To mitigate DHT pollution and node churn, the registration process pins Ω to r ≥ 3 independent storage nodes, ensuring availability even under 30% node departures.
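The verifier-side integrity check described above can be sketched in a few lines. We assume SHA-256 as the anchored Hash(Ω) (the paper does not fix the hash function) and model a polluted DHT as a list of candidate replicas; both function names are hypothetical.

```python
import hashlib
import hmac
from typing import Iterable, Optional


def payload_matches_anchor(retrieved_payload: bytes, anchored_hash_hex: str) -> bool:
    """Recompute Hash(Omega) locally and compare it with the hash anchored on the trust plane."""
    local = hashlib.sha256(retrieved_payload).hexdigest()
    # compare_digest runs in time independent of where the strings first differ.
    return hmac.compare_digest(local, anchored_hash_hex)


def first_authentic_replica(replicas: Iterable[bytes], anchored_hash_hex: str) -> Optional[bytes]:
    """Discard polluted or stale replicas; return the first one that matches the anchor."""
    for payload in replicas:
        if payload_matches_anchor(payload, anchored_hash_hex):
            return payload
    return None
```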

4.5.2. State Consensus Ledger

The state consensus ledger resides on the permissioned BFT blockchain [15]. It records the state transition transaction Tx. The transaction contains only the Node_ID, the 32-byte Cont_ID hash, the timestamp, and the signature. The ledger state size remains constant at 256 bytes per registration. This isolation prevents network saturation during high-frequency DNS data registration [43].
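The constant-size property of the anchored transaction can be illustrated with a fixed-width encoding. The field widths below (a 64-byte padded Node_ID, the 32-byte Cont_ID hash, an 8-byte millisecond timestamp, and a 64-byte signature) are illustrative assumptions. They total 168 bytes rather than the 256 bytes reported above and serve only to show that the record size does not depend on the feature dimensionality k.

```python
import struct

# Hypothetical layout, big-endian, no padding between fields:
# Node_ID (null-padded DID, 64 B) | Cont_ID hash (32 B) | t_reg in ms (8 B) | signature (64 B)
TX_FORMAT = ">64s32sQ64s"


def encode_tx(node_id: str, cont_id_hash: bytes, t_reg_ms: int, signature: bytes) -> bytes:
    """Pack the anchoring transaction into a fixed-width record."""
    node = node_id.encode()
    if len(node) > 64 or len(cont_id_hash) != 32 or len(signature) != 64:
        raise ValueError("field does not fit the fixed-width layout")
    return struct.pack(TX_FORMAT, node, cont_id_hash, t_reg_ms, signature)
```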

4.6. Two-Phase Data Asset Ownership Registration and Verification Algorithmic Execution

The preceding architectural components are synthesised into the unified TDSR framework. Figure 1 illustrates the overall system architecture, mapping the functional modules to their respective algorithmic executions across the four-layer protocol stack and the dual-plane infrastructure. This integrated representation confirms the structural alignment between the UDRI resolution path and the distributed registration and verification lifecycles.

4.6.1. Phase 1: Data Asset Registration

The registration phase enables the data provider N_A to assert ownership over the source asset D. The host node executes the payload construction in Algorithm 1 (Module 1) to decouple the physical resolution identity from the data contents [29,40]. The system then executes the two-phase commit in Algorithm 2 (Module 2) to anchor the payload [38,54]. Rollback functions trigger if T_elapsed ≥ T_commit or if network partitions prevent block finality, preventing unanchored data objects from saturating the local storage of node N_A [46].
Algorithm 1 Fingerprint payload encapsulation
Require: Raw data asset D at time t, metadata M, provider secret key sk_P
Ensure: Fingerprint payload Ω, Content Identifier Cont_ID
  1: (μ_D(t), Σ_D(t)) ← FeatureExtraction(D)
  2: String_sig ← H(μ_D(t) ‖ Σ_D(t) ‖ M)
  3: σ_P ← Sign_{sk_P}(String_sig)
  4: Ω ← (μ_D(t), Σ_D(t)) ‖ M ‖ σ_P
  5: Cont_ID ← H(Ω) {Compute IPFS object hash for UDRI segment}
  6: return Ω, Cont_ID
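Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: canonical JSON serialisation stands in for the concatenation operator ‖, SHA-256 for H, a hex SHA-256 digest for the IPFS CID, and HMAC-SHA256 as a symmetric stand-in for Sign_skP. The function name `encapsulate_fingerprint` is hypothetical, and line 1 (feature extraction) is assumed to have already produced μ_D(t) and Σ_D(t).

```python
import hashlib
import hmac
import json


def _canonical(obj: dict) -> bytes:
    """Deterministic serialisation so that every party hashes identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()


def encapsulate_fingerprint(mu: list, cov: list, metadata: dict, provider_key: bytes):
    """Algorithm 1, lines 2-6, given (mu_D(t), Sigma_D(t)) from line 1."""
    body = {"mu": mu, "sigma": cov, "meta": metadata}
    string_sig = hashlib.sha256(_canonical(body)).digest()                      # line 2
    sigma_p = hmac.new(provider_key, string_sig, hashlib.sha256).hexdigest()   # line 3 (stand-in)
    omega = _canonical({**body, "sig": sigma_p})                                # line 4
    cont_id = hashlib.sha256(omega).hexdigest()                                 # line 5
    return omega, cont_id                                                       # line 6
```

Because HMAC is symmetric, a real deployment would substitute an asymmetric signature so that verifiers can check σ_P with the provider's public key.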
The system executes a simplified two-phase commit protocol, Algorithm 2, that focuses on the Prepare and Commit phases while omitting the full voting phase to reduce cross-jurisdictional latency. Its rollback functions, Unpin and Delete, implement the timeout and partition handling described above. The term “oblivious” refers to the fact that neither the data provider nor any intermediate network node learns the verifier’s specific data subset d during registration or verification; only the constant-size statistical fingerprint Ω and the UDRI pointer are exchanged across jurisdictional boundaries.
Algorithm 2 BFT state anchoring via two-phase commit
Require: Payload Ω, Cont_ID, Node_ID, Domain_ID, Infra_ID, node secret key sk_i, timeout limit T_commit
Ensure: UDRI or error state
  1: LocalDatastore.Add(Ω)
  2: LocalDatastore.Pin(Cont_ID)
  3: t_reg ← GetCurrentTimestamp()
  4: σ_Ni ← Sign_{sk_i}(H(Node_ID ‖ Cont_ID ‖ t_reg))
  5: Tx ← (Node_ID, Cont_ID, t_reg, σ_Ni)
  6: BFT_Network.Broadcast(Tx)
  7: Timer.Start(); T_elapsed ← 0
  8: while T_elapsed < T_commit do
  9:    if BFT_Network.CheckFinality(Tx) = True then
 10:       IPFS_DHT.BroadcastProviderRecord(Cont_ID)
 11:       UDRI ← FormatUDRI(Infra_ID, Domain_ID, Node_ID, Cont_ID)
 12:       return UDRI
 13:    end if
 14:    T_elapsed ← Timer.Value() {refresh the elapsed time on every iteration}
 15: end while
 16: LocalDatastore.Unpin(Cont_ID)
 17: LocalDatastore.Delete(Ω)
 18: return Error: State Anchoring Failed
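The timeout-and-rollback control flow of Algorithm 2 can be sketched as follows. Everything here is a hypothetical stand-in: a dict and a set play the local datastore and its pin set, `check_finality` is a callback in place of BFT_Network.CheckFinality, and signing and BFT messaging are not implemented. The point of the sketch is that the elapsed time is re-read inside the loop, so a transaction that never finalises triggers Unpin and Delete after T_commit instead of blocking indefinitely.

```python
import time
from typing import Callable, Optional


def anchor_with_rollback(store: dict, pinned: set, omega: bytes,
                         infra_id: str, domain_id: str, node_id: str, cont_id: str,
                         check_finality: Callable[[], bool], t_commit: float,
                         poll_interval: float = 0.01) -> Optional[str]:
    """Return the UDRI once the anchor is final, or None after rolling back."""
    store[cont_id] = omega                      # LocalDatastore.Add(Omega)
    pinned.add(cont_id)                         # LocalDatastore.Pin(Cont_ID)
    # Signing Tx and BFT_Network.Broadcast(Tx) are omitted from this sketch.
    start = time.monotonic()
    while time.monotonic() - start < t_commit:  # T_elapsed refreshed on every iteration
        if check_finality():
            return f"udri:{infra_id}:{domain_id}:{node_id}:{cont_id}"  # FormatUDRI
        time.sleep(poll_interval)
    pinned.discard(cont_id)                     # rollback: Unpin(Cont_ID)
    store.pop(cont_id, None)                    # rollback: Delete(Omega)
    return None
```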

4.6.2. Phase 2: Cross-Domain Verification

The verification phase enables the verifier node N_B to prove that a local data subset d originates from the registered asset D. The protocol enforces verification without exposing the verifier’s intent to N_A [13]. The system first executes the identity verification in Algorithm 3, which corresponds to Module 3, to validate node compliance attributes across jurisdictional boundaries [17,34]; this replaces the central CA [6,37]. Upon successful authentication, the Data User executes the cross-domain verification in Algorithm 4, which corresponds to Module 4 [43]. The algorithm integrates UDRI parsing, ODAR retrieval, and local sandbox execution, and it bounds trans-border leakage to O(1) [14,28].
The current implementation employs the Mahalanobis distance with a χ² threshold, which assumes that the aligned latent representation ψ(D) approximates a multivariate normal distribution. This is a reasonable approximation for frequency-based features derived from DNS logs but may not generalise to heavily skewed, long-tailed, or unstructured data such as medical images. Owing to the algorithm-agnostic interface, advanced alternatives, including non-parametric kernel density estimators or learned similarity metrics, can be substituted within the sandbox without modifying the consensus or routing layers.
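The normality assumption matters because it is what calibrates the χ² threshold. The standard argument, in the notation used in Algorithm 4 and assuming Σ_D is positive definite, is:

```latex
\begin{aligned}
z &= \Sigma_D^{-1/2}\,(x-\mu_D) \sim \mathcal{N}(0, I_k)
  \quad\text{when } x \sim \mathcal{N}(\mu_D, \Sigma_D),\ \Sigma_D \succ 0,\\
D_M^2(x) &= (x-\mu_D)^{\top}\Sigma_D^{-1}(x-\mu_D) = z^{\top}z = \sum_{i=1}^{k} z_i^2 \sim \chi^2_k,\\
\Pr\!\left[D_M^2(x) > \chi^2_{k,1-\alpha}\right] &= \alpha .
\end{aligned}
```

For the testbed's k = 16 and an assumed α = 0.05, τ = χ²_{16,0.95} ≈ 26.30. When the features are not Gaussian, the realised false-rejection rate can differ from α, which motivates the non-parametric alternatives mentioned above.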
Algorithm 3 Decentralised credential authentication
Require: Verifiable Credential VC_i = (Node_ID, A, σ_I), node public key pk_i, issuer public key pk_I
Ensure: Boolean authentication result
  1: Hash_pk ← H(pk_i)
  2: Suffix_Node ← ExtractSuffix(Node_ID)
  3: if Hash_pk ≠ Suffix_Node then
  4:    return False {Cryptographically Generated Identifier mismatch}
  5: end if
  6: Hash_VC ← H(Node_ID ‖ A)
  7: IsValid ← Verify_{pk_I}(σ_I, Hash_VC)
  8: if IsValid = False OR CheckRevocationRegistry(VC_i) = Revoked then
  9:    return False
 10: end if
 11: return True
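Algorithm 3 reduces to two independent checks, which the following sketch makes concrete. The encodings are assumptions, not taken from the paper: the Cryptographically Generated Identifier suffix is taken to be the last 16 hex characters of Node_ID and must equal the first 16 hex characters of SHA-256(pk_i); the credential digest hashes Node_ID and A joined by '|'; the issuer's signature scheme is abstracted behind a `verify_issuer` callable; and the revocation registry is modelled as a set of revoked Node_IDs.

```python
import hashlib
from typing import Callable


def authenticate_node(node_id: str, attributes: str, issuer_sig: bytes, node_pk: bytes,
                      verify_issuer: Callable[[bytes, bytes], bool],
                      revoked: set[str], suffix_len: int = 16) -> bool:
    """Sketch of Algorithm 3 under the encoding assumptions stated above."""
    hash_pk = hashlib.sha256(node_pk).hexdigest()[:suffix_len]              # line 1
    suffix_node = node_id[-suffix_len:]                                      # line 2
    if hash_pk != suffix_node:                                               # lines 3-5
        return False  # Cryptographically Generated Identifier mismatch
    hash_vc = hashlib.sha256(f"{node_id}|{attributes}".encode()).digest()    # line 6
    is_valid = verify_issuer(issuer_sig, hash_vc)                            # line 7
    if not is_valid or node_id in revoked:                                   # lines 8-10
        return False
    return True                                                              # line 11
```

Injecting the verifier as a callable keeps the sketch independent of any particular signature library; any toy verifier used for testing it is not cryptographically secure.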
Algorithm 4 Cross-domain statistical verification in isolated sandbox
Require: Target UDRI, local data subset d, significance level α, degrees of freedom k
Ensure: Boolean decision 𝒟(d, D)
  1: (Infra_ID, Domain_ID, Node_ID, Cont_ID) ← ParseUDRI(UDRI)
  2: (VC_i, pk_i) ← BFT_Network.RetrieveState(Node_ID)
  3: if Algorithm 3(VC_i, pk_i, pk_I) = False then
  4:    return Error: Compliance Authentication Failed
  5: end if
  6: Ω ← IPFS_DHT.ObliviousLookup(Cont_ID)
  7: if LookupTimeout() then
  8:    return Error: Payload Discovery Timeout
  9: end if
 10: (μ_D(t), Σ_D(t), σ_P) ← ParsePayload(Ω)
 11: Sandbox.AllocateMemory()
 12: x ← Sandbox.ExtractFeatures(d)
 13: D_M²(x) ← (x − μ_D(t))ᵀ Σ_D(t)⁻¹ (x − μ_D(t))
 14: τ ← χ²_{k,1−α}
 15: if D_M²(x) ≤ τ then
 16:    𝒟(d, D) ← 1
 17: else
 18:    𝒟(d, D) ← 0
 19: end if
 20: Sandbox.Purge(d, x, μ_D(t), Σ_D(t))
 21: return 𝒟(d, D)
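The statistical core of Algorithm 4 (lines 13 to 19) can be sketched with the Python standard library alone. The helper names are hypothetical, and two implementation choices are ours: the Mahalanobis distance solves Σ_D y = x − μ_D by Gaussian elimination rather than forming Σ_D⁻¹ explicitly, and the quantile χ²_{k,1−α} is obtained by bisection on the χ² CDF, computed as the regularised lower incomplete gamma function P(k/2, x/2) via its power series. In practice a library routine such as `scipy.stats.chi2.ppf` would replace the hand-written quantile.

```python
import math


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        y[r] = (m[r][n] - sum(m[r][c] * y[c] for c in range(r + 1, n))) / m[r][r]
    return y


def chi2_cdf(x: float, k: int) -> float:
    """P(k/2, x/2) = z^a e^-z / Gamma(a) * sum_n z^n / (a (a+1) ... (a+n))."""
    if x <= 0:
        return 0.0
    a, z = k / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return math.exp(a * math.log(z) - z - math.lgamma(a)) * total


def chi2_ppf(p: float, k: int) -> float:
    """Quantile chi^2_{k,p} by bisection on the CDF."""
    lo, hi = 0.0, max(1.0, 10.0 * k)
    while chi2_cdf(hi, k) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def statistical_decision(x: list[float], mu: list[float], cov: list[list[float]],
                         alpha: float = 0.05) -> int:
    """Algorithm 4, lines 13-19: return 1 iff D_M^2(x) <= chi^2_{k,1-alpha}."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    d2 = sum(di * yi for di, yi in zip(diff, solve(cov, diff)))
    return 1 if d2 <= chi2_ppf(1 - alpha, len(mu)) else 0
```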

4.7. End-to-End Operational Workflow

The TDSR operational lifecycle can be abstracted as a sequential state transition from local raw data to a verified global claim. This unified workflow is expressed as follows:
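$$\text{Data} \xrightarrow{\;\Phi\;} \text{Fingerprint}(\Omega) \xrightarrow{\;\text{2PC}\;} \text{Registration}(P_{\text{trust}}) \xrightarrow{\;\text{UDRI}\;} \text{Query} \xrightarrow{\;V\;} \text{Verification}$$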
The system exposes a unified interface Φ ( D ) for fingerprint extraction and V ( d , Φ ( D ) ) for verification, enabling decoupled and privacy-preserving cross-domain validation.
The process begins with the provider extracting a statistical manifold from raw logs. The hash of this fingerprint is anchored to the trust plane via a two-phase commit while the fingerprint itself is pinned on the data plane, establishing an immutable global state. A verifier uses the UDRI pointer to resolve the asset’s location and executes the verification interface within an isolated sandbox, completing the provenance check without raw data exposure.
The interaction between the layers and the defined interfaces is operationalised through a two-phase workflow, illustrated in the sequence diagram in Figure 2, which maps the execution sequence of the defined algorithms across the operational network layers. This section details the end-to-end processes for asset registration and cross-domain verification, which orchestrate the four core algorithms and keep data sovereignty intact during transnational circulation.

4.7.1. Distributed Data Registration

The registration pipeline begins in the data plane of the origin domain and comprises local extraction and global consensus. First, the data owner invokes the feature extraction interface to execute Algorithm 1, which extracts the divisible fingerprint Φ(D) and computes its Content Identifier; the full UDRI is minted once the anchor reaches finality. This step ensures that the raw dataset never leaves the local jurisdiction.
Following the local preparation, the service layer broadcasts the registration request to the control plane. To ensure ledger consistency across multi-jurisdictional nodes without recording the raw data, the architecture invokes the two-phase commit mechanism governed by Algorithm 2. Upon successful consensus, the cryptographic hash of Φ(D) and the UDRI locator are appended to the immutable ledger [15], completing the registration under a constant information leakage bound.

4.7.2. Cross-Domain Query and Verification

The verification lifecycle is triggered when an authorised consumer in a different jurisdiction initiates a query, progressing through distributed routing and isolated computation. The consumer submits the target UDRI to their local application layer. The system first parses the identifier, resolves the intent–routing boundary, and directs the request to the specific origin domain [22], then executes Algorithm 3 to authenticate the origin node’s credentials.
Once the request has been routed, a secure sandboxed execution environment is instantiated in the origin registration node. The consumer provides the questioned data subset d. The system then executes Algorithm 4, which performs the cross-domain statistical verification V(d, Φ(D)): it compares the Mahalanobis distance of the subset’s features with the χ² acceptance threshold derived from the registered fingerprint and outputs a binary verification result. By isolating the computation, this integrated pipeline establishes trust without raw data disclosure [40].

5. Results and Evaluation

We deployed a distributed testbed across the Guangdong-Hong Kong-Macao Greater Bay Area (GBA) to evaluate the TDSR architecture. The evaluation quantifies the operational costs and structural boundaries introduced by the ODAR protocol and the UDRI identifiers, focusing on execution latency, dimensionality invariance, continuous overhead, and system resilience. We implemented the Mahalanobis distance and χ² test as baseline proxies to validate end-to-end connectivity and the O(1) communication bound. We do not evaluate the accuracy of any particular feature extraction algorithm or its behaviour on specific data distributions. While the GBA testbed provides a proof-of-concept validation using real-world DNS recursive logs, the architecture is designed to be data-agnostic at Layer 4; for structured time-series data such as medical telemetry or financial transaction records, the same feature extraction and statistical provenance mechanisms can in principle be applied.

5.1. Experimental Setup and Baseline Models

The testbed operates across regional data centres over a wide area network (WAN), with baseline round-trip times between 15 and 25 ms. Virtual machines host all protocol layers, running Ubuntu 22.04 LTS with Docker containerisation.
The trust plane comprises 64 BFT validator nodes, each with a hardware allocation of eight vCPUs, 16 GB RAM, and 500 GB NVMe storage. The data plane comprises 12 IPFS storage nodes with a replication factor of 3, each with a hardware allocation of 16 vCPUs, 64 GB RAM, and 4 TB NVMe storage.
The testbed ingests real-world DNS recursive logs from a cooperative provider in the GBA region. The dataset consists of approximately 2.11 million DNS resolution requests per day, with each log entry containing a timestamp, anonymised source IP, queried domain, record type, and response code. Feature extraction at Layer 4 employs a frequency-based approach: each 24 h window is partitioned into 5 min bins, and the empirical distribution of query types (A, AAAA, MX, etc.), response codes, and top-level domain categories is aggregated into a k = 16-dimensional feature vector. The fingerprint Φ(D) then computes the mean vector μ_D and the 16 × 16 covariance matrix Σ_D over the complete observation period, producing a constant-size statistical manifold Ω of approximately 4 KB. All performance metrics include 95% confidence intervals (CI) derived from 1000 independent query iterations. Wilcoxon rank-sum tests validate the performance divergence between TDSR and the baselines, with significance established at p < 0.01.
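The Layer 4 binning step described above can be sketched as follows. The split of the k = 16 dimensions into 6 query types, 4 response codes, and 6 top-level-domain categories is a hypothetical choice, since the paper does not enumerate its categories; the 5 min bin width and the per-bin empirical distributions follow the text. A 24 h window yields 24 × 60 / 5 = 288 bins, and the resulting Ω holds 16 + 16 × 16 = 272 values (μ_D and Σ_D) computed over these bin vectors.

```python
from collections import defaultdict

# Hypothetical 6 + 4 + 6 = 16 category layout; the paper does not enumerate its categories.
QTYPES = ["A", "AAAA", "MX", "CNAME", "TXT", "OTHER"]
RCODES = ["NOERROR", "NXDOMAIN", "SERVFAIL", "OTHER"]
TLDS = ["com", "net", "org", "cn", "hk", "OTHER"]
K = len(QTYPES) + len(RCODES) + len(TLDS)  # 16
BIN_SECONDS = 5 * 60                        # 5 min bins, 288 per 24 h window


def _slot(value: str, vocab: list[str]) -> int:
    """Index of value in vocab, falling back to the OTHER bucket."""
    return vocab.index(value) if value in vocab else vocab.index("OTHER")


def bin_vectors(records) -> list[list[float]]:
    """records: iterable of (unix_ts, qtype, rcode, domain).

    Returns one K-vector per non-empty 5 min bin, in time order; each of the three
    groups holds that bin's empirical distribution, so each group sums to 1.
    """
    counts = defaultdict(lambda: [0] * K)
    for ts, qtype, rcode, domain in records:
        v = counts[int(ts) // BIN_SECONDS]
        tld = domain.rstrip(".").rsplit(".", 1)[-1].lower()
        v[_slot(qtype, QTYPES)] += 1
        v[len(QTYPES) + _slot(rcode, RCODES)] += 1
        v[len(QTYPES) + len(RCODES) + _slot(tld, TLDS)] += 1
    vectors = []
    for b in sorted(counts):
        v = counts[b]
        n = sum(v[:len(QTYPES)])  # each record adds exactly one count per group
        vectors.append([c / n for c in v])
    return vectors
```

μ_D and Σ_D are then the sample mean and covariance of these bin vectors over the observation period.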
We employ two models as baselines:
  • Coupled ledger as Baseline A: A standard blockchain architecture lacking the storage–consensus boundary. Nodes embed the complete feature template F_D(t) within the BFT consensus transactions.
  • Federated in-band routing as Baseline B: A federated architecture lacking the intent–routing boundary. Nodes route verification queries to target IP addresses and authenticate via centralised Public Key Infrastructure (PKI).
Baseline A is implemented as a standard permissioned BFT blockchain on FISCO BCOS, in which complete feature matrices Φ(D) are embedded in consensus transactions. Baseline B is realised as a federated architecture with centralised PKI gateways that route full payloads in-band.
The dual-plane decoupling introduces a modest latency overhead of 280 ms for DHT lookup and 150 ms for BFT anchoring compared with tightly coupled baselines. This trade-off is deliberate: it eliminates O(n²·|D|) communication complexity and ensures constant-bounded trans-border leakage, making TDSR particularly suitable for ownership audits and compliance verification rather than real-time streaming applications. Deployment uses standard gRPC and REST APIs with no custom modifications to the underlying IPFS or BFT systems, enabling straightforward integration with existing infrastructures.

5.2. End-to-End Execution Latency

We quantify the execution latency across the cross-domain verification lifecycle. In this phase, the verifier node resolves the UDRI pointer and authenticates a queried dataset across jurisdictional boundaries. Consistent with the design positioning discussed above, the end-to-end latency remains acceptable for audit and compliance scenarios, with a median of 3.82 s across regional paths in the GBA testbed.
Figure 3a presents the cumulative distribution function (CDF) of this execution. Total cross-border verification latency remains bounded within 4.35 s at the 95th percentile, with a median response time of 3.82 s. Decomposing this median latency across the protocol stack reveals the operational costs of the decoupling mechanisms:
  • The L1 data plane executes the oblivious DHT lookup to retrieve the payload in 2.85 s.
  • The L2 layer performs the out-of-band credential authentication in 0.45 s.
  • The L3 layer allocates the isolated sandbox and computes the statistical verification in 0.52 s.
These latencies are the price of eliminating point-to-point network connections and hiding resolution intents.
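The latency arithmetic can be reproduced in a few lines. This sketch uses the nearest-rank percentile definition. That is an assumption, because the text does not state which estimator produced the reported median and P95. The sketch only checks that the reported per-layer medians add up to the end-to-end median. In general the sum of per-stage medians need not equal the median of end-to-end latencies, so the decomposition should be read as indicative.

```python
import math
from typing import Dict, Sequence

# Median per-layer latencies reported for the GBA testbed, in seconds.
STAGE_MEDIANS: Dict[str, float] = {
    "L1 oblivious DHT lookup": 2.85,
    "L2 out-of-band credential authentication": 0.45,
    "L3 sandbox allocation and statistical verification": 0.52,
}


def percentile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for 0 < q <= 100."""
    if not samples or not 0 < q <= 100:
        raise ValueError("need samples and 0 < q <= 100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def decomposed_total(stages: Dict[str, float]) -> float:
    """Sum of per-stage latencies, rounded to the reported precision (10 ms)."""
    return round(sum(stages.values()), 2)


if __name__ == "__main__":
    print(decomposed_total(STAGE_MEDIANS))  # 3.82
```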
Figure 3b–d compare TDSR with the baselines across regional routes. Baseline A exceeds 12.0 s in all instances because the coupled ledger mechanism collapses under high-dimensional payload replication. Baseline B maintains a P95 latency between 5.06 s and 5.42 s, whereas TDSR keeps P95 latency between 4.24 s and 4.61 s. The dual-plane topology isolates consensus overhead and the ODAR mechanism enforces data minimisation, so latency remains stable regardless of the regional routing path.
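The Wilcoxon rank-sum comparison between TDSR and a baseline can be sketched with the standard library alone. This illustrative version uses the large-sample normal approximation and average ranks for ties. It does not apply the tie-corrected variance. The latency samples in the usage block are hypothetical, not testbed measurements. In practice an established implementation such as `scipy.stats.ranksums` would normally be preferred.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation). Returns (z, p)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                          # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))         # equals 2 * (1 - Phi(|z|))
    return z, p


if __name__ == "__main__":
    tdsr = [4.2 + 0.01 * i for i in range(30)]        # hypothetical latencies (s)
    baseline_b = [5.1 + 0.01 * i for i in range(30)]  # hypothetical latencies (s)
    z, p = rank_sum_test(tdsr, baseline_b)
    print(f"z = {z:.2f}, p = {p:.2e}, significant at 0.01: {p < 0.01}")
```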

5.3. Payload Dimensionality Invariance

To validate the isolation efficacy of the storage–consensus boundary, we executed a dimensionality stress test. The design requires that throughput remain independent of data dimensionality. We increased the size of the feature matrix extracted at the L4 interface from 256 bytes to 50 megabytes.
Figure 4 presents the transaction throughput response. Baseline A's throughput degrades to near zero as consensus nodes replicate ever larger payloads across the synchronous network. TDSR sustains a constant throughput of 2400 transactions per second (TPS) across all payload dimensions. Because only uniform 256-byte UDRI pointers are anchored on the trust plane, the state machine is isolated from the variable-size data objects, confirming the $O(1)$ throughput invariance.
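The throughput invariance rests on the trust-plane record having a fixed size, whatever the payload it points to. The sketch below illustrates that property. Its field layout is entirely hypothetical: the text states only that UDRI pointers are 256 bytes. The layout holds a SHA-256 digest of the payload, the payload length, a timestamp and the UDRI string, zero-padded to 256 bytes. The identifier `"example-udri-0001"` is a placeholder, not real UDRI syntax.

```python
import hashlib
import struct

ANCHOR_BYTES = 256                  # uniform UDRI pointer size on the trust plane
HEADER = struct.Struct(">32sQQH")   # digest, payload length, timestamp, UDRI length (50 bytes)


def make_anchor(udri: str, payload: bytes, timestamp: int) -> bytes:
    """Build a fixed-size anchor record (hypothetical layout), zero-padded to 256 bytes."""
    udri_bytes = udri.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    header = HEADER.pack(digest, len(payload), timestamp, len(udri_bytes))
    if len(header) + len(udri_bytes) > ANCHOR_BYTES:
        raise ValueError("UDRI too long for a fixed-size anchor")
    return (header + udri_bytes).ljust(ANCHOR_BYTES, b"\x00")


def verify_anchor(anchor: bytes, payload: bytes) -> bool:
    """Check that a payload fetched from the data plane matches its anchored digest."""
    if len(anchor) != ANCHOR_BYTES:
        return False
    digest, length, _ts, _udri_len = HEADER.unpack_from(anchor)
    return digest == hashlib.sha256(payload).digest() and length == len(payload)


if __name__ == "__main__":
    for size in (256, 1_000_000, 50_000_000):
        payload = bytes(size)
        anchor = make_anchor("example-udri-0001", payload, 1_700_000_000)
        print(size, len(anchor), verify_anchor(anchor, payload))
```

The anchor stays at 256 bytes from a 256-byte payload up to a 50 MB one. The payload itself never enters the consensus path; only its digest does.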
Although the physical testbed comprises 64 BFT validator nodes, the decoupled topology is designed for much larger deployments. Variable-size multidimensional payloads are offloaded to the IPFS data plane, so the trust plane only orders 256-byte UDRI pointers. The BFT consensus literature shows that when transaction payloads are minimised to such a constant bound, state machine replication can scale to thousands of nodes without bottlenecking throughput [8,48,51]. The 2400 TPS observed in the 64-node deployment therefore represents a baseline capacity, and the architecture is positioned to support global-scale multi-jurisdictional deployments.
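A back-of-envelope calculation shows why pointer-only ordering scales. It assumes one 256-byte UDRI pointer per transaction and ignores framing and signature overhead. On that basis, the data stream the trust plane must order at the observed 2400 TPS is modest. The same rate with 50 MB matrices embedded in each transaction, the upper end of the stress test, is shown for comparison:

```latex
\begin{aligned}
B_{\text{TDSR}}    &= 2400\ \text{s}^{-1} \times 256\ \text{B}  = 614{,}400\ \text{B/s} \approx 0.61\ \text{MB/s},\\
B_{\text{coupled}} &= 2400\ \text{s}^{-1} \times 50\ \text{MB} = 120\ \text{GB/s}.
\end{aligned}
```

The five-orders-of-magnitude gap illustrates why embedding payloads in consensus transactions cannot sustain comparable throughput.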

5.4. Continuous Operational Overhead

To evaluate system behaviour under sustained load and verify the data minimisation bounds, we executed a 30-day stress test, processing approximately 2.11 million DNS resolution requests per day.
During a single cross-border query, the L4 interface confines raw execution to the local sandbox. As shown in Table 3, the enforced $O(1)$ trans-border leakage bound is maintained at ≤5.8 KB per query. The application payload crossing the border comprises:
  (1) 128 bytes for the encapsulated UDRI query string;
  (2) 850 bytes for the DID document and associated VC metadata;
  (3) 4096 bytes for the statistical fingerprint $\Omega$, which contains the low-dimensional covariance matrix $\Sigma_D$, the mean vector $\mu_D$, and cryptographic signatures.
This fixed composition keeps trans-border transmission independent of the gigabyte-scale cardinality of the source datasets.
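The per-query budget can be checked directly from the reported components. This sketch reads "5.8 KB" as 5800 bytes (decimal), which is an assumption; under a binary reading the bound would be about 5939 bytes, and the conclusion is the same. None of the summed terms depends on the size of the source dataset $|D|$.

```python
from typing import Dict

LEAKAGE_BOUND_BYTES = 5_800  # stated per-query bound, read as 5.8 kB (decimal)

# Per-query application payload reported for the testbed, in bytes.
COMPONENTS: Dict[str, int] = {
    "UDRI query string": 128,
    "DID document and VC metadata": 850,
    "statistical fingerprint (Sigma_D, mu_D, signatures)": 4096,
}


def trans_border_bytes(components: Dict[str, int]) -> int:
    """Total application bytes crossing the border; no term depends on |D|."""
    return sum(components.values())


def within_bound(components: Dict[str, int], bound: int = LEAKAGE_BOUND_BYTES) -> bool:
    """True if the per-query payload fits under the trans-border leakage bound."""
    return trans_border_bytes(components) <= bound


if __name__ == "__main__":
    print(trans_border_bytes(COMPONENTS), within_bound(COMPONENTS))  # 5074 True
```

The three components total 5074 bytes, which sits within the stated 5.8 KB bound.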
Figure 5 illustrates the accumulated system overhead. The 3000 MB ledger failure threshold represents the maximum operational memory allocated to smart contract execution environments and state databases on the lightweight validator nodes deployed in our GBA testbed. Baseline A crosses this threshold on day 15 owing to the continuous on-chain replication of high-dimensional matrices. Baseline B breaches the 15,000 MB gateway bandwidth threshold on day 14 because its architecture transmits raw data payloads in-band.

5.5. System Resilience

We evaluate system resilience through Byzantine fault injection and data plane node churn. Figure 5 and Figure 6 display the performance response.
The trust plane sustains block finality below 4.5 s under Byzantine conditions with $f < n/3$, and consensus finality remains below failure limits across the 64-node testbed. The data plane tolerates 30% node churn. Multi-node pinning maintains payload availability, while oblivious DHT routing latency increases by 0.8 s.
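For the 64-validator testbed, the condition $f < n/3$ fixes how many Byzantine replicas can be tolerated and how many votes each decision needs. The sketch below assumes the standard $n \ge 3f + 1$ model with quorums of $n - f$ replicas. The text does not name the quorum rule of the deployed BFT engine.

```python
def max_byzantine(n: int) -> int:
    """Largest f satisfying f < n/3, i.e. n >= 3f + 1."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Votes required per decision: n - f.

    Any two quorums of n - f replicas overlap in n - 2f >= f + 1 replicas,
    so at least one correct replica is common to both.
    """
    return n - max_byzantine(n)


if __name__ == "__main__":
    n = 64  # validators in the GBA testbed
    print(max_byzantine(n), quorum_size(n))  # 21 43
```

With $n = 64$, up to 21 Byzantine validators can be tolerated and each decision needs 43 votes.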
Figure 6 quantifies operational stability over 30 days. The 5.8 KB trans-border leakage bound keeps TDSR below both the 3000 MB ledger storage threshold and the 15,000 MB network bandwidth threshold. Baseline A reaches the 3000 MB threshold on day 15 and Baseline B reaches the 15,000 MB threshold on day 14, whereas TDSR remains below all failure thresholds for the full 30-day duration. The dual-plane topology isolates the trust plane from data plane partitions and wide-area network jitter, and enforced data minimisation maintains system liveness.

5.6. Architecture Ablation

Table 4 presents an ablation analysis isolating the operational impact of the core architectural boundaries.
Removing the storage–consensus boundary (TDSR w/o Storage–Consensus) forces the trust plane to process variable-size multidimensional feature matrices. Throughput falls below 10 TPS, finality latency scales as $O(k)$ and exceeds 12,000 ms, and the consensus ledger saturates.
Removing the intent–routing boundary (TDSR w/o Intent–Routing) forces in-band payload transmission. The trans-border payload then scales linearly ($O(N)$) with the queried data volume and routinely exceeds 10 MB. Network bandwidth saturates, and data minimisation mandates are breached.
The full TDSR architecture enforces both isolation boundaries. Throughput remains constant at 2400 TPS, and the trans-border payload is strictly bounded to 5.8 KB per query. The dual-plane topology and the ODAR mechanism jointly maintain data minimisation and system scalability.

6. Discussion

The deployment metrics quantify the isolation boundaries of the TDSR architecture. This section analyses how the mechanisms of the protocol stack resolve the conflict between sovereignty mandates and consensus scalability. We examine how the storage–consensus decoupling mechanism bounds transaction throughput. We then evaluate how the intent–routing boundary supports cross-domain compliance, and map the trans-border communication payload to the privacy prerequisites. Taken together, these mechanisms show that TDSR operates as a protocol stack governing trans-border data circulation.

6.1. Architectural Advantages over Reference Models

Table 5 summarises the architectural properties and operational boundaries of the reference models and the proposed TDSR. TDSR resolves the structural limitations inherent in the in-band federated model evaluated as Baseline B, which is prevalent in frameworks such as GAIA-X and the IDS. In-band systems couple identity resolution with payload routing. This exposes verifier intents to host nodes and incurs $O(N)$ trans-border communication overhead that scales linearly with data volume [1,19,38]. In contrast, TDSR implements an out-of-band decoupled architecture. By performing intent resolution via the IPFS DHT, the ODAR mechanism enforces a trans-border communication boundary that caps external payload transmission at a constant $O(1)$ bound of ≤5.8 KB per query. The system thus achieves deterministic data minimisation through its routing topology and avoids the interactive, bandwidth-intensive exchanges required by Multi-Party Computation (MPC) protocols.
TDSR also mitigates the dimensionality variance associated with the coupled ledger model evaluated as Baseline A. Blockchain registries anchor ownership by embedding multidimensional feature matrices into the state consensus, which can degrade throughput as data volume expands [5,15]. TDSR partitions the asynchronous datastore from the synchronous trust plane and restricts L1 consensus operations to 256-byte UDRI pointers. This sustains a consistent throughput of 2400 TPS independent of the underlying data scale [46]. The architecture further establishes algorithm agnosticism through the L4 interface contract. Its isolated sandbox for pluggable verification modules lets TDSR accommodate diverse privacy-preserving mechanisms, such as Zero-Knowledge Proofs (ZKPs), without embedding their computational complexity into the global consensus layer. This shifts the security perimeter from interactive networks to the protocol boundary. ZKP verifiers can be deployed within the sandbox without altering the consensus layer, but local ZKP proof generation is typically $O(k \log k)$ and takes several seconds to minutes for non-trivial datasets. That cost would exceed the current 3.82 s median latency budget of TDSR. TDSR therefore occupies a middle ground in the privacy-preserving spectrum. It provides deterministic $O(1)$ network-level leakage with latency low enough for audits, while remaining compatible with stronger but costlier primitives such as ZKPs for high-assurance scenarios.

6.2. Multi-Jurisdictional Compliance Execution

The TDSR architecture establishes a compliance mechanism for data circulation across divergent legal jurisdictions, built on the integration of the ODAR protocol and the DID mechanism [14]. Legal frameworks, including China's Personal Information Protection Law and the EU General Data Protection Regulation, enforce data minimisation and restrict the trans-border transfer of identifiers [18,40,41].
The ODAR protocol enforces the data minimisation constraint. When a Data User queries a DNS subset, the middleware retrieves the fingerprint via the IPFS DHT, bypassing direct connections to the host node [28]. The DHT mechanism conceals the verifier's origin IP address, satisfying query intent regulations. Certificate Authorities (CAs) are not mutually recognised across jurisdictions [6], so the architecture replaces the CA with a DID mechanism based on Cryptographically Generated Identifiers [37]. Jurisdictional authorities issue VCs that bind compliance attributes to these DIDs [42]. Proofs and evaluations show that the trans-border payload is bounded. The network transmits at most 5.8 KB of hash commitments and Boolean scores per query, and zero bytes of DNS recursive logs cross jurisdictional boundaries.

Beyond data minimisation, TDSR supports the right to be forgotten through UDRI revocation. It facilitates data portability by allowing owners to transfer Cont_ID ownership without moving raw data. It enables regulatory auditing by providing verifiable provenance proofs while maintaining zero raw data transit across borders. Regulatory enforcement is supported by immutable UDRI-anchored audit logs on the trust plane, while the DID and VC mechanisms enable legal interoperability under cross-border governance and compliance frameworks.
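The idea behind a Cryptographically Generated Identifier is that the identifier is derived from the holder's public key. A relying party in another jurisdiction can therefore check the key-to-identifier binding without consulting a CA. The sketch below assumes SHA-256 truncated to 160 bits and hex encoding. It uses the placeholder method name `did:example` from the W3C DID Core examples, since the paper does not state its DID method or encoding. Proving *control* of the key would additionally require verifying a signature over a challenge. That step, and VC issuance, are omitted.

```python
import hashlib
import hmac

DID_PREFIX = "did:example:"  # placeholder method name from the W3C DID Core examples


def cgi_did(public_key: bytes) -> str:
    """Derive an identifier from the key itself: SHA-256, truncated to 160 bits, hex-encoded."""
    return DID_PREFIX + hashlib.sha256(public_key).digest()[:20].hex()


def binds(did: str, public_key: bytes) -> bool:
    """Check that a presented key is the one the identifier was generated from."""
    return hmac.compare_digest(did, cgi_did(public_key))


if __name__ == "__main__":
    key = b"\x04" + bytes(64)  # placeholder bytes standing in for an encoded public key
    did = cgi_did(key)
    print(did, binds(did, key), binds(did, b"other key"))
```

Because the binding is recomputable from public data, no shared trust anchor is needed to reject a mismatched key. Trust in the *attributes* attached to the DID still comes from the issuing authority's VC.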

6.3. System Overhead and Security Boundaries

The decoupling mechanisms introduce latency during cross-border verification, yielding a median response time of 3.82 s. This comprises 2.85 s for asynchronous DHT lookup operations across the data plane, with BFT state retrieval and sandbox execution adding 0.97 s; this is the cost of hiding resolution intents [7,33]. To manage state divergence between the two planes, the architecture enforces a two-phase commit protocol. Consensus timeouts trigger rollback functions that purge unanchored payloads, preventing orphaned data from polluting the off-chain storage layer.
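The rollback rule can be illustrated with a simplified stage-then-anchor flow. This is not a full distributed two-phase commit, since it has no coordinator log or participant votes. `DataPlane` and `TrustPlane` are hypothetical in-memory stand-ins for the IPFS store and the BFT ledger, not their real APIs. The payload is staged off-chain first and its pointer is anchored second. If anchoring fails or times out, the staged payload is purged.

```python
from typing import Callable, Dict, Set


class DataPlane:
    """In-memory stand-in for the off-chain payload store (hypothetical)."""

    def __init__(self) -> None:
        self.pinned: Dict[str, bytes] = {}

    def pin(self, key: str, payload: bytes) -> None:
        self.pinned[key] = payload

    def unpin(self, key: str) -> None:
        self.pinned.pop(key, None)


class TrustPlane:
    """In-memory stand-in for the BFT ledger; anchor_fn may return False or raise TimeoutError."""

    def __init__(self, anchor_fn: Callable[[str], bool]) -> None:
        self.anchored: Set[str] = set()
        self._anchor_fn = anchor_fn

    def anchor(self, key: str) -> bool:
        ok = self._anchor_fn(key)
        if ok:
            self.anchored.add(key)
        return ok


def register(key: str, payload: bytes, data: DataPlane, trust: TrustPlane) -> bool:
    """Phase 1: stage the payload off-chain. Phase 2: anchor its pointer.

    On a rejected anchor or a consensus timeout, roll back by purging the staged payload.
    """
    data.pin(key, payload)
    try:
        committed = trust.anchor(key)
    except TimeoutError:
        committed = False
    if not committed:
        data.unpin(key)  # no orphaned, unanchored payload is left behind
    return committed
```

Ordering the steps this way means the ledger never references a payload that was not stored. The rollback only has to clean up the off-chain side.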
The decoupled architecture establishes security boundaries by isolating attack vectors through the dual-plane topology. The BFT consortium blockchain maintains state consistency, mitigating double registration under the fault tolerance threshold $f < n/3$ [16]. The Cryptographically Generated Identifier mechanism addresses Sybil attacks by requiring regulatory credentials for cross-domain resolution [29], keeping the authentication layer secure across jurisdictional boundaries.

6.4. Operational Scope and Limitations

This execution profile defines the operational scope of the TDSR architecture. It positions TDSR for cross-jurisdictional data ownership audits and compliance verification rather than real-time stream control. In contexts where certificate authority processes take days, a deterministic 3.82 s resolution is an improvement of several orders of magnitude. To mitigate the DHT routing bottleneck during query sequences, the L2 interoperability layer integrates a Least Recently Used (LRU) credential cache governed by Time-To-Live (TTL) constraints. Cache hits reduce end-to-end verification to the BFT retrieval threshold (<1.0 s).
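An LRU credential cache with TTL expiry can be sketched with `collections.OrderedDict`. The class name, capacity, TTL and cache keys are illustrative assumptions; the paper does not give its cache parameters. On a hit the DHT lookup is skipped. On a miss or an expired entry the caller falls back to the full resolution path. An injectable clock keeps the behaviour testable.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLLRUCache(Generic[V]):
    """Bounded cache: evicts the least recently used entry and expires entries after a TTL."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity < 1 or ttl_seconds <= 0:
            raise ValueError("capacity must be >= 1 and ttl_seconds > 0")
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: "OrderedDict[Hashable, Tuple[float, V]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:   # stale credential: evict and report a miss
            del self._entries[key]
            return None
        self._entries.move_to_end(key)    # mark as most recently used
        return value

    def put(self, key: Hashable, value: V) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
```

The TTL bounds how long a revoked or updated credential can still be served from cache. Choosing it is a trade-off between hit rate and revocation latency.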
The architecture is subject to methodological and infrastructural limitations. Although the alignment mapping $\psi$ resolves skew in non-Gaussian distributions, evaluating the accuracy of mapping algorithms on long-tailed datasets remains a challenge [45]. The security proofs presented assume successful distributional alignment via $\psi$. In practice, the extreme skewness of long-tailed distributions, which are prevalent in many real-world datasets, may challenge this assumption. The GBA deployment provides a robust proof of concept using real-world DNS recursive logs. However, more complex data modalities such as medical imaging and detailed financial audits present additional challenges owing to their high dimensionality and non-tabular structure. The evaluations also omit stress testing against Distributed Denial of Service (DDoS) attacks [7,30]. The system therefore remains potentially exposed to DDoS attacks targeting the DHT lookup service or the BFT consensus nodes. It is also exposed to adversarial conditions in which malicious registrars attempt to flood the system with invalid UDRIs. The current design incorporates rate limiting at the intent–routing layer and Cryptographically Generated Identifiers for Sybil resistance, but these threats warrant further mitigation strategies in future deployments. Future iterations will address these constraints by integrating Trusted Execution Environments (TEEs) [44] and Byzantine-resilient mechanisms to formalise a hardware root of trust.
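The text does not specify which rate-limiting algorithm the intent–routing layer uses. A token bucket is one common choice and is sketched here purely as an assumption. Each registrar identity gets a bucket that refills at `rate` tokens per second up to `burst`. A registrar flooding the system with UDRI registrations is throttled once its burst is spent. The policy values in `admit` are hypothetical.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate: float, burst: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if rate <= 0 or burst < 1:
            raise ValueError("rate must be > 0 and burst >= 1")
        self._rate = rate
        self._burst = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self._tokens = min(self._burst, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


def admit(buckets: Dict[str, TokenBucket], registrar: str,
          rate: float = 5.0, burst: int = 10) -> bool:
    """Apply one bucket per registrar identity (hypothetical policy values)."""
    if registrar not in buckets:
        buckets[registrar] = TokenBucket(rate, burst)
    return buckets[registrar].allow()
```

Keying buckets by a Sybil-resistant identity, rather than by IP address, keeps an attacker from multiplying its quota simply by rotating network endpoints.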

6.5. Architectural Generalisation and Transferability

The L4 feature extraction interface abstracts representation algorithms, enabling deployment beyond DNS records. Substituting the local extraction algorithm extends the $O(1)$ trans-border leakage bound to diverse data infrastructures. Industrial IoT sensor telemetry, for example from distributed split single-sideband time-modulated arrays [55] or distributed IRS beamforming [56], can exploit the dimensional invariance to prevent ledger saturation. Medical case databases can use oblivious routing and isolated sandboxes to enforce zero-byte trans-border raw data transfer. Financial logs can use the decentralised credential mechanism to perform compliance verification without central authorities.
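A pluggable L4 extractor can be expressed as a small interface contract. The `FeatureExtractor` protocol, the two toy extractors and their windowing are hypothetical illustrations, not the paper's interface definition. The point they show is that whatever the data modality, an extractor that emits $k$-dimensional vectors yields $(\mu_D, \Sigma_D)$ statistics of $k + k^2$ values. That size is fixed by $k$ and not by the number of records.

```python
from typing import Dict, Iterable, List, Protocol


class FeatureExtractor(Protocol):
    """Hypothetical L4 contract: turn raw records into k-dimensional vectors."""
    k: int

    def extract(self, records: Iterable[Dict[str, object]]) -> List[List[float]]: ...


class DnsQtypeShares:
    """Toy DNS extractor: share of each query type per batch of `window` records."""
    QTYPES = ("A", "AAAA", "MX", "TXT")
    k = len(QTYPES)

    def __init__(self, window: int = 100) -> None:
        self.window = window

    def extract(self, records: Iterable[Dict[str, object]]) -> List[List[float]]:
        rows = list(records)
        out: List[List[float]] = []
        for start in range(0, len(rows), self.window):
            batch = rows[start:start + self.window]
            out.append([sum(r.get("qtype") == q for r in batch) / len(batch) for q in self.QTYPES])
        return out


class SensorMinMax:
    """Toy IoT extractor: (min, max) of a numeric 'value' field per batch."""
    k = 2

    def __init__(self, window: int = 10) -> None:
        self.window = window

    def extract(self, records: Iterable[Dict[str, object]]) -> List[List[float]]:
        values = [float(r["value"]) for r in records]
        return [
            [min(values[s:s + self.window]), max(values[s:s + self.window])]
            for s in range(0, len(values), self.window)
        ]


def checked_vectors(extractor: FeatureExtractor,
                    records: Iterable[Dict[str, object]]) -> List[List[float]]:
    """Run an extractor and enforce the k-dimensional output contract."""
    vectors = extractor.extract(records)
    if any(len(v) != extractor.k for v in vectors):
        raise ValueError("extractor violated the k-dimensional contract")
    return vectors


def summary_floats(extractor: FeatureExtractor) -> int:
    """Number of values in (mu, Sigma): fixed by k, independent of data volume."""
    return extractor.k + extractor.k * extractor.k
```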
The TDSR architecture establishes a data-agnostic verification mechanism. The decoupled dual-plane topology ensures variations in raw data formats and query volumes do not compromise the synchronous consensus ledger. The architecture provides a generalised baseline for multi-jurisdictional data asset registration.

7. Conclusions and Future Work

The proposed TDSR architecture resolves the operational conflict between ownership provenance prerequisites and data minimisation mandates. By implementing a four-layer protocol stack and a dual-plane topology, the system establishes a decoupled storage–ledger mechanism. Partitioning asynchronous payload datastores from synchronous consensus ledgers sustains throughput independent of data dimensionality. Over this infrastructure, the UDRI performs out-of-band cross-domain routing without exposing verifier intents. Driven by the ODAR mechanism, the two-phase, four-algorithm lifecycle shifts hypothesis testing to isolated sandboxes via an algorithm-agnostic mathematical contract, capping external data transit at a constant leakage bound. The deployment across the Guangdong–Hong Kong–Macao Greater Bay Area validates the architecture, establishing a robust compliance mechanism for data circulation across divergent legal jurisdictions.
Future work will both optimise performance and evaluate the TDSR architecture against explicit attack models. Attack-and-defence experiments will test robustness against DHT record pollution, Sybil routing attacks, and Distributed Denial of Service vectors across the dual-plane topology. We also plan to extend the feature extraction interface to unstructured and high-dimensional data such as medical imaging and financial audit records. This work will explore non-parametric kernel density estimation and robust statistical techniques to better accommodate long-tailed and non-Gaussian distributions. Integrating Trusted Execution Environments [44] and Byzantine-resilient verification mechanisms [45] will formalise the hardware-level root of trust required to mitigate poisoning vectors in multi-jurisdictional networks and further strengthen cross-sector applicability.

Author Contributions

X.Y. served as the primary implementer of the research and the main drafter of the manuscript. J.X. contributed to the cross-border experiments. W.D. designed the distributed mechanisms. C.Z. led the compliance implementation. J.R. and S.L. provided expert guidance on the practical aspects of data asset registration. W.I.L. supplied the necessary research facilities and funding support. As co-corresponding author, W.W. (Wei Wang) provided the primary idea of the architecture, the identification system, and essential data resources, together with funding support. W.W. (Wenyong Wang) conceived the research direction and core ideas and, like W.W. (Wei Wang), provided the required research conditions. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Macau Science and Technology Development Funds under grant number 0019/2025/EIB2.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data are available upon request.

Acknowledgments

The authors would like to thank the Recursive Domain Name Service Provider (1.2.4.8) for their cooperation in this research. The research was conducted in an administratively governed and compliant data environment. During the preparation of this manuscript, the authors used Google Gemini 3.1 Pro for the purposes of language polishing. The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

Authors Xingxing Yang and Wei Yang were employed by the company Saiyu Technology (Beijing) Co., Ltd., Beijing, China; author Wai Ip Lei was employed by Companhia de Equipamentos Master, Limitada, Macau SAR, China. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BFT: Byzantine Fault-Tolerant
CA: Certificate Authority
DHT: Distributed Hash Table
DID: Decentralised Identifier
DNS: Domain Name System
GBA: Guangdong–Hong Kong–Macao Greater Bay Area
IDS: International Data Spaces
IPFS: InterPlanetary File System
ODAR: Oblivious Data Asset Registration
PKI: Public Key Infrastructure
TDSR: Trusted Data Space with Registration
UDRI: Unified Data Resource Identifier
VC: Verifiable Credential

References

  1. Mitrovska, A.; Shariati, B.; Jafari, A.; Safari, P.; Fischer, J.K.; Freund, R. Network data sharing: A governance framework for ensuring data sovereignty and privacy compliance. J. Opt. Commun. Netw. 2025, 17, 1019–1028.
  2. Franklin, M.; Halevy, A.; Maier, D. From databases to dataspaces: A new abstraction for information management. SIGMOD Rec. 2005, 34, 27–33.
  3. Jarke, M. Data Sovereignty and the Internet of Production. In Advanced Information Systems Engineering; Springer: Cham, Switzerland, 2020; Volume 61, pp. 549–560.
  4. Liang, Z.-Y.; Liu, G.-Y.; Ren, Y.; Yang, M.; Jiang, R.-W.; Luo, Y.; Ma, Y.-S. Trustworthy Data Space Collaborative Trust Mechanism Driven by Blockchain: Technology Integration, Cross-Border Governance, and Standardization Path. Information 2025, 16, 1066.
  5. Gu, P.; Chen, L. An Efficient Blockchain-based Cross-domain Authentication and Secure Certificate Revocation Scheme. In Proceedings of the IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; pp. 11–14.
  6. Deccio, C.; Davis, J. DNS privacy in practice and preparation. In Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies; Association for Computing Machinery: New York, NY, USA, 2019; pp. 138–143.
  7. Yu, H.; Yuchi, X.; Yang, X.; Li, H.; Yang, X.; Wang, W. DNS-Sensor: A Sensor-Driven Architecture for Real-Time DNS Cache Poisoning Detection and Mitigation. Sensors 2025, 25, 6884.
  8. Danezis, G.; Kokoris-Kogias, E.; Sonnino, A.; Spiegelman, A. Narwhal and Tusk: A DAG-based mempool and efficient BFT consensus. In Proceedings of the Seventeenth European Conference on Computer Systems (EuroSys); Association for Computing Machinery: New York, NY, USA, 2022; pp. 34–50.
  9. Stoica, I.; Morris, R.; Karger, D.; Kaashoek, M.F.; Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM Comput. Commun. Rev. 2001, 31, 149–160.
  10. Berners-Lee, T.; Fielding, R.; Masinter, L. RFC 3986, Uniform Resource Identifier (URI): Generic Syntax; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2005.
  11. Zhang, L.; Afanasyev, A.; Burke, J.A.; Jacobson, V.L.; Claffy, K.C.; Crowley, P.J.; Papadopoulos, C.; Wang, L.; Zhang, B. Named data networking. ACM SIGCOMM Comput. Commun. Rev. 2014, 44, 66–73.
  12. Zhang, C.; Liu, Y.; Xu, M.; Yang, X.; Li, P.; Yang, C.; Liu, Q.; Xiong, X.; Chen, P.; Wang, W. Trans-Border Trusted Data Spaces: A General Framework Supporting Trustworthy International Data Circulation. IEEE Access 2025, 13, 30481–30496.
  13. Liu, Y.; Yang, C.; Liu, Q.; Xu, M.; Zhang, C.; Cheng, L.; Wang, W. PDPHE: Personal Data Protection for Trans-Border Transmission Based on Homomorphic Encryption. Electronics 2024, 13, 1959.
  14. Kaya, M.; Shahid, H. Cross-border data flows and digital sovereignty: Legal dilemmas in transnational governance. Interdiscip. Stud. Soc. Law Politics 2025, 4, 219–233.
  15. Androulaki, E.; Barger, A.; Bortnikov, V.; Cachin, C.; Christidis, K.; De Caro, A.; Enyeart, D.; Ferris, C.; Laventman, G.; Manevich, Y.; et al. Hyperledger fabric: A distributed operating system for permissioned blockchains. In Proceedings of the Thirteenth EuroSys Conference; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–15.
  16. Berger, C.; Toumia, S.B.; Reiser, H.P. Scalable Performance Evaluation of Byzantine Fault-Tolerant Systems Using Network Simulation. In Proceedings of the IEEE Pacific Rim International Symposium on Dependable Computing, Singapore, 24–27 October 2023.
  17. Zhao, F.; Ding, H.; Li, C.; Su, Z.; Liang, G.; Yang, C. A Blockchain-Based Efficient Cross-Domain Authentication Scheme for Internet of Vehicles. Comput. Mater. Contin. 2024, 80, 567.
  18. Suzuki, K.; Yokozeki, D. Data governance for achieving data sharing in the IOWN era. NTT Tech. Rev. 2023, 21, 49–54.
  19. Otto, B.; Jarke, M. Designing a multi-sided data platform: Findings from the International Data Spaces case. Electron. Mark. 2019, 29, 561–580.
  20. Sun, L.-S.; Bai, X.; Zhang, C.; Li, Y.; Zhang, Y.-B.; Guo, W.-Q. BSTProv: Blockchain-Based Secure and Trustworthy Data Provenance Sharing. Electronics 2022, 11, 1489.
  21. Yu, X.; Xie, Y.; Xu, Q.; Xu, Z.; Xiong, R. Secure data sharing for cross-domain industrial IoT based on consortium blockchain. In Proceedings of the 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Rio de Janeiro, Brazil, 24–26 May 2023; pp. 1508–1513.
  22. Maymounkov, P.; Mazieres, D. Kademlia: A peer-to-peer information system based on the xor metric. In Peer-to-Peer Systems: First International Workshop, IPTPS 2002, Cambridge, MA, USA, March 7–8, 2002, Revised Papers; Springer: Berlin/Heidelberg, Germany, 2002.
  23. Shafagh, H.; Burkhalter, L.; Hithnawi, A.; Duquennoy, S. Towards blockchain-based auditable storage and sharing of iot data. In Proceedings of the 2017 Cloud Computing Security Workshop; Association for Computing Machinery: New York, NY, USA, 2017; pp. 45–50.
  24. Bai, Y.; Yang, J.; Lyu, A.; Jia, X.; Yuan, B.; Feng, J.; Tong, J. Trusted data space: Conceptual connotation, technical architecture and construction paths. In Proceedings of the 2025 IEEE International Conference on High Performance Computing and Communications (HPCC), Exeter, UK, 13–15 August 2025.
  25. Böttger, T.; Cuadrado, F.; Antichi, G.; Fernandes, E.L.; Tyson, G.; Castro, I.; Uhlig, S. An empirical study of the cost of DNS-over-HTTPS. In Proceedings of the Internet Measurement Conference; Association for Computing Machinery: New York, NY, USA, 2019; pp. 15–28.
  26. Hoffman, P.; McManus, P. RFC 8484, DNS Queries Over HTTPS (DoH); Internet Engineering Task Force: Fremont, CA, USA, 2018.
  27. Damas, J.; Graff, M.; Vixie, P. RFC 6891, Extension Mechanisms for DNS (EDNS(0)); Internet Engineering Task Force: Fremont, CA, USA, 2013.
  28. Kinnear, E.; McManus, P.; Pauly, T.; Rose, K.; Wood, C.A. RFC 9230, Oblivious DNS Over HTTPS (ODoH); Internet Engineering Task Force: Fremont, CA, USA, 2022.
  29. Singanamalla, S.; Chunhapanya, S.; Hoyland, J.; Vavruša, M.; Verma, T.; Wu, P.; Fayed, M.; Heimerl, K.; Sullivan, N.; Wood, C. Oblivious DNS over HTTPS (ODoH): A Practical Privacy Enhancement to DNS. Proc. Priv. Enhancing Technol. 2021, 2021, 575–592.
  30. Ali, B.; Chen, G. Next-generation AI for advanced threat detection and security enhancement in DNS over HTTPS. J. Netw. Comput. Appl. 2025, 244, 104326.
  31. Ali, B.; Chen, G. E3-DoH: Enhanced evolutionary encryption for DNS-over-HTTPS, DNS-over-TLS, and DNS-over-QUIC. Inf. Sci. 2026, 746, 123430.
  32. Lin, I.-C.; Yeh, I.-L.; Chang, C.-C.; Liu, J.-C.; Chang, C.-C. Designing a Secure and Scalable Data Sharing Mechanism Using Decentralized Identifiers (DID). Comput. Model. Eng. Sci. 2024, 141, 809–822.
  33. Wang, R.; Pan, H.; Deng, X.; Li, Y.; Li, C.; Fan, D.; Guo, X. Blockchain and DID-based cross-domain identity authentication and maintenance in Web3. In Proceedings of the 2024 4th International Conference on Blockchain Technology and Information Security (ICBCTIS), Wuhan, China, 17–19 August 2024; pp. 120–126.
  34. Zhang, L.; Huang, Y.; Nie, J.; Wang, K. Cross-Domain Authentication Scheme Based on Blockchain and Consistent Hash Algorithm for System-Wide Information Management. Comput. Mater. Contin. 2023, 77, 1467–1488.
  35. Wang, F.; Cui, J.; Zhang, Q.; He, D.; Gu, C.; Zhong, H. Blockchain-Based Lightweight Message Authentication for Edge-Assisted Cross-Domain Industrial Internet of Things. IEEE Trans. Dependable Secur. Comput. 2024, 21, 1587–1604.
  36. Tong, F.; Chen, X.; Wang, K.; Zhang, Y. CCAP: A Complete Cross-Domain Authentication based on Blockchain for Internet of Things. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3789–3800.
  37. Zheng, J.; Xu, M.; Li, J.; Chen, B.; Tan, Z.; Wang, A.; Zhang, S.; Liu, Y.; Zhang, K.Q.; Zheng, L.; et al. STALE: A Scalable and Secure Trans-Border Authentication Scheme Leveraging Email and ECDH Key Exchange. Electronics 2025, 14, 2399.
  38. Zhao, S.; Cao, L.; Li, J.; Wan, J.; Bai, J. Cross-Domain Data Traceability Mechanism Based on Blockchain. Comput. Mater. Contin. 2023, 76, 2531–2549.
  39. Li, F.; Zhao, Y.; Zhang, K.; Xu, H.; Wang, Y.; Wang, D. Blockchain-based lightweight trusted data interaction scheme for cross-domain IIoT. Digit. Commun. Netw. 2025, 11, 1192–1204.
  40. Hoofnagle, C.J.; van der Sloot, B.; Borgesius, F.Z. The European Union general data protection regulation: What it is and what it means. Inf. Commun. Technol. Law 2019, 28, 65–98.
  41. Demetzou, K. Data Protection Impact Assessment: A tool for accountability and the unclarified concept of ‘high risk’ in the General Data Protection Regulation. Comput. Law Secur. Rev. 2019, 35, 105342.
  42. Zafar, A. Reconciling blockchain technology and data protection laws: Regulatory challenges, technical solutions, and practical pathways. J. Cybersecur. 2025, 11, tyaf002.
  43. Xu, Z.; Cao, S. Multi-Source Data Privacy Protection Method Based on Homomorphic Encryption and Blockchain. Comput. Model. Eng. Sci. 2024, 136, 861–881.
  44. De Viti, R.; Sheff, I.; Glaeser, N.; Dinis, B.; Rodrigues, R.; Bhattacharjee, B.; Hithnawi, A.; Garg, D.; Druschel, P. CoVault: Secure, scalable analytics of personal data. In Proceedings of the 34th USENIX Security Symposium; USENIX Association: Berkeley, CA, USA, 2025.
  45. Gu, X.; Li, M.; Xiong, L. DP-BREM: Differentially-private and Byzantine-robust federated learning with client momentum. In Proceedings of the 34th USENIX Security Symposium; USENIX Association: Berkeley, CA, USA, 2025.
46. Xu, M.; Chen, B.; Tan, Z.; Chen, S.; Wang, L.; Liu, Y.; San, T.I.; Fong, S.W.; Wang, W.; Feng, J. AHAC: Advanced Network-Hiding Access Control Framework. Appl. Sci. 2024, 14, 5593.
47. Tan, Z.; Xue, C.; Liu, Y.; Yang, C.; Xu, M.; Yang, X.; Fai, T.K.; Yu, H.; Zheng, L.; Wang, W. SDTUC: A Software-Defined Networking and SRv6-Enabled Framework for Enhanced Dataspace Security and Data Transmission Traceability. In Proceedings of the 2025 IEEE 11th World Forum on Internet of Things (WF-IoT), Chengdu, China, 27–30 October 2025.
48. Yin, J.; Martin, J.P.; Venkataramani, A.; Alvisi, L.; Dahlin, M. Separating agreement from execution for byzantine fault tolerant services. ACM SIGOPS Oper. Syst. Rev. 2003, 37, 253–267.
49. Greenberg, A.; Hjalmtysson, G.; Maltz, D.A.; Myers, A.; Rexford, J.; Xie, G.; Yan, H.; Zhang, J.; Zhang, H. A clean slate 4D approach to network control and management. ACM SIGCOMM Comput. Commun. Rev. 2005, 35, 41–54.
50. Kumar, D.; Yadulla, A.R.; Bhuvanesh, A.; Pawar, P.; Kasula, V.K.; Keerthanadevi, R. Hierarchical Blockchain Framework for Node Authentication in IoT Networks: A Comprehensive Analysis. In Proceedings of the 2025 International Conference in Advances in Power, Signal, and Information Technology (APSIT), Bhubaneswar, India, 23–25 May 2025; pp. 1–6.
51. Yin, M.; Malkhi, D.; Reiter, M.K.; Gueta, G.G.; Abraham, I. HotStuff: BFT consensus with linearity and responsiveness. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing; Association for Computing Machinery: New York, NY, USA, 2019; pp. 347–356.
52. Rescorla, E. RFC 8446: The Transport Layer Security (TLS) Protocol Version 1.3; Internet Engineering Task Force (IETF): Fremont, CA, USA, 2018.
  53. gRPC. gRPC: A High-Performance, Open Source Universal RPC Framework. gRPC Documentation. 2025. Available online: https://grpc.io/ (accessed on 9 May 2026).
54. Nicholson, N.; Caldeira, S.; Furtado, A.; Nicholl, C. Trusted data spaces as a viable and sustainable solution for networks of population-based patient registries. JMIR Public Health Surveill. 2023, 9, e34123.
55. Ma, Y.; Ma, R.; Lin, Z.; Miao, C.; Zhang, R.; Long, W.; Wu, W.; Wang, J. Distributed Split Single-Sideband Time-Modulated Arrays for Secure Communications. IEEE Internet Things J. 2026; early access.
56. Hu, X.; Lyu, S.; Zhao, C.; Liu, C.; Peng, M. Distributed IRS Beamforming for Secure Transmission. IEEE Trans. Veh. Technol. 2026; early access.
Figure 1. The integrated system architecture and operational workflow of TDSR. The diagram maps the four-layer protocol stack (L1 to L4) onto the two-phase registration and verification lifecycle. Modules 1 and 2 perform registration at the provider node, while Modules 3 and 4 perform cross-domain credential authentication and isolated statistical verification over the dual-plane infrastructure.
Figure 2. Sequence diagram of the registration and verification lifecycle. Phase 1 runs the two-phase commit that anchors the data asset. Phase 2 runs oblivious retrieval and the sandboxed computation. Raw data vectors remain isolated within the local boundary of $N_B$.
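The Phase 1 anchoring step pairs an off-ledger payload store with an on-ledger reference. The Python sketch below illustrates that ordering under stated assumptions: `DataPlane`, `TrustPlane`, and `register` are hypothetical names, a SHA-256 digest stands in for a real IPFS content identifier, and the two steps form a simplified prepare-then-commit ordering rather than a full distributed two-phase commit with participant voting and rollback.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class DataPlane:
    """Hypothetical asynchronous payload datastore (stands in for an IPFS node)."""
    blobs: dict = field(default_factory=dict)

    def pin(self, payload: bytes) -> str:
        # Content addressing: the key is derived from the bytes themselves.
        # A SHA-256 hex digest is used for illustration; it is not a real IPFS CID.
        cid = hashlib.sha256(payload).hexdigest()
        self.blobs[cid] = payload
        return cid

    def is_pinned(self, cid: str) -> bool:
        return cid in self.blobs


@dataclass
class TrustPlane:
    """Hypothetical synchronous ledger; the consensus protocol itself is not modelled."""
    entries: list = field(default_factory=list)

    def commit(self, owner: str, cid: str) -> int:
        # Only the fixed-size reference is appended, never the payload itself.
        self.entries.append((owner, cid))
        return len(self.entries) - 1


def register(owner: str, payload: bytes, data: DataPlane, ledger: TrustPlane) -> str:
    # Step 1 (prepare): store the payload off-ledger and obtain its content address.
    cid = data.pin(payload)
    # Anchor only references whose payload is confirmed present, so the ledger
    # never points at missing data (always true in this in-memory sketch).
    if not data.is_pinned(cid):
        raise RuntimeError("pin not confirmed; ledger commit aborted")
    # Step 2 (commit): append the (owner, cid) pair to the ledger.
    ledger.commit(owner, cid)
    return cid
```

Because only the 64-character digest enters the ledger, each ledger entry in this sketch has the same size whether the pinned payload is 1 KB or 1 MB; in miniature, this is the decoupling that Figure 4 credits for dimensionality-invariant throughput.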
Figure 3. Cross-domain verification latency across regional paths. (a) Cumulative distribution function (CDF) of TDSR execution latency on the Guangdong–Hong Kong, Hong Kong–Macao, and Guangdong–Macao routes. (b–d) Comparison of TDSR, Baseline A (coupled BFT), and Baseline B (federated) on each route. P50 and P95 mark the latencies within which 50% and 95% of queries complete.
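The P50 and P95 markers for the latency plots are order statistics of the per-query latency samples. The caption does not state which percentile estimator was used; the sketch below assumes the nearest-rank method, and the function names are illustrative.

```python
import math


def nearest_rank_percentile(samples, p):
    """P-th percentile by the nearest-rank method, for 0 < p <= 100."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must lie in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    # P50: half of the queries complete within this latency; P95: 95% do.
    return {
        "P50": nearest_rank_percentile(latencies_ms, 50),
        "P95": nearest_rank_percentile(latencies_ms, 95),
    }
```

Applied to the raw samples of one route, `latency_summary` yields both markers; plotting the sorted samples against their rank fraction gives the empirical CDF of the kind shown in panel (a).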
Figure 4. Transaction throughput versus feature dimensionality. The decoupled TDSR architecture sustains a constant throughput of 2400 TPS across all payload dimensions, whereas the coupled ledger (Baseline A) degrades severely, confirming dimensionality invariance.
Figure 5. Accumulated system overhead over 30 days of continuous deployment. Dashed lines mark the operational failure thresholds for ledger storage (3000 MB) and network traffic (15,000 MB). TDSR avoids ledger saturation and network congestion through protocol-enforced minimisation bounds.
Figure 6. System finality latency under simulated network churn and Byzantine faults. The trust plane maintains stable block finality provided $f < n/3$. The data plane absorbs 30% node churn through multi-node pinning redundancy, confirming continuous operation under wide-area network degradation.
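The two resilience conditions for the trust and data planes can be made concrete with a little arithmetic. The sketch below computes the largest tolerable number of Byzantine replicas under $f < n/3$ (equivalently $n \ge 3f + 1$) and the smallest quorum whose pairwise intersections contain at least $f + 1$ replicas, so that any two quorums share an honest one. For the data plane it assumes, as a simplification not stated in the caption, that each pinning node is offline independently with probability equal to the churn rate; the function names and the replica count in the example are hypothetical.

```python
def max_byzantine_faults(n: int) -> int:
    """Largest integer f with f < n/3, i.e. n >= 3f + 1."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 3


def quorum_size(n: int) -> int:
    """Smallest q with 2q - n >= f + 1, so any two quorums share an honest replica."""
    f = max_byzantine_faults(n)
    return (n + f + 2) // 2  # integer form of ceil((n + f + 1) / 2)


def pin_availability(replicas: int, churn: float) -> float:
    """P(at least one pinning node online), assuming independent churn per node."""
    if replicas < 1 or not 0.0 <= churn <= 1.0:
        raise ValueError("invalid replica count or churn rate")
    return 1.0 - churn ** replicas
```

For example, with four consensus replicas one Byzantine fault is tolerated with a quorum of three, and with three pinning replicas under 30% independent churn at least one replica stays reachable with probability $1 - 0.3^3 = 0.973$.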
Table 1. Key notations and cryptographic primitives defining the TDSR system model.
| Notation | Definition | System Component |
|---|---|---|
| $N_A$, $N_B$ | Data provider node and verifier node | Registration node layer |
| $D$, $d$ | Source data asset and local data subset ($d \subseteq D$) | Feature extraction interface |
| $F_D(t)$ | Fingerprint extracted from $D$ at time $t$ | Fingerprint payload $(\mu_D, \Sigma_D)$ |
| $\mathcal{D}(d, D)$ | Boolean provenance decision, output in $\{0, 1\}$ | Isolated sandbox execution |
| $\Omega$ | Encapsulated UDRI pointer resolution payload | ODAR protocol |
| $D_M^2(x)$ | Squared Mahalanobis distance | Sandbox hypothesis testing |
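Table 1 pairs a fingerprint $(\mu_D, \Sigma_D)$ with a squared Mahalanobis distance $D_M^2(x) = (x - \mu_D)^\top \Sigma_D^{-1} (x - \mu_D)$ evaluated in the sandbox. The exact test statistic and threshold are not given in this section, so the dependency-free Python sketch below assumes a one-sample test on the mean $\bar{x}$ of the $n$ rows of $d$: because $\bar{x}$ has covariance $\Sigma_D/n$, the statistic $n\,D_M^2(\bar{x})$ is compared against a threshold $\tau$, and the decision is 1 (consistent with $D$) when it does not exceed $\tau$. The names `fingerprint`, `mahalanobis_sq`, and `decide`, and the choice of $\tau$, are hypothetical.

```python
def fingerprint(rows):
    """(mu, sigma): column means and sample covariance of n >= 2 rows of length k."""
    n, k = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two rows for a sample covariance")
    mu = [sum(r[j] for r in rows) / n for j in range(k)]
    sigma = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
              for b in range(k)] for a in range(k)]
    return mu, sigma


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            factor = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        x[r] = (m[r][k] - sum(m[r][c] * x[c] for c in range(r + 1, k))) / m[r][r]
    return x


def mahalanobis_sq(x, mu, sigma):
    """D_M^2(x) = (x - mu)^T sigma^{-1} (x - mu), without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * yi for di, yi in zip(diff, solve(sigma, diff)))


def decide(d, fp, tau):
    """Hypothetical decision: 1 if the mean of d is consistent with fingerprint fp."""
    mu, sigma = fp
    n = len(d)
    xbar = [sum(r[j] for r in d) / n for j in range(len(mu))]
    # The mean of n rows has covariance sigma / n, hence the factor n.
    stat = n * mahalanobis_sq(xbar, mu, sigma)
    return 1 if stat <= tau else 0
```

Under a large-sample chi-square approximation with $k$ degrees of freedom, $\tau = 5.991$ corresponds to a 5% significance level for $k = 2$; other dimensions need the matching quantile.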
Table 2. TDSR protocol stack definition and UDRI structural mapping.
| Architecture Layer | Executing Protocols | Segment | Protocol Function |
|---|---|---|---|
| L4: Feature Extraction | Fingerprint Encapsulation Protocol | 〈Cont_ID〉 | Indexes the extracted fingerprint payload via the DHT. |
| L3: Registration Node | Two-Phase Commit; Isolated Sandbox | 〈Node_ID〉 | Maps to DID for cross-domain node authentication. |
| L2: Interoperability | ODAR Routing; DID/VC Authentication | 〈Domain_ID〉 | Isolates legal jurisdiction boundary and executes identity discovery. |
| L1: Infrastructure | BFT Consensus; IPFS DHT Protocol | 〈Infra_ID〉 | Targets infrastructure, isolating state ledger from datastore. |
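Table 2 ties each UDRI segment to one protocol layer, and Table 3 lists the request as 〈Infra_ID〉:〈Domain_ID〉:〈Node_ID〉:〈Cont_ID〉. A minimal parser for that layout might look like the Python sketch below. It assumes, hypothetically, that segments are plain strings containing no `:`; real DIDs such as `did:example:123` do contain colons, so a deployed format would need an escaping or encoding rule that is not given here. The class, field, and method names are illustrative.

```python
from dataclasses import dataclass

# Hypothetical segment order, following the UDRI request layout in Table 3.
SEGMENTS = ("infra_id", "domain_id", "node_id", "cont_id")


@dataclass(frozen=True)
class UDRI:
    infra_id: str   # L1 Infrastructure: selects the ledger/datastore deployment
    domain_id: str  # L2 Interoperability: legal jurisdiction boundary
    node_id: str    # L3 Registration Node: resolved to a DID for authentication
    cont_id: str    # L4 Feature Extraction: DHT key of the fingerprint payload

    @classmethod
    def parse(cls, text: str) -> "UDRI":
        parts = text.split(":")
        # Assumes no segment contains ':'; otherwise the split is ambiguous.
        if len(parts) != len(SEGMENTS) or not all(parts):
            raise ValueError(f"expected {len(SEGMENTS)} non-empty segments, got {text!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return ":".join((self.infra_id, self.domain_id, self.node_id, self.cont_id))

    def same_domain(self, other: "UDRI") -> bool:
        # Hypothetical helper: true when both pointers sit in the same jurisdiction.
        return self.domain_id == other.domain_id
```

Keeping the jurisdiction in its own segment lets a router decide whether a request crosses a legal boundary from the pointer alone, before any payload is fetched.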
Table 3. Composition of the trans-border payload per verification query (≈5.8 KB total including protocol overhead). The total is constant, independent of the source dataset $D$ and the query subset $d$.
| Component | Size (bytes) | Content |
|---|---|---|
| UDRI request | 128 | 〈Infra_ID〉:〈Domain_ID〉:〈Node_ID〉:〈Cont_ID〉 |
| DID credential | 850 | Verifiable Credential and public key |
| Statistical payload | 4096 | Fingerprint $\Omega = (\mu_D, \Sigma_D)$ with $k = 16$ |
| Total | 5074 | Protocol overhead brings the total to ≈5.8 KB |
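The claim that the per-query payload is independent of $D$ and $d$ follows from the fingerprint's shape: $\mu_D$ has $k$ entries and $\Sigma_D$ has $k^2$, however many rows are summarised. The Python sketch below checks this with an assumed encoding (little-endian float64, $\mu_D$ followed by $\Sigma_D$ row by row), which gives $8(k + k^2) = 2176$ bytes for $k = 16$. That figure is smaller than the 4096 bytes reported for the statistical payload, whose wire format is not described in this section; only the independence from row count is illustrated. The function names are hypothetical.

```python
import random
import struct


def fingerprint(rows):
    """(mu, sigma): column means and sample covariance of the rows."""
    n, k = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(k)]
    sigma = [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / (n - 1)
              for b in range(k)] for a in range(k)]
    return mu, sigma


def encode(mu, sigma):
    # Assumed wire format: little-endian float64, mu then sigma row by row.
    flat = list(mu) + [v for row in sigma for v in row]
    return struct.pack(f"<{len(flat)}d", *flat)


def payload_size(n_rows, k=16, seed=0):
    """Encoded fingerprint size for a synthetic n_rows x k dataset."""
    rng = random.Random(seed)
    rows = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_rows)]
    return len(encode(*fingerprint(rows)))
```

Calling `payload_size(50)` and `payload_size(2000)` returns the same 2176 bytes: a 40-fold increase in rows leaves the transmitted fingerprint unchanged, which is the $O(1)$ behaviour contrasted with $O(N)$ in-band transfer.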
Table 4. Ablation analysis isolating the impact of specific architectural boundaries. Removing the storage–consensus boundary causes throughput collapse, while bypassing the intent–routing boundary breaches data minimisation mandates.
| Model Variant | Ablated Component (Removed) | Throughput (TPS) | Latency (ms) | Payload |
|---|---|---|---|---|
| w/o Storage–Consensus | Dual-Plane Topology | <10 | >12,000 | 5.8 KB |
| w/o Intent–Routing | ODAR and DHT Routing | 150 | 4500 | ≥10 MB |
| Full TDSR | None (All Boundaries Active) | 2400 | 3820 | 5.8 KB |
Table 5. Comparison of architectural properties and operational boundaries between foundational distributed paradigms and the proposed TDSR.
| Evaluation Metric | Coupled Ledger (Baseline A) | In-Band Federated (Baseline B) | TDSR (Ours) |
|---|---|---|---|
| Cross-Domain Routing | State-embedded resolution, global intent exposed | Application gateways, relies on centralised connectors | Oblivious UDRI resolution via IPFS DHT, intent hidden |
| Dimensionality Invariance | Severe throughput degradation, payload-coupled consensus | Unbounded; dependent on gateway capacity | Sustains constant ∼2400 TPS across varying scales |
| Trans-Border Leakage | Replicated across all consensus nodes | Scales linearly with queried data volume, $O(N)$ | Bounded constant payload, $O(1)$ (≤5.8 KB per query) |
| Verification Execution | High on-chain compute cost for multidimensional matrices | Dependent on bilateral data sharing agreements | Isolated local sandbox execution; algorithm-agnostic |
| Network Resilience | State bloating under high-frequency registration | Gateway bottlenecks under high concurrency | BFT consensus ($f < n/3$) with multi-node DHT pinning |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Yang, X.; Xie, J.; Deng, W.; Zhang, C.; Ren, J.; Liu, S.; Lei, W.I.; Wang, W.; Wang, W. TDSR: Distributed Data Asset Registration and Cross-Jurisdictional Verification in Trusted Data Spaces. Electronics 2026, 15, 2079. https://doi.org/10.3390/electronics15102079

AMA Style

Yang X, Xie J, Deng W, Zhang C, Ren J, Liu S, Lei WI, Wang W, Wang W. TDSR: Distributed Data Asset Registration and Cross-Jurisdictional Verification in Trusted Data Spaces. Electronics. 2026; 15(10):2079. https://doi.org/10.3390/electronics15102079

Chicago/Turabian Style

Yang, Xingxing, Jieling Xie, Weiping Deng, Chi Zhang, Junqi Ren, Shuang Liu, Wai Ip Lei, Wei Wang, and Wenyong Wang. 2026. "TDSR: Distributed Data Asset Registration and Cross-Jurisdictional Verification in Trusted Data Spaces" Electronics 15, no. 10: 2079. https://doi.org/10.3390/electronics15102079

APA Style

Yang, X., Xie, J., Deng, W., Zhang, C., Ren, J., Liu, S., Lei, W. I., Wang, W., & Wang, W. (2026). TDSR: Distributed Data Asset Registration and Cross-Jurisdictional Verification in Trusted Data Spaces. Electronics, 15(10), 2079. https://doi.org/10.3390/electronics15102079

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
