1. Introduction
The digital economy relies on the circulation of data as a factor of production. The exchange of data resources involves the confirmation, registration, and cross-domain verification of data asset ownership. Trusted Data Spaces support this circulation: they are distributed, sovereignty-preserving infrastructures that enable controlled data sharing across organisational and jurisdictional boundaries. Providers retain control over their datasets, such as Domain Name System (DNS) recursive logs [1,2,3,4]. Trans-border data mobility requires a defined operational lifecycle. A data provider node extracts a fingerprint from a source asset D to secure a cryptographic registration anchor. A transaction then transfers a subset d to a consumer at another node, and the receiving node initiates a provenance verification. The verification computes a Boolean decision on whether d originates from D. Proving this condition ensures the traceability of ownership. Data minimisation mandates impose operational constraints: the verification sequence must execute without exposing the source asset D to trans-border network transmission, while the system maintains a consensus state across untrusted domains. Cross-jurisdictional verification therefore exposes a structural conflict between ownership provenance prerequisites and data minimisation mandates.
Existing frameworks are unable to resolve this conflict. The primary issue is the in-band coupling of large data payloads with synchronous state consensus ledgers: feature matrices are replicated across all consensus nodes, so communication overheads scale linearly with data volume. Federated models manage data exchange via application gateways, where payload data and routing instructions traverse identical network channels [5]. This configuration exposes the verification intent to the host node, and information leakage across jurisdictional boundaries scales linearly with the volume of the queried data. Blockchain technologies provide state registration, but existing designs couple state consensus with data storage. Consensus nodes replicate feature matrices to maintain the ledger, which consumes network bandwidth and makes transaction throughput dependent on data dimensionality. Identification protocols address routing requirements. Decentralised identifiers (DIDs) authenticate cryptographic keys, but standard formats omit metadata regarding jurisdictional boundaries and physical storage planes [6], so resolution involves centralised infrastructure or bilateral legal agreements. Verification mechanisms govern the proof of provenance. Cryptographic protocols involve interactive parameter exchanges between nodes, which generates latency across networks, and distributed storage protocols lack state transition mechanisms to prevent double-registration [7]. A practical system must balance the requirements of data volume, authentication, and minimisation.
We propose the Trusted Data Space with Registration (TDSR), which implements a four-layer protocol stack for cross-jurisdictional data circulation, defined by three isolation boundaries: privacy–legal, intent–routing, and storage–consensus. Unlike existing Trusted Data Spaces that rely on in-band payload routing through centralised connectors, TDSR achieves constant-bounded leakage and scalability independent of data dimensionality. The infrastructure layer employs a dual-plane topology that partitions the physical network: a data plane executes asynchronous payload distribution, and a trust plane executes synchronous state consensus [8,9]. The Unified Data Resource Identifier (UDRI) protocol binds the cross-layer operations by acting as a pointer that encapsulates the infrastructure target, jurisdictional domain, registrar node, and ownership fingerprint [10,11]. The protocol stack executes the lifecycle driven by the Oblivious Data Asset Registration (ODAR) mechanism. The ODAR protocol coordinates out-of-band identity discovery, and verification shifts to local execution environments. The architecture supports fingerprint representations regardless of the extraction method. The proposal enforces a mathematical bound on data leakage during trans-border data circulation [12,13]. With a median response time of 3.82 s, TDSR is positioned primarily for ownership audits and regulatory compliance verification rather than real-time streaming applications.
The content and contributions of this research are:
Architecture for TDSR: The architecture establishes isolation boundaries. The dual-plane topology establishes a decoupled storage–ledger mechanism, partitioning asynchronous payload datastores and synchronous consensus ledgers. This physical separation prevents ledger saturation, sustaining throughput independent of data dimensionality.
UDRI pointer and ODAR protocol: The UDRI pointer and the ODAR protocol standardise asset resolution. This combination enables out-of-band cross-domain routing without exposing verifier intent, bypassing the transmission scaling observed in in-band systems.
Interface contract and leakage bound: The architecture defines an algorithm-agnostic mathematical contract at the feature extraction interface. This operational sequence shifts hypothesis testing to isolated sandboxes, capping external data transit at a constant leakage bound.
The remainder of this paper is structured as follows.
Section 2 reviews related work in data spaces and data asset registration.
Section 3 defines the system model and formulation.
Section 4 details the TDSR architecture, with the dual-plane mechanism, UDRI specification, and algorithmic executions.
Section 5 evaluates system performance based on the Greater Bay Area (GBA) deployment.
Section 6 discusses advantages and limitations.
Section 7 provides the conclusion and outlines future work.
3. Theoretical Formulation and System Model
Trans-border data registration and verification present a multidimensional optimisation problem constrained by network throughput, regulatory compliance, and adversarial threats. The model operates under a Dolev–Yao communication model and a Byzantine fault tolerance threshold, excluding hardware side-channels and volumetric Distributed Denial of Service attacks. This section establishes the theoretical formulation for the TDSR architecture. First, we formulate the topological model to resolve the consensus bottleneck, partitioning the network graph into a synchronous trust plane and an asynchronous data plane. Second, we define the cross-layer resolution protocol, where a mapping function establishes deterministic addressing across jurisdictional domains without central authorities. Third, we formalise the statistical provenance algorithm as a multivariate hypothesis test, proving that the operational execution confines trans-border information leakage to a constant bound. These theoretical formulations establish the system prerequisites, dictating the structural isolation boundaries engineered in the protocol stack.
Table 1 details the formal notations.
3.1. Dual-Plane Topology Formulation
We model the multi-jurisdictional Trusted Data Space as a distributed network graph whose vertex set comprises the participating nodes across divergent legal domains and whose edge set comprises the inter-jurisdictional communication links. Within a unified architecture, data asset registration relies on State Machine Replication via BFT consensus [8]. The communication complexity required to reach finality for a single registration block depends on both the number of consensus nodes and the payload volume of the extracted fingerprint. As the feature dimensionality k of the data asset expands, the size of the fingerprint matrices grows, causing this consensus traffic to saturate the network bandwidth and degrade finality latency. The infrastructure must therefore decouple state consensus from payload storage [48]. We formalise this decoupling by partitioning the network graph into two isolated planes, a trust plane and a data plane. The trust plane functions as a synchronous ledger processing constant-size cryptographic pointers, so its consensus complexity no longer depends on the payload volume. The data plane operates as an asynchronous DHT dedicated to absorbing the variable-size payload. This topological separation manifests as the Layer 1 Infrastructure boundary in the protocol stack.
This dual-plane formulation provides the physical foundation for the isolation boundaries. By decoupling state consensus from payload storage, the topology ensures that the fingerprints generated at the feature extraction interface are the only assets entering the synchronous control plane, maintaining the structural separation of concerns.
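The effect of the decoupling on ledger replication can be illustrated with a minimal sketch. It assumes 64-bit fingerprint entries, a 128-byte pointer, and a simple model in which every consensus node stores one copy of whatever is anchored; these values and function names are illustrative assumptions, and the model deliberately ignores the message complexity of the consensus protocol itself.

```python
FLOAT_BYTES = 8  # assumed width of one fingerprint entry (illustrative)


def fingerprint_bytes(k: int) -> int:
    """Size of a k-entry mean vector plus a k x k covariance matrix."""
    return FLOAT_BYTES * (k + k * k)


def unified_ledger_bytes(n_nodes: int, k: int) -> int:
    """Bytes replicated when every consensus node stores the full fingerprint."""
    return n_nodes * fingerprint_bytes(k)


def dual_plane_ledger_bytes(n_nodes: int, pointer_bytes: int = 128) -> int:
    """Bytes replicated when consensus nodes store only a fixed-size pointer."""
    return n_nodes * pointer_bytes


if __name__ == "__main__":
    for k in (8, 64, 512):
        print(k, unified_ledger_bytes(7, k), dual_plane_ledger_bytes(7))
```

Under this model the unified ledger grows quadratically with k, while the dual-plane ledger load does not depend on k at all.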
Adversarial model: The system operates under the Dolev–Yao model with Byzantine fault tolerance. A malicious verifier cannot learn the raw data, and a malicious provider or intermediate node cannot learn the verifier’s intent, because only the UDRI hash is queried over the DHT. The leakage bound restricts information leakage to the statistical manifold. Sybil attacks are mitigated by the Cryptographically Generated Identifier mechanism, which requires regulatory credentials.
Definition of “oblivious” in TDSR: In this work, the term “oblivious” refers to intent concealment: during a UDRI lookup over the IPFS DHT, the querier retrieves the fingerprint payload by its content hash without revealing to the data provider or intermediate nodes which subset d is being verified or for what purpose. This is achieved through the standard content-addressed retrieval of the IPFS DHT, not via cryptographic primitives such as Oblivious Transfer or Oblivious RAM. The out-of-band design further ensures that the provider node remains unaware of any verification activity initiated by the verifier.
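Content-addressed retrieval in this sense can be sketched with a toy in-memory store. The `ContentStore` class below is a hypothetical stand-in, not the IPFS API, and it uses a plain SHA-256 hex digest where IPFS uses multihash-encoded content identifiers. The point is that a lookup carries only the hash of the fingerprint, never the subset under verification.

```python
import hashlib


class ContentStore:
    """Toy stand-in for a content-addressed data plane (not the IPFS API)."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, payload: bytes) -> str:
        """Store a payload and return its content hash."""
        cid = hashlib.sha256(payload).hexdigest()
        self._blocks[cid] = payload
        return cid

    def get(self, cid: str) -> bytes:
        """Fetch by hash only; the request carries no subset or purpose."""
        payload = self._blocks[cid]
        if hashlib.sha256(payload).hexdigest() != cid:
            raise ValueError("payload does not match its content hash")
        return payload


store = ContentStore()
fingerprint = b'{"mu": [0.1, 0.2], "sigma": [[1.0, 0.0], [0.0, 1.0]]}'
cid = store.put(fingerprint)
```

The verifier can also check integrity locally, since the retrieved bytes must hash back to the identifier it asked for.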
3.2. Hierarchical Architecture with Pointer and Resolution
To enable deterministic resolution across the decoupled trust and data planes without relying on centralised identification systems [10], we define the UDRI protocol. The resolution mechanism is a cross-layer mapping function from the universe of all valid UDRI strings to a pair consisting of the verifiable compliance credential anchored on the trust plane and the content payload addressed on the data plane. The input UDRI string is formulated as a 4-tuple:
$$\mathrm{UDRI} = \langle \mathit{Infra\_ID},\ \mathit{Domain\_ID},\ \mathit{Node\_ID},\ \mathit{Cont\_ID} \rangle$$
By executing this mapping, a verifier node in an external jurisdiction authenticates the identity of the provider via the trust plane and retrieves the statistical parameters from the data plane. This mapping dictates the four-layer protocol stack hierarchy.
The UDRI protocol acts as a persistent pointer that resolves across the hierarchical layers. In our methodology, the resolution process terminates at the feature extraction interface, where it triggers the respective representation or verification functions. This ensures that the pointer resolves to a verifiable functional interface rather than a raw data locator [10].
3.3. Statistical Provenance and Hypothesis Testing Formulation
To execute the registration and verification lifecycle, the architecture must prove that a purported data subset d held by the verifier originates from the registered source asset D held by the provider. We formulate this as a multivariate statistical hypothesis testing problem.
Feature extraction: Let the source asset D be represented by a multivariate distribution. The provider applies an extraction function to generate the fingerprint $\Omega = (\mu, \Sigma)$, where $\mu$ is the mean vector and $\Sigma$ is the covariance matrix of dimension $k \times k$. This tuple is encapsulated into the content payload addressed by the UDRI.
Hypothesis testing: The verifier establishes the null hypothesis $H_0$ that d originates from D. Upon retrieving $\Omega$ via the UDRI pointer, the isolated sandbox at the verifier computes the sample mean $\bar{x}_d$ of the local subset d. The distance $D_M(t)$ at time t is calculated using the time-dependent mean vector $\mu_t$ and the covariance matrix $\Sigma_t$:
$$D_M^2(t) = (\bar{x}_d - \mu_t)^{\top}\, \Sigma_t^{-1}\, (\bar{x}_d - \mu_t)$$
For clarity in the operational phase, let $D_M$ denote this instantaneous distance under the fixed statistical manifold $\Omega$ retrieved via the UDRI pointer. The system defines a threshold $\tau$ based on a predefined significance level $\alpha$ and degrees of freedom k, where $\tau = \chi^2_{k,\,1-\alpha}$. The isolated sandbox then computes the deterministic Boolean decision $\delta$:
$$\delta = \begin{cases} 1, & D_M^2 \le \tau \\ 0, & \text{otherwise} \end{cases}$$
The outcome validates the cross-domain provenance constraint.
The hypothesis testing framework established here constitutes the mathematical contract for feature extraction interfaces. By formalising verification as a statistical distance threshold $\tau$, the methodology allows the verification interface to remain implementation-agnostic, supporting diverse representation algorithms as long as they satisfy the established provenance consistency.
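The sandbox computation can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions rather than the deployed implementation: it uses the squared Mahalanobis distance of the subset's sample mean, fixes k = 2 so that the chi-square quantile has the exact closed form -2 ln(alpha), and all function names are hypothetical.

```python
import math


def sample_mean(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mahalanobis_sq(x, mu, sigma):
    """Squared Mahalanobis distance (x - mu)^T sigma^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(sigma, diff)  # y = sigma^{-1} diff, without forming the inverse
    return sum(d * v for d, v in zip(diff, y))


def chi2_threshold_k2(alpha):
    """Exact (1 - alpha) quantile of chi-square with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


def verify(subset, mu, sigma, alpha=0.05):
    """Boolean provenance decision for a k = 2 fingerprint (mu, sigma)."""
    if len(mu) != 2:
        raise ValueError("this sketch only covers k = 2")
    return mahalanobis_sq(sample_mean(subset), mu, sigma) <= chi2_threshold_k2(alpha)
```

For general k the threshold would come from a chi-square quantile routine in a statistics library; only the fingerprint and the local subset are needed, so the raw source asset is never consulted.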
3.4. Trans-Border Information Leakage Bound
To satisfy multi-jurisdictional data minimisation mandates, the system enforces an upper bound on trans-border information leakage [40]. Let $\mathcal{L}$ denote the total volume of data crossing the legal boundary during a complete verification sequence.
Theorem 1. The trans-border communication complexity of the TDSR verification lifecycle is bounded by a constant with respect to the raw source asset volume |D| and the queried subset volume |d|.
Proof. Let the verifier query the provenance of d against D. The verification is executed within the local sandbox of the verifier. The cross-domain payload exchange is restricted to the transmission of the UDRI pointer request and the retrieval of the fingerprint payload $\Omega$ from the data plane. The total trans-border leakage function is defined as:
$$\mathcal{L} = |\mathrm{UDRI}| + |\Omega| + |\delta|$$
The UDRI is a fixed-size string, and the Boolean decision $\delta$ requires 1 bit. The feature dimensionality k is a fixed parameter, predefined and constant across all nodes. The size of $\Omega$ is determined by the k-dimensional mean vector $\mu$ and the $k \times k$ covariance matrix $\Sigma$; it depends only on k and is independent of the cardinality of either the subset d or the source D. For any established feature extraction model, the payload is therefore a structural constant $c_k$:
$$\mathcal{L} = |\mathrm{UDRI}| + c_k + 1 = \mathcal{O}(1)$$
This proof demonstrates that the architecture guarantees data minimisation. To achieve this constant bound without relying on interactive Multi-Party Computation [13], the architecture must execute the distance calculation locally at the verifier. This requirement mandates the isolation boundary in the architecture. □
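As a worked instance of the constant bound, assume 64-bit fingerprint entries, a 128-byte UDRI string, and k = 16. These values are illustrative assumptions rather than deployment parameters, and $\mathcal{L}$ here denotes the total trans-border volume in bits:

```latex
\begin{align*}
|\Omega| &= 64\,(k + k^{2}) = 64 \times (16 + 256) = 17{,}408 \text{ bits} \\
\mathcal{L} &= \underbrace{8 \times 128}_{|\mathrm{UDRI}|}
             + \underbrace{17{,}408}_{|\Omega|}
             + \underbrace{1}_{\text{decision bit}}
             = 18{,}433 \text{ bits}
\end{align*}
```

Neither |D| nor |d| appears in either line, so the total is unchanged whether the source asset holds a thousand records or a billion.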
These formulations define the operational limits for cross-domain data exchange. The required consensus complexity, cross-layer mapping, and communication bound establish the structural prerequisites. These prerequisites will support the four-layer architecture proposal with isolation boundaries and key components.
This information leakage bound is enforced by the privacy–legal boundary. Since the architecture restricts network inputs to low-dimensional statistical representations, the mutual information between the raw dataset and the trans-border traffic is kept at a constant complexity, independent of the raw dataset’s cardinality n [39].
The verification process operates within a bounded two-phase commit protocol. Time constraints are introduced to ensure system liveness. Let $t_{\mathrm{elapsed}}$ represent the time elapsed since the verification request was initiated, and let the threshold $T_{\max}$ define the maximum time allowed for consensus. If $t_{\mathrm{elapsed}} > T_{\max}$, the transaction is aborted to prevent network congestion. The system state machine enforces these temporal bounds to maintain liveness and to trigger the defined rollback sequence upon timeout.
3.5. State Transition Model for Operational Workflow
To provide a theoretical foundation for the end-to-end execution lifecycle, the trans-border registration and verification process is operationalised as a sequential state machine comprising a set of system states, a set of protocol events, and a transition function. The execution integrates temporal boundary checks to handle the non-deterministic latency inherent in wide-area networks.
The execution lifecycle is partitioned into two non-overlapping phases, enforcing the isolation boundaries:
1. Registration phase: Maps an unregistered raw asset state to an anchored state via the extraction and consensus events. The transition is conditioned upon consensus finality. If the trust plane confirms the anchor within the predefined temporal bound, the system transitions to the anchored state. Otherwise, the rollback function prunes the local datastore to ensure state integrity. This phase guarantees that only the pointer metadata enters the synchronous ledger state.
2. Verification phase: Maps a query event originating from an external jurisdiction to a definitive Boolean provenance decision, conditioned upon the anchored state.
This formal state separation ensures that the verification transition operates on the retrieved fingerprint payload alone, removing the need to query the raw asset state. This guarantees the structural integrity of the workflow and enforces the trans-border information leakage bound established in Theorem 1.
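The two phases and the timeout rollback can be sketched as a small state machine. The state names `PENDING` and `ROLLED_BACK`, the `Registration` class, and the `t_max` parameter are hypothetical labels introduced for illustration; the sketch only mirrors the transitions described in the text.

```python
from enum import Enum, auto


class State(Enum):
    RAW = auto()          # unregistered raw asset
    PENDING = auto()      # fingerprint pinned, anchor awaiting consensus
    ANCHORED = auto()     # pointer confirmed on the trust plane
    ROLLED_BACK = auto()  # consensus failed or timed out


class Registration:
    def __init__(self, t_max: float) -> None:
        self.t_max = t_max    # maximum time allowed for consensus
        self.state = State.RAW
        self.pinned = False   # whether the payload sits in the local datastore

    def extract(self) -> None:
        """Extraction event: pin the fingerprint and await consensus."""
        if self.state is not State.RAW:
            raise RuntimeError("extraction requires a raw asset")
        self.pinned = True
        self.state = State.PENDING

    def on_consensus(self, elapsed: float, confirmed: bool) -> State:
        """Consensus event: anchor within the bound, otherwise roll back."""
        if self.state is not State.PENDING:
            raise RuntimeError("no registration is pending")
        if confirmed and elapsed <= self.t_max:
            self.state = State.ANCHORED
        else:
            self.pinned = False  # rollback prunes the local datastore
            self.state = State.ROLLED_BACK
        return self.state


def verify(registration: Registration, decision_fn, subset) -> bool:
    """Verification phase: only defined on an anchored asset."""
    if registration.state is not State.ANCHORED:
        raise RuntimeError("verification requires an anchored asset")
    return bool(decision_fn(subset))
```

Because `verify` refuses to run unless the asset is anchored, the verification phase cannot overlap with a registration that is still pending or has been rolled back.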
3.6. Security Analysis
Verification of the ODAR protocol applies Burrows–Abadi–Needham (BAN) logic to establish the formal belief of the verifier in the authenticity and freshness of the fingerprint Ω from the provider. Let $P \mid\equiv X$ (P believes X), $P \triangleleft X$ (P sees X), $P \mid\sim X$ (P once said X), $P \Rightarrow X$ (P has jurisdiction over X), and $\#(X)$ (X is fresh). The interaction is idealised as a singular message transfer via the UDRI pointer, in which the verifier receives the signed fingerprint payload of the provider.
The verification targets two terminal goals: Goal 1, the verifier believes that the provider believes Ω; and Goal 2, the verifier believes Ω.
The derivation relies on three initial assumptions mapped to the theoretical boundaries defined in prior sections:
Assumption 1. The cross-domain identity resolution mechanism enforces that the signing key belongs to the provider.
Assumption 2. The temporal boundary during the state transition consensus ensures that Ω is fresh.
Assumption 3. The feature extraction interface, which isolates raw datasets from external queries, establishes that the provider has jurisdiction over its registered asset Ω.
Step 1, message meaning: Given that the verifier holds Assumption 1 and sees the signed payload, the verifier believes that the provider once said Ω.
Step 2, freshness promotion: Applying the freshness rule to Assumption 2, the verifier believes that Ω is fresh.
Step 3, nonce verification: Combining freshness and origin establishes belief in the provider’s current state, that is, the verifier believes that the provider believes Ω (Goal 1).
Step 4, jurisdiction application: Applying Assumption 3 over the belief from Step 3 confirms that the verifier believes Ω (Goal 2).
The deduction demonstrates that the verifier confirms ownership provenance while maintaining the minimisation constraints of the architecture.
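The four steps correspond to the standard BAN inference rules. The derivation can be written out as follows, where the symbols $N_v$ (verifier), $N_p$ (provider), and $K_p$ (the provider's public key) are assumed names introduced for this sketch, and Ω is the registered fingerprint:

```latex
\begin{align*}
\text{Message:}\quad & N_p \to N_v : \{\Omega\}_{K_p^{-1}} \\[4pt]
\text{Step 1 (message meaning):}\quad &
  \frac{N_v \mid\equiv\ \stackrel{K_p}{\mapsto} N_p, \qquad N_v \triangleleft \{\Omega\}_{K_p^{-1}}}
       {N_v \mid\equiv N_p \mid\sim \Omega} \\[4pt]
\text{Step 2 (freshness):}\quad & N_v \mid\equiv \#(\Omega) \\[4pt]
\text{Step 3 (nonce verification):}\quad &
  \frac{N_v \mid\equiv \#(\Omega), \qquad N_v \mid\equiv N_p \mid\sim \Omega}
       {N_v \mid\equiv N_p \mid\equiv \Omega} \\[4pt]
\text{Step 4 (jurisdiction):}\quad &
  \frac{N_v \mid\equiv N_p \Rightarrow \Omega, \qquad N_v \mid\equiv N_p \mid\equiv \Omega}
       {N_v \mid\equiv \Omega}
\end{align*}
```

The conclusion of Step 3 is Goal 1 and the conclusion of Step 4 is Goal 2; each premise above the line is either one of the three assumptions or the conclusion of an earlier step.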
4. The Proposed TDSR Architecture
We operationalise the methodology to construct the TDSR architecture. To specify the hierarchical architecture, we isolate data processing, identity resolution, and state consensus. We propose the protocol stack, formalise the identification, design the dual-plane network topology and define the decoupled storage and ledger mechanism. We demonstrate the detailed execution algorithms for distributed data registration and cross-domain verification.
4.1. Design Principles
The development of the TDSR architecture is guided by three fundamental principles to resolve the tension between data sovereignty and distributed scalability:
1. Principle 1: data minimisation. The protocol stack enforces minimality by restricting trans-border communication to constant-size pointers and fingerprints, ensuring the mutual information between the source data and network traffic is bounded.
2. Principle 2: decoupled representation. The system isolates the feature extraction logic from the state consensus layer. This decoupling ensures that algorithmic complexity at Layer 4 does not impact the throughput or finality of the trust plane at Layer 1.
3. Principle 3: cross-domain interoperability. By utilising the UDRI pointer and a dual-plane topology, the architecture enables deterministic asset resolution across heterogeneous jurisdictional boundaries without requiring a centralised certificate authority.
4.2. The Four-Layer Architecture and Isolation Boundaries
TDSR is structured into a four-layer architecture, operationalising the functional separation of data processing, identity resolution, and state consensus. This hierarchical abstraction partitions the protocol stack into discrete layers (L1–L4) to enforce the isolation boundaries. As detailed in the integrated architecture in Figure 1, the UDRI segments resolve to specific layers, while the infrastructure layer maintains the structural decoupling between the data payload datastore and the state consensus ledger.
4.2.1. L1: Infrastructure Layer
This base layer manages the physical network connections, the TCP/IP stack, and the cryptographic operations. It separates the network traffic into a data plane and a trust plane to decouple payload resolution from state consensus. It acts as the entry point for the cross-domain infrastructure queries.
4.2.2. L2: Cross-Domain Interoperability Layer
This middleware layer implements out-of-band identity discovery to support multi-jurisdictional verification. It executes the ODAR protocol to query the DHT [28,29]. It parses the jurisdictional boundaries to route requests before they reach the specific host endpoints.
4.2.3. L3: Registration Node Layer
This layer hosts the local registries and executes the core verification logic. It manages the local datastores and coordinates the two-phase commit sequence for asset registration [38,39]. During verification, this layer provisions the isolated execution sandboxes to perform the local hypothesis tests.
4.2.4. L4: Feature Extraction Interface Layer
This layer interacts with the raw data assets, such as DNS recursive logs [13]. It abstracts the feature extraction algorithms as a black-box input module to generate the fingerprint templates. It acts as the final resolution target for the content identifiers.
L4 establishes the mathematical and operational contract between raw data repositories and the control plane ledgers. By formalising this interface contract, TDSR treats feature extraction as a pluggable black-box, operating independently of the representation algorithm’s internal complexity. To enable trust evaluation without data exposure, L4 maps data assets to standard verification interfaces through the following core mechanisms:
Fingerprint Interface
The representation interface abstracts the underlying feature extraction process. Let D denote the raw dataset residing in the data plane. The extraction function maps D to a low-dimensional statistical representation, or fingerprint, which captures the underlying distribution of the data without retaining raw records. To support downstream compliance checks, the extraction function must satisfy the divisibility property, ensuring that any legitimate subset of D maintains a verifiable statistical relationship with the global fingerprint. This interface generates the structural payload that the UDRI protocol points to, standardising the metadata format for the control plane.
Verification Interface
The verification interface V provides a standard mechanism to authenticate data subsets across domain boundaries. Given a questioned subset d and the registered global fingerprint, the interface executes a Boolean or probabilistic evaluation. This operation is executed within isolated sandbox environments in the application layer. By confining statistical distance computations such as the Mahalanobis distance to this interface, the architecture prevents the leakage of global parameters to external queriers.
The algorithm-agnostic contract is enforced solely through the structural output format: any implementation of the extraction function must produce a statistical manifold consisting of a mean vector and a covariance matrix, where the dimensionality k is fixed by the consortium. The verification interface V consumes this fingerprint and a subset d, and outputs a decision without querying raw data. For example, a hash-based extractor can produce the fingerprint by calculating the mean and covariance of byte-trigram count vectors from network logs; an ML embedding extractor can replace this with the mean and covariance of the penultimate-layer activations. Both conform to the same interface, leaving the consensus and routing layers unaffected.
Decoupling Design
The structural separation of L4 establishes a decoupled registration mechanism. The TDSR architecture treats the extraction function and the verification function V as pluggable modules. The control plane orchestrates the routing, consensus, and UDRI resolution without requiring awareness of their specific mathematical implementations [48]. This protocol decoupling ensures that systemic security and cross-jurisdictional standardisation are maintained regardless of the chosen feature extraction method [33].
To ensure architectural modularity, we define the verifiable data asset interface as a functional contract that abstracts the interaction between the data plane and the control plane:
Extraction interface: A transformation function that compresses a high-dimensional dataset D into a constant-size fingerprint. This interface ensures that only privacy-preserving metadata is exported.
Verification interface V: An evaluation function that consumes a subset d and the anchored fingerprint to output a Boolean provenance decision.
By formalising this interface, the TDSR decouples the specific feature engineering logic from the network’s consensus and routing protocols, allowing for independent algorithmic upgrades.
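The pluggable contract can be sketched as a structural interface. The `Extractor` protocol, the two extractor classes, and the `conforms` check below are hypothetical names introduced for illustration; the hashed byte-trigram bucketing is one possible reading of the hash-based extractor, and the embedding function is supplied by the caller rather than taken from any particular model.

```python
from typing import Callable, List, Protocol, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]
Fingerprint = Tuple[Vector, Matrix]  # (mean vector, covariance matrix)


class Extractor(Protocol):
    def extract(self, records: Sequence[bytes]) -> Fingerprint: ...


def mean_and_covariance(rows: Sequence[Vector]) -> Fingerprint:
    """Sample mean and unbiased sample covariance of the feature rows."""
    n = len(rows)
    if n < 2:
        raise ValueError("need at least two records")
    k = len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(k)]
    sigma = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
              for j in range(k)] for i in range(k)]
    return mu, sigma


class HashedTrigramExtractor:
    """Counts byte trigrams per record, folded into k buckets."""

    def __init__(self, k: int) -> None:
        self.k = k

    def extract(self, records: Sequence[bytes]) -> Fingerprint:
        rows = []
        for rec in records:
            counts = [0.0] * self.k
            for i in range(len(rec) - 2):
                bucket = int.from_bytes(rec[i:i + 3], "big") % self.k
                counts[bucket] += 1.0
            rows.append(counts)
        return mean_and_covariance(rows)


class EmbeddingExtractor:
    """Wraps any caller-supplied embedding function of fixed output size."""

    def __init__(self, embed: Callable[[bytes], Vector]) -> None:
        self.embed = embed

    def extract(self, records: Sequence[bytes]) -> Fingerprint:
        return mean_and_covariance([self.embed(r) for r in records])


def conforms(fingerprint: Fingerprint, k: int) -> bool:
    """Structural contract: a k-vector and a k x k matrix, nothing more."""
    mu, sigma = fingerprint
    return len(mu) == k and len(sigma) == k and all(len(row) == k for row in sigma)
```

Either extractor can be swapped in without touching the routing or consensus code, since downstream components only rely on the shape checked by `conforms`.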
4.2.5. Stratification Principles and Isolation Boundaries
This stratification is a structural necessity enforced by three constraints in cross-jurisdictional data circulation: data minimisation mandates, multi-domain trust mapping, and consensus ledger scalability [12]. These constraints define three isolation boundaries as follows:
Privacy–legal boundary (L4 to L3): Data protection mandates prohibit raw asset transmission [40]. L4 restricts network inputs to fingerprints. Raw data remains within the local jurisdiction.
Intent–routing boundary (L3 to L2): Identity resolution is decoupled from local computation [29]. L2 manages cross-domain routing. L3 executes the logic within isolated sandboxes.
Storage–consensus boundary (L2/L3 to L1): High-volume payloads are separated from state ordering [48]. L1 implements a dual-plane topology. Asynchronous datastores are isolated from synchronous consensus.
4.3. UDRI Identification and Protocol Stack Mapping
The UDRI functions as a cryptographic pointer that encapsulates the physical resolution path, the legal jurisdiction, the node identity, and the data ownership asset into a single string. We define the formal syntax as follows:
udri:<Infra_ID>:<Domain_ID>:<Node_ID>:<Cont_ID>
The interoperability middleware parses this UDRI to execute cross-domain discovery. The identifier resolves the logical node identity via the BFT ledger and locates the physical data payload via the IPFS DHT [34].
This is an example UDRI string: udri:gba:hk:did:0x1a2b3c:Qmd…. Here, gba identifies the Greater Bay Area consortium infrastructure, hk specifies the legal jurisdiction, did:0x1a2b3c resolves the registrar node via the BFT ledger, and Qmd… is the Content Identifier for the fingerprint payload in the IPFS DHT. This structure enables oblivious routing because the verifier only performs a DHT lookup on the cryptographic hash of the Cont_ID component without revealing its intent or source IP address.
Governance of top-level identifiers: In the current GBA deployment, <Infra_ID> and <Domain_ID> are established through consortium governance: participating jurisdictions negotiate and sign a genesis configuration block that enumerates valid infrastructures and legal domains. Uniqueness is enforced by the BFT trust plane, which rejects duplicate registrations. For inter-consortium resolution across independent TDSR instances, a federation of registrars or existing international legal frameworks would serve as trust anchors, though full decentralisation of this top-level governance remains an open problem.
4.3.1. Infrastructure Identifier (<Infra_ID>)
The <Infra_ID> targets the specific BFT consortium network and IPFS swarm. It directs the middleware gRPC calls to the correct trust plane endpoint when multiple independent consortia operate within the same wide-area network.
4.3.2. Domain Identifier (<Domain_ID>)
The <Domain_ID> specifies the legal jurisdiction of the host node, for example, hk for Hong Kong or gd for Guangdong. It allows the verifier to audit the compliance boundary before initiating cross-border payload retrieval.
4.3.3. Node Identifier (<Node_ID>)
The <Node_ID> maps to the DID of the host node. The BFT ledger resolves this segment to retrieve the DID document and the associated VC to authenticate the node’s regulatory status.
4.3.4. Content Identifier (<Cont_ID>)
The <Cont_ID> corresponds to the hash identifier of the registered data asset. The middleware passes this segment to the data plane to initiate the oblivious DHT resolution for the fingerprint payload.
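Splitting a UDRI string into its four segments can be sketched as follows. The parser assumes the colon-separated layout of the example string, in which the node segment may itself contain a colon (as in did:0x1a2b3c) and the content identifier is the final field. The `UDRI` dataclass, the `parse_udri` function, and the placeholder identifier `QmExampleCid` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UDRI:
    infra_id: str
    domain_id: str
    node_id: str
    cont_id: str


def parse_udri(text: str) -> UDRI:
    """Split 'udri:<infra>:<domain>:<node>:<cont>' into its four segments."""
    parts = text.split(":")
    if len(parts) < 5 or parts[0] != "udri":
        raise ValueError("not a UDRI string: " + repr(text))
    infra_id, domain_id = parts[1], parts[2]
    node_id = ":".join(parts[3:-1])  # the node segment may contain ':'
    cont_id = parts[-1]              # the content identifier is always last
    if not all((infra_id, domain_id, node_id, cont_id)):
        raise ValueError("empty UDRI segment in: " + repr(text))
    return UDRI(infra_id, domain_id, node_id, cont_id)


# 'QmExampleCid' is a placeholder, not a real content identifier.
example = parse_udri("udri:gba:hk:did:0x1a2b3c:QmExampleCid")
```

With the segments separated, the middleware can hand `node_id` to the trust plane and `cont_id` to the data plane independently.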
4.3.5. The Four-Layer Architecture Protocol Stack
Each layer executes defined protocols and processes Protocol Data Units. The UDRI string functions as the cross-layer binding agent [10,11]. Table 2 details the executed protocols at each operational layer and maps them to their corresponding UDRI resolution segments.
The protocol stack dictates a data processing sequence. During registration, L4 protocols generate the payload, L3 protocols coordinate the anchoring state, and L1 protocols execute the physical distribution. During verification, L1 retrieves the network state, L2 protocols authenticate the cross-domain credentials, and L3 protocols execute the hypothesis test. Each layer resolves a specific segment of the UDRI 4-tuple [34].
L1: Infrastructure layer: Resolves the <Infra_ID>. It manages physical connections and separates traffic into the trust and data planes [49].
L2: Cross-domain interoperability layer: Resolves the <Domain_ID>. It implements out-of-band identity discovery and executes decentralised credential authentication [33].
L3: Registration node layer: Resolves the <Node_ID>. It manages local registries and coordinates the two-phase commit sequence for asset anchoring [38].
L4: Feature extraction interface: Resolves the <Cont_ID>. It interacts with raw data to generate fingerprints [7].
Traditional network identifiers do not capture the dual-plane semantics of a distributed data space. Building upon the paradigm of Named Data Networking, where data is addressed by secure name rather than host location [11], we define a custom protocol identifier, the UDRI, to bridge the decoupled data and trust mechanisms. Unlike DIDs or Uniform Resource Identifiers, which embed or resolve full payload locations and often require in-band data transfer, UDRI functions as an oblivious cryptographic pointer that resolves only the statistical fingerprint via out-of-band DHT lookup. This design decouples metadata resolution from payload transport, achieving constant-bounded leakage and enabling cross-jurisdictional verification without exposing raw data or intent. This approach is further supported by recent hierarchical blockchain frameworks for node authentication in IoT networks [50], which provide complementary mechanisms for decentralised identity management and Sybil-resistant verification.
4.3.6. UDRI Lifecycle Management
The UDRI lifecycle governs the cryptographic binding across the four functional segments: <Infra_ID>, <Domain_ID>, <Node_ID>, and <Cont_ID>. Management operations are synchronised through the trust plane to ensure cross-jurisdictional consistency [21,36].
Initialisation and registration: During the prefix phase, the <Infra_ID> and <Domain_ID> are established through consortium governance protocols, anchoring the infrastructure type and jurisdictional boundary. The <Node_ID> is bound to a provider’s identity via a BFT-backed registration transaction. The <Cont_ID> is generated upon the successful pinning of the data payload on the IPFS plane, completing the four-segment mapping.
Revocation and suspension: To invalidate a data asset, the provider issues a revocation certificate to the trust plane. This operation marks the <Cont_ID> as ‘inactive’ within the global state trie. While the raw data may persist in the IPFS DHT, the interoperability layer blocks all resolution requests for the associated UDRI, terminating the asset’s lifecycle for cross-domain verification.
Transferability and migration: Ownership migration involves re-binding a <Cont_ID> to a new <Node_ID> or <Domain_ID>. This is executed via a cross-shard transaction on the trust plane that updates the pointer reference without altering the underlying data fingerprint. For infrastructure migrations, the <Infra_ID> is updated, and a new UDRI is issued with a pointer to the historical version to maintain provenance continuity.
Termination and rollback: If a registration fails to reach consensus within the predefined timeout window, the system triggers a recursive rollback. This issues the Unpin command for the <Cont_ID> at the data plane and prunes the pending <Node_ID> association from the trust plane mempool, preventing the accumulation of orphaned identifiers.
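The four lifecycle operations above can be read as a small state machine over a four-segment identifier. The following is a minimal sketch under stated assumptions: the state names, the `Udri` class, and the transition table are hypothetical illustrations and are not taken from the TDSR implementation.

```python
from dataclasses import dataclass
from enum import Enum


class UdriState(Enum):
    PENDING = "pending"          # registered, awaiting consensus
    ACTIVE = "active"            # anchored and resolvable
    REVOKED = "revoked"          # <Cont_ID> marked inactive
    ROLLED_BACK = "rolled_back"  # consensus window expired


# Allowed transitions (illustrative; not specified in the paper).
TRANSITIONS = {
    UdriState.PENDING: {UdriState.ACTIVE, UdriState.ROLLED_BACK},
    UdriState.ACTIVE: {UdriState.REVOKED},
    UdriState.REVOKED: set(),
    UdriState.ROLLED_BACK: set(),
}


@dataclass
class Udri:
    infra_id: str
    domain_id: str
    node_id: str
    cont_id: str
    state: UdriState = UdriState.PENDING

    def transition(self, new_state: UdriState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}"
            )
        self.state = new_state

    def resolvable(self) -> bool:
        # Only active identifiers are resolved; revoked ones are blocked
        # even if the payload still exists in the DHT.
        return self.state is UdriState.ACTIVE

    def migrate(self, new_node_id: str) -> "Udri":
        # Re-bind the same <Cont_ID> to a new <Node_ID>; the underlying
        # fingerprint, and therefore the content identifier, is untouched.
        if self.state is not UdriState.ACTIVE:
            raise ValueError("only active identifiers can be migrated")
        return Udri(self.infra_id, self.domain_id, new_node_id,
                    self.cont_id, UdriState.ACTIVE)
```

In this sketch, revocation and rollback are terminal states, which mirrors the statement that a revoked UDRI is no longer resolved for cross-domain verification.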
4.4. Dual-Plane Network Topology
The operational conflict between high-volume data transmission and low-latency state consensus necessitates the dual-plane network topology [21], extending the classical clean-slate separation of control and data planes to distributed data spaces [49]. Unified network architectures couple payload resolution with ledger synchronisation. This configuration degrades transaction throughput when consensus nodes process and replicate large feature matrices. To resolve this bottleneck, we partition the infrastructure layer into two isolated operational planes. The data plane manages the asynchronous distribution of variable-size payload objects. The trust plane executes the synchronous consensus of constant-size state anchors. This separation isolates the network traffic, prevents ledger saturation, and maintains block finality bounds during high-frequency data registration [24, 51].
Practical integration with existing infrastructures is achieved using standard APIs. The data plane employs an unmodified IPFS DHT for storage and retrieval of statistical fingerprints, while the trust plane can use any permissioned BFT blockchain, such as FISCO BCOS, for anchoring constant-size UDRIs. No custom modifications to the underlying IPFS or BFT systems are required; communication occurs through their documented REST and gRPC interfaces.
4.4.1. Data Plane for Payload Resolution
The data plane executes the IPFS protocol stack, and nodes communicate over its peer-to-peer networking layer. The data plane routes the fingerprint objects using DHT protocols [9, 22] to guarantee oblivious state retrieval. This plane isolates the data transmission from the physical IP layer using TLS 1.3 encapsulation [52], preventing in-band packet interception [25, 46]. The IPFS DHT provides O(log n) lookup latency for content-addressed payloads, where n is the number of DHT nodes.
To mitigate the risk of data loss due to node churn, the architecture implements multi-node pinning. The data plane distributes the fingerprint package to k independent nodes within the DHT, ensuring availability through redundancy [9, 20]. To ensure reliable availability of fingerprint payloads in a multi-jurisdictional setting, TDSR employs a pinning policy with a fixed redundancy factor, distributing copies of each payload across geographically diverse storage nodes within the consortium. Garbage collection is governed by the UDRI lifecycle: payloads remain pinned as long as the corresponding UDRI is active on the trust plane, and revocation triggers a controlled unpin operation. Storage nodes are operated by consortium members, ensuring compliance with data localisation requirements.
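One way to realise a geographically diverse pinning policy is to spread replicas over jurisdictions before placing a second copy in any of them. The sketch below is illustrative only: `select_pin_nodes`, the node names, and the jurisdiction labels are hypothetical, and the paper does not state how replica placement is actually decided.

```python
from typing import Dict, List


def select_pin_nodes(nodes: Dict[str, str], redundancy: int) -> List[str]:
    """Pick `redundancy` storage nodes, spreading them over jurisdictions.

    `nodes` maps a node name to its jurisdiction label.
    """
    if redundancy > len(nodes):
        raise ValueError("not enough storage nodes for the requested redundancy")
    # Group nodes by jurisdiction, keeping a deterministic order.
    by_region: Dict[str, List[str]] = {}
    for name in sorted(nodes):
        by_region.setdefault(nodes[name], []).append(name)
    chosen: List[str] = []
    # Round-robin over jurisdictions so that copies land in distinct
    # regions before any region receives a second copy.
    while len(chosen) < redundancy:
        for region in sorted(by_region):
            if by_region[region] and len(chosen) < redundancy:
                chosen.append(by_region[region].pop(0))
    return chosen
```

With a redundancy factor of 3 and nodes in three jurisdictions, each jurisdiction receives exactly one copy; a fourth copy would wrap around to the first jurisdiction.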
4.4.2. Trust Plane for State Consensus
The trust plane operates the BFT consensus ledger [51]. The nodes communicate via gRPC [53]. The trust plane manages state synchronisation across jurisdictional boundaries [16]. It provides the interface to query the DID and the anchored Content Identifiers without processing the underlying data payloads.
The two planes interact exclusively through the UDRI pointer. The trust plane anchors only the constant-size cryptographic pointer (256 bytes), while the data plane stores the full statistical fingerprint via content-addressed DHT. Data integrity is maintained by anchoring the cryptographic hash of the fingerprint payload on the trust plane, ensuring that any modification to the payload in the data plane invalidates the anchored pointer.
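The integrity check described above amounts to recomputing a digest over the retrieved payload and comparing it with the anchored value. The sketch below assumes SHA-256 as the hash function, which is consistent with the 32-byte hash mentioned in the text but is not confirmed by it; the function names are hypothetical.

```python
import hashlib
import hmac


def anchor_digest(payload: bytes) -> bytes:
    # 32-byte digest of the fingerprint payload.
    # SHA-256 is an assumption; the paper only states the hash length.
    return hashlib.sha256(payload).digest()


def payload_matches_anchor(payload: bytes, anchored_digest: bytes) -> bool:
    # Constant-time comparison of the recomputed digest with the value
    # anchored on the trust plane; any modification yields a mismatch.
    return hmac.compare_digest(anchor_digest(payload), anchored_digest)
```

A verifier would call `payload_matches_anchor` on the bytes returned by the DHT and discard the payload on a mismatch.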
4.5. Decoupled Storage and Consensus Mechanisms
The architectural division between the storage mechanisms and the consensus ledger addresses the operational limits of distributed protocols. BFT blockchains exhibit throughput degradation when nodes replicate high-volume feature matrices. DHTs accommodate variable-size payloads but lack the sequential state machines required to anchor ownership and prevent double-registration [7, 15]. We partition the data structures to resolve this incompatibility. The system allocates the variable-size fingerprint matrices to the asynchronous datastore. It allocates the constant-size cryptographic pointers to the synchronous ledger. This decoupling restricts the consensus transaction size and sustains high registration throughput independent of the data dimensionality.
4.5.1. Data Payload Datastore
The data payload datastore operates on the IPFS swarm network [7]. It stores the fingerprint package, which includes the statistical matrices, metadata, and provider signatures. The payload size scales with the feature dimensionality k. This datastore distributes objects via content addressing but does not enforce sequential ordering or global timestamping. The data plane operates in an untrusted environment. Payload integrity is guaranteed by the anchoring of the payload hash on the trust plane during registration: the verifier can recompute the hash locally against the retrieved payload and reject mismatches. Invalid UDRI hashes injected into the DHT are harmless, as resolution triggers signature verification and unauthenticated payloads are discarded. To mitigate DHT pollution and node churn, the registration process pins the fingerprint package to multiple independent storage nodes, ensuring availability even when some of those nodes depart.
4.5.2. State Consensus Ledger
The state consensus ledger resides on the permissioned BFT blockchain [15]. It records the state transition transaction. The transaction contains only the UDRI, the 32-byte fingerprint hash, the timestamp, and the signature. The ledger state size remains constant at 256 bytes per registration. This isolation prevents network saturation during high-frequency DNS data registration [43].
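A fixed 256-byte ledger record can be illustrated with a packed binary layout. The field widths below (152-byte UDRI, 32-byte hash, 8-byte timestamp, 64-byte signature) are assumptions chosen only so that the total equals 256 bytes; the paper specifies the hash length and the total size, not the other widths.

```python
import struct

# Hypothetical layout: UDRI (152 B) | hash (32 B) | timestamp (8 B) | signature (64 B)
RECORD = struct.Struct(">152s32sQ64s")


def pack_record(udri: str, digest: bytes, timestamp: int, signature: bytes) -> bytes:
    encoded = udri.encode("utf-8")
    if len(encoded) > 152 or len(digest) != 32 or len(signature) != 64:
        raise ValueError("field does not fit the fixed-size layout")
    # struct pads the UDRI with zero bytes up to 152 bytes.
    return RECORD.pack(encoded, digest, timestamp, signature)


def unpack_record(blob: bytes):
    udri, digest, timestamp, signature = RECORD.unpack(blob)
    return udri.rstrip(b"\x00").decode("utf-8"), digest, timestamp, signature
```

Because every record has the same size regardless of the fingerprint's dimensionality, the consensus transaction size stays constant in this sketch.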
4.6. Two-Phase Data Asset Ownership Registration and Verification Algorithmic Execution
The preceding architectural components are synthesised into the unified TDSR framework. Figure 1 illustrates the overall system architecture, mapping the functional modules to their respective algorithmic executions across the four-layer protocol stack and the dual-plane infrastructure. This integrated representation confirms the structural alignment between the UDRI resolution path and the distributed registration and verification lifecycles.
4.6.1. Phase 1: Data Asset Registration
The registration phase enables the data provider to assert ownership over the source asset D. The host node executes the payload construction Algorithm 1 in Module 1 to decouple the physical resolution identity from the data contents [29, 40]. The system then executes the two-phase commit Algorithm 2 in Module 2 to anchor the payload [38, 54]. Rollback functions trigger if the timeout limit elapses or if network partitions prevent block finality, preventing unanchored data objects from saturating the local storage of the host node [46].
| Algorithm 1 Fingerprint payload encapsulation |
Require: Raw data asset D at time t, Metadata M, Provider secret key Ensure: Fingerprint payload , Content Identifier 1: 2: 3: 4: 5: {Compute IPFS object hash for UDRI segment} 6. return , |
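Since the individual steps of Algorithm 1 are not legible in this version of the text, the following is only a sketch of what a fingerprint encapsulation of this kind could look like, not the authors' algorithm. It assumes the fingerprint consists of a mean vector and a sample covariance matrix (as described in the experimental setup), uses HMAC-SHA256 as a stand-in for the provider's signature, and uses a hex SHA-256 digest as a stand-in for an IPFS content identifier. All function names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Dict, List, Tuple


def mean_and_covariance(rows: List[List[float]]) -> Tuple[List[float], List[List[float]]]:
    # Sample mean vector and unbiased sample covariance (requires n >= 2).
    n, k = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(k)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / (n - 1)
            for b in range(k)] for a in range(k)]
    return mean, cov


def encapsulate(rows: List[List[float]], metadata: Dict[str, str],
                secret_key: bytes) -> Tuple[bytes, str]:
    mean, cov = mean_and_covariance(rows)
    body = {"mean": mean, "cov": cov, "meta": metadata}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    # Stand-in signature: HMAC-SHA256 rather than a real provider signature.
    signature = hmac.new(secret_key, canonical, hashlib.sha256).hexdigest()
    payload = json.dumps({"body": body, "sig": signature}, sort_keys=True).encode()
    # Stand-in content identifier: hex SHA-256 of the payload. Real IPFS
    # CIDs use a multihash encoding, which is not reproduced here.
    cont_id = hashlib.sha256(payload).hexdigest()
    return payload, cont_id
```

The payload carries only k + k² statistics plus metadata, so its size depends on the feature dimensionality rather than on the number of raw records.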
The system executes a simplified two-phase commit protocol, Algorithm 2, that retains the Prepare and Commit phases while omitting the full voting phase to reduce cross-jurisdictional latency. The rollback functions, Unpin and Delete, are triggered if the timeout limit elapses or if network partitions prevent block finality, so that unanchored data objects do not saturate the local storage of the host node. The term “oblivious” refers to the fact that neither the data provider nor any intermediate network node learns the verifier’s specific data subset d during registration or verification. Only the constant-size statistical fingerprint and the UDRI pointer are exchanged across jurisdictional boundaries.
| Algorithm 2 BFT state anchoring via two-phase commit |
Require: Payload , , , , , Node secret key , Timeout limit Ensure: UDRI or Error State 1: 2: 3: 4: 5: 6: 7: 8: while do 9: if then 10: 11: 12. return 13: end if 14: end while 15: LocalDatastore.Unpin(Cont_ID) 16: LocalDatastore.Delete() 17. return Error: State Anchoring Failed |
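The control flow that remains visible in Algorithm 2 (pin, submit, poll until finality or timeout, then roll back) can be sketched as follows. This is an assumption-laden illustration rather than the authors' procedure: `LocalDatastore`, `register`, `submit`, and `is_final` are hypothetical names, and the separate Unpin and Delete rollback calls are collapsed into a single `unpin` for brevity.

```python
import time
from typing import Callable, Dict, Optional


class LocalDatastore:
    """In-memory stand-in for the data-plane pinning API."""

    def __init__(self) -> None:
        self.pinned: Dict[str, bytes] = {}

    def pin(self, cont_id: str, payload: bytes) -> None:
        self.pinned[cont_id] = payload

    def unpin(self, cont_id: str) -> None:
        self.pinned.pop(cont_id, None)


def register(store: LocalDatastore, cont_id: str, payload: bytes,
             submit: Callable[[str], str], is_final: Callable[[str], bool],
             timeout_s: float, poll_s: float = 0.01) -> Optional[str]:
    store.pin(cont_id, payload)           # Prepare: pin on the data plane
    tx_id = submit(cont_id)               # Commit request to the trust plane
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if is_final(tx_id):               # block finality reached
            return tx_id
        time.sleep(poll_s)
    store.unpin(cont_id)                  # Rollback: no orphaned payloads
    return None
```

A caller would treat a `None` result as the "State Anchoring Failed" outcome and may retry the registration.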
4.6.2. Phase 2: Cross-Domain Verification
The verification phase enables the verifier node to prove that a local data subset d originates from the registered asset D. The protocol enforces verification without exposing the verifier’s intent to the host node [13]. The system first executes the identity verification using Algorithm 3, which corresponds to Module 3, to validate node compliance attributes across jurisdictional boundaries [17, 34]. This replaces the central CA [6, 37]. Upon successful authentication, the Data User executes the cross-domain verification using Algorithm 4, which corresponds to Module 4 [43]. The algorithm integrates UDRI parsing, ODAR retrieval, and local sandbox execution, and it bounds trans-border leakage to a constant [14, 28].
The current implementation employs the Mahalanobis distance with a chi-square threshold, which assumes that the aligned latent representation approximates a multivariate normal distribution. This assumption holds for frequency-based features derived from DNS logs but may not generalise to heavily skewed, long-tailed, or unstructured data such as medical images. By virtue of the algorithm-agnostic interface, advanced alternatives, including non-parametric kernel density estimators or learned similarity metrics, can be substituted within the sandbox without modifying the consensus or routing layers.
| Algorithm 3 Decentralised credential authentication |
Require: Verifiable Credential, Node public key, Issuer public key
Ensure: Boolean Authentication Result
1:
2:
3: if then
4: return False {Cryptographically Generated Identifier mismatch}
5: end if
6:
7:
8: if
9: return False
10: end if
11: return True
|
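The two checks that survive in Algorithm 3 (an identifier mismatch test followed by a second failure branch) suggest a structure like the sketch below. It rests on several assumptions: the identifier is taken to be a hex SHA-256 of the node public key, the credential is a plain dictionary with hypothetical `subject`, `claims`, and `signature` fields, and signature verification is injected as a callable because the paper's signature scheme is not legible here.

```python
import hashlib
from typing import Callable, Dict


def derive_identifier(public_key: bytes) -> str:
    # Assumed derivation of a cryptographically generated identifier:
    # hex SHA-256 of the node public key.
    return hashlib.sha256(public_key).hexdigest()


def authenticate(credential: Dict[str, str], node_public_key: bytes,
                 verify_issuer_signature: Callable[[bytes, str], bool]) -> bool:
    # Check 1: the identifier in the credential must be derived from the key.
    if credential["subject"] != derive_identifier(node_public_key):
        return False
    # Check 2: the issuer's signature over the subject and claims must verify.
    signed = (credential["subject"] + "|" + credential["claims"]).encode()
    if not verify_issuer_signature(signed, credential["signature"]):
        return False
    return True
```

Injecting `verify_issuer_signature` keeps the sketch independent of any particular signature library; a deployment would pass a verifier bound to the issuing jurisdiction's public key.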
| Algorithm 4 Cross-domain statistical verification in isolated sandbox |
Require: Target, Local data subset d, Significance level, Degrees of freedom k
Ensure: Boolean Decision
1:
2:
3: if then
4: return
5: end if
6:
7: if then
8: return
9: end if
10:
11:
12:
13:
14:
15: if then
16:
17: else
18:
19: end if
20:
21: return
|
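The statistical core of the verification step, a Mahalanobis distance compared against a chi-square threshold with k degrees of freedom, can be sketched in pure Python. This is not the authors' Algorithm 4: it tests a single feature vector against the registered mean and covariance, it omits UDRI parsing and retrieval, and it uses the Wilson-Hilferty approximation for the chi-square quantile instead of an exact inverse CDF. All function names are hypothetical.

```python
from statistics import NormalDist
from typing import List


def chi2_quantile(p: float, k: int) -> float:
    # Wilson-Hilferty approximation to the chi-square quantile.
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * k)
    return k * (1.0 - c + z * c ** 0.5) ** 3


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting, solving a x = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def verify(sample: List[float], mean: List[float],
           cov: List[List[float]], alpha: float = 0.05) -> bool:
    diff = [s - mu for s, mu in zip(sample, mean)]
    # Squared Mahalanobis distance: diff^T * cov^{-1} * diff.
    d2 = sum(d * y for d, y in zip(diff, solve(cov, diff)))
    return d2 <= chi2_quantile(1.0 - alpha, len(mean))
```

Under the multivariate normal assumption stated in the text, the squared distance follows a chi-square distribution with k degrees of freedom, which is why the threshold depends only on the significance level and k.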
4.7. End-to-End Operational Workflow
The TDSR operational lifecycle can be abstracted as a sequential state transition from local raw data, through the extracted fingerprint and its anchored UDRI, to a verified global claim. The system exposes a unified interface for fingerprint extraction and for verification, enabling decoupled and privacy-preserving cross-domain validation.
The process begins with the provider extracting a statistical manifold from raw logs. This fingerprint is anchored to the trust plane via a two-phase commit, establishing an immutable global state. A verifier utilises the UDRI pointer to resolve the asset’s location and executes the verification interface within an isolated sandbox, completing the provenance check without raw data exposure.
The interaction between the layers and the defined interfaces is operationalised through a two-phase workflow, as illustrated in the sequence diagram in Figure 2, which maps the execution sequence of the defined algorithms across the operational network layers. This section details the end-to-end processes for asset registration and cross-domain verification, which orchestrate the four core algorithms and ensure that data sovereignty remains intact during transnational circulation.
4.7.1. Distributed Data Registration
The registration pipeline begins in the data plane of the origin domain and comprises local extraction and global consensus. First, the data owner invokes the feature extraction interface to execute Algorithm 1, which extracts the divisible fingerprint and mints the UDRI. This step ensures that the raw dataset never leaves the local jurisdiction.
Following the local preparation, the service layer broadcasts the registration request to the trust plane. To ensure ledger consistency across multi-jurisdictional nodes without recording the raw data, the architecture invokes the two-phase commit mechanism governed by Algorithm 2. Upon successful consensus, the cryptographic hash of the fingerprint and the UDRI locator are appended to the immutable ledger [15], completing the registration with a constant information leakage bound.
4.7.2. Cross-Domain Query and Verification
The verification lifecycle is triggered when an authorised consumer in a different jurisdiction initiates a query, and it progresses through distributed routing and isolated computation. The consumer submits the target UDRI to their local application layer. The system first executes Algorithm 3 to authenticate the consumer’s credentials; it then parses the identifier, resolves the intent-routing boundary, and directs the request to the specific origin domain [22].
Once the request has been routed, a secure sandbox execution environment is instantiated in the origin registration node. The consumer provides the questioned data subset d. The feature extraction interface then transitions to Algorithm 4, which performs the cross-domain statistical verification. It calculates the confidence interval against the registered fingerprint and outputs a binary verification result. By isolating the computation, this integrated pipeline ensures that trust is established without raw data disclosure [40].
5. Results and Evaluation
We deployed a distributed testbed across the Guangdong–Hong Kong–Macao Greater Bay Area (GBA) to evaluate the TDSR architecture. The evaluation quantifies the operational costs and structural boundaries introduced by the ODAR protocol and the UDRI identifiers, focusing on execution latency, dimensionality invariance, continuous overhead, and system resilience. We implemented the Mahalanobis distance and the chi-square test as baseline proxies to validate end-to-end connectivity and communication bounds. We do not evaluate feature extraction accuracy for any particular algorithm or data distribution. While the GBA testbed provides a proof-of-concept validation using real-world DNS recursive logs, the architecture is designed to be data-agnostic at Layer 4. For time-series structured data typical in medical telemetry or financial transaction records, the same feature extraction and statistical provenance mechanisms can be applied.
5.1. Experimental Setup and Baseline Models
The testbed operates across regional data centres over a wide area network, with baseline round-trip times between 15 and 25 milliseconds. Virtual machines host all protocol layers. Operating environments use Ubuntu 22.04 LTS and Docker containerisation.
The trust plane comprises 64 BFT validator nodes, each with a hardware allocation of 8 vCPUs, 16 GB RAM, and 500 GB NVMe storage. The data plane comprises 12 IPFS storage nodes with a replication factor of 3, each with a hardware allocation of 16 vCPUs, 64 GB RAM, and 4 TB NVMe storage.
The testbed ingests real-world DNS recursive logs from a cooperative provider in the GBA region. The dataset consists of approximately 2.11 million DNS resolution requests per day, with each log entry containing a timestamp, anonymised source IP, queried domain, record type, and response code. Feature extraction at Layer 4 employs a frequency-based approach: each 24 h window is partitioned into 5 min bins, and the empirical distribution of query types (A, AAAA, MX, etc.), response codes, and top-level domain categories is aggregated into a k-dimensional feature vector. The fingerprint extraction then computes the mean vector and covariance matrix over the complete observation period, producing a constant-size statistical manifold of approximately 4 KB. All performance metrics include 95% confidence intervals (CI) derived from 1000 independent query iterations. Wilcoxon rank-sum tests validate the performance divergence between TDSR and the baselines at a predefined significance level.
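The frequency-based extraction described above can be illustrated for a single feature family. The sketch below bins log entries into 5 min intervals (288 bins per 24 h window) and computes the empirical distribution of query types per bin; it is a simplified illustration with hypothetical names, and it leaves out response codes and top-level domain categories, which would be handled analogously.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

BIN_SECONDS = 5 * 60
BINS_PER_DAY = 24 * 60 * 60 // BIN_SECONDS  # 288 bins per 24 h window


def bin_distributions(entries: Iterable[Tuple[int, str]],
                      categories: List[str]) -> Dict[int, List[float]]:
    """Map bin index -> empirical distribution over `categories`.

    Each entry is (seconds since window start, query type).
    """
    counts: Dict[int, Counter] = {}
    for offset, qtype in entries:
        index = offset // BIN_SECONDS
        if 0 <= index < BINS_PER_DAY:
            counts.setdefault(index, Counter())[qtype] += 1
    result: Dict[int, List[float]] = {}
    for index, counter in counts.items():
        # Query types outside `categories` are ignored in the normalisation.
        total = sum(counter[c] for c in categories)
        if total:
            result[index] = [counter[c] / total for c in categories]
    return result
```

Each per-bin vector then serves as one observation when the mean vector and covariance matrix are computed over the observation period.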
We employ two models as baselines:
Coupled ledger as Baseline A: A standard blockchain architecture lacking the storage–consensus boundary. Nodes embed the complete feature template within the BFT consensus transactions.
Federated in-band routing as Baseline B: A federated architecture lacking the intent–routing boundary. Nodes route verification queries to target IP addresses and authenticate via centralised Public Key Infrastructure (PKI).
Baseline A is implemented on a standard permissioned BFT blockchain, FISCO BCOS, where complete feature matrices are embedded in consensus transactions. Baseline B is realised as a federated architecture with centralised PKI gateways that route full payloads in-band.
The dual-plane decoupling introduces a modest latency overhead of 280 ms for DHT lookup and 150 ms for BFT anchoring compared with tightly coupled baselines. This trade-off is deliberate: it eliminates communication complexity that scales with data volume and ensures constant-bounded trans-border leakage, making TDSR more suitable for ownership audits and compliance verification than for real-time streaming applications. Deployment follows standard gRPC and REST APIs with no custom modifications to the underlying IPFS or BFT systems, enabling straightforward integration with existing infrastructures.
5.2. End-to-End Execution Latency
We quantify the execution latency across the cross-domain verification lifecycle. This phase requires the verifier node to resolve the UDRI pointer and authenticate a queried dataset across jurisdictional boundaries. Figure 3a presents the Cumulative Distribution Function of this execution. Total cross-border verification latency remains within 4.35 s at the 95th percentile, with a median response time of 3.82 s across regional paths in the GBA testbed, which is acceptable for audit and compliance scenarios. Deconstructing this median latency across the protocol stack reveals the operational costs of the decoupling mechanisms. The L1 data plane executes the oblivious DHT lookup to retrieve the payload in 2.85 s. The L2 layer executes the out-of-band credential authentication in 0.45 s. The L3 layer allocates the isolated sandbox and computes the statistical verification in 0.52 s. Together, these components account for the 3.82 s median (2.85 + 0.45 + 0.52). The architecture absorbs these operational latencies to eliminate point-to-point network connections and to hide resolution intents.
Figure 3b–d evaluate the TDSR architecture against the baselines across regional routes. Baseline A exceeds 12.0 s in all instances because the coupled ledger mechanism collapses under high-dimensional payload replication. Baseline B maintains P95 latency between 5.06 s and 5.42 s. TDSR restricts P95 latency to a range of 4.24 s to 4.61 s. The dual-plane topology isolates consensus overhead, and the ODAR mechanism enforces data minimisation. Latency remains stable regardless of the regional routing path.
5.3. Payload Dimensionality Invariance
To validate the isolation efficacy of the storage–consensus boundary, we executed a dimensionality stress test. The design requires that throughput remains independent of data dimensionality. In the test, the feature matrix size extracted at the L4 interface increases from 256 bytes to 50 megabytes.
Figure 4 presents the transaction throughput response. Baseline A experiences throughput degradation to near-zero as consensus nodes replicate expanding payloads across the synchronous network. The TDSR architecture sustains a constant throughput of 2400 Transactions Per Second (TPS) across all payload dimensions. By anchoring only uniform 256-byte UDRI pointers on the trust plane, the system isolates the state machine from the variable-size data objects, confirming the throughput invariance.
While the physical testbed incorporates 64 BFT validator nodes, the decoupled topology is designed to extend to larger deployments. By offloading the variable-size multidimensional payloads to the IPFS data plane, the trust plane is limited to ordering 256-byte UDRI pointers. The existing BFT consensus literature reports that when transaction payloads are minimised to such a constant bound, state machine replication can scale to thousands of nodes without bottlenecking throughput [8, 48, 51]. The 2400 TPS observed in the 64-node deployment therefore represents a baseline capacity; validating larger, global-scale multi-jurisdictional deployments is beyond the scope of this testbed.
5.4. Continuous Operational Overhead
To evaluate system behaviour under sustained load and verify the data minimisation bounds, we executed a 30-day stress test, processing 2.11 million daily DNS resolution requests.
During a single cross-border query, the L4 interface confines raw execution to the local sandbox. As shown in Table 3, the enforced constant trans-border leakage bound is maintained at ≤5.8 KB per query. This application payload is composed of: (1) 128 bytes for the encapsulated UDRI query string; (2) 850 bytes for the DID document and associated VC metadata; and (3) 4096 bytes for the statistical fingerprint, comprising a small-dimensional covariance matrix, mean vector, and cryptographic signatures. This fixed composition ensures that trans-border transmission remains independent of the gigabyte-scale source datasets.
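The per-query budget can be checked with simple arithmetic. The sketch below sums the three stated components and compares the total with the 5.8 KB bound; reading "KB" as 1000 bytes is an assumption, and the constant names are hypothetical.

```python
UDRI_REQUEST_BYTES = 128     # encapsulated UDRI query string
DID_CREDENTIAL_BYTES = 850   # DID document and VC metadata
FINGERPRINT_BYTES = 4096     # statistical fingerprint
BOUND_BYTES = 5800           # 5.8 KB, read as decimal kilobytes (assumption)


def cross_border_bytes() -> int:
    # Total application payload that crosses the border for one query.
    return UDRI_REQUEST_BYTES + DID_CREDENTIAL_BYTES + FINGERPRINT_BYTES


def headroom_bytes() -> int:
    # Remaining margin under the stated bound.
    return BOUND_BYTES - cross_border_bytes()
```

The three components sum to 5074 bytes, which leaves 726 bytes of margin under the stated bound in this reading.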
Figure 5 illustrates the accumulated system overhead. The 3000 MB ledger failure threshold represents the maximum operational memory allocation designated for smart contract execution environments and state databases on the lightweight validator nodes deployed within our GBA testbed. Baseline A crosses this threshold on day 15 due to the continuous on-chain replication of high-dimensional matrices. Baseline B breaches the 15,000 MB gateway bandwidth threshold on day 14 because it transmits raw data payloads in-band.
5.5. System Resilience
Experiments evaluate the system resilience through Byzantine fault injection and data plane node churn. Figure 5 and Figure 6 display the performance response.
The trust plane sustains block finality below 4.5 s under Byzantine conditions, and consensus finality remains below failure limits across the 64-node testbed. The data plane tolerates 30% node churn: multi-node pinning maintains payload availability, while oblivious DHT routing latency increases by 0.8 s.
Figure 6 quantifies the operational stability over 30 days. The 5.8 KB trans-border leakage bound prevents the system from reaching the 3000 MB ledger storage threshold and 15,000 MB network bandwidth threshold. Baseline A reaches the 3000 MB threshold on day 15. Baseline B reaches the 15,000 MB threshold on day 14. TDSR maintains operation below all failure thresholds for the 30-day duration. The dual-plane topology isolates the trust plane from data plane partitions and wide-area network jitter. The architecture enforces data minimisation to maintain system liveness.
5.6. Architecture Ablation
Table 4 presents an ablation analysis isolating the operational impact of the core architectural boundaries.
Removing the storage–consensus boundary (TDSR w/o Storage–Consensus) forces the trust plane to process variable-size multidimensional feature matrices. Throughput falls below 10 TPS. Finality latency grows with payload size, exceeding 12,000 ms, and the consensus ledger saturates.
Removing the intent–routing boundary (TDSR w/o Intent–Routing) forces in-band payload transmission. The trans-border payload scales linearly with queried data volume, routinely exceeding 10 MB. Network bandwidth saturates, and data minimisation mandates are breached.
The full TDSR architecture executes all isolation boundaries. Throughput is constant at 2400 TPS. The trans-border payload is strictly bounded to 5.8 KB per query. The dual-plane topology and ODAR mechanism jointly maintain data minimisation and system scalability.
6. Discussion
Execution metrics from the deployment quantify the isolation boundaries of the TDSR architecture. This section analyses the mechanisms of the protocol stack, resolving the conflict between sovereignty mandates and consensus scalability. We examine the storage–consensus decoupling mechanism, establishing bounds on transaction throughput. We evaluate the intent–routing boundaries confirming cross-domain compliance. We map the trans-border communication payload to the privacy prerequisites. Synthesising these mechanisms demonstrates that TDSR operates as a protocol stack governing trans-border data circulation.
6.1. Architectural Advantages over Reference Models
Table 5 summarises the architectural properties and operational boundaries between the reference models and the proposed TDSR. The TDSR architecture resolves the structural limitations inherent in the in-band federated model evaluated as Baseline B, prevalent in frameworks such as GAIA-X and the IDS. In-band systems couple identity resolution with payload routing, which exposes verifier intents to host nodes and incurs trans-border communication overhead scaling linearly with data volume [1, 19, 38]. In contrast, TDSR implements an out-of-band decoupled architecture. By executing intent resolution via the IPFS DHT, the ODAR mechanism enforces a trans-border communication boundary. This architectural separation caps external payload transmission at a constant bound of ≤5.8 KB per query. The system achieves deterministic data minimisation through its routing topology, circumventing the need for the interactive bandwidth exchanges required by Multi-Party Computation protocols.
TDSR mitigates the dimensionality variance associated with the coupled ledger model evaluated as Baseline A. Blockchain registries anchor ownership by embedding multidimensional feature matrices into the state consensus, which can lead to throughput degradation as data volume expands [5, 15]. By partitioning the asynchronous datastore and the synchronous trust plane, TDSR restricts L1 consensus operations to 256-byte UDRI pointers, sustaining a consistent throughput of 2400 TPS independent of the underlying data scale [46]. The architecture establishes algorithm agnosticism through the L4 interface contract. By providing an isolated sandbox for pluggable verification modules, TDSR accommodates diverse privacy-preserving mechanisms, such as Zero-Knowledge Proofs (ZKPs), without embedding their associated computational complexities, such as proof generation, into the global consensus layer, shifting the security perimeter from interactive networks to the protocol boundary. While the algorithm-agnostic interface allows ZKP verifiers to be deployed within the sandbox without altering the consensus layer, the local computational cost of ZKP proof generation, typically several seconds to minutes for non-trivial datasets, would exceed the current 3.82 s median latency budget of TDSR. Thus, TDSR occupies a middle ground in the privacy-preserving spectrum: it provides deterministic, constant-bounded network-level leakage with low latency suitable for audits while remaining compatible with stronger but costlier primitives like ZKPs for high-assurance scenarios.
6.2. Multi-Jurisdictional Compliance Execution
The TDSR architecture establishes a cross-jurisdictional compliance mechanism for data circulation across divergent legal jurisdictions, driven by the integration of the ODAR protocol and the DID mechanism [14]. Legal frameworks, including the Personal Information Protection Law and the General Data Protection Regulation, enforce data minimisation and restrict the trans-border transfer of identifiers [18, 40, 41].
The ODAR protocol executes the data minimisation constraint. A Data User querying a DNS subset triggers the middleware to retrieve the fingerprint via the IPFS DHT, bypassing direct connections to the host node [28]. The DHT mechanism isolates the origin IP address of the verifier to satisfy query intent regulations. Jurisdictions lack a commonly recognised CA [6]. The architecture therefore replaces the CA with a DID mechanism based on Cryptographically Generated Identifiers [37], and jurisdictions issue VCs to bind compliance attributes to these DIDs [42]. Proofs and evaluations demonstrate that the trans-border payload is bounded. The network transmits at most 5.8 kilobytes of hash commitments and Boolean scores per query, and zero bytes of DNS recursive logs cross jurisdictional boundaries. Beyond data minimisation, TDSR supports the right to be forgotten through UDRI revocation and facilitates data portability by allowing owners to transfer Cont_ID ownership without moving raw data. The architecture enables regulatory auditing by providing verifiable provenance proofs while maintaining zero raw data transit across borders. Regulatory enforcement is supported through immutable UDRI-anchored audit logs on the trust plane, while the DID and VC mechanisms enable legal interoperability under cross-border governance and compliance frameworks.
6.3. System Overhead and Security Boundaries
The decoupling mechanisms introduce latencies during cross-border verification, generating a median response time of 3.82 s. This comprises 2.85 s for asynchronous DHT lookup operations across the data plane, with credential authentication and sandbox execution adding 0.97 s, reflecting the cost of hiding resolution intents [7, 33]. To manage state divergence, the architecture enforces a two-phase commit protocol where consensus timeouts trigger rollback functions to purge unanchored payloads, preventing state pollution in the off-chain storage layer.
The decoupling architecture establishes security boundaries by isolating attack vectors through the dual-plane topology. The BFT consortium blockchain maintains state consistency to mitigate double-registration under the BFT fault tolerance threshold [16]. The Cryptographically Generated Identifier mechanism addresses Sybil attacks by requiring regulatory credentials for cross-domain resolution [29], ensuring the authentication layer remains secure across jurisdictional boundaries.
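The fault tolerance threshold itself is not legible in this version of the text. Assuming the classical BFT bound n ≥ 3f + 1, which is an assumption rather than a statement from the paper, the number of tolerated Byzantine validators for a given committee size can be computed as in the sketch below; the function names are hypothetical.

```python
def max_byzantine_faults(validators: int) -> int:
    # Classical BFT bound n >= 3f + 1, hence f = floor((n - 1) / 3).
    # Assumed here; the paper's exact threshold is not legible.
    if validators < 1:
        raise ValueError("at least one validator is required")
    return (validators - 1) // 3


def quorum_size(validators: int) -> int:
    # Votes needed so that any two quorums overlap in an honest node.
    return 2 * max_byzantine_faults(validators) + 1
```

Under this assumption, the 64-validator testbed would tolerate up to 21 Byzantine nodes with a quorum of 43.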
6.4. Operational Scope and Limitations
This execution profile dictates the operational scope of the TDSR architecture, positioning it for cross-jurisdictional data ownership audits and compliance verification rather than real-time stream control. In contexts where certificate authority processes require days, a 3.82 s median resolution provides an improvement of several orders of magnitude. To reduce the DHT routing bottleneck during query sequences, the L2 interoperability layer integrates a Least Recently Used (LRU) credential cache governed by Time-To-Live constraints, enabling cache hits to reduce end-to-end verification to the BFT retrieval threshold (<1.0 s).
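An LRU credential cache with Time-To-Live expiry can be sketched with an ordered dictionary. The class below is a minimal illustration, not the TDSR implementation: `CredentialCache`, its parameters, and the injectable clock are hypothetical, and capacity and TTL values would be deployment choices.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional


class CredentialCache:
    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: OrderedDict = OrderedDict()

    def get(self, did: str) -> Optional[bytes]:
        entry = self._entries.get(did)
        if entry is None:
            return None
        credential, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[did]        # expired: force a fresh lookup
            return None
        self._entries.move_to_end(did)    # mark as most recently used
        return credential

    def put(self, did: str, credential: bytes) -> None:
        self._entries[did] = (credential, self._clock() + self._ttl)
        self._entries.move_to_end(did)
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
```

A cache hit skips the DHT lookup entirely, while the TTL bounds how long a revoked credential could continue to be served from the cache.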
The architecture is subject to methodological and infrastructural limitations. While the alignment mapping is intended to resolve the skew in non-Gaussian distributions, evaluating the accuracy of mapping algorithms on long-tailed datasets remains a challenge [45]. The security proofs presented assume successful distributional alignment via this mapping; in practice, the long-tailed and heavily skewed distributions that are prevalent in many real-world datasets may violate this assumption. While the GBA deployment provides a proof of concept using real-world DNS recursive logs, more complex data modalities such as medical imaging and detailed financial audits present additional challenges due to their high dimensionality and non-tabular structure.
The evaluations also omit stress testing against Distributed Denial of Service (DDoS) attacks on DHT nodes [7, 30]. Potential exposures include DDoS attacks targeting the DHT lookup service or BFT consensus nodes, as well as adversarial conditions in which malicious registrars attempt to flood the system with invalid UDRIs. While the current design incorporates rate limiting at the intent-routing layer and Cryptographically Generated Identifiers for Sybil resistance, these threats warrant further mitigation strategies in future deployments. Future iterations will address these constraints by integrating Trusted Execution Environments [44] and Byzantine-resilient mechanisms to formalise a hardware root of trust.
6.5. Architectural Generalisation and Transferability
The L4 feature extraction interface abstracts representation algorithms to enable deployment beyond DNS records. Substituting the local extraction algorithm applies the
trans-border leakage bounds to diverse data infrastructures. Industrial IoT sensor telemetry, such as that from distributed split single-sideband time-modulated arrays [
55] and distributed IRS beamforming [
56], utilises the dimensional invariance to prevent ledger saturation. Medical case databases utilise oblivious routing and isolated sandboxes to enforce zero-byte trans-border raw data transfer. Financial logs utilise the decentralised credential mechanism to execute compliance verification without central authorities.
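The idea of a swappable extraction interface with a fixed-size output can be sketched as follows. This is an illustrative assumption rather than the L4 interface itself: `FeatureExtractor`, `HashedTokenExtractor`, and `fingerprint_digest` are hypothetical names, and feature hashing over whitespace-separated tokens is a stand-in technique chosen only because it yields a vector whose length does not depend on the number of records.

```python
import hashlib
from typing import Iterable, List, Protocol


class FeatureExtractor(Protocol):
    """Hypothetical contract: any extractor maps records to a fixed-length vector."""

    dimension: int

    def extract(self, records: Iterable[str]) -> List[float]: ...


class HashedTokenExtractor:
    """Stand-in extractor: normalised histogram of tokens hashed into buckets."""

    def __init__(self, dimension: int = 16) -> None:
        self.dimension = dimension

    def extract(self, records: Iterable[str]) -> List[float]:
        counts = [0] * self.dimension
        total = 0
        for record in records:
            for token in record.split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dimension
                counts[bucket] += 1
                total += 1
        if total == 0:
            return [0.0] * self.dimension
        return [c / total for c in counts]


def fingerprint_digest(extractor: FeatureExtractor, records: Iterable[str]) -> str:
    """Digest of the fixed-length vector; only this value would reach a ledger."""
    vector = extractor.extract(records)
    payload = ",".join(f"{v:.6f}" for v in vector).encode("ascii")
    return hashlib.sha256(payload).hexdigest()
```

Under these assumptions, replacing `HashedTokenExtractor` with a domain-specific extractor for telemetry, medical, or financial records leaves `fingerprint_digest` unchanged, and the registered value stays the same size whether the source holds ten records or ten million.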
The TDSR architecture establishes a data-agnostic verification mechanism. The decoupled dual-plane topology ensures variations in raw data formats and query volumes do not compromise the synchronous consensus ledger. The architecture provides a generalised baseline for multi-jurisdictional data asset registration.
7. Conclusions and Future Work
The proposed TDSR architecture resolves the operational conflict between ownership provenance prerequisites and data minimisation mandates. By implementing a four-layer protocol stack and a dual-plane topology, the system establishes a decoupled storage–ledger mechanism. This partitioning of asynchronous payload datastores from synchronous consensus ledgers sustains throughput independent of data dimensionality. Navigating this infrastructure, the formulated UDRI executes out-of-band cross-domain routing without exposing verifier intents. Driven by the ODAR mechanism, the two-phase, four-algorithm lifecycle shifts hypothesis testing to isolated sandboxes via an algorithm-agnostic mathematical contract. This operational workflow caps external data transit at a constant leakage bound. The deployment across the Guangdong–Hong Kong–Macao Greater Bay Area validates the architecture, establishing a robust compliance mechanism for data circulation across divergent legal jurisdictions.
Future work will both optimise performance and evaluate the TDSR architecture against explicit attack models. This evaluation will execute attack-and-defence experiments, testing robustness against DHT record pollution, Sybil routing attacks, and Distributed Denial of Service vectors across the dual-plane topology. We also plan to extend the feature extraction interface to better accommodate unstructured and high-dimensional data such as medical imaging and financial audit records, and to explore non-parametric kernel density estimation and robust statistical techniques for long-tailed and non-Gaussian data distributions, further strengthening cross-sector applicability. Integrating Trusted Execution Environments [
44] and Byzantine-resilient verification mechanisms [
45] will formalise the hardware-level root of trust required to mitigate poisoning vectors in multi-jurisdictional networks.