7.1. Trust Establishment
Trust Establishment is the process of verifying a node’s or user’s identity to enable secure transactions within a blockchain. Within Static Sharding, the work in [
23] introduces unique features, such as the Identity Establishment and Overlay Setup for Committees. Here, each processor autonomously generates an identity composed of a Proof-of-Work (PoW) solution, an IP address, and a public key. During the Committee Formation phase, processors verify each other’s identities by solving a PoW problem with publicly verifiable solutions. This specialized method for identity verification can enhance overall security. However, the reliance on PoW introduces significant scalability challenges, as its computational demands become increasingly burdensome with network expansion.
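The identity step can be sketched as follows; the difficulty value, string layout, and hash choice below are illustrative assumptions, not details taken from [23]:

```python
import hashlib

DIFFICULTY = 12  # illustrative: required number of leading zero bits

def solve_pow(ip: str, pubkey: str, difficulty: int = DIFFICULTY) -> int:
    """Search for a nonce such that H(ip || pubkey || nonce) has
    `difficulty` leading zero bits -- the PoW part of the identity."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{ip}|{pubkey}|{nonce}".encode()).digest()
        if int.from_bytes(digest, "big") >> (256 - difficulty) == 0:
            return nonce
        nonce += 1

def verify_identity(ip: str, pubkey: str, nonce: int,
                    difficulty: int = DIFFICULTY) -> bool:
    """Any processor can cheaply verify a claimed identity,
    since the PoW solution is publicly checkable."""
    digest = hashlib.sha256(f"{ip}|{pubkey}|{nonce}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty) == 0

# A processor builds its identity once; peers then verify it during
# Committee Formation.
identity = ("203.0.113.7", "pk_alice", solve_pow("203.0.113.7", "pk_alice"))
assert verify_identity(*identity)
```

The asymmetry is the point: generation costs many hash attempts, while verification costs one, which is what makes identities expensive to fabricate but cheap to check.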
Significant differences are observed in how various blockchain architectures implement Trust Establishment. For instance, Elastico [
23] explores a two-stage Trust Establishment process, potentially enhancing security through multiple validations but complicating the consensus process and slowing down transactions. Conversely, OmniLedger [
51] and RapidChain [
70] emphasize a single primary stage to enhance speed and simplicity, though this may make them more vulnerable to coordinated attacks. These differences underscore the crucial trade-off between security and efficiency in blockchain design, with each approach offering its own strengths and limitations.
Approaches such as Monoxide [
42] and Chainspace [
43] do not introduce novel methods for establishing trust, but instead integrate trust directly into their consensus mechanisms. Although this streamlined approach could enhance process efficiency, it also exposes the system to risks if the consensus mechanism itself is compromised. A notable drawback is that the absence of a separate Trust Establishment feature could hinder these technologies’ ability to adapt to emergent security threats.
An interesting case is Meepo [
71], a consortium blockchain, where Trust Establishment is inherently assured due to the closed nature of its approach, eliminating the need for new Trust Establishment methods. This built-in trust simplifies management and keeps transactions private within the group, but it also restricts the network’s openness and may not perform well in larger, more decentralized environments, which is a significant limitation in its design.
Liu et al. in [
66] propose Kronos, which establishes trust through a secure shard configuration mechanism that may utilize PoW, Proof-of-Stake (PoS), or public randomness to assign nodes to shards and resist Sybil attacks at the entry point. This process ensures that nodes cannot cheaply fabricate multiple identities during initialization or periodic reconfiguration. Simulations across thousands of nodes on AWS EC2 validate the effectiveness of Kronos under different network models (synchronous, partially synchronous, and asynchronous). However, trust is statically assigned: once a node is verified, no ongoing trust reassessment is performed. Thus, if a node initially behaves correctly but later colludes or becomes compromised, the work in [
66] has no built-in mechanisms to detect or expel it. The security model also critically depends on the honest majority assumption within shards, which may not hold under economically incentivized attacks. No behavioral monitoring or epochal re-verification is embedded, posing long-term scalability risks for highly dynamic public deployments.
Lin et al. in [
67] introduce DL-Chain, a sharding system that establishes trust and node assignment at the start of each epoch using epoch randomness generated by Verifiable Random Functions (VRFs) combined with Verifiable Delay Functions (VDFs). Each node is randomly assigned to a Proposer Shard and a Finalizer Committee based on this process, ensuring fair and unpredictable allocation. This approach prevents adversaries from concentrating malicious nodes within a single shard, thereby enhancing security while avoiding the computational overhead of PoW-based approaches. Experimental results with up to 2550 nodes demonstrate that the random assignment strategy in [67] maintains negligible failure probability and resists targeted shard capture. However, DL-Chain treats Trust Establishment as a static process within each epoch. There are no mechanisms for dynamic reassignment or behavior-based penalties during an epoch. As a result, malicious nodes admitted at epoch formation persist throughout the epoch without recertification or removal. While this work assumes the underlying randomness process is bias-resistant and publicly verifiable, it does not explicitly discuss the risks of potential collusion in VRF/VDF generation.
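The epoch-based assignment can be illustrated with a plain hash standing in for the VRF/VDF outputs; this is a generic sketch of randomness-driven shard assignment, not DL-Chain's actual construction:

```python
import hashlib

def assign_shards(node_ids, epoch_randomness: bytes, num_shards: int):
    """Deterministically map each node to a shard for the epoch by
    hashing the public epoch randomness with the node's identity.
    A SHA-256 hash stands in here for the VRF/VDF output."""
    assignment = {}
    for nid in node_ids:
        h = hashlib.sha256(epoch_randomness + nid.encode()).digest()
        assignment[nid] = int.from_bytes(h, "big") % num_shards
    return assignment

nodes = [f"node{i}" for i in range(8)]
epoch_seed = hashlib.sha256(b"epoch-42").digest()
shards = assign_shards(nodes, epoch_seed, num_shards=4)

# Same seed -> same assignment (anyone can re-derive and verify it);
# a fresh seed each epoch -> an unpredictable reshuffle.
assert shards == assign_shards(nodes, epoch_seed, 4)
```

Because the mapping depends only on public inputs, every participant can recompute and audit the assignment, while an adversary cannot predict it before the epoch randomness is fixed.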
In Dynamic Sharding, Trust Establishment remains an essential process. However, the works in [
52,
53] do not introduce novel approaches. For example, the work in [
69] employs the Louvain Algorithm to facilitate Trust Establishment, relying on a well-established community-detection method rather than presenting an innovative approach. This reliance on conventional methods may limit the potential for significant improvements in security and system flexibility.
Liu et al. [
68] propose DYNASHARD, which utilizes a secure random process for managing committee selection to promote fairness and reduce the risk of collusion. While the protocol ensures committees are selected via internal randomness, it does not explicitly specify periodic or epoch-based reshuffling of committees, nor does it present simulation results quantifying committee diversity or capture rates over time. The Trust Establishment mechanism in DYNASHARD is solely randomness-driven, without incorporating historical node behavior or penalization for misbehavior in the committee selection process. Furthermore, the protocol does not describe any external or decentralized randomness beacon for committee selection, nor does it detail how the randomness source is made publicly verifiable. This lack of external verifiability could limit transparency and potentially undermine trust in adversarial scenarios.
AEROChain [
55] establishes trust implicitly through its dual-shard architecture, where each node belongs to a physical shard for transaction validation and to a logical shard for account migration coordination. Both layers use Practical Byzantine Fault Tolerance (PBFT) for consensus, embedding trust into repeated voting among nodes. However, the approach lacks mechanisms for behavioral scoring or identity revalidation across epochs. Trust is static once nodes are assigned, which makes the work in [
55] vulnerable to long-term adversarial drift. The scope in [
55] focuses on scalable and balanced account migration, not direct trust modeling. No trust-specific simulations are presented, and findings center on the Deep Reinforcement Learning (DRL) based migration model.
SkyChain [
73] proposes periodic re-sharding to dynamically reassign nodes across committees, aiming to balance performance and security in a dynamic blockchain environment. Their method leverages an adaptive ledger protocol and a DRL-based sharding mechanism to adjust shard configurations based on observed network state. However, identity establishment is handled through Sybil-resistant PoW puzzles, with no additional cryptographic attestation or behavioral scoring mechanisms in place. The re-sharding process is explicitly adaptive, driven by the DRL policy rather than fixed or deterministic intervals. The primary focus of SkyChain is on scalable dynamic re-sharding, with contributions centered around optimizing performance and security trade-offs. Findings are reported from simulation-based evaluations, focusing on TPS, latency, and safety metrics, without any explicit trust or reputation evaluation.
Similarly, in Layered Sharding, while trust is inherently addressed in every approach, one work in [
24] adopts an Assignment System for Trust Establishment, whereas another work in [
54] does not propose any specific new method. This inconsistency in addressing trust challenges underscores the need for more innovative solutions that can better meet the evolving security demands of blockchain networks.
SPRING [
74] implements a Trust Establishment mechanism where nodes undergo a one-time PoW process at registration. The last bits of the PoW solution directly determine the initial shard assignment. The protocol employs a reconfiguration phase in which consensus nodes are regularly shuffled among shards for security, using VRF to generate unpredictable, bias-resistant randomness for node redistribution and leader selection. This ongoing reconfiguration mechanism is explicitly designed to prevent persistent collusion or the formation of static shard compositions. While the initial entry barrier is minimal compared to protocols with ongoing reputation or slashing, the security model relies on periodic node rotation and PBFT consensus within shards, not on dynamic behavioral trust assessment. Therefore, the risk of persistent malicious behavior is mitigated by the enforced protocol-level reshuffling of nodes, maintaining the integrity and unpredictability of shard compositions over time.
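The registration step can be sketched as follows; the shard count and the exact bit layout are illustrative assumptions, since [74] only specifies that the last bits of the PoW solution select the shard:

```python
NUM_SHARDS = 8  # illustrative; a power of two so the last bits form a shard index

def registration_shard(pow_solution: bytes, num_shards: int = NUM_SHARDS) -> int:
    """Map a node's one-time PoW solution to its initial shard by
    masking the last log2(num_shards) bits of the solution (sketch)."""
    return pow_solution[-1] & (num_shards - 1)

# 0xc2 = 0b11000010; its last three bits give shard 2.
shard = registration_shard(bytes.fromhex("9f3ac2"))
assert 0 <= shard < NUM_SHARDS
```

Since the PoW output is effectively uniform, taking its low bits spreads newly registered nodes evenly across shards without any coordinator.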
TBDD [
78] introduces a promising response to these limitations through a trust-based and DRL-driven framework. It integrates multi-layered trust evaluation using historical, direct, and indirect feedback to generate local and global trust scores. These scores are used by a decentralized TBDD Committee (TC) to guide re-sharding through DRL, classifying nodes into risk levels and reallocating them to enhance security and reduce cross-shard transactions. Empirical results demonstrate up to 13% improvements in TPS and significantly better risk distribution under adversarial loads. However, maintaining accurate trust scores requires frequent updates and cross-node communication, posing challenges in high-churn environments like IoT. Additionally, aggregated trust metrics may lag behind behavioral shifts, and the committee structure introduces a potential point of failure if compromised. Bootstrapping new or low-activity nodes remains a vulnerability, and the centralized coordination role of the TC raises concerns about long-term decentralization.
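The multi-layered scoring idea can be sketched as below; the weights, thresholds, and risk labels are hypothetical placeholders, not TBDD's actual parameters:

```python
def local_trust(historical, direct, indirect, w=(0.4, 0.4, 0.2)):
    """Blend historical, direct, and indirect feedback into one
    local trust score (weights are illustrative, not from TBDD)."""
    return w[0] * historical + w[1] * direct + w[2] * indirect

def global_trust(local_scores):
    """Global score aggregated from the local scores peers report."""
    return sum(local_scores) / len(local_scores)

def risk_level(score, low=0.3, high=0.7):
    """Classify a node into a risk bucket that a committee could use
    to guide re-sharding decisions."""
    if score < low:
        return "high-risk"
    return "low-risk" if score > high else "medium-risk"

# Two peers' local views of the same node, aggregated globally.
peers_view = [local_trust(0.9, 0.8, 0.7), local_trust(0.85, 0.9, 0.6)]
assert risk_level(global_trust(peers_view)) == "low-risk"
```

The maintenance cost noted above is visible even in this toy version: every score update requires fresh feedback from multiple peers, which is exactly what becomes expensive in high-churn environments.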
Trust Establishment is a fundamental component of Static, Dynamic, and Layered Sharding. Static Sharding, leveraging innovative Trust Establishment methods, often achieves demonstrably superior security. In contrast, Dynamic and Layered Sharding, reliant on conventional techniques, face inherent limitations in security enhancement and adaptive system reconfiguration. This reliance not only impedes immediate progress but also creates a potential bottleneck for future advancements in resilience and agile scaling. While recent AI-driven frameworks such as TBDD introduce layered trust scoring and DRL-based re-sharding, they depend on centralized committee coordination and continuous cross-node communication, which may limit scalability in high-churn environments. Several approaches, including SPRING [
74] and DL-Chain [
67], treat trust as a static entry event without behavioral reassessment or epochal refresh, which increases vulnerability to long-term collusion. Furthermore, trust metrics in many models are not verifiably auditable or resistant to subtle manipulation, especially under adversarial conditions.
7.2. Consensus Selection
The selection of consensus mechanisms is critical for ensuring the integrity and efficiency of blockchain networks. Various approaches have been developed to enhance the consensus process [
79,
80] across different blockchain architectures, each with its unique advantages and limitations.
Within Static Sharding, Elastico [
23] demonstrates a model that leverages PoW alongside PBFT to establish secure consensus within committees. This hybrid consensus mechanism allows for a Final Consensus Broadcast after agreement is reached within a committee using a conventional Byzantine Agreement protocol. The final committee aggregates the results from all committees and uses a Byzantine consensus process to finalize the outcome and then broadcast it to the entire network. The merging of PoW and PBFT is unique, but handling many committees and combining outcomes may slow down the process and increase overhead, especially as the network scales.
OmniLedger [
51] employs ByzcoinX for its consensus mechanism. ByzcoinX extends the classical Byzantine consensus by forming a communication tree, where validators are organized hierarchically to reduce messaging overhead. For example, instead of every node broadcasting to all others, leaders in each subgroup aggregate votes and forward them upward, improving efficiency and reducing latency under high transaction loads. In OmniLedger [
51], this enhances the traditional roles of PoW and PoS, which in this context do not directly contribute to transaction validation but rather to the representation of validators in the Identity Block Creation process. ByzcoinX addresses the need for more resilient communication within shards to manage transaction dependencies and improve block parallelization, even in scenarios where some validators fail. ByzcoinX enhances shard communication, but its reliance on validators for identity block creation could lead to inefficiencies or vulnerabilities if coordination fails.
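The tree-shaped message flow can be sketched with a simple recursive aggregation; note that ByzcoinX aggregates collective signatures rather than the plain vote counters used in this sketch:

```python
def aggregate_votes(tree, votes, node="root"):
    """Each subgroup leader combines its own vote with its children's
    aggregates and forwards a single message upward, so the root
    receives O(branching) messages instead of O(n) broadcasts."""
    total = votes[node]
    for child in tree.get(node, []):
        total += aggregate_votes(tree, votes, child)
    return total

# The root leads two subgroup leaders, each with two validators.
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
votes = {n: 1 for n in ["root", "a", "b", "a1", "a2", "b1", "b2"]}
assert aggregate_votes(tree, votes) == 7  # one upward message per edge
```

The efficiency claim in the text corresponds to the message count: a flat broadcast among n validators needs O(n²) messages, while the tree needs only one message per edge, i.e. n - 1.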
RapidChain [
70] offers an Intra-committee Consensus (PBFT) that leverages a unique gossiping protocol suitable for handling large blocks and achieving high TPS through block pipelining. This setup includes a two-tier validation process where a smaller group of validators processes transactions quickly, which are then re-verified by a larger, slower tier for enhanced security. Although this two-tier method increases speed and security, the second validation phase may introduce delays.
Monoxide [
42] proposes Asynchronous Consensus Zones to scale the blockchain network linearly without compromising security or decentralization. Its consensus mechanism, Chu-ko-nu mining, a novel PoW scheme, also functions as a Trust Establishment tool by ensuring uniform mining power across zones and introducing Epoch Randomness. Chu-ko-nu mining allows miners to use a single PoW solution to create multiple blocks simultaneously, one per zone, ensuring that mining power is evenly distributed across all zones. As a result, attacking a single zone is as difficult as attacking the entire network. Asynchronous Consensus Zones of Monoxide [
42] provide linear scalability, but the challenge of synchronizing randomization and coordinating consensus across multiple zones may lead to increased overhead and latency, particularly in larger networks.
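The one-solution-many-blocks idea behind Chu-ko-nu mining can be sketched as a single PoW attempt over a combined header covering one candidate block per zone; the header format and difficulty handling here are simplified assumptions:

```python
import hashlib

def chu_ko_nu_attempt(zone_block_hashes, nonce: int,
                      difficulty_bits: int = 16) -> bool:
    """One PoW attempt over the combined header of one candidate block
    per zone. If the target is met, the miner may publish a block in
    every zone at once, so per-zone mining power stays uniform."""
    batch_root = hashlib.sha256(b"".join(sorted(zone_block_hashes))).digest()
    digest = hashlib.sha256(batch_root + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") >> (256 - difficulty_bits) == 0

zones = [hashlib.sha256(f"zone-{z}-block".encode()).digest() for z in range(4)]
nonce = next(n for n in range(1 << 24) if chu_ko_nu_attempt(zones, n))
# The same nonce certifies work for all four zone blocks simultaneously.
```

Because the solved puzzle commits to every zone's candidate block at once, an attacker cannot concentrate hash power on one zone without paying the cost of mining for all of them.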
Chainspace [
43] utilizes the Sharded Byzantine Atomic Commit (S-BAC) protocol, a combination of Atomic Commit and Byzantine Agreement protocols, to ensure consistency across transactions that involve multiple shards. This method guarantees that a transaction must be unanimously approved by all shards it touches before it can be committed, thus maintaining transaction integrity. Although the S-BAC protocol maintains high integrity in cross-shard transactions, the need for unanimous clearance may introduce inefficiencies or delays, especially as the number of shards increases. In practice, S-BAC works like a two-phase commit extended to sharded environments. For instance, if a transaction spans Shard A and Shard B, both shards first enter a ‘prepare’ phase and lock the resources. Only if both confirm in the ‘commit’ phase is the transaction finalized; otherwise, both shards roll back. This ensures atomicity across shards, though at the cost of higher communication overhead.
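The prepare/commit flow described above can be sketched as a plain two-phase commit; this omits the intra-shard Byzantine Agreement that S-BAC layers underneath each phase:

```python
class Shard:
    def __init__(self, name):
        self.name = name
        self.locked = False
        self.committed = []

    def prepare(self, tx) -> bool:
        """Phase 1: validate locally and lock the touched objects.
        A real shard would run BFT agreement to produce this vote."""
        self.locked = True
        return True

    def commit(self, tx):
        """Phase 2 (success): finalize and release the locks."""
        self.committed.append(tx)
        self.locked = False

    def abort(self, tx):
        """Phase 2 (failure): roll back and release the locks."""
        self.locked = False

def s_bac(tx, shards):
    """Commit tx only if every involved shard votes 'prepared';
    otherwise all shards roll back, preserving atomicity."""
    if all(s.prepare(tx) for s in shards):
        for s in shards:
            s.commit(tx)
        return "committed"
    for s in shards:
        s.abort(tx)
    return "aborted"

a, b = Shard("A"), Shard("B")
assert s_bac("tx1", [a, b]) == "committed" and "tx1" in a.committed
```

The communication overhead noted in the text is visible here: every cross-shard transaction costs two message rounds per involved shard, and locks are held across both.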
Meepo [
71], on the other hand, does not introduce a new consensus mechanism but rather employs a non-modified PoA, relying on the existing trust model inherent in its consortium blockchain framework. Meepo’s use of PoA restricts the blockchain’s flexibility and openness, making it less suitable for decentralized or permissionless networks, even though it simplifies the consensus process in a regulated setting.
Kronos [
66] distinguishes between “happy” and “unhappy” paths in cross-shard transaction processing. In normal operations, transactions are processed or rejected with minimal overhead using standard intra-shard Byzantine Fault Tolerance (BFT), while the unhappy path invokes additional rollback mechanisms to ensure atomicity and shard consistency. AWS experiments demonstrate that Kronos achieves high TPS and low latency under various network synchrony and Byzantine fault conditions. The protocol’s transaction integrity and atomicity guarantees are established through formal security analysis presented in this work, rather than through empirical experiments. However, the rollback mechanism introduces significant communication overhead and increases finalization latency under persistent fault conditions. Moreover, the protocol assumes low rollback frequency, and frequent rollbacks in highly adversarial environments could lead to severe TPS degradation. Kronos also does not dynamically adapt committee memberships or quorum thresholds based on real-time fault rates, limiting flexibility.
DL-Chain [
67] modularizes consensus by dividing transaction proposal and finalization into two distinct layers. Proposer shards are responsible for assembling and processing transactions, while finalizer committees independently validate and finalize these transactions. This architectural separation enhances parallelism and fault isolation, resulting in significant TPS improvements as demonstrated in experimental results. Node assignment to proposer shards and finalizer committees is performed at the beginning of each epoch using a randomness process, and these assignments remain fixed throughout the epoch. Consequently, DL-Chain does not support dynamic reassignment or migration of workloads during an epoch. In scenarios with highly variable or uneven transaction loads, some proposer shards may become bottlenecks due to the absence of intra-epoch load balancing mechanisms. If a finalizer committee leader becomes non-responsive due to validator churn or failure, DL-Chain employs a view change protocol within the Fast Byzantine Fault Tolerance (FBFT) consensus algorithm to replace the faulty leader and restore liveness within the committee.
In Dynamic sharding, DYNASHARD [
68] adopts a hybrid consensus architecture, utilizing BFT for intra-shard transaction validation and a combination of Multiparty Computation (MPC) and threshold signature schemes for global coordination. This design ensures that transaction commitments, both within and across shards, are achieved with strong atomicity and security guarantees. The protocol’s evaluation demonstrates resilience to adversarial behaviors, including collusion and double-spending, through comprehensive security analysis and simulation-based validation. Nonetheless, the use of threshold signature aggregation and MPC introduces non-trivial computational and bandwidth overheads. These cryptographic protocols require each participant to compute and exchange partial signatures or intermediate values in multiple rounds, significantly increasing the computational workload and network traffic compared to traditional consensus mechanisms. As transaction volume and committee size scale, these overheads may impact TPS and latency, making DYNASHARD less suitable for high-frequency or latency-sensitive applications (e.g., [
51]). Additionally, DYNASHARD does not incorporate early commitment or fast-finality optimizations, leaving it potentially susceptible to synchronization delays during periods of peak system concurrency.
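The (t, n) threshold idea underlying such schemes can be illustrated with Shamir secret sharing, on which many threshold signature constructions are built; this is a generic sketch, not DYNASHARD's protocol:

```python
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is in GF(P)

def share(secret, t, n):
    """Split `secret` into n shares such that any t of them
    reconstruct it: sample a random degree-(t-1) polynomial with
    f(0) = secret and hand out points f(1), ..., f(n)."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
            for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret from
    any t distinct shares."""
    total = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

shares = share(secret=12345, t=3, n=5)
assert reconstruct(shares[:3]) == 12345   # any 3 of 5 shares suffice
assert reconstruct(shares[1:4]) == 12345
```

In a threshold signature scheme the analogous property holds for partial signatures: any t-of-n committee members can jointly produce a valid signature, which is what lets DYNASHARD-style coordination tolerate some unavailable or malicious members, at the multi-round communication cost described above.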
AEROChain [
55] uses PBFT at two levels: physical shards validate local transactions, while the logical shard handles migration proposals during the reconfiguration phase. The contribution lies in separating state migration from transaction consensus. This layered PBFT structure provides modularity but introduces coordination overhead and potential bottlenecks during high migration volumes. The approach was validated as part of the full AEROChain simulation, but no consensus-specific benchmarks were isolated. The absence of fast-path execution or rollback handling limits resilience to stalled consensus rounds.
SkyChain [
73] uses standard BFT protocols for intra-shard consensus without modifications or enhancements. Its novelty lies in DRL-powered re-sharding, not consensus innovation. Their method assumes fewer than one-third faulty nodes but lacks rollback, fast-track, or speculative commit techniques. No simulations were presented to test consensus scalability. Thus, while structurally sound, its consensus mechanism is basic, and no fallback mechanisms are discussed.
While in Layered Sharding, SPRING [
74] employs PBFT as the intra-shard consensus protocol for both A-Shard and T-Shard. The protocol operates under the partial synchrony assumption common to BFT systems. SPRING’s DRL agent dynamically assigns new addresses to shards in order to balance transaction load and minimize cross-shard transactions. In parallel, the protocol includes a periodic reconfiguration phase in which consensus nodes are reshuffled across shards to maintain security and prevent persistent collusion. While workload imbalance among shards may still arise due to the power-law distribution of transaction activity, SPRING’s design seeks to balance TPS and fairness without requiring dynamic committee resizing or more advanced consensus adaptations. Experimental results show that SPRING reduces cross-shard transaction ratio and improves TPS, but all evaluated consensus groups are periodically rotated and operate under standard PBFT assumptions.
In Dynamic (e.g., [
25,
52,
53]) and Layered Sharding (e.g., [
24,
54]), a key challenge is designing consensus protocols that efficiently coordinate how blocks are generated and verified across shards. These approaches typically rely on existing, non-modified consensus mechanisms such as PoW, PBFT, or Byzantine-based techniques, which do not introduce new consensus mechanisms but are essential for the proper functioning of these sharded designs. As the demand for more effective and scalable approaches grows, reliance on pre-existing consensus processes may limit the flexibility and scalability of Dynamic and Layered Sharding systems.
Recent efforts have explored the integration of AI into sharding and consensus mechanisms to enhance blockchain scalability, fairness, and energy efficiency. El Mezouari and Omary in [
81] present a hybrid consensus framework that combines PoS for block creation with AI-enhanced sharding for transaction validation. In their approach, decision tree algorithms dynamically allocate tasks to shards based on network load and historical node behavior. Entropy measures and Haversine distance metrics are used to optimize shard load distribution and minimize cross-shard communication overhead. El Mezouari and Omary in [
81] argue that this model mitigates the centralization risks of pure PoS while improving decentralization, TPS, and energy efficiency compared to traditional PoW models. However, incorporating AI-driven shard management introduces challenges related to algorithmic transparency, susceptibility to model drift, and dependency on high-quality, unbiased training datasets. Additionally, the operational complexity of coordinating between PoS and intelligent sharding logic could pose risks to system stability if not rigorously optimized.
Similarly, Chen et al. in [
82] propose Proof-of-Artificial Intelligence (PoAI) as an alternative to traditional PoW and PoS consensus mechanisms. In PoAI, nodes are classified into “super nodes” and “random nodes” using Convolutional Neural Networks (CNNs) trained on metrics such as transaction volume, network reliability, and security posture. Validators are dynamically selected based on capability scores rather than hash power or stake, aiming to reduce resource consumption and promote fairer node rotation. While PoAI offers improvements in efficiency and energy conservation, it introduces concerns about model explainability, fairness, and vulnerability to adversarial attacks. The criteria defining “super nodes” could inadvertently concentrate power among high-resource participants, undermining decentralization goals, particularly if CNN biases are not properly mitigated.
To address the limitations observed in conventional sharding and consensus mechanism designs, emerging AI-augmented mechanisms offer promising alternatives. Consensus mechanisms like PoAI [
82] leverage machine learning models to intelligently assign validator roles, thereby improving transaction confirmation speed and reducing energy consumption. Similarly, hybrid approaches that integrate PoS with AI-based shard reconfiguration [
81] provide dynamic adaptability to evolving network conditions. However, despite demonstrating measurable performance improvements, these AI-driven designs introduce new challenges related to transparency, fairness, and adversarial resilience. Future work must critically address these issues, including enhancing model interpretability, safeguarding against manipulative behaviors, and developing lightweight validation protocols, before widespread deployment in permissionless blockchain environments can be realized.
Li et al. in [
83] focus on improving TPS in sharded blockchain systems through optimization of consensus-layer parameters, such as block size, shard count, and time interval. Their work introduces Model-Based Policy Optimization for Blockchain Sharding (MBPOBS), a Reinforcement Learning (RL) framework that uses Gaussian Process Regression to model blockchain performance and guides parameter optimization via the Cross-Entropy Method. The learned model is used to predict performance outcomes and select optimal configuration policies in a sample-efficient manner. Simulation results show that MBPOBS yields substantial TPS improvements (1.1× to 1.26×) compared to model-free RL baselines (Batch Deep Q-learning and Deep Q-Network with Successor Representation). The primary strength of this work lies in its statistically grounded, sample-efficient method for optimizing consensus-related parameters. However, the study is limited to consensus performance and does not incorporate aspects such as trust, shard reliability, or cross-shard fault tolerance. Additionally, all evaluation is performed in a simulated environment, though the model’s robustness is tested under varying rates of malicious nodes (adversarial settings).
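The cross-entropy loop at the core of such a parameter search can be sketched against a toy surrogate; the quadratic below stands in for the learned Gaussian Process model, and all constants (optimum at block size 800, 12 shards) are invented for illustration:

```python
import random
import statistics

def surrogate_tps(block_size, shard_count):
    """Stand-in for the learned performance model (a Gaussian Process
    in MBPOBS); a toy quadratic peaking at (800, 12) with TPS 5000."""
    return 5000 - 0.002 * (block_size - 800) ** 2 - 3 * (shard_count - 12) ** 2

def cross_entropy_optimize(iters=30, samples=50, elite=10):
    """Cross-Entropy Method: sample configurations, keep the elite
    fraction, refit the sampling distribution, and repeat."""
    mu = [500.0, 8.0]        # initial mean for (block_size, shard_count)
    sigma = [200.0, 4.0]
    for _ in range(iters):
        pop = [(random.gauss(mu[0], sigma[0]),
                max(1.0, random.gauss(mu[1], sigma[1])))
               for _ in range(samples)]
        pop.sort(key=lambda p: surrogate_tps(*p), reverse=True)
        best = pop[:elite]
        mu = [statistics.mean(p[i] for p in best) for i in (0, 1)]
        sigma = [statistics.stdev(p[i] for p in best) + 1e-3 for i in (0, 1)]
    return mu

block_size, shard_count = cross_entropy_optimize()
# The search distribution contracts around the surrogate's optimum.
assert surrogate_tps(block_size, shard_count) > 4000
```

The sample efficiency claimed for MBPOBS comes from evaluating candidates against the cheap learned model rather than the real (or fully simulated) blockchain, querying the expensive system only to refine the model.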
In conclusion, while these various consensus mechanisms provide effective approaches for managing blockchain transactions across different systems, they also present challenges related to scalability, reliability, and complex implementation. A thorough evaluation of each approach is essential to determine the best strategy for maintaining system performance and security. The scalability and adaptability of blockchain networks, especially in large or dynamic environments, are at risk due to bottlenecks or inefficiencies, whether from traditional consensus mechanisms or newer, more complex methods that could cause new complications. Although AI-based models such as PoAI, hybrid PoS-AI designs, and model-based optimization frameworks have shown measurable improvements in simulation settings, most have not been validated under real-world or adversarial conditions. In addition, many of these approaches do not address Trust Establishment, cross-shard fault tolerance, or model explainability, limiting their practical assessment for deployment in decentralized environments.
7.3. Epoch Randomness
The integration of Epoch Randomness within various sharding techniques significantly enhances security and fairness by introducing unpredictability in node and shard assignments. This feature is crucial for preventing manipulation and ensuring equitable distribution of network load and responsibilities. Epoch Randomness enhances security, but its application across different sharding techniques presents challenges in balancing operational efficiency with increased complexity.
In Static Sharding, Epoch Randomness is implemented through distinct methods in several approaches. Elastico [
23] employs a Distributed Commit-and-XOR method, which creates a biased yet constrained set of random values that directly influence the PoW challenges in the subsequent epoch. This method ensures that randomness plays a role in the mining process, which helps strengthen security by making it harder for attackers to predict or manipulate the mining outcomes. Although the Commit-and-XOR method improves security, its complexity could result in excessive overhead, which may impact network efficiency as the network scales. OmniLedger [
51] uses a combination of VRF [
84,
85] and RandHound [
26], which ensures the randomness is both unbiased and unpredictable. RandHound’s approach, which involves dividing servers into smaller groups and using a commit-then-reveal protocol, ensures that the randomness includes contributions from at least one honest participant, thus maintaining integrity. The reliance on multiple server groups and protocols may slow down the process, especially in larger networks, potentially impacting overall performance. RapidChain [
70] opts for a Distributed Random Generation protocol optimized by a brief reconfiguration protocol based on the Cuckoo Rule [
86], allowing for rapid and unbiased randomness generation essential for its operational efficiency. Although RapidChain’s method speeds up transaction processing, it may not scale to settings that demand more complex randomness. As the need for sophisticated approaches grows, scalability issues may arise within blockchain networks.
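The commit-then-reveal pattern behind Elastico's commit-and-XOR scheme can be sketched as below; the subset selection and bias-bounding analysis of the actual protocol are omitted:

```python
import hashlib
import secrets
from functools import reduce

def commit(value: bytes) -> bytes:
    """Publish a binding commitment (hash) before revealing the value."""
    return hashlib.sha256(value).digest()

# Commit phase: each committee member publishes a hash of its secret value.
values = [secrets.token_bytes(32) for _ in range(5)]
commitments = [commit(v) for v in values]

# Reveal phase: opened values are checked against the earlier commitments,
# so no member can change its contribution after seeing the others.
assert all(commit(v) == c for v, c in zip(values, commitments))

# Epoch randomness: XOR of the revealed values, which then seeds the
# next epoch's PoW challenges.
seed = reduce(lambda x, y: bytes(a ^ b for a, b in zip(x, y)), values)
assert len(seed) == 32
```

A single honest contribution keeps the XOR unpredictable to everyone else, which is the intuition behind the honest-participant guarantees these schemes rely on, though withholding reveals can still bias the result, hence the "biased yet constrained" characterization above.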
Kronos [
66] generates epoch randomness for shard assignment using public randomness, which can be derived from PoW, PoS, or other secure sources. This process provides non-predictable, deterministic validator assignment. The protocol is designed to ensure consistent shard diversity and resilience against validator collusion through its random assignment process, assuming the underlying randomness is secure. However, if the randomness generation relies on PoW or PoS outputs rather than a publicly verifiable randomness beacon, it may be vulnerable if those outputs become skewed or dominated by a colluding miner or staker group.
DL-Chain [
67] generates randomness at each epoch using the outputs of VRFs, which provide unpredictability and allow local proof verification. The work in [
67] claims that the assignment process based on this randomness is bias-resistant, attributing this property to the use of VRF and VDF technologies as established in prior work. However, DL-Chain does not include simulation studies or experiments specifically evaluating the bias-resistance or security of its own randomness mechanism. Additionally, the protocol does not incorporate decentralized randomness aggregation or zero-knowledge proofs for randomness generation. This absence could, in principle, allow adversaries who control VRF private keys to subtly bias role allocations, a limitation that this work does not explicitly address.
Dynamic Sharding also explores Epoch Randomness but with varying emphases and integration depths. For example, the work in [
52] places significant focus on incorporating Epoch Randomness within its cross-shard transaction process to enhance security. Other Dynamic Sharding approaches, such as those introduced in [
25,
53], recognize the importance of randomness but do not delve as deeply into its systematic integration as seen in Static Sharding. The integration of certain mechanisms in Dynamic Sharding, such as those proposed in [
69], lacks proper organization, making them potentially vulnerable to manipulation or attacks. This risk is heightened in complex networks with high transaction volumes, and as the system scales, the threat becomes more evident.
DYNASHARD [
68] selects committees through an internal secure random process (in this context, committee selection refers to the random assignment of validators to serve as consensus groups for individual shards, responsible for transaction validation and consensus within the protocol). However, the protocol does not describe the frequency of reseeding or provide simulation evidence of committee diversity across epochs. Moreover, the randomness source is not publicly auditable, and in a permissionless adversarial setting, compromised entropy could bias committee selection without detection. This risk could be mitigated by adopting decentralized or externally verifiable randomness commitment protocols.
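One way to realize the externally verifiable randomness suggested above is a commit-reveal scheme, sketched below in Python. This is an illustrative mitigation, not DYNASHARD’s actual mechanism; the function names and the XOR combination rule are assumptions:

```python
import hashlib

def commit(value: bytes) -> bytes:
    """Publish H(r_i) before anyone reveals their contribution."""
    return hashlib.sha256(value).digest()

def combine_reveals(commitments: dict[str, bytes],
                    reveals: dict[str, bytes]) -> bytes:
    """XOR all verified contributions into one auditable seed.

    A reveal that does not match its earlier commitment is rejected,
    so no single participant can retroactively bias the seed.
    """
    seed = bytes(32)
    for pid, r in reveals.items():
        if commit(r) != commitments[pid]:
            raise ValueError(f"participant {pid} reveal does not match commitment")
        seed = bytes(a ^ b for a, b in zip(seed, hashlib.sha256(r).digest()))
    return seed
```

Any observer holding the commitments and reveals can recompute the seed, which addresses the auditability gap noted for internal randomness sources.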
AEROChain [
55] introduces a single shared random seed per epoch, which governs both node reassignment and migration transaction determinism. This ensures synchronization without external coordination overhead. However, this randomness is not generated through a verifiable process such as VRFs or public randomness beacons. Its centralization could lead to vulnerabilities if the seed is manipulated. The scope is to enable deterministic AERO policy execution, validated indirectly through simulation-based performance improvements but not through cryptographic robustness tests.
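The coordination-free property of a single shared seed can be sketched as follows (a hypothetical Python sketch; AEROChain’s actual reassignment policy is produced by its DRL agent, and the round-robin split here is an assumption):

```python
import random

def reassign_nodes(seed: int, node_ids: list[str],
                   num_shards: int) -> dict[str, int]:
    """Every honest node running this with the same epoch seed computes the
    same reassignment, so no extra coordination messages are needed."""
    rng = random.Random(seed)        # shared seed -> identical shuffle everywhere
    shuffled = sorted(node_ids)      # canonical order before shuffling
    rng.shuffle(shuffled)
    # Round-robin the shuffled nodes over shards for a balanced split.
    return {nid: i % num_shards for i, nid in enumerate(shuffled)}
```

The sketch also makes the stated vulnerability concrete: whoever controls `seed` fully controls the resulting placement, which is why verifiable seed generation matters.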
SkyChain [
73] supports epoch-based reconfiguration, but the source and security of its randomness are unspecified. While re-sharding intervals and block size are adjusted based on DRL policies, the randomness mechanism remains opaque. As a result, it lacks public verifiability or resistance to seed manipulation. Its randomness approach is implicit and not evaluated independently.
Layered Sharding (e.g., Pyramid [
24], and OverlapShard [
54]), meanwhile, incorporates Epoch Randomness into the Cross-shard Algorithm process, ensuring that transactions across different layers of shards maintain unpredictability and security. This method is important for preventing targeted attacks and ensuring a fair distribution of transaction loads across the network. The multiple layers in this approach likely add to its complexity, which could lead to inefficiencies and affect overall performance. These risks are more likely to arise in large-scale implementations.
On the other hand, SPRING [
74] assigns nodes to shards at registration based on the last bits of the PoW solution string. The protocol incorporates a periodic reconfiguration phase in which consensus nodes are regularly shuffled among shards using VRF-generated randomness, ensuring that shard compositions remain unpredictable and resistant to long-term adversarial planning or validator collusion.
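The registration rule can be sketched as below (a minimal Python sketch; hashing the solution with SHA-256 and requiring a power-of-two shard count are assumptions made for illustration, not details taken from SPRING):

```python
import hashlib

def shard_from_pow(pow_solution: bytes, num_shards: int) -> int:
    """Assign a node to a shard from the trailing bits of its PoW solution.

    Assumes num_shards is a power of two so that the last
    log2(num_shards) bits index a shard directly.
    """
    assert num_shards & (num_shards - 1) == 0, "num_shards must be a power of two"
    digest = hashlib.sha256(pow_solution).digest()
    return digest[-1] & (num_shards - 1)   # last bits of the solution string
```

Since the solution bits are effectively random, an adversary cannot choose its shard without redoing the PoW until a desired suffix appears.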
Traditional approaches relying on static randomness in sharding risk inefficiency as workloads become increasingly dynamic and predictable over time. AI-Shard introduced in [
49] addresses this by using a Graph Convolutional Network–Generative Adversarial Network (GCN-GAN) model to generate predictive node interaction matrices, enabling time-sensitive reshuffling that optimizes shard configurations based on anticipated workloads. This method demonstrably reduces cross-shard transactions and improves throughput in dynamic IoT environments. Wang et al. [
49] present prediction-based sharding as superior to static randomization, but it inherently trades pure randomness for workload-driven optimization. From a security perspective, reliance on historical data and model predictions could introduce patterns susceptible to adversarial exploitation if model errors or biases occur, though such risks are not discussed by the authors. Simulation results confirm AI-Shard’s performance advantages, but the ultimate security and adaptability of the framework would depend on the ongoing accuracy and robustness of its predictive models. To further enhance adaptability, Wang et al. [
49] introduce a dual-layer architecture with DRL-based parameter control (via Double Deep Q-Network), allowing for continuous reconfiguration in response to environmental changes. While this work highlights significant computational cost and challenges in real-world deployment, it lacks additional considerations such as robustness and model explainability.
Epoch Randomness plays an essential role in keeping blockchain networks secure and fair, and different sharding techniques employ it in distinct ways. Static Sharding relies on well-established, direct techniques, whereas Dynamic and Layered Sharding are still refining their randomness methods. Current research highlights the importance of randomness for the integrity and efficiency of blockchain tasks, and it also identifies areas where these methods may not yet be fully effective. Notably, several approaches lack cryptographic verifiability, such as the use of non-transparent seed generation in AEROChain [
55] and SkyChain [
73], which do not provide public randomness proofs or resistance to manipulation. In AI-based approaches like AI-Shard [
49], shard assignments are decided by model predictions rather than by secure random numbers, so if the model errs or is biased, attackers might detect and exploit patterns in node assignment. Moreover, approaches such as SPRING [
74] and DYNASHARD [
68] do not incorporate decentralized or auditable randomness sources, raising concerns about long-term entropy integrity in adversarial environments.
7.4. Cross-Shard Algorithm
A Cross-shard Algorithm optimizes transaction processing across different shards and plays an important role in improving the overall resilience of a blockchain network.
Within the Static Sharding technique, the Two-Phase Commit protocol, as utilized in OmniLedger [
51], represents a fundamental approach where transactions affecting multiple shards are handled atomically. This method employs a bias-resistant public-randomness approach to select large, statistically representative shards, ensuring fair and efficient transaction execution. The cross-shard Atomix in [
51] extends this concept by ensuring that transactions are either fully completed or entirely aborted, maintaining consistency across shards in a Byzantine environment. The Two-Phase Commit process can cause delays, especially in busy networks, leading to slower transaction speeds and a noticeable impact on overall performance.
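The lock-then-commit-or-abort pattern behind Atomix can be sketched as follows (a simplified Python sketch of the two-phase idea only; it omits the Byzantine-tolerant proof-of-acceptance/rejection machinery that OmniLedger actually uses, and all names are illustrative):

```python
class Shard:
    def __init__(self):
        self.locked: set[str] = set()
        self.spent: set[str] = set()

    def try_lock(self, inputs: set[str]) -> bool:
        # Phase 1: lock inputs unless already spent or locked by another tx.
        if inputs & (self.spent | self.locked):
            return False
        self.locked |= inputs
        return True

    def commit(self, inputs: set[str]):
        self.spent |= inputs
        self.locked -= inputs

    def abort(self, inputs: set[str]):
        self.locked -= inputs

def atomix(tx_inputs: dict[Shard, set[str]]) -> bool:
    """Either every involved shard commits the transaction, or none does."""
    locked = [s for s in tx_inputs if s.try_lock(tx_inputs[s])]
    if len(locked) == len(tx_inputs):           # every shard accepted -> commit
        for s, ins in tx_inputs.items():
            s.commit(ins)
        return True
    for s in locked:                            # any refusal -> roll back locks
        s.abort(tx_inputs[s])
    return False
```

The sketch also illustrates the latency concern: no shard can release its locks until every other shard has answered, so a single slow shard stalls the whole transaction.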
Al-Bassam et al. in [
43] implemented S-BAC, which further contributes to these robust cross-shard mechanisms through a five-phase process running from the Initial Broadcast to the Final Process Accept. This structure helps mitigate issues such as rogue BFT-Initiators by implementing a two-phase procedure that waits for a timeout before taking action, thus safeguarding the integrity of transaction processing. Although S-BAC enhances security, its five-phase procedure is likely to cause unnecessary delays, particularly in time-sensitive situations, making transaction execution more difficult.
In contrast, RapidChain [
70] adopts a faster approach with its Cross-shard Verification, using a routing method inspired by Kademlia that minimizes latency and storage requirements for each node (nodes store only their own shard’s data, not the entire blockchain). This routing enables quick identification and verification of transactions across shards, streamlining the validation process. RapidChain’s approach lowers latency, but its reliance on quick routing (where transactions are directed to the correct shard through a small number of efficient hops using a routing table) may pose security threats if not properly monitored, especially in a network with a large number of nodes.
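The core of Kademlia-style routing is the XOR distance metric, sketched below (an illustrative Python sketch, not RapidChain’s implementation; shard and peer identifiers are simplified to plain integers):

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of the two identifiers."""
    return a ^ b

def next_hop(routing_table: list[int], target_shard_id: int) -> int:
    """Forward toward the known peer whose ID is XOR-closest to the target.

    Each hop roughly halves the remaining XOR distance, so a lookup
    completes in O(log n) hops while each node stores only a small table.
    """
    return min(routing_table, key=lambda peer: xor_distance(peer, target_shard_id))
```

This is what keeps per-node storage small: a node needs only its routing table and its own shard’s data, at the cost of trusting intermediate hops to forward honestly.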
Wang and Wang in [
42] introduce a novel concept of Eventual Atomicity, in which transactions are efficiently completed without relying on the traditional Two-Phase Commit mechanism. This method allows for asynchronous, lock-free interleaving of transactions across zones, enhancing the overall TPS of the blockchain network and reducing the confirmation latency for cross-zone transactions. However, the absence of a conventional commit procedure may cause dependability and consistency issues in guaranteeing that all nodes reach a final consensus on transaction status.
Meepo [
71] presents a comprehensive investigation into sharded consortium blockchains, enhancing cross-shard efficiency, cross-contract flexibility, and shard availability through Cross-epoch and Cross-call mechanisms and a Partial Cross-call Merging Strategy, while maintaining rigorous transaction atomicity through a Replay-epoch. Although Meepo provides a thorough approach to meeting these demanding requirements, its reliance on these mechanisms may introduce additional complexity and overhead, particularly in large-scale deployments or environments with high transaction volumes.
Kronos [
66] introduces batch certification for cross-shard transactions using either vector commitments or Merkle trees, enabling atomic certification of multiple transactions in a single protocol instance. This “batch-proof-after-BFT” approach significantly reduces cross-shard messaging overhead, as a single batch proof can replace the need for separate proofs for each transaction, thus improving efficiency when processing large numbers of cross-shard transactions. The protocol provides strong atomicity guarantees by design, as proven in its formal analysis, and experiments confirm high TPS and low latency under various workloads. Within each batch, transactions are validated individually, and if a transaction fails validation, it is rejected without impacting other transactions in the batch.
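The Merkle-tree variant of batch certification can be sketched in Python as follows (an illustrative sketch of the general technique, not Kronos’s code; SHA-256 and last-leaf duplication for odd levels are assumptions):

```python
import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def merkle_root(leaves: list[bytes]) -> bytes:
    """One root commits to the whole transaction batch."""
    level = [h(l) for l in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate last hash on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling path for one transaction: (sibling_hash, sibling_is_left)."""
    level = [h(l) for l in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sib = index ^ 1
        proof.append((level[sib], sib < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(sibling + node) if is_left else h(node + sibling)
    return node == root
```

A single root certifies the batch across shards, while each transaction keeps an individual membership proof, which is consistent with Kronos’s per-transaction validation inside a batch.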
DL-Chain [
67] applies a relay-forwarding model for cross-shard transaction handling. While efficient under normal conditions, relay operations lack a rollback or compensation mechanism. If a relay node fails or a destination shard becomes unavailable, transactions may be left in an incomplete or pending state without automatic recovery. This work does not present experiments specifically on this failure scenario, but the absence of a rollback protocol means such risks are not fully addressed.
Additionally, the work in [
52] explores Dynamic Sharding by employing a Two-Phase Commit protocol and transaction splitting to manage cross-shard transactions effectively. It uses Anchorhash in conjunction with the Jump Consistent Hash Algorithm to minimize disruptions in node assignment mapping caused by sharding changes. This method is recognized for enhancing scalability through more efficient transaction processing and robust auditing capabilities. Even though dynamic reconfiguration during transaction processing offers flexibility, it may introduce performance bottlenecks in rapidly growing networks. Notably, the work in [
52] is the only work we identified within the Dynamic Sharding paradigm that incorporates a Cross-shard Algorithm, suggesting that this strategy is uncommon in Dynamic Sharding.
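The Jump Consistent Hash algorithm referenced above is the published construction of Lamping and Veach; a direct Python port is shown below (the port itself is ours, but the constants and loop are the standard algorithm):

```python
def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit key to one of num_buckets shards.

    When the shard count grows from n to n+1, only about 1/(n+1) of keys
    move, which is why [52] uses it to minimize node-remapping disruption.
    """
    b, j = -1, 0
    while j < num_buckets:
        b = j
        key = (key * 2862933555777941757 + 1) % (1 << 64)   # 64-bit LCG step
        j = int(float(b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b
```

Contrast this with naive `key % num_buckets` assignment, where changing the shard count remaps almost every key, triggering large-scale state migration.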
DYNASHARD [
68] validates cross-shard transactions using a combination of MPC and threshold signature schemes. This ensures atomic validation even under adversarial conditions, providing strong security guarantees against coordinated attacks. However, the reliance on synchronous MPC introduces significant computational overhead, and the protocol does not optimize for early commitment under low-fault scenarios, resulting in persistent high latency even when failures are rare.
Building on this foundation, DYNASHARD [
68] proposes a hybrid consensus mechanism that integrates threshold-based signatures with decentralized validation to enhance cross-shard transaction efficiency. Unlike traditional commit-based protocols, it employs Merkle-based synchronization and real-time shard boundary adjustments, allowing faster responsiveness under varying workloads. This model exemplifies a broader shift toward dynamic, lightweight cross-shard coordination mechanisms tailored for high-throughput, high-variability blockchain environments.
AEROChain [
55] features a Cross-shard Transaction Module (CSTM) inspired by Monoxide’s relay-based design. It processes intra-shard and cross-shard transactions during consensus, while migration-specific cross-shard transactions are executed during reconfiguration. Grouping by prefix-based account abstraction reduces overhead. However, AEROChain lacks atomic commit, rollback, or escrow mechanisms for cross-shard failures. The focus is on reducing cross-shard transaction frequency through migration rather than ensuring atomicity. The module is embedded in the DRL-evaluated framework but not tested separately for failure tolerance.
SkyChain [
73] claims cross-shard support, but its implementation lacks clarity. There is no specification of consistency guarantees or error recovery protocols in partial transaction failure scenarios. Its cross-shard mechanism is undeveloped and not supported by simulation results or architecture diagrams, making this a major limitation in transactional robustness.
Finally, the Layered Sharding architecture in Pyramid [
24] offers a sophisticated framework for managing cross-shard transactions. It employs a combination of internal (i-shard) and bridging (b-shard) shards, with b-shard nodes tasked with verifying and proposing blocks that span multiple shards. In contrast, OverlapShard [
54] primarily relies on a structure comprising both Actual and Virtual shards. The work in [
24] supports a unique block preparation process using CoSi for scalability, which needs further refinement to enhance its effectiveness. While the Layered Sharding approaches are innovative, they could complicate block preparation, potentially reducing TPS and increasing latency, particularly in complex, highly layered architectures.
SPRING [
74] minimizes cross-shard interactions by proactively assigning new addresses to shards using a DRL agent, significantly reducing the frequency of cross-shard transactions. For unavoidable cross-shard transactions, SPRING adopts a relay-based processing model, in which the transaction is processed on the source shard and then relayed to the target shard for finalization. However, the protocol does not specify any formal rollback, escrow, or atomic commit mechanisms to guarantee consistency in the event of partial failures or complex multi-shard dependencies. As such, SPRING may be less robust in scenarios involving intricate, interdependent cross-shard transactions.
With an emphasis on improving cross-shard transactions to increase scalability, reliability, and efficiency, these diverse approaches (e.g., [
24,
52]) demonstrate the ongoing innovation in blockchain technology. However, each approach has unique drawbacks, particularly in terms of added complexity and potential performance compromises. A major challenge in creating reliable and effective blockchain operations is finding the optimal balance between the demands of large-scale, decentralized networks and the need for robust performance, scalability, and security. Several works, such as AEROChain [
55] and SkyChain [
73], do not provide explicit guarantees for transactional atomicity or rollback mechanisms to manage partial cross-shard failures. This absence makes it difficult to comprehensively evaluate their fault tolerance, as there is insufficient evidence regarding their behavior under adverse or failure scenarios. Additionally, methods employing batching or multi-phase coordination, including Kronos [
66] and S-BAC [
43], inherently introduce latency under conditions of low transaction volume or in time-sensitive contexts such as real-time financial systems and IoT applications, where rapid transaction processing is critical. Critically, these protocols lack detailed mitigation strategies, including adaptive coordination methods or simplified fallback procedures, to effectively minimize latency under these less demanding operational scenarios. Consequently, this unaddressed latency significantly impacts the evaluation of their practical responsiveness and reliability, particularly in high-throughput or adversarial environments.
7.5. Cross-Shard Capacity
Cross-shard Capacity is a distinctive method for optimizing shard operation and transaction processing across diverse blockchain environments. It involves adjusting the number and size of shards to decrease cross-shard communication. Notably, this method is not employed in Static or Layered Sharding techniques.
C. Chen et al. in [
69] propose a method in Dynamic Sharding that modifies user distribution according to real-time network conditions in a public blockchain. This method differs from conventional sharding techniques that often allocate users randomly, hence enhancing long-term system performance in dynamic settings. This protocol includes stages such as the Validator Redistribution Approach and the Validator Vote and System Reconfiguration Approach, which allow for ongoing adjustments to shard composition in response to changing network demands. Although this dynamic protocol improves flexibility, the frequent need for reconfiguration and redistribution may increase operational complexity and affect the stability of the system as a whole in larger networks.
Concurrently, Tao et al. in [
25] present a methodology that correlates the number of miners per shard with the transaction volume within each shard. This methodology, illustrated by the MaxShard system, guarantees that shards experiencing elevated transaction volumes are allocated adequate processing resources to sustain performance. Furthermore, it utilizes a combination of Inter-shard Merging, Intra-shard Transaction Selection, and the Parameter Unification Method to optimize cross-shard communication, enabling transaction validation within each shard, thus reducing latency and increasing TPS. However, if demand fluctuates rapidly, tying miner allocation to transaction volumes may result in resource imbalances among shards, which could in turn worsen computational overhead and create bottlenecks.
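Volume-proportional resourcing of this kind can be sketched with a largest-remainder allocation (an illustrative Python sketch, not MaxShard’s actual method; the allocation rule is an assumption):

```python
def allocate_miners(tx_volumes: list[int], total_miners: int) -> list[int]:
    """Split a fixed miner pool across shards in proportion to tx volume."""
    total_tx = sum(tx_volumes)
    exact = [total_miners * v / total_tx for v in tx_volumes]
    alloc = [int(x) for x in exact]
    # Hand out the miners lost to rounding, largest fractional part first.
    leftovers = sorted(range(len(exact)), key=lambda i: exact[i] - alloc[i],
                       reverse=True)
    for i in leftovers[: total_miners - sum(alloc)]:
        alloc[i] += 1
    return alloc
```

The sketch makes the fluctuation risk concrete: the allocation is only as good as the volume snapshot it was computed from, so rapid demand shifts leave shards over- or under-provisioned until the next reallocation.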
Effective Sharding Consensus Mechanism in [
52] focuses on the initial assignment of nodes to shards and the subsequent redistribution as shards evolve. This dynamic allocation helps maintain balance across the network and adapt to changes (e.g., shard addition or deletion). The Node Remapping process, integral to this approach, ensures nodes are spread equitably across the remaining shards, maintaining the integrity and efficiency of the network. Even if dynamic allocation improves balance, frequent node remapping can increase computational load and cause delays, especially in networks with high shard volatility.
The Dynamic Sharding Protocol Design for Consortium Blockchains in [
72] utilizes Unspecified Agents to oversee transactions inside a consortium blockchain framework. These agents allocate transactions to shards according to the sender’s information and oversee the consensus process throughout the network. The concept uses random integers to construct unpredictable routing tables for each epoch and incorporates both Boss Shard and Normal Shard components, hence enhancing security against attacks on shard integrity. For larger deployments, the greater system complexity that comes with enhanced security through Unspecified Agents and random routing must be weighed against significant scalability and maintainability challenges.
By creating public and private keys and exchanging identifying information, each node in this method [
72] initializes and guarantees secure connections inside the consortium blockchain architecture. In order to ensure reliable transaction processing and system resilience, the method also includes successive phases such as Transaction Sharding and Micro Block Generation, Full Block Generation, and Synchronization. Despite its resilience, the multi-phase design may result in synchronization delays (particularly during the aggregation and dissemination of micro-blocks by the boss shard and their verification across other shards). This delay is notably intensified under severe transaction loads, where large volumes of simultaneous transactions increase complexity, communication overhead, and processing latency, potentially impacting overall system performance.
DYNASHARD [
68] dynamically regulates its transaction load through the support of shard splitting and merging, informed by real-time assessments of transaction volume and system resource use. When a shard’s transaction volume or resource utilization surpasses a specified splitting threshold, the shard is divided into numerous smaller shards to evenly distribute the load and preserve transaction processing efficiency. Should a shard exhibit continuously low activity, as evidenced by its transaction volume and resource utilization falling beneath a designated merging threshold for multiple consecutive epochs, DYNASHARD may merge it with other underutilized shards to improve overall resource efficiency and system performance. Because of this adaptable approach, DYNASHARD can continue to function reliably even when transaction volumes fluctuate or remain low.
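The described threshold policy can be sketched as a reconfiguration planner (a hedged Python sketch of the policy as described; threshold values, the `patience` parameter, and all names are illustrative, not taken from DYNASHARD):

```python
def plan_reconfiguration(loads: dict[str, list[int]], split_at: int,
                         merge_below: int, patience: int):
    """Decide which shards to split or merge from recent per-epoch loads.

    Split when the latest load exceeds the splitting threshold; merge only
    after `patience` consecutive epochs below the merging threshold, so a
    brief lull does not trigger a costly merge.
    """
    to_split, to_merge = [], []
    for shard, history in loads.items():
        if history[-1] > split_at:
            to_split.append(shard)
        elif len(history) >= patience and all(l < merge_below
                                             for l in history[-patience:]):
            to_merge.append(shard)
    return to_split, to_merge
```

Note that this planner only decides *what* to reconfigure; the hard part discussed next, atomically migrating shard state once the decision is made, is outside its scope.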
However, the shard splitting and merging process introduces considerable complexity in maintaining system consistency. During a split, the shard’s entire state, including ongoing transactions, must be atomically divided among the new child shards. Validators must update their local views of the network topology, while ensuring that no transactions are lost, duplicated, or incorrectly assigned. Similarly, merging shards requires reconciling multiple independent shard states into a single coherent ledger without introducing transaction conflicts or state corruption. These transitions are highly sensitive to timing and synchronization accuracy, and if validators operate on outdated shard mappings or fail to synchronize their views properly, temporary inconsistencies could arise, making the system vulnerable to double-spending attacks or transaction replay.
Moreover, there is currently no specified rollback or recovery plan in place for DYNASHARD in the event that a split or merge operation fails in the middle of the process, such as validator crashes, partial network partitions, or anomalies in transaction queuing. Incomplete shard state changes, validator discord, or partial ledger divergence may arise from a failed migration if transactional atomicity guarantees are absent during shard reconfigurations. Even though dynamic shard restructuring greatly enhances scalability and resource efficiency, the problems show that it needs strong migration protections, fault-tolerant processes, and strict state validation mechanisms to keep the system’s integrity during transitional times.
AEROChain [
55] offers AERO, a DRL-based optimizer that balances shard loads and minimizes cross-shard transactions by utilizing prefix-level account grouping. The DRL policy is trained on real Ethereum data and evaluated against five benchmarks (e.g., SPRING [
74] and Monoxide [
42]). It attained a 31.77% increase in TPS compared to state-of-the-art approaches, owing to diminished cross-shard transaction ratios and enhanced migration operations. Nonetheless, its adaptation is reactive, and its adversarial robustness is only minimally evaluated. Although the prefix abstraction increases training efficiency, performance may be impaired by abrupt behavioral changes or malicious behavior.
SkyChain [
73] facilitates the modification of shard quantity and block size via DRL-driven re-sharding. It characterizes sharding dynamics as a Markov Decision Process and adjusts parameters to optimize security and performance. Nonetheless, its limitations include a lack of granularity in workload data and insufficient learning-based workload adaptability, unlike approaches such as AEROChain [
55]. No migration plan or optimization focused on cross-shard transactions is provided. Simulation results demonstrate that SkyChain attains approximately a 30–35% enhancement in overall TPS relative to fixed sharding baselines. However, the assessment is confined to aggregate measures and does not examine performance on a per-shard or per-transaction basis, thereby neglecting transaction-level implications.
Enhancing cross-shard communication and transaction processing by adjusting shard size and quantity to maximize Cross-shard Capacity is a common theme among these approaches (e.g., [
52,
69]). Methods such as dynamic user allocation [
69] (where a user denotes an account that submits transactions to the blockchain) and node reassignment [
52] (with nodes defined as servers or computers tasked with processing and validating transactions) facilitate workload distribution and ensure efficient transaction processing across shards. Moreover, the emphasis on minimizing cross-shard communication in On Sharding Open Blockchains [
25] and on enhancing TPS across every approach highlights the persistent effort to improve latency and TPS rates, which are critical metrics for blockchain network performance. While these approaches enhance TPS and inter-shard communication, dynamic shard management may lead to inefficiencies, especially in extensive or volatile networks. The approaches analyzed in the existing literature offer a thorough review of strategies for improving Cross-shard Capacity in both public and consortium blockchains. Despite their potential, these approaches face considerable challenges regarding complexity, scalability, and adaptability, especially in dynamic environments, and continuous adjustment is essential to maintain resilient performance.
Zhang and Xue in [
48] reformulated the shard allocation problem in the FLPShard model as a Single-Source Capacitated Facility Location Problem, allowing for variable node assignment based on inter-node interdependence, latency, and geographic proximity. The concept improves intra-shard efficiency and diminishes the frequency and expense of cross-shard transactions by clustering highly interacting nodes. This approach is especially effective in Industrial IoT settings, where physical infrastructure and organizational hierarchies affect communication dynamics. Although FLPShard demonstrates formal rigor and quantifiable enhancements in TPS and latency, it introduces significant computing complexity and assumes constant environmental conditions, which may restrict its applicability in more dynamic or diverse deployment scenarios.
On the other hand, the AI-Shard framework in [
49] utilizes a predictive and adaptive approach for capacity management. It utilizes a GCN-GAN model to predict changing node interaction matrices and leverages a DRL controller to dynamically optimize shard configurations. This anticipatory technique enables AI-Shard to proactively synchronize shard structures with fluctuating communication patterns, reducing cross-shard interactions and enhancing TPS, particularly in Building IoT applications. The dual-layer architecture of AI-Shard, comprising a coordinating main shard and adaptive sub-shards, enhances scalable collaboration in densely populated IoT contexts. The reliance on past data for training, sensitivity to DRL parameter tuning, and the possibility of unstable convergence, however, pose significant difficulties that could compromise real-time responsiveness in hostile or volatile networks.
Lin et al. in [
87] emphasize improving the scalability and communication efficiency of blockchain-based Federated Learning (FL) systems within Intelligent Transportation Systems. Their work tackles the constraints of current two-layer blockchain systems that depend on static shard parameters and entail substantial inter-chain communication costs. The primary contributions are a reputation-based shard selection method to exclude malicious nodes, a streamlined shard transmission strategy to minimize overhead, and a DRL-based adaptive sharding controller to dynamically optimize shard configurations. The proposed approach incorporates DRL to continually adjust shard configurations based on environmental insights, while subjective logic is employed to model trust both within and among shards. The simulation findings indicate that the framework enhances throughput and decreases latency while maintaining FL accuracy. The main advantage lies in the efficient integration of trust management with DRL-based shard adaptation, directly tackling Cross-shard Capacity challenges. Since the work was assessed using a simulator, real-world validation is required to determine its practical efficacy.
Chen et al. in [
88] focus on scalability and flexibility in blockchain-enabled IoT systems by tackling reconfiguration delays and suboptimal shard allocation. To organize IoT devices into optimized shards, the proposed Block-K Clustering approach integrates multiple algorithms, including K-means clustering, Genetic Algorithms, and the Cuckoo Rule, to enhance shard formation and efficiency. A DRL model is incorporated to dynamically modify the number of shards and consensus parameters according to transactional patterns and device clustering dynamics. Their system represents the network as a Dynamic Transaction Flow Graph, utilizing a DRL component designed to enhance cluster modularity while concurrently reducing inter-shard communication. Simulation outcomes validate that the approach significantly boosts TPS, diminishes reconfiguration duration, and strengthens system resilience against malicious activity. A primary advantage is the integration of many algorithms that facilitates real-time, performance-sensitive shard modification. Nonetheless, this work does not investigate the framework’s scalability in exceedingly large IoT networks, nor does it thoroughly assess the communication costs associated with frequent reconfiguration.
Taken together, both frameworks in [
87,
88] emphasize that optimizing Cross-shard Capacity requires not only increasing shard count but also smart management of shard composition, interaction density, and responsiveness to fluctuating demands. Adaptive capacity optimization is an underexplored design component, yet it is critical for developing scalable, robust sharded blockchains capable of supporting sophisticated, high-throughput applications in IoT and beyond. However, many of these techniques rely primarily on simulation-based evaluations, leaving little insight into their efficacy in real-world deployment conditions or hostile network scenarios. Furthermore, numerous systems, especially those utilizing DRL-based reconfiguration, pay insufficient attention to coordinating shard rebalancing during state transitions and to maintaining consistency amid delayed synchronization or partial node updates. These unresolved issues hinder the evaluation of robustness and practicality in dynamic settings.
7.6. DAG Block Structure
The integration of DAG offers a sophisticated approach to managing data structures and transaction processing [
89,
90]. This method naturally aligns with the inherent characteristics of blockchain transactions and object interactions, in which each transaction may invalidate certain inputs and generate new active objects as outputs, thereby forming a DAG. This feature is predominantly observed in Static Sharding.
For example, OmniLedger [
51] employs a block-based DAG (BlockDAG) to enhance the concurrent processing of blocks within its architecture. In this approach, each block can reference multiple parent blocks, effectively capturing the simultaneous actions and interdependencies of transactions across different blocks. Similarly, the Hash-DAG structure [
43] represents transactions and objects in a directed graph format, illustrating the dynamic relationships among existing objects, transactional modifications, and newly created outputs. However, the simultaneous interdependencies in both BlockDAG and Hash-DAG structures can complicate data consistency, especially under high transaction loads where tracking multiple parent blocks may delay consensus.
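The multi-parent structure that distinguishes a DAG from a chain can be sketched minimally as follows (an illustrative Python sketch; real BlockDAG and Hash-DAG implementations carry headers, signatures, and transaction payloads omitted here):

```python
class Block:
    def __init__(self, block_id: str, parents: list["Block"]):
        # Unlike a chain block, a DAG block may reference several parents,
        # capturing concurrent blocks and their interdependencies.
        self.id = block_id
        self.parents = parents

def ancestors(block: Block) -> set[str]:
    """All blocks reachable through parent links (depth-first traversal)."""
    seen: set[str] = set()
    stack = list(block.parents)
    while stack:
        b = stack.pop()
        if b.id not in seen:
            seen.add(b.id)
            stack.extend(b.parents)
    return seen
```

The traversal makes the consistency cost visible: validating a block means reasoning over *every* ancestor path, not a single linear history, which is the tracking burden noted above under high transaction loads.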
In the approach proposed by Lee and Kim in [
91], the integration of DAGs into blockchain sharding frameworks provides an advanced method for managing asynchronous transactional data, particularly in FL environments with non-independent and identically distributed (non-IID) data. Each shard maintains an independent DAG ledger, enabling heterogeneous nodes to asynchronously record and aggregate model updates without serialization bottlenecks. Their architecture aligns naturally with FL, where updates frequently diverge from earlier model states.
To secure aggregation under adversarial conditions, the framework employs a novel tip selection algorithm based on model accuracy, similarity, and update multiplicity, mitigating risks such as poisoned models. While integrating a DAG enhances TPS and resilience, it also presents new challenges. For instance, careful coordination is needed to verify transaction multiplicity and maintain aggregation consistency during asynchronous updates. Additionally, if the random tip selection is poorly tuned, it can hinder model convergence. Nonetheless, empirical evaluations demonstrate that the DAG-enabled system significantly outperforms chain-based FL in robustness and accuracy.
While Lee and Kim [
91] primarily design their layered DAG model for FL rather than general-purpose blockchain sharding, their work illustrates the broader potential of DAGs to facilitate scalable, parallel transaction aggregation in decentralized environments, albeit with some trade-offs in communication overhead for general transactional systems.
The adoption of DAG structures, as seen in [
43,
51], offers significant advantages in terms of transaction processing speed and data integrity by enabling more flexible and efficient block validation and propagation mechanisms. Nevertheless, achieving a consistent transaction history within a DAG remains challenging because there is no straightforward method for tracking the complete sequence of interdependent transactions, making discrepancy recovery difficult.
In summary, while DAGs can enhance the flexibility and speed of blockchain transaction processing, managing complex transaction histories and resolving inconsistencies continue to pose significant challenges. These issues could compromise the long-term integrity and reliability of sharded blockchains, indicating that further research and development are necessary to fully realize the potential of DAG-based approaches in improving blockchain scalability and efficiency. Moreover, unlike traditional linear blockchains, DAG-based systems require more sophisticated conflict resolution and ordering strategies, particularly in multi-shard environments where concurrent updates may propagate with inconsistent timing. While domain-specific implementations such as FL have shown promise, general-purpose blockchain networks integrating DAGs must contend with broader coordination, synchronization, and validation overheads that can impact scalability.
7.7. Availability Enhancement
In a consortium blockchain network, the primary goal of Availability Enhancement is to improve network resilience by providing a contingency for shard failures. To the best of our knowledge, this mechanism is uniquely implemented in the Static Sharding approach in [
71].
The work presented in Meepo [
71] focuses on enhancing shard robustness through a backup mechanism called Shadow Shard-Based Recovery. In this approach, each shard is equipped with several shadow shards that act as backup servers. When a primary shard encounters issues, a consortium member can seamlessly switch to a corresponding backup shard to ensure uninterrupted blockchain operations. Zheng et al. in [
71] implement shadow shards to reinforce each shard's robustness, ensuring that the overall system continues to function despite individual component failures. While such redundancy improves availability relative to centralized systems prone to single points of failure, managing shadow shards adds overhead that becomes increasingly evident in large-scale deployments. Frequent switching between main and backup shards can slow operations and consume additional resources, raising concerns about scalability and overall resource utilization.
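A minimal sketch of the shadow-shard failover pattern follows. The synchronous replication and promotion logic, and all names, are simplifying assumptions for illustration, not Meepo's actual consortium implementation:

```python
class ShardGroup:
    """A primary shard with hot-standby shadow shards (illustrative sketch)."""
    def __init__(self, shard_id, num_shadows=2):
        self.shard_id = shard_id
        self.primary = {"state": {}, "healthy": True}
        # Shadow shards mirror the primary's state as standbys.
        self.shadows = [{"state": {}, "healthy": True}
                        for _ in range(num_shadows)]

    def apply(self, key, value):
        """Apply a state update, failing over first if the primary is down."""
        if not self.primary["healthy"]:
            self._failover()
        self.primary["state"][key] = value
        for s in self.shadows:  # synchronous replication (simplified)
            s["state"][key] = value

    def _failover(self):
        """Promote the first healthy shadow to primary; raise if none remain."""
        for i, s in enumerate(self.shadows):
            if s["healthy"]:
                self.primary = self.shadows.pop(i)
                return
        raise RuntimeError(f"shard {self.shard_id}: no healthy shadow left")
```

Even in this toy form, the overhead discussed above is visible: every update is replicated to every shadow, so the cost of redundancy grows with the number of shadows regardless of whether a failure ever occurs.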
The implementation of Shadow Shard-Based Recovery within Meepo’s architecture [
71] offers a practical approach to the critical challenge of maintaining high availability and reliability in blockchain networks. Despite its benefits in improving resilience, the method can complicate maintenance and elevate operating expenses as the number of shards and shadow shards grows. This trade-off between resource efficiency and robustness underscores the challenges of achieving high availability in large-scale blockchain networks.
Kronos [
66] integrates rollback recovery into its cross-shard transaction framework, enabling the system to maintain liveness even when certain cross-shard transactions enter the unhappy path. If a cross-shard transaction partially commits to one input shard and another input shard subsequently determines its input is invalid, Kronos triggers an atomic rollback to revert the affected transaction. Experimental results show that system TPS decreases as rollback frequency increases, reflecting the overhead of handling unhappy paths; nonetheless, the evaluations demonstrate that Kronos remains live and continues processing transactions in the presence of rollbacks, albeit at reduced TPS.
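The commit-or-rollback behaviour on the unhappy path can be sketched as a two-phase pattern over input shards. This mirrors the idea of atomic rollback, not Kronos' actual protocol; all names and the UTXO-style state model are hypothetical:

```python
class Shard:
    def __init__(self, name, utxos):
        self.name = name
        self.utxos = set(utxos)  # spendable inputs held by this shard
        self.locked = set()      # inputs tentatively committed to in-flight txs

    def prepare(self, tx_input) -> bool:
        """Happy path: lock a valid input. Unhappy path: report it invalid."""
        if tx_input in self.utxos:
            self.utxos.discard(tx_input)
            self.locked.add(tx_input)
            return True
        return False

    def commit(self, tx_input):
        self.locked.discard(tx_input)  # input is consumed for good

    def rollback(self, tx_input):
        """Restore a tentatively spent input after an aborted transaction."""
        if tx_input in self.locked:
            self.locked.discard(tx_input)
            self.utxos.add(tx_input)

def cross_shard_tx(inputs):
    """inputs: list of (shard, tx_input) pairs. Either every input shard
    commits, or every partial commit is rolled back (atomicity sketch)."""
    prepared = []
    for shard, tx_input in inputs:
        if shard.prepare(tx_input):
            prepared.append((shard, tx_input))
        else:
            for s, inp in prepared:  # one shard rejected: revert the rest
                s.rollback(inp)
            return False
    for shard, tx_input in prepared:
        shard.commit(tx_input)
    return True
```

The rollback loop is exactly the source of the overhead noted above: each shard that already locked an input must be contacted again and its state reverted before the system can proceed.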
Further complexity arises when transactions produce cascading effects before final commitment, such as state transitions that indirectly influence other shards or involve off-chain acknowledgments. In such cases, rolling back a single transaction may not be sufficient to fully restore system integrity unless additional compensatory actions are taken. Moreover, while rollback improves fault tolerance, it increases coordination overhead during failure scenarios, as shards must synchronize their rollback states and confirm consistency before resuming operations. Under adversarial conditions where failures are deliberately triggered, frequent rollbacks could degrade system performance and expose shards to liveness bottlenecks. Although Kronos' integrated rollback mechanism represents a significant advance over sharding mechanisms lacking explicit recovery protocols, ensuring efficient and fully coherent rollback under complex dependency chains remains an open engineering challenge.
Even though Meepo’s Shadow Shard-Based Recovery [
71] method significantly enhances system resilience, scalability may be limited by the increased complexity and resource requirements associated with managing shadow shards. As the number of primary shards grows, the associated shadow shards must also be proportionally maintained, which can lead to increased storage, communication, and synchronization overhead. This fixed redundancy model does not incorporate dynamic adaptation to traffic or failure patterns, potentially resulting in underutilized resources. Furthermore, the authors do not provide empirical analysis or simulation results evaluating performance impacts under varying network sizes or failure rates, leaving the practical limits of this mechanism unquantified. This observation highlights the ongoing trade-off between high availability and the operational expenses of large-scale blockchain networks. In contrast, Kronos [
66] approaches availability through rollback-based recovery embedded in its consensus logic, which avoids shadow redundancy but introduces coordination complexity during failure handling.
7.8. Summary
Although many of the features covered in this section, such as Trust Establishment, Consensus Selection, Epoch Randomness, Cross-shard Algorithm, Cross-shard Capacity, DAG Block Structures, and Availability Enhancement, are widely utilized in general blockchain systems, they remain fundamental to the design of sharded blockchain architectures. Within sharding, these features are not simply adopted from broader blockchain frameworks. They are selectively adapted, extended, or re-engineered to address the specific demands of decentralized shard management, cross-shard consistency, and dynamic system scaling.
Most sharding approaches, whether Static, Dynamic, or Layered, rely on specialized implementations or variations of these core features to maintain TPS, resilience, and decentralization at scale. Their continued evolution is critical, particularly as sharded systems face increasingly complex operational and adversarial challenges.
In addition to these traditional approaches, recent developments have introduced AI-augmented techniques that incorporate predictive modeling, reinforcement learning, and reputation-driven strategies. While these AI-based methods are not directly rooted in the existing Static, Dynamic, or Layered Sharding approaches discussed in this section, they introduce novel features aimed at improving shard adaptability, workload prediction, security hardening, and resource optimization. As such, they represent an important emerging direction that complements and extends the evolution of sharding technologies.
The interplay between the features covered in this section, their specialization across different sharding techniques, and the gradual shift from broad to more targeted optimizations will be discussed in the next section, providing a deeper view of how sharding techniques have evolved over time.