Article

Decentralized Dynamic Heterogeneous Redundancy Architecture Based on Raft Consensus Algorithm

College of Computer Science and Technology, China University of Petroleum (East China), Qingdao 266580, China
*
Author to whom correspondence should be addressed.
Future Internet 2026, 18(1), 20; https://doi.org/10.3390/fi18010020 (registering DOI)
Submission received: 9 November 2025 / Revised: 23 December 2025 / Accepted: 23 December 2025 / Published: 1 January 2026
(This article belongs to the Section Cybersecurity)

Abstract

Dynamic heterogeneous redundancy (DHR) architectures combine heterogeneity, redundancy, and dynamism to create security-centric frameworks that can be used to mitigate network attacks that exploit unknown vulnerabilities. However, conventional DHR architectures rely on centralized control modules for scheduling and adjudication, leading to significant single-point failure risks and trust bottlenecks that severely limit their deployment in security-critical scenarios. To address these challenges, this paper proposes a decentralized DHR architecture based on the Raft consensus algorithm. It deeply integrates the Raft consensus mechanism with the DHR execution layer to build a consensus-centric control plane and designs a dual-log pipeline to ensure all security-critical decisions are executed only after global consistency via Raft. Furthermore, we define a multi-dimensional attacker model—covering external, internal executor, internal node, and collaborative Byzantine adversaries—to analyze the security properties and explicit defense boundaries of the architecture under Raft’s crash-fault-tolerant assumptions. To assess the effectiveness of the proposed architecture, a prototype consisting of five heterogeneous nodes was developed for thorough evaluation. The experimental results show that, for non-Byzantine external and internal attacks, the architecture achieves high detection and isolation rates, maintains high availability, and ensures state consistency among non-malicious nodes. For stress tests in which a minority of nodes exhibit Byzantine-like behavior, our prototype preserves log consistency and prevents incorrect state commitments; however, we explicitly treat these as empirical observations under a restricted adversary rather than a general Byzantine fault tolerance guarantee. Performance testing revealed that the system exhibits strong security resilience in attack scenarios, with manageable performance overhead. Instead of turning Raft into a Byzantine-fault-tolerant consensus protocol, the proposed architecture preserves Raft’s crash-fault-tolerant guarantees at the consensus layer and achieves Byzantine-resilient behavior at the execution layer through heterogeneous redundant executors and majority-hash validation. To support evaluation during peer review, we provide a runnable prototype package containing Docker-based deployment scripts, pre-built heterogeneous executors, and Raft control-plane images, enabling reviewers to observe and assess the representative architectural behaviors of the system under controlled configurations without exposing the internal source code. The complete implementation will be made available after acceptance in accordance with institutional IP requirements, without affecting the scope or validity of the current evaluation.

1. Introduction

The rise of Advanced Persistent Threats (APTs) and zero-day vulnerabilities in cyberspace [1] has rendered the traditional passive defense paradigm of boundary isolation and signature-based detection increasingly ineffective against the evolving landscape of cyberattacks. Dynamic heterogeneous redundancy (DHR), a core technology for active defense, enhances vulnerability tolerance by ‘creating heterogeneous execution environments’ and employing ‘redundant result comparison’. However, traditional DHR architectures depend on a centralized control module, which is a critical vulnerability in security defense. Centralized DHR systems are susceptible to two major risks: first, single-point failures, where the loss of the control node directly disables the entire security decision-making process; second, trust bottlenecks, where a malicious control node could manipulate isolation instructions or alter response comparison results, undermining data integrity. This flaw significantly limits the large-scale deployment of DHR in security-critical domains such as financial transactions and industrial control systems.
To overcome the limitations of centralization, distributed consensus algorithms have emerged as a key technical solution. The Raft consensus algorithm, which centers on ‘majority node consensus verification’, offers clear crash fault tolerance (CFT) capabilities (tolerating f < N/2 faulty nodes, where N is the total number of nodes in the cluster), ease of engineering implementation, and low latency. Raft has demonstrated industrial-grade utility in distributed systems such as Kubernetes [2]. This paper proposes a decentralized DHR architecture with a Raft consensus cluster as the control core, integrated with a DHR execution layer, to establish a defense paradigm based on ‘consensus-driven security decision-making and heterogeneity-based vulnerability tolerance.’ In this system, the leader node proposes logs (e.g., business requests, executor isolation instructions), while follower nodes participate in consensus voting to ensure uniform agreement on the system state. Additionally, the system leverages a set of heterogeneous execution components—service instances with varying operating systems, libraries, and algorithm implementations—to protect against attacks such as the ‘mass exploitation of common vulnerabilities’.
However, there are still notable gaps in the existing research: on one hand, few studies have deeply integrated the Raft consensus with the DHR architecture, and there is a lack of systematic designs for the collaboration logic of the “input proxy layer–consensus layer–execution layer–voting layer” network. For instance, key mechanisms such as the “distributed consensus submission of response logs (ResponseLogEntry)” mentioned in this paper have not yet formed standardized solutions; on the other hand, the formal description of attacker models for such decentralized architectures is insufficient, and there is a lack of quantitative trade-off analysis between security properties (consistency, availability) and performance overheads, making it difficult to clarify their defense boundaries and engineering implementation feasibility.
This study systematically constructs a decentralized defense framework that integrates “Raft consensus with heterogeneous redundancy”, addressing key limitations in the existing integration of distributed consensus algorithms and active defense technologies. By formally defining a multi-dimensional adversary model—encompassing external attackers, internal executor attackers, internal node attackers, and collaborative Byzantine attackers—the work establishes a consistency guarantee mechanism under the condition f < N/2, thereby enhancing the theoretical foundation for security analysis in decentralized DHR architectures. Additionally, mechanisms such as “decentralized log-based voting” and “time-triggered Leader rotation” introduced in this paper offer a new technical paradigm for incorporating dynamism and randomness into active defense systems.
To address the single-point-failure and trust-bottleneck risks of traditional DHR architectures, as well as the gaps in existing work (insufficient Raft–DHR integration and the lack of standardized consensus-driven security mechanisms), this study makes three key contributions, described below.
We design an optimized hierarchical architecture integrating Raft with DHR: deploying “leader routing” in the input proxy layer for efficient request forwarding, adapting “dual logs (BusinessLogEntry/ResponseLogEntry)” in the Raft layer so that security-critical decisions are controlled via global consistency, and adding a random selection mechanism in the heterogeneous execution layer, thereby addressing the centralization limitation of traditional DHR architectures.
We build an atomic anomaly handling mechanism based on the “anomaly detection–isolation proposal–collaborative isolation–service recovery” flow. Isolation instructions take effect only after majority consensus, with an “anomaly evidence chain” (ID, deviated hash, timestamp) for traceability—this solves the untrustworthy isolation and inconsistent states in traditional DHRs.
We develop a prototype system with five heterogeneous nodes and conduct comprehensive security performance tests: in the security tests, we simulated four attack scenarios (external, internal executor/node, collusive Byzantine) to quantify the detection rate, availability, and consistency; performance tests were used to measure throughput, latency, and resource consumption to clarify overhead boundaries—this provides data support for engineering applications.
The remainder of this paper is structured as follows: Section 2 reviews related work on DHR and Raft consensus; Section 3 presents the core components of the decentralized DHR architecture, including the normal business request processing flow, anomaly detection and isolation mechanisms, and proactive security strategies; Section 4 formally defines the attacker tuple and describes the behavioral characteristics of the four types of attackers by category; Section 5 systematically analyzes the architecture’s guarantee mechanisms in terms of consistency, availability, integrity, and confidentiality, and demonstrates its defense boundaries; Section 6 introduces the experimental environment configuration, designs four attack simulation schemes, and analyzes core indicators such as attack detection rate and system availability; Section 7 tests the architecture’s throughput, latency, and resource consumption under normal and attack scenarios to quantify the performance overhead; Section 8 summarizes the research results.
Limitations and scope of Raft security: It is important to emphasize that the underlying consensus layer in our architecture remains Raft, which is a crash-fault-tolerant (CFT) protocol, rather than being a Byzantine-fault-tolerant (BFT) one. Our design does not attempt to convert Raft into a full BFT protocol. Instead, we layer additional mechanisms on top of Raft—such as cryptographic validation of client requests and response logs, majority voting in the heterogeneous execution layer, and evidence-based isolation—to constrain the damage that misbehaving nodes can cause when the majority of nodes are still honest. Throughout the paper, we therefore distinguish between formally guaranteed properties under Raft’s CFT fault model and empirical stress test results under collaborative Byzantine-like behaviors, and we explicitly characterize the corresponding defense boundary.
Security scope and artifact availability: It is important to emphasize that our architecture does not modify Raft into a Byzantine-fault-tolerant protocol. Raft remains a crash-fault-tolerant (CFT) consensus algorithm, and its guarantees break under arbitrary Byzantine behaviors such as equivocation, forged votes, or divergent log replication. Our contribution is to layer a heterogeneous redundant execution and evidence-carrying response-log pipeline on top of Raft, so that the system can maintain correct outputs and auditable isolation decisions even when some executors or nodes behave maliciously, while preserving Raft’s CFT safety at the log level. To support transparency and evaluation, we provide a runnable Docker-based prototype artifact that enables reviewers to observe and evaluate representative architectural behaviors of the proposed system under controlled configurations.

2. Related Work

2.1. Dynamic Heterogeneous Redundancy

The dynamic heterogeneous redundancy (DHR) architecture combines fault-tolerant computing with endogenous security technologies [3]. The concept originated from design diversity and N-version programming (NVP) techniques, which are widely used in security-critical fields [4]. These traditional methods protect against random or common-cause failures by creating multiple software versions that perform the same function but have different implementations. However, common-mode failures can still occur due to shared specification misunderstandings, algorithmic flaws, or underlying library dependencies. This limits the effectiveness of traditional static diversity in defending against strategic cyberattacks [5].
The cyberspace mimic defense theory, proposed by J. Wu, represents a significant advancement in transforming DHR from a conceptual idea into a practical engineering architecture [6]. Based on the collaboration of three core elements—dynamism, heterogeneity, and redundancy—this architecture builds a closed-loop defense system consisting of components such as a heterogeneous executor pool, input proxy, multi-mode arbitrator, and feedback control scheduler [7]. The defense mechanism of DHR lies in three aspects: heterogeneity reduces the probability of horizontal penetration of a single vulnerability [8]; dynamism disrupts the stable environment required for attack chains [9]; multi-mode arbitration identifies and isolates abnormal behaviors [10]. Theoretical analysis shows that this architecture can effectively defend against attacks based on unknown vulnerabilities without relying on prior attack knowledge, providing insights for addressing Advanced Persistent Threats (APTs) and zero-day vulnerabilities [11].
The development of the DHR architecture presents a trend of evolving from macro-structure to micro-optimization, and from centralized control to distributed collaboration. Early studies mainly focused on platform-level heterogeneity in terms of instruction set architectures (ISAs) (e.g., x86, ARM, MIPS). Nevertheless, platform-level heterogeneity may still have common vulnerabilities due to shared compilers, operating system kernels, or algorithm implementations. Therefore, recent studies have shifted toward constructing deeper heterogeneity and quantifying it. Wu Ting et al. proposed an enhanced DHR (IDHR) architecture based on executor partitioning. By partitioning the executor set according to the heterogeneity among executors, the heterogeneity between each executor pool is enhanced, and the dynamic selection algorithm in the scheduling module is improved. Experiments show that the IDHR architecture outperforms traditional DHR in terms of attack success rate and controlled rate, especially achieving a significant security improvement when common vulnerabilities are unknown. In terms of arbitration mechanisms, research has also evolved from simple majority voting to more sophisticated algorithms—for example, introducing an arbitration mechanism based on output difference feedback and a scheduling strategy based on system efficiency—to balance security while reducing computational costs and system overheads [12].
The aforementioned optimizations for the execution layer also highlight the vulnerability of the centralized control module in traditional DHR architectures. As the sole decision-making point for scheduling and arbitration, once this control module is compromised, attackers may tamper with scheduling logic, shield anomaly detection results, or initiate malicious isolation, leading to the failure of the entire defense system [13]. Although some studies have attempted to adopt hot standby of multiple control nodes or polling mechanisms to improve reliability, the lack of strict state consistency guarantees may cause discrepancies among different control nodes in their understanding of executor health status and anomaly evidence chains, thereby introducing new uncertainty risks [14]. This gap limits the large-scale application of DHR in high-security and high-reliability environments (e.g., finance, industrial control) and creates the need for integrating distributed consensus into the DHR control layer.

2.2. Security Applications of Raft Consensus

The Raft algorithm is designed for managing replicated state machines, aiming to solve data consistency problems in distributed systems [15]. By breaking down the consensus problem into three key components—leader election, log replication, and safety—the algorithm enhances both understandability and engineering feasibility [16,17]. In Raft, cluster nodes are divided into three roles: leader, candidate, and follower [18,19]. Leader election is triggered through a term-based mechanism combined with a randomized election timeout (typically 150–300 milliseconds). Log replication is managed by the leader, which receives client requests, broadcasts log entries, and adopts a two-phase commitment mechanism that requires confirmation from a majority of nodes before committing log entries to the state machine. Traditionally, Raft and its predecessor consensus algorithms, such as Paxos, are mainly applied to ensure data consistency and system availability, so as to address non-malicious failures such as node crashes and network partitions.
In recent years, as the security demands of distributed systems have increased, consensus algorithms like Raft have found new applications in the security domain. Their primary value lies in utilizing the consensus mechanism to create a trusted decision-making anchor among distributed nodes. For example, in blockchain technology, Byzantine fault tolerance (BFT)-based consensus algorithms are used to reach agreement on transaction history among untrusted nodes. This practice has proven the application potential of tamper-proof collaborative decision-making capabilities guaranteed by the consensus mechanism in distributed security systems.
Existing studies primarily explore the application of consensus technology in the security field in two ways. The first approach is to use consensus as a platform for auditing and preserving evidence of security events. For instance, leveraging the tamper-proof property of blockchain to record alert logs of intrusion detection systems (IDSs) or changes in system security status, so as to realize post-incident accountability and evidence preservation. In such studies, consensus technology is in a “post-event” and “peripheral” position and does not deeply participate in real-time security decision-making [20]. The second approach seeks to integrate the consensus mechanism directly into the control loop of defense systems. There have been preliminary explorations of using BFT consensus for collaborative decision making on distributed firewall policies or reaching agreement on attack determination in distributed IDSs [21]. These studies highlight the potential of consensus mechanisms for enabling real-time security decision making.
Despite its potential, research on the deep integration of Raft consensus with dynamic heterogeneous redundancy architecture remains in the exploratory phase [22,23]. Most existing studies treat Raft as a fundamental tool for metadata management in DHR systems, rather than positioning it as the strategic core of the defense logic. Specifically, the paradigm of converting all security workflows—such as scheduling instructions for executors, response sets for anomaly detection, and critical isolation decisions—into log entries strictly managed sequentially by the Raft state machine, and verifying and validating them through majority consensus, is still rare in the literature. The innovation of this work lies in the systematic construction of a decentralized DHR control plane with Raft consensus at its core, enabling consensus-driven security decision making. This not only helps mitigate the single-point-of-failure and trust bottleneck problems of traditional DHR systems, but also provides a continuous and tamper-evident audit trail for security state transitions through consensus logs.
Novelty boundary: Existing work on consensus-assisted security mainly falls into three categories: (i) BFT firewalls that use Byzantine consensus to coordinate distributed policy updates; (ii) consensus-driven intrusion detection systems that aggregate alerts from multiple sensors; (iii) randomized executor pools that increase diversity at the execution layer. However, these approaches typically do not replicate full business logic across heterogeneous executors under a consensus-managed dual-log pipeline, nor do they treat consensus logs as the single source of truth for both scheduling and isolation decisions. Our work differs in that it systematically elevates Raft from a metadata service to the core of the DHR control plane, converting all security-critical workflows—request routing, executor selection, anomaly adjudication, and isolation—to log entries governed by Raft, while coupling them with heterogeneous redundancy in the execution layer.
Unlike BFT firewalls, which filter external inputs, or consensus-driven IDS systems that coordinate alerts, or randomized executor pools that provide diversity without verifiable evidence, our architecture uniquely integrates decentralized Raft-based control, heterogeneous redundant execution, dual-log evidence pipelines, and explicit evidence-backed isolation decisions. This positions our work at the intersection of consensus robustness and heterogeneous active defense.

3. System Model

3.1. Architecture Overview and Core Components

Figure 1 illustrates the core workflow and the interaction logic of components in the decentralized DHR architecture, which is based on Raft consensus. With the Raft consensus cluster as the control core, this architecture integrates the execution layer design of dynamic heterogeneous redundancy (DHR) [23] to implement a defense paradigm of “consensus-driven security decision-making and heterogeneity-guaranteed vulnerability tolerance”.
The relationship between the core components and their workflows is as follows:
  • Input and proxy layer: Client requests are first received by the input proxy (F1), which handles two key tasks: ‘request access’ and ‘leader routing’. If F1 is a follower node in the Raft cluster, it forwards the request to the current leader; if F1 is the leader, the request is processed directly in the subsequent workflow. This design ensures efficient routing of requests to the consensus core, preventing the access layer from becoming a performance bottleneck.
  • Raft consensus cluster: The Raft consensus cluster consists of a leader and a set of followers (F1, F2, …, Fn), functioning as the ‘decision-making core’ of the system. The leader proposes logs (e.g., business requests, executor isolation instructions), while the followers participate in log replication and consensus voting using the Raft protocol, ensuring that all nodes reach consensus on system states, such as ‘how to respond to requests’ and ‘whether to isolate abnormal executors’.
  • Set of heterogeneous execution components (A1, A2, A3, …, Ak): A randomized selection algorithm dynamically selects k heterogeneous executors from the available executor pool. These executors include multiple business instances with varying configurations (e.g., different operating systems, dependency library versions, and algorithm implementations), providing the foundation for ‘redundancy and diversity’. The leader assigns consensus-approved business requests to these heterogeneous executors, leveraging ‘diversity’ to inherently protect against the risk of ‘mass exploitation of common vulnerabilities’.
  • Decentralized log voting layer: This layer collects response results from all heterogeneous executors, verifies the legitimacy of the responses through ‘distributed response comparison and consensus result verification’ based on the global consistency of Raft logs, and outputs valid results or triggers the anomaly isolation process.
  • Dynamic random selection mechanism: For each business request, the leader randomly selects k heterogeneous executors from the available pool using a random algorithm, ensuring an unpredictable selection process (a minimal selection sketch is shown after this list).
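As a minimal, hedged sketch of this selection step (hypothetical class and method names, not the prototype’s actual implementation), the leader could draw k distinct executors from the available pool using a cryptographically secure shuffle:

```java
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the leader-side random executor selection (hypothetical names).
public final class ExecutorSelector {

    private final SecureRandom rng = new SecureRandom(); // cryptographically secure randomness source

    /** Selects k distinct executor IDs from the currently available pool. */
    public List<String> select(List<String> availablePool, int k) {
        if (availablePool.size() < k) {
            throw new IllegalStateException("fewer than k available executors; redundancy cannot be met");
        }
        List<String> shuffled = new ArrayList<>(availablePool);
        Collections.shuffle(shuffled, rng);          // unpredictable ordering for every request
        return List.copyOf(shuffled.subList(0, k));  // take the first k after the secure shuffle
    }
}
```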

3.2. Core Workflow Based on Raft

This section formally describes the system’s core workflow as a state machine driven by client requests and internal consensus events, with rigorous integration of the Raft consensus mechanism and DHR execution layer. The formalization adheres to the experimental settings (five heterogeneous nodes, f < N/2 malicious node tolerance) and security mechanisms defined in the paper.

3.2.1. System State Definition

The global state of the system is defined as a tuple that encapsulates the core elements of the architecture:
S = (Role, Term, Log, Cmt, LApp, EPool, RMap, CSess, LHash),  (1)
Table 1 shows the descriptions of the symbols in Equation (1).

3.2.2. Normal Request Processing Workflow

The request processing workflow follows a sequential, consensus-driven pattern across four coordinated phases.
Phase 1: request reception and routing—a client request Req = ⟨CID, SN, Data, Sig_C⟩ arrives at an input proxy node N_i; the node verifies the client signature Sig_C and sequence-number freshness against CSess[CID]; if N_i is a follower, it forwards the request to the current leader; otherwise, processing continues directly.
Phase 2: business log consensus and synchronization—the leader constructs a business log entry e_Biz = (Term, Biz, RID, Req, Sig_L), where RID = CID ‖ Timestamp ‖ Sequence provides global uniqueness; this entry is broadcast to all followers via Raft’s log replication mechanism; upon commitment by a majority of nodes (⌈(N+1)/2⌉), all nodes synchronously apply the entry, initializing RMap[RID] = ∅.
Phase 3: heterogeneous execution and response collection—the leader randomly selects k executors A_set = {A_1, A_2, …, A_k} from EPool using a cryptographically secure selection algorithm; the request is distributed to all selected executors for parallel processing; each executor A_j processes the request through its transition function δ_j, producing output Out_j; the owning node then creates a response log entry e_Resp = (Term, Resp, RID, EID_j, H(Out_j), NID) and submits it for consensus.
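As a minimal illustration of the dual-log pipeline introduced in Phases 2 and 3, the two entry types could be represented roughly as follows (Java records; the field layout is hypothetical and simplified relative to the formal tuples above):

```java
// Sketch of the dual-log entry types replicated through Raft (hypothetical field layout).
// BusinessLogEntry carries the client request; ResponseLogEntry carries one executor's
// hashed output, so raw outputs never need to pass through consensus themselves.
record BusinessLogEntry(
        long term,          // Raft term in which the leader proposed the entry
        String rid,         // globally unique request ID: CID ‖ Timestamp ‖ Sequence
        byte[] request,     // serialized client request Req
        byte[] leaderSig    // leader signature Sig_L over the entry
) {}

record ResponseLogEntry(
        long term,          // Raft term of the proposing node
        String rid,         // request ID the response belongs to
        String eid,         // executor ID EID_j that produced the output
        String outputHash,  // H(Out_j) over the normalized output
        String nid          // node ID NID that owns the executor
) {}
```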
Phase 4: distributed voting and result synchronization—as response log entries commit, nodes update RMap[RID] with tuples (EID_j, H(Out_j), NID); once |RMap[RID]| ≥ ⌈(k+1)/2⌉, the majority voting function is triggered:
H(Out) = argmax_h count({(EID_j, h_j, NID) ∈ RMap[RID] : h_j = h})
where the leader broadcasts H(Out), all nodes verify consistency, and the leader returns the final result to the client while updating the client session state.
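The following sketch shows how the majority vote over response hashes could be realized once the responses recorded in RMap[RID] are available; the types and method names are hypothetical, and the quorum is a strict majority of the k selected executors:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Sketch of the majority vote over response hashes; hypothetical types and names.
final class ResponseVoter {

    record Response(String eid, String hash, String nid) {}

    /**
     * Returns the winning hash H(Out) once some hash is supported by a strict majority
     * of the k selected executors, or empty if no hash has reached quorum yet.
     */
    static Optional<String> majorityHash(List<Response> responses, int k) {
        int quorum = k / 2 + 1;                       // ceil((k+1)/2): strict majority of k
        Map<String, Integer> tally = new HashMap<>();
        for (Response r : responses) {
            tally.merge(r.hash(), 1, Integer::sum);   // one vote per normalized-output hash
        }
        return tally.entrySet().stream()
                .filter(e -> e.getValue() >= quorum)  // keep only hashes that reached quorum
                .map(Map.Entry::getKey)
                .findFirst();
    }
}
```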

3.2.3. Anomaly Detection and Executor Isolation

The anomaly-handling workflow provides automated detection and mitigation of compromised components through consensus-based coordination.
Anomaly identification: The system detects anomalies through two primary mechanisms, majority voting discrepancies and response timeouts. When the voting function identifies responses diverging from the consensus hash, or when executors exceed configured timeout thresholds, the system constructs an anomaly set:
AnomSet = { EID_j : (EID_j, h_j, NID) ∈ RMap[RID], h_j ≠ H(Out) ∨ timeout }
For each anomalous executor, nodes compile an evidence chain E = ⟨RID, EID_j, h_j, H(Out), TS, NID, Sig_N⟩ capturing the contextual proof of misbehavior.
Consensus-based isolation: The leader formalizes isolation proposals as e_Iso = (Term, Iso, EID_x, H(E), H(Out), Sig_L). Followers independently verify the cryptographic evidence before consenting to commitment. This verification ensures isolation decisions reflect genuine consensus rather than unilateral action.
Atomic mitigation and recovery: Upon commitment of the isolation log, all nodes atomically update their executor pools: EPool ← EPool \ {A_x : A_x.EID = EID_x}. The system immediately initiates service recovery by reselecting executors from the updated pool to reprocess the affected request, maintaining service continuity while preserving security guarantees.
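A minimal sketch of how the anomaly set and per-executor evidence records could be assembled once the majority hash H(Out) is known (hypothetical types; signatures and timestamps are simplified here):

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Sketch of anomaly identification and evidence-chain construction (hypothetical types).
final class AnomalyDetector {

    record Response(String eid, String hash, String nid) {}

    /** One evidence record E = <RID, EID_j, h_j, H(Out), TS, NID>; the node signature Sig_N is omitted. */
    record Evidence(String rid, String eid, String deviatedHash,
                    String majorityHash, Instant ts, String nid) {}

    /** Collects evidence for every executor whose hash deviates from the majority hash. */
    static List<Evidence> buildEvidenceChain(String rid, List<Response> responses, String majorityHash) {
        List<Evidence> chain = new ArrayList<>();
        for (Response r : responses) {
            if (!r.hash().equals(majorityHash)) {      // deviation from H(Out) => anomalous executor
                chain.add(new Evidence(rid, r.eid(), r.hash(), majorityHash, Instant.now(), r.nid()));
            }
        }
        return chain;                                  // attached to the isolation proposal e_Iso
    }
}
```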

3.2.4. Fault Tolerance and Recovery Mechanisms

The architecture ensures robust operation through integrated fault tolerance mechanisms.
Executor failure resilience: The system tolerates up to f_e = k − ⌈(k+1)/2⌉ simultaneous executor failures while maintaining correct operation (for example, k = 5 executors tolerate f_e = 2 faulty ones). The majority voting mechanism inherently masks faulty outputs by relying on quorum consensus rather than requiring universal execution reliability.
Node crash recovery: For node failures, the Raft consensus protocol ensures state consistency through log replication and persistence. A recovering node N_f synchronizes its state by fetching missing log entries from the current leader, then reapplies all committed entries from LApp_{N_f} + 1 to Cmt to reconstruct consistent RMap and EPool states.
Network partition handling: During network partitions, the Raft cluster ensures safety by allowing only the partition that holds a majority of nodes (⌈(N+1)/2⌉) to process requests and commit new log entries. The minority partition becomes temporarily unavailable but automatically recovers consistency upon partition healing through log synchronization with the updated leader.
This formal model provides a comprehensive specification of the decentralized DHR architecture’s core workflow, explicitly defining the integration of Raft consensus with dynamic heterogeneous redundancy while establishing verifiable guarantees for safety, liveness, and security under various fault conditions.

3.2.5. Log Growth and Evidence Retention

To make the dual-log overhead explicit, we model the steady-state growth rate of the response log. Let λ denote the client request arrival rate (requests per second) and let k be the number of heterogeneous executors selected for each request. For each committed BusinessLogEntry, the system typically appends k corresponding ResponseLogEntry records. Hence, the number of response logs accumulated over time t is approximately
G(t) = λkt,
indicating that the storage overhead grows linearly with both the request rate and the degree of redundancy. This simple model explains the trade-off between stronger heterogeneous validation (larger k ) and log growth.
To bound the evidence storage cost, we configure a time-based evidence retention window T e . All ResponseLogEntry records younger than T e are retained in full and remain available for audit, dispute resolution, and rollback analysis. Only response logs that are (i) older than T e , and (ii) already anchored by a committed isolation decision are eligible for garbage collection.
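As a purely illustrative calculation with hypothetical parameter values (not taken from the prototype configuration): with λ = 100 requests/s, k = 3 executors per request, and a retention window T_e = 24 h, the model above gives G(T_e) = λ·k·T_e = 100 × 3 × 86,400 ≈ 2.6 × 10^7 ResponseLogEntry records that must remain available for audit before compaction becomes possible, which quantifies the storage cost of the chosen redundancy degree and retention window.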
To prevent log compaction from breaking the evidence chain, we maintain a lightweight index that records which BusinessLogEntry and isolation decisions depend on a given ResponseLogEntry hash. Any log entry whose hash is still referenced by unresolved evidence sets is marked as non-compactable. This ensures that background log compaction never removes records that are required to reconstruct the evidence chain.

3.3. Security and Consistency Guarantee Mechanisms

3.3.1. Consensus Layer Security: Defending Against Malicious Single-Point Control

The “leader election rules” and “log consensus mechanism” of Raft require that control instructions such as “executor isolation” must be verified by the majority of nodes before taking effect. Even if the leader is compromised, its “malicious isolation proposal” will be rejected by follower nodes due to “failure of evidence hash verification”; if the leader frequently submits illegal proposals, it will trigger the “leader re-election” process of the follower nodes, excluding the attacked node from the consensus cluster.

3.3.2. Execution Layer Security: Heterogeneous Redundancy Against Vulnerabilities

The “multi-dimensional differentiated design” of the set of heterogeneous execution components (A1–Ak) (e.g., different operating systems, runtime environments, and core algorithms) makes it difficult for attackers to simultaneously break through all executors using “common vulnerabilities”.

3.3.3. Consistency Guarantee: Logs as the “Single Source of Truth”

All business requests and control instructions are stored in the form of Raft logs, and the logs are guaranteed to be tamper-proof through “chained hashing + multi-node backup”. This design ensures that the states of all nodes in the cluster are completely consistent, avoiding security vulnerabilities such as “repeated isolation of executors” and “conflicts in response results” caused by state discrepancies.
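One possible realization of the chained hashing is sketched below; the exact serialization, genesis anchor, and storage of the chain hash alongside each committed entry are implementation details that may differ in the prototype:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Sketch of chained hashing over committed log entries (hypothetical scheme).
final class LogHashChain {

    private String previousHash = "GENESIS";   // anchor value for the first entry

    /** Extends the chain: hash_i = SHA-256(hash_{i-1} || serializedEntry_i). */
    String append(byte[] serializedEntry) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(previousHash.getBytes(StandardCharsets.UTF_8));
            sha.update(serializedEntry);
            previousHash = HexFormat.of().formatHex(sha.digest());
            return previousHash;               // stored with the entry; tampering with any earlier entry breaks the chain
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```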

3.3.4. Proactive Security Enhancement Strategies

To address Advanced Persistent Threats (APTs), this architecture introduces proactive defense strategies to improve the system’s security resilience from both temporal and spatial dimensions. Dynamic leader rotation mechanism: the time-triggered leader re-election strategy effectively breaks the attacker’s persistent control over the control plane. The minimum stable term for triggering leader rotation is Tmin = 15 s, which ensures system stability; the maximum risk term is Tmax = 45 s, which limits the exposure time of a single point. The rotation probability function is:
p(t) = 1 − e^(−λ(t − Tmin)), with λ = 0.05
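Under the reconstructed form of the rotation function above, the leader-side rotation check could be sketched as follows (constants as stated in the text; class and method names are hypothetical):

```java
import java.util.concurrent.ThreadLocalRandom;

// Sketch of the time-triggered leader rotation rule (T_min = 15 s, T_max = 45 s, lambda = 0.05).
final class LeaderRotationPolicy {

    private static final double T_MIN = 15.0;    // minimum stable term, in seconds
    private static final double T_MAX = 45.0;    // maximum risk term, in seconds
    private static final double LAMBDA = 0.05;

    /** p(t) = 1 - exp(-lambda * (t - T_min)) for t >= T_min, and 0 before the minimum stable term. */
    static double rotationProbability(double termAgeSeconds) {
        if (termAgeSeconds < T_MIN) return 0.0;
        return 1.0 - Math.exp(-LAMBDA * (termAgeSeconds - T_MIN));
    }

    /** Decides whether the current leader should voluntarily step down and trigger re-election. */
    static boolean shouldRotate(double termAgeSeconds) {
        if (termAgeSeconds >= T_MAX) return true;   // hard bound on single-leader exposure time
        return ThreadLocalRandom.current().nextDouble() < rotationProbability(termAgeSeconds);
    }
}
```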

3.3.5. Output Normalization and Semantic Equivalence Across Heterogeneous Executors

Heterogeneous executors implemented in Java, Python, Go, Node.js, and PHP may naturally exhibit benign nondeterminism, including formatting discrepancies, floating-point rounding differences, or variations in JSON field ordering. To ensure that the DHR layer compares execution outputs in a deterministic and security-preserving manner, we define a canonical normalization function:
N(x) = CanonicalEncode(Round(x, p)),
where p denotes the numerical precision threshold, and CanonicalEncode produces a stable JSON-based canonical representation independent of runtime-specific encoding rules. All executor outputs are first transformed using N(·) before hashing.
Two outputs are considered semantically equivalent if and only if the hashes of their normalized representations match:
Hash(N(x)) = Hash(N(y)).
This normalization pipeline eliminates format-level and rounding-level nondeterminism, ensuring that legitimate heterogeneous differences do not trigger false-positive anomalies, while still allowing genuine semantic inconsistencies—such as tampering, forged results, or malicious deviations—to be reliably detected by the majority-voting mechanism.
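The following JDK-only sketch illustrates the normalization-and-hashing pipeline for a map-shaped executor output, assuming p decimal places of numeric precision; the prototype’s actual canonical encoder may differ in detail:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;

// Sketch of N(x) = CanonicalEncode(Round(x, p)) and the semantic-equivalence check (hypothetical encoder).
final class OutputNormalizer {

    private static final int PRECISION = 6;   // numerical precision threshold p

    /** Canonical encoding: keys sorted alphabetically, numeric values rounded to p decimal places. */
    static String normalize(Map<String, Object> output) {
        StringBuilder sb = new StringBuilder("{");
        for (Map.Entry<String, Object> e : new TreeMap<>(output).entrySet()) {   // stable field order
            Object v = e.getValue();
            String canonical = (v instanceof Number n)
                    ? new BigDecimal(n.toString()).setScale(PRECISION, RoundingMode.HALF_UP).toPlainString()
                    : String.valueOf(v);
            sb.append('"').append(e.getKey()).append("\":").append(canonical).append(',');
        }
        if (sb.charAt(sb.length() - 1) == ',') sb.setLength(sb.length() - 1);     // drop trailing comma
        return sb.append('}').toString();
    }

    /** Two outputs are semantically equivalent iff the hashes of their normalized forms match. */
    static boolean equivalent(Map<String, Object> x, Map<String, Object> y) {
        return hash(normalize(x)).equals(hash(normalize(y)));
    }

    static String hash(String canonical) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(sha.digest(canonical.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```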

3.4. Security Boundary and Assumptions

While the proposed architecture integrates Raft with a heterogeneous redundant execution layer, it is important to clarify the security boundary of the system. Raft, as used in this work, remains a crash-fault tolerant (CFT) consensus protocol and is not assumed to provide Byzantine fault tolerance. The system does not require Raft to defend against equivocation, forged votes, or malicious log construction.
Instead, the responsibility for handling adversarial behavior—such as tampering, inconsistent outputs, collusive responses, or semantic deviations—is delegated to the heterogeneous DHR execution layer, which performs replicated execution across diverse runtimes and applies majority voting over normalized outputs. This voting mechanism enables the system to empirically detect Byzantine-style deviations at the execution layer, without modifying Raft’s protocol or claiming formal BFT guarantees.
Therefore, the guarantees in this paper are divided into two layers:
  • Raft control plane (CFT): This provides crash-fault-tolerant log replication, global ordering of BusinessLogEntry and ResponseLogEntry, and state-machine consistency across nodes.
  • Heterogeneous execution layer (empirical Byzantine detection): This provides output comparison, response hashing, majority voting, and anomaly identification among executors. Its resilience to Byzantine behaviors is observational rather than protocol-theoretic.
This separation makes clear that Raft is not extended into a BFT protocol. The system detects Byzantine behaviors at the execution level, whereas Raft ensures only crash-tolerant dissemination of evidence.

4. Adversary Model

To formally analyze the security properties of the system, this Section models potential adversaries based on the Dolev–Yao model and the Byzantine Fault Tolerance model. We consider the adversary’s position in the system, capabilities, acquired knowledge, and attack objectives to define the types and behaviors of adversaries.

4.1. Adversary Capability Model

We define the adversary’s capabilities using the following tuple [24]:
Adversary = (L, K, C, O)
where L (location) denotes the adversary’s position in the system, including the external network, internal nodes, etc. K (knowledge) represents the knowledge possessed by the adversary, including system architecture, protocol details, etc. C (capabilities) refers to the adversary’s capabilities, such as tampering with messages, delaying responses, etc. O (objectives) indicates the adversary’s goals, such as disrupting consistency, stealing data, etc.
According to the adversary’s position and capabilities, we classify adversaries into the following four types:
  • External adversary: The adversary is located outside the system and can only interact with the system through public interfaces. Its capabilities include eavesdropping, tampering with, or injecting network messages, but it cannot directly access the system’s internal state.
  • Internal component adversary: The adversary has compromised one or more heterogeneous executors (Ax) and can arbitrarily tamper with the output and behavior of these executors. However, it cannot break through container isolation to access other executors or Raft nodes.
  • Internal node adversary: The adversary has taken control of a Raft node (Fi), including the heterogeneous executors on that node. It can arbitrarily manipulate the state of the node and send fake Raft messages but cannot directly control other nodes.
  • Collaborative Byzantine adversary: The adversary controls multiple Raft nodes simultaneously (no more than f nodes, where f < N/2, and N is the number of Raft nodes). These nodes can act collaboratively to launch Byzantine attacks, including behaviors that arbitrarily deviate from the protocol.

4.2. Assumptions About Adversary Knowledge

We assume that the adversary may possess the following knowledge: The system’s architectural design, including the existence of the Raft cluster and heterogeneous executors; the basic principles and communication mechanisms of the Raft consensus protocol; the heterogeneous configuration of some executors (e.g., operating systems, software versions); the specifications of cryptographic primitives (e.g., hash functions, digital signatures).
At the same time, we assume that the adversary does not possess the following knowledge: The private keys or long-term credentials of other nodes; complete details of the heterogeneous configuration of all executors; the system’s real-time status (e.g., the current leader node, health status), unless obtained through an attack.

4.3. Formalization of Attack Objectives

The adversary’s objectives can be formalized as one or more of the following:
  • Disrupting availability: Rendering the system unable to respond to client requests normally through denial-of-service (DoS) attacks.
  • Disrupting consistency: Causing the system state to diverge, where different nodes hold different states.
  • Disrupting integrity: Tampering with the system state or business data to cause unauthorized state changes.
  • Disrupting confidentiality: Stealing sensitive internal system data or business data.
  • Persistence control: Planting backdoors in the system to maintain long-term control.

4.4. Formalization of Adversary Behaviors

We use a state transition model to describe the adversary’s behaviors. Let the system state be S and the set of adversary actions be A; then the adversary’s behavior can be expressed as:
S_{t+1} = f(S_t, A_t)
where the adversary’s actions A_t include but are not limited to those shown in Table 2.
For internal node adversaries, their actions also include those shown in Table 3.

4.5. Adversary Profiles

Based on the above model, we define two typical adversary profiles:
Profile 1: malicious executor adversary—shown in Table 4.
Profile 2: Malicious Raft node adversary—shown in Table 5.
It is worth noting that this adversary model is intentionally more general than the failure model natively handled by Raft. While we model collaborative Byzantine adversaries for completeness, the underlying consensus protocol remains CFT. Consequently, in the subsequent security analysis we carefully separate what can be provably guaranteed under Raft’s crash-fault assumptions from what is only empirically observed under stronger Byzantine-style behaviors in our prototype.

4.6. Timeout Handling and Benign Performance Variations

In practice, response timeouts alone are not sufficient to classify an executor as malicious. The system first establishes baseline latency distributions under benign load and chooses timeout thresholds based on the P95–P99 percentiles of the observed latency. Occasional outliers beyond this threshold are treated as performance anomalies but not immediately as malicious behavior. Only persistent deviations across multiple independent requests—e.g., a statistically significant shift in the latency distribution or repeated timeouts for the same executor—will promote an executor from “suspected” to “malicious”, at which point an isolation proposal is issued. This two-stage process reduces false positives under bursty or high-load conditions while still allowing the system to isolate truly compromised executors.
All comparisons in our experiments use the normalization function N ( ) defined in Section 3.3.5 so that benign formatting or rounding differences do not trigger false-positive anomalies.
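A minimal sketch of the two-stage “suspected then malicious” promotion is given below, assuming a P99-derived latency threshold and a configurable number of repeated violations; both parameters and all names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of two-stage timeout classification per executor (hypothetical thresholds and names).
final class TimeoutClassifier {

    enum Verdict { NORMAL, SUSPECTED, MALICIOUS }

    private final double p99ThresholdMs;                 // threshold derived from the benign baseline
    private final int promotionCount;                    // repeated violations required before isolation
    private final Map<String, Integer> violations = new ConcurrentHashMap<>();

    TimeoutClassifier(double p99ThresholdMs, int promotionCount) {
        this.p99ThresholdMs = p99ThresholdMs;
        this.promotionCount = promotionCount;
    }

    /** Records one observed latency for an executor and returns its current verdict. */
    Verdict observe(String eid, double latencyMs) {
        if (latencyMs <= p99ThresholdMs) {
            violations.put(eid, 0);                      // a benign sample resets the counter
            return Verdict.NORMAL;
        }
        int count = violations.merge(eid, 1, Integer::sum);
        return (count >= promotionCount) ? Verdict.MALICIOUS   // triggers an isolation proposal
                                         : Verdict.SUSPECTED;  // outlier only, no isolation yet
    }
}
```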

5. Security Analysis

Based on the adversary model in Section 4, this section systematically analyzes the security properties of the proposed architecture, including consistency, availability, integrity, and confidentiality, and demonstrates the architecture’s security when confronting various types of adversaries.

5.1. Analysis of Security Properties

5.1.1. Data Consistency

In the presence of adversaries, the system must still ensure that the states of all non-malicious nodes remain consistent. This is guaranteed through the following mechanisms:
  • Log integrity: All state changes are implemented via Raft logs, and log entries are linked through a hash chain to prevent tampering. Even if a malicious node attempts to fork the log, other nodes will detect this through hash verification.
  • Majority principle: Log commitment requires the agreement of a majority of nodes. Therefore, even if there are f malicious nodes ( f < N / 2 ), they cannot unilaterally commit malicious logs.
  • State machine determinism: All nodes apply the same log entries in the same order, and the state transition function is deterministic. As a result, the states of non-malicious nodes are always consistent.

5.1.2. Service Availability

Availability requires the system to respond to client requests in a timely manner. Adversaries may disrupt availability through denial-of-service (DoS) attacks, which is mitigated by the following mechanisms:
  • Leader fault tolerance: When the leader node is compromised by an adversary or fails, the system can re-elect a new leader after the election timeout to resume services.
  • Heterogeneous executor redundancy: Even if some executors are controlled by adversaries, the system can still generate correct responses through majority voting, ensuring continuous business operation.
  • Resource isolation: Executors run in isolated containers, preventing adversaries from exhausting the resources of an entire node through a single executor.

5.1.3. Business Integrity

Business integrity requires that business logic is not maliciously tampered with. This is ensured through the following mechanisms:
  • Business request verification: The leader node verifies the digital signature of client requests to ensure that the request source is legitimate and the request has not been tampered with.
  • Response consistency check: The correctness of executor responses is verified through the majority voting mechanism, and executors with abnormal outputs are isolated.
  • Audit logs: All business requests and system instructions are recorded in tamper-proof Raft logs, facilitating post-event auditing.

5.2. Security Demonstration Against Various Adversaries

5.2.1. Defense Against External Adversaries

External adversaries cannot bypass communication encryption and identity authentication, so they cannot forge client requests or eavesdrop on sensitive data. Even if they launch DDoS attacks, the system can mitigate the impact through request rate limiting and load balancing.

5.2.2. Defense Against Internal Component Adversaries

Abnormal executors are detected and isolated through the majority voting mechanism and response hash comparison. Formally, let the set of executors be A = {A_1, A_2, …, A_k} and the set of responses be R = {R_1, R_2, …, R_k}. The system determines the correct response via the function Majority(R) and identifies abnormal executors via the function DetectAnomaly(R).

5.2.3. Defense Against Internal Node Adversaries

The Raft consensus mechanism imposes constraints that require the agreement of a majority of nodes for log entries. Let the number of malicious nodes be f and the total number of nodes be N. When f < N/2, the system can guarantee security: malicious proposals will be rejected by the majority after verification, and malicious leaders will be replaced through re-election. Malicious nodes cannot commit log entries that have not been verified by the majority. Even if a malicious node acts as the leader, its malicious proposals (e.g., illegal isolation of executors) will be rejected by follower nodes due to insufficient evidence. Meanwhile, the behavior of malicious leaders is monitored, and frequent anomalies will trigger re-election.

5.2.4. Defense Against Collaborative Byzantine Adversaries

In our adversary model, we also consider collaborative Byzantine adversaries that control up to f nodes, where f < N/2 in a cluster of N Raft nodes. From a protocol-theoretic perspective, classic Raft only provides safety guarantees under crash faults and benign message delays; it does not offer a general proof of safety under arbitrary Byzantine behavior. Our architecture therefore does not claim full Byzantine fault tolerance.
Instead, we add two layers of mitigation on top of Raft’s CFT core. First, all business requests and isolation instructions must carry valid client signatures and/or hash-based evidence, so forged or tampered log entries can be detected and rejected by honest nodes before commitment. Second, the heterogeneous execution layer performs majority voting on response hashes, which can mask incorrect outputs from a minority of compromised executors as long as at least one correct implementation exists.
In the specific collaborative Byzantine experiments of Section 6.2.4, where at most two nodes deviate from the protocol in controlled ways (e.g., contradictory voting, log forking, selective responses), we empirically observe that non-malicious nodes keep their logs and state machines consistent and that no incorrect log entries are committed. However, these results should be interpreted as stress test evidence under a restricted Byzantine behavior space, not as a formal BFT guarantee. When Byzantine nodes collude more arbitrarily or violate our cryptographic assumptions, the system may sacrifice liveness (e.g., blocking writes) to preserve safety, and a complete BFT analysis would require a different consensus protocol.

6. Security Verification Experiments

To verify the security of the architecture against various attacks, this section designs experiments to validate the architecture’s defense capabilities against malicious executors, malicious nodes, and collaborative Byzantine adversaries.

6.1. Experimental Environment and Configuration

We built a cluster consisting of five Raft nodes, where each node runs on an independent virtual machine. Each node is deployed with corresponding heterogeneous executors, which differ significantly in terms of operating systems, dependency library versions, and business logic implementations. The node configuration is shown in Table 6.
We emphasize that the purpose of the experimental evaluation is to demonstrate architectural feasibility, consistency trends, and security-relevant behaviors under adversarial conditions, rather than to provide source-level reproducibility or serve as a reference implementation for low-level performance benchmarking. The experiments are designed to validate architectural mechanisms within the stated system assumptions.
In our prototype, each Raft node runs a JRaft-based Java control service together with a language-specific HTTP agent (Java/Spring Boot, Python/Flask, Go/Gin, Node.js/Express, PHP/ThinkPHP). The business logic consists of a simple account service with two operations: GET (read-only balance query) and INCREMENT/TRANSFER (state-changing update), both of which are encoded as BusinessLogEntry instances. For INCREMENT/TRANSFER, the consensus-committed new state is further processed by k heterogeneous executors, and their outputs are recorded as ResponseLogEntry instances in the dual-log pipeline. A Docker-based artifact, including deployment scripts and pre-built components, is provided to enable reviewers to deploy and observe a representative 5-node heterogeneous cluster using a single docker-compose command.
All experiments use JMeter as the load generator with:
  • Request rate: 100 QPS;
  • Experiment duration: 10 min;
  • Repeated trials: 30 runs per experiment;
  • Metrics collected: throughput (RPS), average latency, P95/P99 tail latency, executor divergence frequency, Raft commit latency, anomaly-detection latency.
All runs use the same docker compose setup to ensure consistency.

6.2. Experimental Design and Attack Simulation

Targeting the four types of adversary models, we designed four core experiments. Each experiment lasted 10 min, during which a load generator sent standard business requests at a rate of 100 QPS (Queries Per Second). Each experiment is executed 30 times under identical configuration. We report mean values across runs and include standard deviations; for latency-related metrics we additionally report P95 and P99 values to capture tail behavior.

6.2.1. External Adversary Simulation

Network attack experiment: Using the Scapy tool outside the cluster, we construct and send the following malicious traffic:
  • Replay attack: Capture legitimate requests and replay them immediately.
  • Message tampering: Modify key parameters in requests (e.g., user ID, transaction amount).
  • DDoS flooding: Send a large number of SYN packets to all node ports.
Verification target: Determine whether the system can ensure the non-reproducibility and integrity of requests and maintain service availability.
Expected results: The mechanism based on Raft log unique IDs and client signatures should reject all replay and tampered requests; cluster load balancing should mitigate the impact of DDoS.

6.2.2. Internal Component Adversary Simulation: Malicious Executor Experiment

Randomly select one executor (e.g., the Go executor on Node 3) and inject the following malicious behaviors during its request processing:
  • Deterministic error return: Always return fixed error results for specific requests.
  • Random error return: Return randomly tampered results with a 30% probability.
  • Response delay: Return responses after a random delay of 100–500 ms to simulate slow-rate attacks.
Verification target: Whether the majority voting mechanism can identify abnormal responses and trigger the isolation process for the malicious executor.
Expected results: The system should derive correct results through majority responses and generate valid isolation logs for the malicious executor.
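For illustration only, the fault injection in this scenario could be wrapped around an otherwise unmodified executor handler roughly as follows (a hypothetical sketch, not the prototype’s actual injection code):

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Function;

// Sketch of the fault-injection wrapper for the malicious-executor experiment (hypothetical).
final class MaliciousExecutorWrapper {

    enum Mode { DETERMINISTIC_ERROR, RANDOM_ERROR, DELAYED_RESPONSE }

    private final Mode mode;
    private final Function<String, String> realExecutor;   // the unmodified business handler

    MaliciousExecutorWrapper(Mode mode, Function<String, String> realExecutor) {
        this.mode = mode;
        this.realExecutor = realExecutor;
    }

    String handle(String request) throws InterruptedException {
        switch (mode) {
            case DETERMINISTIC_ERROR:
                return "ERROR_FIXED";                                             // always the same wrong result
            case RANDOM_ERROR:
                if (ThreadLocalRandom.current().nextDouble() < 0.30) {            // tamper with 30% probability
                    return "TAMPERED_" + ThreadLocalRandom.current().nextInt();
                }
                return realExecutor.apply(request);
            case DELAYED_RESPONSE:
                Thread.sleep(ThreadLocalRandom.current().nextLong(100, 501));     // 100-500 ms random delay
                return realExecutor.apply(request);
            default:
                return realExecutor.apply(request);
        }
    }
}
```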

6.2.3. Internal Node Adversary Simulation: Malicious Raft Node Experiment

Select Node 2 as the malicious node and simulate two attack scenarios:
  • Malicious follower: Discard 50% of the AppendEntries RPCs from the leader.
  • Malicious leader: Submit illegal isolation instructions to attempt to isolate a healthy executor (e.g., the Java executor on Node 1); submit malicious business logs containing an illegal request without client signature.
Verification target: Determine whether follower nodes can verify and reject illegal proposals, and whether the system can dismiss the malicious leader through re-election.
Expected results: Illegal isolation instructions will be rejected by the majority of nodes due to insufficient evidence; malicious business logs will be rejected due to signature verification failure; the abnormal behavior of Node 2 will lead to the early termination of its term.

6.2.4. Collaborative Byzantine Adversary Simulation: Byzantine Attack Experiment

Control Node 4 and Node 5 simultaneously (f = 2) to make them perform collaborative Byzantine behaviors.
  • Contradictory voting: Cast affirmative votes for different candidates during elections.
  • Log forking: Send log entries with different sequences to different followers.
  • Selective response: Process read requests normally but do not respond to any write requests.
Verification Target: When f < N/2, determine whether the system can ensure security (state consistency, no illegal log commitment) and maintain limited availability.
Expected results: In the configured stress test, where at most two nodes behave in a Byzantine-like manner, we expect that the three non-malicious nodes (Node 1, Node 2, Node 3) will maintain fully consistent logs and state machines. Because a majority of 3 votes cannot be reached for new write entries, write requests are likely to be temporarily blocked, while read requests served from committed state can still succeed at a high rate. In our experiments we did not observe any incorrect log commitments, but we emphasize that this is an empirical result for this restricted adversary configuration rather than a general BFT guarantee. The security verification metrics and evaluation criteria are summarized in Table 7.

6.3. Experimental Results and Analysis

We used attack detection rate, system availability (proportion of successfully answered requests), and state consistency (whether the state machine hash values of all non-malicious nodes are identical) as core evaluation indicators; standard deviations are within 3–5% of the mean.
Consistency guarantee: In all four types of attack experiments, the state machines of all non-malicious nodes finally remained consistent, verifying the effectiveness of Raft logs as the “single source of truth”.
Anomaly detection and isolation: Experiments 2 and 3 showed that the majority voting and evidence verification mechanisms based on consensus can detect and isolate internal threats in near real time.
Availability trade-off: Experiment 4 showed that when facing collaborative Byzantine attacks, the system prioritizes ensuring security (consistency) while sacrificing partial availability (availability), which is consistent with the expectations of the CAP theorem.

6.4. Experimental Conclusions

The experiments in this section comprehensively verified the security defense capabilities of the proposed architecture. The results show that the architecture can effectively resist various attack modes, from external networks to internal nodes, and from non-collaborative to collaborative attacks. The Raft consensus layer is the core of the defense, ensuring the global consistency and tamper-resistance of all security decisions. Heterogeneous executors and the majority voting mechanism together form a “security filter” at the execution layer, which can effectively filter out malicious behaviors of single or a few compromised points. Under extreme Byzantine faults, the architecture preserves its safety baseline and ensures the eventual consistency of the system.

6.5. Discussion of Additional Adversarial Stress Tests

Although we did not implement full-scale adversarial experiments for extended attack scenarios in the current prototype, we provide a conceptual analysis of how the proposed architecture behaves under four representative adversarial patterns commonly encountered in distributed systems: partial message withholding, forced leader churn, timestamp manipulation, and multi-partition split-brain scenarios.

6.5.1. Partial Message Withholding

If a follower selectively drops AppendEntries messages, Raft preserves safety because log entries cannot be committed without a majority. Write availability may degrade, but honest nodes never diverge. The DHR layer continues to validate executor outputs as long as at least one honest executor remains active.
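The following minimal sketch of Raft's majority-commit rule (not taken from the prototype's code, and simplified by omitting the current-term restriction) makes this explicit: a withholding follower can delay acknowledgements, but an entry is committed only once a majority has stored it, so honest replicas cannot diverge.

```python
def committed_index(match_index: dict, cluster_size: int) -> int:
    """Largest log index replicated on a strict majority of the cluster.

    match_index maps each node to the highest entry it has acknowledged;
    a follower that withholds AppendEntries responses simply keeps a stale
    value here. (Real Raft additionally requires the entry to belong to the
    leader's current term, which is omitted for brevity.)
    """
    acked = sorted(match_index.values(), reverse=True)
    majority = cluster_size // 2 + 1
    return acked[majority - 1] if len(acked) >= majority else 0

# Node n4 withholds acknowledgements and n5 is slow, yet index 7 is still
# committed because n1-n3 (a majority of 5) have replicated it.
print(committed_index({"n1": 7, "n2": 7, "n3": 7, "n4": 2, "n5": 0}, cluster_size=5))  # -> 7
```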

6.5.2. Forced Leader Churn

Frequent disruption of the leader, intended to destabilize election cycles, reduces system liveness but does not affect Raft's safety properties. The system delays the commitment of new operations but prevents conflicting logs from being committed. This behavior illustrates Raft's known trade-off: sacrificing availability to preserve safety under unstable leadership.

6.5.3. Timestamp Manipulation

Because Raft orders logs strictly by (term, index), client-side or executor-side timestamp tampering cannot influence log ordering. The DHR layer attaches evidence hashes and signatures to response logs, preventing malicious reordering or rollback attacks.
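The sketch below illustrates the two mechanisms involved, (term, index) ordering and evidence hashing; the field names and hash construction are simplified assumptions rather than the prototype's exact format.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class LogPosition:
    term: int
    index: int

def precedes(a: LogPosition, b: LogPosition) -> bool:
    """Log ordering used for commitment and conflict resolution; timestamps play no role."""
    return (a.term, a.index) < (b.term, b.index)

def evidence_hash(request_id: str, executor_id: str, output: bytes) -> str:
    """Digest bound into a response log entry; signing this digest prevents
    silent reordering or rollback of adjudicated results."""
    return hashlib.sha256(request_id.encode() + executor_id.encode() + output).hexdigest()

print(precedes(LogPosition(term=3, index=10), LogPosition(term=4, index=2)))  # True: higher term wins
print(evidence_hash("req-42", "executor-A", b"balance=100")[:16])
```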

6.5.4. Split-Brain Scenarios

Under multi-partition network splits, minority partitions lose the ability to commit writes but retain read availability if executor results are locally verifiable. Once connectivity is restored, logs are reconciled in accordance with Raft's majority-based safety rules. This illustrates the defense boundary of the architecture: safety is preserved, but availability degrades under severe partitions.
A full empirical evaluation of these stress tests is left for future work and requires a more extensive fault-injection framework. The conceptual analysis provided here complements the formal security model and clarifies the behavior of the architecture under adversarial stress conditions beyond the crash-fault assumption.

7. Performance Testing

This section presents performance experiments that evaluate the architecture under normal and attack scenarios, covering throughput, latency, and resource consumption.

7.1. Testing Environment and Configuration

The 5-node cluster environment used in Section 6 is adopted. Additionally, an independent virtual machine is used to run Apache JMeter as the load generator to simulate concurrent client requests. Business requests include simple “account balance inquiries” and “transfer transactions” at a ratio of 4:1.

7.2. Test Scenario Design

Four progressive scenarios are designed, each running for 5 min, with the number of concurrent users increasing linearly from 50 to 500.
  • Scenario 1: baseline performance—the system’s performance without any attacks.
  • Scenario 2: malicious executor—one minute after the test starts, activate the malicious executor on Node 3 (which returns random errors).
  • Scenario 3: malicious follower node—during the test, Node 2 acts as a malicious follower and discards 30% of consensus messages.
  • Scenario 4: malicious leader node—force Node 4 to become the leader and have it propose illegal business requests for 10% of the workload.
  • Performance metrics: We quantitatively analyze the test data using the following core metrics: throughput, measured in successful requests per second (RPS), evaluates processing capacity; average latency reflects the mean request-processing time; P99 latency characterizes tail latency, i.e., the response time within which 99% of requests complete; and CPU/memory utilization monitors the average resource consumption across the cluster. A minimal aggregation sketch follows this list.
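The following minimal sketch shows how these metrics could be aggregated from raw latency samples. It assumes an in-memory sample list and uses the nearest-rank method for P99, which may differ slightly from JMeter's exact reporting.

```python
import math
from statistics import mean

def aggregate(latencies_ms, duration_s: float, successes: int) -> dict:
    """Aggregate throughput, average latency, and P99 latency from raw samples."""
    ordered = sorted(latencies_ms)
    # Nearest-rank P99: smallest sample such that at least 99% of samples are <= it.
    rank = max(1, math.ceil(0.99 * len(ordered)))
    return {
        "throughput_rps": successes / duration_s,
        "avg_latency_ms": mean(ordered),
        "p99_latency_ms": ordered[rank - 1],
    }

# Synthetic samples from a 60-second window.
samples = [20, 22, 25, 30, 35, 40, 120]
print(aggregate(samples, duration_s=60.0, successes=len(samples)))
```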

7.3. Test Results and Analysis

In addition to the four attack scenarios, two baseline configurations are evaluated for comparison: (i) a centralized DHR architecture with a single control node coordinating the same pool of heterogeneous executors (Table 8), and (ii) a Raft-only cluster that runs the same business service on a 5-node Raft group but without heterogeneous redundancy (k = 1, no majority voting or executor isolation), which serves to isolate the overhead introduced by heterogeneous execution and majority voting (Table 9). All three configurations are evaluated under the same workload (request mix, concurrency level, and duration), which allows us to isolate the performance cost of decentralization and heterogeneity. The performance results of the proposed decentralized DHR architecture under the same workload and attack scenarios are summarized in Table 10.
From Tables 8–10, we can draw the following observations:
  • Performance overhead: Compared with the no-attack baseline, the security mechanisms incur noticeable overhead when the system is under attack. The overhead mainly comes from majority-voting computation and log commitment (Scenario 2), consensus timeouts and retransmissions (Scenario 3), and leader re-election together with illegal-proposal verification (Scenario 4). Compared with the Raft-only baseline, decentralized DHR incurs additional overhead due to heterogeneous execution and majority voting, while compared with centralized DHR it introduces extra consensus and log-replication costs. Under normal conditions, however, the proposed architecture still sustains roughly 68% of the centralized baseline's throughput (1250 vs. 1850 RPS in Tables 8 and 10), which we consider acceptable given the removal of the single-point control bottleneck.
  • Throughput and latency: Throughput decreases and latency increases in all attack scenarios. Among them, the malicious leader scenario has the greatest impact on performance, as it directly interferes with the core of the consensus process.
  • Resource consumption: When under attack, CPU usage increases slightly because nodes need to perform more verification calculations and network communication. Memory usage remains stable across all scenarios.
  • Recoverability: In Scenarios 2 and 4, after the malicious components are automatically isolated by the system, the performance indicators gradually recover to more than 95% of the baseline level within 1–2 min, demonstrating the system’s self-healing capability.
While providing strong security guarantees, this architecture introduces controllable performance overhead. In the absence of attacks, its performance is comparable to that of conventional distributed systems. When under attack, the system sacrifices partial performance (a 10.4–21.6% throughput reduction and increased latency) in exchange for correct business logic and system state. This trade-off is reasonable and necessary in security-prioritized application scenarios (e.g., financial transactions, industrial control). Additionally, the system's automatic fault recovery mechanism ensures a rapid recovery of performance, guaranteeing long-term service quality.
It is worth emphasizing that the performance results under malicious leader and Byzantine-like behaviors should be interpreted as stress tests of our dual-log and isolation mechanisms rather than evidence of full Byzantine fault tolerance at the consensus layer. Under such adversaries, our design deliberately sacrifices part of liveness to preserve the correctness and auditability of committed state, as discussed in Section 5.2.4.

7.4. Quantitative Analysis of Control-Plane Security

To substantiate the claim that the proposed Raft–DHR architecture “eliminates single-point control risk”, we provide a quantitative comparison between a centralized DHR control plane and our decentralized Raft-based control plane. Three dimensions are analyzed: (1) leader takeover probability, (2) attack-surface reduction, and (3) trust graph transformation.

7.4.1. Leader Takeover Probability

In a traditional centralized DHR system, all routing, arbitration, and isolation decisions are controlled by a single authority node. Thus, the probability of a successful takeover of the control plane is as follows:
$P_{\mathrm{takeover}}^{\mathrm{central}} = 1.$
In contrast, Raft-based DHR requires a majority quorum (i.e., 3 out of 5 nodes in our prototype) to manipulate consensus outcomes. Assume an attacker compromises each node independently with probability p. The probability of successfully controlling the Raft quorum is as follows:
$P_{\mathrm{takeover}}^{\mathrm{raft}} = \sum_{i=3}^{5} \binom{5}{i} p^{i} (1 - p)^{5 - i}.$
For typical per-node compromise probabilities (e.g., p ≤ 0.3), the quorum takeover probability is at most about 0.16, i.e., lower than the centralized control plane by a factor of at least six, and by more than an order of magnitude for p ≤ 0.2. This demonstrates the probabilistic advantage of decentralizing the control-plane authority.
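For transparency, the snippet below evaluates the expression above for a 5-node cluster with a 3-node quorum under the stated independence assumption; it reproduces the values quoted in the text.

```python
from math import comb

def p_takeover_raft(p: float, n: int = 5, quorum: int = 3) -> float:
    """Probability that at least `quorum` of `n` independently attacked nodes are compromised."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(quorum, n + 1))

for p in (0.1, 0.2, 0.3):
    print(f"p = {p:.1f}: centralized = 1.000, Raft quorum = {p_takeover_raft(p):.3f}")
# p = 0.1 -> 0.009, p = 0.2 -> 0.058, p = 0.3 -> 0.163
```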

7.4.2. Attack-Surface Reduction

To assess the structural benefit of decentralization, we compare the number of nodes whose compromise can unilaterally impact system-wide isolation decisions. A quantitative comparison of the control-plane attack surface between centralized DHR and the proposed Raft–DHR architecture is presented in Table 11.
In the centralized architecture, the control plane forms a single bottleneck, making a targeted attack highly efficient. With Raft–DHR, the attack surface becomes a majority-quorum subgraph, which substantially increases the cost and coordination required for a successful exploit and reduces its probability.

7.4.3. Trust Graph Transformation

The shift from centralized to decentralized control can also be expressed via structural trust graph transformation.
In the centralized architecture, all executors depend on a single root controller, as shown in Figure 2.
This topology implies that compromise of the root node compromises all executors and all security-critical decisions.
The proposed system distributes control across a Raft consensus group, as shown in Figure 3.
Decisions require joint agreement among at least three nodes, transforming trust from a single root to a distributed majority, and removing unilateral decision power entirely.

7.4.4. Summary

This quantitative analysis demonstrates that decentralizing the DHR control plane fundamentally enhances security by:
  • Reducing takeover probability from 1.0 to a small binomial probability.
  • Decreasing the attack surface by requiring majority compromise instead of a single node.
  • Transforming the trust graph from a central root to a majority-quorum model.
These analytical results support the claim that Raft–DHR effectively eliminates single-point control risk and provides a more robust security foundation for dynamic heterogeneous redundancy.

8. Conclusions

To address the core challenges of single-point failure and trust bottleneck in the centralized control of dynamic heterogeneous redundancy (DHR) architectures, this paper is the first to systematically propose and implement a decentralized DHR architecture based on Raft consensus. The main conclusions and contributions of this study are summarized in four points, laid out below.
First, we designed and implemented a decentralized control plane that deeply couples the Raft consensus mechanism with the DHR execution layer. By introducing a dual-log pipeline consisting of BusinessLogEntry (business request log) and ResponseLogEntry (response log), this architecture places all security-critical decisions—including client request distribution, heterogeneous executor scheduling, multi-mode response arbitration, and abnormal executor isolation—under the strict sequential management of the Raft state machine. This design fundamentally eliminates the security risks of traditional DHR relying on a single control node, shifting the system’s trust foundation from a single entity to a consensus mechanism maintained by a majority of nodes.
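For illustration only, the sketch below gives one plausible shape for the two entry types of the dual-log pipeline; the field names are assumptions for exposition, since the paper specifies the entry types and their roles but not their exact serialization.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BusinessLogEntry:
    """Ordered record of a client request, replicated before any executor runs it."""
    term: int
    index: int
    client_id: str
    seq_num: int            # client sequence number, used for replay protection
    request_payload: bytes

@dataclass(frozen=True)
class ResponseLogEntry:
    """Adjudicated result of a request, committed together with its evidence."""
    term: int
    index: int
    request_index: int            # index of the BusinessLogEntry being answered
    majority_hash: str            # output hash agreed by a majority of executors
    executor_votes: tuple = ()    # (executor_id, output_hash) pairs kept as evidence
    isolated_executors: tuple = ()  # executors flagged for isolation, if any

# Both entry types flow through the same Raft log, so arbitration and isolation
# decisions take effect only after they are committed by a majority of nodes.
```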
Second, by formally defining a multi-dimensional adversary model that includes external adversaries, internal executor adversaries, internal node adversaries, and collaborative Byzantine-style adversaries, we systematically analyze the security properties and fault boundaries of the proposed architecture. Under the classic Raft assumption f < N/2, our design preserves state consistency and business integrity for all honest nodes and prevents any single compromised node from unilaterally committing illegal logs or performing unauthorized isolation. Beyond this CFT envelope, the heterogeneous execution and dual-log evidence pipeline allow the system to detect and gradually remove malicious executors and nodes, while explicitly accepting that liveness may be degraded in the presence of stronger-than-CFT adversaries.
Third, we constructed a prototype system with five heterogeneous nodes and designed comprehensive experiments to empirically evaluate both security and performance. The results show that, as long as a majority of selected executors remain honest, the architecture achieves near-100% detection and isolation rates for the four representative attack classes considered in this paper, while maintaining more than 90% service availability under non-Byzantine faults. These results should be interpreted as conditional guarantees under the stated adversary model, not as unconditional Byzantine fault tolerance of the underlying consensus protocol.
Fourth, performance testing revealed the trade-off between security and efficiency in the architecture. In the baseline scenario without attacks, the architecture's performance is comparable to that of conventional distributed systems; when under attack, the system exchanges controllable performance overhead (a 10.4–21.6% reduction in throughput) for correct business logic and strong security resilience. This trade-off is necessary and acceptable for security-critical domains such as financial transactions and industrial control.
Finally, we provide a runnable Docker-based prototype package that allows reviewers to observe the execution workflow and security-relevant behaviors of the proposed architecture without exposing internal source code. The complete implementation will be made available after acceptance in accordance with institutional policies, to facilitate further inspection and follow-up research, without affecting the scope of the current experimental evaluation. In summary, this research not only addresses the core flaws of traditional DHR architectures but also explores an effective path to transform the reliability guarantee capability of distributed consensus into the endogenous security capability of active defense systems. It provides an engineering-feasible solution for building the next-generation network defense infrastructure with high trustworthiness and measurability.

Author Contributions

Conceptualization, K.C.; methodology, K.C.; software, K.C.; validation, L.S.; formal analysis, K.C.; investigation, K.C.; resources, K.C.; data curation, K.C.; writing—original draft preparation, K.C.; writing—review and editing, K.C.; visualization, K.C.; supervision, L.S.; project administration, L.S.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Shandong Province Science and Technology Innovation Capacity Improvement Project for Small and Medium Enterprises under Grant 2025TSGCCZZB0235, and Shinan District of Qingdao Science and Technology Innovation Project—TIC (Inspection, Testing and Certification) Technology Innovation Platform Research and Demonstration Application under Grant Project No. 2023-1-001-SZ.

Data Availability Statement

No publicly archived datasets were generated during this study. The experimental results reported in this paper were obtained from controlled prototype-based evaluations of the proposed architecture. Due to institutional and intellectual property constraints, the underlying implementation artifacts and experimental logs are not publicly released at this stage. Additional information regarding the experimental setup and evaluation methodology is provided in the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Decentralized dynamic heterogeneous redundancy architecture based on Raft consensus.
Figure 2. Centralized DHR trust graph (star model).
Figure 3. Raft–DHR trust graph (majority-quorum graph).
Table 1. Description of core elements.
Symbol | Description
Role ∈ {Ldr, Flw, Cand} | The node's role (leader, follower, candidate).
Term ∈ ℕ | The current monotonically increasing term number.
Log = ⟨e_1, e_2, …, e_n⟩ | An append-only log sequence. Each entry e_i = (t_i, typ_i, d_i), where typ_i ∈ {Biz, Resp, Iso} denotes the entry type (business, response, isolation).
Cmt ∈ ℕ | The index of the highest log entry known to be committed (requiring confirmation from ⌈(N + 1)/2⌉ nodes).
LApp ∈ ℕ | The index of the highest log entry applied to the state machine (LApp ≤ Cmt).
EPool ⊆ Exs | The current set of available, healthy heterogeneous executors.
RMap: RID → P(Res) | A volatile mapping from request ID to a set of responses, where each response Res = (EID, H(Output), NID).
CSess | Client session state for replay protection, typically storing ⟨CID, LastSeqNum, PubKey⟩.
Each heterogeneous executor A_j ∈ Exs is modeled as a deterministic state machine A_j = (σ_j, δ_j), where σ_j is its internal state and δ_j: (σ_j, Req) → (σ_j′, Out) is the state transition function that produces an output.
Table 2. Adversary's actions, A_t.
Symbol | Description
A_drop | Dropping messages
A_delay | Delaying messages
A_forge | Forging messages
A_replay | Replaying messages
A_modify | Tampering with message content
A_equivocate | Sending contradictory messages (Byzantine adversaries only)
Table 3. Additional actions of an internal node attacker.
Symbol | Description
A_maliciousLog | Submitting malicious log entries
A_falseVote | Casting false votes in elections
A_isolate | Illegally isolating healthy executors
Table 4. Malicious executor adversary.
Symbol | Description
L | Internal executor A_x
K | Knows the configuration and business logic of the executor
C | Can tamper with the output of A_x, delay responses, and attempt to escape the container
O | Disrupts the integrity of business logic, causing the system to output incorrect results
Table 5. Malicious Raft node adversary.
Symbol | Description
L | Internal Raft node F_i
K | Knows Raft protocol details and partial system status
C | Can send arbitrary Raft messages, persist arbitrary states, and manipulate local executors
O | Disrupts consistency or availability (e.g., causing state divergence, illegally isolating executors)
Table 6. Raft node configuration in detail.
Node Number | Operating System | Server Software | Implementation Language and Framework | Core Dependency Library Version
Node 1 | CentOS 8 Stream | Nginx 1.20.1 | Java + Spring Boot 2.6.7 | JDK 1.8, Spring Core 5.3.20
Node 2 | Ubuntu Server 22.04 LTS | Apache HTTP Server 2.4.52 | Python + Flask 2.1.2 | Python 3.9.12, Flask 2.1.2
Node 3 | Debian 11 | Apache Tomcat 9.0.65 | Go + Gin 1.9.0 | Go 1.19.3, Gin 1.9.0
Node 4 | SUSE Linux Enterprise Server 15 SP4 | Jetty 10.0.12 | Node.js + Express 4.18.2 | Node.js 16.17.0, Express 4.18.2
Node 5 | Alpine Linux 3.18 | Nginx 1.24 + PHP-FPM 8.1 | PHP + ThinkPHP 6.0 | PHP 8.1.10, ThinkPHP 6.0
Table 7. Security verification experiment results.
Scenario | Attack Detection Rate | System Availability | State Consistency | Key Observation
External Attack | 100% | 98.5% | Maintained | All replay and tampered requests were rejected; DDoS caused a slight increase in latency.
Malicious Executor | 100% | 95.2% | Maintained | All malicious behaviors were detected and isolated within 3 request cycles.
Malicious Node | 100% | 90.1% | Maintained | All malicious proposals were rejected; the malicious leader was replaced after an average of 12 s.
Byzantine Attack | – | 65.4% (write)/99.8% (read) | Maintained | System safety was fully maintained, but write-request availability decreased significantly because consensus could not be reached.
Table 8. Performance of the centralized DHR architecture.
Test Scenario | Throughput (RPS) | Average Latency (ms) | P99 Latency (ms)
Baseline | 1850 | 18 | 40
Malicious Executor | 1720 (−7.0%) | 26 | 60
Malicious Follower | Not applicable | – | –
Malicious Leader | 1650 (−10.8%) | 33 | 95
Table 9. Performance of the Raft-only cluster (without DHR execution layer).
Test Scenario | Throughput (RPS) | Average Latency (ms) | P99 Latency (ms)
Baseline | 1480 | 28 | 75
Malicious Executor | Not applicable | – | –
Malicious Follower | 1370 (−7.4%) | 39 | 110
Malicious Leader | 1250 (−15.5%) | 48 | 170
Table 10. Performance of the proposed decentralized DHR.
Test Scenario | Throughput (RPS) | Average Latency (ms) | P99 Latency (ms)
Baseline | 1250 | 45 | 120
Malicious Executor | 1120 (−10.4%) | 58 | 185
Malicious Follower | 1050 (−16.0%) | 75 | 250
Malicious Leader | 980 (−21.6%) | 92 | 350
Table 11. Attack surface reduction.
Architecture | Control Decision Points | Minimum Nodes Required for Manipulation | Attack-Surface Risk
Centralized DHR | 1 | 1 | Full compromise with a single successful attack
Raft–DHR (this work) | 5 | 3 (majority quorum) | Reduced by 70% compared to the centralized design