Possibility and the Impossibility of Reliable Broadcast: A 1-Safe and Reliable Broadcast Algorithm in the Presence of Arbitrary Initialization

Dabees, Aisha; Karaata, Mehmet Hakan

doi:10.3390/a18070437

by Aisha Dabees and Mehmet Hakan Karaata *

Department of Computer Engineering, Kuwait University, Kuwait City 13060, Kuwait

* Author to whom correspondence should be addressed.

Algorithms 2025, 18(7), 437; https://doi.org/10.3390/a18070437

Submission received: 28 April 2025 / Revised: 25 June 2025 / Accepted: 1 July 2025 / Published: 17 July 2025

(This article belongs to the Section Algorithms for Multidisciplinary Applications)

Abstract

In this paper, we first prove that it is impossible to devise an asynchronous reliable broadcast algorithm that can start in an arbitrary initial system configuration where processes execute their actions solely based on local knowledge. We then propose the first asynchronous reliable broadcast algorithm that, starting in any arbitrary initial system configuration, ensures three properties, namely, premature feedback safety, propagation 1-safety, and propagation reliability. Premature feedback refers to the conclusion of a broadcast operation at a process although the process has neighbors that have not yet received the most recent broadcast. Because it possesses the premature feedback safety property, the proposed algorithm always executes as per its specification without premature feedback and is able to implement propagation reliability with exactly once semantics. In addition, our proposed algorithm works even if the processes that the broadcast has not yet reached are perturbed by transient faults, making it more scalable than counterparts that require initialization.

Keywords:

boundary broadcast; broadcast algorithms; distributed systems; propagation of information; reliability; safety; wave algorithms

1. Introduction

Propagation (broadcast) is a process of dissemination of information in a distributed system where each system process receives a message broadcast by legitimate source(s). Propagation is used in numerous applications, including communication protocols, consensus, distributed coordination, and network management. The properties of propagation, namely propagation safety and propagation reliability, play a crucial role in securing network protocols, defending against attacks, and maintaining effective control in a system [1,2,3,4,5]. A propagation algorithm is said to possess the propagation safety property if each message received by the system processes is always the same message that was last broadcast by a legitimate source process. In other words, propagation safety ensures sending of only those messages that have truly originated from legitimate sources. A legitimate source process for a broadcast algorithm refers to a process which initiates the message broadcast. Any other process from which a message has originated is referred to as an illegitimate source. A broadcast initiated by the legitimate source process is referred to as a legitimate broadcast and an illegitimate broadcast otherwise. On the other hand, a system is said to implement propagation reliability if each message sent by a legitimate source is received exactly once by every node in the system and the receipt of the message by all processes is acknowledged by the source within a finite time. Without the properties of propagation safety and reliability, a distributed system may not operate as intended, leading to incorrect results, security breaches, and possibly even hazardous outcomes [6].

More formally, a property S of a system is said to be safe if, for any execution $E = \gamma_{1}, \gamma_{2}, \dots, \gamma_{k}$ of the system, where $\gamma_{i}$ (for $i \geq 1$) denotes a system configuration and E is a finite or infinite sequence of such configurations, whenever E satisfies S, every prefix and every suffix of E also satisfies S. The system configuration is given as the product of its processes’ states, where a process’s state refers to the collection of the values of the process’s variables. A system configuration satisfying S is said to be a safe system configuration and unsafe otherwise. The system executes local actions (transitions) through which it enters new system configurations. If a transition causes the system to enter a safe system configuration from a safe or an unsafe system configuration, the transition is said to be safe; a transition from a safe system configuration to an unsafe system configuration is said to be a specification violation. A specification violation may cause a broadcast algorithm to enter a system configuration in which the broadcast is discontinued even though some system processes have not yet received the broadcast, violating propagation reliability. Analogously, a specification violation might cause a broadcast algorithm to reach a system configuration in which a process is ready to transmit or receive a broadcast that did not originate from a legitimate source, violating the propagation safety property. Although critical, safety in most distributed systems carries a high overhead because the system specification must be maintained continuously.
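
The classification of transitions described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the toy configurations, and the `no_spurious` predicate are hypothetical and are not taken from the proposed algorithm.

```python
def classify_transition(before, after, is_safe):
    """Label one transition using the definitions in the text."""
    if is_safe(after):
        return "safe"                      # enters a safe configuration
    if is_safe(before):
        return "specification violation"   # safe -> unsafe
    return "unsafe"                        # unsafe -> unsafe (not named in the text)


def violations(execution, is_safe):
    """Indices i such that the step execution[i] -> execution[i + 1] is a violation."""
    return [
        i
        for i in range(len(execution) - 1)
        if classify_transition(execution[i], execution[i + 1], is_safe)
        == "specification violation"
    ]


# Hypothetical toy model: a configuration is a tuple of per-process flags and
# is considered safe when no process holds a spurious message.
def no_spurious(configuration):
    return "spurious" not in configuration


trace = [("ok", "ok"), ("ok", "spurious"), ("ok", "ok")]
print(violations(trace, no_spurious))  # -> [0]
```

Only the first step of the toy trace leaves a safe configuration for an unsafe one; the second step returns to a safe configuration and is therefore a safe transition.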

It has been established that propagation safety cannot be achieved in a system starting in an arbitrary initial system configuration [2]. Nevertheless, weaker forms of propagation safety, such as 1-propagation safety (also referred to as propagation 1-safety), are achievable. A distributed broadcast algorithm is considered 1-propagation-safe iff any message that was not broadcast by a legitimate source process (referred to as a spurious message) can only be sent from a process to a neighboring process and cannot be transmitted further.
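
As a minimal sketch of the 1-propagation safety condition, the following Python check inspects a recorded set of message deliveries. The `Delivery` record and its fields are assumptions introduced for illustration; the paper does not define such a trace format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delivery:
    origin: str   # process at which the message originated
    hops: tuple   # processes that received the message, in hop order


def is_one_propagation_safe(deliveries, legitimate_sources):
    """True if every spurious message travelled at most one hop."""
    for delivery in deliveries:
        spurious = delivery.origin not in legitimate_sources
        if spurious and len(delivery.hops) > 1:
            return False  # a spurious message was forwarded further
    return True


legitimate = {"r"}
ok_trace = [Delivery("r", ("a", "b", "c")), Delivery("x", ("a",))]
bad_trace = [Delivery("x", ("a", "b"))]  # spurious message forwarded by a

print(is_one_propagation_safe(ok_trace, legitimate))
print(is_one_propagation_safe(bad_trace, legitimate))
```

A message from the legitimate source may travel any number of hops, whereas a spurious message may reach a neighbor of its origin but no process beyond it.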

Propagation reliability can be implemented with numerous semantics. The semantics under consideration herein is the exactly once semantics, where each process receives the message from the legitimate source exactly once. Alternatively, at least once semantics can be considered where each system process receives the message one or more times. At least once semantics may lead to breaching the system safety or security in certain applications, making the exactly once semantics more desirable [7].
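
The difference between the two semantics can be made concrete by counting receipts per process. The sketch below is an assumption-laden illustration: the log format (one entry per receipt) and the function name are hypothetical.

```python
from collections import Counter


def delivery_semantics(receipts, processes):
    """Classify a receipt log as exactly once, at least once, or unreliable."""
    counts = Counter(receipts)  # missing processes count as zero receipts
    if all(counts[p] == 1 for p in processes):
        return "exactly once"
    if all(counts[p] >= 1 for p in processes):
        return "at least once"
    return "unreliable"         # some process never received the message


processes = ["a", "b", "c"]
print(delivery_semantics(["a", "b", "c"], processes))
print(delivery_semantics(["a", "b", "b", "c"], processes))
print(delivery_semantics(["a", "b"], processes))
```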

Ensuring that a broadcast algorithm satisfies both propagation safety and reliability properties is not straightforward when the system is started in an arbitrary initial system configuration without an initial reset. This is primarily because a broadcast algorithm that can start in an arbitrary initial system configuration may suffer from premature feedback. Premature feedback refers to the conclusion of a broadcast operation at a process although the process has neighbors that have not yet received the most recent broadcast. For broadcast algorithms, premature feedback can be caused by transient faults as well as by interferences caused by arbitrary initialization of system processes. Observe that premature feedback may result in a violation of propagation reliability, as the neighboring process may never receive the broadcast. Premature feedback safety is required for a broadcast algorithm without initial setup or reset since arbitrary system initialization can cause specification violations leading to premature feedback, which in turn violates the propagation reliability property.
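
The definition of premature feedback can be phrased as a simple predicate over a snapshot of the system. The following Python sketch assumes a global view (adjacency lists plus the sets of processes that have received the broadcast and that have concluded it); these names are hypothetical, and a real process has only local knowledge.

```python
def premature_feedback(adjacency, received, concluded):
    """Processes that concluded although some neighbor has not yet received."""
    return sorted(
        p
        for p in concluded
        if any(q not in received for q in adjacency[p])
    )


# Line network r - a - b; the broadcast has reached r and a but not b.
adjacency = {"r": ["a"], "a": ["r", "b"], "b": ["a"]}

print(premature_feedback(adjacency, received={"r", "a"}, concluded={"a"}))
print(premature_feedback(adjacency, received={"r", "a", "b"}, concluded={"a", "b"}))
```

In the first snapshot, process a has concluded while its neighbor b has not received the broadcast, which is exactly the situation that premature feedback safety rules out.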

The implementation of our proposed broadcast algorithm starting in an arbitrary initial system configuration presents a number of challenges. First, some process synchronization needs to be implemented to ensure the premature feedback safety of the broadcast. Second, propagation reliability with exactly once semantics needs to be guaranteed. Third, as the broadcast proceeds to the neighboring processes, the next set of neighboring processes must be determined and the broadcast advanced to all of them in the presence of interferences caused by arbitrary initialization of processes. Fourth, spurious messages must be eliminated and 1-safety of propagation guaranteed. These challenges can easily be met if lock-step synchrony is assumed. Lock-step synchrony refers to a form of synchronization in which all processes in a network progress in a step-by-step manner where, in each step, all system processes execute all their currently enabled actions. One of the main limitations of the lock-step synchrony assumption is that it is inherently limited by the speed of the slowest process in the network [8]. If one process experiences a delay or a fault, the entire network must wait until that process catches up, which can result in a significant reduction in performance and efficiency. In a real-world scenario, processes may have different processing speeds and may be executing different algorithms, making it difficult to maintain a lock-step execution of all processes. This may lead to major performance degradation, making the lock-step synchrony assumption unsuitable for large-scale distributed systems [9]. In addition, lock-step synchrony is considered a strong assumption with significant overhead that complicates the system and its implementation.

In this paper, we first prove that it is impossible to devise an asynchronous reliable broadcast algorithm that can start in an arbitrary initial system configuration where processes execute their actions solely based on local knowledge. We then propose the first asynchronous reliable broadcast algorithm that, starting in any arbitrary initial system configuration, ensures three properties, namely, premature feedback safety, propagation 1-safety, and propagation reliability. The complexity of the proposed broadcast algorithm is $O(D^{2})$ rounds, where D is the diameter of the communication network.
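
To give a feel for how this bound grows, the diameter D of a connected network can be computed by breadth-first search from every process. The sketch below is illustrative only: it computes D and $D^{2}$ for a toy network, and the constant hidden by the $O(\cdot)$ notation is not modeled.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Largest hop distance from source to any other process (BFS)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in distance:
                distance[v] = distance[u] + 1
                queue.append(v)
    if len(distance) != len(adjacency):
        raise ValueError("the network must be connected")
    return max(distance.values())


def diameter(adjacency):
    """Diameter D: the maximum eccentricity over all processes."""
    return max(eccentricity(adjacency, p) for p in adjacency)


# Toy line network a - b - c - d.
line = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
D = diameter(line)
print(D, D ** 2)  # -> 3 9
```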

Because it possesses the premature feedback safety property, the proposed algorithm always executes as per its specification without premature feedback and is able to implement propagation reliability with exactly once semantics.

Propagation reliability and propagation 1-safety together determine the messages received by system processes. Since the 1-propagation safety property guarantees that a spurious message generated due to arbitrary initialization can be sent from a process to its neighbors but cannot be forwarded further, it ensures that each process can receive at most one spurious message due to arbitrary initialization.

The proposed algorithm also distinguishes between spurious messages and those sent by a legitimate source. Our algorithm ensures that the receipt of a spurious message is possible only once in the execution of the first enabled action of the algorithm and a message from a legitimate source cannot be received in the execution of the first enabled action by the process. Therefore, the message received after the execution of the first message receipt action by a process is always a message received from a legitimate source. This facilitates the receipt of only those messages sent by legitimate sources, implementing propagation safety. That is, a process receives at most one spurious message due to arbitrary initialization, followed by exactly one message from a legitimate source.

The ability of our proposed algorithm to start in an arbitrary initial system configuration allows the algorithm to start without any reset or initialization, which in turn allows the algorithm to make progress before another phase (initialization or reset) is completed. It also allows transient faults to perturb the states of the processes that the broadcast has not yet reached. In contrast, a broadcast algorithm that requires initialization may not work when the initialized processes are perturbed by transient faults. As system sizes (number of processes) increase, the likelihood of transient faults also increases, limiting the scalability of broadcast algorithms that require initialization, whereas our proposed algorithm works even if the processes that the broadcast has not yet reached are perturbed by transient faults, making it more scalable than counterparts that require initialization.

The provisions of our proposed algorithm could alternatively be obtained from a broadcast that may fail to provide propagation reliability in its first attempt and may violate the exactly once semantics. To handle the failure of such a broadcast, separate mechanisms for failure detection, rebroadcast, and elimination of multiple message receipts are needed. Clearly, the ad hoc nature of such a protocol increases its complexity and the cost of the broadcast compared to our proposed algorithm, which inherently provides all the properties without combining separate mechanisms. In addition, due to their complexity and possible interferences among the mechanisms, such protocols become more error-prone.

The proposed algorithm can be used in various real-world applications that require reliable and safe broadcast, including IoT networks [10], smart homes [11], and autonomous drones [12]. The algorithm can also be used in fields requiring strong security and isolation guarantees, such as smart cities [13], smart grid systems [14], cloud virtualization layers to ensure data security and isolation [15], and blockchains to guarantee secure data transmissions [16].

The rest of the paper is organized into five more sections. Section 2 provides the necessary preliminaries to establish a solid foundation of knowledge. In Section 3, the computational model employed in the paper is presented. Section 4 focuses on the algorithm, including its specifications, underlying principles and detailed explanation. Section 5 offers proof of the correctness of the algorithm. Finally, the paper concludes in Section 6, summarizing the key findings, drawing conclusions, and providing final remarks.

2. Preliminaries

2.1. Arbitrary Initialization and Scalability

An arbitrary initial system configuration is a configuration the system is in before it begins its execution, where system processes are in arbitrary initial states and no assumptions can be made about the values of their variables. Most distributed systems/algorithms make certain assumptions about the initial system configuration and hence employ distributed reset mechanisms [17] to guarantee a certain initial system configuration prior to their execution. At first, this approach seems viable; however, systems that require a considerable amount of time to initialize/set up or execute become more prone to transient faults perturbing initialized processes as the system size increases. The perturbation of already initialized processes nullifies the effect of the system initialization or setup; as a result, the safety and liveness properties of systems/algorithms can no longer be guaranteed. Since the ability of a system that requires initialization to ensure its safety and liveness properties decreases as system sizes and the likelihood of transient faults perturbing already initialized processes increase, the initialization requirement of a system reduces its ability to scale up. Examples of large systems in which initialization is difficult and in which execution or initialization takes a significant amount of time to reach all system processes include the ones given in [18,19,20]. Systems that do not require initialization provide additional benefits such as avoiding delays caused by initialization. In addition, they tend to be more adaptive as they eliminate some preconditions for newly joining processes by allowing them to be in any arbitrary initial state. Therefore, distributed algorithms that do not require initialization are crucial.

In practice, applications start in an arbitrary initial system configuration, where the processes’ states are not known [21]. Starting in an arbitrary system configuration, some nodes/processes may experience partial failures while others continue to work as expected [22]. In addition, data due to arbitrary initialization and its transmission can expose distributed systems to potential safety risks such as unauthorized access and malicious attacks [23,24,25,26]. Therefore, an algorithm that does not require initialization needs to ensure that the processes that execute the algorithm are free of the interferences caused by arbitrarily initialized processes. For instance, in a broadcast algorithm, a process that has received the broadcast but has neighbors that have not yet received it may experience interferences from the arbitrarily initialized processes and may prematurely end the broadcast.

2.2. Broadcast Algorithms

Broadcast, a fundamental building block in many computing systems, refers to sending messages from a process to all other processes in the system. The first broadcast algorithm was introduced as a flooding algorithm in [27], which, despite its conceptual simplicity, causes significant message duplication and network congestion. To overcome this, alternative broadcast algorithms were proposed [28,29,30], and research continues as new distributed systems such as blockchain, IoT, and cloud computing demand increased safety, reliability, and adaptability. Most broadcast algorithms operate through three core phases: broadcast (B), feedback (F), and cleaning (C). A classical model capturing these phases is the Propagation of Information with Feedback (PIF), introduced in [31], where a root initiates the broadcast, and feedback is received once all processes acknowledge message receipt. PIF has been used in several foundational distributed computing problems such as synchronization, termination detection, and mutual exclusion [32,33,34,35,36]. Propagation of Information with Feedback and Cleaning (PFC), a variant of PIF, introduces a cleaning phase to erase traces of completed broadcasts, thus enabling new broadcasts to proceed safely and efficiently [37,38]. Both PIF and PFC have demonstrated efficiency and reliability in distributed systems, including wireless sensor networks [39,40,41], peer-to-peer networks [42], and cloud environments [43].
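
The broadcast and feedback phases of a classical PIF wave can be sketched as follows. This is a simplified, centrally simulated illustration on a correctly initialized network; it is not the algorithm proposed in this paper, the cleaning phase is omitted, and the function and variable names are assumptions.

```python
from collections import deque


def pif(adjacency, root):
    """Simulate the broadcast (B) and feedback (F) phases of a PIF wave."""
    # Broadcast phase: each process adopts the first sender as its parent.
    parent = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                queue.append(v)

    # Feedback phase: a process feeds back once all its children have done so.
    pending = {p: 0 for p in parent}
    for child, par in parent.items():
        if par is not None:
            pending[par] += 1
    ready = deque(p for p in order if pending[p] == 0)  # leaves start
    feedback = []
    while ready:
        u = ready.popleft()
        feedback.append(u)
        par = parent[u]
        if par is not None:
            pending[par] -= 1
            if pending[par] == 0:
                ready.append(par)
    return parent, feedback


adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r"], "c": ["a"]}
parent, feedback = pif(adj, "r")
print(parent)
print(feedback)
```

The root appears last in the feedback order, which corresponds to the root learning that every process has acknowledged the message.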

Distributed reset is a broadcast mechanism to bring all system processes to the same distinguished state [44,45]. Distributed reset is similar in nature to a distributed broadcast and can be implemented using a distributed broadcast. It is used in performing system reconfigurations upon new processes or channels being added [45]. Since arbitrarily initialized and unpredictable states do not always align seamlessly with the reset mechanism, arbitrarily initialized processes present challenges. In addition, arbitrarily initialized processes may cause interferences with the actions of distributed reset algorithms, which may prevent reset from reaching all system processes or cause improper reset of some of the processes. These then may breach the safety, security, and reliability of the applications that use distributed reset.

Apache ZooKeeper provides a distributed coordination service that includes a distributed lock service and a reliable broadcast protocol called ZooKeeper Atomic Broadcast (ZAB) [46]. ZAB is a consensus algorithm which ensures that all nodes in the system receive the same set of updates in the same order, even upon crash failures [47]. This ensures that the distributed system is always in a consistent state, which is important for applications that require high levels of reliability such as Skype, Azure Cosmos DB, Azure SQL Database, Azure IoT Hub, and Xbox Live [47,48,49].

In [2], Al-Jady and Karaata proposed the first self- and snap-stabilizing broadcast algorithm with the propagation 1-safety and the propagation reliability properties. Their proposed algorithm uses two waves, namely, a primary wave and an authentication wave. The primary wave is employed to implement the broadcast and the authentication wave is employed to facilitate the 1-safety of the primary wave by verifying the legitimacy of the broadcast tree prior to broadcasting further. They assume lock-step synchronization where, in each lock-step, all the enabled system processes execute all their enabled actions exactly once. The objective of the lock-step synchrony assumption in their algorithm is to allow clearing the effects of arbitrary initialization to precede the actual broadcast. Without the assumption of lock-step synchrony, if a process in some state due to arbitrary initialization is reached by the broadcast before the effect of the arbitrary initialization is cleared, the arbitrary initial state of the reached process may cause a premature initiation of a feedback phase, which leads to loss of the propagation reliability property. In contrast, our algorithm does not require the assumption of lock-step synchrony to provide the propagation 1-safety and propagation reliability properties. In our proposed algorithm, the legitimate broadcast is isolated from the arbitrarily initialized processes to prevent perturbation of the states of the processes already involved in the broadcast by those that are arbitrarily initialized and those whose state changes are caused directly or indirectly by the arbitrarily initialized processes. In addition, our proposed algorithm ensures that no process joins the broadcast tree prior to resetting its state in a separate step.

Since the algorithm proposed in [2] assumes lock-step synchrony, its implementation requires separate implementations of the algorithm and the lock-step synchrony along with their interfacing. The complexity and the overhead of the implementation of [2] are eliminated by our proposed simpler algorithm by not requiring a separate mechanism to implement lock-step synchrony. In addition, the time complexity of [2] and our proposed algorithm are the same, i.e., our algorithm does not introduce any overhead to eliminate the assumption of lock-step synchrony.

Table 1 compares the abovementioned broadcast approaches with our proposed algorithm. The comparison shows that most existing algorithms are asynchronous (do not assume lock-step synchrony), do not support arbitrary initialization, and offer limited scalability. They also do not guarantee 1-safety (the handling of spurious messages) or premature feedback safety, which can lead to safety violations. In contrast, our algorithm is asynchronous and provides support for arbitrary initialization, premature feedback safety, and 1-safety (the handling of spurious messages) while being scalable with low complexity, as shown in the table.

Numerous broadcast algorithms have been developed with varying properties such as reliability, safety, efficiency, and network overhead for a number of application areas. In wireless communication networks, broadcast is commonly used to disseminate information about network topology and available resources, enabling efficient coordination and communication among nodes. Military operations rely on broadcast to provide troops in the field with real-time updates on the battlefield, tactical movements, and changes in the tactical situation [51]. Financial transactions leverage broadcast to distribute transaction data and updates to all involved parties, ensuring timely and accurate information sharing [52]. Emergency response systems utilize broadcast to communicate critical information about the location and status of emergency responders, facilitating coordinated response efforts [53]. In inventory management systems, broadcast is utilized to distribute information about inventory levels, restock requests, and order updates, optimizing supply chain operations [54]. Power grid management relies on broadcast to distribute essential information about electricity generation, consumption, and distribution, enabling effective coordination of energy resources [55]. Similarly, in supply chain management, broadcast plays a key role in disseminating information about inventory levels, production schedules, and shipping updates, promoting seamless logistics operations [56]. In healthcare systems, broadcast is used to distribute vital information about patient conditions, treatments, and outcomes, supporting comprehensive and accurate healthcare delivery [57].

2.3. Applications of Broadcast

In a blockchain network, broadcasting is a fundamental component that allows the network to work [58]. When a transaction is completed, it is sent to the network, and all nodes receive a copy of the transaction. The nodes then validate the transaction and add it to their copy of the blockchain. Similarly, when a new block is mined, it is sent to the network, and all nodes receive a copy of the block. Nodes then validate the block and add it to their copies of the blockchain. This ensures that all nodes have the same blockchain copy, which is necessary for the network to function correctly.

In distributed databases, broadcasting is used to replicate data over multiple nodes [59,60,61,62] to increase data availability. When data is updated on a node, it is sent to other nodes in the network, and the data is updated on those nodes to ensure data consistency.

Distributed Stream Processing Frameworks (DSPFs) are rapidly becoming one of the most significant IoT ecosystem components [63]. DSPFs are used to stream rapidly changing real-time data through the distributed network. Streaming these data requires guaranteeing data consistency among the nodes of the network. Broadcast is the main component when DSPFs need to stream data and guarantee data consistency in a scalable manner. DSPFs can be used as a data analysis layer for intelligent city IoT applications. The most common real-time processing DSPFs are Apache Storm, Apache Spark Streaming, and Apache Flink.

In consensus algorithms, reliable broadcasting, such as atomic broadcast, serves as a fundamental building block for establishing agreement among nodes. Consensus algorithms allow distributed applications to reach a common decision or state. Consensus algorithms require four properties: termination, uniform integrity, uniform agreement, and uniform validity. Termination guarantees that every correct process will eventually decide on a value. Uniform integrity ensures that each process decides at most once, preventing duplicate decisions. Uniform agreement ensures that, regardless of their states, no two processes will decide on different values, ensuring consistency. Lastly, uniform validity guarantees that a value chosen by a process in the system was previously proposed by some process. The authors of [64] introduced the practical Byzantine fault tolerance (PBFT) algorithm, which provides Byzantine fault tolerance and uses a reliable broadcast protocol to establish consensus among nodes. The Raft consensus algorithm, proposed in [65], also uses reliable broadcast to maintain consistency among replicated logs.
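
The four consensus properties listed above can be checked on a finished run with a few set operations. The sketch below is a hypothetical checker, not part of PBFT or Raft: the input format (a map of proposals, a map of decision lists, and the set of correct processes) is an assumption made for illustration.

```python
def check_consensus(proposals, decisions, correct):
    """Evaluate the four consensus properties on a completed run."""
    decided_values = {v for values in decisions.values() for v in values}
    return {
        # every correct process decided at least once
        "termination": all(len(decisions.get(p, [])) >= 1 for p in correct),
        # no process decided more than once
        "uniform_integrity": all(len(v) <= 1 for v in decisions.values()),
        # no two processes decided on different values
        "uniform_agreement": len(decided_values) <= 1,
        # every decided value was proposed by some process
        "uniform_validity": decided_values <= set(proposals.values()),
    }


proposals = {"p1": 4, "p2": 7, "p3": 7}
correct = {"p1", "p2", "p3"}

good = check_consensus(proposals, {"p1": [7], "p2": [7], "p3": [7]}, correct)
bad = check_consensus(proposals, {"p1": [7], "p2": [4, 7]}, correct)
print(good)
print(bad)
```

In the second run, p3 never decides and p2 decides twice on different values, so termination, uniform integrity, and uniform agreement fail while uniform validity still holds.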

Path authentication and source authentication are two forms of authentication mechanisms in broadcast algorithms [66]. Path authentication guarantees the integrity of a message during transmission by verifying the path through which the message is traversed. Source authentication ensures message integrity by confirming the authenticity of the message’s origin. Though path and source authentication algorithms exist [67,68,69], existing algorithms do not consider arbitrary initial system configuration. Hence, they do require initialization or reset and have scalability issues.

Apache ZooKeeper [70], Google’s Chubby [71], and Microsoft’s Azure Service Fabric [72] are all distributed systems that provide mechanisms for achieving reliable broadcast, synchronization, and lock-step synchrony in a distributed environment. Apache ZooKeeper and Google’s Chubby both provide distributed lock services, while Microsoft’s Azure Service Fabric provides reliable distributed transactions that support lock-step synchrony. These systems are widely used in industry for building large-scale distributed applications that require high availability and consistency.

Microsoft’s Azure Service Fabric is a distributed system platform providing a wide range of services, including a reliable messaging system and a distributed lock service. The messaging system uses a publish–subscribe model to ensure that messages are reliably delivered to all subscribers, while the distributed lock service provides a mechanism for achieving mutual exclusion in a distributed environment [73]. Service Fabric also includes built-in support for microservices, which allows developers to build complex distributed systems using a modular architecture [74].

In contrast to our proposed algorithm, these systems are fairly complex and provide broadcast with at least once semantics, which may lead to multiple copies of the same message being received by a process. This may not be appropriate for some safety-critical applications, including remote surgeries and transportation applications, for which receipt of multiple messages poses a safety risk. Our proposed algorithm, by contrast, provides the exactly once semantics, which ensures that every process receives exactly one copy of the message.

All the abovementioned algorithms and protocols require an initialization phase before their execution. This initialization process introduces an added layer of complexity to the existing algorithms. Furthermore, it can introduce scalability challenges, especially in large networks. On the other hand, due to being able to start in an “arbitrary system configuration”, our proposed algorithm does not require a system configuration setup. Furthermore, the existing algorithms are not capable of handling “spurious messages”, whereas our proposed algorithm prevents the propagation of “spurious messages” by implementing 1-safety.

3. Computational Model

This section describes the computational model for a network represented by a graph $G = (V, E)$, where the vertex set V represents the set of processes, the edge set E represents the network channels, and n is the total number of processes. The network has one root process, referred to as r, and any two processes in the network, i and j, are considered neighbors if there is a communication link $(i, j)$ between them, which is represented by $(i, j) \in E$.

Each process in the network runs a deterministic program that consists of shared variables, constants, input/output capabilities, and a finite set of guarded commands. All processes except for the root process run the same program, and the processes have knowledge of their neighbors. We assume the shared memory modules model for process communication where a process can only write to its own variables while being able to read its own variables and those of its neighboring processes (Figure 1).
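
The access rules of the shared memory model (a process writes only its own variables and reads only its own and its neighbors’ variables) can be sketched as a small Python class. The class and method names are hypothetical and serve only to make the rules concrete.

```python
class SharedMemoryNetwork:
    """Toy model of G = (V, E) with per-process variables and access rules."""

    def __init__(self, edges, root):
        self.neighbors = {}
        for i, j in edges:                      # links are bidirectional
            self.neighbors.setdefault(i, set()).add(j)
            self.neighbors.setdefault(j, set()).add(i)
        if root not in self.neighbors:
            raise ValueError("root must be a process of the network")
        self.root = root
        self.state = {p: {} for p in self.neighbors}

    def can_read(self, reader, owner):
        return reader == owner or owner in self.neighbors[reader]

    def can_write(self, writer, owner):
        return writer == owner

    def read(self, reader, owner, variable):
        if not self.can_read(reader, owner):
            raise PermissionError(f"{reader} is not a neighbor of {owner}")
        return self.state[owner].get(variable)

    def write(self, writer, owner, variable, value):
        if not self.can_write(writer, owner):
            raise PermissionError(f"{writer} may only write its own variables")
        self.state[owner][variable] = value


net = SharedMemoryNetwork([("r", "a"), ("a", "b")], root="r")
net.write("a", "a", "phase", "B")
print(net.read("r", "a", "phase"))   # r is a neighbor of a
print(net.can_read("r", "b"))        # r and b are not neighbors
```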

The program of a process i can be expressed as follows:

$* [G[1] \to A[1] \;□\; G[2] \to A[2] \;□\; \dots \;□\; G[k] \to A[k]]$,

where

Each $G[j]$ is a Boolean function of the variables of process i and the variables of its neighboring processes;
Each $G[j] \to A[j]$ is referred to as a guarded command [75], where a guard $G[j]$ is a predicate over the variables of process i and/or its neighbors that evaluates to a Boolean value, and a command $A[j]$ consists of a sequence of statements that updates one or more local variables of the executing process;
Each command $A[j]$ reads and updates the variables of process i and reads its neighbors’ variables;
$* [S]$ corresponds to the repeated execution of the statement S while there exists an enabled guard;
□ is called the nondeterminism symbol. One of the guarded commands separated by □ is selected nondeterministically in each iteration.

A command is executed only if its corresponding guard evaluates to True. A guarded command whose guard evaluates to True is called enabled, and so is a process that has such a command. If multiple guards are enabled at the same time, the selection of the command to execute is nondeterministic. We assume that the selection of an enabled guard is weakly fair and that the evaluation of the guard and the execution of the command are performed in a single atomic step. Weak fairness ensures that if a guard remains enabled, its command will eventually be executed.
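
The execution model of guarded commands can be sketched as a loop that repeatedly selects one enabled command and executes it atomically. The following Python fragment is a minimal sketch under stated assumptions: the `run` function and the toy counters are hypothetical, and a uniformly random choice is used as a stand-in for nondeterminism (it is fair only with probability one, which is weaker than the weak fairness assumed in the model).

```python
import random


def run(guarded_commands, state, rng, max_steps=1000):
    """Repeat: pick one enabled guarded command and execute it atomically."""
    steps = 0
    while steps < max_steps:
        enabled = [(g, a) for g, a in guarded_commands if g(state)]
        if not enabled:
            break                      # no guard holds: the loop terminates
        _, action = rng.choice(enabled)
        action(state)                  # guard check and action form one step
        steps += 1
    return steps


# Toy program: x counts up to 3 and y counts up to 2, in any interleaving.
commands = [
    (lambda s: s["x"] < 3, lambda s: s.update(x=s["x"] + 1)),
    (lambda s: s["y"] < 2, lambda s: s.update(y=s["y"] + 1)),
]

state = {"x": 0, "y": 0}
steps = run(commands, state, random.Random(0))
print(steps, state)
```

Whatever interleaving is chosen, the toy program stops after five steps with x equal to 3 and y equal to 2, because each step increments exactly one counter and both guards eventually become false.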

Each nodal process contains an application program process and the proposed broadcast algorithm process, as shown in Figure 1. The application program is the main program running at each process in the network, supplying a message M to the root process of the proposed broadcast algorithm to start broadcasting M to all processes in the system. When input M is available at root process r, predicate function $input()$ holds, and the root takes input M to start broadcasting and disables $input()$ by executing function $disable.input()$. Then the proposed broadcast algorithm delivers the message to the application program processes of the system through the actions and communication of the proposed broadcast algorithm processes. The message M is assumed to be delivered to a process when it changes from the cleaning phase to the broadcasting phase and assumes a parent.

The values of a process’s variables determine the local state of that process. The system configuration is defined as the product of its processes’ states. We refer to the set of all possible configurations of the system as C. Given an execution $E = μ_{0} μ_{1} μ_{2} \dots$, where $μ_{i} \in C$ for $i \geq 0$, we define the first round of E to be the minimal prefix $E^{'}$ of E where every process enabled prior to $E^{'}$ is executed at least once in $E^{'}$. Let $E^{''}$ be the suffix of E for which $E = E^{'} E^{''}$. The second round of E is the first round of $E^{''}$, and so on. For any given execution E, the round complexity (which is sometimes called the execution time) of E is the number of rounds in E, as shown in [76].
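The round definition can be made concrete with a small sketch. It assumes a hypothetical trace format in which each step records the set of processes enabled just before the step and the process that executed. It follows the definition literally, so a process that was enabled at the start of a round but never executes keeps that round open.

```python
from typing import List, Optional, Set, Tuple

# One step of an execution: (processes enabled before the step, process executed).
Step = Tuple[Set[int], int]


def count_rounds(trace: List[Step]) -> int:
    """Count the completed rounds of a finite execution.

    A round is the minimal run of steps in which every process that was
    enabled at the start of the round has executed at least once.
    """
    rounds = 0
    pending: Optional[Set[int]] = None
    for enabled_before, executed in trace:
        if pending is None:
            # A new round starts: remember who is enabled right now.
            pending = set(enabled_before)
        pending.discard(executed)
        if not pending:
            rounds += 1
            pending = None
    return rounds


# Processes 1 and 2 are enabled initially; process 3 becomes enabled later.
trace = [({1, 2}, 1), ({2, 3}, 2), ({3}, 3)]
rounds = count_rounds(trace)
```

Here the first two steps form the first round, because both initially enabled processes have executed, and the third step forms the second round.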

4. Specification of Asynchronous and Reliable Broadcast Algorithm

A finite computation $E \in δ$, where $δ$ is the set of all possible executions, is called an asynchronous and reliable broadcast algorithm if and only if all the following conditions are true:

[Reliable Broadcast 1] Root process r receives message M in the computation step $γ_{1} \mapsto γ_{2}$.

[Reliable Broadcast 2] Let process i be a process that received message M directly or indirectly from root r. Any neighbor j of i where j has not received M in the boundary broadcast is in a state indicating that it has not received a broadcast and remains in that state until receiving the broadcast directly or indirectly from the root.

[Reliable Broadcast 3] For any process i with spurious message M, i can propagate M to neighboring process j; however, j cannot propagate to any process.

[Reliable Broadcast 4] For every process $i \in V$, there exists a unique $x \in [1, k - 1]$ such that process i receives M in $γ_{x} \mapsto γ_{x + 1}$.

[Reliable Broadcast 5] In $γ_{k}$, root r receives an acknowledgment of the receipt of M from every process $i \neq r$.

Note that the specification of boundary broadcast is given in the message passing model, while the rest of the paper assumes the shared memory model given in the previous section for the sake of brevity. It is easy to see that the receipt of a message by a process in the message passing model corresponds, in the shared memory model, to a state change by that process following the same state change by a neighboring process.
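Conditions 4 and 5 of the specification lend themselves to a mechanical check on a recorded computation. The following is a minimal sketch under assumed, hypothetical inputs: `receive_events` lists pairs $(x, i)$ meaning that process i received M in step $γ_{x} \mapsto γ_{x + 1}$, and `acks_at_root` is the set of processes whose acknowledgment the root holds in $γ_{k}$.

```python
from typing import Iterable, List, Set, Tuple


def satisfies_rb4(processes: Iterable[str],
                  receive_events: List[Tuple[int, str]], k: int) -> bool:
    """Every process receives M in exactly one step x with 1 <= x <= k - 1."""
    for p in processes:
        steps = [x for x, q in receive_events if q == p]
        if len(steps) != 1 or not (1 <= steps[0] <= k - 1):
            return False
    return True


def satisfies_rb5(processes: Iterable[str], root: str,
                  acks_at_root: Set[str]) -> bool:
    """The root holds an acknowledgment from every process other than itself."""
    return set(processes) - {root} <= acks_at_root


processes = ["r", "a", "b"]
ok_events = [(1, "r"), (2, "a"), (3, "b")]
dup_events = ok_events + [(3, "a")]  # process a receives M twice
```

The duplicate delivery in `dup_events` violates the uniqueness of x, which is the exactly-once aspect of Condition 4.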

5. Algorithm

In this section, we first provide some definitions to facilitate the description of the basis of the algorithm. Subsequently, we present two lemmas that establish the basis of our proposed algorithm. The first lemma states the required synchronization, and premature feedback safety, for any implementation of a broadcast algorithm that starts in an arbitrary initial system configuration. The second lemma shows that in any broadcast algorithm that starts in an arbitrary system configuration, actions cannot solely depend on local knowledge, knowledge collected from the process executing the action and its neighbors. The second lemma implies that any such algorithm needs to implement a mechanism to gather some non-local knowledge which is often implemented using waves in a distributed computational model. We then introduce the first asynchronous reliable broadcast algorithm.

Our proposed algorithm constructs a normal broadcast tree and extends the tree to the entire network to implement the broadcast. The normal broadcast tree refers to the broadcast tree rooted at root r satisfying the following three conditions:

(i): All the processes in the normal broadcast tree have received the broadcast directly or indirectly from root process r;
(ii): All the neighbors of the processes in the broadcast phase in the tree that have not received a message from the root are in the cleaning phase, have assumed parents in the tree, and are locked;
(iii): Each path in the tree from root r to a leaf is made up of zero or more processes in broadcast phases followed by zero or more processes in the feedback phases which are followed by zero or more processes in the cleaning phase.

On the other hand, an abnormal broadcast tree does not satisfy at least one of the above conditions. A process is said to be locked if it cannot change its parent so that the locked process in the normal broadcast tree does not assume a parent not included in the normal broadcast tree and receives a spurious message propagated by an abnormal broadcast tree. All the processes in the broadcast tree remain locked until the completion of the broadcast to avoid the interference of the processes that are not included in the normal broadcast tree.

In a normal broadcast tree, the processes that have received a message from the root and do not have any children in the broadcast phase are referred to as level-l processes. The neighbors of the level-l processes in the cleaning phase are referred to as boundary or level $l + 1$ processes. Boundary processes are those non-completed processes in the cleaning phase with parents in the broadcast tree that are only allowed to receive the broadcast from a process in the normal broadcast tree. The boundary processes are not affected by the arbitrarily initialized processes due to being locked as locking prevents state changes in boundary processes caused by arbitrarily initialized processes. Therefore, non-completed boundary processes isolate the completed processes in the normal broadcast tree from the state changes caused by arbitrarily initialized processes, which in turn eliminates safety violations during the broadcast. A boundary process remains as a non-completed process in the cleaning phase until it receives a message from the legitimate root directly or indirectly. In addition, the neighbors of the boundary processes that are not included in the tree and not level-l processes are referred to as level $l + 2$ processes.

5.1. Basis of the Algorithm

In this section, we introduce the basis of the proposed algorithm and provide essential definitions.

Premature feedback safety is a process synchronization such that a broadcast algorithm is said to satisfy premature feedback safety iff a completed non-broadcasted process cannot become completed broadcasted or non-completed prior to all its neighbors receiving the most recently broadcasted message. A completed process refers to a process that has received the most recent message broadcasted by a source process, or a non-completed process otherwise. A process is referred to as a broadcasted process iff all its neighbors have received the most recently broadcasted message and non-broadcasted otherwise.
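The two classifications in the definition above can be written as simple predicates. This is a sketch under assumptions: `received` is the hypothetical set of processes holding the most recent message, and the flag `concludes` models a process that is about to conclude its part of the broadcast (for instance by entering the feedback phase). Neither name appears in the paper.

```python
from typing import Dict, Set, Tuple


def classify(i: str, neighbors: Dict[str, Set[str]],
             received: Set[str]) -> Tuple[bool, bool]:
    """Return (completed, broadcasted) for process i."""
    completed = i in received
    broadcasted = all(j in received for j in neighbors[i])
    return completed, broadcasted


def is_premature_feedback(i: str, neighbors: Dict[str, Set[str]],
                          received: Set[str], concludes: bool) -> bool:
    """True if i concludes while it is completed but still non-broadcasted."""
    completed, broadcasted = classify(i, neighbors, received)
    return concludes and completed and not broadcasted


# Path a - b - c in which the message has reached a and b but not c.
neighbors = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
received = {"a", "b"}
```

In this configuration process b is completed but non-broadcasted, so concluding at b would be premature, whereas process a may conclude safely.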

The following lemma shows the role of maintaining premature feedback safety in implementing propagation reliability for broadcast algorithms. In particular, it shows that if premature feedback safety is not implemented by a broadcast algorithm, propagation reliability cannot be guaranteed after starting in an arbitrary initial system configuration. In other words, premature feedback safety is implemented if, for any completed non-broadcasted process $i \in V$ and in any system configuration, process i never becomes broadcasted or non-completed without broadcasting its message to all its neighbors.

Lemma 1 (Premature Feedback Safety Requirement).

Starting in an arbitrary system configuration, if a broadcast algorithm does not implement premature feedback safety, propagation reliability cannot be implemented using only local knowledge under the locally shared memory modules model.

Proof.

Let processes i and j be two neighboring processes where process i received message M from the legitimate source and process j received message $M^{'}$ from an illegitimate source, where $M \neq M^{'}$. Notice that, due to arbitrary initialization, process j can be in a state which indicates that j is a completed process even though it has received the broadcast from an illegitimate source. Observe that it cannot be decided which message (M or $M^{'}$) was sent by root process r based on the states of processes i and j using local knowledge, i.e., it cannot be decided which message shall propagate to the other process. Consequently, process j is not guaranteed to receive the legitimate broadcast from the legitimate root. Thus, the propagation reliability is violated. Hence, the proof follows. □

The above lemma merely states that any implementation of a reliable broadcast algorithm under the locally shared memory modules model that starts in an arbitrary system configuration needs to implement premature feedback safety.

The following lemma states that starting in an arbitrary initial system configuration, an asynchronous reliable broadcast cannot be implemented with only local knowledge.

Lemma 2 (Non-Local Knowledge Requirement).

There does not exist an asynchronous reliable broadcast algorithm that implements premature feedback safety and can solely rely on local knowledge.

Proof.

Assume that process i is a boundary process. Before advancing the broadcast to process i, each non-completed neighbor j of i needs to become a boundary process. This is because if a non-completed neighbor j of i is not made a boundary process, j can receive a message from a non-completed process violating the boundary reliability requirement by Lemma 1. This requires the identification of whether or not each process j is a non-completed non-boundary process since only the non-completed non-boundary processes are to be made boundary processes prior to advancement of the broadcast to process i. Due to arbitrary initialization, process j’s state and the state of process j’s neighbors may indicate that it is a completed process. Then clearly, in the absence of synchronization, process i’s local knowledge cannot distinguish whether process j is completed or not. Hence, the proof follows. □

The above lemma implies that starting in an arbitrary initial system configuration, any asynchronous broadcast algorithm that ensures premature feedback safety needs to implement mechanisms for processes to collect knowledge beyond their neighbors. For example, this mechanism may involve initiating a fresh wave to check which processes are to be included in the normal broadcast tree prior to extending the broadcast to a new set of processes.

We now provide the basis of the 1-safe and reliable broadcast algorithm based on the above lemmas, then give the detailed description of the algorithm followed by its formal specification.

The proposed algorithm needs to meet the following challenges. First, between every completed process and a non-completed non-boundary process, at least one boundary process is maintained at all times. Second, as the broadcast proceeds to the boundary processes, the next set of boundary processes needs to be determined, and they need to be made the new boundary processes when the broadcast proceeds in the presence of arbitrary initialization. Third, some process synchronization (premature feedback safety) needs to be implemented so that boundary processes only receive the broadcast from completed processes.

The proposed algorithm uses two concurrent waves, namely the primary wave and the secondary wave, to implement the asynchronous reliable broadcast. The primary wave constructs and contracts a broadcast tree rooted at process r using a top-down broadcast phase, followed by two consecutive bottom-up phases called $feedback$ and $cleaning$. A process is in a broadcast phase when it is involved in a message broadcast, in a bottom-up feedback phase when it is involved in acknowledging the root about the completion of the broadcast phase, and in a cleaning phase when it is involved in cleaning the trace of the broadcast tree. On the other hand, the secondary wave is used to verify the normality of a broadcast tree before the advancement of the primary wave to a new set of processes while implementing mechanisms to ensure premature feedback safety. A secondary wave in tree T consists of one bottom-up feedback phase, an f-phase, starting from the leaves, followed by one top-down phase, a b-phase, followed by a bottom-up cleaning phase, a c-phase.

The primary wave, initiated by root process r, broadcasts a message received from the application protocol process of r while simultaneously constructing a broadcast tree rooted at r. In the beginning, root r constructs a broadcast tree which contains only root r as a boundary process. Subsequently, the broadcast tree advances to the neighbors of r by making neighbors of r boundary processes and r a completed process. Then, prior to each advancement of the broadcast to boundary processes, a fresh secondary bottom-up wave in the broadcast tree is used to verify the normality of the broadcast tree and inform the root about the normality of the broadcast tree. Notice that a fresh wave is essential for the implementation of the broadcast algorithm with premature feedback safety since local knowledge is insufficient as stated by Lemma 2. Subsequently, the root informs boundary processes of the result of the verification using a top-down wave to advance the broadcast. The advancement takes place as all the non-completed neighbors of boundary processes become new boundary processes and all the boundary processes become completed processes. The verification of the normality of the broadcast tree, the information gathering and dissemination by the secondary wave prior to the advancement of the broadcast, implements the boundary safety and 1-safety of the broadcast and facilitates level-by-level construction of the broadcast tree where, after each secondary wave, the broadcast advances only to all the current boundary processes.

The secondary wave implements the required synchronization by ensuring that at all times completed processes and non-completed processes are separated by boundary processes. Boundary processes mark the leaves of the broadcast tree and identify the processes that need to initiate the secondary wave. Note that since the system starts in an arbitrary initial system configuration, continuous marking of the leaves is crucial. That is, a locked process does not experience a state change due to the states or state changes of any of its neighbors other than its parent. The secondary wave along with a mechanism of locking the boundary processes guarantees that the boundary processes only receive the broadcast from completed processes, avoiding the specification violation mentioned in Lemma 1. Maintenance of this separation with boundary processes at all times requires that before the boundary processes receive the broadcast, all the neighbors of the boundary processes that are not included in the normal broadcast tree and are not boundary processes (level $l + 2$ processes) are designated as the new boundary processes. The secondary wave changes the states of the newly designated boundary processes to indicate that they are non-completed, locks them, and assigns some boundary processes as parents to them. Locking of the level $l + 2$ processes prevents state changes of level $l + 2$ processes due to arbitrarily initialized processes. Notice that if a level $l + 2$ process i is not locked, due to the states of its neighbors not in the tree, process i may change its state and enter a broadcast phase and/or assume a parent included in an abnormal broadcast tree immediately before the broadcast advances to the parent of process i, so that boundary processes are no longer maintained. That is, when the broadcast phase advances to the parent of process i (a boundary process), and process i’s parent encounters process i that has entered the broadcast phase due to arbitrary initialization, a normal broadcast tree is no longer maintained. Now, observe that if process i was not locked, process i’s parent may prematurely enter the feedback phase without delivering the message to process i, violating propagation reliability. As a result of locking, process i, a level $l + 2$ process, once included in the tree, cannot assume a parent other than its designated parent in the broadcast tree or enter broadcast or feedback phases due to the states of its neighbors not included in the normal tree. After locking and parent assignment in the tree for the level $l + 2$ processes, the broadcast tree is extended to the level $l + 2$ processes by making them the new boundary processes and making the neighbors of the level $l + 2$ processes that are neither boundary nor level $l + 2$ processes the new level $l + 2$ processes. Subsequently, the broadcast is extended to the current boundary processes. This concludes the current secondary wave and a new secondary wave can now be started.

Prior to assigning parents to and locking of level $l + 2$ processes, no assumptions can be made about their states; in particular, boundary processes cannot identify which of their neighbors are in level $l + 2$ based on the states of their neighbors. Therefore, the neighbors of the boundary processes (level $l + 2$ processes) that are not included in the broadcast tree yet need to be identified. For that purpose, the secondary wave collects the ids of all the neighbors of boundary processes, the ids of the boundary processes, and the ids of the leaf processes (level l) and identifies the level $l + 2$ processes as those that are neighbors of boundary processes but neither level l (leaf) nor level $l + 1$ processes. Recall that as proven by Lemma 2, advancement of the broadcast to the boundary processes while maintaining the boundary safety property cannot be implemented locally. Therefore, a mechanism of state collection such as the one implemented here is essential.

We now describe the secondary wave in further detail. The secondary wave is initiated by the boundary processes in a normal broadcast tree T and the wave proceeds towards root process r in a bottom-up manner. In the secondary wave, the ids of the neighbors of the boundary processes, the ids of the boundary processes, and the ids of the leaf processes (level l) are gathered and sent to the root process in a bottom-up manner. When the root collects the abovementioned information, it identifies the level $l + 2$ processes as those that are neighbors of the boundary processes that are neither boundary nor leaf processes. Subsequently, the root assigns each level $l + 2$ process i to a neighboring boundary process j such that j is designated as the parent of process i. Then, these designations are sent to boundary processes as the secondary wave proceeds in a top-down manner in broadcast tree T. When the top-down secondary wave reaches a boundary process j with the designations, the process waits until all its designated level $l + 2$ children assume j as their parent, enter the cleaning phase, and lock themselves. When all the designated children of j assume j as their parent, enter the cleaning phase, and are locked, the secondary wave is completed in the branch of the broadcast tree extending from root r to process j. Subsequently, the primary wave extends the broadcast to process j as process j enters the broadcast phase. Subsequently, the secondary wave initiated by the new boundary processes (level $l + 2$ processes in the previous secondary wave) proceeds in a bottom-up manner as described above, and so on.
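The root's computation described above amounts to a set difference followed by a parent assignment. The following sketch assumes collected tuples of the form (id, level, neighbor ids), with an empty neighbor set for processes already in the tree and a non-empty one for boundary processes. The function name `decide_parents` and the tie-break (the boundary process with the smallest id becomes the parent) are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Set, Tuple

# (process id, level, ids of its neighbors; empty unless it is a boundary process)
NeighTuple = Tuple[int, int, FrozenSet[int]]


def decide_parents(neigh_info: Set[NeighTuple]) -> Dict[int, int]:
    """Map every level l + 2 process to one neighboring boundary process."""
    in_tree = {pid for pid, _, neigh in neigh_info if not neigh}
    boundary = {pid for pid, _, neigh in neigh_info if neigh}
    boundary_neighbors: Set[int] = set()
    for _, _, neigh in neigh_info:
        boundary_neighbors |= neigh
    # Level l + 2: neighbors of the boundary that are neither in the tree
    # nor boundary processes themselves.
    level_l2 = boundary_neighbors - (in_tree | boundary)

    designated: Dict[int, int] = {}
    for pid, _, neigh in sorted(neigh_info, key=lambda t: t[0]):
        for target in sorted(neigh):
            if target in level_l2 and target not in designated:
                designated[target] = pid  # assumed tie-break: smallest id wins
    return designated


# Root 0, boundary processes 1 and 2, processes 3 and 4 not yet in the tree.
neigh_info = {
    (0, 0, frozenset()),
    (1, 1, frozenset({0, 2, 3})),
    (2, 1, frozenset({0, 1, 3, 4})),
}
designated = decide_parents(neigh_info)
```

Process 3 is adjacent to both boundary processes and is assigned to process 1 under the assumed tie-break, while process 4 can only be assigned to process 2.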

The secondary wave consists of two bottom-up phases, called f- and c-phases, and one top-down phase referred to as b-phase. A process is in state f, c, or b when it is in the f-, c-, or b-phase, respectively.

When the broadcast phase of the primary wave reaches process i such that the broadcast phase cannot be extended to any neighbor of i (i.e., all the neighbors of i have already received the broadcast), i enters the feedback phase instead of the broadcast phase. Subsequently, each internal process enters the feedback phase upon discovering that all its children are in the feedback phase. The feedback phase proceeds in a bottom-up manner in the constructed tree and eventually reaches the root process. After a process is involved in a broadcast phase followed by a feedback phase, upon discovering that all its children are in the cleaning phase, it is removed from the normal tree by entering the cleaning phase. A process in the broadcast phase and the feedback phase is in state B and F, respectively, while a process in the cleaning phase is in state C.
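The bottom-up order of the feedback and cleaning phases can be seen in a small simulation. This sketch is a simplification and not the paper's algorithm: it starts from a fixed tree whose processes are all in state B, ignores the secondary wave, locking, and abnormal states, and uses a hypothetical scheduler that always runs the enabled process with the smallest id. A non-root process is assumed to enter state C only while its parent is in state F.

```python
from typing import Dict, List, Optional, Tuple


def run_primary_wave(parent: Dict[int, Optional[int]]
                     ) -> Tuple[Dict[int, str], List[str]]:
    """Simulate B -> F -> C on a tree given as a child-to-parent map."""
    children = {i: [j for j, p in parent.items() if p == i] for i in parent}
    state = {i: "B" for i in parent}
    history: List[str] = []

    def enabled_move(i: int) -> Optional[str]:
        kids = children[i]
        # Feedback: all children (if any) are already in the feedback phase.
        if state[i] == "B" and all(state[k] == "F" for k in kids):
            return "F"
        # Cleaning: all children are cleaned and the parent is in feedback.
        parent_ok = parent[i] is None or state[parent[i]] == "F"
        if state[i] == "F" and parent_ok and all(state[k] == "C" for k in kids):
            return "C"
        return None

    while True:
        moves = [(i, enabled_move(i)) for i in sorted(parent)]
        moves = [(i, s) for i, s in moves if s is not None]
        if not moves:
            return state, history
        i, s = moves[0]  # assumed scheduler: smallest enabled id first
        state[i] = s
        history.append(f"{i}:{s}")


# Root 0 with children 1 and 2; process 3 is a child of process 1.
final_state, history = run_primary_wave({0: None, 1: 0, 2: 0, 3: 1})
```

Each process changes state twice, and the root is the last process to enter each phase, which matches the bottom-up direction of both phases.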

Notice that any process not included in the normal broadcast tree can be in an arbitrary state and such processes may form abnormal trees due to arbitrary initialization. Our algorithm ensures that all such processes are eventually included in the normal broadcast tree and receive the broadcast.

5.2. Detailed Algorithm Description

Now, we introduce several variables that denote the state of the root/internal/leaf processes.

$P_{i}$ $\in {N . i \cup ⊥}$ : denotes i’s parent in the tree rooted at r and ⊥ if it is root r.
$S_{i} \in {B, F, C}$ : denotes the primary phase process i is in where states B, F and C denote the broadcast, the feedback, and the cleaning phases, respectively. In addition, $S_{i}$ = B denotes that process i has received message M directly or indirectly from root r. When $S_{i} = F$ and $S_{i} = C$ holds, it indicates that process i is involved in feedback, acknowledgment of the completion of the broadcast, and cleaning, clearing of the traces of the broadcast and phases, respectively.
$S_{i}^{'} \in {f, b, c} :$ denotes the secondary wave phase process i is in where states f, b, and c denote the f-, b-, and c-phases, respectively. A process i is in state c ( $S_{i}^{'} = c$ ) when it is in the c-phase of the secondary wave, indicating that the process is not currently involved in a secondary wave. If the state of a process is either F or C in a normal broadcast tree, a process i is always in state c. A process i is in state f ( $S_{i}^{'} = f$ ) when it is in the f-phase of the secondary wave where, in a bottom-up manner, the collection of $ids$ of the boundary, the leaf, and the neighbors of the boundary processes takes place. Similarly, process i is in state b ( $S_{i}^{'} = b$ ) when it is in the b-phase of the secondary wave where the designated parenthood information decided by root process r (based on the collected information) is forwarded to the boundary and the level $l + 2$ processes in a top-down manner.
$L_{i} \in {1, 2, \dots, N}$ : denotes the distance of process i from root r.
$l o c k_{i} \in {true, false}$ : denotes whether or not process i is locked.
$N e i g h I n f o_{i} \in {t_{0}, t_{1}, t_{2}, \dots, \emptyset}$ : contains the tuples containing the $i d s$ of the descendants of process i in the broadcast tree that are neighbors of the boundary, the leaves and boundary processes to be forwarded to the root using the bottom-up f-phase of the secondary wave where $t_{k}$ , $k \geq 0$ , is a tuple of the form $< I D, L, N e i g h >$ , where $I D$ is the id of the leaf or the boundary process from which information is collected, L is the level of the process from which the information is collected, and $N e i g h$ is the neighbors ids if the process from which information is collected is a boundary process.
$C o r r I n f o_{i} \in {t_{0}, t_{1}, t_{2}, \dots, \emptyset}$ : denotes the tuples with the designated parenthood information decided by the root process and is used to forward designated parenthood information to the boundary and level $l + 2$ processes by the top-down secondary wave in the broadcast tree, where $t_{k}$ , $k \geq 0$ , is a tuple of form < $t a r g e t N o d e, P a r e n t$ >.

We denote the state of a process i in the form $Xy$, where $S_{i} = X$ and $S_{i}^{'} = y$ indicate that the process is in phases X and y of the primary and the secondary waves, respectively; e.g., process i in state $Cc$ indicates that $S_{i} = C \land S_{i}^{'} = c$ holds.

Root process r starts a primary wave in a normal starting system configuration, where the root process r is in state $Cc$, $input()$ holds, and for each of its neighbors i, $P_{i} = r \land S_{i} = C \land S_{i}^{'} = c \land L_{i} = 1 \land lock_{i}$ holds, by entering state $Bc$ ($S_{r} = B \land S_{r}^{'} = c$) (Guards $rB$ and $iC1$). This results in a broadcast tree consisting of process r in the broadcast phase with all of its neighbors included in the tree as boundary processes. Subsequently, a secondary wave is initiated which results in extending the broadcast tree to all the neighbors of the boundary. The primary and the secondary waves extend until there are no more non-completed processes.

When process i discovers that it does not have a non-completed neighbor, process i enters the F-phase by entering state F, which continues towards the root as in a PFC cycle in a bottom-up manner (Guards $iF9$, $iC10$, $rF$ and $rC$). The C-phase follows the F-phase in which, upon discovering all their children in state C, processes enter state C (Guard $iC10$).

The algorithm implementing the above approach is given in Algorithms 1–3.

Algorithm 1: Definitions for a 1-Safe and Reliable Broadcast Algorithm

Constants
$r \in V$ denotes the normal root.
$L_{r} = 0$
$P_{r} = ⊥;$
$Lock_{r} = true;$

Parameters
$Id \in {1, 2, 3, 4, \dots}$ : each process has a unique Id.

Variables
$P_{i} \in N . i \cup ⊥$ for each process $i \in V$.
$L_{i} \in {0 \dots n}$ for each process $i \in V$.
$S_{i} \in {B, F, C}$ for each process $i \in V$.
$S_{i}^{'} \in {f, b, c}$ for each process $i \in V$.
${NeighInfo}_{i} \in {t_{0}, t_{1}, t_{2}, \dots}$ for each process $i \in V$, where $t_{j}$, $j \geq 0$, is a tuple of form $< Id, L, Neigh >$, where Id is the source of $t_{j}$, L is the level of the source of $t_{j}$, and $Neigh$ is the set of the neighbors of the source of $t_{j}$.
${CorrInfo}_{i} \in {t_{0}, t_{1}, t_{2}, \dots}$ for each process $i \in V$, where $t_{j}$, $j \geq 0$, is a tuple of form $< TargetNode, Parent >$, where $TargetNode$ is the id of the process whose parent needs to be corrected and $Parent$ is the id of the process that should be assigned as the parent of the $TargetNode$ process.
${Lock}_{i} \in {true, false}$ for each process $i \in V$.

Predicates:

{for root/internal/leaf process i}

$goodPSS_{i}^{'} \equiv ((S_{i} = B \land S_{i}^{'} = b) \to (S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land \forall_{j \in C . i} (S_{j} \in {C, B} \land S_{j}^{'} \in {b, f} \land lock_{j})) \lor$

$((S_{i} = B \land S_{i}^{'} = c) \to (S_{P_{i}} = B \land S_{P_{i}}^{'} = c \land \forall_{j \in C . i} ((S_{j} \in {C, B} \land S_{j}^{'} \in {f, c} \land lock_{j})) \lor (S_{i} = F \land S_{i}^{'} = c) \lor$

$\forall_{j \in C . i} (S_{j} \in {F, B} \land S_{j}^{'} = c \land lock_{j}))$

$\lor (S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land (\forall_{j \in C . i} (S_{j} = B \land S_{j}^{'} = c \land lock_{j})))$

$(S_{i} = B \land S_{i}^{'} = b) \to (S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land \forall_{j \in C . i} (S_{j} \in {C, B} \land S_{j}^{'} \in {b, f} \land lock_{j}) \lor$

$(S_{i} = F \land S_{i}^{'} = c) \to (S_{P_{i}} \in {C, B, F} \land S_{P_{i}}^{'} = c \land \forall_{j \in C . i} (S_{j} = F \land S_{j}^{'} = c \land lock_{j})$

{for internal/leaf process i}

$goodf_{i} \equiv ((S_{i} = B \land S_{i}^{'} = f) \to (S_{P_{i}} = B \land (\forall_{j \in C . i} (S_{j} \in {B, C} \land S_{j}^{'} = f \land lock_{j})) \lor$

$goodLevel_{i} \equiv P_{i} \neq ⊥ \to (L_{i} = L_{P_{i}} + 1 \land L_{i} ≯ n)$

$goodLock_{i} \equiv (S_{i}^{'} \in {f, b} \to lock_{i}) \land (lock_{i} \to lock_{P_{i}})$

$normalInternal_{i} \equiv goodLevel_{i} \land goodLock_{i} \land goodPf$

$abnormal_{i} \equiv (i = r) \to (\neg goodPSS_{i}^{'} \lor \neg normalInternal_{i})$

Function

$decideParents()$ :

if $\forall_{i \in N . r}$ ( $S_{i} = C \land S_{i}^{'} = f$ )

$NeighInfo_{r} := {< r, L_{r}, \emptyset >}$

$NeighInfo_{r} := NeighInfo_{r} \cup NeighInfo_{j}$ | $j \in N . r \land P_{j} = r \land NeighInfo_{j} ⊄ NeighInfo_{r}$

$N^{L + 1} = {j \in N . i | < i, L_{i}, N_{i} > \in NeighInfo_{r}}$

$P^{L} = {i | < i, L_{i}, N_{i} > \in NeighInfo_{r} \land N_{i} = \emptyset}$

$P^{L + 1} = {i | < i, L_{i}, N_{i} > \in NeighInfo_{r} \land N_{i} \neq \emptyset}$

$P^{L + 2} \equiv N^{L + 1} \setminus (P^{L} \cup P^{L + 1})$

$CorrInfo_{r} := \emptyset$

for each $i \in P^{L + 2}$

$CorrInfo_{r} := {< i, j > | (i, j) \in E \land j \in P^{L + 1}} \cup CorrInfo_{r}$
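The $goodLevel_{i}$ and $goodLock_{i}$ predicates of Algorithm 1 translate directly into code. The sketch below uses hypothetical dictionaries `parent`, `level`, `lock`, and `sec` (the secondary state $S_{i}^{'}$), represents ⊥ by `None`, and assumes that the lock condition on the parent holds trivially when a process has no parent.

```python
from typing import Dict, Optional


def good_level(i: str, parent: Dict[str, Optional[str]],
               level: Dict[str, int], n: int) -> bool:
    """P_i != bottom implies L_i = L_{P_i} + 1 and L_i does not exceed n."""
    p = parent[i]
    return p is None or (level[i] == level[p] + 1 and level[i] <= n)


def good_lock(i: str, parent: Dict[str, Optional[str]],
              lock: Dict[str, bool], sec: Dict[str, str]) -> bool:
    """S'_i in {f, b} implies lock_i, and lock_i implies lock_{P_i}."""
    p = parent[i]
    locked_in_wave = sec[i] not in ("f", "b") or lock[i]
    # Assumption: a process without a parent satisfies the second conjunct.
    parent_locked = (not lock[i]) or p is None or lock[p]
    return locked_in_wave and parent_locked


# Hypothetical six-process network: q is the parent of p.
parent = {"q": None, "p": "q"}
sec = {"q": "c", "p": "f"}
lock = {"q": True, "p": True}
consistent = {"q": 0, "p": 1}
inconsistent = {"q": 10, "p": 25}  # a level gap larger than one
```

A process whose level is 25 while its parent's level is 10 fails $goodLevel_{i}$, which is the kind of inconsistency that marks a process as abnormal after arbitrary initialization.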

Algorithm 2: A 1-Safe and Reliable Broadcast Algorithm for Root Process
Actions
{Program for root process r}
$rB$ $input() \land S_{r} = C \land S_{r}^{'} = c \land \forall_{j \in N . r} (P_{j} = r \land S_{j} = C \land S_{j}^{'} = c \land L_{j} = 1 \land lock_{j})$	⟶	$S_{r} := B;$
		$disable.input();$
$rb$ $S_{r} = B \land S_{r}^{'} = c \land \forall_{j \in N . r} (S_{j}^{'} = f) \land \neg abnormal_{r}$	⟶	$decideParents();$
		$S_{r}^{'} := b;$
$rc$ $S_{r} = B \land S_{r}^{'} = b \land \forall_{j \in N . r} (S_{j} = B \land S_{j}^{'} = c) \land \neg abnormal_{r}$	⟶	$S_{r}^{'} := c;$
$rF$ $S_{r} = B \land S_{r}^{'} = c \land \forall_{j \in N . r} (S_{j} = F) \land \neg abnormal_{r}$	⟶	$S_{r} := F;$
		$S_{r}^{'} := c;$
$rC$ $S_{r} = F \land S_{r}^{'} = c \land \forall_{j \in N . r} (S_{j} = F \lor S_{j} = C) \land \neg abnormal_{r}$	⟶	$S_{r} := C;$
		$S_{r}^{'} := c;$
		$lock_{i} := false;$
$rCa$ $abnormal_{r}$	⟶	$S_{r} := C;$
		$S_{r}^{'} := c;$

Algorithm 3: A 1-Safe and Reliable Broadcast Algorithm for Internal/Leaf Process
Actions
{Program for internal/leaf process i}
$iC1$ $S_{i} = C \land S_{i}^{'} = c \land r \in N.i \land P_{i} \neq r \land S_{r} = C \land S_{r}^{'} = c \land \neg \mathit{abnormal}_{i}$	⟶	$P_{i} := r;$
		$L_{i} := 1;$
		$\mathit{lock}_{i} := \mathit{true};$
$ic2$ $\exists_{j \in N.i} (S_{j} = C \land S_{j}^{'} = b \land \mathit{lock}_{j} \land \exists_{y \in \mathit{CorrInfo}_{j}} (\langle i, j \rangle \in y)) \land$	⟶	$P_{i} := j;$
$\neg \mathit{abnormal}_{i}$		$L_{i} := L_{j} + 1;$
		$\mathit{lock}_{i} := \mathit{true};$
		$S_{i} := C;$
		$S_{i}^{'} := c;$
$if3$ $S_{i} = C \land S_{i}^{'} = c \land S_{P_{i}} = B \land S_{P_{i}}^{'} = c \land \mathit{lock}_{i} \land \neg \mathit{abnormal}_{i}$	⟶	$\mathit{NeighInfo}_{i} := \{\langle i, L_{i}, N.i \rangle\};$
		$S_{i}^{'} := f;$
$if4$ $S_{i} = B \land S_{i}^{'} = c \land \forall_{j \in N.i} (P_{j} = i \to (S_{j} = C \land S_{j}^{'} = f)) \land$	⟶	$\mathit{NeighInfo}_{i} := \{\langle i, L_{i}, \emptyset \rangle\};$
$\neg \mathit{abnormal}_{i}$		$\mathit{NeighComp}_{i}();$
		$S_{i}^{'} := f;$
$if5$ $S_{i} = B \land S_{P_{i}} = B \land \forall_{j \in N.i} (P_{j} = i \to (S_{j} = B \land S_{j}^{'} = f)) \land$	⟶	$\mathit{NeighInfo}_{i} := \mathit{CTuples}();$
$\neg \mathit{abnormal}_{i}$		$S_{i}^{'} := f;$
$ib6$ $S_{i} \in \{B, C\} \land S_{i}^{'} = f \land S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land \neg \mathit{abnormal}_{i}$	⟶	$\mathit{CorrInfo}_{i} := \mathit{CorrInfo}_{P_{i}};$
		$S_{i}^{'} := b;$
$ib7$ $S_{i} = C \land S_{i}^{'} = b \land S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land \neg \mathit{abnormal}_{i} \land$	⟶	$S_{i} := B;$
$\forall_{y \in \mathit{CorrInfo}_{i}} (\exists_{j \in N.i} (\langle j, i \rangle \in y \to (P_{j} = i \land S_{j} = C \land S_{j}^{'} = c \land \mathit{lock}_{j})))$		$S_{i}^{'} := c;$
$ic8$ $S_{i} = B \land S_{i}^{'} = b \land S_{P_{i}} = B \land S_{P_{i}}^{'} = b \land$	⟶	$S_{i}^{'} := c;$
$\forall_{j \in N.i} (P_{j} = i \to (S_{j} \in \{B, C\} \land \mathit{lock}_{j} \land S_{j}^{'} = c)) \land \neg \mathit{abnormal}_{i}$
$iF9$ $S_{i} = B \land S_{i}^{'} = c \land \forall_{j \in N.i} (P_{j} = i \to (S_{j} = F \land S_{j}^{'} = c)) \land \neg \mathit{abnormal}_{i}$	⟶	$S_{i} := F;$
$iC10$ $S_{i} = F \land S_{i}^{'} = c \land S_{P_{i}} = F \land \forall_{j \in N.i} (P_{j} = i \to (S_{j} = C \land S_{j}^{'} = c)) \land$	⟶	$S_{i} := C;$
$\neg \mathit{abnormal}_{i}$		$S_{i}^{'} := c;$
		$P_{i} := ⊥;$
		$\mathit{lock}_{i} := \mathit{false};$
$iCa$ $\mathit{abnormal}_{i}$	⟶	$S_{i} := C;$
		$S_{i}^{'} := c;$
		$P_{i} := ⊥;$
		$\mathit{lock}_{i} := \mathit{false};$
where $N.i$ denotes the set of neighbors of i. $\mathit{CTuples}()$ returns the set of tuples in the children’s $\mathit{NeighInfo}$ variables. Function $\mathit{NeighComp}_{i}()$ denotes the computation $\mathit{NeighInfo}_{i} := \mathit{NeighInfo}_{i} \cup \mathit{CTuples}();$
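The two helper functions can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the `Proc` record, the encoding of a tuple $\langle i, L_{i}, N.i \rangle$ as `(pid, level, frozenset_of_neighbors)`, the function names `c_tuples` and `neigh_comp`, and the four-process topology are all assumptions introduced here, and the children of i are taken to be the neighbors j with $P_{j} = i$.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Proc:
    """Hypothetical per-process record; field names are illustrative."""
    pid: int
    parent: Optional[int]
    level: int
    neighbors: frozenset
    neigh_info: set = field(default_factory=set)


def c_tuples(i, procs):
    """Union of the NeighInfo sets of the children of process i."""
    collected = set()
    for j in procs[i].neighbors:
        if procs[j].parent == i:
            collected |= procs[j].neigh_info
    return collected


def neigh_comp(i, procs):
    """NeighInfo_i := NeighInfo_i U CTuples()."""
    procs[i].neigh_info |= c_tuples(i, procs)


# Hypothetical topology: 0 - 1, with leaves 2 and 3 below process 1.
procs = {
    0: Proc(0, None, 0, frozenset({1})),
    1: Proc(1, 0, 1, frozenset({0, 2, 3})),
    2: Proc(2, 1, 2, frozenset({1})),
    3: Proc(3, 1, 2, frozenset({1})),
}
# Leaves publish <i, L_i, N.i>, as in the action of guard if3.
for leaf in (2, 3):
    p = procs[leaf]
    p.neigh_info = {(p.pid, p.level, p.neighbors)}
# Process 1 publishes <i, L_i, {}> and merges its children's tuples (guard if4).
procs[1].neigh_info = {(1, procs[1].level, frozenset())}
neigh_comp(1, procs)
```

After these steps, the set held by process 1 contains its own tuple together with the tuples of both leaves, which is what its parent would read through `c_tuples` in the next step of the bottom-up phase.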

6. Illustrative Example

This section presents a sample execution of our algorithm starting in some initial system configurations. In our figures, each new graph represents the system configuration obtained after each process executes its enabled guards exactly once in a lock-step manner.

6.1. Illustration of Premature Feedback

We now illustrate premature feedback in the execution of a PFC algorithm after the algorithm starts in an “arbitrary initial system configuration” (Figure 2). In the figure, processes are in states C, B, and F when they are in the cleaning, broadcast, and feedback phases, respectively. In Figure 2 Part 1, we present an initial system configuration of a distributed system in a network of six processes, where Processes 1 through 5 are enabled while Process 0 is disabled. Notice that Processes 2, 4, and 5 are part of an abnormal tree (shown in red) that extends to a number of disabled processes such that the elimination of the abnormal tree takes eleven rounds. It is easy to see that since their states are not consistent with those of their parents, Processes 1 and 3 are enabled to enter state C and assign 0 to their L-variables. Furthermore, Process 5 is shown to be abnormal since its level value is 25, whereas that of its parent (Process 5) is 10. Therefore, in Part 2, in one round, Processes 1, 3, and 5 assign C to their states and assign 0 to their L-variables.

Observe that, in the next round, Processes 0, 1, 3, and 5 enter the broadcast phase (state B). After Processes 0, 1, 3, and 5 enter the broadcast phase, they cannot locally decide whether Processes 2 and 4 are part of the normal tree or not. If they enter the feedback phase without checking whether Processes 2 and 4 are included in an abnormal tree, the broadcast never reaches Processes 2 and 4, causing premature feedback, which in turn violates propagation reliability.

6.2. Illustration of Specification Violation and Semantics Change

Figure 3 Part 1 presents another system configuration that can be reached by a PFC algorithm after starting in an arbitrary system configuration. Notice that in this system configuration, there exists a normal broadcast tree rooted at root Process 0 consisting of Processes 0, 1, 4, and 5 and an abnormal broadcast tree containing Processes 2 and 3. It is easy to see that Process 5 in the normal tree may discover Process 3 to be closer to the root since Process 3’s L-variable is 1, whereas the L-variable of Process 5’s current parent (Process 4) is 2. As in a typical PFC algorithm that constructs a BFS spanning tree starting in an “arbitrary initial system configuration”, Process 5 may change its parent to Process 3 using only local knowledge. Observe that Process 5 is now contained in an abnormal broadcast tree, which will eventually be eliminated, after which Process 5 is again included in the normal broadcast tree. Notice that, by rejoining the normal broadcast tree, Process 5 receives the same message again, leading to at-least-once semantics instead of exactly-once semantics. Note that this is caused by an action of Process 5 based on local knowledge. It is easy to see that a mechanism that checks the normality of both trees prior to the action of Process 5 prevents the actions that lead to the semantics change.

6.3. Normal Execution

In this section, we show a sample execution of the proposed algorithm starting in an arbitrary system configuration. Each circle in the figure represents an individual process, and within each circle, the primary state variable is shown in the upper left half, the secondary state variable is shown in the lower left half, and the level of the process is shown in the right half. The root process is identified by id 0. Each locked process is denoted by a thicker circle. The construction of the broadcast tree is shown using arrows that indicate the parent–child relationship between adjacent processes.

Figure 4 Part 1 shows the arbitrary starting configuration where processes have arbitrary initial values. It is easy to see that Processes 1, 3, and 5 have inconsistent level values with respect to their parents, whereas Processes 2 and 4 have no parent. In one round, all system processes enter the cleaning phase by executing guard $iCa$, as shown in Figure 4 Part 2. Note that, in our algorithm, the level values of the processes are corrected by the secondary wave. Therefore, it is shown in Part 2 that all the processes are in state C, while their L-variables still have the initial arbitrary values.

Figure 4 Part 3 shows the normal starting configuration, where root Process 0 waits until all its neighboring processes (Process 1 and Process 3) enter state C, are locked, and satisfy $L = 1$, $S = C$, and $S^{'} = c$ by executing guard $iC1$. In Part 4, the root starts the construction of the normal broadcast tree by entering the broadcast phase (guard $rB$). After the root enters the broadcast phase, notice that the neighbors of the root are all in state C and locked by the root to guarantee the receipt of the message. However, Processes 1 and 3 do not enter the broadcast phase until ensuring that the neighbors of Processes 1 and 3 not included in the tree are locked and their S, $S^{'}$, and L variables are set to C, c, and 2, respectively, by a secondary wave. For that purpose, Processes 1 and 3 initiate the first secondary wave f ($S = C$ and $S^{'} = f$) by entering state f, as shown in Figure 5 Part 5.

In the next round, as shown in Figure 5 Part 6, the secondary wave reaches the root as it enters state b. Then, in the next round, Processes 1 and 3 enter state b as the secondary wave reaches them in a top-down manner, as shown in Figure 5 Part 7. As the secondary wave takes place, the required information is collected from the processes, and the designated parents of the new boundary processes (Processes 2, 4, and 5) are determined by the root and sent to Processes 1 and 3. In the next round, as shown in Figure 5 Part 8, Processes 2, 4, and 5 determine their parents and are locked based on the designations. Therefore, Processes 1 and 3 remain in state $Cb$ until Processes 2, 4, and 5 assign parents to themselves and are locked according to the received correction information $\mathit{CorrInfo}_{1,3}$ by executing the guards $ic2$ and $ib7$. Guards $ic2$ and $ib7$ are responsible for correcting the states of the processes that are not contained in the tree when they discover that their locked neighboring processes have received correction information for them. Moreover, the levels of the processes are assigned according to their parents’ level values. Subsequently, upon discovering that parent assignments and locking of the new boundary processes are completed, Processes 1 and 3 enter state $Bc$ ($S = B$ and $S^{'} = c$), extending the broadcast to them, as shown in Figure 5 Part 9.

This concludes a secondary wave. Since the broadcast is not completed, another secondary wave is started by Processes 2, 4, and 5 as shown in Figure 5 Part 10. The wave continues in the f-phase in a bottom-up manner and then in a top-down manner in a b-phase in Figure 5 Parts 11 through 14. The secondary wave concludes in Figure 5 Part 15, where the broadcast reaches all system processes. The remaining parts in Figure 6 show the bottom-up F-phase followed by the C-phase to complete the broadcast.

7. Proof of Correctness

In this section, we first define a few terms used in the proofs, then present the proof outline, followed by the proof for the proposed algorithm.

Root r, referred to as the normal root, is the only process that receives a message as input to broadcast to the network. Any process other than r that has no parent and is the root of a tree is called an abnormal root.

Proposition 1.

Starting in an arbitrary system configuration, in the execution of the first enabled action of the proposed algorithm, process i may only receive a spurious message from an illegitimate source, while it is impossible to receive a message from a legitimate source.

Definition 1.

A normal starting configuration with respect to root r refers to a system configuration in which

$\mathit{input}() \land S_{r} = C \land S_{r}^{'} = c \land \forall_{j \in N.r} (P_{j} = r \land S_{j} = C \land L_{j} = 1 \land S_{j}^{'} = c \land \mathit{lock}_{j})$

holds.
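The predicate of Definition 1 can be evaluated mechanically on a snapshot of the system. The following Python sketch is illustrative only: the dictionary layout, the key names (`"S"`, `"S2"` for $S^{'}$, `"P"`, `"L"`, `"lock"`, `"N"`), the function name `is_normal_start`, and the three-process network are assumptions made for this example.

```python
def is_normal_start(r, procs, has_input):
    """Evaluate the predicate of Definition 1 for root r.

    procs maps a process id to a dict with the illustrative keys
    "S" (primary state), "S2" (secondary state S'), "P" (parent),
    "L" (level), "lock" (bool) and "N" (set of neighbor ids).
    has_input stands for input() being true at the root.
    """
    root = procs[r]
    if not (has_input and root["S"] == "C" and root["S2"] == "c"):
        return False
    # Every neighbor of r must point to r, be cleaned, locked, and at level 1.
    return all(
        procs[j]["P"] == r
        and procs[j]["S"] == "C"
        and procs[j]["L"] == 1
        and procs[j]["S2"] == "c"
        and procs[j]["lock"]
        for j in root["N"]
    )


# Hypothetical three-process network: root 0 with neighbors 1 and 2.
procs = {
    0: {"S": "C", "S2": "c", "P": None, "L": 0, "lock": False, "N": {1, 2}},
    1: {"S": "C", "S2": "c", "P": 0, "L": 1, "lock": True, "N": {0}},
    2: {"S": "C", "S2": "c", "P": 0, "L": 1, "lock": True, "N": {0}},
}
```

In this snapshot the predicate holds when the root has an input message; unlocking either neighbor, or changing its level, makes it fail.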

Definition 2 (normal broadcast tree).

A tree T is a normal broadcast tree if it satisfies the following three conditions:

(i): Tree T is rooted at process r.
(ii): Each internal process i in the normal tree satisfies $L_{i} = L_{P_{i}} + 1 \land \mathit{lock}_{i} \land S_{i} = B$ and each leaf process i satisfies $S_{i} = C \land \mathit{lock}_{i} \land L_{i} = L_{P_{i}} + 1$.
(iii): For root r of tree T, $P_{r} = ⊥$ holds, and for any internal or leaf process i in T, if j is the parent of process i, then $P_{i} = j$.
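The three conditions of Definition 2 can be checked on a candidate set of processes as sketched below. This is an illustration under stated assumptions, not part of the proposed algorithm: the dictionary keys, the function name `is_normal_tree`, and the sample chain are hypothetical, a process is treated as internal when some member of the tree names it as parent, and the root is assumed to carry level 0 since the definition places no level constraint on it.

```python
def is_normal_tree(tree, r, procs):
    """Check conditions (i)-(iii) of Definition 2 for the process set `tree`."""
    # (i) and (iii): the tree is rooted at r, and r has no parent.
    if r not in tree or procs[r]["P"] is not None:
        return False
    for i in tree - {r}:
        # (iii): parent pointers stay inside the tree and lead to r.
        seen, cur = set(), i
        while cur != r:
            if cur in seen or cur not in tree:
                return False
            seen.add(cur)
            cur = procs[cur]["P"]
        # (ii): internal processes are in state B, leaves in state C;
        # both are locked and sit one level below their parent.
        has_child = any(procs[j]["P"] == i for j in tree)
        expected_state = "B" if has_child else "C"
        parent_level = procs[procs[i]["P"]]["L"]
        if not (procs[i]["S"] == expected_state
                and procs[i]["lock"]
                and procs[i]["L"] == parent_level + 1):
            return False
    return True


# Hypothetical chain 0 -> 1 -> 2: root, one internal process, one leaf.
procs = {
    0: {"S": "B", "P": None, "L": 0, "lock": False},
    1: {"S": "B", "P": 0, "L": 1, "lock": True},
    2: {"S": "C", "P": 1, "L": 2, "lock": True},
}
```

A process whose level is inconsistent with that of its parent, such as a leaf with level 5 below a parent with level 1, causes the check to fail.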

Lemma 3 (1-safety).

In any arbitrary system configuration, let i be a process included in an abnormal broadcast tree after the system starts. No process joins the abnormal tree by assuming process i as its parent and entering state $Bc$.

Proof.

Notice that immediately after the system starts, process i enters state $Bc$ (Guard $ib7$) only if process i is a leaf (boundary process) in an abnormal tree, is in state $Cb$ and locked, and all of i’s children are in $Cc$ and locked. Then, i’s children enter state $Cf$ (Guard $if3$), start the secondary wave, and wait to enter $Bc$. However, in an abnormal tree, when the secondary wave initiated by i’s children reaches the root of the abnormal tree, the abnormal root cannot start the top-down b-phase that allows i’s children to enter $Bc$. Therefore, no child of i can enter state $Bc$. Hence, the proof follows. □

Lemma 4 (Abnormal State Delay).

Starting in an arbitrary system configuration where i is in an abnormal tree, process i enters state $Cc$ in at most n rounds.

Proof.

Let process i be a process in an abnormal tree. We show that process i enters $Cc$. Observe that by Lemma 3, the height of the abnormal tree cannot increase by more than one. By the definition of an abnormal tree, the root of the abnormal tree containing process i is in an abnormal state and is enabled to execute guarded command $rCa$ or $iCa$. Then, by the weak fairness assumption, the root of the abnormal tree enters $Cc$ in one round, which reduces the height of the abnormal tree containing process i by one. Observe that the new root of the abnormal tree containing process i is now enabled to execute guarded command $iCa$. Inductively, it can be shown that the abnormal tree(s) containing i vanishes in at most n rounds. Hence, the proof follows. □
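The inductive argument can be illustrated with a small simulation. The sketch below is a deliberately simplified model and not the proposed algorithm: it considers only the reset actions, assumes lock-step rounds, and treats a process as an abnormal root exactly when its parent is absent or has already been reset. The function name `rounds_to_vanish` and the two sample forests are hypothetical.

```python
def rounds_to_vanish(parent):
    """Count lock-step rounds until an abnormal forest disappears.

    parent maps each process in the abnormal forest to its parent,
    with None marking an abnormal root. Simplifying assumption: in
    each round every current root resets to state Cc and leaves the
    forest, which turns its children into roots for the next round.
    """
    remaining = set(parent)
    rounds = 0
    while remaining:
        roots = {i for i in remaining if parent[i] not in remaining}
        if not roots:
            raise ValueError("parent pointers contain a cycle")
        remaining -= roots
        rounds += 1
    return rounds


chain = {0: None, 1: 0, 2: 1, 3: 2}   # height-3 chain of four processes
star = {0: None, 1: 0, 2: 0, 3: 0}    # one root with three children
```

In this model the number of rounds equals the number of levels of the forest, which never exceeds the number of processes it contains; the chain is the worst case and the star the best case for four processes.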

The following corollaries immediately follow from Lemma 4.

Corollary 1 (Delay).

After starting in an arbitrary system configuration, the system enters a normal starting configuration in O(n) rounds.

Proof.

Upon system start, by Lemma 4, root r enters state $Cc$ in n rounds, and the system enters the normal starting configuration in $n + 1$ rounds. □

The following proposition immediately follows from Lemmas 3 and 4.

Proposition 2.

Starting in an arbitrary system configuration, after n rounds all abnormal trees in the network vanish and all the processes not in the normal tree enter state $Cc$. After a process enters state $Cc$ or is in state $Cc$ at system start, the process remains in state $Cc$ until joining the normal tree.

Lemma 5 (Premature Feedback Safety).

Upon starting in an arbitrary system configuration, premature feedback safety is never violated by the proposed algorithm.

Proof.

Let i be a non-completed neighbor of a completed process and let j be a non-completed neighbor of process i in an arbitrary initial state. We show that process i cannot receive the broadcast from process j. Since process i is a non-completed process with a completed neighbor, it is a boundary process contained in a normal broadcast tree. Also, notice that since process i is contained in a normal broadcast tree, it is locked. Locked process i may be unlocked only when the parent of i in legitimate tree T changes its state to C or the broadcast is completed in the subtree rooted at the parent process of i. Since process i is in a normal broadcast tree and the parent of process i does not change and enter state C until the broadcast is completed in the subtree rooted at the parent process of i in a normal broadcast tree, process i cannot receive the broadcast from process j. Hence, the proof follows. □

Lemma 6 (Liveness).

After the system starts in an arbitrary initial system configuration, in any system configuration, if process i is a boundary process in state $Cc$ in a normal tree T just before the secondary wave is initiated by process i, then upon completion of the secondary wave, process i and all its ancestors enter state $Bc$, and all neighbors of i not in T enter state $Cc$, assume a leaf in T as a parent, and are locked in at most $2D + 1 + n$ rounds if the secondary wave is initiated in the first n rounds after the system start and in at most $2D + 1$ rounds if the secondary wave is initiated after the first n rounds after the system start.

Proof.

Observe that when process i joins normal tree T, it is in state $Cc$, locked, and its parent is in state $Bc$. Then, process i starts the secondary wave by entering state $Cf$ and collects neighborhood information by assigning all its neighbors to $\mathit{NeighInfo}_{i}$. Subsequently, when all the children of i’s parent j enter state $Cf$, j enters state $Bf$ and adds its id and level to $\mathit{NeighInfo}_{j}$. Phase $Bf$ moves in a bottom-up manner towards the root as each process j enters state $Bf$ and gathers into $\mathit{NeighInfo}_{j}$, from its children, the neighborhood information coming from the leaves and the parents of the leaves in T. It is easy to see that, starting from the leaves, the $Cf$ phase takes $D - 1$ rounds to reach the children of root r. When all the children of the root enter state $Bf$ (or $Cf$ if the child is a leaf in T), root r gathers the neighborhood information from its children.

Upon copying the neighborhood information, root r knows all the leaves, all the neighbors of the leaves, and all the neighbors of the parents of the leaves. Using the collected information, root r removes the leaves and the neighbors of the parents of the leaves from the neighbors of the leaves to obtain the neighbors of the leaves not in T. Subsequently, for each neighbor of the leaves not in T, root r designates a leaf in T as a parent, stores the designation in $\mathit{CorrInfo}_{r}$, and enters state $Bb$ in one round. Then root r broadcasts the information stored in $\mathit{CorrInfo}_{r}$ in a b-phase that moves in a top-down manner. In phase $Bb$, upon discovering that i’s parent is in state $Bb$, i copies the information stored in $\mathit{CorrInfo}_{P_{i}}$ to $\mathit{CorrInfo}_{i}$ and enters state $Bb$. By Lemma 5, leaf processes cannot join an abnormal broadcast tree and hence remain in state $Cc$ until j enters state $Cb$. When the top-down wave reaches the leaves as the processes enter state $Bb$, the leaves enter state $Cb$ instead of $Bb$. Observe that the b-phase proceeds from root r to the leaves in at most $D - 1$ rounds.
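The designation step performed by the root can be sketched as a set computation. The code below is only an illustration of the idea described in this proof: the function name `designate_parents`, the dictionary inputs, the tie-breaking rule (the leaf with the smallest id wins), and the sample topology are assumptions of this sketch, and the sketch additionally removes the parents of the leaves themselves from the candidates, which the text does not state explicitly.

```python
def designate_parents(leaves, leaf_parents):
    """Return correction pairs (new_process, designated_leaf).

    leaves and leaf_parents map a process id to the set of its
    neighbor ids, as gathered by the root during the f-phase.
    """
    # Processes the root can recognize as already belonging to T.
    known_in_tree = set(leaves) | set(leaf_parents)
    for nbrs in leaf_parents.values():
        known_in_tree |= nbrs
    designation = {}
    for leaf in sorted(leaves):
        for q in sorted(leaves[leaf]):
            # Assumed tie-break: the first (smallest-id) leaf adjacent to q.
            if q not in known_in_tree and q not in designation:
                designation[q] = leaf
    return {(q, leaf) for q, leaf in designation.items()}


# Hypothetical topology: root 0 is the parent of leaves 1 and 3;
# processes 2, 4, and 5 are outside the tree.
leaves = {1: {0, 2, 4}, 3: {0, 4, 5}}
leaf_parents = {0: {1, 3}}
corr_info = designate_parents(leaves, leaf_parents)
```

Each process outside the tree receives exactly one designated parent, even when it is adjacent to several leaves (process 4 in this example), so that it can join the tree through a single leaf.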

When process i enters state $Cb$, process i waits until each one of its neighbors not in T enters state $Cc$, locks, and points to the parent designated by the root (available in $\mathit{CorrInfo}_{i}$). Observe that each neighbor j of i not in T can be in $Cc$ with a parent other than the one designated by the root, in an abnormal state, or in an abnormal tree. If process j is in an abnormal state or tree, then by Lemma 4, j enters state $Cc$ in at most n rounds. By Proposition 2, process j will not enter any abnormal tree after entering state $Cc$ in n rounds. If j is in state $Cc$ and j’s parent is not the one designated by the root, it is easy to see that in one round guarded command $ic2$ is executed and j assumes the parent designated by the root. Subsequently, guarded command $ic2$ is enabled to assign a leaf in T designated by the root (available in $\mathit{CorrInfo}_{i}$) as a parent to each neighbor of i not in T. When all the neighbors of i not in T enter $Cc$, assume the parent designated by the root, and are locked, process i enters $Bc$ in an additional round. After all the ancestors of i enter state $Bc$ in a top-down manner in a $Bc$ phase, process i enters state $Bc$ in one round. Therefore, it is easy to see that process i enters phase $Bc$ in at most $2D + 1 + n$ rounds. Hence, the proof follows. □

The following corollary, which states that after the completion of the secondary wave, all the neighbors of the leaves which are not in tree T are included in T as leaves to obtain normal tree $T^{'}$, immediately follows from Proposition 1 and Lemma 6.

Corollary 2.

In a normal system configuration where process r is enabled and is in state $Cc$, root r is included in normal tree T as a leaf in one round. Let tree T be a normal tree immediately after the completion of a secondary wave. Immediately after the completion of the next secondary wave, all the neighbors of the leaves which are not in tree T are included in tree T as leaves.

Lemma 7 (Propagation Reliability).

Starting in any arbitrary system configuration, the broadcast is completed in $2D^{2} + 2(D + 1) + n$ rounds, where D is the diameter of graph G.

Proof.

Starting in an arbitrary system configuration, we know by Corollary 1 that root r enters a normal starting system configuration after n rounds. It is easy to see by Lemma 6 that the first secondary wave is completed in at most $2D + 1 + n$ rounds. After the first secondary wave, at most $D - 1$ secondary waves are required to complete the broadcast. By Lemma 6, each one of these subsequent secondary waves is completed in $2D + 1$ rounds, so the remaining secondary waves require at most $(D - 1)(2D + 1)$ rounds. Hence, each process is guaranteed to enter state $Bc$ in at most $2D + 1 + n + (D - 1)(2D + 1)$ rounds. Observe that subsequently the primary F and C waves are completed in at most $D + 1$ rounds. Therefore, the broadcast is completed after at most $2D^{2} + 2(D + 1) + n$ rounds. Hence, the proof follows. □
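The arithmetic behind the bound can be written out explicitly. The following worked derivation only combines the three quantities named in the proof; it is added here as a check and is not part of the original argument. Their sum is $2D^{2} + 2D + 1 + n$, so the stated bound holds with one round to spare.

```latex
\begin{align*}
\underbrace{(2D + 1 + n)}_{\text{first secondary wave}}
  + \underbrace{(D - 1)(2D + 1)}_{\text{remaining waves}}
  &= D(2D + 1) + n = 2D^{2} + D + n, \\
(2D^{2} + D + n) + \underbrace{(D + 1)}_{\text{primary } F \text{ and } C \text{ waves}}
  &= 2D^{2} + 2D + 1 + n \\
  &\le 2D^{2} + 2(D + 1) + n .
\end{align*}
```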

The following theorem immediately follows from Lemmas 3, 5 and 7.

Theorem 1.

The proposed algorithm is a 1-safe reliable broadcast algorithm which completes each broadcast in $2D^{2} + 2(D + 1) + n$ rounds after starting in an arbitrary initial system configuration.

8. Conclusions and Future Work

In this paper, we first demonstrate that it is impossible to develop an asynchronous local reliable broadcast algorithm that operates under arbitrary initial system configurations where processes act solely based on local information. We then introduce the first asynchronous reliable broadcast algorithm that is able to start in any arbitrary configuration and ensures three properties: premature feedback safety, propagation reliability, and propagation 1-safety. Premature feedback safety guarantees that each neighboring process that has not yet completed the broadcast will always be in a state that only allows it to receive the broadcast from a completed process. Propagation 1-safety guarantees that a spurious message can be sent from one process to another but cannot be transmitted further. Propagation reliability, on the other hand, guarantees that all system processes receive each legitimate message sent by root process r and that the message is propagated as specified without any backtracking.

If a new uninitialized process joins the network and becomes a neighbor of a completed process, premature feedback safety is violated, since the neighboring processes of the newly joined process cannot locally determine whether it is a completed or a non-completed process (as shown by Lemma 1).

The challenge of developing multi-source asynchronous reliable broadcast algorithms that allow simultaneous broadcasts from multiple sources remains unsolved. Similarly, devising asynchronous reliable broadcast algorithms specifically designed for certain regular topologies or applications such as blockchain networks, where further restrictions are present, is another open problem.

While our paper mentions various applications for the proposed algorithm, a comprehensive examination of the algorithm’s practicality and its use in real-time applications through performance evaluations is beyond the scope of this paper. However, we view these as potential future work.

Experimental studies are needed to assess the scalability of our proposed algorithm in comparison to distributed reset algorithms. Such studies would focus on the relationship between network size, number of transient faults (initialized processes vs. uninitialized processes), and algorithm failure rate. By simulating such scenarios, including perturbation of initialized processes by transient faults, we can evaluate the resilience of the algorithm to transient faults and its scalability and determine its compatibility with real-world applications.

Finally, a version of our proposed algorithm that is adaptive to topology changes and crash-resilient is highly desirable and is considered future work.

Author Contributions

The work presented in this paper was evenly distributed among all authors. Each author contributed equally to the design, formal proof, algorithm development, and writing. We collaboratively addressed the challenges of asynchronous reliable broadcast and jointly developed the proposed algorithm and its correctness proof. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Kuwait University Research Administration, Grant No [EO 06/19].

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Xu, P.; Zhang, J.; Lu, J.; Zhang, H.; Gao, T.; Chen, S. A prior knowledge-embedded reinforcement learning method for real-time active power corrective control in complex power systems. Front. Energy Res. 2022, 10, 1009545. [Google Scholar]
Jady, Z.A.; Karaata, M.H. Possibility and Impossibility of Propagation Safety and Reliability: A 1-Safe and Reliable Snap-Stabilizing Broadcast Algorithm. IEEE Trans. Dependable Secur. Comput. 2023, 20, 2174–2187. [Google Scholar]
Anita, E.A.M.; Bai, V.T.; Raj, E.; Prabhu, B.R. Defending against worm hole attacks in multicast routing protocols for mobile ad hoc networks. In Proceedings of the 2011 2nd International Conference on Wireless Communication, Vehicular Technology, Information Theory and Aerospace & Electronic Systems Technology (Wireless VITAE), Chennai, India, 28 February–3 March 2011; pp. 1–5. [Google Scholar]
Sharif, M.S.; Moein, M. An Effective Cost-Sensitive Convolutional Neural Network for Network Traffic Classification. In Proceedings of the 2021 International Conference on Innovation and Intelligence for Informatics, Computing, and Technologies (3ICT), Zallaq, Bahrain, 29–30 September 2021; pp. 40–45. [Google Scholar]
Caballero, D.; González, F.J.M.; Islam, S.A. Analysis of Network Protocols for Secure Communication. In Proceedings of the 2021 9th International Symposium on Digital Forensics and Security (ISDFS), Elazig, Turkey, 28–29 June 2021; pp. 1–6. [Google Scholar]
Ahmed, W.; Wu, Y.W. A Survey on Reliability in Distributed Systems. J. Comput. Syst. Sci. 2013, 79, 1243–1255. [Google Scholar]
Kreps, J. Exactly-once Semantics Are Possible: Here’s How Apache Kafka Does It. 2017. Available online: https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ (accessed on 10 September 2023).
Dwork, C.; Lynch, N.; Stockmeyer, L. Consensus in the presence of partial synchrony. J. ACM 1988, 35, 288–323. [Google Scholar]
Xiao, Y.; Zhang, N.; Li, J.; Lou, W.; Hou, Y.T. Distributed consensus protocols and algorithms. Blockchain Distrib. Syst. Secur. 2019, 25, 40. [Google Scholar]
Ma, Y.; Zhang, Y.; Cheng, X.; Huang, X.; Wang, X. Internet of things security: A survey. J. Netw. Comput. Appl. 2018, 88, 10–28. [Google Scholar] [CrossRef]
Zhang, Y.; Wen, J.; Li, X.; Lin, C. Smart home energy management systems: Concept, configurations, and scheduling strategies. Renew. Sustain. Energy Rev. 2016, 56, 30–40. [Google Scholar] [CrossRef]
Hayat, A.; Ahmed, M.; Al-Fuqaha, A. Internet of things (IoT): A review of enabling technologies, challenges, and open research issues. IEEE Internet Things J. 2016, 3, 1–27. [Google Scholar]
García-García, L.; Jiménez, J.M.; Abdullah, M.T.A.; Lloret, J. Wireless Technologies for IoT in Smart Cities. Netw. Protoc. Algorithms 2018, 10, 23–64. [Google Scholar]
Li, X.; Liu, S.; Wei, W. A review on the coordination of distributed energy resources in smart grid. Renew. Sustain. Energy Rev. 2018, 82, 387–399. [Google Scholar] [CrossRef]
Bazm, M.M.; Lacoste, M.; Südholt, M.; Menaud, J.M. Isolation in cloud computing infrastructures: New security challenges. Ann. Telecommun. 2019, 74, 197–209. [Google Scholar]
Chi, J.; Li, Y.; Huang, J.; Liu, J.; Jin, Y.; Chen, C.; Qiu, T. A secure and efficient data sharing scheme based on blockchain in industrial internet of things. J. Netw. Comput. Appl. 2020, 167, 102710. [Google Scholar]
Arora, A.; Gouda, M. Distributed reset. IEEE Trans. Comput. 1994, 43, 1026–1038. [Google Scholar]
Andersson, J. Modeling and Analyzing Runtime Properties of Complex Embedded Systems. 2004. Available online: https://www.ipr.mdh.se/pdf_publications/649.pdf (accessed on 10 September 2023).
Kraft, J. Enabling Timing Analysis of Complex Embedded Software Systems; Mälardalen University: Västerås, Sweden, 2010. [Google Scholar]
Kumar, C.S. Rollback Recovery and Checkpointing in Heterogenous Grid Computing. 2010. Available online: http://www.123seminarsonly.com/Seminar-Reports/005/44385199-Grid-Computing.pdf (accessed on 14 February 2023).
Chatzidrossos, I. Live Streaming Performance of Peer-to-Peer Systems. Ph.D. Thesis, KTH Royal Institute of Technology, Stockholm, Sweden, 2012. [Google Scholar]
Majumdar, A. Problems of Distributed Systems. 2018. Available online: https://medium.com/@thisisananth/problems-of-distributed-systems-22fc22eec347 (accessed on 14 February 2023).
Feng, X.; Wang, X.; Cui, K.; Xie, Q.; Wang, L. A distributed message authentication scheme with reputation mechanism for Internet of Vehicle. J. Syst. Archit. 2023, 145, 103029. [Google Scholar]
Zhu, H.; Ling, Q. Byzantine-Robust Distributed Learning with Compression. IEEE Trans. Signal Inf. Process. Over Netw. 2023, 9, 280–294. [Google Scholar]
Liu, C.; Xu, M.; Guo, H.; Cheng, X.; Xiao, Y.; Yu, D.; Gong, B.; Yerukhimovich, A.; Wang, S.; Lyu, W. TBAC: A Tokoin-based Accountable Access Control Scheme for the Internet of Things. IEEE Trans. Mob. Comput. 2023, 23, 6133–6148. [Google Scholar] [CrossRef]
Zhao, X.; Zhong, B.; Cui, Z. Design of a Decentralized Identifier-Based Authentication and Access Control Model for Smart Homes. Electronics 2023, 12, 3334. [Google Scholar] [CrossRef]
Joseph, T.A.; Birman, K.P. Reliable Broadcast Protocols; Technical Report; Cornell University: Ithaca, NY, USA, 1989. [Google Scholar]
Melliar-Smith, P.M.; Moser, L.E.; Agrawala, V. Broadcast protocols for distributed systems. IEEE Trans. Parallel Distrib. Syst. 1990, 1, 17–25. [Google Scholar]
Livny, M.; Melman, M. Load balancing in homogeneous broadcast distributed systems. In Proceedings of the Computer Network Performance Symposium, College Park, MD, USA, 13–14 April 1982; pp. 47–55. [Google Scholar]
Pedone, F.; Schiper, A. Generic broadcast. In Distributed Computing: 13th International Symposium, DISC’99, Bratislava, Slovak Republic, September 27–29, 1999; Proceedings 13; Springer: Berlin/Heidelberg, Germany, 1999; pp. 94–106. [Google Scholar]
Segall, A. Distributed Network Protocols. IEEE Trans. Inf. Theory 1983, IT-29, 23–35. [Google Scholar]
Cournier, A.; Devismes, S.; Villain, V. Light enabling snap-stabilization of fundamental protocols. ACM Trans. Auton. Adapt. Syst. 2009, 4, 1–27. [Google Scholar] [CrossRef]
Cournier, A.; Datta, A.; Petit, F.; Villain, V. Enabling snap-stabilization. In Proceedings of the 23rd International Conference on Distributed Computing Systems, Providence, RI, USA, 19–22 May 2003. [Google Scholar] [CrossRef]
Blin, L.; Tixeuil, S. Compact deterministic self-stabilizing leader election on a ring: The exponential advantage of being talkative. Distrib. Comput. 2017, 31, 139–166. [Google Scholar] [CrossRef]
Nia, M.A.; Faghih, F. Probabilistic analysis of self-stabilizing systems: A case study on a mutual exclusion algorithm. In Proceedings of the 2018 Real-Time and Embedded Systems and Technologies (RTEST), Tehran, Iran, 9–10 May 2018. [Google Scholar] [CrossRef]
Baronti, M.; Mari, F.D.; Putten, R.V.D.; Venturi, I. Maximum, Minimum, Supremum, Infimum. In UNITEXT Calculus Problems; Springer: Cham, Switzerland, 2016; pp. 41–51. [Google Scholar] [CrossRef]
Blin, L.; Cournier, A.; Villain, V. An Improved Snap-Stabilizing PIF Algorithm. In Self-Stabilizing Systems; Lecture Notes in Computer Science Self-Stabilizing Systems; Springer: Berlin/Heidelberg, Germany, 2003; pp. 199–214. [Google Scholar] [CrossRef]
Bein, D.; Datta, A.K.; Karaata, M.H. An Optimal Snap-Stabilizing Multi-Wave Algorithm. Comput. J. 2007, 50, 332–340. [Google Scholar] [CrossRef]
da Silva, A.P.R.; Teixeira, F.A.; Lage, R.K.V.; Ruiz, L.B.; Loureiro, A.A.F.; Nogueira, J.M.S. Using a distributed snapshot algorithm in wireless sensor networks. In Proceedings of the The Ninth IEEE Workshop on Future Trends of Distributed Computing Systems, FTDCS 2003, San Juan, PR, USA, 28–30 May 2003; pp. 31–37. [Google Scholar]
Satyanarayana, P.S.; Mahalakshmi, T.; Rao, P.R.K.; Sheeba, A.; Ravi, J.; Rao, D. Enhancement of Energy Efficiency and Network Lifetime Using Modified MPCT Algorithm in Wireless Sensor Networks. J. Interconnect. Netw. 2022, 22, 2144012:1–2144012:22.
Bandyopadhyay, S.; Coyle, E.J. An energy efficient hierarchical clustering algorithm for wireless sensor networks. In Proceedings of the IEEE INFOCOM 2003—Twenty-Second Annual Joint Conference of the IEEE Computer and Communications Societies (IEEE Cat. No.03CH37428), San Francisco, CA, USA, 30 March–3 April 2003; Volume 3, pp. 1713–1723.
Cournier, A.; Petit, F.; Villain, V.; Datta, A.K. Self-stabilizing PIF algorithm in arbitrary rooted networks. In Proceedings of the 21st International Conference on Distributed Computing Systems, Mesa, AZ, USA, 16–19 April 2001; pp. 91–98.
Pluzhnik, E.; Nikulchev, E.; Payain, S. Concept of feedback in future computing models to cloud systems. arXiv 2014, arXiv:1402.4663.
Agrawala, A.; Gong, L. Distributed reset for fault-tolerant systems. In Proceedings of the 13th International Conference on Distributed Computing Systems, Pittsburgh, PA, USA, 25–28 May 1993.
Davis, A.J.J. Review of Distributed Reset. 2023. Available online: https://emptysqua.re/blog/review-distributed-reset/ (accessed on 14 February 2023).
The Apache Software Foundation. Apache ZooKeeper Documentation. 2021. Available online: https://zookeeper.apache.org/doc/r3.5.4-beta/zookeeperOver.html (accessed on 14 February 2023).
Ouyang, L.; Huang, Y.; Huang, B.; Wei, H.; Ma, X. Leveraging TLA+ Specifications to Improve the Reliability of the ZooKeeper Coordination Service. arXiv 2023, arXiv:2302.02703.
Toasa, R.M.; Aldas, C.; Recalde, P.; Coral, R. Performance Evaluation of Apache Zookeeper Services in Distributed Systems. In Proceedings of the International Conference on Information Technology & Systems, Quito, Ecuador, 6–8 February 2019.
Petrescu, M. Leader Election in a Cluster using Zookeeper. Stud. Univ.-Babeș-Bolyai Inform. 2021, 66, 104.
Chang, E.J. Echo algorithms: Depth parallel operations on general graphs. IEEE Trans. Softw. Eng. 1982, SE-8, 391–401.
Douligeris, C.; Kotsopoulos, S.N. Real-Time Broadcast-Based Coordination for Mobile Ad Hoc Networks. IEEE Trans. Mob. Comput. 2004, 3, 125–139.
Cho, J.H.; Lee, J.; Kim, H.; Kim, Y. An Overview of Blockchain Technology: Architecture, Consensus, and Future Trends. IEEE Access 2020, 8, 187919–187941.
Lee, J.; Kwon, Y.; Lee, K.; Lim, S. An Overview of Wireless Communications for Emergency Responders. IEEE Commun. Mag. 2002, 40, 88–96.
Stein, J.C.; Kroening, M. Inventory Management in Supply Chain. Harv. Bus. Rev. 2002, 80, 104–113.
Li, J.; Tan, X.; Wu, J.; Zhang, K. Smart Grid Communications and Networking. IEEE Trans. Smart Grid 2010, 1, 109–119.
Coyle, J.J.; Bardi, E.J.; Langley, C.J., Jr. Supply Chain Management: A Logistics Perspective; Cengage Learning: Boston, MA, USA, 2017.
Lee, J.; Kim, Y.; Lee, J. An Overview of Wireless Health Monitoring Systems. J. Med. Syst. 2010, 34, 859–872.
Mayne, M. Blockchain Adoption in Broadcast; IBC: Tokyo, Japan, 2022.
Wah, B.W.; Lien, Y.N. Design of Distributed Databases on Local Computer Systems with a Multiaccess Network. IEEE Trans. Softw. Eng. 1985, SE-11, 606–619.
Lien, Y.N. Distributed Databases on Local Multiaccess Computer Systems (Query Processing, File Allocation, Concurrency Control). Ph.D. Thesis, Purdue University, West Lafayette, IN, USA, 1986.
Lin, Y.F.; Lim, E.P.; Ng, W.K. eBroker: An agent-based query routing system for distributed E-commerce databases. In Proceedings of the Seventh International Conference on Parallel and Distributed Systems (Cat. No.PR00568), Iwate, Japan, 4–7 July 2000; pp. 517–522.
Borst, S.; Gupta, V.; Walid, A. Distributed Caching Algorithms for Content Distribution Networks. In Proceedings of the IEEE INFOCOM 2010, San Diego, CA, USA, 14–19 March 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 1–9.
Nasiri, H.; Nasehi, S.; Goudarzi, M. Evaluation of distributed stream processing frameworks for IoT applications in Smart Cities. J. Big Data 2019, 6, 1–24.
Castro, M.; Liskov, B. Practical byzantine fault tolerance. In Proceedings of the Third Symposium on Operating Systems Design and Implementation (OSDI), New Orleans, LA, USA, 22–25 February 1999; Volume 99, pp. 173–186.
Ongaro, D.; Ousterhout, J. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference (USENIX ATC 14), Philadelphia, PA, USA, 19–20 June 2014; pp. 305–319.
Li, J.; Chen, Y.; Wang, P.; Chen, H.H. A Survey on Wireless Broadcast Algorithms. IEEE Commun. Surv. Tutorials 2015, 17, 976–996.
Marabissi, D.; Mucchi, L.; Stomaci, A. IoT nodes authentication and ID spoofing detection based on joint use of physical layer security and machine learning. Future Internet 2022, 14, 61.
El Zouka, H.A.; Hosni, M.M. Secure Authentication and Session Key Management Scheme for Distributed Sensor Networks. In Proceedings of the 2022 International Conference on Computing, Electronics & Communications Engineering (iCCECE), Southend, UK, 17–18 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 77–83.
Yang, Z.; Ma, H.; Ai, M.; Zhan, M.; Wu, G.; Zhang, Y. A Minimal Disclosure Signature Authentication Scheme Based on Consortium Blockchain. In Proceedings of the 2022 IEEE International Conference on Blockchain (Blockchain), Espoo, Finland, 22–25 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 516–521.
Hunt, P.; Konar, M.; Junqueira, F.P.; Reed, B. ZooKeeper: Wait-free Coordination for Internet-scale Systems. SIGPLAN Not. 2010, 45, 1–13.
Burrows, M. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, Seattle, WA, USA, 6–8 November 2006; pp. 335–350.
Kakivaya, G.; Xun, L.; Hasha, R.; Ahsan, S.B.; Pfleiger, T.; Sinha, R.; Gupta, A.; Tarta, M.; Fussell, M.; Modi, V.; et al. Service Fabric: A Distributed Platform for Building Microservices in the Cloud. In Proceedings of the Thirteenth EuroSys Conference, Porto, Portugal, 23–26 April 2018; ACM: New York, NY, USA, 2018; pp. 1–15.
Pandya, B.; Pourabdollah, A.; Lotfi, A.; Acampora, G. An Integrated Fuzzy Logic System under Microsoft Azure using Simpful. In Proceedings of the 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE), Padua, Italy, 18–23 July 2022; pp. 1–9.
Evdemon, J. Microservices and Service Fabric—The Future Architecture? Available online: https://devblogs.microsoft.com/premier-developer/microservices-and-service-fabric-the-future-architecture/ (accessed on 25 January 2023).
Dijkstra, E.W. Guarded commands, nondeterminacy and formal derivation of programs. Commun. ACM 1975, 18, 453–457.
Dolev, S.; Israeli, A.; Moran, S. Uniform dynamic self-stabilizing leader election. IEEE Trans. Parallel Distrib. Syst. 1997, 8, 424–440.

Figure 1. Process computational model.

Figure 2. Illustration of premature feedback. Normal and abnormal broadcast trees are shown in green and red, respectively.

Figure 3. Illustration of specification violation and semantics change. Normal and abnormal broadcast trees are shown in green and red, respectively.

Figure 4. Illustrative example for starting in an arbitrary system configuration (rounds 1–3).

Figure 5. Illustrative example for normal system configuration (rounds 4–15).

Figure 6. Illustrative example for normal system configuration (rounds 16–20).

Table 1. Comparison of broadcast approaches.

Approach	Handling of Arbitrary Initialization	Scalability	1-Safety (Handling of Spurious Messages)	Requires Lock-Step Synchrony	Premature Feedback Safety	Time Complexity
Flooding Broadcast [27]	No	Low	No	No	No	$O(D)$
PIF [31,32,50]	No	Low	No	No	No	$O(h)$
PFC [37,38]	No	Low	No	No	No	$O(h)$
Distributed Reset [44,45]	No	Low	No	No	No	$O(N^{2})$
Al-Jady & Karaata [2]	Yes	Moderate	Yes	Yes	Yes	$O(D^{2})$
Proposed Algorithm	Yes	High	Yes	No	Yes	$O(D^{2})$

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Dabees, A.; Karaata, M.H. Possibility and the Impossibility of Reliable Broadcast: A 1-Safe and Reliable Broadcast Algorithm in the Presence of Arbitrary Initialization. Algorithms 2025, 18, 437. https://doi.org/10.3390/a18070437

