Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems

Ahn, Jinho

doi:10.3390/electronics10121428

Open AccessArticle

Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems

by

Jinho Ahn

Division of AI Computer Science and Engineering, Kyonggi University, Suwon 16227, Gyeonggi, Korea

Electronics 2021, 10(12), 1428; https://doi.org/10.3390/electronics10121428

Submission received: 4 May 2021 / Revised: 11 June 2021 / Accepted: 11 June 2021 / Published: 14 June 2021

(This article belongs to the Special Issue Fault-Tolerant Design and Applications of Electronic Circuits and Systems)

Download

Browse Figures

Review Reports Versions Notes

Abstract

:

This paper introduces an effective communication-induced checkpointing protocol using message logging to enable the number of extra checkpoints to be far lower than the previous number. Even if a situation occurs in which it is decided that a process receiving a message has to perform forced checkpointing, our protocol allows the process to skip the forced checkpointing action if it recognizes that the state of its sender right before the receipt of the message is recoverable. Additionally, the communication-induced checkpointing protocol is thus not required to assume the piecewise deterministic model, despite being combined with message logging. This protocol can maintain these features by piggybacking a one-bit variable and an n-size vector on each message sent. Our simulation results verify our claim that the presented protocol performs much better than the representative optimized protocol with respect to the forced checkpointing frequency, regardless of the communication pattern.

Keywords:

distributed systems; fault tolerance; checkpointing; message logging

1. Introduction

As parallel algorithms perform many operations on a cluster of independent computing nodes, even a single node crash can cause the execution of an algorithm to halt [1]. This undesirable property may make large-scale distributed systems more vulnerable to failures [2]. For this reason, effective fault-tolerance techniques are essentially needed in such systems. Rollback recovery is one such technique that enables the current erroneous state of a distributed system to be restored to a previous failure-free state from the stable storage [3,4]. To achieve this goal, the system recovery information must be occasionally saved to the storage for normal operations [5].

Among the techniques used for rollback recovery, checkpoint-based recovery depends solely upon local states of processes maintained in the stable storage—called checkpoints—to support fault tolerance [5]. In the case of a failure, the system state is recovered by using the most recent consistent global checkpoint kept in the storage. According to when and how consistent sets of checkpoints are formed, checkpoint-based recovery protocols are categorized into coordinated, independent, and communication-induced checkpointing [5]. In order to balance trade-offs between independent and coordinated checkpointing in an effective manner, communication-induced checkpointing (CIC) is used to preclude any local checkpoint that have already been taken from becoming useless by performing forced checkpointing while attempting to increase the degree of checkpointing independence as much as possible [6,7,8,9,10,11,12,13,14,15]. The CIC protocols include HMNR [8], which uses this feature to enable the number of extra checkpoints to be much lower by effectively using the control information contained in each sent message. Next, an improved HMNR protocol [12], LazyHMNR, attempts to use a lazy indexing strategy [13] to alleviate the problem of high frequencies of forced checkpointing that may occur in some particular cases. Next, two protocols, FINE [9] and LazyFINE [10], were designed to try to generate fewer forced checkpoints than HMNR and LazyHMNR with the same numbers of variables that they hold. However, they cannot ensure the property of never having useless checkpoints [7]. Then, an adaptive CIC protocol [11] was developed in an attempt to delay taking forced checkpointing actions as long as possible by evaluating some safety predicates. One of the most recent CIC protocols [14] utilizes the effectiveness of the one-to-many transmission of broadcasting links, which is widely used to lower the number of extra checkpoints. However, this may greatly degrade the applicability of the CIC. Lastly, several recent works [6,15] exploited one or more protocols mentioned above to raise the dependability of the systems to a higher level in the fields of distributed database management systems and web services.

Generally, checkpointing-based recovery, including CIC, is not subject to the piecewise deterministic (PWD) model; thus, it is less restrictive and more realistic for application in distributed systems than in message-logging-based recovery [16,17,18]. However, the former may not ensure the recovery of the system to its pre-failure state, which generally makes the rollback distance of each process much longer than that in the latter [5]. Thus, hybrid protocols that combine the two techniques were proposed to compensate for the drawbacks of the first [5]. However, these protocols cannot be freed from the PWD assumption. For this reason, none of the existing CIC protocols—including a family of HMNR protocols [6,8,9,10,11,12,13,14,15]—developed so far can utilize the benefits of message logging on real-world distributed systems without assuming the PWD model. Whenever each process receives a message, the protocols cause the process to perform forced checkpointing if they decide that one or more checkpoints that have already been taken may become useless. Due to this inherent shortcoming, even the HMNR protocols force each process to take extra checkpoints—more than twice the number of basic checkpoints [7]. However, the property of never having useless checkpoints is ensured without performing the forced checkpointing action if the process can know precisely that the current state of the sender of the message can be deterministically restored by replaying logged messages from the stable storage in the case of a failure of the sender. We found that this observation may be an important way to drive a large reduction in the number of forced extra checkpoints that all of the previous CIC protocols can incur. This paper introduces a CIC protocol, S-CIC, with message logging that is not subject to the PWD model in order to address the aforementioned observation. The protocol can achieve this goal by only carrying a one-bit variable and an n-size vector into every message transmitted. In the checkpointing and communication patterns where the existing CIC protocols—including the family of HMNR protocols—force the receiver of a message to take extra checkpoints, if the information of the message sender piggybacked on it indicates that the sender’s state right before sending is recoverable, the proposed protocol allows the receiver not to perform forced checkpointing.

2. Background

2.1. Fundamentals

In this paper, we assume a distributed system with no global clock or memory and immunity to network partition [5]. Every process can crash according to the fail-stop failure model [19], and the processes collaborate with others only by making reliable message exchanges through an asynchronous transmission channel [5,8]. Each process p begins an execution from its first state and performs a combination of internal, message-sending, and message-delivering events [5]. Here, internal events are produced to perform their individual computations with no interactions with others. All of the event processes that are incurred for normal operations are sequenced according to Lamport’s “happened before” relation [20].

Ck

_{p}^{i}

represents the ith local checkpoint of p, and Ck

_{p}^{i}

.

l c

is the local timestamp assigned to Ck

_{p}^{i}

when it is taken. Assume that each process p records the first checkpoint, Ck

_{p}^{0}

, on the storage containing its initial state when it begins its own computation. A global checkpoint means a set of local checkpoints that hold only one per process in the system [5,7]. A pair of local checkpoints (Ck

_{p}^{i}

,Ck

_{q}^{j}

) is named mutually consistent if and only if there is no case in which m is delivered before Ck

_{q}^{j}

, but is sent after Ck

_{p}^{i}

. A global checkpoint is consistent if and only if every couple of local checkpoints belonging to the first always satisfies the mutual consistency condition [21]. The concept of the Z-path [22] is exploited to check if the condition of mutual consistency is satisfied on an ordered pair of checkpoints by finding causal sub-paths, as well as non-causal (NC) sub-paths, where the two checkpoints can be connected to each other. A Z-path that includes a cycle from a local checkpoint Ck

_{p}^{i}

to itself is called a Z-cycle [22].

Theorem 1.

For any pair of checkpoints Ck

_{p}^{i}

and Ck

_{q}^{j}

, if a Z-path begins with Ck

_{p}^{i}

and terminates with Ck

_{q}^{j}

and Ck

_{p}^{i}

.

l c

is lower than Ck

_{q}^{j}

.

l c

, then no Z-cycle can form [22].

2.2. Related Work

HMNR is an optimized protocol that aims to have no local checkpoints that are useless while decreasing the number of extra checkpoints. To keep this feature in HMNR, each process p should always have the following five state variables [8]:

l c_{p}

is a non-negative integer variable that has the present value of p’s local timestamp.

s e n d

_

t o_{p}

is a vector in which

s e n d

_

t o_{p}

[q] keeps a boolean value to detect a non-causal path to q from p.

c k p t_{p}

is a vector in which

c k p t_{p}

[q] contains the total number of checkpoints that q has recorded in the stable storage from its initial execution that p currently recognizes.

t a k e n_{p}

is a vector where

t a k e n_{p}

[q] keeps the boolean value for q to indicate the existence of at least one causal Z-path from the latest checkpoint of q that p perceives to the subsequent checkpoint for p.

g r e a t e r_{p}

is a vector where

g r e a t e r_{p}

[q] has the boolean value for q, and this indicates whether p’s current timestamp

l c_{p}

is greater than the most recent timestamp of q perceived by p (=true) or not (=false).

This protocol includes a checkpoint-timestamping mechanism that uses Lamport’s logical clock [21] to satisfy Theorem 1, implying that increasing the timestamp flow along any Z-path always ensures that no checkpoint becomes useless. The mechanism is sufficient for ensuring that no causal Z-path includes a Z-cycle formation [8,22].

However, two kinds of non-causal (NC) Z-paths [8,22], as shown in Figure 1, can violate the theorem, even if the timestamping mechanism is used. To prevent Z-cycles from forming in these cases, HMNR forces each process p after receiving a message m to save an additional checkpoint if the following condition

C_{H M N R}

is satisfied.

$C_{H M N R}$ ≡ $C_{1}$ ∨ $C_{2}$
$C_{1}$ ≡ ∃j(1 ≤ j ≤ n): $s e n t_t o_{p}$ [j]∧m. $g r e a t e r$ [j]∧(m. $l c$ > $l c_{p}$ )
$C_{2}$ ≡ ( $c k p t_{p}$ [p] = m. $c k p t$ [p]) ∧ m. $t a k e n$ [p]

The first case is an NC Z-path pattern connecting two checkpoints, Ck

_{p}^{i}

and Ck

_{r}^{k + 1}

, as shown in Figure 1a. In this example, three processes, p, q, and r, are exchanging messages:

m_{1}

,

m_{2}

, and

m_{3}

. As Ck

_{p}^{i}

.

l c

= Ck

_{r}^{k + 1}

.

l c

, the path violates the theorem. When q sends

m_{2}

to r,

s e n t_t o_{q}

[r] becomes true. However, p can get

l c_{r}

before Ck

_{r}^{k + 1}

through

m_{3}

,

m_{3}

.

l c

(=

l c_{r}

) <

l c_{p}

. Thus,

g r e a t e r_{p}

[r] still remains true. Then, it is brought to q when receiving

m_{1}

, so

m 1

.

g r e a t e r

[r] = true and

m_{1}

.

l c

(=

l c_{p}

) >

l c_{q}

. As the first sub-condition of

C_{H M N R}

,

C_{1}

, is satisfied, HMNR forces q to record an extra checkpoint Ck

_{q}^{j + 1}

in the storage before conveying

m_{1}

to the target application.

The second case is another NC Z-path pattern with Ck

_{p}^{i}

and Ck

_{r}^{k + 1}

, as shown in Figure 1b. In the figure, the path incurs a Z-cycle involving Ck

_{r}^{k + 1}

because r sends

m_{3}

to p after Ck

_{r}^{k + 1}

, and then q receives

m_{1}

, depending on

m_{3}

, from p. In this case, as

m_{3}

is transmitted to p from r,

g r e a t e r_{p}

[r](←

g r e a t e r_{p}

[r] ∧

m_{3}

.

g r e a t e r

[r]) becomes false as

m_{3}

.

l c

(=

l c_{r}

) =

l c_{p}

and

m_{3}

.

g r e a t e r

[r] = false. As

m_{1}

.

g r e a t e r

[r] = false when q receives

m_{1}

,

C_{1}

is not sufficient for detecting the violation. Therefore, the second sub-condition of

C_{H M N R}

,

C_{2}

, needs to be checked. When taking Ck

_{r}^{k + 1}

,

t a k e n_{r}

[q] changes to true while the value of

c k p t_{r}

[q] is still the same as the number of checkpoints associated with Ck

_{q}^{j}

. Then, the two values can be brought to q through a causal Z-path composed of

m_{3}

and

m_{1}

in order. As

m_{1}

.

c k p t

[q] =

c k p t_{q}

[q] and

m_{1}

.

t a k e n

[q] = true,

C_{2}

is satisfied.

However, in both examples, if p’s state right before sending

m_{1}

is recoverable and q knows this, q does not need to take Ck

_{q}^{j + 1}

before delivering

m_{1}

, even though

C_{H M N R}

is satisfied. In other words, when q recognizes that both

m_{2}

and

m_{3}

can always be replayed in case of failure and that p’s internal execution before sending

m_{1}

is deterministic, Ck

_{r}^{k + 1}

will not be a useless checkpoint, even though Ck

_{q}^{j + 1}

is taken after delivering

m_{1}

. Based on this new observation, we present a low-overhead CIC protocol, S-CIC, that uses message logging to detect this type of recoverability in an efficient manner without assuming the PWD constraint.

The enhanced version of HMNR [12], LazyHMNR, attempts to lessen the forced checkpointing frequency in unconventional cases that may take place due to asymmetries in the rates of increase of logical timestamps. It fulfills this requirement by delaying the swift growth of the logical timestamps of some processes—which is caused by repeated checkpointing actions on them—as long as possible.

Another CIC protocol [9], FINE, attempts to intensify the optimality of the consistency predicate of HMNR with only its mandatory state variables in order to ensure the property of never having useless checkpoints. The advanced version of FINE [10], LazyFINE, was designed to incorporate the laziness of logical timestamp increases into FINE. However, it was proved that the two protocols can create useless checkpoints because the Z-consistent timestamping rule cannot be enforced [7].

A delayed CIC protocol [11], DCFI, was introduced to lower the forced checkpointing frequency by applying several safety rules that enabled the postponement of checkpointing enforcements. This feature may have a much lower total number of checkpoints that are taken in the system. However, the protocol does not incorporate a method for significantly lowering the frequency of extra checkpointing actions by exploiting the rollback distance reduction benefit of message logging.

Another CIC protocol [14], BN-FI, was recently presented in order to curtail the number of extra checkpoints by exploiting the functional strength that broadcasting networks generally hold, which is called one-to-many transmission effectiveness. This special capability of lightweight group dissemination can speed up the updating of the last logical timestamp of each transmitter for the others on the network. This behavioral property enables each process to precisely detect if the ongoing Z-path has at least one checkpoint that has become useless much earlier than in the previous protocols. However, the performance gain of the protocol limits its applicability to network environments.

One of the most recent CIC algorithms [15] was developed in order to maintain a globally consistent state of each transaction in a distributed database management system while making the delay in failure-free transaction execution as short as possible. The algorithm attempts to enhance the recoverability of the system states with a far lower number of extra checkpoints by recording only the states of the completely committed transactions in the stable storage. However, the algorithm has the same shortcomings as those of the original HMNR protocol mentioned above.

The authors of [6] proposed an adaptive checkpoint generation algorithm in order to decrease the frequency of forced checkpointing actions by considering the system’s behavior in comparison with the static algorithm, which did not reflect the environmental changes onto web services being operated. The algorithm made decisions on whether forced checkpoints should be taken based on the quality of the service parameters and the policies currently applied to their corresponding web services. In order to improve the dependability of the web services, three kinds of CIC protocols were exploited: HMNR [8], DCFI [11], and FINE [9]. Among them, HMNR and DCFI performed better than FINE on the system in terms of the number of forced checkpoints. However, the system still bore the respective limitations of the three CIC protocols stated above.

However, as all of the CIC protocols mentioned above attempt to use message logging to shorten the rollback distance of each process during recovery as much as they can, they must be applied only to deterministic services and systems, resulting in a large contraction of the scope of the application areas. Table 1 shows a summary of a comparison of some primary features of the CIC representatives.

3. The Proposed Protocol

S-CIC was devised to maintain the following three behavioral properties.

Similarly to HMNR, each process attaches the state information related to other processes, as well as to itself, to every outgoing message so that the number of extra checkpoints decreases as much as possible.
Even if either of the two cases where a process should perform a forced checkpointing before delivery of a message in HMNR occurs, S-CIC does not have the process perform the task if it knows that the same message can be replayed in spite of any future failures.
Although pessimistic message logging is used to satisfy the second requirement, S-CIC is not subject to the PWD model.

Initially, each process deterministically performs its computation in a certain interval and, if a non-deterministic (ND) event occurs in this interval, the process begins its ND execution interval. In this research field, ND events can be classified into two types of events. The first type includes loggable ND events, of which there is sufficient support for forcing the replay at the same point in case of failure. Message receipt is one type of loggable ND event that most message-logging protocols detect and save in the stable storage for recovery. Aside from this, there are other types of loggable ND events, such as software interrupts or signals, which some other works [2] attempted to detect in order to make it possible to replay them in case of failure. This effort may raise the rate of the deterministic (DM) execution intervals. The second type comprises unloggable ND events, for which there is no support for taking action to enable a form of repeatable execution in case of failure.

To hold all of them in S-CIC, each process p should always have the following additional state information:

$S S N m V_{p}$ : A vector that saves an element composed of two variables, $s s n$ and $N D$ , for each process q. $S S N m V_{p}$ [q]. $s s n$ keeps the value of the ssn of the latest message m that q has transmitted and is known by p. $S S N m V_{p}$ [q]. $N D$ is a boolean value that indicates whether at least one internal unloggable ND event q has been executed before m since q’s latest checkpoint. The two variables are initialized to (0,false). As a message m is transmitted from p, $S S N m V_{p}$ [p]. $s s n$ increments by one and the vector is attached to m. If p takes a local checkpoint, $S S N m V_{p}$ [p]. $N D$ comes back to false. If a message m is transmitted to p, $S S N m V_{p}$ [q] is updated to m. $S S N m V$ [q] if m. $S S N m V$ [q]. $s s n$ > $S S N m V_{p}$ [q]. $s s n$ .
$N D$ - $m o d e_{p}$ : A boolean variable for detecting whether it there is at least one internal unloggable ND event that any process including p has executed since its latest checkpoint. It is initialized to false and, when a message m is transmitted, it is attached to m. As p performs the receipt of a message or takes a checkpoint, $N D$ - $m o d e_{p}$ comes back to false if $N D$ - $m o d e_{p}$ = true and ∀q(1≤q≤n): $S S N m V_{p}$ [q]. $N D$ = false.
$r s n_{p}$ : A non-negative integer variable that has the same sequence number as that of the most recent message that p has received.

First, we aimed to understand how to identify the recoverable state of each process with checkpointing and message logging without assuming the use of the PWD model. For this purpose, an execution mode detection method was introduced in order to consider three typical cases that can occur in CIC, as shown in Figure 2. As shown in Figure 2a, the first case is that a process performs its computation without any communication with others. In this example, a process p first executes with its internal DM or loggable ND events in a certain interval from its checkpointed state Ck

_{p}^{i}

(

S S N m V_{p}

[p].

N D

= false), which is called the DM mode (

N D

-

m o d e_{p}

= false). Then, if any first internal unloggable ND event occurs (

S S N m V_{p}

[p].

N D

= true), p’s ND interval begins and changes its execution mode to non-deterministic (

N D

-

m o d e_{p}

= true). When taking its next checkpoint Ck

_{p}^{i + 1}

, p’s state becomes recoverable (

S S N m V_{p}

[p].

N D

= false); thus, its execution mode returns to being deterministic (

N D

-

m o d e_{p}

= false). Then, it performs its computation in a similar way. Therefore, in this case, if p fails at a certain execution point after Ck

_{p}^{i + 1}

, it can restart from the latest checkpoint and recover to the state right before any first unloggable ND event after Ck

_{p}^{i + 1}

without considering any dependency relation with others.

As shown in Figure 2b, the second case is that an execution of a process q that is affected by messages transmitted from another single process. In this case, q first executes in its DM mode from its checkpointed state Ck

_{q}^{j}

(

S S N m V_{q}

[q].

N D

= false,

N D

-

m o d e_{q}

= false). When receiving a message

m_{x}

from p, whose current mode is ND (

m_{x}

.

S S N m V

[p].

N D

= true,

m_{x} . N D

-

m o d e

= true), q’s execution mode also changes to ND (

N D

-

m o d e_{q}

= true). Then, q can execute with its internal unloggable ND events, although this is not shown in this figure (

S S N m V_{q} [q] . N D

= true). However, even if q takes its local checkpoint Ck

_{q}^{j + 1}

(

S S N m V_{q} [q] . N D

= false), its execution mode still remains ND because q does not know that p’s current state is recoverable (

S S N m V_{p}

[p].

N D

= false,

N D

-

m o d e_{p}

= false). When receiving

m_{y}

from p, q can recognize that p’s current mode is DM due to the information piggybacked on

m_{y}

(

m_{y}

.

S S N m V

[p].

N D

= false,

m_{y}

.

N D

-

m o d e

= false). Then, q’s mode becomes DM(

N D

-

m o d e_{q}

= false).

As shown in Figure 2c, the third case is that a process q fulfills its computation depending on messages received from more than one process, p and r, which execute with internal unloggable ND events. In this case, q first executes in its DM mode from its checkpointed state Ck

_{q}^{j}

, and then in its ND mode with an unloggable ND event (

S S N m V_{q}

[q].

N D

= true,

N D

-

m o d e_{q}

= true). Next, it receives two messages,

m_{x}

and

m_{z}

, from p and r, which are currently both in ND mode (

m_{z}

.

S S N m V

[r].

N D

= true,

m_{z}

.

N D

-

m o d e

= true). When taking its next checkpoint Ck

_{q}^{j + 1}

(

S S N m V_{q}

[q].

N D

= false), it still executes in ND mode even if the current modes of the two processes are both DM after their latest checkpoints. This unawareness can be resolved after q has received

m_{y}

and

m_{w}

(

m_{w}

.

S S N m V

[r].

N D

= false,

m_{w} . N D

-

m o d e

= false).

Let us examine how, by using the mode detection method, S-CIC can have its number of extra checkpoints lowered in the two types of NC Z-path patterns in comparison with HMNR, as shown in Figure 3. An example of the first pattern of NC paths is illustrated in Figure 3a, which violates the theorem in HMNR. Let the ssns of p, q, and r be

α

,

β

, and

γ

for Ck

_{p}^{i}

, Ck

_{q}^{j}

, and Ck

_{r}^{k}

, respectively. When q sends

m_{2}

to r, its mode is DM(

N D

-

m o d e_{q}

= false) and its ssn is (

β

+ 1)(

S S N m V_{q}

[q] = (

β

+ 1,false)). The two pieces of information are piggybacked on

m_{2}

. On receiving

m_{2}

, r is in the DM mode (

S S N m V_{r}

[r] = (

γ

,false),

N D

-

m o d e_{r}

= false). Then, it increments its rsn,

r s n_{r}

, and saves a log element whose form is e (sender’s identifier, receiver’s identifier, ssn, rsn, data) in the stable storage—for example, e (q, r, (

β

+ 1),

r s n_{r}

,

m_{2}

.

d a t a

) for

m_{2}

. Afterwards, it updates its mode-detection-related information as follows:

N D

-

m o d e_{r}

←

m_{2}

.

N D

-

m o d e

∨

N D

-

m o d e_{r}

= false,

S S N m V_{r}

= {(0,false),(

β

+ 1,false), (

γ

,false)}. After receiving

m_{3}

, whose ssn is (

γ

+ 1) from r, and logging it, p’s mode remains DM, as r and p are both in DM mode (

N D

-

m o d e_{p}

←

m_{3}

.

N D

-

m o d e

∨

N D

-

m o d e_{p}

= false), and its ssn vector

S S N m V_{p}

is updated to {(

α

,false), (

β

+ 1,false),(

γ

+ 1,false)}. When q receives

m_{1}

from p and it is logged, it is in ND mode (

S S N m V_{q}

[q] = (

β

+ 1,true),

N D

-

m o d e_{q}

= true), and

C_{H M N R}

becomes true, as

s e n t_t o_{q}

[r], m.

g r e a t e r

[r], and m.

l c

>

l c_{q}

are all true. However, q does not need to take any forced checkpoints (

m_{1}

.

N D

-

m o d e

∧

C_{H M N R}

= false) because it knows p’s state, including the fact that sending

m_{1}

is recoverable (

m_{1}

.

N D

-

m o d e

= false); thus,

m_{1}

can always be regenerated even if p crashes. When taking its next checkpoint Ck

_{q}^{j + 1}

, q’s mode changes to DM and its vector

S S N m V_{q}

is updated to {(

α

+ 1,false),(

β

+ 1,false),(

γ

+ 1,false)}. Therefore, in this example, p’s recoverable state until sending

m_{1}

after Ck

_{p}^{i}

, Ck

_{q}^{j + 1}

, and Ck

_{r}^{k + 1}

comprises a globally consistent state.

Figure 3b illustrates an example of the second pattern of NC paths that violate the theorem in HMNR. After r receives

m_{2}

from q, whose mode is DM (

m_{2}

.

N D

-

m o d e

= false,

m_{2}

.

S S N m V

= {(0,false),(

β

+ 1, false),(0,false)}), and logging it, its mode is ND(

m_{2} . N D

-

m o d e

∨

N D

-

m o d e_{r}

= true) because of its internal unloggable ND event (

S S N m V_{r} [r

] =(

γ

,true),

N D

-

m o d e_{r}

= true). However, when taking a checkpoint Ck

_{r}^{k + 1}

(

S S N m V_{r}

[r]= (

γ

,false)), r’s mode changes to DM; thus,

m_{3}

can always be replayed (

m_{3}

.

N D

-

m o d e

= false). On

m_{3}

’s receipt and logging, p keeps its DM mode and updates its variables as follows:

N D

-

m o d e_{p}

←

m_{3}

.

N D

-

m o d e

∨

N D

-

m o d e_{p}

= false, and

S S N m V_{p}

= {(

α

,false),(

β

+ 1,false),(

γ

+ 1, false)}. When q receives

m_{1}

from p and logs it,

C_{H M N R}

becomes true, as

m_{1}

.

c k p t

[q]=

c k p t_{q}

[q] and

m_{1}

.

t a k e n

[q] are all true. However, q does not need to take any forced checkpoints (

m_{1}

.

N D

-

m o d e

∧

C_{H M N R}

= false) because r and p can deterministically reproduce and send

m_{3}

and

m_{1}

in order, respectively, even in case of p’s failure (

m_{1}

.

N D

-

m o d e

= false). Therefore, Ck

_{q}^{j + 1}

, which is taken after delivering

m_{1}

(

N D

-

m o d e_{q}

= false,

S S N m V_{q}

[q] = (

β

+ 1,false)), is consistent with the states of the others right after sending

m_{1}

and

m_{3}

.

Figure 4 presents the concrete algorithmic description of S-CIC.

Definition 1.

Z_{p_{i}}^{r_{k + 1}}

is an arbitrary Z-path from

C k_{p}^{i}

to

C k_{r}^{k + 1}

.

Definition 2.

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

) is a variable-length non-causal sub-Z-path of

Z_{p_{i}}^{r_{k + 1}}

.

Lemma 1.

If, in

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

),

C k_{p}^{i}

.

l c

≮

C k_{r}^{k + 1}

, but p’s state immediately before sending m to another process q(

q \neq r

) is recoverable, no forced checkpointing is required to ensure that no useless local checkpoints exist.

Proof.

The correctness of the theorem is proved through contradiction. Assume that when m arrives at q, q has to perform a forced checkpoint action to enforce the safety condition, as q recognizes

C k_{p}^{i}

.

l c

≮

C k_{r}^{k + 1}

. Afterwards, if p crashes, it can recover to hold up to the state right after sending m. The state is always consistent with q’s state immediately after receiving m. Thus, the forced checkpoint is not needed to preclude

C k_{r}^{k + 1}

from being useless.

Therefore, if

C k_{p}^{i}

.

l c

≮

C k_{r}^{k + 1}

, but p’s state immediately before sending m is recoverable, the property of not having any useless checkpoints can be kept without q’s forced checkpointing before delivering m. This contradicts the hypothesis. □

Theorem 2.

S-CIC ensures that no checkpoint is useless.

Proof.

S-CIC includes a checkpoint-timestamping mechanism that uses Lamport’s logical clock, as in HMNR. This feature ensures that any variable-length causal sub-Z-path in

Z_{p_{i}}^{r_{k + 1}}

includes no Z-cycles. Therefore, we only have to prove that the protocol prevents

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

) from containing any Z-cycle formations. The proof goes on by induction on its length, denoted by

L E N G T H

(

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

)).

[Base case] In this case,

Z_{p_{i}}^{r_{k + 1}}

is an NC Z-path with two messages,

m_{1}

and

m_{2}

, where

C k_{p}^{i}

is the latest checkpoint before transmitting

m_{1}

from p and

C k_{r}^{k + 1}

is the most recent checkpoint after

m_{2}

is received by r. Suppose that q takes a checkpoint

C k_{q}^{j}

, sends

m_{2}

to r, and then receives

m_{1}

from p before its next checkpoint. At this point, two cases must be checked.

Case 1:

m_{1}

.

N D

-

m o d e

= false.

q knows p’s state, including that sending

m_{1}

is recoverable. Based on Lemma 1, even if q takes no forced checkpoints,

C k_{r}^{k + 1}

is always useful.

Case 2:

m_{1}

.

N D

-

m o d e

= true

If

m_{1} . l c

≤

m_{2} . l c

,

C k_{p}^{i}

.

l c

≤

m_{1} . l c

≤

m_{2} . l c

<

C k_{r}^{k + 1}

. Thus,

Z_{p_{i}}^{r_{k + 1}}

includes no Z-cycles. Otherwise, two sub-cases must be checked.

Case 2.1: No causal sub-paths are generated from r to p or q between after r receives

m_{2}

and before p sends

m_{1}

or q receives it.

In this case, before p sends

m_{1}

to q,

g r e a t e r_{p}

[r] is true because p knows that

l c_{p}

>

l c_{r}

. When q receives

m_{1}

, it has to perform a forced checkpointing action before delivery of

m_{1}

because

C_{1}

is satisfied.

Case 2.2: A causal sub-path, u, from r to p or q exists between after r received

m_{2}

and before p sends

m_{1}

or q receives it.

At this point, two sub-cases must be checked.

Case 2.2.1: u occurs at r before

C k_{r}^{k + 1}

.

If

l c_{p}

≤ u.

l c

,

C k_{p}^{i}

.

l c

<

C k_{r}^{k + 1}

.

l c

and

C k_{r}^{k + 1}

never becomes a useless checkpoint, regardless of whether u’s destination is p or q. If p receives u, it updates

g r e a t e r_{p}

[r](=

m_{1} . g r e a t e r

[r]) as false. If u goes to q,

l c_{q}

is updated with

u . l c

and is greater than or equal to

m_{1} . l c

. In both cases, S-CIC can recognize that

C_{1}

is not satisfied, and q does not perform a forced checkpointing action.

Otherwise,

C k_{p}^{i}

.

l c

≮

C k_{r}^{k + 1}

and

C k_{r}^{k + 1}

may become useless in both cases. If p receives u, this condition causes

g r e a t e r_{p}

[r](=

m_{1} . g r e a t e r

[r]) to remain unchanged (true). In addition, when

m_{1}

is transmitted to q,

m_{1} . l c

>

l c_{q}

, as u.

l c

≥

l c_{q}

. If u goes to q,

m_{1} . g r e a t e r

[r] is true and

m_{1} . l c

>

l c_{q}

. In both cases, S-CIC can recognize that

C_{1}

is satisfied and causes q to perform a forced checkpointing action before delivering m.

Case 2.2.2: u occurs at r after

C k_{r}^{k + 1}

.

On receiving

m_{2}

, r can keep the value of the latest checkpoint index of q in

c k p t_{r}

[q]. As it takes

C k_{r}^{k + 1}

,

t a k e n_{r}

[q] is set to true. When r sends u,

c k p t_{r}

[q] and

t a k e n_{r}

[q] are eventually brought to q by a directed path—either <u> or <u,

m_{1}

>. In both cases, S-CIC can recognize

C_{2}

is satisfied and causes q to perform a forced checkpointing action before the delivery of the message received by q, as in HMNR.

[Induction hypothesis] It is assumed that the theorem is true for

Z_{p_{i}}^{r_{k + 1}}

if

L E N G T H

(

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

)) = l.

[Induction step] By the induction hypothesis, every checkpoint in

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

) incurs no Z-cycle formations. Therefore, if a new non-causal path forms with

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

),

m_{z}

(which is considered as one message in the induction hypothesis), and the (l + 1)-th message

m_{l + 1}

together, the theorem is true for

Z_{p_{i}}^{r_{k + 1}}

if

L E N G T H

(

N C

-Z-

P A T H

(

Z_{p_{i}}^{r_{k + 1}}

)) = l + 1. The following case is similar to the base case that was mentioned earlier.

Through induction, our protocol ensures that no checkpoint becomes useless. □

4. Performance Evaluation

Let us examine our extensive simulations to make a comparison of the performance of the two protocols, LazyHMNR and S-CIC, with a discrete-event simulation language, PARSEC [23]. LazyHMNR is one of the most recently developed versions of HMNR, which is intended to decrease the high frequency of forced checkpointing [7]. S-CIC is our improved version of LazyHMNR with the advantageous features mentioned in the previous sections.

In this comparison, we precisely examine one important performance index,

N O F C

. The index indicates the total number of forced checkpoints taken in each protocol. The simulated system is a cluster of N computers on a broadcast network. All processes running on each computer begin and finish their individual execution together. As a process transmits an application message, the message is destined to a single recipient at all times. The link capacity and propagation delay of the simulated network are 100 Mbps and 1 ms, respectively. Every process performs a basic local checkpointing task in a certain checkpoint interval according to an exponential distribution with a mean

C I_{b c}

= 5 min. In addition, among N processes, one is selected at random and transfers a message in every timed interval according to an exponential distribution with a mean of

T I_{s e n d}

= 3 s. Furthermore, to measure the communication pattern sensitivity of the two protocols, more complex experiments were conducted by splitting applications into four groups: serial, circular, hierarchical, and irregular [24].

Figure 5, Figure 6, Figure 7 and Figure 8 show the

N O F C

for both LazyHMNR and S-CIC with changes in the numbers of processes—denoted by

N O P

—scaling from 6 to 12 when the percentage of internal unloggable ND events in each process (

U N D

) was 20%, 40%, 60%, and 80% for the four different communication patterns, respectively. In these figures,

U N D

never changed the

N O F C

of LazyHMNR because, unlike S-CIC, LazyHMNR has no method for skipping forced checkpointing actions if the state of the sender of each message right before the receipt of the message is recoverable. As

N O P

increased, the ratio of the

N O F C

of LazyHMNR to that of S-CIC increased for all four patterns according to the change in

U N D

, which ranged from 1.3 to 6.5. The main reason is that there was an increased possibility of forming the two kinds of NC Z-path patterns and inducing forced checkpointing in LazyHMNR when

C_{1}

or

C_{2}

was satisfied. In addition, the occurrence of fewer unloggable ND events per process (i.e., decreasing

U N D

) led to a significant decrease in the forced checkpointing overhead of LazyHMNR due to the advantageous features of S-CIC. These results indicate that S-CIC frequently skips actions to take forced checkpoints for each process by checking the recoverability of the dependent states of other processes, unlike LazyHMNR. In addition, the features have the effect of a large reduction in the number of forced checkpoints, regardless of the communication patterns. However, the degree of their effectiveness may fluctuate in the irregular pattern because its irregularity can cause the formation of Z-cycles and can cause the first type of NC Z-path to differ in every run. From the results, we can see that, with these features, S-CIC can alleviate the shortcomings of the family of HMNR protocols, including LazyHMNR.

5. Conclusions

The proposed protocol, S-CIC, was developed in order to incorporate the following advantageous features. First, though situations can occur in which HMNR or LazyHMNR decides that, on receiving a message, a process has to perform forced checkpointing, S-CIC does not cause the process to perform this action when it recognizes that the state of its sender right before the receipt of the message is recoverable, leading to large reduction in the number of forced checkpoints compared with the family of HMNR protocols. Therefore, S-CIC is also not required to assume the PWD model, despite being combined with message logging. This goal can be realized by piggybacking the sender’s recoverability status and a vector containing the last send sequence number and unloggable event occurrence status of every process onto each sent message. Our simulation results verified that the protocol outperforms the representative optimized protocol, LazyHMNR, with respect to the forced checkpointing frequency, regardless of the communication pattern used.

Funding

This research was funded by Kyonggi University, grant number 2018-033.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author, J.A., upon reasonable request.

Conflicts of Interest

The author declares no conflict of interest.

References

Nakamura, J.; Kim, Y.; Katayama, Y.; Masuzawa, T. A cooperative partial snapshot algorithm for checkpoint-rollback recovery of large-scale and dynamic distributed systems and experimental evaluations. arXiv 2021, arXiv:2103.15285v1. [Google Scholar] [CrossRef]
Lion, R.; Thibault, S. From tasks graphs to asynchronous distributed checkpointing with local restart. In Proceedings of the IEEE/ACM 10th Workshop on Fault Tolerance for HPC at eXtreme Scale, Atlanta, GA, USA, 11 November 2020; pp. 31–40. [Google Scholar]
Estahbanati, M.G.; Schintke, F. Multilevel checkpoint/restart for large computational jobs on distributed computing resources. In Proceedings of the 38th Symposium on Reliable Distributed Systems (SRDS), Lyon, France, 1–4 October 2019; pp. 143–152. [Google Scholar]
Mansouri, H.; Pathan, A. Checkpointing distributed computing systems: An optimisation approach. Int. J. High Perform. Comput. Appl. 2019, 15, 202–209. [Google Scholar]
Elnozahy, E.; Alvisi, L.; Wang, Y.; Johnson, D. A survey of rollback-recovery protocols in message-passing systems. ACM Comput Surv. 2002, 34, 375–408. [Google Scholar] [CrossRef] [Green Version]
Vargas-Santiago, M.; Morales-Rosales, L.; Monroy, R.; Pomares-Hernandez, S.; Drira, K. Autonomic web services based on different adaptive quasi-asynchronous checkpointing techniques. Appl. Sci. 2020, 10, 2495. [Google Scholar] [CrossRef] [Green Version]
Garcia, I.C.; Vieira, G.M.D.; Buzato, L.E. A rollback in the history of communication-induced checkpointing. arXiv 2019, arXiv:1702.06167v2. [Google Scholar]
Helary, J.-M.; Mostefaoui, A.; Netzer, R.H.B.; Raynal, M. Communication-based prevention of useless checkpoints in distributed computations. Distrib. Comput. 2000, 13, 29–43. [Google Scholar] [CrossRef]
Luo, Y.; Manivannan, D. FINE: A Fully Informed aNd Efficient communication-induced checkpointing protocol for distributed systems. J. Parallel Distrib. Comput. 2009, 69, 153–167. [Google Scholar] [CrossRef]
Luo, Y.; Manivannan, D. Theoretical and experimental evaluation of communication-induced checkpointing protocols in F_E and F_Lazy-E. Perform. Eval. 2011, 68, 429–445. [Google Scholar] [CrossRef]
Simón, A.; Hernandez, S.; Cruz, J.; Halima, R.; Kacem, H. Self-healing in autonomic distributed systems based on delayed communication-induced checkpointing. Int. J. Auton. Adapt. Comm. Syst. 2016, 9, 183–200. [Google Scholar] [CrossRef]
Tsai, J. Applying the fully-informed checkpointing protocol to the lazy indexing strategy. J. Inf. Sci. Eng. 2007, 23, 1611–1621. [Google Scholar]
Vieira, G.M.; Garcia, I.C.; Buzato, L.E. Systematic analysis of index-based checkpointing algorithms using simulation. In Proceedings of the IX Brazilian Symposium on Fault-Tolerant Computing, Florianapolis, Brazil, 5–7 March 2001; pp. 31–42. [Google Scholar]
Ahn, J. Efficient communication induced checkpointing protocol for broadcast network-based distributed systems. Parallel Process. Lett. 2019, 29, 1–12. [Google Scholar] [CrossRef]
Mansouri, H.; Pathan, A.S.K. A communication-induced checkpointing algorithm for consistent-transaction in distributed database systems. In Proceedings of the International Symposium on Security in Computing and Communication, Chennai, India, 14–17 October 2020; pp. 21–32. [Google Scholar]
Aung, T.; Min, Y.H.; Maw, A.H. Enhancement of fault tolerance in Kafka pipeline architecture. In Proceedings of the 11th International Conference on Advances in Information Technology, Bangkok, Thailand, 1–3 July 2020; pp. 1–8. [Google Scholar]
Ahn, J. Scalable sender-based message logging protocol with little communication overhead for distributed systems. Parallel Process. Lett. 2019, 29, 1–10. [Google Scholar] [CrossRef]
Meyer, H.; Rexachs, D.; Luque, E. Hybrid message pessimistic logging. improving current pessimistic message logging protocols. J. Parallel. Distrib. Comput. 2017, 104, 206–222. [Google Scholar] [CrossRef] [Green Version]
Schlichting, R.D.; Schneider, F.B. Fail-stop processors: An approach to designing fault-tolerant distributed computing systems. ACM Trans. Comput. Syst. 1985, 1, 222–238. [Google Scholar] [CrossRef] [Green Version]
Lamport, L. Time, clocks, and the ordering of events in a distributed system. Commun. ACM 1978, 21, 558–565. [Google Scholar] [CrossRef]
Chandy, K.M.; Lamport, L. Distributed snapshots: Determining global states of distributed systems. ACM Trans. Comput. Syst. 1985, 3, 63–75. [Google Scholar] [CrossRef]
Netzer, R.H.B.; Xu, J. Necessary and sufficient conditions for consistent global snapshots. IEEE Trans. Parallel Distrib. Syst. 1995, 6, 165–169. [Google Scholar] [CrossRef]
Bagrodia, R.; Meyer, R.; Takai, M.; Chen, Y.; Zeng, X.; Martin, J.; Song, H.Y. Parsec: A parallel simulation environments for complex systems. Comput J. 1998, 31, 77–85. [Google Scholar] [CrossRef] [Green Version]
Andrews, G.R. Paradigms for process interaction in distributed programs. ACM Comput. Surv. 1991, 23, 49–90. [Google Scholar] [CrossRef]

Figure 1. Preventing uselessness of local checkpoints in HMNR.

Figure 2. Three cases where the process execution mode changes.

Figure 3. How S-CIC avoids forced checkpointing, unlike HMNR.

Figure 4. Modules of p for S-CIC.

Figure 5.

N O F C

for the serial pattern.

Figure 5.

N O F C

for the serial pattern.

Figure 6.

N O F C

for the circular pattern.

Figure 6.

N O F C

for the circular pattern.

Figure 7.

N O F C

for the hierarchical pattern.

Figure 7.

N O F C

for the hierarchical pattern.

Figure 8.

N O F C

for the irregular pattern.

Figure 8.

N O F C

for the irregular pattern.

Table 1. Comparison with the other CIC representatives (* non-deterministic).

Feature	S-CIC	HMNR	FINE	DCFI	BN-FI
No useless checkpoint existence	Yes	Yes	No	Yes	Yes
Broadcast network only	No	No	No	No	Yes
Model assumed if message logging used	ND *	PWD	PWD	PWD	PWD

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ahn, J. Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems. Electronics 2021, 10, 1428. https://doi.org/10.3390/electronics10121428

AMA Style

Ahn J. Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems. Electronics. 2021; 10(12):1428. https://doi.org/10.3390/electronics10121428

Chicago/Turabian Style

Ahn, Jinho. 2021. "Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems" Electronics 10, no. 12: 1428. https://doi.org/10.3390/electronics10121428

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Communication-Induced Checkpointing with Message Logging beyond the Piecewise Deterministic (PWD) Model for Distributed Systems

Abstract

1. Introduction

2. Background

2.1. Fundamentals

2.2. Related Work

3. The Proposed Protocol

4. Performance Evaluation

5. Conclusions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI