DDR-coin: An Efficient Probabilistic Distributed Trigger Counting Algorithm

The distributed trigger counting (DTC) problem is to detect $w$ triggers in a distributed system consisting of $n$ nodes. DTC algorithms can be used by sensor-based monitoring systems to detect a significant global change. An efficient DTC algorithm should pursue two goals: minimizing the total number of messages exchanged to count the triggers, and distributing the communication load evenly among the nodes. In this paper, we present an efficient DTC algorithm, DDR-coin (Deterministic Detection of Randomly generated coins). The message complexity (the total number of exchanged messages) of DDR-coin is $O(n \log_n(w/n))$ on average. MaxRcvLoad (the maximum number of messages received by any single node while detecting $w$ triggers) is $O(\log_n(w/n))$ on average. DDR-coin is not an exact algorithm: even though $w$ triggers are received by the $n$ nodes, it can fail to raise an alarm with a negligible probability. However, DDR-coin is more efficient than exact DTC algorithms on average, and the gap widens as $n$ grows. We implemented a prototype of the proposed scheme using NetLogo 6.1.1 and confirmed that the experimental results are close to our mathematical analysis. Compared with the previous schemes (TreeFill, CoinRand, and RingRand), DDR-coin shows smaller message complexity and MaxRcvLoad.


Introduction
Consider a distributed system with sensors, e.g., a wireless sensor network (WSN). In many cases, monitoring is one of the most important tasks, and the system needs to detect a significant global state change. For example, consider traffic surveillance where a large number of sensors are distributed over a targeted area: when a predefined number of cars have passed through the area, the system raises an alarm. Another example is raising an alarm when a large number of illegal login attempts occur across diverse nodes.
The distributed trigger counting (DTC) problem plays an important role in this kind of monitoring application. It is formally defined as follows. Suppose a distributed system in which $n$ nodes communicate with each other. Assume that $w$ triggers arrive at the $n$ nodes from external sources, and that no statistical information about the triggers is given to the system in advance. We consider the case where the number of triggers is much greater than the number of nodes, i.e., $w \gg n$ (if $w \le n$, the triggers can be easily aggregated using a spanning tree of nodes [1][2][3]). The DTC problem is to raise an alarm when the total number of triggers detected by the $n$ nodes reaches $w$. In a distributed system, various state changes or sensor data can be used to initiate a trigger. Thus, if we define a global threshold for a certain property of a distributed system as the total number of triggers generated in the system, DTC algorithms can be useful for detecting the time when

• Message complexity: the total number of messages exchanged among the nodes. For efficiency, this should be low.
• MaxMsgLoad: the maximum number of messages exchanged (i.e., sent and received) by each node. For even distribution of load, this should be low.
• MaxRcvLoad: the maximum number of messages received by each node. For even distribution of load, this should be low.
As seen in Table 1, the average message complexity of DDR-coin is $O(n \log_n(w/n))$, which is lower than the optimal message complexity of exact DTC algorithms, $O(n \log(w/n))$ [17]. The MaxRcvLoad of DDR-coin is $O(\log_n(w/n))$ on average, which is lower than those of the other schemes. (For MaxMsgLoad, as in many previous schemes, we were unable to derive a bound for DDR-coin because the analysis is too complex.) The failure probability of DDR-coin is negligible, as will be seen in Section 3.1.3. We implemented a prototype of the proposed scheme using NetLogo 6.1.1 and confirmed that the experimental results are close to our mathematical analysis. Compared with the previous schemes (TreeFill, CoinRand, and RingRand), DDR-coin shows smaller message complexity and MaxRcvLoad.
This paper is organized as follows. The DDR-coin algorithm is explained in Section 2. We analyze the failure probability, message complexity, and MaxRcvLoad of DDR-coin in Section 3. We show experimental results in Section 4. Related work on DTC algorithms is summarized in Section 5. We conclude this paper in Section 6.

Table 1.
Algorithm | Message complexity | MaxRcvLoad | MaxMsgLoad | Type
Centralized [3] | $O(n \log(w/n))$ | − | − | Exact
Tree-based [3] | $O(n \log n \log(w/n))$ | $O(\log n \log(w/n))$ | $O(\log n \log(w/n))$ | Exact
LayeredRand [1] | $O(n \log n \log w)$ | $O(\log n \log w)$ | − | Exact
CompTreeRand [18] | $O(n \log w (\log\log n)^2)$ | − | − | Probabilistic
CompTreeDet [18] | $O(n (\log w \log n)^2)$ | $O((\log w \log n)^2)$ | $O((\log w \log n)^2)$ | Exact
CoinRand [2] | $O(n(\log w + \log n))$ | $O(\log w + \log n)$ | − | Exact
RingRand [2] | $O(n \log n \log w)$ | $O(\log n \log w)$ | $O(\log n \log w)$ | Probabilistic
TreeFill [17] | $O(n \log(w/n))$ | $O(\log(w/n))$ | − | Exact
DDR-coin | $O(n \log_n(w/n))$ | $O(\log_n(w/n))$ | − | Probabilistic
(−: not bounded, which implies that the value is equal to the message complexity.) (The bounds for the algorithms in [18] are for arbitrary networks.)

DDR-coin Algorithm
After describing the system model and our objectives in Section 2.1, we give an overview of DDR-coin in Section 2.2. Section 2.3 explains the tree-like structure used by DDR-coin, and Section 2.4 presents the DDR-coin algorithm in detail. Table 2 summarizes the notation used in this paper.

Table 2.
Category | Notation | Description
Round i | $w_i$ | The number of remaining triggers to be detected at the beginning of Round $i$ ($w_1 = w$, $w_i = w_{i-1} - \hat{w}_{i-1}$ for $2 \le i \le f$).
Round i | $\hat{w}_i$ | The number of detected triggers at Round $i$.
Round i | $w_f$ | The number of remaining triggers at the beginning of the final ($= f$) round.
 | $h$ | The height of the tree-like structure.
 | $k$ | Each internal vertex has $k$ children.
 | $n_j$, node-j | Node $j$ ($1 \le j \le n$) corresponding to the vertex $j$ in the tree-like structure.
 | $n_j$.trg | The number of received triggers in $n_j$ ($1 \le j \le n$).
Tree-like structure | $d_u$ | The node $u$ for the internal vertex in the tree-like structure.
Tree-like structure | $d_u$.cns[1..k] | The Boolean array of length $k$ in $d_u$.
Tree-like structure | coin | When $n_j$ at level-h receives a trigger, it generates a coin message with probability $n/w_i$. This coin is sent to a randomly-selected node in level-(h-1).

System Model and Objectives
We assume that the number of nodes in the system is n. To simplify the problem, assume that the nodes are fully connected, there are no message drops, there are no external attackers, and the nodes do not fail. Events are being triggered with arbitrary distribution on these nodes in the system. We want to detect and raise an alarm when w or more triggers occur in the system. To this end, n nodes should send and receive messages, and we want to minimize this (i.e., minimizing message complexity). We also want communication overheads to be evenly distributed among nodes (i.e., minimizing MaxRcvLoad). We assume that events continue to be triggered while the protocol is running.
We only consider the case where the number of triggers is much greater than the number of nodes, i.e., $w \gg n$ (for $w \le n$, the works in [1][2][3] solve the problem with $O(n)$ messages using spanning trees). Our objectives are as follows.

• When w or more triggers occur, the system has a very high probability of raising an alarm. (In other words, the failure probability is negligible.)
• When the system raises an alarm, the probability that the number of triggers is less than w is 0 (i.e., no false positives).
• The average message complexity is $O(n \log_n(w/n))$.
• The average MaxRcvLoad is $O(\log_n(w/n))$.

Overview of DDR-coin
The system works as follows. The n nodes are organized hierarchically into a complete tree-like structure, e.g., the lower part of Figure 1 shows the nodes on the network when n = 9, and the upper part shows the corresponding tree-like structure. All the nodes correspond to leaf vertices at level-h of the tree (h: the height of the tree-like structure), and some nodes additionally correspond to internal vertices (i.e., they have dual roles). The tree-like structure is explained in detail in Section 2.3.
DDR-coin operates in multiple rounds. For Round 1, $w_1$, the number of remaining triggers to be detected, is set to w. The goal of Round 1 is to detect the state where the nodes have received slightly fewer than $w_1$ triggers.
To do so, when an event is triggered on a node associated with a leaf at level-h, a message is sent with a specific probability to a node associated with an internal vertex at level-(h-1). The node at level-(h-1) counts the number of received messages, and when the count reaches a certain threshold, it sends a message to a node at level-(h-2) to report that the threshold has been reached. Repeating this level by level, the node corresponding to the root (level-0) finally receives messages from the nodes at level-1. Then, the root starts the aggregation work that counts the number of triggers that have occurred in all nodes (which we call the end-of-round procedure).
In the end-of-round procedure, the root propagates an aggregation message to the leaves, and then each leaf sends the root a message containing the number of triggers (i.e., events triggered) it has received. At the end of this process, the root node knows that $\hat{w}_1$ ($\le w_1$) triggers have occurred in Round 1. Then, Round 1 is finished and Round 2 starts. Round 2 works in the same way, but the threshold and parameters are adjusted to detect slightly fewer than $w_2$ ($= w_1 - \hat{w}_1$) triggers.
Repeating this work gradually decreases the number of remaining triggers until it is less than or equal to n. Then the final round begins: using the procedure of Section 2.4.4, we count the remaining triggers until exactly w triggers have occurred in total, and raise an alarm.
For better understanding, a detailed example for Rounds 1 and 2 is given in Appendix A.

Tree-Like Structure
In this section, we describe the tree-like structure used by DDR-coin. This structure is the complete k-ary tree, where vertices are associated with nodes in the network. An example of the tree-like structure in DDR-coin, when k = 3 and n = 9, is shown in the upper side of Figure 1.
We define level-l as follows: the root vertex is in level-0, and the vertices at level-(l + 1) are children of vertices at level-l. Note that all the n nodes are associated with the n leaf vertices in level-h, where h is the height of this tree-like structure, i.e., the maximal level, e.g., in Figure 1, h = 2. Internal vertices are in level-0 through level-(h-1). We assume $n = k^h$ for ease of algorithm explanation and analysis. Our algorithm can be easily extended to general cases (which may be hard to analyze mathematically).
Each node in the network (e.g., at the bottom of Figure 1) is associated with each leaf vertex in this tree-like structure. For example, in Figure 1 below, node-3 in the network is associated with leaf vertex-3 of the tree-like structure. Some nodes have dual roles: a node is associated with one leaf and one internal vertex, e.g., in Figure 1, node-4 in the network is associated with leaf vertex-4 and root vertex-4. From now on, "the node u in the tree-like structure" denotes the node u in the network where the node u is associated with the vertex u of the tree-like structure.
Actually, we use this tree-like structure to associate the level of a tree with a node but the message is not necessarily transmitted along the edge of the tree, e.g., as will be explained in detail in Section 2.4, a node at level l sends a message to any node at level l − 1/l + 1 as well as parent/children.
At the beginning of the DDR-coin protocol, the nodes for internal vertices are chosen among the n nodes, e.g., in Figure 1, node-4, -5, -8, and -2 are chosen to be internal vertices. Although any nodes can be selected as the internal vertices, one simple approach is to select the first $(n - 1)/(k - 1)$ nodes.
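The simple selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `build_tree_like_structure`, the returned dictionary, and the level-by-level assignment of node ids 1, 2, ... to internal vertices are hypothetical (Figure 1 uses a different assignment).

```python
def build_tree_like_structure(n: int, k: int) -> dict:
    """Return the height h and an internal-vertex layout, assuming n = k**h."""
    if k < 2:
        raise ValueError("k must be at least 2")
    h, size = 0, 1
    while size < n:
        size *= k
        h += 1
    if size != n:
        raise ValueError("this sketch assumes n = k**h")

    # A complete k-ary tree with k**h leaves has (n - 1)/(k - 1) internal vertices.
    num_internal = (n - 1) // (k - 1)

    # Assumption: the first (n - 1)/(k - 1) node ids fill levels 0..h-1 in order.
    levels, next_id = [], 1
    for level in range(h):
        count = k ** level
        levels.append(list(range(next_id, next_id + count)))
        next_id += count
    return {"h": h, "num_internal": num_internal, "levels": levels}


structure = build_tree_like_structure(n=9, k=3)
print(structure)
```

For n = 9 and k = 3 this yields h = 2 and four internal vertices, matching the counts in the Figure 1 example; every node additionally keeps its leaf role at level-h.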

DDR-coin Algorithm
DDR-coin works based on rounds. Overall operations in DDR-coin are as follows. Steps 1-3 are for each round and Step 4 is for the final round.

1. (Coin generation routine) Recall that all n nodes are associated with leaf vertices (level-h). When a node detects a trigger, it generates a coin message with probability $n/w_i$, where $w_i$ is the number of not yet received triggers at the beginning of round i. (Initially $w_1 = w$.) This coin message is sent to a randomly-selected node at level-(h-1).
2. (Coin propagation routine) The coin messages are propagated from the leaves of the tree-like structure to the root. Eventually, the node for the root vertex detects that n coins have been generated at the leaf-level.
3. (End-of-round procedure) The n nodes count the number of triggers generated up to now (using the spanning tree). If the number of not yet detected triggers is greater than n, a new round starts by going back to the coin generation routine (Step 1). Otherwise, the algorithm goes to the final round procedure (Step 4).
4. (Final round procedure) The algorithm counts the remaining triggers (the number of which does not exceed n). Then, it raises an alarm.

Coin Generation Routine
Let $w_i$ be the number of triggers that are not yet detected at the beginning of the i-th round. When the i-th round begins, $w_i$ is calculated as follows: $w_1 = w$ and $w_i = w_{i-1} - \hat{w}_{i-1}$ ($i \ge 2$), where $\hat{w}_{i-1}$ is the number of counted (i.e., detected) triggers in the (i − 1)-th round.
Let n j (1 ≤ j ≤ n) be a node in the system and n j .trg be the number of received triggers in n j . Initially, n j .trg (1 ≤ j ≤ n) is set to be zero.
When $n_j$ receives a trigger, it increases $n_j$.trg by one and generates a coin message with probability $n/w_i$. The coin is sent to a randomly-selected node in level-(h-1) of the tree-like structure (note that the vertices for all n nodes are in level-h, so coins are sent from level-h to level-(h-1)). For example, in Figure 1, if node-4 detects a trigger, it sets node-4.trg = node-4.trg + 1, generates a coin message with probability $n/w_1 = 1/9$, and then sends this coin to a randomly-selected node at level-1, e.g., node-2. Figure 2 shows the algorithm for the coin generation routine. If i = 1, then $w_1 = w$.

9: Generate a coin message with probability $n/w_i$.
10: If a coin is generated then
11:     Send the coin to a randomly-selected node in level-(h-1).

Figure 2. Coin generation routine for node $n_j$ ($1 \le j \le n$) in the ith round.
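The per-trigger step of the coin generation routine can be sketched in Python as follows. This is only an illustration of the step described above: the function name `on_trigger`, the dictionary used for node state, and the explicit `rng` argument are assumptions introduced for the example.

```python
import random


def on_trigger(node: dict, n: int, w_i: int, level_h_minus_1: list,
               rng: random.Random):
    """Handle one trigger at a leaf node n_j.

    Returns the id of the level-(h-1) node that receives the coin,
    or None when no coin is generated for this trigger.
    """
    node["trg"] += 1                       # n_j.trg counts every received trigger
    if rng.random() < n / w_i:             # generate a coin with probability n / w_i
        return rng.choice(level_h_minus_1) # random destination at level-(h-1)
    return None


# Hypothetical usage with n = 9 and w_1 = 81, so the coin probability is 1/9.
rng = random.Random(0)
node_4 = {"trg": 0}
destinations = [on_trigger(node_4, 9, 81, [5, 8, 2], rng) for _ in range(81)]
print(node_4["trg"], sum(d is not None for d in destinations))
```

Since each trigger is an independent Bernoulli trial, the expected number of coins over $w_i$ triggers is n, which is the quantity the analysis in Section 3 builds on.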

Coin Propagation Routine
The goal of the coin propagation routine is that the node for the root vertex detects that n coins have been generated at the leaves. Let $d_u$ be a node for an internal vertex from level-0 to level-(h-1) ($1 \le u \le (n - 1)/(k - 1)$). For example, Figure 3 shows the internal vertices (node-4, node-5, node-8, and node-2) of the tree-like structure of Figure 1.
Each $d_u$ has a Boolean array of length k, $d_u$.cns[1..k]. This array is initialized with false values at the beginning of a round. The array has two meanings. First, recall from Section 2.4.1 that coins are sent to a node $d_u$ at level-(h-1). For such a node, v true entries in the array mean that $d_u$ has received v coins from the nodes at the leaf-level (level-h), e.g., in Figure 3, 6 coins have currently arrived at level-1: node-5 has one coin, node-8 has two coins, and node-2 has three coins.
Second, for node d u from level-0 to h-2, if d u .cns [v] (1 ≤ v ≤ k) is true, it means that all the nodes in the v-th subtree of d u are fully filled with coins (i.e., all entries in the arrays are true), e.g., Figure 3 shows that node-4 has set node-4.cns [3] as true, because the third subtree of node-4 (i.e., node-2) has fully filled with k = 3 coins.
When a coin arrives at $d_u$ at level-(h-1), there are 3 cases:
1. If a coin arrives and $d_u$.cns[1...k] is not full (i.e., some entries are false), one false entry is changed to true.
2. Suppose that $d_u$ has received k − 1 coins. When a new coin arrives at $d_u$, $d_u$.cns[1...k] becomes full (i.e., all entries are true) and $d_u$ sends a full-coin to its parent. Then, the parent node of $d_u$ sets the j-th entry of its cns[1...k] to true, where $d_u$ is its j-th child, e.g., in Figure 3, when a new coin is sent to node-8, a full-coin is sent to node-4 and node-4.cns[2] is set to true. If the parent's array now also becomes full, a full-coin is sent to the grandparent. This can be repeated up to level-0.
3. If a coin arrives and $d_u$.cns[1...k] is already full, the coin becomes an overflow-coin and is sent to a randomly selected node in the upper level. A node that receives an overflow-coin forwards it toward a subtree that is not yet full, until it reaches a node at level-(h-1) whose cns[1...k] is not full, where a false entry is changed to true. For example, when a new coin is sent to node-2 of Figure 3, because node-2.cns[1...k] is already full, the new overflow-coin is sent to a randomly selected node in its upper level; there is only node-4 in the upper level of node-2, so the overflow-coin is sent to node-4. Node-4 knows that node-2 is full with 3 coins because node-4.cns[3] is true, and that node-5 and node-8 have room for more coins. Node-4 forwards the overflow-coin to node-5. After node-5 receives the forwarded overflow-coin, node-5.cns[2] = true.
This process continues until all the nodes for the level-(h-1) vertices are fully filled with coins, where the number of those coins is n. Once they are fully filled, $d_{root}$.cns[1...k] are all true in the node for the root, and the root initiates the end-of-round procedure. Figure 4 shows the algorithm of the coin propagation routine.

Figure 3. The filled box means true while the empty box means false. Currently, 6 coins have arrived at level-(h-1) (from level-h): 1 for node-5, 2 for node-8, and 3 for node-2. In the root node, node-4, node-4.cns[3] = 1. This means that the third subtree of node-4 is fully filled with coins.
When a node $d_u$ receives an overflow-coin:
    If $d_u$.cns[v] is false for some v ($1 \le v \le k$) then
        Send the overflow-coin to the v-th child node.
    Else if $d_u$ is the root node then
        Initiate the end-of-round procedure.
    Else
        Send the overflow-coin to (randomly-selected) one of the nodes in its upper level.

When a node $d_u$ receives a coin message or a full-coin message from the v-th child node:
    $d_u$.cns[v] ← true.
    If $d_u$.cns[1...k] are all true and $d_u$ is the root then
        Initiate the end-of-round procedure.
    Else if $d_u$.cns[1...k] are all true then
        Send a full-coin message to the parent.

To increase the probability of reaching the end-of-round procedure, at the beginning of each round, after all arrays $d_u$.cns[] are initialized with false, $\kappa\sqrt{n}$ coins are randomly predistributed among the nodes at level-(h-1), i.e., $\kappa\sqrt{n}$ entries in the arrays are set to true in advance (which is described in Line 5 of Figure 2). $\kappa$ ($\ll \sqrt{n}$) is a security parameter that adjusts the failure probability. We will analyze the relation between $\kappa$ and the failure probability in Section 3.1.
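The coin, full-coin, and overflow-coin handling can be sketched in Python as below. This is a simplified illustration rather than the authors' code: the class name `CoinTree`, the heap-style numbering of internal vertices (root = 0, children of u are k·u+1 ... k·u+k), and the direct method calls standing in for messages are all assumptions made for the example. Predistribution of the $\kappa\sqrt{n}$ coins is omitted.

```python
import random


class CoinTree:
    """Internal vertices (levels 0..h-1) of a complete k-ary tree-like structure."""

    def __init__(self, k: int, h: int, rng: random.Random):
        self.k, self.h, self.rng = k, h, rng
        self.m = (k**h - 1) // (k - 1)                   # number of internal vertices
        self.first_bottom = (k**(h - 1) - 1) // (k - 1)  # first vertex at level-(h-1)
        self.cns = [[False] * k for _ in range(self.m)]  # d_u.cns[1..k], 0-indexed here
        self.round_over = False

    def level_nodes(self, level: int) -> range:
        start = (self.k**level - 1) // (self.k - 1)
        return range(start, start + self.k**level)

    def level_of(self, u: int) -> int:
        level = 0
        while u > 0:
            u, level = (u - 1) // self.k, level + 1
        return level

    def receive(self, u: int) -> None:
        """A coin or overflow-coin arrives at internal vertex u."""
        if self.round_over:
            return
        free = [v for v, filled in enumerate(self.cns[u]) if not filled]
        if free and u >= self.first_bottom:   # level-(h-1): store the coin
            self.cns[u][free[0]] = True
            if len(free) == 1:                # the array just became full
                self.send_full_coin(u)
        elif free:                            # forward down into a non-full subtree
            self.receive(self.k * u + 1 + free[0])
        elif u == 0:                          # the root itself is full
            self.round_over = True
        else:                                 # overflow: random node one level up
            upper = self.level_nodes(self.level_of(u) - 1)
            self.receive(self.rng.choice(upper))

    def send_full_coin(self, u: int) -> None:
        if u == 0:
            self.round_over = True            # root initiates the end-of-round procedure
            return
        parent, v = (u - 1) // self.k, (u - 1) % self.k
        self.cns[parent][v] = True
        if all(self.cns[parent]):
            self.send_full_coin(parent)


# Hypothetical usage for k = 3, h = 2 (n = 9): the ninth coin ends the round.
tree = CoinTree(3, 2, random.Random(7))
bottom = list(tree.level_nodes(1))
for _ in range(9):
    tree.receive(tree.rng.choice(bottom))
print(tree.round_over)
```

In this sketch every arriving coin ends up in exactly one level-(h-1) slot, so the root's array becomes full precisely when n coins have been stored.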

End-of-Round Procedure
In the end-of-round procedure, the root node sends aggregation-request messages to its children nodes. These messages are recursively sent to the leaf nodes in level-h.
Recall that all the n nodes are in level-h. Each node n j (1 ≤ j ≤ n) sends the count-message containing the number of received triggers (=n j .trg) to its parent node. The internal nodes of DDR-coin aggregate the number of received triggers sent from its children nodes and send the sum to its parent node. Finally, the total number of received triggers at round i,ŵ i , can be calculated at the root node of DDR-coin.
Let $\hat{w}_i$ be the number of triggers received by the n nodes in the ith round. Then, in the root node, $w_{i+1}$ is calculated as follows: $w_{i+1} = w_i - \hat{w}_i$. If $w_{i+1} > n$, the probability of generating a coin is changed to $n/w_{i+1}$ and the (i + 1)th round begins. If $w_{i+1} \le n$, the final round begins. Figure 5 shows the algorithm for the end-of-round procedure.

The end-of-round procedure:
    At the root node, $\hat{w}_i$, the number of received triggers in the ith round, is aggregated using the tree-like structure of DDR-coin.
    $w_{i+1} \leftarrow w_i - \hat{w}_i$.
    If $w_{i+1} > n$ then
        The (i + 1)th round begins.
    Else
        The final round begins.
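The round transition at the root can be illustrated with a short Python sketch. The function name `end_of_round` and its return tuple are hypothetical; the sketch also assumes that `trg_counts` holds the per-node trigger counts of the current round only, and it replaces the tree-based aggregation with a plain sum.

```python
def end_of_round(w_i: int, trg_counts: list, n: int):
    """Return (w_next, coin_probability, is_final_round) after round i."""
    w_hat = sum(trg_counts)        # \hat{w}_i, aggregated at the root
    w_next = w_i - w_hat           # w_{i+1} = w_i - \hat{w}_i
    if w_next > n:
        return w_next, n / w_next, False   # next round uses probability n / w_{i+1}
    return w_next, 1.0, True               # final round: every trigger yields a coin


# Hypothetical numbers: n = 9 nodes, w_1 = 81, each node saw 7 triggers.
print(end_of_round(81, [7] * 9, 9))   # 18 triggers remain, so another round starts
print(end_of_round(18, [1] * 9, 9))   # 9 triggers remain, so the final round starts
```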

Final Round Routine
Let w f ≤ n be the number of not yet detected triggers in the beginning of the final round. In the beginning of the final round, n − w f coins are distributed among the nodes of level-(h-1) in advance. In each node in level-h, the coin generating probability is set to one, i.e., each node generates a coin whenever it receives a trigger.
When w f coins are generated in the nodes in level-h, the number of coins in the nodes at level-(h-1) is (n − w f ) + w f = n and the root of DDR-coin detects this and raises an alarm.

Analysis
In this section, we show that (1) when w or more triggers occur, the system detects this with a very high probability and raises an alarm (i.e., the failure probability is negligible); (2) when the system raises an alarm, the probability that the number of triggers is less than w is zero; (3) the average message complexity is $O(n \log_n(w/n))$; and (4) the average MaxRcvLoad is $O(\log_n(w/n))$. As discussed in detail in Section 4.4, we conduct the analysis under the assumption that $\kappa$ ($\ll \sqrt{n}$) is a small constant positive integer (e.g., 4∼6).

Failure Probability
The success probability is defined as the probability that the system raises an alarm when w or more triggers have occurred. The failure probability is the probability that it fails to raise an alarm for this case, which is equal to 1-(success probability).
DDR-coin operates in multi-round, and raises an alarm when, in the last round, the number of triggers that the root node has counted is not less than w. Therefore, failure means that it stops before the last round or a problem occurs in the last round. The success probability can be derived by multiplying the probability of successful execution of each round. If the success probability is obtained, the failure probability can be easily calculated.
We first calculate the failure probability of each round and then we calculate the average number of rounds. From this, we will obtain the success/failure probability of DDR-coin.

Failure Probability for each Round
In this subsection, we calculate the failure probability for each round. Each round works as follows. First, κ √ n coins are randomly sent to the nodes at level-(h-1) of the tree-like structure in advance. Then, for each trigger, a coin is generated with a specific probability (and then is sent from level-h to level-(h-1)). When n coins are in the nodes at level-(h-1), the root node detects this (by checking that the array of the root node is full) and goes to the end-of-round procedure. Then, it goes to the next round.
Because the end-of-round routine eventually finishes in every case, a round fails only when the leaf nodes at level-h generate fewer than $n - \kappa\sqrt{n}$ coin messages; the root node's array then never becomes full and the root waits forever. Therefore, the failure probability for each round is defined as the probability that fewer than $n - \kappa\sqrt{n}$ coin messages are generated when $w_i$ triggers are observed by the n nodes. We show that this probability is negligible. Suppose $w_i$ triggers have been observed during round i ($i \ge 1$), and let the random variable X denote the number of coins generated in the i-th round. Theorem 1 shows that $\Pr(X < n - \kappa\sqrt{n})$ is negligible in the security parameter $\kappa$.

Theorem 1. When $w_i$ triggers have been observed during round i, the probability of generating fewer than $n - \kappa\sqrt{n}$ coin messages is at most $e^{-\kappa^2/2}$.

Proof. Let $e_1, \ldots, e_{w_i}$ denote the triggers. Recall that for each trigger, a coin message is generated independently with probability $n/w_i$. Let $X_k$ ($1 \le k \le w_i$) be the binary random variable indicating whether trigger $e_k$ generates a coin message, which is a Bernoulli trial (1 means successfully generated, 0 means not generated). Then $X = \sum_{k=1}^{w_i} X_k$ and $E(X) = w_i \cdot (n/w_i) = n$. By Chernoff-Hoeffding bounds [20], with $\delta = \kappa/\sqrt{n}$,
$$\Pr(X < n - \kappa\sqrt{n}) = \Pr\bigl(X < (1 - \delta)E(X)\bigr) \le \exp\bigl(-E(X)\,\delta^2/2\bigr) = e^{-\kappa^2/2}.$$

Theorem 1 shows that DDR-coin eventually finishes the round except with a negligible failure probability, e.g., if $\kappa = 5$, this probability is $3.726 \times 10^{-6}$.
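As a quick numerical check, the per-round bound $e^{-\kappa^2/2}$ (the same term that appears in the overall failure probability of Section 3.1.3) can be evaluated for the range of $\kappa$ considered in this paper. The function name `round_failure_bound` is a hypothetical helper introduced for this illustration.

```python
import math


def round_failure_bound(kappa: float) -> float:
    """Upper bound e^{-kappa^2 / 2} on Pr(X < n - kappa * sqrt(n)) for one round."""
    return math.exp(-kappa**2 / 2)


# The bound does not depend on n or w_i, only on the security parameter kappa.
for kappa in (4, 5, 6):
    print(kappa, f"{round_failure_bound(kappa):.3e}")
```

For $\kappa = 5$ the value agrees with the $3.726 \times 10^{-6}$ quoted above; each increment of $\kappa$ lowers the bound by roughly two orders of magnitude, at the cost of more predistributed coins and therefore more rounds.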

The Average Number of Rounds
We show that the average number of rounds in DDR-coin is $O(\log_n(w/n))$ by Theorem 2.
Theorem 2. If the i-th round finishes with $n - \kappa\sqrt{n}$ coin messages generated by triggers at the leaf-level (level-h), the average number of triggers is $(1 - \kappa/\sqrt{n})\,w_i$.

Proof. Let $e_1, \ldots, e_{w_i}$ denote the triggers. Recall that for each trigger, a coin message is generated independently with probability $p = n/w_i$. Suppose that $n - \kappa\sqrt{n}$ coin messages have been generated, and let Y be the number of triggers. Y has a Pascal distribution (also known as the negative binomial distribution) [21]. The expectation is $E(Y) = (n - \kappa\sqrt{n})/p = (1 - \kappa/\sqrt{n})\,w_i$.

By Theorem 2, the average number of triggers remaining after a round is $w_i - E(Y) = (\kappa/\sqrt{n})\,w_i$, so after the ith round of DDR-coin, the average number of remaining triggers to be counted is $(\kappa/\sqrt{n})^i\,w$. At the beginning of the final round f, $w_f = (\kappa/\sqrt{n})^{f-1}\,w$ on average. In the final round, in the worst case, $w_f = n$. Solving $(\kappa/\sqrt{n})^{f-1}\,w = n$ gives $f - 1 = \log_{\sqrt{n}/\kappa}(w/n)$, which is $O(\log_n(w/n))$ for constant $\kappa$. Therefore, the average number of rounds in DDR-coin is $O(\log_n(w/n))$.
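The round count implied by Theorem 2 can be estimated with a few lines of Python. This is a sketch under an explicit assumption: it treats the average shrink factor $\kappa/\sqrt{n}$ as if it applied exactly in every round and returns the smallest f with $(\kappa/\sqrt{n})^{f-1} w \le n$. The function name `expected_rounds` is hypothetical.

```python
import math


def expected_rounds(n: int, w: int, kappa: float) -> int:
    """Smallest f such that (kappa / sqrt(n))**(f - 1) * w <= n."""
    shrink = kappa / math.sqrt(n)          # average fraction of triggers left per round
    if not 0 < shrink < 1:
        raise ValueError("the analysis assumes 0 < kappa < sqrt(n)")
    if w <= n:
        return 1                           # only the final round is needed
    return 1 + math.ceil(math.log(w / n) / math.log(1 / shrink))


print(expected_rounds(n=200, w=10_000, kappa=5))
```

With n = 200, w = 10,000, and $\kappa = 5$, the estimate is 5 rounds, which is the number of rounds used in the example of Section 3.1.3.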

Failure Probability and Success Probability
By Theorems 1 and 2, we get the failure probability of DDR-coin as follows: if the number of observed triggers is w, DDR-coin detects this except with negligible failure probability $1 - (1 - e^{-\kappa^2/2})^{O(\log_n(w/n))}$, e.g., if n = 200, w = 10,000, and $\kappa = 5$, the number of rounds in DDR-coin is 5 and the failure probability is $1.863 \times 10^{-5}$. The success probability is 1 − (failure probability).
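The worked example above can be reproduced numerically. The sketch below assumes the number of rounds is given and that every round fails independently with probability at most $e^{-\kappa^2/2}$; the function name `failure_probability` is hypothetical, and `math.expm1`/`math.log1p` are used only to keep precision for such small probabilities.

```python
import math


def failure_probability(rounds: int, kappa: float) -> float:
    """1 - (1 - e^{-kappa^2/2})**rounds, computed stably for tiny probabilities."""
    per_round = math.exp(-kappa**2 / 2)
    return -math.expm1(rounds * math.log1p(-per_round))


# Example from the text: 5 rounds with kappa = 5.
print(f"{failure_probability(5, 5):.3e}")
```

Because the per-round bound is tiny, the overall value is close to (number of rounds) × $e^{-\kappa^2/2}$, so the failure probability grows only linearly with the number of rounds.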

False Positive Probability
In this paper, the false positive probability means the probability that the number of triggers is less than w when the system raises an alarm. In DDR-coin, the false positive probability is 0 as DDR-coin counts the number of triggers that have occurred in the final round and generates an alarm only when the total number is not less than w.

Message Complexity
In the ith round of DDR-coin, the total number of messages exchanged among nodes is the summation of the following.
By Appendix B, (iii) the average number of overflow-coin messages is O(n). Therefore, the number of messages exchanged among nodes in ith round of DDR-coin is O(n) on average.
We already showed in Section 3.1.2 that the number of rounds in DDR-coin is $O(\log_n(w/n))$. Therefore, the overall message complexity of DDR-coin is $O(n \log_n(w/n))$ on average.

MaxRcvLoad
In this subsection, we show that MaxRcvLoad of DDR-coin is $O(\log_n(w/n))$ with exponentially high probability when k = 2. In the ith round of DDR-coin, the maximal number of messages in a node is the summation of the following. (v) The number of trigger-aggregation messages at the end-of-round procedure.
In the above numbers, (i)+(ii): n coins arrive independently at n nodes on average, and the probability that a node receives more than 2 coins is $\Pr(X \ge (1/n + 1/n)n) \le \exp(-2/n)$ by Chernoff-Hoeffding bounds [20]. (iv): each internal node receives one full-coin message, so 1 is the maximum for each node. (v): the maximum number of aggregation-request messages sent/received in each node is k. That of the count-messages is the same: k.
By Appendix B, (iii) each node at level-j forwards an overflow-coin to the upper level (j − 1) with probability less than 1/2, where the number of nodes at the upper level (j − 1) is 1/k of that at level-j. This implies that among all nodes, the root node receives the maximum number of overflow-coins: $n (1/2)^{h-1}$, where $h = \log_k n$, i.e., $n (1/2)^{\log_k n - 1}$. By summing (i)-(v) and then multiplying by the average number of rounds, we get MaxRcvLoad as follows: $(2 + n(1/2)^{\log_k n - 1} + 1 + 2k) \cdot O(\log_n(w/n)) = O\bigl(n(1/2)^{\log_k n - 1} \log_n(w/n)\bigr)$ with exponentially high probability ($= 1 - \exp(-2/n)$). In particular, if k = 2, then $n(1/2)^{\log_2 n - 1} = 2$, so MaxRcvLoad is $O(\log_n(w/n))$ with exponentially high probability.
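The overflow-coin term $n(1/2)^{\log_k n - 1}$ can be evaluated for the tree shapes used in this paper. The helper name `root_overflow_term` is hypothetical, and the sketch assumes $n = k^h$ as in the analysis.

```python
def root_overflow_term(n: int, k: int) -> float:
    """Evaluate n * (1/2)**(h - 1) with h = log_k(n), assuming n = k**h."""
    h, size = 0, 1
    while size < n:
        size *= k
        h += 1
    if size != n:
        raise ValueError("this sketch assumes n = k**h")
    return n * 0.5 ** (h - 1)


for n, k in [(64, 2), (2048, 2), (81, 3), (243, 3), (125, 5), (625, 5)]:
    print(n, k, root_overflow_term(n, k))
```

For k = 2 the term is the constant 2 regardless of n, which is why the $O(\log_n(w/n))$ bound is stated for k = 2; for k = 3 or 5 the term grows with n.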

Experimental Results
In Section 4.1, we briefly describe the prototype implementation of DDR-coin using NetLogo, which is one of the most widely used agent-based modeling tools. In Section 4.2, we compare the analytic results of Section 3 and simulation results. In Section 4.3, we compare the previous work with DDR-coin using NetLogo. In Section 4.4, we discuss some issues of DDR-coin algorithm.

Prototype Implementation
In this section, we describe prototype implementation of the proposed algorithm, DDR-coin. We used NetLogo 6.1.1 (made by Northwestern University, IL, USA) [22] for the simulation. NetLogo is one of the most widely used agent-based simulation tools. It can be used for a wide range of topics, such as epidemic protocols, fractals, and topics in the social sciences [22]. In NetLogo, the Logo programming language is used for modeling. The source code for the prototype implementation is available at [23].
In NetLogo, simulations are conducted with discrete time steps called ticks. In the simulation of DDR-coin, a trigger is generated at each tick of the simulation. Each node is represented as an agent in the simulation. Each node (or agent) executes the algorithms in Section 2.4 at each tick. We assume that a message sent by a node arrives at the destination node instantaneously. It is also assumed that the order of messages sent from one node to another node during simulation is preserved. However, the order of messages sent from multiple nodes to different nodes may change. The message delay and loss will be handled in future work.
Our simulation code of DDR-coin has a main loop that runs repeatedly. In this main loop, a trigger is invoked and a node is randomly selected to get this trigger. After receiving this trigger, the node uses the algorithms in Section 2.4 to handle it: a coin message is generated with a predefined probability and then sent to another node as described in Section 2.4.
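The main loop described above can be approximated by a much smaller Python sketch that keeps only the round logic. This is not the NetLogo prototype: it abstracts away the tree-like structure and all messages, assumes one trigger per tick, and treats a round as finished as soon as $n - \kappa\sqrt{n}$ coins have been generated (the predistributed coins supply the rest). The function name `simulate_rounds` and its return value are hypothetical.

```python
import math
import random


def simulate_rounds(n: int, w: int, kappa: float, rng: random.Random):
    """Run one simplified DDR-coin execution; return (rounds, alarmed)."""
    pre = round(kappa * math.sqrt(n))      # predistributed coins per round
    if pre >= n:
        raise ValueError("the sketch assumes kappa * sqrt(n) < n")
    remaining, rounds = w, 0
    while remaining > n:
        rounds += 1
        p = n / remaining                  # coin probability n / w_i
        coins = detected = 0
        while coins < n - pre:
            if detected == remaining:      # w_i triggers seen but the root never filled
                return rounds, False
            detected += 1                  # one trigger per tick
            if rng.random() < p:
                coins += 1
        remaining -= detected              # w_{i+1} = w_i - \hat{w}_i
    return rounds + 1, True                # final round counts the last <= n triggers


rounds, alarmed = simulate_rounds(n=64, w=10_000, kappa=5, rng=random.Random(0))
print(rounds, alarmed)
```

Averaging the returned round count over repeated runs with different seeds gives a rough counterpart to the measurements reported in Section 4.2, without the message-level detail of the prototype.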
In the setup procedure of the simulation, a k-ary tree-like structure is constructed. The number of nodes n is defined as $k^L$, where k and L can be selected by the user. Figure 6 shows a simulation screenshot for k = 2, $n = 2^4$, $\kappa = 2$, and w = 10,000.

Comparison of Simulation Results with Mathematical Analysis
In the simulation of DDR-coin, we conducted experiments for various number of nodes while fixing κ and the number of triggers: w = 10, 000 and κ = 5, which implies the failure probability is 4.472 * 10 −5 (n = 64) ∼ 7.453 * 10 −6 (n = 2048). In our experiments, we used 3 values for k in k-ary tree-like structure: 2, 3, and 5. The number of nodes was 2 i 1 , 3 i 2 , and 5 i 3 , where 6 ≤ i 1 ≤ 11, 4 ≤ i 2 ≤ 5, and 3 ≤ i 3 ≤ 4. We repeated 30 times to get the average value of message complexity, the number of rounds, and MaxRcvLoad.
Number of rounds: Figure 7 compares the number of rounds measured in simulation with the number calculated from the analysis in Section 3. In this figure, we chose k = 2, κ = 5, w = 10,000, and n = 2^6 to 2^11. The x-axis corresponds to the number of nodes and the y-axis to the number of rounds. The solid line represents the measured values and the dotted line represents the analysis results. Recall that the number of rounds analyzed in Section 3 is O(log_n(w/n)). Among the functions in O(log_n(w/n)), we chose 10 · log_n(w/n), whose outputs are close to the numbers measured in simulation. Similarly, Figure 8 shows the comparison for the number of rounds where k = 3 or 5, w = 10,000, κ = 5, and n = 3^4, 3^5, 5^3, 5^4. As shown in Figures 7 and 8, the number of rounds from the simulation and that from the analysis of Section 3 are close to each other.
Message complexity: Figure 9 compares the message complexity measured in simulation with the one calculated from the analysis in Section 3. In this figure, we chose k = 2, κ = 5, w = 10,000, and n = 2^6 to 2^11. The x-axis corresponds to the number of nodes and the y-axis to message complexity. The solid line represents the measured values and the dotted line represents the analysis results. Recall that the message complexity analyzed in Section 3 is O(n log_n(w/n)). Among the functions in O(n log_n(w/n)), we chose 55 · n log_n(w/n), whose outputs are close to the numbers measured in simulation. Similarly, Figure 10 shows the comparison for message complexity where k = 3 or 5, w = 10,000, κ = 5, and n = 3^4, 3^5, 5^3, 5^4 (we chose 35 · n log_n(w/n) for the dotted line). As shown in Figures 9 and 10, the message complexity from the simulation and that from the analysis of Section 3 are close to each other.
Figure 9. Message complexity: the measured numbers from simulations and the analysis results from Section 3 when n = 2^6 to 2^11, w = 10,000. The solid line represents message complexity measured from simulation. The dotted line (analysis results) represents the function 55 · n log_n(w/n).
MaxRcvLoad: Figure 11 compares the MaxRcvLoad measured in simulation with the one calculated from the analysis in Section 3. In this figure, we chose k = 2, κ = 5, w = 10,000, and n = 2^6 to 2^11. The x-axis corresponds to the number of nodes and the y-axis to MaxRcvLoad. The solid line represents the measured values and the dotted line represents the analysis results. Recall that the MaxRcvLoad analyzed in Section 3 is O(log_n(w/n)). Among the functions in O(log_n(w/n)), we chose 100 · log_n(w/n), whose outputs are close to the numbers measured in simulation. Similarly, Figure 12 shows the comparison for MaxRcvLoad where k = 3 or 5, w = 10,000, κ = 5, and n = 3^4, 3^5, 5^3, 5^4. As shown in Figures 11 and 12, the MaxRcvLoad from the simulation and that from the analysis of Section 3 are close to each other.
From the measured message complexity, the average number of exchanged messages for each node is between 35 · log_n(w/n) and 55 · log_n(w/n). Compared with this, the measured MaxRcvLoad, between 100 · log_n(w/n) and 130 · log_n(w/n), is only about 2 to 4 times larger, which implies that (roughly speaking) the message load is evenly distributed among nodes.
Figure 11. MaxRcvLoad: the measured numbers from simulations and the analysis results from Section 3 when n = 2^6 to 2^11, w = 10,000. The solid line represents MaxRcvLoad measured from simulation. The dotted line (analysis results) represents the function 100 · log_n(w/n).
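The fitted curves quoted in this subsection can be tabulated with a short sketch such as the one below. The constants 10, 55, and 100 are the ones reported for k = 2; the function names are ours, and the code only evaluates the fitted formulas. It does not reproduce the simulation itself.

```python
import math

W = 10_000  # number of triggers used in the experiments


def log_n(n, w=W):
    """Logarithm of (w/n) to base n, i.e. log_n(w/n)."""
    return math.log(w / n) / math.log(n)


def fitted_rounds(n, c=10):
    """Fitted curve for the number of rounds: c * log_n(w/n)."""
    return c * log_n(n)


def fitted_messages(n, c=55):
    """Fitted curve for message complexity: c * n * log_n(w/n)."""
    return c * n * log_n(n)


def fitted_max_rcv_load(n, c=100):
    """Fitted curve for MaxRcvLoad: c * log_n(w/n)."""
    return c * log_n(n)


if __name__ == "__main__":
    for i in range(6, 12):
        n = 2 ** i
        per_node = fitted_messages(n) / n  # average messages per node
        print(n, round(fitted_rounds(n), 2), round(fitted_messages(n)),
              round(fitted_max_rcv_load(n), 1),
              round(fitted_max_rcv_load(n) / per_node, 2))
```

With these constants, the ratio between the fitted MaxRcvLoad and the fitted per-node average is 100/55 for every n, which is the sense in which the load is described as roughly even.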

Comparison with Previous Work
In this section, we compare the simulation results of DDR-coin with those of previous work. Among the previous schemes, we chose CoinRand [2], TreeFill [17], and RingRand [2], which show the best performance in terms of message complexity and MaxRcvLoad. In the simulation, we set the parameters as follows: w = 10,000, n = 2^i (5 ≤ i ≤ 11), and κ = 5, which implies that the failure probability of DDR-coin ranges from 1.178 × 10^−4 (n = 32) to 7.453 × 10^−6 (n = 2048). Figure 13 shows the number of rounds measured in simulations of TreeFill, DDR-coin, CoinRand, and RingRand. For n ≤ 64, DDR-coin has the largest number of rounds since such a small n violates our assumption, κ ≪ √n. Outside this region, the number of rounds in DDR-coin is significantly smaller than that of CoinRand: CoinRand requires 2.3 to 7.3 times as many rounds as DDR-coin. We think that this is because DDR-coin uses a complex tree-like structure and probabilistic algorithms, both of which reduce the number of rounds.
In Figure 13, if n ≥ 64, the number of rounds in TreeFill is about 0.85 to 2 times that of DDR-coin. If the number of nodes is relatively small (i.e., less than about 90), TreeFill has a smaller number of rounds than DDR-coin. As the number of nodes increases, DDR-coin uses fewer rounds than TreeFill.
Note that in RingRand, the number of rounds is O(log w) for all n [2]. In the simulation results, the measured number of rounds is about 14 to 15, which fits well with log_2(10000) ≈ 13.3. For n > 256, we were unable to conduct experiments on RingRand due to the rapid increase in message complexity.
Figure 14 shows the total number of messages used in TreeFill, DDR-coin, CoinRand, and RingRand. As shown in this figure, when n < 152, TreeFill uses the smallest number of messages among them. For n ≥ 152, DDR-coin uses the smallest number of messages. As the number of nodes increases, the differences in message complexity also increase. In particular, RingRand shows the fastest increase.
Comparing CoinRand and DDR-coin, if n < 64, DDR-coin uses more messages due to the violation of our assumption, κ ≪ √n. As the number of nodes increases, DDR-coin uses far fewer messages than CoinRand. When the number of nodes is 256, DDR-coin uses about 1/3 fewer messages than CoinRand. When the number of nodes is 512, CoinRand uses about 4 times as many messages as DDR-coin. CoinRand requires more messages because it uses more rounds (about 2.3 to 7.3 times as many) than DDR-coin.
TreeFill uses fewer messages than DDR-coin if n < 152. For large n, DDR-coin uses fewer messages than TreeFill. When the number of nodes is 512, DDR-coin uses 68.4% of the messages that TreeFill uses. From this, we think that TreeFill is better than DDR-coin when the number of nodes is not very large.
For 64 ≤ n ≤ 2048, the MaxRcvLoad of CoinRand is 1.42 to 3.37 times that of DDR-coin, and this difference increases as the number of nodes increases. MaxRcvLoad is affected by the number of rounds because it is the maximum number of messages received by any single node while the algorithm is running. As shown in Figure 13, CoinRand requires 2.3 to 7.3 times as many rounds as DDR-coin. DDR-coin uses more messages per round than CoinRand, but its number of rounds is smaller, which explains why the MaxRcvLoad of CoinRand is about 1.42 to 3.37 times that of DDR-coin.
TreeFill shows a smaller MaxRcvLoad than DDR-coin when n is less than about 180. However, as the number of nodes increases, DDR-coin uses fewer rounds than TreeFill, and thus its MaxRcvLoad also becomes smaller than that of TreeFill. As for RingRand, its MaxRcvLoad is much larger than that of the other algorithms. We think that this is partially because our implementation is not fully optimized. Aside from implementation inefficiencies, we expect the MaxRcvLoad of RingRand to be much higher than that of the other algorithms since the analytic result of MaxRcvLoad is O(n log n log w) [2], which is much higher than that of the other schemes.

Discussion
In this subsection, we discuss some issues concerning the DDR-coin algorithm: the no-message-drop assumption, the mean time to detect global changes, the relation between κ and message complexity, and a drawback of DDR-coin. The no-message-drop assumption is adopted from most of the previous work [2,17,18] to simplify the analysis. If a DTC algorithm is designed to allow message drops, its message complexity will be higher and it may have two-sided failures: even if fewer than w triggers are detected, it may produce a false alarm. One of the easiest ways to tolerate some message drops is to establish reliable communication, e.g., challenge-and-response and retransmission. Alternatively, we can send coin messages to multiple nodes for redundancy, which also incurs extra communication overhead. (We leave the enhancement of DDR-coin to allow message drops while minimizing communication load for future work.)
As for the mean time to detect global changes, all DTC algorithms inherently have some delay: when w triggers occur, they detect this only after some time. This is because all DTC algorithms have no false positives and focus on minimizing message complexity, MaxRcvLoad, and MaxMsgLoad. Trying to minimize this delay would cause additional communication overhead or a loss of accuracy. Therefore, this trade-off is another important research topic, which we also leave for future work.
κ affects the failure probability and message complexity. In the DDR-coin algorithm, as κ increases, message complexity also increases, as shown in Figure 16. (MaxRcvLoad has a similar property.) However, from a practical point of view, we do not need to use a large κ: if the number of nodes is not too small and we choose an appropriate κ (e.g., κ = 4, 5, 6), the failure probability is extremely low while message complexity is much lower than that of the previously known best algorithms [2,17], as shown in Section 4.3.
Compared to the previously known best algorithms [2,17], the DDR-coin algorithm has the disadvantage that if n is not significantly greater than κ^2 (e.g., κ = 5 and n ≤ 32 to 64), its message complexity and MaxRcvLoad are similar or even larger.
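To make the condition "n significantly greater than κ^2" concrete, the following sketch prints the ratio n/κ^2 for the node counts used in the comparison. The helper name margin is ours, and no particular threshold is implied by the analysis; the ratio is only meant as a rough indicator.

```python
def margin(n, kappa):
    """Ratio n / kappa**2; the analysis assumes this is much larger than 1."""
    return n / kappa ** 2


if __name__ == "__main__":
    kappa = 5
    for i in range(5, 12):
        n = 2 ** i
        print(f"n = {n:5d}  n / kappa^2 = {margin(n, kappa):.2f}")
```

For κ = 5, the ratio is below 3 for n = 32 and n = 64, the region where DDR-coin is reported to lose its advantage, and exceeds 80 for n = 2048.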

Related Work
DTC algorithms can be used as a building block for consistent global snapshots [3]. By using efficient DTC algorithms, the message complexity for storing global snapshots can be largely reduced compared with conventional global snapshot algorithms [12][13][14][15][16]. In conventional global snapshot algorithms, the message complexity of channel state recording is typically O(n^2). By using DTC algorithms, we can reduce the cost of channel state recording in global snapshots, where the message complexity is O(n log(w/n)) [3].
Garg et al. proposed three DTC algorithms and proved a lower bound of Ω(n log(w/n)) on the message complexity of general DTC algorithms [3]. One of their algorithms achieves the optimal message complexity, but it uses a centralized approach and the MaxRcvLoad of this DTC algorithm is not bounded.
Chakaravarthy et al. proposed a near-optimal DTC algorithm called LayeredRand [1]. The message complexity and MaxRcvLoad of this algorithm are O(n log n log w) and O(log n log w), respectively [1]. In [2], they proposed two DTC algorithms, CoinRand and RingRand, which can be considered improvements on [1]. The message complexity and MaxRcvLoad of CoinRand are O(n(log w + log n)) and O(log w + log n), respectively. This algorithm is based on a network topology similar to a binary tree, and it uses a randomized technique during the message-aggregation process. As a result, it shows better performance than their previous work, LayeredRand [1]. The message complexity and MaxRcvLoad of RingRand are O(n log n log w) and O(log n log w), respectively. Kim et al. proposed an optimal DTC algorithm [17]. The message complexity and MaxRcvLoad of their algorithm are O(n log(w/n)) and O(log(w/n)), respectively. This algorithm is also based on a network topology similar to a tree.
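To give a feel for how the growth terms quoted above differ, the sketch below evaluates them with all hidden constants dropped and base-2 logarithms (both are our assumptions, so the numbers are not performance predictions). The DDR-coin term n log_n(w/n) is the one stated for our algorithm; the function name growth_terms is ours.

```python
import math


def growth_terms(n, w):
    """Evaluate the asymptotic message-complexity terms with constants dropped."""
    lg = math.log2
    return {
        "LayeredRand/RingRand": n * lg(n) * lg(w),   # n log n log w
        "CoinRand": n * (lg(w) + lg(n)),             # n (log w + log n)
        "Kim et al.": n * lg(w / n),                 # n log(w/n)
        "DDR-coin": n * lg(w / n) / lg(n),           # n log_n(w/n)
    }


if __name__ == "__main__":
    for n in (64, 256, 1024):
        terms = growth_terms(n, 10_000)
        print(n, {name: round(value) for name, value in terms.items()})
```

Because the constants differ between algorithms (for example, the fitted constants of 35 to 55 for DDR-coin), this comparison only indicates the shape of the growth and not the crossover points observed in the experiments.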
Emek and Korman proposed DTC algorithms with more generalized assumptions on communications between nodes [18]. They proposed two DTC algorithms. The message complexity of one of them is O(n log w (log log n)^2), but the MaxRcvLoad of this algorithm is not analyzed. The message complexity and MaxRcvLoad of the other algorithm are O(n (log w log n)^2) and O((log w log n)^2), respectively.
Kshemkalyani proposed a hypercube-based algorithm for global snapshots [24]. The number of messages used in the hypercube-based algorithm is O(n log n), which is lower than the optimal message complexity of DTC algorithms, O(n log(w/n)). However, the message size in the hypercube-based algorithm is O(n), whereas that of DTC algorithms is O(1).
Tsai proved lower bounds on the message complexity of global snapshot algorithms based on the general grid interconnection network, which is a generalization of the hypercube-based network [25].
Chang et al. proposed a DTC algorithm for arbitrary network topologies [19]. Their algorithm mainly focuses on wireless sensor networks (WSNs) in which the network topology cannot be known in advance. In the worst case, their algorithm uses x(n · log((w − n)/(n^2 − n)) / log(n/(n − 1)) + n^2 − 1) messages to solve the DTC problem, where x is twice the number of edges in the WSN.

Conclusions
In this paper, we proposed an efficient probabilistic Distributed Trigger Counting (DTC) algorithm, DDR-coin (Deterministic Detection of Randomly generated coins). Even though DDR-coin has a negligible (one-sided) failure probability, the number of messages exchanged to detect w triggers is lower than that of optimal deterministic DTC algorithms: the message complexity of DDR-coin is O(n log_n(w/n)) on average and the MaxRcvLoad of DDR-coin is O(log_n(w/n)) on average. We implemented a prototype of DDR-coin using NetLogo 6.1.1 and measured its message complexity and MaxRcvLoad to compare them with the analytic results; the analytic results are close to the measured data. We also implemented CoinRand, RingRand, and TreeFill using NetLogo 6.1.1 for comparison. Experimental results show that DDR-coin achieves the best performance in most cases. When the number of nodes is small, TreeFill is better than DDR-coin. In our experiments, the message complexity and MaxRcvLoad of RingRand are greater than those of the other algorithms. Our algorithm can be useful for taking global snapshots in large-scale distributed systems and for detecting significant events in distributed systems with sensors. Future work includes a precise analysis of the number of overflow-coin messages and the implementation of library packages to cope with diverse real-life issues (including node failure, message delay or loss, and limitations on network topology).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results. Also, this work is the result of personal research and not related to any product of Coupang.

Appendix A
In this appendix we show a detailed example of Rounds 1 and 2 of our DDR-coin algorithm.
- (Section 2.4.1) Suppose that all nodes have detected 9 triggers, which implies that a coin is generated and sent to level-1, e.g., node-5.
- (Section 2.4.1) Suppose that all nodes have detected 9 triggers, which implies that a coin is generated and sent to level-1, e.g., node-8. Now, node-8's array, d_8.cns[], is full, so a full-coin is sent to node-4 (Section 2.4.2).
- (Section 2.4.1) Suppose that all nodes have detected 9 triggers, which implies that a coin is generated and sent to level-1, e.g., node-2. Since node-2 is already full, an overflow-coin is sent to node-4 (Section 2.4.2). This coin is forwarded to node-5. After receiving this coin, node-5's array, u_5.cns[], is now full. Hence, a full-coin is sent to the root, node-4.
- (Section 2.4.1) Note that in Round 2, for each detected trigger, a coin is generated with probability n/w_2 = 9/54 = 1/6.
- (Section 2.4.1) Suppose that all nodes have detected 6 triggers, which implies that a coin is generated and sent to level-1, e.g., node-5.
- (Section 2.4.1) Suppose that all nodes have detected 6 triggers, which implies that a coin is generated and sent to level-1, e.g., node-8. Now, node-8's array, d_8.cns[], is full, so a full-coin is sent to node-4 (Section 2.4.2).
- (Section 2.4.1) Suppose that all nodes have detected 6 triggers, which implies that a coin is generated and sent to level-1, e.g., node-2. As node-2 is already full, an overflow-coin is sent to node-4 (Section 2.4.2). This coin is forwarded to node-5. After receiving this coin, node-5's array, u_5.cns[], is now full. Therefore, a full-coin is sent to the root, node-4.
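The Round 2 arithmetic in this example can be checked with the sketch below. It only encodes the per-trigger coin probability n/w_2 stated above; the function names are ours, and the parameter w_r simply stands for the per-round quantity that the example writes as w_2 = 54.

```python
from fractions import Fraction


def coin_probability(n, w_r):
    """Per-trigger probability of generating a coin in a round: n / w_r."""
    return Fraction(n, w_r)


def expected_coins(triggers, n, w_r):
    """Expected number of coins a node generates after detecting `triggers` triggers."""
    return triggers * coin_probability(n, w_r)


if __name__ == "__main__":
    print(coin_probability(9, 54))    # per-trigger probability in Round 2
    print(expected_coins(6, 9, 54))   # expected coins after 6 triggers
```

With n = 9 and w_2 = 54, a node that detects 6 triggers generates one coin in expectation, which is the situation the Round 2 steps assume.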

Appendix B
In this appendix, we show that when the number of nodes is n, the average number of overflow-coin messages sent in each round is O(n).
Lemma A1. When a node receives a coin/an overflow coin from the lower level, the probability of sending an overflow-coin to the upper level is less than 1/2.

Proof. (sketch)
Recall that on the k-ary tree-like structure, the average number of coins arriving at the level-(h-1) nodes is n.
First, we show that the probability that an overflow-coin message occurs when a coin arrives at a node at level-(h-1) is less than 1/2. There are n/k nodes in level-(h-1), and each node has an array of size k. When a coin arrives at a node, an overflow-coin occurs if the array is already full (i.e., all entries are true). The overflow-coin goes up to the upper level and is eventually put in an empty (= false) entry in the array of another node at level-(h-1). Because the node is randomly selected when going up, this overflow-coin enters a randomly selected one of the empty entries in the arrays of the level-(h-1) nodes.
For the same n, if the value of k in the k-ary tree-like structure decreases, the probability of generating an overflow-coin message increases. For example, in the extreme case k = n, there is one node with an array of size n, so no overflow occurs at all. If k is reduced to 3, the number of nodes is n/3 and the array size of each node is 3; once 3 coins have arrived at one node, it is full, and an overflow occurs when another coin arrives. If k is further reduced to 2, the probability increases.
Consider the case where k = 1 (even though we cannot build the tree-like structure and this does not exactly match our scheme, we can still calculate the probability of an overflow because we only focus on the bottom level). If we show that the expected value of the probability of an overflow is less than 1/2 in this case, then we can see that this probability is also less than 1/2 for all k > 1.
Assume k = 1. There are n nodes at level-(h-1). Because k = 1, the array size is 1, and when a coin arrives at a node, the array becomes full. When a node receives two or more coin messages, an overflow occurs and the coins (except for the first one) are delivered to other nodes.
When the first coin arrives at a node, the probability of an overflow is 0. When the second coin arrives, the probability of an overflow is 1/n, because an overflow occurs only if the coin happens to go to the node containing the first coin; the overflow-coin then goes to a randomly selected node that is not yet full. When the third coin arrives, the probability of an overflow is 2/n (the coin enters one of the two nodes holding the preceding two coins), and the overflow-coin again goes to another node that is not yet full. Therefore, averaged over the n arriving coins, the expected value of the probability of an overflow is (1/n) · [(0 * 1 + 1 * 0) + ((1/n) * 1 + ((n − 1)/n) * 0) + ((2/n) * 1 + ((n − 2)/n) * 0) + . . . + (((n − 1)/n) * 1 + (1/n) * 0)] = (1/2)(n − 1)/n < 1/2. Therefore, when k > 1, the probability of an overflow-coin occurring at level-(h-1) is less than 1/2.
Now, when an overflow-coin arrives at a node of level-(h-2), we show that the probability that the overflow-coin will be forwarded to the upper level is also less than 1/2. There are n/k^2 nodes in level-(h-2), and the size of the array of each node is k. One entry in the array becomes true when a full-coin arrives. A full-coin means that the corresponding subtree is full (the arrays of all level-(h-1) nodes of the subtree are full with true). As mentioned above, since subtrees are randomly filled, a full-coin arrives randomly at one of the level-(h-2) nodes. (The total number of full-coins arriving is exactly n/k.) When an overflow-coin arrives at a node, the coin is forwarded to the upper level if the array is full (with true). For the same n, the probability of forwarding the overflow-coin increases as k becomes smaller, just as at level-(h-1); that is, if the value of k increases, the size of each subtree grows and the probability that k subtrees are full decreases.
For ease of analysis, we assume that k overflow-coins occur together and are processed together (actually, they are generated and processed one by one; analyzing each overflow-coin individually would give a complete proof, which we leave as future work). The k overflow-coins arrive together at a randomly selected node of level-(h-2), and if that node is full, they are eventually forwarded together to another node at level-(h-2) that is not yet full; one entry in the array of that node is then filled with the value true.
Consider the case when k = 2. The number of nodes at level-(h-2) is n/4, and the array size of each node is 2. The number of overflow-coins arriving at level-(h-2) is at most n * (1/2). Therefore, the number of 2-overflow-coins is n/4, so this is exactly the same as the case of level-(h-1) when k = 2 and the number of nodes is n/2. Thus, the probability of forwarding an overflow-coin is less than 1/2 for k = 2, and when k > 2, the probability that an overflow-coin is forwarded is also less than 1/2.
Similarly, it can be analyzed in the same way at all level-j (1 ≤ j ≤ h − 1) and the probability of forwarding an overflow-coin to the upper level is less than 1/2.
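The k = 1 case used in this proof sketch can be checked numerically. The sketch below computes the average overflow probability (n − 1)/(2n) exactly and compares it with a small Monte Carlo simulation of the simplified bottom-level process (each coin lands on a uniformly random node; on overflow it is redirected to a random node that is not yet full). The function names and the simulation are ours, and they model only this simplified process, not the full DDR-coin algorithm.

```python
import random
from fractions import Fraction


def average_overflow_probability(n):
    """Average, over n arriving coins, of the chance i/n that the coin hits an occupied node."""
    return sum(Fraction(i, n) for i in range(n)) / n


def simulate(n, trials=2000, seed=0):
    """Estimate the fraction of coins that overflow when k = 1 and n coins arrive."""
    rng = random.Random(seed)
    overflows = 0
    for _ in range(trials):
        occupied = [False] * n
        for _ in range(n):
            target = rng.randrange(n)
            if occupied[target]:
                overflows += 1
                # The overflow-coin is redirected to a random node that is not yet full.
                free = [j for j, full in enumerate(occupied) if not full]
                target = rng.choice(free)
            occupied[target] = True
    return overflows / (trials * n)


if __name__ == "__main__":
    n = 16
    print(float(average_overflow_probability(n)), simulate(n))
```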
From Lemma A1, we can prove the following theorem on the average number of overflow-coin messages.
Theorem A1. When the number of nodes is n, the average number of overflow-coin messages in each round is O(n).
Proof. Let F be the random variable which represents the number of overflow-coin messages when an overflow-coin is initially forwarded from a node in level-(h-1), repeatedly forwarded up to a node in level-(h − m), and then eventually sent to a node in level-(h-1).
From Lemma A1, when a coin is forwarded from level-(h-1) to level-(h − m) and to level-(h-1) again, the number of overflow-coin messages is 2m and the probability of this case is Pr(F = 2m) < 1/2^m.
We can bound E(F) as follows: E(F) = Σ_{m ≥ 1} 2m · Pr(F = 2m) < Σ_{m ≥ 1} 2m/2^m = 4. Because the nodes at level-(h-1) receive n coins (from level-h), nF is the number of coin forwardings when n coins are sent to nodes at level-(h-1), and E(nF) < 4n. This implies that the average number of overflow-coins for each round is O(n).
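The constant 4 in the bound E(nF) < 4n comes from the series Σ 2m/2^m over m ≥ 1, using Pr(F = 2m) < 1/2^m. The following sketch evaluates its partial sums exactly; the function name forwarding_bound is ours.

```python
from fractions import Fraction


def forwarding_bound(max_m):
    """Partial sum of 2m / 2**m for m = 1..max_m, an upper bound on the truncated E(F)."""
    return sum(Fraction(2 * m, 2 ** m) for m in range(1, max_m + 1))


if __name__ == "__main__":
    for max_m in (1, 2, 3, 10, 50):
        print(max_m, float(forwarding_bound(max_m)))
```

The partial sums increase towards 4 without reaching it, which is consistent with E(F) < 4 and hence E(nF) < 4n.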