Adaptive Connectivity Restoration from Node Failure(s) in Wireless Sensor Networks

Recently, there has been growing interest in the applications of wireless sensor networks (WSNs). In some applications, such as battlefield reconnaissance, a set of sensor nodes is deployed to collectively survey an area of interest and/or perform specific surveillance tasks. Due to the harsh deployment environments and limited energy supply, nodes may fail, which impacts the connectivity of the whole network. Since the failure of a single cut-vertex destroys the connectivity and divides the network into disjoint blocks, most existing studies focus on the problem of single node failure. However, the failure of multiple nodes would be a disaster for the whole network and must be repaired effectively. Only a few studies have been proposed to handle multiple cut-vertex failures, which are a special case of multiple node failures. Therefore, this paper proposes a comprehensive solution to address the problems of node failure (single and multiple). The Collaborative Single Node Failure Restoration algorithm (CSFR) is presented to solve the problem of single node failure with cooperative communication only, whereas CSFR-M, the extension of CSFR, handles the single node failure problem more effectively by adding node motion. Moreover, the Collaborative Connectivity Restoration Algorithm (CCRA) is proposed on the basis of cooperative communication and node maneuverability to restore network connectivity after multiple nodes fail. CSFR-M and CCRA are reactive methods that initiate the connectivity restoration after detecting the node failure(s). In order to further minimize the energy dissipation, CCRA simplifies the recovery process by gridding. Moreover, the distance that an individual node needs to travel during recovery is reduced by choosing the nearest suitable candidates. Finally, extensive simulations validate the performance of CSFR, CSFR-M and CCRA.


Introduction
Numerous applications of wireless sensor networks have led to much research work recently [1]. For some applications, such as urban search and rescue, space exploration, battlefield surveillance, and forest fire detection and containment, it is expected that a set of mobile sensor nodes will be employed to monitor the area of interest collaboratively. The unattended operation of these sensors in harsh environments avoids the risk to human life and decreases the cost of the applications. Normally, sensors in these applications are battery-operated, with limited energy, processing and communication capabilities. After deployment, the sensors are envisioned to form a network through self-organization so that they can communicate with each other and deliver the sensed data to the sink node. To ensure such interactions, nodes need to stay reachable from each other, so connectivity becomes the most basic requirement of the network.
Nevertheless, nodes are prone to failure due to the harsh deployment environments and limited energy supply. The loss of node(s) can break the communication paths and divide the network. Connectivity can be restored through inward movement of nodes when the network is partitioned, and this has been deemed an efficient strategy in most of the recent studies. For instance, DARA [6] pursues a coordinated relocation to restore the broken communication paths among the neighbors of the failed node. With the information of two-hop neighbors, DARA can identify whether a node is a cut-vertex or not. Once a failure happens, DARA selects the best candidate (BS) from the neighbors of the failed node according to their degree, distance, and node ID, and then moves the BS to the failed node's position to restore the connectivity. If any neighbor of the BS cannot communicate with the BS after its movement, the neighbors of the BS initiate the same process, which results in a cascaded movement. Moreover, Akkaya et al. [7] proposed an algorithm named PDARA, which forms a connected dominating set (CDS). PDARA informs a particular node in advance whether a partition will occur in the case that it fails (i.e., cut-vertex identification). Once a cut-vertex fails, a failure handler (FH) of the cut-vertex initiates the recovery process. The FH finds the closest node that is dominated by the failed node and uses it as a replacement for the failed node. The overall goal is to localize the scope of the restoration and minimize the distance of movement. In order to minimize the message overhead, DCR [9] identifies the critical nodes with one-hop information and designates backups for them. The replacement is similar to that of DARA.
As mentioned, all three of these algorithms focus on the single node (cut-vertex only) failure problem and relocate the designated backup to the position of the failed node. Unlike DARA and similar schemes, another notable work on connectivity restoration, named RIM, was proposed by Younis et al. [10]. All the 1-hop neighbors move towards the position of the failed node until the distance is $R_c/2$, where $R_c$ is the uniform communication range of all the sensor nodes. Other nodes carry out a cascaded inward movement if they cannot communicate with the moved nodes. Although the distance that an individual node has to travel is small, the number of nodes involved in RIM may be huge when the area is deployed with a large number of nodes. Recently, a novel method named LTRA was proposed by Zhang et al. [12]. LTRA focuses not only on critical nodes but also on the nodes whose failure may increase the number of critical nodes. However, all of these methods deal only with the problem of single node failure.
Certainly, there are some studies focused on the problems of multi-node failure. Akkaya et al. [7] proposed MDAPRA, a modified version of PDARA, which strives to reposition the nodes to recover from multiple cut-vertex failures via a mutual exclusion mechanism. In the case that two or more cut-vertices fail around the same time, the FH of each failed cut-vertex will initiate the recovery process, and they may compete to use the same nodes for repositioning. In MDAPRA, every cut-vertex has two failure handlers, a primary FH (PFH) and a secondary FH (SFH). When the PFH cannot find a suitable candidate that has not been reserved and gets stuck, the SFH will take over the recovery process after a certain time period $\tau$, where $\tau > r_s + 2(p + t)(n - 2)$. Like MDAPRA, the RAM [9] algorithm, which is based on DCR, imposes additional constraints while choosing backups for critical nodes and avoids creating another network partition during the recovery. The failure of multiple cut-vertices is only one of the many situations covered by the multiple node failure problem, so these algorithms are not suitable for all cases of multiple node failure.
Moreover, Joshi [14] and Lee [15] focused on the recovery from network partition after multiple node failures. Joshi et al. [14] proposed an autonomous repair algorithm (AuR) to restore network connectivity. AuR is based on the principle that the connectivity between neighboring nodes is modeled as a modified electrostatic interaction based on Coulomb's law. Self-spreading is proposed based on the rationale of electrostatic attraction and repulsion in AuR. The connectivity is re-established through self-spreading and motion towards the center of the area. DORMS [15] strives to re-connect the blocks after failures of multiple nodes with a subset of surviving relay nodes in each block. The relays are populated along the shortest path from every block towards the center. Then, the paths are simplified by utilizing the principle of the Steiner Minimum Tree (SMT), so that the number of relays involved in the recovery is minimal. Since multiple node failures may not always cause a network partition and these two algorithms solve only one situation of multi-node failure, they may not be suitable for handling all cases.
Similar to DORMS, Truong et al. [16] solve the connectivity restoration problem with route planning algorithms. However, SCP [16] is proposed in consideration of obstacles on the accessibility paths of the potential locations. With the knowledge of the connectivity graph $G_C$ and the mobility graph $G_M$, SCP finds the minimal number of required relays to connect all the terminals based on the Steiner-SMT algorithm, and then it finds the cheapest circuit for the agent to visit all those nodes (including terminals and Steiner nodes). For comparison, Truong et al. present the integrated path algorithm (IP). The IP approach attempts to unite the two objectives, the number of nodes required and the mobility cost, to achieve a better performance in some scenarios. SCP and IP aim to gather information from all of the terminals regularly by a mobile agent that follows the designed path circuit. Thus, the gathering period could be too long for the network and affect the real-time performance.

Network Model
Traditionally, sensor nodes in a WSN communicate with each other under the disk model, i.e., the sensing region and communication region of each sensor are circles centered on the sensor with sensing radius $R_s$ and communication radius $R_c$, respectively. Sensor nodes are equipped with a single antenna, and two nodes $i$ and $j$ can communicate with each other only if the Euclidean distance between them does not exceed the communication radius, i.e., $d_{ij} \le R_c$.
Nevertheless, the cooperative communication (CC) technology that has been proposed recently allows single-antenna devices to take advantage of the benefits of a MIMO system [22]. Generally, the link from node A to node B is available if and only if the received average signal-to-noise ratio (SNR) of node B from node A is not less than a fixed threshold $\tau$, i.e., $SNR \ge \tau$, and vice versa. Node A and node B are connected if and only if the link from A to B and the link from B to A are both available. Under this assumption, we adopt the cooperative communication model of [18]. Node $i$ can communicate with node $j$ if and only if the received average SNR of node $j$ from node $i$ ($\beta_{ij}$) is not less than $\tau$, i.e., $\beta_{ij} \ge \tau$. Assume that node $i$ works with the power $P_i$ to communicate with node $j$ and that $P_0$ is the maximum transmission power of all the nodes. $h_{ij}$, which is generated by a Rayleigh distribution, is the channel coefficient from node $i$ to node $j$ and $E[|h_{ij}|^2]$ is the channel gain; the distance between node $i$ and node $j$ is denoted by $d_{ij}$ and $\partial$ is the path loss exponent; and $N$ is the noise power. In this case, we consider node $i$ as the source node and node $j$ as the destination node, and vice versa.
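The availability condition can be written out explicitly. The following is only a sketch that assumes the usual distance-based path-loss form of the average SNR, built from the symbols defined in this section; the exact expression should be checked against the cooperative communication model of [18].

```latex
% Assumed path-loss form of the average SNR on the link i -> j
\beta_{ij} \;=\; \frac{P_i \, E\!\left[|h_{ij}|^{2}\right]}{N \, d_{ij}^{\partial}} \;\ge\; \tau ,
\qquad P_i \le P_0 .
```

Under this form, a longer link or a larger path loss exponent $\partial$ requires more transmission power to reach the same threshold $\tau$.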
With the cooperative communication technology, when multiple nodes send the same packet to the same destination simultaneously, the destination node (node $j$) receives multiple signals at the same time. To decode the signals, the well-known maximal-ratio combining (MRC) method, which is designed for diversity combining in wireless communication and is used in CC studies [23], is utilized. The total SNR that node $j$ receives at the output of the MRC combiner can be described as the sum of the received average SNRs, $\beta_j = \sum_{i \in \phi} \beta_{ij}$, where $\phi$ denotes the set of all the nodes that send the same packet. Therefore, the total SNR that node $j$ receives must satisfy $\beta_j \ge \tau$ with $P_i \le P_0$ for every $i \in \phi$. In the cooperative communication model, node $i$ and node $j$ are considered connected when the received SNR $\beta_j$ of node $j$ is not less than $\tau$, and vice versa. In some cases, node $i$ may receive insufficient SNR from node $j$, so that node $j$ cannot establish the cooperative communication link to node $i$. The communication path between nodes $i$ and $j$ is then unidirectional, a case that is not considered in this paper. It is assumed that nodes $i$ and $j$ can communicate with each other directly once the cooperative communication link is established.
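The MRC condition described above can be illustrated with a short Python sketch. The per-link SNR formula (`power * gain / (noise * distance ** exponent)`) is an assumed path-loss model, and the function names and default parameters are hypothetical; only the "sum of per-link SNRs must reach the threshold" rule comes from the text.

```python
def link_snr(power, distance, path_loss_exp=2.0, channel_gain=1.0, noise=1.0):
    """Average received SNR of one transmitter (assumed path-loss model)."""
    return power * channel_gain / (noise * distance ** path_loss_exp)


def mrc_snr(powers, distances, **model):
    """Total SNR at the MRC output: the sum of the per-link average SNRs."""
    return sum(link_snr(p, d, **model) for p, d in zip(powers, distances))


def cc_link_available(powers, distances, tau, p_max, **model):
    """True if no node exceeds the power cap and the combined SNR reaches tau."""
    if any(p > p_max for p in powers):
        return False
    return mrc_snr(powers, distances, **model) >= tau


# Two cooperating senders, each 2 distance units from the destination.
print(cc_link_available([4.0, 4.0], [2.0, 2.0], tau=1.5, p_max=4.0))
```

A single sender at that distance yields an SNR of 1.0 under these assumed parameters, so the threshold of 1.5 is only reached when both nodes transmit together.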
Assume all the sensors are deployed randomly in the designated area with the uniform transmission power $P_0$; a two-dimensional graph $G = (V, E)$ is then formed by self-organization. $V = (V_1, V_2, \ldots, V_n)$ is the set of sensor nodes, and the set $E$ contains the links between every two nodes that can communicate directly. $V(G)$ and $E(G)$ are the vertex-set and edge-set of graph $G$, respectively. The network is initially constructed based on the disk model. The set of nodes that node $i$ can communicate with directly is denoted as $N(i)$. In other words, $N(i)$ represents all the 1-hop neighbors of node $i$.
Definition 5 (Node Connectivity Materiality). The node connectivity materiality of node $i$ ($M_i$) is the ratio of the sum of the shortest path hops between the nodes in $N(i)$ after node $i$ fails to the same sum before it fails, i.e., $M_i = \frac{L_i}{L_0} = \frac{\sum_{j,k \in \psi;\, j \ne k} Hn^{*}_{jk}}{\sum_{j,k \in \psi;\, j \ne k} Hn_{jk}}$, where $L_i$ is the sum of the shortest path hops when $i$ has failed, $L_0$ is the sum of the shortest path hops in the initial graph, $Hn_{jk}$ and $Hn^{*}_{jk}$ are the shortest path hops from $j$ to $k$ before and after node $i$ fails, and $\psi$ is the set of neighbor nodes of node $i$. Let $Hn^{*}_{jk} = \infty$ when two nodes cannot communicate with each other due to the failure.
Definition 6 (Node Partition Character). The node partition character of node $i$ ($PC_i$) is the number of disconnected blocks after the failure of node $i$. Let $PC_i = 0$ when the node connectivity materiality $M_i \ne \infty$; otherwise, $PC_i \ge 2$.
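The two definitions can be made concrete with a small Python sketch based on breadth-first search. The function names, the adjacency-set representation, and the choice to return 1.0 for a node with fewer than two neighbors are assumptions made for illustration; the graph is assumed to be connected before the failure.

```python
import math
from collections import deque
from itertools import combinations


def disk_graph(positions, r_c):
    """Adjacency sets under the disk model: i and j are linked iff d_ij <= r_c."""
    adj = {v: set() for v in positions}
    for u, v in combinations(positions, 2):
        if math.dist(positions[u], positions[v]) <= r_c:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def hops_from(adj, src, removed=None):
    """BFS hop counts from src, skipping the node given as `removed`."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def materiality(adj, i):
    """M_i = L_i / L_0 over the pairs of neighbors of node i."""
    l0 = li = 0
    for j, k in combinations(adj[i], 2):
        l0 += hops_from(adj, j)[k]
        after = hops_from(adj, j, removed=i)
        if k not in after:
            return math.inf  # some neighbors can no longer reach each other
        li += after[k]
    return li / l0 if l0 else 1.0  # assumption: no neighbor pair means no change


def partition_character(adj, i):
    """PC_i: 0 if the rest of the network stays connected, else the block count."""
    unseen = set(adj) - {i}
    blocks = 0
    while unseen:
        start = next(iter(unseen))
        unseen -= set(hops_from(adj, start, removed=i))
        blocks += 1
    return blocks if blocks >= 2 else 0
```

Summing over unordered pairs instead of ordered pairs does not change the ratio, because every pair appears the same number of times in the numerator and the denominator.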
Assume every node knows its location and each node has a unique ID. All the nodes in the WSN exchange their location information and node IDs during the self-organization phase. Once the network topology is formed, each node keeps a neighbor table, which contains the location information and IDs of all the 2-hop neighbor nodes, together with their connectivity materiality and partition character.

Problem Formulation
Generally, sensors in a WSN may be deployed in a harsh environment and form the network by self-organization. Sensor nodes are prone to failure in hostile environments because of battery exhaustion, hardware faults, etc. The network connectivity, which is the most basic requirement for guaranteeing the availability of the network, will be destroyed by such node failure(s). Most of the previous studies are devoted to solving the problem of single cut-vertex failure, whereas this paper discusses the problems of single-node (whether it is a cut-vertex or not) and multi-node failures.
As discussed previously, the failure of a cut-vertex will destroy the network connectivity since it divides the network into disconnected regions, as shown in Figure 1a. If node A4 fails for some reason, the network will be divided into two blocks, and nodes in different blocks cannot communicate with each other. In this case, the network can no longer operate as a whole since its connectivity is destroyed. Because nodes A1 and A2 are out of the communication range of nodes A5 and A6, they cannot establish a communication path with each other under the normal communication model (i.e., the disk model). However, they can re-establish the communication path via cooperative communication. For instance, node A2 may use its neighbors (nodes A1, A3, B4 and B5) as helper nodes to build the connecting path to node A6. In addition, node A6 will recover the connectivity with node A2 in the same way.
Nevertheless, the failed node may not be a cut-vertex (Case 2 for short), as shown in Figure 1a,b. The failure of node A6 will not affect the network connectivity, but the path between nodes A5 and B1 becomes significantly longer than it was before the failure (from 2 hops to 6 hops), which may cause unnecessary energy consumption. Moreover, many other nodes, such as nodes A7 and A8, become cut-vertices due to the failure of node A6.
The load on paths that contain cut-vertices may be extremely high. The cut-vertices on these paths may consume much more energy than other nodes, and the whole network may become disconnected in a short time. In addition, the traffic interference on these paths may become severe, which makes communication difficult and can prevent the network from working successfully. Similarly, when node A2 fails in Figure 1a, all of its neighbors can still communicate with each other, but the shortest path has changed. Thus, it is important to handle this case as well, and not only the cut-vertex failure.
In consideration of these two cases, we can formulate the problem of single node failure restoration via cooperative communication as follows:
Problem 1. Given a self-organized and connected network $G = (V, E)$, when a single node $i$ fails, assign a power level to the involved nodes so that: (1) the network connectivity is restored; (2) the node connected degree of $N(i)$ is non-decreasing; and (3) the sum of the assigned cooperative communication power is minimum.
The topology control with cooperative communication problem (TCC) is proven NP-complete in [23]. Problem 1 above is a particular case of TCC, which only requires the involved neighboring nodes to be CC-based connected. Thus, it is also NP-complete, and an energy-efficient algorithm that maintains the network availability via cooperative communication is needed to solve the problem. Let $P^d_s(i)$ be the minimum required power for source node $s$ to communicate with node $i$ directly, where $i \in H(s)$ and the superscript $d$ stands for direct communication. It is also the minimum required power for the help link $(s, i)$. Symbols $P^c_s(d)$ and $P^c_i(d)$ are the required cooperative communication powers assigned to source node $s$ and helper node $i$, respectively, to establish the cooperative communication link with destination node $d$; the superscript $c$ stands for cooperative communication. Since the number of helper nodes may change, the assigned powers $P^c_s(d)$ and $P^c_i(d)$ will differ. Moreover, the failed node may have several neighbors that can be selected as source and destination nodes, and different combinations lead to distinct power assignments. Taking this tradeoff into consideration, energy-efficient strategies that select the appropriate source node, helper nodes and destination node to minimize the sum of the assigned transmission power should be proposed.
Given a connected graph $G = (V, E)$ with $V(G) \ne \emptyset$ and $E(G) \ne \emptyset$, each node is deployed with the uniform initial power $P_0$. If a node fails, some nodes will be selected as source node $s$ and destination node $d$. As above, $P^d_s(i)$ is the power required for node $s$ to communicate with node $i$ ($i \in H(s)$) directly. Without loss of generality, we assume that the channel gain $E[|h_{ij}|^2] = 1$ and the noise power $N = 1$ in the previous formulas to simplify the notation. Therefore, according to Equations (1) and (3), we can obtain the two power terms used in the formulation. Symbol $P^c_{s \cup H(s)}(d)$ is the minimum cooperative power assigned to source node $s$ and every helper node $i \in H(s)$ to establish the cooperative communication link with destination node $d$ collectively. Symbol $\max_{i \in H(s)} P^d_s(i)$ is the minimum power assigned to source node $s$ to communicate with every helper node $i \in H(s)$ directly. Thus, Problem 1 can be formulated as minimizing the sum of the assigned cooperative communication power subject to five constraints. The first constraint denotes that the set $\phi$ is the union of source node $s$ and the helper node set $H(s)$.
Assume that the source node $s$ and the helper nodes in $H(s)$ transmit the same packet to the destination node $d$ with the same cooperative power simultaneously. The second constraint ensures that there are enough helper nodes to establish a cooperative communication link (denoted as CC-link for short hereafter) between the source node and the destination node, as shown in Equation (3). In order to establish the CC-link, node $s$ will have two power levels: the power for communicating with the helper nodes directly and the assigned cooperative power. Similarly, the helper nodes will have the power for communicating with the source node directly and the assigned cooperative power. The third and fourth constraints then guarantee that the source node and the helper nodes are assigned the minimum suitable power to maintain the CC-link. As mentioned above, Case 2 of node failure is also considered in this paper, and the fifth constraint covers it. When a node $f$ fails for some reason, $D^{*}_n$ is the connected degree of its neighbor node $n$ ($n \in N(f)$) after the cooperative communication path is established and $D_n$ is the connected degree before the failure. Since the failed node is removed from the network, the connected degree of its neighbors will decrease by at least one, regardless of whether the link is re-established. In Case 2 of single node failure, the connected degree of its neighbors will also decrease significantly, and the above constraint makes sure that the connectivity is restored.
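As a worked sketch of the two power terms used in this formulation, assume that the average SNR takes the usual path-loss form $\beta_{ij} = P_i E[|h_{ij}|^2] / (N d_{ij}^{\partial})$ (an assumption here, to be checked against the model of [18]) and apply the normalization $E[|h_{ij}|^2] = 1$, $N = 1$ stated above. Setting the received SNR equal to the threshold $\tau$ then gives:

```latex
\begin{align}
  % Direct link s -> i:  P \, d_{si}^{-\partial} \ge \tau
  P^{d}_{s}(i) &= \tau \, d_{si}^{\partial}, \\
  % Cooperative link, equal power on every node k in s \cup H(s):
  % \sum_{k} P \, d_{kd}^{-\partial} \ge \tau
  P^{c}_{s \cup H(s)}(d) &= \frac{\tau}{\sum_{k \in s \cup H(s)} d_{kd}^{-\partial}} .
\end{align}
```

Both values must stay within $P_0$. Adding a helper enlarges the denominator of the second expression, which is why additional helpers lower the cooperative power that each node has to spend.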

Node Failure Restoration via Cooperative Communication
In this section, we present three algorithms: Collaborative Single Node Failure Restoration (CSFR), Collaborative Single Node Failure Restoration with node Mobility (CSFR-M) and Collaborative Connectivity Restoration Algorithm (CCRA) in detail.

Collaborative Single Node Failure Restoration (CSFR)
Similar to our proposed algorithm CCFR [24], CSFR is a distributed algorithm in which the involved nodes restore the connectivity cooperatively. As described previously, the neighbors of the failed node initiate the recovery process after detecting the failure. Since the neighbors of the failed node may not be able to communicate with each other when the failed node is a cut-vertex, it is important that every node maintains a 2-hop neighbors table. The table contains all the 1-hop and 2-hop neighbors along with their coordinates. Each node also records the node ID, node connectivity materiality and partition character of every node in the table. CSFR ensures that the restoration procedure converges. The following sections describe the detailed steps.

Source and Destination Nodes Selection
Heartbeat messages are sent periodically by every sensor node to its neighbors to declare that it is functional and to report changes to the 1-hop neighbors. Thus, the failure of a node is detected when all of its neighbor nodes miss the heartbeat message from it. Depending on the node connectivity materiality, the neighbors of the failed node decide whether a recovery is needed or not. As discussed previously, the failure of a node that does not impede the connectivity of any other nodes does not necessitate any restoration of the network topology. When a node $f$ fails, the neighbors of node $f$ check the node connectivity materiality of $f$: (1) if $M_f = 1$, no recovery is needed; (2) if $1 < M_f < \infty$, the failed node $f$ is not a cut-vertex, but the shortest path between its neighbors has increased, i.e., Case 2 discussed above; and (3) if $M_f = \infty$, the network connectivity has been destroyed.
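The three-way decision above maps directly onto a small function. This is a minimal sketch; the function name and the returned labels are hypothetical and chosen only for illustration.

```python
import math


def classify_failure(m_f):
    """Map the connectivity materiality M_f of a failed node to an action."""
    if math.isinf(m_f):
        return "restore-connectivity"  # case (3): the network is partitioned
    if m_f > 1:
        return "shorten-paths"         # case (2): connected, but paths are longer
    return "no-recovery"               # case (1): M_f = 1, nothing to repair


for value in (1.0, 1.5, math.inf):
    print(value, classify_failure(value))
```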
As shown in Figure 1a,b, when node A6 fails, the shortest path between its neighbors increases, so it is necessary to reduce the increment. Since the shortest path between nodes A5 and B1 is the longest among all the neighbor pairs after the failure, CSFR may choose nodes A5 and B1 as a pair of source and destination nodes when establishing the CC-link. Figure 2a,b shows two cases of single cut-vertex failure, in which the network is divided into disconnected regions after node $f$ fails. In Figure 2a, there are four blocks after the failure. Nodes $a$, $b$, $c$ and $d$ select the pairs of source and destination nodes among themselves based on the information of their 2-hop neighbors. According to Equations (4) and (5), the two nearest nodes are chosen as source and destination nodes, for example, nodes $a$ and $b$. Thus, nodes $a$ and $b$ establish a connecting path via cooperative communication first, and the block that contains node $a$ and the block that contains node $b$ are referred to as one group after the collaboration. Three disconnected blocks then remain and the recovery process continues: the next nearest nodes in different blocks are selected as source and destination nodes, and the process goes on until the network is connected.
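The block-by-block merging described above resembles a nearest-pair-first grouping, which can be sketched with a union-find structure. The sketch below assumes one representative neighbor per block and that every attempted CC-link succeeds; the function name and the coordinates are hypothetical.

```python
import math
from itertools import combinations


def merge_order(reps):
    """reps: dict node -> (x, y), one neighbor of the failed node per block.
    Returns the (node, node) pairs in the order their CC-links are built."""
    group = {n: n for n in reps}

    def find(n):
        # Follow parent pointers to the group root, halving the path on the way.
        while group[n] != n:
            group[n] = group[group[n]]
            n = group[n]
        return n

    # Candidate pairs, nearest first.
    pairs = sorted(combinations(sorted(reps), 2),
                   key=lambda p: math.dist(reps[p[0]], reps[p[1]]))
    links = []
    for u, v in pairs:
        ru, rv = find(u), find(v)
        if ru != rv:          # only link nodes that are still in different groups
            group[ru] = rv
            links.append((u, v))
    return links


blocks = {"a": (0, 0), "b": (1, 0), "c": (5, 0), "d": (5, 3)}
print(merge_order(blocks))
```

With four blocks, three links are enough to reconnect the network, which matches the step-by-step reduction from four blocks to three and onward in the text.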
Considering all the analysis above, the 1-hop neighbors of the failed node initiate the recovery process. The source and destination nodes are selected among them based on the following factors: the distance $d_{sd}$ between each other, the node connected degree and the node ID. The two nodes with the shortest distance $d_{sd}$ that is greater than the communication range $R_c$ are selected as a pair of source and destination nodes. When the shortest $d_{sd}$ is equal for several pairs, the nodes with the largest node connected degree are preferred. The pair of nodes with the smallest node ID choose themselves as the source and destination nodes when the above two factors cannot decide. Of the source and destination node pair, whichever is closer to the failed node $f$ chooses itself as the source node and starts the helper node selection first. The other node, i.e., the destination node, starts a timer ($T_b$) to wait for the data packet from the source node. The roles are then exchanged between them so as to build the bidirectional CC-link, as discussed before.
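The selection rule can be sketched as a lexicographic comparison. The text does not say how the degree and the ID of a pair are combined, so this sketch assumes the sum of the two degrees and the ordered ID tuple; the function name and the data layout are also hypothetical.

```python
import math
from itertools import combinations


def select_pair(neighbors, failed_pos, r_c):
    """neighbors: dict id -> {"pos": (x, y), "degree": int}.
    Returns (source_id, destination_id), or None if no pair is out of range."""
    best_key, best = None, None
    for u, v in combinations(sorted(neighbors), 2):
        d_sd = math.dist(neighbors[u]["pos"], neighbors[v]["pos"])
        if d_sd <= r_c:
            continue  # the two nodes can already communicate directly
        degree = neighbors[u]["degree"] + neighbors[v]["degree"]
        # Shortest distance first, then largest degree, then smallest IDs.
        key = (d_sd, -degree, (u, v))
        if best_key is None or key < best_key:
            best_key, best = key, (u, v)
    if best is None:
        return None
    # The node closer to the failed node acts as the source first.
    return tuple(sorted(
        best, key=lambda n: math.dist(neighbors[n]["pos"], failed_pos)))


nbrs = {
    1: {"pos": (-1, 0), "degree": 2},
    2: {"pos": (1, 0), "degree": 1},
    3: {"pos": (-1, 2), "degree": 4},
}
print(select_pair(nbrs, failed_pos=(0, 0), r_c=1.5))
```

In this example the pairs (1, 2) and (1, 3) are equally far apart, so the larger combined degree decides in favor of (1, 3), and node 1 becomes the source because it is closer to the failed node.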

Helper Nodes Selection
Again, since the CC-link is bidirectional, the source and destination nodes select their helper nodes separately. The helper node selection is explained here using the source node as an example. According to Equations (4) and (5), the cooperative communication power $P^c_{s \cup H(s)}(d)$ mainly depends on the distance between the nodes in $s \cup H(s)$ and the destination node. Let $D^d_H$ denote the distance between a direct neighbor of the source node (i.e., a node in $N(s)$) and the destination node. The source node adds neighbor nodes to the helper node set $H(s)$ in ascending order of the distance $D^d_H$. Each time a helper node is added, the source node checks whether Equation (3), which is the primary condition for a CC-link, is satisfied. Upon completing this computation, the source node fixes the appropriate nodes as the helper node set.
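The greedy procedure above can be sketched as follows. The sketch assumes the normalized path-loss model (unit channel gain and noise power) and equal cooperative power on every transmitting node, so that the required per-node power is `tau` divided by the sum of `distance ** -alpha`; the function name, parameters and coordinates are hypothetical.

```python
import math


def select_helpers(src_pos, dst_pos, neighbor_pos, tau, p_max, alpha=2.0):
    """Add neighbors nearest to the destination until the CC-link is feasible.
    Returns (helper_ids, per-node cooperative power), or None if infeasible."""
    def gain(pos):
        return math.dist(pos, dst_pos) ** -alpha

    # Neighbors of the source in ascending order of distance to the destination.
    order = sorted(neighbor_pos,
                   key=lambda n: math.dist(neighbor_pos[n], dst_pos))
    helpers, total_gain = [], gain(src_pos)
    if tau / total_gain <= p_max:
        return helpers, tau / total_gain  # the source alone is sufficient
    for n in order:
        helpers.append(n)
        total_gain += gain(neighbor_pos[n])
        power = tau / total_gain          # equal power that just reaches tau
        if power <= p_max:
            return helpers, power
    return None                           # CC-link cannot be established


result = select_helpers(
    src_pos=(0, 0), dst_pos=(4, 0),
    neighbor_pos={"h1": (1, 0), "h2": (0, 1), "h3": (2, 0)},
    tau=1.0, p_max=4.0)
print(result)
```

Stopping at the first feasible helper set keeps the number of involved nodes small; whether adding further helpers would lower the total power is a separate tradeoff that this sketch does not explore.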

Cooperative Communication
The source node decides whether the CC-link establishment requirement has been satisfied. If not, the source node gives up the establishment and informs all its neighbors, which may include some of the 1-hop neighbors of the failed node $f$, about the failure of the CC-link buildup. Otherwise, having selected the helper set and assigned the corresponding communication powers, it starts the cooperative transmission by sending the data packet to all the helper nodes. Then all the nodes in the cooperation set cooperatively transmit the data packet to the destination node over orthogonal channels, for example using Code Division Multiple Access (CDMA) or Time Division Multiple Access (TDMA), at the assigned powers [25]. Once the data packet is sent, the source node starts the same timer ($T_b$) and acts as a destination node.
As for the destination node, it will start the timer (T b ) to wait for the data packet from the source node first. If it receives the data packet before the timer is expired, the destination node will start the helper nodes selection and act as a source node. If the timer expires and the destination node receives nothing, the destination node will inform all of its neighbors, which may contain some of the 1-hop neighbors of the failed node f , about the failure of CC-link buildup. The timer (T b ) may contain the computation time T c for helper nodes selection and the data packet transmission time T p , i.e., T b = T c + T p [26].
In the rare worst case, such as the failure of node f in Figure 2a, suppose the CC-links are established in the order a → b → d → c. Since all of the neighbors of node f know PC f , nodes d and c each set a timer of (PC f − 1) × 2T b and wait for CC-link establishment. After nodes a and b have finished the establishment, they try to build the CC-link with node d, and the algorithm continues until the network is connected again. If some CC-link cannot be built, such as the one between nodes a and b, those nodes back off and announce the failure of the CC-link establishment. The other nodes learn that the establishment is infeasible when their timers expire. The pseudo-code of CSFR is shown as Algorithm 1:
…
        Return H(u), ∑ i∈ϕ P i (d)
    else
        Announce the failure of CC-link establishment
    end if
else
    if ∑ i∈ϕ P i (d) < ∑ i∈ϕ∪N k+1 P i (d) then
        Return H(u), ∑ i∈ϕ P i (d)
    else
…
As mentioned in Section 4.1, the source and destination nodes may not be able to build the CC-link; moreover, the recovery may be complicated in the worst case. To overcome these shortcomings of CSFR, optimize the solution and extend it toward the multiple node failure problem, we exploit the node mobility assumed above and propose the optimized algorithm CSFR-M. Assume that all sensors can move without any movement constraints and that every node has the same movement energy model (such as the model in [13,27]). Two situations in Figure 3a illustrate the idea. (1) When node A6 fails, nodes A5 and B1 are chosen as the source and destination nodes in CSFR, respectively. However, node A5 cannot build the cooperative communication link with B1 because of the distance between them. Thus, in CSFR-M, if node A5 fails to build the CC-link, the backup solution is initiated: nodes A5 and B1 announce the infeasible CC-link establishment and choose a suitable candidate from between themselves to replace the failed node.
If the neighbors of A5 or B1 lose their communication path with it, they move toward the new location of A5 or B1 until they can communicate again. The process continues until no more links are broken by the movement. (2) Similarly, nodes A3 and A7 are chosen as the source and destination nodes, respectively, when node A4 fails in CSFR. However, node A7 may not have enough neighbor nodes to establish the CC-link with node A3. Notice that node A1, a neighbor of node A4, is an orphan node; it moves to the location of node A4 to repair the connecting path.
In summary, when the failure of a single node, such as A4 in Figure 3a,b, has been detected, the 1-hop neighbors of the failed node first determine whether the failure leaves any orphan nodes. If there is an orphan node, such as node A1 in Figure 3a, A1 chooses itself as the candidate and moves to the location of the failed node A4. If not, in order to handle the worst case, such as the case in Figure 2a, the 1-hop neighbors of the failed node f choose a candidate to replace node f directly when the node connectivity materiality M f = ∞ and the partition character PC f > 2. Otherwise, the 1-hop neighbors of the failed node choose the corresponding source and destination nodes (A3 and A7) from among themselves. As discussed before, the source node (A7 in Figure 3b) first tries to build the unidirectional cooperative communication path to node A3 after selecting its helper nodes. The destination node (A3 in Figure 3b) waits for a fixed time T b to receive the data packet from node A7. If they cannot restore the connecting path, the process backs off and chooses the optimum node(s) to repair the communication.
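The decision flow summarized above can be sketched as a small Python function. It is only an illustration under stated assumptions: the function name, the boolean inputs and the returned labels are hypothetical, and the function presumes that the failure has already been judged to need restoration.

```python
import math


def csfr_m_action(m_f, pc_f, has_orphan, cc_link_ok):
    """Return a label for the recovery action taken after node f fails.

    m_f: node connectivity materiality of f (math.inf for a cut-vertex).
    pc_f: partition character of f.
    has_orphan: True if the failure leaves an orphan neighbor.
    cc_link_ok: True if the source/destination pair can build the CC-link.
    """
    if has_orphan:
        # The nearest orphan moves to the position of the failed node.
        return "orphan_moves"
    if m_f == math.inf and pc_f > 2:
        # Worst case: skip the CC-link attempt and relocate a neighbor directly.
        return "neighbor_replaces"
    # Otherwise try cooperative communication first and fall back to movement.
    return "cc_link" if cc_link_ok else "neighbor_replaces"
```

Ordering the checks this way mirrors the text: orphans are used first, direct replacement is reserved for the worst case, and movement otherwise serves only as the fallback when the CC-link cannot be built.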
There are several other scenarios. (1) If the node connectivity materiality of the failed node is M f = ∞ and the node partition character is PC f = 2, the neighbor with the minimum number of 1-hop neighbors (A5 in Figure 3b in this case) moves to the location of the failed node. The neighbors of node A5 then move to stay connected with it if the communication path between them is broken. (2) If the node connectivity materiality of the failed node satisfies 1 < M f < ∞ and there are no orphans around (as in the failure of node A6 in Figure 3a discussed above), the previously chosen nodes (A5 and B1) announce the failure of CC-link establishment, and the 1-hop neighbors of node f choose the node with the minimum number of 1-hop neighbors to replace the failed node. The neighbors of the relocated node move to stay in touch if necessary. The pseudo-code of CSFR-M is shown as Algorithm 3:
…
    the orphan which is nearest to node f will be chosen and move to replace it
else
    if 1 < M f < ∞ then
        Algorithm 2 CC-link establishment
        if receiving the announcement then
            the neighbor in N f with the minimum number of neighbors will be chosen to replace f
        end if
    else
        if PC f > 2 then
            the neighbor in N f with the minimum number of neighbors will be chosen to replace f
        else
            Algorithm 2 CC-link establishment
            if receiving the announcement then
                the neighbor in N f with the minimum number of neighbors will be chosen to replace
…
Since wireless sensor networks are usually deployed in harsh environments to monitor a designated area, several sensor nodes may fail simultaneously, and the function of the network will be degraded or even destroyed. It is therefore important and urgent for the network to recover from the failures and continue monitoring. Since the single node failure problem already contains many cases, the multiple node failure problem is considerably more complicated.
The failed nodes may be adjacent to each other or be 2-hop neighbors of each other; these situations need to be broken down into simple cases in order to handle them. To simplify the recovery process and solve this problem more effectively, the designated area in which the sensor nodes are deployed is divided into grids [21].
As shown in Figure 4a,b, A denotes the deployed area and (x i , y i ) are the coordinates of an arbitrary node i. The grid that node i belongs to is indicated by its grid coordinates (G i x , G i y ). The grid size is determined by the variable g, and G i x and G i y denote the row-coordinate and the column-coordinate of node i, respectively. After deployment, each node works out which grid it belongs to using the information about the designated area. Through gridding, every node is assigned to a specific grid. The number of grids changes with g, and the multiple node failure problem is solved in each grid separately.
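As an illustration of how a node might derive its grid coordinates, the sketch below assumes square cells of side g, an origin at one corner of the deployed area, and 1-based indices obtained by floor division. These conventions and the function name `grid_of` are assumptions for the example; the paper's own mapping may differ.

```python
import math


def grid_of(x, y, g):
    """Map a position (x, y) to 1-based grid coordinates for cell size g.

    Assumes the deployed area starts at (0, 0) and that cells are g x g squares.
    """
    if g <= 0:
        raise ValueError("grid size g must be positive")
    return (math.floor(x / g) + 1, math.floor(y / g) + 1)
```

With this convention, two nodes share a grid exactly when `grid_of` returns the same pair for both, which is the comparison a node would need in order to label a neighbor as being in its own grid or in another one.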

Problem Description
As mentioned above, a single node failure may partition the WSN into different blocks, let alone multiple node failures. Consequently, the algorithm CCRA is proposed to solve this problem; it builds on CSFR-M and takes several remedial measures when the CC-link is hard to construct. The main purpose of CCRA is to localize the connectivity restoration in each grid, which simplifies the process, and to restore the network connectivity with a minimum number of involved nodes and a minimum displacement distance. Unlike in CSFR and CSFR-M, all nodes know their grid coordinates (G i x , G i y ) and, after self-organization, also maintain their 2-hop neighbors' grid coordinates in the information table. The inner/inter-grid property of each 2-hop neighbor is recorded in the information table as well.
As shown in Figure 5a, the loss of an inner-grid node (such as node a) may cause the network to be partitioned. With gridding, CCRA considers the influence of such a failure only within the grid. Inter-grid node failures, such as that of node e, may in turn affect the network connectivity significantly. The failure of node e breaks the connecting path between grids 1 and 4; grid 1 thus becomes an isolated grid and the network connectivity is destroyed. Unlike in the single node failure problem, multiple nodes may fail in a neighboring area, i.e., in the same grid, as shown in Figure 5b.
Considering the analysis above, the problem can still be classified into two fundamental categories: failures of inner-grid nodes and failures of inter-grid nodes.
Case 1: failures of inner-grid nodes, as shown in Figure 5b. If nodes a and b fail simultaneously, their neighbors initiate an algorithm similar to CSFR-M to recover from each failure.
Case 2: failures of inter-grid nodes, as shown in Figure 5b. If nodes g, h and i fail simultaneously, repeated application of CSFR-M can also repair the connectivity.
The multiple node failure problem is a combination of these two cases. The basic principle is to ensure the connectivity inside each grid before the inter-grid connectivity, i.e., when inner-grid and inter-grid nodes fail simultaneously in the same grid, the inner-grid failures are restored first. There are three main categories: (1) multiple nodes fail in different grids and cannot communicate with each other directly; (2) multiple nodes fail in the same grid; and (3) multiple nodes fail in different grids but have connecting paths. The first category is a simple combination of the two previous cases and can be handled by applying CSFR-M repeatedly, but the remaining ones are much more complicated (Figure 6):
(1) When more than one inner-grid node fails in the same grid, such as nodes a and b, the restoration should be localized in the grid as far as possible, as mentioned above. Their neighbors (denoted as N f hereafter) first check whether they can build the CC-link; if not, the nodes in N f initiate the recovery process based on node mobility. The fundamental principle is that once a neighbor node has been selected to replace one failed node, such as node a, it can only move to replace node a, and another node is chosen as the candidate for node b.
(2) When different inter-grid nodes fail in the neighboring grid, such as nodes a and b, and there are no neighboring orphan nodes and the CC-link is hard to build after the failure, the neighbor of the failed node in the same grid that has the shortest distance to the failed node moves to repair the failure. If the distances are equal, the node with the fewest inter-grid neighbors is selected. Finally, the smallest node ID is preferred.
(3) When inner-grid and inter-grid nodes fail in the same grid, such as nodes a and d, the inner-grid node restoration goes first, as mentioned above; node d is therefore replaced by node b first in this case, and node e then moves to the location of node a.
(4) When more than one inter-grid node fails in the same grid, such as nodes a and b, node h has information about the failed nodes because each node maintains a 2-hop neighbor table. It can find out that one of the only two neighbors of node b (i.e., node a) has also failed, so it estimates that there are not enough nodes to replace node b and decides to move to recover the connectivity. The topology after the restoration is shown in Figure 6.
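The three-level tie-break used in case (2) above (shortest distance to the failed node, then fewest inter-grid neighbors, then smallest node ID) can be expressed compactly with a tuple sort key. The sketch below is illustrative only; the function name and the record layout are hypothetical.

```python
from math import dist


def pick_candidate(failed_pos, neighbors):
    """Choose the same-grid neighbor that moves to replace a failed node.

    neighbors: list of dicts with keys "id", "pos" and "inter_grid_neighbors"
    (hypothetical layout). Returns the chosen node ID, or None if the list is empty.
    """
    if not neighbors:
        return None
    best = min(
        neighbors,
        key=lambda n: (
            dist(n["pos"], failed_pos),   # 1) nearest to the failed node
            n["inter_grid_neighbors"],    # 2) fewest inter-grid neighbors
            n["id"],                      # 3) smallest node ID
        ),
    )
    return best["id"]
```

Because Python compares tuples element by element, a later criterion is consulted only when all earlier ones are exactly equal, which matches the order of preference stated in the text.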

Algorithm Details
Most cases of multiple node failure have been discussed in the foregoing analysis, but there might still be some special cases. CCRA, however, is an adaptive algorithm that can adapt to all of these situations. When failures are detected by the neighbors of the failed nodes, the recoveries may be initiated simultaneously. The nodes in N f first decide whether the failure leaves any orphan nodes; if so, the nearest orphan moves to restore the connectivity. If not, the nodes in N f try to establish the CC-link according to Equation (3); if that is not possible, they decide whether to move in order to restore the connectivity. The fundamental principle is to move the nodes that have the least influence on the network topology, i.e., nodes whose node connectivity materiality is smallest and that are nearest to the failed node. Moreover, nodes with the minimum number of neighbor nodes are preferred. Restoration is localized in the grid as far as possible, unless so many nodes fail in the same grid that the remaining ones cannot re-establish the connectivity. In that case, the recovery proceeds as in case (4) above. Since the algorithm is based on gridding, the probability that failed nodes are located in the same grid decreases significantly, as will be demonstrated in the simulation. Finally, the pseudo-code is presented as Algorithm 4:

Algorithm 4 CCRA
ψ f : the set of failed nodes which need to be restored
if node i ∈ ψ f has no failed neighbor(s) in the same grid then
    if node i ∈ ψ f has no failed neighbor(s) in the neighbor grid then
        recovery process goes to case (2)
    else
        restore the connectivity similar to CSFR-M
    end if
else
    if the connected failed nodes are all inner-grid nodes then
        recovery process goes to case (1)
    else
        if the connected failed nodes are all inter-grid nodes then
            recovery process goes to case (4)
        else
            recovery process goes to case (3)
        end if
    end if
end if
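The branching of Algorithm 4 can be mirrored in a few lines of Python. The sketch follows the branch order of the pseudo-code as printed; the function name, argument names and returned labels are hypothetical and serve only to make the dispatch explicit.

```python
def ccra_dispatch(failed_in_same_grid, failed_in_neighbor_grid, connected_kinds):
    """Return which recovery branch applies to a failed node i.

    failed_in_same_grid: True if i has failed neighbor(s) in its own grid.
    failed_in_neighbor_grid: True if i has failed neighbor(s) in a neighbor grid.
    connected_kinds: set containing "inner" and/or "inter", describing the
        connected failed nodes (only consulted when failed_in_same_grid is True).
    """
    if not failed_in_same_grid:
        if not failed_in_neighbor_grid:
            return "case 2"
        return "csfr_m"
    if connected_kinds == {"inner"}:
        return "case 1"
    if connected_kinds == {"inter"}:
        return "case 4"
    return "case 3"  # mixed inner-grid and inter-grid failures
```

Each failed node in ψ f would be passed through this dispatch independently, so recoveries in different grids can proceed in parallel.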

Algorithm Analysis
The CSFR-M and CCRA algorithms are reactive processes that handle node failure(s) primarily via cooperative communication, with node mobility as the auxiliary method when the CC-link cannot be established. The energy effectiveness of cooperative communication has been investigated by Gokturk et al. [28]. Node movement consumes much more energy than communication between nodes; thus, the communication energy consumed by the recovery process may be ignored compared with that of node movement. Since this paper is concerned with connectivity restoration, communication between the sensor nodes is assumed to have no delays and no losses. Meanwhile, shrinkage of the sensing area and of the overall deployment area is not considered in this paper, since CSFR-M and CCRA focus on connectivity restoration with the minimum number of involved nodes.

Lemma 1.
If the deployed area of the WSN has been divided into grids, the failure of an inner-grid node only affects the connectivity of the grid to which it belongs.
Proof. Upon gridding, the network connectivity can be expressed as the connection of all of the grids. By the definition of an inner-grid node, all of its neighbors are in the same grid, i.e., inner-grid nodes do not communicate directly with nodes in different grids. Since the restoration is localized in each grid, a lost inner-grid node can be replaced by one of its neighbors in the same grid. The subsequent node movements, which may include the motion of inter-grid nodes, can be treated as the restoration of inter-grid nodes. That is, the influence of an inner-grid node failure is confined to its grid.

Lemma 2.
The recovery will be simplified significantly after gridding in CCRA.
Proof. As shown in Figure 7a, all the nodes are deployed in a straight line and the distance between neighboring nodes is R c . Nodes b, c and d are adjacent failed nodes; nodes a and e can only detect the failures of nodes b, c and c, d, respectively, whereas nodes f and g can detect the failures of all three nodes. The recovery process is therefore complicated. In CCRA, however, with a suitable grid size g the failed nodes may not be in the same grid, and the restoration can be done simply in each grid, as shown in Figure 7b.
Proof. As mentioned before, CSFR-M and CCRA do not focus only on the restoration of cut-vertices, and some nodes, such as leaf nodes, obviously do not need to be recovered, so it is important to decide which nodes need restoration. In Section 3, we introduced the node connectivity materiality M f , which is based on the shortest-path hop counts between the neighbors of node f before and after the failure. The smallest number of hops between two arbitrary non-adjacent neighbors of node f is 2, and this value grows after the failure of node f when f is their only common neighbor. As shown in Figure 8: (1) when node a fails, the shortest-path hop count between nodes b and c is still 1, as they are adjacent; (2) when node a fails, the shortest-path hop count between nodes b and d remains 2 because they have another common neighbor, node c; and (3) the shortest-path hop count between b and e increases if node a fails, so the failure of node a needs to be recovered, which proves the theorem.

Proof. The calculation of node connectivity materiality depends on information regarding the whole network gathered by the sink node during self-organization, and its message complexity is O(N^2). The competition for the source and destination roles takes place among the neighbors of the failed node only; the computation complexity of each node in N f is O(n f ^2), where n f is the number of nodes in N f and n f = N − 1 in the worst case. The selected source and destination nodes then check whether the CC-link can be established. This does not cost any exchanged messages, since the decision is made by the source and destination nodes alone. If the CC-link can be established bilaterally, the source and destination nodes send one informing message along with the data packet to the nodes in H(s) and H(d), where H(s) and H(d) are the helper node sets of the source and destination nodes. The source and its helper nodes then transmit the same data packet to the destination node simultaneously, and vice versa. Thus, in total, (N − 3) messages are needed in the worst case when the CC-link can be established. If the CC-link cannot be built, the source and destination nodes send one message to the nodes in N f announcing the failure of CC-link establishment, and the selected candidate moves to replace node f. In total, N − 3 informing messages are transmitted in the worst case. The candidate broadcasts one message to its children about its movement, and N − 3 nodes move in the worst case. Thus, a total of 2 × (N − 3) messages is needed when the CC-link cannot be built. Therefore, the message complexity of CSFR-M is O(N), and its computation complexity is O(N^2). The analysis for CCRA is similar: its message complexity is also O(N) and its computation complexity is O(N^2).
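The hop-count comparison used in the first proof above can be illustrated with a breadth-first search. The sketch does not compute M f itself, whose formula is given in Section 3 and not repeated here; it only checks whether removing a node lengthens the shortest path between any two of its neighbors. The edge list is a hypothetical topology chosen to be consistent with the description of Figure 8, and all function names are illustrative.

```python
from collections import deque
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def hop_count(adj, src, dst, removed=None):
    """Shortest-path hops from src to dst by BFS, optionally ignoring one node."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, hops = queue.popleft()
        for nxt in adj[node]:
            if nxt == removed or nxt in seen:
                continue
            if nxt == dst:
                return hops + 1
            seen.add(nxt)
            queue.append((nxt, hops + 1))
    return float("inf")  # dst is unreachable: the network is partitioned


def failure_lengthens_paths(adj, f):
    """True if removing f increases the hop count between any two of its neighbors."""
    return any(
        hop_count(adj, u, v, removed=f) > hop_count(adj, u, v)
        for u, v in combinations(sorted(adj[f]), 2)
    )


# Hypothetical topology: a is adjacent to b, c, d and e; x links d and e.
EDGES = [("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"),
         ("b", "c"), ("c", "d"), ("d", "x"), ("x", "e")]
ADJ = build_adjacency(EDGES)
```

In this topology the pairs (b, c) and (b, d) keep their hop counts when a is removed, while the path from b to e becomes longer, which is the situation the proof uses to argue that the failure of a must be repaired.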

Sensors 2016, 16, 1487

Single Node Failure Restoration
In this section, we validate the effectiveness of the CSFR algorithm through simulation and compare the proposed single node failure restoration algorithm CSFR-M with the previous algorithms DARA, PADRA and RIM. All experimental results are obtained with Matlab 2009b on a computer with a 3.3 GHz CPU and 4 GB RAM. In the simulations, numerous mobile nodes are deployed randomly in an area of 500 m × 500 m with a uniform communication range R c (i.e., all nodes are deployed with the same initial power). The following metrics are used to evaluate the performance of CSFR and CSFR-M:
• Unsuccessful Repair Ratio (UR): The ratio of the number of unsuccessful repairs to the number of times that CSFR runs. As mentioned before, the cooperative communication is established based on Equation (6). In some cases, the source node or destination node may not have enough neighbors to build the CC-link, so the restoration may be unsuccessful.
• Cooperative Communication Power Ratio (PR): The ratio of the average assigned cooperative power to the initial power, where the average assigned cooperative power is the mean of the cooperative power assigned to the source node, destination node and their respective helper nodes to build the CC-link according to Equation (6). It is expressed as a percentage in the figures hereafter.
• Number of Sent Messages (SN): The total number of messages sent among the nodes during the restoration.
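The two ratio metrics above can be illustrated with a minimal Python sketch. The record layout (`Trial`) and the function names are hypothetical and chosen only for illustration; the assigned powers themselves would come from Equation (6), which is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Trial:
    """One restoration attempt (hypothetical record layout)."""
    cc_link_built: bool            # True if the CC-link was established
    assigned_powers: List[float]   # cooperative power assigned to each involved node
    messages_sent: int             # SN for this restoration


def unsuccessful_repair_ratio(trials: List[Trial]) -> float:
    """UR: number of unsuccessful repairs divided by the number of CSFR runs."""
    if not trials:
        raise ValueError("need at least one trial")
    failures = sum(1 for t in trials if not t.cc_link_built)
    return failures / len(trials)


def power_ratio(trial: Trial, initial_power: float) -> float:
    """PR in percent: mean assigned cooperative power over the initial power."""
    if not trial.assigned_powers:
        raise ValueError("no cooperative power was assigned in this trial")
    mean_power = sum(trial.assigned_powers) / len(trial.assigned_powers)
    return 100.0 * mean_power / initial_power
```

SN needs no computation beyond reading `messages_sent` for each restoration, so it is kept as a plain field.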
Meanwhile, the following parameters are varied to change the WSN topology characteristics across simulations and to study their implications on the performance of CSFR and CSFR-M:
• Number of Deployed Nodes (DN): This parameter determines the node density and thereby affects the network connectivity. Since a larger DN increases the node density, the number of neighbors of each node grows, which makes it easier to establish the CC-link.
• Communication Range (R c ): As assumed before, all deployed nodes have the same communication range R c , and the value of R c is directly proportional to the initial power P 0 of each node. A small R c creates a sparse network topology, while a large R c increases the connectivity of the whole network. The value of R c determines whether the CC-link can be built and also affects the number of nodes involved in the cascaded movement during the restoration process.
In the simulations, we have tested different network topologies (sparse and dense) with several combinations of R c and DN. For the CSFR algorithm, R c is set to 140, 120, 100, 80 and 60 m, and DN ranges from 10 to 100. For CSFR-M, DARA, PADRA and RIM, R c ranges from 10 to 100 m and DN also ranges from 10 to 100. Without loss of generality, the path loss exponent ∂ in Equation (6) is set to 2. In every topology, a randomly selected node is made to fail, and the result of each individual experiment is averaged over 30 tests. All results are subjected to 95% confidence interval analysis and stay within 5% of the sample mean.
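The setup described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Matlab code used for the experiments: the function names are hypothetical, the confidence interval uses the normal approximation (z = 1.96), and "within 5% of the sample mean" is read here as a bound on the interval half-width.

```python
import math
import random
import statistics


def deploy(dn, side=500.0, rng=random):
    """Place dn nodes uniformly at random in a side x side square (metres)."""
    return [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(dn)]


def neighbors(nodes, rc):
    """Adjacency sets: two nodes are neighbors if they are within range rc."""
    adj = {i: set() for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            if math.dist(nodes[i], nodes[j]) <= rc:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def ci95_half_width(samples):
    """Half-width of a 95% confidence interval (normal approximation, assumed)."""
    return 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))


def within_5_percent(samples):
    """True if the 95% interval half-width is at most 5% of the sample mean."""
    return ci95_half_width(samples) <= 0.05 * abs(statistics.mean(samples))
```

A failed node for one run could then be drawn with `rng.randrange(dn)`, and the metric of interest collected over 30 such runs before calling `within_5_percent`.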
Upon deploying, the influence of each node on the network connectivity is different. Since this paper focuses on the restoration of any kind of node failure(s), it is important to make the components of the network clear. Figure 9a,b show the probability distribution of categories that the nodes may belong to after deployment (the communication range R c is set as 100 m). The nodes with node connectivity materiality M f = 1 are classified into Category ; the nodes with node connectivity materiality 1 < M f < ∞ are Category ; and the nodes with node connectivity materiality M f = ∞ are cut-vertices. As can be seen in the Figure 9a,b, it is necessary to handle the node failures in Category as there are many of them in the network.
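As a rough illustration of the last category, the cut-vertices of a deployed topology can be found by testing whether the network stays connected once a node is removed. The sketch below is a brute-force check with hypothetical function names; it does not compute the node connectivity materiality M f itself, whose definition is given earlier in the paper.

```python
from collections import deque


def is_connected(adj, excluded=None):
    """Breadth-first search over adj (dict: node -> set of neighbors), skipping `excluded`."""
    nodes = [v for v in adj if v != excluded]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w != excluded and w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(nodes)


def cut_vertices(adj):
    """Nodes whose failure splits an initially connected network into disjoint blocks."""
    if not is_connected(adj):
        raise ValueError("the network must be connected before any failure")
    return {v for v in adj if not is_connected(adj, excluded=v)}
```

For a chain of four nodes, the two inner nodes are reported as cut-vertices, whereas a ring has none. The check costs one traversal per node, which is acceptable for the network sizes simulated here (at most around a hundred nodes).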

Unsuccessful Repair Ratio
As mentioned above, this metric applies only to CSFR. Figure 10a reports the Unsuccessful Repair Ratio (UR) during the restoration process under different values of R c and DN. As can be seen in Figure 10a, the UR for each value of R c approaches zero as DN increases. When R c is large, the UR decreases quickly as DN increases. When DN is small, CSFR is more prone to fail for larger values of R c , because a network topology with a smaller R c is sparser and may not need any restoration. Although CSFR has some disadvantages, Figure 10a shows that it is more effective in dense network topologies.

Cooperative Communication Power Ratio (PR)
As shown in Figure 10b, the PR of most simulations is around 75% regardless of R c and DN. In a sparse network topology (i.e., when DN is small), the number of neighbors of each node varies greatly and the PR increases as R c increases. When the number of deployed nodes increases, the number of neighbors of each node increases as well. According to Equation (3), the power required for the CC-link mainly depends on the distance between the source and destination nodes. When there are enough neighbors to establish the CC-link, the required power ratio settles at a certain value (around 75%, as shown in Figure 10b). Since CSFR-M extends CSFR to overcome its disadvantage, the PR of CSFR-M is similar to that of CSFR.

Average Travel Distance (TD)
Since node movements consume more energy than communications, the number of relocated nodes (RN) and the average travel distance (TD) indicate the energy expense of the algorithms. As CSFR-M extends CSFR and combines it with node mobility to improve efficiency, the average travel distance of each node involved in the recovery process is an important metric for assessing the performance of the algorithm. In Figure 11a,b, the average travel distance of CSFR-M is compared with that of three previous algorithms, DARA, RIM and PADRA, for different values of R c and DN. As explained in Section 2, DARA and PADRA focus on the recovery from a single cut-vertex failure, while RIM can handle any single node failure, similar to CSFR-M. Figure 11a indicates the impact of varying DN while all nodes are equipped with the uniform communication range R c = 100 m. With a small DN, CSFR-M performs slightly worse than the others in terms of average travel distance, but the maximum probability of restoring the connectivity with node movement when R c = 100 m is only around 14%, as shown in Figure 10a. As DN increases, the performance of CSFR-M becomes as good as that of DARA and PADRA. Considered comprehensively, CSFR-M therefore performs better than DARA and PADRA in terms of the average travel distance of each node. The average travel distance of RIM is the smallest, since the maximum movement distance in RIM is limited to R c /2. As expected, the average travel distance increases as R c becomes larger, as can be seen in Figure 11b.

Number of Relocated Nodes (RN)
The average number of relocated nodes during the restoration for different values of R c and DN is shown in Figure 12a,b. As mentioned earlier, the plotted results are averages over multiple independent simulations. The two figures indicate that CSFR-M, DARA and PADRA relocate almost the same number of nodes in the recovery, and that this number is smaller than for RIM, since RIM requires all neighbors of the failed node to move. The number of relocated nodes in RIM increases as R c and DN grow, because the number of neighbors of each node increases. For the other three algorithms, however, the number of relocated nodes remains almost unchanged across R c and DN. This can be attributed to the candidate selection measure: since the node with the minimum number of 1-hop neighbors is preferred, the number of relocated nodes stays minimal for different values of R c and DN.
Again, RN is counted only when node movement is used to restore the failure. However, CSFR-M does not always require nodes to move and replace the failed node, as shown in Figure 10a. As can be seen in Figure 12a,b, CSFR-M performs the same as DARA and PADRA when recovery through node movement is needed. From the previous analysis, the DARA and PADRA algorithms can only handle a single cut-vertex failure via node movement. In contrast, as shown in Figure 10a, the unsuccessful repair ratio of CSFR becomes negligible in dense networks, i.e., in a dense network CSFR-M can restore the network connectivity after an arbitrary single node failure via cooperative communication rather than node movement.

Number of Sent Messages (SN)
As shown in Figure 13a,b, the curves of DARA and PADRA overlap with each other, as they send almost the same number of messages during the restoration. The number of messages sent by CSFR-M during the restoration is slightly smaller than that of RIM, and CSFR-M introduces significantly less messaging overhead than DARA and PADRA. RIM performs better than DARA and PADRA on this metric because it needs only 1-hop neighbor information, rather than the 2-hop neighbor information required in DARA and PADRA. Since CSFR-M only sends messages to announce the establishment of the CC-link or the necessity of node movements, the messages are transmitted among a few involved nodes and their number remains small. Both figures also indicate that when the network becomes more connected, i.e., for larger R c or DN, the message traffic grows. This can be attributed to the increased number of neighbors that must be notified before the CC-link is built or relocation takes place. Considering the previous simulation results comprehensively, CSFR-M is an efficient and favorable approach.

Multiple Nodes Failure Recovery
To deal with multiple node failures, we present the collaborative connectivity restoration algorithm, CCRA, which is an extension of CSFR-M. The CCRA algorithm restores the network connectivity after multiple node failures and simplifies the recovery through network gridding. Since an excessively small DN makes the network too sparse, 50-140 nodes are deployed in an area of 500 m × 500 m with values of R c ranging from 50 to 140 m in the simulations. The metrics used to evaluate the performance of CCRA are the same as for CSFR-M, i.e., RN, TD and PR.
Different from CSFR, CSFR-M, DARA, PADRA and RIM, CCRA deals with multiple node failures, and its performance is affected by the number of failed nodes (denoted as Fm). Thus, one more parameter is considered:
• Maximum Number of Failed Nodes (Fm): The maximum number of failed nodes in each experiment.
As shown in Figure 14a, the average power ratio stays around 75% and the average travel distance holds at about 40%, but the number of relocated nodes increases significantly as Fm increases. When Fm is set to 20, approximately 60% of the residual healthy nodes moved in the simulations. The network topology is nearly rebuilt when so many nodes change their positions, so the following simulations are run with Fm = 5 and 10.
Moreover, one of the significant improvements of CCRA is dividing the network into small grids and localizing the restoration process. As described above, many neighbors of the failed nodes will move when two or more failed nodes are adjacent to each other; thus, the number of adjacent failed nodes (denoted as FN) has a great influence on the performance of CCRA. Figure 14b compares the adjacent failed nodes with and without gridding when R c and DN are both set to 100, where g is the size of the grid. The number of adjacent failed nodes without gridding is larger than that with gridding in the same case. Figure 15a indicates that the number of adjacent failed nodes increases significantly without gridding in both cases (i.e., Fm = 5 and 10) when the communication range of each node varies from 30 to 140 m, whereas the values of FN with gridding stay around fixed values in both cases. Similarly, all values of FN stay around fixed values in both cases in Figure 15b. Across these simulations, the values of FN show smaller fluctuations for a smaller grid size g when the other parameters are the same. One might therefore expect even smaller values of FN for a grid size g below 50. However, the simulation results show that FN is already less than or equal to 1 when g = 50 for different values of R c and DN. Moreover, when the grid size is excessively small, too many empty grids that do not contain any nodes cause unnecessary trouble during the recovery process, and the computation due to gridding becomes more complex. Hence, the grid size g is set to 50 m in the following simulations.
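The trade-off behind the choice of grid size can be illustrated with a short Python sketch that assigns nodes to square grid cells of side g and counts the cells left empty. The function names and the cell-indexing scheme are assumptions made for illustration only; the gridding procedure actually used by CCRA is the one described earlier in the paper.

```python
import math
from collections import defaultdict


def grid_cell(pos, g, side=500.0):
    """Index (column, row) of the g x g cell containing pos; border nodes are clamped."""
    per_side = math.ceil(side / g)
    col = min(int(pos[0] // g), per_side - 1)
    row = min(int(pos[1] // g), per_side - 1)
    return (col, row)


def group_by_cell(nodes, g, side=500.0):
    """Map each occupied cell to the indices of the nodes located inside it."""
    cells = defaultdict(list)
    for idx, pos in enumerate(nodes):
        cells[grid_cell(pos, g, side)].append(idx)
    return cells


def empty_cell_count(nodes, g, side=500.0):
    """Number of grid cells that contain no node at all."""
    per_side = math.ceil(side / g)
    occupied = {grid_cell(pos, g, side) for pos in nodes}
    return per_side * per_side - len(occupied)
```

With g = 50 m the 500 m × 500 m area contains 100 cells, so a deployment of 50-140 nodes already leaves a noticeable share of them empty, and halving g quadruples the number of cells while the number of nodes stays the same.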

Number of Relocated Nodes (RN)
As mentioned before, the number of relocated nodes RN is the primary concern of node failure restoration with node mobility. When the network is deployed with a fixed number of nodes (DN = 100), the number of relocated nodes increases by about 10% when Fm is 5, whereas the increase for Fm = 10 is much more pronounced. However, when the communication range R c is set to 100, RN grows quickly with DN in both scenarios. Figure 16a,b indicates that when the number of failed nodes increases, the number of relocated nodes grows greatly in dense networks. Again, since the recovery process may move many residual healthy nodes to replace the failed nodes and almost rebuild the network topology, it is unnecessary to restore the network connectivity if too many nodes fail simultaneously. When Fm = 5, the probability that there are neighboring failed nodes is close to zero, so the number of relocated nodes is between 5 and 15, as shown in Figure 16a.

Average Travel Distance (TD)
The average travel distance TD is also an important index for node failure restoration algorithms, and the results are shown in Figure 17a,b. When the number of deployed nodes is fixed, the average travel distance of each node rises rapidly as the communication range R c increases. The situation is different when R c is fixed: the average distance that each node travels in the recovery process remains around 40 m as the number of deployed nodes increases. As mentioned above, the primary factor that affects the recovery is the distance between the nodes, i.e., the communication range of each node strongly influences the relations between nodes. Thus, different values of R c lead to quite different results. Comparing Figure 11a,b with Figure 17a,b shows that the average travel distance of each node in CCRA is smaller than that in CSFR-M, DARA and PADRA. This is because, when the CC-link cannot reestablish the connecting paths in CCRA, the suitable candidates that are nearest to the failed nodes move to replace them, as explained in Section 4.

Cooperative Communication Power Ratio (PR)
The biggest difference between CCRA and other repair algorithms is that CCRA (like CSFR-M and CSFR) uses cooperative communication as the primary method to restore the network connectivity. As shown in Figure 18a,b, the cooperative communication power ratio of each involved node stays around 75%, regardless of whether R c or the number of deployed nodes is varied. Since the CCRA algorithm restores the network connectivity primarily via cooperative communication and moves nodes to replace the failed nodes only when the CC-link cannot be established, the experimental results demonstrate that CCRA works steadily while building the cooperative paths. Considering the above factors comprehensively, the CCRA algorithm handles multiple node failures efficiently by combining cooperative communication and node mobility.

Discussion
Recently, the application of WSNs in inhospitable environments has received growing interest. In these applications, human intervention is hard to implement and the WSNs work unattended. Since all nodes of the WSN are deployed in such a harsh environment, they are susceptible to failure, which may degrade the quality of some services and even destroy the function of the whole network. In this paper, we have studied the problem of network connectivity recovery after node failure(s) and presented three algorithms: CSFR, CSFR-M and CCRA. CSFR handles the single node failure problem with cooperative communication only, and it may fail in a sparse network topology. CSFR-M is an extension of CSFR that restores the network connectivity more effectively with node motion. Moreover, CCRA focuses on network connectivity recovery from multiple node failures. Unlike the previous schemes in the literature, the CSFR-M and CCRA algorithms can recover from the failure of any kind of node, primarily through cooperative communication and with node motion as an auxiliary measure.
The performance of CSFR-M and CCRA is validated via simulation analysis. The simulation results have confirmed the effectiveness of CSFR-M in single node failure restoration and demonstrated the efficiency of CCRA in terms of multiple node failure recovery. CCRA simplifies the restoration of multiple node failures by network gridding and localizes the initial recovery in every grid with failed node(s). Additionally, the simulation results indicate that CSFR-M and CCRA are both favorable in dense networks, in which the CC-link can be established to reduce node movements.
CSFR-M can recover from an arbitrary single node failure and CCRA can handle multiple node failures. The coverage loss due to node failures has not been considered in this paper and may be addressed in our future research. In addition, as in the previous literature, a node is simply regarded as failed if any part of its functions is lost. In the future, we may distinguish between different failed components and make the best use of the remaining functional parts for network topology reconfiguration.