A Multi-Hop Energy Neutral Clustering Algorithm for Maximizing Network Information Gathering in Energy Harvesting Wireless Sensor Networks

Energy resource limitation is a severe problem in traditional wireless sensor networks (WSNs) because it restricts the lifetime of the network. Recently, the emergence of energy harvesting techniques has raised the expectation of overcoming this problem. In particular, a sensor node with energy harvesting abilities can work perpetually in an energy neutral state. In this paper, a Multi-hop Energy Neutral Clustering (MENC) algorithm is proposed to construct the optimal multi-hop clustering architecture in energy harvesting WSNs, with the goal of achieving perpetual network operation. All cluster heads (CHs) in the network act as routers and cooperatively transmit data to the base station (BS) by multi-hop communication. In addition, by analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints. Under these constraints, every sensor node can work in an energy neutral state, which in turn provides perpetual network operation. Furthermore, the minimum network data transmission cycle, at which network information gathering is maximal, is mathematically derived using convex optimization techniques. Simulation results show that our protocol can achieve perpetual network operation, so that consistent data delivery is guaranteed. In addition, substantial improvements in network throughput are achieved compared with the traditional clustering protocol LEACH and recent energy harvesting aware clustering protocols.


Introduction
A typical wireless sensor network (WSN) [1] consists of a large number of low-power, inexpensive sensor nodes with limited sensing, processing and communication abilities. When randomly deployed in a sensor field, these sensor nodes can automatically self-organize into an ad hoc network [2]. WSNs are widely used in many domains, such as environmental monitoring [3], target tracking [4], security surveillance [5] and disaster management [6], to gather information about diverse phenomena of interest.
WSNs are usually deployed in hostile or harsh environments and work in an unattended fashion [7]. In addition, most sensor nodes in the network are powered by batteries with finite stored energy, and it is inconvenient to replace or recharge a battery once its energy is exhausted.
In [20], a distributed Energy Neutral Clustering (ENC) protocol is proposed for energy harvesting WSNs (EH-WSNs), with the goals of achieving perpetual network operation and maximizing network information gathering. A CH group (CHG) mechanism is adopted in which several CHs are selected in each cluster to share the heavy traffic load. This CHG mechanism prevents excessive energy consumption of individual sensors, which keeps the whole network in an energy neutral state and thereby achieves perpetual network operation. Furthermore, an extension to ENC is proposed to maximize the amount of information gathered, and the optimal number of clusters is mathematically derived using convex optimization. Simulation results show that ENC can provide perpetual network operation and maximize network information gathering. However, the multi-hop inter-cluster communication structure is not considered in this protocol.

Sensor and Network Model
Sensor nodes usually have limited communication ability in WSNs. When they are deployed in a large-scale sensor field, multiple BSs are necessary for information gathering, as shown in Figure 1. The sensor nodes distributed within the relatively small circular area covered by one BS, as shown in Figure 2, can self-organize into a subnet that collects and transmits information independently. In this paper, we solve the clustering problem for such a subnet consisting of energy harvesting sensors (EH-Sensors). These EH-Sensors are uniformly deployed in a circular sensor field with density ρ. The BS is located at the center of this field. Let S denote the area of this field. We divide the circular sensor field into m concentric ring-based units with equal area S/m. CHs in different units cooperatively transmit data to BS by multi-hop communication during the data transmission process. Each CH in unit i (i > 1) selects a routing head from the CHs in unit i-1, and it only needs to transmit data to this routing head. Every CH in the first unit can transmit data to BS directly.
For the development of our protocol, we make several assumptions about the sensors as follows:
• All EH-Sensors are homogeneous and have the same ability to harvest energy from the ambient environment;
• Each EH-Sensor is stationary or nearly stationary after being deployed in the sensor field;
• Each EH-Sensor can estimate the distance to the transmitter according to the received signal strength;
• If an EH-Sensor serves as a CH, it compresses each member node's data with a fixed compression ratio a.
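The equal-area ring layout described above can be illustrated with a short sketch. It assumes only what the text states (a disc of area S split into m concentric units of area S/m), so the disc containing the first i units has area iS/m. The function names and the example field size are illustrative and do not come from the paper.

```python
import math


def unit_outer_radii(field_area: float, num_units: int) -> list[float]:
    """Outer radius R_i of each unit for an equal-area split of the disc."""
    # The disc of radius R_i holds the first i units: pi * R_i**2 = i * S / m.
    return [
        math.sqrt(i * field_area / (num_units * math.pi))
        for i in range(1, num_units + 1)
    ]


def unit_of(distance_to_bs: float, radii: list[float]) -> int:
    """1-based index of the unit that contains a node at this distance from BS."""
    for i, outer_radius in enumerate(radii, start=1):
        if distance_to_bs <= outer_radius:
            return i
    raise ValueError("node lies outside the sensor field")


# Hypothetical field: radius 100 m, split into 4 units.
radii = unit_outer_radii(math.pi * 100.0 ** 2, 4)
unit_index = unit_of(60.0, radii)
```

Because the areas are equal, the rings get thinner toward the edge of the field, which is why the unit closest to BS is the widest one.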

Radio Model
Realistic modeling of radio wave propagation between low-power sensing devices is very challenging. Based on previous discussions of the radio model [36][37][38][39], two models are adopted in this work: the free space model and the two-ray ground propagation model [40]. The free space model assumes an ideal propagation condition in which there is only one clear, unobstructed line-of-sight path between the transmitter and receiver. The two-ray ground propagation model, in contrast, predicts path loss when the received signal consists of the line-of-sight component and a multipath component formed predominantly by a single ground-reflected wave. The two-ray ground propagation model is reasonably accurate for predicting the large-scale signal strength over a long transmission distance, whereas the free space model gives a better prediction over a shorter transmission distance. A distance threshold d_0 is therefore introduced to determine which model should be adopted. If the distance between the transmitter and receiver is shorter than this threshold, we adopt the free space model; otherwise, we adopt the two-ray ground propagation model.
At the transmitter, the energy spent to transmit a k-bit packet can be expressed as follows [38]:

$$E_{Tx}(k, d) = \begin{cases} k E_{elec} + k \varepsilon_{fs} d^{2}, & d < d_0 \\ k E_{elec} + k \varepsilon_{amp} d^{4}, & d \ge d_0 \end{cases} \quad (1)$$

where d is the distance between the transmitter and receiver; E_elec is the amount of energy spent per bit by the transmitting or receiving circuit; ε_fs and ε_amp are the amplifier characteristic constants corresponding to the free-space propagation model and the two-ray ground reflection model, respectively; and d_0 is the distance threshold, which can be calculated as follows [39]:

$$d_0 = \sqrt{\frac{\varepsilon_{fs}}{\varepsilon_{amp}}} \quad (2)$$

At the receiver, the amount of energy spent for receiving a k-bit packet is calculated by Equation (3) [20], where (1/a-1)E_DA is the amount of energy spent to aggregate one bit of data and a is the data compression ratio within the range (0,1]; a = 1 means no compression. As a concentric ring-based network model is adopted in this paper, we assume that each CH has a short inter-cluster data transmission distance and employs the free-space propagation model for its inter-cluster communication.
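As a rough illustration of the radio model, the sketch below implements the first-order radio model in its commonly cited form: free-space loss (d squared) below the threshold d_0 and two-ray loss (d to the fourth power) above it, with d_0 chosen so that the two branches meet. The numeric constants are typical textbook values used here as assumptions, not the values in the paper's Table 2, and `rx_energy` models only the receiver electronics, not the aggregation term mentioned in the text.

```python
import math

# Assumed, illustrative constants (not taken from the paper).
E_ELEC = 50e-9        # J/bit, transmitter/receiver electronics
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier constant
EPS_AMP = 0.0013e-12  # J/bit/m^4, two-ray amplifier constant


def distance_threshold(eps_fs: float = EPS_FS, eps_amp: float = EPS_AMP) -> float:
    """Crossover distance d_0 at which both amplifier terms are equal."""
    return math.sqrt(eps_fs / eps_amp)


def tx_energy(k_bits: int, d: float) -> float:
    """Energy in joules to transmit k_bits over distance d (metres)."""
    if d < distance_threshold():
        return k_bits * (E_ELEC + EPS_FS * d ** 2)   # free-space branch
    return k_bits * (E_ELEC + EPS_AMP * d ** 4)      # two-ray ground branch


def rx_energy(k_bits: int) -> float:
    """Electronics-only energy in joules to receive k_bits (assumed form)."""
    return k_bits * E_ELEC


d0 = distance_threshold()
short_hop = tx_energy(1000, 10.0)
```

With these assumed constants the threshold is a little under 88 m, so a CH that keeps its hops well below that distance stays in the cheaper free-space regime.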

Problem Statement
Achieving perpetual network operation and maximizing network information gathering are among the central concerns when designing clustering protocols for EH-WSNs. Perpetual network operation can be achieved by guaranteeing that every EH-Sensor node works in an energy neutral state. This is easy to study in single-hop cluster-based routing schemes, since each CH only needs to handle the intra-cluster traffic. It is more complex when a multi-hop communication model is adopted in clustered EH-WSNs, because each CH carries both intra- and inter-cluster traffic. Moreover, to maximize network information gathering in single-hop clustering, all CHs should independently transmit data to BS at their fastest rate. This is not suitable for multi-hop clustering, since the data from the CHs farther from BS has to be relayed by the ones closer to BS. If CHs farther from BS have faster data transmission rates than the ones closer to BS, their data cannot be forwarded immediately by the closer CHs, which may reduce the performance of real-time systems. Next, we theoretically analyze how to achieve perpetual network operation and maximum network information gathering in a multi-hop clustered EH-WSN.

Theoretical Analyses
In this section, by analyzing the energy consumption of nodes for data transmission, we first give the energy neutrality constraints. Under these constraints, each node can work in an energy neutral state. We then derive a constraint on the numbers of clusters in neighboring units by balancing the average energy consumption of nodes in different units. Finally, under the energy neutrality constraints, we optimize the parameters of our protocol to maximize network information gathering.

Energy Neutrality Constraints
We assume that each sensor node transmits its sensing data to the corresponding CH or BS periodically, and sensor nodes in different units may have different data transmission cycles. Let T_i (1 ≤ i ≤ m) be the data transmission cycle of nodes in unit i, and q be the length of the data packet transmitted to the CH or BS by a node during each cycle. As mentioned above, to achieve perpetual network operation, each sensor node should work in an energy neutral state. That is, during each cycle, a sensor node should consume no more energy than the total amount available to it, which comprises the residual energy accumulated before this cycle and the energy harvested from the ambient environment in the current cycle. Since CHs consume energy much faster than normal nodes, all sensor nodes within a cluster should take turns serving as the CH to distribute the energy consumption. Let n_i (1 ≤ i ≤ m) be the average number of sensor nodes per cluster in unit i; each node in this unit then serves as the CH once every n_i cycles. To maintain every sensor node in unit i in an energy neutral state, the energy consumption of nodes should satisfy the following constraints:

$$Ecm_i \le T_i P_e \quad (i = 1, 2, \cdots, m) \quad (4)$$

$$Ech_i \le n_i T_i P_e - (n_i - 1) Ecm_i \quad (i = 1, 2, \cdots, m) \quad (5)$$

where Ecm_i is the average amount of energy consumed by a non-cluster head node in unit i to transmit data to its corresponding CH; Ech_i is the average amount of energy consumed by a CH in unit i to receive data from its member nodes, aggregate this with its own data and transmit the aggregated data to its routing head or BS; P_e is the energy harvesting rate; T_i is the data transmission cycle of the nodes in unit i; and n_i is the average number of sensor nodes per cluster in unit i. Equation (4) guarantees that a normal node consumes no more energy than it can harvest during each cycle. Equation (5) guarantees that a CH consumes no more energy than the total amount available to it, including the residual energy accumulated before the current cycle and the energy harvested during this cycle.
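The two energy neutrality conditions can be checked numerically with a few lines of code. This is a minimal sketch: the member condition says a normal node spends no more than it harvests in one cycle, and the CH condition says a node serving as CH once every n_i cycles spends no more than what it harvested over those n_i cycles minus what it spent as a member in the other n_i - 1 cycles. The function name and the numbers in the usage lines are made up for illustration.

```python
def is_energy_neutral(e_ch: float, e_cm: float, n_i: int,
                      cycle: float, harvest_rate: float) -> bool:
    """Check both energy neutrality conditions for nodes in one unit.

    e_ch: energy a node spends in a cycle when it is the CH
    e_cm: energy a node spends in a cycle when it is a cluster member
    n_i: average number of nodes per cluster (CH role rotates over n_i cycles)
    cycle: data transmission cycle T_i
    harvest_rate: energy harvesting rate P_e
    """
    # Member condition: consumption per cycle is covered by harvesting.
    member_ok = e_cm <= cycle * harvest_rate
    # CH condition: harvest over n_i cycles minus member spending covers the CH cost.
    head_ok = e_ch <= n_i * cycle * harvest_rate - (n_i - 1) * e_cm
    return member_ok and head_ok


# Illustrative numbers only (arbitrary energy units).
long_cycle_ok = is_energy_neutral(e_ch=10.0, e_cm=1.0, n_i=10, cycle=2.0, harvest_rate=1.0)
short_cycle_ok = is_energy_neutral(e_ch=10.0, e_cm=1.0, n_i=10, cycle=1.5, harvest_rate=1.0)
```

In this example the longer cycle leaves enough harvested energy for the CH turn, while the shorter cycle does not, which is the trade-off between transmission rate and energy neutrality that the rest of the analysis optimizes.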
Ecm_i and Ech_i are computed by Equations (6) and (7), respectively, where q is the data packet length of each node; r_i is the average distance between a non-cluster head node and the CH within a cluster in unit i; ρ is the node density; m is the number of units; S is the sensor field size; a is the data aggregation ratio; c_i is the number of clusters in unit i; n_i is the average number of sensor nodes per cluster in unit i; and d_i is the average distance between a CH in unit i and its routing head or BS. On the right-hand side of Equation (7), the first term is the average energy consumed by a CH in unit i to receive and forward the inter-cluster data traffic. The remaining three terms are the average energy consumed by a CH in unit i to receive, aggregate and forward the intra-cluster data traffic, respectively.
As the average number of sensor nodes per cluster in unit i (1 ≤ i ≤ m) is n_i and ρ is the node density, the average cluster radius Rc_i of a CH in unit i can be calculated by

$$Rc_i = \sqrt{\frac{n_i}{\pi \rho}} \quad (8)$$

and, by adopting the same method as in [41], r_i is calculated by Equation (9), where ρ is the node density; n_i is the average number of sensor nodes per cluster in unit i; and n_i/ρ represents the average area covered by a cluster in unit i. The number c_i of clusters in unit i can be expressed in terms of the average number n_i of sensor nodes per cluster in this unit:

$$c_i = \frac{\rho S}{m n_i} \quad (10)$$

The average distance d_i between a CH in unit i and its routing head or BS is calculated by Equation (11) in terms of d_{i→BS}, the average distance between nodes in unit i and BS, which can be calculated as follows:

$$d_{i \to BS} = \frac{m}{S} \int_{R_{i-1}}^{R_i} 2 \pi x^{2} \, dx = \frac{2 \pi m}{3 S} \left( R_i^{3} - R_{i-1}^{3} \right) \quad (12)$$

where m is the number of units; S is the sensor field size; and R_i is the outer radius of unit i (R_0 = 0), which is computed as follows:

$$R_i = \sqrt{\frac{i S}{m \pi}} \quad (13)$$

We can then rewrite d_{i→BS} as follows:

$$d_{i \to BS} = \frac{2}{3} \sqrt{\frac{S}{m \pi}} \left( i^{3/2} - (i-1)^{3/2} \right) \quad (14)$$

A CH in unit i consumes much more energy than a normal node in the same unit, that is, Ecm_i < Ech_i. Thus, Equation (4) is redundant, and Equation (5) alone restricts each node to work in an energy neutral state. Based on Equation (5), the energy neutrality constraint for the sensor nodes in unit i can be re-expressed as follows:

$$T_i \ge \frac{Ech_i + (n_i - 1) Ecm_i}{n_i P_e} \quad (15)$$

Let Eav_i represent the average energy consumption of nodes per cycle in unit i, which can be expressed as follows:

$$Eav_i = \frac{Ech_i + (n_i - 1) Ecm_i}{n_i} \quad (16)$$

Then Equation (15) can be simplified as follows:

$$T_i \ge \frac{Eav_i}{P_e} \quad (17)$$

To maximize the network information gathering, nodes in each unit should transmit data to BS at the fastest rate. That is, the data transmission cycle should be minimized. As a result, the energy neutrality constraint for the nodes in unit i can be updated as follows:

$$T_i = \frac{Eav_i}{P_e} \quad (18)$$
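The geometric quantities and the cycle bound in this subsection can be sketched numerically. The code below is an illustration under stated assumptions, not the paper's implementation: the cluster radius follows from a cluster covering an area of n_i/ρ, the cluster count from ρS/m nodes per unit, the mean node-to-BS distance from averaging over a uniformly populated ring, and the shortest cycle from dividing the per-cycle average consumption by the harvesting rate. The quantities r_i and d_i (Equations (9) and (11)) are not reproduced because their exact forms are not recoverable here. All function names are hypothetical.

```python
import math


def cluster_radius(n_i: float, rho: float) -> float:
    """Radius of a disc whose area n_i / rho holds n_i nodes at density rho."""
    return math.sqrt(n_i / (math.pi * rho))


def clusters_in_unit(rho: float, field_area: float, num_units: int, n_i: float) -> float:
    """Nodes in one unit (rho * S / m) divided by nodes per cluster."""
    return rho * field_area / (num_units * n_i)


def mean_distance_to_bs(i: int, field_area: float, num_units: int) -> float:
    """Mean distance from BS to a uniformly placed node in ring i (1-based)."""
    scale = math.sqrt(field_area / (num_units * math.pi))
    return (2.0 / 3.0) * scale * (i ** 1.5 - (i - 1) ** 1.5)


def average_energy(e_ch: float, e_cm: float, n_i: float) -> float:
    """Per-cycle average: one CH turn and n_i - 1 member turns every n_i cycles."""
    return (e_ch + (n_i - 1) * e_cm) / n_i


def shortest_cycle(e_ch: float, e_cm: float, n_i: float, harvest_rate: float) -> float:
    """Smallest cycle for which the average consumption is covered by harvesting."""
    return average_energy(e_ch, e_cm, n_i) / harvest_rate


# Illustrative values: field radius 100 m, 4 units.
S = math.pi * 100.0 ** 2
d1 = mean_distance_to_bs(1, S, 4)          # innermost unit is a disc of radius 50 m
t_min = shortest_cycle(10.0, 1.0, 10, 1.0)
```

For the innermost unit the mean distance reduces to two thirds of its radius, the familiar result for a uniformly populated disc, which is a quick sanity check on the ring formula.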

Balancing the Average Energy Consumption of Nodes in Different Units
For any sensor node in unit i (1 ≤ i ≤ m), if Equation (18) is satisfied, this node can work in an energy neutral state. We now explore how to maximize network information gathering under the energy neutrality constraint. Assuming that information can be continuously collected by each sensor node in the network, the question is how to determine the information transmission cycle of a sensor node based on its available energy.
For a CH in a unit closer to BS, once it has received data from a CH farther from BS, it should forward this data immediately if it has enough energy; otherwise, it discards this data. This is reasonable for two reasons. First, real-time transmission of data is more meaningful than non-real-time transmission. Second, sensor nodes in a unit closer to BS carry heavier relay traffic and consume energy much faster than the ones in a unit farther from BS, so if a CH does not have enough energy to forward the received data now, it still will not have enough energy to forward it later, because after harvesting some energy it should give priority to forwarding newly received data. Consequently, if T_j < T_i (j > i), packet loss will occur, since CHs in the unit closer to BS have a lower data transmission rate and cannot forward in time the data from the CHs in the unit farther from BS. If T_j > T_i (j > i), network information gathering cannot be maximized, as the nodes in the unit farther from BS could still increase their data transmission rate. Thus, to solve these problems, the constraint T_i = T_j (i ≠ j) should be satisfied. According to Equation (18), the average energy consumption of nodes in different units should be balanced to guarantee this constraint. That is,

$$Eav_i = Eav_{i+1} \quad (19)$$

where Eav_i and Eav_{i+1} are the average energy consumption of nodes per cycle in unit i and i + 1, respectively. Combining Equations (6), (7), (9), (10), (16) and (19), we obtain Equation (20), which relates c_i and c_{i+1}, the numbers of clusters in unit i and i + 1, through the parameters ϕ, ψ and ω given by Equations (21)-(23), where m is the number of units in the network; q is the data packet length of each node; ρ is the node density; S is the sensor field size; a is the data aggregation ratio and d_i is the average distance between a CH in unit i and its routing head or BS.
According to Equations (11) and (14), we have

$$d_i > d_{i+1} \quad (24)$$

which implies that the CHs in the unit closer to BS have a longer inter-cluster communication distance. Thus, we have ω > 0 according to Equation (23). In addition, based on Equations (20)-(22), we have

$$c_i > c_{i+1} \quad (25)$$

which shows that, to balance the average energy consumption of nodes in different units, the unit closer to BS should have strictly more clusters than the unit farther from BS. As a result, if Equation (20) is met, the average energy consumption of nodes in different units is balanced, and the constraint T_i = T_j (i ≠ j) can be satisfied. Let T denote the network data transmission cycle; then we have

$$T = T_1 = T_2 = \cdots = T_m \quad (26)$$

According to Equations (11), (14) and (21)-(23), the number of units m must be determined in order to calculate ϕ, ψ and ω in Equation (20). We find the optimal value of m in the following subsection.

Maximizing Network Information Gathering
When designing clustering protocols for EH-WSNs, another main objective is to maximize network information gathering, which requires minimizing the network data transmission cycle T. According to Equations (18) and (26), we can realize this objective by minimizing the average energy consumption Eav_1 of nodes in the first unit. Combining Equations (6), (7), (9)-(11), (14) and (16), Eav_1 is calculated by Equation (27), where q is the data packet length of each node; m is the number of units; ρ is the node density; S is the sensor field size; c_1 is the number of clusters in the first unit; a is the data aggregation ratio; and ∆ is a constant with respect to m and c_1, expressed by Equation (28). According to Equation (27), the derivative of Eav_1 with respect to c_1 is given by Equation (29). From Equation (29), we find d(Eav_1)/dc_1 < 0, which means c_1 should be maximized to minimize Eav_1. When all sensor nodes in the first unit are selected as CHs, c_1 reaches its maximum value, which can be expressed as follows:

$$c_1 = \frac{\rho S}{m} \quad (30)$$

where S is the sensor field size; ρ is the node density and m is the number of units. According to Equations (27) and (30), we find d(Eav_1)/dm > 0, which means the number of units m should be minimized to minimize Eav_1.
As mentioned above, we adopt the free-space propagation model for the inter-cluster communication. The maximum inter-cluster data transmission distance d_max must therefore satisfy the constraint in Equation (31), which bounds d_max by the distance threshold d_0 calculated by Equation (2). We define the size of a unit as the difference between its outer radius and its inner radius. For any unit i (1 ≤ i ≤ m), the inner radius is equal to the outer radius of unit i-1, so its size Ru_i can be expressed as follows:

$$Ru_i = R_i - R_{i-1} \quad (32)$$

where R_i is the outer radius of unit i (R_0 = 0). Combining Equations (13) and (32), the size Ru_i of unit i (1 ≤ i ≤ m) can be recalculated as follows:

$$Ru_i = \sqrt{\frac{S}{m \pi}} \left( \sqrt{i} - \sqrt{i-1} \right) \quad (33)$$

where S is the sensor field size and m is the number of units. From Equation (33), we find that a unit closer to BS has a larger size than one farther from BS. For a CH in unit i (1 < i ≤ m), we assume it tends to select its routing head from the CHs in unit i-1 in the direction of BS. Then, among the CHs in unit i (1 ≤ i ≤ m), the one nearest the outer edge may have the longest inter-cluster data transmission distance d^(i)_max, which is estimated by Equation (34) from Ru_{i-1} and Ru_i, the sizes of unit i-1 and i (Ru_0 = 0). The maximum inter-cluster data transmission distance d_max over all units is expressed by Equation (35). Combining Equations (33)-(35), d_max can be recalculated as in Equation (36), in terms of the sensor field size S and the number of units m. Based on Equations (2), (31) and (36), we obtain the bound on m in Equation (37). As mentioned above, to minimize Eav_1, the optimal value of m should be equal to its minimum value, which is expressed by Equation (38). As a result, the number of clusters in each unit can be determined based on Equations (11), (14), (20)-(23), (30) and (38).
In addition, if Equations (18), (26)-(28), (30) and (38) are satisfied simultaneously, the network information gathering is maximized, since the network data transmission cycle T reaches its minimum value, which is expressed by Equation (39), where m is the optimal number of units; a is the data aggregation ratio; q is the data packet length of each node; S is the sensor field size and P_e is the energy harvesting rate.
Note that, only the number c 1 of clusters in the first unit is the optimal value since it is obtained by minimizing the average energy consumption Eav 1 of nodes in this unit. In addition, the numbers of clusters in other units are acquired by balancing the average energy consumption of nodes in different units. Thus, if nodes in each unit transmit data based on the minimum network data transmission cycle T calculated by Equation (39), only the data transmission rate of nodes in the first unit is maximized.
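The choice of the number of units can be explored with a small numerical sketch. Two things are assumptions made here and not confirmed by the text: the longest hop of unit i is estimated as the sum of the sizes of units i-1 and i (a CH on the outer edge of unit i reaching a routing head on the inner edge of unit i-1), and the free-space constraint is taken as a strict inequality against d_0. The function names and the example values are hypothetical.

```python
import math


def unit_sizes(field_area: float, num_units: int) -> list[float]:
    """Width of each ring: outer radius minus inner radius, equal-area split."""
    scale = math.sqrt(field_area / (num_units * math.pi))
    return [scale * (math.sqrt(i) - math.sqrt(i - 1)) for i in range(1, num_units + 1)]


def longest_hop(field_area: float, num_units: int) -> float:
    """Assumed worst-case inter-cluster hop: max over i of Ru_(i-1) + Ru_i."""
    sizes = [0.0] + unit_sizes(field_area, num_units)  # sizes[0] plays the role of Ru_0 = 0
    return max(sizes[i - 1] + sizes[i] for i in range(1, num_units + 1))


def smallest_unit_count(field_area: float, d0: float, max_units: int = 1000) -> int:
    """Smallest m whose assumed worst-case hop stays below the threshold d0."""
    for m in range(1, max_units + 1):
        if longest_hop(field_area, m) < d0:
            return m
    raise ValueError("no unit count up to max_units satisfies the constraint")


# Illustrative values: field radius 100 m, threshold d0 = 87.7 m.
S = math.pi * 100.0 ** 2
m_min = smallest_unit_count(S, 87.7)
```

Searching upward from m = 1 mirrors the argument in the text: the average energy consumption in the first unit grows with m, so the smallest m that keeps every hop within free-space range is the one to use.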

The Detail of Our Protocol
In this section, we describe the details of our protocol, which we call Multi-hop Energy Neutral Clustering (MENC). As shown in Figure 3, the procedure of MENC consists of a one-time initialization phase and many repeated rounds, each of which is further divided into a topology formation phase and a steady-state phase. The duration of each round is exactly the minimum network data transmission cycle T.
During each round, sensor nodes in every unit are re-grouped into several clusters, each consisting of one CH and some cluster member nodes. Since a cluster member node consumes less energy than it can harvest during each round, its accumulated residual energy reaches a certain threshold after several rounds (the energy accumulation cycle), and it then becomes eligible to be selected as a CH. Since the units closer to BS have more clusters, the nodes in these units have a shorter energy accumulation cycle and a smaller energy threshold. For any node in unit i (1 ≤ i ≤ m), its energy accumulation cycle is n_i·T, since it serves as the CH once every n_i rounds. In addition, based on Equations (16), (18) and (26), its energy threshold ETh_i is calculated by Equation (40), where Ech_i is the average energy consumed by a node in unit i when serving as a CH; n_i is the average number of sensor nodes per cluster in unit i; T is the minimum network data transmission cycle; P_e is the energy harvesting rate; and Ecm_i is the average energy consumed by a non-cluster head node in unit i to transmit data to its corresponding CH.
A sensor node only receives the message broadcast by BS with the lowest power level and determines which unit it belongs to based on this message. This node also estimates its distance to BS according to the received signal strength.
CHs in each unit are selected in the topology formation phase, and each CH in unit i (1 < i ≤ m) selects a routing head from the CHs in unit i-1. All CHs act as a routing backbone and cooperatively transmit data to BS. Based on this backbone, each sensor node begins to transmit its sensing data to BS during the steady-state phase. To avoid signal collisions on the wireless channel, each CH creates an individual time division multiple access (TDMA) schedule for its cluster members. Every cluster member node transmits its sensing data to the CH in its own time slot and goes dormant during other time slots. Once a CH has received the data from its member nodes, it aggregates this with its own data using data compression techniques and then transmits the data to its routing head or BS using a randomly selected code division multiple access (CDMA) code.
In the rest of this section, we give the details of the topology formation phase, which includes cluster formation and routing head selection. The pseudo code of the topology formation phase is shown in Figure 4.

Clusters Formation
To construct the cluster topology, four steps are performed sequentially. In addition, they are tentative CHs selection, final CHs contention, tentative clusters formation and final clusters formation.
Our CH selection mechanism is similar to that adopted in LEACH. At the beginning of each round, a sensor node that is eligible to be a CH randomly chooses a number within the range [0,1]. If this number is less than a threshold, this node serves as a CH in the current round. The threshold Th_{i,j} for any sensor node j in unit i can be expressed as follows:

$$Th_{i,j} = \begin{cases} \dfrac{p_i}{1 - p_i \left( r \bmod \dfrac{1}{p_i} \right)}, & j \in G \\ 0, & \text{otherwise} \end{cases} \quad (41)$$

where p_i is the desired percentage of CHs in unit i; r is the current round and G is the set of sensor nodes that are eligible to be CHs. Node j belongs to set G if it meets the following two conditions: first, it has not been a CH during the last 1/p_i rounds; second, its accumulated energy is more than the energy threshold ETh_i. Since a sensor node in unit i serves as a CH once every n_i rounds, p_i can be calculated by

$$p_i = \frac{1}{n_i} \quad (42)$$

The cluster radius of a CH in unit i is Rc_i, which is calculated by Equation (8). To avoid the case in which two CHs in the same unit are within each other's cluster radius, the nodes selected as CHs keep the CH state tentatively (tentative CHs) and further compete to become final CHs. Each tentative CH first broadcasts a "tentative CH" message (including node ID, unit number and CH status) within the corresponding cluster radius. By receiving such messages broadcast by other CHs in the same unit and storing them in set S_t, each tentative CH learns its neighbor CHs within its cluster radius. If a tentative CH has no neighbor CHs, it immediately announces itself to be a final CH by broadcasting the "final CH" message (including node ID, unit number and CH status) within its cluster radius. Its neighbor sensor nodes learn of its successful election by receiving this message and storing it in set S_f. If a tentative CH has several neighbor CHs, it announces itself to be a final CH in a randomly selected time slot. On hearing the announcement, its neighbor CHs abandon their declaration to be a final CH and return to the normal state.
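The tentative CH election can be sketched as follows. The threshold below uses the standard LEACH form, p / (1 - p (r mod 1/p)), with p_i = 1/n_i as stated in the text; applying that exact form here is an assumption, and eligibility (membership of set G) is passed in as a boolean because the energy threshold ETh_i is not reproduced. The function names are hypothetical.

```python
import random


def ch_threshold(n_i: int, current_round: int, eligible: bool) -> float:
    """LEACH-style election threshold for a node in a unit with n_i nodes per cluster."""
    if not eligible:
        return 0.0  # nodes outside set G never elect themselves
    p_i = 1.0 / n_i  # desired fraction of CHs in the unit
    # 1 / p_i equals n_i, so the rotation period is n_i rounds.
    return p_i / (1.0 - p_i * (current_round % n_i))


def becomes_tentative_ch(n_i: int, current_round: int, eligible: bool,
                         rng: random.Random) -> bool:
    """Draw a number in [0, 1) and compare it with the threshold."""
    return rng.random() < ch_threshold(n_i, current_round, eligible)


rng = random.Random(42)  # fixed seed so the example is repeatable
first_round = ch_threshold(5, 0, True)
last_round = ch_threshold(5, 4, True)
elected = becomes_tentative_ch(5, 4, True, rng)
```

The threshold rises over a rotation period and reaches 1 in its last round, so any node that is still eligible at that point is certain to elect itself.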
After all final CHs are determined, each normal sensor node chooses a CH in the same unit and joins its cluster. If a normal node has received one or more "final CH" messages, it randomly selects one of the corresponding CHs and joins its tentative cluster; otherwise, this normal node (an isolated node) is given another chance to compete to be a CH. If the residual energy of an isolated node is more than the corresponding energy threshold, this node broadcasts a "final CH" message in a randomly selected time slot. An isolated node that receives one or more "final CH" messages before broadcasting its own gives up broadcasting and randomly selects one of the message broadcasters, joining its tentative cluster; otherwise, it becomes a final CH.
A tentative cluster in unit i automatically becomes a final cluster if it contains no more than n_i sensor nodes. However, if the cluster contains more than n_i sensor nodes, the corresponding CH may not have enough energy to bear the intra-cluster data traffic. To avoid this problem, such a tentative cluster is divided into several final clusters with no more than n_i sensor nodes in each. The CH of this tentative cluster proceeds as follows: first, it selects enough additional CHs (we refer to them as the newly selected CHs) with more residual energy from the normal nodes within the same cluster; second, it chooses the nearest n_i-1 normal nodes as its final member nodes; third, it randomly selects fewer than n_i normal nodes as cluster member nodes for each newly selected CH; finally, it broadcasts a message including the role (normal node or newly selected CH) of each node and the corresponding CH of each normal node within the cluster.
Then after all final clusters are formed, the CHs in these clusters will select their own routing heads using the scheme described below.

Routing Heads Selection
Since CHs in the first unit are very close to the BS, they transmit data to the BS directly. As shown in Equation (25), the units closer to the BS have more CHs than the ones farther from it. Let $m_i = 2 \times \lceil c_{i-1}/c_i \rceil$ ($1 < i \le m$). Any CH $j$ in unit $i$ (denoted $ch_{i,j}$) randomly selects $m_i$ CHs from set $Sch_{i-1}$ as its routing head candidates and stores them in set $S_{hc}$. The candidate with the most residual energy is selected as its final routing head. This random candidate selection mechanism mitigates the imbalance of inter-cluster traffic. The set $Sch_{i-1}$ ($1 < i \le m$) is defined in terms of the following quantities: $dis(ch_{i,j}, ch_{i-1,k})$ is the distance between CHs $ch_{i,j}$ and $ch_{i-1,k}$; $dis(ch_{i,j}, BS)$ is the distance from CH $ch_{i,j}$ to the BS; and $d_{i-2 \to BS}$ is the average distance from the nodes in unit $i-2$ to the BS ($d_{0 \to BS} = 0$).
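The routing head choice can be illustrated with a short sketch. It takes the candidate set $Sch_{i-1}$ as given, since its defining condition is not reproduced here, and it assumes that a CH with fewer than $m_i$ available candidates simply uses all of them. The function names and the `(ch_id, residual_energy)` tuple layout are hypothetical.

```python
import math
import random


def num_candidates(c_prev, c_cur):
    """m_i = 2 * ceil(c_{i-1} / c_i), the number of routing head candidates."""
    return 2 * math.ceil(c_prev / c_cur)


def select_routing_head(candidate_set, m_i, rng):
    """Pick m_i random candidates from Sch_{i-1}, then the one with most residual energy.

    candidate_set is a list of (ch_id, residual_energy) tuples.
    Returns the chosen CH id, or None if there are no candidates.
    """
    if not candidate_set:
        return None
    # Assumption: if fewer than m_i candidates exist, consider all of them.
    k = min(m_i, len(candidate_set))
    shortlist = rng.sample(candidate_set, k)
    return max(shortlist, key=lambda c: c[1])[0]
```

Because each CH draws its own random shortlist, two CHs in the same unit need not converge on the same high-energy head, which is how the random step spreads inter-cluster traffic.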

Performance Evaluations
In this section, we evaluate the performance of our Multi-hop Energy Neutral Clustering (MENC) algorithm via simulations performed on the MATLAB platform. We first compare the energy harvesting aware routing protocols mentioned above, as shown in Table 1. We then evaluate the performance of our protocol by comparing it with the most closely related ones.
Since this paper is devoted to constructing the optimal multi-hop clustering architecture for EH-WSNs, we compare our protocol with all the energy harvesting aware clustering protocols listed in Table 1. Because the CH selection mechanism in our protocol is related to that in LEACH, we also compare MENC with LEACH. As the protocol proposed in [29] is not named, we call it P-29 for simplicity.

Parameters Setup
The parameters used throughout the simulations are listed in Table 2. Since all sensor nodes in the first unit act as CHs and transmit data to the BS directly in MENC, sensor nodes in the first unit in LEACH, AEHAC, EP-LEACH, ENC and P-29 adopt the same method. In addition, the probability $P_C$ that a node outside the first unit is selected as a CH in LEACH is computed by Equation (44), where $m$ is the optimal number of units in MENC; $S$ is the area of the sensor field; $c_i$ is the number of clusters in unit $i$; and $n$ is the total number of sensor nodes in the network. Equation (44) indicates that the probability of a node being selected as a CH in LEACH is equal to the average probability of a node being a CH in MENC.
The probability of a node being a Center Node (CN) in ENC is also computed by Equation (44), to keep the number of clusters consistent between MENC and ENC. Likewise, the initial probability of a node being the CH in AEHAC is calculated by Equation (44) in the following simulations. Moreover, we set the predefined number of clusters $N_C$ in P-29 to the total number of clusters in MENC.

Simulation Results
In this section, we simulate a total of 500 rounds of data transmission. We suppose that, before transmitting data to the destination in each round, every CH checks whether it has enough energy given the distance to the destination. If a CH does not have enough energy for data transmission, a cluster failure occurs and the CH discards the data packet in the current round. As the CH in each cluster has a dedicated relay node (RN) in protocol P-29, a cluster failure there means that an RN does not have enough energy to relay data. We adopt four metrics to evaluate the performance of MENC: Average Cluster Failure Times per Round (ACFTR), Average Network Throughput per Round (ANTR), Average Network Throughput per Second (ANTS) and Average Energy Consumption per Round (AECR). Here ANTR is the average number of data packets successfully collected by the BS per round, and ANTS is the average number of data packets successfully collected by the BS per second.
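The bookkeeping behind three of these metrics can be sketched as below. The sketch treats the energy required for a transmission as a given input rather than computing it from distance, and it assumes that each round lasts exactly one data transmission cycle, so that ANTS is ANTR divided by the cycle length. All function names and the tuple layouts are hypothetical.

```python
def run_round(cluster_heads):
    """Simulate one round of sends.

    cluster_heads is a list of (available_energy, required_energy, packets).
    A CH without enough energy causes a cluster failure and drops its packets.
    Returns (cluster_failures, packets_delivered).
    """
    failures = 0
    delivered = 0
    for available, required, packets in cluster_heads:
        if available >= required:
            delivered += packets
        else:
            failures += 1
    return failures, delivered


def summarize(rounds, cycle_seconds):
    """Compute (ACFTR, ANTR, ANTS) from per-round (failures, delivered) pairs."""
    n = len(rounds)
    acftr = sum(f for f, _ in rounds) / n
    antr = sum(d for _, d in rounds) / n
    # Assumption: one round takes one data transmission cycle.
    ants = antr / cycle_seconds
    return acftr, antr, ants
```

For example, two rounds with outcomes `(1, 10)` and `(0, 20)` and a 2-second cycle give an ACFTR of 0.5, an ANTR of 15 packets per round and an ANTS of 7.5 packets per second.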
We first evaluate the performance of MENC for the case in which sensor nodes are uniformly deployed in a circular sensor field with area $4 \times 10^4$ m$^2$ and the data transmission cycle ranges from $0.4T$ to $2T$. Here $T$ is the minimum network data transmission cycle calculated by Equation (39). As the node density $\rho$ is 1 node/100 m$^2$, the total number of sensor nodes is 400. According to Equation (38), the number of units is 4.
Figure 5 compares the Average Cluster Failure Times per Round (ACFTR) of the six clustering protocols for different data transmission cycles. As the data transmission cycle increases, the ACFTR of all six protocols decreases until it reaches the minimum value of 0. This is because each CH or RN has more time to harvest energy from the ambient environment as the data transmission cycle increases. The figure also shows that MENC outperforms the other five protocols in terms of ACFTR. This is due to the energy neutrality constraint considered in MENC. Under this constraint, a sensor node is not eligible to be a CH until it has harvested enough energy from the environment, and the number of sensor nodes within a cluster does not exceed what the CH expects.
The protocol ENC adopts a Cluster Head Group (CHG) mechanism that allows several sensor nodes to serve as the CH in turn within a cluster to share the traffic load. It has a smaller ACFTR than EP-LEACH, AEHAC, P-29 and LEACH. However, a sensor node with less energy may be selected as a member of the CHG, which increases the probability of cluster failure, so ENC performs worse than MENC in terms of ACFTR. As the data transmission cycle increases, EP-LEACH at first has a larger ACFTR than P-29 and then a smaller one. This is because P-29 selects a dedicated RN for the CH within each cluster, so that the traffic load within a cluster is shared by the RN and the CH. P-29 therefore has a smaller probability of cluster failure than EP-LEACH when sensor nodes do not have enough time to harvest energy from the environment. When sensor nodes have more time to harvest energy, EP-LEACH has a smaller ACFTR than P-29, since sensor nodes with more available energy have a higher probability of being the CH in EP-LEACH. In AEHAC, an energy threshold is introduced to ensure that sensor nodes with too little energy switch to sleep mode to save energy. Moreover, the sensor nodes with more available energy have a higher probability of being the CH. Thus, AEHAC has a smaller ACFTR than EP-LEACH and P-29.
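The deployment used in this first scenario (400 nodes placed uniformly in a circular field of area $4 \times 10^4$ m$^2$) can be reproduced with a short sketch. It assumes the field is centered at the origin and uses the standard square-root radius transform to obtain a uniform density over the disk. The function name is hypothetical, and the position of the BS is not modeled.

```python
import math
import random


def deploy_uniform_disk(area_m2, density_per_m2, rng):
    """Place round(area * density) nodes uniformly in a disk of the given area.

    Returns (radius, nodes), where nodes is a list of (x, y) positions.
    """
    n = round(area_m2 * density_per_m2)
    radius = math.sqrt(area_m2 / math.pi)
    nodes = []
    for _ in range(n):
        # sqrt of a uniform variate gives uniform density per unit area.
        r = radius * math.sqrt(rng.random())
        theta = 2.0 * math.pi * rng.random()
        nodes.append((r * math.cos(theta), r * math.sin(theta)))
    return radius, nodes
```

With `area_m2 = 4e4` and `density_per_m2 = 1 / 100`, this yields 400 nodes in a disk of radius about 112.8 m.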
Figure 6 compares the Average Network Throughput per Round (ANTR) of the six clustering protocols for different data transmission cycles. As the data transmission cycle increases, the ANTR of all six protocols increases until it reaches the maximum value. This is because the ACFTR of the six protocols decreases as the data transmission cycle increases. Before the ANTR of the six protocols reaches the maximum value, MENC has a larger ANTR than the other five protocols. This is because MENC has the smallest ACFTR compared with ENC, EP-LEACH, AEHAC, P-29 and LEACH. In addition, MENC has higher energy efficiency than these protocols since it adopts a multi-hop communication method for inter-cluster data transmission.
Figure 7 compares the Average Network Throughput per Second (ANTS) of the six clustering protocols for different data transmission cycles. As the data transmission cycle increases, the ANTS of MENC, ENC, EP-LEACH, AEHAC and P-29 decreases, while that of LEACH first increases and then decreases. This can be explained as follows: as the data transmission cycle increases, the ANTR increases for all six protocols, and for LEACH the ANTR increases faster than the data transmission cycle when the cycle is smaller than $0.8T$. The figure also shows that MENC has a larger ANTS than the other protocols, and that the difference in ANTS among the six protocols tends to zero as the data transmission cycle increases. This is because MENC has a larger ANTR than the other protocols, and the ANTR of all six protocols tends to the maximum value as the data transmission cycle increases.
Based on the above results, we can conclude that $T$ is the optimal data transmission cycle in MENC. On the one hand, the network-wide energy neutral state is well guaranteed when the data transmission cycle is equal to or larger than $T$, since the ACFTR is nearly zero and the ANTR is nearly maximal according to Figures 5 and 6. On the other hand, as the ANTS decreases rapidly with the increase of the data transmission cycle, the ANTS is maximized, on the premise of guaranteeing perpetual network operation, when the data transmission cycle is equal to $T$.
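The argument for choosing $T$ can be made explicit with a short relation. It assumes, which the text does not state, that each round lasts exactly one data transmission cycle $T_c$, so that the per-second throughput is the per-round throughput divided by the cycle length. The symbols $T_c$ and $\mathrm{ANTR}_{\max}$ are introduced here for illustration only.

```latex
\mathrm{ANTS}(T_c) = \frac{\mathrm{ANTR}(T_c)}{T_c},
\qquad
\mathrm{ANTR}(T_c) \approx \mathrm{ANTR}_{\max} \ \text{for } T_c \ge T
\;\Longrightarrow\;
\mathrm{ANTS}(T_c) \approx \frac{\mathrm{ANTR}_{\max}}{T_c} \le \frac{\mathrm{ANTR}_{\max}}{T}.
```

Under this assumption, once the ANTR has saturated, lengthening the cycle beyond $T$ can only lower the ANTS. For cycles shorter than $T$, the ANTS rises only where the ANTR grows faster than $T_c$, which matches the behavior reported for LEACH below $0.8T$.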
To observe the energy consumption of nodes in the network, Figure 8 gives the Average Energy Consumption per Round (AECR) for different data transmission cycles. The AECR of the six protocols shows a downward trend as the data transmission cycle increases. This is because more energy can be harvested by each node with the increase of the data transmission cycle. When the data transmission cycle is smaller than $0.8T$, MENC has a larger AECR than the other five protocols since it has fewer cluster failures. According to Figures 5, 6 and 8, we can conclude that MENC has the highest energy efficiency among the six protocols. This can be explained as follows: when the data transmission cycle is larger than $1.2T$, MENC has a smaller AECR than the other five protocols and nearly the same ACFTR and ANTR as the other protocols, and a node has higher energy efficiency if it consumes less energy for transmitting the same amount of data. Since MENC adopts a multi-hop communication method while the other five protocols adopt a single-hop one, this conclusion is also consistent with the fact that multi-hop communication has higher energy efficiency than single-hop communication.
In the following simulations, we evaluate the performance of MENC for the case in which the sensor field size $S$ ranges from $2 \times 10^4$ m$^2$ to $6 \times 10^4$ m$^2$ and the data transmission cycle is equal to $T$, which is calculated by Equation (39).
Figure 9 compares the ACFTR of the six clustering protocols for different sensor field sizes. The ACFTR of MENC is larger than that of the other five protocols when $S$ is equal to $2 \times 10^4$ m$^2$, and it tends to zero as $S$ increases. This is because a multi-hop communication method is adopted in MENC. MENC works well in large-scale networks, as sensor nodes need not transmit data directly to the BS over a long distance. However, when the network size is small, data from the nodes farther from the BS still has to be relayed by the ones closer to the BS, which may degrade the performance of the protocol. The ACFTR of ENC, EP-LEACH, AEHAC, P-29 and LEACH increases with the sensor field size $S$. This is because a single-hop communication model is adopted by these five protocols: as $S$ increases, CHs or RNs have a longer data transmission distance to the BS, which increases the probability of cluster failure.
Figure 10 compares the ANTR of the six clustering protocols for different sensor field sizes. The ANTR of all six protocols increases with the sensor field size $S$. This is because the total number of sensor nodes is proportional to $S$, so more data can be collected and transmitted to the BS as $S$ increases. When $S$ is equal to $2 \times 10^4$ m$^2$, MENC has a smaller ANTR than the other five protocols since its ACFTR is the largest among these protocols.

Conclusions
In this paper, we present a Multi-hop Energy Neutral Clustering (MENC) algorithm to construct the optimal multi-hop clustering architecture in EH-WSNs, for the purpose of achieving perpetual network operation and maximizing network information gathering. In MENC, the sensor field is divided into several units of equal size, and the sensor nodes in each unit are grouped into clusters. By analyzing the energy consumption of intra- and inter-cluster data transmission, we give the energy neutrality constraints, which guarantee that each sensor node can work perpetually with consistent data delivery. Furthermore, to maximize network information gathering, we optimize the parameters that appear in our proposed protocol using convex optimization techniques; these parameters include the optimal number of units, the number of clusters in each unit and the minimum network data transmission cycle. Extensive simulation results verify that MENC can avoid cluster failures and guarantee the perpetual operation of the network. In addition, compared to LEACH and recent energy harvesting aware clustering protocols, MENC is more energy-efficient since it can provide larger network throughput.