DCPVP: Distributed Clustering Protocol Using Voting and Priority for Wireless Sensor Networks

This paper presents a new clustering protocol for designing energy-efficient hierarchical wireless sensor networks (WSNs). It divides the distributed sensor network into virtual sensor groups to provide scalability and prolong the network lifetime in large-scale applications. The proposed approach is a distributed clustering protocol called DCPVP, which is based on voting and priority ideas. In the DCPVP protocol, the size of clusters is based on the distance of nodes from the data link, such as the base station (BS), and on the local node density. The cluster heads are elected based on the mean distance from neighbors, the remaining energy and the number of times a node has been elected as cluster head. The performance of the DCPVP protocol is compared with some well-known clustering protocols in the literature, such as the LEACH, HEED, WCA, GCMRA and TCAC protocols. The simulation results confirm that the prioritizing- and voting-based election ideas decrease the construction time and the energy consumption of the clustering process in sensor networks and consequently improve the lifetime of networks with limited resources and battery-powered nodes in harsh and inaccessible environments.


The LEACH Protocol
The Low-Energy Adaptive Clustering Hierarchy (LEACH) protocol [19], proposed by Heinzelman et al., is one of the basic clustering and routing protocols in WSNs and is used by many subsequent clustering and routing protocols. The main idea of LEACH is to select cluster heads (CHs) by rotation, so that the high energy consumption of communicating with the BS is spread among all the nodes.
The operation of LEACH consists of rounds, and each round consists of two phases: the set-up phase and the steady-state phase. In the set-up phase the clusters are formed, and in the steady-state phase data is delivered to the BS. In the set-up phase, each node decides whether or not to become a CH for the current round. The decision is based on the suggested percentage of CHs for the network and the number of times the node has been a CH so far. The node generates a random number between 0 and 1 and becomes a CH for the current round if the number is lower than the threshold T(i), as follows:
$$T(i) = \begin{cases} \dfrac{P}{1 - P \left( r \bmod \frac{1}{P} \right)} & \text{if } i \in G \\ 0 & \text{otherwise} \end{cases}$$
where P is the desired percentage of CHs, r is the number of the current round, and G is the set of nodes that have not been elected as CHs in the last 1/P rounds [16]. When a node is elected as CH, it broadcasts an advertisement packet. According to the received signal strength, the other nodes decide which cluster to join [20,21].
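The self-election test described above can be sketched in a few lines of Python. This is only an illustration, assuming the standard LEACH threshold T(i) = P / (1 - P (r mod 1/P)) for nodes in G and 0 otherwise; the function names and the `rng` parameter are hypothetical and not part of the protocol specification.

```python
import random

def leach_threshold(p, r, in_g):
    """Threshold T(i) in round r; p is the desired fraction of CHs.

    in_g is True when the node has not been a CH in the last 1/p rounds.
    """
    if not in_g:
        return 0.0
    period = round(1 / p)  # number of rounds in one rotation cycle
    return p / (1 - p * (r % period))

def elects_itself(p, r, in_g, rng=random):
    """A node becomes CH when its random draw in [0, 1) is below T(i)."""
    return rng.random() < leach_threshold(p, r, in_g)
```

With P = 0.05 the threshold starts at 0.05 and rises towards 1 in the last round of each 20-round cycle, so every node that has not yet served is eventually forced to serve.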

The HEED Protocol
The Hybrid Energy-Efficient Distributed Clustering (HEED) protocol [22], proposed by Younis and Fahmy, is a multi-hop clustering protocol which provides energy-efficient clustering. Unlike the LEACH protocol, which selects nodes as CHs randomly, HEED selects the CHs based on residual energy and intra-cluster communication cost. One of the main ideas of HEED is to achieve a balanced distribution of CHs throughout the network. Moreover, the probability of two nodes within each other's communication range becoming CHs at the same time is very small in the HEED protocol. Initially, $C_{prob}$, an initial percentage of CHs among all nodes, is set, on the assumption that an optimal percentage cannot be computed in advance. The probability with which a node becomes a CH is:
$$CH_{prob} = C_{prob} \times \frac{E_{residual}}{E_{max}}$$
where $E_{residual}$ is the estimated current energy of the node, and $E_{max}$ is a reference maximum energy, which is equal for all nodes. The value of $CH_{prob}$ is not allowed to be less than a certain threshold, and the threshold is inversely proportional to $E_{max}$. After that, each node executes several iterations to find its CH. In addition, CHs forward data to the BS using a multi-hop communication scheme [23].
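As a small illustration of the probability just described, the sketch below computes a node's CH probability as the initial percentage scaled by the ratio of residual to maximum energy, with a lower bound. The function name and the default lower bound `p_min` are assumptions made for this example, not values taken from this paper.

```python
def heed_ch_prob(c_prob, e_residual, e_max, p_min=1e-4):
    """CH probability: c_prob * e_residual / e_max, clamped to [p_min, 1].

    p_min is an assumed floor; the text only states that such a
    threshold exists and is inversely proportional to e_max.
    """
    return min(1.0, max(p_min, c_prob * e_residual / e_max))
```

A node with half of its energy left and an initial percentage of 5% would therefore start with a CH probability of 2.5%.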

The WCA Protocol
The Weighted Clustering protocol (WCA) [24], proposed by Chatterjee et al., is based on the number of neighbors of each node and takes the movement of nodes into account. The CH election is based on node degree (number of neighbors), transmit and receive energy, and residual energy. To ensure that CHs are not overloaded or subject to high energy consumption, a threshold sets the maximum number of cluster members. In other words, the cluster size is limited [25,26]. Because the CH election process is not run periodically, the amount of computation is reduced. Nodes are elected as CH according to their weight, which is:
$$W_v = w_1 \Delta_v + w_2 D_v + w_3 M_v + w_4 P_v$$
where $v$ is the ID of the node, $\Delta_v$ is obtained by subtracting the threshold from the number of neighbors, $D_v$ is the sum of the distances of the node from all its neighbors, $P_v$ represents the consumed energy and $M_v$ indicates the mobility. The node with the minimum weight is elected as CH. This process then iterates until each node either finds a cluster or becomes a CH.
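The weight-based election described above can be sketched as follows. The sketch assumes a weighted sum of the four factors named in the text (degree difference, sum of distances, mobility and consumed energy) and uses the absolute degree difference; the weighting factors in `w` are illustrative placeholders, and all names are hypothetical.

```python
def wca_weight(degree, threshold, dist_sum, mobility, energy_used,
               w=(0.7, 0.2, 0.05, 0.05)):
    """Combined weight of one node; lower is better.

    The tuple w holds assumed weighting factors for the four terms.
    """
    w1, w2, w3, w4 = w
    delta = abs(degree - threshold)  # distance from the ideal cluster size
    return w1 * delta + w2 * dist_sum + w3 * mobility + w4 * energy_used

def elect_ch(weights):
    """Return the node ID with the minimum weight (ties: lowest ID)."""
    return min(weights, key=lambda v: (weights[v], v))
```

Calling `elect_ch` repeatedly on the nodes that are still unclustered mirrors the iteration in which every node either joins a cluster or becomes a CH.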

The GCMRA Protocol
The Energy efficient Grid based Clustering and Routing Protocol [27], proposed by Jana and Jannu, is a location-based method that divides the whole region into several grids. Nodes in every grid form a cluster. After the cluster forming step, cluster members elect the most suitable node as CH. According to the transmission range of nodes and considering the fact that every node in each cluster should be able to communicate with every node in eight-neighbor clusters, the grid size is calculated as R = x/2.83. On the other hand, the number of clusters can be calculated by knowing the grid and the network size, so the number of clusters in this method is fixed.
After the clusters are found, each node calculates the sum of its distances from all the other nodes in its cluster. The node with the minimum sum of distances becomes the CH, as long as its energy level is higher than a set threshold. This protocol uses a multi-hop routing scheme between CHs for shorter transmissions. Since the relay nodes lie between the source cluster and the BS, each CH first considers the BS as its next hop; if there is a CH in its radio range that is closer to the BS, that CH becomes the next hop instead. In effect, this approach focuses on reducing the communication range and, consequently, the energy consumed by long-distance communication [27].
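The CH selection rule inside one grid cell can be illustrated with a short sketch. It assumes each node's position and energy are known (the method is location-based); the data layout, the function name and the behaviour when no node passes the energy threshold are assumptions for this example.

```python
import math

def elect_grid_ch(nodes, energy_threshold):
    """Pick the CH of one grid cell.

    nodes maps node ID -> (x, y, energy). Among nodes whose energy
    exceeds the threshold, return the one with the smallest sum of
    distances to all cluster members, or None if no node qualifies.
    """
    def dist_sum(i):
        xi, yi, _ = nodes[i]
        return sum(math.hypot(xi - x, yi - y) for x, y, _ in nodes.values())

    eligible = [i for i, (_, _, e) in nodes.items() if e > energy_threshold]
    return min(eligible, key=dist_sum, default=None)
```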

The TCAC Protocol
The Topology-Controlled Adaptive Clustering (TCAC) protocol [28] was proposed by Dahnil et al. In this protocol, all clustering steps are carried out assuming that the transmission energy of the nodes can vary. The method has three phases. The first is a periodic update. To reduce the effect of energy overhead (transmission start-up cost) and delay time, the periodic update is executed once in every D cycles; if it were executed in every cycle, the delay time and energy consumption would increase. In the second phase, the CH election phase, every node generates a random number between 0 and 1 and compares it with P(CCH), the probability of becoming a candidate, which is calculated as the ratio of the residual energy of the node to the average energy of all network nodes. If the random number is less than P(CCH), the node becomes a candidate. After the candidates are elected, the competition between them starts: the candidate with the most energy among all its candidate neighbors becomes the CH. In the third phase, the CHs send a packet and each node that receives it responds with another packet. The CH then creates and broadcasts a list of the nodes that responded, ranked by signal strength. Nodes use the list to find the best CH. This protocol focuses on the scalability of the network. In other words, increasing the number of nodes does not affect the efficiency of this protocol [28].
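The candidate selection and competition steps described above can be sketched as follows. The clamp of the probability to 1 and the tie-break by node ID are assumptions added so that the example is well defined; the function names are hypothetical.

```python
def candidate_probability(e_residual, e_avg):
    """P(CCH): residual energy over the network average, capped at 1 (assumed cap)."""
    return min(1.0, e_residual / e_avg)

def elect_among_candidates(energy, neighbors, candidates):
    """Return the candidates that win the local competition.

    A candidate wins when none of its neighboring candidates has more
    energy; equal energies are broken by the higher node ID (assumed).
    """
    winners = set()
    for c in candidates:
        rivals = [n for n in neighbors[c] if n in candidates]
        if all((energy[c], c) > (energy[n], n) for n in rivals):
            winners.add(c)
    return winners
```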

Network Model and Assumptions
The network is modeled as a graph G = (V, E), where V is the set of nodes, which contains the BS and N sensor nodes distributed in the region of interest (ROI). The BS has an unlimited energy source, can be placed inside or outside the desired region, and collects the received data. E ⊆ V² is the set of links: if two nodes are within communication range of each other, there is a link between them. The links are symmetric and bidirectional. Since the nodes do not use GPS positioning equipment, they are not aware of their own geographical coordinates. The nodes are equipped with RSSI measurement to estimate their distances. The nodes are identical and are distributed evenly or randomly in a square-shaped field. Nodes explore the network topology independently. They send data packets with a fixed signal strength and on a fixed frequency. The positions of the nodes are assumed to be fixed, and all nodes are assumed to have the same initial energy.

The Energy Model
The same radio-energy model as stated in [29] is used, which is described briefly as follows. The schematic of the model is presented in Figure 1. This model considers the transmission energy in two parts. The first part is the electronic processing energy, and the second part is the amplification energy (propagation loss), which depends on the number of bits, the distance from the receiver and the acceptable bit-error rate. The propagation loss is proportional to $d^2$ for distances less than $d_0$ and is proportional to $d^4$ for distances more than $d_0$. For the receiving energy model, only the electronic processing energy, which depends on the number of bits, is considered. For $l$-bit packets over a distance $d$, the consumed energy is:
$$E_{Tx}(l, d) = E_{Tx\text{-}elec}(l) + E_{amp}(l, d), \qquad E_{Rx}(l) = E_{Rx\text{-}elec}(l)$$
where $E_{Tx}$ is the transmitter energy consumption, $E_{Rx}$ is the receiver energy consumption, $E_{Tx\text{-}elec}$ and $E_{Rx\text{-}elec}$ are the electronic processing energy consumption and $E_{amp}$ is the amplifier energy consumption.
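A minimal numerical sketch of this two-regime radio model is given below. The constants are assumptions chosen for illustration only (values of this order are commonly used with first-order radio models); they are not taken from this paper, and the crossover distance is derived from them so that the two regimes meet continuously.

```python
import math

E_ELEC = 50e-9       # J/bit, electronic processing energy (assumed)
EPS_FS = 10e-12      # J/bit/m^2, amplifier constant for d < D0 (assumed)
EPS_MP = 0.0013e-12  # J/bit/m^4, amplifier constant for d >= D0 (assumed)
D0 = math.sqrt(EPS_FS / EPS_MP)  # crossover distance, roughly 88 m here

def tx_energy(bits, d):
    """Energy to transmit `bits` over distance d (electronics + amplifier)."""
    if d < D0:
        return bits * E_ELEC + bits * EPS_FS * d ** 2
    return bits * E_ELEC + bits * EPS_MP * d ** 4

def rx_energy(bits):
    """Energy to receive `bits` (electronics only)."""
    return bits * E_ELEC
```

The fourth-power growth beyond the crossover distance is what makes long single-hop links to the BS expensive and motivates multi-hop forwarding between CHs.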

The DCPVP Protocol
In previous protocols, such as LEACH and HEED, the size of clusters is uniform regardless of the distance from the BS, which causes the early death of some nodes. Since energy consumption grows with transmission distance, CHs use multi-hop routing, so the nodes neighboring the BS have the additional duty of forwarding the data packets of farther nodes to the BS. In addition, as the cluster size increases, the energy consumption of the CH increases too. For these reasons, we make the size of clusters proportional to the distance from the BS, which balances the lifetime of CHs in different places. This means that clusters closer to the BS are smaller. To avoid an uncontrolled increase in the number of clusters, the cluster size grows as the distance from the BS increases [30], as shown in Figure 2. Furthermore, in places where nodes are densely distributed, choosing a single node as the CH increases the energy consumption of that node substantially and may cause its death. To avoid that, clusters in such places should be smaller and more numerous. The load is thereby shared among several CHs, which avoids the death of a single node [30] (Figure 3). In the DCPVP method, nodes have similar roles in the clustering process, so control is distributed, which makes the protocol scalable and more adaptive. In centralized methods, by contrast, the manager node must be able to reach all nodes as their number increases, and in some cases this is not possible. Moreover, as the number of nodes increases, the duration of the clustering process increases too, and it becomes inefficient relative to the steady-state cycle time.
In summary, the main idea of DCPVP is to choose the cluster size based on the distance of the nodes from the BS, the local density of nodes and the average distance of the nodes from their neighbors. Other important parameters, such as the residual energy and the number of times a node has been elected as CH, are also considered. The protocol includes five phases, which are described in detail below.

Phase A: Exploration Phase
In this phase, each node explores the network topology and gathers information that includes its distance from the BS, the number of its neighbors and its distance from each of them. Distances are calculated by the RSSI equation [31,32]:
$$d = 10^{\frac{RSSI_0 - RSSI}{10\,n}}$$
where RSSI is the received signal strength, RSSI0 is the signal attenuation at a distance of one meter from the source node and $n$ is the path loss exponent [33].
When the exploration phase begins, the BS broadcasts a packet (Start-Packet) to inform the nodes of the start time and to let them calculate their distances from the BS. Then every node, sequentially (based on ID), sends a packet containing its ID (Hello-Packet) to its neighbors during its own time slot. All the other nodes monitor the channel during this time slot, so the neighbor nodes can receive the packet and store its information, together with the distance measured by RSSI, in their local memory (Table 1). Each node's knowledge of the network topology is keyed by neighbor ID, so if a node dies, its neighbors only need its ID to modify their calculations, since the network is considered stationary. This phase is a significant preparation for the next phases and is done once for a network. After the exploration phase, DCPVP operates in rounds. In every round, the clusters are rebuilt and new CHs are elected. After clustering, nodes generate Data-Packets and the network works normally, so the remaining phases repeat in every round. At the end of this phase, each node has formed a table such as Table 1, which it updates and uses later. Table 1 is updated only when a node dies in the neighborhood. When a node's energy level falls below a predefined threshold, the node is considered dead and notifies its neighbors by sending a corresponding packet (Death-Packet). Every node in its neighborhood receives the packet and removes the corresponding row from Table 1. Figure 4 shows the flowchart of this phase.
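As an illustration of RSSI-based ranging, the sketch below inverts the log-distance path loss model RSSI = RSSI0 - 10 n log10(d). The sign convention (both values in dBm, RSSI0 measured at one meter) and the function name are assumptions for this example.

```python
def rssi_to_distance(rssi, rssi0, n):
    """Estimate distance in meters from a received signal strength.

    rssi  : measured signal strength (dBm)
    rssi0 : signal strength at 1 m from the source (dBm)
    n     : path loss exponent
    """
    return 10 ** ((rssi0 - rssi) / (10 * n))
```

For example, with a path loss exponent of 2, a reading 20 dB weaker than the one-meter reference corresponds to an estimated distance of about 10 m.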
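The per-node bookkeeping of this phase can be sketched as a small table keyed by neighbor ID. The exact columns of Table 1 are not reproduced here, so the fields below (distance and a weight slot filled in later) are assumptions, as are the class and method names.

```python
class NeighborTable:
    """Local neighbor table of one node, keyed by neighbor ID."""

    def __init__(self):
        self.rows = {}  # neighbor ID -> {"distance": float, "weight": float or None}

    def on_hello(self, neighbor_id, distance):
        """Store a neighbor heard in its Hello-Packet slot with its RSSI distance."""
        self.rows[neighbor_id] = {"distance": distance, "weight": None}

    def on_death(self, neighbor_id):
        """Remove the row of a neighbor that announced its death."""
        self.rows.pop(neighbor_id, None)

    def mean_distance(self):
        """Mean distance to the current neighbors (0.0 if there are none)."""
        if not self.rows:
            return 0.0
        return sum(r["distance"] for r in self.rows.values()) / len(self.rows)
```

Because rows are keyed by ID, handling a Death-Packet is a single deletion, which matches the observation that a neighbor only needs the dead node's ID.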

Phase B: Cluster Head Election Phase
In this phase, each node calculates its own weight. The weight is computed, using four adjustment coefficients, from Y, the number of times the node has been a CH so far, the residual energy, the initial energy, MD, the mean distance to the neighbors, and ΔC, the optimum deviation. MD can be calculated as follows:
$$MD_X = \frac{1}{N_X} \sum_{j} DV(X, j)$$
where X is the node's ID, $N_X$ is the number of its neighbors, the sum runs over those neighbors, DV is the distance vector and DV(i, j) is the distance between nodes i and j. ΔC is calculated from the number of neighbors and the optimum number of neighbors, NO (see Figure 2). NO in turn is determined by the distance of the node from the BS, the maximum cluster size, three distance thresholds and three coefficients that are each less than 1. Thus nodes that are farther from the BS are allowed to form bigger clusters, and closer nodes are allowed to form only smaller clusters. This provides load balancing.
In this step, each node calculates its own weight and the nodes broadcast their weights sequentially. Each node then adds a column to its table and writes the weight of each neighbor in it, as in Table 2. Next, each node creates a priority list of its neighbors based on their weights. The nodes then broadcast a packet containing the voting list (Vote-Packet) and vote for the best nodes. In other words, the Vote-Packet is a list of node IDs sorted by weight. When the election is finished, the node with the largest number of votes is chosen as the CH and introduces itself by sending a packet (CH-Packet). Figures 5 and 6 show the flowchart of this phase.
All ID-based sequential broadcasts follow the same algorithm. When node i wants to send its own packet, it waits for i-1 time slots from the end of the last step until its turn comes and then sends its packet. During all other time slots, the node monitors the channel for incoming packets [33].
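The voting step can be sketched as follows, taking the node weights as given inputs. Several details are assumptions for the sake of a runnable example: neighbors are ranked with the highest weight first, each node casts a single vote for the top entry of its priority list, and ties are broken by the lower ID. The function names are hypothetical.

```python
from collections import Counter

def priority_list(node, weights, neighbors):
    """Neighbor IDs of `node`, best CH candidate first (assumed: highest weight)."""
    return sorted(neighbors[node], key=lambda j: (-weights[j], j))

def tally_votes(weights, neighbors):
    """Count one vote per node for the first entry of its priority list."""
    votes = Counter()
    for node in neighbors:
        ranked = priority_list(node, weights, neighbors)
        if ranked:
            votes[ranked[0]] += 1
    return votes

def slots_to_wait(node_id):
    """ID-based sequential broadcast: node i waits i - 1 time slots."""
    return node_id - 1
```

The node with the most votes in `tally_votes` would then announce itself with a CH-Packet; a fuller sketch would also use the remaining entries of each priority list for the later election rounds.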

Phase C: Cluster Building Phase
As discussed before, the CH introduces itself by sending a packet to its neighbors. All nodes in its radio range hear the packet and thus learn the voting result, so when they receive the CH-Packet they respond by sending a packet to the CH (Join-Packet). The CH accepts join requests in order of weight, from lowest to highest. In other words, the NO nodes accepted into the cluster are the NO nodes with the lowest weights in the weight-sorted table. The reason for priority voting is that, after primary clustering, some nodes often remain outside any cluster because of the limited cluster size. In a second step, therefore, the nodes that have the highest vote and were not elected in the previous step introduce themselves as CHs, and the previous process repeats. Finally, if a node has neither been organized nor received any packets, it introduces itself as a CH (outlying nodes). The benefit of this iterative method is that clusters do not overlap. For example, when node Y becomes a CH in a specific region, it has certainly received the highest number of votes, which means that all neighbor nodes regard Y as the most heavily weighted node. The elected CH forms a cluster and accepts nearby nodes into it. In regions where the density of nodes is high and the cluster size is limited, the cluster fills up and some nodes remain unorganized. Because acceptance is weight-based, the remaining nodes are the highest-weighted nodes of the previous step. These nodes elect the most heavily weighted node among themselves as CH. Finally, after all the nodes are organized, the elected CHs form a timing table for their cluster members and the network goes into the steady-state phase [34].
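The weight-ordered acceptance of Join-Packets can be sketched in a few lines. The function name, the tie-break by node ID and the returned pair are assumptions for this example; `n_opt` stands for the cluster size limit NO of the CH.

```python
def accept_members(join_requests, weights, n_opt):
    """Split join requests into (accepted, left_over).

    The CH accepts the n_opt requesters with the lowest weights; the
    remaining, higher-weighted nodes stay unorganized and take part in
    the next election step.
    """
    ranked = sorted(join_requests, key=lambda j: (weights[j], j))
    return ranked[:n_opt], ranked[n_opt:]
```

Leaving the highest-weighted requesters outside a full cluster is what lets a dense region produce a second CH from exactly those nodes in the following step.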

Phase D: Cluster Head Routing Phase
Data forwarding to the BS is multi-hop and is done through other CHs. Routing between CHs is established before the steady-state phase using the "Most Forwarding Progress within Radius" technique (Figure 7). To implement this technique, each CH uses its known distance to the BS and shares it with the other CHs. Each CH transmits its data to a CH that is within its radius and closer to the BS. The flowchart of phases C and D is shown in Figure 8.
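A next-hop choice in the spirit of this technique can be sketched from the shared distances alone, since nodes do not know coordinates. Picking the reachable CH with the smallest distance to the BS, and falling back to the BS when no reachable CH is closer, are assumptions of this sketch; the names are hypothetical.

```python
def next_hop(ch, dist_to_bs, reachable):
    """Choose the next hop of cluster head `ch`.

    dist_to_bs : CH ID -> distance to the BS (shared between CHs)
    reachable  : IDs of the CHs within the radio radius of `ch`
    Returns the reachable CH closest to the BS among those closer than
    `ch` itself, or the string "BS" if there is none (assumed fallback).
    """
    closer = [c for c in reachable if dist_to_bs[c] < dist_to_bs[ch]]
    if not closer:
        return "BS"
    return min(closer, key=lambda c: (dist_to_bs[c], c))
```

Requiring the next hop to be strictly closer to the BS also guarantees that the resulting routes contain no loops.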

Phase E: The Steady State Phase
In this phase, the network works normally: nodes sense the desired environmental parameters and transmit them to the BS. This phase continues for a certain time (tC), which varies depending on the application. Then the cycle repeats, starting from phase B. In this protocol, each node has the chance to be elected as the CH. The best results are obtained by tuning the adjustment coefficients.

The Simulation Results
In the simulation experiments, the sensor nodes are distributed in a 50 m × 50 m area and the BS is located at (100, 25). Initial values are summarized in Table 3. MATLAB is used as the simulation test bench. The results are compared with those of previous protocols, namely the LEACH, HEED, WCA, GCMRA and TCAC protocols. For a fair comparison, the network is considered useless once 80% of the nodes have died, and this criterion is applied to all protocols. Figure 9a,b show the network clusters produced by the DCPVP protocol for 100 nodes in random and uniform distribution, respectively. The percentage of dead nodes in uniform distribution for all protocols is presented for 100 nodes in Figure 10, for 144 nodes in Figure 11 and for 196 nodes in Figure 12. As shown in Figures 10-12, for the same round number in uniform distribution, the percentage of dead nodes in DCPVP is lower than in the other protocols. Figure 13 shows the network lifetime in 20 simulation experiments versus the number of nodes in uniform distribution until the network becomes useless [30]. Compared to the other protocols, the DCPVP protocol shows better performance in all experiments, and it still performs better than all the other protocols as the number of nodes increases. The number of dead nodes for all protocols is presented in Figure 14 for 100 randomly distributed nodes, in Figure 15 for 150 randomly distributed nodes and in Figure 16 for 250 randomly distributed nodes. As shown in Figures 14-16, for an equal number of rounds, the percentage of dead nodes in the DCPVP protocol is clearly lower than in the other protocols. Furthermore, in some cases, when just 10% of the nodes in DCPVP have died, the network lifetime has already ended in some other protocols. Figure 17 shows the average life cycle of the protocols. As shown in Figure 17, in random distribution the DCPVP protocol behaves better than the other protocols in all cases.
Also, when the number of nodes increases, DCPVP shows a smaller decline than the other algorithms. In Figure 17, the network lifetime increases with the number of nodes, but when the number of nodes exceeds 150, the lifetime starts to decrease. This is due to the structure of the clusters, and it seems that for this network topology, area dimensions and BS location, the optimum number of nodes is 150. For more nodes, the hot nodes that communicate directly with the BS become overloaded and the efficiency of the topology decreases. The decline in lifetime is related to the clustering structure and the selection process of the cluster heads. In addition, to provide a fair comparison between protocols, the load balance factor (LBF) is used, which was introduced and used in [24,35], respectively. Since a cluster head supports its members and also routes data packets from nodes belonging to other clusters, it is not desirable to have some overloaded cluster heads while others are lightly loaded. At the same time, it is difficult to maintain a perfectly load-balanced system at all times because nodes frequently detach from and attach to the CHs. To measure quantitatively how well balanced the cluster heads are, the authors in [24] introduced the LBF parameter. A higher value of LBF means better load balancing. The LBF is calculated as follows:
$$LBF = \frac{n_c}{\sum_{i} (x_i - \mu)^2}$$
where $n_c$ is the number of CHs, $x_i$ is the number of members in cluster $i$ and $\mu$ can be calculated as follows:
$$\mu = \frac{N - n_c}{n_c}$$
where $N$ is the number of nodes and $n_c$ is the number of clusters. The LBF is calculated for 100 nodes in random distribution and the results are shown in Table 4. Each value in Table 4 is the average of 10 experiments. As shown in Table 4, the DCPVP protocol has a higher LBF value, which supports the observed extension of the network lifetime.
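The load balance factor can be computed with a short function. The sketch assumes the definition LBF = n_c / Σ (x_i - μ)² with μ = (N - n_c) / n_c, where x_i counts the members of cluster i without its CH; returning infinity for a perfectly balanced clustering is a convention chosen for this example.

```python
def load_balance_factor(cluster_sizes, n_nodes):
    """LBF for a clustering of n_nodes nodes.

    cluster_sizes : member count of each cluster, excluding its CH
    Returns n_c / sum((x_i - mu)^2), or infinity when every cluster
    holds exactly mu members (assumed convention).
    """
    n_c = len(cluster_sizes)
    mu = (n_nodes - n_c) / n_c  # average members per CH
    spread = sum((x - mu) ** 2 for x in cluster_sizes)
    return float("inf") if spread == 0 else n_c / spread
```

For 12 nodes in three clusters, member counts of 2, 3 and 4 give an LBF of 1.5, while the more uneven counts 1, 3 and 5 give 0.375, so a higher value indicates a more even load.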

Conclusions
This paper has proposed a new distributed clustering protocol, called DCPVP, which uses voting and priority ideas for the virtual partitioning of sensor networks. The DCPVP method is an energy-efficient, scalable, load-balanced and self-organized clustering protocol that could be employed in large-scale and harsh environments. The size of clusters varies depending on the distance from the data link, to overcome the energy hole problem for nodes closer to the base station. The results confirmed that DCPVP prolongs the network lifetime for both even and random sensor node distributions compared to some well-known clustering protocols in the literature, such as the LEACH, HEED, WCA, GCMRA and TCAC protocols. The LBF values confirmed that the scalability of the DCPVP protocol enhances the lifetime of distributed sensor networks.