An Energy-Efficient Clustering Routing Protocol for Wireless Sensor Networks Based on AGNES with Balanced Energy Consumption Optimization

To further prolong the lifetime of wireless sensor networks (WSNs), researchers have proposed many clustering routing protocols. However, most of these protocols neither minimize nor balance the total network energy consumption well. To alleviate this problem, this paper proposes an energy-efficient clustering routing protocol for WSNs. First, we introduce a new network structure model and combine it with the original energy consumption model to construct a new method for determining the optimal number of clusters that minimizes the total energy consumption. Then, to balance energy consumption, we optimize the AGglomerative NESting (AGNES) algorithm through (1) the introduction of distance variance, (2) the dual-cluster heads (D-CHs) division of the energy balance strategy, and (3) the node dormancy mechanism. In addition, a CHs priority function is constructed based on the residual energy and position of each node. Finally, we simulate the protocol in homogeneous networks (initial energy = 0.4 J, 0.6 J, and 0.8 J) and heterogeneous networks (initial energy = 0.4–0.8 J). Simulation results show that the proposed protocol reduces the network energy consumption decay rate, prolongs the network lifetime, and improves the network throughput in both kinds of networks.


Introduction
As a symbol of the 4th generation of sensor networks, the wireless sensor network (WSN) is a distributed self-organizing network that integrates data acquisition, processing, and communication functions. It has a wide range of applications in many important fields, such as agriculture, transportation, and the military. The nodes are usually powered by batteries of limited capacity, so the lifetime of a WSN can be extended by reducing its energy consumption.
As an effective scheme to save energy consumption of WSN, a reasonable clustering routing protocol is generally divided into three phases: cluster setup phase, cluster heads (CHs) election phase, and data transmission phase. In the cluster setup phase, the sensor node groups in the detection area

• The optimal number of clusters is derived to minimize the total energy consumption of the WSN.
• Variance introduction, the D-CHs division of the energy balance strategy, and the node dormancy mechanism are necessary to balance the energy consumption.
• A new CHs priority function ensures that nodes with better positions and adequate residual energy have higher probabilities of becoming CHs.
• The new clustering routing protocol achieves good network performance, including lifetime, energy consumption, and throughput.
As for the communication technology, earlier versions of Bluetooth [27] have high system complexity, short transmission distance, and large power consumption, so they are not popular in WSNs. In contrast, ZigBee has a wide range of applications in WSNs due to its simplicity, low power consumption, low cost, and long-distance transmission.
In 2016, Bluetooth 5 [28,29] came into being. Compared with the previous Bluetooth version, its maximum transmission distance is increased 3 times, its power consumption is greatly reduced, and its transmission rate is significantly improved. In addition, it increases the maximum data capacity to 255 bytes, while ZigBee offers only 100 bytes in this respect. Thus, Bluetooth 5 is gradually becoming a new generation of Internet of Things communication technology. In this paper, we choose Bluetooth 5 as the communication technology of the WSN.

The rest of this paper is organized as follows: Section 2.1 introduces a new network structure model, Section 2.2 describes the original energy consumption model, and Section 2.3 proposes a new method for determining the optimal number of clusters. Section 3 describes the details of the protocol. The simulation study is conducted in Section 4. Finally, Section 5 summarizes the research and prospects for the future.

Network Model and Optimal Cluster Number Calculation
In this section, we propose a new network structure model, adopt the energy consumption model proposed in [12], and then derive a new method to determine the optimal cluster number.

Network Model
The network model used in this paper is a WSN in which N sensor nodes are evenly distributed in a circular area of diameter M. The BS, located at the center of the network area, has strong computing power. Because the BS energy can be replenished, the energy loss of the BS is not considered in this work. On this basis, we make the following assumptions about the WSN:
(1) All sensor nodes are static, nodes transmit data to each other in single or multiple hops, and the node energy cannot be replenished.
(2) The idealized simulation environment does not consider the influence of natural factors such as temperature, humidity, light, and wind on the sensor nodes.

Energy Consumption Model
This paper adopts the energy consumption model proposed in [12]. Because the actual transmission distance from a CH to the BS may fall on either side of the distance threshold, both the free space model and the multipath fading channel model must be analyzed, whereas [12] considers only the multipath fading channel model. The expression for the total energy consumption of the model therefore changes accordingly.
E_T(e, d) denotes the energy consumed by the wireless transmitter to transmit e bits of information over a distance d:

$$E_T(e,d) = \begin{cases} eE_{elec} + e\varepsilon_{fs}d^{2}, & d < d_0 \\ eE_{elec} + e\varepsilon_{mp}d^{4}, & d \ge d_0 \end{cases} \tag{1}$$

E_R(e) denotes the energy required to receive e bits of information:

$$E_R(e) = eE_{elec} \tag{2}$$

In Equations (1) and (2), E_elec is the energy consumed per bit by the transmitter or receiver circuit, and d is the distance between the transmitter and the receiver. In Equation (1), when d < d_0, we use the free space model and ε_fs acts as the energy factor per bit. Otherwise, the multipath fading channel model is used, and ε_mp acts as the energy factor per bit. Here d_0 is the distance threshold. Substituting d_0 into both the free space model and the multipath fading channel model and equating the two gives:

$$d_0 = \sqrt{\frac{\varepsilon_{fs}}{\varepsilon_{mp}}} \tag{3}$$

In each round of data transmission, the cluster member nodes sense information from the environment and then transmit it to the CH of their cluster. The energy consumed by a member node to transmit e bits of information is therefore:

$$E_T(e, d_{toCH}) = eE_{elec} + e\varepsilon_{fs}d_{toCH}^{2} \tag{4}$$

In Equation (4), d_toCH represents the distance from the cluster member node to the CH. The CH receives information from the member nodes of its cluster, fuses it with the information it senses from the environment, and finally transmits the merged information to the BS. In this paper, we assume that in each round of data transmission the information obtained after processing by the CH has a size of e bits. The energy consumed in the process is:

$$E_{CH} = \left(\frac{n}{k} - 1\right)eE_{elec} + \frac{n}{k}eE_{DA} + E_T(e, d_{toBS}) \tag{5}$$

The energy in Equation (5) consists of three parts: receiving energy consumption, processing energy consumption, and transmitting energy consumption.
In Equation (5), n is the number of nodes surviving in the monitored area, k is the number of clusters to be divided, E DA is the energy consumed by the CH to process each bit of data (including received data and sensed data), and d toBS is the distance between the CH and the BS.
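The radio model described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' simulation code: the parameter values are typical first-order radio model figures assumed for the example (they are not taken from Table 1), and the function names `distance_threshold`, `e_tx`, and `e_rx` are hypothetical.

```python
import math

# Assumed, illustrative parameter values (not taken from the paper's Table 1).
E_ELEC = 50e-9       # J/bit, transmitter/receiver circuit energy
EPS_FS = 10e-12      # J/bit/m^2, free space amplifier factor
EPS_MP = 0.0013e-12  # J/bit/m^4, multipath fading amplifier factor


def distance_threshold(eps_fs: float = EPS_FS, eps_mp: float = EPS_MP) -> float:
    """Distance d0 at which the free space and multipath models give equal energy."""
    return math.sqrt(eps_fs / eps_mp)


def e_tx(e_bits: float, d: float) -> float:
    """Energy to transmit e_bits over distance d (free space below d0, multipath otherwise)."""
    if d < distance_threshold():
        return e_bits * E_ELEC + e_bits * EPS_FS * d ** 2
    return e_bits * E_ELEC + e_bits * EPS_MP * d ** 4


def e_rx(e_bits: float) -> float:
    """Energy to receive e_bits."""
    return e_bits * E_ELEC


if __name__ == "__main__":
    d0 = distance_threshold()
    print(f"d0 = {d0:.1f} m")
    print(f"transmit 4000 bits over 50 m:  {e_tx(4000, 50):.3e} J")
    print(f"transmit 4000 bits over 150 m: {e_tx(4000, 150):.3e} J")
```

Because d_0 is defined as the crossover point of the two models, the transmit energy is continuous at d = d_0, which is a quick sanity check for any implementation of this model.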

Optimal Number of Clusters
In general, the inter-cluster communication traffic in WSN increases as the number of clusters increases, and the intra-cluster communication traffic increases as the number of clusters decreases. In addition, the network's energy consumption increases as communication traffic increases. The determination of the optimal cluster number of the network is of great significance to the network's communication. In this section, we will determine the optimal number of clusters k in combination with the network structure model and energy consumption model described respectively in Sections 2.1 and 2.2.
The monitoring area in this paper is a circle with a diameter of M. In real deployments, the cluster areas of a WSN are irregular and inconsistent, and the nodes are randomly placed. If all of these factors were considered, the resulting model would be complex and not general. Therefore, like the optimal numbers of clusters in [11,12], the optimal number derived in this paper serves as a general reference standard for the actual monitoring area. To derive the optimal number of clusters more intuitively, we construct an inscribed square with a side length of L in the circular region. We assume that the clusters in the square are all circular with a radius of R and are uniformly distributed. Finally, by relating the total area of the circular clusters to the area of the circular region, we obtain the relationship between the total number of clusters k and the number of circular clusters k_1.
As shown in Figure 2, the monitoring area in this paper is a large circle with a diameter of M, and the side length of the inscribed square is L. Since the diagonal of the square equals the diameter of the circle, the relationship between L and M is:

$$L = \frac{\sqrt{2}}{2}M \tag{6}$$

After the sensor network is divided into clusters, each CH receives the information transmitted by its cluster member nodes, processes it, and then transmits it to the BS for final data fusion. In intra-cluster communication, the distance between a cluster member node and its CH is small, so we adopt the free space model.
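To understand the network structure model more intuitively, Figure 3 shows an example in which the network is divided into 16 clusters. In the figure, the BS, indicated by $, is at the exact center of the monitoring area, and each blue circle indicates a cluster. Since the blue clusters of radius R are arranged uniformly in the inscribed square of side length L, the number of blue clusters k_1 is:

$$k_1 = \left(\frac{L}{2R}\right)^{2} = \frac{M^{2}}{8R^{2}} \tag{7}$$

The area of the monitoring region S_sum is:

$$S_{sum} = \pi\left(\frac{M}{2}\right)^{2} = \frac{\pi M^{2}}{4} \tag{8}$$

The area of the inscribed square S_square is:

$$S_{square} = L^{2} = \frac{M^{2}}{2} \tag{9}$$

The area of one blue cluster S_cluster is:

$$S_{cluster} = \pi R^{2} \tag{10}$$

The total area of the blue clusters S_cluster_sum is:

$$S_{cluster\_sum} = k_1 \pi R^{2} = \frac{\pi M^{2}}{8} \tag{11}$$

According to Equations (8) and (11):

$$S_{sum} = 2S_{cluster\_sum} \tag{12}$$

If the whole monitoring area is divided into clusters, the total number of clusters k is twice k_1, namely:

$$k = 2k_1 = \frac{M^{2}}{4R^{2}} \tag{13}$$

The cluster member nodes obey the uniform distribution, so the distribution function over a cluster can be expressed as:

$$\rho(r,\theta) = \frac{1}{\pi R^{2}} \tag{14}$$

Taking the CH at the center of its circular cluster, the expected squared distance from the cluster member nodes to the CH is:

$$E\left[d_{toCH}^{2}\right] = \int_{0}^{2\pi}\!\!\int_{0}^{R} r^{2}\,\rho(r,\theta)\,r\,dr\,d\theta = \frac{R^{2}}{2} = \frac{M^{2}}{8k} \tag{15}$$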
The distance between some CHs and the BS in the model may be larger than d_0, so both the free space model and the multipath fading channel model must be considered when computing the energy consumption between clusters. Because the BS is at the center of the area, this case arises when the diameter M is greater than 2d_0.
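According to Equation (5), the energy that a CH spends on transmission depends on d_toBS through a distance-dependent term f(d_toBS). We first calculate the expectation of f(d_toBS) over the monitoring area. From it we obtain the average energy consumed by a cluster in one round, and then the energy E_SUM consumed by all of the clusters in the region in one round. Differentiating E_SUM with respect to k and setting dE_SUM/dk = 0 yields the optimal number of clusters.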
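The minimization described above can be illustrated numerically. The sketch below is one possible reading of the derivation, not the paper's closed-form result. It assumes that CH positions are uniform over the disc of diameter M with the BS at its center, that the expected squared member-to-CH distance is M²/(8k) (which follows from the cluster geometry of this section), and that the radio parameters take typical values that are not from Table 1. All function names are hypothetical.

```python
import math

# Assumed, illustrative parameter values (not taken from the paper's Table 1).
E_ELEC = 50e-9       # J/bit
EPS_FS = 10e-12      # J/bit/m^2
EPS_MP = 0.0013e-12  # J/bit/m^4
E_DA = 5e-9          # J/bit, data aggregation energy


def expected_amp_to_bs(M: float) -> float:
    """Expected amplifier energy per bit, CH uniform in a disc of radius M/2, BS at center."""
    radius = M / 2
    d0 = math.sqrt(EPS_FS / EPS_MP)
    if radius <= d0:  # every CH is within the free space regime
        return EPS_FS * radius ** 2 / 2
    near = EPS_FS * d0 ** 4 / 4                      # integral of eps_fs * r^3 on [0, d0]
    far = EPS_MP * (radius ** 6 - d0 ** 6) / 6       # integral of eps_mp * r^5 on [d0, M/2]
    return 2.0 / radius ** 2 * (near + far)


def total_energy(k: int, n: int, M: float, e_bits: float) -> float:
    """Energy consumed by all k clusters in one round under the stated assumptions."""
    members = n / k - 1
    e_ch = (members * e_bits * E_ELEC + (n / k) * e_bits * E_DA
            + e_bits * E_ELEC + e_bits * expected_amp_to_bs(M))
    e_member = e_bits * E_ELEC + e_bits * EPS_FS * M ** 2 / (8 * k)
    return k * (e_ch + members * e_member)


def optimal_k(n: int, M: float, e_bits: float = 4000) -> int:
    """Brute-force search for the integer k that minimizes total_energy."""
    return min(range(1, n + 1), key=lambda k: total_energy(k, n, M, e_bits))


if __name__ == "__main__":
    print(optimal_k(n=100, M=400))
```

A brute-force search is used instead of a closed-form expression so that the sketch stays valid if the energy terms are changed; under these assumptions the integer minimizer lies within one of the continuous stationary point of E_SUM.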


The Clustering Protocol
The main steps of the clustering protocol in this paper are as follows:
(1) Calculate the required number of clusters k according to the formula for the optimal cluster number introduced in Section 2.3.
(2) Build the required k clusters through the AGNES (AGglomerative NESting) algorithm with balanced energy consumption optimization.
(3) Run the CH selection mechanism in each cluster; for large clusters, apply the D-CHs division of the energy balance strategy before the death of the first node and the node dormancy mechanism after it.
(4) Perform data transmission and energy update.
To minimize the total energy consumption and balance the energy consumption of the nodes in the network, we perform a node death check after each round of data transmission (if a node has died, return to Step 1; otherwise, return to Step 3). Steps 1, 2, and 3 are collectively called the preparation phase of the protocol, and Step 4 is called the stabilization phase.
Before entering the stabilization phase, each member of each cluster sends a control message named Node_Msg to its CH in the form (Node_NO, Node_Status). The status is either work or dormancy. According to the received Node_Msg messages, the CH of each cluster allocates a time slot to each member node that needs to work. Each CH then sends a control message named Schedule_Msg to those member nodes in the form (Node NO.1, Time Slot1; Node NO.2, Time Slot2; ...). Once the stabilization phase begins, the nodes that have received time slots send their sensed information to their CHs, and the others remain dormant. The CHs are responsible for receiving and processing the information sent by the member nodes and finally transmitting it to the BS. The time slot allocation of the clustering protocol is shown in Figure 4, and a flowchart of the clustering protocol is presented in Figure 5.
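The message exchange above can be sketched as follows. This is only an illustration of how a CH might turn the received Node_Msg tuples into a Schedule_Msg. The ordering of slots by node number and the function name `build_schedule` are assumptions, since the paper does not specify how slots are ordered.

```python
from typing import List, Tuple

WORK = "work"
DORMANCY = "dormancy"


def build_schedule(node_msgs: List[Tuple[int, str]]) -> List[Tuple[int, int]]:
    """Build a Schedule_Msg as (Node_NO, Time_Slot) pairs from Node_Msg tuples.

    Only nodes whose status is WORK receive a slot; dormant nodes are skipped.
    Slots are numbered from 1 in ascending node-number order (an assumption).
    """
    working = sorted(node_no for node_no, status in node_msgs if status == WORK)
    return [(node_no, slot) for slot, node_no in enumerate(working, start=1)]


if __name__ == "__main__":
    msgs = [(3, WORK), (1, DORMANCY), (2, WORK)]
    print(build_schedule(msgs))
```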


AGNES Algorithm
The AGNES (AGglomerative NESting) [30] algorithm is a hierarchical clustering algorithm. First, several objects are input, and each one constitutes an initial cluster by itself. Then the two clusters with the shortest distance are repeatedly merged into one cluster until the number of clusters reaches the number k that satisfies the termination condition. The resulting k clusters are the target clusters of the algorithm.
In this algorithm, each cluster is treated as a sample set, and merging two clusters corresponds to merging two sets. The merging criterion is the distance between the two clusters, which usually takes one of three forms: (1) the longest distance, (2) the shortest distance, and (3) the average distance. For example, given two clusters C_i and C_j, the distance between them can be obtained from the following three equations.

The longest distance:

$$D_{max}(C_i, C_j) = \max_{p \in C_i,\, q \in C_j} \lVert p - q \rVert$$

The shortest distance:

$$D_{min}(C_i, C_j) = \min_{p \in C_i,\, q \in C_j} \lVert p - q \rVert$$

The average distance:

$$D_{avg}(C_i, C_j) = \frac{1}{|C_i|\,|C_j|} \sum_{p \in C_i} \sum_{q \in C_j} \lVert p - q \rVert$$
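The three inter-cluster distances can be computed directly from the pairwise node distances. The following is a small sketch using only the Python standard library; the function names are hypothetical, and clusters are represented as lists of (x, y) tuples.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def pair_distances(ci: Sequence[Point], cj: Sequence[Point]) -> List[float]:
    """All Euclidean distances between a node in ci and a node in cj."""
    return [math.dist(p, q) for p, q in product(ci, cj)]


def longest_distance(ci: Sequence[Point], cj: Sequence[Point]) -> float:
    return max(pair_distances(ci, cj))


def shortest_distance(ci: Sequence[Point], cj: Sequence[Point]) -> float:
    return min(pair_distances(ci, cj))


def average_distance(ci: Sequence[Point], cj: Sequence[Point]) -> float:
    dists = pair_distances(ci, cj)
    return sum(dists) / len(dists)


if __name__ == "__main__":
    a = [(0.0, 0.0), (0.0, 1.0)]
    b = [(3.0, 0.0)]
    print(shortest_distance(a, b), longest_distance(a, b), average_distance(a, b))
```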

The AGNES Algorithm with Balanced Energy Consumption Optimization
To further improve the performance of the WSN, we apply balanced energy consumption optimization to the AGNES algorithm in three ways:
(1) Reducing the spread of distances between the nodes of two merged clusters. In addition to the original merging criterion (the average distance D_avg is used in this paper), we add the variance δ² of the distance set, so that two merged clusters not only have a short average distance but also have similar distances between their nodes. The energy consumption of the cluster nodes is therefore more uniform, which helps prevent some cluster member nodes from dying prematurely because of uneven transmission distances during communication.
(2) Applying the D-CHs division of the energy balance strategy in large clusters before the death of the first node. The AGNES algorithm produces the required k clusters but does not limit their size, so the resulting clusters may differ in size. CHs in large clusters tend to receive and process large amounts of cluster information and exhaust their energy prematurely, which harms network lifetime, energy consumption, and throughput. We therefore apply the D-CHs division of the energy balance strategy in large clusters before the death of the first node. Under this strategy, the secondary cluster head (S-CH) receives the information sent by the cluster member nodes, and the positive cluster head (P-CH) merges that information with its own sensed information and transmits the result to the BS.
(3) Applying the node dormancy mechanism in large clusters after the death of the first node. The data obtained by a WSN must meet two requirements: a large amount of data and high data integrity. Before the first node dies, the network is in a stable period, the nodes have enough energy, and many rounds of iterations can be performed, so the data satisfy both requirements. Because the node dormancy mechanism reduces energy consumption at the cost of data loss in some areas, it is not applied during this period. After the death of the first node, the energy of the nodes is greatly reduced and the mortality rate rises sharply. Even without the node dormancy mechanism, the network coverage of the monitoring area shrinks as nodes continue to die, which inevitably reduces data integrity, so maintaining the data integrity of the stable period is no longer practical. Our focus therefore shifts to the amount of data: by extending the network life cycle, the WSN monitors the region for longer and obtains more information. The node dormancy mechanism puts nodes with relatively low energy and relatively long distances to their CHs into dormancy, which avoids their premature death and reduces the energy load of the CH, thereby prolonging the network life cycle and meeting the actual needs of this period.
It is worth emphasizing that the protocol in this paper re-selects the CHs at the end of each round, which helps balance the energy consumption of the nodes and maintain the network coverage of the area.

Cluster Setup
By combining the average distance D_avg between two clusters with the variance δ² of their distance set during cluster setup, we construct a cluster-merging factor. The two clusters with the largest cluster-merging factor are merged repeatedly until the number of clusters reaches the preset number k. Compared with the original AGNES algorithm, this algorithm yields smaller distance differences between the nodes of the merged clusters, so the energy consumption of the nodes is more uniform during data transmission. The detailed procedure of this phase is given by the pseudo-code in Algorithm 1.

Algorithm 1. Cluster Setup Algorithm
Inputs: (1) n objects; (2) k, the number of clusters
Result: k clusters of different sizes
Each object constitutes an initial cluster;
Current cluster number k' = n;
…
    if i ~= j
        calculate the distances between the nodes in the two clusters C_i and C_j;
        build a distance set D and obtain the distance mean D_avg and the variance δ²;
        calculate the clustering factor of clusters C_i and C_j
…
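The cluster setup loop can be sketched as below. Because the exact form of the cluster-merging factor is not reproduced here, the sketch substitutes a hypothetical score, 1 / (D_avg + weight · δ²), which grows as both the average distance and the variance shrink. The `weight` parameter and the function names are likewise assumptions made for illustration, not the paper's definition.

```python
import math
from itertools import combinations, product
from statistics import mean, pvariance
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def merge_factor(ci: Sequence[Point], cj: Sequence[Point], weight: float = 1.0) -> float:
    """Hypothetical merging score: larger when D_avg and the distance variance are small."""
    dists = [math.dist(p, q) for p, q in product(ci, cj)]
    return 1.0 / (mean(dists) + weight * pvariance(dists) + 1e-12)


def agnes_balanced(points: Sequence[Point], k: int, weight: float = 1.0) -> List[List[Point]]:
    """Agglomerative clustering that merges the pair with the largest merge_factor."""
    clusters: List[List[Point]] = [[p] for p in points]  # each object starts as a cluster
    while len(clusters) > k:
        # combinations yields index pairs with i < j, so deleting j keeps i valid.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: merge_factor(clusters[ij[0]], clusters[ij[1]], weight),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(agnes_balanced(pts, k=2))
```

Evaluating every cluster pair in each iteration is quadratic in the number of clusters, which is acceptable for a sketch but would need caching of pairwise distances for larger networks.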

CHs Election
This section is mainly divided into two parts: (1) CHs election in general clusters; and (2) CHs election in large clusters.

CHs Election in General Clusters
In general, if a node in a cluster wants to be the CH of the cluster, the following three conditions must be met: (a) Compared to most other nodes, its location is closer to the center of the cluster. (b) Compared to most other nodes, its distance from BS is relatively small. (c) Compared to most other nodes, it owns more residual energy.
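On this basis, we calculate the center position Cen(X_C, Y_C) of a cluster C as the mean of the coordinates of its nodes:

$$X_C = \frac{1}{|C|}\sum_{i \in C} x_i, \qquad Y_C = \frac{1}{|C|}\sum_{i \in C} y_i$$

where (x_i, y_i) is the position of node i and |C| represents the number of nodes in cluster C. Then we construct an objective function for picking an appropriate node as the CH in cluster C.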
In Equation (28), S(i).E represents the residual energy of node i; d i toCen represents the distance from node i to Cen; d i toBS represents the distance from node i to the BS; and α and β are respectively the routing factors of d i toCen and d i toBS , where α + β = 1. The larger the value of G CH of node i, the more likely it is to be selected as the CH. Algorithm 2 shows us the CH election in a general cluster.
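A CH election along these lines can be sketched as follows. The priority function used here, residual energy divided by the weighted sum α·d_toCen + β·d_toBS, is a hypothetical stand-in chosen only because it rewards the three stated conditions; it should not be read as the paper's Equation (28). The `Node` class and function names are also assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Node:
    node_id: int
    x: float
    y: float
    energy: float  # residual energy in joules


def cluster_center(nodes: Sequence[Node]) -> Tuple[float, float]:
    """Mean position of the nodes in a cluster."""
    count = len(nodes)
    return (sum(nd.x for nd in nodes) / count, sum(nd.y for nd in nodes) / count)


def priority(node: Node, center: Tuple[float, float],
             bs: Tuple[float, float], alpha: float) -> float:
    """Hypothetical priority: more energy and shorter weighted distance score higher."""
    beta = 1.0 - alpha  # routing factors satisfy alpha + beta = 1
    d_to_cen = math.dist((node.x, node.y), center)
    d_to_bs = math.dist((node.x, node.y), bs)
    return node.energy / (alpha * d_to_cen + beta * d_to_bs + 1e-9)


def elect_ch(nodes: Sequence[Node], bs: Tuple[float, float] = (0.0, 0.0),
             alpha: float = 0.5) -> Node:
    """Return the node with the highest priority as the cluster head."""
    center = cluster_center(nodes)
    return max(nodes, key=lambda nd: priority(nd, center, bs, alpha))


if __name__ == "__main__":
    cluster = [Node(1, 10, 0, 0.5), Node(2, 12, 0, 0.6), Node(3, 14, 0, 0.5)]
    print(elect_ch(cluster).node_id)
```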

CHs Election in Large Clusters
For large clusters, before the death of the first node, we implement the D-CHs division of energy balance strategy, which involves the selection of the positive cluster head (P-CH) and the secondary cluster head (S-CH). The detail procedure is given by the pseudo-code in Algorithm 3. In this paper, a cluster with CH energy consumption greater than 1.5 times the average CH energy consumption is defined as a large cluster. According to the energy consumption formulas introduced in Sections 2.2 and 2.3, we can calculate the average energy consumption of the CH, E CH .
We use x to represent the total number of nodes in the cluster, so we can use the following formula to get the CH energy consumption E of the cluster.
Next, we determine the value of E at which the corresponding cluster should be defined as a large cluster. Here, we assume that E is γ times E_CH. We compare the difference between the energy consumption of the CH in the cluster and the average energy consumption of the CH under the General CHs strategy and the D-CHs strategy, respectively, to obtain the critical E value of a large cluster.
According to Figure 6, when γ is less than 1.5, the energy error of General CHs is smaller than that of D-CHs. Once γ exceeds 1.5, the energy error of D-CHs is smaller than that of General CHs, which means that the D-CHs division of the energy balance strategy performs better than the General CHs strategy. We therefore set γ to 1.5.
Solving E > 1.5 × E_CH gives the condition that the number of nodes in a large cluster must satisfy. As shown in Figure 7, the WSN consists of 5 clusters (2 large clusters and 3 general clusters). Unlike in a general cluster, the member nodes (white dots) in a large cluster collect environmental information and send it to the S-CH (yellow dots); the S-CH receives this information, and the data fusion is performed in the P-CH (green dots). The final information is sent to the BS (red pentagram) for decision making.

Node Dormancy Mechanism
In general, the node dormancy mechanism is mainly divided into two categories: (1) randomly select a certain proportion of nodes to be dormant based on their different locations and (2) select nodes of different proportions to be dormant based on their distance to the CHs.

After the network enters the unstable period, the cluster member nodes in the large cluster area have low energy and long transmission distance, so it is extremely easy for them to exhaust their energy. The CH needs to receive and process a large amount of information, and once it dies, all of the information in the cluster cannot be transmitted to the BS, so that valuable information may be lost.
Based on the residual energy of the cluster member nodes and their distances to the CHs, the node dormancy mechanism proposed in this paper puts the member nodes with low energy and long distances to the CH into dormancy, thus reducing the load on the CH and improving the network throughput. Its steps are as follows. First, a dormancy factor S_dor is computed for every cluster member node. The smaller the S_dor of node i, the higher the mortality rate of node i and the higher its dormancy probability.
Next, the node dormancy ratio P is determined.
In Equation (33), C(j)·NO represents the total number of nodes in cluster j, n is the number of currently surviving nodes, and k is the number of clusters established.
Finally, the dormancy factors of all of the cluster member nodes in the large cluster are sorted in ascending order, and the nodes corresponding to the first fraction P of the sorted dormancy factors are put into dormancy. The detailed procedure is given by the pseudo-code in Algorithm 4.
To understand the node dormancy mechanism more intuitively, consider the simple example in Figure 8. First, after the CH is determined in (1), the node dormancy mechanism is applied, and the three dormant nodes are determined in (2). After several rounds of data iterations, a new CH and one dormant node are determined in (3). After several further rounds, only two nodes survive in (4). After the CH is re-determined and the final rounds of data iterations are completed, all of the nodes of the cluster die.

Inputs: (1) nodes of cluster C; (2) size of cluster C: NO; (3) number of surviving nodes: n; (4) optimal number of clusters: k
Result: dormant nodes of cluster C
for i = 1 → NO
    calculate the dormancy factor of node i, S_dor^i
end for
calculate the node dormancy ratio, P
sort the dormancy factor set from small to large
put the nodes corresponding to the first P dormancy factors into dormancy
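The dormancy selection can be sketched in Python as follows. Two simplifications are assumed: the dormancy factor is taken as residual energy divided by distance to the CH (a hypothetical form that is small for low-energy, distant nodes), and the dormancy ratio P is passed in as a parameter instead of being computed from Equation (33). The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# A member is (node_id, (x, y), residual_energy).
Member = Tuple[int, Tuple[float, float], float]


def dormancy_factor(energy: float, dist_to_ch: float) -> float:
    """Hypothetical S_dor: small for nodes with low energy and a long distance to the CH."""
    return energy / max(dist_to_ch, 1e-9)


def select_dormant(members: Sequence[Member], ch_pos: Tuple[float, float],
                   ratio: float) -> List[int]:
    """Return the ids of the members with the smallest dormancy factors.

    ratio is the dormancy ratio P in [0, 1]; floor(P * cluster size) nodes go dormant.
    """
    ranked = sorted(
        members,
        key=lambda m: dormancy_factor(m[2], math.dist(m[1], ch_pos)),
    )
    count = math.floor(ratio * len(members))
    return [m[0] for m in ranked[:count]]


if __name__ == "__main__":
    cluster = [
        (1, (1.0, 0.0), 0.5),
        (2, (10.0, 0.0), 0.1),
        (3, (5.0, 0.0), 0.3),
        (4, (2.0, 0.0), 0.4),
    ]
    print(select_dormant(cluster, ch_pos=(0.0, 0.0), ratio=0.5))
```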

Simulation Results
In this section, we evaluate the proposed protocol through simulation in MATLAB 2016b (MathWorks, Natick, MA, USA) on a desktop PC (Lenovo, made in Beijing, China) with an Intel(R) Core(TM) i3-4170 CPU @ 3.70 GHz and 4 GB RAM. When building the network model, we assume that all of the wireless sensor nodes are distributed in a circular area with a diameter of M and that the BS is located at the center of the area (0, 0). The specific simulation parameters are shown in Table 1 (note that J in this paper stands for Joule, a unit of energy). We compare the proposed protocol with the original classic protocols in two kinds of networks, homogeneous and heterogeneous, and in four aspects: the death round of the first node, the lifetime of the network, the trend of the network energy consumption, and the trend of the network throughput over the rounds of iterations.

Determination of the Optimal Routing Factor
Based on the CHs priority function described in Section 3.4, we can select the nodes with better positions and more residual energy as the CHs. To find the optimal routing factor α, we simulate and compare the network lifetime for nine values of α from 0.1 to 0.9 in homogeneous networks (initial energy = 0.6 J) and heterogeneous networks (initial energy evenly distributed over 0.4-0.8 J).

The Network Lifetime Comparison in Homogeneous Networks
In Section 4.1.1, we compare the network lifetime in homogeneous networks for nine values of α from 0.1 to 0.9. Figures 9 and 10 and Table 2 show the results.

The Network Lifetime Comparison in Heterogeneous Networks
Here, we compare the network lifetime in heterogeneous networks for nine values of α from 0.1 to 0.9. Figures 11 and 12 and Table 3 show the results.

The Optimal Routing Factor
Although the network has the longest lifetime when α = 0.1, its first node dies too early in that case, which means that the energy consumption of the nodes is rather unevenly distributed. In homogeneous networks, the first node's death round for α = 0.2 is 1018, which is too early. The first node's death rounds for α = 0.4-0.9 are very close, but the last node's death round for α = 0.4 is the largest among them. For α = 0.3, the first node dies earlier than for α = 0.4, but the last node dies later. Thus, the optimal routing factor α in homogeneous networks is 0.3 or 0.4.
In heterogeneous networks, the first node's death rounds for α = 0.2-0.9 are very close, but the last node's death rounds for α = 0.2 and α = 0.3 are the largest among them. Thus, the optimal routing factor α in heterogeneous networks is 0.2 or 0.3.
In summary, α = 0.3 is the only value that is among the best in both kinds of networks, so we set the routing factor to α = 0.3 in the protocol.

Comparison of the Death Round of the First Node
In WSNs, network performance tends to decline as nodes die, and the network is in a stable period before the first node dies. The death of the first node indicates that the network has entered an unstable period and that its performance has started to decline. The clustering protocol in this paper balances the energy consumption of the network by cyclically selecting the CHs, and it considers the remaining energy and location of each node during CH selection. Figure 13 shows that in the homogeneous networks (0.4 J and 0.8 J), the round of the first node's death does not differ substantially among LEACH, SEP, and DEEC. In comparison, our protocol delays the round of the first node's death. In the heterogeneous networks (0.4-0.8 J), our protocol maintains this advantage. Among the other three protocols, DEEC performs best, LEACH second, and SEP worst. The reason is that in heterogeneous networks the energy gap between nodes is large. LEACH gives every node the same probability of being elected as a CH, regardless of its energy. SEP considers only the initial energy of a node, so a node with high initial energy keeps a high probability of being selected as a CH even after multiple rounds of iterations have left it with little energy; this increases the death rate of the nodes and causes the first node to die the earliest. DEEC considers both the initial energy and the residual energy of a node, so the probability that a node with high initial energy is elected as a CH is lowered after multiple rounds of iterations, and other nodes with high remaining energy have a higher probability of being elected.
What the three protocols have in common is that they do not consider the location of the node, so some nodes with much energy but relatively remote locations are elected as CHs, which wastes energy unnecessarily. The protocol in this paper considers both the energy and the position of the node, so that the CH in each round tends to have more energy and a better position, which effectively delays the death round of the first node.
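The difference among the three election rules discussed above can be illustrated with a simplified Python sketch. It is not the published formulas of LEACH, SEP, or DEEC, which differ in detail (for example in how thresholds and average energies are obtained); it only mirrors the weighting ideas described in this section. The function names, the base probability `p_opt` = 0.1, and the two-node energy values are assumptions made for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def leach_style(p_opt, initial, residual):
    """Every node gets the same election probability, whatever its energy."""
    return [p_opt for _ in initial]


def sep_style(p_opt, initial, residual):
    """Probability is weighted by initial energy only."""
    avg_init = mean(initial)
    return [p_opt * e0 / avg_init for e0 in initial]


def deec_style(p_opt, initial, residual):
    """Probability is weighted by both initial and residual energy."""
    avg_init = mean(initial)
    avg_res = mean(residual)
    return [p_opt * (e0 / avg_init) * (er / avg_res)
            for e0, er in zip(initial, residual)]


# Hypothetical state after many rounds: node 0 started with 0.8 J and has
# 0.2 J left; node 1 started with 0.4 J and has 0.3 J left.
initial = [0.8, 0.4]
residual = [0.2, 0.3]
leach = leach_style(0.1, initial, residual)
sep = sep_style(0.1, initial, residual)
deec = deec_style(0.1, initial, residual)
```

In this made-up state, the initial-energy weighting still favors the depleted node 0, while the weighting that includes residual energy lowers its probability and raises that of node 1. None of the three rules uses node position, which is the gap the protocol in this paper addresses.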

Comparison of the Network Lifetime
SEP extends LEACH by considering the impact of the initial energy. In the homogeneous networks, however, the initial energy of all of the nodes is the same, so SEP is equivalent to LEACH. As shown in Figures 14 and 15, the numbers of surviving nodes over the rounds are very close for SEP and LEACH, which is consistent with this equivalence. As the rounds of iterations continue, the advantage of DEEC gradually emerges. Compared to LEACH, DEEC extends the network lifetime by 8.93% and 12.37% in the two homogeneous networks, respectively. Compared to DEEC, the protocol in this paper further extends the network lifetime by 25.89% and 24.36%, respectively.
As shown in Figure 16, in heterogeneous networks, SEP has fewer surviving nodes than LEACH in the early period. SEP causes many nodes with high initial energy to die early, which leaves more nodes with less initial energy in the network. In the later period, the energy distribution of the nodes in SEP is more balanced, so it maintains a longer network lifetime. Because it considers both the initial energy and the residual energy of the node, DEEC has advantages over SEP and LEACH in heterogeneous networks. With the protocol in this paper, 86 nodes survive in the 1400th round, so the protocol can carry more rounds of network communication, while LEACH, SEP, and DEEC retain only 28, 32, and 33 nodes, respectively.
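The two-step lifetime gains in the homogeneous networks can be combined into a gain relative to LEACH. The short Python sketch below does this under the assumption that each percentage is a relative increase of the same lifetime measure, so that the gains multiply; the function name `compound_gain` is hypothetical, and the order 0.4 J then 0.8 J is assumed from "respectively".

```python
def compound_gain(*gains_percent):
    """Combine successive relative gains (in percent) into one overall gain."""
    ratio = 1.0
    for gain in gains_percent:
        ratio *= 1.0 + gain / 100.0
    return (ratio - 1.0) * 100.0


# DEEC over LEACH, then this protocol over DEEC (values from the text).
gain_vs_leach_first = compound_gain(8.93, 25.89)    # first homogeneous network
gain_vs_leach_second = compound_gain(12.37, 24.36)  # second homogeneous network
```

Under this assumption the combined gains come out at roughly 37% and 40%, which is in line with the "approximately 40%" improvement over LEACH and SEP quoted in the Conclusions.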

Comparison of the Network Energy Consumption
In this paper, the energy consumption model proposed in [12] is used as a reference in determining the optimal cluster number of the network, and a new method for determining the optimal cluster number is proposed according to the specific network model. Then, for each round of data transmission, we determine the most suitable CH based on the remaining energy and positions of the nodes in the cluster. As shown in Figures 17-19, compared to the other three protocols, the number of clusters calculated by the protocol in this paper is better suited to the network, and the CH selection mechanism is more reasonable, so less energy is consumed in the network.

Comparison of the Network Throughput
Network throughput is an important indicator that fundamentally reflects the performance of a protocol. It refers to the number of packets in the network that are ultimately delivered to the BS. Each cluster member node transmits the information it senses to the CH in the form of packets; the CH fuses this information with what it senses itself and finally sends the result to the BS in the form of packets. During this period, if the energy of the CH is insufficient to receive, fuse, or transmit the information, none of the cluster's information in this round can be transmitted to the BS, which decreases the network throughput.
As shown in Figures 20-22, the protocol in this paper achieves a clear improvement in network throughput. In the homogeneous networks with an initial energy of 0.4 J, the final throughputs of LEACH, SEP, and DEEC are 5404, 5357, and 7072, respectively. Relative to these, the protocol in this paper improves the throughput by 89.64%, 91.3%, and 44.9%, respectively.
In the homogeneous networks with an initial energy of 0.8 J, the final throughputs of LEACH, SEP, and DEEC are 10,099, 10,336, and 13,590, respectively. Relative to these, the protocol in this paper improves the throughput by 102.32%, 97.68%, and 50.35%, respectively.
In the heterogeneous networks where the initial energy is evenly distributed over 0.4-0.8 J, the final throughputs of LEACH, SEP, and DEEC are 8017, 10,591, and 12,426, respectively. Relative to these, the protocol in this paper improves the throughput by 104.79%, 55.02%, and 32.13%, respectively.
As indicated in Tables 4-8, our protocol performs best among the four clustering protocols: it achieves a later first-node death round, a longer network lifetime, lower energy consumption, and a higher amount of communicated data than the others, which is of great significance for various environmental monitoring applications.
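The throughput figures above can be cross-checked with a small Python sketch. It assumes that each improvement is the relative increase over the corresponding baseline, so that baseline × (1 + improvement) gives the final throughput of the protocol in this paper. That final throughput is not stated in this section; the values computed here are only implied by the reported numbers, and the names `REPORTED` and `implied_throughputs` are hypothetical.

```python
# (baseline final throughput, reported improvement in percent), from the text.
REPORTED = {
    "homogeneous 0.4 J": {
        "LEACH": (5404, 89.64), "SEP": (5357, 91.3), "DEEC": (7072, 44.9)},
    "homogeneous 0.8 J": {
        "LEACH": (10099, 102.32), "SEP": (10336, 97.68), "DEEC": (13590, 50.35)},
    "heterogeneous 0.4-0.8 J": {
        "LEACH": (8017, 104.79), "SEP": (10591, 55.02), "DEEC": (12426, 32.13)},
}


def implied_throughputs(scenario):
    """Final throughput of the proposed protocol implied by each baseline."""
    return {name: base * (1.0 + gain / 100.0)
            for name, (base, gain) in REPORTED[scenario].items()}


def spread(values):
    """Gap between the largest and smallest implied value."""
    return max(values) - min(values)


implied = {scenario: implied_throughputs(scenario) for scenario in REPORTED}
spreads = {scenario: spread(list(vals.values()))
           for scenario, vals in implied.items()}
```

Within each network, the three baselines imply nearly the same final throughput (about 10,248, 20,432, and 16,418 packets, differing by less than one packet), so the reported percentages are mutually consistent under this reading.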

Conclusions
To further improve the performance of WSNs and increase their application value in various scenarios, this paper proposes a new WSN clustering routing protocol. First, a new network structure model is introduced; then, based on the original energy consumption model, a new method for determining the optimal cluster number of the network is proposed to balance the energy consumption within and between the clusters. Next, to address the shortcomings of the original AGNES algorithm, this paper introduces the variance in the cluster setup phase to reduce the difference in the distance between the nodes in the two clusters, and it implements the D-CHs division of the energy balance strategy before the death of the first node and the node dormancy mechanism after it. Finally, the CH priority function is constructed based on the residual energy and position of the node, and the CHs are reselected at the end of each round. The simulation results show the following.
In homogeneous networks, LEACH and SEP perform similarly, and the rounds of the first node's death in LEACH, SEP, and DEEC are not substantially different. The protocol in this paper delays the round of the first node's death by approximately 13-17%. Compared to LEACH and SEP, it increases the network lifetime by approximately 40% and the network throughput by approximately 90-103%; compared to DEEC, it increases the network lifetime by approximately 25% and the network throughput by approximately 45-50%.
In heterogeneous networks, the advantages of SEP and DEEC over LEACH gradually emerge. In the stable period, SEP causes many nodes with high initial energy to act as CHs frequently, so the first node dies earlier in SEP than in LEACH. Unlike the former two, DEEC considers the residual energy of the nodes, so the energy consumption of the nodes in the network is more balanced and its overall network performance is better than that of LEACH and SEP. The protocol in this paper takes into account both the location and the remaining energy of the node, so the total energy consumption in the network is smaller and more balanced, and more nodes survive after multiple iterations than with the other three protocols. In particular, its network throughput is approximately 32% higher than that of DEEC.
Overall, the protocol improves the round of the first node's death, the network lifetime, the network energy consumption, and the network throughput.
However, our protocol has some shortcomings. First, the scenario considered by the protocol is idealized. In reality, even if the node energy is sufficient, transmission may fail because of uncertain environmental factors during information transmission. In future work, we can add a probability model to simulate the natural environment during the information transmission process.
Second, the protocol is applicable only to 2D scenarios. In typical 3D scenarios, such as underground coal mines, underground pipe corridors, and indoor homes, the WSN must be arranged in three dimensions. Therefore, in the future we will consider proposing a clustering routing protocol suitable for 3D scenarios based on this protocol.
Last, the protocol proposed in this paper does not optimize the information transmission path. Compared with some general low-latency protocols, its delay may therefore be larger, which makes it unsuitable for some projects with strict real-time requirements. In the future, we will consider optimizing the information transmission path through a relatively practical optimization algorithm.