Optimization of Submodularity and BBO-Based Routing Protocol for Wireless Sensor Deployment

Large-scale deployments of wireless sensors are limited by node cost, communication efficiency, and energy consumption. Submodular optimization can reduce the deployment cost. This paper proposes a sensor deployment method based on the Improved Heuristic Ant Colony Algorithm-Chaos Optimization of Padded Sensor Placements at Informative and cost-Effective Locations (IHACA-COpSPIEL) algorithm and a routing protocol based on an improved Biogeography-Based Optimization (BBO) algorithm. First, a mathematical model with submodularity is established. Second, the IHACA is combined with the chaos-optimized pSPIEL algorithm to determine the shortest path. Finally, the selected sensors transmit data using the improved BBO routing protocol. The experimental results show that the IHACA-COpSPIEL algorithm can escape local optima, and that its communication cost is 38.42%, 24.19%, and 8.31% lower than that of the greedy algorithm, the pSPIEL algorithm, and the IHACA algorithm, respectively. It also uses fewer sensors and yields a longer life cycle. Compared with the LEACH protocol, the routing protocol based on the improved BBO extends the life cycle by 30.74% and consumes less energy.


Introduction
Wireless sensors are widely deployed on a large scale in commercial fields [1,2], e.g., in forest and grassland fire risk monitoring and early warning, but they are limited by node costs, communication efficiency between nodes, and energy consumption [3][4][5]. The wireless sensor deployment problem is to deploy a certain number of nodes to meet monitoring needs, that is, to find the number and location of the deployed nodes. The goal is to meet the monitoring requirements with as few sensors as possible while reducing the communication cost. This amounts to finding an optimal set of sensor nodes, which is an NP-hard problem. The sensor deployment problem exhibits diminishing returns, i.e., submodularity [6][7][8]. Initially, when only a small number of sensors are deployed, each new sensor significantly improves the deployment utility. As more sensors are placed, the improvement in utility from adding a new sensor diminishes. Krause [9] showed that, for problems with submodularity, the greedy algorithm obtains a solution within a factor of at least (1 − 1/e) of the optimal solution.
Many methods have been proposed for sensor deployment. In [10], Huang et al. assumed that the node's perception ability is a circular area. That is, targets within the circular area are fully perceived, effects using different algorithms, and analyze the performance of the protocol for the BBO algorithm. Finally, this article is summarized in Section 5.

Problem Description
Given a certain area of interest, V(|V| = N) is the set of monitored locations in the area. The goal of node deployment is to select a subset A ⊂ V(|A| = K < N), and the base station can efficiently estimate the value of any element in the set V\A(|V\A| = N − K) based on the observed values of the subset.
Mutual information describes the correlation between two sets of events. This paper uses it to describe the correlation between the set A of locations with deployed sensors and the set V\A of locations without deployed sensors. Suppose $V = [V_1, V_2, \ldots, V_N]$ represents N positions, and $X_V = [X_1, X_2, \ldots, X_N]$ are the random variables describing the observations at these positions. For any subset A ⊂ V, $X_A$ denotes the set of random variables associated with the location subset A. The objective is to choose the deployment set A that maximizes the information obtained about the unselected locations:
$$F(A) = I(X_A; X_{V \setminus A}) = H(X_{V \setminus A}) - H(X_{V \setminus A} \mid X_A). \tag{1}$$
In Equation (1), $H(X_{V \setminus A})$ is the entropy of the random variable $X_{V \setminus A}$, and $H(X_{V \setminus A} \mid X_A)$ is the conditional entropy of $X_{V \setminus A}$ given $X_A$.
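As a concrete illustration of the mutual-information objective, the following sketch evaluates I(X_A; X_{V\A}) under the assumption that X_V is jointly Gaussian with a known covariance matrix. The Gaussian closed form and the helper names `log_det`, `submatrix`, and `mutual_information` are assumptions of this sketch and are not taken from the paper.

```python
import math

def log_det(matrix):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return 2.0 * sum(math.log(lower[i][i]) for i in range(n))

def submatrix(cov, idx):
    """Rows and columns of cov restricted to the index list idx."""
    return [[cov[i][j] for j in idx] for i in idx]

def mutual_information(cov, selected):
    """Mutual information (in nats) between the selected locations A and the rest.

    Assumes a jointly Gaussian model, for which
    I(A; rest) = 0.5 * (log|S_A| + log|S_rest| - log|S|).
    """
    selected = sorted(set(selected))
    rest = [i for i in range(len(cov)) if i not in selected]
    if not selected or not rest:
        return 0.0
    return 0.5 * (log_det(submatrix(cov, selected))
                  + log_det(submatrix(cov, rest))
                  - log_det(cov))

# Hypothetical two-location example with correlation 0.5.
cov = [[1.0, 0.5], [0.5, 1.0]]
print(mutual_information(cov, [0]))
```

With uncorrelated locations the value is zero, which matches the intuition that a sensor tells the base station nothing about positions it is not correlated with.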

Objective Function of Sensor Deployment Considering Communication Cost
When deploying sensors, we must consider not only the number of sensors but also the energy consumed during transmission in the wireless sensor network, which depends on the distance between sensors. However, we cannot know in advance which sensors will communicate with each other, so the distance of specific links cannot be reduced directly. Instead, the communication distance between all selected sensors is reduced, so that the distance of whichever links are used later is also reduced. Assume that there are N candidate points and that node i and node j are selected to deploy sensors. The communication cost between the two nodes is defined as follows.
In Equation (2), (x i , y i ) is the node i coordinate, and (x j , y j ) is the node j coordinate.
In Equation (3), (x i , y i ) and (x j , y j ) are the coordinates of the nodes in the set A.
In this paper, the problem of improving the submodular utility of the sensors while reducing the communication cost C is transformed into a combinatorial optimization problem, whose objective function is given in Equation (4). For a communication cost budget B > 0, Equation (4) seeks the solution set with the maximum mutual information subject to that budget.
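To make the cost terms concrete, here is a minimal sketch that assumes the pairwise cost of Equation (2) is the Euclidean distance between node coordinates and that the set cost of Equation (3) sums this cost over all unordered pairs in A. Both forms, and the names `link_cost`, `communication_cost`, and `within_budget`, are assumptions made for illustration; the original equations may differ.

```python
import math
from itertools import combinations

def link_cost(p, q):
    """Assumed pairwise cost: Euclidean distance between two (x, y) coordinates."""
    return math.dist(p, q)

def communication_cost(coords, selected):
    """Assumed set cost C(A): sum of link costs over all unordered pairs in A."""
    return sum(link_cost(coords[i], coords[j])
               for i, j in combinations(sorted(selected), 2))

def within_budget(coords, selected, budget):
    """Check the budget constraint C(A) <= B."""
    return communication_cost(coords, selected) <= budget

# Hypothetical coordinates for three candidate locations.
coords = {0: (0.0, 0.0), 1: (3.0, 4.0), 2: (6.0, 8.0)}
print(communication_cost(coords, [0, 1, 2]))
```

A deployment routine could call `within_budget` before accepting a candidate set, so that only sets respecting the budget B are scored by mutual information.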

IHACA-COpSPIEL Deployment Method
Krause et al. proposed the pSPIEL algorithm, an improved greedy algorithm, but it still uses a relatively large number of sensors and a long communication distance. The ant colony algorithm is easy to combine with other methods and performs well in path optimization. Therefore, this paper combines the improved ant colony algorithm with the pSPIEL algorithm improved by a chaotic operator, and proposes the IHACA-COpSPIEL algorithm.

Chaos Optimized pSPIEL Algorithm
The standard pSPIEL algorithm exploits monotonicity (the non-decreasing property), submodularity, and locality to solve the sensor node deployment problem. Compared with the greedy algorithm, pSPIEL can optimize the sensor layout and reduce the communication cost, but the number of sensors remains large and the communication cost remains slightly high. Chaotic motion is ergodic, so it can effectively traverse every state within a specified range. Therefore, this paper introduces a chaotic operator that traverses all cluster numbers to determine the optimal cluster number, and proposes a chaos-optimized pSPIEL algorithm (COpSPIEL). The basic idea of the chaotic adjustment strategy for the locality parameter r is to use a chaos generator to produce a set of chaotic variables and then use the carrier transform method to map them to the value range of the locality parameter. The logistic map is a typical chaotic system:
$$x_{n+1} = \mu x_n (1 - x_n). \tag{5}$$
In Equation (5), µ is the control parameter. When µ = 4, the system is completely chaotic. The search variable $r_i$ is mapped to the domain (0, 1) of the logistic equation by Equation (6).
A chaotic sequence is then generated iteratively by the logistic equation (Equation (7)), and the generated sequence is inversely mapped by Equation (8). This returns it to the original solution space and produces the chaotic sequence of Equation (9), $r^m_i = (r^m_1, r^m_2, \ldots, r^m_i)$. The locality parameter r is searched over this sequence.
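The chaotic search over the locality parameter can be sketched as follows. The logistic map with µ = 4 is standard; the linear forward and inverse carrier maps between an assumed range [r_min, r_max] and (0, 1) stand in for Equations (6) and (8), whose exact forms are not reproduced here. The function names and the example range are hypothetical.

```python
def logistic_sequence(x0, length, mu=4.0):
    """Iterate x_{n+1} = mu * x_n * (1 - x_n) and return the generated values."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values

def chaotic_locality_candidates(r_min, r_max, r_start, length):
    """Generate candidate locality parameters in [r_min, r_max].

    Assumes a linear carrier map: r -> (r - r_min) / (r_max - r_min) and back.
    r_start should lie strictly inside the range and avoid the fixed points
    of the logistic map (scaled values 0, 0.25, 0.5, 0.75, 1).
    """
    span = r_max - r_min
    x0 = (r_start - r_min) / span
    return [r_min + span * x for x in logistic_sequence(x0, length)]

# Hypothetical range for r; the paper's actual range is not assumed here.
candidates = chaotic_locality_candidates(1.0, 5.0, 2.2, 50)
print(candidates[:3])
```

Each candidate r would then be used to cluster V, and the r giving the best cluster number would be retained.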

Improved Heuristic Ant Colony Algorithm
The ant colony algorithm was inspired by research on the behavior of real ant colonies and has been applied to the optimization of communication networks, among other problems. Its essence is that the ants in a colony communicate through pheromone. In the sensor layout, the ants must be guided toward sensor nodes with a large submodular gain. The traditional ant colony algorithm (ACA) searches blindly and easily falls into local optima. This paper therefore proposes an improved heuristic ant colony algorithm (IHACA), which improves both the heuristic function and the pheromone update.
The heuristic function of the traditional ant colony algorithm takes no consideration of the distance relationship between the next node j and the adjacent cluster head, and the search is blind. Therefore, this paper adds the Euclidean distance between the next node j and the cluster head of the adjacent cluster. The improved heuristic function is as follows.
In Equation (10), w ∈ (0, 1) is the weight and $g_{i1}$ is the first node of cluster i. To avoid premature convergence, stagnation, or local optima caused by excessive pheromone concentration [27], this paper introduces a local and a global pheromone update mechanism. The local pheromone update helps ants choose points that have not yet been selected, and the global pheromone update enhances the global search ability of the algorithm.
Each ant moves from node i to node j, and needs to update the pheromone on the path (i, j) that it just walked.
In Equation (11), n is the number of iterations, ξ is a local pheromone evaporation coefficient, τ 0 is a pheromone under initial conditions, and ε is a constant.
When all the ants complete this iteration, we select the shortest path and the longest path in this iteration to globally update the pheromone on the path.
In Equations (12) and (13), m is the number of ants, ρ is the global pheromone evaporation coefficient, $\tau^k_{ij}$ is the pheromone left by ant k on (i, j), Q is the pheromone quality coefficient, $L_{best}$ is the shortest path, and $L_{worst}$ is the longest path.

IHACA-COpSPIEL Algorithm
(1) Clustering
Using the chaotic sequence of the locality parameter r from Equation (9), the position set V is randomly divided into small clusters of diameter αr, where α ∈ (0, 1]. The nodes near the cluster boundaries are stripped away, so the clusters are well separated. The locality of F makes the clusters almost independent while still providing a wealth of information [13].
(2) Establishing the modular approximation
In the ith cluster $C_i$, a greedy algorithm is used to rank its $n_i$ nodes as $g_{i,1}, g_{i,2}, \ldots, g_{i,n_i}$, and the nodes are connected in this order to form a chain for the cluster. A modular approximation graph G′ is created from G through these chains. A modular directional arithmetic algorithm is applied to G′ to solve the corresponding objective function, the path selected in G′ is expanded according to the corresponding shortest paths in G, and the solution set A is output.
(3) Select the next position
The initial node of A is used as the starting point of the Improved Heuristic Ant Colony Algorithm (IHACA). Starting from this first node, the IHACA selects the next position according to Equation (14) and adds the selected position to the tabu list $tabu_k$ of ant k. $\eta^{\beta}_{ij}$ is calculated by Equation (10), and $\tau_{ij}$ is calculated by Equation (15).
In Equation (14), α is the weight of the path, β is the weight of the heuristic information, and τ ij represents the pheromone intensity of the path from the cluster C i to the k j sensor.

(4) Pheromone update
After the next position is determined, the pheromone traversed by the ant (i, j) is updated according to Equation (11). When all ants reach the endpoint, the global pheromone is updated according to Equation (12), and the tabu list is cleared.
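Steps (3) and (4) can be illustrated with a short sketch of the next-position choice. It assumes the usual ant colony transition rule, in which the probability of moving to j is proportional to τ_ij^α · η_ij^β over the nodes not in the tabu list, and it assumes a heuristic of the form η = 1 / (w·d_ij + (1 − w)·d_j,head), where d_j,head is the distance from j to the head of the adjacent cluster. Both forms are assumptions standing in for Equations (10) and (14), and every function name is hypothetical.

```python
import math
import random

def heuristic(coord_i, coord_j, coord_next_head, w=0.5):
    """Assumed heuristic: inverse of a weighted sum of two Euclidean distances."""
    d_ij = math.dist(coord_i, coord_j)
    d_jh = math.dist(coord_j, coord_next_head)
    return 1.0 / (w * d_ij + (1.0 - w) * d_jh + 1e-12)

def transition_probabilities(current, candidates, tabu, pheromone, coords,
                             coord_next_head, alpha=0.1, beta=0.1, w=0.5,
                             tau0=0.0003):
    """Standard ant-colony rule: p_j proportional to tau^alpha * eta^beta."""
    weights = {}
    for j in candidates:
        if j in tabu or j == current:
            continue  # nodes already visited by this ant are excluded
        tau = pheromone.get((current, j), tau0)
        eta = heuristic(coords[current], coords[j], coord_next_head, w)
        weights[j] = (tau ** alpha) * (eta ** beta)
    total = sum(weights.values())
    if total == 0.0:
        return {}
    return {j: value / total for j, value in weights.items()}

def select_next(probabilities, rng):
    """Roulette-wheel draw of the next node from the transition probabilities."""
    nodes = list(probabilities)
    return rng.choices(nodes, weights=[probabilities[j] for j in nodes], k=1)[0]

# Hypothetical layout: node 1 is close to node 0 and to the adjacent head.
coords = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
probs = transition_probabilities(0, [1, 2], set(), {}, coords, (1.0, 1.0))
print(probs, select_next(probs, random.Random(0)))
```

The default values α = β = 0.1, w = 0.5, and τ0 = 0.0003 follow the parameter settings reported later in the paper; the pheromone updates of Equations (11) to (13) are deliberately left out because their exact forms are not available here.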
The pseudo-code of the IHACA-COpSPIEL algorithm is given in Algorithm 1. Line 2 performs the clustering with complexity O(m_cmax). Lines 3-6 form the chains with complexity O(N log N). Line 7 calls the block-oriented algorithm with complexity O(n log N), where n is the number of connectable edges of graph nodes (n ≤ N). Lines 8-10 greedily select nodes for A with complexity O(log(N − n_i)). Lines 13-17 run until the given maximum mutual information is reached, with complexity O(kN^2), where k is the number of iterations. Line 22 updates the global pheromone with complexity O(N^2). The overall computational complexity of Algorithm 1 is therefore approximately O(kN^2).

Algorithm 1 Improved Heuristic Ant Colony Algorithm-Chaos Optimization of Padded Sensor Placements at Informative and cost-Effective Locations (IHACA-COpSPIEL).
Input: Position set V and covariance matrix
Output: Solution set A
1: Initialize parameters: α, β, w, ξ, τ_0, ε, n_max
2: Divide V into m_cmax clusters {C_i | i ∈ [1, m_cmax]}
3: for each cluster C_i do
4:     Sort position points in C_i by greedy algorithm and then get the ranks of g_{i,1}, g_{i,2}, ..., g_{i,n_i}
5:     Connect g_{i,1}, g_{i,2}, ..., g_{i,n_i} to form a chain which is then included into G_i
6: end
7: Use G as input of block-oriented algorithm to solve F(A) and then get the solution A, where G = {G_i | i ∈ [1, m_cmax]}
8: while a given maximum mutual information in A is not reached do
9:     Select nodes for A with greedy algorithm
10: end
11: A = A
12: for n = 1 : n_max do
13:     while a given maximum mutual information in A_n is not reached do
14:         Select IHACA initial points in A_n from head nodes in A
15:         Select next point with Equation (14)
16:         Update local pheromone τ_ij(n) with Equation (11)

Communication Model
The energy consumption of data sent by sensor nodes is shown in Equation (16).
where k is the number of bits of transmitted data, d is the transmission distance, E_elec(k) is the energy consumed by the transmitting circuit to send k bits of data, and E_amp(k, d) is the energy consumed by the transmission power amplifier to send k bits of data over the distance d. E_elec is the energy consumed per bit by the transmitting or receiving circuit, and d_0 is the distance threshold. E_fs is the energy consumption parameter of the transmission power amplifier under the free space channel model, and E_mp is the corresponding parameter under the multipath fading channel model. The energy consumed by the receiving circuit to receive k bits of data is given in Equation (17).
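The description above matches the widely used first-order radio model, which the sketch below implements under that assumption: free-space (d²) amplification below the threshold d_0 and multipath (d⁴) amplification at or above it, with d_0 = sqrt(E_fs / E_mp). The numeric constants are commonly quoted defaults and are assumptions here, not values confirmed by the paper.

```python
import math

# Assumed, commonly quoted default parameters (not taken from the paper).
E_ELEC = 50e-9        # J/bit, transmit/receive electronics
E_FS = 10e-12         # J/bit/m^2, free-space amplifier parameter
E_MP = 0.0013e-12     # J/bit/m^4, multipath amplifier parameter
D0 = math.sqrt(E_FS / E_MP)  # threshold distance in metres

def tx_energy(k_bits, d):
    """Energy to transmit k_bits over distance d (first-order radio model)."""
    if d < D0:
        return k_bits * E_ELEC + k_bits * E_FS * d ** 2
    return k_bits * E_ELEC + k_bits * E_MP * d ** 4

def rx_energy(k_bits):
    """Energy to receive k_bits."""
    return k_bits * E_ELEC

print(D0, tx_energy(4000, 50.0), rx_energy(4000))
```

The choice of d_0 makes the two branches agree at the threshold, so the transmit energy is continuous in d.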

Optimal Clustering
The number of cluster heads has a great impact on network performance. According to [14], the optimal number of cluster heads is shown in Equation (18).
In Equation (18), N_A is the number of nodes in set A, M is the side length of the area, and d_toBS is the distance from the node to the base station.
The probability of a node being elected as a cluster head is shown in Equation (19).
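For orientation, the sketch below uses the expression for the optimal number of cluster heads that is common in the LEACH literature, k_opt = sqrt(N_A / 2π) · sqrt(E_fs / E_mp) · M / d_toBS², and takes the election probability as k_opt / N_A. It is assumed, not confirmed, that Equations (18) and (19) have these forms; the amplifier constants are the same assumed defaults as in the radio model.

```python
import math

def optimal_cluster_heads(n_nodes, side_length, d_to_bs,
                          e_fs=10e-12, e_mp=0.0013e-12):
    """Assumed LEACH-style optimum: sqrt(N/2pi) * sqrt(e_fs/e_mp) * M / d^2."""
    return (math.sqrt(n_nodes / (2.0 * math.pi))
            * math.sqrt(e_fs / e_mp)
            * side_length / d_to_bs ** 2)

def election_probability(n_nodes, side_length, d_to_bs):
    """Assumed cluster-head election probability: k_opt / N."""
    return optimal_cluster_heads(n_nodes, side_length, d_to_bs) / n_nodes

# Hypothetical scenario: 100 nodes, 100 m x 100 m field, base station ~87.7 m away.
d = math.sqrt(10e-12 / 0.0013e-12)
print(optimal_cluster_heads(100, 100.0, d), election_probability(100, 100.0, d))
```

In practice the resulting k_opt would be rounded to an integer number of cluster heads.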

Fitness Function
The fitness value is based on the parameters used to achieve the best solution. It considers intra-cluster compactness, inter-cluster separation, and total energy consumption.
Tightness refers to the internal distance, that is, the distance between the nodes in the cluster and the cluster head (CH).
Separability S refers to the distance between clusters, that is, the minimum distance between cluster heads.
The total energy consumption comprises the cluster head communication energy consumption E_CH and the ordinary node communication energy consumption E_NN. The energy consumption of the cluster head includes the energy E_RN required to receive the data sent by the nodes in the cluster, the energy E_HD required to fuse the collected data, and the energy E_TB required to send data to the base station. The energy consumption of an ordinary node is the energy E_TH required to send data to the cluster head. Assume that the total number of nodes is N_A, the number of cluster heads is m, and the numbers of ordinary nodes in the clusters are n_1, n_2, ..., n_m.
$$E_{CH} = E_{RN} + E_{HD} + E_{TB} = k E_{elec} n_i + k E_{DA} (n_i + 1) + (k E_{elec} + k E_{mp} l_{HB}^4). \tag{22}$$
In Equation (22), E_DA is the energy consumed per bit by data fusion, and l_HB is the distance between the cluster head and the base station.
In Equation (23), l_NH is the distance between the nodes in the cluster and the cluster head.
In Equation (24), the closer the nodes in a cluster are to the cluster head, the better. Likewise, the greater the separation between cluster heads, the better, and the lower the total energy consumption, the better. The fitness function is as follows.
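Two of the fitness ingredients can be computed directly from the definitions in the text: the cluster head energy of Equation (22) and the separability as the minimum distance between cluster heads. The sketch below covers only these two; how they are combined with tightness into the final fitness value is not reconstructed, and the function names are hypothetical.

```python
import math
from itertools import combinations

def cluster_head_energy(k_bits, n_i, l_hb, e_elec, e_da, e_mp):
    """E_CH per Equation (22): receive + fuse + transmit to the base station."""
    e_rn = k_bits * e_elec * n_i                         # receive from n_i members
    e_hd = k_bits * e_da * (n_i + 1)                     # fuse n_i + 1 data streams
    e_tb = k_bits * e_elec + k_bits * e_mp * l_hb ** 4   # send to base station
    return e_rn + e_hd + e_tb

def separability(head_coords):
    """Minimum Euclidean distance between any two cluster heads."""
    return min(math.dist(p, q) for p, q in combinations(head_coords, 2))

# Hypothetical cluster-head positions.
heads = [(0.0, 0.0), (3.0, 4.0), (10.0, 0.0)]
print(separability(heads))
```

Using unit parameters makes the energy formula easy to check by hand: with k = 1, n_i = 2, l_HB = 2 and all coefficients equal to 1, the three terms are 2, 3, and 17.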

Routing Protocol Based on BBO Algorithm
The BBO algorithm is an intelligent heuristic algorithm first proposed by Dan Simon in 2008. Each habitat of a biological population has a Habitat Suitability Index (HSI), which describes the quality of the habitat environment, and the factors that affect this index are called Suitability Index Variables (SIVs). The BBO algorithm has the advantages of simple operation, fast convergence, and few parameters [28]. The standard BBO algorithm uses a simple linear migration model, but in a real biogeographic environment species migration often occurs randomly and does not follow this rule. Complex, more natural migration models perform much better than simple ones [23,29]. In this paper, a cosine migration model is used. When the number of species in the habitat is either large or small, the immigration rate λ and the emigration rate µ change slowly. When the number of species in the habitat is neither large nor small, the immigration rate λ and the emigration rate µ change quickly. The cosine migration model is expressed in Equations (26) and (27).
In Equations (26) and (27), I is the maximum value of the immigration rate, E is the maximum value of the emigration rate, k is population number and n is the maximum population number.
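A minimal sketch of a cosine migration model, assuming the form commonly used in the BBO literature: λ_k = (I/2)(cos(kπ/n) + 1) and µ_k = (E/2)(1 − cos(kπ/n)). These expressions are assumed to correspond to Equations (26) and (27); the function name is hypothetical.

```python
import math

def cosine_migration_rates(k, n, max_immigration=1.0, max_emigration=1.0):
    """Return (lambda_k, mu_k) for a habitat with k of at most n species."""
    phase = math.cos(k * math.pi / n)
    lam = max_immigration / 2.0 * (phase + 1.0)   # high when few species
    mu = max_emigration / 2.0 * (1.0 - phase)     # high when many species
    return lam, mu

for k in (0, 5, 10):
    print(k, cosine_migration_rates(k, 10))
```

The slope of both curves is proportional to sin(kπ/n), so the rates change slowly near k = 0 and k = n and fastest near k = n/2, which is the behavior described above.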
The mutation operator provides a certain global search capability for the algorithm through the mutation of the habitat's own information.
In Equation (28), m max is the maximum mutation rate, p s is the probability that habitat i has s species, and p max = max(p s ).
The steps of optimizing the wireless sensor network routing protocol based on the BBO algorithm are as follows. Lines 3-7 select the CHs with complexity O(n^2). Lines 9-31 iterate until the minimum fitness value is reached, with complexity O(qn^2), where q is the number of iterations. Lines 10-20 calculate the migration rate with complexity O(n^2).
10: for k = 1 : n do
11:     Calculate the migration rate λ_k according to Equation (26)
12:     if λ_k is greater than a uniformly distributed pseudo-random number in [0, 1] then
13:         for t = 1 : n do
14:             Calculate the migration rate µ_t according to Equation (27)
15:             if µ_t is greater than a uniformly distributed pseudo-random number in [0, 1] then
16:                 Use the roulette selection method to select the population to move out of habitat t and into habitat k
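The migration loop in the listing can be sketched as a single sweep over the habitats. This simplified version follows the standard BBO pattern rather than the exact listing: habitat k immigrates with probability λ_k, the source habitat is drawn by roulette wheel over the emigration rates µ, and one randomly chosen SIV is copied. The function name, the single-SIV copy, and the list-of-lists habitat representation are all assumptions.

```python
import random

def migration_sweep(habitats, lam, mu, rng):
    """Return new habitats after one BBO-style migration sweep.

    habitats: list of equal-length lists of SIVs
    lam, mu:  immigration and emigration rates per habitat
    rng:      a random.Random instance
    """
    n = len(habitats)
    updated = [list(h) for h in habitats]
    for k in range(n):
        if lam[k] <= rng.random():
            continue  # habitat k does not immigrate this sweep
        sources = [t for t in range(n) if t != k]
        weights = [mu[t] for t in sources]
        if not sources or sum(weights) <= 0.0:
            continue  # no habitat is able to emigrate
        t = rng.choices(sources, weights=weights, k=1)[0]  # roulette wheel
        siv = rng.randrange(len(habitats[k]))
        updated[k][siv] = habitats[t][siv]
    return updated

# Hypothetical example: habitat 0 always immigrates, habitat 1 always emigrates.
result = migration_sweep([[0, 0, 0], [1, 1, 1]], [1.0, 0.0], [0.0, 1.0],
                         random.Random(1))
print(result)
```

In the routing protocol, each habitat would encode a candidate set of cluster heads, and the fitness function would decide which habitats survive after migration and mutation.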

Parameter Settings
One application background of this research is forest and grassland fire risk monitoring and early warning. The precision and accuracy of such monitoring depend on multi-source spatio-temporal data collected under a high-resolution sensor layout. The application problems to be solved are how to deploy as few sensors as possible to monitor a large area of forest and grassland, and how to measure the multi-source parameters related to fire risk warning, such as atmospheric temperature, light, soil temperature and humidity, wind speed, and rainfall. To meet these requirements, this paper takes the distances between wireless sensor nodes into account, and the service time and lifetime of the whole network are extended as far as possible under constraints such as communication energy consumption and node distance. To verify the overall performance of the proposed algorithm, we conducted simulation experiments comparing four algorithms, i.e., the greedy algorithm, the pSPIEL algorithm, the ant colony algorithm, and the improved heuristic ant colony algorithm. The BBO-based routing protocol is used for data transmission. The forest environment monitoring area is divided into |V| = N = 86 locations, and a subset is selected to deploy sensors. The experimental parameter settings are shown in Table 1 and are explained as follows.
In Algorithm 1, α is a parameter in Equation (14). A large α makes the ants search mainly according to the pheromone and fall into local minima easily, whereas a small α increases the randomness of the search, so α is set to 0.1. For the same reason, β in Equation (14), ξ in Equation (11), and ρ in Equation (12) are all set to 0.1. w is the weight parameter of the heuristic function in Equation (10). To balance the effects on the heuristic function of the distance from node j to node i and the distance from node j to the head of the adjacent cluster, w is set to 0.5. Q is a parameter in Equation (13), and its value is 1 in order to strengthen the positive feedback mechanism of the algorithm. τ_0 is a parameter in Equation (11), and it takes a small value of 0.0003 to increase the probability that an ant chooses an optimal path. ε is a parameter in Equation (11), which takes the constant value 1. n_max is greater than N.
In Algorithm 2, p_s is a parameter in Equation (28). The smaller the p_s value, the more prone the habitat is to mutation. Hence the p_s value is 0.1. p_max is a parameter in Equation (28). It is the maximum value of p_s and is set to 1. I is a parameter in Equation (26) and E is a parameter in Equation (27). So that both the immigration rate and the emigration rate take values in [0, 1], both I and E are set to 1. E_0, E_elec, E_mp, and E_fs take the commonly used default values. The value of round_max depends on the lifetime of all nodes of the network.

Results and Analyses
(1) Comparison of communication cost and the number of sensors
Table 2 presents one set of experimental data from the simulation. Tables 2 and 3 compare the results with those of the greedy algorithm, the pSPIEL algorithm, and the IHACA algorithm. The algorithm proposed in this paper not only meets the same deployment requirements but also achieves the best results, that is, the lowest communication cost and the smallest number of sensors. For the same amount of mutual information, the communication cost of the IHACA-COpSPIEL algorithm is also the lowest compared with the greedy algorithm, the pSPIEL algorithm, and the IHACA algorithm. When the mutual information is 0.16, the sensor deployment is highly cost-effective, and the communication cost of the IHACA-COpSPIEL algorithm is 38.42% lower than that of the greedy algorithm, 24.19% lower than that of the pSPIEL algorithm, and 8.31% lower than that of the IHACA algorithm. The sensor deployments of the four algorithms are shown in Figure 1, where the blue dots indicate the candidate deployment points and the red squares indicate the points selected for deployment.
The r value of the pSPIEL algorithm is randomly selected and the number of clusters is also random. Thus the number of clusters affects the selection of nodes and it is difficult to obtain the optimal number of clusters. Therefore the communication cost is high. The IHACA-COpSPIEL algorithm adds a chaotic operator, which can traverse the local parameter r value to obtain the number of clusters under different r values. The deployed nodes are selected within the optimal number of clusters. The heuristic function concerns the distance between the next node and the adjacent cluster head to minimize the communication distance between sensors. The first node of the solution set of the COpSPIEL algorithm is used as the first node of the IHACA. The node with maximum mutual information is selected as the deployment point. This can reduce the number of sensors and reduce the total communication cost, so it has better performance than any other algorithm.
The objective function of sensor deployment is submodular. Whether the mutual information is 0.14 and the number of sensors is small, or the mutual information is 0.20 and the number of sensors is large, sensors are added one at a time, and the communication cost of the proposed algorithm is the lowest compared with the greedy algorithm, the pSPIEL algorithm, and the IHACA algorithm. The more the amount of mutual information increases as the number of sensors increases, the better the sensor deployment effect. When the number of sensors is small, each new sensor brings a large increment of submodular benefit. As the number of sensors increases, the increment of submodular benefit from each new sensor starts to decrease. Figure 2 shows that, for mutual information of 0.14-0.20, the ratio of the IHACA-COpSPIEL algorithm is higher than that of the greedy algorithm, the pSPIEL algorithm, and the IHACA algorithm. Therefore, IHACA-COpSPIEL achieves the best cost-benefit ratio.
(2) Comparison of the life cycle and average energy
The life cycle refers to the time from when the wireless sensors start to work until the death of the first node. Figure 3 compares the life cycles of the sensor nodes selected by the greedy algorithm, the pSPIEL algorithm, the IHACA algorithm, and the IHACA-COpSPIEL algorithm proposed in this paper, all using the routing protocol based on the BBO algorithm. Figure 3 shows that the first dead node appears in round 1368 under the greedy algorithm, in round 1430 under the pSPIEL algorithm, in round 1272 under the IHACA algorithm, and in round 1681 under the IHACA-COpSPIEL algorithm, which indicates that the wireless sensors deployed by the IHACA-COpSPIEL algorithm have a longer life cycle.
The reason for this is that the communication distance of the wireless sensors deployed by the IHACA-COpSPIEL algorithm is the shortest, which reduces the energy consumption of transmission. Figure 4 compares the percentages of remaining energy during data transmission after the sensors are deployed by each of the four algorithms. The IHACA-COpSPIEL algorithm has a higher percentage of remaining energy per round than the greedy, pSPIEL, and IHACA algorithms. Therefore, the overall energy consumption of the proposed algorithm is lower than that of any of the other three algorithms. For sensors deployed by the IHACA-COpSPIEL algorithm, Figures 5 and 6 show, respectively, the node deaths and the network energy consumption under the LEACH routing protocol and the BBO routing protocol as the number of simulation rounds increases. As seen from Figure 5, under the LEACH protocol the first node died in round 1435 and the last node died in round 1584, whereas under the BBO routing protocol the first node died in round 1659 and all nodes had died by round 2071. The network survival time under the BBO routing protocol is therefore longer than under LEACH. As seen from Figure 6, the remaining energy under the routing protocol based on the BBO algorithm is always higher than that under the LEACH protocol. This is because the routing protocol based on the BBO algorithm takes into account the distance between the nodes and the cluster heads, the distance between cluster heads, and the total energy consumption, which effectively balances the network load. Hence the life of the entire wireless sensor network is significantly extended.
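The round numbers reported for Figure 5 can be turned into relative lifetime extensions with a short calculation. The sketch below uses only the four round numbers quoted above; the helper name `relative_extension` is hypothetical.

```python
def relative_extension(baseline_round, improved_round):
    """Percentage by which improved_round exceeds baseline_round."""
    return (improved_round - baseline_round) / baseline_round * 100.0

first_node = relative_extension(1435, 1659)   # first node death, LEACH vs. BBO
last_node = relative_extension(1584, 2071)    # last node death, LEACH vs. BBO
print(f"first node: {first_node:.2f}%, last node: {last_node:.2f}%")
# prints "first node: 15.61%, last node: 30.74%"
```

The last-node figure appears to correspond to the 30.74% life cycle extension over LEACH quoted in the abstract.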

Conclusions
To reduce costs and save energy, this paper proposes a large-scale sensor deployment method called the IHACA-COpSPIEL algorithm and a routing protocol based on the BBO algorithm. Mutual information is introduced to describe the correlation between observed points and unobserved points, a mathematical model with submodularity is established, and the edges of a graph are used to represent communication costs. The pSPIEL algorithm, whose optimization ability is enhanced by a chaos operator, and the ant colony algorithm, with an improved heuristic function and pheromone update mechanism, are used to find the optimal path. This approach addresses the sensor deployment problem under a communication cost constraint. Finally, the deployed sensors transmit data using the BBO algorithm-based routing protocol. The computational complexity of the IHACA-COpSPIEL is O(kN^2), and the computational complexity of the routing protocol based on the BBO algorithm is O(qn^2). The experiments show that the deployment algorithm proposed in this paper has better sensor deployment capabilities. It reduces the communication cost by 38.42% compared with the greedy algorithm, uses fewer sensors, and has a longer life cycle. Compared with the LEACH protocol, the BBO algorithm-based routing protocol has lower energy consumption and a longer network life.
In the future, we intend to use a discrete event simulator (DES) such as NS-3 to further combine practical application scenarios to improve the effectiveness of the algorithm. Our vision for future work is as follows.
We will complete the IHACA-COpSPIEL protocol design in NS-3. We will refer to the RFC document of the Multi-Protocol Label Switching protocol and elaborate on the design and implementation of each basic component of IHACA-COpSPIEL, including the forwarding equivalence class (FEC), next hop label forwarding entry (NHLFE), FEC to NHLFE mapping (FTN), etc. By statically configuring the label forwarding table, communication between private networks through the backbone network by IHACA-COpSPIEL forwarding will be realized.