Optimal Data Transmission for WSNs with Data-Location Integration

Wireless sensor networks (WSNs) perform well for data transmission, and data transmission between sensor nodes is symmetric. However, the growing amount and variety of data put wireless sensor nodes under great pressure: some nodes exhaust their energy prematurely, which in turn degrades data transmission. Clustering is a common method for balancing energy consumption, but the existing algorithms fail to balance the network load effectively for big data transmission. Therefore, an optimal data transmission algorithm with data-location integration (ODTD-LI) is proposed for WSNs in this paper. For optimal data transmission, we update the network topology once per round. The proposed algorithm computes the optimal cluster heads, the clustering and the data transmission routing in three steps. We first deploy N homogeneous, symmetric nodes randomly in a square area and calculate the optimal number of cluster heads according to the node locations. Then, the optimal number of cluster heads, the energy consumption, and the distances and degrees of the nodes are taken into consideration during the clustering phase. Communication within a cluster is direct: the member nodes pass their information directly to the cluster head. Lastly, an optimal hybrid routing from each cluster node to the Sink is constructed for data transmission after clustering. The simulations verify the good performance of the proposed algorithm in terms of network lifetime, average delay, coverage rate (CR) and load balance compared with the existing algorithms. We find that the proposed approach performs well in selecting the hybrid routing in networks whose nodes are randomly arranged.


Introduction
Wireless sensor networks (WSNs) are a kind of infrastructure-free, self-organizing wireless network whose nodes collaborate to monitor their coverage area in real time and collect information about the environment or the objects within it. A WSN processes the collected data to obtain accurate information and finally delivers that information to the user [1]. The network structure diagram is shown in Figure 1. Wireless sensor nodes are characterized by large quantity, small size, weak communication and computing power, a limited and irreparable power supply, heterogeneous energy and so on [2]. Because bandwidth, data storage capacity and node energy are all limited, the biggest challenge for WSNs is to satisfy quality of service (QoS) requirements, such as delay and fault tolerance, while simultaneously maximizing the network lifetime [3,4]. WSN research has two main purposes: (1) the development of new networking technologies that accommodate highly dynamic environments and provide self-organizing capabilities; and (2) the development of networked information processing technologies that quickly extract useful, reliable and timely information for users. WSNs are therefore an important technology that could have a great impact in the 21st century. Moreover, with the rapid development of wireless communication technology, artificial intelligence (AI), big data and computing technology, WSN-based big data transmission has attracted much attention in many countries. Data transmission in WSNs combines sensor technology, embedded technology, distributed information processing technology, communication technology, etc., and big data transmission technology [5,6] based on multi-layer heterogeneous WSNs plays a crucial role.
In general, sensor nodes have low initial energy, whereas big data transmission consumes a large amount of energy, and these two facts conflict. Furthermore, because of sensor node damage, hardware loss, geographical location, signal noise and other factors, big data transmission often results in inaccurate or delayed data acquisition. The repair costs of sensors are relatively high, and the same problems are hard to avoid even after repair. Therefore, dynamically maintaining the optimal topology to satisfy the QoS of data transmission is of significant research interest.
In summary, many achievements have been made in WSN research and applications in recent years. By constructing the optimal data transmission path between the cluster heads and the Sink, clustering algorithms optimize the network data transmission routing and have been used in many practical applications, such as environmental monitoring, the Internet of Things, smart cities and vehicle-to-everything communications. However, people's increasing demands for big data transmission raise many new problems that need to be solved urgently. The energy of sensor nodes is likely to be drained unnecessarily and quickly, which results in the death of some sensor nodes in the network. Therefore, how to design an optimal multi-layer heterogeneous network topology with fast convergence and low energy consumption for big data transmission in WSNs has become an important topic. The innovation of this research lies in combining the data and location information of the sensor nodes to obtain a globally optimized network data transmission topology.
Motivated by the aforementioned observations, we explore an optimal data transmission scheme for WSNs with data-location integration. The proposed algorithm uses mathematical modeling to study distributed multi-layer network data transmission and builds an optimal hybrid multi-layer heterogeneous network topology that meets the QoS requirements of big data transmission while offering strong scalability and universality.
The main contributions are listed below.
(1) The optimal hybrid topology can provide multi-QoS support (network lifetime, delay, CR and load balance) for data transmission in WSNs.
(2) The designed big data transmission strategy can meet the targets of low power consumption and real-time data transmission under limited network resources.
(3) Frontier crossover research driven by practical problems is a feature of this paper. The research problems are highly applicable, and the research results provide strong theoretical support for many fields. For example, the efficient parameter self-adaptation in the proposed algorithm provides a theoretical basis for automatic networks.
Here, we emphasize the necessity of our research again. In a WSN with limited resources, the constructed optimal data transmission model is always an NP-hard problem when there are multiple constraints. The optimization targets include the maximum network CR, the lowest energy consumption, the highest data accuracy, etc. In order to solve this problem, we propose a new hybrid optimal multi-layer heterogeneous network topology with data-location integration for data transmission.
The remainder of the paper is organized as follows. Section 2 reviews related work. Section 3 introduces the optimal cluster head selection algorithm based on geographical location. Section 4 presents the system model and the ODTD-LI algorithm in detail. The simulation results and performance analysis are presented in Section 5. Section 6 concludes the paper.

Related Work
Different network topologies result in different data transmission performance. In 2000, Heinzelman, Chandrakasan and Balakrishnan proposed the classical LEACH clustering algorithm for data transmission in homogeneous WSNs for the first time [12]. LEACH keeps energy consumption low through its cluster head rotation mechanism; the network topology is shown in Figure 2. LEACH has been recognized and improved by a large number of researchers, and it ignited enthusiasm for research on data transmission in multi-layer networks. In 2002, Heinzelman et al. improved LEACH and proposed the C-LEACH algorithm [13]. C-LEACH renders energy consumption more stable by reducing the probability that nodes that have already served as cluster heads, and whose residual energy is therefore lower, are selected again. Younis and Fahmy proposed another classical clustering algorithm, HEED [14]. HEED considers the maximum residual energy of the nodes in the cluster head selection phase, which further improves load balancing. In the I-SEP algorithm [15], super sensor nodes are deployed in the monitoring area for data transmission, which makes the network perform better than homogeneous WSNs.
As previously mentioned, a stronger network topology needs to be designed to suit the characteristics of modern big data transmission. The distributed EEUC algorithm [16] considers node energy consumption and the distance between the source node and the Sink in the cluster head selection phase. EEUC marks all the nodes and equalizes network energy consumption. In the CUCA algorithm [17], cluster head selection considers not only distance but also network coverage and node residual energy. In the ERA algorithm [18], the cluster head establishes multi-path data transmission according to the residual energy, but this easily causes energy holes to emerge prematurely. Nabajyoti et al. [19] proposed a distributed fuzzy logic algorithm based on energy perception and CR in 2017, in which the selected cluster heads were non-uniform. However, non-uniform cluster heads cannot meet the requirements of minimum energy consumption and maximum CR of the network. In 2018, a distributed multi-layer clustering algorithm, DFCR [20], was proposed with better big data transmission performance. DFCR takes into account the degree, centrality, residual energy, distance and other factors of the nodes and then constructs a multi-objective optimization algorithm to address the uneven consumption of the nodes' energy. Zhang et al. [21] proposed an adaptive consensus-based distributed target tracking scheme with dynamic clustering in sensor networks, which addresses the target tracking problem over a filtering network with dynamic clusters and data fusion. In the past two years, clustering algorithms have been improved for some practical applications and have achieved good performance [22,23].
Some tree-based hybrid topologies also perform well for big data transmission in practical multi-layer heterogeneous WSNs. Zhang et al. proposed an effective hybrid link-adding strategy to enhance the network traffic of scale-free networks, and then proposed a related routing link-adding strategy for large-scale heterogeneous WSNs in 2019 [24]. Chowdhury et al. [25] proposed a hybrid heterogeneous topological structure to meet the real-time requirements of big data transmission. Yao et al. [26] proposed an energy-adaptive and bottleneck-aware tree-based communication scheduling scheme for battery-free WSNs, and their simulations show that the algorithm performs well in terms of communication latency and energy usage ratio. Zhang et al. [27] proposed an optimal tree-based topology to meet the requirements of big data transmission by improving local optimization relative to global optimization.
Nowadays, WSNs are typically used under the dynamic conditions of task-related environments to monitor and gather raw sensor data. Traditional techniques struggle with these dynamic data transmission tasks, whereas machine learning techniques have been able to handle such dynamic situations successfully [28]. For the optimization of microgrid-connected WSNs with load balancing and intrusion detection, some authors [29,30] use a data-driven deep learning approach to solve the above-mentioned data transmission tasks under dynamic conditions. The methods proposed in [29,30] can outperform the state-of-the-art solutions in terms of recognition accuracy. Zhou et al. [31] also trained a deep neural network in WSNs for image target recognition when data, energy and computation resources are limited.
Wang et al. [32] proposed EEICS, an efficient big data transmission technology. The main contributions of EEICS are a safe data transmission mechanism within the cluster and an energy-efficient model. In EEICS, the network is divided into a number of layers based on the distance from each node to the Sink, and the optimal node is selected as the relay node in each layer. Thus, EEICS manages the network overhead effectively. There are also some other hybrid topologies with good performance for big data transmission [33][34][35][36]. Many of these research achievements have been widely used in the military, the Internet of Things (IoT), artificial intelligence (AI), environmental monitoring (EM), modern intelligent transportation, express delivery, modern medical science, industrial applications, space exploration and other fields. These applications reflect the importance of this issue.

Optimal Clustering Method Based on the Node Locations
In order to better describe the proposed algorithm, we introduce the following assumptions of the network.
(1) N homogeneous, symmetric nodes are deployed randomly in a square area whose side length is L. The nodes cannot be moved after they are deployed.
(2) Each node has an identical transmission radius r.
(3) Each node knows the information of its neighbor node.
(4) The initial energy of each node is identical.
(5) The node has the function of data aggregation.
(6) The position of Sink can be adjusted.
(7) The network is updated once for one round.
(8) There is no attack node in the network.

Energy Model
The energy model in this paper is similar to that of the LEACH algorithm. The energy consumption of transmitting, receiving and aggregating data is formulated as follows, respectively:

$$E_{tx}(l) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^{2}, & d \le d_0 \\ l E_{elec} + l \varepsilon_{mp} d^{4}, & d > d_0 \end{cases}$$

$$E_{rx}(l) = l E_{elec}$$

$$E_{ag}(l) = l E_{da}$$

where $d$ is the distance from the source node to the Sink; $d_0 = \sqrt{\varepsilon_{fs}/\varepsilon_{mp}}$ is the threshold distance; $E_{tx}(l)$, $E_{rx}(l)$ and $E_{ag}(l)$ represent the energy consumed when the node transmits, receives and aggregates $l$ bits of data, respectively; $E_{elec}$ is the energy consumed per bit by the wireless transceiver circuit when transmitting and receiving data; $E_{da}$ is the energy consumed when aggregating 1 bit of data; and $\varepsilon_{fs}$ and $\varepsilon_{mp}$ are the amplification coefficients of the signal amplifier under the free-space model and the multi-channel attenuation model, respectively.
Obviously, if $d \le d_0$, the communication model of the network is the free-space model; otherwise, it is the multi-channel attenuation model.
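The energy model described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard first-order radio model used by LEACH; the numeric parameter values are commonly used illustrative defaults and are assumptions, not values taken from Table 1 of this paper.

```python
import math

# Illustrative parameter values (assumptions, not taken from this paper).
E_ELEC = 50e-9       # J/bit, transceiver circuit energy (E_elec)
E_DA = 5e-9          # J/bit, data aggregation energy (E_da)
EPS_FS = 10e-12      # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12  # J/bit/m^4, multi-channel attenuation coefficient

# Threshold distance d_0 = sqrt(eps_fs / eps_mp).
D0 = math.sqrt(EPS_FS / EPS_MP)


def e_tx(l_bits: int, d: float) -> float:
    """Energy to transmit l_bits over distance d (metres)."""
    if d <= D0:
        # Free-space model: amplifier energy grows with d^2.
        return l_bits * E_ELEC + l_bits * EPS_FS * d ** 2
    # Multi-channel attenuation model: amplifier energy grows with d^4.
    return l_bits * E_ELEC + l_bits * EPS_MP * d ** 4


def e_rx(l_bits: int) -> float:
    """Energy to receive l_bits."""
    return l_bits * E_ELEC


def e_ag(l_bits: int) -> float:
    """Energy to aggregate l_bits."""
    return l_bits * E_DA
```

With these definitions, the two branches of `e_tx` agree at `d = D0`, which is the reason for choosing the threshold as the square root of the coefficient ratio.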

Optimal Number of the Cluster Heads
The first step is to construct the network topology according to the LEACH algorithm. Since the network selects the optimal cluster heads to maximize the utilization of node energy, we need to calculate the optimal number of cluster heads based on the node locations. Firstly, we use integration to compute the expected distance between the cluster heads and the Sink. Then, we obtain the expected distance between each cluster head and the nodes within its cluster. With these two quantities, we obtain the optimal number of cluster heads from the expected value of the total energy consumption. In the subsequent process, we select the optimal cluster heads by considering the energy consumption, degrees and distances of the nodes.
Assume that there are $k$ clusters and $N$ nodes in the network. The energy consumption of a cluster head under the free-space model is as follows:

$$E_{CH} = \left(\frac{N}{k} - 1\right) l E_{elec} + \frac{N}{k}\, l E_{da} + l E_{elec} + l \varepsilon_{fs} d_{CH-Sink}^{2}$$

where $d_{CH-Sink}$ represents the expected distance between the cluster head and the Sink; the average number of nodes in each cluster is $N/k$; $(N/k - 1) \cdot l \cdot E_{elec}$ represents the energy consumed by the cluster head to receive $(N/k - 1) \cdot l$ bits of data; $(N/k) \cdot l \cdot E_{da}$ is the energy consumed by the cluster head to aggregate the received data; and $l \cdot E_{elec} + l \cdot \varepsilon_{fs} \cdot d_{CH-Sink}^{2}$ represents the energy consumed by the cluster head to transmit $l$ bits of data.
In the free-space model, denote the location of the optimal cluster head as $(x, y)$ and the coordinates of the Sink as $(a, b)$. Therefore, the expectation of the squared distance between the cluster head and the Sink is the following: where $d_{CH}(x, y)$ represents the distance from the source node to the cluster head; $S$ is the square area over which the nodes are distributed. Assume that there are $l$ bits of data within one cluster to be transmitted. The energy consumption within one cluster in one round is as follows: where $d_{CH-members}$ is the distance from the cluster head to a source node within the cluster. The expectation of the squared distance from the cluster head to a source node within one cluster is as follows.
The energy consumption of one cluster is as follows.
Therefore, the energy consumption in one round is the following.
Then, we can obtain the optimal number of cluster heads.
Similarly, the optimal number of cluster heads in the multi-channel attenuation model space can be obtained as follows.
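The free-space derivation above can be explored numerically. The sketch below is not the paper's closed-form result: it assumes nodes uniformly distributed on the square [0, L] x [0, L], takes the expected squared member-to-head distance as L^2 / (2 pi k) (the approximation used in the LEACH analysis, an assumption here), and finds the integer k that minimises the per-round energy by direct search. All function names and default parameter values are hypothetical.

```python
import math


def expected_sq_dist_to_sink(L: float, a: float, b: float) -> float:
    """E[(x-a)^2 + (y-b)^2] for (x, y) uniform on [0, L] x [0, L]."""
    ex = L * L / 3.0 - a * L + a * a
    ey = L * L / 3.0 - b * L + b * b
    return ex + ey


def round_energy(k, N, L, l_bits, d2_sink, e_elec, e_da, eps_fs):
    """Expected energy of one round with k clusters (free-space model)."""
    # Assumed approximation of the expected squared member-to-head distance.
    d2_member = L * L / (2.0 * math.pi * k)
    # Cluster head: receive from members, aggregate, transmit to the Sink.
    e_ch = ((N / k - 1) * l_bits * e_elec + (N / k) * l_bits * e_da
            + l_bits * e_elec + l_bits * eps_fs * d2_sink)
    # Each member: transmit l_bits to its cluster head.
    e_member = l_bits * e_elec + l_bits * eps_fs * d2_member
    e_cluster = e_ch + (N / k - 1) * e_member
    return k * e_cluster


def optimal_cluster_count(N, L, l_bits, a, b,
                          e_elec=50e-9, e_da=5e-9, eps_fs=10e-12):
    """Integer k in 1..N that minimises the expected per-round energy."""
    d2_sink = expected_sq_dist_to_sink(L, a, b)
    return min(range(1, N + 1),
               key=lambda k: round_energy(k, N, L, l_bits, d2_sink,
                                          e_elec, e_da, eps_fs))


# Example: 100 nodes on a 100 m square, Sink placed outside the field.
print(optimal_cluster_count(100, 100.0, 4000, 50.0, 175.0))
```

Under these assumptions the minimum is interior only when the amplifier cost of reaching the Sink, eps_fs times the expected squared distance, exceeds E_elec; when the Sink is close, the cost keeps decreasing with k and the search returns k = N. The sketch also ignores the switch to the multi-channel attenuation model for distances beyond d_0.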

The Scheme of Cluster Head Selection
In this paper, the selection of cluster heads jointly considers the remaining node energy, the degree of node $v_i$ (i.e., the number of nodes that can communicate with $v_i$), the distances from the nodes to the Sink and the CR. The CR is the ratio of the nodes in the constructed data transmission topology to the total number of nodes. It is worth noting that the selection of cluster heads in this paper differs from that of the LEACH algorithm, which randomly selects sensor nodes as cluster heads according to a preset probability. Based on the above considerations, we first calculate the candidate value of node $v_i$ being a cluster head as follows: where $\alpha$, $\beta$ and $\gamma$ are weighting parameters that satisfy $0 < \alpha, \beta, \gamma < 1$ and $\alpha + \beta + \gamma = 1$; $E_{Remain-i}$ represents the remaining energy of node $v_i$; $E_{Initial}$ represents the initial energy of all the nodes; $Deg_i$ represents the degree of node $v_i$; $Deg_{max}$ represents the maximum degree of all the nodes; $Dis_{i-Sink}$ represents the distance between node $v_i$ and the Sink; and $Dis_{max-Sink}$ represents the maximum distance from the source nodes to the Sink. Sorting the candidate values from high to low yields the candidate set Candidate_CH{}. Then, the optimal cluster heads are selected from the candidate set such that the CR of the network is equal to or greater than 90%, and the set of cluster heads List_CH{} is established.
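As an illustration of how such a candidate value might be computed, the sketch below combines the three normalised quantities as a weighted sum. The exact functional form is an assumption: in particular, the distance term is taken as 1 - Dis_i / Dis_max so that nodes closer to the Sink score higher, and the default weights are arbitrary. The function name `candidate_value` is hypothetical.

```python
def candidate_value(e_remain: float, e_initial: float,
                    deg: int, deg_max: int,
                    dis_sink: float, dis_max: float,
                    alpha: float = 0.4, beta: float = 0.3,
                    gamma: float = 0.3) -> float:
    """Weighted score of a node as a cluster head candidate (assumed form)."""
    # Enforce 0 < alpha, beta, gamma < 1 and alpha + beta + gamma = 1.
    if not all(0.0 < w < 1.0 for w in (alpha, beta, gamma)):
        raise ValueError("weights must lie strictly between 0 and 1")
    if abs(alpha + beta + gamma - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    energy_term = e_remain / e_initial       # remaining energy ratio
    degree_term = deg / deg_max              # relative node degree
    distance_term = 1.0 - dis_sink / dis_max # assumed: closer is better
    return alpha * energy_term + beta * degree_term + gamma * distance_term
```

Each term lies in [0, 1], so the score does too, which makes candidate values comparable across nodes regardless of the units of energy or distance.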
The cluster head selection method is shown as Algorithm 1.

Algorithm 1
The cluster head selection algorithm.

Require:
    N // number of alive nodes
    V(i, j), i, j = 1, 2, ..., N // location of alive nodes
    S(a, b) // location of Sink
    Neighbor_vi{} // neighbor information of node v_i
    r // node transmission range
Ensure: List_CH{i | i = 1, ..., N_opt−CH}
1: Declare List_CH{} = NULL
2: for i = 1 to N do
3:     Calculate the candidate value of node v_i as a cluster head
7:     for j = i + 1 to N do
8:         if n = N_opt−CH then
9:             if CR < 90% then
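The selection step can be sketched as follows. This is one possible reading of the procedure, not the authors' implementation: it assumes that candidate values have already been computed, that a node counts toward the CR if it is a cluster head or lies within transmission range r of one, and that further candidates are added in rank order while the CR stays below 90%. The names `coverage_rate` and `select_cluster_heads` are hypothetical.

```python
import math


def coverage_rate(positions, heads, r):
    """Fraction of nodes that are heads or within range r of a head."""
    covered = 0
    for node, pos in positions.items():
        if node in heads or any(math.dist(pos, positions[h]) <= r
                                for h in heads):
            covered += 1
    return covered / len(positions)


def select_cluster_heads(candidate, positions, r, n_opt, min_cr=0.9):
    """Pick the n_opt best-ranked nodes, then extend until CR >= min_cr."""
    # Sort node ids by candidate value, highest first.
    ranked = sorted(candidate, key=candidate.get, reverse=True)
    heads = ranked[:n_opt]
    remaining = ranked[n_opt:]
    # Assumed repair step: add the next-best candidate while coverage is low.
    while coverage_rate(positions, heads, r) < min_cr and remaining:
        heads.append(remaining.pop(0))
    return heads


# Example with four nodes in two well-separated pairs.
positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (10.0, 0.0), 3: (11.0, 0.0)}
candidate = {0: 0.9, 1: 0.5, 2: 0.8, 3: 0.1}
print(select_cluster_heads(candidate, positions, r=2.0, n_opt=1))
```

In the example, a single head covers only half of the nodes, so the second-ranked node is added to bring the CR above the threshold.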

Our Proposed ODTD-LI Algorithm
In this section, we construct an optimal hybrid routing from each cluster head to the Sink after clustering. In order to reduce the complexity of the algorithm, transmission within a cluster is one-way and direct: the nodes within the cluster transmit the collected data to the cluster head in one hop. Once the data within the cluster have been collected, the cluster head needs to transfer the data to the Sink. Researchers [10,16,27,35] often employ the following four strategies:
(1) The cluster heads transmit the collected data to the Sink directly in one hop.
(2) The Sink uses a greedy algorithm to construct multi-hop routes to the cluster heads, and only cluster heads appear in the routes.
(3) The Sink uses a greedy algorithm to construct multi-hop routes to the cluster heads. Only non-cluster-head nodes appear in the routes, and all the cluster heads are leaf nodes, which are not used as relay nodes.
(4) Hybrid routing is used to construct the optimal route from each cluster head to the Sink. Both cluster heads and non-cluster-head nodes can be used as relay nodes.
In Strategy (1), the energy of a few cluster heads is likely to be drained because they need to transfer much more data than other nodes. In Strategy (2), the nodes near the Sink also need to transfer much more data than other nodes, so energy holes easily appear near the Sink. In Strategy (3), all the cluster heads are leaf nodes and no longer forward data from other clusters, which wastes node resources. In Strategy (4), all nodes in the network can be used for data transmission, so the data traffic is spread across all nodes. This strategy effectively balances the data load on the nodes and thereby extends node lifetime.
In order to enhance the load balance of the nodes, we choose Strategy (4) for data transmission and construct a hybrid integer linear mathematical model to solve the aforementioned optimization problem. Assume that the number of network rounds is R. The mathematical programming problem of maximum CR and minimum energy consumption is then established as follows, subject to (s.t.) the constraints in Equations (15)-(18).
where $E_{round-k}$ represents the energy consumption of the network in round $k$; $R$ is the number of rounds the network runs; $Deg_{CH_i}$ is the number of nodes served by cluster head $v_i$; and $|ListCH[i]|$ is the number of nodes within the cluster set $ListCH[i]$. Equation (13) maximizes the number of nodes in the network topology, which meets the requirement of maximum CR. Equation (14) represents the target of lowest energy consumption for the whole network. Equation (15) represents the energy consumption constraint of each round. Equations (16) and (17) are non-empty constraints on the sets of representative cluster heads. Equation (18) is a constraint on the number of nodes in each cluster, which ensures that the cluster sizes are appropriate for balancing the data load.
There are two objectives, (13) and (14), at the same time, so we need to combine them into a single objective function. Here, we introduce an error value $\varepsilon$ to represent the combined single objective function as follows: where $\kappa_1$ and $\kappa_2$ are the assumed linear correlation coefficients.
In Equation (19), the mathematical expectation of $\varepsilon$ is expected to be 0. Therefore, Equations (13)-(18) are recast as a single-objective problem with corresponding constraints. However, in practice, the relationship between these two optimization goals is nonlinear, so they are difficult to combine, and the optimization problem is NP-hard. Thus, we need to use a heuristic algorithm to approximate the solution.
Algorithm 2 provides the optimal tree-clustering energy-efficient algorithm for secure big data transmission. Firstly, the network is divided into several clusters according to the LEACH algorithm. Secondly, the optimal number of cluster heads is calculated, and the network is divided into a certain number of clusters according to the parameters such as the distances, degrees and energy of nodes. Then, we perform security detection within the cluster. Finally, an optimal tree-clustering energy-efficient data transmission model is constructed from the cluster heads to Sink.
In order to better describe the proposed algorithm, a specific flow chart is shown in Figure 3.

22:         end if
23:     end for
24:   end for
25:   // Tree-clustering based multi-routing data
26:   for i = 1 to N_opt−CH do
27:     // Data transmission phase
28:     for j = 1 to N do
29:       use greedy algorithm to construct a minimum distance-based transmission tree from node v_i to Sink
30:       if the relay node of v_i is v_j then
31:         node v_i.parent → node v_j
32:       end if
33:     end for
34:   end for
35:   Rounds = Rounds + 1
36: end while
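The data transmission phase of the listing can be illustrated with a small greedy routine. The sketch below is an assumption about what "minimum distance-based" means: each node forwards to the in-range neighbor that is closest to the Sink and strictly closer than itself, and sends directly once the Sink is within range. Any node, cluster head or not, may act as a relay, in the spirit of Strategy (4). The names `greedy_parents` and `SINK` are hypothetical.

```python
import math

SINK = "sink"  # label used for the Sink in the parent map


def greedy_parents(positions, sink_pos, r, sources):
    """Build parent pointers toward the Sink by greedy forwarding."""
    parent = {}
    for src in sources:
        cur = src
        # Follow the route until it reaches the Sink or an already routed node.
        while cur != SINK and cur not in parent:
            here = positions[cur]
            d_here = math.dist(here, sink_pos)
            if d_here <= r:
                parent[cur] = SINK  # Sink is within direct range
                break
            best, best_d = None, d_here
            for j, pj in positions.items():
                if j == cur or math.dist(here, pj) > r:
                    continue  # skip self and out-of-range nodes
                dj = math.dist(pj, sink_pos)
                if dj < best_d:  # strictly closer, so no routing loops
                    best, best_d = j, dj
            parent[cur] = best  # None marks a dead end (no closer neighbor)
            if best is None:
                break
            cur = best
    return parent


# Example: three nodes on a line, Sink at the far end.
positions = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (6.0, 0.0)}
print(greedy_parents(positions, sink_pos=(8.0, 0.0), r=3.5, sources=[0]))
```

Requiring each hop to move strictly closer to the Sink keeps the resulting structure a tree, at the cost of possible dead ends in sparse regions, which a full implementation would need to handle.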

Experimental Results and Analysis
In order to verify the effectiveness of Algorithm 2, we perform the simulation experiments by comparing with the other four algorithms in this section: LEACH, Greedy Incremental Tree Algorithm (GIT), Power Efficient Data Gathering and Aggregation Protocol (PEDAP) and EEICS. The simulation environment is VC6.5 and MATLAB 2017B. The simulation parameters are shown in Table 1.

The Lifetime of the Network
We use the number of surviving nodes as a function of the network running time to reflect the lifetime of the network.
In Figure 4, we can observe that nodes die faster and faster as the running time of the network increases. This phenomenon is caused by the growing load on the node links, which leaves the nodes unable to bear the subsequent data transmission tasks. For the LEACH algorithm in particular, the ratio of dead nodes is higher than in the other four algorithms because LEACH only considers the node distances when constructing the data transmission path.
The number of alive nodes reflects the variation of the network performance. As shown in Figure 4, the proposed ODTD-LI algorithm improves the network lifetime effectively. When the first node dies, the network lifetimes of ODTD-LI, EEICS, PEDAP, GIT and LEACH are 8015, 6760, 4632, 4520 and 4026 rounds, respectively. When 10 nodes are dead, the network lifetimes of ODTD-LI, EEICS, PEDAP, GIT and LEACH are 9268, 8074, 8106, 7835 and 7726 rounds, respectively. When 10 nodes are dead, the performance improvement rates of the network lifetime in ODTD-LI can reach up to 30.36% and 34.34% compared with EEICS and PEDAP, respectively, because the proposed ODTD-LI algorithm is better at constructing the optimal hybrid routing. In particular, the optimal number of cluster heads and the energy, distances and degrees of the nodes are taken into consideration in our scheme, which improves the network load balance in the tree maintenance phase.

Average Delay of the Network
Real-time data transmission is one important measure of network overhead. In this paper, the latency (the network average delay) of data transmission is the average data transmission time from all nodes to the Sink. From Figure 5, we can observe that the proposed ODTD-LI algorithm has lower delay than EEICS, PEDAP, GIT and LEACH. Unfortunately, the average delay of LEACH is the lowest among all the algorithms. This is because LEACH uses the fewest relay nodes and has a simple network topology, which basically amounts to point-to-point data transmission. However, in the proposed ODTD-LI algorithm, the communication time of the links and the computing time of the nodes increase correspondingly, because the number of links and the amount of data grow as the number of network nodes increases.
For the hybrid topology with 300 nodes, the delays are 20.63 ms, 21.8 ms, 22 ms, 22.18 ms and 22.35 ms for ODTD-LI, EEICS, GIT, PEDAP and LEACH, respectively. The performance improvement ratios almost reach 6.9% and 7.7%, respectively, compared with GIT and PEDAP. Of course, we can observe from Figure 5 that the overall delays of all the algorithms grow with the increase in network nodes. This is because the density of nodes results in the increase in data and sub-routes that, in return, results in an increase in data transmission delays from the source nodes to Sink. Fortunately, we construct an optimal clustering model at the beginning, and then construct an optimal routing from each cluster head to Sink after clustering. In this manner, the delay is effectively reduced.

Node CR
Node CR represents the transmission capability of the network. The higher the node CR is, the higher the correlation between the nodes and the higher the accuracy of the collected data. Figure 6 provides the node CRs of the network after 10,000 and 20,000 rounds, respectively. As the number of rounds increases, the node CR of all five algorithms decreases because dead nodes gradually appear in the network. We can observe from Figure 6 that the proposed ODTD-LI algorithm has a higher node CR than the other algorithms, which shows, from another aspect, the advantage of ODTD-LI in maintaining the capability of data collection and transmission.

Load Balance
Load balance reflects the data transmission capacity of each node and each route, and we use the variance of energy consumption to represent the load balance of the network. Figure 7 shows the variance of energy consumption with 300 nodes, and we can observe that the proposed ODTD-LI algorithm has lower variance than the compared algorithms. Remarkably, ODTD-LI maintains a small and stable variance of energy consumption as the number of network rounds increases. The LEACH, GIT and PEDAP algorithms show higher variances of energy consumption as the rounds increase, because they do not maintain the constructed network topology dynamically. For these three algorithms, data congestion easily occurs on some links when there are dead nodes in the network. As a result, the data load cannot be balanced across the data transmission routes. In contrast, EEICS and ODTD-LI maintain the network topology dynamically, and the data load of each node is balanced. In addition, compared with the EEICS algorithm, the proposed ODTD-LI algorithm improves the data transmission routing based on data-location integration. Therefore, the data sizes in each node and route are balanced.
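The load balance metric used here, the variance of energy consumption across nodes, can be computed as in the following sketch. It assumes the population variance over per-node consumed energy (initial minus residual); the paper does not state which variance estimator is used, and the function name `load_balance_variance` is hypothetical.

```python
from statistics import pvariance


def load_balance_variance(initial_energy, residual_energy):
    """Population variance of the energy consumed by each node."""
    consumed = [e0 - e for e0, e in zip(initial_energy, residual_energy)]
    return pvariance(consumed)


# Perfectly balanced consumption gives zero variance ...
print(load_balance_variance([2.0, 2.0, 2.0, 2.0], [1.0, 1.0, 1.0, 1.0]))
# ... while one overloaded node raises it.
print(load_balance_variance([2.0, 2.0], [2.0, 0.0]))
```

A smaller value indicates that the transmission load, and hence the energy drain, is spread more evenly over the nodes.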

Conclusions and Future Work
This paper mainly improves the clustering phase, the data transmission path and the communication mechanism. Firstly, the optimal number of cluster heads is calculated according to the geographic location information. Then, the optimal cluster heads are selected according to energy, node degree and other information. Lastly, we construct an optimal hybrid routing for big data transmission. The simulation results show the good performance of the proposed ODTD-LI algorithm in terms of network lifetime, delay, CR and load balance. This research mainly optimizes the network topology and data transmission scheme from the perspective of the network layer. Compared with the EEICS algorithm, the proposed ODTD-LI algorithm achieves 2.59%, 14.38% and 10.74% improvements in the key performance indicators (KPIs) of network average delay, lifetime and node CR, respectively. The proposed ODTD-LI algorithm also has good stability in energy consumption, as reflected in Figure 7.
In future works, we plan to design a scalable and secure big data transmission algorithm by combining big data transmission and the modern intelligent optimization algorithm with the multi-layer heterogeneous network. This could provide new ideas and methods for improving the use ratio of the network, and the proposed algorithm could meet the network QoS better and offer the theoretical and technical support for secure transmission of big data.