A Novel Energy-Efficient MAC Aware Data Aggregation Routing in Wireless Sensor Networks.

Embedding data-aggregation capabilities into sensor nodes of wireless networks could save energy by reducing redundant data flow transmissions. Existing research describes the construction of data aggregation trees to maximize data aggregation times in order to reduce data transmission of redundant data. However, aggregation of more nodes on the same node will incur significant collisions. These MAC (Media Access Control) layer collisions introduce additional data retransmissions that could jeopardize the advantages of data aggregation. This paper is the first to consider the energy consumption tradeoffs between data aggregation and retransmissions in a wireless sensor network. By using the existing CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) MAC protocol, the retransmission energy consumption function is well formulated. This paper proposes a novel non-linear mathematical formulation, whose function is to minimize the total energy consumption of data transmission subject to data aggregation trees and data retransmissions. This solution approach is based on Lagrangean relaxation, in conjunction with optimization-based heuristics. From the computational experiments, it is shown that the proposed algorithms could construct MAC aware data aggregation trees that are up to 59% more energy efficient than existing data aggregation algorithms.


Introduction
Wireless sensor networks (WSNs), which can probe and collect environmental information such as temperature, atmospheric pressure, and irradiation, have developed rapidly in recent years and provide ubiquitous sensing, computing, and communication capabilities. A WSN has two important characteristics that distinguish it from traditional wireless networks. First, after an event occurs, multiple sensor nodes (denoted as data source nodes) around the event sense it and then send the data back to one sensor node (denoted as the sink node). Hence, communication in a WSN flows from multiple data source nodes to one data sink node. This is multipoint-to-point communication, rather than the traditional point-to-multipoint (i.e., multicast) communication in wireless networks. For example, Figure 1 shows a data aggregation tree from three data source nodes to one sink. This data aggregation tree is a type of reverse-multicast tree. Second, energy saving is possible at the nodes on the data aggregation tree because intermediate nodes on the tree may receive redundant data from the data source nodes. Rather than transmitting useless, redundant data back to the sink, the intermediate nodes can save energy by collecting and processing data before transmission, which also helps prevent network disconnection caused by rapid energy depletion of sensors. This type of data aggregation capability has been put forward as particularly useful for routing in terms of energy consumption in WSNs [2]. There are several data aggregation schemes. In addition to reducing redundant transmissions, aggregation schemes can compute the maximum value (MAX), minimum value (MIN), or summation (SUM) of the collected data. For example, in Figure 1, an event in the sensing range of data source nodes n_1, n_2, and n_3 is probed for temperature (60, 65, and 63˚F, respectively), and the MAX temperature is then sent back to the sink node S. If node n_3 aggregates these data (i.e., MAX = 65˚F) before returning them to the sink, the total number of transmissions for node n_3 is reduced from three to one.
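The MAX example from Figure 1 can be illustrated with a minimal sketch. The function name `packets_sent_by_relay` is hypothetical and not part of the paper; the sketch assumes node n_3 forwards its own reading together with the readings it receives from n_1 and n_2.

```python
def packets_sent_by_relay(own_reading, child_readings, aggregate=None):
    """Return the list of values a relay node forwards toward the sink."""
    readings = [own_reading] + list(child_readings)
    if aggregate is None:
        return readings            # no aggregation: forward every reading
    return [aggregate(readings)]   # aggregation: forward one summary value

# Readings from the Figure 1 example: n_1 = 60, n_2 = 65, n_3 = 63 (degrees F).
without_agg = packets_sent_by_relay(63, [60, 65])
with_agg = packets_sent_by_relay(63, [60, 65], aggregate=max)
```

Without aggregation the relay sends three packets; with MAX aggregation it sends a single packet carrying 65.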
Since it is almost impossible to replace the battery in a sensor node, power-efficient communication plays a crucial role in WSNs. In data aggregation routing, the key issue is how to construct the reverse-multicast tree so as to minimize the total energy consumption. Most existing studies construct the tree by considering only the data aggregation aspect [2,6]. The basic idea of these algorithms is to maximize the number of aggregations in order to reduce the number of transmissions. However, one important issue in the construction of a data aggregation tree remains unaddressed: MAC layer retransmission.
In WSNs, sensor nodes within one another's transmission range that try to transmit simultaneously will collide. In addition, two nodes that are not within each other's transmission range but transmit simultaneously to the same node may also collide. This is well known as the hidden-node problem. Because of the hidden-node problem, the interference range is larger than the transmission range in wireless communications. Figure 2(a) shows that even though the transmission radii of nodes n_1 and n_3 do not overlap, a collision still occurs at the receiver (node S) when they transmit at the same time. When a collision occurs, retransmission is required to ensure that the data is successfully received. These retransmissions incur additional energy consumption, which jeopardizes the advantages of data aggregation. The number of retransmissions is determined by the total number of sensor nodes whose transmission radius covers the receiver (or, equivalently, the total number of sensor nodes within each other's interference range). In other words, the more flows are aggregated, the higher the probability that the senders will need to retransmit. Hence, there is a tradeoff between data aggregation and retransmission. A good data aggregation tree should address data aggregation and MAC layer retransmission at the same time.
Figure 2 gives an illustrative example of the tradeoff between data aggregation and retransmission, where nodes n_1, n_2, and n_3 are the data source nodes. Without considering data collisions, the optimal aggregation tree is as shown in Figure 2(a). Note that when an intermediate node aggregates more data, more collisions occur at that node, which results in additional energy consumption. Node S, the receiver for the three child nodes, suffers significant collisions that result in more retransmissions.
When collision effects are considered, a more energy-efficient data aggregation tree is as shown in Figure 2(b). In this figure, by reducing the transmission radius of node n_1 and changing its routing assignment to node n_4, the total energy consumption is reduced. Even though there is extra energy consumption at node n_4, node S has only two child nodes, and thus the retransmissions caused by collisions are significantly reduced. Hence, the energy consumption associated with retransmissions caused by collisions should be carefully addressed in WSNs. This example also shows that a good tradeoff between data aggregation and retransmission is achieved through intelligent transmission radius and routing assignments. The energy consumption (including transmission power and retransmission power) shown in Figure 2 is calculated by the objective function of (IP), as described in Section 3.
This paper discusses the impact of retransmission on data aggregation and proposes a MAC aware energy-efficient data aggregation algorithm that considers the tradeoff between the benefits of data aggregation and the costs of data retransmission in WSNs. To the best of our knowledge, no existing work addresses cross-layer (layer 2 and layer 3) MAC aware data aggregation routing in WSNs. This paper proposes an optimization-based heuristic algorithm to solve the MAC aware energy-efficient data aggregation routing problem (MAC-DAR) based on the CSMA/CA protocol in WSNs. The problem is first formulated as a nonlinear programming problem, where the objective function is to minimize the total energy consumption from data transmissions and retransmissions. A Lagrangean relaxation scheme in conjunction with an optimization-based heuristic algorithm is proposed to solve this problem. In the computational experiments, the proposed solution approach outperforms conventional non-MAC aware data aggregation heuristics. In addition, because the proposed nonlinear programming formulation for the MAC-DAR problem is based on the existing CSMA/CA protocol, our algorithm can be deployed in a WSN without modifying the MAC protocol.
The remainder of this paper is organized as follows. Section 2 surveys related work on data aggregation routing and MAC layer protocols in WSNs. Section 3 proposes the mathematical formulation of the MAC-DAR problem in WSNs. Section 4 presents solution approaches based on Lagrangean relaxation. Section 5 develops heuristics for calculating a good primal feasible solution. Section 6 reports computational results. Finally, Section 7 concludes this paper.

Related Works
Existing research has addressed the pure data aggregation routing problem in WSNs. The authors of [2] devise three suboptimal aggregation heuristics for data-centric routing problems: Shortest Paths Tree (SPT), Center at Nearest Source (CNS), and Greedy Incremental Tree (GIT). In [6], a mathematical formulation of the data aggregation problem in WSNs is given, and an optimization-based heuristic algorithm is proposed to tackle the problem. The authors of [5] address latency issues in constructing a minimum-energy aggregation tree and propose the CCA algorithm, which builds on the idea of a balanced tree to minimize energy and latency simultaneously.
Several papers have discussed MAC layer protocol in ad-hoc and sensor networks [7][8][9]. X. H. Lin [9] enhanced the standard IEEE 802.11 MAC protocol by improving the handshake and power control mechanisms. W. Ye [7,8] reviewed several MAC protocols, and discussed design tradeoffs on energy efficiency and data transmissions. W. Ye proposed S-MAC protocol to fit the energy-efficient requirements for sensor networks, which is also a variation of a CSMA-like protocol that needs extra messages for transmitting data.
Several works have proposed cross-layer algorithms to deal with retransmission issues caused by collisions in wireless sensor networks. In [10], instead of dealing with the retransmission issue directly, the authors assign different channels to sensor nodes within each other's interference range in order to circumvent collisions, and propose integrated channel assignment and data aggregation routing algorithms for WSNs. In [11,12], the authors propose a MAC layer anycasting mechanism and randomized waiting at the application layer to facilitate data aggregation spatially and temporally in structure-free sensor networks. They address the collision problem with a modified CSMA/CA protocol and a randomized waiting scheme that reduce the number of retransmissions.

Problem Formulation
A MAC-DAR in WSNs is modeled as a graph, in which sensors are represented as nodes, and the arc connecting the two nodes indicates that one node is within the other's transmission radius. The definitions of notations adopted in the formulation are listed below.
First, the given parameters are as follows:
N: The set of all sensor nodes
P_sq: The set of all candidate paths that connect data source node s to sink node q
S: The set of all data source nodes
h: Longest distance of the shortest paths to reach the farthest data source node
M: An arbitrarily large number
Indicator function: 1 if the link from node n to node k is on path p, and 0 otherwise
d_nk: Euclidean distance between node n and node k
t_data: Transmission time for transmitting a data packet
RTS: Transmission time for an RTS frame
SIFS: Short inter-frame space time
θ: Maximum propagation delay for transmitting a data packet
q: The sink node
R_n: The set of all possible transmission radii that node n can adopt, which is a discrete set
e_n(r_n): Energy consumption function of node n per unit time, which is a function of the sensor's transmission radius
T: The largest number of retransmission times

Then, the decision variables are as follows:
x_sp: 1 if data source node s uses path p to reach sink node q, and 0 otherwise
y_(n,k): 1 if the link from node n to node k is on the tree, and 0 otherwise
r_n: Transmission radius of node n
z_nk: 1 if node k is covered within the transmission radius of node n, and 0 otherwise
c_nk: Retransmission times of node n to transmit data to node k

Please note that we do not have to generate all candidate paths that connect data source node s to sink node q (i.e., P_sq). As Section 4 will explain, by using Lagrangean multipliers as the link arc weights (in Subproblem 2), x_sp is associated with the minimum-weight path found by the shortest path algorithm for each data source node s.
An analysis of retransmission times is conducted as follows. First, we assume that each sensor node is equipped with a CSMA/CA compatible transceiver, that each transmission conforms to a geometric distribution, and that each sensor node generates data packets following a Poisson distribution with rate λ. Successful transmission of data from sender to receiver depends on the number of senders whose transmission radius covers the receiver. By considering receiver-side collisions in terms of the communication radii of sensor nodes, the hidden-node problem is implicitly taken into account. In the CSMA/CA protocol, when a sender wants to transmit a packet to a receiver, it first issues an RTS control frame and waits for a CTS frame from the receiver to ensure that the channel is free [4]. According to the CSMA/CA protocol, the time interval between RTS and CTS is no larger than a short inter-frame spacing (SIFS) time. Let the propagation delay from sender to receiver be θ, so that the turnaround time is 2θ. The overall contention period is (RTS + SIFS + 2θ). The average retransmission times from node n to node k (i.e., c_nk) is then given by Equation (0), in which Σ_{j∈N} z_jk calculates the total number of senders whose transmission radius covers node k. Equation (0) is the mean value of the geometric distribution, where the successful transmission probability, say p_success, is the probability that no data transmissions occur at any node whose transmission radius covers receiver node k within the contention period (RTS + SIFS + 2θ). The MAC-DAR problem in WSNs is then formulated as the following nonlinear optimization problem (IP).
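The reasoning behind the average retransmission times can be illustrated numerically. The sketch below is not the paper's Equation (0); it assumes that every sender covering the receiver is an independent Poisson source with rate `lam`, so that the probability of no arrival within the contention period is an exponential term, and it takes the mean of a geometric distribution as 1 / p_success. The function names and the sample parameter values are hypothetical.

```python
import math

def contention_period(rts, sifs, theta):
    """Contention period RTS + SIFS + 2*theta, as described in the text."""
    return rts + sifs + 2.0 * theta

def expected_transmissions(num_covering_senders, lam, rts, sifs, theta):
    """Mean number of attempts until success (geometric distribution).

    ASSUMPTION: p_success = exp(-lam * period * m), where m is the number of
    senders whose transmission radius covers the receiver. The exact form
    used in the paper may differ.
    """
    period = contention_period(rts, sifs, theta)
    p_success = math.exp(-lam * period * num_covering_senders)
    return 1.0 / p_success

# Hypothetical parameter values, chosen only to show the trend.
few = expected_transmissions(2, lam=0.5, rts=1.0, sifs=1.0, theta=0.5)
many = expected_transmissions(3, lam=0.5, rts=1.0, sifs=1.0, theta=0.5)
```

Under these assumptions the expected number of attempts grows with the number of covering senders, which is the tradeoff the formulation is designed to capture.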
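The objective function of (IP) is to minimize the total energy consumption, where the first term, Σ_{n∈N} t_data · e_n(r_n), is the data transmission energy and the second term is the retransmission energy.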

A. Data aggregation tree constraint
The basic idea of this set of constraints is to ensure that the union of all routing paths from the data source nodes to the sink is a data aggregation tree. Recall that a data aggregation tree is a reverse-multicast tree, i.e., a multicast tree rooted at the sink node but with opposite transmission directions. The data aggregation tree properties are enforced by Constraints (1) to (5). Constraint (1) requires that if path p is selected for source node s to reach sink node q, the path must be on the tree. This constraint also enforces that if link (n, k) on path p is adopted by source node s to reach the sink node, then y_(n,k) must be 1. Constraints (2) and (11) require that the total number of links on an aggregation tree is at least the maximum of h and the cardinality of S. Both h and |S| are legitimate lower bounds on the total number of links on an aggregation tree, and they can be calculated in advance [3]. According to [3], introducing Constraint (2) significantly improves solution quality. The left-hand term of Constraint (3) calculates the number of paths that are destined for the sink node and pass through link (n, k) on the aggregation tree. The right-hand term of Constraint (3) is at most |S|. When the union of paths destined for the sink node contains a cycle, and this cycle contains link (n, k), Constraint (3) is not satisfied because too many paths pass through this link. In other words, Constraint (3) enforces that the union of paths does not contain a cycle [6]. Constraints (4) and (10) require that each data source adopts only one routing path destined for the sink node. Constraint (5) is the outgoing link constraint: every intermediate node on the aggregation tree should have only one outgoing link. For example, in Figure 1, each node on the data aggregation tree has only one outgoing link toward the sink node.
In summary, Constraints (1)-(5), (10), and (12) enforce that the union of all routing paths shall be a data aggregation tree.
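The tree properties described above (one outgoing link per node, no cycles, every source reaching the sink) can be checked on a candidate link set with a short sketch. This is an illustrative check, not the constraint set of (IP); the function name `is_aggregation_tree` is hypothetical, and the sketch additionally assumes that the sink has no outgoing link.

```python
def is_aggregation_tree(links, sources, sink):
    """Check reverse-multicast tree properties on directed links (n, k)."""
    next_hop = {}
    for n, k in links:
        if n in next_hop:
            return False          # a node with more than one outgoing link
        next_hop[n] = k
    if sink in next_hop:
        return False              # ASSUMPTION: the sink does not forward data
    for s in sources:
        seen = set()
        node = s
        while node != sink:
            if node in seen or node not in next_hop:
                return False      # cycle, or a dead end before the sink
            seen.add(node)
            node = next_hop[node]
    return True

# A three-source tree in the spirit of Figure 1: n1 and n2 relay through n3.
tree_links = [("n1", "n3"), ("n2", "n3"), ("n3", "S")]
ok = is_aggregation_tree(tree_links, ["n1", "n2", "n3"], "S")
```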

B. Transmission coverage constraint
The basic idea of this set of constraints is to ensure that if node k is covered within the transmission radius of node n, then the distance between node n and node k must be smaller than the transmission radius of node n. Because M on the left-hand side of Constraint (6) is a very large number, z_nk could be equal to 0. Constraint (7) enforces that if node k is covered within the transmission radius of node n, then the transmission radius of node n must be larger than the distance between nodes n and k. Hence, Constraints (6) and (7) specify the transmission coverage constraints for decision variables r_n and z_jk. Then Σ_{j∈N} z_jk, which is used in Equation (0), calculates the total number of senders whose transmission radius covers node k. Constraint (8) relates decision variable y_(n,k) to z_nk: when y_(n,k) equals 1, it forces z_nk to be 1. Constraint (13) restricts the set of possible transmission radii that node n can adopt to a discrete and finite set. Constraint (14) ensures that each data source node turns on its transmitter, i.e., the transmission radius of each source node cannot be 0.
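The relationship between the radii r_n, the distances d_nk, and the coverage variables z_nk can be sketched as follows. This is a minimal illustration, not the paper's constraint formulation; the function names are hypothetical, and the sketch assumes that a node exactly on the boundary (d_nk equal to r_n) counts as covered.

```python
import math

def coverage(positions, radii):
    """Return z[(n, k)] = 1 if node k lies within the radius of node n."""
    z = {}
    for n, (xn, yn) in positions.items():
        for k, (xk, yk) in positions.items():
            if n == k:
                continue
            d_nk = math.hypot(xn - xk, yn - yk)   # Euclidean distance
            z[(n, k)] = 1 if d_nk <= radii[n] else 0
    return z

def covering_senders(z, k):
    """Number of nodes whose transmission radius covers node k."""
    return sum(v for (j, target), v in z.items() if target == k)

# Hypothetical three-node layout in the unit square.
positions = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.5, 0.0)}
radii = {"a": 0.2, "b": 0.05, "c": 0.45}
z = coverage(positions, radii)
```

In this layout node b is covered by both a and c, so two senders contend at b even though a and c cannot hear each other.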

C. Retransmission time constraints
The basic idea of this set of constraints is to calculate the retransmission times of node n to transmit data to node k, where the retransmission times are determined by the total number of nodes on the data aggregation tree whose transmission radius covers node k. Constraint (9) calculates the retransmission times of node n to transmit data to node k. Since only the sensor nodes on the aggregation tree need to calculate retransmission times, when y_(n,k) = 1 the right side of Constraint (9) is the same as Equation (0), i.e., the retransmission times c_nk must be at least the average retransmission times. When y_(n,k) = 0, the right side of Constraint (9) is zero, which implies that there is no retransmission time constraint. Constraint (15) is the integer constraint on retransmission times.
In order to make problem (IP) tractable, the natural logarithm is taken on both sides of Constraint (9) before applying the Lagrangean relaxation scheme. The resulting Lagrangean relaxation problem (LR) can be decomposed into four independent subproblems.
Step 2. For all outgoing links of node n, find the smallest coefficient. If the smallest coefficient is negative, then set the corresponding y_(n,k) to 1 and the other outgoing links y_(n,k) to 0; otherwise set all outgoing links y_(n,k) to 0. Repeat Step 2 for all nodes.
Step 3. If the total number of y_(n,k) whose value is 1 (denoted as τ) is smaller than max{h, |S|}, then first let each y_(n,k) whose corresponding coefficient is negative be 1. Second, set to 1 the (max{h, |S|} − τ) variables y_(n,k) whose corresponding coefficients are the smallest positive values. Third, let the remaining y_(n,k) be 0.
The computational complexity of the above algorithm is O(|N|^2).
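Steps 2 and 3 above can be sketched as follows. This is a literal reading of the two steps as stated, not the authors' implementation: the coefficients are taken as given inputs, the function name `select_links` is hypothetical, and zero coefficients are treated together with the positive ones (an assumption). The sort used here is for brevity and does not reproduce the stated complexity bound.

```python
def select_links(coef, lower_bound):
    """coef maps directed link (n, k) -> coefficient; returns y per link.

    lower_bound plays the role of max{h, |S|}.
    """
    y = {link: 0 for link in coef}

    # Step 2: for each node, find its outgoing link with the smallest
    # coefficient and select it only if that coefficient is negative.
    best = {}
    for (n, k), c in coef.items():
        if n not in best or c < coef[best[n]]:
            best[n] = (n, k)
    for link in best.values():
        if coef[link] < 0:
            y[link] = 1
    tau = sum(y.values())

    # Step 3: if too few links are selected, take every negative link, then
    # add the (lower_bound - tau) smallest non-negative ones.
    if tau < lower_bound:
        for link, c in coef.items():
            if c < 0:
                y[link] = 1
        candidates = sorted((l for l, c in coef.items() if c >= 0),
                            key=lambda l: coef[l])
        for link in candidates[: lower_bound - tau]:
            y[link] = 1
    return y

# Hypothetical coefficients for a four-link example.
coef = {("a", "b"): -2.0, ("a", "c"): -1.0, ("b", "c"): 3.0, ("c", "s"): 1.0}
y_small = select_links(coef, lower_bound=1)
y_large = select_links(coef, lower_bound=2)
```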
(SUB2) can be further decomposed into |S| independent shortest path problems with nonnegative arc weights. For (SUB4), if the coefficient of z_nk is negative, then z_nk is set to 1, and otherwise to 0. According to the algorithms proposed above, the Lagrangean relaxation problem can be effectively and optimally solved. Based on the weak Lagrangean duality theorem, Z_D(u_1, u_2, u_3, u_4, u_5, u_6) is a lower bound of Z_IP. The tightest lower bound is calculated by using the subgradient method [1].

Obtaining Primal Feasible Solutions
Solutions to problem (LR) may not be feasible for the primal problem (IP), because six constraints are relaxed into the objective function. This paper proposes an optimization-based integrated primal feasible algorithm, called LGR-Primal, which jointly addresses data aggregation and retransmission to obtain primal feasible solutions to problem (IP). Problem (LR) provides useful information for obtaining good primal feasible solutions. In LGR-Primal, the information from the Lagrangean relaxation (the solutions to the dual problem and the Lagrangean multipliers) is used to optimize the tradeoff between data aggregation and retransmission.
LGR-Primal is presented in Algorithm 1; it identifies the routing path (i.e., x_sp) for each data source node, and then the data aggregation tree is obtained by unifying all the routing paths from each data source node to the sink. In order to obtain an energy efficient data aggregation tree, the link arc weight assignment optimizes the tradeoff between data aggregation and retransmission.
When the routing path x_sp of each data source node s, which is returning to the sink node, is determined, then the selected links (i.e., y_(n,k)) on the data aggregation tree could also be determined. In addition, the transmission radius (i.e., r_n) of each node could also be determined to cover the termination node for all selected links on the data aggregation tree. After the transmission radius of each node is determined, then the coverage decision variable z_nk could also be determined. Finally, the value of retransmission time c_nk could also be determined to satisfy Constraint (9).
In Step 1 of Algorithm 1, the first term of the arc weight assignment is the energy consumption for link (n, k). This makes the algorithm scalable to a large-scale WSN.
Step 1) Assign the arc weight of each link (n, k).
Step 2) Perform Dijkstra's shortest path algorithm to identify the routing path (i.e., x_sp) from each data source node s to the sink node.
Step 3) Determine the other decision variables (y_(n,k), r_n, z_nk, and c_nk) without violating the associated constraints.
Step 4) Calculate the objective value of the problem (IP).
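Step 2 and the first part of Step 3 (deriving the tree links from the routing paths) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the arc weights are supplied by the caller, since the weight assignment of Step 1 is not reproduced here, and the function name `lgr_primal_routes` is hypothetical. The sketch runs Dijkstra's algorithm once from the sink over reversed arcs, which yields the shortest path from every node to the sink.

```python
import heapq

def lgr_primal_routes(weights, sources, sink):
    """weights maps directed link (n, k) -> nonnegative arc weight.

    Returns the set of tree links formed by the union of each source's
    minimum-weight path to the sink.
    """
    incoming = {}
    for (n, k), w in weights.items():
        incoming.setdefault(k, []).append((n, w))

    dist = {sink: 0.0}
    next_hop = {}
    heap = [(0.0, sink)]
    while heap:
        d, k = heapq.heappop(heap)
        if d > dist.get(k, float("inf")):
            continue                      # stale heap entry
        for n, w in incoming.get(k, []):
            nd = d + w
            if nd < dist.get(n, float("inf")):
                dist[n] = nd
                next_hop[n] = k           # best next hop from n toward sink
                heapq.heappush(heap, (nd, n))

    tree = set()
    for s in sources:
        node = s
        while node != sink:
            nxt = next_hop[node]          # KeyError if s cannot reach the sink
            tree.add((node, nxt))
            node = nxt
    return tree

# Hypothetical weights: the direct link n1 -> S is expensive, so n1 relays via n4.
weights = {("n1", "S"): 5.0, ("n1", "n4"): 1.0, ("n4", "S"): 1.0,
           ("n2", "S"): 1.0, ("n3", "S"): 1.0}
tree = lgr_primal_routes(weights, ["n1", "n2", "n3"], "S")
```

Because every node follows a single next hop toward the sink, the union of the paths has one outgoing link per node and contains no cycle.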
The complete algorithm (denoted LGR) for solving problem (IP), which is based on the subgradient method [1], is as follows.
LGR Algorithm.

Begin
Input: Network topology, data source nodes

Computational Experiments
The proposed algorithms for the MAC-DAR problem are coded in C and run on a PC with PIV-2G. In the LGR algorithm, Max_Iteration and quiescence_age are set to 2000 and 30, respectively. The step size coefficient, step_size, is initialized to 2 and is halved when the objective function value of the dual problem has not improved for quiescence_age iterations. The computational times for the following experiments are all within five minutes.
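The step size control described above can be sketched as follows. This is an illustrative replay over a given sequence of dual objective values, not the authors' C code; the function name is hypothetical, and the sketch assumes that the dual problem is maximized (it yields a lower bound) and that the stagnation counter is reset after each halving.

```python
def run_step_size_schedule(dual_values, step_size=2.0, quiescence_age=30):
    """Return the step size coefficient used at each iteration."""
    best = float("-inf")
    stale = 0                 # iterations since the dual value last improved
    used = []
    for value in dual_values:
        used.append(step_size)
        if value > best:
            best = value
            stale = 0
        else:
            stale += 1
            if stale >= quiescence_age:
                step_size /= 2.0      # halve after quiescence_age stale steps
                stale = 0             # ASSUMPTION: counter restarts here
    return used

# Short demonstration with quiescence_age = 3 instead of 30.
schedule = run_step_size_schedule([1.0, 0.5, 0.5, 0.5, 0.5], quiescence_age=3)
```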
The network topology comprises N (= 150 in Figures 3 and 4, up to 250 in Figure 5) sensor nodes randomly placed within a 1×1 square unit area. The energy consumption function (in milliwatts), e_n(r_n), is defined as the square of (100 × Euclidean distance) multiplied by the energy consumption per millisecond when the sensor node is transmitting data. The set of all possible transmission radii of sensor node n (i.e., R_n) ranges from 0 to the maximum communication radius (e.g., 0.25 in Figure 3) with a step size of 0.01. The CSMA/CA related parameters (RTS, SIFS, θ) use the same settings as in [4]. To evaluate the solution quality of our proposed algorithm, four existing algorithms are implemented for comparison. The SPT, GIT, and CNS algorithms are proposed in [2], and the fourth algorithm, CCA, is described in [5]. It is worth noting that all four heuristics construct data aggregation trees without considering MAC layer collision effects. Each plotted point in Figures 3-5 is a mean value over 10 simulation results. Two different models of WSN are simulated. The first is the event-driven model, where sensor nodes neighboring the event become the data source nodes. The second is the random-source model, where data source nodes are selected at random. Hence, the data source nodes in the event-driven model are closer to each other than in the random-source model.
Figure 5 depicts the experiments evaluating the solution quality under different network sizes (i.e., network densities). The LGR algorithm outperforms the other heuristics for all network sizes. For large network sizes (i.e., a high density of sensor nodes within a fixed deployment area), the advantage of LGR over the other heuristics is more significant.
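The experimental energy model and radius set can be sketched as follows. The function names and the default `unit_energy` value are hypothetical; the sketch only restates the definitions given above (radii from 0 to the maximum radius in steps of 0.01, and energy equal to the square of 100 times the distance, scaled by the per-millisecond transmission energy).

```python
def radius_set(max_radius=0.25, step=0.01):
    """Discrete set R_n: 0, step, 2*step, ..., max_radius."""
    count = int(round(max_radius / step))
    return [round(i * step, 2) for i in range(count + 1)]

def energy_per_ms(radius, unit_energy=1.0):
    """e_n(r_n): (100 * radius)^2 times the energy consumed per millisecond.

    unit_energy is a placeholder; the paper does not state its value here.
    """
    return (100.0 * radius) ** 2 * unit_energy

radii = radius_set()
```

Because the cost grows quadratically with the radius, halving a node's radius cuts its per-millisecond transmission energy to a quarter, which is why short links are attractive until retransmissions are taken into account.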
Recall that the first term of the objective function in problem (IP) (i.e., min Σ_{n∈N} t_data · e_n(r_n)) favors communication links of shorter distances, so the data aggregation tree will be composed of more relay nodes for data aggregation. However, too many relay nodes on the data aggregation tree introduce a higher probability of collision, which leads to a larger retransmission cost (i.e., the second term of the objective function in problem (IP)). The second term is expected to play a more important role in a dense network topology. According to Figure 5, the solution value for LGR increases more mildly than those of the other heuristics in a dense network topology, which reveals that LGR does not select a large number of hops for data aggregation, in order to avoid extra energy loss from retransmissions in a dense network topology.

Conclusions
In addition to data aggregation, retransmission energy loss due to MAC collisions is another crucial factor for energy-efficient data aggregation routing in WSNs. This paper is the first to propose a nonlinear mathematical formulation for the MAC aware energy-efficient data aggregation routing problem in WSNs, where the objective function is to minimize the total energy consumption (including data transmission power and retransmission power) subject to data aggregation tree, transmission coverage, and data retransmission constraints. The proposed solution approach is based on Lagrangean relaxation and constructs a MAC aware energy-efficient data aggregation tree that jointly considers the tradeoff between data aggregation and data retransmission by using the Lagrangean multipliers. According to the computational experiments, the proposed LGR algorithm outperforms the other heuristics in all tested cases, especially in the event-driven model. This is because in the event-driven model the data source nodes are clustered, and thus the extra energy loss from retransmissions is more significant. This indicates that a good data aggregation algorithm should be a cross-layer algorithm that jointly addresses data aggregation in the network layer and retransmission energy loss in the MAC layer.