HiCoDG: A Hierarchical Data-Gathering Scheme Using Cooperative Multiple Mobile Elements

In this paper, we study mobile element (ME)-based data-gathering schemes in wireless sensor networks. Due to the physical speed limits of mobile elements, the existing data-gathering schemes that use mobile elements can suffer from high data-gathering latency. In order to address this problem, this paper proposes a new hierarchical and cooperative data-gathering (HiCoDG) scheme that enables multiple mobile elements to cooperate with each other to collect and relay data. In HiCoDG, two types of mobile elements are used: the mobile collector (MC) and the mobile relay (MR). MCs collect data from sensors and forward them to the MR, which will deliver them to the sink. In this work, we also formulated an integer linear programming (ILP) optimization problem to find the optimal trajectories for MCs and the MR, such that the traveling distance of MEs is minimized. Two variants of HiCoDG, intermediate station (IS)-based and cooperative movement scheduling (CMS)-based, are proposed to facilitate cooperative data forwarding from MCs to the MR. An analytical model for estimating the average data-gathering latency in HiCoDG was also designed. Simulations were performed to compare the performance of the IS and CMS variants, as well as a multiple traveling salesman problem (mTSP)-based approach. The simulation results show that HiCoDG outperforms mTSP in terms of latency. The results also show that CMS can achieve the lowest latency with low energy consumption.


Introduction
Wireless sensor networks (WSNs) that consist of a number of low-cost, low-power stationary sensors have been widely used in various applications such as environmental monitoring, industrial sensing, and battlefield surveillance [1]. In WSNs, data gathering from distributed sensors deployed over a large area is one of the most important tasks. Data gathering in WSNs has traditionally been performed via multi-hop data forwarding to the sink. However, it is known that multi-hop forwarding leads to the energy-hole problem, where the energy of sensors near the sink is depleted quickly since they forward more data packets than sensors distant from the sink [2]. As a result, the energy-hole problem makes the network disconnected and decreases overall network lifetime.
In order to avoid the energy-hole problem, mobile data-gathering schemes have been studied [3-21]. In such schemes, mobile elements (MEs) (e.g., autonomous robots) are used to collect the data from sensors and bring the data to the sink. By using MEs, not only can the energy-hole effect be avoided but data gathering in a sparse or disconnected network also becomes possible because MEs can travel and directly collect the data from sensors. The drawback to this approach is long data-gathering latency since data delivery relies on the physical movements of MEs. In addition, high energy consumption by MEs is another issue to be addressed.
In order to achieve low data-gathering latency with minimal energy consumption, this paper proposes a new hierarchical and cooperative data-gathering (HiCoDG) scheme where two different types of MEs cooperate to collect and relay data from sensors to the sink. One type of ME is the mobile collector (MC), which collects data from sensors. Another type is the mobile relay (MR), which gathers data from MCs and delivers them to the sink. In HiCoDG, mobile collectors do not need to visit the sink, which can result in a significant improvement in terms of latency and energy consumption.
More specifically, in HiCoDG, sensors are first organized into one-hop clusters. The positions of clusterheads are considered points of interest in the network, which are partitioned into several groups with respect to geographic position. In each group, an MC is scheduled to periodically visit the points of interest, where it collects data from sensors within the cluster via one-hop communications. Then, MCs forward the data to the MR, which periodically travels from the sink to visit points of interest called meeting points (MPs) to receive data from MCs. Finally, the MR delivers the data when it returns to the sink.
In order to find optimum trajectories for MCs and the MR to minimize the total traveling distance, we define an optimization problem, named local data gathering with global relay (LDG-GR). Then, an integer linear programming (ILP) problem is formulated to find the solution to LDG-GR.
In this paper, depending on the way data are relayed from MCs to the MR, two variants of HiCoDG are considered: intermediate station (IS)-based and cooperative movement scheduling (CMS)-based schemes. In the IS scheme, each MC drops off data to an MP, which will store the data until the MR visits and picks up the data. In this case, special hardware (e.g., high-capacity data storage and batteries) is required for MPs. On the other hand, in the CMS scheme, the optimal movement speeds of MCs and the MR are predetermined, such that MCs and the MR can meet just in time during their journey to directly forward data using a wireless channel.
In addition, an analytical model was designed to estimate the average data-gathering latency in HiCoDG. The model can be useful for estimating whether MEs with a certain movement speed are able to achieve the required latency before they are actually deployed.
Extensive simulation and analysis were conducted to evaluate the performance of the two variants of HiCoDG. Those variants were also compared with a multiple traveling salesman problem (mTSP)-based approach. The simulation results show that HiCoDG outperforms the mTSP-based approach in terms of latency and energy consumption. The simulation results also indicate that the CMS scheme can achieve the lowest latency.
The rest of the paper is organized as follows. In Section 2, we present related work and compare it to our algorithm. Section 3 presents the proposed scheme and algorithms in detail. In Section 4, an analytical model for estimating latency is described. Section 5 presents the simulation setup and performance analysis. Finally, Section 6 concludes the paper.

Related Work
In this section, we present an overview of previous studies on mobile data gathering, and compare those studies with our proposed algorithm.
There are many recent studies on data gathering in WSNs using mobile elements. The majority of these studies considered the use of a single ME [5-12].
In these studies, an ME is scheduled to periodically visit sensors in the network and collect the data, then deliver them to the sink. For example, two studies [5,6] focused on the problem of finding the optimal trajectory and movement schedule for an ME, with the objective of maximizing network lifetime. More specifically, Wang et al. [5] assumed that the ME only visits some specific nodes in the network and stays for a given sojourn time to collect the data through multi-hop data transmissions. A linear programming (LP) optimization formulation was proposed to jointly determine the specific nodes the ME should visit and its sojourn time at each node, so that the network lifetime is maximized.
Gu et al. [8] and Somasundara et al. [9] proposed movement scheduling algorithms to schedule an ME such that the packet loss at sensors due to buffer overflow can be avoided. For example, Gu et al. [8] first organized the sensors in the network into a number of groups based on their data generation rates and locations. Then, the ME was scheduled to visit sensors in each group at an adequate frequency so the buffer overflow at the sensors is reduced.
In addition, Kumar et al. [11] and Sugihara and Gupta [12] focused on the problem of reducing data-gathering latency. Kumar et al. [11] proposed a single mobile collector-based data collection architecture that lowers latency by considering sensor clustering and long-range wireless communications. Sugihara and Gupta [12] proposed a heuristic scheduling algorithm that considers both sensor locations and time constraints to minimize latency in a 1D case of the problem.
Those studies differ from our work in that they use only one ME to collect the data from sensors. The potential drawback of this approach is limited scalability. When the number of sensors or the network area increases, the path length the ME must travel increases accordingly. The ME then takes longer to finish one round, which results in high data-gathering latency. Therefore, a single ME may not be sufficient for certain applications that require low latency in a large-scale WSN. In our work, we consider the use of multiple MEs for gathering data to reduce latency.
There have also been several studies that use multiple MEs for data gathering in WSNs [14-20]. For example, Zhao et al. [15] proposed a data-gathering scheme using multiple MEs that are controlled to move along parallel straight lines through the area of interest and collect data from sensors using multi-hop transmission. This scheme can work well in a large-scale WSN where the sensors are uniformly distributed. However, if not all of the sensors in the network are connected using a wireless link, the MEs may not cover some of them because they traverse only straight lines.
Kim et al. [16] defined two combinatorial optimization problems to find the trajectories, with the objective of minimizing data-gathering latency: the k-traveling salesperson problem with neighborhood (k-TSPN) and the k-rooted path cover problem with neighborhood (k-PCPN). They also proposed constant-factor approximation algorithms for those problems.
Those studies on multiple ME-based mobile data gathering differ from our work in that they schedule each ME to visit a part of the sensing field to collect data and all MEs need to return to the sink. This may lead to high data-gathering latency in a large-scale WSN where a number of sensors are deployed over a large area distant from the sink. In our scheme, sensors are first organized into groups, and a mobile collector (MC) that does not need to return to the sink is assigned to each group to collect data. The mobile relay is scheduled to periodically visit each group, receive the data from the MCs, and bring the data to the sink. Moreover, in our work, a cooperative movement scheduling algorithm is proposed to control the movements of the MCs and the MR in order to reduce the data-gathering latency in the network.
In addition, Aslanyan et al. [21] considered the use of multiple MEs that move with a random mobility pattern in the area of interest to collect the data, and deliver the collected data to the sink when they are close enough to the sink to ensure a direct transmission. Due to the random mobility, the data-gathering latency may not be estimated. In contrast, in our work, MEs are scheduled to move in cooperation to collect and relay the data to the sink. As a result, the average latency in our proposed schemes can be estimated, which is useful to determine whether or not the data-gathering scheme using MEs can satisfy the latency requirement before MEs are actually deployed.

Hierarchical and Cooperative Data Gathering (HiCoDG) Scheme
In this section, we present the proposed HiCoDG scheme in detail. The considered network consists of a number of stationary sensors that are deployed over an area of interest. Each sensor periodically generates data packets and saves them in its buffer. It is assumed that the positions of sensors are known a priori.
We consider the use of multiple mobile elements (e.g., autonomous robots equipped with RF transceivers) to gather data from sensors and to deliver the data to the sink located at a given position. More specifically, multiple mobile elements including mobile collectors and a mobile relay are deployed to cooperatively collect and relay the data. Figure 1 illustrates the overview of the proposed data-gathering scheme. In Figure 1, sensors are partitioned into three groups, each of which consists of several one-hop clusters. In each group, an MC moves along a closed path that consists of points of interest at which the MC collects data from sensors via one-hop wireless communications. The path of the MC begins and ends at a meeting point (MP), which is one of the points of interest. The MC forwards the collected data to the MR via the MP or directly. The MR starts moving from the position of the sink, visits each group at an MP, and receives the data from the MP or the MCs. It is assumed that MCs can be replaced with spare MCs in case they cannot perform the mission due to power depletion or malfunction. In this paper, it is also assumed that collisions between MEs can be avoided using existing collision avoidance algorithms [22,23], even if their movement paths intersect. From here, we first describe the clustering algorithms that are used to partition sensors into several groups. Then, the optimization problem formulation to find the optimal trajectories for MEs is presented. Finally, two cooperative data relay schemes are described.

Clustering and Grouping Algorithms
In HiCoDG, sensor nodes are partitioned into k groups where k is the number of MCs used in the network. To form the groups, sensor nodes are first organized into one-hop clusters using a highest degree-based clustering algorithm [24]. More specifically, a node that has the maximum number of neighboring nodes is selected as the first clusterhead, and its neighboring nodes join the cluster. Among nodes that do not belong to the first cluster, another node with the maximum number of neighbors is selected to form the second cluster, and so on. This process is repeated until every node belongs to a cluster.
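The greedy clustering step described above can be sketched as follows. This is a minimal illustration rather than the implementation of [24]: the function name `highest_degree_clusters`, the `radio_range` parameter, the choice to count neighbors only among nodes that are still unclustered, and the lowest-index tie-break are all assumptions made for the example.

```python
import math

def highest_degree_clusters(positions, radio_range):
    """Greedy one-hop clustering; returns {clusterhead index: sorted member indices}."""
    def is_neighbor(a, b):
        return a != b and math.dist(positions[a], positions[b]) <= radio_range

    unassigned = set(range(len(positions)))
    clusters = {}
    while unassigned:
        # Assumption: degree is counted among nodes that are still unclustered;
        # ties go to the lowest node index.
        head = max(sorted(unassigned),
                   key=lambda v: sum(is_neighbor(v, u) for u in unassigned))
        members = {u for u in unassigned if is_neighbor(head, u)} | {head}
        clusters[head] = sorted(members)
        unassigned -= members
    return clusters

# Hypothetical sensor layout: three nodes on the left, two far to the right.
sensors = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
clusters = highest_degree_clusters(sensors, radio_range=1.5)
```

In this toy layout, node 1 has two neighbors and becomes the first clusterhead; the two remaining nodes then form a second cluster.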
Then, the positions of clusterheads, which are regarded as points of interest (or points), are divided into k groups, each of which has an MC assigned to it. In this paper, we consider the use of two methods for partitioning the points into k groups: a K-means-based grouping algorithm and a sink position-based grouping (SPG) algorithm.
K-means-based grouping algorithm: We adopt a K-means-like minimum mean distance algorithm (referred to as the K-means algorithm) [25], which uses the minimum mean distance between points as the objective function of grouping. More specifically, in the K-means algorithm, the points are divided into groups such that the mean distance between the group-head and members in each group is minimized.
Sink position-based grouping (SPG) algorithm: In order to achieve low data-gathering latency, it is desirable for the MR to travel a short tour so that it can deliver the data from the MCs to the sink in a short amount of time. However, when the K-means-based algorithm is used, some groups might be located far from the sink, which results in a long tour for the MR. Therefore, we propose the SPG algorithm. The main idea behind SPG is to divide the points into groups such that every group includes one point that is close to the sink. More specifically, in SPG, the area is first divided into $k$ sub-parts such that the sink position is included in every sub-part. The points located in the same sub-part belong to the same group. Figure 2 illustrates grouping when SPG is applied. In Figure 2, the sink is located in the left corner of a rectangular area. Assume there are four MCs, i.e., $k = 4$. Then, the area of interest is divided into four sub-parts by the dotted lines starting from the sink position such that $\phi_1 = \phi_2 = \phi_3 = \phi_4$, as shown in Figure 2.
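The equal-angle partition used by SPG can be sketched as below. This is only an illustration under stated assumptions: the sink is taken to sit at the lower-left corner of the area, so that all points lie within a quarter-plane of angular span $\pi/2$ as seen from the sink; the function name `spg_groups` and its parameters `angle_start` and `angle_span` are hypothetical.

```python
import math

def spg_groups(points, sink, k, angle_start=0.0, angle_span=math.pi / 2):
    """Assign each point to one of k equal angular sectors anchored at the sink."""
    width = angle_span / k  # each sector spans the same angle (phi_1 = ... = phi_k)
    groups = [[] for _ in range(k)]
    for idx, (x, y) in enumerate(points):
        theta = math.atan2(y - sink[1], x - sink[0]) - angle_start
        # Assumption: points lie inside the span; clamp to absorb boundary cases.
        theta = min(max(theta, 0.0), angle_span)
        sector = min(int(theta / width), k - 1)
        groups[sector].append(idx)
    return groups

# Hypothetical points of interest, sink at the corner (0, 0), three MCs.
pois = [(10, 1), (10, 10), (1, 10), (10, 0), (0, 10)]
groups = spg_groups(pois, sink=(0, 0), k=3)
```

Because every sector has the sink as its apex, each group tends to contain a point near the sink, which is the property SPG relies on to keep the MR tour short.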

Optimal Trajectory Formulation
Now, we consider the problem of finding the optimal trajectories for the MR and MCs that minimize latency and energy consumption in the network. Note that there have been many studies on finding the optimal trajectory over visiting points in order to minimize total distance, such as the traveling salesman problem (TSP) [26], the multiple traveling salesman problem [27], and the vehicle routing problem [28].
However, those studies cannot be used in our data-gathering architecture because they do not consider different types of MEs, the hierarchical architecture of MEs' movements, and the need for data exchange among MEs. More specifically, in our work, two types of mobile elements (MC and MR) are used. Each MC collects data from sensors in its group and forwards the data to the MR at a meeting point, which is one of the sensor positions in the group. The MR receives data from MCs at meeting points, and delivers the data to the sink.
Therefore, we define a new optimization problem, LDG-GR. The decision version of LDG-GR is NP-complete, as the following argument shows.
Proof. It can be readily shown that the Euclidean TSP [29] is polynomial-time reducible to LDG-GR, i.e., Euclidean TSP $\le_p$ LDG-GR. Suppose that a Euclidean TSP instance consists of a set of $k$ "cities" $c_1, \ldots, c_k$, a distance $d(i, j)$ between every pair of cities $c_i$ and $c_j$ ($1 \le i, j \le k$), and an integer $D$.
The corresponding LDG-GR instance has $k$ subsets, $V_1, \ldots, V_k$, each of which has only one element, i.e., $|V_i| = 1$. Let $V_i$ have a point $i$ as its element, corresponding to $c_i$, and define the distance between a pair of points in $\bigcup_{i=1}^{k} V_i$ to be the same as $d(i, j)$. Let point 0 be the sink. The instance of LDG-GR can be constructed in polynomial time. Note that in the constructed LDG-GR instance, only the MR has a traveling distance since every subset has only one point. Then, it is clear that if the length of the tour is $D$ in the Euclidean TSP, the total path length in LDG-GR becomes $D$. Conversely, if the total path length in LDG-GR is $D$, the length of the tour becomes $D$ in the Euclidean TSP. LDG-GR is in NP, since a guessed path, obtained by using a non-deterministic algorithm, can be checked in polynomial time as to whether it has a total length of $D$ and whether every point is visited.
In this paper, we propose an ILP formulation for jointly finding the optimal trajectories for both the MCs and the MR. We model the network as a complete directed graph $G = (V, E)$, where $E = \{(i, j) : i, j \in V, i \neq j\}$ is the set of arcs. Let $c_{ij}$ denote the travel cost (Euclidean distance) from point $i$ to point $j$. We also define $x_{ij}$ as a binary variable that is 1 if arc $(i, j)$ is on the trajectories of the MCs, and 0 otherwise. Similarly, a binary variable $y_{ij}$ represents whether or not arc $(i, j)$ is on the trajectory of the MR. We denote by $n_i$ the number of points in subset $V_i$, i.e., $|V_i| = n_i$. Note that $n_0 = 1$. Also, $u_i$ is defined as the number of points that have been visited by the MCs and the MR up to point $i$. We also define $p_i^{\mathrm{org}} \in V_i$ as the origin point of the trajectory solution of MC$_i$. Note that $p_i^{\mathrm{org}}$ can be an arbitrary point in $V_i$ and can be chosen a priori.
Then, the ILP problem is formulated with the objective function in Equation (1), subject to the constraints in Equations (2)-(10). The objective function in Equation (1) minimizes the total distance traveled by both the MCs and the MR. Equations (2)-(6) define constraints on the optimal trajectories of the MCs, whereas Equations (7)-(10) represent constraints on the optimal trajectory of the MR. More specifically, Equations (2) and (3) ensure that every point except 0 is visited exactly once by only one MC. Equation (4) states that MCs do not travel between two points that belong to two different subsets (i.e., groups). Equations (5) and (6) eliminate the subtours in the trajectories of the MCs in a way similar to the MTZ subtour elimination of the TSP [26].
On the other hand, Equation (7) ensures that the MR visits each group exactly once at one point. Note that among all points of interest, the point that will be visited by the MR is called a meeting point (MP). Equation (8) ensures that the MR enters and leaves each group at the same MP. Note that MP$_i$ may be different from $p_i^{\mathrm{org}}$, which was arbitrarily chosen to find the optimal trajectories. Equations (9) and (10) eliminate the subtours in the trajectory of the MR.
Note that, in case there are constraints (e.g., obstacles) on the link between two points, the link cost value can be set to a high value (or even an infinite number). Due to the objective function that minimizes the path cost, such high cost links will be avoided in the solution. In other words, the ILP formulation can find the solution even when there are physical constraints on the links.
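To make the LDG-GR objective concrete, the sketch below solves a tiny instance by exhaustive search instead of ILP. It is not the formulation of Equations (1)-(10); it relies on the observation, taken from the problem description, that each MC tour is a closed tour through all points of its group (so its length does not depend on which point serves as the MP), while the MR tour visits the sink and exactly one point per group. All function names are hypothetical, and the approach is only practical for a handful of points.

```python
import itertools
import math

def closed_tour_length(order, pts):
    """Length of the closed tour that visits pts in the given index order."""
    n = len(order)
    return sum(math.dist(pts[order[i]], pts[order[(i + 1) % n]]) for i in range(n))

def shortest_closed_tour(nodes, pts):
    """Brute-force TSP over the given node indices; returns (length, tour)."""
    first, rest = nodes[0], nodes[1:]
    candidates = ([first, *perm] for perm in itertools.permutations(rest))
    return min((closed_tour_length(tour, pts), tour) for tour in candidates)

def solve_ldg_gr(pts, groups, sink=0):
    """Return (total length, MC tours, MR tour) for a small LDG-GR instance."""
    mc_tours = [shortest_closed_tour(list(g), pts) for g in groups]
    # Try every choice of one meeting point per group for the MR tour.
    mr_length, mr_tour = min(
        shortest_closed_tour([sink, *mps], pts)
        for mps in itertools.product(*groups))
    total = mr_length + sum(length for length, _ in mc_tours)
    return total, [tour for _, tour in mc_tours], mr_tour

# Hypothetical instance: point 0 is the sink, two groups of two points each.
pts = [(0, 0), (3, 0), (4, 0), (0, 3), (0, 4)]
total, mc_tours, mr_tour = solve_ldg_gr(pts, groups=[[1, 2], [3, 4]])
```

Here the MR picks points 1 and 3 as meeting points because they are the members of each group closest to the sink, giving a total length of $10 + 3\sqrt{2}$.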

Cooperative Data Relay Scheme
Along the optimal trajectories obtained from the ILP formulation, each MC periodically performs data gathering in its group. On each tour, an MC starts from an MP, visits all points in the group, and returns to the MP. The MR also periodically travels to receive data from the MCs at the MPs. In order for the MR to relay the data from MCs to the sink, we consider two schemes: IS-based data relay and cooperative movement scheduling (CMS)-based data relay. In the IS scheme, MPs require more powerful hardware than regular points of interest, while CMS does not require special hardware.
(1) IS-based data relay: In this scheme, it is assumed that the points that are selected as MPs are equipped with high-capacity data storage and high-capacity batteries. During its journey, an MC sends its collected data to its MP when they can establish a direct wireless communications channel. Then, the MP stores the data and forwards them when the MR visits. The MR receives data from all MPs and returns to the sink.
Note that, in the IS scheme, MCs and the MR move independently. Due to lack of cooperation among nodes, the MR can visit an MP before an MC sends data to that MP. In that case, the MR cannot receive any data from the MP until the next visit. As a result, the MR wastes energy, and data-gathering latency may increase. Moreover, the need for data storage and high-capacity batteries at MPs increases system cost.
In order to address these problems, we propose a CMS-based data relay algorithm, where the movements of MCs and the MR are scheduled such that the MR can directly receive the collected data from MCs while the MR and MCs are traveling.
(2) CMS-based data relay: The main idea behind CMS is that the movement speeds of MCs and the MR are controlled so that MCs can meet the MR at MPs during the periodic movements and can forward data to the MR using a direct wireless channel. Let $L_i$ ($i = 0, \ldots, k$) denote the path (or tour) length of each ME (MC or MR), where $L_0$ is the tour length of the MR and $L_i$ ($1 \le i \le k$) is the tour length of MC$_i$. We also define $s_{\max}$ as the maximum movement speed of MEs.
Recall that the path of the MR consists of $k + 1$ meeting points, where MP$_0$ is the sink's position. Let $\tau$ denote the time duration in which the MR stays to receive the data from MC$_i$ at MP$_i$ ($i \neq 0$), or to send the data to the sink at MP$_0$. We also define $T_m$ as the time it takes for the MR to finish one tour, which includes moving time and sojourn times at MPs. The value of $T_m$ is given by Equation (11). The speeds of MEs are determined such that the MR can periodically meet MC$_i$ every $T_m$. Using $T_m$ and the tour length of each ME, the speed of each ME is given by Equation (12), where $s_0$ is the speed of the MR and $s_i$ is the speed of MC$_i$. Note that $s_0$ and $s_i$ are equal to or less than $s_{\max}$.
Also, denote by $t_0$ the time at which the MR leaves the sink and by $d_{ij}$ the distance from MP$_i$ to MP$_j$. Then, during the same tour, the MR meets MC$_i$ at time $t_i$ ($i \neq 0$), which is given by Equation (13). In CMS, the MR starts a new tour at the sink every $T_m$, as shown in Figure 3. The MR begins its journey from MP$_0$ and visits MP$_i$, where it meets MC$_i$ and stays for $\tau$ seconds to receive data. Then, the MR returns to MP$_0$, stays for $\tau$ seconds to deliver data to the sink, and then starts another journey. Note that it is assumed that the initial positions of MC$_i$ ($1 \le i \le k$) are also determined such that MC$_i$ can meet the MR at $t_i$. After sending the data to the MR, MC$_i$ starts a new round at $t_i + \tau$, as shown in Figure 3. By using the speed from Equation (12), MC$_i$ will periodically meet the MR every $T_m$ seconds during future rounds.
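The scheduling idea can be illustrated with a short sketch. The formulas in it are assumptions chosen to be consistent with the description above, not necessarily the exact form of Equations (11)-(13): the MR is assumed to spend $\tau$ at each of its $k + 1$ MPs and to move for the rest of $T_m$; each MC is assumed to spend $\tau$ at its MP and to move for $T_m - \tau$; $T_m$ is taken as the smallest period for which no speed exceeds $s_{\max}$; and the MPs are assumed to be indexed in the order the MR visits them. The function name `cms_schedule` and the `mp_legs` parameter are hypothetical.

```python
def cms_schedule(L, mp_legs, s_max, tau, t0=0.0):
    """Return (T_m, speeds, meeting_times) for one MR (index 0) and k MCs.

    L[0] is the MR tour length and L[1..k] are the MC tour lengths;
    mp_legs[i] is the distance the MR travels from MP_i to MP_{i+1}.
    """
    k = len(L) - 1
    # Assumed period: smallest T_m that keeps every ME at or below s_max.
    T_m = max(L[0] / s_max + (k + 1) * tau,
              max(L[i] / s_max + tau for i in range(1, k + 1)))
    s0 = L[0] / (T_m - (k + 1) * tau)              # MR moves for T_m - (k+1)*tau
    speeds = [s0] + [L[i] / (T_m - tau) for i in range(1, k + 1)]
    meeting_times, clock = [], t0
    for leg in mp_legs:
        clock += leg / s0                           # travel to the next MP
        meeting_times.append(clock)                 # MR meets the MC here
        clock += tau                                # sojourn to receive data
    return T_m, speeds, meeting_times

# Hypothetical tour lengths (m), speed limit (m/s), and sojourn time (s).
T_m, speeds, meeting_times = cms_schedule(
    L=[100, 200, 50], mp_legs=[30, 40], s_max=10, tau=5)
```

With these numbers, the MR and the first MC both run at the speed limit and determine the period, while the second MC slows down to 2.5 m/s so that it returns to its MP exactly when the MR arrives.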

Analytical Model for Estimating Data Gathering Latency in HiCoDG
In wireless sensor networks, the application often has a specific requirement for data-gathering latency. Therefore, it is important and necessary to estimate whether or not the mobile data-gathering system using MEs can satisfy the latency requirement before the MEs are actually deployed. In order to address this issue, in this section, we discuss an analytical model to estimate data-gathering latency in the proposed HiCoDG.
We first present some general assumptions for analysis. Then, an analytical model to estimate data-gathering latency is described. Finally, we verify the model by comparison with simulation results.

General Assumptions
We consider a network consisting of $k + 1$ MEs, including $k$ MCs and one MR. The points of interest (or points) in the network are partitioned into $k$ groups $G_i$ ($i = 1, \ldots, k$). MC$_i$ visits and collects data at points in group $G_i$. Recall that $L_0$ and $L_i$ ($i = 1, \ldots, k$) denote the path lengths of the MR and MC$_i$, respectively. Similarly, we define $T_0$ and $T_i$ ($i = 1, \ldots, k$) as the times that it takes for the MR and MC$_i$ to finish one tour, respectively. In addition, recall that $s_0$ and $s_i$ are the movement speeds of the MR and MC$_i$, respectively. For simplicity, the model does not consider the latency of wireless communications since the network transmission speed is much higher than the movement speed of MEs. Thus, we have
$$T_i = \frac{L_i}{s_i}, \quad i = 0, \ldots, k. \tag{14}$$
It is assumed that at every point on the paths of the MCs, data packets are periodically generated at the rate of $R$ (in packets per second). Each packet is stored in the buffer of the point until an MC collects it. We also assume that on the path of each ME (the MR or an MC), the points are randomly placed according to a uniform distribution.

Estimation of Average Data-Gathering Latency
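In the model, we estimate the average data-gathering latency of a packet generated at an arbitrary point $p$ located on the path of MC$_i$ in group $G_i$. In HiCoDG, the latency of the packet at point $p$ is the sum of four time components, as shown in Figure 4: (1) the time the packet waits in the buffer at point $p$, from its generation until MC$_i$ collects it; (2) the time the packet travels in MC$_i$ from point $p$ to MP$_i$; (3) the time the packet waits at MP$_i$ until the MR picks it up; and (4) the time the packet travels in the MR from MP$_i$ to the sink.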

Estimation of Waiting Time in the Buffer
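We first estimate the expected latency for a packet waiting in the buffer at point $p$. Let $M$ denote the total number of packets that MC$_i$ collects at point $p$ at each arrival. Since MC$_i$ periodically visits and collects the packets from point $p$ at every interval $T_i$, the value of $M$ is the total number of packets generated at point $p$ during period $T_i$. Then, we have $M \approx R T_i$, assuming $T_i \gg 1/R$.
We suppose that packets are generated in the time interval $[0, T_i]$. Then, those packets will be collected by MC$_i$ at $T_i$, as shown in Figure 5. Let $\Delta t$ denote the time at which the first packet is generated at point $p$ during $T_i$, and let $\Delta\tau$ denote the difference between the time the last packet is generated and $T_i$. If we define $\Upsilon$ as the total waiting time of the $M$ packets in the buffer, then, since the last packet waits for $\Delta\tau$ and each earlier packet waits one packet interval longer than its successor, the value of $\Upsilon$ can be calculated as
$$\Upsilon = \sum_{j=0}^{M-1} \left(\Delta\tau + j\,t_p\right) = M\,\Delta\tau + \frac{M(M-1)}{2}\,t_p, \tag{15}$$
where $t_p = 1/R$ denotes the packet interval. Then, we have
$$\Delta\tau = T_i - \Delta t - (M-1)\,t_p. \tag{16}$$
By substituting $\Delta\tau$ into Equation (15), we obtain
$$\Upsilon = M\,(T_i - \Delta t) - \frac{M(M-1)}{2}\,t_p. \tag{17}$$
Let $t_b^i$ denote the mean time a packet waits in the buffer at point $p$. Then,
$$t_b^i = \frac{\Upsilon}{M} = T_i - \Delta t - \frac{M-1}{2}\,t_p. \tag{18}$$
By substituting $M = R T_i = T_i / t_p$ into Equation (18), we get
$$t_b^i = \frac{T_i + t_p}{2} - \Delta t. \tag{19}$$
Since the first packet can be generated at a random time, $\Delta t$ can be considered a uniform random variable that takes values in $[0, t_p)$. As a result, $t_b^i$ is also a random variable because it is a function of $\Delta t$. Thus, the expected value of $t_b^i$ can be expressed as
$$E[t_b^i] = \int_0^{t_p} \left(\frac{T_i + t_p}{2} - \Delta t\right) \frac{1}{t_p}\, d(\Delta t) = \frac{T_i}{2}. \tag{20}$$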
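The buffer waiting time can be checked numerically with a short sketch. It assumes, as the text does, that packets are generated every $t_p = 1/R$ seconds starting at offset $\Delta t$ and are all collected at time $T_i$; under those assumptions the per-round mean works out to $(T_i + t_p)/2 - \Delta t$, and averaging over a uniform offset in $[0, t_p)$ gives $T_i/2$. The function name `mean_buffer_wait` and the parameter values are illustrative only.

```python
def mean_buffer_wait(T_i, R, delta_t):
    """Mean wait of the packets generated at delta_t, delta_t + t_p, ... and collected at T_i."""
    t_p = 1.0 / R
    M = round(R * T_i)                      # M = R * T_i packets per round
    waits = [T_i - (delta_t + j * t_p) for j in range(M)]
    return sum(waits) / M

T_i, R = 10.0, 1.0                          # hypothetical round time and rate
single = mean_buffer_wait(T_i, R, delta_t=0.25)

# Average over offsets spread uniformly across [0, t_p) (midpoint grid).
N = 1000
offsets = [(n + 0.5) / N / R for n in range(N)]
averaged = sum(mean_buffer_wait(T_i, R, d) for d in offsets) / N
```

For $T_i = 10$ s and $R = 1$ packet/s, a fixed offset of 0.25 s gives a mean wait of 5.25 s, and averaging over the offset gives approximately $T_i/2 = 5$ s.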

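Estimation of Traveling Time in MC$_i$
After MC$_i$ collects the packet from point $p$, it will bring the packet to MP$_i$. Recall that all points on the path of MC$_i$ are randomly placed according to a uniform distribution. Let $\chi$ be a random variable that represents the traveling distance from MP$_i$ to an arbitrary point $p$ on the path of MC$_i$. Then, the probability density function (pdf) of $\chi$, $f_\chi(x)$, is expressed as
$$f_\chi(x) = \begin{cases} \dfrac{1}{L_i}, & 0 \le x \le L_i, \\ 0, & \text{otherwise.} \end{cases} \tag{21}$$
Let $t_{mc}^i$ denote the time the packet is in MC$_i$ before MC$_i$ arrives at MP$_i$ at the end of its round. Then, $t_{mc}^i$ is the time it takes for MC$_i$ to travel from point $p$ to MP$_i$, which is
$$t_{mc}^i = \frac{L_i - \chi}{s_i}. \tag{22}$$
Since $t_{mc}^i$ is a function of the random variable $\chi$, the expected value of $t_{mc}^i$ can be calculated as
$$E[t_{mc}^i] = \int_0^{L_i} \frac{L_i - x}{s_i} \cdot \frac{1}{L_i}\, dx = \frac{L_i}{2 s_i} = \frac{T_i}{2}, \tag{23}$$
noting that $T_i = L_i / s_i$.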

Estimation of Waiting Time in MP$_i$
In the IS scheme, the packet collected by MC$_i$ is stored in MP$_i$ until the MR arrives and picks it up. Now we estimate the time that the packet might wait at MP$_i$ for the MR under the IS scheme. Note that in CMS, since the MR and MC$_i$ are scheduled to periodically meet each other at MP$_i$, the time a packet waits in MP$_i$ equals zero.
Recall that the MR and MC$_i$ periodically visit MP$_i$ every $T_0$ and $T_i$ ($1 \le i \le k$), respectively. We define $\{Y_1, Y_2, \ldots, Y_l, \ldots\}$ as the times at which MC$_i$ arrives at MP$_i$ during its journey, where $Y_l$ is the time of the $l$th arrival. Note that $Y_{l+1} - Y_l = T_i$. Also, let $\{Z_1, Z_2, \ldots, Z_h, \ldots\}$ denote the times at which the MR visits MP$_i$, where $Z_h$ is the time at which the MR arrives at MP$_i$ in its $h$th round.
Suppose that MC$_i$ collects the packet at point $p$ in the $l$th round and brings the packet to MP$_i$ at time $Y_l$ ($l \ge 1$). Also, assume that MC$_i$ arrives at MP$_i$ during the $h$th round of the MR, i.e., $(h-1)T_0 \le Y_l \le hT_0$, as shown in Figure 6. In the time interval $[(h-1)T_0, hT_0]$, the MR starts its $h$th round from the sink's position at $(h-1)T_0$, visits MP$_i$ at $Z_h$, and finally returns to the sink at $hT_0$.
Since MP$_i$ is considered an arbitrary point randomly placed on the path of the MR, $Z_h$ is a uniform random variable that represents the time the MR arrives at MP$_i$ in the interval $[(h-1)T_0, hT_0]$. Thus, the pdf of $Z_h$ can be expressed as
$$f_{Z_h}(z) = \begin{cases} \dfrac{1}{T_0}, & (h-1)T_0 \le z \le hT_0, \\ 0, & \text{otherwise.} \end{cases} \tag{24}$$
In the interval $[(h-1)T_0, hT_0]$, the packet stored in MP$_i$ at time $Y_l$ is collected by the MR at time $Z_h$ if the MR visits MP$_i$ after MC$_i$ arrives, i.e., $Z_h \ge Y_l$. Otherwise, the MR will collect the packet during its next visit at time $Z_{h+1}$ in the interval $[hT_0, (h+1)T_0]$. Note that $Z_{h+1} = Z_h + T_0$. The two cases for this collection are shown in Figure 6.
We define $t_{mp}^i$ as the waiting time of the packet in MP$_i$. Then, the value of $t_{mp}^i$ follows from the two cases shown in Figure 6:
$$t_{mp}^i = \begin{cases} Z_h - Y_l, & Z_h \ge Y_l, \\ Z_h + T_0 - Y_l, & Z_h < Y_l. \end{cases} \tag{25}$$
Since $t_{mp}^i$ is a function of the random variable $Z_h$, it is also a random variable. Thus, the expected value of $t_{mp}^i$ can be calculated as
$$E[t_{mp}^i] = \int_{(h-1)T_0}^{Y_l} \frac{z + T_0 - Y_l}{T_0}\, dz + \int_{Y_l}^{hT_0} \frac{z - Y_l}{T_0}\, dz = \frac{T_0}{2}. \tag{26}$$
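The two-case waiting time at the MP can be checked numerically. The sketch below assumes, as in the text, that the MR arrival time $Z_h$ is uniform over its round $[(h-1)T_0, hT_0]$ and that the packet is picked up at $Z_h$ if $Z_h \ge Y_l$ and at $Z_h + T_0$ otherwise; under these assumptions the average wait comes out to $T_0/2$ regardless of $Y_l$. The function names and numeric values are illustrative.

```python
def mp_wait(Y_l, Z_h, T_0):
    """Wait at the MP: picked up this round if the MR arrives after the MC, else next round."""
    return Z_h - Y_l if Z_h >= Y_l else Z_h + T_0 - Y_l

def expected_mp_wait(Y_l, T_0, h=1, samples=2000):
    """Average mp_wait over MR arrival times spread uniformly across the h-th round."""
    start = (h - 1) * T_0
    zs = [start + (n + 0.5) * T_0 / samples for n in range(samples)]
    return sum(mp_wait(Y_l, z, T_0) for z in zs) / samples

T_0 = 20.0                                   # hypothetical MR round time (s)
early = expected_mp_wait(Y_l=7.3, T_0=T_0)   # MC arrives early in the MR round
late = expected_mp_wait(Y_l=15.0, T_0=T_0)   # MC arrives late in the MR round
```

Both cases average to about 10 s, i.e., $T_0/2$, which illustrates why the arrival time of the MC drops out of the expectation.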

Estimation of Time in MR
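After the MR collects the packet at MP$_i$, it brings the packet to the sink. Let $t_{mr}^i$ denote the time it takes for the MR to move from MP$_i$ to the sink's position. That is,
$$t_{mr}^i = \begin{cases} hT_0 - Z_h, & Z_h \ge Y_l, \\ (h+1)T_0 - Z_{h+1}, & Z_h < Y_l. \end{cases} \tag{27}$$
Since $Z_{h+1} = Z_h + T_0$, the value of $t_{mr}^i$ is rewritten as
$$t_{mr}^i = hT_0 - Z_h. \tag{28}$$
Similarly, the expected value of $t_{mr}^i$ can be calculated as
$$E[t_{mr}^i] = \int_{(h-1)T_0}^{hT_0} \frac{hT_0 - z}{T_0}\, dz = \frac{T_0}{2}. \tag{29}$$
Define $t_{mean}^i$ as the mean latency of a packet collected by MC$_i$ from an arbitrary point $p$ in group $G_i$. From Equations (20), (23), (26) and (29), the value of $t_{mean}^i$ can be approximated as
$$t_{mean}^i \approx \begin{cases} T_i + T_0, & \text{if IS is used,} \\ T_i + \dfrac{T_0}{2}, & \text{if CMS is used.} \end{cases} \tag{30}$$
Note that in the IS scheme, the packet may need to wait at the MP until the MR visits the MP and picks it up. Thus, the latency of the packet under IS includes the time the packet waits in the MP. In contrast, under CMS, the packet is immediately delivered to the MR when the MC arrives at the MP since the MR and MC are scheduled to periodically meet each other at the MP.
By substituting $T_i = L_i / s_i$ and $T_0 = L_0 / s_0$ into Equation (30), we obtain
$$t_{mean}^i \approx \begin{cases} \dfrac{L_i}{s_i} + \dfrac{L_0}{s_0}, & \text{if IS is used,} \\ \dfrac{L_i}{s_i} + \dfrac{L_0}{2 s_0}, & \text{if CMS is used.} \end{cases} \tag{31}$$
Note that in Equation (31), the movement speeds $s_i$ of the MEs may take different values under IS and CMS.
We also estimate the movement energy consumption of MEs for data gathering. Let $\Pi$ denote the total energy consumed by MEs for their movement during data collection time $t_c$. Since an ME moving at speed $s_i$ travels a distance of $s_i t_c$ during $t_c$, the value of $\Pi$ can be calculated as
$$\Pi = \rho\, t_c \sum_{i=0}^{k} s_i, \tag{32}$$
where $\rho$ represents the energy consumed to travel one meter, and we assume $\rho = 8.27$ joules per meter.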
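The latency and energy estimates can be wrapped in two small helper functions. The sketch assumes that the four latency components have the expected values that follow from the uniform-placement assumptions ($T_i/2$ in the buffer, $T_i/2$ in the MC, $T_0/2$ at the MP under IS only, and $T_0/2$ in the MR), and that each ME travels $s_i t_c$ meters during the collection time $t_c$. The function names and the sample numbers are hypothetical; only $\rho = 8.27$ J/m is taken from the text.

```python
def mean_latency(L_i, s_i, L_0, s_0, scheme):
    """Approximate mean packet latency for group i under the IS or CMS scheme."""
    T_i, T_0 = L_i / s_i, L_0 / s_0
    if scheme == "IS":
        return T_i + T_0          # includes the expected wait T_0 / 2 at the MP
    if scheme == "CMS":
        return T_i + T_0 / 2      # no waiting at the MP
    raise ValueError("scheme must be 'IS' or 'CMS'")

def movement_energy(speeds, t_c, rho=8.27):
    """Total movement energy (J): rho (J/m) times the distance covered by all MEs."""
    return rho * t_c * sum(speeds)

# Hypothetical configuration: MC tour 200 m, MR tour 100 m, both at 10 m/s.
latency_is = mean_latency(200, 10, 100, 10, "IS")
latency_cms = mean_latency(200, 10, 100, 10, "CMS")
energy = movement_energy([10, 10, 2.5], t_c=3000)
```

A helper like this makes it easy to check, before deployment, whether a given speed budget can meet a latency requirement and what movement energy it implies.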

Result Analysis
Equation (31) gives the estimated mean latency of a packet collected by MC$_i$. Now, in order to verify the accuracy of the analytical model, we compare the results from the model with the results from simulations using a simulator developed in MATLAB. For comparison, we consider a network that consists of one MR and four MCs. The MR and MCs have a given path length. There are 100 points randomly placed on the path of each MC. Each point periodically generates a packet every 1 s, i.e., the data generation rate is $R = 1$ packet/s. Also, we assume the MEs have maximum movement speed $s_{\max}$. In CMS, the movement speeds of the MR and MCs are determined using Equations (11) and (12).
In IS, every ME moves at the same speed.
We collect the average latency of the packet from the simulation and the analytical model with two different path length configurations for MEs, as shown in Table 1. For each path length configuration, we compare the latency under CMS and IS. All MEs in IS move at the average movement speed of the MEs in CMS.

Figure 7 shows the average latency of the packet from the estimation and the simulation as the maximum movement speed of the MEs varies; its two panels show the results for path length configurations 1 and 2, respectively. Each result from the simulation is the average value over 100 simulation runs. In each simulation run, the positions of the points on the paths of the MEs are randomly chosen. Also, the result from the simulation is the average latency of all packets collected at the sink during the simulation time of 3000 s. In the analytical model, the result is the average of the estimated latency of the packets collected by the four MCs.

As shown in Figure 7, the average latency estimated from the analytical model is close to the average latency from the simulation in the two different path length configurations. The gap between the results from the model and the simulation is narrow, and the model consistently yields a higher latency than the simulation. In other words, in these experiments the analytical model acts as an upper bound on the simulated average latency.
We also collect the energy consumed by the MEs for movement. Table 2 shows the total energy consumed by the MEs' movements during the simulation time with two different path length configurations. As shown in Table 2, for each configuration, CMS and IS show the same movement energy consumption, since they have the same average ME speed. Note that for each data relay scheme, the analytical model and the simulation also give the same energy consumption. Moreover, the results shown in Figure 7 and Table 2 indicate that, for a given amount of energy consumed for the movement of MEs, CMS achieves lower latency than IS. This confirms that the movement scheduling scheme can reduce the data-gathering latency.
In summary, in this section we present an analytical model for approximating average latency of packets collected by MEs in HiCoDG, which is verified by comparison with the simulation results. Note that the model is useful for assessing whether MEs can provide the required latency before they are deployed.

Performance Evaluation
In this section, we evaluate the performance of HiCoDG variants by comparing them with an mTSP-based approach. Recall that in HiCoDG, two cooperative data relay schemes are proposed, IS and CMS. Also, two different grouping algorithms (i.e., K-means and SPG) are considered. Therefore, four combinations of HiCoDG variants are evaluated in the simulations.
• IS with K-means
• IS with SPG
• CMS with K-means
• CMS with SPG

We first present the simulation parameters and performance metrics, and then discuss the performance results.

Simulation Setup
Recall that in HiCoDG, MCs visit a point of interest (i.e., a clusterhead) and collect the data from the clusterhead, which received them from sensors within its domain. In this section, for the sake of brevity, we consider a simplified network configuration that only includes a certain number of points (i.e., clusterheads) that are assumed to generate data for their cluster.
The considered network consists of 32 points of interest (POIs) that are randomly deployed in an area of 2000 m × 2000 m. Each POI generates a data packet every 1 s and stores the packet in its buffer until an MC collects it. One MR collects data from four MCs and delivers the data to the sink (i.e., five MEs are used). Accordingly, POIs are divided into four groups.
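The setup above can be mimicked with a small sketch that places 32 random POIs in a 2000 m × 2000 m area and splits them into four groups with plain K-means. Only K-means is sketched here; SPG is not reproduced. The sink position, the random seeds, and the helper name `kmeans_groups` are assumptions made for this example, not values from the paper.

```python
import math
import random

def kmeans_groups(points, k, iterations=100, seed=0):
    """Plain K-means (Lloyd's algorithm) on 2-D points; returns k lists of points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct POIs
    groups = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = []
        for c, group in enumerate(groups):
            if group:
                updated.append((sum(x for x, _ in group) / len(group),
                                sum(y for _, y in group) / len(group)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty group
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return groups

rng = random.Random(1)
pois = [(rng.uniform(0, 2000), rng.uniform(0, 2000)) for _ in range(32)]
sink = (1000.0, 1000.0)  # assumed sink position, not given in the text

groups = kmeans_groups(pois, k=4)
for i, group in enumerate(groups, start=1):
    if group:
        nearest_to_sink = min(math.dist(p, sink) for p in group)
        print(f"group {i}: {len(group)} POIs, nearest POI is {nearest_to_sink:.0f} m from the sink")
```

Because the grouping uses only distances between POIs, nothing in it pulls a group toward the sink; the printed distances show how far the closest POI of each group can end up from the assumed sink position.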
In the mTSP-based approach, we use the ILP formulation for the mTSP in [27] to find m optimal trajectories that begin and end at the position of the sink with the objective of minimizing the total distance traveled. One ME is assigned to each trajectory. Also under mTSP, five MEs (i.e., m = 5) are arranged on five routes to visit POIs and return to the sink.
To simulate data gathering in wireless sensor networks, we use a network simulator, Qualnet 6.1, in which the controllable mobility simulator architecture, ConMoSim [30] was integrated to simulate the mobility of MEs. The GNU Linear Programming Kit [31] was used to solve the ILP problem of finding optimal trajectories for MEs in both HiCoDG and mTSP.
Four performance metrics are used to evaluate the performance of algorithms:
• maximum gathering latency: the maximum value over elapsed times for all packets from generation to arrival at the sink.
• average gathering latency: the average value over elapsed times for all packets from generation to arrival at the sink.
• movement energy consumption: the total energy consumed by MEs' movements.
• maximum number of packets in buffer of ME: the maximum number of packets that is stored in the buffer of mobile elements.
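The two latency metrics can be computed from per-packet timestamps as in the sketch below. The record format, a list of (generation time, arrival time at the sink) pairs, and the function name `gathering_latency_metrics` are assumptions for illustration; the simulator's actual log format is not described in the text.

```python
def gathering_latency_metrics(packets):
    """Return maximum and average gathering latency in seconds.

    packets: iterable of (generated_at, arrived_at_sink) pairs in seconds.
    If no packet has reached the sink yet, both metrics are reported as 0.
    """
    latencies = [arrived - generated for generated, arrived in packets]
    if not latencies:
        return {"max": 0.0, "avg": 0.0}
    return {"max": max(latencies), "avg": sum(latencies) / len(latencies)}

# Three packets delivered to the sink (hypothetical timestamps).
print(gathering_latency_metrics([(0, 120), (1, 121), (50, 400)]))
```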
Also, the simulation time is 10,000 s, and we assume that an ME consumes 8.27 J to travel one meter (about 0.21 J per inch) [32].

Figure 8 plots the movement trajectories (i.e., movement paths) of the MEs generated by HiCoDG with the SPG algorithm, HiCoDG with the K-means algorithm, and mTSP. The length of each path (i.e., the distance that each ME travels to finish one round) and the total length of those paths are presented in Table 3. Note that in Table 3, ME$_1$ and ME$_i$ ($i = 2, \ldots, 5$) in mTSP correspond to the MR and MC$_i$ ($i = 1, \ldots, 4$) in HiCoDG, respectively. As shown in Figure 8 and Table 3, under mTSP, the difference between the longest and shortest ME paths is larger than under HiCoDG. In other words, under mTSP, one ME may travel a much longer distance than the others, which results in unbalanced energy consumption. Moreover, under mTSP, the total path length of all MEs is much longer than the total length in HiCoDG.
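The two quantities compared here, the total path length and the gap between the longest and shortest paths, can be computed from waypoint lists as in the following sketch. The coordinates are made up for the example and do not come from Table 3; `tour_length` and `path_balance` are hypothetical helper names.

```python
import math

def tour_length(waypoints):
    """Length of a closed tour that visits waypoints in order and returns to the start."""
    n = len(waypoints)
    return sum(math.dist(waypoints[i], waypoints[(i + 1) % n]) for i in range(n))

def path_balance(tours):
    """Total length of all tours and the spread between the longest and shortest."""
    lengths = [tour_length(t) for t in tours]
    return {"total": sum(lengths), "spread": max(lengths) - min(lengths)}

# Two made-up tours: a 100 m square and a 300-400-500 m triangle.
square = [(0, 0), (100, 0), (100, 100), (0, 100)]
triangle = [(0, 0), (300, 0), (300, 400)]
print(path_balance([square, triangle]))
```

A large spread means that one ME covers far more distance per round than another, which is the imbalance attributed to mTSP above.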

Performance Analysis
In addition, as shown in Figure 8 and Table 3, under HiCoDG, the path of the MR is shorter when the SPG grouping algorithm is used than when the K-means algorithm is used. The reason is that with SPG, the path of the MR includes POIs that are closer to the sink. Note that, due to the nature of the K-means algorithm, groups are formed such that the POIs in a group are close to each other, without consideration of the sink position. As a result, all POIs in a group can be distant from the sink. In other words, when the K-means algorithm is used, the path of the MR may consist of POIs that are distant from the sink, which leads to a longer path. Moreover, as shown in Table 3, the maximum path length of the MCs under K-means is longer than under SPG. This can happen because K-means may place a large number of mutually close POIs in the same group.

Recall that in HiCoDG, along the optimal paths shown in Figure 8, the MR and MCs cooperate with each other to collect and relay the data from POIs to the sink. More specifically, under IS, each MC periodically drops off the collected data at the MP, and the MR periodically visits the MP and relays the data to the sink. Under CMS, on the other hand, a higher level of cooperation is achieved: the movement speeds of the MR and MCs are determined such that the MR periodically meets the MCs at the MPs and receives the data from them directly.

Now we analyze the effects of the movement speed of MEs on the performance metrics. The maximum speed of the mobile elements is set to 5, 10, 20, 25, and 30 m/s. Recall that, in CMS, the speed of the MEs is determined from the maximum speed and the path lengths of the MEs, i.e., the actual movement speeds of the MEs do not exceed the maximum speed.
In order to compare the data-gathering latency of CMS and IS at the same movement energy consumption, under IS the MEs always move at the average speed of the MEs under CMS. Note that under CMS, the average speed of the MEs is higher with the SPG algorithm than with the K-means algorithm. In this work, in order to favor mTSP in terms of data-gathering latency, we make the MEs under mTSP move at the average speed of the MEs under CMS with the SPG algorithm.

Figure 9 shows the maximum data-gathering latency of the schemes as the maximum speed of the MEs varies. As shown in Figure 9, all four variants of HiCoDG exhibit a lower maximum latency than mTSP at every maximum speed considered. For example, when the maximum speed is 20 m/s, the maximum latency of mTSP is about 1600 s, while the maximum latency of the HiCoDG schemes is less than 1100 s. The reason is that in mTSP, every ME has to return to the sink at the end of a data-gathering tour, which results in high latency. Moreover, in mTSP, the optimal trajectories are unbalanced, and one ME travels a much longer path than the others. On the other hand, in HiCoDG, MCs travel only in the region assigned to them and do not need to return to the sink.

In addition, as shown in Figure 9, in both CMS and IS, the maximum gathering latency is always lower with the SPG grouping algorithm than with the K-means algorithm. The reason is that the path of the MR is longer under K-means than under SPG. Moreover, as shown in Table 3, the maximum path length of the MCs is longer under K-means than under SPG. Note that when the path of an MC is longer, the time from the generation of a packet to the moment the MC brings it to the MP also becomes longer. As a result, packets have a higher maximum latency when the K-means algorithm is used for grouping.
Figure 10 compares the average data-gathering latency of the algorithms. As shown in Figure 10, the average latency of each algorithm decreases as the maximum speed of the MEs increases, which agrees with the inverse relationship between traveling time and movement speed over a given distance. Moreover, all variants of HiCoDG have a lower average gathering latency than mTSP. In particular, CMS with SPG consistently shows the lowest latency. For example, when the maximum speed of the MEs is 10 m/s, the average latency of mTSP is 54% higher than that of CMS with SPG. In Figure 10, we also see that when the maximum speed of the MEs becomes very high, the average latency tends to converge to a certain value, which depends on the sojourn time of the MEs at POIs and the sink for data transfer as well as on the movement speed of the MEs.

In addition, as shown in Figure 10, when the same grouping algorithm is used, CMS has a lower average latency than IS. The gap between the latency values of CMS and IS tends to decrease as the maximum speed of the MEs grows. The reason for the lower latency in CMS is that, under IS, due to the lack of cooperation between the MR and MCs, packets may need to wait for a long time at MPs before the MR arrives and picks them up. Under CMS, on the other hand, the waiting time of packets at MPs is zero. Moreover, as shown in Figure 10, the latency gap between CMS and IS is larger with the K-means algorithm than with the SPG algorithm. For example, at the maximum speed of 10 m/s, the latency of CMS is about 32% lower than that of IS under K-means, while it is about 16% lower under SPG.

Figures 11 and 12 show the variation of the maximum and average latency of the algorithms over time. The maximum speed of the MEs is set to 10 m/s. Note that at several time points around time 0, the latency is zero since no packets have yet been collected at the sink.
As shown in Figures 11 and 12, all variants of HiCoDG exhibit more stable latency (i.e., maximum and average latency) than mTSP over time. In particular, CMS shows the most consistent latency among the algorithms. Recall that in CMS, the initial positions of the MCs are determined such that each MC can meet the MR at its MP. Thus, in the first round, an MC may start from a middle point of its path and deliver a smaller number of packets, with low latency, to the MR at their first meeting at the MP. Consequently, as shown in Figures 11 and 12, CMS has a lower latency before 1000 s than after 1000 s.

Figure 11. Maximum data-gathering latency over time.

In addition, note that under IS, due to the lack of movement cooperation between the MR and MCs, the number of packets collected and their latency differ from round to round. Thus, IS shows less consistent latency than CMS. Moreover, as shown in Figure 12, the average latency of mTSP fluctuates due to the unbalanced trajectories of the MEs. From the results shown in Figures 11 and 12, we see that the variants of HiCoDG achieve not only lower but also more consistent latency than mTSP.

Figure 13 plots the energy consumed by the movements of the MEs as their maximum speed increases. As shown in Figure 13, when the same grouping algorithm is used, the movement energy consumption under CMS and IS is similar because both have the same average ME speed. In addition, in both CMS and IS, the SPG algorithm results in higher energy consumption than the K-means algorithm. This is because the average speed of the MEs under CMS is higher with SPG than with K-means. Note that mTSP shows energy consumption similar to that of CMS with SPG because the MEs under mTSP move at the average speed of the MEs in CMS with SPG.

Figure 14 shows the maximum number of packets stored in the buffer of MEs when their maximum movement speed varies.
As shown in Figure 14, when the speed of the MEs increases, the maximum number of packets in the buffer of the MEs decreases, since the number of packets collected by an ME in one round decreases. Moreover, mTSP always exhibits a higher maximum number of packets in the buffer than the other schemes since, under mTSP, one ME travels a much longer tour than the others. Note that the maximum number of packets in the buffer indicates the buffer size required for the MEs. Thus, these results imply that mTSP requires a higher-capacity buffer for MEs than HiCoDG, which increases the system cost. In addition, as shown in Figure 14, when the same grouping algorithm is used, IS always shows a higher maximum number of packets stored in the buffer of MEs than CMS does. In both CMS and IS, the MR always has the largest number of packets in its buffer since it brings the data collected by all MCs to the sink in each round. Note that, under IS, the MR might receive the data collected during several rounds of the MCs at the MPs. On the other hand, under CMS, the MR always receives the data collected in one round of the MCs. As a result, under IS, the maximum number of packets stored in the buffer is higher.
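The link between ME speed and buffer occupancy can be made concrete with a back-of-the-envelope sketch. It assumes steady state, ignores sojourn times at POIs, and assumes that an MC picks up everything generated during one full round of its path; the function name `packets_per_round` and the numbers used (8 POIs on a 2000 m path) are hypothetical.

```python
def packets_per_round(num_pois, rate_pkt_per_s, path_length_m, speed_m_per_s):
    """Packets an MC accumulates in one round of its path.

    Simplification: round time is path length divided by speed, and every
    POI generates packets at a constant rate during that time.
    """
    round_time_s = path_length_m / speed_m_per_s
    return num_pois * rate_pkt_per_s * round_time_s

# Doubling the speed halves the round time and hence the packets carried.
for speed in (10, 20):
    print(speed, packets_per_round(8, 1, 2000, speed))
```

The same reasoning suggests why a longer tour, as under mTSP, or a pickup that spans several MC rounds, as can happen under IS, leads to a larger peak buffer occupancy.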
Recall that IS requires not only a larger buffer size for MEs but also data storage at the MPs. In order to estimate the storage capacity required for MPs in IS, we also collect the maximum number of packets stored in the buffer of the MPs. Table 4 shows the maximum number of packets stored in MPs under IS with the SPG and K-means algorithms as the maximum speed of the MEs varies. When the paths of the MC and MR are longer, a larger number of packets will be stored in the buffer of the MP while waiting for the MR. Thus, as shown in Table 4, the maximum number of packets stored in MPs is larger with the K-means algorithm than with the SPG algorithm.

In summary, these results imply that, for a given movement energy consumption, CMS can achieve lower data-gathering latency than IS. In addition, the SPG algorithm results in lower latency but higher energy consumption than the K-means algorithm. Also note that by scheduling the movements of the MEs, the need for data storage at MPs can be avoided. In other words, lower and more consistent data-gathering latency than the existing approach (e.g., mTSP) can be achieved by using cooperative movement among MEs without the need for special hardware.

Concluding Remarks
In this paper, we have proposed a new hierarchical and cooperative data-gathering (HiCoDG) scheme that enables multiple mobile elements to cooperate to collect and relay data. In HiCoDG, there are two types of mobile elements: mobile collectors (MCs) and a mobile relay (MR). MCs collect the data from sensors, and the MR gathers the data from the MCs and delivers them to the sink. An ILP optimization problem has been formulated to find the optimal trajectories for the MCs and the MR with the objective of minimizing the traveling distance of the MEs, and hence their movement energy consumption. Also, we have proposed a cooperative movement scheduling algorithm to determine the optimal movement speeds for the MCs and the MR. Simulations have been conducted to compare the performance of HiCoDG with the mTSP-based approach.
For future work, in order to reduce the execution time for finding the trajectories for the MR and MCs, we plan to design a solution method where an approximation algorithm for TSP and a greedy algorithm are applied to find the trajectories for the MC and MR, respectively. We also intend to extend HiCoDG to take into consideration the problem of the limited buffer size of sensors as well as MEs when scheduling the movements for the MEs such that the buffer overflow at sensors and MEs can be avoided.