Average Throughput Performance of Myopic Policy in Energy Harvesting Wireless Sensor Networks

This paper considers a single-hop wireless sensor network where a fusion center collects data from M energy harvesting wireless sensors. The harvested energy is stored losslessly in an infinite-capacity battery at each sensor. In each time slot, the fusion center schedules K sensors for data transmission over K orthogonal channels. The fusion center has no direct knowledge of the battery states of the sensors or of the statistics of their energy harvesting processes; it only knows the outcomes of previous transmission attempts. It is assumed that the sensors are data backlogged, there is no battery leakage and the communication is error-free. A scheduled energy harvesting sensor transmits data to the fusion center only if it has enough energy for data transmission. We investigate the average throughput of a Round-Robin type myopic policy both analytically and numerically under an average reward (throughput) criterion. We show that the Round-Robin type myopic policy achieves optimality for some class of energy harvesting processes, although it is suboptimal for a broad class of energy harvesting processes.


Motivation
The Internet of Things (IoT) is an intelligent large-scale communication infrastructure of uniquely identifiable devices capable of communicating with each other wirelessly through the Internet [1]. The devices in an IoT structure are typically equipped with wireless sensors [2]. Wireless Sensor Networks (WSNs) provide the opportunity of efficient data collection and transmission anywhere [3]. Thus, WSNs have various applications, such as agriculture [4], ambient air monitoring [5,6], frost monitoring [7], structural health monitoring [8-10], remote assistance for elderly people [11], home monitoring [3,11-13] and smart cities [14,15]. Being frugal with energy consumption is important to several WSN deployments. Energy harvesting (EH) [16] can particularly facilitate WSN applications where replacing batteries is not practical. Therefore, energy harvesting is a promising approach for the emerging IoT technology [17]. Energy may be harvested from the environment in several different ways (solar, piezoelectric, wind, etc.) [17]. As energy harvesters generally depend on uncontrollable energy resources and the amount of harvested energy is generally low [17,18], WSNs need robust, self-adaptive, energy-efficient policies to optimize their reliable operation lifetime [19,20].
In this paper, we consider a fusion center (FC) collecting data from M EH wireless sensors. At each time slot (TS), K sensors are scheduled for data transmission by the FC, which does not have the direct knowledge of the battery states of the sensors or the statistics of their EH processes. It is assumed that the communication is error-free and the sensors are data backlogged but limited in available energy. Each sensor has an infinite-capacity battery to store the harvested energy and battery leakage is ignored. When a sensor is scheduled in TS t, it sends data to the FC in the TS t as long as it has enough energy in that TS. Sending one packet takes up one TS. The objective of the FC is to maximize the average throughput over a time horizon.
In fact, battery states can be made available to the FC in some WSNs at some additional cost (i.e., feedback) and complexity. However, sending information about the battery state causes extra time and energy consumption, which we avoid. Assume that the header containing only the battery state is H bytes and the remaining part (payload + other headers) of the data packet is P bytes. Then, at a fixed transmission rate, sending the battery state requires H/P times more time and energy than sending no battery state information. The extra time can be avoided only by consuming significantly more energy. As is well known in communications, the data transmission rate is a concave function of the transmission power. In fact, Shannon's capacity formula [21] indicates that this concave function is logarithmic: $C = B \log_2(1 + S/N)$, where C is the maximum capacity of the channel in bits/second (Shannon's capacity limit for the given channel), B is the bandwidth of the channel in Hertz, S is the signal power in Watts and N is the noise power, also in Watts; the ratio S/N is called the signal-to-noise ratio (SNR). Therefore, when sending information about the battery state, the extra time consumption can be avoided only by spending extra energy well beyond the fraction H/P. For example, assume that the overhead containing the battery state is one fourth of the actual data, i.e., H/P = 1/4. Then, sending both the overhead and the data in the same time duration as the data alone may require about twice the energy, depending on the operating SNR. Thus, although energy consumption and network lifetime are not performance metrics in the problem at hand, the problem definition (with no information sent about battery states) helps sensors decrease the energy consumption per data packet transmission, and thus the network lifetime can be increased.
Therefore, it is more relevant from a practical perspective that the FC makes scheduling decisions without any knowledge about battery states or statistics of their EH processes [22].
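The energy argument based on Shannon's formula can be checked numerically. The following is a minimal sketch, assuming that the rate equals the Shannon capacity, that bandwidth and noise power are fixed, and that energy spent in a fixed duration is proportional to transmit power. The function name `power_ratio` is hypothetical and introduced only for illustration.

```python
def power_ratio(snr: float, overhead: float) -> float:
    """Factor by which transmit power must grow to send (1 + overhead) times
    as many bits in the same time, assuming rate = B * log2(1 + SNR).

    snr      -- operating signal-to-noise ratio S/N (linear, > 0)
    overhead -- header-to-payload ratio H/P (>= 0)
    """
    if snr <= 0 or overhead < 0:
        raise ValueError("snr must be positive and overhead non-negative")
    # The required rate scales by (1 + overhead), hence
    # log2(1 + snr_new) = (1 + overhead) * log2(1 + snr).
    snr_new = (1.0 + snr) ** (1.0 + overhead) - 1.0
    return snr_new / snr


# H/P = 1/4 at a linear SNR of 15 (roughly 11.8 dB, an assumed operating point)
ratio_high_snr = power_ratio(15.0, 0.25)
# The same overhead at a very low SNR
ratio_low_snr = power_ratio(0.01, 0.25)
```

Under these assumptions, an overhead of H/P = 1/4 at a linear SNR of 15 requires roughly 2.07 times the power, which is in line with the factor of about two quoted above, while at very low SNR the factor approaches 1 + H/P = 1.25.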
To set up the problem, a model of the generation and usage of energy is needed. Each sensor accesses the energy state of its own battery only at the beginning of the time slots in which it is scheduled by the FC. Moreover, independent of the functional form (linear or other) and the type of energy harvesting resource (solar, wind, piezoelectric, RF, etc.), the net amount of harvested energy minus used energy is stored losslessly. This assumption is consistent with typical batteries in use today, for which leakage is negligibly small over several minutes: according to [23], self-discharge is at most 10% in 24 h (10% for Nickel-based batteries and 5% for Lithium-ion batteries). Based on these mild assumptions about EH processes, an appropriate performance criterion for the problem at hand is the average throughput (reward) over a time horizon rather than the expected discounted throughput (reward) [24].

Related Work
Although EH processes are not limited to be Markovian in this work, under Markovian assumption, the problem could be formulated as a partially observable Markov decision process (POMDP) [25]. In this case, dynamic programming (DP) [26] may be employed for optimal solution. However, DP has exponential complexity, which limits its scalability [27].
A second approach is reinforcement learning by considering the problem as a POMDP. Q-learning [28], one of the most effective model-free reinforcement learning algorithms, would guarantee convergence to an optimal solution in this problem. However, its very slow convergence [29] makes it non-ideal for a problem with a sizeable state space, especially as the discount factor approaches 1. R-learning [30], which maximizes the average reward, may be considered; however, there is no guarantee on the convergence of R-learning. Therefore, reinforcement learning does not seem to be suitable for obtaining an efficient solution to this problem. There are other approaches that can, in the long run, guarantee convergence to optimal behavior. However, in many practical applications, a policy that achieves near optimality very quickly is preferable to one that converges too slowly to exact optimality [29].
Another approach to this problem is to set it up as a restless multi-armed bandit (RMAB) problem. An optimal solution was proposed for RMAB problem under certain assumptions by Whittle [31]. It is shown that finding the optimal solution to a general RMAB problem is PSPACE-hard [32] (In complexity theory, PSPACE is the set of all decision problems that can be solved by a Turing machine using a polynomial amount of space). As a policy with a reasonable complexity, myopic policy (MP) has been suggested for various RMAB problems. While MP is not optimal in general since it focuses only on the present state [33], it can be proven to be optimal in certain special cases.
A problem very similar to the problem at hand is investigated in [34,35]. In fact, we pose the same problem as in [34,35], with the exception that we assume infinite-capacity batteries without leakage at the sensors, in contrast to [34,35], where either no battery or unit-capacity batteries with leakage are assumed. Both [34,35] formulate the problem as a POMDP and, due to the myopic approach in these works, the focus is on the immediate reward instead of future rewards. In [35], a single-hop WSN consisting of EH sensors with unit-capacity batteries (i.e., able to store only one transmission's worth of energy) and a fusion center is posed as an RMAB problem. The optimality of a round-robin (RR) based MP is proved under certain specific assumptions. Then, it is shown that, for a specific case, this RR based MP coincides with the Whittle index policy, which is generally suboptimal for RMAB problems [36]. In [34], the problem is formulated as a POMDP and the optimality of an MP is proven for two cases: (1) the sensors are unable to harvest and transmit simultaneously, and the transition probabilities of the EH processes are affected by the scheduling decisions, and (2) the sensors have no batteries.
In [37][38][39], we investigate problems quite similar to the problem at hand, although the problem in [38] has some differences due to its system model. In this paper, we consider a more general class of energy harvesting processes than [37,39] do. As explained in the rest of this paper, we consider energy harvesting processes with intensities both ρ ≤ 1 and ρ > 1, whereas we consider only energy harvesting processes with intensities ρ ≤ 1 in [37][38][39]. In addition, we also consider the cases for which the exact throughput performance of the myopic policy cannot be found using only intensities. For these cases, we find an upper bound for the throughput performance of the myopic policy.

Our Contributions
The main contributions of the paper are summarized as follows:

• The EH WSN problem is studied under an average throughput (reward) criterion and a no-battery-leakage assumption for the most general class of EH processes, whereas the problem is studied under a discounted throughput (reward) criterion and a battery leakage assumption for certain specific cases in [34,35].

• This paper considers a battery capacity (infinite capacity) larger than unit capacity (which is the maximum battery capacity considered in [34,35]) for the EH WSN problem.

• We show that, under the average throughput criterion and the infinite-capacity battery assumption, RR policies including the MP in [34,35] achieve optimality for some class of EH processes, although they are suboptimal for a broad class of EH processes.

• We obtain an upper bound for the throughput performance of the RR policies under the average throughput criterion for quite general (Markov, i.i.d., nonuniform, uniform, etc.) EH processes. Furthermore, we show that all RR policies including the myopic policy achieve almost the same throughput performance under an average throughput criterion.

• Compared with [34,35], we consider the more reasonable finite-capacity battery case in the numerical results and show that there is only a slight difference in throughput performance between the finite-capacity and infinite-capacity battery cases.

Organization of the Paper
The rest of this paper is organized as follows. The system model and problem formulation are given in Section 2. In Section 3, we show that RR based MP in [34,35] cannot achieve 100% throughput for a broad class of EH processes under average throughput (reward) criterion. Moreover, we obtain an upper bound for throughput performance of RR policies including the myopic policy under average throughput criterion. Furthermore, we show that RR policies including the myopic policy achieve almost the same throughput as each other. In Section 4, numerical results show that the myopic policy is suboptimal for a broad class of EH processes, which supports the results found in Section 3. Section 5 concludes the paper and provides some future directions.

System Model and Problem Formulation
We consider a single-hop WSN where a fusion center (FC) collects data from M EH-capable sensors (please see Figure 1). The index set of all sensors is denoted by S = {1, 2, . . . , M}. The WSN operates in a time-slotted fashion indexed as t = 1, 2, . . . , T. At the beginning of each TS, the FC schedules K sensors for data transmission by assigning each sensor to one of its K mutually orthogonal channels. As the research community working on multi-channel protocols generally either assumes that channels are perfectly orthogonal (interference-free) or considers the use of only orthogonal channels [40], we assume that the channels are mutually orthogonal, i.e., there is no interference. If the sensors send data at a low data transmission rate and interference management is applied, very low-error transmission can be achieved. Therefore, we assume error-free transmission in the WSN. We assume that the sensors always have data to send, as in [34,35]. In a single-hop WSN deployed in a wide lowland (flat cropland), there are no obstacles such as buildings or hills that may cause shadowing, reflection, refraction, absorption or diffraction. In a single-hop WSN with a central scheduler, the sensors are expected to send the same type of data, such as humidity, temperature or pressure. Considering the applications of WSNs in agriculture and frost monitoring [41][42][43][44][45], the sensors have nearly the same propagation conditions to send the same type of data in large croplands. Therefore, we assume that data packets have equal size and sending one packet takes up one TS. A unit energy is defined as the energy required for a sensor to send one packet in one TS. The energy harvested by sensor i in TSs 1 through t is denoted by $E_i(t)$, so the energy harvested by sensor i in TS t alone is $E_i(t) - E_i(t-1)$, with $E_i(0) = 0$. For t = 1, 2, . . . , T, we define the activation set, denoted by π(t), as the set of the sensors scheduled in TS t under a policy π.
As assumed in [34,35], if a sensor has sufficient energy and is scheduled in TS t, it sends one data packet to the FC in TS t. The number of data packets sent by sensor i in TS t under a policy π can be written as $\mathbb{1}\{i \in \pi(t),\ B_i^{\pi}(t) \ge 1\}$, where $\mathbb{1}\{\cdot\}$ is the indicator function and $B_i^{\pi}(t)$ is the stored energy in the infinite-capacity battery of sensor i in TS t under a policy π. Under the policy π, $B_i^{\pi}(t)$ evolves by adding the energy harvested by sensor i and subtracting one unit of energy for each packet it sends. The number of data packets sent by all sensors to the FC within the first t TSs under a policy π is $V^{\pi}(t) = \sum_{i=1}^{M} V_i^{\pi}(t)$, where $V_i^{\pi}(t)$ is the number of packets sent by sensor i in TSs 1 through t. In [34,35], the objective is to find a policy that maximizes the total throughput over the time horizon under an expected discounted reward criterion, where the discount factor corresponds to battery leakage. (Ref. [34] considers the problem under a discounted throughput (reward) criterion since it assumes battery leakage with discount factor 0.9, such that the stored energy decreases to 90% within a time slot, which is generally shorter than 1 ms. This is not realistic with recent battery technology.) In contrast, battery leakage in typical batteries causes less than a 10% decrease in the stored energy in 24 h [23]. This implies that the battery leakage in a 1 ms-long time slot is only about 0.00000012% and so the discount factor is approximately 0.9999999988 or larger, i.e., 0.9999999988 ≤ β < 1 (twenty-four hours equal 86,400,000 ms; if the length of a time slot is chosen as 1 ms, then $0.90 \le \beta^{86400000} < 1$, which implies that 0.9999999988 ≤ β < 1 after rounding). Therefore, we neglect battery leakage in our problem formulation, which is reasonable from an engineering perspective. As the problem at hand assumes infinite data backlog and no battery leakage, following [19,23], it is delay insensitive by nature. Hence, from [19,23,24], we formulate the scheduling problem as follows.
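The per-slot discount factor implied by a daily leakage figure can be computed directly. The sketch below assumes a 10% loss per 24 h and 1 ms time slots, as in the text. The function names are hypothetical and used only for illustration.

```python
import math


def per_slot_discount(daily_retention: float, slots_per_day: int) -> float:
    """Per-slot retention factor beta with beta ** slots_per_day == daily_retention."""
    return math.exp(math.log(daily_retention) / slots_per_day)


def per_slot_leakage_percent(daily_retention: float, slots_per_day: int) -> float:
    """Percentage of stored energy lost in one slot (expm1 keeps precision for tiny values)."""
    return -math.expm1(math.log(daily_retention) / slots_per_day) * 100.0


SLOTS_PER_DAY = 24 * 60 * 60 * 1000  # number of 1 ms slots in 24 h

beta = per_slot_discount(0.90, SLOTS_PER_DAY)
leak_percent = per_slot_leakage_percent(0.90, SLOTS_PER_DAY)
```

With these inputs, `beta` is approximately 0.9999999988 and the per-slot leakage is on the order of 1.2e-7 percent, which supports neglecting leakage at the time scale of a slot.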
The following notions are used in the rest of the paper.

Definition 1.
For a given sequence of energy harvests, an optimal policy, $\pi^*$, is a policy that maximizes the total throughput of all sensors (which is at most KT) over a time horizon, T, i.e., $\pi^* \in \arg\max_{\pi} V^{\pi}(T)$.

Definition 2.
A fully efficient policy, $\pi^{FE}$, is a policy under which the sensors use up all of their harvested energy by the end of the time horizon, which yields $V^{FE}(T) = \sum_{i=1}^{M} V_i^{FE}(T)$ (although we use $V^{\pi}(t)$ to denote the total throughput achieved in the first t TSs under a policy π, the total throughput achieved in the first t TSs under a policy $\pi^{FE}$ is denoted by $V^{FE}(t)$). For certain EH processes, an optimal policy may not be a fully efficient policy, as explained in Remark 1.

Definition 3.
Efficiency of a policy π, denoted by η(π), is defined as the ratio of the throughput of the policy π to the throughput of a fully efficient policy, $\pi^{FE}$, over the time horizon, T. It can be expressed as $\eta(\pi) = V^{\pi}(T) / V^{FE}(T)$, where $V^{\pi}(T)$ and $V^{FE}(T)$ are the numbers of collected data packets (throughput) over a time horizon T under a policy π and a fully efficient policy $\pi^{FE}$, respectively (when K and T are in the order of tens and thousands, respectively, the throughput of an optimal policy is expected to be in the order of tens of thousands. The notion of efficiency lets us deal with numbers less than or equal to 1 instead of large throughput numbers. The efficiency of a policy also gives the throughput of that policy relative to the throughput of a fully efficient policy, which is convenient in the numerical results).
The efficiency term itself can also be considered as relative energy consumption of the system to total energy harvested by the system.
The number of data packets which can be sent by all sensors from TS t + 1 to TS T is denoted by $Y(t)$. The number of data packets which can be sent by sensor i from TS t + 1 to TS T is denoted by $Y_i(t)$; its definition involves $G^*$, the set of all throughput-optimal policies (under different throughput-optimal policies, the throughput of a sensor i in the first t TSs may differ, since sensor i may be scheduled by the FC a different number of times under different throughput-optimal policies).

Definition 4.
Intensity of sensor i, $\rho_i$, is defined as the integer part of the total energy harvested by sensor i over the time horizon, T, normalized by $KT/M$, i.e., $\rho_i = \frac{M \lfloor E_i(T) \rfloor}{KT}$.

Definition 5.
Intensity, ρ, is defined as the sum of the integer parts of the total energy harvested by all sensors over the time horizon, T, normalized by KT, i.e., $\rho = \frac{1}{KT} \sum_{i=1}^{M} \lfloor E_i(T) \rfloor$.
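The two intensity definitions can be illustrated with a short computation. This is a minimal sketch that follows the verbal definitions above (integer part of the total harvested energy, normalized by KT/M per sensor and by KT for the network). The function name `intensities` and the sample values are assumptions made for illustration.

```python
import math


def intensities(total_harvest, K, T):
    """Return (per-sensor intensities rho_i, network intensity rho).

    total_harvest -- list with E_i(T), the total energy harvested by each sensor
    K             -- number of orthogonal channels
    T             -- time horizon in time slots
    """
    M = len(total_harvest)
    whole_units = [math.floor(e) for e in total_harvest]  # integer parts
    rho_i = [u * M / (K * T) for u in whole_units]
    rho = sum(whole_units) / (K * T)
    return rho_i, rho


# Hypothetical example: M = 4 sensors, K = 2 channels, T = 10 time slots
rho_i, rho = intensities([15.7, 5.0, 2.5, 0.9], K=2, T=10)
```

In this example the integer parts are 15, 5, 2 and 0, so the per-sensor intensities are 3.0, 1.0, 0.4 and 0.0 and the network intensity is 1.1. Under these definitions ρ equals the arithmetic mean of the $\rho_i$.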

Remark 1.
If both of the following conditions, $Y_i(t) \le (T - t)\ \forall i \in S, \forall t$ and $Y(t) \le K(T - t)\ \forall t$, are satisfied, then an optimal policy becomes a fully efficient policy, i.e., $V^*(T) = V^{FE}(T)$. Otherwise, an optimal policy cannot achieve the throughput of $V^{FE}(T) = \sum_{i=1}^{M} E_i(T)$, i.e., $V^*(T) < V^{FE}(T)$. In the cases violating at least one of these conditions, comparing a policy with a fully efficient policy is much simpler than comparing it with an optimal policy. Therefore, we also introduce the notion of a fully efficient policy.
For ease of reference, our commonly used notation is summarized in Table 1.

Table 1. Summary of commonly used symbols and notation.

Symbol          Description
M               The number of energy harvesting nodes
K               The number of mutually orthogonal channels of FC
S               The index set of all nodes
T               The time horizon
V^π(t)          Throughput of all nodes in TSs 1 through t under a policy π
V_i^π(t)        Throughput of node i in TSs 1 through t under a policy π
η(π)            Efficiency of a policy π
Y_i(t)          The number of packets which can be sent by node i in (t, T]
ρ_i             Intensity of node i
ρ               Intensity

Efficiency of Myopic and Round Robin Policies
A similar problem to the problem at hand is studied in [34,35] for certain specific cases under discounted reward criterion. RR based MP is proposed in both papers in which they prove the optimality of this policy for certain specific cases. We applied this RR based myopic policy to the problem at hand. As the MP in [34,35] is an RR policy with quantum = 1 TS, we investigate only RR policies with quantum = 1 TS, denoted by π RR , in this paper.

Definition 6.
For the network that consists of M sensors and an FC with K channels, a Round Robin (RR) policy with quantum = 1 TS is an RR policy under which the FC schedules the sensors by allocating one TS to each sensor for data transmission in a period of $M/K$ TSs (quantum is defined as the number of TSs allocated to each sensor in a period (round) by an RR policy. An RR policy with quantum = n TSs is an RR policy that allocates n TSs to each sensor in a period (round) of $Mn/K$ TSs, and so on. For applicability of RR policies with quantum = n ≥ 1 TS, $M/K$ must be an integer).
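A minimal simulation sketch of an RR policy with quantum = 1 TS is given below. It rests on several assumptions that are not fixed by the definition above: energy harvested in a TS is usable in that same TS, batteries start empty, sensors are grouped in index order, and slots are indexed from 0. The names `simulate_rr` and `efficiency` are hypothetical.

```python
import math


def simulate_rr(harvest, K, start_offset=0):
    """Per-sensor throughput under an RR policy with quantum = 1 TS.

    harvest[i][t] -- energy harvested by sensor i in slot t (0-indexed)
    K             -- number of channels; M must be divisible by K
    start_offset  -- shifts which group of K sensors is served first
    """
    M, T = len(harvest), len(harvest[0])
    if M % K != 0:
        raise ValueError("M/K must be an integer for an RR policy")
    period = M // K
    battery = [0.0] * M  # infinite capacity, no leakage
    sent = [0] * M
    for t in range(T):
        # Assumption: energy harvested in slot t is available in slot t.
        for i in range(M):
            battery[i] += harvest[i][t]
        group = (t + start_offset) % period
        for i in range(group * K, (group + 1) * K):
            if battery[i] >= 1.0:  # one unit of energy per packet
                battery[i] -= 1.0
                sent[i] += 1
    return sent


def efficiency(harvest, sent):
    """Throughput divided by the fully efficient throughput sum_i floor(E_i(T))."""
    fully_efficient = sum(math.floor(sum(h)) for h in harvest)
    return sum(sent) / fully_efficient if fully_efficient else 0.0


# Hypothetical example: M = 4, K = 2, T = 8; sensor 0 harvests much more than the rest
harvest = [[1.0] * 8] + [[0.25] * 8 for _ in range(3)]
sent = simulate_rr(harvest, K=2)
eta = efficiency(harvest, sent)
```

In this example each sensor is served in 4 of the 8 slots, so sensor 0 can send only 4 of the 8 packets it harvests energy for. The per-sensor throughputs are 4, 1, 2 and 2, and the efficiency is 9/14, roughly 0.64.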
In this section, we show that RR policies with quantum = 1 TS are generally suboptimal by Theorem 1. Next, we study their efficiencies more precisely and obtain an upper bound for their efficiencies by Theorem 2. Then, we show that an RR policy with quantum = 1 TS achieves almost the same efficiency as another RR policy with quantum = 1 TS by Theorem 3, which implies that the MP in [34,35] is generally suboptimal and the upper bound obtained for RR policies with quantum = 1 TS is also valid for the MP in [34,35].
Theorem 1. If $Y_i(t) > \lceil K(T-t)/M \rceil$ for some sensor i and for some t < T, all RR policies with quantum = 1 TS have efficiency lower than 100% even if a fully efficient policy exists for Problem 1 (they are suboptimal).

Proof.
Under an RR policy with quantum = 1 TS, each sensor is visited by the FC either $\lfloor K(T-t)/M \rfloor$ or $\lceil K(T-t)/M \rceil$ times in TSs t + 1 through T if $K(T-t)/M$ is not an integer. If $K(T-t)/M$ is an integer, then the FC allocates $K(T-t)/M$ TSs to each sensor. This means that the total number of transmissions from any sensor in TSs t + 1 through T cannot exceed $\lceil K(T-t)/M \rceil$. The condition of the theorem implies that sensor i needs to send more than $\lceil K(T-t)/M \rceil$ data packets in TSs t + 1 through T so as to send $V_i^{FE}(T) = E_i(T)$ packets, which must be sent by each sensor for full efficiency. Hence, even if a fully efficient policy exists, any RR policy with quantum = 1 TS is not fully efficient for Problem 1 (they are suboptimal).

Efficiency Bounds of RR Policies with Quantum = 1 TS
In this subsection, efficiency bounds of RR policies with quantum = 1 TS are studied precisely for general EH processes.

Lemma 1.
There exists a class of EH processes with intensity $\rho_i$ such that, for these EH processes, some sensor i transmits fewer than $\min\{Y_i(0), q_i\}$ data packets over a time horizon, T, under an RR policy with quantum = 1 TS, where $q_i$ is the number of TSs allocated to sensor i over the time horizon, i.e., $q_i \in \{\lfloor KT/M \rfloor, \lfloor KT/M \rfloor + 1\}$.

Theorem 2.

(i)
If $KT/M \notin \mathbb{Z}$, the efficiency of an RR policy with quantum = 1 TS satisfies

(ii)
If $KT/M \in \mathbb{Z}$, the efficiency of an RR policy with quantum = 1 TS satisfies Mρ .
Proof. Please see Appendix B.
From Theorem 2, we derive the following corollaries.

Throughput Difference of RR Policies with Quantum = 1 TS
We will prove that the throughput difference between any two RR policies with quantum = 1 TS cannot be greater than M − K over a time horizon. Recall that, for all RR policies with quantum = 1 TS, the scheduling is periodic with a period of $M/K$ TSs. The only difference between any two RR policies with quantum = 1 TS is their initial scheduling time, $t_0$; therefore, an RR policy with quantum = 1 TS that starts to send its first packet in TS $t_0$ can be labeled as $RR_{t_0}$, where $1 \le t_0 \le M/K$.

Lemma 2.
The number of transmissions of sensor i can differ by at most one under two different RR policies with quantum = 1 TS over the time horizon T, i.e., $\left| V_i^{RR_{t_0}}(T) - V_i^{RR_{t_0'}}(T) \right| \le 1$ for any $1 \le t_0, t_0' \le M/K$.

Proof.
Please see Appendix C.
The following example is given to illustrate Lemma 2. In this example, the difference between the most and the least efficient RR policies with quantum = 1 TS does not exceed one packet for any sensor; hence, it is observed that the throughput of a sensor i can vary by at most 1 under any two RR policies with quantum = 1 TS.
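A small numerical sketch of Lemma 2 follows. It tracks a single sensor that is served once per period of M/K TSs, and compares all possible initial scheduling times $t_0$. The assumptions are that energy harvested in a TS is usable in that TS and that the battery starts empty. The harvest sequences, the period of 4 TSs and the function names are hypothetical.

```python
def transmissions(harvest, period, t0):
    """Packets sent by one sensor scheduled in slots t0, t0 + period, ... (1-indexed)."""
    battery = 0.0
    sent = 0
    for t in range(1, len(harvest) + 1):
        battery += harvest[t - 1]  # assumption: usable in the same slot
        scheduled = t >= t0 and (t - t0) % period == 0
        if scheduled and battery >= 1.0:
            battery -= 1.0
            sent += 1
    return sent


def counts_over_offsets(harvest, period):
    """Throughput of the sensor under RR_{t0} for every t0 in 1..period."""
    return [transmissions(harvest, period, t0) for t0 in range(1, period + 1)]


# Half a unit of energy per slot over T = 12 slots, period M/K = 4
steady = counts_over_offsets([0.5] * 12, period=4)
# A bursty integer-valued harvest sequence over the same horizon
bursty = counts_over_offsets([2, 0, 0, 1, 0, 3, 0, 0, 0, 1, 0, 0], period=4)
```

For the steady sequence the sensor sends 2 packets when $t_0 = 1$ and 3 packets for $t_0 = 2, 3, 4$, so the counts differ by exactly one, in agreement with the lemma.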
The following theorem is based on the extension of Lemma 2 to the whole network.

Theorem 3. An RR policy with quantum = 1 TS achieves at most M − K more throughput than another RR policy with quantum = 1 TS over the time horizon T, i.e., $\max_{\pi \in G^{RR}} V^{\pi}(T) - \min_{\pi \in G^{RR}} V^{\pi}(T) \le M - K$, where $G^{RR}$ is the set of all RR policies with quantum = 1 TS.
Proof. In this proof, we first consider the case of transmitting the messages of $M/K$ sensors over a single channel under an RR policy with quantum = 1 TS. Then, we extend the result of Lemma 2 to the case of multiple (K) channels. Notice that one particular initial scheduling time is the best choice regardless of the sensor i, and this most efficient RR policy must be applied to one of the $M/K$ sensors over a single channel. When the K channels of the FC are considered, this most efficient RR policy must be applied to K of the M sensors. By considering this fact and Lemma 2, an RR policy with quantum = 1 TS transmits at most M − K more data packets than another RR policy with quantum = 1 TS.

Remark 2. From Theorem 3 and Definition 2, $\max_{\pi \in G^{RR}} \eta(\pi) - \min_{\pi \in G^{RR}} \eta(\pi) \le \frac{M - K}{V^{FE}(T)}$, where $G^{RR}$ is the set of all RR policies with quantum = 1 TS.
As the MP in [34,35] is also an RR policy with quantum = 1 TS, from Remark 2, it has almost the same efficiency as any other RR policy with quantum = 1 TS for a sufficiently large time horizon T, i.e., $\eta(\pi^{MP}) \approx \eta(\pi^{RR})$. For sufficiently large time horizons, these results can be extended to RR policies with quantum = n > 1 TSs.

Numerical Results
In this section, the efficiency of the myopic policy (MP) is evaluated via simulations for the cases of an infinite battery and a finite battery with B = 50 (B = 50 implies that the battery of a sensor can store enough energy to send 50 data packets, since we assume that each data packet transmission requires one unit of energy) at time horizons varying from 0 to 2000 TSs (as the efficiency of a policy is defined only for the time horizon, T, we obtain efficiency vs. time horizon figures in this section. Notice that the efficiency of the MP at T = 0 is taken as 0 for these simulations).
For each node i, the Markovian EH process is modelled by a state space {0, 1
Notice that the efficiency of the MP in [34,35] is almost the same as the efficiency of an RR policy with quantum = 1 TS, since $\eta(\pi^{RR}) \approx \eta(\pi^{MP})$ for sufficiently large T from Theorem 3 and Remark 2. We observe the efficiency of the MP under nonuniform EH processes (the MP achieves efficiencies close to the efficiency of an optimal policy for uniform EH processes; as this case is trivial, we do not show the results for it).
The simulations are carried out with M = 100 and K = 10 under Markovian and i.i.d. EH processes with various intensities, which are adjusted by choosing the intensities of some sensors as 3.0 and those of the others as 0.3, as explained in Table 2 (in WSNs, it is highly possible that some EH sensors harvest energy much more efficiently than others due to their energy harvesting resource (solar, piezoelectric, RF, wind, etc.) and environmental conditions. For example, solar energy harvesting is generally more efficient than the others. Therefore, we choose the intensities of some sensors to be much larger than those of the others (3.0 for some sensors and 0.3 for the remaining ones) in order to represent the difference between the amounts of energy harvested by sensors).

Infinite Capacity Battery
In Figure 2, for i.i.d. EH processes with ρ ≤ 1, the efficiencies of the MP at T = 2000 are 0.758, 0.564 and 0.469 for ρ = 0.435, ρ = 0.705 and ρ = 0.975, respectively. In Figure 3, for Markov EH processes with ρ ≤ 1, at T = 2000, the MP achieves efficiencies of 0.758, 0.552 and 0.467 for ρ = 0.435, ρ = 0.705 and ρ = 0.975, respectively. The dramatic difference between the efficiencies of the MP at these three intensities is expected, since Theorem 2 and Corollary 1 state that, as the number of nodes with intensity $\rho_i > 1$ increases, the efficiency of RR policies with quantum = 1 TS decreases. Notice that, from Theorem 2 and Corollary 1, the efficiencies of an RR policy with quantum = 1 TS at T = 2000 are expected to satisfy $\eta(\pi^{RR}) \le 0.770$, $\eta(\pi^{RR}) \le 0.574$ and $\eta(\pi^{RR}) \le 0.487$ for ρ = 0.435, ρ = 0.705 and ρ = 0.975, respectively. When the EH processes have memory (Markov processes), we observe results similar to those for memoryless (i.i.d.) EH processes with the same intensity.
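The bound values quoted above can be reproduced with a simple back-of-the-envelope computation. The sketch below is not a restatement of Theorem 2. It assumes a simplified large-horizon form in which each sensor is served in about KT/M slots, so that a sensor with intensity $\rho_i$ contributes at most $\min(\rho_i, 1)$, which gives the ratio $\sum_i \min(\rho_i, 1) / \sum_i \rho_i$. The numbers of high-intensity sensors (5, 15 and 25 out of M = 100) are inferred from the stated intensities 0.435, 0.705 and 0.975, and the function names are hypothetical.

```python
def rr_efficiency_bound(rho_i):
    """Simplified efficiency bound: sum of min(rho_i, 1) over sum of rho_i."""
    return sum(min(r, 1.0) for r in rho_i) / sum(rho_i)


def two_level_profile(M, n_high, high=3.0, low=0.3):
    """Intensity profile with n_high sensors at `high` and the rest at `low`."""
    return [high] * n_high + [low] * (M - n_high)


# Inferred numbers of high-intensity sensors for rho = 0.435, 0.705, 0.975
profiles = {n: two_level_profile(100, n) for n in (5, 15, 25)}
network_intensity = {n: sum(p) / len(p) for n, p in profiles.items()}
bounds = {n: rr_efficiency_bound(p) for n, p in profiles.items()}
```

Under these assumptions the computed bounds are approximately 0.770, 0.574 and 0.487, matching the values expected from Theorem 2 and Corollary 1 for the three intensities.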

Finite Capacity Battery
In this subsection, the simulations are made by considering a finite battery capacity with B = 50 under Markovian and i.i.d EH processes with the intensities in Table 2.
Figure 4 reports the results for the finite-capacity battery case. Markov EH processes have results similar to those for i.i.d. EH processes with the same intensity.

Discussion
In this subsection, the efficiencies of the myopic policy in both infinite and finite capacity battery cases are compared with each other based on the numerical results in Table 3. Besides this, these numerical results are compared with the expected upper bounds for efficiency of myopic policy. Finally, the complexity of the Round-Robin based myopic policy is investigated.
From Table 3, it can be observed that the maximum efficiency difference (0.009) occurs between B = ∞ (0.564) and B = 50 (0.555) for i.i.d. EH processes with intensity ρ = 0.705. For this intensity, the efficiency in the finite battery case is 1.596% less than the efficiency in the infinite battery case. In addition, the minimum difference (0.000) occurs between B = ∞ (0.380) and B = 50 (0.380) for i.i.d. EH processes with ρ = 1.515. Therefore, the efficiency of the MP for B = 50 is at most 1.596% less than that for B = ∞. Hence, we can conclude that the MP achieves almost the same throughput performance with a reasonable finite-capacity (B = 50) battery as with an infinite-capacity battery.

Table 3. Efficiency of the MP for i.i.d. and Markov EH processes under both infinite and finite capacity battery assumptions. B = ∞ and B = 50 stand for infinite and finite capacity batteries, respectively. ρ denotes the intensity. "Max. efficiency difference between B = ∞ and B = 50" represents the efficiency difference between the B = ∞ and B = 50 cases for the same intensity. "Max. efficiency difference (%) btw. B = ∞ and B = 50" represents the percentage of the efficiency difference between the B = ∞ and B = 50 cases over the efficiency in the B = ∞ case for the same intensity. "Max. deviation between the bound and efficiency of MP" represents the difference between the upper bound of the efficiency of the MP and the minimum efficiency result of the MP for the same intensity.

From Table 3, it can also be observed that the maximum deviation (difference) occurs between the upper bound (0.406) and the efficiency of the MP for i.i.d. EH processes (0.380) with intensity ρ = 1.515. For this intensity, the efficiency is 6.40% less than the upper bound. In addition, the minimum deviation occurs between the upper bound (0.770) and the efficiency of the MP for i.i.d. EH processes (0.758) with ρ = 0.435. For this intensity, the efficiency is only 1.56% less than the upper bound.
Based on these results, we can say that the upper bounds for efficiency of the MP are generally tight.
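The percentages quoted in this discussion follow from the efficiency values by a relative-gap computation, sketched below. The function name `relative_gap_percent` is hypothetical; the inputs are the efficiency values and bounds stated in the text.

```python
def relative_gap_percent(reference: float, value: float) -> float:
    """Percentage by which `value` falls short of `reference`."""
    return (reference - value) / reference * 100.0


# Finite (B = 50) vs. infinite battery, i.i.d. EH, rho = 0.705
battery_gap = relative_gap_percent(0.564, 0.555)
# Upper bound vs. simulated efficiency, i.i.d. EH, rho = 1.515
largest_bound_gap = relative_gap_percent(0.406, 0.380)
# Upper bound vs. simulated efficiency, i.i.d. EH, rho = 0.435
smallest_bound_gap = relative_gap_percent(0.770, 0.758)
```

These evaluate to roughly 1.596%, 6.40% and 1.56%, respectively, in agreement with the figures reported above.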
In addition, the Round Robin based myopic policy is a simple policy for this problem. There is an initial ordering, and the order is kept during the time interval in which Round Robin based scheduling is performed. The sorting algorithm required for the initial ordering has a worst-case complexity of at most $O(M^2)$ (or $O(M \log M)$ with algorithms such as merge sort), and the Round Robin algorithm has a complexity of O(1) per time slot. Therefore, the myopic policy is a low-complexity solution for the problem at hand.

Conclusions
This paper investigates a problem occurring in a single-hop WSN where an FC schedules a set of EH sensors to collect data from them. The FC does not know the instantaneous battery states or the statistics of the EH processes at the sensors, the sensors are data backlogged, and the communication is error-free. There is no leakage from the infinite-capacity batteries. The problem at hand is set up as an average throughput (reward) maximization problem. The myopic policy in [34,35], which has an RR structure, is applied to this problem as a solution. It is shown that RR policies with quantum = 1 TS are suboptimal for a broad class of EH processes. Next, an upper bound is obtained for the efficiencies of RR policies with quantum = 1 TS. Then, it is shown that the myopic policy has almost the same efficiency as any other RR policy with quantum = 1 TS. Furthermore, numerical results show that the myopic policy is suboptimal for a broad class of EH processes, although it achieves optimality for certain specific cases.
As future work, we will search for a simple, optimal solution to this problem for quite general EH processes. We also aim to extend the single-hop problem to the multi-hop case. Moreover, we plan to investigate the same problem under the finite-capacity battery case and to analyze the throughput performance of the myopic policy in that case. In addition, we will work to extend the problem so that the network lifetime can be considered as a performance metric. We believe that the novel approaches and concepts in this paper will give insight to researchers who study similar scheduling problems.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
Define a new function, $\rho_i(t)$, for a sensor $i$ as in Equation (A1), where $G_{RR}$ is the set of all RR policies with quantum = 1 TS. Assume $\rho_i(t) > 1 > \rho_i$ for some sensor $i$ and $t$. From Equation (A1), Equation (A2) follows, where $\tau = \min t$.
In the interval $\tau \le t < T$, sensor $i$ can send at most $\frac{K(T-t)}{M}$ data packets under an RR policy with quantum = 1 TS. Therefore, Equation (A2) implies that $\min\{Y_i(0), q_i\} = Y_i(0)$ packets cannot be sent by sensor $i$ over a time horizon $T$, where $q_i \in \left\{ \left\lfloor \frac{KT}{M} \right\rfloor, \left\lfloor \frac{KT}{M} \right\rfloor + 1 \right\}$. If the EH process at sensor $i$ were a constant EH process with intensity $\rho_i < 1$ (for these EH processes, $\rho_i(t) = \rho_i$ for all $1 \le t < T$), then sensor $i$ would send $\min\{Y_i(0), q_i\} = Y_i(0)$ packets over a time horizon $T$, from Definition 5 and Remark 1. Hence, the efficiencies of RR policies with quantum = 1 TS under an EH process with $\rho_i(t) > 1 > \rho_i$ for some sensor $i$ and $t$ are lower than they are under a constant EH process with intensity $\rho_i$. This implies that there exists a class of EH processes for which some sensor $i$ cannot achieve the throughput of $\min\{Y_i(0), q_i\}$ packets over a time horizon $T$.

Appendix B. Proof of Theorem 2
Define $q$ and $m$ through

$$KT = qM + m, \tag{A3}$$

where $q$ and $m$ are integers and $0 \le m \le M - K$. Recall that $M/K \in \mathbb{Z}$ for applicability of RR policies with quantum = 1 TS.
Case i: Assume $KT/M \notin \mathbb{Z}$ and, without loss of generality, that the RR policy with quantum = 1 TS starts scheduling with sensor 1. Notice that $K \le m \le M - K$ if $KT/M \notin \mathbb{Z}$. If the RR policy with quantum = 1 TS follows the order $1, 2, \dots, M$ for scheduling, sensors $1, \dots, m$ are scheduled $q + 1$ times and sensors $m + 1, \dots, M$ are scheduled $q$ times over a time horizon $T$. From Lemma 1, Equation (A4) is a strict inequality for some EH processes and becomes an equality for the other EH processes. Let $H_1 \subset \{1, \dots, m\}$ and $H_2 \subset \{m + 1, \dots, M\}$ be the index sets of sensors that have enough energy to send more than $q + 1$ and $q$ data packets, respectively. With this specification, the total throughput is given by Equation (A6), and Equation (A7) follows from Equation (A6) and Definition 3. Normalizing the numerator and denominator of the second term on the right-hand side of Equation (A7) by $KT/M$ gives Equation (A8). From Definition 4, $\rho_i = \frac{M Y_i(0)}{KT}$. Equations (A3) and (A8) then yield the bound that the efficiencies of RR policies with quantum = 1 TS satisfy.

Case ii: If $KT/M \in \mathbb{Z}$, then $q = KT/M$ since $m = 0$ in Equation (A3). Let $S = H_2 \cup (S - H_2)$, where $H_2$ is the index set of sensors that have enough energy to send more than $q$ data packets. (Notice that $H_1 = \emptyset$ and $H = H_1 \cup H_2 = H_2$ if $KT/M \in \mathbb{Z}$.) By following steps similar to those in Case i, we obtain Equation (A10). From Equation (A10) and Definition 3, the efficiency of RR policies with quantum = 1 TS can be expressed explicitly, and from Equation (A3) and Definitions 4 and 5 we obtain the final expression in terms of $M\rho$.
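The scheduling counts used in both cases can be checked numerically. The Python sketch below assumes the decomposition $KT = qM + m$ with integers $q$ and $0 \le m < M$, which is how $q$ and $m$ are read here, and an RR policy that follows the order $1, \dots, M$ starting with sensor 1. The function name `schedule_counts` and the example values of M, K and T are hypothetical.

```python
from collections import Counter

def schedule_counts(M, K, T):
    """Count how often each sensor is scheduled in T slots (order 1, ..., M)."""
    if M % K != 0:
        raise ValueError("M/K must be an integer")
    counts = Counter()
    for t in range(T):
        first = (t * K) % M  # 0-based index of the first sensor in slot t
        for j in range(K):
            counts[first + j + 1] += 1
    return counts

M, K, T = 6, 2, 5
q, m = divmod(K * T, M)  # KT = q*M + m
counts = schedule_counts(M, K, T)
print(q, m, dict(counts))
```

For M = 6, K = 2 and T = 5, this gives q = 1 and m = 4: sensors 1 to 4 are scheduled twice and sensors 5 and 6 once, which matches Case i. Choosing T = 6 instead gives m = 0, and every sensor is scheduled exactly q = 2 times, as in Case ii.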

Appendix C. Proof of Lemma 2
Recall that $M/K \in \mathbb{Z}$ for applicability of RR policies with quantum = 1 TS. There are $M/K$ RR policies with quantum = 1 TS, which start to schedule a sensor $i$ in different TSs. Recall that $1 \le t_0 \le M/K$, where $t_0$ is the initial time at which the $RR_{t_0}$ policy schedules sensor $i$. The proof is divided into two cases: