Distributed Optimal Random Access Scheme for Energy Harvesting Devices in Satellite Communication Networks

This paper considers satellite communication networks where each satellite terminal is equipped with energy harvesting (EH) devices to supply energy continuously, and randomly transmits bursty packets to a geostationary satellite over a shared wireless channel. Packet replicas combined with a successive iteration cancellation scheme can reduce the negative impact of packet collisions but consume more energy. Hence, appropriate energy management policies are required to mitigate the adverse effect of energy outages. Although centralized access schemes can provide better performance on the networks’ throughput, they expend extra signallings to allocate the resources, which leads to non-negligible communication latencies, especially for the satellite communication networks. In order to reduce the communication overhead and delay, a distributed random access (RA) scheme considering the energy constraints is studied. Each EH satellite terminal (EH-ST) decides whether to transmit the packet and how many replicas are transmitted according to its local energy and EH rates to maximize the average long-term network throughput. Owing to the nonconvexity of this problem, we adopted a game theoretic method to approximate the optimal solution. By forcing all the EH-STs to employ the same policy, we characterized and proved the existence and uniqueness of the symmetric Nash equilibrium (NE) of the game. Moreover, an efficient algorithm is proposed to calculate the symmetric NE by combining a policy iteration algorithm and the bisection method. The performance of the proposed RA scheme was investigated via numerous simulations. Simulation results showed that the proposed RA scheme is applicable to the EH devices in the future low-cost interactive satellite communication system.


Introduction
In recent years, the packet-oriented Internet protocol (IP) traffic and machine-to-machine (M2M) applications supported by satellite communication networks have received increasing attention in the research community, which is significant for complementing the terrestrial infrastructure to provide global seamless coverage [1]. For emerging applications, satellite networks need to efficiently tackle bursty traffic with a low duty cycle generated by a massive number of terminals, with constrain energy and limit computational resource. However, this presents a significant challenge for the multiple access protocols in satellite communication networks. Traditional solutions for resource allocation based on fixed assignment and demand assignment are obviously not applicable to this type of traffic [1]. Random access (RA) protocols have become a candidate multiple access scheme in interactive satellite

•
Propose a distributed optimal RA scheme in satellite communication networks with the consideration of energy management policies by extension of the CRDSA protocol towards an EH scenario. Each EH-ST determines whether to transmit the packet and how many replicas are sent according to its local energy information so as to maximize the average long-term network throughput; • develop an analytical model of the average long-term throughput with constraints of packet loss ratio and energy. A game theoretic method is adopted to tackle the nonconvex optimization problem. By employing the same access scheme to all EH-STs, the symmetric Nash equilibrium of this game is characterized, and its existence and uniqueness are proved; • exploit a policy iteration algorithm combined with a bisection method to approximate the optimal solution of the game. The performance of the proposed RA scheme by both numerical analysis and extensive simulations is investigated under different EH rates for different metrics as throughput, packet loss ratio, and data delivery probability.
The rest of this paper is organized as follows. Section 2 introduces the system model and performance metrics. Problem descriptions and formulation are presented in Section 3. Section 4 analyzes and solves the optimization problem. Simulation results are shown in Section 5. Finally, Section 6 concludes this paper and indicates future work.
Notations: Uppercase boldfaces, lowercase boldfaces and normal letters denote the matrices, vectors and scalars respectively, such as X, x and x.

System Model
The scenario under consideration in this paper consists of U wireless EH-STs, where EH-STs send data packets via a shared wireless channel to a GEO satellite, as shown in Figure 1. These EH-STs are placed in the outdoor field to monitor the environment and post the acquired data to the satellite [32]. The data packets received by the satellite are forwarded to the ground gateway station transparently with independent backhaul channels, which are assumed to be error free. The energy harvested by each EH-ST from the environment is stored in a rechargeable battery for the radio frequency unit and sensing apparatus. Each EH-ST is equipped with a processing unit to manage the available energy. The following parts of this section introduce the data model, MAC protocol, energy consumption, energy storage, and energy harvesting models of the system in detail.

MAC Protocol Operation and Performance Metrics
(1) MAC Protocol Operation: The adopted MAC protocol is based on the idea of CRDSA and its frame structure is shown in Figure 2 [4]. Time is slotted, and each RA frame consists of N time slots. The duration of each time slot is assumed to be equal to the packet transmission time, which is denoted as T s . Once the EH-STs have registered in the network, they will keep time synchronization via the procedures described in Reference [33] or in Reference [34]. An EH-ST can transmit only one MAC packet per RA frame. In fact, the EH-ST will transmit l replicas of the same MAC packet physically, where l ∈ Z + and l ≤ N. These replicas carry the same preamble and payload information and are randomly put in l slots of an RA frame. For a specific replica, the payload should contain the location information of other replicas within the frame. Once a packet is detected successfully at the gateway station (e.g., the first packet of user 3 in Figure 2), it will use the location information to cancel the interference caused by other replicas on other slots (e.g., the second packet of user 3 in Figure 2). Most of the packets that are collided initially can be recovered by iterating the above approach. For packets decoded at the gateway station, it is assumed that channel information is always available, which can be acquired by the methods mentioned in References [4,10]. For every frame, the EH-STs will always have a packet to send, and all packets are with equal importance. Each EH-ST decides whether to transmit the packet independently following the policy ω, which is only related to its current energy level and EH rate. Specifically, ω indicates two parts. One is the number of packet replicas to send, which is denoted as l, and the other one is the probability of transmitting these replicas, which is denoted as η. For simplicity, this paper only considers stationary policies independent of each frame, and ω = (ω 1 , ω 2 , . . . , ω U ) denotes as the joint transmission policy of the network. If the packet is not transmitted at the current frame, or it fails to be transmitted successfully, it will be transmitted in the next frame.
(2) Performance Metrics: The performance of the RA scheme is measured by the packet loss ratio (PLR), the throughput T of the system, and data delivery probability p d , where PLR and T are related to the normalized traffic load L, number of packet replicas l, and number of time slots in a frame N [4]. Similar to the definition of throughput in Reference [10], for given l and N, which are usually predefined in the network, the relationship between PLR and T is defined as: where L = M/N, and M is the number of the co-transmitted EH-STs within the duration of one frame (M ∈ Z + ). Moreover, PLR is supposed to be a continuous increasing function with respect to (w.r.t) the traffic load L [4], and the traffic load increases with the increase of packet transmission probability. The data delivery probability is the probability of the data of any EH-ST u to be successfully delivered and correctly received by the satellite during the kth frame when there are new packets ready to be reported at the start of the frame k. This metric, denoted as p d , measures the ability of the RA scheme to successfully transmit the packets from EH-STs to the satellite in every frame, without depleting their energy. p d is expressed as: p d (k) = Pr [u transmits successfully in frame k|u has new data in frame k]. ( Note that the successfully transmitted packets by EH-ST u in Equation (2) include both the noncollided packets from the beginning and the packets that have been resolved during its frame time. Moreover, the statistical probability (Equation (2)) is made by each EH-ST independently. Since EH technologies can provide EH-STs with potentially perpetual operation, the average long-term data delivery probability can be expressed as p LT d = lim k→∞ p d (k).
Packet delay is another metric to describe the average delay of a packet that is successfully received at the gateway station. Each EH-ST transmits l packet replicas with the probability η independently. If there is not enough energy in the battery, it will wait until it has enough energy to transmit the packets. The evolution of the available energy in the battery can be modeled as a Markov chain, which is introduced in detail in Section 3. Moreover, if a packet fails to be received at the gateway station, it will be retransmitted with the same policy ω, according to the residual energy. Therefore, the packet delay D pk is measured as the number of frames that elapse from the start of one frame where the packet was transmitted for the first time, until the end of the frame where the packet was received successfully. According to the description above, the packet delay distribution can be expressed as [35]: where π η (e) is the steady-state probability of the available energy e, and the derivation of π η (e) is presented in the next section. η(e) is the packet transmission probability and e max is the capacity of the battery. Specifically, the term π η (e)η(e)(1 − PLR) represents that the packet is received successfully. For f > 1, the term π η (e)η(e)PLR means the packet is transmitted in the first frame, but it is not received by the gateway station. Furthermore, the term (1−π η (e)η(e))+PLR·π η (e)η(e) f −2 can be explained as that the packet is still not received in the next f − 2 frame periods due to the energy shortage or packet collisions. Thus, the long-term average packet delay D LT pk is given by:

Energy Consumption and Storage Models
In this paper, the rechargeable battery of each EH-ST is modeled as a buffer [36], and each position in the buffer can hold one energy unit. It is assumed that packets are transmitted with equal power, which will consume one energy unit for transmitting one physical packet in a frame, and the other energy expenditures due to battery leakage or data collecting, etc. are not taken into consideration. The number of energy units that are stored in a battery of an EH-ST is denoted by E ∈ {0, 1, 2, . . . , e max }, where e max represents the capacity of the battery. Denote the amount of energy units of EH-ST u at the frame k as E u,k , where k ∈ Z + , and the evolution of E u,k is formulated as follows: where Q u,k is the packet transmission action of EH-ST u. Q u,k = 1 if the EH-ST determines to transmit the current packet with l u,k replicas, which will consume l u,k energy units, otherwise the EH-ST u will remain idle for Q u,k = 0. B u,k represents the amount of energy that the EH-ST u harvested within the duration of frame k, and the harvested energy within the duration of frame k can only be used for the next frame. Thus, if there is no energy left in the battery, then the EH-ST u will remain idle (Q u,k = 0).

Energy Harvesting Model
The communication scenario under consideration in this paper is mainly applied to environmental monitoring, and each EH-ST can harvest different amounts of energy units from the ambient energy sources. For instance, for a solar source, EH-STs in the direct sunlight situation will harvest more energy than those in a cloudy situation. Therefore, we assume the energy harvester of EH-ST u acquires B u,k energy units from the environment within the duration of frame k. Based on the energy harvesting models mentioned in References [22,24,25], the energy arrival process in this paper is assumed to follow Poisson distribution during one frame transmission time with parameter b k , which represents the stochastic and intermittent nature of the ambient energy sources. Thus, the probability of B u,k energy units arrived within the duration of frame k is: The energy arrival process of the components of B k = (B 1,k , B 2,k . . . , B U,k ) is considered to be identically and independently distributed (i.i.d) over time and all EH-STs. β denotes the average energy units harvested within a long period of time, which reflects the average long-term EH rate, and β = E [B u,k ].

Problem Descriptions and Optimization
This paper focuses on designing a distributed RA policy for EH-STs in satellite communication networks. Following this way, all the EH-STs in the networks are considered to have their own local information, i.e., the energy level at the current time and the EH rate. Therefore, EH-ST u decides whether to transmit the packet in the current frame k or keep idle based only on E u,k , independent of the energy level of other EH-STs (E i,k , i = u). For the given E u,k , the probability of EH-ST u transmitting its current packet (Q u,k = 1) with l u,k replicas is η (E u,k ), which is the transmission policy of our design.
Supposing there are M EH-STs transmitting packets in the same frame duration, given the initial state of energy levels E 0 = e 0 ∈ E U , the policy ω, the number of packet replicas l, and the number of time slots in a frame N, the average long-term throughput contributed by the specific EH-ST u can be denoted as follows: where i indicates the other co-transmitted EH-STs in addition to EH-ST u, and j represents the rest of the EH-STs which keep idle. The term 1−PLR (M, l, N) represents the throughput contributed by EH-ST u, and the term ( U M ) ∏ i =u,j η i,k ∏ j =u,i 1 − η j,k is a binomial operator which stands for the probability of M EH-STs transmitting packets. Specifically, the average probability of EH-ST u transmitting l u,k replicas in energy level e is denoted as: Furthermore, the expected throughput contributed by EH-ST u in energy level e is denoted as g (η u (E u,k )), which is given by: g (η u (e)) is a concave function w.r.t the traffic load, which is related to the transmission probability η u (e). Therefore, Equation (7) can be restated as: Moreover, the average long-term throughput of the network is defined as the number of the packets successfully uploaded to the satellite, which can be represented as: This paper aims to design the packets transmission policies ω to maximize the throughput of the network, i.e., However, the design of transmission policies ω for the EH-STs in the satellite communication networks needs to consider the following trade-offs: • More packet replica EH-STs transmitting seems to provide a better performance of packet detection [7]; nonetheless, too many replicas will cause channel saturation easily in the heavy traffic load [35] and battery depletion occurs more frequently; • larger transmission probability makes EH-STs transmit more often, which yields large access latencies due to packet collisions, especially for the heavy traffic load [4,28]; hence, a lower network throughput is accrued; • smaller transmission probability results in a lighter traffic load, which increases the packet transmission success rate; however, as Equation (1) indicated, smaller network throughput may be acquired.
Thus, the optimal transmission policy ω * reflects an optimal trade-off among energy consumption, access delay, and network overall throughput. In order to keep balance between the performance of packet detection and energy consumption, this paper assumes that each EH-ST can transmit l max packet replicas in one frame at most. A specifically admissible policy Ω that indicates the number of transmission packet replicas and transmission probability independent of initial energy state e 0 is defined below.

Definition 1.
A specifically admissible policy Ω is defined as: η (e) ∈ (0, 1) and l = e, e < l max η (e) ∈ (0, 1] and l = l max , l max ≤ e ≤ e max Remark 1. Note that the set of the specific policy Ω is made up of two parts. One is the number of transmission packet replicas l and the other one is the probability of transmitting these l packet replicas η(e). In fact, l can be an arbitrary positive integer, and it is no larger than the current energy level e. However, considering the complexity of the analysis procedure and the algorithm design, we predefine a reasonable transmission policy in terms of the number of transmission packet replicas. Specifically, each EH-ST will transmit l max replicas with probability η(e) if l max ≤ e ≤ e max , or it will keep idle with probability 1 − η(e). If the current energy level of an EH-ST is less than l max , the EH-ST will transmit e replicas with probability η(e). Therefore, we mainly focus on the design of the optimal transmission probability η * . Since there is a one-to-one mapping between the specific policy Ω and the transmission probability η, without loss of generality, we refer to η u as the policy of EH-ST u in the following parts of this paper.
Under the policy η ∈ Ω U , the evolution of the energy available in the battery {E k } can be modeled as a Markov chain. For each EH-ST, the state in the chain is defined by {E k }, where E k ∈ {0, 1, 2, . . . , e max } represents the available energy units stored in frame k. The transition matrix of the Markov chain is denoted by P = [p mn ], where p mn is the probability of one-step transition given by: According to the energy harvesting model, energy consumption model, and the defined policy, the transition probability from state e m to state e n is formulated by: if e m = 0 and e n = e max which indicates the Markov chain is irreducible under this transmission policy. Therefore, there exists a unique steady-state probability distribution, which is denoted by π η (e), e ∈ E U , and it satisfies both (P − I) π η (e) = 0 and ∑ e max e=0 π η (e) = 1, where P and I are transition probability and identity matrices [37]. Thus, the steady-state probability π η (e) can be acquired by solving the linear equations. Moreover, the steady-state probability distribution is independent of the initial state e 0 , so Equation (10) can be further deduced as: Since the packet transmission action Q u,k is based only on the current available energy unit in EH-ST u, the energy harvesting process is i.i.d for all EH-STs. Moreover, the energy level of each EH-ST is independent, so π η (e) can be represented as: where π η u (e u ) is the steady-state probability of EH-ST u at energy level e u . Letting: Equation (16) can be rewritten as: In Equation (19), G(η u ) is the average long-term throughput of EH-ST u, assuming there are M EH-STs co-transmitting packets in the same frame. ∏ i =u,j P(η i ) is the steady-state probability of other M − 1. EH-STs are also transmitting packets, and ∏ j =u,i 1−P(η j ) represents the steady-state probability of the rest of the U − M EH-STs being in an idle state. According to Equation (11), the network overall throughput under the packet transmission policy η becomes: In order to keep the packet transmission procedure fair, this paper adopts symmetric control policies [28], i.e., all EH-STs employ the same RA policy η u = η, ∀u. Therefore, Equation (20) can be rewritten as: The total number of EH-STs (U) registered in the network can be acquired from the satellite broadcast information. The optimization problem (Equation (12)), under the admissible symmetric policies, can be represented as:

Optimization and Analysis
The optimization problem (Equation (22)) can be separated into two parts. The term M · G(η) represents the expected network total throughput when there are M EH-STs co-transmitting packets in the same frame, and the term ( U M )P(η) M−1 (1−P(η)) U−M is the probability of M EH-STs transmitting packets concurrently in the same frame, which can be regarded as a binomial operator. Moreover, g(η) is concave; thus, Equation (22) is a nonconvex optimization problem. In order to determine the approximated solutions of Equation (22), this paper exploits a game theoretic method which is mentioned in Reference [36] to formulate the random access procedure. Specifically, we use a game theoretic method to model the optimization problem, where each EH-ST u acts as a player which optimizes its own transmission policy η u to maximize the network throughput (Equation (19)).
This section first analyzes the characteristics of the general Nash equilibrium (NE) of this game. The transmission policy profile is defined as η * = (η * 1 , η * 2 , . . . , η * U ), which is the joint policy. According to the definition of NE, if any EH-ST u adopts the policy η u = η * u , while all the other EH-STs use the policy η * i (i = u) to achieve the NE, then a smaller network throughput is obtained. The NE condition must satisfy all the EH-STs, and no player can improve the throughput by deviating unilaterally, i.e., By the definition of NE, if an NE (not necessarily symmetric) exists for the game, it must solve ∀u: In the last step, the optimization formula is divided by the term ( U M )∏ i =u,j P(η * i )∏ j =u,i 1−P(η * j ) , and we remove the additive term, which is independent of η u . Furthermore, we impose the symmetric policy η * u = η * , ∀u, and obtain the symmetric NE as follows: where Γ(η) is defined as: η * in Equation (25) is the policy that is simultaneously optimal for all the EH-STs, and any single EH-ST unilaterally deviates from the equilibrium condition η * yielding a smaller network throughput. G(η) in Equation (25) is the throughput contributed by EH-ST u when there are M EH-STs accessing the channel simultaneously. The term Γ(η * ) can be regarded as a Lagrange operator, which is associated to control the transmission probability of EH-ST u and further controls the number of concurrent EH-STs and packet loss rate due to packet collisions. Therefore, the overall objective of Equation (25) is to maximize the individual throughput, so as to maximize the network throughput. Meanwhile, constraints on the average transmission and number of concurrent EH-STs need to be taken into consideration. For a fixed network size U, the Lagrange operator decreases with the increase of M; however, due to the concavity of g(η), larger M will increase the packet loss rate, which is negative to network throughput. Moreover, the larger the throughput or transmission probability of other EH-STs, i.e., G(η * ) or P(η * ), the larger the Lagrange operator Γ(η * ); thus, a stringent transmission probability is needed to reduce the packet collisions. The Lagrange operator optimally controls the transmission probability of each EH-ST. Therefore, in order to achieve the maximum throughput of the network, the average transmission probability needs to be properly designed based on the Lagrange operator.
To solve Equation (25), we simplified it to a more general optimization problem, which is represented as follows. For γ ≥ 0: and G(η) − γP(η) = ∑ e max e=1 π η (e) [g(η(e)) − γη(e)]. Based on this simplified optimization problem, we can further prove the existence and uniqueness of the symmetric NE in Equation (25), thus finding the solution of Equation (25), which is locally optimal for the original optimization problem (Equation (22)).
Remark 2. The first property can be proved by the concavity of g(x); thus, x * γ = arg max x∈[0,1] g(x) − γx has a unique solution. Due to the continuity of g(x), η (γ) is continuous in γ. For the third property, the transmission probability η ∈ Ω cannot be larger than the energy harvesting rate β(B ≥ l), where B is number of acquired energy units and l is the number of transmitted packet replicas. Moreover, g(η) is a concave function which is related to PLR, and the relationship between g(η) and PLR is dictated in Equation (9). Since PLR increases with the traffic load increasing, PLR should be below a PLR threshold. Because a large PLR means more packets cannot be detected successfully, which will cause a large delay penalty for the satellite networks, η PLR th is the maximum packet transmission probability that ensures the PLR of the network the being below the PLR threshold.

Proof of Proposition 3. See Appendix B.
Since the proposed policy is a symmetric policy, we further analyze the characteristics of the symmetric NE condition for the game. According to the above propositions, the following theorem proves the existence and uniqueness of the symmetric NE, i.e., the solution of Equation (25).

Proof of Theorem 1. See Appendix C.
Generally, the symmetric NE may be a suboptimal solution of the original optimization problem (Equation (22)), because it is optimal only for the case that EH-ST deviates from the network unilaterally, not symmetrically. Contrarily, if all the EH-STs change the policy by the same quantity (the policy is still symmetric), then it may improve the throughput of the system. Therefore, in this situation, the obtained policy holds for the symmetric NE may not be globally or locally optimal. In the following theorem, we show that the obtained symmetric NE is a locally optimal solution of the optimization problem proposed in this paper. (25) is locally optimal for the original optimization problem (Equation (22)).

Proof of Theorem 2. See Appendix D.
In this paper, we also present an algorithm to determine the optimal policy η * . From the analysis above, we need to determine the unique γ * , s.t. f (γ * ) = 0, where we have previously defined f (γ) = Γ(η (γ) ) − γ. Then, according to γ * , we can obtain the optimal policy η * as η * = η (γ * ) . Since f (γ) is a continuous decreasing function of γ, f (0) > 0, and f (∞) → −∞, we can use the bisection method [38] to search the unique γ * which makes f (γ * ) = 0. Thus, upper bounds γ up and lower bounds γ low are needed to approach γ * , where γ low < γ * < γ up . By computing the value of f (γ) for the updated γ = (γ up + γ low )/2, the upper and lower bounds are updated and refined recursively until f (γ) satisfies the expected accuracy. In order to compute f (γ), we also need to determine Γ(η (γ) ), which can be computed efficiently by a policy iteration algorithm [39]. The lower bound is initialized by γ low = 0. For the upper bound, note that M ≈ UP(η), and the average throughput gain G(η) is measured by packet/slot which is less than 1. Therefore, the upper bound is initialized by γ up = U. The algorithm is described in detail in Algorithm 1.
The policy iteration algorithm employed in Algorithm 1 is able to determine the optimal policy η (γ) when δ PI A → 0. The optimality and convergence of the policy iteration algorithm is proven in Reference [39]. Particularly, in the policy improvement step of Algorithm 1, the optimization function has a unique optimal solution, since d γ (η(e)) is a concave function of η(e). The optimal solution can be obtained by taking the derivative of the optimization function w.r.t η(e).

Remark 3.
Algorithm 1 is combined policy iteration algorithm with a bisection method, which can be operated efficiently. For the policy iteration part, we need to calculate the Markov steady state probability distribution π η [i] (e), which can be solved by linear programming using the transition probability (Equation (13)) , and its computing complexity is O(e max ). Furthermore, in order to obtain the value function v(e), we also need to solve the linear system, whose computing complexity is O(e max ). For the policy improvement step, it needs to take the derivative of the optimization function and then solve the linear function. Thus, the complexity of this step scales as O(e max ). Finally, these steps are needed to iterated N iter (δ PI A ) times until it converges. Typically, N iter (δ PI A ) is no larger than 10 [28]. Therefore, the overall computing complexity of policy iteration algorithm scales as O(e max N iter (δ PI A )). For the bisection method, the maximum iterations to achieve the expected accuracy δ BM are log 2 (U/δ BM ). Thus, the overall complexity of Algorithm 1 is about O(e max N iter (δ PI A )log 2 (U/δ BM )).

Simulation Results
This section provides some simulation results to evaluate the proposed random access policy. It is assumed that the number of time slots in a frame N is 200 and each packet occupies one time slot. The length of each packet is 100bits and the data are modulated by QPSK. The received energy per symbol noise power spectral density ratio E s /N 0 of each packet is assumed to be equal with 10 dB. Additive white Gaussian noise (AWGN) is added before CRDSA demodulating, and the maximum decoding iteration is to set N max iter = 15 . For CRDSA, a packet is either clear (not collided with other packets) or interfered with other packets entirely [4]. For the clear packets, they can be decoded easily even without FEC, since E s /N 0 is large enough to decode the packets correctly. For the entirely collided packets, they can hardly be decoded correctly even with FEC, since the power of the overlapped packets is nearly the same [11]. Thus, we do not consider FEC in this paper. Actually, a power unbalance scenario and proper transmission power selection schemes combined with FEC can further boost the performance of the CRDSA [40], but the analytical model needs to be redesigned, which is out of the scope of this paper. This paper aims to provide a common but effective train of thought of designing a distributed RA policy for energy harvesting devices in satellite communication networks. Thus, the application of FEC is left for future research together with the power unbalance scenario and some variant CRDSA protocols. The main simulation parameters in this section are listed in Table 1. Since CRDSA adopts the scheme of successive interference cancellation, the value of PLR is obtained by an iteration method for a specific traffic load [4,10]. Therefore, the expression of PLR(L) cannot be obtained directly. This paper uses the method of curve fitting to approximate the PLR function. In order to guarantee the PLR(L) increasing and g(η) concave in the range (0, 1), we adopt a Gaussian model with 8 terms to fit the curve, and the simulation results are shown in Figure 3. PLR value of CRDSA is computed by the method mentioned in Reference [10]. The traffic load is normalized by M/N, where M is the number of concurrent terminals. Figure 3 shows the fitted curves are well matched with CRDSA PLR curves for both 2 and 3 replicas.  Figure 4 illustrates the network throughput of the proposed policy obtained by simulation and theoretical analysis. We plot the relationship between the normalized network throughput and the number of EH-STs in the network U, and consider different scenarios by varying the energy harvesting rateβ ∈ {1, 2, 3}. The normalized throughput is defined as T(L) = (1 − PLR(L))L, where the traffic load L is normalized by M/N. The capacity of the battery is assumed to be e max = 10 and the PLR threshold is set 10 −3 . The simulation results indicate that the analytical network throughput in different EH rate scenarios is well matched with that simulated in the same scenario. For the lower EH rates β = 1 andβ = 2, the network throughput increases with the number of EH-STs in the network linearly, because in these scenarios, the transmission probability is constrained by the EH rate. Moreover, under a lower EH rate, although the network size increases, the average number of concurrent EH-STs is still less than what CRDSA can support, i.e., the achieved PLR does not exceed the threshold. Due to the properties of g(η), the network throughput increases with the network size for the lower EH rates. As the EH rate grows, each EH-ST has more transmission opportunities. Since the CRDSA protocol has a good performance on recovering collided packets under moderate traffic load, the network will gain higher throughput under a higher EH rate. When the EH rate is high enough (β = 3), the throughput of the network increases with the network size first, but afterwards it tends to be flat. This is because when the network size becomes larger, more EH-STs have enough energy to transmit the packet. For the larger network size, packet collisions become the bottleneck of the performance. The policy will control the transmission probability of each EH-ST to guarantee the PLR is under the threshold. In addition, we also plot the upper bounds (UB) of the network throughput for different scenarios, which are represented by the black dashed line in Figure 4. Since g(η) is a concave function, G(η) ≤ g(P(η)) from Jensen's inequality [41]. Moreover, P(η) ≤ min β(B ≥ l), P(η PLR th ) ; thus, the upper bound of the network throughput can be obtained as:

Throughput and PLR
We notice that the proposed policy calculated by Algorithm 1 closely approaches the upper bound under each scenario and the performance degradation for each case is within 5% w.r.t the upper bound. This indicates the locally optimal solution achieved by the symmetric NE is a near-global optimum.
In addition to the proposed policy, we also evaluated the performance of the following policies, which are based on the baseline mentioned in Equation (12): The energy-balanced policy (EBP), where each EH-ST transmits the packets with the probability of β(B ≥ l); the network-balanced policy (NBP), where each EH-ST transmits the packets with the probability of η PLR th so as to maximize the throughput of the network; and the greedy policy (GP), where each EH-ST transmits the packets with the probability of 1 as long as it has enough energy. The simulation results for throughput and packet loss ratio are represented in Figures 5 and 6, respectively. In addition, we also present the performance of CRDSA (without consideration of the EH process and the limitation of energy) with 2 and 3 replicas to compare with other policies.
For the lower EH ratesβ = 1 andβ = 2 cases, the normalized throughput of the proposed policy, EBP and NBP all increase linearly with the increase of network size. The performance of EBP is nearly the same as that of the proposed policy, because in these scenarios, the transmission probability of each EH-ST is limited by the EH rate. The performance of NBP is a little worse than that of the proposed policy and EBP, because in these scenarios, η PLR th > β(B ≥ l), which means the transmission probability of each EH-ST is larger than the energy arrival rate. Therefore, more EH-STs are in the state of energy exhausted. From the perspective of PLR, all these three policies are able to control the PLR under the PLR threshold in the lower EH rate cases. Interestingly, the performance of GP seems better than the other three policies. In fact, under GP, EH-STs will always transmit packets as long as they have energy, which increases the traffic load; thus, GP can achieve a higher throughput. However, due to the low EH rate, most EH-STs have no energy to transmit packet replicas, which makes the collided packets unable to be recovered and increases the PLR. Actually, Figure 6 shows the PLR of GP is beyond the threshold even though the traffic load is light. A large PLR leads to packets retransmission frequent, which will cause unacceptable large communication latencies in the satellite networks and expand more energy units for the EH-STs. On the other hand, whenβ = 3, the performance of NBP and EBP is close to the proposed policy for the light and moderate traffic load. When the traffic load becomes heavier, the normalized throughput of NBP and the proposed policy tends to be flat at 0.6 packets/slot, while the performance of EBP drops dramatically. This is because for the lighter traffic load, β(B ≥ l) < η PLR th , the performance of these three policies is still constrained by the EH rate. As the network size grows larger, packet collisions become severe. EH-STs employ NBP and the proposed policy transmitting the packets with probability of η PLR th , but those who employ EBP transmit the packets with a probability of β(B ≥ l). At this time, β(B ≥ l) > η PLR th , which results in the PLR of EBP, exceeds the threshold. Thus, the performance of EBP becomes worse for the larger network size. GP seems to behave better than the other three policies in the small and medium network size scenario. However, as mentioned above, the PLR of GP exceeds the threshold even for the light traffic load. Moreover, the maximum normalized throughput of EBP and GP is similar with that of CRDSA with 3 replicas, which almost reaches 0.64 packets/slot. However, when the performance achieves the peak, the PLR is beyond the threshold. Notice that all the policies cannot perform as well as the CRDSA protocols for the small and medium network size, because conventional CRDSA does not consider the limitation of energy and the EH process. Therefore, terminals employing conventional CRDSA protocols are assumed to always have enough energy to transmit their packet replicas. In other words, if the EH rate is high enough, the proposed RA scheme can perform as well as CRDSA protocols.

Data Delivery Probability
The trade-off between the average long-term data delivery probability and the network normalized throughput is shown in Figure 7 under different EH ratesβ ∈ {1, 2, 3}. System parameters are the same with the previous simulations. We evaluated the performance of two centralized access protocols, which are time division multiple access (TDMA) and dynamic frame slotted Aloha (DFSA) [20,21], to compare with that of the proposed distributed RA scheme. TDMA and DFSA are centralized access protocols, while the proposed policy is based on a random access scheme, which is different from the other two protocols in terms of the nature of mechanisms. In fact, the centralized schemes are expected to achieve higher throughput performance than the RA schemes, because packet collisions can be avoided by proper management at the central controller. However, in the energy-constrained networks, the centralized access schemes cannot always achieve such high data delivery probability and throughput due to energy shortage. In the simulations, we compare the performance of these three schemes from the perspective of energy constraint, which seems reasonable. The packet delivery probability of all these three schemes increases with the increase ofβ. TDMA always outperforms DFSA and the proposed RA scheme in terms of data delivery probability. Since time slots have been preallocated to each EH-ST, it does not suffer packet collisions. Therefore, the data delivery probability of TDMA is determined by the EH rate. DSFA can adjust the length of frame dynamically according to the number of active EH-STs, and it offers retransmission opportunities for the EH-STs. For the lower EH rates, many EH-STs are in a state of energy shortage; thus, the data delivery probability of DSFA is limited by the EH rate. Moreover, a larger frame makes more EH-STs deliver their data to the satellite successfully, but causes lower throughput. Unlike DSFA, the proposed distributed RA scheme balances well for both data delivery probability and throughput. This is because the delivery probability of the proposed scheme is restricted not only byβ, but also by η PLR th . Thanks to the interference cancellation mechanism, the maximum throughput of the proposed RA scheme is higher than that of DFSA. However, transmitting more packet replicas will consume more energy. Compared with TDMA and DFSA (they only transmit one physical packet once), more EH-STs do not have enough energy to deliver the data. That is why the data delivery probability of the proposed RA schemes is only about 0.26 whenβ = 1. For the lower EH rates (e.g.,β = 1 and β = 2), the EH rate dominates the delivery probability. When EH rates become higher, almost every EH-ST has enough energy to send data. At this time, the proposed policy needs to guarantee the PLR being not beyond the PLR threshold; thus, η PLR th determines the delivery probability. Specifically, whenβ = 3, the delivery probability of the proposed scheme presents to be flat for the lower and moderate throughput but drops dramatically to control the PLR for the higher throughput. Notice that both TDMA and DFSA are centralized access protocols, and the central controller will allocate the time slots or adjust the frame length to improve the performance. However, the proposed distributed RA scheme can achieve acceptable throughput under the higher data delivery probability, which even performs better than DFSA. In addition, the proposed distributed RA scheme does not need the procedure of resource allocation and controls the PLR strictly, which reduces the communication delays caused by resource assignment and packet retransmission. Therefore, the proposed distributed RA scheme is more suitable for the future low-cost interactive satellite communication scenario with a large network size and high EH rate.

Packet Delay Assessment
The metric of packet delay is assessed for the proposed scheme and DFSA protocol in a heavy traffic load, and the results are shown in Figure 8. For the proposed scheme, we evaluate the performance of packet delay via both simulation (solid lines) and analytical method (dash lines), and they are well matched under different energy harvesting rates. Moreover, as the energy harvesting rate increases, the performance of packet delay becomes better. For instance, the probabilities of a packet being received successfully within 2 frames are about 0.48, 0.8, and 0.85 under the EH rates β = 1, 2, and 3 respectively. This is because the (re)transmission probability is not only constrained by the current energy level, but also the PLR threshold, regardless of the packet being transmitted for the first time or being retransmitted. Therefore, with the constraint of PLR threshold, almost all the packets can be correctly received as long as they are transmitted, and retransmissions will not block the channel. According to the analysis in Section 5.2, for the lower EH rates (e.g.,β = 1 andβ = 2), the EH rate dominates the delivery probability, while for the higher EH rate, η PLR th determines the delivery probability. Thus, the packet delay whenβ = 3 performs slightly better than whenβ = 2. Since TDMA is a collision-free protocol, no retransmissions occur in this system. Therefore, we only adopt DFSA to compare with the proposed scheme in terms of packet delay. For DFSA, packets are transmitted as long as EH-STs have energy, and no replicas are used, which will consume less energy. For different EH ratesβ ∈ {1, 2, 3}, almost every EH-ST has energy to transmit its packet; thus, their performances of packet delay are nearly the same. Since each packet is only transmitted once in a frame, EH-STs applying DFSA have more energy and chances to retransmit their data when the EH rate is low (β = 1) compared with those applying the proposed scheme. Therefore, the packet delay performance of DFSA is better than that of the proposed scheme whenβ = 1. However, without a packet collision resolution mechanism, EH-STs applying DFSA have to retransmit their data more times. Therefore, when the EH rate becomes higher, EH-STs applying the proposed scheme have more opportunities to transmit their data, and the packet delay performance of the proposed scheme is superior to that of DFSA.

Conclusions
This paper considers a satellite communication network made up by a GEO satellite and multiple EH-STs. The EH-STs transmit data packets to the satellite randomly over a shared wireless collision channel. Packet replicas and a successive interference cancellation mechanism are adopted to improve the network throughput. Considering the EH process and energy constraints of EH-STs, we have designed a distributed RA scheme aiming to maximize the average long-term network throughput. Firstly, we developed an analytical model of the average long-term throughput with constraints of packet loss ratio and energy and adopted a game theoretic method to approximate the solution of the nonconvex optimization problem. Then, we characterized the symmetric NE of the game and proved its existence and uniqueness, which indicated it is a locally optimal solution of the original optimization problem. A policy iteration algorithm combined with a bisection method was used to compute the symmetric NE. Finally, we evaluated the proposed RA scheme by simulations. Simulation results showed that the proposed RA scheme is more suitable for the large network size and high EH rate scenario, which is applicable to EH devices in the future low-cost interactive satellite communication system. This paper provides a common but effective train of thought of designing a distributed RA policy for energy harvesting devices in satellite communication networks, but it is based on a simplified version of CRDSA. In order to further improve the policy, we will concentrate on researching the following works in the future. First, we will extent the policy to an asynchronous version, since it is more suitable to the energy constrained system (time synchronization will cost extra signalings and energy). Second, we will investigate the performance of using irregular replicas based on the analysis method in this paper. In addition to the number of packet replicas, the effect of power imbalance and transmission power selection are also important factors that need to be taken into consideration in future work.
Then, we explain the second part of the theorem that P(η (γ * ) ) ≤ min β(B ≥ l), P(η PLR th ) . From the third property of P1, the average transmission probability should not be larger than the energy harvesting rate, i.e., P(η (γ * ) ) ≤ β(B ≥ l). Moreover, according to the nature of the MAC protocol CRDSA, as the traffic load increases, PLR increases as well. On the other hand, the throughput of the network increases with the increase of traffic load first but decreases sharply when the traffic load is larger than a threshold value [4]. The traffic load is directly related with the transmission probability and the raffic load threshold value mentioned above corresponds to a PLR threshold value. In order to maximize the network throughput, the average transmission probability should not be larger than the transmission probability that makes the network PLR larger than the PLR threshold, i.e., P(η (γ * ) ) ≤ P(η PLR th ) , where η PLR th is the policy that achieves the PLR threshold. Thus, the theorem is proven.