POMDP-Based Throughput Maximization for Cooperative Communications Networks with Energy-Constrained Relay under Attack in the Physical Layer

In this paper, we investigate jamming attacks in the physical layer against cooperative communications networks, where a jammer tries to block the data communication between the source and the destination. An energy-constrained relay can assist the source by forwarding the data to the destination even when the jammer blocks the direct link. Because the relay has a limited-capacity battery, it is equipped with a non-radio frequency energy harvester that prolongs its operation. We propose a scheme based on a partially observable Markov decision process (POMDP) to find the optimal action for the source so as to maximize the achievable throughput of the cooperative communications network. Under this scheme, the source dynamically selects the appropriate transmission mode in order to obtain maximum throughput under the jamming attack. Simulation results verify that the proposed scheme outperforms the Myopic scheme, in which only the current throughput is taken into account when making decisions.


Introduction
Cooperative communications are used to effectively improve the quality of a wireless network. The reliability and capacity of wireless communications are substantially increased by deploying the cooperative communication technique. In a cooperative communication system, each user can transmit data directly and can also collaborate with other users (i.e., relays) to deliver its data to a destination, which enhances the quality of transmissions [1]. In this case, an intermediate relay is used to support the transmissions between the source and the destination. Cooperative communications can offer remarkable advantages for wireless networks, such as high energy efficiency and extended network lifetime [2,3]. Recently, several studies have shown that cooperative communications can help to enhance the capacity and reliability of wireless networks [4-6].
The physical layer is the lowest layer in the Open Systems Interconnection (OSI) model. It can be used to verify the physical properties of a transmission in the network. However, the broadcast nature of wireless communications leaves the physical layer vulnerable to threats, e.g., eavesdropping, node tampering, hardware hacking, and jamming attacks [7]. In an eavesdropping attack, the eavesdropper can overhear the confidential information of legitimate users and capture the data within the transmission area of a node. In node tampering, the attacker can replace the entire physical node or part of the node. Hardware hacking can damage nodes via malicious entities such that the nodes lose their expected functionality, leaving them vulnerable to other risks. In jamming attacks, a jammer attempts to prevent users from accessing wireless network resources and reduces network availability by generating interference signals on the channels. This exhausts the energy of the nodes in the network [8,9].
A cooperative communications network is particularly vulnerable to malicious attacks in the physical layer. Moreover, jamming is one of the more serious attacks and greatly degrades network performance. To counter jamming attacks, frequency hopping spread spectrum and direct sequence spread spectrum are widely utilized [10]. However, if the hopping sequence is exposed, the jammer can use the same sequence to attack its target. Thus, random rendezvous [11] and uncoordinated frequency hopping [12] are used to share the hopping sequence safely. Nevertheless, these techniques waste communication time. Therefore, other secret sharing protocols have been proposed, such as public key cryptography and certificate and authentication protocols, but they incur large overhead and computational cost [13]. Desmedt [14] proposed an efficient coding method that provides protection against malicious users. Popper et al. [15] applied uncoordinated spread spectrum (USS) techniques to prevent jamming of the communications between transmitter and receiver. USS achieves effective anti-jamming without requiring pre-shared secrets, at the expense of decreased communications throughput. However, USS techniques require complex frequency synthesizers. Chorti derived optimal power allocation policies for transmitter and receiver pairs, where the interaction with an active jammer is formulated as a one-shot zero-sum game for anti-jamming in secret key-generation systems [16]. Almost all previous works on anti-jamming focus on how to design physical layer technologies (e.g., spread spectrum) [10,12,15,17-19]. If the signals are widely spread, it becomes harder for the jammer to interrupt the transmission link; meanwhile, the complexity of the spread spectrum technique increases, and thus it is not easy to deploy in practice.
Recently, various jammer localization schemes have been proposed in wireless communications [20,21]. The authors in [20] use spatial information as the basis for detecting attacks, determining the number of attackers, and localizing the ambient adversaries. In [21], the authors consider a scenario in which multiple jammers attack the network at the same time to achieve a better jamming effect. They propose a jamming detection scheme by developing an X-rayed jammed-area localization algorithm. However, measurement data collection and position information sharing bring additional challenges. In [22], the authors investigate a multi-hop multi-channel cognitive radio network in the presence of multiple jammers. To deal with the energy-constrained problem, two novel algorithms are proposed to maximize the energy efficiency of data transmission from the source to the destination under jamming attacks. Although the simulation results verify the effectiveness of the proposed scheme, its applicability may be limited by the high overhead and algorithm complexity when the number of intermediate relays in the network is large. A digital feedback scheme is proposed to improve the speed of transceivers, and it is also proved to be robust to noise in the feedback channel [23]. To deal with the energy-constrained issue and resource scarcity, the authors of [24] developed a joint controller and the related supporting access protocol to maximize both the energy and bandwidth efficiency of a vehicular access network, which is guaranteed to be reliable and safe in wireless communication.
Recently, energy harvesting has become one of the appealing techniques for solving the energy-constraint problem in wireless networks. In practice, user equipment units (UEs) are often equipped with a limited-capacity battery, which degrades network performance because of the limited energy available for operation. Energy harvesting can provide a perpetual energy supply for the battery without any physical replacement. Fortunately, UEs can harvest energy from non-radio frequency (RF) sources (e.g., solar, wind, heat, etc.) [25] or from RF signals [26,27], which are available in ambient environments.
In this paper, we consider a jamming attack scenario in a cooperative communication system that consists of a source, a destination, a relay, and a jammer, where the jammer injects interference signals to block the transmission link from the source to the destination. The jammer is assumed to always broadcast sufficient interference power to block the communication within its communication range when it is activated. We define this jammer as an "absolute jammer". The behavior of the jammer is assumed to follow a Markov chain model.
In order to deal with the challenge of the jammer, we propose to use a relay to help the source forward the data to the destination. However, the relay is assumed to be a small and movable device (for easy setup). Consequently, the relay has a small battery that allows the device to work only for a limited time. The energy harvesting technique is applied to deal with the energy-constrained problem at the relay; however, the limited energy arrival rate is also taken into account, since the relay can harvest only a limited amount of non-RF energy from the ambient environment in each time slot [22,25]. Moreover, the imperfection of the spectrum sensing mechanism [20,21,28] used for jamming detection at the source is also considered.
This paper aims to maximize the long-term achievable throughput of the cooperative communication system, in which the source decides whether or not it should cooperate with the relay to transmit the data to the destination securely, with the purpose of mitigating the jamming effect. We formulate the problem based on the POMDP framework [28] and propose a novel scheme to obtain the optimal policy such that the source can select the best action in every time slot by considering long-term throughput maximization. The main contributions of this paper are summarized as follows:

• We investigate the throughput maximization of the cooperative communication system under the jamming attack, where an intermediate relay equipped with a limited-capacity battery is deployed to securely facilitate the transmission from the source to the destination. Meanwhile, due to the limited energy of the jammer, the jamming attack operation is assumed to follow the Markov chain model.

• We consider the imperfection of the spectrum sensing as well as the non-RF energy harvesting model at the relay for long-term operation, both of which greatly affect the network performance in practice.

• We propose a POMDP-based scheme at the source node to determine the optimal policy for cooperating with the relay so as to improve the long-term achievable throughput of the network in the presence of the jamming attack. According to this optimal policy, the optimal action in every time slot can be obtained using the proposed scheme.

• We evaluate the performance of the proposed scheme in comparison with traditional schemes, namely the Myopic and Direct Link Only schemes, via MATLAB simulation under various network conditions. The numerical results show the superiority of the proposed scheme over the others across various network parameters of the cooperative communication network.
The remainder of the paper is organized as follows. In Section 2, we introduce the system model and the Markov chain model of the jammer. In Section 3, we describe the optimal policy for direct and relay-assisted transmission modes. Section 4 evaluates our proposed scheme via simulation results. Finally, we conclude this paper in Section 5. For ease of reference, the most frequently used symbols are listed in Table 1.

System Model
As shown in Figure 1, we consider a cooperative communication network consisting of a source, S, a destination, D, a relay, R, and a jammer, J. The source, destination, and jammer are assumed to have a fixed power supply such that they always have enough energy for transmission, reception, and jamming. By using the cooperative technique, S can cooperate with R to maximize the achievable throughput in the presence of the attack performed by J. The network is assumed to follow a synchronous, time-slotted model with time slot duration $T$. The channel coefficients between S and D, S and R, and R and D are denoted by $h_{SD}$, $h_{SR}$, and $h_{RD}$, respectively. In this paper, we consider the "absolute jammer", who always has enough energy to transmit interference signals that destroy the channel of the target transmission link for a whole time slot. Therefore, when the jammer attacks the channel, the direct transmission link from the source to the destination is blocked, and thus D cannot receive the data transmitted from S. Fortunately, R can help the source forward its data to the destination in this case. However, the relay is assumed to have a limited-capacity battery without any fixed power supply. Hence, the relay needs to harvest non-RF energy to maintain its long-term operation. The relay is assumed to scavenge energy during a whole time slot $T$, and the harvested energy is stored in a battery with a finite capacity, $E_{ca}$. Therefore, in time slot $t$, the amount of energy $E_h(t)$ (in energy units) that is harvested by the relay can be expressed as $E_h(t) \in \{e^h_1, e^h_2, e^h_3, \ldots, e^h_\nu\}$, where $e^h_1, e^h_2, e^h_3, \ldots, e^h_\nu$ are harvested energy levels and $0 < e^h_1 < e^h_2 < e^h_3 < \cdots < e^h_\nu < E_{ca}$. The harvested energy is characterized by its probability mass function (PMF).

Here, $P_{E_h}(t)$ is the probability of the harvested energy in time slot $t$. In this paper, we assume that the harvested energy of the relay follows a Poisson distribution, i.e., $E_h(t)$ is a Poisson random variable with mean value $e^h_{mean}$, and the PMF in (1) can be rewritten as follows:
$$\Pr[E_h(t) = k] = \frac{\left(e^h_{mean}\right)^k}{k!}\, e^{-e^h_{mean}}, \quad k = 0, 1, 2, \ldots$$
In order to deal with the jamming attack problem, at the beginning of each time slot, the source needs to determine whether it should use the direct transmission mode or the relay-assisted transmission mode to transmit its data to the destination.
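As a minimal sketch of the harvesting model above, the following Python snippet evaluates a Poisson PMF over integer energy levels and lumps the tail mass into the largest level so that the probabilities sum to one. The truncation rule, the inclusion of level 0, the function name `harvest_pmf`, and the example values are illustrative assumptions rather than details taken from the paper.

```python
import math

def harvest_pmf(mean_energy, max_level):
    """Poisson PMF over harvested-energy levels 0..max_level (in energy units).

    Assumption: the probability mass above max_level is lumped into
    max_level, so the returned list sums to one.
    """
    pmf = [math.exp(-mean_energy) * mean_energy ** k / math.factorial(k)
           for k in range(max_level)]
    pmf.append(1.0 - sum(pmf))  # tail mass Pr[E_h >= max_level]
    return pmf

# Hypothetical example: mean harvested energy of 3 units, levels 0..10.
pmf = harvest_pmf(mean_energy=3.0, max_level=10)
```

Lumping the tail keeps the model consistent with a finite battery: any harvested amount above the largest level would be clipped by the capacity anyway.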
Figure 2a depicts the time frame structure of the direct transmission mode. For this mode, the frame is divided into two phases: sensing and data transmission. In the sensing phase, the source performs spectrum sensing to detect jamming signals injected by the jammer. In the data transmission phase, the source transmits the data to the destination without any help from the relay. Figure 2b illustrates the time frame structure of the relay-assisted transmission mode. For this mode, the frame is divided into three phases: sensing, data transmission from the source to the relay (S-R), and data forwarding from the relay to the destination (R-D). Unlike direct transmission, after the sensing phase, the source transmits the data to the relay, and the relay then forwards the data to the destination. Here, $\tau_s$ represents the sensing time of the source, and $\tau_{SD}$, $\tau_{SR}$, and $\tau_{RD}$ represent the data transmission times of the links S to D, S to R, and R to D, respectively. Note that the duration of the sensing phase in the relay-assisted transmission mode is the same as in the direct transmission mode, while $\tau_{SR} = \tau_{RD} = \frac{1}{2}\tau_{SD}$.
For the direct transmission mode, the received signal at the destination at the end of a time slot is expressed in terms of the transmission power at the source, $P_S$, the signal transmitted from the source, $x_s$, and the additive white Gaussian noise (AWGN) at the destination, $n_D$, which has zero mean and variance $\sigma^2$. For the relay-assisted transmission mode, the received signal at the relay after the S-R phase additionally involves $n_R$, the white Gaussian noise with zero mean and variance $\sigma^2$ at the relay. This paper adopts an amplify-and-forward (AF) relaying protocol to forward data to the destination. Hence, the relay amplifies the signal by a scale factor, $\beta_r$, which depends on the transmission power at the relay, $P_R$.
In phase 2, the received signal at the destination includes the jamming noise $n_J$, which has zero mean and variance $\sigma_J^2$. Note that when the jammer does not attack, $n_J$ is zero. As a result, the signal-to-interference-plus-noise ratio (SINR) at D is denoted as $\phi_0$, $\phi_1$, and $\phi_2$ for direct transmission without jamming, relay-assisted transmission with jamming, and relay-assisted transmission without jamming, respectively. The average throughput in (10) is calculated according to the transmission mode, where $P_f$ and $P_d$ are the probability of false alarm and the probability of detection of the sensing mechanism, respectively, for sensing time duration $\tau_s$. $\Pr(\bar{J})$ and $\Pr(J)$ denote the probability of no jamming and the probability of jamming in the network, respectively. $C_0$, $C_1$, and $C_2$ represent the achievable throughput at the destination under the different cases in (10), i.e., direct transmission without jamming (10a), relay-assisted transmission with jamming (10b), and relay-assisted transmission without jamming (10c).
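The signal model above can be illustrated with the textbook amplify-and-forward expressions. The sketch below assumes unit-power symbols, a relay gain that normalizes the relay output power to $P_R$, jamming that affects only the destination, and a throughput of the form (transmission time / slot time) × log2(1 + SINR). These forms and the function names are assumptions made for illustration; the paper's exact SINR and throughput expressions are not reproduced here.

```python
import math

def af_gain(p_s, p_r, h_sr, noise_var):
    """AF scale factor that normalises the relay output power to p_r (assumed form)."""
    return math.sqrt(p_r / (p_s * abs(h_sr) ** 2 + noise_var))

def direct_sinr(p_s, h_sd, noise_var):
    """SINR of the direct link S -> D without jamming."""
    return p_s * abs(h_sd) ** 2 / noise_var

def af_sinr(p_s, p_r, h_sr, h_rd, noise_var, jam_var=0.0):
    """End-to-end SINR at D for S -> R -> D; jam_var = 0 means no jamming."""
    beta = af_gain(p_s, p_r, h_sr, noise_var)
    signal = (beta * abs(h_rd) * abs(h_sr)) ** 2 * p_s
    # Relay noise is amplified and forwarded; destination noise and jamming add on top.
    noise = (beta * abs(h_rd)) ** 2 * noise_var + noise_var + jam_var
    return signal / noise

def throughput(tx_time, slot_time, sinr):
    """Achievable throughput (bit/s/Hz) for a phase occupying tx_time of the slot."""
    return (tx_time / slot_time) * math.log2(1.0 + sinr)
```

Under these assumptions, passing a positive `jam_var` gives the SINR for relay-assisted transmission with jamming, and `jam_var=0.0` gives the jamming-free case.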
The probabilities of detection and false alarm can be estimated as in [28], where $\vartheta$ denotes the energy threshold, $M$ is the number of sensing samples, calculated as $M = 2\tau_s f_s$ ($f_s$ is the sensing bandwidth), and $\gamma$ is the signal-to-noise ratio (SNR) of the sensing channel (i.e., the channel between the source and the jammer). Several existing studies propose methods to estimate the SNR value. Therefore, in this paper we assume that this value is available at the source. The probability of false alarm can also be obtained from an alternative expression.
In this paper, we assume that the states of the jammer follow a Markov chain model. The jammer alternates between two states, presence ($J$) and absence ($\bar{J}$), as shown in Figure 3. The transition probabilities of the jammer from state $\bar{J}$ to state $J$ and from state $J$ to itself are denoted as $P_{\bar{J}J}$ and $P_{JJ}$, respectively [29]. We assume that the source always has a data packet to transmit to the destination. At the beginning of a time slot, the information about the remaining energy of the relay ($e^{re}$, $0 \le e^{re} \le E_{ca}$) is assumed to be available at the source.
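To make the sensing model concrete, the sketch below uses one widely used Gaussian approximation for energy detection. The specific expressions, the function names, and the example values are assumptions for illustration and may differ from the expressions in [28]; only the relation $M = 2\tau_s f_s$ is taken from the text.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def num_samples(sensing_time, bandwidth):
    """Number of sensing samples, M = 2 * tau_s * f_s, as defined in the text."""
    return 2.0 * sensing_time * bandwidth

def prob_false_alarm(threshold, noise_var, m):
    """Assumed approximation: P_f = Q((threshold / sigma^2 - 1) * sqrt(M))."""
    return q_function((threshold / noise_var - 1.0) * math.sqrt(m))

def prob_detection(threshold, noise_var, snr, m):
    """Assumed approximation: P_d = Q((threshold / sigma^2 - snr - 1) * sqrt(M / (2 snr + 1)))."""
    return q_function((threshold / noise_var - snr - 1.0)
                      * math.sqrt(m / (2.0 * snr + 1.0)))

# Hypothetical example: 1 ms of sensing at 10 kHz, threshold 1.2, SNR 0.5.
m = num_samples(1e-3, 1e4)
p_f = prob_false_alarm(1.2, 1.0, m)
p_d = prob_detection(1.2, 1.0, 0.5, m)
```

With these approximations, a longer sensing time increases $M$ and sharpens both probabilities, which is the trade-off behind the sensing duration $\tau_s$.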
Figure 4 shows the operation process of the system. First, the source performs sensing to identify the state ("presence" or "absence") of the jammer. If the sensing engine returns "absence", i.e., there is no jamming signal in the current time slot (which is not always true due to imperfect sensing), the source trusts the result and transmits its data directly to the destination. If the source receives an acknowledgment (ACK) message after the transmission phase, the reward is calculated; the belief probability $p^b_{t+1}$, which represents the probability of the jammer being present in the next time slot, is updated; and the remaining energy in the battery of the relay is updated using $E_h(t)$, the amount of energy harvested by the relay in time slot $t$. If the source does not receive an ACK (or receives a NACK) after the transmission phase, the reward is zero (i.e., $R = 0$), and the belief for the next time slot, $p^b_{t+1}$, and the transition probability are updated accordingly.
If the result obtained from the sensing engine is "presence", then the proposed scheme, based on a partially observable Markov decision process (POMDP), is applied to select the optimal action (i.e., either the direct transmission mode or the relay-assisted transmission mode). The proposed scheme is presented in more detail in the next section.
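The update after a direct transmission can be sketched as follows. The sketch assumes that, with an absolute jammer, an ACK reveals that the jammer was absent and a missing ACK reveals that it was present, so the next belief is simply the corresponding Markov transition probability; it also assumes that the battery is capped at its capacity. The function names and the parameters `p_stay` (present to present) and `p_arrive` (absent to present) are hypothetical.

```python
def next_belief(jammer_present_now, p_stay, p_arrive):
    """Belief that the jammer is present in the next slot, once the current state is known.

    p_stay   : Pr[present -> present]
    p_arrive : Pr[absent  -> present]
    """
    return p_stay if jammer_present_now else p_arrive

def next_energy(remaining, harvested, capacity, spent=0):
    """Battery level of the relay at the start of the next slot, capped at capacity."""
    return min(remaining - spent + harvested, capacity)

def direct_step(ack_received, c0, remaining, harvested, capacity, p_stay, p_arrive):
    """Reward, next belief, and next battery level after a direct transmission."""
    reward = c0 if ack_received else 0.0
    # Assumption: under the absolute jammer, an ACK implies the jammer was absent.
    belief = next_belief(not ack_received, p_stay, p_arrive)
    return reward, belief, next_energy(remaining, harvested, capacity)
```

For example, `direct_step(True, 2.0, 3, 4, 5, 0.7, 0.2)` yields a reward of `2.0`, a next belief of `0.2`, and a battery level clipped to the capacity `5`.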

Optimal Mode Decision Policy Based on POMDP
In this scheme, we apply a POMDP to obtain an optimal mode decision policy that maximizes the throughput of a cooperative communications network in the presence of the jamming attack. In this system, there are two operation modes for the source: direct transmission (DT) and relay-assisted transmission (RT), i.e., $a_t \in \{\mathrm{RT}, \mathrm{DT}\}$. In DT mode, the relay does not assist the source in forwarding the data to the destination (i.e., the relay is inactive), so the destination receives the data transmitted directly from the source. In RT mode, the relay helps the source forward the data to the destination (i.e., the relay is active). Due to the energy constraint of the relay as well as the imperfect spectrum sensing, the source considers the long-term reward in order to cooperate efficiently with the relay and optimize the network performance.
In order to formulate the POMDP framework, we define the state of the system as $s = (e^{re}_t, p^b_t)$, where $e^{re}_t$ and $p^b_t$ are the remaining energy of the relay and the probability of the presence of the jammer in time slot $t$, respectively. The value function $V(e^{re}, p^b)$ in (22) represents the maximum total discounted throughput of the system, where $0 < \alpha < 1$ denotes the discount factor, which is chosen to adjust the impact of future actions on the current action. More specifically, the larger the value of $\alpha$, the more the rewards of future actions affect the current decision, and vice versa. $R(e^{re}_t, p^b_t, a_t)$ is the achieved throughput of the system in time slot $t$ when action $a_t$ is performed at state $s = (e^{re}_t, p^b_t)$.

RT Mode
In the RT mode, the relay helps the source forward data to the destination. In this mode, the destination can always receive the data packet, so the presence of jamming cannot be inferred from packet reception alone. Instead, the destination determines from the received signal whether the jammer actually attacked the channel, because the signal strength of a received data packet is higher when the jammer attacks than for a normal received data packet (i.e., without jamming). Therefore, the destination can recognize whether the received signal contains the jamming signal by comparing the received signal energy at the destination, $P_D$, with the predefined jamming threshold $\chi_{jam}$.
Observation 1 ($\Phi_1$): The sensing result indicates the presence of the jammer, the source transmits the data packet via the relay, and the jammer actually attacks the channel. In this case, the jamming attack is correctly detected. For this observation, we obtain the achieved throughput, the probability that case $\Phi_1$ happens, the updated belief for the next time slot, and the updated remaining energy for the next time slot, where $E_{tr}$ is the required energy for transmission from the relay to the destination, together with the corresponding transition probability.
Observation 2 ($\Phi_2$): The sensing result indicates the presence of the jammer, the source transmits data via the relay, and the jammer actually does not attack the channel. Hence, a false alarm has occurred in this case. As for $\Phi_1$, we obtain the achieved throughput, the updated belief for the next time slot, the updated remaining energy for the next time slot, the probability that case $\Phi_2$ happens, and the corresponding transition probability.

DT Mode
In direct transmission mode, the source transmits the data directly to the destination (without help from the relay). Depending on whether the source receives an ACK from the destination, the following two observations can be distinguished.
Observation 3 ($\Phi_3$): The source transmits the data directly to the destination and receives an ACK. For this observation, we obtain the achieved throughput, the probability that case $\Phi_3$ occurs, and the updated belief that the jammer will be present in the next time slot. In this case, although the relay does not receive and forward data to the destination, it still harvests energy for future use, and the remaining energy of the relay for the next time slot is updated according to Equation (37). The transition probability for case $\Phi_3$ is computed accordingly.
Observation 4 ($\Phi_4$): The source transmits data directly to the destination, but it does not receive an ACK. In this case, there is no achieved throughput, so that $R(e^{re}_t, p^b_t \mid \Phi_4) = 0$. The probability that case $\Phi_4$ happens and the updated belief for the next time slot are calculated for this observation as well. The remaining energy in the relay is updated in the same way as in Equation (37) under $\Phi_3$. The transition probability for case $\Phi_4$ is given by $\Pr(e^{re}_t \to e^{re}_{t+1})$ for $k = 1, 2, 3, \ldots, \nu$.
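One way to reason about the four observations is through the posterior probability that the jammer is present given a "presence" report from the sensing engine. The sketch below applies Bayes' rule with the belief $p^b_t$, $P_d$, and $P_f$; conditioning on the sensing report and the function names are assumptions for illustration, and the paper's own expressions may be stated as joint rather than conditional probabilities.

```python
def posterior_jammed(belief, p_d, p_f):
    """Pr[jammer present | sensing reports 'presence'] by Bayes' rule."""
    hit = belief * p_d                   # jammer present and detected
    false_alarm = (1.0 - belief) * p_f   # jammer absent but reported present
    return hit / (hit + false_alarm)

def observation_probs(belief, p_d, p_f):
    """Conditional probabilities of the observations, given a 'presence' report.

    RT mode: Phi1 (jammed) and Phi2 (false alarm).
    DT mode: Phi3 (ACK, channel was free) and Phi4 (no ACK, channel was jammed).
    """
    q = posterior_jammed(belief, p_d, p_f)
    return {"RT": {"phi1": q, "phi2": 1.0 - q},
            "DT": {"phi3": 1.0 - q, "phi4": q}}
```

For instance, with a belief of 0.5, $P_d = 0.9$, and $P_f = 0.1$, the posterior is 0.9, so a direct transmission would succeed with probability of only about 0.1 under these assumptions.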
Based on these observations, we can calculate the expected value function and then find the optimal operation mode, $a_k$. Therefore, the value function in (22) can be rewritten in terms of these observations, which gives Equation (42). To solve the problem in Equation (42), a numerical method is used [30]. The solution to the problem provides the optimal policy of the system. The complexity of the algorithm can be analyzed based on the size of the computation space, i.e., the numbers of states, actions, transition probabilities, and observations. Based on Bellman's equation, the optimal policy is obtained by solving the value function using iteration-based dynamic programming. Let $Z$ and $S$ denote the action set and the set of possible states at the beginning of each time slot, respectively. The algorithm complexity can be defined according to the action and state spaces of the system. In the POMDP, the agent has to control the process at each time step to maximize the long-term reward. Therefore, $O(|Z||S|^2)$ operations are required in each iteration to calculate all the transition probabilities from a state $s(t)$ to another state after performing an action $a(t)$.
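The iteration-based dynamic programming step can be sketched with a simplified value iteration. This is not the paper's Equation (42): the sketch assumes integer energy levels, that every slot ends with the true jammer state revealed (so the belief takes only two values), that an "absence" report always leads to direct transmission, and that $0 < P_f < P_d < 1$. All names and parameter values are hypothetical.

```python
import math

def poisson_pmf(mean, max_level):
    """Poisson PMF on 0..max_level with the tail mass lumped into max_level."""
    pmf = [math.exp(-mean) * mean ** k / math.factorial(k) for k in range(max_level)]
    return pmf + [1.0 - sum(pmf)]

def value_iteration(capacity, e_tr, harvest, p_d, p_f, p_stay, p_arrive,
                    c0, c1, c2, alpha=0.9, iters=300):
    """Value iteration on states (battery level e, last known jammer state j).

    Returns V[e][j] and policy[e][j], the mode chosen after a 'presence' report.
    """
    beliefs = (p_arrive, p_stay)  # belief when the jammer was last absent / present
    V = [[0.0, 0.0] for _ in range(capacity + 1)]
    policy = [["DT", "DT"] for _ in range(capacity + 1)]

    def future(e_after, j_next):
        # Expected discounted value over the energy harvested in this slot.
        return alpha * sum(p * V[min(e_after + h, capacity)][j_next]
                           for h, p in enumerate(harvest))

    def direct(e, q_jam):
        # Direct link delivers c0 only if the channel is not jammed.
        return (1 - q_jam) * (c0 + future(e, 0)) + q_jam * future(e, 1)

    def relay(e, q_jam):
        # Relay link delivers in both cases but spends e_tr units of energy.
        return (q_jam * (c1 + future(e - e_tr, 1))
                + (1 - q_jam) * (c2 + future(e - e_tr, 0)))

    for _ in range(iters):
        new_V = [[0.0, 0.0] for _ in range(capacity + 1)]
        for e in range(capacity + 1):
            for j, b in enumerate(beliefs):
                p_alarm = b * p_d + (1 - b) * p_f        # Pr['presence' report]
                q_alarm = b * p_d / p_alarm              # Pr[jammed | 'presence']
                q_clear = b * (1 - p_d) / (1 - p_alarm)  # Pr[jammed | 'absence']
                best, mode = direct(e, q_alarm), "DT"
                if e >= e_tr and relay(e, q_alarm) > best:
                    best, mode = relay(e, q_alarm), "RT"
                policy[e][j] = mode
                new_V[e][j] = p_alarm * best + (1 - p_alarm) * direct(e, q_clear)
        V = new_V
    return V, policy

harvest = poisson_pmf(2.0, 6)
V, policy = value_iteration(capacity=6, e_tr=3, harvest=harvest, p_d=0.9, p_f=0.1,
                            p_stay=0.7, p_arrive=0.3, c0=2.0, c1=0.8, c2=1.0)
```

Restricting the belief to two values keeps the state space finite, which is what makes a plain tabular sweep possible here; a general POMDP solver would instead have to discretize or otherwise approximate the belief.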

Simulation Results
In order to evaluate the effectiveness of the proposed scheme, we implemented a simulation using MATLAB. In this section, we present performance comparisons among the proposed scheme, the Myopic scheme, and the Direct Link Only scheme. In the Myopic scheme, only the throughput of the current time slot is considered when selecting the action. In the Direct Link Only scheme, the source always uses the direct link to transmit the data packet to the destination. The parameters used for our simulation are shown in Table 2.
Figure 5 shows the average throughput of the system according to the required transmission energy of the relay node. According to Figure 5, the throughput of the system decreases as the required transmission energy increases. The reason is as follows: when the required transmission energy is large, the source has fewer opportunities to transmit the data packet via the relay when the jammer appears, since the relay lacks energy for the forwarding process. As expected, the increase in the required transmission energy does not affect the average throughput of the Direct Link Only scheme. The figure verifies that the proposed scheme outperforms the Myopic scheme and the Direct Link Only scheme.
Figure 6 shows the relation between the average throughput and the battery capacity of the relay. We can see that the average throughput of the system increases as the battery capacity of the relay increases. The reason is that the relay has more energy to assist communication between the source and the destination. On the other hand, the battery capacity of the relay does not affect the Direct Link Only scheme, whose throughput remains unchanged. The figure shows that the proposed scheme provides higher throughput than the Myopic and the Direct Link Only schemes.
Figure 7 shows the average throughput of the proposed scheme according to the battery capacity of the relay node for different values of the detection probability $P_d$. It is observed that the average throughput of the proposed scheme increases as the battery capacity of the relay node increases for a fixed value of $P_d$. However, the average throughput of the proposed scheme saturates beyond a certain value of the battery capacity. That is, once the battery capacity reaches a certain value, the average throughput of the proposed scheme cannot be enhanced further. Figure 7 also shows that the higher the detection probability of the jammer, the higher the average throughput. Achieving a higher detection probability, however, requires a more accurate sensing scheme at the source node.
Finally, Figure 8 shows the average throughput of the proposed scheme according to the detection probability of the jammer for different values of the required transmission energy of the relay node. As in the previous observation, the larger the required transmission energy of the relay node, the lower the average throughput. For a fixed value of the required transmission energy of the relay node, the average throughput of the proposed scheme improves as the detection probability of the jammer increases.
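For reference, the Myopic baseline described at the start of this section can be sketched as a one-step decision rule that compares the expected throughput of the two modes in the current slot only. The exact form of the rule, the function name, and the parameter values are assumptions for illustration; the paper does not give the baseline in code form.

```python
def myopic_action(belief, energy, e_tr, p_d, p_f, c0, c1, c2):
    """Mode that maximises the expected throughput of the current slot only,
    given that sensing has reported 'presence'."""
    # Posterior probability that the channel is actually jammed.
    q = belief * p_d / (belief * p_d + (1.0 - belief) * p_f)
    direct = (1.0 - q) * c0          # direct link succeeds only if not jammed
    relay = q * c1 + (1.0 - q) * c2  # relay link delivers in both cases
    if energy >= e_tr and relay > direct:
        return "RT"
    return "DT"
```

Because this rule ignores the value of saving relay energy for later slots, it can spend the battery whenever the immediate gain is positive, which is the behavior the long-term POMDP policy is designed to improve on.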

Conclusions
In this paper, we investigated the average throughput maximization of cooperative communications networks under a jamming attack. In addition, the energy-constraint problem of the relay was taken into account. We proposed a POMDP-based scheme to obtain the optimal mode decision policy that maximizes the long-term throughput by taking future rewards into account. Simulation results confirmed that the proposed scheme improves the overall throughput of cooperative communications networks and outperforms the Myopic scheme and the Direct Link Only scheme under the jamming attack. In the future, we would like to investigate a joint relay and channel selection scheme for networks with multiple relays and multiple channels to enhance the overall throughput of the network. Moreover, multiple jammers should be considered, and an actor-critic-based scheme should be studied to determine the optimal policy in cooperative communications networks with energy-constrained relays under multiple jammers. Therefore, the key challenge for future work is to find the optimal solution for the source to select the best relay and channel in the presence of multiple channels, relays, and jammers.

Figure 1. The system model of the proposed scheme.

Figure 3. A Markov chain model of the jammer.

Figure 4. Flowchart of the proposed scheme.

Figure 6. Average throughput versus capacity of the battery.

Figure 7. Average throughput versus capacity of the battery when detection probability $P_d$ = 0.4, 0.6, and 0.9.

Table 1. The notation list.
| Symbol | Description |
| --- | --- |
| $\tau_{SD}$, $\tau_{SR}$, $\tau_{RD}$ | Transmission time between S and D, S and R, and R and D, respectively |
| $P_S$, $P_R$ | Transmission power at the source and the relay |
| $h_{SD}$, $h_{SR}$, $h_{RD}$ | Channel coefficients between S and D, S and R, and R and D |
| $\gamma$ | SNR of the channel between the source and the jammer |
| $\alpha$ | Discount factor |
| $P_d$, $P_f$ | Probability of detection and false alarm |
| $E_h(t)$ | Energy (in energy units) harvested by the relay in time slot $t$, with $E_h(t) \in \{e^h_1, e^h_2, e^h_3, \ldots, e^h_\nu\}$ and $0 < e^h_1 < e^h_2 < e^h_3 < \cdots < e^h_\nu < E_{ca}$ |
Figure 8. Average throughput versus detection probability $P_d$ when transmission energy $E_{tr}$ = 4, 6, and 8.