Unmanned-Aerial-Vehicle-Assisted Secure Free Space Optical Transmission in Internet of Things: Intelligent Strategy for Optimal Fairness

Xu, Fang; Dong, Mingda

doi:10.3390/s24248070


by Fang Xu ¹,* and Mingda Dong ²

¹ College of Electronic and Information Engineering, Southwest University, Chongqing 400715, China

² Qualcomm Communication Technologies (Shanghai) Co., Ltd., Shanghai 201208, China

* Author to whom correspondence should be addressed.

Sensors 2024, 24(24), 8070; https://doi.org/10.3390/s24248070

Submission received: 16 October 2024 / Revised: 16 November 2024 / Accepted: 19 November 2024 / Published: 18 December 2024

(This article belongs to the Special Issue Advances in Security for Emerging Intelligent Systems)


Abstract: In this article, we consider a UAV (unmanned aerial vehicle)-assisted free space optical (FSO) secure communication network. Since an FSO signal cannot be detected by eavesdroppers without proper beam alignment and security authentication, a base station (BS) employs the FSO technique to transfer information to multiple authenticated sensors, with a UAV relay operating in decode-and-forward (DF) mode to improve transmission security and reliability. All sensors first send information to the UAV to obtain security authentication, and the UAV then forwards the corresponding information to them. Successive interference cancellation (SIC) is used to decode the information received at the UAV and at all authenticated sensors. To account for fairness, we introduce a statistical metric for evaluating the network performance, i.e., the maximum decoding outage probability over all authenticated sensors. Applying a deep reinforcement learning (DRL) approach, we obtain a near-optimal scheme for secure transmit power allocation. With a well-trained allocation scheme, approximate closed-form expressions for the optimal transmit power levels can be obtained. Through numerical examples, we illustrate the design trade-offs of such a system. Additionally, the validity of our approach was verified by comparison with the result of an exhaustive search: the result with DRL was only 0.3% higher than that with exhaustive search. These results can provide guidelines for the fairness-aware design of UAV-assisted secure FSO communication networks.

Keywords:

secure FSO communication; security authentication; successive interference cancellation (SIC); decoding outage probability; UAV relay

1. Introduction

Secure information transmission has been widely investigated in the past, e.g., [1,2,3,4,5], where complicated techniques were needed to avoid information leakage. Since eavesdroppers cannot directly receive an optical signal without beam alignment and security authentication, FSO communication is a promising technique for secure information transmission. Compared to radio-frequency (RF) transmission, this technique offers several advantages, e.g., high security, high data rate, long transmission distance, and low interference. However, optical signals can only be transferred over a line-of-sight (LoS) path, and the signal quality deteriorates considerably if obstacles exist in the transmission link. With the rapid development of wireless communications, longer transmission distances and higher reliability are increasingly desirable in real applications. Because a relay can considerably improve communication coverage and reliability, FSO relaying transmission has attracted a lot of attention from both industry and academia over the past decade.

1.1. Previous Work

FSO relaying transmission has been extensively investigated in the past, e.g., [6,7,8,9,10,11]. Refs. [6,7,8,9,10,11] mainly focused on mixed FSO/RF transmission. More specifically, Ref. [6] considered a dual-hop multi-user relay system with mixed FSO/RF links, where closed-form expressions for the outage probability and ergodic capacity were derived. Ref. [7] considered an FSO backhaul network, where concise mathematical expressions for different performance metrics were derived, e.g., outage probability and average bit error rate (BER). Ref. [8] considered a mixed FSO/RF spectrum sharing network, where the transmit power and jamming power of a secondary user (SU) were jointly optimized to achieve the best secrecy performance under both an RF-dominant case and an FSO-dominant case. Ref. [9] considered a mixed RF/FSO dual-hop transmission system, where closed-form expressions for the outage probability and bit error probability were derived. Ref. [10] analyzed the secrecy outage probability of a dual-hop relaying transmission system involving a hybrid MIMO RF/FSO link, where four transmit antenna selection (TAS) schemes were proposed to enhance the secrecy performance under imperfect CSI. Ref. [11] investigated throughput maximization in a parallel hybrid RF/FSO channel, where the optimal relay selection and time allocation policies were obtained for buffer-aided and non-buffer-aided relays. However, since RF transmission is broadcast in nature, all sensors and eavesdroppers can still receive RF signals from a UAV, even with security authentication, and a more complicated technique is needed to ensure the privacy of information transmission. Combining the above properties of the FSO technique with [12], the security performance of mixed FSO/RF transmission is usually not as good as that of FSO transmission alone.

Refs. [13,14,15,16,17,18,19,20,21,22,23] mainly considered single FSO relay transmission with high security performance. Ref. [13] considered a three-node cooperative FSO transmission network with an energy harvesting (EH) decode-and-forward (DF) relay, where a novel harvest–store–use (HSU)-based relaying transmission strategy was proposed to improve the overall performance. Ref. [14] considered a single FSO-based distinct eavesdropper (ED) located near the relay and wiretapping the FSO link, where closed-form expressions for the secrecy outage probability (SOP) were derived for three different application scenarios. Ref. [15] investigated secure communications with the aid of hybrid FSO/RF links in a two-phase uplink transmission, where the trajectory and power allocation of the UAV were jointly optimized to maximize the average secrecy rate of the network. Ref. [16] investigated the physical layer security (PLS) performance of a DF dual-hop mixed FSO/RF communication network in the presence of multiple eavesdroppers, where lower bounds for the SOP and the effective secrecy throughput (EST) were derived. Ref. [17] considered an FSO communication network with a single relay, where a distance-dependent cooperative scheme was proposed to achieve a higher diversity order. Ref. [18] considered a relay-based FSO communication network, where a novel technique was proposed to regenerate the optical signal. Ref. [19] considered an FSO communication system and proposed a novel protocol for enhancing the cooperative diversity, where a closed-form asymptotic expression for the outage probability was derived. Ref. [20] considered a dynamic cooperative FSO transmission network, where two resource-allocation schemes were proposed to jointly optimize the performance. Ref. [21] proposed a novel relay selection strategy to improve the quality of transmission in a multi-relay-assisted FSO transmission network, where the closed-form outage probability was derived. Ref. [22] proposed a novel three-stage methodology for a cooperative FSO transmission network, where closed-form expressions for the conditional error probability were derived. Ref. [23] considered MIMO relay-assisted FSO communication, where the performance of different methods was compared via outage probability analysis in the case of independent fading among the different apertures of the communication nodes. Note that all the above references either derived closed-form expressions or used iterative algorithms to optimize the performance. Because the optimization problem for fairness-aware secrecy is extremely complex, the approaches adopted in these references are not suitable for optimizing such a metric.

Since UAVs can be flexibly deployed in hard-to-reach locations, they have recently been applied to FSO communication networks to forward information [24,25,26,27]. Ref. [24] demonstrated that the use of a UAV relay can considerably improve the performance of conventional FSO communication networks. In [25,26,27], the performance of UAV-aided FSO relaying transmission networks was analyzed and optimized. However, to the best of our knowledge, an optimal fairness-aware design scheme for a UAV-assisted secure FSO relay communication network has not been fully investigated to date.

1.2. Contributions

In this article, we consider a UAV-assisted secure FSO relay communication network, where a BS transmits information to multiple remote sensors with the help of a UAV relay. More specifically, the UAV decodes the received information using SIC and then forwards it to multiple remote sensors. All the sensors adopt SIC for information decoding. We derive a closed-form expression for the decoding outage probability at each sensor. We employ a DRL approach to arrive at a near-optimal transmit power allocation scheme that minimizes the maximum decoding outage probability over all sensors. Accordingly, approximate closed-form expressions for the optimal transmit power levels can be obtained. The main contributions of our paper are summarized below:

We analyzed the decoding outage probability for all sensors in a UAV-assisted FSO relay communication network, where all closed-form expressions were derived.
We used a DRL-based approach to obtain the optimal transmit power allocation scheme that minimizes the maximum decoding outage probability of all sensors. Since the relationship between input and output can be modeled using a well-trained policy network, approximate analytical expressions for the optimal transmit power levels can be obtained. To the best of our knowledge, this is the first work to investigate an optimal transmission scheme for a UAV-assisted FSO communication network in the IoT from the perspective of fairness.
We illustrate some design trade-offs and compare various design approaches through selected numerical examples. These results can facilitate the fairness-aware design of UAV-assisted FSO communication networks in the IoT.

The rest of this paper is organized as follows. Section 2 presents the system and FSO channel model under consideration. The closed-form outage probabilities for all authenticated sensors are derived in Section 3. Section 4 addresses the issue of parameter optimization in terms of fairness-aware outage probability minimization. After that, we present some selected numerical examples and discuss some design trade-offs in Section 5. Finally, some concluding remarks are presented in Section 6.

2. System and Channel Model

2.1. System Model

We consider a secure FSO relaying communication network comprising one BS, one UAV relay, one eavesdropper, and multiple sensors, as shown in Figure 1. Since the complexity of implementing wavelength division multiplexing (WDM) is extremely high, we use authentication-based non-orthogonal multiple access (NOMA) to simultaneously transmit information to multiple sensors. More specifically, the BS first transfers data to the UAV. After that, the UAV decodes the received information and forwards it to all sensors. An optical phased array can be used to facilitate the alignment and tracking of an optical beam [28], in order to perform accurate dynamic alignment for FSO communication. Note that all the sensors need to send a signal to the UAV for security authentication, after which they are authorized to access the network for information transmission and reception. An eavesdropper also requires security authentication from the UAV, like a normal sensor. We assume that the probability that the UAV identifies an eavesdropper as a normal sensor is $P_{\mathrm{secr}}$, which is also regarded as the secrecy outage of data transmission. SIC is used to decode the received information at the UAV and at all sensors. According to [29], a UAV can employ spatial light modulation and a lens system to generate multiple light beams that point at the authenticated sensors. Such a system is very simple to realize in real application scenarios. In addition, we ignore the conversion efficiency from the electrical signal to the optical signal, and the direct optical link is assumed to be unavailable due to the existence of obstacles. Within the first phase of the information transmission, the received signal of the UAV is denoted by

$$ y_{r}(t) = \left(s_{1}(t) + s_{2}(t) + \dots + s_{M}(t)\right)\sqrt{I_{SR}} + N_{R}(t), \tag{1} $$

where $I_{SR}$ denotes the channel power gain of the FSO link from the BS to the UAV, $s_{i}(t)$ denotes the transmitted signal for sensor $i$, $i = 1, 2, \dots, M$, and $N_{R}(t)$ denotes the received noise signal at the UAV. Within the second phase of information transmission, the received signal of sensor $i$, $i = 1, 2, \dots, M$, is given by

$$ y_{D_{i}}(t) = s_{i}(t)\sqrt{I_{RD_{i}}} + N_{D_{i}}(t), \tag{2} $$

where $I_{RD_{i}}$ denotes the channel power gain of the FSO link from the UAV to sensor $i$, and $N_{D_{i}}(t)$ denotes the noise signal received at sensor $i$.

2.2. Channel Model

We consider a composite optical channel model [30], where the channel power gain is the product of the gains due to path loss, atmospheric turbulence, and pointing error. Specifically, the overall channel power gain $I_{SR}$ can be calculated by

$$ I_{SR} = I_{SR}^{l}\, I_{SR}^{p}\, I_{SR}^{a}. \tag{3} $$

Here, $I_{SR}^{l}$ denotes the power gain of the path loss, $I_{SR}^{p}$ denotes the power gain of the pointing error, and $I_{SR}^{a}$ denotes the power gain of the atmospheric turbulence, which follows a gamma–gamma distribution. In particular, the PDF (probability density function) of $I_{SR}$ is given by

$$ f_{I_{SR}}(x) = \frac{\alpha_{SR}\beta_{SR}\epsilon_{SR}^{2}}{I_{SR}^{l} A_{SR}\,\Gamma(\alpha_{SR})\,\Gamma(\beta_{SR})}\; G_{1,3}^{3,0}\left[\frac{\alpha_{SR}\beta_{SR}}{I_{SR}^{l} A_{SR}}\, x \,\middle|\, \begin{matrix} \epsilon_{SR}^{2} \\ \epsilon_{SR}^{2}-1,\ \alpha_{SR}-1,\ \beta_{SR}-1 \end{matrix}\right]. \tag{4} $$

Here, $G_{1,3}^{3,0}[\cdot]$ denotes the Meijer G-function, $\alpha_{SR}$ and $\beta_{SR}$ are parameters related to the atmospheric turbulence, while $A_{SR}$ and $\epsilon_{SR}$ are parameters related to the pointing error. $I_{SR}^{l}$ is given by

$$ I_{SR}^{l} = \exp(-k L_{SR}), \tag{5} $$

where $k$ denotes the attenuation coefficient, and $L_{SR}$ denotes the distance from the BS to the UAV.
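As an illustration of the composite model in Equations (3) and (5), the following minimal sketch draws random samples of the first-hop gain. It rests on assumptions that the text does not state: the turbulence gain is taken as a unit-mean gamma–gamma variable (the product of two independent unit-mean gamma variables), and the pointing-error gain follows the common zero-boresight model, for which $I^{p} = A\,U^{1/\epsilon^{2}}$ with $U$ uniform on $(0, 1]$. The function names are hypothetical.

```python
import math
import random


def path_loss_gain(k, distance):
    """Path-loss gain exp(-k * L), as in Equation (5)."""
    return math.exp(-k * distance)


def sample_channel_gain(alpha, beta, eps, a0, k, distance, rng):
    """Draw one sample of the composite gain I = I^l * I^p * I^a (Equation (3)).

    Assumptions (not stated in the text): unit-mean gamma-gamma turbulence and
    a zero-boresight pointing-error model with I^p = a0 * U ** (1 / eps**2).
    """
    # Gamma-gamma turbulence: product of two independent unit-mean gamma variables.
    turbulence = rng.gammavariate(alpha, 1.0 / alpha) * rng.gammavariate(beta, 1.0 / beta)
    # 1 - random() lies in (0, 1], which avoids raising zero to a power.
    pointing = a0 * (1.0 - rng.random()) ** (1.0 / eps ** 2)
    return path_loss_gain(k, distance) * pointing * turbulence


def mean_gain(alpha, beta, eps, a0, k, distance, n=20000, seed=0):
    """Monte Carlo estimate of the mean composite gain."""
    rng = random.Random(seed)
    total = sum(
        sample_channel_gain(alpha, beta, eps, a0, k, distance, rng) for _ in range(n)
    )
    return total / n
```

Under these assumptions the mean gain is $I^{l} A\,\epsilon^{2}/(1+\epsilon^{2})$, which gives a quick sanity check on the sampler.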

In terms of the second-hop links, following a similar analysis process, the PDF of $I_{RD_{i}}$, $i = 1, 2, \dots, M$, can be written as

$$ f_{I_{RD_{i}}}(x) = Q_{RD_{i}}\; G_{1,3}^{3,0}\left[\frac{\alpha_{RD_{i}}\beta_{RD_{i}}}{I_{RD_{i}}^{l} A_{RD_{i}}}\, x \,\middle|\, \begin{matrix} \epsilon_{RD_{i}}^{2} \\ \epsilon_{RD_{i}}^{2}-1,\ \alpha_{RD_{i}}-1,\ \beta_{RD_{i}}-1 \end{matrix}\right], \tag{6} $$

where $Q_{RD_{i}}$ denotes $\frac{\alpha_{RD_{i}}\beta_{RD_{i}}\epsilon_{RD_{i}}^{2}}{I_{RD_{i}}^{l} A_{RD_{i}}\,\Gamma(\alpha_{RD_{i}})\,\Gamma(\beta_{RD_{i}})}$. Here, $\alpha_{RD_{i}}$, $\beta_{RD_{i}}$, $\epsilon_{RD_{i}}$, $I_{RD_{i}}$ and $I_{RD_{i}}^{l}$ are the channel parameters for the link from the UAV to sensor $i$, $i = 1, 2, \dots, M$, which have the same physical meaning as $\alpha_{SR}$, $\beta_{SR}$, $\epsilon_{SR}$, $I_{SR}$ and $I_{SR}^{l}$ in the first-hop link. Then, the CDF of $I_{RD_{i}}$ can be derived as follows:

$$ F_{I_{RD_{i}}}(x) = Q_{RD_{i}}\, x\; G_{2,4}^{3,1}\left[\frac{\alpha_{RD_{i}}\beta_{RD_{i}}}{I_{RD_{i}}^{l} A_{RD_{i}}}\, x \,\middle|\, \begin{matrix} 0,\ \epsilon_{RD_{i}}^{2} \\ \epsilon_{RD_{i}}^{2}-1,\ \alpha_{RD_{i}}-1,\ \beta_{RD_{i}}-1,\ -1 \end{matrix}\right]. \tag{7} $$
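The step from the PDF in Equation (6) to the CDF in Equation (7) can be sketched as follows, assuming the standard integral identity for the Meijer G-function, $\int_{0}^{x} G_{p,q}^{m,n}[ct \mid a_{1..p};\, b_{1..q}]\,dt = x\, G_{p+1,q+1}^{m,n+1}[cx \mid 0, a_{1..p};\, b_{1..q}, -1]$. The shorthand $c_{i}$ is introduced here only for brevity and is not part of the original notation.

```latex
\begin{aligned}
c_{i} &= \frac{\alpha_{RD_{i}}\beta_{RD_{i}}}{I_{RD_{i}}^{l} A_{RD_{i}}}, \\
F_{I_{RD_{i}}}(x) &= \int_{0}^{x} f_{I_{RD_{i}}}(t)\,dt
  = Q_{RD_{i}} \int_{0}^{x}
    G_{1,3}^{3,0}\left[ c_{i} t \,\middle|\,
      \begin{matrix} \epsilon_{RD_{i}}^{2} \\
      \epsilon_{RD_{i}}^{2}-1,\ \alpha_{RD_{i}}-1,\ \beta_{RD_{i}}-1 \end{matrix}
    \right] dt \\
 &= Q_{RD_{i}}\, x\,
    G_{2,4}^{3,1}\left[ c_{i} x \,\middle|\,
      \begin{matrix} 0,\ \epsilon_{RD_{i}}^{2} \\
      \epsilon_{RD_{i}}^{2}-1,\ \alpha_{RD_{i}}-1,\ \beta_{RD_{i}}-1,\ -1 \end{matrix}
    \right].
\end{aligned}
```

The integration adds one upper parameter ($0$) and one lower parameter ($-1$), which is why the indices change from $G_{1,3}^{3,0}$ to $G_{2,4}^{3,1}$.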

3. Outage Probability Analysis

In this section, we analyze the decoding outage probability at each sensor, where the transmit power level of $s_{i}(t)$ is denoted by $P_{i}$, $i = 1, 2, \dots, M$. We assume that $P_{1} \geq P_{2} \geq \dots \geq P_{M}$. Note that, unlike conventional SIC, the order of the transmit power levels is not determined solely by the ordering of the channel power gains; the optimal transmit power allocation is obtained using the DRL approach. Combined with the descriptions in Section 2, and noting that an imperfect SIC scheme is adopted for decoding the signals in descending order of transmit power level, the received signal-to-interference-and-noise ratio (SINR) of the UAV for decoding $s_{i}(t)$ is calculated by

$$ \mathrm{SINR}_{R,i} = \frac{P_{i}}{(P_{i+1} + \dots + P_{M}) + u[i-2]\,\xi\,(P_{1} + \dots + P_{i-1}) + \sigma^{2}/I_{SR}}. \tag{8} $$

Here, $\xi$ denotes the proportional coefficient of the residual interference caused by imperfect decoding [31], and $u[\cdot]$ denotes the unit step function. Using $\gamma_{\mathrm{th}}$ to denote the SINR threshold for successful information decoding, the outage probability of decoding $s_{i}(t)$ at the UAV can be written as

$$ 1 - \prod_{k=1}^{i}\left(1 - \Pr[\mathrm{SINR}_{R,k} < \gamma_{\mathrm{th}}]\right) = 1 - \prod_{k=1}^{i}\left(1 - \frac{Q_{k}\,\epsilon_{SR}^{2}}{\Gamma(\alpha_{SR})\,\Gamma(\beta_{SR})}\, G_{2,4}^{3,1}\left[Q_{k} \,\middle|\, \begin{matrix} 0,\ \epsilon_{SR}^{2} \\ \epsilon_{SR}^{2}-1,\ \alpha_{SR}-1,\ \beta_{SR}-1,\ -1 \end{matrix}\right]\right), \tag{9} $$

where $Q_{k}$ denotes $\frac{\alpha_{SR}\beta_{SR}\gamma_{\mathrm{th}}\sigma^{2} / (A_{SR} I_{SR}^{l})}{P_{k} - \gamma_{\mathrm{th}}\left(P_{k+1} + \dots + P_{M} + u[k-2]\,\xi\,(P_{1} + \dots + P_{k-1})\right)}$.
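To make the indexing in Equation (8) concrete, the following minimal sketch evaluates the SINR for decoding $s_{i}(t)$ under imperfect SIC. The function name and argument names are hypothetical; the power list is assumed to be sorted in descending order, as stated in the text, and the index $i$ is 1-based to match the equations.

```python
def sinr_with_imperfect_sic(powers, i, xi, noise_power, channel_gain):
    """SINR for decoding s_i (1-based index i), following Equation (8).

    powers:       [P_1, ..., P_M], assumed sorted in descending order
    xi:           residual-interference coefficient of imperfect SIC
    noise_power:  sigma^2
    channel_gain: channel power gain of the link being considered
    """
    if not 1 <= i <= len(powers):
        raise ValueError("i must satisfy 1 <= i <= M")
    # Signals that have not been decoded yet: P_{i+1} + ... + P_M.
    not_yet_decoded = sum(powers[i:])
    # Residual of already-decoded signals: xi * (P_1 + ... + P_{i-1}).
    # The slice is empty for i == 1, which reproduces the unit step u[i - 2].
    residual = xi * sum(powers[: i - 1])
    return powers[i - 1] / (not_yet_decoded + residual + noise_power / channel_gain)
```

The same function can be reused for the second hop by passing the gain of the UAV-to-sensor link instead of the BS-to-UAV gain.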

$\Pr[\mathrm{SINR}_{R,k} < \gamma_{\mathrm{th}}]$ can be calculated by integrating $f_{I_{SR}}(x)$ from 0 to $A_{SR} I_{SR}^{l} Q_{k} / (\alpha_{SR}\beta_{SR})$. The corresponding integral result can be obtained based on Equation (26) of [32]. During the second-hop transmission from the UAV to all sensors, the UAV utilizes spatial modulation to generate multiple directional light beams that point at the corresponding sensors [29]. Applying imperfect SIC, in the case that the eavesdropper does not obtain security authentication from the UAV, the received SINR of sensor $j$, $j = 1, 2, \dots, M$, for decoding $s_{i}(t)$ can be written as

$$ \mathrm{SINR}_{D_{j \to i}} = \frac{P_{i}}{P_{i+1} + \dots + P_{M} + u[i-2]\,\xi\,(P_{1} + \dots + P_{i-1}) + \frac{\sigma^{2}}{I_{RD_{j}}}}. \tag{10} $$

Accordingly, the resulting outage probability at sensor $j$ for decoding $s_{i}(t)$ is derived as

$$ \Pr[\mathrm{SINR}_{D_{j \to i}} < \gamma_{\mathrm{th}}] = \int_{0}^{Q_{RD}^{i}} f_{I_{RD_{j}}}(x)\,dx = F_{I_{RD_{j}}}(Q_{RD}^{i}), \tag{11} $$

where $Q_{RD}^{i}$ denotes $\frac{\gamma_{\mathrm{th}}\sigma^{2}}{P_{i} - \gamma_{\mathrm{th}}\left(P_{i+1} + \dots + P_{M} + u[i-2]\,\xi\,(P_{1} + \dots + P_{i-1})\right)}$. Note that a decoding outage occurs when either the received SINR of the first-hop link or that of the second-hop link is not high enough to decode the information properly. As such, the decoding outage probability for sensor $i$, $i = 1, 2, \dots, M$, can be calculated as follows:

$$ P_{\mathrm{out},i} = 1 - (1 - P_{\mathrm{secr}}) \prod_{q=1}^{i} \Pr[\mathrm{SINR}_{D_{i \to q}} \geq \gamma_{\mathrm{th}}] \prod_{k=1}^{i} \Pr[\mathrm{SINR}_{R,k} \geq \gamma_{\mathrm{th}}]. \tag{12} $$

Combining Equations (9) and (11), a closed-form expression for Equation (12) can be derived accordingly. Note that a secrecy outage, which occurs with probability $P_{\mathrm{secr}}$ when the eavesdropper also obtains the security authentication from the UAV, is also regarded as a decoding outage in this case.
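The product form of Equation (12) can be checked numerically with a sketch like the one below. Instead of the closed-form Meijer G expressions, it estimates each probability empirically from channel gain samples supplied by the caller, which is an assumption of this sketch and not the method used in the text. The function names are hypothetical, and the same noise power $\sigma^{2}$ is used on both hops, as in Equations (8) and (10).

```python
def gain_threshold(powers, i, gamma_th, xi, noise_power):
    """Smallest channel gain for which s_i (1-based) reaches SINR >= gamma_th.

    Obtained by rearranging Equations (8) and (10); returns infinity when the
    interference alone already prevents decoding.
    """
    interference = sum(powers[i:]) + xi * sum(powers[: i - 1])
    margin = powers[i - 1] - gamma_th * interference
    if margin <= 0.0:
        return float("inf")
    return gamma_th * noise_power / margin


def success_probability(samples, threshold):
    """Empirical Pr[gain >= threshold] from a list of channel gain samples."""
    return sum(1 for g in samples if g >= threshold) / len(samples)


def outage_probability(powers, i, gamma_th, xi, noise_power, p_secr,
                       sr_samples, rd_samples):
    """Decoding outage of sensor i, following the product form of Equation (12).

    sr_samples: samples of the BS-to-UAV gain I_SR
    rd_samples: samples of the UAV-to-sensor-i gain I_RD_i
    """
    prob = 1.0 - p_secr
    for q in range(1, i + 1):
        threshold = gain_threshold(powers, q, gamma_th, xi, noise_power)
        prob *= success_probability(rd_samples, threshold)  # sensor i decodes s_q
        prob *= success_probability(sr_samples, threshold)  # UAV decodes s_q
    return 1.0 - prob
```

With very strong channels every decoding step succeeds and the outage reduces to $P_{\mathrm{secr}}$, which is the floor implied by Equation (12).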

4. Fairness-Aware Optimal Transmission Scheme

In this section, we investigate the optimal transmission scheme of a UAV-assisted FSO communication network from the perspective of fairness. We apply a DRL-based approach to arrive at a near-optimal allocation scheme for the transmit power levels. In particular, since a deep neural network (DNN) can model the relationship between the input and output, approximate closed-form expressions for the near-optimal transmit power levels can be obtained from a well-trained policy.

To ensure fairness among all authenticated sensors, the objective function is set to their maximum decoding outage probability. Note that the fairness performance improves as the maximum decoding outage probability decreases. Combining all the above analyses, the optimization problem is formulated as follows:

$$ \begin{aligned} \min_{P_{1}, P_{2}, \dots, P_{M}} \quad & \max\{P_{\mathrm{out},1}, P_{\mathrm{out},2}, \dots, P_{\mathrm{out},M}\}, \\ \text{s.t.} \quad & 0 < P_{1} + P_{2} + \dots + P_{M} \leq P_{\max}, \\ & 0 < P_{1}, \dots, 0 < P_{M}. \end{aligned} $$

Here, $P_{\mathrm{out},i}$, $i = 1, 2, \dots, M$, was derived in Section 3. We can see that such an objective function is extremely complex and not convex with respect to all variables. It is therefore not possible to derive closed-form optimal solutions in this case. Note that the FSO channel parameters are unstable under dynamic weather conditions. With iterative algorithms, the optimization process needs to be repeated whenever the model parameters change, which leads to very high computational complexity. In addition, the objective function involves a complicated Meijer G-function, which does not have a gradient expression. As such, a gradient-based approach is not feasible for handling the corresponding optimization problem. Therefore, we use the DRL approach to arrive at a near-optimal policy model for approximating the closed-form expressions of the optimal transmit power levels. The state space $S$ is defined as $\{\alpha_{SR}, \beta_{SR}, \alpha_{RD_{1}}, \beta_{RD_{1}}, \alpha_{RD_{2}}, \beta_{RD_{2}}, \dots, \alpha_{RD_{M}}, \beta_{RD_{M}}\}$, and the action space $R$ is defined as $\{P_{1}, P_{2}, \dots, P_{M}\}$.
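For a small number of sensors, the min-max problem above can also be solved by brute force, which is the kind of exhaustive-search baseline the DRL result is compared against. The following minimal sketch searches a uniform grid of power vectors that use the full budget $P_{\max}$. The grid resolution, the function names, and the `outage_fn` callback (which maps a power vector to the list of per-sensor outage probabilities) are assumptions of this sketch; the ordering $P_{1} \geq \dots \geq P_{M}$ is not enforced here.

```python
from itertools import product


def grid_search_min_max(outage_fn, num_sensors, p_max, steps):
    """Exhaustive search for the power vector minimising the maximum outage.

    Candidate vectors lie on a uniform grid with spacing p_max / steps, have
    strictly positive entries, and sum to p_max.
    """
    best_powers, best_value = None, float("inf")
    for counts in product(range(1, steps), repeat=num_sensors - 1):
        last = steps - sum(counts)  # grid units left for the final sensor
        if last < 1:
            continue  # would violate P_M > 0
        powers = tuple(p_max * c / steps for c in counts + (last,))
        value = max(outage_fn(powers))
        if value < best_value:
            best_powers, best_value = powers, value
    return best_powers, best_value
```

The number of candidates grows combinatorially with the number of sensors and the grid resolution, and the search must be repeated whenever the channel parameters change, which is the cost a trained policy is meant to avoid.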

4.1. Reward Function

Since our objective is to minimize the maximum decoding outage probability for all sensors, we define the inverse of the above objective function as the reward function, shown as follows

$$ R_{\mathrm{statis}} = \frac{1}{\max\{P_{\mathrm{out},1}, P_{\mathrm{out},2}, \dots, P_{\mathrm{out},M}\}}. \tag{13} $$

4.2. Deep Actor Networks and Critic Network

In this case, by interacting with one critic network, several actor networks can be trained to determine the optimal actions under a dynamic state input. Since the gradient of the reward function may be NaN for some random experience tuples, we introduce a critic network to approximate it during the training process. As shown in Figure 2, the critic network is used to output an approximate reward value for a specific power allocation scheme, and the actor networks are used to determine the transmit power levels of all sensors. Here, the ReLU function is used as the output activation function. The parameter set of the critic network is denoted by $\theta^{Q}$, and the parameter set of actor network $k$ is denoted by $\theta^{\mu_{k}}$, $k = 1, 2, \dots, M$. The input of all actor networks is the statistical channel state information $\vec{s}_{t}$, which is defined as $[\alpha_{SR}, \beta_{SR}, \alpha_{RD_{1}}, \beta_{RD_{1}}, \alpha_{RD_{2}}, \beta_{RD_{2}}, \dots, \alpha_{RD_{M}}, \beta_{RD_{M}}]^{T}$. The input of the critic network consists of a state vector $\vec{s}_{t}$ and an action vector $\vec{a}_{t}$, where $\vec{a}_{t}$ is denoted by $[P_{1}, P_{2}, \dots, P_{M}]^{T}$. More specifically, within each training iteration, the critic network evaluates the output of the actor networks and then feeds back an estimated reward value to them, based on which all the actor networks can improve their performance by performing gradient ascent. Also note that we use $Q(\vec{s}_{t}, \vec{a}_{t} \mid \theta^{Q})$ to denote the output of the critic network and $\mu_{k}(\vec{s}_{t} \mid \theta^{\mu_{k}})$ to denote the output of actor network $k$, $k = 1, 2, \dots, M$. After a certain number of iterations, all actor networks can be properly trained. Since the mathematical relationship between the input and output can be modeled by a deep neural network, approximate closed-form expressions for the optimal transmit power levels can be obtained from well-trained actor networks.
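The network layout described here can be sketched in plain Python as below. The layer widths (three hidden layers of 16 neurons per actor, one hidden layer of 128 neurons for the critic) are taken from the numerical results section. Everything else is an assumption of this sketch: ReLU is used on the hidden layers, the ReLU output activation is applied to the actors only while the critic output is left linear, the weights are initialised uniformly at random, and all function names are hypothetical.

```python
import random


def init_mlp(sizes, rng):
    """Random weights and zero biases for a fully connected network."""
    return [
        ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def relu(x):
    return x if x > 0.0 else 0.0


def forward(layers, inputs, relu_output):
    """Forward pass: ReLU on hidden layers, optionally on the output layer."""
    activations = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        is_last = index == len(layers) - 1
        if is_last and not relu_output:
            activations = pre
        else:
            activations = [relu(v) for v in pre]
    return activations


def build_networks(num_sensors, rng):
    """One scalar-output actor per sensor plus a single critic."""
    # State: (alpha, beta) for the first hop and for each UAV-to-sensor link.
    state_dim = 2 * (num_sensors + 1)
    actors = [init_mlp([state_dim, 16, 16, 16, 1], rng) for _ in range(num_sensors)]
    # Critic input: state vector concatenated with the action vector.
    critic = init_mlp([state_dim + num_sensors, 128, 1], rng)
    return actors, critic
```

A forward pass feeds the same state vector to every actor, collects the $M$ scalar outputs into the action vector, and passes the concatenated state and action to the critic.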

4.3. Model Training Process

Prior to the beginning of training, we need to set up a memory buffer of size $K$ to save the experience tuples. In particular, we randomly generate a statistical channel state vector $\vec{s}_{t} = [\alpha_{SR}, \beta_{SR}, \alpha_{RD_{1}}, \beta_{RD_{1}}, \dots, \alpha_{RD_{M}}, \beta_{RD_{M}}]^{T}$, and then feed it to the actor networks to arrive at an action vector $\vec{a}_{t} = [P_{1}, \dots, P_{M}]^{T}$. By substituting the obtained state vector and action vector into Equation (13), the resulting reward value $R_{t}$ can be calculated. The new state–action–reward experience tuple $\vec{s}_{t}$ - $\vec{a}_{t}$ - $R_{t}$ is put into the memory buffer. This process is repeated until the memory buffer is full.
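The memory buffer described here, together with the random replacement and random extraction used during training, can be sketched as follows. The class and method names are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-size store of (state, action, reward) experience tuples."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []
        self.rng = rng if rng is not None else random.Random()

    def is_full(self):
        return len(self.items) >= self.capacity

    def add(self, state, action, reward):
        """Append while filling; afterwards overwrite a randomly chosen tuple."""
        item = (state, action, reward)
        if not self.is_full():
            self.items.append(item)
        else:
            self.items[self.rng.randrange(self.capacity)] = item

    def sample(self, n):
        """Draw n distinct tuples uniformly at random for a training batch."""
        return self.rng.sample(self.items, n)
```

Training would start once `is_full()` returns true, after which each new tuple replaces an existing one so the buffer size stays at $K$.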

Once $K$ experience tuples are available, we can start the training process. During each iteration, following the process described above, we first generate a new state–action–reward tuple and put it into the memory buffer, where it randomly replaces an existing tuple. After that, we extract $N$ experience tuples at random from the memory buffer and use them to fit the critic network that approximates the reward function. The parameter set of the critic network is updated by minimizing the loss function as follows:

$$ \theta^{Q'} = \arg\min_{\theta^{Q}} \sum_{i=1}^{N} \left(Q(\vec{s}_{i}, \vec{a}_{i} \mid \theta^{Q}) - R_{i}\right)^{2}. \tag{14} $$

After obtaining an updated critic network, we further update the actor networks by performing joint gradient ascent with the help of the chain rule. The parameter set of actor network $j$, $j = 1, 2, \dots, M$, is updated as follows:

$$ \theta^{\mu_{j}'} \leftarrow \theta^{\mu_{j}} + \frac{\epsilon}{N} \sum_{i=1}^{N} \nabla_{\mu_{j}(\vec{s}_{i} \mid \theta^{\mu_{j}})} Q(\vec{s}, \vec{a} \mid \theta^{Q'}) \Big|_{\vec{s} = \vec{s}_{i},\, \vec{a} = \vec{a}_{i}} \nabla_{\theta^{\mu_{j}}} \mu_{j}(\vec{s}_{i} \mid \theta^{\mu_{j}}), \tag{15} $$

where

$$ \vec{a}_{i} = \left[\mu_{1}(\vec{s}_{i} \mid \theta^{\mu_{1}}), \dots, \mu_{M}(\vec{s}_{i} \mid \theta^{\mu_{M}})\right]^{T}. \tag{16} $$

Here, $\epsilon$ denotes the learning rate for all actor networks.
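The two updates in Equations (14) and (15) can be illustrated with a deliberately simplified sketch in which both networks are replaced by linear models, so that the gradients can be written by hand. This is an assumption made purely for illustration: the critic is $Q(\vec{s}, \vec{a}) = w \cdot [\vec{s}; \vec{a}] + b$, each actor is $\mu_{j}(\vec{s}) = \theta_{j} \cdot \vec{s}$, and the critic takes a single gradient-descent step on the loss rather than a full minimization. All names are hypothetical.

```python
def critic_loss(w, b, batch):
    """Sum of squared errors from Equation (14) for a linear critic."""
    return sum(
        (sum(wi * xv for wi, xv in zip(w, s + a)) + b - r) ** 2
        for s, a, r in batch
    )


def critic_step(w, b, batch, lr):
    """One gradient-descent step on the loss of Equation (14)."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for s, a, r in batch:
        x = s + a  # critic input: state concatenated with action
        err = sum(wi * xv for wi, xv in zip(w, x)) + b - r
        for idx, xv in enumerate(x):
            grad_w[idx] += 2.0 * err * xv
        grad_b += 2.0 * err
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b


def actor_step(thetas, w, batch, lr, state_dim):
    """Joint gradient ascent of Equation (15) for linear actors.

    Chain rule: dQ/dtheta_j = (dQ/da_j) * (d mu_j / d theta_j), where
    dQ/da_j = w[state_dim + j] and d mu_j / d theta_j = s for linear models.
    """
    n = len(batch)
    new_thetas = []
    for j, theta in enumerate(thetas):
        dq_da = w[state_dim + j]
        grad = [sum(dq_da * s[d] for s, _, _ in batch) / n for d in range(state_dim)]
        new_thetas.append([t + lr * g for t, g in zip(theta, grad)])
    return new_thetas
```

With deep networks the same chain rule applies, but both gradient factors are obtained by backpropagation instead of the closed forms used here.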

4.4. Random Exploration

To increase the chance of converging to a globally optimal action policy, we introduce random exploration into all output actions. More specifically, the output of actor network $k$, $k = 1, 2, \dots, M$, is given by

$$ a_{k} = \max\left\{0,\ \mathcal{N}\left(\mu_{k}(\vec{s} \mid \theta^{\mu_{k}}),\ v_{k}\right)\right\}, \tag{17} $$

where $\mathcal{N}(\cdot)$ denotes the normal distribution and $v_{k}$ denotes the corresponding variance, $k = 1, 2, \dots, M$. The transmit power level $P_{k}$, $k = 1, 2, \dots, M$, is then calculated as

$$ P_{k} = \frac{P_{\max}\, a_{k}}{\sum_{j=1}^{M} a_{j}}. \tag{18} $$

At the end of each training iteration, the variance $v_{k}$ is updated to $\beta v_{k}$, $k = 1, 2, \dots, M$, where $\beta$ is a constant between 0 and 1. As such, all the variances become smaller and smaller as training proceeds, so that the scope of exploration is gradually reduced. After a sufficient number of training iterations, all actor networks are trained to determine the near-optimal scheme for transmit power allocation.
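Equations (17) and (18) and the variance decay can be sketched as follows. The function names are hypothetical, and the equal-split fallback used when every explored action is clipped to zero is an assumption of this sketch, since the text does not say how that case is handled.

```python
import math
import random


def explore(actor_means, variances, rng):
    """Equation (17): a_k = max{0, N(mu_k, v_k)} for every actor output."""
    # random.gauss expects a standard deviation, hence the square root.
    return [
        max(0.0, rng.gauss(mu, math.sqrt(v)))
        for mu, v in zip(actor_means, variances)
    ]


def normalise_powers(actions, p_max):
    """Equation (18): scale the actions so that the powers sum to p_max."""
    total = sum(actions)
    if total <= 0.0:
        # Assumption: fall back to an equal split if all actions are zero.
        return [p_max / len(actions)] * len(actions)
    return [p_max * a / total for a in actions]


def decay_variances(variances, beta):
    """End-of-iteration update v_k <- beta * v_k with 0 < beta < 1."""
    return [beta * v for v in variances]
```

Because of the normalization, the resulting powers always use the full budget $P_{\max}$ regardless of the scale of the actor outputs.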

4.5. The Pseudo Code

The pseudo code of the action policy training is shown in Algorithm 1.

Algorithm 1 The pseudo code of training.

for $t \in \{1, 2, \dots, T\}$ do
  Generate a random state vector $\vec{s}_{t}$.
  Generate the action vector $\vec{a}_{t}$ for the given state vector $\vec{s}_{t}$ as in Equations (17) and (18) and the above descriptions.
  Use $\vec{a}_{t}$ and $\vec{s}_{t}$ to calculate the resulting reward $R_{t}$ as in Equation (13).
  Use the state–action–reward tuple $\vec{s}_{t}$ - $\vec{a}_{t}$ - $R_{t}$ to randomly replace an existing tuple in the memory buffer.
  Randomly extract $N$ state–action–reward tuples $\vec{s}_{i}$ - $\vec{a}_{i}$ - $R_{i}$, $i = 1, 2, \dots, N$, from the memory buffer to update the parameter set of the critic network to $\theta^{Q'}$ as in Equation (14).
  Update the parameter sets of all actor networks by performing joint gradient ascent as in Equation (15).
  Update the variance $v_{k}$ to $\beta v_{k}$, $k = 1, \dots, M$.
end for

5. Numerical Results

In this section, we present some selected numerical results to illustrate the effect of the proposed approach. The common parameters, used in the simulation, are shown in Table 1. There are three hidden layers for each actor network, each of which has 16 neurons. As for the critic network, there is only one hidden layer with 128 neurons. To simplify the process, we assume that there are three sensors in the simulation.

Figure 3 presents the minimum value of the maximum decoding outage probability of all sensors as a function of the first-hop channel parameter $\alpha_{SR}$, where $\alpha_{SR}$ varied from 0.3 to 24. As expected, the minimum value of the maximum decoding outage probability obtained from the exhaustive search is very close to that obtained from the DRL approach. The complexity of the proposed DRL approach comes mainly from the offline training, which needs around 90,000 iterations. For online operation, the well-trained deep network only needs to output the action, without any iterations, so the complexity is very low. As such, our proposed approach arrives at near-optimal performance.

Figure 4 presents the statistical transmit power levels of the well-trained actor networks as a function of the first-hop channel parameter $\alpha_{SR}$. We can see that most of the power is allocated to the signal of the first sensor, and the signal of the third sensor has the lowest transmit power level. In addition, we can observe that the transmit power level of the first sensor's signal decreased as $\alpha_{SR}$ increased, while the transmit power levels of the other two sensors' signals showed the opposite trend. Note that the sum of all transmit power levels is always equal to $P_{\max}$ as $\alpha_{SR}$ varies.

Figure 5 presents the average reward over 100 consecutive steps during the training process of the optimal scheme for different learning rate settings. As expected, the training reward first increases as training proceeds and then converges to a stable value. We can also observe that the learning rate of 0.0000007 leads to a considerably higher converged reward value than the learning rates of 0.000007 and 0.00007. As such, it is important to set an appropriate learning rate to achieve good training performance. Note that the fluctuation of the reward is caused by random exploration, which approaches zero after a certain number of training iterations.
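The smoothing used for this kind of training curve can be sketched as below. The text does not say whether the 100-step average is taken over sliding or non-overlapping windows; this sketch assumes non-overlapping blocks and drops a trailing partial block. The function name is hypothetical.

```python
def windowed_average(rewards, window=100):
    """Average reward over consecutive, non-overlapping blocks of `window` steps."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    for start in range(0, len(rewards) - window + 1, window):
        block = rewards[start:start + window]
        averages.append(sum(block) / window)
    return averages
```

Plotting the returned list against the block index gives one point per 100 training steps.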

6. Conclusions

In this paper, we considered a UAV-assisted secure FSO communication network, where the BS transfers information to multiple authenticated sensors via an optical link. We analyzed and minimized the maximum decoding outage probability for all authenticated sensors. More specifically, we applied a DRL-based approach to arrive at a near-optimal scheme for transmit power allocation from the perspective of fairness. The validity of our approach was verified by comparing with the result from exhaustive search. These results will be very valuable for improving the fairness of UAV-assisted secure FSO communication networks.

Author Contributions

Methodology, F.X. and M.D. F.X. proposed the idea and the system model and designed the intelligent algorithm. M.D. assisted F.X. in writing the code and performing the simulation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

Author Mingda Dong was employed by the company Qualcomm Communication Technologies (Shanghai) Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Configuration of UAV-assisted FSO communication network.

Figure 2. The configuration of the employed deep neural networks.

Figure 3. The minimum value for the maximum outage probability for the exhaustive search and deep reinforcement learning, β_{S R} = 13, ϵ_{S R} = 0.4, α_{R D 1} = 13, β_{R D 1} = 7.5, ϵ_{R D 1} = 1.7, α_{R D 2} = 13.5, β_{R D 2} = 8, ϵ_{R D 2} = 2.1, α_{R D 3} = 11.5, β_{R D 3} = 7.2, ϵ_{R D 3} = 2.4.

Figure 4. Near-optimal statistical transmit power levels for the signals of all machines, β_{S R} = 13, ϵ_{S R} = 0.4, α_{R D 1} = 13, β_{R D 1} = 7.5, ϵ_{R D 1} = 1.7, α_{R D 2} = 13.5, β_{R D 2} = 8, ϵ_{R D 2} = 2.1, α_{R D 3} = 11.5, β_{R D 3} = 7.2, ϵ_{R D 3} = 2.4.

Figure 5. The average reward over 100 consecutive steps for the training of the statistical optimal scheme.

Table 1. Parameters used in the simulation.

Notation	Meaning	Value
$σ^{2}$	Average noise power	$10^{- 5}$ W
B	Channel bandwidth	200 kHz
$P_{max}$	Peak power	1 W
v	Exploration variance	14
$β$	Update factor	0.99
$γ_{th}$	SINR threshold	0.2
$P_{s e c r}$	Security outage probability	0.05
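For readers who prefer logarithmic units, the sketch below collects the values of Table 1 in one place and converts a few of them. The class and field names are hypothetical, and treating the SINR threshold of 0.2 as a linear (not dB) quantity is an assumption, since the table does not state its unit.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationParameters:
    """Values copied from Table 1; field names are illustrative only."""
    noise_power_w: float = 1e-5
    bandwidth_hz: float = 200e3
    peak_power_w: float = 1.0
    exploration_variance: float = 14.0
    update_factor: float = 0.99
    sinr_threshold: float = 0.2  # assumed to be a linear ratio
    security_outage_probability: float = 0.05


def watts_to_dbm(power_w):
    """Convert a power in watts to dBm."""
    return 10.0 * math.log10(power_w / 1e-3)


def linear_to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10.0 * math.log10(ratio)


if __name__ == "__main__":
    params = SimulationParameters()
    print("peak power [dBm]:", watts_to_dbm(params.peak_power_w))
    print("noise power [dBm]:", watts_to_dbm(params.noise_power_w))
    print("SINR threshold [dB]:", linear_to_db(params.sinr_threshold))
    print("peak-to-noise ratio [dB]:",
          linear_to_db(params.peak_power_w / params.noise_power_w))
```

Under these assumptions the peak power corresponds to 30 dBm, the average noise power to -20 dBm, and their ratio to 50 dB.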

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
