Article

Deep Q-Learning-Based Transmission Power Control of a High Altitude Platform Station with Spectrum Sharing

by Seongjun Jo, Wooyeol Yang, Haing Kun Choi, Eonsu Noh, Han-Shin Jo and Jaedon Park
1 Department of Electronic Engineering, Hanbat National University, Daejeon 34158, Korea
2 TnB Radio Tech., Seoul 08504, Korea
3 Agency for Defense Development, Daejeon 34186, Korea
* Authors to whom correspondence should be addressed.
Sensors 2022, 22(4), 1630; https://doi.org/10.3390/s22041630
Submission received: 12 January 2022 / Revised: 17 February 2022 / Accepted: 18 February 2022 / Published: 19 February 2022

Abstract: A High Altitude Platform Station (HAPS) can facilitate high-speed data communication over wide areas using high-power line-of-sight communication; however, it can significantly interfere with existing systems. Given spectrum sharing with existing systems, the HAPS transmission power must be adjusted to satisfy the interference requirement for incumbent protection. However, excessive transmission power reduction can lead to severe degradation of the HAPS coverage. To solve this problem, we propose a multi-agent Deep Q-learning (DQL)-based transmission power control algorithm that minimizes the outage probability of the HAPS downlink while satisfying the interference requirement of an interfered system. In addition, a double DQL (DDQL) is developed to prevent the potential risk of action-value overestimation in the DQL. With a proper state, reward, and training process, all agents cooperatively learn a power control policy that achieves a near-optimal solution. The proposed DQL power control algorithm matches or closely approaches the optimal exhaustive search algorithm for varying positions of the interfered system. The proposed DQL and DDQL power control yield the same performance, which indicates that action-value overestimation does not adversely affect the quality of the learned policy.

1. Introduction

A High Altitude Platform Station (HAPS) is a network node operating in the stratosphere at an altitude of approximately 20 km. The International Telecommunication Union (ITU) defines a HAPS in Article 1.66A as "A station on an object at an altitude of 20 to 50 km and at a specified, nominal, fixed point relative to the Earth". Various studies have been performed on HAPS in recent years, and its commercial applications have significantly increased [1]. In addition, the HAPS has potential as a significant component of wireless network architectures [2]. It is also an essential component of next-generation wireless networks, with considerable potential as a wireless access platform for future wireless communication systems [3,4,5].
Because the HAPS is located at high altitudes ranging from 20 to 50 km, HAPS-to-ground propagation generally experiences lower path loss and a higher line-of-sight probability than typical ground-to-ground propagation. Thus, the HAPS can provide a high data rate over wide coverage; however, it is likely to interfere with various terrestrial services, e.g., fixed, mobile, and radiolocation. The World Radiocommunication Conference 2019 (WRC-19) approved the use of HAPS as IMT base stations (HIBS) in frequency bands below 2.7 GHz previously identified for IMT through Resolution 247 [6], which raises the issue of potential HAPS interference with an existing service. In such a situation, if the existing service is not safe from HAPS interference, the two systems cannot coexist. Therefore, the HAPS transmitter is required to reduce its transmission power to satisfy the interference-to-noise ratio (INR) requirement for protecting the receiver of the existing service. However, if the HAPS transmission power is excessively reduced, the signal-to-interference-plus-noise ratio (SINR) of the HAPS downlink decreases, and the outage probability may exceed the desired level. Herein, a HAPS transmission power control algorithm is proposed that minimizes the outage probability of the HAPS downlink while satisfying the INR requirement for protecting incumbents.

1.1. Related Works

Studies have been performed on improving the performance of HAPS. In [7], resource allocation was studied for an Orthogonal Frequency Division Multiple Access (OFDMA)-based HAPS system that uses downlink multicasting, aiming to maximize the number of served user terminals with the available radio resources. The authors of [8] proposed a reinforcement-learning-based wireless channel allocation algorithm for a HAPS 5G massive multiple-input multiple-output (MIMO) communication system. Combining Q-learning with backpropagation neural networks allows the algorithm to learn intelligently under varying channel load and blocking conditions. In [9], a criterion for determining the minimum distance in a mobile user access system was derived, and a channel allocation approach based on predicted changes in the number of users and the call volume was proposed.
Additionally, spectrum sharing studies on HAPS have been performed. In [10], sharing between a HAPS-based fixed service and other services in the 31/28 GHz bands was studied. Interference mitigation techniques were introduced, e.g., increasing the minimum operational elevation angle or improving the antenna radiation pattern, to facilitate sharing with other services; the possibility of dynamic channel allocation was also analyzed. In [11], sharing between a HAPS and a fixed service in the 5.8 GHz band was investigated using a coexistence methodology based on a spectrum emission mask.
In contrast to previous studies, which treated HAPS communication improvement and spectrum sharing separately, the present study jointly considers spectrum sharing with other systems and HAPS downlink coverage improvement. In this regard, it advances beyond previous HAPS-related studies.
Deep Q-learning (DQL) is a reinforcement learning algorithm that applies deep neural networks to reinforcement learning to solve complex real-world problems. DQL is widely used in various fields, including UAV, drone, and HAPS systems. In [12], the optimal UAV-BS trajectory was derived using DQL for the optimal placement of UAVs, and the authors of [13] used DQL to determine the optimal link between two UAV nodes. In [14], DQL was used to find the optimal flight parameters for a collision-free UAV trajectory. In [15], two-hop communication was considered to optimize the drone base station trajectory and improve network performance, and DQL was used to solve the joint two-hop communication scenario. In [16], DQL was used to coordinate multiple HAPSs for communications area coverage. Double Deep Q-learning (DDQL) is an algorithm developed to prevent the overestimation inherent in DQL and shows better performance than DQL in various fields [17].

1.2. Contributions

The contributions of the present study are as follows. (1) For the first time, a multi-agent DQL is used to improve the HAPS outage performance while solving the problem of spectrum sharing with existing services. (2) We define the power control optimization problem of minimizing the outage probability of the HAPS downlink under the interference constraint for protecting the existing system. The state and reward for the training agents are designed to reflect the objective function and constraints of the optimization problem. (3) Because the HAPS has a multicell structure, the number of power combinations increases exponentially as the number of cells (N_cell) and power levels increase linearly. Thus, the optimal exhaustive search method requires an impractically long computation time to solve the multicell power optimization problem, whereas the proposed DQL algorithm performs comparably to the optimal exhaustive search with a feasible computation time. (4) Even for varying positions of the interfered system, the proposed DQL produces a proper power control policy, maintaining stable performance. (5) Comparing the proposed DQL algorithm with the DDQL algorithm shows no performance degradation due to overestimation in the proposed DQL. The remainder of this paper is organized as follows.
Section 2 presents the system model, including the system deployment model, HAPS model, interfered system model, and path loss model. In Section 3, the downlink SINR and INR are calculated. In Section 4, a DQL-based HAPS power control algorithm is proposed. Section 5 presents the simulation results, and Section 6 concludes the paper.

2. System Model

2.1. System Deployment Model

HAPS communication networks are assumed to consist of a single HAPS, multiple ground user equipment (UE) devices (referred to as UEs hereinafter), and a ground interfered receiver. The HAPS, UEs, and interfered receiver are placed in a three-dimensional Cartesian coordinate system, as shown in Figure 1. The coordinates of the HAPS antenna and the interfered receiver antenna are (0, 0, h_HAPS) and (X, Y, h_V), respectively. The N_UE UE devices with an antenna height of h_UE are uniformly distributed within the circular HAPS area.

2.2. HAPS Model

We modeled the HAPS cell deployment and system parameters with reference to the working document for a HAPS coexistence study performed in preparation for WRC-23 [18]. As shown in Figure 2, a single HAPS serves multiple cells: one 1st layer cell denoted as Cell_1 and six 2nd layer cells denoted as Cell_2 to Cell_7. The six cells of the 2nd layer are arranged at intervals of 60° in the horizontal direction. Figure 3 presents a typical HAPS antenna design for seven-cell structures [4], where seven phased-array antennas conduct beamforming toward the ground to form the seven cells shown in Figure 2. The 1st layer cell has an antenna tilt of 90°, i.e., perpendicular to the ground; the 2nd layer cells have an antenna tilt of 23°.
The antenna pattern of the HAPS was designed using the antenna gain formula presented in Recommendation ITU-R M.2101 [19]. The transmitting antenna gain is calculated as the sum of the gain of a single element and the beamforming gain of a multi-antenna array. The single-element antenna gain is determined by the azimuth angle (φ) and the elevation angle (θ) between the transmitter and receiver and is calculated as follows:

A_E(φ, θ) = G_{E,max} − min{−[A_{E,H}(φ) + A_{E,V}(θ)], A_m},  (1)

where G_{E,max} represents the maximum antenna gain of a single element, A_{E,H}(φ) represents the horizontal radiation pattern calculated using Equation (2), and A_{E,V}(θ) represents the vertical radiation pattern calculated using Equation (3).

A_{E,H}(φ) = −min[12·(φ/φ_{3dB})², A_m]  (2)

Here, φ_{3dB} represents the horizontal 3 dB beamwidth of a single element, and A_m represents the front-to-back ratio.

A_{E,V}(θ) = −min[12·((θ − 90)/θ_{3dB})², SLA_v]  (3)

Here, θ_{3dB} represents the vertical 3 dB beamwidth of a single element, and SLA_v represents the side-lobe attenuation limit.
The transmitting antenna gain of the HAPS is calculated using the antenna arrangement and spacing, as well as the target beamforming direction. The gain for beam i is calculated as follows:
A_{A,Beam i}(θ, φ) = A_E(θ, φ) + 10·log₁₀( |Σ_{m=1}^{N_H} Σ_{n=1}^{N_V} w_{i,n,m} · v_{n,m}|² ),  (4)

where N_H and N_V represent the number of antenna elements in the horizontal and vertical directions, respectively. v_{n,m} is the superposition vector of the antenna elements, calculated using Equation (5), and w_{i,n,m} is the weight that steers the antenna element toward the beamforming direction, calculated using Equation (6).

v_{n,m} = exp( i·2π·[ (n − 1)·(d_V/λ)·cos(θ) + (m − 1)·(d_H/λ)·sin(θ)·sin(φ) ] ),
n = 1, 2, …, N_V;  m = 1, 2, …, N_H  (5)

Here, d_H and d_V represent the horizontal and vertical antenna array spacings, respectively, and λ represents the wavelength.

w_{i,n,m} = (1/√(N_H·N_V)) · exp( i·2π·[ (n − 1)·(d_V/λ)·sin(θ_{i,etilt}) − (m − 1)·(d_H/λ)·cos(θ_{i,etilt})·sin(φ_{i,escan}) ] )  (6)

Here, φ_{i,escan} and θ_{i,etilt} represent the φ and θ of the main beam direction, respectively.
The 1st layer cell of the HAPS uses a 2 × 2 antenna array, and the 2nd layer cell uses a 4 × 2 antenna array. Figure 4 shows the antenna pattern of the 1st layer cell, and Figure 5 shows the antenna pattern of the 2nd layer cell.
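To make the gain computation concrete, the following is a minimal Python sketch of Equations (1)-(6). The 8 dBi element gain and the 65° beamwidths come from Table 1; the half-wavelength element spacing, the 30 dB values for A_m and SLA_v, and all names are illustrative assumptions, not the authors' MATLAB implementation.

import numpy as np

def element_gain(phi, theta, g_e_max=8.0, phi_3db=65.0, theta_3db=65.0,
                 a_m=30.0, sla_v=30.0):
    """Single-element gain of Equations (1)-(3); angles in degrees."""
    a_h = -min(12.0 * (phi / phi_3db) ** 2, a_m)                 # Eq. (2)
    a_v = -min(12.0 * ((theta - 90.0) / theta_3db) ** 2, sla_v)  # Eq. (3)
    return g_e_max - min(-(a_h + a_v), a_m)                      # Eq. (1)

def beam_gain(theta, phi, theta_tilt, phi_scan, n_h=4, n_v=2, d_h=0.5, d_v=0.5):
    """Composite beam gain of Equation (4); d_h, d_v in wavelengths (0.5 assumed)."""
    n = np.arange(n_v).reshape(-1, 1)     # (n - 1), vertical element index
    m = np.arange(n_h).reshape(1, -1)     # (m - 1), horizontal element index
    th, ph = np.radians(theta), np.radians(phi)
    tt, ps = np.radians(theta_tilt), np.radians(phi_scan)
    # Superposition vector, Eq. (5)
    v = np.exp(1j * 2 * np.pi * (n * d_v * np.cos(th) + m * d_h * np.sin(th) * np.sin(ph)))
    # Beamforming weight, Eq. (6)
    w = np.exp(1j * 2 * np.pi * (n * d_v * np.sin(tt) - m * d_h * np.cos(tt) * np.sin(ps)))
    w /= np.sqrt(n_h * n_v)
    return element_gain(phi, theta) + 10 * np.log10(np.abs((w * v).sum()) ** 2)

# Example: gain of a 2nd layer beam (4 x 2 array, 23-degree tilt) toward (theta, phi):
print(beam_gain(theta=120.0, phi=10.0, theta_tilt=23.0, phi_scan=0.0))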

2.3. Interfered System Model

Various interfered systems, e.g., fixed, mobile, and radiolocation services, can be considered in an interference scenario involving a HAPS. We adopted a ground IMT base station (BS) as the interfered system, referring to the potential interference scenario in [6]. The antenna pattern of the interfered system follows Recommendation ITU-R F.1336 [20]. The receiving antenna gain is calculated as follows:

G(φ, θ) = G₀ + G_{hr}(x_h) + R·G_{vr}(x_v),  (7)

where G₀ represents the maximum gain in the azimuth plane; G_{hr}(x_h) represents the relative reference antenna gain in the azimuth plane at the normalized direction (x_h, 0), calculated using Equation (8); and G_{vr}(x_v) represents the relative reference antenna gain in the elevation plane at the normalized direction (0, x_v), calculated using Equation (9). R represents the horizontal gain compression ratio as the azimuth angle shifts from 0° to φ, calculated using Equation (10).
G_{hr}(x_h) = −12·x_h²                    for x_h ≤ 0.5
G_{hr}(x_h) = −12·x_h^{(2−k_h)} − λ_{kh}   for x_h > 0.5,
with G_{hr}(x_h) ≥ G_{180}  (8)

G_{vr}(x_v) = −12·x_v²                         for x_v < x_k
G_{vr}(x_v) = −15 + 10·log(x_v^{−1.5} + k_v)    for x_k ≤ x_v < 4
G_{vr}(x_v) = −λ_{kv} − 3 − C·log(x_v)          for 4 ≤ x_v < 90°/θ₃
G_{vr}(x_v) = G_{180}                           for x_v ≥ 90°/θ₃  (9)

R = [G_{hr}(x_h) − G_{hr}(180°/φ₃)] / [G_{hr}(0) − G_{hr}(180°/φ₃)]  (10)

Here, x_h and λ_{kh} are given by Equations (11) and (12), respectively; φ₃ represents the 3 dB beamwidth in the azimuth plane; and k_h is an azimuth pattern adjustment factor based on the leaked power. The relative minimum gain G_{180} is calculated using Equation (13).

x_h = |φ|/φ₃  (11)

λ_{kh} = 3·(1 − 0.5^{−k_h})  (12)

G_{180} = −15 + 10·log(1 + 8·k_a) − 15·log(180°/θ₃)  (13)

Returning to Equation (9), x_v is given by Equation (14), and the 3 dB beamwidth in the elevation plane θ₃ is calculated using Equation (15), where G₀ represents the maximum gain in the azimuth plane. In addition, x_k is calculated using Equation (16), where k_v is an elevation pattern adjustment factor based on the leaked power. λ_{kv} is calculated using Equation (17), and the attenuation inclination factor C is calculated using Equation (18). Figure 6 shows the antenna pattern of the interfered system calculated using Equation (7); it is the pattern of a typical terrestrial BS, with a broad beamwidth in the azimuth plane but a narrow beamwidth in the elevation plane.

x_v = |θ|/θ₃  (14)

θ₃ = 107.6 × 10^{−0.1·G₀}  (15)

x_k = √(1.33 − 0.33·k_v)  (16)

λ_{kv} = 12 − C·log(4) − 10·log(4^{−1.5} + k_v)  (17)

C = 10·log[ ((180°/θ₃)^{1.5}·(4^{−1.5} + k_v)) / (1 + 8·k_p) ] / log(22.5°/θ₃)  (18)
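A Python sketch of Equations (7)-(18) follows, with the Table 2 values (k_a = k_p = k_h = 0.7, k_v = 0.3, G_0 = 16 dBi, 65° horizontal beamwidth) as defaults; this is an illustrative reading of the recommendation with assumed names, not the authors' code.

import numpy as np

def f1336_gain(phi, theta, g0=16.0, phi3=65.0, ka=0.7, kp=0.7, kh=0.7, kv=0.3):
    """Receive antenna gain G(phi, theta) per Equations (7)-(18); degrees in, dBi out."""
    theta3 = 107.6 * 10 ** (-0.1 * g0)                                  # Eq. (15)
    g180 = -15 + 10 * np.log10(1 + 8 * ka) - 15 * np.log10(180 / theta3)  # Eq. (13)
    lam_kh = 3 * (1 - 0.5 ** (-kh))                                     # Eq. (12)
    xk = np.sqrt(1.33 - 0.33 * kv)                                      # Eq. (16)
    C = (10 * np.log10(((180 / theta3) ** 1.5 * (4 ** -1.5 + kv)) / (1 + 8 * kp))
         / np.log10(22.5 / theta3))                                     # Eq. (18)
    lam_kv = 12 - C * np.log10(4.0) - 10 * np.log10(4 ** -1.5 + kv)     # Eq. (17)

    def g_hr(xh):                                                       # Eq. (8)
        g = -12 * xh ** 2 if xh <= 0.5 else -12 * xh ** (2 - kh) - lam_kh
        return max(g, g180)

    def g_vr(xv):                                                       # Eq. (9)
        if xv < xk:
            return -12 * xv ** 2
        if xv < 4:
            return -15 + 10 * np.log10(xv ** -1.5 + kv)
        if xv < 90 / theta3:
            return -lam_kv - 3 - C * np.log10(xv)
        return g180

    xh, xv = abs(phi) / phi3, abs(theta) / theta3                       # Eqs. (11), (14)
    R = (g_hr(xh) - g_hr(180 / phi3)) / (g_hr(0) - g_hr(180 / phi3))    # Eq. (10)
    return g0 + g_hr(xh) + R * g_vr(xv)                                 # Eq. (7)

At boresight, f1336_gain(0, 0) returns the 16 dBi maximum gain, as expected.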

2.4. Path Loss Model

Following the working document for the HAPS coexistence study performed in preparation for WRC-23 [22], the path loss model of Recommendation ITU-R P.619 [21] was applied. The total path loss experienced by the HAPS signal on its way to the UE and the IMT BS is expressed as follows:

L_p = FSL + A_{xp} + A_g + A_{bs},  (19)

where FSL represents the free-space path loss calculated using Equation (20), i.e., the loss over a straight path from the transmitting antenna to the receiving antenna in a vacuum; A_{xp} is the depolarization attenuation, assumed to be 3 dB; A_g represents the attenuation due to atmospheric gases; and A_{bs} represents the loss due to the spreading of the antenna beam (beam-spreading attenuation). A_g and A_{bs} were calculated using the formulae in P.619.

FSL = 92.45 + 20·log(f·d)  (20)

Here, f represents the carrier frequency (in GHz), and d represents the distance (in km) between the transmitter and receiver.
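As a quick check of Equation (20), consider the shortest HAPS-to-ground path, the nadir distance d = 20 km (the HAPS altitude), at the simulation carrier frequency f = 2.545 GHz from Table 1:

FSL = 92.45 + 20·log(2.545 × 20) ≈ 92.45 + 34.13 ≈ 126.6 dB.

Any ground offset from the sub-platform point only increases d and thus the loss.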

3. Calculation of Downlink SINR and INR

3.1. Calculation of Downlink SINR

The signal received by a UE from the HAPS transmission for the i-th cell (Cell_i) is calculated as follows:

S_{Cell_i} = P_{Cell_i} + G_{Cell_i} + G_p + G_{r,UE} − L_p − L_{ohm},  (21)

where P_{Cell_i} represents the HAPS transmission power for Cell_i, G_{Cell_i} represents the transmitting antenna gain of Cell_i, G_p represents the polarization gain, G_{r,UE} represents the receiving antenna gain, and L_{ohm} represents the ohmic loss. The UE receives signals from all N_cell cells and treats all signals except the strongest one (from Cell_j) as interference. Equation (22) gives the serving signal and the aggregate interference, and the receiver noise is calculated using Equation (23).

j = argmax_i S_{Cell_i},  S_{HAPS} = S_{Cell_j},
I_{HAPS,UE} = 10·log( Σ_{i=1, i≠j}^{N_cell} 10^{S_{Cell_i}/10} )  (22)

N = 10·log(k·T·BW) + N_f  (23)

Here, k and T represent the Boltzmann constant and noise temperature, respectively, and BW represents the channel bandwidth. N_f represents the noise figure. Finally, the downlink SINR is calculated as follows:

η = 10·log( 10^{S_{HAPS}/10} / (10^{I_{HAPS,UE}/10} + 10^{N/10}) ).  (24)
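Since Equations (21)-(24) mix dB quantities, the summations in Equations (22) and (24) must be performed in linear units. A minimal Python sketch, with illustrative names:

import numpy as np

def downlink_sinr(s_cells_dbm, noise_dbm):
    """Downlink SINR per Equations (22)-(24).
    s_cells_dbm: received power S_Cell_i from each of the N_cell cells (dBm).
    The strongest cell is the serving signal; the others are interference."""
    s = np.asarray(s_cells_dbm, dtype=float)
    j = int(np.argmax(s))                                       # serving cell j, Eq. (22)
    i_haps_ue = 10 * np.log10(np.sum(10 ** (np.delete(s, j) / 10)))
    return s[j] - 10 * np.log10(10 ** (i_haps_ue / 10) + 10 ** (noise_dbm / 10))  # Eq. (24)

# Example with 7 per-cell received powers and N of about -96 dBm
# (10*log10(kTB) plus the 5 dB noise figure over the 20 MHz channel):
print(downlink_sinr([-80, -85, -88, -90, -92, -95, -97], -96.0))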

3.2. Calculation of INR

The interference power received by the interfered receiver from the HAPS transmitter serving Cell_i is calculated as follows:

I_{Cell_i} = P_{Cell_i} + G_{Cell_i} + G_p + G_{r,V} − L_p − L_{ohm},  (25)

where G_{r,V} represents the antenna gain of the interfered receiver. The aggregated interference power at the interfered receiver is calculated as follows:

I_{HAPS,V} = 10·log( Σ_{i=1}^{N_cell} 10^{I_{Cell_i}/10} ).  (26)

Finally, after converting the aggregated interference into INR form in accordance with Equation (27) and comparing it with the protection criterion (INR_th) of the interfered receiver, it is possible to check whether the interfered receiver is protected from HAPS interference.

INR = I_{HAPS,V} − N  (27)
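The incumbent-side check of Equations (25)-(27) is analogous; a short sketch reusing the same dB conventions (names illustrative):

import numpy as np

def inr_db(i_cells_dbm, noise_dbm):
    """INR per Equations (26)-(27): aggregate the per-cell interference in linear
    units, convert back to dB, and subtract the receiver noise."""
    i_haps_v = 10 * np.log10(np.sum(10 ** (np.asarray(i_cells_dbm, dtype=float) / 10)))
    return i_haps_v - noise_dbm

# The interfered receiver is protected when inr_db(...) <= INR_th (-6 dB in Table 2).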

4. DQL-Based HAPS Transmission Power Control Algorithm

4.1. Problem Formulation

To satisfy the INR_th of the interfered system, the transmission power of the HAPS must be reduced. However, as the HAPS power is reduced, the η of the UEs decreases, and the outage probability P_out increases. Thus, the objective of this study is to find a HAPS transmission power set for the cells, i.e., P = {P_{Cell_i} | i = 1, …, N_cell}, that satisfies the INR_th of the interfered system while minimizing P_out. The optimization problem of the HAPS transmission power can be formulated as follows:

min_P  P_out = N_{UE,o}(P)/N_UE
s.t.  C1: INR ≤ INR_th
      C2: P_min ≤ P_{Cell_i} ≤ P_max, ∀ i ∈ {1, …, N_cell},  (28)

where N_{UE,o}(P) represents the number of UEs that do not satisfy the minimum required SINR η_o for a given HAPS transmission power set P.
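For the values used in Section 5 (N_p = 5 selectable power levels per cell and N_cell = 7 cells), the feasible set of Equation (28) contains

N_p^{N_cell} = 5⁷ = 78,125

power combinations, which is what makes exhaustive search impractical as the system grows.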

4.2. Proposed Algorithm

To control the HAPS transmission power, the power level of each cell must be determined independently. Accordingly, the total number of HAPS transmission power sets is N_p^{N_cell}, where N_p is the number of selectable power levels per cell; it thus grows exponentially with the number of cells. Although an exhaustive search can find the optimal solution, it incurs excessive complexity and a long computation time. To solve this problem, we propose a DQL-based power optimization algorithm that finds a near-optimal P with low complexity. In the proposed DQL model, each agent functions as the power controller of one cell; accordingly, the number of agents is N_cell.
The agent, the subject of learning, trains a deep neural network called a Deep Q-Network (DQN) and selects actions using this network. DQL is an improved Q-learning method. Q-learning selects the best action in a given state through a Q-table of state-action pairs. As the state-action space grows, building the Q-table and finding the best policy become highly complex, and learning in Q-table form becomes even more involved when multiple agents are used. In contrast, DQL is a promising way to overcome this curse of dimensionality by approximating the Q-function with a deep neural network instead of a Q-table. To handle the multi-agent problem, the proposed algorithm lets each agent learn its policy from its own observations and actions while treating all other agents as part of the environment.
The basic DQL parameters (state, action, and reward) are defined below. Each agent learns its policy independently using the training data at each timestep t. The state space of the m-th agent comprises the set of (N_cell − 1) interference levels that the agent causes to UEs located at the centers of the other cells, together with the interference that the agent causes to the interfered receiver, which is expressed as

s_t = { I_V, { I_{UE_i} | i = 1, …, N_cell, and i ≠ m } }.  (29)
Two power sets configure the action space of an agent: A_1 = {29, 31, 33, 35, 37} and A_2 = {26, 28, 30, 32, 34} (unit: dBm). The agent of Cell_1 in the 1st layer selects an action from A_1, and the agents of the 2nd layer cells select an action from A_2. All agent actions are initialized to the minimum power value at the beginning of learning to minimize the interference to the interfered receiver. The reward is calculated as follows. First, because the interfered receiver must be safe from HAPS interference, an agent receives a fixed r_t of −100 (a strongly negative value) when INR > INR_th. In contrast, for INR ≤ INR_th, an agent receives an r_t computed from the lower 5% downlink SINR of each cell, {η̂_i | i = 1, 2, …, N_cell}, and the required SINR η_o. The reward can be expressed as

r_t = r_{1,t} + r_{2,t}   for INR ≤ INR_th
r_t = −100                otherwise,  (30)

where

r_{1,t} = 10 · Σ_i (η̂_i − η_o)   over cells with η̂_i ≥ η_o
r_{2,t} = Σ_i (η̂_i − η_o)        over cells with η̂_i < η_o.  (31)
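A Python sketch of the reward of Equations (30) and (31) follows; the per-cell summation is our reading of Equation (31), and all names are illustrative:

def reward(inr, inr_th, sinr_low5, sinr_req):
    """Reward per Equations (30)-(31).
    sinr_low5: lower-5% downlink SINR of each cell (dB); sinr_req: required SINR (dB)."""
    if inr > inr_th:
        return -100.0                                                   # incumbent unprotected
    r1 = 10.0 * sum(s - sinr_req for s in sinr_low5 if s >= sinr_req)   # bonus above threshold
    r2 = sum(s - sinr_req for s in sinr_low5 if s < sinr_req)           # shortfall penalty (negative)
    return r1 + r2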
Figure 7 shows the structure of the proposed DQL-based HAPS transmission power control algorithm. Each agent learns its own DQN, which consists of a main network, a target network, and a replay memory. The main network estimates the Q-value Q(s, a; w) of a state-action pair through a deep neural network with weights w. It is a fully connected network with an input layer of seven neurons, one hidden layer of 24 neurons, and an output layer of five neurons. w is updated at every t in the direction that minimizes the loss function L(w) = E[(y_j − Q(s, a; w))²]. The target network calculates the target value y_j = r_j + γ·max_{a′} Q̂(s′, a′; w′), where γ is the discount factor; s′ and a′ denote the state and action, respectively, in the next step; and Q̂(s′, a′; w′) is the Q-value estimated by the target network with weights w′. The agent's transition tuple (s_t, a_t, r_t, s_{t+1}) is stored in the replay memory, from which a minibatch (512 tuples) is randomly sampled at each step. The minibatch data are used to compute the target value y_j. In DQL, the replay memory and the separate target network stabilize learning and improve learning performance [23].
Algorithm 1 describes the proposed DQL-based HAPS transmission power control algorithm. For DQN training, N was set to 100,000, and the minibatch size was set to 512. M was set to 500, and T was set to 10. The Adam optimizer was used to minimize L(w), and the learning rate and γ were 0.01 and 0.995, respectively. An ε-greedy policy was used to balance exploration and exploitation; ε was initially set to 1 and was reduced by 0.01 every episode. A minimal PyTorch sketch of the corresponding update step is given after Algorithm 1.
Algorithm 1. Training Process for the DQL-Based HAPS Power Control Algorithm
1:  Initialize the replay memory D to capacity N
2:  Initialize the Q-function with random weights w
3:  Initialize the target Q̂-function with the same weights: w′ = w
4:  for episode = 1, M do
5:    Initialize the action a_0 = min A
6:    for timestep t = 1, T do
7:      if t = 1
8:        Calculate s_t via Equations (21) and (25)
9:      end if
10:     With probability ε, select a random action a_t
11:     Otherwise, select a_t = argmax_a Q(s_t, a; w)
12:     Assign the selected power to the m-th cell and compute INR and η
13:     Observe the reward r_t and s_{t+1}
14:     Store the experience (s_t, a_t, r_t, s_{t+1}) in D
15:     Sample a random minibatch of experiences from D
16:     Set y_j = r_j + γ·max_{a′} Q̂(s′, a′; w′)
17:     Perform optimization on L(w) and update w
18:     Update the target network Q̂ with w′ = w every 4 steps
19:   end for
20: end for
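As noted above, the following PyTorch sketch shows one agent's DQN and the update of lines 15-17. The 7-24-5 layer sizes, Adam with learning rate 0.01, γ = 0.995, replay capacity 100,000, and minibatch size 512 are from the text; class and variable names, and the replay-buffer type, are illustrative.

import random
from collections import deque

import torch
import torch.nn as nn

class DQN(nn.Module):
    """Fully connected 7-24-5 network: 7 state inputs, 5 power-level Q-values."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(7, 24), nn.ReLU(), nn.Linear(24, 5))

    def forward(self, s):
        return self.net(s)

q, q_target = DQN(), DQN()
q_target.load_state_dict(q.state_dict())        # line 3: w' = w
optimizer = torch.optim.Adam(q.parameters(), lr=0.01)
replay = deque(maxlen=100_000)                  # replay memory D
gamma = 0.995

def train_step(batch_size=512):
    """Lines 15-17 of Algorithm 1. Transitions are assumed stored as tensors
    (s: float[7], a: int64 scalar, r: float scalar, s_next: float[7])."""
    s, a, r, s_next = map(torch.stack, zip(*random.sample(replay, batch_size)))
    with torch.no_grad():
        y = r + gamma * q_target(s_next).max(dim=1).values   # target y_j, line 16
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, y)                   # L(w)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Line 18 (every 4 steps): q_target.load_state_dict(q.state_dict())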
DDQL is a reinforcement learning algorithm designed to mitigate the performance degradation caused by the action-value overestimation of DQL. The action value can be overestimated by the maximization step in line 16 of Algorithm 1. The DDQL therefore calculates the target value as y_j = r_j + γ·Q̂(s′, argmax_{a′} Q(s′, a′; w); w′), eliminating the maximization over the target network. The DDQL-based HAPS power control algorithm proceeds in the same way as Algorithm 1 except for the target-value calculation.
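Under the same assumptions as the sketch above, the DDQL change is confined to the target:

def ddql_target(r, s_next):
    """DDQL target: the online network selects the next action, the target network
    evaluates it, removing the max operator of line 16."""
    with torch.no_grad():
        a_star = q(s_next).argmax(dim=1, keepdim=True)
        return r + gamma * q_target(s_next).gather(1, a_star).squeeze(1)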

5. Simulation Results

5.1. Simulation Configuration

The simulation was conducted in MATLAB for three positions of the interfered receiver, and the learning order of the agents was randomly set at each t. The simulation then proceeded according to Algorithm 1. When all M episodes were finished, the simulation ended, and the set P_c composed of the power selected by each agent was obtained as the simulation result. Finally, the performance was verified by comparing P_c with the optimal power set P* obtained via an exhaustive search over all N_p^{N_cell} cases. The total elapsed times of the DQL and the exhaustive search were about 7500 s and 21,000 s, respectively. The elapsed time of the exhaustive search grows exponentially with the number of cells and power levels, whereas that of the DQL does not; the computational advantage of the DQL therefore becomes more pronounced as the number of cells and power levels increases. In this simulation, a performance comparison with the DDQL was additionally performed to check for performance degradation due to overestimation in the DQL.
We applied the HAPS parameters and interfered system parameters, referring to the working document for the HAPS coexistence study performed in preparation for WRC-23 [18,24]. The simulation parameters of the two systems are presented in Table 1 and Table 2, respectively.

5.2. Numerical Analysis

Figure 8 shows the SINR maps obtained using P_max = {37, 34, 34, 34, 34, 34, 34} and P_min = {29, 26, 26, 26, 26, 26, 26} for all cells, that is, with no power control. We considered three positions of the interfered receiver at which the INR_th of −6 dB is not satisfied under P_max. The three locations were chosen so that the resulting interference levels are representative and clearly expose the operating characteristics of the proposed power control algorithm. Interfered receiver ① was located in the main beam direction of Cell_3 and received the highest interference from Cell_3; therefore, using the minimum power in Cell_3 alone satisfied the INR_th of −6 dB. Interfered receiver ② was placed on the boundary between Cell_3 and Cell_4 and thus received equal (and the strongest) interference from these two cells. Interfered receiver ③ was also located in the main beam direction of Cell_3, like interfered receiver ①; however, the minimum power of Cell_3 alone could not satisfy the INR_th of −6 dB, and at least one other cell had to use less than the maximum power.
Table 3 presents the INR and P_out for P_max and P_min at the three interfered receiver locations. The results confirm the tradeoff between P_out and INR. The same P_out appears regardless of the interfered receiver position because no power control is applied. Next, we compare the simulation results of the optimal exhaustive search and the proposed DQL-based power control algorithm for the three positions of the interfered receiver.

5.2.1. Simulation Results for Interfered Receiver ①

Figure 9 shows the SINR map based on the P_c acquired using the proposed DQL-based power control algorithm for interfered receiver ①. Table 4 compares P_c with the P* obtained via the exhaustive search, along with the DQL and DDQL results. As shown, P_c was equal to the optimal P*, providing the same P_out and INR performance. Because the interfered receiver was located in the azimuth main beam direction of Cell_3, the power of Cell_3 dominated the interference at the receiver; even with all other cells at maximum power, their interference was negligible. Therefore, all cells except Cell_3 used the maximum power to minimize P_out, as shown in Table 4.
Figure 10 presents the INR and P_out for each learning episode. As shown, the INR and P_out converged to the optimal values of the exhaustive search algorithm as the number of learning episodes increased. The INR started at −11.01 dB, the value for P_min shown in Table 3, and converged to the optimal value of −6.93 dB. Similarly, P_out started at 43.7% and converged to 0.6%. A large variance due to frequent exploration was observed at the beginning of learning, but it gradually decreased as learning progressed. Figure 11 presents the cumulative and average rewards for each learning episode. The reward rapidly increased and then gradually converged at approximately 300 episodes, indicating that the proposed DQL training process allowed the agents to learn the power control policy quickly and stably.
We also compared the learning results of the DQL and DDQL. With the DDQL, the results were identical to those in Table 4 and Figures 10 and 11, indicating that overestimation did not occur in the DQL. This confirms that no performance degradation arises from overestimation and that sufficient learning is possible with the DQL alone.

5.2.2. Simulation Results for Interfered Receiver ②

Figure 12 shows the SINR map based on the P_c acquired using the proposed DQL-based power control algorithm for interfered receiver ②. Table 5 compares P_c with the P* obtained via the exhaustive search, along with the DQL and DDQL results. As shown, P_c was equal to the optimal P*, providing the same P_out and INR performance. The interfered receiver was located on the boundary between Cell_3 and Cell_4 and thus received equal (and the strongest) interference from these two cells, whereas the interference from all other cells was marginal even at maximum power. Therefore, in the optimal power control, Cell_3 and Cell_4 reduced their power as required to satisfy the INR_th, whereas all other cells used the maximum power to minimize P_out, as shown in Table 5.
As shown in Figure 13, the INR and P_out converged to the optimal values of the exhaustive search algorithm. Similar to the case of receiver ①, as learning progressed, the INR converged from −12.08 to −6.08 dB, and P_out converged from 43.7% to 0.2%. Figure 14 shows that the reward gradually converged at approximately 300 episodes, indicating that the proposed DQL training process allowed the agents to learn the power control policy quickly and stably. We also compared the learning results of the DQL and DDQL: with the DDQL, the results were identical to those in Table 5 and Figures 13 and 14, verifying that the desired learning is attainable with the DQL alone.

5.2.3. Simulation Results for Interfered Receiver ③

Figure 15 shows the SINR map based on the P_c obtained using the proposed DQL-based power control algorithm for interfered receiver ③. The interfered receiver was located in the azimuth main lobe direction of Cell_3. It was closer to the HAPS than the receiver considered in Section 5.2.1 and was thus more severely affected by Cell_3; the INR_th was not satisfied even with Cell_3 at minimum power. Thus, the optimal power control also adjusted the power of Cell_2 and Cell_4, which caused the second-most interference. Table 6 compares P_c with the P* obtained via the exhaustive search, along with the DQL and DDQL results. Although the P_out of P_c was 0.6% higher than that of P*, it corresponded to the third-smallest value among the 78,125 combinations evaluated by the exhaustive search. In summary, the proposed power control algorithm achieved performance close to the optimum.
As shown in Figure 16, the INR and P_out converged to the optimal values of the exhaustive search algorithm, with slight gaps. Similar to the results in Section 5.2.1, as learning progressed, the INR converged from −6.19 to −6.06 dB, and P_out converged from 43.7% to 5.7%. Figure 17 shows the cumulative and average rewards for each learning episode. The reward exhibited no noticeable improvement until approximately 130 episodes, after which it rapidly increased and then gradually converged at approximately 350 episodes. This is because, to satisfy the INR_th, more agents had to take action, and their actions had to be more diverse. Nonetheless, the proposed DQL training process allowed the agents to learn the power control policy stably. We also compared the learning results of the DQL and DDQL: with the DDQL, the results were identical to those in Table 6 and Figures 16 and 17, verifying that the desired learning is attainable with the DQL alone.

6. Conclusions

This paper proposed a DQL-based transmission power control algorithm for multicell HAPS communication that involved spectrum sharing with existing services. The proposed algorithm aimed to find a solution to the power control optimization problem for minimizing the outage probability of the HAPS downlink under the interference constraint to protect existing systems. We compared the solution with the optimal solution acquired using the exhaustive search algorithm. The simulation results confirmed that the proposed algorithm was comparable to the optimal exhaustive search.
Future work will consider more power levels and an extension to multiple-HAPS communication with spectrum sharing among multiple interfered systems. Since increasing the number of power levels could expose the limits of a value-based algorithm, applying a policy-based algorithm would be preferable. Given that multiple-HAPS communication could lead to the non-stationarity problem of multi-agent reinforcement learning, its solution would be worth studying.

Author Contributions

Conceptualization and methodology, S.J. and H.-S.J.; software, S.J. and W.Y.; validation, formal analysis, and investigation, S.J. and W.Y. and H.-S.J.; resources and data curation, H.K.C. and E.N.; writing—original draft preparation, S.J. and W.Y.; writing—review and editing, S.J., H.-S.J. and J.P.; visualization, W.Y. and H.K.C.; supervision, J.P.; project administration, H.-S.J. and J.P.; funding acquisition, J.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Agency for Defense Development (ADD).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Arum, S.C.; Grace, D.; Mitchell, P.D. A review of wireless communication using high-altitude platforms for extended coverage and capacity. Comput. Commun. 2020, 157, 232–256.
2. Alam, M.S.; Kurt, G.K.; Yanikomeroglu, H.; Zhu, P.; Đào, N.D. High altitude platform station based super macro base station constellations. IEEE Commun. Mag. 2021, 59, 103–109.
3. Kurt, G.K.; Khoshkholgh, M.G.; Alfattani, S.; Ibrahim, A.; Darwish, T.S.; Alam, M.S.; Yanikomeroglu, H.; Yongacoglu, A. A vision and framework for the high altitude platform station (HAPS) networks of the future. IEEE Commun. Surv. Tutor. 2021, 23, 729–779.
4. Hsieh, F.; Jardel, F.; Visotsky, E.; Vook, F.; Ghosh, A.; Picha, B. UAV-based Multi-cell HAPS Communication: System Design and Performance Evaluation. In Proceedings of the GLOBECOM 2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020.
5. Xing, Y.; Hsieh, F.; Ghosh, A.; Rappaport, T.S. High Altitude Platform Stations (HAPS): Architecture and System Performance. In Proceedings of the 2021 IEEE 93rd Vehicular Technology Conference (VTC2021-Spring), Online, 7–11 April 2021.
6. International Telecommunication Union (ITU). World Radiocommunication Conference 2019 (WRC-19) Final Acts; International Telecommunication Union: Geneva, Switzerland, 2020; p. 366.
7. Ibrahim, A.; Alfa, A.S. Using Lagrangian relaxation for radio resource allocation in high altitude platforms. IEEE Trans. Wirel. Commun. 2015, 14, 5823–5835.
8. Guan, M.; Wu, Z.; Cui, Y.; Cao, X.; Wang, L.; Ye, J.; Peng, B. An intelligent wireless channel allocation in HAPS 5G communication system based on reinforcement learning. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 1–9.
9. Guan, M.; Wang, L.; Chen, L. Channel allocation for hot spot areas in HAPS communication based on the prediction of mobile user characteristics. Intell. Autom. Soft Comput. 2016, 22, 613–620.
10. Oodo, M.; Miura, R.; Hori, T.; Morisaki, T.; Kashiki, K.; Suzuki, M. Sharing and compatibility study between fixed service using high altitude platform stations (HAPS) and other services in the 31/28 GHz bands. Wirel. Pers. Commun. 2002, 23, 3–14.
11. Mokayef, M.; Rahman, T.A.; Ngah, R.; Ahmed, M.Y. Spectrum sharing model for coexistence between high altitude platform system and fixed services at 5.8 GHz. Int. J. Multimed. Ubiquitous Eng. 2013, 8, 265–275.
12. Lee, W.; Jeon, Y.; Kim, T.; Kim, Y.I. Deep Reinforcement Learning for UAV Trajectory Design Considering Mobile Ground Users. Sensors 2021, 21, 8239.
13. Koushik, A.M.; Hu, F.; Kumar, S. Deep Q-Learning-Based Node Positioning for Throughput-Optimal Communications in Dynamic UAV Swarm Network. IEEE Trans. Cogn. Commun. Netw. 2019, 5, 554–566.
14. Raja, G.; Anbalagan, S.; Narayanan, V.S.; Jayaram, S.; Ganapathisubramaniyan, A. Inter-UAV collision avoidance using Deep-Q-learning in flocking environment. In Proceedings of the 2019 IEEE 10th Annual Ubiquitous Computing, Electronics and Mobile Communication Conference, New York, NY, USA, 10–12 October 2019; pp. 1089–1095.
15. Fotouhi, A.; Ding, M.; Hassan, M. Deep Q-Learning for Two-Hop Communications of Drone Base Stations. Sensors 2021, 21, 1960.
16. Anicho, O.; Charlesworth, P.B.; Baicher, G.S.; Nagar, A.K. Reinforcement learning versus swarm intelligence for autonomous multi-HAPS coordination. SN Appl. Sci. 2021, 3, 1–11.
17. Hasselt, H.V.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016.
18. International Telecommunication Union Radiocommunication Sector (ITU-R). Working Document towards a Preliminary Draft New Report ITU-R M.[HIBS-CHARACTERISTICS]/Working Document Related to WRC-23 Agenda Item 1.4; R19-WP5D Contribution 716 (Chapter 4, Annex 4.19); International Telecommunication Union: Geneva, Switzerland, 2021.
19. International Telecommunication Union Radiocommunication Sector (ITU-R). Modelling and Simulation of IMT Networks and Systems for Use in Sharing and Compatibility Studies; Recommendation ITU-R M.2101-0; International Telecommunication Union: Geneva, Switzerland, 2017.
20. International Telecommunication Union Radiocommunication Sector (ITU-R). Reference Radiation Patterns of Omnidirectional, Sectoral and Other Antennas for the Fixed and Mobile Service for Use in Sharing Studies in the Frequency Range from 400 MHz to about 70 GHz; Recommendation ITU-R F.1336-5; International Telecommunication Union: Geneva, Switzerland, 2019.
21. International Telecommunication Union Radiocommunication Sector (ITU-R). Propagation Data Required for the Evaluation of Interference between Stations in Space and Those on the Surface of the Earth; Recommendation ITU-R P.619-5; International Telecommunication Union: Geneva, Switzerland, 2021.
22. International Telecommunication Union Radiocommunication Sector (ITU-R). Working Document towards Sharing and Compatibility Studies of HIBS under WRC-23 Agenda Item 1.4; R19-WP5D Contribution 716 (Chapter 4, Annex 4.20); International Telecommunication Union: Geneva, Switzerland, 2021.
23. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
24. International Telecommunication Union Radiocommunication Sector (ITU-R). Characteristics of Terrestrial Component of IMT for Sharing and Compatibility Studies in Preparation for WRC-23; R19-WP5D Temporary Document 422 (Revision 2); International Telecommunication Union: Geneva, Switzerland, 2021.
Figure 1. System deployment model.
Figure 2. HAPS seven-cell layout.
Figure 3. Typical antenna structure for multi-cell HAPS communication.
Figure 4. 1st layer cell antenna pattern.
Figure 5. 2nd layer cell antenna pattern.
Figure 6. Interfered system antenna pattern.
Figure 7. DQL-based HAPS power control architecture.
Figure 8. (a) SINR map for P_max; (b) SINR map for P_min.
Figure 9. SINR map based on the P_c obtained using the proposed DQL-based power control algorithm for interfered receiver ①.
Figure 10. (a) INR and (b) P_out for each learning episode for interfered receiver ①.
Figure 11. Reward for each learning episode for interfered receiver ①.
Figure 12. SINR map based on the P_c obtained using the proposed DQL-based power control algorithm for interfered receiver ②.
Figure 13. (a) INR and (b) P_out for each learning episode for interfered receiver ②.
Figure 14. Reward for each learning episode for interfered receiver ②.
Figure 15. SINR map based on the P_c obtained using the proposed DQL-based power control algorithm for interfered receiver ③.
Figure 16. (a) INR and (b) P_out for each learning episode for interfered receiver ③.
Figure 17. Reward for each learning episode for interfered receiver ③.
Table 1. HAPS system parameters.

Parameter | Value
Center frequency (f) | 2545 MHz
Channel bandwidth (BW) | 20 MHz
Area radius | 90 km
Altitude (h_HAPS) | 20 km
Number of cells (N_cell) | 7
Antenna pattern | Recommendation ITU-R M.2101
Element gain (G_E,max) | 8 dBi
Horizontal/vertical 3 dB beamwidth of single element | 65° for both H/V
Antenna array configuration (row × column) | 2 × 2 elements (1st layer cell); 4 × 2 elements (2nd layer cell)
Ohmic losses (L_ohm) | 2 dB
Antenna tilt | 90° (1st layer cell); 23° (2nd layer cell)
Antenna polarization | Linear/±45°
Number of distributed UEs (N_UE) | 1000
UE height | 1.5 m
UE antenna gain | −3 dBi
Minimum required SINR (η_o) | −10 dB
Table 2. Interfered system (IMT BS) parameters.

Parameter | Value
Center frequency (f) | 2545 MHz
Channel bandwidth (BW) | 20 MHz
Noise figure (N_f) | 5 dB
Antenna height (h_V) | 20 m
Antenna tilt | 10°
Antenna pattern | Recommendation ITU-R F.1336 (recommends 3.1); k_a = 0.7, k_p = 0.7, k_h = 0.7, k_v = 0.3; horizontal 3 dB beamwidth: 65°; vertical 3 dB beamwidth determined from the horizontal beamwidth equations in Recommendation ITU-R F.1336 (vertical beamwidths of actual antennas may also be used when available)
Antenna polarization | Linear/±45°
Maximum antenna gain (G_0) | 16 dBi
Protection criterion (INR_th) | −6 dB
Table 3. INR and P_out for the interfered receiver locations.

Interfered receiver | Location (km) | INR for P_max (dB) | INR for P_min (dB) | P_out for P_max (%) | P_out for P_min (%)
① | 100, 0, 0.02 | −3.01 | −11.01 | 0 | 43.7
② | 77.9, 45, 0.02 | −4.08 | −12.08 | 0 | 43.7
③ | 65.8, 0, 0.02 | 1.81 | −6.19 | 0 | 43.7
Table 4. Performance comparison for interfered receiver ①.

 | P_Cell_1 (dBm) | P_Cell_2 (dBm) | P_Cell_3 (dBm) | P_Cell_4 (dBm) | P_Cell_5 (dBm) | P_Cell_6 (dBm) | P_Cell_7 (dBm) | INR (dB) | P_out (%)
Optimal | 37 | 34 | 30 | 34 | 34 | 34 | 34 | −6.93 | 0.6
DQL | 37 | 34 | 30 | 34 | 34 | 34 | 34 | −6.93 | 0.6
DDQL | 37 | 34 | 30 | 34 | 34 | 34 | 34 | −6.93 | 0.6
Table 5. Performance comparison for interfered receiver ②.

 | P_Cell_1 (dBm) | P_Cell_2 (dBm) | P_Cell_3 (dBm) | P_Cell_4 (dBm) | P_Cell_5 (dBm) | P_Cell_6 (dBm) | P_Cell_7 (dBm) | INR (dB) | P_out (%)
Optimal | 37 | 34 | 32 | 32 | 34 | 34 | 34 | −6.08 | 0.2
DQL | 37 | 34 | 32 | 32 | 34 | 34 | 34 | −6.08 | 0.2
DDQL | 37 | 34 | 32 | 32 | 34 | 34 | 34 | −6.08 | 0.2
Table 6. Performance comparison for interfered receiver ③.

 | P_Cell_1 (dBm) | P_Cell_2 (dBm) | P_Cell_3 (dBm) | P_Cell_4 (dBm) | P_Cell_5 (dBm) | P_Cell_6 (dBm) | P_Cell_7 (dBm) | INR (dB) | P_out (%)
Optimal | 37 | 34 | 26 | 32 | 34 | 34 | 34 | −6.02 | 5.1
DQL | 37 | 32 | 26 | 32 | 34 | 34 | 34 | −6.06 | 5.7
DDQL | 37 | 32 | 26 | 32 | 34 | 34 | 34 | −6.06 | 5.7
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
