2.1. Channel Model
Figure 1 shows the reference scenario considered in this paper. We studied LOS estimation and link selection under dual connectivity in NTNs, with simultaneous use of two radios, as envisioned by the 3GPP [2]. In this architecture, the LEO satellites are equipped with the gNB Distributed Unit (DU) [22], while the Centralized Unit (CU) is located on the ground. We considered a scenario in which the satellites and a UAV equipped with two UEs were moving relative to each other. We used the Starlink satellite system, a mega-constellation of 3000 LEO satellites. The UE could connect to two satellites simultaneously. To allow our framework to overcome energy constraints, our AC agent ran on the Ground Control Station (GCS) of the UAV, which had enough computational resources. After the computations, the GCS sent the traffic scheduling policy to the energy-constrained UE (UAV), which only performed traffic scheduling over the satellite links (see Figure 1).
According to [
23], a satellite in the constellation moves in a circular orbit with inclination
at an altitude
h and an orbit radius
, and the satellites move independently of each other. The same authors in [
23] defined
as the central angle between the Earth station (the UE in this case) and the locus of the trajectory points of the satellite corresponding to an elevation angle
, with
. For any single point of the satellite’s locus, the maximum elevation angle
determines the visibility time of the satellite and the distribution of the elevation angles in the visibility region [
23]. The visibility region is defined as the smallest angle
for which the satellite is visible from the UE along its whole trajectory. Therefore, given the UE latitude
, the probability for a satellite in its trajectory to be visible from the UE can be determined from the Probability Density Function (PDF) of
, denoted by
where
and
The PDF in Equation (
2) may assume different shapes according to
,
and
, as detailed in [
23]. Owing to space limitations, we only consider the PDF of the elevation angles over the points of the satellite’s trajectory that lie in the visibility region. The authors in [
23] derived the PDF
as the marginalization of the joint probability
, defined as in the following equation:
where
and
Therefore, the satellite visibility interval from the UE at a given latitude, as the elevation angle varies from
to
was given in [
23] as
The satellites move in different orbits at different speeds. According to 3GPP [4], the LOS probability changes with the satellite elevation angle. In general, the LOS probability increases with the elevation angle, reaching a maximum at nadir (90°), when the satellite is directly above the UE, which occurs only if the UE lies in the orbital plane of the satellite. In dense urban areas, the LOS probability is lower, especially at low altitudes, because the signal is obstructed by, and reflected off, high-rise buildings. Consequently, the AC agent must learn whether to schedule traffic on a single link or on both links simultaneously (for redundancy), according to the given QoS requirement and its estimate of the LOS probability of the two links as the satellites change their elevation angles.
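The trade-off between single-link and redundant transmission can be illustrated with a small sketch. It assumes that a packet is received only when the chosen link is in LOS and, as a simplifying assumption not stated in the text, that the two links are statistically independent. The function name and the action labels are hypothetical.

```python
def success_probability(p_los_1: float, p_los_2: float, action: str) -> float:
    """Probability that a packet is received under a given scheduling action.

    Assumptions (illustrative only): reception succeeds only if the chosen
    link is in LOS, and the two links are independent.
    """
    for p in (p_los_1, p_los_2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("LOS probabilities must lie in [0, 1]")
    if action == "link1":
        return p_los_1
    if action == "link2":
        return p_los_2
    if action == "both":
        # Redundant transmission fails only if both links are blocked.
        return 1.0 - (1.0 - p_los_1) * (1.0 - p_los_2)
    raise ValueError(f"unknown action: {action!r}")


# Example with hypothetical LOS probabilities for the two links.
for act in ("link1", "link2", "both"):
    print(act, round(success_probability(0.6, 0.8, act), 4))
```

With these hypothetical values, sending on both links raises the success probability above that of the better single link, at the cost of duplicated traffic; this is the trade-off the agent has to weigh against the QoS requirement.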
In this work, for the channel model, we adopted the statistical model for mixed propagation conditions provided by the International Telecommunication Union (ITU) for the design of Earth-space land mobile telecommunication systems [5]. In this ITU recommendation, a communication channel between a satellite and a UAV or any land mobile terminal is characterized by variations in the received signal power due to shadowing from buildings and vegetation, as well as multipath fading caused by reflections from obstacles and from the ground. The ITU recommendation [5] provides a three-state Markov chain model to characterize the behavior of the land mobile satellite channel:
The first state is characterized by the presence of the LOS. This state is modeled by Ricean fading for unshadowed areas with high received signal power.
The second state has no LOS due to strong shadowing and blocking from obstacles. This state is modeled with Rayleigh fading.
The third state, between these two, is the transition state, in which the multipath component power increases or decreases linearly [5].
For the purpose of this work, we followed a simplified Markov chain model known as the Lutz model [24,25], which reduces the three states to two: the lossless good state (G) with LOS, and the bad state (B) with no LOS, which is characterized by shadowing, blocking and erroneous traffic reception.
We now derive the state transition probabilities according to the Lutz model. Following the work in [26], we let
be the switching matrix of the Lutz model. In this model, the time required to transmit one bit is taken as the channel state transition unit, and b and g are the transition probabilities from G to B and from B to G, respectively.
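As a minimal sketch of the two-state chain described above, the following code builds a switching matrix from b and g and derives two standard quantities of such a chain: the stationary distribution and the mean sojourn time in each state. The row and column ordering (G first, then B) and the function names are assumptions made for illustration.

```python
from typing import List, Tuple


def switching_matrix(b: float, g: float) -> List[List[float]]:
    """Two-state switching matrix.

    Assumed layout: rows are the current state, columns the next state,
    both ordered (G, B). b = P(G -> B), g = P(B -> G).
    """
    if not (0.0 < b <= 1.0 and 0.0 < g <= 1.0):
        raise ValueError("b and g must lie in (0, 1]")
    return [[1.0 - b, b],
            [g, 1.0 - g]]


def stationary_distribution(b: float, g: float) -> Tuple[float, float]:
    """Long-run probabilities (pi_G, pi_B) of the two-state chain."""
    return g / (b + g), b / (b + g)


def mean_sojourn(b: float, g: float) -> Tuple[float, float]:
    """Mean number of transition units spent in G and in B (geometric sojourn)."""
    return 1.0 / b, 1.0 / g


# Example with hypothetical transition probabilities.
P = switching_matrix(b=0.01, g=0.04)
print(P)
print(stationary_distribution(0.01, 0.04))
print(mean_sojourn(0.01, 0.04))
```

Under this convention, the long-run fraction of time in the good state is g / (b + g), which can serve as a quick sanity check on any simulated trace.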
If we let
and
denote the mean length (in meters) of the G and B states, respectively, as derived in [
24], and let
be the speed in meters per second (m/s) of a moving vehicle, and we then assume that the packets transmitted by the vehicle have a length of
l bits, with
R being the bit rate in bits per second (bps), then the state time durations
and
are given by
and
, respectively, equal to
According to [
5],
and
can be computed as follows:
where
and
are the mean, standard deviation and minimum state lengths in meters, respectively, of the channel states.
These parameters (
and
) are provided in [
5] for urban, suburban and rural environments at different elevation angles and transmission frequencies as reported in
Table 1.
For the purpose of this work, we used the parameters for an urban environment at 2.2 GHz to compute the mean lengths of the channel states
and
using Equation (
7). We then used Equation (
6) to calculate the corresponding transition probabilities
g and
b for our channel model as reported in
Table 2. We then used these transition matrices to create Markov link state traces for training our model. We assumed successful traffic reception only if there was an LOS (i.e., the link was in the good state). We also assumed that the UAV received feedback reports, as described in [3], indicating the traffic reception status and the link state.
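The trace generation step can be sketched as follows. This is an illustrative simulation of a two-state Markov chain, not the authors' code; the state encoding, function names and the example values of b and g are hypothetical. Reception is counted as lost whenever the link is in the bad state, in line with the assumption stated above.

```python
import random
from typing import List, Optional

GOOD, BAD = 0, 1  # hypothetical encoding: 0 = LOS (G), 1 = no LOS (B)


def generate_trace(b: float, g: float, n_steps: int,
                   seed: Optional[int] = None, start: int = GOOD) -> List[int]:
    """Sample a link state trace; b = P(G -> B), g = P(B -> G) per step."""
    rng = random.Random(seed)
    state = start
    trace = []
    for _ in range(n_steps):
        trace.append(state)
        u = rng.random()
        if state == GOOD and u < b:
            state = BAD
        elif state == BAD and u < g:
            state = GOOD
    return trace


def loss_rate_per_episode(trace: List[int], episode_len: int = 1000) -> List[float]:
    """Fraction of BAD (lost) steps in each complete episode of the trace."""
    n_full = len(trace) // episode_len
    return [sum(trace[k * episode_len:(k + 1) * episode_len]) / episode_len
            for k in range(n_full)]


# Example with hypothetical transition probabilities.
trace = generate_trace(b=0.05, g=0.2, n_steps=200_000, seed=1)
rates = loss_rate_per_episode(trace)
print(len(rates), sum(trace) / len(trace))  # long-run loss should be near b / (b + g)
```

Comparing the empirical fraction of bad states with b / (b + g) is a simple way to check that a generated trace matches the intended channel model.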
2.5. Simulation Set-Up
We simulated UAV-satellite transmission with dual connectivity, in which one UAV could use two UEs to connect to two different satellites. Our goal was to train an AC learning agent to estimate the LOS model of the two satellites and the optimal policy for selecting appropriate links (transmission policy) while tracking the changes in the elevation angles of the satellites. As pointed out earlier, the LOS probability changes with the elevation angle. Therefore, the learning agent must continuously track the variation in the LOS of the two satellites as a function of the elevation angles. To this end, using a satellite tracker (https://satellitemap.space, accessed on 11 January 2023), we selected pairs of Starlink satellites visible from Paris, France at a given time. We chose a satellite pair that provided clear handoff events, which forced the learning agent to retrain its model. That is, as the elevation angle of one satellite decreased and, consequently, its LOS probability decreased, the corresponding UAV interface connected to a new visible satellite with a higher elevation angle. Note, however, that the handoffs of the two interfaces did not happen simultaneously, since the two radio interfaces were independent of each other.
Since the new connections had channel models different from the previous ones, the learning agent was forced to retrain to adapt to the new environment. The selected pairs of angles and the two handoff events are shown in Figure 3.
Figure 4 shows the probability mass function of the satellite visibility at given elevation angles, evaluated using Equation (4), compared with the empirical values obtained from the real dataset collected by the satellite tracker during a window of 15 min (the maximum allowed).
The ITU recommendations in [5] provide the link parameters required to compute the state transition probabilities at different elevation angles, propagation frequencies and environments. These parameters are reported in Table 1 for each state: the good state (G) and the bad state (B). They include the mean (
), standard deviation (
) and minimum duration of each state in the given propagation environment. For the purpose of this work, we used the parameters for the dense urban scenario at 2.2 GHz and elevation angles of
,
and
, at which the satellites were visible from Paris, France, which was our reference scenario. By applying these parameters in Equations (6)–(8), we computed the corresponding state transition probabilities, as reported in Table 2. Other parameters used for the computation included the velocity of the mobile UE (
= 10 m/s) and the packet length (
l = 1000 bits).
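To make the role of the velocity and the packet length concrete, the sketch below converts mean state lengths in meters into per-packet transition probabilities. It assumes the common Lutz-style relation in which the mean number of packets spent in a state equals the mean state length divided by the distance traveled during one packet, and the transition probability is the reciprocal of that number; this may differ in detail from Equations (6)–(8). The bit rate and the mean state lengths used in the example are hypothetical values, not taken from Table 1.

```python
from typing import Tuple


def transition_probabilities(d_g_m: float, d_b_m: float, v_mps: float,
                             bit_rate_bps: float, packet_bits: int) -> Tuple[float, float]:
    """Return (b, g) per packet under an assumed Lutz-style conversion.

    d_g_m, d_b_m : mean lengths of the good and bad states in meters
    v_mps        : terminal speed in m/s
    bit_rate_bps : link bit rate in bps (hypothetical input)
    packet_bits  : packet length in bits
    """
    if min(d_g_m, d_b_m, v_mps, bit_rate_bps, packet_bits) <= 0:
        raise ValueError("all inputs must be positive")
    # Distance covered by the terminal while one packet is transmitted.
    meters_per_packet = v_mps * packet_bits / bit_rate_bps
    # Mean state durations expressed in packets.
    packets_in_good = d_g_m / meters_per_packet
    packets_in_bad = d_b_m / meters_per_packet
    # Geometric sojourn times: leave probability is 1 / mean duration.
    b = min(1.0, 1.0 / packets_in_good)  # G -> B
    g = min(1.0, 1.0 / packets_in_bad)   # B -> G
    return b, g


# v = 10 m/s and l = 1000 bits follow the text; the bit rate (1 Mbps) and the
# mean state lengths (50 m and 20 m) are hypothetical.
b, g = transition_probabilities(d_g_m=50.0, d_b_m=20.0, v_mps=10.0,
                                bit_rate_bps=1e6, packet_bits=1000)
print(b, g)
```

Under this assumed relation, longer mean state lengths or a slower terminal yield smaller transition probabilities, so the channel stays in each state for more consecutive packets.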
Using the selected pairs of angles, shown in Table 4, and the computed transition probabilities, we constructed state transition matrices for each satellite and for each pair of elevation angles, obtaining a total of six transitions or contexts (i.e., in each context or range of elevation angles, we transitioned to a different channel model). The duration of each context was approximated by the minimum satellite visibility time. It is important to note that we did not make any assumptions about the physical layer schemes or channel loss models because they were not required for training the AC agent, given the objective function that was designed. We then trained our AC agent using the channel traces of LOS and NLOS created above. The simulated AC networks consisted of 3 fully connected layers with 64 neurons in each layer. The output layer of the actor had a softmax activation function because it had to give the probabilities of selecting each channel, while the output layer of the critic network had no activation function because it gave only a single value: the state-action value. The other simulation parameters are reported in Table 3. We ran the simulation for 6 million iterations, with 1 million iterations for each context. On each iteration or transmission event, we considered the reception to be good only if the reported channel state was good (i.e., in LOS). The E2E loss was computed after each episode of 1000 iterations, and the results are reported in the following sections.
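The network layout described above can be sketched with a plain-Python forward pass. This is only an illustration of the stated structure (fully connected layers of 64 neurons, a softmax actor output and a linear critic output). The state dimension, the number of actions, the ReLU hidden activation, the weight initialization and the reading of "3 layers" as three hidden layers followed by an output layer are all assumptions; training is not shown.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights (uniform, scaled by 1/sqrt(n_in)) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine map of one fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + bias
            for row, bias in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def build_mlp(n_in, n_out, rng, hidden=(64, 64, 64)):
    """Assumed layout: three hidden layers of 64 units plus an output layer."""
    sizes = [n_in, *hidden, n_out]
    return [init_layer(a, c, rng) for a, c in zip(sizes[:-1], sizes[1:])]


def forward(mlp, x, output_softmax):
    for layer in mlp[:-1]:
        x = relu(dense(x, layer))  # ReLU is an assumed hidden activation
    out = dense(x, mlp[-1])
    return softmax(out) if output_softmax else out


rng = random.Random(0)
STATE_DIM, N_ACTIONS = 4, 3               # hypothetical sizes
actor = build_mlp(STATE_DIM, N_ACTIONS, rng)
critic = build_mlp(STATE_DIM, 1, rng)

state = [1.0, 0.0, 1.0, 1.0]              # hypothetical feedback vector
probs = forward(actor, state, output_softmax=True)     # link selection probabilities
value = forward(critic, state, output_softmax=False)[0]  # single scalar estimate
print(probs, value)
```

The softmax output always sums to one, so it can be sampled directly as a link selection policy, whereas the critic output is an unconstrained scalar.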