1. Introduction
Global network traffic has been proliferating in recent years [1]. To meet the increasing traffic demand, ubiquitous networking and computing (UNC) has been widely adopted in production and daily life [2]. As an essential part of ubiquitous network communication [3,4], low earth orbit (LEO) satellite constellations offer wide coverage [5] and remain available in times of disaster [6]. They also play an essential role in military applications [7] and autonomous driving [8].
With the development of satellite constellations in recent years [9], adaptive modulation and coding (AMC) has been used for satellite-to-ground communications in complicated and variable channel environments [10,11]. AMC adjusts the modulation and coding scheme according to the varying channel conditions [12,13,14]. Choosing the correct AMC scheme for satellite-to-ground communications therefore depends on accurately estimating the satellite-to-ground channel state. Vincent et al. [15] introduced a noncoherent M-ary orthogonal AMC method for direct-sequence code division multiple access scenarios. Ola et al. [16] dynamically selected different low-density parity-check (LDPC) codes according to the bit error rate (BER). However, these methods do not consider how to estimate the complicated and variable satellite-to-ground channel.
Many research works have been carried out to obtain accurate estimates of the satellite-to-ground channel state. Hole et al. [17] predicted the channel quality using a smooth fading method. Daniels et al. [18] and Tsakmalis et al. [19] adopted support vector regression (SVR) [20], a variant of the support vector machine (SVM) [21], to estimate the channel status. Angelone et al. [22] estimated the channel quality by subtracting the estimated feeder downlink signal-to-noise ratio (SNR) from the end-to-end SNR and selected the coding scheme accordingly. Wang et al. [23] adopted machine learning and tested the prediction accuracy of multiple methods (i.e., linear regression and the multilayer perceptron). Because satellite motion is highly regular, historical channel information is strongly correlated with the channel information to be predicted. With the development of machine learning in recent years, long short-term memory (LSTM) [24] has been widely used for time series prediction. Cheng et al. [25] used LSTM in an audio-video coding scenario to predict the channel quality at the next moment. Moniem et al. [26] used an LSTM that takes historical SNRs into account to predict the channel quality. However, none of these works consider the effect of real-time weather on the channel state, and therefore, they cannot predict it accurately.
As mentioned above, weather conditions strongly affect satellite-to-ground communications, so they should be included in the estimation of channel states. According to the second-generation standard for satellite broadband services, DVB-S2 [27], satellite-to-ground communications should allow a fixed margin of 6 dB for rain loss. However, the average rain loss varies across the globe and is significantly lower at high latitudes than at low ones [28]. Even at low latitudes, the real-time rain loss depends on whether and how heavily it is raining locally. Therefore, the coding scheme can be selected based on real-time weather conditions: a scheme that transmits more data is chosen when it is sunny, and a more robust scheme is chosen when it is raining or snowing. Choi et al. [29] considered weather and predicted the channel variation using an auto-regressive method. Alberty et al. [30] proposed a method to estimate satellite-to-ground channel quality according to the quality of service (QoS) under different weather conditions. However, these schemes only simulate the situation over a specific area. Because LEO satellites move around the globe, a model that accounts for real-time weather across multiple regions is necessary.
Furthermore, even if the actual channel state is obtained, AMC needs to adjust the coding scheme dynamically. Some works [15,16] adopted a look-up table method to obtain the optimal coding scheme. However, the conventional look-up table method cannot cope with multi-antenna scenarios. Reinforcement learning, in contrast, can adapt to the complicated satellite-to-ground channel and to multi-antenna scenarios. Victor et al. [31] proposed a Q-learning algorithm with constrained exploration spaces. Other works [32,33] proposed multi-objective Q-learning algorithms to select a coding scheme for satellite-to-ground communication. Because satellites move along regular, predictable orbits, the current channel state is related to the historical ones. Unfortunately, these works do not consider past channel states, which may limit their performance and feasibility.
This paper proposes an AMC method based on deep learning (DL) and deep reinforcement learning (DRL) for ubiquitous satellite-to-ground communications. The proposed method consists of a DL-based estimation model and a DRL-based decision model. The estimation model addresses the inaccurate channel estimation of previous works by establishing a global real-time weather model and considering previous channel information. The decision model can cope with the multi-antenna scenarios that the look-up table method of previous works cannot handle, and it improves the throughput of satellite-to-ground communication by using an actor-critic network. Simulation results show that the proposed AMC scheme improves throughput performance.
The remaining sections of this paper are organized as follows. Section 2 reviews recent work related to this paper. Section 3 introduces the satellite-to-ground channel and formulates the AMC problem as a Markov decision process (MDP). Section 4 presents the proposed intelligent weather-conscious AMC method. Section 5 describes the performance validation procedure and the simulation results. Finally, Section 6 concludes this paper.
3. System Model
As shown in Figure 1, the weather varies from location to location. The satellite over Beijing can choose a coding scheme that transmits more data because of the clear weather and higher SNR, whereas the satellite over Shanghai can only choose a relatively conservative coding scheme because of the attenuation caused by clouds and the resulting low SNR. The SNR of the communication link is mainly determined by free-space loss (FSL) and rain loss. However, the current margin reserved for rain loss is very large, which wastes spectrum; therefore, we can dynamically adjust the coding scheme according to the SNR to make full use of the spectrum resources.
3.1. Satellite-to-Ground Channel Loss Formulation
We consider a scenario in which terrestrial user equipment (UE) downloads data from a satellite. In this scenario, the satellite can dynamically adjust the coding scheme according to the SNR to increase the throughput. The SNR in our scenario is defined as
In this equation, Boltzmann’s constant k is fixed, while the transmitter power, receiver power, and bandwidth are set by the scenario at the beginning and usually remain unchanged. We assume that the transmission rate is constant in order to observe the band utilization. Hence, only the remaining two terms, the FSL and the rain loss, vary.
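As a hedged illustration of how the fixed and time-varying terms combine, the sketch below uses a standard textbook dB-domain link budget written in terms of EIRP and the receiver figure of merit G/T. This formulation, the function name `snr_db`, and all numeric values are assumptions for illustration, not the authors' equation or parameters.

```python
import math

# Boltzmann's constant expressed in dBW/(K*Hz); approximately -228.6.
BOLTZMANN_DBW = 10 * math.log10(1.380649e-23)

def snr_db(eirp_dbw, g_over_t_dbk, fsl_db, rain_loss_db, bandwidth_hz):
    """Carrier-to-noise ratio (dB) from an assumed standard link budget.

    Only fsl_db and rain_loss_db change during a pass; the other inputs
    are fixed by the scenario, as described in the text.
    """
    return (eirp_dbw + g_over_t_dbk - fsl_db - rain_loss_db
            - BOLTZMANN_DBW - 10 * math.log10(bandwidth_hz))

# Hypothetical values: 40 dBW EIRP, 10 dB/K G/T, 180 dB FSL, 250 MHz bandwidth.
clear = snr_db(40.0, 10.0, 180.0, 0.0, 250e6)
rainy = snr_db(40.0, 10.0, 180.0, 6.0, 250e6)
```

With the fixed terms held constant, every decibel of rain loss (or of a fixed rain margin) is subtracted directly from the usable SNR, which is why a 6 dB margin applied in clear weather leaves link capacity unused.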
FSL depends only on distance and frequency, and because the frequency changes only slightly, we treat it as fixed. As a satellite approaches and then recedes, the FSL first decreases and then increases. According to the rain loss calculation method specified in ITU-R P.618-1, rain loss mainly depends on the average rainfall, the altitude and latitude of the UE, the elevation angle, and the communication frequency. Because the communication frequency is fixed, the UE altitude rarely changes, and the elevation angle is the same for each communication, latitude has the most significant impact. Generally speaking, low-latitude areas have thicker clouds and greater annual rainfall, while high-latitude areas have lower annual precipitation. Therefore, rain loss is usually more significant in low-latitude areas and slight in high-latitude areas. The conventional rain-loss margin, fixed at 6 dB, thus wastes band resources. If we can accurately predict the SNR at the next moment and select a proper coding scheme, band utilization and throughput can be significantly improved.
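To illustrate why the FSL first decreases and then increases during a pass, the sketch below combines the standard free-space loss formula, 20·log10(4πdf/c), with the slant range of a satellite seen at a given elevation angle over a spherical Earth. The 550 km altitude, the 20 GHz frequency, and the elevation profile are hypothetical example values.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)

def slant_range_km(elevation_deg, altitude_km):
    """UE-to-satellite distance for a given elevation angle (law of cosines)."""
    e = math.radians(elevation_deg)
    r_sat = EARTH_RADIUS_KM + altitude_km
    return (math.sqrt(r_sat ** 2 - (EARTH_RADIUS_KM * math.cos(e)) ** 2)
            - EARTH_RADIUS_KM * math.sin(e))

def fsl_db(distance_km, freq_ghz):
    """Free-space loss 20*log10(4*pi*d*f/c) with d in km and f in GHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_ghz) + 92.45

# Hypothetical pass: the satellite rises to zenith and then sets again.
elevations = [10, 30, 60, 90, 60, 30, 10]
losses = [fsl_db(slant_range_km(e, 550.0), 20.0) for e in elevations]
```

The loss is smallest at zenith, where the slant range equals the orbital altitude, and grows symmetrically as the satellite moves toward the horizon.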
Table 1 summarizes the terms used in this paper and their abbreviations.
3.2. AMC Problem Formulation
We assume that the coordinates of a UE and of a satellite are known, so the distance d between them can be computed from these coordinates. Because cloud thickness and rainfall vary across locations on Earth, we use the local real-time weather w and the position of the UE to characterize the local real-time rain loss.
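The text states only that the distance follows from the two sets of coordinates. One common way to compute it, assumed here because the paper does not specify the conversion, is to map each geodetic (latitude, longitude, altitude) triple to Earth-centred, Earth-fixed (ECEF) Cartesian coordinates on the WGS-84 ellipsoid and take the Euclidean distance. The function names and the example coordinates are hypothetical.

```python
import math

WGS84_A = 6378137.0                 # semi-major axis (m)
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared

def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert WGS-84 latitude/longitude (degrees) and altitude (m) to ECEF metres."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)  # prime vertical radius
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + alt_m) * math.sin(lat)
    return x, y, z

def ue_satellite_distance(ue, sat):
    """Straight-line distance d (m) between two (lat, lon, alt) triples."""
    return math.dist(geodetic_to_ecef(*ue), geodetic_to_ecef(*sat))

# Hypothetical UE near Beijing and a satellite at 550 km altitude.
d = ue_satellite_distance((39.9, 116.4, 50.0), (40.0, 116.0, 550e3))
```

Using ECEF coordinates keeps the distance computation independent of where on the globe the UE is, which matters because the model is meant to cover all latitudes.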
From Section 3.1, the SNR can be expressed as a function of the distance d and the rain loss:
In a real scenario, the real-time SNR at the current moment is unknown because of the channel transmission delay, so it must be estimated. We therefore want the error between the value produced by the estimation method and the true SNR to be close to 0:
Having obtained an estimate of the real-time SNR, the next task is to select a suitable redundancy rate. The communication coding standard proposed by the Consultative Committee for Space Data Systems (CCSDS) for use in LEO satellites uses the accumulate-repeat-4-jagged-accumulate (AR4JA) code [37], a protograph-based LDPC code, with three data rates of 50%, 66%, and 80% (code rates 1/2, 2/3, and 4/5). As shown in Figure 2, each data rate yields a specific BER at a given SNR. We should maximize the data rate while maintaining the communication quality in order to obtain the maximum throughput:
where R is the data rate and PER(R, SNR) represents the packet error rate for the corresponding data rate and SNR.
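The rate-selection objective can be made concrete with a small sketch, assuming throughput takes the form R·(1 − PER(R, SNR)). Because the curves of Figure 2 are not reproduced here, the packet error rate is replaced by a hypothetical logistic "waterfall" model whose SNR thresholds are invented for illustration; only the structure, picking the rate that maximizes expected throughput, reflects the text.

```python
import math

# Hypothetical SNR (dB) at which each code rate's PER crosses 50%; real values
# would be read from AR4JA PER curves such as those in Figure 2.
HYPOTHETICAL_THRESHOLD_DB = {1 / 2: 1.0, 2 / 3: 2.5, 4 / 5: 4.0}

def per(rate, snr_db, steepness=4.0):
    """Toy logistic 'waterfall' packet error rate model (an assumption, not measured data)."""
    return 1.0 / (1.0 + math.exp(steepness * (snr_db - HYPOTHETICAL_THRESHOLD_DB[rate])))

def throughput(rate, snr_db):
    """Expected delivered information per channel use: R * (1 - PER(R, SNR))."""
    return rate * (1.0 - per(rate, snr_db))

def best_rate(snr_db):
    """Data rate that maximizes expected throughput at the given (estimated) SNR."""
    return max(HYPOTHETICAL_THRESHOLD_DB, key=lambda r: throughput(r, snr_db))

choices = {snr: best_rate(snr) for snr in (0.0, 3.0, 8.0)}
```

Under this toy model the selected rate climbs from 1/2 to 4/5 as the SNR rises; an estimate that is biased downward by a fixed margin pushes the choice toward the conservative rates even in clear weather.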
3.3. Markov Decision Process Formulation
Because the throughput in a frame depends only on the data rate selected before transmission, we can transform the above problem into an MDP represented by a tuple consisting of the set of states observed from the environment, the set of actions (the available data-rate selections), the state-transition probability distribution of the system, and the reward.
We first define the state at time slot t. It consists of two components: the distance of the satellite-to-ground channel and the real-time weather. Although the distance can be obtained from the satellite position and the UE position, rain loss varies with latitude and longitude around the world; therefore, we retain the original position information in the state. The connection between the satellite and the UE is short: even the longest connection lasts less than 3 min in the Starlink scenario. Hence, within a single connection, we use the weather at the beginning of the connection as the real-time weather. In summary, the state at time slot t consists of the UE position, the satellite position, and the real-time weather.
Next, we define the action at time slot t as the data rate selected at time t. Notably, the action does not affect the environment itself, because the choice of data rate changes neither the locations of the satellite and UE nor the weather. It does, however, affect the throughput. If an overly conservative data rate is chosen, the PER will be very low, but the throughput will also be low; conversely, an overly aggressive choice causes decoding errors at the receiver, and the PER becomes so high that normal communication is impossible.
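Finally, we define the reward at time t as the throughput
$$r_t = R_t\left(1-\mathrm{PER}(R_t,\mathrm{SNR}_t)\right),$$
where $\mathrm{PER}(R_t,\mathrm{SNR}_t)$ represents the packet error rate for the selected data rate $R_t$ and the SNR at time t. To transmit more data per time slot, we define $R_t$ as the number of bits transmitted per unit bandwidth. The cumulative reward is $G_t=\sum_{k=0}^{\infty}\gamma^{k}r_{t+k}$, where $\gamma\in[0,1)$ is the discount rate.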
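A minimal sketch of the per-slot reward and the discounted cumulative reward follows, assuming the reward takes the form R_t·(1 − PER_t). The three-slot episode and its PER values are hypothetical.

```python
def reward(rate, packet_error_rate):
    """Per-slot reward r_t = R_t * (1 - PER_t), an assumed form of the throughput reward."""
    return rate * (1.0 - packet_error_rate)

def discounted_return(rewards, gamma):
    """G_0 = sum_k gamma**k * r_k, accumulated backward over the episode."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Hypothetical three-slot episode given as (data rate, PER) pairs.
episode = [(0.8, 0.0), (2 / 3, 0.1), (0.5, 0.0)]
rewards = [reward(rate, p) for rate, p in episode]
G0 = discounted_return(rewards, gamma=0.9)
```

Accumulating backward avoids recomputing powers of the discount rate and gives the return for every prefix position if the intermediate values are kept.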
4. Intelligent Weather-Conscious AMC Scheme for Global Satellite-to-Ground Communications
4.1. Overview
We propose an AMC method for satellite-to-ground communications based on DL and DRL that fully considers satellite motion patterns, historical SNRs, and weather conditions. This method can identify transmitter and receiver characteristics, learn online, and adapt to highly variable radio communication scenarios. As shown in Figure 3, the intelligent weather-conscious AMC model processes position and weather information from the environment and estimates the state of the satellite-to-ground channel jointly with past channel information. The coding scheme is then dynamically selected based on the estimation results. Thus, the integrated AMC model consists of a DL-based estimation model and a DRL-based decision model. The estimation model makes full use of historical satellite-to-ground channel data and takes the real-time global weather model into account. The decision model can accept multi-dimensional input and identify the characteristics of different transmitters and receivers.
First, the estimation model reads information from the environment, including the position of the UE at moment t, the position of the satellite at moment t, the SNRs of past moments, and the real-time weather conditions. We use one-hot encoding to describe the weather; for example, with the weather types ordered as sunny, cloudy, and rainy, sunny weather is encoded as (1, 0, 0). In summary, at moment t the estimation model reads the UE position, the satellite position, the past SNRs, and the one-hot weather vector from the environment.
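The observation described above can be sketched as a flat feature vector. The concatenation order, the helper names, and the "snowy" alias are assumptions for illustration; the paper does not specify how the inputs are laid out.

```python
import numpy as np

WEATHER_TYPES = ("sunny", "cloudy", "rainy")  # order as listed in Section 4.2
PRECIPITATION_ALIASES = {"snowy": "rainy"}     # snowfall is treated as rainy

def one_hot_weather(weather):
    """One-hot weather vector, e.g. 'sunny' -> [1, 0, 0]."""
    weather = PRECIPITATION_ALIASES.get(weather, weather)
    vec = np.zeros(len(WEATHER_TYPES))
    vec[WEATHER_TYPES.index(weather)] = 1.0
    return vec

def build_observation(ue_pos, sat_pos, past_snrs, weather):
    """Concatenate UE position, satellite position, past SNRs, and one-hot weather."""
    parts = [np.asarray(ue_pos, dtype=float), np.asarray(sat_pos, dtype=float),
             np.asarray(past_snrs, dtype=float), one_hot_weather(weather)]
    return np.concatenate(parts)

# Hypothetical inputs: UE (lat, lon), satellite (lon, lat, alt km), two past SNRs (dB).
obs = build_observation((39.9, 116.4), (116.0, 40.0, 550.0), [10.0, 11.0], "snowy")
```

One-hot encoding avoids imposing an artificial ordering (for example, sunny < cloudy < rainy) on categories that have no natural numeric distance.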
Next, the satellite-to-ground channel state predicted by the estimation model for moment t is passed to the actor-critic network in the decision model to select the optimal encoding scheme. The actor network is responsible for selecting the optimal encoding scheme, i.e., giving the selected action, while the critic network scores the actor's selection; the two reinforce each other and work together to learn the optimal encoding-selection strategy. The action selected by the actor is passed to the environment as the encoding scheme for the satellite-to-ground channel at time t. Finally, the environment returns the reward, which is the actual throughput at moment t. This completes one interaction.
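One interaction cycle can be sketched as a function over pluggable components. The callables `estimator`, `actor`, and `environment` are hypothetical stand-ins for the LSTM estimation model, the actor network, and the satellite-to-ground link; the sketch only fixes the order of the data flow described in the text.

```python
from typing import Callable, List, Sequence, Tuple

Transition = Tuple[float, float, float]  # (estimated SNR, selected data rate, reward)

def interaction_step(observation: Sequence[float],
                     estimator: Callable[[Sequence[float]], float],
                     actor: Callable[[float], float],
                     environment: Callable[[float], float],
                     buffer: List[Transition]) -> float:
    """One estimate -> select -> transmit cycle; all callables are hypothetical stand-ins."""
    snr_hat = estimator(observation)         # estimation model predicts the channel state
    rate = actor(snr_hat)                    # actor selects the coding scheme (data rate)
    reward = environment(rate)               # environment returns the actual throughput
    buffer.append((snr_hat, rate, reward))   # stored for the critic and later updates
    return reward

# Toy usage with lambda stand-ins.
buf: List[Transition] = []
r = interaction_step([0.0], lambda obs: 5.0,
                     lambda snr: 0.8 if snr > 4.0 else 0.5,
                     lambda rate: rate * 0.9, buf)
```

Keeping the components as arguments makes it easy to swap the estimator or the policy when comparing against look-up table baselines.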
4.2. DL-Based Estimation Model
With the development of DL, LSTM has been widely used for time series prediction. Because the memory of an LSTM can adequately capture the regular motion of satellites, past SNRs, and weather, we chose an LSTM network as the estimation model to predict the SNR at the next moment.
The input state of the LSTM network is divided into two parts: location information and weather information. We classify global weather into three types, sunny, cloudy, and rainy, and represent them with one-hot encoding. We treat both rainfall and snowfall as rainy because both are forms of precipitation. The UE on the ground can acquire the weather conditions and its real-time position, and it can obtain the satellite position for the next moment from stored satellite orbit information. We use latitude and longitude to describe the location of the UE on the ground. In a practical scenario, GeoHash [38] can compress the latitude and longitude into a 6-character string to reduce bandwidth consumption while preserving the location information. Because satellites have altitude, their position can, in principle, be described by any 3D coordinate system. We use the geographic coordinate system [39] to describe the satellite position in terms of its longitude, latitude, and altitude at moment t.
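GeoHash interleaves longitude and latitude bisection bits and maps each group of 5 bits to a base-32 character, so a 6-character hash carries 30 bits and identifies a cell on the order of one kilometre across. The sketch below is a minimal encoder of that standard scheme, shown only to illustrate the compression step; the paper does not describe its implementation, and the function name is hypothetical.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, length=6):
    """Encode latitude/longitude (degrees) as a GeoHash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, ch = [], 0, 0
    use_lon = True  # GeoHash starts with a longitude bit, then alternates
    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = (ch << 1) | 1, mid
            else:
                ch, lon_hi = ch << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = (ch << 1) | 1, mid
            else:
                ch, lat_hi = ch << 1, mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[ch])
            bits, ch = 0, 0
    return "".join(chars)

code = geohash_encode(57.64911, 10.40744)  # -> 'u4pruy'
```

Because nearby points share GeoHash prefixes, the hash length directly trades location precision against the number of bits sent over the link.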
Having introduced the input of the LSTM network, we now describe its architecture. As shown in Figure 3, the input parameters first pass through the embedding layer, which has two blocks: the first processes the weather information, and the second processes the location information. The outputs of the two embedding blocks are merged and fed into an LSTM layer. Unlike conventional RNNs, the LSTM layer controls the flow of information through three gates: the forget, input, and output gates.
4.2.1. Forget Gate
When new information is input, the model needs to forget some old information. The forget gate selects which information to forget and which to keep, which mitigates the vanishing-gradient problem of conventional RNNs:
$$f_t=\sigma\left(W_{sf}s_t+W_{hf}h_{t-1}+b_f\right),$$
where $W_{sf}$ represents the weight between the input $s_t$ and the forget gate, $W_{hf}$ is the weight between the previous hidden state $h_{t-1}$ and the forget gate, $b_f$ is the bias of the forget gate, and $\sigma(\cdot)$ is the sigmoid function.
4.2.2. Input Gate
The input gate determines which new information is stored in the cell state. It consists of two parts: a control signal $i_t$ produced by a sigmoid function, and the candidate cell state $\tilde{C}_t$ at the current moment produced by a $\tanh$ function:
$$i_t=\sigma\left(W_{si}s_t+W_{hi}h_{t-1}+b_i\right),\qquad \tilde{C}_t=\tanh\left(W_{sc}s_t+W_{hc}h_{t-1}+b_c\right),$$
where $W_{si}$ and $W_{sc}$ are the weights between the input state $s_t$ and the input gate, $W_{hi}$ and $W_{hc}$ are the weights between the previous hidden state $h_{t-1}$ and the input gate, $b_i$ and $b_c$ are the corresponding biases, and $\tanh(\cdot)$ is the hyperbolic tangent function. The cell state vector is updated as follows:
$$C_t=f_t\odot C_{t-1}+i_t\odot\tilde{C}_t,$$
where $\odot$ represents the Hadamard product operator.
4.2.3. Output Gate
The output gate, which selectively outputs the hidden state of the cell, has two parts: the control signal $o_t$ produced by the sigmoid function, and the final output value $h_t$:
$$o_t=\sigma\left(W_{so}s_t+W_{ho}h_{t-1}+b_o\right),\qquad h_t=o_t\odot\tanh\left(C_t\right),$$
where $W_{so}$ is the weight between the current input and the output gate, $W_{ho}$ is the weight between the hidden state $h_{t-1}$ of the previous moment and the output gate, and $b_o$ is the output-gate bias. The predicted state is then computed from $h_t$ using the weight vector of the output gate.
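A single LSTM step following the gate descriptions in Sections 4.2.1 to 4.2.3 can be sketched in NumPy as follows. The parameter names mirror the weights described in the text (input weights `W_s*`, hidden-state weights `W_h*`, biases `b_*`); the random initialisation and dimensions are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(s_t, h_prev, c_prev, p):
    """One LSTM step: forget, input, and output gates plus the cell-state update."""
    f_t = sigmoid(p["W_sf"] @ s_t + p["W_hf"] @ h_prev + p["b_f"])    # forget gate
    i_t = sigmoid(p["W_si"] @ s_t + p["W_hi"] @ h_prev + p["b_i"])    # input gate
    c_hat = np.tanh(p["W_sc"] @ s_t + p["W_hc"] @ h_prev + p["b_c"])  # candidate cell state
    c_t = f_t * c_prev + i_t * c_hat                                  # Hadamard-product update
    o_t = sigmoid(p["W_so"] @ s_t + p["W_ho"] @ h_prev + p["b_o"])    # output gate
    h_t = o_t * np.tanh(c_t)                                          # new hidden state
    return h_t, c_t

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical small random weights and zero biases for every gate."""
    rng = np.random.default_rng(seed)
    p = {}
    for gate in "fico":  # forget, input, candidate, output
        p[f"W_s{gate}"] = rng.normal(0.0, 0.1, (hidden_dim, input_dim))
        p[f"W_h{gate}"] = rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        p[f"b_{gate}"] = np.zeros(hidden_dim)
    return p

# Run a hypothetical 5-step input sequence through the cell.
p = init_params(input_dim=10, hidden_dim=8)
h, c = np.zeros(8), np.zeros(8)
for s_t in np.random.default_rng(1).normal(size=(5, 10)):
    h, c = lstm_cell(s_t, h, c, p)
```

Because the cell state is updated additively (scaled by the forget gate) rather than through repeated matrix multiplication, gradients can flow across many time steps, which is what lets the model exploit long SNR histories.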
The LSTM layer is followed by a fully connected layer that integrates and analyzes the outputs of the LSTM layer. The output layer follows the fully connected layer; because only the SNR at moment t needs to be predicted, the output layer consists of a single neuron.
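The full estimator pipeline, two embedding blocks feeding an LSTM layer followed by a fully connected layer and a single output neuron, can be sketched as an untrained NumPy forward pass. The class name, layer sizes, tanh embeddings, stacked gate weights, and ReLU activation in the fully connected layer are all assumptions; the paper does not state them.

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SNREstimatorSketch:
    """Hypothetical forward pass: weather/location embeddings -> LSTM -> FC -> 1 neuron."""

    def __init__(self, loc_dim=5, weather_dim=3, emb_dim=8, hidden=16, fc_dim=8, seed=0):
        rng = np.random.default_rng(seed)
        init = lambda *shape: rng.normal(0.0, 0.1, shape)
        self.W_loc = init(emb_dim, loc_dim)          # location embedding block
        self.W_weather = init(emb_dim, weather_dim)  # weather embedding block
        # Stacked weights for the forget, input, candidate, and output gates.
        self.W_gates = init(4 * hidden, 2 * emb_dim + hidden)
        self.b_gates = np.zeros(4 * hidden)
        self.W_fc, self.b_fc = init(fc_dim, hidden), np.zeros(fc_dim)
        self.w_out, self.b_out = init(fc_dim), 0.0   # single output neuron
        self.hidden = hidden

    def predict(self, loc_seq, weather_seq):
        """Return the estimated SNR after consuming a sequence of (location, weather) inputs."""
        h = np.zeros(self.hidden)
        c = np.zeros(self.hidden)
        for loc, weather in zip(loc_seq, weather_seq):
            x = np.concatenate([np.tanh(self.W_loc @ loc), np.tanh(self.W_weather @ weather)])
            f, i, g, o = np.split(self.W_gates @ np.concatenate([x, h]) + self.b_gates, 4)
            c = _sigmoid(f) * c + _sigmoid(i) * np.tanh(g)
            h = _sigmoid(o) * np.tanh(c)
        fc = np.maximum(0.0, self.W_fc @ h + self.b_fc)  # fully connected layer (ReLU assumed)
        return float(self.w_out @ fc + self.b_out)

# Four hypothetical time steps of sunny weather with zero-valued location features.
model = SNREstimatorSketch()
snr_hat = model.predict(np.zeros((4, 5)), np.tile([1.0, 0.0, 0.0], (4, 1)))
```

In practice the weights would be learned by pre-training on historical SNR traces; stacking the four gate matrices into one is only a compact equivalent of the separate per-gate weights.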
The estimation model can be pre-trained using historical information. Its output is passed as input to the actor-critic network in the decision model. Accurately predicting the current SNR is crucial because it is the basis on which the decision model makes correct decisions.
4.3. DRL-Based Decision Model
MIMO antennas are often used in satellite-to-ground communication, and the conventional look-up table method cannot handle them. Moreover, to accommodate differences between devices and to adapt to the dynamically changing characteristics of the satellite-to-ground channel, we adopt a DRL-based decision model to select the optimal coding scheme at each moment.
For simplicity of presentation, we use $s_t$, $a_t$, and $r_t$ to represent the state, action, and reward, respectively, at moment t. The state of the decision model is the satellite-to-ground channel state estimated by the estimation model, the action is the selected data rate, and the reward is the throughput. We suppose that the MDP produces a trajectory describing the interaction between the environment and the DRL agent. We can therefore obtain the reward at each time step of the trajectory, and the actual cumulative reward at state $s_t$ is
$$G_t=\sum_{k=t}^{T}\gamma^{\,k-t}r_k,$$
where $T$ is the final time step of the trajectory and the discount factor $\gamma$ is a hyperparameter that balances short-term and long-term returns.
Accurate training data are scarce for satellite-to-ground channel communication, so we want to make full use of historical data. Therefore, the proximal policy optimization (PPO) algorithm [40], which is based on the actor-critic algorithm [41], is used as the gradient update algorithm. The actor and the critic are the two neural networks in the agent: the actor network makes decisions and selects the best action $a_t$, while the critic network scores and evaluates the actor's choice.
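We use this sampled value as the expected cumulative reward to train the critic network, with the loss function defined as
$$L(\phi)=\hat{\mathbb{E}}_t\left[\left(V_\phi(s_t)-G_t\right)^2\right],$$
where $\phi$ denotes the parameters of the critic network $V_\phi$ and $G_t$ is the sampled cumulative reward at state $s_t$.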
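Satellite-to-ground channel environments are similar to one another, and under similar environments and SNRs, the optimal choices are likely to be the same. To make full use of the information from other trajectories, we introduce importance sampling into gradient propagation:
$$\nabla_\theta J(\theta)=\hat{\mathbb{E}}_t\left[\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)}\hat{A}_t\nabla_\theta\log\pi_\theta(a_t\mid s_t)\right],$$
where $\pi_\theta$ is the current policy, $\pi_{\theta_{\mathrm{old}}}$ is the old policy used to collect the trajectory, and $\hat{A}_t$ is the estimate of the advantage function, which measures how much better a specific action $a_t$ is than the average action at state $s_t$. To trade off the bias and variance of the advantage estimate, we employ an exponentially weighted method to obtain the generalized advantage estimation (GAE) [42]:
$$\hat{A}_t=\sum_{l=0}^{\infty}(\gamma\lambda)^{l}\delta_{t+l},\qquad \delta_t=r_t+\gamma V_\phi(s_{t+1})-V_\phi(s_t),$$
where $\lambda\in[0,1]$ is a hyperparameter. If $\lambda=0$, the estimate reduces to the one-step temporal-difference error, $\hat{A}_t=\delta_t$.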
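The GAE recursion can be computed backward over a finite trajectory, since Â_t = δ_t + γλ·Â_{t+1}. The sketch below assumes the critic's value estimates are supplied with one extra bootstrap entry for the state after the last step; the function name and example numbers are hypothetical.

```python
import numpy as np

def gae(rewards, values, gamma, lam):
    """Generalized advantage estimates; `values` has len(rewards) + 1 entries."""
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    deltas = rewards + gamma * values[1:] - values[:-1]  # one-step TD errors
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running      # A_t = delta_t + gamma*lam*A_{t+1}
        advantages[t] = running
    return advantages

# lam = 0 gives the one-step TD error; lam = 1 gives the Monte Carlo return minus V(s_t).
adv_td = gae([1.0, 2.0], [0.5, 0.5, 0.0], gamma=0.9, lam=0.0)
adv_mc = gae([1.0, 2.0], [0.5, 0.5, 0.0], gamma=0.9, lam=1.0)
```

Intermediate values of λ blend these two extremes, which is the bias and variance trade-off that motivates GAE.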
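Following the gradient-descent approach, adding a soft constraint yields a first-order solution that approximates the second-order (trust-region) solution. Because excessive deviations in the trajectory can cause large gradient deviations, we adopt the clipping method of [40]:
$$L^{\mathrm{CLIP}}(\theta)=\hat{\mathbb{E}}_t\left[\min\left(\rho_t(\theta)\hat{A}_t,\ \operatorname{clip}\left(\rho_t(\theta),1-\epsilon,1+\epsilon\right)\hat{A}_t\right)\right],\qquad \rho_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)},$$
where $\rho_t(\theta)$ is the ratio between the new policy and the old policy, $\epsilon$ is a hyperparameter that denotes the tolerance for the deviation level, and $\operatorname{clip}(\cdot)$ modifies the surrogate objective by clipping the probability ratio, which removes the incentive for moving $\rho_t(\theta)$ outside of the interval $[1-\epsilon,1+\epsilon]$.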
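The clipped surrogate can be evaluated over a batch directly from log-probabilities. This is a minimal sketch of the standard PPO clipping term; the function name and the default ε = 0.2 are illustrative choices, not values reported in the paper.

```python
import numpy as np

def clipped_surrogate(logp_new, logp_old, advantages, eps=0.2):
    """Batch-averaged PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(unclipped, clipped)))

# Ratio 1.5 with a positive advantage is capped at 1 + eps = 1.2.
gain = clipped_surrogate([np.log(1.5)], [0.0], [1.0])
```

Taking the minimum of the clipped and unclipped terms makes the objective a pessimistic bound: the policy gains nothing from pushing the ratio past 1 ± ε, but a harmful move is still fully penalized.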
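Therefore, we can formulate the objective function of the actor network as
$$L(\theta)=\hat{\mathbb{E}}_t\left[L_t^{\mathrm{CLIP}}(\theta)+c\,S\left[\pi_\theta\right](s_t)\right],$$
where $L_t^{\mathrm{CLIP}}(\theta)$ is the clipped surrogate term for sample t, $S[\pi_\theta](s_t)$ is an entropy bonus that encourages exploration, and $c$ is a balancing hyperparameter. We summarize the training procedure of the proposed intelligent weather-conscious AMC in Algorithm 1. Each expectation term is evaluated by averaging over a batch of samples.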
Algorithm 1: Training of the Intelligent Weather-Conscious AMC.
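The batch-averaged actor objective, clipped surrogate plus entropy bonus, can be sketched for a categorical policy over the three code rates. The function name, the default ε and c values, and the small constant inside the logarithm are assumptions for illustration.

```python
import numpy as np

def actor_objective(probs_new, probs_old, actions, advantages, eps=0.2, c_ent=0.01):
    """Batch mean of the clipped surrogate plus c * entropy (to be maximized)."""
    probs_new = np.asarray(probs_new, dtype=float)  # shape (batch, n_actions)
    probs_old = np.asarray(probs_old, dtype=float)
    actions = np.asarray(actions, dtype=int)
    idx = np.arange(len(actions))
    ratio = probs_new[idx, actions] / probs_old[idx, actions]
    adv = np.asarray(advantages, dtype=float)
    surrogate = np.minimum(ratio * adv, np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv)
    entropy = -np.sum(probs_new * np.log(probs_new + 1e-12), axis=1)  # per-state entropy
    return float(np.mean(surrogate + c_ent * entropy))

# Uniform policy over three code rates, unchanged between updates.
p = np.full((2, 3), 1.0 / 3.0)
obj = actor_objective(p, p, [0, 2], [1.0, 2.0])
```

The entropy term rewards keeping some probability on every code rate early in training, so the agent keeps testing aggressive rates instead of locking onto the most conservative one.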
6. Conclusions
In this paper, we proposed a weather-conscious AMC method for satellite-related UNC. First, the satellite-to-ground scenario was modeled and formulated as an MDP. Then, the proposed framework, consisting of a DL-based estimation model and a DRL-based decision model, was presented. The estimation model was based on LSTM, which remembers historical information; it acquired information from the environment and predicted satellite-to-ground channel states. The decision model was designed based on the actor-critic network: the actor network selected a proper coding scheme, and the critic network scored the actor's selection. Because the proposed method fully considers real-time global weather and historical channel information, it improves the accuracy of channel estimation. The designed decision model can intelligently switch coding schemes in advance, thus increasing the total throughput of satellite-to-ground communications. Simulations using the LSTM network and the actor-critic network were carried out to verify the performance of the proposed method. The results showed that our estimation model outperformed three existing ones: SVR, linear smoothing, and exponential smoothing. They also demonstrated that the proposed method improved the throughput by 3.1% over the BER-based and PER-based look-up table methods. This work can help realize ubiquitous Internet connectivity in UNC.