Research on Satellite Network Traffic Prediction Based on Improved GRU Neural Network

Current satellite network traffic forecasting methods cannot fully exploit the long-range correlation between satellite traffic sequences, which leads to large forecasting errors and low forecasting accuracy. To solve these problems, we propose a satellite network traffic forecasting method based on an improved gated recurrent unit (GRU). The method combines the attention mechanism with the GRU neural network, fully mines the self-similarity and long-range correlation among traffic data sequences, attends to the importance of the traffic data and the hidden state, learns the time-dependent characteristics of the input sequences, and mines the interdependent characteristics of the data sequences to improve prediction accuracy. The particle swarm optimization (PSO) algorithm is used to obtain the best hyperparameters for the network model and improve prediction efficiency. Simulation results show that the proposed method fits the real traffic data best, and its errors are reduced by 26.9%, 37.2%, and 57.8% compared with the GRU, Support Vector Machine (SVM), and Fractional Autoregressive Integrated Moving Average (FARIMA) models, respectively.

be congested and unable to respond to services in real time. By forecasting satellite network traffic, the changing characteristics and trends of network traffic can be grasped in advance. At the same time, path selection can be planned according to the traffic load of each satellite, avoiding local congestion in the network and improving the efficiency of the network's routing strategy.
Aiming at the low prediction accuracy and low efficiency of current satellite network traffic prediction methods, this paper fully considers the self-similarity and long-range correlation of satellite network traffic data and proposes an improved GRU neural network model for satellite network traffic prediction. Based on the traditional GRU neural network, this paper integrates the attention mechanism to form a new neural network model that attends to the importance of the traffic data and the hidden state. It not only learns the time-dependent characteristics of the input sequences but also mines the interdependent characteristics of the data sequences, thus improving the accuracy of traffic prediction. At the same time, the particle swarm optimization (PSO) algorithm is used to determine the optimal hyperparameter combination of the model, giving the model higher efficiency in traffic forecasting.
The contributions of this paper are as follows: (1) The correlation characteristics of satellite network traffic are fully considered, and nonlinear temporal dynamics are captured by gating units, avoiding vanishing or exploding gradients during training. (2) In the encoding and decoding stages of the GRU network, an attention mechanism is introduced, and multiple intermediate vectors are added to uniformly process the time series and the input information of the intermediate vectors at the current moment. (3) The particle swarm optimization algorithm is used to tune the hyperparameters of the neural network.
The organization of this paper is as follows: The first part reviews the current research status of traffic forecasting at home and abroad. The second part introduces the definition of the satellite network traffic prediction problem and the overall framework of the proposed Attention-GRU (AT-GRU) satellite network traffic prediction model. The third part introduces the realization principle of each part of the model, explains the design of the encoding and decoding units of the new prediction model in detail, and describes how the model parameters and the loss function are determined. The fourth part presents simulation comparisons; by comparing the new model with several commonly used prediction models, it shows that the new model has good prediction performance and high prediction accuracy. The fifth part is the summary and outlook, which summarizes the paper and points out aspects of the current results that can be further improved and optimized.

Literature Review
In recent years, satellite network traffic forecasting methods have emerged one after another, and their forecasting accuracy and efficiency differ. The prediction models can be roughly divided into traditional mathematical-statistical fitting models and prediction models based on machine learning neural networks.
The Markov model and time series models are commonly used mathematical fitting models. Yan Z et al. analyzed the shortcomings of the usual Poisson traffic model and proposed a deterministic Markov modulation process model [3] to simulate satellite network traffic; the process of traffic acquisition, storage, and transmission was transformed into a queuing model, closed-form expressions for several quality-of-service indicators were derived, and the validity of the theory was verified. Dong Y et al. proposed the Autoregressive Moving Average (ARMA) model to forecast satellite network traffic [4], in which the current traffic sample is represented by the weighted sum of several historical traffic samples. Chen et al. introduced the geographic longitude of the satellite and the transit time of traffic to establish a mathematical model and proposed a forecasting algorithm based on a proxy model to forecast the traffic volume in the satellite coverage area [5].

Figure 1 shows the system structure of satellite communication. A satellite communication system consists of the space-based network, adjacent space, and the ground-based network. The space-based network consists of Geosynchronous Earth Orbit (GEO), Medium Earth Orbit (MEO), and Low Earth Orbit (LEO) satellites, which mainly realize global coverage and broadband access. Adjacent space is composed of aircraft, which mainly provide edge services. The ground-based network is composed of ground gateways, network control centers, and satellite control centers, and is mainly responsible for network services in business-intensive areas. A terminal is a device that inputs programs and data to a computer, or receives the processing results output from it, via communication facilities. Satellite terminals can be divided into handheld terminals, portable terminals, vehicle-mounted terminals, and so on. Each satellite terminal is responsible for managing terminal connections in multiple systems, and a terminal is connected to the default gateway station of its beam when accessing the satellite network for communication. In the satellite uplink, the traffic generated on the ground is uploaded to the satellite, which then forwards it to the ground gateway station; in the satellite downlink, ground gateway stations and data centers complete traffic monitoring [17]. The traffic of each satellite comes partly from other satellites and partly from the ground network.

Network traffic prediction can guarantee high-quality communication, so it is widely used in many satellite applications. Satellite traffic has complex characteristics such as self-similarity and long-range correlation. Unlike the terrestrial network, the available resources of the satellite network are more limited, and the topology of the satellite network changes over time, so a satellite traffic prediction algorithm must take into account both accuracy and efficiency. Most terrestrial network prediction models have high computational complexity, and directly applying a terrestrial traffic prediction model on a satellite would increase the satellite's burden. In this paper, a new neural network satellite traffic prediction model with an attention mechanism is proposed, which suits satellite network traffic prediction in terms of both prediction accuracy and operational efficiency.

The attention mechanism, a milestone in deep learning research, can adjust the original input data according to different needs and obtain new input data better suited to the current model. The purpose of this paper is to improve the GRU unit by introducing an attention mechanism, which improves the model's acquisition of historical time series information and integrates it into the memory cells, so that the whole model obtains more historical traffic information and extracts information that is more helpful for future prediction, thus improving the overall prediction accuracy. Combining the attention mechanism with the GRU can learn the spatial relationship between input variables in the encoding stage and attend to the importance of different input sequences. In the decoding stage, the attention mechanism is introduced to select among the hidden states of the encoder, the attention weights over the encoder hidden states are obtained, and finally the network traffic prediction value is produced.

Given satellite network traffic data $Y = (y_1, y_2, \ldots, y_T)$, representing the target sequence within a window of length T, the goal of satellite network traffic prediction is to predict a future value of the time series from the T observations, which can be expressed as

$$\hat{y}_{T+t} = F(y_1, y_2, \ldots, y_T), \qquad (1)$$

where $\hat{y}_{T+t}$ represents the predicted value and $F(\cdot)$ represents the nonlinear mapping function. The traffic forecast model is shown in Figure 2.

The overall structure of the model is shown in Figure 2. Based on the traditional GRU encoding-decoding structure, this paper integrates the attention mechanism. The traditional encoding-decoding structure must compress all the input information into a fixed-length vector, and using such a simple fixed-length code to represent longer and more complex inputs often loses input information. The attention model overcomes this shortcoming by allowing the decoder to access the outputs generated by the encoder at all positions. Its core idea is that all the encoder outputs are weighted and combined, then fed into the decoder at the current position to influence its output. By weighting the encoder outputs, an alignment between the input and the output is realized, and at the same time more information about the original data can be used. The attention module automatically learns weights that capture the correlation between the hidden states of the encoder and the hidden states of the decoder.
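To make Formula (1) concrete, the following minimal sketch (illustrative only, not the authors' code; the window length T = 12 and the one-step horizon are hypothetical choices) shows how a univariate traffic series can be framed as supervised (window, target) pairs for the mapping $F(\cdot)$.

```python
import numpy as np

def make_windows(series, T=12, horizon=1):
    """Frame a 1-D traffic series as (window, target) pairs:
    X[i] = (y_i, ..., y_{i+T-1}), target = y_{i+T+horizon-1}."""
    X, y = [], []
    for i in range(len(series) - T - horizon + 1):
        X.append(series[i:i + T])              # input window of length T
        y.append(series[i + T + horizon - 1])  # future value to predict
    return np.array(X), np.array(y)

# toy usage with a synthetic traffic-like series
traffic = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.randn(200)
X, y = make_windows(traffic, T=12, horizon=1)
print(X.shape, y.shape)  # (188, 12) (188,)
```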

Design of Coding Unit Based on Attention Mechanism
To capture the long-term dependence in satellite traffic data, the encoder structure is based on the GRU unit [18,19]. The GRU is a variant of the recurrent neural network (RNN) that incorporates gating units into the basic RNN structure and controls the flow of information through "gate" structures; it can encode the input sequence as a feature representation. Combined with the attention mechanism [20][21][22], the purpose of the encoding stage is to learn the spatial relationship between input variables. The encoding stage is shown in Figure 3.
Given the input sequence Y, the GRU neural network is used to learn the nonlinear mapping between the input sequence and the hidden state $h_t$ of the encoder at time t:

$$h_t = f_e(h_{t-1}, Y), \qquad (2)$$

where $f_e(\cdot)$ stands for the GRU unit. The GRU unit update process is summarized as follows:

$$r_t = \sigma(W_r[h_{t-1}, Y] + b_r), \qquad (3)$$
$$z_t = \sigma(W_z[h_{t-1}, Y] + b_z), \qquad (4)$$
$$\tilde{h}_t = \tanh(W_h[r_t \cdot h_{t-1}, Y] + b_h), \qquad (5)$$
$$h_t = (1 - z_t) \cdot h_{t-1} + z_t \cdot \tilde{h}_t, \qquad (6)$$

where $r_t$ is the reset gate, which controls how much of the previous state needs to be memorized and stored; $z_t$ is the update gate, which controls how the cell stores information; $W_r$, $W_z$, $W_h$ represent the hidden-layer weights; $b_r$, $b_z$, $b_h$ represent offset values, all of which are parameters to be optimized during learning; $\cdot$ indicates point multiplication; $[h_{t-1}, Y]$ indicates the concatenation of the currently input network traffic data and the previous hidden state; and $\sigma$ is the activation function.
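For readers who prefer code to equations, the following NumPy sketch implements one step of the GRU update in Equations (3)-(6); the dimensions and randomly initialized weights are hypothetical stand-ins for the learned parameters.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, y_t, W_r, W_z, W_h, b_r, b_z, b_h):
    """One GRU update following Equations (3)-(6)."""
    concat = np.concatenate([h_prev, y_t])        # [h_{t-1}, Y]
    r = sigmoid(W_r @ concat + b_r)               # reset gate, Eq. (3)
    z = sigmoid(W_z @ concat + b_z)               # update gate, Eq. (4)
    concat_r = np.concatenate([r * h_prev, y_t])  # gated history
    h_tilde = np.tanh(W_h @ concat_r + b_h)       # candidate state, Eq. (5)
    return (1 - z) * h_prev + z * h_tilde         # new hidden state, Eq. (6)

hidden, inp = 8, 1
rng = np.random.default_rng(0)
W = lambda: rng.standard_normal((hidden, hidden + inp)) * 0.1
h = gru_step(np.zeros(hidden), np.array([0.5]),
             W(), W(), W(), *(np.zeros(hidden) for _ in range(3)))
print(h.shape)  # (8,)
```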
Combining the attention mechanism with the GRU neural network, the spatial relationship between input variables can be learned adaptively. Given the kth input sequence $Y^k$, the attention mechanism in the first stage can be expressed as

$$e_t^k = v^{\top}\tanh\!\left(U[h_{t-1}, Y^k]\right), \qquad (7)$$
$$\alpha_t^k = \frac{\exp(e_t^k)}{\sum_{i=1}^{K}\exp(e_t^i)}, \qquad (8)$$

where $h_{t-1}$ represents the historical hidden state of the encoder, v and U are parameters to be learned, and $e_t^k$ is obtained by the tanh transformation. The attention weight $\alpha_t^k$ (determined by the historical hidden state of the encoder and the current input value) describes the importance of the kth input feature, and Formula (8) ensures that all attention weights add up to 1.
At this point, Formula (2) can be updated to

$$h_t = f_e\!\left(h_{t-1}, \tilde{Y}_t\right), \quad \tilde{Y}_t = \left(\alpha_t^1 y_t^1, \alpha_t^2 y_t^2, \ldots, \alpha_t^K y_t^K\right). \qquad (9)$$

By designing an encoder with an attention mechanism, we can attend to the importance of different input sequences instead of treating all input sequences uniformly.
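The first-stage attention in Equations (7)-(9) can be sketched as follows; the parameter shapes and random initializations are illustrative assumptions, with the K driving sequences stacked as the rows of Y_k.

```python
import numpy as np

def input_attention(h_prev, Y_k, v, U):
    """Score each of the K input features against the encoder state
    (Eq. 7), normalize with softmax (Eq. 8), and reweight the inputs
    fed to the GRU at this step (Eq. 9)."""
    K = Y_k.shape[0]
    e = np.array([v @ np.tanh(U @ np.concatenate([h_prev, Y_k[k]]))
                  for k in range(K)])             # scores e_t^k
    alpha = np.exp(e) / np.exp(e).sum()           # weights sum to 1
    return alpha, alpha[:, None] * Y_k            # reweighted inputs

hidden, K, T = 8, 4, 12
rng = np.random.default_rng(1)
alpha, Y_tilde = input_attention(
    rng.standard_normal(hidden),           # h_{t-1}
    rng.standard_normal((K, T)),           # K input sequences of length T
    rng.standard_normal(16),               # v
    rng.standard_normal((16, hidden + T))  # U
)
print(alpha.round(3), alpha.sum())         # attention weights sum to 1
```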


Design of Decoding Unit Based on Attention Mechanism
To simplify the model design, the decoding unit still adopts the GRU unit. In the decoding stage, the temporal attention mechanism and the GRU neural network are combined to select and weight the hidden states of the encoder, so that the temporal relationship of the input sequence can be learned. The decoding stage is shown in Figure 4.

Given the hidden state $h_i$ of the encoder at moment i, the attention weight of the encoder hidden state can be expressed as

$$l_t^i = v^{\top}\tanh\!\left(W_t[d_{t-1}, h_i]\right), \qquad (10)$$
$$\beta_t^i = \frac{\exp(l_t^i)}{\sum_{j=1}^{T}\exp(l_t^j)}, \qquad (11)$$

where $d_{t-1}$ represents the hidden state of the decoder at the previous moment, v and $W_t$ represent parameters to be learned by the neural network, and $l_t^i$ is obtained by the tanh transformation. $\beta_t^i$ represents the importance of the ith encoder hidden state. The intermediate vector $c_t$ can be calculated as follows:

$$c_t = \sum_{i=1}^{T}\beta_t^i h_i. \qquad (12)$$

After obtaining the intermediate vector, it is combined with the satellite network traffic data sequence $(y_1, y_2, \ldots, y_T)$ as follows:

$$\tilde{y}_{t-1} = \tilde{w}^{\top}[y_{t-1}, c_{t-1}] + \tilde{b}. \qquad (13)$$

The hidden state of the decoder at time t can then be expressed as

$$d_t = f_d(d_{t-1}, \tilde{y}_{t-1}), \qquad (14)$$

where $f_d(\cdot)$ stands for the GRU unit, and $d_t$ is updated as follows:

$$r_t = \sigma(W_r[d_{t-1}, y_{t-1}] + b_r), \qquad (15)$$
$$z_t = \sigma(W_z[d_{t-1}, y_{t-1}] + b_z), \qquad (16)$$
$$\tilde{d}_t = \tanh(W_d[r_t \cdot d_{t-1}, y_{t-1}] + b_d), \qquad (17)$$
$$d_t = (1 - z_t) \cdot d_{t-1} + z_t \cdot \tilde{d}_t, \qquad (18)$$

where $r_t$ is the reset gate, which controls how much of the previous state needs to be memorized and stored; $z_t$ is the update gate, which controls how the cell stores information; $W_r$, $W_z$, $W_d$ represent the hidden-layer weights; $b_r$, $b_z$, $b_d$ are offset values, all of which are parameters to be optimized during learning; $\cdot$ indicates point multiplication; $[d_{t-1}, y_{t-1}]$ represents the concatenation of the previous hidden state and the decoder input data; and $\sigma$ is the activation function. The network prediction value $\hat{y}_{T+t}$ at moment t is then obtained as

$$\hat{y}_{T+t} = g(d_t), \qquad (19)$$

where $g(\cdot)$ is a linear transformation; the softmax function is selected in this paper.
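A minimal sketch of the temporal attention in Equations (10)-(13) follows; the dimensions and randomly initialized parameters are hypothetical stand-ins for the learned quantities.

```python
import numpy as np

def temporal_attention(d_prev, H, v, W):
    """Weight the encoder hidden states H (one row per time step) by
    their relevance to the decoder state d_{t-1}, then build c_t."""
    scores = np.array([v @ np.tanh(W @ np.concatenate([d_prev, h]))
                       for h in H])                # l_t^i, Eq. (10)
    beta = np.exp(scores) / np.exp(scores).sum()   # beta_t^i, Eq. (11)
    c_t = (beta[:, None] * H).sum(axis=0)          # context, Eq. (12)
    return beta, c_t

hidden, T = 8, 12
rng = np.random.default_rng(2)
H = rng.standard_normal((T, hidden))               # encoder hidden states
d_prev = rng.standard_normal(hidden)               # previous decoder state
beta, c_t = temporal_attention(d_prev, H,
                               rng.standard_normal(16),
                               rng.standard_normal((16, 2 * hidden)))
y_prev = 0.42                                      # previous traffic value
w_tilde = rng.standard_normal(1 + hidden)          # \tilde{w}, Eq. (13)
y_tilde = w_tilde @ np.concatenate([[y_prev], c_t])
print(beta.sum(), c_t.shape, float(y_tilde))       # weights sum to 1
```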

PSO Algorithm for GRU Hyperparameter Selection Problem
There are many hyperparameters in the construction of a neural network, such as the learning rate, the number of neurons, and the number of epochs. These hyperparameters determine not only the fitting effect of the neural network but also the training behavior of the model, so a strategy is needed to determine their values. The PSO algorithm is an optimization method based on swarm intelligence. Its advantages lie in its simplicity, easy implementation, and solid intelligent background; it is suitable for both scientific research and engineering applications, and there are few parameters to adjust. Combining the good global optimization ability of the PSO algorithm with a neural network can improve the generalization ability and learning performance of the network, thus improving its overall working efficiency. In this paper, the hyperparameter values are determined according to the results of the PSO algorithm, and the optimal structure of the network model is obtained. A PSO particle represents one hyperparameter combination; in this experiment, a particle is an array of [number of neurons, learning rate, epochs]. PSO computes the fitness of the scheme corresponding to each particle and finds the most suitable scheme. The objective function selected for the PSO algorithm in this paper is the fitness function. To calculate the fitness of each particle, the fitness function takes the sum of the errors between the predicted values and the real values; we set the fitness function as

$$\mathrm{fitness} = \sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$

where N is the number of samples, $y_i$ is the actual value, and $\hat{y}_i$ is the predicted value. This paper applies constraints on the algorithm, that is, value ranges for the hyperparameters. The research of Lecun Y et al. [23] provides rules for the hyperparameter configuration of neural networks. Accordingly, the initial population size is 20, and the position ranges of each particle are [1, 150], [0.001, 0.15], and [100, 700]; that is, the number of neurons is set between [1, 150], the learning rate between [0.001, 0.15], and the number of iterations between [100, 700]. The particle update formulas are as follows:

$$v_i = v_i + c_1 \cdot \mathrm{rand} \cdot (pbest_i - a_i) + c_2 \cdot \mathrm{rand} \cdot (gbest - a_i),$$
$$a_i = a_i + v_i,$$

in which $v_i$ is the velocity of particle i, $a_i$ is the current position of particle i, $i = 1, 2, \ldots, N$, N is the total number of particles in the swarm, rand is a random number, $pbest_i$ is the local optimal solution of particle i, gbest is the global optimal solution of the swarm (i.e., the best value among all $pbest_i$), and $c_1$ and $c_2$ are learning factors. Following the research of Song Mengpei et al. [24], this paper sets $c_1 = c_2 = 2$.
The hyperparameter selection procedure of the PSO-optimized model is shown in Algorithm 1.

Algorithm 1: Description of the hyperparameter algorithm of the PSO optimization model.
1. Initialize the PSO parameters: population size N, learning factors c1 and c2, and the hyperparameter value ranges
2. Set the fitness function
3. for i = 1 to N do
4.   Initialize the velocity v_i and position a_i of particle i
5.   Evaluate particle i and set pbest_i = a_i
6. end for
7. gbest = min{pbest_i};
8. while not stopping do
9.   for i = 1 to N do
10.    Update the velocity and position of particle i;
11.    Evaluate particle i;
12.    Update pbest_i and gbest;
13.  end for
14. end while
15. return gbest
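The following compact Python sketch follows Algorithm 1 and the particle update formulas above; the quadratic toy fitness is a stand-in for training the network and summing its prediction errors (its optimum at [35, 0.073, 400] is a hypothetical choice that merely mirrors the neuron count and learning rate reported later), while the position ranges mirror those given in the text.

```python
import numpy as np

rng = np.random.default_rng(3)
N, c1, c2, iters = 20, 2.0, 2.0, 50
lo = np.array([1.0, 0.001, 100.0])   # [neurons, learning rate, epochs]
hi = np.array([150.0, 0.15, 700.0])

def fitness(a):
    """Stand-in for the summed prediction error of a trained model;
    the (hypothetical) optimum is placed at [35, 0.073, 400]."""
    return np.sum(((a - np.array([35.0, 0.073, 400.0])) / (hi - lo)) ** 2)

a = lo + rng.random((N, 3)) * (hi - lo)          # particle positions
v = np.zeros((N, 3))                             # particle velocities
pbest = a.copy()
pbest_f = np.array([fitness(p) for p in pbest])
gbest = pbest[pbest_f.argmin()].copy()

for _ in range(iters):
    r1, r2 = rng.random((N, 3)), rng.random((N, 3))
    v = v + c1 * r1 * (pbest - a) + c2 * r2 * (gbest - a)
    a = np.clip(a + v, lo, hi)                   # keep inside the ranges
    f = np.array([fitness(p) for p in a])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = a[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()

print(gbest.round(3))  # best [neurons, learning rate, epochs] found
```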

Loss Function
The loss function is used to measure the difference between the predicted values and the real data. This paper adopts the mean square error as the loss function:

$$\mathrm{Loss} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2,$$

where N represents the number of samples, $y_i$ represents the actual traffic value, and $\hat{y}_i$ represents the traffic value predicted by the proposed model. At the same time, the Adam optimizer [25][26][27] is used to optimize the model parameters, reducing the loss function by updating the neuron weight matrices and offset values.
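As a concrete illustration under TensorFlow 1.15 (the library used in the experiments below), this sketch pairs the MSE loss with the Adam optimizer; the single dense layer and the random data are hypothetical stand-ins for the AT-GRU model and the traffic windows, and the 0.073 learning rate echoes the PSO-selected value reported later.

```python
import tensorflow as tf  # TensorFlow 1.x API, as used in the experiments
import numpy as np

T = 12
x = tf.placeholder(tf.float32, [None, T])   # input traffic windows
y = tf.placeholder(tf.float32, [None, 1])   # true future values
y_hat = tf.layers.dense(x, 1)               # stand-in for the AT-GRU model

loss = tf.losses.mean_squared_error(labels=y, predictions=y_hat)  # MSE
train_op = tf.train.AdamOptimizer(learning_rate=0.073).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    xs = np.random.randn(32, T).astype(np.float32)
    ys = np.random.randn(32, 1).astype(np.float32)
    for step in range(100):
        _, l = sess.run([train_op, loss], feed_dict={x: xs, y: ys})
    print("final MSE:", l)
```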

Description of the Data Set
This simulation experiment uses the OPNET and STK simulation software to simulate the information transmission network. From the Iridium constellation, we selected six satellites and two ground stations to simulate a space-ground integrated information network. First, the data packets are routed through the satellite network to find the best path, then transmitted to the relay node, and finally arrive at the ground station. The INT node is the input end of the ground base station communication traffic in the simulated network topology; it transmits the ground communication traffic into the satellite network. Four rcv nodes in the middle simulate the change of satellite topology over time. The OUT node is the output end of the communication traffic in the satellite network, which transmits the communication traffic back to the ground. The maximum number of hops from the output end to the input end is three; more than three hops means the link is regarded as disconnected. At the beginning of the simulation, each node judges the connectivity of the current link according to its own position and the positions of the surrounding nodes, and saves the connectable links as optional links. Then, among the optional links, the link with the minimum hop count is chosen to form the satellite network topology. The topology of the satellite network changes over time as the satellite nodes move, in order to simulate the traffic characteristics of the space-ground integrated information network, so this experiment intercepts part of the experimental data. In the experiment, 25,000 traffic samples within the sampling period are taken as the experimental data set, which is divided into two parts, a training set and a testing set, with the training set accounting for 4/5 and the testing set for 1/5. The original satellite network traffic data and the test set obtained through simulation are shown in Figures 5 and 6; the red lines in Figures 5 and 6 mark partially enlarged views of some sections.
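A minimal sketch of the 4/5-1/5 split described above (the random series is a stand-in for the 25,000 simulated traffic samples):

```python
import numpy as np

# stand-in for the 25,000 traffic samples collected from the simulation
traffic = np.random.rand(25000)

split = int(len(traffic) * 4 / 5)           # training set: first 4/5
train_set, test_set = traffic[:split], traffic[split:]
print(len(train_set), len(test_set))        # 20000 5000
```

Splitting chronologically rather than shuffling preserves the temporal order that a sequence model depends on.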

Experimental Environment
The basic hardware environment for the experiment is shown in Table 1. The experiment is based on the TensorFlow 1.15 deep learning library. TensorFlow is an open-source deep learning framework that can handle all kinds of neural networks conveniently; it is used as the backend in the experiment, and matplotlib is used for visualization.

Evaluation Indicators
In this paper, the mean absolute error (MAE), the root mean square error (RMSE), and the goodness of fit R² are selected to evaluate the proposed prediction model:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$
$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^2},$$

where N is the number of training samples, $y_i$ is the actual value, $\hat{y}_i$ is the value predicted by the model, and $\bar{y}$ is the mean of the actual values. RMSE reflects the magnitude of deviations in the data; the smaller the value, the higher the prediction accuracy of the model. MAE is not easily affected by outliers, and the smaller the value, the smaller the error. R² is used to evaluate the fitting effect of the model and takes values in (0, 1); the closer the value is to 1, the better the fit of the model.
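The three indicators can be computed directly; the following sketch is a straightforward NumPy implementation of the formulas above, with toy values for illustration.

```python
import numpy as np

def mae(y, y_hat):
    """Mean absolute error: robust to outliers."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root mean square error: reflects the magnitude of deviations."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def r2(y, y_hat):
    """Goodness of fit: the closer to 1, the better the fit."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

y = np.array([14.0, 20.0, 18.0, 25.0])       # toy actual values
y_hat = np.array([13.0, 21.0, 17.5, 24.0])   # toy predictions
print(mae(y, y_hat), rmse(y, y_hat), r2(y, y_hat))
```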

Results and Analysis of Optimal Parameter Combination of Model
Hyperparameters have a significant influence on a deep neural network and directly affect the quality of the model. In this experiment, the selected hyperparameters are the learning rate, the number of neurons, and the number of epochs. The specific selection is as follows: (1) Learning rate: too large a learning rate prevents the loss from converging to its lowest value, while too small a learning rate can leave the result trapped in a local optimum, so an appropriate learning rate is crucial. As shown in Figure 7, with the iterative evolution of the PSO optimization algorithm, the learning rate stabilized at 0.073 at the 12th iteration, so the learning rate selected for the model was 0.073. (2) Number of neurons: the number of neurons affects the learning ability and complexity of the network. Too many nodes prolong the network training time, while too few nodes lead to poor network performance. As shown in Figure 8, after the ninth iteration of the optimization algorithm, the number of neurons stabilized at 35.

Comparison and Analysis of Simulation Results of Different Algorithms
This experiment demonstrates the prediction performance of AT-GRU by comparing it with the FARIMA, SVM, and GRU models. FARIMA: the fractionally differenced autoregressive moving average model [28], which is a self-similar model and can capture both the long-term and short-term correlation characteristics of traffic data.
SVM (Support Vector Machine): it maps the feature vectors of instances to points in a space [29]. The purpose of the SVM is to draw a line that separates the two types of points, so that future points can also be classified well by this line.
GRU: Gated Recurrent Unit [30], a variant of the RNN that can effectively capture the characteristics of long sequences and alleviate vanishing or exploding gradients.
The above experiments compare the prediction performance of the various models. To present the experimental results clearly, panels (b), (d), (f), and (h) of Figure 10 are partially enlarged views of panels (a), (c), (e), and (g), respectively. The simulations show that all four prediction models have a certain prediction ability, but the proposed AT-GRU algorithm fits the data most closely. The AT-GRU network model better reflects the complex characteristics of satellite network traffic data, and its prediction performance is better: compared with the other prediction models its error is lower, with an MAE of 14.24, an RMSE of 20.37, and the R² score closest to 1. The FARIMA algorithm has the worst prediction performance; the FARIMA traffic prediction model can only roughly predict the changing trend of traffic, and a large error remains against the real traffic values. Because FARIMA is a linear series model, it can only process short data series and cannot fit complex nonlinear data, so its prediction error is the largest: its MAE is 33.73, its RMSE is 42.20, and its R² score is the lowest. The SVM algorithm can fit the real values reasonably well, but the overall traffic prediction contains some errors. Because the SVM algorithm is mainly suited to linear regression problems on small samples, it cannot reflect the complex characteristics of satellite network traffic well, and its error is large for large-sample prediction, with an MAE of 22.68 and an RMSE of 32.25. In this paper, the attention mechanism is introduced into the GRU network; compared with the traditional GRU prediction, it can make full use of the target sequence, and its prediction is better than that of the single GRU neural network model, whose MAE is 19.49 and RMSE is 28.08.
As can be seen from Figure 11, AT-GRU has the best fitting effect and the lowest prediction error. The addition of the attention mechanism improves the prediction of the GRU network, raising the prediction accuracy of satellite network traffic to a certain extent and better meeting the demands of satellite network traffic prediction. Figure 12 shows that the proposed method has an obvious advantage in convergence speed, which indicates that the model learns the data well.

Model Complexity Analysis
Time complexity and space complexity are two important indexes for measuring an algorithm, indicating the growth in running time and the auxiliary space required in the worst case. In deep learning, the FLOPs value is usually used to characterize the time complexity of a model, and the number of parameters is used to characterize its space complexity. Based on the above description of the model prediction steps, the time complexity of training each model is analyzed; the complexity of a neural network is usually represented by the number of floating-point operations (FLOPs). The results in Table 2 show that the FLOPs value of the proposed scheme is the lowest, so the complexity of the proposed scheme is the lowest. Meanwhile, the results in Figure 12 show that the loss value of the proposed scheme decreases fastest and reaches the smallest value, and the complexity of the model after optimization is lower than before optimization. Based on this analysis, the complexity of the proposed scheme is clearly better than that of the comparison algorithms.


Critical Analysis and Discussion
To improve the accuracy of satellite network traffic prediction, this paper proposes and verifies a satellite network traffic prediction method based on the GRU neural network and the attention mechanism. On top of the GRU neural network, the attention mechanism is integrated to attend to the importance of the traffic data and the hidden state, which not only learns the time-dependent characteristics of the input sequences but also mines the interdependent characteristics of the data sequences, thus improving the accuracy of traffic prediction. At the same time, the PSO algorithm is used to determine the optimal hyperparameter combination of the model, giving the model higher prediction efficiency when forecasting traffic. The effectiveness of the proposed method is illustrated by comparing it with several popular traffic prediction algorithms. In terms