Caching-Aware Intelligent Handover Strategy for LEO Satellite Networks

Abstract: Recently, many Low Earth Orbit (LEO) satellite networks have been deployed to provide seamless communication services for global users. Because of the high mobility of LEO satellites, the handover strategy has become one of the most important topics for LEO satellite systems. However, the limited on-board caching resources of satellites make it difficult to guarantee handover performance. In this paper, we propose a multiple-attribute decision handover strategy that jointly considers three factors: the caching capacity, the remaining service time, and the remaining idle channels of the satellites. Furthermore, a caching-aware intelligent handover strategy based on deep reinforcement learning (DRL) is given to maximize the long-term benefits of the system. Compared with the traditional strategies, the proposed strategy reduces the handover failure rate by up to nearly 81% when the system caching occupancy reaches 90%, and it has a lower call blocking rate in high user arrival scenarios. Simulation results show that this strategy can effectively mitigate the handover failures caused by caching resource occupation and can flexibly allocate channel resources to reduce call blocking.


Introduction
In recent years, the 5th generation mobile communication system (5G), which aims to provide high-speed wireless services [1] for global users, has developed rapidly. However, due to the impacts of terrain and cost on infrastructure construction, terrestrial cellular networks can only cover densely populated areas [2], and they fail to provide communication services for mountainous terrain, oceans, and airspace. The advantages of satellite communication, namely wide coverage, strong resistance to destruction, and insensitivity to terrain, can compensate for the limitations of terrestrial mobile communication networks. Therefore, satellite communication has become one of the key technical components for systems beyond 5G to achieve global coverage [3]. It is widely used in many fields [4], such as military, disaster emergency, digital broadcasting and television, and mobile communication. Some scholars have also proposed that the 6th generation (6G) wireless communication will be "5G + satellite network" [5]. Thus, many countries and companies are actively engaged in the research and deployment of satellite communication systems, especially Low Earth Orbit (LEO) communication systems with lower propagation delay [6], such as Starlink and OneWeb.
Generally, an LEO satellite system has a dynamic topology, which leads to frequent handover between terrestrial terminals and satellites [7]. Unfortunately, frequent handover not only makes it difficult for LEO satellite systems to guarantee the quality of service (QoS) of users but also wastes radio resources. Moreover, the existing studies on satellite handover strategies, such as inter-beam handover and inter-satellite handover [8], mainly focus on handover based on the received signal quality, remaining service time, and so on.
Papapetrou et al. [9] proposed three different handover criteria: the maximum remaining service time, the maximum number of idle channels, and the minimum distance, which can be applied to new or handover calls. The strategy based on the maximum remaining service time can greatly reduce the handover times, delay, and signaling cost. On the basis of this strategy, Hu et al. [10] proposed a velocity-aware handover prediction method that dynamically finds the shortest path of the time-expanded graph. Duan et al. [11] proposed a distributed handover method that takes the impact of routing into account and reduces the propagation delay while keeping the handover times acceptable. Seyedi et al. [12] proposed a simple real-time handover strategy, which exploits both the global positioning system (GPS) infrastructure and multiple satellites, to minimize the expected handover times. The strategy based on the maximum number of idle channels selects the candidate satellite with the largest number of available channels. Thus, the load distribution of satellite systems can be balanced, and the limited network resources can be efficiently utilized. Zhou et al. [13] proposed a dynamic channel reservation scheme based on priorities. The traffic, which is predicted from the deterministic movement of LEO satellites, is used to obtain the thresholds for reserved channels. The scheme can effectively reduce the handover failure rate and improve channel utilization by dynamically adjusting the thresholds according to the traffic conditions. The strategy based on the minimum distance selects the candidate satellite by considering the distance between the satellites to avoid link interruption. Wu et al. [14] proposed a graph theory-based inter-satellite handover strategy, which adopts the shortest path algorithm to obtain the optimal handover scheme. Furthermore, other single-attribute handover criteria can be realized with this handover model by changing the path weights.
Since the above studies only consider the effect of a single attribute of the candidate satellite, they cannot achieve a good trade-off between handover times, system load, and success rates of handover.
Li et al. [15] proposed a multi-layer handover management framework and different handover procedures based on handover prediction, which can reduce handover delay and signaling cost. Furthermore, they also proposed a dynamic handover optimization method, which takes traffic, rate demand, and channel gain into account, aiming at reducing the dropping rate and guaranteeing the QoS of mobile terminals. Li et al. [16] proposed a user-centric handover scheme for ultra-dense LEO satellite networks, which realizes seamless handover by buffering the user's downlink data in multiple satellites simultaneously. Wu et al. [17] proposed a handover algorithm based on the potential game. The strategy considers the remaining service time and the satellite elevation angle to minimize the average satellite handover times and decrease the call-dropping probability. They also proposed a terminal random-access algorithm aiming at balancing the network load. He et al. [18] proposed a load-aware satellite handover strategy based on multi-agent reinforcement learning, which balances the satellite load to avoid network congestion while maintaining low signaling overhead. Miao et al. [19] proposed an LEO satellite handover strategy based on multi-attribute decision. The strategy uses the technique for order preference by similarity to an ideal solution (TOPSIS) evaluation method to calculate the weighted values of three attributes: signal strength, remaining service time, and remaining idle channels. The strategy then selects the candidate satellite with the best overall performance. Zhang et al. [20] considered the impacts of channel quality, remaining service time, and the number of service users on the handover strategy, and they used the entropy method to weight each factor and transform the problem into a single-objective optimization problem. Xu et al. [21] analyzed a quality of experience (QoE)-driven handover strategy, which considers routing delay, remaining service time, and remaining idle channels with high-speed mobile users for LEO satellite networks. Table 1 shows the summary and comparison of the above studies.

Table 1. Summary and comparison of existing handover strategies.

| Category | Reference | Factors or method | Objective |
|---|---|---|---|
| Single attribute | Papapetrou et al. [9] | - | Reduce the handover times; balance load and reduce handover failure rate; avoid link interruption |
| Single attribute | Hu et al. [10] | Velocity-aware handover prediction | Find the shortest path |
| Single attribute | Duan et al. [11] | Routing delay | Reduce the propagation delay |
| Single attribute | Seyedi et al. [12] | GPS, multiple satellites | Minimize the handover times |
| Single attribute | Zhou et al. [13] | Traffic prediction | Reduce handover failure rate, improve channel utilization |
| Single attribute | Wu et al. [14] | - | Optimal handover strategies for end-to-end communication |
| Multiple attribute | Li et al. [15] | Traffic, rate demand | Reduce the dropping rate, guarantee the QoS of mobile users |
| Multiple attribute | Wu et al. [17] | - | Minimize the handover times, decrease call-dropping probability |
| Multiple attribute | He et al. [18] | Load-aware | Balance load, maintain low signaling overhead |
| Multiple attribute | Miao et al. [19] | Signal strength | Reduce handover times, balance load, and guarantee QoS |
| Multiple attribute | Zhang et al. [20] | Number of users, satellite power | Reduce handover times, balance load, and guarantee SNR |
| Multiple attribute | Xu et al. [21] | Routing delay | Reduce handover times, failure rate, and transition delay |

Besides factors related to the communication metrics, resource management that integrates communication, computing, and caching (3C) is also important for future mobile edge computing (MEC)-enhanced satellite systems [22]. Caching data at satellite nodes can improve communication efficiency by avoiding duplicate transmissions [23,24]. Liu et al. [23] proposed a novel caching algorithm that optimizes content placement in LEO satellite constellation networks to minimize the content access delay of user terminals. Zhang et al. [25] analyzed caching-restricted resource allocation with joint optimization of the satisfaction index and spectrum efficiency for multibeam satellite systems. Since the satellite on-board caching resource is limited, we focus on evaluating inter-satellite handover strategies for LEO satellite systems with a caching-aware strategy.
Moreover, we aim to tackle the following problems encountered by the existing handover strategies:
(1) Although the existing handover strategies analyze several factors that affect handover performance, the effect of limited on-board caching is not considered. Moreover, the joint effect of multiple attributes, namely on-board caching, remaining service time, and idle channels, is not considered either.
(2) The existing handover strategies make handover decisions on a snapshot-based topology. However, the topology of LEO satellite networks is time-varying, and snapshot-based handover strategies cannot guarantee the long-term performance of the dynamic system.
To solve these problems, the effects of on-board caching and the joint effect of multiple attributes on handover strategies are analyzed in this paper. Furthermore, an intelligent handover strategy based on deep reinforcement learning (DRL) is proposed to reduce the dropping probability and the call blocking rate. The main contributions are listed below.
(1) A novel framework for caching-aware intelligent handover strategies is proposed for LEO satellite networks. Different from existing handover strategies, the joint effect of multiple attributes, including remaining service time, remaining idle channels, and remaining caching capacity, on handover performance is investigated with a dynamic network topology.
(2) To adapt to the dynamic topology of satellite systems, the inter-satellite handover process is modeled as a Markov decision process, and the process for the intelligent handover strategy is provided in detail.
(3) An intelligent handover algorithm based on DRL is proposed. In each time slot, the algorithm decides when handover is activated and selects the target satellite. Moreover, the DRL algorithm can make continuous handover decisions, which lets the whole system obtain the maximum long-term benefits. Simulation results demonstrate the effectiveness of the proposed handover strategy.

System Architecture and Handover Factors
LEO satellites are typically deployed at low altitudes, such as 500 km to 1500 km, and they move at a high speed relative to users on the ground. Thus, frequent handover may occur during the service time of users. This paper considers a constellation of LEO satellites in sun-synchronous orbits. There are 12 orbits within the constellation, and 9 satellites are located in each orbit. The satellites are located at an altitude of 1000 km and have an orbital inclination of 99.4843 deg. A typical handover scenario in the LEO satellite system is presented in Figure 1. Each terrestrial user terminal establishes a communication link with an LEO satellite for transmitting data. Due to the high mobility of the LEO satellite, a terrestrial user will move out of the coverage of the serving satellite after a period of connection, and handover is required to ensure continuous communication. In addition, the access of newly arriving users can also lead to handover within the system. Adjacent satellites continuously exchange their remaining resource information through the inter-satellite link. When the source satellite detects that handover is needed for a connected user link, it analyzes the resource information and selects the best satellite from the candidate satellite list. Before the handover, the data that has not been sent is cached, and the cached data is then sent to the optimal candidate satellite after the new link is established.
In time slot p, the serving satellite is the satellite currently connected to the user, which provides communication services to the user during the time slot. Candidate satellites are the satellites, other than the serving satellite, that are available for connection in time slot p. A candidate satellite can be selected when handover is activated for the user's communication link. There are 2 to 4 visible satellites for a user in this system, that is, the serving satellite plus 1 to 3 candidate satellites. Once handover is activated, the optimal candidate satellite is selected from the candidate satellites according to the intelligent policy.

Remaining Service Time
Due to its high mobility, an LEO satellite generally stays in the visible range of a user for about 10 min, during which the satellite can communicate with the user. This period is called the maximum service time. The remaining service time refers to the time for which the communication link can be maintained before the serving satellite moves out of the user's visible range. In Figure 2, because the user's speed is far less than the speed of the satellite, we assume that the user terminal is stationary. For every user and its serving satellite, the elevation angle between the user and the satellite gradually increases from the minimum value to the maximum value and then decreases to the minimum value again. Specifically, at $T_0$, the user terminal enters the satellite coverage area, and the elevation angle from the user terminal to the satellite is the smallest. At this point, the satellite begins to provide communication services for the user, and the remaining service time is the longest. With the movement of the satellite, the elevation angle increases gradually, and the user terminal reaches the maximum elevation angle at $T_1$. At that moment, the satellite footprint $Q_1$ coincides with the point $H$, where $H$ is the point on the satellite footprint trajectory closest to the user terminal. After a further interval of $T_1 - T_0$, the elevation angle of the user terminal is at its minimum again at $T_2$, and the user leaves the satellite service coverage area. According to the geometric relationship, we can obtain the maximum service time $T_{\max}$ [20] as follows:

$$T_{\max} = \frac{2\Gamma(t_0)}{\omega} \tag{1}$$

where $\Gamma(t_0)$ is the radian of $\angle Q_0 O H$, $O$ is the center of the Earth, and $\omega$ is the angular velocity of the satellite. $\Gamma(t_0)$ can be calculated with $\Upsilon(t_0)$ and $\Upsilon_{\min}$ [20] as follows:

$$\Gamma(t_0) = \arccos\left(\frac{\cos\Upsilon(t_0)}{\cos\Upsilon_{\min}}\right) \tag{2}$$

where $\Upsilon_{\min}$ is the radian of $\angle Q_1 O U$, and $Q_1 U$ represents the shortest distance between the user terminal and the satellite footprint trajectory. $\Upsilon(t_0)$ is the radian of $\angle Q_0 O U$, and $Q_0 U$ represents the longest distance between the user terminal and the satellite footprint trajectory. Equation (2) follows from the right spherical triangle $Q_0 H U$, in which $\cos\Upsilon(t_0) = \cos\Gamma(t_0)\cos\Upsilon_{\min}$. The maximum service time is then obtained from the total radian of the satellite footprint trajectory within coverage, $2\Gamma(t_0)$.
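First, $\Upsilon(t_0)$ [20] can be obtained from the geometry shown in Figure 3. Applying the law of sines to the triangle formed by the Earth's center, the user terminal, and the satellite gives

$$\frac{R_e}{\sin\left(\frac{\pi}{2} - \theta_{\min} - \Upsilon(t_0)\right)} = \frac{R_e + h}{\sin\left(\frac{\pi}{2} + \theta_{\min}\right)} \tag{3}$$

where $R_e$ is the radius of the Earth, $h$ is the altitude of the satellite orbit, and $\theta_{\min}$ is the user's minimum elevation angle. Then, $\Upsilon(t_0)$ can be expressed as

$$\Upsilon(t_0) = \arccos\left(\frac{R_e \cos\theta_{\min}}{R_e + h}\right) - \theta_{\min} \tag{4}$$

Figure 3. The geometric relationship of $\Upsilon(t_0)$.

As shown in Figure 4, $\Upsilon_{\min}$ [20] can be obtained as follows:

$$\Upsilon_{\min} = \arcsin\left(\frac{d_{\min}}{R_e}\right) \tag{5}$$
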
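where $d_{\min}$ is the shortest distance from the user terminal to the satellite footprint trajectory plane. In the actual system, the latitude, longitude, and altitude of the satellite footprint point $(\varphi^s_t, \phi^s_t, 0)$ and of the user $(\varphi^u_t, \phi^u_t, \zeta^u_t)$ at time slot $t$ can be obtained from the GPS positioning system. The latitude, longitude, and altitude coordinates $(\varphi, \phi, \zeta)$ are converted to Earth-Centered Earth-Fixed (ECEF) coordinates $(x, y, z)$ by

$$\begin{aligned} x &= \left(\frac{a}{\sqrt{1 - e^2\sin^2\varphi}} + \zeta\right)\cos\varphi\cos\phi \\ y &= \left(\frac{a}{\sqrt{1 - e^2\sin^2\varphi}} + \zeta\right)\cos\varphi\sin\phi \\ z &= \left(\frac{a(1 - e^2)}{\sqrt{1 - e^2\sin^2\varphi}} + \zeta\right)\sin\varphi \end{aligned} \tag{6}$$

where $a$ is the length of the Earth's semimajor axis, and $e$ is the eccentricity of the Earth. Three coordinates of the satellite footprint $\{(x^s_t, y^s_t, z^s_t) \mid t = 0, 1, 2\}$ are used to determine the satellite footprint trajectory plane equation

$$Ax + By + Cz + D = 0 \tag{7}$$

where

$$\begin{aligned} A &= (y^s_1 - y^s_0)(z^s_2 - z^s_0) - (z^s_1 - z^s_0)(y^s_2 - y^s_0) \\ B &= (z^s_1 - z^s_0)(x^s_2 - x^s_0) - (x^s_1 - x^s_0)(z^s_2 - z^s_0) \\ C &= (x^s_1 - x^s_0)(y^s_2 - y^s_0) - (y^s_1 - y^s_0)(x^s_2 - x^s_0) \\ D &= -(A x^s_0 + B y^s_0 + C z^s_0) \end{aligned} \tag{8}$$

According to the plane Equation (7), $d_{\min}$ can be expressed as

$$d_{\min} = \frac{\left|A x^u + B y^u + C z^u + D\right|}{\sqrt{A^2 + B^2 + C^2}} \tag{9}$$

Finally, the angular velocity $\omega$ of the satellite in the ECEF coordinate system is obtained from the geometric relationship in Figure 5 (Equation (10)) as a function of $\omega_s$, $\omega_e$, and $\sigma$, where $\omega_s$ and $\omega_e$ are, respectively, the angular velocities of the satellite and the Earth in the Earth-Centered Inertial (ECI) coordinate system, and $\sigma$ is the satellite orbit inclination. $\omega_s$ can be expressed as

$$\omega_s = \sqrt{\frac{\mu}{(R_e + h)^3}} \tag{11}$$

where $\mu$ is the Kepler constant, $R_e$ is the Earth radius, and $h$ is the altitude of the satellite orbit.
With Equations (4) and (5), $\Gamma(t_0)$ can be written as

$$\Gamma(t_0) = \arccos\left(\frac{\cos\left(\arccos\left(\frac{R_e\cos\theta_{\min}}{R_e + h}\right) - \theta_{\min}\right)}{\cos\left(\arcsin\left(\frac{d_{\min}}{R_e}\right)\right)}\right) \tag{12}$$

Then, the maximum service time of the satellite can be obtained by Equation (1), and the remaining service time at the current slot $T$ can be expressed as

$$T_{\mathrm{remain}} = T_{\max} - (T - T_0) \tag{13}$$

where $T_0$ is the recorded service start time.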
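The coordinate conversion and the point-to-plane distance described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the WGS-84 ellipsoid constants and the function names `geodetic_to_ecef`, `plane_through`, and `point_plane_distance` are assumptions introduced here for illustration, and the plane coefficients are computed with a cross product of two in-plane vectors.

```python
import math

# ASSUMPTION: WGS-84 ellipsoid; the text does not say which ellipsoid is used.
A_SEMI_MAJOR_M = 6378137.0      # semimajor axis a, in meters
ECC_SQ = 6.69437999014e-3       # first eccentricity squared, e^2


def geodetic_to_ecef(lat_deg, lon_deg, alt_m):
    """Convert latitude, longitude (degrees) and altitude (m) to ECEF (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    n = A_SEMI_MAJOR_M / math.sqrt(1.0 - ECC_SQ * math.sin(lat) ** 2)
    x = (n + alt_m) * math.cos(lat) * math.cos(lon)
    y = (n + alt_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1.0 - ECC_SQ) + alt_m) * math.sin(lat)
    return (x, y, z)


def plane_through(p0, p1, p2):
    """Coefficients (A, B, C, D) of the plane Ax + By + Cz + D = 0."""
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    a = u[1] * v[2] - u[2] * v[1]
    b = u[2] * v[0] - u[0] * v[2]
    c = u[0] * v[1] - u[1] * v[0]
    d = -(a * p0[0] + b * p0[1] + c * p0[2])
    return (a, b, c, d)


def point_plane_distance(point, plane):
    """Shortest distance from a point to the plane (same unit as inputs)."""
    a, b, c, d = plane
    num = abs(a * point[0] + b * point[1] + c * point[2] + d)
    return num / math.sqrt(a * a + b * b + c * c)
```

In use, three consecutive footprint positions would be converted with `geodetic_to_ecef` (altitude 0), passed to `plane_through`, and the user's ECEF position passed to `point_plane_distance` to obtain d min.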
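As a rough numerical check of the service-time derivation, the sketch below evaluates the chain from the minimum elevation angle to the maximum service time. Several items are assumptions not given in the text: the numerical constants, the example minimum elevation angle of 10 degrees, the function names, and in particular the expression used for the ECEF angular velocity, ω ≈ ω_s − ω_e cos σ, which is a commonly used approximation and may differ from the paper's Equation (10).

```python
import math

MU_KM3_S2 = 398600.4418   # Earth's gravitational parameter (assumed value)
R_E_KM = 6371.0           # mean Earth radius (assumed value)
OMEGA_E = 7.2921159e-5    # Earth's rotation rate in rad/s (assumed value)


def max_service_time(h_km, theta_min_deg, d_min_km, incl_deg):
    """Maximum service time in seconds for one satellite pass."""
    theta = math.radians(theta_min_deg)
    # Earth-central angle at the edge of coverage.
    ups0 = math.acos(R_E_KM * math.cos(theta) / (R_E_KM + h_km)) - theta
    # Earth-central angle at closest approach of the footprint.
    ups_min = math.asin(d_min_km / R_E_KM)
    if ups_min >= ups0:
        return 0.0  # the user never enters the coverage area
    gamma0 = math.acos(math.cos(ups0) / math.cos(ups_min))
    omega_s = math.sqrt(MU_KM3_S2 / (R_E_KM + h_km) ** 3)
    # ASSUMPTION: common approximation for the ECEF angular velocity.
    omega = omega_s - OMEGA_E * math.cos(math.radians(incl_deg))
    return 2.0 * gamma0 / omega


def remaining_service_time(t_max, t_now, t_start):
    """Remaining service time, clipped at zero."""
    return max(0.0, t_max - (t_now - t_start))


t_max = max_service_time(h_km=1000.0, theta_min_deg=10.0,
                         d_min_km=0.0, incl_deg=99.4843)
print(f"T_max is about {t_max / 60.0:.1f} min")
```

With the constellation parameters of this paper and the assumed 10 degree elevation mask, an overhead pass comes out on the order of ten minutes, which is consistent with the "about 10 min" stated above.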

Remaining Idle Channels
Each satellite has a fixed number of channels. Channels that are occupied will not be allocated to other users, while unoccupied channels are idle and waiting to be allocated. The number of remaining idle channels reflects the satellite load. Unbalanced load distribution will lead to a large number of idle channels in the light-load satellite coverage area, and the dropped calls and call blocking rate will increase in the overload areas. This situation will greatly reduce the overall performance of the system. In this paper, the number of idle channels is obtained in real-time through information interaction between satellites and utilized as a factor to make the handover decisions. For example, the handover strategy can dynamically reset the links of connected users to the adjacent satellites with idle channels to reduce the blocking rate of new calls.

Remaining Caching Capacity
During the handover period, the user's unsent data is cached on the serving satellite, and it is not sent to the target satellite until the handover is completed. Caching this data ensures the integrity of user data and avoids packet loss caused by handover. The caching resources that are not already occupied and can therefore be used for caching data during the handover process are called the remaining caching capacity. However, the limited on-board caching resource can hardly meet the demand for caching the large amount of data produced when multiple users hand over simultaneously, which leads to handover failure and data packet loss. Therefore, the on-board caching resource is regarded as one of the decision factors of handover, and this information can also be obtained through the information exchange between adjacent satellites.

Handover Flow
This paper proposes an inter-satellite handover strategy in which the handover decisions are made by the serving satellite. The handover flow is shown in Figure 6. First, the user periodically reports its location to the serving satellite, and the satellite calculates the remaining service time according to the user's location. At the same time, each satellite periodically sends its resource information, mainly the number of idle channels and the remaining caching capacity, to adjacent satellites. In a given time slot, the serving satellite receives resource information from adjacent satellites and makes handover decisions according to the user location information and the satellites' resource information. In detail, the trained DRL network is adopted to decide when the handover is executed and which candidate satellite is chosen. If the handover procedure is activated, the serving satellite sends the handover request to the optimal candidate satellite. Then, the optimal candidate satellite applies for the resources. If the resources are sufficient, it sends the handover response to the serving satellite, which caches the user's unsent data and sends the handover notification to the user. From then on, the user and the candidate satellite can establish the communication link, and the handover can be completed. If the candidate satellite has no remaining resources, the serving satellite's cache overflows, or the link is interrupted, handover failure occurs.
Based on the handover signaling flow chart, we analyze the handover latency. The latency of this process is composed of propagation latency and signaling transmission latency; the latter is ignored here because it is much smaller than the propagation latency. First, the signaling from user to satellite for reporting location information and the information exchange among adjacent satellites occur periodically, so this part of the signaling overhead is not counted as part of the handover delay. When the serving satellite decides to activate handover based on the state of the environment, it exchanges the handover request signaling with the optimal candidate satellite. The request and response signaling is carried on the inter-satellite link, and the maximum one-way signaling time between two satellites is approximately 17 ms, as calculated from the STK simulation environment. When the serving satellite finishes caching the unsent data, it sends a handover command to the ground user, and the propagation delay from satellite to ground user is about 3.3 ms. The user receives the handover command and establishes a connection with the optimal candidate satellite. Then, the handover process is finished. This process involves two Earth-satellite link transmissions and three inter-satellite link transmissions, so the total propagation delay of the process is 2 × 3.3 ms + 3 × 17 ms = 57.6 ms.
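The delay budget above can be reproduced with a short calculation. The sketch below is illustrative only: the constant and function names are introduced here, and the 1000 km path used for the Earth-satellite check assumes the satellite is directly overhead (a slant path at low elevation would be longer).

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum


def propagation_delay_ms(distance_km):
    """One-way free-space propagation delay in milliseconds."""
    return distance_km / C_KM_PER_S * 1e3


# Per-hop values quoted in the text.
EARTH_SAT_MS = 3.3   # satellite-to-ground propagation delay
ISL_MS = 17.0        # maximum inter-satellite signaling time

# Two Earth-satellite transmissions plus three inter-satellite transmissions.
total_ms = 2 * EARTH_SAT_MS + 3 * ISL_MS

print(f"overhead path, 1000 km: {propagation_delay_ms(1000.0):.2f} ms")
print(f"total handover propagation delay: {total_ms:.1f} ms")
```

The overhead-path figure agrees with the quoted 3.3 ms for the 1000 km orbit, and the hop count reproduces the 57.6 ms total.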
Here, we should note that several causes can trigger satellite handover. Besides the satellite movement, we also focus on the following two situations in this paper:
• When a newly arriving user asks for access, the communication links of connected users may be moved from a serving satellite that has no idle channels to another candidate satellite. Thus, channels can be released for the newly arriving users.
• If the remaining caching capacity is less than the amount of unsent user data, the handover should not be carried out. If it is carried out anyway, the handover will fail, resulting in packet loss and a sharp decline in user experience.
Moreover, the handover decisions will be made by considering the joint-effects of several attributes, such as remaining service time, remaining idle channels, and remaining caching capacity, to obtain the best overall system performance.
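The failure conditions named in this section can be collected into a small feasibility check. The sketch below is one possible reading, not the authors' procedure: the `SatelliteStatus` type, its field names and units, and the choice to test the candidate's (rather than the serving satellite's) remaining service time are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SatelliteStatus:
    """Hypothetical per-satellite resource report."""
    idle_channels: int
    remaining_cache_mb: float
    remaining_service_s: float


def handover_outcome(serving, candidate, unsent_data_mb):
    """Return "ok" or a failure reason for one handover attempt."""
    if candidate.idle_channels <= 0:
        return "fail: candidate has no idle channel"
    # ASSUMPTION: a candidate that is about to leave view cannot take the link.
    if candidate.remaining_service_s <= 0:
        return "fail: candidate has no remaining service time"
    # The serving satellite must be able to buffer the unsent data.
    if serving.remaining_cache_mb < unsent_data_mb:
        return "fail: serving satellite cache would overflow"
    return "ok"


serving = SatelliteStatus(idle_channels=0, remaining_cache_mb=10.0,
                          remaining_service_s=5.0)
candidate = SatelliteStatus(idle_channels=3, remaining_cache_mb=50.0,
                            remaining_service_s=400.0)
print(handover_outcome(serving, candidate, unsent_data_mb=4.0))
```

A rule-based check like this only tells whether a single handover can succeed; deciding when to hand over and to which candidate is left to the learned policy described next.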

Intelligent Handover Strategy with Multiple Attributes
The handover decisions affect the resource utilization of the connected satellites, so the handover decision at each time slot is also affected by the decisions at the previous time slots; thus, the process can be modeled as a Markov decision process. In order to achieve dynamic and continuous handover decisions that take multiple factors into account in a continuous environmental state, we propose an intelligent handover strategy based on DRL. The strategy can intelligently decide when handover is activated and select the optimal satellite based on the resource utilization information of the candidate satellites. It fully takes into account the joint effect of multiple attributes, which include the remaining service time, the number of idle channels, and the remaining caching capacity. Among these attributes, two describe the state of the candidate satellite (the remaining service time and the number of idle channels), and one describes the state of the serving satellite (the remaining caching capacity). Moreover, the handover strategy can make continuous decisions for the dynamic LEO network. The network structure is shown in Figure 7, and a detailed explanation of the DQN training process can be found in Reference [26]. We need to train this decision network in advance and use the DRL network with stabilized parameters to output the decision actions. During the actual handover process, the satellite simply feeds the environment information from the mobile terminal periodically into the DRL network, and the corresponding handover decision is obtained. The specific design of the intelligent handover decision network based on DRL, which is shown in Figure 7, can be denoted as a tuple (S, A, P, R). S denotes the state space of the LEO system. A is the handover action space. P is the space of the state transition probability. R denotes the reward of state and action.
State space (S): The state space is derived from the environment of the LEO satellite communication system. There are four environment state quantities in this paper: the satellite label $\Omega$, the number of remaining idle channels $\Theta$, the remaining caching capacity $C$, and the remaining service time $T$. The satellite label expresses which satellite is currently connected to the user and which satellites are candidates. The number of remaining idle channels, the remaining caching capacity, and the remaining service time are the factors taken into account for handover. Each state quantity consists of the corresponding values for the serving satellite and for the handover candidate satellites. The state at time slot $p$ can be defined as

$$s_p = [\Omega_p, \Theta_p, C_p, T_p] \tag{14}$$

$\Omega_p$ is the set of satellite labels, which can be expressed as $[\Omega^*_p, \Omega^1_p, \Omega^2_p, \ldots, \Omega^K_p]$. The superscript of a state variable refers to the serving satellite or a candidate satellite: "*" denotes the serving satellite that is connected to the user, and $k$ denotes candidate satellite $k$. For example, the maximum number of visible candidate satellites for a user in this paper is 3, so the superscript is 1, 2, or 3. The subscript "$i, p$" indicates the user number $i$ and the time slot $p$. $\Omega^*_p$ is defined as $[\Omega^*_{1,p}, \Omega^*_{2,p}, \ldots, \Omega^*_{i,p}]$, where $\Omega^*_{i,p} \in \{1, 2, 3, \ldots, M\}$ is the serving satellite label of user $i$ and $M$ is the total number of LEO satellites. $\Omega^k_p$ is defined as $[\Omega^k_{1,p}, \Omega^k_{2,p}, \ldots, \Omega^k_{i,p}]$, where $\Omega^k_{i,p}$ is the label of the $k$th candidate satellite of user $i$. $\Theta_p$ refers to the information of satellite channels, which can be expressed as $[\Theta^*_p, \Theta^1_p, \Theta^2_p, \ldots, \Theta^K_p]$. $\Theta^*_p$ is defined as $[\Theta^*_{1,p}, \Theta^*_{2,p}, \ldots, \Theta^*_{i,p}]$, where $\Theta^*_{i,p} \in \{1, 2, 3, \ldots, I\}$ is the number of idle channels of the serving satellite of user $i$ and $I$ is the total channel number of a single satellite. $\Theta^k_p$ is defined as $[\Theta^k_{1,p}, \Theta^k_{2,p}, \ldots, \Theta^k_{i,p}]$, where $\Theta^k_{i,p}$ is the number of idle channels of user $i$'s $k$th candidate satellite. $C_p = [C^*_p, C^1_p, C^2_p, \ldots, C^K_p]$ and $T_p = [T^*_p, T^1_p, T^2_p, \ldots, T^K_p]$ are the remaining caching capacity and the remaining service time, respectively, and are defined in the same way.
Action space (A): For each time slot, the agent decides whether handover is activated and selects the optimal target satellite for every user. The action at time slot $p$ can be expressed as

$$a_p = [\Lambda_{1,p}, \Lambda_{2,p}, \ldots, \Lambda_{i,p}] \tag{15}$$

where $\Lambda_{i,p} \in \{0, 1, 2, 3, \ldots, K\}$. The value 0 means that the handover for user $i$ is not activated. Other values mean that handover is activated, and the value is the label of the selected candidate satellite. $K$ is the maximum number of candidate satellites.
Transition probability (P): Since the system state in this paper is continuous and the state is affected by the handover decision, the transition probability from $s_p$ to $s_{p+1}$ under action $a_p$ is difficult to obtain. Hence, a model-free DRL framework based on a deep Q-learning network (DQN) is adopted.
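To make the state and action layout concrete, the sketch below flattens the four state quantities into one input vector and decodes a per-user action. It is only an illustration: the array layout (row 0 for the serving satellite, rows 1 to K for candidates, one column per user), the use of label 0 as padding when a user has fewer than K candidates, and the function names are all assumptions not specified in the text.

```python
import numpy as np


def build_state(labels, idle, cache, service):
    """Flatten the four state quantities into one vector.

    Each argument has shape (K + 1, N): row 0 holds the serving-satellite
    value and rows 1..K the candidate values, for N users.
    """
    parts = [np.asarray(x, dtype=np.float32)
             for x in (labels, idle, cache, service)]
    shape = parts[0].shape
    if any(p.shape != shape for p in parts):
        raise ValueError("all state quantities must share one shape")
    return np.concatenate([p.ravel() for p in parts])


def decode_action(action, candidate_labels):
    """Map a per-user action in {0, ..., K} to a target satellite label.

    Returns None when the action is 0, meaning no handover is activated.
    """
    if action == 0:
        return None
    return candidate_labels[action - 1]


K, N = 3, 5  # up to 3 candidates per user, 5 users (values from the paper)
zeros = np.zeros((K + 1, N))
state = build_state(zeros, zeros, zeros, zeros)
print(state.shape)
print(decode_action(2, candidate_labels=[17, 18, 26]))
```

With K = 3 and five users the flattened state has 4 × 4 × 5 = 80 entries, which would be the input width of the decision network under this layout.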
Reward function (R): We divide the reward function $R(s_p, a_p)$ into two parts, the gain function $g_p$ and the cost function $l_p$ (Equations (16)-(18)).
$g_{i,p}$ is the gain of remaining communication resources for user $i$. Three attributes, namely the number of idle channels, the remaining caching capacity, and the remaining service time, are normalized. $w$ is the weight of each attribute, and $w_1 + w_2 + w_3 = 1$. $l_p$ is the cost function. $\alpha_p$ is the number of successful handovers in time slot $p$, and $\beta_p$ is the number of failed handovers. A failed handover here refers to a handover failure caused by any of several reasons: the candidate satellite does not have idle channels, the remaining service time is reduced to zero, or the serving satellite does not have enough caching capacity. $\delta_p$ is the number of dropped calls caused by insufficient resources (remaining service time, channels, etc.). According to the definitions of the state, the action, and the reward, we can calculate the target Q value:

$$y_p = r_p + \gamma \max_{a'} \hat{Q}(s_{p+1}, a'; \theta^-) \tag{19}$$

Then, the mean square error loss function is calculated as

$$L(\theta) = \mathbb{E}\left[\left(y_p - Q(s_p, a_p; \theta)\right)^2\right] \tag{20}$$

where $r_p = R(s_p, a_p)$, $\gamma$ is the discount factor, $\theta$ denotes the main network parameters, and $\theta^-$ denotes the target network parameters. After calculating the loss function, gradient descent is used to update the main network parameters. To break the time correlation between consecutive samples, a replay buffer is used to store the experience, and a random minibatch sampled from it is used for learning. The detailed algorithm is described in Algorithm 1.
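The reward structure and the target computation can be sketched as follows. The exact form of the gain and cost functions is not recoverable from the text here, so this is an assumed form: a weighted sum of the three normalized attributes for the gain, a linear penalty on successful handovers, failed handovers, and dropped calls for the cost, and the reward taken as gain minus cost. The weights, the penalty coefficients, the discount factor, and all function names are hypothetical.

```python
import numpy as np


def gain(idle_norm, cache_norm, service_norm, w=(1 / 3, 1 / 3, 1 / 3)):
    """Per-user gain: weighted sum of normalized attributes in [0, 1]."""
    if abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w[0] * idle_norm + w[1] * cache_norm + w[2] * service_norm


def reward(gains, n_success, n_failed, n_dropped, cost=(0.1, 1.0, 1.0)):
    """ASSUMED form: total gain minus a linear cost on handover events."""
    penalty = cost[0] * n_success + cost[1] * n_failed + cost[2] * n_dropped
    return float(np.sum(gains)) - penalty


def dqn_targets(rewards, next_q_target, gamma=0.9):
    """Standard DQN target: r + gamma * max over a' of Q_target(s', a')."""
    rewards = np.asarray(rewards, dtype=float)
    return rewards + gamma * np.max(next_q_target, axis=1)


def mse_loss(targets, q_taken):
    """Mean squared error between targets and Q(s, a) of the taken actions."""
    return float(np.mean((np.asarray(targets) - np.asarray(q_taken)) ** 2))


y = dqn_targets([1.0, 0.0], np.array([[0.0, 2.0], [1.0, 0.5]]), gamma=0.5)
print(y, mse_loss(y, [1.0, 0.5]))
```

Charging a small cost even for successful handovers is one way to discourage unnecessary handovers; whether the paper does so, and with what coefficients, is not stated in the surviving text.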

Algorithm 1 Intelligent handover algorithm based on DRL.
Initialize replay buffer D
Initialize action-value function Q with random weights θ
Initialize target action-value function Q̂ with weights θ⁻ = θ
Initialize episode = 1
Repeat
    Initialize the start state s_1 with the connection relationship
    for p = 1, T do
        With probability ε select a random action a_p; otherwise select a_p = arg max_a Q(s_p, a; θ)
        Execute action a_p in the LEO environment emulator
        Receive reward r_p = R(s_p, a_p) and next state s_{p+1}
        Store transition (s_p, a_p, r_p, s_{p+1}) in D
        When D is full, sample a random minibatch of transitions (s_p, a_p, r_p, s_{p+1}) from D
        Update the main network parameters θ using gradient descent to minimize the loss function L(θ) defined in Equation (20)
        Every C steps reset Q̂ = Q
    end for
    episode = episode + 1
Until episode > episode_max
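The control flow of Algorithm 1 can be exercised end to end with a toy stand-in. The sketch below is not the authors' network or emulator: `ToyHandoverEnv` is a hypothetical environment with random states, the Q-function is linear instead of a deep network, all hyperparameters are arbitrary, and learning starts once the buffer holds one minibatch rather than when it is full. It only illustrates the ε-greedy selection, replay buffer, target network, and periodic synchronization.

```python
import random
from collections import deque

import numpy as np


class ToyHandoverEnv:
    """Hypothetical stand-in for the LEO emulator (random states)."""

    def __init__(self, state_dim=8, n_actions=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state_dim, self.n_actions = state_dim, n_actions

    def reset(self):
        return self.rng.random(self.state_dim)

    def step(self, state, action):
        # Toy reward: the state feature indexed by the chosen action.
        return float(state[action]), self.rng.random(self.state_dim)


def train(env, episodes=20, steps=50, gamma=0.9, lr=0.01, eps=0.1,
          buffer_size=500, batch=32, sync_every=25, seed=0):
    random.seed(seed)
    rng = np.random.default_rng(seed)
    # Linear Q-function: Q(s, a) = theta[a] @ s (main network).
    theta = rng.normal(0.0, 0.01, (env.n_actions, env.state_dim))
    theta_target = theta.copy()          # target network
    replay = deque(maxlen=buffer_size)   # replay buffer D
    losses, t = [], 0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < eps:
                a = int(rng.integers(env.n_actions))
            else:
                a = int(np.argmax(theta @ s))
            r, s_next = env.step(s, a)
            replay.append((s, a, r, s_next))
            s, t = s_next, t + 1
            if len(replay) >= batch:
                grad, loss = np.zeros_like(theta), 0.0
                for sb, ab, rb, nb in random.sample(list(replay), batch):
                    y = rb + gamma * np.max(theta_target @ nb)
                    err = y - theta[ab] @ sb
                    grad[ab] -= 2.0 * err * sb / batch
                    loss += err ** 2 / batch
                theta -= lr * grad       # gradient descent on the MSE loss
                losses.append(loss)
            if t % sync_every == 0:
                theta_target = theta.copy()
    return theta, losses


theta, losses = train(ToyHandoverEnv())
print(theta.shape, len(losses))
```

Replacing the linear map with a neural network and the toy environment with an orbit-driven emulator would keep the same loop structure.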

Results
Satellite Tool Kit (STK) is used to obtain the topology of the satellite network. The simulation parameters are shown in Table 2. The number of visible satellites for each terrestrial user ranges from 2 to 4, which means that each user has up to 3 candidate satellites in addition to the serving satellite. Five user terminals are set up in Beijing with locations following a uniform distribution. We use the STK software to simulate the scene and obtain the geographical coordinates of each user, and to simulate the LEO satellite constellation and obtain the satellite operation data. In this paper, DQN is adopted, and Figure 8 shows the convergence of the DQN loss function as the number of training steps increases. The loss function converges when the number of training steps reaches about 4500.
In the simulation, the convergence characteristics of the reward value with different learning rates, which are 0.0001, 0.001, and 0.01, are compared and shown in Figure 9. Simulation results show that different learning rates will achieve different performances, and the convergence speed will also vary. It can be seen from Figure 9 that the convergence speed with learning rate 0.01 is the fastest, and the reward value after convergence has a small fluctuation range and high stability. The reward value with a learning rate of 0.01 is much higher than that, with a learning rate of 0.001. Therefore, the learning rate in this paper is set as 0.01.   Figure 10 compares the performance of four handover strategies in terms of handover failure rate. RST represents the traditional handover strategy based on the remaining service time (RST), which takes the maximum remaining service time as the selection criteria for selecting optimal candidate satellites, and handover is activated when the current serving satellite has no remaining service time. NIC is a traditional handover strategy based on the number of remaining idle channels (NIC), which takes the maximum number of remaining idle channels as the selection criteria for selecting candidate satellites, and handover is activated when there is no remaining service time. MAF is a multi-attribute fusion (MAF) decision based on the TOPSIS evaluation strategy. The strategy considers the remaining service time and the number of idle channels to select the optimal candidate satellite, and handover is activated when there is no remaining service time. IMF is the handover strategy of Intelligent multi-attributes fusion (IMF) decisions based on the DRL, which is proposed in this paper. It can be seen from Figure 10 that the handover failure rate increase with the raise of caching occupancy. 
Among the strategies, the RST strategy has the highest handover failure rate because it considers neither the remaining channel resources nor the remaining caching capacity. Moreover, the handover failure rates of RST, NIC, and MAF grow markedly faster once the caching occupancy exceeds 50%, and they are close to 100% when the caching occupancy is 90%. At the same caching occupancy of 90%, the handover failure rate of IMF is close to 20%, which shows its performance gain over the reference strategies. Therefore, the proposed IMF handover strategy has the best performance under high caching occupancy.
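The three baseline selection rules compared above can be sketched as follows. This is an illustrative sketch, not the authors' code: the candidate representation (dictionaries with keys `id`, `rst`, and `nic`), the function names, and the equal TOPSIS weights are assumptions, since the weights used for MAF are not given in this section. Both attributes are treated as benefit criteria (larger is better).

```python
import math


def select_rst(cands):
    """RST baseline: pick the candidate with the longest remaining service time."""
    return max(cands, key=lambda c: c["rst"])["id"]


def select_nic(cands):
    """NIC baseline: pick the candidate with the most remaining idle channels."""
    return max(cands, key=lambda c: c["nic"])["id"]


def select_topsis(cands, weights=(0.5, 0.5)):
    """MAF-style baseline: TOPSIS over (rst, nic) with assumed equal weights."""
    attrs = ("rst", "nic")
    # Vector-normalize each attribute column, then apply the weights.
    norms = [math.sqrt(sum(c[a] ** 2 for c in cands)) or 1.0 for a in attrs]
    weighted = [[w * c[a] / n for a, n, w in zip(attrs, norms, weights)]
                for c in cands]
    # Ideal and anti-ideal points (both attributes are benefit criteria).
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]

    def closeness(row):
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        total = d_best + d_worst
        return d_worst / total if total > 0 else 0.0

    scores = [closeness(row) for row in weighted]
    return cands[max(range(len(cands)), key=scores.__getitem__)]["id"]
```

With hypothetical candidates A (rst=300, nic=1), B (rst=100, nic=8), and C (rst=250, nic=6), the RST rule picks A, the NIC rule picks B, and the TOPSIS rule picks the balanced candidate C. None of the three rules looks at the remaining caching capacity, which is consistent with their failure rates rising sharply at high caching occupancy.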
Moreover, the performance of these strategies in terms of call blocking rate is also compared. Here, the call blocking rate is defined as the ratio $N_{block}/N_{new}$, where $N_{new}$ is the number of new users that arrive in the satellite coverage area and send an access request to the satellite, and $N_{block}$ is the number of those requests that are rejected because the satellite does not have enough remaining channel resources to serve all the new users. In this paper, we consider the average call blocking rate over one hour under a high load (95% channel occupancy). The call blocking rate reflects the flexibility of the handover algorithm and the QoS of users in the coverage area. As shown in Figure 11, the call blocking rate of every handover strategy increases with the user arrival rate. Among them, the call blocking rate of RST is the highest because the channel state is not considered. The call blocking rate of NIC is lower than that of RST. The MAF strategy considers the remaining service time and the number of idle channels, but its blocking rate is still high because it cannot flexibly adjust the connections of already connected users. The IMF strategy not only considers multiple factors but can also flexibly reconfigure the links of connected users: if necessary, a connected user's link can be moved to another satellite with idle channel resources, which greatly reduces the call blocking rate. The simulation results show that the call blocking rate of IMF is the lowest. In particular, when the user arrival rate is 10, the call blocking rate of IMF is about 23% lower than that of the RST strategy.
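As a small worked illustration of the blocking-rate definition above, the sketch below aggregates blocked and arriving new users over a sequence of time slots. The slot-based bookkeeping, the function name, and the assumption that each new user needs exactly one idle channel are simplifications introduced here, not details taken from the paper.

```python
def call_blocking_rate(arrivals_per_slot, idle_channels_per_slot):
    """Ratio of blocked to arriving new users, aggregated over all slots.

    arrivals_per_slot:      new users requesting access in each slot
    idle_channels_per_slot: idle channels available in each slot
    Assumes (hypothetically) that each new user occupies one channel.
    """
    n_new = sum(arrivals_per_slot)
    # Users beyond the number of idle channels in a slot are rejected.
    n_block = sum(max(0, arrivals - idle)
                  for arrivals, idle in zip(arrivals_per_slot,
                                            idle_channels_per_slot))
    return n_block / n_new if n_new else 0.0
```

For example, with arrivals of 10, 4, and 8 users and 6, 5, and 8 idle channels in three slots, 4 of the 22 new users are blocked, giving a rate of 4/22. A strategy that moves connected users to other satellites effectively raises the idle-channel counts on the congested satellite, which lowers this ratio.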

Discussion and Conclusions
In this paper, a caching-aware intelligent handover strategy is proposed for the LEO satellite network. The strategy differs from existing handover strategies in three respects. First, it addresses not only the selection of the optimal candidate satellite but also the moment at which handover is activated. That is, the intelligent handover strategy must decide both whether handover should be activated at the current moment and which candidate satellite to select. Second, the effects of multiple factors, including remaining service time, remaining idle channels, and remaining caching capacity, are jointly considered. Third, DRL is adopted to make continuous intelligent handover judgments, which enables sequential decisions that maximize long-term gains through interaction with the environment.
Simulation verifies that the caching-aware intelligent handover strategy significantly improves both the handover failure rate and the call blocking rate. Its performance is compared with the typical single-attribute RST and NIC strategies and the multi-attribute MAF strategy. When the system caching occupancy is 10%, this strategy reduces the handover failure rate by nearly 40% compared to the RST strategy and by nearly 20% compared to the MAF strategy. As the caching occupancy increases, the handover failure rates of the RST, NIC, and MAF strategies rise rapidly, while the caching-aware handover strategy proposed in this paper has a failure probability of only 18.3%. This shows that the proposed strategy can improve the quality of service for users in high caching occupancy scenarios. In addition, the system call blocking rates of the strategies are compared for different user arrival rates. When the user arrival rate is high, the proposed intelligent handover strategy reduces the call blocking rate by 25% compared to the RST strategy and by 18.5% compared to the MAF strategy. Therefore, it can be concluded that the proposed strategy can effectively balance the system load, relieve network pressure, reduce packet loss, and improve the quality of service for users.
Moreover, the DRL used in this strategy can make intelligent and continuous handover decisions to obtain maximum long-term gains. The complexity of DRL mainly depends on the dimensions of the state and action spaces, which are determined by the number of users, the number of satellites, and the handover factors considered. Generally, DRL training takes a long time to converge. Once training is finished, however, the complexity of the decision process is low. Therefore, training can be executed on ground gateways with abundant computing resources, and the decision process can be implemented on board. Finally, the application of artificial intelligence in satellite communication systems is still an open topic, and we will investigate other lightweight models that could be used for resource management in satellite networks in future work.

Conflicts of Interest:
The authors declare no conflict of interest.