A Service-Caching Strategy Assisted by Double DQN in LEO Satellite Networks

Luan, Yuchen; Sun, Fukun; Zhou, Jiaen

doi:10.3390/s24113370



by Yuchen Luan ¹, Fukun Sun ¹,* and Jiaen Zhou ²,*

¹ Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100045, China

² School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

* Authors to whom correspondence should be addressed.

Sensors 2024, 24(11), 3370; https://doi.org/10.3390/s24113370

Submission received: 9 April 2024 / Revised: 15 May 2024 / Accepted: 21 May 2024 / Published: 24 May 2024

(This article belongs to the Section Sensor Networks)


Abstract

Satellite fog computing (SFC) achieves computation, caching, and other functionalities through collaboration among fog nodes. Satellites can provide real-time and reliable satellite-to-ground fusion services by caching, in advance, content that users are likely to request. However, because satellites move at high speed, the complexity of user-access conditions makes selecting optimal caching locations and improving caching efficiency a new challenge. Motivated by this, in this paper, we propose a real-time caching scheme based on a Double Deep Q-Network (Double DQN), with the overarching objective of enhancing the cache hit rate. The simulation results demonstrate that the proposed algorithm improves the data hit rate by approximately 13% compared with methods without reinforcement learning assistance.

Keywords:

Satellite fog computing; Double DQN; cache hit

1. Introduction

1.1. Background and Motivations

With the emergence and widespread adoption of 5G systems, the limitations inherent to terrestrial networks, particularly in terms of the coverage area and construction costs, have become increasingly pronounced. This has spurred a quest for alternative solutions to mitigate these drawbacks. Concurrently, as communication technology continues to evolve and our understanding of space deepens, there has been a burgeoning interest in exploring the potential of low-Earth-orbit (LEO) satellite internet as a viable alternative.

LEO satellite networks offer several distinct advantages over traditional terrestrial networks. Firstly, they provide extensive coverage, spanning vast geographic areas that may be challenging or cost-prohibitive for terrestrial infrastructure to reach. Secondly, the deployment costs associated with LEO satellites are comparatively lower than those of ground-based networks, making them an attractive option for expanding connectivity to underserved or remote regions. Additionally, LEO satellites boast a significant capacity, enabling them to support a large volume of data traffic with minimal latency. Furthermore, the integration of LEO satellite networks with other high-altitude platforms, such as UAV networks, presents opportunities to enhance the delivery of services to ground users. By leveraging the flexibility and mobility of UAVs, satellite internet providers can offer more convenient and tailored services, thereby improving the overall user experience [1].

In the context of future communication technologies such as 6G, satellite internet is emerging as a key breakthrough direction. Its inherent advantages make it well suited to applications including civilian and emergency communication, Internet-of-Things (IoT) connectivity, military operations, and beyond [2,3,4]. As such, the potential for the further development and widespread application of satellite internet is vast [5]. With the construction and deployment of LEO constellations such as Starlink, the conventional LEO satellite service model, which relies on transparent forwarding, can no longer meet rapidly evolving business demands. Many applications now require lower latency and smoother content delivery, which calls for more responsive and efficient services at the user end. Advances in storage technology and computing chips have enabled LEO satellites to offer computing and content-caching services. However, the exponential growth in demand for low-latency mobile applications and multimedia services poses a significant challenge. According to Cisco's Visual Networking Index (VNI) report [6,7], IP video traffic was projected to double by 2022 and to account for 82% of total IP traffic. In response to these escalating traffic demands, LEO satellite caching has attracted attention as an innovative solution. By pre-caching popular content on LEO satellites, which can function as fog nodes, timely and dependable content services can be delivered to edge users [8,9,10,11]. Satellite networks nevertheless face numerous challenges, such as a time-varying topology: the neighbors of a satellite change over time, and the high speed of satellites increases the switching (handover) frequency and lowers the cache hit probability.

In the model proposed in this paper, a user request counts as a cache hit if the requested content is cached either at the satellite serving the user or at one of its neighboring satellites, which differs significantly from ground-based networks.

The dynamic movement of LEO satellites continuously changes the coverage area of each satellite and the set of connected edge user terminals, necessitating frequent replacement of cached data [12,13,14,15]. However, the cache space of each satellite is limited, and redundant data may accumulate if identical data segments are repeatedly cached across the satellite constellation, leading to poor cache space utilization. Therefore, exploring cooperation among satellites becomes imperative [16,17,18]. Additionally, for large-scale LEO constellations, making caching decisions over massive data volumes incurs high computational complexity and cost [19]. To tackle these challenges, a rational LEO satellite data-caching scheme is essential; such a scheme should determine cache placement locations and enhance cache efficiency [20,21,22].

1.2. Related Works

Researchers have studied caching strategies for low-Earth-orbit (LEO) satellites in depth. Some aim to reduce the bandwidth consumed when satellites transmit cached content, taking balanced network load distribution as the optimization objective. The authors in [23] propose a dual-layer caching model based on content delivery for satellite–ground communication. This model deploys one cache layer at ground stations and another on satellites, reducing satellite bandwidth consumption through the joint optimization of the two layers. The work in [24] presents a caching deployment method for LEO satellite networks based on named data networking (NDN) that distributes load across the network and maximizes the effectiveness of in-network caching. This method places cached data according to the topology of the satellite constellation. The results show that, with only a small number of caching nodes, the cache path length can be reduced to one-third, which helps distribute load within the network.

Furthermore, some researchers aim to reduce network resource consumption by increasing cache hit rates. Bommaraveni et al. employ active learning methods to learn content popularity, allowing the system to balance caching new content against retaining current content [25]. Xu et al. propose a replacement algorithm called ALIRS to improve scalability, maintaining cache hit rates while reducing service latency [26]. Chen et al. formulate a cache placement problem based on obtained request probabilities, aiming to maximize cache hit rates under storage capacity constraints, and develop a dynamic programming algorithm to obtain the optimal caching strategy [27].

To improve satellite service quality and user experience, the authors in [28] propose a quality of experience (QoE)-based optimization scheme for video stream cache placement. This scheme considers the required video stream rates and social relationships among users to optimize cache placement, and the results show that it significantly outperforms traditional methods in terms of QoE. The authors in [29] study joint cache placement and content delivery in satellite–ground integrated cloud wireless access networks, minimizing long-term power consumption by optimizing cache placement, access point (AP) clustering, and multicast beamforming. In [12], the authors propose a caching algorithm for LEO satellite constellation networks based on content layout optimization, designing a caching policy that places popular content preferred by users on satellites. This scheme enhances service quality in various scenarios and achieves more efficient content distribution within the satellite network.

However, the above-mentioned solutions mainly focus on optimal cache placement at specific time points and neglect cooperation between satellites, which leaves cache space underutilized. To address this, the authors in [30] propose a cooperative content-sharing method between satellites to make the best use of the limited storage space on individual satellites. By improving the connectivity between satellites and ground stations to coordinate content transmission, they effectively reduce the average service latency through cooperative caching among multiple satellites and base stations. In [31], to jointly account for user resource demands, the authors model the data-caching problem in satellite–ground networks as a joint optimization of caching, resource allocation, and computational resources, and solve it using deep Q-learning.

1.3. Contribution and Organization

In this work, we employ the Double DQN algorithm to predict the service content that LEO satellites should cache, with the aim of improving their cache hit rate. Moreover, as the cache hit rate increases, fewer content requests traverse the network, reducing the likelihood of congestion across the entire network. The approach can also decrease the computational resources the system spends on content re-encoding and decoding. Within this framework, the main contributions of this paper are as follows:

We propose a three-layer architecture integrating cloud, fog, and edge computing. The fog layer can provide services to edge users through its own content or content from neighboring nodes while also accessing content from the cloud, thereby achieving synergistic effects among cloud, fog, and edge computing.
We utilize a reinforcement learning-based approach to train Double DQN agents and conduct simulation experiments on synthetic data to obtain an effective satellite caching scheme.
We establish a simulation-based verification scheme and compare the proposed approach with multiple methods to validate its effectiveness as a satellite edge-caching strategy.

The rest of this paper is organized as follows: Section 2 presents the network model of the cloud–fog–edge three-layer architecture and introduces the caching model. Section 3 introduces the Double DQN method adopted in this study. Section 4 describes the simulation experiments and analyzes the results obtained with the proposed architecture and method. Section 5 summarizes the work and outlines future research directions.

2. System Model

2.1. Network Model

As depicted in Figure 1, we consider a three-tier network architecture comprising cloud, fog, and edge layers. The cloud layer comprises service centers deployed on the ground or in medium and high orbits, which hold all content and services. The middle layer consists of LEO satellites, which store a portion of the content and services and connect to both the cloud service center and edge users through wireless links. Edge users, such as vehicles, aircraft, and other terminal devices, request content and services. Because satellites move much faster than edge users, edge users are assumed to be stationary in the network model, while fog satellite nodes move at high speed. Each edge user locally stores its historical data and content preference records. Since fog satellite nodes cache only a portion of the content and services, when a fog satellite's cache cannot satisfy an edge user's request, the satellite must retrieve the corresponding content from the cloud server. Specifically, the model involves the following entities:

Fog satellite nodes: Fog satellite nodes directly connect with edge users and provide network and content services. When the requested content is available at a fog satellite node, it interacts directly with the edge user in the form of content or services. If the requested content is not pre-cached, the fog node requests it from adjacent nodes or the cloud service center and caches the relevant content.
Adjacent satellites: If an adjacent satellite caches the requested content, it provides the content or service to the satellite directly connected to the edge user, which then serves the edge user.
Cloud service center: The cloud service center is connected to the backbone internet. We assume it contains all content and services. Thus, when fog nodes are unable to provide content services, they initiate content retrieval requests to the cloud center, cache the content locally, and continue to provide services.
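The three-tier lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dictionaries `caches` and `neighbors` and the function `locate_content` are hypothetical, content fetched from the cloud is cached at the access satellite as the text describes, and cache capacity and eviction are deliberately not modeled.

```python
from typing import Dict, List, Set


def locate_content(content_id: int,
                   access_sat: int,
                   caches: Dict[int, Set[int]],
                   neighbors: Dict[int, List[int]]) -> str:
    """Return which tier serves a request: 'local', 'neighbor', or 'cloud'.

    caches maps a satellite id to the set of content ids it holds and
    neighbors maps a satellite id to its adjacent satellites
    (both are hypothetical inputs for illustration).
    """
    if content_id in caches.get(access_sat, set()):
        return "local"          # the fog node holds the content itself
    for nb in neighbors.get(access_sat, []):
        if content_id in caches.get(nb, set()):
            return "neighbor"   # fetched from an adjacent satellite
    # Miss at the fog layer: fetch from the cloud and cache it locally
    # (no capacity limit or eviction policy in this sketch).
    caches.setdefault(access_sat, set()).add(content_id)
    return "cloud"
```

A second request for the same content at the same satellite is then served locally, which is the effect that pre-caching aims to exploit.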

2.2. Mobility Model

Owing to the complexity of satellite networks, we adopt the binomial point process (BPP) model to represent the satellite network in this study; Wang [32] has shown the BPP model to be effective for modeling satellite networks.

Proposition 1.

For a point in a homogeneous BPP, the azimuth angle is uniformly distributed between 0 and $2\pi$, i.e., $\phi_{BPP} \sim U[0, 2\pi]$, and the cumulative distribution function (CDF) of each point's polar angle (of the spherical coordinate), $\theta_{BPP}$, follows

$$F_{\theta_{BPP}}(\theta) = \frac{1 - \cos\theta}{2}, \quad 0 \leq \theta \leq \pi, \tag{1}$$

and $\theta_{BPP}$ can be generated by

$$\theta_{BPP} = \arccos\left(1 - 2U[0, 1]\right), \quad 0 \leq \theta_{BPP} \leq \pi. \tag{2}$$

Unless otherwise stated, BPP refers to the homogeneous BPP in the remainder of this paper. User terminals are distributed randomly on the ground.
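Equation (2) is inverse-transform sampling of the CDF in Equation (1): setting $u = (1 - \cos\theta)/2$ and solving for $\theta$ gives $\theta = \arccos(1 - 2u)$. A minimal Python sketch of drawing BPP points on a sphere this way follows; the function names and the unit radius default are illustrative assumptions.

```python
import math
import random


def sample_bpp(n: int, rng: random.Random) -> list:
    """Draw n points of a homogeneous BPP on a sphere as (theta, phi) pairs.

    phi ~ U[0, 2*pi]; theta = arccos(1 - 2U) as in Eq. (2), i.e.
    inverse-transform sampling of F(theta) = (1 - cos(theta)) / 2, Eq. (1).
    """
    points = []
    for _ in range(n):
        phi = rng.uniform(0.0, 2.0 * math.pi)
        theta = math.acos(1.0 - 2.0 * rng.random())
        points.append((theta, phi))
    return points


def to_cartesian(theta: float, phi: float, radius: float = 1.0) -> tuple:
    """Convert (polar angle theta, azimuth phi) to x, y, z on a sphere."""
    return (radius * math.sin(theta) * math.cos(phi),
            radius * math.sin(theta) * math.sin(phi),
            radius * math.cos(theta))
```

Sampling $\theta$ uniformly instead would cluster points near the poles; the arccos transform is what makes the points uniform over the sphere's surface.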

Exploiting the relativity of motion, we keep the satellite topology unchanged in the simulation and instead assign each ground terminal a random initial direction and speed of movement. When a ground user moves into the coverage area of a satellite, it initiates a content request to that satellite. To facilitate simulation, the positions of ground users are computed before the start of each time slot, and the network topology is treated as static within each time slot. User speeds are drawn randomly to reflect the heterogeneous and diverse user characteristics of a real environment. For the $t$-th time slot, the current satellite, user, and content are denoted as $s^{t}$, $u^{t}$, and $o^{t}$, respectively.
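The per-slot update can be illustrated with a small Python sketch. It assumes a local planar approximation (positions in km on a flat patch), which is a simplification not stated in the paper; `GroundUser`, `make_users`, and `advance_slot` are hypothetical names. Positions are updated once before each slot and then held fixed, matching the "static within a slot" assumption.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GroundUser:
    x_km: float
    y_km: float
    heading_rad: float   # random initial direction of movement
    speed_kmps: float    # random speed (relative to the fixed topology)


def make_users(n, area_km, max_speed_kmps, rng):
    """Create users with random positions, headings, and speeds."""
    return [GroundUser(rng.uniform(0, area_km), rng.uniform(0, area_km),
                       rng.uniform(0, 2 * math.pi),
                       rng.uniform(0, max_speed_kmps))
            for _ in range(n)]


def advance_slot(users, slot_s):
    """Move every user once, before the slot starts; the topology is then frozen."""
    for u in users:
        u.x_km += u.speed_kmps * slot_s * math.cos(u.heading_rad)
        u.y_km += u.speed_kmps * slot_s * math.sin(u.heading_rad)
    return [(u.x_km, u.y_km) for u in users]
```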

2.3. Communication Model

We assume that all edge users within the satellite coverage area can establish a service link with the satellite. For each satellite, we compute the point where the line connecting the satellite to the center of the Earth intersects the Earth's surface, and we denote the distance between this point and the user by $d$. Each user selects the satellite with the minimum $d$ to establish a communication link: when the satellite is directly above the user, its projection coincides with the user, which yields the best communication quality. To represent the access relationship between users and satellites, we use the variable $\alpha \triangleq [\alpha_{m,k}]_{\forall (m,k) \in M \times K}$, where

$$\alpha_{m,k} = \begin{cases} 1, & \mathrm{SUE}_k \text{ is served by } \mathrm{LEO}_m, \\ 0, & \text{otherwise}. \end{cases} \tag{3}$$
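The minimum-$d$ association rule and the indicator in Equation (3) can be sketched as follows. This is an illustrative Python sketch under assumptions not in the paper: satellites and users are given as Earth-centered Cartesian vectors, $d$ is the great-circle distance to the sub-satellite point on a spherical Earth, and `coverage_km` is a hypothetical coverage radius. Each user is assigned to at most one satellite.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def unit(v):
    """Normalize a 3-D vector (projects a satellite onto the surface direction)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def great_circle_km(u, s):
    """Surface distance between two unit vectors on a spherical Earth."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, s))))
    return EARTH_RADIUS_KM * math.acos(dot)


def associate(users, sats, coverage_km):
    """Return alpha[m][k] = 1 if user k is served by satellite m, as in Eq. (3).

    Each user picks the satellite whose sub-satellite point is closest
    (minimum d), provided d is within coverage_km, so each user has at most
    one serving satellite.
    """
    alpha = [[0] * len(users) for _ in sats]
    sub_points = [unit(s) for s in sats]
    for k, u in enumerate(users):
        d = [great_circle_km(unit(u), p) for p in sub_points]
        m = min(range(len(sats)), key=d.__getitem__)
        if d[m] <= coverage_km:
            alpha[m][k] = 1
    return alpha
```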

Based on realistic application scenarios, we assume that each edge user can access at most one satellite, which yields the following constraint:

$$(\mathrm{C1}): \sum_{\forall m \in M} \alpha_{m,k} \leq 1, \quad \forall k \in K. \tag{4}$$

Once $\mathrm{SUE}_k$ is served by $\mathrm{LEO}_m$, let $W_k^{\mathrm{SUE}}$ denote the bandwidth allocated to $\mathrm{SUE}_k$ and $p_{m,k}$ the transmission power of this user. Assuming orthogonal bandwidth assignment, the signal-to-noise ratio (SNR) of $\mathrm{SUE}_k$ can be written as

$$\gamma_{m,k}^{\mathrm{SUE}} = \frac{p_{m,k} h_{m,k}}{\sigma_m W_k^{\mathrm{SUE}}}, \tag{5}$$

where $h_{m,k}$ is the channel gain between $\mathrm{LEO}_m$ and $\mathrm{SUE}_k$, and $\sigma_m$ is the noise power density per Hz at $\mathrm{LEO}_m$.

The achievable rate of $\mathrm{SUE}_k$ at $\mathrm{LEO}_m$ can be expressed as

$$R_{m,k}^{\mathrm{SUE}} = W_k^{\mathrm{SUE}} \log_2\left(1 + \gamma_{m,k}^{\mathrm{SUE}}\right) = W_k^{\mathrm{SUE}} \log_2\left(1 + \frac{p_{m,k} h_{m,k}}{\sigma_m W_k^{\mathrm{SUE}}}\right). \tag{6}$$

Taking into account the LEO association decision, the achievable transmission rate of $\mathrm{SUE}_k$ can be described as

$$R_k^{\mathrm{SUE}}(p, W^{\mathrm{SUE}}, \alpha) = \sum_{\forall m \in M} \alpha_{m,k} R_{m,k}^{\mathrm{SUE}}, \tag{7}$$

where $p \triangleq [p_{m,k}]_{\forall m,k}$ and $W^{\mathrm{SUE}} \triangleq [W_k^{\mathrm{SUE}}]_{\forall k}$. To meet the communication rate demand of each SUE, the following constraint is introduced:

$$(\mathrm{C2}): R_k^{\mathrm{SUE}}(p, W^{\mathrm{SUE}}, \alpha) \geq \bar{R}_k^{\mathrm{SUE}}, \quad \forall k \in K, \tag{8}$$

in which $\bar{R}_k^{\mathrm{SUE}}$ indicates the required transmission rate of $\mathrm{SUE}_k$.
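Equations (5) to (8) chain together directly: SNR, per-link Shannon rate, association-weighted user rate, and the rate-demand check. A minimal Python sketch follows; the function names are hypothetical, and units are assumed to be W for power, W/Hz for $\sigma_m$, and Hz for bandwidth, so that $\sigma_m W$ is a noise power.

```python
import math


def snr(p_w, h, sigma_w_per_hz, bw_hz):
    """Eq. (5): gamma = p * h / (sigma * W), orthogonal bandwidth allocation."""
    return p_w * h / (sigma_w_per_hz * bw_hz)


def rate_bps(p_w, h, sigma_w_per_hz, bw_hz):
    """Eq. (6): achievable rate R = W * log2(1 + gamma)."""
    return bw_hz * math.log2(1.0 + snr(p_w, h, sigma_w_per_hz, bw_hz))


def user_rate(alpha_col, p_col, h_col, sigma, bw_hz):
    """Eq. (7): sum over satellites m of alpha[m,k] * R[m,k] for one user k."""
    return sum(a * rate_bps(p, h, s, bw_hz)
               for a, p, h, s in zip(alpha_col, p_col, h_col, sigma))


def meets_demand(rate, required):
    """Constraint (C2): the user's rate must reach its required rate."""
    return rate >= required
```

With $p = 1$ W, $h = 3$, $\sigma = 10^{-6}$ W/Hz, and $W = 1$ MHz, the SNR is 3 and the rate is $10^6 \log_2 4 = 2$ Mbit/s.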

2.4. Caching Model

Based on the characteristics of LEO satellite orbits, we assume that all satellites in the environment move in the same direction. The number of edge-user accesses to a satellite follows a Poisson distribution with average arrival rate $\lambda$. When an edge user accesses a satellite node, a content request is generated. Because content requests are segmented, we assume that a user's request can be completed within one service cycle of a satellite; therefore, each segment request involves exactly one edge user and one fog satellite node. The speed of a satellite depends on its orbital altitude, and satellites at the same altitude move at the same speed.
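Per-slot user accesses with average rate $\lambda$ can be drawn with Knuth's multiplication method for Poisson variates, sketched below in Python. The function name is hypothetical, and the method is intended for moderate $\lambda$ (for very large $\lambda$, $e^{-\lambda}$ underflows and a different sampler would be needed).

```python
import math
import random


def poisson_arrivals(lam, rng):
    """Number of user accesses in one slot, N ~ Poisson(lam), via Knuth's method.

    Multiplies uniforms until the product drops to exp(-lam) or below;
    the number of multiplications performed is Poisson distributed.
    """
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k
```

Each arrival would then generate one content request at the accessed satellite for that slot.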

We denote the sets of satellite nodes and edge users as $s = \{1, 2, \dots, N\}$ and $u = \{1, 2, \dots, K\}$, respectively, and the set of requested contents as $o = \{1, 2, \dots, M\}$. We assume that all contents have equal cache priority, so cache decisions are driven primarily by user demand. Each user may prefer several content types, with preference levels defined as $\alpha \in [0, 1]$. A higher value of $\alpha$ indicates a stronger preference and, correspondingly, a higher probability that the user requests that content.
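One simple way to turn preference levels into requests is to sample a content with probability proportional to its preference level. The paper states only that a higher $\alpha$ implies a higher request probability, so proportional sampling is an assumption here; `sample_request` and the preference dictionary are hypothetical.

```python
import random


def sample_request(preferences, rng):
    """Pick the content a user requests, proportionally to its preference level.

    preferences maps content id -> preference level alpha in [0, 1]
    (a hypothetical user profile); contents with alpha = 0 are never requested.
    """
    items = [(o, a) for o, a in preferences.items() if a > 0.0]
    if not items:
        raise ValueError("user has no positive preference")
    ids, weights = zip(*items)
    return rng.choices(ids, weights=weights, k=1)[0]
```

For a profile {10: 0.9, 11: 0.3}, content 10 is requested about 0.9 / 1.2 = 75% of the time.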

The inter-satellite links of LEO satellites use laser communication, which offers extremely high bandwidth and ultra-low latency, so their contribution to the overall delay is negligible. Therefore, as shown in Figure 2, our model counts a request as a cache hit when the content is cached at a neighboring satellite. If the requested content is cached neither at the satellite accessed by the current user nor at any neighboring satellite, the satellite must request the content from the cloud. In this case, because of the additional satellite-to-ground transmission, we assume that the content transmission delay doubles.
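The hit and delay accounting described above can be summarized in a short Python sketch: local and neighbor fetches both count as hits with the same base delay (inter-satellite delay treated as negligible), and a cloud fetch doubles the delay. The function name and the `base_delay_ms` parameter are hypothetical.

```python
def summarize(outcomes, base_delay_ms):
    """Cache hit rate and mean delay for outcomes in {'local', 'neighbor', 'cloud'}.

    'local' and 'neighbor' both count as hits and cost base_delay_ms;
    'cloud' is a miss whose delay is doubled.
    """
    if not outcomes:
        return 0.0, 0.0
    hits = sum(1 for o in outcomes if o in ("local", "neighbor"))
    delay = sum(base_delay_ms * (2 if o == "cloud" else 1) for o in outcomes)
    return hits / len(outcomes), delay / len(outcomes)
```

For example, two hits and two cloud fetches at a 10 ms base delay give a 50% hit rate and a mean delay of 15 ms.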

3. Implementation

3.1. Introduction to Deep Reinforcement Learning

Traditional reinforcement learning (RL) is a process in which an agent repeatedly interacts with an unknown environment by trial and error, evaluating the rewards it receives and improving its solution. Classic methods include Q-learning and SARSA. During learning, the agent selects an action based on the current state, obtains a reward, and transitions to the next state; note that each action may affect subsequent states. Through continuous iteration, RL aims to determine the optimal action in each state so as to maximize the cumulative reward in the target problem. Reinforcement learning is generally modeled as a Markov decision process (MDP), $\mathrm{MDP} = \langle S, A, P, R, \gamma \rangle$, where $S$ is the set of states, $A$ is the set of actions, $P$ gives the state transition probabilities, and $R$ is the reward. The discount factor $\gamma$ discounts future rewards relative to immediate rewards.
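The role of $\gamma$ can be made concrete with the discounted return $G = \sum_t \gamma^t r_t$. The short Python sketch below (function name hypothetical) accumulates it backwards, which avoids computing powers explicitly.

```python
def discounted_return(rewards, gamma):
    """G = sum_t gamma**t * r_t, accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With three unit rewards and $\gamma = 0.5$, the return is $1 + 0.5 + 0.25 = 1.75$; a smaller $\gamma$ makes the agent more short-sighted.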

However, traditional tabular RL methods can only handle finite, discrete state spaces, which limits their applicability. Combined with deep learning, the agent can learn directly from high-dimensional state features of the environment and train its decisions accordingly. The DQN algorithm, a classic deep reinforcement learning (DRL) method that improves on Q-learning, is illustrated in Figure 3.

As shown in Figure 3, the DQN algorithm adopts an experience replay mechanism. For any state, $s$, after taking an action, $a$, the agent enters another state, $s'$, and receives the corresponding reward, $r$. The quadruple $\langle s, a, r, s' \rangle$ forms a sample that is stored in the replay buffer. Once the buffer holds a specified number of samples, batches are drawn from it at random for training. The DQN algorithm uses two Q-networks: the evaluation network (also called the original network), with weight parameters $\omega$, which controls the agent and collects experience; and the target network, with weight parameters $\omega^{-}$, which is used to calculate the temporal-difference (TD) target:

$$y = r + \gamma \max_{a} Q(s', a; \omega^{-}). \tag{9}$$
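The replay mechanism and the TD target of Equation (9) can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: `ReplayBuffer` and `dqn_target` are hypothetical names, the buffer evicts the oldest samples when full, and the target network's Q-values at $s'$ are passed in as a list.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["s", "a", "r", "s_next"])


class ReplayBuffer:
    """Fixed-capacity store of <s, a, r, s'> quadruples with uniform sampling."""

    def __init__(self, capacity, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest samples are evicted first
        self.rng = random.Random(seed)

    def push(self, s, a, r, s_next):
        self.buffer.append(Transition(s, a, r, s_next))

    def ready(self, min_size):
        """True once enough samples have been collected to start training."""
        return len(self.buffer) >= min_size

    def sample(self, batch_size):
        return self.rng.sample(list(self.buffer), batch_size)


def dqn_target(r, gamma, q_next_target):
    """Eq. (9): y = r + gamma * max_a Q(s', a; w^-), given target-net Q-values at s'."""
    return r + gamma * max(q_next_target)
```

Random sampling breaks the temporal correlation between consecutive transitions, which is the main reason experience replay stabilizes training.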

The DQN algorithm updates the weight parameters of the evaluation network by gradient descent on the squared TD error:

$$\omega_{t+1} = \omega_t + \alpha \left[ r + \gamma \max_{a} Q(s', a; \omega^{-}) - Q(s, a; \omega_t) \right] \nabla_{\omega} Q(s, a; \omega_t), \tag{10}$$

where $\alpha$ here denotes the learning rate.
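For intuition, Equation (10) can be applied to a linear Q-function $Q(s, a; \omega) = \omega_a^{\top} \phi(s)$, whose gradient with respect to $\omega_a$ is simply the feature vector $\phi(s)$. The linear model is a hypothetical stand-in for the Q-network, used only to make the update explicit in a few lines of Python.

```python
def q_value(w, features, a):
    """Linear Q-function Q(s, a; w) = w[a] . phi(s) (stand-in for a Q-network)."""
    return sum(wi * xi for wi, xi in zip(w[a], features))


def td_update(w, w_target, s_feat, a, r, s_next_feat, gamma, lr):
    """One step of Eq. (10) for a linear Q-function; returns the TD error.

    The TD target uses the frozen target weights w_target. Since the gradient
    of Q(s, a; w) w.r.t. w[a] is phi(s), only row a of w is changed.
    """
    n_actions = len(w)
    y = r + gamma * max(q_value(w_target, s_next_feat, b) for b in range(n_actions))
    td_error = y - q_value(w, s_feat, a)
    w[a] = [wi + lr * td_error * xi for wi, xi in zip(w[a], s_feat)]
    return td_error
```

Repeated updates on the same transition shrink the TD error geometrically (by a factor of $1 - \text{lr}\,\lVert\phi(s)\rVert^2$ here), because the target stays fixed while only the evaluation weights move.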

During training, the weight parameters of the target network are held fixed while those of the evaluation network are updated, and the evaluation network's parameters are periodically copied to the target network to improve fitting stability. However, because DQN uses the target network both to select and to evaluate the maximizing action in Equation (9), the max operator accumulates positive errors and overestimates Q values. Double DQN improves on DQN by using the original (evaluation) Q-network to select the best action and the target network to evaluate it, which partially resolves the overestimation problem. In Double DQN, the TD target is calculated as

$$y = r + \gamma \, Q\left(s', \arg\max_{a} Q(s', a; \omega); \omega^{-}\right). \tag{11}$$
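The difference between Equations (9) and (11) is easiest to see side by side. In the Python sketch below (hypothetical function names), the Q-values at $s'$ from both networks are passed in as lists; when the target network happens to overrate an action that the online network does not prefer, Double DQN does not pick up that overestimate.

```python
def dqn_target(r, gamma, q_next_target):
    """Eq. (9): the target network both selects and evaluates the action."""
    return r + gamma * max(q_next_target)


def double_dqn_target(r, gamma, q_next_online, q_next_target):
    """Eq. (11): select a* with the online network, evaluate it with the target network."""
    a_star = max(range(len(q_next_online)), key=q_next_online.__getitem__)
    return r + gamma * q_next_target[a_star]
```

With online values [1.0, 0.2], target values [0.3, 1.5], $r = 0$, and $\gamma = 1$, DQN returns 1.5 while Double DQN returns 0.3, because action 0 is selected by the online network and then evaluated by the target network.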

Here, $s$ denotes the size of the transmitted content, and $R_{s,u}$, $R_{s,s'}$, and $R_{c,u}$ denote the transmission rates between the fog satellite and users, between the fog satellite and its adjacent satellite, and between the cloud service center and users, respectively; together they determine the content transmission delays in the different caching cases. We define different reward functions based on the transmission delay of cached content in each case, as given in the following equation:

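$$r(t) = \begin{cases} e^{-\lambda_1 d_1}, & f \in s(t), \\ e^{-(\lambda_1 d_1 + \lambda_2 d_2)}, & f \in s_n(t), \\ e^{-\lambda_3 d_3}, & f \notin s(t) \text{ and } f \notin s_n(t), \end{cases} \tag{12}$$

where $f$ is the requested content, $s(t)$ and $s_n(t)$ denote the content cached at the serving satellite and at its neighboring satellites in time slot $t$, $d_1$, $d_2$, and $d_3$ are the corresponding transmission delay terms, and $\lambda_1 + \lambda_2 + \lambda_3 = 1$ with $\lambda_1 < \lambda_2 \ll \lambda_3$. The overall reward is accumulated from the rewards of all requested contents on every satellite:

$$R(t) = \sum_{\mathrm{Sat}} \sum_{f} r(t). \tag{13}$$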
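Equations (12) and (13) translate directly into code. The Python sketch below assumes the caller supplies the delay terms $(d_1, d_2, d_3)$ and weights $(\lambda_1, \lambda_2, \lambda_3)$, whose numeric values are not given here, and represents $s_n(t)$ as the union of the neighbors' caches; all names are hypothetical.

```python
import math


def request_reward(f, local_cache, neighbor_cache, d, lam):
    """Eq. (12): reward for one request of content f.

    d = (d1, d2, d3) are delay terms and lam = (l1, l2, l3) weights, assumed
    to satisfy l1 + l2 + l3 = 1 and l1 < l2 << l3. neighbor_cache is the
    union of the neighboring satellites' caches.
    """
    d1, d2, d3 = d
    l1, l2, l3 = lam
    if f in local_cache:
        return math.exp(-l1 * d1)
    if f in neighbor_cache:
        return math.exp(-(l1 * d1 + l2 * d2))
    return math.exp(-l3 * d3)


def total_reward(requests_per_sat, caches, neighbor_caches, d, lam):
    """Eq. (13): sum of r over all requested contents f on every satellite."""
    return sum(request_reward(f, caches[sat], neighbor_caches[sat], d, lam)
               for sat, reqs in requests_per_sat.items() for f in reqs)
```

Because the exponent grows from the local case to the neighbor case to the cloud case, the agent earns the most reward for content it serves itself and the least for cloud fetches.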

3.2. Caching Hit Method

Based on the above settings, we propose a caching strategy based on Double DQN. Figure 4 shows a flowchart of the method, whose steps are as follows:

Initialization: The local fog satellite selects the k contents with the highest request frequency as its cache content, while an adjacent satellite caches k other contents not cached by the local fog satellite.
In each time slot, observe the current state, $s(t)$, calculate the Q values, and obtain the action, $a(t)$, taken in that state. After executing the action, observe the new state, $s(t + 1)$, and obtain the reward, $r(t)$.
Store the tuple $(s(t), a(t), r(t), s(t + 1))$ as a sample in the replay buffer.
Once the buffer holds a sufficient number of samples, randomly select a batch of them for training: calculate the TD target for all contents and update the weight parameters of the evaluation network by gradient descent.
Every fixed number of steps, copy the weight parameters of the evaluation network to the target network.
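The initialization step and the periodic target-network synchronization can be sketched as follows. The paper does not specify how the adjacent satellite's k contents are chosen, so this Python sketch assumes the next k contents by request frequency; `init_caches`, `should_sync`, and the request log are hypothetical.

```python
from collections import Counter


def init_caches(request_log, k):
    """Initialization: the local satellite caches the k most-requested contents;
    the adjacent satellite caches the next k (an assumed selection rule).

    request_log is a list of historical content ids (hypothetical input).
    """
    ranked = [o for o, _ in Counter(request_log).most_common()]
    local = set(ranked[:k])
    neighbor = set(ranked[k:2 * k])  # disjoint from the local cache by construction
    return local, neighbor


def should_sync(step, sync_every):
    """True on steps where evaluation-network weights are copied to the target network."""
    return step > 0 and step % sync_every == 0
```

Keeping the two initial caches disjoint avoids the redundant duplication across satellites that the introduction identifies as wasteful.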

3.3. Performance Benchmark

Our proposed approach (Section 3.2), referred to as the proposed method hereafter, was compared with three representative approaches to the caching problem, namely a Thompson sampling baseline, a random baseline, and an $\epsilon$-greedy approach, and additionally with the CAFR scheme [33].

Thompson sampling: In this method, the content to cache on satellites is selected probabilistically using a Bayesian model. Each piece of content is chosen for caching on a satellite by sampling from the posterior distribution, which is updated based on the historical hit rate and other relevant data. Content items are cached sequentially, with each decision informed by the feedback from previous cache-hit outcomes.

Random algorithm: In this approach, the selection of content to cache on satellites is performed randomly without considering past performance or user preferences. Each piece of content is randomly assigned to a satellite for caching without any probabilistic model guiding the decision-making process. Content items are cached in no particular order, with each allocation being independent of previous cache hit outcomes or historical data.
$\epsilon$-greedy algorithm: In this approach, the selection of content to cache on satellites combines exploration and exploitation. With probability $\epsilon$, a random content item is chosen for caching, allowing new content and satellite combinations to be explored. With the remaining probability $1 - \epsilon$, the algorithm exploits the known information by caching the content with the highest estimated cache hit rate on a satellite. This method balances exploring new content possibilities against exploiting the currently known best options.
CAFR [33]: A cooperative caching scheme for edge computing that leverages federated learning (FL) and deep reinforcement learning (DRL) to predict popular content and optimize caching strategies, with the aim of improving cache hit ratios and reducing content transmission delays. It is among the most advanced methods currently available in the field of edge caching.
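The two bandit-style baselines can be sketched compactly in Python. The Beta-Bernoulli model used for Thompson sampling (a Beta(1, 1) prior updated with hit/miss counts) is an assumption, since the baseline description only says "a Bayesian model"; `epsilon_greedy` and `ThompsonCache` are hypothetical names.

```python
import random


def epsilon_greedy(est_hit_rate, epsilon, rng):
    """With probability epsilon explore a random content, else exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.choice(list(est_hit_rate))
    return max(est_hit_rate, key=est_hit_rate.get)


class ThompsonCache:
    """Beta-Bernoulli Thompson sampling over contents (the Beta prior is an assumption)."""

    def __init__(self, contents):
        self.hits = {o: 1.0 for o in contents}    # Beta alpha parameters
        self.misses = {o: 1.0 for o in contents}  # Beta beta parameters

    def choose(self, rng):
        """Sample a hit probability per content from its posterior; cache the best draw."""
        draws = {o: rng.betavariate(self.hits[o], self.misses[o]) for o in self.hits}
        return max(draws, key=draws.get)

    def update(self, content, hit):
        """Record the observed cache-hit outcome for the chosen content."""
        if hit:
            self.hits[content] += 1.0
        else:
            self.misses[content] += 1.0
```

Unlike the fixed exploration rate of $\epsilon$-greedy, Thompson sampling explores less as posteriors sharpen, which is consistent with the two baselines converging to similar performance when capacity is large.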

All experiments were implemented in Python 3.9.2 and conducted on a Windows machine equipped with an Intel Core i9-12900K processor (16 CPUs, 3.2 GHz), 32 GB of RAM, and an RTX3070 graphics card.

4. Simulation

In the simulation, we constructed a satellite network of 66 satellites distributed across 6 orbital planes. On the ground, we used Monte Carlo simulation to generate 200 users following a Poisson point process (PPP) distribution. As shown in Table 1, the other parameters in our experiments were set according to reference [34].
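Since a cache hit also counts content held by neighboring satellites, the simulation needs a neighbor set for each of the 66 satellites. The inter-satellite link topology is not specified here, so the Python sketch below assumes a common "+Grid" pattern on a 6 × 11 layout: two intra-plane neighbors (wrapping around the orbit) plus the same-slot satellites in adjacent planes, with no link across the first and last plane. `grid_neighbors` is a hypothetical helper.

```python
def grid_neighbors(n_planes=6, sats_per_plane=11):
    """Neighbor lists for a planes x slots constellation with '+Grid' links (assumed).

    Each satellite links to its two intra-plane neighbors and to the satellites
    in the same slot of the adjacent planes; the first and last planes are not
    linked to each other.
    """
    def sid(p, j):
        return p * sats_per_plane + j

    nbrs = {}
    for p in range(n_planes):
        for j in range(sats_per_plane):
            links = {sid(p, (j - 1) % sats_per_plane),
                     sid(p, (j + 1) % sats_per_plane)}
            if p > 0:
                links.add(sid(p - 1, j))
            if p < n_planes - 1:
                links.add(sid(p + 1, j))
            nbrs[sid(p, j)] = sorted(links)
    return nbrs
```

Satellites in the edge planes therefore have three neighbors and all others have four.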

We first trained the agent in PyTorch. Figure 5 illustrates one instance of the training process: as training proceeds, the decisions of the DRL agent improve progressively. From around the tenth episode onward, the agent's performance remains stable within a limited range of fluctuation; the latency stabilizes at around 66.75 ms and the cache hit rate at around 24%, demonstrating the stability and effectiveness of the algorithm.

Subsequently, as depicted in Figure 6 and Figure 7, we compared the proposed method with the performance benchmarks introduced in Section 3.3. We varied the cache capacity from 50 to 400 and recorded the cache hit rates and content transmission delays under each capacity. Apart from the random algorithm, which performed worst, the other three algorithms showed continuously increasing cache hit rates and continuously decreasing average content transmission delays as the cache capacity grew. When the cache capacity was sufficiently large, the $\epsilon$-greedy algorithm and the Thompson sampling algorithm performed similarly, while the proposed method consistently achieved the best performance.

To demonstrate the advantages of the proposed method over existing methods, we compared it with CAFR, one of the most advanced methods in the current edge-caching domain, using content-caching delay as the primary evaluation metric. Figure 8 presents the comparative results, which indicate that the proposed method effectively addresses the satellite edge-caching problem.

5. Conclusions

This paper has addressed the content-caching strategy for cloud–fog–edge collaboration scenarios in satellite internet services. We proposed a caching prediction scheme that accounts for the mobility characteristics of LEO satellites to enhance the cache hit rate of fog satellite nodes. We first analyzed the characteristics of LEO constellations and edge users and established a network model. We then introduced a caching prediction method based on Double DQN to improve the cache hit rate. The simulation results demonstrate that the proposed method improves the cache hit rate by approximately 13% compared with baseline schemes without reinforcement learning assistance. These results indicate that, with the assistance of the Double DQN algorithm, the cache hit rate of edge-user requests can be effectively improved, thereby saving bandwidth resources in satellite networks and reducing content transmission latency.

Author Contributions

Conceptualization, Y.L. and F.S.; methodology, Y.L. and F.S.; software, Y.L. and F.S.; validation, J.Z., Y.L. and F.S.; formal analysis, J.Z., Y.L. and F.S.; investigation, J.Z., Y.L. and F.S.; resources, J.Z., Y.L. and F.S.; data curation, J.Z., Y.L. and F.S.; writing—original draft preparation, Y.L. and F.S.; writing—review and editing, J.Z., Y.L. and F.S.; visualization, J.Z., Y.L. and F.S.; supervision, Y.L. and F.S.; project administration, J.Z.; funding acquisition, J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are contained within the article.

Acknowledgments

The authors acknowledge the contributions of the Aerospace Information Research Institute at the Chinese Academy of Sciences and School of Information and Communication Engineering of Beijing University of Posts and Telecommunications for supporting this work with research facilities and resources.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SFC	Satellite fog computing
Double DQN	Double Deep Q-Network
LEO	low Earth orbit
IoT	Internet of Things
VNI	Visual Networking Index
NDN	Named data networking
QoE	Quality of experience
AP	Access point
RL	Reinforcement learning
TD	Temporal difference

References

1. Zhang, X.; Peng, M.; Liu, C. Sensing-Assisted Beamforming and Trajectory Design for UAV-Enabled Networks. IEEE Trans. Veh. Technol. 2023, 73, 3804–3819.
2. Wang, Y.; Gu, L. Status quo and future development of LEO satellite mobile communication. Commun. Technol. 2020, 53, 2447–2453.
3. Jia, X.; Lv, T.; He, F.; Huang, H. Collaborative data downloading by using inter-satellite links in LEO satellite networks. IEEE Trans. Wirel. Commun. 2017, 16, 1523–1532.
4. Li, J.; Lu, H.; Xue, K.; Zhang, Y. Temporal netgrid model-based dynamic routing in large-scale small satellite networks. IEEE Trans. Veh. Technol. 2019, 68, 6009–6021.
5. Sun, Y.; Peng, M.; Zhang, S.; Lin, G.; Zhang, P. Integrated satellite-terrestrial networks: Architectures, key techniques, and experimental progress. IEEE Netw. 2022, 36, 191–198.
6. Cisco, V. Cisco visual networking index: Forecast and trends, 2017–2022. White Pap. 2018, 1, 10–20.
7. Cisco, U. Cisco Annual Internet Report (2018–2023) White Paper; Cisco: San Jose, CA, USA, 2020; Volume 10, pp. 1–35.
8. Sun, Y.; Chen, S.; Wang, Z.; Mao, S. A joint learning and game-theoretic approach to multi-dimensional resource management in fog radio access networks. IEEE Trans. Veh. Technol. 2022, 72, 2550–2563.
9. Sellami, Y.; Jaber, G.; Lounis, A. Distributed Fog-based Caching Solution for Content-Centric Networking in IoT. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 493–494.
10. Sellami, Y.; Jaber, G.; Lounis, A.; Lakhlef, H.; Bouabdallah, A. A Cooperative Caching Scheme in Fog/Sensor Nodes for CCN. In Proceedings of the 2022 International Wireless Communications and Mobile Computing (IWCMC), Dubrovnik, Croatia, 30 May–3 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 481–486.
11. Yu, Z.; Hu, J.; Min, G.; Wang, Z.; Miao, W.; Li, S. Privacy-preserving federated deep learning for cooperative hierarchical caching in fog computing. IEEE Internet Things J. 2021, 9, 22246–22255.
12. Liu, S.; Hu, X.; Wang, Y.; Cui, G.; Wang, W. Distributed caching based on matching game in LEO satellite constellation networks. IEEE Commun. Lett. 2017, 22, 300–303.
13. Zhu, X.; Jiang, C.; Kuang, L.; Zhao, Z. Cooperative multilayer edge caching in integrated satellite-terrestrial networks. IEEE Trans. Wirel. Commun. 2021, 21, 2924–2937.
14. Vu, T.X.; Maturo, N.; Vuppala, S.; Chatzinotas, S.; Grotz, J.; Alagha, N. Efficient 5G edge caching over satellite. In Proceedings of the 36th International Communications Satellite Systems Conference (ICSSC 2018), Niagara Falls, ON, Canada, 15–18 October 2018; IET: London, UK, 2018; pp. 1–5.
15. Zhang, H.; Xu, J.; Liu, X.; Long, K.; Leung, V.C. Joint optimization of caching placement and power allocation in virtualized satellite-terrestrial network. IEEE Trans. Wirel. Commun. 2023, 22, 7932–7943.
16. Xv, H.; Sun, Y.; Zhao, Y.; Peng, M.; Zhang, S. Joint beam scheduling and beamforming design for cooperative positioning in multi-beam LEO satellite networks. IEEE Trans. Veh. Technol. 2023, 73, 5276–5287.
17. Wang, Q.; Xu, X.; Fan, C. Distributed Resource Management for Multi-node Aggregated Satellite Edge Computing in Satellite-Terrestrial Integrated Internet of Vehicles. In Proceedings of the 2022 IEEE International Conference on Satellite Computing (Satellite), Shenzhen, China, 25–27 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 54–55.
18. Linnabary, R.B.; O’Brien, A.J.; Smith, G.E.; Ball, C.; Johnson, J.T. Using cognitive communications to increase the operational value of collaborative networks of satellites. In Proceedings of the 2019 IEEE Cognitive Communications for Aerospace Applications Workshop (CCAAW), Cleveland, OH, USA, 25–26 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
19. Zhao, R.; Ran, Y.; Luo, J.; Chen, S. Towards coverage-aware cooperative video caching in LEO satellite networks. In Proceedings of the GLOBECOM 2022–2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1893–1898.
20. Zhang, J.; Yang, Y.; Sang, H.; Gao, Z.; Song, T. Content-Aware Proportional Caching for Efficient Data Delivery over Satellite Network. In Proceedings of the GLOBECOM 2023–2023 IEEE Global Communications Conference, Kuala Lumpur, Malaysia, 4–8 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 4890–4895.
21. Zhang, T.; Wang, S.; Chen, S.; Zhang, X.; Wang, X.; He, F. Inter-satellite Cache Push Scheme Based on ICN for Low Orbit Earth Satellite Network. In Proceedings of the 2023 11th International Conference on Information Systems and Computing Technology (ISCTech), Qingdao, China, 30 July–1 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 290–295.
22. Liu, Z.; Li, Y.; Zhu, J.; Yao, Q.; Ren, X. User-Driven Cache Replacement Strategy for Satellite-Terrestrial Networks Based on SDN. In Proceedings of the 2020 IEEE 6th International Conference on Computer and Communications (ICCC), Chengdu, China, 11–14 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 680–688.
23. Wu, H.; Li, J.; Lu, H.; Hong, P. A two-layer caching model for content delivery services in satellite-terrestrial networks. In Proceedings of the 2016 IEEE Global Communications Conference (GLOBECOM), Washington, DC, USA, 4–8 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
24. Rodríguez-Pérez, M.; Herrería-Alonso, S.; Suárez-Gonzalez, A.; López-Ardao, J.C.; Rodríguez-Rubio, R. Cache Placement in an NDN Based LEO Satellite Network Constellation. IEEE Trans. Aerosp. Electron. Syst. 2022, 59, 3579–3587.
25. Bommaraveni, S.; Vu, T.X.; Chatzinotas, S.; Ottersten, B. Active Popularity Learning with Cache Hit Ratio Guarantees using a Matrix Completion Committee. In Proceedings of the 2020 IEEE 31st Annual International Symposium on Personal, Indoor and Mobile Radio Communications, London, UK, 31 August–3 September 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5.
26. Xu, Y.; Han, Y. ALIRS: A High Scalability and High Cache Hit Ratio Replacement Algorithm. In Proceedings of the 2011 International Conference on Computational and Information Sciences, Chengdu, China, 21–23 October 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 66–70.
27. Chen, X.; He, L.; Xu, S.; Hu, S.; Li, Q.; Liu, G. Hit ratio driven mobile edge caching scheme for video on demand services. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1702–1707.
28. Zhong, G.; Yan, J.; Kuang, L. QoE-driven social aware caching placement for terrestrial-satellite networks. China Commun. 2018, 15, 60–72.
29. Han, D.; Peng, H.; Wu, H.; Liao, W.; Shen, X.S. Joint cache placement and content delivery in satellite-terrestrial integrated C-RANs. In Proceedings of the ICC 2021-IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
30. Zhao, R.; Luo, J.; Ran, Y. Coverage-Aware Cooperative Caching and Efficient Content Distribution Schemes in LEO Satellite Networks. In Proceedings of the 1st ACM MobiCom Workshop on Satellite Networking and Computing (SatCom ’23), 6 October 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 31–36.
31. Qiu, C.; Yao, H.; Yu, F.R.; Xu, F.; Zhao, C. Deep Q-learning aided networking, caching, and computing resources allocation in software-defined satellite-terrestrial networks. IEEE Trans. Veh. Technol. 2019, 68, 5871–5883.
32. Wang, R.; Kishk, M.A.; Alouini, M.S. Evaluating the Accuracy of Stochastic Geometry Based Models for LEO Satellite Networks Analysis. IEEE Commun. Lett. 2022, 26, 2440–2444.
33. Wu, Q.; Zhao, Y.; Fan, Q.; Fan, P.; Wang, J.; Zhang, C. Mobility-aware cooperative caching in vehicular edge computing based on asynchronous federated and deep reinforcement learning. IEEE J. Sel. Top. Signal Process. 2022, 17, 66–81.
34. Zhou, J.; Sun, Z.; Zhang, R.; Lin, G.; Zhang, S.; Zhao, Y. A cloud-edge collaboration CNN-based routing method for ISAC in LEO satellite networks. In Proceedings of the 2nd Workshop on Integrated Sensing and Communications for Metaverse, Helsinki, Finland, 18 June 2023; pp. 25–29.

Figure 1. Network model.

Figure 2. Cache structure.

Figure 3. Process of DQN algorithm.

Figure 4. Flowcharts.

Figure 5. Cache hit ratio and content-transmission delay per training episode.

Figure 6. Cache hit ratio with different cache capacities.

Figure 7. Content transmission delay with different cache capacities.

Figure 8. Content transmission delay compared to CAFR.

Table 1. Variable description.

Parameter	Value
LEO satellite bandwidth	500 MHz
LEO satellite altitude	1000 km
LEO satellite antenna gain	40 dBi
UE (user equipment) antenna gain	30 dBi
Number of UEs	50–400
Number of satellites	66
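The bandwidth and altitude in Table 1 can be turned into a rough per-request delay estimate. The sketch below is a textbook first-order calculation, not the paper's communication model: it assumes a spherical Earth of radius 6371 km, free-space propagation at the speed of light, and a Shannon-capacity link over the full 500 MHz, with a hypothetical SNR of 10 dB and a hypothetical 1 MB content item, none of which are taken from the paper. The antenna gains are left out because the carrier frequency needed for a path-loss budget is not listed in the table.

```python
import math

# Assumptions (not from Table 1): spherical Earth, free-space propagation at c.
EARTH_RADIUS_KM = 6371.0
SPEED_OF_LIGHT_KM_S = 299_792.458


def slant_range_km(altitude_km: float, elevation_deg: float) -> float:
    """User-to-satellite distance at a given elevation angle (spherical Earth)."""
    r = EARTH_RADIUS_KM
    theta = math.radians(elevation_deg)
    return math.sqrt((r + altitude_km) ** 2 - (r * math.cos(theta)) ** 2) - r * math.sin(theta)


def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay over a free-space path, in milliseconds."""
    return distance_km / SPEED_OF_LIGHT_KM_S * 1e3


def shannon_rate_bps(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon capacity (upper bound) of an AWGN link, in bit/s."""
    return bandwidth_hz * math.log2(1 + 10 ** (snr_db / 10))


def transmission_delay_ms(size_bits: float, rate_bps: float) -> float:
    """Time to push a content item of size_bits onto a link of rate_bps, in ms."""
    return size_bits / rate_bps * 1e3


if __name__ == "__main__":
    bandwidth_hz = 500e6   # Table 1: LEO satellite bandwidth
    altitude_km = 1000.0   # Table 1: LEO satellite altitude
    snr_db = 10.0          # hypothetical link SNR
    content_bits = 8e6     # hypothetical 1 MB content item
    rate = shannon_rate_bps(bandwidth_hz, snr_db)
    for elevation in (90.0, 30.0):
        d = slant_range_km(altitude_km, elevation)
        total = propagation_delay_ms(d) + transmission_delay_ms(content_bits, rate)
        print(f"elevation {elevation:.0f} deg: range {d:.0f} km, delay {total:.2f} ms")
```

At 90° elevation the slant range equals the 1000 km altitude, giving a one-way propagation delay of about 3.3 ms; at 30° elevation the path grows to roughly 1700 km, so the propagation term grows proportionally while the transmission term is unchanged.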

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Luan, Y.; Sun, F.; Zhou, J. A Service-Caching Strategy Assisted by Double DQN in LEO Satellite Networks. Sensors 2024, 24, 3370. https://doi.org/10.3390/s24113370
