LFDC: Low-Energy Federated Deep Reinforcement Learning for Caching Mechanism in Cloud–Edge Collaborative

Zhang, Xinyu; Hu, Zhigang; Zheng, Meiguang; Liang, Yang; Xiao, Hui; Zheng, Hao; Xu, Aikun

doi:10.3390/app13106115


by Xinyu Zhang ¹, Zhigang Hu ¹,*, Meiguang Zheng ¹, Yang Liang ¹,², Hui Xiao ¹, Hao Zheng ¹ and Aikun Xu ¹

¹ School of Computer Science, Central South University, Changsha 410083, China

² School of Informatics, Hunan University of Chinese Medicine, Changsha 410083, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(10), 6115; https://doi.org/10.3390/app13106115

Submission received: 5 April 2023 / Revised: 16 April 2023 / Accepted: 17 April 2023 / Published: 16 May 2023

(This article belongs to the Special Issue Edge and Cloud Computing Systems and Applications)


Abstract:

The optimization of caching mechanisms has long been a crucial research focus in cloud–edge collaborative environments. Effective caching strategies can substantially enhance the quality of user experience in these settings. Deep reinforcement learning (DRL), with its ability to perceive the environment and develop intelligent policies online, has been widely employed for designing caching strategies. Recently, federated learning combined with DRL has been gaining popularity for optimizing caching strategies and protecting the privacy of training data from eavesdropping attacks. However, online federated deep reinforcement learning algorithms face high environmental dynamics, and real-time training can increase training energy consumption even as it improves caching efficiency. To address this issue, we propose a low-energy federated deep reinforcement learning strategy for caching mechanisms (LFDC) that balances caching efficiency and training energy consumption. The LFDC strategy encompasses a novel energy efficiency model, a deep reinforcement learning mechanism, and a dynamic energy-saving federated policy. Our experimental results demonstrate that the proposed LFDC strategy significantly outperforms existing benchmarks in terms of energy efficiency.

Keywords:

cloud–edge collaborative environments; caching strategies; deep reinforcement learning (DRL); low-energy federated deep reinforcement learning strategy for caching mechanisms (LFDC); energy efficiency

1. Introduction

In recent years, with the rapid development of cloud computing and the Internet of Things (IoT), the combination of cloud and edge has become a research hotspot. Traditional cloud computing offers strong resource service capability but suffers from long transmission distances [1], while edge computing offers low transmission latency but has limited resources [2]. Cloud–edge integration refers to a computing mode that combines cloud-computing resources with edge-computing resources. The goal is to make more efficient use of edge-computing resources while fully leveraging the advantages of cloud computing [3,4].

Caching plays a pivotal role in reducing bandwidth traffic and the number of remote server accesses, thereby enhancing the system response time and reliability in the collaborative environment of cloud–edge computing [5]. However, caching in cloud–edge environments poses several challenges. Firstly, limited hardware resources impose constraints on cache capacity [6,7]. Secondly, restricted network bandwidth may result in prolonged response times for the cache when cache hit rates are low [8]. Cloud–edge caching is typically divided into cloud cache and edge cache, and a caching policy is employed to ensure that popular content is cached as much as possible at the edge. However, existing cloud–edge caching policies face two major problems. Firstly, due to the heterogeneous resources and communication links of the edge base station (BS), such caching strategies can become exceedingly complex. Secondly, the dynamics of user demands lead to real-time changes in content popularity, necessitating time-varying cache content. This requires caching policies capable of adapting in real time to the dynamic caching environment. Therefore, effectively managing caching in real time to improve cache hit rates and reduce response time remains a persistent and crucial challenge.

DRL, an AI model that can dynamically generate policies based on the environment, has seen successful applications in various cloud–edge collaborative environments, including wireless access network mode selection and resource management [9], autonomous driving strategy [10], and computation offloading optimization [11]. In the context of cache optimization, DRL methods have been utilized to improve cache efficiency by predicting content popularity [12] and dynamically adjusting cache policies [13,14,15]. Despite these achievements, using DRL-based caching strategies in cloud–edge environments still poses two significant challenges: (1) limited hardware resources, which leads to non-negligible energy consumption during DRL training, and (2) the need to jointly schedule cloud and edge storage resources for caching, which raises security and privacy concerns. Federated learning can be conducted at local BSs for learning and inference, and models can be aggregated and distributed in the cloud. Therefore, under federated learning, each BS can take advantage of its local characteristics while also learning from the experiences of other BSs, allowing it to adjust caching policies in real-time while maintaining data privacy. Building on the work of [16] that proposed dynamically adjusting the number of local iterations depending on resource utilization in federated learning, we propose a new federated deep reinforcement learning framework that can effectively reduce energy consumption while ensuring cache performance through the sharing and collaboration of models among multiple nodes. The federated nature of the learning framework maximizes data security and prevents data leaks. This paper’s main contributions include the following:

(1): Creating a system model for the cache environment that combines cloud and edge, including network, cache performance, DRL, and energy-consumption models, and proposing a new objective function of the cache energy efficiency ratio to balance performance and energy consumption.
(2): Designing a new federated deep reinforcement learning method that dynamically adjusts cloud aggregation and edge training or decision making to reduce energy consumption while ensuring cache efficiency.
(3): Presenting a series of simulation experiments that demonstrate the effectiveness of the proposed strategies in reducing the training energy consumption of the system while maintaining cache performance.

This article is structured as follows. In Section 2, we conduct a comprehensive review of related literature. Section 3 outlines the system’s model and problem definition. Section 4 presents our proposed federated mechanism for achieving energy-efficient caching. The effectiveness of our proposed strategies is validated through simulation experiments in Section 5. Finally, Section 6 concludes the paper by summarizing our contributions.

2. Related Works

Research on the cache-enabled cloud–edge collaborative architecture can be broadly categorized into two groups.

The first group employs traditional methods based on convex optimization or probabilistic modeling to solve the content cache placement problem in cloud–edge environments. This category includes classic algorithms, such as least recently used (LRU) [17], first in first out (FIFO) [18], and least frequently used (LFU) [19]. However, these methods disregard the popularity trend of the content, which can result in significant performance degradation [12]. K. Poularakis et al. [20] optimize the hit rate by jointly optimizing content placement and user association. Poularakis et al. [21] propose a greedy service placement algorithm that maximizes the number of service requests by considering collaborative service placement, but it is not able to account for random requests in a dynamic environment. Similarly, Yang et al. [22] address the joint service caching and load scheduling problem to minimize the average request delay. Zhang et al. [23] employ probabilistic caching in heterogeneous networks to enhance traffic offloading and optimize edge device cache resources. Since cloud–edge collaboration is constantly evolving, the content cache problem in cloud–edge environments requires continuous optimization. Nevertheless, these strategies are often challenging to adapt to the dynamic cloud–edge environment.

The second category of research pertains to learning-based algorithms, such as machine learning, deep learning, and federated learning [24]. This approach involves learning key attribute features in the cloud–edge environment, including user request behavior, content popularity, and user mobility distribution, in order to optimize content cache strategies. Reinforcement learning [25] and deep reinforcement learning [15] exhibit significant potential in designing content cache strategies. Specifically, Rathore et al. [26] propose a proactive caching framework based on deep learning that achieves significant improvements in feedback link and QoE. Wan et al. [27] employ an improved DRL method to optimize caching and thereby enhance the quality of service of video streaming. To mitigate the excessive network resource consumption of deep reinforcement learning during the transmission of training and testing data, Wang et al. [13] propose a caching framework based on federated deep reinforcement learning. Li et al. [12] present the CVC video caching framework based on federated deep reinforcement learning to reduce communication costs.

While machine-learning-based methods can achieve higher caching efficiency by adapting to the caching environment, they themselves consume significant amounts of energy. Reducing training energy consumption offers three benefits: (1) enhancing system efficiency, alleviating the energy burden on base stations, and lowering operational costs; (2) improving device reliability and stability, and reducing maintenance and replacement costs; and (3) reducing the system’s carbon footprint, supporting environmentally responsible edge computing. To address this issue, we propose a novel federated deep reinforcement learning mechanism that incorporates energy-saving properties to reduce the energy consumption associated with machine learning.

3. System Model and Problem Formulation

This section models the network topology, cache hit rate, deep reinforcement learning and its energy consumption model, respectively, and presents a cache energy efficiency ratio problem. Some of the key parameters are shown in Table 1.

3.1. Network Model

As shown in Figure 1, in our network architecture, all data are sourced from cloud servers. The core network communicates with the cloud through a backhaul network and with edge base stations (BSs) through frontier networks. Each BS is equipped with an access point that provides wireless connectivity to users. When the requested content is available in the cache of a BS within its service range, it is referred to as a cache hit.

$N = \{1, \dots, n, \dots, N\}$ represents the set of BSs, each with a cache capacity of $C_{n}$ for storing content. $U = \{1, \dots, u, \dots, U\}$ represents the set of user equipment (UE), randomly distributed within the service range of the BSs. $F = \{1, \dots, f, \dots, F\}$ represents the set of content items, each with its request frequency $r_{u, f}$. $T = \{1, \dots, t, \dots, T\}$ represents the epochs of aggregation/distribution for federated learning.

For simplicity, we assume that every content item has the same size of M bits. This assumption has been widely adopted in previous studies (e.g., [28,29,30]) because large contents are often divided into roughly equal-sized chunks. BSs can communicate directly with one another to retrieve requested content from neighboring BSs. Since retrieving content from a neighboring BS is faster and less costly than retrieving it from the cloud servers, the cloud is used only as a last resort. A user request is therefore served in one of three ways: a local hit, retrieval from a neighboring BS, or retrieval from the cloud. When a request arrives at a local BS and the requested content is cached locally, the local BS immediately returns the cached content. If the local BS misses, it retrieves the content from a neighboring BS and returns it to the user. If the neighboring BS also misses, the content is retrieved from the cloud to serve the request.
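The lookup order described in this subsection can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `resolve_request` and the use of plain sets to stand for BS caches are assumptions made here.

```python
def resolve_request(content_id, local_cache, neighbor_caches):
    """Return the tier that serves a request: 'local', 'neighbor', or 'cloud'.

    local_cache and each entry of neighbor_caches are sets of content ids
    (a hypothetical representation chosen for this sketch).
    """
    # 1) Local hit: the local BS returns the cached content immediately.
    if content_id in local_cache:
        return "local"
    # 2) Local miss: try the neighboring BSs before going to the cloud.
    for cache in neighbor_caches:
        if content_id in cache:
            return "neighbor"
    # 3) Every edge cache missed: the cloud is the last resort.
    return "cloud"


if __name__ == "__main__":
    print(resolve_request(7, {1, 2}, [{3}, {7}]))
```

The ordering encodes the cost assumption stated in the text: a neighboring BS is cheaper than the cloud, so the cloud is consulted only after all edge caches miss.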

3.2. Cache Performance Model

The caching performance is primarily determined by the quality of user service, namely, the latency in accessing content. To this end, it is necessary to establish an overall delay model for cloud–edge caching and to serve as many user requests as possible from the local BS, which yields the minimum content transfer delay. Similar to [31], content is transmitted between the BS and the UE over a wireless link whose rate is given by the Shannon formula:

v_{u, n} = B \log \left(1 + \frac{q_{u} g_{u, n}}{ϖ + \sum_{i \in U \setminus \{u\} : a_{i} = a_{u}} q_{i} g_{i, n}}\right)

(1)

where B represents the channel bandwidth, $ϖ$ represents the background noise, $q_{u}$ represents the transmission power between BS_n and UE u, and $g_{u, n}$ represents the channel gain. $a_{i} = a_{u}$ represents that UE u has the content it needs.
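As a worked illustration of Equation (1), the sketch below computes the achievable rate for one UE. The function name, the argument names, and the choice of a base-2 logarithm (the source writes only "log") are assumptions made here; interfering UEs are passed in as (power, gain) pairs.

```python
import math


def shannon_rate(bandwidth, tx_power, gain, noise, interferers=()):
    """Rate v_{u,n} = B * log2(1 + q_u g_{u,n} / (noise + sum_i q_i g_{i,n})).

    interferers: iterable of (q_i, g_{i,n}) pairs for the other UEs that
    contribute to the sum in the denominator (assumed input format).
    """
    interference = sum(q * g for q, g in interferers)
    sinr = tx_power * gain / (noise + interference)
    return bandwidth * math.log2(1.0 + sinr)


if __name__ == "__main__":
    # Illustrative values only: 1 MHz bandwidth, no interference.
    print(shannon_rate(1e6, tx_power=0.5, gain=1e-3, noise=1e-6))
```

Adding interferers can only lower the rate, which is why the delay in the following equations grows when more UEs share the same link.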

Caching content raises the issue of content popularity. High content popularity indicates that users request the content more frequently, while low content popularity indicates that users request it less frequently. Content popularity therefore directly affects cache performance, and edge BSs should cache more popular content. In this article, content popularity is defined as a measure of user preference for content f. Similar to [13], $d_{U - B}$ is defined as the transmission delay between the user and the BS:

d_{U - B} = \sum_{u \in U} \sum_{f \in F} P_{u, f} \frac{D_{f}}{v_{u, n}}

(2)

where $D_{f}$ represents the size of content f, and $P_{u, f}$ is the preference of UE u for content f:

P_{u, f} = r_{u, f} / ℜ_{u}

(3)

\begin{array}{l} \text{subject to:} \\ u \in U, \; f \in F \\ \sum_{f \in F} P_{u, f} = 1 \end{array}

Here, $ℜ_{u}$ represents the total number of requests made by UE u, while $r_{u, f}$ represents the number of requests made by UE u for content f. Similarly, we can obtain the transmission delay between the cloud and the edge:

d_{C - B} = \sum_{u \in U} \sum_{f \in F} P_{u, f} \frac{D_{f}}{v_{a}}

(4)

where $v_{a}$ denotes the transmission rate from the cloud to the edge BS. Likewise, with $v_{b}$ as the transmission rate between BSs, we obtain the transmission delay between BSs:

d_{B - B} = \sum_{u \in U} \sum_{f \in F} P_{u, f} \frac{D_{f}}{v_{b}}

(5)
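Equations (2) to (5) share one pattern: a preference-weighted sum of size over rate. The sketch below shows that pattern for a single UE; the function names and the dictionary-based inputs are assumptions made for illustration, and summing the result over all UEs would give the full double sum.

```python
def preferences(request_counts):
    """P_{u,f} = r_{u,f} / R_u for one UE, given {content: request count}."""
    total = sum(request_counts.values())
    return {f: r / total for f, r in request_counts.items()}


def weighted_delay(prefs, sizes, rate):
    """Sum over f of P_{u,f} * D_f / v for one UE and one link rate v."""
    return sum(p * sizes[f] / rate for f, p in prefs.items())


if __name__ == "__main__":
    prefs = preferences({"a": 3, "b": 1})
    sizes = {"a": 8.0, "b": 8.0}
    # The same helper covers the UE-BS, BS-BS, and cloud-BS links;
    # only the rate (v_{u,n}, v_b, or v_a) changes.
    print(weighted_delay(prefs, sizes, rate=4.0))
```

By construction the preferences of one UE sum to 1, which matches the constraint attached to Equation (3).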

The cache hit rate can directly evaluate the effectiveness of caching policies and reflect the performance of caching systems. For cloud–edge caching systems, the cache hit efficiency is another way of expressing cache efficiency, which is more representative of the efficiency of caching in real scenarios because it is closely related to the transfer latency in various cases [15]. As the edge has a faster response time than the cloud, caching at the edge for a requested content is considered a cache hit. Taking into account the issue of transmission delay, in the t-th round of federation, the cache hit efficiency for all contents within a single BS can be represented as

H_{t, n} = \sum_{u \in U} \sum_{f \in F} \frac{d_{U - B}}{d_{U - B} + x_{f, B} d_{B - B} + x_{f, C} d_{C - B}}

(6)

where $x_{f, B}$ and $x_{f, C}$ are decision variables representing whether the requested content f comes from a neighboring BS ($x_{f, B} = 1$ if yes, 0 otherwise) or from the cloud ($x_{f, C} = 1$ if yes, 0 otherwise). Therefore, the cache hit efficiency of the system in the t-th epoch of federation is

H_{t, \mathrm{system}} = \sum_{n \in N} H_{t, n}

(7)

\begin{array}{l} \text{subject to:} \\ f \in F \\ n \in N \\ u \in U \\ x_{f, B}, x_{f, C} \in \{0, 1\} \\ \sum_{f \in F} P_{u, f} = 1 \end{array}
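To make the ratio in Equation (6) concrete, the following sketch evaluates one summand per request and adds them up per BS and per system. Treating the delays as per-request values and the tuple layout of a request are assumptions made here for illustration.

```python
def hit_term(d_ub, d_bb, d_cb, x_b, x_c):
    """One summand of Eq. (6): d_UB / (d_UB + x_B * d_BB + x_C * d_CB)."""
    return d_ub / (d_ub + x_b * d_bb + x_c * d_cb)


def bs_hit_efficiency(requests):
    """H_{t,n}: sum of the summands over the requests seen by one BS.

    requests: iterable of (d_ub, d_bb, d_cb, x_b, x_c) tuples (assumed layout).
    """
    return sum(hit_term(*req) for req in requests)


def system_hit_efficiency(per_bs_requests):
    """H_{t,system}: Eq. (7), the sum of H_{t,n} over all BSs."""
    return sum(bs_hit_efficiency(reqs) for reqs in per_bs_requests)


if __name__ == "__main__":
    local = (1.0, 1.0, 3.0, 0, 0)      # served locally
    neighbor = (1.0, 1.0, 3.0, 1, 0)   # served by a neighboring BS
    cloud = (1.0, 1.0, 3.0, 0, 1)      # served by the cloud
    print(bs_hit_efficiency([local, neighbor, cloud]))
```

A local hit contributes exactly 1, and every additional hop shrinks the contribution, so the metric rewards serving content as close to the user as possible.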

Based on the paper [30], it is known that maximizing the cache hit efficiency, i.e., minimizing the average delay of the system, is an NP-complete problem. We prove this in the next section. Moreover, achieving satisfactory results through heuristic algorithms such as GREEDYAD is difficult [31]. The difficulty in obtaining effective strategies with heuristic algorithms for edge caching can be attributed to their inability to adequately address dynamic environments, heterogeneous resources, scalability, and collaboration requirements. Therefore, we introduce an approach to optimize the caching efficiency through DRL methods.

3.3. NP-Completeness Proof

Theorem 1.

The cloud–edge caching problem is an NP-complete problem.

Proof of Theorem 1.

If the set cover (SC) problem, which is known to be NP-complete, can be reduced to the cache placement problem, then the decision problem for cache placement is also NP-complete. Firstly, we define two sets: the first is $G = \{g_{1}, g_{2}, \dots, g_{I}\}$, and the second is $H = \{h_{1}, h_{2}, \dots, h_{J}\}$, a collection of subsets of G in which each element $h_{j}$ has a weight. The objective of the SC problem is to select a subcollection of H such that the union of all its elements equals the set G and the sum of the weights is either minimized or maximized.

The steps to reduce the SC problem to the cache placement problem are as follows. Firstly, set the number of contents to 1. Let $G = \{g_{1}, g_{2}, \dots, g_{I}\}$ represent all BSs, and let $H = \{h_{1}, h_{2}, \dots, h_{J}\}$ represent the different ways of caching the content on the BSs. For instance, $h_{1} = \{1, 3, 5\}$ denotes caching the content on $g_{1}$, $g_{3}$, and $g_{5}$, while $h_{2} = \{2, 4, 6\}$ denotes caching the content on $g_{2}$, $g_{4}$, and $g_{6}$. Then, let the cache value and the link cost serve as the weights of each subset.

Therefore, our problem can be formulated as an SC problem as follows: we have a set $V = \{v_{1}, v_{2}, \dots, v_{N}\}$ of all edge servers, and the set $C = \{c_{1}, c_{2}, \dots, c_{K}\}$ contains all possible caching decisions for a piece of content on all BSs. Each caching decision $c_{k}$ has a cache hit efficiency, which is the content caching revenue obtained with $c_{k}$. The objective of our problem is to find a subset $C^{'} = \{c_{1}, c_{2}, \dots, c_{K^{'}}\}$ of C such that the union of all elements in $C^{'}$ equals V and the sum of all the content caching revenues is maximized. □

The set cover problem can be reduced to the content caching placement problem through the transformation described above. The practical variables of the caching problem correspond to the general variables of the set cover problem. In essence, our problem takes into account the same factors as the set cover problem and can be considered a variation of it. Consequently, the data caching problem is classified as an NP-complete problem.

3.4. DRL Energy Consumption Model

In the context of cloud computing, the significant performance advantages of cloud-based operations outweigh the energy consumed in transmitting and aggregating model parameters. Accordingly, this study concentrates solely on the energy expenditure involved in training at the edge, encompassing the local training energy consumption and the energy consumption associated with the transmission of model parameters at the edge. As demonstrated in [32], the computational burden is approximately linear in the data volume. Specifically, assume that a base station (BS) needs $c_{n}$ CPU cycles per bit to train on a data sample. Taking into account the training data volume $D_{t, n}$, the local training energy consumption of $BS_{n}$ during the t-th training epoch is expressed in accordance with [33,34]:

E_{t, n}^{cmp} = L_{n} ζ_{n} c_{n} D_{t, n} \mathit{fre}_{n}^{2}

(8)

\begin{array}{l} \text{subject to:} \\ L_{n} \geq 1 \\ n \in N_{1} \\ N_{1} \subset N \end{array}

The effective capacitance coefficient of the computing chipset of BS_n is denoted as $ζ_{n}$, while $L_{n}$ represents the number of local training iterations conducted by BS_n. To compute the local model, BS_n is required to perform $c_{n} D_{t, n}$ CPU cycles in each local iteration. Based on Equation (8), it is apparent that the energy expenditure associated with local training can be regulated by adjusting the number of local training iterations $L_{n}$. Assuming that $P_{n}$ signifies the transmission power of $BS_{n}$, the energy consumption involved in uploading the model can be derived as follows:

E_{t, n}^{up} = P_{n} \frac{℘_{t, n}}{v_{a}}

(9)

\begin{array}{l} \text{subject to:} \\ n \in N_{3} \\ N_{3} \subset N \end{array}

Here, $℘_{t, n}$ represents the model size in $BS_{n}$. The total energy consumption of the system in the t-th epoch is

E_{t, \mathrm{system}} = \sum_{n \in N_{1}} E_{t, n}^{cmp} + \sum_{n \in N_{3}} E_{t, n}^{up}

(10)

\begin{array}{l} \text{subject to:} \\ N_{1} \subset N \\ N_{3} \subset N \end{array}

In this context, $N_{1}$ denotes the group of BSs that require training during the t-th epoch, while $N_{3}$ signifies the set of BSs that require model aggregation during the same epoch. $N_{1}$ and $N_{3}$ are selected using a dynamic and energy-efficient federated approach, which is elaborated in Section 4.
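The energy model of Equations (8) to (10) can be checked numerically with the short sketch below. The function and parameter names are assumptions made here, and reading $fre_{n}$ as the CPU frequency of BS_n is an interpretation rather than something the text states explicitly.

```python
def computation_energy(local_iters, capacitance, cycles_per_bit, data_bits, cpu_freq):
    """Eq. (8): E_cmp = L_n * zeta_n * c_n * D_{t,n} * fre_n^2.

    cpu_freq stands for fre_n (assumed here to be the CPU frequency).
    """
    return local_iters * capacitance * cycles_per_bit * data_bits * cpu_freq ** 2


def upload_energy(tx_power, model_bits, uplink_rate):
    """Eq. (9): E_up = P_n * model_size / v_a."""
    return tx_power * model_bits / uplink_rate


def system_energy(training_bs, uploading_bs):
    """Eq. (10): training energy over N_1 plus upload energy over N_3.

    Each argument is a list of keyword dictionaries, one per BS (assumed layout).
    """
    train = sum(computation_energy(**bs) for bs in training_bs)
    upload = sum(upload_energy(**bs) for bs in uploading_bs)
    return train + upload


if __name__ == "__main__":
    n1 = [dict(local_iters=2, capacitance=0.5, cycles_per_bit=10,
               data_bits=100, cpu_freq=2.0)]
    n3 = [dict(tx_power=2.0, model_bits=100, uplink_rate=50)]
    print(system_energy(n1, n3))
```

Because the training term is linear in the number of local iterations, halving $L_{n}$ for a BS halves its computation energy, which is the lever the federated mechanism in Section 4 relies on.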

3.5. Problem Formulation

This section is devoted to the investigation of optimizing cache hit efficiency and base station (BS) selection for online training with the aim of maximizing the ratio of cache efficiency to training energy consumption. The total number of federated aggregation epochs is represented by T. To improve the cache performance while minimizing training energy consumption, this study proposes a hit efficiency to energy (HE) gain metric, defined as the ratio of the hit efficiency to the energy consumption of the selected BSs. The objective is formulated as follows:

\mathrm{P1}: \max \sum_{t \in T} \frac{H_{t, \mathrm{system}}}{E_{t, \mathrm{system}}}

(11)

\begin{array}{l} \text{subject to:} \\ N_{1} \subset N \\ N_{3} \subset N \\ L_{n} \geq 1 \\ f \in F \\ n \in N \\ u \in U \\ x_{f, B}, x_{f, C} \in \{0, 1\} \\ \sum_{f \in F} P_{u, f} = 1 \end{array}

The application potential of energy efficiency ratios, such as the one proposed herein, is considerable. Specifically, the aim of attaining reliable training efficiency while simultaneously reducing energy consumption in the context of online training can be expanded to a broader range of scenarios.
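As a small numerical illustration of objective (11), the sketch below accumulates the per-epoch ratio of hit efficiency to energy. The function name `he_gain` is hypothetical, and the sketch assumes every epoch has strictly positive energy consumption.

```python
def he_gain(hit_efficiencies, energies):
    """Objective (11): sum over epochs t of H_{t,system} / E_{t,system}.

    Assumes both sequences have one entry per epoch and all energies > 0.
    """
    return sum(h / e for h, e in zip(hit_efficiencies, energies))


if __name__ == "__main__":
    # Same hit efficiency, but the second schedule spends half the energy.
    print(he_gain([2.0, 3.0], [4.0, 6.0]))
    print(he_gain([2.0, 3.0], [2.0, 3.0]))
```

The comparison shows why the metric favors energy savings: keeping hit efficiency fixed while halving the energy doubles the objective.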

4. Low-Energy Federated Deep Reinforcement Learning for Caching

This section first designs the local iterative DRLs, followed by a systematic planning of the approach for optimizing federated learning.

4.1. Local DRL Model Design

When a request for content is received by a local base station (BS) from user equipment (UE), the cloud–edge system conducts a search for the content and subsequently delivers it to the UE. The content may be stored in the local BS, neighboring BSs, or the cloud. The objective is to cache the content proactively prior to user requests in order to ensure high cache hit efficiency. To achieve this goal, the proactive caching policy is updated in each epoch of global federation, based on the local deep reinforcement learning (DRL) training performance of the BS. Therefore, it is essential to design the training process of the local DRL. Generally, the design of DRL involves three steps: (1) designing the system state, action, and reward; (2) defining the caching policy based on the system state, action, and reward; and (3) continuously optimizing the caching policy through online iteration.

DRL distinguishes itself from reinforcement learning (RL) by utilizing deep network stacking, which enhances the generalization capability of the trained neural network as the sample data grow larger. The first step in DRL is to define the state $s_{i}$ of the environment. An action a is then selected based on $s_{i}$ and the action probability, after which the environment moves to a new state $s_{i + 1}$ and returns a reward R. In the context of the cloud–edge collaborative caching environment, the cache state is defined for content f at each decision i. We model the content replacement process in the cache of a BS as a Markov decision process (MDP) [35]. The states, system actions, and rewards of the cache and request are defined below:

s_{i} = (s_{i, u}^{r}, s_{i, n}^{c})

(12)

Let $s_{i, u}^{r} := s_{i, u, f}^{r}, u \in U, f \in F$ and $s_{i, n}^{c} := s_{i, n, f}^{c}, n \in N, f \in F$ denote the content request state and content cache state, respectively. $s_{i, u, f}^{r} = 1$ and $s_{i, n, f}^{c} = 1$ indicate that UE u requests content f and that content f has been cached in BS_n, respectively. The state at each iteration i has F elements, and the sequence of states over the epochs describes the cache state over a period of time. This also indicates the cache state required for one iteration during online training.

Since the UE can obtain the requested content from the local BS, neighboring BSs, or the cloud, there are three types of actions for the agent in a decision epoch: $a_{i}^{local}$, replacing content in the local edge node; $a_{i}^{BS-BS}$, neighboring edge nodes collaborating on the content request; and $a_{i}^{cloud}$, processing the content request in the cloud. The action policy under state $s_{i}$ is then defined as

ϕ (s_{i}) = \{a_{i}^{local}, a_{i}^{BS-BS}, a_{i}^{cloud}\}

(13)

where $a_{i}^{local} = \{a_{i, 1}^{local}, a_{i, 2}^{local}, \dots, a_{i, F}^{local}\}$ denotes the local processing action, with $a_{i, f}^{local} \in \{0, 1\}$, $f \in F$; $a_{i, f}^{local} = 1$ indicates that content f needs to be replaced by the requested content in iteration i, and $a_{i, f}^{local} = 0$ otherwise. If the requested content is not cached in the local BS, it needs to be processed in a neighboring BS. $a_{i}^{BS-BS} = \{a_{i, 1}^{BS-BS}, a_{i, 2}^{BS-BS}, \dots, a_{i, N}^{BS-BS}\}$ denotes the processing action of the neighboring BSs, with $a_{i, n}^{BS-BS} \in \{0, 1\}$; $a_{i, n}^{BS-BS} = 1$ indicates that the requested content is provided by BS_n. If the UE is unable to get its requested content from either the local or the neighboring BSs, the local BS has to decide whether to forward the request to the cloud for processing. Let $a_{i}^{cloud} \in \{0, 1\}$, where $a_{i}^{cloud} = 1$ means that the request is handed over to the cloud for processing.

After an action is performed under state $s_{i}$, a reward is obtained. From formula (7), it can be seen that the reward of the cache should maximize the cache hit efficiency. Therefore, the cache reward $R_{i, n} (s_{i}, ϕ (s_{i}))$ in iteration i is represented as follows:

R_{i, n} (s_{i}, ϕ (s_{i})) = H_{i, n}

(14)

When making caching decisions, it is necessary to ensure that each BS does not cache more files than its capacity. The capacity limits are defined as follows:

\sum_{f \in F} s_{i, n, f}^{c} C_{f} \leq C_{n}

(15)
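The capacity constraint (15) amounts to a simple feasibility check on the binary cache state of one BS. The sketch below is illustrative only; the function name and the list-based encoding of $s_{i, n, f}^{c}$ and $C_{f}$ are assumptions made here.

```python
def within_capacity(cache_state, content_sizes, capacity):
    """Check Eq. (15): sum_f s^c_{i,n,f} * C_f <= C_n for one BS.

    cache_state: list of 0/1 flags, one per content item.
    content_sizes: list of sizes C_f in the same order.
    """
    used = sum(flag * size for flag, size in zip(cache_state, content_sizes))
    return used <= capacity


if __name__ == "__main__":
    sizes = [4, 4, 4]
    print(within_capacity([1, 0, 1], sizes, capacity=8))
    print(within_capacity([1, 1, 1], sizes, capacity=8))
```

An agent can use such a check to reject or mask replacement actions that would overfill the cache before they are executed.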

An MDP is usually described under DRL by transitions $δ_{i} = \{s_{i}, ϕ (s_{i}), R_{i, n} (s_{i}, ϕ (s_{i})), s_{i + 1}\}$. By learning from a series of transitions $δ_{i}$, DRL can find the best strategy to maximize the return in the current state. The action–value function (Q function) is used to evaluate the expected return starting at state $s_{i}$. The Q function is defined as

Q (s_{i}, ϕ (s_{i})) = E_{ϕ} \left[\sum_{i = 1}^{\infty} γ^{i - 1} R (s_{i}, ϕ (s_{i})) \mid s_{i}; ϕ\right]

(16)

where $γ \in (0, 1]$ is the factor discounting future rewards. The optimal value is $Q^{*} (s_{i}, ϕ (s_{i})) = \max_{ϕ} Q_{ϕ} (s_{i}, ϕ (s_{i}))$, and $Q^{*} (s_{i}, ϕ (s_{i}))$ can be written as follows:

Q^{*} (s_{i}, ϕ (s_{i})) = E_{s_{i + 1}, ϕ} \left[R_{i} + γ \max_{ϕ} Q^{*} (s_{i + 1}, ϕ) \mid s_{i}, ϕ\right]

(17)
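The discounted sum inside Equation (16) can be computed for a finite reward sequence with the recursion that also underlies Equation (17). The sketch below is a generic illustration with a hypothetical function name, not code from the paper.

```python
def discounted_return(rewards, gamma):
    """Return sum_{i>=1} gamma^(i-1) * R_i for a finite list of rewards.

    Uses the backward recursion G_i = R_i + gamma * G_{i+1}, which is the
    sample-based counterpart of the Bellman form in Eq. (17).
    """
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

With $γ = 1$ the function reduces to a plain sum of rewards, while smaller values of $γ$ weight near-term cache hits more heavily.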

A local BS training model was developed using the double deep Q-network (DDQN) algorithm, which builds on double Q-learning [36], with the states, actions, and rewards defined above, as depicted in Figure 2. Two deep neural networks are employed to learn a function $Q (s_{i}, ϕ (s_{i}); w)$ that approximates $Q^{*} (s_{i}, ϕ (s_{i}))$. The training target of $Q (s_{i}, ϕ (s_{i}); w)$ is $y_{i} = R_{i} + γ \max_{ϕ} Q (s_{i + 1}, ϕ; w)$. Consequently, the parameters w can be updated so that $Q (s_{i}, ϕ (s_{i}); w)$ approaches $Q^{*} (s_{i}, ϕ (s_{i}))$.

Figure 2 illustrates the training process of DDQN, where every individual local BS maintains a finite-size experience replay pool, which is refreshed with the latest transitions. The Q network (MainNet) of the DRL framework is leveraged to determine system actions, while the $\hat{Q}$ network (TargetNet) is utilized to evaluate the selected actions. The parameters $\hat{w}$ of the $\hat{Q}$ network are periodically updated with the parameters $w_{i}$ of the Q network. During the training process, the agent randomly selects a minibatch $δ_{i}$ from the experience replay pool and trains the Q network by minimizing the following loss function in each iteration:

\mathrm{loss} (w_{i}) = E_{δ_{i}} \left[\left(R (s_{i}, ϕ_{i}) + γ \hat{Q} (s_{i + 1}, \underset{ϕ_{i + 1}}{\arg \max} Q (s_{i + 1}, ϕ_{i + 1}; w_{i}); \hat{w}_{i}) - Q (s_{i}, ϕ_{i}; w_{i})\right)^{2}\right]

(18)

Thus, we can obtain the gradient update formula for $w_{i}$:

\begin{matrix} \nabla_{w_{i}} \mathrm{loss} (w_{i}) = E_{δ_{i}} [(R (s_{i}, ϕ_{i}) + γ \hat{Q} (s_{i + 1}, \underset{ϕ_{i + 1}}{\arg \max} Q (s_{i + 1}, ϕ_{i + 1}; w_{i}); \hat{w}_{i}) - \\ Q (s_{i}, ϕ_{i}; w_{i})) \cdot \nabla_{w_{i}} Q (s_{i}, ϕ_{i}; w_{i})] \end{matrix}

(19)
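The key step in the double-DQN loss is that MainNet picks the next action while TargetNet scores it. The sketch below shows only this target computation and the squared error, using plain Python lists of Q-values in place of neural networks; the function names and input layout are assumptions made for illustration.

```python
def ddqn_targets(rewards, next_q_main, next_q_target, gamma):
    """Double-DQN targets: y = R + gamma * Q_target(s', argmax_a Q_main(s', a)).

    next_q_main / next_q_target: per-sample lists of Q-values over actions,
    as produced by MainNet and TargetNet for the next state (assumed inputs).
    """
    targets = []
    for reward, q_main, q_target in zip(rewards, next_q_main, next_q_target):
        # MainNet selects the action ...
        best_action = max(range(len(q_main)), key=q_main.__getitem__)
        # ... and TargetNet evaluates it.
        targets.append(reward + gamma * q_target[best_action])
    return targets


def ddqn_loss(q_taken, targets):
    """Mean squared error between targets and Q_main(s, a) for taken actions."""
    return sum((y - q) ** 2 for q, y in zip(q_taken, targets)) / len(targets)


if __name__ == "__main__":
    y = ddqn_targets([1.0], [[0.2, 0.9]], [[5.0, 2.0]], gamma=0.5)
    print(y, ddqn_loss([1.0], y))
```

In the example, a plain DQN target would use the largest TargetNet value (5.0), whereas the double-DQN target uses the TargetNet value of the action MainNet prefers (2.0), which is how the method limits overestimation.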

The online iterative process used by DDQN for caching is represented by Algorithm 1. First, each BS initializes MainNet and TargetNet based on its historical information and obtains the initial replay memory $Ω$ (lines 1–5). Next, whenever the requested content f is not cached in BS_n and the BS_n storage is full, DDQN is executed to train the caching process and update all parameters (lines 8–17).

Algorithm 1 Local DDQN process for caching.

1: for each BS_n do
2:   Initialize replay memory $Ω$.
3:   Initialize action-value function Q with random weights $w_{0}$.
4:   Pretrain MainNet and TargetNet, with weights $w_{0}$ and $\hat{w}_{0}$, on local historical information.
5:   Save $\{s_{0}, ϕ_{0}, R_{0}, s_{1}\}$ in $Ω$.
6: end for
7: for each BS_n, each iteration $i \in L_{n}$ do
8:   $BS_{n}$ receives a request for content f.
9:   if $s_{i, n, f}^{c} = 1$ then
10:     Provide content f directly to UE u.
11:   else
12:     Receive the current state $s_{i}$, and execute $ϕ_{i}^{*} = \arg\max Q (s_{i}, ϕ (s_{i}); w_{i})$.
13:     Observe current reward $R_{i}$ by formula (14) and next state $s_{i + 1}$.
14:     Store transition $\{s_{i}, ϕ_{i}, R_{i}, s_{i + 1}\}$ in $Ω$.
15:     Sample random mini-batch $δ_{i}$ from $Ω$.
16:     Update MainNet $w_{i}$ by gradient descent as in formula (19), and periodically update TargetNet $\hat{w}_{i}$ from $w_{i}$.
17:     Update the caching state $s_{i}$.
18:   end if
19: end for
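The replay memory $Ω$ used in lines 2, 5, 14, and 15 of Algorithm 1 can be sketched as a bounded buffer with uniform random sampling. The class below is a generic illustration built on the Python standard library; its name and interface are assumptions, not the authors' code.

```python
import random
from collections import deque


class ReplayMemory:
    """Finite-size experience replay pool that keeps the latest transitions."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        """Append a transition such as (s_i, phi_i, R_i, s_{i+1})."""
        self.buffer.append(transition)

    def sample(self, batch_size):
        """Draw a uniform random mini-batch without replacement."""
        size = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), size)


if __name__ == "__main__":
    memory = ReplayMemory(capacity=2)
    for step in range(3):
        memory.store(("s%d" % step, "a", 1.0, "s%d" % (step + 1)))
    print(memory.sample(2))
```

Bounding the buffer matches the "finite-size experience replay pool, which is refreshed with the latest transitions" described for Figure 2, and sampling at random breaks the correlation between consecutive requests.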

4.2. Federated Mechanism

Owing to its distributed and privacy-preserving attributes, federated learning has been widely adopted in domains such as cloud computing, edge computing, and device-to-device (D2D) computing [37]. By leveraging the federated approach to optimize the training models of multiple agents, higher efficiency can be achieved than with centralized training, while raw training data remain on the local nodes. Consequently, this paper puts forth a novel dynamic energy-saving federated mechanism to tackle the challenge of the energy efficiency ratio of caching in cloud–edge systems.

Due to the regional characteristics of local BSs, their training strategies are more tailored to local demands. To ensure that BSs with local advantages do not lose cache efficiency because of the federated mechanism, while BSs with lower cache efficiency still receive sufficient training, the proposed mechanism determines the aggregation and training of BSs based on their data request quantity and cache hit efficiency. Specifically, the cache hit efficiency of each BS is calculated at the end of each epoch t. A BS with a lower data request quantity does not participate in the aggregation process (which does not affect the effectiveness of the global model) and thereby saves transmission power, while BSs with higher cache hit efficiency do not participate in online training (as these BSs already have their own cache advantages at this time).

This approach has two issues. On the one hand, some BSs may have a higher hit efficiency due to receiving fewer requests, but the model is not sufficiently trained. Aggregating the model at this point will reduce the performance of the global model. On the other hand, some BSs may have a lower hit efficiency due to receiving too many content requests, but this type of model has a strong generalization ability and can significantly improve the performance of the global model. To this end, we dynamically partition the local BSs and incorporate a mechanism for adjusting the local iteration count dynamically to ensure cache efficiency while reducing energy consumption.

Assuming that each BS possesses adequate computational capabilities to execute the training tasks, the dynamic federated mechanism proposed in this study is deployed across both cloud and edge BSs, as demonstrated in Figure 3 and Algorithm 2. The said mechanism comprises three distinct steps.

Algorithm 2 Dynamic low-power federated DRL for caching (LFDC).

Initialization process in BSs:

1: Initialize $w_{0, n} = w_{0}$.
2: Initialize number of local iterations $L_{0}$, global epoch T, and step size $η$.
3: Pretrain parameters $w_{0, n}$ by Algorithm 1 (lines 1–5).
4: for $t = 1, 2, 3, \dots, T$ do

Model aggregation and BS selection process in cloud:

5:   Receive $w_{t, n}$ from the $N_{3}$ BSs.
6:   Update global model $w_{t+1}$ according to (26).
7:   Compute weight factor $β$ by (22), and set $L_{t, n} = [β_{t, n} L_{0}]$.
8:   $N_{1} = \{n \mid ℜ_{n} > ℜ_{\text{threshold}} \wedge H_{n} < H_{\text{threshold}}\}$, $N_{2} = N \setminus N_{1}$.
9:   Send training requests and global models to the BS_n that need to participate in training ($N_{1}$).

Local online training process in BSs:

10:   for each BS_n do
11:     if $n \in N_{1}$ then
12:       Perform online training by Algorithm 1 (lines 7–19).
13:       Set local parameter $w_{t, n}$ by (23).
14:     else
15:       Apply the strategy of epoch $t - 1$ in BS_n ($N_{2}$).
16:     end if
17:     Compute $ℜ_{n}$ and $H_{n}$ of BS_n.
18:     $N_{3} = \{n \mid ℜ_{n} > ℜ_{\text{threshold}}\}$.
19:     for $n \in N_{3}$ do
20:       Send models to the cloud.
21:     end for
22:   end for
23: end for

(1) BSs are selected for participation in training by defining two thresholds: the request quantity threshold ($ℜ_{\text{threshold}}$) and the cache hit efficiency threshold ($H_{\text{threshold}}$) (lines 8, 12). Commencing from the $t_{1}$-th iteration, the cloud selects the BSs ($N_{1}$) that have received more requests than the request quantity threshold in the previous t − 1 iterations. Next, the BSs with cache hit efficiency lower than the efficiency threshold are chosen for online training (lines 11–13), while other BSs ($N_{2}$) adopt the cached strategy obtained after the previous $t_{1}$-th iteration of training (line 15). This strategy facilitates continued training for BSs with low caching efficiency and fully reflects the local content request characteristics for BSs with higher efficiency, thereby sustaining a high cache performance. Simultaneously, as the number of BSs participating in online training in each round diminishes, the overall energy consumption of the system reduces.
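The selection rule in step (1) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function `partition_base_stations`, the dictionary inputs, and the threshold values in the usage lines are hypothetical. It follows lines 8 and 18 of Algorithm 2, where $N_{1}$ requires both a high request count and a low hit efficiency, $N_{2}$ is the complement of $N_{1}$, and $N_{3}$ requires only a high request count.

```python
from typing import Dict, Set, Tuple


def partition_base_stations(
    requests: Dict[int, float],
    hit_eff: Dict[int, float],
    req_threshold: float,
    hit_threshold: float,
) -> Tuple[Set[int], Set[int], Set[int]]:
    """Split BS indices into (N1, N2, N3) as in Algorithm 2, lines 8 and 18."""
    all_bs = set(requests)
    # N1: busy BSs whose cache hit efficiency is still low -> train online.
    n1 = {n for n in all_bs
          if requests[n] > req_threshold and hit_eff[n] < hit_threshold}
    # N2: every other BS keeps the strategy from the previous epoch.
    n2 = all_bs - n1
    # N3: busy BSs upload their models for aggregation.
    n3 = {n for n in all_bs if requests[n] > req_threshold}
    return n1, n2, n3


# Hypothetical per-epoch statistics for four BSs.
requests = {0: 120, 1: 30, 2: 200, 3: 150}
hit_eff = {0: 0.35, 1: 0.80, 2: 0.70, 3: 0.40}
n1, n2, n3 = partition_base_stations(requests, hit_eff, 100, 0.5)
```

With these sample values, BS 2 is busy but already efficient, so it uploads its model without retraining, while BS 1 neither trains nor uploads.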

$w_{t, n}$ represents the model parameters in epoch t. For the local dataset $D_{t, n}$ of each BS, its model loss function is given by

$$F_{n}(w_{t, n}) = \frac{1}{D_{t, n}} \sum_{d \in D_{t, n}} f_{d, n}(w_{t, n, d}) \qquad (20)$$

$$\text{subject to: } N_{1} \leftarrow \{n \mid ℜ_{t-1, n} > ℜ_{\text{threshold}},\ H_{t-1, n} < H_{\text{threshold}}\}$$

where $N_{1}$ represents the set of BSs that require local online training. Therefore, the local problem is

$$w_{t, n, d} = \arg\min F_{n}(w_{t, n} \mid w_{t-1, n}) \qquad (21)$$

that is, to minimize the local loss function relative to the previous epoch (lines 10–18).

(2) We introduce a strategy for dynamically adjusting the number of local iterations of the BS models that participate in local online training. The approach aims to let each BS draw on the experience of neighboring BSs without compromising the unique local characteristics of its model. To achieve this, we propose dynamically adjusting the number of local iterations based on the difference between the model parameters obtained from the previous global training (t − 1) and the current global model parameters at time t. Specifically, we define a weight factor $β$ as the basis for this dynamic adjustment strategy (line 7).

$$β_{t, n} = \frac{\sqrt{\sum_{i}^{I} (w_{t, n} - w_{t})}}{\frac{1}{N} \sum_{n \in N} \sqrt{\sum_{i}^{I} (w_{t, n} - w_{t})}} \qquad (22)$$

The dimensionality of the model parameters is denoted by I. Thus, for BS_n, the number of local training iterations is determined by $L_{t, n} = [β_{t, n} L_{0}]$, where $[β_{t, n} L_{0}] \geq 1$. Models whose parameters differ greatly from the global model have lower global performance and may not meet the dynamic requirements of requests, and therefore require more local iterations ($L_{n}$) to ensure their dynamic performance. Additionally, reducing the number of local iterations can lead to lower energy consumption of the system, as shown in Equation (8). When the local model of a participant is close enough to the global model, the number of local training rounds for this participant can be reduced accordingly to lower the training energy consumption. Local parameter updates can be expressed as follows:

$$w_{t+1, n} = w_{t, n} - η \nabla F_{n}(w_{t, n}) \qquad (23)$$
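As a rough illustration of Equation (22) and the rule $L_{t, n} = [β_{t, n} L_{0}]$, the sketch below computes the weight factor for each BS and derives its local iteration count. Two points are assumptions rather than statements from the paper: the distance between a local model and the global model is taken to be the Euclidean norm of their parameter difference (the squared term does not appear in Equation (22) as printed), and the bracket $[\cdot]$ is read as rounding to the nearest integer with a floor of 1. The function names and sample vectors are hypothetical.

```python
import math
from typing import Dict, List


def weight_factors(local: Dict[int, List[float]],
                   global_w: List[float]) -> Dict[int, float]:
    """Distance of each local model to the global model, divided by the mean distance."""
    # Assumption: Euclidean distance over the I parameter dimensions.
    dist = {n: math.sqrt(sum((a - b) ** 2 for a, b in zip(w, global_w)))
            for n, w in local.items()}
    mean_dist = sum(dist.values()) / len(dist)
    if mean_dist == 0.0:
        # All local models equal the global model; treat every BS alike.
        return {n: 1.0 for n in dist}
    return {n: d / mean_dist for n, d in dist.items()}


def local_iterations(beta: Dict[int, float], l0: int = 10) -> Dict[int, int]:
    """L_{t,n} = [beta_{t,n} * L_0], kept at 1 or above."""
    # Assumption: [.] means rounding to the nearest integer.
    return {n: max(1, round(b * l0)) for n, b in beta.items()}


# Hypothetical two-dimensional models for three BSs.
global_w = [0.0, 0.0]
local = {0: [3.0, 4.0], 1: [0.0, 1.0], 2: [0.0, 0.0]}
beta = weight_factors(local, global_w)
iters = local_iterations(beta, l0=10)
```

A BS whose model has drifted far from the global model receives more than $L_{0}$ iterations, while a BS that already matches the global model falls back to the minimum of one iteration.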

(3) The conventional approach to model aggregation weights each model by its training data size to create a global model. However, in cloud–edge caching environments, the number of content requests and the cache hit rate are more critical technical indicators. This is because, for any BS_n, as the number of content requests increases, the caching strategy must adapt more dynamically to maintain request efficiency. Additionally, a higher cache hit rate implies a higher cache efficiency for the current BS_n. In light of this, we propose using aggregation weights that take into account the importance of the cache hit efficiency because $H_{t, n}$ is the sum of the cache hit efficiency of all requests received by BS_n at the t-th epoch:

$$ϖ_{t, n} = \frac{H_{t, n}}{\sum_{n \in N} H_{t, n}} \qquad (24)$$

The global objective of FL is to minimize the global loss function:

$$\min_{w} F_{t}(w) = \frac{1}{D_{t}} \sum_{n \in N_{3}} \sum_{i \in D_{t, n}} f_{i, n}(w) \qquad (25)$$

$$\text{subject to: } N_{3} \leftarrow \{n \mid ℜ_{n} > ℜ_{\text{threshold}}\}, \quad D_{t} = \sum_{n \in N_{3}} D_{t, n}$$

Let $N_{3}$ be the set of BSs that require model aggregation, characterized by a content request number exceeding a predetermined threshold $ℜ_{\text{threshold}}$. The models associated with these base stations must be transmitted to the cloud for aggregation. At time t, the total data quantity can be represented by $D_{t} = \sum_{n \in N_{3}} D_{t, n}$, where $D_{t, n}$ refers to the data amount associated with each base station. Therefore, the parameter update of the global model at this stage can be expressed as (lines 5–9)

$$w_{t+1} = \frac{\sum_{n \in N_{3}} ϖ_{t, n} w_{t+1, n}}{\sum_{n \in N_{3}} ϖ_{t, n}} \qquad (26)$$
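Equations (24) and (26) can be illustrated with a short sketch. It is an example under stated assumptions rather than the authors' code: models are plain lists of floats, and the function name `aggregate_models` and the sample values are hypothetical. Because Equation (26) divides by the sum of the weights over $N_{3}$, the normalizing sum in Equation (24) cancels, so the sketch weights each uploaded model directly by its hit efficiency $H_{t, n}$.

```python
from typing import Dict, List


def aggregate_models(models: Dict[int, List[float]],
                     hit_eff: Dict[int, float]) -> List[float]:
    """Hit-efficiency-weighted average of the models uploaded by the BSs in N3."""
    if not models:
        raise ValueError("no models were uploaded for aggregation")
    total = sum(hit_eff[n] for n in models)
    if total <= 0.0:
        raise ValueError("hit efficiencies must sum to a positive value")
    dim = len(next(iter(models.values())))
    global_w = [0.0] * dim
    for n, w in models.items():
        share = hit_eff[n] / total  # normalized weight of BS n within N3
        for i in range(dim):
            global_w[i] += share * w[i]
    return global_w


# Hypothetical uploads from two BSs in N3.
models = {0: [1.0, 2.0], 2: [3.0, 6.0]}
hit_eff = {0: 0.25, 1: 0.9, 2: 0.75}  # BS 1 is not in N3 and is ignored
new_global = aggregate_models(models, hit_eff)
```

Replacing `hit_eff` with per-BS data sizes would recover the conventional data-size weighting that the text contrasts with.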

It should be noted that the edge BSs are divided into two categories: BSs for online training and BSs with fixed models. The BSs involved in online training need to receive global model parameters from the cloud and update their cache strategies in real time through DRL. The cache strategy for fixed-model BSs remains unchanged. The energy consumption of sending models from BSs is explained in Section 3.3. Both BSs for online training and BSs with fixed models need to calculate their cache hit efficiency after each round of local training. The online training BSs and model invariant BSs at the edge can be dynamically switched based on changes in the cache hit efficiency and content request number.

5. Simulation Experiments

In this section, we compare the performance of the proposed LFDC algorithm in terms of cache hit efficiency, energy consumption, and HE gain with centralized DRL ([38]), distributed DRL ([13]), the classical federated learning algorithm FedAvg ([39]), and traditional caching algorithms, including first in first out (FIFO, [18]), least frequently used (LFU, [19]), and least recently used (LRU, [17]). Furthermore, we evaluate the performance of the proposed algorithm under different hyperparameters. The following are three comparison algorithms also based on DRL:

(1): Centralized DRL: A DDQN model is deployed in the cloud to train and make decisions on global dynamic caching.
(2): Distributed DRL: A DDQN model is deployed in each BS, which is trained and makes decisions based on local data.
(3): FedAvg: A DDQN model is deployed in each BS and aggregated and distributed through the federated averaging method.

5.1. Simulation Setting

We implemented the proposed LFDC in Python 3.9 with PyTorch on a 64-bit CentOS Linux workstation with an Intel Xeon W-2155 CPU, 128 GB RAM, and a 2080Ti GPU. In the simulation, there are $|N| = 10$ BSs with $|U| = 500$ UEs randomly distributed within the coverage area of the BSs. The cache capacity of each BS is 1 GB. The maximum coverage radius of each BS is 500 m, with a channel bandwidth of 20 MHz and a transmission power of 500 W. The path loss $g_{u, n}$ is modeled as $30.6 + 36.7 \lg(dt_{u, n})$ dB, where $dt_{u, n}$ denotes the distance between UE u and BS_n.
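The path loss model above is straightforward to evaluate. The sketch below assumes that lg denotes the base-10 logarithm and that the distance is given in meters; neither the unit nor the function name `path_loss_db` is stated in the paper.

```python
import math


def path_loss_db(distance_m: float) -> float:
    """Path loss in dB for the model 30.6 + 36.7 * lg(distance)."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    # Assumption: lg is log base 10 and the distance is in meters.
    return 30.6 + 36.7 * math.log10(distance_m)


# Loss at 100 m and at the 500 m coverage edge used in the simulation.
loss_100 = path_loss_db(100.0)
loss_edge = path_loss_db(500.0)
```

Under these assumptions the loss grows by 36.7 dB for every tenfold increase in distance.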

Moreover, to validate the effectiveness of the proposed algorithm in practical scenarios, we obtain content from actual datasets. We collect content request records from the publicly available MovieLens dataset. MovieLens consists of 27,753,444 ratings from 283,228 users on 58,098 movies, as well as related information about the movies, such as their titles and genres. We assume that the number of ratings is equal to the number of requests, and the time when a review is posted is considered to be the time when a request is initiated. We record the tags of each video as video features and divide the videos into several sub-files $C_{f}$ of equal size, based on their duration. To simulate users’ mobility, we set the mobility coefficient to 0.1 so that after each epoch, a portion of the users served by the base station (BS) will be randomly replaced. For each BS selected for local training, the initial number of local training iterations $L_{0}$ is set to 10, with a minibatch size of 32. The optimizer used is Adam, with a step size of 0.01. We use a dataset consisting of 300 h of data for experimentation, with each epoch being one hour, and Pandas and Requests are used to retrieve and process the data. Other parameters for the experimental setup are shown in Table 2.
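To make the request-trace construction concrete, the sketch below buckets rating timestamps into one-hour epochs, treating each rating as one request as described above. The record layout (user, movie, Unix timestamp in seconds) and the function name `requests_per_epoch` are assumptions for illustration, not the authors' preprocessing code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

EPOCH_SECONDS = 3600  # one epoch corresponds to one hour


def requests_per_epoch(
    ratings: Iterable[Tuple[int, int, int]],
) -> Dict[int, List[Tuple[int, int]]]:
    """Group (user, movie, timestamp) ratings into hourly request epochs."""
    records = sorted(ratings, key=lambda r: r[2])
    epochs: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    if not records:
        return dict(epochs)
    start = records[0][2]  # the earliest rating defines epoch 0
    for user, movie, ts in records:
        epochs[(ts - start) // EPOCH_SECONDS].append((user, movie))
    return dict(epochs)


# Hypothetical ratings: (user id, movie id, Unix timestamp in seconds).
sample = [(1, 10, 1000), (2, 11, 1500), (1, 12, 4700), (3, 10, 8300)]
trace = requests_per_epoch(sample)
```

Each key of `trace` is an epoch index, and the length of its list is the number of requests issued in that hour.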

5.2. Evaluation Results and Analysis

The experiment first compared the cache hit efficiency of the LFDC algorithm with that of the other algorithms. As shown in Figure 4a, as the number of global epochs (i.e., online training/online caching time) increased, LFDC exhibited improved caching performance. Learning-based algorithms initially underperformed compared to traditional cache optimization algorithms, but they gradually demonstrated better cache performance with ongoing online training. In contrast, the cache hit rates of FIFO, LFU, and LRU fluctuated over time and, due to their inability to adapt to the environment, had lower cache hit efficiency than learning-based algorithms. LFDC essentially converged at 300 global epochs, while other learning algorithms converged earlier, which is attributed to the fact that some BSs did not participate in training in each epoch. Due to the optimization of the federated process, the system cache hit efficiency of LFDC had a clear advantage over other learning algorithms. Starting from the 100th epoch, the average cache hit efficiency of LFDC was 23.87%, 12.51%, 13.66%, 55.27%, 64.36%, and 38.03% higher than that of centralized DRL, distributed DRL, FedAvg, FIFO, LFU, and LRU, respectively. As shown in Figure 4b, as the number of UEs increases, the number of content requests also increases, and the proposed LFDC algorithm likewise outperforms the other algorithms. However, the curves in Figure 4a,b show opposite trends. This is because, under the fixed cache capacity of the BS, as the amount of less-popular newly requested content increases, a large portion of the newly requested content is not replaced by the local BS.

Figure 5 compares the training energy consumption of LFDC with distributed DRL and FedAvg. As the number of epochs increases, LFDC exhibits superior energy performance compared to the other two learning algorithms. This is because, firstly, FedAvg requires all BSs to continuously undergo online training and constantly upload their parameters. Although distributed DRL does not involve parameter uploading, its BSs are always engaged in online training, and the number of local training iterations remains constant. The LFDC method, by selecting suitable BSs for training, choosing appropriate parameters from these BSs for uploading, and dynamically adjusting the number of local training iterations, achieves better online training energy efficiency than the other learning algorithms.

Figure 6 investigates the HE gain of the proposed LFDC, where T increases from 20 to 1000. Overall, after 1000 epochs, the proposed LFDC achieves the highest HE gain of 1.988439791 compared to the existing distributed DRL (1.078518039) and FedAvg (0.947555). This is because, in the environment deploying the LFDC algorithm, BSs obtain the experiences of other BSs while catering to local request characteristics. Furthermore, the dynamic federated strategy reduces the training energy consumption. In other words, LFDC can achieve a higher training energy efficiency ratio compared to other classical algorithms.

Figure 7 presents the comparison results of the cache hit efficiency (a), training energy consumption (b), and HE gain (c) for the LFDC algorithm without an adaptive local iteration count. As can be seen in Figure 7a, when the adaptive local iteration count is removed, i.e., the local iteration count is fixed at 10, the cache hit efficiency of LFDC shows a slight improvement. This is because the increased local iteration count can better adapt to the local content request situation. Simultaneously, Figure 7b shows that the total training energy consumption of the LFDC algorithm without adaptive local iteration count increases by 27.19% compared to LFDC, resulting in a 25.03% decrease in the HE gain evaluation as seen in Figure 7c. In summary, the LFDC algorithm with adaptive local iteration count can significantly enhance the training energy efficiency ratio, which means it can provide better cache performance under the same training consumption rate.

Finally, we compare the cache performance and HE gain under different step sizes $η$ in Figure 8. As the learning rate increases, the performance of cache hit efficiency does not change significantly. However, when the step size factor $η$ is within the range of 0.001 to 0.01, the HE gain exhibits stable performance variations. Therefore, we use a step size of $η = 0.01$ as an empirical value to maintain the stability and effectiveness of the algorithm.

6. Conclusions

In this paper, we studied collaborative caching in cloud–edge integrated environments. To improve the cache hit efficiency and reduce training energy consumption, we proposed a low-training-energy federated deep reinforcement learning for caching (LFDC) algorithm. The algorithm optimizes cache decision making based on user preferences, and optimizes federated decision making based on the cache hit efficiency and the number of UE requests observed after cache decision making. This algorithm can effectively reduce training energy consumption while ensuring cache hit efficiency and improving the efficiency ratio in cloud–edge collaboration.

Author Contributions

Conceptualization, methodology, software, writing, X.Z.; writing—review and editing, formal analysis, Z.H.; methodology, M.Z.; validation, Y.L., H.X., H.Z. and A.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC) under Grant 62172442 and The Hunan Province Natural Science Foundation of China (No. 2020JJ5775).

Data Availability Statement

Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. These data can be found at [https://movielens.org/, accessed on 1 May 2022].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kumar, M.; Sharma, S.C.; Goel, A.; Singh, S.P. A comprehensive survey for scheduling techniques in cloud computing. J. Netw. Comput. Appl. 2019, 143, 1–33.
2. Shi, W.; Sun, H.; Cao, J.; Zhang, Q.; Liu, W. Edge computing: An emerging computing model for the Internet of everything era. J. Comput. Res. Dev. 2017, 54, 907–924.
3. Ren, J.; He, Y.; Yu, G.; Li, G.Y. Joint communication and computation resource allocation for cloud-edge collaborative system. In Proceedings of the 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 15–18 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6.
4. Ding, C.; Zhou, A.; Liu, Y.; Chang, R.N.; Hsu, C.H.; Wang, S. A cloud-edge collaboration framework for cognitive service. IEEE Trans. Cloud Comput. 2020, 10, 1489–1499.
5. Li, X.; Wang, X.; Li, K.; Han, Z.; Leung, V.C. Collaborative multi-tier caching in heterogeneous networks: Modeling, analysis, and design. IEEE Trans. Wirel. Commun. 2017, 16, 6926–6939.
6. Wang, X.; Chen, M.; Taleb, T.; Ksentini, A.; Leung, V.C. Cache in the air: Exploiting content caching and delivery techniques for 5G systems. IEEE Commun. Mag. 2014, 52, 131–139.
7. Sheng, M.; Xu, C.; Liu, J.; Song, J.; Ma, X.; Li, J. Enhancement for content delivery with proximity communications in caching enabled wireless networks: Architecture and challenges. IEEE Commun. Mag. 2016, 54, 70–76.
8. Zhao, X.; Yuan, P.; Tang, S. Collaborative edge caching in context-aware device-to-device networks. IEEE Trans. Veh. Technol. 2018, 67, 9583–9596.
9. Sun, Y.; Peng, M.; Mao, S. Deep reinforcement learning-based mode selection and resource management for green fog radio access networks. IEEE Internet Things J. 2018, 6, 1960–1971.
10. Yuan, W.; Yang, M.; He, Y.; Wang, C.; Wang, B. Multi-reward architecture based reinforcement learning for highway driving policies. In Proceedings of the 2019 IEEE Intelligent Transportation Systems Conference (ITSC), Auckland, New Zealand, 27–30 October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 3810–3815.
11. Li, J.; Gao, H.; Lv, T.; Lu, Y. Deep reinforcement learning based computation offloading and resource allocation for MEC. In Proceedings of the 2018 IEEE wireless Communications and Networking Conference (WCNC), Barcelona, Spain, 15–18 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6.
12. Li, Y.; Hu, S.; Li, G. CVC: A collaborative video caching framework based on federated learning at the edge. IEEE Trans. Netw. Serv. Manag. 2021, 19, 1399–1412.
13. Wang, X.; Wang, C.; Li, X.; Leung, V.C.; Taleb, T. Federated deep reinforcement learning for internet of things with decentralized cooperative edge caching. IEEE Internet Things J. 2020, 7, 9441–9455.
14. Yan, H.; Chen, Z.; Wang, Z.; Zhu, W. DRL-Based Collaborative Edge Content Replication with Popularity Distillation. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, 5–9 July 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
15. Chien, W.C.; Weng, H.Y.; Lai, C.F. Q-learning based collaborative cache allocation in mobile edge computing. Future Gener. Comput. Syst. 2020, 102, 603–610.
16. Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K.S. Adaptive Federated Learning in Resource Constrained Edge Computing Systems. IEEE J. Sel. Areas Commun. 2018, 37, 1205–1221.
17. Ahmed, M.; Traverso, S.; Giaccone, P.; Leonardi, E.; Niccolini, S. Analyzing the performance of LRU caches under non-stationary traffic patterns. arXiv 2013, arXiv:1301.4909.
18. Rossi, D.; Rossini, G. Caching performance of content centric networks under multi-path routing (and more). Relatório Técnico Telecom ParisTech 2011, 2011, 1–6.
19. Jaleel, A.; Theobald, K.B.; Steely, S.C., Jr.; Emer, J. High performance cache replacement using re-reference interval prediction (RRIP). ACM SIGARCH Comput. Archit. News 2010, 38, 60–71.
20. Poularakis, K.; Iosifidis, G.; Tassiulas, L. Approximation algorithms for mobile data caching in small cell networks. IEEE Trans. Commun. 2014, 62, 3665–3677.
21. Poularakis, K.; Llorca, J.; Tulino, A.M.; Taylor, I.; Tassiulas, L. Joint service placement and request routing in multi-cell mobile edge computing networks. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Paris, France, 29 April–2 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 10–18.
22. Yang, L.; Cao, J.; Liang, G.; Han, X. Cost aware service placement and load dispatching in mobile cloud systems. IEEE Trans. Comput. 2015, 65, 1440–1452.
23. Zhang, S.; Liu, J. Optimal probabilistic caching in heterogeneous IoT networks. IEEE Internet Things J. 2020, 7, 3404–3414.
24. Wang, X.; Han, Y.; Leung, V.C.; Niyato, D.; Yan, X.; Chen, X. Convergence of edge computing and deep learning: A comprehensive survey. IEEE Commun. Surv. Tutorials 2020, 22, 869–904.
25. Gu, J.; Wang, W.; Huang, A.; Shan, H.; Zhang, Z. Distributed cache replacement for caching-enable base stations in cellular networks. In Proceedings of the 2014 IEEE International Conference on Communications (ICC), Sydney, Australia, 10–14 June 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 2648–2653.
26. Rathore, S.; Ryu, J.H.; Sharma, P.K.; Park, J.H. DeepCachNet: A proactive caching framework based on deep learning in cellular networks. IEEE Netw. 2019, 33, 130–138.
27. Wan, Z.; Li, Y. Deep reinforcement learning-based collaborative video caching and transcoding in clustered and intelligent edge B5G networks. Wirel. Commun. Mob. Comput. 2020, 2020, 6684293.
28. Neglia, G.; Leonardi, E.; Iecker, G.; Spyropoulos, T. A swiss army knife for dynamic caching in small cell networks. arXiv 2019, arXiv:1912.10149.
29. Neglia, G.; Carra, D.; Michiardi, P. Cache policies for linear utility maximization. IEEE/ACM Trans. Netw. 2018, 26, 302–313.
30. Ricardo, G.I.; Tuholukova, A.; Neglia, G.; Spyropoulos, T. Caching policies for delay minimization in small cell networks with coordinated multi-point joint transmissions. IEEE/ACM Trans. Netw. 2021, 29, 1105–1115.
31. Chen, X.; Jiao, L.; Li, W.; Fu, X. Efficient multi-user computation offloading for mobile-edge cloud computing. IEEE/ACM Trans. Netw. 2015, 24, 2795–2808.
32. Xiao, Y.; Li, Y.; Shi, G.; Poor, H.V. Optimizing resource-efficiency for federated edge intelligence in IoT networks. In Proceedings of the 2020 International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 21–23 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 86–92.
33. Yang, Z.; Chen, M.; Saad, W.; Hong, C.S.; Shikh-Bahaei, M. Energy efficient federated learning over wireless communication networks. IEEE Trans. Wirel. Commun. 2020, 20, 1935–1949.
34. Mao, Y.; Zhang, J.; Letaief, K.B. Dynamic computation offloading for mobile-edge computing with energy harvesting devices. IEEE J. Sel. Areas Commun. 2016, 34, 3590–3605.
35. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
36. van Hasselt, H. Double Q-Learning. IEEE Intell. Syst. 2010, 23.
37. Yu, S.; Chen, X.; Zhou, Z.; Gong, X.; Wu, D. When deep reinforcement learning meets federated learning: Intelligent multitimescale resource management for multiaccess edge computing in 5G ultradense network. IEEE Internet Things J. 2020, 8, 2238–2251.
38. Zhong, C.; Gursoy, M.C.; Velipasalar, S. A deep reinforcement learning-based framework for content caching. In Proceedings of the 2018 52nd Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 21–23 March 2018; pp. 1–6.
39. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated Learning of Deep Networks using Model Averaging. arXiv 2016, arXiv:1602.05629.

Figure 1. Proposed networks topology architecture.

Figure 2. DDQN training process for cache.

Figure 3. Proposed federated mechanism.

Figure 4. Average cache hit efficiency for different algorithms with (a) epoch (time) increasing, (b) number of UEs increasing.

Figure 5. Comparison of total training energy consumption.

Figure 6. Comparison of HE gain obtained by using distributed DRL, LFDC and FedAvg with the epochs (time) increasing.

Figure 7. Comparison of cache hit efficiency (a), energy consumption (b) and HE gain (c) by LFDC without adaptive iteration.

Figure 8. Comparison of cache hit efficiency and HE gain for LFDC at different step sizes.

Table 1. Key notations and descriptions.

Notation	Description
$N = \{1, \dots, n, \dots, N\}$	Set of BSs
$U = \{1, \dots, u, \dots, U\}$	Set of UEs
$F = \{1, \dots, f, \dots, F\}$	Set of content
$T = \{1, \dots, t, \dots, T\}$	Epochs of federated learning/caching decision.
$v_{u, n}$ , $v_{a}$ , $v_{b}$	Transmission rate between UEs and BSs, BSs and Cloud, BSs and BSs.
$D_{n}$ , $D_{f}$	The size of the content in BS_n, the size of the content f.
$P_{u, f}$	UE u preferences for content f.
$ℜ_{u}$ , $ℜ_{n}$	Number of requests from UE u, number of requests received by BS_n.
$H_{t, n}$ , $H_{t, \text{system}}$	The cache hit efficiency of BS_n in epoch t, the cache hit efficiency of the system in epoch t.
$E_{t, n}^{cmp}$ , $E_{t, n}^{up}$	The energy consumption of local learning and of upload, respectively, in epoch t.
$E_{t, \text{system}}$	The system energy consumption for learning in epoch t.
$s_{i} = (s_{i, u}^{r}, s_{i, n}^{c})$	The caching state.
$ϕ (s_{i}) = \{a_{i}^{local}, a_{i}^{BS-BS}, a_{i}^{cloud}\}$	The caching decision.
$R_{i, n} (s_{i}, ϕ (s_{i})) = H_{i, n}$	The Double Deep Q-Network (DDQN) reward.
$Loss (w_{i})$	The loss function of local training.
$\nabla_{w_{i}} loss (w_{i})$	Gradient update formula for $w_{i}$ .
$β_{t, n}$	The weight factor for adaptive iteration times.
$D_{t, n}$	Amount of data for a single training session in BS_n in epoch t.

Table 2. Parameter values.

Parameter	Value	Description
T	1000	Number of global epoch
$L_{0}$	10	Number of initial local iterations
$c_{n}$	20 cycles/bit	CPU cycles per bit for training one data sample
$fre$	4 GHz	Computation capacity of BSs
$P_{n}$	500 W	Transmit power of BS_n
B	20 MHz	Channel bandwidth of BSs downlink
$v_{a}$	1000 Mbps	Transmission speed between cloud and BSs
$v_{b}$	1 Gbps	Transmission speed between BSs
$D_{f}$	10 Mbit	Content size
$ζ_{n}$	$1.2 \times 10^{- 28}$	Effective capacitance coefficient of BS_n
$℘_{n}$	$1 \times 10^{4}$ bit	Parameter size of model
$γ$	0.9	Discount factor
$η$	0.01	Step size
$ϵ$	0.1	State transition probability

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
