Article

Cluster Content Caching: A Deep Reinforcement Learning Approach to Improve Energy Efficiency in Cell-Free Massive Multiple-Input Multiple-Output Networks

1 Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin 541004, China
2 College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China
* Authors to whom correspondence should be addressed.
Sensors 2023, 23(19), 8295; https://doi.org/10.3390/s23198295
Submission received: 4 September 2023 / Revised: 1 October 2023 / Accepted: 3 October 2023 / Published: 7 October 2023
(This article belongs to the Special Issue 6G Space-Air-Ground Communication Networks and Key Technologies)

Abstract: With the explosive growth of micro-video applications, the transmission burden on fronthaul and backhaul links keeps increasing, and considerable energy consumption is incurred at the same time. To reduce energy consumption and relieve the transmission delay burden, we consider a cell-free massive multiple-input multiple-output (CF-mMIMO) system in which caches at the access points (APs) are used to reduce the load on these links. In this paper, a total energy efficiency (EE) model of a cache-assisted CF-mMIMO system is established. When optimizing the EE, forming the co-operation cluster is critical. Therefore, we propose an energy-efficient joint design of content caching, AP clustering, and low-resolution digital-to-analog converter (DAC) selection in a cache-assisted CF-mMIMO network based on deep reinforcement learning. This scheme can effectively cache content at the APs and select the appropriate DAC resolution. Then, taking into account the channel state information and the user equipment (UE)'s content request preferences, a deep deterministic policy gradient algorithm is used to jointly optimize the caching strategy, AP clustering, and DAC resolution decisions. Simulation results show that the energy efficiency of the proposed scheme is 4% higher than that of the scheme without resolution optimization and is much higher than that of AP clustering alone without the joint design of content caching and channel quality.

1. Introduction

Due to the rapid development of smart devices such as smart phones, smart watches, smart robots, and drones, mobile data traffic on wireless networks has experienced tremendous growth. IDC estimates that by 2023, there will be 48.9 billion connected devices worldwide [1]. Such a large number of devices will not only generate exabytes of data but will also request massive amounts of content, creating unprecedented challenges for upcoming communication systems. The capacity of the backhaul link has become the bottleneck of data-intensive networks, and an efficient way to reduce the backhaul load is needed to meet the rapidly growing demand for mobile communication.
Caching is a well-known technique for improving the performance of numerous wired networks, such as content-centric networks [2,3,4]. In cellular networks, caching frequently requested content at the edge of the network can reduce backhaul costs, access latency, and power consumption while increasing throughput. In [5], it is proposed to replace the backhaul link by caching at base stations (BSs); by optimizing the caching strategy, more users can be served within the download-time limit, which significantly increases throughput. In [6], caching at the BS is shown to lighten the backhaul traffic load. To minimize the overall energy consumption attributed to caching and data transmission, including inter-BS and BS-to-server communications, Ref. [7] optimizes the allocation of cache sizes for BSs and service gateways. With the goal of reducing the overall energy consumption of the service, the caching strategy is fine-tuned in [8], where the influence of multicast transmission is considered.
At the same time, to meet the rising traffic demand, base stations with ever more antennas and ever smaller cell radii inevitably cause more inter-cell interference. Many research efforts have been made to diminish this interference [9,10]. Two primary approaches exist: massive multiple-input multiple-output (mMIMO) systems [11] and distributed systems [12]. In mMIMO systems, the BS exploits the spatial multiplexing provided by an extensive antenna array; combined with precoding, this can substantially and efficiently mitigate both intra-cell and inter-cell interference among user equipment (UEs). Nonetheless, uniform service quality across all terminals is not guaranteed: a terminal near the BS enjoys better service thanks to good channel conditions, while a terminal at the cell edge receives inadequate quality of service. In a distributed system, multiple BSs or access points (APs) collaborate by exchanging service data and channel state information (CSI) to minimize inter-cell interference. However, distributed systems remain centered around individual cells; multi-cell collaboration essentially extends the coverage of a single cell, and edge effects continue to affect UEs at the cell periphery. Therefore, the cell-free massive multiple-input multiple-output (CF-mMIMO) technique has been introduced [13,14], which combines the strengths of the aforementioned two systems, namely robust interference cancellation and macro diversity gain. Additionally, it makes two enhancements: (1) it shifts from a cell-centric to a UE-centric service model, allowing potential overlap between distinct AP clusters; and (2) it deploys a substantial number of APs with wide coverage that are closer to the terminals, thereby completely eliminating the concept of a cell.
In essence, the CF-mMIMO system moves the APs of an mMIMO system closer to the terminals through the integration of fronthaul links and more frequent utilization of the backhaul links. This sharply increases the link load in the CF-mMIMO system, which inevitably results in elevated energy consumption. Therefore, traffic congestion on the fronthaul/backhaul links and high transmission energy consumption constitute the bottlenecks that impede the practical implementation of CF-mMIMO systems. Content caching proactively stores data in cache devices and transmits it directly to the terminals during peak hours, without the need to obtain data from the central processing unit (CPU) and core network via the fronthaul/backhaul links. Because the requested content is concentrated in a limited number of popular files [15] and the cost of caching keeps decreasing [16], content caching proves to be a cost-effective technique for lessening the burden on the links. Building upon this notion, a cache-assisted CF-mMIMO system is introduced in [17]. Moreover, we proposed an energy-efficient content caching strategy for CF-mMIMO systems in [18], but only research ideas were provided, without experimental validation. However, the total energy efficiency (EE) maximization problem of such systems is non-deterministic polynomial-hard (NP-hard) and would necessitate inefficient and non-scalable solution methods. Additionally, researchers have started to consider the joint optimization of user association and caching strategies [19,20,21]. For example, in [19], a high-density satellite-UAV-terrestrial network scenario is considered, and the combinatorial optimization problem is effectively solved using game theory and a genetic algorithm for clustering and cache placement, respectively. In [20], for a CF-mMIMO-assisted vehicular edge network, a Deep-Q-Network (DQN) algorithm is proposed to optimize the caching decision and thereby improve the network capacity and content delivery performance. Moreover, two deep reinforcement learning (DRL) methods, single-agent and multi-agent reinforcement learning, are proposed in [21] to solve the joint optimization of user association and content caching in CF-mMIMO. However, most existing research focuses on content caching strategies for edge caching without considering AP clustering strategies.
On the other hand, a large number of high-resolution analog-to-digital converter (ADC) and digital-to-analog converter (DAC) modules consume substantial power. To avoid this, low-resolution ADCs (1–3 bits) are recommended for CF-mMIMO networks; this trade-off reduces power consumption at the cost of some spectral efficiency (SE). The work in [22] shows that low-resolution ADCs achieve better EE than high-resolution ADCs in the uplink of CF-mMIMO systems.
Creating a practical model for the total EE of a cache-assisted CF-mMIMO system, one that is straightforward to calculate and analyze while also being amenable to effective optimization, poses a significant challenge. To date, little research has been conducted on cache-assisted CF-mMIMO systems, which motivates this study. The primary contributions of this paper can be outlined as follows:
  • In this paper, a new total EE model of a cache-assisted CF-mMIMO system is established, which has the following advantages: the introduction of low-resolution DACs can improve the EE; UE-centric cache deployment provides a better user experience; and, by considering the influence of converters with different resolutions on the EE, the model is more suitable for practical use;
  • A deep deterministic policy gradient (DDPG) algorithm is proposed to solve the joint optimization problem of content cache, AP clustering, and DAC resolution, and it can find the global optimal decision for maximizing the EE performance in cache-assisted CF-mMIMO networks;
  • We compare and discuss the influence of the DAC resolution and the numbers of UEs and APs on the EE performance. Moreover, the proposed DDPG method is compared with benchmark methods, such as clustering based on the signal-to-interference-plus-noise ratio (SINR) and caching strategies based on content popularity. By exploiting the intelligent design, its EE is not only significantly better than those of the benchmark (BM) methods but also better than that of the DDPG method based on joint content caching and AP clustering alone.
The rest of the paper is organized as follows. In Section 2, we present the model of the cache-assisted CF-mMIMO system. In Section 3, we propose the total EE model of the cache-assisted CF-mMIMO system and formulate the optimization problem. In Section 4, we present an approach based on DRL. The simulation results and discussion are provided in Section 5. Finally, we conclude the paper in Section 6.

2. System Model

In this section, the signal model, cache model, and DAC resolution model of the cache-assisted CF-mMIMO network are introduced. The signal model describes the transmitted signal on the downlink channel of the cache-assisted CF-mMIMO network. The cache model outlines a content caching mechanism that enhances the network's EE. The low-resolution DAC model explains the power consumption incurred at different resolutions and its effect on signal transmission.

2.1. Signal Model

Figure 1 depicts an example topology of a dynamic collaborative cluster serving a UE in a cache-assisted CF-mMIMO network. We consider the downlink of a cache-assisted CF-mMIMO system comprising $M$ single-antenna APs and $K$ single-antenna UEs. Every AP is linked to the CPU via a fronthaul link, while the CPU itself connects to the core network via backhaul links. All APs and UEs are randomly distributed across an area $S_a$. Only downlink transmission is considered in this paper.
In time-division duplex (TDD) mode, all APs provide identical time/frequency resources to each terminal. Let the channel linking the m-th AP and the k-th UE be
$g_{mk} = \left( d_{mk} / d_0 \right)^{-\alpha} h_{mk} \qquad (1)$
where $d_{mk}$ denotes the distance between the m-th AP and the k-th UE, $d_0 = \min_{m,k} d_{mk}$ is the reference distance, $\alpha$ represents the path-loss exponent ($\alpha \geq 2$), and $h_{mk} \sim \mathcal{CN}(0,1)$ denotes small-scale fading.
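To make the channel model concrete, a minimal NumPy sketch is given below; the random AP/UE placement, the area size, and the path-loss exponent value are illustrative assumptions, and the function name generate_channels is ours rather than anything defined in the paper.

```python
import numpy as np

def generate_channels(M, K, area_km=1.0, alpha=2.0, rng=None):
    """Illustrative draw of g_mk = (d_mk / d_0)^(-alpha) * h_mk for all AP-UE pairs."""
    rng = np.random.default_rng() if rng is None else rng
    ap_pos = rng.uniform(0, area_km, size=(M, 2))    # random AP positions
    ue_pos = rng.uniform(0, area_km, size=(K, 2))    # random UE positions
    d = np.linalg.norm(ap_pos[:, None, :] - ue_pos[None, :, :], axis=2)  # distances d_mk
    d0 = d.min()                                     # reference distance d_0 = min_{m,k} d_mk
    # small-scale Rayleigh fading h_mk ~ CN(0, 1)
    h = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    return (d / d0) ** (-alpha) * h                  # large-scale times small-scale fading

g = generate_channels(M=10, K=5)
print(g.shape)  # (10, 5)
```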
Let $\mathcal{M}_k$ denote the set of APs serving the k-th UE and $\mathcal{K}_m$ represent the set of UEs served by the m-th AP. We assume that each UE is served by no more than $L$ ($L < M$) APs (i.e., $|\mathcal{M}_k| \leq L, \forall k$). Therefore, the set of all serving APs and the set of all served UEs can be represented as $\mathcal{M} = \bigcup_{k=1}^{K} \mathcal{M}_k$ and $\mathcal{K} = \bigcup_{m=1}^{M} \mathcal{K}_m$, respectively. Let $q_k$ be the symbol transmitted by the serving APs for the k-th UE, where $\mathbb{E}[|q_k|^2] = 1$, $\mathbb{E}[q_k] = 0, \forall k$, and $\mathbb{E}[q_k q_l^{*}] = 0, \forall k \neq l$ (i.e., the symbols of distinct UEs are uncorrelated). Then, the transmitted signal of the m-th AP can be expressed as [14]
$x_m = \sum_{k \in \mathcal{K}_m} \sqrt{p_{mk}}\, \hat{g}_{mk}^{*} q_k \qquad (2)$
where $p_{mk}$ signifies the power assigned to the k-th UE at the m-th AP subject to the power constraint $\mathbb{E}[|x_m|^2] \leq P_m$, with $P_m$ the maximum transmit power of the m-th AP, and $\hat{g}_{mk}$ denotes the estimate of the channel $g_{mk}$ at the m-th AP. This paper considers perfect CSI (i.e., $\hat{g}_{mk} = g_{mk}, \forall m, k$).
Accordingly, the k-th UE’s received signal can be expressed as [23]
$$\begin{aligned} r_k &= \sum_{m \in \mathcal{M}} g_{mk} x_m + w_k = \sum_{m \in \mathcal{M}_k} g_{mk} x_m + \sum_{m \in \mathcal{M}_k^{c}} g_{mk} x_m + w_k \\ &= \sum_{m \in \mathcal{M}_k} \sum_{k' \in \mathcal{K}_m} \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + \sum_{m \in \mathcal{M}_k^{c}} g_{mk} x_m + w_k \\ &= \underbrace{\sum_{m \in \mathcal{M}_k} \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} q_k}_{\text{useful signal}} + \underbrace{\sum_{k' \neq k} \sum_{m \in \mathcal{M}_{k'}} \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + w_k}_{\text{interference plus noise}} \end{aligned} \qquad (3)$$
where $w_k \sim \mathcal{CN}(0, \sigma_w^2)$ signifies the noise at the k-th UE, and $\mathcal{M}_k^{c} = \mathcal{M} \setminus \mathcal{M}_k$ is the set of APs that do not serve the k-th UE.

2.2. Caching Model

We consider a finite file library $\mathcal{CF} = \{cf_1, cf_2, \ldots, cf_F\}$ with $F$ content files. Let $\mathcal{CF}_m \subseteq \mathcal{CF}$ be the set of content files cached at the m-th AP. Additionally, we assume that each AP can cache at most $N$ ($N < F$) files, i.e., $|\mathcal{CF}_m| \leq N, \forall m$. Each UE requests content files independently or abandons the request. The content file requested by the k-th UE is represented by $cf_k \in \mathcal{CF}$, where $cf_k$ is determined by the content preference vector of the k-th UE (all content files arranged in descending order of preference) and the content popularity distribution (specified by the Zipf distribution). More specifically, in the content preference vector of the k-th UE, the probability that $cf_k$ equals the content file of the i-th rank is $i^{-\beta} / \sum_{j=1}^{F} j^{-\beta}$, where $\beta$ is the Zipf factor, usually set to $\beta = 0.5$, 1, or 2. Each UE possesses a distinct, independent, and time-invariant content preference vector.
We use $H_{mk}$ to denote the event that the content file requested by the k-th UE is cached at its m-th serving AP, i.e., $cf_k \in \mathcal{CF}_m$, $m \in \mathcal{M}_k$. Accordingly, the matching (hit) event of the k-th UE, $H_k$, indicates that the requested file is cached at all APs serving the k-th UE, i.e., $cf_k \in \mathcal{CF}_m, \forall m \in \mathcal{M}_k$. In case of a miss, there exist certain APs $m \in \mathcal{M}_k$ that do not cache the file of the k-th UE, i.e., $cf_k \notin \mathcal{CF}_m$. In such scenarios, these APs must request the content file $cf_k$ from the CPU/core network for joint AP transmission. The network's hit ratio is denoted as $H = \sum_{k \in \mathcal{K}} \mathbb{1}_{H_k} / |\mathcal{K}|$, where $\mathbb{1}_{H_k}$ is the indicator function, set to 1 if the event $H_k$ occurs and 0 otherwise.
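For concreteness, the request and hit-ratio model described above can be sketched as follows; the function names and the array-based bookkeeping are our own illustrative assumptions, while the Zipf probabilities and the per-cluster hit condition follow the definitions in this subsection.

```python
import numpy as np

def zipf_probs(F, beta):
    """Zipf popularity over preference ranks: p_i = i^(-beta) / sum_j j^(-beta)."""
    ranks = np.arange(1, F + 1)
    w = ranks ** (-beta)
    return w / w.sum()

def sample_requests(prefs, beta, rng):
    """prefs[k] is UE k's preference vector (file indices in descending preference)."""
    K, F = prefs.shape
    p = zipf_probs(F, beta)
    ranks = rng.choice(F, size=K, p=p)          # preference rank drawn per UE from the Zipf law
    return prefs[np.arange(K), ranks]           # map rank to a concrete file index cf_k

def hit_ratio(requests, serving_sets, cache_sets):
    """H = fraction of UEs whose requested file is cached at every serving AP."""
    hits = [all(requests[k] in cache_sets[m] for m in serving_sets[k])
            for k in range(len(requests))]
    return float(np.mean(hits))
```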

2.3. Low-Resolution DAC Model

We adopt a low-resolution DAC with a binary-weighted current-steering topology, whose power consumption comprises both a static and a dynamic component. The power consumption of a DAC module with resolution $b$ is given by [24,25]
$P_{\mathrm{DAC}}(b, F_s) = 1.5 \times 10^{-5} \cdot 2^{b} + 4.5 \times 10^{-12} \cdot b F_s \qquad (4)$
where $F_s$ is the sampling frequency.
Each AP's antenna is connected to a low-resolution DAC, and the resulting signal experiences a linear gain $\alpha \in [0, 1]$. Therefore, the transmitted signal of the m-th AP given in (2) is now modified as
$x_m = \alpha_m \sum_{k \in \mathcal{K}_m} \sqrt{p_{mk}}\, \hat{g}_{mk}^{*} q_k \qquad (5)$
where $\alpha_m$ represents the linear gain of the m-th AP, given by [22,26]
$$\alpha_m = \begin{cases} 0.6366, & b_m = 1 \\ 0.8825, & b_m = 2 \\ 1 - \dfrac{\pi \sqrt{3}}{2}\, 2^{-2 b_m}, & b_m \geq 3 \end{cases} \qquad (6)$$
The received signal of the k-th terminal given in (3) is accordingly rewritten as
$$r_k = \underbrace{\sum_{m \in \mathcal{M}_k} \alpha_m \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} q_k}_{\text{useful signal}} + \underbrace{\sum_{k' \neq k} \sum_{m \in \mathcal{M}_{k'}} \alpha_m \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + w_k}_{\text{interference plus noise}} \qquad (7)$$
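The DAC power model of Eq. (4) and the linear gain of Eq. (6) are simple enough to evaluate directly; the short sketch below does so, with the 20 MHz sampling frequency being an assumption tied to the system bandwidth rather than a value stated in this subsection.

```python
import math

def dac_power(b, Fs):
    """Static + dynamic DAC power, Eq. (4): 1.5e-5 * 2^b + 4.5e-12 * b * Fs (watts)."""
    return 1.5e-5 * 2 ** b + 4.5e-12 * b * Fs

def linear_gain(b):
    """Quantization linear gain alpha_m of Eq. (6)."""
    if b == 1:
        return 0.6366
    if b == 2:
        return 0.8825
    return 1.0 - (math.pi * math.sqrt(3) / 2) * 2 ** (-2 * b)

# e.g. a 3-bit DAC sampled at Fs = 20 MHz (assumed equal to the bandwidth)
print(dac_power(3, 20e6), linear_gain(3))
```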

3. The EE Model and Problem Formulation

3.1. The System Sum Rate

According to Shannon theory, the achievable rate of the k-th UE can be expressed as
$$R_k = B \log_2 \left( 1 + \frac{\left| \sum_{m \in \mathcal{M}_k} \alpha_m \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} \right|^2}{\sum_{k' \neq k} \left| \sum_{m \in \mathcal{M}_{k'}} \alpha_m \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} \right|^2 + \sigma_w^2} \right) \qquad (8)$$
where $B$ is the bandwidth. Therefore, the overall achievable rate of the considered cache-assisted CF-mMIMO network is given by
$R_{\mathrm{sum}} = \sum_{k \in \mathcal{K}} R_k \qquad (9)$
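A direct, if unoptimized, evaluation of Eqs. (8) and (9) could look like the sketch below; the power allocation, the cluster sets, and the convention of summing interference over the serving clusters of the other UEs are assumptions consistent with the notation above, and the helper name sum_rate is ours.

```python
import numpy as np

def sum_rate(g, alpha, p, clusters, B=20e6, noise=7.457e-13):
    """Eqs. (8)-(9): per-UE achievable rate and their sum.
    g[m, k]: channel gains, alpha[m]: DAC linear gains, p[m, k]: power allocation,
    clusters[k]: list of APs serving UE k (perfect CSI assumed, so g_hat = g)."""
    M, K = g.shape
    rates = []
    for k in range(K):
        sig = sum(alpha[m] * np.sqrt(p[m, k]) * g[m, k] * np.conj(g[m, k])
                  for m in clusters[k])
        interf = 0.0
        for kp in range(K):
            if kp == k:
                continue
            term = sum(alpha[m] * np.sqrt(p[m, kp]) * g[m, k] * np.conj(g[m, kp])
                       for m in clusters[kp])
            interf += abs(term) ** 2
        rates.append(B * np.log2(1 + abs(sig) ** 2 / (interf + noise)))
    return rates, sum(rates)
```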

3.2. Power Consumption

The overall power consumption of the network consists of four parts: (1) the transmit power of all serving APs; (2) the power consumption of the DACs in all serving APs; (3) the power required by the APs to retrieve missed content files from the CPU; and (4) the power needed by the CPU to retrieve missed content from the core network.
For (1), the total transmit power of all serving APs is $\sum_{m \in \mathcal{M}} P_m$. For (2), the sum of the DAC power consumption in all serving APs is $\sum_{m \in \mathcal{M}} P_{\mathrm{DAC}}^{m}$, where $P_{\mathrm{DAC}}^{m}$ denotes the power consumption of the DAC module with the resolution $b_m$ selected by the m-th AP.
For (3) and (4), in cache-assisted CF-mMIMO systems all APs within a cluster must concurrently transmit identical content to the terminal. Cache deployment results in the three scenarios illustrated in Figure 2: (a) every AP within the cluster has cached the required content; (b) only some APs have cached the required content; (c) none of the APs has cached the required content. In Scenario (a), no content is conveyed through either the fronthaul or the backhaul link. In Scenario (b), certain APs must obtain the content via the fronthaul link. In Scenario (c), all content is transmitted to the APs via the backhaul link from the core network and the fronthaul link from the CPU.
The fronthaul link is utilized for content transmission between the AP and the CPU; its power consumption is proportional to the cumulative SE and is expressed as [27]
$P_{bh,m} = E_{bh} \sum_{k \in \mathcal{K}_m} R_k \qquad (10)$
where $E_{bh}$ indicates the energy consumed for transmitting 1 Mbit of data over the fronthaul link.
The m-th AP transmits the data $q_1, q_2, \ldots, q_K$ received via the fronthaul/backhaul links between the CPU and the core network. Therefore, the fronthaul/backhaul power consumption depends on the SEs $\mathrm{SE}_1, \mathrm{SE}_2, \ldots, \mathrm{SE}_K$. If the m-th AP serves only specific UEs, it merely transmits data related to these UEs, so the fronthaul/backhaul power consumption is contingent solely on the SE of these UEs. As shown in Figure 2, the caching power consumption is calculated with the user as the center, so the fronthaul power consumption of the k-th cluster can be represented as
$P_{bh,k} = E_{bh} R_k \qquad (11)$
Similarly, the backhaul power generated by the k-th cluster’s backhaul link for transferring data between the core network and the CPU can be expressed as
$P_{bb,k} = E_{bb} R_k \qquad (12)$
where $E_{bb}$ indicates the energy consumed for transmitting 1 Mbit of data over the backhaul link. Therefore, the power uploaded from the APs to the CPU in the k-th cluster can be represented as
$$P_{bh,k}^{up} = \begin{cases} P_{bh,k}, & 0 < \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \,/\, |\mathcal{M}_k| < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$
where $H_{mk}^{miss}$ denotes the complementary event of $H_{mk}$, i.e., the content requested by the UE served by the m-th AP is not cached at that AP, and the indicator $|H_{mk}^{miss}| = |1 - \mathbb{1}_{H_{mk}}|$ equals 1 if the event $H_{mk}^{miss}$ occurs and 0 otherwise.
The energy consumption associated with the content requested by the AP from the CPU in the k-th cluster can be expressed as
$P_{bh,k}^{down} = P_{bh,k} \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \qquad (14)$
The backhaul power generated by the k-th cluster requesting content from the core network can be denoted as
$P_{bb,k}^{down} = P_{bb,k} \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \,/\, |\mathcal{M}_k| \qquad (15)$
Therefore, the fronthaul/backhaul power consumption of the k-th cluster can be expressed as
$P_{B,k} = P_{bh,k}^{up} + P_{bh,k}^{down} + P_{bb,k}^{down} \qquad (16)$
Therefore, the overall power consumption can be expressed as
$P_{total} = \sum_{m \in \mathcal{M}} \left( P_m + P_{\mathrm{DAC}}^{m} \right) + \sum_{k=1}^{K} P_{B,k} \qquad (17)$
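The per-cluster fronthaul/backhaul bookkeeping of Eqs. (11)-(17) can be sketched as follows; the miss-indicator lists and the requirement that rates be expressed in Mbit/s (so that Joule/Mbit energies yield watts) are our assumptions, and the helper names are hypothetical.

```python
def cluster_backhaul_power(R_k, misses, E_bh=0.25e-3, E_bb=15 * 0.25e-3):
    """Eqs. (11)-(16) for one UE-centric cluster.
    R_k: achievable rate of UE k (Mbit/s); misses: list of 0/1 miss indicators,
    one per serving AP (1 = requested file not cached at that AP)."""
    L = len(misses)
    miss_sum = sum(misses)
    P_bh_k, P_bb_k = E_bh * R_k, E_bb * R_k               # Eqs. (11)-(12)
    P_up = P_bh_k if 0 < miss_sum / L < 1 else 0.0        # Eq. (13): partial-hit upload
    P_down = P_bh_k * miss_sum                            # Eq. (14): CPU -> AP fronthaul
    P_bb_down = P_bb_k * miss_sum / L                     # Eq. (15): core network -> CPU backhaul
    return P_up + P_down + P_bb_down                      # Eq. (16)

def total_power(P_tx, P_dac, per_cluster_bh):
    """Eq. (17): transmit + DAC + fronthaul/backhaul power."""
    return sum(P_tx) + sum(P_dac) + sum(per_cluster_bh)
```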

3.3. Problem Formulation

Our aim is to find a strategy that determines the AP clusters $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$, the APs' cached content $\mathcal{CF}_1, \mathcal{CF}_2, \ldots, \mathcal{CF}_M$, and their DAC resolutions $b_1, b_2, \ldots, b_M$ so as to maximize the system's EE. The optimization problem can be expressed as
$$\max \; \frac{R_{\mathrm{sum}}}{P_{total}} \quad \text{s.t.} \quad (\mathrm{C}1): |\mathcal{M}_k| \leq L, \;\forall k, \qquad (\mathrm{C}2): |\mathcal{CF}_m| \leq N, \;\forall m, \qquad (\mathrm{C}3): b_m \in \mathbb{N}^{+}, \;\forall m \qquad (18)$$
where constraint (C1) indicates that the number of APs in each UE's cluster cannot exceed the maximum number of connections $L$, constraint (C2) requires that the number of files cached at each AP must not exceed its cache capacity $N$, and constraint (C3) means that the resolution of each DAC is a positive integer.
To maximize the EE performance, trade-offs must be made in the design. On the one hand, AP clustering based on channel quality together with high-resolution DACs selects the better channels and yields the best SE. On the other hand, AP clustering based entirely on cached content together with low-resolution DACs avoids the energy consumption of the fronthaul/backhaul links and reduces the energy consumption of the DAC modules. In addition, in large networks this problem is hard to solve because of the large numbers of APs and UEs. To address it, we develop a joint content caching, AP clustering, and DAC resolution selection strategy based on deep reinforcement learning, which is elaborated in Section 4.

4. Deep Reinforcement Learning Method

In this section, we describe how the DDPG algorithm solves the joint problem of AP clustering, caching, and DAC resolution selection. Three basic components (action, state, and reward) are defined for the reinforcement learning (RL) problem.

4.1. Action, State, and Reward

In slot $t$, the action $a_t$ encompasses clustering, caching, and resolution selection. Let $a_{mk,t} \in \{0,1\}$, $a_{mcf,t} \in \{0,1\}$, and $a_{mb,t} \in \{0,1\}$ represent, respectively, the status of the m-th AP serving the k-th UE, caching file $cf$, and enabling the b-bit resolution, where "1" indicates that the UE is served, the file is cached, or the resolution is enabled, and "0" indicates otherwise. The action $a_t$ can thus be defined as
$a_t \triangleq \{ a_t^{cl}, a_t^{ca}, a_t^{res} \} \qquad (19)$
The sets $a_t^{cl} = \{ a_{mk,t} : m \in \mathcal{M}, k \in \mathcal{K} \}$, $a_t^{ca} = \{ a_{mcf,t} : m \in \mathcal{M}, cf \in \mathcal{CF} \}$, and $a_t^{res} = \{ a_{mb,t} : m \in \mathcal{M}, b \in \mathbb{N}^{+} \}$ collect the clustering, caching, and resolution-selection results of the t-th time slot, respectively.
Similarly, the action $a_t$ uniquely determines the sets $\mathcal{CF}_m$, $b_m$, $\mathcal{M}_k$, and $\mathcal{K}_m$, i.e., $\mathcal{CF}_m = \{ cf : a_{mcf,t} = 1, cf \in \mathcal{CF} \}$, $b_m = \{ b : a_{mb,t} = 1, b \in \mathbb{N}^{+} \}$, $\mathcal{M}_k = \{ m : a_{mk,t} = 1, m \in \mathcal{M} \}$, and $\mathcal{K}_m = \{ k : a_{mk,t} = 1, k \in \mathcal{K} \}$.
The state considered in RL should be the set of information that the CPU can collect to compute the reward. In this article, the state of the t-th slot comprises the channel gains $G_t = \{ g_{mk,t} : m \in \mathcal{M}, k \in \mathcal{K} \}$, the action of the preceding time slot, and the historical record of file requests of each UE. Define the history of user requests as $e_t = \{ e_{kcf,t} : k \in \mathcal{K}, cf \in \mathcal{CF} \}$, where $e_{kcf,t} = \sum_{t'=1}^{t-1} \mathbb{1}\{ cf_{k,t'} = cf \}$ is the number of times the k-th UE has requested file $cf$ up to slot $t$. The state can therefore be denoted as
$s_t \triangleq \{ G_t, a_{t-1}, e_t \} \qquad (20)$
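A minimal encoding of the action and state as flat vectors might look like the sketch below; the one-hot layout and the use of channel magnitudes in the state are our illustrative assumptions and are not prescribed by the paper.

```python
import numpy as np

def encode_action(clusters, caches, resolutions, M, K, F, B):
    """a_t = {a^cl, a^ca, a^res} as stacked binary indicators.
    clusters[m]: list of UEs served by AP m; caches[m]: list of cached file indices;
    resolutions[m]: chosen DAC resolution in {1, ..., B}."""
    a_cl = np.zeros((M, K)); a_ca = np.zeros((M, F)); a_res = np.zeros((M, B))
    for m in range(M):
        a_cl[m, clusters[m]] = 1
        a_ca[m, caches[m]] = 1
        a_res[m, resolutions[m] - 1] = 1
    return np.concatenate([a_cl.ravel(), a_ca.ravel(), a_res.ravel()])

def encode_state(G_t, a_prev, request_counts):
    """s_t = {G_t, a_{t-1}, e_t}: channel gains (magnitudes), previous action, request history."""
    return np.concatenate([np.abs(G_t).ravel(), a_prev, request_counts.ravel()])
```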
According to the objective function of the optimization problem (18), the reward function of the t-th slot is defined as
$r(s_t, a_t) \triangleq \frac{R_{\mathrm{sum},t}}{P_{total,t}} \qquad (21)$
where $R_{\mathrm{sum},t}$ and $P_{total,t}$ are given in (9) and (17), respectively, with the extra subscript $t$ emphasizing their dynamic behavior. It is worth noting that the total achievable rate $R_{\mathrm{sum},t}$ depends on the channel conditions $G_t$, the clustering result $a_t^{cl}$, and the resolution-selection result $a_t^{res}$, while the overall power $P_{total,t}$ depends on the caching result $a_t^{ca}$ and the resolution-selection result $a_t^{res}$.

4.2. Deep Deterministic Policy Gradient Approach

The DDPG algorithm utilizes an actor–critic network architecture. Moreover, each network is accompanied by its respective target network, resulting in a total of four networks within the DDPG algorithm, namely the actor network $\mu(\cdot|\theta^{\mu})$, the critic network $Q(\cdot|\theta^{Q})$, the target actor network $\mu'(\cdot|\theta^{\mu'})$, and the target critic network $Q'(\cdot|\theta^{Q'})$. Each network updates according to its own rule so as to maximize the cumulative expected return. Figure 3 gives the schematic diagram of the DDPG algorithm.
The DDPG algorithm is well-suited for multi-task learning, aligning with the objectives of this paper. It enhances training stability by adopting a deterministic policy, which directly outputs a specific action value instead of a probability distribution. The algorithm is trained with an experience replay buffer that stores past experiences and is sampled randomly; this breaks data correlations and makes the samples approximately independent, thereby reducing the variance of parameter updates and speeding up convergence. Additionally, experiences can be reused, resulting in high data utilization. DDPG leverages neural networks to represent the policy (actor) and the value function (critic), making it suitable for high-dimensional state spaces and capable of learning from vast amounts of perceptual data. In comparison to the widely used DQN, DDPG is particularly apt for continuous action spaces. Furthermore, employing an actor network improves training efficiency, and the target actor and target critic networks help prevent the overestimation issues present in DQN.
Algorithm 1 primarily updates the parameters of the actor network and the critic network. The actor network adjusts the weights $\theta^{\mu}$ by maximizing the cumulative expected reward. The critic network adjusts the weights $\theta^{Q}$ by minimizing the discrepancy between the evaluated value and the target value. The target networks are updated with a soft-update method, also called an exponential moving average: a learning rate (or momentum) $\tau$ is introduced, and the weighted average of the previous target network parameters and the current corresponding network parameters is applied to update the target network. Algorithm 1 summarizes the whole DDPG procedure.
Algorithm 1 DDPG Algorithm Procedure
1: Initialize the actor and critic network parameters $\theta^{\mu}$ and $\theta^{Q}$
2: Set the same parameters $\theta^{\mu'}$ and $\theta^{Q'}$ in the target networks
3: for episode = 1 to Episode do
4:   for timeslot = 1 to T do
5:     Generate action $a_t$ through the actor network $\mu(s_t|\theta^{\mu})$
6:     Obtain the reward $r(s_t, a_t)$ and the next state $s_{t+1}$ according to the action $a_t$
7:     Obtain the evaluated value $q$ through the critic network $Q(s_t, a_t|\theta^{Q})$
8:     Use the target networks $Q'(s_{t+1}, a_{t+1}|\theta^{Q'})$ to obtain the target value $y$
9:     Determine the gradients from the evaluated value $q$ of the actor–critic networks and the target value $y$
10:    Update the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the actor and critic networks according to the gradients
11:    Update the parameters $\theta^{\mu'}$ and $\theta^{Q'}$ of the target networks according to the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the actor and critic networks and the learning rate $\tau$
12:  end for
13: end for
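As a rough illustration of lines 5-11 of Algorithm 1, a single DDPG update step is sketched below in PyTorch; the network interfaces (a critic taking concatenated state-action vectors), the mean-squared-error critic loss, and the hyperparameter values are standard DDPG choices assumed here rather than details reported in the paper.

```python
import torch
import torch.nn as nn

def soft_update(target, source, tau):
    """theta' <- tau * theta + (1 - tau) * theta' (the soft update in line 11 of Algorithm 1)."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

def ddpg_step(actor, critic, target_actor, target_critic,
              actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    """One DDPG update from a replay-buffer batch (s, a, r, s_next)."""
    s, a, r, s_next = batch
    with torch.no_grad():
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))  # target value y
    q = critic(torch.cat([s, a], dim=1))                                   # evaluated value q
    critic_loss = nn.functional.mse_loss(q, y)        # minimize the gap between q and y
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()  # maximize expected return
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    soft_update(target_actor, actor, tau)             # exponential moving average of parameters
    soft_update(target_critic, critic, tau)
```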

4.3. Computational Complexity

In the DDPG algorithm, the input dimension of the neural network is $Input = M(3K + N + 1) + KN$, the output dimension is $Output = M(K + N + b)$, and the number of model parameters is $Number = 5 \cdot Input \cdot (Input + 1) + 9 \cdot Output \cdot (Output + 1) + 10 \cdot Input \cdot Output$, determined by the neural network's layer count and layer sizes. The experience pool holds $Batch = 128$ tuples of states, actions, rewards, and next states, resulting in a complexity of $O(Batch \cdot (2K(M + N + b) + K(M + N + 1) + 1) + Number)$. The decision-making process for actions has a time complexity of $O(K + 2M)$, leading to an overall complexity of $O(timeslot \cdot (Number + K + 2M))$, where $timeslot$ stands for the number of training iterations.
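For reference, the dimension formulas above can be evaluated numerically; the small helper below simply plugs in values and is ours, with the arguments M = 10, K = 5, N = 2, b = 5 taken from the simulation settings of Section 5 (b being interpreted as the number of candidate resolutions).

```python
def ddpg_dimensions(M, K, N, b):
    """Illustrative evaluation of the dimension formulas in Section 4.3."""
    inp = M * (3 * K + N + 1) + K * N                 # Input = M(3K + N + 1) + KN
    out = M * (K + N + b)                             # Output = M(K + N + b)
    num = 5 * inp * (inp + 1) + 9 * out * (out + 1) + 10 * inp * out
    return inp, out, num

print(ddpg_dimensions(10, 5, 2, 5))
```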

5. Simulation Results

5.1. Simulation Settings

In this section, we compare and analyze (1) the EE performance of the proposed RL method against three BM strategies (called BM1, BM2, and BM3), (2) the convergence behavior of the DDPG algorithm, (3) the effect of the DAC resolution on the EE, (4) the impact of the number of APs associated with each UE on the EE, and (5) the influence of the numbers of UEs and APs on the EE. The BM strategies are given as follows:
  • BM1: clustering policy based on the SINR (the $l \leq L$ APs to which the k-th UE connects are those with the highest SINR) and caching policy based on local popularity (among the UEs served by the m-th AP, the N most popular files are cached at the m-th AP);
  • BM2: clustering strategy based on SINR (same as BM1) and caching strategy based on network popularity (caching the N most popular files across all UEs in all APs);
  • BM3: cache-based clustering strategy (each UE connects to the $l \leq L$ APs whose caches best match that UE's content request in the previous slot) and network-popularity-based caching strategy (same as BM2).
The action-decision complexity of all BM strategies is $O(K(M + N + 1))$, while their overall time complexities differ: BM2 has the minimum of $O((M \log_2 M)(K N \log_2 N))$, BM3 follows with $O((K M \log_2 M)(K N \log_2 N))$, and BM1 has the maximum of $O((M \log_2 M)(M K N \log_2 N))$. Although the complexity of the BM strategies is lower than that of the DDPG algorithm, the optimization performance of the DDPG algorithm is much better.
We consider a scenario in which the APs and UEs are randomly distributed within a region of $S_a = 1\ \mathrm{km}^2$, with one AP located at the reference coordinate $(0, 0)$. The positions of the UEs and APs remain constant during the training phase. We set the numbers of APs and UEs to $M = 10$ and $K = 5$, respectively, the cache size of each AP to $|\mathcal{CF}_m| \leq 2$, the number of files to $F = 10$, and the DAC resolution to $b_m \leq 5$. Refer to [14,28,29,30,31,32] for the other system settings and parameters, which are summarized in Table 1.

5.2. Numerical Results Analysis

Figure 4 shows the convergence of RL+DAC and RL in Algorithm 1, where the EE values versus training episodes are demonstrated. The curves are obtained by combining the results of 1000 training runs. Taking the BM strategies evaluated at the 10th training episode as references, the EE of the proposed RL+DAC and RL schemes exceeds that of all BM strategies after about the 26th episode, and RL+DAC completely outperforms the RL algorithm after about the 150th episode. Note that in Algorithm 1, RL+DAC allows each AP to employ a distinct DAC resolution, which sacrifices some computational time in exchange for improved performance, whereas RL employs the same DAC resolution for all APs, thereby reducing the algorithm's complexity.
Figure 5 illustrates the impact on the total EE of changes in the relative positions of APs and UEs over time. It can be observed that the total EE of the RL+DAC and RL algorithms is always better than that of the BM strategies. In the BM schemes, we find that when each UE is served by a single AP, at moments 0, 3, 4, 7, 8, 10, and 11, the total EE values are higher than at the other moments, whereas when each UE is served by three APs, at moments 1, 5, 6, and 9, the total EE is lower. This means that a UE does not obtain better EE performance simply by choosing more serving APs.
Figure 6 depicts the correlation between the number of UEs and the mean SE, where the number of APs is M = 10 . As illustrated in Figure 6, the mean SE of UEs diminishes as the quantity of UEs increases and then tends to be stable. This phenomenon arises due to the escalation in the quantity of UEs, leading to a gradual intensification of inter-UE interference. Ultimately, the average SE will become stable. Furthermore, the upsurge in the number of UEs will result in a greater number of UEs being served by the APs. Consequently, this necessitates APs to make trade-offs when selecting DAC resolution within the RL+DAC algorithm, significantly diminishing the SE improvement for UEs. When there is an abundance of UEs, this effect becomes nearly equivalent to the average SE achieved when each AP in the RL algorithm utilizes the same low resolution.
Figure 7 explores the influence of the number of UEs on the total EE, where the number of APs is $M = 10$. As observed in Figure 7, the total EE decreases as the number of UEs increases and then tends to stabilize. This is because, in the early stage, the growth in the number of UEs is approximately proportional to the energy consumed by the system, while inter-UE interference slows the growth of the sum achievable rate, so the total EE keeps declining. When the number of UEs is large, all APs are already serving UEs, and adding more UEs does not increase the power consumed by AP activity, so the increase in total power consumption is smaller and the total EE tends to level off. This also indirectly validates the result of Figure 6: increasing the number of UEs does not always improve the overall system performance. In other words, the more UEs there are, the more significant the inter-UE interference becomes. Note that the total EE for $K = 3$ UEs in Figure 7 is lower than that for $K = 4$. This is due to the different locations of APs and UEs, which lead to different channel conditions, so the total EE fluctuates within a certain range for a given number of UEs; the smaller the number of UEs, the greater the fluctuation caused by the location differences.
Figure 8 shows the relationship between the number of APs and the sum achievable rate, where the number of UEs is set to $K = 5$. It is readily noticeable that the sum achievable rate first increases with the number of APs and then tends to stabilize. This is because, when the number of APs is small, each UE selects the APs with better channel conditions, so the rate can be increased. Nevertheless, when the number of APs is large, further increasing it gradually intensifies the interference between APs. Consequently, when the number of APs is already relatively high, adding more APs no longer increases the sum achievable rate and may even decrease it. Moreover, as depicted in Figure 8, the curves for BM1 ($l = 3$) and BM2 ($l = 3$) overlap because they share the same clustering policy and differ only in their caching strategies, and the sum achievable rate depends solely on the SINR and is not influenced by caching.
Figure 9 shows the impact of the number of APs on the total EE, where the number of UEs is $K = 5$. The simulation also indirectly verifies the result of Figure 8, i.e., more APs do not necessarily yield better overall system performance. Figure 9 also shows that the total EE of the system is highest when the number of APs is 16. This is because, as the number of APs increases, so does their power consumption, and once the sum achievable rate grows only slowly, the total EE begins to decrease. In addition, note that the sum achievable rate and total EE at 20 APs in Figures 8 and 9 do not strictly follow the trend; this is caused by fluctuations due to the randomness of the AP and UE positions.
In Figure 10, the impact of the low-resolution DAC on the total EE is demonstrated. It can be observed that when $b \geq 6$, the total EE decreases as the resolution $b$ increases. This means that the resolution $b$ achieves better total EE performance in the interval $[1, 5]$, and the RL+DAC scheme in the figure, with $b_m \leq 5, \forall m$, attains the best total EE. This also validates the choice of limiting the resolution range to $b \leq 5$ in our RL+DAC algorithm design. In addition, as the resolution $b$ increases, the total EE decreases ever faster, because in Formula (4) part of the DAC module's power consumption grows exponentially with $b$, while according to Formulas (6) and (8) the sum achievable rate becomes essentially stable for $b > 5$.

6. Conclusions

In this paper, an innovative and practical total EE model of a cache-assisted CF-mMIMO system is established. To maximize the total EE, an energy-efficient joint design of content cache, AP clustering, and low-resolution DAC is carried out, and then, a DRL algorithm (i.e., DDPG method) is proposed. Numerical results show that the total EE of the RL+DAC strategy considering DAC resolution is generally 4% higher than that of the RL strategy, and the total EE of these strategies is much higher than those of the BM strategies. In addition, it can be expected that for multi-antenna APs, the number of DAC modules increases linearly with the increase in the quantity of antennas, so the total EE of our proposed RL+DAC strategy will be much higher than that of the RL strategy.

Author Contributions

Conceptualization, F.T. and Y.P.; methodology, Y.P. and F.T.; software, Y.P.; validation, F.T. and Q.L.; writing—original draft preparation, F.T. and Y.P.; writing—review and editing, Y.P., F.T. and Q.L.; supervision, F.T. and Q.L.; project administration, F.T. and Q.L.; funding acquisition, F.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the National Natural Science Foundation of China under Grant 62261013 and in part by the Director Foundation of Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing under Grant GXKL06220104.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AP    Access Point
EE    Energy Efficiency
mMIMO    Massive Multiple-Input Multiple-Output
CF-mMIMO    Cell-Free Massive Multiple-Input Multiple-Output
DDPG    Deep Deterministic Policy Gradient
BS    Base Station
DAC    Digital-to-Analog Converter
UE    User Equipment
CPU    Central Processing Unit
BM    Benchmark
SE    Spectral Efficiency
DRL    Deep Reinforcement Learning
CSI    Channel State Information

References

  1. Rydning, J. IDC Worldwide Global DataSphere IoT Device and Data Forecast, 2019–2023; IDC: Needham, MA, USA, 2019; p. 3.
  2. Choi, N.; Guan, K.; Kilper, D.C.; Atkinson, G. In-network caching effect on optimal energy consumption in content-centric networking. In Proceedings of the IEEE International Conference on Communications, Ottawa, ON, Canada, 10–15 June 2012; pp. 2889–2894.
  3. Llorca, J.; Tulino, A.M.; Guan, K.; Esteban, J.; Varvello, M.; Choi, N.; Kilper, D.C. Dynamic in-network caching for energy efficient content delivery. In Proceedings of the 32nd IEEE International Conference on Computer Communications, Turin, Italy, 14–19 April 2013; pp. 245–249.
  4. Li, J.; Liu, B.; Wu, H. Energy-efficient in-network caching for content-centric networking. IEEE Commun. Lett. 2013, 17, 797–800.
  5. Golrezaei, N.; Shanmugam, K.; Dimakis, A.G.; Molisch, A.F.; Caire, G. FemtoCaching: Wireless video content delivery through distributed caching helpers. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications, Orlando, FL, USA, 25–30 March 2012; pp. 1107–1115.
  6. Bastug, E.; Bennis, M.; Debbah, M. Living on the edge: The role of proactive caching in 5G wireless networks. IEEE Commun. Mag. 2014, 52, 82–89.
  7. Xu, Y.; Li, Y.; Wang, Z.; Lin, T.; Zhang, G.; Ci, S. Coordinated caching model for minimizing energy consumption in radio access network. In Proceedings of the IEEE International Conference on Communications, Sydney, Australia, 10–14 June 2014; pp. 2406–2411.
  8. Poularakis, K.; Iosifidis, G.; Sourlas, V.; Tassiulas, L. Multicast-aware caching for small cell networks. In Proceedings of the IEEE Wireless Communications and Networking Conference, Istanbul, Turkey, 6–9 April 2014; pp. 2300–2305.
  9. Rusek, F.; Persson, D.; Lau, B.K.; Larsson, E.G.; Marzetta, T.L.; Edfors, O.; Tufvesson, F. Scaling up MIMO: Opportunities and challenges with very large arrays. IEEE Signal Process. Mag. 2013, 30, 40–60.
  10. Lopezperez, D.; Roche, G.; Kountouris, M.; Quek, T.; Jie, Z. Enhanced inter-cell interference coordination challenges in heterogeneous networks. IEEE Wirel. Commun. 2011, 18, 22–30.
  11. Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 2014, 52, 186–195.
  12. Gesbert, D.; Hanly, S.; Huang, H.; Shitz, S.S.; Simeone, O.; Yu, W. Multi-cell MIMO cooperative networks: A new look at interference. IEEE J. Sel. Areas Commun. 2010, 28, 1380–1408.
  13. Ammar, H.A.; Adve, R.; Shahbazpanahi, S. User-centric cell-free massive MIMO networks: A survey of opportunities, challenges and solutions. IEEE Commun. Surv. Tutor. 2022, 24, 611–652.
  14. Ngo, H.Q.; Ashikhmin, A.; Yang, H.; Larsson, E.G.; Marzetta, T.L. Cell-free massive MIMO versus small cells. IEEE Trans. Wirel. Commun. 2017, 16, 1834–1850.
  15. Wang, K.; Chen, Z.; Liu, H. Push-based wireless converged networks for massive multimedia content delivery. IEEE Trans. Wirel. Commun. 2014, 13, 2894–2905.
  16. Peng, M.; Sun, Y.; Li, X.; Mao, Z.; Wang, C. Recent advances in cloud radio access networks: System architectures, key techniques, and open issues. IEEE Commun. Surv. Tutor. 2016, 18, 2282–2308.
  17. Chen, S.; Zhang, J.; Björnson, E.; Wang, S.; Xing, C.; Ai, B. Wireless caching: Cell-free versus small cells. In Proceedings of the IEEE/CIC International Conference on Communications in China, Xiamen City, China, 28–30 July 2021; pp. 1–6.
  18. Peng, Y.; Tan, F.; Liu, Q. Energy-efficient content caching strategy in cell-free massive MIMO networks with reinforcement learning. In Proceedings of the 2023 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Beijing, China, 14–16 June 2023; pp. 1–3.
  19. Nguyen, M.-H.T.; Bui, T.T.; Nguyen, L.D. Real-time optimized clustering and caching for 6G satellite-UAV-terrestrial networks. IEEE Trans. Intell. Transp. Syst. 2023, 1–11.
  20. Chaowei, W.; Ziye, W.; Lexi, X. Collaborative caching in vehicular edge network assisted by cell-free massive MIMO. Chin. J. Electron. 2023, 33, 1–13.
  21. Chuang, Y.-C.; Chiu, W.-Y.; Chang, R.Y.; Lai, Y.-C. Deep reinforcement learning for energy efficiency maximization in cache-enabled cell-free massive MIMO networks: Single- and multi-agent approaches. IEEE Trans. Veh. Technol. 2023, 72, 10826–10839.
  22. Zhang, Y.; Zhou, M.; Qiao, X.; Cao, H.; Yang, L. On the performance of cell-free massive MIMO with low-resolution ADCs. IEEE Access 2019, 7, 117968–117977.
  23. Chang, R.Y.; Han, S.-F.; Chien, F.-T. Reinforcement learning based joint cooperation clustering and content caching in cell-free massive MIMO networks. In Proceedings of the 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), Norman, OK, USA, 27–30 September 2021; pp. 1–7.
  24. Cui, S.; Goldsmith, A.J.; Bahai, A. Energy-constrained modulation optimization. IEEE Trans. Wirel. Commun. 2005, 4, 2349–2360.
  25. Ribeiro, L.N.; Schwarz, S.; Rupp, M.; de Almeida, A.L.F. Energy efficiency of mmWave massive MIMO precoding with low-resolution DACs. IEEE J. Sel. Top. Signal Process. 2018, 12, 298–312.
  26. Zhang, J.; Dai, L.; He, Z.; Jin, S.; Li, X. Performance analysis of mixed-ADC massive MIMO systems over Rician fading channels. IEEE J. Sel. Areas Commun. 2017, 35, 1327–1338.
  27. Ngo, H.Q.; Tran, L.-N.; Duong, T.Q.; Matthaiou, M.; Larsson, E.G. On the total energy efficiency of cell-free massive MIMO. IEEE Trans. Green Commun. Netw. 2018, 2, 25–39.
  28. Sadeghi, A.; Sheikholeslami, F.; Giannakis, G.B. Optimal and scalable caching for 5G using reinforcement learning of space-time popularities. IEEE J. Sel. Top. Signal Process. 2018, 12, 180–190.
  29. Yang, C.; Yao, Y.; Chen, Z.; Xia, B. Analysis on cache-enabled wireless heterogeneous networks. IEEE Trans. Wirel. Commun. 2016, 15, 131–145.
  30. Zhong, C.; Gursoy, M.C.; Velipasalar, S. Deep multi-agent reinforcement learning based cooperative edge caching in wireless networks. In Proceedings of the IEEE International Conference on Communications, Shanghai, China, 20–24 May 2019; pp. 1–6.
  31. Björnson, E.; Sanguinetti, L. Scalable cell-free massive MIMO systems. IEEE Trans. Commun. 2020, 68, 4247–4261.
  32. Zhang, H.; Li, H.; Liu, T.; Dong, L.; Shi, G.; Gao, X. Lower energy consumption in cache-aided cell-free massive MIMO systems. Digit. Signal Process. 2023, 135, 103936.
Figure 1. Caching-assisted cell-free massive MIMO model.
Figure 2. Three scenarios in which content is requested during transmission. Green arrows represent the access links between APs and UEs; red arrows represent the backhaul/fronthaul links between the core network and the CPU or between the APs and the CPU.
Figure 3. The principle of the DDPG algorithm.
Figure 4. The convergence of Algorithm 1.
Figure 5. The total EE versus time.
Figure 6. The relationship between the number of UEs and the average SE.
Figure 7. The relationship between the number of UEs and the total EE.
Figure 8. The relationship between the number of APs and the sum achievable rate.
Figure 9. The relationship between the number of APs and the total EE.
Figure 10. The total EE versus the DAC resolution.
Table 1. The simulation parameters.
Parameter | Value
Bandwidth $B$ | 20 MHz
Maximum DL transmit power $P_m$ | 1000 mW
Energy consumption of fronthaul link $E_{bh}$ | $0.25 \times 10^{-3}$ Joule/Mbit
Energy consumption of backhaul link $E_{bb}$ | $15 E_{bh}$
Thermal noise power per UE $\sigma_w^2$ | $7.457 \times 10^{-13}$ W
Path-loss exponent $\alpha$ | 2
Zipf distribution factor $\beta$ | 1