Article

Cluster Content Caching: A Deep Reinforcement Learning Approach to Improve Energy Efficiency in Cell-Free Massive Multiple-Input Multiple-Output Networks

1 Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing, Guilin University of Electronic Technology, Guilin 541004, China
2 College of Electronic and Information Engineering, Shandong University of Science and Technology, Qingdao 266590, China
* Authors to whom correspondence should be addressed.
Sensors 2023, 23(19), 8295; https://doi.org/10.3390/s23198295
Submission received: 4 September 2023 / Revised: 1 October 2023 / Accepted: 3 October 2023 / Published: 7 October 2023
(This article belongs to the Special Issue 6G Space-Air-Ground Communication Networks and Key Technologies)

Abstract: With the explosive growth of micro-video applications, the transmission burden on fronthaul and backhaul links keeps increasing, and considerable energy consumption is incurred at the same time. To reduce energy consumption and relieve the transmission delay burden, we consider a cell-free massive multiple-input multiple-output (CF-mMIMO) system in which caches at the access points (APs) are used to reduce the load on these links. In this paper, a total energy efficiency (EE) model of a cache-assisted CF-mMIMO system is established. When optimizing the EE, forming the co-operation cluster is critical. Therefore, we propose an energy-efficient joint design of content caching, AP clustering, and low-resolution digital-to-analog converter (DAC) selection in a cache-assisted CF-mMIMO network based on deep reinforcement learning. This scheme can effectively cache content at the APs and select the appropriate DAC resolution. Then, taking into account the channel state information and the user equipment (UE)'s content request preferences, a deep deterministic policy gradient algorithm is used to jointly optimize the caching strategy, AP clustering, and DAC resolution decisions. Simulation results show that the energy efficiency of the proposed scheme is 4% higher than that of the scheme without resolution optimization and is much higher than that of AP clustering alone without the joint design of content caching and channel quality.

1. Introduction

Due to the rapid development of smart devices such as smart phones, smart watches, smart robots, and drones, mobile data traffic on wireless networks has experienced tremendous growth. IDC estimates that by 2023, there will be 48.9 billion connected devices worldwide [1]. Such a large number of devices will not only generate exabytes of data but will also request massive amounts of content, creating unprecedented challenges for upcoming communication systems. The capacity of the backhaul link has become the bottleneck of data-intensive networks, and an efficient way to reduce the backhaul load is needed to meet the rapidly growing demand for mobile communication.
Caching is a well-known technique for improving the performance of numerous wired networks, such as content-centric networks [2,3,4]. In cellular networks, caching frequently requested content at the edge of the network can reduce backhaul costs, access latency, and power consumption while increasing throughput. In [5], it is proposed to replace the backhaul link by caching at base stations (BSs); by optimizing the caching strategy, more users can be served within the download-time limit, which significantly increases throughput. In [6], caching at the BS is shown to lighten the backhaul traffic load. To minimize the overall energy consumption attributed to caching and data transmission, including inter-BS and BS-to-server communications, Ref. [7] optimizes the allocation of cache sizes for BSs and service gateways. With the goal of reducing the overall energy consumption of the service, the caching strategy is fine-tuned in [8], where the influence of multicast transmission is considered.
At the same time, to meet the rising traffic demand, base stations with ever more antennas and ever smaller cell radii inevitably cause more inter-cell interference. Many research efforts have been made to diminish this interference [9,10]. Two primary approaches exist: massive multiple-input multiple-output (mMIMO) systems [11] and distributed systems [12]. In mMIMO systems, the BS exploits the spatial multiplexing provided by an extensive antenna array; combined with precoding, this can substantially and efficiently mitigate both intra-cell and inter-cell interference among user equipment (UEs). Nonetheless, uniform service quality across all terminals is not guaranteed: a terminal near the BS enjoys better service thanks to good channel conditions, while a terminal at the cell edge receives inadequate quality of service. In a distributed system, multiple BSs or access points (APs) collaborate by exchanging service data and channel state information (CSI) to minimize inter-cell interference. However, distributed systems remain centered around individual cells; multi-cell collaboration essentially extends the coverage of a single cell, and edge effects continue to affect UEs at the cell periphery. Therefore, the cell-free massive multiple-input multiple-output (CF-mMIMO) technique has been introduced [13,14], which combines the strengths of the aforementioned two systems, namely robust interference cancellation and macro diversity gain. Additionally, it makes two enhancements: (1) it shifts from a cell-centric to a UE-centric service model, allowing potential overlap between distinct AP clusters; and (2) it deploys a substantial number of APs with wide coverage that are closer to the terminals, thereby completely eliminating the concept of a cell.
In essence, the CF-mMIMO system moves the APs of an mMIMO system closer to the terminals through the integration of fronthaul links and more frequent utilization of the backhaul links. This sharply increases the link load in the CF-mMIMO system, which inevitably results in elevated energy consumption. Therefore, traffic congestion on the fronthaul/backhaul links and high transmission energy consumption constitute the bottlenecks that impede the practical implementation of CF-mMIMO systems. Content caching proactively stores data in cache devices and transmits it directly to the terminals during peak hours, without the need to obtain data from the central processing unit (CPU) and core network via the fronthaul/backhaul links. Because the requested content is concentrated in a limited number of popular files [15] and the cost of caching keeps decreasing [16], content caching proves to be a cost-effective technique for lessening the burden on the links. Building upon this notion, a cache-assisted CF-mMIMO system is introduced in [17]. Moreover, we proposed an energy-efficient content caching strategy for CF-mMIMO systems in [18], but only research ideas were provided, without experimental validation. However, the total energy efficiency (EE) maximization problem of such systems is non-deterministic polynomial-hard (NP-hard) and would necessitate inefficient and non-scalable solution methods. Additionally, researchers have started to consider the joint optimization of user association and caching strategies [19,20,21]. For example, in [19], a high-density satellite-UAV-terrestrial network scenario is considered, and the combinatorial optimization problem is effectively solved using game theory and a genetic algorithm for clustering and cache placement, respectively. In [20], for a CF-mMIMO-assisted vehicular edge network, a Deep-Q-Network (DQN) algorithm is proposed to optimize the caching decision and thereby improve the network capacity and content delivery performance. Moreover, two deep reinforcement learning (DRL) methods, single-agent and multi-agent reinforcement learning, are proposed in [21] to solve the joint optimization of user association and content caching in CF-mMIMO. However, most existing research focuses on content caching strategies for edge caching without considering AP clustering strategies.
On the other hand, a large number of high-resolution analog-to-digital converter (ADC) and digital-to-analog converter (DAC) modules consume substantial power. To avoid this, low-resolution ADCs (1–3 bits) are recommended for CF-mMIMO networks; this trade-off reduces power consumption at the cost of some spectral efficiency (SE). The work in [22] shows that low-resolution ADCs achieve better EE than high-resolution ADCs in the uplink of CF-mMIMO systems.
Creating a practical model for the total EE of a cache-assisted CF-mMIMO system, one that is straightforward to calculate and analyze while also being amenable to effective optimization, poses a significant challenge. To date, little research has been conducted on cache-assisted CF-mMIMO systems, which motivates this study. The primary contributions of this paper can be outlined as follows:
  • In this paper, a new total EE model of a cache-assisted CF-mMIMO system is established, which has the following advantages: the introduction of low-resolution DACs can improve the EE; UE-centric cache deployment provides a better user experience; and, by considering the influence of converters with different resolutions on the EE, the model is more suitable for practical use;
  • A deep deterministic policy gradient (DDPG) algorithm is proposed to solve the joint optimization problem of content cache, AP clustering, and DAC resolution, and it can find the global optimal decision for maximizing the EE performance in cache-assisted CF-mMIMO networks;
  • We compare and discuss the influence of the DAC resolution and the numbers of UEs and APs on the EE performance. Moreover, the proposed DDPG method is compared with benchmark methods, such as clustering based on the signal-to-interference-plus-noise ratio (SINR) and caching strategies based on content popularity. By exploiting the intelligent design, its EE is not only significantly better than those of the benchmark (BM) methods but also better than that of the DDPG method based on joint content caching and AP clustering alone.
The rest of the paper is organized as follows. In Section 2, we present the model of the cache-assisted CF-mMIMO system. In Section 3, we propose the total EE model of the cache-assisted CF-mMIMO system and formulate the optimization problem. In Section 4, we present an approach based on DRL. The simulation results and discussion are provided in Section 5. Finally, we conclude the paper in Section 6.

2. System Model

In this section, the signal model, cache model, and DAC resolution model of the cache-assisted CF-mMIMO network are introduced. The signal model describes the transmitted signal on the downlink channel of the cache-assisted CF-mMIMO network. The cache model outlines a content caching mechanism that enhances the network's EE. The low-resolution DAC model explains the power consumption incurred at different resolutions and its effect on signal transmission.

2.1. Signal Model

Figure 1 depicts an example topology of a dynamic collaborative cluster serving a UE in a cache-assisted CF-mMIMO network. We consider the downlink of a cache-assisted CF-mMIMO system comprising $M$ single-antenna APs and $K$ single-antenna UEs. Every AP is linked to the CPU via a fronthaul link, while the CPU itself connects to the core network via backhaul links. All APs and UEs are randomly distributed across an area $S_a$. Only downlink transmission is considered in this paper.
In time-division duplex (TDD) mode, all APs provide identical time/frequency resources to each terminal. Let the channel linking the m-th AP and the k-th UE be
$g_{mk} = \left( d_{mk} / d_0 \right)^{-\alpha} h_{mk} \qquad (1)$
where $d_{mk}$ denotes the distance between the m-th AP and the k-th UE, $d_0 = \min_{m,k} d_{mk}$ is the reference distance, $\alpha$ represents the path-loss exponent ($\alpha \geq 2$), and $h_{mk} \sim \mathcal{CN}(0,1)$ denotes small-scale fading.
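To make the channel model concrete, a minimal NumPy sketch is given below; the random AP/UE placement, the area size, and the path-loss exponent value are illustrative assumptions, and the function name generate_channels is ours rather than anything defined in the paper.

```python
import numpy as np

def generate_channels(M, K, area_km=1.0, alpha=2.0, rng=None):
    """Illustrative draw of g_mk = (d_mk / d_0)^(-alpha) * h_mk for all AP-UE pairs."""
    rng = np.random.default_rng() if rng is None else rng
    ap_pos = rng.uniform(0, area_km, size=(M, 2))    # random AP positions
    ue_pos = rng.uniform(0, area_km, size=(K, 2))    # random UE positions
    d = np.linalg.norm(ap_pos[:, None, :] - ue_pos[None, :, :], axis=2)  # distances d_mk
    d0 = d.min()                                     # reference distance d_0 = min_{m,k} d_mk
    # small-scale Rayleigh fading h_mk ~ CN(0, 1)
    h = (rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))) / np.sqrt(2)
    return (d / d0) ** (-alpha) * h                  # large-scale times small-scale fading

g = generate_channels(M=10, K=5)
print(g.shape)  # (10, 5)
```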
Let $\mathcal{M}_k$ denote the set of APs serving the k-th UE and $\mathcal{K}_m$ represent the set of UEs served by the m-th AP. We assume that each UE is served by no more than $L$ ($L < M$) APs (i.e., $|\mathcal{M}_k| \leq L, \forall k$). Therefore, the set of all serving APs and the set of all served UEs can be represented as $\mathcal{M} = \bigcup_{k=1}^{K} \mathcal{M}_k$ and $\mathcal{K} = \bigcup_{m=1}^{M} \mathcal{K}_m$, respectively. Let $q_k$ be the symbol transmitted by the serving APs for the k-th UE, where $\mathbb{E}[|q_k|^2] = 1$, $\mathbb{E}[q_k] = 0, \forall k$, and $\mathbb{E}[q_k q_l^{*}] = 0, \forall k \neq l$ (i.e., the symbols of distinct UEs are uncorrelated). Then, the transmitted signal of the m-th AP can be expressed as [14]
$x_m = \sum_{k \in \mathcal{K}_m} \sqrt{p_{mk}}\, \hat{g}_{mk}^{*} q_k \qquad (2)$
where $p_{mk}$ signifies the power assigned to the k-th UE at the m-th AP subject to the power constraint $\mathbb{E}[|x_m|^2] \leq P_m$, with $P_m$ the maximum transmit power of the m-th AP, and $\hat{g}_{mk}$ denotes the estimate of the channel $g_{mk}$ at the m-th AP. This paper considers perfect CSI (i.e., $\hat{g}_{mk} = g_{mk}, \forall m, k$).
Accordingly, the k-th UE’s received signal can be expressed as [23]
$$\begin{aligned} r_k &= \sum_{m \in \mathcal{M}} g_{mk} x_m + w_k = \sum_{m \in \mathcal{M}_k} g_{mk} x_m + \sum_{m \in \mathcal{M}_k^{c}} g_{mk} x_m + w_k \\ &= \sum_{m \in \mathcal{M}_k} \sum_{k' \in \mathcal{K}_m} \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + \sum_{m \in \mathcal{M}_k^{c}} g_{mk} x_m + w_k \\ &= \underbrace{\sum_{m \in \mathcal{M}_k} \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} q_k}_{\text{useful signal}} + \underbrace{\sum_{k' \neq k} \sum_{m \in \mathcal{M}_{k'}} \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + w_k}_{\text{interference plus noise}} \end{aligned} \qquad (3)$$
where $w_k \sim \mathcal{CN}(0, \sigma_w^2)$ signifies the noise at the k-th UE, and $\mathcal{M}_k^{c} = \mathcal{M} \setminus \mathcal{M}_k$ is the set of APs that do not serve the k-th UE.

2.2. Caching Model

We consider a finite file library $\mathcal{CF} = \{cf_1, cf_2, \ldots, cf_F\}$ with $F$ content files. Let $\mathcal{CF}_m \subseteq \mathcal{CF}$ be the set of content files cached at the m-th AP. Additionally, we assume that each AP can cache at most $N$ ($N < F$) files, i.e., $|\mathcal{CF}_m| \leq N, \forall m$. Each UE requests content files independently or abandons the request. The content file requested by the k-th UE is represented by $cf_k \in \mathcal{CF}$, where $cf_k$ is determined by the content preference vector of the k-th UE (all content files arranged in descending order of preference) and the content popularity distribution (specified by the Zipf distribution). More specifically, in the content preference vector of the k-th UE, the probability that $cf_k$ equals the content file of the i-th rank is $i^{-\beta} / \sum_{j=1}^{F} j^{-\beta}$, where $\beta$ is the Zipf factor, usually set to $\beta = 0.5$, 1, or 2. Each UE possesses a distinct, independent, and time-invariant content preference vector.
We use $H_{mk}$ to denote the event that the content file requested by the k-th UE is cached at its m-th serving AP, i.e., $cf_k \in \mathcal{CF}_m$, $m \in \mathcal{M}_k$. Accordingly, the matching (hit) event of the k-th UE, $H_k$, indicates that the requested file is cached at all APs serving the k-th UE, i.e., $cf_k \in \mathcal{CF}_m, \forall m \in \mathcal{M}_k$. In case of a miss, there exist certain APs $m \in \mathcal{M}_k$ that do not cache the file of the k-th UE, i.e., $cf_k \notin \mathcal{CF}_m$. In such scenarios, these APs must request the content file $cf_k$ from the CPU/core network for joint AP transmission. The network's hit ratio is denoted as $H = \sum_{k \in \mathcal{K}} \mathbb{1}_{H_k} / |\mathcal{K}|$, where $\mathbb{1}_{H_k}$ is the indicator function, set to 1 if the event $H_k$ occurs and 0 otherwise.
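For concreteness, the request and hit-ratio model described above can be sketched as follows; the function names and the array-based bookkeeping are our own illustrative assumptions, while the Zipf probabilities and the per-cluster hit condition follow the definitions in this subsection.

```python
import numpy as np

def zipf_probs(F, beta):
    """Zipf popularity over preference ranks: p_i = i^(-beta) / sum_j j^(-beta)."""
    ranks = np.arange(1, F + 1)
    w = ranks ** (-beta)
    return w / w.sum()

def sample_requests(prefs, beta, rng):
    """prefs[k] is UE k's preference vector (file indices in descending preference)."""
    K, F = prefs.shape
    p = zipf_probs(F, beta)
    ranks = rng.choice(F, size=K, p=p)          # preference rank drawn per UE from the Zipf law
    return prefs[np.arange(K), ranks]           # map rank to a concrete file index cf_k

def hit_ratio(requests, serving_sets, cache_sets):
    """H = fraction of UEs whose requested file is cached at every serving AP."""
    hits = [all(requests[k] in cache_sets[m] for m in serving_sets[k])
            for k in range(len(requests))]
    return float(np.mean(hits))
```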

2.3. Low-Resolution DAC Model

We adopt a low-resolution DAC with a binary-weighted current-steering topology, whose power consumption comprises both a static and a dynamic component. The power consumption of a DAC module with resolution $b$ is given by [24,25]
$P_{\mathrm{DAC}}(b, F_s) = 1.5 \times 10^{-5} \cdot 2^{b} + 4.5 \times 10^{-12} \cdot b F_s \qquad (4)$
where $F_s$ is the sampling frequency.
Each AP's antenna is connected to a low-resolution DAC, and the resulting signal experiences a linear gain $\alpha \in [0, 1]$. Therefore, the transmitted signal of the m-th AP given in (2) is now modified as
$x_m = \alpha_m \sum_{k \in \mathcal{K}_m} \sqrt{p_{mk}}\, \hat{g}_{mk}^{*} q_k \qquad (5)$
where $\alpha_m$ represents the linear gain of the m-th AP, given by [22,26]
$$\alpha_m = \begin{cases} 0.6366, & b_m = 1 \\ 0.8825, & b_m = 2 \\ 1 - \dfrac{\pi \sqrt{3}}{2}\, 2^{-2 b_m}, & b_m \geq 3 \end{cases} \qquad (6)$$
The received signal of the k-th terminal given in (3) is accordingly rewritten as
$$r_k = \underbrace{\sum_{m \in \mathcal{M}_k} \alpha_m \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} q_k}_{\text{useful signal}} + \underbrace{\sum_{k' \neq k} \sum_{m \in \mathcal{M}_{k'}} \alpha_m \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} q_{k'} + w_k}_{\text{interference plus noise}} \qquad (7)$$
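The DAC power model of Eq. (4) and the linear gain of Eq. (6) are simple enough to evaluate directly; the short sketch below does so, with the 20 MHz sampling frequency being an assumption tied to the system bandwidth rather than a value stated in this subsection.

```python
import math

def dac_power(b, Fs):
    """Static + dynamic DAC power, Eq. (4): 1.5e-5 * 2^b + 4.5e-12 * b * Fs (watts)."""
    return 1.5e-5 * 2 ** b + 4.5e-12 * b * Fs

def linear_gain(b):
    """Quantization linear gain alpha_m of Eq. (6)."""
    if b == 1:
        return 0.6366
    if b == 2:
        return 0.8825
    return 1.0 - (math.pi * math.sqrt(3) / 2) * 2 ** (-2 * b)

# e.g. a 3-bit DAC sampled at Fs = 20 MHz (assumed equal to the bandwidth)
print(dac_power(3, 20e6), linear_gain(3))
```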

3. The EE Model and Problem Formulation

3.1. The System Sum Rate

According to Shannon theory, the achievable rate of the k-th UE can be expressed as
$$R_k = B \log_2 \left( 1 + \frac{\left| \sum_{m \in \mathcal{M}_k} \alpha_m \sqrt{p_{mk}}\, g_{mk} \hat{g}_{mk}^{*} \right|^2}{\sum_{k' \neq k} \left| \sum_{m \in \mathcal{M}_{k'}} \alpha_m \sqrt{p_{mk'}}\, g_{mk} \hat{g}_{mk'}^{*} \right|^2 + \sigma_w^2} \right) \qquad (8)$$
where $B$ is the bandwidth. Therefore, the overall achievable rate of the considered cache-assisted CF-mMIMO network is given by
$R_{\mathrm{sum}} = \sum_{k \in \mathcal{K}} R_k \qquad (9)$
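A direct, if unoptimized, evaluation of Eqs. (8) and (9) could look like the sketch below; the power allocation, the cluster sets, and the convention of summing interference over the serving clusters of the other UEs are assumptions consistent with the notation above, and the helper name sum_rate is ours.

```python
import numpy as np

def sum_rate(g, alpha, p, clusters, B=20e6, noise=7.457e-13):
    """Eqs. (8)-(9): per-UE achievable rate and their sum.
    g[m, k]: channel gains, alpha[m]: DAC linear gains, p[m, k]: power allocation,
    clusters[k]: list of APs serving UE k (perfect CSI assumed, so g_hat = g)."""
    M, K = g.shape
    rates = []
    for k in range(K):
        sig = sum(alpha[m] * np.sqrt(p[m, k]) * g[m, k] * np.conj(g[m, k])
                  for m in clusters[k])
        interf = 0.0
        for kp in range(K):
            if kp == k:
                continue
            term = sum(alpha[m] * np.sqrt(p[m, kp]) * g[m, k] * np.conj(g[m, kp])
                       for m in clusters[kp])
            interf += abs(term) ** 2
        rates.append(B * np.log2(1 + abs(sig) ** 2 / (interf + noise)))
    return rates, sum(rates)
```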

3.2. Power Consumption

The overall power consumption of the network consists of four parts: (1) the transmit power of all serving APs; (2) the power consumption of the DACs in all serving APs; (3) the power required by the APs to retrieve missed content files from the CPU; and (4) the power needed by the CPU to retrieve missed content from the core network.
For (1), the total transmit power of all serving APs is $\sum_{m \in \mathcal{M}} P_m$. For (2), the sum of the DAC power consumption in all serving APs is $\sum_{m \in \mathcal{M}} P_{\mathrm{DAC}}^{m}$, where $P_{\mathrm{DAC}}^{m}$ denotes the power consumption of the DAC module with the resolution $b_m$ selected by the m-th AP.
For (3) and (4), in cache-assisted CF-mMIMO systems all APs within a cluster must concurrently transmit identical content to the terminal. Cache deployment results in the three scenarios illustrated in Figure 2: (a) every AP within the cluster has cached the required content; (b) only some APs have cached the required content; (c) none of the APs has cached the required content. In Scenario (a), no content is conveyed through either the fronthaul or the backhaul link. In Scenario (b), certain APs must obtain the content via the fronthaul link. In Scenario (c), all content is transmitted to the APs via the backhaul link from the core network and the fronthaul link from the CPU.
The fronthaul link is utilized for content transmission between the AP and the CPU; its power consumption is proportional to the cumulative SE and is expressed as [27]
$P_{bh,m} = E_{bh} \sum_{k \in \mathcal{K}_m} R_k \qquad (10)$
where $E_{bh}$ indicates the energy consumed for transmitting 1 Mbit of data over the fronthaul link.
The m-th AP transmits the data $q_1, q_2, \ldots, q_K$ received via the fronthaul/backhaul links between the CPU and the core network. Therefore, the fronthaul/backhaul power consumption depends on the SEs $\mathrm{SE}_1, \mathrm{SE}_2, \ldots, \mathrm{SE}_K$. If the m-th AP serves only specific UEs, it merely transmits data related to these UEs, so the fronthaul/backhaul power consumption is contingent solely on the SE of these UEs. As shown in Figure 2, the caching power consumption is calculated with the user as the center, so the fronthaul power consumption of the k-th cluster can be represented as
$P_{bh,k} = E_{bh} R_k \qquad (11)$
Similarly, the backhaul power generated by the k-th cluster’s backhaul link for transferring data between the core network and the CPU can be expressed as
$P_{bb,k} = E_{bb} R_k \qquad (12)$
where $E_{bb}$ indicates the energy consumed for transmitting 1 Mbit of data over the backhaul link. Therefore, the power uploaded from the APs to the CPU in the k-th cluster can be represented as
$$P_{bh,k}^{up} = \begin{cases} P_{bh,k}, & 0 < \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \,/\, |\mathcal{M}_k| < 1 \\ 0, & \text{otherwise} \end{cases} \qquad (13)$$
where $H_{mk}^{miss}$ denotes the complementary event of $H_{mk}$, i.e., the content requested by the UE served by the m-th AP is not cached at that AP, and the indicator $|H_{mk}^{miss}| = |1 - \mathbb{1}_{H_{mk}}|$ equals 1 if the event $H_{mk}^{miss}$ occurs and 0 otherwise.
The energy consumption associated with the content requested by the AP from the CPU in the k-th cluster can be expressed as
$P_{bh,k}^{down} = P_{bh,k} \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \qquad (14)$
The backhaul power generated by the k-th cluster requesting content from the core network can be denoted as
$P_{bb,k}^{down} = P_{bb,k} \sum_{m \in \mathcal{M}_k} |H_{mk}^{miss}| \,/\, |\mathcal{M}_k| \qquad (15)$
Therefore, the fronthaul/backhaul power consumption of the k-th cluster can be expressed as
$P_{B,k} = P_{bh,k}^{up} + P_{bh,k}^{down} + P_{bb,k}^{down} \qquad (16)$
Therefore, the overall power consumption can be expressed as
$P_{total} = \sum_{m \in \mathcal{M}} \left( P_m + P_{\mathrm{DAC}}^{m} \right) + \sum_{k=1}^{K} P_{B,k} \qquad (17)$
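The per-cluster fronthaul/backhaul bookkeeping of Eqs. (11)-(17) can be sketched as follows; the miss-indicator lists and the requirement that rates be expressed in Mbit/s (so that Joule/Mbit energies yield watts) are our assumptions, and the helper names are hypothetical.

```python
def cluster_backhaul_power(R_k, misses, E_bh=0.25e-3, E_bb=15 * 0.25e-3):
    """Eqs. (11)-(16) for one UE-centric cluster.
    R_k: achievable rate of UE k (Mbit/s); misses: list of 0/1 miss indicators,
    one per serving AP (1 = requested file not cached at that AP)."""
    L = len(misses)
    miss_sum = sum(misses)
    P_bh_k, P_bb_k = E_bh * R_k, E_bb * R_k               # Eqs. (11)-(12)
    P_up = P_bh_k if 0 < miss_sum / L < 1 else 0.0        # Eq. (13): partial-hit upload
    P_down = P_bh_k * miss_sum                            # Eq. (14): CPU -> AP fronthaul
    P_bb_down = P_bb_k * miss_sum / L                     # Eq. (15): core network -> CPU backhaul
    return P_up + P_down + P_bb_down                      # Eq. (16)

def total_power(P_tx, P_dac, per_cluster_bh):
    """Eq. (17): transmit + DAC + fronthaul/backhaul power."""
    return sum(P_tx) + sum(P_dac) + sum(per_cluster_bh)
```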

3.3. Problem Formulation

Our aim is to find a strategy that determines the AP clusters $\mathcal{M}_1, \mathcal{M}_2, \ldots, \mathcal{M}_K$, the APs' cached content $\mathcal{CF}_1, \mathcal{CF}_2, \ldots, \mathcal{CF}_M$, and their DAC resolutions $b_1, b_2, \ldots, b_M$ so as to maximize the system's EE. The optimization problem can be expressed as
$$\max \; \frac{R_{\mathrm{sum}}}{P_{total}} \quad \text{s.t.} \quad (\mathrm{C}1): |\mathcal{M}_k| \leq L, \;\forall k, \qquad (\mathrm{C}2): |\mathcal{CF}_m| \leq N, \;\forall m, \qquad (\mathrm{C}3): b_m \in \mathbb{N}^{+}, \;\forall m \qquad (18)$$
where constraint (C1) indicates that the number of APs in each UE's cluster cannot exceed the maximum number of connections $L$, constraint (C2) requires that the number of files cached at each AP must not exceed its cache capacity $N$, and constraint (C3) means that the resolution of each DAC is a positive integer.
To maximize the EE performance, trade-offs must be made in the design. On the one hand, AP clustering based on channel quality together with high-resolution DACs selects the better channels and yields the best SE. On the other hand, AP clustering based entirely on cached content together with low-resolution DACs avoids the energy consumption of the fronthaul/backhaul links and reduces the energy consumption of the DAC modules. In addition, in large networks this problem is hard to solve because of the large numbers of APs and UEs. To address it, we develop a joint content caching, AP clustering, and DAC resolution selection strategy based on deep reinforcement learning, which is elaborated in Section 4.

4. Deep Reinforcement Learning Method

In this section, we describe how the DDPG algorithm solves the joint problem of AP clustering, caching, and DAC resolution selection. Three basic components (action, state, and reward) are defined for the reinforcement learning (RL) problem.

4.1. Action, State, and Reward

In slot $t$, the action $a_t$ encompasses clustering, caching, and resolution selection. Let $a_{mk,t} \in \{0,1\}$, $a_{mcf,t} \in \{0,1\}$, and $a_{mb,t} \in \{0,1\}$ represent, respectively, the status of the m-th AP serving the k-th UE, caching file $cf$, and enabling the b-bit resolution, where "1" indicates that the UE is served, the file is cached, or the resolution is enabled, and "0" indicates otherwise. The action $a_t$ can thus be defined as
$a_t \triangleq \{ a_t^{cl}, a_t^{ca}, a_t^{res} \} \qquad (19)$
The sets $a_t^{cl} = \{ a_{mk,t} : m \in \mathcal{M}, k \in \mathcal{K} \}$, $a_t^{ca} = \{ a_{mcf,t} : m \in \mathcal{M}, cf \in \mathcal{CF} \}$, and $a_t^{res} = \{ a_{mb,t} : m \in \mathcal{M}, b \in \mathbb{N}^{+} \}$ collect the clustering, caching, and resolution-selection results of the t-th time slot, respectively.
Similarly, the action $a_t$ uniquely determines the sets $\mathcal{CF}_m$, $b_m$, $\mathcal{M}_k$, and $\mathcal{K}_m$, i.e., $\mathcal{CF}_m = \{ cf : a_{mcf,t} = 1, cf \in \mathcal{CF} \}$, $b_m = \{ b : a_{mb,t} = 1, b \in \mathbb{N}^{+} \}$, $\mathcal{M}_k = \{ m : a_{mk,t} = 1, m \in \mathcal{M} \}$, and $\mathcal{K}_m = \{ k : a_{mk,t} = 1, k \in \mathcal{K} \}$.
The state considered in RL should be the set of information that the CPU can collect to compute the reward. In this article, the state of the t-th slot comprises the channel gains $G_t = \{ g_{mk,t} : m \in \mathcal{M}, k \in \mathcal{K} \}$, the action of the preceding time slot, and the historical record of file requests of each UE. Define the history of user requests as $e_t = \{ e_{kcf,t} : k \in \mathcal{K}, cf \in \mathcal{CF} \}$, where $e_{kcf,t} = \sum_{t'=1}^{t-1} \mathbb{1}\{ cf_{k,t'} = cf \}$ is the number of times the k-th UE has requested file $cf$ up to slot $t$. The state can therefore be denoted as
$s_t \triangleq \{ G_t, a_{t-1}, e_t \} \qquad (20)$
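A minimal encoding of the action and state as flat vectors might look like the sketch below; the one-hot layout and the use of channel magnitudes in the state are our illustrative assumptions and are not prescribed by the paper.

```python
import numpy as np

def encode_action(clusters, caches, resolutions, M, K, F, B):
    """a_t = {a^cl, a^ca, a^res} as stacked binary indicators.
    clusters[m]: list of UEs served by AP m; caches[m]: list of cached file indices;
    resolutions[m]: chosen DAC resolution in {1, ..., B}."""
    a_cl = np.zeros((M, K)); a_ca = np.zeros((M, F)); a_res = np.zeros((M, B))
    for m in range(M):
        a_cl[m, clusters[m]] = 1
        a_ca[m, caches[m]] = 1
        a_res[m, resolutions[m] - 1] = 1
    return np.concatenate([a_cl.ravel(), a_ca.ravel(), a_res.ravel()])

def encode_state(G_t, a_prev, request_counts):
    """s_t = {G_t, a_{t-1}, e_t}: channel gains (magnitudes), previous action, request history."""
    return np.concatenate([np.abs(G_t).ravel(), a_prev, request_counts.ravel()])
```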
According to the objective function of the optimization problem (18), the reward function of the t-th slot is defined as
$r(s_t, a_t) \triangleq \frac{R_{\mathrm{sum},t}}{P_{total,t}} \qquad (21)$
where $R_{\mathrm{sum},t}$ and $P_{total,t}$ are given in (9) and (17), respectively, with the extra subscript $t$ emphasizing their dynamic behavior. It is worth noting that the total achievable rate $R_{\mathrm{sum},t}$ depends on the channel conditions $G_t$, the clustering result $a_t^{cl}$, and the resolution-selection result $a_t^{res}$, while the overall power $P_{total,t}$ depends on the caching result $a_t^{ca}$ and the resolution-selection result $a_t^{res}$.

4.2. Deep Deterministic Policy Gradient Approach

The DDPG algorithm utilizes an actor–critic network architecture. Moreover, each network is accompanied by its respective target network, resulting in a total of four networks within the DDPG algorithm, namely the actor network $\mu(\cdot|\theta^{\mu})$, the critic network $Q(\cdot|\theta^{Q})$, the target actor network $\mu'(\cdot|\theta^{\mu'})$, and the target critic network $Q'(\cdot|\theta^{Q'})$. Each network updates according to its own rule so as to maximize the cumulative expected return. Figure 3 gives the schematic diagram of the DDPG algorithm.
The DDPG algorithm is well-suited for multi-task learning, aligning with the objectives of this paper. It enhances training stability by adopting a deterministic policy, which directly outputs a specific action value instead of a probability distribution. The algorithm is trained with an experience replay buffer that stores past experiences and is sampled randomly; this breaks data correlations and makes the samples approximately independent, thereby reducing the variance of parameter updates and speeding up convergence. Additionally, experiences can be reused, resulting in high data utilization. DDPG leverages neural networks to represent the policy (actor) and the value function (critic), making it suitable for high-dimensional state spaces and capable of learning from vast amounts of perceptual data. In comparison to the widely used DQN, DDPG is particularly apt for continuous action spaces. Furthermore, employing an actor network improves training efficiency, and the target actor and target critic networks help prevent the overestimation issues present in DQN.
Algorithm 1 primarily updates the parameters of the actor network and the critic network. The actor network adjusts the weights $\theta^{\mu}$ by maximizing the cumulative expected reward. The critic network adjusts the weights $\theta^{Q}$ by minimizing the discrepancy between the evaluated value and the target value. The target networks are updated with a soft-update method, also called an exponential moving average: a learning rate (or momentum) $\tau$ is introduced, and the weighted average of the previous target network parameters and the current corresponding network parameters is applied to update the target network. Algorithm 1 summarizes the whole DDPG procedure.
Algorithm 1 DDPG Algorithm Procedure
1: Initialize the actor and critic network parameters $\theta^{\mu}$ and $\theta^{Q}$
2: Set the same parameters $\theta^{\mu'}$ and $\theta^{Q'}$ in the target networks
3: for episode = 1 to Episode do
4:   for timeslot = 1 to T do
5:     Generate action $a_t$ through the actor network $\mu(s_t|\theta^{\mu})$
6:     Obtain the reward $r(s_t, a_t)$ and the next state $s_{t+1}$ according to the action $a_t$
7:     Obtain the evaluated value $q$ through the critic network $Q(s_t, a_t|\theta^{Q})$
8:     Use the target networks $Q'(s_{t+1}, a_{t+1}|\theta^{Q'})$ to obtain the target value $y$
9:     Determine the gradients from the evaluated value $q$ of the actor–critic networks and the target value $y$
10:    Update the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the actor and critic networks according to the gradients
11:    Update the parameters $\theta^{\mu'}$ and $\theta^{Q'}$ of the target networks according to the parameters $\theta^{\mu}$ and $\theta^{Q}$ of the actor and critic networks and the learning rate $\tau$
12:  end for
13: end for
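As a rough illustration of lines 5-11 of Algorithm 1, a single DDPG update step is sketched below in PyTorch; the network interfaces (a critic taking concatenated state-action vectors), the mean-squared-error critic loss, and the hyperparameter values are standard DDPG choices assumed here rather than details reported in the paper.

```python
import torch
import torch.nn as nn

def soft_update(target, source, tau):
    """theta' <- tau * theta + (1 - tau) * theta' (the soft update in line 11 of Algorithm 1)."""
    for tp, sp in zip(target.parameters(), source.parameters()):
        tp.data.copy_(tau * sp.data + (1.0 - tau) * tp.data)

def ddpg_step(actor, critic, target_actor, target_critic,
              actor_opt, critic_opt, batch, gamma=0.99, tau=0.005):
    """One DDPG update from a replay-buffer batch (s, a, r, s_next)."""
    s, a, r, s_next = batch
    with torch.no_grad():
        a_next = target_actor(s_next)
        y = r + gamma * target_critic(torch.cat([s_next, a_next], dim=1))  # target value y
    q = critic(torch.cat([s, a], dim=1))                                   # evaluated value q
    critic_loss = nn.functional.mse_loss(q, y)        # minimize the gap between q and y
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()  # maximize expected return
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    soft_update(target_actor, actor, tau)             # exponential moving average of parameters
    soft_update(target_critic, critic, tau)
```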

4.3. Computational Complexity

In the DDPG algorithm, the input dimension of the neural network is $Input = M(3K + N + 1) + KN$, the output dimension is $Output = M(K + N + b)$, and the number of model parameters is $Number = 5 \cdot Input \cdot (Input + 1) + 9 \cdot Output \cdot (Output + 1) + 10 \cdot Input \cdot Output$, determined by the neural network's layer count and layer sizes. The experience pool holds $Batch = 128$ tuples of states, actions, rewards, and next states, resulting in a complexity of $O(Batch \cdot (2K(M + N + b) + K(M + N + 1) + 1) + Number)$. The decision-making process for actions has a time complexity of $O(K + 2M)$, leading to an overall complexity of $O(timeslot \cdot (Number + K + 2M))$, where $timeslot$ stands for the number of training iterations.
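For reference, the dimension formulas above can be evaluated numerically; the small helper below simply plugs in values and is ours, with the arguments M = 10, K = 5, N = 2, b = 5 taken from the simulation settings of Section 5 (b being interpreted as the number of candidate resolutions).

```python
def ddpg_dimensions(M, K, N, b):
    """Illustrative evaluation of the dimension formulas in Section 4.3."""
    inp = M * (3 * K + N + 1) + K * N                 # Input = M(3K + N + 1) + KN
    out = M * (K + N + b)                             # Output = M(K + N + b)
    num = 5 * inp * (inp + 1) + 9 * out * (out + 1) + 10 * inp * out
    return inp, out, num

print(ddpg_dimensions(10, 5, 2, 5))
```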

5. Simulation Results

5.1. Simulation Settings

In this section, we compare and analyze (1) the EE performance of the proposed RL method against three BM strategies (called BM1, BM2, and BM3), (2) the convergence behavior of the DDPG algorithm, (3) the effect of the DAC resolution on the EE, (4) the impact of the number of APs associated with each UE on the EE, and (5) the influence of the numbers of UEs and APs on the EE. The BM strategies are given as follows:
  • BM1: clustering policy based on the SINR (the $l \leq L$ APs to which the k-th UE connects are those with the highest SINR) and caching policy based on local popularity (among the UEs served by the m-th AP, the N most popular files are cached at the m-th AP);
  • BM2: clustering strategy based on SINR (same as BM1) and caching strategy based on network popularity (caching the N most popular files across all UEs in all APs);
  • BM3: cache-based clustering strategy (each UE connects to the $l \leq L$ APs whose caches best match that UE's content request in the previous slot) and network-popularity-based caching strategy (same as BM2).
The action-decision complexity of all BM strategies is $O(K(M + N + 1))$, while their overall time complexities differ: BM2 has the minimum of $O((M \log_2 M)(K N \log_2 N))$, BM3 follows with $O((K M \log_2 M)(K N \log_2 N))$, and BM1 has the maximum of $O((M \log_2 M)(M K N \log_2 N))$. Although the complexity of the BM strategies is lower than that of the DDPG algorithm, the optimization performance of the DDPG algorithm is much better.
We consider a scenario in which the APs and UEs are randomly distributed within a region of $S_a = 1\ \mathrm{km}^2$, with one AP located at the reference coordinate $(0, 0)$. The positions of the UEs and APs remain constant during the training phase. We set the numbers of APs and UEs to $M = 10$ and $K = 5$, respectively, the cache size of each AP to $|\mathcal{CF}_m| \leq 2$, the number of files to $F = 10$, and the DAC resolution to $b_m \leq 5$. Refer to [14,28,29,30,31,32] for the other system settings and parameters, which are summarized in Table 1.

5.2. Numerical Results Analysis

Figure 4 shows the convergence of RL+DAC and RL in Algorithm 1, where the EE values versus training episodes are demonstrated. The curves are obtained by combining the results of 1000 training runs. Taking the BM strategies evaluated at the 10th training episode as references, the EE of the proposed RL+DAC and RL schemes exceeds that of all BM strategies after about the 26th episode, and RL+DAC completely outperforms the RL algorithm after about the 150th episode. Note that in Algorithm 1, RL+DAC allows each AP to employ a distinct DAC resolution, which sacrifices some computational time in exchange for improved performance, whereas RL employs the same DAC resolution for all APs, thereby reducing the algorithm's complexity.
Figure 5 illustrates the impact on the total EE of changes in the relative positions of APs and UEs over time. It can be observed that the total EE of the RL+DAC and RL algorithms is always better than that of the BM strategies. In the BM schemes, we find that when each UE is served by a single AP, at moments 0, 3, 4, 7, 8, 10, and 11, the total EE values are higher than at the other moments, whereas when each UE is served by three APs, at moments 1, 5, 6, and 9, the total EE is lower. This means that a UE does not obtain better EE performance simply by choosing more serving APs.
Figure 6 depicts the correlation between the number of UEs and the mean SE, where the number of APs is M = 10 . As illustrated in Figure 6, the mean SE of UEs diminishes as the quantity of UEs increases and then tends to be stable. This phenomenon arises due to the escalation in the quantity of UEs, leading to a gradual intensification of inter-UE interference. Ultimately, the average SE will become stable. Furthermore, the upsurge in the number of UEs will result in a greater number of UEs being served by the APs. Consequently, this necessitates APs to make trade-offs when selecting DAC resolution within the RL+DAC algorithm, significantly diminishing the SE improvement for UEs. When there is an abundance of UEs, this effect becomes nearly equivalent to the average SE achieved when each AP in the RL algorithm utilizes the same low resolution.
Figure 7 explores the influence of the number of UEs on the total EE, where the number of APs is $M = 10$. As observed in Figure 7, the total EE decreases as the number of UEs increases and then tends to stabilize. This is because, in the early stage, the growth in the number of UEs is approximately proportional to the energy consumed by the system, while inter-UE interference slows the growth of the sum achievable rate, so the total EE keeps declining. When the number of UEs is large, all APs are already serving UEs, and adding more UEs does not increase the power consumed by AP activity, so the increase in total power consumption is smaller and the total EE tends to level off. This also indirectly validates the result of Figure 6: increasing the number of UEs does not always improve the overall system performance. In other words, the more UEs there are, the more significant the inter-UE interference becomes. Note that the total EE for $K = 3$ UEs in Figure 7 is lower than that for $K = 4$. This is due to the different locations of APs and UEs, which lead to different channel conditions, so the total EE fluctuates within a certain range for a given number of UEs; the smaller the number of UEs, the greater the fluctuation caused by the location differences.
Figure 8 shows the relationship between the number of APs and the sum achievable rate, where the number of UEs is set to $K = 5$. It is readily noticeable that the sum achievable rate first increases with the number of APs and then tends to stabilize. This is because, when the number of APs is small, each UE selects the APs with better channel conditions, so the rate can be increased. Nevertheless, when the number of APs is large, further increasing it gradually intensifies the interference between APs. Consequently, when the number of APs is already relatively high, adding more APs no longer increases the sum achievable rate and may even decrease it. Moreover, as depicted in Figure 8, the curves for BM1 ($l = 3$) and BM2 ($l = 3$) overlap because they share the same clustering policy and differ only in their caching strategies, and the sum achievable rate depends solely on the SINR and is not influenced by caching.
Figure 9 shows the impact of the number of APs on the total EE, where the number of UEs is $K = 5$. The simulation also indirectly verifies the result of Figure 8, i.e., more APs do not necessarily yield better overall system performance. Figure 9 also shows that the total EE of the system is highest when the number of APs is 16. This is because, as the number of APs increases, so does their power consumption, and once the sum achievable rate grows only slowly, the total EE begins to decrease. In addition, note that the sum achievable rate and total EE at 20 APs in Figures 8 and 9 do not strictly follow the trend; this is caused by fluctuations due to the randomness of the AP and UE positions.
In Figure 10, the impact of the low-resolution DAC on the total EE is demonstrated. It can be observed that when $b \geq 6$, the total EE decreases as the resolution $b$ increases. This means that the resolution $b$ achieves better total EE performance in the interval $[1, 5]$, and the RL+DAC scheme in the figure, with $b_m \leq 5, \forall m$, attains the best total EE. This also validates the choice of limiting the resolution range to $b \leq 5$ in our RL+DAC algorithm design. In addition, as the resolution $b$ increases, the total EE decreases ever faster, because in Formula (4) part of the DAC module's power consumption grows exponentially with $b$, while according to Formulas (6) and (8) the sum achievable rate becomes essentially stable for $b > 5$.

6. Conclusions

In this paper, an innovative and practical total EE model of a cache-assisted CF-mMIMO system is established. To maximize the total EE, an energy-efficient joint design of content cache, AP clustering, and low-resolution DAC is carried out, and then, a DRL algorithm (i.e., DDPG method) is proposed. Numerical results show that the total EE of the RL+DAC strategy considering DAC resolution is generally 4% higher than that of the RL strategy, and the total EE of these strategies is much higher than those of the BM strategies. In addition, it can be expected that for multi-antenna APs, the number of DAC modules increases linearly with the increase in the quantity of antennas, so the total EE of our proposed RL+DAC strategy will be much higher than that of the RL strategy.

Author Contributions

Conceptualization, F.T. and Y.P.; methodology, Y.P. and F.T.; software, Y.P.; validation, F.T. and Q.L.; writing—original draft preparation, F.T. and Y.P.; writing—review and editing, Y.P., F.T. and Q.L.; supervision, F.T. and Q.L.; project administration, F.T. and Q.L.; funding acquisition, F.T. All authors have read and agreed to the published version of the manuscript.

Funding

This work has been supported by the National Natural Science Foundation of China under Grant 62261013 and in part by the Director Foundation of Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing under Grant GXKL06220104.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AP    Access Point
EE    Energy Efficiency
mMIMO    Massive Multiple-Input Multiple-Output
CF-mMIMO    Cell-Free Massive Multiple-Input Multiple-Output
DDPG    Deep Deterministic Policy Gradient
BS    Base Station
DAC    Digital-to-Analog Converter
UE    User Equipment
CPU    Central Processing Unit
BM    Benchmark
SE    Spectral Efficiency
DRL    Deep Reinforcement Learning
CSI    Channel State Information

References

  1. Rydning, J. IDC Worldwide Global DataSphere IoT Device and Data Forecast, 2019–2023; IDC: Needham, MA, USA, 2019; p. 3.
  2. Choi, N.; Guan, K.; Kilper, D.C.; Atkinson, G. In-network caching effect on optimal energy consumption in content-centric networking. In Proceedings of the IEEE International Conference on Communications, Ottawa, ON, Canada, 10–15 June 2012; pp. 2889–2894.
  3. Llorca, J.; Tulino, A.M.; Guan, K.; Esteban, J.; Varvello, M.; Choi, N.; Kilper, D.C. Dynamic in-network caching for energy efficient content delivery. In Proceedings of the 32nd IEEE International Conference on Computer Communications, Turin, Italy, 14–19 April 2013; pp. 245–249.
  4. Li, J.; Liu, B.; Wu, H. Energy-efficient in-network caching for content-centric networking. IEEE Commun. Lett. 2013, 17, 797–800.
  5. Golrezaei, N.; Shanmugam, K.; Dimakis, A.G.; Molisch, A.F.; Caire, G. FemtoCaching: Wireless video content delivery through distributed caching helpers. In Proceedings of the 31st Annual IEEE International Conference on Computer Communications, Orlando, FL, USA, 25–30 March 2012; pp. 1107–1115.
  6. Bastug, E.; Bennis, M.; Debbah, M. Living on the edge: The role of proactive caching in 5G wireless networks. IEEE Commun. Mag. 2014, 52, 82–89.
  7. Xu, Y.; Li, Y.; Wang, Z.; Lin, T.; Zhang, G.; Ci, S. Coordinated caching model for minimizing energy consumption in radio access network. In Proceedings of the IEEE International Conference on Communications, Sydney, Australia, 10–14 June 2014; pp. 2406–2411.
  8. Poularakis, K.; Iosifidis, G.; Sourlas, V.; Tassiulas, L. Multicast-aware caching for small cell networks. In Proceedings of the IEEE Wireless Communications and Networking Conference, Istanbul, Turkey, 6–9 April 2014; pp. 2300–2305.
  9. Rusek, F.; Persson, D.; Lau, B.K.; Larsson, E.G.; Marzetta, T.L.; Edfors, O.; Tufvesson, F. Scaling up MIMO: Opportunities and challenges with very large arrays. IEEE Signal Process. Mag. 2013, 30, 40–60.
  10. Lopezperez, D.; Roche, G.; Kountouris, M.; Quek, T.; Jie, Z. Enhanced inter-cell interference coordination challenges in heterogeneous networks. IEEE Wirel. Commun. 2011, 18, 22–30.
  11. Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 2014, 52, 186–195.
  12. Gesbert, D.; Hanly, S.; Huang, H.; Shitz, S.S.; Simeone, O.; Yu, W. Multi-cell MIMO cooperative networks: A new look at interference. IEEE J. Sel. Areas Commun. 2010, 28, 1380–1408.
  13. Ammar, H.A.; Adve, R.; Shahbazpanahi, S. User-centric cell-free massive MIMO networks: A survey of opportunities, challenges and solutions. IEEE Commun. Surv. Tutor. 2022, 24, 611–652.
  14. Ngo, H.Q.; Ashikhmin, A.; Yang, H.; Larsson, E.G.; Marzetta, T.L. Cell-free massive MIMO versus small cells. IEEE Trans. Wirel. Commun. 2017, 16, 1834–1850.
  15. Wang, K.; Chen, Z.; Liu, H. Push-based wireless converged networks for massive multimedia content delivery. IEEE Trans. Wirel. Commun. 2014, 13, 2894–2905.
  16. Peng, M.; Sun, Y.; Li, X.; Mao, Z.; Wang, C. Recent advances in cloud radio access networks: System architectures, key techniques, and open issues. IEEE Commun. Surv. Tutor. 2016, 18, 2282–2308.
  17. Chen, S.; Zhang, J.; Björnson, E.; Wang, S.; Xing, C.; Ai, B. Wireless caching: Cell-free versus small cells. In Proceedings of the IEEE/CIC International Conference on Communications in China, Xiamen City, China, 28–30 July 2021; pp. 1–6.
  18. Peng, Y.; Tan, F.; Liu, Q. Energy-efficient content caching strategy in cell-free massive MIMO networks with reinforcement learning. In Proceedings of the 2023 IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Beijing, China, 14–16 June 2023; pp. 1–3.
  19. Nguyen, M.-H.T.; Bui, T.T.; Nguyen, L.D. Real-time optimized clustering and caching for 6G satellite-UAV-terrestrial networks. IEEE Trans. Intell. Transp. Syst. 2023, 1–11.
  20. Chaowei, W.; Ziye, W.; Lexi, X. Collaborative caching in vehicular edge network assisted by cell-free massive MIMO. Chin. J. Electron. 2023, 33, 1–13.
  21. Chuang, Y.-C.; Chiu, W.-Y.; Chang, R.Y.; Lai, Y.-C. Deep reinforcement learning for energy efficiency maximization in cache-enabled cell-free massive MIMO networks: Single- and multi-agent approaches. IEEE Trans. Veh. Technol. 2023, 72, 10826–10839.
  22. Zhang, Y.; Zhou, M.; Qiao, X.; Cao, H.; Yang, L. On the performance of cell-free massive MIMO with low-resolution ADCs. IEEE Access 2019, 7, 117968–117977.
  23. Chang, R.Y.; Han, S.-F.; Chien, F.-T. Reinforcement learning based joint cooperation clustering and content caching in cell-free massive MIMO networks. In Proceedings of the 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), Norman, OK, USA, 27–30 September 2021; pp. 1–7.
  24. Cui, S.; Goldsmith, A.J.; Bahai, A. Energy-constrained modulation optimization. IEEE Trans. Wirel. Commun. 2005, 4, 2349–2360.
  25. Ribeiro, L.N.; Schwarz, S.; Rupp, M.; de Almeida, A.L.F. Energy efficiency of mmWave massive MIMO precoding with low-resolution DACs. IEEE J. Sel. Top. Signal Process. 2018, 12, 298–312.
  26. Zhang, J.; Dai, L.; He, Z.; Jin, S.; Li, X. Performance analysis of mixed-ADC massive MIMO systems over Rician fading channels. IEEE J. Sel. Areas Commun. 2017, 35, 1327–1338.
  27. Ngo, H.Q.; Tran, L.-N.; Duong, T.Q.; Matthaiou, M.; Larsson, E.G. On the total energy efficiency of cell-free massive MIMO. IEEE Trans. Green Commun. Netw. 2018, 2, 25–39.
  28. Sadeghi, A.; Sheikholeslami, F.; Giannakis, G.B. Optimal and scalable caching for 5G using reinforcement learning of space-time popularities. IEEE J. Sel. Top. Signal Process. 2018, 12, 180–190.
  29. Yang, C.; Yao, Y.; Chen, Z.; Xia, B. Analysis on cache-enabled wireless heterogeneous networks. IEEE Trans. Wirel. Commun. 2016, 15, 131–145.
  30. Zhong, C.; Gursoy, M.C.; Velipasalar, S. Deep multi-agent reinforcement learning based cooperative edge caching in wireless networks. In Proceedings of the IEEE International Conference on Communications, Shanghai, China, 20–24 May 2019; pp. 1–6.
  31. Björnson, E.; Sanguinetti, L. Scalable cell-free massive MIMO systems. IEEE Trans. Commun. 2020, 68, 4247–4261.
  32. Zhang, H.; Li, H.; Liu, T.; Dong, L.; Shi, G.; Gao, X. Lower energy consumption in cache-aided cell-free massive MIMO systems. Digit. Signal Process. 2023, 135, 103936.
Figure 1. Caching-assisted cell-free massive MIMO model.
Figure 2. Three scenarios in which content is requested during transmission. Green arrows represent the access links between APs and UEs; red arrows represent the backhaul/fronthaul links between the core network and the CPU or between the APs and the CPU.
Figure 3. The principle of the DDPG algorithm.
Figure 4. The convergence of Algorithm 1.
Figure 5. The total EE versus time.
Figure 6. The relationship between the number of UEs and the average SE.
Figure 7. The relationship between the number of UEs and the total EE.
Figure 8. The relationship between the number of APs and the sum achievable rate.
Figure 9. The relationship between the number of APs and the total EE.
Figure 10. The total EE versus the DAC resolution.
Table 1. The simulation parameters.
Parameter | Value
Bandwidth $B$ | 20 MHz
Maximum DL transmit power $P_m$ | 1000 mW
Energy consumption of fronthaul link $E_{bh}$ | $0.25 \times 10^{-3}$ Joule/Mbit
Energy consumption of backhaul link $E_{bb}$ | $15 E_{bh}$
Thermal noise power per UE $\sigma_w^2$ | $7.457 \times 10^{-13}$ W
Path-loss exponent $\alpha$ | 2
Zipf distribution factor $\beta$ | 1