Cooperative Sleep and Energy-Sharing Strategy for a Heterogeneous 5G Base Station Microgrid System Integrated with Deep Learning and an Improved MOEA/D Algorithm

Ming Yan; Tuanfa Qin; Wenhao Guo; Yongle Hu

doi:10.3390/en18071580

¹ School of Electrical Engineering, Guangxi University, Nanning 530004, China

² School of Computer and Electronic Information, Guangxi University, Nanning 530004, China

³ Runjian Co., Ltd., Nanning 530004, China

* Authors to whom correspondence should be addressed.

Energies 2025, 18(7), 1580; https://doi.org/10.3390/en18071580

This article belongs to the Section F: Electrical Engineering

Abstract

With the rapid growth of heterogeneous fifth-generation (5G) communication networks and a surge in global mobile traffic, energy consumption in mobile network systems has increased significantly. This underscores the need for energy-efficient networks that lower operational costs and carbon emissions, leading to a focus on microgrids powered by renewable energy. However, accurately predicting base station traffic demand and optimizing energy consumption while maximizing green energy usage—especially concerning quality of service (QoS) for users—remains a challenge. This paper proposes a cooperative sleep and energy-sharing strategy for heterogeneous 5G base station microgrid (BSMG) systems, utilizing deep learning and an improved multi-objective evolutionary algorithm based on decomposition (MOEA/D). We present a reference scenario for a 5G BSMG system comprising a central and sub-base station microgrid. A prediction model was developed, integrating a convolutional neural network with a dual attention mechanism and bidirectional long short-term memory to determine the operational status of BSMGs. Our cooperative strategy addresses QoS requirements and uses the enhanced MOEA/D to improve performance. Numerical results indicate that our approach achieves significant energy savings while ensuring accurate predictions of BSMG energy demands through a multi-objective evolutionary algorithm based on decomposition.

Keywords: deep learning; 5G; traffic prediction; base station microgrid

1. Introduction

Advancements in information and communication technology (ICT), including fifth-generation communication technology (5G) and increasingly sophisticated mobile networks, have led to a significant increase in the number of mobile users and Internet of Things (IoT) devices. In particular, projections indicate that, by 2030, 5G technology will account for over 75% of all mobile traffic [1,2]. In order to ensure the low latency and high reliability required for future wireless devices, the network management efficiency and flexibility of heterogeneous 5G networks can be improved by employing a large number of both micro- and macro-base stations [3]. However, the energy consumption associated with these base stations and their operation under high loads mean that ICT is potentially a major contributor to global energy consumption and greenhouse gas emissions in the future. Therefore, the development of green heterogeneous 5G networks is crucial to ensure the sustainability of ICT [4,5,6].

The integration of sustainable renewable energy sources, such as solar and wind power, can significantly reduce the electricity costs and carbon emissions associated with base stations in 5G networks. However, it is difficult for traditional power grids to fully accommodate green energy, thus exacerbating the environmental burden [7,8,9]. A promising solution to this is the use of microgrids, a critical component of smart grids, to facilitate the interconnection of heterogeneous base station microgrid (BSMG) systems through renewable energy access, enabling bidirectional flows of information and energy [10,11]. The strategic implementation of sleep mode for base stations can significantly reduce excessive energy consumption. While renewable energy-powered BSMGs present a sustainable solution, several critical challenges remain unresolved [12,13]. Existing strategies often fail to effectively balance accurate traffic demand prediction, efficient green energy utilization, and strict user quality of service (QoS) guarantees, given the stochastic nature of renewable energy generation and dynamic network loads. Moreover, traditional sleep-mode strategies for base stations (BSs) frequently compromise QoS during user handovers to active stations, while static optimization methods cannot adapt to spatiotemporal variations in energy supply and demand. Furthermore, conventional multi-objective optimization frameworks, such as the standard multi-objective evolutionary algorithm based on decomposition (MOEA/D), exhibit slow convergence and poor solution diversity when addressing the coupled objectives of energy efficiency, cost minimization, and QoS preservation in large-scale heterogeneous networks. These limitations lead to inefficient energy sharing, over-reliance on grid power, and degraded network performance, thereby hindering the development of sustainable 5G infrastructure [14].

This study proposes a cooperative strategy for sleep and energy sharing within a heterogeneous BSMG system. The primary objective is to minimize grid dependency and carbon footprint in heterogeneous BSMG systems while ensuring strict QoS requirements. This is achieved through synergistic traffic-aware sleep scheduling and adaptive energy-sharing strategies. The principal contributions of this paper are as follows:

A two-layer heterogeneous BSMG system model enabling bidirectional energy-information flow through centralized control.
A deep learning model integrating convolutional blocks and multi-head attention mechanisms for enhanced spatiotemporal traffic forecasting.
An improved MOEA/D algorithm incorporating quantum local search and adaptive mutation to optimize energy sharing and sleep scheduling while ensuring QoS.
Comprehensive simulations demonstrating 13% energy savings during peak loads and 36.5% higher green energy utilization compared to existing approaches.

The remainder of this paper is organized as follows. In Section 2, we review related works on reducing the energy consumption of heterogeneous base station networks. In Section 3, the system model is formulated. The design of the proposed scheme is described in Section 4. We evaluate the performance of the proposed scheme in Section 5. Finally, Section 6 concludes this study.

2. Related Work

Past studies have highlighted significant advancements in reducing the energy consumption of heterogeneous base station networks, particularly via the integration of renewable energy into mobile networks and the energy management of base station dormancy [15,16]. For example, Guo et al. introduced an effective algorithm for the energy deployment of heterogeneous 5G base stations that optimized the number of micro-base stations, their deployment locations, and their power configurations, thus enhancing both energy efficiency and overall network performance [17]. The authors of [18] also derived a coverage probability expression for a base station on/off control system utilizing random geometry methods and extended the analysis to heterogeneous networks based on weighted density. This approach offered a promising strategy to minimize the number of active base stations, consequently reducing energy consumption. However, the optimization of deployment strategies may lead to a significant increase in construction costs, which is inconsistent with the pursuit of low-carbon practices. To address this, Ghosh et al. examined a sleep strategy for heterogeneous base stations using M/M/1(Markovian Arrival, Markovian Service, Single Server) queuing theory and analysed the energy utilization rate of their proposed strategy using continuous-time Markov chains [19]. Similarly, the scheme described in [20] proposed a power-sensing user association and renewable energy allocation strategy designed to optimize grid-connected energy in heterogeneous mixed-energy cellular networks, employing Lyapunov optimization to manage energy consumption across users. Nevertheless, previous research on heterogeneous base station networks has primarily concentrated on the base stations themselves, overlooking both user QoS and energy consumption costs. 
Moreover, under rapidly changing environmental conditions or fluctuating traffic demand, the adaptability of these decision-making methodologies may be insufficient.

Mathematical optimization models and algorithms have also been applied to 5G network design and management, where in recent years they have shown significant potential for resource management and energy efficiency optimization. The study in [21] introduced a mixed integer programming model based on reusable function blocks for virtual network functions, providing an innovative framework for deploying 5G network virtualization functions. By jointly optimizing computing resource allocation and network topology reconstruction, this model reduces end-to-end delay and enhances resource utilization. In the domain of energy efficiency management, Salahdine et al. [22] conducted a systematic review of sleep mode techniques in ultra-dense networks, highlighting the energy-saving potential of dynamic sleep strategies. This conclusion is supported by the variable threshold sleep mechanism proposed by Ma et al. [23], which significantly improved the energy efficiency of base stations in live network tests by using a dynamic activation threshold. Regarding data uncertainty in network management, the robust optimization framework proposed by Garroppo et al. [24] represents a methodological breakthrough in addressing wireless channel fluctuations and traffic prediction errors. By constructing a multi-scenario robust model, this scheme reduces network energy consumption while maintaining QoS constraints.

To address randomness in network traffic demand for heterogeneous base stations, several studies have employed deep learning techniques for predictive modelling to enhance load management [25]. One notable contribution involves the development of a graph attention network that employs time series similarity to capture spatial and temporal information from cellular traffic data [26]. This approach has been successfully applied to traffic prediction in 5G and beyond-5G environments, demonstrating superior performance compared to classical prediction models such as graph neural networks and gated recurrent units (GRUs). Another study [27] has proposed a traffic prediction method based on support vector regression, which allows users to identify the optimal unit within a heterogeneous network based on anticipated traffic demand. Additionally, Ref. [28] introduced the integration of Butterworth filters for the preprocessing of raw data, constructing various hybrid prediction models that combine CNNs and LSTMs tailored to different frequency bands of Butterworth outputs with multiple features. Simulation results confirmed the effectiveness of this methodology across a diverse range of scenarios. The authors in [29] also introduced a base station hibernation scheme based on BiLSTM networks and the signal-to-interference-plus-noise ratio (SINR) in a two-layer heterogeneous cellular network. This framework predicted user traffic demand, which was then used to determine hibernation strategies for base stations within the network. Attention mechanisms have emerged as a critical advancement in deep learning, enabling models to concentrate on salient features while mitigating long-range dependency issues. Recent literature highlights their effectiveness in spatiotemporal data analysis. The study by [30] proposed a time-level attention-assisted CNN architecture for cellular traffic prediction, effectively reducing the computational time required for traffic data forecasting. 
Additionally, Ref. [31] introduced a novel temporal attention-enhanced network that selectively filters out highly correlated data due to short-term temporal dependencies and periodicity by leveraging input time series. The temporal attention mechanism is subsequently applied to extracted spatiotemporal features to exploit further temporal dependencies. In [32], variational mode decomposition (VMD) is utilized to preprocess network traffic, while the whale optimization algorithm is employed to select optimal parameters for VMD. Furthermore, the transformer network is enhanced by integrating a temporal convolutional network and a multi-head attention mechanism. This proposed model is validated to outperform traditional single or combined models in wireless network traffic prediction.

The studies above show that traffic prediction primarily relies on deep learning models such as CNNs and LSTMs, whose performance directly affects the energy efficiency of base station networks. Attention mechanisms have been developed to enhance these models by capturing salient information while addressing the long-range dependency issues that some of them face. Motivated by these studies and the high energy consumption of heterogeneous 5G BSMG networks, this paper implements a dual attention mechanism (DAM) that incorporates both convolutional block attention and multi-head attention to optimize the performance of a CNN–BiLSTM traffic prediction model. It also explores the use of cooperative sleep and energy-sharing algorithms for BSMGs. In particular, a modified MOEA/D enhanced with quantum local search and adaptive mutation is employed to reduce the energy consumption of heterogeneous BSMG systems while considering user QoS requirements.

3. System Model

3.1. Reference Scenario

This study establishes a heterogeneous BSMG system consisting of $C$ layers, with a central BSMG (CBSMG) as the first layer and the subsequent $C - 1$ layers consisting of sub-BSMGs (SBSMGs), such as microcell, femtocell, and similar base station networks, characterized by varying transmit power across different layers. Our focus is on a downlink transmission system with $C = 2$, as illustrated in Figure 1.

Figure 1. Heterogeneous 5G BSMG system model employed in the present study.

A CBSMG and an SBSMG are primarily distinguished by their service capabilities; a CBSMG offers low-rate services and capacity, whereas an SBSMG is deployed in high-traffic areas within the coverage area of a CBSMG to enhance network capacity and deliver high-rate services. Each BSMG is equipped with renewable energy generation and storage devices. The CBSMG functions as an information and energy management unit, overseeing multiple SBSMGs and mobile users within its coverage area. The CBSMG is powered by its own green energy generation equipment, which includes photovoltaic modules and energy storage systems, and by an external smart grid that provides supplementary power when green energy cannot meet the demand. Each SBSMG is also equipped with green energy generation devices, with its energy management governed by the CBSMG. When there is an energy shortfall, the external smart grid can supply power to each SBSMG via the CBSMG system, ensuring that the QoS requirements of users are met. Additionally, because the CBSMG is equipped with communication units such as active antenna processing units and baseband units, it can engage directly with mobile users. Conversely, when an SBSMG experiences heavy loads and its own green energy supply is insufficient, it can draw on other green energy sources via cooperative energy sharing with other SBSMGs. In addition, when idle or under low-load conditions, an SBSMG can enter sleep mode to conserve energy.

3.2. BSMG Network Model

We define the CBSMG and the $N$ SBSMGs within a heterogeneous BSMG system as the set $B_{n} = \{0, 1, 2, \dots, N\}$, where index 0 denotes the CBSMG. Let $U_{m} = \{1, 2, \dots, M\}$ represent the set of users in this system, where clusters operate without interfering with one another. The system state fluctuates with the traffic load and energy levels of the base stations over each time slot of duration $\Delta T$. According to [33], the SINR for a user $j \in U_{m}$ associated with BSMG $i \in B_{n}$ is given by the following:

$$\mathrm{SINR}_{i,j} = \frac{W_{i} \tau_{i}}{\sum_{s = 1, s \neq i}^{N} W_{s} \tau_{s} + \sigma_{0}^{2}} \tag{1}$$

The SINR of user $j$ is determined by the signal power from the serving base station, the aggregate interference power from the other base stations, and the noise power. This metric reflects the channel quality experienced by the user. Here, $W_{i}$ and $\tau_{i}$ represent the transmit power and channel gain of BSMG $i$, respectively, the sum $\sum_{s} W_{s} \tau_{s}$ in the denominator collects all co-frequency interference sources, and $\sigma_{0}^{2}$ is the noise power. According to Shannon's formula, the average service rate available to user $j$ from BSMG $i$ at time slot $t$ can be computed as follows:

$$c_{i,j}(t) = \mu_{0}(t) \log_{2}(1 + \mathrm{SINR}_{i,j}) \tag{2}$$
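As a rough illustration of Equations (1) and (2), the following minimal Python sketch computes the SINR and the resulting Shannon rate. The function names, the list-of-pairs representation of the interferers, and all numerical values are assumptions made for this example and are not taken from the paper.

```python
import math

def sinr(w_serving, tau_serving, interferers, noise_power):
    """Eq. (1): received signal power over interference plus noise.

    interferers is a list of (transmit_power, channel_gain) pairs for the
    co-frequency BSMGs other than the serving one (hypothetical layout).
    """
    interference = sum(w * tau for w, tau in interferers)
    return (w_serving * tau_serving) / (interference + noise_power)

def service_rate(bandwidth_hz, sinr_value):
    """Eq. (2): Shannon rate of one resource block, in bit/s."""
    return bandwidth_hz * math.log2(1.0 + sinr_value)

# Illustrative numbers only (not from the paper).
example_sinr = sinr(w_serving=10.0, tau_serving=1e-6,
                    interferers=[(5.0, 1e-7), (5.0, 1e-7)],
                    noise_power=1e-9)
example_rate = service_rate(180e3, example_sinr)
```

Because the interference term sits in the denominator, switching a neighbouring SBSMG to sleep mode raises the SINR, and hence the achievable rate, of users served by the remaining active stations.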

Here, $\mu_{0}(t)$ denotes the bandwidth allocated by a resource block of BSMG $i$ at time $t$. We assume that a user is associated with a single BSMG during time slot $t$, and the current set of users associated with BSMG $i$ is represented by $U_{i,j}$. Furthermore, the average service rate of BSMG $i$ while serving user $j$ is denoted as $r_{i,j}(t)$. Consequently, the normalized load of BSMG $i$ at time slot $t$ can be expressed as follows:

$$\rho_{i}(t) = \sum_{j \in U_{i,j}} \frac{r_{i,j}(t)}{c_{i,j}(t) \psi_{i}} \tag{3}$$

Here, $\psi_{i}$ represents the number of resource blocks that BSMG $i$ can allocate. For simplicity, it is assumed that a BSMG will not operate in an overloaded state, with its operational modes categorized as either active or asleep. The CBSMG remains perpetually active, whereas the SBSMGs are able to enter sleep mode when required. Accordingly, the energy consumption of the CBSMG and SBSMGs can be determined as follows:

$$P_{c} = P_{0} + \rho_{i} (\theta P_{\max} + P_{f}) \tag{4}$$

$$P_{s} = \begin{cases} P_{A}, & \text{if the SBSMG is active} \\ P_{SL}, & \text{if the SBSMG is sleeping} \end{cases} \tag{5}$$

The energy consumption of the CBSMG has both static and dynamic components. Here, $P_{0}$ denotes the fixed energy consumption during static periods, while the dynamic energy consumption depends on the real-time load experienced by the BSMG. $P_{\max}$ and $P_{f}$ represent the maximum transmit power and circuit power of the CBSMG, respectively, and $\theta$ represents the power loss factor attributable to hardware inefficiencies. Conversely, the energy consumption of an SBSMG remains almost constant despite fluctuations in the traffic load; thus, $P_{A}$ and $P_{SL}$ are used to represent the energy consumption of the SBSMG in its active and sleep states, respectively. Based on Equations (4) and (5), the total energy consumption of the heterogeneous BSMG system can be expressed as follows:

$$P_{total} = P_{c} + \sum_{i = 1}^{N} P_{s}^{i} \tag{6}$$
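The load and power model of Equations (3) to (6) can be sketched in a few lines of Python. The function names, the dictionary of parameters, and the numerical values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def normalized_load(user_rates, achievable_rates, num_resource_blocks):
    """Eq. (3): sum over associated users of r_ij / (c_ij * psi_i)."""
    return sum(r / (c * num_resource_blocks)
               for r, c in zip(user_rates, achievable_rates))

def cbsmg_power(load, p_static, p_max, p_circuit, theta):
    """Eq. (4): static part plus a load-dependent part."""
    return p_static + load * (theta * p_max + p_circuit)

def sbsmg_power(is_active, p_active, p_sleep):
    """Eq. (5): an SBSMG draws one of two constant power levels."""
    return p_active if is_active else p_sleep

def total_power(cbsmg_load, sbsmg_states, params):
    """Eq. (6): CBSMG power plus the sum over all SBSMGs."""
    p_c = cbsmg_power(cbsmg_load, params["p_static"], params["p_max"],
                      params["p_circuit"], params["theta"])
    p_s = sum(sbsmg_power(active, params["p_active"], params["p_sleep"])
              for active in sbsmg_states)
    return p_c + p_s

# Illustrative parameters in watts (assumed, not from the paper).
params = {"p_static": 100.0, "p_max": 20.0, "p_circuit": 10.0,
          "theta": 4.0, "p_active": 10.0, "p_sleep": 2.0}
example_total = total_power(0.5, [True, False, True], params)
```

With these assumed values, each sleeping SBSMG saves the difference between `p_active` and `p_sleep`, which is the quantity the sleep strategy in Section 4 tries to exploit.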

4. Cooperative Sleep and Energy-Sharing Strategy for BSMGs

To reduce energy consumption within the grid while protecting user QoS, we propose a cooperative strategy for sleep and energy sharing among BSMGs guided by the prediction of traffic loads. Then, the optimization problem is modelled and finally solved by the improved MOEA/D algorithm.

4.1. Traffic Prediction Based on Deep Learning

In practice, user traffic within a cellular network exhibits real-time fluctuations, self-similarity, and long-range dependence. To take full advantage of the spatial expansion, feature extraction, and fusion capabilities of the CNN and the temporal extension of BiLSTM, this paper introduces a BSMG traffic prediction model based on a combination of a CNN, a dual attention mechanism (DAM), and BiLSTM. By precisely forecasting the traffic for each BSMG, we aim to effectively manage network resources and enhance the energy efficiency of the overall system.

As a multi-layer supervised learning network employing convolutional operations, a CNN employs weight sharing for spatial expansion, feature extraction, and fusion. The architecture of a CNN generally consists of convolutional, pooling, and fully connected layers. However, for time series prediction models, the use of traditional CNNs may lead to information loss or the identification of local features that do not represent the overall trend of the series.

Attention mechanisms are designed to optimize the use of limited computational resources when processing large volumes of data by focusing on the most significant information. We thus integrate a convolutional block attention module (CBAM) into the CNN framework. By taking into account the interaction of information across time scales and introducing weight allocation, the ability of the CNN to learn salient channel feature information is strongly enhanced, thus improving the predictive accuracy of the model.

The CBAM consists of two components: a channel attention module (CAM) and a spatial attention module (SAM). The CAM begins with global average pooling, followed by two fully connected layers that reduce the dimensionality and increase the nonlinearity, respectively. A sigmoid activation function and feature multiplication yield the attention weight vector for each channel, resulting in a weighted feature representation that strengthens the correlation between different feature channels while eliminating redundant information. This process enables the model to concentrate on the most pertinent feature channels.

The SAM employs maximum pooling across the channel dimensions for each weighted feature map, capturing the maximum response value at each spatial location to highlight important features. Similar to the CAM, this module enhances various spatial locations within the feature map, forcing the model to focus on the most important spatial areas.

The CBAM is employed after each convolutional block of the CNN. The output feature map from each convolutional block is first processed through the CAM and then the SAM, which emphasizes the representation of significant channels and spatial locations. Figure 2 summarizes the structure of the refined CNN network.
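To make the order of operations concrete, the sketch below applies a simplified channel attention step followed by a simplified spatial attention step to a small feature map stored as a list of channels. It is written in plain Python under several assumptions that the text does not specify: a ReLU between the two fully connected layers, a spatial branch that only applies a sigmoid to the channel-wise maximum (a full CBAM typically also uses average pooling and a convolution in that branch), and hypothetical function and weight names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(vec, weights, bias):
    """Fully connected layer: weights holds one row per output unit."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]

def channel_attention(fmap, w1, b1, w2, b2):
    """CAM: global average pooling -> FC (reduce) -> ReLU -> FC -> sigmoid."""
    pooled = [sum(ch) / len(ch) for ch in fmap]        # one value per channel
    hidden = [max(0.0, h) for h in dense(pooled, w1, b1)]
    weights = [sigmoid(z) for z in dense(hidden, w2, b2)]
    return [[wt * x for x in ch] for wt, ch in zip(weights, fmap)]

def spatial_attention(fmap):
    """SAM (simplified): max over channels at each position, then sigmoid."""
    n_pos = len(fmap[0])
    peak = [max(ch[t] for ch in fmap) for t in range(n_pos)]
    weights = [sigmoid(p) for p in peak]
    return [[wt * x for wt, x in zip(weights, ch)] for ch in fmap]

def cbam(fmap, w1, b1, w2, b2):
    """Apply channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap, w1, b1, w2, b2))

# Two channels, three time positions; zero weights give every channel 0.5.
fmap = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
out = cbam(fmap, w1=[[0.0, 0.0]], b1=[0.0], w2=[[0.0], [0.0]], b2=[0.0, 0.0])
```

The output keeps the shape of the input feature map, so the module can be placed after any convolutional block without changing the layers that follow.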

Figure 2. Network structure diagram of the modified CNN.

LSTM networks generally avoid the vanishing and exploding gradient issues associated with traditional recurrent neural networks during the training process and are more suited for the tracking of the long-term dependencies in time-series data. A BiLSTM network consists of two LSTM units, thus improving on traditional one-way LSTM networks, which cannot make full use of future information in sequence data. BiLSTM employs a shared weight matrix that can extract forward and reverse features from the time series via splicing, thus overcoming long-term dependency issues and improving prediction performance. The network structure of BiLSTM is presented in Figure 3.

Figure 3. Network structure of BiLSTM.

The BiLSTM cell structure includes forget, input, and output gates. In the BiLSTM network, the state unit updates and retains the historical state, ensuring that the output at each step is closely linked to both the current and historical input. However, when the time series being processed is very long, the limited storage capacity of the state unit may result in the loss of critical information from important time nodes. To address this, we introduce a multi-head attention mechanism (MHAM) to the BiLSTM network, enhancing its ability to focus on and capture a variety of features within the time series for the prediction of traffic loads.

The MHAM independently concentrates on various features via the parallel computation of multiple attention heads, each employing its own weight allocation method. This facilitates the analysis of input information from multiple perspectives. The output vector from the BiLSTM is transformed into three input matrices, query (Q), key (K), and value (V), using different mapping operations. The corresponding weights obtained during the training process are denoted $w^{Q}$, $w^{K}$, and $w^{V}$, respectively. The assigned weights for each hidden layer vector can consequently be expressed as

$$Q = w^{Q} q \tag{7}$$

$$K = w^{K} k \tag{8}$$

$$V = w^{V} v \tag{9}$$

The MHAM partitions the time series into $L$ subspaces, with each head executing self-attention calculations on its respective subspace. Within each head, the scaled attention scores are normalized using a softmax function and then used to weight the values:

$$\mathrm{head} = \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{10}$$

where $d_{k}$ represents the feature dimension of each key, which is used to scale the scores before the softmax function normalizes them to the interval [0, 1]. The multiple heads that are generated are subsequently concatenated:

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{L}) w^{l} \tag{11}$$

In Equation (11), $w^{l}$ is the weight of the linear transformation, $\mathrm{head}_{l}$ represents the $l$-th head of the MHAM ($l = 1, \dots, L$), and Concat represents the concatenation operation. The structure of the MHAM is presented in Figure 4.

Figure 4. Structure of the multi-head attention mechanism used in the BiLSTM model.
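The attention computation of Equations (7) to (11) can be illustrated with a small plain-Python sketch. It assumes the standard scaled dot-product form, in which the softmax is applied to the scaled scores and the result multiplies V, and it uses a row-vector convention (inputs multiplied by the projection matrices on the right). All function names and the toy matrices are assumptions for illustration.

```python
import math

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def softmax(row):
    m = max(row)                      # subtract the max for stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(q, k, v):
    """Eq. (10): softmax(Q K^T / sqrt(d_k)) V, softmax taken row by row."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / math.sqrt(d_k) for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)

def multi_head(x, heads, w_out):
    """Eq. (11): run each head, concatenate along features, project.

    heads is a list of (w_q, w_k, w_v) projection matrices, one per head.
    """
    outputs = [attention(matmul(x, wq), matmul(x, wk), matmul(x, wv))
               for wq, wk, wv in heads]
    concat = [sum((out[t] for out in outputs), []) for t in range(len(x))]
    return matmul(concat, w_out)

# Toy input: two time steps with two features, two identical heads.
identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
result = multi_head(x, [(identity,) * 3, (identity,) * 3], w_out)
```

Each row of attention weights sums to one, so every output step is a convex combination of the value vectors; this is what lets the model re-weight BiLSTM outputs from distant time steps.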

To improve the traffic prediction performance of the BSMGs, we combine the CBAM with the MHAM to produce a DAM that optimizes the CNN and BiLSTM, respectively (Figure 5). In the feature extraction stage of the CNN–DAM–BiLSTM model, the CAM and the SAM are employed after each convolutional block of the CNN by introducing the CBAM to the CNN. The output feature map of each convolutional block passes sequentially through the CAM and SAM to increase the weighting of important channel features, before being processed by the pooling layer and the fully connected layer. Because BiLSTM can suffer from longitudinal temporal information loss, the MHAM is added to the BiLSTM model to more accurately capture correlations with the time series data. The BiLSTM network is trained to extract features with long-term correlation characteristics from the traffic feature vectors extracted from the CNN layer, and, when combined with the MHAM, the output of the BiLSTM network is weighted to learn more feature information from different spaces. The weighted feature information is then passed to the fully connected layer to predict future traffic loads.

Figure 5. Flow chart for the CNN–DAM–BiLSTM traffic prediction model for BSMGs.

4.2. Optimization Problem Modeling

Based on the aforementioned system model, this section formulates a multi-objective optimization problem to coordinate the sleep decisions, energy allocation, and resource sharing within the BSMG. By jointly optimizing $\lambda_{i}(t)$, $u_{i}(t)$, and $R_{i}(t)$, the optimal trade-off among system revenue, QoS benefits, and energy efficiency is achieved while satisfying QoS constraints and maintaining energy balance.

Decision variable: Based on predictions from the CNN–DAM–BiLSTM traffic prediction model, when a BSMG is idle or its normalized load $\rho_{i}(t)$ is low, it can be switched to sleep mode to save grid or renewable energy. The load $\rho_{i}(t)$ at time $t$ determines the state $\lambda_{i}(t)$ of the BSMG. When a cell becomes idle ($\rho_{i}(t) = 0$), it transitions to sleep mode and remains there while its traffic load stays below the threshold ($\rho_{i}(t) < \rho_{th}$). The sets of active and sleeping BSMGs are defined as

$$\begin{cases} \Gamma^{active}(t) = \{ i \in B_{n}, i \neq 0 \mid \lambda_{i}(t) = 1 \} \\ \Gamma^{sleep}(t) = \{ i \in B_{n}, i \neq 0 \mid \lambda_{i}(t) = 0 \} \end{cases} \tag{12}$$
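A minimal sketch of how the candidate active and sleeping sets of Equation (12) might be derived from predicted loads is given below. The function name, the dictionary input, and the threshold value are assumptions for illustration; the resulting sleep set is only a candidate, since the QoS and load constraints still have to be checked afterwards.

```python
def partition_sbsmgs(predicted_load, load_threshold):
    """Split SBSMG indices into active and sleeping sets, as in Eq. (12).

    predicted_load maps a BSMG index to its forecast normalized load.
    Index 0 (the CBSMG) is skipped because it never sleeps.
    """
    active, sleeping = set(), set()
    for i, rho in predicted_load.items():
        if i == 0:
            continue
        if rho < load_threshold:
            sleeping.add(i)      # lambda_i(t) = 0
        else:
            active.add(i)        # lambda_i(t) = 1
    return active, sleeping

# Illustrative forecast for one CBSMG (index 0) and three SBSMGs.
active_set, sleep_set = partition_sbsmgs(
    {0: 0.9, 1: 0.05, 2: 0.6, 3: 0.0}, load_threshold=0.2)
```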

In addition, $R_{i}(t) = \min_{j \in U_{j}} r_{i,j}(t)$ denotes the minimum required rate among the users during a low-load period. Because the distance between individual BSMGs is short, transmission loss during energy sharing is assumed to be negligible, so a BSMG $i$ with surplus energy can share it with other BSMGs that are facing energy shortages. Therefore, the energy-sharing strategy employed between BSMGs $i$ and $i^{*}$ at time $t$ can be expressed as

$$u_{i}(t) = [u_{i i^{*}}(t)]_{0 \leq i, i^{*} \leq N}, \quad i \neq i^{*} \tag{13}$$

where the case $i = i^{*}$ denotes the energy taken by BSMG $i$ from the smart grid.

Feasibility constraint: Equation (14) is then introduced as a constraint for the active and sleeping BSMGs to ensure the QoS of users based on the rate index while also preventing BSMGs from being overloaded:

$$\begin{cases} \log_{2}(1 + \mathrm{SINR}_{th}) \geq R_{i}(t) \\ 0 \leq \rho_{i}(t) = \sum_{j \in U_{i,j}} \frac{r_{i,j}(t)}{c_{i,j}(t) \psi_{i}} \leq 1 \end{cases}, \quad \forall j \in U_{m}, \; t \in \Delta T, \; \exists i \in B_{n} = \Gamma^{active}(t) \cup \Gamma^{sleep}(t) \tag{14}$$

where $\mathrm{SINR}_{th}$ is the threshold for the user SINR. We obtain the candidate sleep set for the BSMGs using the CNN–DAM–BiLSTM traffic prediction model. The energy consumption of BSMG $i$ can be expressed as follows:

$$P_{i}(t) = \begin{cases} \lambda_{i}(t) P_{A} + (1 - \lambda_{i}(t)) P_{SL}, & i \neq 0 \\ P_{c}(t), & i = 0 \end{cases} \tag{15}$$

The BSMGs can share energy with each other or purchase energy from the smart grid to ensure the QoS of users when their renewable energy supply is exhausted. For simplicity, in this paper, solar energy is the only source of BSMG renewable energy for the model network. The renewable energy generation of a BSMG at time $t$ can be expressed as $g(t)$, which can be predicted based on historical data for photovoltaic power generation and statistical methods as follows [34]:

$$g^{p}(t) = g_{N}^{p} \frac{G}{G_{STC}} \left[ 1 + \chi \left( ET + 30 \cdot \frac{G}{G_{STC}} - T_{STC} \right) \right] \tag{16}$$

where $g^{p}(t)$ is the generation power of the photovoltaic power generation system, $g_{N}^{p}$ is the rated generation power under standard test conditions ($G_{STC} = 1000\ \mathrm{W/m^{2}}$, $T_{STC} = 25\ ^{\circ}\mathrm{C}$), $G$ is the intensity of light radiation, $ET$ is the ambient temperature, and $\chi$ is the power factor. In addition, the renewable energy generation process itself is assumed to consume no energy.
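Equation (16) translates directly into a short function. In the sketch below, the function name and argument names are hypothetical, and the value used for $\chi$ in the example (a small negative coefficient per degree Celsius) is an assumption chosen for illustration rather than a figure from the paper.

```python
G_STC = 1000.0   # irradiance at standard test conditions, W/m^2
T_STC = 25.0     # temperature at standard test conditions, deg C

def pv_power(rated_power, irradiance, ambient_temp, chi):
    """Eq. (16): PV output scaled by irradiance, corrected for temperature."""
    ratio = irradiance / G_STC
    temp_offset = ambient_temp + 30.0 * ratio - T_STC
    return rated_power * ratio * (1.0 + chi * temp_offset)

# Half of the standard irradiance at 25 deg C ambient, assumed chi = -0.004.
example_pv = pv_power(rated_power=1000.0, irradiance=500.0,
                      ambient_temp=25.0, chi=-0.004)
```

With a negative $\chi$, the bracketed term drops below one as the ambient temperature or irradiance rises, so the output grows less than proportionally with irradiance.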

Each BSMG is equipped with an energy storage device to meet periods of high demand in the future and reduce its dependence on smart grid energy. Assuming that the initial energy storage is 0, the energy storage for a BSMG at time $t$ is denoted as $e^{s}(t)$, which is constrained by the maximum energy storage capacity $E^{s}$. The energy storage state in the subsequent time slot can be determined, thereby satisfying the energy storage dynamic balance constraint, as shown in Equation (17):

$$0 \leq e_{i}^{s}(t + 1) = e_{i}^{s}(t) + g_{i}^{p}(t) - P_{i}(t) - \sum_{i} u_{i}(t) \leq E^{s}, \quad \forall i \in B_{n}, \; t \in \Delta T \tag{17}$$

Energy sharing between BSMGs is based on the difference in energy consumption caused by spatiotemporal variation in traffic loads and differences in renewable energy reserves due to geographic factors. The amount of energy available for energy sharing based on the current load and the renewable energy harvesting status of BSMG $i$ can be expressed as follows:

$$e_{i}^{co}(t) = g_{i}^{p}(t) - P_{i}(t) - e_{i}^{s}(t), \quad i \in B_{n}, \; i \neq 0 \tag{18}$$
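The storage balance of Equation (17) and the shareable energy of Equation (18) can be sketched as follows. The function names are hypothetical, the sharing term is treated here as energy leaving the BSMG (an assumption about the sign convention), and Equation (18) is implemented exactly as printed.

```python
def next_storage(stored, generated, consumed, shared_out, capacity):
    """Eq. (17): storage level in the next slot, checked against its bounds.

    shared_out is a list of energy amounts sent to other BSMGs in this slot.
    """
    level = stored + generated - consumed - sum(shared_out)
    if not 0.0 <= level <= capacity:
        raise ValueError("storage balance constraint violated")
    return level

def shareable_energy(generated, consumed, stored):
    """Eq. (18) as printed: generation minus consumption minus storage."""
    return generated - consumed - stored

# Illustrative values in arbitrary energy units.
example_level = next_storage(stored=2.0, generated=5.0, consumed=3.0,
                             shared_out=[1.0], capacity=10.0)
example_surplus = shareable_energy(generated=5.0, consumed=3.0, stored=1.0)
```

Raising an error on an out-of-range level is one simple way to flag an infeasible candidate solution; an optimizer could equally repair or penalize such candidates.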

Objective function: To encourage energy sharing among BSMGs, we set the cost of energy sharing between BSMGs ($\alpha$) to be lower than the cost of energy purchased from the smart grid ($\alpha^{*}$). The system benefit of a BSMG, measured relative to a purely grid-powered supply, can then be expressed as follows:

$$z_{i}(t) = \alpha^{*} P_{i}(t) - (\alpha u_{i i^{*}}(t) + \alpha^{*} u_{i i}(t)) \tag{19}$$

In addition, considering the user experience, we define the QoS objective function for the network energy efficiency obtained using the maximum demand rate as follows:

$$\varpi_{i}(t) = \ln \left( 1 + \frac{R_{i}(t)}{P_{i}(t)} \right) \tag{20}$$

Finally, based on the ratio of total energy consumption to total service rate, the objective function used to determine the network energy efficiency is as follows:

E E = \frac{\sum_{i \in B_{n}} \sum_{j \in U_{j}} r_{i, j}}{P_{t o t a l}}

(21)
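
The three objectives in Equations (19)–(21) can be evaluated with a short Python sketch. The function and argument names are hypothetical, and `u_grid` is assumed to correspond to the term priced at α* in Equation (19), that is, energy obtained from the smart grid. The default prices match the values α = 0.5 and α* = 1 used in the simulations.

```python
import math

ALPHA_SHARE = 0.5  # alpha: unit cost of energy shared between BSMGs
ALPHA_GRID = 1.0   # alpha*: unit cost of energy purchased from the smart grid


def revenue(p_load, u_shared, u_grid, alpha=ALPHA_SHARE, alpha_grid=ALPHA_GRID):
    """Eq. (19): benefit relative to supplying the whole load from the grid."""
    return alpha_grid * p_load - (alpha * u_shared + alpha_grid * u_grid)


def qos_benefit(rate, p_load):
    """Eq. (20): logarithmic QoS benefit ln(1 + R_i(t) / P_i(t))."""
    return math.log(1.0 + rate / p_load)


def energy_efficiency(user_rates, p_total):
    """Eq. (21): sum of user service rates divided by total power."""
    return sum(user_rates) / p_total


# Hypothetical numbers, for illustration only.
print(revenue(p_load=10.0, u_shared=4.0, u_grid=2.0))
print(qos_benefit(rate=300.0, p_load=100.0))
print(energy_efficiency([100.0, 200.0, 300.0], p_total=60.0))
```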

Based on the above analysis, the decision variables of the optimization problem mainly include binary vectors (

Γ^{a c t i v e} (t)

and

Γ^{s l e e p} (t)

) and continuous variables (the amount of energy sharing between BSMGs

u_{i i^{*}} (t) \geq 0

and the user rate guarantee

R_{i} (t) \geq 0

based on QoS requirements). Feasibility constraints include QoS constraints and load constraints (Equation (14)), sleep state constraints (Equation (15)) and energy balance constraints (Equation (17)). The objective function of the model aims to maximize system revenue, QoS benefits, and energy efficiency. This constitutes a typical NP-hard problem. To address the trade-offs between these objectives, we construct a mixed integer nonlinear programming model within the Pareto optimization framework, as demonstrated in Equation (22).

\begin{array}{l} \max {\sum_{i = 0}^{N} z_{i} (t), \sum_{i = 0}^{N} ϖ_{i} (t), E E} \\ s . t . Eqs (14), (15) and (17) \\ λ_{i} (t) \in {0, 1}, u_{i i^{*}} (t) \geq 0, R_{i} (t) \geq 0 \end{array}

(22)

4.3. Improved MOEA/D Algorithm

To address the aforementioned multi-objective optimization problem, this paper employs an enhanced version of the MOEA/D algorithm. The MOEA/D is based on a decomposition strategy that is particularly suitable for solving optimization problems with complex nonlinear and integer decision variables. This algorithm generates a set of uniform weight vectors, divides the multi-objective problem into many single-objective sub-problems, assigns the corresponding weight and the neighbourhood of the related population point to each sub-problem, and generates new solutions through the crossover and mutation of each sub-problem. The decomposition strategy is then used to aggregate the multi-objective problem into a single-objective problem, and the parent population is updated. Each sub-problem is repeated until the termination condition is reached by optimizing the solution of the corresponding neighbourhood. The weighted sum method, Chebyshev method, and penalty-based boundary intersection method are three common decomposition methods in the MOEA/D algorithm. Among these, the Chebyshev decomposition serves as the most fundamental and widely used strategy within MOEA/D. It primarily minimizes the maximum weighted deviation, thereby effectively addressing complex multi-objective optimization problems. The mathematical formulation of its aggregation function is as follows:

J (a |D, V^{*}) = \min_{a \in Ω} {\max [D (F (a) - V^{*})]}

(23)

Let

Ω

denote the set of feasible strategies, V^{*} represent the current optimal solution of the objective function

F (a)

, which is also the ideal point, and

D

be the weight vector. When the weight vector is uniformly distributed, the corresponding solutions of the subproblems are uniformly distributed along the Pareto front. Moreover, the solution update for each subproblem relies on the solutions of neighbouring subproblems, with information sharing facilitated through the neighbourhood structure of the weight vector.
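
The Chebyshev aggregation in Equation (23) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the deviation from the ideal point is taken as an absolute value (an assumption), and the weight generator covers only two objectives for readability, whereas the problem in Equation (22) has three.

```python
import numpy as np


def tchebycheff(f, weights, ideal):
    """Weighted Chebyshev value max_j w_j * |f_j - ideal_j| for one solution."""
    f, weights, ideal = (np.asarray(v, dtype=float) for v in (f, weights, ideal))
    return float(np.max(weights * np.abs(f - ideal)))


def uniform_weights_2d(n):
    """Return n evenly spaced two-objective weight vectors, each summing to 1."""
    w1 = np.linspace(0.0, 1.0, n)
    return np.column_stack([w1, 1.0 - w1])


weights = uniform_weights_2d(5)
ideal = np.array([1.0, 1.0])          # hypothetical ideal point V*
candidate = np.array([3.0, 5.0])      # hypothetical objective vector F(a)

# One scalar sub-problem value per weight vector.
values = [tchebycheff(candidate, w, ideal) for w in weights]
print(values)
```

Each weight vector turns the multi-objective problem into one scalar sub-problem, and the sub-problem keeps the candidate with the smallest aggregated value.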

The standard MOEA/D outperforms other alternatives when dealing with continuous multi-objective problems. However, its fixed neighbourhood size slows convergence, and its fixed mutation probability weakens its global search ability. Therefore, this paper proposes a method that self-adaptively adjusts the neighbourhood size based on a quantum local search. Assume that the current solution represents a quantum state, as depicted in Equation (24):

|φ⟩ = ε |0⟩ + κ |1⟩ = \cos (\frac{μ}{2}) |0⟩ + \sin (\frac{μ}{2}) |1⟩

(24)

Let

μ

denote the angle between the current solution and the local optimum solution. The quantum rotation gate functions to rotate the quantum state and update the probability amplitude:

\{\begin{matrix} ε^{*} = \cos (γ) \cdot ε - \sin (γ) \cdot κ \\ κ^{*} = \sin (γ) \cdot ε + \cos (γ) \cdot κ \end{matrix}

(25)

Let

γ

denote the rotation angle. If

μ \approx 0

, that is, if the current solution is close to a local optimum, then

ε \approx 1, κ \approx 0

. The probability amplitude after rotation is presented in Equation (26):

κ^{*} \approx \sin (γ) \cdot 1 + \cos (γ) \cdot 0 = \sin (γ)

(26)

Consequently, the probability of escaping the local optimum is given by the following:

O_{e s c a p e} = {|κ^{*}|}^{2} = \sin^{2} (γ)

(27)

The initial rotation angle

γ

is assumed to be small, so the escape probability can be approximated using a Taylor expansion:

\sin^{2} (γ) \approx {(γ)}^{2} - \frac{{(γ)}^{4}}{3} + \dots

(28)

When the rotation angle is much less than 1, higher-order terms can be neglected, yielding the following result:

O_{e s c a p e} \propto {(γ)}^{2}

(29)

This indicates that the square of the rotation angle

γ

is proportional to the probability of escaping the local optimum.
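
The small-angle argument behind Equations (27)–(29) can be checked numerically. The following Python sketch compares the exact value sin²(γ) with the two-term Taylor expansion and with the leading term γ²; the function names and the sampled angles are chosen only for illustration.

```python
import math


def escape_probability(gamma):
    """Exact escape probability sin^2(gamma), as in Eq. (27)."""
    return math.sin(gamma) ** 2


def escape_probability_taylor(gamma):
    """Two-term Taylor approximation gamma^2 - gamma^4 / 3, as in Eq. (28)."""
    return gamma ** 2 - gamma ** 4 / 3.0


for gamma in (0.01, 0.05, 0.1, 0.3):
    exact = escape_probability(gamma)
    approx = escape_probability_taylor(gamma)
    print(f"gamma={gamma:.2f} exact={exact:.8f} "
          f"taylor={approx:.8f} leading={gamma ** 2:.8f}")
```

For angles well below 1 rad the three values nearly coincide, which is why the quadratic term alone is retained in Equation (29).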

To accelerate convergence and avoid invalid search, the adaptive rotation coefficient is first calculated as follows [35]:

γ (t) = (γ_{\max} - γ_{\min}) \frac{I t_{\max} - t}{I t_{\max}} + γ_{\min}

(30)

where

I t_{\max}

is the maximum number of iterations and

γ_{\max}

and

γ_{\min}

are preset fixed values. By substituting Equation (30) into Equation (29), the probability of escaping the local optimum using the adaptive rotation strategy can be derived as follows:

O_{e s c a p e} (t) \propto {[γ (t)]}^{2} = {[(γ_{\max} - γ_{\min}) \frac{I t_{\max} - t}{I t_{\max}} + γ_{\min}]}^{2}

(31)

Consequently, the probability of escaping the local optimum is high during the initial iterations. As the rotation angle

γ

decreases in the later stages, the ability to escape the local optimum diminishes, thereby balancing exploration and exploitation.
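
A minimal Python sketch of an adaptive rotation schedule is given here, assuming a linear decay from γ_max at the first iteration to γ_min at the last, so that γ(t) stays inside the preset range. The function names and the numeric values of γ_min and γ_max are hypothetical.

```python
def rotation_angle(t, it_max, gamma_min, gamma_max):
    """Rotation angle at iteration t: gamma_max at t = 0, gamma_min at t = it_max."""
    return (gamma_max - gamma_min) * (it_max - t) / it_max + gamma_min


def relative_escape_probability(t, it_max, gamma_min, gamma_max):
    """Quantity proportional to the escape probability, i.e. gamma(t) squared."""
    return rotation_angle(t, it_max, gamma_min, gamma_max) ** 2


# Hypothetical angle range; 400 iterations as in the convergence experiment.
for t in (0, 100, 200, 300, 400):
    g = rotation_angle(t, 400, gamma_min=0.01, gamma_max=0.1)
    print(t, round(g, 4), round(g ** 2, 6))
```

The printed values decrease monotonically, reflecting a search that explores widely at first and refines locally towards the end.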

The attraction point

S

is generated based on the individual historical optimal position and the group historical optimal position using Equation (32):

S_{k} = ϕ X_{k}^{b e s t} + (1 - ϕ) X^{b e s t}

(32)

where

ϕ

is a random matrix with a uniform distribution within [0, 1]. Based on the

δ

-well potential, it is assumed that the neighbourhood position vector exhibits quantum behaviour, with the wave function

ϕ

used to describe the vector state. By solving the Schrödinger equation for the one-dimensional

δ

-well potential, the probability density for the vector at a certain point in space is obtained:

D (X_{k}^{b e s t}) = e^{\frac{- 2 |S_{k}^{i} - X_{k}^{b e s t}|}{L}}, L = 2 γ (t) |X_{k}^{b e s t} - X_{k}^{i}|

(33)

where

X_{k}^{i}

is the new neighbourhood position vector and

L

is the position vector after the individual expansion coefficient is changed. Finally, the new position vector equation is obtained using a Monte Carlo stochastic simulation as follows:

X_{k}^{i} (t + 1) = γ_{k} \pm \frac{L}{2} \ln (\frac{1}{ϕ})

(34)
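
The quantum local search step of Equations (32)–(34) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the function name `quantum_update` is hypothetical, the sign in Equation (34) is drawn with equal probability, and the update is centred on the attraction point S_k, as is usual in quantum-behaved particle swarm updates.

```python
import numpy as np


def quantum_update(x, x_best_k, x_best_global, gamma_t, rng):
    """Sample a new position vector around the attraction point.

    x             -- current neighbourhood position vector X_k^i
    x_best_k      -- individual historical best X_k^best
    x_best_global -- group historical best X^best
    gamma_t       -- rotation (expansion) coefficient gamma(t)
    rng           -- numpy random Generator
    """
    phi = rng.uniform(0.0, 1.0, size=x.shape)              # random weights, Eq. (32)
    attractor = phi * x_best_k + (1.0 - phi) * x_best_global
    length = 2.0 * gamma_t * np.abs(x_best_k - x)           # characteristic length L, Eq. (33)
    # Lower bound just above zero avoids log(1 / 0) in the Monte Carlo draw.
    u = rng.uniform(np.finfo(float).tiny, 1.0, size=x.shape)
    sign = np.where(rng.random(x.shape) < 0.5, -1.0, 1.0)   # assumed 50/50 sign choice
    return attractor + sign * 0.5 * length * np.log(1.0 / u)


rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0])
print(quantum_update(x, x + 1.0, x + 2.0, gamma_t=0.1, rng=rng))
```

When the current position already coincides with the individual best, the length L is zero and the sample collapses onto the attraction point, so the step size shrinks automatically as the search converges.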

In addition, the adaptive mutation probability

υ (t + 1)

is generated based on the number of iterations and Gaussian mutation to enhance the global search without reducing the convergence speed, as shown in Equation (35):

υ (t + 1) = υ (t) \cdot (1 - ϑ \cdot \frac{I t}{I t_{\max}}) + G a (0, 1)

(35)

where

ϑ

is the rate adjustment factor and

G a (0, 1)

is the standard normal distribution. A summary of the improved MOEA/D is presented in Figure 6 and Algorithm 1.
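
The update in Equation (35) can be sketched in Python as follows. The function name, the `noise_scale` argument, and the clipping to [0, 1] are assumptions added here so that the result remains a valid probability; they are not specified in the text.

```python
import numpy as np


def next_mutation_probability(v_t, it, it_max, theta, rng, noise_scale=1.0):
    """Adaptive mutation probability following Eq. (35).

    v_t         -- current mutation probability v(t)
    it, it_max  -- current and maximum iteration numbers
    theta       -- rate adjustment factor
    noise_scale -- hypothetical scale applied to the Gaussian term
    """
    decay = 1.0 - theta * it / it_max
    candidate = v_t * decay + noise_scale * rng.standard_normal()
    # Clipping is an assumption: it keeps the value usable as a probability.
    return float(np.clip(candidate, 0.0, 1.0))


rng = np.random.default_rng(0)
print(next_mutation_probability(0.2, it=100, it_max=400, theta=0.5,
                                rng=rng, noise_scale=0.05))
```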

Algorithm 1. Improved MOEA/D algorithm

Input: multi-objective optimization problem, evenly distributed weight vector, neighbourhood size, maximum number of iterations

{I t}_{\max}

and crossover and mutation parameters, and quantum rotation angle range (

γ_{\min}, γ_{\max}

)
Step 1: Initialize the neighbourhood. For each weight vector, compute the Euclidean distance to all other weight vectors. Select the

ς

nearest weight vectors as the neighbourhood. Subsequently, initialize the population and the ideal point

V^{*}

.
Step 2: For

i = 1, 2 \dots L

, do
Step 3: For each individual

i

, candidate solution

\hat{a}

is generated based on quantum local search.
Calculate the adaptive rotation angle.
Generate the attraction point.
The solution is updated by Monte Carlo sampling.
Step 4: Perform adaptive Gaussian mutation on

\hat{a}

and calculate the dynamic mutation probability.
Step 5: Update the ideal point: If an objective value of

\hat{a}

is superior to that of

V^{*}

, then

V^{*} = F (\hat{a})

is updated accordingly.
Step 6: Update the neighbourhood. Utilize the Chebyshev decomposition-based aggregate function. If

J (a |D, V^{*}) \leq J (\hat{a} |D, V^{*})

, then the objective function value is

F V^{i} = F (\hat{a})

.
Step 7: If the maximum number of iterations is reached, stop the iteration and output the optimal solution set; otherwise, return to Step 2.
Output: Non-dominated solutions stored during the search.
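
Step 1 of Algorithm 1 can be sketched in Python as follows. The function name `init_neighbourhoods` is hypothetical, and each neighbourhood is assumed to include the weight vector itself (at distance 0), as is common in MOEA/D implementations.

```python
import numpy as np


def init_neighbourhoods(weights, size):
    """For each weight vector, return the indices of its `size` nearest weight vectors."""
    weights = np.asarray(weights, dtype=float)
    diff = weights[:, None, :] - weights[None, :, :]
    dist = np.linalg.norm(diff, axis=2)  # pairwise Euclidean distances
    # A stable sort keeps the lower index first when two distances tie.
    return np.argsort(dist, axis=1, kind="stable")[:, :size]


# Five evenly spaced two-objective weight vectors, for illustration only.
weights = np.array([[0.0, 1.0], [0.25, 0.75], [0.5, 0.5], [0.75, 0.25], [1.0, 0.0]])
neighbours = init_neighbourhoods(weights, size=3)
print(neighbours)
```

Each row lists the sub-problems whose solutions are exchanged with sub-problem i during the update in Step 6.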

Figure 6. Flowchart for the improved MOEA/D.

The time complexity of the original MOEA/D algorithm is primarily determined by the following operations:

Neighbourhood cooperation: Each subproblem must traverse

T

solutions within its neighbourhood for information exchange. The time complexity for this operation is

O (Z * T)

, where

Z

denotes the number of subproblems.

Crossover and mutation: The genetic operation complexity for each subproblem is

O (ω)

, and the total is

O (Z * ω)

.

Reference point update: To maintain the ideal point, all objective values must be traversed. The complexity for this operation is

O (Z * V A)

, where

V A

is the number of objectives.

In summary, the time complexity for a single iteration of the original algorithm is

O (Z * (T + ω + V A))

. The overall complexity, considering

I t_{\max}

as the maximum number of iterations, is

O (I t_{\max} * Z * (T + ω + V A))

.

The improvements to the algorithm include the adaptive neighbourhood size for quantum local search and the dynamic adjustment of the Gaussian mutation probability. The calculations of the expansion coefficient, the generation of the attraction point, the computation of probability density, and the Monte Carlo simulation, which are introduced in the algorithm, are related to the neighbourhood size

T

. However, since only linear operations are required for each individual, the complexity remains

O (Z * T)

, which is of the same order as the original neighbourhood operation. The dynamic adjustment of the mutation probability involves generating a normal distribution and updating parameters, with the complexity of a single operation being

O (1)

, and the total being

O (Z)

. Compared to the original mutation operation, the complexity order does not increase. In summary, the complexity of the improved algorithm remains

O (I t_{\max} * Z * (T + ω + V A))

. Through adaptive neighbourhood adjustment and dynamic mutation probability, the enhanced MOEA/D algorithm significantly improves global search capability and convergence speed while maintaining the complexity order of the original algorithm. In the context of energy sharing within a base station microgrid, the algorithm can efficiently address dynamic multi-objective optimization problems and meet real-time scheduling requirements.

5. Simulation Results

We initially conducted the simulation verification utilizing MATLAB 2023a to assess the performance of the proposed scheme. To capture spatiotemporal dynamics in urban user mobility and renewable energy generation, we analysed a 12-month dataset from 2023 from 5G base stations at Guangxi University, shown in Figure 7. The dataset includes half-hourly measurements of network traffic, energy consumption, PV generation, and environmental factors. In addition, the base station comprises 30 photovoltaic panels for green energy provision, along with batteries having a capacity of 8 kW·h for energy storage. Prior to model training, feature selection was performed using mutual information scores, retaining the top 15 features (e.g., active users, lagged traffic, hour of day). Highly correlated features (PV generation vs. solar irradiance) were aggregated to avoid multicollinearity. Figure 8 presents the traffic patterns for the base station over the course of two days. The load at the base station remains relatively high during daylight hours, coinciding with higher energy demands, while it experiences a significant decline at night, leading to lower energy consumption and a potential energy surplus within the storage device.

Figure 7. 5G base station used for traffic prediction at Guangxi University in China.

Figure 8. Traffic variation for a base station over a period of 48 h.

The network parameters used for the heterogeneous BSMG system are summarized in Table 1, while the cost of energy sharing between BSMGs and the cost of procuring energy from the smart grid are set at

α = 0.5

and

α^{*} = 1

, respectively.

Table 1. Network parameters for the heterogeneous BSMG system.

Table 2 presents the traffic and meteorological data during the evening peak period (18:00–20:30) on a particular summer day, characterized by high user density and a load rate of 0.95, nearing the network capacity limit. This scenario primarily results from the surge in traffic due to video streaming, social media, and other applications, coupled with increased user density that intensifies resource competition and diminishes the efficiency of base station scheduling. Consequently, some users may switch to nearby base stations; however, the decline in channel quality leads to a reduced service rate (from 350 Mbps at 18:00 to 320 Mbps at 19:30). Additionally, decreased photovoltaic generation compels base stations to purchase power from the grid and necessitates increased energy sharing.

Table 2. The traffic and meteorological data.

The CNN–DAM–BiLSTM architecture encompasses a substantial number of hyperparameters, and their configuration significantly impacts the experimental outcomes. To efficiently identify the optimal parameter combination, we employ a Bayesian optimization algorithm. The Bayesian optimization algorithm leverages the prior probability distribution of the objective function and known observation points to update the posterior probability distribution, subsequently identifying the next minimum value based on this updated distribution. In Bayesian optimization, the optimal parameters are obtained by sampling from regions where the global optimum is most probable and from unexplored areas, iteratively minimizing the loss function. Initially, a pre-defined range of parameters is specified, and Bayesian optimization searches within these ranges to determine the best parameter combination for the current model. The root mean square error (RMSE) and mean absolute percentage error (MAPE) are also used to quantify the performance of the prediction models (Equations (36) and (37), respectively).

r_{R M S E} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\tilde{y}}_{i})}^{2}}

(36)

r_{M A P E} = \frac{1}{n} \sum_{i = 1}^{n} |\frac{y_{i} - {\tilde{y}}_{i}}{y_{i}}| \times 100 \%

(37)
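
The two metrics in Equations (36) and (37) can be computed with a short Python sketch. The function names and sample values are illustrative; the MAPE function assumes that no true value is zero.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error, Eq. (36)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent, Eq. (37); y_true must be non-zero."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


# Hypothetical traffic values, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
print(rmse(actual, predicted), mape(actual, predicted))
```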

The primary hyperparameters of the prediction model in this study include the number of BiLSTM units, the number of attention heads (MHAM), the number of convolution kernels (CNN), and the learning rate. To compare the performance metrics of different parameter settings more clearly, we conducted experiments with several pre-selected values; the resulting performance is presented in Table 3.

Table 3. The resulting parameter performance.

The results indicate that the attention mechanism and the number of BiLSTM units significantly influence the model’s performance and should be prioritized for optimization. The learning rate and convolution kernel size also require adjustment to prevent overfitting or underfitting. Specifically, increasing the number of BiLSTM units from 32 to 64 results in a substantial decrease in RMSE and MAE by 15.1% and 19.2%, respectively. This suggests that augmenting model capacity enhances its ability to capture temporal dependencies. However, further increasing the number of units to 128 leads to overfitting, as evidenced by a validation set RMSE of 21.8. The number of attention heads, when within an optimal range, improves the model’s capacity to focus on multi-dimensional features. Exceeding this range results in performance degradation due to computational redundancy and noise interference. Additionally, appropriately selecting the convolution kernel size and learning rate enriches spatial feature extraction and ensures stable convergence, thereby ultimately enhancing prediction accuracy.

SHAP (SHapley Additive exPlanations) offers a significant advantage in explaining deep learning models by quantifying the directional impact of hyperparameters on prediction error. Grounded in game theory, SHAP equitably allocates contributions among parameters, thereby circumventing the biases inherent in traditional grid search methods. As depicted in Figure 9, the colour gradient (ranging from blue to red) intuitively illustrates nonlinear interactions. Notably, the number of BiLSTM cells exhibits diminishing returns. Additionally, the SHAP values for the number of convolution kernels are predominantly concentrated within the range of −0.2 to 0.2 (with a mean value close to 0), indicating that adjusting this hyperparameter has minimal impact on RMSE, with fluctuations limited to ±2.1%. Configurations with more than eight attention heads show a substantial SHAP variance of 15%, suggesting that redundant feature interactions impair generalization and lead to overfitting. Lastly, the learning rate exhibits the broadest range of SHAP values (−0.3 to 0.4), underscoring its pivotal role in model convergence. However, high learning rates can induce performance degradation, with optimal stability achieved at a learning rate of 0.005. Consequently, the optimal hyperparameter combination is as follows: the number of BiLSTM cells = 64, the number of attention heads = 8, the number of convolution kernels = 32, and the learning rate = 0.005.

Figure 9. SHAP analysis of predictive models.

To demonstrate the necessity of each component in the dual attention mechanism, we conducted ablation experiments, as reported in Table 4. The experimental results indicate that the components within the DAM are complementary, and their synergy effectively fuses spatio-temporal features. CBAM selects key spectral features and spatial hotspots through channel-spatial attention. CBAM alone outperforms MHAM alone, suggesting that CBAM attention is more critical for local feature extraction. Removing CAM degrades performance more than removing SAM (the RMSE worsens by 15%), indicating that channel attention plays a dominant role in spectral feature selection. Conversely, MHAM captures long-term traffic dependencies through temporal attention. Removing either component results in a significant decrease in prediction performance (p < 0.05), thereby verifying the necessity of the DAM architecture design.

Table 4. Ablation experiments for CNN–DAM–BiLSTM prediction models.

To simulate the traffic demand for the BSMGs, we used historical traffic data from the previous eight days, with the first seven days allocated for training and the eighth day reserved for testing. We compared the results of the proposed CNN–DAM–BiLSTM model with those obtained from CNN–BiLSTM [36], CNN–BiGRU [37], and CNN–SA–BiLSTM [38]. These baselines represent the state of the art in hybrid deep learning for cellular traffic prediction. CNN–BiLSTM and CNN–BiGRU serve as foundational architectures for spatiotemporal modelling, while CNN–SA–BiLSTM embodies a recent advancement incorporating self-attention mechanisms. By comparing these models with our DAM-enhanced architecture, we illustrate the improvements in feature fusion and long-range dependency capture achieved through the incorporation of dual attention mechanisms.
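
The chronological split described above can be sketched in Python as follows, assuming half-hourly sampling as in the dataset description (48 samples per day). The function name `split_by_day` and the synthetic series are hypothetical.

```python
import numpy as np

SAMPLES_PER_DAY = 48  # half-hourly measurements


def split_by_day(series, train_days=7, test_days=1):
    """Split a time-ordered series into leading training days and trailing test days."""
    series = np.asarray(series)
    n_train = train_days * SAMPLES_PER_DAY
    n_test = test_days * SAMPLES_PER_DAY
    if series.shape[0] < n_train + n_test:
        raise ValueError("series is shorter than the requested split")
    return series[:n_train], series[n_train:n_train + n_test]


# Synthetic stand-in for eight days of traffic measurements.
traffic = np.arange(8 * SAMPLES_PER_DAY)
train, test = split_by_day(traffic)
print(train.shape, test.shape)
```

Keeping the split chronological, without shuffling, ensures that the test day lies strictly after every training sample.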

Figure 10 presents the real traffic for a base station over 24 h and the prediction results obtained from the compared prediction models. The predictions from each model are more accurate in the initial stages, with the CNN–BiLSTM, CNN–BiGRU, and CNN–SA–BiLSTM models returning large errors during peak night-time traffic. In contrast, the proposed model exhibits a higher prediction accuracy during this peak and over the entire period in general.

Figure 10. Traffic prediction results for the compared models and the ground truth over a 24 h period.

The evaluation results are presented in Table 5. BiLSTM and BiGRU cannot accurately capture long-distance dependencies, resulting in a high RMSE and MAPE, while CNN–SA–BiLSTM cannot effectively combine the local features extracted by the CNN and the key information from BiLSTM due to its attention mechanism, adversely affecting its prediction performance. In contrast, due to its DAM, the proposed model accurately captures different features of the time series, leading to an RMSE and MAPE that are 72.09% and 74.71% lower, respectively, than the CNN–BiGRU model. Collectively, these results confirm the feasibility of the CNN–DAM–BiLSTM model for traffic prediction.

Table 5. Evaluation results for the compared prediction models.

Based on the user traffic data predicted by the CNN–DAM–BiLSTM model, we then analyse the influence of the improved MOEA/D on the energy-sharing strategy of the BSMGs in comparison to the other multi-objective optimization algorithms MOSSA [39], NSGA-II [40], MOPSO [41], and MOEA/D [42]. NSGA-II and MOPSO serve as classical benchmarks for diversity and convergence, respectively. MOSSA, with its efficient exploration and exploitation balancing mechanism, is particularly suitable for nonlinear problems such as energy sharing. Meanwhile, the original MOEA/D underscores the impact of our quantum-inspired modifications.

Figure 11 compares the improved MOEA/D and the other four algorithms in terms of the average system revenue when the number of iterations is set at 400, with the results averaged over every 20 iterations. The improved MOEA/D is observed to converge more rapidly to the optimal solution than the other algorithms. This is because the quantum local search strategy used in the improved MOEA/D allows local optima to be avoided, ensuring that the algorithm approaches the Pareto front during the iterations and leading to an optimal solution that is closer to the theoretical optimum than the other algorithms. MOSSA locates the Pareto front more rapidly due to its unique search mechanism and adaptive strategy, but it readily falls into local optima in complex objective space. The other two algorithms converge to a poor Pareto front due to their over-dependence on local information and the uneven distribution of the initial population.

Figure 11. Impact of different multi-objective optimization algorithms on the energy-sharing strategy of BSMGs.

The selection of the sleep threshold

ρ_{t h}

has a significant impact on the average system revenue and network energy efficiency. Figure 12 presents a comparison of the average system revenue and network energy efficiency of the proposed scheme under different load thresholds. As the threshold rises, the energy efficiency of the network increases because the number of dormant BSMGs is lower, meaning that more users can access the CBSMG and meet their QoS requirements more effectively. Considering the system revenue and network energy efficiency, a moderate load threshold of

ρ_{t h} = 0.2

can maximize the traffic rate requirements of all users while ensuring that multiple BSMGs enter sleep mode, thus leading to more significant energy-saving advantages.

Figure 12. Comparison of average system revenue and network energy efficiency under different load thresholds.

In addition, we assess the impact of the number of BSMGs on the average system revenue (Figure 13). The average system revenue is positively correlated with the increase in the number of BSMGs, mainly because an increase in the number of BSMGs leads to a higher possibility of no or low loads. Therefore, using sleep mode decisions based on traffic predictions and energy sharing between BSMGs, more BSMGs can be made dormant while still ensuring the service requirements of users, raising the average revenue of the system.

Figure 13. Impact of the number of BSMGs on the average system revenue.

After confirming the stability of the number of BSMGs and the load threshold, the performance of our proposed strategy for cooperative sleep and energy sharing is compared with other energy-saving schemes ([43,44,45,46]). The comparative literature on this topic is detailed as follows:

Ref. [43]: The algorithm primarily uses the distance between CBSMGs as a criterion and randomly selects active SBSMGs to enter a sleep state.

Ref. [44]: This method does not consider the QoS demands of users when determining base station sleep patterns. It also relies on traffic prediction results generated through deep learning methods.

Ref. [45]: The algorithm employs Q-learning and other reinforcement learning techniques to tackle the challenge of optimizing BSMG sleep states and energy-sharing mechanisms.

Ref. [46]: The energy-sharing problem in BSMGs is addressed through a dynamic evolutionary game, offering a solution for optimal energy distribution among base stations.

Figure 14 presents the system revenue for a heterogeneous BSMG system under five energy-saving algorithms. Overall, the system power consumption increases with higher network loads because more SBSMGs are woken up, and the load for the CBSMG increases due to the higher number of users. At night, because there are fewer active users and more sleeping BSMGs, the energy storage devices for each BSMG meet their energy demand, so the difference in the system revenue between the control algorithms is low. However, in the daytime, the proposed strategy significantly reduces the energy consumption by 13% on average compared with the other energy-saving algorithms. This is because the CNN–DAM–BiLSTM traffic prediction model more accurately identifies which SBSMGs can be put into sleep mode. The heterogeneous BSMG energy-sharing strategy proposed in this paper also reduces the dependence on grid energy by encouraging energy sharing between individual BSMGs, while other energy-saving algorithms need to purchase more energy from the grid once the green energy is exhausted, which reduces system revenue. In particular, the control algorithm from [43] prioritizes SBSMGs that sleep close to the CBSMG, which means that the CBSMG load is always close to saturation, reducing the number of sleeping BSMGs and thus reducing system revenue.

Figure 14. Comparison of the system revenue for a heterogeneous BSMG system under five energy-saving algorithms ([43,44,45,46]).

Figure 15 presents the network energy efficiency obtained using the five energy-saving algorithms over a 24 h period. The network energy efficiency decreases with the lower number of users in the early morning. In the daytime, with an increase in user activity, the network energy efficiency of each algorithm also rises, reaching a peak between 12 h and 18 h. However, the network energy efficiency of the proposed algorithm is higher than that of the other four algorithms across the entire period. This is because the other four algorithms do not consider the maximum demand rate for users, so QoS requirements cannot be fully met; these algorithms tend to improve system efficiency while ignoring network energy efficiency. The dynamic evolutionary game is used to improve the energy efficiency in [46], but the improved MOEA/D outperforms it because of its better target processing mechanism, stronger convergence, and solution diversity.

Figure 15. Network energy efficiency obtained using various BSMG energy-saving algorithms ([43,44,45,46]).

The modified MOEA/D also improves green energy utilization due to energy-sharing among the individual BSMGs (Figure 16). Compared with the other four energy-saving algorithms, the green energy utilization rate of the proposed algorithm increases by up to 36.5%, highlighting its ability to maximize the use of renewable energy through energy sharing and cooperative sleep decisions, which is in line with low-carbon policies and initiatives.

Figure 16. Comparison of different energy-saving algorithms in green energy utilization. The error bars represent the 95% confidence interval for the mean [32,33,34,35].

In summary, the scheme in [43] primarily relies on the physical distance between the base station and the CBSMG for sleep decision-making, employing a random selection mechanism. This static strategy, based on geometric location, overlooks the spatio-temporal dynamics of traffic demand and the uneven distribution of renewable energy. Although the scheme in [44] utilizes deep learning for traffic prediction, its sleep decision-making process does not explicitly account for user QoS, thereby increasing the risk of service interruption and local overload. The optimization strategy based on Q-learning and other reinforcement learning methods, as presented in [45], faces two significant challenges: the curse of dimensionality and the exploration–exploitation trade-off. The study in [46] focuses on solving the Nash equilibrium through evolutionary game theory, which typically assumes that participants are fully rational and that the payoff function is linearly separable. These assumptions limit the strategy’s ability to address multi-objective nonlinear optimization problems and result in high computational complexity. In contrast, the proposed strategy leverages a deep learning-driven prediction-optimization closed-loop architecture and integrates multi-objective decomposition with adaptive search algorithms. This approach systematically enhances energy efficiency while ensuring QoS and effectively manages photovoltaic fluctuations and traffic bursts, thereby providing a scalable collaborative management framework for 5G heterogeneous BSMGs.

6. Conclusions

This study proposes a novel cooperative sleep and energy-sharing strategy for heterogeneous BSMG systems, integrating deep learning-based traffic prediction and an enhanced MOEA/D algorithm. Specifically, a hierarchical BSMG model (CBSMG and SBSMGs) was established to enable bidirectional energy–information flow, addressing the challenges of green energy intermittency and traffic variability in dense 5G deployments. To enhance prediction accuracy, a hybrid CNN–DAM–BiLSTM architecture incorporating dual attention mechanisms was developed. This architecture achieved a 72% reduction in RMSE compared to baseline models and reliably identified low-load SBSMGs for sleep mode activation. Furthermore, the enhanced MOEA/D algorithm, augmented with quantum local search, optimized energy sharing and sleep decisions, resulting in a 13% average energy saving during peak loads (compared to traditional grid-only schemes) and a 36.5% increase in intra-BSMG green energy collaboration. However, limitations include simplified energy storage dynamics (neglecting battery degradation), idealized user mobility assumptions, and deterministic renewable generation models. To address these limitations, future work will explore cross-layer optimization with software-defined networking to jointly allocate radio resources and energy, while integrating user-association strategies to balance fairness and QoS.

Author Contributions

Software, M.Y. and W.G.; project administration, T.Q.; formal analysis, M.Y.; investigation, W.G.; resources, Y.H.; data curation, M.Y.; writing—original draft preparation, M.Y.; writing—review and editing, T.Q.; visualization, W.G.; supervision, Y.H.; funding acquisition, T.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by Guangxi Key Research and Development plan project (AB23026037, AB24010274); in part by the Guangxi Science and technology base and talent project (AD24010061).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Yongle Hu was employed by the company Runjian Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Attaoui, W.; Sabir, E.; Elbiaze, H.; Guizani, M. VNF and CNF Placement in 5G: Recent Advances and Future Trends. IEEE Trans. Netw. Serv. Manag. 2023, 20, 4698–4733.
2. Bagwari, A.; Logeshwaran, J.; Raja, M.; Devisivasankari, P.; Bagwari, J.; Rathi, V.; Saad, A.M. Intelligent Computational Model for Energy Efficiency and AI Automation of Network Devices in 5G Communication Environment. Tsinghua Sci. Technol. 2024, 29, 1728–1751.
3. Agarwal, B.; Togou, M.A.; Ruffini, M.; Muntean, G. A Comprehensive Survey on Radio Resource Management in 5G HetNets: Current Solutions, Future Trends and Open Issues. IEEE Commun. Surv. Tutor. 2022, 24, 2495–2534.
4. Fourati, H.; Maaloul, R.; Chaari, L.; Jmaiel, M. An Efficient Energy-Saving Scheme Using Genetic Algorithm for 5G Heterogeneous Networks. IEEE Syst. J. 2023, 17, 589–600.
5. Sharma, H.; Kumar, N.; Tekchandani, R.K. SecBoost: Secrecy-Aware Deep Reinforcement Learning Based Energy-Efficient Scheme for 5G HetNets. IEEE Trans. Mob. Comput. 2024, 23, 1401–1415.
6. Mughees, A.; Tahir, M.; Sheikh, M.A.; Amphawan, A.; Meng, Y.K.; Ahad, A.; Chamran, K. Energy-efficient joint resource allocation in 5G HetNet using Multi-Agent Parameterized Deep Reinforcement learning. Phys. Commun. 2023, 61, 102206.
7. Gorla, P.; Deshmukh, A.; Joshi, S.; Chamola, V.; Guizani, M. A Game Theoretic Analysis for Power Management and Cost Optimization of Green Base Stations in 5G and Beyond Communication Networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 2714–2725.
8. Israr, A.; Yang, Q.; Israr, A. Emission-Aware Sustainable Energy Provision for 5G and B5G Mobile Networks. IEEE Trans. Sustain. Comput. 2023, 8, 670–681.
9. Jiang, X.; Sun, A.; Sun, Y.; Luo, H.; Guizani, M. A Trust-Based Hierarchical Consensus Mechanism for Consortium Blockchain in Smart Grid. Tsinghua Sci. Technol. 2023, 28, 69–81.
10. Yan, M.; Guo, W.; Zheng, H.; Qin, T. Joint NTP-MAPPO and SDN for Energy Trading Among Multi-Base-Station Microgrids. IEEE Internet Things J. 2024, 11, 18568–18579.
11. Muhtadi, A.; Pandit, D.; Nguyen, N.; Mitra, J. Distributed Energy Resources Based Microgrid: Review of Architecture, Control, and Reliability. IEEE Trans. Ind. Appl. 2021, 57, 2223–2235.
12. Wu, J.; Wong, E.W.M.; Chan, Y.; Zukerman, M. Power Consumption and GoS Tradeoff in Cellular Mobile Networks with Base Station Sleeping and Related Performance Studies. IEEE Trans. Green Commun. Netw. 2020, 4, 1024–1036.
13. Ma, X.; Mu, Y.; Jia, H.; Li, M.; Long, Y.; Huang, Q.; Jiang, X. Exploring power system flexibility regulation potential based on multi-base-station cooperation self-optimising sleep strategy for 5G base stations. IET Energy Syst. Integr. 2023, 6, 345–363.
14. Liu, S.; He, M.; Wu, Z.; Lu, P.; Gu, W. Spatial-temporal graph neural network traffic prediction based load balancing with reinforcement learning in cellular networks. Inf. Fusion 2024, 103, 102079.
15. Liu, L.; Yuan, X.; Chen, D.; Zhang, N.; Sun, H.; Taherkordi, A. Multi-User Dynamic Computation Offloading and Resource Allocation in 5G MEC Heterogeneous Networks with Static and Dynamic Subchannels. IEEE Trans. Veh. Technol. 2023, 72, 14924–14938.
16. Kaur, P.; Garg, R.; Kukreja, V. Energy-efficiency schemes for base stations in 5G heterogeneous networks: A systematic literature review. Telecommun. Syst. 2023, 84, 115–151.
17. Guo, W.; Koo, J.; Siddiqui, I.F.; Qureshi, N.M.F.; Shin, D.R. QoS-Aware Energy-Efficient MicroBase Station Deployment for 5G-Enabled HetNets. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 10487–10495.
18. Noh, J.; Lee, B.; Oh, S. User-Number Threshold-Based Base Station On/Off Control for Maximizing Coverage Probability. IEEE Trans. Veh. Technol. 2022, 71, 3214–3228.
19. Ghosh, A.; Misra, I.S. Enabling sustainable green communication in three-tier 5G ultra dense HetNet with sleep cycle modulated energy harvesting. Wirel. Netw. 2024, 31, 841–864.
20. Qiu, B.; Mao, S.; Xiao, H.; Zhang, Z. Power-Aware User Association and Renewable Energy Configuration for the Optimized On-Grid Energy in Hybrid-Energy Heterogeneous Cellular Networks. IEEE Syst. J. 2024, 18, 162–173.
21. Chiaraviglio, L.; D’Andreagiovanni, F.; Rossetti, S.; Sidoretti, G.; Blefari-Melazzi, N.; Salsano, S.; Chiasserini, C.-F.; Malandrino, F. Algorithms for the design of 5G networks with VNF-based Reusable Functional Blocks. Ann. Telecommun. 2019, 74, 559–574.
22. Salahdine, F.; Opadere, J.; Liu, Q.; Han, T.; Zhang, N.; Wu, S. A survey on sleep mode techniques for ultra-dense networks in 5G and beyond. Comput. Netw. 2021, 201, 108567.
23. Ma, X.; Mu, Y.; Liu, Z.; Jiang, X.; Zhang, J.; Gao, Y. Energy consumption optimization of 5G base stations considering variable threshold sleep mechanism. Energy Rep. 2023, 9, 34–42.
24. Garroppo, R.G.; Scutella, M.G.; D’Andreagiovanni, F. Robust green Wireless Local Area Networks: A matheuristic approach. J. Netw. Comput. Appl. 2020, 163, 102657.
25. Wassie, G.; Ding, J.; Wondie, Y. Traffic prediction in SDN for explainable QoS using deep learning approach. Sci. Rep. 2023, 13, 20607.
26. Wang, Z.; Hu, J.; Min, G.; Zhao, Z.; Chang, Z.; Wang, Z. Spatial-Temporal Cellular Traffic Prediction for 5G and Beyond: A Graph Neural Networks-Based Approach. IEEE Trans. Ind. Inform. 2023, 19, 5722–5731.
27. Qi, Y.; Wang, H. QoS-aware cell association based on traffic prediction in heterogeneous cellular networks. IET Commun. 2017, 11, 2775–2782.
28. Hu, X.; Liu, W.; Huo, H. An intelligent network traffic prediction method based on Butterworth filter and CNN-LSTM. Comput. Netw. 2024, 240, 110172.
29. He, C.; Zhang, B.; Wei, S. Design of Base Station Sleeping Scheme in Heterogeneous Cellular Networks Based on User Traffic and SINR. In Proceedings of the 2024 International Wireless Communications and Mobile Computing (IWCMC), Ayia Napa, Cyprus, 27–31 May 2024; pp. 933–938.
30. Shen, W.; Zhang, H.; Guo, S.; Zhang, C. Time-Wise Attention Aided Convolutional Neural Network for Data-Driven Cellular Traffic Prediction. IEEE Wirel. Commun. Lett. 2021, 10, 1747–1751.
31. Wu, H.; Wang, J.; Liu, C. TAN: Temporal Attention Enhanced Network for Cellular Traffic Prediction. In Proceedings of the 2022 14th International Conference on Wireless Communications and Signal Processing, WCSP, Virtual, 1–3 November 2022; pp. 342–346.
32. Guo, J.; Tang, C.; Lu, J.; Zou, A.; Yang, W. WVETT-Net: A Novel Hybrid Prediction Model for Wireless Network Traffic Based on Variational Mode Decomposition. Electronics 2024, 13, 3109.
33. Ouamri, M.A.; Otesteanu, M.E.; Isar, A.; Azni, M. Coverage, Handoff and cost optimization for 5G Heterogeneous Network. Phys. Commun. 2020, 39, 101037.
34. Rodrigo, P.; Fernandez, E.F.; Almonacid, F.; Perez-Higueras, P.J. Review of methods for the calculation of cell temperature in high concentration photovoltaic modules for electrical characterization. Renew. Sust. Energ. Rev. 2014, 38, 478–488.
35. Sun, Y.; Liu, J.; Liu, Z. Adaptive decomposition-based evolutionary algorithm for many-objective optimization with two-stage dual-density judgment. Appl. Soft Comput. 2024, 167, 112434.
36. Mendez, M.; Merayo, M.G.; Nunez, M. Long-term traffic flow forecasting using a hybrid CNN-BiLSTM model. Eng. Appl. Artif. Intell. 2023, 121, 106041.
37. Qi, H.; Gani, A.; Fu, D.; Gong, C.; Wang, L. Network Traffic Forecasting Model Based on Improved Fireworks Algorithm Optimized CNN-BiGRU. In Proceedings of the 2024 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, China, 1–3 March 2024; pp. 802–805.
38. Zhang, X.; Chen, Z.; Wang, W.; Fang, X. Prediction Method of PHEV Driving Energy Consumption Based on the Optimized CNN BiLSTM Attention Network. Energies 2024, 17, 2959.
39. Li, B.; Wang, H. Multi-objective sparrow search algorithm: A novel algorithm for solving complex multi-objective optimisation problems. Expert Syst. Appl. 2022, 210, 118414.
40. Tong, J.; Li, Y.; Liu, J.; Cheng, R.; Guan, J.; Wang, S.; Liu, S.; Hu, S.; Guo, T. Experiment analysis and computational optimization of the Atkinson cycle gasoline engine through NSGA II algorithm using machine learning. Energy Conv. Manag. 2021, 238, 113871.
41. Cheraghi, R.; Jahangir, M.H. Multi-objective optimization of a hybrid renewable energy system supplying a residential building using NSGA-II and MOPSO algorithms. Energy Conv. Manag. 2023, 294, 117515.
42. Cheng, H.; Li, L.; You, L. Dynamic Neighborhood Adjustment Strategy for Multi-Objective Evolutionary Algorithm Based on Decomposition. IEEE Access 2023, 11, 6574–6583.
43. Li, L.; Meng, W. Collaborative base station sleeping solution design in heterogeneous cellular network. In Proceedings of the 2022 27th Asia Pacific Conference on Communications (APCC 2022): Creating Innovative Communication Technologies for Post-Pandemic Era, Jeju Island, Republic of Korea, 19–21 October 2022; pp. 231–235.
44. Zhu, Y.; Wang, S. Joint Traffic Prediction and Base Station Sleeping for Energy Saving in Cellular Networks. In Proceedings of the IEEE International Conference on Communications (ICC 2021), Virtual, 14–23 June 2021.
45. Piovesan, N.; Lopez-Perez, D.; Miozzo, M.; Dini, P. Joint Load Control and Energy Sharing for Renewable Powered Small Base Stations: A Machine Learning Approach. IEEE Trans. Green Commun. Netw. 2021, 5, 512–525.
46. Xu, H.; Hui, H.; Zhou, C.; Zeng, G.; Han, Z. Cooperative Energy Trading for HetNets with Renewable Energy: A Dynamic Energy Trading Game. IEEE Internet Things J. 2024, 11, 11606–11618.

Figure 1. Heterogeneous 5G BSMG system model employed in the present study.

Figure 2. Network structure diagram of the modified CNN.

Figure 3. Network structure of BiLSTM.

Figure 4. Structure of the multi-head attention mechanism used in the BiLSTM model.

Figure 5. Flow chart for the CNN–DAM–BiLSTM traffic prediction model for BSMGs.

Figure 6. Flowchart for the improved MOEA/D.

Figure 7. 5G base station used for traffic prediction at Guangxi University in China.

Figure 8. Traffic variation for a base station over a period of 48 h.

Figure 9. SHAP analysis of predictive models.

Figure 10. Traffic prediction results for the compared models and the ground truth over a 24 h period.

Figure 11. Impact of different multi-objective optimization algorithms on the energy-sharing strategy of BSMGs.

Figure 12. Comparison of average system revenue and network energy efficiency under different load thresholds.

Figure 13. Impact of the number of BSMGs on the average system revenue.

Figure 14. Comparison of the system revenue for a heterogeneous BSMG system under five energy-saving algorithms [43,44,45,46].

Figure 15. Network energy efficiency obtained using various BSMG energy-saving algorithms [43,44,45,46].

Figure 16. Comparison of different energy-saving algorithms in green energy utilization. The error bars represent the 95% confidence interval for the mean [32,33,34,35].

Table 1. Network parameters for the heterogeneous BSMG system.

Parameters	CBSMG	SBSMG
System bandwidth (MHz)	10	10
Carrier frequency (GHz)	2	4
Transmit power (dBm)	46	30
Gain of the antenna (dBi)	15	5
Static power consumption (W)	486	10.4
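
As a quick reading aid for Table 1, the transmit powers given in dBm can be converted to watts with the standard relation P[W] = 10^((P[dBm] − 30)/10). The sketch below applies it to the two table entries; the function and variable names are illustrative and not taken from the paper.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts: P[W] = 10 ** ((P[dBm] - 30) / 10)."""
    return 10 ** ((p_dbm - 30.0) / 10.0)


# Transmit powers transcribed from Table 1 (dBm).
tx_power_dbm = {"CBSMG": 46.0, "SBSMG": 30.0}

# Same quantities expressed in watts.
tx_power_w = {name: dbm_to_watts(p) for name, p in tx_power_dbm.items()}

for name, watts in tx_power_w.items():
    print(f"{name}: {watts:.2f} W")
```

The CBSMG therefore transmits at roughly 40 W and the SBSMG at 1 W, which puts the static power figures in the last row of the table into perspective.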

Table 2. The traffic and meteorological data.

Time	Number of Users	Rate of Service (Mbps)	Load (ρ)	Ambient Temperature (°C)	Intensity of Illumination (W/m²)
18:00	85	350	0.72	31.2	154
18:30	102	340	0.81	30.7	117
19:00	120	330	0.89	29.4	72
19:30	135	320	0.95	28.7	35
20:00	180	325	0.91	28.3	0
20:30	400	335	0.84	27.6	0

Table 3. The resulting parameter performance.

Hyperparameter	RMSE	MAPE
BiLSTM units = 32	24.5	1.82
BiLSTM units = 64	20.8	1.47
BiLSTM units = 128	21.8	1.61
Attention heads = 4	22.3	1.65
Attention heads = 8	20.2	1.43
Attention heads = 12	21.5	1.60
Number of convolution kernels = 16	23.1	1.73
Number of convolution kernels = 32	20.9	1.62
Number of convolution kernels = 64	21.7	1.62
Learning rate = 0.01	25.8	1.91
Learning rate = 0.005	20.6	1.57
Learning rate = 0.001	22.4	1.68
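
One way to read Table 3 is to pick, for each hyperparameter, the setting with the lowest RMSE. The sketch below does that on the values transcribed from the table. It assumes each group of rows varies a single hyperparameter while the others are held fixed, which the table layout suggests but this section does not state; the dictionary keys are illustrative names.

```python
# value -> (RMSE, MAPE) for each hyperparameter, transcribed from Table 3.
table3 = {
    "bilstm_units": {32: (24.5, 1.82), 64: (20.8, 1.47), 128: (21.8, 1.61)},
    "attention_heads": {4: (22.3, 1.65), 8: (20.2, 1.43), 12: (21.5, 1.60)},
    "conv_kernels": {16: (23.1, 1.73), 32: (20.9, 1.62), 64: (21.7, 1.62)},
    "learning_rate": {0.01: (25.8, 1.91), 0.005: (20.6, 1.57), 0.001: (22.4, 1.68)},
}


def best_setting(results: dict) -> float:
    """Return the hyperparameter value whose RMSE (first tuple entry) is lowest."""
    return min(results, key=lambda value: results[value][0])


best = {name: best_setting(results) for name, results in table3.items()}

for name, value in best.items():
    rmse, mape = table3[name][value]
    print(f"{name} = {value}: RMSE {rmse}, MAPE {mape}")
```

Under this reading, the middle setting wins in every group, so neither the smallest nor the largest configuration tested gives the best accuracy.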

Table 4. Ablation experiments for CNN–DAM–BiLSTM prediction models.

Model	RMSE	MAPE
CNN–DAM–BiLSTM	20.89	1.77%
CNN–BiLSTM–MHAM	28.41	2.95%
CNN–BiLSTM–CBAM	25.13	2.14%
CNN–BiLSTM–CAM	32.67	3.82%
CNN–BiLSTM–SAM	29.45	3.12%

Table 5. Evaluation results for the compared prediction models.

Prediction Model	RMSE	MAPE
CNN–DAM–BiLSTM	20.89	1.77%
CNN–SA–BiLSTM	62.49	5.42%
CNN–BiGRU	74.86	7.00%
CNN–BiLSTM	68.99	5.79%
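
The RMSE reduction quoted in the conclusions can be checked directly against Table 5. The short sketch below computes the relative reduction of CNN–DAM–BiLSTM with respect to each baseline as 100 · (RMSE_baseline − RMSE_proposed) / RMSE_baseline; the variable names are illustrative.

```python
# RMSE values transcribed from Table 5.
rmse = {
    "CNN-DAM-BiLSTM": 20.89,
    "CNN-SA-BiLSTM": 62.49,
    "CNN-BiGRU": 74.86,
    "CNN-BiLSTM": 68.99,
}

proposed = rmse["CNN-DAM-BiLSTM"]

# Relative RMSE reduction (%) of the proposed model against each baseline.
reduction_pct = {
    model: 100.0 * (value - proposed) / value
    for model, value in rmse.items()
    if model != "CNN-DAM-BiLSTM"
}

for model, pct in reduction_pct.items():
    print(f"vs {model}: {pct:.1f}% lower RMSE")
```

The reductions come out at about 66.6%, 72.1%, and 69.7% against CNN–SA–BiLSTM, CNN–BiGRU, and CNN–BiLSTM, respectively, so the 72% figure corresponds to the CNN–BiGRU baseline.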

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
