Article

Online Learning Approach for Predictive Real-Time Energy Trading in Cloud-RANs

by Wan Nur Suryani Firuz Wan Ariffin 1,*, Xinruo Zhang 2, Mohammad Reza Nakhai 3, Hasliza A. Rahim 1,4,* and R. Badlishah Ahmad 1,5

1 Faculty of Electronic Engineering Technology, Universiti Malaysia Perlis, Arau 02600, Malaysia
2 School of Computer Science and Electronic Engineering, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
3 Department of Informatics, Centre for Telecommunications Research, King’s College London, Aldwych WC2B 4BG, UK
4 Advanced Communication Engineering, Centre of Excellence (ACE), Universiti Malaysia Perlis, Kangar 01000, Malaysia
5 Advanced Computing, Centre of Excellence (AdComp), Universiti Malaysia Perlis, Arau 02600, Malaysia
* Authors to whom correspondence should be addressed.
Sensors 2021, 21(7), 2308; https://doi.org/10.3390/s21072308
Submission received: 8 December 2020 / Revised: 22 January 2021 / Accepted: 5 February 2021 / Published: 25 March 2021
(This article belongs to the Special Issue Energy Harvesting Communication and Computing Systems)

Abstract
Constantly changing electricity demand has made variability and uncertainty inherent characteristics of both electric generation and cellular communication systems. This paper develops an online learning algorithm as a prescheduling mechanism for managing this variability and uncertainty and maintaining cost-aware, reliable operation in cloud radio access networks (Cloud-RANs). The proposed algorithm employs a combinatorial multi-armed bandit model and minimizes the long-term energy cost at the remote radio heads. The algorithm preschedules a set of cost-efficient energy packages to be purchased from an ancillary energy market for future time slots, learning from cooperative energy trading at previous time slots while exploring new energy scheduling strategies at the current time slot. The simulation results confirm a significant performance gain of the proposed scheme in controlling the available power budgets and minimizing the overall energy cost compared with recently proposed approaches to real-time energy resource management and energy trading in Cloud-RANs.

1. Introduction

Denser site deployment has been contemplated as a key enabling technology that can support the mushrooming of mobile data traffic and meet the demands of high-data-rate communications for next-generation wireless communication networks [1]. However, conventional base stations (BSs) consume 80% of a cellular network's electricity [2], since all the radio and baseband processing functions are co-located at the BSs in the second-generation (2G) radio access network (RAN) architecture. Subsequently, the radio and baseband processing functions were divided into two separate nodes, i.e., the remote radio head (RRH) and the baseband processing unit (BBU), in the development of the third-generation (3G) and fourth-generation (4G) distributed radio access network (Distributed-RAN) architecture. Nevertheless, the Distributed-RAN cannot cope with the tremendous growth in data traffic while delivering high-bandwidth, low-latency, and cost-efficient services [3], and it cannot support the quality-of-experience (QoE) and quality-of-service (QoS) demands [4] of fifth-generation (5G) mobile communication systems. Cloud radio access networks (Cloud-RANs) have been regarded as a promising solution, owing to their superiority in reducing the capital expenditure (CAPEX) and operational expenditure (OPEX) of the network operators through the centralization and cloudification of BBUs and their corresponding RRHs. Cloud-RANs overcome the limitations of the Distributed-RAN architecture by expanding network scalability, simplifying network management and maintenance, optimizing network performance, reducing energy consumption, and enhancing spectrum efficiency [3]. In a Cloud-RAN architecture, the conventional BSs are physically separated into two parts: the BBUs, which are grouped as a cloud processing unit (CU) that designs all coordination and energy trading strategies, and the RRHs, which are in charge of all radio frequency (RF) operations [5]. Even though beamforming is designed at the CU, the RRHs consume an enormous portion of the electricity to amplify and transmit RF signals to users in order to satisfy their data-rate and energy requirements.
However, due to the large number of densely deployed RRHs, each serving a time-varying number of users in a highly dynamic wireless environment, the amount of energy demanded by the wireless network operators from the energy generation (EG) plants will be highly variable and statistically unknown over different times of the day. Equipping the RRHs with green energy technology that harvests energy from natural sources, such as wind and sunlight, to power next-generation mobile communication networks can significantly reduce the global carbon footprint [6]. However, the uncertain nature of the renewable energy supply, coupled with dynamic user energy demand, necessitates integrating the green energy supply with the conventional grid to maximally benefit the network operator [7,8,9,10,11,12,13,14,15]. These random variations in electricity demand increase the OPEX of the energy generation process, because the EG plants must maintain an instantaneous balance between the aggregate demand for electricity and the total power generated [7]. Hence, the operators need to routinely control the operation of the wireless network based on the well-known operating characteristics of the conventional EG plants.
Deviation from the operating points of the EG plants to provide compensating variations in order to maintain the balance increases the total OPEX of the EG plants, which will, in turn, reflect on the OPEX of the network operators.
The operational time frame of the grid can generally be divided into regulation, load following, and unit commitment. During each of these time frames, suitable reserved energy sources are dispatched to correct the imbalance between generation and demand. The EG sources reserved for load following, which are deployed on a slower time scale than the regulating frames, are used to accommodate the variability and uncertainty, e.g., due to traffic energy demand and renewable energy generation, during the regular operation of the grid. Although the ramping and energy needed to follow the variations and uncertainties can be supplied by the ancillary energy markets, the insufficient ramping capability of the base low-cost conventional power plants can significantly inflate the price of the energy dispatched by expensive peaking EG units with fast ramp rates. Using conventional regulation units to compensate for uncertain abrupt ramps in energy demand is among the most expensive services. Hence, efficient control mechanisms must be developed to add flexibility to the EG fleet and maintain its cost-aware, reliable operation under variability and uncertainty.
This paper focuses on designing an intelligent control mechanism for the steep ramps in energy demand in wireless cellular networks to minimize the long-term energy cost. We introduce an online learning approach for price-aware energy procurement at the RRHs that secures the load-following EG reserves through advance energy trading offers based on possible forthcoming variations and uncertainties in the energy demand. As the energy demand varies from low to peak values during different hours, the proposed strategy avoids high peak-time energy costs by purchasing energy in advance at a lower off-peak price, thereby reducing the OPEX. The proposed approach anticipates the future energy demand (surplus) at each RRH and prepares to purchase (sell) the energy from the hour-ahead/day-ahead market (to the grid) before the actual demand (surplus) emerges. In this way, the EG units have more time to regulate their electricity generation process according to the demand, with slower ramp rates and, consequently, at lower prices.

1.1. Related Works

The authors in [7] first investigated the energy efficiency problem in a coordinated multipoint (CoMP) system powered by a smart grid. They formulated the problem as a simplified two-level Stackelberg game and concluded that such a design significantly reduces the OPEX. Equipping the end-user with renewable energy devices and accounting for the varying electricity price, the authors in [8] developed an energy trading algorithm to maximally benefit the network operator while satisfying the energy demand of end-users in a grid/renewable energy hybrid network. To take advantage of two-way energy trading with the grid and cooperative transmission, the authors in [9] proposed an aggregator-aided joint communication and energy cooperation strategy in CoMP networks powered by both grid and renewable energy. In [10], the authors designed a joint real-time energy trading and cooperative transmission mechanism based on convex optimization techniques in a smart-grid-powered CoMP system. In [11], the authors studied energy trading in a more general setting, including trading among a set of storage units and the grid, from the perspective of noncooperative game theory, and they proposed an algorithm that achieves at least one Nash equilibrium point. By assuming the availability of varying hourly profiles of the energy demand of base stations and renewable generation, as well as day-ahead knowledge of hourly varying electricity prices, the authors of [12] minimized the electricity bill of cellular base stations powered jointly by a smart grid and locally harvested solar energy. The authors of [13] integrated the CoMP system with the simultaneous wireless information and power transfer (SWIPT) concept and proposed a joint energy trading and partial cooperation design based on sparse beamforming, accounting for limited-capacity backhaul links in a green Cloud-RAN by minimizing the instantaneous energy cost without integrating reinforcement learning. The authors of [14] investigated the optimal power flow problem for smart micro-grids in a distributed manner and adopted an alternating direction method of multipliers to ensure the global optimum of the semidefinite programming (SDP) problem. An abstract idea of the combinatorial multi-armed bandit (CMAB) approach was first tackled in [15], which introduced two iterative energy trading algorithms that search for a set of cost-efficient energy packages in ascending and descending order of package sizes under time-invariant wireless channel conditions. Subsequently, the study in [16] proposed a CMAB approach for energy trading in cellular networks that copes with unpredictable wireless channel conditions and further reduces the total energy cost over a finite time horizon.

1.2. Main Contributions

This paper’s main contributions to real-time energy resource and energy trading in Cloud-RAN environments are summarized as follows:
  • A joint energy trading and clustering technique accounting for limited-capacity backhaul links in a green Cloud-RAN with a SWIPT system was proposed in [13]. However, that design was based on myopic optimization of semidefinite programming (SDP) (i.e., minimizing the instantaneous energy cost for the current time only) without any learning process for future demand provisioning. Furthermore, it cannot cope with time-varying system dynamics, since it considered no temporal dynamics of the energy demand and cost over time and provided no solution for look-ahead energy purchase decisions.
  • In contrast to [15], this paper develops a combinatorial upper confidence bound (CUCB) algorithm as a prescheduling mechanism that maintains cost-aware, reliable operation in Cloud-RANs and handles the variability and uncertainty inherent in both electricity generation and cellular communication systems. This paper predicts the best possible combination of energy packages to be purchased for the next time slot by exploring the rewards of new combinations of energy packages within given trials at the current time slot and exploiting the past captured information on the rewards of super arms from previous time slots to optimize the long-term averaged rewards.
  • Differently from the system model proposed in [16], this paper considers a downlink Cloud-RAN with SWIPT, where the RRHs concurrently transmit dedicated data beams to the information users and the requested energy beams to the active energy users. Furthermore, this paper also integrates a sparse beamforming technique to iteratively remove the cooperative links between the RRHs and the active information users based on the renewable power budgets and the front-haul link capacity limitations at the individual RRHs. The clustering technique has been confirmed to enhance energy efficiency and decrease the total energy cost of the RRHs in practical Cloud-RANs [17]. In contrast to their CMAB approach, this paper estimates the imminent energy demands by dynamically deciding on an optimal set of super arms, exploring all of the possible minimal combinatorial energy packages to be purchased from the day-ahead market and thus diminishing the regret.
This work’s novel contribution is the development of a sequential learning algorithm that adaptively tracks the temporal variations of energy demands and makes predictive decisions on look-ahead energy purchases in dynamically changing environments with unknown statistics, so as to asymptotically minimize the time-averaged overall energy cost in the long run. The proposed algorithm anticipates the future energy demands of the distributed RRHs in the Cloud-RAN and schedules these demands by invoking the various power plants well in advance, so that higher energy prices at peak demand times are curtailed. The proposed algorithm requires no prior description of usage patterns or statistical distributions of stochastic events. It performs foresighted optimization based on online learning during operation, using only the past captured data on averaged accumulated rewards to predict the energy consumption of the next period.

1.3. Organization and Notations

The rest of this paper is structured as follows. The system model for the downlink Cloud-RAN with SWIPT and the energy management model are introduced in Section 2. In Section 3, the problem of real-time collaborative energy trading in an individual time frame is formulated and then transformed into a numerically tractable form. The predictive energy trading strategy is proposed in Section 4. Numerical simulation results are interpreted in Section 5. Finally, Section 6 summarizes the proposed work.
Notation 1.
$w$, $\mathbf{w}$, and $\mathbf{W} \succeq 0$, respectively, denote a scalar, a vector, and a positive semidefinite matrix. $\mathbb{C}^{n \times m}$, $(\cdot)^H$, $\mathrm{tr}(\cdot)$, and $\mathbb{E}$ indicate the set of $n \times m$ complex matrices, the complex conjugate transpose operator, the trace operator, and the expected value, respectively. $\|\cdot\|_p$ represents the $\ell_p$-norm of a vector and $\|\cdot\|_0$ denotes the number of non-zero entries in the vector. Notice that the duration of a time frame is normalized to one and the normalized energy unit, i.e., $\mathrm{J}\,\mathrm{s}^{-1}$, is assumed in this paper. Therefore, in this paper, the terms “power” and “energy” are mutually interchangeable.

2. System Model

Consider downlink transmission in a Cloud-RAN with SWIPT from $N$ RRHs towards $K_i$ information users (IUs), $K_e$ active energy users (EUs), and $K_e^{[\mathrm{idle}]}$ idle EUs over a shared bandwidth. Notice that the active EUs located within the energy-serving area of an RRH can exploit the energy-carrying signals directly from that particular RRH. In contrast, the idle EUs located outside any energy-serving area of the RRHs can only scavenge energy from the ambient radio frequency signals for self-sustainability [13]. Each RRH is equipped with $M$ antennas, and the individual IUs and EUs each have a single antenna. Based on perfect knowledge of the channel state information (CSI), the CU coordinates all the resource management and energy trading strategies for the RRHs and delivers all the IUs’ data to the corresponding RRHs over finite-capacity front-haul links. Note that, under the perfect CSI assumption, all the channel properties of the downlink Cloud-RAN communication links, i.e., path loss, scattering, fading, shadowing, etc., are assumed to be perfectly known at both the IU and EU terminals.
Let $\mathcal{L}_b = \{1, \dots, N\}$, $\mathcal{L}_i = \{1, \dots, K_i\}$, $\mathcal{L}_e = \{1, \dots, K_e\}$, and $\mathcal{L}_e^{[\mathrm{idle}]} = \{1, \dots, K_e^{[\mathrm{idle}]}\}$ denote, respectively, the sets of indexes of the RRHs, the IUs, the active EUs, and the idle EUs. The amount of energy flow in this paper depends on the data-rate requirements of the IUs, the wireless energy transfer requirements of the active EUs, and the harvested energy requirements from the environment of the idle EUs, whereas the amount of data flow depends only on the IUs. Let us divide the long-term period $T$ into discrete time slots, indexed as $\mathcal{T} = \{1, \dots, T\}$, and define $\mathcal{F} = \{1, \dots, F\}$ and $\mathcal{K} = \{1, \dots, K\}$ as the set of indexes of the frames within a time slot and the set of indexes of the learning trials within a frame, respectively. The channel is assumed to vary across frames, but remains invariant within each frame. This paper proposes an online learning algorithm that iteratively alternates between designing the overall transmission strategy using convex optimization and preparing for future energy demand from the day-/hour-ahead market via online learning, i.e., a CMAB approach, to avoid steep ramps at the energy generation plant and to minimize the long-term energy cost.

2.1. Energy Management Model

Similarly to [13], it is assumed that at least one renewable energy generator, i.e., a solar panel and/or a wind turbine, is installed in the vicinity of each RRH. In this setup, none of the RRHs is equipped with a frequently rechargeable storage device. Furthermore, bidirectional energy trading with the primary grid is enabled at the individual RRHs. Thus, the RRHs can purchase energy in the day-/hour-ahead market during off-peak hours at a lower price and/or in the spot market during peak hours at a higher price, and the surplus energy can also be sold back to the grid at an agreed-upon price. Let $B_n^{[\mathrm{spot}]}$, $B_n^{[\mathrm{ahead}]}$, $S_n$, and $E_n$ denote, at time slot $t \in \mathcal{T}$, the amount of real-time energy purchased from the spot market by the $n$-th RRH to cover an instantaneous energy shortage, the amount of look-ahead energy purchased from the day-/hour-ahead market at the end of the previous time slot $t-1$, the amount of surplus energy to be traded back to the primary grid, and the amount of renewable energy generated at the $n$-th RRH, respectively. In addition, let $P_n^{[\mathrm{Tx}]}$ be the total transmit power and $P_n^{[\mathrm{circ}]}$ be the total power consumption of the hardware circuits at the $n$-th RRH. Then, in any frame, the total energy consumption at the $n$-th RRH, i.e., $P_n^{[\mathrm{total}]}$, is constrained as
$$P_n^{[\mathrm{total}]} = P_n^{[\mathrm{Tx}]} + P_n^{[\mathrm{circ}]} = B_n^{[\mathrm{spot}]} + B_n^{[\mathrm{ahead}]} - S_n + E_n. \qquad (1)$$
From the perspective of supply and demand, let us assume $\pi^{[\mathrm{spot}]} \geq \pi^{[\mathrm{ahead}]} \geq \pi^{[\mathrm{sell}]} \geq \pi^{[\mathrm{renew}]}$, where $\pi^{[\mathrm{spot}]}$, $\pi^{[\mathrm{ahead}]}$, and $\pi^{[\mathrm{sell}]}$ denote the prices of purchasing (selling) per unit energy of $B_n^{[\mathrm{spot}]}$, $B_n^{[\mathrm{ahead}]}$, and $S_n$, respectively, and $\pi^{[\mathrm{renew}]}$ denotes the cost of generating per unit energy of $E_n$ (obtained by averaging the capital expenses and OPEX of the renewable devices over their lifetime). Then, the cumulative energy cost procured by the $n$-th RRH at the $k$-th trial, $k \in \mathcal{K}$, of the frame $f \in \mathcal{F}$ at the time slot $t \in \mathcal{T}$, i.e., $B_n^{[\mathrm{total}]}(k)$, is given by
$$B_n^{[\mathrm{total}]}(k) = \pi^{[\mathrm{spot}]} B_n^{[\mathrm{spot}]}(k) + \pi^{[\mathrm{ahead}]} B_n^{[\mathrm{ahead}]}(k) - \pi^{[\mathrm{sell}]} S_n(k) + \pi^{[\mathrm{renew}]} E_n(k), \quad \forall n \in \mathcal{L}_b. \qquad (2)$$
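To make the bookkeeping in (1) and (2) concrete, the following is a minimal Python sketch of the per-RRH power balance and per-trial cost. The function names are ours, and the price values are taken from the simulation section of this paper; the actual design evaluates these quantities inside its optimization loop.

```python
# Per-unit prices in GBP/W, ordered pi_spot >= pi_ahead >= pi_sell >= pi_renew;
# the numbers follow the simulation section of this paper.
PI_SPOT, PI_AHEAD, PI_SELL, PI_RENEW = 0.15, 0.07, 0.05, 0.02

def total_power(b_spot, b_ahead, s_sold, e_renew):
    """Power balance (1): purchased supply plus renewables, minus energy sold."""
    return b_spot + b_ahead - s_sold + e_renew

def trial_cost(b_spot, b_ahead, s_sold, e_renew):
    """Cumulative energy cost B_n^[total](k) of one RRH at one trial, per (2)."""
    return (PI_SPOT * b_spot + PI_AHEAD * b_ahead
            - PI_SELL * s_sold + PI_RENEW * e_renew)

# Example: 0.3 W spot purchase, 0.7 W bought ahead, nothing sold, 0.2 W renewable.
print(total_power(0.3, 0.7, 0.0, 0.2))   # available power budget in W
print(trial_cost(0.3, 0.7, 0.0, 0.2))    # cost in GBP
```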

2.2. Downlink Transmission Model

Let $\mathbf{w}_i = [\mathbf{w}_{1i}^H, \dots, \mathbf{w}_{Ni}^H]^H \in \mathbb{C}^{MN \times 1}$ and $\mathbf{v}_e = [\mathbf{v}_{1e}^H, \dots, \mathbf{v}_{Ne}^H]^H \in \mathbb{C}^{MN \times 1}$ be defined, respectively, as the network-wide beamforming vectors from all RRHs towards the $i$-th IU, $i \in \mathcal{L}_i$, and the $e$-th active EU, $e \in \mathcal{L}_e$, where $\mathbf{w}_{ni} \in \mathbb{C}^{M \times 1}$ and $\mathbf{v}_{ne} \in \mathbb{C}^{M \times 1}$ represent the beamformers from the $n$-th RRH to the $i$-th IU and the $e$-th active EU, respectively. In addition, let $\mathbf{h}_i = [\mathbf{h}_{1i}^H, \dots, \mathbf{h}_{Ni}^H]^H \in \mathbb{C}^{MN \times 1}$ denote the aggregate channel vector between all RRHs and the $i$-th IU, where $\mathbf{h}_{ni} \in \mathbb{C}^{M \times 1}$ denotes the channel vector from the $n$-th RRH to the $i$-th IU. Accordingly, the signal collected at the $i$-th IU, $i \in \mathcal{L}_i$, can be expressed as the sum of the dedicated information-carrying signal, the inter-user interference induced by the other, non-devoted information beams, the interference provoked by the energy-carrying signals assigned to all active EUs, and the additive white Gaussian noise at the $i$-th IU as
$$y_i = \mathbf{h}_i^H \mathbf{w}_i s_i^{[\mathrm{IU}]} + \sum_{j \in \mathcal{L}_i,\, j \neq i} \mathbf{h}_i^H \mathbf{w}_j s_j^{[\mathrm{IU}]} + \sum_{e \in \mathcal{L}_e} \mathbf{h}_i^H \mathbf{v}_e s_e^{[\mathrm{EU}]} + n_i. \qquad (3)$$
Since the energy beams carry no information, only the data of the IUs are delivered via the front-haul links. Without loss of generality, $\mathbb{E}(|s_i^{[\mathrm{IU}]}|^2) = \mathbb{E}(|s_e^{[\mathrm{EU}]}|^2) = 1$ is assumed, and the signal-to-interference-plus-noise ratio (SINR) at the $i$-th IU, $i \in \mathcal{L}_i$, is formulated as
$$\mathrm{SINR}_i^{[\mathrm{IU}]} = \frac{|\mathbf{h}_i^H \mathbf{w}_i|^2}{\sum_{j \in \mathcal{L}_i,\, j \neq i} |\mathbf{h}_i^H \mathbf{w}_j|^2 + \sum_{e \in \mathcal{L}_e} |\mathbf{h}_i^H \mathbf{v}_e|^2 + \sigma_i^2}, \qquad (4)$$
where $|\mathbf{h}_i^H \mathbf{w}_i|^2$ indicates the desired power received at the $i$-th IU and $\|\mathbf{w}_i\|^2$ is the required transmit power at the RRHs. Let us define the scheduling indicator between the $i$-th IU and the $n$-th RRH for partial cooperation [18], i.e., $\big\|\,\|\mathbf{w}_{ni}\|_2^2\,\big\|_0$, as
$$\big\|\,\|\mathbf{w}_{ni}\|_2^2\,\big\|_0 = \begin{cases} 0, & \text{if } \|\mathbf{w}_{ni}\|_2^2 = 0, \\ 1, & \text{if } \|\mathbf{w}_{ni}\|_2^2 \neq 0, \end{cases} \qquad (5)$$
where $\|\mathbf{w}_{ni}\|_2^2 = 0$ indicates that the $i$-th IU is not selected to be supported by the $n$-th RRH and, hence, the front-haul link between the CU and the $n$-th RRH is not employed for joint data transmission to the $i$-th IU. Hence, the front-haul link capacity consumption of the $n$-th RRH is expressed as
$$C_n^{[\mathrm{front}]} = \sum_{i \in \mathcal{L}_i} \big\|\,\|\mathbf{w}_{ni}\|_2^2\,\big\|_0 \, R_i, \quad \forall n \in \mathcal{L}_b, \qquad (6)$$
where $R_i = \log_2(1 + \mathrm{SINR}_i^{[\mathrm{IU}]})$ is the achievable data-flow rate (bit/s/Hz) for the $i$-th IU, which directly depends on the transmit power and the wireless channel fading condition. The total energy received by the $e$-th active EU, $e \in \mathcal{L}_e$, is defined as
$$G_e^{[\mathrm{EU}]} = \eta \left( |\mathbf{g}_e^H \mathbf{v}_e|^2 + \sum_{j \in \mathcal{L}_e,\, j \neq e} |\mathbf{g}_e^H \mathbf{v}_j|^2 + \sum_{i \in \mathcal{L}_i} |\mathbf{g}_e^H \mathbf{w}_i|^2 \right), \qquad (7)$$
where the terms on the right-hand side of (7) represent the intended energy-carrying signal for the $e$-th active EU, the inter-user interference caused by all other non-desired energy beams, and the inter-user interference caused by the information beams, respectively. Here, $0 \leq \eta \leq 1$ denotes the efficiency of converting the harvested RF energy into a functional electrical form, and $\mathbf{g}_e = [\mathbf{g}_{1e}^H, \dots, \mathbf{g}_{Ne}^H]^H \in \mathbb{C}^{MN \times 1}$ is the aggregate channel vector between all the RRHs and the $e$-th active EU. The collective energy that can be harvested from the ambient signals by the $z$-th idle EU, $z \in \mathcal{L}_e^{[\mathrm{idle}]}$, is given as
$$G_z^{[\mathrm{EU\,idle}]} = \eta \left( \sum_{i \in \mathcal{L}_i} |\mathbf{f}_z^H \mathbf{w}_i|^2 + \sum_{e \in \mathcal{L}_e} |\mathbf{f}_z^H \mathbf{v}_e|^2 \right), \qquad (8)$$
where $\mathbf{f}_z = [\mathbf{f}_{1z}^H, \dots, \mathbf{f}_{Nz}^H]^H \in \mathbb{C}^{MN \times 1}$ represents the aggregate channel vector between all the RRHs and the $z$-th idle EU.
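As a numerical companion to (4)–(8), the snippet below evaluates the SINR, the front-haul load, and the harvested energy for given network-wide beamformers. It is a sketch under our own conventions (1-D numpy arrays of length $MN$ for channels and beamformers, and a small tolerance standing in for the $\ell_0$ test), not code from the paper.

```python
import numpy as np

def sinr_iu(i, h, W, V, sigma2):
    """SINR of the i-th IU, per (4). h[i], W[j], V[e]: complex vectors of length M*N."""
    desired = abs(np.vdot(h[i], W[i])) ** 2          # |h_i^H w_i|^2
    interf = sum(abs(np.vdot(h[i], W[j])) ** 2 for j in range(len(W)) if j != i)
    interf += sum(abs(np.vdot(h[i], v)) ** 2 for v in V)
    return desired / (interf + sigma2)

def fronthaul_load(n, M, h, W, V, sigma2, tol=1e-8):
    """Front-haul consumption C_n^[front] of RRH n, per (5)-(6): the sum of the
    rates R_i of the IUs whose per-RRH beamformer block w_ni is non-zero."""
    load = 0.0
    for i in range(len(W)):
        w_ni = W[i][n * M:(n + 1) * M]               # block of w_i owned by RRH n
        if np.linalg.norm(w_ni) ** 2 > tol:          # ||w_ni||_2^2 != 0
            load += np.log2(1.0 + sinr_iu(i, h, W, V, sigma2))
    return load

def harvested_energy(g_e, W, V, eta=0.5):
    """Energy collected by one active EU, per (7): eta times the power of all
    incident energy and information beams (channel g_e of length M*N)."""
    rx = sum(abs(np.vdot(g_e, v)) ** 2 for v in V)
    rx += sum(abs(np.vdot(g_e, w)) ** 2 for w in W)
    return eta * rx
```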

3. Real-Time Energy Trading on an Individual Time Frame

This paper relies on foresighted optimization based on CMAB learning to minimize the long-term average energy cost. In accordance with (2), the total energy cost at a given trial of a frame within a time slot is determined by four parameters, i.e., $B_n^{[\mathrm{spot}]}$, $S_n$, $E_n$, and $B_n^{[\mathrm{ahead}]}$. It is assumed that the amount of renewable energy supply $E_n$ is given at the beginning of each time slot, whereas $B_n^{[\mathrm{ahead}]}$, $\forall n \in \mathcal{L}_b$, is determined in advance at the end of the previous time slot via the proposed online learning algorithm, i.e., Algorithms 1 and 2 in Section 4, to prepare for future demands.

3.1. Problem Formulation

Let us define $P_n^{[\mathrm{Tx}]} = \sum_{i \in \mathcal{L}_i} \|\mathbf{w}_{ni}\|_2^2 + \sum_{e \in \mathcal{L}_e} \|\mathbf{v}_{ne}\|_2^2$ as the total power transmitted by the $n$-th RRH to its scheduled users, and the degree of partial cooperation among the RRHs as
$$P^{[\mathrm{coop}]} = \sum_{n \in \mathcal{L}_b} \left( \sum_{i \in \mathcal{L}_i} \big\|\,\|\mathbf{w}_{ni}\|_2^2\,\big\|_0 + \sum_{e \in \mathcal{L}_e} \big\|\,\|\mathbf{v}_{ne}\|_2^2\,\big\|_0 \right).$$
To minimize the average energy cost, let us consider the following cooperative energy trading model in each trial of a frame within a time slot for the given $B_n^{[\mathrm{ahead}]}$:
$$\begin{aligned}
\min_{\mathbf{w}_{ni},\, \mathbf{v}_{ne},\, B_n^{[\mathrm{spot}]},\, S_n} \quad & \alpha P^{[\mathrm{coop}]} + \sum_{n \in \mathcal{L}_b} P_n^{[\mathrm{Tx}]} + \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{spot}]} \\
\text{s.t.} \quad
& \mathrm{C1:}\ \mathrm{SINR}_i^{[\mathrm{IU}]} \geq \gamma_i, \quad \forall i \in \mathcal{L}_i, \\
& \mathrm{C2:}\ G_e^{[\mathrm{EU}]} \geq P_e^{[\mathrm{min}]}, \quad \forall e \in \mathcal{L}_e, \\
& \mathrm{C3:}\ G_z^{[\mathrm{EU\,idle}]} \geq P_z^{[\mathrm{idle}]}, \quad \forall z \in \mathcal{L}_e^{[\mathrm{idle}]}, \\
& \mathrm{C4:}\ P_n^{[\mathrm{Tx}]} \leq E_n + B_n^{[\mathrm{ahead}]} + B_n^{[\mathrm{spot}]} - S_n - P_n^{[\mathrm{circ}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C5:}\ P_n^{[\mathrm{Tx}]} \leq P_n^{[\mathrm{Tmax}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C6:}\ C_n^{[\mathrm{front}]} \leq C_n^{[\mathrm{limit}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C7:}\ \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{ahead}]} + \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{spot}]} \leq P_{\mathrm{CU}}^{[\mathrm{max}]} - P_{\mathrm{CU}}^{[\mathrm{circ}]}, \\
& \mathrm{C8:}\ B_n^{[\mathrm{spot}]} \geq 0, \quad \mathrm{C9:}\ S_n \geq 0, \quad \forall n \in \mathcal{L}_b,
\end{aligned} \qquad (9)$$
where $\alpha \geq 0$ is the maximal energy cost in the front-haul links per unit degree of partial cooperation among the RRHs. C1 guarantees the minimum SINR requirement $\gamma_i$ for the $i$-th IU. $P_e^{[\mathrm{min}]}$ in C2 indicates the minimum energy demanded by the active EUs, whereas $P_z^{[\mathrm{idle}]}$ in C3 is the minimum energy required to be harvested from the ambient signals by the idle EUs. C4 indicates that the individual RRH's power budget restrains the total transmit power as per (1), while C5 emphasizes that the total transmit power is upper-limited by the maximum permitted transmit power, i.e., $P_n^{[\mathrm{Tmax}]}$, at the $n$-th RRH. C6 expresses the front-haul link capacity limitations of the individual RRHs. C7 enforces the restriction on the total power provided by the grid to the RRHs, where $P_{\mathrm{CU}}^{[\mathrm{circ}]}$ is the hardware circuit power consumption and $P_{\mathrm{CU}}^{[\mathrm{max}]}$ is the maximum power provided by the grid at the CU [19]. C8 and C9 are the non-negativity constraints on the optimization variables.

3.2. Re-Weighted $\ell_1$-Norm and Semidefinite Programming

The optimization problem in (9) is NP-hard (nondeterministic polynomial time) due to the non-convexity of the $\ell_0$-norm terms in the objective function and the constraints C1 and C6. These non-convex terms can be reformulated by using a powerful convex optimization technique, i.e., semidefinite programming (SDP). Note that the $\ell_1$-norm approximation is commonly adopted in compressed sensing to handle $\ell_0$-norm optimization problems [20]. Then, let us consider the following property:
$$\mathbf{x}^H \mathbf{A} \mathbf{x} = \mathrm{tr}(\mathbf{x}\mathbf{x}^H \mathbf{A}) = \mathrm{tr}(\mathbf{A}\mathbf{x}\mathbf{x}^H). \qquad (10)$$
From a mathematical point of view, the property in (10) can be interpreted as the inner vector product being equal to the trace of the outer product. If $\mathbf{A} = \mathbf{I}$, then
$$\mathbf{x}^H \mathbf{x} = \mathrm{tr}(\mathbf{x}\mathbf{x}^H). \qquad (11)$$
By applying this property, denoting $\mathbf{H}_i = \mathbf{h}_i\mathbf{h}_i^H$, $\mathbf{G}_e = \mathbf{g}_e\mathbf{g}_e^H$, and $\mathbf{F}_z = \mathbf{f}_z\mathbf{f}_z^H$, and specifying the rank-one semidefinite matrices $\mathbf{W}_i = \mathbf{w}_i\mathbf{w}_i^H$ and $\mathbf{V}_e = \mathbf{v}_e\mathbf{v}_e^H$ in the optimization problem, the constraint C1 can be reformulated as
$$\mathrm{C1:}\ \frac{|\mathbf{h}_i^H \mathbf{w}_i|^2}{\sum_{j \in \mathcal{L}_i,\, j \neq i} |\mathbf{h}_i^H \mathbf{w}_j|^2 + \sum_{e \in \mathcal{L}_e} |\mathbf{h}_i^H \mathbf{v}_e|^2 + \sigma_i^2} \geq \gamma_i, \quad \forall i \in \mathcal{L}_i, \qquad (12)$$
$$\mathrm{C1':}\ \mathrm{tr}(\mathbf{H}_i \mathbf{W}_i) \geq \gamma_i \sum_{j \in \mathcal{L}_i,\, j \neq i} \mathrm{tr}(\mathbf{H}_i \mathbf{W}_j) + \gamma_i \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{H}_i \mathbf{V}_e) + \gamma_i \sigma_i^2, \quad \forall i \in \mathcal{L}_i. \qquad (13)$$
Following a procedure similar to that in [13], the intractability of the $\ell_0$-norm terms in the objective function and constraint C6 is overcome by approximating them with their respective re-weighted $\ell_1$-norms [20], as follows:
$$P^{[\mathrm{coop}]} \approx \sum_{n \in \mathcal{L}_b} \left( \sum_{i \in \mathcal{L}_i} \xi_{ni} \|\mathbf{w}_{ni}\|_2^2 + \sum_{e \in \mathcal{L}_e} \kappa_{ne} \|\mathbf{v}_{ne}\|_2^2 \right) = \sum_{n \in \mathcal{L}_b} \left( \sum_{i \in \mathcal{L}_i} \xi_{ni}\, \mathrm{tr}(\mathbf{w}_i \mathbf{w}_i^H \mathbf{D}_n) + \sum_{e \in \mathcal{L}_e} \kappa_{ne}\, \mathrm{tr}(\mathbf{v}_e \mathbf{v}_e^H \mathbf{D}_n) \right), \qquad (14)$$
$$C_n^{[\mathrm{front}]} \approx \sum_{i \in \mathcal{L}_i} \xi_{ni} \|\mathbf{w}_{ni}\|_2^2\, R_i = \sum_{i \in \mathcal{L}_i} \xi_{ni}\, \mathrm{tr}(\mathbf{w}_i \mathbf{w}_i^H \mathbf{D}_n)\, R_i, \qquad (15)$$
where $\mathbf{D}_n \triangleq \mathrm{diag}\big(\underbrace{0, \dots, 0}_{(n-1)M}, \underbrace{1, \dots, 1}_{M}, \underbrace{0, \dots, 0}_{(N-n)M}\big) \succeq 0$, $\forall n \in \mathcal{L}_b$, is used for extracting the corresponding beamformer $\mathbf{w}_{ni}$. $\xi_{ni}$ and $\kappa_{ne}$, respectively, are the weighting factors associated with the $n$-th RRH and the $i$-th IU/the $e$-th active EU; they are updated as per the re-weighted $\ell_1$-norm algorithm in [13] to iteratively remove the collaborative links between the RRHs and the IUs/active EUs under the front-haul link capacity limitations at the individual RRHs. Hence, the non-convex problem introduced in (9) can be recast as a convex optimization problem with significantly reduced complexity [21] after relaxing the rank-one constraints $\mathrm{rank}(\mathbf{W}_i) = 1$ and $\mathrm{rank}(\mathbf{V}_e) \leq 1$, as given in (16).
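The exact re-weighting schedule is specified in [13]; as an illustration, the following is a minimal sketch of the standard rule from [20], in which each weight is the reciprocal of the current per-link power so that weak links are progressively priced out. The function name, the $\epsilon$ regularizer, and the data layout are our assumptions.

```python
import numpy as np

def update_weights(W, V, D, eps=1e-6):
    """One standard re-weighting step for the l1 approximation of the l0 terms
    in (14)-(15): xi_ni = 1 / (tr(W_i D_n) + eps), and likewise for kappa_ne.
    W[i], V[e]: current MN x MN matrices; D[n]: block selector D_n."""
    xi = [[1.0 / (np.real(np.trace(W[i] @ D[n])) + eps)
           for i in range(len(W))] for n in range(len(D))]
    kappa = [[1.0 / (np.real(np.trace(V[e] @ D[n])) + eps)
              for e in range(len(V))] for n in range(len(D))]
    return xi, kappa
```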
Lemma 1.
The optimal solutions to problem (16) satisfy $\mathrm{rank}(\mathbf{W}_i^*) = 1$ and $\mathrm{rank}(\mathbf{V}_e^*) \leq 1$ with a probability of one.
Proof. 
The proof is straightforward by following similar steps to those in [13].    □
As per [22], interior-point methods for solving SDPs have polynomial (quadratic) worst-case complexity and are well suited to medium- and large-scale problems, with iteration counts bounded, e.g., by $O(\log n)$, where $n$ is the problem size. Furthermore, as the size of the optimization problem grows, the computational complexity tends to grow more slowly and may even remain almost constant [23]. Hence, with an increasing problem size, e.g., an increasing total number of RRHs, users, and per-RRH antennas, the number of iterations needed to solve the optimization problem grows sub-linearly with the size of the problem and even tends to remain almost constant.
$$\begin{aligned}
\min_{\mathbf{W}_i,\, \mathbf{V}_e,\, S_n,\, B_n^{[\mathrm{spot}]}} \quad & \sum_{n \in \mathcal{L}_b} \left( \sum_{i \in \mathcal{L}_i} \xi_{ni}\, \mathrm{tr}(\mathbf{W}_i \mathbf{D}_n) + \sum_{e \in \mathcal{L}_e} \kappa_{ne}\, \mathrm{tr}(\mathbf{V}_e \mathbf{D}_n) \right) + \sum_{i \in \mathcal{L}_i} \mathrm{tr}(\mathbf{W}_i) + \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{V}_e) + \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{spot}]} \\
\text{s.t.} \quad
& \mathrm{C1:}\ \mathrm{tr}(\mathbf{H}_i \mathbf{W}_i) \geq \gamma_i \sum_{j \in \mathcal{L}_i,\, j \neq i} \mathrm{tr}(\mathbf{H}_i \mathbf{W}_j) + \gamma_i \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{H}_i \mathbf{V}_e) + \gamma_i \sigma_i^2, \quad \forall i \in \mathcal{L}_i, \\
& \mathrm{C2:}\ \mathrm{tr}(\mathbf{G}_e \mathbf{V}_e) + \sum_{j \in \mathcal{L}_e,\, j \neq e} \mathrm{tr}(\mathbf{G}_e \mathbf{V}_j) + \sum_{i \in \mathcal{L}_i} \mathrm{tr}(\mathbf{G}_e \mathbf{W}_i) \geq P_e^{[\mathrm{min}]} \eta^{-1}, \quad \forall e \in \mathcal{L}_e, \\
& \mathrm{C3:}\ \sum_{i \in \mathcal{L}_i} \mathrm{tr}(\mathbf{F}_z \mathbf{W}_i) + \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{F}_z \mathbf{V}_e) \geq P_z^{[\mathrm{idle}]} \eta^{-1}, \quad \forall z \in \mathcal{L}_e^{[\mathrm{idle}]}, \\
& \mathrm{C4:}\ \sum_{i \in \mathcal{L}_i} \mathrm{tr}(\mathbf{W}_i \mathbf{D}_n) + \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{V}_e \mathbf{D}_n) \leq E_n - S_n + B_n^{[\mathrm{ahead}]} + B_n^{[\mathrm{spot}]} - P_n^{[\mathrm{circ}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C5:}\ \sum_{i \in \mathcal{L}_i} \mathrm{tr}(\mathbf{W}_i \mathbf{D}_n) + \sum_{e \in \mathcal{L}_e} \mathrm{tr}(\mathbf{V}_e \mathbf{D}_n) \leq P_n^{[\mathrm{Tmax}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C6:}\ \sum_{i \in \mathcal{L}_i} \xi_{ni}\, \mathrm{tr}(\mathbf{W}_i \mathbf{D}_n)\, R_i \leq C_n^{[\mathrm{limit}]}, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C7:}\ \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{ahead}]} + \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{spot}]} \leq P_{\mathrm{CU}}^{[\mathrm{max}]} - P_{\mathrm{CU}}^{[\mathrm{circ}]}, \\
& \mathrm{C8:}\ B_n^{[\mathrm{spot}]} \geq 0, \quad \mathrm{C9:}\ S_n \geq 0, \quad \forall n \in \mathcal{L}_b, \\
& \mathrm{C10:}\ \mathbf{W}_i \succeq 0, \quad \forall i \in \mathcal{L}_i, \quad \mathrm{C11:}\ \mathbf{V}_e \succeq 0, \quad \forall e \in \mathcal{L}_e.
\end{aligned} \qquad (16)$$
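To make the relaxation concrete, the following is a minimal cvxpy sketch of one instance of (16). The paper's results were produced with CVX in Matlab; this Python port, the argument names, and the treatment of $R_i$ in C6 as a constant taken from the previous re-weighting iteration (to keep the constraint linear) are our assumptions, not the authors' implementation.

```python
import cvxpy as cp

def solve_relaxed_sdp(H, G, Fz, D, gamma, p_min, p_idle, sigma2, eta,
                      E, B_ahead, p_circ, p_tmax, R_prev, c_limit,
                      p_cu_max, p_cu_circ, xi, kappa):
    """Sketch of the relaxed SDP (16) for one learning trial. H[i] = h_i h_i^H,
    G[e], Fz[z], and D[n] are constant Hermitian numpy matrices; R_prev[i] is
    the rate R_i from the previous re-weighting iteration (assumed fixed)."""
    N, Ki, Ke, Kz = len(D), len(H), len(G), len(Fz)
    MN = H[0].shape[0]
    W = [cp.Variable((MN, MN), hermitian=True) for _ in range(Ki)]
    V = [cp.Variable((MN, MN), hermitian=True) for _ in range(Ke)]
    B = cp.Variable(N, nonneg=True)                      # C8: spot purchases
    S = cp.Variable(N, nonneg=True)                      # C9: energy sold back

    t = lambda A, X: cp.real(cp.trace(A @ X))            # real part of tr(AX)
    pw = lambda n: (sum(t(D[n], Wi) for Wi in W)
                    + sum(t(D[n], Ve) for Ve in V))      # transmit power of RRH n

    obj = (sum(xi[n][i] * t(D[n], W[i]) for n in range(N) for i in range(Ki))
           + sum(kappa[n][e] * t(D[n], V[e]) for n in range(N) for e in range(Ke))
           + sum(cp.real(cp.trace(Wi)) for Wi in W)
           + sum(cp.real(cp.trace(Ve)) for Ve in V)
           + cp.sum(B))

    cons = [Wi >> 0 for Wi in W] + [Ve >> 0 for Ve in V]  # C10, C11
    for i in range(Ki):                                    # C1: SINR targets
        cons += [t(H[i], W[i]) >= gamma[i] * (
            sum(t(H[i], W[j]) for j in range(Ki) if j != i)
            + sum(t(H[i], Ve) for Ve in V) + sigma2[i])]
    for e in range(Ke):                                    # C2: active-EU energy
        cons += [sum(t(G[e], Ve) for Ve in V)
                 + sum(t(G[e], Wi) for Wi in W) >= p_min[e] / eta]
    for z in range(Kz):                                    # C3: idle-EU energy
        cons += [sum(t(Fz[z], Wi) for Wi in W)
                 + sum(t(Fz[z], Ve) for Ve in V) >= p_idle[z] / eta]
    for n in range(N):                                     # C4, C5, C6
        cons += [pw(n) <= E[n] - S[n] + B_ahead[n] + B[n] - p_circ[n],
                 pw(n) <= p_tmax[n],
                 sum(xi[n][i] * t(D[n], W[i]) * R_prev[i]
                     for i in range(Ki)) <= c_limit[n]]
    cons += [sum(B_ahead) + cp.sum(B) <= p_cu_max - p_cu_circ]  # C7

    prob = cp.Problem(cp.Minimize(obj), cons)
    prob.solve()
    return W, V, B.value, S.value
```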

4. Predictive Energy Trading Strategy

The multi-armed bandit (MAB) problem is expressed as a $J$-arm system, with each arm associated with independent and identically distributed (i.i.d.) stochastic rewards. The objective is to maximize the accumulated profit by observing the rewards of new arms during the exploration stage, while simultaneously optimizing the decisions among a set of arms based on existing knowledge during the exploitation stage over multiple trials [24]. Let us consider a combinatorial generalization of the classical MAB problem, where a super arm consisting of a set of $N$ base arms, $N \leq J$, is played, and the rewards of its constituent base arms are observed individually in each trial [25].
As illustrated in Figure 1, the problem scrutinized in this paper is categorized as a combinatorial MAB problem, where a super arm is composed of $N$ base arms and each base arm corresponds to an energy package purchased for an RRH from the day-/hour-ahead market at each trial $k \in \mathcal{K}$, before the real-time energy demand. The CU adapts its cooperative energy trading strategies to the intermittent environment in the Cloud-RAN by dynamically forming super arms to maximize the averaged rewards accumulated over the period $T$, which is equivalent to lessening the averaged energy expense in the long run. Let $\mathcal{J} = \{1, \dots, J\}$ be defined as the set of indexes of the energy packages offered in the day-/hour-ahead market by the grid, and let $E^{[\mathrm{total}]} = \{E_1, \dots, E_J\}$ denote all the energy packages offered by the grid in the day-/hour-ahead market, where $E_p = E_{p-1} + \Delta E$, $\forall p \in \mathcal{J}$. Furthermore, let $A_k^{[\mathrm{set}]} = \{B_1^{[\mathrm{ahead}]}(k), \dots, B_N^{[\mathrm{ahead}]}(k)\}$ represent a super arm, i.e., a set of $N$ energy packages purchased in advance for the $N$ RRHs from the day-/hour-ahead market, at the $k$-th trial. Let us further define the reward for the individual arm at the $n$-th RRH and the reward for the super arm at the $k$-th trial as $R(B_n^{[\mathrm{ahead}]}(k))$ and $R(A_k^{[\mathrm{set}]})$, respectively, as
$$R(B_n^{[\mathrm{ahead}]}(k)) = B_n^{[\mathrm{total}]}(1) - B_n^{[\mathrm{total}]}(k), \qquad (17)$$
$$R(A_k^{[\mathrm{set}]}) = \sum_{n \in \mathcal{L}_b} R(B_n^{[\mathrm{ahead}]}(k)), \qquad (18)$$
where $B_n^{[\mathrm{total}]}(1)$ and $B_n^{[\mathrm{total}]}(k)$ in (17) are the total energy costs incurred by the $n$-th RRH at the initial trial and the $k$-th trial of a frame, respectively. Furthermore, let $\boldsymbol{\mu}_n^{[k,f,t]} = (\mu_{n,1}^{[k,f,t]}, \mu_{n,2}^{[k,f,t]}, \dots, \mu_{n,J}^{[k,f,t]})$ be defined as the reward vector for the $n$-th RRH, where $\mu_{n,p}^{[k,f,t]} = R(B_n^{[\mathrm{ahead}]}(k))$, $p \in \mathcal{J}$, is the reward associated with the $p$-th energy package in the $k$-th trial of the $f$-th frame at the $t$-th time slot.
In the following, we propose a CUCB-based [25] predictive energy trading strategy, which is shown in Figure 2 and detailed in Algorithms 1 and 2, to find the best possible combination of energy packages to be purchased from the day-/hour-ahead market for the $N$ RRHs for the next time slot by exploring the rewards of new combinations of energy packages within a limited number of trials at the current time slot and exploiting the past captured information on the rewards of super arms from the previous time slots, so that the long-term averaged reward, i.e., the total energy cost in the long run, can be optimized.
Algorithm 1 Super Arm Exploration
1: Initialize: total number of trials $K$.
2: for $k = 1 : K$ do
3:    Solve problem (16) for the given $B_n^{[\mathrm{ahead}]}(k)$.
4:    The CU calculates $B_n^{[\mathrm{total}]}(k)$ as per (2), $R(B_n^{[\mathrm{ahead}]}(k))$ as per (17), and $R(A_k^{[\mathrm{set}]})$ as per (18).
5:    if $k = 1$
6:       then $B_n^{[\mathrm{ahead}]}(k+1) = B_n^{[\mathrm{ahead}]}(k) + \Delta E$, $\forall n \in \mathcal{L}_b$;
7:    else if the super-arm reward of all the RRHs satisfies $R(A_k^{[\mathrm{set}]}) \leq R(A_{k-1}^{[\mathrm{set}]})$,
8:       then $B_n^{[\mathrm{ahead}]}(k+1) = B_n^{[\mathrm{ahead}]}(k-1)$, $\forall n \in \mathcal{L}_b$;
9:    else if the individual reward of the $n$-th RRH satisfies $R(B_n^{[\mathrm{ahead}]}(k)) \geq R(B_n^{[\mathrm{ahead}]}(k-1))$ and $B_n^{[\mathrm{ahead}]}(k) \leq E_J$,
10:      then $B_n^{[\mathrm{ahead}]}(k+1) = B_n^{[\mathrm{ahead}]}(k) + \Delta E$;
11:   else $B_n^{[\mathrm{ahead}]}(k+1) = B_n^{[\mathrm{ahead}]}(k)$.
12:   end if
13:   Calculate the total energy cost of all the RRHs as $\beta^{[k,f,t]} = \sum_{n \in \mathcal{L}_b} B_n^{[\mathrm{total}]}(k)$.
14:   Calculate the energy package index $p$ at all the RRHs from $p = B_n^{[\mathrm{ahead}]}(k) / \Delta E$, $\forall n \in \mathcal{L}_b$.
15:   Update $\mu_{n,p}^{[k,f,t]} = R(B_n^{[\mathrm{ahead}]}(k))$, $p \in \mathcal{J}$, $\forall n \in \mathcal{L}_b$.
16:   Update $A_{k+1}^{[\mathrm{set}]} = \{B_1^{[\mathrm{ahead}]}(k+1), \dots, B_N^{[\mathrm{ahead}]}(k+1)\}$.
17: end for
18: Estimate the mean reward over the $K$ trials: $\hat{\mu}_{n,p}^{[f,t]} = \frac{1}{K} \sum_{k=1}^{K} \mu_{n,p}^{[k,f,t]}$, $p \in \mathcal{J}$, $\forall n \in \mathcal{L}_b$.
Algorithm 2 Main Online Learning Algorithm
1: Initialize: time slot count $t = 0$.
2: while $t \leq T$ do
3:    Increment the iteration index: $t = t + 1$.
4:    for $f = 1 : F$
5:       if $t = 1$ (initial time slot)
6:          then initialize the super arm for the first trial ($k = 1$) as $A_1^{[\mathrm{set}]} = \{0_1, \dots, 0_N\}$,
7:       else $A_1^{[\mathrm{set}]} = S^*$,
8:       end if
9:       Exploration Stage: run Algorithm 1.
10:      Estimation Stage:
11:      Calculate the mean reward vector for the frame, $\hat{\boldsymbol{\mu}}_n^{[f,t]} = (\hat{\mu}_{n,1}^{[f,t]}, \hat{\mu}_{n,2}^{[f,t]}, \dots, \hat{\mu}_{n,J}^{[f,t]})$, where $\hat{\mu}_{n,p}^{[f,t]} = \frac{1}{K} \sum_{k=1}^{K} \mu_{n,p}^{[k,f,t]}$, $p \in \mathcal{J}$, $\forall n \in \mathcal{L}_b$.
12:      Adjustment Stage:
13:      if $\Psi_p$ (the number of times the $p$-th arm has been played) $\neq 0$
14:         then adjust $\bar{\mu}_{n,p}^{[f,t]} = \hat{\mu}_{n,p}^{[f,t]} + \sqrt{\frac{3 \ln K}{2 \Psi_p}}$,
15:      else $\bar{\mu}_{n,p}^{[f,t]} = \hat{\mu}_{n,p}^{[f,t]}$, $p \in \mathcal{J}$, $\forall n \in \mathcal{L}_b$.
16:      end if
17:   end for
18:   Average the adjusted mean reward vector over all frames: $\bar{\boldsymbol{\mu}}_n^{[t]} = \left( \frac{1}{F} \sum_{f \in \mathcal{F}} \bar{\mu}_{n,1}^{[f,t]}, \frac{1}{F} \sum_{f \in \mathcal{F}} \bar{\mu}_{n,2}^{[f,t]}, \dots, \frac{1}{F} \sum_{f \in \mathcal{F}} \bar{\mu}_{n,J}^{[f,t]} \right)$, $\forall n \in \mathcal{L}_b$.
19:   Exploitation Stage:
20:   Average $\bar{\boldsymbol{\mu}}_n^{[t]}$ over the accumulated number of time slots: $\bar{\boldsymbol{\mu}}_n = \frac{1}{t} \sum_{t'=1}^{t} \bar{\boldsymbol{\mu}}_n^{[t']} = [\bar{\mu}_{n,1}, \bar{\mu}_{n,2}, \dots, \bar{\mu}_{n,J}]$, $\forall n \in \mathcal{L}_b$.
21:   For the next time slot, find the $N$ optimum arm indexes as $p_n^* = \arg\max_p (\bar{\mu}_{n,p})$, $p \in \mathcal{J}$, $\forall n \in \mathcal{L}_b$, and the updated super arm as $S^* = \Delta E \cdot [p_1^*, p_2^*, \dots, p_N^*]$.
22: end while
Let $\hat{\boldsymbol{\mu}}_n^{[f,t]} = (\hat{\mu}_{n,1}^{[f,t]}, \dots, \hat{\mu}_{n,J}^{[f,t]})$ and $\bar{\boldsymbol{\mu}}_n^{[f,t]} = (\bar{\mu}_{n,1}^{[f,t]}, \dots, \bar{\mu}_{n,J}^{[f,t]})$, $\forall n \in \mathcal{L}_b$, $f \in \mathcal{F}$, $t \in \mathcal{T}$, denote the estimated mean reward vector and the adjusted reward vector of the individual energy packages, respectively. In the exploration stage within each frame, Algorithm 1 explores new combinations of energy packages (super arms) for the next trial based on the rewards obtained at the current and the previous trials. Once the $K$ trials are completed, the mean rewards of the individual energy packages, i.e., $\hat{\boldsymbol{\mu}}_n^{[f,t]}$, in each frame are estimated. The estimated mean rewards are first adjusted and averaged over the $F$ frames of a time slot as per step 18, then averaged again over the total number of past time slots as per step 20 [26], and finally used to update the super arm $S^*$, i.e., the optimal set of energy packages purchased from the day-ahead market, to be exploited in the next time slot, as detailed in Algorithm 2.
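A condensed numpy sketch of the Adjustment and Exploitation stages described above is given below. The array shapes and the restriction to a single time slot (the paper additionally averages over all past slots) are our simplifications.

```python
import numpy as np

def select_super_arm(mu_hat, play_counts, K, delta_E):
    """Adjustment and Exploitation stages of Algorithm 2 for one time slot.

    mu_hat      : array (F, N, J), per-frame estimated mean rewards mu_hat[f, n, p]
    play_counts : array (N, J), number of times each arm has been played (Psi_p)
    Returns S* = delta_E * [p_1*, ..., p_N*] with 1-indexed package indexes.
    """
    # Adjustment: optimistic CUCB bonus, only for arms that have been played.
    bonus = np.where(play_counts > 0,
                     np.sqrt(3.0 * np.log(K) / (2.0 * np.maximum(play_counts, 1))),
                     0.0)
    mu_bar = mu_hat.mean(axis=0) + bonus          # average over the F frames
    p_star = np.argmax(mu_bar, axis=1) + 1        # best package per RRH (1-indexed)
    return delta_E * p_star

# Example with random rewards: 10 frames, 3 RRHs, 20 packages of 100 mW steps.
rng = np.random.default_rng(0)
S_star = select_super_arm(rng.random((10, 3, 20)),
                          rng.integers(0, 10, (3, 20)), K=10, delta_E=0.1)
print(S_star)    # look-ahead purchases (W) for the next time slot
```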
The proposed learning-based algorithm can be viewed as a mixed online learning and convex optimization problem with linear matrix inequality constraints. The optimization problem is solved once per learning trial. Therefore, the complexity of the resulting algorithm is dominated by the number of iterations required for solving a convex optimization problem with polynomial worst-case complexity [22], while the total number of learning trials depends on the dynamic range of variations in the environment.

5. Simulation Results

A downlink Cloud-RAN consisting of three adjacent RRHs with SWIPT serving six single-antenna IUs and six single-antenna EUs was considered in this paper. The proposed Cloud-RAN operated over a channel bandwidth of 20 MHz. All of the RRHs were equipped with eight antennas and placed 500 m away from each other. The performance of the proposed scheme was assessed with $K = 10$ trials per frame, $F = 10$ frames per time slot, $T = 60$ time slots, and a total of $J = 20$ energy packages with $\Delta E = 100$ mW, i.e., $E^{[\mathrm{total}]} = \{100, 200, \dots, 2000\}$ mW. The renewable energy generation values at the individual RRHs were $E_1 = 1.5$, $E_2 = 0.2$, and $E_3 = 0.05$ W, respectively, at a price of $\pi^{[\mathrm{renew}]} = 0.02$ GBP/W. It was assumed that $\pi^{[\mathrm{ahead}]} = 0.07$, $\pi^{[\mathrm{spot}]} = 0.15$, and $\pi^{[\mathrm{sell}]} = 0.05$ GBP/W. A correlated channel model, $\mathbf{h}_{ni} = \mathbf{R}^{1/2} \mathbf{h}_w$, was adopted [17,27], where the entries of $\mathbf{h}_w \in \mathbb{C}^{M \times 1}$ are zero-mean circularly symmetric complex Gaussian random variables with unit variance and $\mathbf{R} \in \mathbb{C}^{M \times M}$ is the spatial covariance matrix, whose $(m, n)$-th element is given by
$$[\mathbf{R}]_{m,n} = G_a L_p \sigma_F^2\, e^{0.5 (\sigma_s \ln 10)^2 / 100}\, e^{j \frac{2 \pi \delta}{\lambda}(n - m) \sin\theta}\, e^{-2 \left( \frac{\pi \delta \sigma}{\lambda}(n - m) \cos\theta \right)^2},$$
where $G_a = 15$ dBi denotes the antenna gain, $L_p(\mathrm{dB}) = 125.2 + 36.3 \log_{10}(d)$ represents the path loss model over a distance of $d$ km, $\sigma_F^2$ is the variance of the complex Gaussian fading coefficient, $\sigma_s = 8$ dB is the log-normal shadowing standard deviation, $\sigma = 2$ is the angular offset standard deviation, and $\theta$ is the estimated angle of departure. Unless otherwise stated, the simulation parameters were $P_{\mathrm{CU}}^{[\mathrm{circ}]} = 40$ dBm, $P_{\mathrm{CU}}^{[\mathrm{max}]} = 50$ dBm, $P_n^{[\mathrm{circ}]} = 30$ dBm, $P_n^{[\mathrm{Tmax}]} = 46$ dBm, $C_n^{[\mathrm{limit}]} = 30$ bits/s/Hz, $P_e^{[\mathrm{min}]} = -60$ dBm, $P_z^{[\mathrm{idle}]} = -90$ dBm [18], and $\eta = 0.5$, respectively. The simulation results were obtained via CVX [28] on an Intel i7-3770 CPU at 3.4 GHz with 8 GB of RAM, and the running time for each learning trial was approximately seven seconds without parallelization. Our proposed online learning strategy was compared against a baseline design with no ahead-of-time energy preparation and the non-learning-based design in [13], which always assumes that a fixed set of energy packages is prepared from the day-/hour-ahead market, i.e., $A^{[\mathrm{set}]} = \{B_1^{[\mathrm{ahead}]} = B_2^{[\mathrm{ahead}]} = B_3^{[\mathrm{ahead}]}\} = 700$ mW. For a fair comparison, identical constraints were applied to all the strategies.
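For reproducibility, the following is a numpy sketch of the correlated channel model above. The antenna spacing $\delta/\lambda = 0.5$ and the unit fading variance are our assumptions; everything else follows the stated parameters.

```python
import numpy as np

def spatial_covariance(M, theta, d_km, Ga_dBi=15.0, sigma_s_dB=8.0,
                       sigma_deg=2.0, delta_over_lambda=0.5, sigma_F2=1.0):
    """Builds R (M x M) element-wise from the reconstructed expression above.
    delta_over_lambda is an assumed normalized antenna spacing."""
    Ga = 10 ** (Ga_dBi / 10)                                   # antenna gain
    Lp = 10 ** (-(125.2 + 36.3 * np.log10(d_km)) / 10)         # path loss (linear)
    shadow = np.exp(0.5 * (sigma_s_dB * np.log(10)) ** 2 / 100)
    sigma = np.deg2rad(sigma_deg)                              # angular offset std
    m, n = np.meshgrid(np.arange(M), np.arange(M), indexing="ij")
    phase = np.exp(1j * 2 * np.pi * delta_over_lambda * (n - m) * np.sin(theta))
    spread = np.exp(-2 * (np.pi * delta_over_lambda * sigma * (n - m)
                          * np.cos(theta)) ** 2)
    return Ga * Lp * sigma_F2 * shadow * phase * spread

M, theta = 8, np.deg2rad(30)
R = spatial_covariance(M, theta, d_km=0.5)
w_eig, U = np.linalg.eigh(R)                                   # R is Hermitian PSD
R_half = U @ np.diag(np.sqrt(np.maximum(w_eig, 0))) @ U.conj().T
h_w = (np.random.randn(M) + 1j * np.random.randn(M)) / np.sqrt(2)
h = R_half @ h_w                                               # one channel draw
```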
Note that the speed at which the proposed online learning strategy converges to its steady state depends on the total number of learning trials, which in turn depends on the dynamic range of variations in the environment. Due to the limitations of our simulation tool, we downsized the total number of learning trials and the other simulation parameters according to the scale of our problem. In a practical scenario with a large number of users, the resulting amount of look-ahead energy purchased from the day-/hour-ahead market will increase proportionally, which may increase the number of arms or the spacing between two adjacent arms, and may also increase the number of learning trials needed for convergence. Therefore, the enlarged practical scenario does not affect the scalability of the proposed algorithm; it may only increase the computational burden.
Figure 3a compares the normalized total energy cost over discrete time slots for the different strategies at $\gamma = 15$ dB. It can be observed that, at its steady state, the proposed strategy achieves performance gains of 43% and 11%, respectively, compared with the baseline scheme and the design in [13], since those designs provide no adaptation to the dynamic wireless channel conditions in Cloud-RANs. Figure 3b shows the normalized total energy cost of our proposed strategy at $\gamma = 20$ dB. One may observe that the performance of the proposed strategy degrades slightly with an increasing target SINR, i.e., from $\gamma = 15$ dB to $\gamma = 20$ dB. Figure 3c presents the normalized total energy cost of our proposed strategy at $\gamma = 20$ dB in a more complex scenario, where the number of per-RRH antennas is six and the renewable energy generation at the individual RRHs varies within [0.5, 2.5], [0.3, 1.5], and [0.1, 1.0] W, respectively.
It is clear from Figure 3c that the performance of the proposed strategy is slightly degraded compared with Figure 3b, which was simulated in a simpler scenario. However, as the time-slot index increases, our proposed strategy exhibits considerably smaller variations in the total energy cost and much better average performance than that of [13] under the same system setup. This validates the ability of our proposed algorithm to adapt to more realistic wireless networks.
Figure 4 details the procedure by which a super arm is selected in accordance with Algorithms 1 and 2. Figure 4a illustrates the evolution of the super arm, i.e., the optimal set of energy packages purchased for the set of RRHs from the day-/hour-ahead market, over the trials of the fifth time slot. In each trial, a new combination of energy packages is explored on the basis of the individual and the averaged accumulated rewards obtained from the current and the previous trials, as per Algorithm 1. Figure 4b demonstrates the optimal super arm selected at the $t$-th time slot to be exploited as the starting point of the $(t+1)$-th time slot, as per Algorithm 2. It can be observed that, from the 15th time slot onwards, nearly identical super arms associated with the highest rewards for the RRHs are selected, which demonstrates the convergence of the proposed algorithm for the given simulation.
The normalized accumulated reward and regret at each time slot for the different strategies are shown in Figure 5. The normalized accumulated reward at time slot $t$, denoted by $R_t^{[\mathrm{acc}]}$, is calculated by averaging the difference between the total energy cost at the $t$-th time slot and that at the initial time slot over all frames, i.e., $R_t^{[\mathrm{acc}]} = \frac{1}{F} \sum_{f \in \mathcal{F}} \big( \beta^{[k,f,1]} - \beta^{[k,f,t]} \big)$. In contrast, the regret of a strategy is defined as the difference in the accumulated reward between always playing the optimal super arm and playing the super arm according to the proposed strategy at the $t$-th time slot, i.e., $Q_t = R_{\mathrm{opt}}^{[\mathrm{acc}]} - R_t^{[\mathrm{acc}]}$, where $R_{\mathrm{opt}}^{[\mathrm{acc}]}$ is the accumulated reward after convergence. Figure 5 confirms that a significant performance gap exists between the proposed strategy and both the baseline scheme and the design in [13]. One can conclude that, although the regret of the proposed strategy is the worst at the initial time slot, it declines rapidly with the continuous learning process until convergence, owing to the fact that the proposed strategy learns from the past captured behavior of cooperative energy trading and adapts to the dynamic wireless environment.
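The two metrics plotted in Figure 5 reduce to a few lines of numpy. The sketch below assumes the cost history is stored as an array indexed by time slot and frame, and that the reward after convergence can be read off the final slot; both are our storage conventions.

```python
import numpy as np

def reward_and_regret(beta):
    """Accumulated reward and regret per Figure 5.

    beta : array (T, F), total energy cost beta[k, f, t] of the final trial k
           of each frame f at each time slot t (our storage convention).
    """
    r_acc = (beta[0] - beta).mean(axis=1)   # R_t^[acc]: frame-averaged savings
    r_opt = r_acc[-1]                       # reward after convergence (assumed)
    return r_acc, r_opt - r_acc             # (R_t^[acc], Q_t)
```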

6. Conclusions

This paper proposes a predictive cooperative energy trading mechanism based on a CMAB model in a green Cloud-RAN with SWIPT, which adapts to the temporal variations of energy demand in a statistically unknown, changing environment and improves its performance gain over time, with the objective of minimizing the time-averaged overall energy cost in the long run. The proposed strategy anticipates future energy demand and supplies the instantaneous energy demand at the current time slot with energy prepared in advance, based on the knowledge of the uncertain wireless system dynamics captured at the previous time slots. The presented simulation results confirmed a reduction of the long-term running cost: our proposed scheme outperforms a baseline scheme that purchases no ahead-of-time energy packages, as well as a recently proposed non-learning-based design that assumes fixed energy purchases from the day-ahead market.

Author Contributions

Conceptualization, W.N.S.F.W.A.; Data curation, W.N.S.F.W.A.; Formal analysis, W.N.S.F.W.A. and M.R.N.; Funding acquisition, H.A.R. and R.B.A.; Investigation, W.N.S.F.W.A.; Methodology, W.N.S.F.W.A. and M.R.N.; Project administration, W.N.S.F.W.A. and M.R.N.; Resources, W.N.S.F.W.A. and X.Z.; Software, W.N.S.F.W.A.; Supervision, M.R.N.; Validation, W.N.S.F.W.A. and X.Z.; Visualization, W.N.S.F.W.A. and M.R.N.; Writing—original draft, W.N.S.F.W.A., X.Z. and M.R.N.; Writing—review & editing, W.N.S.F.W.A., X.Z., M.R.N. and H.A.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Fundamental Research Grant Scheme, FRGS/1/2018/TK10/UNIMAP/02/11, from the Ministry of Education (MOE), Malaysia, and Universiti Malaysia Perlis (UniMAP), Malaysia.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following symbols and abbreviations are used in this manuscript:
$\mathcal{L}_b$: Set of indexes of the $N$ RRHs (BSs).
$\mathcal{L}_i$: Set of indexes of the $K_i$ information users (IUs).
$\mathcal{L}_e$: Set of indexes of the $K_e$ active energy users (EUs).
$\mathcal{L}_e^{[\mathrm{idle}]}$: Set of indexes of the $K_e^{[\mathrm{idle}]}$ idle EUs.
$P_n^{[\mathrm{Tx}]}$: Total transmit power at the $n$-th RRH.
$P_n^{[\mathrm{circ}]}$: Hardware circuit power consumption at the $n$-th RRH.
$P_{\mathrm{CU}}^{[\mathrm{circ}]}$: Hardware circuit power consumption at the CU.
$P_n^{[\mathrm{Tmax}]}$: Maximum transmit power allowance of the $n$-th RRH.
$P_{\mathrm{CU}}^{[\mathrm{max}]}$: Maximum power provision by the grid at the CU.
$\mathcal{T}$: Set of indexes of the $T$ time slots.
$\mathcal{F}$: Set of indexes of the $F$ frames within a time slot.
$\mathcal{K}$: Set of indexes of the $K$ trials within a frame.
$B_n^{[\mathrm{ahead}]}$: Amount of energy purchased from the day-ahead market (arm).
$B_n^{[\mathrm{spot}]}$: Amount of energy to be purchased from the spot market.
$S_n$: Amount of surplus energy to be sold back to the grid.
$E_n$: Amount of renewable energy generation at the $n$-th RRH.
$B_n^{[\mathrm{total}]}(k)$: Total energy cost of the $n$-th RRH at the $k$-th trial.
$E^{[\mathrm{total}]} = \{E_1, \dots, E_J\}$: All energy packages (arms) offered by the grid in the day-ahead market.
$\mu_{n,p}^{[k,f,t]} = R(B_n^{[\mathrm{ahead}]}(k))$: Reward associated with arm $B_n^{[\mathrm{ahead}]}$ at the $k$-th trial of the $f$-th frame at the $t$-th time slot.
$A_k^{[\mathrm{set}]} = \{B_1^{[\mathrm{ahead}]}(k), \dots, B_N^{[\mathrm{ahead}]}(k)\}$: $N$ energy packages purchased a day ahead at the $k$-th trial (super arm).
$R(A_k^{[\mathrm{set}]})$: Reward for the super arm $A_k^{[\mathrm{set}]}$ at the $k$-th trial.
$\boldsymbol{\mu}_n^{[k,f,t]} = (\mu_{n,1}^{[k,f,t]}, \dots, \mu_{n,J}^{[k,f,t]})$: Reward vector for the $n$-th RRH.
$\hat{\boldsymbol{\mu}}_n^{[f,t]} = (\hat{\mu}_{n,1}^{[f,t]}, \dots, \hat{\mu}_{n,J}^{[f,t]})$: Estimated mean reward vector.
$\bar{\boldsymbol{\mu}}_n^{[f,t]} = (\bar{\mu}_{n,1}^{[f,t]}, \dots, \bar{\mu}_{n,J}^{[f,t]})$: Adjusted reward vector of the individual arms.
$R_t^{[\mathrm{acc}]}$: Accumulated reward at time slot $t$.
$Q_t$: Regret at time slot $t$.

References

  1. Peng, M.; Li, Y.; Jiang, J.; Li, J.; Wang, C. Heterogeneous Cloud Radio Access Networks: A New Perspective for Enhancing Spectral and Energy Efficiencies. IEEE Wirel. Commun. 2014, 21, 126–135. [Google Scholar] [CrossRef] [Green Version]
  2. Tan, Z.; Yang, C.; Song, J.; Liu, Y.; Wang, Z. Energy consumption analysis of C-RAN architecture based on 10G EPON front-haul with daily user behaviour. In Proceedings of the 2015 14th International Conference on Optical Communications and Networks (ICOCN), Nanjing, China, 3–5 July 2015. [Google Scholar]
  3. Habibi, M.A.; Nasimi, M.; Han, B.; Schotten, H.D. A Comprehensive Survey of RAN Architectures Toward 5G Mobile Communication System. IEEE Access 2019, 7, 70371–70421. [Google Scholar] [CrossRef]
  4. Mahapatra, R.; Nijsure, Y.; Kaddoum, G.; Hassan, N.U.; Yuen, C. Energy Efficiency Tradeoff Mechanism towards Wireless Green Communication: A Survey. IEEE Commun. Surv. Tutor. 2016, 18, 686–705. [Google Scholar] [CrossRef] [Green Version]
  5. China Mobile Research Institute. C-RAN: The Road Towards Green RAN. version 2.5; White Paper. 2011. Available online: labs.chinamobile.com/cran/ (accessed on 30 March 2016).
  6. Fehske, A.; Fettweis, G.; Malmodin, J.; Biczok, G. The Global Footprint of Mobile Communications: The Ecological and Economic Perspective. IEEE Commun. Mag. 2011, 49, 55–62. [Google Scholar] [CrossRef]
  7. Bu, S.; Yu, F.R.; Cai, Y.; Liu, X.P. When the smart grid meets energy efficient communications: Green wireless cellular networks powered by the smart grid. IEEE Trans. Wirel. Commun. 2012, 11, 3014–3024. [Google Scholar] [CrossRef]
  8. Chen, S.; Shroff, N.B.; Sinha, P. Energy trading in the smart grid: From end-user’s perspective. In Proceedings of the 2013 Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 3–6 November 2013; pp. 327–331. [Google Scholar] [CrossRef] [Green Version]
  9. Xu, J.; Zhang, R. CoMP meets smart grid: A new communication and energy cooperation paradigm. IEEE Trans. Veh. Technol. 2015, 64, 2476–2488. [Google Scholar] [CrossRef] [Green Version]
  10. Xu, J.; Zhang, R. Cooperative Energy Trading in CoMP Systems Powered by Smart Grids. In Proceedings of the 2014 IEEE Global Communications Conference, Austin, TX, USA, 8–12 December 2014; pp. 2697–2702. [Google Scholar] [CrossRef] [Green Version]
  11. Wang, Y.; Saad, W.; Han, Z.; Poor, H.V.; Başar, T. A game-theoretic approach to energy trading in the smart grid. IEEE Trans. Smart Grid 2014, 5, 1439–1450. [Google Scholar] [CrossRef] [Green Version]
  12. Leithon, J.; Lim, T.J.; Sun, S. Online energy management strategies for base stations powered by the smart grid. In Proceedings of the 2013 IEEE International Conference on Smart Grid Communications (SmartGridComm), Vancouver, BC, Canada, 21–24 October 2013; pp. 199–204. [Google Scholar] [CrossRef]
  13. Ariffin, W.N.S.F.W.; Zhang, X.; Nakhai, M.R. Sparse Beamforming for Real-Time Resource Management and Energy Trading in Green C-RAN. IEEE Trans. Smart Grid 2016, 8, 2022–2031. [Google Scholar] [CrossRef]
  14. Dall’Anese, E.; Zhu, H.; Giannakis, G.B. Distributed Optimal Power Flow for Smart Microgrids. IEEE Trans. Smart Grid 2013, 4, 1464–1475. [Google Scholar] [CrossRef] [Green Version]
  15. Ariffin, W.N.S.F.W.; Zhang, X.; Nakhai, M.R. Combinatorial multi-armed bandit algorithms for real-time energy trading in green C-RAN. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 23–27 May 2016; pp. 1–6. [Google Scholar] [CrossRef] [Green Version]
  16. Zhang, X.; Nakhai, M.R.; Ariffin, W.N.S.F.W. A Bandit Approach to Price-Aware Energy Management in Cellular Networks. IEEE Commun. Lett. 2017, 21, 1609–1612. [Google Scholar] [CrossRef] [Green Version]
  17. Ariffin, W.N.S.F.W.; Zhang, X.; Nakhai, M.R. Sparse beamforming for real-time energy trading in CoMP-SWIPT networks. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 23–27 May 2016; pp. 1–6. [Google Scholar] [CrossRef] [Green Version]
  18. Dai, B.; Yu, W. Sparse Beamforming and User-Centric Clustering for Downlink Cloud Radio Access Network. IEEE Access 2014, 2, 1326–1339. [Google Scholar]
  19. Ng, D.W.K.; Schober, R. Resources Allocation for Coordinated Multipoint Networks with Wireless Information and Power Transfer. In Proceedings of the 2014 IEEE Global Communications Conference, Austin, TX, USA, 8–12 December 2014; pp. 4281–4287. [Google Scholar] [CrossRef] [Green Version]
  20. Candes, E.; Wakin, M.; Boyd, S. Enhancing Sparsity by Reweighted 1 Minimization. J. Fourier Anal. Appl. 2008, 14, 877–905. [Google Scholar] [CrossRef]
  21. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  22. Vandenberghe, L.; Boyd, S. Semidefinite programming. SIAM Rev. 1996, 38, 49–95. [Google Scholar] [CrossRef] [Green Version]
  23. Nesterov, Y.; Nemirovsky, A. Interior-Point Polynomial Methods in Convex Programming; Studies in Applied Mathematics; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1994; Volume 13. [Google Scholar]
  24. Blasco, P.; Gunduz, D. Learning-Based Optimization of Cache Content in a Small Cell Base Station. In Proceedings of the 2014 IEEE International Conference on Communications (ICC), Sydney, Australia, 10–14 June 2014; pp. 1897–1903. [Google Scholar] [CrossRef] [Green Version]
  25. Chen, W.; Wang, Y.; Yuan, Y. Combinatorial multi-armed bandit: General framework, results and applications. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
  26. Yang, Y.; Zhu, D. Randomized Allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 2002, 30, 100–121. [Google Scholar]
  27. Ariffin, W.N.S.F.W.; Zhang, X.; Nakhai, M.R. Real-time power balancing in green CoMP network with wireless information and energy transfer. In Proceedings of the 2015 IEEE 26th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), Hong Kong, 30 August–2 September 2015; pp. 1574–1578. [Google Scholar] [CrossRef]
  28. Grant, M.; Boyd, S. CVX: Matlab Software for Disciplined Convex Programming, Version 2.0 (Beta). March 2013. Available online: http://cvxr.com/cvx/ (accessed on 30 January 2014).
Figure 1. Combinatorial multi-armed bandit (CMAB) problem for predictive real-time energy trading in a cloud radio access network (Cloud-RAN) with the sparse beamforming technique.
Figure 2. Proposed predictive energy trading strategy in a Cloud-RAN.
Figure 3. Normalized total energy cost at (a) $\gamma = 15$ dB, (b) $\gamma = 20$ dB, and (c) $\gamma = 20$ dB with random variations in renewable generation.
Figure 4. Illustration of super arm decisions according to the proposed strategy: (a) super arms chosen in the individual trials at the fifth time slot; (b) look-ahead energy purchase decisions (i.e., final super arm) for individual time slots.
Figure 5. Normalized accumulated reward/regret for different strategies.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
