# Joint Optimization of Energy Efficiency and User Outage Using Multi-Agent Reinforcement Learning in Ultra-Dense Small Cell Networks


## Abstract


## 1. Introduction

#### 1.1. Motivation and Related Works

#### 1.2. Paper Organization

## 2. System Model

## 3. Joint Optimization of EE and User Outage Based on Multi-Agent Distributed Reinforcement Learning in Ultra-Dense Small Cell Networks

#### 3.1. MAQ-OCB with SBS Collaboration

#### 3.2. MAQ-OCB without SBS Collaboration

## 4. Simulation Results and Discussions

## 5. Conclusions

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Definition |
|---|---|
| 6G | Sixth-generation |
| BS | Base station |
| C-OCB | Centralized Q-learning based outage-aware cell breathing |
| DNN | Deep neural network |
| DQL | Deep Q-learning |
| DQN | Deep Q-network |
| EE | Energy efficiency |
| IIR | Infinite impulse response |
| IoT | Internet of Things |
| KPI | Key performance indicator |
| MAQ-OCB | Multi-agent Q-learning based outage-aware cell breathing |
| MAQ-OCB w/ SC | MAQ-OCB with SBS collaboration |
| MAQ-OCB w/o SC | MAQ-OCB without SBS collaboration |
| MBS | Macro cell BS |
| MIMO | Multiple-input multiple-output |
| NB-IoT | Narrow-band Internet of Things |
| NOMA | Non-orthogonal multiple access |
| No TPC | No transmission power control |
| RSRP | Reference signal received power |
| SBS | Small cell BS |
| SE | Spectral efficiency |
| SEE | Spectral and energy efficiency |
| SINR | Signal-to-interference-plus-noise ratio |
| XR | Extended reality |

**Figure 1.** System model of the proposed multi-agent Q-learning framework for maximizing EE while minimizing user outage in ultra-dense small cell networks.

**Figure 2.** Energy efficiency and reward vs. episode when $\left|\mathbb{U}\right|$ = 20, $\left|\mathbb{M}\right|$ = 3, and $\left|\mathbb{N}\right|$ = 4; global optimum vs. local optimum in the two-agent case. (**a**) Accumulated energy efficiency. (**b**) Accumulated reward. (**c**) Global optimum vs. local optimum.

**Figure 3.** Reward vs. episode when $\left|\mathbb{U}\right|$ = 60, $\left|\mathbb{M}\right|$ = 3, and $\left|\mathbb{N}\right|$ = 6.

**Figure 4.** Energy efficiency vs. episode when $\left|\mathbb{U}\right|$ = 60, $\left|\mathbb{M}\right|$ = 3, and $\left|\mathbb{N}\right|$ = 6.

**Figure 5.** Number of outage users vs. episode when $\left|\mathbb{U}\right|$ = 60, $\left|\mathbb{M}\right|$ = 3, and $\left|\mathbb{N}\right|$ = 6.

| Parameter | Value | Parameter | Value |
|---|---|---|---|
| $\lvert\mathbb{M}\rvert$ | 3 | $\lvert\mathbb{N}\rvert$ | 4, 6 |
| $\lvert\mathbb{U}\rvert$ | 20 ∼ 60 | $d_{th}$ | 150 m ∼ 450 m |
| $\delta$ | 0.5 | $W$ | 10 MHz |
| $\sigma^{2}$ | $-174$ dBm | $P_{c}^{a}$ | 0.25 W |
| $P_{c}^{s}$ | 0.025 W | $\rho$ | 3 |
| $\varsigma$ | 0.1 | $\eta$ | 0.9 |
| $\epsilon_{be}$ | 0.99 | $\chi$ | 330 |
| $\Delta_{P_{t}}$ | 0.5 W | $\gamma_{th}$ | 0 dB |

| Algorithm | EE-Optimal, Reward-Optimal | C-OCB | MAQ-OCB w/o SC | MAQ-OCB w/ SC |
|---|---|---|---|---|
| $\mathcal{O}(\cdot)$ | $\mathcal{O}(\lvert\mathbf{S}\rvert^{\lvert\mathbb{N}\rvert}\lvert\mathbf{A}\rvert^{\lvert\mathbb{N}\rvert})$ | $\mathcal{O}(\lvert\mathbf{S}\rvert^{\lvert\mathbb{N}\rvert}\lvert\mathbf{A}\rvert^{\lvert\mathbb{N}\rvert})$ | $\mathcal{O}(\lvert\mathbf{S}\rvert\lvert\mathbf{A}\rvert)$ | $\mathcal{O}(\lvert\mathbf{S}\rvert^{\lvert\Im\rvert}\lvert\mathbf{A}\rvert)$ |
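To make the gap between these complexity orders concrete, the following sketch evaluates the expression inside each $\mathcal{O}(\cdot)$ for small, hypothetical sizes. It is an illustration under stated assumptions, not the authors' analysis. Here $\lvert\mathbf{S}\rvert$ and $\lvert\mathbf{A}\rvert$ are read as the per-agent state and action space sizes and $\lvert\mathbb{N}\rvert$ as the number of SBSs. The symbol $\lvert\Im\rvert$ is *assumed* to be the number of SBSs whose states a collaborating agent takes into account. The values of $\lvert\mathbf{S}\rvert$, $\lvert\mathbf{A}\rvert$, and $\lvert\Im\rvert$ are made up, and only $\lvert\mathbb{N}\rvert \in \{4, 6\}$ comes from the parameter table. The helper names are hypothetical.

```python
"""Hypothetical illustration of the complexity orders in the table above.

Assumptions (not from the paper): |S| and |A| are per-agent state and action
space sizes, |N| is the number of SBSs (agents), and |Im| is taken to be the
number of SBSs whose states a collaborating agent observes.
"""


def joint_space(num_states: int, num_actions: int, num_sbs: int) -> int:
    # EE-optimal, reward-optimal, and C-OCB: |S|^|N| * |A|^|N|
    return num_states ** num_sbs * num_actions ** num_sbs


def independent_space(num_states: int, num_actions: int) -> int:
    # MAQ-OCB w/o SC: |S| * |A| per agent, independent of |N|
    return num_states * num_actions


def collaborative_space(num_states: int, num_actions: int, num_observed: int) -> int:
    # MAQ-OCB w/ SC: |S|^|Im| * |A| per agent (|Im| is an assumed meaning)
    return num_states ** num_observed * num_actions


if __name__ == "__main__":
    S, A, IM = 10, 3, 2  # hypothetical sizes, chosen only for illustration
    for n in (4, 6):  # |N| values taken from the simulation parameter table
        print(f"|N|={n}: joint={joint_space(S, A, n):,}, "
              f"w/o SC={independent_space(S, A):,}, "
              f"w/ SC (|Im|={IM})={collaborative_space(S, A, IM):,}")
```

With these assumed sizes, the joint expression grows from 810,000 at $\lvert\mathbb{N}\rvert = 4$ to 729,000,000 at $\lvert\mathbb{N}\rvert = 6$. The per-agent expression for MAQ-OCB w/o SC stays at 30 regardless of $\lvert\mathbb{N}\rvert$, and MAQ-OCB w/ SC sits in between at 300 for $\lvert\Im\rvert = 2$.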

