Energy-Efficient Access Point Switch On/Off in Cell-Free Massive MIMO Using Proximal Policy Optimization
Abstract
1. Introduction
1.1. Problem Statement
1.2. Contributions
- We formulate the ASO problem explicitly as the selection of an energy-efficient subset of APs in cell-free massive MIMO systems and address it through a scalable RL framework operating under partial observability. The proposed formulation avoids exhaustive combinatorial search and does not require full large-scale fading knowledge, making it suitable for practical online implementation.
- We develop a PPO-based ASO policy that directly optimizes network energy efficiency and benchmark it against well-defined baseline strategies, including random activation, the all-on configuration, and a greedy oracle upper bound. This comparison framework makes explicit the underlying information assumptions and enables a transparent interpretation of the learned policy’s performance.
- We conduct a systematic performance evaluation across three deployment scenarios of increasing network size (32, 64, and 96 APs), in which the coverage area grows together with the AP count so that AP density stays of the same order. This evaluation framework allows the scalability and robustness of the learning-based policy to be assessed under controlled and reproducible conditions.
- Beyond average performance metrics, we provide a detailed distributional analysis based on boxplots, cumulative distribution functions (CDFs), and EE conditioned on the number of active APs. This analysis characterizes not only typical performance but also robustness and reliability, revealing how the learned policy reduces the likelihood of highly energy-inefficient operating regimes.
1.3. Structure
1.4. Reproducible Research
2. Related Work
3. System Model and Energy Efficiency Formulation
3.1. Channel Model
3.2. Channel Estimation
3.3. Downlink Data Transmission
3.4. Network Scalability
3.5. Power Consumption Model
3.6. Energy Efficiency Definition
4. Access Point Switch On/Off Problem Formulation and Baselines
4.1. Problem Formulation
4.2. Baseline Strategies
- Random-Selection ASO (Lower Bound): A practical lower bound on the achievable energy efficiency can be obtained by employing a random AP activation mechanism. In this scheme, each AP is independently switched on or off with equal probability, while the number of active APs can be controlled to match a target activation level. Since this strategy does not account for channel conditions, interference, or the impact of individual APs on the aggregate throughput, the resulting energy efficiency is typically poor. Nevertheless, random-selection ASO (RS-ASO) provides a simple and computationally inexpensive baseline that serves as a conservative lower bound for the performance of more sophisticated ASO strategies.
- Greedy Energy-Efficiency ASO (Upper Bound): An upper performance bound can be approximated by identifying, for a given number of active APs, the subset that maximizes energy efficiency. In principle, this would require evaluating all $2^L$ possible AP activation combinations, which is computationally infeasible due to the exponential growth of the search space. To obtain a tractable approximation, a greedy energy-efficiency-oriented ASO strategy is adopted. The algorithm starts with all APs active and iteratively switches off one AP at a time. At each iteration, all candidate configurations obtained by deactivating a single AP are evaluated, and the configuration yielding the highest energy efficiency is selected. This process continues until no further improvement in EE is observed, so the number of evaluated configurations grows on the order of $L^2$ rather than $2^L$. This optimal EE greedy ASO (OG-ASO) provides a tight and computationally feasible approximation of the maximum achievable energy efficiency and is used as an upper reference for performance evaluation. It should be noted that the greedy scheme is evaluated exclusively as an offline reference to approximate the location of energy-efficient operating points. Because every iteration re-evaluates the EE of each remaining candidate configuration, it is not intended for real-time implementation, but rather serves as an interpretability and benchmarking tool.
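The two baselines above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `toy_ee`, its gain matrix, and the per-AP power values are hypothetical stand-ins for the EE metric of Section 3.6, and the function names are chosen here for readability.

```python
import numpy as np

def toy_ee(active, gain, p_on=10.0, p_sleep=1.0):
    """Hypothetical EE proxy (NOT the paper's metric): toy sum rate over total power."""
    if not active.any():
        return 0.0
    signal = gain[active].sum(axis=0)           # per-UE aggregate gain from active APs
    rate = np.log2(1.0 + signal).sum()          # toy sum rate
    power = p_on * active.sum() + p_sleep * (~active).sum()
    return rate / power

def random_selection_aso(num_aps, rng, num_active=None):
    """RS-ASO: fair coin per AP, or a uniformly random subset of a fixed size."""
    if num_active is None:
        return rng.random(num_aps) < 0.5
    active = np.zeros(num_aps, dtype=bool)
    active[rng.choice(num_aps, size=num_active, replace=False)] = True
    return active

def greedy_aso(num_aps, ee_fn):
    """OG-ASO: start all-on and switch off one AP per iteration while EE improves."""
    active = np.ones(num_aps, dtype=bool)
    best_ee = ee_fn(active)
    while active.sum() > 1:
        candidates = []
        for l in np.flatnonzero(active):
            trial = active.copy()
            trial[l] = False                    # candidate: deactivate AP l only
            candidates.append((ee_fn(trial), l))
        cand_ee, cand_l = max(candidates)
        if cand_ee <= best_ee:                  # no further EE improvement: stop
            break
        active[cand_l] = False
        best_ee = cand_ee
    return active, best_ee

rng = np.random.default_rng(0)
gain = rng.exponential(size=(8, 3))             # toy gains: 8 APs, 3 UEs
ee_fn = lambda a: toy_ee(a, gain)
active, ee = greedy_aso(8, ee_fn)
random_pattern = random_selection_aso(8, rng, num_active=3)
```

By construction the greedy result is never worse than the all-on configuration under the same EE function, which is why it is convenient as an offline reference.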
5. Reinforcement Learning Framework
5.1. State, Action, and Reward Definition
- State: The state is designed to capture the structural information that determines the EE contribution of each AP, while avoiding dependence on instantaneous small-scale fading. This choice reflects the slow time scale of ASO decisions and promotes stable learning. The observation available to the agent is collected and preprocessed at the CPU, which has access to large-scale channel statistics and AP-side power parameters, as commonly assumed in cell-free massive MIMO architectures.

  For each AP $l$, the state includes a compact set of features derived from its large-scale fading coefficients toward all UEs, namely the mean, maximum, and minimum values. In addition, the power consumption associated with activating AP $l$ is included to account for its energy cost. The resulting local observation vector is given by

  $$\mathbf{o}_l = \left[\frac{1}{K}\sum_{k=1}^{K}\tilde{\beta}_{lk},\ \max_{k}\tilde{\beta}_{lk},\ \min_{k}\tilde{\beta}_{lk},\ P_l^{\mathrm{act}}\right],$$

  where $\tilde{\beta}_{lk}$ denotes the normalized large-scale fading coefficient between AP $l$ and UE $k$, and $P_l^{\mathrm{act}}$ represents the power consumption of AP $l$ in active mode. The global state is obtained by stacking the observations of all $L$ APs, resulting in a fixed-dimensional representation of length $4L$ that scales linearly with the number of APs.

  Large-scale fading coefficients are normalized using affine transformations with constants derived offline from scenario-level statistics and applied consistently across all training and evaluation episodes. Since large-scale fading evolves over much longer time scales than small-scale fading, these features and their normalization only need to be updated at the large-scale fading coherence time, resulting in negligible signaling and computational overhead. The per-AP aggregation operations (mean, max, min over users) are simple and scale linearly with the number of users.
- Action: The action corresponds to selecting an AP activation pattern and is defined as a binary vector $\mathbf{a} = [a_1, \ldots, a_L] \in \{0,1\}^L$, where $a_l = 1$ indicates that AP $l$ is active and $a_l = 0$ denotes that it operates in sleep mode. This multi-binary action space directly reflects the physical ASO decision and avoids explicit enumeration of the $2^L$ possible activation configurations.
- Reward: The reward is defined directly as the downlink energy efficiency achieved under the selected AP activation pattern. Specifically, for a given action $\mathbf{a}$, the reward is computed as $r = \mathrm{EE}(\mathbf{a})$, where $\mathrm{EE}(\cdot)$ is the energy efficiency metric defined in Section 3.6. Since small-scale fading varies independently across time steps, the reward effectively reflects the energy efficiency averaged over multiple small-scale channel realizations for a fixed large-scale topology.
5.2. PPO-Based Solution
5.3. Training Procedure
6. Methodology
6.1. Simulations Setup
6.2. Comparison Framework and Evaluation Philosophy
7. Results and Discussion
7.1. Baseline Analysis
7.2. Training Behavior
7.3. Performance Evaluation
8. Conclusions and Future Work
8.1. Conclusions
8.2. Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Demir, Ö.T.; Björnson, E.; Sanguinetti, L. Foundations of User-Centric Cell-Free Massive MIMO. Found. Trends Signal Process. 2021, 14, 162–472.
2. Chen, S.; Zhang, J.; Zhang, J.; Björnson, E.; Ai, B. A survey on user-centric cell-free massive MIMO systems. Digit. Commun. Netw. 2022, 8, 695–719.
3. Elhoushy, S.; Ibrahim, M.; Hamouda, W. Cell-Free Massive MIMO: A Survey. IEEE Commun. Surv. Tutor. 2022, 24, 492–523.
4. Feng, M.; Mao, S.; Jiang, T. Base Station ON-OFF Switching in 5G Wireless Networks: Approaches and Challenges. IEEE Wirel. Commun. 2017, 24, 46–54.
5. Femenias, G.; Lassoued, N.; Riera-Palou, F. Access Point Switch ON/OFF Strategies for Green Cell-Free Massive MIMO Networking. IEEE Access 2020, 8, 21788–21803.
6. Vu, T.X.; Chatzinotas, S.; ShahbazPanahi, S.; Ottersten, B. Joint Power Allocation and Access Point Selection for Cell-free Massive MIMO. In Proceedings of the ICC 2020 - 2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
7. Jung, S.; Hong, S.E. Performance analysis of Access Point Switch ON/OFF schemes for Cell-free mmWave massive MIMO UDN systems. In Proceedings of the 2021 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 20–22 October 2021; pp. 644–647.
8. Ito, M.; Kanno, I.; Amano, Y.; Kishi, Y.; Chen, W.Y.; Choi, T.; Molisch, A.F. Joint AP On/Off and User-Centric Clustering for Energy-Efficient Cell-Free Massive MIMO Systems. In Proceedings of the 2022 IEEE 96th Vehicular Technology Conference (VTC2022-Fall), London, UK, 26–29 September 2022; pp. 1–5.
9. Hong, S.E.; Na, J.H. Joint Access Point Beamforming and Switch On/Off Scheme for Energy Efficient Cell-Free mmWave massive MIMO. In Proceedings of the 2022 13th International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 19–21 October 2022; pp. 730–732.
10. Zhou, O.; Wang, J.; Liu, F.; Wang, J. Energy-Efficient Clustered Cell-Free Networking With Access Point Selection. IEEE Open J. Commun. Soc. 2024, 5, 1551–1565.
11. Munawar, M.; Guenach, M.; Moerman, I. Performance and Architectural Tradeoffs in Scalable Cell-Free Massive MIMO. IEEE Access 2024, 12, 150189–150203.
12. Mendoza, C.F.; Schwarz, S.; Rupp, M. Deep Reinforcement Learning for Dynamic Access Point Activation in Cell-Free MIMO Networks. In Proceedings of the WSA 2021; 25th International ITG Workshop on Smart Antennas, Sophia Antipolis, France, 10–12 November 2021; pp. 1–6.
13. Suh, H.; Oh, J.; Kang, S.; Hwang, T. DRL-Based AP Switch On/Off Scheme for Cell-Free Massive MIMO MEC Networks. In Proceedings of the 2023 International Conference on Information and Communication Technology Convergence (ICTC), Jeju Island, Republic of Korea, 11–13 October 2023; pp. 235–237.
14. Sun, L.; Hou, J.; Chapman, R. Multi-Agent Deep Reinforcement Learning for Access Point Activation Strategy in Cell-Free Massive MIMO Networks. In Proceedings of the IEEE INFOCOM 2023 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Hoboken, NJ, USA, 20 May 2023; pp. 1–6.
15. Li, W.; Jiang, Y.; Huang, Y.; Zheng, F.C. Energy-Efficient Access Point Sleep Control in User-Centric Cell-Free Massive MIMO Systems. In Proceedings of the 2024 16th International Conference on Wireless Communications and Signal Processing (WCSP), Hefei, China, 24–26 October 2024; pp. 585–590.
16. Xu, X.; Jiang, Y.; Huang, Y.; Zheng, F.C. A Nested DRL-Based Method for Power Allocation and AP Sleep Control in Cell-Free Massive MIMO Systems. In Proceedings of the 2024 10th International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2024; pp. 1663–1668.
17. Xu, J.; Wang, C.; Deng, D.; Li, Y.; Pang, M.; Zhang, Z.; Wang, D. Joint AP Scheduling and Power Allocation Based on Synergistic DRL for Cell-Free Massive MIMO. IEEE Commun. Lett. 2025, 29, 1082–1086.
18. Tan, F.; Deng, Q.; Liu, Q. Energy-efficient access point clustering and power allocation in cell-free massive MIMO networks: A hierarchical deep reinforcement learning approach. EURASIP J. Adv. Signal Process. 2024, 2024, 18.
19. Wu, Z.; Jiang, Y.; Huang, Y.; Zheng, F.C.; Zhu, P. Energy-Efficient Joint AP Selection and Power Control in Cell-Free Massive MIMO Systems: A Hybrid Action Space-DRL Approach. IEEE Commun. Lett. 2024, 28, 2086–2090.
20. Masoudi, M.; Soroush, E.; Zander, J.; Cavdar, C. Digital Twin Assisted Risk-Aware Sleep Mode Management Using Deep Q-Networks. IEEE Trans. Veh. Technol. 2023, 72, 1224–1239.
21. Suh, H.; Kang, S.; Hwang, T. Intelligent AP Control and Computation Offloading for Cell-Free Massive MIMO MEC Networks. In Proceedings of the 2024 International Conference on Electronics, Information, and Communication (ICEIC), Taipei, Taiwan, 28–31 January 2024; pp. 1–3.
22. Wang, G.; Cheng, P.; Chen, Z.; Vucetic, B.; Li, Y. Green Cell-Free Massive MIMO: An Optimization Embedded Deep Reinforcement Learning Approach. IEEE Trans. Signal Process. 2024, 72, 2751–2766.
23. Zaher, M.; Demir, Ö.T.; Björnson, E.; Petrova, M. Learning-Based Downlink Power Allocation in Cell-Free Massive MIMO Systems. IEEE Trans. Wirel. Commun. 2023, 22, 174–188.
24. Salaün, L.; Yang, H. Deep Learning Based Power Control for Cell-Free Massive MIMO with MRT. In Proceedings of the 2021 IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–7.
25. Salaün, L.; Yang, H.; Mishra, S.; Chen, C.S. A GNN Approach for Cell-Free Massive MIMO. In Proceedings of the 2022 IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 3053–3058.
26. Series, M. Guidelines for Evaluation of Radio Interface Technologies for IMT-Advanced. Report ITU M.2135-1. Technical Report, International Telecommunication Union, 2009. Available online: https://www.itu.int/dms_pub/itu-r/opb/rep/r-rep-m.2135-1-2009-pdf-e.pdf (accessed on 14 February 2026).
27. Chakraborty, S.; Manoj, B.R. Power Allocation in a Cell-Free MIMO System using Reinforcement Learning-Based Approach. In Proceedings of the 2023 National Conference on Communications (NCC), Guwahati, India, 23–26 February 2023; pp. 1–6.
28. Zhao, Y.; Niemegeers, I.G.; De Groot, S.H. Deep Q-network based dynamic power allocation for cell-free massive MIMO. In Proceedings of the 2021 IEEE 26th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Porto, Portugal, 25–27 October 2021; pp. 1–7.
29. Zhao, Y.; Niemegeers, I.G.; De Groot, S.M.H. Dynamic Power Allocation for Cell-Free Massive MIMO: Deep Reinforcement Learning Methods. IEEE Access 2021, 9, 102953–102965.
30. Bashar, M.; Akbari, A.; Cumanan, K.; Ngo, H.Q.; Burr, A.G.; Xiao, P.; Debbah, M.; Kittler, J. Exploiting Deep Learning in Limited-Fronthaul Cell-Free Massive MIMO Uplink. IEEE J. Sel. Areas Commun. 2020, 38, 1678–1697.
31. Zhang, Y.; Zhang, J.; Buzzi, S.; Xiao, H.; Ai, B. Unsupervised Deep Learning for Power Control of Cell-Free Massive MIMO Systems. IEEE Trans. Veh. Technol. 2023, 72, 9585–9590.






| Ref. | RL Alg. | Evaluation Scale | Power Model | Primary Focus | Key Limitations |
|---|---|---|---|---|---|
| [12] | DQN | Small-scale (8 APs) | None | QoS | Very small-scale; no EE; weak baselines |
| [20] | DQN | Moderate-scale | Basic | Delay, risk-aware | EE not optimized; DT-oriented framework |
| [13] | DDPG | Moderate-scale | Partial | Power minimization | EE not isolated; ASO not standalone |
| [14] | Multi-Agent DQN | Moderate-scale (50 APs) | Partial | Power minimization | No EE; limited scale; random baseline |
| [15] | PPO | Large-scale (~64 APs) | Basic | EE-oriented | Simplified power model; no EE-optimal AP analysis |
| [18] | Hierarchical DDPG | Moderate-scale | Partial | EE + clustering | Joint multi-level control; ASO not standalone |
| [21] | DDPG | Moderate-scale | Partial | Delay | Delay-driven; EE secondary |
| [22] | SAC + Graph Transformer | Moderate-scale (~40 APs) | Partial | Power minimization | EE not central; high complexity |
| [19] | SAC (hybrid) | Moderate-scale | None | EE + SE constraints | No power model; weak baselines |
| [16] | Nested Actor-Critic | Small-scale (16 APs) | Basic | EE-oriented | Very small-scale; basic power model |
| [17] | A2C | Small-scale (20 APs) | Basic | EE + SE constraints | Reduced scale; limited generality |

| Parameter | Value |
|---|---|
| AP power parameters | |
| Fixed power, active | 8.0 W |
| Fixed power, sleep | 0.8 W |
| Power per RF chain, active | 0.2 W |
| Power per RF chain, sleep | 0.02 W |
| PA efficiency | 0.39 |
| UE power parameters | |
| Fixed reception power | 0.75 W |
| Traffic-dependent processing | 0.25 W/Gbps |
| Fronthaul power parameters | |
| Fixed power, active | 5.0 W |
| Fixed power, sleep | 0.5 W |
| Traffic-dependent power | 0.25 W/Gbps |
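To give a feel for what these values imply, the following sketch compares the load-independent power of one AP in active and sleep mode. It assumes, purely for illustration, that the fixed AP power, the per-RF-chain power, and the fixed fronthaul power add up, and that each of the 4 antennas per AP (from the simulation setup) has its own RF chain; the actual composition is the one defined in Section 3.5. Transmit power and traffic-dependent terms are left out.

```python
# Values copied from the power parameter table (W).
AP_FIXED = {"active": 8.0, "sleep": 0.8}
RF_CHAIN = {"active": 0.2, "sleep": 0.02}
FRONTHAUL_FIXED = {"active": 5.0, "sleep": 0.5}
N_ANTENNAS = 4  # antennas per AP; one RF chain per antenna is an assumption

def load_independent_power(mode):
    """Additive composition assumed here for illustration only."""
    return AP_FIXED[mode] + N_ANTENNAS * RF_CHAIN[mode] + FRONTHAUL_FIXED[mode]

p_on = load_independent_power("active")    # about 13.8 W
p_sleep = load_independent_power("sleep")  # about 1.38 W
saving_per_ap = p_on - p_sleep             # about 12.4 W per AP put to sleep
```

Under these assumptions, sleep mode retains roughly one tenth of the active-mode fixed power, so each deactivated AP saves about 12.4 W before accounting for any loss in throughput.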

| Parameter | Value |
|---|---|
| Learning rate | |
| Discount factor | |
| Generalized advantage estimation (GAE) parameter | |
| Clipping range | |
| Entropy coefficient | |
| Value function coefficient | |
| Max. gradient norm | |
| Rollout steps | 2048 |
| Batch size | 256 |
| Epochs per update | 10 |
| Policy/value network | MLP |
| Hidden layers | (ReLU) |
| Training device | CPU |
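The three numeric settings in this table determine how much optimization work each PPO update performs. The short calculation below assumes a single environment instance, which the table does not state, and the usual PPO scheme in which the rollout buffer is split into minibatches and revisited once per epoch.

```python
rollout_steps = 2048   # from the table
batch_size = 256       # from the table
epochs = 10            # from the table
num_envs = 1           # assumption: not given in the table

samples_per_update = rollout_steps * num_envs
minibatches_per_epoch = samples_per_update // batch_size
gradient_steps_per_update = minibatches_per_epoch * epochs

print(samples_per_update, minibatches_per_epoch, gradient_steps_per_update)
```

With these settings each policy update reuses 2048 collected transitions across 8 minibatches per epoch, for 80 gradient steps in total.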

| Scenario | APs (L) | UEs (K) | Area Side Length (m) |
|---|---|---|---|
| Scenario 1 | 32 | 6 | 500 |
| Scenario 2 | 64 | 18 | 750 |
| Scenario 3 | 96 | 30 | 1000 |
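The densities implied by the three scenarios follow directly from the table. The sketch below assumes a square coverage area with the listed side length; the variable names are chosen here for illustration.

```python
# (APs L, UEs K, area side length in m), copied from the scenario table.
scenarios = {
    "Scenario 1": (32, 6, 500.0),
    "Scenario 2": (64, 18, 750.0),
    "Scenario 3": (96, 30, 1000.0),
}

densities = {}
for name, (num_aps, num_ues, side_m) in scenarios.items():
    area_km2 = (side_m / 1000.0) ** 2          # square area assumed
    densities[name] = {
        "ap_per_km2": num_aps / area_km2,
        "ue_per_km2": num_ues / area_km2,
        "aps_per_ue": num_aps / num_ues,
    }

for name, d in densities.items():
    print(f"{name}: {d['ap_per_km2']:.1f} APs/km2, "
          f"{d['ue_per_km2']:.1f} UEs/km2, {d['aps_per_ue']:.2f} APs per UE")
```

This gives 128.0, 113.8, and 96.0 APs/km² for Scenarios 1 to 3, with 24.0, 32.0, and 30.0 UEs/km² and 5.33, 3.56, and 3.20 APs per UE, so the AP density stays within the same order of magnitude while the AP-to-UE ratio decreases.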

| Parameter | Value |
|---|---|
| Deployment and general parameters | |
| Antennas per AP (N) | 4 |
| Topology | Wrap-around |
| Realizations | 100 |
| Radio parameters | |
| Coherence block | 200 symbols |
| Pilot length | 20 symbols |
| Max. transmit power | 100 mW |
| Total DL power budget | 200 mW |
| Bandwidth (B) | 20 MHz |
| Noise figure | 7 dB |
| Correlation and propagation model | |
| Decorrelation factor | 9 m |
| Antenna spacing | |
| ASD (azimuth) | |
| ASD (elevation) | |
| Shadow fading std. dev. | 8 dB |
| Carrier frequency | 2 GHz |
| AP height | 10 m |
| UE height | 1.65 m |
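Two derived quantities help interpret the radio parameters: the pilot overhead and the receiver noise power. The sketch below assumes the standard thermal noise density of about -174 dBm/Hz at room temperature, which is not stated in the table; the variable names are illustrative.

```python
import math

coherence_block = 200        # symbols, from the table
pilot_length = 20            # symbols, from the table
bandwidth_hz = 20e6          # from the table
noise_figure_db = 7.0        # from the table
thermal_density_dbm_hz = -174.0   # assumption: thermal noise floor near 290 K

# Fraction of each coherence block spent on pilots.
pilot_overhead = pilot_length / coherence_block

# Noise power over the full bandwidth, including the receiver noise figure.
noise_power_dbm = (thermal_density_dbm_hz
                   + 10.0 * math.log10(bandwidth_hz)
                   + noise_figure_db)
noise_power_w = 10.0 ** ((noise_power_dbm - 30.0) / 10.0)

print(f"pilot overhead: {pilot_overhead:.0%}, noise power: {noise_power_dbm:.2f} dBm")
```

Under these assumptions, pilots occupy 10% of every coherence block and the noise power is close to -94 dBm.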
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
García-Barrios, G.; Alonso, A.; Fuentes, M. Energy-Efficient Access Point Switch On/Off in Cell-Free Massive MIMO Using Proximal Policy Optimization. Electronics 2026, 15, 1219. https://doi.org/10.3390/electronics15061219

