ML-Based Autoscaling for Elastic Cloud Applications: Taxonomy, Frameworks, and Evaluation
Abstract
1. Introduction
1.1. Motivation and Significance
1.2. Objectives and Contributions
1. Taxonomy of ML-based autoscaling. We propose a taxonomy for machine learning-based autoscalers across five dimensions: goal, decision logic, scaling mode (horizontal, vertical, or hybrid), control scope (e.g., VMs, containers, services), and deployment setting (cloud, edge, hybrid), and use it to classify existing approaches.
2. Systematic classification of ML techniques. We systematically examine supervised, unsupervised, and reinforcement learning approaches to autoscaling, characterising each by workload types, input signals, control decisions, and optimisation objectives, and provide comparative summaries of representative methods.
3. Analysis of frameworks and platform integrations. We analyze end-to-end autoscaling frameworks and their implementations of the control loop, and study how ML-based autoscalers are integrated with practical platforms, such as Kubernetes (HPA, VPA, KEDA, and custom controllers), and autoscaling services from major cloud providers.
4. Synthesis of evaluation practice and cross-cutting challenges. We consolidate evaluation practices, including metrics, workloads, and benchmarks, and identify recurring challenges such as hybrid scaling, multi-service coordination, telemetry lag, concept drift, cost–SLO–energy trade-offs, and limited reproducibility, leading to design guidelines and concrete directions for future research.
1.3. Problem Statement
- R(t): vector of allocated resources at time t (for example CPU, memory, and instance count);
- W(t): workload or demand at time t;
- Ŵ(t): predicted workload or demand at time t;
- C(t): cost incurred at time t;
- Q(t): QoS metric at time t (for example response time or SLA violation rate).
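These quantities suggest a standard constrained-optimisation view of autoscaling. The following formulation is an illustrative sketch rather than the survey's exact notation: the penalty form and the weight λ are assumptions, with symbols matching the definitions above (allocated resources, cost, QoS target).

```latex
\min_{\{R(t)\}_{t=1}^{T}} \;
\sum_{t=1}^{T} \Bigl[\, C(t) \;+\; \lambda \,\max\bigl(0,\; Q(t) - Q_{\mathrm{SLO}}\bigr) \Bigr]
```

where Q_SLO is the QoS target (e.g., a response-time bound) and λ trades SLA penalties against resource cost. Reactive autoscalers choose R(t) from the observed workload W(t), while proactive ones substitute the forecast Ŵ(t).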
1.4. Review Method
1.4.1. Search Engines
1.4.2. Search Limits
1.4.3. Inclusion Criteria
- They apply machine learning techniques to autoscaling cloud or elastic applications.
- They explicitly define autoscaling goals or policies (for example, meeting QoS targets or minimising cost).
1.4.4. Exclusion Criteria
- They do not address autoscaling using machine learning techniques.
- They do not concern cloud or elastic applications.
- They are not peer reviewed (for example, theses, white papers, or technical reports).
- They are not written in English.
1.5. Survey Organization
2. Taxonomy and Background
2.1. Goal (What Is Optimised?)
2.2. Decision (How Does It Decide?)
2.3. Scaling (How Does It Scale?)
2.4. Control Scope (What Is Controlled?)
2.5. Deployment (Where Does It Run?)
3. ML-Based Autoscaling Approaches
3.1. ML Techniques for Autoscaling
3.1.1. Supervised Learning
3.1.2. Unsupervised Learning
3.1.3. Reinforcement Learning (RL)
3.2. Evaluation Metrics and Benchmarks
4. Frameworks and Systems
4.1. ML-Driven Autoscaling Pipelines
- Monitor. ML is used to filter and enrich raw metrics, for example anomaly or outlier detection on CPU, latency, or error rates before they are passed to the analysis step.
- Analyze. This is where ML most often resides: time series models and neural networks forecast workload or QoS, and anomaly or root-cause analysis methods identify bottlenecks and impending SLA violations.
- Plan. In ML-driven autoscalers the planning logic itself can be learned. Reinforcement learning, model predictive control, or optimisation heuristics choose the scaling action (for example, the number of instances or the amount of extra CPU) based on predictions and the current state.
- Execute. Execution typically performs the concrete actions (calling cloud or Kubernetes APIs, updating replica counts or resource limits). ML is rarely used here, apart from occasional coordination of multiple actions.
- Knowledge. ML models, learned policies, trace databases, and SLA goals are stored and updated in the knowledge base. Supervised and RL methods use this data for training and experience replay, and the stored objectives encode the trade-off between QoS, cost, and energy.
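The five MAPE-K steps above can be sketched as a toy control loop. Everything here is illustrative, not any surveyed system's method: the per-replica capacity model, the spike filter in Monitor, and the naive trend forecast in Analyze are all assumptions standing in for real ML components.

```python
import math
from collections import deque

class MapeKAutoscaler:
    """Toy MAPE-K loop: filter in Monitor, forecast in Analyze, decide in Plan."""

    def __init__(self, capacity_per_replica, min_r=1, max_r=20, history=12):
        self.capacity = capacity_per_replica    # req/s one replica sustains (assumed)
        self.min_r, self.max_r = min_r, max_r
        self.knowledge = deque(maxlen=history)  # K: shared store of recent samples

    def monitor(self, raw_rps):
        # Monitor: clamp an implausible spike against recent history
        if self.knowledge and raw_rps > 10 * max(self.knowledge):
            raw_rps = max(self.knowledge)
        self.knowledge.append(raw_rps)

    def analyze(self):
        # Analyze: naive forecast = last sample plus average recent trend
        h = list(self.knowledge)
        if not h:
            return 0
        trend = (h[-1] - h[0]) / (len(h) - 1) if len(h) > 1 else 0
        return max(h[-1] + trend, 0)

    def plan(self):
        # Plan: provision for the forecast, clamped to the replica bounds
        desired = math.ceil(self.analyze() / self.capacity)
        return min(max(desired, self.min_r), self.max_r)

    def execute(self):
        # Execute: a real controller would call the orchestrator API here
        return self.plan()
```

In a real pipeline the Analyze step would be replaced by a trained forecaster (e.g., an LSTM) and Plan by a learned policy, but the data flow between the steps stays the same.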
4.2. Integration with Orchestration Tools
4.2.1. Kubernetes Integration
- HPA & VPA: Autoscaling in Kubernetes is typically done via the Horizontal Pod Autoscaler (HPA) for horizontal scaling and the Vertical Pod Autoscaler (VPA) for vertical adjustments. The HPA runs as a controller in the cluster, periodically checking metrics (through the Metrics API). Research prototypes that use custom ML logic often implement their own controller to replace or augment HPA. For example, Wu et al. (2019) [81] developed a custom autoscaler that uses deep reinforcement learning to adjust replica counts; they integrated it by watching the same metrics and then setting the Deployment’s replica field (essentially doing HPA’s job with their logic).
- KEDA: Another method is to use KEDA, which supports external metrics and event-driven triggers. KEDA is flexible—one can plug in a predictive model as an external metric source (e.g., a custom metrics adapter that provides “predicted load 5 min ahead”), and then let HPA scale on that metric as if it were any other input. This approach was used by Saxena and Singh (2021): they proposed a proactive framework using an online multi-resource neural network predictor for demand forecasting, combined with clustering for VM autoscaling decisions in cloud data centers [21].
- Kubernetes cannot vertically resize a running Pod without restarting it (the VPA typically evicts and recreates Pods with new resources). Most academic works focusing on horizontal scaling therefore avoid modifying vertical resource limits during experiments, to keep the application running continuously, and very few works target the VPA directly. One example is Pham and Kim (2024), who propose an Elastic Federated Learning framework [93] that integrates the Kubernetes Vertical Pod Autoscaler (VPA) in a KubeEdge-based edge environment to dynamically adjust Pod resources (CPU, RAM) based on historical and real-time usage data. This enables efficient handling of heterogeneous FL workloads while accelerating model convergence and preserving training progress.
- Another aspect is cluster-level scaling. If an autoscaler rapidly increases Pods beyond current cluster capacity, it should also trigger cluster autoscaling (adding worker VMs via the Kubernetes Cluster Autoscaler) or risk unschedulable Pods. Some works integrate cluster scaling explicitly—e.g., Qiu et al. (2020) considered both container scaling and node provisioning in their FIRM framework, using a hierarchical RL (service-level agent for containers, top-level agent for adding nodes) [78]. As illustrated in Figure 6, the choice of scaling target (pod/node) dramatically affects inter-pod latency.
- Actuation delays and platform constraints are also important. Orchestrators often have their own logic (cooldowns, max scaling speed). A custom autoscaler usually must be tuned with these in mind or disable them. For research, authors often turn off such features to evaluate their algorithm in isolation. Nguyen et al. (2020) [94] provide a comprehensive analysis of Kubernetes HPA operational behaviors, including the effects of metric scraping periods on scaling responsiveness and the default 5-minute downscale delay designed to prevent thrashing from continuous scaling actions.
- Integration also involves getting the right data. Using a service mesh (Istio, Linkerd) or distributed tracing can provide detailed metrics and insight into inter-service dependencies. For example, Istio telemetry can report per-service request rates and latencies; an autoscaler might use that to perform critical path analysis (like FIRM’s SVM to find the current latency bottleneck service [78]). Some advanced frameworks use telemetry from systems like Jaeger or Zipkin—e.g., to feed a graph neural network that predicts how a surge in Service A will affect Service B and C down the line [88]. In practice, one might keep a cooldown to prevent thrashing (e.g., “don’t scale again for 2 min after a scale action”).
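The cooldown and actuation constraints discussed above can be made concrete with a minimal controller skeleton. The proportional rule and parameter names are illustrative assumptions; a real custom controller would apply the returned replica count through the orchestrator API (for example, the Kubernetes Python client's patch_namespaced_deployment_scale), which is deliberately left out here.

```python
import math
import time

class CooldownController:
    """Skeleton of a custom replica controller with a downscale cooldown,
    mirroring the thrashing-prevention behaviours described above."""

    def __init__(self, target_util=0.6, cooldown_s=120, clock=time.monotonic):
        self.target_util = target_util       # desired per-replica utilisation
        self.cooldown_s = cooldown_s         # suppress downscale after any action
        self.clock = clock                   # injectable for testing
        self.last_action_at = float("-inf")

    def decide(self, current_replicas, observed_util):
        # Classic proportional rule: replicas * observed / target, rounded up
        desired = max(1, math.ceil(current_replicas * observed_util / self.target_util))
        now = self.clock()
        if desired < current_replicas and now - self.last_action_at < self.cooldown_s:
            return current_replicas          # hold: downscale blocked by cooldown
        if desired != current_replicas:
            self.last_action_at = now        # record the scaling action
        return desired
```

Scale-ups pass through immediately, while scale-downs within the cooldown window are held at the current count, which is roughly the asymmetry the default HPA downscale delay implements.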
4.2.2. Cloud Provider Auto Scaling
5. Discussion and Challenges
5.1. When Does a Learned Scaler Beat a Tuned Threshold?
- Workloads exhibit complex or non-stationary patterns (for example, bursty traffic, diurnal cycles, or workload mixes that change over time). For instance, Calheiros et al. [1] demonstrated that an ARIMA-based predictor achieved 91% accuracy on seasonal workloads, enabling proactive resource provisioning that would be difficult to replicate with static thresholds. Similarly, Shahin [24] proposed an LSTM-based autoscaler and empirically showed that it outperformed traditional threshold-based methods under sudden workload changes. In production environments, Qiu et al. [65] reported that their reinforcement learning–based AWARE framework adapted to new workloads 5.5× faster than transfer-learning baselines and reduced SLO violations by 16.9×, while improving CPU and memory utilization by 47.5% and 39.2%, respectively.
- Objectives are multi-dimensional, combining QoS, cost, and possibly energy or carbon constraints. Chen et al. [6] observed that rule-based policies struggle with such multi-objective trade-offs, often requiring manual tuning and lacking adaptability. Horn et al. [27] addressed this by employing ML-based performance modeling to jointly optimize response time SLOs and resource efficiency in Kubernetes environments. Saxena et al. [23] further demonstrated that a neural network–driven autoscaler could achieve energy-efficient VM allocation while maintaining SLA compliance, highlighting the flexibility of learned policies in balancing competing objectives.
- There are strong non-linear interactions between resources (CPU, memory, I/O) and end-to-end QoS that are difficult to encode as fixed rules. Wajahat et al. [19] developed MLscale, a neural network–based black-box performance model that accurately captured the non-linear relationship between resource metrics and response time. This enabled proactive autoscaling that minimized SLA violations more effectively than static heuristics. Similarly, Rossi et al. [76] showed that a reinforcement learning–based hybrid autoscaler could dynamically choose between horizontal and vertical scaling actions to address shifting bottlenecks, outperforming fixed strategies in both latency and resource efficiency.
- Training data are sparse or unrepresentative.
- Telemetry is noisy or delayed (see Section 5.2).
- The scaling granularity is coarse and actuation delays dominate, limiting the benefit of fine-grained policies.
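The reactive-versus-proactive distinction underlying these conditions can be illustrated with a toy simulation. The fixed per-replica capacity, the threshold values, and the naive one-step-ahead forecast are all assumptions for illustration; the point is only that on a steadily ramping load the threshold rule reacts one step late while the predictor provisions ahead.

```python
import math

def reactive(replicas, load, cap=100, up=0.8, down=0.3):
    """Threshold rule: one step up/down based on current utilisation only."""
    util = load / (replicas * cap)
    if util > up:
        return replicas + 1
    if util < down:
        return max(1, replicas - 1)
    return replicas

def proactive(history, cap=100):
    """Predictive rule: provision for a naive one-step-ahead forecast."""
    trend = history[-1] - history[-2] if len(history) > 1 else 0
    forecast = max(history[-1] + trend, 0)
    return max(1, math.ceil(forecast / cap))

def simulate(use_proactive, trace, cap=100):
    """Count steps where load exceeds provisioned capacity."""
    replicas, violations = 1, 0
    for i, load in enumerate(trace):
        if load > replicas * cap:            # capacity shortfall this step
            violations += 1
        if use_proactive:
            replicas = proactive(trace[: i + 1], cap)
        else:
            replicas = reactive(replicas, load, cap)
    return violations
```

On a bursty or flat trace the gap narrows, and with coarse actuation delays (the last bullet above) the predictor's advantage can disappear entirely.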
5.2. How Do Actuation Delays and Telemetry Lag Change the Winner?
5.3. Is Diagonal (Hybrid) Scaling Worth the Complexity?
5.4. Centralised vs. Decentralised Control: Who Scales Better as Service Count Grows?
5.5. What Is the Cost of Safety and Multi-Objective Guarantees?
5.6. What Are the Challenges in Achieving Energy-Efficient Autoscaling?
6. Future Directions
6.1. Unified ML-Orchestration Frameworks
6.2. Edge Intelligence and Decentralised Scaling
6.3. Multi-Agent and Federated Learning for Autoscaling
6.4. Standardised Benchmarks and Evaluation Methodologies
- Representative application workloads (multi-tier and microservice);
- Trace-based and synthetic load patterns (including diurnal, bursty, and non-stationary scenarios);
- Standardised metrics (for example, tail-latency SLO shortfall, cost per request, oscillation indices, and where relevant energy or carbon metrics);
- Containerised experiment packages and open artefact repositories for reproducibility [86].
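Some of the standardised metrics listed above can be computed directly from experiment traces. The definitions below (a p99 shortfall against an SLO, and a reversal-based oscillation index over the replica trace) are plausible instantiations for a shared benchmark, not community-fixed formulas.

```python
def p99(latencies):
    """Empirical 99th-percentile latency (nearest-rank)."""
    s = sorted(latencies)
    return s[min(len(s) - 1, int(0.99 * len(s)))]

def slo_shortfall(latencies, slo_ms):
    """Tail-latency SLO shortfall: how far p99 exceeds the target (0 if met)."""
    return max(0.0, p99(latencies) - slo_ms)

def oscillation_index(replica_trace):
    """Fraction of scaling actions that reverse the previous action's direction:
    0.0 = monotone scaling, 1.0 = pure up/down thrashing."""
    deltas = [b - a for a, b in zip(replica_trace, replica_trace[1:]) if b != a]
    if len(deltas) < 2:
        return 0.0
    reversals = sum(1 for a, b in zip(deltas, deltas[1:]) if a * b < 0)
    return reversals / (len(deltas) - 1)
```

Reporting such trace-level metrics alongside raw latency and cost numbers would make results from different autoscalers directly comparable.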
6.5. Sustainability and Green Autoscaling
6.6. Threats to Validity and Limitations
- Coverage Bias: The search was restricted to selected scholarly databases (IEEE Xplore, ACM DL, SpringerLink, ScienceDirect, Scopus) and may exclude relevant works published in other venues or grey literature.
- Time Window: The review considered studies published between 2015 and 2025, which omits earlier foundational work and very recent developments beyond the cutoff date.
- Terminology Bias: The search strategy relied on keywords such as autoscaling, elastic scaling, and related terms. Studies addressing similar concepts under different terminology (e.g., adaptive resource management, dynamic provisioning) may have been missed.
- Absence of Formal Quality Assessment: No formal risk-of-bias or methodological quality scoring was applied to the included studies. Consequently, the synthesis does not differentiate between high- and low-quality evidence.
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| API | Application Programming Interface |
| ARIMA | AutoRegressive Integrated Moving Average |
| AWS | Amazon Web Services |
| CPU | Central Processing Unit |
| DRL | Deep Reinforcement Learning |
| DVFS | Dynamic Voltage and Frequency Scaling |
| FaaS | Function-as-a-Service |
| GCP | Google Cloud Platform |
| HPA | Horizontal Pod Autoscaler |
| IaaS | Infrastructure-as-a-Service |
| IoT | Internet of Things |
| K8s | Kubernetes |
| KEDA | Kubernetes Event-Driven Autoscaler |
| MAPE-K | Monitor, Analyze, Plan, Execute—Knowledge (Autonomic Loop) |
| MARL | Multi-Agent Reinforcement Learning |
| ML | Machine Learning |
| PaaS | Platform-as-a-Service |
| PCA | Principal Component Analysis |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
| QoS | Quality of Service |
| RAM | Random Access Memory |
| RL | Reinforcement Learning |
| SARSA | State–Action–Reward–State–Action |
| SLA | Service Level Agreement |
| SLO | Service Level Objective |
| VM | Virtual Machine |
| VPA | Vertical Pod Autoscaler |
References
- Calheiros, R.N.; Masoumi, E.; Ranjan, R.; Buyya, R. Workload prediction using ARIMA model and its impact on cloud applications’ QoS. IEEE Trans. Cloud Comput. 2015, 3, 449–458. [Google Scholar] [CrossRef]
- Alharthi, S.; Alshamsi, A.; Alseiari, A.; Alwarafy, A. Auto-scaling techniques in cloud computing: Issues and research directions. Sensors 2024, 24, 5551. [Google Scholar] [CrossRef]
- Dragoni, N.; Giallorenzo, S.; Lluch Lafuente, A.; Mazzara, M.; Montesi, F.; Mustafin, R.; Safina, L. Microservices: Yesterday, Today, and Tomorrow. In Present and Ulterior Software Engineering; Mazzara, M., Meyer, B., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 195–216. [Google Scholar] [CrossRef]
- Baldini, I.; Castro, P.; Chang, K.; Cheng, P.; Fink, S.; Ishakian, V.; Mitchell, N.; Muthusamy, V.; Rabbah, R.; Slominski, A.; et al. Serverless computing: Current trends and open problems. In Research Advances in Cloud Computing; Springer: Singapore, 2017; pp. 1–20. [Google Scholar] [CrossRef]
- Li, Y.; Lin, Y.; Wang, Y.; Ye, K.; Xu, C. Serverless computing: State-of-the-art, challenges and opportunities. IEEE Trans. Serv. Comput. 2022, 16, 1522–1539. [Google Scholar] [CrossRef]
- Chen, T.; Bahsoon, R.; Yao, X. A survey and taxonomy of self-aware and self-adaptive cloud autoscaling systems. ACM Comput. Surv. (CSUR) 2018, 51, 61. [Google Scholar] [CrossRef]
- Garí, Y.; Monge, D.A.; Pacini, E.; Mateos, C.; Garino, C.G. Reinforcement learning-based application autoscaling in the cloud: A survey. Eng. Appl. Artif. Intell. 2021, 102, 104288. [Google Scholar] [CrossRef]
- Al Qassem, L.M.; Stouraitis, T.; Damiani, E.; Elfadel, I.M. Containerized Microservices: A Survey of Resource Management Frameworks. IEEE Trans. Netw. Serv. Manag. 2024, 21, 3775–3796. [Google Scholar] [CrossRef]
- Dogani, J.; Namvar, R.; Khunjush, F. Auto-scaling techniques in container-based cloud and edge/fog computing: Taxonomy and survey. Comput. Commun. 2023, 200, 120–150. [Google Scholar] [CrossRef]
- Tran, M.N.; Vu, D.D.; Kim, Y. A Survey of Autoscaling in Kubernetes. In Proceedings of the 2022 Thirteenth International Conference on Ubiquitous and Future Networks (ICUFN), Barcelona, Spain, 5–8 July 2022; pp. 263–265. [Google Scholar] [CrossRef]
- Zhong, Z.; Xu, M.; Rodriguez, M.A.; Xu, C.; Buyya, R. Machine Learning-based Orchestration of Containers: A Taxonomy and Future Directions. ACM Comput. Surv. 2022, 54, 217. [Google Scholar] [CrossRef]
- Verma, S.; Bala, A. Auto-scaling techniques for IoT-based cloud applications: A review. Clust. Comput. 2021, 24, 2425–2459. [Google Scholar] [CrossRef]
- Qu, C.; Calheiros, R.N.; Buyya, R. Auto-scaling Web Applications in Clouds: A Taxonomy and Survey. ACM Comput. Surv. 2018, 51, 73. [Google Scholar] [CrossRef]
- Wohlin, C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the EASE ’14: 18th International Conference on Evaluation and Assessment in Software Engineering, New York, NY, USA, 13–14 May 2014. [Google Scholar] [CrossRef]
- Hu, Y.; Deng, B.; Peng, F.; Wang, D. Workload Prediction for Cloud Computing Elasticity Mechanism. In Proceedings of the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 5–7 July 2016; IEEE: New York, NY, USA, 2016; pp. 244–249. [Google Scholar] [CrossRef]
- Liu, C.; Liu, C.; Shang, Y.; Chen, S.; Cheng, B.; Chen, J. An adaptive prediction approach based on workload pattern discrimination in the cloud. J. Netw. Comput. Appl. 2017, 80, 35–44. [Google Scholar] [CrossRef]
- Chen, Z.; Zhu, Y.; Di, Y.; Feng, S. Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network. Comput. Intell. Neurosci. 2015, 2015, 919805. [Google Scholar] [CrossRef]
- Zhang, Q.; Yang, L.T.; Yan, Z.; Chen, Z.; Li, P. An Efficient Deep Learning Model to Predict Cloud Workload for Industry Informatics. IEEE Trans. Ind. Inform. 2018, 14, 3170–3178. [Google Scholar] [CrossRef]
- Wajahat, M.; Karve, A.; Kochut, A.; Gandhi, A. MLscale: A machine learning based application-agnostic autoscaler. Sustain. Comput. Inform. Syst. 2019, 22, 287–299. [Google Scholar] [CrossRef]
- Kim, I.K.; Wang, W.; Qi, Y.; Humphrey, M. Forecasting Cloud Application Workloads with CloudInsight for Predictive Resource Management. IEEE Trans. Cloud Comput. 2020, 10, 1848–1863. [Google Scholar] [CrossRef]
- Saxena, D.; Singh, A.K. A Proactive Autoscaling and Energy-Efficient VM Allocation Framework Using Online Multi-Resource Neural Network for Cloud Data Center. Neurocomputing 2021, 426, 248–264. [Google Scholar] [CrossRef]
- Xu, M.; Song, C.; Wu, H.; Gill, S.S.; Ye, K.; Xu, C. esDNN: Deep Neural Network Based Multivariate Workload Prediction in Cloud Computing Environments. ACM Trans. Internet Technol. 2022, 22, 75. [Google Scholar] [CrossRef]
- Saxena, D.; Kumar, J.; Singh, A.K.; Schmid, S. Performance Analysis of Machine Learning Centered Workload Prediction Models for Cloud. IEEE Trans. Parallel Distrib. Syst. 2023, 34, 1313–1330. [Google Scholar] [CrossRef]
- Shahin, A.A. Automatic Cloud Resource Scaling Algorithm based on Long Short-Term Memory Recurrent Neural Network. Int. J. Adv. Comput. Sci. Appl. 2016, 7. [Google Scholar] [CrossRef]
- Yu, G.; Chen, P.; Zheng, Z. Microscaler: Automatic Scaling for Microservices with an Online Learning Approach. In Proceedings of the 2019 IEEE International Conference on Web Services (ICWS), Milan, Italy, 8–13 July 2019; pp. 68–75. [Google Scholar] [CrossRef]
- Yan, M.; Liang, X.; Lu, Z.; Wu, J.; Zhang, W. HANSEL: Adaptive horizontal scaling of microservices using Bi-LSTM. Appl. Soft Comput. 2021, 105, 107216. [Google Scholar] [CrossRef]
- Horn, A.; Fard, H.M.; Wolf, F. Multi-objective hybrid autoscaling of microservices in Kubernetes clusters. In European Conference on Parallel Processing; Springer: Cham, Switzerland, 2022; pp. 233–250. [Google Scholar] [CrossRef]
- Pintye, I.; Kovács, J.; Lovas, R. Enhancing Machine Learning-Based Autoscaling for Cloud Resource Orchestration. J. Grid Comput. 2024, 22, 67. [Google Scholar] [CrossRef]
- Guruge, P.B.; Priyadarshana, Y.H.P.P. Time Series Forecasting-Based Kubernetes Autoscaling Using Facebook Prophet and Long Short-Term Memory. Front. Comput. Sci. 2025, 7, 1509165. [Google Scholar] [CrossRef]
- Rahman, J.; Lama, P. Predicting the End-to-End Tail Latency of Containerized Microservices in the Cloud. In Proceedings of the 2019 IEEE International Conference on Cloud Engineering (IC2E), Prague, Czech Republic, 24–27 June 2019; pp. 200–210. [Google Scholar] [CrossRef]
- Jeong, B.; Baek, S.; Park, S.; Jeon, J.; Jeong, Y.S. Stable and efficient resource management using deep neural network on cloud computing. Neurocomputing 2023, 521, 99–112. [Google Scholar] [CrossRef]
- Iqbal, W.; Dailey, M.N.; Carrera, D. Unsupervised Learning of Dynamic Resource Provisioning Policies for Cloud-Hosted Multitier Web Applications. IEEE Syst. J. 2015, 10, 1435–1446. [Google Scholar] [CrossRef]
- Yu, Y.; Jindal, V.; Yen, I.-L.; Bastani, F. Integrating Clustering and Learning for Improved Workload Prediction in the Cloud. In Proceedings of the 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 27 June–2 July 2016; pp. 876–879. [Google Scholar] [CrossRef]
- Nikravesh, A.Y.; Ajila, S.A.; Lung, C.H. An autonomic prediction suite for cloud resource provisioning. J. Cloud Comput. 2017, 6, 3. [Google Scholar] [CrossRef]
- Daradkeh, T.; Agarwal, A.; Goel, N.; Kozlowski, A.J. Dynamic K-Means Clustering of Workload and Cloud Resource Configuration for Cloud Elastic Model. IEEE Access 2020, 8, 219430–219445. [Google Scholar] [CrossRef]
- Shahidinejad, A.; Ghobaei-Arani, M.; Masdari, M. Resource provisioning using workload clustering in cloud computing environment: A hybrid approach. Clust. Comput. 2021, 24, 319–342. [Google Scholar] [CrossRef]
- Ghobaei-Arani, M.; Shahidinejad, A. An Efficient Resource Provisioning Approach for Analyzing Cloud Workloads: A Metaheuristic-Based Clustering Approach. J. Supercomput. 2021, 77, 711–750. [Google Scholar] [CrossRef]
- Sridhar, P.; Sathiya, R.R. Cloud Workload Forecasting via Latency-Aware Time Series Clustering-Based Scheduling Technique. Concurr. Comput. Pract. Exp. 2025, 37, e70151. [Google Scholar] [CrossRef]
- Betti, P.; Thushantha, L.; Khan, Z.; Munir, K. Horizontal Autoscaling of Virtual Machines in Hybrid Cloud Infrastructures: Current Status, Challenges, and Opportunities. Encyclopedia 2025, 5, 37. [Google Scholar] [CrossRef]
- Moghaddam, S.K.; Buyya, R.; Ramamohanarao, K. ACAS: An anomaly-based cause aware auto-scaling framework for clouds. J. Parallel Distrib. Comput. 2019, 126, 107–120. [Google Scholar] [CrossRef]
- Zhang, X.; Meng, F.; Xu, J. PerfInsight: A Robust Clustering-Based Abnormal Behavior Detection System for Large-Scale Cloud. In Proceedings of the 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), San Francisco, CA, USA, 2–7 July 2018; IEEE: New York, NY, USA, 2018; pp. 896–899. [Google Scholar] [CrossRef]
- He, Z.; Chen, P.; Li, X.; Wang, Y.; Yu, G.; Chen, C.; Li, X.; Zheng, Z. A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 1705–1719. [Google Scholar] [CrossRef] [PubMed]
- Liu, X.; Zhu, S.; Yang, F.; Liang, S.; Zhao, Z. Research on Unsupervised Anomaly Data Detection Method Based on Improved Automatic Encoder and Gaussian Mixture Model. J. Cloud Comput. 2022, 11, 58. [Google Scholar] [CrossRef]
- Ali, S.M.; Kecskemeti, G. SeQual: An Unsupervised Feature Selection Method for Cloud Workload Traces. J. Supercomput. 2023, 79, 15079–15097. [Google Scholar] [CrossRef]
- Ali, S.M.; Kecskemeti, G. EFection: Effectiveness Detection Technique for Clustering Cloud Workload Traces. Int. J. Comput. Intell. Syst. 2024, 17, 198. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, H.; Wen, Y. Elastic Resource Provisioning Using Data Clustering in Cloud Service Platform. IEEE Access 2020, 8, 108436–108447. [Google Scholar] [CrossRef]
- Rahmanian, A.A.; Ghobaei-Arani, M.; Tofighy, S. A learning automata-based ensemble resource usage prediction algorithm for cloud computing environment. Future Gener. Comput. Syst. 2018, 79, 54–71. [Google Scholar] [CrossRef]
- Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
- Rummery, G.A.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; Technical Report TR 166; Cambridge University Engineering Department: Cambridge, UK, 1994. [Google Scholar]
- Bahrpeyma, F.; Haghighi, H.; Zakerolhosseini, A. An adaptive RL based approach for dynamic resource provisioning in Cloud virtualized data centers. Computing 2015, 97, 1209–1234. [Google Scholar] [CrossRef]
- Jamshidi, P.; Sharifloo, A.M.; Pahl, C.; Metzger, A.; Estrada, G. Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution. In Proceedings of the 2015 International Conference on Cloud and Autonomic Computing, Boston, MA, USA, 21–25 September 2015; pp. 208–211. [Google Scholar] [CrossRef]
- Arabnejad, H.; Jamshidi, P.; Estrada, G.; El Ioini, N.; Pahl, C. An Auto-Scaling Cloud Controller Using Fuzzy Q-Learning—Implementation in OpenStack. In Service-Oriented and Cloud Computing; Aiello, M., Johnsen, E.B., Dustdar, S., Georgievski, I., Eds.; Springer: Cham, Switzerland, 2016; pp. 152–167. [Google Scholar] [CrossRef]
- Arabnejad, H.; Pahl, C.; Jamshidi, P.; Estrada, G. A Comparison of Reinforcement Learning Techniques for Fuzzy Cloud Auto-Scaling. In Proceedings of the 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Madrid, Spain, 14–17 May 2017; IEEE: New York, NY, USA, 2017; pp. 64–73. [Google Scholar] [CrossRef]
- Horovitz, S.; Arian, Y. Efficient Cloud Auto-Scaling with SLA Objective Using Q-Learning. In Proceedings of the 2018 IEEE 6th International Conference on Future Internet of Things and Cloud (FiCloud), Barcelona, Spain, 6–8 August 2018; IEEE: New York, NY, USA, 2018; pp. 85–92. [Google Scholar] [CrossRef]
- Nouri, S.M.R.; Li, H.; Venugopal, S.; Guo, W.; He, M.; Tian, W. Autonomic Decentralized Elasticity Based on a Reinforcement Learning Controller for Cloud Applications. Future Gener. Comput. Syst. 2019, 94, 765–780. [Google Scholar] [CrossRef]
- Bitsakos, C.; Konstantinou, I.; Koziris, N. DERP: A deep reinforcement learning cloud system for elastic resource provisioning. In Proceedings of the International Conference on Cloud Computing Technology and Science (CloudCom), Nicosia, Cyprus, 10–13 December 2018; IEEE Computer Society: New York, NY, USA, 2018; pp. 21–29. [Google Scholar] [CrossRef]
- Zhang, S.; Wu, T.; Pan, M.; Zhang, C.; Yu, Y. A-SARSA: A Predictive Container Auto-Scaling Algorithm Based on Reinforcement Learning. In Proceedings of the 2020 IEEE International Conference on Web Services (ICWS), Beijing, China, 19–23 October 2020; IEEE: New York, NY, USA, 2020; pp. 489–497. [Google Scholar] [CrossRef]
- Khaleq, A.A.; Ra, I. Intelligent Autoscaling of Microservices in the Cloud for Real-Time Applications. IEEE Access 2021, 9, 35464–35476. [Google Scholar] [CrossRef]
- Rossi, F.; Cardellini, V.; Presti, F.L.; Nardelli, M. Dynamic Multi-Metric Thresholds for Scaling Applications Using Reinforcement Learning. IEEE Trans. Cloud Comput. 2023, 11, 1807–1821. [Google Scholar] [CrossRef]
- Xue, S.; Qu, C.; Shi, X.; Liao, C.; Zhu, S.; Tan, X.; Ma, L.; Wang, S.; Wang, S.; Hu, Y.; et al. A Meta Reinforcement Learning Approach for Predictive Autoscaling in the Cloud. In Proceedings of the KDD ’22: 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 14–18 August 2022; pp. 4290–4299. [Google Scholar] [CrossRef]
- Hanafy, W.A.; Liang, Q.; Bashir, N.; Irwin, D.; Shenoy, P. CarbonScaler: Leveraging Cloud Workload Elasticity for Optimizing Carbon-Efficiency. Proc. ACM Meas. Anal. Comput. Syst. 2023, 7, 57. [Google Scholar] [CrossRef]
- Fodor, B.; Jakub, Á.; Szűcs, G.; Sonkoly, B. A Multi-Agent Deep-Reinforcement Learning Approach for Application-Agnostic Microservice Scaling. In Proceedings of the 2023 IEEE Virtual Conference on Communications (VCC), New York, NY, USA, 28–30 November 2023; pp. 139–144. [Google Scholar] [CrossRef]
- Bai, H.; Xu, M.; Ye, K.; Buyya, R.; Xu, C. DRPC: Distributed Reinforcement Learning Approach for Scalable Resource Provisioning in Container-Based Clusters. IEEE Trans. Serv. Comput. 2024, 17, 2433–2446. [Google Scholar] [CrossRef]
- Prodanov, J.; Bertalanič, B.; Fortuna, C.; Chou, S.-K.; Jurič, M.B.; Sanchez-Iborra, R.; Hribar, J. Multi-Agent Reinforcement Learning-Based In-Place Scaling Engine for Edge-Cloud Systems. In Proceedings of the 2025 IEEE 18th International Conference on Cloud Computing (CLOUD), Helsinki, Finland, 7–12 July 2025; pp. 32–42. [Google Scholar] [CrossRef]
- Qiu, H.; Mao, W.; Wang, C.; Franke, H.; Youssef, A.; Kalbarczyk, Z.T.; Başar, T.; Iyer, R.K. AWARE: Automate Workload Autoscaling with Reinforcement Learning in Production Cloud Systems. In Proceedings of the 2023 USENIX Annual Technical Conference (USENIX ATC 23), Boston, MA, USA, 10–12 July 2023; pp. 387–402. [Google Scholar]
- Park, J.; Choi, B.; Lee, C.; Han, D. Graph Neural Network-Based SLO-Aware Proactive Resource Autoscaling Framework for Microservices. IEEE/ACM Trans. Netw. 2024, 32, 1325–1340. [Google Scholar] [CrossRef]
- Santos, J.; Reppas, E.; Wauters, T.; Volckaert, B.; De Turck, F. Gwydion: Efficient auto-scaling for complex containerized applications in Kubernetes through Reinforcement Learning. J. Netw. Comput. Appl. 2025, 234, 104067. [Google Scholar] [CrossRef]
- Yuan, H.; Wang, T.; Fu, M.; Shi, Y. GIRP: Energy-Efficient QoS-Oriented Microservice Resource Provisioning via Multi-Objective Multi-Task Reinforcement Learning. IEEE Trans. Mob. Comput. 2025, 24, 5793–5807. [Google Scholar] [CrossRef]
- Hua, Q.; Yang, D.; Qian, S.; Cao, J.; Xue, G.; Li, M. Humas: A Heterogeneity- and Upgrade-Aware Microservice Auto-Scaling Framework in Large-Scale Data Centers. IEEE Trans. Comput. 2025, 74, 968–982. [Google Scholar] [CrossRef]
- Qiu, H.; Mao, W.; Patke, A.; Cui, S.; Jha, S.; Wang, C.; Franke, H.; Kalbarczyk, Z.; Başar, T.; Iyer, R.K. Power-aware Deep Learning Model Serving with μ-Serve. In Proceedings of the 2024 USENIX Annual Technical Conference (USENIX ATC 24), Santa Clara, CA, USA, 10–12 July 2024; pp. 75–93.
- Zhang, C.; Yu, M.; Wang, W.; Yan, F. MArk: Exploiting Cloud Services for Cost-Effective, SLO-Aware Machine Learning Inference Serving. In Proceedings of the 2019 USENIX Annual Technical Conference (USENIX ATC 19), Renton, WA, USA, 10–12 July 2019; pp. 1049–1062.
- Kim, Y.G.; Wu, C.J. AutoScale: Energy Efficiency Optimization for Stochastic Edge Inference Using Reinforcement Learning. In Proceedings of the 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Athens, Greece, 17–21 October 2020; pp. 1082–1096.
- Wang, Y.; Wang, Q.; Chu, X. Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs. In Proceedings of the 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), Rhodes, Greece, 2–6 November 2020; pp. 323–331.
- Cañete, A.; Djemame, K.; Amor, M.; Fuentes, L.; Aljulayfi, A. A proactive energy-aware auto-scaling solution for edge-based infrastructures. In Proceedings of the 2022 IEEE/ACM 15th International Conference on Utility and Cloud Computing (UCC), Vancouver, WA, USA, 6–9 December 2022; pp. 240–247.
- Benifa, J.V.B.; Dejey, D. RLPAS: Reinforcement Learning-Based Proactive Auto-Scaler for Resource Provisioning in Cloud Environment. Mob. Netw. Appl. 2019, 24, 1348–1363.
- Rossi, F.; Nardelli, M.; Cardellini, V. Horizontal and Vertical Scaling of Container-Based Applications Using Reinforcement Learning. In Proceedings of the 2019 IEEE 12th International Conference on Cloud Computing (CLOUD), Milan, Italy, 8–13 July 2019; IEEE: New York, NY, USA, 2019; pp. 329–338.
- Xu, M.; Song, C.; Ilager, S.; Gill, S.S.; Zhao, J.; Ye, K.; Xu, C. CoScal: Multifaceted Scaling of Microservices with Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. 2022, 19, 3995–4009.
- Qiu, H.; Banerjee, S.S.; Jha, S.; Kalbarczyk, Z.T.; Iyer, R.K. FIRM: An Intelligent Fine-grained Resource Management Framework for SLO-Oriented Microservices. In Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), Online, 4–6 November 2020; pp. 805–825.
- Wang, Z.; Zhu, S.; Li, J.; Jiang, W.; Ramakrishnan, K.K.; Yan, M. DeepScaling: Autoscaling Microservices With Stable CPU Utilization for Large Scale Production Cloud Systems. IEEE/ACM Trans. Netw. 2024, 32, 3267–3282.
- Mangalampalli, S.; Karri, G.R.; Kumar, M.; Khalaf, O.I.; Romero, C.A.T.; Sahib, G.M.A. DRLBTSA: Deep reinforcement learning based task-scheduling algorithm in cloud computing. Multimed. Tools Appl. 2024, 83, 8359–8387.
- Wei, Y.; Kudenko, D.; Deng, S.; Wu, L.; Fu, X.; Liu, X.; Wu, X.; Meng, X. A Reinforcement Learning Based Auto-Scaling Approach for SaaS Providers in Dynamic Cloud Environments. Math. Probl. Eng. 2019, 2019, 5080647.
- Khaleq, A.A.; Ra, I. Development of QoS-aware agents with reinforcement learning for autoscaling of microservices on the cloud. In Proceedings of the 2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C), Washington, DC, USA, 27 September–1 October 2021; IEEE: New York, NY, USA, 2021; pp. 13–19.
- Choochotkaew, S.; Chiba, T.; Trent, S.; Amaral, M. Run Wild: Resource Management System with Generalized Modeling for Microservices on Cloud. In Proceedings of the 2021 IEEE 14th International Conference on Cloud Computing (CLOUD), Chicago, IL, USA, 5–10 September 2021; IEEE: New York, NY, USA, 2021; pp. 609–618.
- Golshani, E.; Ashtiani, M. Proactive auto-scaling for cloud environments using temporal convolutional neural networks. J. Parallel Distrib. Comput. 2021, 154, 119–141.
- Zhang, Y.; Hua, W.; Zhou, Z.; Suh, G.E.; Delimitrou, C. Sinan: ML-based and QoS-aware resource management for cloud microservices. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual, 19–23 April 2021; pp. 167–181.
- Tamiru, M.A.; Tordsson, J.; Elmroth, E.; Pierre, G. An Experimental Evaluation of the Kubernetes Cluster Autoscaler in the Cloud. In Proceedings of the 2020 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), Bangkok, Thailand, 14–17 December 2020; pp. 17–24.
- Aslanpour, M.S.; Toosi, A.N.; Taheri, J.; Gaire, R. AutoScaleSim: A simulation toolkit for auto-scaling Web applications in clouds. Simul. Model. Pract. Theory 2021, 108, 102245.
- Nguyen, H.X.; Zhu, S.; Liu, M. Graph-PHPA: Graph-based Proactive Horizontal Pod Autoscaling for Microservices using LSTM-GNN. In Proceedings of the 2022 IEEE 11th International Conference on Cloud Networking (CloudNet), Paris, France, 7–10 November 2022; pp. 237–241.
- Esposito, M.; Bakhtin, A.; Ahmad, N.; Robredo, M.; Su, R.; Lenarduzzi, V.; Taibi, D. Autonomic Microservice Management via Agentic AI and MAPE-K Integration. In Proceedings of the 19th European Conference on Software Architecture (ECSA 2025), Limassol, Cyprus, 15–19 September 2025; Bianculli, D., Sartaj, H., Andrikopoulos, V., Pautasso, C., Mikkonen, T., Perez, J., Bureš, T., De Sanctis, M., Muccini, H., Navarro, E., et al., Eds.; Springer: Cham, Switzerland, 2025; Volume 15982, pp. 105–118.
- Kumar, B.; Verma, A.; Verma, P. A multivariate transformer-based monitor-analyze-plan-execute (MAPE) autoscaling framework for dynamic resource allocation in cloud environment. Computing 2025, 107, 69.
- Karol Santos Nunes, J.P.; Nejati, S.; Sabetzadeh, M.; Nakagawa, E.Y. Self-adaptive, Requirements-driven Autoscaling of Microservices. In Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’24), Lisbon, Portugal, 15–16 April 2024; pp. 168–174.
- Jamshidi, P.; Pahl, C.; Mendonça, N.C. Managing Uncertainty in Autonomic Cloud Elasticity Controllers. IEEE Cloud Comput. 2016, 3, 50–60.
- Pham, K.Q.; Kim, T. Elastic Federated Learning with Kubernetes Vertical Pod Autoscaler for edge computing. Future Gener. Comput. Syst. 2024, 158, 501–515.
- Nguyen, T.T.; Yeom, Y.J.; Kim, T.; Park, D.H.; Kim, S. Horizontal Pod Autoscaling in Kubernetes for Elastic Container Orchestration. Sensors 2020, 20, 4621.
- Thota, R.C. Intelligent Auto-Scaling in AWS: Machine Learning Approaches for Predictive Resource Allocation. Int. J. Sci. Res. Manag. (IJSRM) 2022, 10, 999–1005.
- Poppe, O.; Guo, Q.; Lang, W.; Arora, P.; Oslake, M.; Xu, S.; Kalhan, A. Moneyball: Proactive auto-scaling in Microsoft Azure SQL database serverless. Proc. VLDB Endow. 2022, 15, 1279–1287.
- Guo, Y.; Ge, J.; Guo, P.; Chai, Y.; Li, T.; Shi, M.; Tu, Y.; Ouyang, J. PASS: Predictive Auto-Scaling System for Large-scale Enterprise Web Applications. In Proceedings of the ACM Web Conference (WWW ’24), Singapore, 13–17 May 2024; pp. 2747–2758.
- Rzadca, K.; Findeisen, P.; Swiderski, J.; Zych, P.; Broniek, P.; Kusmierek, J.; Nowak, P.; Strack, B.; Witusowski, P.; Hand, S.; et al. Autopilot: Workload autoscaling at Google. In Proceedings of the Fifteenth European Conference on Computer Systems (EuroSys ’20), Heraklion, Greece, 27–30 April 2020.
- Bao, G.; Guo, P. Federated learning in cloud-edge collaborative architecture: Key technologies, applications and challenges. J. Cloud Comput. 2022, 11, 94.






| Survey Paper | Year | What Is Covered | What Is Not Covered |
|---|---|---|---|
| Containerized Microservices: A Survey of Resource Management Frameworks [8] | 2024 | Frameworks for container/microservice resource allocation and autoscaling; resource models; hardware-aware scaling; SLA and cost considerations. | No serverless; minimal ML/RL depth; no multi-agent RL; no benchmarks/evaluation frameworks; limited provider integration; sustainability barely mentioned. |
| Auto-Scaling Techniques in Cloud Computing: Issues and Research Directions [2] | 2024 | Broad taxonomy; ML, RL, fuzzy logic, time-series; reactive/proactive; QoS, cost, energy; AWS/Azure comparison. | Limited microservices/serverless focus; no benchmarks; sustainability lightly touched; no multi-agent RL. |
| Auto-scaling techniques in container-based cloud and edge/fog computing: Taxonomy and survey [9] | 2023 | Container autoscaling in cloud-edge/fog; taxonomy; latency & resource efficiency; predictive heuristics. | No serverless; shallow ML coverage; no multi-agent RL; no benchmarks; minimal provider details; sustainability absent. |
| A Survey of Autoscaling in Kubernetes [10] | 2022 | HPA, VPA, custom metrics; pod/container scaling mechanisms. | No formal taxonomy; almost no ML; no benchmarks; no provider integration; sustainability absent; no multi-agent RL. |
| Machine Learning-based Orchestration of Containers: Taxonomy and Future Directions [11] | 2022 | ML orchestration for containers; supervised, RL, DL; taxonomy; QoS and resource utilization focus. | No serverless; no benchmarks; no multi-agent RL; sustainability absent; limited provider details. |
| Auto-scaling techniques for IoT-based cloud applications: A review [12] | 2021 | IoT-specific autoscaling; fuzzy logic and basic RL; taxonomy; latency/QoS focus. | No microservices/serverless; limited ML depth; no benchmarks; no provider integration; sustainability absent. |
| Auto-scaling Web Applications in Clouds: A Taxonomy and Survey [13] | 2018 | Classic reactive/proactive taxonomy; VM-level web apps; QoS and cost; elasticity challenges. | No microservices/serverless; minimal ML; outdated scope; no benchmarks; sustainability absent; no multi-agent RL. |
| Symbol | Description |
|---|---|
| t | Decision time step |
| | Resource allocation vector at time t |
| | Workload/demand at time t |
| | Predicted workload at time t |
| | Cost function at time t |
| | QoS metric at time t |
| | QoS threshold (SLO bound) |
| | Parameterised learned policy |
| | Reward at time t |
| | Discount factor |
| | Indicator function |
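As a minimal illustration of how the quantities in the notation table typically fit together, the sketch below combines per-step cost with an indicator-weighted SLO-violation penalty into a reward and discounts it over time. The function names and the penalty weight `beta` are illustrative assumptions, not taken from any surveyed paper.

```python
# Hedged sketch: one common way to combine cost and SLO violations into a
# reward signal for a learning-based autoscaler. `beta` (penalty weight)
# is an illustrative assumption.

def reward(cost_t: float, qos_t: float, qos_max: float, beta: float = 10.0) -> float:
    """Negative cost plus an indicator-weighted SLO-violation penalty."""
    violation = 1.0 if qos_t > qos_max else 0.0   # indicator function
    return -cost_t - beta * violation

def discounted_return(rewards, gamma: float = 0.9) -> float:
    """Discounted sum of per-step rewards; gamma is the discount factor."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

With `beta = 10`, a step that costs 1 unit and breaches the QoS bound yields a reward of −11, while the same step without a breach yields −1.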
| Category | Example Methods | Description | Timing |
|---|---|---|---|
| Rule based | Threshold rules, step policies, schedules | Manually defined rules on metrics such as CPU, memory, or latency; simple and widely deployed in cloud and Kubernetes autoscalers. | Reactive |
| Supervised ML | Regression, time series forecasting (ARIMA, LSTM, Prophet) | Predicts future demand or performance; the autoscaler adjusts resources based on forecasted values. | Proactive |
| Unsupervised ML | Clustering, anomaly detection | Finds patterns or outliers in workload metrics without labels; can trigger scaling on abnormal conditions or specialise policies for clusters of workloads. | Reactive or proactive |
| Reinforcement learning | Q-learning, DQN, actor-critic, PPO | Learns scaling policies from trial and error using reward signals that combine QoS and cost; can handle hybrid (horizontal and vertical) actions and long-term effects. | Reactive and proactive (hybrid) |
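To make the rule-based row concrete, the sketch below implements the reactive proportional rule used by the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × metric/target), with the HPA's default 10% tolerance band; the replica bounds are illustrative defaults.

```python
import math

# Reactive threshold-style scaling in the spirit of the Kubernetes HPA
# algorithm: desired = ceil(current * metric / target). The 0.1 tolerance
# matches the HPA default; min/max bounds are illustrative.

def desired_replicas(current: int, metric: float, target: float,
                     min_r: int = 1, max_r: int = 10, tol: float = 0.1) -> int:
    ratio = metric / target
    if abs(ratio - 1.0) <= tol:          # within tolerance band: no action
        return current
    return max(min_r, min(max_r, math.ceil(current * ratio)))
```

For example, 3 replicas at 90% of a 60% CPU target scale out to 5, while a metric within 10% of the target leaves the replica count unchanged.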
| Year | Paper Title | ML Technique | Workload Type | Autoscaling Objective | Deployment | Training Mode |
|---|---|---|---|---|---|---|
| 2025 | Time Series Forecasting-Based Kubernetes Autoscaling Using Facebook Prophet and LSTM [29] | Prophet + LSTM hybrid | Containers (Kubernetes) | HTTP request prediction, SLA compliance | Cloud | Offline |
| 2024 | Enhancing Machine Learning-Based Autoscaling for Cloud Resource Orchestration [28] | Statistical feature selection + ML | Cloud services (IaaS) | QoS-aware resource management | Cloud | Offline |
| 2023 | Performance Analysis of Machine Learning Centered Workload Prediction Models for Cloud [23] | Comparative ML models (LSTM, CNN, ensemble) | Cloud services (IaaS) | Workload prediction accuracy | Cloud | Offline |
| 2023 | Stable and Efficient Resource Management Using Deep Neural Network on Cloud Computing [31] | Deep Neural Network | Containers (Kubernetes pods) | Resource utilization, overload prevention | Cloud | Offline |
| 2022 | Machine Learning-Based Adaptive Auto-scaling Policy for Resource Orchestration in Kubernetes Clusters [11] | LSTM Recurrent Neural Network | Containers (Kubernetes pods) | Resource utilization (performance) | Cloud | Offline |
| 2022 | Multi-objective Hybrid Autoscaling of Microservices in Kubernetes Clusters [27] | ML-based performance modeling | Microservices (Kubernetes) | Response time SLO + resource efficiency | Cloud | Offline |
| 2022 | esDNN: Deep Neural Network Based Multivariate Workload Prediction in Cloud Computing Environments [22] | Deep Neural Network (GRU-based) | Cloud services (VMs) | Workload prediction, auto-scaling | Cloud | Offline |
| 2021 | Hansel: A Bi-LSTM Based Proactive Auto-Scaler [26] | Bi-LSTM | Cloud services (VMs) | Workload prediction, SLA compliance | Cloud | Offline |
| 2021 | A Proactive Autoscaling and Energy-Efficient VM Allocation Framework Using Online Multi-Resource Neural Network [21] | Multi-resource Neural Network | Virtual Machines (VMs) | Energy efficiency, proactive scaling | Cloud | Online |
| 2020 | Forecasting Cloud Application Workloads with CloudInsight for Predictive Resource Management [20] | Ensemble model (multiple predictors) | Cloud applications | Cost efficiency, SLA compliance | Cloud | Offline |
| 2019 | MLscale: A Machine Learning-Based Application-Agnostic Autoscaler [19] | Neural network with regression | Cloud resources (VMs) | Performance metrics (response time) | Cloud | Online |
| 2019 | Predicting the End-to-End Tail Latency of Containerized Microservices in the Cloud [30] | Machine learning regression | Containers, microservices | Tail latency | Cloud | Offline |
| 2019 | Microscaler: Automatic Scaling for Microservices with an Online Learning Approach [25] | Online Bayesian regression | Microservices | Latency SLO | Cloud | Online |
| 2018 | An Efficient Deep Learning Model to Predict Cloud Workload for Industry Informatics [18] | Deep Neural Network | Cloud services (IaaS) | Workload prediction, industrial applications | Cloud | Offline |
| 2017 | An Adaptive Prediction Approach Based on Workload Pattern Discrimination in the Cloud [16] | SVM, Linear Regression | Cloud tasks (IaaS, VMs) | Workload (throughput, latency) | Cloud | Offline |
| 2016 | Workload Prediction for Cloud Computing Elasticity Mechanism [15] | Random Forest, ARIMA, SVM | Cloud services (IaaS) | Elastic scaling, prediction accuracy | Cloud | Offline |
| 2015 | Workload Prediction Using ARIMA Model and Its Impact on Cloud Applications’ QoS [1] | ARIMA | Web applications (SaaS) | QoS, proactive provisioning | Cloud | Offline |
| 2015 | Self-Adaptive Prediction of Cloud Resource Demands Using Ensemble Model and Subtractive-Fuzzy Clustering Based Fuzzy Neural Network [17] | Ensemble + Fuzzy Neural Network | Cloud services (IaaS) | Resource demand prediction | Cloud | Offline |
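The supervised approaches above share a common pattern: fit a model to recent demand, forecast ahead, and provision to the forecast. The sketch below uses a simple least-squares linear trend as an illustrative stand-in for the LSTM/ARIMA/Prophet models in the cited works; the per-replica capacity parameter is likewise an assumption.

```python
import math

# Hedged sketch of the supervised forecast-then-scale pattern. The linear
# trend model stands in for the learned predictors surveyed above.

def fit_trend(history):
    """Ordinary least squares on (t, demand) pairs; returns (slope, intercept)."""
    n = len(history)
    mx, my = (n - 1) / 2, sum(history) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(range(n), history))
    var = sum((x - mx) ** 2 for x in range(n))
    return cov / var, my - (cov / var) * mx

def proactive_replicas(history, horizon: int, capacity_per_replica: float) -> int:
    """Provision enough replicas for the demand forecast `horizon` steps ahead."""
    slope, intercept = fit_trend(history)
    forecast = slope * (len(history) - 1 + horizon) + intercept
    return max(1, math.ceil(forecast / capacity_per_replica))
```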
| Year | Paper Title | ML Technique | Workload Type | Autoscaling Objective | Deployment | Training Mode |
|---|---|---|---|---|---|---|
| 2025 | Cloud Workload Forecasting via Latency-Aware Time Series Clustering-Based Scheduling Technique [38] | Dynamic fuzzy c-means clustering | Cloud services (IaaS) | Latency-aware scheduling, resource optimization | Cloud | Offline |
| 2025 | Horizontal Autoscaling of Virtual Machines in Hybrid Cloud Infrastructures [39] | K-means clustering | Virtual machines (IaaS) | Response time, throughput, SLA compliance | Hybrid | Offline |
| 2024 | EFection: Effectiveness Detection Technique for Clustering Cloud Workload Traces [45] | Adaptive clustering with internal validation | Cloud services (IaaS) | Workload classification, resource optimization | Cloud | Offline |
| 2023 | A Spatiotemporal Deep Learning Approach for Unsupervised Anomaly Detection in Cloud Systems [42] | Graph neural network + LSTM (TopoMAD) | Cloud systems (VMs/containers) | Performance anomaly detection, system reliability | Cloud | Offline |
| 2023 | SeQual: An Unsupervised Feature Selection Method for Cloud Workload Traces [44] | Silhouette-based feature selection + clustering | Cloud services (IaaS) | Workload characterization, user identification | Cloud | Offline |
| 2022 | Research on Unsupervised Anomaly Data Detection Method Based on Improved Autoencoder and Gaussian Mixture Model [43] | Deep autoencoder + GMM (MemAe-gmm-ma) | Cloud services (IaaS) | Anomaly detection, cloud security | Cloud | Offline |
| 2021 | Resource Provisioning Using Workload Clustering in Cloud Computing Environment: A Hybrid Approach [36] | QoS-based K-means + fuzzy logic | Cloud services (VMs) | SLA compliance, resource optimization | Cloud | Offline |
| 2021 | An Efficient Resource Provisioning Approach for Analyzing Cloud Workloads: A Metaheuristic-Based Clustering Approach [37] | GA + Fuzzy C-means + Gray Wolf Optimizer | Cloud services (IaaS) | QoS-aware resource provisioning | Cloud | Offline |
| 2020 | Dynamic K-Means Clustering of Workload and Cloud Resource Configuration for Cloud Elastic Model [35] | Enhanced K-means with kernel density estimation | Cloud services (IaaS) | Elastic scaling, workload-resource mapping | Cloud | Offline |
| 2020 | Elastic Resource Provisioning Using Data Clustering in Cloud Service Platform [46] | Clustering ensemble method | Cloud services (IaaS) | Dynamic resource provisioning, task scheduling | Cloud | Online |
| 2019 | ACAS: An Anomaly-Based Cause Aware Auto-Scaling Framework for Clouds [40] | Isolation Forest (anomaly detection) | Virtual Machines (VMs) | SLA-aware scaling | Cloud | Online |
| 2018 | A Learning Automata-Based Ensemble Resource Usage Prediction Algorithm [47] | Ensemble prediction with clustering | Cloud services (IaaS) | Resource usage prediction accuracy | Cloud | Online |
| 2018 | PerfInsight: A Robust Clustering-Based Abnormal Behavior Detection System for Large-Scale Cloud [41] | Clustering-based anomaly detection | Cloud services (VMs) | Abnormal behavior detection, system reliability | Cloud | Online |
| 2017 | An Autonomic Prediction Suite for Cloud Resource Provisioning [34] | Unsupervised clustering of resource usage profiles | Cloud services (IaaS) | Improved provisioning accuracy, SLA compliance | Cloud | Offline |
| 2016 | Integrating Clustering and Learning for Improved Workload Prediction in the Cloud [33] | K-means clustering + neural network | Cloud services (IaaS) | Workload prediction, resource provisioning | Cloud | Offline |
| 2015 | Unsupervised Learning of Dynamic Resource Provisioning Policies for Cloud-Hosted Multitier Web Applications [32] | Unsupervised clustering + online learning | Web applications (multi-tier) | Dynamic provisioning, SLO compliance | Cloud | Online |
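Several of the clustering-based works above group workload samples and attach a scaling policy per cluster. A minimal sketch of that idea, assuming a one-dimensional k-means over mean demand and an illustrative cluster-to-policy mapping:

```python
# Hedged sketch of clustering-driven policy selection: group workload
# samples by demand level, then serve each sample with the policy of its
# nearest cluster. The 1-D k-means and policy names are illustrative.

def kmeans_1d(values, k=2, iters=20):
    """Tiny 1-D k-means; returns centroids in ascending initial order."""
    centroids = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in values:
            i = min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
            groups[i].append(v)
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def assign_policy(value, centroids, policies):
    """Map a workload sample to the policy of its nearest cluster."""
    i = min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))
    return policies[i]
```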
| Year | Paper Title | RL Technique | Workload Type | Autoscaling Objective | Deployment | RL Method |
|---|---|---|---|---|---|---|
| 2025 | Multi-Agent RL-Based In-Place Scaling Engine for Edge-Cloud [64] | Multi-Agent Deep RL | Edge-Cloud microservices | In-place scaling latency reduction | Edge-Cloud | Off-policy |
| 2025 | Gwydion: Efficient Auto-Scaling for Complex Containerized Applications [67] | RL (OpenAI Gym-based) | Microservices (Kubernetes) | Latency-aware horizontal scaling | Cloud | Off-policy |
| 2023 | AWARE: RL-Based Autoscaling in Production Cloud Systems [65] | Meta-RL with safe exploration | Mixed workloads | Minimize SLO violations | Cloud | Off-policy |
| 2023 | Multi-Agent Deep-RL for Application-Agnostic Microservice Scaling [62] | Multi-Agent Deep RL (MADDPG) | Microservices | Application-agnostic horizontal scaling | Cloud | Off-policy |
| 2023 | Dynamic Multi-Metric Thresholds for Scaling Using RL [59] | Deep Q-Learning | Cloud applications | Adaptive scaling thresholds | Cloud | Off-policy |
| 2022 | A Meta RL Approach for Predictive Autoscaling [60] | Meta-RL (PPO-based) | Cloud services | Predictive scaling generalization | Cloud | On-policy |
| 2021 | Intelligent Autoscaling of Microservices for Real-Time Applications [58] | Actor-Critic, DQN, SARSA, Q-learning | Microservices | Response time optimization | Cloud | Mixed |
| 2020 | A-SARSA: Predictive Container Auto-Scaling Based on RL [57] | SARSA + ARIMA prediction | Containers | SLA violation reduction | Cloud | On-policy |
| 2019 | RLPAS: RL-Based Proactive Auto-Scaler [75] | Parallel SARSA | VMs | SLA violation reduction | Cloud | On-policy |
| 2019 | Autonomic Decentralized Elasticity Based on RL Controller [55] | Distributed Q-learning | Web apps | SLA violation reduction | Cloud | Off-policy |
| 2019 | Horizontal and Vertical Scaling Using RL [76] | Model-based RL | Containers | Response time optimization | Cloud | Off-policy |
| 2018 | Efficient Cloud Auto-Scaling with SLA Using Q-Learning [54] | Q-learning | Web apps | SLA-aware scaling | Cloud | Off-policy |
| 2018 | DERP: Deep RL for Elastic Resource Provisioning [56] | Deep Q-Network | NoSQL cluster | Throughput optimization | Cloud | Off-policy |
| 2017 | Comparison of RL Techniques for Fuzzy Cloud Auto-Scaling [53] | Fuzzy SARSA, Fuzzy Q-learning | OpenStack VMs | SLA compliance | Cloud | Mixed |
| 2016 | Auto-Scaling Cloud Controller Using Fuzzy Q-Learning [52] | Fuzzy Q-learning | OpenStack VMs | Response time optimization | Cloud | Off-policy |
| 2015 | Adaptive RL-Based Approach for Dynamic Resource Provisioning [50] | Continuous Q-learning | VMs | Energy minimization | Cloud | Off-policy |
| 2015 | Self-Learning Cloud Controllers: Fuzzy Q-Learning [51] | Fuzzy Q-learning | Cloud VMs | Knowledge evolution | Cloud | Off-policy |
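The tabular Q-learning entries above can be illustrated with a few lines of code. In this sketch the state is a coarse utilisation bucket, the actions are scale in / hold / scale out, and the environment, learning rates, and episode lengths are all illustrative assumptions rather than any surveyed system's configuration.

```python
import random

# Hedged sketch of tabular Q-learning for autoscaling. State = discretised
# utilisation bucket; action = change in replica count. All constants are
# illustrative.

ACTIONS = (-1, 0, +1)  # scale in, hold, scale out

def train(env_step, episodes=200, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """env_step(state, action) -> (next_state, reward); returns the Q-table."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = 0
        for _ in range(20):
            if rng.random() < eps:                     # epsilon-greedy exploration
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda a: Q.get((s, a), 0.0))
            s2, r = env_step(s, a)
            best_next = max(Q.get((s2, a2), 0.0) for a2 in ACTIONS)
            Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * (r + gamma * best_next)
            s = s2
    return Q
```

Trained against a toy two-state environment (underprovisioned vs. adequately provisioned), the greedy policy learns to scale out in the first state and hold in the second.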
| Year | Paper Title | RL Technique | Workload Type | Autoscaling Objectives | Deployment | RL Method |
|---|---|---|---|---|---|---|
| 2025 | GIRP: Energy-Efficient QoS-Oriented Microservice Resource Provisioning [68] | Multi-objective Multi-task DDPG | Microservices | Energy efficiency + latency minimization | Cloud | Off-policy |
| 2025 | Humas: Heterogeneity- and Upgrade-Aware Microservice Auto-Scaling [69] | Adaptive RL | Microservices (large-scale) | Resource heterogeneity + rolling updates | Cloud | Off-policy |
| 2024 | DRPC: Distributed RL for Scalable Resource Provisioning [63] | TD3 (distributed) | Containers (Kubernetes) | QoS + resource utilization | Cloud | Off-policy |
| 2024 | DeepScaling: Autoscaling Microservices With Stable CPU [79] | Deep learning-based | Microservices (production) | CPU stability + SLA compliance | Cloud | Off-policy |
| 2024 | GNN-Based SLO-Aware Proactive Resource Autoscaling [66] | GNN + RL | Microservices | SLO compliance + resource efficiency | Cloud | Off-policy |
| 2024 | DRLBTSA: Deep RL-Based Task-Scheduling Algorithm [80] | Deep Q-Network (DQN) | Cloud tasks (heterogeneous) | Makespan + SLA violations + energy | Cloud | Off-policy |
| 2022 | CoScal: Multifaceted Scaling of Microservices with RL [77] | DQN-based multi-faceted scaling | Microservices | Response time SLO + cost optimization | Cloud | Off-policy |
| 2020 | FIRM: Fine-Grained Intelligent Resource Management [78] | DDPG + SVM-guided RL | Microservices | SLO violation reduction + fine-grained control | Cloud | Off-policy |
| 2019 | RL-Based AutoScaling Approach for SaaS Providers [81] | Q-learning | SaaS apps on VMs | Cost minimization + SLA satisfaction | Cloud | Off-policy |
| Method | Typical Workload Types | Evaluation Environments | Typical Gains Reported |
|---|---|---|---|
| Reinforcement Learning (RL) | Microservices, containerized apps | Kubernetes clusters, cloud testbeds, production systems | SLA violation reduction (up to 30%), cost savings (15–25%), energy reduction (up to 43%), improved adaptability under dynamic workloads [62,63,65,68,76] |
| Supervised Learning | VM-based workloads, multi-tier web apps, microservices | Simulation tools (CloudSim, AutoScaleSim), controlled testbeds | Accurate workload prediction; proactive scaling reduces latency spikes by 10–20% [19,26,29,31] |
| Unsupervised Learning | Hybrid cloud workloads (primarily VMs); containers/microservices | Simulation + emulation; testbeds discussed in reviewed works | Reduced oscillations and improved SLA compliance via proactive methods; clustering for workload grouping enables tailored scaling decisions [38,39,45] |
| Year | Reference | SLA Compliance | Cost/Resource Usage | Stability | Convergence Time | Prediction Accuracy |
|---|---|---|---|---|---|---|
| 2023 | Jeong et al. [31] | SLA + energy-aware scaling | Cost and carbon footprint | Not reported | RL adaptation time | N/A |
| 2022 | Xue et al. [60] | SLO violation reduction | 50% resource savings | Resource Control Stability 0.91–0.95 | Fast meta-RL adaptation | Workload Root Mean Squared Error 112.59 |
| 2021 | Golshani & Ashtiani [84] | SLA compliance under bursty traffic | Cost normalized to baseline | Stability index | Adaptation time | Forecast RMSE |
| 2021 | Zhang et al. [85] | SLO violation reduction | Resource efficiency (CPU limits) | Not reported | Not reported | N/A |
| 2021 | Yan et al. [26] | SLA compliance | Cost vs. baseline | Not reported | Not reported | MAE/RMSE (workload prediction) |
| 2021 | Shahidinejad et al. [36] | SLA compliance via QoS-based clustering | Cost savings vs. baseline | Scaling decisions count | Not reported | N/A |
| 2020 | Tamiru et al. [86] | SPEC Cloud metrics (under/over-provisioning) | Monetary cost comparison | Instability of elasticity | Not reported | N/A |
| 2019 | Wajahat et al. [19] | 79% reduction in SLA violations (vs. reactive baselines) | 23% lower resource costs (vs. reactive baselines) | Not reported | Not reported | RMSE/MAPE on response time prediction (specific values vary by workload; e.g., MAPE 5–15%) |
| 2019 | Rossi et al. [59] | 95th-percentile latency vs. HPA | Avg. CPU cores (10% savings) | Variance in replica count | RL convergence (episodes) | N/A |
| 2019 | Benifa and Dejey [75] | SLA violation penalty in reward | VM-hours vs. baseline | Scaling frequency | Training episodes to convergence | N/A |
| 2017 | Arabnejad et al. [53] | SLA compliance | Cost reduction vs. baseline | Not reported | Not reported | N/A |
| 2015 | Calheiros et al. [1] | QoS impact analysis (response time) | Resource utilization | Not reported | Scaling overhead | Prediction accuracy (RMSE/MAE) |
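The prediction-accuracy and SLA columns above rest on a handful of standard formulas; a minimal sketch of how RMSE, MAPE, and an SLA-violation rate are typically computed (function names are illustrative):

```python
import math

# Standard evaluation metrics reported in the table above.

def rmse(actual, predicted):
    """Root mean squared error of a workload forecast."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error (in percent)."""
    return 100.0 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

def sla_violation_rate(latencies, slo_ms):
    """Fraction of requests whose latency exceeds the SLO bound."""
    return sum(1 for l in latencies if l > slo_ms) / len(latencies)
```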
| Year | Reference | Benchmark Application | Real Cloud Platform | Interaction Mode | Workload Characteristics | Autoscaler Type Tested | Open-Source Availability |
|---|---|---|---|---|---|---|---|
| 2023 | Jeong et al. [31] | Sustainability-focused workloads | Yes (AWS) | Real deployment | Mixed CPU/energy profiles | RL-based | No |
| 2023 | Betti et al. [62] | Custom microservice benchmark (complex inter-service dependencies) | No | Simulation/testbed (Kubernetes-like environment) | Complex inter-service graph | Multi-agent deep RL (MADDPG) | Yes |
| 2021 | Zhang et al. [85] | DeathStarBench (social network) | No | Kubernetes testbed | Microservices, diurnal patterns | ML-based (Sinan) | Yes |
| 2020 | Tamiru et al. [86] | No (evaluation of configurations) | No | Real cloud testbed (GKE) | Mixed (representative applications: e.g., web serving, batch processing) | Experimental evaluation of Kubernetes Cluster Autoscaler | Yes |
| 2019 | Rossi et al. [59] | RUBiS, DVD Store | No | Kubernetes testbed | Diurnal cycles, bursty traffic | RL-based | Yes |
| 2019 | Gao et al. [88] | DeathStarBench (social network) | No | Testbed | Microservices, inter-service dependencies | GNN-based autoscaler | Yes |
| 2017 | Arabnejad et al. [53] | Synthetic workloads | No (OpenStack testbed) | Real deployment | Traffic variability, sudden spikes | Fuzzy Q-learning | No |
| Phase | ML Focus | Example Use/Works |
|---|---|---|
| Monitor | Anomaly detection; metric imputation | Proactive anomaly sensing via agentic AI [89] |
| Analyze | Forecasting; bottleneck and root-cause analysis | Multivariate forecasting for proactive analysis [90]; SLO-driven and uncertainty-aware analysis using fuzzy logic [91,92] |
| Plan | Action selection; model predictive control (MPC) | Neural network prediction + evolutionary optimization for capacity planning [21]; deep RL for scaling decisions [81]; fuzzy + RL for adaptive autoscaling [92] |
| Execute | Action ordering/coordination | Agentic AI coordinating execution with human-in-the-loop safeguards [89] |
| Knowledge | Storage of models and histories | SLO-oriented knowledge base for adaptation [91]; evolving knowledge for uncertainty-aware autoscaling [92] |
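The MAPE-K phases in the table can be sketched as a single control-loop class. The monitor/analyze/plan/execute split follows the table; the forecaster and planner callables are illustrative placeholders for the ML components used in the cited works, not any specific framework's API.

```python
import math

# Hedged sketch of an ML-augmented MAPE-K loop, with pluggable Analyze
# (forecaster) and Plan (planner) components. Names are illustrative.

class MapeK:
    def __init__(self, forecaster, planner, capacity_per_replica=50.0):
        self.knowledge = []                      # Knowledge: observation history
        self.forecaster = forecaster             # Analyze: demand forecaster
        self.planner = planner                   # Plan: forecast -> replica count
        self.capacity = capacity_per_replica

    def loop(self, observed_demand):
        self.knowledge.append(observed_demand)           # Monitor
        forecast = self.forecaster(self.knowledge)       # Analyze
        target = self.planner(forecast, self.capacity)   # Plan
        return target                                    # Execute: caller applies target
```

For example, a forecaster that anticipates 20% growth over the last observation, paired with a capacity-based planner, turns an observed demand of 100 into a target of 3 replicas at 50 units per replica.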
| Framework | ML Technique | Integration Approach |
|---|---|---|
| Saxena & Singh [21] | Neural network forecasting + Evolutionary optimization | Proactive VM autoscaling and energy-efficient placement; evaluated on Google Cluster Dataset |
| MARLISE [64] | Multi-Agent Deep RL (DQN/PPO) | Independent agents for vertical scaling of individual microservices; implicit coordination via shared environment in edge-cloud deployments |
| FIRM [78] | Hierarchical RL | Service-level and cluster-level coordination; uses tracing for dependency-aware scaling |
| AWARE [65] | RL + Safety Logic | Integrated with Kubernetes scheduler; offline-trained RL with runtime safety checks |
| Platform | Integration Method | Example Work (Reference) |
|---|---|---|
| AWS (EC2 + ALB) | Forecast-driven horizontal scaling with warmup-aware provisioning and QoS guarantees (PASS) | Li et al. (2024)—Deployed system at Tencent using deep learning workload prediction and risk-aware scaling decisions [97] |
| Azure (VMs/Scale Sets) | ML-enhanced autoscale rules | Arabnejad et al. (2017)—Fuzzy Q-learning controller deployed on Azure/OpenStack for cost-SLA optimal VM scaling [52] |
| Azure (PaaS/Serverless) | Platform-managed predictive autoscaling | Poppe et al. (2022)—“Moneyball” predictive scaler for Azure SQL Database (serverless tier), eliminates reactivation latency [96] |
| Kubernetes (HPA) | External predictive metrics fed into HPA | Yu et al. (2019)—Microscaler uses online Bayesian regression to forecast load and drive proactive HPA decisions [25] |
| Kubernetes (Custom Controller) | Fully custom autoscaler replacing HPA | Wu et al. (2019)—Deep RL agent as drop-in HPA replacement for microservices [81]; Rossi et al. (2019)—Q-learning-based custom controller |
| Kubernetes (Hybrid) | Combined horizontal + vertical scaling via RL | Pan et al. (2022)—RL agent selects scale-out/in or resource resize actions [67]; Qiu et al. (2023)—AWARE meta-RL framework dynamically adjusts replicas and resources (16× fewer SLA violations than default) [65] |
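The "external predictive metrics fed into HPA" row amounts to publishing a forecast as the metric the stock HPA formula acts on, so scaling reacts to predicted rather than observed load. The sketch below illustrates this, with a lead-time offset to cover instance warmup; the lead-time handling is an illustrative assumption inspired by the warmup-aware provisioning noted above, not any surveyed system's actual code.

```python
import math

# Hedged sketch: report a forecast as an external metric so the standard
# HPA rule (desired = ceil(current * metric / target)) scales proactively.
# The warmup lead-time offset is an illustrative assumption.

def external_metric(forecasts, lead_steps: int) -> float:
    """Report the forecast `lead_steps` ahead, covering provisioning warmup."""
    return forecasts[min(lead_steps, len(forecasts) - 1)]

def hpa_desired(current: int, metric: float, target: float) -> int:
    """Stock HPA proportional rule applied to the published metric."""
    return max(1, math.ceil(current * metric / target))
```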
| System (Year) | Energy Efficiency Technique | Workload & Environment | SLO Guarantees | Scalability & Context |
|---|---|---|---|---|
| μ-Serve (2024) [70] | Fine-grained GPU DVFS; model partitioning; speculative scheduling | Deep learning inference (CNNs, Transformers) in homogeneous GPU clusters | Yes—strict SLO preservation via MIAD frequency control | Evaluated on 8-node (16 GPU) cluster; no horizontal scaling or heterogeneity support |
| A proactive energy-aware auto-scaling solution for edge-based infrastructures (2022) [74] | Proactive horizontal autoscaling; energy-aware node selection | Edge computing with heterogeneous nodes | Yes—predictive scaling with 0% failed requests | Simulated up to 500 nodes; focuses on idle/dynamic energy; no device-level DVFS |
| AutoScale (2020) [72] | RL-based execution target selection (edge/cloud/mobile) | Edge inference (mobile devices, cloud offloading) | Yes—latency and accuracy included in RL reward | Per-device RL agents; handles heterogeneity; not cluster-scale |
| Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs (2020) [73] | Batch scheduling + GPU DVFS for Transformer inference | Transformer-based NLP inference on GPUs | Partial—allows latency increase for energy savings | Single-node focus; tested on multiple GPU types; no autoscaling or cluster-level control |
| MArk (2019) [71] | Predictive autoscaling; multi-tier provisioning (IaaS + FaaS); batching | ML inference on AWS (image, language models) | Yes—predictive scaling + serverless fallback for SLO compliance | Scales across cloud instances; cost-focused; no energy metrics; supports heterogeneous provisioning |
| Question | Primary Taxonomy Nodes | Key Approaches | Guideline (Q&A Answer) | Open Challenges |
|---|---|---|---|---|
| When does a learned scaler beat a tuned threshold? | Goal, Decision | Threshold policies; supervised prediction; RL-based autoscalers | Learned scalers are most beneficial for complex, non-stationary, and multi-objective workloads; tuned thresholds remain strong baselines for simple, stable cases | Systematic characterisation of regimes where ML reliably outperforms tuned thresholds; standardised baselines and reporting |
| How do actuation delays and telemetry lag change the winner? | Decision, Deployment, Monitor/Knowledge | Reactive vs predictive control; delay-aware policies | Design autoscalers with explicit knowledge of actuation and telemetry delays; use forecasting and conservative scale-in under long delays | Stratified evaluations across delay regimes; explicit delay modelling in learning algorithms and control loops |
| Is diagonal (hybrid) scaling worth the complexity? | Scaling, Control Scope | Horizontal, vertical, hybrid (diagonal) scaling; RL-based hybrid controllers | Use hybrid scaling when both primitives are available and workloads have shifting bottlenecks; otherwise, well-tuned pure strategies may suffice | Multi-resource optimisation under hybrid policies; stability and restart-aware designs; general evidence on benefit vs complexity |
| Centralised vs decentralised control: who scales better as service count grows? | Control Scope, Deployment | Centralised controllers; per-service policies; MARL; hierarchical models | Apply centralised or hierarchical control for tightly coupled services; use decentralised control for large, loosely coupled microservice or edge deployments with light global coordination | Scaling behaviour with service count; MARL reward design and coordination; observability and debugging in distributed autoscaling |
| What is the cost of safety and multi-objective guarantees? | Goal, Decision, Deployment | Safe/constrained RL; external guardrails; multi-objective optimisation; carbon-aware policies | Quantify SLO, cost, and energy trade-offs; treat safety as a first-class objective and report Pareto frontiers rather than single operating points | Quantifying the “cost of safety”; dynamic movement along Pareto frontiers; integrating pricing, penalties, and carbon targets into robust ML controllers |
| What are the challenges in achieving energy-efficient autoscaling? | Goal, Decision, Deployment | Device-level DVFS; predictive autoscaling; energy-aware scheduling; RL-based execution scaling | Combine device-level power management with cluster-level autoscaling; report energy–performance trade-offs explicitly; evaluate across heterogeneous hardware and dynamic workloads | Integration of vertical and horizontal scaling; handling hardware heterogeneity; dynamic energy-aware control under workload and system variability; transparent reporting of energy trade-offs and carbon impact |
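The delay-aware guideline above (forecast over the actuation delay, scale out eagerly, scale in conservatively) can be made concrete with a minimal sketch. The class name, its parameters, and the naive linear forecaster are illustrative assumptions for exposition, not a method from any surveyed system; a real autoscaler would substitute a learned forecaster and platform-specific actuation.

```python
import math
from collections import deque

class DelayAwarePredictiveScaler:
    """Illustrative sketch: predictive scale-out, conservative scale-in.

    All names and defaults are hypothetical. `actuation_delay_steps` is the
    number of control steps before a scaling action takes effect.
    """

    def __init__(self, capacity_per_replica, actuation_delay_steps,
                 scale_in_patience=3, history_len=12):
        self.capacity = capacity_per_replica      # demand one replica sustains
        self.delay = actuation_delay_steps        # steps until action takes effect
        self.patience = scale_in_patience         # low samples before scale-in
        self.history = deque(maxlen=history_len)  # recent demand samples
        self.low_streak = 0

    def forecast(self):
        # Naive linear extrapolation over the actuation delay; a learned
        # model (e.g. an LSTM or gradient-boosted forecaster) would go here.
        if len(self.history) < 2:
            return self.history[-1] if self.history else 0.0
        slope = (self.history[-1] - self.history[0]) / (len(self.history) - 1)
        return max(0.0, self.history[-1] + slope * self.delay)

    def decide(self, demand, current_replicas):
        """Return the target replica count for the next control step."""
        self.history.append(demand)
        needed = max(1, math.ceil(self.forecast() / self.capacity))
        if needed >= current_replicas:
            self.low_streak = 0
            return needed  # scale out immediately on predicted pressure
        # Conservative scale-in: only after sustained low demand,
        # and only one replica at a time, to avoid oscillation.
        self.low_streak += 1
        if self.low_streak >= self.patience:
            self.low_streak = 0
            return current_replicas - 1
        return current_replicas
```

Under rising demand the controller tracks the extrapolated load immediately, while under falling demand it waits for `scale_in_patience` consecutive low samples, which is one simple way to encode the "conservative scale-in under long delays" guideline.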

| Direction | Focus Area | Main Open Issues |
|---|---|---|
| Unified ML-Orchestration Frameworks | Integrated monitoring, prediction, and scaling across tiers; extended MAPE-K loops | Coordination of multi-tier scaling actions under shared bottlenecks and delays; design of ML-driven control planes with clear interfaces and guarantees |
| Edge Intelligence and Decentralised Scaling | Hierarchical control for edge–cloud; decentralised and federated updates | Robust operation under heterogeneous delays, bandwidth limits, and partial observability; partitioning of control between edge and cloud |
| Multi-Agent and Federated Learning | MARL for service-level scaling; federated learning for cross-cluster policy sharing | Reward design and coordination that align local agents with global objectives; stability and adaptability under non-stationary workloads |
| Standardised Benchmarks and Evaluation | Common workloads, metrics, and reproducible simulation and deployment environments | Community-agreed benchmark suites, baselines, and reporting guidelines; long-term maintenance of artefacts and reference implementations |
| Sustainability and Green Autoscaling | Carbon-aware autoscaling; multi-objective optimisation including energy and emissions | Joint optimisation of cost, performance, and carbon footprint; modelling and exploiting temporal/spatial flexibility under realistic constraints |
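The recurring guideline to "report Pareto frontiers rather than single operating points" for cost, SLO, and energy trade-offs can be illustrated with a small helper that filters dominated operating points. The function name and the two-objective tuples (e.g. cost, latency) are illustrative assumptions; real evaluations would use whichever objectives the study reports.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming both objectives are minimised.

    Each point is a tuple of objective values, e.g. (cost, p99_latency).
    A point p is dominated if some q is no worse in every objective and
    strictly better in at least one.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] <= p[0] and q[1] <= p[1] and (q[0] < p[0] or q[1] < p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return front
```

Reporting the full frontier rather than a single configuration lets readers see, for instance, how much extra cost a tighter latency target implies, which is the comparison the table above argues current evaluations often omit.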
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Machiraju, V.S.; Kumar, V.; Sharma, S. ML-Based Autoscaling for Elastic Cloud Applications: Taxonomy, Frameworks, and Evaluation. Math. Comput. Appl. 2026, 31, 49. https://doi.org/10.3390/mca31020049

