Sensors
  • Article
  • Open Access

27 November 2023

An Autoscaling System Based on Predicting the Demand for Resources and Responding to Failure in Forecasting

Department of Computer Science and Engineering, Dongguk University, Seoul 04620, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Innovations in AI and ML-Based Techniques for Image and Video Analysis from Sensors

Abstract

In recent years, the convergence of edge computing and sensor technologies has become a pivotal frontier revolutionizing real-time data processing. In particular, the practice of data acquisition—which encompasses the collection of sensory information in the form of images and videos, followed by their transmission to a remote cloud infrastructure for subsequent analysis—has witnessed a notable surge in adoption. However, to ensure seamless real-time processing irrespective of the data volume being conveyed or the frequency of incoming requests, it is vital to proactively allocate resources within the cloud infrastructure that are specifically tailored to data-processing tasks. Many studies have focused on the proactive prediction of resource demands through the use of deep learning algorithms, generating considerable interest in real-time data processing. Nonetheless, an inherent risk arises when relying solely on predictive resource allocation, as it can heighten the susceptibility to system failure. In this study, a framework is proposed that includes algorithms that periodically monitor resource requirements and dynamically adjust resource provisioning to match the actual demand. Under experimental conditions with the Bitbrains dataset, with the maximum network throughput per VM set to 300 kB/s and a threshold of 80%, the proposed system improves the elastic speedup of the autoscaling algorithm by 99% while requiring only 0.43 ms of additional computational overhead compared to relying on a simple prediction model alone.

1. Introduction

In recent years, the convergence of edge computing and sensor technologies has emerged as a major frontier for the augmentation of real-time data processing. This includes applications in which the effective processing of data collected from a multitude of sensors is of the utmost importance, encompassing various fields, such as transportation, healthcare, and emergency response, all of which are based on 6G communication []. One example is the use of intelligent traffic control systems, wherein high-speed autonomous vehicles on highways require real-time interaction with their surroundings, other vehicles, and humans. Unfortunately, a considerable portion of the existing artificial intelligence (AI)- and cloud-based research centers on static data processing and fails to address the requirements of real-time processing. Recent studies have shown that AI solutions implemented in edge networks—wherein intelligent prediction, reasoning, and decision-making tasks are performed—can decrease service response times and provide genuine real-time services [,,].
As the demand for these applications increases, problems related to resource allocation and potential resource overload have become evident [,]. The basic premise of this technology is to facilitate rapid response times by localizing data processing near the Internet of Things (IoT) devices. However, the increased demand and potential resource overload can lead to serious problems, such as increased response latency and system failure.
One way to solve this problem is through container orchestration. Kubernetes is the most popular container orchestration platform and offers the horizontal pod autoscaling method, designed to dynamically fine-tune resource allocation based on current utilization metrics []. Nonetheless, this post-processing method involves time-consuming state validation and the creation or removal of resources when predefined thresholds are exceeded, which can introduce system-overload-related problems. Concurrently, research efforts have explored the use of deep learning techniques for the proactive prediction of resource allocation [,,,,,]. Most of these studies engage in time-series forecasting by leveraging log data such as HTTP requests and CPU utilization. A major benefit of this predictive approach is that resource tuning can be executed proactively, thereby mitigating the risk of overload. However, it is crucial to acknowledge that resource overload problems can still exist if the prediction model is unstable.
To address these problems and achieve effective resource provisioning, we propose a novel resource demand prediction method that combines deep-learning-based forecasting with subsequent monitoring and dynamic resource adjustment to overcome forecasting failures. This paper presents a system architecture that operationalizes this methodology within a real container orchestration system. The predictive capability aims to preemptively adjust resources in response to evolving demands, thereby ensuring efficient resource utilization within the dynamic landscape of IoT-enabled edge computing. In other words, the proposed system forecasts the resource demand in advance to facilitate proactive resource allocation before overload scenarios manifest. The forecast of future resource demand is produced from historical time-series data using deep learning algorithms and is combined with periodic inspection phases to reduce the uncertainty inherent in predictive modeling. These inspection phases enable dynamic adjustments to be made if the initially allocated resources are insufficient to meet the actual demand.
The contributions of this research can be summarized as follows:
(1)
The development of a resource forecasting method that enables proactive resource adjustments before overload situations occur;
(2)
An intermediate inspection process based on network throughput for adjustment when the actual demand differs from deep-learning-based forecasts;
(3)
A 99% improvement in the elastic speedup value through actual system implementation by applying the proposed methodology in an authentic container orchestration environment, under experimental conditions with the Bitbrains dataset, setting the network throughput to 300 kB/s and using a threshold of 80%.
The remainder of this paper is organized as follows. Section 2 reviews related research, both in terms of resource prediction and post-processing methods. In Section 3, we present our proposed methodology and elaborate on each step of the method. Section 4 provides insights into the experiments conducted to evaluate the proposed method, along with an analysis of the experimental results. Finally, Section 5 concludes the study and summarizes the key findings and implications.

3. Methods

3.1. System Architecture

Figure 3 shows the system architecture and demonstrates the application of the proposed model in a practical container orchestration system. Metrics such as network reception and memory usage are collected from a managed cluster through collectors and sent to a time-series database, such as Prometheus []. The collected data are then transmitted to an autoscaler located outside the cluster. At this stage, the proposed system predicts and plans resource demand adjustments. Following this phase, adjustments are made based on the established plan. When configuring the prediction and inspection cycles, it is essential to consider factors such as the workload of the managed cluster and the time required for implementation.
Figure 3. Structure of the overall autoscaling system.
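To make the collection step above concrete, the following minimal sketch shows how an external autoscaler could pull such metrics through Prometheus's HTTP query API. The endpoint URL, the PromQL expressions, and the metric names are illustrative assumptions rather than the configuration used in this study.

```python
import requests

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical endpoint

def query_prometheus(promql: str) -> float:
    """Run an instant PromQL query and return the first scalar result."""
    resp = requests.get(
        f"{PROMETHEUS_URL}/api/v1/query",
        params={"query": promql},
        timeout=5,
    )
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    # Each result has the form {"metric": {...}, "value": [timestamp, "<number>"]}.
    return float(results[0]["value"][1]) if results else 0.0

# Hypothetical metric names; the actual names depend on the exporters deployed.
network_rx_kbps = query_prometheus(
    "sum(rate(node_network_receive_bytes_total[1m])) / 1024"
)
memory_usage_bytes = query_prometheus("sum(node_memory_Active_bytes)")
```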
If the cycle duration is set too short, frequent incorrect adjustments can occur owing to rapid fluctuations. Moreover, there is a risk of making readjustments before observing the effects of previous adjustments []. To address these concerns, there is a waiting period after executing the adjustments that allows for a brief pause before initiating inspections. After resource adjustments are made during the planning phase, there is a one-minute interval during which no further action is taken. The prediction cycle occurs every five minutes, with the timing based on the reference point following the waiting period after the initial prediction. Over the subsequent five minutes leading up to the next prediction, the proposed system engages in an interim inspection, and resource adjustments are performed if deemed necessary. This interim inspection process occurs every minute. Even if actual adjustments are enacted during this phase, another waiting period is incorporated, allowing time for the effects of these adjustments to materialize.
Figure 4 shows a flowchart illustrating the system’s operation. Initially, information on server usage is collected from the metric server. If at least 5 min have elapsed since the last prediction, a new prediction is made; otherwise, an inspection is performed. Subsequently, the required number of VMs is calculated, and a plan for system adjustments is formulated. Finally, the system performs modifications according to the plan and concludes its operation.
Figure 4. Flow chart for the entire system.
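The skeleton below illustrates one way this loop could be scheduled in practice; the function arguments stand in for the forecasting, inspection, and planning stages described above, and the interval constants follow the 5-min prediction cycle, 1-min inspection cycle, and 1-min waiting period stated in the text.

```python
import time

PREDICTION_INTERVAL = 5 * 60   # seconds between periodic predictions
INSPECTION_INTERVAL = 60       # seconds between interim inspections
COOLDOWN = 60                  # waiting period after a resource adjustment

def autoscaler_loop(collect_metrics, predict_workload, interim_inspection, plan):
    """Skeleton control loop: predict every 5 min, inspect every 1 min in between,
    and pause for a cooldown after an adjustment so its effect can materialize."""
    last_prediction = 0.0
    while True:
        metrics = collect_metrics()                        # e.g., pulled from Prometheus
        now = time.monotonic()
        if now - last_prediction >= PREDICTION_INTERVAL:
            predicted = predict_workload(metrics)          # forecasting stage
            adjusted = plan(predicted, metrics["now_vm"])  # planning stage
            last_prediction = now
        else:
            adjusted = interim_inspection(metrics)         # interim inspection stage
        time.sleep(COOLDOWN if adjusted else INSPECTION_INTERVAL)
```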

3.2. Forecasting Stage

During the forecasting stage, the proposed system performs periodic workload predictions. In this study, the Bi-LSTM model described in Section 2.3 was selected as the prediction model. The model leverages both antecedent and subsequent data at the present time step; this bidirectional approach permits the use of future information, thereby contributing to enhanced prediction accuracy.
Upon the collection and incorporation of metrics into the system, the system harnesses our pretrained model to formulate predictions. These predictions serve as the foundation for subsequent actions. Specifically, the planning algorithm (elaborated upon in Section 3.4) computes the requisite quantity of resources essential for accommodating the anticipated workload guided by the predicted outcomes. This seamless integration of forecasting and resource planning is a pivotal strategy that enables the judicious allocation of resources and the optimization of system performance.
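As a rough illustration of such a forecaster, a Bi-LSTM could be defined with Keras as sketched below. The window length, layer sizes, and the two input features (network reception throughput and memory usage) are assumptions made for the example and are not the tuned configuration reported in Table 3.

```python
import tensorflow as tf

def build_bilstm(window: int = 12, n_features: int = 2) -> tf.keras.Model:
    """Bidirectional LSTM that maps a window of past metric samples to the
    next value of the target metric. Layer sizes here are illustrative only."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, n_features)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
        tf.keras.layers.Dense(1),  # predicted network reception throughput
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```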

3.3. Proposed Interim Inspection Stage

Relying solely on a prediction algorithm would be ideal if its predictions were flawless; in practice, however, they are not. Hence, an interim inspection algorithm was introduced, as presented in Algorithm 1, whose role is to assess whether the existing quantity of resources is sufficient to handle the current network reception input. This algorithm draws inspiration from Kubernetes’ horizontal pod autoscaling algorithm and hinges on the ratio between the desired and prevailing metric values. Moreover, the proposed algorithm incorporates the maximum network throughput of a VM and a threshold value to remain adaptable to fluctuations in network throughput attributable to VM performance variations, thereby facilitating the calculation of an appropriate network throughput value consistent with the defined criteria.
In the proposed interim inspection phase, four essential inputs are required: input, denoting the presently received network throughput; now_vm, denoting the number of currently active VMs; threshold, denoting the VM utilization limit; and network_max, denoting the maximum network throughput achievable on a VM. Additionally, the system computes network, which denotes the network throughput constraint; this value is derived by multiplying the maximum network throughput by the threshold value. Subsequently, the requisite number of VMs can be ascertained through the following formula: required_vm = ceil[now_vm × (input / network)]. If the computed number of required resources deviates from the current resource count (i.e., now_vm), it serves as an indicator of a potential scaling requirement, either upscaling or downscaling. In such cases, the algorithm triggers the planning function and initiates a new planning phase in alignment with the observed resource requirements.
Algorithm 1 Interim Inspection
   Input: input // Currently received network throughput
          now_vm // Number of VMs currently in use
          threshold // Limit of VM utilization
          min_vm // Minimum number of VMs
          network_max // Maximum network throughput on a VM
   Output: Scheduled planning actions
1:  Initialization
2:  network = network_max × threshold
3:  required_vm = ceil[now_vm × (input / network)]
4:  if required_vm ≠ now_vm then
5:      Planning(input, now_vm)
6:  else
7:      Sleep
8:  end if
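A direct Python rendering of Algorithm 1 might look as follows; the planning callback stands in for Algorithm 2, and the formula mirrors the listing above.

```python
import math

def interim_inspection(input_throughput, now_vm, threshold, network_max, planning):
    """Algorithm 1: check whether the current VM count can absorb the observed
    network throughput and trigger a new planning phase if it cannot."""
    network = network_max * threshold                           # per-VM throughput limit
    required_vm = math.ceil(now_vm * (input_throughput / network))
    if required_vm != now_vm:                                   # up- or downscaling needed
        planning(input_throughput, now_vm)                      # schedule scaling actions
    # otherwise sleep until the next inspection cycle
```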
Before delving into the planning phase, it is important to clarify the rationale behind adopting both periodic predictions and interim inspection procedures. As listed in Table 2, the time required to conduct periodic predictions and interim inspections can be quantified. Considering the time spent on these two activities, while excluding the fixed time allotted for resource adjustments, there is a substantial contrast: periodic predictions, conducted at 5-min intervals, require approximately 19.46 ms, whereas interim inspections require a mere 0.4528 ms (a 43× difference). This stark disparity underscores the efficiency achieved by incorporating interim inspections in tandem with periodic predictions. Nonetheless, it is important to acknowledge that both durations fall well below one second, rendering the difference negligible in practical terms. Moreover, the impetus behind instituting interim inspection lies in the intrinsic divergence between the prediction outcomes and the actual resource demand. As discussed in Section 4.4, the prediction results often exhibit disparities, particularly during periods of heightened demand. Consequently, even if predictions are made at frequent intervals, their consistent inadequacy during high-demand scenarios can amplify the overhead associated with resource adjustments, and the expected benefits of proactive resource preparation may fail to materialize.
Table 2. Comparison of the time between prediction and interim inspection.
However, a paradigm shift can be accomplished by subjecting the actual resource demand to interim inspection and subsequently executing resource adjustments solely when warranted. This provides an avenue for the optimization of resource management by mitigating waste and promoting efficient resource utilization. While an ideal scenario would entail the accurate provisioning of all resources grounded solely in perfect predictions, reality underscores the fallibility of predictions. In this context, the incorporation of an interim inspection process not only enhances the prediction accuracy but also augments the resource efficiency. This, in turn, provides more fitting predictions for the actual resource demand, thereby fortifying the effectiveness of resource management.

3.4. Planning Stage

Subsequent to the execution of forecasting or post-processing operations, periodic verification of the actual resource demand is of great importance. In this context, Algorithm 2 serves as a manifestation of the planning phase and is dedicated to ascertaining the requisite number of resources, which is a prerequisite for both prediction and periodic verification scenarios.
Within the planning stage, a set of four vital inputs is mandated: input, denoting the anticipated network workload derived from the prediction; now_vm, denoting the prevailing count of active VMs; threshold, denoting the threshold value governing VM utilization; and network_max, denoting the upper limit of the network throughput achievable on a VM.
Algorithm 2 Planning
   Input: input // Predicted network workload
          now_vm // Number of VMs currently in use
          threshold // Limit of VM utilization
          min_vm // Minimum number of VMs
          network_max // Maximum network throughput on a VM
   Output: Scheduled scaling actions
1:  Initialization
2:  network = network_max × threshold
3:  next_vm = ceil[min_vm × (input / network)]
4:  if next_vm > now_vm then
5:      SCALE_UP(next_vm)
6:  else if next_vm < now_vm then
7:      SCALE_DOWN(next_vm)
8:  else
9:      Sleep
10: end if
In Line 2 of Algorithm 2, the proposed system computes network, denoting the network throughput constraint, calculated as the product of the maximum network throughput and the threshold value. In Line 3, the required number of VMs is determined using the formula next_vm = ceil[min_vm × (input / network)]. In Lines 4–7, if the calculated number of required resources diverges from the current resource count (i.e., now_vm), the system promptly initiates either upscaling or downscaling, depending on the prevailing state of the system.
In Lines 8 and 9, if the calculated figure aligns with the existing resource count, the system maintains its current resource allocation configuration. This planning algorithm exemplifies the adaptability of the system, ensuring the dynamic adjustment of resource allocation to harmonize with forecasted or actual workloads. This adaptive approach culminates in the optimization of resource utilization and augments the overall performance of the system.
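For completeness, a hedged Python sketch of the planning stage is given below; scale_up and scale_down are placeholders for the actual orchestration calls, and the VM-count formula is taken verbatim from Algorithm 2.

```python
import math

def planning(input_workload, now_vm, min_vm, threshold, network_max,
             scale_up, scale_down):
    """Algorithm 2: compute the VM count needed for the predicted (or observed)
    workload and issue a scaling action if it differs from the current count."""
    network = network_max * threshold                      # per-VM throughput limit
    next_vm = math.ceil(min_vm * (input_workload / network))
    if next_vm > now_vm:
        scale_up(next_vm)                                  # add VMs
    elif next_vm < now_vm:
        scale_down(next_vm)                                # release VMs
    # otherwise keep the current allocation and sleep
```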

4. Results

4.1. Dataset

The GWA-T-12 Bitbrains dataset [], generously provided by Bitbrains, served as the primary data source for both training and testing in this study. This extensive dataset encompasses performance metrics gathered from 1750 VMs situated within the distributed data centers of Bitbrains. The dataset is bifurcated into two distinct files—namely, the “FastStorage” and “Rnd” files. The “FastStorage” file comprises data originating from VMs tasked with hosting frequently accessed programs, while the “Rnd” file encapsulates data emanating from less frequently accessed and lower-performance VMs. Notably, for the scope of this study, the “FastStorage” dataset was employed exclusively.
The dataset comprised a diverse array of performance metrics encompassing parameters such as CPU usage, CPU utilization, memory usage, disk read throughput, disk write throughput, network reception throughput, and network transmission throughput. In the context of model training, the memory usage and network reception throughput metrics were selected. This selection was underpinned by the highest Pearson correlation coefficient, denoting a robust and substantial relationship with the target variable []. Prior to commencing model training, the dataset was normalized, standardizing its values to fall within the [0, 1] range, thereby ensuring uniformity in the data distribution. Subsequently, the dataset was partitioned, allocating 80% to train the model while reserving the remaining 20% for the critical task of evaluating the model’s performance. This partitioning strategy enabled the assessment of the model’s capacity to effectively generalize unknown data instances.
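A minimal preprocessing sketch under these assumptions is shown below; the column labels are hypothetical stand-ins for the Bitbrains CSV headers, and the scaling and chronological 80/20 split follow the description above.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def prepare_dataset(df: pd.DataFrame):
    """Select the two most correlated metrics, scale them to [0, 1],
    and split the series chronologically into 80% train / 20% test."""
    # Hypothetical column names; adjust to the actual Bitbrains headers.
    features = df[["network_received_throughput", "memory_usage"]].to_numpy()
    scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(features)
    split = int(len(scaled) * 0.8)
    return scaled[:split], scaled[split:]
```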

4.2. Experiment Details

First, a training phase was initiated to develop a predictive model. The network reception throughput and memory usage were selected as training features because they exhibited the highest correlation, and the training process employed a Bi-LSTM model with the tuned hyperparameter configuration detailed in Table 3 []. Initially, the training was scheduled to encompass 100 epochs; however, to optimize efficiency, we introduced an early stopping mechanism. When triggered, this mechanism ceased the training process if no significant improvement was discerned, thereby conserving computational resources. This strategy is particularly advantageous for low-performance VMs.
Table 3. Multi-bi-LSTM model configuration.
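The early-stopping behaviour described above could be reproduced with a Keras callback roughly as follows; the patience value and batch size are illustrative choices rather than the tuned settings of Table 3.

```python
import tensorflow as tf

def train_with_early_stopping(model, x_train, y_train):
    """Train for up to 100 epochs but stop early once validation loss plateaus."""
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",
        patience=5,                  # illustrative patience, not the tuned value
        restore_best_weights=True,
    )
    return model.fit(
        x_train, y_train,
        validation_split=0.2,
        epochs=100,                  # upper bound; early stopping usually ends sooner
        batch_size=32,               # illustrative batch size
        callbacks=[early_stop],
    )
```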
Second, we conducted a comparative analysis of the outcomes obtained by implementing the proposed algorithm on the trained Bi-LSTM model and those derived without its application. To facilitate this assessment, we leveraged system-oriented elasticity metrics tailored to gauge the efficiency of resource provisioning. During the prediction phase, coupled with an interim inspection, a cooldown period of 1 min was imposed after any resource adjustment. This precautionary measure aimed to forestall additional modifications before the effects of the preceding actions became discernible. Additionally, predictions occurred at 5-min intervals, with interim inspection routines executed every 1 min.
Conversely, in the prediction phase without interim inspection, predictions transpired every minute if no resource adjustments occurred. In the non-autoscaling scenario, the number of VMs was fixed at 2, with the network throughput threshold per VM set at 0.8. The maximum network throughput was 300 kB/s. Notably, this maximum network throughput was intentionally kept low, accentuating the magnitude of fluctuations in the required number of VMs and thus facilitating the performance assessment.
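Gathered in one place, the experimental settings described above amount to the following parameter set; the variable names are ours and are used only to summarize the configuration.

```python
EXPERIMENT_CONFIG = {
    "baseline_vm_count": 2,        # fixed VM count in the non-autoscaling scenario
    "threshold": 0.8,              # network throughput threshold per VM
    "network_max_kbps": 300,       # maximum network throughput per VM (kB/s)
    "prediction_interval_s": 300,  # predictions every 5 minutes
    "inspection_interval_s": 60,   # interim inspections every minute
    "cooldown_s": 60,              # waiting period after a resource adjustment
}
```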

4.3. Evaluation Metrics

To gauge the efficacy of resource provisioning, we employed the efficiency criteria prescribed by the Standard Performance Evaluation Corporation (SPEC) research organization [], with the computation of efficiency metrics conducted in accordance with Equations (6)–(10).
$$\theta_U = \frac{100}{T}\sum_{t=1}^{T}\frac{\max(d_t - p_t,\,0)}{d_t}\,\Delta t \tag{6}$$
$$\theta_O = \frac{100}{T}\sum_{t=1}^{T}\frac{\max(p_t - d_t,\,0)}{d_t}\,\Delta t \tag{7}$$
$$\tau_U = \frac{100}{T}\sum_{t=1}^{T}\max\bigl(\operatorname{sgn}(d_t - p_t),\,0\bigr)\,\Delta t \tag{8}$$
$$\tau_O = \frac{100}{T}\sum_{t=1}^{T}\max\bigl(\operatorname{sgn}(p_t - d_t),\,0\bigr)\,\Delta t \tag{9}$$
$$\epsilon_n = \left(\frac{\theta_{U,n}}{\theta_{U,a}}\cdot\frac{\theta_{O,n}}{\theta_{O,a}}\cdot\frac{\tau_{U,n}}{\tau_{U,a}}\cdot\frac{\tau_{O,n}}{\tau_{O,a}}\right)^{\frac{1}{4}} \tag{10}$$
where $d_t$ denotes the resource demand and $p_t$ the provisioned resources at time step $t$, $\Delta t$ is the sampling interval, $T$ is the total measurement duration, and the subscripts $n$ and $a$ refer to the non-autoscaling and autoscaling scenarios, respectively.
The under-provisioning resource metric ( Θ u ) serves as an indicator of the extent to which the allocated resources fall short of fulfilling the genuine demands of the system. Essentially, it quantifies the frequency at which a system encounters resource deficiencies, leading to performance degradation. A higher  Θ u  value signifies a higher incidence of under-provisioning, potentially resulting in bottlenecks or reduced user satisfaction. Conversely, the over-provisioning resource metric ( Θ o ) assesses the degree to which resources are allocated in excess of the system’s actual requirements. This metric sheds light on scenarios in which resources are wasted because of overallocation. A higher  Θ o   value indicates a prevalence of over-provisioning, leading to suboptimal resource utilization.
The metrics denoted as “duration the system is under-provisioned” ( τ u ) quantify the cumulative period during which the system operates in an under-provisioned state. This measurement encapsulates the total time required by the system to cope with insufficient resource allocation, potentially resulting in performance bottlenecks and user dissatisfaction. By contrast, the metrics labeled “duration the system is over-provisioned” ( τ o ) calculate the cumulative period during which the system contends with an excess of resources. This metric highlights instances of resource inefficiency and suboptimal resource utilization.
Elastic speedup ( ϵ n ) denotes a metric devised to assess the performance enhancement of a specific approach relative to a scenario without autoscaling. This metric is particularly valuable in comparing two distinct strategies—that is, one incorporating autoscaling (denoted as  a ) and another devoid of autoscaling (referred to as  n ). The objective is to quantify the degree to which autoscaling improves the performance in comparison with a non-autoscaling scenario. By evaluating the system-oriented metrics such as  Θ u Θ o τ u , and  τ o  of method “ a ”, we can assess its efficiency relative to the baseline “ n ” scenario. A fundamental step involves computing the geometric mean of the ratios between the paired metrics. A value exceeding one indicates that the proposed method surpasses the non-autoscaling scenario, signifying a positive performance gain. Conversely, a value below 1 suggests that the proposed method underperforms compared to the non-autoscaling scenario.
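Under the definitions in Equations (6)–(10), these metrics can be computed from a demand series and a provisioned-supply series as in the sketch below, assuming equal-length arrays sampled at a constant interval Δt.

```python
import numpy as np

def elasticity_metrics(d, p, dt=1.0):
    """SPEC-style elasticity metrics for demand d and provisioned supply p
    (Equations (6)-(9)); both are arrays sampled every dt time units."""
    d, p = np.asarray(d, dtype=float), np.asarray(p, dtype=float)
    T = len(d) * dt
    theta_u = 100.0 / T * np.sum(np.maximum(d - p, 0) / d * dt)     # under-provisioning accuracy
    theta_o = 100.0 / T * np.sum(np.maximum(p - d, 0) / d * dt)     # over-provisioning accuracy
    tau_u = 100.0 / T * np.sum(np.maximum(np.sign(d - p), 0) * dt)  # under-provisioned timeshare
    tau_o = 100.0 / T * np.sum(np.maximum(np.sign(p - d), 0) * dt)  # over-provisioned timeshare
    return theta_u, theta_o, tau_u, tau_o

def elastic_speedup(metrics_no_scaling, metrics_autoscaling):
    """Geometric mean of the metric ratios (Equation (10)); a value > 1 means the
    autoscaling method outperforms the non-autoscaling baseline."""
    ratios = [n / a for n, a in zip(metrics_no_scaling, metrics_autoscaling)]
    return float(np.prod(ratios) ** 0.25)
```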

4.4. Experimental Results

4.4.1. Prediction Performance Results

The training progress is visually represented in Figure 5, which shows a consistent reduction in loss as the training epochs advance. To provide additional insight into the performance of the model, Figure 6 shows the predicted workload when subjected to testing using the trained model. Although the predicted workload is consistent with the underlying trend, instances of prediction failure can be highlighted, particularly when the incoming workload reaches elevated levels.
Figure 5. Model loss for the training phase.
Figure 6. Predicted workload tested against the trained model.

4.4.2. Autoscaling Results and Comparison

Figure 7 demonstrates the results of the proposed approach, which combines forecasting and inspection, in comparison to the approach that employs forecasting alone. The figure expresses the number of VMs with three colored lines: the blue line represents the actual number of VMs required; the red line illustrates the number of VMs obtained by the autoscaling model coupled with our interim inspection algorithm; and the green line shows the VM count achieved exclusively through the autoscaling model, excluding the inspection algorithm. These visualizations provide insights into the impact of the interim inspection algorithm on the performance of the autoscaling model.
Figure 7. Estimated number of virtual machines with autoscaling: comparison between ground truth, autoscaling with proposed post-processing algorithm, and autoscaling by forecasting alone algorithm.
Additionally, Table 4 provides a comprehensive comparative analysis of three distinct methodologies: no autoscaling, autoscaling without interim inspection, and autoscaling with the interim inspection algorithm. A thorough examination of Table 4 reveals that autoscaling using the proposed algorithm outperforms the alternative methods. This superiority is underscored by smaller over-provisioning metric values compared with the other methods. Furthermore, it boasts a superior elastic speedup ( ϵ n ) value, with a notable increase of 99% (2.31 compared to 1.16). This outcome accentuates the fact that the autoscaling method using the proposed algorithm achieves a more substantial autoscaling gain than the version lacking the algorithm. The observed disparities in the under-provisioning values for autoscaling without the proposed algorithm can be ascribed to its inclination toward low-workload predictions. Moreover, when estimating the number of VMs to track the ever-changing real-time demand, that method does not consider the cooldown period; consequently, its VM counts exhibit higher volatility than in the other scenarios, leading to inferior outcomes.
Table 4. Estimation of autoscaling model with and without proposed post-processing algorithm.
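For reference, the reported gain follows directly from the two elastic speedup values in Table 4:
$$\frac{2.31 - 1.16}{1.16} \approx 0.99,$$
i.e., autoscaling with the interim inspection algorithm improves the elastic speedup by roughly 99% relative to autoscaling by forecasting alone.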

5. Discussion

The use of predictive models for autoscaling in real-time latency-sensitive IoT environments is important. This study leveraged a deep learning model to predict resource demands with the added implementation of intermediate verification steps in the event of prediction failures. This approach aims to mitigate the limitations of relying solely on predictive systems.
However, there are several limitations associated with the described methods. First, the dataset used in this study did not represent applications that rely heavily on GPU and CPU resources. The data used here primarily pertained to network throughput and memory usage, which were the most strongly correlated metrics. If the dataset had encompassed applications that predominantly relied on GPU or CPU resources, adjustments to the algorithm and the choice of appropriate metrics would have been necessary. However, this issue does not significantly change the proposed framework; it merely means that the selection and evaluation of the key metrics at each stage would need to be revisited. Therefore, future improvements can be expected to yield methods that remain effective for datasets dominated by GPU and CPU usage.
Second, the autoscaling system in this study only considered scenarios in which resources were 100% available. It did not address the cases in which certain VMs had limited availability. Consequently, methods of calculating the overall system capacity by assessing the availability of VMs are required. Thus, the current system can adequately adjust resource allocation (even as resource availability increases), but further enhancements are essential to improve the efficiency and stability.

6. Conclusions

In this study, we introduced and implemented an interim inspection algorithm to mitigate the disparities between the predicted and actual demand. Following the prediction phase, this algorithm periodically assessed the demand, and, if disparities emerged, post-processing solutions were initiated. Notably, the proposed algorithm operated with a time requirement that was approximately 43 times shorter than the prediction process itself. By extending the prediction cycle and integrating the algorithm within an extended timeframe, the overhead imposed on the system was greatly reduced. Furthermore, this study recognized that exclusive reliance on prediction models for demand estimation could result in errors. To address this concern, an interim inspection was conducted to identify and rectify these errors. Notably, when prediction was combined with interim inspection, there was a substantial 99% improvement compared to the method without the proposed algorithm. When equipped with the proposed algorithm, the efficiency of the system surpassed that of systems that lacked optimization measures. Using this methodology, we aim to achieve the efficient provisioning of server resources for tasks that demand seamless real-time operations.
It is important to emphasize that this study primarily focused on assessing the overall demand for the entire cluster and performed post-processing accordingly. However, this study did not explicitly account for variations in demand among individual nodes within a cluster. Consequently, it did not address scenarios in which specific high-demand nodes might encounter resource allocation challenges. Future research should explore strategies that incorporate per-node demand considerations, thereby enabling more precise prediction and post-processing techniques to address demand variations at the node level.

Author Contributions

Conceptualization, J.P. and J.J.; methodology, J.P. and J.J.; software, J.P.; experiment J.P.; validation, J.P.; formal analysis, J.P.; investigation, J.P.; resources, J.P.; data curation, J.P.; writing—original draft preparation, J.P. and J.J.; writing—review and editing, J.P. and J.J.; visualization, J.P.; supervision, J.J.; project administration, J.J.; funding acquisition, J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Dongguk University Research Fund of 2022 (S-2022-G0001-00070) and by National Research Foundation of Korea (NRF) grants funded by the Korean government (MSIT) (No. 2018R1A5A7023490 and No. 2021R1F1A1061514).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Publicly available datasets were used in this study. The datasets can be found at GWA-T-12-Bitbrains (http://gwa.ewi.tudelft.nl/datasets/gwa-t-12-bitbrains, accessed on 19 August 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Khanh, Q.V.; Chehri, A.; Quy, N.M.; Han, N.D.; Ban, N.T. Innovative trends in the 6G era: A comprehensive survey of architecture, applications, technologies, and challenges. IEEE Access 2023, 11, 39824–39844. [Google Scholar]
  2. Dong, L.; Yang, Z.; Cai, X.; Zhao, Y.; Ma, Q.; Miao, X. WAVE: Edge-device cooperated real-time object detection for open-air applications. IEEE Trans. Mob. Comput. 2022, 22, 4347–4357. [Google Scholar] [CrossRef]
  3. Han, H.-Y.; Chen, Y.-C.; Hsiao, P.-Y.; Fu, L.-C. Using channel-wise attention for deep CNN based real-time semantic segmentation with class aware edge information. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1041–1051. [Google Scholar] [CrossRef]
  4. Xu, X.; Li, H.; Xu, W.; Liu, Z.; Yao, L.; Dai, F. Artificial intelligence for edge service optimization in Internet of Vehicles: A survey. Tsinghua Sci. Technol. 2022, 27, 270–287. [Google Scholar] [CrossRef]
  5. Dang-Quang, N.M.; Yoo, M. An efficient multivariate autoscaling framework using Bi-LSTM for cloud computing. Appl. Sci. 2022, 12, 3523. [Google Scholar] [CrossRef]
  6. Scaling Cooldowns for Amazon EC2 Auto Scaling. Available online: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-scaling-cooldowns.html (accessed on 17 August 2023).
  7. Horizontal Pod Autoscaler. Available online: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/ (accessed on 17 August 2023).
  8. Prachitmutita, I.; Aittinonmongkol, W.; Pojjanasuksakul, N.; Supattatham, M.; Padungweang, P. Auto-scaling microservices on IaaS under SLA with cost-effective framework. In Proceedings of the 2018 Tenth International Conference on Advanced Computational Intelligence (ICACI), Xiamen, China, 29–31 March 2018; pp. 583–588. [Google Scholar]
  9. Imdoukh, M.; Ahmad, I.; Alfailakawi, M.G. Machine Learning-Based Autoscaling for Containerized Applications. Neural Comput. Appl. 2019, 32, 9745–9760. [Google Scholar]
  10. Zhu, Y.; Zhang, W.; Chen, Y.; Gao, H. A novel approach to workload prediction using attention-based LSTM Encoder-decoder network in cloud environment. EURASIP J. Wirel. Commun. Netw. 2019, 2019, 274. [Google Scholar] [CrossRef]
  11. Dang-Quang, N.M.; Yoo, M. Deep-learning-based autoscaling using bidirectional long short-term memory for Kubernetes. Appl. Sci. 2022, 11, 3835. [Google Scholar] [CrossRef]
  12. Xu, M.; Song, C.; Wu, H.; Gill, S.S.; Ye, K.; Xu, C. esDNN: Deep Neural network based multivariate workload prediction in cloud computing environments. ACM Trans. Internet Technol. 2022, 22, 1–24. [Google Scholar] [CrossRef]
  13. Schuster, M.; Paliwal, K.K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  14. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar]
  15. Arlitt, M.; Jin, T. 1998 World Cup Web Site Access Logs. Available online: http://www.acm.org/sigcomm/ITA/ (accessed on 17 August 2023).
  16. Chen, W.; Ye, K.; Wang, Y.; Xu, G.; Xu, C. How does the workload look like in production cloud? Analysis and clustering of workloads on Alibaba Cluster Trace. In Proceedings of the 2018 IEEE 24th International Conference on Parallel and Distributed Systems (ICPADS), Singapore, 11–13 December 2018; pp. 102–109. [Google Scholar]
  17. Dinda, P.; O’Hallaron, D. An Evaluation of Linear Models for Host Load Prediction. In Proceedings of HPDC ’99. Available online: http://www.cs.cmu.edu/~pdinda/LoadTraces/ (accessed on 17 August 2023).
  18. Arlitt, M.; Williamson, C. Web server workload characterization: The search for invariants. In Proceedings of the 1996 ACM SIGMETRICS Conference on the Measurement and Modeling of Computer Systems, Philadelphia, PA, USA, 23–26 May 1996. [Google Scholar]
  19. Wilkes, J. More Google Cluster Data. Google Research Blog. Available online: http://googleresearch.blogspot.com/2011/11/more-google-cluster-data.html (accessed on 17 August 2023).
  20. Shen, S.; Beek, V.V.; Iosup, A. Statistical characterization of business-critical workloads hosted in cloud datacenters. In Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Shenzhen, China, 4–7 May 2015; pp. 465–474. [Google Scholar]
  21. Ouhame, S.; Hadi, Y.; Ullah, A. An efficient forecasting approach for resource utilization in cloud data center using CNN-LSTM model. Neural Comput. Appl. 2022, 33, 10043–10055. [Google Scholar]
  22. Phan, L.A.; Kim, T. Traffic-aware horizontal pod autoscaler in Kubernetes-based edge computing infrastructure. IEEE Access 2022, 10, 18966–18977. [Google Scholar]
  23. Prometheus. Available online: https://prometheus.io/ (accessed on 17 August 2023).
  24. Bauer, A.; Grohmann, J.; Herbst, N.; Kounev, S. On the value of service demand estimation for auto-scaling. In Proceedings of the 19th International GI/ITG Conference on Measurement, Modelling and Evaluation of Computing Systems, Erlangen, Germany, 26–28 February 2018; pp. 142–156. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
