
Search Results (10)

Search Parameters:
Keywords = Service Level Objectives (SLO)

32 pages, 2917 KiB  
Article
Self-Adapting CPU Scheduling for Mixed Database Workloads via Hierarchical Deep Reinforcement Learning
by Suchuan Xing, Yihan Wang and Wenhe Liu
Symmetry 2025, 17(7), 1109; https://doi.org/10.3390/sym17071109 - 10 Jul 2025
Viewed by 328
Abstract
Modern database systems require autonomous CPU scheduling frameworks that dynamically optimize resource allocation across heterogeneous workloads while maintaining strict performance guarantees. We present a novel hierarchical deep reinforcement learning framework augmented with graph neural networks to address CPU scheduling challenges in mixed database environments comprising Online Transaction Processing (OLTP), Online Analytical Processing (OLAP), vector processing, and background maintenance workloads. Our approach introduces three key innovations: first, a symmetric two-tier control architecture where a meta-controller allocates CPU budgets across workload categories using policy gradient methods while specialized sub-controllers optimize process-level resource allocation through continuous action spaces; second, graph neural network-based dependency modeling that captures complex inter-process relationships and communication patterns while preserving inherent symmetries in database architectures; and third, meta-learning integration with curiosity-driven exploration enabling rapid adaptation to previously unseen workload patterns without extensive retraining. The framework incorporates a multi-objective reward function balancing Service Level Objective (SLO) adherence, resource efficiency, symmetric fairness metrics, and system stability. Experimental evaluation through high-fidelity digital twin simulation and production deployment demonstrates substantial performance improvements: 43.5% reduction in p99 latency violations for OLTP workloads and 27.6% improvement in overall CPU utilization, with successful scaling to 10,000 concurrent processes maintaining sub-3% scheduling overhead. This work represents a significant advancement toward truly autonomous database resource management, establishing a foundation for next-generation self-optimizing database systems with implications extending to broader orchestration challenges in cloud-native architectures. 
(This article belongs to the Section Computer)

19 pages, 1008 KiB  
Article
On the Analysis of Inter-Relationship between Auto-Scaling Policy and QoS of FaaS Workloads
by Sara Hong, Yeeun Kim, Jaehyun Nam and Seongmin Kim
Sensors 2024, 24(12), 3774; https://doi.org/10.3390/s24123774 - 10 Jun 2024
Cited by 1 | Viewed by 1807
Abstract
Recent developments in cloud computing have introduced serverless technology, enabling convenient and flexible management of cloud-native applications. Typically, Function-as-a-Service (FaaS) solutions rely on serverless backends such as Kubernetes (K8s) and Knative to leverage resource-management features for the underlying containers, including auto-scaling and pod scheduling. To exploit these advantages, cloud service providers increasingly deploy self-hosted serverless services on their own on-premise FaaS platforms rather than relying on commercial public cloud offerings. However, the lack of standardized guidelines for configuring K8s auto-scaling to fairly schedule and allocate resources in such on-premise hosting environments poses challenges in meeting the service level objectives (SLOs) of diverse workloads. This study fills this gap by exploring the relationship between auto-scaling behavior and the performance of FaaS workloads under different scaling-related configurations in K8s. Based on comprehensive measurement studies, we derive guidance on which scaling configuration, such as the base metric and threshold, should be applied to which workload to maximize latency SLO attainment and the number of responses. Additionally, we propose a methodology to assess the scaling efficiency of the related K8s configurations with respect to the quality of service (QoS) of FaaS workloads.
(This article belongs to the Special Issue Edge Computing in Internet of Things Applications)

28 pages, 845 KiB  
Article
SLA-Adaptive Threshold Adjustment for a Kubernetes Horizontal Pod Autoscaler
by Olesia Pozdniakova, Dalius Mažeika and Aurimas Cholomskis
Electronics 2024, 13(7), 1242; https://doi.org/10.3390/electronics13071242 - 27 Mar 2024
Cited by 4 | Viewed by 1950
Abstract
Kubernetes is an open-source container orchestration system that provides a built-in module for dynamic resource provisioning named the Horizontal Pod Autoscaler (HPA). The HPA identifies the number of resources to be provisioned by calculating the ratio between the current and target utilisation metrics. The target utilisation metric, or threshold, directly impacts how many and how quickly resources will be provisioned. However, the determination of the threshold that would allow satisfying performance-based Service Level Objectives (SLOs) is a long, error-prone, manual process because it is based on the static threshold principle and requires manual configuration. This might result in underprovisioning or overprovisioning, leading to the inadequate allocation of computing resources or SLO violations. Numerous autoscaling solutions have been introduced as alternatives to the HPA to simplify the process. However, the HPA is still the most widely used solution due to its ease of setup, operation, and seamless integration with other Kubernetes functionalities. The present study proposes a method that utilises exploratory data analysis techniques along with moving average smoothing to identify the target utilisation threshold for the HPA. The objective is to ensure that the system functions without exceeding the maximum number of events that result in a violation of the response time defined in the SLO. A prototype was created to adjust the threshold values dynamically, utilising the proposed method. This prototype enables the evaluation and comparison of the proposed method with the HPA, which has the highest threshold set that meets the performance-based SLOs. The results of the experiments proved that the suggested method adjusts the thresholds to the desired service level with a 1–2% accuracy rate and only 4–10% resource overprovisioning, depending on the type of workload.
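For readers unfamiliar with the mechanism this abstract builds on: the HPA's replica count is documented in Kubernetes as ceil(currentReplicas × currentMetric / targetMetric). The sketch below (the function name is illustrative, not from the paper) shows why the chosen threshold directly drives how aggressively resources are provisioned:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Replica count as computed by the HPA control loop:
    ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# The same observed load scales out further under a lower threshold:
print(desired_replicas(4, 90.0, 60.0))  # 6 pods at a 60% CPU target
print(desired_replicas(4, 90.0, 80.0))  # 5 pods at an 80% CPU target
```

Lowering the target from 80% to 60% CPU provisions one extra pod for identical load; this sensitivity of the static threshold is what the paper's dynamic adjustment method exploits.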

39 pages, 1887 KiB  
Article
Efficient Resource Utilization in IoT and Cloud Computing
by Vivek Kumar Prasad, Debabrata Dansana, Madhuri D. Bhavsar, Biswaranjan Acharya, Vassilis C. Gerogiannis and Andreas Kanavos
Information 2023, 14(11), 619; https://doi.org/10.3390/info14110619 - 19 Nov 2023
Cited by 12 | Viewed by 6644
Abstract
With the proliferation of IoT devices, there has been exponential growth in data generation, placing substantial demands on both cloud computing (CC) and internet infrastructure. CC, renowned for its scalability and virtual resource provisioning, is of paramount importance in e-commerce applications. However, the dynamic nature of IoT and cloud services introduces unique challenges, notably in the establishment of service-level agreements (SLAs) and the continuous monitoring of compliance. This paper presents a versatile framework for the adaptation of e-commerce applications to IoT and CC environments. It introduces a comprehensive set of metrics designed to support SLAs by enabling periodic resource assessments, ensuring alignment with service-level objectives (SLOs). This policy-driven approach seeks to automate resource management in the era of CC, thereby reducing the dependency on extensive human intervention in e-commerce applications. This paper culminates with a case study that demonstrates the practical utilization of metrics and policies in the management of cloud resources. Furthermore, it provides valuable insights into the resource requisites for deploying e-commerce applications within the realms of the IoT and CC. This holistic approach holds the potential to streamline the monitoring and administration of CC services, ultimately enhancing their efficiency and reliability.
(This article belongs to the Special Issue Systems Engineering and Knowledge Management)

26 pages, 4909 KiB  
Article
FireFace: Leveraging Internal Function Features for Configuration of Functions on Serverless Edge Platforms
by Ming Li, Jianshan Zhang, Jingfeng Lin, Zheyi Chen and Xianghan Zheng
Sensors 2023, 23(18), 7829; https://doi.org/10.3390/s23187829 - 12 Sep 2023
Cited by 2 | Viewed by 2054
Abstract
Serverless computing has emerged as a captivating paradigm for deploying cloud applications, alleviating developers’ concerns about infrastructure resource management: they need only configure necessary parameters such as latency and memory constraints. Existing resource configuration solutions for cloud-based serverless applications can be broadly classified into modeling based on historical data or a combination of sparse measurements and interpolation/modeling. In pursuit of faster service response and reduced network bandwidth, platforms have progressively expanded from the traditional cloud to the edge. Compared to cloud platforms, serverless edge platforms often incur more running overhead due to their limited resources, resulting in undesirable financial costs for developers when using the existing solutions. Meanwhile, it is extremely challenging to handle the heterogeneity of edge platforms, characterized by distinct pricing owing to their varying resource preferences. To tackle these challenges, we propose an adaptive and efficient approach called FireFace, consisting of prediction and decision modules. The prediction module extracts the internal features of all functions within the serverless application and uses this information to predict the execution time of the functions under specific configuration schemes. Based on the prediction module, the decision module analyzes the environment information and uses the Adaptive Particle Swarm Optimization with Genetic Algorithm Operator (APSO-GA) algorithm to select the most suitable configuration plan for each function, including CPU, memory, and edge platforms. In this way, it is possible to effectively minimize the financial overhead while fulfilling the Service Level Objectives (SLOs). Extensive experimental results show that our prediction model obtains optimal results under all three metrics, and the prediction error rate for real-world serverless applications is in the range of 4.25∼9.51%. Our approach can find the optimal resource configuration scheme for each application, which saves 7.2∼44.8% on average compared to other classic algorithms. Moreover, FireFace exhibits rapid adaptability, efficiently adjusting resource allocation schemes in response to dynamic environments.

31 pages, 7425 KiB  
Article
Evaluating Task-Level CPU Efficiency for Distributed Stream Processing Systems
by Johannes Rank, Jonas Herget, Andreas Hein and Helmut Krcmar
Big Data Cogn. Comput. 2023, 7(1), 49; https://doi.org/10.3390/bdcc7010049 - 10 Mar 2023
Viewed by 3234
Abstract
Big Data and primarily distributed stream processing systems (DSPSs) are growing in complexity and scale. As a result, effective performance management to ensure that these systems meet the required service level objectives (SLOs) is becoming increasingly difficult. A key factor to consider when evaluating the performance of a DSPS is CPU efficiency, which is the ratio of the workload processed by the system to the CPU resources invested. In this paper, we argue that developing new performance tools for creating DSPSs that can fulfill SLOs while using minimal resources is crucial. This is especially significant in edge computing situations where resources are limited and in large cloud deployments where conserving power and reducing computing expenses are essential. To address this challenge, we present a novel task-level approach for measuring CPU efficiency in DSPSs. Our approach supports various streaming frameworks, is adaptable, and comes with minimal overheads. This enables developers to understand the efficiency of different DSPSs at a granular level and provides insights that were not previously possible.
(This article belongs to the Special Issue Distributed Applications and Services for Future Internet)
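The efficiency metric this abstract defines, workload processed divided by CPU invested, can be sketched in a few lines (the function name and the sample numbers are illustrative, not measurements from the paper):

```python
def cpu_efficiency(events_processed: int, cpu_seconds_used: float) -> float:
    """Task-level CPU efficiency: workload processed per CPU-second invested."""
    if cpu_seconds_used <= 0:
        raise ValueError("CPU time invested must be positive")
    return events_processed / cpu_seconds_used

# Two tasks of the same streaming job, measured over the same window:
print(cpu_efficiency(1_200_000, 30.0))  # 40000.0 events per CPU-second
print(cpu_efficiency(600_000, 30.0))    # 20000.0 events per CPU-second
```

Comparing tasks at this granularity, rather than judging the whole DSPS by aggregate utilization, is what lets a developer spot which operator wastes CPU.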

19 pages, 632 KiB  
Article
Tail Prediction for Heterogeneous Data Center Clusters
by Sharaf Malebary, Sami Alesawi and Hao Che
Processes 2023, 11(2), 407; https://doi.org/10.3390/pr11020407 - 30 Jan 2023
Viewed by 1955
Abstract
Service providers need to meet their service level objectives (SLOs) to ensure better client experiences. Predicting tail sojourn times of applications is an essential step in combating long tail latency. To further demonstrate the power of our prediction model, this research introduces new study scenarios for heterogeneous environments, using either of two methods: white- or black-box solutions. It presents several techniques for modeling clusters of inhomogeneous nodes, recognized as heterogeneous fork-join queuing networks (HFJQNs). Also included is a nested-event-based simulation model that draws on multi-core technologies: it adopts multiprocessing in its design to enable different architectural designs for all computing nodes. This implementation of the simulation model is a logical next step for research targeting heterogeneous clusters, beyond the several provided scenarios. Experimental results confirm that even under such heterogeneous conditions, tail latency can be predicted in high-load regions with an approximate relative error of less than 15%.
(This article belongs to the Special Issue Trends of Machine Learning in Multidisciplinary Engineering Processes)

18 pages, 553 KiB  
Article
HetSev: Exploiting Heterogeneity-Aware Autoscaling and Resource-Efficient Scheduling for Cost-Effective Machine-Learning Model Serving
by Hao Mo, Ligu Zhu, Lei Shi, Songfu Tan and Suping Wang
Electronics 2023, 12(1), 240; https://doi.org/10.3390/electronics12010240 - 3 Jan 2023
Cited by 2 | Viewed by 2803
Abstract
To accelerate the inference of machine-learning (ML) model serving, clusters of machines require the use of expensive hardware accelerators (e.g., GPUs) to reduce execution time. Advanced inference serving systems are needed to satisfy latency service-level objectives (SLOs) in a cost-effective manner. Novel autoscaling mechanisms that greedily minimize the number of service instances while ensuring SLO compliance are helpful. However, we find that it is not adequate to guarantee cost effectiveness across heterogeneous GPU hardware, and this does not maximize resource utilization. In this paper, we propose HetSev to address these challenges by incorporating heterogeneity-aware autoscaling and resource-efficient scheduling to achieve cost effectiveness. We develop an autoscaling mechanism which accounts for SLO compliance and GPU heterogeneity, thus provisioning the appropriate type and number of instances to guarantee cost effectiveness. We leverage multi-tenant inference to improve GPU resource utilization, while alleviating inter-tenant interference by avoiding the co-location of identical ML instances on the same GPU during placement decisions. HetSev is integrated into Kubernetes and deployed onto a heterogeneous GPU cluster. We evaluated the performance of HetSev using several representative ML models. Compared with default Kubernetes, HetSev reduces resource cost by up to 2.15× while meeting SLO requirements.

16 pages, 1186 KiB  
Article
User-Engagement Score and SLIs/SLOs/SLAs Measurements Correlation of E-Business Projects Through Big Data Analysis
by Solomiia Fedushko, Taras Ustyianovych, Yuriy Syerov and Tomas Peracek
Appl. Sci. 2020, 10(24), 9112; https://doi.org/10.3390/app10249112 - 20 Dec 2020
Cited by 33 | Viewed by 4530
Abstract
The COVID-19 lockdown caused a rapid shift to remote working and learning and created the need to develop and maintain e-commerce and web-education projects. However, the resulting increase in internet traffic has a direct impact on infrastructure and software performance. We study the problem of accurately and quickly identifying web-project infrastructure issues, bottlenecks, and overloads. The research aims to ensure the reliability and availability of a commerce/educational web project through system observability and Site Reliability Engineering (SRE) methods. We propose methods for technical condition assessment that correlate a user-engagement score with Service Level Indicator (SLI)/Service Level Objective (SLO)/Service Level Agreement (SLA) measurements to identify user satisfaction types along with the infrastructure state. Our solution helps to improve content quality and, mainly, to detect abnormal system behavior and poor infrastructure conditions. A straightforward interpretation of potential performance bottlenecks and vulnerabilities is achieved with a contingency table and correlation matrix developed for that purpose. We identify big data, system logs, and metrics as the central sources for detecting performance issues during web-project usage. Through analysis of an educational platform dataset, we found the main features of web-project content that drive high user engagement and provide value to services’ customers. According to our study, correlating SLOs/SLAs with other critical metrics, such as user satisfaction or engagement, improves early indication of potential system issues and avoids having users face them. These findings correspond to the concepts of SRE that focus on maintaining high service availability.
(This article belongs to the Special Issue Digital Transformation in Manufacturing Industry Ⅱ)
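The correlation step at the heart of this method, relating an engagement score to an SLI measurement, can be sketched with a plain Pearson coefficient (the data below are invented for illustration and are not from the paper's dataset):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient, used here to relate a
    user-engagement score to an SLI such as request latency."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative data: as p95 latency rises, engagement drops,
# yielding a strong negative correlation (close to -1).
latency_ms = [120, 150, 200, 320, 450]
engagement = [0.91, 0.88, 0.80, 0.62, 0.45]
print(round(pearson(latency_ms, engagement), 2))
```

A matrix of such coefficients over many metric pairs is what the paper's correlation matrix summarizes, flagging which SLIs track user satisfaction closely enough to serve as early warnings.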

21 pages, 374 KiB  
Article
A Taxonomy of Techniques for SLO Failure Prediction in Software Systems
by Johannes Grohmann, Nikolas Herbst, Avi Chalbani, Yair Arian, Noam Peretz and Samuel Kounev
Computers 2020, 9(1), 10; https://doi.org/10.3390/computers9010010 - 11 Feb 2020
Cited by 3 | Viewed by 5287
Abstract
Failure prediction is an important aspect of self-aware computing systems. Therefore, a multitude of different approaches has been proposed in the literature over the past few years. In this work, we propose a taxonomy for organizing works focusing on the prediction of Service Level Objective (SLO) failures. Our taxonomy classifies related work along the dimensions of the prediction target (e.g., anomaly detection, performance prediction, or failure prediction), the time horizon (e.g., detection or prediction, online or offline application), and the applied modeling type (e.g., time series forecasting, machine learning, or queueing theory). The classification is derived based on a systematic mapping of relevant papers in the area. Additionally, we give an overview of different techniques in each sub-group and address remaining challenges in order to guide future research.
(This article belongs to the Special Issue Applications in Self-Aware Computing Systems and their Evaluation)
